Looking for SharePoint experts!

I’m looking for more SharePoint developers, consultants, and analysts to exchange information, implementation experiences, and projects with! If you are one, know one, or think you might know one, refer him or her to me. That would make me very happy.

LinkedIn: http://www.linkedin.com/in/ebosch

The role of MOSS in a large-scale ECM environment (Part 1)

In this series

Part one
Part two

Introduction

One of my clients asked me whether MOSS can be considered a ‘true’ ECM application. That certainly raised an eyebrow. Sure, it supports ECM functionality like records management, document management, IRM, and the like, but does that make MOSS an enterprise-ready ‘true’ ECM system?

In the weeks that followed, I discussed the matter with ECM specialists and fellow SharePoint consultants. The world of large-scale ECM was new to me, but it turned out to be quite an interesting one. Now, I am no ECM wizard, but I wish to share my experiences nevertheless.

In this series of posts I will dig deeper into the matter and provide insight into how such an enterprise works. Furthermore, I will discuss the capabilities and the role of MOSS in such an organization: what you should and should not do.

Defining the scope

So the first question one might ask is: what defines an ECM system? In what context are we going to examine SharePoint’s ECM abilities? To answer this question I will refer to Wikipedia’s definition (which follows AIIM):

“Enterprise Content Management is the technologies used to Capture, Manage, Store, Preserve, and Deliver content and documents related to organizational processes. ECM tools and strategies allow the management of an organization’s unstructured information, wherever that information exists.”

To understand the span of this abstract explanation, one needs at least limited knowledge of what it takes to be an ECM enterprise. So let’s take a look at a typical ECM enterprise.

Anatomy of an ECM enterprise in a nutshell

Imagine that this nifty diagram represents our typical ECM enterprise, called ‘claims-R-us’. Every day, large numbers of claim forms are sent through regular paper mail and need to be extracted, managed, audited, augmented, stored, and eventually destroyed in our enterprise. As you can see, our role-model enterprise consists of a few functional areas:

[Diagram: the functional areas of ‘claims-R-us’]

Information acquisition (process services)

At some point, information needs to be imported into the system through channels. Each channel represents an information flow. These can be of all sorts: paper (regular forms), recorded sound (phone calls), or information from third parties.

Some channels require conversion of the original media before it can be stored digitally. Think of paper documents that need to be scanned to a certain format. Legal regulations often require that information entering the system that can be identified as a record is stored as-is, that is, in its original format.

It’s not unusual for a large-scale enterprise to scan up to half a million forms a day using multiple scanners. The scanning process is the responsibility of a DIS (Digital Imaging System). The information from these channels is often augmented with metadata that enables the process to determine its further journey through the enterprise.

Information management (content services)

The information that enters the system needs to be classified. Classification usually happens from a business point of view: when the scanned document is a ‘car damage form’, it is classified as such. Often multiple classifications are possible per document. A record is created to keep track of the document instance and its classification data. Usually a separation of document flows occurs at this point: the system identifies which classes of information can be filed directly in the archive (like receipts) and which need further processing (for instance analysis, augmentation, and mutation of claim forms by humans).
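
To make that split concrete, here is a minimal sketch of such a rule-driven routing step. The class, enum, and classification names are invented for illustration and do not come from any ECM product:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: route a classified document either straight to the
// archive or into a human processing queue, based on its classification.
enum Route { Archive, HumanProcessing }

class DocumentRouter
{
    // Classifications that may be filed directly, without human intervention.
    private static readonly HashSet<string> DirectFiling =
        new HashSet<string> { "receipt", "confirmation-letter" };

    public Route Classify(string classification)
    {
        return DirectFiling.Contains(classification)
            ? Route.Archive
            : Route.HumanProcessing; // e.g. a car damage claim form
    }
}
```

A real content service would hold many classifications per document and record the outcome; the point is only that the split between direct filing and further processing is an explicit, rule-driven step.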

Records management needs to be applied to ensure that regulations such as DoD 5015.2-STD are met. Transformations may be required to make sure a document is unmodifiable. Where necessary, IRM is applied. Information workers take part in processes to create new documents or edit existing ones.

This information management functional area is the central information system of your enterprise. All the common document management functionality, such as editing, checking documents in and out, and information-worker flows, takes place here.

Storage management (repository services)

Information that is significant for your business needs to be stored somewhere safe. Excuse me as I go into a little more depth here, since storage is such an important part of the process.

For this high-volume storage of documents, enterprises often implement hierarchical storage management (HSM), “a data storage technique which automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives” (Wikipedia). The movement between storage media is mostly based on activity: as long as there is plenty of file activity, the file remains on the high-speed storage device; with less activity it gets demoted to the slower media.
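
A toy illustration of that activity-based policy follows; the class and tier names are mine, and a real HSM system tracks far more than a last-access timestamp:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical HSM sketch: files fall back to cheap storage when idle
// and are promoted to the fast tier again on access.
enum Tier { DiskArray, OpticalOrTape }

class HsmCatalog
{
    private readonly Dictionary<string, Tier> tier = new Dictionary<string, Tier>();
    private readonly Dictionary<string, DateTime> lastAccess = new Dictionary<string, DateTime>();

    public void Touch(string name, DateTime now)
    {
        tier[name] = Tier.DiskArray;  // any access promotes to the fast tier
        lastAccess[name] = now;
    }

    public void DemoteInactive(TimeSpan maxIdle, DateTime now)
    {
        foreach (var name in new List<string>(tier.Keys))
            if (tier[name] == Tier.DiskArray && now - lastAccess[name] > maxIdle)
                tier[name] = Tier.OpticalOrTape;  // fell idle: move to cheap media
    }

    public Tier TierOf(string name) { return tier[name]; }
}
```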

Often these systems implement a technique called “Content Addressed Storage” (CAS), which can be compared to the wardrobe at your local club: you hand over your coat, get a number in return, and later you supply the number and get your coat back. A CAS system does basically the same thing with files. It decides for itself where to store the file; how it works internally is not for you to worry about, and you can retrieve the file at any time. EMC Centera is an example of such a CAS system.
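
The wardrobe analogy translates almost directly into code. In this hypothetical sketch the ‘number’ you get back is simply a hash of the content, which is roughly how content addressing works; everything else about a real CAS system (replication, retention, scale) is omitted:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Minimal CAS sketch: the claim ticket is a hash of the content itself,
// so identical content always yields the same address.
class ContentAddressedStore
{
    private readonly Dictionary<string, byte[]> blobs = new Dictionary<string, byte[]>();

    public string Put(byte[] content)
    {
        using (var sha = SHA256.Create())
        {
            string address = Convert.ToBase64String(sha.ComputeHash(content));
            blobs[address] = content; // where it lives is the store's concern
            return address;           // the caller only keeps the ticket
        }
    }

    public byte[] Get(string address) { return blobs[address]; }
}
```

A nice side effect of addressing by content hash: storing the same document twice yields the same ticket, so duplicates are kept only once.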

Coming up in the next post

Now that we have described a typical ECM enterprise, we can discuss the matter further in the next post, where we will describe the possible roles of SharePoint within such an organisation.

Recommended reading

So, is MOSS an ECM Tool or Not?
http://www.c3associates.com/2007/04/25/so-is-moss-an-ecm-tool-or-not/

Update (17th August): Edited some links and reworked the introduction.

BDC: Writing programmatically to the BDC using GenericInvoker

For the last couple of days I have been going on about my upcoming BDC Toolkit project, but I hadn’t fully tested the writing theory yet. There is quite a lack of documentation, and I was worried for a while that I would have to go back on my idea, but luckily I don’t have to.

So you can actually write to your BDC sources using the object model and the GenericInvoker method instance. It ain’t that hard, but it might take some getting used to.

For this sample we need a table to write to (imagine this rather poor table is your enormous enterprise data model). For simplicity’s sake I am writing directly to the table; if you have read my previous post, you will know that you should really use a stored procedure as an indirection layer for your data model.

The following script has been used for creating the table.

USE [MYDATABASE]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Test](
    [MyValue] [int] NULL
) ON [PRIMARY]

First off, you need your regular XML application definition file. I clipped it for readability purposes.

<Method Name="AddRecordMethod">
    <Properties>
        <Property Name="RdbCommandText" Type="System.String">
            INSERT INTO Test (MyValue) VALUES (@MyValue)
        </Property>
        <Property Name="RdbCommandType"
                  Type="System.Data.CommandType">Text</Property>
    </Properties>
    <Parameters>
        <Parameter Direction="In" Name="@MyValue">
            <TypeDescriptor TypeName="System.Int32"
                 IdentifierName="MyValue"  Name="MyValue"/>
        </Parameter>
        <Parameter Direction="Return" Name="MyReturnValue">
            <TypeDescriptor TypeName="System.Int32" Name="MyReturnValue"/>
        </Parameter>
    </Parameters>
    <MethodInstances>
        <MethodInstance Name="AddRecord" Type="GenericInvoker"
                ReturnParameterName="MyReturnValue" />
    </MethodInstances>
</Method>

As you can see in the XML definition above, I’m using a GenericInvoker as my method instance. The return value serves no purpose here, but it is required by the MethodInstance tag.

The code below is used for executing this method on the Business Data Catalog.

SqlSessionProvider.Instance().SetSharedResourceProviderToUse(yourSSPName);

NamedLobSystemInstanceDictionary sysInstances =
    ApplicationRegistry.GetLobSystemInstances();

LobSystemInstance writeInstance = sysInstances["yourInstanceName"];

NamedEntityDictionary entities = writeInstance.GetEntities();
Entity testEntity = entities["yourEntityName"];

NamedMethodInstanceDictionary instances = testEntity.GetMethodInstances();

// Gets the method definition for our AddRecord invoker
Method myMethod = instances["AddRecord"].GetMethod();

// Creates a set of default parameters (which we are going to overrule)
object[] param =
    myMethod.CreateDefaultParameterInstances(instances["AddRecord"]);

// This is the MyValue parameter we defined earlier
param[0] = 31;

// Finally, execute the method
testEntity.Execute(instances["AddRecord"], writeInstance, ref param);

So, now you can use the BDC as your unified access point for both reading from and writing to your external system.

BDC: Introducing the BDCToolkit

I’ve created a BDC Toolkit stub project on CodePlex that should make life somewhat easier for developers who are struggling with the BDC runtime model. It can generate a typed layer on top of your BDC.

Here is an excerpt from the project page:

So, how does it work?

The console application generates strongly typed code from your ‘Business Data Catalog Application Definition’ that describes the external system and generates one of the following artifacts:

Typed Data-Access-Layer
A strongly typed set of C# classes that provide an indirection layer for programming against the BDC. These classes map directly to the methods defined in the BDC application definition file. This way developers don’t need to worry about the BDC object model and can focus on the overall functionality of the integration process. The classes are two-way, so both reading and writing through the BDC are covered.

Web service Layer
A strongly typed web service that enables users to access their BDC application definition methods using a web service. It uses the same DAL as above but provides a thin web service layer so that users can easily integrate their BDC application definition in products like InfoPath.

So what are you waiting for? Check it out: BDC Toolkit. I will be committing files soon!

BDC: General tips on integrating your external data in MOSS

Last week, one of my clients asked me to explain a bit about the Business Data Catalog (BDC) and how they can fully exploit its power. In this specific case they needed to bind a supplier instance to a contract content type. Since I’ve got some field experience using the BDC, I thought I’d share a few of my personal best practices for doing proper integration. If you want technical information on how to write an application definition, see the links at the end of this post.

First off, create indirection layers for your data

When you want to integrate your data, make a data contract with the supplier. In other words, if your data is maintained by someone else or by another department within your company, let them create extra views (when using a database) on the external data specifically for your BDC application definition. This will be your indirection layer, so that they can easily change their data model as long as they keep the views intact. The same goes for web services: create an indirection layer before connecting to a third-party web service if you want to be on the safe side. By using such a layer your maintainability increases, you still give the third party the freedom to change their data model as long as they respect your data contract, and you still have a direction to point your finger in when it breaks. How convenient.

Ensure that your BDC application doesn’t expose any information that shouldn’t be seen

It is easy to accidentally expose confidential information from your external data source to your SharePoint users while you are happily mapping tables. So make sure the users are allowed to see the information from the back end. If there is a security policy you need to enforce at entity, instance, or record level, you can implement the CheckAccess method. This trims your security by requesting the rights for a set of IDs; you need to supply the back-end business logic for checking the authorization. This method is also called when crawling the BDC.

You can find more on this subject here: http://msdn2.microsoft.com/en-us/library/bb447548.aspx

Make sure the external source is ready for the extra load

Well, this seems quite obvious, but I’ve seen cases where an administrator was quite busy figuring out why his SQL Server kept getting a high load. When you introduce an extra view on your data from SharePoint, your load is going to increase, so make sure your environment is ready for it. This is especially important if you use the BDC’s AccessChecker implementation, since this can put quite a strain on your server, depending on how well you’ve managed to write your back-end access-check logic and whether your CheckAccess method supports passing multiple IDs to check. If it does not, the BDC will spawn a thread to acquire the rights information for every single ID, possibly resulting in a high load.
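
To see why batching matters, compare one back-end round trip per set of IDs with one round trip per ID. The interface below is a hypothetical back-end contract of my own, not part of the BDC object model:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical back-end authorization contract: the whole batch of IDs is
// checked in a single call, instead of one call (or thread) per ID.
interface IClaimAuthorizer
{
    IDictionary<int, bool> CheckAccessBatch(int[] claimIds, string user);
}

class DemoAuthorizer : IClaimAuthorizer
{
    public IDictionary<int, bool> CheckAccessBatch(int[] claimIds, string user)
    {
        // Toy rule: 'auditor' may see everything, others only even-numbered claims.
        return claimIds.ToDictionary(
            id => id,
            id => user == "auditor" || id % 2 == 0);
    }
}
```

However you implement it, the expensive part (the authorization query against the back end) should run once per batch, not once per record.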

Use the BDC as a single point of read/write access to your external data source

If, for some well-thought-out reason, you need to write to your external data source, you can use the BDC for that as well, effectively making the BDC your single point of entrance to your external data source for both reading and writing. You can create a GenericInvoker method in your application definition, which can be called from the BDC object model. A best practice is to create a Data Access Layer (DAL) class that abstracts and encapsulates the logic for calling the methods on the BDC object model. For instance, you could make a class that bears the name TaskEventLogger, with methods like WriteTaskEvent(string eventName) that map to methods of the same name in the application definition.
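
As a sketch of such a DAL class: TaskEventLogger and the IBdcInvoker abstraction are hypothetical names of mine; in a real implementation the invoker would wrap the Entity.Execute plumbing shown in my GenericInvoker post.

```csharp
using System;

// Hypothetical abstraction over the BDC GenericInvoker plumbing
// (Entity.Execute with a parameter array).
interface IBdcInvoker
{
    void Invoke(string methodInstanceName, object[] parameters);
}

// The DAL class callers use; they never touch the BDC object model directly.
class TaskEventLogger
{
    private readonly IBdcInvoker invoker;

    public TaskEventLogger(IBdcInvoker invoker)
    {
        this.invoker = invoker;
    }

    public void WriteTaskEvent(string eventName)
    {
        // Maps one-to-one to the WriteTaskEvent method instance
        // defined in the application definition file.
        invoker.Invoke("WriteTaskEvent", new object[] { eventName });
    }
}
```

Callers now simply write taskLogger.WriteTaskEvent("ClaimApproved") and never see the BDC object model.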

Since the application definition supports versioning, you could even dynamically ask the BDC object model whether the version of the loaded application definition matches the DAL you wrote specifically for it. How about that?

More information on the BDC (Microsoft):
http://msdn2.microsoft.com/en-us/library/ms563661.aspx

Del.icio.us:
http://del.icio.us/popular/BDC