Storing files externally in MOSS to bridge ECM requirements

Those of you who have followed me from the beginning know that I’ve written quite a lot about ECM CAS (Content Addressed Storage) and HSM (Hierarchical Storage Management) systems, and about the lack of support in MOSS for storing files externally (on HSM systems, for instance, or on file systems). This was one of its major drawbacks compared with the top ECM suppliers on the market.

A long time ago I blogged about the External Storage API, which back then was called the BLOB API. You can read more about CAS and HSM in my ECM series here.

Quite recently Pav Cherney wrote an article on TechNet on how to implement your own storage provider using the ISPExternalBinaryProvider interface. This excellent article explains in detail what you need to implement your own external storage provider.

Go have a look at his article here.
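To illustrate the idea behind such a provider, here is a minimal sketch in Python. It is not the real interface: ISPExternalBinaryProvider is a COM interface, and its actual method signatures are documented on MSDN and in Pav’s article. The class and method names below (FileSystemBlobProvider, store_binary, retrieve_binary) are hypothetical. The sketch only shows the contract: the file’s bytes are handed over, the provider stores them outside SQL Server and returns an opaque ID, and later the bytes are fetched back by that ID.

```python
import tempfile
import uuid
from pathlib import Path


class FileSystemBlobProvider:
    """Hypothetical external storage provider. It mimics the idea behind
    ISPExternalBinaryProvider (store a binary outside SQL Server, hand back
    an ID, fetch the binary by that ID later) but NOT its actual signature."""

    def __init__(self, root: Path) -> None:
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def store_binary(self, site_id: str, data: bytes) -> str:
        # The ID is opaque to the caller; the content database would keep
        # this token instead of the file's bytes.
        blob_id = f"{site_id}/{uuid.uuid4().hex}"
        target = self.root / blob_id
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        return blob_id

    def retrieve_binary(self, blob_id: str) -> bytes:
        # Resolve the opaque ID back to the stored bytes.
        return (self.root / blob_id).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        provider = FileSystemBlobProvider(Path(tmp))
        blob_id = provider.store_binary("site-a", b"claim form")
        print(provider.retrieve_binary(blob_id))  # b'claim form'
```

Keep in mind that a real provider also has to think about things this sketch ignores, such as cleaning up binaries that are no longer referenced and what happens when the external device is unavailable.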

More on the ISPExternalBinaryProvider interface:

AIIM ‘Sharepoint Meets ECM’ sessions reviewed

Greg Clark of C3 Associates has been attending the ‘AIIM SharePoint Meets ECM’ sessions in Chicago and has written a couple of excellent blog posts about SharePoint as an ECM platform. The posts cover different opinions on the richness of MOSS as an ECM system. I recommend reading them if you are into ECM.

Liveblogging AIIM SharePoint Meets ECM Session (1 of 5)

SharePoint Meets ECM: Doculabs Summary (2 of 5)

SharePoint Meets ECM: Document Imaging Breakout Sessions (3 of 5)

SharePoint Meets ECM: SharePoint as an ECM Platform (4 of 5)

SharePoint Meets ECM: Doculabs on the Positioning of SharePoint and Traditional ECM Tools (5 of 5)

External storage API for WSS, opening the door for CAS and HSM storage?

In my ECM article I mentioned that WSS didn’t support external storage for files and documents. However, Microsoft has recently released new functionality in an API through a hotfix (which is probably why I didn’t notice it before; who puts new functionality in a hotfix?). I haven’t had time yet to investigate this API further, but judging from the description it looks promising when there is a requirement for CAS and HSM storage of SharePoint documents.

“An external storage API is available for Microsoft Windows Sharepoint Services 3.0. The external storage API lets you store documents or files on an external storage device other than Microsoft SQL Server. This API also lets you upgrade existing Windows Sharepoint Services 3.0 sites to point to an external storage device.”

It could be a very powerful feature for the Records Center, in combination with the DoD 5015.2 add-on pack that should be released in fall 2007. Too bad there isn’t any documentation available yet.

You can find information on the hotfix here:
http://support.microsoft.com/kb/938499/en-us

The role of MOSS in a large-scale ECM environment (Part 2)

In this series

Part one
Part two

Introduction

In the previous post I gave an introduction to what it takes to be an ECM enterprise. At this point any SharePoint professional can already spot which functional requirements are lacking in the current MOSS implementation. However, be aware that only a relatively small portion of companies will actually need all the requirements of such an ECM enterprise, so review whether the standard MOSS ECM features provide enough functionality for your company. In this post I will dig deeper into the matter and discuss where to draw the line in the gray area between MOSS only and extending MOSS with third party ECM tooling. In many situations this is unclear to both parties. The ECM party knows what it means to be a ‘true’ ECM enterprise but doesn’t seem to have enough SharePoint knowledge or experience to make a well-informed decision. The SharePoint party, on the other hand, is new to ECM and has little to no ECM field experience.

It’s not really a question of whether full ECM functionality is possible in SharePoint, since the foundation of MOSS is excellent and every ECM requirement can be custom-built on top of it. The question is rather whether it is available out of the box and ready for the real world. For instance, if you want to store millions of files you can easily develop logic that distributes the files over different document libraries and folders. Single instance archiving and CAS can be achieved through custom policy development. So you can basically bend SharePoint far enough to be fully ECM compliant. However, custom development is costly, and it is your responsibility to maintain the code for as long as your enterprise exists. You will also want to incorporate more tooling and content services to manage your content as a whole. And keep in mind that the code has to keep working with every new version of MOSS.
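A minimal sketch of the kind of distribution logic I mean, assuming every document has a unique ID and that a fixed set of document libraries and folders already exists. The naming scheme (ArchiveNN/FolderNNN) and the default counts are hypothetical; the point is that a stable hash spreads the load evenly and always resolves the same ID to the same location.

```python
import hashlib
from collections import Counter


def target_location(doc_id: str, libraries: int = 10, folders_per_library: int = 100) -> str:
    """Map a document to a library/folder (hypothetical naming scheme).

    A stable hash spreads documents evenly, so no single library or folder
    grows too large, and the same ID always resolves to the same place."""
    digest = hashlib.sha1(doc_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")
    library = bucket % libraries
    folder = (bucket // libraries) % folders_per_library
    return f"Archive{library:02d}/Folder{folder:03d}/{doc_id}"


if __name__ == "__main__":
    # Count how many of 100,000 documents land in each library.
    per_library = Counter(
        target_location(f"claim-{i}.tif").split("/")[0] for i in range(100_000)
    )
    print(sorted(per_library.items()))
```

A hash gives an even spread but meaningless locations. In practice you might prefer a business key such as a date or customer number, which makes browsing easier but can create hot spots.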

Is MOSS out-of-the-box ECM enterprise ready?

As an independent SharePoint consultant I feel obliged to advise clients on whether to use SharePoint standalone or to consider augmenting it with third party ECM tooling. So is the current version of MOSS ECM enterprise ready out of the box? Well, it really depends on your company’s requirements. When it comes to features, MOSS has a wide range of ECM capabilities that would cover most companies’ needs. However, it’s not as specialized or as ready out of the box as the established ECM players of this world, like OpenText, Interwoven or EMC2. MOSS has very tight Office integration and a strong user experience for creating documents, and surpasses the other players in this area. But it lacks out-of-the-box ECM features such as document imaging, single instance archiving and hierarchical storage management that other vendors provide.

MOSS is very good for the ‘active’ part of document management, such as ad-hoc document creation, collaboration, basic approval workflows and the like. But for more complex services one might rely on other vendors’ products or custom development. As for records management, MOSS is certainly going in the right direction, but for this version of MOSS there is no information on how much data the Records Center can handle. The current out-of-the-box functionality is also rather limited. However, the DoD 5015.2-STD add-on pack, which will be released later this year (fall 2007), will add functionality and will probably cover enough of the records management requirements for most companies. There is, however, no detailed information available yet on what this records management pack will include, or whether it supports CAS or HSM.

Common ECM scenarios in which one might consider the use of third party add-ons or custom development

I’ve created a basic ECM requirement list that can help you decide whether to extend MOSS with third party tooling. Keep in mind that this is just a very limited set of requirements. At the moment I am building an extensive requirement list that can act as a guide when making this choice.

ECM requirement
Need for single instance archiving
Need for hierarchical storage management or content addressed storage
Need for complex workflows, that need to be globally monitored
Need for high volume batch import & export of large quantities of files
Need for multiple classifications of files
Need for meta-data inheritance
Need for digital imaging
Need for extensive reporting over content
Need for rich content services like case management
Need for extremely large storage (dozens of TB)

The role of MOSS in an ECM enterprise

Depending on the amount of content and the level of ECM complexity your company requires, you could go for a MOSS-only solution or augment MOSS. If the content management complexity is fairly limited, no rich content services are required and the amount of content doesn’t reach the volumes that call for HSM, you could configure MOSS as your central records system. I do recommend drawing a very hard and explicit line between what is active and what is passive and can be declared a record. Without such an explicit architecture you risk fragmenting your information.

Personally, I think that augmentation, or co-existence of both ECM systems, would create the ideal solution for a large, full-fledged ECM enterprise. This way you can use MOSS for the active, in-flight part of document management, such as ad-hoc creation and management of content and basic workflow. Use SharePoint sites to show context-related files from external repositories. Use the CAS, HSM, DIS and case management functionality provided by third party tooling. When information is solid enough to be of critical value and can be declared a record, it can be transparently transferred from MOSS to the more advanced ECM system. Also, from my point of view, SharePoint should not be bothered with the high-speed import of large quantities of files. After classification, only the files that need to be processed by humans should be promoted to the SharePoint environment to be augmented. The others can go to the advanced ECM system directly.
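The ‘transparent transfer’ of a declared record can be sketched as follows, assuming the advanced ECM system assigns its own record ID and that a link left behind in the MOSS library is enough for users to find the document. All class and function names here are hypothetical stand-ins, not a real SharePoint or vendor API.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ActiveDocument:
    """A document in the active (MOSS) workspace. After record declaration
    only a link to the archive remains (hypothetical model)."""
    url: str
    content: Optional[bytes]
    archive_id: Optional[str] = None


class RecordArchive:
    """Stand-in for the advanced ECM system of record (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[str, bytes] = {}
        self._ids = itertools.count(1)

    def ingest(self, content: bytes) -> str:
        record_id = f"REC-{next(self._ids):06d}"
        self._records[record_id] = content
        return record_id

    def fetch(self, record_id: str) -> bytes:
        return self._records[record_id]


def declare_record(doc: ActiveDocument, archive: RecordArchive) -> str:
    """Move the content to the archive and leave a link behind, so users
    still find the document at its familiar URL."""
    if doc.archive_id is not None:
        return doc.archive_id  # already declared: declaring twice is a no-op
    doc.archive_id = archive.ingest(doc.content)
    doc.content = None
    return doc.archive_id


def open_document(doc: ActiveDocument, archive: RecordArchive) -> bytes:
    # Transparent read: follow the link if the content has moved.
    return doc.content if doc.archive_id is None else archive.fetch(doc.archive_id)
```

The design choice here is that the user-facing location never changes; only what sits behind it does.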

However, the tooling currently provided by the leading ECM vendors creates an overlap in functionality that you might not want to pay for. The Doculabs whitepaper describes this issue in more detail. Personally, I think it’s only a matter of time before ECM vendors start building software solely for bridging the gap, without the functional overlap. Microsoft certainly provides enough extension points and documentation in its MOSS and WSS foundation to enable this.

What others say

Gartner has written a couple of good documents on MOSS regarding its ECM capabilities and its position in the ECM world. Unfortunately, since these are only available to Gartner clients, I can’t go into detail. Make sure to read them if you are one.

Doculabs provides an excellent (free) whitepaper regarding the coexistence of MOSS in an ECM environment.

AIIM has a free seminar dedicated solely to how MOSS 2007 and core ECM systems can efficiently coexist in the same environment. If you live in the US, I highly recommend attending. You can find the link below.

More resources are listed in the ‘Recommended reading’ section.

What’s next

My upcoming project will be a WCM one, so I probably won’t be blogging much about this topic over the next two months. However, later this year I am scheduled for another ECM project, which involves integrating MOSS with an external ECM system.

Recommended reading

Gartner:
 – Microsoft’s 2007 Sharepoint Products and Technologies in Action (June 2007) – (Excellent)
 – Q&A: Microsoft’s Content Management Software and Strategy (September 2006) 

Doculabs:
 – Analyst Report: Doculabs: The Coexistence of Sharepoint and Advanced ECM platforms – What You Need to Know (See the solution information bar)
 – Microsoft Sharepoint 2007 and Your Existing ECM Solution: Which Should Be the System of Record?

AIIM:
 – Free seminar by AIIM  ‘Sharepoint meets ECM’ 

Misc:
 – Office 2007, Records Management, and ECM

Blogs:
 – Sharepoint meets ECM  (AIIM)
 – Better ECM
 – So is MOSS an ECM tool or not? (C3 Associates)
 – EMC Industry Watch

The role of MOSS in a large-scale ECM environment (Part 1)

In this series

Part one
Part two

Introduction

One of my clients asked me whether MOSS can be considered a ‘true’ ECM application. That certainly raised an eyebrow. Sure, it supports ECM functionality such as records management, document management, IRM and the like, but does that make MOSS an enterprise-ready ‘true’ ECM system?

Over the following couple of weeks I discussed the matter with ECM consultants and fellow SharePoint consultants. The whole large-scale ECM world was new to me, but it is quite an interesting one. I am not an ECM wizard, but I want to share my experiences nevertheless.

In this series of posts I will dig deeper into the matter and provide insight into how such an enterprise works. Furthermore, I will discuss the capabilities and the role of MOSS in such an organization: what you should and should not do.

Defining the scope

So the first question one might ask is: what defines an ECM system? In what context are we going to examine SharePoint’s ECM abilities? To answer this question I will refer to Wikipedia’s definition (AIIM):

“Enterprise Content Management is the technologies used to Capture, Manage, Store, Preserve, and Deliver content and documents related to organizational processes. ECM tools and strategies allow the management of an organization’s unstructured information, wherever that information exists.”

To understand the scope of this abstract definition, one needs at least a basic knowledge of what it takes to be an ECM enterprise. So let’s take a look at a typical ECM enterprise.

Anatomy of an ECM enterprise in a nutshell

Imagine that this nifty diagram shows our typical ECM enterprise, called ‘claims-R-us’. Every day, large amounts of claim forms are sent through regular paper mail and need to be extracted, managed, audited, augmented, stored and destroyed in our enterprise. As you can see, our role model enterprise consists of a few functional areas:


Information acquisition (process services)

At some point information needs to be imported into the system through channels. Each channel represents an information flow. Channels can be of all sorts, such as paper (regular forms), recorded sound (phone) or information from third parties.

Some channels require conversion of the original media before it can be stored digitally. Think of paper documents that need to be scanned to a certain format. Regulations often require that information that enters the system and can be identified as a record is stored as-is, that is, in its original format.

It’s not unusual for a large-scale enterprise to scan up to half a million forms a day using multiple scanners. The scanning process is the responsibility of a DIS (Digital Imaging System). The information from these channels is often augmented with data that enables the process to determine its further journey through the enterprise.
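To get a feeling for what half a million forms a day means, here is a back-of-the-envelope calculation. Only the 500,000 forms per day comes from the text above; the 8-hour scanning shift and the throughput of 60 forms per minute per scanner are assumptions for illustration, and real numbers depend heavily on the hardware and the forms.

```python
import math


def scanners_needed(forms_per_day: int, shift_hours: float,
                    forms_per_minute_per_scanner: float) -> int:
    """Minimum number of scanners to capture a day's volume within one shift.
    Ignores jams, rescans and batch preparation, which add real overhead."""
    required_rate = forms_per_day / (shift_hours * 60)  # forms per minute
    return math.ceil(required_rate / forms_per_minute_per_scanner)


if __name__ == "__main__":
    # 500,000 forms/day is from the text; the 8-hour shift and 60 forms per
    # minute per scanner are illustrative assumptions.
    print(scanners_needed(500_000, 8, 60))  # 18
```

Spread over 24 hours instead of one shift, the same volume still amounts to almost 6 forms every second, around the clock.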

Information management (content services)

The information that enters the system needs to be classified. Classification often happens from a business point of view: when the scanned document is a ‘car damage form’, it will be classified as such. Often multiple classifications are possible per document. A record is created to keep track of the document instance and its classification data. Usually at this point the document flow splits. The system identifies which classifications of information can be filed directly in the archive (like receipts) and which need further processing (for instance analysis, additions and changes to claim forms by humans).
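The separation of document flow could look roughly like this. A minimal sketch, assuming each captured document gets an intake record with one or more classifications, and that a hypothetical rule set (ARCHIVE_DIRECTLY) lists which classifications may be filed directly in the archive. Anything else, including unknown classifications, goes to further processing, so that nothing ends up in the archive by mistake.

```python
import uuid
from dataclasses import dataclass
from typing import FrozenSet, List

# Hypothetical business rule: classifications that may go straight to the archive.
ARCHIVE_DIRECTLY = frozenset({"receipt"})


@dataclass
class IntakeRecord:
    """Tracks one captured document instance and its classification data."""
    document_id: str
    classifications: FrozenSet[str]


def create_record(classifications: List[str]) -> IntakeRecord:
    return IntakeRecord(document_id=str(uuid.uuid4()),
                        classifications=frozenset(classifications))


def route(record: IntakeRecord) -> str:
    """Split the document flow. A document goes straight to the archive only
    if it has classifications and every one of them allows it; otherwise it
    goes to further processing by humans."""
    if record.classifications and record.classifications <= ARCHIVE_DIRECTLY:
        return "archive"
    return "processing"


if __name__ == "__main__":
    print(route(create_record(["receipt"])))                     # archive
    print(route(create_record(["receipt", "car damage form"])))  # processing
```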

Records management needs to be applied to ensure that requirements such as DoD 5015.2-STD are met. Transformations may be required to make sure that the document cannot be modified. When necessary, IRM is applied. Information workers work in processes to create new documents or edit existing ones.

This information management functional area is the central information system of your enterprise. All the common document management functionality, such as editing, checking in and out, and information worker workflows, takes place here.

Storage management (repository services)

Information that is significant for your business needs to be stored somewhere safe. Excuse me for going a little in depth here, but storage is such an important part of the process.

For this high-volume storage of documents, enterprises often implement hierarchical storage management (HSM), a data storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as hard disk drive arrays, are more expensive (per byte stored) than slower devices, such as optical discs and magnetic tape drives (Wiki). Moving data between storage media is mostly based on activity. For instance, when there is a lot of activity on a file, it remains on the high-speed storage device; with less activity it gets demoted to slower media.
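A simplified sketch of an activity-based HSM policy, assuming three tiers (disk, optical, tape) and per-tier idle thresholds in days. The tier names and thresholds are illustrative assumptions; real HSM products use their own policies and usually recall a file to fast storage the moment it is accessed.

```python
from dataclasses import dataclass
from typing import Dict, List

# Tiers ordered from fastest/most expensive to slowest/cheapest.
TIERS = ["disk", "optical", "tape"]


@dataclass
class StoredFile:
    name: str
    tier: str = "disk"
    days_since_access: int = 0


def rebalance(files: List[StoredFile], demote_after_days: Dict[str, int]) -> None:
    """One policy run: a file accessed today moves back to the fastest tier;
    an idle file moves down one tier once it exceeds that tier's threshold."""
    for f in files:
        if f.days_since_access == 0:
            f.tier = TIERS[0]  # active again: back to fast storage
            continue
        index = TIERS.index(f.tier)
        limit = demote_after_days.get(f.tier)  # no threshold: never demoted
        if limit is not None and f.days_since_access >= limit and index + 1 < len(TIERS):
            f.tier = TIERS[index + 1]  # idle long enough: cheaper media
```

Demoting one tier per run keeps the policy simple; a file that stays idle simply keeps sliding down on later runs.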

Often these systems implement a technique called “Content Addressed Storage” (CAS), which can be compared to the coat check at your local club. You hand in your coat and get a number in return; later on you hand in the number and get your coat back. A CAS system does basically the same thing with files. It decides by itself where to store the file, and how it works internally is not your concern. You can retrieve the file at any time. EMC Centera is an example of such a CAS system.
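The coat check analogy maps nicely onto code. In the sketch below the ‘number’ you get back is the SHA-256 hash of the file’s content, which is one common way to build CAS; it is an illustration under that assumption, not a description of how EMC Centera works internally. A nice side effect is that identical files get the same address and are stored only once, which is exactly what single instance archiving is about.

```python
import hashlib
from typing import Dict


class ContentAddressedStore:
    """Minimal in-memory CAS: the 'claim ticket' is the SHA-256 of the
    content, so the caller never chooses or sees a storage location."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content yields the same address, so it is stored once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # Re-hash on read to detect corruption or tampering.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._objects)
```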

Coming up in the next part

Now that we have described a typical enterprise, we can discuss the matter further in the next post, where we will describe the possible roles of SharePoint within this organisation.

Recommended reading

So, is MOSS an ECM Tool or Not?
http://www.c3associates.com/2007/04/25/so-is-moss-an-ecm-tool-or-not/

Update (17 August): Edited some links and added an introduction.