Metadata and Version Control of Assets

 

Approving and publishing content, like press releases or financial statements, usually follows some sort of formal procedure. The main focus when managing this process is on assuring quality and, increasingly, on documenting the release process to mitigate the risks involved.

 

It's obvious why the final, published version of an asset gets the most attention. However, one should keep in mind that most published content is later amended, updated, translated, personalized or otherwise altered. It is therefore usually not one asset but an array of related assets that needs to be managed properly. The complexity of these many dependent document lifecycle processes, and especially of planning the creation of these assets, requires a wealth of supporting information: the target audiences for the content; the worklist to create the asset; who approves the content and carries which part of the responsibility; the usage policies and restrictions; and the feedback on the content, which determines how to improve future versions of an asset. All this data is known as metadata: data that describes the actual asset.

 

Metadata in SharePoint - Hassle or Potential?

 

As with workflow, metadata involves almost everyone. In practice, though, project managers adopt both far more readily than the individual contributors to the process. Most formal workflow solutions installed today simply don't present metadata well enough for it to become a natural step of the process. Nobody likes to be fenced in by modal workflows or by complex and inconsistent user interfaces, so creative users usually avoid metadata work if given the option.

 

Not surprisingly, most metadata travels informally in the body of email messages, just as the asset itself travels as an attachment to the same email. This is clearly a weak approach, since audit trails and process efficiency analysis are almost impossible in such scenarios.

 

Adobe was one of the first of the dominant vendors of software for creative users to give metadata a central and visible place. They introduced XMP, an XML-based generic metadata standard. They also added Bridge, to visualize and manage the metadata centrally, and Version Cue, for version control over assets, to their Creative Suite product. While the adoption of metadata increased somewhat afterwards, the use of metadata still largely suffers from inconsistencies, and metadata entered inconsistently is simply no better than no metadata at all. The solution to this problem can only come from a tie-in with a workflow technology that streamlines and supports a specific set of business processes and harvests the required metadata almost as a "side effect".

 

Office 2007 implements a good approach to metadata that can tie into workflow (see the previous article, "Workflow in SharePoint"). But how does the underlying SharePoint system conceptually support metadata, and can administrators easily include metadata in the overall solution that needs to be deployed?

 

Previous versions of SharePoint were extremely limited with regard to metadata. At first sight, SharePoint 2007 still falls short on a number of important points and, "out of the box", it is certainly no "killer" of the leading Digital Asset Management (DAM) systems, which focus on metadata. The good news is that SharePoint now finally includes a number of concepts that enable developers to turn it into a very decent Digital Asset Management system, one that will ultimately cannibalize the business of the established DAM vendors.

 

Structuring and Visualizing Metadata
SharePoint 2007 builds on a central new concept, so-called "content types". It's now possible to manage multiple content types within one document library.

 

Each content type can have its own metadata, which the content administrator can easily set up. A content type also "inherits" the metadata from its "parent" content type (and potentially from additional ancestors). Metadata fields on content types are called "Columns" in the SharePoint user interface.
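
The inheritance of Columns along the content type hierarchy can be pictured with a small model. This is only a sketch of the concept, not the SharePoint object model: the class ContentType, its columns() method and the "Press Release" type with its column names are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ContentType:
    """Toy model of a content type; not the SharePoint object model."""
    name: str
    parent: Optional["ContentType"] = None
    own_columns: list[str] = field(default_factory=list)

    def columns(self) -> list[str]:
        """Return inherited columns first, then this type's own columns."""
        inherited = self.parent.columns() if self.parent else []
        # Skip duplicates so a column redefined by a child is listed once.
        return inherited + [c for c in self.own_columns if c not in inherited]


# Hypothetical hierarchy and column names, for illustration only.
item = ContentType("Item", own_columns=["Title"])
document = ContentType("Document", parent=item, own_columns=["Name"])
press_release = ContentType(
    "Press Release",
    parent=document,
    own_columns=["Target Audience", "Approver", "Embargo Date"],
)

print(press_release.columns())
```

In this model, adding a column to "Document" would make it appear on every descendant type, which is the practical appeal of defining shared metadata high up in the hierarchy.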

 

Metadata is usually structured into so-called schemas, like XMP Basic or Dublin Core, and even the proprietary metadata required for specific business processes is usually better managed if structured and grouped into schemas.

SharePoint predefines Dublin Core as an example of additional metadata, beyond the typical metadata that comes with a specific document type. Other schemas can easily be added as groups of site columns or as separate content types.

 

The user interface SharePoint provides to make the metadata accessible is nothing to get excited about. For any efficient metadata entry, especially when entering large amounts of metadata, a custom user interface would most likely have to be developed. It is easy, though, to define custom views, either for personal or public access, that determine which metadata is displayed.

 

SharePoint also includes so-called "computed columns" and showcases that feature with three low-res versions of the images managed by the newly included Picture Library: a Thumbnail, a Preview and a Web Preview, which should enable decent output on modern mobile phones.

 

Proliferating Metadata
A core functionality around metadata is that the system selectively extracts the metadata that comes with an asset, and also stores any metadata entered in the system back into the asset itself (called "Promotion" and "Demotion" in SharePoint). SharePoint's "Parser" interface implements this conceptually, but "out of the box" support is limited to the documents managed by Office 2007 and their metadata. That means that additional software or customization is required to extend this valuable concept to all content and document types used in a solution.

 

Extracting selected metadata fields alone is not enough: with the various existing schemas potentially overlapping, and sometimes using uncommon names for metadata fields, it's important that metadata fields in the asset can be flexibly associated with Columns that provide a more meaningful name. Conceptually SharePoint supports this need, but out of the box it practically doesn't, so additional customization is likely required.
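
Promotion, demotion and the association of embedded field names with Columns can be sketched together as follows. This is a minimal illustration under stated assumptions, not SharePoint's Parser interface: the functions promote() and demote() and the mapping table FIELD_TO_COLUMN are hypothetical, and the asset's embedded metadata is simplified to a plain dictionary.

```python
# Hypothetical mapping from embedded field names to friendlier Column names.
FIELD_TO_COLUMN = {
    "dc:title": "Title",
    "dc:creator": "Author",
    "xmp:CreateDate": "Created",
}


def promote(embedded: dict[str, str]) -> dict[str, str]:
    """Copy the mapped fields from the asset into library columns."""
    return {
        column: embedded[field_name]
        for field_name, column in FIELD_TO_COLUMN.items()
        if field_name in embedded
    }


def demote(columns: dict[str, str], embedded: dict[str, str]) -> dict[str, str]:
    """Write column values back into a copy of the asset's metadata."""
    updated = dict(embedded)  # unmapped fields are carried over untouched
    for field_name, column in FIELD_TO_COLUMN.items():
        if column in columns:
            updated[field_name] = columns[column]
    return updated


# Illustrative asset metadata; the values are made up.
asset = {"dc:title": "Q3 results", "dc:creator": "J. Doe", "photoshop:City": "Oslo"}
row = promote(asset)                 # selective extraction into columns
row["Title"] = "Q3 results (final)"  # a user edits the column in the library
asset_after = demote(row, asset)     # the edit travels back into the asset
print(asset_after)
```

Keeping the mapping in one table means overlapping schemas can be reconciled in a single place, and fields without a mapping survive a round trip unchanged.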

 

An indication that Microsoft has lagged behind on proper support for metadata is that only now, with SharePoint 2007, has it become possible to index selected columns, which results in much better performance for large libraries and lists.

 

 

 

Orchestrating Metadata
The Business Data Catalog capability of MOSS 2007 is a fundamental mechanism that is essential for a portal. One of the many ways to leverage data in other line-of-business (LoB) applications, like CRM (Customer Relationship Management) or PIM (Product Information Management), is to access and seamlessly display metadata from these otherwise independent solutions. MOSS 2007 does that extremely well: LoB metadata can even be shown in the Office 2007 applications side by side with metadata managed in SharePoint itself.

 

Searching on Metadata
Searching metadata is yet another adventure to set up. As if the confusion around fields vs. columns were not enough, when setting up search capabilities in SharePoint's Central Administration tool on the server, SharePoint columns are listed under "crawled properties", which have to be mapped to so-called "managed properties". Setting them up is a cumbersome process with the standard admin user interface. Additionally, it's necessary to customize the search pages to show the managed properties that have been created. In short, one might simply be better off implementing a custom search web part for a specific library in SharePoint that provides convenient access to the metadata search criteria and the search results, including the metadata.
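
The relationship between crawled and managed properties can be sketched as a lookup table in which each managed property draws on one or more crawled properties. This is a conceptual sketch only, not the SharePoint search API: the table MANAGED_PROPERTIES, the functions resolve() and search(), and all property names (including the "ows_" prefix on the crawled names) are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical mappings: each managed property lists the crawled properties
# it draws on, in priority order. All names here are assumptions.
MANAGED_PROPERTIES = {
    "Audience": ["ows_TargetAudience", "ows_Audience"],
    "Approver": ["ows_Approver"],
}


def resolve(managed_name: str, crawled: dict[str, str]) -> Optional[str]:
    """Return the first non-empty crawled value mapped to a managed property."""
    for crawled_name in MANAGED_PROPERTIES.get(managed_name, []):
        value = crawled.get(crawled_name)
        if value:
            return value
    return None


def search(items: list[dict[str, str]], managed_name: str, wanted: str) -> list[dict[str, str]]:
    """Return the items whose managed property equals the wanted value."""
    return [item for item in items if resolve(managed_name, item) == wanted]


# Illustrative crawled data for two documents in a library.
library = [
    {"ows_Title": "Press kit", "ows_TargetAudience": "Investors"},
    {"ows_Title": "Fact sheet", "ows_Audience": "Press"},
]
print(search(library, "Audience", "Press"))
```

A custom search web part of the kind suggested above would essentially hide this mapping behind a form that exposes only the managed property names.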

 

The file format support of SharePoint out of the box is still very limited, with fewer than two dozen formats. Additional "iFilters" for format support are available from other vendors, but they too are scarce. Microsoft has unofficially announced that they will deliver an "iFilter Package" in the June 2007 timeframe, but no specific promises have been made. This will be an important area in judging how relevant SharePoint will become for managing assets.

 

Versions, Variations, Renditions, Derivatives and Related Assets

 

The version control system included in SharePoint 2007 really makes life easy. It provides the most important basic functionality, like major and minor revisions, ties in nicely with the general approval processes, and even tracks changes to the metadata. Requiring a formal check-out when editing documents is optional, so version control doesn't necessarily add a step before starting work on a document. Setup is easy, and so is using it.
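
The interplay of minor (draft) and major (published) revisions can be illustrated with a tiny numbering model. This is a sketch under the assumption that saving a draft increments the minor number and that publishing increments the major number and resets the minor one; the class Version and its methods are hypothetical and not part of any SharePoint API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """Toy major.minor version number, starting at 0.0."""
    major: int = 0
    minor: int = 0

    def save_draft(self) -> "Version":
        """A saved draft becomes the next minor revision."""
        return Version(self.major, self.minor + 1)

    def publish(self) -> "Version":
        """An approved document becomes the next major revision."""
        return Version(self.major + 1, 0)

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"


# Two drafts, then approval, then one more draft on top of the published version.
history = [Version()]
for step in ("save_draft", "save_draft", "publish", "save_draft"):
    history.append(getattr(history[-1], step)())

print(" -> ".join(str(v) for v in history))
```

The model makes the tie-in with approval visible: only the publish step, which an approval process would trigger, produces a new major version.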

 

The SharePoint version control system builds on the WebDAV (Web Distributed Authoring and Versioning) standard. In practice this means that the system is very open, and assets can be accessed simply through a URL. Some creative tools on the market do not support writing to WebDAV solutions, so it's necessary to check whether all tools involved support WebDAV.

 

Microsoft recommends against storing documents larger than 50MB in SharePoint. This should not be a problem for the bulk of documents managed, but videos or even very large still images can easily exceed that size and therefore should be stored externally. It's possible to manage the metadata in SharePoint and reference an external asset. However, this potentially adds the problem of running two separate version control systems, which defeats the goal of central administration.

 

Another important functionality of SharePoint is its ability to copy and render assets, and automatically build and maintain a reference between the original asset and its derivative. It even stores a link back to the converter that was used, and also knows what version of an asset served as the basis for the derivative. This capability surfaces nicely in one of the core features of the new "slide library" in MOSS 2007: when a slide from the slide library is re-purposed for a presentation, PowerPoint can keep the link, and when later opening that presentation, PowerPoint will notify the user if the original slide has meanwhile been changed.
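
The link between an original and its derivative, including the converter used and the source version, can be sketched as follows. This is a conceptual model only, not SharePoint's converter framework: the classes Asset and Derivative, the function convert() and the converter name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """An original asset with a simple integer version counter."""
    name: str
    version: int = 1


@dataclass
class Derivative:
    """A converted copy that remembers where it came from."""
    name: str
    source: Asset
    source_version: int  # version of the original at conversion time
    converter: str       # which converter produced this derivative

    def is_stale(self) -> bool:
        """True if the original has changed since this copy was made."""
        return self.source.version > self.source_version


def convert(source: Asset, new_name: str, converter: str) -> Derivative:
    """Create a derivative and record the link back to the original."""
    return Derivative(new_name, source, source.version, converter)


# Illustrative names; the converter identifier is made up.
slide = Asset("roadmap-slide")
copy = convert(slide, "roadmap-slide-in-deck", "hypothetical-slide-copier")
print(copy.is_stale())   # the original is unchanged so far

slide.version += 1       # someone updates the original slide
print(copy.is_stale())   # the copy can now warn its user
```

A check like is_stale() corresponds to the notification described above, where the user is told on opening a presentation that the original slide has changed.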

 

It's also good to see that the conversion process is built in a way that allows balancing the load of the sometimes time-consuming process of converting large documents. Yet this is only a conceptual benefit at this point, as the number of converters available is still quite limited. Here, too, it will be interesting to monitor how much support SharePoint gets from independent software vendors adding conversion functionality for more file formats.

 

Summary

 

SharePoint 2007 brings a lot of improvements when it comes to metadata and version control. However, on many topics it is still limited to concepts, and the gaps on the way to complete solutions are quite visible.

 

Making SharePoint an anchor point for managing work in progress mainly requires the right workflows, seamless integration with the authoring and review tools, and especially intuitive access that "nudges" users to cultivate the metadata. SharePoint can be the basis for all this, but it still requires other vendors and system integrators to close these gaps.

Reading the blogs and sensing the excitement around SharePoint, especially in the developer community, I predict that it won't be long before many of these gaps start to close and complete end-to-end workflow solutions begin to appear.