March 22, 2011, 11:57 p.m.
posted by redstink
While projects to build an enterprise repository are often less than successful, the effort continues because there are at least four good reasons to have a standard, shared repository for metadata.
When tools can exchange metadata, you can reuse existing metadata to help define each new step in the implementation process. Column names and descriptions captured in the data model design step can be used to build the relational tables, reused to populate the OLAP engine, and then used again to help populate the front-end tool’s metadata layer, for example.
If the metadata is in a standard form, your investment in defining objects and processes is protected—you are not locked in to a particular tool. For example, Reporting Services Report Definition Language (RDL) is a language standard that describes a report. If all your reports are stored in RDL, you could potentially switch to a new front-end tool and still be able to use your existing report library. (This benefit is not particularly popular with tool vendors.)
A central repository gives you a single, common understanding of the contents and structure of the data warehouse—it is the best documentation you could have. And, to the extent that this shared repository holds the official, active metadata for each tool, you know it is the current version and not a copy that may be out of date.
An integrated metadata repository allows you to more easily assess changes through impact and lineage analysis. The two concepts are essentially like looking down the same pipe from either end. In impact analysis, you want to know what downstream elements will be affected as a result of a potential change, like dropping a table. In lineage analysis, you want to know how a certain element came into being—what sources and transformations it went through to get to where it is.
You’d like to have a standard repository for metadata—at least in the DW/BI environment. The tools would write their metadata to the repository during the design phase and read it back in during the execution phase. In theory, any tool can write to and read from the standard model.
As it turns out, there is a standard framework for data warehouse-oriented metadata called the Common Warehouse Metamodel (CWM) that was first published by the Object Management Group (OMG) in 2001. The CWM is an object-based attempt to provide a metadata store for the various elements of a data warehouse. As we understand it, the overall standard is composed of three parts:
The Common Warehouse Metamodel (CWM) is a domain model (or metamodel, as OMG calls it) of the data warehouse. It has over a dozen subdomains (or sub-metamodels), including ones to describe relational elements, transformations, OLAP, and front-end components.
The Meta Object Facility (MOF) is essentially the definition language used to define a specific instance of the CWM.
XML Metadata Interchange (XMI) provides the structure to support the interchange of metadata with a standard format based on XML.
We say “as we understand it,” because the CWM standards set is long, complex, and difficult to navigate for the average data warehouse professional. The sheer volume of information is daunting with over 1,200 pages of documentation, plus the XML document for XMI, and the interface definition language (IDL) for the MOF. Beyond that, the standard is managed by an object-oriented organization and is based on CORBA and UML. There’s a lot to unravel before one can begin to understand it. That said, the metadata manager on your team should spend some time figuring out the CWM and how it plays in your vendors’ responses to the metadata problem.
While the CWM was a significant attempt to help solve the problem of metadata proliferation, the current status of the CWM is unclear. Vendor adoption of standards is always slow because each individual tool vendor has a vested interest in keeping its metadata proprietary. In fairness, some vendors believe the CWM is not flexible or comprehensive enough to support their metadata needs. Even so, many ETL and front-end tool vendors claim some level of support for CWM, but what this means is uncertain. For example, it is possible for a tool to write its metadata out to extended, custom structures in the CWM—structures that are essentially unreadable to other tools. This obviously does not help in the exchange or reuse of metadata, even if it does allow the vendor to claim CWM compliance. Further uncertainty comes from the fact that as of the time of this writing, there has been very little recent activity from the OMG on the CWM standard. There are no white papers on the topic on their web site. Version 1.1 of the standard came out in March of 2003, and the most recent press and articles listed on the main CWM page are from the year 2000. While the concept is valuable, it is too soon to celebrate the success of the CWM.
In any case, all this talk about standards is definitely a future-oriented discussion for us because SQL Server 2005 does not support the CWM standard.