Data Integration Guidelines

Recall that enterprise data may be kept in various types of data stores, including relational databases, non-relational databases, directory servers, and so forth. Some common strategies that help to integrate data from these different types of data stores are data mapping, data transformation, and data filtering.

1 Data Mapping in EAI Applications

Enterprises use various types of data stores to hold data, and each such data store persists the data according to its own layout and using its own data types. This is true even within a single enterprise. Thus, there are many varied external representations of data. Often, external data representations are relational in nature because data is most commonly stored in a relational database. EAI applications need to know how to access this data, so it is important to map external data to the data structures used within an EAI application.

An important issue to consider when integrating data sources is to decide on a mapping layer. Generally, you have these options for relational data:

  • Create a formal object-oriented data model.

  • Create a generic data-holder layer.

The formal object-oriented data model relies on object-relational mapping technologies to map data from relational data sources to an object-oriented format. You may use such mapping technologies as Enterprise JavaBeans container-managed persistence, Java Data Objects, or even the data access object strategy. There are a number of advantages to this option. For one, you can reap the traditional advantages of an object-oriented approach, notably reusability, since you establish a mapping layer that can be reused by other applications. When you use the EJB container-managed persistence technology, you also can rely on the EJB container's security features to control access to the data. You also leverage the performance benefits of the EJB container-managed persistence engine, which uses data caching to improve performance. Finally, by using these technologies, you can take advantage of the mapping tools that come with them.

The other option for representing relational data is to create a generic layer to hold the data. For this approach, you use JDBC APIs to handle data from relational data sources. The JDBC RowSet technology, in particular, makes it easy to access relational data in an efficient manner. (Keep in mind, however, that JDBC RowSet technology is not yet standard in the J2EE platform.) The RowSet technology, through the WebRowSet feature, gives you an XML view of the data source. Its CachedRowSet capabilities let you access data in a disconnected fashion, while the FilteredRowSet functions let you manage data in a disconnected manner, without requiring heavyweight DBMS support. By using a generic layer to hold data, you have a simpler design, since there is no real layering, and you avoid the conceptual weight of a formal data model. This approach may have better performance, particularly when access is tabular in nature (such as when selecting several attributes from all rows that match a particular condition).
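To make the contrast with a formal object model concrete, here is a minimal sketch of a generic data holder in plain Java. It is not the JDBC RowSet API: the `GenericRows` class and its `select` method are hypothetical names used only to illustrate tabular access, that is, selecting several attributes from all rows that match a condition.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical generic holder: each row is a column-name to value map, with no domain classes. */
class GenericRows {
    private final List<Map<String, Object>> rows = new ArrayList<>();

    /** Stores a defensive copy of the row. */
    void add(Map<String, Object> row) {
        rows.add(new LinkedHashMap<>(row));
    }

    /** Returns only the named columns of every row that satisfies the condition. */
    List<Map<String, Object>> select(List<String> columns,
                                     Predicate<Map<String, Object>> condition) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (condition.test(row)) {
                Map<String, Object> projected = new LinkedHashMap<>();
                for (String column : columns) {
                    projected.put(column, row.get(column));
                }
                result.add(projected);
            }
        }
        return result;
    }
}
```

The holder knows nothing about invoices or customers, which is what keeps the design simple. The cost is that column names are plain strings, so the compiler cannot catch a misspelled attribute.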

  • To access data stored in non-relational data sources, it is best to use connectors.

Connectors let you plug in non-relational data sources to the J2EE environment. For non-relational systems that do not have ready-made connectors available, such as home-grown systems, you need to write your own connector. See "Integrating Custom Legacy Systems" on page 283.

2 Data Transformation

Data transformation—the ability to convert data to an acceptable application format—is a common requirement for newer applications that need to access legacy data. Such transformation functionality is necessary because most legacy systems were not designed to handle the requirements of these subsequent applications.

To illustrate, a legacy system may store dates as eight-digit integers; for example, it stores the date September 23, 2003 as 20030923. Another application that accesses this legacy system needs the same date in MM/DD/YYYY format, that is, as 09/23/2003. The application must read the eight-digit date from the legacy system and transform it to a usable format.
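The date example can be sketched in a few lines of Java using the standard `java.time` API. The class and method names (`LegacyDateTransformer`, `toDisplayFormat`) are hypothetical, and the sketch assumes the legacy value is always a valid calendar date.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical transformer from the legacy eight-digit date to MM/DD/YYYY. */
class LegacyDateTransformer {
    // BASIC_ISO_DATE reads dates written as yyyyMMdd, such as 20030923.
    private static final DateTimeFormatter LEGACY = DateTimeFormatter.BASIC_ISO_DATE;
    private static final DateTimeFormatter DISPLAY = DateTimeFormatter.ofPattern("MM/dd/yyyy");

    static String toDisplayFormat(int legacyDate) {
        // Pad to eight digits so the integer is always read as year, month, day.
        String digits = String.format(Locale.ROOT, "%08d", legacyDate);
        LocalDate date = LocalDate.parse(digits, LEGACY);
        return date.format(DISPLAY);
    }

    public static void main(String[] args) {
        System.out.println(toDisplayFormat(20030923)); // prints 09/23/2003
    }
}
```

Parsing into a `LocalDate` first, rather than rearranging substrings, means an impossible value such as 20031345 is rejected with an exception instead of being passed along silently.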

Another example of data transformation might involve customer data. Customer data spans a range of information, and might include identity and address information as well as credit and past ordering information. Different systems may be interested in different parts of this customer data, and hence each system may have a different notion of a customer.

Even schemas, including industry-standard schemas such as Electronic Data Interchange For Administration, Commerce, and Transport (EDIFACT), Universal Business Language (UBL), and RosettaNet, must sometimes be transformed into one another. Often enterprises need to use these industry-standard formats for external communications while at the same time using proprietary formats for internal communications.

One way you might solve the data transformation problem is to require that all systems use the same standard data format. Unfortunately, this solution is unrealistic and impractical, as illustrated by the Y2K problem of converting the representation of a calendar year from two digits to four digits. Although going from two to four digits should be a minor change, the cost to fix this problem was enormous. System architects must live with the reality that data transformations are here to stay, since different systems will inevitably have different representations of the same information.

  • A good strategy for data transformation is to use the canonical data model.

An enterprise might set up one canonical data model for the entire enterprise or separate models for different aspects of its business. Essentially, a canonical data model is a data model independent of any application. For example, a canonical model might use a standard format for dates, such as MM/DD/YYYY. Rather than transforming data from one application's format directly to another application's format, you transform the data from the various communicating applications to this common canonical model. You write new applications to use this common format and adapt legacy systems to the same format.
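The following sketch illustrates why routing every conversion through the canonical model keeps the number of converters small: each application needs one adapter to and from the canonical form, rather than one converter per pair of applications. The `DateAdapter`, `PatternDateAdapter`, and `CanonicalHub` names are hypothetical, as is the second application's `yyyy-MM-dd` format. The canonical date format is MM/DD/YYYY, as in the example above.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical adapter contract: every application converts only to and from the canonical form. */
interface DateAdapter {
    String toCanonical(String nativeValue);
    String fromCanonical(String canonicalValue);
}

/** Adapter for an application whose native date format is described by a pattern. */
class PatternDateAdapter implements DateAdapter {
    // Canonical date format assumed for this sketch: MM/DD/YYYY.
    static final DateTimeFormatter CANONICAL = DateTimeFormatter.ofPattern("MM/dd/yyyy");
    private final DateTimeFormatter nativeFormat;

    PatternDateAdapter(String nativePattern) {
        this.nativeFormat = DateTimeFormatter.ofPattern(nativePattern);
    }

    @Override
    public String toCanonical(String nativeValue) {
        return LocalDate.parse(nativeValue, nativeFormat).format(CANONICAL);
    }

    @Override
    public String fromCanonical(String canonicalValue) {
        return LocalDate.parse(canonicalValue, CANONICAL).format(nativeFormat);
    }
}

/** Converts between any two applications without either knowing the other's format. */
class CanonicalHub {
    static String convert(DateAdapter source, DateAdapter target, String value) {
        return target.fromCanonical(source.toCanonical(value));
    }
}
```

For example, `CanonicalHub.convert(new PatternDateAdapter("yyyyMMdd"), new PatternDateAdapter("yyyy-MM-dd"), "20030923")` passes through the canonical value 09/23/2003 on its way to the target format. Adding a third application means writing one more adapter, not two more pairwise converters.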

  • Use XML to represent a canonical data model.

XML provides a good means to represent this canonical data model, for a number of reasons:

  • XML, through a schema language, can rigorously represent types. By using XML to represent your canonical model, you can write various schemas that unambiguously define the data model. Before XML, many times canonical data models were established but never documented by their developers. If you were lucky, developers might have described the model in a text document. It was easy for such a document to get out of sync with the actual types used by the model.

  • XML schemas are enforceable. You can validate an XML document to ensure that it conforms to the schema of the canonical data model. A text document cannot enforce use of the proper types.

  • XML is easier to convert to an alternate form, by using declarative means such as style sheets.

  • XML is both platform and programming-language neutral. Hence, you can have a variety of systems use the same canonical data model without requiring a specific programming language or platform.
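The enforceability point in the list above can be sketched with the standard `javax.xml.validation` API. This is a minimal illustration, not a production validator: the `CanonicalValidator` class is a hypothetical name, and the tiny invoice-like schema is an assumption made for the example (it deliberately uses no namespace to stay short).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Hypothetical validator that checks documents against a small canonical schema. */
class CanonicalValidator {
    // Assumed schema for illustration only: an Invoice with two required string elements.
    static final String SCHEMA =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xsd:element name=\"Invoice\" type=\"InvoiceType\"/>"
      + "<xsd:complexType name=\"InvoiceType\"><xsd:sequence>"
      + "<xsd:element name=\"InvoiceId\" type=\"xsd:string\"/>"
      + "<xsd:element name=\"status\" type=\"xsd:string\"/>"
      + "</xsd:sequence></xsd:complexType>"
      + "</xsd:schema>";

    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(SCHEMA)));
            // validate() throws SAXException when the document violates the schema.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Also reached if the schema itself were malformed; acceptable for a sketch.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A document that omits the `status` element is rejected by the validator, which is something a prose description of the model cannot do.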

This is not meant to imply that XML is perfect, since there are some disadvantages to using XML to represent a canonical model. For one, to use XML you need to either learn the XML schema language (XSD) or use a good tool. Often, transforming XML requires you to use XSL, which is not an easy language to learn, especially since XSL is different from traditional programming languages. There are also performance overheads with XML. However, for many enterprise settings (and especially for integration purposes) the benefits of using XML outweigh its disadvantages, and the benefits of XML become more significant when you factor in future maintenance costs.

Code Figure 3 shows an XML document representing invoice information. This might be the canonical model of an invoice used by the adventure builder enterprise. Since this document is published internally, all new applications requiring this data type can make use of it.

3. XML Document With Invoice Information
<?xml version="1.0" encoding="UTF-8"?>
<bpi:Invoice xmlns:bpi="...">
   <bpi:InvoiceId>1234</bpi:InvoiceId>
   <bpi:OPCPoId>AB-j2ee-1069278832687</bpi:OPCPoId>
   <bpi:SupplierId>...</bpi:SupplierId>
   <bpi:status>COMPLETED</bpi:status>
</bpi:Invoice>

It is usually a good idea to provide the schema for the canonical data model. With the schema available, you can validate translated documents against it, and newer applications can use the schema to define their own models. Code Figure 4 shows the XSD schema file for this invoice information.

4. XSD Schema for Invoice Information
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" ...>

  <xsd:element name="Invoice" type="InvoiceType"/>

  <xsd:complexType name="InvoiceType">
     <xsd:sequence>
           <xsd:element name="InvoiceId" type="xsd:string"/>
           <xsd:element name="OPCPoId" type="xsd:string"/>
           <xsd:element name="SupplierId" type="xsd:string"/>
           <xsd:element name="status" type="xsd:string"/>
     </xsd:sequence>
  </xsd:complexType>

</xsd:schema>

In addition to the XML form, the canonical model representation may be needed in Java object form. The Java object form is often needed, for example, when a substantial amount of business logic is written in the J2EE application server. To use the canonical form for new code, you convert the canonical data types (from an XML schema to Java or vice versa) into Java objects. That is, you use the XML documents or their Java object representations as the stable integration point. (See "Web Services Approach" on page 266.) For example, Code Figure 5 shows the equivalent Java class for the canonical model of invoice information.

5. Java Class Equivalents for Canonical Invoice Information
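public class Invoice {

   // Fields mirror the string-typed elements of the canonical schema.
   private String invoiceId;
   private String opcPoId;
   private String supplierId;
   private String status;

   public String getInvoiceId() { return invoiceId; }

   public String getOPCPoId() { return opcPoId; }

   public String getSupplierId() { return supplierId; }

   public String getStatus() { return status; }

   // rest of class implementation ...
}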

Once the XML schema is defined, you can also use tools such as JAXB to generate the Java classes. (See "Flexible Mapping" on page 148.)

After establishing a canonical data model, you must devise a strategy to convert any alternate data representations to this model. Because it plugs its enterprise systems—billing, order processing, and CRM—in via an application server, the adventure builder enterprise exposes the canonical data model only through the external interfaces exposed by the application server. That is, only those components with external interfaces—Web service applications, remote enterprise beans, and so forth—expose the canonical data model. Since the external world needs (and sees) only the canonical data model, the adventure builder enterprise must transform its internal data representations—which have their own data model devised by its various EISs—to this same canonical model.

The data translation between the internal and external representations can be done before the data goes from the EIS into the application server—that is, the application server internally uses the canonical data model, which is generally recommended. Or, the data translation can take place just prior to sending the data out to the external world—that is, the application server uses the various EISs' native data representations. Sometimes the business logic necessitates this latter approach because the logic needs to know the precise native format. The translation is accomplished in one of two ways:

  1. Use XSL style sheets to transform these alternate data representations, either when data comes in or when data goes out. XSL style sheets work for XML-based interfaces. In this approach, the application server internally uses the EIS native formats and the translation happens just before the data either goes out to the external world or comes in from the outside.

  2. Use a façade approach and do programmatic object mapping in the DAO layer. That is, you set up a DAO to connect an EIS to the application server. Write the DAO so that it exposes only the canonical data model and have it map any incoming data to the appropriate internal data model. Since this approach converts incoming data to the canonical form when the data arrives at the application server, the business logic internally uses the canonical data representation.
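The second approach, programmatic mapping in the DAO layer, might look like the following minimal sketch. Everything EIS-specific here is an assumption made for illustration: the `BillingInvoiceDao` class, the native field names `INV_REF` and `STAT_CD`, and the one-letter status codes are hypothetical. Only the canonical field names come from the invoice schema.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Canonical invoice as seen by the business logic (a subset of the canonical fields). */
class CanonicalInvoice {
    final String invoiceId;
    final String status;

    CanonicalInvoice(String invoiceId, String status) {
        this.invoiceId = invoiceId;
        this.status = status;
    }
}

/** Hypothetical DAO facade: the EIS-native layout never leaks past this class. */
class BillingInvoiceDao {

    // Assumed native layout: INV_REF holds the id, STAT_CD holds a one-letter status code.
    static CanonicalInvoice toCanonical(Map<String, String> nativeRecord) {
        String status = "C".equals(nativeRecord.get("STAT_CD")) ? "COMPLETED" : "PENDING";
        return new CanonicalInvoice(nativeRecord.get("INV_REF"), status);
    }

    // Maps canonical data back to the assumed native layout before it is written to the EIS.
    static Map<String, String> toNative(CanonicalInvoice invoice) {
        Map<String, String> nativeRecord = new LinkedHashMap<>();
        nativeRecord.put("INV_REF", invoice.invoiceId);
        nativeRecord.put("STAT_CD", "COMPLETED".equals(invoice.status) ? "C" : "P");
        return nativeRecord;
    }
}
```

Because callers see only `CanonicalInvoice`, a change to the EIS layout is confined to the two mapping methods.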

To understand the XSL style sheet approach, let's consider how the adventure builder enterprise receives invoices from various suppliers. Each supplier may use a different representation (that is, a different format) of the invoice information. Furthermore, adventure builder's various EISs may each have a different representation of the same invoice information. Code Figure 6 shows an example of a typical supplier's invoice.

6. Example of Supplier Invoice Information
<?xml version="1.0" encoding="UTF-8"?>
<Invoice xmlns="" ...>
   ...
   <HotelAddress>1234 Main Street, Sometown 12345, USA</HotelAddress>
   ...
   <CancelPolicy>No Cancellations 24 hours prior</CancelPolicy>
</Invoice>

Compare this listing of invoice information with that of adventure builder's canonical model, shown in Code Figure 3. adventure builder can convert an invoice from this supplier to its canonical data model by applying the style sheet shown in Code Figure 7 in the interaction layer of the Web service. The style sheet defines rules, such as template-matching rules (<xsl:template match=...>), that are applied to the supplier invoice and its XML fragments when the canonical model is generated.

7. Style Sheet to Convert Supplier Invoice to Canonical Model
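<xsl:stylesheet version='1.0'
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:bpi="...">

     <xsl:template match="text()"/>

     <xsl:template match="@*"/>

     <xsl:template match="bpi:Invoice">
      <bpi:Invoice xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ...>
         <xsl:apply-templates/>
      </bpi:Invoice>
     </xsl:template>

     <xsl:template match="bpi:InvoiceRef">
           <bpi:InvoiceId><xsl:value-of select="text()"/></bpi:InvoiceId>
     </xsl:template>

     <xsl:template match="bpi:OPCPoId">
         <bpi:OPCPoId><xsl:value-of select="text()"/></bpi:OPCPoId>
     </xsl:template>

     <xsl:template match="bpi:SupplierId">
          <bpi:SupplierId><xsl:value-of select="text()"/></bpi:SupplierId>
     </xsl:template>

     <xsl:template match="bpi:Status">
           <bpi:status><xsl:value-of select="text()"/></bpi:status>
     </xsl:template>

</xsl:stylesheet>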

The adventure builder application applies the style sheet when it receives an invoice from a supplier. See "Reuse and Pool Parsers and Style Sheets" on page 188 for more information about pooling style sheets.
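As a rough sketch of how such a style sheet might be applied in Java, the following uses the standard `javax.xml.transform` API. The `InvoiceStyleSheet` class is a hypothetical name, and the embedded style sheet is a cut-down, namespace-free variant written for this example rather than the style sheet shown above. It compiles the style sheet once into a reusable `Templates` object.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical helper that converts a supplier invoice to the canonical form. */
class InvoiceStyleSheet {
    // Simplified style sheet (no namespaces) assumed for illustration only.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"text()\"/>"
      + "<xsl:template match=\"@*\"/>"
      + "<xsl:template match=\"Invoice\"><Invoice><xsl:apply-templates/></Invoice></xsl:template>"
      + "<xsl:template match=\"InvoiceRef\"><InvoiceId><xsl:value-of select=\"text()\"/></InvoiceId></xsl:template>"
      + "<xsl:template match=\"Status\"><status><xsl:value-of select=\"text()\"/></status></xsl:template>"
      + "</xsl:stylesheet>";

    // Compiled once; a Templates object can be shared across threads and reused.
    private static final Templates TEMPLATES = compile();

    private static Templates compile() {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(XSL)));
        } catch (TransformerException e) {
            throw new IllegalStateException("Style sheet does not compile", e);
        }
    }

    static String toCanonical(String supplierXml) {
        try {
            StringWriter out = new StringWriter();
            // A Transformer is cheap to create from Templates and is not shared.
            TEMPLATES.newTransformer().transform(
                    new StreamSource(new StringReader(supplierXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Supplier invoice could not be transformed", e);
        }
    }
}
```

Elements without a matching template, such as a supplier-specific `HotelAddress`, produce no output because the empty `text()` template suppresses their content.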

Transition façades, which can apply to any object representation of data, are a more general solution for transformations. You can use transition façades to hide extra information or to provide simple mappings. To use a façade, you write a Java class within which you do manual data transformation and mappings.

You should also consider using one of the many tools available for data transformations. These tools simplify the data transformation task and make it easier to maintain.

  • When data is in an XML document, it is easier to write XSL style sheets that do the required transformations.

  • When you access data using EJB container-managed persistence, you can either directly modify the container-managed persistent classes or write façades to do the required transformations.

3 Data Filtering

When you do not have access to an application's code, such as for off-the-shelf packaged applications or for applications that cannot be modified because they are critical to a working business system, you might consider using filtering. That is, you construct a filter that sits in front of the application and does all necessary data translations.

Generally, data filtering goes hand-in-hand with data transformations. The canonical data model, because it must support all use cases within an enterprise, is often a good candidate for filtering. Since many applications do not need access to all data fields, you can filter data and simplify application development and improve performance by reducing the amount of data that is exchanged.

There are two types of filtering, and each has its own use cases:

  1. Filtering that hides information but saves it for later use

  2. Filtering that outputs only needed information

For example, in the adventure builder enterprise, a workflow manager receives the invoices sent by the different suppliers. Since it needs only an identifier to identify the workflow associated with the invoice, the workflow manager can filter the document to retrieve only this information. However, since it may need to pass the entire invoice—all the data fields in the invoice—to the next step in the workflow, the workflow manager must preserve the entire document. The workflow manager can accomplish this using flexible mapping. (See "Flexible Mapping" on page 148.)
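The first kind of filtering, which hides information but saves it for later use, can be sketched as follows. This is not the flexible mapping implementation referred to above. The `WorkflowMessage` class is a hypothetical name, and the sketch assumes a namespace-free invoice whose identifier is at `/Invoice/InvoiceId`.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

/** Hypothetical envelope: exposes only the identifier but carries the whole invoice along. */
class WorkflowMessage {
    private final String invoiceId;     // the only field the workflow manager reads
    private final String fullDocument;  // preserved untouched for the next workflow step

    WorkflowMessage(String invoiceXml) {
        this.invoiceId = extractInvoiceId(invoiceXml);
        this.fullDocument = invoiceXml;
    }

    // Assumed document shape: <Invoice><InvoiceId>...</InvoiceId>...</Invoice>
    private static String extractInvoiceId(String xml) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate("/Invoice/InvoiceId", new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Not a readable invoice document", e);
        }
    }

    String invoiceId() {
        return invoiceId;
    }

    String fullDocument() {
        return fullDocument;
    }
}
```

The workflow manager depends only on `invoiceId()`, so supplier-specific fields can change without affecting it, while `fullDocument()` still hands the complete invoice to the next step.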

On the other hand, you may want to apply data filtering before sending information. For example, a credit card processing component of a workflow may need to send credit information. To protect privacy, the component should send only the required information and should use data filtering to remove anything that need not be passed to another application.
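The second kind of filtering, which outputs only the needed information, can be sketched as an allow-list applied just before a record is sent. The `OutboundFilter` class and the field names in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical outbound filter: anything not explicitly required is dropped. */
class OutboundFilter {
    static Map<String, String> keepOnly(Map<String, String> record, Set<String> requiredFields) {
        // Copy first so the caller's record is left unchanged.
        Map<String, String> filtered = new LinkedHashMap<>(record);
        filtered.keySet().retainAll(requiredFields);
        return filtered;
    }
}
```

For instance, given a record with the fields `cardNumber`, `amount`, and `customerNotes`, calling `keepOnly` with the set `{cardNumber, amount}` returns a copy without `customerNotes`. An allow-list is used rather than a deny-list so that fields added to the record later are withheld by default.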

Filtering can be applied at the database level, too. Using filtering, you can obtain a simplified view of the data tailored to a particular application. In addition, using EJB container-managed persistence for data transformation makes it easier to filter data. The container-managed persistence mapping tools let you select a subset of the database schema, and this is analogous to filtering. If you are using the JDBC RowSet approach, you need only select the columns for the data that you care about.

You can also write façades that appropriately filter the data. In this case, the client code accesses data only through these façades.
