Designing XML-Based Applications





There are a number of considerations to keep in mind when designing XML-based applications, particularly Web service applications. For one, you may need to design an XML schema specific to your domain. You also need to consider how your application receives and sends documents, and how and when to validate those documents. It is also important to separate XML document processing from the application's business logic processing. ("Choosing Processing Models" on page 151 discusses this separation in more detail.)

Whether you design your own domain-specific schema or rely on standard vertical schemas, you still must understand the dynamics of mapping the application's data model to the schema. You also need to consider the processing model, and whether to use a document-centric model or an object-centric model.

These issues are discussed in the next sections.

1 Designing Domain-Specific XML Schemas

Despite the growing availability of vertical domain schemas, application developers may still have to define application-specific XML schemas that interoperating participants must agree upon and share. With the introduction of modern schema languages such as XSD, which introduced strong data typing and type derivation, XML schema design shares many aspects of object-oriented design, especially with respect to modularization and reuse.

The design of domain-specific XML schemas breaks down into the definition of XML schema types, their relationships to other types, and any constraints to which they are subject. These definitions typically result from an analysis of the application domain vocabulary (also called the business vocabulary). As much as possible, schema designers should leverage already-defined public vertical domain schema definitions to promote greater acceptance and interoperability among intended participants. Designers of new schemas should keep interoperability concerns in mind and try to account for reuse and extensibility. Figure 2 shows the UML model of a typical XML schema.

2. Model for an XML Schema (Invoice.xsd)

graphics/04fig02.gif


The strong similarity between object-oriented design and XML schema design makes it possible to apply UML modelling when designing XML schemas. Designers can use available software modelling tools to model XML schemas as UML class diagrams and, from these diagrams, generate the actual schemas in a target schema language such as XSD. Code Figure 1 shows an example of a schema based on XSD.

1. An Invoice XSD-Based Schema (Invoice.xsd)

<?xml version="1.0" encoding="UTF-8"?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" ...>
   <xsd:element name="Invoice">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="OrderId" type="xsd:string" />
            ...
            <xsd:element name="ShippingDate" type="xsd:date" />
            <xsd:element name="LineItems">
               <xsd:complexType>
                  <xsd:sequence>
                     <xsd:element name="LineItem" type="lineItem"
                        minOccurs="1" maxOccurs="unbounded" />
                  </xsd:sequence>
               </xsd:complexType>
               <xsd:unique name="itemIdUniqueness">
                  <xsd:selector xpath="LineItem"/>
                  <xsd:field xpath="@itemId"/>
               </xsd:unique>
            </xsd:element>
         </xsd:sequence>
      </xsd:complexType>
   </xsd:element>

   <xsd:complexType name="lineItem">
      <xsd:attribute name="categoryId" type="xsd:string"
         use="required" />
      ...
      <xsd:attribute name="unitPrice" type="positiveDecimal"
         use="required" />
   </xsd:complexType>

   <xsd:simpleType name="positiveDecimal">
      <xsd:restriction base="xsd:decimal">
         <xsd:minInclusive value="0.0" />
      </xsd:restriction>
   </xsd:simpleType>
</xsd:schema>


To illustrate, consider the Universal Business Language (UBL), which provides a standard library of XML business documents, such as purchase orders and invoices. UBL is a conceptual model of a collection of object classes and associations, called business information entities (BIEs). These entities are organized into specific hierarchies, from which specific document types are assembled. As a result, UBL is:

  • An XML-based business language

  • Built on existing electronic data interchange (EDI) and XML business-to-business schemas or vocabularies

  • Applicable across industry sectors and electronic trade domains

  • Designed to be modular, reusable, and extensible

Additionally, as with any software design, there must be a balance between reusability, maintainability, and performance. This holds true both for the design of the XML schema itself and for the logical and physical layout of the XML documents or schema instances. For example, consider a schema that reuses type and element definitions from other schemas. Initially loading this schema may require numerous network connections to resolve these external definitions, resulting in significant performance overhead. Although this issue is well understood, and some XML-processing technologies provide solutions in the form of XML entity catalogs, the developer may have to address it explicitly. Similarly, a dynamically generated instance of a document may be laid out so that it uses external entity references to include static or less dynamic fragments rather than embedding them. This arrangement may require the consumer of the document to open many network connections to retrieve the different fragments. Although this sort of modularization and inclusion may lead to significant network overhead, it does allow consumers of document schemas and instances to tune caching mechanisms more finely. See "Performance Considerations" on page 182.

Generally, document schema design and the layout of document instances closely parallel object-oriented design. In addition, design strategies exist that identify and provide well-defined solutions to common recurring problems in document schema design.

Keep the following recommendations in mind when designing an XML schema:

  • Adopt and develop design patterns, naming conventions, and other best practices similar to those used in object-oriented modelling to address the issues of reuse, modularization, and extensibility.

  • Leverage existing horizontal schemas, and vertical schemas defined within your industry, as well as the custom schemas already defined within your enterprise.

  • Do not rely solely on self-describing element and attribute names. Comment and document custom schemas.

  • Use modelling tools that support well-known schema languages such as XSD.

Keep in mind that reusing schemas may enable the reuse of the corresponding XML processing code.

2 Receiving and Sending XML Documents

XML schemas of documents to be consumed and produced are part of the overall exposed interface of an XML-based application. The exposed interface encompasses schemas of all documents passed along with incoming and outgoing messages regardless of the message-passing protocol—SOAP, plain HTTP, or JMS.

Typically, an application may receive or return XML documents as follows:

  • Received through a Web service endpoint: either a JAX-RPC service endpoint or EJB service endpoint if the application is exposed as a Web service. (See Chapter 3 for more information.)

  • Returned to a Web service client: if the application is accessing a Web service through JAX-RPC. (See Chapter 5 for more details.)

  • Through a JMS queue or topic (possibly attached to a message-driven bean in the EJB tier) when implementing a business process workflow or implementing an asynchronous Web service architecture. (See "Delegating Web Service Requests to Processing Layer" on page 92.)

Note that a generic XML-based application can additionally receive and return XML documents through a servlet over plain HTTP.

Recall from Chapter 3 that a Web service application must explicitly handle certain XML schemas—schemas for SOAP parameters that are not bound to Java objects and schemas of XML documents passed as attachments to SOAP messages. Since the JAX-RPC runtime passes SOAP parameter values (those that are not bound to Java objects) as SOAPElement document fragments, an application can consume and process them as DOM trees—and even programmatically bind them to Java objects—using XML data-binding techniques such as JAXB. Documents might be passed as attachments to SOAP messages when they are very large, legally binding, or the application processing requires the complete document infoset. Documents sent as attachments may also conform to schemas defined in languages not directly supported by the Web service endpoint.

Code Figure 2 and Code Figure 3 illustrate sending and receiving XML documents.

2. Sending an XML Document Through a Web Service Client Stub

public class SupplierOrderSender {
   private SupplierService_Stub supplierService;

   public SupplierOrderSender(URL serviceEndPointURL) {
      // Create a supplier Web service client stub
      supplierService = ...
   }

   // Submits a purchase order document to the supplier Web service
   public String submitOrder(Source supplierOrder)
      throws RemoteException, InvalidOrderException {
      String trackingNumber
         = supplierService.submitOrder(supplierOrder);
      return trackingNumber;
   }
}


3. Receiving an XML Document Through a Web Service Endpoint

public class SupplierServiceImpl implements SupplierService, ... {

   public String submitOrder(Source supplierOrder)
      throws InvalidOrderException, RemoteException {
      SupplierOrderRcvr supplierOrderRcvr
         = new SupplierOrderRcvr();
      // Delegate the processing of the incoming document
      return supplierOrderRcvr.receive(supplierOrder);
   }
}


JAX-RPC passes XML documents that are attachments to SOAP messages as abstract Source objects. Thus, you should assume no specific implementation—StreamSource, SAXSource, or DOMSource—for an incoming document. You should also not assume that the underlying JAX-RPC implementation will validate or parse the document before passing it to the Web service endpoint. The developer should programmatically ensure that the document is valid and conforms to an expected schema. (See the next section for more information about validation.) The developer should also ensure that the optimal API is used to bridge between the specific Source implementation passed to the endpoint and the intended processing model. See "Use the Most Appropriate API" on page 184.
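Because the endpoint cannot know which Source implementation it will receive, one way to bridge to a DOM-based processing model is an identity transformation. The following is a minimal sketch rather than a definitive implementation; the class name SourceBridge and its method are hypothetical, and only standard JAXP calls are used.

```java
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical helper: turns any Source implementation into a DOM Document.
final class SourceBridge {

    static Document toDocument(Source source) throws TransformerException {
        // Fast path: a DOMSource that already wraps a Document needs no copy.
        if (source instanceof DOMSource) {
            Node node = ((DOMSource) source).getNode();
            if (node instanceof Document) {
                return (Document) node;
            }
        }
        // General path: an identity transformation copies the content of a
        // StreamSource, SAXSource, or DOMSource into a new DOM tree.
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        return (Document) result.getNode();
    }
}
```

The fast path avoids an unnecessary copy when the runtime already hands over a DOM tree; whether that optimization matters depends on the JAX-RPC implementation in use.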

XML documents that are to be passed as attachments to SOAP operations can be produced using any XML processing model, provided the resulting document can be wrapped in a Source object. The underlying JAX-RPC runtime is in charge of attaching the passed document to the SOAP response message. Documents can also be exchanged through JMS: Code Figure 4 and Code Figure 5 show how to send and receive XML documents through a JMS queue.

4. Sending an XML Document to a JMS Queue

public class SupplierOrderRcvr {
   private QueueConnectionFactory queueFactory;
   private Queue queue;

   public SupplierOrderRcvr() throws RemoteException {
      queueFactory = ...; // Lookup queue factory
      queue = ...; // Lookup queue
      ...
   }

   public String receive(Source supplierOrder)
      throws InvalidOrderException {
      // Preprocess (validate and transform) the incoming document
      String document = ...
      // Extract the order id from the incoming document
      String orderId = ...
      // Forward the transformed document to the processing layer
      // using JMS (JMSException handling is omitted for brevity)
      QueueConnection connection
         = queueFactory.createQueueConnection();
      QueueSession session = connection.createQueueSession(...);
      QueueSender queueSender = session.createSender(queue);
      TextMessage message = session.createTextMessage();
      message.setText(document);
      queueSender.send(message);
      // Release the JMS connection once the message has been sent
      connection.close();
      return orderId;
   }
}


5. Receiving an XML Document Through a JMS Queue

public class SupplierOrderMDB
       implements MessageDrivenBean, MessageListener {
   private OrderFulfillmentFacadeLocal poProcessor = null;

   public SupplierOrderMDB() {}

   public void ejbCreate() {
      // Create a purchase order processor
      poProcessor = ...
   }

   // Receives the supplier purchase order document from the
   // Web service endpoint (interaction layer) through a JMS queue
   public void onMessage(Message msg) {
      try {
         String document = ((TextMessage) msg).getText();
         // Processes the XML purchase order received by the supplier
         String invoice = poProcessor.processPO(document);
         ...
      } catch (JMSException exception) {
         // getText() declares JMSException, which onMessage() cannot
         // propagate; report the failure to the container instead
         throw new EJBException(exception);
      }
   }
}


There are circumstances when a Web service may internally exchange XML documents through a JMS queue or topic. When implementing an asynchronous architecture, the interaction layer of a Web service may send XML documents asynchronously using JMS to the processing layer. Similarly, when a Web service implements a workflow, the components implementing the individual stages of the workflow may exchange XML documents using JMS. From a developer's point of view, receiving or sending XML documents through a JMS queue or topic is similar in principle to the case of passing documents as SOAP message attachments. XML documents can be passed through a JMS queue or topic as text messages or in a Java-serialized form when those documents can be bound to Java objects.

3 Validating XML Documents

Once a document has been received or produced, a developer may—and most of the time must—validate the document against the schema to which it is supposed to conform. Validation, an important step in XML document handling, may be required to guarantee the reliability of an XML application. An application may legitimately rely on the parser to do the validation and thus avoid performing such validation itself.

However, because of the limited capabilities of some schema languages, a valid XML document may still be invalid in the application's domain. This might happen, for example, when a document is validated against a DTD, because this schema language cannot express strongly typed data, complex uniqueness constraints, or cross-reference constraints. More modern schema languages, such as XSD, narrow the set of valid document instances more rigorously to those that the business logic can effectively process, although they still cannot express every business constraint. Regardless of the schema language, even when performing XML validation, the application is responsible for enforcing any domain-specific constraints that the schema does not cover and that the document may therefore violate. That is, the application may have to perform its own business logic-specific validation in addition to the XML validation.
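As an illustration of a constraint that a schema-valid document can still violate, the following sketch checks a cross-field rule after parsing. It is only an example under stated assumptions: the rule itself, the OrderDate element, and the InvoiceRules class are hypothetical, and both dates are assumed to be xsd:date values without a time zone.

```java
import java.io.StringReader;
import java.time.LocalDate;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical business-level check applied after XML validation.
final class InvoiceRules {

    // Business rule (assumed): an invoice cannot ship before it was ordered.
    static boolean shipsOnOrAfterOrder(Document invoice) {
        LocalDate ordered = LocalDate.parse(text(invoice, "OrderDate"));
        LocalDate shipped = LocalDate.parse(text(invoice, "ShippingDate"));
        return !shipped.isBefore(ordered);
    }

    // Returns the trimmed text of the first element with the given name.
    private static String text(Document doc, String elementName) {
        return doc.getElementsByTagName(elementName).item(0).getTextContent().trim();
    }

    // Convenience parser for in-memory documents.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }
}
```

Both sample documents in a test of this rule could be perfectly valid against a schema that types the two elements as dates; only the application can reject the one whose dates are inconsistent.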

Certain considerations help in deciding where and when to validate documents. Consider a system, by which we mean a set of applications that compose a solution and that define a boundary within which trusted components can exchange information. Within such a system, validation can be enforced according to the following observations. (See Figure 3.)

  1. Documents exchanged within the components of the system may not require validation.

  2. Documents coming from outside the system, especially when they do not originate from external trusted sources, must be validated on entry.

  3. Documents coming from outside the system, once validated, may be exchanged freely between internal components without further validation.

3. Validation of Incoming Documents

graphics/04fig03.gif


For example, a multitier e-business application that exchanges documents with trading partners through a front end enforces document validity at the front end. Not only does it check the validity of the document against its schema, but the application also ensures that the document type is a schema type that it can accept. It then may route documents to other applications or servers so that the proper services can handle them. Since they have already been validated, the documents do not require further validation. In a Web service, validation of incoming documents is typically performed in the interaction layer. Therefore, the processing layer may not have to validate documents it receives from the interaction layer.

Some applications may have to receive documents that conform to different schemas or different versions of a schema. In these cases, the application cannot do the validation up front against a specific schema unless the application is given a directive within the request itself about which schema to use. If no directive is included in the request, then the application has to rely on a hint provided by the document itself. Note that to deal with successive versioning of the same schema—where the versions actually modify the overall application's interface—it sometimes may be more convenient for an application to expose a separate endpoint for each version of the schema.
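One possible form of a "hint provided by the document itself" is the namespace of its root element. The sketch below reads only as far as the first start tag and maps that namespace to a supported schema. The namespace URIs, schema locations, and the SchemaSelector class are all hypothetical; a real application would use its own table of supported schemas.

```java
import java.io.StringReader;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical selector: picks the schema version from the root namespace.
final class SchemaSelector {

    // Assumed mapping from root-element namespace to a local schema copy.
    private static final Map<String, String> SUPPORTED = Map.of(
        "http://example.com/SupplierOrder/1.0", "schemas/SupplierOrder-1.0.xsd",
        "http://example.com/SupplierOrder/1.1", "schemas/SupplierOrder-1.1.xsd");

    // Returns the schema location to validate against, or null when the
    // document gives no usable hint or names an unsupported schema.
    static String selectSchema(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                // Stop at the first start tag, which is the root element.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String namespace = reader.getNamespaceURI();
                    if (namespace == null || namespace.isEmpty()) {
                        return null;
                    }
                    return SUPPORTED.get(namespace);
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

Returning null for an unknown namespace lets the caller reject the document before any validation is attempted, which keeps the set of accepted schemas explicit.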

To illustrate, an application must check that the document is validated against the expected schema, which is not necessarily the one to which the document declares it conforms. With DTD schemas, this check can be done only after validation. When using DOM, the application may retrieve the system or public identifier (SystemID or PublicID) of the DTD to ensure it is the identifier of the expected schema (Code Figure 6); when using SAX, the check can be done on the fly by handling the proper event. With JAXP 1.2 and XSD (or other non-DTD schema languages), the application can specify up front the schema to validate against (Code Figure 7); the application can even ignore the schema referred to by the document itself.

6. Ensuring the Expected Type of a DTD-Conforming Document

public static boolean checkDocumentType(Document document,
       String dtdPublicId) {
   DocumentType documentType = document.getDoctype();
   if (documentType != null) {
       String publicId = documentType.getPublicId();
       return publicId != null && publicId.equals(dtdPublicId);
   }
   return false;
}
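The SAX variant of this check, done on the fly, might look like the following sketch. It relies on the SAX2 LexicalHandler event startDTD, registered through the standard lexical-handler property. The class name DoctypeSniffer is hypothetical, and the entity resolver that substitutes an empty DTD is a simplification for the example so that no external DTD is fetched.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: records the public identifier declared in the DOCTYPE.
final class DoctypeSniffer extends DefaultHandler2 {
    private String publicId;

    // Called by the parser when it encounters the DOCTYPE declaration.
    @Override
    public void startDTD(String name, String publicId, String systemId) {
        this.publicId = publicId;
    }

    static boolean hasExpectedDoctype(String xml, String expectedPublicId)
            throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        DoctypeSniffer sniffer = new DoctypeSniffer();
        // LexicalHandler events are delivered through this SAX2 property.
        reader.setProperty("http://xml.org/sax/properties/lexical-handler", sniffer);
        // Example-only simplification: replace any external entity with an
        // empty stream instead of fetching it.
        reader.setEntityResolver((pub, sys) -> new InputSource(new StringReader("")));
        reader.parse(new InputSource(new StringReader(xml)));
        return expectedPublicId.equals(sniffer.publicId);
    }
}
```

A document without a DOCTYPE declaration never triggers startDTD, so the recorded identifier stays null and the check fails, mirroring the behavior of the DOM version.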


7. Setting the Parser for Validation in JAXP 1.2

public static final String W3C_XML_SCHEMA
   = "http://www.w3.org/2001/XMLSchema";
public static final String JAXP_SCHEMA_LANGUAGE
   = "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
public static final String JAXP_SCHEMA_SOURCE
   = "http://java.sun.com/xml/jaxp/properties/schemaSource";

public static SAXParser createParser(boolean validating,
       boolean xsdSupport, CustomEntityResolver entityResolver,
       String schemaURI) throws ... {
   // Obtain a SAX parser from a SAX parser factory
   SAXParserFactory parserFactory
      = SAXParserFactory.newInstance();
   // Enable validation
   parserFactory.setValidating(validating);
   parserFactory.setNamespaceAware(true);
   SAXParser parser = parserFactory.newSAXParser();
   if (xsdSupport) { // XML Schema Support
      try {
         // Enable XML Schema validation
         parser.setProperty(JAXP_SCHEMA_LANGUAGE,
            W3C_XML_SCHEMA);
         // Set the validating schema to the resolved schema URI
         parser.setProperty(JAXP_SCHEMA_SOURCE,
            entityResolver.mapEntityURI(schemaURI));
      } catch(SAXNotRecognizedException exception) { ... }
   }
   return parser;
}
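Later JAXP releases (1.3 and up) also provide the javax.xml.validation package, which separates schema compilation from parsing and makes up-front validation against a known schema more direct. The following is a minimal sketch of that alternative, not the approach used in the listing above; the UpFrontValidator class name is hypothetical.

```java
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical wrapper: validates documents against one application-chosen schema.
final class UpFrontValidator {
    // A compiled Schema is immutable and can be shared across threads.
    private final Schema schema;

    UpFrontValidator(Source schemaSource) throws SAXException {
        SchemaFactory factory =
            SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        this.schema = factory.newSchema(schemaSource);
    }

    // Returns true when the document conforms to the schema chosen by the
    // application, regardless of any schema hint inside the document.
    boolean isValid(Source document) throws IOException {
        try {
            // Without a custom ErrorHandler, validation errors are thrown
            // as SAXException.
            schema.newValidator().validate(document);
            return true;
        } catch (SAXException invalid) {
            return false;
        }
    }
}
```

Compiling the schema once and creating a new Validator per document keeps the per-request cost low, since Validator instances are not thread-safe but Schema instances are.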


When relying on the schemas to which documents internally declare they are conforming (through a DTD declaration or an XSD hint), for security and to avoid external malicious modification, you should keep your own copy of the schemas and validate against these copies. This can be done using an entity resolver, an implementation of the SAX interface org.xml.sax.EntityResolver that forcibly maps references to well-known external schemas to secured copies.
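A resolver of this kind can be quite small. The sketch below maps public identifiers to secured copies and refuses anything it does not know. The class name SecuredSchemaResolver, its register method, and the decision to reject unknown entities outright are assumptions made for the example.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: redirects known schemas to secured copies.
final class SecuredSchemaResolver implements EntityResolver {
    // Public identifier -> location of the secured copy.
    private final Map<String, String> securedCopies = new HashMap<>();

    void register(String publicId, String securedLocation) {
        securedCopies.put(publicId, securedLocation);
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        String secured = securedCopies.get(publicId);
        if (secured == null) {
            // Design choice for this sketch: never fall back to the
            // location named by the document.
            throw new SAXException(
                "Unsupported external entity: " + publicId + " / " + systemId);
        }
        // The system identifier supplied by the document is ignored.
        InputSource source = new InputSource(secured);
        source.setPublicId(publicId);
        return source;
    }
}
```

An instance would typically be installed with XMLReader.setEntityResolver or DocumentBuilder.setEntityResolver before parsing.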

To summarize these recommendations:

  • Validate incoming documents at the system boundary, especially when documents come from untrusted sources.

  • When possible, enforce validation up front against the supported schemas.

  • When relying on internal schema declarations (DTD declaration, XSD hint, and so forth):

      • Reroute external schema references to secured copies.

      • Check that the validating schemas are supported schemas.

4 Mapping Schemas to the Application Data Model

After defining the application interface and the schemas of the documents to be consumed and produced, the developer has to define how the document schemas relate or map to the data model on which the application applies its business logic. We refer to these document schemas as external schemas. These schemas may be specifically designed to meet the application's requirements, such as when no preexisting schemas are available, or they may be imposed on the developer. The latter situation, for example, may occur when the application intends to be part of an interacting group within an industry promoting standard vertical schemas. (For example, UBL or ebXML schemas.)

4.1 Mapping Design Strategies

Depending on an application's requirements, there are three main design strategies or approaches for mapping schemas to the application data model. (See Figures 4, 5, and 6.)

  1. An "out-to-in" approach— The developer designs the internal data model based on the external schemas.

  2. A "meet-in-the-middle" approach— The developer designs the data model along with an internal generic matching schema. Afterwards, the developer defines transformations on the internal schema to support the external schemas.

  3. An "in-to-out" approach, or legacy adapter— This approach is actually about how to map an application data model to schemas. The developer designs the exposed schema from an existing data model.

4. Out-to-In Approach for Mapping Schemas to the Data Model Classes

graphics/04fig04.gif


Figures 4, 5, and 6 show the sequencing of the activities involved at design time and the artifacts (schemas and classes) used or produced by these activities. The figures also show the relationships between these artifacts and the runtime entities (documents and objects), as well as the interaction at runtime between these entities.

5. Meet-in-the-Middle Approach for Mapping Schemas to Data Model Classes

graphics/04fig05.gif


6. Legacy Adapter (In-to-Out) Approach for Mapping Schemas to Data Model Classes

graphics/04fig06.gif


The first approach (Figure 4), which introduces a strong dependency between the application's data model and logic and the external schemas, is suitable only for applications dedicated to supporting a specific interaction model. A strong dependency such as this implies that evolving or revising the external schemas impacts the application's data model and its logic.

The second approach (Figure 5), which introduces a transformation or adaptation layer between the application's data model and logic and the external schemas, is particularly suitable for applications that may have to support different external schemas. Having a transformation layer leaves room for supporting additional schemas and is a natural way to account for the evolution of a particular schema. The challenge is to devise an internal schema that is sufficiently generic and that not only shields the application from external changes (in number and revision) but also allows the application to fully operate and interoperate. Typically, such an internal schema either maps to a minimal operational subset or common denominator of the external schemas, or it maps to a generic, universal schema of the application's domain. (UBL is an example of the latter case.) The developer must realize that such an approach has some limitations: it is easier to transform from a structure containing more information to one with less information than the reverse. Therefore, the choice of the generic internal schema is key to this approach. Code Figure 8 shows a stylesheet that transforms a document conforming to an external, XSD-based schema into a document conforming to an internal, DTD-based schema.

8. Stylesheet for Transforming from External XSD-Based Schema to Internal DTD-Based Schema

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:so="http://blueprints.j2ee.sun.com/SupplierOrder"
   xmlns:li="http://blueprints.j2ee.sun.com/LineItem"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   version="1.0">

   <xsl:output method="xml" indent="yes" encoding="UTF-8"
      doctype-public="-//Sun Microsystems, Inc. - J2EE Blueprints Group//DTD SupplierOrder 1.1//EN"
      doctype-system="/com/sun/j2ee/blueprints/supplierpo/rsrc/schemas/SupplierOrder.dtd" />

   <xsl:template match="/">
      <SupplierOrder>
         <OrderId><xsl:value-of select="/so:SupplierOrder/so:OrderId" /></OrderId>
         <OrderDate><xsl:value-of select="/so:SupplierOrder/so:OrderDate" /></OrderDate>
         <xsl:apply-templates select=".//so:ShippingAddress|.//li:LineItem"/>
      </SupplierOrder>
   </xsl:template>

   <xsl:template match="/so:SupplierOrder/so:ShippingAddress">
      ...
   </xsl:template>

   <xsl:template match="/so:SupplierOrder/so:LineItems/li:LineItem">
      ...
   </xsl:template>
</xsl:stylesheet>
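A stylesheet such as this one is typically applied from Java through the JAXP transformation API. The following sketch shows one way a transformation layer could wrap it; the SchemaAdapter class and its method name are hypothetical, while the javax.xml.transform calls are standard.

```java
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;

// Hypothetical adaptation layer: converts external documents to the internal form.
final class SchemaAdapter {
    // Templates is the compiled, thread-safe form of a stylesheet.
    private final Templates templates;

    SchemaAdapter(Source stylesheet) throws TransformerException {
        templates = TransformerFactory.newInstance().newTemplates(stylesheet);
    }

    // Transforms a document conforming to an external schema into the
    // internal representation, returned here as a string.
    String toInternal(Source externalDocument) throws TransformerException {
        StringWriter out = new StringWriter();
        // Transformer instances are not thread-safe, so create one per call.
        templates.newTransformer().transform(externalDocument, new StreamResult(out));
        return out.toString();
    }
}
```

Supporting another external schema then amounts to constructing another SchemaAdapter with a different stylesheet, leaving the rest of the application untouched.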


Normally, developers should begin this design process starting from the application's interface definition plus the XML schemas. In some situations, a developer may have to work in reverse; that is, start from the inside and work out. See the third approach, shown in Figure 6.

The developer may have to use the application's data model to create a set of matching schemas, which would then be exposed as part of the application's interface. This third approach is often used when existing or legacy applications need to expose an XML-based interface to facilitate a loosely coupled integration in a broader enterprise system. This technique is also known as legacy adapters or wrappers. In these cases, the application's implementation determines the interface and the schemas to expose. In addition, this approach can be combined with the meet-in-the-middle approach to provide a proper adaptation layer, which in turn makes available a more interoperable interface; that is, an interface that is not so tightly bound to the legacy application. See Chapter 6 for more information on application integration.

4.2 Flexible Mapping

In complement to these approaches, it is possible to map complete documents or map just portions of documents. Rather than a centralized design for mapping from an external schema to a well-defined internal schema, developers can use a decentralized design where components map specific portions of a document to an adequate per-component internal representation. Different components may require different representations of the XML documents they consume or produce. Such a decentralized design allows for flexible mapping where:

  • A component may not need to know the complete XML document. A component may be coded against just a fragment of the overall document schema.

  • The document itself may be the persistent core representation of the data model. Each component maps only portions of the document to transient representation in order to apply their respective logic and then modifies the document accordingly.

  • Even if the processing model is globally document-centric (see "Choosing Processing Models" on page 151), each component can—if adequate—locally implement a more object-centric processing model by mapping portions of the document to domain-specific objects.

  • Each component can handle the document using the most effective or suitable XML processing technique. (See "Choosing an XML Processing Programming Model" on page 164.)

This technique is particularly useful for implementing a document-oriented workflow where components exchange or have access to entire documents but only manipulate portions of the documents. For example, Figure 7 shows how a PurchaseOrder document may sequentially go through all the stages of a workflow. Each stage may process specific information within the document. A credit card processing stage may only retrieve the CreditCard element from the PurchaseOrder document. Upon completion, a stage may "stamp" the document by inserting information back into the document. In the case of a credit card processing stage, the credit card authorization date and status may be inserted back into the PurchaseOrder document.

7. Flexible Mapping Applied to a Travel Service Scenario

graphics/04fig07.gif
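The credit card stage described above could be sketched with DOM and XPath as follows. This is an illustration only: the Number and Authorization element names, the attribute names, the placeholder approval rule, and the CreditCardStage class are all hypothetical. Only the PurchaseOrder and CreditCard names come from the scenario.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical workflow stage: reads one fragment and stamps the document.
final class CreditCardStage {

    static void process(Document purchaseOrder, String today) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // This stage knows only about the CreditCard fragment.
        Element card = (Element) xpath.evaluate(
            "/PurchaseOrder/CreditCard", purchaseOrder, XPathConstants.NODE);
        if (card == null) {
            throw new IllegalArgumentException("No CreditCard element");
        }
        String number = xpath.evaluate("Number", card);
        // Placeholder for the real authorization logic.
        boolean approved = !number.isEmpty();
        // Stamp the document with the outcome; the rest is left untouched.
        Element stamp = purchaseOrder.createElement("Authorization");
        stamp.setAttribute("date", today);
        stamp.setAttribute("status", approved ? "APPROVED" : "DENIED");
        card.appendChild(stamp);
    }

    // Convenience parser for in-memory documents.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }
}
```

Because the stage addresses its fragment by path, the other parts of the PurchaseOrder schema can evolve without requiring changes to this component.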


4.3 XML Componentization

For a document-centric processing model, especially when processing documents in the EJB tier, you may want to create generic, reusable components whose state is serializable to and from XML. (See "Designing Domain-Specific XML Schemas" on page 131.) For example, suppose your application works with an Address entity bean whose instances are initialized with information retrieved from various XML documents, such as purchase order and invoice documents. Although the XML documents conform to different schemas, you want to use the same component—the same Address bean—without modification regardless of the underlying supported schema.

A good way to address this issue is to design a generic XML schema into which your component state can be serialized. From this generic schema, you can generate XML-serializable domain-specific or content objects that handle the serialization of your component state.

You can generate the content objects manually or automatically by using XML data-binding technologies such as JAXB. Furthermore, you can combine these XML-serializable components into composite entities with corresponding composite schemas.

When combined with the "meet-in-the-middle" approach discussed previously, you can apply XSLT transformations to convert XML documents conforming to external vertical schemas into your application's supported internal generic schemas. Transformations can also be applied in the opposite direction to convert documents from internal generic schemas to external vertical schemas.

For example, Figure 8, which illustrates XML componentization, shows a PurchaseOrderBean composite entity and its two components, AddressBean and LineItemBean. The schemas for the components are composed in the same way and form a generic composite schema. Transformations can be applied to convert the internal generic composite PurchaseOrder schema to and from several external vertical schemas. Supporting an additional external schema is then just a matter of creating a new stylesheet.

8. Composite XML-Serializable Component

graphics/04fig08.gif
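A hand-written content object for the Address component might look like the sketch below. The element names of the generic schema (Address, Street, City) and the plain Address class are assumptions for the example; a data-binding tool such as JAXB would generate comparable code.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical XML-serializable component state.
final class Address {
    final String street;
    final String city;

    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    // Serializes the state into a generic Address fragment created by the
    // given document, so a composite can append it to its own element.
    Element toXml(Document owner) {
        Element address = owner.createElement("Address");
        address.appendChild(child(owner, "Street", street));
        address.appendChild(child(owner, "City", city));
        return address;
    }

    // Rebuilds the state from a fragment conforming to the generic schema.
    static Address fromXml(Element address) {
        return new Address(text(address, "Street"), text(address, "City"));
    }

    private static Element child(Document owner, String name, String value) {
        Element element = owner.createElement(name);
        element.setTextContent(value);
        return element;
    }

    private static String text(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent();
    }
}
```

A composite such as a purchase order can build its own element and append the fragments returned by its components, which is how the component schemas compose into a composite schema.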


5 Choosing Processing Models

An XML-based application may either apply its business logic directly on consumed or produced documents, or it may apply its logic on domain-specific objects that completely or partially encapsulate the content of such documents. Domain-specific objects are Java objects that may not only encapsulate application domain-specific data, but may also embody application domain-specific behavior.

An application's business logic may directly handle documents it consumes or produces, which is called a document-centric processing model, if the logic:

  • Relies on both document content and structure

  • Is required to make localized modifications to incoming documents while preserving most of their original form, including comments, external entity references, and so forth

In a document-centric processing model, the document processing may be entangled with the business logic and may therefore introduce strong dependencies between the business logic and the schemas of the consumed and produced documents—the "meet-in-the-middle" approach (discussed in "Mapping Schemas to the Application Data Model" on page 143) may, however, alleviate this problem. Moreover, the document-centric processing model does not promote a clean separation between business and XML programming skills, especially when an application developer who is more focused on the implementation of the business logic must additionally master one or several of the XML processing APIs.

There are cases that require a document-centric processing model, such as:

  • The schema of the processed documents is only partially known and therefore cannot be completely bound to domain-specific objects; the application edits only the known part of the documents before forwarding them for further processing.

  • Because the schemas of the processed documents may vary or change greatly, it is not possible to hard-code or generate the binding of the documents to domain-specific objects; a more flexible solution is required, such as one using DOM with XPath.
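
A minimal sketch of the DOM-with-XPath approach, under assumed names, might look like the following. The stage edits only the part of the document it knows about, located by an XPath expression, and forwards everything else as is. The OrderStatusStage class and the /Order/Status path are hypothetical; javax.xml.xpath is the standard JAXP XPath API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical workflow stage that knows only about /Order/Status.
class OrderStatusStage {

    // Updates the single node selected by the XPath expression and
    // serializes the whole tree again; unknown elements and comments
    // stay in the DOM tree and are written back out.
    static String setStatus(String xml, String status) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        Node node = (Node) xpath.evaluate("/Order/Status", doc, XPathConstants.NODE);
        if (node == null) {
            throw new IllegalArgumentException("No /Order/Status element");
        }
        node.setTextContent(status);

        // An identity transformation serializes the modified DOM tree.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Since the XPath expression is just a string, it could be read from configuration rather than compiled into the code, which is one way to cope with schemas that vary or change.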

A typical document-centric example is an application that implements a data-driven workflow: Each stage of the workflow processes only specific information from the incoming document contents, and there is no central representation of the content of the incoming documents. A stage of the workflow may receive a document from an earlier stage, extract information from the document, apply some business logic on the extracted information, and potentially modify the document before sending it to the next stage.

Generally, it is best to have the application's business logic directly handle documents only in exceptional situations, and to do so with great care. You should instead consider applying the application's business logic on domain-specific objects that completely or partially encapsulate the content of consumed or produced documents. This helps to isolate the business logic from the details of XML processing.

Keep in mind that schema-derived classes, which are generated by JAXB and other XML data-binding technologies (see "XML Data-Binding Programming Model" on page 169), usually completely encapsulate the content of a document. While these schema-derived classes isolate the business logic from the XML processing details—specifically parsing, validating, and building XML documents—they still introduce strong dependencies between the business logic and the schemas of the consumed and produced documents. Because of these strong dependencies, and because they may still retain some document-centric characteristics (especially constraints), applications may still be considered document centric when they apply business logic directly on classes generated by XML data-binding technologies from the schemas of consumed and produced documents. To change to a pure object-centric model, the developer may move the dependencies on the schemas down by mapping schema-derived objects to domain-specific objects. The domain-specific object classes expose a constant, consistent interface to the business logic but internally delegate the XML processing details to the schema-derived classes. Overall, such a technique reduces the coupling between the business logic and the schema of the processed documents. "Abstracting XML Processing from Application Logic" on page 155 discusses a generic technique for decoupling the business logic and the schema of the processed documents.
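
The mapping from schema-derived objects to domain-specific objects might be sketched as follows. PurchaseOrderType here is a hand-written stand-in for a class that a data-binding tool would generate, and PurchaseOrder is a hypothetical domain-specific class; all field and method names are assumptions for illustration.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Stand-in for a schema-derived class. A generated class would mirror
// the schema's element structure in the same way.
class PurchaseOrderType {
    String orderId;
    List<ItemType> item = new ArrayList<>();

    static class ItemType {
        BigDecimal unitPrice;
        int quantity;

        ItemType(BigDecimal unitPrice, int quantity) {
            this.unitPrice = unitPrice;
            this.quantity = quantity;
        }
    }
}

// Domain-specific object: exposes a stable interface and domain
// behavior to the business logic, and delegates to the schema-derived
// object internally.
class PurchaseOrder {
    private final PurchaseOrderType binding;

    PurchaseOrder(PurchaseOrderType binding) {
        this.binding = binding;
    }

    String getId() {
        return binding.orderId;
    }

    // Domain behavior that the schema-derived class does not provide.
    BigDecimal getTotal() {
        BigDecimal total = BigDecimal.ZERO;
        for (PurchaseOrderType.ItemType i : binding.item) {
            total = total.add(i.unitPrice.multiply(BigDecimal.valueOf(i.quantity)));
        }
        return total;
    }
}
```

If the schema changes, only the mapping inside PurchaseOrder would need to follow; business logic written against getId() and getTotal() would be unaffected.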

A pure object-centric processing model requires XML-related issues to be kept at the periphery of an application (that is, in the Web service interaction layer closest to the service endpoint or, for more classical applications, in the Web tier). In this case, XML serves only as an additional presentation medium for the application's inputs and outputs. When implementing a document-oriented workflow in the processing layer of a Web service, or when implementing the asynchronous Web service interaction layer presented in "Delegating Web Service Requests to Processing Layer" on page 92, an object-centric processing model may still be enforced by keeping the XML-related issues within the message-driven beans that exchange documents.

Note that the object-centric and document-centric processing models are not mutually exclusive. When using the flexible mapping technique mentioned earlier, an application may be globally document-centric and exchange documents between its components, while some components locally process part of the documents using an object-centric processing model. Each component may use the processing model most appropriate for its function.

6 Fragmenting Incoming XML Documents

When your service's business logic operates on the contents of an incoming XML document, it is a good idea to break XML documents into logical fragments when appropriate. The processing logic receives an XML document containing all information for processing a request. However, the XML document usually has well-defined segments for different entities, and each segment contains the details about a specific entity.

Rather than pass the entire document to different components handling various stages of the business process, it's best if the processing logic breaks the document into fragments and passes only the required fragments to other components or services that implement portions of the business process logic.

Figure 9 shows how the processing layer might process an XML document representing an incoming purchase order for a travel agency Web service. The document contains details such as account information, credit card data, travel destinations, dates, and so forth. The business logic involves verifying the account, authorizing the credit card, and filling the airline and hotel portions of the purchase order. It is not necessary to pass all the document details to a business process stage that is only performing one piece of the business process, such as account verification. Passing the entire XML document to all stages of the business process results in unnecessary information flows and extra processing. It is more efficient to extract the logical fragments (account fragment, credit card fragment, and so forth) from the incoming XML document and then pass these individual fragments to the appropriate business process stages in whatever format (DOM tree, Java object, serialized XML, and so forth) the receiver expects.

Figure 9. Fragmenting an Incoming XML Document for a Travel Service
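
One possible way to implement the fragmenting step is sketched below: the dispatching logic parses the incoming document once and hands each stage a standalone DOM document containing only its own fragment. The OrderFragmenter class and the element names used with it (TravelOrder, Account, CreditCard, and so on) are assumptions for illustration, and the sketch assumes each top-level element name occurs only once.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical dispatcher helper that splits an order into fragments.
class OrderFragmenter {

    // Parses the incoming order once and returns each top-level child
    // element as its own DOM document, keyed by element name.
    static Map<String, Document> fragment(String orderXml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document order = builder.parse(new InputSource(new StringReader(orderXml)));

        Map<String, Document> fragments = new LinkedHashMap<>();
        for (Node n = order.getDocumentElement().getFirstChild();
             n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                // Deep-copy the element into a fresh document so the
                // receiving stage sees nothing but its own fragment.
                Document fragment = builder.newDocument();
                fragment.appendChild(fragment.importNode(n, true));
                fragments.put(n.getNodeName(), fragment);
            }
        }
        return fragments;
    }
}
```

With this arrangement the account verification stage would receive only the Account fragment, so credit card data never reaches it. A stage that expects a Java object or serialized XML instead of a DOM tree would need an extra conversion step in the dispatcher.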


While it is complementary to most of the mapping design strategies presented in "Mapping Design Strategies" on page 143, this technique is best compared against the flexible mapping design strategy. (See "Flexible Mapping" on page 148.) Flexible mapping advocates a decentralized mapping approach: Components or stages in a workflow each handle the complete incoming document, but each stage only processes the appropriate part of the document. Fragmenting an incoming document can be viewed as a centralized implementation of the flexible mapping design. Fragmenting an incoming document, by suppressing redundant parsing of the incoming document and limiting the exchanges between stages to the strictly relevant data, improves performance over a straightforward implementation of flexible mapping. However, it loses some flexibility because the workflow dispatching logic is required to specifically know about (and therefore depend on) the document fragments and formats expected by the different stages.

Fragmenting a document has the following benefits:

  • It avoids extra processing and exchange of superfluous information throughout the workflow.

  • It maximizes privacy because it limits sending sensitive information through the workflow.

  • It centralizes some of the XML processing tasks of the workflow and therefore simplifies the overall implementation of the workflow.

  • It makes workflow error handling more flexible, since each stage handles only business logic-related errors while the workflow dispatching logic handles document parsing and validation errors.

7 Abstracting XML Processing from Application Logic

As mentioned earlier, the developer of an XML-based application, and more specifically of a Web service application, may have to explicitly handle XML in the following layers of the application:

  • In the interaction layer of a Web service in order to apply some pre- or post-processing, such as XML validation and transformation to the exchanged documents. (See "Receiving Requests" on page 89 and "Formulating Responses" on page 98.) Moreover, when the processing layer of a Web service implements an object-centric processing model, the interaction layer may be required to map XML documents to or from domain-specific objects before delegating to the processing layer by using one of the three approaches for mapping XML schemas to the application data model presented in "Mapping Schemas to the Application Data Model" on page 143.

  • In the processing layer of a Web service when implementing a document-centric processing model. (See "Handling XML Documents in a Web Service" on page 105.) In such a case, the processing layer may use techniques such as flexible mapping or XML componentization. See "Flexible Mapping" on page 148 and "XML Componentization" on page 149.

With the object-centric processing model—when XML document content is mapped to domain-specific objects—the application applies its business logic on the domain-specific objects rather than the documents. In this case, only the interaction logic may handle documents. However, in the document-centric model, the application business logic itself may directly have to handle the documents. In other words, some aspects of the business model may be expressed in terms of the documents to be handled.

There are drawbacks to expressing the business model in terms of the documents to be handled. Doing so may clutter the business logic with document processing-related logic, which should be hidden from application developers who are more focused on the implementation of the business logic. It also introduces strong dependencies between the document's schemas and the business logic, and this may cause maintainability problems, particularly when handling additional schemas or supporting new versions of an original schema (even though those are only internal schemas to which documents originally conforming to external schemas have been converted). Additionally, since there are a variety of APIs that support various XML processing models, such a design may lock the developer into one particular XML-processing API. It may also make integrating components that use disparate processing models or APIs both difficult and inefficient.

The same concerns—about maintainability in the face of evolution and the variety of XML processing models or APIs—apply to some extent for the logic of the Web service interaction layer, which may be in charge of validating exchanged documents, transforming them from external schemas to internal schemas and, in some cases, mapping them to domain-specific objects.

For example, consider a system processing a purchase order that sends the order to a supplier warehouse. The supplier, to process the order, may need to translate the incoming purchase order from the external, agreed-upon schema (such as an XSD-based schema) to a different, internal purchase order schema (such as a DTD-based schema) supported by its components. Additionally, the supplier may want to map the purchase order document to a purchase order business object. The business logic handling the incoming purchase order must use an XML-processing API to extract the information from the document and map it to the purchase order entity. In such a case, the business logic may be mixed with the document-handling logic. If the external purchase order schema evolves or if an additional purchase order schema needs to be supported, the business logic will be impacted. Similarly, if for performance reasons you are required to revisit your choice of the XML-processing API, the business logic will also be impacted. The initial choice of XML-processing API may handicap the integration of other components that need to retrieve part or all of the purchase order document from the purchase order entity.

The design shown in Figure 10, which we refer to as the XML document editor (XDE) design, separates application logic (business or interaction logic) from document processing logic. Following a design such as this helps avoid the problems just described.

Figure 10. Basic Design of an XML Document Editor


The term "Editor" used here refers to the capability to programmatically create, access, and modify—that is, edit—XML documents. The XML document editor design is similar to the data access object design strategy, which abstracts database access code from a bean's business logic.

The XML document editor implements the XML document processing using the most relevant API, but exposes only methods relevant to the application logic. Additionally, the XML document editor should provide methods to set or get documents to be processed, but should not expose the underlying XML processing API. These methods should use the Source interface (and Result interface) from the JAXP API, in a fashion similar to JAX-RPC, to ensure that the underlying XML-processing API remains hidden. If requirements change, you can easily switch to a different XML processing technique without modifying the application logic. Also, a business object (such as an enterprise bean) that processes XML documents through an XML document editor should itself only expose accessor methods that use the JAXP Source or Result interface.

Moreover, a business object or a service endpoint can use different XML document editor design strategies, combined with other strategies for creating factory methods or abstract factories (strategies for creating new objects where the instantiation of those objects is deferred to a subclass), to uniformly manipulate documents that conform to different schemas. The business object can invoke a factory class to create instances of different XML document editor implementations depending on the schema of the processed document. This is an alternate approach to applying transformations for supporting several external schemas.

Figure 10 shows the class diagram for a basic XML document editor design, while Figure 11 shows the class diagram for an XML document editor factory design. You should consider using a similar design in the following situations:

  • When you want to keep the business objects focused on business logic and keep code to interact with XML documents separate from business logic code.

  • In a similar way, when you want to keep the Web service endpoints focused on interaction logic and keep code to pre- and post-process XML documents separate from interaction logic code.

  • When you want to implement a flexible mapping design where each component may manipulate a common document in the most suitable manner for itself.

  • When requirements might evolve (such as a new schema to be supported or a new version of the same schema) to where they would necessitate changes to the XML-processing implementation. Generally, you do not want to alter the application logic to accommodate these XML-processing changes. Additionally, since several XML-processing APIs (SAX, DOM, XSLT, JAXB technology, and so forth) may be relevant, you want to allow for subsequent changes to later address such issues as performance and integration.

  • When different developer skill sets exist. For example, you may want the business domain and XML-processing experts to work independently. Or, you may want to leverage particular skill sets within XML-processing techniques.

Figure 11. Factory Design to Create Schema-Specific XML Document Editors
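
The factory design could be sketched roughly as follows. The OrderEditor interface, the two editor implementations, the schema identifiers, and the document layouts they assume are all hypothetical; only the use of the JAXP Source interface at the boundary follows the recommendation in the text.

```java
import javax.xml.transform.Source;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import org.w3c.dom.Document;

// Hypothetical editor contract: documents go in as a JAXP Source, and
// only application-level values come out.
interface OrderEditor {
    void setDocument(Source source) throws Exception;
    String getOrderId();
}

// Shared plumbing: copies any kind of Source into a DOM tree. The
// choice of DOM stays hidden from callers.
abstract class DomOrderEditor implements OrderEditor {
    protected Document doc;

    public void setDocument(Source source) throws Exception {
        DOMResult result = new DOMResult();
        TransformerFactory.newInstance().newTransformer().transform(source, result);
        doc = (Document) result.getNode();
    }
}

// Assumed schema 1: <PurchaseOrder orderId="..."/>
class AttributeOrderEditor extends DomOrderEditor {
    public String getOrderId() {
        return doc.getDocumentElement().getAttribute("orderId");
    }
}

// Assumed schema 2: <Order><Id>...</Id></Order>
class ElementOrderEditor extends DomOrderEditor {
    public String getOrderId() {
        return doc.getElementsByTagName("Id").item(0).getTextContent();
    }
}

// Factory: selects the editor implementation from a schema identifier.
class OrderEditorFactory {
    static OrderEditor forSchema(String schemaId) {
        switch (schemaId) {
            case "internal-po":
                return new AttributeOrderEditor();
            case "partner-order":
                return new ElementOrderEditor();
            default:
                throw new IllegalArgumentException("Unsupported schema: " + schemaId);
        }
    }
}
```

The calling business object or endpoint works only against OrderEditor, so supporting a further schema would mean adding one editor class and one factory case, without touching the application logic.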


Figure 12 and Code Example 9 give an example of a supplier Web service endpoint using an XML document editor to preprocess incoming purchase order documents.

Code Example 9. Supplier Service Endpoint Using XML Document Editor

import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class SupplierOrderXDE extends
          XMLDocumentEditor.DefaultXDE {
   public static final String DEFAULT_ENCODING = "UTF-8";
   private Source source = null;
   private String orderId = null;

   public SupplierOrderXDE(boolean validating, ...) {
      // Initialize XML processing logic
   }

   // Sets the document to be processed
   public void setDocument(Source source) throws ... {
      this.source = source;
   }

   // Invokes XML processing logic to validate the source document,
   // extract its orderId, transform it into a different format,
   // and copy the resulting document into the Result object
   public void copyDocument(Result result) throws ... {
      orderId = null;
      // XML processing...
   }

   // Returns the processed document as a Source object
   public Source getDocument() throws ... {
      return new StreamSource(new StringReader(
          getDocumentAsString()));
   }

   // Returns the processed document as a String object
   public String getDocumentAsString() throws ... {
      ByteArrayOutputStream stream = new ByteArrayOutputStream();
      copyDocument(new StreamResult(stream));
      return stream.toString(DEFAULT_ENCODING);
   }

   // Returns the orderId value extracted from the source document
   public String getOrderId() {
      return orderId;
   }
}


To summarize, it is recommended that you use a design similar to the XML Document Editor presented above to abstract and encapsulate all XML document processing. In turn, the business object or service endpoint using such a document editor only invokes the simple API provided by the document editor. This hides all the complexities and details of interacting with the XML document from the business object clients.

Figure 12. Class Diagram of Supplier Service Using XML Document Editor


As noted earlier, this design is not limited to the document-centric processing model where the application applies its business logic on the document itself. In an object-centric processing model, document editors can be used by the Web service interaction layer closest to the service endpoint, to validate, transform, and map documents to or from domain-specific objects. In this case, using the document editor isolates the interaction logic from the XML processing logic.

8 Design Recommendation Summary

When you design an XML-based application, specifically one that is a Web service, you must make certain decisions concerning the processing of the content of incoming XML documents. Essentially, you decide the "how, where, and what" of the processing: which technology to use, where to perform the processing, and what form the processed content takes.

In summary, keep in mind the following recommendations:

  • When designing application-specific schemas, promote reuse, modularization, and extensibility, and leverage existing vertical and horizontal schemas.

  • When implementing a pure object-centric processing model, keep XML on the boundary of your system as much as possible (that is, in the Web service interaction layer closest to the service endpoint or, for more classical applications, in the presentation layer). Map document content to domain-specific objects as soon as possible.

  • When implementing a document-centric processing model, consider using the flexible mapping technique. This technique allows the different components of your application to handle XML in a way that is most appropriate for each of them.

  • Strongly consider validation at system entry points of your processing model, specifically validation of input documents where the source is not trusted.

  • When consuming or producing documents, as much as possible express your documents in terms of abstract Source and Result objects that are independent from the actual XML-processing API you are using.

  • Consider a "meet-in-the-middle" mapping design strategy when you want to decouple the application data model from the external schema that you want to support.

  • Abstract XML processing from the business logic processing using the XML document editor design strategy. This promotes separation of skills and independence from the actual API used.

