XML Messaging


As we explained in Chapter 1, XML messaging is one of the most important and promising application areas of XML. In particular, XML messaging is expected to play a central role in business-to-business (B2B) collaboration, in which the applications of different businesses exchange XML documents. Furthermore, even within large businesses, where Enterprise Application Integration (EAI) is a concern, legacy applications are being integrated with XML messaging in a loosely coupled manner. In this section, we review why XML messaging is receiving considerable attention, relating it to distributed computing. We also show how to structure XML messaging and discuss why the Simple Object Access Protocol (SOAP) is so important in XML messaging. SOAP itself is reviewed in detail in Section 12.2.

1 Distributed Computing and Interoperability

When computers were first introduced into industry, each company tried to computerize its in-house business operations independently of the others. Until only a decade ago, most of these in-house application systems were not connected with each other. As a result, people had to perform daily B2B operations by phone or fax. Obviously, this way of doing business is not cost-effective, and businesses can easily lose opportunities or incur increased costs because of human error. Since then, computerizing B2B operations has become one of the most important goals for both industry and computer vendors. For example, the automobile industry has tried to computerize its supply chain management system to reduce labor costs, lead time for product supply, and simple operational errors. Such efforts were started independently by some advanced groups of companies, and therefore there was an increasing need for a standardized means of integrating them.

In the early 1990s, the idea of Electronic Data Interchange (EDI) emerged for standardizing data formats to be exchanged among companies in the same group. The idea was successful and was accepted by many large enterprises. People, however, became aware of a limitation: once EDI was used in daily operations, it was not easy to modify a binary data format to meet the extension requirements caused by changes in the business environment. In addition, EDI was not widely accepted by small and medium-size companies because the cost of introducing EDI technology was often too high.

Distributed computing technology matured in the mid-1990s and became applicable to B2B transactions. One of the most important technologies was Remote Procedure Call (RPC), which allows a program residing on a remote computer to be invoked just like a local program. The Object Management Group (OMG) was formed to standardize RPC for the interoperability of heterogeneous application systems. Its standard was named the Common Object Request Broker Architecture (CORBA); it is based on object-oriented programming and allows remote objects to exchange messages. The OMG defines the Interface Definition Language (IDL) to specify APIs for each object. IDL is designed independently of any programming language, and language bindings are defined to map between IDL and specific programming languages. The OMG also defines many useful services, including Object Services such as the Transaction and Security Services, so that they cover most of the technologies needed for B2B system integration. More than 500 companies have joined the OMG. Many people regarded CORBA as a promising technology to overcome the technical limitations of EDI. Although the OMG continues to revise CORBA and enrich Object Services, CORBA unfortunately seems to be losing the interest of developers. One of the most serious problems is the size of the CORBA specification. The core specification has over a thousand pages, which is more than most developers can reasonably absorb and thus prevents casual programmers from using it. Another problem is that the language-binding mechanism does not always work well enough to absorb the heterogeneity between different programming languages.

As for interoperability, although the efforts previously described focused on it for enterprise applications, we cannot ignore the big movement that happened in the late 1990s: the Internet and the Web. With the Internet, not only developers but also casual computer users could freely access documents distributed throughout the world by using any Web browser and could exchange e-mail by using any mailer. Most of these people do not have to know what computer system the target server runs on, how its network is configured, what kind of operating system is running, or which programming language the application is written in. Even though the network protocols (for example, HTTP and SMTP) and data-encoding methods (for example, Base64) used on the Internet and the Web are simpler and less efficient than those used in EDI and CORBA, the significant fact is that simplicity and interoperability (or connectivity) are much more important than efficiency. We learned this as the Internet and the Web changed the world, just as the telephone and fax did.

As part of the discussion of interoperability, we may want to integrate applications by means of standard Internet technologies. The concept of "Web services," recently promoted by many software vendors, addresses such a requirement. One of the most generic definitions is "Web services are applications that can be accessed via standard technologies such as XML and HTTP." In this sense, the XML-processing servlets discussed in Chapter 10 can be considered Web services. However, merely exchanging XML documents between businesses is often insufficient for application integration.

In this chapter, we clarify the issues for XML-based application integration, paying attention to XML messaging, and see how such issues are solved with SOAP. Then, in the next chapter, we provide an overall picture of Web services, showing the building blocks for Web services architecture.

2 Overview of XML Messaging

Although the term "XML messaging" indicates that XML documents are exchanged between applications, it is a good idea to introduce a communication stack concept, as in the OSI seven-layer model, to understand the term more precisely. Figure 1 illustrates a three-layer model for an XML messaging stack. Recall the examples in Chapter 10 in which XML documents were transmitted over HTTP. Such an approach is a mere combination of HTTP at the transport layer and XML documents at the application layer. You might be required to adopt other transports, such as the Simple Mail Transfer Protocol (SMTP) and Java Message Service (JMS), instead of HTTP. However, you may have some questions: Why do we need an intermediate layer called the messaging layer? How is RPC related to XML messaging? What is the difference between RPC and document-centric messaging? This section addresses these questions.

Figure 1. XML messaging stack


One of the motivations for the messaging layer stems from end-to-end communication. As we showed in Chapter 11, recent enterprise systems have a multilayer structure. Even in the simplest case, you have an HTTP server, a servlet engine, an EJB container, and a database. In addition, you may have a firewall, a network dispatcher, and a reverse proxy in front of the HTTP server, and you may have legacy applications that are accessed via JMS from the servlet engine. If we take the HTTP-based approach of Chapter 10 for B2B collaboration, in some cases we can reach only the servlet engine, because backend applications are often accessible only via other transports, such as the Internet Inter-ORB Protocol (IIOP) and IBM MQSeries. Such a configuration is very common in large enterprises. If the target of your XML document is a backend application behind IBM MQSeries, you have to write an intermediary application by hand. The messaging layer solves such problems by defining the concept of an end-to-end message path in a transport-agnostic manner. We review this concept in more detail in the Header Processing and Intermediary section.
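The end-to-end idea can be sketched in a few lines. The following is only an illustration under stated assumptions: the function names and the dictionary "frames" are hypothetical stand-ins for real HTTP and JMS transports, and no network is involved. The point it tries to show is that each hop changes the transport framing while the XML message itself travels unchanged from sender to backend.

```python
# Hypothetical stand-ins for two transports. Each "frame" is just a dict
# that mimics how a transport wraps the same XML message differently.

def http_send(message: str) -> dict:
    """Frame the message as an HTTP-like request (illustrative only)."""
    return {"transport": "HTTP",
            "headers": {"Content-Type": "text/xml"},
            "body": message}

def http_receive(frame: dict) -> str:
    """Strip the HTTP-like framing and return the XML message."""
    return frame["body"]

def jms_send(message: str) -> dict:
    """Frame the message as a JMS-like text message (illustrative only)."""
    return {"transport": "JMS",
            "properties": {"JMSType": "xml"},
            "text": message}

def jms_receive(frame: dict) -> str:
    """Strip the JMS-like framing and return the XML message."""
    return frame["text"]

def intermediary(frame: dict) -> dict:
    """Plays the servlet-engine role: receive over HTTP, forward over JMS."""
    message = http_receive(frame)
    return jms_send(message)

def backend(frame: dict) -> str:
    """Plays the legacy application reachable only over the JMS-like transport."""
    return jms_receive(frame)

message = "<PurchaseOrder><Company>Example Corp.</Company></PurchaseOrder>"
delivered = backend(intermediary(http_send(message)))
print(delivered == message)
```

Here the intermediary is written by hand, which is exactly the burden described above; a messaging layer aims to make this forwarding step generic rather than application-specific.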

At the messaging layer, an envelope is defined to include application-specific XML documents. It seems redundant at first glance but is convenient because you can add various functions in a flexible manner. Let us consider some real-world examples.

  • You can easily add information on the envelope. Assume that you want to send a purchase order document, which describes only the names of the company and the department. You may include the document within an envelope, adding the address of the company, the person who should receive the envelope, and so on. Note that you can add information without changing the contents.

  • Envelopes are not transparent, so others cannot see the contents. Furthermore, if the recipient of the envelope is required to show identification, we can ensure that the envelope is delivered to the correct person. In summary, envelopes provide security and privacy.

  • An envelope can contain more than one document and can also contain attachments other than documents, such as images and videos.

These examples show that envelopes are useful for additional facets. Let us shift our focus to envelopes as digital data. Typically, an envelope consists of two parts: a body, or payload, containing application-specific data and a header containing additional facets. Based on this structure, you can include application-independent facets within the header, such as security, routing, and transaction information.
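The header and body structure described above can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal illustration, not SOAP: the element names `Envelope`, `Header`, and `Body` are used here without namespaces, and the `wrap_in_envelope` helper, the header entries, and the purchase order contents are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def wrap_in_envelope(payload: ET.Element, headers: dict) -> ET.Element:
    """Wrap an application document in a hypothetical Envelope/Header/Body."""
    envelope = ET.Element("Envelope")
    header = ET.SubElement(envelope, "Header")
    # Each header entry becomes a child element of Header.
    for name, value in headers.items():
        ET.SubElement(header, name).text = value
    body = ET.SubElement(envelope, "Body")
    # The payload is appended as-is; its contents are not modified.
    body.append(payload)
    return envelope

# Application-specific document (hypothetical format).
purchase_order = ET.fromstring(
    "<PurchaseOrder><Company>Example Corp.</Company>"
    "<Department>Procurement</Department></PurchaseOrder>"
)
original = ET.tostring(purchase_order)

# Application-independent information goes into the header.
envelope = wrap_in_envelope(
    purchase_order,
    {"Recipient": "Purchasing Manager", "Priority": "normal"},
)

# The receiver can read the header and recover the unchanged payload.
recipient = envelope.find("Header/Recipient").text
unwrapped = envelope.find("Body")[0]
print(recipient)
print(ET.tostring(unwrapped) == original)
```

As with a paper envelope, information is added around the document without changing the document itself, which is why header processing can be kept independent of the application.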

The concept of the envelope is not new; it is already used by familiar Internet protocols. For example, in electronic mail, various headers are defined in addition to the main text, such as a receiver address, sender address, subject, character encoding, and routing. As for HTTP, typically used by Web browsers, various headers are defined, such as a target address, requestor address, and cache information. These headers are automatically processed at the transport layer, and applications do not have to worry about such details. In the same manner, it seems worthwhile to define such an envelope structure at the messaging layer in a transport-agnostic manner.

3 New-Generation Distributed Programming

So far, we have discussed the messaging layer, addressing how to exchange XML documents. In other words, applications send and receive XML documents, understanding the semantics of the documents. As shown in Figure, this kind of XML messaging is called document-centric messaging (DCM). In B2B collaboration, the format of the exchanged XML documents is very often specified in advance. For example, a business defines an XML format for a purchase order, and its trading partners have to submit purchase order documents conforming to the format. In such typical cases, DCM should be used.
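A receiving application in the DCM style works directly with the document. The sketch below assumes a hypothetical purchase order format with `Company`, `Item`, and `Quantity` elements; the format, the `REQUIRED_FIELDS` tuple, and the `handle_purchase_order` function are invented for illustration, and a real deployment would more likely validate against a schema agreed on by the trading partners.

```python
import xml.etree.ElementTree as ET

# Hypothetical format agreed on by the trading partners.
REQUIRED_FIELDS = ("Company", "Item", "Quantity")

def handle_purchase_order(document: str) -> dict:
    """Check a received document against the agreed format and extract its data."""
    root = ET.fromstring(document)
    if root.tag != "PurchaseOrder":
        raise ValueError("unexpected document type: " + root.tag)
    missing = [name for name in REQUIRED_FIELDS if root.find(name) is None]
    if missing:
        raise ValueError("missing elements: " + ", ".join(missing))
    # The application understands the semantics of the document itself.
    return {
        "company": root.findtext("Company"),
        "item": root.findtext("Item"),
        "quantity": int(root.findtext("Quantity")),
    }

received = (
    "<PurchaseOrder><Company>Example Corp.</Company>"
    "<Item>Widget</Item><Quantity>12</Quantity></PurchaseOrder>"
)
order = handle_purchase_order(received)
print(order)
```

Note that both sides are coupled only through the document format; neither needs to know how the other processes the order internally.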

On the other hand, XML can also be used as a data format for performing RPC, as shown in Figure. With RPC, applications can invoke procedures located on other nodes as if they were located on the same machine. In contrast to using DCM, applications are not concerned with XML documents when using RPC; instead, they are concerned with the API of the remote procedures (see Figure). In that case, XML is used only on the wire to encode data for RPC.
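To make "XML only on the wire" concrete, the sketch below uses the XML-RPC encoder in Python's standard `xmlrpc.client` module. XML-RPC is a different protocol from SOAP and is used here purely as a readily available stand-in; the `add` procedure, the `PROCEDURES` table, and the `serve` and `call_remote` helpers are hypothetical, and the network is simulated by a direct function call.

```python
import xmlrpc.client

def add(a, b):
    """A procedure that lives on the (simulated) remote node."""
    return a + b

# Hypothetical table of procedures published by the server side.
PROCEDURES = {"add": add}

def serve(request_xml: str) -> str:
    """Server side: decode the XML request, run the procedure, encode the result."""
    params, method = xmlrpc.client.loads(request_xml)
    result = PROCEDURES[method](*params)
    return xmlrpc.client.dumps((result,), methodresponse=True)

def call_remote(method: str, *args):
    """Client side: the caller sees an ordinary function call, never the XML."""
    request_xml = xmlrpc.client.dumps(args, methodname=method)
    response_xml = serve(request_xml)  # stands in for a network round trip
    (result,), _ = xmlrpc.client.loads(response_xml)
    return result

# The application works with the procedure's API...
total = call_remote("add", 2, 3)
print(total)

# ...while this is what actually travels between the two nodes.
wire_format = xmlrpc.client.dumps((2, 3), methodname="add")
print(wire_format)
```

The caller of `call_remote` and the implementer of `add` deal only with arguments and return values; the XML document exists only between `dumps` and `loads`.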

RPC is a fairly old concept, and many RPC technologies exist. However, there are some problems with the existing technologies when we use them for B2B collaboration over the Internet. First, some of them depend on a particular language, and the client and server applications have to use the same programming language. Java Remote Method Invocation (RMI) and Distributed Component Object Model (DCOM) are typical examples. This tightly coupled approach is advantageous in performance but disadvantageous in interoperability. To improve interoperability, CORBA provides IDL for programming-language independence. The concept of IDL is simple, but a CORBA platform tends to become big because CORBA specifies an integrated architecture for IDL, protocol, API, message format, and services. In a B2B situation, it is difficult to assume that both businesses have such a large platform. As a result, CORBA is rarely used for B2B collaboration.

How can we develop an RPC technology that can be used over the Internet? What should be standardized for highly interoperable RPC? The answer could be a loosely coupled RPC.[1] Existing tightly coupled RPCs are concerned with the standardization of the API that is accessed by applications. On the other hand, the loosely coupled RPC is concerned only with the data format for communicating RPC data. In other words, how the API is provided to the application is left to each platform and does not have to be shared between platforms. This approach clearly contributes to the improvement of interoperability.

[1] The term "loosely coupled" concerns interoperability rather than performance and is discussed in terms of Web services in Chapter 13.

It is not worthwhile to discuss which is more important, DCM or RPC. If you are concerned with XML documents in your applications, you have to take the DCM approach. On the other hand, if you have existing applications and want to publish them on the Internet as soon as possible, RPC should be easier. Throughout this chapter and the next, you should keep in mind that DCM and RPC are complementary.
