Overview of the SOAP Protocol

The original impetus for SOAP was making XML-based method calls over a network, hence the acronym SOAP, for Simple Object Access Protocol. SOAP has undergone several revisions since the original SOAP 0.88 specification. Before that, there were several XML protocols that targeted message passing and/or remote procedure calls.

In the spring of 2000, several people from Microsoft, IBM, and other companies got together and finished SOAP version 1.1. This version was very similar to SOAP 1.0 but updated the HTTP binding and some other features. More important, it decoupled SOAP from RPC, the purpose with which earlier SOAP versions had been associated. RPCs were still possible, and even described in SOAP 1.1, but they were no longer the sole purpose. SOAP had evolved. To illustrate, let's cover the major sections of the SOAP specification.

Enveloping with SOAP

Beginning with the 1.1 specification, SOAP specifies three major XML elements that can envelop, or wrap around, any XML data message that you may want to send. These three elements are in the http://schemas.xmlsoap.org/soap/envelope/ namespace. The root document element is called <Envelope> and is mandatory. Its two child elements are called <Header> and <Body>, in that order.

NOTE

As mentioned earlier, there are several versions of SOAP, including SOAP 1.2, which the W3C's XML Protocol Working Group is developing. This chapter discusses SOAP 1.1, except where specifically stated otherwise.

The SOAP Body

The <Body> element is the other mandatory element of any SOAP message. Within this element, any XML can be included. The data found within the <Body> is the data that is intended for the final message recipient. No matter how many firewalls, bridges, and other intermediaries or processes touch the SOAP message, only the final destination should actually read and act upon the <Body> data.

The SOAP Header

The <Header> element specifies data that by default is intended for the final destination, but that intention can be changed. In general, the <Header> specifies data that is orthogonal to the <Body> data; for example, it may contain authentication information (refer to Listing 9.1), or information to specify the contextual ID (e.g., a session) for the message. The <Header> isn't required for a SOAP message.

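SOAP messages can be very simple. The <Header> element is optional, so a message needs only an <Envelope> containing a <Body>. The envelope namespace is the one thing you can't leave out: SOAP 1.1 requires the <Envelope> to be in the http://schemas.xmlsoap.org/soap/envelope/ namespace and treats any other namespace as a version error. This means that the following SOAP message is legal, albeit useless:

<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
     <Body />
</Envelope>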

Here is an example of a slightly more complicated SOAP message:

<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/"> 
     <Body>
          <Alert xmlns="http://keithba.com/alerts">
               KeithBa is online!
          </Alert>
     </Body>
</Envelope>
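
To make the nesting concrete, here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree, that wraps arbitrary XML in an <Envelope> and emits a <Header> only when there are header entries. The helper name build_envelope is hypothetical; it is not part of any SOAP toolkit.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace, as given in the specification.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_ENV)


def build_envelope(body_items, header_items=None):
    """Wrap already-built elements in a SOAP 1.1 <Envelope> (hypothetical helper).

    The <Header> is optional, so it is emitted only when header_items is
    non-empty; the <Body> is always emitted, and it follows the <Header>.
    """
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    if header_items:
        header = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
        header.extend(header_items)
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    body.extend(body_items)
    return ET.tostring(envelope, encoding="unicode")


# Rebuild the "KeithBa is online!" message shown above.
alert = ET.Element("{http://keithba.com/alerts}Alert")
alert.text = "KeithBa is online!"
print(build_envelope([alert]))
```

ElementTree chooses its own prefix for the alerts namespace, so the printed text differs cosmetically from the listing while describing the same XML.
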

Listing 9.1 Using the Header for Authentication
<Envelope xmlns="http://schemas.xmlsoap.org/soap/envelope/">
     <Header>
          <AuthHeader xmlns="http://keithba.com/security">
               <UserName>Keith</UserName>
               <Pwd>KeithRocks!</Pwd>
          </AuthHeader>
          <Context xmlns="http://keithba.com/context">
               http://keithba.com/alerts/1234321
          </Context>
     </Header>
     <Body>
          <Alert xmlns="http://keithba.com/alerts">
               KeithBa is online!
          </Alert>
     </Body>
</Envelope>

Actors

Header elements are, by default, intended for the final destination, that is, the <Body> recipient. However, you can redirect header data to another recipient. In fact, a single message can carry one header for the final destination, another for a firewall, and yet another for a second firewall.

Firewalls, Bridges, and Intermediaries

SOAP is a messaging protocol. Messages aren't always sent from one machine (the sender) directly to just one other machine (the destination). Often, you may need to send a SOAP message through several intermediaries. These intermediaries may be bridges, which receive the SOAP message over HTTP and forward it over SMTP. Or they may be firewalls, which receive the message and verify that it is authorized.

Imagine that our authentication header from Listing 9.1 were really intended for the destination's firewall, which handles authorization, but that the context were still intended for the final destination. In this case, the actor attribute would be used to indicate the intended recipient of the header data, as shown in Listing 9.2.

Listing 9.2 Using the Actor Attribute
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
     <soap:Header>
          <AuthHeader
                soap:actor="http://theFirewall"
                xmlns="http://keithba.com/security">
               <UserName>Mel</UserName>
               <Pwd>MelRocks!</Pwd>
          </AuthHeader>
          <Context xmlns="http://keithba.com/context">
               http://keithba.com/alerts/1234321
          </Context>
     </soap:Header>
     <soap:Body>
          <Alert xmlns="http://keithba.com/alerts">
               KeithBa is online!
          </Alert>
     </soap:Body>
</soap:Envelope>
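
The targeting rule can be sketched in a few lines of Python. This is an illustration, not any toolkit's implementation: headers_for is a hypothetical helper, it assumes each node already knows its own actor URI and whether it is the final destination, and http://example.org/final is a made-up URI for that destination. It also honors the special actor URI http://schemas.xmlsoap.org/soap/actor/next, which SOAP 1.1 defines as meaning the first SOAP application that processes the message.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
ACTOR = f"{{{SOAP_ENV}}}actor"
# SOAP 1.1 special actor URI: "the first SOAP application that processes the message".
NEXT = "http://schemas.xmlsoap.org/soap/actor/next"


def headers_for(envelope_xml, my_uri, is_final_destination):
    """Return the header entries that the node identified by my_uri should process."""
    header = ET.fromstring(envelope_xml).find(f"{{{SOAP_ENV}}}Header")
    if header is None:
        return []
    mine = []
    for entry in header:
        actor = entry.get(ACTOR)
        if actor is None:
            # No actor attribute: the entry is for the final destination.
            if is_final_destination:
                mine.append(entry)
        elif actor in (my_uri, NEXT):
            mine.append(entry)
    return mine


MESSAGE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Header>
    <AuthHeader soap:actor="http://theFirewall" xmlns="http://keithba.com/security">
      <UserName>Mel</UserName>
      <Pwd>MelRocks!</Pwd>
    </AuthHeader>
    <Context xmlns="http://keithba.com/context">http://keithba.com/alerts/1234321</Context>
  </soap:Header>
  <soap:Body>
    <Alert xmlns="http://keithba.com/alerts">KeithBa is online!</Alert>
  </soap:Body>
</soap:Envelope>"""


def local_names(entries):
    return [entry.tag.split("}")[1] for entry in entries]


print(local_names(headers_for(MESSAGE, "http://theFirewall", False)))  # ['AuthHeader']
print(local_names(headers_for(MESSAGE, "http://example.org/final", True)))  # ['Context']
```

The firewall sees only the authentication header, and the final destination sees only the context header.
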

The mustUnderstand Attribute

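Another attribute that can be applied to header entries, called mustUnderstand, indicates that the intended recipient of the header must process and semantically understand it. In SOAP 1.1, the value is "1" for a mandatory header or "0" for an optional one, and omitting the attribute is equivalent to "0". If the recipient cannot process a mandatory header, it must fail processing the message and return a standard error (refer to the section entitled Errors with SOAP later in this chapter).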

Listing 9.3 shows an example of using the mustUnderstand attribute on the context header from Listing 9.2.

Listing 9.3 Using the mustUnderstand Attribute
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
     <soap:Header>
          <AuthHeader
                 soap:actor="http://theFirewall"
                 xmlns="http://keithba.com/security">
               <UserName>Mel</UserName>
               <Pwd>MelRocks!</Pwd>
          </AuthHeader>
          <Context
                 xmlns="http://keithba.com/context"
                 soap:mustUnderstand="1">
               http://keithba.com/alerts/1234321
          </Context>
     </soap:Header>
     <soap:Body>
          <Alert xmlns="http://keithba.com/alerts">
               Melissa is online!
          </Alert>
     </soap:Body>
</soap:Envelope>

In this example, if the intended recipient of the context header (in this case, the final destination of the SOAP message) doesn't semantically understand the header, then an error must be returned. By semantically understand, I mean that the final destination must be expecting the header. This varies from SOAP toolkit to SOAP toolkit, but generally it means that unless there is specific code in the message handler for this header information, the SOAP runtime of the toolkit should raise the error.
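
A runtime's check can be sketched as follows. The sketch assumes the header entries have already been narrowed to those targeted at this node (as with the actor rules) and that the node keeps a set of header names it has code for; both the helper name not_understood and that set are hypothetical. It treats "1", the SOAP 1.1 value, as mandatory, and also accepts "true", the SOAP 1.2 spelling, as a lenient assumption.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
MUST_UNDERSTAND = f"{{{SOAP_ENV}}}mustUnderstand"


def not_understood(header_entries, understood):
    """Return the tags of mandatory header entries this node has no code for.

    header_entries: header elements already targeted at this node.
    understood: set of "{namespace}localname" tags the node can process.
    """
    missing = []
    for entry in header_entries:
        # "1" is the SOAP 1.1 value; "true" (SOAP 1.2 spelling) is accepted leniently.
        mandatory = entry.get(MUST_UNDERSTAND) in ("1", "true")
        if mandatory and entry.tag not in understood:
            missing.append(entry.tag)
    return missing


HEADER = ET.fromstring(
    '<soap:Header xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<Context xmlns="http://keithba.com/context" soap:mustUnderstand="1">'
    "http://keithba.com/alerts/1234321</Context>"
    "</soap:Header>"
)

print(not_understood(list(HEADER), understood=set()))
# ['{http://keithba.com/context}Context']: a MustUnderstand fault is required
print(not_understood(list(HEADER), understood={"{http://keithba.com/context}Context"}))
# []
```

A non-empty result means the node should stop and reply with a fault rather than act on the <Body>.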

Errors with SOAP

Error handling is defined to some degree in the SOAP specification. Basically, the specification defines a standard mechanism for encoding and returning error information: the SOAP <Fault>. One section of the specification covers faults and defines a few standard fault codes; other sections specify, as appropriate, when those codes should be sent.

As mentioned, error information is sent with SOAP via the <Fault> element, which is a child of the SOAP <Body> element. The <Fault> element in turn contains several child elements that carry specific information about the error. They are as follows:

  • <faultstring>— Provides a human-readable description of the error.

  • <faultcode>— A code that identifies the error in a form software can act on. This must be a qualified name (QName).

  • <faultactor>— A URI indicating who caused the error, that is, which node on the message path generated the fault.

  • <detail>— An element that can contain any kind of XML, which is used to expand upon the error. It typically carries application-specific error information related to the <Body>.

Imagine that the recipient didn't understand the context header from our earlier example. The SOAP specification defines a specific fault code that must be returned when a header is not understood. The resulting fault message might resemble Listing 9.4.

Listing 9.4 A SOAP Error Code
<soap:Envelope
  xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
   <soap:Body>
       <soap:Fault>
           <faultcode>soap:MustUnderstand</faultcode>
           <faultstring>SOAP Must Understand Error</faultstring>
           <detail>
                <ExtraInfo xmlns="http://keithba.com">
                    The header wasn't understood. What's up with that?
               </ExtraInfo>
           </detail>
       </soap:Fault>
   </soap:Body>
</soap:Envelope>

Notice that the fault code was set to be soap:MustUnderstand. This is the code that should be sent for headers that are not understood. SOAP also describes three other fault codes:

  • VersionMismatch— The message recipient does not support the version of SOAP used (indicated by the namespace on the envelope and other elements).

  • Client— It is the client's fault that the error occurred. This could be for a variety of reasons, such as a malformed XML document.

  • Server— The error is due to the server, such as a processing error.
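
On the receiving side, a client has to pull these pieces back out. The Python sketch below uses a hypothetical read_fault helper. It assumes the fault sub-elements are unqualified, as in Listing 9.4, and it simply strips the prefix from <faultcode> instead of resolving it against the in-scope namespace declarations, which ElementTree does not expose. It also drops any dotted suffix (SOAP 1.1 allows more specific codes such as Client.Authentication) before checking the code against the four standard ones.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
STANDARD_CODES = {"VersionMismatch", "MustUnderstand", "Client", "Server"}


def read_fault(envelope_xml):
    """Return a dict describing the <Fault> in a SOAP 1.1 message, or None."""
    root = ET.fromstring(envelope_xml)
    fault = root.find(f"{{{SOAP_ENV}}}Body/{{{SOAP_ENV}}}Fault")
    if fault is None:
        return None
    # faultcode is a QName such as "soap:MustUnderstand"; keep the local part only.
    code = fault.findtext("faultcode", "").strip().split(":")[-1]
    return {
        "code": code,
        "standard": code.split(".")[0] in STANDARD_CODES,
        "string": fault.findtext("faultstring", "").strip(),
        "actor": fault.findtext("faultactor"),  # None when the element is absent
        "detail": fault.find("detail"),  # application-specific XML, or None
    }


FAULT = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <soap:Fault>
      <faultcode>soap:MustUnderstand</faultcode>
      <faultstring>SOAP Must Understand Error</faultstring>
    </soap:Fault>
  </soap:Body>
</soap:Envelope>"""

info = read_fault(FAULT)
print(info["code"], info["standard"], info["actor"])  # MustUnderstand True None
```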

Remote Method Calls with SOAP

SOAP contains two different sections that, when combined, enable you to encode the common data types from languages such as Java and C#. Together they also describe how to map common RPC concepts, such as return values, onto this encoding.

Encoded XML

Section 5 of the SOAP specification contains rules for encoding data types into XML. You will find encoding to be a tactical solution to many Web service interoperability problems, but I think that the correct long-term strategy is to emphasize non-encoded or literal SOAP messaging.

NOTE

Be aware that the specific rules for encoding are changing from SOAP 1.1 to SOAP 1.2. These changes are primarily to express the encoding in terms of the XML Information Set (Infoset), and to clear up bugs that have been found with the encoding rules in SOAP 1.1.

Basically, the Section 5 rules for encoding data as XML are simple. The thing to remember is that the XML format is dictated by the type system of the programming language or languages involved. By this I mean that the encoding rules are framed in terms of programming language types.

An example is probably the best place to start. Imagine that you have a simple class that represents an Address (written in C#):

public class Address
{
     public string[] Street;
     public string City;
     public string State;
     public string ZipCode;
}

Serialized according to the rules of Section 5, an instance of this class would resemble Listing 9.5.

Listing 9.5 Encoding a C# Class in XML
<tns:Address xmlns:tns="http://keithba.com">
      <Street href="#id2" />
      <City xsi:type="xsd:string">Redmond</City>
      <State xsi:type="xsd:string">WA</State>
      <ZipCode xsi:type="xsd:string">98045-0001</ZipCode>
</tns:Address>
<soapenc:Array id="id2" soapenc:arrayType="xsd:string[2]">
      <Item>1 Microsoft Way</Item>
      <Item>Suite 1</Item>
</soapenc:Array>

Let's review the basic rules of serialization:

  • Simple types, such as strings, can be given any element name you want. However, the receiver must be able to determine the type: from a schema, from an xsi:type attribute, or (within arrays) from an element named after the data type itself. In this address example, the struct members carry xsi:type attributes, and the array items take their type from the array's soapenc:arrayType declaration.

  • Structures, such as structs and classes, are serialized as a wrapper element whose member elements are not namespace qualified. In this example, the wrapper is called Address, but it doesn't have to be.

  • Arrays are also wrapped, again without namespace qualifying the item elements, and they carry a soapenc:arrayType attribute whose value looks like a QName followed by dimensions. In "xsd:string[2]", xsd:string names the item type, and it is the [2] (the array size), not the QName, that sets the value apart.

  • Object references are expressed via id and href attributes. If you give an element an id, you can refer to it from multiple places (multi-ref) with an href attribute whose value is that id prefixed with the # symbol.
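
To tie these rules together, here is a small Python sketch of a Section 5 style serializer for flat structs such as Address. It illustrates the rules under simplifying assumptions and is not how .NET or any particular toolkit serializes: every simple member is treated as xsd:string, list members become independent multi-ref soapenc:Array elements referenced by href, and the xsd prefix that appears inside attribute values is assumed to be declared on an enclosing envelope, as in the complete messages later in this chapter. The name encode_struct is hypothetical.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
ENC = "http://schemas.xmlsoap.org/soap/encoding/"
ET.register_namespace("xsi", XSI)
ET.register_namespace("soapenc", ENC)
ET.register_namespace("tns", "http://keithba.com")


def encode_struct(type_tag, fields):
    """Encode a flat struct; return [struct, *independent_arrays].

    fields maps member names to a str (simple type) or a list of str (array).
    """
    struct = ET.Element(type_tag)
    arrays = []
    for name, value in fields.items():
        member = ET.SubElement(struct, name)  # struct members are unqualified
        if isinstance(value, list):
            ref_id = f"id{len(arrays) + 1}"
            member.set("href", "#" + ref_id)  # multi-ref: point at the array by id
            array = ET.Element(f"{{{ENC}}}Array", {
                "id": ref_id,
                f"{{{ENC}}}arrayType": f"xsd:string[{len(value)}]",
            })
            for item in value:
                ET.SubElement(array, "Item").text = item  # unqualified items
            arrays.append(array)
        else:
            member.set(f"{{{XSI}}}type", "xsd:string")  # type carried by xsi:type
            member.text = value
    return [struct] + arrays


parts = encode_struct("{http://keithba.com}Address", {
    "Street": ["1 Microsoft Way", "Suite 1"],
    "City": "Redmond",
    "State": "WA",
    "ZipCode": "98045-0001",
})
for part in parts:
    print(ET.tostring(part, encoding="unicode"))
```

The struct and the array come back as separate top-level elements because, as in Listing 9.5, the array is an independent element that the struct refers to rather than contains.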

Two other interesting features of Section 5 encoding are partially transmitted arrays and sparse arrays. For a partially transmitted array, you send a contiguous run of items together with the zero-origin offset of the first one, using the soapenc:offset attribute. For example, to send only the last two elements of a four-element array, you would specify an offset of [2]. Here, we send only the last item of the two-item array:

<soapenc:Array 
         id="id2" soapenc:arrayType="xsd:string[2]"
         soapenc:offset="[1]">
      <Item>Suite 1</Item>
</soapenc:Array>

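With sparse arrays, you send only specific items, and unlike a partially transmitted array, the items you send need not be a contiguous run starting at an offset. Instead, each item carries its own soapenc:position, which, like soapenc:offset, is zero-origin. The following places the second and fourth items of a four-item array:

<soapenc:Array id="id2" soapenc:arrayType="xsd:string[4]">
      <Item soapenc:position="[1]">Building 3</Item>
      <Item soapenc:position="[3]">Room 4</Item>
</soapenc:Array>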
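
Reassembling either kind of array on the receiving side is mostly index arithmetic. The Python sketch below (decode_string_array is a hypothetical name) handles one-dimensional arrays only: it reads the size from soapenc:arrayType, treats soapenc:offset and soapenc:position as zero-origin indexes, and leaves untransmitted slots as None. Because the fragments above don't declare the soapenc prefix, the inputs here add that declaration, and the sparse input places its two items at positions [1] and [3].

```python
import re
import xml.etree.ElementTree as ET

ENC = "http://schemas.xmlsoap.org/soap/encoding/"


def _index(value):
    """Turn a one-dimensional "[n]" attribute value into the integer n."""
    return int(value.strip("[]"))


def decode_string_array(array_xml):
    array = ET.fromstring(array_xml)
    # arrayType looks like "xsd:string[4]"; the bracketed suffix is the size.
    size = int(re.search(r"\[(\d+)\]$", array.get(f"{{{ENC}}}arrayType")).group(1))
    result = [None] * size  # slots that are never transmitted stay None
    next_index = _index(array.get(f"{{{ENC}}}offset", "[0]"))
    for item in array:
        position = item.get(f"{{{ENC}}}position")
        index = _index(position) if position is not None else next_index
        result[index] = item.text
        next_index = index + 1
    return result


DECL = 'xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"'
PARTIAL = (f'<soapenc:Array {DECL} soapenc:arrayType="xsd:string[2]" '
           'soapenc:offset="[1]"><Item>Suite 1</Item></soapenc:Array>')
SPARSE = (f'<soapenc:Array {DECL} soapenc:arrayType="xsd:string[4]">'
          '<Item soapenc:position="[1]">Building 3</Item>'
          '<Item soapenc:position="[3]">Room 4</Item></soapenc:Array>')

print(decode_string_array(PARTIAL))  # [None, 'Suite 1']
print(decode_string_array(SPARSE))   # [None, 'Building 3', None, 'Room 4']
```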

With SOAP encoding, an omitted element may also be treated as null. In the following example, ZipCode can be considered to have a null value because it is missing:

<tns:Address xmlns:tns="http://keithba.com"> 
      <Street href="#id2" />
      <City xsi:type="xsd:string">Redmond</City>
      <State xsi:type="xsd:string">WA</State>
</tns:Address>
<soapenc:Array id="id2" soapenc:arrayType="xsd:string[2]">
      <Item>1 Microsoft Way</Item>
      <Item>Suite 1</Item>
</soapenc:Array>
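
Decoding reverses both ideas: follow each href to the element with the matching id, and treat any missing member as null. The Python sketch below is illustrative only. decode_address is a hypothetical helper, the member list is taken from the C# Address class, and the fragment is wrapped in a made-up <root> element that declares the prefixes, because two sibling top-level elements are not a well-formed XML document on their own (in a real message, the SOAP <Body> plays that role).

```python
import xml.etree.ElementTree as ET


def decode_address(fragment_xml, members=("Street", "City", "State", "ZipCode")):
    """Decode an encoded Address struct into a dict (hypothetical helper).

    href="#id" references are resolved against elements carrying that id,
    and members that were omitted come back as None (null).
    """
    root = ET.fromstring(fragment_xml)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    struct = root[0]  # assumption: the struct is the first top-level element
    result = {}
    for name in members:
        el = struct.find(name)
        if el is None:
            result[name] = None  # omitted element: treated as null
        elif el.get("href") is not None:
            target = by_id[el.get("href").lstrip("#")]
            result[name] = [item.text for item in target]  # only arrays are referenced here
        else:
            result[name] = el.text
    return result


FRAGMENT = """<root xmlns:tns="http://keithba.com"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/">
  <tns:Address>
    <Street href="#id2" />
    <City xsi:type="xsd:string">Redmond</City>
    <State xsi:type="xsd:string">WA</State>
  </tns:Address>
  <soapenc:Array id="id2" soapenc:arrayType="xsd:string[2]">
    <Item>1 Microsoft Way</Item>
    <Item>Suite 1</Item>
  </soapenc:Array>
</root>"""

print(decode_address(FRAGMENT))
# {'Street': ['1 Microsoft Way', 'Suite 1'], 'City': 'Redmond', 'State': 'WA', 'ZipCode': None}
```
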

Remote Method Calls

Section 7 of the SOAP specification describes how to make a remote procedure call with SOAP, using the Section 5 encoding rules. Its RPC mechanism is similar to those found in other protocols such as CORBA or RMI, but much simpler. Basically, Section 7 says that you model the procedure call as a struct (or class), as described in Section 5.

Imagine you have a function that is used to submit an Address, like the one from the previous example. In C#, this function signature might look as follows:

 public Address SubmitAddress( Address addr, bool dontSave ) 

In this case, using the rules of Section 5, we would create a struct-like piece of XML called <SubmitAddress> that contains the Address parameter <addr> and the <dontSave> Boolean, inside of the SOAP <Body> element. Listing 9.6 shows this.

Listing 9.6 A Method Call with Encoding
<soap:Envelope
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
     xmlns:tns="http://soapinterop.org"
     xmlns:types="http://soapinterop.org/encodedTypes"
     xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body
     soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <tns:SubmitAddress>
      <addr href="#id1" />
      <dontSave xsi:type="xsd:boolean">false</dontSave>
    </tns:SubmitAddress>
    <tns:Address id="id1" xsi:type="tns:Address">
      <Street href="#id2" />
      <City xsi:type="xsd:string">Redmond</City>
      <State xsi:type="xsd:string">WA</State>
      <ZipCode xsi:type="xsd:string">98045-0001</ZipCode>
    </tns:Address>
    <soapenc:Array id="id2" soapenc:arrayType="xsd:string[2]">
      <Item>1 Microsoft Way</Item>
      <Item>Suite 1</Item>
    </soapenc:Array>
  </soap:Body>
</soap:Envelope>

Notice that the <addr> parameter is referenced via the href attribute. This isn't required, but it is how .NET (using XML serialization) always serializes classes used as parameters. Each parameter element must match the name and type of the parameter it encodes, and the struct that represents the call must be named after the method being modeled.
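
The naming rules are easy to see in code. Below is a Python sketch that builds a Section 7 style request for a hypothetical LookupZipCode(string city, string state) operation; neither that operation nor the rpc_request helper comes from the example above, and only simple parameter types are handled. ElementTree declares namespaces only for the tags and attribute names it serializes, not for prefixes that appear inside attribute values such as xsi:type="xsd:string", so the sketch declares the xsd prefix explicitly on the envelope.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SOAP_ENC = "http://schemas.xmlsoap.org/soap/encoding/"
XSI = "http://www.w3.org/2001/XMLSchema-instance"
XSD = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("xsi", XSI)

# A deliberately small mapping from Python types to XML Schema type names.
XSD_TYPES = {bool: "xsd:boolean", int: "xsd:int", str: "xsd:string"}


def rpc_request(method_ns, method_name, params):
    """Build a request: one struct named after the method, one child per parameter.

    params is an ordered list of (name, value) pairs; the names must match the
    method's parameter names, in the order of its signature.
    """
    # Declare xsd by hand, because it is only used inside attribute values.
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope", {"xmlns:xsd": XSD})
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body",
                         {f"{{{SOAP_ENV}}}encodingStyle": SOAP_ENC})
    call = ET.SubElement(body, f"{{{method_ns}}}{method_name}")
    for name, value in params:
        arg = ET.SubElement(call, name, {f"{{{XSI}}}type": XSD_TYPES[type(value)]})
        # xsd:boolean is written "true"/"false", not Python's "True"/"False".
        arg.text = str(value).lower() if isinstance(value, bool) else str(value)
    return ET.tostring(envelope, encoding="unicode")


print(rpc_request("http://soapinterop.org", "LookupZipCode",
                  [("city", "Redmond"), ("state", "WA")]))
```

A struct-typed parameter such as addr would instead be emitted as an href to an independent element, as in the serializer sketch for the Address class.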

Responses to these kinds of RPC calls are also modeled as Section 5 structs, with a couple of differences. Because our SubmitAddress method returns an Address, the response would look something like Listing 9.7.

Listing 9.7 A Section 7 Response Message
<soap:Envelope
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
     xmlns:tns="http://soapinterop.org"
     xmlns:types="http://soapinterop.org/encodedTypes"
     xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body
       soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <tns:SubmitAddressResponse>
      <SubmitAddressResult href="#id1" />
    </tns:SubmitAddressResponse>
    <tns:Address id="id1" xsi:type="tns:Address">
      <Street href="#id2" />
      <City xsi:type="xsd:string">Redmond</City>
      <State xsi:type="xsd:string">WA</State>
      <ZipCode xsi:type="xsd:string">98045-0001</ZipCode>
    </tns:Address>
    <soapenc:Array id="id2" soapenc:arrayType="xsd:string[2]">
      <Item>1 Microsoft Way</Item>
      <Item>Suite 1</Item>
    </soapenc:Array>
  </soap:Body>
</soap:Envelope>

Notice that the response is modeled as a struct as well. The name of the struct is not significant (by convention, it is the method name with Response appended, as in SubmitAddressResponse), but its position is: It should be the first element within the SOAP <Body>, ahead of any multi-referenced values such as the Address and array here. Likewise, the name of the return value element doesn't matter, but its position does: It must be the first element within the struct.

This raises an interesting point: out parameters can also be returned, by name. For example, let's change the Boolean <dontSave> into a reference parameter that is passed both into and out of the function (not that this makes a lot of sense). The response on the wire would then resemble Listing 9.8.

Listing 9.8 An Encoded Message
<soap:Envelope
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
      xmlns:tns="http://soapinterop.org"
      xmlns:types="http://soapinterop.org/encodedTypes"
      xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body
      soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <tns:SubmitAddressResponse>
      <SubmitAddressResult href="#id1" />
      <dontSave xsi:type="xsd:boolean">false</dontSave>
    </tns:SubmitAddressResponse>
    <tns:Address id="id1" xsi:type="tns:Address">
      <Street href="#id2" />
      <City xsi:type="xsd:string">Portland</City>
      <State xsi:type="xsd:string">OR</State>
      <ZipCode xsi:type="xsd:string">97123</ZipCode>
    </tns:Address>
    <soapenc:Array id="id2" soapenc:arrayType="xsd:string[2]">
      <Item>100 Main St.</Item>
      <Item>Apt 4</Item>
    </soapenc:Array>
  </soap:Body>
</soap:Envelope>

The <dontSave> element comes after the return value. This is because the return value is identified by position, not by name: It is the first element in the struct that models the method return. But remember that Section 5 encoding allows us to omit elements that are null. If we were to return a null return value, the first element in the struct would be the Boolean, not the return value!
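
The ambiguity is easy to reproduce. The Python sketch below reads a response the way Section 7 describes, by position: the first child of the response struct is the return value, and any later children are out parameters keyed by name. read_rpc_response is a hypothetical helper, and to stay short it returns raw text (or the href string) without resolving references. Omitting the null return value shifts <dontSave> into the return-value slot.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def read_rpc_response(envelope_xml):
    """Split a Section 7 response into (return_value, out_params), by position.

    Values are raw text (or the href string); references are not resolved.
    """
    body = ET.fromstring(envelope_xml).find(f"{{{SOAP_ENV}}}Body")
    response = body[0]  # the response struct is the first child of <Body>
    children = list(response)
    if not children:
        return None, {}
    first, rest = children[0], children[1:]
    return_value = first.get("href", first.text)  # name ignored: position decides
    return return_value, {child.tag: child.text for child in rest}


WITH_RESULT = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <SubmitAddressResponse>
      <SubmitAddressResult href="#id1"/>
      <dontSave>false</dontSave>
    </SubmitAddressResponse>
  </soap:Body>
</soap:Envelope>"""
# The same response with a null (and therefore omitted) return value.
NULL_RESULT = WITH_RESULT.replace('<SubmitAddressResult href="#id1"/>', "")

print(read_rpc_response(WITH_RESULT))  # ('#id1', {'dontSave': 'false'})
print(read_rpc_response(NULL_RESULT))  # ('false', {}): the Boolean is misread
```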

Responses like this will confuse many SOAP toolkits. Therefore, I highly recommend that you avoid returning null values from encoded Web service operations that also have out or by-reference parameters.

The Encoding Style Attribute

When using the Section 5 encoding rules, it's a good idea to indicate that this encoding style was used. SOAP 1.1 defines a special attribute, called encodingStyle, that you can apply to the SOAP <Body> or other elements to indicate that Section 5 rules are being used. A value of http://schemas.xmlsoap.org/soap/encoding/ means that Section 5 was used.

Note that this encodingStyle attribute does not necessarily mean that Section 7, RPC, rules were used. However, in practice, encoding is seldom done except when remote procedure calls via Section 7 are employed.

SOAP 1.1 also allows encodingStyle to be a list of URIs naming several encodings, ordered from most specific to least specific, so that each encoding in the list is a subset (or constraint set) of the one that follows it. Because in practice very few SOAP toolkits can handle such a list, I generally recommend that, if you control this attribute, you don't fill it with a list.

Also note that you aren't required to send this attribute. I would always send it, but in theory, most endpoints already know whether their messages are encoded; the attribute only confirms it. Furthermore, with literal-style SOAP, you don't need an encodingStyle attribute at all, because the absence of encoding is by definition literal.
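
Because encodingStyle applies to the element that carries it and to that element's contents, unless an inner element declares its own, a receiver has to look up the tree to find which rules govern a given element. The Python sketch below is an illustration with hypothetical helper names. It treats an empty attribute value as "no claims" (the meaning SOAP 1.1 gives the zero-length URI) and reports Section 5 as in force whenever its URI appears anywhere in the nearest declared list.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
SECTION5 = "http://schemas.xmlsoap.org/soap/encoding/"
STYLE = f"{{{SOAP_ENV}}}encodingStyle"


def encoding_styles(root, target):
    """Return the URI list from the nearest encodingStyle on or above target."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    node = target
    while node is not None:
        value = node.get(STYLE)
        if value is not None:
            return value.split()  # "" means no claims, giving []
        node = parent_of.get(node)
    return []  # nothing declared between target and the root


def uses_section5(root, target):
    return SECTION5 in encoding_styles(root, target)


ENCODED = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body soap:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
    <SubmitAddress><dontSave>false</dontSave></SubmitAddress>
  </soap:Body>
</soap:Envelope>"""
LITERAL = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body><PO xmlns="http://autoparts.com/"><ID>123</ID></PO></soap:Body>
</soap:Envelope>"""

encoded = ET.fromstring(ENCODED)
literal = ET.fromstring(LITERAL)
print(uses_section5(encoded, encoded.find(".//dontSave")))  # True
print(uses_section5(literal, literal.find(".//{http://autoparts.com/}ID")))  # False
```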

SOAP and HTTP

SOAP lets you use any transport you want; in other words, it is transport independent. This is a major design advantage of SOAP, because the ability to send a message over multiple transports is both important and powerful.

Section 6 of the SOAP specification covers how to send SOAP over the HTTP transport. Earlier versions of SOAP, such as SOAP 1.0, also described SOAP over HTTP, but they required part of the HTTP Extension Framework: specifically, the M-POST verb.

With SOAP 1.1, this requirement was removed. Now, a SOAP request can be sent as either an M-POST or a regular POST. There are a few details you must take care of when doing SOAP over HTTP. Listing 9.9 shows an example of a request message sent over HTTP.

Listing 9.9 A Literal Message
POST /SomeVDir/Service1.asmx HTTP/1.1
Host: localhost
Content-Type: text/xml; charset=utf-8
Content-Length: XXX
SOAPAction: "http://autoparts.com/SubmitPO"

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <PO xmlns="http://autoparts.com/">
      <ID>123</ID>
      <PartName>Tire</PartName>
      <Quantity>4</Quantity>
    </PO>
  </soap:Body>
</soap:Envelope>

Notice that the content type is text/xml, as mandated by the SOAP 1.1 specification. In addition, a custom HTTP header called SOAPAction must be added. The SOAP 1.1 specification says that this value should indicate the "intent" of the SOAP message. That's pretty vague, and as a consequence, there are different interpretations of how to use this header.

By default, the .NET Framework uses this value to decide which method the message should be deserialized for and dispatched to. You can override this behavior so that the SOAPAction header isn't used for dispatching, but the header will still be required.
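
To show exactly what goes on the wire, here is a Python sketch that frames an envelope as the raw HTTP request in Listing 9.9, with the text/xml content type and a quoted SOAPAction header. soap_http_request is a hypothetical helper; in practice you would more likely pass the same headers to the standard library's http.client.HTTPConnection.request and let it manage the connection.

```python
def soap_http_request(path, host, soap_action, envelope_xml):
    """Frame a SOAP 1.1 envelope as a raw HTTP/1.1 POST request (bytes)."""
    body = envelope_xml.encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: text/xml; charset=utf-8\r\n"  # required media type for SOAP 1.1
        f"Content-Length: {len(body)}\r\n"  # byte length of the UTF-8 body
        f'SOAPAction: "{soap_action}"\r\n'  # the value is a quoted URI
        "\r\n"
    )
    return head.encode("ascii") + body


ENVELOPE = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body><PO xmlns="http://autoparts.com/"><ID>123</ID>'
    "<PartName>Tire</PartName><Quantity>4</Quantity></PO></soap:Body>"
    "</soap:Envelope>"
)

request = soap_http_request("/SomeVDir/Service1.asmx", "localhost",
                            "http://autoparts.com/SubmitPO", ENVELOPE)
print(request.decode("utf-8"))
```

Computing Content-Length from the encoded bytes, rather than from the string length, keeps it correct when the body contains non-ASCII characters.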

Listing 9.10 shows an example of a response message in HTTP.

Listing 9.10 A Response Message with No Encoding
HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Length: XXX

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema"
     xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body />
</soap:Envelope>

As you can see, the response message over HTTP is a typical HTTP response with an XML body. The only caveat is that its content type must be text/xml.

