Network Data Representation






One of the two big breakthroughs that enabled the Web was HTML. HTML was an open, standards-based data-formatting language that could be used to represent the data in a document. It was not a binary format but a text-based format built on the concept of markup "tags" that were inserted into the content to provide formatting. Combining content and formatting in one file had been done before. The Word file format is a binary format that holds both the content and the information required to format it. It is, however, neither open nor standards based. Microsoft created it and controls it.

Perhaps more important is that the Word format is binary. The barrier to entry for a binary format is that the user typically must create a program just to read or write the format. But with a text-based format such as HTML, anything that can create an ASCII text file can create and/or read the source of the format. The agreed-upon format is to use ASCII or Unicode, which is a common standard, and to build on that by including inline markup tags.

How can this extend to the services model? HTML isn't a good fit because its primary mission is to control the formatting of content. Machines rarely care that a particular word is displayed in pink or blue. They are more concerned that the word itself is "pink" and what that might mean in a certain context. The idea, however, of using ASCII as a standard representation and then adding markup to create structure is a concept that can be generalized, and indeed has been, into something called eXtensible Markup Language (XML). XML is about the meaning of the document's content, as opposed to how the content is displayed.

Let's take a look at an example. I am going to express the same thing, an invoice, two ways. First, let's look at a screenshot of the invoice. Figure 6.1 shows what the invoice would look like in the browser.

Figure 6.1. An invoice page in Internet Explorer.

This is what I would see as a human being browsing this Web page. What would I see if I were a computer browsing this page? I would see the underlying HTML markup. The same page in this format is shown in Listing 6.1.

Listing 6.1 The Same Invoice from Figure 6.1 Seen from a Computer's Perspective
<html>
<head>
<title>Deep Training Invoice</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<table cellSpacing="0" cellPadding="0" width="640" border="0" align="center">
  <tr>
    <td ALIGN="RIGHT">
      <table cellSpacing="1" cellPadding="1" width=640 border="0"
           align="center">
        <tr>
          <td><font face=Arial size=6><b>Deep Training</b></font></td>
        </tr>
        <tr>
          <td><font face=Arial size=3><b>123 Microsoft Way</b></font></td>
        </tr>
        <tr>
          <td><font face=Arial size=3><b>Redmond, WA 98052</b></font></td>
        </tr>
        <tr>
          <td><font face=Arial size=3><b>1-888-555-1212</b></font></td>
        </tr>
        <tr>
          <td></td>
        </tr>
      </table>
    </td>
  </tr>
  <tr>
    <td ALIGN="center">
      <table cellSpacing="1" cellPadding="1" width="99%" border="0"
           align="center">
        <tr>
          <td ALIGN="RIGHT"> <font face="Verdana" size="5">
               <b>Invoice: 159297</b></font>
          </td>
        </tr>
        <tr>
          <td align="right">
               <font face="Verdana" size="1">Date : 7/27/2001 </font>
          </td>
        </tr>
        <tr>
          <td align="right">
               <font face="Verdana" size="1">ACCOUNT : 20440 </font>
          </td>
        </tr>
      </table>
      <table cellSpacing="1" cellPadding="1" width="99%" border="0"
           align="center">
        <tr>
          <td> <BR>
          </td>
        </tr>
        <tr>
          <td>
            <table cellSpacing="1" cellPadding="1" width="99%" border="0"
                 align="center">
              <tr>
                <td align="left"> <font face="Verdana" size="-2">
                     <b>VERGENT SOFTWARE&nbsp;&nbsp;
                  </b></font></td>
                <td>
                    <BR><BR>
<BR>
                </td>
              </tr>
              <tr>
                <td align="left"> <font face="Verdana" size="-2"> <b> BILL TO
                  </b></font><BR>
                  <BR>
                </td>
                <td>
                     <font face="Verdana" size="-2"> <b> SHIP TO </b></font>
                      <BR>
                </td>
              </tr>
              <tr>
                <td><font face="Verdana" size="-2">234 Microsoft Way<BR>
                  REDMOND, WA 98053 <br>
                  </font></td>
                <td><font face="Verdana" size="-2">234 Microsoft Way<BR>
                  REDMOND, WA 98053 <br>
                  </font></td>
              </tr>
            </table>
          </td>
        </tr>
      </table>
      <BR>
      <BR>
      <table cellSpacing="1" cellPadding="1" width="99%" border="1"
           align="center">
        <tr align="center">
          <td>SHIP VIA</td>
          <td>PO</td>
          <td>SALES PERSON</td>
        </tr>
        <tr align="center" BGCOLOR="#c5c5c5">
          <td>
              <font face="Verdana" size="-2"> UPS BLUE (2 days)(23.51) </font>
          </td>
          <td><font face="Verdana" size="-2"> &nbsp; </font></td>
          <td><font face="Verdana" size="-2"> WEB &nbsp;</font> </td>
        </tr>
      </table>
      <BR>
      <table cellSpacing="1" cellPadding="1" width="99%" align="center"
           border="1" bgcolor="#eeeeee">
        <tr>
          <td align="center">COURSE</td>
          <td align="center">DESCRIPTION</td>
          <td align="center">QTY</td>
          <td align="center">PRICE</td>
          <td align="center">TOTAL</td>
        </tr>
        <tr BGCOLOR="#c5c5c5">
          <td align="left">
               <font face="Verdana" size="-2">DEEPASPNY</font>
         </td>
          <td align="left">
               <font face="Verdana" size="-2">DeepASP.NET Mini Camp</font>
          </td>
          <td align="middle">
               <font face="Verdana" size="-2">1&nbsp;</font>
          </td>
          <td align="right">
               <font face="Verdana" size="-2">399.00&nbsp;</font>
          </td>
          <td align="right">
               <font face="Verdana" size="-2">399.00&nbsp; </font>
          </td>
        </tr>
      </table>
      <table cellSpacing="1" cellPadding="1" width="99%" align="center"
            border="0">
        <tr>
          <td align="LEFT"> </td>
          <td align="right"> <font face="Verdana" size="-2"><b>SUB TOTAL :</b>
            </font> </td>
          <td align="right">
               <font face="Verdana" size="-2">$399.00 </font>
          </td>
        </tr>
        <tr>
          <td align="LEFT"> </td>
          <td align="right">
               <font face="Verdana" size="-2"><b>(Non Taxable)OTHER
               CHARGES :</b> </font>
          </td>
          <td align="right"> <font face="Verdana" size="-2">$0.00 </font></td>
        </tr>
        <tr>
          <td align="LEFT"> </td>
          <td align="right"> <font face="Verdana" size="-2"><b>DISCOUNT :</b>
            </font> </td>
          <td align="right"> <font face="Verdana" size="-2">$0.00</font> </td>
        </tr>
        <tr>
          <td align="LEFT"> </td>
          <td align="right"> <font face="Verdana" size="-2">
               <b>FREIGHT :</b> </font>
          </td>
          <td align="right"> <font face="Verdana" size="-2">$0.00 </font></td>
        </tr>
        <tr>
          <td align="LEFT"> </td>
          <td align="right">
               <font face="Verdana" size="-2"><b>TAX :</b> </font>
          </td>
          <td align="right"> <font face="Verdana" size="-2">$0.00 </font></td>
        </tr>
        <tr>
          <td align="LEFT"> </td>
          <td align="right">
               <font face="Verdana" size="-2"><b>TOTAL :</b> </font>
          </td>
          <td align="right">
               <font face="Verdana" size="-2">$399.00</font>
          </td>
        </tr>
        <tr>
          <td align="LEFT"> </td>
          <td align="right"> <font face="Verdana" size="-2"><b>PAYMENTS :</b>
            </font> </td>
          <td align="right">
               <font face="Verdana" size="-2">$399.00 </font>
          </td>
        </tr>
        <tr>
          <td align="LEFT"> </td>
          <td align="right">
               <font face="Verdana" size="-2"><b>BALANCE :</b></font>
          </td>
          <td align="right">
               <font face="Verdana" size="-2">$0.00 </font>
          </td>
        </tr>
        <tr>
          <td align="LEFT">
               <font face="Verdana" size="-2"><B>Notes:</B><BR>
               AUTH#=027731 CC#=41XX-XXXX-XXXX-1302 </font>
          </td>
        </tr>
      </table>
      <TABLE cellSpacing=1 cellPadding=2 border=1 width="80%"
           align="left">
        <TR>
          <TD ALIGN=MIDDLE>TYPE PAYMENT</TD>
          <TD align=middle>DATE</TD>
          <TD ALIGN=MIDDLE>CREDIT CARD # / CHECK #</TD>
          <TD ALIGN=MIDDLE>AMOUNT</TD>
        </TR>
        <tr bgcolor="#eeeecc">
          <td><font face="Verdana" size="-2"> CREDITCARD </font></td>
          <td> 7/27/2001 </td>
          <td>
               <font face="Verdana" size="-2"> VISA 41XX-XXXX-XXXX-1302</font>
          </td>
          <td align=right><font face="Verdana" size="-2"> $399.00 </Font></td>
        </tr>
      </TABLE>
         </td>
         </tr>
</TABLE>
<TABLE cellSpacing=1 cellPadding=1 width="75%" align=center border=1
     bgcolor="#eeeeee">
  <TR>
    <TD ALIGN="CENTER">
         <font face="Verdana" size="-2"><B>TRACKING NUMBER INFORMATION
         / SENT FROM HEADQUARTERS</B></font>
    </TD>
  </TR>
  <TR>
    <TD ALIGN="CENTER"> <font face="Verdana" size="-2"> <b>UPS</b> </font><A
HREF=http://wwwapps.ups.com/tracking/tracking.cgi?tracknum=1Z2622413545750957
       target=new>
       <B><FONT size=2>1Z2622413545750957</FONT></b></A>
    </TD>
  </TR>
</TABLE>
</body>
</html>

Look at this HTML. Without the visual formatting, it is no longer nearly as easy to pick out the various pieces. How would you find the total or the authorization code? From a machine's perspective, this is mainly gobbledygook. I could say that the total is always going to come after a text string "TOTAL :</b> </font></td><td align="right"> <font face="Verdana" size="-2">". But what happens when the developer of the page decides that the total should be shown in Helvetica? The string I am matching no longer works and my code breaks.
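The fragility described above can be sketched in a few lines of Python. This is only an illustration: the helper name `scrape_total` is hypothetical, and the two HTML fragments are simplified versions of the markup in Listing 6.1 with the whitespace collapsed onto one line so that the marker string matches.

```python
# The exact markup that is assumed to precede the total on the page.
MARKER = 'TOTAL :</b> </font></td><td align="right"> <font face="Verdana" size="-2">'


def scrape_total(html):
    """Return the text that follows MARKER, up to the next tag, or None."""
    start = html.find(MARKER)
    if start == -1:
        return None  # the markup changed, so the match fails
    start += len(MARKER)
    end = html.find("<", start)
    return html[start:end].strip()


# Simplified fragment of the invoice page (whitespace collapsed).
original = (
    '<td align="right"> <font face="Verdana" size="-2"><b>TOTAL :</b> </font></td>'
    '<td align="right"> <font face="Verdana" size="-2">$399.00</font></td>'
)

# The same fragment after the page developer switches fonts.
restyled = original.replace("Verdana", "Helvetica")

print(scrape_total(original))  # $399.00
print(scrape_total(restyled))  # None
```

A purely cosmetic change is enough to make the scraper return nothing. On the full page the same marker would also match the "SUB TOTAL :" row first, which is another way this approach can go wrong.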

How can this be extended to a services model? To create a system whereby computers communicate without human intervention, HTML isn't going to cut it. It requires something that is more concerned with representing the data in a meaningful manner instead of making it look pretty. This is where XML comes in. Let's look at a representation of the same invoice in XML. Listing 6.2 shows one way to do it. XML is explained more thoroughly in Chapter 10, "Using XML."

Listing 6.2 A Representation of the Invoice in Listing 6.1 in XML
<?xml version="1.0" encoding="utf-8" ?>
<invoice number="159297" date="07-27-2001">
    <account>20440</account>
    <company>Vergent Software</company>
    <billto>
        <address>234 Microsoft Way</address>
        <city>Redmond</city>
        <state>WA</state>
        <zip>98053</zip>
    </billto>
    <shipto>
        <address>234 Microsoft Way</address>
        <city>Redmond</city>
        <state>WA</state>
        <zip>98053</zip>
    </shipto>
    <shipvia>
        <transport>UPS Blue</transport>
        <days>2</days>
        <cost>9.00</cost>
        <tracking>1Z2622413545750957</tracking>
    </shipvia>
    <salesperson>web</salesperson>
    <items>
        <item sku="DEEPASPNY">
            <description>DeepASP.NET Mini Camp</description>
            <qty>1</qty>
            <price>399.00</price>
        </item>
        <item sku="ASPBOOK">
            <description>ASP.NET Book</description>
            <qty>1</qty>
            <price>49.95</price>
        </item>
    </items>
    <subtotal>448.95</subtotal>
    <shipping>9.00</shipping>
    <tax>0.00</tax>
    <total>457.95</total>
    <payments>457.95</payments>
    <balance>0.00</balance>
    <paymenttype>CREDITCARD</paymenttype>
    <creditcard>
        <type>VISA</type>
        <number>41XX-XXXX-XXXX-1302</number>
        <auth>027731</auth>
        <date>07-27-2001</date>
        <amount>457.95</amount>
    </creditcard>
</invoice>

Now is it clear where the total for this invoice is? It is enclosed by the <total> and </total> tags. These are tags totally unrelated to the display of the information. Their only purpose is to define where to look in the document to find the total. This makes them great candidates for string matching to pick apart the document in a standard way.
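As a rough sketch of how a program might pick the document apart, the following Python uses the standard library's `xml.etree.ElementTree` parser rather than raw string matching. The XML here is a trimmed copy of Listing 6.2, and the function names `stated_total` and `computed_total` are made up for this illustration.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Trimmed copy of the invoice from Listing 6.2 (only the fields used below).
INVOICE_XML = """<invoice number="159297">
    <items>
        <item sku="DEEPASPNY">
            <qty>1</qty>
            <price>399.00</price>
        </item>
        <item sku="ASPBOOK">
            <qty>1</qty>
            <price>49.95</price>
        </item>
    </items>
    <shipping>9.00</shipping>
    <tax>0.00</tax>
    <total>457.95</total>
</invoice>"""


def stated_total(xml_text):
    """Read the value enclosed by the <total> and </total> tags."""
    root = ET.fromstring(xml_text)
    return Decimal(root.findtext("total"))


def computed_total(xml_text):
    """Recompute the total from the line items, shipping, and tax."""
    root = ET.fromstring(xml_text)
    items = sum(
        Decimal(item.findtext("price")) * int(item.findtext("qty"))
        for item in root.iter("item")
    )
    return items + Decimal(root.findtext("shipping")) + Decimal(root.findtext("tax"))


print(stated_total(INVOICE_XML))    # 457.95
print(computed_total(INVOICE_XML))  # 457.95
```

Because the tags describe meaning rather than appearance, the same code keeps working no matter how the invoice is eventually displayed.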

Location

How do I define the location or endpoint of a page on the World Wide Web? The Web popularized the concept of a URL, or uniform resource locator. You have seen these. They are strings such as http://www.deeptraining.com/default.aspx. The URL in the preceding example is made up of several parts. A syntax-style definition of a URL is as follows:

<protocol> "://" <host> [":" <port>] [<path> ["?" <query>]]

The first part identifies the protocol. The HTTP at the beginning of the earlier example means that when accessing this URL, you should use the Hypertext Transfer Protocol. Another valid protocol identifier for most browsers is FTP, or File Transfer Protocol. Internet Explorer accepts either

file://c:\temp\invoice.htm

or

ftp://localhost/temp/invoice.htm

The second part identifies the host that contains the resource. This is permitted to contain an IP address, but in most cases, it will contain a hostname.domain.network combo such as www.deeptraining.com.

The third part is an optional port designation. If not specified, the default convention is to use port 80 for all HTTP traffic. By specifying a port, you can potentially host more than one Web server on a single IP address. This is frequently used by network address translation (NAT)-based firewalls to direct incoming traffic to Web servers behind the firewall.

The fourth part is one of the more important parts. It indicates the path to the resource. This is a standard path of the form /temp/invoice.htm. Note the forward slashes used in the path. The HTTP protocol was invented in the Unix world in which path delimiters are forward-slash characters, in contrast to the backslash characters used in the DOS/Windows world.

The last part is optional information that varies for a particular path. You have seen this when you go to a search page. You type in what you are interested in and a page is displayed with a URL like

http://www.deeptraining.com/searchresults.aspx?Query=ASP.

The ?Query=ASP part on the end is a query parameter used to pass additional information to the search results page.
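To make the parts of a URL concrete, here is a small Python sketch that splits one with the standard library's `urllib.parse` module. The helper name `describe` and the explicit port 8080 are assumptions added for illustration; the rest of the URL is the search example from the text.

```python
from urllib.parse import parse_qs, urlsplit


def describe(url):
    """Break a URL into the parts named in the syntax definition."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL does not specify a port
        "path": parts.path,
        "query": parse_qs(parts.query),
    }


# Port 8080 is added here only to show the optional port part.
info = describe("http://www.deeptraining.com:8080/searchresults.aspx?Query=ASP")
for name, value in info.items():
    print(name, "=", value)
```

When the port is omitted, `parts.port` is `None` and the client falls back to the protocol's default, which is port 80 for HTTP.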

The combination of all these parts represents a unique endpoint in the scheme of the universe. In addition, it is an endpoint that even my 8-year-old daughter can attribute some meaning to, given the ubiquity of Web usage in today's Internet-savvy world.

In a world where I want to make services available, URLs are useful to uniquely identify the location of my service. I can also potentially use the query parameters portion of the URL to optionally pass information to my service.
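Passing information to a service through the query portion of a URL might look like the following Python sketch. The host `www.example.com`, the path `/shiprate.aspx`, and the parameter names are all hypothetical; they stand in for whatever a real service would define.

```python
from urllib.parse import urlencode, urlunsplit


def service_url(host, path, params):
    """Build an HTTP URL whose query string carries the given parameters."""
    query = urlencode(params)  # escapes characters that are unsafe in a URL
    return urlunsplit(("http", host, path, query, ""))


# Hypothetical shipping-rate service and parameter names.
url = service_url(
    "www.example.com",
    "/shiprate.aspx",
    {"zip": "98053", "service": "UPS 2-day"},
)
print(url)  # http://www.example.com/shiprate.aspx?zip=98053&service=UPS+2-day
```

Note that `urlencode` replaces the space in "UPS 2-day" with a plus sign, because a raw space is not allowed in a URL.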

Advertisement

How do you find information on the wildly popular Ichiro Suzuki bobblehead doll? If you are like most people today, you fire up a Web browser and look it up. But how do you find the information? Your first try is probably to go to www.ichirosuzuki.com or perhaps even www.seattlemariners.com. If those sites don't have the information you are looking for, what is the next step? You can head to a search engine such as www.google.com and type in "Ichiro Bobblehead." In no time at all, Google will spit back dozens of matches for Web sites that have information on the latest craze to hit Safeco Field.

Let's translate this to the idea of services. I have a great Web site that I built recently to sell some of those Ichiro bobblehead dolls. When billing the customers an exorbitant amount, I want to make sure that I also charge a sufficient amount for shipping. Given the shipping address I need to send the doll to, I want to calculate how much it is going to cost to ship it, and I want to use a Web service to do this in real time. I know I am going to be shipping the dolls to eagerly waiting customers using United Parcel Service (UPS), so I need to find a service that calculates UPS 2-day rates.

My first guess is to go to www.ups.com, but I quickly determine that they don't yet offer UPS 2-day rate calculation as a Web service. How can I find out who else might? This is where a search engine analogous to Google would be valuable. As it turns out, several vendors are building directories of services that allow a developer to query them and discover trading partners that offer the services they are interested in. These directories provide a standard interface, Universal Description, Discovery, and Integration (UDDI), for the categorization of services, companies, and the schemas they use. They are accessible via a Web-based interface for you to initially find the services that will fulfill your needs. The UDDI directories also expose themselves as XML Web services so that your applications can use them dynamically as well.

After I have a reference to a server, I also need to be able to determine what services that particular server exposes to the outside world. This browsing of services is facilitated by placing an XML file, called a DISCO file, in the root of the Web server. DISCO stands for Discovery, and this XML file provides links to all the XML Web services exposed on that server.
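As a hedged illustration of what reading such a file could involve, the Python below parses a small sample discovery document and lists the service links it contains. The sample document, the `contractRef` element, and the two namespace URIs reflect my understanding of the DISCO format and should be checked against the actual specification; the `www.example.com` addresses and the function name `list_services` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for contractRef elements in a DISCO document.
SCL_NS = "http://schemas.xmlsoap.org/disco/scl/"

# Hypothetical discovery document for a server exposing one service.
SAMPLE_DISCO = """<discovery xmlns="http://schemas.xmlsoap.org/disco/">
  <contractRef xmlns="http://schemas.xmlsoap.org/disco/scl/"
      ref="http://www.example.com/shiprate.asmx?wsdl"
      docRef="http://www.example.com/shiprate.asmx" />
</discovery>"""


def list_services(disco_xml):
    """Return the 'ref' link of every contractRef element in the document."""
    root = ET.fromstring(disco_xml)
    tag = "{%s}contractRef" % SCL_NS  # ElementTree's namespace-qualified tag form
    return [element.get("ref") for element in root.iter(tag)]


print(list_services(SAMPLE_DISCO))
```

Each link returned this way would point to the description of one XML Web service on the server.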

