Mapping from Tables to an XML Document





Mapping from Tables to an XML Document

In the previous sections, we discussed conversions from an XML document to relational tables. However, conversions from tables to an XML document are often required when data is originally stored in a database as table records. The discussions in the previous sections are useful for such conversions. For example, if multiple tables represent a certain concept, such as a purchase order, it can be represented in an XML document. It is possible to map an XML document to a table one by one with links that correspond to foreign keys. However, in many cases it causes problems because it is not easy to manage multiple XML documents. If primary keys in the table are not necessary in an XML document, they can be ignored.

1 Nested and Flat Representation

Let's look at how to convert the tables shown in Figure, 11.5 and 11.6 by using the two approaches. The tables store information for purchase orders. This example is more complex than we saw before in Figure and 11.2 so that we can explain the two approaches. The tables show information about addresses, quantities, and products, respectively. The invoice_ID column in Figure is referred to as a foreign key in Figure. The productID column in Figure is also referred to from Figure.

Figure Relational Tables for Purchase Orders (1): PO_TBL

invoice_ID VARCHAR(32)

country VARCHAR(32)

name VARCHAR(128)

street VARCHAR(128)

2001-08-31-12345

US

Alice Smith

123 Maple Street

2001-08-31-12346

US

Bob Miller

1365 Tree Street

Figure Relational Tables for Purchase Orders (2): ITEM_TBL

itemID VARCHAR(32)

productID VARCHAR(128)

qty INTEGER

invoice_ID VARCHAR(32)

000001

ibm0010

2

2001-08-31-12345

000002

ibm0011

1

2001-08-31-12345

000003

ibm0010

8

2001-08-31-12346

000004

ibm0011

10

2001-08-31-12346

Figure Relational Tables for Purchase Orders (3): PRODUCT_TBL

ProductID VARCHAR(32)

Name VARCHAR(128)

Company VARCHAR(64)

ibm0010

ThinkPad X21

IBM

ibm0011

ThinkPad T22

IBM

Let's look at how to convert from the tables to an XML document by using the two methods.

Listing 11.2 shows a list of purchase orders with the nesting method. In this example, product information is represented with the product element, which is a child element of an item element. The structure is quite natural, and it is easy for humans to read. However, it is redundant because the same product information appears in multiple purchase orders. Relationships between invoices and products use many-to-many mapping, so using the nesting method may not be the best solution.

Listing 11.2 Constructed XML document (nesting method), chap11/po2.xml
<?xml version="1.0"?>
<purchaseOrderList
  <invoice invoiceNo="2001-08-31-12345">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
    </shipTo>
    <items>
      <item qty="2">
        <product>
          <name>ThinkPad X21</name>
        </product>
      </item>
      <item qty="1">
        <product>
          <name>ThinkPad T22</name>
        </product>
      </item>
    </items>
  </invoice>
  <invoice invoiceNo="2001-08-31-12346">
    <shipTo country="US">
      <name>Bob Miller</name>
      <street>1365 Tree Street</street>
    </shipTo>
    <items>
      <item qty="8">
        <product>
          <name>ThinkPad X21</name>
        </product>
      </item>
      <item qty="10">
        <product>
          <name>ThinkPad T22</name>
        </product>
      </item>
    </items>
  </invoice>
</purchaseOrderList>

Listing 11.3 shows an XML document using the flat method. Information for each product appears once in the document. Multiple invoices refer to the information by using the ID and IDREF type attributes. The product element has the productID attribute, which is referred to by the productIDRef attribute of an item element. Multiple item elements can share product information. The flat method has a compact representation unlike the nesting approach. However, to get product information for an item, you should write code to relate the ID and IDREF type attributes.

Listing 11.3 Constructed XML document (flat method), chap11/po3.xml
<?xml version="1.0"?>
<purchaseOrderList>
  <invoice invoiceNo="2001-08-31-12345">
    <shipTo country="US">
        <name>Alice Smith</name>
        <street>123 Maple Street</street>
    </shipTo>
    <items>
      <item qty="2" productIDRef="ibm0011"/>
      <item qty="1" productIDRef="ibm0011"/>
    </items>
  </invoice>
  <invoice invoiceNo="2001-08-31-12346">
    <shipTo country="US">
        <name>Bob Miller</name>
        <street>1365 Tree Street</street>
    </shipTo>
    <items>
      <item qty="8" productIDRef="ibm0010"/>
      <item qty="10" productIDRef="ibm0011"/>
    </items>
  </invoice>
  <products>
    <product productID="ibm0010">
      <name>ThinkPad X21</name>
    </product>
    <product productID="ibm0011">
      <name>ThinkPad T22</name>
    </product>
  </products>
</purchaseOrderList>

We introduced the nesting and flat methods for representing data stored in relational tables. You can choose the appropriate approach according to the original table data. The book titled Professional XML Database, by Kevin Williams et al. (Wrox Press, ISBN 1-86003-58-7) shows a set of many useful rules for mapping from tables to an XML document. In that book, the two approaches are called the containment and pointered approaches.

You can convert an XML document to another XML document by using XSLT. In the first edition of this book, we first created an XML document that contained row elements that had multiple col subelements, which is a very naive representation of a table. The XML document was next converted to an HTML document. It is possible to get a non-HTML document—for example, a PDF document.

2 Element versus Attribute Representation

The next design point is how to represent data by using an attribute or an element. For example, we can use an element to represent a product name as follows:

<!-- Represents a product name as an element -->
<product productID="ibm0010">
  <name>ThinkPad X21</name>
</product>

And it is also possible to represent it by using an attribute as follows:

<!-- Represents a product name as an attribute -->
<product productID="ibm0010" name="ThinkPad X21"/>

There have been long discussions on this topic before XML was created. You can see a survey of the discussions, edited by Robin Cover, at http://xml.coverpages.org/elementsAndAttrs.html.

For example, one says we should use an element if the data represents an essential part of a whole concept. Another says it depends on whether the data represents content or markup. Yet another says we should employ the attribute approach, because it reflects the characteristics of tables (for example, a list of attributes is unordered), and it may be efficient.

There is no general answer that is better than another. It should depend on the semantics of the data. Two attributes may be related so that it is natural to represent them in a structured way. Therefore, when you design an XML schema for a given table schema, you should consider multiple viewpoints, such as data modeling, efficiency, and the applications that receive the XML documents.


     Python   SQL   Java   php   Perl 
     game development   web development   internet   *nix   graphics   hardware 
     telecommunications   C++ 
     Flash   Active Directory   Windows