Selecting the XML Format

Use a Standard or Roll Your Own?

There are always benefits to not reinventing the wheel, and this rule holds true when considering schemas for common business documents. However, sad as the prospect is, the wheel you're looking for may not have been invented yet. This is one area where I will offer some fairly unqualified guidance. If you can find a schema that fits your purposes and is accepted as a standard in your community, then by all means use it. If you can't find one that fits, or if you find more than one and there is no consensus in your trading community or user base about which one is "standard," then you have a few different ways to go. If one or more schemas might work but aren't universally accepted, there are a few criteria you could use to select among them. One criterion might be the number and quality of other programs, stylesheets, sample documents, and documentation available for the schemas. Another might be whether you can use the schemas on a royalty-free basis. If you find an existing schema that is attractive from a number of these perspectives yet doesn't fully accommodate your data, it may still be a good fit if you can easily customize or extend it. However, if all else fails, don't hesitate to develop your own schema.

Choosing or creating an XML format is an important decision, but your particular choice is unlikely to be critical. If people don't like the formats you have selected, they can always perform transformations (particularly if you make it easy for them, as we'll discuss later in the chapter). Go back and review the first part of Chapter 10 on XSLT if you like; transformations are something we are going to live with for several years to come, if not forever. The rest of this section deals with general issues in designing your formats.
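To make the point about transformations concrete, here is a minimal Python sketch of a consumer mapping one format's Element names onto another's. It is not XSLT, just a plain rename using the standard library, and the Element names and the RENAMES table are hypothetical, invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from one format's Element names to another's.
RENAMES = {
    "PurchaseOrder": "Order",
    "PurchaseOrderNumber": "OrderID",
    "CustomerNumber": "BuyerID",
}

def transform(document_text, renames=RENAMES):
    """Return the document with Element names mapped to the target format."""
    root = ET.fromstring(document_text)
    for element in root.iter():
        # Names that are not in the mapping are left unchanged.
        element.tag = renames.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

source = "<PurchaseOrder><PurchaseOrderNumber>PO-1001</PurchaseOrderNumber></PurchaseOrder>"
print(transform(source))
```

A real transformation would usually also restructure or reformat data, which is where XSLT earns its keep; a rename like this only covers the simplest case.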

General Document Design Decisions

In Chapter 4 we discussed various issues regarding the design of XML documents that are independent of their schema representation. Here are a few of them again, from the perspective of designing your own documents.

  • Naming conventions: The most pragmatic advice I can give you is to use names that will be familiar to users of your application. If a field is called Purchase Order Number on the screen or a printed report, don't give it an Element name of CustomerOrderIdentifier. Issues such as use of upper and lower case, abbreviations, and word separators are matters of style. Choose a style you like, then stick to it.

  • Elements and Attributes: This one, again, is something of a religious issue. If you have defined criteria for deciding whether an item of data should be an Element or an Attribute, and if you can apply those criteria consistently, then by all means use a mix of Elements and Attributes if you wish. If you can't be consistent, your users will thank you if you stick with Elements only.

  • Structure: Should you go for a relational, nested, or some other type of structure? XML lends itself most to a nested, hierarchical structure for organizing and grouping data. Most people follow this style, but there are some exceptions. Again, your users will thank you if you make a firm design decision and apply it consistently.

  • Reuse of existing logical formats: If you currently provide a facility to import a purchase order in a flat file, using an XML document of the same logical structure may allow you to use similar processing logic when importing an XML purchase order.
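As a rough illustration of the last point, the following Python sketch reads the same purchase order header from a comma-delimited flat file and from an XML document with the same logical structure, then passes both through one shared import routine. The field names, sample values, and function names are all hypothetical, chosen only for this example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical field names, in the order they appear in the flat file.
FIELDS = ("PurchaseOrderNumber", "CustomerNumber", "OrderDate")

def read_flat_file(text):
    """Parse one comma-delimited purchase order header into a dict."""
    row = next(csv.reader(io.StringIO(text)))
    return dict(zip(FIELDS, row))

def read_xml(text):
    """Parse an XML purchase order whose child Elements mirror the flat fields."""
    root = ET.fromstring(text)
    return {name: root.findtext(name, default="") for name in FIELDS}

def import_purchase_order(order):
    """Shared processing logic: identical for either input format."""
    if not order["PurchaseOrderNumber"]:
        raise ValueError("missing purchase order number")
    return (f"Imported PO {order['PurchaseOrderNumber']} "
            f"for customer {order['CustomerNumber']}")

flat = "PO-1001,C-42,2003-05-01"
xml_doc = """<PurchaseOrder>
  <PurchaseOrderNumber>PO-1001</PurchaseOrderNumber>
  <CustomerNumber>C-42</CustomerNumber>
  <OrderDate>2003-05-01</OrderDate>
</PurchaseOrder>"""

print(import_purchase_order(read_flat_file(flat)))
print(import_purchase_order(read_xml(xml_doc)))
```

Because the Element names follow the flat file's fields one for one, only the thin reader function differs between the two formats.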

Again, this list is not exhaustive. These are some of the major decisions you'll have to make, but I'm sure there will be others. The next issue has more to do with the data that you include in a document than a particular set of design choices.

Providing Identifying Information

This is probably more of a concern for electronic commerce applications than for application integration, but it may be an issue there too depending on the situation. In the EDI world, most EDI management systems rely on one or a few specific fields in an outbound document to look up the EDI-related details about the trading partner. This is usually a customer or vendor number and is used as a key to the trading partner setup in the EDI system.

Although the world of e-commerce using XML is still evolving, things may work somewhat differently there than they have in the EDI world. Utilities that move XML around tend to treat documents as payload and not look inside them. This is unlike EDI management systems, which must examine the application data in order to transform it. The strategy you follow will depend on the particular methods or systems used for data transport. The bottom line is that you may be required to provide identifying information either within a document or by some external means such as specific file names, locations, or key values that are passed in method calls.
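The two options can be sketched in Python as follows. This is only an illustration under stated assumptions: the CustomerNumber Element, the file name pattern (key, underscore, anything, .xml), and the TRADING_PARTNERS table are hypothetical and stand in for whatever your transport or EDI system actually requires.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePath

# Hypothetical trading partner setup, keyed by customer number.
TRADING_PARTNERS = {
    "C-42": {"name": "Acme Supply", "transport": "ftp"},
}

def partner_key(document_text=None, file_name=None):
    """Return the key used to look up a trading partner.

    Prefer an identifier carried inside the document. Fall back to one
    carried externally in the file name, for transports that treat the
    document as opaque payload (assumed pattern: <key>_<anything>.xml).
    """
    if document_text is not None:
        root = ET.fromstring(document_text)
        key = (root.findtext("CustomerNumber") or "").strip()
        if key:
            return key
    if file_name is not None:
        stem = PurePath(file_name).stem  # e.g. "C-42_po1001"
        return stem.split("_", 1)[0]
    raise LookupError("no identifying information available")

doc = "<PurchaseOrder><CustomerNumber>C-42</CustomerNumber></PurchaseOrder>"
print(TRADING_PARTNERS[partner_key(document_text=doc)]["name"])
print(TRADING_PARTNERS[partner_key(file_name="outbox/C-42_po1001.xml")]["name"])
```

Either route ends at the same lookup; the design question is simply where the key travels.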

Schema Design

Schema design is a very broad and complex topic. As much as I would like to offer you my knowledge and opinions about it, I'm afraid it would qualify for another complete chapter, if not another book. I have to set an appropriate scope somewhere for a book that has already turned out longer than planned, so I'm going to point you elsewhere for details about schema design. There are several good resources listed at the end of the chapter. Beyond that I'll offer only a few general observations.

Despite what some authorities may tell you, no single technique is right for all circumstances. If you're going to design only one or a few schemas, probably any approach that provides the required validation will be adequate. This includes letting an IDE like XMLSPY or TurboXML generate a schema from a representative instance document as I discussed in Chapter 6. On the other end of the spectrum, if you are designing several documents for a fairly large system, you would be well served to take a more disciplined approach. I favor creating type library schemas containing simple and complex types that are reused in other schemas, similar to the approach discussed in Chapter 4 that was considered by X12 and OASIS. However, there are certainly other techniques.
