How to Use This Book
How you use this book depends very much on what you want out of it. As I said earlier, the book is primarily focused on the requirements of two distinct sets of readers: technical end users and developers. I've attempted to structure the book so that both groups can easily use it. I'll first cover how the chapters are organized, then I'll suggest how someone from either group might want to approach a chapter.
Some chapters (like this one) are mostly general text that might be relevant to the needs of both groups. These are organized in a fashion appropriate to the topic.
All chapters contain a section of references or of resources (sometimes both). These provide information on references cited in the text and suggestions for further study. I generally try to be as specific with resource URLs as I can, but in many cases I can give only general pointers. Some Web sites are reorganized so frequently that links to specific internal pages quickly become invalid. In such cases the referenced URL should have either a link to the desired resource or a search facility to help you find it.
The chapters that present utilities are organized slightly differently. Here's the general organization by sections. (Note: This is the general organization; some chapters may not have each of these sections.)
Requirements:
A description of what the utility is supposed to do, including general specifications of the input, processing, output, parameters, and restrictions.
Running the Utility:
A guide to running the utility from the command line, including description of the arguments and options.
Sample Input and Output:
A review of sample input and output to help further illustrate the utility's functionality.
Design:
The design of the utility. A high- to mid-level design overview is presented in text, with a more detailed pseudocode description that highlights DOM usage. Chapters 6 through 9, in which more complex utilities are developed, break the design section into two parts. The first deals with high-level design or overall design considerations, while the second deals with detail design. The detail sections are not intended to be read front to back (unless you happen to need help falling asleep); they are included for reference.
Java Implementation:
Highlights of the Java implementation of the code, with listings of important code fragments and explanation.
C++ Implementation:
Highlights of the C++ implementation of the code, with listings of important code fragments and explanation.
Enhancements and Alternatives:
Suggestions for how the utility might be enhanced plus discussion of different design and coding approaches.
End users will probably use the utilities chapters differently than developers will, as described in the next section.
Notes for Primary Audiences
Technical End Users
If you're a technical end user, you probably care mostly about what a utility does and how to run it. Therefore, scan the beginning of the chapter and review the requirements. If the utility provides what you need, look at how to run it. You can skip the rest of the chapter.
Developers
If you're a developer, you probably care less about what a utility does than about how it does it. Scan the beginning of the chapter, review the requirements, and study the design. Then look at the Java or C++ implementation, depending on your language choice. You'll probably also find the enhancement suggestions interesting. You can find out how to download the source code by referring to the book's Web site, and you can get more information on related topics from the Resources section.
Other than those specific suggestions, I'll offer only this: This book is meant to be a tool kit. I don't expect anyone to read it cover to cover. Use what you need, scan what looks useful for the future, and ignore the rest.
To aid you in exploring this tool kit, here is a chapter-by-chapter guide to what you'll find in it.
Chapter 1, Introduction:
You're reading it.
Chapter 2, Converting XML to CSV:
This chapter presents a very basic approach for converting an XML document to a CSV format file. I start with this topic for two reasons: (1) CSV is still the most common universal format used by desktop applications, and (2) reading an XML document is a bit easier than writing one, so we'll do the easy stuff first. This chapter covers the basic techniques for reading and parsing XML files using the DOM, so it is a foundation chapter for developers. The example file used is a generic address book listing, since it is most amenable to the type of processing performed by this utility.
Chapter 3, Converting CSV to XML:
This is the converse of the preceding chapter. Here we take a CSV file as input and convert it to an XML document. It is also a foundation chapter for developers since it deals with the basic techniques for creating an XML document in memory using the DOM and writing it out to disk. I again use the generic address book listing as an example.
Chapter 4, Learning to Read XML Schemas:
This chapter is intended primarily for people who are going to be reading schemas, not writing them (or, at most, writing fairly simple schemas). The W3C XML Schema Recommendation is pretty abstruse, and there are some big, fat (and good) books out now that cover it in gory detail. (My Web site, http://www.rawlinsecconsulting.com, has recommendations for a few.) I cover here the basics of what you need to know, going over the essential features and a few different approaches to schema design that you're likely to encounter. This chapter is important because many of the remaining chapters use schemas.
Chapter 5, Validating against Schemas:
In this chapter the utilities developed in Chapters 2 and 3 are modified to support validating an XML instance document against a schema on input and validating it against the schema before writing it out. Various examples of validation failures are presented with the types of error messages that the different APIs offer.
Chapter 6, Refining the Design:
The utilities presented in Chapters 2 and 3 are fairly basic in functionality and simple in design and coding. The other utilities developed in the book have more sophisticated functionality and therefore require more complex designs and coding. This chapter lays the foundation for these utilities by discussing several issues related to processing XML documents and non-XML grammars and presenting the base classes used by the utilities developed in Chapters 7, 8, and 9.
Chapter 7, Converting CSV Files to and from XML, Revisited:
This chapter builds on the foundation of Chapter 6 and develops CSV conversion utilities that are more capable than those developed in Chapters 2 and 3. For example, XML works best when a disk file contains a single XML business document. However, most batch imports and extracts in CSV format bundle many separate business documents into one disk file. These utilities combine many XML documents into one CSV file and split one CSV file into many XML documents. The concept of driving the conversion with a parameter file, coded as an XML document, is presented in this chapter. For the XML to CSV conversion, a purchase order is used as the sample document since small and medium-sized businesses commonly receive such documents from larger customers. The CSV to XML conversion uses an invoice as the sample document. In this fashion we deal with the two most basic documents used in the procurement cycle.
Chapter 8, Converting Flat Files to and from XML:
Most common CSV files have a uniform format: every row has the same layout. Flat files are not so simple. Each document usually has at least a header record, at least one type of detail record, and often a trailer or summary record. They may have many, many more types of records. They therefore require more complicated processing than CSV files. As with our CSV utilities, an XML file describes the structure of the flat file. We'll again use a purchase order as the sample document for converting from XML and an invoice as the sample document for converting from a flat file.
Chapter 9, Converting EDI to and from XML:
This chapter presents the most complex type of conversion we'll cover, between XML and EDI formats. The grammar of EDI syntaxes is analyzed, and algorithms appropriate for processing EDI are presented. This chapter also covers other issues around EDI, such as processing Functional Acknowledgments and handling control numbers, and the preliminary functionality needed to support these requirements.
Chapter 10, Converting from One XML Format to Another with XSLT:
This chapter presents the basics you need to know about using XSLT to transform a document from one XML format to another XML format. It covers the most commonly used features as well as a few things to watch out for. This chapter is targeted primarily at technical end users, though developers need to be aware of what XSLT can do so they can design around it.
Chapter 11, Using the Conversion Techniques Together:
This chapter presents a few use cases and simple script examples for using the utilities together to solve some common conversion problems. It also presents the requirements and high-level design for the initial version of the Babel Blaster open source EDI/EC/EAI file conversion program. Babel Blaster builds on the utilities presented in the previous chapters to develop a comprehensive system for solving many types of file conversion problems.
Chapter 12, Building XML Support into a Business Application:
The preceding chapters presented various techniques for reading and writing XML documents. However, other issues need to be addressed before deciding on an overall approach to XML-enabling an application. This chapter discusses issues such as integrating with existing code in the least disruptive manner, selecting data for import or export, deciding whether and what to validate, choosing design options other than the DOM, and processing data in other formats (such as relational databases).
Chapter 13, Security, Transport, Packaging, and Other Issues:
Having the data in XML isn't enough. You also have to be able to get it to and from your trading partners. They may want to receive and send the data over a private Value Added Network (VAN) if they are using EDI, but more than likely they will want to use the public Internet. To do that you'll need ways to package several XML documents into one bundle (or unpack them) and handle security. This chapter addresses these other B2B considerations. I'll present a basic approach for assessing your real needs in these areas and suggest some practical strategies for supporting those needs without getting in over your head with bleeding-edge technology.
To help make it clear what you're looking at, I adopted some formatting conventions in this book as shown below.
Fragments of source code in Java, C++, pseudocode, and various types of XML syntax:
Source Code Program
// This is a C++ program
Single lines of code:
myDoc = new DocBuilder;
Fragments of non-XML files:
Doe,John,12 Lee Street,Boston,MA,01303
Command line program execution:
java MyProgram input output -a argument
The first time an important term is used it appears in bold font, for example, pipe and filter. The first time an acronym is used it appears in italics, for example, W3C, accompanied by the full name set in regular font. In some cases, important acronyms, for example, XSLT, first appear in both bold and italic.
Several of the terms used in this book have different meanings, depending on the context. For example, we can talk about an element's attributes in an XML instance document, or when discussing the DOM we can talk about the attributes (or properties) defined for any of its interfaces, including a DOM element. To help keep things straight, whenever I refer to a named DOM entity I will capitalize the term. In addition, Elements and Attributes discussed in the context of instance documents will always be capitalized.
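As a rough illustration of the two senses of "attribute" described above, here is a minimal Java sketch. It assumes the standard JAXP parser that ships with the JDK; the class name AttributeDemo, the parse helper, and the sample Address document are made up for this example and are not taken from the book's utilities.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name and the sample document are illustrative only.
class AttributeDemo {

    // Parse an XML string into a DOM Document using the standard JAXP API.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Wrap checked exceptions to keep the sketch short.
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Document doc =
            parse("<Address type=\"home\"><City>Boston</City></Address>");
        Element root = doc.getDocumentElement();

        // An Attribute of the Element as coded in the instance document.
        System.out.println(root.getAttribute("type"));  // prints "home"

        // An attribute (property) of the DOM Element interface itself.
        System.out.println(root.getTagName());          // prints "Address"
    }
}
```

The first call reads data that an author put into the instance document; the second reads a property that the DOM defines for every Element, whatever the document contains.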