Building a System: Babel Blaster

All the utilities in the book were designed so they could be used stand-alone as filters either by themselves or in combination with other filters. However, they were also designed so they could be built into a more comprehensive system. To develop that system the code from the examples has gone into an open source project being conducted under the GNU General Public License at As with any open source project run by volunteers, the requirements, what gets done, and when (or if) are all subject to who gets involved and how much time they can contribute. I'll outline here my proposals for the requirements for versions 1.0 and 1.1 and sketch out a few ideas about the overall architecture. If you want to find out the current status or help with the project, send me an e-mail ([email protected]).

Why "Babel Blaster"?

I am a great fan of the Hitchhiker's Guide to the Galaxy series written by the late Douglas Adams. In that series people were able to understand each of the various galactic languages by inserting a small yellow Babel fish into one of their ears. As this project got started I thought, what a great name for a file format converter! Unfortunately, a few other people also thought it was a good name and have used it for purposes too similar to what I had in mind. I thought about it a bit more and remembered my older brother's great delight in Zaphod Beeblebrox and his famous drink, the Pan Galactic Gargle Blaster. And I thought, that's it: Babel Blaster!

It conveys the idea of blasting away the babble of different document formats. And it has just the right kind of attitude.

Version 1.0 Requirements

I talked in Chapter 1 about functional and nonfunctional requirements. Let's approach Babel Blaster the same way.

Functional Requirements

We discussed enhancements in several chapters. Here is some of the functionality I propose for version 1.0.

  • Support a series of transformations rather than a single transformation step.

  • Support CSV files with multiple record formats and record groups.

  • Add support for other data types, primarily different date formats.

  • Support processing a directory of legacy format files rather than just a single file.

  • Fully support ANSI X12 EDI syntax features up through version 004060.

  • Fully support UN/EDIFACT EDI in both ISO 9735 versions (2 and 4).

  • Track Functional Acknowledgments received for transmitted data and provide other auditing and tracking capabilities.

  • Enable processing of a transmission file that contains several EDI interchanges.

  • Provide a relatively easy way to automatically create file description documents (or the version 1.0 equivalent) specifying the grammar of EDI messages.
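To make the multiple-interchange requirement a little more concrete, here is a minimal Java sketch. It splits a transmission file into individual X12 interchanges before each one is handed to a converter. InterchangeSplitter is a hypothetical name, not one of the book's classes.

The sketch relies on the X12 rule that the ISA segment is fixed length (106 characters), with the element separator in the fourth position and the segment terminator in the last. It also makes two assumptions:

  • every interchange ends with an IEA segment;

  • UN/EDIFACT (UNB/UNZ) input is ignored entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for splitting a multi-interchange X12 transmission.
class InterchangeSplitter {
    // The ISA segment is fixed length: 106 characters including its terminator.
    private static final int ISA_LENGTH = 106;

    // Returns one string per interchange, each running from "ISA" through the IEA terminator.
    static List<String> split(String data) {
        List<String> interchanges = new ArrayList<>();
        int start = data.indexOf("ISA");
        while (start >= 0) {
            if (start + ISA_LENGTH > data.length()) {
                throw new IllegalArgumentException("truncated ISA segment at " + start);
            }
            char elementSep = data.charAt(start + 3);               // 4th character of ISA
            char segmentTerm = data.charAt(start + ISA_LENGTH - 1); // 106th character
            int end = findIeaEnd(data, start + ISA_LENGTH, elementSep, segmentTerm);
            if (end < 0) {
                throw new IllegalArgumentException("no IEA for interchange at " + start);
            }
            interchanges.add(data.substring(start, end));
            start = data.indexOf("ISA", end);                       // next interchange, if any
        }
        return interchanges;
    }

    // Walks segment by segment; returns the index just past the IEA terminator, or -1.
    private static int findIeaEnd(String data, int from, char elementSep, char segmentTerm) {
        String ieaTag = "IEA" + elementSep;
        int segStart = from;
        while (segStart < data.length()) {
            int segEnd = data.indexOf(segmentTerm, segStart);
            if (segEnd < 0) {
                return -1;
            }
            // strip() tolerates the line breaks many senders add after each terminator
            if (data.substring(segStart, segEnd).strip().startsWith(ieaTag)) {
                return segEnd + 1;
            }
            segStart = segEnd + 1;
        }
        return -1;
    }
}
```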

Nonfunctional Requirements

Again, nonfunctional requirements of the system relate not to what the system does but to how it does it. These aspects deal with attributes of the system or its use such as cost, performance, and ease of use. For version 1.0 we'll keep the requirements I laid out in Chapter 1 such as maintainability and portability and add a few other important ones.

  • Ensure backward compatibility: Here we're mainly concerned with the file description documents and other components that end users develop to use the stand-alone utilities developed in the book. The system must provide either direct backward compatibility or easy migration utilities.

  • Migrate the C++ version to an open DOM API: I used MSXML for the book because I wanted to address DOM programming for people using C++. However, because MSXML is based on COM, it poses certain complexities. I also believe that if the project is moving to an open source model, it shouldn't rely on APIs that aren't open.

  • Improve ease of use: I don't find the utilities particularly hard to use, but I'm sure that improvements could be made. This is almost always a persistent requirement when upgrading a product, and the list seems incomplete without it.

Performance improvement is a nongoal for this version.

This seems to me a fairly comprehensive list. Getting all of these features into version 1.0 will depend on having enough volunteers involved. If you like to code this kind of thing, talk to me!

Architectural Overview

Even though I tried very hard to build these utilities using reusable modules, as we start looking at a larger architecture a few things may have to change. As any experienced designer knows, the shape of individual components can change as the overall system in which they are used changes. I'm sure that the lower-level classes will be okay, but the converter classes may need some adjustments. It is prudent to defer major architecture changes until after requirements have been finalized, so I'm not going to try to sketch out even a block diagram at this point. However, we can think about some general characteristics of the architecture.

To facilitate ease of use we can provide users with a single program to invoke. We can further simplify use by letting them specify the name of the input and a single identifier for the type of conversion to perform. This identifier would point to a set of data in a data store that would contain the file description document specification (or equivalent) and other relevant details. With EDI formats as the input, all that might be necessary is a generic identifier for the EDI syntax since EDI interchanges have a lot of identifying information in them.
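The single-program idea might look something like the following Java sketch. It is a sketch under assumptions only:

  • ConversionSpec, ConversionStore, and BabelBlasterMain are hypothetical names.

  • The data store is an in-memory map, standing in for whatever XML document or database the project settles on.

  • The identifier, file description document name, and step names in the example entry are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical record of what the data store might hold for one conversion type.
record ConversionSpec(String id, String descriptionDocument, List<String> steps) {}

// Hypothetical stand-in for the data store; a real one might be read from an XML file.
class ConversionStore {
    private final Map<String, ConversionSpec> specs;

    ConversionStore(Map<String, ConversionSpec> specs) {
        this.specs = Map.copyOf(specs);
    }

    // Looks up the details for a short conversion identifier supplied by the user.
    Optional<ConversionSpec> lookup(String conversionId) {
        return Optional.ofNullable(specs.get(conversionId));
    }
}

class BabelBlasterMain {
    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: babelblaster <input-file> <conversion-id>");
            System.exit(2);
        }
        // Illustrative entry only; the names are hypothetical.
        ConversionStore store = new ConversionStore(Map.of(
            "po-csv", new ConversionSpec("po-csv", "PurchaseOrderCSV.xml",
                List.of("CSVToXML", "po.xsl", "XMLToX12"))));
        ConversionSpec spec = store.lookup(args[1]).orElseThrow(
            () -> new IllegalArgumentException("unknown conversion id: " + args[1]));
        System.out.println("Converting " + args[0] + " using " + spec.descriptionDocument()
            + " with steps " + spec.steps());
    }
}
```

The user types only two things: the input name and the identifier. Everything else about the conversion lives in the data store, which is where the ease-of-use gain comes from.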

The main program would retrieve the details about the first transformation step from the data store and execute the appropriate filter. Then, based on the data store's details for each succeeding transformation step, it would invoke the appropriate filters and write the final output. Output location could be determined either from information retrieved from the data store or from a command line argument. Allowing a user-specified system-level command for either the initial or final step would provide a way to invoke file transmission utilities.
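A sketch of the main program's step loop follows. To keep it short, each filter is modeled as a function from one document (a String here) to the next, rather than as one of the book's file-based utilities. PipelineRunner and runCommand are hypothetical names. ProcessBuilder is the standard Java way to run the optional system-level command, for example a file transmission utility.

```java
import java.io.IOException;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical driver: runs the steps named in the data store, in order.
class PipelineRunner {
    private final Map<String, UnaryOperator<String>> filters;

    PipelineRunner(Map<String, UnaryOperator<String>> filters) {
        this.filters = Map.copyOf(filters);
    }

    // Feeds each step's output to the next step and returns the final output.
    String run(List<String> steps, String input) {
        String current = input;
        for (String step : steps) {
            UnaryOperator<String> filter = filters.get(step);
            if (filter == null) {
                throw new IllegalStateException("no filter registered for step " + step);
            }
            current = filter.apply(current);
        }
        return current;
    }

    // Runs an optional user-specified system command and returns its exit status.
    static int runCommand(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```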

The overall architectural style remains pipe and filter, but the individual filters retain an object-oriented design. This means that the main program creates converter objects of the appropriate classes as it needs them.
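One way for the main program to create converter objects of the appropriate classes on demand is reflection on a class name retrieved from the data store. DocumentConverter, ConverterFactory, and the two toy converters below are hypothetical. The book's converter classes would need a common interface or an adapter for this approach to work.

```java
import java.lang.reflect.InvocationTargetException;

// Hypothetical common interface that each converter class would implement.
interface DocumentConverter {
    String convert(String input);
}

// Toy converters standing in for real ones such as a CSV-to-XML converter.
class UpperCaseConverter implements DocumentConverter {
    public String convert(String input) { return input.toUpperCase(); }
}

class ReverseConverter implements DocumentConverter {
    public String convert(String input) { return new StringBuilder(input).reverse().toString(); }
}

class ConverterFactory {
    // Creates a new converter instance from a fully qualified class name.
    static DocumentConverter create(String className) {
        try {
            Class<?> cls = Class.forName(className);
            if (!DocumentConverter.class.isAssignableFrom(cls)) {
                throw new IllegalArgumentException(className + " is not a DocumentConverter");
            }
            return (DocumentConverter) cls.getDeclaredConstructor().newInstance();
        } catch (ClassNotFoundException | NoSuchMethodException | InstantiationException
                 | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalArgumentException("cannot create converter " + className, e);
        }
    }
}
```

Because the class name comes from the data store, adding a new kind of conversion means registering a class rather than changing the main program.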

Trading Partner/Application Information

In Chapter 9 we identified a need for a data store for EDI control information. In the current version of the XMLToX12 utility, the data store is used only for generating sequential control numbers for interchanges and functional groups. This XML document could be expanded to include audit entries for interchanges and groups that are transmitted. It could also be used for tracking Functional Acknowledgments received. Because it is an XML document, creating formatted reports, displaying it as a Web page, or developing simple queries against it would be fairly easy with standard tools.
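As a sketch of how that control data store might grow, the following code uses the standard Java DOM API for two jobs: handing out control numbers per trading partner, and tracking functional groups awaiting acknowledgment.

The element and attribute names (ControlStore, Partner, SentGroup, and so on) are hypothetical and are not the book's document layout. The wrap-around rule is also an assumption. The audit entries are kept at the group level because X12 Functional Acknowledgments (997) identify the functional groups they acknowledge by group control number.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical wrapper around an XML control document.
class ControlStore {
    private final Document doc;

    // Wraps an already parsed control document.
    ControlStore(Document doc) { this.doc = doc; }

    static ControlStore createEmpty() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            doc.appendChild(doc.createElement("ControlStore"));
            return new ControlStore(doc);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Finds the <Partner ID="..."> element, creating it on first use.
    private Element partner(String id) {
        NodeList partners = doc.getElementsByTagName("Partner");
        for (int i = 0; i < partners.getLength(); i++) {
            Element e = (Element) partners.item(i);
            if (id.equals(e.getAttribute("ID"))) {
                return e;
            }
        }
        Element e = doc.createElement("Partner");
        e.setAttribute("ID", id);
        doc.getDocumentElement().appendChild(e);
        return e;
    }

    // kind is "Interchange" or "Group"; the result is zero-padded to nine digits (the ISA13 width).
    String nextControlNumber(String partnerId, String kind) {
        Element p = partner(partnerId);
        String attr = "Last" + kind;
        String last = p.getAttribute(attr);              // "" if the attribute is absent
        long next = last.isEmpty() ? 1 : Long.parseLong(last) + 1;
        if (next > 999_999_999L) {
            next = 1;                                     // assumed wrap-around policy
        }
        p.setAttribute(attr, Long.toString(next));
        return String.format("%09d", next);
    }

    // Audit entry for a transmitted functional group, pending its acknowledgment.
    void recordSentGroup(String partnerId, String groupControlNumber) {
        Element g = doc.createElement("SentGroup");
        g.setAttribute("Control", groupControlNumber);
        g.setAttribute("Status", "AwaitingFA");
        partner(partnerId).appendChild(g);
    }

    // Marks a sent group as acknowledged; returns false if no such group was recorded.
    boolean acknowledgeGroup(String partnerId, String groupControlNumber) {
        NodeList sent = partner(partnerId).getElementsByTagName("SentGroup");
        for (int i = 0; i < sent.getLength(); i++) {
            Element g = (Element) sent.item(i);
            if (groupControlNumber.equals(g.getAttribute("Control"))) {
                g.setAttribute("Status", "Acknowledged");
                return true;
            }
        }
        return false;
    }

    Document document() { return doc; }
}
```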

In addition, we can observe that while the grammars of our various file formats are similar, there are some fundamental differences between processing CSV and flat files and processing EDI. For EDI we can conceivably have one generic grammar that covers all uses of an X12 transaction set or UN/EDIFACT message. However, populating the control segments in outbound interchanges requires a different set of information for each trading partner. It makes sense to set up a data store for EDI trading partner information and to break out the EDI grammars into separate documents. Similarly, for ease of use we could set up entries in this data store, keyed off of short identifiers, that retrieve the file description documents for other legacy formats.
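Separating trading partner data from grammars might look like the following sketch. TradingPartner and TradingPartnerStore are hypothetical names, and the grammar document name in the test is illustrative.

The method only shows how partner data could populate ISA05 through ISA08, not a complete ISA segment. Those elements are the sender and receiver ID qualifiers and IDs, and each ID is a fixed 15-character field.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-partner record; the grammar document is kept separate from partner data.
record TradingPartner(String shortId, String senderQualifier, String senderId,
                      String receiverQualifier, String receiverId, String grammarDocument) {}

class TradingPartnerStore {
    private final Map<String, TradingPartner> byShortId = new HashMap<>();

    void add(TradingPartner partner) {
        byShortId.put(partner.shortId(), partner);
    }

    TradingPartner get(String shortId) {
        TradingPartner partner = byShortId.get(shortId);
        if (partner == null) {
            throw new IllegalArgumentException("unknown trading partner " + shortId);
        }
        return partner;
    }

    // Builds ISA05..ISA08; ISA06 and ISA08 are padded with spaces to 15 characters.
    static String isaPartnerElements(TradingPartner p, char elementSeparator) {
        return p.senderQualifier() + elementSeparator
            + pad15(p.senderId()) + elementSeparator
            + p.receiverQualifier() + elementSeparator
            + pad15(p.receiverId());
    }

    private static String pad15(String id) {
        if (id.length() > 15) {
            throw new IllegalArgumentException("ID longer than 15 characters: " + id);
        }
        return String.format("%-15s", id);
    }
}
```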

Linking Pipes and Filters

While we're still going to use pipes and filters for the general architectural style, we have at least two choices for the type of pipe. The current utilities use the file system as a pipe. For simplicity, version 1.0 could continue to use the file system and simply fork child processes to execute stand-alone filters as necessary. However, all our conversion utilities are coded to work with DOM documents. If we're passing a document to another component that can accept a DOM document as input, there's no reason to write it out to disk and read it back in again. This is no problem for the converters and presents only a minor problem for XSLT transformations. We'll probably want to make direct calls to Xalan (Java or C++) from the main program rather than fork a child process.
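Passing a DOM document straight to an XSLT processor is straightforward with the standard JAXP transformation API, which Xalan-Java implements. The sketch below (InMemoryXslt is a hypothetical name) applies a stylesheet to a DOM document and receives the result as a new DOM document, with no intermediate file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class InMemoryXslt {
    // Transforms a DOM document in memory and returns the result as a new DOM document.
    static Document transform(Document input, String stylesheetXml) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(stylesheetXml)));
        DOMResult result = new DOMResult();  // no node supplied, so the processor creates a Document
        transformer.transform(new DOMSource(input), result);
        return (Document) result.getNode();
    }

    // Parses XML text into a namespace-aware DOM, which XSLT processors expect.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }
}
```

In a real pipeline the stylesheet would probably be compiled once with TransformerFactory.newTemplates and reused, rather than recompiled for every document.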

Version 1.1 Requirements

Functional Requirements

I don't have many ideas at this point, but a couple come to mind.

  • Native data exchange capability: Performing file format conversion is essential to many electronic commerce scenarios. However, it is worthless unless you can also move the data around. Building in a native capability for transmitting and receiving data would save users from having to do their own integration with a suitable utility. The top candidates for file transfer and packaging protocols might be FTP, HTTP, SMTP, S/MIME, ebXML messaging, EDIINT AS1/AS2, or SOAP (more about these in Chapter 13). The primary network protocol that should be supported is IP (for the public Internet). However, there may also be a need in some situations to support other network protocols using dial-up access. Integrating Babel Blaster with one or more existing open source utilities that provide these capabilities could save considerable development effort. Alternatively, we could provide links to proprietary systems or investigate building the appropriate functionality from scratch.

  • Conversion of legacy IBM data formats: I put this one off until version 1.1 because I'm not sure of the demand and because of its complexity. Converting between EBCDIC and ASCII isn't too bad, but supporting all the numeric formats can be fairly involved.
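To give a feel for the IBM data formats item, here is a Java sketch of both pieces. IbmData is a hypothetical name.

  • The EBCDIC part assumes code page 037 (US/Canada), available in standard JDKs as the IBM037 charset. Other code pages would need other charsets.

  • The numeric part decodes only one of the IBM formats, packed decimal (COBOL COMP-3). Each byte holds two decimal digits, and the final low-order nibble holds the sign.

  • Zoned decimal and binary fields are not covered.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.charset.Charset;

class IbmData {
    // Assumed code page: EBCDIC 037 (US/Canada).
    private static final Charset EBCDIC = Charset.forName("IBM037");

    static String ebcdicToString(byte[] field) {
        return new String(field, EBCDIC);
    }

    // Decodes a packed decimal field; scale is the number of implied decimal places.
    static BigDecimal unpack(byte[] field, int scale) {
        BigInteger value = BigInteger.ZERO;
        for (int i = 0; i < field.length; i++) {
            int high = (field[i] >> 4) & 0x0F;
            int low = field[i] & 0x0F;
            value = value.multiply(BigInteger.TEN).add(BigInteger.valueOf(digit(high)));
            if (i < field.length - 1) {
                value = value.multiply(BigInteger.TEN).add(BigInteger.valueOf(digit(low)));
            } else if (low == 0x0D || low == 0x0B) {
                value = value.negate();                  // D and B mean negative
            } else if (low != 0x0C && low != 0x0F && low != 0x0A && low != 0x0E) {
                throw new IllegalArgumentException("invalid sign nibble: " + low);
            }
        }
        return new BigDecimal(value, scale);
    }

    private static int digit(int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("invalid digit nibble: " + nibble);
        }
        return nibble;
    }
}
```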

Nonfunctional Requirements

We have avoided much emphasis on performance because we wanted to focus on maintainability, understandability, and functionality. However, I think if Babel Blaster gets to version 1.1 we might want to look at performance and ease of use.

  • Performance: In several areas the coding and storage techniques might be made more efficient by a few very localized changes. I'm most concerned with byte-to-string conversions in Java and with all the dynamic creation and destruction of DataCell objects in both implementations. While the design depends on this behavior, we might make it more efficient by statically allocating buffers for the DataCell contents and pointing to them, rather than asking the system to reallocate them with each new DataCell object.

  • Ease of use: I'm not quite sure what we can do here yet, but I'm sure there will be something. The book utilities and version 1.0 rely on command line programs and XML IDEs like XMLSPY and TurboXML for everything else. Perhaps a nice graphical user interface for the front end?
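For the performance item, here is one possible shape for the buffer idea, sketched in Java. It is not the book's DataCell class: FieldBuffer is a hypothetical name, and the sketch assumes single-byte ISO-8859-1 input.

It decodes fields into one preallocated CharBuffer with a reusable CharsetDecoder. It then returns views that share that storage, so no new String or character array is created per field (a small view object still is).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical shared buffer for field contents; assumes ISO-8859-1 input.
class FieldBuffer {
    private final CharsetDecoder decoder = StandardCharsets.ISO_8859_1.newDecoder();
    private final CharBuffer chars;

    FieldBuffer(int capacity) {
        chars = CharBuffer.allocate(capacity);  // allocated once, reused for every record
    }

    // Starts a new record; previously returned views are overwritten by later decodes.
    void reset() {
        chars.clear();
    }

    // Decodes one field into the shared buffer and returns a view over just those characters.
    CharSequence decode(byte[] record, int offset, int length) {
        int start = chars.position();
        decoder.reset();
        CoderResult result = decoder.decode(ByteBuffer.wrap(record, offset, length), chars, true);
        if (!result.isUnderflow() || decoder.flush(chars).isOverflow()) {
            throw new IllegalStateException("cannot decode field: " + result);
        }
        CharBuffer view = chars.duplicate();
        view.limit(chars.position());
        view.position(start);
        return view.slice();  // shares storage with chars; no character data is copied
    }
}
```

The trade-off is lifetime management. Once reset() is called and new fields are decoded, earlier views see the new record's characters. A DataCell built this way would therefore have to copy any value it needs to keep.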

As I mentioned earlier, if you'd like to volunteer your ideas and skills for Babel Blaster, feel free to contact me.
