Item 19: Prefer data-driven communication over behavior-driven communication

Over the years, IPC has steadily evolved from the idea of "shipping data from one process to another" to "making a function (or method) call in another process." While we can explain away the difference as one merely of detail and encapsulation ("Oh, making a remote procedure call is just sending a request message and waiting for a response message, it's no different"), the fact remains that the intent of the communication style drives two very different kinds of communication interaction, one behavior-driven, the other data-driven. In general, you'll want to prefer data-driven communications, particularly when navigating across component boundaries. And, although this mostly applies to Web Services, known there as document-oriented style, it's also desirable in straight Java-centric communications layers.

For most developers, the concept of behavior-driven and data-driven communications is new ground. The difference effectively lies in what you, the developer, intend when you make the network communication take place. In standard object-RPC toolkits like RMI and CORBA, you're invoking a method—that is, you're ordering an object instance to execute a particular method and return. You pass parameters, wait for execution on the remote machine to complete, then harvest the result sent back to you. Pretty straightforward, no?

In a data-driven approach, however, you never "invoke a method" per se but instead simply send a packet of data to a remote resource to do what it will with it. There is no implicit assumption that a response will return. In fact, in many respects, it would be much better all around if there weren't one, since that frees you to continue processing locally without having to wait for the data to come back across the wire (see Item 20).

To put this into more concrete terms, consider the idea of an online order-processing system, much as Amazon.com might use. The user has already filled out the shopping cart and moved to the checkout page, indicating the order is ready. It's now your responsibility to take this order and its contents and do the usual things to it: verify the credit card number, charge the card, and so on.

One approach is to build a collection of remote objects, invokable from the client tier (in this case, a servlet, but an EJB session bean would look almost identical), and invoke them one at a time as the order dictates:

public void doPost(HttpServletRequest req,
                   HttpServletResponse resp)
  throws ServletException, IOException
{
  // . . . Put in verification and input-validity-checking
  // code here; see Item 61 for details
  //

  HttpSession session = req.getSession(false);
  OrderModel order =
    (OrderModel)session.getAttribute("order");

  try
  {
    Context ctx = new InitialContext();
      // Always do lookup, in order to permit failover;
      // see Item 16

    // Verify credit card
    //
    CreditCardProcessor ccp =
      (CreditCardProcessor)ctx.lookup(...);
    if (ccp.verify(order.getCreditCard().getName(),
                   order.getCreditCard().getNumber(),
                   order.getCreditCard().getExpirationDate()))
    {
      if (ccp.charge(order.getCreditCard().getNumber(),
                     order.getAmount()))
      {
        for (OrderItem oi : order.getItems())
        {
          // Process each item, take it out of stock, whatever
        }
      }
    }
  }
  catch (NamingException nEx)
  {
    // The JNDI lookup failed; surface it as a servlet error
    throw new ServletException(nEx);
  }
}


Again, whether this code occurs in a servlet, a session bean, or something in between is really irrelevant—the point here is that each step is carefully measured via behavioral actions against beans (whether remote or local) that carry out each step in modular fashion.

In a data-driven approach, however, the sender of the data makes no assumptions about what needs to happen; in fact, the client revels in its relative ignorance. Rather than making any behavioral demands, the client simply packages the data and sends it to a neutral party (usually some kind of storage layer), from which other interested parties retrieve the data for processing. The classic way to do this is via JMS:

public void doPost(HttpServletRequest req,
                   HttpServletResponse resp)
  throws ServletException, IOException
{
  // . . . Put in verification and input-validity-checking
  // code here; see Item 61 for details
  //

  HttpSession session = req.getSession(false);
  OrderModel order =
    (OrderModel)session.getAttribute("order");

  // Remain ignorant, just send it on for further processing
  //
  Session jmsSession = null; // Declared here so that the finally
                             // block can see it
  try
  {
    Context ctx = new InitialContext();
      // Always do lookup, in order to permit failover;
      // see Item 16

    Connection conn = ...; // Where we get this isn't important
                           // now
    jmsSession =
      conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
    Destination dest =
      (Destination)ctx.lookup("jms/NewOrders");
      // Note that there's no hint of what will happen in
      // the queue name
    MessageProducer producer = jmsSession.createProducer(dest);
    ObjectMessage msg = jmsSession.createObjectMessage();
    msg.setObject(order);
      // JMS ObjectMessage requires Serializable objects; if
      // OrderModel is a JavaBean, it must be Serializable
      //
    producer.send(msg);
  }
  catch (NamingException nEx)
  {
    throw new ServletException(nEx);
  }
  catch (JMSException jmsEx)
  {
    throw new ServletException(jmsEx);
  }
  finally
  {
    if (jmsSession != null)
    {
      try
      {
        jmsSession.close(); // Aggressively release resources;
                            // see Item 67
      }
      catch (JMSException ignored)
      {
        // Nothing useful to do if close itself fails
      }
    }
  }

  // And we're out, happy and carefree; generate some kind of
  // "Thank you" response page back to the user, probably by
  // forwarding to a JSP
  //
}


Note that in the data-driven example, the client code gives absolutely no hint of what will happen to this order—we're simply letting somebody (we don't even know who) know that a new order has been received. What happens to it from there is not our concern.

It might be tempting to simply classify the two approaches as RPC and messaging, respectively, since behavior-driven systems are most easily modeled using RPC-based toolkits like RMI and CORBA, while data-driven systems fit messaging-oriented systems (such as JMS) like a glove. If it helps, then feel free to use this classification, but keep in mind that it is an oversimplification—it's always possible to build a behavior-driven system using JMS or a data-driven system using RMI, with all the commensurate benefits and drawbacks. It's the style of approach, not the technology on which it's built, that makes the difference.

Given all that, so what? The behavior-driven approach is much easier for developers to understand and build, particularly since it fits in pretty easily with the standard way J2EE applications are built—why on earth should anybody look to avoid this, particularly in favor of something that won't gel as quickly with the way we traditionally think?

Three reasons: evolution, intermediaries, and recoverable operations.

It's a fact that an enterprise system has a "reach" that stretches beyond the team that builds it and the group they built it for—remember the 10 fallacies of enterprise systems discussed in Chapter 1, in particular Fallacy 9: "The system is monolithic." Change is inevitable, and once an API has been published to the world, it can never change. Or, to be more accurate, it can only change when all consumers of that API agree that it can change, which is more or less saying the same thing—getting business partners, departments, and system administrators to all agree on rolling out a new change is about as likely as getting conservative and liberal politicians to agree on...well, anything.

Unfortunately, change must happen, and dealing with it becomes much easier in the data-driven approach than in the behavior-driven approach. For example, credit card charges in the 21st century often require not only the standard 16-digit credit card number but a 3-digit "verification code" printed on the back of the card in small print. Unfortunately, the API designed in our CreditCardProcessor class doesn't expect a 19-digit number, only a 16-digit one, so we'll need to change the class API to allow for a fourth parameter, the verification code.

Under EJB, or any other method-call-based distributed toolkit for that matter, this means changing the interface and regenerating the remote stubs shipped to the client. Fortunately, a certain amount of change can be handled silently: for example, a method can often be added to the interface and its remote implementation without breaking previous stubs (RMI supports this). But it doesn't take much to exhaust the forgiveness of the remoting toolkit. Changing a method's parameter list, for example, or deleting a method entirely will break existing clients faster than you can say "D'oh!"

If all I'm sending, however, is a data structure with no implicit behavior behind it, adding that extra bit of data is as simple as adding the additional field to the CreditCard class whose instance is stored inside Order. If this is a Serializable object, I can use the facilities of Serialization (see Item 71) to handle the evolution of the data structure as necessary when those breaking changes come.
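As a minimal sketch of that idea, assume a hypothetical data-only class named EvolvingCreditCard (the name, fields, and helper methods are illustrative, not part of the order system shown earlier). It assumes the older version of the class declared the same serialVersionUID, so adding a field counts as a compatible change; a stream written without the new field then deserializes with that field set to null, and readObject supplies a default. Passing null below stands in for such an older stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical data-only class; the name and fields are assumptions.
class EvolvingCreditCard implements Serializable {
  // Assumes the older class version declared this same UID.
  private static final long serialVersionUID = 1L;

  private final String name;
  private final String number;
  // Added later; streams from the older version carry no value for it.
  private String verificationCode;

  EvolvingCreditCard(String name, String number, String verificationCode) {
    this.name = name;
    this.number = number;
    this.verificationCode = verificationCode;
  }

  String getName() { return name; }
  String getNumber() { return number; }
  String getVerificationCode() { return verificationCode; }

  // Invoked by Java Serialization; fills in a default for a missing field.
  private void readObject(ObjectInputStream in)
      throws IOException, ClassNotFoundException {
    in.defaultReadObject();
    if (verificationCode == null) {
      verificationCode = "";
    }
  }

  // Serializes and deserializes a card in memory, as a messaging layer would.
  static EvolvingCreditCard roundTrip(EvolvingCreditCard card) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      ObjectOutputStream out = new ObjectOutputStream(bytes);
      out.writeObject(card);
      out.close();
      ObjectInputStream in = new ObjectInputStream(
          new ByteArrayInputStream(bytes.toByteArray()));
      return (EvolvingCreditCard) in.readObject();
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    EvolvingCreditCard old =
        new EvolvingCreditCard("A. Buyer", "4111111111111111", null);
    System.out.println("[" + roundTrip(old).getVerificationCode() + "]");
  }
}
```

The sender's code does not change shape at all here; only the data structure grows, which is the point of the comparison with the remote interface.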

Or, in fact, I can simply create a new data structure, maybe calling it Order2, or OrderEx, or one of an infinite variety of uninspired names, and the message processors (the programs that pull the message and handle it) can examine the actual type at runtime to figure out what to do next. In fact, nobody ever said that the process pulling these messages out of the JMS Queue has to work alone. While it's convenient sometimes to have just one processor handle these new orders, in the case of a fundamental change (perhaps now certain orders need to be encrypted where before it wasn't necessary), it can sometimes be easier to simply create an all-new processor, even while we leave the old one in place to handle the previous collection of messages, which clients may still be sending us. JMS doesn't care—all it sees, at its heart, is a byte array.
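The runtime type check described above might look like the following sketch. LegacyOrder, Order2, and NewOrderDispatcher are hypothetical names, and the returned strings merely stand in for real processing; in an actual JMS consumer the payload would typically come from ObjectMessage.getObject().

```java
import java.io.Serializable;

// Hypothetical payload types; LegacyOrder stands in for the original order.
class LegacyOrder implements Serializable {
  private static final long serialVersionUID = 1L;
  final String cardNumber;

  LegacyOrder(String cardNumber) {
    this.cardNumber = cardNumber;
  }
}

// The newer structure, carrying the extra verification code.
class Order2 implements Serializable {
  private static final long serialVersionUID = 1L;
  final String cardNumber;
  final String verificationCode;

  Order2(String cardNumber, String verificationCode) {
    this.cardNumber = cardNumber;
    this.verificationCode = verificationCode;
  }
}

class NewOrderDispatcher {
  // Examines the actual type at runtime and picks the matching handling.
  static String dispatch(Object payload) {
    if (payload instanceof Order2) {
      Order2 order = (Order2) payload;
      return "charge " + order.cardNumber
          + " with code " + order.verificationCode;
    }
    if (payload instanceof LegacyOrder) {
      LegacyOrder order = (LegacyOrder) payload;
      return "charge " + order.cardNumber + " without code";
    }
    // Unknown payloads are reported rather than silently dropped.
    String type = (payload == null) ? "null" : payload.getClass().getName();
    return "unrecognized payload: " + type;
  }

  public static void main(String[] args) {
    System.out.println(dispatch(new LegacyOrder("4111111111111111")));
    System.out.println(dispatch(new Order2("4111111111111111", "123")));
  }
}
```

Old clients keep sending the old type and new clients send the new one; neither needs to know the other exists.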

If you don't want to deal with Serialization versioning, you can always choose a more loosely typed approach by not sending actual domain objects but just collection objects that in turn contain simple values such as Strings and primitive wrappers. The Map interface works well for this because each key/value pair in the Map can correspond roughly to a field in a traditional class. Yes, you're sacrificing compile-time safety for flexibility, so code defensively and test relentlessly to avoid any production deployment embarrassments.
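Here is one way the Map-based style and its defensive reading could be sketched. The key names ("cardNumber", "amount", "verificationCode") and the MapOrderMessage helper are assumptions made for illustration. A HashMap is Serializable, so a map like this could travel in an ObjectMessage; JMS also defines a MapMessage type for name/value payloads.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for a loosely typed order message.
class MapOrderMessage {
  // Builds the message; each key corresponds roughly to a field.
  static Map<String, Object> newOrder(String cardNumber, String amount) {
    Map<String, Object> msg = new HashMap<String, Object>();
    msg.put("cardNumber", cardNumber);
    msg.put("amount", amount); // kept as text to avoid rounding surprises
    return msg;
  }

  // Defensive read: the compiler can no longer catch a wrong key or type.
  static String requireString(Map<String, Object> msg, String key) {
    Object value = msg.get(key);
    if (!(value instanceof String)) {
      throw new IllegalArgumentException(
          "Missing or non-String value for key: " + key);
    }
    return (String) value;
  }

  // Fields added in later versions are read with a fallback.
  static String optionalString(Map<String, Object> msg, String key,
                               String fallback) {
    Object value = msg.get(key);
    return (value instanceof String) ? (String) value : fallback;
  }

  public static void main(String[] args) {
    Map<String, Object> msg = newOrder("4111111111111111", "19.99");
    System.out.println(requireString(msg, "cardNumber"));
    System.out.println("[" + optionalString(msg, "verificationCode", "") + "]");
  }
}
```

A receiver written this way tolerates senders that have not yet started supplying the newer keys.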

Last but not least, of course, we have to consider the ultimate flexible data format, XML. Sending the message in an XML-based format permits a great deal of flexibility in evolving the data, with or without the use of XML Schema. What's more, the use of XML in your data-driven designs allows for a future degree of interoperability that wouldn't be present in the JMS-based approach, although it's not difficult to factor in. (Make sure your XML doesn't make undue assumptions about the platform that will be receiving the XML, however. That means no automatic object-to-XML conversion APIs except in very simple doses; see Items 43 and 22 for details.)
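A hand-built XML message, in the spirit of avoiding automatic object-to-XML conversion, could be sketched with the JDK's own DOM and transformer APIs as follows. The element and attribute names (order, id, cardNumber, amount) and the OrderXmlWriter class are assumptions, not a schema taken from the text.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical writer that maps order data to XML explicitly.
class OrderXmlWriter {
  static String toXml(String orderId, String cardNumber, String amount) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().newDocument();
      Element order = doc.createElement("order");
      order.setAttribute("id", orderId);
      doc.appendChild(order);

      // Plain text values only; nothing here is Java-specific.
      Element card = doc.createElement("cardNumber");
      card.setTextContent(cardNumber);
      order.appendChild(card);

      Element amt = doc.createElement("amount");
      amt.setTextContent(amount);
      order.appendChild(amt);

      // Serialize the DOM tree to a String without the XML declaration.
      Transformer transformer =
          TransformerFactory.newInstance().newTransformer();
      transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
      StringWriter out = new StringWriter();
      transformer.transform(new DOMSource(doc), new StreamResult(out));
      return out.toString();
    } catch (ParserConfigurationException | TransformerException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(toXml("A-100", "4111111111111111", "19.99"));
  }
}
```

The resulting String could be sent as the body of a JMS TextMessage or over any other transport, since nothing in it depends on Java types.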

Evolution isn't the only reason for preferring data-driven designs, though it is a definite plus. A data-driven channel can include a number of intermediaries, software or hardware processors that act in some useful and typically invisible manner on behalf of the entire communications process, layering in useful crosscutting concerns like using different transport mechanisms (e.g., SMTP/POP3 or even an instant messenger protocol like Jabber).

In fact, the notion of intermediaries could conceivably be generalized in a larger sense, that of the flexibility gained when the client remains ignorant of the ultimate recipient of the information. For starters, in a message-driven environment, the messaging layer holds the message even while the ultimate recipient is offline. In other words, the client never has to deal with an outage of the code processing the messages—as long as the messaging layer remains active, we can evolve and modify the code behind the Queue as often as we wish without impacting the clients (the order-taking Web site, in this case), since messages will simply accumulate until we're ready to bring the processor back online. (This, of course, assumes that the client isn't blocking while waiting for some kind of response, which may imply that you're using JMS in a behavior-driven manner.)
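The effect can be imitated in a few lines, assuming an in-memory BlockingQueue as a stand-in for the messaging layer. OfflineProcessorDemo and its method names are hypothetical, and unlike a real JMS provider this queue is not durable; it only illustrates the decoupling in time between sender and processor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a queue such as jms/NewOrders.
class OfflineProcessorDemo {
  private final BlockingQueue<String> newOrders =
      new LinkedBlockingQueue<String>();

  // The client side: fire and forget, no response is awaited.
  void acceptOrder(String orderId) {
    newOrders.add(orderId);
  }

  // How many orders are waiting while the processor is offline.
  int backlog() {
    return newOrders.size();
  }

  // The processor comes back online and works through the backlog in order.
  List<String> bringProcessorOnline() {
    List<String> processed = new ArrayList<String>();
    newOrders.drainTo(processed);
    return processed;
  }

  public static void main(String[] args) {
    OfflineProcessorDemo demo = new OfflineProcessorDemo();
    demo.acceptOrder("A-100"); // processor is "down" during these calls
    demo.acceptOrder("A-101");
    System.out.println("waiting: " + demo.backlog());
    System.out.println("processed: " + demo.bringProcessorOnline());
  }
}
```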

More importantly, again because of the disconnect between sender and ultimate recipient, we can do a lot of interesting things to this message that wouldn't otherwise be feasible with behavior-driven communications. Hohpe and Woolf [EAI] document a number of useful patterns, such as Message Router ("decoupling individual processing steps so that messages can be passed to different filters depending on a set of conditions," 78), Content-Based Router ("handle a situation in which the implementation of a single logical function is spread across multiple physical systems," 230), and one already alluded to earlier, Normalizer ("process messages that are semantically equivalent but arrive in a different format," 352). In essence, we're creating all kinds of opportunities for hook points (see Item 6) in the system.
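As a rough illustration of one such hook point, the sketch below routes a Map-based order to a destination chosen from the message's content. The keys, the destination names other than jms/NewOrders, and the OrderContentRouter class are all hypothetical; a real router would sit between the queue and the processors rather than in the client.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical content-based routing step for order messages.
class OrderContentRouter {
  // Picks a destination name by inspecting the data, not the sender.
  static String destinationFor(Map<String, Object> order) {
    if ("EU".equals(order.get("region"))) {
      return "jms/EuOrders"; // assumed name for a regional processor
    }
    if (order.containsKey("verificationCode")) {
      return "jms/VerifiedOrders"; // assumed name for the newer processor
    }
    return "jms/NewOrders"; // default: the original processor
  }

  public static void main(String[] args) {
    Map<String, Object> order = new HashMap<String, Object>();
    order.put("cardNumber", "4111111111111111");
    System.out.println(destinationFor(order));
    order.put("verificationCode", "123");
    System.out.println(destinationFor(order));
  }
}
```

The sender is untouched by any of this: it still posts to a single destination and remains unaware that routing takes place.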

But wait—act now, and we'll throw in context-completeness (see Item 18) and a tendency to avoid excessive "chattiness" across the network (see Item 17) for free! Because the data-driven approach tends to prefer simple and complete data sets, rather than Domain Model [Fowler, 116] objects, there's less tendency toward building a system that will create numerous accidental round-trips across the network. (The easiest way to make sure of this, of course, is to ensure that nothing that gets passed across the wire inherits from java.rmi.Remote.)

Unfortunately, data-driven communication is a subtle science; it's a slippery slope at best, an impenetrable pedantic nit at worst. For example, go back to the e-commerce Web site for just a moment. The semantic difference between posting an "Order" message and an "Add Order" message is a very small one. The key difference here is what happens when an existing order gets changed or modified. In the case of the data-driven architecture, the order simply gets placed back into the same queue, and it's assumed that the back-end processors will handle it appropriately, whereas in the behavior-driven case, we'll have to create new methods and code to handle the idea of "modifying" an order, as opposed to placing a new one.
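One way a back-end processor might "handle it appropriately" is to key its work on an order identifier, so that a re-posted order replaces the earlier version instead of creating a duplicate. The sketch below assumes each order carries such an identifier; OrderStoreProcessor and its in-memory map are hypothetical stand-ins for real storage.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical processor that treats new and modified orders alike.
class OrderStoreProcessor {
  private final Map<String, String> ordersById =
      new LinkedHashMap<String, String>();

  // Stores the order; returns true if it replaced an earlier version.
  boolean handle(String orderId, String contents) {
    return ordersById.put(orderId, contents) != null;
  }

  // The latest known contents for an order, or null if never seen.
  String current(String orderId) {
    return ordersById.get(orderId);
  }

  int orderCount() {
    return ordersById.size();
  }

  public static void main(String[] args) {
    OrderStoreProcessor processor = new OrderStoreProcessor();
    processor.handle("A-100", "1 book");
    boolean replaced = processor.handle("A-100", "2 books");
    System.out.println(replaced + ": " + processor.current("A-100"));
  }
}
```

The client posts the same kind of message in both cases; the decision about what a repeat means lives entirely behind the queue.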

The real payoff in using a data-driven design comes when you need to move to an open-integration model, where clients could be coming in from entirely different platforms. Because your API is now defined in terms of simple data structures, rather than complex object models, it becomes easier to move to a doc/literal Web Service model and, in turn, easier for clients in non-object-oriented languages to adapt.

By the way, just in case you were about ready to chuck the whole idea and go back to your favorite behavior-driven design approach, keep in mind that if you follow J2EE conventional wisdom and use Data Transfer Objects [Fowler, 401] to keep from having to bang on entity beans directly from the client layer, you're effectively using a data-driven communications approach (depending on how your session beans are implemented, of course). It's not that radical an approach, despite how it may seem on the surface: for example, no law states that the DTO has to mirror exactly the entity bean it fronts. This gives you the opportunity to decouple the DTO from the entity (or entities, perhaps), gaining some of the advantages discussed earlier.
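To make the decoupling concrete, here is a small sketch in which one DTO is assembled from two entity-like classes and exposes only what the client needs. CustomerRecord, ShippingAddressRecord, OrderSummaryDto, and OrderSummaryAssembler are hypothetical names; plain classes stand in for the entity beans.

```java
import java.io.Serializable;

// Hypothetical stand-ins for two entity beans.
class CustomerRecord {
  final String id;
  final String name;

  CustomerRecord(String id, String name) {
    this.id = id;
    this.name = name;
  }
}

class ShippingAddressRecord {
  final String customerId;
  final String city;

  ShippingAddressRecord(String customerId, String city) {
    this.customerId = customerId;
    this.city = city;
  }
}

// The DTO mirrors neither entity; it is shaped by what the client needs.
class OrderSummaryDto implements Serializable {
  private static final long serialVersionUID = 1L;
  final String customerName;
  final String shipToCity;

  OrderSummaryDto(String customerName, String shipToCity) {
    this.customerName = customerName;
    this.shipToCity = shipToCity;
  }
}

class OrderSummaryAssembler {
  // Flattens data from both entities into one self-contained structure.
  static OrderSummaryDto assemble(CustomerRecord customer,
                                  ShippingAddressRecord address) {
    return new OrderSummaryDto(customer.name, address.city);
  }

  public static void main(String[] args) {
    OrderSummaryDto dto = assemble(
        new CustomerRecord("C-1", "A. Buyer"),
        new ShippingAddressRecord("C-1", "Seattle"));
    System.out.println(dto.customerName + " / " + dto.shipToCity);
  }
}
```

Because the client sees only the DTO, the entities behind it can be split, merged, or renamed without the client noticing.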

Using DTOs is a relatively easy way to get started working with data-driven communications, but regardless of whether you take an XML-based approach, a collection-based approach, or one that uses DTOs, the key is to focus on the data you wish to exchange, not on the behaviors you want executed. In a loosely coupled component architecture (see Items 1 and 2), this allows each component to evolve independently of the others without triggering massive recompilation or, worse, massive failure reports the minute the new code goes live.

