Item 42: Use procedural-first persistence to create an encapsulation layer





Item 42: Use procedural-first persistence to create an encapsulation layer

Prior to the object-oriented takeover of the programming language community, prior to the almost universal sweep of data storage options in the late 1970s and early 1980s that made relational databases ubiquitous, another approach quickly made its way through the ranks of IT.

Instead of directly accessing the data storage layer through SQL, middleware products (TP Monitors and the like) provided well-known entry point routines, what we would call procedures today. These procedures accepted a set of input (parameters) and guaranteed that they would store the data to the system, returning some kind of result indicating success or failure of the action. The details of the actual storage schema and mechanism were hidden behind these opaque walls of the procedure's entry point—so long as the middleware layer ensured that the data was stored appropriately (and could be retrieved, usually via another procedure), no further knowledge was necessary. Later, these procedures were extended to also provide certain kinds of processing, such as providing transactional semantics as a way to guarantee that the entire procedure could execute with ACID properties in place. Even later this processing was made available directly within the database, via what we now call stored procedures.

In the beginning, the goal was to provide an encapsulation layer protecting the raw data from programs manipulating it, which eventually set the stage for transactional processing, and later, the database management system itself. In many respects, SQL was an outgrowth of this idea, providing a declarative language that allowed operators to simply say what data they were interested in, based on a set of constraints expressed as part of the SQL statement. This approach, aside from being extremely powerful in the right hands, also proved difficult for programmers familiar with procedural (and later, object-oriented) languages to understand.

Like other declarative languages (Prolog and the more modern XSLT), SQL requires that users state only what data they're interested in, not how to approach obtaining that data. This fits well with the overall "flow" of the relational model but feels awkward to procedural programmers first learning SQL—they keep wanting to "start with this table over here, then go get this data element over here based on . . . ," and so on. Declarative and procedural languages are two very different beasts (just as objects and relations are). This has prompted relational experts like Joe Celko to state, "The single biggest challenge to learning SQL programming is unlearning procedural programming" [Henderson03, 1].

Rather than adopt one of the other models, then, what works best in some cases is to simply bury the details of persistence behind another layer of software, such that the messy details of doing the persistence can be hidden away from the programmer's view—true encapsulation. Doing this can be as simple as the following code:






public class DataManager

{

  private DataManager()

  { /* Prevent accidental creation */ }



  public Person findPersonByLastName(String lastName)

  {

    // Open a Connection, create a Statement, execute

    // SELECT * FROM person p WHERE

    // p.last_name = (lastName)

    // Extract data, put into new Person instance,

    // return that

  }

  public void updatePerson(Person p)

  {

    // Open a Connection, create a Statement, execute

    // UPDATE person p SET . . .

  }

}


Note that the DataManager is a Singleton but runs locally in the client's virtual machine; it relies on the database access layer (in this case, JDBC) to protect it from any concurrency concerns in order to avoid being a Remote Singleton (see Item 5).

While this makes the DataManager a relatively large class to work with, it does have a number of advantages that make it an attractive option.

  • Data access experts can tune the data access logic (which we'll assume is SQL) as necessary to achieve optimal results.

  • If necessary, changing data storage systems means changing this one class. If we want to change from a relational back end to an OODBMS, client code doesn't have to change to do the persistence work (even if it is potentially unnecessary, depending on the OODBMS in use and its underlying persistence semantics).

  • Since we know that all the persistence logic for this request is bracketed within this function, clients don't have to worry about transactional semantics or transactions being held open any longer than necessary; this is important to minimize lock windows (see Item 29).

  • We can use whatever data access APIs make the most sense for implementing the methods in this one class without changing any client code. If we start by using JDBC and find that JDBC is too low-level to use effectively (not an uncommon opinion), we can switch to SQL/J for this one class only, and clients are entirely ignorant of the change. If we find that SQL/J is easier to work with but doesn't generate optimal JDBC-based access, we can switch back with impunity.

  • We could keep back-end portability by making DataManager an interface and creating DataManager derivatives that specialize in doing storage to one or more different back-end storage layers—an OracleDataManager, an HSQLDataManager (for in-process relational data storage), a generic JDBCDataManager (for generic nonoptimized JDBC-SQL access), and so on. This allows us to tune for a particular back end without sacrificing overall system portability, but at the expense of a lot of work to build each one, including maintenance when the DataManager interface needs to change.

  • New access requires modifying the centralized class. This may seem like a disadvantage at first—after all, who wants to go back and modify code once it's "done"?—but the fact that somebody wants a new query or update gives the DataManager developer an opportunity to revisit the data access code and/or the actual storage of the data. We can use this to add a few optimizations based on the new usage, such as creating new indexes in the relational model. It also serves as a sanity check to prevent looking up data by invalid criteria ("Well, since everybody knows that only men are programmers, I was going to do the programmer lookup by adding criteria to restrict by gender=male; that should speed up the query, right?").

In essence, the DataManager class serves as our sole point of coupling between the data storage tier and the Java code that wants to access it.

Procedural access isn't always done at the Java language level, however. Most commercial databases offer the ability to execute procedural code directly within the database itself, a stored procedure in DBMS nomenclature. While all the major database vendors offer vendor-specific languages for writing stored procedures, many are starting to support the idea of writing Java stored procedures (an unfortunate overload of the acronym JSP), meaning Java programmers don't have to learn another language. However, one of the brightest advantages of using the stored-procedure-as-persistence model is that we can pass the implementation of the stored procedure to the database staff, leaving the entire data model in their hands to be designed, implemented, and tuned as necessary. As long as the definitions of the stored procedures don't change (we'll get runtime SQLException instances if they do) and the stored procedure implementations do what we expect them to (like store or retrieve the data), we can remain entirely ignorant of the underlying data model—which, if you're as good at relational design and execution as I am (that is to say, not at all), is a good thing.

By the way, if you're thinking that the procedural model is a horrible idea, and you would never consider using anything like it, stop and have another look at the Session Façade [Alur/Crupi/Malks, 341], Domain Store [Alur/Crupi/Malks, 516], and other data access patterns. In order to avoid round-trips to the database, recall that current EJB "best practice" thinking has clients calling session beans, passing either Data Transfer Objects [Fowler, 401] or other data in bulk (see Item 23) as parameters to the session bean method, to be extracted and inserted into the local entity bean for storage in the database. This is no different than the DataManager presented earlier, except that now DataManager is a session bean. The same will be true of just about any Web service–based data access model.


     Python   SQL   Java   php   Perl 
     game development   web development   internet   *nix   graphics   hardware 
     telecommunications   C++ 
     Flash   Active Directory   Windows