Item 41: Use relational-first persistence to expose the power of the relational model





Item 41: Use relational-first persistence to expose the power of the relational model

For years (if not decades), the undisputed king of the data storage tier has been the relational database; despite a valiant run by OODBMSs in the mid-1990s, the relational database remains in firm control of enterprise data, and relational vendors appear unwilling to part with that position any time soon. We developers want to continue to use object-oriented languages, since we've found them to be most convenient for us to solve our problems, but businesses want to continue their investments in relational databases. So it seems natural that we would want to "hide" the messy details of persisting object data to the relational database. Unfortunately, therein lies a problem: objects and relations don't mix well. The difficulty in achieving a good mapping between these two technologies even has its own name, the object-relational impedance mismatch.

Part of the problem of working with relational access technologies (like JDBC) from an object-oriented language is simply that the two technology bases view the world in very different ways. An object-oriented language wants to work with objects, which have attributes (fields) and behaviors (methods), whereas a relational technology sees the world as tuples, collections of data items grouped into a logical "thing"—what Date referred to as "relations" [Date]. In essence, a relational model deals with collections of relations, which we commonly refer to as tables; each relation is a row, each item in the tuple is a column, and an entire language built around the idea of manipulating data in this relational format provides access.

While a full treatise on the problems inherent in an object-relational mapping layer is well beyond the scope of this book, a brief look at just one of the problems involved may help explain why object-relational mapping is such a common problem in J2EE systems. Consider, for a moment, a simple domain object model:






public class Person

{

  private String firstName;

  private String lastName;

  private int age;



  // . . .

}



public class Employee extends Person

{

  private long employeeID;

  private float monthlySalary;

}


This is probably the world's simplest domain model, but how should we persist this out to a relational database?

One approach is to create two tables, PERSON and EMPLOYEE, and use a foreign-key relationship to tie rows from one to the other. This requires a join between these two tables every time we want to work with a given Employee, which requires greater work on the part of the database on every query and modification to the data. We could store both Person and Employee data into a single EMPLOYEE table, but then when we create Student (extending Person) and want to find all Person objects whose last name is Smith, we'll have to search both STUDENT and EMPLOYEE tables, neither of which at a relational level have anything to do with one another. And if this inheritance layer gets any deeper, we're just compounding the problem even further, almost exponentially.

As if this weren't enough, more frequently than not, the enterprise developer doesn't have control over the database schema—it's already in use, either by legacy systems or other J2EE systems, or the schema has been laid down by developers in other groups. So even if we wanted to build a table structure to elegantly match the object model we built, we can't arbitrarily change the schema definitions.

From an entirely different angle, there may be other, far more practical reasons to abandon an objects-first approach. Perhaps no object-relational mapping product on the planet can deal with the relational database schema you inherited as part of your project, or you're simply more comfortable working with the relational model and SQL than an object model (although that would likely imply you started life as a database administrator and later became a Java programmer).

For these reasons and more, frequently it's easier to take a relational view of your data and embrace that fully by not hiding the relational access behind some other kind of encapsulatory wall, be it object-, procedural-, or hierarchical-oriented.

To understand what I mean by the idea of taking a relational-first approach, we need to take a step back for a moment and revisit what exactly the relational approach itself is. According to Chris Date, who along with E. F. Codd is considered to be one of the great fathers of the relational model, "relational systems are based on a formal foundation, or theory, called the relational model of data" [Date, 38, emphasis added]. For mathematicians, a relational model is based on set theory and predicate logic; for the rest of us, however, in simple terms, the relational model of data is seen as nothing but tables. Accessing data yields nothing but tables, and the operators (SQL) for manipulating that data produce tables from tables.

While this may seem like a pretty redundant discussion—after all, it takes about thirty seconds of looking at a relational database to figure out that it's all about the tables—it's the "relation" in the relational model that provides much of the power. Because the end product of a relational data access (SQL statement) is a table, which is in turn the source of a relational data operator such as JOIN, SELECT, and so on, relational data access achieves what Date calls closure: results of one access can serve as the input to another. This gives us the ability to write nested expressions: expressions in which the operands themselves are represented by general expressions, instead of just by table names. This is where much of the power of SQL comes from, although we tend not to use it that much (typically because we keep trying to go for objects-first persistence approaches, and SQL nested expressions don't fit so well in an object-relational mapping layer).

Why is this such a big deal? Because SQL is a powerful language for accessing data out of a relational database, and thanks to the fact that everything produced by a SQL query is a table, we can have a single API for extracting however much, or however little, data from any particular query we need. We don't have the "smaller than an object" problem raised in Item 40 because everything comes back as a table, even if it's a table one column wide. We do face the problem that the relational model frequently won't match the object model programmers would rather work with, but we can address that in a bit. Let's first look at making the relational access itself easier.

Before you shrink in horror at the thought of being condemned to low-level JDBC access for the rest of your life, take a deep breath—a relational-first approach doesn't mean abandoning anything at a higher level than JDBC. Far from it, in fact. We can use several mechanisms to make relational access much easier from Java than just raw JDBC (which still remains as an option in many cases, despite its relatively low-level nature).

First, there's more to JDBC than just Connection, Statement, and ResultSet objects. RowSet, and in particular Sun's CachedRowSet implementation, frequently makes it much easier to work with JDBC by encapsulating the act of issuing the query and harvesting the results. So, assuming you're without a JDBC DataSource, issuing a query can be as easy as this:






RowSet rs = new WebRowSet();

    // Or use another RowSet implementation



// Provide RowSet with enough information to obtain a

// Connection

rs.setUrl("jdbc:dburl://dbserver/PEOPLE");

rs.setUsername("user");

rs.setPassword("password");



rs.setCommand("SELECT first_name, last_name FROM person " +

              "WHERE last_name=?");

rs.setString(1, "Halloway");



rs.execute();



// rs now holds the results of the query


Most of the calls to the RowSet could be hidden behind an object factory interface (see Item 72 for details on object factories), so that client code could be reduced to the following:






RowSet rs = MyRowSetFactory.getRowSet();

rs.setCommand(. . .);

rs.setString(1, "Halloway");

rs.execute();


If you ask me, that's about as simple as you're going to get for direct SQL-based access. The factory itself is not difficult to imagine:






public class MyRowSetFactory

{

  public static getRowSet()

  {

    RowSet rs = new WebRowSet(); // Or some other

                                 // implementation



    // This time, use JNDI DataSource

    // rather than url/username/password

    rs.setDataSourceName("java:comp/env/jdbc/PEOPLE_DS");



    return rs;

  }

}


The RowSet API has methods to control the details of how the query will be executed, including the fetch size of the ResultSet (which should be set as high as possible, preferably to the size of the rows returned by the query, to avoid round-trips at the expense of one giant "pull" across the network), and the transaction isolation level (see Item 35) of this call.

However, the RowSet API still requires that you author the SQL query you're interested in every time you want to use it, and it can get awfully tiring to type SELECT blah blah blah FROM table WHERE baz=? every time you want to pull some data back, particularly if you're using PreparedStatement objects in order to avoid falling into the traps of doing simple string concatenation and opening yourself to injection attacks (see Item 61). So you get tempted to either blow off the possibility of the injection attack—bad idea, mate—or you start cheating in other ways, like SELECT * FROM table, which pulls every column in the table across, rather than just the data you're interested in, which means you're wasting bandwidth. That might not be an issue on your development machine or network, but in production that could easily max out the pipe if 1,000 users execute that command simultaneously.[2] If you don't need it, you shouldn't ask for it across the wire. So what are self-respecting programmers, seeking to do the right thing while saving their poor hands from carpal tunnel syndrome, to do? (And if you're thinking this is a pretty frivolous example, take another approach to the problem: What happens if we change the table definitions, which in turn means changing the SQL statements now scattered all over the codebase?)

[2] OK, it would have to be an awfully wide table, but you get the idea.

One approach is to keep a table-oriented view of the database but put a bit more scaffolding between you and the data access technology itself, through a Table Data Gateway [Fowler, 144]. Essentially, each table becomes a class, which in turn serves as the point of access to any rows in that table:






public class PersonGateway

{

  private PersonGateway() { /* Singleton—can't create */ }



  public static Person[] findAll()

  {

    ArrayList al = new ArrayList();



    RowSet rs = MyRowSetFactory.getRowSet();

    rs.setCommand("SELECT first_name,last_name,age "+

                  "FROM person");

    rs.execute();

    while (rs.next())

      al.add(new Person(rs.getString(1),

                        rs.getString(2),

                        rs.getInt(3));



    return (Person[])al.toArray(new Person[0]);

  }

  public static void update(Person p)

  {

    // And so on, and so on, and so on

  }

}


Note that when done this way, a Table Data Gateway looks a lot like a procedural-first approach to persistence (see Item 42); the key difference is that a Table Data Gateway is per-table and focuses exclusively on manipulating that table's data, whereas a procedural-first approach provides a single point of access for all persistence logic in a table-agnostic way (since the underlying data store may not even be relational tables anymore).

"This works for tables," you may be tempted to say, "but I need to do a lot of queries that aren't restricted to a single table—what then?" In fact, thanks to the closure property of relational data access mentioned earlier, the Table Data Gateway can be extended to be a new variant, say, the Query Data Gateway, where instead of wrapping around a single table, we wrap it around a query instead:






public class ChildrenGateway

{

  private ChildrenGateway() { }

    // Singleton, can't create



  public static Person[] findKidsForPerson(Person p)

  {

    ArrayList al = new ArrayList();



    RowSet rs = MyRowSetFactory.getRowSet();

    rs.setCommand("SELECT first_name,last_name,age "+

                  "FROM person p, parent_link pp "+

                  "WHERE p.id = pp.child_id "+

                      "AND p.last_name=?");

    rs.setInt(1, p.getPersonID());

    rs.execute();

    while (rs.next())

      al.add(new Person(rs.getString(1),

                        rs.getString(2),

                        rs.getInt(3));



    return (Person[])al.toArray(new Person[0]);

  }

}


Ironically, it turns out that this approach was commonly used within the relational world, long before we upstart Java programmers came along, so databases support this intrinsically: it's called a view. The database basically pretends that a query looks and acts just like a table:






CREATE VIEW children AS

  SELECT first_name, last_name, age

  FROM person p, parent_link pp

  WHERE p.id = pp.child_id


This creates a pseudo-table called children that contains all Person objects that have parents. We then issue queries against children by restricting against last_name.

What's the advantage here? All we've done is create something that looks like a table that isn't, and won't that create some kind of performance scariness inside the database? For some of the older database products, yes—but that problem was pretty much corrected some years ago. While there are some restrictions on views, such as on updates to a view—some database products don't allow updates at all, others allow updates only to one table if there's a multi-table join in the view, and so on—as a means by which to make queries more readable, and in some cases more efficient, views are a powerful tool.

If none of these approaches seem particularly what you're looking for because they still seem like too much work and are too awkward to work with from within Java, another approach is to make use of the recently standardized SQL/J, or "embedded SQL for Java," Specification recently approved as part of the SQL-99 Specification. As with other embedded SQL technologies, SQL/J allows a programmer to write SQL statements directly within the Java code, which is then preprocessed by a SQL/J preprocessor, turned into regular Java code, and fed to javac as a normal compilation step. The SQL/J preprocessor takes care of any of the call-level interface logic in your Java code—you just focus on writing the SQL when you want to talk to the database and on Java when you want to do anything else.

The easiest way to use SQL/J is to write static SQL statements embedded in the code directly, known as SQL/J clauses, as shown here:






public static float averageAgeOfFamily(String lastName)

  throws Exception

{

  #sql iterator Ages (int individualAge);



  Ages rs;

  #sql rs =

    { SELECT age FROM person

      WHERE last_name = :lastName };



  int totalAge = 0;

  int numPersons = 0;

  while (rs.next())

  {

    numPersons++;

    totalAge += rs.individualAge();

  }



  return ((float)totalAge) / ((float)numPersons);

}


It looks and feels like a standard Java method, with the exception of the #sql blocks, setting off the code that will be handled by the preprocessor, and the Ages type introduced by the #sql iterator clause early in the method body. This isn't the place to discuss the ins and outs of SQL/J syntax; download the reference implementation and documentation from http://www.sqlj.org.

The thing to recognize here is that SQL/J essentially hides almost all of the data access plumbing from sight, leaving only the barest minimum of scaffolding in place to make it easy to write the SQL itself—in many respects, it's the absolute inverse of an objects-first technology like JDO, which tries so hard to hide the relational access from the programmer's perspective. On top of this "get right to the relational data access" idea, SQL/J also offers a major improvement over any other data access technology to date: if the database schema is available at the time of compilation, a SQL/J preprocessor/translator can actually check, in compile time, the SQL in the Java code against the schema, catching typos and syntax errors without having to rely on runtime SQLException instances to find them. This alone is a compelling reason to take a hard look at SQL/J.

One major drawback to SQL/J, however, is its relative lack of support within the J2EE community. Although the J2EE and EJB Specifications do mention it by name as a possible data access technology layer, it receives almost no attention beyond that. Oracle is the only major database vendor to have a SQL/J implementation available as of this writing, and there seems to be little support from the community as a whole, which is a shame because in many respects this is the epitome of a relational-first persistence approach.

As with the objects-first persistence approach, the relational-first approach has its share of strengths and weaknesses. It exposes the power and grace that SQL itself encompasses, but at the expense of having to abandon (to at least a certain degree) the beauty of the object model in obtaining and updating data. It means having to write code that takes data out of the relational API, be that JDBC or SQL/J, and puts it into your object model, but the tradeoff is that you get to control the exact SQL produced, which can be a powerful optimization because not all SQL is created equal (see Item 50).


     Python   SQL   Java   php   Perl 
     game development   web development   internet   *nix   graphics   hardware 
     telecommunications   C++ 
     Flash   Active Directory   Windows