March 17, 2011, 5 a.m.
posted by oxy
Item 21: Consider partitioning components to avoid excessive load on any one machine
Almost from the very beginning, clustering and failover and other such words have been a part of the J2EE programmer's lexicon. In fact, they've been so much a part of J2EE since its inception that they've almost lost all meaning whatsoever. To many J2EE developers, clustering, failover, partitioning, and distribution have been shouted so loudly by vendors and J2EE evangelists that the advantages—and disadvantages—of each become lost in the noise.
Enterprise systems, owing to their multiuser nature, are almost always distributed in some fashion. It's a necessary evil. Distribution makes it easier for data to be shared across multiple users and/or applications (by centralizing the data), but distribution can also lead to better availability, by ensuring that the system as a whole can remain alive even if parts of it are currently down due to an outage, whether planned or unplanned. On top of all that, a properly distributed system can offer a number of optimizations, mostly in the area of avoiding network traffic (see Item 17) by keeping necessary data local to the machine.
Before you mutter to yourself, "Duh, clustering, everybody's got it" and move on, take a deep breath and hang in there with me. Clustering is only one of several ways to distribute the system, and from several angles, it's not even the most efficient way to take advantage of the partitioning of components and load across multiple machines. Let's step back for a second and consider the situation from a higher level.
By their nature, middleware systems tend toward centralization—after all, part of the whole point of middleware in general was to bring disparate systems together into some kind of unified whole. Part of making that possible comes by creating "glue" that welds the systems together, and for maintenance reasons, if nothing else, it's best if a given component exists only once on the network.
But centralization has its problems, most notably that when we exhaust the available capacity of a given machine—that is, when we run into a situation where more users and/or resources need to be supported than the centralized server can provide—simply adding a new machine into the mix suddenly creates a whole host of issues. Concurrency concerns, in particular, rear their ugly head, but bottlenecks and other scalability issues crop up, too.
One method of partitioning is to distribute the system. In essence, we'll take a given component, split it into smaller, "subatomic" pieces, and strategically scatter those pieces across the system.
Consider, for a moment, the DNS. If there is one service on which the entirety of the Internet is built, DNS is it. Think about it—without DNS, www.neward.net is a meaningless collection of ASCII characters, and you'd have to remember the actual IP address, a meaningless quartet of numbers to most people. As if that's not enough, however, remember that DNS provides a measure of location independence, as described in Item 16, that would be impossible (or at least very difficult to build) without it.
But consider how DNS works. Every time you open a socket to some remote server anywhere on the LAN or WAN, you do a DNS lookup to obtain the IP address behind the human-readable domain name. Multiply that by every person on the Internet doing the same, and you suddenly realize that if there were ever a system that needed to be scalable and reliable, DNS is it. Now imagine if all of those requests were headed directly at your server—how long do you think it would last before it would simply melt through the floor?
The fact is, DNS isn't implemented as a centralized service or data repository for a very good reason: no machine on the planet is powerful enough to handle the millions upon millions of concurrent requests that a centralized DNS server would have to endure. Instead, DNS is distributed across the Internet in a highly partitioned manner, in a large hierarchy divided into nonoverlapping zones. Resolving a DNS name amounts to walking a hierarchy of servers encoded directly within the domain name itself. For example, simplistically speaking, in the domain name www.neward.net, we start with the top-level domain, net, under which we will (eventually) find the second-level domain name neward, which will in turn forward off to my server, which in turn recognizes the third-level name www. Should I choose to create another domain name that resolves to a different server within my intranet, I publish it to my DNS server, but nothing else on the Internet needs to know about the change—ftp.neward.net follows exactly the same path, and my DNS server acknowledges the new ftp prefix just as it did www for previous requests. (One problem with this scheme, however, is that it requires clients to make several network hops before the domain name is fully resolved, and as a result DNS has evolved to include client-side caching of domain names. We'll examine the ramifications of this in just a second.)
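The label-by-label walk described above can be sketched as a toy model, where each zone knows only its immediate children; the zone names and addresses here are invented for illustration, not real DNS data or a real resolver API.

```java
import java.util.HashMap;
import java.util.Map;

// A toy model of DNS's recursive horizontal partitioning: each zone knows
// only its immediate children, so resolving a name walks the hierarchy
// label by label, right to left (net, then neward, then www).
class Zone {
    final Map<String, Zone> children = new HashMap<>();
    String address; // non-null only for leaf hosts in this toy model

    Zone child(String label) {
        return children.computeIfAbsent(label, l -> new Zone());
    }
}

public class ToyDns {
    public static String resolve(Zone root, String name) {
        Zone current = root;
        String[] labels = name.split("\\.");
        for (int i = labels.length - 1; i >= 0; i--) {
            current = current.children.get(labels[i]);
            if (current == null) return null; // no such zone or host
        }
        return current.address;
    }

    public static void main(String[] args) {
        Zone root = new Zone();
        // Only the neward zone's server needs to know about www and ftp;
        // publishing ftp changes nothing anywhere else in the hierarchy.
        Zone neward = root.child("net").child("neward");
        neward.child("www").address = "192.0.2.10";
        neward.child("ftp").address = "192.0.2.11";
        System.out.println(resolve(root, "www.neward.net")); // 192.0.2.10
    }
}
```

Note that adding ftp.neward.net touched only the neward zone's map, which is exactly the locality property the real DNS hierarchy provides.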
Database professionals have known about this kind of partitioning for some time. The eighth edition of Introduction to Database Systems [Date] discusses it at length in Chapter 21, speaking specifically about distributed databases. In it, Date defines data fragmentation as the ability to divide data into pieces or fragments for physical storage purposes; for example, we might fragment an EMPLOYEE table into two tables, one for the New York branch of the company and one for the London branch:
FRAGMENT EMPLOYEE AS
    N_EMP AT SITE 'New York' WHERE dept="Sales" OR dept="Admin",
    L_EMP AT SITE 'London' WHERE dept="TechSupport";
In other words, any data in the EMPLOYEE table that satisfies the first predicate (i.e., the employee works for either the Sales or Admin departments) will be stored physically in a table called N_EMP at the New York server, and any data that fits the second predicate will be stored in London. In a truly distributed database system, "the seams don't show"; queries such as "SELECT * FROM employee WHERE last_name='Date'" should operate against both the London and the New York databases.
In fact, we can talk about two different kinds of data fragmentation: horizontal partitioning, in which we segregate the data according to elements in the data itself (as we do above), and vertical partitioning, where we fragment "down the middle" of the table, perhaps putting the employee's FIRST_NAME, LAST_NAME, and EMPLOYEE_ID columns onto the New York database, and the DEPT and SALARY columns onto the London site. (For what it's worth, DNS chooses a horizontal fragmentation scheme; in fact, it's a recursive horizontal fragmentation scheme, since servers are nested inside of other servers' partitioned space.)
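In application code, the horizontal-fragmentation predicate from Date's example boils down to a routing decision; a minimal sketch, assuming the two-site layout above (the method and site names are ours, not Date's):

```java
// Routes a row to its physical site per the fragmentation predicates:
// Sales and Admin rows live in the New York fragment (N_EMP),
// TechSupport rows in the London fragment (L_EMP).
public class EmployeeFragmentRouter {
    public static String siteFor(String dept) {
        switch (dept) {
            case "Sales":
            case "Admin":
                return "New York";  // N_EMP fragment
            case "TechSupport":
                return "London";    // L_EMP fragment
            default:
                // The predicates must cover every row, or data has no home.
                throw new IllegalArgumentException("No fragment holds dept " + dept);
        }
    }
}
```

In a truly distributed database this routing is the engine's job, not yours; the sketch only makes the predicate's shape concrete.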
We can apply much the same principle to J2EE systems, at both a code level and a data level. We can partition data horizontally across the database, putting location-aware data into database servers geographically centralized (such as putting data related to human resources into a database that's local to that department, for example), or we can partition data vertically, and "split" tables and/or parts of tables across multiple servers, all depending on how the system will want to use it. In a similar vein, we can partition code in the same way—put the code that works most closely with the human resources tables on a server close by those tables, and so on. While code is less likely to partition horizontally, it's still feasible. For example, you may distribute near-identical session beans across two servers, the difference between them being expected input or output—"expense reports totaling more than $1,000 go to server A, less than $1,000 to server B."
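The expense-report example above might look something like the following sketch; the server URLs and the $1,000 threshold are purely illustrative:

```java
// Horizontal partitioning of code: near-identical session beans are deployed
// on two servers, and the client (or a thin facade) picks the target by
// expected input—reports over $1,000 to server A, the rest to server B.
public class ExpenseReportRouter {
    private static final double THRESHOLD = 1000.00; // illustrative cutoff

    public static String serverFor(double reportTotal) {
        return reportTotal > THRESHOLD
            ? "t3://serverA:7001"   // large reports
            : "t3://serverB:7001";  // everything else
    }
}
```

The returned URL would feed the JNDI InitialContext used to look up the bean's home interface; the beans themselves stay near-identical.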
Another way to distribute the system occurs when we scatter the JMS Topic and Queue instances across a network, ensuring that the processors that consume those messages are running on the same machine that hosts the Topic or Queue. Each Topic and Queue can itself be partitioned to a certain degree, as well—if it turns out that a given Queue is being overloaded with work (and thus is turning into a bottleneck), we can simply horizontally split the Queue into two separate ones, for example, the first Queue for last names A–M and the second for N–Z.
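The A–M/N–Z queue split reduces, on the sending side, to choosing a destination before the send; a sketch, with invented JNDI names (the real Queue objects would be looked up from those names):

```java
// Horizontally splitting one overloaded Queue into two: senders pick the
// destination by the partitioning key (here, first letter of last name).
public class QueueChooser {
    public static String queueFor(String lastName) {
        char c = Character.toUpperCase(lastName.charAt(0));
        return (c <= 'M') ? "jms/WorkQueueAtoM" : "jms/WorkQueueNtoZ";
    }
}
```

Consumers for each Queue then run on the machine hosting that Queue, keeping the message traffic local.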
The key problem with horizontal partitioning is that somewhere, the "rules" regarding the horizontal partitioning must be kept, and, more importantly, kept up-to-date. For example, if we partition an employee database horizontally along the employee's last name ("Data for all employees with last names beginning with A–M is stored on server1, and N–Z on server2"), we need some kind of rule somewhere that reflects this—and if this is hard-coded into code, repartitioning to three servers requires recoding. Ditto for vertical partitioning.
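One way to keep those rules out of compiled code is a range-to-server table loaded at startup, so that repartitioning from two servers to three is a configuration change rather than a recompile. A minimal sketch, assuming a made-up Properties format (range.A=server1, range.N=server2 meaning A–M on server1 and N–Z on server2):

```java
import java.util.Properties;
import java.util.TreeMap;

// Externalized horizontal-partitioning rules: each property names the first
// letter of a range and the server that owns it; floorEntry finds the range
// a given last name falls into.
public class PartitionRules {
    private final TreeMap<Character, String> ranges = new TreeMap<>();

    public PartitionRules(Properties props) {
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("range.")) {
                ranges.put(key.charAt("range.".length()), props.getProperty(key));
            }
        }
    }

    public String serverFor(String lastName) {
        return ranges.floorEntry(Character.toUpperCase(lastName.charAt(0))).getValue();
    }
}
```

Adding a third server is then one new property (say, range.T=server3) and a data migration, with no code change; the migration itself, of course, is still real work.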
A second method of partitioning is to replicate the system. This is the universal "clustering" that many, if not all, J2EE systems offer, whereby we take copies of the system and put those copies on multiple nodes, on the grounds that if one of them fails, any other node can take up the request just as easily as the first. Replication offers a couple of advantages. If the data is replicated locally, it means less network traffic to carry out whatever requests are needed, as well as a better availability factor—if any node (server) in the collection fails, no outage is recorded, since any node is capable of responding to the request, unlike the distributed case discussed earlier: if the DNS server for www.neward.net goes down, nobody else can logically take its place, thus denying DNS service for that domain until the server comes back up.
We discuss replication again in Item 37, but a couple of communication-specific issues deserve some words here.
As described in Item 37, the problem with replication is simple: update propagation. Assume for a moment that we store the data on three different database instances across the enterprise. As long as all we do is read from these three databases, we can read without concerns—each has an exact copy of the data, so any of the three can answer our request with confidence. As soon as we try to make an update to one, however, things get truly sticky—we now have multiple copies of that data, so modifying one means we're inconsistent with the others until they can be updated accordingly. And if two should happen to be updated simultaneously. . . .
We see this problem with DNS, for example; because clients replicate DNS entries in local storage (they cache them off), it often takes on the order of days before a DNS change manages to percolate across the entire Internet. In the meantime, however, if the "old" server is down or offline for some reason, clients are effectively left with an outage that isn't—the server is up, it's just not the server the client thinks it is. This kind of latency might work in DNS, where updates to domain-name/IP-address relationships don't happen very often, but in a system where updates could be coming every second, much less every subsecond, that's not going to fly well for long. (Ironically, we see this kind of latency all the time with credit cards—for some reason, it takes forever for your payment to go through, even though charges seem to materialize on the card even before you've finished signing the receipt.)
At this point, it may be tempting to fall back to some kind of global synchronization scheme, such as a Remote Singleton that acts as an update manager; but if you want your system to scale, don't do this. As described in Item 5, you're effectively introducing a bottleneck, since updates to your databases will now be throttled by how fast your update manager can handle requests. You're also requiring a remote service to respond before you can proceed, getting you into trouble with Item 20. Of course, you could always partition your update manager, but now you're back to the problems of distribution or replication again. You might think you can use timestamps, per the discussion of optimistic concurrency (see Item 33), but you'll quickly run into a problem that has plagued computer science for years: it's almost impossible to get two computers' clocks to agree on the time, particularly at the millisecond-level resolution you'd need for a high-scale system.
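This is why optimistic schemes in practice usually compare a monotonically increasing version number rather than a wall-clock timestamp—no pair of machines has to agree on the time, only on an integer. A minimal in-memory sketch of the check-and-increment (a real system would do this as a WHERE version = ? clause in the UPDATE statement):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Optimistic concurrency via a version counter: an update succeeds only if
// the version the caller read is still current; otherwise the caller must
// re-read and retry. No clocks involved.
public class VersionedRecord {
    private final AtomicInteger version = new AtomicInteger(0);
    private volatile String data;

    public boolean update(int versionRead, String newData) {
        if (version.compareAndSet(versionRead, versionRead + 1)) {
            data = newData;
            return true;
        }
        return false; // somebody else got there first
    }

    public int currentVersion() { return version.get(); }
    public String read() { return data; }
}
```

The loser of a concurrent update sees a false return instead of silently clobbering the winner's write, which is precisely what a timestamp comparison across disagreeing clocks cannot guarantee.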
Simply updating all the copies as part of the update operation may sound like the best way to go; unfortunately, this is a naïve suggestion at best. Remember, we can't assume that all parts of the system will always be available (see Item 7), and if an update operation has to block until every copy is reachable, a single machine rebooting or restarting will hold up the entire system, since no update can complete until that server comes back online. Ouch.
It gets worse with respect to transactions—taking out a lock on one data item in one server should, logically, take out that same lock on other servers, in order to avoid the accidental possibility of the same data item being updated to different values in different locations. But here we run into the very real problem of two concurrent updates each asking for a lock on the same data item, and due to the latency involved in transmitting the request over the wire, finding it already locked—who should give up the lock first? (Of such questions, by the way, are doctoral theses written. This is not an easy problem to solve.)
If you've already read Item 5, you may suddenly be realizing that replicating data brings up many of the same identity issues discussed there—you're right. And, unfortunately, there's no real silver bullet solution that solves the identity problem here, either. If you can accept a certain latency in update propagation across the data, it's not nearly as difficult, but unfortunately for some systems and/or tables, inconsistency is absolutely verboten; for such systems or tables, replication is not a great avenue to pursue.
Note that caching data returned from a database (or any other data, for that matter, but most caches deal with database data, it seems) is just a subset of replication, with the interesting distinction that the client chose to replicate the data, rather than the server. Commensurately, we have the same consistency and update propagation problems that go with replication, which means that if you plan to cache data, you'd best have some kind of answer in mind to the question of what should happen when somebody updates the database through a channel that's not one you expect—remember, you don't own the database, as discussed in Item 45.
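Since a cache is client-chosen replication, the TTL you put on entries is really a statement of how much update-propagation latency you have decided you can tolerate. A minimal sketch of such a time-bounded cache (names and structure are illustrative, not any particular caching product):

```java
import java.util.HashMap;
import java.util.Map;

// A minimal TTL cache: entries expire after ttlMillis, bounding how stale a
// cached read can be when the database is updated through another channel.
public class TtlCache<K, V> {
    private static class Entry<V> {
        final V value; final long expiresAt;
        Entry(V v, long e) { value = v; expiresAt = e; }
    }

    private final Map<K, Entry<V>> map = new HashMap<>();
    private final long ttlMillis;

    public TtlCache(long ttlMillis) { this.ttlMillis = ttlMillis; }

    public synchronized void put(K key, V value) {
        map.put(key, new Entry<>(value, System.currentTimeMillis() + ttlMillis));
    }

    public synchronized V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null || System.currentTimeMillis() > e.expiresAt) {
            map.remove(key); // expired: caller must re-read the database
            return null;
        }
        return e.value;
    }
}
```

Within the TTL window, an update made behind the cache's back is simply invisible; if a table cannot tolerate that window at all, it should not be cached this way.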
Note also that we can substitute the word "data" in the discussions above for "state" in a discussion of stateful session beans and/or entity beans—replicating state (or caching it) across multiple machines in a cluster turns out to be painfully difficult to accomplish. Particularly in the case of stateful session beans, ask your vendor how they implement state caching—if they defer the state to a "state server" (as several do), then effectively whatever gain you get from it being cached is more than likely lost due to the cost of retrieving it from the state server (see Item 17), and unless the state server is somehow replicated, you're back to the single point of failure (the state server) that clustering was supposed to solve for you.
Ultimately, no partitioning scheme solves all problems. If you distribute, you evenly spread the load across the system (as evenly as your horizontal or vertical rules allow, anyway), but you lose the redundancy that a replication scheme offers. Of course, if you replicate, you run into update propagation problems that aren't easily solved. Some combination of the two can sometimes serve as the best policy, distributing the data that can afford no inconsistency and replicating the data that can.