April 1, 2011, 10:59 p.m.
posted by oxy
Item 72: Don't fight the garbage collector
We've all heard the conventional wisdom—it has been trumpeted from magazine articles, books, even the conference and lecture circuit: Java object allocation is slow. Minimize the number of objects you create. Unfortunately, those opinions are based on outdated information, and the conclusions they draw as a result lead Java programmers to do exactly the wrong thing.
In the early, version 1.0 days of Java, the JVM garbage collector was truly awful. It was a simple, stop-the-world, mark-and-sweep garbage collector that resulted in frequent visible pauses in execution. The JVM was holding off on all garbage collection activities until it reached its memory maximum threshold, then ripped through the entire JVM heap to find all the objects worthy of collection, marked them as unreachable (and therefore eligible for collection), and executed another pass of the entire heap to collect them all at once. Because the VM needed to keep everything properly correct and synchronized, all executing threads—which were interpreting Java bytecode because we had no JIT yet in those days—had to be stopped while garbage collection was taking place. In many ways, the 1.0 release birthed Java's reputation as a "performance-avoiding platform."
As Java moved through version 1.1 and into its 1.2 release, developers began to find ways to circumvent and bypass the garbage collector—anything we could do to keep the garbage collector from ripping through the heap and doing its odious duty was a blessing. Since object allocation was frequently a place from which a garbage collector could be triggered, it made sense to minimize the number of calls to new as much as possible. A number of ideas were put forth, including the general-purpose object pool (which we'll revisit in a bit).
What happened in the meantime, however, is obvious in retrospect but less so when considering the conventional wisdom: Java garbage collection algorithms and allocation policies got better. By the time of the Hotspot VM release, and particularly with respect to the J2SE 1.4 (and 1.5) releases, no longer can we rely on the assertion that object allocation is expensive. Similarly, with the exception of finalizable objects, it's also not true that object reclamation is expensive. In truth, in modern VMs object allocation and reclamation require a fraction of their former costs; to understand why, a brief trip through various garbage collection algorithms is in order.
One popular collector is the copying collector, so named because it divides the heap into two equal halves, one labeled Fromspace and the other Tospace. Objects are allocated out of the Fromspace, and when a garbage collection pass is called for, the garbage collector does its usual job of starting from the root set of references and looking for referenced objects. This time, however, instead of simply marking the object, it follows the links and slowly copies every referenced object from the Fromspace to the Tospace (hence the names), as shown in Figure.
3. Arena-based garbage collection
Having done this, any objects left in the Fromspace are, by definition, unreachable and therefore eligible for reclamation. Therefore the garbage collector can assume the entire Fromspace is eligible for reclamation and can blow the entire space away. Naturally, any finalizable objects in the Fromspace must be dealt with somehow before we can blow away the Fromspace, which once again clarifies what a pain finalizers are to the garbage collector. Modulo the finalizer scenario, however, this all-at-once behavior of the copying collector means it releases allocated objects extremely fast, and as a side benefit it automatically compacts the objects in the Tospace. Once this pass is over, we flip the names, so that the current Tospace becomes the Fromspace, and we wait for the next garbage collection pass.
A copying collector isn't perfect, however, and one of its most notable flaws is pretty easy to spot: the heap requires twice as much memory because we need to maintain enough space to copy the entire heap over to the Tospace. This means that if we have an application that consumes, on average, 128MB of objects, the JVM needs to have at least 256MB allocated just for the heap. This is a bit expensive and not particularly attractive, as you might well imagine.
Another garbage collection algorithm in common use, a generational collector, relies on the idea that most objects are short-lived beasts, while a small minority of objects survive for long periods of time. Objects are thus classified by their generation, a chunk of memory that contains objects of roughly the same lifetime. Based on this assumption, a generational collector puts newly created objects into a young generation, a collection of all objects allocated recently. When a garbage collection pass is requested, then, the collector needs to scan only the first generation looking for unused objects since that's the most likely place they will appear. Only when not enough room is collected from the young generation does the collector begin to move up the generational stack, looking in successively older generations for objects to collect. As an object survives for longer periods of time (usually measured in the number of collection passes successfully survived), it will be moved into successively older generations. Most generational collectors have only a few generations, two or three at the most, although in theory an infinite number of generations could be used (see Figure).
4. Generational garbage collector
A generational collector has the immediate and obvious advantage of not having to scan the entire heap during a collection pass, meaning garbage collection passes can be executed more quickly. However, it also suffers the disadvantage that objects in older generations will not be collected for a long period of time, particularly if the young generation remains open enough to satisfy any new allocation requests. In turn, this means that references across generations, particularly an old-to-young reference, will potentially keep objects alive longer than necessary. Finally, depending on the collector, we run the risk that two collections will be necessary, rather than one—if a collection pass through the young generation fails to satisfy allocation requirements, then another pass, this time through the old generation (which may in turn imply a pass through the young generation again) will be necessary, greatly extending garbage collection times.
There are other forms of garbage collection algorithms, including arena-based collectors (allocating objects out of arenas, chunks of memory from which objects are allocated, much as the operating system manages virtual memory in pages), reference-counting collectors (which have fallen out of favor because a reference-counting collector cannot handle circular reference scenarios), and others. For more information on garbage collection in general, see Garbage Collection [Jones/Lims]; for more information on the Sun collector in general, see the Sun Hotspot documentation available online at http://java.sun.com/docs/hotspot/gc/index.html.
Bearing in mind that mark-and-sweep, copying, and generational collectors are only three of an almost infinite number of possible garbage collection algorithms, it suddenly becomes obvious that programmers' choices that might help one type of garbage collector will hurt another type. In particular, the conventional wisdom that programmers should allocate objects out of object pools in order to speed up allocation depends entirely on the kind of garbage collector running underneath the hood. Therefore, if (after having consulted Item 11) you're trying to write portable code, you can't assume you know anything about the garbage collector under the hood, and as a result, you can't know whether to pool your objects or not. In short, you have to trust the garbage collector to do the right thing.
That doesn't mean you're entirely without options, however. A certain amount of configuration is possible, depending on the JVM you're using—each successive J2SE release since 1.3 has offered greater control over garbage collection behaviors and visibility. The next several paragraphs describe the Sun JVM in some detail, so if you want to preserve your vendor-neutral stance (see Item 11), hold your hands over your eyes and sing loudly until we're done.
Starting with J2SE 1.3, the Sun Hotspot JVM garbage collection mechanism is a hybrid mechanism. It breaks the heap into two arenas, using a generational collector to separate young objects from old objects. Objects are first allocated to a nursery, and only when they survive a certain number of garbage collection passes do they migrate up to the old generation. The young generation is managed using a generational garbage collector, consisting of an eden space, where objects start, and two survivor spaces, where objects are copied to as part of the generational scheme. (The eden space thus acts as a permanent Fromspace, and the survivors switch off as the Fromspace and the Tospace, thus leaving one survivor space empty at any given time between garbage collection passes.)
The old generation, however, because objects allocated within it are far less likely to be ready for reclamation, prefers instead to use a traditional mark-sweep-compact algorithm; the mark-sweep-compact algorithm operates as a standard mark-and-sweep algorithm, except that objects marked for survival are then compacted in order to minimize "holes" in the old generation. Mark-sweep-compact is much slower than a copying collector but has the advantage of being able to use the entire heap, unlike the copying collector, which needs to divide into the Fromspace and the Tospace to work.
In addition, a part of the old generation space is reserved for use by the VM and is called the permanent space, since this is where the JVM will allocate Java objects necessary for its own use, such as Reflection objects.
As already described, one of the main advantages of a generational collector is that a complete pass through the entire heap is not necessary—frequently, a single pass through the young generation returns enough free space to satisfy memory requests for the near future. As a result, garbage collection passes within the Sun JVM come in two flavors, a minor collection that scours only the young generation and a full collection that walks both the young generation and the old generation. The goal, within the Sun JVM, is to keep the young generation free enough that most, if not all, allocation requests can be satisfied within the young generation, since a garbage collection pass there is far less expensive than one through the old generation.
Fortunately, we don't have to guess how the garbage collector is running; using the standard -verbose:gc option to the JVM, we can watch the JVM as it moves through its garbage collection phases because each collection, both minor and full, is written to the java console window as the application executes. Under 1.4.x JVMs, the output of this verbose garbage collection behavior looks something like this:
[GC 325407K->83000K(776768K), 0.3400514 secs] [GC 325816K->83372K(776768K), 0.3054352 secs] [Full GC 267628K->83769K(776768k), 1.9542351 secs]
In this case, we're looking at two minor collections and one full collection. Parsing this output is fairly easy: the number before the arrow indicates the combined size of live objects before and after the garbage collection pass. The third number, in the parentheses, is the total available size of the Java heap, which is the total heap size minus one of the young generation's survivor spaces (because we need that for the copying collector in the young generation to work). Last, of course, is the time taken by this garbage collection pass.
Looking at this output, then, is the first step in seeing how the garbage collector is behaving within your application. If, for example, you see many more Full GC entries than GC entries, this is a huge red flag telling you that the young generation is getting saturated with objects and can't find enough room to satisfy requests. Part of this may be due to the fact that objects in the young generation, if referenced from objects in the old generation, cannot be reclaimed, which is another argument against using object pools unnecessarily. Whatever the reason, if the heap isn't large enough to honor an allocation request, the JVM will resize the heap by extending its operating system's process footprint by some amount, up to the maximum specified by the –Xmx option to the JVM.
If, however, the old generation has plenty of room and only the young generation is getting saturated, we can resize the young generation by using some underdocumented Sun-specific JVM tuning flags. By default, the young generation is set to be one-quarter the size of the entire heap; if you want to change this ratio, frequently to expand the size of the young generation, you can use the –XX:NewRatio flag, where the value passed to the flag is the ratio of the young generation to the old generation. For example, -XX:NewRatio=3 sets the young generation ratio to be 1:3 to the old generation, which gets us back to the default.
If you're looking for something a bit more fine-grained than a simple multiple of the total heap size, two more flags, NewSize and MaxNewSize, set bounds on the young generation just as the –ms and –mx flags do for the entire heap. Be careful with these values, though—a young generation larger than half of the total size of the heap is allocating a lot of empty space to the young generation. The default NewSize on a Solaris JVM is 2172K; on an x86, 640K. MaxNewSize is 32M on both platforms, which is typically too small, particularly if you've set the –mx option to be larger than 64M (the default). You can do the same for the old generation using the OldSize option, set to 1408K by default, and likewise for the permanent space using PermSize, 1M by default.
If you're curious about how many passes through the young generation an object has to go before migrating to the old generation, run the application with the PrintTenuringDistribution flag turned on (-XX:+PrintTenuringDistribution). This exercise is not for the faint of heart and is probably useful only to those developers with more familiarity in this area. On the other hand, getting comfortable with those statistics is a good way to get more familiar with the garbage collector.
This sort of configuration is a lot better than trying to work with things programmatically, by the way. For example, many programmers, believing they know better than the garbage collector when might be a good time to run a garbage collection pass, call System.gc on a semi-regular basis, usually when they've released a large object or collection of objects. "I want those objects cleaned up as quickly as possible," they argue, "so I need to tell the garbage collector to run so it'll clean those objects up right away."
This argument has a couple of major flaws. Note that the documentation states that a call to System.gc is not a command but a request to the garbage collector to run. The garbage collector is free to ignore this call if it feels like doing so. However, before you start spluttering in indignation, "How dare they presume such behavior?" stop and think for a moment: If a call to System.gc forced a new garbage collection pass, what if the garbage collector were already in the middle of a pass? Should it abandon all its work thus far just to start over because you said so? This would actually result in a slower garbage collection time and would work entirely contrary to your stated purpose. More importantly, the collector may not be in a good position to efficiently collect the objects you just released; for example, an arena-based garbage collector may find that the objects you think are eligible for reclamation are still in an arena with reachable objects in it and may decide not to collect those objects. Your "help" here wasn't necessary or appreciated.
As a matter of fact, I know of one Web-based system where programmers littered the code with calls to "force" the garbage collector to run when they thought it was appropriate. On a suggestion from one of the senior developers, they tried running a version of the code with all those calls commented out and found it ran roughly 20% faster than it did before, with a smaller memory footprint. Even the Sun documentation points out scary facts behind the System.gc call: "These calls force major collection, and inhibit scalability on large systems." For this reason, the Sun JVM has an option to disable the System.gc call entirely (the "unsupported" option -XX:+DisableExplicitGC), which I strongly suggest you consider using, just in case a few of your more zealous Java developers still drop System.gc into their code.
As another example, consider again the conventional wisdom of object pools: "the cost of allocating a Java object is hideously expensive." Actually, as it turns out, under the Hotspot VM from Sun, the cost of constructing a new object is around ten native CPU instructions plus whatever time is spent in the object's constructor(s). This is easily on par with the best C++ compilers and hopefully puts to rest the idea that allocating objects is expensive—as long as the constructors in the objects being constructed aren't slow or complex, the cost of creating a new object is far more efficient than obtaining an object out of a pool. (Remember, even once the object has been retrieved from the pool it still needs to be initialized to a good starting state, which is what a constructor typically does anyway, so the pool only really avoids the cost of construction—which we've already seen is negligible.)
This isn't to say that all object pooling is bad, only that it's not the panacea it's made out to be by magazine articles more focused on showing off a particular pooling implementation than on discussing when a pool is best used. For example, J2EE goes to great lengths to provide connection pooling of expensive connections (like database connections, the poster child for resource pooling) because the cost of acquiring a database connection can be an expensive operation—not because creating the object itself takes time but because of the work performed under the hood, making the trip(s) across the network, negotiating authentication credentials, and so on.
In fact, object pooling—or rather, resource object management, since that's the more generic term for what we're describing—falls into several major categories.
The ironic part about this discussion is that when broken out this way, it becomes apparent that the EJB Specification makes something of a mistake by placing such heavy emphasis on object pooling—a stateless session bean is a cheap resource to create because it holds no state on behalf of any client, so pooling stateless session bean instances is probably unnecessary. Instead, a container would probably be better off trusting the garbage collector and simply creating new instances of session beans on demand and letting old ones fall into the garbage collector's clutches when the demand drops. (In the interests of safety and security, the object factory used to create the stateless bean instances should probably have an upper boundary on the number of bean instances it will create, in order to avoid denial-of-service attacks from attackers creating an excessive number of stateless calls simultaneously; most EJB implementations provide for this.)
Some readers will undoubtedly look at this list and casually dismiss all four categories as nit-picking over the whole "object pooling" concept; before doing so, however, remember that each one deals with a different scenario and uses different algorithms underneath in accordance with the relative expense and finiteness of the resources they represent. (For details on how to build an effective object pool, see Item 74.)
So, after all that, what now? How exactly are we supposed to write memory-efficient code?
For starters, if you plan to write vendor-neutral code, you have to trust the garbage collector. It's really that simple, and it actually turns out to be the best situation in most cases. Look at it this way: the developers writing the JVM implementations are writing garbage collectors that will work best in "normal" situations and will likely run into problems when faced with tricky and/or bizarre object lifecycle scenarios. Code intentionally—don't try to fool the garbage collector because you'll only confuse it and produce worse results overall.
If you already know the JVM you're working against, however, this isn't necessarily carte blanche to pool objects left and right (assuming your JVM recommends it, which the Sun Hotspot team doesn't for its JVM).
It's usually still better to code intentionally, only managing resource objects when they become finite, expensive, or both; the garbage collectors in most JVMs are optimized to deal with the far more common case, that of objects that have no special needs with respect to outside resources.
The overall point here is to try to avoid "tricks" that you believe will make your garbage collection scenario more efficient, at least not without some hard scientific evidence that it's necessary or worthwhile (see Item 10). For example, when was the last time you actually profiled string concatenation, to see if using StringBuffer is actually more efficient than just adding two String instances together?