May 27, 2011, 5:49 a.m.
posted by datamaker
Chapter 3 explored how an object could be safely or improperly published. The safe publication techniques described there derive their safety from guarantees provided by the JMM; the risks of improper publication are consequences of the absence of a happens-before ordering between publishing a shared object and accessing it from another thread.
The possibility of reordering in the absence of a happens-before relationship explains why publishing an object without adequate synchronization can allow another thread to see a partially constructed object (see Section 3.5). Initializing a new object involves writing to variablesthe new object's fields. Similarly, publishing a reference involves writing to another variablethe reference to the new object. If you do not ensure that publishing the shared reference happens-before another thread loads that shared reference, then the write of the reference to the new object can be reordered (from the perspective of the thread consuming the object) with the writes to its fields. In that case, another thread could see an up-to-date value for the object reference but out-of-date values for some or all of that object's statea partially constructed object.
Unsafe publication can happen as a result of an incorrect lazy initialization, as shown in Figure. At first glance, the only problem here seems to be the race condition described in Section 2.2.2. Under certain circumstances, such as when all instances of the Resource are identical, you might be willing to overlook these (along with the inefficiency of possibly creating the Resource more than once). Unfortunately, even if these defects are overlooked, UnsafeLazyInitialization is still not safe, because another thread could observe a reference to a partially constructed Resource.
Listing 16.3. Unsafe Lazy Initialization. Don't Do this.
Suppose thread A is the first to invoke getInstance. It sees that resource is null, instantiates a new Resource, and sets resource to reference it. When thread B later calls getInstance, it might see that resource already has a non-null value and just use the already constructed Resource. This might look harmless at first, but there is no happens-before ordering between the writing of resource in A and the reading of resource in B. A data race has been used to publish the object, and therefore B is not guaranteed to see the correct state of the Resource.
The Resource constructor changes the fields of the freshly allocated Resource from their default values (written by the Object constructor) to their initial values. Since neither thread used synchronization, B could possibly see A's actions in a different order than A performed them. So even though A initialized the Resource before setting resource to reference it, B could see the write to resource as occurring before the writes to the fields of the Resource. B could thus see a partially constructed Resource that may well be in an invalid stateand whose state may unexpectedly change later.
The safe-publication idioms described in Chapter 3 ensure that the published object is visible to other threads because they ensure the publication happens-before the consuming thread loads a reference to the published object. If thread A places X on a BlockingQueue (and no thread subsequently modifies it) and thread B retrieves it from the queue, B is guaranteed to see X as A left it. This is because the BlockingQueue implementations have sufficient internal synchronization to ensure that the put happens-before the take. Similarly, using a shared variable guarded by a lock or a shared volatile variable ensures that reads and writes of that variable are ordered by happens-before.
This happens-before guarantee is actually a stronger promise of visibility and ordering than made by safe publication. When X is safely published from A to B, the safe publication guarantees visibility of the state of X, but not of the state of other variables A may have touched. But if A putting X on a queue happens-before B fetches X from that queue, not only does B see X in the state that A left it (assuming that X has not been subsequently modified by A or anyone else), but B sees everything A did before the handoff (again, subject to the same caveat).
Why did we focus so heavily on @GuardedBy and safe publication, when the JMM already provides us with the more powerful happens-before? Thinking in terms of handing off object ownership and publication fits better into most program designs than thinking in terms of visibility of individual memory writes. The happens-before ordering operates at the level of individual memory accessesit is a sort of "concurrency assembly language". Safe publication operates at a level closer to that of your program's design.
Safe Initialization Idioms
It sometimes makes sense to defer initialization of objects that are expensive to initialize until they are actually needed, but we have seen how the misuse of lazy initialization can lead to trouble. UnsafeLazyInitialization can be fixed by making the geTResource method synchronized, as shown in Listing 16.4. Because the code path through getInstance is fairly short (a test and a predicted branch), if getInstance is not called frequently by many threads, there is little enough contention for the SafeLazyInitialization lock that this approach offers adequate performance.
The treatment of static fields with initializers (or fields whose value is initialized in a static initialization block [JPL 2.2.1 and 2.5.3]) is somewhat special and offers additional thread-safety guarantees. Static initializers are run by the JVM at class initialization time, after class loading but before the class is used by any thread. Because the JVM acquires a lock during initialization [JLS 12.4.2] and this lock is acquired by each thread at least once to ensure that the class has been loaded, memory writes made during static initialization are automatically visible to all threads. Thus statically initialized objects require no explicit synchronization either during construction or when being referenced. However, this applies only to the as-constructed stateif the object is mutable, synchronization is still required by both readers and writers to make subsequent modifications visible and to avoid data corruption.
Thread-safe Lazy Initialization.
Using eager initialization, shown in Listing 16.5, eliminates the synchronization cost incurred on each call to getInstance in SafeLazyInitialization. This technique can be combined with the JVM's lazy class loading to create a lazy initialization technique that does not require synchronization on the common code path. The lazy initialization holder class idiom [EJ Item 48] in Listing 16.6 uses a class whose only purpose is to initialize the Resource. The JVM defers initializing the ResourceHolder class until it is actually used [JLS 12.4.1], and because the Resource is initialized with a static initializer, no additional synchronization is needed. The first call to getresource by any thread causes ResourceHolder to be loaded and initialized, at which time the initialization of the Resource happens through the static initializer.
Lazy Initialization Holder Class Idiom.
No book on concurrency would be complete without a discussion of the infamous double-checked locking (DCL) antipattern, shown in Listing 16.7. In very early JVMs, synchronization, even uncontended synchronization, had a significant performance cost. As a result, many clever (or at least clever-looking) tricks were invented to reduce the impact of synchronizationsome good, some bad, and some ugly. DCL falls into the "ugly" category.
Again, because the performance of early JVMs left something to be desired, lazy initialization was often used to avoid potentially unnecessary expensive operations or reduce application startup time. A properly written lazy initialization method requires synchronization. But at the time, synchronization was slow and, more importantly, not completely understood: the exclusion aspects were well enough understood, but the visibility aspects were not.
DCL purported to offer the best of both worldslazy initialization without paying the synchronization penalty on the common code path. The way it worked was first to check whether initialization was needed without synchronizing, and if the resource reference was not null, use it. Otherwise, synchronize and check again if the Resource is initialized, ensuring that only one thread actually initializes the shared Resource. The common code pathfetching a reference to an already constructed Resourcedoesn't use synchronization. And that's where the problem is: as described in Section 16.2.1, it is possible for a thread to see a partially constructed Resource.
The real problem with DCL is the assumption that the worst thing that can happen when reading a shared object reference without synchronization is to erroneously see a stale value (in this case, null); in that case the DCL idiom compensates for this risk by trying again with the lock held. But the worst case is actually considerably worseit is possible to see a current value of the reference but stale values for the object's state, meaning that the object could be seen to be in an invalid or incorrect state.
Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads. However, this is an idiom whose utility has largely passedthe forces that motivated it (slow uncontended synchronization, slow JVM startup) are no longer in play, making it less effective as an optimization. The lazy initialization holder idiom offers the same benefits and is easier to understand.
Listing 16.7. Double-checked-locking Antipattern. Don't Do this.