“I will be the pattern of all patience.”
William Shakespeare
This chapter describes three patterns and one idiom that simplify locking in concurrent systems: Scoped Locking, Strategized Locking, Thread-Safe Interface, and Double-Checked Locking Optimization.
Developing multi-threaded applications is harder than developing sequential programs because an object can be manipulated concurrently by multiple threads, which may corrupt its internal state. Synchronization mechanisms, such as mutexes or semaphores [McK95], can help ensure that access to objects is serialized correctly. This chapter presents three patterns and an idiom that provide solutions to problems related to synchronizing concurrent objects.
The first idiom and pattern, Scoped Locking and Strategized Locking, address lock acquisition/release and locking strategies. When implemented in C++, the Strategized Locking pattern often applies the Scoped Locking idiom. The other two patterns, Thread-Safe Interface and Double-Checked Locking Optimization, help improve the robustness and efficiency of synchronization mechanisms.
All four can be used to enhance the implementations of the patterns presented in Chapter 5, Concurrency Patterns.
Other patterns related to synchronization include Code Locking and Data Locking [McK95], Reader/Writer Locking [McK95] [Lea99a], Object Synchronizer [SPM99], as well as Balking and Guarded Suspension [Lea99a].
The Scoped Locking C++ idiom ensures that a lock is acquired when control enters a scope and released automatically when control leaves the scope, regardless of the return path from the scope.
Synchronized Block, Resource-Acquisition-is-Initialization [Str97],1 Guard, Execute Around Object [Hen00]
Commercial Web servers often maintain a ‘hit count’ that records how many times each URL is accessed by clients over a period of time. To reduce latency a Web server process maintains the hit counts in a memory-resident component rather than in a disk file.
Web server processes are often multi-threaded [HS98] to increase throughput. Public methods in the hit-count component must therefore be serialized to prevent threads from corrupting its internal state when hit counts are updated concurrently.
One way to serialize access to a hit-count component is to acquire and release a lock explicitly in each public method. The following C++ example uses the Thread_Mutex defined in the Wrapper Facade pattern (47) to serialize access to critical sections:
class Hit_Counter {
public:
  // Increment the hit count for a URL <path> name.
  bool increment (const string &path) {
    // Acquire lock to enter critical section.
    lock_.acquire ();
    Table_Entry *entry = lookup_or_create (path);
    if (entry == 0) {
      // Something’s gone wrong, so bail out.
      lock_.release ();
      return false; // Return a ‘failure’ value.
    }
    else {
      // Increment hit count for <path> name.
      entry->increment_hit_count ();
      // Release lock to leave critical section.
      lock_.release ();
      return true;
    }
  }
  // Other public methods omitted…

private:
  // Lookup the table entry that maintains the hit count
  // associated with <path> name, creating the entry if
  // it doesn’t exist.
  Table_Entry *lookup_or_create (const string &path);

  // Serialize access to the critical section.
  Thread_Mutex lock_;
};
Although this code works, the Hit_Counter implementation is unnecessarily hard to develop and maintain. For instance, maintenance programmers may forget to release the lock_ on some return paths out of the increment() method, such as when modifying its else branch to check for a new failure condition:
else if (entry->increment_hit_count () == SOME_FAILURE)
  return false; // Return a ‘failure’ value.
In addition, the implementation is not exception-safe. In particular, lock_ will not be released if a later version of the increment() method throws an exception or calls a helper method that throws an exception [Mue96].
Both of these modifications will cause the increment() method to return without releasing the lock_. If the lock_ is not released, however, the Web server process will hang when other threads block indefinitely while trying to acquire the lock_. Moreover, if these error cases are rare, the problems with this code may not show up during system testing.
An application containing shared resources that are manipulated concurrently by multiple threads.
Code that should not execute concurrently must be protected by some type of lock that is acquired and released when control enters and leaves a critical section, respectively. If programmers must acquire and release locks explicitly, however, it is hard to ensure that the locks are released in all paths through the code. For example, in C++ control can leave a scope due to a return, break, continue, or goto statement, as well as from an unhandled exception being propagated out of the scope.
Define a guard class whose constructor automatically acquires a lock when control enters a scope and whose destructor automatically releases the lock when control leaves the scope. Instantiate instances of the guard class to acquire/release locks in method or block scopes that define critical sections.
The implementation of the Scoped Locking idiom is straightforward.
The following class illustrates a guard designed for the Thread_Mutex wrapper facade (47):
class Thread_Mutex_Guard {
public:
  // Store a pointer to the lock and acquire the lock.
  Thread_Mutex_Guard (Thread_Mutex &lock)
    : lock_ (&lock), owner_ (false) {
    lock_->acquire ();
    // Only set to true if <acquire> succeeds.
    owner_ = true;
  }

  // Release the lock when the guard goes out of scope.
  ~Thread_Mutex_Guard () {
    // Only release the lock if it was acquired
    // successfully, i.e., <false> indicates that
    // <acquire> failed.
    if (owner_)
      lock_->release ();
  }

private:
  Thread_Mutex *lock_; // Pointer to our lock.
  bool owner_;         // Is <lock_> held by this object?

  // Disallow copying or assignment.
  Thread_Mutex_Guard (const Thread_Mutex_Guard &);
  void operator= (const Thread_Mutex_Guard &);
};
A pointer to a lock, rather than a lock object, should be used in a guard class implementation to prevent copying or assigning a lock, which is erroneous, as discussed in the Wrapper Facade pattern (47).
In addition, it is useful to add a flag, such as the owner_ flag in the Thread_Mutex_Guard example above, that indicates whether or not a guard acquired the lock successfully. The flag can also indicate failures that arise from ‘order of initialization bugs’ if static/global locks are used erroneously [LGS99]. By checking this flag in the guard’s destructor, a subtle run-time error can be avoided that would otherwise occur if the lock was released when it was not held by the guard.
The Scoped Locking idiom resolves the original problems with the Hit_Counter class in our multi-threaded Web server:
class Hit_Counter {
public:
  // Increment the hit count for a URL <path> name.
  bool increment (const string &path) {
    // Use Scoped Locking to acquire and release
    // the <lock_> automatically.
    Thread_Mutex_Guard guard (lock_);
    Table_Entry *entry = lookup_or_create (path);
    if (entry == 0)
      return false; // Destructor releases <lock_>.
    else {
      // Increment hit count for this <path> name.
      entry->increment_hit_count ();
      return true; // Destructor releases <lock_>.
    }
  }
  // Other public methods omitted.

private:
  // Serialize access to the critical section.
  Thread_Mutex lock_;
};
In this solution the guard ensures that the lock_ is acquired and released automatically as control enters and leaves the increment() method, respectively.
Explicit Accessors. One drawback with the Thread_Mutex_Guard interface described in the Implementation section is that it is not possible to release the lock explicitly without leaving the method or block scope.
For example, the following code fragment illustrates a situation where the lock could be released twice, depending on whether or not the condition in the if statement evaluates to true:
{
  Thread_Mutex_Guard guard (lock);
  // Do some work …
  if (/* a certain condition holds */)
    lock.release ();
  // Do some more work …
  // Leave the scope, which releases the lock again.
}
To prevent this erroneous use, programmers should not access the lock directly. Instead, explicit accessor methods to the underlying lock can be defined in the lock’s guard class.
We revise the Thread_Mutex_Guard class as follows:
class Thread_Mutex_Guard {
public:
  // Store a pointer to the lock and acquire the lock.
  Thread_Mutex_Guard (Thread_Mutex &lock)
    : lock_ (&lock), owner_ (false) {
    acquire ();
  }

  void acquire () {
    lock_->acquire ();
    // Only set to <true> if <acquire> succeeds.
    owner_ = true;
  }

  void release () {
    // Only release <lock_> if it was acquired
    // successfully and we haven’t released it yet!
    if (owner_) {
      owner_ = false;
      lock_->release ();
    }
  }

  // Release the lock when the guard goes out of scope.
  ~Thread_Mutex_Guard () {
    release ();
  }

private:
  Thread_Mutex *lock_; // Pointer to our lock.
  bool owner_;         // Is <lock_> held by this object?

  // … disallow copying and assignment …
};
The acquire() and release() accessor methods track whether the lock has already been released; if so, the guard’s destructor will not release it again.
Using the revised Thread_Mutex_Guard our code will work correctly:
{
  Thread_Mutex_Guard guard (lock);
  // Do some work …
  if (/* a certain condition holds */)
    guard.release ();
  // Do some more work …
  // Leave the scope; the lock is not released again.
}
Strategized Scoped Locking. Defining a different guard for each type of lock is tedious, error-prone, and excessive, because it may increase the memory footprint of applications or components. Therefore, a common variant of the Scoped Locking idiom is to apply either the parameterized type or polymorphic version of the Strategized Locking pattern (333).
Booch Components. The Booch Components [BV93] were one of the first C++ class libraries to use the Scoped Locking idiom for multi-threaded C++ programs.
ACE [Sch97]. The Scoped Locking idiom is used extensively throughout the ADAPTIVE Communication Environment framework, which defines an ACE_Guard implementation similar to the Thread_Mutex_Guard class described in the Implementation and Variants sections.
Threads.h++. The Rogue Wave Threads.h++ library defines a set of guard classes that are modeled after the ACE Scoped Locking designs.
Java provides a language feature called a synchronized block that implements the Scoped Locking idiom. For each synchronized block, Java compilers generate bytecode in which monitorenter and monitorexit instructions bracket the block. To ensure that the lock is always released, the compiler also generates an exception handler that catches all exceptions thrown in the synchronized block [Eng99].
The Scoped Locking idiom offers the following benefit:
Increased robustness. By applying this idiom, locks are acquired and released automatically when control enters and leaves critical sections defined by C++ method and block scopes. This idiom increases the robustness of concurrent applications by eliminating common programming errors related to synchronization and multi-threading.
There are two liabilities of applying the Scoped Locking idiom to concurrent applications and components:
Potential for deadlock when used recursively. If a method that uses the Scoped Locking idiom calls itself recursively, ‘self-deadlock’ will occur if the lock is not a ‘recursive’ mutex. The Thread-Safe Interface pattern (345) describes a technique that avoids this problem. This pattern ensures that only interface methods apply the Scoped Locking idiom, whereas implementation methods do not apply it.
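For example, the following minimal sketch, in which a hypothetical traverse() method walks hit-count entries via an assumed Table_Entry::next() accessor, self-deadlocks on the second recursive call when lock_ is a non-recursive Thread_Mutex:

void Hit_Counter::traverse (Table_Entry *entry) {
  // The second recursive call blocks here forever,
  // because <lock_> is already held by this thread.
  Thread_Mutex_Guard guard (lock_);
  if (entry != 0)
    traverse (entry->next ());
}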
Limitations with language-specific semantics. The Scoped Locking idiom is based on a C++ language feature and therefore will not be integrated with operating system-specific system calls. Thus, locks may not be released automatically when threads or processes abort or exit inside a guarded critical section. Likewise, they will not be released properly if the standard C longjmp() function is called because this function does not call the destructors of C++ objects as the run-time stack unwinds.
The following modification to increment() will prevent the Scoped Locking idiom from working:
Thread_Mutex_Guard guard (lock_);
Table_Entry *entry = lookup_or_create (path);
if (entry == 0)
  // Something’s gone wrong, so exit the thread.
  thread_exit ();
  // Destructor will not be called so the
  // <lock_> will not be released!
In general, therefore, it is inappropriate to abort or exit a thread or process within a component. Instead, an exception-handling mechanism or error-propagation pattern should be used [Mue96].
Excessive compiler warnings. The Scoped Locking idiom defines a guard object that is not used explicitly within the scope, because its destructor releases the lock implicitly. Unfortunately, some C++ compilers print ‘statement has no effect’ warnings when guards are defined but not used explicitly within a scope. At best these warnings are distracting—at worst, they encourage developers to disable certain compiler warnings, which may mask other warnings that indicate actual problems with the code. An effective way to handle this problem is to define a macro that eliminates the warnings without generating additional code.
The following macro is defined in ACE [Sch97]:
#define UNUSED_ARG(arg) { if (&arg) /* null */; }
This macro can be placed after a guard to keep many C++ compilers from generating spurious warnings:
{ // New scope.
  Thread_Mutex_Guard guard (lock_);
  UNUSED_ARG (guard);
  // …
}
The Scoped Locking idiom is a special case of a more general C++ idiom [Str97] and the Execute Around Object [Hen00] idiom, in which a constructor acquires a resource and a destructor releases the resource when a scope is entered and exited, respectively. When these idioms are applied to concurrent applications, the resource that is acquired and released is some type of lock.
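For instance, the following minimal sketch applies the same constructor/destructor discipline to a non-lock resource, a C standard I/O file; the File_Guard class name is illustrative:

#include <cstdio>

class File_Guard {
public:
  // Acquire the resource when control enters the scope.
  File_Guard (const char *path)
    : fp_ (std::fopen (path, "r")) { }

  // Release the resource on every path out of the scope.
  ~File_Guard () {
    if (fp_ != 0)
      std::fclose (fp_);
  }

  std::FILE *get () const { return fp_; }

private:
  std::FILE *fp_; // Pointer to our resource.

  // Disallow copying or assignment.
  File_Guard (const File_Guard &);
  void operator= (const File_Guard &);
};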
Thanks to Brad Appleton for comments on the Scoped Locking idiom.
The Strategized Locking design pattern parameterizes synchronization mechanisms that protect a component’s critical sections from concurrent access.
A key component for implementing high-performance Web servers is a file cache, which maps URL path names to memory-mapped files or open file handles [HS98]. When a client requests a URL that is in the cache, the Web server can transfer the file contents to the client immediately without accessing slower secondary storage via multiple read() and write() operations.
A file cache implementation for a portable, high-performance Web server should run efficiently on a range of multi-threaded and single-threaded operating systems. One way to achieve this portability is to develop multiple file cache classes:
// A single-threaded file cache implementation.
class File_Cache_ST {
public:
  // Return a pointer to the memory-mapped file
  // associated with <path> name.
  const void *lookup (const string &path) const {
    // No locking required because we’re
    // single-threaded.
    const void *file_pointer = 0;
    // … look up the file in the cache, mapping it
    // into memory if it is not currently in the cache.
    return file_pointer;
  }
  // …

private:
  // File cache implementation…
  // No lock required because we are
  // single-threaded.
};

// A multi-threaded file cache implementation.
class File_Cache_Thread_Mutex {
public:
  // Return a pointer to the memory-mapped file
  // associated with <path> name.
  const void *lookup (const string &path) const {
    // Use the Scoped Locking idiom to serialize
    // access to the file cache.
    Thread_Mutex_Guard guard (lock_);
    const void *file_pointer = 0;
    // … look up the file in the cache, mapping it
    // into memory if it is not currently in the cache.
    return file_pointer;
  }
  // …

private:
  // File cache implementation…
  // Synchronization strategy.
  mutable Thread_Mutex lock_;
};
These two implementations form part of a component family whose classes differ only in their synchronization strategy. One component in the family—class File_Cache_ST—implements a single-threaded file cache with no locking. The other component—class File_Cache_Thread_Mutex—implements a file cache that uses a mutex to serialize multiple threads that access the cache concurrently. Maintaining separate implementations of these file cache components can be tedious, however. In particular, future enhancements and fixes must be added consistently in each component’s implementation.
An application or system where components must run efficiently in a variety of different concurrency architectures.
Components that run in multi-threaded environments must protect their critical sections from concurrent client access. When integrating synchronization mechanisms with component functionality, two forces must be resolved: it should be easy to customize a component’s synchronization strategy, and maintaining multiple implementations of the same basic component should be avoided.
In our example the synchronization strategy is hard-coded. To increase performance on large-scale multi-processor platforms, a new class must therefore be written to support a file cache implementation that uses a readers/writer lock instead of a thread mutex. It is time-consuming, however, to customize an existing file cache class to support new, more efficient synchronization strategies.
If there are multiple copies of the same basic file cache component, version-skew is likely to occur because changes to one component may be applied inconsistently to other component implementations. Applying each change manually is also error-prone and non-scalable.
Parameterize a component’s synchronization aspects by making them ‘pluggable’ types. Each type objectifies a particular synchronization strategy, such as a mutex, readers/writer lock, semaphore, or ‘null’ lock. Define instances of these pluggable types as objects contained within a component that can use the objects to synchronize its method implementations efficiently.
The Strategized Locking pattern can be implemented via five activities, starting with the definition of the component’s interface, which is independent of any particular synchronization strategy:
class File_Cache {
public:
  const void *lookup (const string &path) const;
  // …

private:
  // data members and private methods go here…
};
To implement a polymorphic lock object for our file cache example, we first define an abstract locking class with virtual acquire() and release() methods:
class Lock {
public:
  // Acquire and release the lock.
  virtual void acquire () = 0;
  virtual void release () = 0;
  // …
};
A Guard class that controls a polymorphic lock can be defined as follows:
class Guard {
public:
  // Store a pointer to the lock and acquire the lock.
  Guard (Lock &lock) : lock_ (&lock), owner_ (false) {
    lock_->acquire ();
    owner_ = true;
  }

  // Release the lock when the guard goes out of scope.
  ~Guard () {
    // Only release the lock if <acquire> succeeded.
    if (owner_)
      lock_->release ();
  }

private:
  // Pointer to the lock we’re managing.
  Lock *lock_;

  // Records if the lock was acquired successfully.
  bool owner_;
};
The following illustrates a Guard class that is strategized by a LOCK template parameter:
template <class LOCK>
class Guard {
public:
  // Store a pointer to the lock and acquire the lock.
  Guard (LOCK &lock) : lock_ (&lock), owner_ (false) {
    lock_->acquire ();
    owner_ = true;
  }

  // Release the lock when the guard goes out of scope,
  // but only if <acquire> succeeded.
  ~Guard () {
    if (owner_)
      lock_->release ();
  }

private:
  // Pointer to the lock we’re managing.
  LOCK *lock_;

  // Records if the lock is held by this object.
  bool owner_;

  // … disallow copying and assignment …
};
This version of our file cache component is passed a polymorphic lock parameter:
class File_Cache {
public:
  // Constructor.
  File_Cache (Lock &l) : lock_ (&l) { }

  // A method.
  const void *lookup (const string &path) const {
    // Use the Scoped Locking idiom to acquire
    // and release the <lock_> automatically.
    Guard guard (*lock_);
    // Implement the <lookup> method.
  }
  // …

private:
  // The polymorphic strategized locking object.
  mutable Lock *lock_;

  // Other data members and methods go here…
};
Similarly, we can define a templatized version of the file cache:
template <class LOCK>
class File_Cache {
public:
  // A method.
  const void *lookup (const string &path) const {
    // Use the Scoped Locking idiom to acquire
    // and release the <lock_> automatically.
    Guard<LOCK> guard (lock_);
    // Implement the <lookup> method.
  }
  // …

private:
  // The parameterized type strategized locking object.
  mutable LOCK lock_;

  // Other data members and methods go here…
};
If a C++ compiler supports default template arguments, it may be useful to add a default LOCK to handle the most common use case. For example, we can define the default LOCK as a readers/writer lock:
template <class LOCK = RW_Lock>
class File_Cache { /* … */ };
When applying the polymorphic approach, implement the locking strategies as subclasses of the abstract class Lock, as discussed in implementation activity 2 (335). If parameterized types are used, ensure that all concrete lock implementations follow the signature for locks defined in implementation activity 2 (335).
In addition to the Thread_Mutex locking strategy defined in Wrapper Facade (47), the Null_Mutex is surprisingly useful. This class defines an efficient locking strategy for single-threaded applications and components:
class Null_Mutex {
public:
  Null_Mutex () { }
  ~Null_Mutex () { }

  void acquire () { }
  void release () { }
};
All methods in Null_Mutex are empty C++ inline functions that can be removed completely by optimizing compilers. This class is an example of the Null Object pattern [PLoPD3], which simplifies applications by defining a ‘no-op’ placeholder that removes conditional statements in the component’s implementation. A use of Null_Mutex and other locking strategies appears in the Example Resolved section.
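For instance, a guard strategized with Null_Mutex compiles down to empty inline calls, so the following minimal sketch, assuming the parameterized Guard defined above, incurs no synchronization overhead:

{
  Null_Mutex lock;
  Guard<Null_Mutex> guard (lock);
  // … the critical section runs with no locking
  // code generated by an optimizing compiler …
}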
If existing locking mechanisms have incompatible interfaces, use the Wrapper Facade (47) or Adapter [GoF95] patterns to ensure the interfaces conform to the signatures expected by the component’s synchronization aspects.
The following class wraps the Thread_Mutex from the Implementation section of the Wrapper Facade pattern (47), thereby connecting it to our polymorphic lock hierarchy:
class Thread_Mutex_Lock : public Lock {
public:
  // Acquire and release the lock.
  virtual void acquire () { lock_.acquire (); }
  virtual void release () { lock_.release (); }

private:
  // Concrete lock type.
  Thread_Mutex lock_;
};
We can apply the parameterized type form of the Strategized Locking pattern to implement a file cache for Web server content that is tuned for various single-threaded and multi-threaded concurrency models:
// A single-threaded file cache.
typedef File_Cache<Null_Mutex> Content_Cache;

// A multi-threaded file cache using a thread mutex.
typedef File_Cache<Thread_Mutex> Content_Cache;

// A multi-threaded file cache using a readers/writer lock.
typedef File_Cache<RW_Lock> Content_Cache;

// A file cache using the default readers/writer lock.
typedef File_Cache<> Content_Cache;
Note how in each configuration the Content_Cache interface and implementation require no changes. This flexibility stems from the Strategized Locking pattern, which abstracts synchronization aspects into ‘pluggable’ parameterized types. Moreover, the details of locking have been strategized via a C++ typedef. It is therefore straightforward to define a Content_Cache object that does not expose synchronization aspects to applications:
Content_Cache cache;
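Application code can then call the cache without referring to its locking strategy at all; the URL string below is purely illustrative:

const void *file_pointer = cache.lookup ("/index.html");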
Bridge strategy. Unfortunately, configuring the polymorphic file cache in implementation activity 3 (337) differs from configuring the templatized file cache, because a polymorphic lock implemented as a pointer cannot be passed as a parameter to the templatized File_Cache and Guard classes. Instead, we need a ‘real’ object, rather than a pointer to an object. Fortunately, the Bridge pattern [GoF95] can help us implement a family of locking strategies that is applicable to both polymorphic and parameterized type approaches. To apply this bridge strategy variant we simply define an additional abstraction class that encapsulates, and can be configured with, a polymorphic lock. An instance of this abstraction class then can be passed uniformly to both polymorphic and templatized components.
Consider the hierarchy of polymorphic locks defined in implementation activity 2 (335) as being a Bridge implementation class hierarchy. The following abstraction class then can be used to encapsulate this hierarchy:
class Lock_Abstraction {
public:
  // Constructor stores a reference to the base class.
  Lock_Abstraction (Lock &l) : lock_ (l) { }

  // Acquire the lock by forwarding to the
  // polymorphic acquire() method.
  void acquire () { lock_.acquire (); }

  // Release the lock by forwarding to the
  // polymorphic release() method.
  void release () { lock_.release (); }

private:
  // Maintain a reference to the polymorphic lock.
  Lock &lock_;
};
Note how this design allows us to initialize both our polymorphic and parameterized File_Cache and Guard classes with a single Lock_Abstraction class that can be configured with a concrete lock from our hierarchy of locking mechanisms.
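The following minimal usage sketch, based on the classes above, configures the abstraction with a concrete Thread_Mutex_Lock and passes it to the parameterized guard:

Thread_Mutex_Lock mutex_lock;       // Concrete polymorphic lock.
Lock_Abstraction lock (mutex_lock); // Bridge abstraction.

{
  // <Lock_Abstraction> satisfies the acquire()/release()
  // signature expected by the parameterized <Guard>.
  Guard<Lock_Abstraction> guard (lock);
  // … critical section …
}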
As a result of using this variation of Strategized Locking, the family of locking mechanisms becomes more reusable and easier to apply across applications. Be aware, however, that while this scheme is flexible, it is also more complicated to implement. It should therefore be used with care.
ACE [Sch97]. The Strategized Locking pattern is used extensively throughout the ADAPTIVE Communication Environment framework. Most synchronization aspects of ACE container components, such as ACE_Hash_Map_Manager, can be strategized via parameterized types.
Booch Components. The Booch Components [BV93] were one of the first C++ class libraries to parameterize locking strategies via templates.
The Dynix/PTX operating system applies the Strategized Locking pattern extensively throughout its kernel.
ATL Wizards. The Microsoft ATL Wizard in Visual Studio uses the parameterized type implementation of Strategized Locking, complete with default template parameters. In addition, it implements a class similar to the Null_Mutex. If a COM class is implemented as a single-threaded apartment, a no-op lock class is used, whereas in multi-threaded apartments a ‘real’ recursive mutex is used.
There are three benefits of applying the Strategized Locking pattern:
Enhanced flexibility and customization. It is straightforward to configure and customize a component for certain concurrency models because the synchronization aspects of components are strategized. If no suitable locking strategy is available for a new concurrency model, the family of locking strategies can be extended without affecting existing code.
Decreased maintenance effort for components. It is straightforward to add enhancements and bug fixes to a component because there is only one implementation, rather than a separate implementation for each concurrency model. This centralization of concerns helps minimize version-skew.
Improved reuse. Components implemented using this pattern become less dependent on specific synchronization mechanisms. They therefore become more reusable, because their locking strategies can be configured orthogonally to their behavior.
There are two liabilities of applying the Strategized Locking pattern:
Obtrusive locking. If templates are used to parameterize locking aspects, this design will expose the locking strategies to application code. Although this design is flexible, it also can be obtrusive, particularly for compilers that do not support templates efficiently or correctly. One way to avoid this problem is to apply the polymorphic strategy to vary component locking behavior.
Over-engineering. Externalizing a locking mechanism by placing it in a component’s interface may actually provide too much flexibility in certain situations. For example, inexperienced developers may try to parameterize a component with the wrong type of lock, resulting in improper compile- or run-time behavior. Similarly, only a single type of synchronization mechanism may be needed for a particular type of component. In this case the flexibility of Strategized Locking is unnecessary. In general this pattern is most effective when practical experience reveals a component’s behavior to be orthogonal to its locking strategy, and that locking strategies do indeed vary in semantically meaningful and efficient ways.
The main synchronization mechanism in Java is the monitor. The Java language does not provide ‘conventional’ concurrency control mechanisms, such as mutexes and semaphores, to application developers. The Strategized Locking pattern therefore need not be applied to Java directly.
It is possible, however, to implement different concurrency primitives, such as mutexes, semaphores, and readers/writer locks, in Java. For example, the util.concurrent package in [Lea99a] defines various types of locks, such as readers/writer locks. The implementation of these primitives can then be used as locking strategies to support various application-specific concurrency use cases. Due to the lack of parameterized types in Java specifications to date, only the polymorphic approach of the Strategized Locking pattern can be used to configure different synchronization strategies. In this case, Java implementations of this pattern will be similar to the C++ versions described in this pattern.
Thanks to Brad Appleton for comments on this pattern and Prashant Jain for his contribution explaining how this pattern applies to Java.
The Thread-Safe Interface design pattern minimizes locking overhead and ensures that intra-component method calls do not incur ‘self-deadlock’ by trying to reacquire a lock that is held by the component already.
When designing thread-safe components that make intra-component method calls, developers must be careful to avoid self-deadlock and unnecessary locking overhead. For example, consider a more complete implementation of the File_Cache component outlined in the Strategized Locking pattern (333):
template <class LOCK>
class File_Cache {
public:
  // Return a pointer to the memory-mapped file
  // associated with <path> name, adding
  // it to the cache if it doesn’t exist.
  const void *lookup (const string &path) const {
    // Use the Scoped Locking idiom to acquire
    // and release the <lock_> automatically.
    Guard<LOCK> guard (lock_);
    const void *file_pointer = check_cache (path);
    if (file_pointer == 0) {
      // Insert the <path> name into the cache.
      // Note the intra-class <insert> call.
      insert (path);
      file_pointer = check_cache (path);
    }
    return file_pointer;
  }

  // Add <path> name to the cache.
  void insert (const string &path) {
    // Use the Scoped Locking idiom to acquire
    // and release the <lock_> automatically.
    Guard<LOCK> guard (lock_);
    // … insert <path> into the cache…
  }

private:
  mutable LOCK lock_;
  const void *check_cache (const string &) const;

  // … other private methods and data omitted…
};
This implementation of File_Cache works efficiently only when strategized by a ‘null’ lock such as the Null_Mutex described in the Strategized Locking pattern (333). If the File_Cache implementation is strategized with a recursive mutex, however, it will incur unnecessary overhead when it reacquires the mutex in the insert() method. Even worse, if it is strategized with a non-recursive mutex, the code will ‘self-deadlock’ when the lookup() method calls the insert() method. This self-deadlock occurs because insert() tries to reacquire the LOCK that has been acquired by lookup() already.
It is therefore counter-productive to apply the Strategized Locking pattern to the implementation of File_Cache shown above, because there are so many restrictions and subtle problems that can arise. Yet the File_Cache abstraction can still benefit from the flexibility and customization provided by Strategized Locking.
Components in multi-threaded applications that contain intra-component method calls.
Multi-threaded components often contain multiple publicly-accessible interface methods and private implementation methods that can alter a component’s state. To prevent race conditions, a lock internal to the component can be used to serialize interface method invocations that access its state. Although this design works well if each method is self-contained, component methods may call each other to carry out their computations. If this occurs, components that use improper intra-component method invocation designs will suffer self-deadlock or unnecessary locking overhead.
Structure all components that process intra-component method invocations according to two design conventions: interface methods, such as C++ public methods, should only acquire and release the component lock and then forward to an implementation method that performs the actual work; implementation methods, such as C++ private methods, should trust that the lock is already held, perform their work, and never call any interface methods.
The Thread-Safe Interface pattern can be implemented using two activities: determining which methods are interface methods and which are implementation methods, then programming both sets of methods according to the conventions above.
The interface and implementation methods for File_Cache can be defined as follows:
template <class LOCK>
class File_Cache {
public:
  // The following two interface methods just
  // acquire/release the <LOCK> and forward to
  // their corresponding implementation methods.
  const void *lookup (const string &path) const;
  void insert (const string &path);

private:
  // The following two implementation methods do not
  // acquire/release the <LOCK> and perform the actual
  // work associated with managing the <File_Cache>.
  const void *lookup_i (const string &path) const;
  void insert_i (const string &path);

  // … Other implementation methods omitted …
};
Our File_Cache implementation applies Thread-Safe Interface to minimize locking overhead and prevent self-deadlock in class methods:
template <class LOCK>
class File_Cache {
public:
  // Return a pointer to the memory-mapped file
  // associated with <path> name, adding it to
  // the cache if it doesn’t exist.
  const void *lookup (const string &path) const {
    // Use the Scoped Locking idiom to acquire
    // and release the <lock_> automatically.
    Guard<LOCK> guard (lock_);
    return lookup_i (path);
  }

  // Add <path> name to the file cache.
  void insert (const string &path) {
    // Use the Scoped Locking idiom to acquire
    // and release the <lock_> automatically.
    Guard<LOCK> guard (lock_);
    insert_i (path);
  }

private:
  mutable LOCK lock_; // The strategized locking object.

  // The following implementation methods do not
  // acquire or release <lock_> and perform their
  // work without calling any interface methods.
  const void *lookup_i (const string &path) const {
    const void *file_pointer = check_cache_i (path);
    if (file_pointer == 0) {
      // If <path> name isn’t in the cache then
      // insert it and look it up again.
      insert_i (path);
      file_pointer = check_cache_i (path);
      // The calls to implementation methods
      // <insert_i> and <check_cache_i> assume
      // that the lock is held and perform work.
    }
    return file_pointer;
  }

  const void *check_cache_i (const string &) const { /* … */ }
  void insert_i (const string &) { /* … */ }

  // … other private methods and data omitted …
};
Thread-Safe Facade. This variant can be used if access to a whole subsystem or coarse-grained component must be synchronized. A facade [GoF95] can be introduced as the entry point for all client requests. The facade’s methods correspond to the interface methods. The classes that belong to the subsystem or component provide the implementation methods. If these classes have their own internal concurrency strategies, refactoring may be needed to avoid nested monitor lockout² [JS97a].
Nested monitor lockout occurs when a thread acquires object X’s monitor lock without relinquishing the lock already held on monitor Y, thereby preventing a second thread from acquiring the monitor lock for Y. This can lead to deadlock because, after acquiring monitor X, the first thread may wait for a condition to become true that can only change as a result of actions by the second thread after it has acquired monitor Y. It may not be possible to refactor the code properly to avoid nested monitor lockouts if the subsystem or component cannot be modified, for example if it is a third-party product or legacy system. In this case, Thread-Safe Facade should not be applied.
Thread-Safe Wrapper Facade. This variant helps synchronize access to a non-synchronized class or function API that cannot be modified. A wrapper facade (47) provides the interface methods, which encapsulate the corresponding implementation calls on the class or function API with actions that acquire and release a lock. The wrapper facade thus provides a synchronization proxy [POSA1] [GoF95] that serializes access to the methods of the class or function API.
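As a minimal sketch, assume a hypothetical non-synchronized legacy function legacy_lookup() that cannot be modified; a wrapper facade then serializes access to it as follows:

// Hypothetical non-synchronized legacy API.
extern const void *legacy_lookup (const string &path);

class Sync_Lookup_Facade {
public:
  // Interface method: acquire the lock, then
  // forward to the legacy implementation call.
  const void *lookup (const string &path) const {
    Guard<Thread_Mutex> guard (lock_);
    return legacy_lookup (path);
  }

private:
  mutable Thread_Mutex lock_;
};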
ACE [Sch97]. The Thread-Safe Interface pattern is used throughout the ADAPTIVE Communication Environment framework, for example in its ACE_Message_Queue class.
The Dynix/PTX operating system applies the Thread-Safe Interface pattern in portions of its kernel.
Java. The hash table implementation in java.util.Hashtable uses the Thread-Safe Interface design pattern. Hashtable’s interface methods, such as put(Object key, Object value), acquire a lock before changing the underlying data structure, which consists of an array of linked lists. The implementation method rehash() is called when the load threshold is exceeded. A new larger hash table is created, all elements are moved from the old to the new hash table, and the old table is left to the garbage collector. Note that the rehash() method is not protected by a lock, in contrast to the publicly accessible methods such as put(Object key, Object value). Protecting rehash() by a lock would not deadlock the program due to Java’s reentrant monitors. It would, however, diminish its performance due to the locking overhead.
A more sophisticated use case was introduced in JDK 1.2 with the Collection classes, which apply the Thread-Safe Wrapper Facade variant to make collection data structures thread-safe. The java.util.Collections class takes any object implementing the Map interface and returns a SynchronizedMap, which is a different class implementing Map. The methods of SynchronizedMap do no more than synchronize on an internal monitor and then forward to the corresponding method of the original object. Developers can therefore choose between fast or thread-safe variants of data structures, which only need be implemented once.
Security checkpoints. You may encounter a real-life variation of the Thread-Safe Interface pattern when entering a country or commercial office building that has a security guard at the border or entrance. To be admitted, you must sign in. After being admitted, other people that you interact with typically trust that you are supposed to be there.
There are three benefits of applying the Thread-Safe Interface pattern:
Increased robustness. This pattern ensures that self-deadlock does not occur due to intra-component method calls.
Enhanced performance. This pattern ensures that locks are not acquired or released unnecessarily.
Simplification of software. Separating the locking and functionality concerns can help to simplify both aspects.
However, there are also four liabilities when applying the Thread-Safe Interface pattern:
Additional indirection and extra methods. Each interface method requires at least one implementation method, which increases the footprint of the component and may also add an extra level of method-call indirection for each invocation. One way to minimize this overhead is to inline the interface and/or implementation methods.
Potential deadlock. By itself, the Thread-Safe Interface pattern does not resolve the problem of self-deadlock completely. For example, consider a client that calls an interface method on component A, which then delegates to an implementation method that calls an interface method on another component B. If the implementation of component B’s method calls back on an interface method of component A, deadlock will occur when trying to reacquire the lock that was acquired by the first call in this chain.
Potential for misuse. Object-oriented programming languages, such as C++ and Java, support class-level rather than object-level access control. As a result, an object can bypass the public interface to call a private method on another object of the same class, thus bypassing that object’s lock. Therefore, programmers should be careful to avoid invoking private methods on any object of their class other than themselves.
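A minimal sketch of this misuse, using a hypothetical merge() method added to the File_Cache shown in the Implementation section:

template <class LOCK>
void File_Cache<LOCK>::merge (File_Cache &other) {
  Guard<LOCK> guard (lock_); // Locks only *this* cache.
  // Legal C++, because access control is class-level,
  // but it bypasses <other>’s lock entirely.
  other.insert_i ("/some/path");
}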
Potential overhead. The Thread-Safe Interface pattern prevents multiple components from sharing the same lock. Therefore synchronization overhead may increase because multiple locks must be acquired, which also makes it harder to detect and avoid deadlocks. Moreover, the pattern prevents locking at a finer granularity than the component, which can increase lock contention, thereby reducing performance.
The Thread-Safe Interface pattern is related to the Decorator pattern [GoF95], which extends an object transparently by attaching additional responsibilities dynamically. The intention of the Thread-Safe Interface pattern is similar, in that it attaches robust and efficient locking strategies to make components thread-safe. The primary difference is that the Decorator pattern focuses on attaching additional responsibilities to objects dynamically, whereas the Thread-Safe Interface pattern focuses on the static partitioning of method responsibilities in component classes.
Components designed according to the Strategized Locking pattern (333) should employ the Thread-Safe Interface pattern to ensure that the component will function robustly and efficiently, regardless of the type of locking strategy that is selected.
Java implements locking at the method level via monitor objects (399) designated by the synchronized keyword. In Java, monitors are recursive. The problem of self-deadlock therefore cannot occur as long as developers reuse the same monitor, that is, synchronize on the same object. However, the problem of nested monitor lockout [JS97a] [Lea99a] can occur in Java if multiple nested monitors are used carelessly.
The problem of locking overhead depends on which Java Virtual Machine (JVM) is used. If a specific JVM implements monitors inefficiently and monitors are acquired recursively, the Thread-Safe Interface pattern may be able to help improve component run-time performance.
Thanks to Brad Appleton for comments on this pattern. Prashant Jain provided the Thread-Safe Interface variants and the Java nested monitor lockout discussion.
The Double-Checked Locking Optimization design pattern reduces contention and synchronization overhead whenever critical sections of code must acquire locks in a thread-safe manner just once during program execution.
Lock Hint [Bir91]
The Singleton pattern ensures a class has only one instance and provides a global access point to that instance. The following C++ code shows the canonical implementation of Singleton from [GoF95]:
class Singleton {
public:
  static Singleton *instance () {
    if (instance_ == 0) {
      // Enter critical section.
      instance_ = new Singleton ();
      // Leave critical section.
    }
    return instance_;
  }

  void method_1 ();
  // Other methods omitted.

private:
  static Singleton *instance_;
  // Initialized to 0 by the compiler/linker.
};
Applications use the static instance() method to retrieve a pointer to the Singleton and then invoke public methods:
Singleton::instance ()->method_1 ();
Unfortunately the canonical implementation of the Singleton pattern shown above is problematic on platforms with preemptive multi-tasking or true hardware parallelism. In particular, the Singleton constructor can be called multiple times if several threads execute the critical section in instance() simultaneously before instance_ is first assigned.
At best calling the Singleton constructor multiple times will cause a memory leak. At worst it can have disastrous consequences if singleton initialization is not idempotent.
To protect the critical section from concurrent access we could apply the Scoped Locking idiom (325) to acquire and release a mutex lock automatically:
class Singleton {
public:
  static Singleton *instance () {
    // Scoped Locking acquires <singleton_lock_>.
    Guard<Thread_Mutex> guard (singleton_lock_);
    if (instance_ == 0)
      instance_ = new Singleton;
    return instance_;
    // Destructor releases lock automatically.
  }

private:
  static Singleton *instance_;
  static Thread_Mutex singleton_lock_;
};
Assuming singleton_lock_ is initialized correctly, Singleton is now thread-safe. The additional locking overhead may be excessive, however. In particular every call to instance() now acquires and releases the lock, even though the critical section should be executed just once. By placing the guard inside the conditional check, we can remove the locking overhead:
static Singleton *instance () {
  if (instance_ == 0) {
    Guard<Thread_Mutex> guard (singleton_lock_);
    // Only come here if <instance_>
    // has not been initialized yet.
    instance_ = new Singleton;
  }
  return instance_;
}
Unfortunately, this solution does not provide thread-safe initialization because a race condition in multi-threaded applications can cause multiple initializations of Singleton. For example, consider two threads that simultaneously check for instance_ == 0. Both will succeed, one will acquire the lock via the guard and the other will block. After the first thread initializes Singleton and releases the lock, the blocked thread will obtain the lock and erroneously initialize Singleton a second time.
An application containing shared resources accessed concurrently by multiple threads.
Concurrent applications must ensure that certain portions of their code execute serially to avoid race conditions when accessing and modifying shared resources. A common way of avoiding race conditions is to serialize access to the shared resources’ critical sections via locks, such as mutexes. Every thread that wants to enter a critical section must first acquire a lock. If this lock is already owned by another thread, the thread will block until the lock is released and can be acquired.
The serialization approach outlined above can be inappropriate for objects or components that require ‘just once’ initialization. For example, the critical section code in our Singleton example must be executed just once during its initialization. However, every method call on the singleton acquires and releases the mutex lock, which can incur excessive overhead [PLoPD3]. To avoid this overhead, programmers of concurrent applications may revert to using global variables rather than applying the Singleton pattern. Unfortunately, this ‘solution’ has two drawbacks of its own [LGS99].
Introduce a flag that provides a ‘hint’ about whether it is necessary to execute a critical section before acquiring the lock that guards it. If this code need not be executed the critical section is skipped, thereby avoiding unnecessary locking overhead. The general pseudo-code design of this code is shown below:
// Perform first-check to evaluate ‘hint’.
if (first_time_in_flag is FALSE) {
  acquire the mutex
  // Perform double-check to avoid race condition.
  if (first_time_in_flag is FALSE) {
    execute the critical section
    set first_time_in_flag to TRUE
  }
  release the mutex
}
The Double-Checked Locking Optimization pattern can be implemented via three activities: identifying the code that should be executed just once, implementing the lock that serializes access to the critical section, and implementing the first-time-in flag.
For example, a singleton is initialized just once in a program. The call to the singleton’s constructor is thus executed only once in a critical section, regardless of the number of times the accessor method Singleton::instance() is called.
In accordance with the Scoped Locking idiom (325), a Thread_Mutex singleton_lock_ is used to ensure that the singleton’s constructor does not execute concurrently.
This lock must be initialized prior to the first call to the code that is executed just once. In C++, one way to ensure that a lock is initialized prior to its first use is to define it as a static object, as shown in the Example section. Unfortunately the C++ language specification does not guarantee the order of initialization of static objects that are defined in separate compilation units. As a result, different C++ compiler and linker platforms may behave inconsistently and the lock may not be initialized when it is first accessed.
A better way to avoid this problem is to use the Object Lifetime Manager pattern [LGS99]. This pattern defines a portable object manager component that governs the entire lifetime of global or static objects.
For example, the lock can be placed under the control of an object manager that will ensure it is initialized before any singleton attempts to use the lock to serialize its initialization. The object manager can also delete the singleton when the program terminates, thereby preventing the memory and resource leaks that can otherwise occur with the Singleton pattern [Vlis98a].
The Singleton::instance_ pointer is used as the first-time-in flag. If the flag evaluates to true, the critical section is skipped. If the flag also serves a particular application-specific purpose, as our Singleton::instance_ pointer does, it must be an atomic type that can be set without a partial read or write. The following code for the Singleton example is thread-safe, but avoids unnecessary locking overhead by placing the call to new within another conditional test:
class Singleton {
public:
  static Singleton *instance () {
    // First check.
    if (instance_ == 0) {
      // Use Scoped Locking to acquire and
      // release <singleton_lock_> automatically.
      Guard<Thread_Mutex> guard (singleton_lock_);

      // Double check.
      if (instance_ == 0)
        instance_ = new Singleton;
    }
    return instance_;
  }

private:
  static Singleton *instance_;
  static Thread_Mutex singleton_lock_;
};
The first thread that acquires the singleton_lock_ will construct the Singleton object and assign the pointer to instance_, which serves as the first-time-in flag in this example. All threads that call instance() subsequently will find instance_ is not equal to zero and will thus skip the initialization step.
The second check prevents a race condition if multiple threads try to initialize the Singleton simultaneously. Such threads will queue up at the singleton_lock_ mutex. When the queued threads finally obtain the mutex singleton_lock_, they will find instance_ is not equal to zero and will then skip the Singleton initialization.
This implementation of the Singleton::instance() method only incurs locking overhead for threads that are active inside instance() when the Singleton is first initialized. In subsequent calls to instance() the instance_ pointer is not zero and thus singleton_lock_ is neither acquired nor released.
Volatile Data. The Double-Checked Locking Optimization pattern implementation may require modifications if a compiler optimizes the first-time-in flag by caching it in some way, such as storing it in a CPU register. In this case, cache coherency may become a problem. For example, copies of the first-time-in flag held simultaneously in registers by multiple threads may become inconsistent if one thread’s setting of the value is not reflected in other threads’ copies.
A related problem is that a highly optimizing compiler may consider the second check of flag == 0 to be superfluous and optimize it away. A solution to both these problems is to declare the flag as volatile data, which ensures the compiler will not perform aggressive optimizations that change the program’s semantics.
For our Singleton example, this results in the following code:
class Singleton {
  // …
private:
  // instance_ is volatile.
  static Singleton *volatile instance_;
};
The use of volatile ensures that a compiler will not place the instance_ pointer into a register, nor will it optimize away the second check in instance().
The downside of using volatile is that all access to instance_ will be through memory rather than through registers, which may degrade performance.
Template Adapter. Another variation for the Double-Checked Locking Optimization pattern is applicable when the pattern is implemented in C++. In this case, create a template adapter that transforms classes to have singleton-like behavior and performs the Double-Checked Locking Optimization pattern automatically.
The following code illustrates how to write this template in C++:
template <class TYPE>
class Singleton {
public:
  static TYPE *instance () {
    // First check.
    if (instance_ == 0) {
      // Scoped Locking acquires and releases lock.
      Guard<Thread_Mutex> guard (singleton_lock_);

      // Double check instance_.
      if (instance_ == 0)
        instance_ = new TYPE;
    }
    return instance_;
  }

private:
  static TYPE *instance_;
  static Thread_Mutex singleton_lock_;
};
The Singleton template is parameterized by the TYPE that will be accessed as a singleton. The Double-Checked Locking Optimization pattern is applied automatically on the singleton_lock_ within the instance() method.
The Singleton template adapter can also be integrated with the Object Lifetime Manager pattern [LGS99], which ensures that dynamically allocated singletons are deallocated automatically when an application process exits. This pattern can also ensure that the static singleton_lock_ data member is initialized properly before its first use.
Pre-initialization of Singletons. This variation is an alternative that may alleviate the need for Double-Checked Locking Optimization. It does this by initializing all objects explicitly at program start-up, for example in a program’s main() function. Thus, there are no race conditions because the initialization is constrained to occur within a single thread.
This solution is inappropriate, however, when expensive calculations must be performed that may be unnecessary in certain situations. For instance, if a singleton is never actually created during program execution, initializing it during program start-up will simply waste resources. Pre-initialization can also break encapsulation by forcing application components with singletons in their implementation to expose this information so the singletons can be initialized explicitly. Likewise, pre-initialization makes it hard to compose applications using components that are configured dynamically using the Component Configurator pattern (75).
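A minimal sketch of this variant follows; each singleton is created eagerly in main() before any threads are spawned, so no instance()-level locking is required:

int main (int argc, char *argv[]) {
  // Force ‘just once’ initialization while the
  // program is still single-threaded.
  Singleton::instance ();

  // … only now spawn worker threads and
  // run the rest of the application …
  return 0;
}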
ACE [Sch97]. The Double-Checked Locking Optimization pattern is used extensively throughout the ACE framework. To reduce code duplication, ACE defines a reusable adapter template called ACE_Singleton that is similar to the one shown in the Variants section and is used to transform ‘normal’ classes into singletons. Although singletons are not the only use of the Double-Checked Locking Optimization pattern in the ACE framework, they are a common example that demonstrates the utility of the pattern.
Sequent Dynix/PTX. The Double-Checked Locking Optimization pattern is used in the Sequent Dynix/PTX operating system.
POSIX and Linux. The Double-Checked Locking Optimization pattern can be used to implement POSIX ‘once’ variables [IEEE96], which ensure that functions are invoked just once in a program. This pattern has been used in the LinuxThreads pthread_once() implementation to ensure its function-pointer parameter init_routine() is called only once—on its first call—and not subsequently.
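For reference, a minimal usage sketch of the POSIX ‘once’ facility follows; pthread_once() guarantees that init_routine() runs exactly once even when many threads call it concurrently:

#include <pthread.h>

static pthread_once_t once_control = PTHREAD_ONCE_INIT;

static void init_routine (void) {
  // Executed just once, on the first call
  // to pthread_once() with <once_control>.
}

void worker (void) {
  pthread_once (&once_control, init_routine);
  // … proceed, knowing initialization has completed …
}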
Andrew Birrell describes the use of the Double-Checked Locking Optimization pattern in [Bir91]. Birrell refers to the first check of the flag as a ‘lock hint’.
The Solaris 2.x documentation for pthread_key_create(3T), which is shown in the Thread-Specific Storage pattern (475) Known Uses section, illustrates how to use the Double-Checked Locking Optimization to initialize thread-specific data.
There are two benefits of using the Double-Checked Locking Optimization pattern:
Minimized locking overhead. By performing two first-time-in flag checks, the Double-Checked Locking Optimization pattern minimizes overhead for the common case. After the flag is set the first check ensures that subsequent accesses require no further locking.
Prevents race conditions. The second check of the first-time-in flag ensures that the critical section is executed just once.
There are three liabilities of using the Double-Checked Locking Optimization pattern that can arise if the pattern is used in software that is ported to certain types of operating system, hardware, or compiler/linker platforms. Because this pattern is applicable to a large class of platforms, however, we outline techniques for overcoming these limitations.
Non-atomic pointer or integral assignment semantics. If an instance_ pointer is used as the flag in a singleton implementation, all bits of the singleton instance_ pointer must be read and written atomically in a single operation. If the write to memory after the call to new is not atomic, other threads may try to read an invalid pointer. This can result in sporadic illegal memory accesses.
These scenarios are possible on systems where memory addresses straddle word alignment boundaries, such as 32-bit pointers used on a computer with a 16-bit word bus, which requires two fetches from memory for each pointer access. In this case it may be necessary to use a separate, word-aligned integral flag—assuming that the hardware supports atomic word-based reads and writes—rather than using an instance_ pointer.
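A minimal sketch of this workaround, assuming the platform supports atomic word-sized reads and writes; note that the multi-processor reordering issues discussed next still apply:

class Singleton {
public:
  static Singleton *instance () {
    // Use a word-aligned integral flag rather than
    // <instance_> as the first-time-in flag.
    if (initialized_ == 0) {
      Guard<Thread_Mutex> guard (singleton_lock_);
      if (initialized_ == 0) {
        instance_ = new Singleton;
        initialized_ = 1; // Set only after <instance_> is valid.
      }
    }
    return instance_;
  }

private:
  static int initialized_; // Word-aligned first-time-in flag.
  static Singleton *instance_;
  static Thread_Mutex singleton_lock_;
};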
Multi-processor cache coherency. Certain multi-processor platforms, such as the COMPAQ Alpha and Intel Itanium, perform aggressive memory caching optimizations in which read and write operations can execute ‘out of order’ across multiple CPU caches. On these platforms, it may not be possible to use the Double-Checked Locking Optimization pattern without further modifications because CPU cache lines will not be flushed properly if shared data is accessed without locks held.
To use the Double-Checked Locking Optimization pattern correctly on these types of hardware platforms, CPU-specific instructions, such as memory barriers to flush cache lines, must be inserted into the Double-Checked Locking Optimization implementation. Note that a serendipitous side-effect of using the template adapter variation of the Double-Checked Locking Optimization pattern is that it centralizes the placement of these CPU-specific cache instructions.
For example, a memory barrier instruction can be located within the instance() method of the Singleton template adapter class:
template <class TYPE>
TYPE *Singleton<TYPE>::instance () {
  TYPE *tmp = instance_;
#if defined (ALPHA_MP)
  // Insert the CPU-specific memory barrier instruction
  // to synchronize the cache lines on multi-processor.
  asm ("mb");
#endif /* ALPHA_MP */
  // First check.
  if (tmp == 0) {
    // Scoped Locking acquires and releases lock.
    Guard<Thread_Mutex> guard (singleton_lock_);

    // Re-read <instance_> now that the lock is held.
    tmp = instance_;

    // Double check.
    if (tmp == 0) {
      tmp = new TYPE;
#if defined (ALPHA_MP)
      // Insert a second CPU-specific memory
      // barrier instruction.
      asm ("mb");
#endif /* ALPHA_MP */
      instance_ = tmp;
    }
  }
  return tmp;
}
As long as the Singleton template adapter is used uniformly, it is straightforward to localize the placement of CPU-specific code without affecting applications. Conversely, if the Double-Checked Locking Optimization pattern is hand-crafted into singletons at each point of use much more effort is required to add this CPU-specific code.
Unfortunately, the need for CPU-specific code in implementations of the Double-Checked Locking Optimization pattern makes this pattern inapplicable for Java applications. Java’s bytecodes are designed to be cross-platform and therefore its JVMs lack a memory barrier instruction that can resolve the problem outlined in this liability.
Additional mutex usage. Regardless of whether a singleton is allocated on-demand, some type of lock, such as the Thread_Mutex used in our examples, is allocated and retained for the lifetime of the program. One technique for minimizing this overhead is to pre-allocate a singleton lock within an object manager [LGS99] and use this lock to serialize all singleton initialization. Although this may increase lock contention, it may not affect program performance because each singleton will most likely acquire and release the lock only once, when it is initialized.
The Double-Checked Locking Optimization pattern is a thread-safe variant of the Lazy Evaluation pattern [Mey98] [Beck97]. The latter pattern is often used in programming languages, such as C, that lack constructors, to ensure that components are initialized before their state is accessed.
For example the following C code initializes a stack:
static const int STACK_SIZE = 1000;
static Type *stack_;
static int top_;

void push (Type *item) {
  // First-time-in check.
  if (stack_ == 0) {
    // Allocate the array, which implicitly
    // indicates that initialization was performed.
    stack_ = malloc (STACK_SIZE * sizeof (Type));
    top_ = 0;
  }
  stack_[top_++] = *item;
  // …
}
The first time that push() is called, stack_ is 0, which triggers its implicit initialization via malloc().
The co-author of the original version [PLoPD3] of this pattern was Tim Harrison. Thanks to John Basrai, James Coplien, Ralph Johnson, Jaco van der Merwe, Duane Murphy, Paul McKenney, Bill Pugh, and Siva Vaddepuri for their suggestions and comments on the Double-Checked Locking Optimization pattern.
1. The Scoped Locking idiom is a specialization of Stroustrup’s ‘Resource-Acquisition-is-Initialization’ idiom [Str97]. We include this idiom here to make the book self-contained and to illustrate how Stroustrup’s idiom can be applied to concurrent programs.
2. See the Consequences Section of the Monitor Object pattern (399) for a more detailed discussion of the nested monitor lockout problem.