Chapter 8. Threads and Collection Classes

package javathreads.examples.ch08.example1;

import java.util.*;

public class CharacterEventHandler {
    private Vector listeners = new Vector( );

    public void addCharacterListener(CharacterListener cl) {
        listeners.add(cl);
    }

    public void removeCharacterListener(CharacterListener cl) {
        listeners.remove(cl);
    }

    public void fireNewCharacter(CharacterSource source, int c) {
        CharacterEvent ce = new CharacterEvent(source, c);
        CharacterListener[] cl = (CharacterListener[] )
                                  listeners.toArray(new CharacterListener[0]);
        for (int i = 0; i < cl.length; i++)
            cl[i].newCharacter(ce);
    }
}

In this case, using a vector is sufficient for our purposes. If multiple threads call methods of this class at the same time, there is no conflict. Because the listeners collection is threadsafe, we can call its add(), remove(), and toArray() methods at the same time without corrupting the internal state of the Vector object. Strictly speaking, there is a race condition here in our use of the toArray() method; we’ll talk about that a little more in the next section. But the point is that none of the methods on the vector see data in an inconsistent state because the Vector class itself is threadsafe.

A second option would be to use a thread-unsafe class (e.g., the ArrayList class) and manage the synchronization explicitly:

package javathreads.examples.ch08.example2;
...
public class CharacterEventHandler {
    private ArrayList listeners = new ArrayList( );
    public synchronized void addCharacterListener(CharacterListener cl) {
        ...
    }
    public synchronized void removeCharacterListener(CharacterListener cl) {
        ...
    }
    public synchronized void fireNewCharacter(CharacterSource source, int c) {
        ...
    }
}

Or we could have synchronized the class like this:

package javathreads.examples.ch08.example3;
...
public class CharacterEventHandler {
    private ArrayList listeners = new ArrayList( );

    public void addCharacterListener(CharacterListener cl) {
        synchronized(listeners) {
            listeners.add(cl);
        }
    }

    public void removeCharacterListener(CharacterListener cl) {
        synchronized(listeners) {
            listeners.remove(cl);
        }
    }
    public void fireNewCharacter(CharacterSource source, int c) {
        CharacterEvent ce = new CharacterEvent(source, c);
        CharacterListener[] cl;
        synchronized(listeners) {
            cl = (CharacterListener[])
                                  listeners.toArray(new CharacterListener[0]);
        }
        for (int i = 0; i < cl.length; i++)
            cl[i].newCharacter(ce);
    }
}

In this example, it doesn’t matter whether we synchronize on the collection object or the event handler object (this); either one ensures that two threads are not simultaneously calling methods of the ArrayList class.

Our third option is to use a synchronized version of the thread-unsafe collection class. Most thread-unsafe collection classes have a synchronized counterpart that is threadsafe. The threadsafe collections are constructed by calling one of these static methods of the Collections class:

Set s = Collections.synchronizedSet(new HashSet(...));
Set s = Collections.synchronizedSet(new LinkedHashSet(...));
SortedSet s = Collections.synchronizedSortedSet(new TreeSet(...));
Set s = Collections.synchronizedSet(EnumSet.noneOf(obj.class));
Map m = Collections.synchronizedMap(new HashMap(...));
Map m = Collections.synchronizedMap(new LinkedHashMap(...));
SortedMap m = Collections.synchronizedSortedMap(new TreeMap(...));
Map m = Collections.synchronizedMap(new WeakHashMap(...));
Map m = Collections.synchronizedMap(new IdentityHashMap(...));
Map m = Collections.synchronizedMap(new EnumMap(...));
List list = Collections.synchronizedList(new ArrayList(...));
List list = Collections.synchronizedList(new LinkedList(...));

Any of these options protects access to the data held in the collection. This is accomplished by wrapping the collection in an object that synchronizes every method of the collection interface; the wrapper is not designed as an optimally synchronized class. Also note that queues are not supported: the Collections class supplies wrapper classes only for the Set, Map, and List interfaces (and their sorted variants). This is not a problem in most cases, since the majority of the queue implementations are already threadsafe (and optimally so).
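As a minimal sketch of how such a wrapper might be used, consider the class below. The class name and its methods are hypothetical and are not part of the chapter's online examples. Single calls such as add() rely on the wrapper's own lock; a traversal spans many calls, so the caller holds the wrapper's lock for the whole loop, as the documentation of the synchronized wrappers requires.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; not one of the chapter's online examples.
class SynchronizedListSketch {
    // Every method called through the wrapper locks the wrapper object itself.
    private final List<String> names =
            Collections.synchronizedList(new ArrayList<String>());

    void add(String name) {
        names.add(name);   // single operation: the wrapper's lock is enough
    }

    // A traversal is many operations, so we lock the wrapper around the loop.
    String joinAll() {
        StringBuilder sb = new StringBuilder();
        synchronized (names) {
            for (String n : names) {
                sb.append(n).append(' ');
            }
        }
        return sb.toString().trim();
    }
}
```

Locking on the wrapper (rather than on some other object) matters here: it is the same lock that the wrapper's own methods acquire.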

Complex Synchronization

A more complex case arises when you need to perform multiple operations atomically on the data held in the collection. In the previous section, we were able to use simple synchronization because the methods that needed to access the data in the collection performed only a single operation. The addCharacterListener() method has only a single statement that uses the listeners vector, so it doesn’t matter if the data changes after the addCharacterListener() method calls the listeners.add() method. As a result, we could rely on the container to provide the synchronization.

We alluded to a race condition in the fireNewCharacter() method. After we call the listeners.toArray() method, we cycle through the listeners to call each of them. It’s entirely possible that another thread will call the removeCharacterListener() method while we’re looping through the array. That won’t corrupt the array or the listeners vector, but in some algorithms, it could be a problem: we’d be operating on data that has been removed from the vector. In our program, that’s okay: we have a benign race condition. In other programs, that may not necessarily be the case.

Suppose we want to keep track of all the characters that players typed correctly (or incorrectly). We could do that with the following:

package javathreads.examples.ch08.example4;

import java.util.*;
import javax.swing.*;
import javax.swing.table.*;

public class CharCounter {
    public HashMap correctChars = new HashMap( );
    public HashMap incorrectChars = new HashMap( );
    private AbstractTableModel atm;

    public void correctChar(int c) {
        synchronized(correctChars) {
            Integer key = new Integer(c);
            Integer num = (Integer) correctChars.get(key);
            if (num == null)
                correctChars.put(key, new Integer(1));
            else correctChars.put(key, new Integer(num.intValue( ) +1));
            if (atm != null)
                atm.fireTableDataChanged( );
        }
    }

    public int getCorrectNum(int c) {
        synchronized(correctChars) {
            Integer key = new Integer(c);
            Integer num = (Integer) correctChars.get(key);
            if (num == null)
                return 0;
            return num.intValue( );
        }
    }

    public void incorrectChar(int c) {
        synchronized(incorrectChars) {
            Integer key = new Integer(c);
            Integer num = (Integer) incorrectChars.get(key);
            if (num == null)
                incorrectChars.put(key, new Integer(-1));
            else incorrectChars.put(key, new Integer(num.intValue( ) -1));
            if (atm != null)
                atm.fireTableDataChanged( );
        }
    }

    public int getIncorrectNum(int c) {
        synchronized(incorrectChars) {
            Integer key = new Integer(c);
            Integer num = (Integer) incorrectChars.get(key);
            if (num == null)
                return 0;
            return num.intValue( );
        }
    }

    public void addModel(AbstractTableModel atm) {
        this.atm = atm;
    }
}

Here we use thread-unsafe collections to hold the data and explicitly synchronize access around the code that uses the collections. It would be insufficient to use Hashtable collections in this code without also synchronizing as we did earlier. Although retrieving a value from a hashtable is threadsafe, and replacing an element in a hashtable is also threadsafe, the overall operation is not threadsafe: both collection operations must be atomic for the algorithm to succeed. Otherwise, two threads could simultaneously retrieve the stored value, increment it, and store it; the net result would be a score that is one less than it should be.

The moral of the story is that using a threadsafe collection does not guarantee the correctness of your program. Because this example requires explicit synchronization anyway, we were able to use a thread-unsafe collection (although, as we’ll see in Chapter 14, you’re unlikely to see much difference if you use a threadsafe collection instead).
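To make the point concrete, here is a small sketch (the class and method names are our own invention, not part of the online examples) that keeps counts in a Hashtable. Each call to get() or put() is threadsafe by itself, but the increment is a read followed by a write, so we still wrap the pair in a synchronized block on the table. The runDemo() helper is only a hypothetical way to exercise the class from several threads.

```java
import java.util.Hashtable;
import java.util.Map;

// Hypothetical demo class; not one of the chapter's online examples.
class CompoundUpdateSketch {
    private final Map<Integer, Integer> counts = new Hashtable<Integer, Integer>();

    // get() and put() are individually threadsafe, but the pair is not atomic;
    // the synchronized block turns the read-modify-write into a single unit.
    void increment(int key) {
        synchronized (counts) {
            Integer old = counts.get(key);
            counts.put(key, old == null ? 1 : old + 1);
        }
    }

    int get(int key) {
        Integer n = counts.get(key);
        return n == null ? 0 : n;
    }

    // Starts several threads that all increment the same key, waits for them,
    // and returns the final count.
    static int runDemo(int threads, final int perThread) {
        final CompoundUpdateSketch c = new CompoundUpdateSketch();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < perThread; j++) {
                        c.increment(42);
                    }
                }
            });
            ts[i].start();
        }
        try {
            for (Thread t : ts) {
                t.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return c.get(42);
    }
}
```

With the synchronized block in place, no increments are lost. If the block is removed, two threads can read the same old value and one update can overwrite the other, even though the Hashtable itself is never corrupted.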

Iterators and Enumerations

Many situations call for using each element of a collection. Such is the case in our example. We called the toArray() method, which returns an array containing every element in the vector. The Vector and Hashtable classes also have methods that return a java.util.Enumeration object that contains every element in the collection. More generally, all collection classes implement one or more methods that return a java.util.Iterator object. The iterator also contains every element in the collection.

Each of these techniques presents special synchronization concerns. We’ve already seen that looping through the array returned by the toArray() method can lead to a situation where we’re accessing an element in the array that no longer appears in the collection. That may or may not be a problem for your program; if it is a problem, the solution is to synchronize access around the loop that uses the array.

Enumeration objects are difficult to use without explicit synchronization. The enumeration keeps state information about the collection; if the collection is modified while the enumeration is active, the enumeration may become confused. It may then fail in an unpredictable way, possibly through an unexpected runtime exception (e.g., a NullPointerException).

To use an enumeration of a collection that may also be used by multiple threads, you should synchronize on the collection object itself:

package javathreads.examples.ch08.example5;
...
    public void fireNewCharacter(CharacterSource source, int c) {
        CharacterEvent ce = new CharacterEvent(source, c);
        Enumeration e;
        synchronized(listeners) {
            e = listeners.elements( );
            while (e.hasMoreElements( )) {
                ((CharacterListener) e.nextElement( )).newCharacter(ce);
            }
        }
    }
}

You could synchronize the method instead, as long as your collection is not used in any unsynchronized method. The point is that the enumeration and all uses of the collection must be locked by the same synchronization object.

Iterators behave somewhat differently. If the underlying collection of an iterator is modified while the iterator is active, the next access to the iterator throws a ConcurrentModificationException, which is also a runtime exception. Unlike enumerations, if the iterator fails, the underlying collection can still be used. The way in which iterators fail immediately after a modify operation is called “fail-fast.”

The safest way to use an iterator is to make sure its use is synchronized by its underlying collection (just as we did with the enumeration)—or to make sure that it and the collection are protected by the same synchronization lock.

You can’t rely upon the fail-fast nature of iterators. Iterators make a best effort at determining when the underlying collection has changed, but in the absence of synchronization, it’s impossible to predict when the failure occurs. Once a failure has occurred, the iterator is not useful for further processing. Therefore, you’re left with a situation where some elements of the collection have been processed and others have not.
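The fail-fast behavior is easiest to observe in a single thread, where it is deterministic. The following sketch (a hypothetical class of our own) modifies an ArrayList while a for-each loop is iterating over it; the next access to the iterator throws a ConcurrentModificationException, leaving the remaining elements unprocessed. With multiple unsynchronized threads, the same failure can occur, but its timing is unpredictable.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Hypothetical demo class; not one of the chapter's online examples.
class FailFastSketch {
    // Returns how many elements were processed before the iterator failed,
    // or -1 if the loop ran to completion.
    static int processUntilFailure() {
        List<String> list = new ArrayList<String>();
        list.add("a");
        list.add("b");
        list.add("c");
        int processed = 0;
        try {
            for (String s : list) {
                processed++;
                if (s.equals("a")) {
                    list.remove(s);   // structural change behind the iterator's back
                }
            }
        } catch (ConcurrentModificationException e) {
            return processed;         // "b" and "c" were never visited
        }
        return -1;
    }
}
```

The list itself remains usable after the exception; only the iterator is dead.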

Two classes, CopyOnWriteArrayList and CopyOnWriteArraySet, provide special iteration semantics. These classes copy the underlying data when necessary so that an iterator operates on a snapshot of the data from the time the iterator was created. Modifying the collection while an iterator is active creates a new copy of the data for the collection, leaving the iterator with the old one.

This is an expensive operation, both in terms of time and memory usage. However, it ensures that iterators can be used from unsynchronized code because the iterators end up operating on old copies of the data. So, the iterators never throw a concurrent modification exception.

These classes are designed for cases where modifications to the collection are rare and the iterator of the collection is used frequently by multiple threads. This allows the iterators to be unsynchronized and still be threadsafe; as long as the updates are rare enough, this yields better overall performance. Note, however, that race conditions are still possible with this technique; it’s essentially the same type of operation as we saw earlier with the toArray() method. The difference is when the copying occurs: when you call the toArray() method, a copy of the collection is made at that time. With the copy-on-write classes, the copy is made whenever the collection is modified.
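The snapshot semantics can be seen in a few lines. In this sketch (the class name is hypothetical), an iterator is created, the list is then modified, and the iterator still sees only the elements that existed when it was created.

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not one of the chapter's online examples.
class SnapshotIteratorSketch {
    // Returns { number of elements seen by the old iterator, current list size }.
    static int[] demo() {
        CopyOnWriteArrayList<String> list = new CopyOnWriteArrayList<String>();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator();   // snapshot of ["a", "b"]
        list.add("c");                           // the write goes to a fresh copy
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return new int[] { seen, list.size() };
    }
}
```

No exception is thrown, but the iterator never sees the new element; whether that stale view is acceptable depends on the program.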

Thread-Aware Classes

Many collection classes are what we would term “thread-aware.” They have many internal and subtle features that were designed specifically for threads:

Some collections have an implementation that minimizes the need for synchronization by segmenting the collection. It is possible for threads to modify the collection simultaneously, without any synchronization, when they are operating on different segments.
Some provide special services—such as iterator handling—that are specifically designed for multithreaded environments. The main reason for copy-on-write iterators is to balance the performance issues of many simultaneous threads iterating through the collection against a few updates to the collection.
Interfaces have been enhanced to handle issues related to threads better. For example, the concurrent hashmap has the ability to add a key only if the key is not in the map; this simple enhancement removes the need for explicit synchronization for parallel writes of new elements.
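As an illustration of the last point, the ConcurrentMap interface provides the putIfAbsent() method. The sketch below uses it to register a value for a name only once, without any explicit synchronization; the class, method, and parameter names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical demo class; not one of the chapter's online examples.
class PutIfAbsentSketch {
    private final ConcurrentMap<String, Integer> ids =
            new ConcurrentHashMap<String, Integer>();

    // Stores proposedId for name only if name has no mapping yet.
    // Returns whichever id ends up associated with the name.
    int register(String name, int proposedId) {
        // The check and the insertion happen as one atomic step.
        Integer existing = ids.putIfAbsent(name, proposedId);
        return existing == null ? proposedId : existing;
    }
}
```

Written with separate containsKey() and put() calls, the same logic would need a lock around both calls to avoid two threads registering the same name.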

The Producer/Consumer Pattern

One of the more common patterns in threaded programming is the producer/consumer pattern. The idea is to process data asynchronously by partitioning requests among different groups of threads. The producer is a thread (or group of threads) that generates requests (or data) to be processed. The consumer is a thread (or group of threads) that takes those requests (or data) and acts upon them. This pattern provides a clean separation that allows for better thread design and makes development and debugging easier. This pattern is shown in Figure 8-1.

Figure 8-1. The producer/consumer pattern

The producer/consumer pattern is common for threaded programs because it is easy to make threadsafe. We just need to provide a safe way to pass data from the producer to the consumer. Data needs to be synchronized only during the small period of time when it is being passed between producer and consumer. We can use simple synchronization since the acts of inserting and removing from the collection are single operations. Therefore, any threadsafe vector, list, or queue can be used.

The queue-based collection classes added to J2SE 5.0 were specifically designed for this model. The queue data type is perfect to use for this pattern since it has the simple semantics of adding and removing a single element (with an optional ordering of the requests). Furthermore, blocking queues provide thread-control functionality: this allows you to focus on the functionality of your program while the queue takes care of thread and space management issues. Of course, if you need control over such issues, you can use a nonblocking queue and use your own explicit synchronization and notification.

Here’s a simple producer that uses a blocking queue:

package javathreads.examples.ch08.example6;

import java.util.*;
import java.util.concurrent.*;

public class FibonacciProducer implements Runnable {
    private Thread thr;
    private BlockingQueue<Integer> queue;

    public FibonacciProducer(BlockingQueue<Integer> q) {
        queue = q;
        thr = new Thread(this);
        thr.start( );
    }

    public void run( ) {
        try {
            for(int x=0;;x++) {
                Thread.sleep(1000);
                queue.put(new Integer(x));
                System.out.println("Produced request " + x);
            }
        } catch (InterruptedException ex) {
        }
    }
}

The producer is implemented to run in a separate thread; it uses the queue to store requests to be processed. We’re using a blocking queue because we want the queue to handle the case where the producer gets too far ahead of the consumer. When that happens, we want the producer to block (so that it does not produce any more requests until the consumer catches up).

Here’s the consumer:

package javathreads.examples.ch08.example6;

import java.util.concurrent.*;

public class FibonacciConsumer implements Runnable {
    private Fibonacci fib = new Fibonacci( );
    private Thread thr;
    private BlockingQueue<Integer> queue;

    public FibonacciConsumer(BlockingQueue<Integer> q) {
        queue = q;
        thr = new Thread(this);
        thr.start( );
    }

    public void run( ) {
        int request, result;
        try {
            while (true) {
                request = queue.take( ).intValue( );
                result = fib.calculateWithCache(request);
                System.out.println(
                        "Calculated result of " + result + " from " + request);
            }
        } catch (InterruptedException ex) {
        }
    }
}

The consumer also runs in its own thread. It blocks until a request is in the queue, at which point it calculates a Fibonacci number based on the request. The actual calculation is performed by the Fibonacci class available in the online examples (along with a testing program).

Notice that the producer and consumer threads are decoupled: the producer never directly calls the consumer (and vice versa). This allows us to interchange different producers without affecting the consumer. It also allows us to have multiple producers serviced by a single consumer, or multiple consumers servicing a single producer. More generally, we can vary the number of either based on performance needs or user requirements.
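Here is a self-contained sketch of that flexibility, separate from the Fibonacci example: one producer feeds a bounded queue that is drained by a configurable number of consumer threads. The class name, the queue capacity of 4, and the STOP sentinel used to shut the consumers down are all assumptions made for this illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo class; not one of the chapter's online examples.
class MultiConsumerSketch {
    private static final int STOP = -1;   // assumed sentinel: tells a consumer to exit

    // The calling thread produces requests 0..count-1; nConsumers threads
    // consume them. Returns the sum of all requests consumed.
    static long run(int count, int nConsumers) {
        final BlockingQueue<Integer> queue = new ArrayBlockingQueue<Integer>(4);
        final AtomicLong sum = new AtomicLong();
        Thread[] consumers = new Thread[nConsumers];
        for (int i = 0; i < nConsumers; i++) {
            consumers[i] = new Thread(new Runnable() {
                public void run() {
                    try {
                        while (true) {
                            int request = queue.take();   // blocks while empty
                            if (request == STOP) {
                                return;
                            }
                            sum.addAndGet(request);
                        }
                    } catch (InterruptedException ex) {
                        Thread.currentThread().interrupt();
                    }
                }
            });
            consumers[i].start();
        }
        try {
            for (int x = 0; x < count; x++) {
                queue.put(x);                             // blocks while full
            }
            for (int i = 0; i < nConsumers; i++) {
                queue.put(STOP);                          // one sentinel per consumer
            }
            for (Thread t : consumers) {
                t.join();
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return sum.get();
    }
}
```

Changing the number of consumers requires no change to the producing code; the queue is the only point of contact between the two sides.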

The queue has also hidden all of the interesting thread code. When the queue is full, the producer blocks: it waits on a condition variable. Later, when the consumer takes an element from the queue, it notifies the waiting producer. A similar situation arises when the consumer calls the take() method on an empty queue. You could write all the condition variable code to handle this, but it’s far easier to allow the queue to do it for you.
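To give a feel for what the queue is hiding, here is a rough sketch of a bounded buffer written by hand with a lock and two condition variables. It is an illustration only, not the implementation used by the library's blocking queues, and the class name and the demoSum() helper are hypothetical.

```java
import java.util.LinkedList;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; not one of the chapter's online examples.
class BoundedBufferSketch<T> {
    private final LinkedList<T> items = new LinkedList<T>();
    private final int capacity;
    private final Lock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    BoundedBufferSketch(int capacity) {
        this.capacity = capacity;
    }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) {
                notFull.await();          // producer blocks while the buffer is full
            }
            items.addLast(item);
            notEmpty.signal();            // wake one waiting consumer
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();         // consumer blocks while the buffer is empty
            }
            T item = items.removeFirst();
            notFull.signal();             // wake one waiting producer
            return item;
        } finally {
            lock.unlock();
        }
    }

    // Pushes 0..n-1 through a buffer of capacity 2 and returns their sum.
    static int demoSum(final int n) {
        final BoundedBufferSketch<Integer> buf = new BoundedBufferSketch<Integer>(2);
        Thread producer = new Thread(new Runnable() {
            public void run() {
                try {
                    for (int i = 0; i < n; i++) {
                        buf.put(i);
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < n; i++) {
                sum += buf.take();
            }
            producer.join();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
        return sum;
    }
}
```

Every detail here (the while loops around await(), the finally blocks, signaling the right condition) is an opportunity for a subtle bug, which is the argument for letting a library queue do the work.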

We chose to calculate a Fibonacci number in our test program because we used a recursive algorithm that takes an increasingly long time to compute. It’s interesting to watch how the producer and consumer interact in this case. In the beginning, the consumer is blocked a lot of the time because it can calculate the Fibonacci number in less than one second (the time period between requests from the producer). Later, the producer spends most of its time blocked because it has overwhelmed the consumer and filled the queue.

If you have a multiprocessor machine, you can run the example with multiple consumer threads, but eventually the result is the same: the calculations take too long for the consumers to keep up.

Using the Collection Classes

So, which are the best collections to use? Obviously, no single answer fits all cases. However, here are some general suggestions. By adhering to these suggestions, we can narrow the choice of which collection to use.

When working with collection classes, work through interfaces
As with all Java programming, interfaces isolate implementation details. By using interfaces, the programmer can easily refactor a program to use a different collection implementation by changing only the initialization code.
There is little performance benefit in using a nonsynchronized collection
This may be surprising to many developers; for an understanding of the performance issues around lock acquisition, see Chapter 14. In brief, performance issues with lock acquisitions occur only when there is contention for the lock. However, a collection that can safely be left unsynchronized is one that threads do not access at the same time, so synchronizing it would add no contention. If there is contention, the race conditions that result from leaving the collection unsynchronized are a bigger problem than performance.
For algorithms with a lot of contention, consider using the concurrent collections
The set, hashmap, and list collections that were added in J2SE 5.0 are highly optimized. If a program’s algorithm fits into one of these interfaces, consider choosing a J2SE 5.0 collection over a synchronized version of a JDK 1.2 collection. The concurrent collections are much better optimized for multithreaded access.
For producer/consumer-based programs, consider using a queue as the collection
Queues are best for the producer/consumer model for many reasons. First, queues provide an ordering of requests, preventing data starvation. Second, queues are highly optimized, having minimal synchronization, atomic accesses, and even safe parallel access in many cases. With these collections, a huge number of threads can work in parallel with little bottlenecking at the queue’s access points.
When possible, try to minimize the use of explicit synchronization
Iterators and other support methods that require traversal of an entire collection may need more synchronization than the collection provides alone. This can be a problem when many threads are involved.
Limit your use of iterators from the copy-on-write collections
First, use these classes only when the number of elements in the collection is small. This is because of the time and size requirements of the copy-on-write operation. Second, your program must not require that the collection have the most up-to-date information. The iterator contains only the information of the collection at the time that it is created.
Consider using multiple collections
While some of these collections have minimal synchronization, these synchronization periods can still be an issue when many threads are involved. Consider having an algorithm that uses segmented collections instead of a generic implementation in which all threads use the same collection.
There is little difference between a set and a map
Theoretically, a set and a map are different in a number of ways, but in terms of implementation, there is little difference. Many of the set collections are just implemented by using the map collection. This means that the choice is not actually a choice: an item stored in a set is merely stored as a key in a map.
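The last suggestion is visible directly in the API. As a sketch, and assuming a Java release later than J2SE 5.0 (the Collections.newSetFromMap() method was added in Java 6), a concurrent set can be obtained by wrapping a concurrent map. The class and method names below are hypothetical.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class; not one of the chapter's online examples.
class SetFromMapSketch {
    static Set<String> newConcurrentSet() {
        // Elements are stored as keys of the backing map;
        // the Boolean values are only markers.
        return Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
    }
}
```

The resulting set inherits its concurrency behavior from the backing map, so choosing a set here really means choosing a map.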

Summary

In this chapter, we have examined how threads interact with Java’s collection classes. We’ve seen the synchronization requirements imposed by different classes and how to handle those requirements effectively. We’ve also examined how these classes can be used for the common design pattern known as the producer/consumer pattern.

Example Classes

Here are the class names and Ant targets for the examples in this chapter. The online examples also include test code for the producer/consumer pattern.

Description	Main Java class	Ant target
Swing Type Tester	`javathreads.examples.ch08.example1.SwingTypeTester`	ch8-ex1
Swing Type Tester (uses array lists)	`javathreads.examples.ch08.example2.SwingTypeTester`	ch8-ex2
Swing Type Tester (uses synchronized blocks)	`javathreads.examples.ch08.example3.SwingTypeTester`	ch8-ex3
Swing Type Tester (counts character successes/failures)	`javathreads.examples.ch08.example4.SwingTypeTester`	ch8-ex4
Swing Type Tester (uses enumeration)	`javathreads.examples.ch08.example5.SwingTypeTester`	ch8-ex5
Producer/Consumer Model	`javathreads.examples.ch08.example6.FibonacciTest nConsumers`	ch8-ex6

In the Ant script, the number of consumer threads is defined by this property:

<property name="nConsumers" value="1"/>