Programming Clojure, Third Edition

Concurrency, Parallelism, and Locking

A concurrent program models more than one thing happening simultaneously. A parallel program takes an operation that could run sequentially and breaks it into separate pieces that can execute concurrently, speeding up overall execution.

There are many reasons to write concurrent or parallel programs:

For decades, performance improvements have come from packing more power into cores. Now, and for the near future, performance improvements will come from using more cores. Our hardware is itself more concurrent than ever, and systems must be concurrent to take advantage of this power.
Expensive computations may need to execute in parallel on multiple cores (or multiple boxes) to complete in a timely manner.
Tasks that are blocked waiting for a resource should stand down and let other tasks use available processors.
User interfaces need to remain responsive while performing long-running tasks.
Operations that are logically independent are easier to implement if the platform can recognize and take advantage of their independence.

Concurrency makes it glaringly obvious that more than one observer (e.g., thread) may be looking at your data. This is a big problem for languages that complect^[29] value and identity. Such languages treat a piece of data as a bank ledger with only one line. Each new operation erases history, potentially corrupting the work of every other thread on the system.

While concurrency makes the challenges more obvious, it’s a mistake to assume that multiple observers come into play only with concurrency. If your program ever has two variables that refer to the same data, those variables are different observers. If your program allows mutability at all, then you must think carefully about state.
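To make the "two variables, two observers" point concrete, here is a small single-threaded sketch. The names ledger-a, ledger-b, and the entries-* vars are made up for illustration. The first half borrows a mutable java.util.ArrayList from Java interop; the second half uses an ordinary immutable Clojure vector.

```clojure
;; Two names, one mutable Java list: both observers see the change.
(def ledger-a (java.util.ArrayList. [100 200]))
(def ledger-b ledger-a)               ; a second observer of the same object
(.add ledger-a 300)
(vec ledger-b)                        ; => [100 200 300]

;; Two names, one immutable vector: "changing" it produces a new value.
(def entries-a [100 200])
(def entries-b entries-a)
(def entries-c (conj entries-a 300))  ; entries-a and entries-b are untouched
entries-b                             ; => [100 200]
entries-c                             ; => [100 200 300]
```

No threads are involved, yet ledger-b changed without anyone touching it by name. With the vector, the only way to see the new entry is to ask for the new value.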

Languages built around mutable state tend to tackle the challenge with locking and defensive copying. Continuing the ledger analogy: the bank hires guards (locks) to supervise the activities of anybody using a ledger, and nobody is allowed to modify a ledger while anybody else is using it.

When the performance becomes really bad, the bank may even ask ledger readers to make their own private copies of the ledger so they can get out of the way and let transactions continue. These copies must still be supervised by the guards!

As irritating as this model sounds, it gets worse at the level of implementation detail. Choosing what and where to lock is a difficult task. If you get it wrong, all sorts of bad things can happen. Race conditions between threads can corrupt data. Deadlocks can stop an entire program from functioning at all. Java Concurrency in Practice [Goe06] covers these and other problems, plus their solutions, in detail. It’s a terrific book, but it’s difficult to read it and not ask yourself, “Is there another way?”
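For a feel of what the guard-based approach looks like, here is a minimal sketch written in Clojure with the core locking macro. The names counter, guard, and locked-inc! are hypothetical. The counter is a volatile, whose read-then-write update is not atomic on its own, so every writer must remember to take the same lock.

```clojure
;; A mutable cell with no protection of its own.
(def counter (volatile! 0))

;; The "guard": nothing in the language ties this lock to the counter.
(def guard (Object.))

(defn locked-inc!
  "Increments the counter while holding the guard lock."
  []
  (locking guard
    (vswap! counter inc)))

;; Eight threads, 1000 increments each.
(let [workers (doall (repeatedly 8 #(Thread. (fn [] (dotimes [_ 1000] (locked-inc!))))))]
  (run! #(.start ^Thread %) workers)
  (run! #(.join ^Thread %) workers))

@counter   ; => 8000
```

The result is correct only because every writer goes through locked-inc!. A single call to (vswap! counter inc) outside the lock would reintroduce the race, and nothing would warn you about it.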

Clojure’s model for state and identity solves these problems. The bulk of program code is functional. The small parts of the codebase that truly benefit from mutability are distinct and must explicitly select one of four reference models. Using these reference models, you can split your program into two layers:

A functional model that has no mutable state. Most of your code will normally be in this layer, which is easier to read, easier to test, and easier to parallelize.
Reference models for the parts of the application that you find more convenient to deal with using mutable state (despite its disadvantages).
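The two layers can be sketched in a few lines. This is an illustration only: record-deposit and account are invented names, and an atom is used simply because it is the shortest reference type to demonstrate.

```clojure
;; Functional layer: a pure function from one ledger value to the next.
(defn record-deposit [ledger amount]
  (-> ledger
      (update :balance + amount)
      (update :history conj amount)))

;; Reference layer: one identity whose value changes over time.
(def account (atom {:balance 0 :history []}))

;; Grab the current value before making any changes.
(def before @account)

(swap! account record-deposit 100)
(swap! account record-deposit 50)

before     ; => {:balance 0, :history []}
@account   ; => {:balance 150, :history [100 50]}
```

All of the logic lives in record-deposit, which can be tested without any reference type at all. The atom only decides which value the name account points to at a given moment, and the value captured in before is never erased.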

Let’s get started working with state in Clojure, using the most notorious of Clojure’s reference models: software transactional memory.