In a sequential program, that is, a program with only one goroutine, the steps of the program happen in the familiar execution order determined by the program logic. For instance, in a sequence of statements, the first one happens before the second one, and so on. In a program with two or more goroutines, the steps within each goroutine happen in the familiar order, but in general we don’t know whether an event x in one goroutine happens before an event y in another goroutine, or happens after it, or is simultaneous with it. When we cannot confidently say that one event happens before the other, then the events x and y are concurrent.
Consider a function that works correctly in a sequential program. That function is concurrency-safe if it continues to work correctly even when called concurrently, that is, from two or more goroutines with no additional synchronization. We can generalize this notion to a set of collaborating functions, such as the methods and operations of a particular type. A type is concurrency-safe if all its accessible methods and operations are concurrency-safe.
We can make a program concurrency-safe without making every concrete type in that program concurrency-safe. Indeed, concurrency-safe types are the exception rather than the rule, so you should access a variable concurrently only if the documentation for its type says that this is safe. We avoid concurrent access to most variables either by confining them to a single goroutine or by maintaining a higher-level invariant of mutual exclusion. We’ll explain these terms in this chapter.
In contrast, exported package-level functions are generally expected to be concurrency-safe. Since package-level variables cannot be confined to a single goroutine, functions that modify them must enforce mutual exclusion.
There are many reasons a function might not work when called concurrently, including deadlock, livelock, and resource starvation. We don’t have space to discuss all of them, so we’ll focus on the most important one, the race condition.
A race condition is a situation in which the program does not give the correct result for some interleavings of the operations of multiple goroutines. Race conditions are pernicious because they may remain latent in a program and appear infrequently, perhaps only under heavy load or when using certain compilers, platforms, or architectures. This makes them hard to reproduce and diagnose.
It is traditional to explain the seriousness of race conditions through the metaphor of financial loss, so we’ll consider a simple bank account program.
    // Package bank implements a bank with only one account.
    package bank

    var balance int

    func Deposit(amount int) { balance = balance + amount }

    func Balance() int { return balance }
(We could have written the body of the Deposit function as balance += amount, which is equivalent, but the longer form will simplify the explanation.)
For a program this trivial, we can see at a glance that any sequence of calls to Deposit and Balance will give the right answer, that is, Balance will report the sum of all amounts previously deposited.
However, if we call these functions not in a sequence but concurrently, Balance is no longer guaranteed to give the right answer. Consider the following two goroutines, which represent two transactions on a joint bank account:
    // Alice:
    go func() {
        bank.Deposit(200)                // A1
        fmt.Println("=", bank.Balance()) // A2
    }()

    // Bob:
    go bank.Deposit(100) // B
Alice deposits $200, then checks her balance, while Bob deposits $100.
Since the steps A1 and A2 occur concurrently with B, we cannot predict the order in which they happen. Intuitively, it might seem that there are only three possible orderings, which we’ll call “Alice first,” “Bob first,” and “Alice/Bob/Alice.” The following table shows the value of the balance variable after each step. The quoted strings represent the printed balance slips.
    Alice first            Bob first              Alice/Bob/Alice
              0                      0                        0
    A1      200            B        100            A1       200
    A2   "= 200"           A1       300            B        300
    B       300            A2    "= 300"           A2    "= 300"
In all cases the final balance is $300. The only variation is whether Alice’s balance slip includes Bob’s transaction or not, but the customers are satisfied either way.
But this intuition is wrong.
There is a fourth possible outcome, in which Bob’s deposit occurs in the middle of Alice’s deposit, after the balance has been read (balance + amount) but before it has been updated (balance = ...), causing Bob’s transaction to disappear. This is because Alice’s deposit operation A1 is really a sequence of two operations, a read and a write; call them A1r and A1w. Here’s the problematic interleaving:
    Data race
              0
    A1r       0        ... = balance + amount
    B       100
    A1w     200        balance = ...
    A2   "= 200"
After A1r, the expression balance + amount evaluates to 200, so this is the value written during A1w, despite the intervening deposit. The final balance is only $200. The bank is $100 richer at Bob’s expense.
This program contains a particular kind of race condition called a data race. A data race occurs whenever two goroutines access the same variable concurrently and at least one of the accesses is a write.
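Such a race is easy to reproduce. The program below is a self-contained, illustrative sketch, not part of the bank package: it repeats the racy Deposit inline and runs a thousand concurrent $1 deposits. The final balance is often less than $1000, and running it with go run -race reports the conflicting accesses.

    // A minimal sketch reproducing the lost-update race described above.
    // The racy Deposit is copied inline; the driver is illustrative only.
    package main

    import (
        "fmt"
        "sync"
    )

    var balance int

    func Deposit(amount int) { balance = balance + amount } // racy read-modify-write

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 1000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                Deposit(1)
            }()
        }
        wg.Wait()
        fmt.Println(balance) // often less than 1000; `go run -race` flags the race
    }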
Things get even messier if the data race involves a variable of a type that is larger than a single machine word, such as an interface, a string, or a slice. This code updates x concurrently to two slices of different lengths:
    var x []int
    go func() { x = make([]int, 10) }()
    go func() { x = make([]int, 1000000) }()
    x[999999] = 1 // NOTE: undefined behavior; memory corruption possible!
The value of x in the final statement is not defined; it could be nil, or a slice of length 10, or a slice of length 1,000,000. But recall that there are three parts to a slice: the pointer, the length, and the capacity. If the pointer comes from the first call to make and the length comes from the second, x would be a chimera, a slice whose nominal length is 1,000,000 but whose underlying array has only 10 elements. In this eventuality, storing to element 999,999 would clobber an arbitrary faraway memory location, with consequences that are impossible to predict and hard to debug and localize. This semantic minefield is called undefined behavior and is well known to C programmers; fortunately it is rarely as troublesome in Go as in C.
Even the notion that a concurrent program is an interleaving of several sequential programs is a false intuition. As we’ll see in Section 9.4, data races may have even stranger outcomes. Many programmers—even some very clever ones—will occasionally offer justifications for known data races in their programs: “the cost of mutual exclusion is too high,” “this logic is only for logging,” “I don’t mind if I drop some messages,” and so on. The absence of problems on a given compiler and platform may give them false confidence. A good rule of thumb is that there is no such thing as a benign data race. So how do we avoid data races in our programs?
We’ll repeat the definition, since it is so important: A data race occurs whenever two goroutines access the same variable concurrently and at least one of the accesses is a write. It follows from this definition that there are three ways to avoid a data race.
The first way is not to write the variable.
Consider the map below, which is lazily populated as each key is requested for the first time. If Icon is called sequentially, the program works fine, but if Icon is called concurrently, there is a data race accessing the map.
    var icons = make(map[string]image.Image)

    func loadIcon(name string) image.Image

    // NOTE: not concurrency-safe!
    func Icon(name string) image.Image {
        icon, ok := icons[name]
        if !ok {
            icon = loadIcon(name)
            icons[name] = icon
        }
        return icon
    }
If instead we initialize the map with all necessary entries before creating additional goroutines and never modify it again, then any number of goroutines may safely call Icon concurrently, since each only reads the map.
    var icons = map[string]image.Image{
        "spades.png":   loadIcon("spades.png"),
        "hearts.png":   loadIcon("hearts.png"),
        "diamonds.png": loadIcon("diamonds.png"),
        "clubs.png":    loadIcon("clubs.png"),
    }

    // Concurrency-safe.
    func Icon(name string) image.Image { return icons[name] }
In the example above, the icons variable is assigned during package initialization, which happens before the program’s main function starts running. Once initialized, icons is never modified. Data structures that are never modified or are immutable are inherently concurrency-safe and need no synchronization. But obviously we can’t use this approach if updates are essential, as with a bank account.
The second way to avoid a data race is to avoid accessing the variable from multiple goroutines. This is the approach taken by many of the programs in the previous chapter.
For example, the main goroutine in the concurrent web crawler (§8.6) is the sole goroutine that accesses the seen map, and the broadcaster goroutine in the chat server (§8.10) is the only goroutine that accesses the clients map. These variables are confined to a single goroutine.
Since other goroutines cannot access the variable directly, they must use a channel to send the confining goroutine a request to query or update the variable. This is what is meant by the Go mantra “Do not communicate by sharing memory; instead, share memory by communicating.”
A goroutine that brokers access to a confined variable using channel requests is called a monitor goroutine for that variable. For example, the broadcaster goroutine monitors access to the clients map.
Here’s the bank example rewritten with the balance variable confined to a monitor goroutine called teller:
    // Package bank provides a concurrency-safe bank with one account.
    package bank

    var deposits = make(chan int) // send amount to deposit
    var balances = make(chan int) // receive balance

    func Deposit(amount int) { deposits <- amount }
    func Balance() int       { return <-balances }

    func teller() {
        var balance int // balance is confined to teller goroutine
        for {
            select {
            case amount := <-deposits:
                balance += amount
            case balances <- balance:
            }
        }
    }

    func init() {
        go teller() // start the monitor goroutine
    }
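From a client’s point of view, Deposit and Balance may now be called freely from any number of goroutines. The test-style driver below is a hedged sketch of such use; it assumes the package above is available under the import path gopl.io/ch9/bank1 (the program named in Exercise 9.1) and is not part of the book’s code.

    // A hypothetical concurrent driver for the monitor-goroutine bank above.
    package bank_test

    import (
        "sync"
        "testing"

        bank "gopl.io/ch9/bank1" // assumed import path; see Exercise 9.1
    )

    func TestConcurrentDeposits(t *testing.T) {
        var wg sync.WaitGroup
        for i := 0; i < 1000; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                bank.Deposit(1) // each deposit is a single channel send
            }()
        }
        wg.Wait()
        if got, want := bank.Balance(), 1000; got != want {
            t.Errorf("Balance() = %d, want %d", got, want)
        }
    }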
Even when a variable cannot be confined to a single goroutine for its entire lifetime, confinement may still be a solution to the problem of concurrent access. For example, it’s common to share a variable between goroutines in a pipeline by passing its address from one stage to the next over a channel. If each stage of the pipeline refrains from accessing the variable after sending it to the next stage, then all accesses to the variable are sequential. In effect, the variable is confined to one stage of the pipeline, then confined to the next, and so on. This discipline is sometimes called serial confinement.
In the example below, Cakes are serially confined, first to the baker goroutine, then to the icer goroutine:
    type Cake struct{ state string }

    func baker(cooked chan<- *Cake) {
        for {
            cake := new(Cake)
            cake.state = "cooked"
            cooked <- cake // baker never touches this cake again
        }
    }

    func icer(iced chan<- *Cake, cooked <-chan *Cake) {
        for cake := range cooked {
            cake.state = "iced"
            iced <- cake // icer never touches this cake again
        }
    }
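The wiring of the two stages is not shown above. As a hedged sketch, assuming the declarations above sit in a package main that imports fmt, a driver might connect them like this:

    func main() {
        cooked := make(chan *Cake)
        iced := make(chan *Cake)
        go baker(cooked)
        go icer(iced, cooked)
        for i := 0; i < 3; i++ {
            cake := <-iced          // the consumer is the cake's final owner
            fmt.Println(cake.state) // "iced"
        }
    }

Each cake has exactly one owner at a time, first the baker, then the icer, then the consumer, so no synchronization beyond the channel communication is needed.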
The third way to avoid a data race is to allow many goroutines to access the variable, but only one at a time. This approach is known as mutual exclusion and is the subject of the next section.
Exercise 9.1: Add a function Withdraw(amount int) bool to the gopl.io/ch9/bank1 program. The result should indicate whether the transaction succeeded or failed due to insufficient funds. The message sent to the monitor goroutine must contain both the amount to withdraw and a new channel over which the monitor goroutine can send the boolean result back to Withdraw.