Chapter 12
The Prisoner's Dilemma and Other Strategic Non-zero-sum Games

The analysis of zero-sum games is relatively straightforward because in that kind of game maximizing the utility of one of the players is equivalent to minimizing the utility of the other (recall that in zero-sum games whatever one player wins comes at the opponent's expense). We will now move away from these purely confrontational games and consider games where the interests of the players are at least partially aligned (e.g., they can earn some benefits without necessarily making their opponents worse off). We call these games non-zero-sum games because the sum of the outcomes for the two players is not constant across all combinations of strategies. Non-zero-sum games sometimes lead to unexpected conclusions. In fact, we might have to stop thinking about Nash equilibria as providing the optimal solution for the game.

12.1 The Prisoner's Dilemma

To motivate non-zero-sum games, consider the famous prisoner's dilemma, which appears in pretty much every procedural show on TV. Two men suspected of committing a crime together are arrested and placed in separate interrogation rooms. The game assumes that the police can only prosecute the more serious charge if one or both suspects confess; otherwise, they can only prosecute a lesser charge. Each suspect may either confess or remain silent, and each one knows the consequences of their actions. If one suspect confesses but the other does not, the one who confessed turns over incriminating evidence against their partner and goes free, while the other goes to jail for 20 years. On the other hand, if both suspects confess, then both of them go to jail for 5 years. Finally, if both suspects remain silent, they both go to jail for a year on the lesser charge. Assuming that each criminal cares only about their own well-being, the payoffs can be summarized in Table 12.1. Note that the sum of the payoffs is not constant (it is -10 if both confess but -2 if both remain silent) and therefore this is not a zero-sum game.

Table 12.1 Payoffs for the prisoner's dilemma (payoffs are (Prisoner 1, Prisoner 2), in years of jail)

                                   Prisoner 2
                           Confess        Remain silent
Prisoner 1  Confess        (-5, -5)       (0, -20)
            Remain silent  (-20, 0)       (-1, -1)

From a cursory examination of the table, it would seem like remaining silent is the optimal solution for the game (at least from the point of view of the aggregate number of years that the criminals will spend in jail). However, this solution is not a Nash equilibrium, and it is unlikely that the players will adopt such a strategy. To see why, let's consider the set of best responses for Prisoner 2, which are presented in Table 12.2. Note that confessing is a dominant strategy for Prisoner 2 (and, by the symmetry of the game, for Prisoner 1 as well). Hence, the Nash equilibrium corresponds to both prisoners confessing and getting 5 years of jail each.

Table 12.2 Best responses for Prisoner 2 in the prisoner's dilemma game

If Prisoner 1… Prisoner 2 should…
… confesses, … confess.
… remains silent, … confess.

This can seem somewhat contradictory at first sight. The outcome that involves both prisoners confessing is an equilibrium, as neither player has a unilateral incentive to change their behavior if they know that the other prisoner will confess. However, one can't help but notice that this is clearly a stupid strategy for both prisoners, because both would be better off if they could coordinate their actions so that both remained silent. This type of apparent contradiction (where Nash equilibria are not necessarily good solutions to the game) does not happen in zero-sum games, but is very common in non-zero-sum games. It arises because in non-zero-sum games details such as the order of play and the ability of the players to communicate, make binding agreements, or set side payments can have an important effect on the outcome.
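The argument above can be checked mechanically. The short R sketch below (payoffs taken from Table 12.1) tests every pure-strategy pair and confirms that mutual confession is the only Nash equilibrium:

```r
# Payoff matrices from Table 12.1: rows are Prisoner 1's moves, columns are
# Prisoner 2's. Entry [i, j] holds that player's payoff (negative years of jail).
payoff1 <- matrix(c(-5,   0,
                    -20, -1), nrow = 2, byrow = TRUE,
                  dimnames = list(c("Confess", "Silent"),
                                  c("Confess", "Silent")))
payoff2 <- t(payoff1)  # the game is symmetric

# A pair (i, j) is a Nash equilibrium if neither player gains by deviating.
is_equilibrium <- function(i, j) {
  all(payoff1[i, j] >= payoff1[, j]) && all(payoff2[i, j] >= payoff2[i, ])
}

for (i in 1:2) for (j in 1:2) {
  cat(rownames(payoff1)[i], "/", colnames(payoff1)[j], "->",
      is_equilibrium(i, j), "\n")
}
# Only Confess / Confess prints TRUE.
```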

12.2 The Impact of Communication and Agreements

Consider now the non-zero-sum game between Anil and Anastasiya presented in Table 12.3. We assume that no communication, agreements, or wealth transfers are allowed between the players; these are assumptions that we will make throughout this book unless noted otherwise.

Table 12.3 Communication game in normal form (payoffs are (Anil, Anastasiya))

                          Anastasiya
                    Strategy A      Strategy B
Anil   Strategy 1   (0, 0)          (10, 5)
       Strategy 2   (5, 10)         (0, 0)

As with all other games, let's consider the set of best responses for each player (see Tables 12.4 and 12.5). The best responses are obtained by noting that, if Anil decides to use strategy 1, then Anastasiya will choose strategy B (which pays her 5) over strategy A (which pays her 0), while if Anil decides to use strategy 2, then Anastasiya will choose strategy A (which pays her 10) over strategy B (which pays her 0). Similarly, if Anastasiya decides to use strategy A, then Anil will choose strategy 2 (which pays him 5) over strategy 1 (which pays him 0), while if Anastasiya decides to use strategy B, then Anil will choose strategy 1 (which pays him 10) over strategy 2 (which pays him 0).

Table 12.4 Best responses for Anil in the communication game

If Anastasiya… Anil should…
… plays A, … play 2.
… plays B, … play 1.

Table 12.5 Best responses for Anastasiya in the communication game

If Anil… Anastasiya should…
… plays 1, … play B.
… plays 2, … play A.

From Tables 12.4 and 12.5, it is clear that both (2, A) and (1, B) are pure-strategy equilibria for this game. Indeed, A is the best response to 2 and vice-versa, while B is the best response to 1 and vice-versa. In addition, the game admits a third, mixed-strategy equilibrium. To find this additional equilibrium point, let q be the probability that Anastasiya plays A (so that the probability that she plays B is 1 - q). The expected payoffs for Anil are shown in Table 12.6.

Table 12.6 Expected utility for Anil in the communication game

If Anil plays … the expected value of the game for Anil is …
… strategy 1, 0 q + 10 (1 - q) = 10(1 - q).
… strategy 2, 5 q + 0 (1 - q) = 5q.

Thus, for Anil to be indifferent between his two strategies, we need

10(1 - q) = 5q,   that is,   q = 2/3,

and the expected payoff for Anil is 5 (2/3) = 10/3. Exactly the same argument applies to Anastasiya: she is indifferent when Anil plays strategy 1 with probability 2/3. Note that this is really a Nash equilibrium for the game. If Anil plays 1 with probability 2/3, there is nothing that Anastasiya can do to improve her expected utility over the one she would get by playing A 2/3 of the time, and vice-versa.

Note that, unlike the pure-strategy equilibria, the mixed-strategy equilibrium is fair (in the sense that the payoff for both players is the same, 10/3). However, the expected payoff of 10/3 for each player is still well below the payoff that either of them would obtain under either pure-strategy equilibrium (which is at least 5). If communication and utility transfers among players were allowed, both players would be better off playing one of the pure-strategy equilibria and having the player with the higher payoff transfer 2.5 units to the other, so that each makes a benefit of 7.5 units.
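The indifference calculation is easy to verify numerically; a quick R check (payoffs from Table 12.3) follows:

```r
# Anil is indifferent when 10(1 - q) = 5q, where q is the probability that
# Anastasiya plays A. Solving gives q = 2/3 and an expected payoff of 10/3.
q <- 2 / 3
u_strategy1 <- 10 * (1 - q)   # Anil's expected payoff from strategy 1
u_strategy2 <- 5 * q          # Anil's expected payoff from strategy 2
cat(u_strategy1, u_strategy2, "\n")   # both equal 10/3
```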

12.3 Which Equilibrium?

In the case of zero-sum games, if multiple Nash equilibria exist, they all have the same payoff. Hence, in those circumstances, the specific equilibrium the players ultimately settle for is largely irrelevant. However, as the example from the previous section shows, in non-zero-sum games, the payoffs of different equilibria can be quite different. This makes interpreting Nash equilibria and predicting the outcome of the game more difficult.

To illustrate these phenomena, consider the so-called game of chicken. The name has its origins in a game in which two drivers drive toward each other on a collision course: one must swerve, or both may die in the crash, but if one driver swerves and the other does not, the one who swerved will be called a chicken. A similar game was played by youths in the 1950s and inspired the classic James Dean movie Rebel Without a Cause.

An example of a possible payoff matrix associated with this game can be seen in Table 12.7. The payoff of -100 for each player in case of a collision is meant to represent a big loss (at least when compared against the small profit or loss made when one player swerves and the other goes straight). The game of chicken has been used to model a number of real-life situations, including the doctrine of mutually assured destruction during the Cold War.

Table 12.7 The game of chicken (payoffs are (Ileena, Hans))

                                  Hans
                         Swerve         Straight
Ileena  Swerve (1)       (0, 0)         (-1, 1)
        Straight (2)     (1, -1)        (-100, -100)

Let's consider the best responses for each player. Table 12.8 shows the best responses for Ileena which, because of the symmetry of the payoff table, also apply to Hans. From this table, it is clear that there are two pure-strategy equilibria, which correspond to one of the players swerving and the other going straight. In the notation of mixed-strategy equilibria, these two equilibria correspond to (q, h) = (1, 0) and (q, h) = (0, 1), where q and h are the probabilities that Ileena and Hans swerve, respectively. Both of these equilibria imply outcomes in which there is no crash, but one of the players is always the “chicken”.

Table 12.8 Best responses for Ileena in the game of chicken

If Hans… Ileena should…
… swerves, … go straight.
… goes straight, … swerve (better chicken than dead!).

Besides these two pure-strategy equilibria, the game also admits a true mixed-strategy equilibrium, which corresponds to each player swerving 99% of the time and going straight 1% of the time. To see this, let h be the probability that Hans swerves. Table 12.9 presents the expected payoffs of the game for Ileena. Since, in equilibrium, Ileena must obtain the same utility from swerving and from going straight, we have

0 h + (-1) (1 - h) = 1 h + (-100) (1 - h),   that is,   h = 99/100.

The expected value of the game at this mixed-strategy equilibrium is -0.01 for each of the players.

Table 12.9 Expected utility for Ileena in the game of chicken

If Ileena plays … the expected utility for Ileena is …
… strategy 1 (swerve), 0 h + (-1) (1 - h) = -(1 - h).
… strategy 2 (straight), 1 h + (-100) (1 - h) = 101h - 100.
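As a quick sanity check, the two rows of Table 12.9 can be evaluated at h = 0.99 in R:

```r
# Ileena's expected utilities (payoffs from Table 12.7) when Hans swerves
# with probability h = 0.99: both strategies should pay the same.
h <- 0.99
u_swerve   <- 0 * h + (-1) * (1 - h)      # strategy 1 (swerve)
u_straight <- 1 * h + (-100) * (1 - h)    # strategy 2 (straight)
cat(u_swerve, u_straight, "\n")           # both equal -0.01
```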

The outcome implied by this third mixed-strategy equilibrium is a very troubling one. Indeed, although the probability that both players go straight at the same time is really small (because of independence, it is 0.01 × 0.01 = 0.0001), if the game is played for long enough the law of large numbers ensures that a crash will eventually occur!
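A small simulation illustrates the point; the sketch below plays the mixed-strategy equilibrium a million times and counts the crashes (the seed is arbitrary):

```r
# Both players go straight with probability 0.01, independently. Over many
# rounds the observed crash rate should approach 0.01 * 0.01 = 1e-4.
set.seed(42)
n <- 1e6
ileena_straight <- runif(n) < 0.01
hans_straight   <- runif(n) < 0.01
crash_rate <- mean(ileena_straight & hans_straight)
cat("observed crash rate:", crash_rate, "(theory: 1e-4)\n")
```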

What equilibrium will prevail in practice, assuming that the game is played multiple times and the players can learn from each other? Let q and h denote the probabilities that Ileena and Hans swerve, respectively. The utilities of the players are

U_Ileena(q, h) = -q(1 - h) + (1 - q)h - 100(1 - q)(1 - h)

for Ileena and

U_Hans(q, h) = -h(1 - q) + (1 - h)q - 100(1 - q)(1 - h)

for Hans. Let's assume that both players start the game playing according to the mixed-strategy Nash equilibrium. In particular, if Ileena sticks to the optimal strategy q = 0.99, we can plot the utilities of both players as a function of h (the probability that Hans will swerve) using the following R code:

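A minimal sketch of such code, assuming base R graphics (the utility function implements the payoffs of Table 12.7; line types follow the figure caption):

```r
# Expected utility of the player who swerves with probability a against an
# opponent who swerves with probability b (payoffs from Table 12.7).
utility <- function(a, b) {
  -a * (1 - b) + (1 - a) * b - 100 * (1 - a) * (1 - b)
}
h <- seq(0, 1, by = 0.01)   # probability that Hans swerves
q <- 0.99                   # Ileena plays her equilibrium strategy
plot(h, utility(q, h), type = "l", lty = 1, ylim = c(-2, 0.5),
     xlab = "Probability that Hans swerves", ylab = "Expected utility")
lines(h, utility(h, q), lty = 2)   # Hans's utility (dashed)
```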

Figure 12.1 Expected utilities for Ileena (solid line) and Hans (dashed line) in the game of chicken as a function of the probability h that Hans swerves, assuming that Ileena swerves with probability q = 0.99.

As expected, the point where the two utilities intersect corresponds to h = 0.99, which is the equilibrium strategy. Furthermore, Figure 12.1 shows that, no matter what Hans does, his utility remains constant at -0.01, so there is no incentive for him not to play the Nash equilibrium h = 0.99. However, since his utility is constant, there is also no incentive for him to play it for sure either! Indeed, although Hans cannot increase his own profit, he can reduce (or even increase!) the profit of Ileena. In the two extremes, if h = 1 then Hans on his own maximizes the profit of Ileena (making it 0.01), while if h = 0 the profit for Ileena is minimized (making it -1.99).

We have implicitly assumed that the outcome for Ileena does not matter to Hans: there is no reason why Hans would prefer one of these outcomes over the other. However, in real life players might indeed have a preference for making the other player better (or worse) off. For the sake of argument, let's assume that Hans decides not to play the mixed-strategy equilibrium but instead is a bit more aggressive and slightly decreases his probability of swerving in order to reduce the payoff of Ileena. The following code can be used to plot the utility of each player as a function of q (the probability that Ileena swerves) when Hans swerves with a probability slightly below 0.99:

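A sketch of such code; since the exact swerve probability for Hans is not specified here, as an illustration we assume h = 0.95 (any value slightly below 0.99 produces the same qualitative picture):

```r
# Expected utility in the game of chicken (payoffs from Table 12.7).
utility <- function(a, b) {
  -a * (1 - b) + (1 - a) * b - 100 * (1 - a) * (1 - b)
}
q <- seq(0, 1, by = 0.01)   # probability that Ileena swerves
h <- 0.95                   # assumed: Hans slightly more aggressive than 0.99
plot(q, utility(q, h), type = "l", lty = 1, ylim = c(-6.5, 0.5),
     xlab = "Probability that Ileena swerves", ylab = "Expected utility")
lines(q, utility(h, q), lty = 2)   # Hans's utility (dashed)
```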

Figure 12.2 Expected utilities for Ileena (solid line) and Hans (dashed line) in the game of chicken as a function of the probability q that Ileena swerves, assuming that Hans swerves with a probability slightly below 0.99.

Figure 12.2 indicates that, by adopting a slightly more aggressive strategy, Hans has completely changed the incentives for Ileena. If Ileena realizes that Hans's strategy has changed, she will surely change her own strategy to q = 1 (always swerve), which is the choice that maximizes her utility. What happens then? The following code again plots the utilities of both players as a function of h when q = 1:

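A minimal sketch of such code (base R graphics, payoffs from Table 12.7), with Ileena's swerve probability fixed at q = 1:

```r
# Expected utility in the game of chicken (payoffs from Table 12.7).
utility <- function(a, b) {
  -a * (1 - b) + (1 - a) * b - 100 * (1 - a) * (1 - b)
}
h <- seq(0, 1, by = 0.01)   # probability that Hans swerves
plot(h, utility(1, h), type = "l", lty = 1, ylim = c(-1.2, 1.2),
     xlab = "Probability that Hans swerves", ylab = "Expected utility")
lines(h, utility(h, 1), lty = 2)   # Hans's utility (dashed)
```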

Figure 12.3 Expected utilities for Ileena (solid line) and Hans (dashed line) in the game of chicken as a function of the probability h that Hans swerves, assuming that Ileena always swerves.

Figure 12.3 indicates that, once Ileena starts to swerve all the time, Hans will start to go straight all the time, which coincides with one of the pure-strategy Nash equilibria we identified at the beginning. But unlike the mixed-strategy equilibrium, not only do the players have no incentive to deviate, they also have strong incentives to stick to their strategies. A similar argument can be made if the players begin with the mixed-strategy equilibrium and one of them decides to slightly increase their probability of swerving. In that case, the first player to increase the probability will end up trapped in an equilibrium in which they have to swerve all the time!

The previous discussion demonstrates that the mixed-strategy equilibrium of the game of chicken is unstable: once one of the players deviates slightly from it, the steady state of the game moves toward a different equilibrium. Unstable equilibria are fragile and unlikely to persist for long in the real world. On the other hand, the pure-strategy equilibria of the game of chicken are stable: once adopted, small deviations are unlikely to change the outcome of the game.
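The instability argument can be illustrated with a crude best-response dynamic, a deliberate simplification in which each player myopically switches to a pure best response against the other's current strategy:

```r
# Expected utility in the game of chicken (payoffs from Table 12.7).
utility <- function(a, b) {
  -a * (1 - b) + (1 - a) * b - 100 * (1 - a) * (1 - b)
}
# Best response (as a swerve probability) against an opponent who swerves
# with probability b: swerve (1) or go straight (0), whichever pays more.
best_response <- function(b) {
  if (utility(1, b) >= utility(0, b)) 1 else 0
}
q <- 0.99   # Ileena starts at the mixed equilibrium
h <- 0.98   # Hans starts slightly more aggressive
for (round in 1:10) {
  q <- best_response(h)   # Ileena adapts to Hans
  h <- best_response(q)   # Hans adapts to Ileena
}
cat("q =", q, " h =", h, "\n")   # settles at q = 1, h = 0: Ileena always swerves
```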

12.4 Asymmetric Games

The non-zero-sum games studied so far have been symmetric, in the sense that both players face the same strategies and payoffs. However, not all games are symmetric. For example, consider a two-person fencing game in which each player has only one attack move and one defensive move, but the payoffs differ between the players (imagine that one player, Ki-Adi Mundi, is more skillful in defense, while the other, Asajj Ventress, is more skillful in offensive moves). Table 12.10 shows the payoffs associated with this game.

Table 12.10 A fictional game of swords in Star Wars

Asajj
Attack move Defensive move
Ki-Adi Attack move c012-math-069 c012-math-070
Defensive move c012-math-071 c012-math-072

Best responses for each of the players are presented in Tables 12.11 and 12.12. Again, there are two pure-strategy equilibria, one in which Ki-Adi always attacks and Asajj always defends, and another in which the roles are reversed. In addition, there is a mixed-strategy equilibrium in which Ki-Adi attacks with probability 3/5 and defends with probability 2/5, while Asajj attacks with probability 1/2 and defends with probability 1/2.

Table 12.11 Best responses for Ki-Adi in the sword game

If Asajj… Ki-Adi should…
… attacks, … defend.
… defends, … attack.

Table 12.12 Best responses for Asajj in the sword game

If Ki-Adi… Asajj should…
… attacks, … defend.
… defends, … attack.

We proceed now to find these probabilities. Let q_a and q_d = 1 - q_a denote the probabilities that Asajj attacks and defends, and let p_a and p_d = 1 - p_a be the corresponding probabilities for Ki-Adi. Table 12.13 presents the expected value of the game for Ki-Adi under each of his possible actions. As we have argued in the past, for Ki-Adi to be willing to randomize, the utility derived from both options has to be the same in equilibrium. Equating the two expected utilities in Table 12.13 and using the fact that q_d = 1 - q_a, we obtain

q_a = q_d = 1/2.

Table 12.13 Expected utility for Ki-Adi in the asymmetric sword game

If Ki-Adi… the expected utility for Ki-Adi is …
… attacks, c012-math-083
… defends, c012-math-084

Recall that q_a and q_d are, respectively, the probabilities that Asajj will decide to attack or defend.

Hence, the optimal strategy for Asajj involves attacking 50% of the time and defending the other 50% of the time.

We can carry out a similar calculation for Asajj, with Table 12.14 presenting the expected value for each of her choices. Equating the two expected utilities and using the fact that p_d = 1 - p_a, we obtain

p_a = 3/5,   p_d = 2/5.

Hence, the optimal strategy for Ki-Adi is to use his attacking move 60% of the time and to defend the remaining 40% of the time.

Table 12.14 Expected utility for Asajj in the asymmetric sword game

If Asajj… the expected utility for Asajj is …
… attacks, c012-math-090
… defends, c012-math-091

Recall that p_a and p_d are, respectively, the probabilities that Ki-Adi will decide to attack or defend.
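Since the indifference conditions are linear, the calculation can be packaged into a generic function for 2×2 games. The sketch below is a hypothetical helper, not from the original text; it is verified here on the game of chicken, whose mixed equilibrium we already know, and the payoffs of Table 12.10 can be plugged in the same way:

```r
# Mixed-strategy equilibrium of a 2x2 game from the indifference conditions.
# A is the row player's payoff matrix, B is the column player's.
mixed_equilibrium <- function(A, B) {
  # probability that the column player plays column 1, chosen so that the
  # row player is indifferent between her two rows
  q <- (A[2, 2] - A[1, 2]) / (A[1, 1] - A[1, 2] - A[2, 1] + A[2, 2])
  # probability that the row player plays row 1, chosen so that the
  # column player is indifferent between his two columns
  p <- (B[2, 2] - B[2, 1]) / (B[1, 1] - B[2, 1] - B[1, 2] + B[2, 2])
  c(p = p, q = q)
}

chicken <- matrix(c(0,  -1,
                    1, -100), nrow = 2, byrow = TRUE)
eq <- mixed_equilibrium(chicken, t(chicken))
print(eq)   # both players swerve with probability 0.99
```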

12.5 Exercises

  1. Consider the following two-player, non-zero-sum game:

    Player 2
    1 2 3
    Player 1 c012-math-094 c012-math-095 c012-math-096 c012-math-097
    c012-math-098 c012-math-099 c012-math-100 c012-math-101
    c012-math-102 c012-math-103 c012-math-104 c012-math-105
    a. Does either player have a dominant strategy?
    b. Does either player have a dominated strategy?
    c. What are all the pure-strategy Nash equilibria? If there aren't any, is there a mixed-strategy equilibrium?
  2. Consider the following two-player, non-zero-sum game:

    Rose
    c012-math-106 c012-math-107 c012-math-108
    Joe c012-math-109 c012-math-110 c012-math-111 c012-math-112
    c012-math-113 c012-math-114 c012-math-115 c012-math-116
    c012-math-117 c012-math-118 c012-math-119 c012-math-120
    a. Does either player have a dominant strategy?
    b. Does either player have a dominated strategy?
    c. What are all the pure-strategy Nash equilibria? If there aren't any, is there a mixed-strategy equilibrium?
  3. Consider the following two-player, non-zero-sum game:

    Romney
    Left Center Right
    Obama Left c012-math-121 c012-math-122 c012-math-123
    Center c012-math-124 c012-math-125 c012-math-126
    Right c012-math-127 c012-math-128 c012-math-129
    a. Does either player have a dominant strategy?
    b. Does either player have a dominated strategy?
    c. What are all the pure-strategy Nash equilibria? If there aren't any, is there a mixed-strategy equilibrium?
  4. Splitting three chocolate chips. Two kids are each asked to write down an integer between 1 and 3. The kid who writes the higher number gets that many chocolate chips minus the number written by the opponent (if they chose 2 and 1, respectively, the kid who chose the higher number gets one chocolate chip). The kid who chose the smaller number gets the remaining chips (in the example above, two chips). If they choose the same number, they split the three chips equally (i.e., 1.5 each). Put the game in normal form. What is the solution of the game?

  5. In the game described in the previous exercise, assume instead that the kid with the highest number gets that amount of chocolate chips minus the number written by the opponent, but now the other kid gets the number they wrote down (up to a maximum of three). First put this game in normal form. What is the solution of this game? Explain which differences you see between these two games.

  6. Tax collection. Consider a game between a tax collector (Mary) and a tax payer (John). John has an income of 200 and may either report his income truthfully or lie. If he reports truthfully, he pays 100 to Mary and keeps the rest. If John lies and Mary does not audit, then John keeps all his income. If John lies and Mary audits, then John gives all his income to Mary. The cost of conducting an audit is 20. Suppose that both parties move simultaneously (i.e., Mary must decide whether to audit before she knows John's reported income). Find the mixed-strategy Nash equilibrium for this game and the equilibrium payoffs to each player.

  7. In this simple coin game, each player starts by choosing randomly whether to hold no coin or one coin in their hand; then each of them must also try to guess the total number of coins in both hands. A player wins if only they guess the right sum, the game is a draw if both players guess the right sum, and a player loses in any other situation.

    a. Compute the payoff table associated with this game and show that this is a non-zero-sum game.
    b. Find all pure- and mixed-strategy equilibria for this game.
    c. Which change to the rules of the previous game would turn it into a zero-sum game?
  8. A company's cash is contained in two safes, which are kept some distance apart. There is $90,000 in one safe and $10,000 in the other. A burglar plans to break into one safe and have an accomplice set off the alarm in the other one. The watchman has time to check only one safe; if he guards the wrong one, the company loses the contents of the other safe, and if he guards the right one, the burglar leaves empty-handed. From which safe is a sophisticated burglar more likely to steal? With what probability? What should the watchman do, and how much, on average, will be stolen?

  9. Why do cops separately interrogate suspected accomplices of a crime?

  10. Can you give a historical example in which two countries (or groups of countries) could be thought of as playing one of the pure-strategy equilibria in the game of chicken? How about an example of two countries playing the mixed-strategy equilibrium?

  11. [R] Use the simulation of the game of chicken to investigate what happens if one player uses the randomized strategy associated with the mixed-strategy Nash equilibrium while the other player always goes straight. Which of the two equilibria do you think is more likely to become the steady state after repeated play?

  12. [R] Build a simulation for the asymmetric game in Section 12.4 and contrast the various equilibria.