Chapter 11
Rock–Paper–Scissors: Mixed Strategies in Zero-Sum Games

Not all two-person, zero-sum games admit a pure-strategy equilibrium (i.e., a solution that results in the players always using the same strategies over and over again). A well-known example is the game of rock–paper–scissors, also known as roshambo. Rock–paper–scissors is a hand game between two players that simultaneously select among three gestures (corresponding to rock, paper, or scissors). The objective of the game is to select a gesture that defeats that of the opponent, and the winner of each round makes $1 out of the other player. The game is resolved as follows:

The payoffs of the game for our two players Jiahao and Antonio are reported in Table 11.1. Since the outcomes for the two players add to the same quantity in all cell, this is clearly a zero-sum game. As we have done before, our next step in solving the game is to come up with optimal responses from each player's point of view (which are probably already obvious to you). Table 11.2 shows the best responses for Jiahao. Because of the symmetry of the game, the same table also applies to Antonio.

Table 11.1 Player's profit in rock–paper–scissors

Antonio
Rock Paper Scissors
Rock c011-math-001 c011-math-002 c011-math-003
Jiahao Paper c011-math-004 c011-math-005 c011-math-006
Scissors c011-math-007 c011-math-008 c011-math-009

Note that no combination of strategies leads to a pure-strategy equilibrium: once Jiahao knows that Antonio will play (say) rock for sure, he has a clear incentive to continuously play paper. But once Jiahao realizes that Antonio will play paper for sure, he has an incentive to start playing scissors. In turn, this will lead to Antonio selecting to play rock. Consequently, players who repeatedly play this game will tend to continuously cycle through the different gestures. This is unlike the examples we discussed before where, once the equilibrium strategy has been attained, there is no incentive for the players to deviate unilaterally.

Table 11.2 Best responses for Jiahao in the game of rock–paper–scissors

If Antonio plays … Jiahao should play …
… rock, … paper.
… paper, … scissors.
… scissors, … rock.

If you have ever played rock–paper–scissor before, you have probably figured out that constantly changing your gesture in a more or less unpredictable manner is a better strategy than always selecting the same gesture. Indeed, it turns out that the optimal strategy for this game corresponds to randomly selecting a strategy among the three available to them. How often should they play each strategy in rock–paper–scissor? Because the profit from playing the game is the same ($1) under all strategies, the (correct!) intuition is that players should alternate among strategies so that you spend about 1/3 of their time playing each one of them. Game strategies that involve such randomly chosen actions each time the game is played are called mixed strategies. This is in contrast to the pure strategies we studied in the previous chapter, which involve always using the same action.

Mixed-Strategy Nash Equilibria

We say that a Nash equilibria involves mixed strategies if, in equilibrium, each player randomizes their actions each time they play according to a given probability distribution over their options.

In real-life games, mixed-strategy equilibria sometimes fail to materialize as the long-term outcome of repeated games because humans are very bad at avoiding patterns of behavior, but they are nonetheless the optimal way to proceed.

11.1 Finding Mixed-Strategy Equilibria

A general approach for deriving mixed-strategy equilibria in zero-sum games is to find a set of probabilities for the actions available to each player such that a player is indifferent to which strategy she selects if the opponent randomizes according to their probabilities, and vice-versa. Indeed, if both players are willing to randomize, then the expected utility associated with each alternative should be the same (otherwise, the players would not randomize but would always choose the option that maximizes their utility).

Let's illustrate this principle using the rock–paper–scissors example. Let c011-math-010 be the probability that Antonio chooses rock on a given round, c011-math-011 the probability that he chooses paper, and c011-math-012 the probability that he chooses scissors. Note that, since these are the only three options available, these probabilities need to satisfy c011-math-013. The expected value of each strategy for Jiahao is then given in Table 11.3.

Table 11.3 Utility associated with different actions that Jiahao can take if he assumes that Antonio selects rock with probability c011-math-014, paper with probability c011-math-015 and scissors with probability c011-math-016

If Jiahao plays … The expected value of the game for Jiahao is …
… rock, c011-math-017.
… paper, c011-math-018.
… scissors, c011-math-019.

Now, remember that no strategy by itself is the best one among all strategies; this means that the rational player will have no particular preference among any of their strategies. This in turn means that the expected values for all strategies need to be equal to each other. For example, we could make the expected value of the game for rock and paper, as well as that for rock and scissor, equal. This leads to

11.1 equation

which, together with the fact that all probabilities need to add up to one,

11.2 equation

gives us a system of three equations with three unknowns (with the unknowns corresponding to the probabilities Antonio will choose each strategy). To solve the system of equations, first add (11.1) and (11.2) together to get

11.3 equation

Inserting this result back into (11.1), we also get

equation

Therefore, we have shown that all three probabilities need to be equal. Since they also need to add up to 1 (recall Equation (11.3)), we get

equation

Because of the symmetry of the game, the same argument applies to the randomization strategy of Jiahao, and the equilibrium point of the game corresponds to each player randomly (and independently) selecting a hand gesture with equal probability among the three options available to them every time they play. The expected value of this game is then

equation

Consequently, if the game is repeated multiple times and Jiahao adopts the optimal mixed strategy, he is bound to at least not lose money in the long run. That is, if the players adopt the Nash equilibrium as their strategy, then none of them will make any money. To corroborate this, the following simulation allows you to compare the optimal strategy implied by the Nash equilibrium (which Jiahao always plays) against other strategies played by Antonio.

c11uf001
c11uf001

In the simulation above, we assumed that Antonio picks paper 10% of the time, rock 80% of the time, and scissors 10% of the time, but the result is the same no matter what Antonio does: the long-run profit for both Antonio and Jiahao is always zero. In this game, the Nash-equilibrium can be thought of the best defensive strategy: no matter what the other player does, you cannot lose money. The other side of that statement is that, if the other player uses the Nash equilibrium as its strategy, then there is nothing you can do to make more money.

Pure strategies are a special case of mixed strategies where the probability associated with one of the alternatives is equal to 1. By expanding the space of possible strategies to include mixed strategies, we can guarantee that a large class of zero-sum games has a solution given by a Nash equilibrium. This is a consequence of the minimax theorem:

Minimax Theorem

For every two-person, zero-sum game with finitely many strategies, there exists a value c011-math-023 and a mixed strategy for each player, such that (1) given the strategy for Player 2, the best payoff possible for Player 1 is c011-math-024, and (2) given the strategy for Player 1, the best payoff possible for Player 2 is c011-math-025.

In simpler words, the minimax theorem states that every two-person, zero-sum game with finitely many strategies has at least one solution, which might involve players using mixed strategies.

11.2 Mixed Strategy Equilibria in Sports

Mixed strategy equilibria appear in a number of sports including baseball, football, and soccer. For example, suppose now that you are playing soccer and, in particular, that you will be kicking a penalty. Therefore, you need to decide how you are going to kick (your options are to kick left, center, or right). The goalkeeper also needs to make a similar decision about where to lunge (again, left, center, or right). Table 11.4 presents the utility derived by each player from each combination of strategies (the numbers correspond to the (historical) conditional probabilities that a goal is scored/not scored).

Table 11.4 Utilities associated with different penalty kick decisions

Goal keeper
Left Center Right
Kicker Left (0.65,0.35) (0.95,0.05) (0.95,0.05)
Center (0.95,0.05) (0,1) (0.95,0.05)
Right (0.95,0.05) (0.95,0.05) (0.65,0.35)

As with rock–paper–scissors, there are no pure-strategy Nash equilibria for this game. If you have ever watched soccer, this should not be surprising, a kicker or a goal keeper who becomes predictable are typically very bad for their teams.

To determine the mixed-strategy equilibrium we proceed as before and first compute the expected value of the game for the kicker when the goalkeeper randomizes their actions so that with probability c011-math-026 she will lunge left, with probability c011-math-027 she will stay in the center, and with probability c011-math-028 she will lunge right. The results are presented in Table 11.5.

Table 11.5 Utility associated with different actions taken by the kicker if he assumes that goal keeper selects left with probability c011-math-029, center with probability c011-math-030, and right with probability c011-math-031

If Kicker kicks … the expected value of the game for the kicker is …
… left, c011-math-032.
… center, c011-math-033.
… right, c011-math-034.

Since the expected utilities must be the same, equating the first two expressions leads to

equation

Similarly by equating the second and third expressions, we have

equation

Note that one consequence of these two equations is that c011-math-035. This makes intuitive sense; if we look at the payoff table, we realize that the left and right choices are interchangeable. Now, using the fact that c011-math-036, we have

equation

and from there we get

equation

Therefore, the equilibrium strategy for the goalkeeper is to lunge to the left about 43% of the time, to the right around 43% of the time, and to stay in the center about 14% of the time. If we now look at the kicker's optimal strategy, we discover that it is identical (he should kick to the right 43% of the time, to the left 43% of the time, and to the center 14% of the time). The expected value of the game for the kicker, if both players stick to this strategy, is obtained by inserting the optimum probabilities found above in any of the expressions for the expected value of the game in Table 11.5 (recall that, by definition, they have to be the same)

equation

This number can be interpreted as the (marginal) probability of scoring a goal if the players follow the optimal strategy.

11.3 Bluffing as a Strategic Game with a Mixed-Strategy Equilibrium

We turn our attention now to a very simplified version of “poker” in which you and your opponent Alya each place a $5 bet on the table and then secretly toss a coin with a 0 on one side and 1 on the other. You play first, and you can decide to either pass (c011-math-037) or bet (c011-math-038) an additional $3. If you pass, the numbers tossed by you and Alya are compared; the largest number takes the pot ($10). If both numbers are the same, each player gets their $5 back (a tie). On the other hand, if you bet the extra $3, Alya might decide to see (c011-math-039) or fold (c011-math-040). If Alya folds, you get the pot ($13, of which $8 are yours and $5 are Alya's) irrespective of the numbers tossed. If Alya decides to see she must add $3 to the pot (for a total of $16). Again the numbers are compared; the larger number takes the $16 and if the numbers are equal, each gets their money back. Figure 11.1 shows a decision tree with the sequence of decisions associated with the game.

Graphical illustration of decisions in a simplified version of poker.

Figure 11.1 Graphical representation of decisions in a simplified version of poker.

To analyze this game, we consider both rounds of bets simultaneously and write down all strategies available to each player. Each of these strategies will describe what action the player takes based on what the coin shows. For example, you could decide to pass no matter whether you have a 0 or a 1 (call this strategy pass-pass, or c011-math-041). Alternatively, you could pass whenever you have a 0 and bet if you have a 1 (call this pass-bet, or c011-math-042), you can bet if you have a 0 and pass if you have a 1 (call this bet-pass, or c011-math-043), or you could always bet no matter what the outcome of the toss is for you (call this strategy bet-bet, or c011-math-044). Intuitively, some of these strategies are very bad (e.g., playing c011-math-045 is clearly a bad idea); this will be confirmed by our analysis below. Similarly, Alya also has four strategies, c011-math-046 (fold no matter what her coin shows), c011-math-047 (always see, no matter what her coin shows), c011-math-048 (fold is she has a 0 and see if she has a 1), and c011-math-049 (see if she has a 0, and fold if she has a 1).

The outcome of the game for every pair of strategies is summarized in Table 11.6. The numbers in the table correspond to the expected profit for each player from each particular combination of strategies. For example, if you decide to play c011-math-050, you will always pass and the outcome will depend only on the outcomes of the coin tosses no matter what strategy Alya chooses (as she never gets to play). Therefore, all the cells in the first row of the table are equal to zero,

equation

Similarly, the expected value for you when you choose strategy c011-math-051 and Alya chooses c011-math-052,

equation

while the expected value for you when you choose strategy c011-math-053 and Alya chooses c011-math-054,

equation

Table 11.6 Expected profits in the simplified poker

Alya
c011-math-055 c011-math-056 c011-math-057 c011-math-058
You c011-math-059 c011-math-060 c011-math-061 c011-math-062 c011-math-063
c011-math-064 c011-math-065 c011-math-066 c011-math-067 c011-math-068
c011-math-069 c011-math-070 c011-math-071 c011-math-072 c011-math-073
c011-math-074 c011-math-075 c011-math-076 c011-math-077 c011-math-078

The remaining entries of Table 11.6 can be computed in a similar way. Once these calculations have been completed, we can proceed to find best responses for each player actions (see Tables 11.7 and 11.8). From these tables, it is clear that c011-math-079 and c011-math-080 are dominated strategies. This is intuitively clear: if you are the first player to play, passing no matter what your hand looks like or always betting in the first round and always folding on the second are very bad ideas. Similarly, for Alya, c011-math-081, and c011-math-082 are dominated strategies. Again this makes sense: always folding is a bad idea for Alya, as is betting when she has a 0 and folding when she has a 1 (she has some chance of winning if her coin shows a 1, while the best she can do if she sees a 0 is to tie).

Table 11.7 Best responses for you in the simplified poker game

If Alya plays … you should play …
c011-math-083, c011-math-084.
c011-math-085, c011-math-086.
c011-math-087, c011-math-088.
c011-math-089, c011-math-090 or c011-math-091 (you are indifferent).

Table 11.8 Best responses for Alya in the simplified poker game

If you play … Alya should play …
c011-math-092, c011-math-093, c011-math-094, c011-math-095, c011-math-096 (she is indifferent, she never gets to play).
c011-math-097, c011-math-098.
c011-math-099, c011-math-100.
c011-math-101, c011-math-102.

If we eliminate the strategies that are dominated, we end up with a reduced payoff table (see Table 11.9). With this reduced table, it is clear that there is no pure-strategy equilibrium for the game. To find a mixed-strategy equilibrium, we apply the same ideas we have used before (but on the reduced table, since the dominated strategies have been discarded). First, let c011-math-103 be the probability that Alya picks the c011-math-104 strategy, and c011-math-105 be the probability that Alya picks the c011-math-106 strategy. The expected profits for each of your actions if Alya randomizes among c011-math-107 and c011-math-108 according to c011-math-109 and c011-math-110 can be seen in Table 11.10.

Table 11.9 Expected profits in the simplified poker game after eliminating dominated strategies

Alya
c011-math-111 c011-math-112
You c011-math-113 c011-math-114 c011-math-115
c011-math-116 c011-math-117 c011-math-118

Table 11.10 Expected profits associated with different actions you take if you assume that Alya will select c011-math-119 with probability c011-math-120 and c011-math-121 with probability c011-math-122

If you play … the expected value of the game for you is …
c011-math-123, c011-math-124.
c011-math-125, c011-math-126.

Since the expected utility from both actions must be the same in equilibrium, we have

equation

Finally, using the fact that c011-math-127, we have

equation

and c011-math-128. In other words, Alya should always see if she has a 1 and should fold 60% of the time if she has a 0.

A similar calculation for your randomization probability shows that you should play c011-math-129 60% of the times and c011-math-130 the other 40% of the times. That is, you should always bet if you have a 1, and bluff 60% of the times in which you got a 0. Substituting these numbers back into the formulas for the expected values we have that your expected profit from this game is

equation

That is, you win at least 30 cents on average if you play according to your optimal mixed strategy, and Alya looses at most 30 cents on average if she plays using their optimal strategy. None of you can do better than that. It is worthwhile to observe that your payoff is positive in this game because you are able to play first. The fact that the first player has an advantage when bluffing is the reason why the role of dealer in Poker rotates around all players. In addition, note that if the players were to choose their strategies simultaneously then both players would have zero expected value (the game would be very similar to matching pennies, see Exercise 1).

The following simulation can help you corroborate that players have no incentive to unilaterally deviate from the Nash equilibrium. In particular, as long as you stick to your optimal strategy (always betting if you get a 1, and betting 60% of the time if you get a 0), there is nothing that Alya can do to improve her outcome.

c11uf002
c11uf002

11.4 Exercises

  1. 1. Matching pennies is a game played between two players, Player A and Player B. Each player has a penny and must secretly turn the penny to heads or tails. The players then reveal their choices simultaneously. If the pennies match (both heads or both tails), Player A receives one dollar from Player B (c011-math-131 for A, c011-math-132 for B). If the pennies do not match (one heads and one tails), Player B receives one dollar from Player A (c011-math-133 for A, c011-math-134 for B). Is there any pure-strategy equilibrium for this game? If so, what is the expected payoff? Is there any mixed-strategy equilibrium for this game? If so, what is the expected payoff?

  2. 2. In addition to the pure-strategy equilibrium already discussed, the game whose payoff was given in Table 10.12 also admits mixed-strategy equilibria. Find these equilibrium strategies along with the payoff of the game.

  3. 3. Combat and strategy-based video games frequently feature cycles in their characters' or units' effectiveness that resemble the pattern of rock–paper–scissors. These often attempt to emulate cycles in real-world combat (such as where cavalry are effective against archers, archers have an edge over spearmen, and spearmen are strongest against cavalry). It is claimed that this kind of strategy makes the game self-balancing. Explain this claim.

  4. 4. Recall the sword duel game described in Exercise 6 from Section 10.4. Back then we found that there's no pure-strategy equilibrium. Find the mixed-strategy equilibria and their expected payoff.

  5. 5. In the website http://www.samkass.com/theories/RPSSL.html, an extended version of rock–paper–scissors is described (it was made popular in the TV show The Big Bang Theory). Setup the table of payoff for this game and find its solution.

  6. 6. What about the seven-gesture version of rock–paper–scissors in http://www.umop.com/rps7.htm?

  7. 7. Assume that we are playing the simplified “poker” where the blind is $4 (instead of $5) and the raise is another $4 (instead of $3). How should we randomize and what would be expected value of the game for the player that gets to act first? What would be the optimal strategy for the first player if they knew that the second player will always see on a 1 but will only do so 20% of the time on a 2?

  8. 8. [R] Write code to simulate the simplified game of poker described in the previous exercise. What happens if the first player to bet deviates from their optimal strategy, but the second player does not?

  9. 9. Why is it important in poker for players to alternate at playing blind bets?

  10. 10. Change the rock–paper–scissors game so that when scissor matches paper, scissors gets a gain of 2 and paper a loss of c011-math-135 and when rock matches scissors, rock gets a gain of 3 and scissors gets c011-math-136. What is the Nash equilibrium for this game? What is the expected payoff to the players?

  11. 11. [R] Write code to simulate the penalty kick game discussed at the beginning of Section 11.2.

  12. 12. Do all calculations required to construct Table 11.6 and check that your results agree with those provided in this book.