CHAPTER 9
Behavioral Game Theory
9.1 Nature of behavioral game theory
Behavioral game theory and standard game theory
Empirical studies of simple games
Bargaining with incomplete information
Empirical studies of bargaining games with complete information
Empirical studies of bargaining games with incomplete information
Iteration leading to decreased payoffs
Iteration leading to increased payoffs
Nature and functions of signaling
Empirical studies of signaling games
Experience-Weighted Attraction learning
Case 9.1 Penalty kicking in professional soccer
Case 9.2 Impasses in bargaining and self-serving bias
Case 9.3 Market entry in monopoly
The movie Dr. Strangelove (1964) features a device called the Doomsday Machine, which is designed to automatically destroy life on Earth if there is a nuclear attack against the Soviet Union. Furthermore, once triggered, the device cannot be deactivated. This seemingly foolhardy characteristic is essential for producing the desired deterrent effect on the enemy. It ensures that the decision-making process is irrevocable, eliminating the possibility of any human interference, and thus rendering the device an entirely credible commitment as a retaliatory defense weapon.
The movie is a black comedy about the nuclear arms race between the US and the Soviet Union. This arms race had many flashpoints in the 1960s, notably the Cuban missile crisis in 1962, and featured vital aspects of game theory, both between nations and within nations. US and Soviet leaders wanted to deter the other country from attacking them, and to appear strong to their citizens, without overburdening their countries with excessive spending on defense. Furthermore, behavioral factors were important: initially President Kennedy overreacted angrily to Soviet Premier Khrushchev’s abrasive tone, and it took time for him to learn that this tone was normal in Khrushchev’s speeches on US–Soviet relations, being essentially aimed at pleasing a Soviet audience.
9.1 Nature of behavioral game theory
In the last chapter we considered some aspects of game theory in general, since some of these concepts were necessary in order to understand how intertemporal preferences affect behavior. In particular, we have seen that game theory is relevant whenever there is interdependence in decision-making. In some cases the game was played between a firm and consumers, as in Case 8.1 involving price plans for gym memberships; in other cases the game was played between different ‘selves’, specifically an impatient short-run self and a patient long-run self. We have also come across some of the important concepts in game theory, such as strategies, sequence, commitment and payoffs. In this chapter we consider more general applications of game theory. In order to discuss these applications it is necessary to have a more solid foundation as far as the basic elements of game-theoretic analysis are concerned.
The essence of interdependent decision-making situations is that when A makes a decision (for example regarding price, entry into a market, or whether to take a job), it will consider how other persons or firms will react to its different strategies, usually assuming that they act rationally, and how these reactions will affect its own utility or profit. It must also take into account that the other parties (from now on called players), in selecting their reactive strategies, will consider how A will react to their reactions. This can continue in a virtually infinite progression. In such situations there is often a considerable amount of uncertainty regarding the results of any decision.
These kinds of situation occur in all areas of economics; some examples are: the setting of interest rates by the central bank in macroeconomic policy; oligopolistic pricing in microeconomics; wage negotiations and strikes in labor economics; bidding in financial economics; and trade negotiations in international economics. Game theory situations also occur in politics, sociology, warfare, ‘games’ and sports, and biology, which makes game theory a unifying theme in much analysis. Game theorists have therefore come from many different walks of life, although the main pioneers were Von Neumann and Morgenstern (1944) and Nash (1951), who were essentially mathematicians.
Elements of a game
The concept of a game, as we are now using it, therefore includes a large variety of situations that we do not normally refer to as games. A good example is the standard prisoner’s dilemma (PD) game, shown in Table 9.1. The classic PD situation involves two prisoners who are held in separate police cells, accused of committing a crime. They cannot communicate with each other, so neither knows how the other is acting. If neither confesses, the prosecutor can only get them convicted on other minor offences, each prisoner receiving a one-year sentence. If one confesses while the other does not, the one confessing will be freed while the other one receives a 10-year sentence. As one of my students has pointed out, this type of ‘confession’ really amounts to ‘snitching’, and corresponds to defecting, since it increases the player’s payoff at the expense of the other player. If both players confess they will each receive a five-year sentence.
Table 9.1 Prisoner’s dilemma

                              Suspect B
                        Confess        Deny
Suspect A   Confess     5, 5           0, 10
            Deny        10, 0          1, 1
The values in the table represent payoffs, in terms of jail sentences; the payoffs for Suspect A are the left-hand values, while the payoffs for Suspect B are the right-hand values. The objective for each suspect in this case is obviously to minimize the payoff in terms of jail time. The structure of this kind of game in terms of the relationships between the payoffs is relevant to many types of real-world situation, and we will return to it frequently in both this chapter and the next.
Thus we can say that chess, poker and rock-paper-scissors are games in the conventional sense, as are tennis and football (either American football or soccer). However, games in the technical sense used in this chapter also include activities like going for a job interview, a firm bargaining with a labor union, someone applying for life insurance, a firm deciding to enter a new market, a politician announcing a new education/transport/health policy or a country declaring war. What do these diverse activities have in common?
The following are the key elements of any game:
1 Players – these are the relevant decision-making identities, whose utilities are interdependent. They may be individuals, firms, teams, social organizations, political parties or governments.
2 Strategies – these can be defined in different ways. In some cases the term strategy refers to a complete plan of action for playing a game. In other cases a strategy simply involves the choice of a single action, like ‘confessing’ in a PD game. It is important to understand that in many games there may be many actions involved. A complete plan means that every possible contingency must be allowed for. In this chapter, in keeping with common convention, we will use the term ‘rule’ for a complete plan of action, and reserve the term ‘strategy’ for a specific action or move. Strategies often involve either ‘cooperating’ or ‘defecting’. In the game theory context, ‘defection’ refers to a strategy where one player chooses an action that increases their own payoff at the expense of the other player(s). Cooperation involves making a risky move where the opponent(s) may take advantage of you and reduce your payoff, in the hope that if they return the favor both or all of you may be made better off as a result.
3 Payoffs – these represent changes in welfare or utility at the end of the game, and are determined by the choice of strategy by each player. It is normally assumed that players are rational and have the objective of maximizing these utilities or expected utilities. Notice that the word each is important; what distinguishes game theory from decision theory is that in the latter outcomes only depend on the decisions of a single decision-maker.
The normal-form representation of a game specifies the above three elements, as shown in Table 9.1. Players are assumed to move simultaneously, so these kinds of situation can be represented by tables or matrices. The normal-form representation helps to clarify the key elements in the game.
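As a concrete illustration of the normal-form idea, the following short Python sketch (our own, not from the text; the function names are illustrative) stores the payoffs of Table 9.1 and checks each suspect’s best response:

```python
# A minimal sketch (not from the text): the prisoner's dilemma of Table 9.1
# in normal form. Payoffs are jail sentences in years, so lower is better;
# the first entry in each pair is Suspect A's sentence, the second Suspect B's.
PAYOFFS = {
    ("confess", "confess"): (5, 5),
    ("confess", "deny"):    (0, 10),
    ("deny",    "confess"): (10, 0),
    ("deny",    "deny"):    (1, 1),
}

def best_response_A(b_move):
    """A's sentence-minimizing move against a fixed move by B."""
    return min(("confess", "deny"), key=lambda a: PAYOFFS[(a, b_move)][0])

for b in ("confess", "deny"):
    print(f"If B plays {b}, A's best response is {best_response_A(b)}")
# Both lines print 'confess': confessing is a dominant strategy for A
# (and, by symmetry, for B), a point developed in Section 9.2.
```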
When the players do not move simultaneously, and the sequence of moves is important, it is necessary to use an extensive-form representation, which usually involves a game tree. The concept of a game tree is illustrated in Figure 9.1; this is an example of an ultimatum game. In this type of game there are two players: a proposer (A) and a responder (B). In its standard form, a certain sum of money, frequently $10, represents the gains from exchange, or surplus, that would be lost if no trade were made. A offers the sum x to B, keeping $10 – x for themselves. B can either accept the offer, or reject it, in which case both players receive nothing. In the game shown in Figure 9.1 it is assumed that if A proposes an even split of the 10 the game ends, and that the only possible uneven split is (8, 2).
Figure 9.1 Extensive form of ultimatum game
The extensive-form representation involves five elements:
1 A configuration of nodes and branches running without any closed loops from a single starting node to its end nodes.
2 An indication of which node belongs to each player.
3 Probabilities that ‘nature’ (an external force) uses to choose branches at random nodes.
4 Collections of nodes, which are called information sets.
5 Payoffs at each end node.
Nodes, therefore, represent decision points for a particular player, or for nature (for example, it may decide to rain or not). Information sets are collections of nodes such that one player has the move at every node in the set, and, when play reaches a particular node in the set, the player with the move does not know which node in the set has been reached. In Figure 9.1 there are two decision nodes, the first for A and the second for B; since B observes A’s move, B’s node is the only member of its information set. The equilibria for both this game and the prisoner’s dilemma are discussed in the next section.
Types of game
There are many different types of game theory situation, and different methods of analysis are appropriate in different cases. It is therefore useful to classify games according to certain important characteristics.
1 Cooperative and non-cooperative games
In cooperative games the players can communicate with each other and collude. They can also enter into third-party enforceable binding contracts. Much of this type of activity is expressly prohibited by law in developed countries. Many of the games that are of interest in economic situations are of the non-cooperative kind. This type of game involves forming self-enforcing reliance relationships, which determine an equilibrium situation. The nature of such equilibria is discussed in the next section. We shall also see that many games involve a mixture of cooperation and competition, and this is true of the basic ‘one-off’ PD game.
2 Two-player and multi-player games
PD situations are obviously two-player games. However, this kind of game is capable of being extended to consider more than two parties, as we have seen in the previous chapter, in the context of public goods games. Having more players tends to increase the likelihood of defection, particularly in the ‘one-off’ situation, referred to as a ‘one-shot’ game. One version of such a situation is sometimes referred to as ‘the tragedy of the commons’. This applies in cases where property rights are untradeable, insecure or unassigned, for example, where pollution is involved. The reasoning is that with more players it is important to defect before others do; only if defectors are easily detected and punished will this be prevented. The depletion of fish stocks in the North Sea due to over-fishing, and the resulting conflicts, are an example of the tragedy of the commons. In other situations, instead of the resource being overused, it is undersupplied, as with public goods like street-lighting and hospitals. With multi-player games there is also the opportunity for some of the players to form coalitions against others, to try and impose strategies that would otherwise be unsustainable.
3 Zero-sum and nonzero-sum games
With zero-sum games, sometimes called constant-sum games, the gains of one or more players are automatically the losses of others; thus the sum of the gains (or losses) of all the players is constant. This can apply for example in derivatives markets, where certain transactions occur between two speculators. However, most situations involve nonzero-sum games; furthermore, even when monetary gains and losses offset each other, the utilities of such gains and losses may not do so, because of loss-aversion.
4 Complete and incomplete information
In the version of the PD presented above it was assumed that all the players knew for certain what all the payoffs were for each pair of strategies. In practice, this is often not the case, and this can also affect strategy. In some cases a player may be uncertain regarding their own payoffs; in other cases they may know their own payoffs but be uncertain regarding the payoffs of the other player(s). For example, an insurance company may not know all the relevant details regarding the person applying for insurance, a situation leading to adverse selection. Likewise, bidders at an auction may not know the valuations that other parties place on the auctioned item. Games with incomplete information are unsurprisingly more difficult to analyze.
5 Static and dynamic games
Static games involve simultaneous moves; the PD game is a simultaneous game, meaning that the players make their moves simultaneously, without knowing the move of the other player. In terms of analysis the moves do not have to be simultaneous in chronological terms, as long as each player is ignorant of the moves of the other player(s). Many life situations involve dynamic games; these involve sequential moves, where one player moves first and the other player moves afterward, knowing the move of the first player. The ultimatum bargaining game is an example of a dynamic game. The order of play can make a big difference to the outcome in such situations.
6 Discrete and continuous strategies
Discrete strategies involve situations where each action can be chosen from a limited number of alternatives. In the PD game there are only two choices for each player, to confess or not confess; thus this is a discrete strategy situation. In contrast, a firm in oligopoly may have a virtually limitless number of prices that it can charge; this is an example of a continuous strategy situation. As a result the analytical approach is somewhat different, in terms of the mathematical techniques involved.
7 ‘One-shot’ and repetitive games
The distinction between these two types of situation has already been discussed in the previous chapter. Most short-run decision scenarios in business, such as pricing and advertising, are of the repetitive type, in that there is a continuous interaction between competitors, who can change their decision variables at regular intervals. Some of these games may involve a finite number of plays, where an end of the game can be foreseen, while others may seem infinite. Long-run decisions, such as investment decisions, may resemble the ‘one-shot’ situation; although the situation may be repeated in the future, the time interval between decisions may be several years, and the next decision scenario may involve quite different payoffs.
Behavioral game theory and standard game theory
Standard game theory (SGT) generally involves four main assumptions, which have left it exposed to criticism: (1) people have correct mental representations of the relevant game; (2) people have unbounded rationality; (3) equilibria are reached instantly, since there are no time lags due to learning effects or other factors; and (4) people are motivated purely by self-interest. We will see that, in spite of these assumptions, SGT does not necessarily perform badly in many situations, when its predictions are compared with empirical findings. Indeed, in many ‘one-shot’ games, both static and dynamic, and involving complete and incomplete information, its predictions can be quite accurate. This applies in particular to ‘market-like contexts involving the interaction of many mutually anonymous agents capable of forming complete, third-party enforceable contracts’ (Eckel and Gintis, 2010). However, as Goeree and Holt (2001) have noted, changes in payoffs can result in significant anomalies, discussed in later sections. In other games, like bargaining games and iterated games, its predictions may be way off track; for example, in the ultimatum bargaining game in Figure 9.1, B may be outraged by the uneven offer and reject it because it violates social norms of fairness. We will examine this kind of situation in more detail in the next chapter. However, we often find that, by relaxing the SGT assumptions and adding certain new parameters within the basic game-theoretic framework, we can improve fit and prediction significantly. This is in keeping with the general approach of behavioral economics regarding the modification and extension of the standard model.
Camerer (2009) has stated that there are four main elements of behavioral game theory, corresponding to the four assumptions of SGT mentioned above:
1 Representation
This refers to how a game is perceived or mentally represented. Often players perceive a game incorrectly or have an incomplete representation of a game, and SGT tends to ignore this aspect.
2 Initial conditions
These involve the players’ beliefs about the game situation. SGT assumes that these are correct, and that actions match beliefs. Behavioral game theory takes bounded rationality into account, either by proposing limits on strategic thinking (as in cognitive hierarchy theory), or by assuming players make stochastic mistakes because of ‘noise’, meaning unwanted information that interferes with intended signals.
3 Learning
This is relevant in repeated games, where players can learn from their own and other players’ payoffs and strategies, and can also learn about what other players are likely to do. This factor is ignored in SGT.
4 Social preferences
Players have preferences regarding not only their own payoffs, but also those of others, and the distribution of these payoffs, and these are ignored in SGT.
Thus throughout the rest of the chapter we will be considering models that tend to be more complex than standard models. However, we will be focusing on the first three elements described above, and the impact of the fourth factor, social preferences, will be examined in the next chapter. In the discussion of these models we will find two strands of analysis in Behavioral Game Theory (BGT) that are additional to standard game theory:
1 A sound basis of experimental evidence
This entails an examination and evaluation of many empirical studies, to see what anomalies arise with the relevant SGT model, and to modify the model accordingly.
2 A sound basis in the discipline of psychology
BGT models are not only constrained by empirical evidence but they are also based on theory from psychology.
9.2 Equilibrium
In order to determine strategy or an equilibrium situation, we must first assume that the players are rational utility maximizers. We can now consider four main types of equilibrium and appropriate strategies in situations involving different payoffs. These are: (1) dominant strategy equilibrium; (2) iterated dominant strategy equilibrium; (3) Nash equilibrium; and (4) subgame perfect Nash equilibrium (SPNE). It is appropriate to consider these equilibria in this order from a pedagogical point of view, since they are increasingly complex in nature. However, the concept of Nash equilibrium, in a variety of forms, is the most important since it is the most general, and it includes the first two types above as special cases. All of these equilibria are based on the assumption that players have the objective of maximizing their expected utilities, and expect that other players have the same objective.
It is also necessary to distinguish between discrete and continuous strategies, since, although the same conditions apply, the analytical approach is different. In the next section we will consider another important type of equilibrium, known as mixed-strategy equilibrium (MSE), and will discuss other types of equilibrium also. Since discrete strategies are generally easier to analyze, we will discuss these first.
Discrete strategies
As described earlier, these relate to situations where each action can be chosen from a limited number of alternatives.
1 Dominant strategy equilibrium
A strategy S1 is said to strictly dominate another strategy S2 if, given any collection of strategies that could be played by the other players, playing S1 results in a strictly higher payoff for that player than does playing S2. Thus we can say that if player A has a strictly dominant strategy, playing it will always give a higher payoff than any other strategy, whatever player B does. A rational player will always adopt a dominant strategy if one is available. Therefore, in any static game involving discrete strategies, we should always start by looking for a dominant strategy. This is easiest in a two-strategy situation; we can discuss this process in the PD situation described earlier. Table 9.2 is a repeat of this situation. One problem with explaining the prisoner’s dilemma is that confusion tends to arise over how confessing and denying correspond to cooperation and defection. The normal meaning of confessing and denying, and even the sounds of the words, prompts many students to think of confessing as corresponding to cooperation and denying to defection, but actually it is the other way around. That is why we must remember that confessing is really ‘snitching’, while denying amounts to standing firm. Whatever strategy is used by the other player, the best response is always to defect, or confess. If suspect B confesses or snitches, suspect A is better off confessing, since they will only get 5 years rather than a 10-year sentence. If suspect B does not confess, suspect A is still better off confessing, since they will go free rather than serving a year. Thus we can say that for each player there is a dominant strategy of confessing or defecting.
Table 9.2 Dominant strategy equilibrium

                              Suspect B
                        Confess        Deny
Suspect A   Confess     5, 5           0, 10
            Deny        10, 0          1, 1
When there are many possible strategies, dominant strategies have to be found by a process of eliminating dominated strategies. We could also say that not confessing is in this case a dominated strategy for both players; this means that it will never give a higher payoff than confessing, whatever the other player does.
Therefore, given the payoffs in Table 9.2, it is obvious that there is a dominant strategy equilibrium, meaning that the strategies pursued by all players are dominant. By individually pursuing their self-interest each player is imposing a cost on the other that they are not taking into account. It can therefore be said that in the PD situation the dominant strategy outcome is Pareto dominated. This means that there is some other outcome where at least one of the players is better off while no other player is worse off. However, Pareto domination considers total or social welfare; this is not relevant to the choice of strategy by each player.
2 Iterated dominant strategy equilibrium
What would happen if one player did not have a dominant strategy? This is illustrated in Table 9.3, which is similar to Table 9.2 but with one payoff changed. There is now an asymmetry in the matrix of payoffs, because a confession by A if B does not confess results in a 2-year sentence, maybe because A has had a prior conviction. Although B’s dominant strategy is unchanged, A no longer has a dominant strategy. If B confesses (or defects), A is better off also confessing, as before, but if B does not confess (or cooperates), A is better off also not confessing.
Table 9.3 Iterated dominant strategy equilibrium

                              Suspect B
                        Confess        Deny
Suspect A   Confess     5, 5           2, 10
            Deny        10, 0          1, 1
In this case A can rule out B not confessing (that is a dominated strategy for B), and conclude that B will confess; A can, therefore, iterate to a dominant strategy, which is to confess. Thus the equilibrium is the same as before. The general rule for determining the iterated dominant strategy equilibrium is to identify all the dominated strategies first and eliminate them. With games involving a greater number of possible strategies it is more difficult to determine an iterated dominant strategy equilibrium, since it can take a while to work out all the strategies that are dominated.
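This elimination procedure can be mechanized. A hedged Python sketch, applied to the asymmetric payoffs of Table 9.3 (the helper function is our own illustrative construction):

```python
# Iterated elimination of strictly dominated strategies for Table 9.3.
# Payoffs are jail years (lower is better); first entry is A's, second is B's.
payoffs = {
    ("confess", "confess"): (5, 5),
    ("confess", "deny"):    (2, 10),
    ("deny",    "confess"): (10, 0),
    ("deny",    "deny"):    (1, 1),
}

def dominated(own, other, sentence):
    """Strategies giving a strictly higher sentence than some alternative
    against every remaining strategy of the opponent."""
    return {s for s in own
            if any(all(sentence(t, o) < sentence(s, o) for o in other)
                   for t in own if t != s)}

A, B = {"confess", "deny"}, {"confess", "deny"}
while True:
    dA = dominated(A, B, lambda a, b: payoffs[(a, b)][0])
    dB = dominated(B, A, lambda b, a: payoffs[(a, b)][1])
    if not dA and not dB:
        break
    A, B = A - dA, B - dB

print(A, B)  # {'confess'} {'confess'}: the iterated dominant strategy equilibrium
```

The first pass eliminates only B’s ‘deny’ (A has no dominated strategy yet); once B is reduced to ‘confess’, A’s ‘deny’ becomes dominated in turn, exactly mirroring the reasoning in the text.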
3 Nash equilibrium
The situation becomes more complicated when neither player has a dominant strategy. This means that we are no longer considering a PD, since the structure of the payoffs has changed, as shown in Table 9.4. The table is symmetrical again, but both suspects now get a 2-year sentence if they confess when the other suspect does not confess.
Table 9.4 Game with no dominant strategy

                              Suspect B
                        Confess        Deny
Suspect A   Confess     5, 5           2, 10
            Deny        10, 2          1, 1
There is no single equilibrium here, meaning that there is no universal tendency for either player to take either action. Instead we have to use the concept of a Nash equilibrium. This represents an outcome where each player is pursuing their best strategy in response to the best-response strategy of the other player. This is a more general concept of equilibrium than the two equilibrium concepts described earlier; while it includes dominant strategy equilibrium and iterated dominant strategy equilibrium, it also relates to situations where the first two concepts do not apply. There are two such equilibria in Table 9.4:
(i) If B confesses, A is better off confessing; and given this best response, B’s best response is to confess.
(ii) If B does not confess, A is also better off not confessing; and given this best response, B’s best response is not to confess.
The same equilibria could also be expressed from the point of view of determining B’s strategy:
(i) If A confesses, B is better off confessing; and given this best response, A’s best response is to confess.
(ii) If A does not confess, B is also better off not confessing; and given this best response, A’s best response is not to confess.
Both A and B will clearly prefer the second equilibrium, but there is no further analysis that we can perform to see which of the two equilibria will prevail. This presents a problem for strategy selection if the game is repeated, as will be seen later.
The concept of Nash equilibrium is an extremely important one in game theory, since frequently situations arise where there is no dominant strategy equilibrium or iterated dominant strategy equilibrium. In the next section we will examine cases where none of the three kinds of equilibrium situation exists, and mixed strategies are involved.
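The search for pure-strategy Nash equilibria in a small table like Table 9.4 can also be done mechanically. A minimal sketch (ours, with the payoffs read off the table) keeps the cells from which neither suspect can reduce their own sentence by deviating unilaterally:

```python
# Enumerate the cells of Table 9.4 and keep those where neither suspect
# can reduce their own sentence (payoffs in years, lower is better)
# by a unilateral deviation.
payoffs = {
    ("confess", "confess"): (5, 5),
    ("confess", "deny"):    (2, 10),
    ("deny",    "confess"): (10, 2),
    ("deny",    "deny"):    (1, 1),
}
moves = ("confess", "deny")

def is_nash(a, b):
    a_ok = all(payoffs[(a, b)][0] <= payoffs[(a2, b)][0] for a2 in moves)
    b_ok = all(payoffs[(a, b)][1] <= payoffs[(a, b2)][1] for b2 in moves)
    return a_ok and b_ok

print([cell for cell in payoffs if is_nash(*cell)])
# [('confess', 'confess'), ('deny', 'deny')]: the two pure-strategy equilibria
```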
4 Subgame perfect Nash equilibrium
This kind of equilibrium is relevant in extensive-form games. We will use the ultimatum game in Figure 9.1 for illustration. This is repeated in Figure 9.2.
A subgame is the continuation game from a singleton node (a node with no other nodes in its information set) to the end nodes which follow from that node. Thus there is a subgame at the decision node for B. Subgame perfection means that players play their equilibrium strategies if the subgame is reached. SPNE is an equilibrium for the complete game where players play their equilibrium strategies in each subgame. In order to determine the SPNE for a game we have to use the method of backwards induction. This means looking forward and reasoning backward, and we will see that this often unnatural method is the key to successful strategy in many situations. In the ultimatum game in Figure 9.2 it means that, in order to determine A’s optimal, or equilibrium, strategy, we must first consider B’s situation. B must make a decision if A goes for an uneven split. According to standard game theory (ignoring social preferences), a rational B will accept the uneven split, since a payoff of 2 is better than the 0 from rejecting the offer. Working backwards, we can now say that A will therefore decide to go for an uneven split, since a payoff of 8 is better than the 5 from an even split. Thus the SPNE for the game is (uneven, accept|uneven).
Figure 9.2 Extensive form of ultimatum game (Figure 9.1 repeated)
It should be noted that SPNE is more restrictive than Nash equilibrium, since it requires that Nash equilibrium apply in all subgames, as well as in the overall game. In the ultimatum game above there are two Nash equilibria, but only the one determined above is subgame perfect. The other Nash equilibrium is (even, reject|uneven). In this second case, if A anticipates that B will reject an uneven offer, A will decide to go for an even split, so this is a best response. If A goes for an even split, B does not get a chance to respond (by assumption in this game). However, although (even, reject|uneven) is a Nash equilibrium, it is not subgame perfect, since B should not reject an uneven split according to standard game theory.
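As a rough sketch of this fold-back logic, the following Python fragment works through the two decision nodes of Figure 9.2, with the payoffs as stated in the text ((5, 5) for an even split, (8, 2) for an accepted uneven split, (0, 0) for a rejection):

```python
# Backwards induction on the ultimatum game of Figure 9.2.
# Step 1: solve B's subgame after an uneven offer; B maximizes their own payoff.
b_payoffs = {"accept": 2, "reject": 0}
b_choice = max(b_payoffs, key=b_payoffs.get)            # 'accept'

# Step 2: fold back to A's node; A compares the even-split payoff with the
# payoff implied by B's anticipated response to an uneven split.
a_payoff_uneven = 8 if b_choice == "accept" else 0
a_choice = "uneven" if a_payoff_uneven > 5 else "even"  # 'uneven'

print((a_choice, b_choice))  # ('uneven', 'accept'): the SPNE (uneven, accept|uneven)
```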
Continuous strategies
Strategies in this case relate to a continuous variable rather than a discrete one. Price and quantity of output are frequent examples in economics. We will take two examples from oligopoly theory here, both of which involve competition in the form of output. The first relates to the Cournot case, where moves are simultaneous, and the second relates to the Stackelberg case (1934), where moves are sequential. We will assume the same parameters for market demand and cost conditions in each case, which will allow us to see the advantages of being the first mover in such oligopoly situations.
1 Cournot oligopoly
This model, originally developed in 1838, initially considered a market in which there were only two firms, A and B. In more general terms we can say that the Cournot model is based on the following assumptions:
(i) There are few firms in the market and many buyers.
(ii) The firms produce homogeneous products; therefore each firm has to charge the same market price (the model can be extended to cover differentiated products).
(iii) Competition is in the form of output, meaning that each firm determines its level of output based on its estimate of the level of output of the other firm. Each firm believes that its own output strategy does not affect the strategy of its rival(s).
(iv) Barriers to entry exist.
(v) Each firm aims to maximize profit, and assumes that all the firms do the same.
Because strategies are continuous in the Cournot model, this allows a more mathematical approach to analysis.
The situation can be illustrated by using the following example, involving two firms A and B:
(i) Market demand is given by P = 400 – 2Q
(ii) Both firms have constant marginal costs of $40 and no fixed costs.
The analytical procedure can be viewed as involving the following steps.
Step 1
Transform the market demand into a demand function that relates to the outputs of each of the two firms. Thus we have:
P = 400 – 2(QA + QB)    (9.1)
Step 2
Derive the profit functions for each firm, which are functions of the outputs of both firms. Bearing in mind that there are no fixed costs and therefore marginal cost and average cost are equal, the profit function for firm A is as follows:
ΠA = (400 – 2QA – 2QB – 40)QA = 360QA – 2QA² – 2QAQB    (9.2)
Step 3
Derive the optimal output for firm A as a function of the output of firm B, by differentiating the profit function with respect to QA and setting the partial derivative equal to zero:
∂ΠA/∂QA = 360 – 4QA – 2QB = 0, giving QA = 90 – 0.5QB    (9.3)
Strictly speaking, the value of QB in this equation is not known with certainty by firm A, but is an estimate. Equation 9.3 is known as the best response function or response curve of firm A. It shows how much firm A will put on the market for any amount that it estimates firm B will put on the market.
The second and third steps above can then be repeated for firm B, to derive firm B’s response curve. Because of the symmetry involved, it can be easily seen that the profit function for firm B is given by:
ΠB = 360QB – 2QB² – 2QBQA    (9.4)
And the response curve for firm B is given by:
QB = 90 – 0.5QA    (9.5)
This shows how much firm B will put on the market for any amount that it estimates firm A will put on the market. The situation can now be represented graphically, as shown in Figure 9.3.
Step 4
Solve the equations for the best response functions simultaneously to derive the Cournot equilibrium. The properties of this equilibrium will be discussed shortly.
QA = 90 – 0.5QB
QB = 90 – 0.5QA
QA = 90 – 0.5(90 – 0.5QA)
QA = 90 – 45 + 0.25QA
0.75QA = 45
QA = 60
QB = 90 – 0.5(60) = 60
Figure 9.3 Cournot response curves
The market price can now be determined:
P = 400 – 2(60 + 60)
P = $160
The equilibrium obtained here is also referred to as a Cournot–Nash Equilibrium, since each firm is making a best response to the other firm’s strategy, and there is no tendency for either firm to deviate from this strategy.
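These four steps can be checked mechanically. The following sketch uses the sympy library (an assumption on our part; the text itself does no computation of this kind) to reproduce the derivation:

```python
# A sketch verifying the Cournot calculation, assuming sympy is available.
import sympy as sp

QA, QB = sp.symbols("QA QB", nonnegative=True)
P = 400 - 2 * (QA + QB)                 # market demand, equation (9.1)
profit_A = (P - 40) * QA                # marginal cost = average cost = $40
profit_B = (P - 40) * QB

# Best-response functions (9.3) and (9.5): set each partial derivative to zero.
br_A = sp.solve(sp.diff(profit_A, QA), QA)[0]   # 90 - QB/2
br_B = sp.solve(sp.diff(profit_B, QB), QB)[0]   # 90 - QA/2

# Step 4: solve the two response curves simultaneously.
eq = sp.solve([sp.Eq(QA, br_A), sp.Eq(QB, br_B)], [QA, QB])
print(eq)              # {QA: 60, QB: 60}
print(P.subs(eq))      # 160: the market price of $160
```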
2 Stackelberg oligopoly
Although this model was originally developed in non-game-theory terms, we will apply a game-theoretic analysis to the situation. The basic assumptions underlying the Stackelberg model are as follows:
(i) There are few firms and many buyers.
(ii) The firms produce either homogeneous or differentiated products.
(iii) A single firm, the leader, chooses an output before all other firms choose their outputs.
(iv) All other firms, as followers, take the output of the leader as given.
(v) Barriers to entry exist.
(vi) All firms aim to maximize profit, and assume that the other firms do the same.
We shall now refer to the same situation, in terms of demand and cost functions, as that assumed earlier for the Cournot duopoly and examine how equilibrium is determined. We shall then draw certain conclusions regarding the differences in outcomes.
Market demand was given by: P = 400 – 2Q
Both firms have a cost function given by: Ci = 40Qi
Thus we can write the market demand as:
P = 400 – 2(QL + QF)    (9.6)
where QL is the output of the leader and QF is the output of the follower.
Because the Stackelberg situation is a dynamic or sequential game, we need to analyze it using the same backwards induction, or fold-back, method described for the ultimatum bargaining game, even though that game involved discrete strategies. Therefore we must first consider the situation of the follower, who essentially acts in the same way as a Cournot duopolist. Thus the follower’s profit function is given by:
ΠF = (400 – 2QL – 2QF – 40)QF = 360QF – 2QF² – 2QFQL    (9.7)
The next step is to obtain the response function for the follower, by deriving the optimal output for the follower as a function of the output of the leader; thus we differentiate the profit function with respect to QF and set the partial derivative equal to zero:
∂ΠF/∂QF = 360 – 4QF – 2QL = 0, giving QF = 90 – 0.5QL    (9.8)
It should be noted that this is the same as the Cournot result as far as the follower is concerned. However, the leader can now use this information regarding the follower’s response function when choosing the output that maximizes its own profit. Thus it will have the demand function given by:
P = 400 – 2(QL + 90 – 0.5QL)    (9.9)
P = 220 – QL    (9.10)
The leader’s profit function is given by:
ΠL = (220 – QL – 40)QL = 180QL – QL²    (9.11)
∂ΠL/∂QL = 180 – 2QL = 0, giving QL = 90    (9.12)
We can now obtain the output of the follower by using the response function in (9.8), giving us QF = 45.
These outputs allow us to obtain the market price:
P = 400 – 2(90 + 45) = $130    (9.13)
We can now obtain the profits for each firm:
ΠL = (130 – 40)90 = $8100
and ΠF = (130 – 40)45 = $4050
Total profit for the industry is $12,150.
These results can be compared with the Cournot situation (CS), yielding the following conclusions:
(i) Price is not as high as in the CS ($130 compared with $160).
(ii) Output of the leader is higher and output of the follower lower than in the CS.
(iii) Profit of the leader is higher and profit of the follower lower than in the CS.
(iv) Total profit in the industry is lower than in the CS ($12,150 compared with $14,400).
Thus we can see that in the Stackelberg situation there is an advantage to being the first mover. However, we should not think that there is always an advantage to being first mover, as we shall see later.
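The Stackelberg fold-back can be verified in the same way as the Cournot calculation; this sketch (same sympy assumption as above) substitutes the follower’s response curve into the leader’s profit function:

```python
# A sketch of the Stackelberg fold-back under the same demand and cost assumptions.
import sympy as sp

QL, QF = sp.symbols("QL QF", nonnegative=True)

# Step 1: the follower behaves like a Cournot duopolist; derive its
# response function (9.8) by maximizing its profit given QL.
profit_F = (400 - 2 * (QL + QF) - 40) * QF
br_F = sp.solve(sp.diff(profit_F, QF), QF)[0]     # 90 - QL/2

# Step 2: the leader substitutes this response into its own profit (9.11)
# and maximizes, knowing how the follower will react.
profit_L = (400 - 2 * (QL + br_F) - 40) * QL
ql = sp.solve(sp.diff(profit_L, QL), QL)[0]       # 90
qf = br_F.subs(QL, ql)                            # 45
price = 400 - 2 * (ql + qf)                       # 130

print(ql, qf, price, (price - 40) * ql, (price - 40) * qf)
# 90 45 130 8100 4050, matching the results in the text
```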
Empirical studies of simple games
Even in simple static games with complete information there are often anomalies in terms of discrepancies between predictions based on standard game theory and actual observed behavior. One example, discussed by Goeree and Holt (2001), is the ‘traveler’s dilemma’ (TD) game. In this game two players independently and simultaneously choose integer numbers between (and including) 180 and 300. Both players are paid the lower of the two numbers, and, in addition, a positive amount R is transferred from the player with the higher number to the player with the lower number. For example, if one player chooses 200 and the other chooses 240, they receive payoffs of 200 + R and 200 – R respectively. Since R is positive, the best response is to undercut the other player by 1 (if their decision were known), and both players will iterate to a dominant strategy equilibrium of choosing the lower bound of the range, in this case 180. It should be noted that the size of R does not affect the equilibrium in this game. Goeree and Holt (2001) found that SGT predicted behavior well when the cost of having the higher number was large (R = 180), with 80% of all subjects in their experiment choosing the SGT equilibrium strategy. However, when the cost of having the higher number was low (R = 5), the SGT prediction was way off target, with about 80% of the subjects this time choosing the highest number in the range, 300. This result will be commented on in the conclusion of the section.
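To see the undercutting logic concretely, here is a minimal Python sketch of the traveler’s dilemma payoff, using R = 5 (one of the treatments reported):

```python
# Traveler's dilemma: both players are paid the lower claim, and R is
# transferred from the higher claimant to the lower one. Here R = 5.
R = 5

def payoff(mine, theirs):
    low = min(mine, theirs)
    if mine < theirs:
        return low + R
    if mine > theirs:
        return low - R
    return low

def best_response(theirs):
    return max(range(180, 301), key=lambda mine: payoff(mine, theirs))

print(best_response(300))  # 299: undercut by 1
print(best_response(250))  # 249: undercutting pays against any claim above 180...
print(best_response(180))  # 180: ...so iterated best responses converge on 180
```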
Other anomalies in simple games have been noted, including the one-shot prisoners’ dilemma. There is a tendency to cooperate here, in both experiments and in real life, which is not predicted by SGT, where we have seen that the dominant strategy equilibrium is for both players to defect. Since there are multiple theories relating to the causes of this anomaly, a discussion of it is best left until after the examination of repeated games, as many PD games in real life are of this variety.
So far we have discussed anomalies related to Nash equilibrium. Another weakness related to this concept is that in many game situations there are multiple equilibria, and SGT is silent regarding which of these is more likely. One example of this kind of game is a ‘minimum-effort coordination game’, again investigated by Goeree and Holt (2001). In this game two players simultaneously choose costly ‘effort’ levels in the range from 110 to 170; the payoff for each player is the minimum of the two efforts, minus the product of the player’s own effort and a constant cost factor, c, where c < 1. In this game any common effort in the range is a Nash equilibrium, because a unilateral 1-unit increase in effort above a common starting point will not change the minimum but will reduce one’s payoff by the cost of the effort, c. Similarly, a unilateral 1-unit decrease in effort will reduce the payoff by 1 – c, since it reduces the minimum by more than it saves in effort cost. SGT cannot therefore produce predictions regarding what level of effort players are likely to make in a one-shot game of this type. It does, however, predict that the cost of effort should not affect the equilibria in general terms, so long as c < 1. In fact, Goeree and Holt found that when the cost of effort was low (c = 0.1), behavior was concentrated at the highest effort level of 170, while when the cost of effort was high (c = 0.9) efforts were concentrated at the lowest possible level.
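A short sketch can confirm the multiplicity claim: with the payoff function just described, no unilateral deviation from a common effort level ever pays, for either of the reported values of c:

```python
# Every common effort level is a Nash equilibrium of the minimum-effort
# coordination game: payoff = min(own, other) - c * own, with c < 1.
def payoff(own, other, c):
    return min(own, other) - c * own

for c in (0.1, 0.9):                      # the two treatments reported
    for e in range(110, 171):             # every common effort level
        best = payoff(e, e, c)
        assert all(payoff(d, e, c) <= best for d in range(110, 171))
print("no unilateral deviation improves on any common effort, for c = 0.1 or 0.9")
```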
There are other examples of coordination games with multiple equilibria where there are ‘focal points’, but these involve a more complex analytical approach and are discussed in the section on iterated games.
Behavioral conclusions
Although there are both anomalies and areas where SGT is silent, for the most part the observed behavior is ‘generally consistent with simple intuition based on the interaction of payoff asymmetries and noisy introspection about others’ decisions’ (Goeree and Holt, 2001). Most people would not be surprised that a greater cost of error or effort reduces the numbers or effort levels that players choose in the traveler’s dilemma or coordination games. We will also see that this conclusion of Goeree and Holt tends to apply to empirical anomalies related to other types of game discussed in later sections.
9.3 Mixed strategies
Pure and mixed strategies
All the strategies so far discussed have involved what are called ‘pure’ strategies. A pure strategy always responds in the same way to a given situation, or, in more technical terms, it involves the selection of exactly one action at each decision node. However, there are many games where there is no Nash equilibrium in pure strategies. This applies to ‘trivial’ games like matching pennies and rock-paper-scissors, and to real-life games like poker, tennis and football (both American and soccer). For example, in rock-paper-scissors, if A plays rock, B’s best response is to play paper (paper wraps rock); but then A’s best response to paper is scissors (scissors cuts paper), not rock. The same cycling occurs whatever action either player starts with, so there is no Nash equilibrium in pure strategies. On the other hand, we will see shortly that there is an equilibrium in terms of mixed strategies.
We can introduce the idea of a mixed strategy by considering the generic game referred to as ‘Battle of the Sexes’ (BOS). The nature of this game is that a pair of players, one of each sex, want to spend an evening out together, but they have different interests. A wants to watch a boxing match, but B wants to go to the ballet. In this situation a simplified payoff table can be illustrated by Table 9.5. As before, row payoffs are given first and column payoffs second.
If both players go to the ballet, B has a fine time, but A does not enjoy himself, except for having the company of his partner. The situation is reversed if they both watch boxing. On the other hand, if each does their own thing, it is assumed that they are miserable without each other’s company. In this situation the reader should be able to verify that there are two Nash equilibria in terms of pure strategies: either they both go to the ballet, or they both watch boxing.
Table 9.5 Battle of the sexes
There is also another equilibrium in terms of mixed strategies. This should be easy to see in common sense terms, at least if the situation is a repeated game: half the time they go to the ballet, and half the time they watch boxing. This type of equilibrium is referred to as a mixed strategy equilibrium (MSE), since there is no tendency or incentive for the players to depart from it. In this case there is no mathematical computation necessary to determine the MSE, since the payoff table is symmetrical. If the payoffs are asymmetrical the MSE is more complex to determine, as we will see in the next section.
Unpredictability
In spite of the title ‘Battle of the sexes’, this game is essentially a game involving cooperation, as is the prisoner’s dilemma. In both cases the players are trying to coordinate their actions. However, in competitive games the key to success is often unpredictability. If an opponent can detect a pattern in your behavior, then they will beat you. This applies to the games mentioned earlier where there is no equilibrium in pure strategies. For example, if your opponent knows that you are going to play rock each time, they will always play paper and beat you. Equally, if they detect that you play an alternating pattern of rock, then scissors, then paper, they will also be able to beat you by selecting the appropriate responses of paper, rock and scissors. Any detectable pattern can thus be beaten.
Let us consider the well-known situation in tennis where one player is serving and the other is receiving. This is a good example to use for several reasons: (1) both players have two main possible actions: server can serve to forehand or backhand, and receiver can move to forehand or backhand; (2) these actions are repeated many times between the same two players in a match, enabling any pattern to be detected; and (3) an extensive field study (Walker and Wooders, 2001) has been conducted to compare theoretical predictions with empirical observations. We can consider this a simultaneous game, since, at least at the top level of the game, the receiver must anticipate the direction of the serve and decide what direction to move in before the server hits the ball if they are to have a reasonable chance of making a return.
A simplified form of this situation is illustrated in Table 9.6, where the server’s payoff is 0 if the serve is returned and 1 if it is not. This is a zero-sum game, so the receiver’s payoff is 1 if a return is made and 0 if not. It is assumed at this point that if the receiver anticipates wrongly they will fail to make a return, but if they anticipate correctly they will return successfully.
Table 9.6 Game with no Nash equilibrium in pure strategies

                                    Receiver
                           Move to forehand   Move to backhand
Server   Serve to forehand   0, 1                 1, 0
         Serve to backhand   1, 0                 0, 1
In terms of Nash equilibrium, if the server aims to the forehand, the receiver’s best response is to move to the forehand; obviously the server’s best response to this best response is to serve to the backhand. The situation is reversed if the server aims to the backhand, so there is no Nash equilibrium in pure strategies. This means that there are two significant differences between this situation and the BOS game discussed earlier. We have already noted that in this case the players are in competition with each other, rather than trying to cooperate with each other. The game in this case is also a zero-sum game: if one player gains, the other automatically loses the same amount (ignoring loss-aversion). This situation arises in more general terms whenever one player wants a coincidence of actions, while the other player does not. This happens in many real-life situations, not just in recognized games: employers want to monitor employees who shirk, while shirkers want to avoid being monitored; the tax authorities want to audit those who evade taxes, while evaders want to avoid being audited; attacking armies want to gain an element of surprise, while defenders want to avoid being surprised. The question, therefore, arises: how does each player determine an optimal strategy, maximizing payoffs, in this kind of situation?
Randomization
As stated earlier, the key to success is unpredictability. This is achieved by a process of randomization. In the example in Table 9.6 the optimal strategy for each player, i.e. the MSE, is to randomize their actions so that half of the time they serve or move in one direction and half of the time they go the other way. Randomization in this case means that the players must each act in such a way that it is as if they are tossing a coin to determine their action at each play. Only by randomizing their actions can they avoid their opponent detecting a pattern in their play, allowing the opponent to anticipate their actions and beat them.
However, it is important to realize that the coin-tossing analogy is only appropriate when payoff matrices are symmetrical, as in the simple tennis example and the BOS game (in the latter randomization is not even necessary, since the players are trying to cooperate rather than compete, and a simple alternating scheme would suffice). When payoffs are asymmetrical the MSE involves a more complex type of randomization, and it has to be calculated. Randomization may seem like ‘madness’ as the basis of a strategy, but there must be method in it if it is to be a sensible, or optimizing, strategy: there must be a pattern in one’s lack of pattern. This seeming paradox can be illustrated by a more realistic development of the tennis example in Table 9.6. We will now consider the situation where payoffs are no longer simply ‘succeed/fail’, or (1, 0), but allow for degrees of success. In other words we are now going to consider a game involving continuous rather than discrete payoffs. Table 9.7 indicates the probabilities of the server beating the receiver and the complementary probabilities of the receiver making a successful return. This table is adapted from the excellent and highly readable book by Dixit and Nalebuff (1991), Thinking Strategically (p. 173).
Table 9.7 Mixed strategy equilibrium

                                    Receiver
                           Move to forehand   Move to backhand
Server   Serve to forehand   10%, 90%             70%, 30%
         Serve to backhand   80%, 20%             40%, 60%
This situation is still a zero-sum game (the probabilities or payoffs in each cell always add up to 100%), but it is not symmetrical since the receiver’s forehand is stronger than their backhand. This can be seen from the fact that if the receiver correctly anticipates a serve to their forehand they will make a successful return 90% of the time, while if they correctly anticipate a serve to their backhand their success rate is only 60%. In order to understand how the MSE is derived from the optimal strategies for each player, let us consider first of all the pattern of 50/50 randomization so far discussed, which we will see is suboptimal for both players. The server wants to maximize the percentage of winning serves (minimize the percentage of successful returns) and the receiver wants to do the opposite.
If the server serves to the forehand half the time and to the backhand half the time, the server’s success rate when the receiver moves to their forehand will be 0.5(10%) + 0.5(80%) = 45%, while their success rate when the receiver moves to their backhand will be 0.5(70%) + 0.5(40%) = 55%. Thus the average success rate for the server is 50% (and it is also 50% for the receiver). However, this figure assumes that the receiver is moving to forehand and backhand on a 50/50 basis. We can now see that this is not optimal for the receiver, since by moving to their forehand all the time they can improve their success rate from 50% to 55%, and thus reduce the server’s success rate to 45%. How then can we derive an optimization strategy for each player?
The key intuition here is to see that when a player is optimizing their strategy there is no incentive for the opponent to change their strategy. As long as the opponent can gain by changing their strategy, one is not optimizing one’s own strategy, as seen in the example above when both players start with a 50/50 randomization pattern. Thus player A (the server) maximizes their payoff when B (the receiver) is indifferent between their actions (moving to forehand or backhand). The solution can be obtained by using some simple algebra. Let A serve to the forehand in the proportion p, and to the backhand in the proportion (1–p). Similarly, let B move to the forehand in the proportion q, and to the backhand in the proportion (1–q). In order to compute the optimal strategy for A, therefore, we must equate B’s payoffs from moving in either direction:
Average payoff from moving to forehand = p(90) + (1–p)(20) = 70p + 20
Average payoff from moving to backhand = p(30) + (1–p)(60) = –30p + 60
70p + 20 = –30p + 60
100p = 40
p = 0.4 or 40%
Thus the server’s optimal strategy is to serve to forehand 40% of the time and to backhand 60% of the time. Only with these proportions is the receiver unable to exploit the situation to their own advantage, and to the server’s disadvantage.
The optimal strategy for the receiver can be calculated in a similar way. In this case the server’s payoffs from serving in either direction must be made equal:
Average payoff from serving to forehand = q(10) + (1–q)(70) = –60q + 70
Average payoff from serving to backhand = q(80) + (1–q)(40) = 40q + 40
–60q + 70 = 40q + 40
–100q = –30
q = 0.3 or 30%
Thus the receiver’s optimal strategy is to move to forehand 30% of the time and to backhand 70% of the time. Only with these proportions is the server unable to exploit the situation to their own advantage, and to the receiver’s disadvantage.
We can also compute the overall success rate (s) for the server if they use their optimal strategy and the receiver reacts accordingly:
s = 0.4{0.1(0.3) + 0.7(0.7)} + 0.6{0.8(0.3) + 0.4(0.7)} = 0.52 or 52%
The corresponding success rate for the receiver is therefore 48%.
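The same indifference calculations can be written out programmatically; this sketch (our own restatement of the algebra above) recovers p = 0.4, q = 0.3 and the 52% success rate from the four success percentages in Table 9.7:

```python
# Indifference calculations for the asymmetric tennis game of Table 9.7.
# S[(serve, move)] = server's success %; the receiver's % is the complement.
S = {("F", "F"): 10, ("F", "B"): 70, ("B", "F"): 80, ("B", "B"): 40}
Rv = {k: 100 - v for k, v in S.items()}  # receiver's success percentages

# Server's p (probability of serving to the forehand) equalizes the receiver's
# payoffs from moving either way:
# p*Rv[F,F] + (1-p)*Rv[B,F] = p*Rv[F,B] + (1-p)*Rv[B,B].
p = (Rv["B", "B"] - Rv["B", "F"]) / (Rv["F", "F"] - Rv["B", "F"] - Rv["F", "B"] + Rv["B", "B"])

# Receiver's q (probability of moving to the forehand) equalizes the
# server's payoffs from serving either way (the mirror calculation).
q = (S["B", "B"] - S["F", "B"]) / (S["F", "F"] - S["F", "B"] - S["B", "F"] + S["B", "B"])

# Server's overall success rate at the mixed strategy equilibrium.
s = p * (q * S["F", "F"] + (1 - q) * S["F", "B"]) + \
    (1 - p) * (q * S["B", "F"] + (1 - q) * S["B", "B"])

print(p, q, s)  # 0.4 0.3 52.0, matching the text
```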
A number of things can be observed regarding the MSE of optimal strategies here. The first general point is that in a zero-sum game the MSE is identical to both a maximin and a minimax strategy for each player. A maximin strategy means that the server is trying to maximize their own minimum payoff, the payoff which will result if the opponent is optimizing their own strategy. We have seen for example that if the server randomizes on a 50/50 basis the receiver can exploit this to reduce the server’s overall success rate to 45%. Therefore the 50/50 pattern is not a maximin strategy; the minimum payoff is maximized at 52%. A similar line of reasoning applies to the receiver. Likewise, a minimax strategy means that each player is trying to minimize the maximum payoff of their opponent; this follows from the zero-sum nature of the game.
Another observation is that, like many predictions of game theory, the solution is not an intuitive one. While it is not as counterintuitive as some predictions we will come across, it may seem strange that the receiver should move to their stronger forehand so little, only 30% of the time. This is because the server is serving more to the more vulnerable backhand, and it therefore pays the receiver to move to that side more often.
Empirical studies of MSE
It is all very well to say that a server should serve to the forehand 40% of the time on a random basis, but how can the server actually achieve this? A number of empirical studies have been performed going back to the 1950s examining this randomization process, and how successful it is in achieving MSE. Almost all of these studies have involved experiments, and these have become more sophisticated and more revealing over time, as various design flaws have been eliminated.
Many psychologists and neuroscientists believe that the brain incorporates some kind of randomizing mechanism (for a survey see Glimcher, 2003), but the precise operation of this has not yet been studied in detail at the physiological level. What has emerged from empirical studies, however, is that this mechanism is far from perfect. Although results from different studies vary in their conclusions, the general pattern is that departures from MSE, while often small, are usually statistically significant. These departures can be observed both in games requiring randomization and in direct randomization tasks where subjects are asked to generate a sequence of random responses. There are three main aberrations from a correct pattern of randomization:
1 People produce too many runs of numbers in a sequence
A run is an unbroken succession of identical responses. In order to explain this factor it is helpful to give an example. Take the sequence THHTHTTH. This sequence has eight responses and six runs. The maximum number of runs, eight, would be obtained if the responses alternated every time.
2 People alternate their responses too much
This observation is similar to the first, and probably has the same psychological foundation, as we will see later. This phenomenon is also commonly observed in real life; for example, people avoid betting on lottery numbers that have recently won, until it is ‘their turn’ to win again.
3 People generate samples that are too balanced
There is a tendency for people to assume that large sample properties are also observed in small samples. We can use the previous example of the heads-and-tails sequence to illustrate this phenomenon. Obviously, in a large sample one would expect the total numbers of heads and tails to be approximately equal; however, in a small sample there is statistically a relatively high probability of the sample being biased. The probability of obtaining an exactly 50/50 distribution of heads and tails in a sequence of eight coin tosses is only 0.27.
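Both the run count and the balanced-sample probability are easy to check numerically; a small sketch:

```python
# Two quick checks of the points above: counting runs in a response sequence,
# and the probability that eight fair coin flips come out exactly balanced.
from math import comb

def count_runs(seq):
    """Number of maximal blocks of identical consecutive symbols."""
    return sum(1 for i in range(len(seq)) if i == 0 or seq[i] != seq[i - 1])

print(count_runs("THHTHTTH"))   # 6: the eight responses form six runs
print(comb(8, 4) / 2 ** 8)      # 0.2734...: an exactly 50/50 sample is unlikely
```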
In spite of these general findings there are some interesting results regarding learning. As mentioned earlier, there have been some field studies examining the performance of professional players in certain games in terms of their abilities to randomize successfully. We have seen that successful randomization can be judged by observing whether expected payoffs are equal with each action. For example, if a tennis player is randomizing properly, their success rate will be the same serving to the forehand as serving to the backhand. Several studies have now been performed in this area, one in tennis (Walker and Wooders, 2001) and two in European football (Chiappori, Levitt and Groseclose, 2002; Palacios-Huerta, 2001). Walker and Wooders studied ten big tennis matches in the period 1974–1997, concentrating on long matches in order to provide a larger sample of points. They observed in particular the proportions of winning points when servers served to the right or left. The football studies examined both the direction of penalty kicks and the direction of goalkeeper moves. The main finding in all studies is that win rates from different actions are approximately the same, supporting the hypothesis that professional players at least can successfully randomize to achieve MSE. Walker and Wooders note that the pro tennis players still have a tendency to over-alternate, although much less than the results observed in experimental studies.
As a final comment regarding these empirical studies it is relevant to consider the conclusion of Walker and Wooders:
The theory (of MSE) applies well (but not perfectly) at the ‘expert’ end of the spectrum, in spite of its failure at the ‘novice’ end. There is a very large gulf between the two extremes, and little, if anything, is presently known about how to place a given strategic situation along this spectrum (p. 1535).
There is another empirical anomaly that has been observed in games involving MSE, and this relates to situations where payoffs are asymmetrical, causing players to cease randomizing on a 50/50 basis. Goeree and Holt (2001) investigated this anomaly using a ‘matching pennies’ game, which essentially has the same payoff structure as the tennis game: one player is trying to match strategies (like the returner) and the other is trying to mismatch strategies (like the server). Goeree and Holt found that the standard MSE prediction of a 50/50 division of choices was highly accurate when the payoffs were symmetrical, but was far off the mark when payoffs were asymmetrical. It should be noted that changing one player’s payoffs should not affect their own strategy according to SGT, since their strategy should be chosen so as to make the other player indifferent between the two alternatives. However, when one player’s payoff for the matching strategies ‘left–left’ was quadrupled (with other payoffs unchanged), the proportion of these players choosing the ‘left’ strategy rose from around the predicted 50% to 96%. Furthermore, their opponents seemed to anticipate this, with 84% of them iterating to the appropriate mismatching response of ‘left–right’. A mirror-image pattern was observed when the ‘matching’ players had their payoffs for ‘left–left’ reduced to about half of the original level: they chose the ‘right’ strategy, and again this move was largely anticipated by their opponents.
Behavioral conclusions
At this point the main question we have to ask concerns the causes of the aberrations from MSE observed in empirical studies. What is the psychological foundation for such aberrations? Rapoport and Budescu (1997) propose that there is a combination of two factors at work: limited working memory and the representativeness heuristic. Gintis (2009) proposes a different approach to the issue, focusing on social norms and correlated equilibrium. We will examine each explanation in turn, since they focus on quite different aspects.
1 Limited working memory and the representativeness heuristic
This essentially relates to the concept of bounded rationality. In their model subjects remember only the previous m elements in their sequence and use the feature-matching heuristic, which is an aspect of the representativeness heuristic, a phenomenon discussed in more detail in Chapter 4. This means that they choose the m + 1st element to balance the number of heads and tails choices in the last m + 1 flips, ignoring small sample variation. If the memory length is not very long, subjects will tend to over-alternate when asked to generate a random sequence. In binary experiments with coin tosses this model suggests that memory length is about seven characters. As an illustration of this model, in the sequence of heads and tails given earlier, the first seven results involve 4T and 3H; therefore, feature-matching requires that the eighth result be H.
An interesting observation concerning the over-alternation tendency is that it is not present in young children. Contrary to many other psychological errors, where people improve as their minds develop, this is one case where the opposite occurs. It seems that only prolonged experience, of the kind that exposes seasoned professionals to harsh market forces, can overcome this tendency, at least to some extent.
One interpretation of MSE that is commonly favored in explaining the observed aberrations is that players do not need to randomize perfectly, as long as other players cannot guess what they will do. This implies that bounded rationality is symmetrical: the same cognitive limits that prevent players from randomizing perfectly also prevent their opponents from detecting and exploiting the resulting patterns. In this case MSE can be described as being an ‘equilibrium in beliefs’. This means that players’ beliefs about the probable frequency with which their opponent will choose different strategies are correct on average, and make them indifferent about which strategy to play. For example, in our tennis scenario described earlier, if a receiver estimates that there is a 40% chance the server will serve to his forehand, he will be indifferent about which way to move. Empirical studies where players have been given the opportunity to randomize explicitly, but have declined to do so, have indicated that a population of such players can still achieve aggregate results close to those predicted by MSE (Bloomfield, 1994; Ochs, 1995; Shachat, 2002). These findings lend some support to the ‘equilibrium in beliefs’ hypothesis.
A final question arises at this point. Is there any rival theory that can produce better predictions than MSE? Some results indicate that a model involving quantal response equilibrium (QRE) may achieve this. According to QRE players do not choose the best response with certainty (as is the case with the other equilibria so far discussed), but ‘better respond’ instead. This means that they choose better responses (with higher payoffs) with higher probabilities. There is some psychological foundation underlying such a model, given the existence of bounded rationality, ‘noise’, uncertainty, and problems of encoding and decoding information. The jury is still out on the virtues of QRE versus MSE.
2 Social norms and correlated equilibrium
Gintis (2009) claims that the correlated equilibrium is a much more natural equilibrium than the Nash equilibrium, and can increase welfare compared with MSE. The notion of a correlated equilibrium originated with Aumann (1987). Essentially, the correlated equilibrium relies on a ‘choreographer’ to determine a rule of play, acting as Nature and making the first move in the game. The players then play a Nash equilibrium, each assuming that the other player(s) obey the rule. A couple of examples will illustrate. The simplest one involves determining which side of the road to drive on. It is easy to see that there are two Nash equilibria, with both drivers either driving on the left or both driving on the right. The weakness of the Nash equilibrium concept is that it provides no indication of what each driver should do; obviously they could signal, a factor considered in a later section, but it would be much easier if there was a common norm that both drivers universally obeyed. This is of course the situation that exists in almost all countries. A slightly more complex example relates to the Battle of the Sexes game described earlier. There were two Nash equilibria there, and a further equilibrium in mixed strategies. To determine the equilibrium in mixed strategies we let PA = the probability of A going to the ballet and PB = the probability of B going to the ballet, and then set the expected payoffs of the other player’s two strategies equal for each player, just as in the tennis example. We find that in equilibrium PA = 1/3 and PB = 2/3. If we apply these probabilities to the payoffs we can calculate the expected payoffs as follows:
Expected payoff to A = 2/9(1) + 5/9(0) + 2/9(2) = 2/3
Expected payoff to B = 2/9(2) + 5/9(0) + 2/9(1) = 2/3
However, it can easily be seen that both players can improve on this by following a pure strategy determined by a norm. For example, the norm may be that both players go to the boxing from Monday to Friday, and then to the ballet at the weekend. In this case A’s average payoff will be 5/7(2) + 2/7(1) = 12/7 and B’s average payoff will be 5/7(1) + 2/7(2) = 9/7. Any ‘matching’ norm that both players follow is better than MSE, because the payoffs in MSE are low: the players ‘mismatch’, and earn nothing, most of the time.
We will see that correlated equilibrium is also relevant in repeated games where iteration and backward induction are important. The nature and importance of social norms will be discussed in more detail in the next chapter.
It can be seen once again that observed empirical anomalies often correspond to intuitions; in this case, with more complex game situations, the existence of bounded rationality and the use of heuristics are important factors.
9.4 Bargaining
Bargaining refers to the process by which parties agree to the terms of a transaction. It has been a focus of attention for economists certainly since the time of Edgeworth (1881), with his well-known ‘Edgeworth box’, which showed the range of outcomes that represented optimality for both parties. In the 1950s, economists, notably Nash (who was really a mathematician), began to use game theory in their approach to the problem of determining optimal outcomes. Nash in many ways foreshadowed the work of more recent researchers, using a two-level approach. At one level he investigated the ways in which parties determined how to come to an agreement (unstructured bargaining), and at another level he examined the nature of the solution that the parties would arrive at, given a certain set of rules for the bargaining procedure (structured bargaining).
From the 1960s economists began to apply the methods of experimental economics to these twin problems, comparing empirical results with theoretical predictions. This then allowed theories to be modified in line with such results, suggesting certain psychological processes and phenomena that have become incorporated into the body of behavioral game theory.
It should also be stated at this point that bargaining games are considered in more detail in the next chapter, since they involve the concepts of social norms and fairness.
Unstructured bargaining
This kind of bargaining allows the players to use any kind of communication, not restricting the type of message or the order of offers made. Nash (1950) had proposed a unique Pareto-optimal solution that maximized the product of the utility gains for each player above the so-called ‘disagreement point’. However, many early experimental studies in the 1970s produced results that did not agree with the Nash solution. The reason for this finding was that these studies did not consider how monetary payoffs mapped into utilities as far as attitudes to risk were concerned (they usually assumed risk-neutrality).
Roth and Malouf (1979) used a ‘binary lottery’ technique to induce risk-neutrality. This method requires some explanation. Players are asked to bargain over the distribution of a number of lottery tickets. For example, if they bargain for 60 tickets out of 100, they have a 0.6 probability of winning a fixed cash prize. This technique assumes that players are indifferent between compound lotteries and their single-stage equivalents: for example, that they are indifferent between a 0.5 chance of receiving 60 tickets and receiving 30 tickets with certainty. However, the experiments of Kahneman and Tversky have shown that this assumption is highly dubious, as we have seen in the discussion of prospect theory in Chapter 5. Therefore the use of lottery tickets as payoffs may not in itself yield different results from using monetary payoffs.
The study by Roth and Malouf indicated that, when tickets gave the same monetary prize ($1) to each player, the players bargained nearly universally to a 50/50 split with a negligible amount of disagreement. However, when the prize of a ticket to the second player was three times the value of the prize to the first player ($3.75 to $1.25), there tended to be two focal points for the bargaining solution. The main focal point was the split of 75/25 in favor of the first player, which equalized the expected payoffs (the first player had three times as many tickets, but for a prize of a third of the amount of the second player). However, there was a second focal point, again involving a 50/50 split. One result of having two focal points rather than one was that the average rate of disagreement was higher, at 14% of the transactions. Roth and Murnighan (1982) replicated this result.
Another focal point effect was found in a study by Roth and Schoumaker (1983), relating to the past history of the players. The experiment began with some players playing against a computer that was programmed to give a generous share to these players, without the players’ knowledge. When these players began to play against other human players, with all players’ histories being known, there was a ‘reputation effect’, such that players who had been successful in the past were able to negotiate more favorable outcomes later.
Other studies have shown that focal points can be determined purely by chance, meaning by factors that are totally irrelevant to the bargaining transaction. Mehta, Starmer and Sugden (1992) found that allocating playing cards on a chance basis to players affected their demands in bargaining situations. When both players had equal numbers of aces they easily bargained to a single focal point of a 50/50 split. However, unequal allocations of ace cards resulted in dual focal points, one with a 50/50 split and another according to the ‘irrelevant’ distribution of aces, so that a player with three aces out of four often demanded 3/4 of the pot, and players with only one ace often demanded only 1/4 of the pot.
We shall see that the underlying factor behind these different focal points is the phenomenon of ‘self-serving’ bias, discussed in Chapter 4. People tend to prefer interpretations of information that are favorable to themselves – a good example being that the vast majority of people believe they are better-than-average drivers. In the Roth and Malouf study, it was the second players, with the higher prize, who were proposing 50/50 splits, rather than splits which gave the players the same expected payoff. Self-serving bias is a major factor preventing the negotiation of agreements in many real-life bargaining situations in business and international relations. The question is: can this problem be solved, and if so, how?
There is certainly evidence that the problem can be solved in experimental situations. The first case study at the end of this chapter reviews a series of studies by Loewenstein et al. (1993) and Babcock et al. (1995, 1997) relating to legal situations where a plaintiff is suing a defendant for damages relating to an accident, with the legal costs to each party mounting as the case takes longer to settle. The authors find various ways in which the probability of settlement can be increased, for example by assigning the roles of plaintiff and defendant after the players have read the relevant information about the facts of a case.
It may be objected at this point that these results relate to experiments, not to field studies, and it is difficult, if not impossible, to apply the different protocols used to real-life situations. Obviously plaintiffs and defendants in real legal cases do not get assigned roles after accidents and similar events have already occurred. However, there are certain policy implications arising from the Babcock et al. (1997) study, where subjects were asked to list weaknesses in their case. The resulting large increase in settlement rate suggests that mediation can be very useful in many situations, where mediators can point out all aspects of the case, including weaknesses overlooked by the different parties. They may also be able to suggest compromise solutions in complex situations, when there are many variables involved, that the principals in the transaction may not be able to envisage on their own. Certainly organizations like the World Trade Organization and United Nations can play a role here in conflicts in international relations. Given recent perceived ‘failures’ by both these institutions, it must be noted that the success of such organizations depends on the will of parties outside the main conflict to find a solution.
Structured bargaining
The general nature of the structure used in experiments has been for the players to alternate offers over a finite or infinite period. It is usually not advisable for a player to make two consecutive offers, since this tends to be viewed as a sign of weakness. Since these games are sequential, they are referred to as dynamic games. There is also a cost of delay if an offer is rejected, since continued negotiations in reality tend to involve some kind of opportunity cost, such as lost profit and wages in an industrial dispute. A factor of major importance in the outcome of these situations is the discount rate/discount factor of each player. Players with lower discount rates (higher discount factors) have an advantage in such games, if discount rates are common knowledge, since they can afford to be more patient. Since the 1980s many experiments have been conducted using a number of variations in procedures: the most important variables have been the number of rounds of offers, the size of discount rate, and the relationship between the rates of the players.
A simple example of a two-stage bargaining game is one where each player gets to make a single proposal for how to split a pie, with the amount of money to be divided falling from $5 in the first stage to $2 in the second. The first player proposes a split of $5 that is either accepted (and implemented) or rejected, in which case the second player proposes a split of $2 that is either accepted or rejected by the first player. If this second offer is rejected then both players end up with payoffs of zero.
In standard or classical game theory the method of solving dynamic games of this kind is to use backwards induction, or the ‘foldback’ method. This method starts by examining the end of the game first and working back towards the beginning. In the second stage of the game described above a rational second player would demand $1.99, since the first player should accept the remaining $0.01 rather than get nothing. Therefore in the first stage the first player should offer $2, anticipating that the second player will accept this. In general with this kind of game the first player should offer the amount that the pie is reduced to in the second round. We will comment on this approach in the next section, since backwards induction is involved in both dynamic games and repeated games, and in practice it tends to predict badly in many situations (Binmore and Shaked, 2010a).
Bargaining with incomplete information
In many real-life situations the players have asymmetric information, typically knowing more about their own payoffs than about those of the other player. For example, in auctions buyers know their own valuation of the item, but usually not that of the seller, and vice versa. The simplest situation here is a first-price sealed-bid auction with two bidders, where both players make simultaneous bids, with the prize going to the highest bidder at a price equal to their bid. For example, each bidder’s value for the prize may be equally likely to be $0, $2 or $5, with bids constrained to be in integer amounts, and with ties decided by a coin toss. The equilibrium in SGT is described as Bayesian Nash, which specifies an equilibrium bid for each value of the bidder. It can be shown (though the calculations are somewhat tedious) that the equilibrium bids in this example are $0, $1 and $2 for values of $0, $2 and $5 respectively.
The above game is a static game. Many bargaining games with incomplete information are dynamic, and can involve several stages. This makes the bargaining situation more complicated, since not only are players trying to maximize their utilities, but they are also aware that the offers and bids that they make and accept or reject convey information regarding their valuations that are often detrimental to their interests. This aspect of the game involves the concept of signaling, discussed in more detail in a later section.
Empirical studies of bargaining games with complete information
We will begin by discussing games with complete information, since these are simpler to analyze. Early experiments by Binmore, Shaked and Sutton (1985) used a protocol that involved two two-round games, with a common discount factor (δ) of 0.25, and in the repetition of the game Player 1 in the first game became Player 2 in the second game. The results again indicated that there were two focal points: one involving a ‘fair’ 50/50 split, and another involving the SPNE of 75/25. This split is an SPNE since the pot of £1 was reduced to £0.25 in the second round of both games if Player 1’s initial offer was refused; thus it makes no sense (ignoring social preferences and reciprocity) for Player 2 to refuse any offer above £0.25. Another notable finding was that in the second game initial offers shifted to the single focal point of the SPNE, suggesting a learning effect. It was suggested that Player 1 in the second game, having experienced the situation of being Player 2 in the first game, now realized that it made no sense to make an initial offer of more than £0.25. Further studies involving a similar kind of ‘role-reversal’ protocol have indicated a similar learning effect, but not as rapid as in the situation above.
A later study by Neelin, Sonnenschein and Spiegel (1988) used an experimental protocol that involved two-round, three-round and five-round alternating offer games, with common discount factors of 25%, 50% and 34% respectively, and with the SPNE being $1.25 in each case. Although the study found that initial offers in the two-round game were heavily concentrated around the SPNE, initial offers in the three-round and five-round games were not. In all three types of game the initial offers tended to be concentrated around the size of the pie in the second round. This suggests that the subjects, as business and economics students, had learned about backwards induction or had worked it out for themselves for one step, but were unable to apply the technique beyond this to determine the SPNE for three-round and five-round games.
Experiments by Ochs and Roth (1989) provided some evidence, though still weak, that transactions approximately attain the SPNE. The most important finding concerns counter-offers: not only were offers commonly refused in the second and third rounds, but the resulting counter-offers were also frequently ‘disadvantageous’. In other words, a majority of players were making counter-offers that would leave them less well-off than if they had accepted the original offer. For example, it makes no sense to reject an offer of $3 out of a $10 pot if this rejection then reduces the size of the pot to $2.50 in the next round. Similarly, it makes no sense to reject the offer of $3 if the pot is then reduced to $3.50 and one makes the counter-offer of $1, since one would only gain $2.50 even if the counter-offer is accepted. There are two possible explanations for this phenomenon of disadvantageous counter-offers:
1 Players had social preferences that inclined them to reject unequal offers.
2 Players had limited computational abilities, failing to realize that rejecting an offer in one round could leave them with less in later rounds than the offer they had rejected.
As stated earlier, the first explanation is discussed in the next chapter. There have been several studies investigating the second factor, and in particular the strength of the learning effect. Although the original study by Binmore, Shaked and Sutton (1985) suggested a strong learning effect over just two games, later studies (Ochs and Roth, 1989; Bolton, 1991; Harrison and McCabe, 1992; Carpenter, 2000) indicated much slower or insignificant learning rates. Camerer et al. (1994) and Johnson et al. (2002) have designed experiments to achieve two aims: (1) isolate social preferences by having players play against a computer, and (2) investigate the thinking processes of subjects by tracking their demands for information in the different rounds of the game. They have reported three main findings:
1 Players tend not to make equilibrium offers even when social preferences are not involved.
2 Players tend not to look ahead one or two periods to consider what will happen if an offer is rejected.
3 Players can learn to look ahead, using backward induction, if they are explicitly taught to do so, since this process appears not to occur naturally.
The experimental protocols discussed up to this point have all involved fixed-discounting games. Other studies have examined situations where there are fixed costs of delay. This situation may apply in legal cases or industrial disputes where costs are generally independent of the size of the pie. The most interesting finding here that has been consistently reported (Binmore, Shaked and Sutton, 1989; Rapoport, Weg and Felsenthal, 1990; Forsythe, Kennan and Sopher, 1991) is that divisions of the relevant pies tend to be very uneven, more in keeping with SPNE, unlike the frequent even splits in fixed-discounting games. The challenge for researchers is that social preferences should apply equally to each situation, so why the difference? More research needs to be conducted regarding learning in each situation; differences in learning may account for the observed disparity.
Empirical studies of bargaining games with incomplete information
In bargaining situations with incomplete information we have once again two main aspects to consider: (1) how to organize these situations in terms of bargaining structure; and (2) how to determine solutions in terms of the kind of equilibrium that will prevail.
In terms of the first aspect, Valley et al. (2002) have found that communication improves the efficiency of trade. A trade is efficient if a transaction occurs when a buyer’s valuation exceeds a seller’s valuation. In typical sealed-bid mechanisms where both parties submit a threshold price that they will trade at, trade will not be 100% efficient, since both parties will tend to ‘shave’ their bids according to the predictions of game theory. This means that buyers will bid less than their true valuation, while sellers will bid more than their true valuations. The crucial finding in the Valley et al. study was not so much the fact that communication improves trade efficiency, but the manner in which it does so. It appears that bargainers tend to coordinate on a single price that they will both bid. This coordination takes the form of ‘feeling each other out’, searching for clues regarding the other player’s valuation, while still maintaining a fair amount of bluffing as to their own valuation – we are not talking about mutual truth-telling here for the most part. It also seems from this pioneering study that face-to-face communication is more successful at improving trade efficiency than written communication.
With regard to the second aspect, some studies have found that solutions in terms of bidding tend to conform surprisingly well to SGT predictions. The reason that this is surprising is that the predictions are hardly intuitive, as we shall see shortly. Most studies have focused on the sealed-bid mechanism, or bilateral call market, as it is often referred to in financial markets. In this kind of market both buyers and sellers make sealed bids, and a transaction occurs at the half-way price if the buyer’s bid exceeds that of the seller. If the buyer’s bid (v) is below that of the seller (c) no transaction takes place. We will take an example used by Daniel, Seale and Rapoport (1998). If the buyers’ and sellers’ valuations (V and C) are uniformly distributed over the space (0, 200) for the buyer and (0, 20) for the seller, the game-theoretic predictions are as follows:
Sellers’ bids will be a linear function of their valuations, according to the equation
c = 50 + (2/3)C
Buyers’ bids will follow a piecewise linear pattern. This means that, when drawn graphically, the function consists of three linear segments joined together.
In mathematical terms, the buyer’s predicted bidding pattern is shown below:
When V ≤ 50, v = V
When 50 ≤ V ≤ 70, v = (50 + 2V)/3
When V > 70, v = 63.3
These predictions suggest that sellers should ask a price much higher than their actual valuation or costs, while buyers should mostly make a flat bid of 63.3, as long as their valuation is at least 70; these predictions are both strong and counter-intuitive, making for a revealing empirical test. Although the empirical findings from the study by Daniel, Seale and Rapoport (1998) do not indicate a sharp piecewise linearity in buyers’ bid functions, they do confirm the predictions in two main ways:
1 Buyers bid a smaller fraction of their valuations when their valuations are high.
2 Sellers mark up their costs very considerably.
The authors replicated these results, in terms of generally confirming game-theoretical predictions, when the parameters of the experiments were somewhat changed, for example with a larger range of sellers’ valuations. They also found a significant learning effect, in that buyers started by bidding too high and then after ten rounds learned to reduce their offers substantially.
Behavioral conclusions
In the study by Goeree and Holt (2001) the subjects performed very much as predicted by SGT in the ‘shrinking pie’ game described earlier, where the pie started at $5 and then shrank to $2 in the second round. The average offer observed in the study was $2.17, compared to the SGT prediction of $2. However, when the pie was shrunk from $5 to only $0.50 in the second round, the first player subjects diverged considerably from the SGT prediction of offering only $0.50 in the first round: the average offer was $1.62, with 28 out of the 30 offers being above $0.50. This type of situation will be discussed in more detail in the next chapter, since the main reason for the divergence appears to involve social preferences and the concept of fairness.
As far as games with incomplete information are concerned, let us return to the simple example of the auction game described earlier, where each bidder’s value for the prize may be equally likely to be $0, $2 or $5. We saw that the equilibrium bids in SGT in this example are $0, $1 and $2, respectively. Goeree and Holt (2001) find that the Bayesian Nash equilibrium predicts well in this situation, with 80% of the bids matching the equilibrium. However, just as we have seen in other examples, they find that changing the payoffs, while not affecting the equilibrium, does affect behavior and causes divergences from predictions. In this case, changing the values to $0, $3 and $6 reduced the proportion of Nash bids to 50%. Goeree and Holt hypothesize that these deviations from Nash behavior are sensitive to the costs of deviation. This is an important general conclusion, since it applies to other games described previously, and also to games described later in the chapter involving coordination, like the ‘stag hunt’ game. Goeree and Holt also point to the possibility of risk-aversion, which is again a factor in coordination games.
9.5 Iterated games
Iteration and dominance
We have seen in the second section of the chapter that it is often easy to solve games with a dominant equilibrium, particularly if there are only two strategies available for each player in a two-player game. In more complex situations we can iterate to a dominant equilibrium by eliminating dominated strategies. We shall see that some situations can involve many steps of iteration, even an infinite number. Given this increased complexity, there is, therefore, a slight departure from the structure of previous sections, in that empirical studies are discussed throughout the section. The main objective here is to examine how players conduct iterations in different game situations, in particular how many steps they take, using empirical investigation; we can then draw certain conclusions regarding the underlying psychological mechanisms involved, particularly relating to beliefs about other players.
It is useful to start with a simple two-step game, using an example from Beard and Beil (1994). They used a sequential game with two players, and by varying the payoffs in a number of ways, they were able to investigate how much player 1 was willing to bet on player 2 obeying dominance. The basic version of the game is shown in Table 9.8.
Table 9.8 Iterated dominance game

Player 1’s move    Player 2’s move    Payoff to Player 1    Payoff to Player 2
Left               (game ends)        $9.75                 $3.00
Right              Left               $3.00                 $4.75
Right              Right              $10.00                $5.00
Player 1 moves first, and if she moves left that ends the game; she earns $9.75 for herself and $3 for Player 2. On the other hand, if Player 1 moves right, Player 2 then gets to move next. If she acts in pure self-interest, she will move right also, earning $5 rather than $4.75 from moving left. This response will also earn $10 for Player 1, which is slightly higher than the $9.75 she would receive if she moved left at the start. Thus the iterated dominant equilibrium is (right, right). However, there is a risk to player 1 in playing right, since if player 2 does not obey dominance she will only get $3.
In this baseline experiment 66% of player 1s played left, showing a general mistrust of player 2. In the event this mistrust proved justified, since when player 1 played right, player 2 only responded by playing right 83% of the time. This percentage means that the expected payoff for player 1 from playing right turned out to be only (3 × 0.17) + (10 × 0.83) = $8.81, worse than the payoff from playing left.
The investigators then varied the payoffs as follows:
1 Less risk – lower payoff for player 1 moving left.
2 More assurance – lower payoff for player 2 if (right, left).
3 More resentment – higher payoff for player 2 if player 1 moves left.
4 Less risk, more reciprocity – higher payoff for player 1 if (right, left), and higher payoffs for player 2 if player 1 moves right.
When there was less risk, more assurance or more reciprocity, player 1s were more willing to play right; when playing right invited more resentment, they were less likely to do so. However, it was notable that in all the scenarios above player 2s always responded by playing right if player 1 had played right; in other words, player 2s obeyed dominance. This experiment leads to a general conclusion that has since been confirmed by many other studies: players tend to believe that other players are less likely to obey dominance, i.e. to be rational, than they actually are. This is particularly true when the cost of irrationality is small (Goeree and Holt, 2001). On the basis of the experiment above there could be a number of explanations for this; for example, player 1s may have incorrect beliefs regarding the social preferences of player 2s. However, this explanation tends to be ruled out by empirical findings from ‘beauty contest’ games, described next.
Beauty contest games
The name for this revealing type of game originated with Keynes’s The General Theory of Employment, Interest, and Money in 1936. He likened investment on the stock market to a beauty contest where competitors have to pick out the prettiest faces, the prize being awarded to the competitor whose choice most nearly corresponds to the average preference of the competitors as a whole. As Keynes explained the situation:
… each competitor has to pick, not those faces which he himself finds prettiest, but those which he thinks likeliest to catch the fancy of the other competitors, all of whom are looking at the problem from the same point of view. It is not a case of choosing those which, to the best of one’s judgment are really the prettiest, nor even those which average opinion genuinely thinks the prettiest. We have reached the third degree where we devote our intelligences to anticipating what average opinion expects the average opinion to be (p. 156).
This situation can be easily modeled into a simple game for experimental purposes. The standard form of this game is to ask a group of players to select a number from 0 to 100. The winner is the player whose number is closest to a certain fraction (p), say 2/3, of the average of all the players. The purpose of the experiment is to examine how many rounds of iteration players perform. If players choose randomly or uniformly the average will be 50, so 2/3 of this number gives 33. This choice reveals one step of reasoning. The second step is to reason that if other players use one-step reasoning and choose 33, then their best choice is 22. A third step would be to assume that other players use two steps and therefore choose 15. It can be seen that there are an infinite number of possible iterations in this game, and the resulting iterated dominant Nash equilibrium is 0. Nagel (1995) found that the average choice was about 35, with frequency ‘spikes’ at 33 and 22. More comprehensive experiments were carried out by Ho, Camerer and Weigelt (1998), confirming the general finding that players performed only one or two steps of iteration. Camerer (1997) found similar results with different types of subjects: psychology undergraduates, economics PhDs, portfolio managers and CEOs. In field studies involving contests for readers of financial magazines offering substantial prizes, the results also tend to be similar, with spikes at 33 and 22, but with a somewhat lower average number. About 8% of contestants chose the equilibrium of 0.
There are two possible conclusions from these experiments: either people are generally unable to iterate beyond a couple of steps, or they do not believe that other people are capable of doing so. In order to come to more definite conclusions we have to examine results from other games.
Iteration leading to decreased payoffs
A good example where further iteration reduces payoffs is a so-called ‘centipede’ game. This is a sequential game involving two players and a repeated number of moves. At each move a player can take 80% of a growing pie (leaving the other player with 20%) and end the game, or they can ‘pass’ and let the other player move, with the pie doubling with each move. This kind of game is also known as a trust game, since the players can benefit by trusting the other players, at least to some extent. Experiments were carried out with this game by McKelvey and Palfrey (1992), using an initial pie of $0.50 and four moves. A game tree for this game is shown in Figure 9.4.
Figure 9.4 Centipede game
If the players pass on all four moves they end up with $6.40 and $1.60, a substantial improvement over the initial situation of $0.40 and $0.10. However, if we solve the game by backwards induction, the dominant strategy at the last move is to take; the same is true at the second-to-last move, and so on right back to the first node. Thus the iterated dominant solution is to take at the first move. Passing at the first node would violate four steps of iterated dominance. We can see that this is a kind of game that ‘unravels’, resulting in a PD-type outcome, meaning an equilibrium that is Pareto-dominated by the outcome where both players pass on all moves. It also resembles PD in that self-interest causes a mistrust of cooperation by the other player.
So much for the theoretical equilibrium according to SGT. In practice, McKelvey and Palfrey found that the game tended not to unravel until towards the end. In four-move games only 6–8% of players took on the first move, with this percentage increasing at each node, reaching 75–82% on the last move. In six-move games, where the end pie is 2^6, or 64, times the initial amount, only 1% or fewer of players took on the first move. Only with high stakes, and after learning through five trials, did the fraction of players taking at the first move increase significantly, to 22%.
Similar results have been found in other experimental games that have a similar structure to the centipede game, for example multiple-strategy PD games. Experiments have also been conducted involving continuous strategies (like the beauty contest game), rather than discrete strategies, for example pricing in imperfect competition. In general, players tend to demonstrate two to four steps of iterated dominance in their initial choices.
A similar kind of unraveling effect was noted in the previous chapter, in relation to the dual-self model. When sophisticated consumers are aware that eventually they will most likely succumb to temptation of some kind, they may decide to give in now and indulge themselves. This is one situation where sophisticated consumers may end up worse off than naïve consumers, who may resist temptation for a while, being unable to predict that they will eventually give in.
Iteration leading to increased payoffs
In some games further iteration improves payoffs. A good example is the so-called ‘dirty faces’ game, which has been posed as a riddle for many decades. In its original form (Littlewood, 1953) there are three ladies, A, B and C, in a railway carriage, all of whom have dirty faces and all of whom are laughing. A then realizes from the reactions of her companions that she must be laughable. Her reasoning is that, if B saw that A’s face was clean, she would infer that C was laughing at her and, therefore, stop laughing. Since she is still laughing, this must mean that A’s face is dirty. In this situation A is assuming that B is sufficiently rational to draw inferences from the behavior of C.
This kind of situation can be modeled experimentally by constructing a game where players know the ‘types’ of other players, but not their own type. Such an experiment was performed by Weber (2001). Players are of two types, X and O, with probabilities of 0.8 and 0.2. There are two possible strategies, Up or Down. An Up move gives no payoff to either type. A Down move gives a payoff of $1 to type X and –$5 to type O. Players take it in turn to move, and the game ends when one player plays Down. If players know nothing of their type the expected payoff of playing Down is negative, so they should choose Up (assuming risk-neutrality). Being in state X is like having a dirty face, and playing Down is equivalent to knowing you have a dirty face. The players are commonly told that at least one of them is type X. If a player observes that the other player is O they can infer immediately that they are type X and play Down on the first move. If a player observes that the other player is X, they cannot infer their own type on the first move of the game. They will both therefore play Up on the first move. This will then inform each player that the other has observed that they are type X. Therefore both players should play Down on the second move.
Weber observed that in the XO protocol 87% of the subjects displayed rationality, using one step of iterated reasoning. However, in the XX situation, only 53% of subjects played Down on the second move, using two steps of iterated reasoning. Camerer (2003) notes that the subjects in this experiment were Caltech students, who are selected for their skills at doing logic puzzles, so that this result of only about half of people performing two steps of iteration may be an upper bound in abstract games like this.
Behavioral conclusions
The results of games like the ‘dirty faces’ game tend to indicate that the reason why people do not generally perform multiple iterations is not just that they doubt the ability of others to do so; they often have ‘limited computability’ themselves. However, to obtain really decisive evidence on this issue, experiments have to be performed which examine not just people’s choices of strategy, but the decision rules that they use to arrive at such choices. Such experiments have been conducted by Stahl and Wilson (1995) and Costa-Gomes, Crawford and Broseta (2001).
These experiments are similar in nature to those performed by Johnson et al. (2002), described in the discussion of bargaining games with complete information, in that they require subjects to look at certain information in specific locations on a computer screen, thus revealing the information used and the steps performed in arriving at decisions. Nonequilibrium (in the SGT sense) models of this type are often referred to as ‘level-k’ models. A pioneering study involving such a model was performed by Stahl and Wilson (1995); they classified subjects into five main types:
1 Level-0 (choosing each strategy equally often).
2 Level-1 (assuming others are level-0 and best responding to them).
3 Level-2 (assuming others are level-1 and best responding to them).
4 Naïve Nash (assuming others will play Nash equilibrium).
5 Worldly Nash (assuming some others play Nash, but that the rest are level-1 or level-2).
Using 12 different games, the study estimated that about 18% of players were in the first category, 20% in the second, only 2% in the third, 17% in the fourth and 43% in the fifth.
The study by Costa-Gomes, Crawford and Broseta (2001) reported that 45% of subjects were naïve, meaning choosing strategies with the highest average payoff, while many were classified as optimistic, meaning using a maximax strategy (maximizing their maximum payoff). They also noted that 10% of subjects violated one step of iterated dominance, with this fraction rising to 35% and 85% for two and three steps.
Camerer, Ho and Chong (2004) have proposed a cognitive hierarchy (CH) theory, which is based on both the theory and the empirical evidence in many of the studies described in this section. Their model proposes a Poisson frequency distribution to describe the proportions of players using different numbers of steps of thinking (K). The objective here was to provide a model that has both a sound psychological basis and a sound empirical basis in order to predict equilibrium conditions in iterated games. The model can also be used to predict initial situations for learning models. The model is parsimonious and easy to use, because the Poisson distribution involves only a single parameter: its mean and variance are identical (τ). Those players using 0 steps correspond to level-0 in the Stahl and Wilson classification; those using K steps anticipate the decisions of lower-step thinkers, and best-respond to the mixture of those decisions using normalized frequencies. Empirical studies suggest that a value of τ around 1.5 is appropriate in many games. The authors have tested the model using first-period data from a large number of experimental games, and it has always predicted at least as well as the SGT Nash equilibrium. When the cognitive hierarchy model is compared with QRE, which also tends to predict better than Nash equilibrium, it has two main advantages: (1) it is more sophisticated in psychological terms; and (2) it has more empirical appeal, since it can account for probability ‘spikes’ in choosing strategies that are often observed in experiments.
A couple of recent empirical field studies have supported the CH model. One of these examined behavior in the Swedish lottery game, LUPI (Östling et al., 2007). In this game players choose integers from 1 to 99,999, and the lowest unique positive integer wins (hence the name LUPI). Given that about 50,000 people play this lottery each day, the Nash equilibrium prediction is approximately equal choice of numbers from 1 to 5000, a sharp drop-off in choice between 5000 and 5500, and very few choices above 5500. This prediction was fairly accurate for the first seven days of play, but actual behavior involved too many choices of low numbers, too few between 2500 and 5000, and too many numbers above 5500. A CH model estimated the pattern of choices somewhat better, with a value of τ = 2.98, somewhat higher than was found in the experimental data. Another field study by Brown, Camerer and Lovallo (2007) investigated why moviegoers seem to ignore the fact that movies that are not reviewed before they are released tend to be low in quality. Using a CH model, the study estimates that a value of τ = 1.26 fitted the data best, close to earlier lab estimates from experimental data. As Camerer (2009) observes, this naïve strategy by moviegoers leads to greater box-office profits from withholding poor movies from review.
There have also been a couple of recent studies that have indicated that CH models incorporating level-k thinking are not always superior to other models. A study by Choi (2006) examined social learning in networks, where some players in the core of the network had more extensive information links with other players, while players in the periphery had fewer links. Choi found that in various experimental treatments where the CH model was applied the dominant cognitive type was closely related to the Bayesian-rational type. This supports the accuracy of SGT predictions for environments with this kind of choice architecture, and seems at variance with the findings of the studies reported above. However, Choi resists making direct comparisons with those studies indicating the importance of bounded rationality for two reasons:
1 The experiments in the Choi study involve dynamic games, whereas previous studies supporting CH models involved static games. Choi hypothesizes that cognitive reasoning at higher levels is easier in dynamic games than in static games.
2 Choices involving higher levels of reasoning, like Bayesian rationality, are often the same as choices made using lower levels of reasoning. In order to test what levels of reasoning are being applied careful experiments need to be performed that distinguish between these different levels.
Another study by Crawford, Gneezy and Rottenstreich (henceforth CGR) (2008) has also indicated that CH models do not tell the whole story in terms of reasoning in complex decision situations. This study examined coordination games where focal points were possible. These games are essentially of a ‘Battle-of-the-sexes’ type, where the players are trying to match outcomes. The example quoted in the earlier section on MSE involved the activities of watching ballet or watching boxing. In this case focal points were not relevant since, it was assumed, neither choice was salient. However, in some coordination games certain choices may be salient because of their label. A famous example is Schelling’s (1960) experiment where he asked subjects to choose independently and without communication where in New York City they would try to meet each other. Obviously the subjects wanted to match choices to maximize their payoff. This game is somewhat different from ‘Battle-of-the-sexes’ because it was assumed that the players had no preference regarding location or strategy, as long as the choices matched. Despite innumerable possible choices of meeting place, the majority of subjects chose Grand Central Station, the most salient traffic hub of the time. Interestingly, choice would probably be more difficult now; both Times Square and Ground Zero would be competitors. The main finding of the CGR study was that the power of focal points to coordinate choices is limited, because the effect of label salience is easily overcome by payoff salience; the result is that even very small payoff asymmetry can cause coordination failures. The study used the labels ‘X’ and ‘Y’ in one treatment, with the reasoning being that ‘X’ should be salient. This proved to be true, so that in the situation where payoffs for matching choices, either XX or YY, were symmetrical ($5 for each player in either case), it was found that 64% of the subjects coordinated on the XX focal point. This was well above the MSE prediction of 50%. However, when the payoffs were slightly changed to an asymmetrical situation, so that coordination at X rewarded player 1 with $5 but player 2 with $5.10 while coordination at Y rewarded player 1 with $5.10 but player 2 with $5, the coordination rate fell to only 38%, well below the MSE prediction of 50.5%. This result suggests a failure of iterative reasoning consistent with level-k models.
However, the CGR study also used a different choice treatment, involving a pie chart divided into three equal sectors (like the Mercedes-Benz logo), and with the bottom sector shaded. Thus players could choose ‘right’, ‘left’ or ‘bottom’, with the objective again being to coordinate choices. In this treatment it was found that coordination can persist even with asymmetric payoffs, so that even without labels the bottom sector was salient enough to overcome payoff salience. The empirical results here were sometimes not consistent with CH or level-k predictions, but suggest instead a notion of collective rationality called ‘Schelling salience’ or ‘team reasoning’. With this kind of reasoning ‘players begin by asking themselves, independently, if there is a decision rule that would be better for both than individualistic rules, if both players followed the better rule’ (CGR, 2008). In this case the better rule was to choose bottom rather than try to maximize individual payoff. These results are also consistent with the findings of Mehta, Starmer and Sugden (1994a, b) and Bardsley et al. (2006).
In some cases bounded rationality may not be a factor at all in explaining why people do not play a Nash equilibrium solution. Instead a notion similar to the ‘team reasoning’ described above may be responsible for determining strategy, indicating again the importance of social norms and correlated equilibrium. Let us take the repeated PD game as an example, with a payoff structure in which mutual cooperation pays each player 3 per round and mutual defection pays each player 1 per round.
The approach in standard game theory is to use backwards induction. Assuming a finite number of rounds known by both players, say 100, each player would defect in round 100. On that basis each player would then defect in round 99, and so on all the way to the first round. Thus the equilibrium strategy is to defect on all rounds, resulting in a total payoff of 100. There are many rules that a ‘choreographer’ could make that would improve the payoffs to both players. A simple one is to play tit-for-tat, starting off in round 1 by cooperating. If both players adopt this ‘nice’ strategy then both will end up cooperating for all 100 rounds, and receive a total payoff of 300. It can be easily seen that this correlated equilibrium is indeed much more natural than the Nash equilibrium of continual defection, as Gintis (2009) claims. This conclusion is also well-supported by empirical evidence from experimental studies with repeated PD games, where both experienced and inexperienced players tend to cooperate much more than predicted by SGT. Furthermore, this example is an illustration supporting the claim by Binmore and Shaked (2010a) that the rule of backward induction is frequently not applied empirically, even when other-regarding preferences are taken into account.
In summary, it appears that there is no single model that can fully explain all decision behavior in iterated games. In some of these games players may be fully rational with SGT predictions proving accurate. However, in many iterated games players appear to be highly heterogeneous, operating with different levels of thinking. Although patterns of iteration vary somewhat from game to game, people usually do not do more than two or three steps of iterated dominance, where the elimination of one’s own dominated strategies is counted as the first step. However, there is evidence from some experimental games, particularly more complex ones where people are far from equilibrium at first, that learning takes place and that they perform further iterations as successive rounds of the game take place (Rubinstein, 1989). These learning aspects are discussed in the final section of the chapter. In other games it also appears that team reasoning is an important factor. However, as the CGR study concludes, ‘it remains puzzling that team reasoning plays an important role in subjects’ responses to Pie games, but not to X-Y games’. CGR speculate that the use of team reasoning depends on Pareto-dominance relations among coordination outcomes and their degree of payoff conflict, but there is obviously a need for more systematic research in this area in order to determine the relative importance of Bayesian rationality, level-k models and team reasoning and the contexts in which each is prevalent.
9.6 Signaling
Nature and functions of signaling
Many types of game feature asymmetric information, where one player wants to convey information to the other player(s). Such information does not necessarily have to be true. Actions taken by players that are designed to influence the beliefs and behavior of other players in a way favorable to themselves are often referred to as commitments or strategic moves. In order to be effective these signals or strategic moves must have credibility. This characteristic requires two factors:
1 Affordability by the signaler’s type
Someone wanting to obtain a good job may want to signal that they are this ‘type’ of person by investing in an expensive education or training program. A union striking for higher wages must be able to afford to go on strike, taking into consideration the foregone wages.
2 Non-affordability by other types
A firm producing an inferior, unreliable product cannot afford to give a decent warranty for it; thus good warranties are credible signals that products are of high quality. Profitable firms may use advertising as a signal in the same way: firms lacking a sound financial foundation may be unable to spend heavily on advertising, so consumers may view advertising as a signal that a firm is well established.
Signaling is widely used, not just in situations commonly related to economics and business, but also in politics, international relations, sport, warfare and biology. In general it may be used to achieve either competitive or cooperative objectives.
There are also situations where signaling occurs without there being any intention to influence the behavior of other parties. For example, when we see an athlete winning a race, and he is wearing a particular brand of shoes, this sends a signal that the brand is of good quality. Of course, if the athlete is sponsored by the shoe manufacturer, the signal is deliberate. The issue that arises with unintentional signaling is how people value these signals compared with private information, such as when they try the shoes on in a shop to see how comfortable they are. This aspect is discussed later, in relation to learning.
Signaling and competition
One of the most interesting aspects of signaling in a competitive context is that it may appear to be inefficient or self-defeating, since it limits the actions of the signaler. Some examples from the various fields mentioned above may help to illustrate the seemingly paradoxical nature of much signaling.
We have just observed that much advertising seems wasteful of a firm’s resources, particularly if it is not directly aimed at increasing awareness or perceptions of the quality of the product advertised. A similar type of business activity, which may superficially seem to be against a firm’s interests, is the use of a ‘Most Favored Customer Clause’ (MFCC). Essentially what this involves is a guarantee to customers that the firm will not charge a lower price to other customers for some period in the future; if it does, it will pay a rebate to existing customers for the amount of the price reduction, or sometimes double this amount. This is particularly important in consumer durable markets. The reason why this strategy is ingenious (and disingenuous) is that it serves a dual purpose:
1 Ostensibly, it creates good customer relations – many customers are concerned when they are considering buying a new consumer durable that the firm will reduce the price of the product later on. This applies particularly when there are rapid changes in technology and products are phased out over relatively short periods, like computers and other electronics products.
2 The MFCC creates a price commitment – it would be expensive for the firm to reduce price at a later stage, since it would have to pay rebates to all its previous customers. Thus other firms are convinced that the firm will maintain its price, and this causes prices to be higher than they would be without such commitment, contrary to consumer expectations.
In politics it is common for people to make statements like they will never raise taxes, or they will limit immigration. Of course it can always be maintained that talk is cheap, but politicians may have much to lose by reneging on such commitments, and making embarrassing U-turns, especially if the relevant policies form a major part of their electoral platform (their ‘type’). Their reputations may be irredeemably tarnished by such actions.
In the field of international relations, countries, or groups of countries, often try to influence other countries that are non-compliant in certain regards (like researching and building nuclear weapons) by imposing trade sanctions. These sanctions may hurt the countries imposing them, for example by reducing the availability of oil and increasing its price. On the other hand, if they hurt the non-compliant country more, they may be successful in forcing it to act in the intended manner. However, it is notable that in practice many sanctions have failed to achieve the intended response. When imposed against poor countries, like Iraq and Iran, they mainly affect the poor, and dictatorial leaders can transform them into a different signal: that the Western countries imposing them want to damage the population’s welfare. It might be said that Western leaders in this situation are not using enough iterations of strategic thought.
Moving on to the field of sport, we can use an example from tennis that was described earlier in the discussion of mixed strategies. It was seen that a server who knows that his opponent’s backhand is relatively weak will tend to serve in that direction more frequently (but not exclusively, since his moves then become predictable). If his opponent then improves his backhand with practice, this may be signaled in a later match by the opponent moving more to the backhand side. The server’s response is to serve more to the opponent’s forehand, which is stronger. The non-intuitive conclusion can be generalized to other sporting (and non-sporting) situations: by improving our weak points we force our opponents to deal more with our strengths.
Some of the most dramatic examples of signaling come from the field of warfare. The example frequently given here is that of Cortes burning his boats when he invaded Mexico to conquer the Aztec empire (although the historical accuracy of this event is dubious). This drastic form of commitment had two effects. First, it caused his soldiers to fight harder than otherwise, since they knew that they had no alternative. The second effect involves a further iteration in strategic thought: the natives lost morale, since they knew that their enemy was now implacable and would not stop until it had either conquered their land or been completely wiped out.
Signaling also seems to play an important role in evolutionary biology. Biologists struggled to find an evolutionary explanation for the peacock’s tail, until Zahavi (1975) proposed the ‘handicap principle’. The puzzle was that the lavish tail of the peacock was very expensive to maintain in terms of scarce resources; surely natural selection would eliminate such an extravagance? Zahavi proposed that the peacock’s tail, by indeed being a handicap, served as a signal to potential mates that the owner must be very healthy, and therefore desirable, in order to be able to afford to maintain such an extravagance. This aspect of sexual selection has also attracted the attention of social scientists. Some have suggested that certain self-destructive habits of young people in particular, like smoking, binge drinking, doing drugs or reckless driving, may also be interpreted as a similar signal; only ‘hard’, and therefore desirable, individuals can maintain such habits.
It should be noted at this stage that not all signals involve commitments of the type illustrated here. We shall see in the next section that signals can also be used to ensure coordination and cooperation.
Signaling and cooperation
Many games involve more than one equilibrium, as we have seen in previous sections. Even PD situations, when repeated under certain conditions, may give rise to different equilibrium strategies. In terms of everyday situations, one of the simplest is determining which side of the road to drive on. This is obviously a coordination game, with players trying to match strategies (unless they like playing ‘chicken’, but that is a different game). The original situation here must have arisen thousands of years ago, with people driving wagons along trails. Obviously there are two possible equilibria, left or right for both players, with roughly equal payoffs in each case. There are a number of stories that claim to explain the origin of driving on one side or the other. In the US, for example, the fact that wagon-drivers held whips mainly in their right hand may have caused a preference for driving on the right to avoid hitting passers-by. In the UK the prevalent practice of mounting one’s horse from the left side may have accounted for the opposite equilibrium being selected.
What these examples demonstrate is that different equilibria may offer different payoffs, with one being preferred over the other in terms of favoring both players. However, there is no principle (like dominance or iterated dominance) that guarantees the attainment of the favorable equilibrium. This situation is modeled by the stylized ‘stag hunt’ game, which is described in Table 9.9.
Table 9.9 Stag hunt game (payoffs: hunter A, hunter B)

                    B: Stag     B: Rabbit
A: Stag             2, 2        0, 1
A: Rabbit           1, 0        1, 1
The essence of this game is that hunting a stag successfully requires the coordination of two hunters. Success brings a big payoff, but hunting stag is risky, since if the other hunter does not cooperate, the payoff is zero. Hunting rabbit is safer, since this can be done on one’s own, and one is guaranteed a payoff of one. There are two Nash equilibria in this game: both hunters hunt stag, or both hunters hunt rabbit. Hunting stag is clearly preferred by both, since it is Pareto-dominant. However, this may not be a focal point because the hunters may be risk-averse, preferring to pursue the ‘maximin’, or risk-dominant strategy of hunting rabbit. A ‘maximin’ strategy selects the strategy that maximizes the minimum payoff. A risk-dominant strategy is defined as one that minimizes joint risk, measured by the product of the cost of deviations by other players to any one player who does not deviate (Harsanyi and Selten, 1988). In the example above, if a hunter plays stag and the other hunter deviates and plays rabbit, the cost to the hunter not deviating is 2. The same applies if the roles are reversed, so the joint risk of the stag-stag strategy is 4. If both hunters hunt rabbit, there is no cost to deviation and, therefore, zero joint risk.
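The risk-dominance comparison can be checked mechanically. Below is a minimal sketch using the Table 9.9 payoffs described above:

```python
# Risk-dominance check for the stag hunt of Table 9.9 (Harsanyi and Selten, 1988).
# Joint risk of an equilibrium = product, over players, of the loss suffered by
# a player who sticks to the equilibrium strategy when the other deviates.
payoff = {('stag', 'stag'): 2, ('stag', 'rabbit'): 0,
          ('rabbit', 'stag'): 1, ('rabbit', 'rabbit'): 1}  # row player's payoff

def joint_risk(eq):
    other = 'rabbit' if eq == 'stag' else 'stag'
    loss = payoff[(eq, eq)] - payoff[(eq, other)]  # cost to the non-deviator
    return loss * loss  # symmetric game: the same loss applies to both players

print(joint_risk('stag'))    # 4: each hunter loses 2 if the other deviates
print(joint_risk('rabbit'))  # 0: rabbit hunters are unaffected by deviation
```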
When empirical tests have been performed in stag-hunt situations, it appears that people tend to be risk-averse. In experiments by Cooper et al. (1990), 97% of players played the inefficient equilibrium, with no players going for the efficient one. It should be noted that in this experiment the payoff at the efficient equilibrium was only 25% higher than at the inefficient one, not 100% higher as in the example in Table 9.9. Increasing the difference in payoffs may change the results significantly, but the author is not aware of any experiments where the efficient equilibrium pays of the order of twice the inefficient one.
The only way that the preferred equilibrium can be reached (ignoring outside options) is by signaling. The Cooper et al. study found that signaling by just one player, in effect allowing him to indicate that he intended to play stag, resulted in an increase in the number of players playing the payoff-dominant equilibrium from 0% to 55%. When both players were allowed to signal this fraction increased to 91%.
A note of caution is necessary here regarding the benefits of signaling for the purposes of coordination in cooperative games. Two-way communication does not always improve payoffs compared with one-way communication in games with more than one equilibrium. In Battle of the Sexes (BOS) games the key structural difference from ‘stag hunt’ is that preferences are asymmetrical. Players again want to match strategies, for example by both going to the ballet or both watching boxing, but each player has a different preference. Cooper et al. (1990, 1994) found that, without signaling, players mismatched strategies 59% of the time. One-way signaling allowed one player to indicate that they would play their preference, and this reduced mismatching to just 4%. However, when both players signaled, there was a conflict as both indicated their different preferences, and mismatching rose back up to 42%.
Empirical findings from signaling games
Many signaling games are complex in structure compared to the games so far discussed. This is because, in competitive situations, at least one player has a type, and some players want to reveal their type while others want to hide it. Other players must try to guess this type from the actions of these players, using iterated thinking. A relatively simple illustration is where an employee is hired by an employer. The employee knows her type in terms of whether she is a high productivity (H) worker or a low productivity (L) worker, but the employer cannot observe this directly. At the start the employer is only able to use prior probabilities of each type occurring, based on past experience. For example, there may be a 50/50 chance of the employee being either H or L. During their employment workers may put in varying degrees of effort (E). The employer has to judge the type of worker from the amount of effort that they put in, using effort, which they can observe, as a signal of productivity. An employer can use this information regarding effort to revise the prior probabilities; this process of Bayesian updating was explained in Chapter 4. Employers sack workers whom they perceive to be L, but to do this they have to monitor workers, which is costly. L workers may put in more costly effort in order to persuade the employer that they are really of the H type. In turn, H workers may work harder than otherwise in order to distinguish themselves from the L workers, increasing their effort to a level that is unsustainable or too costly for the L workers.
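As a concrete illustration of this updating, consider a minimal sketch in which effort is observed as a binary signal. The likelihoods of high effort from each type (0.8 for H, 0.3 for L) are purely hypothetical values, not figures from any study:

```python
# Bayesian updating of the employer's belief that a worker is type H,
# after observing high effort. The likelihoods are hypothetical:
# suppose H workers show high effort 80% of the time, L workers 30%.
prior_H = 0.5            # prior probability that the worker is high-productivity
p_effort_given_H = 0.8   # assumed likelihood of high effort from an H worker
p_effort_given_L = 0.3   # assumed likelihood of high effort from an L worker

p_effort = prior_H * p_effort_given_H + (1 - prior_H) * p_effort_given_L
posterior_H = prior_H * p_effort_given_H / p_effort

print(round(posterior_H, 3))  # 0.727: high effort raises the belief from 0.5
```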
In general, equilibria in these situations are often referred to as pooling equilibria or separating equilibria. A pooling equilibrium occurs when the different types make the same move, for example if both types of worker put in the same effort; it then becomes impossible for the other player to detect type. A separating equilibrium occurs if different types make different moves, in this case putting in different amounts of effort. It may be too costly for L workers to exert a lot of effort, but not for H workers, who may find it worthwhile to put in the extra effort to distinguish or separate themselves from the L type. An example of experiments related to monopoly and new firm entry is given in Case 9.3, relating to a situation modeled by Cooper, Garvin and Kagel (1997a, b). These experiments manipulated the payoff variables for both the new entrant and the monopolists in order to examine how this would affect the type of equilibrium observed.
Behavioral conclusions
In one-shot games signaling does not always produce Nash equilibria. As seen in other game situations, a change in payoff structure can cause deviations. Goeree and Holt (2001) examined a game whose equilibria all involved pooling, with both types of sender sending the same signal; the observed behavior of their subjects contradicted this, with about 80% of senders separating by type.
We can also see from experiments involving dynamic games with several stages that separating equilibria are more likely when imitating the move of another type would involve playing a dominated strategy. In this case, high-cost monopolists found it unprofitable to produce as much output as low-cost monopolists. However, it does take some time for the low-cost monopolists to learn to produce more output than they would otherwise do, because of the iterations involved. Similar experiments, for example those by Camerer and Weigelt (1988) on trust and reputation, and by Chaudhuri (1998) on production quotas and ratchet effects, also indicate the importance of learning processes, as players take time to adjust their beliefs and behavior. Furthermore, they do not always do so in the optimal direction, or to the same extent as theory predicts.
We therefore now need to examine these learning processes.
9.7 Learning
Learning and game theory
We have seen that learning, meaning changing behavior through experience, occurs in many different types of game, although it is ignored by standard game theory. However, up to this point we have been examining behavior in these different classes of game in order to draw conclusions about empirical behavior. This involves studying game behavior for its own sake, as an end in itself. For example, we have examined stag hunt and BOS games in order to see how people coordinate their behavior and cooperate. The objective at this final stage is different. We want to use games in general, rather than a particular class of games, as a means to an end: the fitting and testing of different models of learning. In this situation we are not so much concerned with the observation that, for example, people tend to form a separating equilibrium in the monopoly/entry game under certain conditions. Instead we are interested in how this observation sheds light on different models of learning.
Learning theories and models
Many different theories of learning have been proposed over the years. These include evolutionary dynamics, reinforcement learning, belief learning, anticipatory (sophisticated) learning, imitation, direction learning, rule learning and experience-weighted attraction (EWA) learning. Although all of these will be described to some extent, and the relationships between them explained, we will focus attention on four main classes of learning theory: reinforcement, belief learning, EWA and rule learning.
Most learning models involve the concept of ‘attraction’. Strategies are evaluated according to certain criteria discussed shortly, to calculate attraction values that are updated in response to experience. Learning models differ in terms of the basis of these criteria or elements of attraction. It is helpful at this stage to introduce some notation regarding these elements. It is assumed here for simplicity that other players all use the same strategy as each other, for example, the kth strategy; otherwise s–i is a vector:
sij = the jth strategy (out of mi strategies) of player i
s–ik = the kth strategy (out of ni strategies) of other players
si(t) = the actual strategy chosen by player i in period t
s–i(t) = the actual strategy chosen by other players in period t
πi(sij, s–ik) = the payoff to player i from playing sij when others played s–ik
bi(s–i(t)) = player i’s best response to the other players’ strategies in period t
We can now move on to describing these elements of attraction, using the stag hunt game from Table 9.9 for illustration. There are seven pieces of information that may be relevant in different learning models, shown in Table 9.10; it is assumed in this example that player i is A, and that he decides to hunt stag while the other player B hunts rabbit.
Camerer (2003) illustrates how these elements form the basis of attraction for different strategies according to different learning models in a very useful table, reproduced in Table 9.10.
Table 9.10 Information requirements for different learning theories
Source: Camerer (2003), p. 272.
Table 9.10 clearly and concisely indicates the relevant elements in each learning theory, and allows easy comparisons between the different models. For example, we can now say that the attraction of the strategy of hunting stag before period t + 1, according to reinforcement theory, can be represented as:
As(t) = f{si(t), πi(si(t), s–i(t))}     (9.14)
while according to belief learning theory:
As(t) = f{s–i(t), πi(si(t), s–i(t)), πi(sij, s–i(t))}     (9.15)
The differences between these different theories of learning are described shortly.
Unfortunately, this is where the easy part ends. In order to estimate a particular learning model the attractions have to be mapped into probabilities of choosing different strategies using some statistical rule, usually involving a fairly complex mathematical function like ‘logit’. An explanation of this process goes beyond the scope of this book, but further details are given in Camerer’s (2003) excellent book, Behavioral Game Theory: Experiments in Strategic Interaction.
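While a full treatment is beyond our scope, the flavor of the mapping can be conveyed in a few lines. The following is a minimal sketch of a logit response rule of the kind typically used; the attraction values and the sensitivity parameter lam are illustrative assumptions, not estimates:

```python
import math

# Logit (softmax) rule mapping attractions into choice probabilities,
# of the kind used when estimating learning models.
def logit_probabilities(attractions, lam=1.0):
    weights = [math.exp(lam * a) for a in attractions]
    total = sum(weights)
    return [w / total for w in weights]

# Two strategies (e.g. stag and rabbit) with illustrative attractions 2.0 and 1.0:
print(logit_probabilities([2.0, 1.0]))         # ~[0.73, 0.27]
print(logit_probabilities([2.0, 1.0], lam=5))  # ~[0.99, 0.01]: higher lam, sharper choice
```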
All the models are intuitively plausible in general terms, so the next issue concerns the testing of models against each other in empirical terms. This can be done in several ways:
1 Direct testing
Subjects can be asked what types of information they use in making strategy decisions.
2 Indirect testing
Experiments can be set up, using computer formats, to observe what information people use.
3 Statistical testing
This involves using statistical techniques like logit regression and maximum likelihood estimation (MLE) to find models that both fit the data best and make the most accurate predictions. These two desirable criteria do not necessarily go together, as we will see in the discussion of EWA.
Each of these kinds of test will be considered in comparing the different models, which can now be explained in more detail.
Reinforcement learning
This theory became popular as an essential component of the behaviorism movement in psychology in the 1920s, associated with Watson, Pavlov and Skinner. This extreme view of human nature dominated the field until the 1960s. Since then it has largely been discredited, and as a theory of learning in games it performs poorly on all the kinds of test described above.
As can be seen from Table 9.10, reinforcement learning theories propose that subjects use very little information in making strategy choices: just their own previous choices and the resulting payoffs. While such behavior may occur in many non-human animals, various empirical studies have indicated that people use more information than this, relating to other elements in the table.
As far as statistical tests are concerned, reinforcement models may predict the direction of learning correctly, but are usually too slow to match the pace of human learning. This is because in many situations, both in experiments and in real life, there is little reinforcement. Reinforcement can only occur if a subject chooses a good strategy (like a dog responding to the sound of a dinner bell); when a bad strategy is chosen the subject is left searching for a better one with little guidance. Even if a good strategy is selected, this may still be suboptimal, but the subject has no indication of this.
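To illustrate why reinforcement gives so little guidance, here is a minimal sketch of a simple cumulative reinforcement rule; the decay parameter phi and the starting attractions are illustrative assumptions:

```python
# Simple reinforcement learning: only the chosen strategy's attraction is
# updated, using only the payoff actually received. phi is an assumed
# forgetting/decay parameter.
def reinforcement_update(attractions, chosen, payoff, phi=0.9):
    new = [phi * a for a in attractions]  # all attractions decay
    new[chosen] += payoff                 # only the chosen strategy is reinforced
    return new

# A stag hunter (index 0) meets a rabbit hunter and receives 0:
print(reinforcement_update([1.0, 1.0], chosen=0, payoff=0))  # [0.9, 0.9]: no guidance
# Had both hunted stag, the payoff of 2 would reinforce stag:
print(reinforcement_update([1.0, 1.0], chosen=0, payoff=2))  # [2.9, 0.9]
```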
Belief learning
Well-known examples of belief learning in economics go back to Cournot’s model of oligopoly in 1838, described earlier. The Bertrand (1883) and Stackelberg (1934) models of oligopoly are also early examples, and all these oligopoly models feature a best response to behavior observed in the previous period. Thus we have seen that equilibrium in the Cournot case is frequently referred to as a Cournot–Nash equilibrium. In the 1950s models featuring ‘fictitious play’ were proposed (Brown, 1951; Robinson, 1951). In fictitious play, players keep track of the relative frequencies with which other players play different strategies over time. These relative frequencies then lead to beliefs about what other players will do in the next period. Players then calculate expected payoffs for each strategy based on these beliefs, and choose strategies with higher expected payoffs more frequently. The basic fictitious play model weights all past observations equally, but more recent variations of the model give different weights to past observations, reducing weights for observations further back in time. The Cournot model is at the extreme end of this spectrum, where only the most recent observation is taken into consideration. A more recent variation of belief learning has been proposed by Jordan (1991), and involves Bayesian learning. In this scenario players are uncertain regarding the payoffs of other players, but have prior probabilities that are updated over time regarding which payoff matrix the other players are using.
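As a concrete illustration, the following is a minimal sketch of basic fictitious play in the stag hunt of Table 9.9; the history used in the example is hypothetical:

```python
from collections import Counter

# Basic fictitious play: track the empirical frequencies of the other
# player's past strategies, treat them as beliefs, and best-respond.
# The payoff table is the stag hunt of Table 9.9 (row player's payoffs).
payoff = {('stag', 'stag'): 2, ('stag', 'rabbit'): 0,
          ('rabbit', 'stag'): 1, ('rabbit', 'rabbit'): 1}

def fictitious_play_choice(others_history):
    counts = Counter(others_history)
    n = len(others_history)
    beliefs = {s: counts[s] / n for s in ('stag', 'rabbit')}
    expected = {mine: sum(beliefs[theirs] * payoff[(mine, theirs)]
                          for theirs in beliefs)
                for mine in ('stag', 'rabbit')}
    return max(expected, key=expected.get)

# The other player hunted stag in 3 of the last 4 rounds:
print(fictitious_play_choice(['stag', 'stag', 'rabbit', 'stag']))  # 'stag'
```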
As far as empirical testing is concerned, direct measures of the information used by subjects indicate that fictitious play does not explain learning well. Nyarko and Schotter (2002) showed that stated beliefs often deviated from those proposed by fictitious play, even though fictitious play predicts behavior by other players more accurately than the beliefs stated by the experimental subjects. The predictions of the Jordan model have also been tested, with mixed results. The model appears to predict well in a simple situation, but is unlikely to perform well in more complex games. It is difficult to draw definite conclusions in comparing the merits of reinforcement learning compared with belief learning. Different studies have favored different models, and results depend on: (1) the type of game used; (2) the precise specification of the learning model used; and (3) the econometric methods used to test goodness of fit and prediction.
Experience-Weighted Attraction learning
This model was introduced by Camerer and Ho (1999a, b) in response to the perceived weaknesses of both the reinforcement and belief learning models. The most obvious problems were that reinforcement learning models assumed players ignored information about foregone payoffs, while belief learning models assumed that players ignored information about what they had chosen in the past. Since empirical testing indicated that players seem to use both types of information, the EWA model was created as a hybrid to take into account all the relevant information.
The EWA model is therefore mathematically complex in its construction, containing four parameters. These parameters relate to:
1 The weight placed on foregone payoffs.
2 The decay of previous attractions (due to forgetting or being given less importance as the environment changes).
3 The rate at which attractions grow; this affects the spread of choice probabilities for different strategies.
4 The strength of initial attractions, which depends on prior beliefs. This is updated in Bayesian fashion.
It goes outside the scope of this book to describe the details of the EWA model, and the interested reader is directed to the original papers, and to Camerer (2003).
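For a rough flavor only, the core updating rule in Camerer and Ho (1999a) can be sketched as follows. The parameter values and payoffs in the example are illustrative assumptions, and the sketch omits the logit stage that maps attractions into choice probabilities:

```python
# Rough sketch of the core EWA updating rule (Camerer and Ho, 1999a).
# delta weights foregone payoffs, phi decays old attractions, and rho
# governs the growth of the experience weight N. All numbers below are
# illustrative assumptions, not estimated values.
def ewa_update(attractions, n_prev, chosen, payoffs, delta=0.5, phi=0.9, rho=0.9):
    """payoffs[j] = payoff strategy j would have earned, given others' play."""
    n_new = rho * n_prev + 1
    new_attractions = []
    for j, a in enumerate(attractions):
        weight = 1.0 if j == chosen else delta  # foregone payoffs weighted by delta
        new_attractions.append((phi * n_prev * a + weight * payoffs[j]) / n_new)
    return new_attractions, n_new

# A stag hunter (index 0) meets a rabbit hunter: stag paid 0, rabbit would have paid 1.
print(ewa_update([1.0, 1.0], n_prev=1.0, chosen=0, payoffs=[0, 1]))
# (~[0.47, 0.74], 1.9): the foregone payoff pulls the attraction toward rabbit.
```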
When it comes to empirical testing, statistical analysis involving 31 data sets has shown that EWA is generally superior in terms of goodness of fit compared with either reinforcement learning or the weighted fictitious play version of belief learning (Camerer, Ho and Chong, 2002). The model has been criticized as being unnecessarily complex, and including so many parameters that it was bound to fit data better than other models; however, these criticisms ignore the three main strengths of the model:
1 The EWA parameters are not really additional to those in other models. Other models simply implicitly assume certain values for these parameters.
2 The EWA model illustrates the relationship between the reinforcement and belief learning models. By letting certain parameters take on extreme values EWA can become identical with these models.
3 The EWA model not only has a better fit, but it also predicts better (in 80–95% of the studies where comparisons have been made). It is often assumed that these two criteria for a good theory go together, but this is not necessarily true. It is an important point in statistical analysis that incorporating additional parameters in a model will generally improve goodness of fit within the estimation sample. However, the ultimate test of a model is out-of-sample prediction, and a number of studies have shown that models with better fit do not necessarily produce better out-of-sample predictions. The rule used by Camerer, Ho and Chong (2002), which is fairly common, is to use only 70% of the data for estimating goodness of fit (assuming this allows sufficient degrees of freedom); the resulting model is then tested for goodness of prediction against the remaining 30% of the data, as sketched below.
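A schematic sketch of this fit-versus-prediction procedure follows; fit_model and prediction_error are placeholders standing in for whatever estimation routine and error measure a given study uses, and the toy ‘model’ in the demonstration is purely illustrative:

```python
# Schematic 70/30 estimation-versus-prediction split.
def evaluate(observations, fit_model, prediction_error):
    split = int(0.7 * len(observations))           # estimate on the first 70%
    training, holdout = observations[:split], observations[split:]
    model = fit_model(training)                    # goodness of fit measured here
    return prediction_error(model, holdout)        # prediction judged on unseen 30%

# Toy demonstration with a mean-prediction 'model':
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
fit = lambda train: sum(train) / len(train)        # 'model' = training mean
err = lambda model, holdout: sum(abs(model - x) for x in holdout) / len(holdout)
print(evaluate(data, fit, err))                    # 5.0: mean absolute error on holdout
```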
Rule learning
The learning models proposed so far all involve a single ‘rule’ and sticking to it. Stahl (1996, 1999a, b, 2000a, b) has proposed a model, again a hybrid, that allows people to switch from one rule to another, depending on how these rules perform. A rule is essentially a way of weighting various pieces of evidence, relating to the seven elements of information described in Table 9.10. These rules in turn determine strategies, and the probability that a strategy is played depends on the weight attached to a particular rule; these weights are updated according to how each rule performs over time.
Like EWA, this is a complex, multi-parameter model that is difficult to estimate econometrically. However, it has a sound psychological basis and is very flexible. Stahl has shown that, in terms of predicting relative frequencies of choices in a population, rule learning predicts better than other models, including EWA (again using out-of-sample methods).
Behavioral conclusions
Standard game theory does not take learning into consideration at all. This means that the equilibrium predicted by SGT does not involve any change in strategies over time as a game is played repeatedly.
A prominent anomaly here relates to repeated PD games. We have seen that the dominant strategy in the one-shot game is to defect. However, it has been repeatedly observed in experiments that in repeated games it is common for players to cooperate. This phenomenon is discussed in more detail in the next chapter, since it involves the concepts of fairness and social preferences.
The SGT prediction provides a benchmark for comparison for learning models, albeit one that should not be difficult to improve on. Indeed, Stahl (2001) has shown that all the models discussed in this section predict considerably better than the standard equilibrium model. As might be expected, the learning models that fit and predict better tend to be more complex, incorporating more information and more parameters. Empirical results are difficult to compare, since different models perform differently in different games. Recent studies of learning have produced highly contradictory results. For example, a meta-analysis by Weizsäcker (2010), combining the results of 13 experiments carried out in other studies, finds that people ignore the choices and payoffs of other players too often. On the other hand, Goeree and Yariv (2010) find that most subjects imitate the average choices of the other players and do not consider their own possible payoffs enough. In general, reinforcement models tend to do better than belief models in simple MSE games, but belief-learning models perform better in coordination games, market games and iteration games. Good learning models should be flexible enough to perform well in terms of fit and prediction on a universal basis. In view of this problem, Camerer (2003) suggests three main challenges for learning theory in the future:
1 Models should allow for sophistication. This means they should take into account that players understand how other players learn; in turn this involves taking into account the last two elements of information in Table 9.10, the actual and foregone payoffs of other players.
2 Models should allow for incomplete information regarding foregone payoffs. This again requires greater complexity.
3 Models should allow a greater range of possible strategies, combined with some algorithm for reducing these to a feasible number for comparison purposes. The determination of an appropriate algorithm poses a major challenge, since it relies on research from neuroscience. An example of such an algorithm is the ‘somatic marker’ hypothesis proposed by Damasio (1994).
9.8 Summary
9.9 Review questions
1 Explain what is meant by a dominant strategy equilibrium.
2 Compare the concept of dominant strategy equilibrium with Nash equilibrium.
3 Explain the structure of the prisoners’ dilemma game, and show how its equilibrium is determined in SGT.
4 Explain what is meant by a mixed strategy equilibrium, and its implications for optimal strategy.
5 Explain the main findings of the study by Goeree and Holt (2001).
6 Explain the meaning of focal points, and the strategy implications.
7 Explain why equilibria in bargaining games are often different from those predicted by SGT.
8 Explain the role of signaling in games.
9 Explain what is meant by a pooling equilibrium and the circumstances under which it may occur.
10 Explain what is meant by EWA learning.
9.10 Review problems
1 Mixed strategy equilibrium
The table below shows success rates for the receiver in returning a serve in a tennis match.
Table 9.11 Receiver’s success rates
a) Is this a zero-sum game?
b) Explain why a 50/50 randomization strategy is non-optimal for each player.
c) Determine the optimal strategy for each player.
d) Determine the overall success rates for server and receiver, assuming each is using an optimal strategy.
2 Types of game and equilibrium
The table below shows payoffs in a two-player game, where there are two possible strategies: X and Y.
Table 9.12 Payoffs in a two-player game
a) Explain the nature of this game, and derive the equilibrium or equilibria.
b) Is this a cooperative or a competitive game?
c) Is there a focal point in the game?
d) Is there any incentive for the players to signal?
9.11 Applications
Case 9.1 Penalty kicking in professional soccer
Where should top soccer players aim their penalty kicks? One might think at first glance that this is a matter of knowing the kicker’s strengths and the goalkeeper’s weaknesses. Most right-footed kickers are better at aiming to the left of the goal, or the goalkeeper’s right. On the other hand, most goalkeepers are more proficient at diving to their right. However, as soon as the kicker predictably kicks in one direction, the keeper can anticipate this and is more likely to make a save. Also, if the keeper becomes predictable in diving in one direction, the kicker can take advantage of this and aim elsewhere. The key to success for both players is therefore to be unpredictable, randomizing their directions of shooting and diving.
Welcome to mixed strategy equilibrium (MSE), as game theorists refer to it. As is often the case in game theory, the conclusions are counterintuitive. In MSE each player maximizes their success when they are indifferent regarding the direction to aim or dive; if they had any preference, the other player could take advantage of it, which would mean they were not optimizing their chances of success.
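A minimal sketch of this indifference logic for a simplified two-direction penalty game follows. The scoring probabilities below are purely hypothetical, not estimates from the Chiappori, Levitt and Groseclose (2002) data:

```python
# Kicker's equilibrium mix in a simplified two-direction penalty game.
# score[(kick, dive)] = probability the kicker scores; all values hypothetical.
score = {('L', 'L'): 0.50, ('L', 'R'): 0.90,   # kick left, keeper dives left/right
         ('R', 'L'): 0.95, ('R', 'R'): 0.60}   # kick right, keeper dives left/right

# The kicker kicks left with probability p such that the keeper is indifferent
# between diving left and diving right:
#   p*score[L,L] + (1-p)*score[R,L] = p*score[L,R] + (1-p)*score[R,R]
p = (score[('R', 'L')] - score[('R', 'R')]) / (
    score[('R', 'L')] - score[('R', 'R')] + score[('L', 'R')] - score[('L', 'L')])
scoring_rate = p * score[('L', 'L')] + (1 - p) * score[('R', 'L')]

print(round(p, 3))             # ~0.467: kick left just under half the time
print(round(scoring_rate, 3))  # ~0.74: the keeper's dive direction no longer matters
```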
But does this prediction actually describe penalty kicking in the real world? After all, the calculations required to find the correct randomization involve long, complicated formulas. Perhaps surprisingly, the theory predicts remarkably well, according to a study by Chiappori, Levitt and Groseclose (2002). Apparently most penalty kickers in the top French and Italian leagues are extremely good at mixing things up. This does not mean that these players are also extremely good mathematicians; they merely act ‘as if’ they were good mathematicians. Learning and natural selection are responsible for the result.
Let us consider the study in more detail. Five factors aided the development of an appropriate model:
1 Well-defined game structure
The game involves two players, and is a zero-sum game. Each player must determine their move before they can observe the other player’s move. This assumption can be tested empirically. A penalty kick can travel at up to 125mph, and reaches the goal in 0.2 seconds. Thus keepers must move before the kick is made. However, kickers must also determine direction before they observe the keeper move. This means that the game resembles a ‘matching pennies’ game.
2 Well-defined strategy space
Both kickers and keepers can choose right, left or center.
3 Well-defined outcomes
Preferences are easy to determine: kickers want to score and keepers want to prevent a score. Furthermore, these results involve huge financial incentives at the top level.
4 Available data
There is plentiful video-recorded data of top-league games in France and Italy. This provided a sample of 459 penalty kicks.
5 Available history
Players can and do examine the histories of opposing teams. In particular, keepers are trained to save penalties and know the past history of penalty kickers. However, an asymmetry was observed empirically: while keepers treat kickers as individuals with different strategies based on their past history, kickers treat keepers as being homogeneous.
Yet there is one final twist to the story of game theory and penalty taking. One empirical observation was not predicted by the theory. There is one kicking-direction strategy that produces notably more success than any other: kicking straight down the middle. This is what two of the best players of the last decade, Cristiano Ronaldo and Zinédine Zidane, did in the 2006 World Cup, both with success. Why is there this discrepancy with the theory? The anomaly is explained by a factor that the model in the study does not take into account: private costs and benefits to the kicker. If a kicker has a penalty saved when the ball is aimed to the left or right, the save can be put down to the keeper’s skill. However, if the penalty is saved when the kicker aims down the middle he appears an incompetent fool. Thus in the conflict of interests between the individual player and the team, it may be better for the kicker to maximize his own private benefit rather than the benefit of the team, and aim to the left or right.
Questions
1 What type of game does this resemble, in terms of the games described in the chapter?
2 Construct a table showing the normal form of the penalty game, assuming (1) players only take into account the team’s benefits; (2) the penalty kicker is right-footed and is equally able in kicking to the left or right; (3) the goalkeeper is equally proficient in diving to left or right. Determine the equilibrium of the game.
3 Construct a table showing the normal form of the penalty game, assuming (1) players only take into account the team’s benefits; (2) the penalty kicker is right-footed and is 20% stronger in kicking to their left side; (3) the goalkeeper is 20% stronger in diving to their right side. Determine the equilibrium of the game.
4 Construct a table showing the normal form of the penalty game, taking into account both the team’s benefits and individual payoffs, under the same assumptions as the previous question. Determine the equilibrium of the game.
5 Explain the implications of the differences between private and team benefits as far as goalkeepers are concerned.
6 Explain the implications of the differences between private and team benefits as far as team managers are concerned.
Case 9.2 Impasses in bargaining and self-serving bias
In much of the literature on bargaining, failure to reach agreement has often been put down to the problem of incomplete or asymmetric information. The resulting uncertainty was alleged to cause bargaining impasse, since bargainers used costly delay as a signaling device regarding their own reservation values (Kennan and Wilson, 1990; Cramton, 1992). This theory is difficult to test in terms of field studies, and experimental studies have proved difficult because of problems in controlling aspects of the experimental environment.
Loewenstein et al. (1993), Babcock and Loewenstein (1997) and Babcock et al. (1995, 1997) have proposed a different theory regarding failure to reach agreement. This theory relates to the existence of self-serving bias, where subjects conflate what is fair with their own self-interest. They have conducted various experiments relating to legal situations where a plaintiff is suing a defendant for damages. They developed a tort case based on a trial in Texas, in which an injured motorcyclist sued the driver of the car that collided with him for $100,000.
In the first experiment, subjects were randomly assigned the roles of plaintiff and defendant. They then had the experiment explained to them, along with the rules of negotiation and the costs of failing to reach agreement. Both subjects were then given 27 pages of materials from the original legal case, including witness testimony, police reports, maps, and the testimony of the parties. Subjects were informed that the identical materials had been given to a judge in Texas, who reached a judgment between $0 and $100,000 in terms of compensation.
Before negotiation, the subjects were asked to guess the damages awarded, with the incentive of a monetary bonus if their guess was within $5000 of the actual amount. They were also asked to state what they considered a fair amount of compensation for an out-of-court settlement. None of this information was available to the other party. Subjects were then allowed to negotiate for 30 minutes, with the legal costs to each party mounting as the case took longer to settle, at the rate of $5000 for every five minutes delay. If no agreement was reached after 30 minutes, the judge’s decision determined the plaintiff’s compensation. Apart from being paid a fixed fee for participation, the 160 student subjects received additional rewards according to the bargaining outcome, with $1 corresponding to each $10,000 of outcome.
Under normal negotiation conditions, where pairs of players are assigned roles as either plaintiff or defendant from the outset, before reading the details of the case, there was a large average difference in estimates of damages, with plaintiffs estimating the judge’s award to be about $14,500 higher than defendants did. However, this study did not itself demonstrate self-serving bias, since other factors might possibly have caused the discrepancy. Therefore a second study, by Babcock et al. (1995), varied the protocol. In this experiment there were two groups of subjects. The first group acted as a control group; roles were again randomly assigned and the subjects had the same instructions as in the first experiment. The estimates of the judge’s award were even further apart than in the first experiment, with the difference averaging about $18,500. Furthermore, 28% of the pairs of bargainers failed to reach agreement. In the second group, the roles of the subjects were only assigned after they had read all the case materials. This had a dramatic effect on outcomes: estimates of the judge’s award now differed by an average of less than $7000, and only 6% of the pairs failed to reach agreement.
This second study therefore demonstrated that self-serving bias occurs in the encoding of information; other studies have confirmed this process, as bias causes people to ignore information that is not favorable to their interests.
Babcock et al. (1997) found another way of removing self-serving bias in the above situation. After the players were assigned roles and had read the case information, they were told about the possibility of bias and asked to list weaknesses in their position. Again the results were quite dramatic: there was no significant difference in the estimates of damages by both sides; 96% of the pairs settled, and settled more quickly than in either of the previously described protocols.
Questions
1 What other factors might improve settlement rates in the type of legal dispute described above? Suggest other experimental protocols that might be used to investigate these factors.
2 Self-serving bias has sometimes been called the opposite: ‘self-defeating’ bias. Why?
3 What might be the role of self-serving bias in evolutionary terms, bearing in mind that it can be self-defeating?
Case 9.3 Market entry in monopoly
Cooper, Garvin and Kagel (1997a) performed an interesting and revealing study of market entry in a monopolistic situation. In their experiment, monopolists were classified into two types, high-cost (H) and low-cost (L). A potential entrant was considering entry, but did not know the monopolist’s type. The game was a simple sequential game, with the monopolist moving first by determining output and the potential entrant then deciding whether or not to enter. The game was repeated over a number of periods and cycles, in order to gain some insight into the learning process.
Monopolists moved by determining an output in the range 1–7 units. H firms maximized profit at the output of 2 units and made losses if output exceeded 5 units, regardless of whether the other firm entered. L firms maximized profit at 4 units, and continued to make profit up to the maximum output. Profits were obviously much higher for both H and L firms if the other firm did not enter. For entrants there were two different playing protocols. In the first one (LP), payoffs from entering were generally lower, and the expected value of entry (based on the prior probability of 0.5 that the monopolist was H) was less than the payoff from staying out. In the second protocol (HP), payoffs were generally higher for the entrant, and the expected value of entry based on prior probabilities was greater than the payoff from staying out.
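The entrant’s expected-value comparison is simple to sketch. The payoff numbers below are hypothetical stand-ins for the LP-protocol values, not the actual experimental parameters:

```python
# Expected value of entry under the 0.5 prior, when the entrant cannot
# observe the monopolist's type. All payoff values are hypothetical.
prior_H = 0.5
payoff_enter_vs_H = 300   # assumed: entry is profitable against a high-cost firm
payoff_enter_vs_L = 74    # assumed: entry is unprofitable against a low-cost firm
payoff_stay_out = 250     # assumed payoff from staying out

ev_entry = prior_H * payoff_enter_vs_H + (1 - prior_H) * payoff_enter_vs_L
print(ev_entry, ev_entry < payoff_stay_out)  # 187.0 True: pooling deters entry
```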
In both protocols the monopolist moves first, determining output. A high output in the initial move acts as a signal that the monopolist is of the L type, and therefore the potential entrant (E) should stay out in order to maximize its own payoff. This signal is obviously more costly for H firms than for L firms, but there is some incentive for H firms to hide their type and aim for a pooling equilibrium, where E cannot see what type of firm the monopolist is. If E cannot distinguish between the two types of monopolist, he is forced to use prior probabilities, and in the LP protocol this will deter E from entry (based on expected values), giving greater payoffs to both H and L firms.
According to standard game theory, there are several equilibria in the LP protocol. There are two pure-strategy separating equilibria, where H produces an output of 2, while L produces either 6 or 7, deterring entry. There are also several pooling equilibria, with both H and L types producing the same output, any level from 1 to 5. In this case, since E cannot observe type, he is deterred from entry, as explained above.
In the HP protocol the SGT equilibrium is different, since it now becomes profitable for L firms to produce 6 or 7 units, an unprofitable output for H firms, resulting in a separating equilibrium. The higher output is necessary to convince E that they are indeed low-cost firms and that entry is therefore not worthwhile. In this situation there are also several partial pooling equilibria, where the H and L types do not make exactly the same choices but the sets of choices they sometimes make overlap.
In a second version of the experiment Cooper, Garvin and Kagel (1997b) increased the payoffs of H firms at outputs of 6 and 7, so that they were still positive at the highest levels of output instead of making losses as previously. This made it more difficult for L firms to give a credible signal and separate, and therefore more difficult for E firms to decide whether or not to enter. The objective here was to see how the rate of convergence and the learning rate would be affected; they were predicted to be slower than in the first experiment.
A number of important empirical findings emerged from both of these experiments:
1 Players played as ‘myopic maximizers’ at the start, maximizing payoffs without regard to how their opponent would perceive this action. This applied in both protocols.
2 In the first LP protocol, H players soon learned to increase output to conceal their type, leading to a pooling equilibrium at the output of 4 units. By the end of all the sessions nearly 70% of the H players had settled on this output, as had nearly all the L players. Only 6% of potential entrants entered.
3 In the second HP protocol, H players again learned to increase their output from 2 to 4 units to try to pool with the L players. However, the L players then gradually learned to increase their own output to 6 units, to separate themselves from the H players, who could not make a profit at such a high output. By the end of the game there was essentially a separating equilibrium, with 80% of L players producing 6 units and nearly half the H players producing 2 units. However, there was another spike of H players, with 32% of them still trying to conceal their type by producing 4 units. This turned out to be a failed strategy, since all the E players entered at all outputs up to and including 5 units.
4 In the second experiment convergence was not only slower, as predicted, but there was no real pattern of convergence at all. H players tended to average an output of 3 throughout the periods, while L players usually produced 4 units, with a gradual upward trend. The result was a partial pooling equilibrium, with overlapping outcomes. Instead of all E players entering at the output of 4, as in the first experiment, only 72% entered at this output, as it became more difficult for E players to infer type.
5 When steps of iterated dominance were explained to the subjects, the rate of equilibration was faster.
Questions
1 How do the empirical findings from the experiments compare with the predictions of standard game theory? What new light do they add to our knowledge of game behavior?
2 Explain what is meant by a credible signal, using the experiments as an illustration.
3 Explain the relationship between iteration and learning in the context of these experiments.