THE PRISONER’S DILEMMA IN REPEATED INTERACTIONS
Do Drawn Knives Increase Cooperation in the World?
SPONTANEITY, AUTOMATIC RESPONSE, AND SPEED OF REACTION ARE among the most important characteristics of emotional responses. In fact, there are many cases in which sheer speed of reaction is one of the advantages that emotional responses have over careful deliberation. Our instinctive recoil on seeing a snake crawling in the grass saves us from possible danger far more efficiently than would cognitive analysis of the situation.
The speed and automatic nature of our social reactions are, it turns out, very important. I will show in this chapter how emotional behavior, perhaps paradoxically because of its automatic nature, can bring about cooperation in situations in which rational behavior fails to do so.
We will take another look at the Prisoner’s Dilemma, but this time we will concentrate on situations in which the players play the same game many times. This means that the players need to take into account long-term strategic considerations.
In the previous chapter we showed that rational and selfish individuals will not cooperate in the Prisoner’s Dilemma game when it is played only once, since noncooperation is a so-called “dominant strategy”—it guarantees a higher payoff no matter what the other player does. Now consider what happens if the game is played twice. At each stage of play, each player decides whether to cooperate (“be generous”) or not to cooperate (“take”) at that stage. After both stages are completed, the total payoff that the players receive is the sum of the payoffs in each of the two stages.
To analyze rational behavior in this repeated game, we start by concentrating on the second stage of the game. In the second stage, the original Prisoner’s Dilemma is in effect being played only once—there is no next stage in which to punish or reward behavior this time around. Thus the strategic analysis is identical to that of the one-stage Prisoner’s Dilemma, which we have already shown leads to the conclusion that the only rational behavior is noncooperation by both players.
Knowing what rational players will do in the second stage, we can predict how they will behave in the first stage of the game. Since both players will refuse to cooperate in the second stage no matter what happens before it, the players’ behavior in the first stage has no influence on their payoffs in the second stage; the first stage is therefore, in effect, a one-stage Prisoner’s Dilemma as well. In the first stage the players will once again both choose not to cooperate.
It is not difficult to see that the same reasoning applies to any number of repeated stages, as long as both players know the exact number of stages that will be played, whether it is one or three or a hundred thousand. In greater detail, when both players know that they are playing the last stage of the game, they have no rational reason to cooperate, no matter what happened in the previous stages. But it then follows that in the second-to-last stage they will not cooperate either, and so on. This sort of reasoning, which works backward from the final stage, is called backward induction, and it is often used in game theory.
Notice that the induction here starts off with both players not cooperating in the last stage. But what happens if the players don’t know when the last stage will be, even as it is happening? Most human interactions are, in fact, like this. Consider for example the interactions you have with your regular car mechanic, your colleagues from work, or even your spouse. You almost never know exactly how many more times in the future you will interact with them. This naturally leads to the question: What rational behavior can be expected without the assumption that the players know when they will reach the last stage of a repeated game?
Robert Aumann answered this question, and his answer is considered one of his most important contributions to game theory. Using a mathematical model, Aumann proved that in such situations cooperation is possible in equilibrium even when the players are rational. Both the model and Aumann’s proof are beautiful and deep constructions. Explaining them fully requires delving into a level of formal mathematics that is beyond the scope of this book, so let me try to describe them in plainer terms.
Imagine yourself playing the Prisoner’s Dilemma repeatedly in a situation where after every stage there is a 99 percent chance that you will play the same game again against the same player and a 1 percent chance that you will never meet that same person again. This description is a bit unrealistic—it probably overstates the number of interactions you’re likely to have in the long run with any one person. But it is valuable for describing the short-term mindset of most interactions, so let’s put that objection aside for now.
We need to consider what “strategy” means in this case. In the one-stage game a strategy is simply a decision of whether or not to cooperate. In a repeated game the concept of a strategy is much more complicated; it is in effect a thick book of decisions, with each decision specifying the action that you will choose given what has happened up to now in the play of the game. Here is an example of such a strategy: up to the 700th stage I cooperate no matter what the other player has done, and from the 700th stage onwards, after every stage in which the other player does not cooperate, I reciprocate by not cooperating for the next two stages.
If this seems like a complicated strategy, my response is that it is actually a very simple one—note that I managed to describe it entirely within a sentence and a half. There are strategies that are so complicated that to write them down even for the first few stages I would need more paper than can be found in the entire Library of Congress (including the paper in the restrooms). Often, however, the most complicated strategies are the least interesting. In fact I will describe in this chapter two strategies that are extremely simple but of great interest (a brief sketch of both in code follows the list). They are:
1. The Grim Trigger strategy—in the first stage I choose “be generous” and continue to be generous as long as the other player also chooses “be generous.” However, if the other player chooses “take” at some stage (even if the other player does this only once) then I will forever after that choose “take” in every subsequent stage.
2. The Tit-for-Tat strategy—at every stage I choose the same choice that the other player chose in the previous stage.
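To make these two strategies concrete, here is a minimal sketch in Python. The function names are my own, and the decision to have Tit-for-Tat cooperate in its very first stage, which the description above leaves open, is an assumption that follows the usual convention. Each strategy is simply a function from the other player’s past moves to this stage’s choice:

```python
GENEROUS, TAKE = "be generous", "take"

def grim_trigger(opponent_moves):
    """Cooperate until the other player takes even once; after that, take forever."""
    return TAKE if TAKE in opponent_moves else GENEROUS

def tit_for_tat(opponent_moves):
    """Copy whatever the other player did in the previous stage.
    The first-stage move is not specified in the text; we assume cooperation."""
    return opponent_moves[-1] if opponent_moves else GENEROUS
```

Both fit comfortably in a few lines, which is exactly the sense in which they are simple strategies: their “book of decisions” depends on the history of play only in a very coarse way.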
Two rational players (whose only goal is their personal material gain) who both use the Grim Trigger strategy will find themselves in an equilibrium under which they will both cooperate (choose “be generous”) forever. The explanation for this is quite simple. First note that if both players are using the Grim Trigger strategy, then they cooperate in the first stage. Both players will recognize that the other player has cooperated. The strategy will then lead both of them to cooperate in the second stage, and similarly to cooperate in the third stage, and so on. At each stage in which they both cooperate they each add $50 to their total winnings.
Neither player can do better than this by choosing a different strategy, as long as the other player sticks to the Grim Trigger strategy. It is true that if one of the players were to choose “take” at some stage, when the other player is using the Grim Trigger strategy, then the player who chose “take” will get $100 at that stage, making him better off by $50 than he would have been had he chosen “be generous.” But by doing so he will have triggered the other player’s “punishment”: in every subsequent stage (and there are many subsequent stages to be expected) he will lose $50 instead of gaining $50, as the other player steadfastly chooses “take” no matter what happens. Note that stable cooperation is attained here because any noncooperation triggers immediate noncooperative retribution from the other player, creating effective deterrence against noncooperation.
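To see the arithmetic behind this deterrence, here is a minimal back-of-the-envelope sketch. It assumes the 99 percent continuation probability from the earlier setup and the stage payoffs just quoted: $50 per stage for mutual generosity, $100 for a one-time “take” against a generous opponent, and a loss of $50 per stage once mutual taking sets in.

```python
p = 0.99                                   # probability that another stage follows
expected_future_stages = p / (1 - p)       # roughly 99 more stages on average

# Stick with Grim Trigger: gain $50 now and $50 in each expected future stage.
cooperate_forever = 50 + expected_future_stages * 50        # about $5,000 in expectation

# Deviate once: gain $100 now, then lose $50 in each expected future stage
# once the other player's Grim Trigger punishment kicks in.
deviate_once = 100 + expected_future_stages * (-50)         # about -$4,850 in expectation

print(cooperate_forever, deviate_once)
```

The one-time gain of $50 is swamped by the expected losses during the punishment phase, which is why neither player wants to be the first to take.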
IN HIS NOBEL PRIZE ACCEPTANCE LECTURE IN STOCKHOLM, SWEDEN, Robert Aumann spoke about a game theoretical insight very similar to the one presented above. He even claimed that this insight explains the essence of nearly every international conflict, including the Israeli-Palestinian conflict. The message was that to prevent bloodshed, people need to create mechanisms for deterrence using tough strategies, as the United States and the Soviet Union did during the Cold War. Only strong deterrence, under this argument, can prevent people from succumbing to incentives for conflict.
Shortly after Aumann’s Nobel Prize ceremony several journalists and commentators approached me and asked me to respond to this claim. I argued that despite the fact that the insight presented by Aumann is deep and beautiful, and that I couldn’t point to anyone who deserved a Nobel Prize more than Aumann, there are very few direct connections between the elegant mathematical results in this field and concrete, applicable conclusions for international conflicts. Deterrence alone is too unstable a situation to be used as a dependable basis for maintaining peace and preventing bloodshed—any small change may set off the “Grim Trigger.” Although the theoretical model implies that cooperation is an equilibrium under conditions of deterrence, once that equilibrium is broken the entire edifice on which peace and cooperation depend is shattered, because the very threats underpinning the deterrence are liable to lead to disasters on a global scale (imagine what would have happened if the United States and the Soviet Union had actually carried out the bellicose threats that they regularly issued during the Cold War).
Deterrence alone is not sufficient. Alongside deterrence based on threats we need to construct systems that include positive inducements for both sides, such as joint economic interests, for example, that can serve as an additional source of stability in international relations. This is similar to the idea that individuals should be motivated using both carrots and sticks.
Some people went much further than I did in opposing some of the ideas presented in Aumann’s Nobel lecture. A group of Israeli leftists formally petitioned the Nobel Committee to withdraw Aumann’s prize because of his political opinions and the political lessons he draws from his scientific research. This infuriated me (possibly an irrational emotional response). If science were administered along strictly politically correct lines and its leading practitioners were rewarded only on the basis of their political opinions, human progress would still be stuck in the same place it was during the Dark Ages.
The Tit-for-Tat strategy is less drastic than the Grim Trigger strategy but still ensures equilibrium. The Tit-for-Tat strategy also punishes noncooperation on the part of one player, but its punishment lasts only one stage, making it more forgiving than the punishment of the Grim Trigger strategy. If the noncooperative player goes back to cooperating in the next stage, then the punishment ceases and the players go back to playing cooperatively in each stage.
It turns out that Tit-for-Tat leads to a cooperative equilibrium; neither player can profit by unilaterally choosing not to cooperate. If a player does choose not to cooperate for a few stages and then cooperates again, the play of the game will go back to a cooperative path in the future, but until that happens she will lose more than she gained from temporarily not cooperating. (It takes a little math to show this, but see for yourself if you like. What happens if a player chooses not to cooperate for only one stage? How much does she gain that first round, and how much does she lose thereafter?)
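If you would like to check the arithmetic yourself, here is a minimal sketch of the one-stage deviation. It borrows the dollar payoffs used later in this chapter ($150 each when both cooperate, $200 for defecting against a cooperator, and $0 for cooperating against a defector); applying them to this earlier discussion is my own assumption, made purely for illustration.

```python
# Against a Tit-for-Tat opponent: defect for one stage, then return to cooperating.
gain_in_deviation_stage = 200 - 150   # +$50: you defect while she still cooperates
loss_in_next_stage = 0 - 150          # -$150: she copies your defection while you cooperate
net_change = gain_in_deviation_stage + loss_in_next_stage
print(net_change)                     # -$100: the deviation leaves you worse off overall
```

From the stage after that, both players are back to cooperating, so the net effect of the one-stage deviation is simply this $100 loss.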
We have so far considered repeated interactions in which, after each stage, both players expect with high probability that there will be another stage of the game. What happens in other situations? Consider two concrete examples. Imagine yourself enjoying a week of vacation in Malaga, Spain. On the first day of the vacation you walk into a restaurant and are so pleased with the excellent meal served there that you decide to return and eat there on every remaining day of the vacation. Each time you eat at the restaurant the same waiter serves you. In this scenario, your interactions with the waiter are in effect a six-stage (six being the number of remaining days in the vacation) repeated Prisoner’s Dilemma.
Cooperation, which involves the waiter giving you good service and you reciprocating with a generous tip, is clearly worthwhile in this situation. Note that on each day of your vacation—except for the last one—you expect with high probability to interact with the waiter again. On the last day, however, you expect with high probability that you will not be returning to the same restaurant at any time in the foreseeable future, since it is the last day of the vacation, your flight tickets were booked long ago, and you need to be back at work the day after tomorrow.
Can the Grim Trigger strategy ensure a cooperative equilibrium on each day of the vacation? Clearly not (again, assuming rational considerations with the single goal of selfishly maximizing your material condition). Even if the waiter is under the impression that you will be staying in the city for a very long period of time with an uncertain last day, cooperation will not be maintained on each day of your vacation, for the simple reason that on the last day of your vacation you have no (selfish) reason to leave the waiter a tip. There is only a very small probability that you will return to the same restaurant the next day (your flight might be canceled, so we can suppose that probability is small but not zero). It follows that if you walk out without leaving a tip, the probability that the waiter will be able to punish you with bad service in the future is very small.
If the waiter is sufficiently rational, intelligent, and “selfishly materialistic,” he will understand that at some point there will come a day in which you will walk out of the restaurant without leaving a tip even if he gives you first-class service. That might be enough to wipe out the incentive he has to serve you well every day: he knows with certainty that a day without a tip is coming, he just does not know exactly when that day will arrive.
This description of the peculiar relationship between a vacationer in Malaga and a local waiter might seem a bit over the top, but it actually plays out this way more often than you might think. It is known that people tend to give larger tips in local restaurants where they are regular customers than in unfamiliar restaurants that they happen to stumble upon and to which they are very unlikely to return. Service is also usually better in restaurants whose customers are local residents who frequently eat there than in tourist traps.
Despite this, we still often leave tips, even in places where there is no material gain for us in doing so. Why do we do this? Why do we pass up the opportunity to exploit the “last day effect” cynically whenever we can? (In fact, there are people who tend to leave an especially large tip on the last day of a vacation as a form of gratitude for the good service they received over several days.)
The answer, unsurprisingly, lies in our emotions. Remember, in the real world we play out our Prisoner’s Dilemma–like situations over and over again, not just once. To help think about this, let me introduce the concept of automatons.
Automatons were invented by computer scientists, but they are now widely used in models in economics and game theory. My own small contribution is the observation that emotions can be described using automatons, and that doing so can lead to new insights, even though automatons are machines.
Automatons are defined using the following components (and only these):
1. A set of states
2. A set of actions
3. An outcome function that, given a state and an action, determines a new resulting state
4. An action function that associates each state with an action
5. An initial state
A copy machine that makes one hundred copies is a good example of an automaton.
Its set of states is the set of all the integer numbers from zero to one hundred (hence there are 101 states).
Its set of actions contains only two actions, “copy” and “stop.”
Its outcome function takes every state x (between zero and one hundred) and returns the state x+1 if the action is “copy.” If the action is “stop,” the function returns the state x, that is, the state does not change.
Its action function returns “copy” for every state that is less than one hundred and returns “stop” when the state is one hundred.
Its initial state is zero.
You can see that based on the way it is defined, this automaton will start at state 0, then move on to state 1 followed by state 2 and so on. At each of these states the automaton will make a copy of the document until it reaches the state 100, at which point it stops. (If this description reminds you of a computer program, there is a good reason for that. An automaton is essentially a simple computer program.)
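Since an automaton is essentially a simple computer program, it is easy to write one down explicitly. Here is a minimal Python sketch of the five components listed above, instantiated as the hundred-copy machine (the class and variable names are my own, chosen purely for illustration):

```python
class Automaton:
    def __init__(self, states, actions, outcome, action_of, initial_state):
        self.states = states          # 1. the set of states
        self.actions = actions        # 2. the set of actions
        self.outcome = outcome        # 3. outcome function: (state, action) -> new state
        self.action_of = action_of    # 4. action function: state -> action
        self.state = initial_state    # 5. the initial state

    def step(self):
        """Choose the action dictated by the current state, then move to the new state."""
        action = self.action_of(self.state)
        self.state = self.outcome(self.state, action)
        return action

copy_machine = Automaton(
    states=range(101),                                    # the integers 0 through 100
    actions={"copy", "stop"},
    outcome=lambda x, a: x + 1 if a == "copy" else x,     # "copy" advances the count
    action_of=lambda x: "copy" if x < 100 else "stop",    # keep copying until 100
    initial_state=0,
)

copies_made = 0
while copy_machine.step() == "copy":    # runs exactly one hundred times, then stops
    copies_made += 1
print(copies_made)                      # 100
```

Running the machine step by step reproduces exactly the behavior described above: it copies one hundred times and then stops.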
You might think of automatons (and computers) as the exact opposite of emotional beings, yet they are similar in at least one way: if you know the circumstances, they are predictable. If I react emotionally to the situations in which I find myself and always draw a knife when insulted, then my behavior can be described using only two states: (1) I feel insulted, and (2) I do not feel insulted. My action function causes me to draw my knife if (and only if) I feel insulted. I am, in effect, an automaton, and not even a very complex one.
In contrast, if I am a purely rational person, then my behavior will be more complex. A sense of insult alone might not suffice to cause me to draw my knife. I might do that only if I feel insulted and I am also persuaded that the person who insulted me cannot later prove in a court of law that I used a knife against him. The sub-situation in which it cannot be proved that I used a knife is itself composed of many other sub-situations (who else is in the area and can serve as a witness, is there a surveillance camera in operation that can be used in court, etc.). We see that the number of states needed to describe the behavior of a rational person is far greater than the number of states in the description of an emotional person, making the use of automatons for modeling rational behavior much more difficult. (Remember, emotions are adept at creating commitment—we are less likely to respond to such subtleties as the presence or absence of criminal witnesses when feeling insulted or angry.)
Hence, the crucial difference between a rational and an emotional reaction is that the latter is less dependent on the circumstances. This doesn’t mean that an emotional person will always react in the same manner to an insult, but it does say that a rational person’s reaction will be more dependent on the circumstances of the event (this is also consistent with the fact that a rational state of mind is associated with more self-control).
The emotional, “automaton” description feels a bit more like real life, doesn’t it? You might be puzzled by the “drawn knife” example above—after all, how could drawing knives possibly lead to useful cooperation? But that intuition is wrong. The emotional behavior that leads to drawing knives is a positive element in forming cooperation. To be more precise about this, and to avoid exaggeration, let’s state it this way: vengeful behavior, in the right dosage, can be a positive element in forming cooperation. Hesitant and over-forgiving emotional behavior will not lead to cooperation. On the contrary, it leads to egotism, because in a world in which every action is forgiven, every individual has an incentive to act egotistically while harming others.
Imagine that you are the following automaton playing the game:
1. The set of states represents your emotional state: you are either angry or calm.
2. The actions are either “cooperate” or “don’t cooperate.”
3. The outcome function takes the action chosen by the other player in the previous stage and determines your state in the current stage as follows: if the other player chose “cooperate,” you are now calm, but if the other player chose “don’t cooperate,” you are now angry.
4. The action function takes your state into account and determines the action you choose as follows: if you are calm, then you choose “cooperate,” and if you are angry, you choose “don’t cooperate.”
5. Your initial state is “calm.”
If both players are automatons as described above, then they will definitely cooperate throughout all the stages of the game. This follows because they both start in a calm state, leading each of them to cooperate, which further keeps them both in a calm state and so on; no player will ever be angry.
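Here is a minimal sketch of this emotional automaton in code, with two copies of it playing against each other (the names are mine; the dollar payoffs are not needed just to see that both players stay calm and cooperative):

```python
COOPERATE, DEFECT = "cooperate", "don't cooperate"

def next_state(opponent_action):
    """Outcome function: the other player's last action determines your mood."""
    return "calm" if opponent_action == COOPERATE else "angry"

def action_of(state):
    """Action function: a calm player cooperates, an angry player does not."""
    return COOPERATE if state == "calm" else DEFECT

state_1, state_2 = "calm", "calm"            # both players start out calm
for stage in range(10):                      # any horizon gives the same picture
    a1, a2 = action_of(state_1), action_of(state_2)
    print(stage, a1, a2)                     # both cooperate at every stage
    state_1, state_2 = next_state(a2), next_state(a1)
```

Neither player ever becomes angry, so both print “cooperate” at every stage, exactly as claimed.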
We need to check whether a player can gain more by behaving as if he were a different automaton (assuming that he is playing against the automaton described above). For example, we can imagine that one of the players is always in an angry state no matter what happens, or is always in a calm state no matter what.
To gain more, even in the short term, a “deviating” player will need to choose “don’t cooperate” in at least one stage, giving him a payoff of $200 as opposed to $150 (since his opponent will have chosen “cooperate”). But this behavior will have implications in the later stages of the game. After the deviating player has chosen “don’t cooperate,” the other player will be in an angry state, leading her to choose “don’t cooperate” in the next stage. If the deviating player chooses “cooperate” at that stage then he will get $0 instead of $150, so that he ends up losing more than he gained from his one-time deviation. If the deviating player instead continues to choose “don’t cooperate,” in subsequent stages he will lose $100 (relative to what he could gain if he always chose “cooperate”) every time he does so.
The only opportunity for a deviating player to gain a profit is when his behavior has no implications for the future—which is when there is no relevant future, that is, in the very last stage of the game. But if the deviating player is a two-state automaton whose state depends only on the actions of the other player (meaning that he exhibits emotional behavior), his actions cannot depend on which stage the game is currently in. We conclude that an emotional player cannot improve his total payoff by acting differently from the automaton described above. It follows that cooperation at each and every stage forms an equilibrium.
The interesting point here is that each of the two emotional players in this situation will earn more under equilibrium than either would if both players were rational players playing the same game. From this perspective, emotional behavior is better for sustaining cooperation in the repeated Prisoner’s Dilemma game even when the number of stages of the game is definitely known by both players.
Now, let’s return to our Spanish waiter and why you tip him. In your interaction with the waiter each of you behaves like an automaton with two possible actions: “tip” and “don’t tip” for you, and “provide good service” and “provide bad service” for the waiter. Every day each of you is controlled by one of two emotional states: “anger” and “happiness.” These states are determined by the most recent action of the other party. You are happy if you got good service, and the waiter is happy if he gets his tip. Finally, a state of happiness drives you to leave a tip and drives the waiter to provide good service. All this dictates a dynamic in which the date (i.e., whether it is the last day of your vacation) plays no role. You and your waiter are automatons too simple to get the date into the equation. If you are an emotional automaton, as so many of us seem to be, then you will reward the waiter for today’s service, and he will reward you with quality service in response to the tip you left during your last visit to the restaurant. The fact that today is your last day in Spain doesn’t matter; you will only punish bad service.
If you feel a little insulted by this description, you shouldn’t. You are conscious and smart enough to know the date and whether it is the last day of a vacation in Spain, but your emotional state prevents you from making the link between this piece of information and the decision of whether or not to leave a tip.
What would happen if one of you, say you yourself, is perfectly rational (and selfish) while the other is an emotional automaton like the one described above? You would still tip your waiter on every day except the last one: on any earlier day, failing to tip would trigger bad service tomorrow, but on the last day there is no tomorrow for you at that restaurant. If both of you are perfectly rational, however, the waiter can expect you not to leave a tip on your last day, and will therefore offer bad service. As argued earlier in the context of the Prisoner’s Dilemma, your cooperation is then doomed to fail: you will leave no tip, and you will get lousy service throughout your entire vacation.
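A minimal sketch of this asymmetric case may help. The waiter is modeled as the emotional automaton, while you are the rational vacationer who skips only the last tip; the six-day horizon follows the Malaga example, the waiter’s initially happy mood is an assumption, and the names are mine.

```python
DAYS = 6                                    # the six remaining days of the Malaga vacation

def waiter_service(mood):
    """The waiter's action function: a happy waiter serves well, an angry one does not."""
    return "good service" if mood == "happy" else "bad service"

mood = "happy"                              # assume the waiter starts out happy
for day in range(1, DAYS + 1):
    service = waiter_service(mood)          # today's service reflects yesterday's tip
    you_tip = day < DAYS                    # the rational vacationer skips only the last tip
    print(day, service, "tip" if you_tip else "no tip")
    mood = "happy" if you_tip else "angry"  # the waiter's mood reacts to today's tip
```

The printout shows good service on every one of the six days even though no tip is left on day six: the punishment the waiter’s automaton would mete out falls on a day seven that, for you, never happens.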
The main take-home insight from the entire analysis here is quite surprising: it is simplicity and straightforwardness, rather than sophistication and subtlety, that are conducive to cooperation and eventually to making both parties in an interaction better off.