5
Risk and Reward

Defining decision-making in terms of action-selection creates the potential for decisions that are illogical from a value-based perspective. The mammalian decision-making system includes multiple decision-making components, which can produce conflict between action options.

A teenager floors the accelerator as he navigates the winding curves of a wooded road; the car skids at each turn, barely holding the road. A college student spends thousands of dollars on clothes and doesn’t have enough money left to pay her rent. An alcoholic stops by the bar on the way home. “Just one drink,” he says. Hours later, his kids come to take him home. A firefighter stands frozen for a moment before a burning building, but then gathers her courage and rushes in to pull a child to safety. Each of these examples is a commonly cited case of a conflict between decision-making systems.

The teenager is balancing the thrill of speed with the risk of losing any potential future he might have. The college student spends the money now for the short-term gain of having clothes and ignores (or forgets) the long-term gain of actually staying in her apartment. The alcoholic risks his family, his happiness, and often his job and livelihood for the simple, immediate pleasures of a drink. (Of course, the problem is that the one drink turns into two and then three and then more.) The firefighter must overcome emotional fear reactions that say “don’t go in there.” These are all conflicts between decision-making systems.

Risk-seeking behavior

Colloquially, risk entails the potential for loss in the face of potential gains. (Our hypothetical teenager is risking his life driving too fast.) In economics, the term risk is defined as the variability in the outcome1—the stock market has more risk than an FDIC-insured savings account. This is a form of uncertainty, and it is now known that there are representations of risk and uncertainty in our brains.2 As we will discuss in depth in Chapter 14, there are differences between expected variability (a known probability, such as the chance of rain), unexpected variability (a surprise, such as when something that has been working suddenly breaks), and unknown variability (ambiguity, when you know that you don’t know the probabilities). Animals will expend energy to translate unknown variability into expected variability and to reduce expected variability.3
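To make the economist’s definition concrete, here is a minimal worked example with made-up numbers (not drawn from the text): a guaranteed $5 and a fair coin flip between $0 and $10 have the same expected value, but only the coin flip carries risk in the variance sense.

```latex
% Illustrative (made-up) example: same expected value, different variance.
\begin{aligned}
\text{Sure \$5:} \quad & E[X] = 5, & \operatorname{Var}(X) &= 0 \\
\text{Coin flip (\$0 or \$10):} \quad & E[Y] = \tfrac{1}{2}(0) + \tfrac{1}{2}(10) = 5,
  & \operatorname{Var}(Y) &= \tfrac{1}{2}(0-5)^2 + \tfrac{1}{2}(10-5)^2 = 25
\end{aligned}
```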

Imagine you are an animal, entering a new environment. You don’t know what dangers lurk out there. You don’t know what rewards are available. Before you can make yourself safe from those dangers and before you can gain those rewards, you have to explore the environment.4 Imagine that you have explored part of the environment. Now you have a choice: Do you stay and use the rewards you’ve already found, or do you go looking for better ones?

Animals tend to be risk-averse with gains and risk-seeking with losses. This is the source of the difference in the Asian flu example in Chapter 3—phrased in terms of saving lives, people take the safe option, but phrased in terms of losses, people take the risky option.5 But this isn’t the whole story either, because risk and reward interact with exploration and uncertainty.
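As a sketch of the structure of that example (using the standard numbers from Tversky and Kahneman’s version of the problem; the details as presented in Chapter 3 may differ), both programs have the same expected outcome, and only the framing changes:

```latex
% Gain frame (standard numbers from Tversky & Kahneman's version of the problem):
\begin{aligned}
\text{Program A (certain):} \quad & 200 \text{ people saved} & E &= 200 \\
\text{Program B (gamble):} \quad & \tfrac{1}{3}\times 600 \text{ saved},\ \tfrac{2}{3}\times 0 \text{ saved} & E &= 200
\end{aligned}
% Phrased as gains (lives saved), most people choose the certain Program A;
% phrased as losses (400 die for sure vs. a 2/3 chance that 600 die),
% most people choose the gamble, even though the expected outcomes are identical.
```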

Interestingly, the most risk-seeking behavior tends to occur in adolescents.6 Adolescents are at the transition from dependence to independence. Humans first develop through a youth stage during which they are protected and trained. At the subsequent adolescent stage, humans go exploring to find out what the parameters of the world are. As they age out of adolescence, humans tend to settle down and become less risk-seeking. This capacity to learn the parameters of the world after a long, protected training period makes humans particularly flexible in how they interact with it.

An interesting question, then, is what drives exploration. Animals, including humans, have an innate curiosity that drives us to explore the world. But as the proverb says, “curiosity killed the cat.” Exploring is dangerous. A fair assessment of the dangers could leave us unwilling to take a chance. On average, however, people tend to underestimate dangers and to be overoptimistic about unknowns,7 a bias that drives exploration while still allowing intelligent behavior in the parts of the world we do know. (Remember, “the grass is always greener on the other side of the hill”—except when you get there, and it’s not.)

So how is the teenager speeding down a road an example of exploration overwhelming exploitation? Our hypothetical teenager is unlikely to be speeding down an untraveled road. When animals are exploring new territory, they tend to travel very slowly, observing everything around them.8 But our hypothetical teenager is testing his limits, trying to figure out how good his reaction times are and how fast he can go while still maintaining control of the car. If he is overestimating his abilities or underestimating the likelihood of meeting an oncoming car on that winding, dark road late at night, the consequences can be tragic.

We will see throughout this book that personality entails differences in the parameters underlying the decision-making system. Here we have two parameters and an interaction: How curious are you? How overoptimistic are you in the face of the unknown? How willing are you to risk danger for those new answers? (Adolescents, of course, are notorious for both of these properties—risk-taking behavior driven by curiosity and overoptimism.) But each individual, in both youth and adulthood, has a different take on these parameters,9 and thus each of us has our own level of risk-seeking and overoptimism, our own threshold for what risks we will tolerate, and our own degree of error in estimating those risks. Balancing risk and reward can be the difference between success and failure, and finding the right balance is not always easy.

Waiting for a reward

Which would you rather have, $10 today or $10 in a week? Barring some really strange situations, pretty much everyone would want the $10 today. What this means is that $10 in a week is worth less to you than $10 today. We say that rewards delivered in the future are discounted.10 Logically, discounting future rewards makes sense because things can happen between now and then; waiting for the future is risky.11 If you get hit by a bus or win the lottery or the world explodes in thermonuclear war, that $10 promised in a week just isn’t worth as much as $10 in hand today.

$10 today can also be invested,12 so in a week, you could have more than $10. This of course works with nonmonetary rewards as well (food, mates, etc.).13 To a starving animal, food now is worth a lot more than food in a week. If you starve to death now, food next week just isn’t that valuable. And, of course, in the long term, food now will lead to a stronger body, more energy, and more ability to find food later.

So rewards now are worth more than rewards in the future. By asking people a series of these money-now-or-money-later questions (Which would you rather have, $9 now or $10 next week?), we can determine how valuable $10 in a week actually is. If we plot this value as a function of the delay, we can derive a discounting curve and measure how value decreases with delay.14
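As a rough sketch of that procedure, the indifference points from such a series of questions map directly onto the discounting curve; the amounts and delays below are invented for illustration, not data from any particular study.

```python
# Sketch: turning hypothetical "money now vs. $10 later" indifference points
# into a discounting curve. The numbers below are invented for illustration.

# Suppose a series of questions found the person indifferent between
# "$X now" and "$10 after a delay of d days" at these points:
indifference_points = {
    0: 10.0,    # no delay: $10 later is worth $10 now
    7: 9.0,     # indifferent between $9 now and $10 in a week
    30: 7.0,
    180: 4.0,
    365: 2.5,
}

# The discounted value of the delayed $10 is simply the "now" amount at indifference;
# dividing by the full amount gives the discount factor at each delay.
for delay_days, now_amount in indifference_points.items():
    discount_factor = now_amount / 10.0
    print(f"delay = {delay_days:4d} days -> $10 later is worth ${now_amount:.2f} now "
          f"(discount factor {discount_factor:.2f})")
```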

Of course, we can’t ask animals this question,A but we can offer them a choice between a small amount of food now and a large amount of food later.15 For example, in a typical experiment, a rat or monkey or pigeon is offered two options (levers16 or paths leading to a food location17): taking the first option provides one food pellet immediately, but taking the second option provides more (say three) food pellets after a delay. By observing the animal’s choices, we can again determine the discounting curve.

If discounting were simply due to inflation or the ability to invest money, we would expect it to follow an exponential function of time.18 In an exponential function, the value decreases by the same percentage every unit of time.B So if your discounting function were an exponential with a half-life of a year, you would be equally happy with $5 now and $10 in a year, or with $2.50 now and $10 in two years. Exponential discounting curves have a very useful property, which is that they are self-similar: the choices you make now are the same choices you would make if everything were delayed by a year.
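In equation form (a standard formulation of exponential discounting, not a quote from the text), a one-year half-life halves the value with each additional year of delay, and shifting both options by the same amount leaves the comparison untouched:

```latex
% Exponential discounting with a one-year half-life (delay t in years, amount A).
V(t) = A \left(\tfrac{1}{2}\right)^{t}
\quad\Rightarrow\quad V(1) = \tfrac{A}{2}, \qquad V(2) = \tfrac{A}{4}

% Self-similarity: shifting both delays by the same amount s rescales both values
% by the same factor, so the preference between two delayed rewards never flips:
\frac{A_1 (1/2)^{\,t_1 + s}}{A_2 (1/2)^{\,t_2 + s}}
  = \frac{A_1 (1/2)^{\,t_1}}{A_2 (1/2)^{\,t_2}}
```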

But neither humans answering questionnaires nor animals pressing levers show exponential discounting curves.19 In fact, humans answering questionnaires, animals pressing levers for food, and humans pressing levers for food all show a discounting curve best described as a hyperbolic function, which drops off very quickly at short delays but then flattens out with time.
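The standard form of the hyperbolic curve fitted to such data (with k a free parameter capturing how steeply a given individual discounts) is:

```latex
% Hyperbolic discounting: value drops steeply at short delays, then flattens out.
% k is a free parameter setting how steeply a given individual discounts.
V(t) = \frac{A}{1 + k\,t}
```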

Any nonexponential discounting curve (including a hyperbolic one) will show a property called preference reversal, in which your choices change depending on your temporal vantage point.20 You can convince yourself that you too show preference reversal. Ask yourself which you would rather have: $10 today or $11 in a week? Then ask yourself which you would rather have: $10 in a year or $11 in a year and a week? Each person has a different reversal point, so you may have to find a slightly different set of numbers for yourself, but you will almost certainly be able to find a pair of numbers such that you won’t wait now, but will wait next year. Notice that this doesn’t make any sense rationally. Today, you say it’s not worth waiting for that extra dollar, but for a year from now, when faced with what is really the same choice, you say you’ll wait. Presumably, if you ask yourself next year, you’ll say it’s not worth waiting—you’ll have changed your mind and reversed your preferences. Even though this is irrational, it feels right to most people.
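Here is a small numerical sketch of how a hyperbolic curve produces exactly this reversal; the discount rate k is an invented value chosen only to make the flip visible, and your own reversal point will sit elsewhere.

```python
# Sketch: preference reversal under hyperbolic discounting.
# k is an invented discount rate (per day), chosen only to illustrate the flip.

def hyperbolic_value(amount, delay_days, k=0.05):
    """Present value of `amount` delivered after `delay_days`, discounted hyperbolically."""
    return amount / (1.0 + k * delay_days)

# Choice 1: $10 today vs. $11 in a week.
near_small = hyperbolic_value(10, 0)      # 10.00
near_large = hyperbolic_value(11, 7)      # 11 / 1.35  = about 8.15 -> take the $10 now

# Choice 2: the same pair, pushed a year into the future.
far_small = hyperbolic_value(10, 365)     # 10 / 19.25 = about 0.52
far_large = hyperbolic_value(11, 372)     # 11 / 19.60 = about 0.56 -> wait for the $11

print(f"Now:       $10 today -> {near_small:.2f},  $11 in a week    -> {near_large:.2f}")
print(f"Next year: $10 then  -> {far_small:.2f},   $11 a week later -> {far_large:.2f}")
# With an exponential curve, the ratio between the two values would be the same in
# both cases, and no such reversal could occur.
```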

This is the mistake being made by our hypothetical extravagant student, who prefers to spend the money now on the immediately available reward (new clothes) rather than wait for the larger reward (paying rent) later. In fact, when her parents ask her at the beginning of the school year whether she’s going to pay her rent or buy new clothes, she’s likely to tell them (we’ll assume honestly) that she’s going to pay the rent. This switch, between intending to pay the rent early on and buying clothes when the money is in hand, is an example of the preference reversal we’re talking about.

Preference reversal can also be seen in the opposite direction. This might be called the Cathy effect, after the comic strip Cathy by Cathy Guisewite, which has a regular motif of Cathy’s inability to resist high-sugarC treats like Halloween candy or birthday cake. (In panel 1, Cathy says she won’t eat the cake this time, but in panel 3, when faced with the actual cake, we find her eating it.) From a distance, the cake is heavily discounted and has low value, so the long-term value of keeping to her diet wins out; but when the cake is immediately available, it is discounted much less, and poor Cathy makes the wrong choice.

In part, Cathy’s dilemma comes from an interaction between decision-making systems: from a distance, one system wants to stay on her diet, but from close up, another system reaches for the cake. This is the Parable of the Jellybeans that we saw in Chapter 1—it is hard to reject the physical reward in front of you for an abstract reward later.22 One possible explanation for the success of Contingency Management (which offers concrete rewards to addicts for staying clean of drugs) in addiction and behavioral modification is that it provides a concrete alternative, allowing the subject to attend to options other than the drug.23

In a positive light, preference reversal also allows for an interesting phenomenon called precommitment, in which a person or an animal sets things up in advance so that the tempting choice is simply not available later.24 For example, an alcoholic who knows that he will drink if he goes to the bar can decide not to drive by the bar in the first place. From a distance, he makes one choice, knowing that he will make the wrong choice if given an opportunity later. If our hypothetical extravagant student is smart and knows herself well, she might put the money away in a prepaid escrow account that can be used only for rent. The two selves (the student at the beginning of the school year who wants to pay rent and the student midyear who wants to buy clothes) are in conflict with each other.25 By precommitting while she is that early, thrifty student, she can prevent herself from wasting the money when she becomes that later, extravagant one. As we explore the actual components of the decision-making system, we will find that precommitment involves an interaction between the Deliberative (Chapter 9) and Self-Control (Chapter 15) systems.26 The ability to precommit to one option over another remains one of the most powerful tools in our decision-making arsenal.27

Stopping a prepotent action

The interaction between multiple decision systems can also produce conflict directly. Only some of the multiple decision-making systems that drive our actions are accessible to conscious awareness. This means that we can take actions and then find ourselves asking, “Now why did I do that?”

Other examples of conflict (such as the firefighter) come from one system wanting to approach or flee a stimulus while another tells the body to hold still.28 In Chapter 6, we will see that there are emotional components that depend on the amygdala (a deep, limbic brain structure involved in simple approach-and-retreat cue–response phenomena29), learned-action components that depend on the basal ganglia (other neural structures that learn situation–action pairs30), and deliberative components that depend on the prefrontal cortex and the hippocampus (cognitive structures that enable the imagination of future possibilities31). When these three systems select different actions, we have a conflict.

Classic experiments on rats and other animals have studied a simple process called fear conditioning.32 In a typical experiment, a rat is placed in a box, and, at irregular intervals, a tone is played and then the rat receives a mild shock.D In modern experiments, the shock is never enough to hurt the animal, just enough to be unpleasant. Similar experiments can be done with humans, who receive a mild shock after a cue.33 The rat begins to fear the tone that predicts that a shock is coming. Humans explicitly describe dreading the oncoming shock and describe how unpleasant the expectation of a shock is. In fact, Greg Berns and his colleagues found that the dread of waiting for a shock was so unpleasant that people would choose to receive a larger shock quickly rather than wait for a smaller one later.34;E Rats cannot, of course, tell us that they are fearful and nervous about the shock, but they freeze in fear and startle more in response to an unexpected event. If you are tense, waiting for a shock, then a loud noise will make you jump. Rats do the same thing.

This procedure is called tone–fear conditioning: the rat learns to expect a shock after the tone is played. The conditioning can then be extinguished by subsequently presenting the tone without the shock. As we saw in the previous chapter, behavioral extinction entails new activity in the infralimbic cortex, which projects down to the amygdala and inhibits the learned fear response.35

This matches the Yadin Dudai study on courage mentioned in the previous chapter—Dudai and his colleagues studied humans overcoming their fear of snakes.36 The sight of the snake triggered the emotional (Pavlovian) system, which produced an innate reaction (“Run away!”), while the Deliberative system had to overcome that prepotent action. As is said in many a movie and many a novel, courage is not the absence of fear but the ability to overcome it.F The fact that we now know the neural mechanisms underlying courage does not diminish what courage is.

Summary

In this first part of the book, we’ve defined the question of decision-making in terms of action-selection, seen that we cannot simply say that we are evaluating choices, and identified multiple systems that drive action-selection within the mammal. In the next part, I want to turn to the question of what these multiple decision-making systems are and how they interact with each other. Afterwards (in the third part of the book), we’ll turn to their vulnerabilities and what those vulnerabilities mean for decision-making problems such as addiction and post-traumatic stress disorder.

Books and papers for further reading

• George Ainslie (2001). Breakdown of Will. Cambridge, UK: Cambridge University Press.

• Robert Kurzban (2010). Why Everyone (Else) Is a Hypocrite. Princeton, NJ: Princeton University Press.

• Uri Nili, Hagar Goldberg, Abraham Weizman, and Yadin Dudai (2010). Fear Thou Not: Activity of frontal and temporal circuits in moments of real-life courage. Neuron, 66, 949–962.