9
Deliberation

Some decisions are made by explicitly imagining the potential consequences of the decision. This imagination process is particularly useful for decisions that cannot be learned slowly over time, or for major decisions that entail a large upfront commitment cost. Deliberation is a computationally expensive process that requires a number of neural systems, including attention, imagination, and evaluation processes, and as such it takes time to calculate which action to take.

Deliberation entails a serial search through possibilities. It entails a prediction of “what would happen if,” followed by an evaluation process and an action-selection process. This means that deliberation requires a host of complex processing. This complex processing makes deliberation incredibly flexible (knowing that a path leads to an outcome does not imply that one must take that path to that outcome) but also makes deliberation slow (it takes time to do all that computation).

Deliberation also requires knowledge of the structure of the world, particularly its cause–effect structure.1 (If you can’t predict, you can’t search through those predictions.) Deliberation requires imagination.2 Deliberation requires one to be able to construct possible futures, and a process that knows those potential futures are imagined and not real.A Deliberation also requires an active memory process capable of remembering which paths you’ve searched through already and which paths are still left to explore.4 When comparing options, you need to be able to hold multiple options, their evaluations, and the paths to them in your head. By attending to specific pros and cons, one can even manipulate the evaluation.5 This requires a process called working memory, in which concepts are held in an active processing loop.B

The cognitive map and declarative memory

The concept of the cognitive map was introduced by Edward Tolman in the 1930s. Tolman suggested that both rats and humans (and presumably other mammals as well) maintained a “map of possibilities.”11 This meant that learning didn’t have to be aimed at getting rewards and avoiding punishments (as suggested by his contemporaries, such as Clark Hull and B. F. Skinner12). Tolman’s concept implied that it was worth it to an animal to learn information about the world even when that information didn’t immediately lead to a reward or punishment. Information about the world could be used in future decisions, making basic information a worthwhile pursuit itself.

In modern neuroscience, the cognitive map is usually described as if it is a spatial map, but Tolman’s original concept was more “cognitive” than “map,” and more modern interpretations of cognitive maps are closer to the concept of a schema about the structure of the world.13 Certainly, one of the most important things to know about the structure of the world is where things are and how the spatial world is connected. Knowing that there are skyways connecting the building my office is in and the mail-room will be very useful today when it is going to be 2° Fahrenheit outside.C But it is also part of the cognitive map to know that one can check books out of a library and therefore one doesn’t need to buy them. And to remember a friend’s phone number, or that one can call 911 in an emergency (in the United States). Or that Paris is in France and Tokyo in Japan, that New York is on the east coast of the United States and San Diego is on the west coast. All of this knowledge about the structure of the world forms your cognitive map. If you’ve already learned these facts, you can use them to search through options to find the choice you want to make.

These facts are sometimes referred to as declarative14 because they are things that can be “declared”—if I tell you that Tokyo is the capital of Japan, or that I grew up near Washington DC but that I live in Minnesota, you know those facts. You don’t need to practice them. In contrast, I can tell you how to throw a baseball or ride a bike or play the piano or clarinet, but to become any good at it, you need to practice them. In the memory literature, these other skills are called procedural because they require memories of the process and procedures rather than facts.

Declarative memory is often divided into two subcategories: semantic memory and episodic memory.15 Semantic memory entails the facts and knowledge about the structure of the world. Episodic memories are the memories of your own life. The statement that at my sixth birthday party, my parents made me a cake in the shape of a rotary-dial telephone is a semantic memory. My memory that my dad said I had to wait to get the last piece because my initials were in the center (presumably trying to teach me to be polite and let my guests have cake first), and that I snuck my finger in and ate off the initials while my dad was serving a piece to someone, and then demanded a piece of cake since I had already got the letters, is an episodic memory.

The key evidence that animals (including humans) use a cognitive map in making decisions is a process called latent learning. Learning about the structure of the world may not show up in behavior until one needs to call upon that knowledge. Tolman’s initial hypothesis came from observing rats running mazes.16 Rats with experience on a specific maze (just exploring, with no rewards on the maze) would subsequently learn much faster than rats with no experience on that maze. The experiment that finally convinced Tolman that animals had to remember knowledge about the structure of the world was that a rat that was neither hungry nor thirsty, allowed to explore a Y-shaped maze with food on the left fork and water on the right, would immediately go to the left fork when made hungry and to the right fork when made thirsty. For humans, at least, this latent learning is about cause and effect. We recognize this knowledge in terms of if–then rules. If I drop this glass, it will break. If I drop this plastic bottle, it won’t. Therefore, it is safer to give a plastic bottle to my three-year-old daughter since she may well drop the glass.

We can think of deliberation as having four parts: the knowledge of the world, the search process using that knowledge of the world to predict what will happen, an evaluation process measuring how good that outcome is, and, finally, an action-selection process to actually take the first step forward.
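These four parts can be sketched as a minimal toy loop. This is an illustration of the scheme, not a model of any brain system; the world model, the evaluation function, and all of the states and actions below are invented for the example:

```python
# Toy sketch of the four parts of deliberation: knowledge of the world,
# a search process that predicts outcomes, an evaluation process, and
# action selection. All names and values are illustrative.

# Knowledge of the world: which action (in which state) leads to which outcome.
world_model = {
    ("start", "go-left"): "food",
    ("start", "go-right"): "water",
}

def evaluate(outcome, need):
    """Evaluation: how good is this outcome, given the current need?"""
    return 1.0 if outcome == need else 0.0

def deliberate(state, actions, need):
    best_action, best_value = None, float("-inf")
    for action in actions:                       # serial search through options
        outcome = world_model[(state, action)]   # prediction: "what would happen if?"
        value = evaluate(outcome, need)          # evaluation of that outcome
        if value > best_value:                   # action selection
            best_action, best_value = action, value
    return best_action

print(deliberate("start", ["go-left", "go-right"], need="water"))  # go-right
```

Note that the flexibility of deliberation falls out of the structure: changing the need from water to food changes the chosen action without any relearning of the world model.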

Search

The first attempts at building artificial decision-making systems were based on search processes.17 Allen Newell and Herb Simon’s General Problem Solver (originally proposed in 1957) searched through possibilities to find a path from a starting point to a goal. It worked by creating subgoals in the plan, searching through potential subgoals, and then solving each subgoal by searching through known actions until it found a path from the starting point to the final given goal. Over the past 50 years, a tremendous amount of work has been done improving search algorithms, to the point that highly efficient search algorithms can beat humans at complex games (like the now-famous Deep Blue, which beat the grandmaster Garry Kasparov at chess in 1997).18;D

Over the past few decades, however, it has become clear that these highly efficient, optimized search processes are not how human experts solve most of these tasks.19 Search is something done early in learning and in response to changing situations. The more expertise people gain in a field, the smaller the set of futures they search through. Current theories suggest that this is because experts have cached (stored) the better choices ahead of them.20 Search requires holding a lot of temporary results in memory and keeping track of a tremendous number of subparts. This means that it is effortful and requires extensive computational processing.21

As it became clear that human experts do not solve games like chess by deep searches through long possible paths, it became popular to say that search was uniquely human. These hypotheses suggested that search was difficult to do, that humans could do search when they needed to, but that it required complex cognitive and conscious processes. In particular, these theories suggested that search required a linguistic description of the problem (that people needed to talk to themselves as they searched through those possibilities). This implied that it is a uniquely human characteristic, and that animals wouldn’t or couldn’t do search.22

On the other hand, it had been known for many years that animals could make decisions based on expectations of the outcomes of their actions.23 Given the simple processes that were hypothesized at the time, it was assumed that these expectations formed from simple associations between the action and the outcome. However, computational studies of decision-making that began to be developed in the past decade led to the realization that those simple associations are insufficient to solve the problem. Although one can solve a one-step decision (approach Cue One when hungry, Cue Two when thirsty) with a simple association (Cue One has food, Cue Two has water), if the expectation must span a sequence of choices (go left, then right, then left again to get to the food), then the only way to solve the problem is with processes that encode the entire cause-and-effect structure of the situation.24 Thus, several authors have now suggested that animals are actually doing search-and-evaluate processes under certain conditions.25
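The distinction can be made concrete with a toy maze (the states and transitions below are invented for illustration). A one-step cue–outcome association can answer only “what is at this cue?”; finding a multi-step path such as left-then-right-then-left requires searching a stored cause-and-effect structure:

```python
from collections import deque

# Cause-and-effect structure of a toy maze: state -> {action: next state}.
transitions = {
    "start": {"left": "A", "right": "dead-end-1"},
    "A":     {"left": "dead-end-2", "right": "B"},
    "B":     {"left": "food", "right": "dead-end-3"},
}

def find_path(start, goal):
    """Breadth-first search through imagined futures, remembering which
    states have been explored already (cf. the active memory process
    needed to track searched and unsearched paths)."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        for action, nxt in transitions.get(state, {}).items():
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # no path exists

print(find_path("start", "food"))  # ['left', 'right', 'left']
```

A lookup table keyed on the start cue alone could never produce this answer, because the correct first action depends on what lies two choices further on.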

In 2007, Adam Johnson and I found that neural signals in the rat hippocampus played out representations of future paths at the times when the computational theories predicted that animals would be searching through the future.26 Hippocampal cells in the rat encode the location of the animal on a maze.27 This means that if you record from enough cells simultaneously, you can decode that representation.28 Adam developed a new mathematical technique29 that allowed us to decode these representations at very fast timescales (cognition happens faster than behavior30), and when we looked at movies of that decoded representation over time, we found that sometimes the animal would pause at a difficult decision point and those representations would sweep down the possible paths, first one way and then the other.E

In humans, deliberation depends on the process of imagination, in which one creates a mental image of a potential future.32 Whether these sequential sweeps of representation down the potential paths ahead of the rat that we saw are the same phenomenon as a human consciously thinking about the future is still an open question, but the neural structures involved match remarkably well. We know, for example, that humans with hippocampal lesions have difficulty imagining the future, and that episodic future thinking in humans produces hippocampal activity as measured by fMRI. In humans, consciously deliberated decisions depend on prefrontal structures coupled to hippocampal systems. In rats, similar prefrontal structures become functionally coupled to hippocampal systems during deliberative decision-making.33 As we will see in the next section, the same structures (orbitofrontal cortex and ventral striatum [nucleus accumbens], both functionally coupled to the hippocampus) are involved in evaluation processes in both humans and animals.34

Evaluation

Current economic and computational neuroscience models of deliberation do not actually do much “deliberation.”F That is, even the ones that include search processes simply look up the value of that outcome as a single number, expressed in a common currency. Obviously, if you are going to compare two choices (such as eating an apple or an orange), you have to put them onto the same scale so you can compare them. Economic models (and the computer models on which current neuroscience models are based) call this same scale a common currency.37 However, this is not our conscious experience of deliberation; our conscious experience of deliberation is that we evaluate the outcomes in relation to each other.G

Introspective and casual observations of the situations in which we deliberate over decisions suggest that deliberation tends to hinge on difficulties in the evaluation step more than in the prediction step. Much of the computational literature has concentrated on the prediction step of decisions,44 likely because it arose out of the robotics literature, where one needs to have the robot take each step. But, as noted in the earliest search models (such as Newell and Simon’s General Problem Solver), we don’t spend our time working out the path to the outcome; instead, we create subgoals.45

For example, if we imagine a person with two job offers, the action needed is simple—send an email to one job saying “yes” and another to the other job saying “sorry.” This might engender a long series of subsequent actions (signing a contract, finding a house, moving), but our hypothetical applicant doesn’t need to consider how to get to the job in order to evaluate the two possibilities, only to know that it is possible to move to wherever the job is. The complexity in determining whether subgoals matter is that traversing the subgoal can matter in some conditions, but not others. If I’m deciding whether to take the train or the airplane from Washington DC to New York, getting to either the train station or the airport is easy (they’re both on the Metro). However, if I’m considering whether to take the train or the airplane from Baltimore, traversing the subgoal becomes an issue because the train station is in town, while the airport is a long way out of town. How to identify when one can skip the subgoal step and when one needs to include it is a complex question that is the subject of much current research.

In any case, once we have identified the potential outcomes, we need to evaluate them. That evaluation step is complex, depends on attention to details, and is subject to interesting framing effects.46 Although the mechanistic process by which we evaluate these multiple noncomparable objects is currently unknown, we know that this process is affected by the other decision-making systems, particularly the emotional system.47 It’s not surprising to anyone who has observed his or her fellow human beings, but tests have shown that your emotional state at the start of a decision can influence your ultimate deliberative decision.48 At the most trivial level, we can see this in Tolman’s original rat experiment—hungry rats went to the food, thirsty rats went to the water.49 But this also occurs at more complex levels—for example, people who are in an angry or sad mood are more likely to reject offers in trading games.50

We also know some of the brain structures involved in these evaluation processes, because we know what happens when these structures break down. Two structures known to be deeply involved in evaluation in general and in deliberative evaluation in particular are the ventral striatum and the orbitofrontal cortex.51 The ventral striatum, sometimes called the nucleus accumbens, sits at the bottom of the striatum, which is a large, evolutionarily old, subcortical structure.H The orbitofrontal cortex sits at the bottom of the front of the cortex, just behind the eyes.I When these structures are damaged, people and animals have difficulty evaluating options.53

Neural firing in both of these structures relates to value judgments of outcomes. Both structures contain some cells that respond to reward consumption, as well as other cells that respond to cues that predict rewards, and other cells that seem to represent the expected value of an outcome.54 This has been most explicitly tested in monkeys and rats making economic decisions between objects. For example, Camillo Padoa-Schioppa and John Assad gave thirsty monkeys decisions between two flavors of juices. By varying the amount of juice offered, they could determine how much each juice was worth to the monkey. (Is one drop of grape juice worth two drops of apple juice?) What they found is that some cells in the orbitofrontal cortex represented how valuable the apple juice was, other cells represented how valuable the grape juice was, and some cells represented how valuable the choice the monkey made was, translated into a common currency, independent of whether it chose apple or grape juice. Similar effects have been found in humans using fMRI experiments.55
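The common-currency arithmetic in such experiments can be sketched directly. The exchange rate below (one drop of grape juice worth two drops of apple juice) is invented for illustration, not taken from the actual experiments:

```python
# Toy common-currency valuation. By varying the amounts offered, the
# experimenter finds the exchange rate at which the monkey is indifferent;
# here we simply posit one (values are illustrative).
value_per_drop = {"apple": 1.0, "grape": 2.0}  # in "apple-drop" units

def offer_value(flavor, drops):
    """Translate an offer into the common currency."""
    return drops * value_per_drop[flavor]

# At this exchange rate, 2 apple drops and 1 grape drop are worth the same,
# so the monkey should be indifferent between them:
print(offer_value("apple", 2) == offer_value("grape", 1))  # True
```

Once every offer is mapped onto a single scale like this, choosing reduces to picking the larger number, which is exactly what the “chosen value” cells in orbitofrontal cortex appear to encode.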

Orbitofrontal neurons also reflect framing effects. In tasks where all of the choices are interleaved, the neurons correctly reflect the revealed preference value between the choices.56 In tasks where the options are given in groups, the cells reflect the valuation between the available options. Leon Tremblay and Wolfram Schultz gave a monkey choices between raisins and apples or apples and cereal. From previous experiments, they knew that the monkey they were recording from preferred raisins to apples and apples to cereal. They found that value-encoding neurons fired very little to the apples in the raisin–apple comparison but a lot to apples in the apple–cereal comparison. In contrast, Camillo Padoa-Schioppa and John Assad did not find these changes with their three juice options, but the Padoa-Schioppa and Assad experiment interleaved the choices, while the Tremblay and Schultz experiment grouped the choices. This grouping is called “blocks” in the jargon of behavioral neuroscience. That is, a subject may receive a block of choices between apple and grape (say for 50 trials), then another block of choices between orange and banana (say for 50 trials), then another block of apple and orange, and so on. In subsequent experiments, Padoa-Schioppa found that the activity of orbitofrontal cortex neurons depended on exactly this issue—What is the available menu of choices? Just as humans making choices between televisions determine the value of the choices relative to the other options,J monkeys making choices between foods or juices determine the value of the choices relative to the other options available.

Of course, these preferences can change. If you get sick after eating something, you will learn to dislike that food.58 (This is why chemotherapy patients are given strange flavors of foods, so that the extreme nausea that chemotherapy induces does not teach them to dislike their favorite foods.59) Similarly, overexposure to a given food flavor can produce satiation, and preference for other foods.60 As an animal changes its preference (say because it gets satiated because it has had enough of the apple juice, or because it has a bad experience after drinking that apple juice), the orbitofrontal cortex neurons change to reflect that changed preference.61 One of the effects of drugs of abuse, particularly cocaine, is that they disrupt the ability of orbitofrontal cortex neurons to recognize changes in value.62 Orbitofrontal neurons in animals with extensive cocaine experience do not successfully change valuation when the action that leads to a sweet flavor is changed to now lead to a bitter flavor. Part of the problem with drugs is that they affect the ability of the brain to correctly re-evaluate situations when they change.

In a very interesting recent experiment, Geoff Schoenbaum and his colleagues tested whether rats could recognize a change in the reward that would occur after taking an action63 (either by increasing the number of food pellets provided, say from one food pellet to three, or by changing the flavor of the food provided, say from banana-flavored to cherry). If the change in value was numeric (that is, changing the size but not the flavor of the reward), the animal needed the ventral striatum intact, but not the orbitofrontal cortex. However, animals needed both the ventral striatum and the orbitofrontal cortex to recognize the change between flavors. This suggests the ventral striatum is key to recognizing changes in valuation, and the orbitofrontal cortex to recognizing changes in commodity.

So the data suggest that these structures are involved in evaluation, but are they active during the evaluation step in deliberation? fMRI data strongly suggest a role for both the orbitofrontal cortex and the ventral striatum during some aspect of deliberation.64 Direct evidence for a role for these neural structures in evaluation during deliberation comes from neural recording data taken by my former postdoc Matthijs van der Meer and by my graduate student Adam Steiner.65 As we discussed earlier, at certain difficult decisions, rats pause and look back and forth (a phenomenon known as “vicarious trial and error”).66 We have already discussed how, during this process, the hippocampus of rats represents potential choices ahead of the animal, as if the animal was searching down possible paths.67 When he was in my lab, Matthijs van der Meer recorded from ventral striatal neurons from rats on this same task and found that at the same times that Adam Johnson had found the hippocampus sweeping through those future paths, the ventral striatal reward cells (cells that responded positively during consumption) activated again, as if the ventral striatum was covertly representing something about the fact that the animal would get reward if it went in that direction. Recording orbitofrontal cortex neurons on the same task, Adam Steiner found that orbitofrontal reward cells (cells that responded positively during consumption) also activated, as if the orbitofrontal cortex was covertly representing something about the fact that the animal would get reward. One interesting difference we found, however, is that the ventral striatal cells reflected the expectation of reward before the animal turned toward that reward (suggesting involvement in action-selection), while orbitofrontal cortex cells were active after the animal had already turned toward the reward (suggesting involvement in the prediction of an outcome more than in action-selection).

The evaluation step in deliberation is still an area of active research both in humans and in nonhuman animals. We know some of the structures that are involved, and something about the timing of the mechanisms, but we still don’t understand how attention changes the framing of the evaluation process. For example, when comparing cars, imagine trying to decide between a sports car and a minivan or hybrid: one is snazzier, one can carry more, one gets better gas mileage. There is no direct way to compare snazz, size, and gas mileage on a “common currency.” Instead, people go back and forth, deliberating between what is important to them, constructing value hypotheses for comparison. In an interesting twist, patients with orbitofrontal cortex damage do not go back and forth between options—instead, they look at one option at a time and determine if it’s good enough.68 It is not known, however, whether this change in strategy occurs because the orbitofrontal cortex actually controls the strategy used or because the orbitofrontal cortex aids in the comparison and the patients recognize that they just aren’t very good at doing comparisons. In any case, it is clear that the orbitofrontal cortex is critically involved in comparing options during deliberation.

Action selection

The final step in deliberation, the taking of the action, is still a nearly completely open question. How do people and other animals converge on a decision? Is it a process of simply saying “is that good enough?” after each evaluation? Is it a process where evaluations are held in memory and directly compared? At this point, scientists don’t know, but there is some evidence in favor of the comparison hypothesis (that the expected outcome and its evaluation are held in working memory and compared with other alternatives).

If deliberation entails predicting future options and comparing them, then you have to remember each option to be able to compare them. Working memory is the ability to hold multiple things in your head at the same time.69 This would suggest that the effect of taxing working memory would be to make people more inconsistent in their choices—with fewer working memory resources, there is less time to get the deliberation right, and one becomes more likely to give up the search through futures early. This is what the evidence shows.70 People with better working memory are more willing to wait for later rewards. They are also better at making choices. In fact, it is possible to train working memory and improve some people’s ability to wait for delayed rewards.

The advantage of deliberation is that it provides flexibility. Knowing that going to the casino will allow me to gamble does not obligate me to go to the casino. Coupling deliberation with imagination allows us to make very complex decisions that we can’t (or shouldn’t) make lots of times (like which job to take or which college to go to or whom to marry). Coupling deliberation with attention allows us to push ourselves away from choices that we are highly motivated to take (that cigarette) and toward choices that are better for us in the long run (exercising at the gym). The problem with deliberation is that we can get “lost in thought.” Like Hamlet, trapped between difficult possibilities, we spend our time predicting outcomes, weighing options, listing pros and cons, and never actually taking action.71

Tolman and Hull

I’d like to end this chapter with a short discussion about the history of deliberation and its role in decision-making. The difference between deliberative and habit (what we have called “procedural”) decision-making became one of the great scientific debates of our time.72 In the 1930s and 1940s, two major laboratories were running rats on mazes, trying to determine how they found the goal. (Both of these labs were behaviorists in the sense that they wanted to be quantitative and to operationalize the process of decision-making in animals that they could study with appropriate controls, in response to the introspection of the Gestaltists.) On the East Coast, at Yale University, Clark Hull argued that rats learned to associate stimuli with responses, while on the West Coast, at Berkeley, Edward Tolman argued that rats were cognitive creatures that mentally planned routes to goals. As late as 1970, in a review of this still-ongoing debate, Endel Tulving and Stephen A. Madigan wrote that “Place-learning organisms, guided by cognitive maps in their heads, successfully negotiated obstacle courses to food at Berkeley, while their response-learning counterparts, propelled by habits and drives, performed similar feats at Yale.”

We’ve already seen how latent learning led Tolman to hypothesize the presence of a cognitive map. Tolman also watched his rats learning the maze, and noticed that as they made mistakes, there seemed to be a method to those mistakes—rats would make all left turns, then all right turns, then they would alternate left and right, as if they were actually testing hypotheses.73 The problem with these observations is that even rats making purely random choices during learning would sometimes take all lefts. (If there are three left–right choices on the maze, then one rat in eight will go left on all three choices even if the rats are really choosing randomly.) Because our brain likes to find patterns, those rats making all lefts on the three left–right choices would stand out. Similarly, even a rat that knows the correct answer isn’t necessarily always going to take that choice. (Maybe there’s some noise in the outcome; maybe the animal is exploring to check its hypotheses.) Actually determining when the animal knows the answer is mathematically complicated.74
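The one-rat-in-eight figure is a back-of-the-envelope calculation over three independent fifty-fifty choices, which can be checked directly:

```python
# Probability that a randomly choosing rat (p = 0.5 per choice point)
# happens to go left at all three left-right choice points.
p_all_left = 0.5 ** 3
print(p_all_left)  # 0.125, i.e., one rat in eight

# Expected number of "all lefts" rats, purely by chance, in a group of 8:
print(8 * p_all_left)  # 1.0
```

This is why an apparent pattern in a handful of rats is not, by itself, evidence of hypothesis testing; the statistics have to rule out the random-choice explanation first.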

We now have the statistical methods to identify whether these sequences are random,75 but these complex statistics were not available to Tolman. Randy Gallistel, Stephen Fairhurst, and Peter Balsam used these new statistical analyses to explicitly examine how rats learn and found that (on certain tasks) individual rats didn’t learn gradually, but suddenly switched on a single trial from getting it mostly wrong to getting it mostly right.76 They argued that the gradual learning rate seen in many studies was an artifact of averaging across different rats, each of whom reached its “Aha!” moment at different times.K
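The averaging artifact is easy to reproduce. In the sketch below, each simulated rat switches abruptly from wrong to right on a single trial (the switch trials are invented for illustration), yet the group average rises gradually:

```python
# Averaging abrupt individual learning curves produces a smooth group curve.
# Each rat is 0% correct before its "Aha!" trial and 100% correct after.
switch_trials = [3, 5, 8, 12, 20]  # trial on which each rat "gets it" (invented)
n_trials = 25

group_curve = []
for t in range(1, n_trials + 1):
    correct = [1.0 if t >= s else 0.0 for s in switch_trials]
    group_curve.append(sum(correct) / len(correct))

# The group average climbs in steps of 20% as each rat switches,
# even though no individual rat ever learned gradually.
print(group_curve)
```

Looking only at the averaged curve, one would conclude (wrongly, in this toy example) that learning was slow and incremental.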

Because it was so hard to analyze the variability in the mistakes made early in learning, Hull decided to analyze rats that had reached a regularity in their paths and were no longer making mistakes. Hull used the analogy of a switchboard operator creating a connection between two telephones to explain his ideas, while Tolman had only the analogy to human inspiration to attempt his explanation. While Hull was able to provide a mathematical explanation for his suggestions, Tolman was not. This made it possible for Hull (and the others arguing against cognitive processes, such as B. F. Skinner) to predict specifically when phenomena would occur and to be quantitative about it.78 Science is about prediction and replication. Tolman’s observations were hard to quantify. Inspiration hit rats at different times. So, while any individual rat would suddenly “get it” and start running the maze correctly, the average of all the rats showed a slowly developing curve. Similarly, even though each rat would individually show vicarious trial and error on some trials, the trials on which the vicarious trial and error behavior occurred would differ from rat to rat.79

Mathematics is a technology. Like any other technology, it allows us to see things we could not see before. (Compare the microscope, which allowed Robert Hooke to see cells, or the telescope, which allowed Galileo Galilei to see the moons of Jupiter.80) It is important to realize that Tolman was before Claude Shannon, so he was lacking the mathematics of information theory. He was before the first computer (developed during World War II), before John von Neumann and Alan Turing, and before Norbert Wiener’s breakthrough Cybernetics book (published in 1948), and thus had no access to the new conceptualization of intelligence as computation. And he was before Allen Newell and Herb Simon, so he was lacking the mathematics of algorithm.81 Tolman’s major work was in the 1930s and 1940s, summarized in his Psychological Review paper “Cognitive Maps in Rats and Men” published in 1948. Claude Shannon published his breakthrough paper on information theory in 1948. Alan Turing did his major work on computability in the late 1930s and through World War II in the early 1940s, but it did not have a major influence on psychology until the development of the field of artificial intelligence in the 1950s. Allen Newell and Herb Simon developed the first algorithmic concepts of search and intelligence with their publication of the General Problem Solver in 1959, the year Tolman died.

Information theory quantifies the ability of a signal to be pulled out of noise. Algorithm provides the concept that a sequence of computations can be linked together through control mechanisms to come to a conclusion that is hidden within data. One of the most important ideas, however, was that of computational complexity, which originated with Turing. We are now familiar with computers running long algorithms—we have all seen computers thrashing, the hard drive spinning, computing a complex calculation. Before the concepts of information theory, algorithm, and computation, the idea that results hidden within data could be found only after calculations that inherently take time to calculate was unappreciated.

We now know that both Tolman and Hull were right. When faced with complex decisions between similar choices or between changing choices, one needs a flexible system even though it may be computationally complex, but when faced with a regular decision in which the right answer does not change, animals develop response-chain habits (sequences of actions).82 In practice, early performance on mazes is often accomplished by Tolman-like deliberative search processes, while later performance is often accomplished by Hull-like stimulus–response habit processes.83 In fact, if we go back and look at their data, Tolman studied how the mazes were learned and concentrated his analysis on the early-learning time, while Hull studied “asymptotic performance” once behavior had settled down to a regular routine. Both systems (Tolman’s computationally expensive search process and Hull’s computationally simple stimulus–response process) are capable of solving tasks and making decisions. Each system, however, has advantages and disadvantages and is accomplished by a different set of interacting brain structures. In the next chapter, we’ll turn to what is known about that habit system.

Books and papers for further reading

• Randy L. Buckner and Daniel C. Carroll (2007). Self-projection and the brain. Trends in Cognitive Sciences, 11, 49–57.

• Adam Johnson and David A. Crowe (2009). Revisiting Tolman, his theories and cognitive maps. Cognitive Critique, 1, 43–72.

• Yael Niv, Daphna Joel, and Peter Dayan (2006). A normative perspective on motivation. Trends in Cognitive Sciences, 10, 375–381.

• John O’Keefe and Lynn Nadel (1978). The Hippocampus as a Cognitive Map. Oxford: Oxford University Press.