Peter M. Todd, Ralph Hertwig, and Ulrich Hoffrage
Traditional cognitive psychology, the study of the information-processing mechanisms underlying human thought and behavior, is problematic from an evolutionary viewpoint: Humans were not directly selected to process information, nor to store it, learn it, attend to it, represent it—nor even, in fact, to think. All of these capacities, the core topics of cognitive psychology, can be seen as by-products arising over the course of the evolution of solutions to the central challenges of survival and reproduction. Moreover, while the subgoals of those two main goals—finding food, maintaining body temperature, selecting a mate, negotiating status hierarchies, forming cooperative alliances, fending off predators and conspecific competitors, raising offspring, and so on—relied on gathering and processing information, meeting the challenges of each of these domains would have been possible only by gathering, in each case, specific pieces of information and processing them in specific ways. This suggests that to best study the faculties of memory, or attention, or reasoning, one should take a goal- and domain-specific approach that focuses on the use of each faculty for a particular evolved function, just the approach exemplified by the other chapters in this handbook.
Cognitive psychology's traditional approach, however, is domain general or domain agnostic, as if cognitive capacities arose in a void, orthogonal to any environment-specific selective pressures. Nonetheless, we believe that even while taking the traditional domain-agnostic approach to studying the mind, cognitive psychology can still benefit from as well as contribute to an evolutionary perspective on thinking and reasoning. This is because in addition to the selective pressures shaping domain-specific mechanisms, there are also a number of important selective forces operating across domains more widely, such as those arising from the costs of decision time and information search. Much as our separate physiological systems have all been shaped by a common force for energy-processing efficiency, individual psychological information-processing systems may all have been shaped by various common pressures for information-processing efficiencies. These broad pressures can in turn lead to common design features in many cognitive systems, such as decision mechanisms that make choices swiftly based on little information.
In this chapter, we show how a set of broad forces operating on multiple domains can impact the design of specific cognitive systems. In particular, we first discuss how the costs of gathering information, and of using too much information, can be reduced by decision mechanisms that rely on very limited information—or even a lack of information—to come to their choices. Next, we explore how the pressures to use small amounts of appropriate information may have produced particular patterns of forgetting in long-term memory and particular limits of capacity in short-term memory. Finally, we show how selection for being able to think about past sets of events can help explain why different representations of the same information, for instance samples versus probabilities, can produce widely varying responses. Throughout, we focus on three topics of central interest to cognitive psychologists—decision making, memory, and representations of information. But at the same time, we also lay out three main theses that will be less familiar to those taking a traditional view of cognition as computation unfettered by external, environmental considerations: First, simple decision mechanisms can work well by fitting environmental structures; second, limited memory systems can have adaptive benefits; and third, experience-based representations of information can enhance decision making. Thus, while we ignore many of the topics typically covered in cognitive psychology, we aim to sketch out some existing questions that we think an evolution-savvy cognitive psychology should explore. (For other views of evolutionary cognitive psychology and consideration of further issues such as individual differences, see Kenrick et al., 2009; Kenrick, Sadalla, & Keefe, 1998.)
We begin by considering decision mechanisms, which process perceived and stored information into choices leading to action. Because decision processes stand close to behavioral actions, they are also close to the particular functionally organized selective forces operating on behavior. Thus, decision mechanisms may have been strongly affected by individual selective forces to become domain specific, in contrast to more general-purpose perceptual systems. Nonetheless, there are also broad selection pressures operating across domains that, we propose, have shaped a wide range of decision mechanisms in similar directions. Foremost is selection for making an appropriate decision in the given domain. This does not mean making the best possible decision, but rather one that is good enough (a “satisficing” choice, as Herbert Simon, 1955, put it), and on average better than those of one's competitors, given the costs and benefits involved. Good-enough decisions depend on information, and the specific requirements of the functional problem along with the specific structure of the relevant environment will determine what information is most useful (e.g., valid for making adaptive choices) and most readily obtained.
But gathering information also has costs and is subject to selection pressures (Todd, 2001), which cognitive psychologists studying the adaptive nature of inference should attend to. First, there is the cost of obtaining the information itself, in time or energy that could be better spent on other activities. Such costs can arise in both external information search in the environment and internal search in memory (Bröder, 2012). Second, there is the cost of actually making worse decisions if too much information is taken into consideration. Because nobody ever faces exactly the same situation twice, decision makers must generalize from past experience to new situations. Yet, as a consequence of the uncertain nature of the world, some of the features of earlier situations will be noise, irrelevant to the new decision. Thus, by considering too much information, one is likely to add noise to the decision process, and to overfit when generalizing to new circumstances—that is, to make worse decisions than if less information had been considered (Gigerenzer & Brighton, 2009; Martignon & Hoffrage, 2002).
Thus, there seem to be two selective pressures shaping decision making in opposite directions: the need to make good choices and the need to use little information. But this apparent accuracy/effort tradeoff can be sidestepped: Many environments are structured such that little information suffices for making good-enough choices, and decision mechanisms that operate in a “fast and frugal” manner can outperform those that seek to process all available information (Gigerenzer, Todd, & the ABC Research Group, 1999; Payne, Bettman, & Johnson, 1993). When these simple heuristics are used in particular environments with a stable information structure that they can exploit, they lead to what has been termed “ecological rationality,” emphasizing the important match between mental and social and physical environmental structures in a way that fits closely with an evolutionary perspective (Hertwig, Hoffrage, & the ABC Research Group, 2013; Todd & Gigerenzer, 2007; Todd, Gigerenzer, & the ABC Research Group, 2012). We now briefly survey some of the types of decision heuristics in the mind's “adaptive toolbox” (Todd, 2000) that flourish at the intersection of these selective forces.
Minimal information use can come about by basing decisions on a lack of knowledge, capitalizing on one's own ignorance as a reflection of the structure of the environment. If there is a choice between multiple alternatives along some criterion, such as which of a set of fruits is good to eat, and if only one of the alternatives is recognized and the others are unknown, then an individual can employ the “recognition heuristic”: Choose the recognized option over the unrecognized ones (D. G. Goldstein & Gigerenzer, 1999, 2002). Following this simple heuristic will be adaptive and ecologically rational, yielding good choices more often than would random choice in particular types of environments—specifically, in those where exposure to options is positively correlated with their ranking along the decision criterion being used. Thus, in our food choice example, the recognition heuristic will be beneficial because those things that we do not recognize in our environment are often inedible; humans have done a reasonable job of discovering and incorporating edible fruits into our diet. (See Pachur, Todd, Gigerenzer, Schooler, & Goldstein, 2012, for analysis of environments in which recognition will lead to adaptive decisions.)
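As a minimal sketch (in Python, with illustrative names not taken from the chapter), the recognition heuristic can be expressed as a rule that applies only when exactly one option is recognized:

```python
def recognition_heuristic(options, recognized):
    """Sketch of the recognition heuristic (D. G. Goldstein & Gigerenzer,
    2002): if exactly one option is recognized, choose it; otherwise the
    heuristic does not apply and other cues must be consulted.
    Function and argument names are illustrative assumptions.
    """
    known = [option for option in options if option in recognized]
    if len(known) == 1:
        return known[0]           # choose the single recognized option
    return None                   # heuristic not applicable here

# Which fruit to eat, when only one of the two is recognized?
print(recognition_heuristic(["mango", "ackee"], recognized={"mango"}))  # mango
```

The heuristic is ecologically rational only when recognition correlates with the criterion, as in the food example in the text; applied to options that are all recognized (or all unrecognized), it simply declines to decide.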
When the options to be selected among are all known, the recognition heuristic can no longer be applied, and further cues must be consulted. The traditional approach to rational decision making stipulates that all of the available information should be collected, weighted properly, and combined before choosing. A more frugal approach is to use a stopping rule that terminates the search for information as soon as enough has been gathered to make a decision. In the most parsimonious version, “one-reason decision making” heuristics (Gigerenzer & Goldstein, 1996, 1999) stop looking for cues as soon as the first one is found that differentiates between the options being considered. Among the many possible one-reason decision heuristics, take-the-best searches for cues in the order of their ecological validity (proportion of correct decisions). Take-the-last looks for cues in the order determined by their past decisiveness, so that the cue that was used for the most recent previous decision is checked first during the next decision. The minimalist heuristic lacks both memory and knowledge of cue validities and simply selects randomly among those cues currently available.
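The logic of one-reason decision making can be sketched in a few lines of Python. The interface below (cues as function/validity pairs over hypothetical city data) is an illustrative assumption, not the chapter's notation:

```python
import random

def take_the_best(option_a, option_b, cues, rng=random):
    """Sketch of take-the-best (Gigerenzer & Goldstein, 1996): try cues
    in order of decreasing ecological validity and stop at the first
    cue that discriminates between the two options. `cues` is a list of
    (cue_function, validity) pairs; names are illustrative assumptions.
    """
    for cue, _validity in sorted(cues, key=lambda c: c[1], reverse=True):
        a, b = cue(option_a), cue(option_b)
        if a != b:                              # first discriminating cue decides
            return option_a if a > b else option_b
    return rng.choice([option_a, option_b])     # no cue discriminates: guess

# Which of two (hypothetical) cities is larger? Cue values are made up.
cities = {"Mtown": {"capital": 1, "airport": 1},
          "Ntown": {"capital": 0, "airport": 1}}
cues = [(lambda c: cities[c]["capital"], 0.9),  # higher validity: checked first
        (lambda c: cities[c]["airport"], 0.7)]
print(take_the_best("Mtown", "Ntown", cues))  # Mtown (capital cue decides)
```

Take-the-last and minimalist differ only in the search rule: replacing the validity ordering with a recency ordering, or with a random one, yields those variants.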
Heuristics employing this type of one-reason decision making can successfully meet the selective demands of accuracy and little information use simultaneously in appropriately matched environments. For instance, take-the-best is ecologically rational in environments in which the importance of cues follows a noncompensatory (roughly exponentially decreasing) distribution. By letting the world do some of the work, these heuristics can be simpler and more robust (resistant to overfitting). A similar analysis within the world of linear models was undertaken by Dawes and Corrigan (1974), who pointed out that simplicity and robustness can be two sides of the same coin: Simply ignoring much of the available information means ignoring much irrelevant information, which can consequently increase the robustness of decisions when generalizing to new situations (see also Gigerenzer & Brighton, 2009, for a theoretical account of how cognitive systems can achieve robustness through appropriate simplifying “biases”).1
Moreover, people appear to learn to apply these fast and frugal heuristics that use minimal information in environments that have the appropriate cue structure (Rieskamp & Otto, 2006), and where information is costly or time-consuming to acquire (Bröder, 2012; Newell & Shanks, 2003; Rieskamp & Hoffrage, 1999). Socially and culturally influenced decision making can also be based on a single reason through imitation (e.g., in food choice or mate choice copying), norm following, and employing protected values (e.g., moral codes that admit no compromise, such as never taking an action that results in human death—see Tanner & Medin, 2004). And when a single cue does not suffice to determine a unique choice, people still often strive to use little information, for instance via an elimination heuristic (Tversky, 1972) that uses as few cues as needed to eliminate all but one option from consideration (again in food choice, mate choice, or habitat choice).
When choice options are not available simultaneously, but rather appear sequentially over an extended period or spatial region, different types of decision mechanisms are needed. In cases where a single option is to be chosen, there must be a stopping rule for ending the search for alternatives themselves. For instance, long-term mate choice requires making a selection from a stream of potential candidates met at different points in time, based on some amount of information gathered about each one (Saad, Eba, & Sejean, 2009). Classic economic search theory suggests that one should look for a new mate (or anything) until the costs of further search outweigh the benefits that could be gained by leaving the current candidate. But in practice, performing a rational cost-benefit analysis is typically difficult and expensive in terms of the information needed (as well as making a bad impression on a would-be partner). Instead, a “satisficing” heuristic, as conceived by Simon (1955, 1990), can be adaptive: Set an aspiration level for the selection criterion being used, and search for alternatives until one is found that exceeds that level. In mutual mate choice, for example, aspiration levels can be set by upward adjustment after successful interactions on the mating market and downward adjustment after failures (Beckage, Todd, Penke, & Asendorpf, 2009; G. F. Miller & Todd, 1998; Todd & Miller, 1999).
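Sequential satisficing search of this kind reduces to a very short procedure. The sketch below (Python; names and the return value are illustrative assumptions) accepts the first candidate whose quality meets the aspiration level:

```python
def satisfice(candidates, aspiration):
    """Minimal sketch of Simon-style satisficing search: examine options
    in the order they arrive and accept the first whose quality meets
    or exceeds the aspiration level. Names are illustrative assumptions.
    """
    for index, quality in enumerate(candidates):
        if quality >= aspiration:
            return index          # stop searching: this one is good enough
    return None                   # search exhausted without a good-enough option

# Candidate qualities encountered over time, with aspiration level 0.8:
print(satisfice([0.3, 0.6, 0.85, 0.95], 0.8))  # 2
```

The adjustment process described for mutual mate choice could be layered on top by raising `aspiration` after successful interactions and lowering it after failures, between successive searches.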
In other settings, the individual aims to gain benefits from a succession of chosen options and must decide how long to spend exploiting each option before leaving and exploring for a new option. The best-known instance of this kind of exploitation/exploration tradeoff is foraging for food, deciding when to leave a resource patch that has been depleted. Here, simple patch-leaving heuristics can trigger renewed exploration when the time since the last resource item found in the current patch grows too long (e.g., Wilke, Hutchinson, Todd, & Czienskowski, 2009). But these search mechanisms may also have been exapted from their food-domain origins for use in other domains, including the search for information (Hills, 2006; Todd, Hills, & Robbins, 2012). Thus, people appear to employ patch-leaving rules that achieve near-optimal performance both when searching for information among patches of web pages online (Pirolli, 2007) and when searching for concepts in memory (Hills, Jones, & Todd, 2012), in ways that are similar to searches for resources distributed spatially.
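A giving-up-time rule of this kind can also be sketched briefly. The interface and parameter below are illustrative assumptions, not the specific rule studied by Wilke et al. (2009):

```python
def leave_time(find_times, giving_up_time):
    """Sketch of a giving-up-time patch-leaving rule: abandon the patch
    once the interval since the last item found exceeds a fixed
    giving-up time. `find_times` lists the times at which items were
    found in the current patch; all names are illustrative assumptions.
    """
    last = 0.0
    for t in sorted(find_times):
        if t - last > giving_up_time:
            break                 # the dry spell grew too long: forager left
        last = t
    return last + giving_up_time  # departure time from the patch

# Items found at t = 1, 2, 3, then nothing; with giving-up time 2,
# the forager leaves at t = 5.
print(leave_time([1, 2, 3], 2.0))  # 5.0
```

The same rule applies unchanged whether the "patch" is a berry bush, a cluster of web pages, or a semantic category in memory; only the events counted as "finds" differ.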
The heuristics described above, by ignoring much of the available information and processing what they do consider in simple ways, typically do not meet the standards of classical rationality, such as full information use and complete combination of probabilities and utilities. Furthermore, heuristics may produce outcomes that do not always follow rules of logical consistency. For instance, take-the-best and the priority heuristic can systematically produce intransitivities among sets of three or more choices (Brandstätter, Gigerenzer & Hertwig, 2006; Gigerenzer & Goldstein, 1996). However, when used in appropriately structured environments, whether ancestral or current, these mechanisms can be ecologically rational, meeting the selective demands of making adaptive choices (on average) with limited information and time.
Furthermore, different environment structures can be exploited by—and hence call for—different heuristics. But matching heuristics to environment structure does not mean that every new environment or problem demands a new heuristic: The simplicity of these mechanisms implies that they can often be used in multiple, similarly structured domains with just a change in the information they employ. Thus, an evolution-oriented cognitive psychologist should explore both the range of (possibly domain-general) simple decision mechanisms appropriate to a particular adaptive problem, and the domain-specific cues in the environment that will allow those mechanisms to solve that problem effectively.
The information that decisions are based on can be accessed immediately from the external environment, or from past experience stored internally in some form of memory. Beginning with Ebbinghaus (1885/1964), cognitive psychologists usually focus on three aspects of human memory—its capacity, its accuracy, and its structure (e.g., Koriat, Goldsmith, & Pansky, 2000; Tulving & Craik, 2000)—but pay little attention to how it has been shaped by selective pressures, those costs and benefits arising through its use for particular functions in particular environments. Recently, however, researchers have begun to investigate the relationship between the design of memory systems and how they meet their adaptive functions. In this section, we describe some of the trends toward putting evolutionary thinking into the study of memory.
Memory has “evolved to supply useful, timely information to the organism's decision-making systems” (Klein, Cosmides, Tooby, & Chance, 2002, p. 306). The evolution of memory to serve this function has occurred in the context of a variety of costs, which also shape the design of particular memory systems. Dukas (1999) articulated a wide range of costs of memory, including (a) maintaining an item once it has been added to long-term memory, (b) keeping it in an adaptable form that enables future updating, (c) growing and feeding the brain tissue needed to store the information, and (d) silencing irrelevant information. But taking into consideration the demands of decision mechanisms outlined earlier, the two main selective pressures acting on memory systems (particularly long-term memory) appear to be, first, to produce quickly the most useful stored information, and second, not to produce too much information.
These pressures, like the ones we focused on for decision mechanisms, are broad and general—applying to memory systems no matter what domains they deal with. One way to meet these pressures would be to store in the first place just that information that will be useful later. Having limited memory capacity can work to restrict initial storage in this way, as we will see later with regard to short-term memory. In the case of long-term memory, Landauer (1986) estimated that a mature person has “a functional learned memory content of around a billion bits” (p. 491). This is much less than the data storage capacity of a single hour-long music CD, suggesting that we are indeed storing very little of the raw flow of information that we experience. On the other hand, most of what little we do remember is nonetheless irrelevant to any given decision, so our memory systems must still be designed to retrieve what is appropriate, and not more. How can this be achieved? One way is through the very process that at first glance seems like a failure of the operation of memory: forgetting.
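Landauer's comparison above can be made concrete with back-of-the-envelope arithmetic; the CD figures below are standard audio-CD specifications, not numbers from the chapter:

```python
# One hour of CD-quality stereo audio: sample rate x bit depth x channels.
bits_per_second = 44_100 * 16 * 2        # = 1,411,200 bits/s
cd_hour_bits = bits_per_second * 3600    # about 5.08e9 bits for one hour
memory_bits = 1e9                        # Landauer's lifetime estimate

# A single hour-long audio CD holds roughly five times Landauer's
# estimate of a person's total learned memory content.
print(cd_hour_bits / memory_bits)  # 5.08032
```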
Anderson (1990) put forward an approach he called the rational analysis of behavior as a method for understanding psychological mechanisms in terms of their functions or goals—equivalent to Marr's (1982) computational level of analysis, and also the level at which evolutionary psychology should be focused (Cosmides & Tooby, 1987). Having in mind a view of evolution as constrained local optimization (or hill climbing), Anderson set out to assess the explanatory power of the principle that “the cognitive system operates at all times to optimize the adaptation of the behavior of the organism” (1990, p. 28). Anderson and Milson (1989) took this approach to propose that memory should be viewed as an optimizing information retrieval system with a database of stored items from which a subset is returned in response to a query (such as a list of key terms). A system of this sort can make two kinds of errors: It can fail to retrieve the desired piece of information (e.g., failing to recall the location of one's car), thus not meeting the pressure of usefulness. But if the system tried to minimize such errors by simply retrieving everything, it would commit the opposite error: producing irrelevant pieces of information (and thus not meeting the pressure of parsimony), with the concomitant cost of further examining and rejecting what is not useful. To balance these two errors, Anderson and Milson propose, the memory system can use statistics extracted from past experience to predict which memories are likely to be needed soon, and keep those readily retrievable. Consequently, memory performance should reflect the patterns with which environmental stimuli have appeared and will reappear in the environment.
This argument can be illustrated with the famous forgetting curve, first described by Ebbinghaus (1885/1964): Memory performance declines (forgetting increases) with time (or intervening events) rapidly at first and then more slowly as time goes on, characterizable as a power function (Wixted, 1990; Wixted & Ebbesen, 1991, 1997). Combining this prevalent forgetting function with Anderson's rational analysis framework yields the following prediction: To the extent that memory has evolved in response to environmental regularities, the fact that memory performance falls as a function of retention interval implies that the probability of encountering a particular environmental stimulus (e.g., a word) also declines as a power function of how long it has been since it was last encountered. Anderson and Schooler (1991, 2000) analyzed real-world data sets to find out whether the environmental regularities match those observed in human memory. One of their data sets, for example, consisted of words in the headlines of the New York Times for a 730-day period, and they assumed that reading a word (e.g., “Qaddafi”) represents a query to the human memory database with the goal of retrieving its meaning.
At any point in time, memories vary in how likely they are to be needed. According to the rational analysis framework, the memory system attempts to optimize the information-retrieval process by making available those memories that are most likely to be useful. How does it do that? It does so by extrapolating from the past history of use to the probability that a memory is currently needed—the need probability of a particular memory trace. Specifically, Anderson (1990) suggested that memories are considered in order of their need probabilities, and if the need probability of a memory record falls below a certain threshold, it will not be retrieved. Consistent with their view that environmental regularities are reflected in human memory, Anderson and Schooler (1991) found that the probability of a word occurring in a headline of the New York Times at any given time is a function of its past frequency and recency of occurrence. In other words, the demand for a particular piece of information to be retrieved drops the less frequently it occurred in the past and the greater the period of time that has passed since its last use. This regularity parallels the general form of forgetting that has so often been observed since the days of Ebbinghaus. From this parallel, Anderson and Schooler concluded that human memory is a highly functional system insofar as it systematically renders pieces of information less accessible when they have not been used for a while. This functionality operates across domains as a response to broad selection pressures for maintaining quick access to information likely to be useful in upcoming situations (and conversely not maintaining access to information less likely to be needed).
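The regularity Anderson and Schooler describe can be captured in a toy function. The functional form and the decay parameter below are illustrative assumptions in the spirit of their analysis, not their fitted values:

```python
def need_odds(frequency, time_since_last_use, decay=0.5):
    """Toy need-probability index: a record's odds of being needed grow
    with its past frequency of use and fall off as a power function of
    the time since its last use. The form and the decay parameter are
    illustrative assumptions, not the values fitted by Anderson and
    Schooler (1991).
    """
    return frequency * time_since_last_use ** (-decay)

# A word encountered often and recently is predicted to be more
# retrievable than a rare or long-unused one.
assert need_odds(20, 1) > need_odds(20, 100)   # recency effect
assert need_odds(20, 10) > need_odds(3, 10)    # frequency effect
```

A retrieval threshold on this index yields exactly the behavior described in the text: records that were used rarely, or long ago, fall below threshold and become inaccessible.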
William James, in the Principles of Psychology (1890), argued that “in the practical use of our intellect, forgetting is as important a function as recollecting” (p. 679). Contemporary psychologists have begun to specify particular adaptive functions of forgetting, several of which we describe next.
Bjork and Bjork (1996) argued that forgetting is critical to prevent out-of-date information—say, old passwords or where we parked the car yesterday—from interfering with the recall of currently needed information. In their view, the mechanism that erases out-of-date information is retrieval inhibition: Information that is rendered irrelevant becomes less retrievable (see also Schacter, 2001).
Forgetting may also boost the performance of decision heuristics that exploit partial ignorance, such as the recognition heuristic described earlier. Ignorance can come from not learning about portions of the environment in the first place, or from later forgetting about some earlier encounters. To examine whether human recognition memory forgets at an appropriate rate to promote the use of the recognition heuristic and its close relative, the fluency heuristic (Hertwig, Herzog, Schooler, & Reimer, 2008), Schooler and Hertwig (2005) implemented these heuristics within an existing cognitive architecture framework, ACT-R (Anderson & Lebiere, 1998), which builds on the rational analysis of memory mentioned earlier. Specifically, ACT-R learns by strengthening memory records associated with, for instance, the names of foodstuffs, habitats, or people, based on the frequency and recency with which they were encountered in the environment. In Schooler and Hertwig's simulations, both heuristics benefited from (a medium amount of) forgetting, suggesting that another beneficial consequence of forgetting is to foster the performance of heuristics that exploit (partial) ignorance.
Could forgetting parts of one's autobiography—in particular, traumatic experiences—also be adaptive? Betrayal trauma theory (Freyd, 1996; Freyd & Birrell, 2013) suggests that the function of amnesia for childhood abuse is to protect the child from the knowledge that a key caregiver may be the sexual perpetrator. In situations involving treacherous acts by a person depended on for survival, a “cognitive information blockage” (Sivers, Schooler, & Freyd, 2002, p. 177) may occur that results in an isolation of knowledge of the event from awareness. Betrayal trauma theory yields specific predictions about the factors that will make this type of forgetting most probable—for instance, it predicts that amnesia will be more likely the more dependent the victim is on the perpetrator (e.g., parental vs. nonparental abuse). While controversial (see DePrince & Freyd, 2004; McNally, Clancy, & Schacter, 2001, and Sivers et al., 2002), the theory illustrates how domain-specific forgetting may have unique adaptive functions.
Another key component of memory posited within traditional cognitive architectures is short-term memory (Atkinson & Shiffrin, 1968). This temporary memory store appears quite limited: The classic estimate of its capacity is seven plus or minus two chunks of information (G. A. Miller, 1956), and more recent estimates make it even smaller (Cowan, 2001). Given the traditional view that more information is better, many cognitive psychologists have asked, why is short-term memory so small?
Perhaps the best-studied evolutionarily informed answer to this question denies the premise that bigger is better. Kareev (1995a, 1995b, 2000; Kareev, Lieberman, & Lev, 1997) argued that limited memory capacity can enhance adaptively important inferences of causality by fostering the early detection of covariation between two variables in the environment (e.g., do these tracks mean a predator is nearby?). To the extent that the degree of covariation is derived from the information in one's working (short-term) memory, there will be an upper bound on the size of the information sample that can be considered at one time. Taking Miller's estimate as a starting point, Kareev suggested that using samples of around seven observations of the co-occurrence (or lack thereof) of two events increases the chances for detecting a correlation between them, compared to using a greater number of observations (and assuming that the population correlation is not zero). Specifically, looking at small randomly drawn data samples increases the likelihood of encountering a sample that indicates a stronger correlation than that of the whole population (the reason for this lies in the skewness of the sampling distribution of correlation coefficients based on small samples of observations). Thus, a limited working memory can function as an amplifier of correlations, allowing those present in the population to be detected swiftly. This enhanced ability to detect contingencies seems particularly important in domains in which the benefits of discovering a causal connection outweigh the costs of false alarms, which also increase in number with smaller sample sizes (a point highlighted by Juslin & Olsson, 2005—but see Fiedler & Kareev, 2006, and Kareev, 2005, for further considerations). Such domains may be characterized by situations in which missing potential threats would be extremely costly (cf. Haselton & Nettle, 2006).
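The skewed-sampling argument can be checked with a small simulation. The setup and all parameters below are illustrative assumptions chosen for demonstration, not Kareev's original procedure:

```python
import random

def pearson_r(xs, ys):
    """Sample Pearson correlation for paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def share_overstating(rho, n, trials=2000, seed=1):
    """Fraction of size-n samples whose observed correlation exceeds the
    true population correlation rho. Because the sampling distribution
    of r is skewed for rho > 0, small samples tend to make an existing
    correlation look stronger than it is.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        # Construct ys so that the population correlation with xs is rho.
        ys = [rho * x + rng.gauss(0, (1 - rho ** 2) ** 0.5) for x in xs]
        if pearson_r(xs, ys) > rho:
            hits += 1
    return hits / trials
```

Comparing, say, `share_overstating(0.5, 7)` with `share_overstating(0.5, 50)` illustrates the amplification shrinking as sample size grows: with only seven observations, a majority of samples overstate the true correlation, and the effect fades at larger n.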
Of course, overreliance on small samples will exact a price in terms of systematic misperceptions of the world—but the important thing to ask from an evolutionary cognitive psychology perspective is how large that price is compared to the potential benefits accruing from their use. Kareev's analysis can be taken as a challenge to the premise that the more veridical the mental representations of the world, the better adapted the organism; instead, these results support the idea that systematically inaccurate mental models of the world (models with a “bias”—Gigerenzer & Brighton, 2009) can confer functional benefits to organisms whose aim is not to explain the world but to survive and reproduce in it. Other proposals for a functional benefit of limited short-term memory include Hertwig and Pleskac's (2010) related demonstration that small samples amplify the difference between the expected earnings associated with the payoff distributions (e.g., food patches), thus making the options more distinct and facilitating choice, along with MacGregor's (1987) theoretical argument that memory limitations can speed up information retrieval. These and other combinations of a functionalist view with a cost-benefit analysis of particular memory mechanisms, as often employed in evolutionary cognitive ecology (Dukas, 1998), can move us closer to a thorough understanding of the workings of human memory.
In the previous section, we discussed memory from an evolutionary point of view. But why do we have memory at all? Why should we be able to recall representations of the past? After all, changes in behavior could arise through learning even without the ability to remember independently any aspects of the events that we learned from. Being able to store and retrieve information about what happened in the past, however, lets us process that information further in the light of new information and experience. It also allows us to communicate the information to others (as well as to ourselves at later points in time) and combine it with information from them in turn. Ultimately, recalled information from the past enables us to form expectations about the future which can guide behavior in the present.2
Internal memories, our focus in the previous section, are not the only innovation over the course of evolution for representing past events. Paintings of animals in Pleistocene caves, for instance, demonstrate one step in the development of representations that have been used to externalize internal states—here, memories of what the early artists had previously experienced outside the cave. During the evolution of culture, such external representations were complemented by symbols that became standardized and gradually reached greater and greater levels of abstraction (such as alphabets and number systems—Schmandt-Besserat, 1996). As a consequence, the sources of information that could be used as a basis for judgments and decisions have increased over the course of human evolution, from individual experiences (a source that we share with even the lowest animals), through reports from family or group members (a source that social animals have, and that humans have in greatly developed form, including across generations), to modern statistics (a source that has been added only very recently during our cultural evolution). Does it make a difference, in terms of individual decision making, what form the information takes as a consequence of its source? Adopting an evolutionary point of view, one would hypothesize that the answer is “yes,” because our cognitive systems have been exposed to different forms and sources of information for different amounts of time. In particular, forms that have been created during our most recent cultural development may pose a bigger challenge to our information-processing capacities than those to which the human species had much more time to adapt, as the next two examples demonstrate.
Much of decision making can be understood as an act of weighing the costs against the benefits of the uncertain consequences of our choices. Take the decision of whether to engage in short-term mating: Although casual sex has obvious evolutionary benefits (e.g., Trivers, 1972), it can cause one to contract a sexually transmitted disease or suffer violence at the hands of a jealous partner (Buss, 2004). Each of these consequences is uncertain, and choosing to have casual sex is thus like rolling a die, each side of which represents one or more possible consequences of that choice.
The metaphor of life as a gamble (see W. M. Goldstein & Weber, 1997) has exerted a powerful influence on research on behavioral decision making, giving rise, for example, to the ubiquitous use of monetary lotteries in laboratory experiments. Studies that employ such lotteries typically provide respondents with a symbolic—usually written—description of the options, for example:
A: Get $4 with probability .8, or $0 with probability .2.
B: Get $3 for sure.
The most prominent descriptive theory of how people decide between such lotteries is prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992). One of its central assumptions is this: Relative to the stated probabilities with which an outcome can be expected to occur (e.g., .8 and .2 in option A above), people make choices as if small-probability events receive more weight than they deserve and as if large-probability events receive less weight than they deserve. This assumption can explain why, for instance, most people are inclined to choose lottery B over A above, though A has the higher expected value: The rare outcome in A, receiving $0, receives more weight than it deserves, reducing the perceived value of A.
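The claim that A has the higher expected value can be verified with a short calculation (a minimal sketch; the payoffs and probabilities are those stated in the example above):

```python
# Expected value of a lottery given as (outcome, probability) pairs.
def expected_value(lottery):
    return sum(outcome * prob for outcome, prob in lottery)

lottery_a = [(4.0, 0.8), (0.0, 0.2)]  # $4 with p = .8, $0 with p = .2
lottery_b = [(3.0, 1.0)]              # $3 for sure

print(expected_value(lottery_a))  # 3.2 -- A has the higher expected value
print(expected_value(lottery_b))  # 3.0
```

Despite A's higher expected value, most people choose B; on prospect theory's account, the rare $0 outcome in A is overweighted relative to its stated probability of .2.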
But are choices between options like A and B representative of the gambles that life presents us? Hertwig, Barron, Weber, and Erev (2004) argue that we rarely have complete knowledge of the possible outcomes of our actions and their probabilities. When deciding whether to have a one-night stand, for instance, we do not make a decision from description, consulting a written list of the possible consequences and their stated likelihoods. Instead, we rely on the experience that we (or others) have accumulated over time. Hertwig and colleagues referred to this kind of choice as a decision from experience. (Note that because animals do not share humans' ability to process symbolic representations of dicey prospects, all their decisions are decisions from experience—see also Weber, Shafir, & Blais, 2004.)
Do people behave differently when they learn about outcomes and probabilities from written descriptions as opposed to experience? To find out, Hertwig et al. (2004) created an experimental paradigm in which decision makers started out ignorant of the outcomes and the outcome probabilities associated with pairs of lotteries. Respondents saw two buttons on a computer screen and were told that each button was associated with a payoff distribution (for instance, option A vs. B). When they clicked on a button, an outcome was randomly sampled from its distribution (e.g., $3 if they chose B above, or $0 on 20% of clicks and $4 on 80% of clicks if they chose A). Respondents could sample from either distribution as many times as they wished. After they stopped sampling, they were asked which lottery they wanted to play for real payoffs.
Comparing choices made in this experience-based paradigm with choices made in the usual, structurally identical description paradigm revealed dramatic differences (Hertwig et al., 2004): Across six problems, the average absolute difference between the percentage of respondents choosing the option with the higher expected value (e.g., A above) in the experience and description groups was 36 percentage points. Moreover, in every problem, this difference was consistent with the assumption that rare events (e.g., $0 in A) had more impact than they deserved (given their objective probability) in decisions from description—consistent with prospect theory—but had less impact than they deserved in decisions from experience.
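One reason rare events lose impact in this paradigm can be sketched in a small simulation of sampling from option A (illustrative only; the sample size of 7 draws is a hypothetical value chosen to represent the small samples that respondents typically take):

```python
import random

def sample_option_a(n_draws, rng):
    """Draw n_draws outcomes from option A: $4 with p = .8, $0 with p = .2."""
    return [4 if rng.random() < 0.8 else 0 for _ in range(n_draws)]

rng = random.Random(42)
n_respondents = 10_000
n_draws = 7  # hypothetical small sample, for illustration

# Count simulated respondents who never experience the rare $0 outcome.
never_saw_rare = sum(0 not in sample_option_a(n_draws, rng)
                     for _ in range(n_respondents))

# Analytically, P(no $0 in 7 draws) = 0.8**7, about 0.21: roughly one in
# five small-sample respondents never encounters the rare event at all.
print(never_saw_rare / n_respondents)
```

A respondent who never experiences the rare outcome behaves, quite reasonably, as if it did not exist, which is one route by which rare events come to be underweighted in decisions from experience.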
Since its original demonstration, this description-experience gap has been shown to be robust across numerous investigations and experimental paradigms (Hertwig, in press; Hertwig & Erev, 2009). A number of factors have been identified as contributing to the description-experience gap, including reliance on small samples (Hertwig et al., 2004), recency (Hertwig et al., 2004), the search policy people apply to explore the payoff distributions (Hills & Hertwig, 2010), their aspiration levels (short-term vs. long-term maximization; Wulff, Hills, & Hertwig, 2014), and the cognitive processes used to gauge the value of payoff distributions based on the stated or experienced outcome and probability information (Gonzalez & Dutt, 2011).
The implication of the robust description-experience gap is that representations that are identical mathematically can be different psychologically—because they differ in form. Furthermore, the two types of information also differ in the length of evolutionary time that they have exerted a pressure on our cognitive abilities to understand and process them appropriately. Throughout the course of human evolution, we have experienced events in our interactions with the environment, but only very recently have we begun to aggregate such information and communicate it in the form of statistical descriptions.3 Thus, one might speculate that our cognitive strategies for making decisions under risk and uncertainty are more likely tuned to experienced frequencies than to described probabilities. This assertion is also supported by research done in the domain of Bayesian reasoning.
How should a Pleistocene hunter update his belief regarding the chance of finding prey at a particular location after he has seen some unusual movements in the grass there? Humans have been facing the task of updating beliefs for a long time, and there should have been sufficient selective pressure to produce a mechanism able to perform such inferences. At first glance, however, empirical results have been inconclusive: Whereas research by Gallistel (1990) and Real (1991) suggests that other animals can be adept at such Bayesian inferences (updating of beliefs in light of new evidence), humans often seem to lack this capability: “In his evaluation of evidence, man is apparently not a conservative Bayesian: he is not a Bayesian at all” (Kahneman & Tversky, 1972, p. 450). Are animals really better at making Bayesian inferences than humans?
As in the previous section, the answer lies in the different ways that information can be represented. Animals encounter the statistical information about environmental features on a trial-by-trial basis, that is, by sequentially experiencing cases. Experiments with human participants in which cases are sequentially presented have shown that people are well able to estimate the probability of observing the criterion given the presence of the predictor (Christensen-Szalanski & Beach, 1982).
In contrast, those studies leading to the conclusion that people are not able to reason in a proper Bayesian fashion have presented participants with descriptions given in terms of probabilities. For example, Eddy (1982, p. 253) presented 100 physicians with the following information: The probability of breast cancer is 1% for a woman at age 40 who participates in routine screening. If a woman has breast cancer, the probability is 80% that she will have a positive mammography. If a woman does not have breast cancer, the probability is 9.6% that she will also have a positive mammography.
The physicians were then asked to imagine a woman in this age group who had a positive mammography in a routine screening, and to state the probability that she actually has breast cancer. Out of those 100 physicians, 95 judged this probability to be about 75%, whereas the Bayesian solution, which is usually seen as the normatively correct answer, is actually about 8%.
By considering what kinds of representations our minds evolved to deal with, Gigerenzer and Hoffrage (1995) created an effective compromise between sequential acquisition of information and descriptions in terms of probabilities: They presented participants with descriptions in which the probabilities were translated into natural frequencies. Natural frequencies result from natural sampling (Kleiter, 1994) in which cases are randomly drawn from a specified reference class. Eddy's task, with probability information converted into natural frequencies, reads as follows: Out of 10,000 women, 100 have breast cancer. Out of those 100 women with breast cancer, 80 have a positive mammogram. Out of the remaining 9,900 women without breast cancer, 950 nonetheless have a positive mammogram.
Asking for the probability that a woman has breast cancer given a positive mammogram now becomes “How many of those women with a positive mammogram have breast cancer?”—and now the answer is much easier: 80 out of 1,030.
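The two routes to the posterior can be checked side by side (a sketch using the numbers from Eddy's task as stated above):

```python
# Probability format: Bayes' rule with the stated probabilities.
p_cancer = 0.01                 # base rate
p_pos_given_cancer = 0.80       # hit rate of the mammogram
p_pos_given_no_cancer = 0.096   # false-positive rate

posterior = (p_pos_given_cancer * p_cancer) / (
    p_pos_given_cancer * p_cancer
    + p_pos_given_no_cancer * (1 - p_cancer))

# Natural-frequency format: the same inference as a simple count.
cancer_and_positive = 80      # of the 100 women with breast cancer
no_cancer_and_positive = 950  # of the 9,900 women without breast cancer
frequency_answer = cancer_and_positive / (
    cancer_and_positive + no_cancer_and_positive)

print(round(posterior, 3))         # 0.078, i.e., roughly 8%
print(round(frequency_answer, 3))  # 80 / 1,030, the same answer
```

Mathematically the two formats are equivalent, but the natural-frequency version replaces the multiplication and normalization of Bayes' rule with a single division of two counts, which is the computational simplification Gigerenzer and Hoffrage (1995) describe.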
Across 15 tasks like this, when participants were presented with the information as probabilities, they reasoned the Bayesian way only 16% of the time, but when the information was presented as natural frequencies, this percentage rose to 46% (Gigerenzer & Hoffrage, 1995). Similar results were obtained with physicians (Hoffrage & Gigerenzer, 1998), medical students (Hoffrage, Lindsey, Hertwig, & Gigerenzer, 2000), and lawyers (Lindsey, Hertwig, & Gigerenzer, 2003).
Gigerenzer and Hoffrage (1995) proposed two explanations to account for the facilitating effect of natural frequencies: computational simplification and evolutionary preparedness for (natural) frequencies. Further studies (e.g., Brase, 2002) showed that computational simplification alone cannot account for the increased performance of people using natural frequencies. The overall conclusion of this research is that reasoning performance increases substantially when information is presented in terms of the natural frequencies that correspond to the way organisms have acquired information through much of evolutionary history—that is, by naturally sampling (and tallying) events observed in the natural environment.
Cognitive psychologists have long studied the limitations of human thought, and with good reason. Despite Hamlet's exhortation that we humans are “noble in reason… infinite in faculty” (Act 2, Scene 2), we struggle to keep more than half a dozen things in mind at once, we quickly forget what we have learned, we ignore much of the available information when making decisions, and we find it difficult to process deeply what information we do consider. But in focusing on the negative implications of these limitations, cognitive psychology may have grabbed the wrong end of the stick. The limited human mind is not just the compromised result of running up against constraints that can hardly be budged, such as the current birth-canal-limited size of the skull; rather, it is a carefully orchestrated set of systems in which limits can actually be beneficial enablers of functions, not merely constraints (Cosmides & Tooby, 1987). A less limited mind might fare worse in dealing with the adaptive problems posed by the structured environment. As Guildenstern later responded to Hamlet, presciently summing up modern psychology's computationally intensive theories of cognition, “There has been much throwing about of brains.” In many cases, throwing less brains at a task might do the trick. More is by no means always better (and indeed, recent pharmaceutical attempts to enhance properties of the cognitive system may come at the cost of serious detrimental side effects, including by compromising the beneficial effects of limits—see Hertwig & Hills, in press; Hills & Hertwig, 2011).
Considering the widespread selective pressures and attendant costs and benefits that have acted over the course of evolution on our cognitive mechanisms can help us to uncover these surprising instances when limitations are beneficial (and help us understand the design and functioning of those mechanisms even when their limits are constraining). As we have seen in this chapter, limited information use can lead simple heuristics to make more robust generalizations in new environments. Forgetting in long-term memory can improve the performance of recall, and can protect individuals from harmful reactions at vulnerable periods in their lives. And limited short-term memory can amplify the presence of important correlations in the world. (See Hertwig & Todd, 2003, for more on how cognitive limits can even enable functions that may not be possible otherwise.)
These potential benefits of cognitive limitations compose one of the main themes we believe should be addressed within an evolution-inspired cognitive psychology. We have portrayed the importance of considering how general selective pressures—those arising in multiple task domains—can shape adaptive cognitive mechanisms, in addition to the shaping forces of domain-specific task requirements and environment structure (as covered in other chapters in this handbook). But much of the picture remains to be sketched in. Here are a few of the important questions open for further exploration: How does the mind's adaptive toolbox of cognitive mechanisms get filled—that is, what are the processes through which heuristics and other strategies evolve, develop, are learned individually, or are acquired from others (Hertwig et al., 2013)? How do people select particular cognitive strategies in particular situations or environments? What role do noncognitive and social factors—for instance, social emotions such as shame, guilt, and empathy as well as social norms—play in heuristics? What selective pressures have shaped other limited cognitive capacities we have not touched upon, such as attention, categorization, and planning (e.g., Hullinger, Kruschke, & Todd, 2014)? What selective pressures (if any) have shaped how cognitive aging affects our cognitive strategies and processes? How does the use of particular cognitive strategies actually shape the environment itself (e.g., Hutchinson, Fanselow, & Todd, 2012; Todd & Kirby, 2001)? And what methods are most appropriate for studying the action of selective forces on cognitive adaptations?
Taking an evolutionary perspective can help introduce new ideas and hypotheses into cognitive psychology. But the benefits of bringing the cognitive and evolutionary approaches to psychology together do not flow solely from the latter to the former. Cognitive psychology is also a salutary approach for evolutionary psychologists to engage with: It points to the importance of information, hence of the environment that it reflects, and the structure of the environment must be a central aspect of any evolutionary explanation of behavior. The field's experimental methodology is an important component of supporting and revising evolutionarily inspired hypotheses regarding human cognition and action. Finally, cognitive psychology also reminds us of the crucial role that processing information with specific algorithmic mechanisms plays in the generation of adaptive behavior. This step—cognition—is often the “missing link” in nonpsychological approaches to investigating the evolution of behavior (Cosmides & Tooby, 1987), and is still too often missing within evolutionary psychology studies, as in those that merely assert correlations between environmental cues and behavioral outcomes. By cross-fertilizing these two traditions, evolutionary and cognitive, a more vigorous hybrid psychology will arise, espousing the rigorous analysis of the functional aspects of human cognition.