5
The internalization of socially rooted and historically developed activities is the distinguishing feature of human psychology.
—LEV VYGOTSKY, MIND IN SOCIETY
Human cognition and thinking are much more complex than the cognition and thinking of other primates. Human social interaction and organization are much more complex than the social interaction and organization of other primates as well. It is highly unlikely, we would argue, that this is a coincidence.
Complex human cognition is of course responsible for complex human societies in the sense that human societies would fall apart if human-like cognition were not available to support them. But this cognition-to-society causal link is not a plausible direction for an account of evolutionary origins. For that direction of effect, there would need to be some other behavioral domain in which powerful cognitive skills were selected, and then those skills were somehow extended to solving social problems. But it is not clear what other behavioral domain that might be, given that we are trying to explain the many particularities of cognitive skills supporting humans’ unique forms of collaboration and communication, including in the end such things as cultural conventions, norms, and institutions. It seems highly unlikely that cognitive skills adapted for, say, individual tool use or the tracking of prey could be exapted in this way for such complex cooperative enterprises.
And so, in the current view, the most plausible evolutionary scenario is that new ecological pressures (e.g., the disappearance of individually obtainable foods and then increased population sizes and competition from other groups) acted directly on human social interaction and organization, leading to the evolution of more cooperative human lifeways (e.g., collaboration for foraging and then cultural organization for group coordination and defense). Coordinating these newly collaborative and cultural lifeways communicatively required new skills and motivations for co-operating with others, first via joint intentionality, and then via collective intentionality. Thinking for co-operating. This, in broadest possible outline, is the shared intentionality hypothesis.
But our evolutionary story has taken many more detailed twists and turns as we have attempted to account, in detail, for the many different aspects of uniquely human thinking as they relate, in detail, to the many different aspects of uniquely human collaboration and communication. Because there are no other contemporary evolutionary stories with exactly this focus, we have thus far made scant reference to other theories. But there are a number of other contemporary accounts of the evolution of uniquely human cognition and/or uniquely human sociality in general, and a broad survey of these will help to better situate the shared intentionality hypothesis within the current theoretical landscape.
When asked what makes human cognition and thinking unique, a kind of default answer for many cognitive scientists would be something like “general intelligence.” Since humans have evolved very large brains (roughly three times larger than those of other great apes), and since larger brains have more computing power, the idea is that humans are able to engage in all kinds of cognitive processing, including thinking, in bigger, better, and faster ways. But even if this description is in some sense true, the question remains of how it came about evolutionarily. It is implausible in the extreme to just say that being smart is more adaptive than being dumb and so humans became smarter. This is a just-so story of the most egregious kind. Being able to fly and walk would be better than being able to walk only, so why did humans not come to fly as well? The point is that a plausible evolutionary account must be built on an adaptive scenario involving a specific set of circumstances in which a specific set of cognitive skills provided a specific set of advantages to the individuals who possessed them.
In the case of general intelligence, if this is a useful construct at all, recent data suggest that the more specific story would almost certainly be some variant of our social account. Thus, Herrmann et al. (2007, 2010) administered a comprehensive battery of cognitive tests—assessing skills for dealing both with the physical world and with the social world—to large numbers of two of humans’ closest primate relatives, chimpanzees and orangutans, and to 2.5-year-old human children. If the difference between human and ape cognition were based on general intelligence, then the children in this study should have differed from the apes uniformly across all the different tasks. But this was not the case. The finding was that the children and apes had very similar cognitive skills for dealing with the physical world, but the children—old enough to use some language but still years away from reading, counting, or going to school—already had more sophisticated cognitive skills than either ape species for dealing with the social world. The hypothesis was thus that human adults are cleverer than other apes at almost everything not because they possess an adaptation for greater general intelligence but, rather, because they grew up as children using their special skills of social cognition to cooperate, communicate, and socially learn all kinds of new things from others in their culture, including the use of all of their various artifacts and symbols (Herrmann and Tomasello, 2012).
A similar but different argument applies to theories that propose narrower, but still domain general, cognitive processes to distinguish humans from other primates. The most systematic such attempt is that of Penn et al. (2008). They claim that what distinguishes human from nonhuman primate cognition is humans’ ability to understand and reason with various kinds of higher-order relations. In addition to several empirical disputes about the data they cite for great apes, the overall problem is that this theory would also predict across-the-board differences in how well humans and other great apes deal with various kinds of problems in different domains of activity. But, again, the Herrmann et al. (2007, 2010) results are not consistent with this account. Furthermore, Penn et al. have no evolutionary story specifying the adaptive context(s) that could account for humans’ special skills with relational conceptualizations. The discussion in box 1 (chapter 3) proposed an alternative account, namely, that humans’ especially sophisticated relational thinking derives from comprehension of the individual roles involved in various types of joint and collective intentionality. Thus, this special form of relational thinking is just one outcome of the process of adapting cognitively to new forms of social engagement. And something similar may be said about Corballis’s (2011) proposal that the key to human cognitive uniqueness is recursion, especially as manifested in language, “mental time travel,” and theory of mind. Recursion also plays a key role in the current account, but again, we would claim that it is not the whole story. Rather, it is an outcome of the process by which humans came to collaborate and communicate with others in special ways; specifically in this case, it is the special way that individuals had to make inferences to participate in cooperative (ostensive-inferential) communication.
A second set of hypotheses to explain human cognitive uniqueness invoke language and/or culture. In the case of language, some theorists have pointed to the unique kinds of computational processes that language enables, namely, various kinds of combinatorial/syntactic productivity, including recursion (see Bickerton, 2009, for a recent version of this view). More philosophically minded theorists have focused on the role of language in reasoning, that is, on the way that humans make assertions aimed at truth, and then attempt to justify them to others with articulated reasons (as in science and mathematics and, perhaps, courts of law and political disputes), and this is only possible in the medium of a language of some kind (see, e.g., Brandom, 1994). Of course, no one can dispute the crucial role of language in human thinking—and language is a key part of our proposed second step in human cognitive evolution—but, in the current view, it plays its role only fairly late in the process. Indeed, we have argued previously that human language was made possible by a number of earlier adaptations for joint intentionality (e.g., joint goals, common conceptual ground, recursive inferences), and that its eventual emergence was part of a larger process in which many human activities were conventionalized and normativized (Tomasello, 2008). In our view, saying that only humans have language is like saying that only humans build skyscrapers, when the fact is that only humans, among primates, build any kind of stable shelters at all. Language is the capstone of uniquely human cognition and thinking, not its foundation.
Somewhat relatedly, many social and cognitive anthropologists have insisted that what is most remarkable about human cognition, compared with that of other primates, is its variability across different human populations, which attests to its grounding in processes of culture (e.g., Shore, 1995; Chase, 2006). More radically, various postmodern theorists have claimed that basically all of human experience takes place within the discursive practices of a human culture, and so uniquely human thinking is only imaginable within this cultural framework (e.g., Geertz, 1973). Again, these claims for the key role of culture are all, in some general sense, true. But again, if our question is evolutionary origins, they are not sufficient. Human thinking became unique even before the flourishing of human cultural variability, specifically, in the evolution of species-wide skills of collaboration, cooperative communication, and joint intentionality more generally (and it can be seen today in the species-unique skills of prelinguistic human children). These skills then enabled the evolution and development of culture at a subsequent time. This analysis also applies to the account of Richerson and Boyd (2006), who have argued for the crucial role of cultural group selection in most things uniquely human. Again, the second (cultural) step of our story invoked this process as well, but again, there are numerous prerequisite and concomitant uniquely human capacities that make cultures, and consequently cultural group selection, possible in the first place (e.g., conformity, conventionalization, and normativization). Since a culture comprises conventionalized ways of doing things, for modern human cultures to be the way that they are, some things must have already been complex and fundamentally cooperative “naturally,” before they were conventionalized.
And so we agree with almost everyone that language and culture were necessary for the evolutionary emergence of modern human cognition and thinking. We have just argued that they were made possible by other uniquely human social and cognitive processes—namely, those associated with joint and collective intentionality more generally—that emerged earlier or concomitantly in human evolution. A full account must therefore acknowledge the role of these earlier and/or concomitant processes, and indeed, our own view is that an understanding of how language and culture work as modes of social engagement and interaction at all requires a full explication of the underlying processes of joint and collective intentionality involved (Tomasello, 1999, 2008).
The third and final set of hypotheses comes from evolutionary psychology. Tooby and Cosmides (1989) have proposed a Swiss army knife metaphor in which the human mind comprises a varied collection of special-purpose modules evolved to solve specific and unrelated problems, the most numerous and crucial of which arose with early humans and their small-group social interactions. This focus on specific adaptive challenges and the evolved cognitive capacities for solving them is a necessary and important propaedeutic in the mostly evolution-free field of cognitive psychology. But, in practice, evolutionary psychologists have focused mainly on noncognitive (or only weakly cognitive) problems such as mate selection and incest avoidance. In terms of cognition, Tooby and Cosmides (2013) have been content simply to point out various ways in which human cognition shows the imprint of its evolutionary history in various domains. For example, in the domain of reasoning, humans solve some logical problems better if they are presented in social contexts similar to those from the environment of evolutionary adaptedness; in the domain of spatial cognition women have better spatial memories than men because they are adapted for plant gathering; and in the domain of visual attention humans pay special attention to the comings and goings of animals. So far, however, these theorists have given no comprehensive account of human cognitive modules in general, or of their unique aspects relative to other primates in particular.
There are several theories from this general perspective—that is, focusing on modularity and adaptedness—that make more systematic attempts to account for human cognitive uniqueness. First, Sperber (1996, 2000) argues that humans, like all animal species, possess a host of highly specific cognitive modules, such as snake detection and face recognition, as well as some more general modules, such as intuitive physics and intuitive psychology. These support what he calls intuitive beliefs (fast, resistant to evidence). What makes human cognition especially powerful is a kind of supermodule that enables individuals to entertain metarepresentations, that is to say, representations that not only cognitively represent the world but also represent others’ or their own representations of the world. Individuals do this “propositionally” (compositionally and recursively), leading to what Sperber calls reflective beliefs (which may be formed either by having good reasons or by adopting the beliefs of others whom one trusts). If other animals engage in metarepresentation at all, it is only in very rudimentary fashion, without compositionality and recursivity. This metarepresentational ability (Sperber actually thinks there may be three different metarepresentational modules) enables everything from cooperative (ostensive-inferential) communication, to teaching and cultural transmission, to reasoning by arguing with others. The capacity for metarepresentation coevolved with and interacts with a separate language module, which is also, obviously, uniquely human.
Carruthers (2006) gives an account of nonhuman primate cognition that involves representations and inferences, but he also stresses the limitations imposed by the “compartmentalization” of nonhuman primate cognitive modules. Human cognition is much more creative and flexible because, in the course of human evolution, additional modules were added, the most important of which are a mind-reading system (going beyond what apes do), a language-learning system, and a normative reasoning system. These modules can be applied simultaneously in the same situation, which creates some novelties, and in addition, humans evolved a disposition to imagine and rehearse action plans creatively in working memory, a capacity that enables all of the other modules to interact with one another more flexibly.
Mithen (1996) makes a systematic attempt to provide a modular theory of human cognitive evolution closely tied to the artifactual record. He makes a distinction between early humans and modern humans, noting that early humans were relatively limited cognitively, using the same tools everywhere over many millennia, with no symbolic behavior, and so forth. He explains this limitation by positing that early humans, like most animals, possessed several different cognitive modules that were not integrated with one another. Specifically, they had a technical intelligence with tools, a natural history intelligence with animals, and a social intelligence with conspecifics, none of which interacted with any other module. With modern humans, we get symbolic capacities and language, which enabled the modules to work together, creating the kind of “cognitive fluidity” associated with modern human thinking.
What all of these more specific evolutionary psychology accounts have in common is the proposal that nonhuman primates, and perhaps even early humans, are dominated by highly compartmentalized modules, and this means that their cognitive processes are relatively narrow and inflexible. In contrast, human cognition is broader and more flexible because humans have modules, including some novel modules, that somehow work together or communicate with one another (via metarepresentation, symbols and language, or some horizontal processes such as creative imagination in working memory). This means that nonhuman animals (and perhaps early humans) operate only with system 1 intuitive inferences, whereas modern humans operate in addition with system 2 reflective inferences based on actual thinking. But this view—a kind of strict view of modularity for all animals except modern humans—is not compatible with the data on great ape thinking at all. There is no evidence that great apes operate only with highly compartmentalized modules, and indeed, chapter 2 presented evidence that they do not; they often use system 2 processes to think before they act in both the physical and social domains, in both cases using abstract representations, simple inferences, and protological paradigms (structured by physical causality or social intentionality). In our view, then, these attempts to both be true to modularity theory and simultaneously make room for human flexible thinking simply do not accord well with available empirical evidence.
It is also disconcerting to see how different are the specific modules that the different theorists posit—indeed, they often operate at very different levels of analysis (compare snake detection and face recognition with technical intelligence and normative reasoning). Perhaps a more systematic and comprehensive list could be compiled, but the real problem is that modularity theorists do not often ask the question of origins beyond seeking a single evolutionary function for a module (what it is “good for”). It is well known that in evolution new functions are often subserved by existing structures, perhaps put together in some new ways. Thus, for example, the proposed module for normative reasoning would almost certainly have been constructed out of earlier skills and motivations for such things as making individual inferences, conforming to others and the group, evaluating others and being sensitive to their evaluations, cooperative communication, and other skills. Looking at the architecture of contemporary human cognition for a single evolutionary function (via “reverse engineering”) misses out on the dynamics of evolution, the way that new functions are created by cobbling together already existing structures as evolution proceeds. This dynamic means that there are deep relations among many cognitive functions via “common descent.” A complex adaptive behavior such as collaborative foraging, for example, may comprise many component processes such as fast running, accurate throwing, and skillful tracking—not to mention skills of joint intentionality—that may each have other adaptive functions, either on their own or in other complex behaviors. Once we get past narrowly defined problems with immediate and urgent fitness benefits (e.g., mate selection and predator detection), this hierarchical structure is crucial for understanding how different cognitive skills interrelate with others.
Our preference would thus be not to use the word module, which suggests a static architectural or engineering perspective. Rather, we would prefer the word adaptation, which suggests dynamic evolutionary processes. Adaptations may be quite narrowly targeted, and we ourselves have invoked the ethological notion of adaptive specialization (e.g., spiders building webs), which is very close in spirit to the notion of a module. But other adaptations may apply more broadly, either initially or through extensions over time. For example, great apes do not seem to be specifically adapted for tool use, as neither gorillas nor bonobos (and only some orangutan populations) use tools in the wild. But all great apes use tools, and quite adeptly, in appropriate circumstances in captivity. The adaptation thus seems to be more for causal understanding in the manipulation of objects, which can then be applied in the use of tools if the need arises for an individual (in contrast to some bird species, which seem to be specifically adapted for tool use).
Even more generally along these lines, the question arises whether there exist any truly domain-general horizontal abilities. (The metaphor is that specific content, like space or quantity, is vertical, whereas general processes like representation, memory, and inference are horizontal.) Some modularity theorists believe that seemingly horizontal abilities do not represent single domain-general processes; rather, each module has its own computational procedures that have nothing to do with those in other modules. Our view is that this again misses the importance of hierarchical organization in complex adaptations. Processes such as cognitive representation, inference, and self-monitoring may have evolved initially—in some ancient vertebrate ancestor—in the context of some fairly narrow behavioral specializations. But as new species have evolved, in the face of new and complex problems, these processes have been co-opted for use as subcomponents, as it were, in many different adaptations, some quite broad. This co-option process is especially important in highly flexible organisms such as great apes and humans, and indeed, wide-ranging occurrence of this process is a key component of cognitive flexibility.
Finally, we must also note that human skills and motivations for shared intentionality do not, in our view, represent typical cognitive adaptations occurring within individuals. Early humans had their own individual cognitive skills, but then they began attempting to coordinate with others toward joint goals with joint attention. Solving these coordination problems did not end the matter, however; rather, it opened up a whole new way of operating for early humans, especially the possibility of communicating referentially about basically everything in their experience with modified processes of representation and inference. The emergence of shared intentionality thus effected a restructuring, a transformation, a socialization, of all the processes involved in individual intentionality and thinking—an unusual, if not unprecedented, evolutionary event. This does not mean that humans do not also operate with many system 1 processes impervious to this transformation; they do, quite often making “gut decisions” about event probabilities, moral dilemmas, dangerous situations, and so forth (see, e.g., Gigerenzer and Selten, 2001; Haidt, 2012). But still, humans may consider and even communicate about all of these things in their system 2 thinking, even if this does not affect their eventual behavioral decision. Skills and motivations for shared intentionality thus transform the way that humans think about almost everything—because they can communicate about almost everything.
In any case, as noted at the outset, none of these various hypotheses—in any one of the three sets just reviewed—is a direct competitor to the shared intentionality hypothesis, as none of them focuses specifically on human thinking and its component processes. Each has captured its own segment of the truth about uniquely human cognition and thinking, we would argue, but the current account is both more comprehensive in covering all of the many different aspects of human thinking and more true to the way that evolutionary processes operate in cobbling together complex behavioral functions out of preexisting component processes. In addition, as we shall now see, the shared intentionality hypothesis also fits quite well with contemporary theories of the evolution of human sociality.
There are a number of different theories of the evolution of human sociality, but they all agree on one thing: the general direction is one of ever more cooperation (at least until the rise of agriculture, cities, and stratified societies, some 10,000 years ago). As distinct from other great apes, early humans began mating via pair bonding, with the result that nuclear families became newly cooperating social units (Chapais, 2008). Relatedly, humans—again as opposed to other great apes—began various forms of cooperative childcare in which adults other than mothers cared for youngsters (Hrdy, 2009). This new form of childcare may have been a precursor to, but also may have occurred in conjunction with, collaborative foraging, as grandmothers and other females remained at home with the children while the healthiest females foraged and brought back the food to share—which turned the network of families involved into new cooperating units (Hawkes, 2003). And with the rise of modern humans, entire cultural groups—potentially encompassing whole clans or tribes with individuals who might not even know one another—became cooperating units as they competed with other human groups for valuable resources in cultural group selection (Richerson and Boyd, 2006).
How this trend toward cooperation might have interacted with humans’ ever increasing cognitive competencies has been little explored, or even speculated about. There are two main exceptions. First, in support of the social brain hypothesis, Dunbar (1998) has documented across primate species a strong positive correlation between brain size (presumably reflecting cognitive complexity) and population size (presumably reflecting social complexity). Modern humans are the extreme case: human brain size and population size are both several times larger than those of their nearest great ape relatives. Gowlett et al. (2012) attempted to trace this relationship across human evolution and found an especially big jump in both brain size (as estimated by cranial volume) and estimated group size at around 400,000 years ago with Homo heidelbergensis—which is, of course, precisely at the time of our hypothesized first step in the evolution of human thinking via joint intentionality. However, group size is only a very gross indicator of social complexity (Dunbar focuses on the greater numbers of social relationships and reputations to be kept track of), and brain size is only a very gross indicator of cognitive capacities, so the social brain hypothesis gives us only a very general indication of the actual processes involved on either side of the correlation.
A more specific attempt to link human sociality and cognition is provided by Sterelny (2012), who has focused on human cooperation and its many facets, including cooperative childcare, cooperative foraging, and cooperative communication and teaching. The human cooperative lifestyle depends on individuals acquiring huge amounts of information during ontogeny—everything from how to track an antelope, to how to make a spear, to how the kinship relations of the group are constructed—and so the cooperative transmission of information from expert adults to novice children is crucial for individual survival. Humans have thus constructed learning environments within which their own offspring develop, which ensures that these offspring gain the information they need to perform critical subsistence activities such as toolmaking and collaborative foraging. Tomasello (1999) also offers a version of this theory, focusing especially on the ways in which human cognitive ontogeny is made possible by children acquiring the material and symbolic artifacts (including language) created by their forebears. In a generally similar vein, Levinson (2006) focuses on the uniquely human “interactive engine” of cooperative social engagement and how its evolution has created uniquely human forms of multimodal communication. Hrdy (2009) stresses that some of the adaptations involved here could have been for infant behavior itself, for example, special skills of cooperation and communication that enabled infants to navigate the newly complex world of multiple caregivers from an early age.
From the current perspective, these two accounts of the interrelation of human sociality and cognition are both useful and generally correct. But we have focused here specifically on the underlying processes of thinking involved. We have done this at a level of detail showing relatively precisely how specific problems in the coordination of action (collaboration) and the coordination of intentional states (cooperative communication) might have presented themselves to humans at two different evolutionary periods, and how humans might have solved them via new forms of thinking (employing new forms of cognitive representation, inference, and self-monitoring). Early humans needed not just to keep track of social relationships and transmit useful information to their young, but in addition and most immediately they also needed to meet the many and varied challenges of subsistence via social coordination—which they did by developing the many and varied skills and motivations of shared intentionality, including the ability to conceptualize situations for others recursively in cooperative and conventional communication. The indissociability of social coordination and human thinking is captured quite nicely by Sellars (1962/2007, p. 385), who writes: “Conceptual thinking is not by accident that which is communicated to others, any more than the decision to move a chess piece is by accident that which finds an expression in a move on a board between two people.”
And so, by way of a general summary of our account, let us focus on the specific issue of the relation between sociality and thinking as it arises at each step of the proposed natural history. The main conclusions may be expressed in four very general propositions.
1. COMPETITION WITH GROUPMATES LED TO SOPHISTICATED FORMS OF NONHUMAN PRIMATE SOCIAL COGNITION AND THINKING WITHOUT HUMAN-LIKE FORMS OF SOCIALITY OR COMMUNICATION. Basic mammalian sociality is simply the motivation to live in a social group. Within-group competition engenders social relations of dominance and, along with other factors, affiliation. Great apes, and perhaps other primates, engage in more-than-average social competition and so have developed skills for understanding the goals and perceptions of others as a way of flexibly predicting their behavior. They also are especially skillful at manipulating physical causes in tool use and the intentional states of others in gestural communication. Great apes collaborate—that is, actually work together—very little, and when they do, it is best characterized as what Tuomela (2007) calls “group behavior in I-mode,” as in chimpanzees’ group hunting in which each individual is attempting to capture the monkey for itself. Great ape communication is almost exclusively about attempting to direct the recipient’s attention and behavior in some desired ways, not to inform them of things useful to them. There are no human-like joint goals; there is no cooperative communication for coordinating actions.
Great ape cognition and thinking are adapted to this social, but not very cooperative, way of life. Great apes attend to situations relevant to their goals and values and, in certain problem situations, simulate or imagine the effects of various causes on the problem ahead of time before acting, as a way of making an effective behavioral decision. They do this with cognitive representations that are imagistic and schematic, understanding that “this is another one of those.” They also understand in many cases how situations (and their components) relate to one another causally or intentionally, which enables them to simulate nonactual situations and make all kinds of causal and intentional inferences about them, including logical inferences organized into paradigms. For example, they infer not only “if X is present, then Y will be absent,” but also “if there is only silence coming from here, then X must be there,” and even “if X wants Y and perceives it in location Z, then she will go to location Z.” These causal and intentional inferences also generate a kind of instrumental rationality in decision making, as the individual infers “if situation X obtains, then the best action to choose is Y.” Great apes also self-monitor their own decision making not only by monitoring how outcomes match goals but also by monitoring the information available to them, and their confidence in it, before making a decision.
And so, the upshot is that great ape sociality has led to some remarkable skills of social cognition, to complement sophisticated skills of physical cognition, what we have called skills of individual intentionality. But this form of sociality has not led to any transformations in the way that individuals conceptualize the world or think about problems in general. Individual intentionality has enabled great apes, and perhaps other nonhuman primates, to actually think about problems in specific situations, and to do this without any of humans’ unique forms of sociality or communication. Individual intentionality and instrumental rationality may thus be considered general primate issue for “thought in a hostile world” (Sterelny, 2003).
2. EARLY HUMAN COLLABORATIVE ACTIVITIES AND COOPERATIVE COMMUNICATION—EMPLOYING NEW FORMS OF SOCIAL COORDINATION—LED TO NEW FORMS OF HUMAN THINKING WITHOUT EITHER CULTURE OR LANGUAGE. For more than 5 million of the 6 million years that humans have been on their own evolutionary pathway, their thinking was mainly ape-like (though their skills at making tools may have enhanced their causal understanding). But then there was a change in ecological conditions that forced some early humans to begin collaborating in new ways to obtain food. This made individuals interdependent with one another in an especially urgent way. In mutualistic activities such as these, communication could become fully cooperative since it was in the interest of each individual to coordinate with others toward their mutualistic goal and to inform them of things useful to them in their role. And so were born early humans who could survive and thrive only by collaborating and communicating cooperatively with social partners.
Collaborative foraging created a number of difficult problems of social coordination. The basic solution was to form with others joint goals to do things together, to which both participants were jointly committed. This created the dual-level structure: joint goals with individual roles, along with joint attention with individual perspectives. In the cooperative communication used to coordinate individuals’ perspectives (and so actions) within these activities—initially via pointing and pantomiming—the communicator was committed to cooperation in the form of an honest informative act, and communicator and recipient collaborated to ensure successful communication. The recipient followed the pointing gesture, or imagined the referent of the pantomime, and then made an abductive inference from that to what, given their common ground, the communicator intended to communicate. The communicator, for his part, knew that this was what the recipient would be doing and so attempted to conceptualize the situation for her in his choice of referents—anticipating her perspective of his perspective on her perspective recursively—in a way that facilitated her abductive leap. Moreover, in the special context of joint decision making, early human communicators sometimes pointed out relevant situations to their partner that (implicitly) provided reasons for them to jointly decide on a certain course of action based on their common ground understanding of the causal and/or intentional implications of the indicated situation.
To do all of this effectively required thinking of a type not possible for great apes and their individual intentionality: the communicator had to make judgments not only about his common conceptual ground with recipient but also about which aspects of the current situation the recipient would find both relevant and new—and so what kind of abductive inference she would make given different possible referential acts. Doing this led to what we have called second-personal thinking, comprising (1) cognitive representations that are perspectival and symbolic, (2) inferences that are recursively structured to include intentional states within intentional states, and (3) self-monitoring that incorporates the imagined social evaluation and/or comprehension of the collaborative and/or communicative partner. These changes all served to basically “cooperativize” great ape individual intentionality into a kind of second-personal joint intentionality and thinking.
And so, early humans’ joint intentionality and second-personal thinking represented a radical break, a new type of relation between sociality and thinking. The cooperative and recursive sociality of early humans created an adaptive context requiring individuals, if they were to survive and thrive, to coordinate their actions and intentional states with others, which required them to “cooperativize” their cognitive representations, inferences, and self-monitoring, and so the processes of thinking that these enabled. Importantly for theories of the relation of sociality and thinking, this new type of second-personal thinking took place without conventionalization, culture, or language or anything else going beyond direct, second-personal, social engagements.
3. MODERN HUMAN PROCESSES OF CONVENTIONALIZED CULTURE AND LANGUAGE LED TO ALL OF THE UNIQUE COMPLEXITIES OF MODERN HUMAN THINKING AND REASONING. Modern humans faced some new social challenges due to increases in group sizes accompanied by competition among groups. For survival, modern human groups had to begin operating as relatively cohesive collaborative units, with various division-of-labor roles (see Wilson, 2012). This created the problem of how individuals could coordinate with in-group strangers, with whom they had no personal common ground. The solution was the conventionalization of cultural practices: everyone conformed to what everyone else was doing, and expected others to conform as well (and expected them to expect them to, etc.), which created a kind of cultural common ground that could be assumed of all members of the group (but not other groups). Modern humans’ ways of communicating were conventionalized in this same way as well, which meant that individuals operated in a cultural common ground comprising a kind of group perspective and with conventionalized linguistic items and constructions that could be used effectively with anyone in the group.
This group-minded structuring of modern humans’ activities and interactions, along with their conventional means of communication, meant that modern humans came to construct a kind of transpersonal, “objective” perspective on the world. Conventional communication became fully propositional, not only because of its conventional, normative, “objective” format and topic-focus structuring, but also because the speaker’s communicative motives and epistemic/modal attitudes could be independently controlled in conventional signs, which meant that the propositional content was conceptualized independent of the motives and attitudes of particular individuals. Linguistic constructions enabled unprecedented creativity of conceptual combination, and moreover, they enabled full propositions representing a kind of generic, timeless, “objective” state of affairs, as in pedagogy (“It works like this”) and the enforcement of social norms (“One must not do that”). Group-minded individuals thus constructed an “objective” world.
Conventional linguistic communication provided developing children with a preexisting representational system of alternative means of conceptualization, and everyone knew together in cultural common ground the available alternatives. This opened up a whole new world of both formal and pragmatic inferences. Processes of discourse aimed at effective communication encouraged communicators to make explicit many aspects of their own psychological processes left implicit in previous forms of communication (e.g., intentional states, logical operations), which enabled new ways of reflecting on thinking. In addition, cooperative argumentation for making joint decisions required that individuals make explicit their reasons and justifications to others in order to convince them of their truth; therefore, to be effective, they had to meet the group’s normative expectations for rational discourse. Internalizing this reason-giving process meant that individuals now knew why, for what group-accepted reason, they were thinking what they were thinking. This process provided conceptual links between the individual’s myriad thoughts and propositional representations, leading to a kind of holistic conceptual web. Each individual was also now practicing a kind of normative self-governance in which she, as emissary of the group to which she was a committed member, regulated her own actions and thoughts in terms of the group’s normative standards.
FIGURE 5.1 Summary of the shared intentionality hypothesis
And so, modern humans’ creation of the various forms of collective intentionality (comprising cultural conventions, norms, and institutions, including language) led to a kind of agent-neutral, “objective” thinking comprising conventional and objective representations; processes of inferring that were reasoned, reflective, and aimed at truth; and normative self-governance in which individuals monitored and adjusted their thinking to fit with that of the group. Culture and language, as agent-neutral conventional phenomena, thus provide another setting within which a new form of human sociality can lead to a new form of human thinking, specifically, objective-reflective-normative thinking.
From an evolutionary point of view, then, our overall argument is an extension of that of Maynard Smith and Szathmáry (1995): humans have created genuine evolutionary novelties via new forms of cooperation, supported and extended by new forms of communication. Further, this has led to new forms of cognitive representation, inference, and self-monitoring together constituting new forms of thinking. And humans have done this twice, the second step building on the first. Figure 5.1 summarizes the three component processes of human thinking at each of the three steps (i.e., including apes as step 0) of the shared intentionality hypothesis.
4. CUMULATIVE CULTURAL EVOLUTION LED TO A PLETHORA OF CULTURALLY SPECIFIC COGNITIVE SKILLS AND TYPES OF THINKING. All of these processes of joint and collective intentionality are universal in the human species. Most likely, the first step of joint intentionality evolved in Africa before the split between Neanderthals and modern humans and so characterized both species. The second step of collective intentionality likely evolved in a population of modern humans in Africa before they migrated out into other parts of the world after 100,000 years ago. But once they started migrating out and settling in highly variable local ecologies, differences in cultural practices became pronounced. Different human cultures created very different sets of particular cognitive skills, for example, for navigating across large distances, for building important tools and artifacts, and even for communicating linguistically. This meant that different cultures created, on top of their species-wide cognitive skills of individual, joint, and collective intentionality, many culturally specific cognitive skills and ways of thinking for their own local purposes.
Importantly, these culturally specific skills build on one another over historical time within a culture in a kind of ratchet effect, leading to cumulative cultural evolution. Because of humans’ especially powerful skills of cultural learning, along with adult teaching and children’s tendency to conform, the artifacts and practices of a culture acquire a “history.” Individuals mediate their interactions with the world through the culture’s artifacts and symbols from early in ontogeny (Vygotsky, 1978; Tomasello, 1999), thus absorbing something of the wisdom of the entire cultural group and its history. Cumulative cultural evolution is what enabled humans to conquer all kinds of otherwise uninhabitable places all over the globe.
As one dramatic example in the contemporary world, we may point to what are arguably the most abstract and complex forms of human thinking, that is, those involved in Western science and mathematics. The point here is that these forms of thinking are simply not possible without special forms of socially constructed conventions, namely, those in written form, that developed over historical time in Western culture. This point is stressed especially by Peirce (1931–1958) and is summarized in the classic text of modern logic by Lewis and Langford (1932, p. 4): “Had it not been for the adoption of the new and more versatile ideographic symbols, many branches of mathematics could never have been developed because no human mind could grasp the essence of their operations in terms of the phonograms of ordinary language.” Many scholars of literacy would also argue that written language makes certain forms of reasoning, if not possible, at least more accessible (Olson, 1994). Writing also greatly facilitates metalinguistic thinking and makes it possible to analyze, criticize, and evaluate our own linguistic communication, as well as that of others. Pictures and other graphic symbols used as communicative devices are collective representations that contribute to the process in important ways as well.
Those modern cultures that have created active communities of scientists, mathematicians, linguists, and other scholars are pretty much unthinkable without written language, written mathematical numerals and operations, and other forms of visually based and semipermanent symbols. Cultures that have not created and do not currently possess any of these kinds of graphic symbols cannot currently participate in these activities. This demonstrates quite clearly that many of the most complex and sophisticated human cognitive processes are indeed culturally and historically constructed. It also opens the possibility that some other human cognitive achievements are a kind of co-evolutionary mixture. Our own view would be that many of the complexities of human language are of this nature: built on universal cognitive processes but with culturally constructed concrete manifestations (Tomasello, 2008).
It is theoretically possible that this entire account applies not to human thinking in general but only to a kind of modularized thinking for collaboration and communication specifically (see Sperber, 1994, for something in this general direction). But this does not seem to be the case. Human perspectival and objective representations, recursive and reflective inferences, and normative self-monitoring—the constituents of uniquely human thinking—do not just go away when humans are not collaborating or communicating. On the contrary, they structure nearly everything that humans do, with the possible exception of sensory-motor activities. Thus, humans use recursive inferences in the grammatical structures of their languages, in mind-reading in noncommunicative contexts, in mathematics, and in music, to name just the most obvious examples. Humans use perspectival and objective representations for thinking about everything, even in their solitary reveries, and they are engaged in normative self-monitoring whenever they are concerned about their reputation—which is pretty much all of the time. We might also recall here skills of relational thinking, which are products of dual-level collaboration but used more broadly, and skills of imagination and pretense, which are products of imagining in pantomime but are now used in all kinds of artistic creation. Collaboration and communication may play the crucial instigating roles in our story, but their effects on cognitive representation, inference, and self-monitoring extend much more broadly to basically all of humans’ conceptually mediated activities.
Along these same lines, we should also be clear that the new forms of social cognition that this account proposes are not just modularized theory of mind skills. Rather, such things as perspectival representations, recursive inferences, and social self-monitoring evolved so that individuals could now understand the world in new ways by putting their heads together with others in acts of shared intentionality. Doing this requires more than just some specific cognitive skill aimed at some specific content domain, because coordinating actions and intentional states with others toward outside referents requires new ways of operating across the board. Skills and motivations for shared intentionality thus changed not just the way that humans think about others but also the way they conceptualize and think about the entire world, and their own place in it, in collaboration with others.
Although we have used ontogenetic data in various ways in this account, our focus here has not been on human ontogeny per se. It is thus important to make two key points about the role of ontogeny in the origins of uniquely human thinking.
First, although ontogeny does not have to recapitulate phylogeny, in the current case the relation between joint intentionality and collective intentionality is partly logical—one must have some skills for coordinating with other individuals before coordinating with the group—and so the ontogenetic ordering is basically the same as our hypothesized phylogenetic ordering (Tomasello and Hamann, 2012). In fact, however, things are more complicated than this because, as noted, young children are modern human beings, and they are exposed to many cultural artifacts, including a conventional language, practically from birth. But we would claim that, until around their third birthday, young children’s social interactions with others are basically second-personal, not group based, and they do not fully understand how such things as language, artifacts, and social norms work as conventional creations.
And so, in the current view, the sequence is roughly this: Young children begin collaborating and communicating cooperatively with others with a second-personal orientation—through direct participation with specific other individuals—at around their first birthday. This includes engaging with others in joint attention, taking others’ perspective in simple ways, and using the pointing gesture with others creatively (Carpenter et al., 1998; see Moll and Tomasello, in press, for a review). Importantly, this developmental timing is characteristic of children from a wide variety of cultural settings—including small-scale, nonliterate societies (Callaghan et al., 2011)—but it is not characteristic of chimpanzee ontogeny, even those raised by humans (Tomasello and Carpenter, 2005; Wobber et al., in press). This set of facts suggests a highly canalized and species-specific developmental pathway for the first emergence of skills of joint intentionality.
Skills of collective intentionality begin to emerge at around the third birthday. This is when young children first begin to understand social norms and other conventional phenomena as products of some kind of collective agreement. Thus, at around three years of age, young children do not just follow social norms but begin actively to enforce them on others (and to feel guilty when they break norms themselves). They do this in ways that demonstrate their understanding that particular norms apply only in particular contexts and only to individuals in the group who have conventionalized them. They also understand that some pieces of language, for example, common nouns, are conventional for everyone in the group, whereas proper names are conventional only for those who know the person (see Schmidt and Tomasello, 2012, for a review). Skills of collective intentionality have not been studied in any depth outside of Western, middle-class culture, and so the cross-cultural generality of this developmental timing is not known.
The second point about the role of ontogeny is this: neither joint nor collective intentionality would exist without it. This is true of many human traits, as the human species has evolved an extended ontogeny for all kinds of things that other species possess in mature, or almost mature, form at birth. Thus, whereas many small primates have brains that develop very rapidly in the first months of life, maturing in less than a year, and chimpanzees have brains that develop to maturity in only about five years, the human brain takes more than ten years to reach its fully mature adult volume (Coqueugniot et al., 2004). Because this extended ontogeny is highly risky both for youngsters and mothers, there must be offsetting advantages, presumably in terms of such things as especially flexible behavioral organization, cognition, and decision making—as well as time to master the local group’s cultural artifacts, symbols, and practices (Bruner, 1972).
Human skills of joint and collective intentionality thus come into existence during an extended ontogeny in which the child and her developing brain are in constant interaction with the environment, especially the social environment. Our hypothesis is that they would not come into existence without this interaction. To make the point as concretely as possible, let us invoke a thought experiment we have used before, and then add a novel twist. Imagine a child born on a desert island, miraculously kept alive and healthy until adulthood, but all alone. The hypothesis is that this child, as an adult, would not have skills of either joint or collective intentionality. This social isolate could not, as an adult, enter a human group and start collaborating by forming joint goals with individual roles, or communicating cooperatively in the context of joint attention with individual perspectives. This individual would thus not develop during its isolated lifetime second-personal thinking with perspectival and symbolic representations, recursive inferences, and social self-monitoring. How could he develop an appreciation for perspectives that differ from his own without in fact experiencing different perspectives? How could he develop socially recursive inferences without any social partners with whom to communicate? How could he worry about others evaluating him if no such others existed? No, skills of shared intentionality are not simply innate, or maturational; they are biological adaptations that come into existence as they are used during ontogeny to collaborate and communicate with others.
This thought experiment might be called Robinson Crusoe, as the child is alone on the desert island. But now imagine a Lord of the Flies scenario. In this case it would be multiple infants born and growing to maturity on a desert island, with no one to interact with but each other. Perhaps surprisingly, the hypothesis in this case is that these children would indeed have the kind of social interactions necessary for developing joint intentionality—but not collective intentionality. That is to say, these orphaned peers should develop, through their social interactions with one another, skills of second-personal, recursive sociality. They would find ways to collaborate with one another with joint goals and attention, communicate with one another taking different perspectives (by pointing and pantomiming), and monitor their behavior through the eyes of their interdependent partner. To develop in this way, sophisticated adults and all of their cultural paraphernalia are not necessary.
But we do not think that peer interaction alone would be sufficient for our orphaned peers to develop skills of collective intentionality in their lifetimes. They might develop on their own some conventions and norms of some kind, as skills of joint intentionality plus imitation might be sufficient, and they might create something like a culture over many generations. But they would not develop during their own lifetimes a full-fledged culture or conventional language, as these take multiple generations over historical time to develop. And the same could be said of cultural institutions with standing status functions, like chiefs and money. In general, fully fledged skills of collective intentionality and agent-neutral thinking require, on our hypothesis, ontogeny in the midst of a preexisting cultural collective with preexisting conventions, norms, and institutions, including a conventional language. How could one become group minded, represent things objectively, and regulate one’s behavior and reasoning by the cooperative and communicative norms of the social group if there was in fact no social group that antedated one’s social and cognitive development? No, skills of collective intentionality are not simply innate or maturational either; they are biological adaptations that come into existence only through an extended ontogeny in a collectively created and transmitted cultural environment—which takes multiple generations to emerge. In this case, then, adults and all of their cultural paraphernalia are indeed necessary for the ontogenetic development of skills of collective intentionality.
It is not incoherent to believe that all of the cognitive and thinking skills we have described could be built-in—to the degree that our wild child or orphaned peers could, if discovered as adults, immediately display perfectly mature forms of uniquely human thinking at both levels. It is just, in our view, highly unlikely. Humans biologically inherit their basic capacities for constructing uniquely human cognitive representations, forms of inference, and self-monitoring, from out of their collaborative and communicative interactions with other social beings. Absent a social environment, these capacities would wither away from disuse, like the capacity for vision in a person born and raised completely in darkness.
One could, in principle, collect data on the role of ontogeny in the emergence of uniquely human thinking, but only if one had no moral scruples. One would have to be prepared to randomly assign newborn children to different rearing environments. Natural experiments, such as the feral child Victor of Aveyron and other “wolf” children, are not definitive on the question for many reasons, not the least of which is that some of these children could have been abandoned by their parents precisely because they were not normally functioning (Candland, 1995); moreover, none of them was tested for the relevant cognitive skills. Some interesting indirect evidence for the important role of a human-like social environment is provided by so-called enculturated apes. When apes are raised by humans in the midst of all kinds of human-like social interaction and artifacts, they do not develop more human-like skills of physical cognition (e.g., space, object permanence, tool use), but they do develop more human-like skills of imitation and communication (Call and Tomasello, 1996; Tomasello and Call, 2004). The significance of these findings for human ontogeny is, however, not straightforward.
In any case, while everyone will continue to be fascinated by the question of wild children and how much and what kinds of social experience are necessary for humans to develop their unique forms of cognition and thinking, the question is very likely to remain a deep mystery for the foreseeable future. In the meantime, our hypothesis is that, like many human adaptations, adaptations for shared intentionality are built to grow and flourish only in the midst of rich social and cultural nourishment of particular kinds.