This chapter discusses the relationship of language to the brain. It begins with a discussion of animal communication, from which human language evolved and which provides a guide to the basic neural elements and processes that underlie our own system of communication. The next section of the chapter provides a brief summary of human language and its relation to the brain, relying on studies of patients with various types of neurological diseases and on brain imaging in normal subjects.
The ultimate goal of most animal communication, like that of behavior in general, is reproduction. Thus, signals that indicate the sender’s species, gender, and degree of reproductive readiness account for the vast majority of natural communication. These are the messages being broadcast by chirping crickets, flashing fireflies, pheromone-releasing moths, and singing birds. Frequently, the mate-attraction call is also used to warn off potential rivals; for example, bird and cricket songs are also used to identify the boundaries of territories or personal space. Differences in signal quality are the usual basis of mate choice in species for which some degree of discrimination is evident. This competition for the attention of members of the opposite sex has led to the development of more conspicuous signals and, in many cases, to the evolution of displays, signaling morphology (i.e., sounds vs. shapes vs. color vs. movement vs. odors), and messages that go far beyond the basic needs of species and sexual identification (Gould & Gould, 1996).
Most animals are solitary except when mating; they abandon their eggs or larvae and provide no care to their offspring. However, a number of species engage in some degree of parental care. For them, signals between parent and offspring are often very important. Most birds, for instance, have about two dozen calls that communicate mundane messages such as the need to eat, defecate, take cover, and so on. In cases in which both parents tend the young—the usual circumstance in birds, for instance—additional signals are required to agree on a nest site, synchronize brooding shifts, and guard the nestlings. Most primates also utilize two to three dozen signals for similar purposes.
The minority of species that are highly social have the most elaborate communication systems of all. They need messages for a variety of elements of social coordination, including, in many cases, group hunting or foraging, defense, and working out of a social hierarchy.
To ensure species specificity in mating, most species rely on more than one cue to identify a sexual partner. (Exceptions include some of the species that rely on pheromones.) Thus, multiple signals are sent, and a choice must be made between sending simultaneous messages and sending sequential messages (or a combination of the two). The sequential strategy has the advantage that the individual signals must be correct and the order must be appropriate. A female stickleback, for instance, requires the male to have a red ventral stripe, perform a zigzag dance, poke his nose in a nest, and then vibrate her abdomen; the odds of this concatenation of signals occurring together in this order by chance are remote. However, sequential signaling is time-consuming. A faster strategy is to provide all the cues in parallel, an approach that accepts a greater risk that the cues will co-occur by accident. For example, when a parent herring gull waves its bill in front of chicks to see if they need to be fed, the young simultaneously see a vertical beak, a red spot, and a horizontal motion, each a discrete cue; these cues combine in the mind of the chick to elicit pecking.
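The reasoning about chance co-occurrence can be made concrete with a back-of-the-envelope calculation. The sketch below is purely illustrative: the per-cue probability and the independence assumption are invented for the example and are not measured values for any species.

```python
# Illustrative only: invented per-cue probability, assuming independent cues.
# If each of four cues appears in the environment by chance with probability p,
# the chance that all four co-occur is p**4; requiring one fixed order of the
# four cues (sequential signaling) multiplies in a further 1/4! factor.
from math import factorial

p = 0.1          # hypothetical chance rate of any single cue
n_cues = 4

simultaneous = p ** n_cues                      # all cues present at once
sequential = simultaneous / factorial(n_cues)   # present AND in the one correct order

print(f"chance of false trigger, parallel cues:   {simultaneous:.1e}")   # 1.0e-04
print(f"chance of false trigger, sequential cues: {sequential:.2e}")     # 4.17e-06
```

Under these toy numbers, parallel signaling is faster but roughly 24 times more prone to false triggering than the same cues presented in a required order, which is the trade-off the text describes.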
Nearly all animal communication is innate: the sender produces the appropriate signal in the correct context even without any opportunity for learning, and it can be recognized for what it is by equally naive conspecifics. The basis of innate recognition appears to lie with feature detectors in the nervous system—the inborn circuits that automatically isolate iconic visual or acoustic elements (Gould, 1982). The availability of many visual, auditory, tactile, and olfactory feature detectors, which can, in theory, be used in any specific combination or order, accounts for most of the diversity of animal communication, and reliance on these individual feature detectors, rather than pattern detection, accounts for its limitations. More complex behavior in relatively short-lived species, such as nest building in birds, is also usually innate, although some improvement with experience is also evident (Gould & Gould, 1999, 2007). Examining how complex innate communication can be is therefore useful when considering the extent to which a system as complex as human language must be learned.
In terms of its ability to communicate information, the most complex system of nonhuman communication known at present is the dance of honeybees. The system has some properties that are reminiscent of human language: it uses arbitrary conventions to describe objects distant in both space and time; that is, it does not reflect a real-time emotive readout, as might be the case when a primate gives an alarm call or grunts at a banana.
In terms of information content, a honeybee dance is second (albeit a very distant second) only to human speech, and so far as is known is far richer than the language of any primate. However, the dance language suffers from at least two limitations: it is a closed system in that there seems to be no way to introduce new conventions to deal with novel needs, and it is also graded; instead of discrete signals for different directions or distances, single components are varied over a range of values (angles and durations). The less complex but more flexible systems of birds and primates are more likely to illustrate the evolutionary precursors of speech.
Some birds have innate songs. In these species, individuals raised in isolation produce songs that are indistinguishable from those of their socially reared peers and respond to calls appropriately without prior experience. Chickens, doves, gulls, and ducks are familiar examples. Most songbirds, however, illustrate a different pattern: one that shows how potent a combination of instinct and innately directed learning can be. Isolated chicks sing a schematic form of the species song, but the richness of a normal song is absent. Adult conspecifics can recognize innate songs as coming from members of their species, but in general these impoverished vocalizations produce lower levels of response. Typically, there is a sensitive period during which exposure to song must occur if it is going to have any impact (Fig. 49.1). In most species, there is a gap between this sensitive period and the process of overt song development—that is, practice and perfection of the adult song are based on the bird’s memory of what it heard during its sensitive period (Gould & Marler, 1987).
Figure 49.1 Birdsong development in most species is characterized by a sensitive period during which a song of the species must be heard. Later, during subsong, the bird practices making notes and assembles them into the correct order and pattern (A). Birds not allowed to hear their species’ song sing a schematic version of the song (B and C); birds deafened before subsong cannot sing (D).
Given a range of possible song models during isolated rearing, a chick selects an example from its own species and memorizes it. If it hears only songs of other species, the mature song is the unmodified innate song (Fig. 49.1). Thus there is an innate bias in the initial learning. Where this bias has been studied, it appears to depend on acoustic sign stimuli (i.e., species-specific “syllables”). Chicks are able to extract syllables of their species even when these are embedded in foreign songs or scored in a way never found in their own species; for example, a syllable repeated at an accelerating rate presented to a species that sings syllables at a constant rate (or vice versa) will be extracted and used in the species-typical manner.
Practice is essential in the normal development of birdsong, and part of this practice occurs in a babbling phase known as subsong, which begins at a species-typical age. A bird deafened after its sensitive period, but before it begins producing notes in preparation for singing (subsong), is unable to produce even an innate song (Fig. 49.1). During the earliest parts of subsong, birds try out a number of notes. These notes are typical of the species, but most are absent from the song they eventually sing. The learning process may involve producing each member of an innate repertoire of notes, listening to them, checking to see if they match any element in the memorized song, discarding the unnecessary ones, and rearranging, scoring, and modifying the others to produce a reasonable copy of the original song heard during the sensitive period.
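The learning process just described amounts to a template-matching loop: produce candidate notes, compare them with the memorized song, keep and reorder the ones that match. The sketch below is a schematic illustration of that idea only; the note names and the matching rule are invented, and it is not a model of any actual neural mechanism.

```python
# Schematic sketch of the template-matching account of song learning described
# in the text: the bird produces notes from an innate repertoire, compares them
# with the memorized tutor song, keeps the matching ones, and orders them to
# approximate the memorized song. Note names and matching rule are hypothetical.

def develop_song(innate_repertoire, memorized_song):
    """Return an adult song built only from innate notes that match the template."""
    kept = []
    for note in innate_repertoire:          # "subsong": try out each innate note
        if note in memorized_song:          # does it match an element of the template?
            kept.append(note)               # keep it; otherwise discard it
    # Rearrange the kept notes into the order of the memorized song.
    return [note for note in memorized_song if note in kept]

innate_repertoire = ["A", "B", "C", "D", "E", "F"]   # hypothetical innate notes
memorized_song = ["C", "A", "C", "E"]                # hypothetical song heard during
                                                     # the sensitive period
print(develop_song(innate_repertoire, memorized_song))   # ['C', 'A', 'C', 'E']
```

The point of the sketch is simply that both stores of information are required: without the innate repertoire there is nothing to produce, and without the memorized template (or auditory feedback) there is nothing to match against, which is why deafening before subsong is so devastating.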
There is some flexibility in song development. For example, when the chick has heard two very different specimens of its own species’ song, it will often incorporate elements from each. Similarly, when the chick has been exposed to the sight of a singing conspecific and simultaneously the sound of a heterospecific song, it may pick out elements of the abnormal song and adapt them as best it can into its own species-specific organization.
Birdsong, therefore, depends on two processes that involve an interaction between innate capacities and learning: imprinting the song in memory and then learning to perform it. This system is flexible, but only within clear limits.
Nonhuman primates are our closest evolutionary relatives, and their communication systems are therefore important to study for clues regarding the neural basis of human language.
Vervets, a species of monkey, provide a good case study. Vervets, like all social primates, have a large repertoire of innate calls used for social communication. Among these approximately three dozen signals are four alarm calls (Cheney & Seyfarth, 1990). In some parts of their range, one of the calls is specific for martial eagles; in another, the same call is used for certain hawks. In either case, the call causes monkeys to look up; those at the tops of trees drop to the interior, whereas those on the ground move into bushes or under trees. A second call, specific to leopards in one region and to other solitary hunters elsewhere, sends the warned individuals up to the tree tops. A third, specific for snakes, induces the other members of the troop to stand up and look around in the grass. A fourth call is heard in the presence of humans or group-hunting predators.
The development of calling is revealing. Young vervets appear to understand the class of animals each call refers to, but not the particular species that are dangerous. Thus, infants will give the eagle call to harmless vultures, storks, and even falling leaves, but not to snakes or leopards. Consequently, adults generally respond to the alarm calls of infants with a casual look around, followed by their own alarm if there is a genuine danger. Juveniles make fewer mistakes, and adult errors are confined to calls produced when the potential threat is so far away that human observers require binoculars to identify the species. In short, young vervets seem to learn the details of how to apply an innate categorical vocabulary.
Nonhuman primates have virtually no voluntary control over respiration, the vocal tract, or vocalization, so this modality is not available for producing a large number of differentiated signals. Other modalities of communication, such as gesture, however, are also used for relatively restricted purposes, mostly related to items of immediate biological importance and social hierarchy. Recent detailed studies of gesture use in captive chimpanzees indicate that the meanings of particular gestures differ from one group to another.
Researchers have attempted to teach chimpanzees to use a variety of communicative systems that bypass their vocal limitations, such as sign language and manipulation of object tokens, in an effort to determine to what extent these naturally occurring communication systems can be augmented. If anything, this work emphasizes the ways in which nonhuman primates are not capable of full human language. There are dramatic differences between trained chimpanzees and humans in terms of vocabulary size (chimps never get beyond what a 2-year-old can do), conceptual capacities, complex grammar, and so on, despite the fact that chimps have remarkable skills at social cognition.
The neural mechanisms that underlie animal communication may give clues as to those that allow for human language. These systems of course differ in different phyla and species. This section briefly reviews aspects of the neural basis for birdsong and primate communication. It would be of great interest to report on the neural basis for the dance of the bees, but nothing is known about this subject.
Birdsong appears to be produced by a small number of nuclei (collections of nerve cells). In the canary, four important nuclei are involved: HVc, RA, DM, and nXIIts. Lesions in any of these nuclei destroy or seriously impair song production. These nuclei also respond when song is presented, and some neurons in some nuclei in some species (e.g., neurons in nXIIts in the zebra finch and the white-crowned sparrow) respond specifically to particular aspects of song. This indicates an overlap in the neural mechanisms involved in song perception and production, a feature that may carry over to humans as well.
Birdsong is produced by a structure at the junction of the trachea and the bronchi called the syrinx, which has two halves, one on each side. In some species, song is entirely, or mainly, produced by one of these two halves; in other species, both halves participate in song, with each producing different parts of the song. Each half of the syrinx is controlled by structures in the corresponding side of the brain and, when one half is responsible for song, these structures are larger on that side. The perceptual system is also duplicated on both sides of the brain, and, in some species, the two sides of the brain respond to different aspects of song.
In species in which only male birds sing (most species), song nuclei are much larger in males than in females (“sexual dimorphism”). This is not true in those species in which males and females perform duets. In general, there is a correlation between song nucleus size and song complexity in individual birds. Furthermore, song nuclei grow by almost 100% (nearly doubling in size) during the spring mating season (when birds sing) compared to the fall and winter (when they do not).
Changes in the size of song nuclei are related to testosterone production (they can be induced in females by testosterone injection in some circumstances) and to other factors usually related to day length—part of the birds’ innate calendar. They reflect many changes in nerve cells: an increase in the number of dendrites, an increase in cell size, and, most interestingly, an increase in the number of cells. Studies using labeled thymidine, which is incorporated into newly synthesized DNA in dividing cells, have shown neurogenesis in the HVc nucleus in relationship to periods of increased song, indicating that new nerve cells are formed. These neurons project to the RA nucleus, strongly suggesting that they play a role in song.
The neural basis for primate vocalizations is beginning to be charted. Unlike human speech, primate vocalizations do not appear to originate in a lateral cortical region but rather in the medial portion of the brain (the cingulate cortex), a region involved in connecting basic instinctual functions to more advanced cognitive functions. Actual motor planning appears to involve mainly brainstem nuclei. On the perception side, there are neurons in the auditory cortex of some species of macaque monkeys that selectively respond to conspecific calls. These regions show some degree of functional asymmetry in some species, with cells in the left hemisphere, but not the right, responding to the calls.
It is dangerous to infer too much about human language, or about how the human brain may support language, from studies of species that are very distant, such as birds. However, these species provide evidence for the possibility of certain aspects of language and its neural basis that could have developed several times in evolution or that could date to a distant common ancestor.
At a behavioral level, human speech and language could have developed from the repertoire of two to three dozen innate calls that are typical of birds and primates. In that case, we would expect to find that language contains elements that are innately recognized and that may be distinguished by the sorts of acoustic features that provide the basis for the innate discrimination of sign stimuli in other species. In fact, babies have an innate ability to discriminate a small number of sounds found in human languages, which function as acoustic sign stimuli. The evolution of human speech could also have retained the sorts of sensitive periods and innate learning biases so evident in songbirds and vervet monkeys. In this case, we might expect to see a species-typical babbling phase, and perhaps a sensitive period for easy learning of new languages. If the evolution of speech and language preserves the use of preexisting (innate) forms, we might expect there to be innate aspects of human language, which would presumably be found in all languages and which might surface in situations where we see a “default grammar,” not unlike the unlearned and impoverished songs of birds reared in isolation. Creoles potentially exemplify such cases. Creoles are languages developed by the children of people speaking different languages who are drawn together and communicate using a very simple pidgin. Creoles have a number of features that are not present in the pidgin or in the languages spoken by the children’s parents, and these features have been taken to reflect innately specified properties of language (Bickerton, 1990).
From the neurological perspective, several features of the neural organization of birdsong appear to have counterparts in the human language system: the existence of specialized neural nuclei for song production; the frequent asymmetry of the neural basis for song, with one hemisphere producing most of the song or the two hemispheres producing different aspects of it; the partial sharing of structures for the production and perception of song; and the relationship of size to function. However, other important features of the neurobiology of birdsong, such as the neurogenesis that is important in the seasonal changes in song production, are not known to play a role in human speech and language. As for primates, the neural system responsible for call production is quite different from that in humans, being based in cortex that is transitional from limbic to association regions (reflecting the limited semantic content and immediate biological relevance of most calls). Asymmetries in the neural basis for call perception, and the existence of neurons in the auditory association cortex that selectively respond to conspecific calls, may be quite direct evolutionary precursors of the human neural substrate for language.
Human language differs from animal communication systems in many ways. Functionally, as we have seen, most animal communication systems serve the purposes of identifying members of a species and designating items of immediate biological significance. Human language provides means to designate an infinitely large number of items, actions, and properties of items and actions. Human language also allows items, actions, and properties to be related to one another, and for relationships between events and states of affairs in the world, such as temporal order and causation, to be represented at the level of discourse. Language expresses these different semantic values with different types of representations: words, words made from other words, sentences, and discourse. Each of these “levels” of linguistic representation consists of specific forms that are related to specific aspects of meaning. The forms at each level are intricately structured, and there are an infinite number of different structures that can be built at each level of the system.
Current models of language processing subdivide functions such as reading, speaking, auditory comprehension, and writing into many different, semi-independent components, which are sometimes called “modules” or “processors.” These components can be further divided into variable numbers of highly specialized operations, such as those involved in mapping features of the acoustic signal onto phonemes or in constructing syntactic structures from words. Each operation accepts only particular types of representations as input and produces only specific types of representations as output.
Models of language processing are often expressed as flow diagrams (or “functional architectures”) that indicate the sequence of operations of the different components that perform a language-related task. Figure 49.2 presents a basic model illustrating the functional architecture of the lexical processing system. Figure 49.2 is greatly simplified because not all components are represented, parallel feedforward and feedback pathways are not depicted, and the nature of the operations in each component is not specified.
Figure 49.2 A model of the major psycholinguistic operations involved in processing simple words.
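A functional architecture of the kind depicted in Figure 49.2 can be thought of as a pipeline of processors, each accepting only one type of representation as input and producing another as output. The sketch below is a minimal illustration of that organization only; the stage names, representations, and toy mappings are hypothetical placeholders, vastly simpler than any real psycholinguistic model.

```python
# Minimal sketch of a staged lexical-processing architecture: each component
# accepts only one type of representation and produces another, as described
# in the text. All contents are toy placeholders.

def acoustic_to_phonemes(acoustic_signal: str) -> list[str]:
    """Map features of the (stand-in) acoustic signal onto phonemes."""
    toy_front_end = {"elephant-sound": ["EH", "L", "AH", "F", "AH", "N", "T"]}
    return toy_front_end[acoustic_signal]

def phonemes_to_word(phonemes: list[str]) -> str:
    """Activate a stored lexical form from the phoneme string."""
    toy_lexicon = {("EH", "L", "AH", "F", "AH", "N", "T"): "elephant"}
    return toy_lexicon[tuple(phonemes)]

def word_to_meaning(word: str) -> dict:
    """Activate the semantic representation associated with the word."""
    toy_semantics = {"elephant": {"category": "animal", "size": "large"}}
    return toy_semantics[word]

# The pipeline: acoustic input -> phonemes -> lexical form -> meaning.
meaning = word_to_meaning(phonemes_to_word(acoustic_to_phonemes("elephant-sound")))
print(meaning)   # {'category': 'animal', 'size': 'large'}
```

Real models differ from this sketch in exactly the ways noted above: the components run in parallel with feedforward and feedback connections rather than strictly in sequence, and each stage is itself composed of many specialized operations.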
Some important operating characteristics of components of the language processing system have emerged from experimental studies. These include the following features:
1. Language processors are highly specialized. For instance, recognition of phonemes is accomplished by mechanisms that appear to separate very early in the processing stream from those that recognize other auditory stimuli.
2. Language processors are obligatorily activated when an individual attends to the input. For instance, if we attend to a sound that happens to be the word “elephant,” we must hear and understand that word; we cannot hear this sound as just a noise.
3. Language processors generally operate unconsciously.
4. Language processors operate quickly and accurately. For instance, it has been estimated on the basis of many different psycholinguistic experimental techniques that spoken words are usually recognized less than 125 ms after their onset—that is, while they are still being uttered. The speed of the system as a whole reflects both the speed of each of its components and the massively parallel functional architecture of the system, in which many components are simultaneously active.
5. Language processors require “processing resources.” Although we do not appreciate it consciously, language processing demands some effort. It is unclear whether there are separate pools of processing resources for each language processing component or for language processing as a whole or whether language processing can “borrow” resources from other systems if it needs to.
Language develops, without explicit instruction, in the auditory-oral modality, if this channel is available. Prelingual deafness leads to language developing in the visual and gestural modalities. Languages can also be represented orthographically, using a variety of scripts (alphabetic, syllabic, consonantal, ideographic). Many of the features of spoken language processing we have reviewed apply to the processing of written and signed language as well, though there are differences due to constraints imposed by the modality of presentation. For example, written language does not develop in the absence of explicit instruction.
By some counts, there are over 6500 languages in the world. These languages differ in many ways from one another: they have different vocabularies, use different sounds, and have different ways of forming words and different syntactic rules. However, beyond these surface level differences, there are features that are common to all human languages. For instance, the sound systems of all human languages consist of alternations of consonants and vowels; no language forms different words by varying features of sound such as its loudness or whether a word is whispered. These “linguistic universals” form a framework within which the features of individual languages occur.
Linguistic universals can arise because of universal features of motor or sensory systems or, possibly, because of constraints on cognitive computational capacities. They may also arise because language itself is constrained to have certain features. Chomsky’s work in the area of syntax is the best known and most controversial example of analyses along these lines (Chomsky, 1965, 1995). Chomsky and his colleagues and students have argued that very abstract features of syntax are common to all languages. There have been numerous suggestions as to what these features are; one recent proposal is that they include the ability to generate an infinite number of expressions from a finite set of elements (Hauser, Chomsky, & Fitch, 2002). Many linguists believe that the structure of language is much less abstract than Chomsky and his colleagues have suggested and, accordingly, that universal “laws” of linguistic structure are also less abstract, mostly taking the form of statements that the presence of one type of element in a language implies that a second type will also be present.
The issues of how abstract linguistic representations are and what universal properties of language exist are bound up with two major issues—how language is acquired and how language evolved—that are in turn relevant to the neural basis for language. Chomsky’s view is that language acquisition consists of fixing the parameters in an innate language template that correspond to the constraints on possible forms of human language that are present in the language a child is exposed to. A parallel in nonhuman learning might be imprinting (Immelmann, 1972), whose neurological basis has been studied very little. Models of language that see language structures as much less abstract argue for less elaborate innate knowledge and a greater role for learning in the development of language abilities (e.g., Seidenberg, 1997; Tomasello, 2003).
The same disagreements affect ideas about how language evolved. In Chomsky’s model, a critical neural development is the emergence of structures and processes in the brain that support universal properties of language. Assuming recursion is one of these universal properties, the finding that starlings can discriminate recursive from nonrecursive sequences of notes (Gentner, Fenn, Margoliash, & Nusbaum, 2006) implies that some of these neural structures are not unique to humans; but if recent studies showing that some species of monkeys do not recognize recursive structures hold for all higher primates, the evolution of these features must have followed divergent paths and reemerged in humans. The major alternative to the view that language evolution consisted of the emergence of neural structures capable of supporting abstract organizational features found in all human languages is that language rests upon an evolutionary change that supports much less complex and abstract representations, with the complexity of human language being a cultural development, like architectural complexity.
One well-known model of this sort that has a neural basis has been developed by Rizzolatti and Arbib (1998), who have argued that the neural structures that are critical for language are modifications of the mirror neurons found in monkeys, which fire both when the monkey grasps particular objects and when the animal sees a person grasping those objects. In Rizzolatti and Arbib’s view, these neurons encode the relation of a number of actors to an action and an object, and thus lay the basis for neural mechanisms that allow an arbitrary number of substitutions of actors and objects in relation to the action described by a verb. We note that the fact that culture influences what structures a language has (Everett, 2005) does not resolve this debate, because any neurologically normal human infant would presumably learn the language of his or her environment.
There are very limited empirical data regarding the neural and genetic basis for the evolution of human language. Lieberman (2006) argues that the critical evolutionary change that made for human language was not specifically neural but consisted of the descent of the root of the tongue into the oropharynx, which allows humans to produce the inventory of speech sounds of a language. In his view, two genetic effects—increases in the size of the hominid brain related to the ASPM gene and, more importantly, changes in the ability to exploit the increased channel capacity in humans by changes in the FOXP2 gene—underlie the use of the resulting increase in the vocal repertoire for communicative purposes. Deficits in the FOXP2 gene in a single family have been associated with a variety of language and articulation problems, and the FOXP2 gene has been referred to as a (or “the”) “language gene” by some researchers. However, the deficits in affected family members are not specific to syntax or word formation, and, as Lieberman argues, it is more likely that this gene codes for neuronal features of subcortical gray nuclei, in particular the basal ganglia and, to a lesser extent, the cerebellum, that make for more elaborate and efficient control of sequencing abilities, than for cortical structures that represent and process language per se.
Human language depends on the integrity of the unimodal and multimodal association cortex in the lateral portion of both cerebral hemispheres. This cortex surrounds the sylvian fissure and runs from the pars triangularis and opercularis of the inferior frontal gyrus [Brodmann areas (BA) 45 and 44: Broca’s area] through the angular and supramarginal gyri (BA 39 and 40) into the superior temporal gyrus (BA 22: Wernicke’s area) (Fig. 49.3). For the most part, the connections of these cortical areas are to one another and to the dorsolateral prefrontal cortex and lateral inferior temporal cortex (Fig. 49.3). These regions have only indirect connections to limbic structures. These areas consist of many different types of association cortex, devoted not to sensation or motor function but to a more abstract type of analysis. The nature of this cortex and its patterns of connectivity are thought to combine to give language its enormous representational power and to allow its use to transcend biological immediacy.
Figure 49.3 A depiction of the left hemisphere of the brain showing the main language areas. The area in the inferior frontal lobe is known as Broca’s area, and the area in the superior temporal lobe is known as Wernicke’s area, named after the nineteenth-century physicians who first described their roles in language. Broca’s area is adjacent to the motor cortex and is involved in planning speech gestures. It also serves other language functions, such as assigning syntactic structure. Wernicke’s area is adjacent to the primary auditory cortex and is involved in representing and recognizing the sound patterns of words.
The evidence that language involves these cortical regions was originally derived from deficit-lesion correlations. Patients with lesions in parts of this cortex have been described who have had long-lasting impairments of language (“aphasia”). Disorders affecting language processing after perisylvian lesions have been described in many different types of disease, in all languages that have been studied, in patients of all ages and both sexes, and in both first and subsequently learned languages, indicating that this cortical region is involved in language processing independent of these factors. Functional neuroimaging studies have documented increases in regional cerebral blood flow (rCBF) using PET or blood oxygenation level-dependent (BOLD) signal using fMRI in tasks associated with language processing in this region. Event-related potentials whose sources are likely to be in this region have been described in relationship to a variety of language processing operations. Stimulation of this cortex by direct application of electrical current during neurosurgical procedures interrupts language processing. These data all converge on the conclusion that language processing is carried out in the perisylvian cortex.
Regions outside the perisylvian association cortex also appear to support language processing. Working outward from the perisylvian region, evidence shows that the modality of language use affects the location of the neural tissue that supports language, with written language involving the cortex closer to the visual areas of the brain and sign language involving brain regions closer to those involved in movements of the hands than movements of the oral cavity. Some ERP components related to processing improbable or ill-formed language are maximal over high parietal and central scalp electrodes, suggesting that these regions may be involved in language processing. Both lesion studies in stroke patients and functional neuroimaging studies suggest that the inferior and anterior temporal lobe is involved in representing the meanings of nouns. Activation studies also implicate the frontal lobe just in front of Broca’s area in word meaning. Injury to the supplementary motor cortex along the medial surface of the frontal lobe can lead to speech initiation disturbances; this region may be important in activating the language processing system, at least in production tasks. Activation studies have shown increased rCBF and BOLD signal in the cingulate gyrus in association with many language tasks. This activation, however, appears to be nonspecific, as it occurs in many other, nonlinguistic, tasks as well. It has been suggested that it is due to increased arousal and deployment of attention associated with more complex tasks.
Subcortical structures may also be involved in language processing. Several studies report aphasic disturbances following strokes in deep gray matter nuclei (the caudate, putamen, and parts of the thalamus). It has been suggested that subcortical structures involved in laying down procedural memories for motor functions, in particular the basal ganglia, are involved in “rule-based” processing in language, such as regular aspects of word formation, as opposed to the long-term maintenance of information in memory, as occurs with simple words and irregularly formed words. The thalamus may play a role in processing the meanings of words. In general, subcortical lesions cause language impairments when the overlying cortex is abnormal (often the abnormality can be seen only with metabolic scanning techniques), and the degree of language impairment is better correlated with measures of cortical than subcortical hypometabolism. It may be that subcortical structures serve to activate a cortically based language processing system but do not themselves process language. The cerebellum also shows increased rCBF in some activation studies involving both language and other cognitive functions. This may reflect a role for this part of the brain in timing and temporal ordering of events, or a more direct involvement in language and other cognitive functions.
Most language processing goes on in one hemisphere, called the “dominant” hemisphere. Which hemisphere is dominant shows considerable individual differences and bears a systematic relationship to handedness. In about 98% of right-handed individuals, the left hemisphere is dominant. The extent to which left hemisphere lesions cause language disorders is influenced by the degree to which an individual is right handed and by the number of non-right-handers in his or her family. About 60 to 65% of non-right-handed individuals are left hemisphere dominant; about 15 to 20% are right hemisphere dominant; and the remainder appear to use both hemispheres for language processing. The relationship of dominance for language to handedness suggests a common determination of both, probably in large part genetic.
The neural basis for lateralization was first suggested by Geschwind and Levitsky (1968), who discovered that part of the language zone (the planum temporale—a portion of the superior temporal lobe; Fig. 49.4) was larger in the left than in the right hemisphere. Subsequent studies have confirmed this finding and identified specific cytoarchitectonically defined regions in this posterior language area that show this asymmetry. Several other asymmetries that may be related to lateralization have also been identified. The exact relationship between size and function is not known, as there are instances of individuals whose dominant hemisphere is not the one with the larger planum temporale. In general, however, relative size is a good predictor of lateralization. This is another example of the “bigger is better” principle that we saw applied to song nuclei in birds.
Figure 49.4 Depiction of a horizontal slice through the brain showing asymmetry in the size of the planum temporale related to lateralization of language.
Although not as important in language functioning as the dominant hemisphere, the nondominant hemisphere is involved in many language operations. Evidence from the effects of lesions and split brain studies, experiments using presentation of stimuli to one or the other hemisphere in normal subjects, and activation studies all indicates that the nondominant hemisphere understands many words, especially concrete nouns, and suggests that it is involved in other aspects of language processing as well. Some language operations may be carried out primarily in the right hemisphere. The best candidates for these operations are ones that pertain to processing the discourse level of language, interpreting nonliteral language such as metaphors, and appreciating the tone of a discourse—for example, the fact that it is humorous. Some scientists have developed models of the sorts of processing that the right hemisphere carries out. For instance, it has been suggested that the right hemisphere codes information in a more general way compared to the left, representing the overall structure of a stimulus as opposed to its details. This may be a general feature of processing in the two hemispheres and apply to other functions, such as visual perception.
Historically, theories of the relationship of parts of the perisylvian association cortex to components of the language processing system have included distributed (“holist”) and localizationist models. Contemporary studies have strongly supported the latter, and we will selectively review representative work.
Evidence for localization of language processing components comes from the finding that selective language deficits occur in patients with perisylvian lesions, often in complementary functional spheres. For instance, some patients have trouble producing the small grammatical function words of language (such as the, what, is, he), whereas others have trouble producing common nouns. The existence of these two disorders indicates that the tissue involved in producing function words is not involved in producing common nouns in the first set of patients, and vice versa in the second set.
The first localizationist theories were based on clinical observations in the mid- and late nineteenth century. The pioneers of aphasiology—Paul Broca, Karl Wernicke, John Hughlings Jackson, and other neurologists—described patients with lesions in the left inferior frontal lobe whose speech was hesitant and poorly articulated and other patients with lesions more posteriorly, in the superior temporal lobe, who had disturbances of comprehension and fluent speech with sound and word substitutions. These correlations led to the theory that language comprehension required the unimodal auditory association cortex (Wernicke’s area, Brodmann area 22) adjacent to the primary auditory cortex (Heschl’s gyrus, Brodmann area 41) and that motor speech planning required the unimodal motor association cortex in Broca’s area (Brodmann areas 44 and 45) adjacent to the primary motor cortex (Brodmann area 4). Geschwind (1965) added the hypothesis that word meaning was localized in the inferior parietal lobe (Brodmann areas 39 and 40) because word meanings consist of associations between sounds and properties of objects, and the inferior parietal lobe is an area of multimodal association cortex to which fibers from the unimodal association cortex related to audition, vision, and somesthesis project.
This model has many limitations and problems. It only deals with words, not with other levels of the language code. The functions that are localized are entire language tasks, such as comprehension of spoken language or speech production, not components of the language processing system. The correlations between deficits and lesions are found primarily in the chronic stage of recovery from stroke, not in acute or subacute phases of stroke or in other diseases, and even in chronic strokes a very large number of lesions do not occur where the model predicts they will.
Modern studies describe deficits in aphasic patients in terms of specific linguistic representations and psycholinguistic processes. This is similar to modern studies using functional neuroimaging. In the next sections, we review studies of localization of two language processes: spoken word recognition and comprehension, and syntactic operations in comprehension. We have selected these functions because they contrast in several ways; they vary in their abstractness, the extent of the temporal intervals over which they integrate sensory input, and the extent to which current studies converge on localization of the operations they involve. They thus reflect the range of results found in current studies of localization of components of the language processing system.
Many of the processes involved in recognizing linguistically relevant units of sound, activating words from these units, and activating the meanings of words appear to rely on specific areas of the brain. This is not evident in the aphasia literature, which shows that both individuals with lesions involving the inferior frontal gyrus (IFG) and those with lesions involving the superior temporal gyrus (STG) display deficits in perceiving the acoustic-phonetic properties of linguistic categories such as phonemes (Blumstein, Cooper, Zurif, & Caramazza, 1977; Blumstein, Tartter, Nigro, & Statlender, 1984). However, results from functional neuroimaging studies with normal participants have provided evidence that speech perception recruits a neural processing stream involving both left posterior and anterior brain structures (Binder & Price, 2001; Hickok & Poeppel, 2007; Scott & Wise, 2004) and that this neural stream comprises different processing stages that are located in specific areas.
The evidence suggests that temporo-spectral acoustic cues to feature identity are integrated in unimodal auditory association cortex lying along the superior temporal sulcus immediately adjacent to the primary auditory koniocortex (Binder, 2000). Some researchers have suggested that the unconscious, automatic activation of features and phonemes as a stage in word recognition under normal conditions occurs bilaterally. In their view, the dominant hemisphere supports controlled and conscious phonemic processing such as subvocal rehearsal, making explicit decisions in phoneme discrimination and identification and rhyme judgment tasks, and other similar functions.
Based on functional neuroimaging results, activation of the long-term representations of the sound patterns of words is thought to occur in the left superior temporal gyrus. Scott and her colleagues have argued that there is a pathway along this gyrus and the corresponding left superior temporal sulcus such that word recognition occurs in a region anterior and inferior to primary auditory cortex, and that word meanings are activated further along this pathway in anterior inferior temporal lobe bilaterally (Scott & Johnsrude, 2003; Scott & Wise, 2004). This pathway constitutes the auditory counterpart to the visual “what” pathway in the inferior occipital-temporal lobe.
Speech perception is connected to speech production, especially during language acquisition, when imitation is crucial for the development of the child’s sound inventory and lexicon. On the basis of lesions in patients with reproduction conduction aphasia, the neural substrate for this connection has been thought to consist of the arcuate fasciculus, the fibers associated with the superior longitudinal fasciculus that connect auditory association cortex (Wernicke’s area in the posterior part of the superior temporal gyrus) to motor association cortex (Broca’s area in the posterior part of the inferior frontal gyrus). Recent functional neuroimaging studies and neural models have partially confirmed these ideas, providing evidence that integrated perceptual-motor processing of speech sounds and words makes use of a “dorsal” pathway separate from that involved in word recognition.
Hickok and Poeppel (2007) developed the first comprehensive statement of the two-pathway model (Fig. 49.5). They proposed that the ventral stream consists of an acoustic-phonetic conversion process located in STG bilaterally and a “lexico-semantic interface” in the posterior part of the left middle temporal gyrus, which is connected to inferior frontal gyrus. The ventral pathway (to middle temporal gyrus) is activated automatically and unconsciously in the process of ordinary spoken word comprehension. The dorsal pathway transmits phonetic representations from superior temporal gyrus to a left dominant temporoparietal sensorimotor interface area (Area Spt) that interacts reciprocally with an articulatory network comprising the posterior inferior frontal gyrus, premotor area, and anterior insula. The dorsal pathway is activated in tasks that involve conscious and/or controlled meta-phonological processing and/or (possibly subvocal) rehearsal, such as phoneme discrimination, rhyme judgment, and tasks that involve verbal working memory.
Figure 49.5 (A) Depiction of the lateral surface of the brain showing areas involved in the functional neuroanatomy of phonemic processing. H is Heschl’s gyrus, the primary auditory cortex. STP is the superior temporal plane, divided into posterior and anterior areas. STG is the superior temporal gyrus. Traditional theories maintain that pSTP and STG are the loci of phonemic processing. Hickok and Poeppel (2000) argue that these areas in both hemispheres are involved in automatic phonemic processing in the process of word recognition. Other research suggests that more anterior structures, aSTP and the area around the superior temporal sulcus (STS), are involved in these processes. The inferior parietal lobe (AG, angular gyrus; SMG, supramarginal gyrus) and Broca’s area (areas 44 and 45) are involved in conscious controlled phonological processes such as rehearsal and storage in short-term memory (from Hickok and Poeppel (2007)). (B) Activation of left superior temporal sulcus by the contrast between discrimination of phonemes and discrimination of matched nonlinguistic sounds (from Hickok and Poeppel (2007)).
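The dorsal/ventral organization just described can be summarized as a small connectivity graph. The sketch below simply restates the regions and pathway assignments mentioned in the text in compact form; it is a mnemonic device, not an anatomical claim, and the labels are paraphrases rather than formal region definitions.

```python
# Compact restatement of the two-pathway organization described in the text
# (after Hickok & Poeppel, 2007), as a toy graph of regions and connections.
dual_stream = {
    "ventral": [  # automatic, unconscious word recognition and comprehension
        ("STG (bilateral): acoustic-phonetic conversion",
         "posterior left MTG: lexico-semantic interface"),
        ("posterior left MTG: lexico-semantic interface",
         "inferior frontal gyrus"),
    ],
    "dorsal": [   # controlled/meta-phonological tasks, rehearsal, verbal working memory
        ("STG: phonetic representations",
         "Area Spt: left-dominant temporoparietal sensorimotor interface"),
        ("Area Spt: left-dominant temporoparietal sensorimotor interface",
         "articulatory network: posterior IFG, premotor cortex, anterior insula"),
    ],
}

for stream, edges in dual_stream.items():
    print(stream.upper())
    for src, dst in edges:
        print(f"  {src}  ->  {dst}")
```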
As described above, traditional neurological models maintained that the meanings of words consist of sets of neural correlates of the physical properties that are associated with a heard word, all converging in the inferior parietal lobe. It is now known that most lesions in the inferior parietal lobe do not affect word meaning, and functional neuroimaging studies designed to require word meaning do not tend to activate this region. Evidence is accruing that the associations of words include “retroactivation” of neural patterns back to unimodal motor and sensory association cortex (Damasio, 1989), and that different types of words activate different cortical regions. Verbs are more likely to activate frontal cortex, and nouns temporal cortex, possibly because verbs refer to actions and nouns refer to static items. More fine-grained relations between words and activation of sensory-motor brain areas have been described (Pulvermüller, 2005). Hauk, Johnsrude, and Pulvermüller (2004) found activation in primary motor and premotor cortex at the homuncular level of the leg when subjects recognized the written word “kick,” the mouth when they recognized “lick,” and the hand when they recognized “pick” (Fig. 49.6).
Figure 49.6 Activation of areas of the motor strip and premotor area by actions and action words (from Hauk et al. (2004)).
Both deficit and functional activation studies have suggested that there are unique neural loci for the representation of object categories such as tools (frontal association cortex and middle temporal lobe), animals and foods (inferior temporal lobe and superior temporal sulcus), and faces (fusiform gyrus) (see Caramazza and Mahon (2006) for review), and some of these localizations are adjacent to areas of the brain that support perception of features that are related to these categories (animals activate an area of lateral temporal cortex adjacent to the area sensitive to biological motion). Debate continues about how these category effects arise, and whether these activations should be taken as reflecting the meanings of these words. At the same time as these specializations receive support, evidence from patients with semantic dementia and from functional neuroimaging indicates that another critical part of the semantic network that represents word meanings and concepts is located in the anterior inferior temporal lobes. It has been suggested that this region is a “convergence” area, where different properties of items are bound together.
We mentioned above that one feature of language is that it allows words to be related to one another to express relations among items and actions and their properties. This “propositional” information is crucial to many human functions, including conveying information stored in semantic memory, planning activities, and others. Languages have “syntactic rules” that determine how words are related to one another. The existence of such rules allows language to express any relation among words, including unlikely or impossible relations, and therefore allows language to express hypothetical events and situations, which is critical to reasoning. Thus, though syntax is not a feature of language that is as familiar to people as words, it is a critical aspect of language. The neural basis of syntactically-based comprehension has been studied with both deficit-lesion correlations and functional neuroimaging.
Deficits of the ability to use syntax to understand sentences are typically established by showing that patients can understand sentences with simple syntactic structures (e.g., The boy chased the girl) and sentences with complex syntactic structures in which the relationships between nouns and verbs can be inferred from a knowledge of real-world events (e.g., The apple the boy ate was red), but not sentences with complex structures in which the relationships between nouns and verbs depend on a syntactic structure that needs to be constructed (e.g., The boy the girl pushed was tall). Patients with this pattern of comprehension disturbances were first described by Caramazza and Zurif (1976). Among the affected patients were ones with “Broca’s aphasia,” who produced “agrammatic” speech (speech that lacks function words and morphological markers). Caramazza and Zurif noted that the deficits in comprehension and expressive language of these patients both involved problems with function words, morphology, and syntax, and suggested that these deficits were connected. They argued that the underlying deficit in these patients affected syntactic “algorithms” and suggested that these operations were supported by Broca’s area (the left posterior inferior frontal gyrus), in which lesions were associated with expressive agrammatism. Variants of this hypothesis have been developed by a number of researchers, in particular Grodzinsky (2000), who linked lesions in this area to the loss of quite specific syntactic elements (“traces,” in Chomsky’s model of syntactic structure).
However, the data that are available indicate that deficits in syntactic processing in sentence comprehension occur in all aphasic syndromes and after lesions throughout the perisylvian cortex (Caplan, Baker, & Dehaut, 1985; Caplan, Hildebrandt, & Makris, 1996; Caramazza, Capitani, Rey, & Berndt, 2001). Conversely, patients of all types and with all lesion locations have been described with normal syntactic comprehension (Caplan et al., 1985). Caplan et al. (2007) studied 42 patients with aphasia secondary to left hemisphere strokes and 25 control subjects for the ability to assign and interpret three syntactic structures in enactment, sentence-picture matching, and grammaticality judgment tasks. In regression analyses, lesion measures in both perisylvian and nonperisylvian regions of interest predicted performance after factors such as age, time since stroke, and total lesion volume had been entered into the equations. Patients who performed at similar levels behaviorally had lesions of very different sizes, and patients with equivalent lesion sizes varied greatly in their level of performance.
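The logic of the regression analyses described above, in which lesion measures in regions of interest are tested for predictive value only after nuisance covariates have been entered, can be illustrated with a hierarchical regression sketch. The data, effect sizes, and region labels below are simulated placeholders invented for the example; this is not the Caplan et al. (2007) dataset or analysis code.

```python
# Illustrative hierarchical regression, in the spirit of the analyses described
# in the text: enter covariates (age, time since stroke, total lesion volume)
# first, then ask whether lesion measures in regions of interest add predictive
# value for syntactic-comprehension scores. All data here are simulated.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 42                                         # number of simulated patients

age = rng.normal(60, 10, n)
time_since_stroke = rng.normal(24, 12, n)      # months (hypothetical)
total_lesion_volume = rng.normal(50, 20, n)    # cc (hypothetical)
perisylvian_lesion = rng.uniform(0, 1, n)      # proportion of ROI lesioned
nonperisylvian_lesion = rng.uniform(0, 1, n)

# Simulated outcome: accuracy depends on both ROI measures plus noise.
accuracy = (0.9 - 0.3 * perisylvian_lesion - 0.15 * nonperisylvian_lesion
            - 0.001 * total_lesion_volume + rng.normal(0, 0.05, n))

covariates = sm.add_constant(np.column_stack([age, time_since_stroke,
                                              total_lesion_volume]))
step1 = sm.OLS(accuracy, covariates).fit()

full = sm.add_constant(np.column_stack([age, time_since_stroke, total_lesion_volume,
                                        perisylvian_lesion, nonperisylvian_lesion]))
step2 = sm.OLS(accuracy, full).fit()

print(f"R^2 with covariates only:        {step1.rsquared:.3f}")
print(f"R^2 adding ROI lesion measures:  {step2.rsquared:.3f}")
```

The increase in variance explained at the second step is the kind of evidence the text describes: lesion burden in specific regions predicts performance over and above age, time since stroke, and overall lesion size.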
Functional neuroimaging studies have also led several researchers to suggest that syntactic processing is localized in Broca’s area or in portions of this region (Grodzinsky & Friederici, 2006). However, most neuroimaging studies show that multiple cortical areas are activated in tasks that involve syntactic processing (Fig. 49.7) and, as with deficit-lesion correlation studies, different studies have shown different patterns of activation.
Figure 49.7 Activation of Broca’s area and superior temporal sulcus when subjects processed sentences with particular syntactic features (questioned noun phrases) compared to matched sentences without those features (from Ben-Shachar, Palti, and Grodzinsky (2004)).
The discrepancies across studies using both patient data and activation techniques need to be understood. One possibility is that the neural tissue that is responsible for the operations underlying sentence comprehension and syntactic processing is localized in several neural regions, and that the extent to which an area is involved differs in different individuals. This would be consistent with the pattern of deficit-lesion correlations reported by Caplan et al. (2007) but is hard to reconcile with the neuroimaging results because, if it were true, one would expect there to be a variety of activation patterns in the individuals tested in any one study, most likely resulting in no specific activation in most studies. Another possibility is that different studies have involved different aspects of syntactic processing. This could be true of both patient and activation studies.
Pursuing this possibility, we note that poor performance in an aphasic patient can reflect many different deficits. For instance, chance level performance on a sentence type may result from a failure to assign the structure and meaning of a sentence, as many researchers have assumed, but could also result from an intermittent failure to assign structure and meaning. Recent studies using measures of on-line processing (self-paced word-by-word listening, and eye tracking) have provided evidence for the latter: some patients whose performance is at chance have normal on-line processing measures for sentences they interpret correctly and abnormal measures for sentences that they respond to erroneously. Examination of on-line performance in sentences that produced errors has suggested that a variety of deficits can cause comprehension failure. Some patients show abnormal on-line measures at points of greatest complexity (Caplan et al., 2007), suggesting intermittent overload of resource availability, and others have shown abnormal returns to previously relevant portions of a sentence after the point of maximum processing load has passed, suggesting that the level of a persisting competing interpretation was abnormally high. We have also found that syntactic comprehension deficits most often affect certain structures in only one task: for instance, a patient may be unable to match a sentence to one of two pictures but able to demonstrate its meaning by manipulating objects, suggesting that some deficits arise in the interaction of comprehension with satisfying the demands of different tasks. Different patterns of deficit-lesion correlation in different studies are consistent with localization of the processes that are affected in these different patients in different brain regions.
The variety of activation patterns seen in functional neuroimaging may have a similar explanation. Different sentence contrasts in different studies highlight different syntactic operations and may activate correspondingly different brain areas. In addition, activation related to performing a task, such as encoding sentence meaning in short-term memory to answer a question, may also contribute to the results of many studies (Caplan, 2010), though much remains to be learned about the effects of these processes.
Human language is a unique representational system that relates aspects of meaning to many types of forms, each with its own complex structure. Animal communication systems are neither as complex nor as powerful as human language. Nonetheless, they have some similarities to language and provide clues as to the neural basis of human language. Some language traits are part of a shared evolutionary ancestry with other species, others represent convergent, independent evolution of analogous processes, while yet others must be unique to our species. Perhaps the most important lessons learned from animal studies are that complex communication systems can be innate, that learning can shape an innately specified range of behavioral possibilities, and that communication systems can rely on specific nuclei within the brain and be lateralized. Other features of the neural basis for animal communication systems, such as seasonal variation in the size of birdsong nuclei and the mechanisms that underlie this phenomenon, are not obviously relevant to human language.
Deficit-lesion correlations and neuroimaging studies are beginning to provide data about the neural structures involved in human language. It appears that one general area of the brain—the left perisylvian association cortex—is especially important in representing and processing language (although other regions are involved as well), and that, within this area, many language operations are localized in specific regions. Charting the exact areas that support language operations will require attention to both the nature of the language units that are activated (e.g., different types of sounds, different syntactic structures) and the tasks that are being performed (e.g., unconscious automatic vs. controlled, conscious phoneme processing; encoding sentence meanings into memory vs. determining if a sentence is plausible). Many possibilities exist, such as that some language functions are localized in very specific regions and others are supported by more widely distributed areas of cortex or by multiple areas of cortex. The way the brain is organized to support different aspects of language still presents exciting challenges and opportunities for discovery for cognitive neuroscientists.
1. Ben-Shachar M, Palti D, Grodzinsky Y. The neural correlates of syntactic movement: Converging evidence from two fMRI experiments. Neuroimage. 2004;21:1320–1336.
2. Bickerton D. Language and species. Chicago, IL: University of Chicago Press; 1990.
3. Binder J. The new neuroanatomy of speech perception. Brain. 2000;123:2371–2372.
4. Binder JR, Price C. Functional neuroimaging of language. In: Cabeza R, Kingstone A, eds. Handbook of functional neuroimaging of cognition. Cambridge, MA: MIT Press; 2001:187–251.
5. Blumstein SE, Cooper WE, Zurif EB, Caramazza A. The perception and production of voice-onset time in aphasia. Neuropsychologia. 1977;15:371–383.
6. Blumstein SE, Tartter VC, Nigro G, Statlender S. Acoustic cues for the perception of place of articulation in aphasia. Brain and Language. 1984;22:128–149.
7. Caplan D. Task effects on BOLD signal correlates of implicit syntactic processing. Language and Cognitive Processes. 2010;25:866–901.
8. Caplan D, Baker C, Dehaut F. Syntactic determinants of sentence comprehension in aphasia. Cognition. 1985;21:117–175.
9. Caplan D, Hildebrandt H, Makris N. Location of lesions in stroke patients with deficits in syntactic processing in sentence comprehension. Brain. 1996;119:933–949.
10. Caplan D, Waters G, Kennedy D, et al. A study of syntactic processing in aphasia II: Neurological aspects. Brain and Language. 2007;101:151–177.
11. Caramazza A, Mahon BZ. The organisation of conceptual knowledge in the brain: The future’s past and some future directions. Cognitive Neuropsychology. 2006;23:13–38.
12. Caramazza A, Zurif EB. Dissociation of algorithmic and heuristic processes in language comprehension: Evidence from aphasia. Brain and Language. 1976;3:572–582.
13. Caramazza A, Capitani E, Rey A, Berndt RS. Agrammatic Broca’s aphasia is not associated with a single pattern of comprehension performance. Brain and Language. 2001;76:158–184.
14. Cheney DL, Seyfarth RM. How monkeys see the world. Chicago, IL: University of Chicago Press; 1990.
15. Chomsky N. Aspects of the theory of syntax. Cambridge, MA: MIT Press; 1965.
16. Chomsky N. The minimalist program. Cambridge, MA: MIT Press; 1995.
17. Damasio A. Time-locked multiregional retroactivation: A systems-level proposal for the neural substrates of recall and recognition. Cognition. 1989;33:25–62.
18. Everett D. Cultural constraints on grammar and cognition in Pirahã: Another look at the design features of human language. Current Anthropology. 2005;46:621–646.
19. Gentner TQ, Fenn KM, Margoliash D, Nusbaum HC. Recursive syntactic pattern learning by songbirds. Nature. 2006;440:1204–1207.
20. Geschwind N. Disconnection syndromes in animals and man. Brain. 1965;88:237–294, 585–644.
21. Geschwind N, Levitsky W. Human brain: Left-right asymmetries in temporal speech region. Science. 1968;161:186–187.
22. Gould JL. Ethology. New York, NY: Norton; 1982.
23. Gould JL, Gould CG. Sexual selection. 2nd ed. New York, NY: Freeman; 1996.
24. Gould JL, Gould CG. The animal mind. 2nd ed. New York, NY: Freeman; 1999.
25. Gould JL, Gould CG. Animal architects. New York, NY: Basic Books; 2007.
26. Gould JL, Marler P. The instinct to learn. Scientific American. 1987;256(1):74–85.
27. Grodzinsky Y. The neurology of syntax: Language use without Broca’s area. Behavioral and Brain Sciences. 2000;23:47–117.
28. Grodzinsky Y, Friederici A. Neuroimaging of syntax and syntactic processing. Current Opinion in Neurobiology. 2006;16:240–246.
29. Hauk O, Johnsrude I, Pulvermüller F. Somatotopic representation of action words in human motor and premotor cortex. Neuron. 2004;41:301–307.
30. Hauser M, Chomsky N, Fitch WT. The faculty of language. Science. 2002;298:1569–1579.
31. Hickok G, Poeppel D. Towards a functional neuroanatomy of speech perception. Trends in Cognitive Sciences. 2000;4:131–138.
32. Hickok G, Poeppel D. The cortical organization of speech processing. Nature Reviews. 2007;8:393–402.
33. Immelmann K. Sexual and other long-term aspects of imprinting in birds and other species. Advances in the Study of Behavior. 1972;4:147–174.
34. Lieberman P. Towards an evolutionary biology of language. Cambridge, MA: Belknap Press of Harvard University Press; 2006.
35. Pulvermüller F. Brain mechanisms linking language and action. Nature Reviews. 2005;6:576–582.
36. Rizzolatti G, Arbib MA. Language within our grasp. Trends in Neurosciences. 1998;21(5):188–194.
37. Scott SK, Johnsrude IS. The neuroanatomical and functional organization of speech perception. Trends in Neurosciences. 2003;26:100–107.
38. Scott SK, Wise RJS. The functional neuroanatomy of prelexical processing of speech. Cognition. 2004;92:13–45.
39. Seidenberg M. Language acquisition and use: Learning and applying probabilistic constraints. Science. 1997;275:1599–1603.
40. Tomasello M. Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press; 2003.
1. Gould JL, Gould CG. The honey bee. 2nd ed. New York, NY: Freeman; 1995.
2. Grodzinsky Y, Amunts K, eds. Broca’s region. Oxford, UK: Oxford University Press; 2006.
3. Hauser M. The evolution of communication Cambridge, MA: MIT Press; 1996.
4. Posner M, Raichle ME. Images of mind New York, NY: W.H. Freeman; 1997.
5. Pulvermüller F. The neuroscience of language. Cambridge, UK: Cambridge University Press; 2002.