1

The Rubicon

Where, then, is the difference between brute and man? What is it that man can do, and of which we find no signs, no rudiments, in the whole brute world? I answer without hesitation: the one great barrier between the brute and man is Language. Man speaks, and no brute has ever uttered a word. Language is our Rubicon, and no brute will dare cross it.

So declared Friedrich Max Müller (1823–1900), professor of philology at the University of Oxford, in a lecture on the science of language delivered in 1861. Müller was protesting against Charles Darwin’s famous treatise On the Origin of Species, which had been published just two years earlier.1

The essence of Darwin’s theory of evolution is natural selection, the process by which biological traits become more or less common in a population. This in turn depends on natural variation between organisms, so that variants with higher rates of reproduction become more populous. The nature of this “selection” is such that it has no purpose or direction. Because the variation is small, evolution works slowly and in small increments. Darwin wrote without knowing anything about genes or DNA, but we now know that genes are subject to mutations, creating the variations upon which natural selection operates.

To Müller, then, the difference between language and animals’ communication was simply too profound to have come about through incremental tweaking—too wide a Rubicon for evolution, with its mincing little steps, to cross. And language is widely considered the commodity that most clearly defines us as human. Barring exceptional circumstances, we all acquire it. That in itself is not extraordinary, because we also learn to walk, just as birds learn to fly. Language, though, seems different, in that it is complicated and allows a freedom of expression far beyond that available even to our closest nonhuman relatives, chimpanzees and bonobos. Even linguists don’t yet fully understand the rules by which we generate sentences or tell coherent stories. In contrast, the “brutes” that Müller disparages communicate in very limited and stereotyped ways, at least if we consider vocal communication. I shall argue later, though, that the seeds for a more flexible form of communication lie in the hands rather than the voice.

The dominant languages in the modern world are English and Chinese, which are vastly different from one another. Chinese has the largest number of native speakers, but English takes the lead if you include those who speak it as a second language. Chinese is complicated by the fact that there are several versions; these are generally regarded as dialects of a common language but may in fact be as diverse as the Romance languages. Nevertheless the great majority of Chinese people, some 960 million, speak Mandarin Chinese as their native language, and that alone probably puts Chinese in the ascendancy—ahead of Spanish with about 400 million. Ironically, English and Chinese are among the most difficult languages for nonnative speakers to learn. Chinese is a tonal language, and getting the tone wrong can lead to misunderstanding; you may think you’re saying jī, meaning “chicken,” but a false note yields jì, meaning “whore.” English has consonant clusters that are awkward for non-English speakers, as in street or exempts, and boasts some twenty different vowel sounds, as in par, pear, peer, pipe, poor, power, purr, pull, poop, puke, pin, pan, pain, pen, pawn, pun, point, posh, pose, and parade. Spanish, in contrast, has only five vowel sounds.2

In spite of the oppressive dominance of English and Chinese, at least six thousand different languages are spoken around the globe, each more or less unintelligible to the rest. An extreme example is the Pacific archipelago of Vanuatu, with an area of only about 4,379 square miles, which is host to over one hundred different languages.3 Sometimes we have difficulty understanding even those who supposedly speak the same language; George Bernard Shaw once remarked that “England and America are two countries separated by the same language.” He might also have had Scotland in mind, because the English dialogue in the 1996 movie Trainspotting, set in Scotland, required subtitles when shown in the United States. Language is deeply cultural, and serves to exclude outsiders as much as to bind insiders together. As the title of Robert Lane Greene’s recent book puts it, You Are What You Speak.

But we shouldn’t be complacent, because it has been estimated that over twenty-four hundred of the world’s languages are in danger of disappearing.4 Around a quarter of living languages have fewer than one thousand speakers, and many languages spoken by local communities are being replaced by dominant regional, national, and international languages. Mark Turin refers to the loss of languages as “linguicide.”5

Sign languages too are diverse, in spite of the fact that signs generally originate as mimed representations of objects or actions. In the course of time, these representations become stylized—or conventionalized, to use the technical term—and so lose much if not all of their pictorial or action-based character. Sign languages are typically invented anew by different deaf communities, and different sign languages are just about as mutually unintelligible as are different spoken languages.

In spite of the extraordinary differences between the languages of the world, though, it seems safe to assume that any person can learn any language, provided they start early in life. This suggests that language is as much biological as cultural—the capacity to learn it is biological, but the form it takes depends on culture. There remains a question as to whether this biological capacity for language is specific to language itself or comes about because we humans are smart and inventive in general ways. Nevertheless, as far as we know we are the only species with that capacity. Our closest nonhuman relatives are chimpanzees and bonobos, with whom we share a common ancestry dating back six or seven million years. In geological time this is really just an eye-blink away from the present, and it has also been estimated that we share some 99 percent of our genes with these oddly humanlike animals.6 Attempts to teach them language, though, have failed rather miserably. To be sure, a few have been trained to make simple requests using a form of sign language rather than speech, but there are few if any glimmerings of gossip, reminiscence, observations about the world, storytelling, or explanations of how things work. Parrots can learn to utter words and even give answers to simple questions, but they too do not use language in the flexible way that we humans do. They can be agreeable and friendly companions, but they are not really candidates for a conversation, and they cannot tell us what it’s like to be a parrot. Language-wise, we humans seem to be alone in the world—and possibly in the universe.7

Language is not only uniquely human—it is also universally so. In every part of the world, people speak (or sign) to one another, although there are of course a few interesting exceptions. Children isolated from human contact do not learn to speak properly (some such cases are the stuff of legend more than of fact). Reports of so-called wild children brought up by animals, including wolves and bears, have long featured in folklore and have formed the basis of such fictional characters as Rudyard Kipling’s Mowgli, J. M. Barrie’s Peter Pan, or Edgar Rice Burroughs’s Tarzan. Whether there are truly instances of human children raised by animals is doubtful.

The celebrated case of Amala and Kamala, two girls reportedly discovered by missionaries in a forest in India and said to have been raised by wolves, turned out to be a ruse to attract funds for the orphanage in which they were eventually placed. The best-documented case of a child deprived of a normal social environment is Genie, a Californian girl who was isolated by her family from infancy until the age of thirteen. When she was then discovered, she attracted great interest from psychologists and linguists, and strenuous efforts were made to teach her to speak. She did develop some ability to communicate by vocalizing and gesturing, and even by drawing, but she never acquired normal grammatical speech.8 The best she could manage was a kind of telegraphese, a sort of “me Tarzan you Jane” level of speaking. Such examples have led to the idea of a “critical period” for the learning of language; once you pass puberty, it seems, the game is all but over.

What this suggests is that acquiring a first language can take place only when the brain is itself developing. Of course people do learn second languages as adults, but it can be a hard slog, and it seems impossible to get rid of a foreign accent. This is in marked contrast to the effortless way in which young children learn languages. Learning a second language as an adult, moreover, is not the same as learning a first language, because you can use the first language as the scaffold on which to build the second. And because the brain is at its most plastic and impressionable while growing, the secret of language may well lie partly in the prolonged period of growth that our large brains undergo. Most of this growth occurs after birth, so that the developing brain is exposed to the world outside of the womb and can be shaped by the sights and sounds that the world inflicts on us. Compared to monkeys and apes, we humans are born prematurely and take longer to reach maturity. It has been said that in terms of the general pattern followed by other primates, human babies should be born at eighteen months of gestation, not nine. But birth is difficult enough as it is without having to wait another nine months; even I, as a hapless male, can appreciate that.

Early birth was probably driven by the fact that our species, unlike the other apes, elected to stand and walk on two legs rather than four—to reverse the slogan of the rampant pigs in George Orwell’s Animal Farm, “two legs good, four legs bad!” This in turn restricted the size of the birth canal, so our kids need to be born before they grow too large. Even so, birth is difficult, as any mother can attest, but the tradeoff is that human babies are exposed to the postwomb environment while their brains are still immature and ready to be shaped by the social and physical environments into which they are born. Our persistent two-legged stance is in many ways an impediment, giving rise to back and neck problems, hemorrhoids, hernias, and of course the excessive pain of giving birth. Bipedalism, one might say, is a pain in the ass. But one far-reaching advantage is that it extends the period of growth outside the womb, allowing the brain to grow and adapt while exposed to the sights and sounds of the world.

We are bathed in language from very early in life. Even at one day old, babies can tell their mother’s voice from that of a stranger,9 suggesting that tuning in to the mother’s voice takes place in the womb. Not for nothing do we speak of a mother tongue. We should not forget, too, that language is not wholly a matter of voice, because we gesture and point in the course of normal discourse—and of course sign language is entirely a matter of gesture. Ultrasound recordings show that fetuses in the first trimester move their arms, and most move the right arm more than the left.10 In the second trimester they suck their thumbs, and again it’s more often the right thumb than the left.11 These asymmetries may well set the stage for the fact that most of us are right-handed and have language controlled by the left side of the brain.

But it’s only when they emerge into the light and bustle of day that babies can begin to associate sounds or gestures with the diversity of what they can see, touch, and hear. The babbling of babies in the first year begins to take on some of the characteristics of the language they are exposed to, and between ages one and two pointing plays the major role.12 The very helplessness of human infants also adds to the impact of language, because it brings them into closer contact with caregivers. There’s nothing like the sight of a newborn baby to bring out infantile behavior in otherwise serious and responsible adults, as their language deteriorates into baby talk with simplified words, cooing sounds, clucks, and goos. This is known as motherese—although in a politically correct world it is now more often called parentese. Even dads can cluck and goo.

So it is that we mold our babies’ babbles into words. We know too that manual and facial gestures play a role in helping infants learn spoken as well as signed language. In the early years, at least, pointing is essential for learning the names of things, even if the names themselves consist of signs rather than spoken words. Young babies often point in order to share attention, as if to say “Look at that!” whereas chimpanzees point mainly to make requests, as if to say “Gimme that!” Shared attention through pointing is one of the first indications of an inborn disposition for language.13

Although it depends on early experience, language has a robustness that defies at least some forms of disability or disadvantage. As I have already mentioned, communities of deaf people, denied normal speech, spontaneously develop signed languages, carried out silently with movements of the hands and face. Indeed I shall argue later in this book that language evolved from manual signs rather than from animal calls. Language is normally lodged in the left side of the brain, but if the left side is damaged early in childhood, or even removed, the right side can take over with little impediment. Our very brains seem to burst with the desire for expression. It takes extreme circumstances, such as those suffered by Genie, to prevent language from developing normally.

Regardless of the language or languages we speak or sign, we follow rules for how to string words or hand movements together to form meaningful content. The way we do this is complex, and linguists have still not fully explained the rules that govern it, whether they are specific to individual languages or apply generally across languages. It is a singular fact that speakers of any given language know the rules at some intuitive level, so they can generally tell whether a given utterance is grammatical or not, but they cannot tell you exactly what those rules are.

The rules need not conform to textbook definition or what high-school teachers tried to instill in reluctant students. Slang and street talk also follow rules. People seldom diverge from the language of their group, and they even switch depending on whom they’re talking to. Teenagers speak to other teenagers differently from how they talk to their parents or teachers. Whatever they are, though, the rules operate in open-ended fashion, such that there is in principle no limit to the number of things we can say or sign. Noam Chomsky referred to language as possessing the property of “discrete infinity.” That is, we have a finite number of discrete sounds or signs, but these can be combined in a potentially infinite number of ways to create new meanings. We can produce sentences we have never previously uttered and understand sentences we have never heard before—provided of course they are made up of words put together in ways that we are familiar with.14

My favorite example occurred when I called in to a publishing house in the south of England a few years ago. I was greeted at the door by the publisher himself, who said, “We are having a bit of a crisis here. Ribena is trickling down the chandeliers.” The words ribena, trickling, and chandeliers were familiar to me, but I had never before heard them in that particular combination; still, I understood the publisher’s predicament. Ribena is a drink made from black currants and is high in vitamin C; for some time it was delivered free to English schoolchildren. The publisher and his concerned staff had initially thought that a red substance trickling from their chandeliers was blood, suggesting that some foul deed had taken place upstairs. It transpired that there was a nursery school upstairs, and one of the little girls had thought it fun to tip her ribena onto the floor instead of into her mouth. As they do.

A more famous example was coined by the philosopher Alfred North Whitehead, in conversation with Burrhus Frederic (B. F.) Skinner, the well-known behavioral psychologist. Skinner was extolling the power of behaviorism to explain what people do and even what they say, with no need to appeal to mental processes. Listening to this, Whitehead was moved to utter the sentence “No black scorpion is falling upon this table” and to ask Skinner to explain why he had said it. This conversation took place in 1934, and it was not until 1957, in an appendix to his book Verbal Behavior, that Skinner attempted an answer. For a behaviorist dismissive of psychoanalysis, Skinner gave a curiously Freudian interpretation. He proposed that Whitehead was unconsciously likening behaviorism to a black scorpion and declaring that it would have no part in his understanding of the human mind.

Ironically, though, 1957 was also the year in which Noam Chomsky published his book Syntactic Structures, which presented a view of language totally opposed to a behaviorist account. Two years later, Chomsky made explicit his objection to behaviorism in a scathing review of Verbal Behavior.15 Where Skinner regarded language as vocal behavior emitted by speakers and reinforced by the language community, Chomsky proposed that language must depend on innate rules to govern the formation of sentences. Reinforcement of sequences simply could not explain the sheer novelty and diversity—the “discrete infinity”—of natural language.

Our ability to generate sentences of seemingly endless variety arises from combinations rather than from the simple accumulation of elements. There are 311,875,200 different ways to deal a poker hand of five cards, in order, from the full deck of 52, which illustrates how vast, if not infinite, numbers of combinations can arise from relatively small vocabularies. This example is a bit misleading, though, because not all combinations of words are meaningful. But our deck of words is much larger than 52—a college-educated person may have a vocabulary of some 50,000 words.16 To be slightly more realistic, suppose that we distinguish between words corresponding to objects and words corresponding to actions, so we can compose utterances like “man walks” or “elephant dances.” Let’s suppose that our deck contains 40 object words and 12 action words. We can then compose 40×12, or 480, utterances—still a considerable advance over the 52 elements, although this too produces some utterances that scarcely match reality, such as “tree laughs” or “butter pontificates.” But then we can add other elements, such as the victim of an action, as in “man bites dog,” or add another object as part of a transaction, as in “girl gives dog bone.” We can then start adding words to describe qualities and other paraphernalia to indicate place, time, and so on, and the possible combinations multiply by orders of magnitude, as in “Yesterday, that lovely young girl generously gave my dyspeptic dog a disgusting old bone.”
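
For the numerically inclined, the figures are easy to check; here is a quick back-of-envelope sketch in Python (my own arithmetic, not the author's; the three-slot count at the end is an illustrative extension):

```python
# Back-of-envelope checks of the combinatorial figures in the text.

# Ordered five-card deals from a full 52-card deck
print(52 * 51 * 50 * 49 * 48)   # 311875200

# 40 object words paired with 12 action words ("man walks")
print(40 * 12)                  # 480

# Illustrative extension: add a second object slot ("man bites dog"),
# assuming the 39 remaining object words can fill it
print(40 * 12 * 39)             # 18720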

In his short story “The Library of Babel” the Argentinian writer Jorge Luis Borges took the idea of discrete infinity to a limit—although technically not actually reaching infinity. The narrator in the story inhabits a universe consisting of a huge beehivelike expanse of hexagonal rooms, each with four walls of bookshelves. The books themselves contain all possible combinations of the 25 characters in Spanish—22 letters, a period, a comma, and a space. Although the vast majority of the books are gibberish, they must also include all the books ever written, and all that will be written, including this one. The library also includes every possible book, so that the only books that are excluded are the impossible ones—the narrator notes, for instance, that “no book can be a ladder, although no doubt there are books which discuss and demonstrate and negate this possibility and others whose structure corresponds to that of a ladder.”17

The number of books in that library is at least 25^1,312,000, since each book contains 1,312,000 characters (410 pages of 40 lines, each of 80 characters). This amounts to about 1.956 × 10^1,834,097, which is, oh, a lot, when you consider that the number of atoms in the observable universe is currently estimated at only about 10^80. Even so, the number is still not infinite, but about as close as one can imagine.
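
The figure is easy to verify; a minimal sketch, assuming Borges's book dimensions as given above:

```python
# Verifying the size of the Library of Babel: 410 pages of 40 lines of
# 80 characters per book, each character drawn from a 25-symbol alphabet.
from math import log10

chars_per_book = 410 * 40 * 80          # 1,312,000
exponent = chars_per_book * log10(25)   # log10 of 25**1_312_000
print(int(exponent))                    # 1834097
print(round(10 ** (exponent % 1), 3))   # 1.956, the leading digits
```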

Recursion

But there’s more. We can stretch further toward infinity by adding recursive principles, so that combinations can be embedded in combinations. “The House That Jack Built” provides a favorite, if perhaps overworked, example:

This is the farmer sowing his corn

that kept the cock that crowed in the morn

that woke the priest all shaven and shorn

that married the man all tattered and torn

that kissed the maiden all forlorn

that milked the cow with the crumpled horn

that worried the dog

that chased the cat

that killed the rat

that ate the malt

that lay in the house

that Jack built.

In principle this could go on indefinitely. You could insert “the chap who was nobly born who accosted” after the third word. Or you might add “John said that Mary thought that Fred claimed that” at the beginning. There are of course psychological limits to how much embedding you can take, because you might simply forget how the sentence began by the time you reach the end, or fail to remember the full cast.

In “The House That Jack Built” clauses are added to the right. This is called right-embedding. Much more psychologically taxing is so-called center-embedding, where clauses are inserted in the middle of clauses. We can cope with a single embedded clause, as in

The malt that the rat ate lay in the house that Jack built.

But it becomes progressively more difficult as we add further embedded clauses:

The malt [that the rat (that the cat killed) ate] lay in the house that Jack built.

Or worse:

The malt [that the rat (that the cat {that the dog chased} killed) ate] lay in the house that Jack built.

I added brackets in the last two examples that may help you see the embeddings, but even so they’re increasingly difficult to unpack. Center-embedding is difficult because words to be linked are separated by the embedded clauses; in the last example above, it was the malt that lay in the house, but the words malt and lay are separated by twelve words. In holding the word malt in mind in order to hear what happened to it, one must also deal with the separations between rat and ate and between cat and killed. The mind boggles—but “The Library of Babel” still copes.
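
The contrast is easy to make concrete in code. Below is a toy Python sketch, entirely my own illustration rather than anything from the linguistic literature, that generates both kinds of embedding from a small set of noun and verb pairs:

```python
# A toy generator for the two kinds of embedding, using fragments from
# "The House That Jack Built." The pair list and function names are
# illustrative assumptions, not a linguist's formalism.

PAIRS = [("the rat", "ate"), ("the cat", "killed"), ("the dog", "chased")]

def right_embedded(depth: int) -> str:
    """Right-embedding: each new clause attaches at the end, so every
    noun-verb link closes before the next one opens."""
    s = "the malt that lay in the house that Jack built"
    for noun, verb in PAIRS[:depth]:
        s = f"{noun} that {verb} {s}"
    return f"This is {s}."

def center_embedded(depth: int) -> str:
    """Center-embedding: nouns nest inward and their verbs come back
    out in reverse order, so all the links stay open at once."""
    opening = "The malt"
    for noun, _ in PAIRS[:depth]:
        opening += f" that {noun}"
    verbs = " ".join(verb for _, verb in reversed(PAIRS[:depth]))
    return f"{opening} {verbs} lay in the house that Jack built."

print(right_embedded(2))
# This is the cat that killed the rat that ate the malt that lay in
# the house that Jack built.
print(center_embedded(2))
# The malt that the rat that the cat killed ate lay in the house that
# Jack built.
```

Both functions draw on the same finite fragments, but only the center-embedded output becomes rapidly unparseable as the depth grows.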

In spoken English, single embedded clauses are common enough, but double embeddings are rare and triple embeddings (as in the last example) virtually nonexistent. Center-embeddings are more common in written language than in spoken language, perhaps because when language is written you can keep it in front of you indefinitely while you try to figure out the meaning. The Finnish linguist Fred Karlsson examined a huge corpus of writings in several European languages and found only thirteen instances of triple center-embeddings.18

Although the embedding of phrases within phrases adds to the complexity of language and increases the number of sentences that can be derived from a finite set of elements, its importance may well be exaggerated. Some languages do not seem to incorporate recursive embedding at all. Two such examples are the languages of the Pirahã, a remote Amazonian community, and the Iatmul of New Guinea.19 Linguists commonly argue that the constraints that prevent us from using multiple embeddings, or from generating more than a rather limited corpus of utterances, are psychological rather than linguistic. The linguistic rules that underlie our language faculty can create utterances that are potentially, if not actually, unbounded in length and variety. These rules are as pure and beautiful as mathematics. But perhaps they are not real. The rules themselves may be more limited than linguists like to think.

Universal Grammar?

Humans seem uniquely and universally disposed to acquire language, whether spoken or signed. Moreover, children readily learn whatever language they are exposed to. A child from the highlands of New Guinea, say, if raised in Boston, will readily acquire Bostonian English. Such observations have led Noam Chomsky, the most prominent linguist of the past half-century, to conclude that language depends on a biologically determined instinct, which he calls “universal grammar.”20

Given that there are some six thousand languages in the world, the challenge is to specify what they have in common or what rules of universal grammar might apply across all languages. Chomsky himself seems to have determinedly ignored the vast variations in human languages, at one stage remarking: “I have not hesitated to propose a general principle of linguistic structure on the basis of observation of a single language. The inference is legitimate, on the assumption that humans are not specifically adapted to learn one rather than another human language.”21 This insistence that a single language can reveal all we need to know about the general nature of language may be the Achilles’ heel of any notion of universal grammar. Any universality may lie not so much in language itself as in the common ways we parse the world, as I shall suggest later in this book. For instance, all languages probably have ways to refer to objects and actions, but that could be because our worlds are largely composed of these elements, and if we didn’t have innate ways to refer to them we’d no doubt soon invent them. Of course our parsing of the world into objects and actions in turn depends on our biological makeup; mosquitoes or fish may well see their worlds quite differently.

The way in which we map the events in the world onto words also varies between cultures. In English we generally think of words as belonging in different categories, such as nouns, verbs, and adjectives. But some categories are missing in some languages. One example is articles. In English, the definite article the and the indefinite article a distinguish entities that are specified from those that are not: the elephant refers to a particular elephant under discussion, but an elephant refers to some unspecified elephant—perhaps one that just happened to wander onto a basketball court. But there are no articles in Russian or Chinese. The marking of verbs for tense, according to when some activity or situation occurred, likewise appears in some languages but not in others. In languages like Latin and Italian, this is done by altering the verb itself. Thus in Latin video means “I see,” videbam means “I was seeing,” videbo means “I will see,” and so on—and on. English seems simpler in that we have only four basic forms of regular verbs (argue, argues, argued, arguing), but we change tense or mood largely by adding auxiliaries in often complex ways (“they might have been arguing”). But Chinese has no tenses at all and uses other devices to indicate when something happened or will happen.22

Across different languages it is not even clear which words belong in which category. Nicholas Evans gives several examples. The concept of paternal aunt, a noun phrase in English, is expressed by a verb in the Australian aboriginal language Ilgar, and the expression of love is simply a suffix in the South American language Tiriyo. Words generally hold single meanings, but some words bundle several components of meaning. For example, in the language of the Seneca people, an indigenous group in North America, words describing events also carry the idea of the participant or participants in the event. Thus the Seneca verb wa’e:yeh includes not only the idea of an event of “waking up” but also the idea of the person who woke up, in this case a female. The English equivalent would be she woke up, where the participant is captured in the separate word she. In Seneca it is conveyed by the e sound in the second syllable of wa’e:yeh, but this sound by itself does not convey the notion of the person who woke up in the way that the word she does in English. Unlike English, Seneca is holophrastic in that words can stand for events in a holistic manner, where the component parts of the event are inseparable.

In some languages quite complex concepts may be packed into a single word. Turkish is one extreme example of a so-called agglutinative language, where there are said to be over two million forms of each verb. The different forms represent not only the tense but also the subject, object, and indirect objects of the verb—and a lot else besides. Nouns also take multiple forms. The word arabamizdakilerinle means “with those, in our car, that are being possessed by you.” In some languages, entire sentences are packed into a single word. Nicholas Evans and Stephen Levinson give the examples of Ȩskakhǭna’tàyȩthwahs from the Cayuga of North America, which means “I will plant potatoes for them again,” and abanyawoihwarrgahmarneganjginjeng from the Northern Australian language Bininj Gun-wok, which means “I cooked the wrong meat for them again.”23

Comparing languages across differing cultures suggests an inverse relation between the complexity of grammar and the complexity of culture; the simpler the culture in material terms, the more complex the grammar. Mark Turin notes that colonial-era anthropologists set out to show that indigenous peoples were at a lower stage of evolutionary development than the imperial Western peoples, but linguistic evidence showed the languages of supposedly primitive peoples to have surprisingly complex grammar. He writes: “Linguists were returning from the field with accounts of extremely complex verbal agreement systems, huge numbers of numeral classifiers, scores of different pronouns and nouns, and incredible lexical variation for terms that were simple in English. Such languages appeared to be untranslatable.”24

Turin writes that Thangmi, an unwritten Tibeto-Burman language spoken by around thirty thousand people in north-central Nepal and in part of West Bengal in India, often has several words to express meanings conveyed by a single word in English or French. For example, Thangmi has four words, with completely different stems, to express the notion “to come” in the contexts of coming up a hill (wangsa), down a slope (yusa), from the same level or around an obstacle (kyelsa), or from an unknown or unspecified direction (rasa). In English, in contrast, we can use a single verb to express the idea of coming and then qualify it with additional words; thus the different meanings are created by combining words rather than by packing meaning into individual words. This move toward combinatorial structure may be heightened in complex cultures where there are simply more objects and concepts to name, so the pressure to coin new words may have been offset by restricting the different forms that individual words can take. The industrial and computer revolutions, to name two major cultural forces, created vast numbers of new concepts for which we need words. Even sentences appear to have become simpler in the world of Twitter.

Another influence may relate to the forms of language itself. Sign languages tend to be agglutinative, probably taking advantage of the visual system’s capacity to process different aspects of the world simultaneously.25 The auditory system, in contrast, is better adapted to processing rapid sequences of information. The persistence of agglutinative languages may conceivably reflect the origins of language in gesture and pantomime, with the agglutinative structure tending to break down as speech, with its relentless sequential structure, grew more dominant. I explore this possibility in more detail in chapter 7.

Based on the remarkable differences between languages, not only in the structure of words but also in the ways they are combined into grammatical sequences, Nicholas Evans and Stephen Levinson conclude that “the emperor of Universal Grammar has no clothes.”26 Another expert on language and its evolution, Michael Tomasello, once remarked similarly that “universal grammar is dead.”27

Language Templates

One problem with theories of grammar has to do with the idea of discrete infinity itself. As suggested earlier, although we may be able in principle to generate sentences of unlimited length and variety, our actual ability is really rather limited. Even the novelist Henry James, famous for the long sentences that perfused much of his writing, had his limits. According to the Guinness Book of World Records, the longest sentence in literature comes from William Faulkner’s novel Absalom, Absalom! and contains 1,288 words. Literature gives a rather inflated estimate of sentence length in any case, because one can hold the sentence on the open page while one tries to figure it out. Even in literature, though, it seems that sentences become shorter over time, at least in English. Edmund Spenser, writing at the end of the sixteenth century, averaged 50 words a sentence, whereas Thomas Macaulay in the middle of the nineteenth century averaged just 23. The trend may have continued.28 Brian Hayes happened upon the syllabus for an English course at Northeastern University that begins “Choose a big Victorian novel not to read.”29

In normal conversation we are much more restricted. And although recursion is one of the major mechanisms underlying discrete infinity, we have seen that our ability to embed phrases within phrases is actually very limited, and some languages do not seem to exhibit this facility at all. The paucity of everyday language relative to the potential for discrete infinity can be attributed to our limited memory, restricted powers of attention, or inability to plan more than a few steps ahead. Some have suggested, though, that the rules of language are themselves more limited than the somewhat idealized rules of grammar imply.

Much of our language actually seems to rely simply on memorized stock phrases that we slip into our conversations, rather than on the application of rules. Alison Wray gives some examples:

Fancy seeing you here.

Watch where you’re going.

You’ll never guess what happened.

How dare you!30

Much of conversation proceeds through clichés of this sort.

The English comedian and raconteur Stephen Fry, in conversation with Hugh Laurie, gives more examples:

I love you.

Don’t go in there.

You have no right to say that.

Shut up.

I’m hungry.

That hurt.

Why should I?

It’s not my fault.

Help!

Marjorie is dead.31

We might go further and suppose that we use memorized stock phrases to create new ones, which is simpler and more immediate than generating sentences from basic rules. James Hurford gives the example of the idiom having a chip on one’s shoulder.32 One can then substitute new words within this frame, as in having milk in one’s coffee, having butter in one’s fridge, having money in the bank—and there are even other idioms based on the same structure, like having a bee in one’s bonnet. Much of the generativity of language may then derive from adapting stock expressions by altering the key words, or changing person or tense, as in she had a chip on her shoulder. The most extreme example of this approach is that of the Dutch computational linguist Rens Bod, who suggests that speakers store all of the sentences they have heard and then adapt them to new situations.33 This would surely stretch memory capacity too far and create enormous redundancy, but it does reinforce the point that much of our ability to produce sentences depends on stored information, rather than the use of the rather idealized rules proposed in much grammatical theory. It also effectively denies universal grammar.
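
A minimal sketch of how such frame-based substitution might look, with a frame and slot names that are purely my own illustration rather than any linguist's formalism:

```python
# An idiom stored as a template with substitutable slots. The frame and
# slot names are illustrative assumptions, not an established formalism.

FRAME = "{subject} {have} {thing} {prep} {possessive} {place}"

print(FRAME.format(subject="she", have="had", thing="a chip",
                   prep="on", possessive="her", place="shoulder"))
# she had a chip on her shoulder

print(FRAME.format(subject="he", have="has", thing="a bee",
                   prep="in", possessive="his", place="bonnet"))
# he has a bee in his bonnet
```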

Bod’s theory may perhaps impose impossible demands on memory, but much of our ability to produce sentences is indeed probably based on our knowledge of phrases and sentences we have heard, rather than on the application of innate rules. This has led to what has been termed the “data-oriented parsing” (DOP) model of sentence production, which holds that people learn the structure of sentences, not from innately given linguistic rules, but from their linguistic experience. They can then select an appropriate structure from overlaps with sentences previously experienced. It’s perhaps not so much that people commit actual sentences to memory as that they learn implicitly how sentences work—how they sound or look, and how they are structured. Bod has also adapted this approach to account for how children build up the experience to construct sentences.34

Perhaps because of the logjam of theories of grammar, the 1990s in fact saw something of a general swing toward explanations of grammar in terms of what we learn, rather than what we’re born with. Morten Christiansen and Nick Chater have developed theories of language learning based on connectionism—the idea that the operations of the mind can be simulated with vast numbers of connections between elements that can be modified through experience. This model even simulates recursion, supposedly a key ingredient of universal grammar. As suggested above, our capacity for embedding is limited, and Christiansen and Chater’s model more realistically places recursion within human limits rather than supposing it to depend on a process that is theoretically unlimited. They write as follows: “This work suggests a novel explanation of people’s limited recursive performance, without assuming the existence of a mentally represented competence grammar allowing unbounded recursion.”35

These developments therefore seem to be edging away from Chomskian orthodoxy. It is clear that we house not only a vast store of words but also a compendium of phrases and that we can use them to generate new utterances with little or no reference to underlying rules. This is the essence of what has come to be known as construction grammar,36 which is also part of an approach known as cognitive linguistics,37 whereby language is understood in terms of the general ways in which people think rather than in terms specific to language itself. Language then becomes a product of learning rather than of innate rules, although the human capacity to learn, make generalizations, and invent variations pushes language to a level of complexity well beyond the communication systems developed by any other species.

The idea that language competence is learned rather than innate takes us back to 1957 and the vastly different views of language on display in Skinner’s Verbal Behavior and Chomsky’s Syntactic Structures. In the ensuing decades, theories of grammar mushroomed, to the point that in 1982 James D. McCawley published a book titled Thirty Million Theories of Grammar. He was joking, of course, but the point was well made. Many began to question whether there would ever be a coherent theory of grammar for any language, let alone a universal grammar.38 One might say that the Learning Empire has struck back, as learning theories have grown much more sophisticated, in part through the use of computer models to simulate systems with some of the complexity of human learning. No longer is it enough to base theories of learning complex things like language on rats pressing bars or pigeons pecking keys—the foundations of Skinnerian behaviorism.

This is not to say that genetics plays no role. We have created a vast linguistic environment where language permeates almost everything we do. This creates its own selection pressure, perhaps even driving the increase in brain size that so distinguishes us from the (other) great apes, whose brains are only about a third the size of ours. In technologically advanced societies, reading is critical, if not to survival then at least to effective living. Literacy emerged long after language itself and is still not universal, yet variations in reading ability seem to be genetically linked.39 In short, we have ourselves created environments that we must adapt to, and the linguistic environment is one example. Natural selection becomes a spiral—it creates new forms of nature itself, which in turn add new selection pressures.

Language is not the only product of the human ability to create complex structures. Our capacity to make machines is vastly beyond that of any other species. It is true that spiders make intricate webs, and beavers construct dams, and chimpanzees and crows fashion simple tools, but these pale in comparison with the runaway complexity of our cities, transportation devices, computers, electronic communication—and even the children’s toys that I trip over after my granddaughter has invaded. Mechanical construction also builds on the past, both as individual experience and as accumulated knowledge across generations. And of course language helps too, because it enables us to share expertise and pass it on through the generations. As Isaac Newton said, “If I have seen further it is by standing on the shoulders of giants.”40

But however we came to possess it, language does seem to have a miraculous quality, separating us from other animals and filling our lives with seemingly endless possibilities. As theorists continue to grapple with its nature, language may seem to have an almost unreachable quality, as though beyond the grasp of science or intellect. As we shall see in the next chapter, this is probably why language has long been regarded as a miracle, perhaps bequeathed by God.

The Rubicon that is language is indeed a daunting river to try to cross—even a dangerous one, as I may well discover. But that is the challenge I have set myself in this book—to explain how language might have come about through the incremental processes of Darwinian evolution, and not as some sudden gift that placed us beyond the reach of biological principles.