Trippingly on the Tongue

There is one gene that is worth scrutinizing in much more depth. It has much to say about our history and speaks volumes about evolution, and about how we talk about evolution, because it is a gene essential for speech. The story begins at Great Ormond Street Hospital in London in the 1990s. A family, known simply as KE, were being treated for a rare type of verbal apraxia, meaning that many members of the family had significant difficulty in turning sounds into syllables, syllables into words, and words into sentences. Fifteen people across three generations had these symptoms, most obviously the children, who would say things like “bu” instead of “blue,” and “boon” instead of “spoon,” among other verbal flubs. Further investigation showed that the affected members of the family had trouble not just with articulating words, but also with more basic, specific movements of the face and mouth. When the same condition is seen in multiple generations of one family, we draw a pedigree, a family tree in which the members who bear it are labeled. Because the condition keeps turning up generation after generation, we can assume that the random shuffling of genomes that happens when sperm and egg are made has not diluted the disease-causing DNA out of the lineage, and that it has been retained in those individuals. The inheritance pattern in the KE family pointed toward a single genetic defect being the cause. Though the picture is hugely more complicated now, at that time in the history of clinical genetics, most of the diseases that had been identified were indeed rooted in a single gene: conditions such as cystic fibrosis, Huntington’s disease or hemophilia. In those ancient days of genetics, researchers would use a pedigree like this to hunt down a gene, and in 2001 Simon Fisher and his team found the sole cause of this family’s speech and language problems. The gene was named FOXP2, and it has since become an icon in genetics and evolution.

The gene FOXP2 encodes a transcription factor.* These are proteins whose job is to clamp onto very specific bits of DNA (such as the enhancer HACNS1 described earlier) and thereby switch other genes on or off. That way, one gene can control the activity of a second, a third, and so on, and a cascade of complex activity is triggered that helps to specify the different cells and tissues in a developing embryo. All genes are important, but some are more important than others, and transcription factors fall into that latter category. During your time as an embryo in utero, you grew from one single cell into trillions, carefully arranged into different types of cell, in different tissues doing very specific things. Transcription factors have a major role in that growth. They function as controllers, busying away like foremen, setting up major building works, such as determining which end of an amorphous blob of cells is going to be the head and which the tail. Once that is in place, other transcription factors can set up ever more precise plans that specify “a brain goes up at this end,” “in the brain area, eyes go here,” “in the eye area, the retina goes here,” “in the retina, the photoreceptors go here” and “among photoreceptors, these ones are going to be rods.” The details get more and more specific as the embryo develops, and the tissues differentiate into their mature fates. FOXP2 is one of those that operate in the middle of these grand schemes, and it primarily has the effect of instructing the growth of more cells. When we look at where it is active in an embryo, it is in discrete areas all over the brain, directing all sorts of neuronal growth, including in the motor circuitry, the basal ganglia, the thalamus and the cerebellum.

Seeing where a gene is active is just one of the weapons in the geneticist’s arsenal. We can also extract the protein and see what it interacts with, a sort of molecular fishing trip. When we fish with FOXP2, it turns out to be fairly promiscuous, but some of its targets again offer alluring clues, such as a gene called CNTNAP2, which is itself associated with speech disorders.

With all this in place, we have a gene that, when defective, causes a litany of speech and language disorders, and that is active in various tissues closely associated with speech. Other animals communicate orally, but in terms of sophistication, our language far surpasses that of any other species, by every measure.* Given that we are the only organism that speaks with complex syntax and grammar, a genetic basis for our language skills is of great use in trying to demarcate ourselves from the other animals.

FOXP2 was not created de novo in us. In fact, it is an extremely ancient gene, as transcription factors often are. Similar versions are found in mammals, reptiles, fish and birds, many of which vocalize in some form. We know that in songbirds, their version of FOXP2 is active in the brain when they are learning new songs from other males to woo females.

In chimpanzees, FOXP2 differs from ours by only two of the roughly 700 amino acids that make up the protein, but the consequences are clearly significant: we speak and they do not. In Neanderthals, it is the same as in us, but other sections of their DNA may have regulated what the gene was doing differently. Mice, with whom we last shared a common ancestor about nine million years before the dinosaurs were wiped out, have a version of Foxp2 that differs from ours by only three amino acids. When we look at where mouse Foxp2 is active, it is in equivalent places in the brain during development. When one copy of the gene is experimentally removed in mice, they display some abnormalities, one of which is a reduction in the number of ultrasonic peeps that pups usually make (if both copies are knocked out, the baby mice die after about twenty-one days).

The facts that it is clearly essential for human speech and grammar, that it differs from the mouse and chimp versions, and that it has undergone positive selection in Homo sapiens all point to the elemental importance of FOXP2. This one particular gene is terribly important, but it is not all-important.

We can dissect the body at a number of different scales, and genetics is ultra-micro-anatomy. If we zoom out, the next useful resolution might be actual anatomy. After all, genes code for the proteins that direct the cells that are assembled into our bodies. Anatomy changes over time: embryology is the study of how a single fertilized egg grows into an embryo, and developmental genetics is the study of the genes that regulate that growth. We often think only about adult vocal tracts, but it hardly needs stating that children are born immature, and this is relevant to understanding the development of speech. Tongues are large, versatile muscles that aren’t just the bit in your mouth laden with taste buds. They’re rooted deep down in the throat, and heavily innervated to control the movement and sensation that we need. In a newborn the tongue is almost entirely held within the mouth, so that the larynx connects directly to the nasal passages and the baby can breathe while nursing. As children grow up, the back of the tongue descends into the throat, and this enables the formation of full vowel sounds, such as “i” and “u.”

There’s a very important horseshoe-shaped bone in our throats called the hyoid. It sits under the chin, the horns pointing backward, and moves up and down when we swallow. It’s intricately carved to accommodate twelve different muscle attachments, which gives us an idea of what a sophisticated piece of bone it is. Birds, mammals and reptiles all have versions of hyoids, but ours are much more intricate than all others, which is a reflection of the complex anatomical architecture required to create the vast range of sounds that comes so naturally to us, in combination with fine motor control of the muscles of the larynx and face. We think Neanderthals also had similarly elaborate hyoids, at least based on one specimen found in the Kebara Cave in Israel. Their overall anatomy was different from ours, not by much but enough for us to speculate that their hyoid would have been doing slightly different things to ours. But none of this is enough to think that Neanderthals couldn’t speak; they had similar genetics, neuroscience and anatomy. That, for now, is the best we can do.

FOXP2 is significant in human evolution, but also in the evolution of science. It was one of the first genes to be characterized as causing a specific neurological defect when broken, and for that reason it has been singled out as having a significant impact on our nature, legitimately more than many other genes. It has been the subject of some breathless commentary as “the language gene,” and indeed as the trigger that fired the gun of our modernity. We will come to the role of speech in our behavior in a few pages, but it’s key to understand that the relationships between genes, anatomy and behavior are both inscrutably complex and poorly understood. We can see that FOXP2 is essential, but it is active in a whole bunch of cells in the brain, and therefore has influence over other biological functions. The KE family’s troubles were not restricted to speech. They also struggled with lexical tasks, in which the subject has to distinguish between real words and nonsense words that obey the general rules of English, such as “glev” or “slint.” That is a psycholinguistic deficit, not a motor one. Again, this indicates the complex interplay between our motor and cognitive skills.

[Illustration: A most intricate hyoid]

The great linguist Noam Chomsky has suggested a romantic notion that there was a switch, a spark that lit the fire of language in us, where the best any other creature could manage was grunts and gestures. His timescale is plausible, thousands of generations, but it implies a focused linearity founded in a singular trigger.

Evolution does not work like that. Modern genetics has shown that humans have been much more mobile than previously thought, and have interbred continuously in and out of Africa, facts that don’t lend support to a linear view of our deep history. Furthermore, speech is not one single thing. The physical capability of speech, with its anatomy and the neural control of that anatomy, is not distinct from the cognitive capacity for language. We are a system, made up of small interconnected cogs and parts. We have to consider how brains develop, and what genes are doing in that process. Neural tissue is highly specialized, and comprises hundreds of different cell types, each with their own genetically determined identity. Cells become neural tissue, and once on that pathway, grow, migrate and adorn themselves with synapses and dendrites that connect with adjacent cells or ones that are millimeters or centimeters apart (which is a long way if you’re a neuron). After you were born, your brain went through a process of synaptic pruning for many years until your teens, when the connections between neurons were cut back or reinforced in order to streamline thinking and learning. All of that is controlled by genes and their interaction with our environment. The point is that one gene involved in this crazily complex building project is likely to have multiple effects on different tissues, and dozens if not hundreds of genes are going to play a role.

Speech is the audible output that rests upon dozens of highly complex, interconnected biological phenomena. FOXP2 is necessary but not sufficient. A highly structured hyoid is necessary but not sufficient. A neurological framework able to fine-tune motor control of the muscle fibers in the larynx, tongue, jaw and mouth, and to provide a psychological basis for perception, abstraction and description, is absolutely necessary, but not sufficient. And of course, when we speak, we disturb particles of air, which vibrate the drums of our ears and trigger the similarly complex process of hearing. Without ears or air, there is no speech. Genes are templates, brains are frameworks, the environment is a canvas. We separate out each of these parts only in order to understand the bigger picture, but let us not pretend that they all popped into being at once.

A far better way of understanding the acquisition of speech, and indeed the acquisition of any emergent characteristic in humans, is through a model of selection and genetic drift: via a changing interaction between culture and our genes, a mutation in FOXP2 became set in a framework from which language could develop. We don’t know if the Neanderthals had that same framework; we can reasonably imagine that they did, given their similarities to us in material culture and morphology, and a version of FOXP2 that is the same as ours and different from the chimpanzee version. I suspect that they were speakers, but it will take a very clever experiment to clarify that question, one that I cannot quite conceive of, at least not yet.