24 – 2 = 23

Species are defined by their morphology and not by their DNA. That taxonomy exists for historical reasons: we have been classifying organisms using the current system since Linnaeus devised his binomial nomenclature in the eighteenth century – genus followed by species, Homo and sapiens, Pan and troglodytes. Every human being has a unique genome, but our genomes are similar enough that we can be sure that we are one species. Crucially, living humans typically have the same number of chromosomes.1 Each chromosome is a long thread made of DNA, and parts of each thread are genes, around 20,000 of them for us, spread over twenty-three pairs of chromosomes. Gorillas, chimpanzees, bonobos and orangutans have twenty-four pairs.

Chromosomes are all different sizes, and our number 2 is one of the biggest, representing about 8 per cent of our DNA, and harbouring around 1,200 genes. It’s that big because at some point, maybe six or seven million years ago, one member of the common ancestors of all the great apes gave birth to a child with a gross chromosomal abnormality. During the formation of the egg and sperm that would fuse to begin this life, instead of replicating all the chromosomes perfectly, somehow two of them crunched together and stuck. By lining up all the great apes’ chromosomes, we can see very clearly that the genes on our chromosome 2 are spread over two different chromosomes in chimps, orangutans, bonobos and gorillas.

Most mutations of this magnitude are utterly lethal, or cause terrible diseases, but this ape got lucky, and was born with a fully functional genome that was significantly different from that of his or her parents. From that point on, the genealogical lineage of twenty-three pairs of chromosomes would trace a line all the way to you.
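The arithmetic in the chapter title can be sketched in a few lines of Python. This is only an illustration of the counting: the labels `anc1` to `anc24`, the choice of which two chromosomes fuse, and the function name `fuse` are all invented for the example and do not correspond to real chromosome names.

```python
def fuse(chromosomes, first, second, fused_name):
    """Return a new list in which two chromosomes are replaced by one fused chromosome."""
    remaining = [c for c in chromosomes if c not in (first, second)]
    return remaining + [fused_name]

# Hypothetical labels for the twenty-four ancestral chromosome pairs.
ancestral_pairs = [f"anc{i}" for i in range(1, 25)]

# Two of them crunch together and stick (which two is an arbitrary choice here).
descendant_pairs = fuse(ancestral_pairs, "anc12", "anc13", "fused")

print(len(ancestral_pairs))   # 24
print(len(descendant_pairs))  # 23
```

Two chromosomes go in and one comes out, so the count drops by one rather than two, and none of the genes they carry need be lost.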

We now have the full genomes of other types of humans, the Neanderthals and the Denisovans, but annoyingly, chromosome count is not preserved in the fragmented DNA that we can get from their bones. We reasonably suppose that they also had twenty-three pairs due to their relatedness to us, but we cannot be absolutely sure, until we get much better quality samples out of the sparse DNA-laden bones. We know we bred with them, and a different number of chromosomes is often a very sturdy barrier to reproductive success, though not always: living equids – that is, species of horse, ass and zebra – show clear evidence of having interbred despite having chromosomes varying between sixteen and thirty-one pairs. No one has yet figured out how, though.

We haven’t been able to extract DNA from most specimens from the ancient human family tree, and may never be able to, as so many of our ancestors’ remains are from Africa, where heat renders preservation of DNA fairly untenable. It is likely that all apes in our lineage, after the split from what would become chimps, bonobos, gorillas and orangutans, had twenty-three pairs of chromosomes.

Genes are translated into proteins, and proteins perform actions in bodies. This includes everything from forming hair or the fibres in muscle cells, to manufacturing the components of cells that are fatty or bony, or acting as the enzymes and catalysts that process food or energy or waste. Subtle variations in genes result in changes in the shape or efficiency of proteins, and that means that some people have blue eyes and some have brown,2 or that some people can process milk after weaning, but most can’t, or that some people’s urine smells after they’ve eaten asparagus and other people’s doesn’t (and some people can smell it and others can’t). Genetic variation becomes physical variation. We call the specific sequence of DNA the genotype, and the physical characteristic it encodes the phenotype.

DNA changes randomly, and these mutations are subject to selection if the phenotype is beneficial to the survival of the organism, or impairs it. Over time, bad mutations are generally weeded out, because they reduce the overall fitness of the creature that bears them, and good ones spread. Sometimes it’s a bit of both: having one defective version of the beta-globin gene acts as protection from malaria; having two copies means you get sickle cell disease. Many simply drift – the genetic mutations encode change that is neither good nor bad.

Though we have almost the same set of genes as the other great apes, many of those genes are slightly different, and a few of them are new to the human genome. Those differences are us. There are lots of ways that, over generational time, genes and genomes can change and create new information. They can subsequently be selected, in a direction that may ultimately become a unique combination for a distinct species. I won’t go through all of them, as all happen in all creatures. But some mechanisms by which mutation occurs are pertinent to the formation of our uniquely human genome and are worth looking at more closely.

DUPLICATION

Imagine you were composing a symphony, and you’d written it down by hand onto sheet music, of which you have only one copy. If you wanted to experiment with the theme, you’d be crazy to write over the only copy you have, and risk messing it up with something that doesn’t work. You’d photocopy it, and use that one to play around, while making sure the original was preserved intact as a back-up. That’s not a bad way to think about genome duplications. A working gene is constrained by being useful, and is not free to mutate at random, as most mutations are likely to be deleterious. But if you duplicate a whole section of DNA containing that gene, the copy is free to change and maybe acquire a new role, without the host losing the function of the original. That’s how a primate ancestor of ours went from two-colour vision to three – a gene on the X chromosome encodes a protein that sits in the retina and reacts to a specific wavelength of light, and thus enables detection of a specific colour. By thirty million years ago, this had duplicated, and mutated sufficiently that a third colour had been added to our vision. This process has to happen during meiosis, where sperm and eggs are formed, if the new function is to be potentially permanent, as the new mutation will be inherited in every cell of the offspring, including the cells that will become the sperm or eggs.

Primates seem prone to genome duplication, and the great apes particularly. Something like 5 per cent of our genome has come about from duplications of chunks of DNA, and about a third of that is unique to us. Duplicated regions of the genome have always been troublesome to analyse, simply because they are copies and look much the same as each other. But with patience and persistence, geneticists are beginning to work out how to sieve them out, and with that come new insights into why we have so many photocopies, and whether there are genes within that give us powers beyond those of our ape cousins.

So far, a handful of genes have been identified that are intriguing duplication candidates that appear to be unique to us. They’ve all got extraordinarily dull names. In June 2018, a subtly different version of a human gene called NOTCH2NL was unearthed from a mass of very similar ones, but crucially, this new one is not present in chimpanzees. It looks like an earlier version of NOTCH2NL was duplicated poorly in a common ancestor of all the great apes, but around 3 million years ago, the dud version was spontaneously corrected in our lineage, whereas it remains mangled in chimps. We don’t know what the uniquely human version of this gene does precisely, but it appears to bolster the growth of a type of brain cell called radial glia, which span the cortex and have the job of making more neurons, and thus fuelling brain growth. As ever, we learn a lot about what genes do by studying what effect they have when broken by mutations, and one of the diseases associated with mutated NOTCH2NL is microcephaly – a reduction in brain size.

We have four copies of a gene called SRGAP2, where other apes have one. We can see that these duplications occurred at specific times: the first was around 3.4 million years ago; this version was then copied twice more, once 2.4 million years ago, and again a million years ago. The next thing you do is look for the tissues in which this gene is active, and this is where it gets really interesting. The first and third duplications don’t appear to do much, and might be just sitting there slowly rusting in our genomes. But the second duplication resulted in a gene that does its business in our brains. It seems to have the specific effect of increasing the density and length of the branching extensions called dendrites in neurons in the cortex. This type of neural patterning is unique to humans: mouse brains don’t have it, but when we insert the human version into mouse neurons, they grow fattened, dense dendrites. This version of the gene, SRGAP2C, emerged 2.4 million years ago, at a time when the brains of our ancestors significantly increased in size. It was also around this time that we began to flake and knap stones into the Oldowan tool set.

The connections seem obvious, but I am speculating. Though perhaps not wildly. These three things – the timing of the birth of this new gene, what it appears to do in the brain, and what behaviour was emerging at that time – are temptingly related. For now, that is the best we can say. This is not the one gene that made our brains the way they are, but it might be one of a few, even if we don’t know quite what they do. They become clues to isolating key differences between our brains and those of others, and more genetic hints will emerge in time. None of them is a singular trigger though, just part of the picture of how evolution crafted us.

BRAND NEW GENES

Duplication and transfer from other genetic sources are examples of nature’s ability to co-opt existing tools: evolution the tinkerer. Evolution also creates from scratch. We call these de novo genes, and they arise when a seemingly nonsensical run of DNA mutates and changes into a readable sentence.

The way the code works is that there are four letters in DNA, and in a gene they are laid out in three-letter chunks – each of which codes for an amino acid – which are strung together in a particular order to make a protein. Using language as an analogy, we have letters (of which there are twenty-six), words (which can be any length), and sentences (which also can be any length). In genetics, there are only four letters, and all the words are three letters long. The gene is the sentence, and like language, these can be any length. When a gene is created from scratch, it still has to have evolved. Unlike duplications and insertions which have evolved somewhere else, de novo genes aren’t installed into our genomes already in working order. In a book, every word should have a purpose; genomes are full of DNA that isn’t words or sentences, just random bits of filler. So, imagine there was a section of letters like this:

THEIGDOGATETHEFOXANDWASILL

If you strain, you can probably see that there is a simple sentence in there struggling to get out. If we insert a B after the third letter, it becomes:

THEBIGDOGATETHEFOXANDWASILL

Which if you add spaces, three letters per word, becomes:

THE BIG DOG ATE THE FOX AND WAS ILL

It only makes sense with all the letters in the right order. In genetics, this is called an ‘open reading frame’. There are no spaces in genes, but cells still understand the three-letter structure. De novo genes arise when a clump of letters is converted into a meaningful sentence by chance, and thus suddenly becomes understandable by the mechanics of the cell, and translated into a protein. If the protein that results is utilised in some way, then the organism that has acquired this new gene will pass it on.
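The dog-and-fox example can be run as a small Python sketch. It uses the English letters from the text rather than real DNA, and the function name `split_into_codons` is made up for the illustration; the point is only to show how reading in fixed three-letter chunks turns one inserted letter into a readable sentence.

```python
def split_into_codons(letters, size=3):
    """Chop a string into consecutive fixed-size chunks, the way a gene is read three letters at a time."""
    return [letters[i:i + size] for i in range(0, len(letters), size)]

garbled = "THEIGDOGATETHEFOXANDWASILL"

# Insert a B after the third letter, exactly as described in the text.
repaired = garbled[:3] + "B" + garbled[3:]

print(" ".join(split_into_codons(garbled)))   # THE IGD OGA TET HEF OXA NDW ASI LL
print(" ".join(split_into_codons(repaired)))  # THE BIG DOG ATE THE FOX AND WAS ILL
```

Without the B, every chunk after the first is nonsense, because each one is shifted by a letter. A single insertion shifts them all back into register at once.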

In 2011, sixty genes that are new to humans were identified, and this number may still go up. We mostly don’t yet know what they do, but they all tend to be short, which makes sense, given the way they arise – the longer a sequence is, the more chance that the open reading frame will collapse. The fact that these are unique to humans does not make them defining genetic characteristics of humans. They might not do much at all; genes that have mutated to be unique to us but are inherited or duplicated from ancestors are overwhelmingly more common in our genomes.

INVASION

One other thing to note is that genetically, we are not entirely human – around 8 per cent of our genome has not been inherited from an ancestor at all. Instead, it’s been forcibly implanted into our DNA by other entities trying to enact their own replication. Think of a virus as a kind of hijacker, who breaks into a factory and replaces the normal plans with their own, so that the factory starts producing according to the hijacker’s wishes rather than the factory owner’s. When a virus storms the barricades of our cellular factories, it brings with it its own DNA (or RNA)3 and can insert it into the host genome, whereon the host cell simply does the virus’s bidding and makes new viruses. More often than not, this insertion is bad news. Many of the symptoms of a cold, or of many other viral infections, come from our immune system reacting to an alien invasion or from the cell’s self-destruction at the behest of the virus. Sometimes the insertion can be in the middle of a gene that puts the brakes on how often a cell divides and, in doing so, can cause unregulated division – a tumour. Sometimes though, they just sit there doing not much. The DNA of the virus is inserted, and it’s not a big deal. It has happened countless times in our evolution, which makes up that 8 per cent. In total, for comparison, that is much more DNA than comprises our actual genes, and more than several chromosomes, including the Y. By this measure, humans are significantly more virus than they are male.
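To get a feel for the comparison with the Y chromosome, here is a back-of-the-envelope sketch in Python. Only the 8 per cent comes from the text. The genome size and the Y chromosome size are rough round figures assumed for the illustration, and the exact values vary with the reference used.

```python
# Rough, assumed figures for illustration only; just the 8 per cent comes from the text.
GENOME_BP = 3_100_000_000      # assumed size of one copy of the human genome, in base pairs
Y_CHROMOSOME_BP = 57_000_000   # assumed size of the Y chromosome, in base pairs
VIRAL_PERCENT = 8              # share of the genome of viral origin, from the text

viral_bp = GENOME_BP * VIRAL_PERCENT // 100
y_percent = 100 * Y_CHROMOSOME_BP / GENOME_BP

print(f"Viral-derived DNA: about {viral_bp:,} base pairs")
print(f"Y chromosome: about {y_percent:.1f} per cent of the genome")
print(f"Viral DNA outweighs the Y roughly {viral_bp / Y_CHROMOSOME_BP:.1f} times over")
```

Under these assumptions the viral share is several times the size of the Y chromosome, which is all the quip needs.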

What this alien DNA is doing in us varies, but one example shines above all others, and it is in the formation of the placenta. There are cells throughout the body in specialised tissues with the beautiful name syncytium. They have multiple nuclei, formed when cells fuse with each other, which happens in the development of some muscle tissue, bone and heart cells. Syncytia in the placenta make up a highly specialised and essential tissue with the even more beautiful name syncytiotrophoblast. These are the spindly fingers from the growing placenta that invade the wall of the uterus and provide the interface between the mother and embryo, where liquids, waste and nutrients are exchanged. It’s also a tissue that suppresses the immune system of the mother, to stop her body automatically rejecting the growing child as an alien presence. These cells are at the junction of human reproduction, where one life is giving rise to the next. The genes that drive those placental cells to form are not human at all. Primates acquired them from a virus around forty-five million years ago; in the virus, the genes also encourage fusion of the host cell with the virus itself, and help suppress the immune response to this infection. But they were co-opted and integrated into our own genomes, and are now essential for successful pregnancy. Of course, mammals have had placentas for much longer than forty-five million years, and this is a truly weird and wonderful story in evolution. Mice, which also have an essential syncytiotrophoblast, rely on a very similar set of genes that were also acquired from a virus, but a completely different virus from ours. It is an astonishing example of convergent evolution at a molecular level. Acquisition of a viral genetic programme has driven the development of mammals several times over, in almost identical ways.

1 There are a handful of viable chromosomal abnormalities where people have extra chromosomes, or in some cases too few. The best known is Down’s syndrome – an extra chromosome 21, three instead of the requisite two – but there are also conditions such as Klinefelter’s (a man with an extra X, making him XXY) and Turner’s (a woman with an unpaired X).

2 The genetics of eye colour are taught in schools as one of the benchmarks of understanding genetics. In fact, eyes are a useful benchmark of how poorly we understand inheritance. Though the brown version of one gene is dominant over the blue version, there are many other genes that have a role in determining iris pigmentation, meaning that there is a spectrum of eye colours from palest blue to almost black, and it is effectively impossible to accurately predict what colour eyes a child will have based on the eye colours of the parents. Furthermore, it is possible for any colour combination in the parents to produce any colour in the child. Genetics is complex and probabilistic, even in the traits that we think we understand well.

3 RNA is the cousin to DNA. It is a very similar nucleic acid (the –NA bit), but unlike DNA, which is normally two strands linked into its iconic double helix structure, RNA remains in a single strand. When a gene becomes a protein, DNA is typically transcribed into an RNA molecule, which is then translated into a series of amino acids that form the protein itself. Some viruses store their genetic material as DNA, but others, such as HIV, carry only RNA, which is converted to DNA once it has infected a host cell, and this gets inserted into the host genome using a viral protein called an integrase.
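The two steps in this footnote, DNA to RNA and RNA to amino acids, can be sketched in Python. This is a toy under stated assumptions: the DNA sequence is made up, the codon table holds only four of the sixty-four entries in the standard genetic code, and the function names are invented for the example.

```python
# A deliberately tiny slice of the standard genetic code; the full table has 64 codons.
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "STOP"}

def transcribe(dna):
    """Turn a DNA coding-strand sequence into RNA by swapping T for U."""
    return dna.upper().replace("T", "U")

def translate(rna):
    """Read RNA three letters at a time until a stop codon, returning amino acid names."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE[rna[i:i + 3]]
        if amino_acid == "STOP":
            break
        protein.append(amino_acid)
    return protein

dna = "ATGTTTGGCTAA"  # made-up sequence chosen to fit the tiny table above
rna = transcribe(dna)

print(rna)             # AUGUUUGGCUAA
print(translate(rna))  # ['Met', 'Phe', 'Gly']
```

A real cell does far more than swap letters, but the flow of information runs in the same direction: DNA, then RNA, then a chain of amino acids.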