It’s quite possible that the most wonderful and compelling aspect of biology is its glorious inconsistency. Biological systems have evolved in magnificently creative ways, usurping and repurposing processes for completely new uses wherever possible. This means that almost every time we think a theme is emerging, we find exceptions. And sometimes it can be very difficult to unravel which is the norm and which the exception.
Let’s take junk DNA and non-protein-coding RNAs. Based on pretty much everything we have seen so far, it would be perfectly reasonable to develop a hypothesis along the following lines:
When junk DNA encodes a non-protein-coding RNA (junk RNA), the function of the RNA is to act as a kind of scaffold, directing the activity of proteins to particular regions of the genome.
This hypothesis would certainly be consistent with the roles of long non-coding RNAs. They act as the Velcro between epigenetic proteins and DNA or histones. The proteins frequently operate in a complex, and at least one member of the complex is often an enzyme, i.e. a protein that can bring about a chemical reaction. This can be the reaction that adds or removes epigenetic modifications on DNA or histone proteins, or that adds another base to a growing messenger RNA molecule.
In all these situations, the protein is the verb in the molecular sentence. It’s the ‘doing’ or action molecule.
Attractive as this model sounds, it has one unfortunate flaw. There is a situation where the roles are entirely reversed. In this reversed situation, the proteins are relatively silent, but the junk RNA acts as an enzyme, causing a chemical change to another molecule.
This sounds so peculiar that it is tempting to assume that it’s a one-off quirky exception. But if that’s the case, it’s a really quite remarkable exception because the junk RNA molecules that have this function account for about 80 per cent of the RNA molecules present in a human cell at any given time.1 We’ve actually known about these peculiar RNA molecules for decades, making it yet more surprising that we have maintained such a protein-centric vision of our genomic landscape.
The RNA molecules with this odd function are called ribosomal RNA molecules, or rRNA for short. Logically enough, they are mainly found at structures in the cell called ribosomes. These structures are not in the nucleus but in the cytoplasm, which we first encountered in Chapter 2 and which was shown in Figure 2.3 (see page 16). The ribosomes are the structures where the information in the messenger RNA molecules is converted into strings of amino acids joined together, creating protein molecules. Using our analogy of the knitting pattern from Chapters 1 and 2, the ribosomes are all the ladies knitting away and turning the information on the printed page into warm socks and gloves for the overseas soldiers.2
If analysed by weight, the rRNA makes up about 60 per cent of the structure of a ribosome, and proteins make up the other 40 per cent. The rRNA molecules cluster into two major sub-structures. One contains three types of rRNA and around 50 different proteins. The other sub-structure contains just one type of rRNA and around 30 proteins. The ribosome is sometimes referred to as a macromolecular complex as it is a really big, structured conglomeration of many different components. We can think of it as a large protein-synthesis robot.
When messenger RNA molecules are produced for protein-coding genes, these messenger RNAs are transported out of the nucleus and over to the region of the cell where the ribosome robots are located. Messenger RNA molecules are fed through the ribosome and the genetic instructions carried on the messenger RNA are ‘read’ by the ribosomes. This results in amino acids being connected together in the correct sequence. It’s the ribosomal RNA that carries out the reaction by which an amino acid is joined to its adjacent neighbour. This creates the long, stable protein molecule.
As the messenger RNA is fed through the ribosome, another ribosome may bind at the start of the same message. It too will create protein chains. This is why one messenger RNA molecule can be used as the template for multiple copies of the same protein. The process is shown in Figure 11.1.
The amino acids are brought to the ribosomes by another type of junk RNA called transfer RNA, or tRNA. These are quite small non-coding RNAs, only about 75 to 95 bases in length.3 But they are able to fold back on themselves creating an intricate three-dimensional structure usually referred to as a clover leaf. A specific amino acid is attached to one end of the tRNA. At the far end, on a loop, is a sequence of three bases. This base triplet can bind to the correct matching sequence on a messenger RNA molecule. It does this by using essentially the same rules as the base pairing in DNA.
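The pairing rule between the triplet on a tRNA and the triplet on a messenger RNA can be sketched in a few lines of Python. This is only an illustration: the function names are invented for this example, and it assumes strict pairing (A with U, G with C), ignoring the looser pairing that real tRNAs can sometimes tolerate at one position.

```python
# Base-pairing rules for RNA: A pairs with U, and G pairs with C.
# Assumption: strict pairing only; real tRNAs allow some flexibility.
PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon: str) -> str:
    """Return the tRNA triplet (anticodon) that pairs with an mRNA triplet.

    Both triplets are written in the conventional 5'-to-3' direction.
    The two strands pair head-to-tail, so we complement each base and
    also reverse the order.
    """
    return "".join(PAIRS[base] for base in reversed(codon))


def pairs_with(codon: str, anticodon: str) -> bool:
    """True if the tRNA triplet matches the messenger RNA triplet."""
    return anticodon_for(codon) == anticodon


if __name__ == "__main__":
    print(anticodon_for("AUG"))
```

The reversal step is the one piece that is easy to overlook: the two RNA strands run in opposite directions when they pair, just as the two strands of DNA do.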
Figure 11.1 A messenger RNA molecule moves through a ribosome, travelling from left to right. The ribosome builds the protein chain. As the beginning of the messenger RNA emerges from the processing ribosome, it can engage another ribosome. As a consequence, there may be multiple ribosomes on a single messenger RNA molecule, all building full-length proteins.
The tRNA molecules act as the adapters between the nucleic acid sequence carried on the messenger RNA (and originally the DNA), and the final protein. This ensures that the amino acids are lined up in the right order to create the proper protein. This is shown in Figure 11.2. When two amino acids are held next to each other at the ribosome, the rRNA can carry out a chemical reaction that attaches the end of one amino acid to the beginning of the next and thereby builds the protein chain.
Some of the triplets on the messenger RNA don’t have a match to any triplet on a tRNA. These triplets are known as stop signals. When the ribosome reads one of these, it can’t fit a tRNA in place, so the ribosome falls off the messenger RNA and the protein stops growing. These are the roofing LEGO bricks we met in Chapter 7 (see page 85). The ribosome then finds another messenger RNA molecule to translate into protein, or could even go back to the start of the first one.
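The reading process, including the stop signals, can be sketched as a short Python function. This is a toy version under stated assumptions: the table holds only a handful of the 64 possible triplets, the function name is invented for this example, and the message is assumed to begin at its first base. The three stop triplets are deliberately absent from the table, mirroring the fact that no tRNA matches them.

```python
# A small excerpt of the standard genetic code (assumption: only these
# few triplets are needed for the illustration).
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
}

# The three triplets that no tRNA recognises.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Read a messenger RNA three bases at a time and build a protein chain.

    Reading stops at the first stop signal, or when fewer than three
    bases remain.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # no tRNA fits, so the chain stops growing
        protein.append(CODON_TABLE[codon])
    return protein


if __name__ == "__main__":
    print(translate("AUGUUUGGCUAAAAA"))
```

In the example message the final triplet, AAA, is never read, because the stop signal UAA comes before it.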
Figure 11.2 As the messenger RNA moves through the ribosomes, transfer RNA molecules bring the appropriate amino acids to the correct position on the chain, using base pairing. The ribosomal RNA machinery joins up adjacent amino acids to create the protein chain.
Even though the entire procedure is a very sophisticated task that relies on a giant complex of four types of ribosomal RNA and around 80 associated proteins, the process of adding new amino acids into a growing protein chain is remarkably fast. It’s difficult to measure this accurately in human cells, but in bacteria each ribosome can add amino acids at the rate of about 200 a second. It’s probably not as fast as this in human cells, but it will still be about ten times faster than we could possibly stick two bricks together if we were making a LEGO tower. And don’t forget that the ribosome isn’t sticking together random LEGO bricks. It’s as if we had to choose just two out of 20 different types of LEGO bricks (there are 20 different amino acids) and stick them on top of each other in exactly the right order every fraction of a second. Quite a task.
Our cells need to produce millions of protein molecules every second, and so we need our ribosomes to work very efficiently. We also need a lot of ribosomes to meet the demand, up to 10 million robots in a single cell.4 In order to create enough ribosomes, our cells have accumulated lots of copies of rRNA genes. Instead of being dependent on creating rRNA from the classical situation of one gene inherited from each parent, we inherit about 400 rRNA genes, distributed across five different chromosomes.5
One consequence of this vast number of rRNA genes is that we aren’t very prone to disorders caused by mutations in these genes. That’s because if one copy is mutated we have lots of redundancy. So the chances are that we can make up for the defect from all the normal versions encoding the same rRNA molecule. This isn’t true of mutations in the genes coding for the proteins that are also present in the ribosomes. We don’t know what many of these do in detail, and some don’t seem to be important at all in ribosome function. But there are others where mutations do result in human disorders.
The two best-known examples are called Diamond-Blackfan Anaemia and Treacher-Collins Syndrome. They are caused by inherited mutations in different protein-coding genes. The consequence in both cases is a decrease in the number of ribosomes. But there are clearly subtleties that we don’t understand in how this affects cell function, because if the only important factor were this reduced number, we would expect the clinical consequences to be identical. They aren’t. The major symptom in Diamond-Blackfan Anaemia is a defect in the production of red blood cells. The major symptoms in Treacher-Collins Syndrome are malformations of the head and face, leading to problems with breathing, swallowing and hearing.6
Because we need a lot of ribosomes and hence a lot of rRNA genes, it’s not unreasonable that we also need a lot of tRNA genes to ensure that there are plenty of tRNA molecules to transport the amino acids to the ribosomes. The human genome contains nearly 500 tRNA genes, distributed across almost every chromosome.7 This brings the same benefits as those described above for multiple copies of rRNA genes.
There’s also an odd and intriguing possible overlap between rRNA and imprinting. As described in Chapter 10, there are a small number of patients with Prader-Willi syndrome where the disorder has been localised to a junk DNA region that encodes a batch of non-coding RNAs (see page 140). These are called snoRNAs, for small nucleolar RNAs.* These non-coding RNAs migrate to a region of the nucleus called the nucleolus, which is very important in ribosome biology. The nucleolus is the place where the mature ribosomes are assembled, as shown in Figure 11.3.
In the nucleolus, the rRNAs and the proteins are modified and then assembled into mature intact ribosomes which are transported back into the cytoplasm, ready to carry out their functions as protein-creating robots. The snoRNAs are required to make sure that certain modifications take place properly on the rRNA molecules. Just as DNA and histone proteins can be modified by the addition of a methyl group, rRNA molecules can also be methylated. The snoRNAs probably facilitate this by finding regions on the rRNA with which they can form pairs. Once again this is possible because of bonding between complementary bases on the two nucleic acid molecules. Once they bind, the snoRNAs attract enzymes that can add methyl groups to the rRNAs. This may be similar to the ways in which long non-coding RNAs attract enzymes that modify histones.* It’s not altogether clear why these modifications matter to the rRNA, but one suggestion is that they help to stabilise interactions between the rRNAs and the proteins in ribosomes.

Figure 11.3 Messenger RNA molecules for ribosomal proteins are created in the nucleus and then shipped out to existing ribosomes in the cytoplasm. The new ribosomal proteins are transported back into a specific region in the nucleus. Here they join up with ribosomal RNA molecules to create new ribosomes, which are moved out into the cytoplasm to act.
Although it’s tempting to speculate that the symptoms of Prader-Willi syndrome are caused by problems in the snoRNAs’ control of rRNA modifications, this remains just a theory at the moment. The problem is that we now recognise that the snoRNAs can also target lots of other types of RNA molecules, so we can’t be sure exactly which process is going wrong in the children with this disorder.
Ribosomes are extremely ancient structures, and can be detected in really primitive organisms. They are even found in bacteria, the tiny single-celled organisms which don’t have a nucleus in their cells to separate their DNA from their cytoplasm. Evolutionary biologists often use the DNA sequences of the genes that encode rRNAs to track how species have diverged over time.
Bacteria and higher organisms diverged about 2 billion years ago,8 so although we can still recognise the rRNA genes in our unicellular (very) distant cousins, they are really different from ours. This has turned out to be A Good Thing. Some of the most common and successful antibiotics work by inhibiting the bacterial ribosomes.9 These include tetracycline and erythromycin. These antibiotics disrupt the activity of the bacterial ribosomes, but not human ones. In the West we are so accustomed to antibiotics that we sometimes forget how important they have been, saving literally tens of millions of lives, at a conservative estimate, since they really hit the medical scene in the 1940s. It’s odd to think that many of these lives have been saved because of variation between species in what purists would consider junk DNA.
We depend on our invaders
It’s even odder to think that each one of us has been colonised by organisms that probably developed around the same time our ancestors were diverging from the forebears of modern bacteria. ‘Colonised’ is really an understatement. Our entire survival and that of every other multicellular organism on this planet from grass to zebras and from whales to worms relies on this colonisation. It’s even true of the yeast we depend on for bread and beer.
Billions of years ago the cells of our earliest ancestors were invaded by tiny organisms. At this stage there probably weren’t any organisms more than four cells in size and the four cells would have been pretty non-specialist. Instead of warring against each other, these cells and their tiny invaders reached a compromise. Each benefitted from the compromise and so a beautiful friendship, lasting billions of years, was born.
These tiny organisms evolved into critical components of our cells called mitochondria. The mitochondria reside in the cytoplasm and are little power generators. They are the sub-cellular organelles that produce the energy we need to power all of our standard functions. It’s the mitochondria that have allowed us to make use of oxygen to create useful energy from food sources. Without them, we would be smelly little four-celled nobodies with hardly enough energy to do anything useful.
One of the reasons we are confident that mitochondria are the descendants of these once free-living organisms is that they have their own genome. It’s much smaller than the ‘proper’ human genome that is found in the nucleus. It is just over 16,500 base pairs in length compared with the 3 billion base pairs of the nuclear genome, and unlike our chromosomes it is circular. The mitochondrial genome only codes for 37 genes. Remarkably, well over half of these don’t code for proteins. Twenty-two of them encode mitochondrial tRNA molecules10 and two encode mitochondrial rRNA molecules. This allows the mitochondria to produce ribosomes, and to use these to create proteins from the other genes in their DNA.*,11
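The numbers in this paragraph are easy to check with a little arithmetic, sketched here in Python. The variable names are invented for this example, and the genome sizes are the rounded figures quoted above rather than exact counts.

```python
# Gene counts for the mitochondrial genome, as quoted in the text.
MITO_GENES = 37
TRNA_GENES = 22
RRNA_GENES = 2

non_coding = TRNA_GENES + RRNA_GENES        # genes that make RNA, not protein
protein_coding = MITO_GENES - non_coding    # what is left over
non_coding_share = non_coding / MITO_GENES  # fraction that is non-coding

# Rounded genome sizes in base pairs (assumption: approximate values).
MITO_BP = 16_500
NUCLEAR_BP = 3_000_000_000
size_ratio = NUCLEAR_BP / MITO_BP

if __name__ == "__main__":
    print(f"Non-coding genes: {non_coding} of {MITO_GENES} "
          f"({non_coding_share:.0%})")
    print(f"Protein-coding genes: {protein_coding}")
    print(f"Nuclear genome is roughly {size_ratio:,.0f} times larger")
```

So 24 of the 37 genes, roughly two-thirds, don’t code for proteins, which leaves 13 that do.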
This seems a very risky strategy in evolutionary terms. Mitochondrial function is critical for life and ribosomal function is absolutely critical to mitochondrial function. So why have such an important process with no safety net of extra copies of the ribosomal genes in our power generators?
We can get away with this because mitochondrial DNA isn’t inherited in the same way as nuclear DNA. In the nucleus we inherit one set of chromosomes from each parent. But mitochondrial inheritance is different. We only inherit our mitochondria from our mother. This would seem to make for an even riskier scenario because it means if we inherit a mutant mitochondrial gene from our mother, there is no chance of a back-up normal gene from dad.
But there is (of course) a complication. We don’t just inherit one mitochondrion from our mother, we inherit hundreds of thousands, maybe even a million. And they aren’t all the same genetically, because they haven’t all originated from one mitochondrion in a previous cell. Every time a cell divides, the mitochondria also divide and are passed on to daughter cells. Even if some of these mitochondria have developed mutations, there will be plenty of other mitochondria in the cell that are fine.
That’s not to say that problems never develop, and many of those that do have been reported to be in the tRNA genes on the mitochondrial DNA. These include conditions with muscle weakness and wasting;12 hearing loss;13 hypertension14 and cardiac problems.15 But the symptoms may vary a lot from patient to patient, even within the same family. The most likely reason for this is that symptoms may not develop until the percentage of mutant mitochondria in a tissue reaches a threshold. This may not be until relatively late in life, as a consequence of random unequal distribution of ‘good’ and ‘bad’ mitochondria when a cell divides.
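The idea of random unequal distribution followed by a threshold can be sketched as a small Python simulation. Everything specific here is an assumption made for illustration: the function names, the number of mitochondria per cell, and especially the 60 per cent threshold, which is a made-up value and not a measured one.

```python
import random

# Hypothetical threshold: the fraction of mutant mitochondria above which
# symptoms appear. Invented for this sketch, not a measured value.
THRESHOLD = 0.6


def divide(mutant: int, normal: int, rng: random.Random) -> tuple[int, int]:
    """Double the mitochondria, then pass a random half to one daughter cell."""
    pool = ["mutant"] * (2 * mutant) + ["normal"] * (2 * normal)
    rng.shuffle(pool)
    daughter = pool[: len(pool) // 2]
    mutant_count = daughter.count("mutant")
    return mutant_count, len(daughter) - mutant_count


def mutant_fraction(mutant: int, normal: int) -> float:
    """Fraction of a cell's mitochondria that carry the mutation."""
    return mutant / (mutant + normal)


def follow_lineage(mutant: int, normal: int, divisions: int,
                   seed: int = 0) -> list[float]:
    """Track the mutant fraction along one line of descendant cells."""
    rng = random.Random(seed)
    history = [mutant_fraction(mutant, normal)]
    for _ in range(divisions):
        mutant, normal = divide(mutant, normal, rng)
        history.append(mutant_fraction(mutant, normal))
    return history


def shows_symptoms(fraction: float, threshold: float = THRESHOLD) -> bool:
    """True once the mutant fraction reaches the (hypothetical) threshold."""
    return fraction >= threshold


if __name__ == "__main__":
    history = follow_lineage(mutant=40, normal=60, divisions=200)
    print(f"start: {history[0]:.2f}, end: {history[-1]:.2f}")
    print("threshold reached:", any(shows_symptoms(f) for f in history))
```

Running the sketch with different seeds gives different outcomes from the same starting cell, which is one way to picture why relatives carrying the same mutation can have such different symptoms.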
If all of this hasn’t been enough to demonstrate that RNA is not just some poor relation of DNA or an inferior species compared with proteins, consider this. Despite DNA being the poster child for biology, all life on earth may have originated not with DNA but with RNA.
In the beginning was the RNA (possibly)
DNA is a great molecule. It stores a lot of information, and because of its double-stranded nature it’s easy to copy and to maintain the sequence stably. But if we try to think back billions of years, to when life began to develop, it’s hard to see how it could happen based on a DNA genome.
That’s because although DNA is fantastic at storing information, it’s no use in terms of creating something from that information, not even another copy of itself. DNA can never function as an enzyme. Because of this, it can’t make copies of itself, so how could it have been the starting genetic material? It is always reliant on proteins to do its bidding.
But if we look at rRNA, a molecule which has received very little by way of the spotlight even among most scientists, there’s a bit of a eureka moment. rRNA contains sequence information but it is also an enzyme. This raises the possibility that RNA could have had a range of enzymatic activities in the past, and this could have led to the evolutionary development of self-sustaining and self-propagating genetic information.
In 2009 researchers published extraordinary work in which they generated such a system. They genetically created two RNA molecules both of which could act as enzymes. When they mixed these molecules in the lab, and gave them the raw materials they needed, including single RNA bases, the two molecules made copies of each other. They used the existing RNA sequences as the templates for the new molecules, creating perfect copies. As long as they were supplied with the necessary raw materials, they made more and more copies. The system became self-sustaining. The researchers went even further by mixing higher numbers of different RNA molecules, each of which had enzymatic activity. When they activated the experiment, they found that two sequences would rapidly outnumber all the others. Essentially, the system was not only self-sustaining, it was also self-selecting because the most efficient pairs of RNA molecules would recreate each other far more rapidly than any of the other pairings.16 Very recently, scientists have even succeeded in creating a type of enzymatic RNA that will generate copies of itself.17
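The self-selecting behaviour can be pictured with a toy calculation in Python. This is not the researchers’ model: the pair names and copying rates are invented, and the sketch simply assumes that each pair grows in proportion to its own rate while the total amount of RNA is kept constant by repeated dilution.

```python
def serial_transfer(rates: dict[str, float], rounds: int) -> dict[str, float]:
    """Follow the share of each replicating pair over repeated growth rounds.

    `rates` maps a (hypothetical) pair name to the extra copies it makes
    per round. After each round the mixture is rescaled so the shares
    sum to 1, standing in for dilution into fresh raw materials.
    """
    shares = {name: 1 / len(rates) for name in rates}
    for _ in range(rounds):
        grown = {name: shares[name] * (1 + rates[name]) for name in rates}
        total = sum(grown.values())
        shares = {name: amount / total for name, amount in grown.items()}
    return shares


if __name__ == "__main__":
    # Invented copying rates for three imaginary pairs of RNA enzymes.
    example_rates = {"pair_A": 0.5, "pair_B": 0.8, "pair_C": 1.0}
    final = serial_transfer(example_rates, rounds=100)
    for name, share in final.items():
        print(f"{name}: {share:.4f}")
```

Even a modest advantage in copying speed compounds from round to round, so the most efficient pair ends up making up almost the whole mixture.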
An expression that is still heard in the UK is ‘Where there’s muck, there’s brass’, meaning that where there is dirt or rubbish, there’s money. Maybe where there’s junk, there’s life.