‘We live in a dancing matrix of viruses; they dart, rather like bees, from organism to organism, from plant to insect to mammal to me and back again, and into the sea, tugging along pieces of this genome, strings of genes from that, transplanting grafts of DNA, passing around heredity as though at a great party.’
LEWIS THOMAS
It is the year 1964. A woman on the young side of middle age, dressed in a white lab coat, is adjusting a large machine in a laboratory at St Thomas’s Hospital in London. Originally from Glasgow, June Almeida, née Hart, left school at sixteen, but after training as a laboratory technician has become skilled at operating electron microscopes while living and working in Toronto with her Venezuelan artist husband and daughter. So skilled that she has recently been lured back from Canada to help with the electron microscope at this London hospital. She has developed a technique for gathering virus particles beneath the electron beam so that they can be photographed.
On this particular day in 1964, she is focusing the electrons on a sample sent from the Common Cold Research Unit, a British government laboratory devoted to understanding what causes runny noses. The unit’s Dr David Tyrrell has identified one cold, taken originally from the nose of a boy at a Surrey boarding school, that seems to be unusual – it won’t grow in the normal cell cultures, although it readily causes colds if squirted up the noses of volunteers. It is clearly a virus, passing through filters that sequester bacteria; however, it fails tests that single out influenza viruses, rhinoviruses, adenoviruses and other viruses. Unlike these other viruses, it is easily killed by ether, which implies that it might have an oily or fatty component.
Dr Tyrrell had sent a sample to Almeida. She skilfully managed to take a photograph of the virus, and sure enough it had a distinctive appearance: round and decorated with a ghostly crown that reminded her of the gaseous effusions from the surface of the sun known as the solar corona. This rang a bell with Almeida. She had seen similarly shaped viruses in mice and chickens. But when she submitted a paper to a journal and included the pictures, it was rejected – these are just blurred images of ordinary influenza viruses, she was told. Undeterred, Almeida stuck to her guns, gathered more data and, in 1968, she and seven colleagues wrote to the journal Nature and suggested that a new class of virus be recognised, the ‘coronaviruses’. The sample from the Surrey boarding school was later lost so we do not know which coronavirus it was, but we now know that there are four types of coronaviruses that cause the common cold. They go by the names 229E, NL63, HKU1 and OC43.
Today the coronaviruses are divided into four groups based on their genetic differences. The rarest of those studied to date are the gamma- and delta-coronaviruses, none of which infect people. These are mostly carried by birds and pigs. The most commonly found are the alpha-coronaviruses, mostly from mammals, two of which infect human beings: 229E and NL63. The remainder are the beta-coronaviruses, found frequently in bats and rodents, of which four had infected people before 2019: HKU1, OC43, MERS and SARS. Within the beta-coronaviruses lies a ‘species’ known by three different names: lineage-B, SARS-like (or SARS-related) or sarbecoviruses. SARS-CoV and SARS-CoV-2 are both sarbecoviruses.
Viruses can only make more copies of themselves by hijacking a host cell and leveraging the host machinery to make more viruses. This inability to replicate on their own leads most biologists to conclude that they cannot be described as living things, even though they are clearly made of protein and nucleic acids and are parasites of living organisms. Similar to living organisms, viruses are built on a genetic code that is enclosed by a protein shell and sometimes a membrane envelope. Once inside the cell of a host, viruses reveal their true nature as they churn out ribbons of genetic tape instructing the cell to turn over all its machinery to replicating and multiplying virus particles.
Viruses are all around us all the time. All are highly specialised so the vast majority cannot infect people, though they each have to infect some living organism to reproduce. In the oceans alone there may be a million trillion trillion viruses. Many viruses, called phages, actually infect and kill bacteria, so many scientists today are developing phage therapies as an alternative to antibiotics in the face of rising bacterial resistance to frontline antibiotics.
The SARS-CoV-2 coronavirus that causes Covid-19 in human beings can also infect animals such as cats, tigers, lions, minks, ferrets, otters, raccoon dogs, dogs and non-human primates. It is very, very small. Each particle of SARS-CoV-2 weighs about thirty quintillionths of an ounce, or approximately one quadrillionth of a gram. All the SARS-CoV-2 viruses in all the millions of patients in the world at any given point in time could fit inside a single soda can.
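As a rough check that the two mass figures agree, the unit conversion can be sketched in a few lines of Python. The only assumptions are the standard avoirdupois ounce (about 28.35 grams) and the short-scale meanings of quadrillionth (ten to the power of minus fifteen) and quintillionth (ten to the power of minus eighteen); the variable names are ours.

```python
# Back-of-envelope check of the two mass figures quoted in the text.
GRAMS_PER_OUNCE = 28.3495  # avoirdupois ounce (assumed unit)

mass_g = 1e-15  # "one quadrillionth of a gram" (short scale)
mass_oz = mass_g / GRAMS_PER_OUNCE

# Express the result in quintillionths (1e-18) of an ounce.
quintillionths_of_oz = mass_oz / 1e-18
print(f"{quintillionths_of_oz:.0f} quintillionths of an ounce")
```

The result comes out in the mid-thirties, so ‘about thirty quintillionths of an ounce’ and ‘approximately one quadrillionth of a gram’ are consistent to within rounding.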
The infectious diseases that have ravaged human populations throughout history were not all viruses. It was a bacterium, the plague, that devastated the population of much of the Old World in the 1300s. It was a virus, smallpox, that wiped out much of the population of the New World after European contact. And one of today’s biggest killers, malaria, is neither: it is a protozoan. Very roughly, in terms of their size, if malaria were a cat, the plague would be a mouse and smallpox a flea.
Bacteria were known from the late 1600s, but the first virus was not discovered until the late 1800s. Scientists studying tobacco crops found that the plants were not afflicted by a bacterium or fungus, but by an entirely new type of ‘invisible contagion’ that was much, much smaller than single-cell microbes such as bacteria. Unable to satisfy the criteria known as Koch’s postulates, which had become the gold standard to demonstrate a microbial infection, the scientists nonetheless proved that the plant disease could be transmitted by inoculating healthy plants with sap from diseased plants. In 1898, a Dutch scientist named Martinus Beijerinck coined the word ‘virus’, derived from a Latin word meaning a liquid poison. Within a few years, the first human virus was identified: the yellow fever virus.
Today, although some viruses get into us through our food, such as norovirus, and some arrive via the mouthparts of insects, like yellow fever, dengue and Zika, the most common viruses that afflict people in developed countries are respiratory viruses. These get into our bodies via our noses or throats. Some two hundred kinds of common cold, most of them rhinoviruses, adenoviruses and coronaviruses, and a bunch of flu strains, swarm around the world, descending on schools with special glee. The average child catches seven colds a year. The fact that people live in dense cities, crowd into schools or bars, and travel long distances, suits such viruses just fine. It may be no accident, as we shall see, that bats carry a galaxy of viruses, for they live in larger aggregations than any other mammals bar us. In one Texan cave, at certain times of year, twenty million Mexican free-tailed bats roost together, or roughly the (human) population of the Mexico City area.
As with influenza, a coronavirus genome is made of RNA, a slightly less tidy cousin of DNA. These genomes spell out instructions for building more copies of the virus in a four-letter code: A, T, G and C for DNA, but A, U, G and C for RNA. As we shall see, scientists have chosen to represent the genomic texts of RNA viruses in the language of their DNA equivalent. This means very simply that they use the letter T instead of U.
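The convention of writing an RNA genome in DNA letters amounts to a one-letter substitution. A minimal sketch in Python follows; the function name and the short example fragment are invented for illustration and are not taken from any real genome file.

```python
def rna_to_dna_notation(rna: str) -> str:
    """Rewrite an RNA sequence in DNA letters by replacing U with T."""
    rna = rna.upper()
    unexpected = set(rna) - set("AUGC")
    if unexpected:
        raise ValueError(f"not an RNA letter: {sorted(unexpected)}")
    return rna.replace("U", "T")


example_rna = "AUGUUUGUUUUU"  # illustrative fragment only
print(rna_to_dna_notation(example_rna))
```

Nothing about the message changes in this step; only the alphabet used to write it down does.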
Much of the argument about where the SARS-CoV-2 coronavirus came from will turn on this textual analysis of its genome. Just as the people who study Jacobean plays can trace the authorship by close analysis of the text and can follow the pedigree of different versions of the plays by the mistakes that were made by one scribe and copied by others, so we can trace the ancestry of a virus by the mutations that happened to its text in one generation and were inherited by later viruses. If you compare three genomic texts – one taken from a bat virus, one from a pangolin virus and one from a human SARS-CoV-2 virus – and find one part of the text to be more similar in the bat and the person, while another part is more similar in the pangolin and the person, then you might be looking at a genome that was the result of recombination: a paragraph of one text replaced by the equivalent paragraph from another text. By such methods can we inch towards an understanding of where and how the text originated.
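The three-way comparison described above can be sketched as a toy calculation. The three short ‘genomes’ below are invented strings, not real bat, pangolin or human virus sequences, and the function names are ours; real analyses work on carefully aligned genomes of about thirty thousand letters and use statistical tests rather than a simple best match.

```python
def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length texts agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_by_segment(query, candidates, segment_length):
    """For each segment of the query, name the candidate that matches it best."""
    results = []
    for start in range(0, len(query), segment_length):
        end = start + segment_length
        scores = {name: identity(query[start:end], seq[start:end])
                  for name, seq in candidates.items()}
        results.append((start, max(scores, key=scores.get), scores))
    return results


# Invented toy sequences: the "human" text is stitched together from the
# first half of the "bat" text and the second half of the "pangolin" text.
bat = "ATGGCTAAGCTTACGGATCCGTAA"
pangolin = "ATGACTCAGCTAACGCATTCGTGA"
human = bat[:12] + pangolin[12:]

for start, best, scores in closest_by_segment(
        human, {"bat": bat, "pangolin": pangolin}, segment_length=12):
    print(start, best, scores)
```

In this toy case the first segment matches the bat text best and the second matches the pangolin text best, which is the kind of pattern that would prompt a closer look for recombination.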
Here’s an analogy. The play ‘All Is True’ was later renamed as ‘The Famous History of the Life of King Henry the Eighth’ when it was included in the first folio of William Shakespeare’s works. Until 1850 it was thought to have been written by Shakespeare alone. Then the scholar James Spedding argued that it had been partially written by John Fletcher, who was Shakespeare’s successor as the principal playwright for the King’s Men theatre company. He made this claim based on the appearance and style of many eleven-syllable lines that were characteristic of Fletcher. More than a century later, another scholar, Cyrus Hoy, also tried to map the play, dividing the scenes between Shakespeare and Fletcher, based on the words used. Fletcher, for example, uses ‘ye’ more often than Shakespeare, who prefers ‘you’. Hoy’s scheme mapped neatly onto Spedding’s hypothesis. Meanwhile, the contemporary sources used by the two authors can also be deduced from their text. The majority came from Raphael Holinshed’s Chronicles, but both also consulted the authors John Foxe, John Stow and John Speed.
Our point is that the textual analysis of the genome of a virus can be equally illuminating and in a surprisingly similar way. Tiny characteristic features of the genome can indicate something critical about where the virus came from, how it evolved, or what it is related to, and yet such signs can be ambiguous and highly debated among scientists.
RNA messages are translated into another language: the language of proteins. Whereas RNA has a four-letter alphabet, each letter being a different nucleotide, proteins have a twenty-letter alphabet, with each letter being a different amino acid. The RNA text is read in three-letter words known as codons, and there is redundancy: several different three-letter RNA words can translate into the same amino acid. So some changes in the RNA message do not alter the protein code. Such changes are said to be synonymous. However, changes in the RNA message that do alter the protein code are described as non-synonymous and are more likely to affect the properties of the resulting protein. As an analogy, in transcribing Shakespeare, changing ‘ye’ to ‘you’ is a synonymous change, but ‘ye’ to ‘we’ is not. Virus genomes also experience both synonymous and non-synonymous changes.
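The difference between the two kinds of change can be sketched with the standard genetic code. The table below is built from the conventional one-letter amino-acid abbreviations, with ‘*’ marking a stop signal, and is written in DNA letters (T in place of U), following the convention mentioned earlier. The function name and the example codons are our own choices for illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_change(codon_before: str, codon_after: str) -> str:
    """Say whether swapping one codon for another alters the amino acid."""
    before = CODON_TABLE[codon_before]
    after = CODON_TABLE[codon_after]
    return "synonymous" if before == after else "non-synonymous"


print(classify_change("CTT", "CTC"))  # leucine stays leucine
print(classify_change("CTT", "CCT"))  # leucine becomes proline
```

A change in the third letter of a codon is often, though not always, synonymous, which is where most of the redundancy in the code lies.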
The four coronaviruses that cause common colds attack the upper respiratory tract. The viruses that cause Severe Acute Respiratory Syndrome (SARS) and Middle East Respiratory Syndrome (MERS) are less effective in the upper respiratory tract but instead target the lungs to cause severe respiratory disease. SARS-CoV-2 excels in both the upper and lower respiratory tract. In patients where the virus stays in the upper respiratory tract, symptoms similar to the common cold can develop, such as coughing and a sore throat. But when the virus makes it into the lungs, it can cause severe disease by ravaging the fine structure of the pockets known as alveoli, resulting in characteristically opaque shadows on scans. In especially severe cases, the infection can trigger a ‘cytokine storm’ of inflammation caused by an overreaction of the immune system, and in the worst case it can lead to irreversible tissue damage.
SARS-CoV-2 does not stop at the respiratory tract. It also infects other organs of the body, including the intestines, heart, blood, sperm, eyes and parts of the nervous system. The infection is systemic, sometimes damaging even the kidney, liver and spleen. Less well understood is how the virus affects the immune system – whether it can infect and kill immune cells, how it camouflages itself to avoid detection by the immune system, and how different proteins in the virus can disrupt antiviral host responses. No other virus has been the research focus of so many scientists across such diverse disciplines and across so many countries. Scientists have generated a tremendous amount of data characterising this virus, but are still learning the tricks it has up its sleeve.
How does the virus get inside a cell? When it first encounters a susceptible human cell, let’s say in the throat, the virus uses the spike proteins protruding from its surface to latch onto a receptor on the human cell surface. Specifically, the spike protein has a receptor-binding domain (RBD) that determines exactly which host receptor the virus can use for infecting cells. For SARS-CoV-2, this receptor is called angiotensin-converting enzyme 2 (ACE2) and is present across a plethora of cell types in the human body; ACE2 has its own job to do, chiefly to modulate blood pressure. Unfortunately, this means that the cells that line arteries and veins, as well as lungs and small intestine, are covered in ACE2 receptors, offering an open invitation to the virus. ACE2 can also be found in heart, kidney, liver, testes and brain. So it is no surprise that Covid-19 has more severe implications for patients who are obese, diabetic or suffer from hypertension – medical conditions that have already weakened the organs targeted by SARS-CoV-2. Covid-19 symptoms range from common ones, such as fever, aches, cough, shortness of breath and loss of smell, to less expected ones such as headache, diarrhoea and stroke. Some Covid-19 patients who have recovered also report ‘long Covid’ symptoms: a lasting fatigue, difficulty breathing or engaging in vigorous physical activity, and mind fogginess that plague even individuals in their prime.
Other animals also have their own version of ACE2, which is why SARS-CoV-2 can infect a wide range of species. The first SARS virus uses ACE2 as a receptor too and was also found to infect a variety of animals: during the 2002–3 epidemic, palm civets, raccoon dogs and hog-badgers among other species were found to carry the virus. Scientists later discovered that the SARS virus could also infect ferrets and hamsters, which are used in laboratories to better understand the disease. Over the past two decades, in the study of SARS viruses in the lab, scientists have tested them on cultured cells that express ACE2 from different species and have engineered mice that express the human ACE2.
SARS-CoV-2’s spike was found to bind to human ACE2 at least as strongly as the spike of the 2003 SARS virus. But that was not all. Scientists inspecting the genome of SARS-CoV-2 quickly noticed that the novel coronavirus possesses a feature called a furin cleavage site in its spike that has not been found in any other SARS-like virus, though it does appear in other distantly related coronaviruses. This short genetic sequence helps the virus to infect varied types of cells, specifically those that include a protein called furin, and is probably one of the reasons the virus is capable of causing a pandemic. Where it got the furin cleavage site from is controversial, as we shall discuss later.
After entering the cell, the virus takes off its coat, as if visiting a friend’s house. But this guest is not here for idle chit-chat. The genetic material of the virus (approximately thirty thousand letters of single-stranded RNA in the case of SARS-CoV-2) gets to work commandeering the cell’s machinery. The SARS-CoV-2 virus has fifteen genes spelled out by its RNA, which between them specify the recipes for twenty-nine proteins. These include sixteen non-structural proteins that are not parts of new virus particles but play a role in transforming the host cell into a virus factory. For instance, nsp12 is the RNA-dependent RNA polymerase (RdRp) that forms part of a complex that makes more copies of the virus genome; nsp3, nsp4 and nsp6 reorganise the internal structure of the host cell to be more conducive to virus replication; and nsp14 proofreads the replicated genomes to ensure that they are accurate. This is particularly critical for coronaviruses because their genomes are almost double the length of some other RNA viruses such as the influenza virus. Too many errors can render the virus incapable of infecting new hosts. The RNA genome also encodes structural proteins that form part of the final virus particles that are released from the host cell: the nucleocapsid (N) protein that wraps the RNA, the membrane (M) protein that holds the membrane together, the envelope (E) protein that encloses the core, and the spike (S) protein that sticks out from the membrane of the virus and latches onto host cells to facilitate infection. Several accessory proteins, which have a variety of jobs, are also encoded in the genome; there are eight of these in the 2003 SARS virus and nine predicted in SARS-CoV-2.
Of the four coronaviruses that cause versions of the common cold, two are thought to have been caught originally from bats and two from rodents. The most common of them is called OC43. It is a highly seasonal bug that causes about one in every ten common colds. There is good evidence that this one is quite a recent arrival in our noses and throats. Its genome is a 96 per cent match to a coronavirus that causes diarrhoea in young calves, called bovine coronavirus or BCoV. In 2005, Dr Marc van Ranst at Leuven University in Belgium examined parts of the genomes of the two viruses and, based on estimates of the rate of evolutionary change, concluded that they diverged in around 1890.
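The logic of dating a split from a rate of change can be sketched roughly as follows. Under a simple ‘molecular clock’, two lineages that parted company a given number of years ago each accumulate changes at some rate per letter per year, so the fraction of letters that differ between them is about twice the rate multiplied by the time. The rate and sampling year below are illustrative values of our own, chosen so that the toy arithmetic lands near the published date; they are not the figures used in the 2005 study, which analysed parts of the genomes with more sophisticated methods.

```python
def divergence_year(fraction_different: float,
                    rate_per_site_per_year: float,
                    sampling_year: float) -> float:
    """Estimate the year two lineages split, assuming a constant clock.

    Both lineages change independently after the split, so the observed
    difference accumulates at twice the per-lineage rate.
    """
    years_since_split = fraction_different / (2 * rate_per_site_per_year)
    return sampling_year - years_since_split


fraction_different = 1 - 0.96   # a 96 per cent match leaves 4 per cent different
illustrative_rate = 1.75e-4     # hypothetical changes per letter per year
print(round(divergence_year(fraction_different, illustrative_rate, 2005)))
```

The estimate is very sensitive to the assumed rate: halving the rate doubles the time since the split, which is why such dates carry wide margins of error.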
It may be just a coincidence, but 1889–90 saw a pandemic, the worst in the nineteenth century. It has always been assumed that it was caused by an influenza virus, but there is no direct evidence of this. Given that human coronaviruses were unknown till the 1960s, the possibility that this was a coronavirus pandemic was not considered until, in the early 2000s, Dr van Ranst, a coronavirus expert, noticed the coincidence of the date of the pandemic and the likely date of OC43’s arrival in the human species. That hypothesis has now been somewhat strengthened by the Covid-19 pandemic, because of some similarities between the epidemiology and symptoms of the two pandemics. In particular, both diseases largely spared children, unlike influenza, and both affected men more severely than women. Both cause loss of taste and smell in some people.
The outbreak began in central Asia, in the independent city state of Bukhara in May 1889. By October, it had reached Krasnovodsk on the Caspian Sea, from where it took the train along the Volga eventually to Moscow and St Petersburg, the Russian capital. The symptoms included high temperatures, swollen hands, face rashes and agonising body aches. The acute phase of the illness lasted for five or six days but sometimes left the victim exhausted for weeks. That autumn, according to one estimate, the disease laid low 180,000 people in St Petersburg, out of a population of about a million. By December, military hospitals in the capital were unable to cope and several factories had shut down for the lack of workers. Sweden’s turn came in November, when most of the soldiers of an artillery corps stationed on the island of Vaxholm fell ill. Within eight weeks, more than half the population of Stockholm had caught the virus. In December, the Prussian capital of Berlin was paralysed by the novel pathogen. Universities suspended lectures, government officials were said to be unable to carry out their functions and the fire brigade was disabled for lack of manpower. A contemporary report stated that schools were controversially closed throughout Germany even though the disease was not as great a threat to children as to adults. In Vienna, the schools closed early for Christmas and stayed shut till late January.
The same month Italy, France, Spain and Britain fell to the invisible invader. By Christmas, Paris’s hospitals were overwhelmed. In Madrid, in early January, three hundred people a day were dying, and being buried at night so as not to spread alarm. In London, on one day in January, Dr Samuel West found more than a thousand people crowded into the casualty ward of St Bartholomew’s Hospital, most of them men. According to a modern analysis, the death rate peaked in the week ending 1 December 1889 in St Petersburg, 22 December in Germany and 5 January 1890 in Paris. By February, the disease had reached America, South Africa and India. By April, it was in Australia and China. This was the first global pandemic made possible by railways and steamships.
As fast as it came, the plague subsided and by the summer it was apparently over, though not in the southern hemisphere where it was winter. It returned north in the autumn for a second wave, killing Queen Victoria’s grandson Albert Victor, among other prominent people, then again and again in the next few years with diminishing ferocity.
The question of whether new viruses evolve to be more or less virulent over time in new host species populations has no easy answer. The sicker you get, the more viruses are being bred in your body and likely expelled into the environment and transmitted to other people. But a virus that kills its host too quickly reduces the chances of transmission in the community. A person lying on their deathbed is a dead end for the virus. Selection will therefore favour strains that keep their hosts alive and healthy long enough to spread the virus from person to person. Pumping out lots of viruses while keeping the patient active would be the perfect compromise.
A key insight came from the evolutionary biologist Dr Paul Ewald, now of the University of Louisville, beginning in the 1980s. Ewald argued that the mode of transmission influences the trade-off between virulence and contagion. Diseases spread by direct contact, and which cannot survive for long outside the body, will evolve to be low in virulence, so that the infected person remains as active as possible, interacting with a large number of people. Diseases that spread by other means – especially dirty water or insect bites – are expected to remain or become highly virulent, such as cholera, plague and malaria, because they do not pay a price for immobilising and killing the host. Diseases that spread by sex will become good at hiding in the body for a long time, so that the victim has time to move on to a new sexual partner. There is another category. Ewald argued that what he called attendant-borne illnesses – which are spread from patient to patient by nurses or other helpers – would often remain or become highly virulent, because they could spread from incapacitated and dying hosts: the sicker the patient, the more visits they get from carers.
Respiratory viruses spread through the air by coughs and sneezes can benefit from low virulence, because they have more chances of spreading from people who don’t feel too ill to go to work or to parties – resulting in superspreading events.
How then would Ewald explain the high death rate from influenza in 1918? It was not spread by insects or dirty water or sex, was not durable, and yet it killed with gusto. Unlike almost every other flu outbreak, this one incapacitated many of its victims and killed a significant proportion of them. What is more, it became more deadly as time went on. The second wave in the autumn of 1918 was more lethal than the first in the spring. The episode, in which an estimated fifty million people died, cast a long shadow over the future. Every time a new variety of flu appeared, for example in 1957, 1997 or 2009, the world shuddered with fear expecting a terrible pandemic. Yet each time the flu faded into mildness, killing comparatively few people. What was so special about 1918?
Ewald hypothesised that the ‘high virulence of the 1918 pandemic resulted from natural selection acting under unusual environmental conditions’. Those unusual conditions were to be found on the Western Front of the First World War. They allowed ‘individuals immobilised by illness to be transported repeatedly from one cluster of susceptible hosts to another, in trenches, tents, hospitals, and trains’. A highly virulent version of the virus that rendered its victim so ill he could not move was now at no disadvantage. Imagine two soldiers who catch the flu in the front line. One has a mild case, the other a life-threatening one. The first is sent to rest in a dugout or billet somewhere, meeting a few comrades on the way. The second is taken by stretcher bearers to a crowded medical station, then on a crowded ambulance to a crowded field hospital, from where by crowded train he is sent to a nursing home back in England, say. The virus was free to hop off into wounded or convalescing troops, and those attending them, at various points along the way.
To test this hypothesis, Ewald examined the mortality in different phases of the epidemic. If the virulence was explained by the novelty of the virus in human hosts, its first outbreaks should have been the most virulent. This was not the case. The first recorded cases, in military camps in Kansas in March 1918, showed normal mortality for flu. As the virus spread through army camps and cities in the spring and summer of that year, mortality remained moderate. It was in the diary of Jefferson Kean, the deputy chief surgeon of the US Expeditionary Force, in northern France in mid-August, that reports of high death rates first emerged. He had described the flu as mild in April, May, June and July. On 17 August, he wrote, ‘influenza increasing and becoming more fatal’. By September, many doctors had noticed the change. The more lethal strain spread round the world, later subsiding in both frequency and virulence during 1919. Ever since, this H1N1 flu has been not a tiger but a house cat, and so has every other strain that has come along – possibly due to the prevailing (but fading) immunity in the hundreds of millions of people who were infected by the 1918 flu and later other strains of influenza.
How SARS-CoV-2 will continue to evolve in terms of deadliness and transmissibility between humans (and other animal hosts) remains to be seen. But we do know one thing: the virus is here to stay.