9. The furin cleavage site

‘Often what people don’t say or leave out, tells the real story.’

SHANNON ALDER

In all the 29,903 letters of RNA that comprise the genome of a typical SARS-CoV-2 virus, there is one little segment, just twelve letters long, spelling out the recipe for four amino acids, that has since become a miniature but fiercely contested battleground in the war to understand the origin of the pandemic. There are three reasons that this short sequence matters so much: it appears to play a big role in making the virus more infectious; it seems to be unique to this virus among all the sarbecoviruses; and it is the sort of sequence in exactly the location that scientists have been deliberately inserting into other coronaviruses.

It is called a ‘furin cleavage site’. Though ‘furin’ might sound like a drink taken by a Viking before a raid, it is in fact short for the boring and unilluminating phrase ‘FES Upstream Region (FUR) Protein’. Furin is a vital protein doing steady work in human cells every minute of every day, going around cleaving proteins in two to change their shapes, the better to enable them to do their myriad different jobs. Furin is lured to do this in the right place in each protein by special sequences of amino acids. The furin cleavage site in SARS-CoV-2’s genome reads CGG-CGG-GCA-CGT, which is the recipe for the amino acids arginine-arginine-alanine-arginine, RRAR. It lies in a key spot in the spike gene of the virus, immediately upstream of the point where the furin will cleave the spike in two.
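For readers who like to see the decoding spelled out, here is a minimal Python sketch of how those twelve letters are read three at a time. The lookup table is deliberately partial: it holds only the codons quoted in this chapter, written in the DNA alphabet. The function name `translate` is our own illustrative choice, not part of any standard tool.

```python
# Partial codon table: only the codons mentioned in this chapter (DNA alphabet).
CODON_TO_AMINO_ACID = {
    "CCT": "P",  # proline
    "CGG": "R",  # arginine
    "GCA": "A",  # alanine
    "CGT": "R",  # arginine
}


def translate(dna):
    """Read a coding sequence three letters at a time, starting at its first letter."""
    letters = dna.replace("-", "").upper()
    if len(letters) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    codons = [letters[i:i + 3] for i in range(0, len(letters), 3)]
    # A codon missing from the partial table raises KeyError rather than guessing.
    return "".join(CODON_TO_AMINO_ACID[codon] for codon in codons)


print(translate("CGG-CGG-GCA-CGT"))  # RRAR
```

A real analysis would use the full table of sixty-four codons; the four entries here are enough to reproduce the reading given above.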

When the genome sequence of SARS-CoV-2 was first released to the world by Dr Edward Holmes and Dr Zhang Yongzhen on 11 January 2020, scientists quickly spotted this novel feature. A team of scientists in France and Canada, including Dr Bruno Canard and Dr Etienne Decroly in Marseille, were the first into print outside China. In a paper submitted for publication on 3 February 2020, they wrote that the furin cleavage site may provide a gain of function to SARS-CoV-2 compared with other SARS-like viruses, increasing the efficiency with which the virus spreads among people. Two weeks earlier, on 21 January, a team of scientists from four universities across cities spanning China from north to south – Tianjin, Jinan, Nanjing, Kunming – published a manuscript in the Chinese Journal of Bioinformatics, titled ‘A Furin Cleavage Site Was Discovered in the [Spike] Protein of the 2019 Novel Coronavirus’. Their paper claimed to be the first to report the ‘very important mutation’; they too thought it was likely to enhance the infectiousness of the virus.

What is so special about a furin cleavage site? The spike of a coronavirus determines which host species the virus can infect and how many different types of cells in the body the virus can invade. In order for the spike protein to help the virus get inside a cell, it has to be cleaved by cutting at two sites. This sounds like an accident, but it is not. The spike protein is cut, but it does not fall apart. Spikes cluster in groups of three, like a bouquet, and after cleavage, each spike opens out into a new configuration like an origami puzzle. One half of each spike, called S1, clutches the entry receptor (ACE2 in the case of SARS-CoV-2), while the other half, S2, entices the cell membrane to fuse with the virus membrane. Cleavage at the boundary between S1 and S2 causes a change in the shape of S2. Cleavage at a second site within the S2 domain then forces the fusion of virus to cell so that the virus genome slips into the cell. (Pause to admire the fact that we live in an age when science can tell us such intricate and beautiful facts about invisibly small entities!)

Three spike proteins, each shown in a different shade of grey, fitted together on the surface of a SARS-CoV-2 virus. The closed form (left) can shift to the open form (right) once a cleavage is made by furin at the point indicated. This exposes the ACE2 binding domains, as shown by one of the spike proteins (top right).

Martin Brown

These cutting events take place on entry into a cell but they also occur as virus particles are being made inside a cell that has already been hijacked, priming the new viruses to invade other cells. Human cells manufacture the little protein scissors called proteases whose day job is to prepare lots of proteins for work by cleaving them. Furin is one such protease, found in the brain, lung, digestive tract, kidney, pancreas and reproductive organs. Each protease recognises a specific sequence in its target proteins at which to make the cut. Some viruses have acquired the ability to exploit this tool, using furin, or other proteases, for their own purposes – to reshape their own proteins. Several other coronaviruses that infect humans attract furin, including MERS, HKU1 and OC43. These are also betacoronaviruses, but they are not in the sarbecovirus subgenus. Other well-known viruses such as avian influenza, HIV and measles also rely on a similar furin-based mechanism.

In May 2020, Drs Markus Hoffmann, Hannah Kleine-Weber and Stefan Pöhlmann in Göttingen, Germany, listed the spike sequences of fifty-five SARS-like viruses, including SARS-CoV-2, five strains of human SARS, two civet strains, one raccoon dog strain, one pangolin version of SARS-CoV-2 and forty-five different bat viruses. There is no sign of a furin cleavage site at the S1/S2 junction in any of them except one: human SARS-CoV-2 stands out like a sore thumb. In this virus’s spike, four extra amino acids – PRRA – interrupt the otherwise similar sequence in the S1/S2 junction. Since the next letter is R (standing for arginine), this means there is a sequence reading RRAR in just the right place to attract the attention of furin scissors. In this group, even the bat virus RaTG13 and the pangolin virus that have the spikes most similar to SARS-CoV-2 have no furin cleavage site.

Facts like these have led some scientists to think that the insertion of a sequence with such import into the gene at just the right place is a strong sign that SARS-CoV-2 was deliberately altered in a laboratory. Others furiously disagree, arguing that the virus probably acquired this feature naturally. In April 2021, scientists from the NIH found long inserts had appeared naturally in the genomes of some SARS-CoV-2 variants during the pandemic. They wrote that the furin cleavage site resembled these inserts, although they could not identify a precise mechanism by which it originated in the virus. Both sides of this debate can mount a decent case.

One thing is for certain: furin cleavage is a big reason that SARS-CoV-2 has such pandemic potential. Several groups of scientists have found that when they remove the furin cleavage site from the spike of SARS-CoV-2, the virus replicates much less efficiently in the respiratory tract and causes less severe disease in both hamsters and humanised mice. In SARS-CoV-2 viruses sampled from human beings, mutations of the Rs that define the PRRAR motif have been exceedingly rare: only about one in ten thousand has mutations at one of the first two Rs in the motif, and essentially none have mutations at the last R, which is where the cleavage occurs. This implies that it is a key feature of the virus.

The discovery of furin cleavage sites

The history of this niche field of research goes back thirty years. In 1992, a team at the Institute of Virology in Marburg, Germany, discovered that a type of furin cleaves the haemagglutinin protein on the surface of an avian influenza virus, enabling the virus to get inside cells. A particular sequence of four amino acids at the cleavage site was necessary to attract the attention of furin: R-X-K/R-R, where X can be anything and the third amino acid can be either lysine (K) or arginine (R). Just a few months later, a different group of scientists led by the same senior authors demonstrated that such a sequence motif in HIV also serves as a furin cleavage site that renders the virus infectious.

Since then, spotting furin cleavage motifs in virus proteins has become a bit of a hobby among virologists, a potentially useful way of gauging how dangerous a virus might be. They always begin and end in arginine (R). The cut happens just after the last R. In SARS-CoV-2, the motif is RRAR; in mouse hepatitis virus, it is RRAHR; in bovine coronavirus, RRSRR; in OC43, a common cold coronavirus, RRSR.
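The motif-spotting hobby can be sketched in a few lines of Python. The example below checks the four motifs just listed against two rules taken from the text: the 1992 pattern R-X-K/R-R, read literally, and the looser observation that the motifs begin and end in arginine. The function names and the regular expression are our own illustrative assumptions; real prediction tools weigh far more of the surrounding sequence.

```python
import re

# R-X-K/R-R read literally: R, any amino acid, then K or R, then R.
STRICT_PATTERN = re.compile(r"R.[KR]R")

# Motifs as quoted in the text above.
MOTIFS = {
    "SARS-CoV-2": "RRAR",
    "mouse hepatitis virus": "RRAHR",
    "bovine coronavirus": "RRSRR",
    "OC43": "RRSR",
}


def matches_strict_pattern(motif):
    """True if any four-letter window of the motif fits R-X-K/R-R."""
    return STRICT_PATTERN.search(motif) is not None


def begins_and_ends_with_arginine(motif):
    """True if the motif starts and finishes with R, the looser rule."""
    return motif.startswith("R") and motif.endswith("R")


for virus, motif in MOTIFS.items():
    print(virus, motif, matches_strict_pattern(motif), begins_and_ends_with_arginine(motif))
```

All four motifs pass the looser rule. Under the literal reading of the stricter pattern, only RRSRR qualifies, which is one small illustration of why scientists cannot judge a cleavage site from its letters alone.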

Yet other coronaviruses, including SARS, manage to be infectious without furin cleavage at the S1/S2 boundary. The truth is that scientists still do not fully understand what is going on. This is why several labs around the world have been deliberately inserting furin cleavage site sequences into the spike genes of different coronaviruses to see how this changes the virus’s ability to infect different types of cells. We will briefly describe five experiments carried out over fifteen years in which scientists manipulated furin cleavage sites in the spike genes of coronaviruses. Our purpose is to show how widespread, indeed fashionable, it has been in virology laboratories to insert or alter these sites in coronaviruses in recent years.

Dr Jack Nunberg, a biologist whose career has included spells at a pharmaceutical firm and two biotech firms before he returned to academia at the University of Montana, carried out the pioneering experiment in 2006. He wanted to see what the spike of the 2003 SARS virus could do if it was given a furin cleavage site at the S1/S2 boundary. It should be stressed that he did not do the experiment with whole viruses, only the spike protein molecules that cannot make more copies of themselves. It was not a dangerous gain-of-function experiment. Dr Nunberg, working with his colleagues, took a SARS spike and made a crucial change to it so that it had a new RRSRR motif at the S1/S2 boundary, markedly enhancing the tendency of an infected host cell to fuse with another potential host cell containing the ACE2 receptor. However, pseudoviruses fitted with these spikes were no better at infecting human kidney cells carrying the ACE2 receptor.

In 2009, a team at Cornell University introduced a furin cleavage site into the SARS spike at the S1/S2 junction, but this time with a second site added where the S2 domain is usually cleaved. They observed a dramatic increase in cell fusion. Again, the experiment did not use whole, infectious viruses so there was no risk of a laboratory escape of an enhanced SARS virus.

In May 2015, a scientific collaboration between the Huazhong Agricultural University in Wuhan and Utrecht University in the Netherlands published a study showing that the creation of a furin cleavage site in the spike of Porcine Epidemic Diarrhea Coronavirus conferred increased ability to enter cells and trigger cell fusion. As the name implies, this virus gives pigs diarrhoea and has recently caused devastating epidemics on farms in Asia and America. This experiment actually used live coronavirus, albeit not one that infects human beings. The sequence was not inserted at the S1/S2 junction but at the secondary S2 cleavage site in the spike gene. The only funding acknowledged was a grant from the Natural Science Foundation of China and two of the experimenters came from Huazhong Agricultural University. By changing just one amino acid in the spike, the scientists rendered the viruses capable of triggering fusion in cells. The authors concluded that introducing furin cleavage sites into a coronavirus in the lab could help a virus to infect more cell types both in a Petri dish and in an animal.

Also in 2015, Drs Shi Zhengli and Ralph Baric were co-authors on a paper describing a similar experiment to those of Dr Nunberg and others. The virus in this case was MERS. Comparing the MERS genome to the genome of HKU4, a MERS-like coronavirus discovered in bats in Hong Kong, Dr Fang Li’s University of Minnesota team hypothesised that two tiny differences in the spike gene were responsible for the fact that MERS could cross the species barrier into human beings while HKU4 could not. One of the two differences was an S1/S2 furin cleavage site that is present in the MERS spike but not that of HKU4. Using a pseudovirus, they made just two small changes in the sequence of the MERS spike gene, which rendered MERS largely incapable of entering human cells because it could no longer be cleaved by a protease. They also engineered the HKU4 spike so that it now had the two cleavage sites, including a novel S1/S2 furin cleavage site. The result was pseudoviruses more capable of infecting different types of human cells. Once again, the risk of this work was very low. The pseudoviruses had no coronavirus genome inside so they could not generate more live coronaviruses even after infecting cells. Interestingly, the scientists also found that bat cells had a different protease system from human cells – perhaps explaining why some of these cleavage sites differ in bat viruses and human viruses, although some bat coronaviruses do possess furin cleavage sites in their spikes. The conclusion of the study was that ‘the two functional human protease motifs in MERS-CoV spike played a critical role in the bat-to-human transmission of MERS-CoV’. It was possible therefore that just two mutations would be necessary to turn HKU4, a relatively harmless bat virus found in China, into a pathogen that could take its first steps into human beings.

We need to recount one more furin cleavage site experiment to bring the story up to date. In October 2019, the Key Laboratory of Animal Epidemiology of the Ministry of Agriculture in Beijing published the results of an experiment on infectious bronchitis virus. This disease of chickens was the earliest coronavirus disease to be identified, back in the 1930s in America, decades before the name ‘coronavirus’ was coined. The experiment showed that putting a furin cleavage site into the S2 domain resulted in a more lethal virus and one that could damage the blood-brain barrier and infect brain cells to cause encephalitis. The scientists feared that such a sequence could evolve naturally in infectious bronchitis virus and result in terrible losses in the poultry industry.

It is safe to say that by 2019 the practice of artificially introducing or removing furin cleavage sites in the spike genes of coronaviruses, or their equivalents in other viruses, had become a routine experiment in virology.

CGG CGG

On 6 May 2021, in a lengthy online essay, the veteran science journalist Nicholas Wade, former deputy editor of Nature and science writer on the New York Times, quoted the eminent virologist Dr David Baltimore as saying, ‘When I first saw the furin cleavage site in the [SARS-CoV-2] viral sequence, with its arginine codons, I said to my wife it was the smoking gun for the origin of the virus . . . These features make a powerful challenge to the idea of a natural origin for SARS2.’ This sent a tremor through the scientific community, which until that point had largely decried any notion of a laboratory origin of the virus as deeply unlikely or even a conspiracy theory.

Dr Baltimore won the Nobel Prize in Physiology or Medicine in 1975 for his discovery that RNA tumour viruses, specifically retroviruses, can insert their genomes into the DNA genomes of host cells, using a ‘reverse transcriptase’ enzyme that makes a DNA copy of the RNA virus genome. His biography includes the presidency of both the California Institute of Technology and Rockefeller University. He won the US National Medal of Science. So, although other less well-known scientists had suggested that the furin cleavage site in SARS-CoV-2 was possibly a product of genetic engineering, Dr Baltimore’s statement gave this conjecture much more weight. Later Dr Baltimore would partly retract his comment, telling the Los Angeles Times he ‘should have softened the phrase “smoking gun” because I don’t believe that it proves the origin of the furin cleavage site but it does sound that way. I believe that the question of whether the sequence was put in naturally or by molecular manipulation is very hard to determine but I wouldn’t rule out either origin.’ He also told Caltech Weekly that ‘the fact that evolution might have been able to generate SARS-CoV-2 doesn’t mean that that’s how it came about. I think we very much need to find out what was happening in the Wuhan Institute of Virology.’

‘The SARS-CoV-2 furin cleavage site is yet again in the news – this time because of a quote by Nobel laureate David Baltimore,’ tweeted Dr Kristian Andersen from the Scripps Research Institute in La Jolla, California, on 9 May 2021. Dr Andersen had been the lead author on a highly influential article, ‘The Proximal Origin of SARS-CoV-2’, published in Nature Medicine in March 2020, which had concluded that: ‘Our analyses clearly show that SARS-CoV-2 is not a laboratory construct or a purposefully manipulated virus.’ Both at the time and ever since, the ‘Proximal Origin’ paper has been cited by many other scientists to rule out all laboratory-based hypotheses. Indeed Dr Andersen had been quoted in the Scripps press release as saying that two features of the virus ‘rule out laboratory manipulation as a potential origin for SARS-CoV-2’. It therefore came as a surprise in early June 2021 when FOI’ed emails were published showing Dr Andersen telling Dr Anthony Fauci, the head of NIAID, on 31 January 2020 that ‘the unusual features of the virus make up a really small part of the genome (<0.1%) so one has to look really closely at all the sequences to see that some of the features (potentially) look engineered’ and that his discussions with other experts found the genome to be ‘inconsistent with expectations from evolutionary theory’. Dr Andersen explained later that it was a normal part of the scientific process to begin with doubts and then assuage them. As more related virus genomes emerged, his team judged that SARS-CoV-2 had evolved naturally. We will return to this developing story on the early 2020 conversation between Dr Andersen and other prominent scientists later in the book.

The basis on which Dr Andersen and his colleagues publicly rejected a laboratory origin was that any genetic engineer would have, first, designed a different-looking spike receptor-binding domain, based on existing data, and, second, used a known genetic backbone. Mr Wade thought both arguments were weak yet ‘his conclusion, grounded in nothing but two inconclusive speculations, convinced the world’s press that SARS2 could not have escaped from a lab’. In response, Dr Andersen pointed out Mr Wade’s ‘troubled history of misrepresenting (and/or misunderstanding) the very basics of evolutionary biology’, a reference to Wade’s contentious 2014 book A Troubling Inheritance: Genes, Race and Human History, which was criticised by numerous scientists who co-signed a letter arguing that ‘there is no support from the field of population genetics for Wade’s conjectures’. Wade was also co-author of Betrayers of the Truth: Fraud and Deceit in the Halls of Science, a 1982 book about scandals in science and their cover-ups. One of the whistleblowers, whom the book described as exposing a scandal in immunology involving his supervisor, was Dr Steven Quay, who also authored a lengthy essay in 2021 arguing that SARS-CoV-2 originated in a laboratory. This is a reminder that until the middle of 2021, the story of the search for the origin of Covid-19 predominantly relied on outsiders and scientists or journalists who were willing to risk their reputation and challenge the scientific consensus.

What are these arginine codons that made Dr Baltimore think that the furin cleavage site is a smoking gun for genetic manipulation? In the universal genetic code for all organisms – the dictionary that translates DNA or RNA language into protein language – there are six different words encoding the amino acid ‘arginine’ (R): AGA, AGG, CGG, CGC, CGA and CGT (the last being CGU in RNA). These three-letter words are called codons. Within the genome of SARS-CoV-2, the commonest codon for arginine is AGA and the rarest is CGG. Yet the furin cleavage site has two of the latter in tandem: CGG-CGG. In contrast to viruses, human and animal cells use CGG rather more frequently. In 2004, a mainly Boston-based team of scientists with collaborators in China altered the spike gene of the 2003 SARS virus to use codons preferred by human cells – they call this process ‘codon optimisation’ – and found that the codon-optimised spike was much more abundantly produced by the human host cell. So it is known that optimising the codons in a SARS-like virus may improve its ability to infect specific host cells. What was the furin cleavage site in SARS-CoV-2 doing with not just one but two codons that are rare in coronaviruses but preferred in human or animal cells? Was the tandem CGG-CGG a sign that the furin cleavage site had been genetically engineered into SARS-CoV-2 in a similar manner to the other experiments where they had been introduced into other viruses? One of the earliest to raise this question was the Russian-Canadian biotech entrepreneur Yuri Deigin, who, like Dr Baltimore, saw this rare appearance of a CGG doublet as a clue that SARS-CoV-2 may have had its furin cleavage site artificially inserted in the lab.
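The kind of counting behind this argument can be sketched in Python. The example below tallies the six arginine codons in a sequence and reports where two CGG codons sit side by side in the same reading frame. The toy sequence is invented for illustration: its first five codons are the ones quoted in this chapter (cct-cgg-cgg-gca followed by CGT) and the two trailing AGA codons are our own padding. It is not a stretch of any real genome, and the function names are hypothetical.

```python
from collections import Counter

# The six arginine codons, in the DNA alphabet.
ARGININE_CODONS = ("AGA", "AGG", "CGG", "CGC", "CGA", "CGT")


def split_codons(dna):
    """Split a sequence into in-frame codons, ignoring any incomplete trailing codon."""
    letters = dna.replace("-", "").replace(" ", "").upper()
    usable = len(letters) - len(letters) % 3
    return [letters[i:i + 3] for i in range(0, usable, 3)]


def arginine_codon_counts(dna):
    """Count how often each of the six arginine codons appears in frame."""
    counts = Counter(c for c in split_codons(dna) if c in ARGININE_CODONS)
    return {codon: counts[codon] for codon in ARGININE_CODONS}


def tandem_cgg_positions(dna):
    """Return the codon indices at which CGG is immediately followed by CGG."""
    codons = split_codons(dna)
    return [i for i in range(len(codons) - 1)
            if codons[i] == "CGG" and codons[i + 1] == "CGG"]


# Invented toy sequence: codons from the text plus two AGA codons as padding.
toy = "CCT-CGG-CGG-GCA-CGT-AGA-AGA"
print(arginine_codon_counts(toy))
print(tandem_cgg_positions(toy))  # [1]
```

Run over a whole genome, a tally like the first function gives the codon frequencies that both sides of the argument quote, while the second finds the doublets. Neither says anything about how a doublet got there.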

Yuri Deigin is a founding member of Drastic, under its original name of ‘Daszak’s fan club’ – because they had all been blocked on Twitter by Dr Daszak. (The name was changed when Billy Bostickson joined.) Mr Deigin got his bachelor’s degree in computer science and mathematics from the University of Toronto, followed by an MBA from Columbia Business School in 2010. After overseeing the R&D and clinical trials at a biotech start-up for six years, Mr Deigin founded Youthereum Genetics in 2017 to explore the use of gene therapy to reprogramme cells to fight the ageing process. The field of ageing science, with its many snake-oil theories, gave Mr Deigin an acute ability to detect when scientists are making weak arguments, he said. On a Russian Facebook debate club, in March 2020 he began to examine the arguments in Dr Andersen’s ‘Proximal Origin’ paper. Initially, Mr Deigin was inclined to support a natural origin and assumed that the WIV was in Wuhan because it was where bat coronaviruses were found, but the more he looked into the issue, the more his doubts grew and the more he realised that the lab origin hypothesis was not a conspiracy theory. In particular, he concluded that the points put forward by Dr Kristian Andersen reminded him of the story of the emperor who had no clothes. To make sure you really understand something, give a talk or write about it, Mr Deigin thought, so he wrote what was to become an influential essay first in Russian and then in English on the Medium website. This essay was one of the catalysts for the formation of Drastic and signalled the start of a continuous guerrilla campaign on social media about furin cleavage sites and other aspects of the virus genome. Dr Rossana Segreto then approached Mr Deigin to collaborate on a scientific paper analysing the Andersen et al. ‘Proximal Origin’ article and setting out the argument for the laboratory-leak hypothesis, which was ultimately published in November 2020 under the title ‘The Genetic Structure of SARS-CoV-2 Does Not Rule Out a Laboratory Origin’. Later Mr Deigin continued to debate with people on Facebook and Twitter, saying that ‘Nothing moves science forward like the right scientific opponent.’ He has recently collaborated on a project to search all sarbecoviruses for CGG-CGG doublets and found none, except for the one in the furin cleavage site of SARS-CoV-2.

Against this, the frequency at which CGG appears on its own in SARS-CoV-2 overall is comparable to that of other closely related coronaviruses. CGG is used at a 3 per cent frequency in SARS-CoV-2 compared with 5 per cent in the 2003 SARS virus. In Dr Andersen’s words, ‘Nothing unusual here’. And the probability of a CGG doublet appearing depends on what sequence it had mutated from – was a single letter mutation required or more? Dr Andersen highlighted a feline coronavirus where a furin cleavage site shared the same ‘PRRAR’ motif and the first two Rs were encoded by CGG CGA, just one RNA letter different. Dr Andersen’s conclusion was firm. On 9 May 2021, he tweeted: ‘Baltimore’s quote *is* shocking – however, not because it’s true, but because it’s wrong. There’s nothing mysterious about the FCS or the codons – anybody who’d care to take a close look at the data would realize this. That, admittedly, requires a little more than his “first” look.’

Our view is more agnostic. Yes, it is true that virologists altering the sequences of viruses to make them more compatible with human cells are more likely to use CGG codons for arginine than nature does, which makes the CGG-CGG doublet in the furin cleavage site of SARS-CoV-2 suspicious at least. But the argument is suggestive, rather than conclusive, and nature is clearly capable of using these codons. We think – as Alina said on Twitter in May 2020 – ‘there is zero evidence that confirms that the SARS-CoV-2 S1/S2 PRRA(R) FCS arose naturally or artificially, but neither scenario can be ruled out.’ Dr Nunberg likewise was quoted in June 2020 as saying, ‘There is no way to know whether humans or nature inserted the site.’

Natural insertions

In their March 2020 ‘Proximal Origin’ paper, Dr Andersen’s group had confidently predicted that a bat virus with a similar furin cleavage site to SARS-CoV-2’s would soon turn up: ‘Given the level of genetic variation in the spike, it is likely that SARS-CoV-2-like viruses with partial or full polybasic cleavage sites will be discovered in other species.’ Shortly afterwards, a bat virus was found and given a lot of publicity because it appeared to have a natural insertion at the S1/S2 boundary of its spike, although not one that acted as a furin cleavage site. It was a sarbecovirus, and some parts of its genome were very closely related to their counterparts in SARS-CoV-2.

Its name is RmYN02. Rm stands for Rhinolophus malayanus, the Malayan horseshoe bat; YN stands for Yunnan. The story behind RmYN02 is that between May and October 2019, scientists from the Key Laboratory of Etiology and Epidemiology of Emerging Infectious Diseases in Shandong and the Center for Integrative Conservation at the Xishuangbanna Tropical Botanical Garden caught 227 bats of twenty species in Mengla County in the far south of Yunnan, very near the border with Laos. This is yet another team of scientists that was catching bats in southern China in search of viruses. This far south in China, as well as two widespread species of horseshoe bat, R. sinicus and R. pearsonii, they found horseshoe bat species that are more typical of Indochina and not found further north, including R. malayanus. The bats were sampled for viruses and the samples sequenced and analysed. Different tissue and faecal samples were merged into pools for sequencing, with each pool containing as many as eleven samples. Although the approach was cost effective, this meant that when the team found signs of two SARS-like viruses, which they dubbed RmYN01 and RmYN02, they then had to figure out which of the eleven samples in the pool contained them. Only one of the samples, number 123, collected on 25 June 2019, tested positive for both sequences. From this sample, the scientists were able to verify only parts of the RmYN02 genomic sequence, and could not confirm that the sequenced parts necessarily came from the same virus. RmYN02 looks very similar to SARS-CoV-2 over most of its genome but its spike gene is very different – only a 72 per cent genetic match. Could the spike gene or part of it have come from some other virus lurking in the same sample?

It is in this distantly related spike of RmYN02 that another sequence, ‘PAA’, is found, which the Shandong scientists called a natural insertion similar to that of ‘PRRA’ in SARS-CoV-2. For this reason, they called the similarity ‘strongly suggestive of a natural zoonotic origin of SARS-CoV-2’. However, upon a closer look, this S1/S2 segment of RmYN02’s genome is shorter even compared with most of the close relatives to SARS-CoV-2, let alone SARS-CoV-2 with its four amino acid insertion. As Mr Deigin and Dr Segreto argued in a December 2020 preprint (later published in BioEssays in May 2021), ‘to support the claimed PAA insertion not only a 9-nucleotide insertion, but also a 15-nucleotide deletion must have occurred . . . the claimed PAA insertion is more likely to be the result of mutations’. In other words, RmYN02’s spike is so different from that of SARS-CoV-2 and other closely related viruses that it is difficult to be certain that the ‘PAA’ even constitutes a natural insertion in RmYN02.

Note how ambiguous genetic code can be at this level. Text written in three-letter words with no gaps between words gets garbled if a single letter is removed, and you have to guess how to read it. We are like secret agents trying to decipher coded messages over a faint radio from a spy working at a missile base. The spy has sent a message reading ‘Saw the cat and red fox’ but we only received ‘Saw tec ata nre dfo x’. If we are clever, we spot that an ‘h’ and a ‘d’ are missing. But we might come up with a different interpretation ‘Saw ten cat and red fox’. ‘Cat’ might be code for ‘missile’, say, and we may be badly misled as to how many missiles the spy has seen on the base. Thus we cannot know for sure if RmYN02 has deletions, insertions or both. As for the significance of the text itself, true, ‘PAA’ looks a bit like ‘PRRA’ but it does not have any of the Rs that are critical to the creation of a furin cleavage site.
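The spy’s garbled message can be reproduced with a short Python sketch, offered only as an illustration of the reading-frame problem. The two helper functions are our own hypothetical names: one re-reads a run of letters in groups of three, the other drops letters at chosen positions, here the ‘h’ of ‘the’ and the ‘d’ of ‘and’.

```python
def read_in_threes(message):
    """Strip the spaces and re-read the letters in groups of three."""
    letters = message.replace(" ", "")
    words = [letters[i:i + 3] for i in range(0, len(letters), 3)]
    return " ".join(words)


def drop_letters(message, positions):
    """Remove the letters at the given positions (counted from 0, spaces ignored)."""
    letters = message.replace(" ", "")
    return "".join(ch for i, ch in enumerate(letters) if i not in positions)


sent = "Saw the cat and red fox"
print(read_in_threes(sent))  # Saw the cat and red fox

# Lose the 'h' (position 4) and the 'd' of 'and' (position 11) in transmission.
received = drop_letters(sent, {4, 11})
print(read_in_threes(received))  # Saw tec ata nre dfo x
```

Every word after the first missing letter is scrambled, and nothing in the received text records how many letters were lost or where. That is the predicament of anyone trying to decide whether a stretch of genome shows an insertion, a deletion or both.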

An alignment of the S1/S2 region of the spike gene of SARS-CoV-2 as compared to other closely related sarbecoviruses listed in descending order of approximate similarity: protein amino acid sequence (top) and RNA nucleotide sequence (bottom). The PRRA (cct-cgg-cgg-gca) insertion is shown in bold. The furin cleavage site is underlined. Letters that deviate from the SARS-CoV-2 sequence are in the lightest shade of grey.

Alina Chan

As the pandemic continued, scientists from different countries went back to their freezers and dug out samples from horseshoe bats to test them for coronaviruses. In December 2020, scientists from the University of Tokyo published the genome of a SARS-CoV-2-like virus, Rco319, from a horseshoe bat of the species Rhinolophus cornutus, captured in 2013 in a cave in Iwate Prefecture in Japan. Its genome was 81 per cent similar to that of SARS-CoV-2. It had no S1/S2 furin cleavage site insertion.

In early 2021, French and Cambodian scientists found two nearly identical viruses, RshTT200 and RshTT182, in two Rhinolophus shameli horseshoe bats. The samples had been collected in a cave in Stung Treng province in Cambodia in 2010, and stored at the Institut Pasteur du Cambodge until being tested after the pandemic began. These were 93 per cent similar to SARS-CoV-2, so not as close as RaTG13, although in some short sections they were closer. Their spike proteins had receptor-binding domains that were slightly less similar to SARS-CoV-2 than RaTG13’s. And they also had no S1/S2 furin cleavage site insertion.

The next new virus to be reported was in Thailand. In June 2020, a team of scientists visited a colony of three hundred horseshoe bats that were living in a large irrigation pipe in a wildlife sanctuary in Chachoengsao province in eastern Thailand. They captured a hundred of the bats and identified them as belonging to the species Rhinolophus acuminatus, one not reported to be found in China. Viruses were detected in thirteen of the bats’ rectal swabs, yielding a single sequence that resembled RaTG13. Full genome sequencing revealed that this virus, which the scientists named RacCS203, was most similar to RmYN02 at the S1/S2 boundary; it had the letters PVA where RmYN02 had PAA. Still no furin cleavage site insertion. Furthermore, the spike of RacCS203 could not use human ACE2 as an entry receptor to infect cells.

These studies showed that SARS-CoV-2-like viruses are widespread in horseshoe bats in Asia. At the time of writing, seven bat species have been found to harbour them: R. affinis, malayanus and pusillus in Yunnan, sinicus in Yunnan and Zhejiang, cornutus in Japan, shameli in Cambodia and acuminatus in Thailand. No doubt more will be found and in more locations. The same studies also hinted at the mosaic nature of their genomes, showing evidence of recombination. But the fact that none of them was as genetically close to SARS-CoV-2 as RaTG13, which had been found inside a Mojiang mineshaft by the WIV and brought to Wuhan in 2013, hardly helped the cause of those arguing for a natural origin of the virus. Rather, each discovery of a slightly less similar virus than the one from Mojiang strengthened the case for taking the laboratory-leak hypothesis seriously. As for the furin cleavage site, if dozens or even hundreds more bat SARS-like viruses are collected and still none has a furin cleavage site insertion, the perception that SARS-CoV-2 originated in a laboratory will only grow.

It is worth remembering that the furin cleavage site debate is all about whether the virus was manipulated once in a laboratory; it may or may not clarify whether a natural, non-engineered virus was in a laboratory or had infected researchers during fieldwork. Sure, if the furin cleavage site proves to have been inserted artificially, it confirms that the virus was in a laboratory and was altered. But if, on the other hand, the furin cleavage site proves to be natural, it still says nothing about where the virus came from. A natural bat virus with a natural furin cleavage site could still have leaked from a lab. Indeed, its possession of such an aid to infectivity might be precisely what made it challenging to contain in a BSL-2 or BSL-3 laboratory. The same logic applies to a natural infection: a virus that possessed an advantageous furin cleavage site rather than one less well-endowed would be more likely to spark a natural outbreak once it had spilled over into humans. Almost by definition, a virus that starts a pandemic must have been especially infectious. So finding a natural origin of the furin cleavage site will not clear up the question of whether the virus first jumped into people in the wild or in the course of research activities.

The dog that didn’t bark in the night

The first published paper to analyse and discuss the newly discovered genome sequence of SARS-CoV-2 in great detail was the one authored by Drs Shi Zhengli, Peng Zhou, Ben Hu and twenty-six other colleagues in Wuhan. Several of the authors were experienced virologists who specialised in coronaviruses. Their paper was posted as a preprint on 23 January 2020 and published in Nature on 3 February. In their discussion of the novel coronavirus genome, they zeroed in on the spike, noting that ‘the major differences in the sequence of the [spike] gene of 2019-nCoV are the three short insertions in the N-terminal domain as well as changes in four out of five of the key residues in the receptor-binding motif compared with the sequence of SARS-CoV’. They paid careful attention to other features and insertions in the spike gene sequence but did not even once mention the feature that stands out like a sore thumb: the never-seen-in-a-sarbecovirus-before furin cleavage site insertion. We were not the only scientists to be surprised by this omission. Even without experiments to prove it, this unique feature would have been predicted to affect the infectiousness of the virus and its ability to hijack different host species and cell types. Indeed, a figure in the paper’s ‘extended data’ shows the amino-acid sequence of the S1 section of the spike alongside those of six other viruses, but it stops at position 675, just short of the furin cleavage site (positions 681–685).

It is one of the strangest omissions in a scientific paper. The sarbecovirus specialists were clearly paying extremely close attention to this part of the genome, but the most remarkable feature of all escaped their attention. Not one of them appears to have said, when reading a draft, are you sure you are not leaving out the most interesting bit? It is as if you discover a unicorn and you compare it with other horses, describing in detail the hair and the hooves, but you don’t mention the horn. This was also the paper that first mentioned RaTG13 by its new name and did not connect it to the 4991 SARS-like virus sequence published in 2016 or to the mysterious pneumonia cases in 2012 that had spurred Chinese research teams to scour the Mojiang mine for viruses. If this were the plot of a novel, the reader would think something was up.

A week after the preprint had been posted, another paper was published online, on 31 January 2020, in the journal Emerging Microbes & Infections. This time it had just three authors, of whom only one was from the WIV: Dr Shi. In this paper, the authors noted that the genome of the novel coronavirus was 89 per cent similar to the two Zhoushan bat viruses (ZC45 and ZXC21) but did not mention RaTG13. The authors included a diagram showing the location of the S1/S2 junction in the spike. Yet once again they made no mention of the novel furin cleavage site insertion that is absent from ZC45 and ZXC21, and all other sarbecoviruses, yet present in SARS-CoV-2, or 2019-nCoV as it was then known. These three authors – Drs Shibo Jiang, Lanying Du and Shi Zhengli – had been part of the 2015 MERS study described earlier in this chapter, in which an S1/S2 furin cleavage site had been introduced and found to enhance the capabilities of a MERS-like coronavirus spike. Despite saying then that such cleavage sites ‘played critical roles in the bat-to-human transmission of MERS-CoV, either directly or through intermediate hosts’, in 2020 they did not even mention the uniqueness of the furin cleavage site insertion in SARS-CoV-2. The only thing they said was: ‘By aligning 2019-nCoV S protein sequence with those of SARS-CoV and several bat-SL-CoVs, we predicted that the cleavage site for generating S1 and S2 subunits is located at R694/S695.’ For context, the 2003 SARS virus also has a cleavage site at this location. What is novel about SARS-CoV-2 is the insertion directly upstream of the R, forming a PRRA(R) furin cleavage site. Even having inspected this very site in the spike, the three scientists said not a word about how the furin cleavage site insertion was missing from the 2003 SARS and bat SARS-like viruses but present in SARS-CoV-2 – a fact that, in our view, should have rung alarm bells about the virulence and infectivity of the new virus. These two papers by Dr Shi were submitted to the journals on the day that the Chinese authorities conceded to the world that human-to-human transmission was happening.

Thus it was left to another Chinese team and a team of French and Canadian scientists, as mentioned at the beginning of this chapter, to publicly point out the obvious in January and February 2020 respectively.

How could the Wuhan scientists have missed this out-of-place feature, which was not only obvious but, as both other papers’ comments made clear, ominous? Could these experienced coronavirus researchers really have missed such a critical discovery in their careful characterisation of the virus’s genome? Even after recently publishing work on introducing an S1/S2 furin cleavage site in the spike of a MERS-like virus? Or did they see the furin cleavage site but decide against drawing attention to it? These are questions we would love to put to Dr Shi and her colleagues if they would respond to emails.

Some scientists have pointed out that two other early papers by Chinese virologists describing SARS-CoV-2’s genome had also missed the furin cleavage site insertion. However, these two other papers had not been looking at the S1/S2 region. One was the paper that had been submitted to Nature on 7 January 2020 by Dr Zhang Yongzhen’s group, describing the first SARS-CoV-2 genome to be made public. Remember that his team had only obtained the sequence two days before submitting their manuscript to the journal, and Dr Edward Holmes, who was an author on the paper, only had the genome for about an hour before posting it online on 11 January. Their paper had zeroed right in on the spike receptor-binding domain, which continues to be the region of greatest interest in the SARS-CoV-2 genome – many of the variants of concern are defined by mutations in this area. The second paper appeared to perform a cursory analysis of the spike, pointing out that the novel coronavirus spike had ‘only a few minor insertions or deletions’ and was ‘longer’ compared to closely related virus spikes.

What we do know is that, for more than a decade, scientists had been doing experiments deliberately designed to insert or delete furin cleavage sites with a view to seeing whether they made viruses more or less able to invade various types of host cells under different conditions. We do not know if the WIV or another laboratory in Wuhan was engaging in similar work using bat coronaviruses, but the omission of the furin cleavage site in two keystone Covid-19 papers of the WIV is curious to say the least.

In our view, it is this fact, more than anything else, that lends this funny little genetic insertion its walk-on part in the Covid-19 play. Given that Dr Shi’s group had made chimeric SARS viruses and Dr Shi and co-authors had recently collaborated on a project studying parallel sites in MERS-like viruses, their silence on the unique furin cleavage site with critical implications when they published the genome of the new sarbecovirus is the dog that did not bark in the night-time.