‘Creativity in biology is not that different from creativity in mathematics.’
– Gregory Chaitin1
The story of life is really two narratives tightly interwoven. One concerns complex chemistry, a rich and elaborate network of reactions. The other is about information, not merely passively stored in genes but coursing through organisms and permeating biological matter to bestow a unique form of order. Life is thus an amalgam of two restlessly shifting patterns, chemical and informational. These patterns are not independent but are coupled together to form a system of cooperation and coordination that shuffles bits of information in a finely choreographed ballet. Biological information is more than a soup of bits suffusing the material contents of cells and animating them; that would amount to little more than vitalism. Rather, the patterns of information control and organize the chemical activity in the same manner that a program controls the operation of a computer. Thus, buried inside the ferment of complex chemistry is a web of logical operations. Biological information is the software of life, which suggests that life’s astonishing capabilities can be traced right back to the very foundations of logic and computation.
A pivotal event in the history of computation was a lecture delivered by the distinguished German mathematician David Hilbert in 1928 to an international congress in Bologna, Italy. Hilbert used the occasion to outline his favourite unanswered mathematical problems. The most profound of these concerned the internal consistency of the subject itself. At root, mathematics is nothing but an elaborate set of definitions, axiomsfn1 and the logical deductions flowing from them. We take it for granted that it works. But can we be absolutely rock-solidly sure that all pathways of reasoning proceeding from this austere foundation will never result in a contradiction? Or simply fail to produce an answer? You might be wondering, Who cares? Why does it matter whether mathematics is consistent or not, so long as it works for practical purposes? Such was the mood in 1928, when the problem was of interest to only a handful of logicians and pure mathematicians. But all that was soon to change in the most dramatic manner.
The issue as Hilbert saw it was that, if mathematics could be proved consistent in a watertight manner, then it would be possible to test any given mathematical statement as either true or false by a purely mindless handle-turning procedure, or algorithm. You wouldn’t need to understand any mathematics to implement the algorithm; it could be carried out by an army of uneducated employees (paid calculators) or a machine, cranking away for as long as it took. Is such an infallible calculating machine possible? Hilbert didn’t know, and he dignified the conundrum with the title Entscheidungsproblem (in English, ‘the decision problem’). Bound up with it is what became known as the halting problem: the question of whether some computations might simply go on for ever and never halt. The hypothetical machine might grind away for all eternity with no answer forthcoming. Hilbert was not interested in the practical matter of how long it might take to get an answer, only whether the machine would reach the end of the procedure in a finite time and output one of two answers: true or false. It may seem reasonable to expect the answer always to be yes. What could possibly go wrong?
Hilbert’s lecture was published in 1929, the same year that Szilárd’s demon paper appeared. These two very different thought experiments – a calculating engine that may not halt and a thermodynamic engine that may generate perpetual motion – turn out to be intimately connected. At the time, however, neither man was aware of that. Nor did they have an inkling that, deep inside biology’s magic puzzle box, concealed by layer upon layer of bewildering complexity, it was the incessant drumbeat of mathematics that bestowed the kiss of life.
Mathematics often springs surprises, and at the time of Hilbert’s lecture trouble was already brewing in the logical foundations of the subject.fn2 There had been earlier attempts to prove the consistency of mathematics, but in 1901 they were startlingly derailed by the philosopher Bertrand Russell, who identified a famous paradox that lurks inside all formal systems of reasoning. The essence of Russell’s paradox is easily described. Consider the following statement, labelled A:
A: This statement is false.
Suppose we now ask: is A true or false? If A is true, then the statement itself declares A to be false. But if A is false, then it is true. By referring to itself in a contradictory way, A seems to be both true and false, or neither. We might say it is undecidable. Because mathematics is founded on logic, after Russell the entire basis of the discipline began to look shaky. Russell’s paradoxes of self-reference set a time bomb ticking that was to have the most far-reaching consequences for the modern world.
It took the work of an eccentric and reclusive Austrian logician named Kurt Gödel to render the full import of self-referential paradoxes evident. In 1931 he published a paper demonstrating that no consistent system of axioms exists that can prove all true statements of arithmetic. His proof hinged on the corrosive existence of self-referential relationships, which imply that there will always be true arithmetic statements that cannot ever be proved true within that system of axioms. More generally, it followed that no finite system of axioms can be used to prove its own consistency; for example, the rules of arithmetic cannot themselves be used to prove that arithmetic will always yield consistent results.
Gödel shattered the ancient dream that cast-iron logical reasoning would always produce irrefutable truth. His result is arguably the highest product of the human intellect. All other discoveries about the world of physical things or the world of reason tell us something we didn’t know before. Gödel’s theorem tells us that the world of mathematics embeds inexhaustible novelty; even an unbounded intellect, a god, can never know everything. It is the ultimate statement of open-endedness.
Constructed as it was in the rarefied realm of formal logic, Gödel’s theorem had no apparent link with the physical world, let alone the biological world. But only five years later the Cambridge mathematician Alan Turing established a connection between Gödel’s result and Hilbert’s halting problem, which he published in a paper entitled ‘On computable numbers, with an application to the Entscheidungsproblem’.2 It proved to be the start of something momentous.
Turing is best known for his role in cracking the German Enigma code in the Second World War, working in secret at Bletchley Park in the south of England. His efforts saved countless Allied lives and shortened the war by many months, if not years. But history will judge his 1936 paper to be more significant than his wartime work. To address Hilbert’s problem of mindless computation Turing envisaged a calculating machine rather like a typewriter, with a head that could scan a moving tape and write on it. The tape would be of unlimited length and divided into squares on which symbols (e.g. 1, 0) could be printed. As the tape passed through the machine horizontally and each square reached the head, the machine would either erase or write a symbol on it or leave it alone, and then advance the tape either left or right by one square, and repeat the process, over and over, until the machine halted and delivered the answer. Turing proved that a number was computable if and only if it could be the output of such a machine after a finite (but possibly huge) number of steps. The key idea here was that of a universal computer: ‘a single machine which can be used to compute any computable sequence’.3 Here in this simple statement is the genesis of the modern computer, a device we now take for granted.fn3
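To make the idea concrete, here is a minimal sketch in Python of a machine of this kind; the rule format and names are my own invention for illustration, not Turing’s original notation. A lookup table tells the head what to write, which way to move and which internal state to adopt next, and the loop simply repeats until a designated halting state is reached.

```python
def run_turing_machine(rules, tape, state="start", head=0, max_steps=10_000):
    """Run a toy Turing machine. `rules` maps (state, symbol) to
    (symbol_to_write, move, next_state); the machine stops in state 'halt'."""
    cells = dict(enumerate(tape))                  # the unbounded tape, blank = ' '
    for _ in range(max_steps):
        if state == "halt":
            return [cells[i] for i in sorted(cells)]
        symbol = cells.get(head, " ")
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("no halt within the step limit; perhaps there never will be")

# A trivial machine that flips every bit it reads until it meets a blank square
flip = {
    ("start", "1"): ("0", "R", "start"),
    ("start", "0"): ("1", "R", "start"),
    ("start", " "): (" ", "R", "halt"),
}
print(run_turing_machine(flip, "1011 "))           # prints ['0', '1', '0', '0', ' ']
```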
From the pure mathematical point of view, the import of Turing’s paper is a proof that there isn’t, and can never be, an algorithm to solve the Entscheidungsproblem, because there is no general method for solving the halting problem. In plain English, there can be no procedure that determines in advance, for an arbitrary mathematical statement, whether or not Turing’s machine will halt and output an answer of true or of false. As a result, there will always be mathematical propositions that are quite simply undecidable. One may certainly take a particular proposition (e.g. eleven is a prime number) and prove it to be true or false, but no single mindless procedure can settle every statement that might be put to it.
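The flavour of Turing’s argument can be captured in a few lines of hypothetical code. Suppose, for the sake of contradiction, that someone supplied a function halts(program, data) that always answered correctly; this function is an assumption, not anything that exists. The self-referential construction below would then defeat it, which is essentially why no such general-purpose test can exist.

```python
def halts(program, data):
    """Hypothetical oracle, assumed only for the sake of contradiction:
    pretend it returns True exactly when program(data) would eventually halt."""
    raise NotImplementedError("Turing showed no such general oracle can exist")

def contrary(program):
    """Do the opposite of whatever the oracle predicts about program(program)."""
    if halts(program, program):
        while True:      # predicted to halt, so loop for ever instead
            pass
    return "done"        # predicted to run for ever, so halt at once

# Now ask about contrary(contrary): if halts() says True, it loops for ever;
# if halts() says False, it halts immediately. Either answer is wrong, so the
# assumed oracle cannot exist.
```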
Though the ramifications of Turing’s imaginary computing machine proved stunning for mathematicians, it was the practical application that soon assumed urgency. With the outbreak of war, Turing was tasked with turning his abstract ideas about mechanical computation into working code-breaking machinery. The world’s first programmable electronic computer, christened Colossus, was designed and built by the engineer Tommy Flowers at the Post Office Research Station at Dollis Hill in London and installed at the top-secret code-breaking establishment in Bletchley Park. Colossus became fully operational in 1943, a decade before IBM built its first commercial machine. The sole purpose of Colossus was to assist in the British code-breaking effort and so it was built and operated under an exceptionally tight security blanket. For political reasons, the culture of secrecy surrounding Bletchley Park persisted well after the end of the war and is part of the reason why Flowers and Turing often do not receive credit for being the first architects of the computer. It also allowed the initiative for the commercialization of computers to pass to the United States, where wartime work in this area was rapidly declassified.
Although it was primarily directed at mathematicians, Turing’s work was to have deep implications for biology. The particular logical architecture embodied in living organisms mirrors the axioms of logic itself. Life’s defining property of self-replication springs directly from the paradox-strewn domain of propositional calculus and self-reference that underpins the concept of computation, in turn opening the way to now-familiar properties like simulation and virtual reality. Life’s ability to construct an internal representation of the world and itself – to act as an agent, manipulate its environment and harness energy – reflects its foundation in the rules of logic. It is also the logic of life that permits biology to explore a boundless universe of novelty, to create ‘forms most wonderful’, to use Darwin’s memorable phrase.
Given that undecidability is enshrined in the very foundations of mathematics, it will also be a fundamental property of a universe based on mathematical laws. Undecidability guarantees that the mathematical universe will always be unbounded in its creative potential. One of the hallmarks of life is its limitless exuberance: its open-ended variety and complexity. If life represents something truly fundamental and extraordinary, then this quality of unconstrained possibility is surely key. Many of the great scientists of the twentieth century spotted the connection between Turing’s ideas and biology. What was needed to cement the link with biology was the transformation of a purely computational process into a physical construction process.
Across the other side of the Atlantic from Alan Turing, the Hungarian émigré John von Neumann was similarly preoccupied with designing an electronic computer for military application, in his case in connection with the Manhattan Project (the atomic bomb). He used the same basic idea as Turing – a universal programmable machine that could compute anything that is computable. But von Neumann also had an interest in biology. That led him to propose the idea of a universal constructor (UC), full details of which had to await the posthumous publication of his book Theory of Self-reproducing Automata.4
The concept of a UC is easy to understand. Imagine a machine that can be programmed to build objects by selecting components from a pool of materials and assembling them into a functional product. Today we are very familiar with robot assembly lines doing just that, but von Neumann had in mind something more ambitious. Robotic systems are not UCs: a car assembly line can’t build a fridge, for example. To be a truly universal constructor, the UC has to be able to build anything that is in principle constructible, subject to a supply of components. And now here comes the twist that connects Gödel, Turing and biology. A UC also has to be able to build a copy of itself. Remember that it was precisely the paradox of self-reference that led Turing to the idea of a universal computer. The idea of a self-reproducing machine thus opens the same logical can of worms. Significantly, living organisms seem to be actual self-reproducing machines. We thus gain insight into the logical architecture of life by deliberating on the concepts of a universal computer (Turing machine) and a universal constructor (von Neumann machine).
An important point von Neumann stressed is that it is not enough for a UC simply to make a replica of itself. It also has to replicate the instructions for how to make a UC and insert those instructions into the freshly minted replica; otherwise, the UC’s progeny would be sterile. These days we think of robotic instructions as being invisibly programmed into a computer that drives the robot, but to better see the logic of self-reproducing machines it is helpful to think of the instructions imprinted on a punched tape of the sort that drives an old-fashioned pianola (and close to the concept of the tape in a Turing machine). Imagine that the UC has a punched tape fed into it telling it how to make this or that object and that the machine blindly carries out the instructions imprinted on the tape. Among the many possible tapes, each peppered with strategically located holes, there will be one with a pattern of holes that contains the instructions for building the UC itself. This tape will chug through the machine and the UC will build another UC. But, as stated, that’s not enough; the mother UC now has to make a copy of the instruction tape. For this purpose, the tape now has to be treated not as a set of instructions but as just another physical object to be copied. In modern parlance, the tape must undergo a change of status from being software (instructions) into being hardware – some material with a certain pattern of holes. Von Neumann envisaged what he called a supervisory unit to effect the switch, that is, to toggle between hardware and software as the circumstances demanded. In the final act of the drama, the blindly copied instruction tape is added to the newly made UC to complete the cycle. The crucial insight von Neumann had is that the information on the tape must be treated in two distinct ways. The first is as active instructions for the UC to build something. The second is as passive data simply to be copied as the tape is replicated.
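A toy software analogue, entirely my own and far simpler than von Neumann’s actual construction, may make the dual role of the tape vivid: below, the very same list is first interpreted as building instructions and then blindly copied into the offspring as data.

```python
def construct(tape):
    """Mode 1: treat the tape as software, i.e. instructions for building something."""
    return {"parts": ["assembled " + item for item in tape]}

def reproduce(tape):
    offspring = construct(tape)        # the tape read and obeyed as instructions
    offspring["tape"] = list(tape)     # the same tape blindly copied as raw data
    return offspring

# A tape that, in this toy world, describes the constructor's own components
description = ["reader", "copier", "assembler", "controller"]
parent = reproduce(description)
child = reproduce(parent["tape"])
assert child == parent                 # the cycle closes: like begets like
```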
Life as we know it reflects this dual role of information. DNA is both a physical object and an instruction set, depending on circumstances. When a cell is just getting on with life, and this or that protein is needed for some function, the instructions for building the relevant protein are read out from DNA and the protein is made by a ribosome. In this mode, DNA is acting as software. But when the time comes for the cell to replicate and divide, something quite different happens. Special enzymes come along and blindly copy the DNA (including any accumulated flaws) so there is one copy available for each cell after division takes place.fn4 So the logical organization of a living cell closely mirrors that of a von Neumann self-replicating machine. What is still a mystery, however, is the biological equivalent of the supervisory unit that determines when instructions need to switch to become passive data. There is no obvious component in a cell, no special organelle that serves as ‘the strategic planner’ to tell the cell how to regard DNA (as software or hardware) moment by moment. The decision to replicate depends on a large number of factors throughout the cell and its environment; it is not localized in one place. It provides an example of what is known as epigenetic control involving top-down causation,5 a topic I shall discuss in detail later.
Von Neumann recognized that replication in biology is very different from simple copying. After all, crystals grow by copying. What makes biological replication non-trivial is its ability to evolve. If the copying process is subject to errors, and the errors are also copied, then the replication process is evolvable. Heritable errors are, of course, the driver of Darwinian evolution. If a von Neumann machine is to serve as a model for biology, it must incorporate the two key properties of self-replication and evolvability.
The idea of von Neumann machines has penetrated the world of science fiction and spawned a certain amount of scaremongering. Imagine a mad scientist who succeeds in assembling such a device and releasing it into the environment. Given a supply of raw materials, it will just go on replicating and replicating, appropriating what it needs until the supply is exhausted. Dispatched into space, von Neumann machines could ravage the galaxy and beyond. Of course, living cells are really a type of von Neumann machine, and we know that a predator let loose can decimate an ecosystem if it spreads unchecked. Terrestrial biology, however, is full of checks and balances arising from the complex web of life, with its vast number of interdependent yet different types of organism, so the damage from unconstrained multiplication is limited. But a solitary replicating interstellar predator may be a different story altogether.
Although von Neumann didn’t attempt to build a physical self-reproducing machine, he did devise a clever mathematical model that captures the essential idea. It is known as a cellular automaton (CA), and it’s a popular tool for investigating the link between information and life. The best-known example of a CA is called, appropriately enough, the Game of Life, invented by the mathematician John Conway and played on a computer screen. I need to stress that the Game of Life is very far removed from real biology, and the word ‘cell’ in cellular automata is not intended to have any connection with living cells – that’s just an unfortunate terminological coincidence. (A prison cell is a closer analogy.) The reason for studying cellular automata is that, in spite of their tenuous link with biology, they capture something deep about the logic of life. Simple it might be, but the Game embeds some amazing and far-reaching properties. Small wonder then that it has something of a cult following; people like playing it, even setting it to music, mathematicians enjoy exploring its arcane properties, and biologists mine it for clues about what makes life tick at the most basic level of its organizational architecture.
This is how the Game works. Take an array of squares, like a chessboard or pixels on a computer screen. Each square may either be filled or not. The filled squares are referred to as ‘live’, the empty ones as ‘dead’. You start out with some pattern of live and dead squares – it can be anything you like. To make something happen there must be rules to change the pattern. Every square has eight neighbouring squares: in a simple CA, how a given square changes its state (live or dead) depends on the state of those eight neighbours. These are the rules Conway chose: a live square with two or three live neighbours stays alive; a live square with fewer than two or more than three live neighbours dies (of isolation or overcrowding, as it were); and a dead square with exactly three live neighbours comes alive. Every other square stays as it is.
The rules are applied simultaneously to every square in the array and the pattern (generally) changes – it is ‘updated’. The rules are applied repeatedly, each step being one ‘generation’, creating shifting patterns that can have a rather mesmeric effect. The real interest of the game, however, is less for art or amusement and more as a tool for studying complexity and information flow among the shapes. Sometimes the patterns on a computer screen seem to take on a life of their own, moving across the screen coherently, or colliding and creating new shapes from the debris. A popular example is called the glider, a cluster of five filled squares that crawls across the screen with a tumbling motion (see Fig. 8). It is surprising that such compelling complexity can arise from the repeated application of simple rules.fn5
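For readers who want to experiment, here is a minimal implementation of a single update step; it is only a sketch, and it wraps the board round at the edges for convenience, a liberty the ‘official’ unbounded Game does not take.

```python
import numpy as np

def life_step(grid):
    """One synchronous update of Conway's Game of Life on a 2-D array of 0s and 1s."""
    # Count the eight neighbours of every square (with wrap-around at the edges).
    neighbours = sum(np.roll(np.roll(grid, dx, axis=0), dy, axis=1)
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # Survival: a live square with 2 or 3 live neighbours; birth: a dead square with exactly 3.
    return ((neighbours == 3) | ((grid == 1) & (neighbours == 2))).astype(int)

# The five-square glider mentioned in the text
board = np.zeros((10, 10), dtype=int)
for row, col in [(1, 2), (2, 3), (3, 1), (3, 2), (3, 3)]:
    board[row, col] = 1
for _ in range(4):      # after four generations the glider has crawled one square diagonally
    board = life_step(board)
```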
Given a random starting pattern, several things can happen in the Game. The patterns may evolve and shift for a while but end up vanishing, leaving the screen blank. Or they may be hung up in static shapes, or cycle through the same shapes again and again every few generations. More interestingly, they may go on for ever, generating unlimited novelty – just like in real biology. How can one know in advance which starting patterns will generate unbounded variety? It turns out that, generally, you can’t know. The patterns aren’t arbitrary but obey higher-level rules of their own. It has been proved that the patterns themselves can implement basic logical operations in their behaviour. They are a computer inside a computer! The patterns can thus represent a Turing machine or a universal computer, albeit a slow one. Because of this property, Turing’s undecidability analysis can be directly applied to the Game of Life. Conclusion: one cannot in any systematic way decide in advance whether a given initial pattern settles down or runs on for ever.6
I still find it a bit eerie that patterns on a computer screen can become unshackled from their substrate and create a universe of their own, Matrix-like, while still being tied to the iron rule of logic in their every move. But such is the power of Gödelian undecidability: the strictures of logic are compatible with the creation of unpredictable novelty. However, the Game of Life does prompt some serious questions about cause and effect. Can we really treat the shapes on the screen as ‘things’ able to ‘cause’ events, such as the untidy detritus of collisions? The shapes are, after all, not physical objects but informational patterns; everything that happens to them can be explained at the lower level of the computer program. Yet the fundamental undecidability inherent in the system means that there is room for emergent order. Evidently, higher-level informational ‘rules of engagement’ can be formulated at the level of shapes. Something like this must be going on in life (and consciousness), where the causal narrative can be applied to informational patterns independently of the physical substrate.
Though it is tempting to think of the shapes in the Game of Life as ‘things’ with some sort of independent existence obeying certain rules, there remains a deep question: in what sense can it be said that the collision of two shapes ‘causes’ the appearance of another? Joseph Lizier and Mikhail Prokopenko at the University of Sydney tried to tease out the difference between mere correlation and physical causation by performing a careful analysis of cellular automata, including the Game of Life.7 They treated information flowing through a system as analogous to injecting dye into a river and searching for it downstream. Where the dye goes is ‘causally affected’ by what happens at the injection point. Or, to use a different image, if A has a causal effect on B, it means that (metaphorically speaking) wiggling A makes B wiggle too, a little later. But Lizier and Prokopenko also recognized the existence of what they term ‘predictive information transfer’, which occurs if simply knowing something about A helps you to know better what B might do next, even if there is no direct physical link between A and B.fn6 One might say that the behaviours of A and B are correlated via an information pattern that enjoys its own dynamic. The conclusion is that information patterns do form causal units and combine to create a world of emergent activity with its own narrative. Iconoclastic though this statement may seem, we make a similar assumption all the time in daily life. For example, it is well known that as people get older they tend to become more conservative in their tastes and opinions. While this is hardly a law of nature, it is a familiar feature of human nature, and we all regard ‘human nature’ as a thing or property with a real existence, even though we know that human thoughts and actions are ultimately driven by brains that obey the laws of physics.
There are many ways in which CAs can be generalized. For example, Conway’s rules are ‘local’ – they involve only nearest neighbours. But non-local rules, in which a square is updated by reference to, say, the neighbours both one and two squares away, are readily incorporated. So are asynchronous update rules, whereby different squares are updated at different steps. Another generalization is to permit squares to adopt more than two states, rather than simply being ‘live’ or ‘dead’. Von Neumann’s main motivation, remember, was to construct a CA that would have the property of both self-reproduction and evolvability. Conway’s Game of Life is provably evolvable, but can it also support self-reproduction? Yes, it can. On 18 May 2010 Andrew J. Wade, a Game of Life enthusiast, announced he had found a pattern, dubbed Gemini, that does indeed replicate itself after 34 million generations. On 23 November 2013 another Game of Life devotee, Dave Greene, announced the first replicator that creates a complete copy of itself, including the analogue of the crucial instruction tape, as von Neumann specified. These technical results may seem dry, but it is important to understand that the property of self-replication reflects an extremely special aspect of the Game’s logic. It would not be the case for an arbitrary set of automaton rules, however many steps were executed.
All of which brings me to an important and still-unanswered scientific question that flows from von Neumann’s work. What is the minimum level of complexity needed to attain the twin features of non-trivial replication and open-ended evolvability? If the complexity threshold is quite low, we might expect life to arise easily and be widespread in the cosmos. If it is very high, then life on Earth may be an exception, a freak product of a series of highly improbable events. Certainly the cellular automaton that von Neumann originally proposed was pretty complex, with each square being assigned one of twenty-nine possible states. The Game of Life is much simpler, but it requires major computational resources and still represents a daunting level of complexity. However, these are merely worked-out examples, and the field is still the subject of lively investigation. Nobody yet knows the minimal complexity needed for a CA computer model of a von Neumann machine, still less that for a physical UC made of molecules.
Recently, my colleagues Alyssa Adams and Sara Walker introduced a novel twist into the theory of cellular automata. Unlike the Game of Life, which plays out its drama across a two-dimensional array of cells, Adams and Walker used a one-dimensional row of cells. As before, cells may be filled or empty. You start with an arbitrary pattern of filled squares and evolve one step at a time using an update rule – an example is shown in Fig. 9. Time runs downwards in the figure: each horizontal line is the state of the CA at that time step, as derived from the row above by application of the rule. Successive applications generate the pattern. The mathematician Stephen Wolfram did an exhaustive study of one-dimensional CAs: there are 256 possible update rules that take into account the nearest-neighbour squares only. As with the Game of Life, some patterns are boring, for example they become hung up in one state or cycle repeatedly among the same few states. But Wolfram discovered that there are a handful of rules that generate far greater complexity. Fig. 10 shows one example, using Wolfram’s Rule 30 and a single filled square as an initial condition. Compare the regularity of Fig. 9 (which uses Rule 90) with the elaborate structure of Fig. 10 (which uses Rule 30).
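Here is a compact sketch of such a one-dimensional automaton, using Wolfram’s numbering scheme; setting rule=30 and starting from a single filled square generates the kind of intricate structure shown in Fig. 10, while rule=90 gives the regular pattern of Fig. 9. (The row is wrapped round at its ends, a small liberty taken for brevity.)

```python
def eca_step(row, rule=30):
    """One update of an elementary cellular automaton, Wolfram rule numbering."""
    table = [(rule >> value) & 1 for value in range(8)]   # rule number -> lookup table
    n = len(row)
    return [table[(row[(i - 1) % n] << 2) | (row[i] << 1) | row[(i + 1) % n]]
            for i in range(n)]

row = [0] * 63
row[31] = 1                        # a single filled square as the initial condition
for _ in range(20):
    print(''.join('#' if cell else '.' for cell in row))
    row = eca_step(row)
```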
Adams and Walker wanted a way to make the CA a more realistic representation of biology by including a changing environment, so they coupled two CAs together (computationally speaking): one CA represented the organism, another the environment. Then they introduced a fundamental departure from conventional CA models: they allowed the update rule for the ‘organism’ to change. To determine which of the 256 rules to apply at each step they bundled the ‘organism’ CA cells into adjacent triplets (that is, 000, 010, 110, and so on) and compared the relative frequencies of each triplet with the same patterns in the ‘environment’ CA. (If this seems convoluted and technical, don’t worry; the details don’t matter, just the general idea that introducing non-local rules can be a powerful way to generate novel forms of complexity.) So this arrangement changes the update rule as a function of both the state of the ‘organism’ itself – making it self-referential – and of the ‘environment’ – making it an open system. Adams and Walker ran thousands of case studies on a computer to look for interesting patterns. They wanted to identify evolutionary behaviour that is both open-ended (that is, didn’t quickly cycle back to the starting state) and innovative. Innovation in this context means that the observed sequence of states could never occur in any of the 256 possible fixed-rule CAs, even taking into account every possible starting state. It turned out such behaviour was rare, but there were some nice clear-cut examples. It took a lot of computing time, but they discovered enough to be convinced that, even in this simple model, state-dependent dynamics provide novel pathways to complexity and variety.8 Their work nicely illustrates that merely processing bits of information isn’t enough; to capture the full richness of biology, the information-processing rules themselves must evolve. I shall return to this important theme in the Epilogue.
Whatever the minimal complexity for life may be, there is no doubt that even the simplest known life form is already stupendously complex. Indeed, life’s complexity is so daunting that it is tempting to give up trying to understand it in physical terms. A physicist may be able to give an accurate account of a hydrogen atom, or even a water molecule, but what hope is there for describing a bacterium in the same terms?
A generation or two ago things looked a lot brighter. Following the elucidation of the structure of DNA and the cracking of the universal genetic code, biology was gripped by reductionist fervour. There was a tendency to think that answers to most biological questions were to be found at the level of genes, a viewpoint eloquently articulated by Richard Dawkins with his concept of the selfish gene.9 And there is no doubt that reductionism applied to biology has scored some notable successes. For example, specific defective genes have been linked to a number of heritable conditions such as Tay-Sachs disease. But it soon became clear that there is generally no simple connection between a gene, or a set of genes, and a biological trait at the level of the organism. Many traits emerge only when the system as a whole is taken into account, including entire networks of genes in interaction, plus many non-genetic, or so-called epigenetic, factors that may also involve the environment (a topic to which I shall return in the next chapter). And when it comes to social organisms – for example, ants, bees and humans – a complete account requires consideration of the collective organization of the whole community. As these facts sank in, biology began to look hopelessly complex again.
But perhaps all is not lost. The flip side of reductionism is emergence – the recognition that new qualities and principles may emerge at higher levels of complexity that can themselves be relatively simple and grasped without knowing much about the levels below. Emergence has acquired something of a mystical air but in truth it has always played a role in science. An engineer may fully understand the properties of steel girders without the need to consider the complicated crystalline structure of metals. A physicist can study patterns of convection cells knowing nothing about the forces between water molecules. So can ‘simplification from emergence’ work in biology too?
Confronting this very issue, the Russian biologist Yuri Lazebnik wrote a humorous essay entitled ‘Can a biologist fix a radio?’10 Like radio receivers, cells are set up to detect external signals that trigger appropriate responses. Here’s an example: EGF (epidermal growth factor) molecules may be present in tissues and bind to receptor molecules on the surface of a particular cell. The receptor straddles the cell membrane and communicates with other molecules in the cell’s innards. The EGF binding event triggers a signalling cascade inside the cell, resulting in altered gene expression and protein production which, in this case, leads to cell proliferation. Lazebnik pointed out that his wife’s old transistor radio is also a signal transducer (it turns radio waves into sound) and, with hundreds of components, about as complex as a signal transduction mechanism in a cell.
Lazebnik’s wife’s radio had gone wrong and needed fixing. How, wondered Lazebnik, might a reductionist biologist tackle the problem? Well, the first step would be to acquire a large number of similar radios and peer into each, noting the differences and cataloguing the components by their colour, shape, size, and so on. Then the biologist might try removing one or two elements or swapping them over to see what happened. Hundreds of learned papers could be published on the results obtained, some of them puzzling or contradictory. Prizes would be awarded, patents granted. Certain components would be established as essential, others less so. Removing the essential parts would cause the radio to stop completely. Other parts might affect only the quality of the sound in complex ways. Because there are hundreds of components in a typical transistor radio, linked together in various patterns, the radio would be pronounced ‘very complex’ and possibly beyond the ability of scientists to understand, given how many variables are involved. Everyone would agree, however, that a much bigger budget would be needed to extend the investigation.
In the expanded research programme a useful line of inquiry would be to use powerful microscopes to look for clues inside the transistors and capacitors and other objects, right down to the atomic level. The huge study might well go on for decades and cost a fortune. And it would, of course, be useless. Yet what Lazebnik describes in the transistor radio satire is precisely the approach of much of modern biology. The major point that the author wanted to make is that an electronic engineer, or even a trained technician, would have little difficulty fixing the defective radio, for the simple reason that this person would be well versed in the principles of electronic circuitry. In other words, by understanding how radios work and how the parts are wired together to achieve well-defined functions, the task of fixing a defective model is rendered straightforward. A few carefully chosen tweaks, and the music plays again. Lazebnik laments that biology has not attained this level of understanding and that few biologists even think about life in those terms – in terms of living cells containing modules which have certain logical functions and are ‘wired together’, chemically speaking, to form networks with feedback, feed-forward, amplification, transduction and other control functions to attain collective functionality. The main point is that in most cases it is not necessary to know what is going on inside those modules to understand what is happening to the system as a whole.
Fortunately, times are changing. The very notion of life is being reconceptualized, in a manner that closely parallels the realms of electronics and computing. A visionary manifesto for a future systems biology along these lines was published in Nature in 2008 by the Nobel prizewinning biologist Paul Nurse, soon to become President of The Royal Society. In a paper entitled ‘Life, logic and information’, Nurse heralded a new era of biology.11 Increasingly, he pointed out, scientists will seek to map molecular and biochemical processes into the biological equivalent of electronic circuit boards:
Focusing on information flow will help us to understand better how cells and organisms work … We need to describe the molecular interactions and biochemical transformations that take place in living organisms, and then translate these descriptions into the logic circuits that reveal how information is managed … Two phases of work are required for such a programme: to describe and catalogue the logic circuits that manage information in cells, and to simplify analysis of cellular biochemistry so that it can be linked to the logic circuits … A useful analogy is an electronic circuit. Representations of such circuits use symbols to define the nature and function of the electronic components used. They also describe the logic relationships between the components, making it clear how information flows through the circuit. A similar conceptualization is required of the logic modules that make up the circuits that manage information in cells.
Philosophers and scientists continue to bicker over whether, ‘in principle’, all biological phenomena could be reduced solely to the goings-on of atoms, but there is agreement that, as a practical matter, it makes far more sense to search for explanations at higher levels. In electronics, a device can be perfectly well designed and assembled from standard components – transistors, capacitors, transformers, wires, and so on – without the designer having to worry about the precise processes taking place in each component at the atomic level. You don’t have to know how a component works, only what it does. And where this practical approach becomes especially powerful is when the electronic circuit is processing information in some way – in signal transduction, rectification or amplification, or as a component in a computer – because then the explanatory narrative can be cast entirely in terms of information flow and software, without any reference back to the hardware or module itself, still less its molecular parts. In the same vein, urges Nurse, we should seek, where possible, explanations for processes within a cell, and between cells, based on the informational properties of the higher-level units.
When we look at living things we see their material bodies. If we probe inside, we encounter organs, cells, sub-cellular organelles, chromosomes and even (with fancy equipment) molecules themselves. What we don’t see is information. We don’t see the swirling patterns of information flows in the brain’s circuitry. We don’t see the army of demonic information engines in cells, or the organized cascades of signalling molecules executing their restless dance. And we don’t see the densely packed information stored in DNA. What we see is stuff, not bits. We are getting only half the story of life. If we could view the world through ‘information eyes’, the turbulent, shimmering information patterns that characterize life would leap out as distinctive and bizarre. I can imagine an artificial intelligence (AI) of the future being tuned to information and being trained to recognize people not from their faces but from the informational architecture in their heads. Each person might have their own identification pattern, like the auras of pseudoscience. Importantly, the information patterns in living things are not random. Rather, they have been sculpted by evolution for optimal fitness, just as have anatomy and physiology.
Of course, humans cannot directly perceive information, only the material structures in which it is instantiated, the networks in which it flows, the chemical circuitry that links it all together. But that does not diminish the importance of information. Imagine if we tried to understand how a computer works by studying only the electronics inside it. We could look at the microchip under a microscope, study the wiring diagram in detail and investigate the power source. But we would still have no idea how, for example, Windows performs its magic. To fully understand what appears on your computer screen you have to consult a software engineer, one who writes computer code to create the functionality, the code that organizes the bits of information whizzing around the circuitry. Likewise, to fully explain life we need to understand both its hardware and its software – its molecular organization and its informational organization.
Mapping life’s circuitry is a field still in its infancy and forms part of the subject known as systems biology. Electronic circuits have components that are well understood by physicists. The biological equivalent is not so well understood. Many chemical circuits are controlled by genes ‘wired’ together via chemical pathways to create features like feedback and feed-forward – familiar from engineering, but the details can be messy. To give the flavour, let me focus on a very basic property of life: regulating the production of proteins. Organisms cleverly monitor their environment and respond appropriately. Even bacteria can detect changes around them, process that information and implement the necessary instructions to alter their state to advantage. Mostly, the alteration involves boosting or suppressing the production of certain proteins. Making the right amount of a particular protein is a delicately balanced affair that needs to be carefully tuned. Too much could be toxic; too little may mean starvation. How does a cell regulate how much of a particular protein is needed at any given time? The answer lies with a set of molecules (themselves proteins) known as transcription factors, with distinctive shapes that recognize specific segments of DNA and stick to them. Thus bound, they serve to increase or decrease the rate at which a nearby gene is expressed.
It’s worth understanding precisely how they do this. Earlier (here), I discussed a molecule called RNA polymerase whose job it is to crawl along DNA and ‘read out’ the sequence, creating a matching molecule of RNA as it goes. But RNA polymerase doesn’t just do this whimsically. It waits for a signal. (‘My protein is needed: transcribe me now!’) There is a region of DNA near the start of the gene that issues the ‘go’ signal; it’s called a promoter, because it promotes the transcription process. The RNA polymerase is attracted to the promoter and will bind to it to initiate transcription: docking followed by chugging, colloquially speaking. But RNA polymerase will dock to the promoter only if the latter is in ‘go’ mode. And it is here that transcription factors regulate what happens. By binding to the promoter region, a given transcription factor can block it and frustrate the polymerase’s docking manoeuvre. In this role, the transcription factor is known, for obvious reasons, as a repressor. All this is fine if the protein isn’t needed. But what happens if circumstances change and the blocked gene needs to be expressed? Obviously, the blocking repressor molecule has to be evicted somehow. Well, how does that step work?
A good example, figured out long ago, is a mechanism used by the commonplace bacterium E. coli. Glucose tops the bacterium’s favourite menu, but if glucose is in short supply this versatile microbe can muddle through by metabolizing another sugar called lactose. To accomplish the switch, the bacterium needs three special proteins, requiring three adjacent genes to be expressed. It would be wasteful to keep these genes active just as a contingency plan, so E. coli has a chemical circuit to regulate the on–off function of the requisite genes. When glucose is plentiful, a repressor transcription factor binds to the promoter region of DNA, close to the three genes, and blocks RNA polymerase from binding and beginning the transcription process of the said genes: the genes remain off. When glucose is unavailable but there is lactose around, a by-product of the lactose binds to the repressor molecule and inactivates it, opening the way for RNA polymerase to attach to the DNA and do its stuff. The three key genes are then expressed and lactose metabolism begins. There is another switching mechanism to turn the lactose genes off again when glucose becomes plentiful once more.
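Stripped of its molecular detail, the switch just described behaves like a small piece of Boolean logic. The sketch below is a deliberately crude caricature of the arrangement (it ignores, for instance, the second mechanism that shuts the genes off again), with function names of my own choosing.

```python
def lactose_genes_expressed(glucose_present, lactose_present):
    """Crude Boolean caricature of the E. coli lactose switch described above."""
    repressor_blocking = not lactose_present      # a lactose by-product inactivates the repressor
    polymerase_can_dock = not repressor_blocking  # a free promoter lets RNA polymerase bind
    return polymerase_can_dock and not glucose_present

# The three lactose genes come on only when glucose is scarce and lactose is available
assert lactose_genes_expressed(glucose_present=False, lactose_present=True)
assert not lactose_genes_expressed(glucose_present=True, lactose_present=True)
assert not lactose_genes_expressed(glucose_present=False, lactose_present=False)
```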
In all, E. coli has about 300 transcription factors to regulate the production of its 4,000 proteins. I have described a repressor function, but other chemical arrangements permit other transcription factors to serve as activators. In some cases, the same transcription factor can activate many genes, or activate some and repress others. These various alternatives can lead to a rich variety of functions.12 (For comparison, humans have about 1,400 transcription factors for their 20,000 genes.)13
Transcription factors may combine their activities to create various logic functions similar to those used in electronics and computing. Consider, for example, the AND function, where a switch Z is turned on only if a signal is received from switches X and Y together. To implement this, a chemical signal flips the transcription factor X into its active shape X*; X is switched on, chemically speaking. Thus activated, X* may then bind to the promoter of gene Y, causing Y to be produced. If, now, there is a second (different) signal that switches Y to its active form, Y*, the cell has both X* and Y* available together. This arrangement can serve as an AND logic gate if there is a third gene, Z, designed (by evolution!) to be switched on only if X* and Y* are present together and bind to its promoter. Other arrangements can implement the OR logic operation, whereby Z is activated if either X or Y is converted to its active form and binds to Z’s promoter. When sequences of such chemical processes are strung together, they can form circuits that implement cascades of signalling and information processing of great complexity. Because transcription factors are themselves proteins produced by other genes regulated by other transcription factors, the whole assemblage forms an information-processing and control network with feedback and feed-forward functions closely analogous to a large electronic circuit. These circuits facilitate, control and regulate patterns of information in the cell.
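In code, the arrangement just described amounts to nothing more exotic than the familiar gates of a logic class. The sketch below follows the X, Y, Z labelling of the text; the function names and the Boolean simplification are mine.

```python
def transcriptional_and_gate(signal_1, signal_2):
    """Toy version of the AND arrangement described above."""
    x_star = signal_1                  # signal 1 flips X into its active form X*
    y_present = x_star                 # X* binds Y's promoter, so Y gets produced
    y_star = y_present and signal_2    # signal 2 can only activate a Y that exists
    return x_star and y_star           # Z is switched on only if X* and Y* both bind

def transcriptional_or_gate(signal_1, signal_2):
    """Toy version of the OR arrangement: either active factor switches Z on."""
    return signal_1 or signal_2

assert transcriptional_and_gate(True, True) and not transcriptional_and_gate(True, False)
assert transcriptional_or_gate(False, True)
```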
Given the vast number of possible combinations of molecular components and chemical circuits, you might imagine that the information flow in a cell would be an incomprehensible madhouse of swirling bits. Remarkably, it is far more ordered. There are many recurring themes, or informational motifs, across a wide range of networks, suggesting a high utility for certain biological functions. One example is the feed-forward loop, the basic idea of which I introduced above in connection with E. coli metabolism. Taking into account the possibility that the logic functions can be either AND or OR gates (see Box 6), there are thirteen possible gene regulation combinations, and of those only one, the feed-forward loop, is a network motif.14 Since it is rather easy for a mutation in a gene to remove a link in a chemical network, the fact that certain network motifs survive so well suggests strong selection pressure at work in evolution. There must be a reason why these recurring motifs are, literally, vital. One explanation is robustness. Experience from engineering indicates that when the environment is changing a modular structure with a small range of components adapts more readily. Another explanation is versatility. With a modest-sized toolkit of well-tried and reliable parts, a large number of structures can be built using the same simple design principles in a hierarchical manner (as Lego enthusiasts and electronic engineers know well).15
Although I have focused on transcription factors, there are many other complex networks involved in cellular function, such as metabolic networks that control the energetics of cells, signal transduction networks involving protein–protein interactions, and, for complex animals, neural networks. These various networks are not independent but couple to each other to form nested and interlocking information flows. There are also many additional mechanisms for transcription factors to regulate cellular processes, either individually or in groups, including acting on mRNA directly or modifying other proteins in a large variety of ways. The existence of so many regulatory chemical pathways enables cells to fine-tune their behaviour to play ‘the music of life’ by responding to external changes with a high degree of fidelity, much as a well-tuned transistor radio can flawlessly play the music of Beethoven.
In more complex organisms, gene control is likewise more complex. Eukaryotic cells, which have nuclei, package most of their DNA into several chromosomes (humans have twenty-three pairs). Within the chromosomes, DNA is tightly compacted, wrapped around protein spools and further folded and squashed up to a very high degree. In this compacted form the material is referred to as chromatin. How the chromatin is distributed within the nucleus depends on a number of factors, such as where the cell might be in the cell cycle. For much of the cycle the chromatin remains tightly bound, preventing the genes being ‘read’ (transcribed). If a protein coded by a gene or a set of genes is needed, the architecture of the chromatin has to change to enable the read-out machinery to gain access to the requisite segments. Reorganization of chromatin is under the control of a network of threads, or microtubules. Thus, whole sets of genes may be silenced or activated mechanically, either by keeping them ‘under wraps’ (wrapped up, more accurately) or by unravelling the highly compacted chromatin in that region of the chromosome to enable transcription to proceed. There is more than a faint echo here of Maxwell’s demon. In this case, the nuclear demons quite literally ‘pull the strings’ and open, not a shutter, but an elaborately wound package that encases the relevant information-bearing genes. Significantly, cancer cells often manifest a noticeably different chromatin architecture, implying an altered gene expression profile; I shall revisit that topic in the next chapter.
As scientists unravel the circuit diagrams of cells, many practical possibilities are opening up that involve ‘rewiring’. Bio-engineers are busy designing, adapting, building and repurposing living circuitry to carry out designated biological tasks, from producing new therapeutics to novel biotech processes – even to perform arithmetic. This ‘synthetic biology’ is mostly restricted to bacteria but, recently, new technology has enabled this type of work to be extended to mammalian cells. A technique called Boolean logic and arithmetic through DNA excision, or BLADE, has been developed at Boston University and the ETH in Zurich.16 The researchers can build quite complex logic circuits to order, and foresee being able to use them to control gene expression. Many of the circuits they have built seem to be entirely new, that is, they have never been found in existing organisms. The group of Hideki Kobayashi at Boston University finds the promise of rewiring known organisms compelling: ‘Our work represents a step toward the development of “plug-and-play” genetic circuitry that can be used to create cells with programmable behaviors.’17 Currently, synthetic circuits are a rapidly expanding area of research in systems biology with more publications detailing novel circuits published every year.18 The medical promise of this new ‘electronics’ approach to life is immense. Where disease (for example, cancer) is linked to defective information management – such as a malfunctioning module or a broken circuit link – a remedy might be to chemically rewire the cells rather than destroy them.fn7
Imagine a physician of the future (doubtless an AI) who, through some amazing technology that can detect gene expression in real time, would gaze at the dancing, twinkling patterns like city lights seen from afar and diagnose a patient’s illness. This would be a digital doctor who deals in bits, not tissues, a medical software engineer. I can imagine my futuristic physician proclaiming that there are early signs of cancer in this or that shimmering cluster, or that an inherited genetic defect is producing an anomalous luminous patch, indicating overexpression of such and such a protein in the liver, or maybe quieter spots suggesting that some cells are not getting enough oxygen, oestrogen or calcium. The study of information flow and information clustering would provide a diagnostic tool far more powerful than the battery of chemical tests used today. Treatment would focus on establishing healthy, balanced information patterns, perhaps by attending to, or even re-engineering, some defective modules, much as an electronics engineer (of old) might replace a transistor or a resistor to restore a radio to proper functionality. (In this respect, what I am describing is reminiscent of some Eastern approaches to medicine.) The digital doctor might not seek to replace any hardware modules but instead decide to rewrite some code and upload it into the patient somehow, at the cellular level, to restore normal functionality – a sort of cellular reboot.
It may seem like science fiction, but information biology is paralleling computer technology, albeit a few decades behind. The ‘machine code’ for life was cracked in the 1950s and 1960s with the elucidation of the DNA triplet code and the translation machinery. Now we need to figure out the ‘higher level’ computer language of life. This is an essential next step. Today’s software engineers wouldn’t design a new computer game by writing down vast numbers of 1s and 0s; they use a higher-level language like Python. By analogy, when a cell regulates, for example, the electric potential across its membrane by increasing the number of protons it pumps out, a ‘machine code’ description in terms of gene codons isn’t very illuminating. The cell as a unit operates at a much higher level to manage its physical and informational states, deploying complex control mechanisms. These regulatory processes are not arbitrary but obey their own rules, as do the higher-level computer languages used by software engineers. And, just as software engineers are able to re-program advanced code, so will bio-engineers redesign the more sophisticated features of living systems.
Biological circuitry can generate an exponentially large variety of form and function but, fortunately for science, there are some simple underlying principles at work. Earlier in this chapter I described the Game of Life, in which a few simple rules executed repeatedly can generate a surprising degree of complexity. Recall that the game treats squares, or pixels, as simply on or off (filled or blank) and the update rules are given in terms of the state of the nearest neighbours. The theory of networks is closely analogous. An electrical network, for example, consists of a collection of switches with wires connecting them. Switches can be on or off, and simple rules determine whether a given switch is flipped, according to the signals coming down the wires from the neighbouring switches. The whole network, which is easy to model on a computer, can be put in a specific starting state and then updated step by step, just like a cellular automaton. The ensuing patterns of activity depend both on the wiring diagram (the topology of the network) and the starting state. The theory of networks can be developed quite generally as a mathematical exercise: the switches are called ‘nodes’ and the wires are called ‘edges’. From very simple network rules, rich and complex activity can follow.
Network theory has been applied to a wide range of topics in economics, sociology, urban planning and engineering, and across all the sciences, from magnetic materials to brains. Here I want to consider network theory applied to the regulation of gene expression – whether genes are switched on or off. As with cellular automata, networks can exhibit a variety of behaviours; the one I want to focus on is when the system settles into a cycle. Cycles are familiar from electronics. For example, there’s a new top-of-the-range dishwasher in my kitchen, which I installed myself. Inside it has an electronic circuit board (actually just a chip these days) to control the cycle. There are eight different possible cycles. The electronics has a device to halt the cycle if there is a problem. In that, dishwashers are not alone: the cells in your body have a similar circuit to control their cycles.
What is the cell cycle? Imagine a newborn bacterium – that is to say, the parent bacterium has recently split in two. A daughter cell is just starting out on an independent life. The young bacterium gets busy doing what bacteria have to do, which in many cases involves a lot of just hanging out. But its biological clock is ticking; it feels the need to reproduce. Internal changes take place, culminating in the replication of DNA and fission of the entire cell. The cycle is now complete.fn8
In complex eukaryotic organisms the cell cycle is more complicated, as you would expect. A good compromise is yeast, which, like humans, is a eukaryote, but it is single-celled. The cell cycle of yeast has received a lot of attention (and a Nobel Prize, shared by Paul Nurse and my ASU colleague Lee Hartwell) and the control circuit that runs the cycle was worked out by Maria Davidich and Stefan Bornholdt at the University of Bremen.19 In fact, there are many types of yeast. I shall discuss just one, Schizosaccharomyces pombe, otherwise known as fission yeast. The relevant network is shown in Fig. 11. The nodes – the blobs in the figure – represent genes (or, strictly, the proteins the genes encode); the edges are the chemical pathways linking genes (analogous to the wires in electronics); the arrows indicate that one gene activates the other; and the barred line indicates that a gene inhibits or suppresses the other (similar to the way that the neighbouring squares in the Game of Life may prompt or inhibit the square being filled or vacated). Notice there are some genes with loopy broken arrows, indicating self-inhibition. Each gene adds up all the pluses (‘activate!’) and minuses (‘suppress!’) of the incoming arrows and switches itself on, or off, or stays as it is, according to a specific voting rule.
The job of this network is to take the cell step by step through the cycle, halting the proceedings if something goes wrong and returning the system to its initial state when the cycle is over. In this essential functionality, the network may simply be treated as a collection of interconnected switches that can be modelled on a computer. The gene regulatory network controlling the cell cycle of fission yeast is particularly easy to study because, to a good approximation, the genes involved may be considered either fully on or fully off, not dithering in between. This makes for a pleasing simplification because, mathematically, we may represent ‘on’ by 1 and ‘off’ by 0, then make up a rule table with 0s and 1s to describe what happens when the network is started out in some particular state and allowed to run through its little repertoire.fn9
Table 2. The state of the fission yeast cell-cycle network at each time step

Time step | Start | A | B | C | D | E | F | G | H | I | Cell phase
1         |   1   | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | Start
2         |   0   | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | G1
3         |   0   | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | G1/S
4         |   0   | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | G2
5         |   0   | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | G2
6         |   0   | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | G2/M
7         |   0   | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 1 | G2/M
8         |   0   | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | M
9         |   0   | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | M
10        |   0   | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | G1
Using the way I have labelled the genes in Fig. 11, the starting state of the network is A: off; B: on; C: off; D: on; E: off; F: off; G: off; H: on; I: off. In binary, that is 010100010. The cycle begins when the node labelled ‘start’, which sets off the show, flips on (representing an external chemical prompt along the lines of ‘Well, go on then, get on with it!’). It is then straightforward to run a computer model of the network step by step and compare the output with reality. Table 2 shows the state of the network at each step. The intermediate states of 0s and 1s correspond to recognizable physical states which the cell passes through in the cycle. Those physical states are labelled in the right-hand column; the letters stand for biological terms (for example, M stands for ‘mitosis’). After ten steps the network returns to the starting state, awaiting a new ‘start’ signal to initiate the next cycle.
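For readers who like to see the nuts and bolts, here is a minimal sketch, in Python, of the kind of threshold or ‘voting’ update just described. The three-node wiring below is invented purely for illustration – it is not the network of Fig. 11, whose actual connections, thresholds and self-inhibition terms are those worked out by Davidich and Bornholdt – but the update scheme is the same in spirit: add up the activating and inhibiting inputs and switch on, switch off, or stay put.

```python
# A toy threshold ('voting') Boolean network. The wiring is invented for
# illustration and is NOT the fission yeast network of Fig. 11.
# weights[target][source] = +1 for an activating edge, -1 for an inhibiting one.
weights = {
    'X': {'Z': -1},            # Z inhibits X
    'Y': {'X': +1},            # X activates Y
    'Z': {'X': +1, 'Y': +1},   # X and Y both activate Z
}

def step(state):
    """Update every node simultaneously according to the voting rule."""
    new = {}
    for node, inputs in weights.items():
        vote = sum(sign * state[source] for source, sign in inputs.items())
        if vote > 0:
            new[node] = 1              # net activation: switch on
        elif vote < 0:
            new[node] = 0              # net inhibition: switch off
        else:
            new[node] = state[node]    # a tie: stay as you are
    return new

state = {'X': 1, 'Y': 0, 'Z': 0}       # an arbitrary starting state
for t in range(8):
    print(t, ''.join(str(state[n]) for n in ('X', 'Y', 'Z')))
    state = step(state)
```

Run it and this toy network settles into a fixed state after a couple of steps; the real yeast network, wired as in Fig. 11, instead traces out the ten-step sequence of Table 2 and returns to where it began.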
You could make a movie of Fig. 11 in which the nodes light up when they are on and blink out when they are off. There would be a pretty pattern of twinkling lights for ten steps, like fireflies out of kilter. It could perhaps be set to music – the music of life! Let’s scale up this fanciful image and envision a human being as a dazzling constellation of twinkling gene-lights forming myriads of swirling patterns – a veritable cacophony if set to music. The star field would be far more than just the genes that control the cycles of all the different cell types. There would be 20,000 genes, all performing their own act. Some lights might stay mostly off, some mostly on, while others would flip on and off in a variety of beats.
The point I want to make is that these shifting information patterns are not random; they portray the organized activity of the network and therefore the organism. And the question is, what can we learn from studying them, from using mathematics and computer models to characterize the shimmering patterns, to track the flow of information, to see where it is stored and for how long, to identify the ‘manager genes’ and the drones? In short, to build up an informational narrative that captures the essence of what the organism is about, including its management structure, command-and-control architecture and potential failure points.
Well, we can start with yeast: there are only ten nodes and twenty-seven edges in the Schizosaccharomyces pombe cell cycle network. Yet even that requires a lot of computing power to analyse. The first order of business is to confirm that the patterns are non-random. More precisely, if you just made up a network randomly with the same number of nodes and edges, would the twinkling lights differ in any distinctive way from Mother Nature’s yeast network? To answer that, my ASU colleagues Hyunju Kim and Sara Walker ran an exhaustive computer study in which they traced the ebb and flow of information as it swirls around the yeast network.20 This sounds easy, but it isn’t. You can’t follow it by eye: there has to be a precise mathematical definition of information transfer (see Box 8). The upshot of their analysis is that there is an elevated and systematic flow of information around the yeast network well in excess of random. Evolution has, it seems, sculpted the network architecture in part for its information-processing qualities.
One may ask of a given network node, say A, whether knowing its history helps in predicting what it will do at the next step. That is, if you look at, say, the three preceding steps of node A and note ‘on’ or ‘off’, does that three-step history improve the odds of you correctly guessing on or off for the next step? If it does, then we can say that some information has been stored in node A. One can then look at another node, say B, and ask, does knowing the current state of B improve the odds of correctly guessing what A will do next, over and above just knowing the history of A? If the answer is yes, it implies that some information has been transferred from B to A. Using that definition, known as ‘transfer entropy’, my colleagues ranked all pairs of nodes in the yeast cell cycle network by the amount of information transferred, and then compared this rank order with the corresponding rankings averaged over a thousand random networks. There was a big difference. In a nutshell, the yeast gene network transfers markedly more information than a random one. Digging a little deeper to pin down precisely what is making the difference, the researchers zeroed in on a set of four nodes (B, C, D and H in Fig. 11) that seemed to be calling the shots. The special role of these four genes has earned them the name ‘the control kernel’. The control kernel seems to act like a choreographer for the rest of the network, so if one of the other nodes makes a mistake (is on when it should be off, or vice versa), then the control kernel pulls it back into line. It basically steers the whole network to its designated destination and, in biological terms, makes sure the cell fissions on cue with everything in good order. Control kernels seem to be a general feature of biological networks. So in spite of the great complexity of behaviour, a network’s dynamics can often be understood by looking at a relatively small subset of nodes.
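For the curious, the definition just sketched can be turned into a few lines of code. The little function below estimates transfer entropy between two binary time series; the history length k plays the role of the ‘three preceding steps’ in my example. It is only an illustrative implementation – the precise settings used by Kim and Walker are described in their paper – but it captures the idea: how much does knowing the source node now improve your prediction of the target node’s next state, beyond what the target’s own history already tells you?

```python
from collections import Counter
from math import log2
import random

def transfer_entropy(src, dst, k=1):
    """Estimate the transfer entropy (in bits) from binary series `src` to
    `dst`, conditioning on the k previous states of `dst`."""
    n = len(dst)
    triples  = Counter()   # counts of (dst_next, dst_history, src_now)
    pairs    = Counter()   # counts of (dst_next, dst_history)
    hist_src = Counter()   # counts of (dst_history, src_now)
    hist     = Counter()   # counts of (dst_history,)
    for t in range(k, n - 1):
        h = tuple(dst[t - k + 1 : t + 1])   # k-step history of dst, ending at t
        nxt, s = dst[t + 1], src[t]
        triples[(nxt, h, s)] += 1
        pairs[(nxt, h)]      += 1
        hist_src[(h, s)]     += 1
        hist[h]              += 1
    total = sum(triples.values())
    te = 0.0
    for (nxt, h, s), c in triples.items():
        p_joint = c / total
        p_next_given_hs = c / hist_src[(h, s)]        # prediction using history AND source
        p_next_given_h  = pairs[(nxt, h)] / hist[h]   # prediction using history alone
        te += p_joint * log2(p_next_given_hs / p_next_given_h)
    return te

# Toy test: dst simply copies src with a one-step delay, so information
# flows from src to dst but not the other way round.
random.seed(0)
src = [random.randint(0, 1) for _ in range(2000)]
dst = [0] + src[:-1]
print(transfer_entropy(src, dst, k=1))   # roughly 1 bit per step
print(transfer_entropy(dst, src, k=1))   # close to zero (small sampling bias)
```

Applied to every ordered pair of nodes in the yeast network, and to thousands of random networks for comparison, this is the kind of calculation that produces the ranking described above.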
It would be wrong of me to give the impression that information flow in biology is restricted to gene regulatory networks. Unfortunately, the additional complexity of some other networks makes them even harder to model computationally, especially as the simple version of 0s and 1s (off and on) mostly won’t do. On top of that, the number of components skyrockets when it comes to more finely tuned functions like metabolism. The general point remains: biology will ‘stand out’ from random complexity in the manner of its information patterning and processing, and though complex, the software account of life will still be vastly simpler than the underlying molecular systems that support it, as it is for electronic circuits.
Network theory confirms the view that information can take on ‘a life of its own’. In the yeast network my colleagues found that 40 per cent of node pairs that are correlated via information transfer are not in fact physically connected; there is no direct chemical interaction. Conversely, about 35 per cent of node pairs transfer no information between them even though they are causally connected via a ‘chemical wire’ (edge). Patterns of information traversing the system may appear to be flowing down the ‘wires’ (along the edges of the graph) even when they are not. For some reason, ‘correlation without causation’ seems to be amplified in the biological case relative to random networks.
In Surely You’re Joking, Mr. Feynman!,21 the raconteur physicist and self-confessed rascal describes how, as a youngster, he gained a reputation for being able to mend malfunctioning radios (yes, that again!). On one occasion he was initially chided for briefly peering into the radio, then merely walking back and forth. Being Richard Superbrain Feynman, he had soon figured out the fault and effected a simple repair. ‘He fixes radios by thinking!’ gushed his dazzled client. The truth is, you generally can’t tell just by looking at the layout of an electronic circuit what the problem might be. The performance of a radio depends both on the circuit topology and on the physical characteristics of the components. If a resistor, say, is too large or a capacitor too small, the information flow may not be optimal – the output may be distorted. The same is true of all networks – biological, ecological, social or technological. Similar-looking networks can exhibit very different patterns of information flow because their components – the nodes – may have different properties. In the case of the yeast cell cycle, a simple on-or-off rule was used (with impressive results), but there are many different candidate mathematical relationships that could be employed, and they will yield different flow patterns. The bottom line is, there is no obvious relationship between the information pattern’s dynamics and the ‘circuit’ topology. Therefore, for many practical purposes, it pays to treat the information patterns as ‘the thing of interest’ and forget about the underlying network that supports them. Only if something goes wrong is it necessary to worry about the actual ‘wiring’.
Two Israeli mathematicians, Uzi Harush and Baruch Barzel, recently did a systematic study using a computer model of information flow in a broad range of networks. They painstakingly tracked the contribution that each node and pathway made to the flow of information in an attempt to identify the main information highways. To accomplish this, they tried meddling with the system, for example ‘freezing’ nodes to see how the information flow changed, then assessing the difference it might make to the strength of a signal in a specific downstream node. There were some surprises: they found that in some networks information flowed mainly through the hubs (a hub is where many links concentrate, for example, servers on the internet), while in others the information shunned the hubs, preferring to flit around the periphery of the network. In spite of the diversity of results, the mathematicians report that ‘the patterns of information flow are governed by universal laws that can be directly linked to the system’s microscopic dynamics’.22 Universal laws? This claim goes right to the heart of the matter of when it is legitimate to think of information patterns as coherent things with an independent existence. It seems to me that if the patterns themselves obey certain rules or laws, then they may be treated as entities in their own right.
‘Go to the ant, thou sluggard; consider her ways, and be wise.’
– Proverbs 6:6
Network theory has found a fruitful application in the subject of social insects, which also display complex organized behaviour deriving from the repeated application of simple rules between neighbouring individuals. I was once sitting on a beach in Malaysia beneath a straw umbrella fixed atop a stout wooden post. I recall drinking beer and eating potato crisps. One of the crisps ended up on the ground, where I left it. A little later I noticed a cluster of small ants swarming around the abandoned object, taking a lot of interest. Before long they set about transporting it, first horizontally across the sand, then vertically up the wooden post. This was a heroic collective effort – it was a big crisp and they were little ants. (I had no idea that ants liked crisps anyway.) But they proved equal to the task. Organized round the periphery, the gals (worker ants are all female) on top pulled, while those underneath pushed. Where were they headed? I noticed a vertical slot at the top of the post with a few ants standing guard. This must be their nest. But all that pushing and pulling was surely futile because a) the crisp looked too big to fit in the slot and b) the ants would have to rotate the crisp (which was approximately flat) through two right angles to insert it. It would need to project out perpendicular to the post in a vertical plane before the manoeuvre could be executed. Minutes later I marvelled that the ants’ strategy had been successful: the crisp was dragged into the slot in one piece. Somehow, the tiny, pin-brained creatures had assessed the dimensions and flatness of the crisp when it lay on the ground and figured out how to rotate it into the plane of the slot. And they did it on the first try!
Stories like this abound. Entomologists enjoy setting challenges and puzzles for ants, trying to outsmart them with little tricks. Food and nice accommodation seem to be their main preoccupations (the ants’, that is, though no doubt also the entomologists’), so to that end they spend a lot of time foraging, milling around seemingly at random and seeking out a better place to build a nest. There is a big social-insect research group at ASU run by Stephen Pratt, and a visit to the ant lab is always an entertaining experience. Since almost all ants of the same species look the same, the wily researchers paint them with coloured dots so that they can track them, see what they get up to. The ants don’t seem to mind. Although at first glance the scurrying insects look to be taking random paths, they are mostly not. They identify trails based on the shortest distance from the nest and mark them chemically. If, as part of the experiment, their strategy is disrupted by the entomologist, for example by moving a source of food, the ants default to a Plan B while they reassess the local topography. The most distinctive feature of their behaviour is that they communicate with each other. When one ant encounters another, a little ritual takes place that serves to transfer some positional information to the other ant.fn10 In this manner, data gathered by a solitary ant can quickly become disseminated among many in the colony. The way now lies open for collective decision-making.
In the case of the purloined crisp, it was clear that no one ant had a worked-out strategy in advance. There was no foreman (forewoman, really) of the gang. The decision-making was done collectively. But how? If I meet a friend on the way home from work and he asks, ‘How’s yer day goin’, mate?’ he risks being subjected to five minutes of mostly uninteresting banter (which nevertheless might convey a lot of information). Unless ants are very fast talkers, their momentary encounters would not amount to more than a few logical statements along the lines of ‘if, then’. But integrate many ant-to-ant encounters across a whole colony and the power of the collective information processing escalates.
Ants are not alone in their ability to deploy some form of swarm decision-making, even, one might hazard, swarm intelligence. Bird flocks and fish schools also act in unison, swooping and swerving as if all are of one mind. The best guess as to what lies behind this is that the application of some simple rules repeated lots of times can add up to something pretty sophisticated. My ant colleagues at ASU are investigating the concept of ‘distributed computation’, applying information theory to the species Temnothorax rugatulus, which forms colonies with relatively few workers (less than 300), making them easier to track. The goal is to trace how information flows around the colony, how it is stored and how it propagates during nest-building. All this is being done in the lab under controlled conditions. The ants are offered a variety of new nests (the old one is disrupted to give them some incentive to move house), and the investigators study how a choice is made collectively. When ants move en masse, a handful who know the way go back to the nest and lead others along the path: this is called ‘tandem running’. It’s slow going, as the naïve ants bumble along, continually touching the leaders to make sure they don’t get lost (ants can’t see very far). When enough ants have learned the landmarks, tandem running is abandoned in favour of piggy-backing, which is quicker.
One thing my colleagues are focusing on is reverse tandem running, where an ant in the know leads another ant from the new nest back to the old one. Why do that? It seems to have something to do with the dynamics of negative feedback and information erasure, but the issue isn’t settled. To help things along, the researchers have designed a dummy ant made of plastic with a magnet inside. It is guided by a small robot concealed beneath a board on which the ants move. Armed with steerable artificial ants, my colleagues can create their own tandem runs to test various theories. The entire action is recorded on video for later quantitative analysis. (You can tell that this research is a lot of fun!)
Social insects represent a fascinating middle stage in the organization of life, and their manner of information processing is of special interest. But the vast and complex web of life on Earth is woven from information exchange between individuals and groups at all levels, from bacteria to human society. Even viruses can be viewed as mobile information packets swarming across the planet. Viewing entire ecosystems as networks of information flow and storage raises some important questions. For example, do the characteristics of information flow follow any scaling lawsfn11 as you go up in the hierarchy of complexity, from gene regulatory networks through deep-ocean volcanic-vent ecosystems to rainforests? It seems very likely that life on Earth as a whole can be characterized by certain definite information signatures or motifs. If there is nothing special about terrestrial life, then we can expect life on other worlds to follow the same scaling laws and display the same properties, which will greatly assist in the search for definitive bio-signatures on extra-solar planets.
Of all the astonishing capabilities of life, morphogenesis – the development of form – is one of the most striking. Somehow, information etched into the one-dimensional structure of DNA and compacted into a volume one-billionth that of a pea unleashes a choreography of exquisite precision and complexity manifested in three-dimensional space, up to and including the dimensions of an entire fully formed baby. How is this possible?
In Chapter 1, I mentioned how the nineteenth-century embryologist Hans Driesch was convinced some sort of life force was at work in embryo development. This rather vague vitalism was replaced by the more precise concept of ‘morphogenetic fields’. By the end of the nineteenth century physicists had enjoyed great success using the field concept, originally due to Michael Faraday. The most familiar example is electricity: a charge located at a point in space creates an electric field that extends into the three-dimensional region around it. Magnetic fields are also commonplace. It is no surprise, therefore, that biologists sought to model morphogenesis along similar lines. The trouble was, nobody could give a convincing answer to the obvious question: a field of what? Not obviously electric or magnetic; certainly not gravitational or nuclear. So it had to be a type of ‘chemical field’ (by which I mean chemicals of some sort spread out across the organism in varying concentrations), but the identity of the chemical ‘morphogens’ long remained obscure.
It was to be many more decades before significant progress was made. In the latter part of the twentieth century biologists began approaching morphogenesis from a genetic standpoint. The story they concocted goes something like this. When an embryo develops from a fertilized egg, the original single cell (zygote) starts out with almost all its genes switched on. As it divides again and again various genes are silenced – different genes in different cells. As a result, a ball of originally identical cells begins to differentiate into distinct cell types, partly under the influence of those elusive chemical morphogens that can evidently control gene switching. By the time the embryo is fully formed, the differentiation process has created all the different cell types needed.fn12
All cells in your body have the same DNA, yet a skin cell is different from a liver cell is different from a brain cell. The information in DNA is referred to as the genotype and the actual physical cell is called the phenotype. So one genotype can generate many different phenotypes. Fine. But how do liver cells gather in the liver, brain cells in the brain, and so on – the cellular equivalent of ‘birds of a feather flock together’? Most of what is known comes from the study of the fruit fly Drosophila. Some of the morphogens are responsible for causing undifferentiated cells to differentiate into the various tissue types – eyes, gut, nervous system, and so on – in designated locations. This establishes a feedback loop between cell differentiation and the release of other morphogens in different locations. Substances called growth factors (I mentioned one called EGF earlier in this chapter) accelerate the reproduction of cells in that region, which will alter the local geometry via differential growth. This hand-wavy account is easy to state, but not so easy to turn into a detailed scientific explanation, in large part because it depends on the coupling between chemical networks and information-management networks, so there are two causal webs tangled together and changing over time. Added to all this is growing evidence that not just chemical gradients but physical forces – electric and mechanical – also contribute to morphogenesis. I shall have more to say on this remarkable topic in the next chapter.
Curiously, Alan Turing took an interest in the problem of morphogenesis and studied some equations describing how chemicals might diffuse through tissue to form a concentration gradient of various substances, reacting in ways that can produce three-dimensional patterns. Although Turing was on the right track, it has been slow going. Even for those morphogens that have been identified, puzzles remain. One way to confirm that a candidate chemical does indeed serve as a specific morphogen is to clone the cells that make it and implant them in another location (these are referred to as ectopic cells) to see if they produce a duplicate feature in the wrong place. Often, they do. Flies have been created with extra wings, and vertebrates with extra digits. But even listing all the substances that directly affect the cells immersed in them is only a small part of the story. Many of the chemicals diffusing through embryonic tissue will not affect cells directly but will instead act as signalling agents to regulate other chemicals. Untangling the details is a huge challenge.
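To get a feel for Turing’s idea, here is a minimal one-dimensional simulation of a two-chemical reaction-diffusion system. It uses the so-called Gray-Scott model, a modern descendant of Turing’s scheme rather than his original equations, and the parameter values are merely illustrative. Two substances diffuse at different rates and react with one another, and out of an almost featureless starting state a spatial pattern of concentration can emerge of its own accord.

```python
import numpy as np

# A 1-D Gray-Scott reaction-diffusion system: a simple modern cousin of
# Turing's morphogenesis equations. Parameter values are illustrative only.
n, steps = 200, 10000
Du, Dv = 0.16, 0.08          # the two chemicals diffuse at different rates
F, k   = 0.035, 0.060        # 'feed' and 'kill' rates governing the reaction
dt = 1.0

u = np.ones(n)               # chemical u starts uniformly high
v = np.zeros(n)              # chemical v starts absent...
u[95:105] = 0.5              # ...except for a small local disturbance
v[95:105] = 0.25

def laplacian(x):
    # discrete diffusion along the line, with periodic boundaries
    return np.roll(x, 1) - 2 * x + np.roll(x, -1)

for _ in range(steps):
    uvv = u * v * v                                   # the reaction term
    u += dt * (Du * laplacian(u) - uvv + F * (1 - u))
    v += dt * (Dv * laplacian(v) + uvv - (F + k) * v)

# Crude picture of the final concentration of v: with these settings the
# initial disturbance typically splits and spreads into a series of peaks -
# a spatial pattern arising from nothing but diffusion and reaction.
print(''.join('#' if x > 0.2 else '.' for x in v))
```

Swap the one-dimensional line for a two- or three-dimensional grid and the same mechanism produces spots and stripes reminiscent of animal markings – pattern from near-uniformity, which is exactly what morphogenesis requires.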
A further complicating factor is that individual genes rarely act alone. As I have explained, they form networks in which proteins expressed by one gene can inhibit or enhance the expression of other genes. The late Eric Davidson and his co-workers at the California Institute of Technology managed to work out the entire wiring diagram (chemically speaking) for the fifty-odd gene network that regulates the early-stage development of the sea urchin (it was this lowly animal that attracted the attention of Driesch a century ago). The Caltech group then programmed a computer, put in the conditions corresponding to the start of development and ran a simulation of the network dynamics step by step, with half-hour time intervals between them. At each stage they could compare the computer model of the state of the circuit with the observed stage of development of the sea urchin. Hey presto! The simulation matched the actual developmental steps (confirmed by measuring the gene expression profile). But the Davidson team went beyond this. They considered the effects of tweaking the circuitry to see what would happen to the embryo. For example, they performed experiments knocking out one of the genes in the network called delta, which caused the loss of all the non-skeletal mesoderm tissue – a gross abnormality. When they altered the computer model of the network in the corresponding way, the results precisely matched the experimental observations. In an even more drastic experiment, they injected into the egg a strand of mRNA that repressed the production of a critical regulatory protein called Pmar1. The effect was dramatic: the whole embryo was converted into a ball of skeletogenic cells. Once again, the computer model based on the circuit diagram described the same major transformation.
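The logic of such an in-silico ‘knockout’ is easy to mimic in a toy model. The sketch below uses the same threshold-voting update as the earlier yeast sketch, again on an invented three-gene wiring rather than Davidson’s fifty-gene sea urchin circuit, and simply clamps one gene permanently off before comparing the two trajectories – which is, in miniature, what the Caltech group did when they silenced delta both in the computer model and in the embryo.

```python
# An in-silico 'knockout' on an invented three-gene toy network (threshold
# voting rule as before). Illustrates the method only; it is not a model of
# Davidson's sea urchin circuit.
weights = {
    'P': {},            # P has no inputs in this toy wiring
    'Q': {'P': +1},     # P activates Q
    'R': {'Q': +1},     # Q activates R
}

def step(state):
    new = {}
    for node, inputs in weights.items():
        vote = sum(sign * state[src] for src, sign in inputs.items())
        new[node] = 1 if vote > 0 else 0 if vote < 0 else state[node]
    return new

def run(start, steps, knockout=None):
    state, trajectory = dict(start), []
    for _ in range(steps):
        if knockout:
            state[knockout] = 0          # clamp the silenced gene off
        trajectory.append(''.join(str(state[n]) for n in ('P', 'Q', 'R')))
        state = step(state)
    return trajectory

normal   = run({'P': 1, 'Q': 0, 'R': 0}, 5)
silenced = run({'P': 1, 'Q': 0, 'R': 0}, 5, knockout='Q')
print('normal:  ', normal)      # R eventually switches on
print('knockout:', silenced)    # with Q silenced, R never does
```

Comparing the two printed trajectories shows how silencing a single node reshapes the behaviour of the whole circuit, and it is the match (or mismatch) between such simulated trajectories and the real embryo that tests whether the wiring diagram has been correctly inferred.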
The various examples I have given illustrate the power and scope of ‘electronic thinking’ in tracking the flow of information through organisms and in linking it to important structural features. One of the most powerful aspects of the concept of information in biology is that the same general ideas often apply on all scales of size. In his visionary essay Nurse writes, ‘The principles and rules that underpin how information is managed may share similarities at these different levels even though their elements are completely different … Studies at higher system levels are thus likely to inform those at the simpler level of the cell and vice versa.’23
So far, I have considered the patterns and flow of information at the molecular level in DNA, at the cellular level in the cell cycle of yeast, in the development of form in multicellular organisms, and in communities of organisms and their social organization. But when Schrödinger conjectured his ‘aperiodic crystal’ he was focusing on heritable information and how it could be reliably passed on from one generation to the next. To be sure, information propagates in complex patterns within organisms and ecosystems, but it also flows vertically, cascading down the generations, providing the foundation for natural selection and evolutionary change. And it is here, at the intersection of Darwinism and information theory, that the magic puzzle box of life is now springing its biggest surprises.