4
Can We Define Information?

The previous chapters showed how pervasive the notion of information is in many aspects of life, from DNA molecules to ecosystems and human language. When seen from an informational perspective, these facets of life join in some form of unity that we will discuss in Chapter 5. Before that, we need a definition of information that covers the variety of situations evoked previously. Information is a rather recent scientific concept: it dates back to the middle of the last century. Unlike other well-established notions, information has not yet received a definitive definition. This chapter relies on some recent theoretical developments in algorithmic information theory that allow us to propose a common framework for describing the various situations, mentioned earlier, in which the word “information” occurred spontaneously.

4.1. Information as surprise

Not every sign carries information. The oil level sensor in my car may invariably indicate level 7. If it could show only this value, the information associated with it would be zero. Since we know that other values are possible, but rare, the information attached to level 7 is very small. If the sensor indicates level 3 one day, the corresponding information that day will have a significant value.

In this example, one has clearly in mind what is at stake: the oil may be leaking. Let us forget about this for a while and concentrate on the probabilities alone. This is precisely what Claude Shannon did half a century ago. Shannon wanted to define the information that travels through telecommunication networks in order to measure its useful rate (Shannon 1948). Quite counterintuitively at that time, he ignored any consideration relative to the semantics (meaning) of messages. The only dimension preserved in Shannon’s theory is the probability of events. A rare event generates more information when it occurs than a frequent one. If one measures the probability of events by their equivalent in coin tosses, one gets (in bits) the amount of information that their occurrence generates. The occurrence of an event that has a one in 1,000 chance of happening generates about 10 bits of information, as one has about a one in 1,000 chance of getting heads 10 times in a row when flipping a coin 10 times. The occurrence of an event that has a 50% chance of happening brings only one bit of information.
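
This quantity is simply the base-2 logarithm of the inverse probability. As a minimal sketch (our own illustration, not Shannon’s notation), the two figures above can be reproduced as follows:

```python
import math

def self_information_bits(p: float) -> float:
    """Shannon self-information of an event of probability p, in bits."""
    return -math.log2(p)

print(self_information_bits(1 / 1000))  # ~9.97 bits: roughly ten coin tosses
print(self_information_bits(0.5))       # exactly 1 bit: a single coin toss
```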

Information, as defined by Shannon, corresponds to the surprise caused by an event at the time of its occurrence. This surprise is measured by the a priori probability of the event. This definition requires not only the existence of an observer, but an observer who knows all alternatives in advance and who is able to assign a probability to each of them. Shannon’s definition triggered a revolution in telecommunication technologies. Shannon was the first to understand that the effect of noise is not to degrade information, but merely to slow down its communication rate. His work gave rise to much excitement way beyond engineering communities. Quite naturally, biologists and thinkers in the social and human sciences made attempts to import the Shannonian notion of information into their own areas, with uneven success. Many criticized these efforts as simplistic (e.g. Griffiths 2001). This book, as it were, challenges these critiques.

We must acknowledge that a straightforward transposition of Shannon’s definition of information outside of the strict area of coded transmissions may lead to absurd conclusions. Let us consider DNA. This long molecule (just consider that one DNA molecule may be several centimeters long, yet only a few millionths of a millimeter thick!) can be described as a string of pearls. These pearls are of four kinds, often represented by the four letters A, T, G and C (see Chapter 2). They are molecules that can bind to each other without preference for any order. A human chromosome may hold some 200 million of them. How much information is there in one chromosome? Here, a computation based on Shannon’s concepts would give an absurd result: 200 million times the information of one single “pearl” (a choice among four possibilities, i.e. 2 bits), that is about 50 megabytes. Why is this number absurd? Because it does not take into account the meaning contained in that DNA. It does not make any quantitative difference between that DNA and a random DNA sequence of the same length. Such a random sequence would have no chance of creating an organized being, whereas the information in our DNA was used to create us and can still be used to create our children. Intuitively, we can say that the human DNA contains information, whereas the random DNA sequence is mere noise and, with all due respect to Shannon, it does not contain any information whatsoever.
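
The arithmetic behind that absurd figure is straightforward; here is a minimal sketch of it (the chromosome size is the order of magnitude used above):

```python
# Naive Shannon-style count: each base (A, T, G or C) is a choice among 4, i.e. 2 bits.
bases_per_chromosome = 200_000_000          # order of magnitude used in the text
bits = bases_per_chromosome * 2
megabytes = bits / 8 / 1_000_000
print(megabytes)  # 50.0 -- the same figure as for a random sequence of equal length
```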

The gap between the intuitive notion of information and Shannon’s definition widens when one takes average information, also called entropy, into account. Entropy is defined as the average value of information over all observed events (Delahaye 1994). Entropy is maximal when any redundancy has been eliminated. This makes sense. Any form of redundancy in a series of messages provides indications about what will come next in that message, which diminishes the amount of information that can be expected from it. For instance, in this chapter, the character string “infor” is almost always followed by “mation” and conversely “mation” is almost always preceded by “infor”. The presence of one string makes the occurrence of the other one certain. If we dropped the “mation” suffix, the whole text would provide the same information with fewer characters, and the average information per character, i.e. the entropy, would increase. This concept of entropy is understandably crucial in telecommunication technologies, as redundancy elimination leads to more efficient use of transmission channels. But again, when the concept of entropy is used to measure the average information content of a DNA sequence or of a book, seen as symbol sequences, it goes against intuition. A random sequence has maximal entropy, since it is unlikely to contain redundant sequences. By contrast, meaningful sequences, the DNA of a human being or the string of characters found in this book, include numerous redundancies such as the correlation between “infor” and “mation”. When transposing the concept of entropy to this type of sequence, one reaches the absurd conclusion that they contain less information than random sequences of the same length.
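
This last point can be checked empirically. As an illustrative sketch (the strings and the simple letter-frequency estimate are our own stand-ins), a redundant text has a lower per-character entropy than a random string of the same length:

```python
import math
import random
import string
from collections import Counter

def entropy_bits_per_char(text: str) -> float:
    """Order-0 empirical entropy: average information per character, in bits."""
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in Counter(text).values())

redundant = "information " * 200                      # highly redundant text
rand_text = "".join(random.choice(string.ascii_lowercase + " ")
                    for _ in range(len(redundant)))   # random string of the same length

print(entropy_bits_per_char(redundant))  # low: few distinct, predictable characters
print(entropy_bits_per_char(rand_text))  # close to log2(27) ~ 4.75 bits, the maximum here
```

This letter-by-letter estimate ignores sequential redundancy (such as “infor” announcing “mation”); taking it into account would widen the gap further.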

What can we do if the standard definition of “information” fails so blatantly? First, we will observe that Shannon’s principle of suppressing redundancy is worth pursuing. It is pushed to the extreme in the algorithmic theory of information. Second, we will reconsider Shannon’s other idea that links information and surprise. This will lead us to a definition of information that can be applied to various contexts of interest.

4.2. Information measured by complexity

In the 1960s, three mathematicians, Ray Solomonoff, Andrei Kolmogorov and Gregory Chaitin, had the same idea independently. It consists of measuring the information content of an object by the size of its most concise exact summary (Delahaye 1994). This notion is called Kolmogorov complexity. A periodic string such as a-b-c-d-e-a-b-c-d-e-a-b-c-d-e-… is not complex at all, even if it is several thousand letters long. It can be summarized by noticing that it results from the repetition of the pattern a-b-c-d-e. This pattern can itself be summarized by taking the first five letters of the alphabet; this is much more concise than having to specify five letters one by one among 26. Similarly, the famous number π is not complex at all. It can be reduced to a simple series: π = 4 × (1 – 1/3 + 1/5 – 1/7 + 1/9 – …). To retrieve π, we just need to form the series of alternating inverses of the odd numbers and then multiply its sum by 4. It is somewhat simpler than enumerating the sequence of digits of π: 3-1-4-1-5-9-2-6-5-3-5-8-9-7-9-3-2-3-8-4-6-2-6-4-3-3-8-3-2-7-9-5… up to infinity. This approach to information is called algorithmic information theory, as the best summary of an object can be represented as the shortest algorithm (or computer program) that can generate that object.
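
To make the “shortest program” reading of this idea concrete, here is a sketch (illustrative programs, with no claim of being truly minimal): both objects of the example can be regenerated by a couple of lines, which is what keeps their Kolmogorov complexity low.

```python
# A short program regenerating the periodic string: thousands of letters, one tiny recipe.
periodic = "abcde" * 3000

# A short program approximating pi from the series pi = 4 * (1 - 1/3 + 1/5 - 1/7 + ...).
def pi_leibniz(n_terms: int) -> float:
    return 4 * sum((-1) ** k / (2 * k + 1) for k in range(n_terms))

print(periodic[:20])          # abcdeabcdeabcdeabcde
print(pi_leibniz(1_000_000))  # 3.1415916...; the series converges slowly, but the recipe stays short
```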

This definition of information is in line with Shannon’s idea of redundancy suppression, while carrying it to the extreme. Obviously, a very redundant series, such as a repetitive series, contains little information. It is also clear that the redundancy present in an object makes no contribution to its information content. By keeping only the gist of the object, Kolmogorov complexity measures useful information. The observer who perceives the object tries to summarize it to the smallest possible size and infers from the result the quantity of information that the object contains. Information, in that sense, results from an operation of compression. People who use a computer know, for instance, that they can compress data. The most popular program to do this is certainly “Zip”. Images are generally compressed before being stored or transmitted. Create an image with painting software and then save it in a pixel format such as “gif”. If your image is all white with only three black spots, the saved file will be much smaller than if your image is complex, with many intertwined points and lines. Even if the “gif” compression is far from being optimal, it illustrates the fact that the second image contains more information, in the sense of Kolmogorov, than the first image, which is simpler.
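
The same experiment can be run on character strings; here is a rough sketch in which zlib (the library behind “Zip”-style compression) stands in for an ideal compressor:

```python
import os
import zlib

redundant = b"abcde" * 20_000       # 100,000 very repetitive bytes: the nearly white image
random_ish = os.urandom(100_000)    # 100,000 random bytes: the image full of intertwined points

print(len(zlib.compress(redundant, 9)))   # a few hundred bytes: huge compression
print(len(zlib.compress(random_ish, 9)))  # about 100,000 bytes: essentially no compression
```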

Contrary to Shannon’s definition of information, the definition based on Kolmogorov complexity does not rely on previous knowledge of probability values. It does not require considering alternatives. It can be applied to unique objects, and not only to emitters that produce events repetitively with certain probabilities. No prior agreement is needed between an emitter and a receiver about the set of symbols that will be transmitted. This is good news if one wishes to consider data such as DNA, as no one can say what is relevant to observe in that long molecule. The measure of complexity is not bound to focus on the occurrence of the bases A, T, G, C or on codons, genes or gene groups. Any feature is good as long as it contributes to compressing the object (even if there is never any guarantee that optimal compression has been achieved). The algorithmic definition of information can even, in theory, get rid of any idea not only of emitters, but also of observers. Suffice it to say that all observers are equivalent: if one of them is able to summarize the object using an algorithm, that algorithm can be explained to any other observer. The algorithmic notion of information seems to reach an ideal status of objectivity.

Unfortunately, the algorithmic definition of information does not always match intuition. Randomness is the ultimate complexity. It is precisely this property that motivated Kolmogorov, Chaitin and others in their discovery of the concept of complexity. There is a perfect equivalence between being “random” and being “incompressible”. But this beautiful correspondence is problematic in our context, as we fall back into the flaw mentioned about entropy: a random DNA molecule would be more complex than a human DNA molecule of the same length. By definition, the random molecule contains no redundancy that could be eliminated, leading to a shorter summary; it is maximally complex (note that the summary cannot merely mention the fact that the molecule is random, as the molecule has to be recoverable from the summary) (Delahaye 1994; Gaucherel 2014). On the other hand, any human DNA includes redundancies that would make a summary significantly shorter. We are back to square one: neither Shannon’s definition nor the algorithmic complexity definition provides the intuition-matching notion of information that we need.

4.3. Information as organized complexity

Useful information seems to float between two extremes: extreme redundancy, as in a repetitive sequence, and extreme disorder, as in a random sequence. Shannon’s entropy and Kolmogorov complexity are perfect tools to measure information by assessing the amount of redundancy. They are, however, unable to provide any safeguard against randomness.

Some authors suggested relying on another notion based on complexity and known as logical depth. This notion, introduced by Charles Bennett, aims at capturing the organized part of complexity that is neither too simple nor trivially random (Bennett 1988). Bennett works at IBM Research. His idea is to consider not only the most concise summary of a given object, but also the amount of time needed to reconstruct the object from that summary. The idea is appealing. This amount of time is small precisely for objects that our intuition considers devoid of information. A repetitive string such as a-b-c-d-e-a-b-c-d-e-a-b-c-d-e-… can be summarized by describing the repeated pattern. It is not deep in Bennett’s sense, as it can be quickly retrieved by copying the pattern. At the other extreme, a random sequence such as t-o-z-r-n-i-g-n-z-x-o-t-g-m-y-… cannot be summarized. In other words, it constitutes its own summary, from which it is retrieved instantly; it is not deep either. Between these two extremes, the string of letters that forms the present book is somewhat deeper in Bennett’s sense: it can be compressed by eliminating its redundancy (without loss of information), but the converse operation may take some time.

Quite understandably, some authors such as Jean-Paul Delahaye are tempted to use logical depth to measure the information “content” of living beings (Delahaye 1994; Dessalles 2010). It is clear that our DNA is a good summary of a significant part of us. This molecule includes most of what twins have in common, which is quite something. The way people look as individuals can be “compressed” to such a large extent that their DNA, represented as a string, can be stored on a mere CD-ROM. For Delahaye, humans are deep in Bennett’s sense, because it is impossible to construct them quickly from their DNA. The only known way is to perform all the operations that occur during embryogenesis.

The notion of logical depth captures another form of complexity, i.e. “organized” complexity. Engineers can probably think of a procedure to assemble a car in such a way that its different parts are built up in parallel. The architecture of a car is not that deep. A biological being, by contrast, has to be functional at each step of its development. This excludes the possibility of it being assembled from separately prefabricated pieces. Its organization is as complex as that of a computer processor, which requires hundreds of operations to be performed in a strict order.

Kolmogorov complexity corresponds to the size of the object after it has been compressed. Logical depth, or organized complexity, takes decompression time into account. The latter notion is interesting and has not yet been fully explored. One of its drawbacks in our context, however, is that it breaks the link, introduced by Shannon, between information and surprise or, in other terms, between information and low probability. In reality, the same objection can be addressed to Kolmogorov complexity. It is, however, easy to reconcile Shannon’s information and Kolmogorov complexity, as we will see in what follows.

4.4. Information as compression

Shannon’s idea was to define information from the receiver’s perspective in a transmission scenario. The receiver detects signals, considers their a priori probability and deduces the amount of information that has been transmitted. This approach proves insufficient in our context. To keep the idea that information measures a certain amount of surprise, we need two points of view. Surprise results from a disparity: it corresponds to the gap between expectation and observation. Information is not an absolute quantity, but a difference between two values.

If we apply this principle to the algorithmic approach to information, absolute complexity does not matter. What matters is the difference between two complexity values, measured from two different points of view. This synthesis of Shannon’s idea of surprise and Kolmogorov complexity leads to a new definition of information that will prove useful in many of the contexts that we will consider (Dessalles 2013). Kolmogorov complexity measures the end result of an ideal compression; we will measure information by the amplitude of that compression. Let us consider a few examples.
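
Stated as a formula, and as a minimal sketch of our own (the function name is not the book’s), information is the number of description bits saved between the two viewpoints:

```python
def information_bits(complexity_before: float, complexity_after: float) -> float:
    """Information as a complexity drop: description bits saved by the observation."""
    return complexity_before - complexity_after

# A situation expected to require 10 bits to describe, but describable in 3 bits afterwards:
print(information_bits(10, 3))  # 7 bits of information
```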

A first illustration of the principle that information is compression is offered by what we, humans, regard as interesting. For human beings, information coincides with what elicits their interest (see Chapter 1). Suppose that a fire broke out. This is an event, i.e. a unique situation that can be distinguished from any other by its date, its location, the people involved and what took place. With this definition, every event is certainly unique. Among the hundreds of daily experiences, very few are interesting, narratable events. To be interesting, an event must not only be unique (which it obviously is), but it must also be peculiar. A situation is peculiar if it is unique for a simple reason. We hear quite often about fires in the news. The typical place for a blaze is complex: such and such district in Paris, such and such hotel in a town we do not exactly know. If the news tells us that the blaze occurred in the Eiffel Tower, as was the case on July 22, 2003, the location turns out to be simpler than expected. The event is easily characterized, at least more easily than expected. The difference corresponds to a compression, which generates information. The same thing applies if the blaze occurred in a celebrity’s property or near your home. In each case, the simplicity of the place leads to a drop in complexity, i.e. compression. The event becomes peculiar.

On October 16, 2010, the Israeli lottery announced the following draw: 13-14-26-32-33-36. Most people did not play, so they did not detect any information there. Yet, exactly the same combination of numbers had been drawn 3 weeks earlier, on September 21. The news took on formidable importance and was even reported in foreign newspapers. This makes sense if one realizes that the event offers significant compression. Usually, the minimal designation of the draw of the day requires enumerating the six drawn numbers. There is no compression. But on that day, a much more concise description was available. It consisted of indicating a rank in the list of past draws. Only one number is required, instead of six, so five numbers are spared. Once converted into bits, the difference measures the value of the information (if one ignores the cost of designating the concepts required from the context: lottery + Israel + date). The same principle of compression is at work to explain why a draw like 1-2-3-4-5-6, if it ever occurred, would be fantastic news even for those who did not play.
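
A back-of-the-envelope version of that conversion into bits (the range of the lottery numbers and the size of the remembered history of draws are our assumptions, used only to fix orders of magnitude):

```python
import math

# Ordinary description of a draw: six numbers, each assumed to be picked from 1..37.
bits_ordinary = 6 * math.log2(37)      # ~31.3 bits

# Description by rank in the list of, say, the 64 most recent remembered draws.
bits_by_rank = math.log2(64)           # 6 bits

print(bits_ordinary - bits_by_rank)    # ~25 bits of compression: the news value of the repeated draw
```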

The previous examples are about peculiar situations with nothing at stake for the observer. An expression like “the value of information” is ambiguous. If you hope for a bonus of €10,000 that only two candidates can receive, the information that you got it is worth only 1 bit, while the stakes are measured in thousands of euros. Here, we will restrict the word “information” to its meaning as a compression or complexity drop, ignoring what is at stake. Note that it is crucial for journalists to include the stakes in order to anticipate the emotional impact of reported events on the readership. To do so, compression can be converted into probability (each bit corresponds to flipping a coin, i.e. probability 1/2; two bits, two coin tosses, probability 1/4; and so on), and then multiplied by the stakes to get the emotional intensity of the event.
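
Taken literally, that conversion can be sketched as follows (we simply transcribe the sentence above, using the bonus example; the resulting number is only meant as an illustration):

```python
# Each bit of compression corresponds to one coin toss, i.e. a factor of 1/2 in probability.
def bits_to_probability(bits: float) -> float:
    return 2 ** -bits

stakes_eur = 10_000      # the bonus at stake
info_bits = 1            # two equally likely candidates: 1 bit of information
print(bits_to_probability(info_bits) * stakes_eur)  # 5000.0: modest information, large stakes
```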

Could we transpose the definition information = compression to contexts that are not specific to human communication? When an event is perceived by human beings, they expect a level of complexity that depends on their knowledge of the world, for instance the complexity of producing a lottery draw, and they observe a lower level of complexity after the fact. Information can be measured by the difference between these two complexity levels. More generally, any information generates a complexity drop relative to a certain aspect of the world for a given observer. What about animals?

Let us see, for instance, whether the definition applies to the bee dance (see Chapter 1). A worker bee that wakes up and decides to go foraging must first choose which direction to fly in. Her choice is complex: if she can choose among about 100 different directions, the complexity of her decision amounts to 7 bits. This corresponds to seven tosses of a coin, because with seven successive binary choices, one can reach up to 128 possibilities. If, instead of picking a flight direction randomly, she decides to read the direction from her sister’s dance, the complexity of her decision drops down to zero: the flight direction is included in the dance, and there is nothing left to do to determine it. Our bee got an amount of information that corresponds to the complexity drop, here 7 bits. If she hesitates between two dancers, her decision still requires 1 bit. The amount of compression is now only 6 bits. Note that there is indeed a double perspective: before and after consulting the dancer.
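
The bee’s arithmetic, sketched with the figures used above (the text rounds log2(100) up to 7 bits, since 2^7 = 128 covers the 100 directions):

```python
import math

directions = 100
complexity_before = math.log2(directions)   # ~6.6 bits, rounded to 7 in the text

# Reading the direction from a single dancer: nothing left to decide.
print(complexity_before - 0)                # ~7 bits of information

# Hesitating between two dancers: one binary choice remains.
print(complexity_before - math.log2(2))     # ~6 bits of compression
```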

Can we use the complexity drop notion to characterize the amount of information contained in the human DNA? It all depends on the (double) point of view adopted. One may initially see only a random sequence in that DNA, and then observe that it is nothing like random, as it can guide the synthesis of many molecules, including the 20,000 different proteins that our cells may contain. The corresponding complexity drop gives one measure of the information included in our DNA, a value of up to 750 megabytes. But we can adopt a totally different pair of viewpoints. The investigator who finds DNA on a crime scene may have no idea about its owner. They must discriminate among 6 or 7 billion possible individuals, which gives a complexity of 33 bits (if they think that the suspect must be male and French, 25 bits are still needed to determine him). Once the DNA has been analyzed and the culprit is identified, complexity drops down to 0. For the investigator, the amount of information contained in the DNA amounts to 33 bits (or 25 bits in the French case), much less than for the biologist. Note that the requirement of a complexity drop confers zero information on a random DNA sequence, as intuition dictates. This problem, which proved insurmountable when adopting Shannon’s definition or the classical complexity definition, does not exist anymore.
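
The two pairs of viewpoints can be put side by side in a short sketch (the genome size and the size of the French male population are rough figures of our own, chosen to reproduce the values quoted above):

```python
import math

# Biologist's pair of viewpoints: from "random-looking sequence" to "functional genome".
genome_bases = 3_000_000_000              # rough size of the human genome
biologist_drop_bits = genome_bases * 2    # up to 2 bits recovered per base
print(biologist_drop_bits / 8 / 1e6)      # ~750 megabytes

# Investigator's pair of viewpoints: from "one person among billions" to "this very person".
print(math.log2(7_000_000_000))           # ~32.7 bits, rounded to 33
print(math.log2(33_000_000))              # ~25 bits if the suspect is a French male
```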

Information, according to the definition considered here, depends on an adjustment taking place in the observer. The observed situation is now more compact than before (Dessalles 2013). This applies to an observer who considers a situation and can now see a structure that went unnoticed before. This structure provides information, inasmuch as it offers a more concise description of the situation. According to Gregory Chaitin’s famous aphorism, “comprehension is compression”. Note that within this logic, a scientist who makes sense of a phenomenon creates information.

4.5. Coding and information reading

As users of recent technologies, we have no problem associating information not only with the network on which it circulates, but also with the devices on which it is stored. The hard disk in our computer contains what we regard as information. It contains, for instance, the purchase date of the TV set that just broke down and that is perhaps still under guarantee. Information must be stored or it disappears. If information, as claimed before, corresponds to a complexity drop, what is stored in memory?

According to the definition, information only exists in the eye of the observer who interprets it. Saying that information is stored is therefore a misuse of language. The permanent material medium on which memory is recorded, be it silicon, synapses or DNA, is no more than a precondition of information. Information only exists at the time of interpretation. Stored data must be read for information to be produced. In addition, the reading has to generate simplification for the entity that performs the reading. In many cases, interpretation amounts to mere decoding, but this is not always the case.

A reading device can detect the signs that have been recorded on some material substrate (paper, silicon, magnetic layer, bumps on an optical disc, synapses, DNA molecule, etc.). These signs only become information when the device is able to decode them. Your visual system perceives ink marks on the paper; it is able to transform them into letters, then into words and sentences. At each step, decoding occurs and information is created. A decoder is like the foraging bee of our example. It expects to read one printed character among 100 (including upper and lower case, digits, accents and punctuation). The complexity of its decision amounts to 7 bits. Once it has recognized a character, complexity drops by 7 bits. Seven bits of information have been created. A Chinese reader unfamiliar with the Latin alphabet would not have created this information. Written signs constitute potential information. Its amount can be quantified by the agent who wrote them down on the material substrate. These signs become actual information only when they are read and decoded.
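
The decoder’s arithmetic, as a sketch (again rounding log2(100) up to 7 bits, as the text does):

```python
import math

repertoire = 100   # upper and lower case letters, digits, accents, punctuation
print(math.log2(repertoire))  # ~6.6 bits per recognized character, rounded to 7 in the text
# A reader who cannot decode the signs (an unfamiliar alphabet, say) sees no complexity drop: 0 bits.
```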

Codes establish a correspondence, known in advance by the observer, between two domains: a sign space and a meaning space (see Chapter 1). A reader who understands English can decode characters, words and sentences from the present text because she knows the alphabet, the lexicon and the grammar of this language. Another example in which several codes are at work is offered by molecular biology. DNA elements (the base pairs), once transcribed into RNA, are read three by three as codons (Chapter 2). The ribosome and its adjuncts that read the RNA operate as a decoder (Rana and Ankri 2016). At each step, 21 possibilities are offered, as the repertoire of meanings includes 20 amino acids and one “stop” instruction. The ribosome that selects the right t-RNA sees complexity drop from some 4.4 bits to 0 bits. Each codon brings more than 4 bits of information. The decoding process does not stop there. If we draw an analogy with language, DNA base pairs correspond to phonemes and amino acids to words. Words are assembled to form sentences; likewise, amino acids are assembled to form proteins. Sentences build up discourse; likewise, proteins build up biological structures (for instance the cytoskeleton of our cells) or chemical reaction cycles (for instance the Krebs cycle that allows our cells to generate energy). Biological combining goes further up, as it includes cells, organs, organisms, societies, species and ecosystems.
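
The ribosome figure quoted here, as a one-line check:

```python
import math

meanings = 20 + 1           # 20 amino acids plus the "stop" instruction
print(math.log2(meanings))  # ~4.39 bits: the "some 4.4 bits" that drop to 0 at each codon reading
```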

These two interpretation hierarchies, the linguistic and the biological ones, use combinatorial codes (Chapter 1). The signs at work at one level result from a combination of meanings decoded at the level below. Sentences are combinations of words, and proteins are combinations of amino acids. If one draws the analogy between language and biology further up, one is confronted with the issue of the nature of meaning. When we read a text, we give meaning to words, then to sentences, then to discourse. At which step of their development do the molecules get their meaning?

In principle, any information has meaning. In the case of language, however, the word “meaning” is used for the upper layers of interpretation, where we build images and attitudes (Tournebize and Gaucherel 2016). The meaning of a sentence like “the carpet is gray” is not yet generated when the letters c-a-r-p-e-t are put together and the word “carpet” is recognized. Meaning only begins to appear when we form a perceptive representation of the sentence, when we figure out which carpet is concerned and when we form an attitude: disappointment, satisfaction or surprise (we asked for a red carpet; gray carpets do not show the dirt; all carpets in the building are red, except this one). Note that at these upper levels, meaning is not decoded, but computed. Information does not result from mere systematic matching, but requires interpretation and context.

By analogy with language, we may use the word “meaning” to refer to situations in which information requires interpretation to be produced. This means that the reading device, when producing information, carries out non-trivial computations that take context into account. Biology provides a variety of examples in which talking about “meaning” is not far-fetched. When biologists began to understand the genetic code, they naturally thought that protein synthesis consisted of mere systematic matching with chunks of DNA: one gene equals one protein. Reality, in eukaryotes, is quite different. Eukaryotic genes are not decoded. They are interpreted by the cell machinery (see Chapter 2). In its simple form, the mechanism goes like this: a gene is transcribed into an mRNA, which is itself translated into a protein. Between these two phases, however, the mRNA may undergo a variety of splicing and editing operations that depend on the presence of other molecules in the cell nucleus, in particular small specialized RNA molecules that themselves result from the transcription of other regions of the DNA (Chapter 2).

Because of this complex machinery that depends on context, we can say that the cell interprets genes and gives them meaning. In context A, a gene is read in a certain way and gives rise to the synthesis of a protein, P1, which is likely to be appropriate to that context. In a different context B, the gene will be interpreted in another way and will produce another protein, P2, which is adapted to B, since the cell evolved for this. The meaning of the gene may be P1 or P2, depending on the context. A sentence such as “the flower is in the book” may mean in a certain context that the flower is drawn on a page of the book or, in another context, that a dried flower has been inserted between two pages of the book. The cell’s genetic machinery in eukaryotes interprets genes much as we interpret the sentences of our language.

The genetic reading system has a unique feature: it is in a situation in which it decodes itself. Molecules that support the interpretation of the genome, for instance the molecules that make up the ribosomes, are themselves coded in the genome, and the corresponding genes must be interpreted by ribosomes for these molecules to exist. This circularity is nothing shocking, conceptually. Computer science offers an analogous situation. Typical computer programs must be compiled by a specialized program, called a compiler, to be executed. If the program is written in Java, one needs a Java compiler. But since the compiler is itself a computer program, nothing prevents us from writing it in Java. Such a compiler can compile any Java program, including the very source code that brought it into existence. It is a case of self-interpretation, as in the case of the ribosome that interprets genetic instructions and generates a new, fully identical, ribosome. In the compiler case, however, the self-interpretation is not material by nature. The only hardware is provided by the computer circuits, which remain unchanged. The biological machine is the only one known to interpret itself, including at the hardware level. If one day we ever produce robots that, using soldering irons and pliers, can make many things including material copies of themselves, the situation will have changed.
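
A toy computational analogue of such self-reference (our own illustration, far more modest than a self-hosting compiler) is a quine: a program whose output is its own complete source code.

```python
# A Python quine: the output of the two lines below is exactly those two lines.
s = 's = %r\nprint(s %% s)'
print(s % s)
```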

4.6. Memory

By decoding information, observers lower the complexity of their choice or of what is to be remembered. In this sense, information leads to compression. An entity that requires no information to work is a simple automaton that accomplishes only one task. As a whole, a bacterium looks at first sight like such a simple automaton, as it manages to duplicate itself in a variety of environments, with no significant change in the result. Yet, like our computers, and even like human individuals who build up their beliefs, the bacterium has potentialities that go way beyond what it eventually achieves. We can verify this easily nowadays: we ask bacteria to produce proteins that are unknown in the bacterial world, such as human insulin, and they do it willingly. For an engineer, the normal functioning of a bacterium looks like a fantastic waste, as if an ultra-powerful computer were used for one single deterministic task, such as always printing the same sentence.

This impression is wrong and results from an illusion. We see the (simplified) bacterium in our example as an informationally closed system, because it does not seem to extract much information from its environment during its life. But even in its standard functioning, which seems so mechanical, the bacterium gets information from its genome. This information was transmitted to it by its mother bacterium (we ignore here all the other genetic transfers that operate in the bacterial world). Memory is an information transfer that operates through time rather than through space (Suddendorf and Corballis 2007). Our genes are messages that were sent to us by our distant ancestors and that we are sending to our descendants, if we choose to have any.

Human beings, as compared to other primates, have overdeveloped memory capacities. The major part of their disproportionate cortex is not devoted to making elaborate computations. With a few exceptions, mostly linked to language, the kinds of computations that our brain carries out are similar to those that a chimpanzee brain performs. The most complex ones are probably associated with processes involved in shape recognition in dynamic environments. The reason why our cerebral mass has increased threefold since our last common ancestor with chimpanzees has more to do with the storage function. Our episodic memory is able to store thousands of life experiences in great detail. We store events that we regard as information, i.e. peculiar situations. Most of these situations are futile from a biological point of view, which means that they have no consequence regarding our life expectancy or our reproduction. Moreover, they are so peculiar that they are highly unlikely to ever occur again. It is even their singularity that induced their memorization. Why is it so? The primary objective of episodic memory is not to store a huge repertoire of specific situations from which we would draw at each moment to pick appropriate behavior (Suddendorf and Corballis 2007). Animals do without such a system, which, besides, would be rather inefficient. Most of the time, our daily actions do not result from merely copying past models of action that we would have memorized in order to reproduce them. The primary function of episodic memory is quite different. This human speciality has to do with the narrative function of language (Dessalles 2007a). If we can remember so many peculiar situations, it is to retell them to others during the innumerable story rounds that fill up a good part of our daily language activities. The function of our memory is to make a delayed transmission of information possible. When we experience a peculiar event, our instinctive reaction is to draw attention to it or to tell others about it. This leads our interlocutors to experience for themselves the complexity drop attached to the peculiar character of the event. Our episodic memory is no more than a go-between that allows us to reach interlocutors further away in time.

Similarly, the genetic material in our cells or in viruses constitutes a delayed information transmission. Because of this information, the cell machinery can reduce the complexity of its choices. The cellular interpretation device is ultra-powerful. Our machines did not need much information to work until the computer era. The cell machinery is like our computers. It can do “everything”. It uses the information it gets through time from genetic memory to make its choices considerably less complex. Without the possibility of memorizing information, life would be confined to the temporality of dynamic systems. The material recording of information, in texts, in neurons, in molecules or in the physical environment, opens up new possibilities for living beings to travel through time.