So much of life, it seems to me, is determined by pure randomness.
– Sidney Poitier
Many things that happen in the world seem utterly unpredictable. We talk about ‘acts of God’, ‘being in the wrong place at the wrong time’, or ‘pure luck’. Serendipity and good or bad fortune seem to dictate so much of what goes on around us. Thanks to maths, though, we have a tool to see through this fog of apparent turmoil to make out some order in what otherwise appears a riot of randomness.
Thoroughly shuffle a deck of cards and the chances are that you’ve just done something unique. Almost certainly, no one in the history of the world has ever come up with the deck arranged in that particular order before. The reason’s simple: 52 different cards can be arranged in 52 × 51 × 50 × 49 × … × 3 × 2 × 1 ways. That’s a grand total of about 8 × 10⁶⁷, or 80 million trillion trillion trillion trillion trillion, different orderings of the cards. If all the people presently alive were to have shuffled a card deck once every second since the universe began, that would amount to only about 3 × 10²⁷ shuffles, which is an incredibly tiny number by comparison.
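For anyone who wants to check the arithmetic, a few lines of Python will do it. This is only a sketch: the population and age-of-the-universe figures below are rough assumptions, not values from the text.

```python
import math

orderings = math.factorial(52)             # number of ways to arrange 52 cards
print(f"52! = {orderings:.2e}")            # about 8.07e67

people = 8e9                               # rough current world population (assumption)
seconds = 13.8e9 * 365.25 * 24 * 3600      # approximate age of the universe in seconds
shuffles = people * seconds
print(f"shuffles since the Big Bang: {shuffles:.2e}")          # about 3.5e27
print(f"fraction of all orderings explored: {shuffles / orderings:.1e}")
```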
Yet, there have been claims of decks being shuffled and coming out in exactly the order they started when new. This is actually much more likely than the odds of 1 in 8 × 10⁶⁷ of getting any other particular ordering. When first taken out of its wrapper, a card deck has all the suits, hearts, clubs, diamonds, and spades (though not necessarily in that order), arranged ace, two, three, … , jack, queen, king. If the dealer is so expert as to be able to riffle shuffle without a mistake – splitting the deck in two and exactly interleaving the cards together – the pack can end up back where it was after just eight perfect shuffles. That’s why casinos often use a child’s approach to shuffling with a brand new deck, known as ‘washing the deck’, in which the cards are just spread on the table and swished around willy-nilly for a while. To get a similar level of disorder would take at least seven good but imperfect riffle shuffles. The outcome would then be pretty random; in other words, shown any one card in the deck, the odds of being able to predict the next card, using any fair means available, would be very close to 1 in 51. But would the deck be truly random? What is randomness, and is it ever possible to have something that’s completely random?
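Here is a minimal sketch, in Python, of the perfect ‘out’ riffle shuffle described above, the version that keeps the top card on top, confirming that eight of them restore a 52-card deck to its original order:

```python
def out_shuffle(deck):
    """Split the deck in half and interleave perfectly, keeping the top card on top."""
    half = len(deck) // 2
    top, bottom = deck[:half], deck[half:]
    shuffled = []
    for a, b in zip(top, bottom):
        shuffled += [a, b]
    return shuffled

deck = list(range(52))          # 0 = original top card, 51 = original bottom card
current = deck
for i in range(1, 9):
    current = out_shuffle(current)
    if current == deck:
        print(f"Deck back in its original order after {i} perfect shuffles")   # prints 8
```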
The notion of randomness, or total unpredictability, has been around as long as civilisation and probably much longer. Flipped coins and rolled dice most obviously spring to mind as ways we commonly use today to ‘randomly’ decide outcomes. Back in ancient Greece, they tossed astragali, or the knucklebones of goats and sheep, in their gambling games. Later they also used regularly shaped dice, though where dice first came from isn’t known for sure. The Egyptians are thought to have used dice in their game of Senet, five thousand years ago. The Rigveda, a Vedic Sanskrit text dating back to about 1500 BC, also mentions dice, and actual dice games have been found in a Mesopotamian tomb dating back to the twenty-fourth century BC. Greek tessera were cubic and had numbers on each side from 1 to 6, but it was only in Roman times that dice like those we use today, in which values on opposite sides add up to seven, first appeared.
It took a long time for randomness to catch the attention of mathematicians. Before that it was mainly thought to be the province of religion. In both Eastern and Western philosophies the outcome of many events was thought to be in the lap of the gods or some equivalent supernatural force. From China came the I Ching (‘Classic of Changes’), a system of divination rooted in the interpretation of 64 different hexagrams. Some Christians based their decision making on the rather simpler method of drawing straws from inside a Bible. Fascinating though these early beliefs were they had the unfortunate effect of greatly delaying any rational attempts to come to grips with randomness. After all, if eventualities are determined ultimately at some level beyond human comprehension, why bother trying to analyse, logically, why anything happens the way it does? Why try to figure out if there are natural laws that govern the probability of outcomes?
It’s hard to believe that those who used astragali or dice, in ancient Greek or Roman times, didn’t have at least some intuitive feel for the likelihood of certain outcomes. Usually, where money or other material gain is concerned, gamblers and other interested parties quickly catch on to the fine detail of the games they play. So, it seems likely that an intuitive appreciation of odds goes back millennia. But the academic study of randomness and probability had to wait until the seventeenth century and the late Renaissance to take off. Spearheading the breakthroughs at this time were the French mathematician and philosopher Blaise Pascal, who was also a devout Jansenist, and his compatriot Pierre de Fermat. These two great thinkers tackled a problem that, in simplified form, can be put like this: suppose two people are playing a coin-tossing game where the first person to get three points wins a pot of money. The game is interrupted with one person leading by two points to one. If the pot is handed out at this stage, what is the fairest allocation? Before Pascal and Fermat, others had thought about this and come up with a variety of possible solutions. Maybe the pot should be divided evenly, since the game was stopped partway and the eventual outcome couldn’t be known. But this seemed unfair to the person with two points, who should surely get some credit for being ahead. On the other hand, another suggestion, to give the whole pot to the person in the lead, looked unfair on the opponent with one point, who would still have had a chance of winning had the game gone on. A third possibility might be to divide the pot based on the number of points gained, so that the player with two points would get two thirds of the prize and the opponent one third. On the face of it, this seems fair but there’s a problem with it. Suppose the score was 1–0 at the point the game was interrupted. In this case, if the same rule were applied, the person with one point would receive the entire pot, while the other person, who might still have won if the game ran to its intended conclusion, would get nothing.
Pascal and Fermat found a better solution and, at the same time, opened up a new branch of maths. They calculated the probability of each person winning. In order for the person with one point to win, they would have to get two further points in a row, which has a probability of ½ times ½, or ¼. They should, therefore, receive one quarter of the pot. The rest should go to their opponent. Exactly the same method can be applied to any other problem of this type, although, naturally, the calculations can become more complicated.
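The same answer can be checked directly. The short Python sketch below (an illustration, not Pascal and Fermat’s own method) works out the exact winning chances from any score by considering every way the game could continue:

```python
from fractions import Fraction

def win_probability(points_a, points_b, target=3):
    """Exact probability that player A reaches `target` points first,
    given the current score, when each point is decided by a fair coin toss."""
    if points_a == target:
        return Fraction(1)
    if points_b == target:
        return Fraction(0)
    # Each further point is equally likely to go to either player.
    return Fraction(1, 2) * (win_probability(points_a + 1, points_b, target) +
                             win_probability(points_a, points_b + 1, target))

print(win_probability(2, 1))   # 3/4 : the leader's fair share of the pot
print(win_probability(1, 2))   # 1/4 : the trailing player's share
```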
In studying this problem Pascal and Fermat had hit upon a concept known as expected value. In a gambling game, or any situation where chance is involved, the expected value is the average of what you can reasonably hope to gain. For example, suppose you played a game where you rolled a die and won £6 if you rolled a 3. This game has an expected value of £1, because there’s a one in six chance of rolling a 3 and one sixth of the prize money is £1. If you played many times, you’d earn on average £1 for each game played. After playing 1,000 times, for instance, you’d expect to have earned about £1,000 in total, so if you paid £1 before playing each time, you’d end up roughly breaking even. Notice that even though £1 is the expected value, you can never win exactly £1 in a single game. The expected value isn’t necessarily an amount you can actually win in one play; rather, it’s what you’d win on average, per game, over many plays.
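A quick simulation makes the idea concrete. This is a sketch using the same £6 prize and fair die as above; the 100,000-game figure is just an arbitrary large number of trials.

```python
import random

def play_once():
    """Roll a fair die; win £6 if it shows a 3, nothing otherwise."""
    return 6 if random.randint(1, 6) == 3 else 0

games = 100_000
total = sum(play_once() for _ in range(games))
print(f"average winnings per game: £{total / games:.3f}")   # settles close to £1.00
```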
A lottery, generally, has a negative expected value, so that, from a rational point of view, it’s a bad idea to play. (During certain rollovers, depending on the lottery, it may occasionally have a positive expected value.) The same is true of casino games, for an obvious reason: the casino is a business trying to make a profit. Occasionally, though, things can go wrong due to a slight error in calculation. In one instance, a casino changed the pay-out on just one outcome in blackjack, accidentally making the expected value positive, and lost a fortune within a few hours. Casinos depend on an intimate knowledge of the maths of probability theory for their livelihood.
Sometimes coincidences happen that seem so unlikely that people wonder if something funny is going on. A person may win their state lottery twice, or the same numbers may come up in different draws. Often the media jump on such stories and make a big deal out of their seemingly wild improbability. The truth is, however, that most of us aren’t very good at figuring out the likelihood of such events because we start out with some misconceptions. To take the case of someone who won the same lottery twice, it’s natural to personalise the problem and think: what are the chances of ‘me’ winning the lottery twice? Obviously, the answer is fantastically small. However, the rare people who win twice tend to have played regularly over a number of years so that any two wins over that period is less remarkable. More importantly, it has to be borne in mind how many people play the lottery. The vast majority will never win the jackpot once, never mind twice. But with all those people playing it becomes much less astonishing that someone, somewhere will take the prize on two occasions.
This may seem counterintuitive but that’s because we tend to think of it from a personal perspective. Of course, it’s extremely unlikely that you’ll win the jackpot twice. But when considering the probability that someone will, you need to multiply that tiny chance by the number of people who play the lottery, as well as by the number of ways any one of them could win twice (approximately half the square of the number of times they play). Scaled up in this way, the probability that someone, somewhere will scoop the jackpot twice starts to look much more reasonable.
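To get a feel for the numbers, here is a toy calculation in Python. Every figure in it is an illustrative assumption, not real lottery data, but it shows how the ‘someone, somewhere’ probability dwarfs the personal one:

```python
from math import comb

# Purely illustrative assumptions:
p = 1 / 3_000_000          # chance of a jackpot per ticket (a smallish lottery)
plays = 1_000              # draws entered by one habitual player over many years
players = 10_000_000       # number of such habitual players

# A named player wins two particular draws with probability p * p, and there are
# comb(plays, 2), roughly plays**2 / 2, ways of choosing which two draws those are.
one_player_twice = comb(plays, 2) * p**2
expected_double_winners = players * one_player_twice

print(f"chance a given habitual player ever wins twice: {one_player_twice:.2e}")
print(f"expected number of double winners somewhere:    {expected_double_winners:.2f}")
# Summed over the world's many lotteries, games and decades, the expected count
# of double winners climbs well above one, even though each individual's chance
# remains vanishingly small.
```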
This misjudgement of odds based on a failure to consider all the possibilities for an event to happen also underlies the so-called ‘birthday paradox’, which is really not a paradox at all. With 23 people in a room, the odds that two of them will have the same birthday are better than 50-50. It seems that the chances ought to be much smaller than that. You might argue that if it only takes 23 people to find a match, we should all know at least several people who share our birthday, whereas it’s always surprising when it happens. But the birthday paradox doesn’t ask what are the chances of any one person in the room (you, for instance) finding a birthday match, but of any two people being born on the same date. In other words, the question is not what are the odds of a particular pair of people sharing a birthday, but of any pair of people, from all the different possible pairings, being birthday buddies. The odds of this are 1 – (365/365 × 364/365 × 363/365 × … × 343/365) = 0.507, or 50.7%. With 60 people in a group the odds of a birthday match climb to more than 99%. In contrast, for there to be a 50 percent chance of someone having the same birthday as you, 253 other people would need to be present.
One reason this might seem unintuitive is that we tend to conflate the two separate questions. Most people don’t know 253 people well enough to know their birthdays, so it seems unlikely that someone would randomly share a birthday with them, yet this doesn’t mean that it’s as unlikely for two other people to share their own birthdays.
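Both questions are easy to compute directly. A short sketch, assuming 365 equally likely birthdays and ignoring leap years:

```python
def prob_shared_birthday(n):
    """Probability that at least two of n people share a birthday."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (365 - i) / 365
    return 1 - p_all_distinct

def prob_matches_you(n):
    """Probability that at least one of n other people shares *your* birthday."""
    return 1 - (364 / 365) ** n

print(f"{prob_shared_birthday(23):.3f}")   # about 0.507
print(f"{prob_shared_birthday(60):.3f}")   # above 0.99
print(f"{prob_matches_you(253):.3f}")      # about 0.500
```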
Not only can ideas in probability seem counterintuitive, but so too can the notion of randomness. Of the following two sequences of heads (H) and tails (T), which looks the more random?
H, T, H, H, T, H, T, T, H, H, T, T, H, T, H, T, T, H, H, T
or
T, H, T, H, T, T, H, T, T, T, H, T, T, T, T, H, H, T, H, T
Many people might be tempted to say the first because it has an even sprinkling of heads and tails arranged in no obvious pattern. The second sequence has an imbalance of tails and longer runs of the same letter. In fact, one of us (Agnijo) used a random number generator to produce the second, whereas he deliberately constructed the first to look like what a person might come up with if asked to write a random sequence of H’s and T’s. A human tends to avoid long runs, deliberately balances the letters, and switches from H to T and vice versa more often than happens at random.
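One way to see the difference is to count the runs and the switches. The sketch below compares the human-style sequence above with freshly generated random flips (the random results will, of course, differ from run to run):

```python
import random

def longest_run(seq):
    """Length of the longest block of identical symbols."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

human_like = "HTHHTHTTHHTTHTHTTHHT"   # the first sequence above
random_seq = "".join(random.choice("HT") for _ in range(20))

for name, seq in [("human-like", human_like), ("random", random_seq)]:
    switches = sum(a != b for a, b in zip(seq, seq[1:]))
    print(f"{name:>10}: longest run {longest_run(seq)}, switches {switches}")
# Genuinely random sequences tend to have longer runs and fewer switches than
# the ones people write down when trying to 'act random'.
```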
What about this sequence:
H, T, H, H, H, T, T, H, H, H, T, H, H, H, H, T, H, T, T, T ?
It may look random, and statistical methods of catching human-produced sequences will conclude that it wasn’t made by a person. In reality, it’s constructed from the decimal digits of pi (omitting the initial 3), with an H for an odd digit and a T for an even digit. So, are the digits of pi random? Technically, no, because the first decimal digit will always be 1, the second 4, the third 1, and so on, no matter how many times the sequence is generated. If something is fixed and always comes out the same whenever we choose to look at it, it can hardly be random. However, mathematicians do wonder if the decimal digits of pi are statistically random in the sense that they have a uniform distribution: all digits being equally likely, all pairs of digits equally likely, all triplets equally likely, and so on. If they do, then pi is said to be ‘normal in base 10’, which is what the vast majority of mathematicians believe. It’s also believed that pi is ‘absolutely normal’, meaning that not only are the decimal digits of pi statistically random, but so too are the binary digits, if pi is written out in the binary number system using just 0’s and 1’s, the ternary digits, using just 0’s, 1’s, and 2’s, etc. It’s been proved that almost all irrational numbers are absolutely normal, but it turns out to be extremely hard to find a proof for specific cases.
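The mapping is easy to reproduce. In this sketch the first twenty decimal digits of pi are simply written out by hand, since Python’s built-in value of pi is only accurate to about fifteen decimal places:

```python
pi_digits = "14159265358979323846"   # first 20 decimal digits of pi, after the 3

sequence = ["H" if int(d) % 2 == 1 else "T" for d in pi_digits]
print(", ".join(sequence))
# H, T, H, H, H, T, T, H, H, H, T, H, H, H, H, T, H, T, T, T
```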
The first example of a known normal number in base 10 was Champernowne’s constant, named after the English economist and mathematician David Champernowne, who wrote about the significance of it while still an undergraduate at Cambridge. Champernowne invented the number specifically to show that a normal number can and does exist, and also how easy it is to construct one. His constant is made up simply of all the consecutive natural numbers: 0.1234567891011121314 … and, therefore, contains every possible sequence of numbers in equal proportions. One tenth of the digits are 1, one hundredth of pairs of consecutive digits are 12, and so on. Normal in base 10 it may be but Champernowne’s constant is obviously pretty bad at producing sequences that look random, in other words lacking any kind of discernible pattern or predictability, especially at the start. Nor do we know if it’s normal in any other base. Other proven normal constants exist, but like that found by Champernowne, they’ve been artificially constructed to be normal. It’s still to be proven whether pi is normal in any base, let alone absolutely normal.
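Champernowne’s constant is just as easy to build and test. A sketch that strings together the first 100,000 natural numbers and counts how often each digit appears:

```python
from collections import Counter

# Concatenate 1, 2, 3, ... 100,000 to get a prefix of 0.123456789101112...
digits = "".join(str(n) for n in range(1, 100_001))
counts = Counter(digits)
total = len(digits)

for d in "0123456789":
    print(d, f"{counts[d] / total:.4f}")
# The non-zero digits already appear with frequency close to 0.1; the digit 0 lags
# behind (around 0.08 here) because leading zeros are never written, but its share
# also tends towards one tenth as ever more numbers are appended.
```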
At the time of writing the value of pi is known to 22,459,157,718,361, or about 22 trillion, decimal digits. Of course, we’ll be able to calculate more digits in the future, but those we already know will never change, no matter how many times the calculation is run. The known digits of pi are part of the frozen reality of the mathematical universe, and so they cannot be random. But what about the digits lying beyond those that have been computed? Assuming pi is normal in base 10 they remain essentially statistically random to us. In other words, if someone asked you for a random string of a thousand digits it would be a valid response to build a computer to calculate pi to 1,000 places more than is presently known and use those places as the random string. Asked for another random string of the same length, you could compute the next (previously unknown) thousand digits. This raises an interesting philosophical question about the nature of mathematical things. To what extent are the decimal places of pi that we haven’t yet figured out real? It would be hard to argue that, say, the trillion trillionth digit of pi doesn’t exist or that it doesn’t have a specific fixed value, even though we don’t yet know what it is. But in what sense or form does it exist until, at the end of an immensely long calculation, still to be carried out, it pops into a computer’s memory?
As a curious aside, it’s worth mentioning a discovery made by researchers David Bailey, Peter Borwein, and Simon Plouffe in 1996. They found a fairly simple formula – the sum of an infinite series of terms – for pi that allows any digit of pi to be calculated without knowing any of the preceding digits. (Strictly speaking, the digits calculated by the Bailey–Borwein–Plouffe formula are hexadecimal – base-16 – digits as opposed to decimal digits.) That seems, at first sight, impossible, and it certainly came as a surprise to other mathematicians. What’s more, a computation of, say, the billionth digit of pi, using this method, can be done on an ordinary laptop in less time than it takes to eat a meal at a restaurant. Variations on the Bailey–Borwein–Plouffe formula can be used to find other ‘irrational’ numbers like pi, whose decimal extensions go on forever without repeating.
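The Bailey–Borwein–Plouffe series itself is compact enough to write down and check numerically. The sketch below simply sums the series; it leaves out the modular-arithmetic trick that makes isolated digit extraction possible:

```python
def bbp_pi(terms):
    """Sum the first `terms` terms of the Bailey-Borwein-Plouffe series for pi."""
    total = 0.0
    for k in range(terms):
        total += (1 / 16**k) * (4 / (8*k + 1) - 2 / (8*k + 4)
                                - 1 / (8*k + 5) - 1 / (8*k + 6))
    return total

print(bbp_pi(10))   # 3.14159265358979..., matching pi to the limits of double precision
```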
The question of whether anything in pure mathematics is truly random is a valid one. Randomness implies the complete absence of pattern or predictability. Something is only unpredictable if it’s unknown and, in addition, there’s no basis on which to favour one outcome over any other. Mathematics exists essentially outside of time; in other words, it doesn’t change or evolve from one moment to the next. The only thing that does change is our knowledge of it. The physical world, on the other hand, does change, continuously, and often in ways that at first sight seem unpredictable. Tossing a coin is considered to be sufficiently unpredictable that, by common consent, it’s taken to be a fair way of making decisions when there are just two possibilities. But whether it can be called random depends on the information available. If, for any given toss, we knew the exact force and angle at which the coin was launched, its rotation rate, the amount of air resistance, and so on, we could (in theory) accurately predict which side would land facing up. The same is true if we drop a slice of buttered toast, except that in this case there’s evidence to support the pessimist’s view that toast does tend to land butter-side down more than half the time. Experiments have shown that if toast is tossed up in the air – surely something that would only happen in a lab or a food fight – the chances of it coming down the messy way are 50 percent. But if the toast is knocked off a table or kitchen counter, or slides off a plate, it will indeed hit the floor butter-side down more often than not. The reason is straightforward: the height from which toast normally gets dropped by accident – waist-height, or a foot or so on either side – allows the toast just enough time during its fall to make a half-turn so that if it starts out, in the conventional way, butter-up, it’s more likely than not to end up making a grease stain on the floor.
Most physical systems are a lot more complicated than falling toast. And further to complicate the situation, some of them are chaotic, so that little changes or disturbances in the starting conditions may have enormous implications later on. One such system is the weather. Before modern weather forecasting came along, it was anyone’s guess what the next day would bring. Meteorological satellites, accurate instruments on the ground, and high-speed computers have revolutionised the accuracy of forecasts, out to about a week or 10 days. But beyond that even the best forecasts, using the finest technology, run into the combined problems of chaos and complexity, including the butterfly effect – the notion that the tiny air current caused by a butterfly flapping its wings might eventually be amplified so that it becomes a hurricane.
Even with all this complexity, it may seem that no matter what the phenomenon, whether it’s the toss of a coin or the global weather system, the same underlying laws of nature are involved and those laws are deterministic. The universe, so it was once believed, is like a giant clockwork mechanism – fantastically elaborate yet ultimately predictable. Two issues, however, stand in the way of this claim. The first harks back to complexity. Even within a deterministic system, one in which the outcome depends on a series of events, each one of which is predictable knowing the exact preceding state, the whole problem can be so complex that there’s no achievable shortcut allowing us to see in advance what will actually happen. In such systems, the best simulation (for example, run on a computer) cannot outpace the phenomenon itself. This is true of many physical systems but also of purely mathematical ones, such as cellular automata, the most famous example of which is John Conway’s Game of Life, which we’ll be talking about more in Chapter 5.
The evolution of any given pattern in the Game of Life is entirely deterministic yet unpredictable: the outcome only becomes known when every step along the way has been calculated. (Of course, some patterns that do the same thing over and over again, such as oscillating back and forth or moving unchanged after a certain number of steps, are predictable after we know their behaviour. But the first time through, we don’t know how they’re going to behave.) In maths, things can be unpredictable even if they’re not random. But, until the turn of the twentieth century, most physicists held the belief that even if we couldn’t know every detail of what happens in the physical universe, we could, in principle, know as much as we wanted. If we had enough information then, using the equations of Newton and Maxwell, we could figure out how events would unfold, to whatever level of accuracy we chose. The dawn of quantum mechanics, however, saw that idea fly out the window.
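For readers who’d like to experiment before Chapter 5, here is a minimal sketch of the Game of Life rules. The starting pattern is the famous R-pentomino, five cells whose long, turbulent evolution nicely illustrates ‘deterministic yet unpredictable’:

```python
from collections import Counter

def life_step(live):
    """One generation of Conway's Game of Life; `live` is a set of (x, y) cells."""
    neighbours = Counter((x + dx, y + dy)
                         for (x, y) in live
                         for dx in (-1, 0, 1)
                         for dy in (-1, 0, 1)
                         if (dx, dy) != (0, 0))
    # A cell is alive next generation if it has exactly 3 live neighbours,
    # or 2 live neighbours and is already alive.
    return {cell for cell, n in neighbours.items()
            if n == 3 or (n == 2 and cell in live)}

# The R-pentomino: it churns away for 1,103 generations before settling down
# into a collection of still lifes, blinkers and escaping gliders.
cells = {(1, 0), (2, 0), (0, 1), (1, 1), (1, 2)}
for _ in range(1103):
    cells = life_step(cells)
print(len(cells))   # the final population (including any gliders still flying off)
```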
Uncertainty, it transpires, lies at the heart of the quantum realm: randomness is an unavoidable fact of life in the subatomic world. Nowhere is this capriciousness more evident than in the decay of a radioactive nucleus. True, observations can reveal the half-life of a radioactive substance – the time taken, on average, for half of the original nuclei in a sample to break apart. But that’s a statistical measure. The half-life of radium 226, for instance, is 1,620 years, so that if we started with one gram of it we’d have to wait 1,620 years for half a gram of the radium to remain, the rest having decayed into radon gas or lead and carbon. Focusing on one individual radium nucleus, though, there’s no way to tell if it’ll be among the 37 billion nuclei that decay in the next second in one gram of radium 226, or whether it will decay in 5,000 years’ time. All we can say is that the probability is ½ – the same as flipping heads or tails – that it will decay at some point in the next 1,620 years. This unpredictability has nothing to do with shortcomings in our measuring gear or computing power. The randomness, at this fine level of structure, is inherent in the very fabric of reality. As a result it can affect phenomena, and thereby introduce randomness, on a larger scale. An extreme case of the butterfly effect, for example, would be the decay of a single radium atom influencing the future weather on a large scale.
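The statistical character of the half-life is easy to mimic with a toy model (this is a sketch of the statistics only, not a physical simulation): give each simulated nucleus a probability of surviving that halves with every 1,620-year stretch, and count the survivors.

```python
import random

nuclei = 1_000_000      # start with a million simulated radium-226 nuclei
half_life = 1620        # years

for elapsed in (1620, 3240, 4860):
    p_survive = 0.5 ** (elapsed / half_life)     # survival probability at this time
    survivors = sum(random.random() < p_survive for _ in range(nuclei))
    print(f"after {elapsed} years: about {survivors} nuclei left")
# Roughly half survive the first half-life, a quarter the second, an eighth the
# third; yet nothing in the model says *which* nuclei those will be.
```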
It may well be that quantum randomness is here to stay. However, there have been physicists, and Einstein was famously one of them, who couldn’t stomach the idea (to paraphrase Einstein) that God plays dice with the universe. These opponents of quantum orthodoxy favour the view that, behind the apparent quixotic behaviour of things at the ultra-small level, there are ‘hidden variables’ – factors that determine when particles decay and suchlike, if only we could learn what they are and be able to measure them. If the hidden variables theory turns out to be true, then the universe would again revert to being non-random, and true randomness would exist only as some kind of mathematical ideal. But, to date, all the evidence suggests that, on this question of quantum indeterminacy, Einstein got it wrong.
In the looking-glass world of the very small, nothing, it seems, is certain. What we took to be solid little particles – electrons and suchlike – dissolve into waves, and not even material waves but waves of probability. An electron can’t be said to be here or there but only more likely to be here than there, its motion and whereabouts governed by a mathematical construct called the wave function.
All we are left with is probability, and even that’s not an easy concept to pin down. There are different ways of thinking about it. The most familiar is the ‘frequentist’ point of view. In this the probability of an event happening is the limit – the value to which something is heading – of the proportion of times the event occurs. To find out the probability of an event, a frequentist would repeat the experiment many times and see how often the event occurred. For example, if the event occurred 70 percent of the time the experiment was performed, it would have a probability of 70 percent. In the case of an idealised mathematical coin, flipping heads has a probability of exactly ½ because the more the coin is flipped the closer the proportion of heads approaches the value ½. A real, physical coin doesn’t have a probability of exactly ½ of landing heads, for a number of reasons. The aerodynamics of the toss and the fact that, in the case of most coins, the head generally has more mass than the pattern on the other side, bias the result slightly. The outcome also depends to some extent on which side is facing up before the toss: the probability is roughly 51% that the coin will land the same side up as before tossing, as, during a typical toss, it’s marginally more likely to turn an even number of times in the air, but when dealing with mathematical, idealised coins we can ignore this.
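The frequentist idea of probability as a long-run proportion can be watched in action by flipping an idealised, perfectly fair coin in software:

```python
import random

for n in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    heads = sum(random.random() < 0.5 for _ in range(n))
    print(f"{n:>9} flips: proportion of heads = {heads / n:.4f}")
# The proportion can stray a long way from 0.5 for small n, but hugs it ever
# more tightly as the number of flips grows: the frequentist limit at work.
```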
The frequentist approach is to say that the likelihood of something is equal to the long-run chance that it happens. But sometimes, such as for an event that can only occur once, this strategy is useless. An alternative is the Bayesian method, named after the eighteenth-century English statistician Thomas Bayes. This bases its calculation of probability on how confident we are of a certain outcome, so that it regards probability as being subjective. For instance, a weather forecaster may talk about a ‘70 percent chance of rain’, which essentially means that they’re 70 percent confident that it will rain. The major difference here between frequentist and Bayesian probability is that the weather forecaster can’t simply ‘repeat’ the weather – they need to give a probability of rain on one specific occasion, rather than an average probability over many trials. They can use a vast array of data, including what occurred in similar cases, but none of these will be exactly identical so they’re forced to use Bayesian probability as opposed to frequentist.
Where differences between the Bayesian and frequentist viewpoints get especially interesting is when they’re applied to mathematical concepts. Think about the question of whether the trillion trillionth decimal digit of pi, which is presently unknown, is 5. There’s no way in advance of knowing what the answer is, but we do know that once it’s been figured out it won’t ever change. We can’t repeat a calculation of the digits of pi and get a different answer from the first time it was done. The frequentist viewpoint therefore implies that the probability of the trillion trillionth digit being 5 is either 1 (certainty) or 0 (impossibility) – in other words it either is or isn’t a 5. Suppose, though, that pi were proved to be normal, so that we knew for certain that every digit occurs with equal density across the infinite sequence that makes up pi. The Bayesian viewpoint, which measures our level of confidence that the trillion trillionth digit is 5, would then put the probability at one in ten, or 0.1 (because if pi is normal, any digit is equally likely to be any value from 0 to 9, until it’s actually calculated). But the probability after we calculate that far (if we ever do) will then be definitely either 1 or 0. Now, the actual trillion trillionth digit of pi won’t change at all, but the probability of it being 5 will change, precisely because we have more information. Information is crucial to the Bayesian viewpoint: more information helps us revise the probability so that it becomes more accurate. Indeed, once we have perfect information (such as by explicitly calculating a digit of pi), frequentist and Bayesian probabilities become equivalent – if we repeat a calculation of a known digit of pi, we know the answer in advance. And if we know every detail of a physical system (one that includes some randomness, for example the decay of radium atoms), we can repeat the exact experiment and obtain a frequentist probability that exactly matches the Bayesian one.
While the Bayesian approach may seem subjective, it can be made rigorous in an abstract sense. For example, suppose you had a coin that was biased. It could be biased by any amount from 0 percent heads to 100 percent heads, with each value equally likely. You toss it once, and it comes up heads. It’s possible to prove that the probability of a head on the second toss is 2/3 using Bayesian probability. However, the initial probability of a head was ½ and we didn’t change the coin. The Bayesian viewpoint says that while the first head will not directly affect the probability of the second head, it gives you more information about the coin that allows you to refine your estimate. A coin heavily biased towards tails is highly unlikely to flip heads and a coin heavily biased towards heads is much more likely to flip heads.
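The 2/3 figure can be checked without any formal Bayesian machinery, simply by simulating the setup described: choose a bias uniformly at random, toss the coin twice, and look only at the trials in which the first toss came up heads.

```python
import random

trials = 1_000_000
first_head = second_head_too = 0

for _ in range(trials):
    bias = random.random()              # unknown bias, uniform between 0 and 1
    if random.random() < bias:          # first toss lands heads
        first_head += 1
        if random.random() < bias:      # second toss lands heads as well
            second_head_too += 1

print(second_head_too / first_head)     # close to 2/3
```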
Taking a Bayesian approach also helps avoid a type of paradox first pointed out by the German logician Carl Hempel in the 1940s. When people see the same principle, such as the law of gravity, operating without fail over a long period of time they naturally assume that it’s true with a very high probability. This is inductive reasoning, which can be summed up: if things are observed that are consistent with a theory, then the probability that the theory is true increases. Hempel, however, pointed out a snag with induction, using ravens as an example.
All ravens are black, so the theory goes. Every time a raven is seen to be black and no other colour – ignoring the fact that there are albino ravens! – our confidence in the theory ‘all ravens are black’ is boosted. Here, though, is the rub. The statement ‘all ravens are black’ is logically equivalent to the statement ‘all non-black-things are non-ravens’. So, if we see a yellow banana, which is a non-black-thing and also a non-raven, it should bolster our belief that all ravens are black. To get around this highly counterintuitive result some philosophers have argued that we shouldn’t treat the two sides of the argument as having the same strength. In other words, yellow bananas should make us believe more in the theory that all non-black-things are non-ravens (first statement), without influencing the belief that all ravens are black (second statement). This seems to fit with common sense – a banana is a non-raven so observing one can tell us about non-ravens but tells us nothing about ravens. But it’s a suggestion that has been criticised on the basis that you can’t have a different degree of belief in two statements that are logically equivalent, if it’s clear that either both are true or both false. Maybe our intuition in this matter is at fault and seeing another yellow banana really should increase the probability that all ravens are black. Adopting a Bayesian stance, however, the paradox never arises. According to Bayes, the probability of a hypothesis H must be multiplied by the ratio:

(probability of observing X, if H is true) ÷ (probability of observing X)
where X is a non-black object that’s a non-raven, and H is the hypothesis that all ravens are black.
If you ask someone to select a banana at random and show it to you, then the probability of seeing a yellow banana doesn’t depend on the colours of ravens. You already know beforehand that you’ll see a non-raven. The numerator (the number on top) will equal the denominator (the number on the bottom), the ratio will equal one, and the probability will remain unchanged. Seeing a yellow banana won’t affect your belief about whether all ravens are black. If you ask someone to select a non-black-thing at random, and they show you a yellow banana, then the numerator will be bigger than the denominator by a tiny amount. Seeing the yellow banana will only slightly increase your belief that all ravens are black. You’d have to see almost every non-black-thing in the universe, and see that they were all non-ravens, before your belief in ‘all ravens are black’ went up significantly. In both cases, the result agrees with intuition.
It may seem odd that information is connected to randomness but in fact the two are closely related. Imagine a string of digits made only of 1’s and 0’s. The string 1111111111 is completely orderly and, because of this, contains practically no information (only ‘repeat 1 ten times’), just as a blank canvas where every point is white tells us almost nothing. On the other hand, the string 0001100110, which was generated randomly, has the maximum amount of information possible for its length. The reason for this is that one way of quantifying information is the amount by which the data can be compressed. A truly random string can’t be written in any shorter way while retaining all of its information. But a long, constant string with only 1’s, for example, can be compressed enormously just by listing the number of 1’s in the string. Information and disorder are intimately related. The more disordered and random a string is, the more information it has within it.
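One crude way to see the connection is to hand different strings to a general-purpose compressor and compare how small they become (zlib is used here purely as a convenient stand-in for ‘compressibility’):

```python
import random
import zlib

ordered = "1" * 10_000
random_bits = "".join(random.choice("01") for _ in range(10_000))

for name, s in [("all 1's", ordered), ("random", random_bits)]:
    compressed = zlib.compress(s.encode())
    print(f"{name:>8}: {len(s)} characters -> {len(compressed)} bytes compressed")
# The orderly string shrinks to a few dozen bytes. The random string cannot be
# squeezed much below about 1,250 bytes, the information-theoretic floor for
# 10,000 genuinely random bits, and in practice lands somewhat above it: there
# is no pattern left for the compressor to exploit.
```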
Another way to think of this is that in the case of a random string, revealing the next bit gives the maximum amount of information possible. On the other hand, if we see the string 1111111111, it is trivial to guess the next bit. (This only applies to the string taken as a whole, not to a fragment of some longer string: an infinite random string will contain the run 1111111111 infinitely often.) Useful stimuli, as far as we’re concerned, must necessarily occupy a middle ground between these extremes of information. For example, a photograph with minimum information would be a blank monochromatic picture and a book would be a long repetition of pages filled with one letter. Neither of these is in any way interesting in terms of its information content. However, a photograph with maximum information would look like a random mess of static and a book would be a jumble of random letters. These again would not appeal to us. What we need, and what is most useful to us, is something in between. A conventional photo conveys information, but in a form and quantity that we can understand. If one pixel is one colour, pixels immediately adjacent to it are likely to be very similar. We know this and can use it to compress pictures without losing the information. The book you’re reading right now is mostly just a string of letters and spaces, with punctuation marks. Unlike the extreme books that contain a jumble of symbols or endless repetitions of the same one, these letters fall into structured patterns known as words, some of which occur only occasionally and some, such as ‘the’, that recur extremely frequently. In addition, these words follow certain rules, known as grammar, to form sentences and so on, so that ultimately the reader can understand the information being conveyed. This simply doesn’t happen in a random hotchpotch.
In his short story, ‘The Library of Babel’, the Argentinian writer Jorge Luis Borges tells of a library, vast – possibly infinite – in size, that contains a dizzying number of books. All the books are of identical format: ‘each book contains four hundred ten pages; each page, forty lines; each line approximately eighty black letters’. Only 22 alphabetic characters from an obscure language plus a comma, full stop, and space are used throughout, but every possible combination of these characters that follow the common format occurs in some book in the library. Most books appear to be just a meaningless jumble of characters; others are quite orderly but still devoid of any apparent meaning. For example, one book contains just the letter M repeated over and over. Another is exactly the same except that the second letter is replaced by an N. Others have words, sentences, and whole paragraphs that are grammatically correct in some language but are nevertheless illogical. Some are true histories. Some purport to be true histories but are, in fact, fictional. Some contain descriptions of devices yet to be invented or discoveries yet to be made. Somewhere in the library is a book that contains every combination of the basic 25 symbols that can be imagined or written down in the given format. Yet, of course, it’s all useless because without knowing in advance what’s true or false, fact or fiction, meaningful or meaningless, such exhaustive combinations of symbols have no value. It’s the same with the old idea of monkeys randomly hitting the keys of typewriters and eventually, given enough time, coming up with the works of Shakespeare. They’d also come up with the solutions to every major problem in science (after countless trillions of years). The trouble is they’d also come up with every non-solution and every convincing refutation of the true solutions, and, for the most part, mind-numbing quantities of pure gobbledygook. Having the answer before you is no use at all if you also have every other possible variant of the symbols that make up the answer, and you have no way of knowing which one is right.
In a sense, the World Wide Web, with its vast collection of knowledge available alongside an even more enormous body of gossip, half-truths, and pure gibberish is becoming like Borges’ library – a repository of everything from the profound to the nonsensical. There are even websites that simulate the Library of Babel, generating, in an instant, pages of random strings of letters, which may or may not include real words or meaningful scraps of information. When there is so much data available at our fingertips, who or what can be trusted to be the arbiters of reason and fact? Ultimately, because the information exists as a collection of digits, inside electronic processors and memories, the answer must lie with mathematics.
In the more immediate future, mathematicians are working to develop an overarching theory of randomness that might connect seemingly very different phenomena in science, from Brownian motion to string theory. Two researchers, Scott Sheffield at MIT and Jason Miller at the University of Cambridge, have found that many of the 2D shapes or paths that can be generated by random processes fall into distinct families, each with its own sets of characteristics. Their classification has led to the discovery of unexpected links between what, on the face of it, look like totally disparate random objects.
The first kind of random shape to be explored mathematically was the so-called random walk. Imagine a drunkard who starts from a lamp post and staggers from one point to another, each step (assumed to be of equal length) being taken in a random direction. The problem is to work out how far from the lamp post he’s likely to be after a given number of steps. This can be reduced to a one-dimensional case – in other words just movements back and forth along a line – by supposing that at each step a coin is tossed to decide which way to move, right or left. The problem was first given a real-world application in 1827 when the Scottish botanist Robert Brown drew attention to what became known as Brownian motion – the haphazard jiggling of pollen grains in water when looked at through a microscope. Later, it was realised that the jiggling was due to individual water molecules striking the pollen grains from different, random directions, so that each pollen grain behaved like the drunkard in our original example. It took until the 1920s for the mathematics of Brownian motion to be fully worked out, by the American mathematician and philosopher Norbert Wiener. The trick is to figure out what happens to the random walk problem as the steps and the time between them are made smaller and smaller. The resulting random paths look very much like those of Brownian motion.
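A one-dimensional drunkard’s walk takes only a few lines to simulate. A classic result, not derived in the text above but easy to confirm numerically, is that after n equal steps the typical (root-mean-square) distance from the lamp post is the square root of n:

```python
import random

def walk_distance(steps):
    """Final distance from the start after `steps` random left/right steps."""
    position = 0
    for _ in range(steps):
        position += random.choice((-1, 1))
    return abs(position)

trials = 2000
for n in (100, 400, 1600):
    mean_sq = sum(walk_distance(n) ** 2 for _ in range(trials)) / trials
    print(f"{n:>5} steps: rms distance = {mean_sq ** 0.5:6.1f}   (sqrt(n) = {n ** 0.5:.1f})")
```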
More recently, physicists have become interested in random motion of a different kind, involving not particles following 1D curves but incredibly tiny, wriggling ‘strings’ whose motion can be represented as 2D surfaces. These are the strings of string theory, a leading but as yet unproven theory of the fundamental particles that make up all matter. As Scott Sheffield put it: ‘To make sense of quantum physics for strings, you want to have something like Brownian motion for surfaces.’ The beginnings of such a theory came in the 1980s thanks to physicist Alexander Polyakov, now at Princeton University. He came up with a way of describing these surfaces that’s now known as Liouville quantum gravity (LQG). A separate development, called the Brownian map, also described random 2D surfaces but gave different, complementary information about them. Sheffield and Miller’s big breakthrough was to show that these two theoretical approaches, LQG and the Brownian map, are equivalent. There’s still work to be done before the theory can be applied directly to problems in physics, but eventually it may prove to be a powerful unifying principle that operates on many scales, from the fantastically small scale of strings to the everyday level of phenomena such as the growth of snowflakes or mineral deposits. What’s already clear is that randomness lies at the heart of the physical universe, and at the heart of randomness is maths.
Something that’s truly random is unpredictable. There’s no way of telling what the next member of a random sequence will be. In physics, there’s no way of telling when a random event, like the decay of a radioactive nucleus, will take place. If something is random it’s said to be non-deterministic because we can’t work out, even in principle, what comes next from what’s already happened. In everyday speech, we often say that if something is random it’s chaotic. ‘Randomness’ and ‘chaos’ are used in ordinary language almost interchangeably. But in maths there’s a big difference between the two, a difference we can appreciate by venturing into the strange realm of fractional dimensions.