The best laid schemes o’ mice an’ men
Gang aft agley.
Robert Burns, To a Mouse
UNTIL NOW, I’VE GENERALLY DISCUSSED uncertainty as a problem: something that makes it difficult for us to understand what’s likely to happen in the future, something that makes it possible for all of our best-laid plans to go ‘agley’, that is, wrong. We’ve investigated where the uncertainty comes from, what form it takes, how to measure it, and how to mitigate its effects. What I haven’t done is to look at how we can use it. In fact, there are many circumstances in which a bit of uncertainty acts to our advantage. So although uncertainty is usually seen as a problem, it can also be a solution, though not always to the same problem.
THE MOST DIRECT USE OF RANDOMNESS occurs in the solution of mathematical problems that seem intractable by direct methods. Instead of simulating the solution and then sampling many runs to estimate the uncertainties involved, this approach turns the whole thing upside down, by making many sample simulations and inferring the solution from them. This is the Monte Carlo method, named after the famous casino.
The traditional toy example is finding the area of a complicated shape. The direct method is to cut the shape into pieces whose areas can be calculated by known formulas, and add the results together. More complex shapes can be handled using integral calculus, which essentially does the same thing, approximating the shape by lots of very thin rectangles. The Monte Carlo approach is very different. Put the shape inside a border whose area is known, say a rectangle. Throw lots of random darts at the shape, and count the proportion that hit it, compared to all the darts that hit the rectangle. If, say, the rectangle has area one square metre, and the darts hit the shape 72% of the time, its area must be somewhere around 0·72 square metres.
This method comes with a lot of small print. First, it’s best when seeking a ballpark figure. The result is approximate, and we need to estimate the likely size of the error. Second, the darts need to be distributed uniformly over the rectangle. A good darts player, aiming at the shape, might hit it every time. What we want is a very poor darts player who sprays the darts all over the place, with no preferred direction. Third, bad estimates sometimes occur by chance. However, there are advantages too. The poor darts player can be arranged using tables of random numbers or, better still, computer calculations. The method works in higher dimensions – the volume of a complex three-dimensional space or a more conceptual ‘volume’ in many more dimensions. In mathematics, high-dimensional spaces abound, and they’re not mysterious: just a geometric language for talking about large numbers of variables. Finally, it’s usually far more efficient than a direct method.
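The dart-throwing recipe can be sketched in a few lines of code. This is a minimal illustration of my own, not anything from a particular casino or laboratory; the shape, a quarter circle, is chosen because its true area, π/4, lets us check the answer:

```python
import random

def monte_carlo_area(inside, box_area=1.0, trials=1_000_000):
    """Estimate the area of a shape by throwing random 'darts'
    uniformly at a bounding box and counting the hits."""
    hits = sum(inside(random.random(), random.random())
               for _ in range(trials))
    return box_area * hits / trials

# Quarter circle of radius 1 inside the unit square; true area is pi/4.
random.seed(1)
estimate = monte_carlo_area(lambda x, y: x * x + y * y <= 1.0)
print(estimate)  # close to 0.785
```

With a million darts the statistical error is around 0.0004, which is the 'small print' in action: the answer is a ballpark figure, and quadrupling the number of darts only halves the likely error.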
Monte Carlo methods were invented (in the sense of being explicitly recognised as a general technique) by Stanislaw Ulam in 1946, while he was working on nuclear weapons at the Los Alamos National Laboratory in the USA. He was getting over an illness, passing the time by playing a form of patience called Canfield solitaire. Being a mathematician, he wondered whether he could work out the chance of winning by applying combinatorics and probability theory. Having tried and failed, he ‘wondered whether a more practical method than “abstract thinking” might be to lay it out, say, one hundred times and simply observe and count the number of successful plays’.
The computers of the time were good enough to do the sums. But since he was also a mathematical physicist, Ulam immediately started wondering about the big problems that were holding back progress in nuclear physics, such as how neutrons diffuse. He realised that the same idea would provide practical solutions whenever a complicated differential equation could be reformulated as a random process. He passed the idea on to von Neumann, and they tried it on a real problem. It needed a code name, and Nicholas Metropolis suggested ‘Monte Carlo’, a favourite haunt of Ulam’s gambling uncle.
Monte Carlo methods were vital to the development of the hydrogen bomb. From some points of view the world might well be a better place if Ulam had never had this insight, and I hesitate to advance nuclear weapons as a reason for doing mathematics. But it does illustrate the devastating power of mathematical ideas, and a powerful exploitation of randomness.
IRONICALLY, THE MAIN OBSTACLE TO the development of Monte Carlo methods was getting a computer to behave randomly.
Digital computers are deterministic. Give one a program, and it will carry out every instruction to the letter. This feature has led exasperated programmers to invent the spoof command DWIT (Do What I’m Thinking), and users to wonder about artificial stupidity. But this determinism also makes it hard for computers to behave randomly. There are three main solutions. You can engineer in some non-digital component that behaves unpredictably; you can provide inputs from some unpredictable real-world process such as radio noise; or you can set up instructions to generate pseudorandom numbers. These are sequences of numbers that appear to be random, despite being generated by a deterministic mathematical procedure. They’re simple to implement and have the advantage that you can run exactly the same sequence again when you’re debugging your program.
The general idea is to start by telling the computer a single number, the ‘seed’. An algorithm then transforms the seed mathematically to get the next number in the sequence, and the process is repeated. If you know the seed and the transformation rule, you can reproduce the sequence. If you don’t, it may be hard to find out what the procedure is. SatNav (the Global Positioning System) in cars makes essential use of pseudorandom numbers. GPS requires a series of satellites sending out timing signals, which are received by the gadget in your car, and analysed to work out where your car is. To avoid interference, the signals are sequences of pseudorandom numbers, and the gadget can recognise the right signals. By comparing how far along the sequence the message arriving from each satellite has got, it computes the relative time delays between all the signals. That gives the relative distances of the satellites, from which your position can be found using old-fashioned trigonometry.
In our post–chaos theory world, the existence of pseudorandom numbers is no longer paradoxical. Any chaotic algorithm can generate them. So can algorithms that technically aren’t chaotic. Many practical ones eventually start repeating exactly the same sequence of numbers over and over again, but if it takes a billion steps before that happens, who cares? An early algorithm involved starting with a large integer seed, say
554,378,906
Now square it, getting
307,335,971,417,756,836
There are regular mathematical patterns in squares of numbers near both ends. For instance, here the final digit 6 happens again because 6² = 36 also ends in 6. And you can predict that the first digit must be 3 because 55² = 3025 starts with 3. Patterns of this kind aren’t random, so to avoid them we chop off the ends (delete the rightmost six digits, then keep the next nine, say) to leave
335,971,417
Now square that to get
112,876,793,040,987,889
but keep only the middle chunk
876,793,040
Now repeat.
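In code, the recipe just described – square, delete the rightmost six digits, keep the next nine, repeat – looks something like this, and it reproduces the numbers in the worked example above:

```python
def middle_square(seed, keep=9, drop=6, steps=4):
    """Middle-square-style generator: square the current number,
    delete the rightmost `drop` digits, keep the next `keep` digits,
    and repeat."""
    x, out = seed, []
    for _ in range(steps):
        x = (x * x // 10**drop) % 10**keep
        out.append(x)
    return out

# Starting from the seed in the text:
print(middle_square(554_378_906))
# begins [335971417, 876793040, ...], matching the worked example
```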
One theoretical issue with this recipe is that it’s very hard to analyse it mathematically, to find out whether it really does behave like a random sequence. So different rules are generally used instead, the commonest being linear congruential generators. Here the procedure is to multiply the number by some fixed number, add another fixed number, but then reduce everything to its remainder on dividing by some specific big number. For efficiency, do everything in binary arithmetic. A big advance was the invention of the Mersenne twister in 1997 by Makoto Matsumoto and Takuji Nishimura. It’s based on the number 2^19,937 – 1. This is a Mersenne prime – a prime number that is one less than a power of 2, a number-theoretic curiosity going back to the monk Marin Mersenne in 1644. In binary it consists of 19,937 consecutive 1s. The transformation rule is technical. The advantage is that the sequence of numbers it generates repeats only after 2^19,937 – 1 steps, which is a number with 6002 digits, and sub-sequences of up to 623 numbers are uniformly distributed.
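A linear congruential generator fits in a couple of lines. The multiplier and increment below are the widely published ‘Numerical Recipes’ constants, chosen here purely for illustration:

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Linear congruential generator: multiply by a, add c,
    and reduce to the remainder on dividing by m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

# Same seed, same sequence every time - handy for debugging.
gen = lcg(seed=12345)
print([next(gen) for _ in range(3)])
```

The choice of constants matters enormously: a bad multiplier produces glaring patterns, which is why practitioners use published, well-tested values rather than inventing their own.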
Faster and better pseudorandom number generators have been developed since. The same techniques are useful for internet security, being used to encrypt messages. Each step in the algorithm can be viewed as ‘encoding’ the previous number, and the objective is to construct cryptographically secure pseudorandom number generators, which generate their numbers using a code that is provably difficult to break. The precise definition is more technical.
I’VE BEEN A BIT NAUGHTY.
Statisticians go to some lengths to explain that randomness is a feature of processes, not their outcomes. If you toss a fair dice ten times in a row, you might get 6666666666. In fact, on average that should happen once in every 60,466,176 trials.
However, there’s a slightly different sense in which ‘random’ does sensibly apply to the outcomes, so that a sequence like 2144253615 is more random than 6666666666. The difference is best formulated for very long sequences, and it characterises a typical sequence created by a random process. Namely, all of the expected statistical features of the sequence should be present. Every digit 1–6 should appear roughly 1/6 of the time; every sequence of two successive digits should occur roughly 1/36 of the time, and so on. More subtly, there shouldn’t be any long-range correlations: the digits in a given position and the one two steps along, say, shouldn’t form any particular pair of values significantly more often than any other. So we rule out something like 3412365452 because odd and even numbers alternate.
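These frequency conditions are easy to check empirically. The sketch below – my own illustration, using Python’s built-in generator as the source of dice rolls – confirms that single digits turn up about 1/6 of the time and ordered pairs of successive digits about 1/36 of the time:

```python
import random
from collections import Counter

random.seed(3)
rolls = [random.randint(1, 6) for _ in range(600_000)]

singles = Counter(rolls)
pairs = Counter(zip(rolls, rolls[1:]))

# Each digit should appear with frequency near 1/6 = 0.1667...
print(sorted(round(c / len(rolls), 4) for c in singles.values()))
# ...and each ordered pair with frequency near 1/36 = 0.0278.
print(min(pairs.values()) / len(rolls), max(pairs.values()) / len(rolls))
```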
The mathematical logician Gregory Chaitin introduced the most extreme form of such conditions in his theory of algorithmic information. In conventional information theory, the amount of information in a message is the number of binary digits (‘bits’) needed to represent it. So a message 1111111111 contains ten bits of information, and so does 1010110111. Chaitin focused not on the sequence, but on the rules that can generate it – the algorithms to produce it. Those can also be coded in binary, in some programming language; precisely which language isn’t terribly important when the sequences get long enough. If the repeated 1s go on for, say, a million terms, then the sequence looks like 111...111 with a million 1s. The program ‘write down 1 one million times’, encoded into binary in any sensible way, is much shorter. The length of the shortest program that can generate a given output is the algorithmic information contained in that sequence. This description ignores some subtleties, but it will suffice here.
The sequence 1010110111 looks more random than 1111111111. Whether it is depends on how it continues. I chose it to be the first ten binary digits of π. Suppose it continues like that for a million digits. It will look extremely random. But the algorithm ‘compute the first million binary digits of π’ is much shorter than a million bits, so the algorithmic information in the million-digit sequence is a lot less than one million bits, even though π’s digits satisfy all standard statistical tests for randomness. What they don’t satisfy is ‘different from the digits of π’. No one in their right mind would use the digits of π as an encryption system; the enemy would soon work it out. On the other hand, if the sequence were to continue in a genuinely random manner, instead of coming from π, it would probably be hard to find a shorter algorithm that generates it.
Chaitin defined a sequence of bits to be random if it’s incompressible. That is, if you write down an algorithm to generate it up to a given position, then the algorithm is at least as long as the sequence, provided the number of digits gets very big. In conventional information theory, the amount of information in a binary string is the number of bits it contains – its length. The algorithmic information in a binary sequence is the length of the shortest algorithm that generates it. So the algorithmic information in a random sequence is also its length, but the algorithmic information in the digits of π is the length of the most compact program that generates it. That’s much smaller.
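Chaitin’s measure can’t be computed exactly – no algorithm finds the shortest program – but an off-the-shelf compressor gives a crude upper bound, which is enough to see the distinction. In this sketch (my own, using the standard zlib compressor as a stand-in for ‘shortest program’), a million repeated characters shrink to almost nothing, while a million random bytes barely shrink at all:

```python
import random
import zlib

repetitive = b"1" * 1_000_000
random.seed(1)
random_ish = bytes(random.getrandbits(8) for _ in range(1_000_000))

# The repetitive string compresses to a tiny fraction of its length...
print(len(zlib.compress(repetitive)))
# ...but the pseudorandom bytes are essentially incompressible.
print(len(zlib.compress(random_ish)))
```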
Using Chaitin’s definition, we can sensibly say that a specific sequence is random. He proved two interesting things about random sequences:
Random sequences of 0s and 1s exist. Indeed, almost every infinite sequence is random.
If a sequence is random, you can never prove it.
The proof of the first comes from counting how many sequences of given length there are, compared to shorter programs that might generate them. For instance, there are 1024 ten-bit sequences, but only 512 nine-bit programs, so at least half the sequences can’t be compressed using shorter programs. The proof of the second is basically that if you can prove a sequence is random, the proof compresses the data in it, so it’s not.
NOW, SUPPOSE YOU WANT TO generate a sequence, and you want to be convinced that it truly is random. Maybe you’re setting up the key to some encryption scheme, for example. Apparently, Chaitin’s work rules such a thing out. But in 2018 Peter Bierhorst and coworkers published a paper showing that you can get round this restriction using quantum mechanics.98 Essentially, the idea is that quantum indeterminacy can be translated into specific sequences, with a physical guarantee that they’re random in Chaitin’s sense. That is, no potential enemy can deduce the mathematical algorithm that creates them – because there isn’t one.
It might seem that the security of a random number generator can be guaranteed only if it satisfies two conditions. The user must know how the numbers are generated, otherwise they can’t be sure that it generates truly random numbers. And the enemy must be unable to deduce the internal workings of the random number generator. However, there’s no way to satisfy the first condition in practice using a conventional random number generator, because whatever algorithm it implements, it might go wrong. Keeping an eye on its internal workings might do the trick, but that’s usually not practical. The second condition violates a basic principle of cryptography called Kerckhoffs’s principle: you must assume that the enemy knows how the encoding system works. Just in case they do. Walls have ears. (What you hope they don’t know is the decoding system.)
Quantum mechanics leads to a remarkable idea. Assuming no deterministic hidden-variable theory exists, you can create a quantum-mechanical random number generator that is provably secure and random, such that the two conditions above both fail. Paradoxically, the user doesn’t know anything about how the random number generator works, but the enemy knows this in complete detail.
The device uses entangled photons, a transmitter, and two receiving stations. Generate pairs of entangled photons with highly correlated polarisations. Send one photon from each pair to one station, and the other to a second station. At each station, measure the polarisation. The stations are far enough apart that no signal can travel between them while they make this measurement, but by entanglement the polarisations they observe must be highly correlated.
Now comes the nifty footwork. Relativity implies that the photons can’t be used as a faster-than-light communicator. This implies that the measurements, though highly correlated, must be unpredictable. The rare occasions on which they disagree must therefore be genuinely random. Violation of Bell inequalities, coming from entanglement, guarantees that the outcomes of these measurements are random. The enemy must agree with this assessment, whatever they know about the process used by the random number generator. The user can test for violations of Bell inequalities only by observing the statistics of the random number generator outputs; the internal workings are irrelevant for this purpose.
This general idea has been around for a while, but Bierhorst’s team carried it out experimentally, using a set-up that avoids known loopholes in Bell inequalities. The method is delicate and the violations of Bell inequalities are slight, so it takes a long time to produce a sequence that’s guaranteed to be random. Their experiment was like generating a random sequence of two equally likely outcomes by tossing a coin that generates heads 99.98% of the time. It can be done, by analysing the sequence after it has been generated. Here’s one way. Run along the sequence until you reach the first point at which consecutive tosses are different: HT or TH. These pairs have the same probability, so you can consider HT to be ‘heads’ and TH to be ‘tails’. If the probability of H is very big, or very small, you’ll reject most of the data in the sequence, but what’s left acts like a fair coin.
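The trick of pairing up tosses is known as the von Neumann extractor, and it takes only a few lines of code. The sketch below uses non-overlapping pairs (a standard variant of the recipe just described), with a simulated biased coin standing in for the quantum hardware:

```python
import random

def extract_fair_bits(bits):
    """Von Neumann extractor: scan non-overlapping pairs of biased
    bits, output 1 for the pair (1, 0) and 0 for (0, 1), and discard
    (0, 0) and (1, 1).  The two surviving pairs are equally likely,
    so the output bits are unbiased."""
    return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

# A wildly biased coin: 'heads' (1) with probability 0.9998.
random.seed(0)
biased = [1 if random.random() < 0.9998 else 0 for _ in range(1_000_000)]
fair = extract_fair_bits(biased)
print(len(fair))  # nearly all the data is thrown away
```

From a million biased tosses only a few hundred fair bits survive, which mirrors the slow output rate of the experiment: guaranteed randomness is bought at the price of discarding most of the raw data.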
Running the experiment for ten minutes involved observing 55 million pairs of photons and led to a random sequence 1024 bits long. Conventional quantum random number generators, though not provably secure, generate millions of random bits per second. So right now, the extra security that this method guarantees isn’t worth the hassle. Another problem is the size of the set-up: the two stations are 187 metres apart. This isn’t a gadget you can carry in your briefcase, let alone put in a mobile phone. Miniaturising the set-up seems difficult, and putting it on a chip seems out of the question for the foreseeable future. Still, the experiment provides a proof of concept.
RANDOM NUMBERS (I’LL DROP THE ‘pseudo’ now) are used in a huge variety of applications. Innumerable problems in industry and related areas involve optimising some procedure to produce the best possible result. For example, an airline may wish to timetable its routes so that it uses the smallest number of aircraft, or to use a given number of aircraft to cover as many routes as possible. Or, more precisely, to maximise the profit that arises. A factory may wish to schedule maintenance of its machines to minimise the ‘down time’. Doctors might want to administer a vaccine so that it can be most effective.
Mathematically, this kind of optimisation problem can be represented as locating the maximum value of some function. Geometrically, this is like finding the highest peak in a landscape. The landscape is usually multidimensional, but we can understand what’s involved by thinking of a standard landscape, which is a two-dimensional surface in three-dimensional space. The optimum strategy corresponds to the position of the highest peak. How can we find it?
The simplest approach is hill climbing. Start somewhere, chosen however you wish. Find the steepest upward path and follow it. Eventually you’ll reach a point where you can’t go any higher. This is the peak. Well, maybe not. It’s a peak, but it need not be the highest one. If you’re in the Himalayas and climb the nearest mountain, it probably isn’t Everest.
Hill climbing works well if there’s only one peak, but if there are more, the climber can get trapped on the wrong one. It always finds a local maximum (nothing else nearby is higher), but maybe not a global one (nothing else is higher). One way to avoid getting trapped is to give the climber a kick every so often, teleporting them from one location to another one. If they’re stuck on the wrong peak, this will get them climbing a different one, and they’ll climb higher than before if the new peak is higher than the old one, and they don’t get kicked off it too soon. This method is called simulated annealing, because of a metaphorical resemblance to the way atoms in liquid metal behave as the metal cools down, and finally freezes to a solid. Heat makes atoms move around randomly, and the higher the temperature, the more they move. So the basic idea is to use big kicks early on, and then reduce their size, as if the temperature is cooling down. When you don’t initially know where the various peaks are, it works best if the kicks are random. So the right kind of randomness makes the method work better. Most of the cute mathematics goes into choosing an effective annealing schedule – the rule for how the size of kicks decreases.
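Here is a minimal sketch of simulated annealing – a toy of my own, with a made-up bumpy landscape, not a production optimiser. It accepts every uphill move, but also accepts downhill kicks with a probability that shrinks as a ‘temperature’ parameter cools:

```python
import math
import random

def simulated_annealing(f, x, steps=20_000, temp=5.0, cooling=0.9995):
    """Maximise f by hill climbing with random kicks whose size and
    acceptance probability shrink as the 'temperature' cools."""
    fx = f(x)
    best_x, best_f = x, fx
    for _ in range(steps):
        candidate = x + random.gauss(0, temp)   # random kick
        fc = f(candidate)
        # Always accept uphill moves; accept downhill ones with a
        # probability that falls as the temperature drops.
        if fc > fx or random.random() < math.exp((fc - fx) / max(temp, 1e-9)):
            x, fx = candidate, fc
            if fx > best_f:
                best_x, best_f = x, fx
        temp *= cooling                          # annealing schedule
    return best_x, best_f

# A bumpy landscape with many local peaks; the global peak is at x = 0.
random.seed(42)
f = lambda x: math.cos(3 * x) - 0.1 * x * x
peak_x, peak_f = simulated_annealing(f, x=8.0)
print(peak_x, peak_f)
```

Started at x = 8, plain hill climbing would get stuck on a nearby local peak; the early high-temperature kicks let the search escape and usually land it near the global peak at x = 0. The geometric cooling rule here is the simplest possible annealing schedule; real applications put most of the effort into choosing a better one.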
ANOTHER RELATED TECHNIQUE, WHICH CAN solve many different kinds of problem, is to use genetic algorithms. These take inspiration from Darwinian evolution, implementing a simple caricature of the biological process. Alan Turing proposed the method in 1950 as a hypothetical learning machine. The evolutionary caricature goes like this. Organisms pass on their characteristics to their offspring, but with random variation (mutation). Those that are fitter to survive in their environment live to pass on their characteristics to the next generation, while the less fit ones don’t (survival of the fittest or natural selection). Continue selecting for enough generations, and the organism gets very fit indeed – close to optimal.
Evolution can be modelled rather coarsely as an optimisation problem: a population of organisms wanders randomly around a fitness landscape, climbing the local peaks, and the ones that are too low down die out. Eventually the survivors cluster around a peak. Different peaks correspond to different species. It’s far more complicated, but the caricature is sufficient motivation. Biologists make a big song and dance about evolution being inherently random. By this they mean (entirely sensibly) that evolution doesn’t start out with a goal and aim towards it. It didn’t decide millions of years ago to evolve human beings, and then keep choosing apes that got closer and closer to that ideal until it reached perfection in us. Evolution doesn’t know ahead of time what the fitness landscape looks like. In fact, the landscape itself may change over time as other species evolve, so the landscape metaphor is somewhat strained. Evolution finds out what works better by testing different possibilities, close to the current one but randomly displaced. Then it keeps the better ones and continues the same procedure. So the organisms keep improving, step by tiny step. That way, evolution simultaneously constructs the peaks of the fitness landscape, finds out where they are, and populates them with organisms. Evolution is a stochastic hill-climbing algorithm, implemented in wetware.
A genetic algorithm mimics evolution. It starts with a population of algorithms that try to solve a problem, randomly varies them, and selects those that perform better than the rest. Do this again for the next generation of algorithms, and repeat until you’re happy with the performance. It’s even possible to combine algorithms in a parody of sexual reproduction, so that two good features, one from each of two different algorithms, can be united in one. This can be seen as a kind of learning process, in which the population of algorithms discovers the best solution by trial and error. Evolution can be seen as a comparable learning process, applied to organisms instead of algorithms.
There are hundreds of applications of genetic algorithms, and I’ll mention just one to give a flavour of how they work. University timetables are highly complex. Hundreds of lectures must be scheduled so that thousands of students can follow innumerable different courses of study. It’s common for students to take optional courses in different subjects; the American ‘major’ and ‘minor’ subject areas are a simple example. The lectures have to be scheduled to avoid clashes, where a student is required to listen to two lectures at the same time, and to avoid cramming three lectures on the same topic into consecutive slots. A genetic algorithm starts with a timetable and finds out how many clashes or triple-headers there are. Then it modifies the timetable randomly, seeking a better one, and repeats. It might even be possible to combine parts of two fairly successful timetables together, mimicking sexual recombination.
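A bare-bones genetic algorithm fits on a page. The sketch below is a toy of my own, not a timetabling system: it evolves bit strings towards a trivially checkable target (as many 1s as possible), using survival of the fitter half, one-point crossover, and random mutation:

```python
import random

def genetic_algorithm(fitness, length=40, pop_size=60,
                      generations=200, mutation_rate=0.01):
    """Bare-bones genetic algorithm: keep the fitter half of a
    population of bit strings, breed replacements by one-point
    crossover, and flip the occasional bit at random."""
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)       # natural selection
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            mum, dad = random.sample(survivors, 2)
            cut = random.randrange(1, length)      # one-point crossover
            child = mum[:cut] + dad[cut:]
            for i in range(length):                # random mutation
                if random.random() < mutation_rate:
                    child[i] ^= 1
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy fitness function: count the 1s, so the optimum is all 1s.
random.seed(7)
best = genetic_algorithm(fitness=sum)
print(sum(best))  # at or near the maximum of 40
```

For the timetabling problem, the bit string would encode a timetable and the fitness function would count clashes and triple-headers (negated, so fewer is fitter); the evolutionary machinery stays exactly the same.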
SINCE WEATHER FORECASTING HAS INHERENT limitations, how about weather control? Don’t ask whether rain is going to ruin the picnic or the invasion: make sure it doesn’t.
Folk tales in Northern Europe maintained that discharging cannons can prevent hailstorms. Anecdotal evidence for this effect appeared after several wars, including the Napoleonic Wars and the American Civil War: every time there’s a big battle, it rains afterwards. (If that convinces you, I’ve got a really nice bridge, going cheap.) Towards the end of the 19th century the US Department of War spent $9000 on explosives and set them off in Texas. Nothing with any scientific validity was observed. Seeding clouds with silver iodide particles, which are very fine and in theory provide nuclei around which water vapour can condense, is widely used today to cause rain to fall, but when this works it’s still not clear whether the rain would have fallen anyway. Several attempts have been made to weaken hurricanes by injecting silver iodide into their eyewalls, and again results have been inconclusive. The National Oceanic and Atmospheric Administration of the USA has been exploring theoretical ideas to stop hurricanes, such as firing lasers at the storms that might give rise to them, triggering lightning discharges and dissipating some of the energy in the storms. And there are conspiracy theories that climate change isn’t caused by human production of CO2, but by some sinister secret group controlling the weather to disadvantage America.
Randomness comes in many forms, and chaos theory tells us that a butterfly flap can radically change the weather. We’ve discussed the sense in which this is true: ‘change’ really means ‘redistribute and modify’. When von Neumann was told of this effect, he pointed out that as well as making weather unpredictable, it potentially makes it controllable. To redistribute a hurricane, find the right butterfly.
We can’t do this for a hurricane or a tornado. Not even a light drizzle. But we can do it for the electrical waves in a heart pacemaker, and it’s widely used to plan fuel-efficient space missions when time isn’t vital. In both cases, the main mathematical effort goes into selecting the right butterfly. That is, sorting out how, when, and where to interfere very slightly with the system to obtain the desired result. Edward Ott, Celso Grebogi, and James Yorke worked out the basic mathematics of chaotic control in 1990.99 Chaotic attractors typically contain huge numbers of periodic trajectories, but these are all unstable: any slight deviation from one of them grows exponentially. Ott, Grebogi, and Yorke wondered whether controlling the dynamical system in the right way can stabilise such a trajectory. These embedded periodic trajectories are typically saddle points, so that some nearby states are initially attracted towards them, but others are repelled. Almost all of the nearby states eventually cease to be attracted and fall into the repelling regions, hence the instability. The Ott–Grebogi–Yorke method of chaotic control repeatedly changes the system by small amounts. These perturbations are chosen so that every time the trajectory starts to escape, it’s recaptured: not by giving the state a push, but by modifying the system and moving the attractor, so the state finds itself back on the in-set of the periodic trajectory.
A human heart beats fairly regularly, a periodic state, but sometimes fibrillation sets in and the heartbeat becomes seriously irregular – enough to cause death if not stopped quickly. Fibrillation occurs when the regular periodic state of the heart breaks up to give a special type of chaos: spiral chaos, in which the usual series of circular waves travelling across the heart falls apart into a lot of localised spirals.100 A standard treatment for irregular heartbeats is to fit a pacemaker, which sends electrical signals to the heart to keep its beats in sync. The electrical stimuli supplied by the pacemaker are quite large. In 1992 Alan Garfinkel, Mark Spano, William Ditto, and James Weiss reported experiments on tissue from a rabbit heart.101 They used a chaotic control method to convert spiral chaos back into regular periodic behaviour, by altering the timing of the electrical pulses making the heart tissue beat. Their method restored regular beats using voltages far smaller than those in conventional pacemakers. In principle, a less disruptive pacemaker might be constructed along these lines, and some human tests were carried out in 1995.
Chaotic control is now common in space missions. The dynamical feature that makes it possible goes right back to Poincaré’s discovery of chaos in three-body gravitation. In the application to space missions, the three bodies might be the Sun, a planet, and one of its moons. The first successful application, proposed by Edward Belbruno in 1985, involved the Sun, Earth, and the Moon. As the Earth circles the Sun and the Moon circles the Earth, the combined gravitational fields and centrifugal forces create an energy landscape with five stationary points where all the forces cancel out: two peaks and three saddles. These are called Lagrange points. One of them, L1, sits between the Moon and the Earth, where the gravitational fields and the centrifugal force cancel out. Near this point, the dynamics is chaotic, so the paths of small particles are highly sensitive to small perturbations.
A space probe counts as a small particle here. In 1985 the International Sun–Earth Explorer ISEE-3 had almost completely run out of the fuel used to change its trajectory. If it could be transported to L1 without using up much fuel, it would be possible to exploit the butterfly effect to redirect it to some distant objective, still using hardly any fuel. This method allowed the satellite to rendezvous with comet Giacobini–Zinner. In 1990 Belbruno urged the Japanese space agency to use a similar technique on their probe, Hiten, which had used up most of its fuel completing its main mission. So they parked it in a lunar orbit and then redirected it to two other Lagrange points to observe trapped dust particles. This kind of chaotic control has been used so often in unmanned space missions that it’s now a standard technique when fuel efficiency and lower costs are more important than speed.