8

Trust in Numbers

On 5 October 1960 the President of IBM, Thomas J. Watson Jr, was among three business leaders being shown around the war room of NORAD, the North American Air Defense Command headquarters in Colorado Springs. Above the maps on the wall was an alarm indicator connected to America’s ballistic-missile early-warning system in Thule, Greenland. The businessmen were told that a ‘1’ flashing on the alarm meant nothing much, but higher numbers were more serious. As they watched, the numbers started to rise. When the display reached ‘5’, an air of panic descended on the room, and the alarmed businessmen were hustled out of the war room into a nearby office, without further explanation. A ‘5’ was the highest alarm level possible. In this case it meant, with an estimated 99.9 per cent certainty, a massive incoming missile attack from Siberia. The chiefs of staff were urgently contacted and asked for immediate instructions regarding a response. With minutes left before the probable obliteration of the first US cities, the Canadian deputy commander of NORAD asked where Khrushchev was. The discovery that he was at the UN in New York was enough to justify a pause before authorizing full nuclear retaliation. And then it emerged that the all-out attack detected by the computer system was in fact the moon rising over Norway.1

What saved us from Armageddon was the wisdom to query what the computer model was saying, rather than simply giving in to the seemingly overwhelming force of its predictions. It could have been different: the persuasive power of that extreme number, 99.9 per cent, could have led us to set all doubts aside.

This, indeed, is exactly what happened, repeatedly, in the run-up to the financial crash of 2008. Legions of very smart people in all the world’s financial centres used computer models which were producing extreme numbers, numbers that contradicted the evidence emerging from the world around them. In August 2007 David Viniar, the CFO of Goldman Sachs, was quoted in the Financial Times, saying, ‘We were seeing things that were 25-standard deviation moves, several days in a row.’ Translated, Viniar was saying that in some markets recent price movements had been so extreme that they should never happen according to the predictions of Goldman’s models. ‘Never happen’ is a simplification: Goldman’s models implied that one of these market gyrations could occur, but it was highly unlikely, an extreme unlikeliness which is nearly impossible to describe. I will try. A 25-standard deviation event is one which happens much less often than once in the history of the universe since the Big Bang. It is as unlikely as winning the jackpot in the UK national lottery twenty-one times in a row.2
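To get a feel for just how extreme a 25-standard-deviation move is under a bell-curve model, the tail probability can be computed directly. A minimal sketch, assuming a normal distribution; the figures are illustrative and say nothing about the detail of Goldman’s actual models:

```python
# How unlikely is a 25-standard-deviation daily move if market returns really
# followed a bell curve (normal distribution)? Illustrative only: nothing here
# reflects the detail of Goldman Sachs's actual models.
from scipy.stats import norm

for sigmas in (5, 10, 25):
    tail = norm.sf(sigmas)   # P(move > mean + sigmas * standard deviations)
    print(f"{sigmas}-standard-deviation move: roughly 1 day in {1 / tail:.3g}")

# For comparison, the universe is roughly 5e12 days old, so under this model a
# 25-standard-deviation move should happen far less than once in its history.
```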

Yet such an event had occurred not just once but repeatedly, several days in a row. But rather than abandon the models, given their falsification by reality, Goldman Sachs, along with many other investment banks, continued to rely on them to quantify various kinds of uncertainty. Maybe they thought, as financial experts through the ages have repeatedly asserted, ‘This time is different.’

Why do very smart people put their trust in numerical measures of uncertainty, even when faced with conflicting empirical evidence? Most of the smart people in the world’s financial sectors are familiar with modern finance theory, an offshoot of economics. And the error of trying to measure the extent of unknowable uncertainty in numerical terms has been widely discussed in economics for decades, since a RAND analyst called Daniel Ellsberg published an experiment back in 1961 (just after that nuclear near-miss the previous October). Ellsberg’s experiment perfectly pinpointed the strangeness of attempting to measure pure uncertainty in numerical terms – that is, using odds or probabilities.

Suppose there are two boxes: box A holds fifty red and fifty black balls; box B also contains a total of one hundred red and black balls, but in a ratio which is unknown. A ball is to be drawn at random from one box, but you can choose the box. You win $100 if the ball is red. Which box do you choose?

There is no need to spend much time thinking about this: it is not a trick question. Nor is the follow-up question:

Now suppose you win $100 if the ball drawn is black, but the boxes are left unchanged; will you change your choice of box?

The purpose of Ellsberg’s experiment was not to ridicule our ordinary thinking. On the contrary, its target was the gap between our ordinary thinking and the intellectual orthodoxy perpetuated by economists, decision analysts, game-theorists and other peddlers of theories about how we ought to think. Faced with the choice between box A and box B, most of us choose box A initially and do not change our choice after the prize is switched to the draw of a black ball. That people choose box A, and stick with it, seems easy to explain: we like to know the odds we face, so we choose box A both times, rather than the pure uncertainty of box B.

But the dominant theory of decision-making under uncertainty holds that this is a confusing, inconsistent, irrational way to think and choose. According to this dominant view – an approach with its intellectual roots in economics but nowadays mainstream in a wide variety of applied fields from finance to epidemiology – the only rational way to choose in situations of uncertainty is to think in terms of numerical odds or probabilities.

This seems unarguable in circumstances where the probabilities are obvious. When I toss a coin, I should know it will come up heads about half the time. But the dominant view is that we should always think in terms of probabilities, even when we have absolutely no basis for knowing what they are. When we don’t know the probabilities, we should just invent them. And if people are rational, then we can infer the probabilities they have invented by observing their choices. So according to the orthodox view, in Ellsberg’s experiment, if you choose A in the initial choice, that must mean you believe it contains more red balls than box B – in other words, you believe the probability of a red ball being drawn is higher. It follows that you believe box A contains fewer black balls than box B, so when the prize is switched to black you should switch your choice to box B. The economists were so mystified by the realization that hardly anyone thinks like this that Ellsberg’s experiment was dubbed the Ellsberg Paradox. As we will soon see, it has revolutionary implications for thinking about uncertainty, and the world would be different if the guys at Goldman Sachs, and the finance theory on which they rely, had taken these implications seriously.
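The orthodox chain of inference can be written out compactly; a sketch of the textbook argument, writing p for the probability you implicitly assign to drawing a red ball from box B:

```latex
% Orthodox reading of the two choices, with p = Pr(red ball drawn from box B):
\[
\text{choose A for the red prize} \;\Rightarrow\; p < \tfrac{1}{2}
\qquad\text{(box A offers red with probability exactly } \tfrac{1}{2}\text{)}
\]
% Box B contains only red and black balls, so
\[
\Pr(\text{black from B}) \;=\; 1 - p \;>\; \tfrac{1}{2},
\]
% which says you should now prefer box B when the prize is for black.
% Sticking with box A both times is consistent with no single value of p.
```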

Daniel Ellsberg was an unlikely source for such a fundamental challenge to intellectual orthodoxy: we would expect a RAND analyst to defend the mathematical measurement of uncertainty, rather than question it. Ellsberg’s golden-boy education and early career made him a natural fit at the RAND Corporation. He graduated with highest honours in economics from Harvard, then, following graduate study and a period serving as a lieutenant in the Marines, he divided his time between RAND and Harvard. In 1964, he moved to work at the highest levels of the Pentagon. It is tempting to suppose that this reassuringly elite CV would encourage the usually cautious and conservative economic theorists at RAND and Harvard to embrace the radical implications of the Ellsberg Paradox.

But something else happened. As far as the US establishment was concerned, Ellsberg’s career went terribly wrong. He became the personal target of the notorious ‘White House Plumbers’, the Nixon aides who would later commit the Watergate burglaries. Nixon’s National Security Adviser, Henry Kissinger, publicly branded him ‘the Most Dangerous Man in America’.fn1 Perhaps it is not so surprising that the Ellsberg Paradox was largely ignored for a generation.

PROBABILITY GETS PERSONAL

Probability is not a new idea. The idea that we humans need not be passive in the face of the future but can instead take control of our fate was seen as a stand against the gods, against unfathomable nature. In Western society it is no coincidence that modern ideas of probability became firmly established in the late eighteenth century, just as religion’s grip began to loosen with the triumph of the Enlightenment. Probabilistic thinking stood for confident progress beyond stoical submission to uncertainty. Like so much else, that confidence ended in 1914.

The impact of the assassination of Archduke Franz Ferdinand on 28th June that year was not immediately clear. It took a month for the financial markets to realize the implications, and the resulting panic led to the closure of the London and New York stock exchanges on 31st July. By this time, several European countries had already declared war on each other. Two days later the Governor of the Bank of England was holidaying on a yacht off the coast of Scotland. ‘There’s talk of a war,’ he said to his friends, ‘but it won’t happen.’3 Two days after that Britain declared war on Germany.

Surprises happen. With the First World War, uncertainty was back, loud and kicking. In the war’s aftermath, four carnage-filled years later, when the prevailing mood oozed with unease and uncertainty about the future, two economists expressed reservations about orthodox probabilistic thinking. Chicago economist Frank Knight drew a crucial distinction between what he called ‘measurable’ and ‘unmeasurable’ uncertainty. Measurable uncertainty, he stated, includes games of chance and, more generally, situations in which probabilities can be calculated based on the relative frequency of events. Unmeasurable uncertainty is everything else: situations of pure uncertainty in which no relative-frequency information is available to us. In Britain, John Maynard Keynes drew a similar distinction to Knight, although it was embedded in an idiosyncratic and controversial theory of probability that arguably obscured Keynes’s discussion. Neither Knight nor Keynes can obviously be credited as the first to draw the distinction: both published their work in 1921, and in any case the distinction has been around far longer, for instance implicit in the difference between the French words hasard and fortuit.fn2 But both Knight and Keynes reminded economists and other thinkers interested in decision-making of the fundamental limit to probabilistic thinking: in some situations we don’t have information about probabilities, so uncertainty is unmeasurable.

Yet within just a few years, the distinction between measurable and unmeasurable uncertainty seemed to be made redundant by a paper titled ‘Truth and Probability’ from a wunderkind called Frank Ramsey.

The archetypal ‘Renaissance man’, Leonardo da Vinci, was not just a genius but a polymath with gifts in many different fields. But the accumulation of knowledge since then has made specialization unavoidable. Renaissance man may now be extinct but, arguably, Frank Ramsey was one of the last of them. He moved in circles of genius: his major intellectual interactions were with Keynes and Ludwig Wittgenstein. Both of them learned from Ramsey, who made profound and original contributions to philosophy, economics and mathematics (a branch of which is now called Ramsey Theory). Take philosophy. Thinking he had cracked the major problems of philosophy with his first book, Tractatus Logico-Philosophicus, Wittgenstein abandoned the discipline to become an elementary-school teacher in a small village outside Vienna. It was Ramsey who persuaded Wittgenstein that the Tractatus did not in fact solve everything. Wittgenstein returned to England and Cambridge so he could work with Ramsey (even in preference to working with the greatest philosopher in Cambridge at that time, Bertrand Russell). Ramsey had got to know Wittgenstein in the first place because he had produced the first English translation of Wittgenstein’s Tractatus, a book that several leading philosophers had declared impossible to translate, given Wittgenstein’s abstruse and compressed German. Ramsey simply dictated the translation directly to a shorthand writer in the Cambridge University typing office. Ramsey was eighteen at the time.

A few months later Ramsey turned to thinking about probability and uncertainty. He began by criticizing Keynes’s theory of probability. Keynes doggedly defended his contentious theory for ten years, until he reviewed Ramsey’s paper ‘Truth and Probability’ in 1931, after which Keynes changed his mind: ‘I yield to Ramsey – I think he is right.’ But in one sense it was too late. Ramsey’s intellectual triumph was lost in a far greater tragedy: in January 1930 he died unexpectedly of complications after routine surgery, aged twenty-six.

In ‘Truth and Probability’, Ramsey took theories of probability in a completely new direction. Instead of defining probability in a backwards-looking way, in terms of observed frequencies (a coin comes up heads about half the time), Ramsey saw probability as forwards-looking, a quantitative measure of the strength of someone’s belief in a future event. If I believe an event is 100 per cent certain to happen, this is equivalent to saying that, for me, the probability of the event is one; if I believe it is certain not to happen, the probability of the event for me is zero. And, analogously, for all beliefs between these two extremes: for example, if I believe an event is as likely to happen as not, the probability for me is one half. Ramsey explained how these beliefs-as-probabilities (‘personal probabilities’) could in principle be measured – essentially, by finding the least favourable odds a person would be willing to accept in a bet over whether the uncertain future event will happen. If Ramsey’s approach works, then the idea of probability suddenly becomes much bigger: its scope is dramatically widened. Probabilities based on observed frequencies can be known only if there is data on events which have been observed frequently. Personal probabilities are freed of these constraints and can in principle be used to quantify Knight’s ‘unmeasurable’ pure uncertainty, including one-off, non-repeatable events. But Ramsey’s approach assumes that a person will be perfectly consistent in their personal probabilities – and it is just this consistency which, Ellsberg showed, cannot be relied upon. Ellsberg’s experiment demonstrated that if we interpret people’s choices as reflecting their personal probabilities, then those personal probabilities may be inconsistent.
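Ramsey’s elicitation idea can be made concrete with a small sketch in modern notation (not Ramsey’s own). Suppose the least favourable odds at which you will still bet on an event are k to 1 against:

```latex
% A bet at odds of k to 1 against event E: stake 1, win k if E occurs.
% With personal probability p for E, the bet is worth accepting only if
\[
p \cdot k \;-\; (1 - p) \;\ge\; 0
\quad\Longleftrightarrow\quad
p \;\ge\; \frac{1}{k + 1}.
\]
% If k to 1 against are the least favourable odds you will accept, then
\[
p \;=\; \frac{1}{k + 1}.
\]
% Illustration: accepting at best 3 to 1 against rain tomorrow corresponds
% to a personal probability of rain of 1/4.
```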

However, Ellsberg’s work lay decades in the future. In the meantime, Ramsey’s revolutionary ideas offered obvious, tantalizing possibilities for developing a science of society and had persuaded Maynard Keynes, the most influential economist of the era. Yet nothing happened. Ramsey’s ideas about probability were almost completely ignored for fifty years. Why?

To begin with, Ramsey’s ideas appeared as a posthumous contribution to the philosophy of belief. They were not read by economists or others interested in practical applications towards a science of society, and Ramsey was no longer there to promote them more widely. And Ramsey was truly ahead of his time: existing theories and thinkers did not catch up with, and connect with, most of his ideas in mathematics, economics and philosophy until around the 1960s. In the 1920s, no one realized the true originality and significance of his work: it was too different, too hard to relate to existing ideas. This was not helped by Ramsey’s presentational style: his mathematics used unusual notation derived from Bertrand Russell; his proofs were concise, verging on cryptic; and the modesty of his philosophical writing made his ideas seem almost light and flippant, not deep and profound. Wittgenstein’s reputation was bolstered – especially among people who had no idea what he was talking about – by lines such as the momentous final sentence of the Tractatus: ‘Whereof one cannot speak, thereof one must be silent.’ Now compare Ramsey’s version: ‘What we can’t say we can’t say, and we can’t whistle it either.’4

As for the handful of people in Cambridge who had come to grips with Ramsey’s ideas about probability, their attention was distracted by Keynes and Wittgenstein, both in Cambridge at the time. And Keynes’s views on probability had moved on. The virtue he had once seen in Ramsey’s ideas was by the mid-1930s overshadowed by his awareness of the ubiquity of pure, unmeasurable uncertainty in practice:

By ‘uncertain knowledge’, let me explain, I do not mean merely to distinguish what is known for certain from what is merely probable. The game of roulette is not subject, in this sense, to uncertainty … The sense in which I am using the term is that in which the prospect of a European war is uncertain, or the price of copper and the rate of interest twenty years hence, or the obsolescence of a new invention … About these matters there is no scientific basis on which to form any calculable probability whatsoever. We simply do not know.5

Keynes wrote these words in 1937, in response to criticisms of his General Theory, a book that represented unquestionably the biggest development in economics in the twentieth century – effectively inventing modern macroeconomics – and probably the most important contribution to economics since Adam Smith’s The Wealth of Nations over 150 years earlier. Keynes’s emphasis on the importance of economic and political uncertainty was, unsurprisingly, hugely influential.

But not influential enough: the Keynesian view of uncertainty is not our contemporary orthodoxy. Instead the Second World War and its immediate aftermath nurtured a renewed optimism about a science of society. (Ironically, Keynesian economics may have even bolstered this optimism, with its faith in the ability to measure and manage national economic performance.) However, as we saw in Chapter 2, John von Neumann and Oskar Morgenstern’s version of a science of society was very different to Keynesian economics: they thought that Keynes was a ‘charlatan’ and economics needed to be rebuilt from the ground up on the rigorous mathematical foundations of game theory. Uncertainty had to be captured in precise numbers. Von Neumann and Morgenstern had no time for Keynesian sentiments like ‘we simply do not know’.

Literally as an afterthought, von Neumann and Morgenstern added an appendix to the second edition of their Theory of Games and Economic Behavior in which they described a mathematical theory of decision-making. It was based entirely on ideas sketched out by von Neumann on the back of an envelope. Von Neumann and Morgenstern’s theory simply assumes that the decision-maker knows the probability of every relevant future event. They were unaware of Ramsey’s work over twenty years earlier. It was not until 1951 that Ramsey’s pioneering ideas were acknowledged in print in America – by Ken Arrow, who knew a genius when he saw one. However, the next major contribution was to come not from Arrow but from a mathematician named Leonard ‘Jimmie’ Savage.fn3

Milton Friedman described Savage as ‘one of the few people I have met whom I would unhesitatingly call a genius’.6 Yet Savage had become a mathematician only by accident: he was an undergraduate in chemical engineering at the University of Michigan when his exceptionally poor eyesight led him to cause a fire in the chemistry lab and get expelled. He was allowed to return to study maths, which did not involve lab work. Shortly after Savage completed his PhD, his mathematical gifts caught the attention of von Neumann at Princeton, who encouraged Savage to study probability and statistics. In 1954 Savage extended von Neumann’s theory of decision-making to include personal probabilities. Savage emphasized that his core ideas were essentially the same as Ramsey’s and noted that Ramsey’s work had previously had little influence. But the time was now ripe: this new decision theory was irresistible to economists and other wannabe scientists of society, who rapidly realized that it had limitless applications, because the uncertain future could always be quantified with the aid of personal probabilities.

THE COMPUTER SAYS THAT CAN’T HAVE JUST HAPPENED

In the world beyond academia, a key impact of the new decision theory – the new orthodoxy, following Savage’s work – was that it encouraged people to believe that the uncertain future can be ‘managed’ by inventing probabilities about it. By implying that personal probabilities were just as valid and legitimate as objective probabilities based on observed frequencies, the Savage orthodoxy blurred the boundary between beliefs and facts. It replaced Keynes’s ‘we simply do not know’ with some comforting probability numbers, a pretence of scientific knowledge. This urge to actuarial alchemy, dissolving the incalculable into the calculable, is strongest when everything else at stake is objective and quantitative, so that uncertainty is the only remaining obstacle to a seemingly perfectly rational mathematical decision-making process. And the urge to actuarial alchemy is even stronger when people are willing to pay a lot of money for it. These urges climax in the stock market.

Nevertheless, there is still the problem of picking the probabilities. In the stock market the obvious starting point is statistics on past performance. Clearly, stock prices do not remain constant or follow a simple trend line. The next step, then, is to describe the pattern of price statistics in mathematical terms – how those statistics are distributed. The convenient final step is to assume that the future will be like the past. Assuming the statistical distribution of past prices holds in the future, we can deduce the probability of different future outcomes for stock prices and therefore the best portfolio of stocks to hold.
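A minimal sketch of that three-step recipe, using invented ‘past’ returns in place of real market data; the numbers and the bell-curve assumption are purely illustrative:

```python
# The orthodox recipe, as a sketch: (1) take past price changes, (2) fit a
# statistical distribution to them, (3) assume the future is like the past and
# read off probabilities. The 'past' returns here are invented, and the
# bell-curve (normal) assumption is exactly the step criticized in the text.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
past_daily_returns = rng.normal(loc=0.0003, scale=0.01, size=2500)  # ~10 years, made up

mu, sigma = past_daily_returns.mean(), past_daily_returns.std()     # step 2: fit the bell curve

p_big_fall = norm.cdf(-0.10, loc=mu, scale=sigma)                   # step 3: a 10% one-day fall
print(f"Model's probability of a 10% one-day fall: {p_big_fall:.2e}")
# Under the fitted bell curve this is roughly a 10-standard-deviation event,
# i.e. 'effectively never' -- yet falls of this size have occurred in reality.
```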

Somehow, you can guess, it doesn’t quite work out in practice.

As well as the obvious problems such as assuming the future is like the past, there are some more technical difficulties. These deserve discussion because their implications are much more than merely technical. Errors in conventional thinking about uncertainty lead directly to the conclusion that catastrophes are extremely unlikely to occur. Catastrophes recklessly ignored on the basis of these errors include those affecting financial markets and the global climate.

Conventional thinking about uncertainty assumes that uncertain phenomena display a familiar pattern found commonly in nature, the bell curve. (This distribution or pattern is so natural and familiar that statisticians call it the normal distribution.) For example, human heights form a bell-curve distribution. There is a typical, or average, height for humans, and the number of people with other heights diminishes the further we move away from this average. If we plot these observations on a graph, with heights on the horizontal and numbers of people on the vertical, the curve through our plots would be shaped like a bell. It is not just that unusually tall or unusually short people are uncommon. They are very uncommon, while extremely tall or extremely short people are extremely rare. The odds of observing someone with some particular height fall ever faster as we move further away from the average. Depending on the assumptions we make about the present population of the world, the average human is about 1.67 metres, or 5 feet 7 inches, tall. The odds of being 10 centimetres taller than average (1.77 metres, 5 feet 10) are about 1 in 6.3. The odds of being 20 centimetres taller (1.87 metres, 6 feet 2) are about 1 in 44. The odds have fallen significantly, but less than sevenfold for a 10-centimetre increase in height. But when we consider a different 10-centimetre height increase, comparing people who are 2.17 metres (7 feet 1) tall to those who are 2.27 metres (7 feet 5) tall, the odds fall from about 1 in 3.5 million to 1 in 1,000 million, a 286-fold reduction in likelihood.7 Towards the edge of the bell, likelihoods fall from ‘incredibly rare’ to ‘effectively never’. Which brings us back to the 25-standard deviation events mentioned by the CFO of Goldman Sachs in August 2007.
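Before returning to Goldman Sachs, the height odds just quoted can be reproduced, approximately, from the bell curve itself. A quick sketch, assuming a mean of 1.67 metres and a standard deviation of about 10 centimetres (the value implied by the quoted figures):

```python
# Reproducing the height odds quoted in the text from a bell curve with mean
# 1.67 m and an assumed standard deviation of 0.10 m (the value implied by the
# quoted odds). Results are approximate.
from scipy.stats import norm

mean, sd = 1.67, 0.10
for height in (1.77, 1.87, 2.17, 2.27):
    odds = 1 / norm.sf(height, loc=mean, scale=sd)   # 1 person in 'odds' is taller
    print(f"Taller than {height:.2f} m: about 1 in {odds:,.0f}")

# The same 10 cm step costs a roughly sevenfold drop in the odds near the
# average, but a drop of several hundredfold out in the tail of the bell.
```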

The bank’s analysts and their computer models assumed that market prices follow a bell-curve distribution. Standard deviation is a measure of distance from the average, the middle of the bell; 25-standard deviation events are very far away from this, and so should happen less than once in the history of the universe. Events with extreme impacts, which happen completely unexpectedly (although they may seem predictable with hindsight), have been named Black Swans by Nassim Taleb, a Lebanese mathematician and sometime hedge-fund manager. Bell-curve thinking essentially assumes that the possibility of black swans can be ignored, because they will never happen.

Goldman Sachs was not alone in using bell-curve thinking. This orthodoxy dominates the financial sector and has repeatedly been approved by regulators as a basis for judging risks. The global financial crisis beginning in 2007 did not cause banks and their regulators to abandon bell-curve thinking, although they could hardly dismiss the events of that period as unprecedented ‘one-offs’. It is not just that ‘once in the history of the universe’ events happened several times a week in late 2007. Two decades previously, on 19 October 1987 – a day quickly dubbed ‘Black Monday’ – the US stock market fell by more than 20 per cent. Bell-curve thinking put the odds of that at about 1 in 10⁴⁶ – that is, a 1 followed by 46 zeros. The same goes for the crashes associated with the 1997 crisis in East Asian markets, and the dot.com bubble. Events which finance theory says will never ever happen keep happening. So why do we keep relying on this theory?

The idea that, while we don’t yet know some number, we can guess its average or typical value, is as seductive as it is reassuring. And surely, exceptionally low or exceptionally high numbers are very unlikely, because they would need a series of exceptional reasons or causes to bring them about. This ‘common sense’ supports bell-curve thinking. There are two contexts where it works well. First, in nature, where internal or system constraints such as gravity make great extremes virtually impossible. There are basic features of human physiology which explain why no person has yet reached 3 metres in height or has lived to be 150 years old. It is not just chance. Natural constraints of course apply throughout nature. And for most of our evolution the risks we have faced have come from the natural world, not human society, so we may have evolved to use simplified bell-curve thinking as a generally reliable coping strategy in a natural environment.

Another valid application of bell-curve thinking is to situations where truly random independent events are repeated. In the real world, these situations arise only in games of chance. If an unbiased coin is tossed many times, the most likely outcome is an equal number of heads and tails. Slightly less likely is observing one more head than the number of tails, or vice versa. Less likely still is two more heads than tails, and so on. The eighteenth-century French mathematician Abraham de Moivre was the first to realize that repeated random events such as these generate a bell-curve distribution.fn4
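De Moivre’s observation can be checked numerically. A small sketch: the exact probabilities for the number of heads in many fair coin tosses sit almost exactly on a bell curve.

```python
# de Moivre's observation, checked numerically: the exact probabilities for the
# number of heads in 1,000 fair coin tosses closely match a bell curve.
from math import sqrt
from scipy.stats import binom, norm

n = 1000                        # number of tosses
mean, sd = n / 2, sqrt(n) / 2   # mean and standard deviation of the number of heads

for heads in (500, 520, 540, 560):
    exact = binom.pmf(heads, n, 0.5)              # exact binomial probability
    approx = norm.pdf(heads, loc=mean, scale=sd)  # bell-curve approximation
    print(f"{heads} heads: exact {exact:.2e}, bell curve {approx:.2e}")
```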

The problems begin when we carry over these ideas to the wrong places. Again, Jimmie Savage played a key role. Savage had discovered the work of Louis Bachelier, an obscure French mathematician who in 1900 had published a ‘theory of speculation’ suggesting prices in financial markets move completely randomly. Savage brought Bachelier’s forgotten work to the attention of economists, and Bachelier’s ideas gained a wider audience with the emergence of the ‘efficient-market hypothesis’ in the 1960s. According to the hypothesis (it remains a hypothesis today because there is no convincing evidence for its truth), at any given moment all information relevant to the price of a stock is already reflected in the price. This is ensured by assuming perfectly free markets and hyper-rational, omniscient buyers and sellers. If all relevant information is already in the price, any short-term movements in stock prices are random, reflecting the many independent, individual acts of buying and selling, akin to repeated tosses of a coin. Given this fantasy about prices in financial markets, the bell curve is the obvious tool to describe the uncertainty – just as it can describe the outcome of repeated coin tosses. By 1999 deregulation of US banks (repealing the Glass–Steagall Act from the New Deal era) was explicitly justified by reference to the efficient-market hypothesis. The computer models were so trusted to ‘manage uncertainty’ that they controlled automated buying and selling of stocks. Human intervention was removed as far as possible in order to prevent error.

One simple reason why we persist with such ways of thinking is the prospect of the alternative. Bell-curve-based analysis is relatively straightforward maths, and the risk of getting extreme outcomes can effectively be measured with just one number – the standard deviation, which describes whether the bell is tall and thin, or flat and fat. The alternative involves fiendishly difficult maths, for modest reward: even using the hard maths, the risk of a black swan cannot be measured by a single number. Indeed, there are inherent limits to how far we can measure such risks at all.

‘IT’S LIKE A MASSIVE EARTHQUAKE’

So said Kirsty McCluskey, a trader at the massive investment bank Lehman Brothers on the day it went bust.8 And so true, because the risk of both earthquakes and the financial crisis which engulfed Lehman Brothers can be described by the same underlying maths. Not the ‘never happen’ events at the end of a bell curve but a ‘power law’ or ‘fractal’ distribution of outcomes. Don’t worry: although much of the underlying maths is PhD level and beyond, the core ideas are more accessible. In some parts of the world earthquake activity is almost constant but at a very low level, much of it imperceptible to humans. Then, occasionally, there is an earthquake event which is hugely bigger than that background activity. We don’t speak of a normal or average earthquake, because it would be completely uninformative – just silly – to add the size of the rare earthquakes to the multitude of background activity to get an average size of earthquake event. What most of us call ‘earthquakes’ are too rare for their arithmetic mean to be a useful number. The idea that there is no normal or natural size for some uncertain phenomena such as earthquakes is more profound than it first appears.

What is the natural size of a snowflake? Your instinct that this is a silly question is correct. If there was an answer, it would surely be revealed by the physical properties of snowflakes. But if we look at snowflakes under a magnifying lens we find a different kind of property. Snowflakes are called ‘scale-invariant’ by physicists because their crystal structure looks the same no matter how much we magnify them. Snowflakes are an example of what the mathematician Benoît Mandelbrot calls fractals – structures with no natural or normal size and which recur at different scales. (Another example is trees: the pattern of branches looks like the pattern of leaves on a branch, and also the pattern of veins in a leaf). Mandelbrot noticed that prices in financial markets have this property: a graph showing the price over time of some stock or market index will look much the same, whether the time period covered is several decades, a few seconds, or anything in between. The same is true of a graph of earthquake activity.

Scale-invariance arises where there are no inbuilt system limits which prevent extreme outcomes or changes. Unlike the height, weight or lifespan of humans and most other animals, the size of earthquakes faces no physical limits. In fractal distributions there is just a fundamental relationship describing how large events or changes are less likely than small ones. As earthquakes double in size, they become about four times less likely.9 But scale-invariance means that this relationship always holds: it does not vary with the scale of the earthquake. Unlike bell-curve phenomena such as human height, the likelihood falls at a constant proportional rate: it does not decline ever faster as we consider more extreme outcomes. This technical detail is crucial. It implies that in fractal distributions extreme events are realistic possibilities, albeit extremely unlikely. They are not the ‘never happen in the history of the universe’ events at the extremes of the bell curve.
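The contrast can be made concrete with a sketch comparing the two kinds of tail, using the ‘double the size, four times rarer’ relationship as an illustrative power law alongside a bell curve; the units are arbitrary.

```python
# Scale invariance versus the bell curve, as a sketch. For a power law in which
# doubling the size makes an event four times rarer (exponent 2), the penalty
# for doubling is the same at every scale. For a bell curve it explodes.
from scipy.stats import norm

def power_law_tail(x, exponent=2.0):
    """P(size > x) for a power law; x in arbitrary units, x >= 1."""
    return x ** (-exponent)

for x in (2, 4, 8, 16):
    pl = power_law_tail(x) / power_law_tail(2 * x)    # power-law doubling penalty
    bell = norm.sf(x) / norm.sf(2 * x)                # bell-curve doubling penalty
    print(f"size {x} -> {2 * x}: power law {pl:.0f}x rarer, bell curve {bell:.3g}x rarer")

# The power-law penalty stays at 4x for ever; the bell-curve penalty grows
# astronomically, which is why bell-curve models treat big events as impossible.
```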

Returning to the world of finance, research suggests that as the movement in a stock-market index doubles in size it becomes approximately eight times rarer. However, this is not especially useful knowledge. It does not enable us to predict the timing of stock-market crashes, any more than we can predict the timing of earthquakes. The relationship is based on past data; we should not assume it will hold in the future. And there is a deeper problem. Although we are no longer using bell-curve thinking, we are still trying to estimate the likelihood of extremely large, extremely rare events using past data. But to estimate the frequency of events which occur extremely rarely we need an extremely large number of observations. In the case of stock markets, we have had only a few crashes over many decades, far too few data points to estimate the probability of future crashes with any confidence. The lessons of applying Mandelbrotian mathematics to stock markets are all negative: you can’t calculate the probabilities of the market moves you most want to know about (the big ones); and don’t treat the absence of a market crash for several decades as evidence that bell-curve thinking works after all. These lessons are valuable, but not valuable enough to enable people to make a career out of explaining them to investors. The delusions of quantitative ‘risk management’ based on bell-curve thinking have proved much more lucrative.

According to the risk managers and their models in the run-up to the financial crisis, various arcane financial products linked to the housing market were safe investments: the models implied that a fall in house prices of more than 20 per cent or so had a probability equivalent to ‘less than once in the history of the universe’. Yet the people selling these products must have known that a 20 per cent fall in house prices was a realistic possibility – unlikely, perhaps, but not a ‘never happen’ probability.10 It is hard to deny the obvious explanation for their wilful blindness to this reality. Greed.

Greed works in subtler ways too. People will pay handsomely if you offer to measure and manage the financial risks they face. Hopefully, they won’t notice that you can achieve this feat only by redefining ‘risk’. This redefinition of risk began with Harry Markowitz, a Chicago student waiting to see his professor to discuss a topic for his PhD, chatting with a stockbroker sharing the waiting room.11 That chance conversation led to Markowitz using bell-curve thinking to publish a paper in 1952 which (twenty years or more later) became the basis of financial orthodoxy on risk. Our everyday understanding of financial risk is clear: the possibility of losing money. Modern financial orthodoxy defines risk as volatility – in practice, the standard deviation of returns. A non-volatile investment is therefore a non-risky or ‘safe’ investment – even though its modest fluctuations may be consistently bigger and more frequent on the downside, so that you steadily lose money.
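A toy illustration of why ‘risk as volatility’ can mislead, using two invented return series: a steadily losing investment can have a smaller standard deviation, and so count as ‘safer’, than a volatile but profitable one.

```python
# Toy illustration with invented numbers: orthodox 'risk' is the standard
# deviation of returns, which can rank a steadily losing investment as 'safer'
# than a volatile but profitable one.
import numpy as np

steady_loser = np.array([-0.01] * 10 + [-0.02] * 2)          # small, always-negative returns
volatile_winner = np.array([0.06, -0.04, 0.07, -0.03] * 3)   # big swings, positive overall

for name, r in (("steady loser", steady_loser), ("volatile winner", volatile_winner)):
    print(f"{name}: 'risk' (standard deviation) = {r.std():.3f}, sum of returns = {r.sum():+.2f}")
# The 'low-risk' asset loses money in every period; the 'high-risk' one gains overall.
```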

Of course, we cannot blame decision theorists and thinkers for the greedy, wishful thinking of bankers, financial economists and others in assuming that uncertainty can always be represented by a bell curve. But Savage’s followers are responsible for the underlying idea that turning pure uncertainty into probability numbers is a kind of alchemy for generating the best decision. It was exactly this idea that Daniel Ellsberg rejected.

CALL IN THE PLUMBERS

As a child, Dan Ellsberg was obviously academically gifted. But he focused on playing the piano – and his mother was even more focused on a musical career for him. On 4th July 1946 fifteen-year-old Dan, his sister and his parents were driving through Iowa cornfields on their way to a party in Denver. They had been driving the previous day, too, but reached their overnight accommodation too late, arriving to find the reservations cancelled and the room taken. So the family slept in the car or outside, on the dunes at Lake Michigan. Dan’s father got almost no sleep and was exhausted at the start of another long day’s drive the following morning. Just after lunch he fell asleep at the wheel. The car crashed into a wall and Dan’s mother and sister were killed instantly.12 After his mother’s death Dan’s musical ambitions soon faded. But music left its mark: Dan Ellsberg was not the narrowly focused military strategist that his Harvard-Marines-Harvard again-RAND CV might have suggested.

Returning to Harvard as a young academic, he wondered about spending most of his fellowship learning all the Beethoven piano sonatas rather than doing economics. Beyond music, Dan was still a performer in everything he did. He was much less of a geek than his Harvard and RAND colleagues. By most accounts, he was an arrogant, egotistical, flirtatious party animal. He knew how to get noticed in his academic work too. Ellsberg worked with Tom Schelling on applying game theory to nuclear strategy. Ellsberg provocatively titled one of his lectures ‘The Political Uses of Madness’ and argued that Hitler had been a successful blackmailer because he was ‘convincingly mad’.13 Ellsberg explained his ideas to Henry Kissinger, then at Harvard; Kissinger later advised Nixon, who boasted of his ‘madman theory’ of waging war in Vietnam.

Even those who found Ellsberg difficult praised his intellect. The most talented people felt privileged to work with him: Schelling regarded Ellsberg as ‘one of the brightest people I ever knew’. But Ellsberg often struggled to finish academic work, to commit it to paper. He was too easily distracted by new interests, academic and otherwise. He moved on from game theory to the infant field of mathematical decision theory, in which Jimmie Savage’s work encapsulated the state of the art. The emerging orthodoxy in decision theory began with some abstract mathematical assumptions (‘axioms’) about rationality. The theory then made deductions about how a rational decision-maker ought to choose in any decision context, given their initial beliefs. The logic can be applied in reverse: given some facts about how someone does choose, their beliefs can be deduced from those facts, assuming that they are ‘rational’ in the sense defined by the theory. So, when choosing between box A and box B in the experiment described by Ellsberg, the fact that someone chooses box B allows us to deduce that the person believes it contains more balls of the colour which qualifies for a prize.

Or not, as Ellsberg argued in his ground-breaking 1961 paper ‘Risk, Ambiguity and the Savage Axioms’, which introduced his experiment to the world.14 Ellsberg argued that when people are invited to choose a ball from a box containing an unknown mixture of red and black balls they do not attribute numerical likelihoods or invent their own personal probabilities for red and black. Their first response is to try to avoid this kind of pure uncertainty altogether – choose the other box, for which the mixture of reds and blacks is known. More generally in life, we try to avoid pure uncertainty: we are reluctant to make decisions if we have no idea of the relative likelihood of the different possible outcomes (unless of course the decision is a trivial one – in which case we don’t care much about the outcome).

However, from the beginning, orthodox decision theory did double duty. It claimed both to prescribe how people ought to choose and to describe how they do in fact choose. The obvious possibility that these may differ was obscured by a circular use of the idea that people are rational: rational people do in fact choose in line with the theory – because the theory defines what counts as rational behaviour. This ruse was helpful in keeping the critics at bay (and many twenty-first-century economics textbooks remain evasive on the issue). Of course, insiders like Ellsberg and Savage were well aware that a theory of how people choose may not match their behaviour in reality; indeed, by 1961 Savage had already accepted that people often make choices which are inconsistent with the theory. The focus of the debate had moved on to whether the theory still provided a compelling account of how rational people ought to choose. So Ellsberg knew that if his experiment merely showed that ordinary people make choices inconsistent with the theory, the response from supporters of the Savage orthodoxy would be ‘So what?’

Therefore, rather than asking ordinary people, Ellsberg asked supporters of the Savage orthodoxy – academics and graduate students working on decision theory – to participate in his experiment. And when, as most did, they chose in a way inconsistent with the use of probabilities, Ellsberg explained their ‘error’ to them and asked if they wanted to change their minds. Most did not: they stuck with their original choices, violating Savage’s theory, even after the opportunity to reflect and reconsider. (According to Ellsberg, Savage himself was one such ‘unrepentant violator’ of his own theory, but no independent evidence has emerged to corroborate Ellsberg’s story.)15 The lesson was clear. It is absurd to insist that a rational decision-maker must invent probabilities in the face of uncertainty, if even supporters of that view do not follow it in practice when given full opportunity to reflect and reconsider.

Ellsberg’s strategy – getting supporters of orthodox decision theory to make choices inconsistent with it – was clever. But perhaps too clever. It gave the defenders of orthodoxy no excuse, no place to hide, no way to save face. Which left them just one option: ignore the Ellsberg Paradox altogether. Ellsberg himself made it easier for them to do so, because his mercurial mind had already moved on to other things. By the time ‘Risk, Ambiguity and the Savage Axioms’ was published, Ellsberg was a consultant to the US Defense Department and the White House. In 1961 he drafted the guidance to the Joint Chiefs of Staff on the operational plans for general nuclear war; the following year, unsurprisingly, saw him preoccupied with the Cuban Missile Crisis. As for promoting his ideas in academic circles, Ellsberg had effectively gone AWOL. He had, in one way or another, become obsessed with the Vietnam War. After working on plans to escalate US involvement in Vietnam, Ellsberg spent two years in Saigon, and then returned to RAND to work on a top-secret review of US decision-making regarding Vietnam. Then it happened.

Ellsberg switched from hawk to whistle-blower. His work at RAND had convinced him that the US Administration had expanded military operations in Vietnam without approval from Congress and had misled the public about its real intentions. Ellsberg spent many hours secretly making photocopies of the 7,000-page review (he took his thirteen-year-old son Robert along to help). In 1971, after failing to persuade senators on the Foreign Relations Committee to make the review public, Ellsberg sent it to nineteen newspapers and, after the publication of what became known as the ‘Pentagon Papers’, turned himself in. He faced a potential 115 years in jail on conspiracy charges. But, unintentionally, the Plumbers saved him.

The Plumbers were a motley crew of ex-CIA operatives who were friends of friends of Nixon and used criminal methods to try to get evidence against his enemies (such as breaking into the office of Ellsberg’s former psychoanalyst). Once it was clear that the case against Ellsberg was based on gross government misconduct and illegal evidence-gathering, the judge dismissed all charges against him. Ellsberg avoided jail, but had attained international notoriety, a 1970s precursor to Julian Assange and Edward Snowden. His new reputation provided another excuse, if decision theorists needed one, not to think about the Ellsberg Paradox for a while longer.

FIVE BLACK SWANS AND NOBEL PRIZES

In the meantime, the emerging orthodoxy combined bell-curve thinking and the efficient-market hypothesis to suggest that uncertainty in financial markets could be tamed or neutered altogether. In academia, this culminated in the award of five Nobel prizes to financial economists: three in 1990 (Harry Markowitz and two others building on his work), and two more in 1997, for Robert Merton and Myron Scholes. In financial markets, the culmination of the idea that uncertainty can be neutered came with the emergence of hedge funds claiming to do just that. Merton and Scholes practised what they preached and were senior managers of the hedge fund Long-Term Capital Management. LTCM made huge profits, but its approach ignored the possibility of black swans. When one came along in the shape of a major default and devaluation by the Russian government, LTCM went bust. That was 1998 – just a year after Merton and Scholes had won the Nobel Prize.

Alas, regarding their blindness to the problems of orthodox thinking about uncertainty, most financial economists and bankers are serial offenders. In recent history the orthodoxy has been challenged by Black Monday in 1987, the collapse of LTCM, the dot.com bubble and the financial crisis beginning in 2007 (to name the most obvious challenges). In each case the main response has been silence, or the argument that, since such events are black swans, no one could be expected to see them coming. There has been little admission that we need new thinking about uncertainty which explicitly incorporates the possibility of black swans and – since there is inherently no hope of predicting them – the importance of taking precautions to help cope with their unexpected arrival.

The biggest obstacle to overthrowing the old orthodoxy is our deep-seated reluctance to admit the limits of our knowledge in the face of uncertainty and our dogged faith in the quantification of uncertainty following from that reluctance. We have already noted the preference for bell-curve thinking over the hard-won yet meagre rewards of Mandelbrotian maths. More generally, the idea that we can reduce uncertainty to a single number, a probability, appeals to our desire for simplicity, security and stability. Once captured in a single number, uncertainty can seemingly be controlled. We can choose the amount of risk we will tolerate.

In parallel with the desire to control our destiny comes a set of beliefs suggesting that we can do so. Many people have a deep-rooted, barely conscious belief that there are stable patterns in history which will continue into the future. This connects to the way we understand most things in terms of narratives and stories. Understanding the future as a narrative evolving from the present is not just a way of literally ‘making sense’ of uncertainty, replacing doubt with explanation. It is cognitively easier too. The novelist E. M. Forster famously contrasted a simple succession of facts – ‘The king died and then the queen died’ – with a plot: ‘The king died, and then the queen died of grief.’ There is more information in this plot, yet it is no harder to remember: it is cognitively more efficient.16

However, there is a catch. Daniel Kahneman and Amos Tversky provided the first clear evidence. The Linda Problem remains one of their most famous experiments:

Linda is thirty-one years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

1. Linda is a bank teller.

2. Linda is a bank teller and active in the feminist movement.17

Most people choose option 2. But option 2 must be less probable than option 1, because option 1 is true both when Linda is active in the feminist movement and when she is not. Our tendency to impose narratives as a way of coping with lack of knowledge plays havoc with using basic laws of probability. Put another way, the combination of our reliance on narrative understandings and our desire to use probabilities to describe uncertainty is potentially disastrous. Still, we must be careful not to overstate the problem. Yes, ordinary folk reach for narratives to help cope with uncertainty, but experts reach for their tools, theories and computer models. Experts may fool themselves with optimistic bell-curve thinking, but at least they don’t fall for an error as basic as the Linda Problem.

Except that they do, as Kahneman and Tversky discovered when they ran similar experiments with doctors and other trained experts. And there is clear evidence that expert decision-makers are reluctant to abandon another narrative – the optimistic orthodoxy about decision-making under uncertainty described in this chapter, the one beginning with von Neumann’s jottings on the back of an envelope, then proceeding via Savage through a series of improvements and applications leading to Nobel prizes and other glory.

Here is celebrated former Chairman of the US Federal Reserve Alan Greenspan in testimony to the US Congress in October 2008 after the financial crisis had hit:

In recent decades, a vast risk-management and pricing system has evolved … A Nobel Prize was awarded … This modern risk-management paradigm held sway for decades. The whole intellectual edifice, however, collapsed in the summer of last year …

So far so good.

… because the data inputted into the risk-management models generally covered only the past two decades, a period of euphoria.18

So, for Greenspan, the validity of bell-curve thinking and the quantification of uncertainty remains unquestioned, and perhaps unquestionable. All that went wrong was that we used only twenty years of data. This fundamentally misunderstands the problem. Black swans never come along often enough to estimate their probability reliably from past data. Whether in the financial markets or elsewhere, the probability of rare events cannot be estimated from past frequencies precisely because they are rare. And, logically, unforeseen events must be unforeseen in advance, so their probability cannot be estimated. Surprises must be surprising.

The rarity of black swans provides another reason why we act as though they don’t exist when attempting to measure uncertainty using probabilities. People who ignore black swans and use bell-curve thinking can seem to manage uncertainty successfully for a long period. And, in the meantime, they are likely to be well rewarded for this apparent success. In the long run, a black swan will appear, but then again, as Keynes put it, ‘in the long run we are all dead.’ Or retired with a healthy pension.

Yet this argument might seem too glib. Why do people – such as bankers who manage risks – get so well rewarded if their reasoning is flawed? We have already mentioned some reasons: the orthodoxy, from Nobel Prize-winners down, has become so dominant that alternative views have barely been heard, and these alternatives don’t offer the comforting illusion of taming uncertainty but instead a blunt description of the limits of our knowledge. There is a less obvious reason too. People are usually paid according to their performance rather than the reasoning behind it. And, in many sectors, such as finance, performance is wholly relative. If your rivals, against whom your performance is judged, are all using the same orthodox methods for ‘risk management’, then you will cover your back by doing so too. As the saying in the tech sector once went: nobody ever got fired for buying an IBM. Making the orthodox choice is safe. If things go wrong, your rivals will have fared no better, and you can all say that you relied on standard scientific theories and models. You save face. Fund managers call it ‘benchmarking against the market’; Keynes called it following the herd.

In contrast, abandoning the orthodoxy means making your own judgements – and being responsible if things go badly or your rivals do better. You are, as the UK’s chief financial regulator put it, ‘in a much more worrying space because you don’t have an intellectual system to refer each of your decisions’.19

Economists and other defenders of the orthodoxy have given decision-makers a licence to avoid making judgements, a licence to abdicate responsibility to the decision theory. Yet this tantalizing power of decision theory to do away with making judgements about the future rests on a fundamental error. Perhaps because it is a philosophical rather than mathematical error, the defenders of the orthodoxy do not seem to have noticed. A theory of decision-making using personal probabilities cannot prescribe how you ought to choose, because when you remove the mathematical wrapper a personal probability is just an expression of how likely you believe something is to happen. And your beliefs cannot determine what you ought to choose, because they can be mistaken. I might choose to continue as a heavy smoker because I believe that smoking does not increase my risk of cancer. But we can’t say that I ought to continue smoking, or that my decision to continue smoking is justified, based on my belief that smoking does not increase the risk of cancer – because that belief is false.20 A theory which prescribes what we should choose, justifying particular choices in the face of uncertainty – which is exactly what the modern orthodoxy claims to do – needs objective standards, facts external to the decision-maker. It cannot be based just on their subjective beliefs.

Frank Ramsey, the pioneer of personal probabilities, knew this. Unlike Savage, Ramsey did not argue that personal probabilities can tell us what someone ought to choose. Ramsey saw personal probabilities as a conceptual device to explain the choices that people do make. (And Ellsberg’s experiment subsequently cast doubt on even that.)

It was Savage and his followers who went further, giving beliefs and opinions, once represented as personal probabilities, the same status as objective facts. This brings us to the hidden arrogance of our modern orthodoxy in thinking about uncertainty. It leads us into an elaborate exercise in self-justification. The maths conceals the underlying arrogance: we justify our choices by reference to our own beliefs. But believing something does not make it true. We ignore the facts – and especially the facts about our lack of knowledge of the future. It is our contemporary Greek tragedy. We act with hubris, excessive self-confidence in defiance of the gods of chance. Hubris always leads to nemesis. And sometimes the stakes are much higher than they are in a global financial crisis.

THE MAKING OF A NUMBER

Although there is fierce debate over the details, there is a strong scientific consensus that if we continue on our current path of carbon emissions into the atmosphere we are heading for a planet on average around 4°C warmer. (This average temperature rise conceals a complex picture of different dimensions of climate change – more extreme temperatures, floods, droughts, desertification, hurricanes, storm surges, and so on.) But how much does it matter?

On 30 October 2006, probably the most influential answer to this question to date was provided by a report for the UK government, the ‘Stern Review on the Economics of Climate Change’. The Stern Review captured headlines worldwide with its claim that doing nothing about climate change would cost between 5 per cent and 20 per cent of global GDP, every year. Economist Nick Stern’s conclusion was based on an elaborate economic model, and the review stimulated an explosion in similar modelling research by other economists, leading to the current consensus: the damage associated with 4°C warming will be around 5 per cent of global GDP per year. Five per cent of global GDP is still a big number, and a convenient figure which politicians, business leaders and others with the power to make a difference can retain in their minds. The difference between 5 per cent and 20 per cent arises from differences in the incredible array of assumptions, simplifications and omissions needed to get to these neat numbers.

It is hard to know where to begin. To get to a higher number like 20 per cent you need to consider risks posed by climate change that are less well understood by science, and so use more guesswork in putting a money value on the damage such risks might do – and a money value is of course what is needed to add up all damages and express them as a percentage of GDP. On the other hand, to get to a lower number like 5 per cent you need more naïve or optimistic assumptions about the extent of climate change, and you need to ignore some of its possible impacts. Here is a far from comprehensive list of potential damages that are ignored in reaching the 5 per cent figure: thawing of the Arctic permafrost, release of methane, air pollution from burning fossil fuels, variations in sea-level rise leading to inundation of small island states and coastal communities, and conflict resulting from large-scale migration to escape the worst-affected areas.

As for the impacts of climate change that are included in reaching a single magic number (whether 5 per cent or 20 per cent), few of them are absolutely certain. So to reach a total value for the damage caused by climate change, the money value of each uncertain impact must be multiplied by the probability of that impact occurring, and the results added up. Unfortunately, for many of the most important impacts there is pure uncertainty. We don’t know their probabilities; we don’t even know with any confidence the range of plausible probabilities. We simply do not know.
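To make explicit the structure of the calculation that these unknown probabilities are supposed to feed, here is a stylized version; the notation is illustrative, not taken from any particular model:

```latex
% Stylized expected-damage calculation behind a single headline figure
% (notation illustrative, not drawn from any particular model):
% D = total expected damage, v_i = money value of impact i, p_i = its probability.
D \;=\; \sum_{i} p_i \, v_i
% The point made in the text: for many of the most important impacts the p_i
% are simply unknown, yet the sum cannot be evaluated without them.
```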

Reliable numbers are equally hard to find when we turn to determining the money value of every impact of climate change. Take perhaps the most important impact of all – the premature, probably grim deaths of millions of people. In 2002 the World Health Organization suggested that around 150,000 deaths annually might be caused by a temperature rise of less than 1°C (due to more heatwaves, malaria and dehydration, among other reasons). Since the harm to human health increases disproportionately as the temperature rises, a 4°C rise would probably cause more than half a million premature deaths per year.21

As we saw in Chapter 6, Tom Schelling introduced to economics a controversial method for putting a money value on (preventing) premature death. This value is derived from how much money people need to be paid to tolerate increased risk. The calculations are usually based on data comparing wages in more-or-less risky but otherwise identical jobs. But whatever the source of the data, how much money people will accept to tolerate increased risk, and how much they will spend to avoid it, depends on their income. Poor people tolerate risky jobs more readily than rich people, because they need the wage. They spend less on avoiding risk because they have less to spend. Thus Schelling’s method gives a lower value to saving a life (preventing a premature death) in a poor society than in a rich one. That is, the value of a ‘statistical life’ is lower in a poor country than in a rich one. This isn’t just an intellectual game. It led directly to the influential Intergovernmental Panel on Climate Change, in its 1995 report, valuing lives lost in rich countries at $1,500,000 but those in poor countries at $100,000.22
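To see how wage-risk data generate a value for a ‘statistical life’, here is a purely illustrative calculation; the figures are invented for the example and are not drawn from Schelling’s work or the IPCC report:

```latex
% Value of a 'statistical life' from wage-risk data (figures invented for illustration):
% suppose workers accept an extra \$300 a year to bear an additional
% 1-in-10,000 annual risk of a fatal accident.
\text{VSL} \;=\; \frac{\Delta\text{wage}}{\Delta\text{risk}}
          \;=\; \frac{\$300}{0.0001} \;=\; \$3{,}000{,}000
% Because the wage premium is smaller where wages are lower, the same method
% mechanically yields a smaller VSL in poorer countries.
```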

Most economists recognize that valuing some lives more highly than others is widely seen as unacceptable. But the alternative approach many of them adopt is no better: their economic models simply ignore potential loss of life from climate change altogether by assuming a fixed global population. Assuming away half a million premature deaths a year is a stunning move, yet there is an even more powerful assumption hidden in the economists’ calculations, the one wielding the biggest influence on the final numbers such as ‘5 per cent of GDP’.

This assumption is the ‘discount rate’ used to convert future costs and benefits of climate change into numbers comparable with current costs and benefits. Discounting means that future impacts are discounted – given a reduced value in the calculations – compared to identical impacts now. And the further into the future some impact will occur, the more it is discounted, with a compounding effect of great power over long periods: in standard economic models with standard discount rates, total global GDP in 200 years is discounted to be equivalent to around $4 billion in current terms (the GDP of Togo, or just 2.5 per cent of the fortune of Jeff Bezos, the Amazon founder and the world’s richest person in 2019). The upshot is that if we are trying to decide how much it is worth spending to prevent the destruction of the earth 200 years from now – on the basis of lost future GDP, which is just what economic models try to do – the answer would be no more than a minor dent in Jeff Bezos’s fortune. At least when applied over long time horizons, discounting seems absurd, because it trivializes catastrophe in this way.
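The arithmetic of compounding is easy to reproduce. Below is a minimal sketch, assuming a purely illustrative 5 per cent annual discount rate and a round figure for today’s global GDP; neither number is taken from the models discussed here, but the mechanics are the same.

```python
# A minimal sketch of exponential discounting, using a purely illustrative
# 5 per cent annual discount rate and a rough, round figure for global GDP.
def present_value(future_value: float, rate: float, years: int) -> float:
    """Discount a sum received (or lost) in the future back to today's money."""
    return future_value / (1 + rate) ** years

global_gdp = 90e12  # roughly $90 trillion, for illustration only

# A loss equal to one year's global GDP suffered 200 years from now...
pv = present_value(global_gdp, rate=0.05, years=200)
print(f"${pv:,.0f}")  # ...comes out at only about $5 billion in today's money.
```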

The history of discounting takes us back to the prodigious Frank Ramsey. He was the first person to pin down two key arguments for discounting in mathematical terms, by means of an elegantly concise formula, the Ramsey Rule, which remains the inspiration for the way in which future impacts of climate change are discounted in economic models today. Ramsey’s first argument is breathtakingly direct: so-called ‘pure discounting’, the basic assertion that lives in the future are less important than lives now. If you think this sounds like straightforward discrimination against future generations, most philosophers, religions and ethical codes agree with you. Ramsey may have described this justification for discounting mathematically, but he argued that it was ethically indefensible (and he would surely have also recognized the absurdity of discounting applied over 200-year periods). Alas, while most climate economists nowadays revere Ramsey’s mathematical models, they ignore his warnings about the immorality of misusing them. It is no exaggeration to say that most economic models of the impact of climate change discriminate indefensibly against future generations through the use of ‘pure discounting’ of future impacts.

Ramsey’s second argument for discounting is subtler. If, on average, people will be much wealthier in the future, then money will matter a bit less to them: an extra dollar is less valuable the richer you are. It follows that climate-change impacts in the future will matter less too, compared with current impacts of the same monetary value, so we should reduce the weight attached to future impacts in our calculations. This is a better argument than justifying discounting through discrimination – but not much. It recklessly assumes that economic growth will continue much as it has done in the past, so that future generations will be meaningfully richer than us. It ignores the likelihood, taken seriously not just by environmentalists but also by the entire insurance industry, that climate change will disrupt economic activity enough to make business-as-usual economic growth rates no more than a distant memory. Ignoring the threat posed by climate change to economic growth is an astonishing omission in an economic appraisal of the impact of climate change. At best it can be explained by the fact that it makes the maths more manageable and links the models more closely to Ramsey’s original analysis, which economists have grown up with in their textbooks. But yet again the ultimate effect of this omission is a misleading reduction in the cost of climate change expressed as a proportion of global GDP. If an economist wanted to rig the economic appraisal in a way hidden from democratic scrutiny by non-economists, technical shenanigans with the discount rate would be a good way to do it.
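For readers who want the algebra, the Ramsey Rule in its standard textbook form combines the two arguments just described; the symbols below follow common usage rather than any source quoted in this chapter.

```latex
% Ramsey Rule, standard textbook form (symbols follow common usage):
r \;=\; \rho \;+\; \eta\, g
% r    : the social discount rate applied to future impacts
% \rho : 'pure discounting' of future wellbeing (Ramsey's first argument)
% \eta : how quickly an extra unit of consumption loses value as people get richer
% g    : the assumed growth rate of consumption per head (Ramsey's second argument)
% Note that g is assumed, not guaranteed: if climate change depresses growth,
% the case for a high discount rate weakens accordingly.
```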

But even if there might be a few villains among mainstream economists here, there are many more who are painfully aware of the limitations of a model that boils down all the scientific, economic, political and social uncertainties and dimensions to a single percentage of GDP. After the publication of the Stern Review, Nick Stern realized that almost all the uncertainties in the models bias them in the same direction: ‘Grafting Gross Underestimation of Risk onto Already Narrow Science Models’ is how he labels the situation in a leading economics journal. And Stern is exasperated by the attention given to the percentage-of-GDP numbers. He points out that the models generating these numbers comprised just 30 out of 692 pages in the Stern Review: the rest was devoted to other ways of thinking about climate change. Nevertheless, Stern and most climate economists remain loyal to these models, arguing that global decisions about climate policy should be based on improved models spewing out bigger numbers for the cost to global GDP caused by climate change.

For anyone hoping for significant action to try to reduce climate change, the economists’ strategy is a worrying one, given their continuing dominance over policy-making. The reaction to the Stern Review shows that once a percentage-of-GDP number is mentioned it will drown out all other ways of talking and thinking about the damage caused by climate change. If you don’t want politicians and the media to focus solely on one easy-to-remember number, don’t give them one. And using a new model to get a bigger number does not help.

Because in the end no one cares much.

That unchecked climate change might cost 5 per cent of global GDP is already a much bigger number than it might seem. Both the First World War and the Second World War made barely a dent in global GDP (overall, the Second World War probably increased global GDP compared to what would have happened otherwise).23 In comparison, a 5 per cent fall is significant. But the real message of this comparison is that the death of tens of millions of people – the Holocaust and human suffering on an unprecedented scale – went largely undetected in measures of GDP. Clearly, we cannot assume that the impact of global catastrophe will be reflected in measures of GDP, whether these catastrophes are wars or climate change.

If the orthodoxy on the cost of 4°C warming changed from 5 per cent to 20 per cent of global GDP, it is far from clear that this would have much effect on the global debate over climate change: perhaps most people have a gut understanding of the limitations of GDP numbers when contemplating huge global changes. Despite the seeming dominance of economic language in politics, ultimately, the numbers don’t matter much.

Still, if we abandon the orthodoxy, we face the same problem as the followers of Keynes who insist, confronted with pure uncertainty, that ‘we simply do not know’. How do we decide instead? If not by an overarching economic reckoning of the costs of climate change set against the costs of trying to mitigate it, all measured in terms of money, then what?

SOME NUMBER IS BETTER THAN NO NUMBER?

Some risks have two characteristics that make them much harder to deal with. First, pure uncertainty. Second, the possibility of catastrophe – not just a bad outcome but one that is different in kind, often featuring irreversible damage or loss, or the complete collapse of the underlying system, organization or framework within which decisions are taken. Arguably, both climate change and global financial crises share these two characteristics. With such risks, traditional thinking in economics – which advocates the precise maximization of benefits-minus-costs and the aggressively efficient use of means in pursuit of ends – is reckless. Instead, the overriding priority must be avoiding catastrophe: we need resilience and security rather than maximization and efficiency. In nature, this priority is often pursued through the opposite of efficiency, namely redundancy – for example, humans having two kidneys rather than one. In politics and law this way of thinking is associated with the ‘precautionary principle’. In broad terms, the principle recommends approaches to practical decision-making which focus on resilience, security and taking precautions.24 The emphasis is on being aware of what we don’t know and the likelihood that the future will be surprising (who could have guessed that CFCs, chemicals discovered to be ideal for cooling machines like fridges, would turn out to cause global warming?). These alternative ways of thinking about risk reject the dominant orthodoxy of putting a price tag and a probability on every possible future impact of every choice. Instead they advocate focusing much more on a careful, detailed, multidimensional appraisal of the possible impacts, and much less on trying to guess their probabilities. We simply do not know these probabilities and, even if we did, that knowledge would be of little use without a good understanding of what the probabilities attach to. Unlike the Governor of the Bank of England, many people in the summer of 1914 correctly forecast that a war was likely. But it hardly mattered, because no one realized that the war would be fought on a hitherto unimaginable scale.

Put another way, we cannot plan to reduce the risk of catastrophe without agreement on what counts as a catastrophe. We are forced to ask, ‘What matters most to us?’ And also in the case of climate change, ‘What will matter most to future generations?’ These questions are clearly beyond the scope of any science of society. We must move beyond money as a crude measure of goodness or badness. And technical economics is of no help in thinking through our obligations to future generations. Just the name of the theory of ‘discounting the future’ is enough to tell us what it does. Baroque discounting calculations are not needed; they serve only to conceal our disregard for the future.

Over the past fifty years or so economists and economic ideas have been central to the trend towards quantification of all risks and values. This trend formed part of the rebranding of economics as a neutral science akin to physics rather than a mode of analysis based on political and ethical assumptions. The Social Science Research Building at the University of Chicago has the following inscription (attributed to the physicist Lord Kelvin but not in fact an actual quotation) carved on its façade: ‘When you cannot measure, your knowledge is meager and unsatisfactory.’ Economists have taken this to mean: yes, the numbers have their flaws, but some number is better than no number. Frank Knight, the most un-Chicago of Chicago economists in his emphasis on pure, unmeasurable uncertainty, was less optimistic. He saw the inscription as licensing economists to conclude ‘Oh well, if you cannot measure, measure anyhow.’25

One simple problem with the ‘some number is better than no number’ mantra is our inability to ignore irrelevant numbers. Kahneman and Tversky’s ‘anchoring effect’ describes how people, including experienced decision-makers, are influenced by irrelevant starting points or anchors, including numbers they know to be irrelevant. For instance, in one experiment German judges who rolled dice before recommending a sentence handed down longer sentences after rolling higher numbers.26

And economic numbers – probabilities and money values – bring their own particular problems, so that, again, some number can be worse than no number. Some things which matter get ignored because they are difficult or impossible to quantify. As Einstein allegedly said: not everything that can be counted counts, and not everything that counts can be counted. We have seen examples throughout this chapter and others (such as RAND ignoring the loss of pilots’ lives because its analysts couldn’t agree how to put a dollar value on them). Worse still, once the number has been produced we often forget what was ignored in the process of producing it – and so we forget that the number may be systematically biased. One example: the GDP estimates of the cost of climate change. A less obvious kind of bias arises when we ignore the risk of bad things happening: avoiding such risks is worthwhile in advance, even if the risks never materialize, yet bad things which didn’t happen get ignored because they are much harder to quantify than good things which did. This is one underlying reason why, in investment banks, risk-taking traders are paid much more than the staff responsible for risk control or for compliance with regulations designed to limit risk-taking. The insistence on quantification supports a bias towards recklessness.

Another danger of obsessive quantification: the process of producing the number distorts or misrepresents the concept we are trying to measure. Ultimately, the original concept can disappear altogether, being redefined in terms of the number. Beginning with Harry Markowitz’s work, economists and financial types have quietly redefined risk as volatility with almost no one noticing – even though this redefinition affects everyday life in countless ways, from pensions to insurance. This is just one example of the transformative power of economic numbers. Introducing them into a decision or debate is not a neutral step, let alone always an improvement yielding greater precision. Instead, economic numbers are all too often anti-democratic, because they obscure the ethical and political issues behind a technical fog. And as we have seen, regarding the future, economic numbers bias decisions towards recklessness and ignoring the interests of future generations.
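What ‘risk as volatility’ amounts to in practice can be shown in a few lines. The sketch below uses invented daily returns and a common annualization convention; it illustrates only the general idea of measuring risk by the standard deviation of returns, not any particular bank’s or textbook’s model.

```python
# A minimal sketch of 'risk as volatility': risk measured as the standard
# deviation of returns. The daily returns below are invented for illustration.
import statistics

daily_returns = [0.002, -0.001, 0.003, -0.004, 0.001, 0.000, -0.002]

# Volatility = standard deviation of returns, annualized here by assuming
# roughly 252 trading days in a year (a common convention).
daily_vol = statistics.stdev(daily_returns)
annual_vol = daily_vol * (252 ** 0.5)

print(f"Annualized volatility: {annual_vol:.1%}")
# Note what this single number ignores: it treats upside and downside swings
# alike, and says nothing about the chance of a catastrophic, irreversible loss.
```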

If there is to be a shift away from trusting in such numbers, back towards trusting in our own judgements, it will be made piecemeal, from the bottom up, by individual decision-makers. There are some small reasons for optimism.

First, the orthodoxy in decision theory has always claimed the scientific high ground, pointing to its sophisticated mathematical underpinnings. But recent developments at the cutting edge of decision theory have begun to provide mathematical respectability for alternative principles (such as the precautionary principle), so very smart people may start to take them seriously. Second, while making judgements involves taking responsibility, at least it gives us something to do. Nowadays, a computer doesn’t just perform calculations. With artificial intelligence it can tweak and refine the mathematical model it is using along the way. As artificial intelligence proliferates and humans increasingly contemplate their redundancy, confronting pure uncertainty and ethical dilemmas with cautious, tentative but uniquely human qualitative judgement begins to look appealing after all.