‘We have involved ourselves in a colossal muddle, having blundered in the control of a delicate machine, the working of which we do not understand.’
– John Maynard Keynes
‘Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius – and a lot of courage – to move in the opposite direction.’
– attributed to E.F. Schumacher
On the morning of 6 July 1988, maintenance workers on Piper Alpha, the largest and oldest oil and gas rig in the North Sea, dismantled a backup pump to check a safety valve. The job dragged on all day and the workers stopped in the early evening, sealing the tube off and filling out a permit noting that the pump was unusable. An engineer left the permit in the control room, but the room was busy and there were constant interruptions. Later in the evening, the primary pump failed and – pressed for time, not knowing about the maintenance, and unable to find any reason why the backup pump should not be used – the rig’s operators started up the half-dismantled pump. Gas leaked out, caught fire and exploded.
The explosion, serious in itself, was compounded by several other failures. Normally, a gas rig such as Piper Alpha would have blast walls to contain explosions, but Piper Alpha had originally been designed to pump oil, which is flammable but rarely explosive. The retrofitted design also placed hazards too close to the rig’s control room, which the explosion immediately disabled. Fire-fighting pumps, which were designed to draw in huge volumes of sea water, did not automatically start, because of a safety measure designed to protect divers from being sucked into the pump inlet. The safety system could have been overridden from the control room, but the control room had been destroyed. This also meant no evacuation could be coordinated, so platform workers retreated to the rig’s accommodation block.
Two nearby rigs continued to pump oil and gas towards the blazing Piper Alpha, their operators watching the inferno but fretting that they lacked authority to make the expensive decision to shut down production. It might have made little difference anyway, given the presence of so much high-pressure gas in the supply lines. When this gas exploded, a fireball half the height of the Eiffel Tower engulfed the platform. The blast even killed two rescuers in a nearby boat, along with rig crewmen whom they had hauled from the water. Other pipelines ruptured in the heat, feeding the fire and driving away another fire-fighting rescue boat. It was impossible to approach the rig, and less than two hours after the initial explosion, the entire accommodation block slid off the melting platform into the sea. One hundred and sixty-seven men died. Many of the fifty-nine survivors had leapt ten storeys into deathly cold waves. The rig burned for three more weeks, wilting like old flowers in a betrayal of mass, steel and engineering.
Industrial safety experts pored over what had gone wrong with Piper Alpha and learned lessons for preventing future tragedies. But fewer lessons seem to have been learned from a related accident: a meltdown in the financial markets which was triggered by Piper Alpha’s destruction. This was the ‘LMX spiral’, and it nearly destroyed the venerable insurance market Lloyd’s.
Insurers often sign contracts in which one insurer agrees to cover another insurer’s extraordinary losses on a particular claim. These ‘reinsurance’ contracts have a sound business logic and a long history. Yet in the Lloyd’s market, where different insurance syndicates traded risk with each other, reinsurers had begun to insure the total losses of other insurers, rather than losses on a single claim. The subtle distinction proved important. The reinsurance contracts pulled losses from one syndicate to a second, then a third – and perhaps then from the third back to the first. Insurance syndicates could and did find that, through a circle of intermediaries, they were their own reinsurers.
The spiral was coiled and ready to unwind when Piper Alpha was destroyed. The insurance syndicates who traded on Lloyd’s were hit with an initial bill for about a billion dollars, one of the largest single claims in history. But then some reinsurance claims were triggered, and others, and then others in a chain reaction. The eventual total of claims resulting from the billion-dollar loss was $16 billion. Some hapless insurance syndicates discovered that they had insured Piper Alpha many times over. Parts of the spiral are still being unwound over two decades later.
If this sounds familiar, it should. Within the first few days of the credit crunch in 2007, long before most people were aware of the scale of the trouble, the economist John Kay was pointing out the similarities between the crunch and the LMX spiral. As in the credit crunch, financial institutions and regulators told themselves that sophisticated new financial tools were diluting risk by spreading it to those best able to cope. As in the credit crunch, historical data suggested that the packaged reinsurance contracts were very safe. And as in the credit crunch, the participants found the true shape of the risk they were taking almost impossible to discern until after things had gone horribly wrong. In both cases, innovative financial techniques proved to be expensive failures.
So far, this book has argued that failure is both necessary and useful. Progress comes from lots of experiments, many of which will fail, and we must be much more tolerant of failure if we are to learn from it. But the financial crisis showed that a tolerant attitude to failure is a dangerous tactic for the banking system. So what happens when we cannot allow ourselves the luxury of making mistakes, because mistakes have catastrophic consequences?
As I studied the LMX spiral, in the hope of discovering something that would prevent future financial crises, I realised that I was missing a hidden, yet vital, parallel. It was the horror of Piper Alpha’s destruction itself, rather than the financial meltdown that followed it, which could tell us more about financial accidents. If we want to learn about dealing with systems that have little room for trial and error, then gas rigs, chemical refineries, and nuclear plants are the place to start.
The connection between banks and nuclear reactors is not clear to most bankers, or to banking regulators. But to the men and women who study industrial accidents such as Three Mile Island, Piper Alpha, Bhopal or the Challenger shuttle – engineers, psychologists and even sociologists – the connection is obvious. James Reason, a psychologist who has spent a lifetime studying human error in aviation, medicine, shipping and industry, uses the downfall of Barings Bank as a favourite case study. Barings was London’s oldest merchant bank when, in 1995, it collapsed after more than two centuries of trading. One of its employees, Nick Leeson, had lost vast sums making unauthorised bets with the bank’s capital. He destroyed the bank single-handedly, assisted only by the gaps in Barings Bank’s supervision of him.
‘I used to speak to bankers about risk and accidents and they thought I was talking about people banging their shins,’ James Reason told me. ‘Then they discovered what a risk is. It came with the name of Nick Leeson.’
Another catastrophe expert who has no doubt about the parallel is Charles Perrow, emeritus professor of sociology at Yale. He is convinced that bankers and banking regulators could and should have been paying attention to ideas in safety engineering and safety psychology. Perrow made his name by publishing a book, Normal Accidents, after Three Mile Island and before Chernobyl. The book explored the dynamics of disasters and argued that in a certain kind of system, accidents were inevitable – or ‘normal’.
For Perrow, the dangerous combination is a system that is both complex and ‘tightly coupled’. The defining characteristic of a tightly coupled process is that once it starts, it’s difficult or impossible to stop: a domino-toppling display is not especially complex, but it is tightly coupled. So is a loaf of bread rising in the oven. Harvard University, on the other hand, is not especially tightly coupled, but is complex. A change in US student visa policy; or a new government scheme to fund research; or the appearance of a fashionable book in economics, or physics, or anthropology; or an internecine academic row – all could have unpredictable consequences for Harvard and trigger a range of unexpected responses, but none will spiral out of control quickly enough to destroy the university altogether.
So far, this book has looked at complex but loosely coupled systems, like Harvard. The sheer complexity of such systems means that failures are part of life, and the art of success is to fail productively.
But what if a system is both complex and tightly coupled? Complexity means there are many different ways for things to go wrong. Tight coupling means the unintended consequences proliferate so quickly that it is impossible to adapt to the failure or to try something different. On Piper Alpha, the initial explosion need not have destroyed the rig, but it took out the control room, making an evacuation difficult, and also making it impossible to override the diver-safety catch that was preventing the seawater pumps from starting automatically. Although the rig’s crew had, in principle, shut down the flow of oil and gas to the platform, so much pipework had been damaged that gas and oil continued to leak out and feed the inferno. Each interaction was unexpected. Many happened within minutes of the initial mistake. There was no time to react.
For men like James Reason and Charles Perrow, such disasters need to be studied not just for their own sakes, but because they offer us vital lessons about the unexpected traps that lie in wait in complex and tightly coupled systems – and the psychological and organisational factors that can help to prevent us from falling into them. Few human inventions are more complex and tightly coupled than the banking system; Charles Perrow says it ‘exceeds the complexity of any nuclear plant I ever studied’. So if the bankers and their regulators did start paying attention to the unglamorous insights of industrial safety experts, what might they learn?
Among the bitter recriminations over the financial crisis of 2008, if there’s consensus about anything it’s that the financial system needs to be made safer. Rules must be introduced, one way or another, to prevent banks from collapsing in future.
It might seem obvious that the way to make a complex system safer is to install some safety measures. James Reason is celebrated in safety-engineering circles for the ‘Swiss cheese model’ of accidents. Imagine a series of safety systems as a stack of Emmental slices. Just as each piece of cheese has holes, each safety device has flaws. But add enough pieces of cheese and you can be fairly sure that the holes will never line up with each other. The natural temptation is thus to layer more and more Emmental onto the financial system – but unfortunately, it’s not quite so straightforward. As safety experts like Reason are only too well aware, every additional safety measure also has the potential to introduce an unexpected new way for something to go wrong.
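For readers who like to see the arithmetic, here is a toy sketch in Python – my own illustration, not James Reason’s model – of why stacking slices of cheese seems so reassuring, and why it can stop paying off once each new slice brings a small new failure mode of its own. The probabilities (a one-in-ten hole in each slice, a one-in-a-hundred new failure mode per slice) are invented purely for illustration.

```python
# A purely illustrative sketch of the Swiss cheese argument. Assumptions
# (mine, not James Reason's): each safety layer independently fails with
# probability 0.1, and each added layer also introduces a small new
# failure mode of its own with probability 0.01.

def naive_risk(layers, hole=0.1):
    """Chance that the holes line up, if the layers are independent."""
    return hole ** layers

def risk_with_side_effects(layers, hole=0.1, new_mode=0.01):
    """The same, plus the chance that some layer's own side effect bites."""
    lined_up = hole ** layers
    side_effect = 1 - (1 - new_mode) ** layers
    return lined_up + side_effect - lined_up * side_effect

for n in range(1, 6):
    print(n, round(naive_risk(n), 5), round(risk_with_side_effects(n), 5))
# The naive risk shrinks with every slice of Emmental; once each slice
# brings its own new failure mode, adding more slices soon stops helping.
```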
Galileo described an early example of this principle in 1638. Masons at the time would store stone columns horizontally, raised above the soil by two piles of stone. The columns often cracked in the middle under their own weight. The ‘solution’ was to reinforce the support with a third pile of stone in the centre. But that didn’t help. The two end supports would often settle a little, and the column, balanced like a see-saw on the central pile, would then snap as the ends sagged.
The Piper Alpha disaster is another example: it began because a maintenance operation collided with rules designed to prevent engineers from working long, tiring shifts, and it was aggravated by the safety device designed to prevent divers being sucked into the seawater pumps. At the Fermi nuclear reactor near Detroit in 1966, a partial meltdown put the lives of 65,000 people at risk. Several weeks after the plant was shut down, the reactor vessel had cooled enough to identify the culprit: a zirconium filter the size of a crushed beer can, which had been dislodged by a surge of coolant in the reactor core and had then blocked the circulation of the coolant. The filter had been installed at the last moment for safety reasons, at the express request of nuclear safety regulators.
The problem in all of these cases is that the safety system introduced what an engineer would call a new ‘failure mode’ – a new way for things to go wrong. And that was precisely the problem in the financial crisis: not that it had no safety systems, but that the safety systems it did have made the problems worse.
Consider the credit default swap, or CDS – a three-letter acronym with a starring role in the crisis. Credit default swaps are a kind of insurance against a loan not being repaid. The first CDS was agreed between JP Morgan and a government-sponsored development bank, the European Bank for Reconstruction and Development, in 1994. JP Morgan paid fees to the EBRD, and in exchange the EBRD agreed to make good any losses in the almost unimaginable event that the oil giant Exxon defaulted on a possible $4.8 billion loan. In a narrow sense, it was a sensible deal: the EBRD had idle cash and was seeking some low-risk income, while JP Morgan had plenty of useful things it could do with its own funds, but banking regulations dictated that it must set aside nearly half a billion dollars just in case there was a problem with the Exxon loan. The CDS deal offloaded the risk to the EBRD, liberating JP Morgan’s cash. It did so with the explicit permission of the regulators, who felt that this was a safe way of managing risk.
There were two ways in which these credit default swaps led to trouble. The first is simply that having insured some of their gambles, the banks felt confident in raising the stakes. Regulators approved; so did the credit-rating agencies responsible for evaluating these risks; so did most bank shareholders. John Lanchester, a chronicler of the crisis, quips, ‘It’s as if people used the invention of seatbelts as an opportunity to take up drunk-driving.’ Quite so – and in fact there is evidence that seatbelts and airbags do indeed encourage drivers to behave more dangerously. Psychologists call this ‘risk compensation’. The entire point of the CDS was to create a margin of safety that would let banks take more risks. As with safety belts and dangerous drivers, innocent bystanders were among the casualties.
The subtler way in which credit default swaps helped cause the crisis was by introducing new and unexpected ways for things to go wrong – just as with Galileo’s columns or the zirconium filter at the Fermi reactor. The CDS contracts increased both the complexity and the tight coupling of the financial system. Institutions that hadn’t previously been connected turned out to be bound together, and new chains of cause and effect emerged that nobody had anticipated.
The bond insurance business is a case in point.* As the banks cranked out complex new mortgage-related bonds, they turned to insurance companies called ‘monolines’, and huge general insurers such as AIG, to provide insurance using credit default swaps. This seemed to make sense for both sides: for the insurers, it was profitable and seemed extremely safe, while investors enjoyed the security of being backed by rock-solid insurance companies.
But as we saw with the LMX spiral, even insurance, the quintessential safety system, can create unexpected risks. The hidden danger came through ‘credit ratings’, which are a measure of a bond’s risk devised by companies called rating agencies. If a bond was insured, it simply inherited the credit rating of the insurer. Insurance companies such as AIG, of course, had very high credit ratings, so even a risky bond could acquire an excellent credit rating if it was insured by AIG.
Unfortunately, this process also works in reverse. If an insurance company has mistakenly insured too many risky bonds, it will find itself flirting with bankruptcy, and so it will lose its high credit rating – precisely what happened to AIG and the monoline insurers. And as its rating is downgraded, so is the rating of every bond it has insured. When large numbers of bonds were downgraded in unison, sensible-seeming regulations forbidding banks to hold too many risky bonds forced those banks to sell at the same moment. It doesn’t take a financial wizard to see that the combination of safety system and safety regulation was a recipe for a price collapse.
The consequence was that a bank could avoid all the major sources of financial trouble – such as the subprime mortgage market – and still be pushed into bankruptcy. The bank would be quietly holding a sensible portfolio of medium-risk bonds, insured by an insurance company. The insurance company itself would get into trouble because it had insured subprime mortgage products, and the bank’s portfolio would have its credit rating downgraded not because the quality of the portfolio had changed, but because its insurer was in trouble. The bank would be legally obliged to sell its assets at the same time as other banks were doing the same. It was like a mountaineer, cautiously scaling a cliff while roped to a reckless team, suddenly finding himself pulled into the abyss by his own safety harness. The insurance companies and their web of credit default swaps acted as the rope.
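The chain of cause and effect is easier to see in miniature. The sketch below, in Python, is my own toy model rather than any real rating agency’s or regulator’s rulebook: a bond trades on the better of its own rating or its insurer’s, and a stylised rule forbids banks from holding anything rated below A.

```python
# A toy model of the downgrade cascade. The rating scale, the 'inherit
# the better rating' rule and the 'nothing below A' rule are all
# invented for illustration.

RATING_ORDER = ["AAA", "AA", "A", "BBB", "BB"]  # later in the list = riskier

def effective_rating(bond_rating, insurer_rating):
    """An insured bond trades on the better of its own or its insurer's rating."""
    return min(bond_rating, insurer_rating, key=RATING_ORDER.index)

def forced_sales(portfolio, insurer_rating, worst_allowed="A"):
    """Which holdings breach the stylised rule once the insurer is rated as given?"""
    limit = RATING_ORDER.index(worst_allowed)
    return [name for name, own_rating in portfolio.items()
            if RATING_ORDER.index(effective_rating(own_rating, insurer_rating)) > limit]

portfolio = {"mortgage bond": "BB", "corporate bond": "BBB", "municipal bond": "A"}

print(forced_sales(portfolio, insurer_rating="AAA"))  # [] - everything looks safe
print(forced_sales(portfolio, insurer_rating="BB"))   # ['mortgage bond', 'corporate bond']
# Nothing in the portfolio has changed; only the insurer has been
# downgraded, yet the bank must now dump two of its three holdings.
```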
Rather than reducing risk, credit default swaps instead contrived to magnify it and make it pop up in an unexpected place. The same was true of other financial safety systems – for instance the infamous collateralised debt obligations, or CDOs, which repackaged financial flows from risky ‘subprime’ mortgages. The aim was to parcel out the risk into well-understood slices, some extremely risky and some extremely safe. The result, instead, was to magnify certain risks almost beyond imagination: if losses on the underlying mortgages came in at twice the expected level, the repackaging process could square that factor once, twice or more, producing losses 4, 16, 256 or even 65,000 times greater than expected. (These numbers are illustrative rather than precise, but the illustration is a fair portrait of the CDOs.) In both cases, the safety systems made investors and banks careless – and more fundamentally, they transformed small problems into catastrophes. Industrial safety experts – if anyone had asked – could have warned that such unexpected consequences are common.
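The arithmetic of that illustration is simple enough to write down. Here it is as a few lines of Python; the factor of two and the repeated squaring are the same deliberately rough assumptions flagged above, not an estimate of any real CDO.

```python
# Toy arithmetic for the repackaging illustration above. Assumption:
# losses arrive at twice the expected level, and each round of
# repackaging squares the error factor instead of averaging it away.

error_factor = 2  # underlying losses are twice what the models expected
for rounds in range(5):
    print(f"{rounds} round(s) of repackaging -> losses {error_factor:,} times expectations")
    error_factor **= 2
# 0 -> 2x, 1 -> 4x, 2 -> 16x, 3 -> 256x, 4 -> 65,536x (the '65,000' in the text)
```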
Better designed safety measures might work differently, of course, but experience from industrial disasters suggests that it’s harder than it looks to develop safety measures that don’t bite back. So if a Rube Goldbergesque accretion of one safety system after another is not the solution either to industrial or financial catastrophes, what is?
The 1979 crisis at Three Mile Island remains the closest the American nuclear industry has come to a major disaster. It started when engineers trying to clear a blocked filter accidentally let a cupful of water leak into the wrong system. The leak – harmless in its own right – triggered an automatic safety device that shut down the main pumps which circulated water through the heat exchanger, steam turbines and cooling towers. The reactor now needed to be cooled in some other way. What followed was a classic example of one of Charles Perrow’s system accidents, with individually recoverable errors snowballing.
Two backup pumps should have started to inject cold water into the reactor vessel, but valves in both pipes had been mistakenly left closed after maintenance. Warning lights should have alerted operators to the closed valves, but they were obscured by a paper repair tag hanging from a switch. As the reactor began to overheat, a relief valve – like the valve on a pressure cooker – automatically popped open. When the pressure fell back to a safe level, the valve should have snapped shut again. But it jammed open, and the reactor began to depressurise to dangerous levels.
If operators had realised the valve was jammed open, they could have shut another valve further down the pipe. But the control panel seemed to show that the valve had closed as normal. In fact, the panel merely showed that a signal had been sent to close the valve as normal, not that the valve had responded. As they struggled to make sense of what was going on, the supervisor figured out that there was a chance that the relief valve might be open. So he asked one of the engineers to check the temperature reading. The engineer reported all was normal – because he had looked at the wrong gauge.
This was a serious error, but understandable in its context. A cacophony of over a hundred alarms provided the backdrop to these confused discussions. The control panels were baffling: they displayed almost 750 lights, each with a letter code, some near the relevant switch and some far away, some above it and some below. Red lights indicated open valves or active equipment; green indicated closed valves or inactive equipment. But since some of the lights were normally green and others normally red, it was impossible even for highly trained operators to scan the winking mass of lights and quickly spot trouble.
At 6.20 in the morning, the arrival of a new shift finally brought fresh eyes and the realisation that superheated coolant had been gushing out of the depressurised reactor for over two hours. The new shift successfully brought the situation under control – not before 32,000 gallons of highly contaminated coolant had escaped, but in time to avert complete meltdown. With better indicators of what was happening, the accident could have been much more swiftly contained.
I asked the head of nuclear installation safety at the International Atomic Energy Agency, Philippe Jamet, what we had learned from Three Mile Island. ‘When you look at the way the accident happened, the people who were operating the plant were absolutely, completely, lost,’ he replied.
Jamet says that since Three Mile Island, much attention has been lavished on the problem of telling the operators what they need to know in a format they can understand. The aim is to ensure that never again will operators have to try to control a misfiring reactor core against the sound of a hundred alarms and in the face of a thousand tiny winking indicator lights.
The lesson is apparent at Hinkley Point B, an ageing power plant overlooking the Bristol Channel in southwest England. The site was once designed to welcome visiting school children, but is now defended against terrorists by a maze of checkpoints and perimeter fencing. At the heart of the site, which I visited on a mizzling unseasonable day in late July, looms a vast grey slab of a building containing a pair of nuclear reactors. A short distance away is a low-rise office that would have looked at home in any suburban business park. At the heart of that office is the simulator: a near perfect replica of Hinkley Point B’s control room. The simulator has a 1970s feel, with large sturdy metal consoles and chunky bakelite switches. Modern flat-screen monitors have been added, just as in the real control room, to provide additional computer-moderated information about the reactor. Behind the scenes, a powerful computer simulates the nuclear reactor itself and can be programmed to behave in any number of inconvenient ways.
‘There have been vast improvements over the years,’ explained Steve Mitchelhill, the simulator instructor who showed me around. ‘Some of it looks cosmetic, but it isn’t. It’s about reducing human factors.’ ‘Human factors’, of course, means mistakes by the nuclear plant’s operators. And Mitchelhill goes out of his way to indicate a deceptively simple innovation introduced in the mid-1990s: coloured overlays designed to help operators understand, in a moment of panic or of inattention, which switches and indicators are related to each other. That humble idea alone would probably have allowed operators to stop the Three Mile Island accident within minutes.
The lesson for financial regulators might seem obscure. Yet the same baffled, exhausted mistakes that characterised Three Mile Island also bedevilled decision making during the financial crisis. There was a Three Mile Island moment in the second week of September 2008. All eyes were focused on Lehman Brothers, which by then was sliding into deep trouble. Among the eyes focusing on Lehman were those of Tim Geithner, then President of the Federal Reserve Bank of New York, which supervised the banks. Geithner had just completed a transatlantic flight when the Chief Executive of the American International Group, AIG, Robert Willumstad, requested a meeting. According to the journalist Andrew Ross Sorkin, Geithner kept Willumstad waiting for half an hour because he was on the phone to Lehman Brothers. And when the two men did meet, Willumstad asked if AIG could have access to the same borrowing facilities at the Federal Reserve that were available to the investment banks.
Willumstad handed Geithner a briefing note confessing that AIG was exposed to $2,700 billion ($2,700,000,000,000) worth of perilous-looking financial contracts – more than a third of which were credit default swaps and similar deals agreed with twelve top financial institutions. The implication was that if AIG collapsed, it would bring the global financial system to its knees. AIG was both a bigger threat than Lehman Brothers, and a far more surprising one. Yet alarm bells cannot have sounded in Geithner’s head as loudly as perhaps they should have. AIG was, after all, an insurance company, regulated by the Treasury rather than Geithner’s New York Fed. For some reason – possibly fatigue, perhaps because he had no time to study Willumstad’s note, or maybe the note had been too indirect – Tim Geithner set the AIG question to one side and turned back to concentrate on the Lehman Brothers problem.
Frantic negotiations to save Lehman went on between government officials and top investment bankers throughout the weekend. It was only on Sunday evening that the penny dropped, when one of those investment bankers received a call from a Treasury official to ask if she could put together a team and start working on similar rescue discussions for AIG instead. The surprising news was greeted with an unsurprising response: ‘Hold on, hold on … You’re calling me on a Sunday night saying that we just spent the entire weekend on Lehman and now we have this? How the fuck did we spend the past forty-eight hours on the wrong thing?’ Just as in Three Mile Island, those in charge of a complex system had apparently been unable to pick out the essential information from a blizzard of financial noise.
‘We always blame the operator – “pilot error”,’ says Charles Perrow, the Yale sociologist. But like a power-plant operator staring at the wrong winking light, Tim Geithner had the wrong focus not because he was a fool, but because he was being supplied with information that was confusing and inadequate. It may be satisfying to castigate the likes of Geithner and the heads of Lehman Brothers and AIG, but safety experts like Perrow know it is far more productive to design better systems than to hope for better people.
Air-traffic control is one celebrated example of how a very reliable system was created despite the inherent difficulty of the task. So could we design the equivalent of an air-traffic control system for financial regulators, showing them when institutions are on a collision course? Regulators currently have little idea about whether there is another AIG out there, and no systematic method for finding out. They need more information – and more important, they need information in a format that’s as easy to understand as moving dots on a radar screen.
Andrew Haldane, director for financial stability at the Bank of England, looks forward to the day when regulators will have a ‘heat map’ of stresses in the financial system, harnessing the technologies now used to check the health of an electricity grid. With the right data and the right software to interpret it, regulators could look at a financial network map, highlighting critical connections, overstressed nodes, and unexpected interactions. Rather than poring over disconnected spreadsheets or puzzling PowerPoint slides, they would be looking at clear, intuitive presentations of risks emerging in the system. Ideally the map would be updatable daily, hourly – perhaps even in real time.
‘We’re a million miles away from that at the moment,’ Haldane readily admits. The Dodd–Frank reform act, signed by President Obama in July 2010, establishes a new Office of Financial Research which seems likely to try to draw up such a map. The technology should, in principle, reveal which companies are systemically important – ‘too big to fail’ – and how systemic importance is changing over time. (The new ‘Basel III’ regulations discuss what rules should apply to systemically important institutions, but at present the definition of systemic importance is no clearer than the definition of art, literature or pornography.) A future Tim Geithner should never again be surprised to discover the unexpected importance of an institution such as AIG.
For all the attractions of a systemic heat map, it is unlikely to solve the problem by itself, any more than Donald Rumsfeld’s ‘information dominance’ solved the problem of waging war. Keeping the financial system safe will require proper systemic information for regulators, but it will also require much more. As on a battlefield, what goes on at the front line of finance can be impossible for any computer to summarise.
One Saturday evening in September 2008, while Tim Geithner and a slew of top investment bankers in New York were busily spending forty-eight hours on the wrong thing, Tony Lomas was enjoying a meal at a Chinese restaurant with his family when his phone rang. At the other end of the line was the senior lawyer for the British operations of Lehman Brothers. The lawyer asked Lomas to come along the next day to the firm’s offices at Canary Wharf in London with a small team of insolvency experts. Lomas already knew that Lehman Brothers was in trouble. The shares had lost more than three quarters of their value in the past week. Some kind of rescue deal was being brokered in New York, but Lehman’s European directors wanted a Plan B – wisely, as Lehman Brothers fell apart shortly after the New York deal evaporated, leaving each national subsidiary to fend for itself. Plan B meant sending for the boss of the biggest insolvency practice in the UK. And that man was Tony Lomas.
The speed of Lehman’s collapse took even Lomas and his seasoned colleagues at PwC by surprise. Insolvency is typically a less sudden process – potential administrators tend to be lined up, just in case, weeks before a company declares that it is bankrupt. Yet suddenness is in the nature of a financial-services bankruptcy. Nobody wants to do business with a bank that seems like a credit risk, so there is no such thing as an investment bank that slowly slides towards bankruptcy. It happens fast, or it does not happen at all. The effect of such a sudden end to Lehman’s was chaos, most immediately for the personal lives of the accountants. One PwC partner said goodbye to his family at Sunday lunchtime and didn’t leave Canary Wharf for a week. His car ticked up an enormous bill in the short-stay car park – just one modest contribution to the cost of the administration process. PwC earned £120 million in the first year of working on the European arm of the Lehman bankruptcy, while the first year’s fees paid to administrators in the US and Europe totalled about half a billion dollars.
Lomas quickly took over the 31st floor of the Lehman offices in Canary Wharf, previously the executive dining suite; ostentatiously expensive works of art found themselves sharing wall space with hand-scrawled signs of guidance for the mushrooming team of PwC number-crunchers. The situation was an instant crisis. On Sunday afternoon, the administrators learned that the New York office had swept up all of the cash in Lehman’s European accounts on Friday evening – standard practice every day, but on this occasion there was little chance that the money would come back. That would make it impossible, and illegal, to trade on Monday morning. And Lehman had countless unresolved transactions open with many thousands of companies. On Monday morning – after a 5 a.m. board meeting – a judge signed over control of Lehman Europe to the PwC team, making the bankruptcy official. This happened at 7.56 a.m.; the ink wasn’t even dry by the time the London markets opened four minutes later.
The PwC team scrambled to figure out how Lehman’s operations worked. They were shown a baffling diagram of the bank’s Byzantine but tax-avoiding legal structure, with hundreds of subsidiary legal entities, only to be told that what looked like the Gordian knot was in fact just the simplified summary. It wasn’t that the team lacked experience: they’d overseen the restructuring of the European arm of Enron, the disgraced energy trading company famous for its financial wizardry. But Enron’s contracts were nowhere near as complex. Lomas was forced to assign staff to ‘mark’ senior Lehman officials, following them around all day in a desperate attempt to figure out what they actually did.
The scale of the chaos was mind-boggling. As a broker, Lehman Europe held over $40 billion in cash, shares and other assets on behalf of its clients. That money was frozen, so some clients found they were, as a result, at risk of bankruptcy themselves. Lehman was responsible for fully one in eight trades on the London Stock Exchange, but the last three days’ worth of trades had not been fully settled. Remarkably, this was typical. These unsettled trades were swinging in the winds of an unprecedentedly volatile market. Lehman had also hedged many of the risks it faced, using derivatives deals to protect it from volatility. But as the cancellation emails started to arrive on Monday, it became apparent that the bankruptcy made some of these deals void. When Lehman Brothers failed, it had one million derivatives contracts open.
It was only Lehman’s traders who understood how to untangle these deals, so only if some of them could be persuaded to stay on temporarily could the open positions be closed without the loss of still vaster sums of money. Infuriatingly for Lehman’s creditors – the cleaners, the cooks, the providers of telephone service and electricity – Lomas had to conjure up a $100 million loan to hand the traders generous bonuses. Even then, they couldn’t do it alone: any trader from another firm who became aware that Lehman was on the other end of the phone trying to offload an asset would be able to exploit the knowledge that the sale was forced. So Tony Lomas recruited teams at other banks, operating under hush-hush conditions, to do the job instead. To make matters worse, as it was itself a rather large bank Lehman didn’t have its own bank account. It couldn’t open one with another bank because they were all Lehman creditors and so would be legally able to grab any money Lehman deposited. Lomas had to enlist the help of the Bank of England, opening dozens of accounts in different currencies directly with the Old Lady of Threadneedle Street.
And that was just the immediate firefighting. Tidying up the charred remains would take a long, long time. It was over a year after Lehman Brothers collapsed before a British court started to hear testimony from Lehman’s clients, the financial regulator and PwC about what might be the correct way to treat a particular multi-billion dollar pool of money that Lehman held on behalf of clients. Who should get paid, how much and when? As PwC’s lawyer explained to the court, there were no fewer than four schools of thought as to the correct legal approach. The court case took weeks. Another series of court rulings governed whether Tony Lomas was able to execute a plan to speed up the bankruptcy process by dividing Lehman creditors into three broad classes and treating them accordingly, rather than as individuals. The courts refused.
It slowly emerged that the bank had systematically hidden the extent of its financial distress using a legal accounting trick called Repo 105, which made both Lehman’s tower of debt and its pile of risky assets look smaller and thus safer than they really were. Whether Repo 105 was legitimate in this context is the subject of legal action: in December 2010, New York State prosecutors sued Lehman’s auditors, Ernst & Young, accusing them of helping Lehman in a ‘massive accounting fraud’. But even if that case remains unproven, it is quite possible that Lehman’s financial indicators were technically accurate despite being highly misleading, like the indicator light at Three Mile Island which showed only that the valve had been told to close, not that it actually had.
Interviewed by the Financial Times on the first anniversary of Lehman Brothers’ collapse, Tony Lomas was hopeful of having resolved the big issues some time in 2011, about three years after the bankruptcy process began.
Lomas explained what would have made a difference: ‘If we had walked in here on that Sunday, and there had been a manual there that said, “Contingency plan: If this company ever needs to seek protection in court, this is what will happen” – wouldn’t that have been easier? At Enron, we had two weeks to write that plan. That wasn’t long enough, but it did give us an opportunity to hit the ground running. Here, we had no time to do that.’
Tony Lomas found an operation of bewildering complexity, and he was dealing only with the European office of Lehman Brothers – just a subsidiary of the entire bank, itself just a component of the global financial machine. But as we have seen, complexity is a problem only in tightly coupled systems. The reason we should care about how long it took to untangle Lehman Brothers is not because bankers and bank shareholders deserve any special protection – it is that tens of billions of dollars of other companies’ money were entombed with the dead bank for all that time. If that problem could be solved, the next Lehman Brothers could be allowed to fail – safely. That means turning a tightly coupled system into one where the interconnections are looser and more flexible.
The rather quirky sport of domino toppling is perhaps the ultimate example of a tightly coupled system. You’ve seen domino stunts as the last item on the evening news: record attempts in which someone has painstakingly lined up thousands upon thousands of dominoes, ready to topple them all with a single gentle push. Dominoes, unlike banks, are supposed to fall over – but not too soon. One of the first domino-toppling record attempts – 8,000 dominoes – was ruined when a pen dropped out of the pocket of the television cameraman who had come to film the happy occasion. Other record attempts have been disrupted by moths and grasshoppers.
It might be possible to topple dominoes in a strictly controlled environment, free of insects and television crews. This would reduce the complexity of the domino system, meaning that being tightly coupled wouldn’t be so much of a problem. But it is clearly far more practical to loosen the coupling of the system instead. Professional domino topplers now use safety gates, removed at the last moment, to ensure that when accidents happen they are contained. In 2005, a hundred volunteers had spent two months setting up 4,155,476 stones in a Dutch exhibition hall when a sparrow flew in and knocked one over. Because of safety gates, only 23,000 dominoes fell. It could have been much worse. (Though not for the hapless sparrow, which a domino enthusiast shot with an air rifle – incurring the wrath of animal rights protesters, who tried to break into the exhibition centre and finish the job the poor bird had started.)
The financial system will never eliminate its sparrows (perhaps black swans would be a more appropriate bird), so it needs the equivalent of those safety gates. If the system’s coupling could be loosened – so that one bank could run into distress without dragging down others – then the financial system could be made safer even if errors were as common as ever.
Banks can act like dominoes – toppling many other firms when they fall over – in two ways. Most obviously, they can go infectiously bankrupt, meaning that they can collapse while holding their customers’ money. The nightmare scenario is that depositors from ordinary consumers to large companies find their cheques bouncing, not because they have run out of money but because the bank has.
Then there are zombie banks. They avoid going bankrupt, but only by stumbling around in a corporate half-life, terrorising other businesses. Here’s what happens. All banks have assets (a mortgage is an asset because the homeowner owes money to the bank) and liabilities (a savings account is a liability because the bank has to give the saver her money back if she asks for it). If the assets are smaller than the liabilities, the bank is legally bankrupt. Banks have a buffer against bankruptcy, called ‘capital’. This is money that the bank holds on behalf of its shareholders, who are at the back of any queue for repayment if the bank gets into trouble.
If the assets are barely larger than the liabilities, the bank is on the brink of bankruptcy – and to avoid that fate, it is likely to resort to the undeath of zombiehood. We’d ideally want the bank to avoid bankruptcy by seeking fresh capital from shareholders, inflating the capital cushion and letting the bank continue doing business with confidence. Yet most shareholders would be unwilling to inject capital, because much of the benefit would be enjoyed by the bank’s creditors instead. Remember: the creditors get paid first, then the shareholders. If the bank is near bankruptcy, the capital injection’s biggest effect is to ensure that creditors are paid in full; shareholders benefit only if there’s money left over.
So zombie banks do something else. Instead of inflating their capital cushion, they try to shrink in size so that a smaller cushion is big enough. They call in loans and use the proceeds to pay off their own creditors, and become reluctant to lend cash to any new businesses or homebuyers. This process sucks cash out of the economy.
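To make the zombie’s logic concrete, here is a stylised balance sheet in a few lines of Python. The figures are invented; the point is only that calling in loans raises the capital ratio without adding a penny of new capital.

```python
# A stylised bank balance sheet; all numbers are invented for illustration.

class Bank:
    def __init__(self, assets, liabilities):
        self.assets = assets            # loans and bonds the bank holds
        self.liabilities = liabilities  # deposits and debts the bank owes

    @property
    def capital(self):
        return self.assets - self.liabilities

    @property
    def capital_ratio(self):
        return self.capital / self.assets

    def call_in_loans(self, amount):
        """Shrink: collect loans and use the cash to repay the bank's own creditors."""
        self.assets -= amount
        self.liabilities -= amount

zombie = Bank(assets=100.0, liabilities=97.0)
print(zombie.capital, round(zombie.capital_ratio, 3))  # 3.0 of capital, a 3% cushion

zombie.call_in_loans(40.0)
print(zombie.capital, round(zombie.capital_ratio, 3))  # still 3.0, now a 5% cushion
# The cushion hasn't grown; the lending the economy relied on has simply shrunk.
```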
Both zombie banks and infectiously bankrupt banks can topple many dominoes. No wonder governments responded to the financial crisis by guaranteeing bank debts and forcibly injecting big chunks of capital into banks. This prevented the crash from having more serious effects on the economy, but it had a cost – not only the vast expenditure (and even bigger risks) that taxpayers were forced to take, but also the dangerously reassuring message to bank creditors: ‘Lend as much as you like to whomever you like, because the taxpayer will always make sure you get paid.’ Instead of a capital cushion, it was the taxpayer who was pushed into the middle of the crash to soften the impact on the financial system. Decoupling the financial system means setting up the financial equivalent of those safety gates, so that when a bank such as Lehman Brothers gets into distress in future, it can be allowed to topple.
The first and most obvious way to insert a safety gate between banks and the dominoes they could topple is to make sure banks hold more capital. This not only reduces the chance that an individual bank will fail, but also the chance that any failure will spread. Banks will not voluntarily carry thick cushions of capital, so regulators have to force them, and there is a cost to this. Capital is expensive, so higher capital requirements are likely to make loans and insurance more costly. It is possible to have too much of a good thing, even capital. But the credit crunch made it clear that the banks were carrying too little.
The second possible safety gate involves the curiously named ‘CoCo’ bonds – short for contingent convertible bonds. CoCos are debt, so under normal circumstances CoCo holders are paid interest and take priority over shareholders just as ordinary bank creditors do. But a CoCo is a bit like an airbag: if the bank crashes, it suddenly turns into a cushion, converting from bond to capital. Effectively, given certain triggers, the creditors who held CoCos find that, instead, they are now holding newly minted bank shares. This means they take the same risks as other shareholders.
Nobody is going to rejoice about this. Existing shareholders find they own a smaller slice of the firm – and of its profits. CoCo holders find they’re taking more risk than they wanted to. But the point about CoCo bonds is that they’re a pre-agreed piece of contingency planning: if the bank is on the verge of turning into a zombie, the CoCo clause is triggered. Ordinary bondholders are safer because they get priority over CoCo bondholders; ordinary shareholders enjoy a higher return than they would if the bank had been required to hold ordinary capital rather than contingent capital. And in normal times the CoCo holders, because they are acting as insurers, will be paid a higher return than other bondholders.
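A sketch of the mechanism, under invented numbers and an invented trigger (convert when the capital cushion falls below five per cent of assets), may make the airbag analogy clearer. This is my own simplification, not a description of any actual CoCo contract.

```python
# A minimal sketch of a CoCo trigger; the numbers and the 5% trigger
# rule are assumptions for illustration only.

def maybe_convert_cocos(assets, liabilities, coco_debt, trigger=0.05):
    """Return (liabilities, capital) after checking the conversion trigger."""
    capital = assets - liabilities
    if capital / assets < trigger:
        # The CoCo holders' claim stops being debt and becomes equity:
        # liabilities fall, capital rises, and no new money changes hands.
        liabilities -= coco_debt
        capital += coco_debt
    return liabilities, capital

# 100 of assets, 90 of ordinary debt plus 6 of CoCos: capital is 4, a 4% cushion.
print(maybe_convert_cocos(assets=100.0, liabilities=96.0, coco_debt=6.0))
# The 4% cushion is below the trigger, so the CoCos convert:
# liabilities drop to 90 and the capital cushion jumps to 10.
```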
It all sounds great. But remember that airbags can cause injuries as well as prevent them. CoCo bonds – like other insurance-style schemes – can move risk around the financial system, and we’ve seen what that can lead to. In Japan in the 1990s, CoCos acquired the charming name of ‘death spiral bonds’, which many people will find less than reassuring. One bank’s distress would trigger the CoCo clause, and other banks holding bonds that had suddenly been converted into equity were forced to sell them at a loss, possibly facing distress themselves. The answer is to ban banks from holding each other’s CoCo bonds: instead, those bonds should be held by private individuals or by pension funds, which are more robust in the face of short-term problems.
The third way to loosen the system is to have a much better way of handling bankruptcy if a bank does fail. Recall Tony Lomas’s lament that Lehman Brothers had no contingency plan for bankruptcy. Regulators could and should insist that major financial companies prepare such contingency plans and file them every quarter for inspection. The plans should include estimates of the time it would take to dismantle the company – information that the regulator would take into account when setting minimum capital requirements. If an investment bank’s operations are hellishly complex – often to avoid tax – and bankruptcy would take years, fine: let the capital cushion be luxuriously plump. A simpler operation with clearly defined contingency plans would cause less disruption if it went bankrupt and can be allowed a slimmer cushion. Since capital is expensive, this would encourage banks to simplify their operations and perhaps even to spin off subsidiaries. Currently, the playing field is tilted the other way, in favour of sprawling megabanks – complexity often brings tax advantages, while larger banks seem to be better credit risks.
It’s also absurd that a year after Lehman Brothers went bankrupt, the courts were exploring four different possible legal treatments of money in Lehman accounts. Regulators should have the authority to rule on ambiguities quickly. Of course, fairness is important when billions of dollars are at stake – but when a bank goes bankrupt, the worst possible decision is indecision. The physical economy can be paralysed by the tangle of claims against the banks – like some modern-day equivalent of Jarndyce and Jarndyce, the inheritance dispute in Charles Dickens’s Bleak House, which dragged on for so long that legal fees consumed the entire estate and none of the relatives got a penny.
Regulators also need the authority to take over banks or other financial institutions and quickly restructure them. As Tony Lomas discovered, international banks splinter into national banks as they die, so this kind of authority would need international agreement. But technically, it is simpler than it might seem.
One simple way to restructure even a complex bank has been invented by two game theorists, Jeremy Bulow and Paul Klemperer,* and endorsed by Willem Buiter, who subsequently became chief economist of perhaps the world’s most complex bank, Citigroup. It’s such an elegant approach that at first it seems like a logical sleight of hand: Bulow and Klemperer propose that regulators could forcibly split a struggling bank into a good ‘bridge’ bank and a bad ‘rump’ bank. The bridge bank gets all the assets, and only the most sacred liabilities – such as the deposits ordinary people have left in savings accounts, or in the case of an investment bank, the cash deposited by other businesses. The rump bank gets no assets, just the rest of the debts. At a stroke, the bridge bank is fully-functioning, has a good capital cushion and can keep lending, borrowing and trading. The rump bank is, of course, a basket case.
Haven’t the rump bank’s creditors been robbed? Not so fast. Here comes the conjuring trick: the rump bank owns the bridge bank. So when the rump bank goes bust, and its creditors see what they can salvage, part of what they can salvage will include shares in the still-functioning bridge bank. That ought to leave them better off than trying to salvage only from the wreckage of the original bank. And meanwhile the bridge bank continues to support the smooth running of the economy, too.
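Reduced to toy numbers, the split looks like this. The figures and the division into ‘sacred’ and other liabilities are my own illustration of the Bulow–Klemperer idea, not their worked example.

```python
# A toy version of the bridge/rump split; all figures are invented.

def split_bank(assets, sacred_liabilities, other_liabilities):
    """Split a troubled bank into a functioning bridge bank and a rump."""
    bridge = {
        "assets": assets,                   # every asset moves across
        "liabilities": sacred_liabilities,  # only deposits and client cash
    }
    bridge["capital"] = bridge["assets"] - bridge["liabilities"]

    rump = {
        "assets": "100% of the shares in the bridge bank",  # its only asset
        "liabilities": other_liabilities,   # every other creditor queues here
    }
    return bridge, rump

bridge, rump = split_bank(assets=100.0, sacred_liabilities=70.0, other_liabilities=28.0)
print(bridge)  # a solvent bank with a 30-unit cushion, free to keep trading
print(rump)    # a shell whose creditors recover whatever the bridge shares are worth
```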
If you are blinking at the idea that one can produce a healthy bridge bank like a rabbit from a troubled-bank top hat, without injecting new funds and without resorting to expropriation, you should be. But it seems to be true.
An even more radical – and probably safer – idea comes from the economist John Kay and is known as ‘narrow banking’. Kay suggests splitting the ‘casino’ and the ‘utility’ functions of modern banking. Utility banking is what ensures that ATMs give out cash, credit cards work, and ordinary people can deposit money in bank accounts without fearing for their savings. Casino banking incorporates the more speculative side of banking – financing corporate buyouts, investing in mortgage-backed bonds, or using credit derivatives in the hope of making money. A narrow bank is one that supplies all the utility functions of the banking system without dabbling in the casino side, and the idea of narrow banking is to make sure that banks that provide utilities cannot also play in the casino.
The truth is, naturally, messier. It is not quite fair to liken all risky banking to playing in a casino. As we saw in chapter 3, new ideas need rather speculative sources of funding, and many good ideas fail. There is always something of a gamble about the process of moving money to where it may achieve astonishing things, so without the presence of ‘casino’ activities such as venture capital, the world would be a poorer and less innovative place than it is. Nor is it quite so easy to differentiate between utilities and casinos: some casino-style activities are in fact simply sensible, even conservative pieces of risk hedging. If I bet that my neighbour’s house will burn down, that should raise some eyebrows, but if I bet that my own house will burn down, that’s insurance – it’s not only sensible but compulsory in many countries. Similarly, whether a bank’s particular financial transaction counts as a gamble or a piece of sensible risk management very much depends on what else the bank may be doing.
Nevertheless, the idea of narrow banking may be workable. Kay suggests that narrow banks would require a licence, and to get that licence they would have to satisfy regulators that their deposits were backed solidly with plenty of capital, and their ‘casino’ activities strictly limited to supporting the utility side, rather than designed to make money in their own right. Narrow banks would be the only institutions legally allowed to call themselves ‘banks’, the only ones allowed to take deposits from small businesses and consumers, the only ones allowed to use the basic inter-bank payments systems which transfer money from one bank account to another and which underpin the ATM network, and the only ones qualifying for deposit protection provided by the taxpayer.
This might sound like excessive regulatory meddling, but John Kay points out that in some ways it is less meddlesome. Rather than supervising the entire financial system in a vague and – we now know – inadequate way, dedicated regulators would focus on the simpler task of working out whether a particular bank deserved a narrow banking licence or not. Other financial firms could take the usual risks with their shareholders’ money. They could even own narrow banks: if the parent casino bank got into trouble, the narrow bank could be lifted wholesale out of difficulty and placed somewhere safer, without disruption to depositors or cost to the taxpayers – in the same way that if an electricity company went bankrupt, its power stations would keep running under new ownership.
All this harks back to Peter Palchinsky’s second principle: make failures survivable. Normally, carrying out lots of small experiments – variation and selection – means that survivability is all part of the package. But in tightly coupled systems, failure in one experiment can endanger all. That is the importance of successfully decoupling.
‘We cannot contemplate keeping aircraft circling over London while the liquidator of Heathrow Airport Ltd finds the way to his office,’ says John Kay. That is pretty much what happened to the dealings of Lehman Brothers while Tony Lomas’s team tried to resolve the mess, and Kay is right to seek a more sensible resolution system in future. His approach is in sharp contrast to the prevailing regulatory philosophy, which unwittingly encouraged banks to become larger and more complicated, and actively encouraged off-balance-sheet financial juggling. I do not know for sure whether Kay has the right answer, but normal accident theory suggests he is certainly asking the right question.
James Reason, the scholar of catastrophe who uses Nick Leeson and Barings Bank as a case study to help engineers prevent accidents, is careful to distinguish between three different types of error. The most straightforward are slips, when through clumsiness or lack of attention you do something you simply didn’t mean to do. In 2005, a young Japanese trader tried to sell one share at a price of ¥600,000 and instead sold 600,000 shares at the bargain price of ¥1. Traders call these slips ‘fat finger errors’ and this one cost £200 million.
Then there are violations, which involve someone deliberately doing the wrong thing. Bewildering accounting tricks like those employed at Enron, or the cruder fraud of Bernard Madoff, are violations, and the incentives for them are much greater in finance than in industry.
Most insidious are mistakes. Mistakes are things you do on purpose, but with unintended consequences, because your mental model of the world is wrong. When the supervisors at Piper Alpha switched on a dismantled pump, they made a mistake in this sense. Switching on the pump was what they intended, and they followed all the correct procedures. The problem was that their assumption about the pump, which was that it was all in one piece, was mistaken. The mathematical assumptions behind CDOs were also a mistake – the whiz-kids who designed them were wrong about the underlying distribution of risks, and the CDO structure dramatically magnified that mistake.
In the aftermath of disaster, we typically devote lots of attention to distinguishing violations from mistakes. Violations mean people should be fined, or sacked, or sent to jail. Mistakes are far less of an outrage. But what mistakes and violations have in common is at least as important as what separates them: they are generally much harder to spot than slips are, and so they lead to more of what Professor Reason calls ‘latent errors’.
Latent errors lurk unnoticed until the worst possible moment – like maintenance workers accidentally leaving valves closed on backup cooler pumps, and paper repair tags obscuring the view of warning lights. By their nature, such safety devices are used only in emergencies – and the more safety systems there are, the less likely latent errors are to be noticed until the very instant we can least afford them. Very often latent errors are tiny, almost impossible to pick up without being right at the business coal face. In James Reason’s Swiss cheese metaphor, the holes in one slice after another begin to line up, and stay lined up, without anyone noticing that the risk of disaster is rising.
The financial system is particularly vulnerable to latent error, partly because of its inherent complexity, and also because the incentive for violations is so much stronger in finance. Airline pilots, surgeons and nuclear plant operators are human – they will make mistakes, and they may sometimes cut corners. But we can usually hope that they will try in good faith to avoid accidents. We can have no such hope in finance, where the systemic consequences of bending the rules can pop up far away from the perpetrators and long after the profits have been banked.
Yet even in finance, latent errors can be spotted and fixed before any damage is done. The question is how. The assumption underpinning financial regulation is that if a bank is creating latent errors – whether through deliberate violations or innocent mistakes – then the people who will spot the risks are auditors and financial regulators. It is, after all, their job to do so. But do they? That is the question three economists tried to answer with an exhaustive study of corporate fraud. Not all potential problems involve fraud, of course, but the ability to uncover fraud is a good indicator of the ability to spot other latent errors. Alexander Dyck, Adair Morse and Luigi Zingales examined 216 serious allegations of fraud in US companies between 1996 and 2004. The sample excludes frivolous cases and includes all the famous scandals such as WorldCom and Enron.
What Dyck, Morse and Zingales found completely undermines the conventional wisdom. Of the frauds that came to light, auditors and financial regulators uncovered only one in six. So who did spot corporate fraud? In some larger cases it was journalists. But non-financial regulators such as the Federal Aviation Administration spotted twice as many frauds as did the Securities and Exchange Commission. Evidently the contacts a non-financial regulator has with the everyday operations of a business are more likely to reveal wrongdoing than the auditors’ reviews of accounts.
That suggests that the best-placed people of all to spot fraud – or indeed any kind of hidden danger in an organisation – are employees, who are at the front line of the organisation and know most about its problems. Sure enough, Dyck, Morse and Zingales found that employees did indeed lift the lid on more frauds than anyone else.
Yet it is a brave employee who does this. Frauds and other latent errors are often uncovered only when the situation is desperate, because the whistleblowers who speak out often suffer for their actions.
When Paul Moore interviewed 140 front-line staff at Britain’s largest mortgage lender, HBOS, he says ‘it was like taking the lid off a pressure cooker – fpow! – it was amazing.’ Moore was the head of group regulatory risk at HBOS between 2002 and 2005, and his job was to make sure that the banking group didn’t take too many gambles. He found that staff at HBOS’s major subsidiary, Halifax, were worried that they faced pressure to sell mortgages and hit targets, no matter what the risks were. One person complained to Moore that a manager had introduced a ‘cash and cabbages’ scheme, where staff would be given cash bonuses for hitting weekly sales targets, but publicly handed a cabbage if they failed. Another said, ‘We’ll never hit our sales targets and sell ethically.’ The risk, of course, was the same that brought down the subprime mortgage market: that given the pressure to hit their targets, HBOS staff would lend money to people who couldn’t afford to repay it. Moore mustered his evidence and presented a hard-hitting summary to the board of HBOS.
He says that he was thanked by the Chairman of HBOS and by the head of the HBOS audit committee for bringing to light such serious problems. Soon afterwards he was called in to meet Sir James Crosby, then the Chief Executive of HBOS. As Moore describes it, his concerns about the risks HBOS was running were dismissed ‘like swatting a fly’ and he was sacked. Moore walked out on to the street in front of the HBOS offices and burst into tears. Crosby’s account is different: he says that Paul Moore’s concerns were fully investigated and were without merit.
If Paul Moore’s fate seems extreme, it pales beside that of the stock market analyst Ray Dirks. Dirks was an unconventional man, at least by the standards of New York financiers in 1973. A tubby, bespectacled and dishevelled figure, he eschewed the well-trimmed Wall Street conformity of the day, in favour of a duplex flat in Greenwich Village that was adorned by little more than a spiral staircase, two telephones and the occasional girlfriend. Dirks was a nonconformist in another way: in an era when many analysts were simply cheerleaders, he had a reputation as a ruthlessly candid analyst who wasn’t afraid to dig up bad news about the companies he was analysing. But the bad news he received about the Equity Funding Corporation beggared belief.
A senior employee of Equity Funding had just quit, and decided that Dirks was the man to whom to tell his incredible story: Equity Funding had for years been running a massive fraud with its own dedicated computer system, specifically designed to create non-existent life insurance policies and sell them to other insurance companies. Over the course of a decade, over half of Equity Funding’s life insurance policies were fictitious. The company was selling the future income stream from these fake policies – cash today in exchange for promises of cash tomorrow. When the bills came due, it would simply manufacture more fakes and sell them to raise the money.
Dirks was astonished, and as he began to make enquiries, he became alarmed: he began to hear rumours that Equity Funding had mafia connections; at one stage, when visiting the company in Los Angeles, he received a call from his boss telling him that by discussing the possibility of fraud he was laying himself open to being sued for libel; two days later, a former auditor of Equity told Dirks he’d better go into hiding for his own safety. As his suspicions grew, Dirks had told the Wall Street Journal, Equity’s auditors, and the Securities and Exchange Commission (SEC) – but not before warning his clients of his fears.
Shortly after the Equity Funding Corporation collapsed, Ray Dirks was rewarded for his efforts: the SEC prosecuted him for insider trading, a charge that would at the very least have ended his career. Dirks fought his case for ten years before eventually being cleared by the US Supreme Court.
The SEC seems to have learned few lessons: when a former fund manager, Harry Markopolos, handed them a dossier of evidence that Bernard Madoff was running a gigantic fraud, he was ignored. (At least he was not prosecuted.) It is true that some whistleblowers have an axe to grind. Some are disgruntled former employees looking to make trouble. Mr Markopolos was Mr Madoff’s rival; Paul Moore had plenty of reasons to complain about HBOS, whether or not his complaints had merit. It is hard to know who to take seriously. But when billions are at stake, it is unwise to dismiss whistleblowers too casually.
Many whistleblowers later say they regret speaking out – more than four fifths of those who uncovered fraud in the Dyck–Morse–Zingales study say they had to quit, or were fired or demoted. If we rely on the pure public-spiritedness of employees to blow the whistle on fraud, reckless selling, incompetent mathematical modelling, poor maintenance, or any other risky latent condition, then we are relying on individuals to take a big personal risk for the benefit of society as a whole. Most, it seems, prefer to live and let live, and it is easy to understand why.
Only the exceptionally motivated follow through, and the very qualities that make them determined to persist may also make them hard to take seriously. Ray Dirks was a stubborn contrarian by nature, which helped him speak up but also isolated him. Paul Moore seems to have been driven by religious conviction: he speaks of having ‘sinned’, ‘examined my conscience very very closely’ and doing ‘a lot of praying’. But this religiosity, unusual for a British risk manager, may have chipped away at his credibility at the same time as it toughened his resolve against intimidation. And there was intimidation: Moore recounts how one colleague leaned across the table towards him and warned, ‘Don’t you make a fucking enemy of me.’ Moore persisted, despite the fact that – his voice wavers as he says this – ‘There was nothing in it for me to tell them the truth.’
But it is not impossible to encourage whistleblowers to speak out when they see evidence of a financial accident in the making – or an industrial accident. One piece of evidence for this comes from the Dyck–Morse–Zingales research. They looked at the healthcare sector, which relies on the taxpayer for much of its revenue. Because of this, whistleblowers can receive bonuses for saving tax dollars. The sums of money are breathtaking: such whistleblowers collected an average of almost $50 million in the study’s sample of alleged frauds. Not surprisingly, the prospect of a lottery-win reward coaxes more employees to speak out: employees blow the whistle three times more often in the healthcare business than elsewhere.
Another example: when the Internal Revenue Service recently increased the rewards people could earn by reporting suspected tax evaders, the number of tip-offs rose sixfold. The sums of money at stake are much larger now, too, often involving tens or hundreds of millions of dollars.
It would be harder to reward whistleblowers who spot more subtle latent errors. But the problem is worth thinking about, because it’s quite clear that during the financial crisis many people saw signs of trouble inside individual banks and financial institutions, but did not see the percentage in speaking out.
Less than four years after Moore stood sobbing on the street outside HBOS, the company – including the proud, three-centuries-old Bank of Scotland – tottered on the verge of bankruptcy. It had to be bailed out twice in quick succession – first it was forced to sell itself to its rival, Lloyds TSB, and then the merging group accepted a total of £17 billion from the British government. It was all very unexpected, not least to the UK’s financial regulator, the FSA. The deputy chairman of the FSA at the time? None other than the man who’d sacked Paul Moore, Sir James Crosby.
The financial crisis was so traumatic that it is tempting simply to conclude that all banking risks should be legislated out of existence, with fancy financial instruments outlawed, and banks compelled to hold gigantic capital cushions. But that would take for granted – and threaten – the benefits we now enjoy from banking. The end of error in finance would also be the end of new ideas, and indeed of most banking as we know it.
We’d miss it. In the 1960s, my father-in-law tried to get a mortgage. He couldn’t. He was a dentist, so self-employed – too risky. Property was concentrated in the hands of a narrow class of wealthy landlords, who were able to buy it cheap, without much competition, and rent it out to the masses. Immigrants or those with the wrong colour of skin were often the last to be able to get hold of a loan to buy their own home. Let’s not forget that, although we ended up taking several steps too far in making mortgages easy to come by, those steps started off as being in the right direction. As in any other sector, some innovations in finance will inevitably fail. And as in any other sector, those inevitable failures are a price well worth paying for innovations that succeed – but only if the failures are survivable. John Kay’s ‘narrow banking’ proposal aims to structure banks in such a way that the financial system can continue to take risks and develop valuable new products, but without endangering the system as a whole.
That is the key lesson that emerges from industrial safety. We can make a priority of getting more reliable indicators of what is going on, in a format that might enable a regulator both to anticipate systemic problems and to understand crises as they are occurring. We can get better at spotting latent errors more quickly by finding ways to reward – or at least to protect – those who speak up. We can be more systematic about publicising latent errors, too: the nuclear industry now has a system for recording near-misses and disseminating the information to other power plants that might be on the verge of making the same mistake. But above all, we should look at decoupling connections in the financial system to make sure that failures remain isolated.
After those fateful few days in 2008 when the US government let Lehman Brothers fail and then propped up AIG, many people drew one of two contradictory conclusions: either AIG should have been treated like Lehman, or Lehman should have been treated like AIG. But the real lesson is that it should have been possible to let both Lehman and AIG collapse without systemic damage. Preventing banks from being ‘too big to fail’ is the right kind of sentiment but the wrong way of phrasing it, as the domino analogy shows: it would be absurd to describe a single domino as being too big to fail. What we need are safety gates in the system that ensure any falling domino cannot topple too many others.
Above all, when we look at how future financial crises could be prevented, we need to bear in mind the two ingredients of a system that make inevitable failures more likely to be cataclysmic: complexity and tight coupling. Industrial safety experts regard the decoupling of different processes and the reduction of complexity as valuable ends in themselves. Financial regulators should, too.
After nightfall on 20 April 2010, Mike Williams was in his workshop on a floating drilling rig in the Gulf of Mexico. The rig was a colossal engineering achievement, with a deck 400 feet by 250 feet, and the world record for deep-water drilling to its credit: over 35,000 feet – deeper than Mount Everest is high. The rig’s team had just completed the drilling and sealing of the Macondo oil well, and that very day had hosted executives from the rig’s operator, Transocean, and the well’s owner, BP, to celebrate seven years without a notable accident. But the accident that was about to occur would be far more than merely notable: it was to be the worst environmental disaster in American history. The name of the rig was Deepwater Horizon.
Williams first realised something was amiss when the rig’s engines began revving wildly. He did not realise that explosive methane gas had bubbled up from the seabed, a mile below the surface of the water. It was being sucked into the rig’s engines, forcing them to excessive speeds. Alarms sounded; lights glowed so brightly that they shattered; Williams pushed back from his desk just as his own computer monitor exploded. He was then hurled across the room by a far larger explosion – pinned under a three-inch steel fire door that had been ripped off its hinges by the force of the blast. He crawled towards the exit and was again flung across the room by a second flying blast door. Bleeding profusely from a head wound, he finally reached the deck of the rig to see that the crew were already evacuating, not realising that he and a few other crew had survived and remained behind on the rig. With a last thought of his wife and young daughter, and a prayer, Williams leapt from the deck of Deepwater Horizon. Like the few survivors of the Piper Alpha disaster, he faced a ten-storey drop. Mike Williams survived; eleven others died.
The exact distribution of blame for the Deepwater Horizon explosion and the gigantic oil spill that followed will be left to the courts – along with a bill of many billions of dollars. Almost five million barrels of oil surged into the Gulf of Mexico just 40 miles from the coast of Louisiana. How did it happen?
Blame could possibly be attached to the rig’s operator, Transocean; to the contractor responsible for sealing the well with cement, Halliburton; to the regulator responsible for signing off on the drilling plans; and of course to BP, which owned the Macondo well and was in overall charge of the project. Each party has a strong financial incentive to blame the others. Still, amidst the confusion, the details that have emerged at the time of writing suggest a pattern that will now be familiar.
The first lesson is that safety systems often fail. When the boat that picked Mike Williams up circled back to tow a life raft away from the burning rig, it found the life raft tied to the rig by a safety line. Transocean, the rig’s operator, banned crew from carrying knives – so the boat, and the life raft, found themselves attached to a blazing oil rig by an interacting pair of safety precautions. (The safety line was eventually severed and the crew rescued.) Or consider a safety device called the mud-gas separator: when the well started to leak, blowing mud and gas onto the deck of the rig, the crew directed the flow into the separator, which was quickly overwhelmed, enveloping much of the rig in explosive gas. Without this device, the crew would simply have directed the flow over the side of the rig, and the worst of the accident might have been prevented.
The second lesson is that latent errors can be deadly. BP’s own review of the accident concluded that eight separate lines of defence had been breached – in James Reason’s language, eight holes in the Swiss cheese had managed to align. But that is no great surprise; in such disasters, multiple lines of defence are almost always breached. The most noticeable failure was that of the blowout preventer, a massive seabed array of valves and hydraulic rams designed to seal the well in the event of disaster. A congressional hearing has heard that the preventer appeared to be in a shocking state: one of the automatic triggers had no battery power, while another had a faulty component. The preventer was leaking hydraulic fluid, meaning that when it was eventually triggered by a robot submersible, it lacked the power to seal the well. All this sounds shocking, but failsafe systems such as the blowout preventer are often in a poor state of repair because in an ideal world they would never be used: Deepwater Horizon’s blowout preventer, which operated in extreme conditions a mile under the sea, had last been inspected five years before the accident.
The third lesson is that had whistleblowers felt able to speak up, the accident might have been prevented. The well had been unstable for weeks, and for months BP engineers had been expressing concern that the specific design of the well might not be up to the job. The Macondo well’s manager reported problems with the blowout preventer three months before the accident. Meanwhile, Transocean’s safety record had been deteriorating in the years before the accident: the company was showing signs of stress after a merger. On paper, BP has a clear policy of protecting people who blow the whistle with safety concerns. But in practice, the tight-knit community of an offshore drilling rig can encourage the kind of conformist thinking we encountered in chapter 2, regardless of the official policy. Oil companies, like banks, need to find ways to encourage whistleblowers.
Fourth, the rig system was too tightly coupled. One failure tended to compound another. The rig was designed as the key defence against minor and major spills: the rig contained the mud-gas separator to prevent small spills, and also controlled the blowout preventer. But at the very moment when the rig’s capabilities were most needed to plug the leak, the rig itself was being torn apart by a series of explosions. In an awful echo of Piper Alpha, the blowout preventer could not be triggered from the rig’s deck because power lines had been severed in the initial explosion. A safer design would have decoupled the blowout preventer from the rig’s control room.
Fifth, as Tony Lomas could have attested, contingency plans would have helped. BP – along with other oil majors – was humiliated when it was discovered that their contingency plans for a major spill included measures to protect the local walrus population. This was not actually necessary: walruses typically look after themselves when oil is spilled in the Gulf of Mexico by staying exactly where they are, in the Arctic Circle. The implication was clear: BP and others seem to have grabbed a contingency plan off the shelf, one that was originally designed for drilling in Alaska or the North Sea.
The final lesson is that of ‘normal accident’ theory: accidents will happen, and we must be prepared for the consequences. The US government signed off on the Macondo drilling project because the risk of trouble was thought to be small. Perhaps it was small – but the chance of accidents is never zero.
As the economy we have created becomes ever more complex, both the engineering that underpins it and the finance that connects it all together will tend to become more complex, too. Deepwater Horizon was pushing the limits of deep sea engineering; Three Mile Island came at a time of constant innovation in nuclear technology; the burgeoning market in credit derivatives also tested the boundaries of what was possible in finance. The usual response to complexity, that of trial and error, is not enough when faced with systems which are not only complex, but also tightly coupled. The costs of error are simply too high.
The instinctive answer is to eliminate the errors. This is an impossible dream. The alternative is to try to simplify and to decouple these high-risk systems as much as is feasible, to encourage whistleblowers to identify latent errors waiting to strike, and – sadly – to stand prepared for the worst. These are lessons that some engineers – both petroleum engineers and financial engineers – seem to have to learn again and again.
*A bond is a kind of tradable loan: if you buy the bond, you’re getting the right to receive the loan repayments, perhaps from a company, perhaps from a government, or perhaps from some more complex financial process.
*Readers of The Undercover Economist may recall Klemperer as one of the designers of the 3G spectrum auctions.