Death by ice cream
Shark attacks are more common in months when people buy more ice cream, noted writer Michael Blastland in a 2008 article for BBC online. Then he invited readers to speculate on why this might be so, and got hundreds of responses, including: ‘Eating ice cream will cause people to urinate a large quantity of lipids (fats) similarly to seals and other fatty sea mammals, which happen to be the sharks’ favourite prey’, ‘It is easier to swim away from a shark if you are not trying to hold on to an ice cream’, and ‘Sharks will stop at nothing for a 99 with a Flake’.
Some, missing the joke, pointed out the real reason: both increase in hot weather.
Finding a correlation between death by shark and buying ice cream does not mean that there’s a causal link between the two. Banning ice cream vans would not save lives by reducing the number of shark attacks. Conversely, reducing shark-related deaths probably wouldn’t reduce ice cream sales. If anything, feeling safer might attract more people to the beach and boost ice cream consumption.
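A short simulation shows how a lurking third factor can manufacture a correlation. It is a sketch under invented assumptions. Daily temperature is the only cause in the model, and it drives both ice cream sales and a made-up shark-risk index. Every number is made up, and the helper names (`pearson`, `residuals`) are mine, not from any library.

```python
import random

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares straight line in xs."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(42)
days = 5000
# Invented daily temperatures in degrees C: the only cause in this model.
temperature = [rng.uniform(5, 30) for _ in range(days)]
# Hot days sell more ice cream...
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
# ...and put more swimmers in the sea (a made-up risk index). No ice cream involved.
shark_risk = [0.1 * t + rng.gauss(0, 0.5) for t in temperature]

raw = pearson(ice_cream, shark_risk)
controlled = pearson(residuals(ice_cream, temperature),
                     residuals(shark_risk, temperature))
print(f"ice cream vs sharks: r = {raw:.2f}")
print(f"allowing for temperature: r = {controlled:.2f}")
```

To allow for temperature, the code fits a straight line of each series against temperature and correlates what is left over. This is a simple form of partial correlation. The strong link between ice cream and sharks all but vanishes once the hot weather is taken out.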
A few years ago, in the first excitement around big data, some people claimed that, in the future, correlation would be enough. Just as the striking link between smoking and developing lung cancer was enough to change people’s habits even before scientists understood how smoking caused an increased risk of cancer,1 spotting patterns from massive datasets would enable effective policies without understanding underlying causes.
So local councils, which now have responsibility for public health in the UK, might decide to ban ice cream vans from half the beaches, to see if it did lower the incidence of shark attacks. And if they did that, I think they’d find that the number of shark attacks fell on those beaches and rose on the other beaches, which would strongly suggest that their policy was working. The responsible thing to do, then, would be to ban all ice cream vans from all beaches. And shark attacks might fall right across the district.
Why? Because if my favourite beach suddenly doesn’t sell ice cream, I may move to a different beach that’s more fun. And the more people are at a beach, the more chance there is that one of us will be eaten by a shark.
I should point out that very few people are attacked by sharks, even in Australia, where the warm waters are more hospitable for both sharks and bathers.2 So no UK council would get statistically useful information from this kind of study. Beaches with or without ice cream vans would both report zero shark deaths. You’re vastly more likely to die of a heart attack than a shark attack.
Worldwide, the average number of deaths by shark is just under six per year. So fewer than one in a billion people will be killed by a shark in the average year. By contrast, of the people who die in a given year, around a third die of cardiovascular disease (CVD): some problem with their heart or circulation, including heart attacks.
Don’t panic, you don’t have a one in three chance of dying of a cardiovascular illness this year. That’s your approximate chance that CVD will be the cause of your eventual death. Your chance of dying at all this year is small, and depends on factors such as your age, sex, medical history, involvement in gun crime, and how often you swim with sharks. In the UK, the average chance of dying this year varies from one in 10,417 for a girl aged 5–14 to one in six for a man aged 85 or over. By far the greatest risk factor for death is age.
I hope that reassures you that you’re statistically unlikely to die in the near future, and very unlikely indeed to die from a shark attack. Go ahead, have that ice cream.
Public health officials are not idiots. They know sharks are not a major threat, especially in chilly British waters, and that any link with ice cream sales is about the time of year, not sweet-toothed sharks.
And most people who got overexcited a few years ago and said things like ‘correlation supersedes causation’ are now calming down, saying they didn’t mean we’d only need correlation, and hastily looking through the bin for causation. But because correlation is one of the things big data does very well, it’s still central to the data-driven view of the world.
Computers, even sophisticated ones like IBM’s Watson or Google’s DeepMind, are much better at spotting patterns than at having leaps of insight. They’re often very complex patterns, about how many points are connected in a network and how strongly, or relationships among thousands of dimensions of millions of variables, but they are still patterns, not theories about how the world works.
Guitars and football
Spotting patterns, and making decisions based on what we’ve previously observed, is something we all do. When you get on a busy train, where do you sit? Not just window or aisle, but who do you sit next to? Chances are, you get on the train, do a quick scan of the people who would share your seat and make a snap decision about which to choose.
You may not even be conscious you’re doing it. Most of us have other things on our minds, which is why we have a subconscious algorithm, a mental shortcut, of the sort that psychologists call a heuristic. While you’re busy planning what to have for dinner, or looking at Tinder, you may not be aware that you’re going through a near-automatic process of weighing up your seating options.
Some psychologists compare this subconscious process to the way data scientists apply Bayes’ theorem, though without the mathematics. You start with a best guess, based on what you do know, and then revise that guess as new information comes along. You can’t be completely certain about how the future will turn out, but you need to make a decision, so you choose the best bet. ‘Given the information I have, which choice is most likely to make me happy?’ Or, more often, ‘Which choice am I least likely to regret?’
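For anyone who does want the mathematics, here is a hedged sketch of a single Bayesian update, applied twice. The ‘clues’ and their probabilities are invented for illustration, and `bayes_update` is a hypothetical helper, not a standard function.

```python
def bayes_update(prior, p_clue_if_good, p_clue_if_bad):
    """Bayes' theorem: P(good | clue) from P(good), P(clue | good), P(clue | bad)."""
    weighted_good = p_clue_if_good * prior
    total = weighted_good + p_clue_if_bad * (1 - prior)
    return weighted_good / total

# Best guess before looking: a 50:50 chance this seat makes for a pleasant journey.
belief = 0.5

# Each clue: (chance of seeing it if the journey will be pleasant,
#             chance of seeing it if you'll regret sitting there). Invented numbers.
clues = {
    "neighbour is reading a book": (0.6, 0.3),
    "neighbour unwraps a pungent sandwich": (0.2, 0.4),
}

for clue, (p_good, p_bad) in clues.items():
    belief = bayes_update(belief, p_good, p_bad)
    print(f"after '{clue}': {belief:.2f}")
```

The second clue cancels out the first only because the numbers were chosen that way. The general point is that each new piece of information shifts the current estimate rather than replacing it.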
What information are you using to predict the outcomes of your seating decision? Unless you happen to see somebody you know, you have to jump to conclusions about everybody else on the train, based on a few pieces of instant information, mainly appearance.
I live near Millwall FC’s ground. For American readers: Millwall is a London football (soccer, to you) team whose fans used to have such a bad reputation that one of their chants was ‘no one likes us, we don’t care’. A football fan wearing the colours of an opposing team might even expect personal violence from Millwall fans.
My train into central London stops at the Millwall ground, so on match days I find London Bridge Station full of dark-blue-and-white scarves, jackets and hats. In full voice, the supporters fill the station’s echoing vaults with a wordless, baritone chant as harmonious as any choir of Gregorian monks. There’s a lot of herding around by police, so it can disrupt a smooth journey home.
Knowing this, if I get on a train carriage and have to choose between a seat next to three Millwall-blue-wearing men, and another seat next to a sensitive-looking young man with a guitar case, the choice is obvious.
However, that isn’t all I know. Having lived here for six years, I’ve had lots of contact with Millwall fans and found them polite and friendly. When they’re winning, or hoping to win, they radiate joyful energy. When they’ve just lost, I feel a pang for their disappointment. So I’m happy to sit next to them.
I don’t have anything against sensitive young men with guitars, though there’s a small danger they might try to show me their poetry. But before you infer that he’s musical, creative and in touch with his emotions, get chatting, give him your phone number and go on a date, let me give you a new piece of information.
If you were once asked for your telephone number by an attractive Frenchman, who never rang you as he promised, don’t be disappointed. If you suspected that he asked literally hundreds of women for their phone number that day, you could be right. You may have unwittingly been part of a psychological experiment by Nicolas Guéguen and his colleagues at the University of South Brittany.
They sent out a young man ‘previously evaluated as having a high level of physical attractiveness’ to ask 300 young women for their telephone number on the street. In every case, he was to say he found them pretty, and suggest they could go for a drink later. He never did call them back, and they never did go for that drink. I don’t know whether the researchers ever phoned all those young women to apologise for falsely raising their hopes.
The script was the same for all 300 of the encounters, but the researchers varied one thing to see if it had any effect on his rate of success. And it did. Empty-handed, the attractive Frenchman got a phone number from 14 per cent of the women. When he carried a guitar case, that success rate went up to 31 per cent. His chance of getting a phone number more than doubled, just by carrying a guitar case.
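‘More than doubled’ is a relative comparison. A few lines of arithmetic, using only the two percentages above, show it alongside the absolute difference, which is often the more useful number to keep in mind.

```python
empty_handed = 0.14  # share of women who gave their number when he carried nothing
guitar_case = 0.31   # share who gave their number when he carried a guitar case

relative = guitar_case / empty_handed   # how many times as likely
absolute = guitar_case - empty_handed   # difference, as a fraction

print(f"{relative:.1f} times as likely")
print(f"{absolute * 100:.0f} percentage points more")
```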
This study was published in 2013, and received international press coverage, including a piece in Popular Science by the admirably named Colin Lecher. So my young man on the train with a guitar case may play the guitar, or he may have read an article saying that carrying a guitar case makes him more likely to get a date.
I don’t know what he’ll do if he gets the date but can’t play the guitar. It would sound a bit shallow to confess he was only carrying a guitar case to chat up women. Then again, it would sound a bit shallow to say you only gave him your number because he plays the guitar. And you might not even be aware that the guitar case was a factor; it might all be part of the subconscious process of predicting who you’d like to date, or sit next to on a train.3
This is a silly example, but we all do it all the time. If we didn’t, we’d never have the time or the energy for the important stuff, such as reading or looking out of the window. We’re using correlation, remembering that in the past the guy whose shoes are held on with string turned out to be smelly, or that you had a fascinating conversation with the woman reading the book on mathematics.4
Nominal data
What if we can’t see somebody? Names provide another source of clues. Because fashions for names change, your first name says something about your age. Nate Silver’s data website, FiveThirtyEight, offers a guide to guessing a person’s age from their name, noting that over half of all Lisas are in their forties, but an Anna, with an enduringly popular name, could be any age.
Remember the BBC-commissioned study of names and voting intentions? Your name may not influence your voting intention, or your career choice – let’s hope not, for Colin Lecher’s sake – but the name your parents chose for you gives some clues about your parents5 and thus about your own background.
Suppose I write to you, asking if you can give work experience to one of my students. I have three potential interns for you: Lakisha Washington, Eleanor Cadbury and Alex Clark. They all have the same qualifications, aptitude and experience. Which one would you choose? Can you picture the three students in your mind?
This is a thought experiment, by the way. They’re all imaginary; I don’t have any students. Clark is a name I picked in tribute to economist Gregory Clark, who has researched links between names and life histories. He studied University of Oxford students, compared their names with the UK population, and found that Eleanors were three times as numerous in the Oxford student population as in the population as a whole. Another study found that the surname Cadbury is associated with high social status. These effects may not apply if you’re outside Britain.
Before you all change your child’s name to Eleanor Cadbury, let’s remember that banning ice cream doesn’t prevent shark attacks. My own father, whose parents were working class, and who went to a state school in Liverpool, got a scholarship to Oxford. He’s definitely not called Eleanor.
All UK universities are very keen to find students from a wider range of social backgrounds, so a state school kid with good grades will beat a stereotypical Eleanor Cadbury with mediocre grades every time. Students with good grades from poor schools have already shown they have something exceptional about them.
But parents who call their daughters Eleanor are also more likely to give their children the benefit of good schools, lots of encouragement and high expectations. The name is a predictor of academic success, not because it causes success, but because it’s associated with factors that make success more likely.
However, many researchers have found that your parents’ choice of name may have an impact in the wider world. In a study done in the other Cambridge, in Massachusetts, researchers Marianne Bertrand and Sendhil Mullainathan sent out résumés (CVs) in response to real job adverts in 2001 and 2002. They varied the CVs by quality, by address and by name, choosing names that would clearly signal the race of the applicant.
You may not be surprised to learn that ‘White-sounding’ names got 50 per cent more offers of interviews. Our hypothetical White applicant would need to send out 10 résumés to get an interview, but our imaginary African-American applicant would need to send out 15. Similar studies have found similar effects in Britain.
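The two ways of stating the result are the same arithmetic, turned upside down. A quick check, using only the figures in the paragraph above:

```python
resumes_per_interview = {"White-sounding name": 10, "African-American-sounding name": 15}

# The interview rate is the reciprocal of how many résumés it takes to get one interview.
rates = {name: 1 / n for name, n in resumes_per_interview.items()}

advantage = rates["White-sounding name"] / rates["African-American-sounding name"] - 1
print(f"{advantage:.0%} more interviews for the same résumé")
```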
So much for Eleanor and Lakisha. What about Alex? How did you imagine her? Because yes, my imaginary third student is also female. But she’s read some research showing that changing the name John to Jennifer at the top of a job application makes recruiters less likely to offer a job, or training. That name change also knocked $4,000 per year off the suggested starting salary.
Social psychologist Corinne Moss-Racusin found that professors of biology, chemistry and physics saw Jennifer as less competent than John for a laboratory manager job, even though all other details of the résumé were the same. In case you tend to picture all science professors as male, by the way, this included female professors. They also favoured John.
So Alex has stopped calling herself Alexandra and removed anything from her CV that might give away her gender. Which might confuse a few employers when they meet her and have to revise their mental picture of this ambitious young man, but at least she will have got that far.
I’m not telling you all this to make you feel bad if you instinctively went for Eleanor as your marketing intern, or Alex to help out in your engineering firm. They’re imaginary, nobody’s feelings got hurt. I’m just trying to show that our subconscious shortcuts draw on a lot of social assumptions that can be limiting.
They can also be overcome.
Things sometimes feel as though they’re changing very slowly, but a century ago I wouldn’t have been allowed to get a degree from Oxford University,6 or to vote. Now you’ve read this chapter, if you ever have a real job or internship on offer, you’ll probably think twice before you make lots of assumptions about the candidates. You know it’s a more important decision than where to sit on the train.
But if Oxford used big data methods to choose their candidates for interview, and the machine learning algorithm used previous success as a basis for selection, then Eleanor Cadbury would do very nicely. There would be strong correlations between name, what school you went to, and graduating from a good university.
The algorithm might go a bit further and use other publicly available data such as social networks. Knowing existing or previous students, or even being related to them, is a good predictor of getting into an elite university. Photos of the candidate playing lacrosse or polo are probably linked quite closely to Oxford and Cambridge admissions, as they’re uncommon sports outside private schools. And there’s previous educational attainment, of course, which is perhaps the only relevant correlation.
You might think this would be a ludicrous system, but some employers already make hiring decisions with algorithms not so different from what I’ve just described.
A few blocks from Steven Skiena’s desk at Yahoo! Labs in New York, I drop in on the Data & Society Research Institute, a light and open 11th-floor space, with meeting rooms called cheeky things like Bias and Panopticon. The Institute isn’t about creating new technology, more about thinking through the technology’s implications for our shared future. Things such as how the use of algorithms can disproportionately impact the same groups that have experienced discrimination in the past.
Over lunch, I chat to one of the Fellows, Gideon Lichfield, about a problem he calls algorithmic accountability. ‘Technology can perpetuate an existing power dynamic even if it was intended not to,’ he says. ‘One of the ways we see that is increased profiling, whether it’s by law enforcement, by banks, whether it’s shops that calculate what price to offer you online based on where you live …’
Or employment.
Data-driven hiring decisions don’t only rely on scoring what the candidates have chosen to submit, such as a résumé or the results of psychometric tests. They may also use other publicly available data, such as social media posts.7
Some of the things that earn you a red flag, according to a 2014 Data & Society Research Institute report, are the expression of opinions that are derogatory towards a ‘protected group of people’, photos with drug references, sexually explicit material, ‘animal activism’ or ‘At Risk Populations’. No, I don’t know how you get allocated to an ‘At Risk Population’ either.
Other factors that can affect your chances of getting a job are how far you live from the workplace, your likelihood of needing time off sick, and how strongly you agree with statements such as, ‘When I’m working for a company I take pride in making it as profitable as possible.’
So if you live in a poor neighbourhood with little local employment, or have poor health, or tend to answer questions too honestly, your chances of getting work are now even smaller.
‘You end up with very subtle discrimination happening at many levels,’ says Gideon, ‘which will naturally enhance the existing segregations in society. Because people are slightly less apt at the moment to get credit, or to get jobs, that will all feed into the data about them, which will feed into their abilities to do other things. There’s a risk of it all becoming a self-perpetuating cycle if you like.’
Our human mental shortcuts can have the same failings, as I tried to trick you into proving earlier in this chapter. But with algorithms there’s an extra problem, as Gideon reminds me:
‘If an algorithm is making decisions, and it’s a machine learning algorithm that has been trained on large datasets and is a black box, nobody can look inside it and understand the logic of any decision that it makes.’
And this is the problem he calls algorithmic accountability: ‘If this algorithm makes a decision and people feel like they’ve been harmed by that decision, or it’s prejudiced against them unfairly, what or who do they appeal to? Is it the people who designed the framework in which the algorithm was trained? Is it the people who provided the data on which the algorithm was based? Is it the people who are using the algorithm to back up their decisions, the policymakers? How do you show whether an algorithm is biased or not? How do you show it was the algorithm that was responsible for that?’
Some claim that algorithms, if carefully designed, can overcome the unconscious bias of human recruiters, and lead to more fairness, not less. After all, if you don’t feed information about race or gender into the model, how can it be prejudiced?
Gideon pulls a sceptical face. ‘People think of algorithms as neutral, it’s just a calculation, but of course they have biases within them that are the fruit of which decisions they’re used on, or what data goes into them, or the way that they’re written, the way they’re programmed. But those biases are very hard to detect.
‘People naturally don’t think of algorithms as biased, because they don’t have biases in the easy relatable way that human beings have biases, so policymakers tend to think of algorithms as infallible. So all those things come together into the question of algorithmic accountability.’
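Gideon’s point about biased data can be made concrete with a toy model. This is a sketch under wholly invented assumptions and does not describe any real hiring system. There are two groups of equally qualified applicants, a past in which group B was hired half as often as group A, and an area of residence that partly tracks group membership. The ‘model’ never sees the group. It only learns each area’s historical hiring rate.

```python
import random

rng = random.Random(7)

def make_applicant():
    """Invented population: half group A, half group B, 80 per cent of each
    group living in 'their' part of town. Everyone is equally qualified."""
    group = rng.choice("AB")
    home_area = {"A": "West", "B": "East"}[group]
    other_area = {"A": "East", "B": "West"}[group]
    area = home_area if rng.random() < 0.8 else other_area
    return group, area

# Biased history: past recruiters hired 30% of group A but only 15% of group B.
history = []
for _ in range(10_000):
    group, area = make_applicant()
    hired = rng.random() < (0.30 if group == "A" else 0.15)
    history.append((area, hired))  # the group label is thrown away here

# 'Blind' model: score each area by its historical hiring rate.
totals, hires = {}, {}
for area, hired in history:
    totals[area] = totals.get(area, 0) + 1
    hires[area] = hires.get(area, 0) + int(hired)
score_for_area = {area: hires[area] / totals[area] for area in totals}

# Score a fresh batch of equally qualified applicants, then compare the groups.
scores = {"A": [], "B": []}
for _ in range(10_000):
    group, area = make_applicant()
    scores[group].append(score_for_area[area])

average = {g: sum(s) / len(s) for g, s in scores.items()}
print(average)
```

Leaving the group label out didn’t make the model fair. It just moved the bias into a variable that looks innocent.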
Innocent until probably pre-guilty
Not knowing why you were turned down for a job is bad enough. But some young people in Chicago had a surprise visit from the police without committing a serious offence, and can’t ask the algorithm why they were singled out for such pre-crime attention.
‘If you hang around people who are getting shot, even if you’re not actively doing anything, then you become exposed,’ Andrew Papachristos of Yale University told the Chicago Tribune. ‘It puts you at risk because of the behaviors of your friends and your associates.’
This is the thinking behind the Chicago Police Department’s Two Degrees of Association system, for which they got a federal grant to work with Professor Miles Wernick of the Illinois Institute of Technology. Going beyond predicting which 500m (just over 500-yard) squares would be future crime scenes, they wanted to know who would be committing those crimes. So they fed a list of names into a computer that used various risk factors to rank their likelihood of future involvement in homicide or other serious crime.
These risk factors included past criminal and arrest records, the police records of friends and acquaintances, and whether any of those associates have been shot. So previous contact with the police is a good predictor of the police paying closer attention to you in the future. I’m guessing that law students at the University of Chicago and unemployed kids on Chicago’s West Side are both likely to smoke weed, but that the law students are less likely to get a rap sheet for it.
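‘Two degrees of association’ suggests a network idea: your friends are one step away from you, and their friends are two. The police haven’t said how their list is built, so this is only a toy illustration of that idea and not their algorithm. The names, the network and the `within_degrees` helper are all invented. The sketch uses a breadth-first search that stops two steps out.

```python
from collections import deque

def within_degrees(network, start, max_degree=2):
    """Return {person: steps away} for everyone within max_degree links of start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_degree:
            continue  # don't look any further out from here
        for contact in network.get(person, []):
            if contact not in distance:
                distance[contact] = distance[person] + 1
                queue.append(contact)
    del distance[start]
    return distance

# Invented network: who knows whom (links go both ways).
network = {
    "Ana": ["Ben"],
    "Ben": ["Ana", "Cal"],
    "Cal": ["Ben", "Dee"],
    "Dee": ["Cal", "Eve"],
    "Eve": ["Dee"],
}

# Suppose Cal has been shot. Who counts as 'exposed' within two degrees?
exposed = within_degrees(network, "Cal")
print(exposed)
```

In this toy, Ana and Eve have never met Cal and have done nothing, but they still end up on the list.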
‘The novelty of our approach is that we are attempting to evaluate the risk of violence in an unbiased, quantitative way,’ Professor Wernick told The Verge in a 2014 interview. ‘This is accomplished in a similar manner to how the medical field has identified statistically that smoking is a risk factor for lung cancer. Of course, everybody who smokes doesn’t get lung cancer, but it demonstrably increases the risk dramatically. The same is true of violent crime.’
One difference is that committing a crime, unlike getting lung cancer, is an act of will, a choice. It may be a choice in circumstances not chosen by the criminal, or a last resort, but it was still an action, not a disease caused by a rogue genetic mutation.
Another difference is that I can choose whether to smoke. I can’t choose whether to come from the wrong part of the city, or to have cousins who have committed murder.
Miles Wernick is not a criminologist. He is a professor of electrical and computer engineering, and his previous research is in medical imaging, training machines to spot worrying patterns in brain scans.
I don’t know exactly how the algorithm chose its list of 420 people who are not guilty, but not entirely innocent, and neither do they. The police have so far refused Freedom of Information requests.
If it uses machine learning, the basic AI that underpins a lot of data analysis, then nobody knows, exactly. It is a black box. Having given the computer a task, to find the patterns in past data and make predictions about the future, all we get is the results.
Defendants in Pennsylvania may soon be asking what’s inside another black box, as their state has decided to use a statistical risk assessment when sentencing.
This kind of tool is already used to make parole decisions by assigning a level of risk of recidivism. If the computer says you’re at a high risk of reoffending, you don’t get parole. Now the machine learning algorithm developed by Richard Berk at the University of Pennsylvania will have a say in whether people go to prison, and for how long.
A similar system, used in Virginia when sentencing sex offenders, was challenged by the ACLU when a 19-year-old man was sentenced to 18 months in prison for consensual sex with a 14-year-old girl. The ACLU challenge pointed out that, had the perpetrator been 36, he would have been given a lower risk score and so escaped imprisonment.
Using factors unrelated to the crime, over which the defendant may have no control, such as age, employment and education, undermines fundamental fairness in the criminal justice system, says the ACLU:
‘These are sentences divined from nothing more than statistical correlations. Using the same logic, if the Crime Commission discovered that people who like hotdogs and drive Buicks are more likely to recidivate, judges will soon be giving them longer sentences, too.’
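The hotdogs-and-Buicks jibe has a statistical point behind it: if you test enough irrelevant traits against an outcome, some will look related purely by chance. In this sketch everything is invented. The outcome and 200 traits are independent coin flips, and a trait is flagged if its gap in reoffending rates exceeds a rough two-standard-error yardstick.

```python
import random

rng = random.Random(3)
people, traits = 500, 200

# The outcome is random: nothing in this model actually predicts reoffending.
reoffended = [rng.random() < 0.4 for _ in range(people)]
overall = sum(reoffended) / people

def rate(outcomes):
    return sum(outcomes) / len(outcomes)

flagged = 0
for _ in range(traits):
    # An irrelevant trait, such as 'likes hotdogs', assigned by coin flip.
    has_trait = [rng.random() < 0.5 for _ in range(people)]
    with_trait = [r for r, t in zip(reoffended, has_trait) if t]
    without_trait = [r for r, t in zip(reoffended, has_trait) if not t]
    gap = rate(with_trait) - rate(without_trait)
    # Approximate standard error of a difference between two proportions.
    se = (overall * (1 - overall) * (1 / len(with_trait) + 1 / len(without_trait))) ** 0.5
    if abs(gap) > 2 * se:
        flagged += 1

print(f"{flagged} of {traits} meaningless traits look 'predictive'")
```

A two-standard-error rule lets through roughly one meaningless trait in twenty.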
America incarcerates too large a share of its population: that is something on which many politicians from all sides agree. Some of them claim that using data to decide who is unlikely to reoffend is one way to keep people out of jail.
Human judges aren’t free from prejudice, either. But they are, at least, accountable for their decisions, and can be asked to justify and explain them. An algorithm is accountable to nobody.
Astrology to four decimal places
Predicting the future is another thing we all do, all the time. Will that guitarist ever phone me? Will Liverpool win tomorrow, or should I call my dad now, while he’s still in a good mood? Will it rain on Saturday?
Weather forecasting is pretty good in the short term, thanks partly to good mathematical models. Very powerful computers work with twenty-first-century versions of Galton’s maps of pressure and wind direction, using many variables at many points in space and time, based on real observations and the known laws of physics.
But after a couple of days, the accuracy of weather forecasts falls away badly. Because weather systems tend to be chaotic, in the mathematical sense of the word, they’re hard to forecast, even with a complex computer program. Weather is a deterministic physical process, but the tiniest change in the start settings escalates quickly into massive changes in the results.
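Real weather models are far too big to show here. The logistic map, a one-line textbook example of chaos, shows the same behaviour: it is completely deterministic, yet a change in the tenth decimal place of the starting value soon gives an entirely different answer. Treat it as a stand-in for the idea, not as a weather model.

```python
def logistic_map(x0, r=4.0, steps=50):
    """Iterate x -> r * x * (1 - x). No randomness anywhere; chaotic when r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

run_a = logistic_map(0.2)
run_b = logistic_map(0.2 + 1e-10)  # the tiniest change in the start settings

for step in (0, 10, 20, 30, 40, 50):
    gap = abs(run_a[step] - run_b[step])
    print(f"step {step:2d}: {run_a[step]:.4f} vs {run_b[step]:.4f} (gap {gap:.1e})")
```

For the first few steps the two runs are indistinguishable. The gap roughly doubles each step, though, and by about step 35 the two ‘forecasts’ have nothing to do with each other.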
Predicting processes that involve human beings adds a whole new level of unpredictability. The difficulty with using big data to predict the future is that mathematical models assume the future will look like the past, or at least follow the same rules.
Malthus assumed that human population would grow exponentially, by repeated doubling, and food in a linear fashion, adding the same quantity every year, until starvation or disease intervened. Or until the poor could be persuaded to have fewer children. Mathematically, population growth must overtake food production and inevitably lead to hungry mouths outnumbering plates of food.
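Malthus’s arithmetic is easy to reproduce. The sketch below uses arbitrary units. The population doubles every period, while food gains a fixed amount each period. The starting values and the name `first_shortfall` are my own choices for illustration.

```python
def first_shortfall(population=1.0, food=4.0, food_step=1.0, max_periods=100):
    """First period in which doubling population outstrips linearly growing food,
    under Malthus's assumptions. Units are arbitrary; returns None if it never happens."""
    for period in range(max_periods + 1):
        if population > food:
            return period
        population *= 2      # geometric growth: repeated doubling
        food += food_step    # arithmetic growth: the same quantity added each period
    return None

print(first_shortfall())             # food starts four times ahead
print(first_shortfall(food=1000.0))  # food starts a thousand times ahead
```

Even a thousandfold head start for food only buys ten doublings. Under these assumptions the crossover really is inevitable, so the only way out is for the assumptions to fail.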
But Malthus didn’t factor in any of the improvements in food production that have allowed it to outstrip population growth while using less of the Earth’s surface. In the last 50 years alone, the land area needed to grow a fixed amount of food has fallen by two-thirds.
Predictions using computer models often include caveats about assuming that present trends continue. These warnings don’t always make it into the news coverage, so careful studies spelling out their assumptions become alarming headlines about the world in 2050: half of us will be obese, we’ll need 60 per cent more food, and sea levels will be 3m (10ft) higher. Probably because of all the obese bathers displacing tons of seawater every time they dive in.
Perhaps we should ban those ice cream vans after all.
It’s helpful to remember a distinction between projection and prediction. You can make a mathematical model, and test how well it fits the past by letting it calculate past values from even earlier data. If it gives you values for 1980, 1990 and 2000 that are close to the observed values for those years, then it fits well. You have a model that’s shown it could calculate the present from the past.
Now, you can continue this model forwards to calculate projected figures for the future. And if things don’t change too much, and if your assumptions were sound, you stand a good chance of your projections being close to how things turn out in reality. So you can use your projection as a basis for prediction. If real life continues to resemble your model, here’s how the future will look.
The difference is that a projection is a continuation of your mathematical model. A prediction is you, making a judgement that your model is a good guide to the future.
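Here is the projection-versus-prediction distinction as a small sketch. The figures are invented, one per decade, for some unnamed quantity. A straight line is fitted to the earlier decades only and checked against the later ones. The same line is then run forward to 2050. `fit_line` is an ordinary least-squares helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Invented figures, one per decade, for some quantity we care about.
years = [1940, 1950, 1960, 1970, 1980, 1990, 2000]
observed = [10.0, 12.1, 13.9, 16.0, 18.2, 19.9, 22.1]

# Build the model on the earlier decades only...
slope, intercept = fit_line(years[:4], observed[:4])

# ...then see whether it can 'calculate the present from the past'.
errors = []
for year, actual in zip(years[4:], observed[4:]):
    fitted = slope * year + intercept
    errors.append(abs(fitted - actual))
    print(year, actual, round(fitted, 2))

# The projection: simply run the same model forward.
projection_2050 = slope * 2050 + intercept
print("2050 projection:", round(projection_2050, 1))
```

The code itself produces a projection of 31.8. Calling that a prediction is the judgement call, because it assumes the straight line holds for another 50 years, and a good fit to the past can’t guarantee that.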
Suppose you want to know how tall your child will be. You could measure them every month and use that to predict how tall they will be at the age of 25. If you just noted that they grow about 5cm (2in) a month in the first year, you’d project a height of more than 15m (50ft) at age 25.
If you observed that this rate slows down after the first year, you might assume that 2.5cm (1in) a month is a better predictor, or even that you need to include changes in the speed of growth, so the height eventually stops changing. If you use your experience of real life, you might include things like the growth spurt at puberty, the fact that adults don’t keep growing indefinitely, or that family members tend to end up being similar in height.
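The difference between a constant growth rate and a slowing one is easy to see in code. The 50cm starting length and the 4 per cent monthly slowdown are my assumptions, chosen only so the numbers come out human-sized. This is not a real growth model. `project_height` is a hypothetical helper.

```python
def project_height(start_cm, first_month_growth_cm, months, slowdown=1.0):
    """Add up monthly growth. slowdown=1.0 keeps the rate constant (a straight line);
    slowdown < 1 shrinks each month's growth to that fraction of the month before."""
    height, growth = start_cm, first_month_growth_cm
    for _ in range(months):
        height += growth
        growth *= slowdown
    return height

months_to_25 = 25 * 12
straight_line = project_height(50, 5, months_to_25)           # constant 5cm a month
slowing = project_height(50, 5, months_to_25, slowdown=0.96)  # growth fades away

print(f"straight line: {straight_line / 100:.1f}m")
print(f"slowing growth: {slowing:.0f}cm")
```

The straight line gives a 15.5m adult. The slowing version levels off at about 175cm, because its monthly gains shrink so fast that they add up to a finite total.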
You could use an online height calculator. I tried one that ‘uses genetics’ by extrapolating from my parents’ heights to predict that I will be 1.63m (5ft 4in) tall. Which is about 10cm (4in) too short. So either it’s not easy to accurately predict from so little data, or my dad isn’t really my dad, and my mum managed to have an affair with an even taller man with similar sticky-out ears and terrible puns.
To be fair, the website does say it’s only an estimate.
It’s also a reminder that when Galton first used regression to model the relationship between parents’ height and the heights of their adult children, he was making predictions about the population generally, or the probable range of an individual’s height.
The best predictor of your child’s adult stature is their current stature. Look up their height and age on the UK-WHO charts to get a predicted height. If your four-year-old son is 112cm (3ft 8in) tall, his predicted adult height is 188cm (6ft 2in). That’s not a guaranteed adult height, but four out of five boys who are the same height as your son at age four will end up within about 6cm (2in) of this height as adults, either shorter or taller. So that range is an 80 per cent confidence interval.
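Suppose adult heights around the prediction follow a normal (bell-shaped) curve. That is my assumption, not something the charts claim here. On that assumption, the 80 per cent interval of 6cm either side tells you the spread, and from the spread you can work out other intervals. A sketch using Python’s standard `statistics.NormalDist`:

```python
from statistics import NormalDist

predicted_cm = 188
half_width_80 = 6  # four out of five boys end up within about this far of the prediction

# An 80% central interval leaves 10% in each tail, so its edge sits at the 90th percentile.
z80 = NormalDist().inv_cdf(0.90)
spread = half_width_80 / z80  # implied standard deviation, in cm
adult = NormalDist(predicted_cm, spread)

low80, high80 = adult.inv_cdf(0.10), adult.inv_cdf(0.90)
low95, high95 = adult.inv_cdf(0.025), adult.inv_cdf(0.975)
print(f"80%: {low80:.0f} to {high80:.0f}cm; 95%: {low95:.0f} to {high95:.0f}cm")
```

If you want to be 95 per cent sure rather than 80 per cent, the range widens to roughly 9cm either side.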
Or you could use my hairdresser’s rule, that the height of a two-year-old is half their adult height. And he’s just started wearing his son’s hand-me-down trainers, so he should know.
I’m not sure why you would want to predict your infant’s adult height, apart from the fact that parents love thinking about their child’s future. But to plan for the future, we need some idea of what will be happening in a year’s time, or 10 or 50. Not just individually, but as a society. How many people will need trains or schools or hospitals? All these things take time to plan, build and staff. And making predictions about them means making guesses about the future behaviour of millions of people.
Luckily, as Quetelet noted, it’s easier to predict what a population will do than an individual. The variations cancel each other out a bit, and the underlying pattern is more coherent. Your local bar can’t know if you, personally, will turn up tonight, or how much you’ll drink, but they can have some idea of roughly how many people will turn up, and overall beer consumption on an average Saturday night at this time of year.
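A quick simulation illustrates Quetelet’s point. Each regular’s behaviour is invented and entirely unpredictable on its own: they turn up with probability 0.3 and, if they do, drink between one and five pints. The nightly total still becomes steadier, relative to its size, as the crowd grows. The function names are hypothetical.

```python
import random
from statistics import mean, pstdev

rng = random.Random(1)

def one_saturday(regulars):
    """Total pints sold: each regular turns up with probability 0.3 and,
    if they do, drinks 1 to 5 pints. All numbers are invented."""
    return sum(rng.randint(1, 5) for _ in range(regulars) if rng.random() < 0.3)

def relative_spread(regulars, nights=500):
    """Standard deviation of the nightly total as a fraction of its average."""
    totals = [one_saturday(regulars) for _ in range(nights)]
    return pstdev(totals) / mean(totals)

for regulars in (1, 10, 100, 1000):
    print(f"{regulars:5d} regulars: nightly total varies by about {relative_spread(regulars):.0%}")
```

The relative variation shrinks roughly with the square root of the crowd size. With a hundred times as many regulars, the relative variation is about a tenth as large.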
But if there’s one thing human history should teach us, it’s that things will change in ways that history doesn’t predict.
Remember Turing and his little robot child? He imagined a thinking robot, an artificial intelligence with feelings that might be hurt by the school bullies. But he still pictured little robot Turing Junior having to bring in the coal to heat the house, and possibly to cook dinner and heat the water for the laundry. The infamous prediction that the market for computers was ‘six’ simply didn’t foresee that they would become so fast and cheap to make, so compact and so widely useful.
And we may want things to change. A century ago, when women in Britain were sabotaging the census in their campaign to vote equally with men, there were some who said we weren’t naturally suited to politics. And the data would have backed them up. Women in the early twentieth century were less educated, and less involved in public life, than men.
Nothing about this would inevitably lead to the conclusion that women, therefore, should not get the vote. You could take the opposite view and say that women need not only the vote, but also equal access to education and greater involvement in public life. Many people did, and those things too have changed since 1916.
Big data gives us a tremendous opportunity to use observations of the past and present, processed with powerful software, to make very detailed projections and help us predict the future. It can also help us prepare for the future in ways we’re only just beginning to explore.
But it has some pitfalls that we need to beware of, like a tendency to dazzle us with precision. Even when the data analysts include confidence intervals and other reminders that forecasting the future is a grey art, we look for certainty. It can be comforting to have a picture of what to expect.
It may be a grim picture, of a world in which obese people bob about in ever-rising sea water while the entire surface of the world is one intensive factory farm, but at least it’s predictable. Which we often prefer to a future in which everything is uncertain.
We have to make decisions, based on too little information, with unforeseeable consequences. No wonder people are tempted to let a machine make the predictions, take the decisions, and bear the blame.
Average man
I once got a feedback card in the post after training a bunch of engineers. On it was written one word: average.
Now I’ve let you wince in empathy, I should tell you it was a joke, and I laughed, because I am a person, not a machine, and I knew the context. But nobody wants to be described as average, do they?
And rightly so. It’s useful to know the average when you’re studying a population. Average life expectancy keeps going up. That’s great news. But it’s no guarantee that I, personally, will live a certain number of years. If I look up the official life expectancy figures for my area, I can expect to live to 84. If I moved across London to Camden I could postpone my predicted death to the age of 87.
How? By joining a population with a better average. I don’t know why, but I’m guessing Camden includes more well-off areas, because wealth is a good predictor of being healthier and living longer. I don’t think it’s the fact that I live in a low area near the Thames, and Camden is uphill, further from the miasma.
It could be that those north-Londoners are thinner, less likely to smoke, more likely to jog, and so on. It’s hard to separate those things out from having more cash and a less precarious existence, because wealthier people tend to be thinner, take more exercise and smoke less. On average.
When health researchers look for factors that increase the risk of diseases, they have to handle the data carefully to try and reduce the confounding effect of things that go together but aren’t necessarily implicated. Statistical innocent bystanders, if you like, found at the scene of the crime and now on somebody’s list.
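Stratification is one standard way of handling a confounder, and it can be shown with invented data. Here is a minimal sketch in Python. It assumes a made-up population in which wealth raises both the chance of jogging and lifespan, while jogging itself has no effect at all. That is true by construction here, unlike in real life, and every number is hypothetical:

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the run is repeatable

# Made-up world: wealth raises both the chance of jogging and lifespan.
# Jogging itself has NO effect on lifespan here, by construction.
people = []
for _ in range(20_000):
    wealthy = rng.random() < 0.5
    jogs = rng.random() < (0.6 if wealthy else 0.2)
    lifespan = rng.gauss(86 if wealthy else 80, 8)
    people.append((wealthy, jogs, lifespan))

def mean_lifespan(group):
    return statistics.mean(p[2] for p in group)

def jogging_gap(group):
    """Average lifespan of joggers minus non-joggers within a group."""
    joggers = [p for p in group if p[1]]
    others = [p for p in group if not p[1]]
    return mean_lifespan(joggers) - mean_lifespan(others)

# Naive comparison across everyone: wealth hides behind jogging.
naive_gap = jogging_gap(people)

# Stratified comparison: compare like with like inside each wealth band,
# then average the two within-band gaps (the bands are roughly equal size).
bands = [[p for p in people if p[0] == w] for w in (True, False)]
stratified_gap = statistics.mean(jogging_gap(b) for b in bands)

print(f"naive gap: {naive_gap:.1f} years")
print(f"gap within wealth bands: {stratified_gap:.1f} years")
```

The naive comparison credits jogging with a couple of extra years. Comparing joggers with non-joggers inside each wealth band shrinks that gap to roughly zero, which exposes jogging as the innocent bystander in this made-up world.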
I’m unlikely to change the date and time of my own death by moving to Camden, unless I step out from behind the removal van with a box I can’t see over and am struck by a bus. Knowing the average for my population is of limited use for me, personally. None of us is the average person imagined by Quetelet.
I found a website that takes a few more details to give me a more precise predicted departure date. By entering my BMI, my outlook (optimistic) and how much I drink, I’ve been given an estimated lifespan of 84 years, one month and eight days. So you’d better all come to my 84th birthday party.
Some of these websites are designed to scare you into putting money aside for retirement. Look how long you’ll live! Quick, start saving!8 Others are designed to scare you into living a healthier life. Look how soon you’ll die! Quick, stop smoking!
Some let you change some of your habits onscreen and see how much extra time you’d get, on average. I discover that I’d probably only lose two years if I started smoking now. And I could probably add two years by taking more exercise, so I could start jogging to the ciggie shop and be quits. Though, as I’d be spending all the money I really should be putting into a pension fund on cigarettes, I’d be looking forward to an impoverished old age.
But again, this is deceptively specific. Smoking, like most lifestyle choices, does not guarantee a particular outcome. All you do is shift your odds. You could have no vices, be unlucky and die young, or be that lucky 104-year-old, lighting a fag from all the candles on your highly fattening birthday cake.
It’s useful for the National Health Service in my area to know roughly how many of us will be wanting hip replacements, hearing aids and hover chairs, or whatever we’ll be getting around in when I’m old. But it’s of limited use to me.
If 84 is my expected age of death, I have roughly a 50 per cent chance of dying before I get there. Doesn’t sound so good, does it? But I also have roughly a 50 per cent chance of living longer, which sounds better. My chance of dying in any given year goes up every year from now until … Well, until it stops going up because it’s reached one in one.
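The rising yearly risk can be sketched as a toy life table. Here is a minimal sketch in Python. It assumes a crude Gompertz-style model in which the chance of dying before your next birthday grows by about 10 per cent a year until it is capped at one. The starting age of 45 and the parameters `A` and `B` are hypothetical, made up for illustration rather than taken from official life tables:

```python
import math

# Hypothetical inputs, chosen for illustration only (not official figures).
START_AGE = 45
A, B = 1.2e-5, 0.1

def annual_death_risk(age):
    """Chance of dying before the next birthday, given alive at `age`.
    Grows by about 10% a year (Gompertz-style) until capped at one in one."""
    return min(1.0, A * math.exp(B * age))

# Walk forward a year at a time, tracking the chance of still being alive.
death_by_age = {}   # age -> chance of dying between that birthday and the next
alive, age = 1.0, START_AGE
while alive > 0:
    q = annual_death_risk(age)
    death_by_age[age] = alive * q
    alive *= 1 - q
    age += 1

def chance_alive_at(birthday):
    """Chance of still being alive on a given future birthday."""
    return sum(p for a, p in death_by_age.items() if a >= birthday)

# Expected (mean) age at death, assuming deaths fall mid-year on average.
expected = sum((a + 0.5) * p for a, p in death_by_age.items())
first_birthday_after = math.ceil(expected)

print(f"risk of dying in the next year: {annual_death_risk(START_AGE):.2%}")
print(f"expected age at death: {expected:.1f}")
print(f"chance of reaching birthday {first_birthday_after}: "
      f"{chance_alive_at(first_birthday_after):.0%}")
print(f"risk reaches one in one at age {max(death_by_age)}")
```

With these made-up parameters the annual risk is capped at one in one at 114. Because a minority of early deaths pull the average down, the chance of outliving the expected age comes out a little better than even rather than exactly 50 per cent.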
To be pessimistic, I could die within the next year. To be wildly optimistic, I could live to 114. Between those dates, it’s anybody’s guess. So let’s stop worrying about how long our lives will be, and start thinking about how we live them.
Notes
1 Though, as we saw earlier, Doll and Hill spotted the relationship between smoking and lung cancer because they looked for it, not because it popped out of a correlation engine.
2 I know this because shark attack casualties were one statistic we used in our show, Your Days Are Numbered: The Maths of Death, and stand-up mathematician Matt Parker checked the figures. When we took the show to Australia we scarcely had to adjust the UK probability of death by shark attack. Though while we were in Australia somebody was killed by a shark, which made our show slightly less accurate, but much more tasteless.
3 If you’re collecting dating tips, they also tried sending out the same Frenchman with a sports bag, which knocked his success rate down from 14 per cent to 9 per cent. So sporty guys, try carrying your gym kit home in a guitar case.
4 I know, that goes without saying.
5 My name tells you that they were expecting a boy and, stumped for ideas, looked through the character lists of Shakespeare’s plays till they found one that sounded nice.
6 I still don’t have a degree from Oxford, but that’s not the point. The point is that if I had applied, and been clever enough, and worked hard, I could have got one. Theoretically I still could.
7 Most human recruiters also go online to get a bigger picture of potential candidates. If you need a minute to go and delete a lot of stuff off your social media profile, go ahead, I’ll wait.
8 No, I don’t have a pension fund. Please buy as many copies of this book as you can afford.