The power to not collect data is one of the most important and little-understood sources of power that governments have . . . By refusing to amass knowledge in the first place, decision-makers exert power over the rest of us.
—ANNA POWELL-SMITH, MissingNumbers.org
Nearly seven decades ago, the noted psychologist Solomon Asch gave a simple task to 123 experimental subjects.
They were shown two illustrations, one with three lines of quite different lengths and the other with a single ‘reference line’, and Asch asked them to pick which of the three lines was the same length as the reference line. Asch had a trick up his sleeve: he surrounded each subject with stooges who would unanimously pick the wrong line. Confused, the experimental subjects were often (though not always) swayed by the errors of those around them.
The Asch experiments are endlessly fascinating, and I often find myself discussing them in my writing and talks: they are a great starting point for a conversation about the pressure we all feel to conform, and they provide a memorable window into human nature.
Or do they? The experiments are elegant and powerful, but like many psychologists, Asch was working with material that came readily to hand: 1950s American college students. We shouldn’t criticise him too much for that; Asch was simply harvesting the low-hanging fruit. It would have been troublesome for him to collect a representative sample of all Americans, even harder to study an international sample, and impossible to know what the study would have shown had it been conducted not in 1952 but in 1972. (Others were to run the follow-up experiments, which found somewhat lower levels of conformity – perhaps a sign of student rebelliousness in the Vietnam era.)
It’s all too tempting, however, to act as though Solomon Asch discovered an immutable and universal truth – to discuss the results of psychological experiments on a very specific type of person, in this case 1950s American students, as though they were experiments on the human race as a whole. I am guilty of this myself at times, especially when under the time pressure of a talk. But we should draw conclusions about human nature only after studying a broad range of people. Psychologists are increasingly acknowledging the problem of experiments that study only ‘WEIRD’ subjects – that is, Western, Educated and from Industrialised Rich Democracies.
By 1996, a Cochrane-style review of the literature found that Asch’s experiment had inspired 133 follow-ups. The overall finding stood up, which is encouraging in the light of the previous chapter: conformity is a powerful and widespread effect, though it seemed to have weakened over time. But the obvious next question to ask is this: does conformity vary in its power depending on who is under pressure to conform to whom?
Disappointingly, the follow-up studies were not very diverse – most had been conducted in the United States and almost all with students – but the few exceptions were illuminating. For example, a 1967 experiment conducted with the Inuit of Baffin Island in Canada found lower levels of conformity than one conducted with the Temne people of Sierra Leone. I am no anthropologist, but reportedly the Inuit had a relaxed and individualistic culture, while the Temne society had strict social norms, at least at the time that these experiments were conducted. In general – and with some notable exceptions such as Japan – conformity in the Asch-inspired experiments has been lower in societies which sociologists viewed as individualist, and higher in societies viewed as collectivist, where social cohesion is more important.1
That implies Asch probably understated the power of conformity by studying subjects from America, an individualistic society. But then, accounts in both psychology textbooks and pop-science books often exaggerate how much conformity Asch found. (Asch’s experimental subjects often rebelled against group pressure. Hardly any of them buckled every single time; much more commonly they tried to equivocate by varying their actions across repeated rounds of the experiment, sometimes agreeing with the group and sometimes staking out a lonely position.) By pure luck, these two biases in the popular understanding of Asch’s findings may have effectively cancelled each other out.2
How much of the conformity pressure came because the group being studied was a monoculture? Would a more heterogeneous group leave more space for dissent? There are some tantalising hints of that possibility – for example, follow-up studies found that people conformed to groups of friends much more than they conformed to groups of strangers. And when Asch instructed his stooges to disagree with each other, conformity pressure evaporated: his subjects were happy to pick the correct line, even if they were the only one doing so, as long as others were disagreeing among themselves. All this suggests that one cure for conformity is to make decisions with a diverse group of people, people who are likely to bring different ideas and assumptions to the table. But this practical tactic is hard to test, as the original experiments and many of the follow-ups were on homogeneous groups. One can’t help feeling that an opportunity has been missed.
It should, I think, make us feel uneasy that most accounts of Asch’s results completely ignore the omission of people who might have acted differently, and whom he could easily have included. Solomon Asch taught in a proudly coeducational institution, Swarthmore College in Pennsylvania. Was it really necessary that not a single one of his experimental participants, neither the stooges nor the subjects, was female?
As it happens, follow-up studies suggest that all-male groups are less conformist than all-female groups. So, again, you could say it’s a case of no harm, no foul: Asch might have seen even stronger evidence of conformity had he looked beyond young American males.3 Still, gender does matter, and Asch could have studied its effects, or at least used mixed-gender groups. But it evidently didn’t occur to him, and it’s discomfiting how few subsequent reports on his experiment seem to care.
If Solomon Asch was the only researcher to have done this, we could wave it away as a historical curiosity. But Asch isn’t alone; of course he isn’t. His student Stanley Milgram conducted a notorious set of electric shock experiments at Yale University in the 1960s. Here’s how I once described his experiments in the Financial Times:4
[Milgram] recruited unsuspecting members of the public to participate in a ‘study of memory’. On showing up at the laboratory, they drew lots with another participant to see who would be ‘teacher’ and who ‘learner’. Once the learner was strapped into an electric chair, the teacher retreated into another room to take control of a shock machine. As the learner failed to answer questions correctly, the teacher was asked to administer steadily increasing electric shocks. Many proved willing to deliver possibly fatal shocks – despite having received a painful shock themselves as a demonstration, despite the learner having already complained of a heart condition, despite the screams of pain and the pleadings to be released from the other side of the wall, and despite the fact that the switches on the shock machine read ‘Danger: Severe Shock, XXX’. Of course, there were no shocks – the man screaming from the nearby room was pretending. Yet the research exerts a horrifying fascination.
My article should have mentioned, if only in passing, that all forty of Milgram’s experimental subjects were men. But I wasn’t thinking about that particular issue at the time, and so – like many others before me – it didn’t occur to me to check.
I hope it would now, because since writing that article I have interviewed Caroline Criado Perez about her book Invisible Women. Meeting her was fun – she strolled into the BBC with an adorable little dog who curled up in the corner of the studio and left us to talk about the gender data gap. Reading her book was less fun, because the incompetence and injustice she described was so depressing – from the makers of protective vests for police officers who forgot that some officers have breasts, to the coders of a ‘comprehensive’ Apple health app who overlooked that some iPhone users menstruate.5 Her book argues that all too often, the people responsible for the products and policies that shape our lives implicitly view the default customer – or citizen – as male. Women are an afterthought. Criado Perez argues that the statistics we gather are no exception to this rule: she makes abundantly clear how easy it is to assume that data reflect an impartial ‘view from nowhere’, when in fact they can contain deep and subtle biases.
Consider the historical under-representation of women in clinical trials. One grim landmark was thalidomide, which was widely taken by pregnant women to ease morning sickness only for it to emerge that the drug could cause severe disability and death to unborn children. Following this disaster, women of childbearing age were routinely excluded from trials, as a precaution. But the precaution only makes sense if one assumes that we can learn most of what we need to know by testing drugs only in men – a big assumption.6
The situation has improved, but many studies still do not disaggregate data to allow an exploration of whether there might be a different effect in men and in women. Sildenafil, for example, was originally intended as a treatment for angina. The clinical trial – conducted on men – revealed an unexpected side effect: magnificent erections. Now better known as Viagra, the drug hit the market as a treatment for erectile dysfunction. But sildenafil might have yet another unexpected benefit: it could be an effective treatment for period pain. We’re not sure, as only one small and suggestive trial has yet been funded.7 If the trial for angina had equally represented men and women, the potential to treat period pain might have been as obvious as the impact on erections.
Such sex-dependent effects are surprisingly common. One review of drug studies conducted in male and female rodents found that the drug being tested had a sex-dependent effect more than half of the time. For a long time, researchers into muscle-derived stem cells were baffled as to why they sometimes regenerated and sometimes didn’t. It seemed entirely arbitrary, until it occurred to someone to check whether the cells came from males or females. Mystery solved: it turned out that the cells from females regenerated, while those from males did not.
The gender blind-spot has yet to be banished. A few weeks into the coronavirus epidemic, researchers started to realise that men might be more susceptible than women, both to infection and to death. Was that because of a difference in behaviour, in diligent hand-washing, in the prevalence of smoking, or perhaps a deep difference in the biology of the male and female immune systems? It wasn’t easy to say, particularly since of the twenty-five countries with the largest number of infections, more than half – including the UK and the US – did not disaggregate the cases by gender.8
A different problem arises when women are included in data-gathering exercises, but the questions they are asked don’t fit the man-shaped box in the survey-designer’s head. About twenty-five years ago in Uganda, the active labour force suddenly surged by over 10 per cent, from 6.5 to 7.2 million people. What had happened? The labour force survey started asking better questions.9
Previously, people had been invited to list their primary activity or job, and many women who held down part-time jobs, ran their own market stalls or put in hours on the family farm simply wrote down ‘housewife’. The new survey asked about secondary activities as well, and suddenly women mentioned the long hours of paid work they had been doing on the side. Uganda’s labour force increased by 700,000 people, most of them women. The problem was not that the women were ignored by the earlier survey, but that it asked questions that assumed an old-fashioned division of household labour in which the husband did full-time paid work and the wife worked unpaid in the home.
An even subtler gap in the data emerges from the fact that governments often measure the income not of individuals but of households. This is not an unreasonable decision: in a world where many families pool their resources to cover rent, food and sometimes everything else, the ‘household’ is a logical unit of analysis. I know several people, men and women, who spend much or most of their time doing unpaid work at home, looking after children, while their partners are earning large salaries. It would be strange to claim that, on the basis that the unpaid partner earns little or no income, they live in poverty.
And yet while many households pool their resources, we cannot simply assume that they all do: money can be used as a weapon within a household, and unequal earnings can empower abusive relationships. Collecting data on household income alone makes such abuses statistically invisible, irrelevant by definition. It is all too tempting to assume that what we do not measure simply does not exist.
As with the Asch experiments, it turns out that we don’t have to speculate that it might matter who within a household controls the purse strings. We have good evidence that it sometimes does. The economist Shelly Lundberg and her colleagues studied what happened in the UK when, in 1977, child benefit – a regular subsidy to families – was switched from being a tax credit (usually to the father) to a cash payment to the mother. That shift measurably increased spending on women’s and children’s clothes relative to men’s clothes.10
When I wrote about Lundberg’s research in the Financial Times, an outraged reader wrote to ask me how I knew that it was better to spend money on women’s and children’s clothes rather than men’s clothes. Uncharacteristically for readers of the FT, this person had missed the point: it is not that any spending pattern was better, but that the spending pattern was different. Household income did not change, but when that income was paid to a different person in the household, it was spent on different things. That tells us that measuring income only at the level of the household omits important information. The UK’s new benefit system, Universal Credit, is payable to a single ‘head of household’. That curiously old-school decision may well favour men – but, given the data we have, it’s going to be hard to tell.
It would be nice to imagine that high-quality statistics simply appear in a spreadsheet somewhere, divine providence from the numerical heavens. Yet any dataset begins with somebody deciding to collect the numbers. What numbers are and aren’t collected, what is and isn’t measured, and who is included or excluded, are the result of all-too-human assumptions, preconceptions and oversights.
The United Nations, for example, has embraced a series of ambitious ‘Sustainable Development Goals’ for 2030. But development experts are starting to call attention to a problem: we often don’t have the data we would need to figure out whether those goals have been met. Are we succeeding in reducing the amount of domestic violence suffered by women? If few countries have chosen to collect good enough data on the problem to allow for historical comparisons, it’s very hard to tell.11
Sometimes the choices about what data to gather are just bizarre. Will Moy, the director of the fact-checking organisation Full Fact, points out that in England, the authorities know more about golfers than they do about people who are assaulted, robbed or raped.12 That’s not because somebody in government sat down with a budget to commission surveys and decided that it was more important to understand golf than crime. Instead, surveys tend to be bundled with other projects. Amid the excitement of London being awarded the 2012 Olympic Games, the government launched the Active Lives Survey, which reaches 200,000 people, with enough geographical spread to allow us to understand which sports are most popular in each local area. That’s why we know so much about golfers.
That’s no bad thing – it’s great to have such a fine-grained picture of how people keep fit. But doesn’t it suggest there’s a case for beefing up the Crime Survey for England and Wales, which reaches just 35,000 households? That’s a large enough survey to understand the national trend in common crimes, but if it were as large as the Active Lives Survey, we might be able to understand trends for rare crimes, smaller demographic groups or particular towns. Other things being equal, a larger survey can give more precise estimates, especially when you’re trying to count something unusual.
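To see why, try a back-of-the-envelope calculation. The figures below are invented purely for illustration – imagine a crime experienced by half a per cent of households in a year – and the sketch assumes a simple random sample, which no real survey quite is:

```python
import math

def relative_uncertainty(n, prevalence, z=1.96):
    """Rough 95% margin of error for an estimated prevalence,
    expressed relative to the prevalence itself."""
    margin = z * math.sqrt(prevalence * (1 - prevalence) / n)
    return margin / prevalence

# Invented example: a crime experienced by 0.5% of households in a year.
for n in (35_000, 200_000):
    print(f"survey of {n:>7,} households: estimate good to about "
          f"±{relative_uncertainty(n, 0.005):.0%} of its true value")
```

The bigger survey doesn’t make the rare crime any more common; it just makes our estimate of how common it is far less wobbly.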
But bigger isn’t always better. It’s perfectly possible to reach vast numbers of people while still missing out enough other people to get a disastrously skewed impression of what’s really going on.
In 1936, Alfred Landon, the Republican Governor of Kansas, ran for President against the Democratic incumbent, Franklin Delano Roosevelt. A respected magazine, the Literary Digest, shouldered the responsibility of forecasting the result. It conducted an astonishingly ambitious postal opinion poll, which reached 10 million people, a quarter of the electorate. The deluge of mailed-in replies can hardly be imagined, but the Digest seemed to be relishing the scale of the task. In late August it reported, ‘Next week, the first answers from these ten million will begin the incoming tide of marked ballots, to be triple-checked, verified, five-times cross-classified and totalled.’13
After tabulating a remarkable 2.4 million returns as they flowed in over two months, the Literary Digest announced its conclusions: Landon would win by a convincing 55 per cent to 41 per cent, with a few voters favouring a third candidate.
The election delivered a very different result. Roosevelt crushed Landon by 61 per cent to 37 per cent. To add to the Literary Digest’s agony, a far smaller survey conducted by the opinion poll pioneer George Gallup came much closer to the final vote, forecasting a comfortable victory for Roosevelt.
Mr Gallup understood something that the Literary Digest did not: when it comes to data, size isn’t everything. Opinion polls such as Gallup’s are based on samples of the voting population. This means opinion pollsters need to deal with two issues: sampling error and sampling bias.
Sampling error reflects the risk that, purely by chance, a randomly chosen sample of opinions does not reflect the true views of the population. The ‘margin of error’ reported in opinion polls reflects this risk, and the larger the sample, the smaller the margin of error. A thousand interviews is a large enough sample for many purposes, and during the 1936 election campaign Mr Gallup is reported to have conducted three thousand interviews.
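For readers who like to see the formula at work, here is a minimal sketch – it assumes a simple random sample and an evenly split electorate, which is the worst case for precision:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a simple random sample of
    size n, when the true proportion is p (p = 0.5 is the worst case)."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_000, 3_000, 2_400_000):
    print(f"sample of {n:>9,}: margin of error roughly ±{margin_of_error(n):.2%}")
```

A thousand interviews gives a margin of error of about three percentage points, three thousand brings it below two, and a genuinely random sample of 2.4 million would shrink it to a vanishing six hundredths of a percentage point.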
But if three thousand interviews were good, why weren’t 2.4 million far better? The answer is that sampling error has a far more dangerous friend: sampling bias. Sampling error is when a randomly chosen sample doesn’t reflect the underlying population purely by chance; sampling bias is when the sample isn’t randomly chosen at all. George Gallup took pains to find an unbiased sample because he knew that was far more important than finding a big one.
The Literary Digest, in its quest for a bigger dataset, fumbled the question of a biased sample. It mailed out forms to people on a list it had compiled from automobile registrations and telephone directories – a sample that, at least in 1936, was disproportionately prosperous. Those who had telephones or cars were generally wealthier than those who did not. To compound the problem, Landon supporters turned out to be more likely to mail back their answers than those who backed Roosevelt. The combination of those two biases was enough to doom the Literary Digest’s poll. For each person George Gallup’s pollsters interviewed, the Literary Digest received eight hundred responses. All that gave the magazine for its pains was a very precise estimate of the wrong answer. By failing to pay enough attention both to the missing people (the ones who were never surveyed) and to the missing responses, the Literary Digest perpetrated one of the most famous polling disasters in statistical history.
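A rough simulation makes the point vividly. The numbers below are invented – I have simply given the electorate Roosevelt’s real 61 per cent and assumed his supporters were only half as likely as Landon’s to end up on the mailing list and post the form back:

```python
import random

random.seed(1936)

# An invented electorate of one million voters: 61% back Roosevelt (1),
# 39% back Landon (0) -- roughly the real 1936 result.
population = [1] * 610_000 + [0] * 390_000

# Gallup-style poll: small, but chosen at random.
gallup_sample = random.sample(population, 3_000)
print(f"Random sample of 3,000: "
      f"{sum(gallup_sample) / len(gallup_sample):.1%} for Roosevelt")

# Digest-style poll: enormous, but Roosevelt supporters are assumed to be
# only half as likely to be on the mailing list and to post the form back.
response_rate = {1: 0.3, 0: 0.6}
digest_sample = [v for v in population if random.random() < response_rate[v]]
print(f"Biased sample of {len(digest_sample):,}: "
      f"{sum(digest_sample) / len(digest_sample):.1%} for Roosevelt")
```

Run a sketch like this and the small random sample lands within a couple of points of the truth, while the vastly bigger biased sample reports, with great precision, that Landon is comfortably ahead.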
All pollsters know that their polls are vulnerable to the Literary Digest effect, and the serious ones try – as George Gallup tried – to reach a representative sample of the population. This has never been easy, and it seems to be getting harder: fewer people bother to answer the pollsters’ enquiries, raising the obvious question about whether the ones who do are really representative of everyone else. This is partly because people are less willing to pick up a landline telephone to speak to cold-callers, but that’s not the only explanation. For example, the first British Election Study, a face-to-face survey in which the survey team would knock on people’s doors, had a response rate of nearly 80 per cent back in 1963. The 2015 version, also face-to-face, had a response rate of just over 55 per cent; in almost half of the homes approached, either nobody opened the door, or somebody opened the door but refused to answer the surveyor’s questions.14
Pollsters try to correct for this, but there is no foolproof method of doing so. The missing responses are examples of what the statistician David Hand calls ‘dark data’: we know the people are out there and we know that they have opinions, but we can only guess at what those opinions are. We can ignore dark data, as Asch and Milgram ignored the question of how women would respond in their experiments, or we can try desperately to shine a light on what’s missing. But we can never entirely solve the problem.
In the UK General Election of 2015, opinion polls suggested that David Cameron, the incumbent Prime Minister, was unlikely to win enough votes to stay in power. The polls were wrong: Cameron’s Conservative party actually gained seats in the House of Commons and secured a narrow victory. It was unclear what had gone wrong, but many polling companies presumed that there had been a last-minute swing in favour of the Conservatives. If only they had conducted a few snap polls at the last possible moment, they might have detected that swing.
But that diagnosis of what had gone wrong was incorrect. Later research showed that the real problem was dark data. Shortly after the election, researchers chose a random sample of addresses and knocked on doors to ask people whether and how they had voted. At first they got the same answer as the pollsters had: not enough Conservative voters to return Mr Cameron to office. But the researchers then went back to the houses where nobody had answered, or where people had turned the surveyors away. On the second attempt, more Conservative voters were in evidence. The researchers came back to try to fill in the gaps again, and again, and again – sometimes as many as six times – and eventually got an answer from almost everyone they’d originally hoped to talk to. The conclusion of this painstaking retrospective poll finally matched the result of the election: a Conservative government.
If the problem had been a late swing, the solution would have been a bunch of quick-and-dirty, last-minute surveys. But because the real problem was that Conservative voters were harder to reach, the real solution may have to be a slower, more exhaustive method of conducting opinion polls.15
Both problems hit US pollsters in the notorious 2016 election, when the polls seemed to put Hillary Clinton ahead of Donald Trump in the swing states that would decide the contest. There was a late swing towards Trump, and also the same kind of non-response bias that had doomed the 2015 UK polls: it turned out to have been easier for pollsters to find Clinton supporters than Trump supporters. The polling error was not, objectively speaking, very large. It just loomed large in people’s imagination, perhaps because Trump was such an unusual candidate. But the fact remains that the polls were wrong in part because when the pollsters tried to find a representative group of voters to talk to, too many Trump supporters were missing.16
One ambitious solution to the problem of sample bias is to stop trying to sample a representative slice of the population, and instead speak to everybody. That is what the census attempts to do. However, even census-takers can’t assume they have counted everyone. In the 2010 US census, they received responses from just 74 per cent of households. That’s a lot of people missing out or opting out.
In the 2011 UK census, the response rate was 95 per cent, representing about 25 million households. That’s much better – indeed, it seems almost perfect at first glance. With 25 million households responding, random sampling error is not an issue; it will be tiny. But even with just 5 per cent of people missing, sample bias is still a concern. The census-takers know that certain kinds of people are less likely to respond when the official-looking census form lands with a thud on the doormat: people who live in multiple-occupancy houses such as a shared student house; men in their twenties; people who don’t speak good English. As a result, the 5 per cent who don’t respond may look very different from the 95 per cent who do. That fact alone is enough to skew the census data.17
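Here is a toy illustration of how that skew creeps in – the group sizes and response rates are invented, chosen only so that the overall response rate comes out at roughly the census’s 95 per cent:

```python
# Invented numbers: one hard-to-reach group and everyone else.
groups = {
    # group: (true share of the population, response rate)
    "men in their twenties": (0.07, 0.80),
    "everyone else":         (0.93, 0.96),
}

responses = {name: share * rate for name, (share, rate) in groups.items()}
overall_response_rate = sum(responses.values())
print(f"Overall response rate: {overall_response_rate:.0%}")

for name, (share, rate) in groups.items():
    observed_share = responses[name] / overall_response_rate
    print(f"{name}: {share:.1%} of the population, "
          f"but {observed_share:.1%} of the census returns")
```

A 95 per cent response rate sounds reassuring, yet in this sketch the raw returns make men in their twenties look like a noticeably smaller group than they really are – one reason census-takers go to such lengths to chase up and adjust for the people they miss.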
Census-taking is among the oldest ways of collecting statistics. Much newer, but with similar aspirations to reach everyone, is ‘big data’. Professor Viktor Mayer-Schönberger of the Oxford Internet Institute, co-author of the book Big Data, told me that his favoured definition of a big dataset is one where ‘N = All’ – where we no longer have to sample, because we have the entire background population.18
One source of big data is so mundane as to be easy to overlook. Think about the data you create when you watch a movie. In 1980 your only option would have been to go to a cinema, probably paying with cash. The only data created would have been box office receipts. In 1990 you could instead have gone to your local video rental store; they might have had a computer to track your rental, or it might have all been done with pen and paper. If it was done on a computer it would probably not have been connected to any broader database. But in the twenty-first century, when you sign up for an account with Netflix or Amazon, your data enter a vast and interconnected world – easily analysed, cross-referenced or shared with a data wholesaler, if the terms and conditions allow.
The same story is true when you apply for a library card, pay income tax, sign up for a mobile phone contract or apply for a passport. Once upon a time, such data would have existed as little slips of paper in a giant alphabetical catalogue. They weren’t designed for statistical analysis, as a census or a survey would have been. They were administrative building blocks – data gathered in order to get things done. Over time, as administrative data have been digitised and algorithms that can interrogate the data have been improved, it has become ever easier to use them as an input to statistical analysis, a complement to or even a substitute for survey data.
But ‘N = All’ is often more of a comforting assumption than a fact. As we’ve seen, administrative data will often include information about whoever in the household fills in the forms and pays the bills; the admin-shy will be harder to pin down. And it is all too easy to forget that ‘N = All’ is not the same as ‘N = Everyone who has signed up for a particular service’. Netflix, for example, has copious data about every single Netflix customer, but far less data about people who are not Netflix customers – and it would be perilous for Netflix to generalise from one group to the other.
Even more than administrative data, the lifeblood of big data is ‘found data’ – the kind of data we leave in our wake without even noticing, as we carry our smartphones around, search Google, pay online, tweet our thoughts, post photos to Facebook, or crank up the heating on our smart thermostat. It’s not just the name and credit card details that you gave to Netflix: it’s everything you ever watched on the streaming service, when you watched it – or stopped watching it – and much else besides.
When data like these are opportunistically scraped from cyberspace, they may be skewed in all sorts of awkward ways. If we want to put our finger on the pulse of public opinion, for example, we might run a sentiment analysis algorithm on Twitter rather than going to the expense of commissioning an opinion poll. Twitter can supply every message for analysis, although in practice most researchers use a subset of that vast firehose of data. But even if we analysed every Twitter message – N = All – we would still learn only what Twitter users think, not what the wider world thinks. And Twitter users are not particularly representative of the wider world. In the United States, for example, they are more likely than the population as a whole to be young, urban, college-educated and black. Women, meanwhile, are more likely than men to use Facebook and Instagram, but less likely to use LinkedIn. Hispanics are more likely than whites to use Facebook, while blacks are more likely than whites to use LinkedIn, Twitter and Instagram. None of these facts is obvious.19
Kate Crawford, a researcher at Microsoft, has assembled many examples of when N = All assumptions have led people astray. When Hurricane Sandy hit the New York area in 2012, researchers published an analysis of data from Twitter and a location-based search engine, Foursquare, showing that they could track a spike in grocery shopping the day before and a boom for bars and nightclubs the day after. That’s fine, as far as it goes – but those tweets about the hurricane were disproportionately from Manhattan, whereas areas such as Coney Island had been hit much harder. In fact, Coney Island had been hit so hard the electricity was out – that was why nobody there was tweeting – while densely populated and prosperous Manhattan was unusually saturated with smartphones, at least by 2012 standards, when they were less ubiquitous than today. To make this sort of big data analysis useful, it takes a considerable effort to disentangle the tweets from the reality.20
Another example: in 2012 Boston launched a smartphone app, StreetBump, which used an iPhone’s accelerometer to detect potholes. The idea was that citizens of Boston would download the app and, as they drove around the city, their phones would automatically notify City Hall when the road surface needed repair – city workers would no longer have to patrol the streets looking for potholes. It’s a pleasingly elegant idea, and it did successfully find some holes in the road. Yet what StreetBump really produced, left to its own devices, was a map of potholes that systematically favoured young, affluent areas where more people owned iPhones and had heard about the app. StreetBump offers us ‘N = All’ in the sense that every bump from every enabled phone can be recorded. That is not the same thing as recording every pothole. The project has since been shelved.
The algorithms that analyse big data are trained using found data that can be subtly biased. Algorithms trained largely on pale faces and male voices, for example, may be confused when they later try to interpret the speech of women or the appearance of darker complexions. This is believed to help explain why Google photo software confused photographs of people with dark skin with photographs of gorillas; Hewlett Packard webcams struggled to activate when pointing at people with dark skin tones; and Nikon cameras, programmed to retake photographs if they thought someone had blinked during the shot, kept retaking shots of people from China, Japan or Korea, mistaking the distinctively east Asian eyelid fold for a blink. New apps, launched in the spring of 2020, promise to listen to you cough and detect whether you have Covid-19 or some other illness. I wonder whether they will do better.21
One thing is certain. If algorithms are shown a skewed sample of the world, they will reach a skewed conclusion.22
*
There are some overtly racist and sexist people out there – look around – but in general what we count and what we fail to count is often the result of an unexamined choice, of subtle biases and hidden assumptions that we haven’t realised are leading us astray.
Unless we’re collecting data ourselves, there’s a limit to how much we can do to combat the problem of missing data. But we can and should remember to ask who or what might be missing from the data we’re being told about. Some missing numbers are obvious – for example, it’s clearly hard to collect good data about crimes such as sex-trafficking or the use of hard drugs. Other omissions show up only when we take a close look at the claim in question. Researchers may not be explicit that an experiment only studied men – such information is sometimes buried in a statistical appendix, and sometimes not reported at all. But often a quick investigation will reveal that the study has a blind spot. If an experiment studies only men, we can’t assume it would have pointed to the same conclusions if it had also included women. If a government statistic measures the income of a household, we must recognise that we’re learning little about the sharing of that income within a household.
Big found datasets can seem comprehensive, and may be enormously useful, but ‘N = All’ is often a seductive illusion: it’s easy to make unwarranted assumptions that we have everything that matters. We must always ask who and what is missing. And this is only one reason to approach big data with caution. Big data represents a huge and under-scrutinised change in the way statistics are being collected, and that is where our journey to make the world add up will take us next.