Data

BACK IN 2004, soon after college student Mark Zuckerberg created Facebook, he had an instant messenger exchange with a friend:

ZUCK: Yeah so if you ever need info about anyone at Harvard

ZUCK: Just ask.

ZUCK: I have over 4,000 emails, pictures, addresses …

[REDACTED FRIEND’S NAME]: What? How’d you manage that one?

ZUCK: People just submitted it.

ZUCK: I don’t know why.

ZUCK: They ‘trust me’

ZUCK: Dumb fucks1

In the wake of the 2018 Facebook scandal, these words were repeatedly reprinted by journalists wanting to hint at a Machiavellian attitude to privacy within the company. Personally, I think we can be a little more generous when interpreting the boastful comments of a 19-year-old. But I also think that Zuckerberg is wrong. People weren’t just giving him their details. They were submitting them as part of an exchange. In return, they were given access to an algorithm that would let them freely connect with friends and family, a space to share their lives with others. Their own private network in the vastness of the World Wide Web. I don’t know about you, but at the time I certainly thought that was a fair swap.

There’s just one issue with that logic: we’re not always aware of the longer-term implications of that trade. It’s rarely obvious what our data can do, or, when fed into a clever algorithm, just how valuable it can be. Nor, in turn, how cheaply we were bought.

Every little helps

Supermarkets were among the first to recognize the value of an individual’s data. In a sector where companies are continually fighting for the customer’s attention – for tiny margins of preference that will nudge people’s buying behaviour into loyalty to their brand – every slim improvement can add up to an enormous advantage. This was the motivation behind a ground-breaking trial run in 1993 by the British supermarket Tesco.

Under the guidance of husband-and-wife team Edwina Dunn and Clive Humby, and beginning in certain selected stores, Tesco released its brand-new Clubcard – a plastic card, the size and shape of a credit card, that customers could present at a checkout when paying for their shopping. The exchange was simple. For each transaction using a Clubcard, the customer would collect points that they could use against future purchases in store, while Tesco would take a record of the sale and associate it with the customer’s name.2

The data gathered in that first Clubcard trial was extremely limited. Along with the customer’s name and address, the scheme only recorded what they spent and when, not which items were in their basket. None the less, from this modest harvest of data Dunn and Humby reaped some phenomenally valuable insights.

They discovered that a small handful of loyal customers accounted for a massive amount of their sales. They saw, postcode by postcode, how far people were willing to travel to their stores. They uncovered the neighbourhoods where the competition was winning and neighbourhoods where Tesco had the upper hand. The data revealed which customers came back day after day, and which saved their shopping for weekends. Armed with that knowledge, they could get to work nudging their customers’ buying behaviour, by sending out a series of coupons to the Clubcard users in the post. High spenders were given vouchers ranging from £3 to £30. Low spenders were sent a smaller incentive of £1 to £10. And the results were staggering. Nearly 70 per cent of the coupons were redeemed, and while in the stores, customers filled up their baskets: people who had Clubcards spent 4 per cent more overall than those who didn’t.

On 22 November 1994, Clive Humby presented the findings from the trial to the Tesco board. He showed them the data, the response rates, the evidence of customer satisfaction, the sales boosts. The board listened in silence. At the end of the presentation, the chair was the first person to speak. ‘What scares me about this,’ he said, ‘is that you know more about my customers in three months than I know in 30 years.’3

Clubcard was rolled out to all customers of Tesco and is widely credited with putting the company ahead of its main rival Sainsbury’s, to become the biggest supermarket in the UK. As time wore on, the data collected became more detailed, making customers’ buying habits easier to target.

In the early days of online shopping, the team introduced a feature known as ‘My Favourites’, in which any items bought using the loyalty card would appear prominently when the customer logged on to the Tesco website. Like the Clubcard itself, the feature was a roaring success. People could quickly find the products they wanted without having to navigate through page after page. Sales went up, customers were happy.

But not all of them. Shortly after the launch of the feature, one woman contacted Tesco to complain that her data was wrong. She’d been shopping online and seen condoms among her list of ‘My Favourites’. They couldn’t be her husband’s, she explained, because he didn’t use them. At her request, the Tesco analysts looked into the data and discovered that her list was accurate. However, rather than be the cause of a marital rift, they took the diplomatic decision to apologize for ‘corrupted data’ and remove the offending items from her favourites.

According to Clive Humby’s book on Tesco, this has now become an informal policy within the company. Whenever something comes up that is just a bit too revealing, they apologize and delete the data. It’s a stance that’s echoed by Eric Schmidt, who, while serving as the executive chairman of Google, said he tries to think of things in terms of an imaginary creepy line. ‘The Google policy is to get right up to the creepy line but not cross it.’4

But collect enough data and it’s hard to know what you’ll uncover. Groceries aren’t just what you consume. They’re personal. Look carefully enough at someone’s shopping habits and they’ll often reveal all kinds of detail about who they are as a person. Sometimes – as in the case of the condoms – it’ll be things you’d rather not know. But more often than not, lurking deep within the data, those slivers of hidden insight can be used to a company’s advantage.

Target market

Back in 2002, the American discount superstore Target started looking for unusual patterns in its data.5 Target sells everything from milk and bananas to cuddly toys and garden furniture, and – like pretty much every other retailer since the turn of the millennium – has ways of using credit card numbers and survey responses to tie customers to everything they’ve ever bought in the store, enabling them to analyse what people are buying.

In a story that – as US readers won’t need telling – became infamous across the country, Target realized that a spike in a female customer’s purchases of unscented body lotion would often precede her signing up to the in-store baby-shower registry. It had found a signal in the data. As women entered their second trimester and started to worry about stretch marks, their buying of moisturizer to keep their skin supple left a hint of what was to come. Scroll further back in time, and these same women would be popping into Target to stock up on various vitamins and supplements, like calcium and zinc. Scroll forwards in time and the data would even suggest when the baby was due – marked by the woman buying extra-big bags of cotton wool from the store.6

Expectant mothers are a retailer’s dream. Lock in her loyalty while she’s pregnant and there’s a good chance she’ll continue to use your products long after the birth of her child. After all, shopping habits are quick to form when a hungry screaming baby is demanding your attention during your weekly shop. Insights like this could be hugely valuable in giving Target a head start over other brands in attracting her business.

From there it was simple. Target ran an algorithm that would score its female customers on the likelihood they were pregnant. If that probability tipped past a certain threshold, the retailer would automatically send out a series of coupons to the woman in question, full of things she might find useful: nappies, lotions, baby wipes and so on.
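The scoring step described above can be sketched in a few lines. Everything below is invented for illustration – the signal items, the weights and the threshold are my guesses at the general shape of such a model, not Target’s actual algorithm:

```python
# Hypothetical sketch of a pregnancy-propensity score. The signal items,
# weights and threshold are all invented; Target's real model was fitted
# to historical purchase data.

PREGNANCY_SIGNALS = {
    "unscented lotion": 0.30,
    "calcium supplement": 0.20,
    "zinc supplement": 0.15,
    "extra-big cotton wool": 0.35,
}

def pregnancy_score(purchases):
    """Crude 0-1 score: sum the weights of any signal items bought."""
    return min(1.0, sum(PREGNANCY_SIGNALS.get(item, 0.0)
                        for item in set(purchases)))

def should_send_coupons(purchases, threshold=0.5):
    """Coupons only go out once the score tips past the threshold."""
    return pregnancy_score(purchases) >= threshold

print(should_send_coupons(["unscented lotion", "calcium supplement", "bananas"]))
# prints True: 0.30 + 0.20 tips the score past the 0.5 threshold
```

The threshold is the important design choice: set it too low and the retailer embarrasses customers with irrelevant coupons; set it too high and it misses the window before a rival locks in the mother’s loyalty.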

So far, so uncontroversial. But then, around a year after the tool was first introduced, a father of a teenage girl stormed into a Target store in Minneapolis demanding to see the manager. His daughter had been sent some pregnancy coupons in the post and he was outraged that the retailer seemed to be normalizing teenage pregnancy. The manager of the store apologized profusely and called the man’s home a few days later to reiterate the company’s regret about the whole affair. But by then, according to a story in the New York Times, the father had an apology of his own to make.

‘I had a talk with my daughter,’ he told the manager. ‘It turns out there’s been some activities in my house I haven’t been completely aware of. She’s due in August.’

I don’t know about you, but for me, an algorithm that will inform a parent that their daughter is pregnant before they’ve had a chance to learn about it in person takes a big step across the creepy line. But this embarrassment wasn’t enough to persuade Target to scrap the tool altogether.

A Target executive explained: ‘We found out that as long as a pregnant woman thinks she hasn’t been spied on, she’ll use the coupons. She just assumes that everyone else on her block got the same mailer for diapers and cribs. As long as we don’t spook her, it works.’

So, Target still has a pregnancy predictor running behind the scenes – as most retailers do now. The only difference is that it will mix in the pregnancy-related coupons with other more generic items so that the customers don’t notice they’ve been targeted. An advertisement for a crib might appear opposite some wine glasses. Or a coupon for baby clothes will run alongside an ad for some cologne.

Target is not alone in using these methods. Stories of what can be inferred from your data rarely hit the press, but the algorithms are out there, quietly hiding behind the corporate front lines. About a year ago, I got chatting to a chief data officer of a company that sells insurance. They had access to the full detail of people’s shopping habits via a supermarket loyalty scheme. In their analysis, they’d discovered that home cooks were less likely to claim on their home insurance, and were therefore more profitable. It’s a finding that makes good intuitive sense. There probably isn’t much crossover between the group of people who are willing to invest time, effort and money in creating an elaborate dish from scratch and the group who would let their children play football in the house. But how did they know which shoppers were home cooks? Well, there were a few items in someone’s basket that were linked to low claim rates. The most significant, he told me, the one that gives you away as a responsible, houseproud person more than any other, was fresh fennel.

If that’s what you can infer from people’s shopping habits in the physical world, just imagine what you might be able to infer if you had access to more data. Imagine how much you could learn about someone if you had a record of everything they did online.

The Wild West

Palantir Technologies is one of the most successful Silicon Valley start-ups of all time. It was founded in 2003 by Peter Thiel (of PayPal fame), and at the last count was estimated to be worth a staggering $20 billion.7 That’s about the same market value as Twitter, although chances are you’ve never heard of it. And yet – trust me when I tell you – Palantir has most certainly heard of you.

Palantir is just one example of a new breed of companies known as data brokers, who buy and collect people’s personal information and then resell it or share it for profit. There are plenty of others: Acxiom, Corelogic, Datalogix, eBureau – a swathe of huge companies you’ve probably never directly interacted with, that are none the less continually monitoring and analysing your behaviour.8

Every time you shop online, every time you sign up for a newsletter, or register on a website, or enquire about a new car, or fill out a warranty card, or buy a new home, or register to vote – every time you hand over any data at all – your information is being collected and sold to a data broker. Remember when you told an estate agent what kind of property you were looking for? Sold to a data broker. Or those details you once typed into an insurance comparison website? Sold to a data broker. In some cases, even your entire browser history can be bundled up and sold on.9

It’s the broker’s job to combine all of that data, cross-referencing the different pieces of information they’ve bought and acquired, and then create a single detailed file on you: a data profile of your digital shadow. In the most literal sense, within some of these brokers’ databases, you could open up a digital file with your ID number on it (an ID you’ll never be told) that contains traces of everything you’ve ever done. Your name, your date of birth, your religious affiliation, your vacation habits, your credit-card usage, your net worth, your weight, your height, your political affiliation, your gambling habits, your disabilities, the medication you use, whether you’ve had an abortion, whether your parents are divorced, whether you’re easily addictable, whether you are a rape victim, your opinions on gun control, your projected sexual orientation, your real sexual orientation, and your gullibility. There are thousands and thousands of details within thousands and thousands of categories and files stored on hidden servers somewhere, for virtually every single one of us.10

Like Target’s pregnancy predictions, much of this data is inferred. A subscription to Wired magazine might imply that you’re interested in technology; a firearms licence might imply that you’re interested in hunting. All along the way, the brokers are using clever, but simple, algorithms to enrich their data. It’s exactly what the supermarkets were doing, but on a massive scale.

And there are plenty of benefits to be had. Data brokers use their understanding of who we are to prevent fraudsters from impersonating unsuspecting consumers. Likewise, knowing our likes and dislikes means that the adverts we’re served as we wander around the internet are as relevant to our interests and needs as possible. That almost certainly makes for a more pleasant experience than being hit with mass market adverts for injury lawyers or PPI claims day after day. Plus, because the messages can be directly targeted on the right consumers, it means advertising is cheaper overall, so small businesses with great products can reach new audiences, something that’s good for everyone.

But, as I’m sure you’re already thinking, there’s also an array of problems that arise once you start distilling who we are as people down into a series of categories. I’ll get on to that in a moment, but first I think it’s worth briefly explaining the invisible process behind how an online advert reaches you when you’re clicking around on the internet, and the role that a data broker plays in the process.

So, let’s imagine I own a luxury travel company, imaginatively called Fry’s. Over the years, I have been getting people to register their interest on my website and now have a list of their email addresses. If I wanted to find out more about my users – like what kind of holidays they were interested in – I could send off my list of users’ emails to a data broker, who would look up the names in their system, and return my list with the relevant data attached. Sort of like adding an extra column on to a spreadsheet. Now when you visit my Fry’s website, I can see that you have a particular penchant for tropical islands and so serve you up an advert for a Hawaii getaway.
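That ‘extra column on a spreadsheet’ step is, in programming terms, a simple join keyed on email address. Here is a toy sketch, with entirely made-up names and attributes:

```python
# Toy "extra column" enrichment. All emails and attributes are invented.

# Fry's own list: nothing but email addresses.
frys_users = ["ana@example.com", "ben@example.com"]

# The broker's far larger profile database, keyed by email.
broker_profiles = {
    "ana@example.com": {"holiday_preference": "tropical islands"},
    "ben@example.com": {"holiday_preference": "city breaks"},
    "cam@example.com": {"holiday_preference": "skiing"},
}

# The join: attach the broker's data to every user Fry's already knows.
enriched = {email: broker_profiles.get(email, {}) for email in frys_users}

print(enriched["ana@example.com"]["holiday_preference"])
# prints: tropical islands
```

Serving the Hawaii advert is then just a lookup on the returned `holiday_preference` column.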

That’s option one. In option two, let’s imagine that Fry’s has a little extra space on its website that we’re willing to sell to other advertisers. Again, I contact a data broker and give them the information I have on my users. The broker looks for other companies who want to place adverts. And, for the sake of the story, let’s imagine that a company selling sun cream is keen. To persuade them that Fry’s has the audience the sun-cream seller would want to target, the broker could show them some inferred characteristics of Fry’s users: perhaps the percentage of people with red hair, that kind of thing. Or the sun-cream seller could hand over a list of its own users’ email addresses and the broker could work out exactly how much crossover there was between the audiences. If the sun-cream seller agrees, the advert appears on Fry’s website – and the broker and I both get paid.
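Working out the crossover between two audiences is essentially a set intersection. In the sketch below I’ve assumed the lists are compared as hashed emails, so neither side hands over raw addresses – a common industry practice, though the exact mechanics vary from broker to broker:

```python
import hashlib

def audience_overlap(list_a, list_b):
    """Fraction of list_a that also appears in list_b. Emails are
    compared as SHA-256 hashes rather than in the clear."""
    digest = lambda e: hashlib.sha256(e.lower().encode()).hexdigest()
    hashed_b = {digest(e) for e in list_b}
    return sum(1 for e in list_a if digest(e) in hashed_b) / len(list_a)

# Invented example audiences.
frys = ["ana@example.com", "ben@example.com", "dee@example.com"]
suncream = ["ana@example.com", "dee@example.com", "eli@example.com"]

print(round(audience_overlap(frys, suncream), 2))  # prints 0.67
```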

So far, these methods don’t go much beyond the techniques that marketers have always used to target customers. But it’s option three where, for me, things start to get a little bit creepy. This time, Fry’s is looking for some new customers. I want to target men and women over 65 who like tropical islands and have large disposable incomes, in the hope that they’ll want to go on one of our luxurious new Caribbean cruises. Off I go to a data broker who will look through their database and find me a list of people who match my description.

So, let’s imagine you are on that list. The broker will never share your name with Fry’s. But they will work out which other websites you regularly use. Chances are, the broker will also have a relationship with one of your favourites. Maybe a social media site, or a news website, something along those lines. As soon as you unsuspectingly log into your favourite website, the broker will get a ping to alert them to the fact that you’re there. Virtually instantaneously, the broker will respond by placing a little tiny flag – known as a cookie – on your computer. This cookiefn1 acts like a signal to all kinds of other websites around the internet, saying that you are someone who should be served up an advert for Fry’s Caribbean cruises. Whether you want them or not, wherever you go on the internet, those adverts will follow you.
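The flag-and-read logic of that cookie can be mimicked in a few lines. This is a deliberately simplified model – real retargeting runs through ad exchanges and real-time bidding, and the names below are invented – but the essential mechanism is this:

```python
# A toy model of the retargeting flag. One dict stands in for the
# browser's cookie jar; segment names and values are invented.

cookie_jar = {}

def broker_drops_cookie(segment):
    """Fired when the broker spots you logging in to a partner site."""
    cookie_jar["frys_segment"] = segment

def any_site_picks_advert():
    """Every site in the ad network reads the same flag."""
    if cookie_jar.get("frys_segment") == "caribbean_cruise_prospect":
        return "Fry's Caribbean cruises"
    return "generic advert"

broker_drops_cookie("caribbean_cruise_prospect")
print(any_site_picks_advert())  # prints: Fry's Caribbean cruises
```

Once the flag is set, every participating site returns the same targeted advert, which is why the cruises follow you wherever you go.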

And here we stumble on the first problem. What if you don’t want to see the advert? Sure, being bombarded with images of Caribbean cruises might be little more than a minor inconvenience, but there are other adverts which can have a much more profound impact on a person.

When Heidi Waterhouse lost a much-wanted pregnancy,11 she unsubscribed from all the weekly emails updating her on her baby’s growth, telling her which fruit the foetus now matched in size. She unsubscribed from all the mailing lists and wish lists she had signed up to in eager anticipation of the birth. But, as she told an audience of developers at a conference in 2018, there was no power on earth that could unsubscribe her from the pregnancy adverts that followed her around the internet. This digital shadow of a pregnancy continued to circulate alone, without the mother or the baby. ‘Nobody who built that system thought of that consequence,’ she explained.

It’s a system which, thanks to either thoughtless omission or deliberate design, has the potential to be exploitative. Payday lenders can use it to directly target people with bad credit ratings; betting adverts can be directed to people who frequent gambling websites. And there are concerns about this kind of data profiling being used against people, too: motorbike enthusiasts being deemed to have a risky hobby, or people who eat sugar-free sweets being flagged as diabetic and turned down for insurance as a result. A study from 2015 demonstrated that Google was serving far fewer ads for high-paying executive jobs to women who were surfing the web than to men.12 And, after one African American Harvard professor learned that Googling her own name returned adverts targeted on people with a criminal record (and as a result was forced to prove to a potential employer that she’d never been in trouble with the police), she began researching the adverts delivered to different ethnic groups. She discovered that searches for ‘black-sounding names’ were disproportionately more likely to be linked to adverts containing the word ‘arrest’ (e.g. ‘Have you been arrested?’) than those with ‘white-sounding names’.13

These methods aren’t confined to data brokers. There’s very little difference between how they work and how Google, Facebook, Instagram and Twitter operate. These internet giants don’t make money from their users directly; their business models are built on micro-targeting. They are gigantic engines for delivering adverts, making money by keeping their millions of users actively engaged on their websites, clicking around, reading sponsored posts, watching sponsored videos, looking at sponsored photos. Whichever corner of the internet you use, hiding in the background, these algorithms are trading on information you didn’t know they had and never willingly offered. They have made your most personal, private secrets into a commodity.

Unfortunately, in many countries, the law doesn’t do much to protect you. Data brokers are largely unregulated and – particularly in America – opportunities to curb their power have repeatedly been passed over by government. In March 2017, for instance, the US Senate voted to eliminate rules that would have prevented data brokers from selling your internet browser history without your explicit consent. Those rules had previously been approved in October 2016 by the Federal Communications Commission; but, after the change in government at the end of that year, they were opposed by the FCC’s new Republican majority and Republicans in Congress.14

So what does all this mean for your privacy? Well, let me tell you about an investigation led by German journalist Svea Eckert and data scientist Andreas Dewes that should give you a clear idea.15

Eckert and her team set up a fake data broker and used it to buy the anonymous browsing data of 3 million German citizens. (Getting hold of people’s internet histories was easy. Plenty of companies had an abundance of that kind of data for sale on British or US customers – the only challenge was finding data focused on Germany.) The data itself had been gathered by a Google Chrome plugin that users had willingly downloaded, completely unaware that it was spying on them in the process.fn2

In total, it amounted to a gigantic list of URLs. A record of everything those people had looked at online over the course of a month. Every search, every page, every click. All legally put up for sale.

For Eckert and her colleagues, the only problem was that the browser data was anonymous. Good news for all the people whose histories had been sold. Right? Should save their blushes. Wrong. As the team explained in a presentation at DEFCON in 2017, de-anonymizing huge databases of browser history was spectacularly easy.

Here’s how it worked. Sometimes there were direct clues to the person’s identity in the URLs themselves. Like anyone who visited Xing.com, the German equivalent of LinkedIn. If you click on your profile picture on the Xing website, you are sent through to a page with an address that will be something like the following:

www.xing.com/profile/Hannah_Fry?sc_omxb_p

Instantly, the name gives you away, while the text after the username signifies that the user is logged in and viewing their own profile, so the team could be certain that the individual was looking at their own page. It was a similar story with Twitter. Anyone checking their own Twitter analytics page was revealing themselves to the team in the process. For those without an instant identifier in their data, the team had another trick up their sleeve. Anyone who posted a link online – perhaps by tweeting about a website, or sharing their public playlist on YouTube – essentially, anyone who left a public trace of their data shadow attached to their real name, was inadvertently unmasking themselves in the process. The team used a simple algorithm to cross-reference the public and anonymized personas,16 filtering their list of URLs to find someone in the dataset who had visited the same websites at the same times and dates that the links were posted online. Eventually, they had the full names of virtually everyone in the dataset, and full access to a month’s worth of complete browsing history for millions of Germans as a result.

Among those 3 million people were several high-profile individuals. They included a politician who had been searching for medication online. A police officer who had copied and pasted a sensitive case document into Google Translate, all the details of which then appeared in the URL and were visible to the researchers. And a judge, whose browsing history showed a daily visit to one rather specific area of the internet. Here is a small selection of the websites he visited during one eight-minute period in August 2016:

18.22: http://www.tubegalore.com/video/amature-pov-ex-wife-in-leather-pants-gets-creampie42945.html

18.23: http://www.xxkingtube.com/video/pov_wifey_on_sex_stool_with_beaded_thong_gets_creampie_4814.html

18.24: http://de.xhamster.com/movies/924590/office_lady_in_pants_rubbing_riding_best_of_anlife.html

18.27: http://www.tubegalore.com/young_tube/5762-1/page0

18.30: http://www.keezmovies.com/video/sexy-dominatrix-milks-him-dry-1007114?utm_sources

In among these daily browsing sessions, the judge was also regularly searching for baby names, strollers and maternity hospitals online. The team concluded that his partner was expecting a baby at the time.

Now, let’s be clear here: this judge wasn’t doing anything illegal. Many – myself included – would argue that he wasn’t doing anything wrong at all. But this material would none the less be useful in the hands of someone who wanted to blackmail him or embarrass his family.

And that is where we start to stray very far over the creepy line. When private, sensitive information about you, gathered without your knowledge, is then used to manipulate you. Which, of course, is precisely what happened with the British political consulting firm Cambridge Analytica.

Cambridge Analytica

You probably know most of the story by now.

Since the 1980s, psychologists have been using a system of five characteristics to quantify an individual’s personality. You get a score on each of the following traits: openness to experience, conscientiousness, extraversion, agreeableness and neuroticism. Collectively, they offer a standard and useful way to describe what kind of a person you are.

Back in 2012, a year before Cambridge Analytica came on the scene, a group of scientists from the University of Cambridge and Stanford University began looking for a link between the five personality traits and the pages people ‘liked’ on Facebook.17 They built a Facebook quiz with this purpose in mind, allowing users to take real psychometric tests, while hoping to find a connection between a person’s true character and their online persona. People who downloaded their quiz knowingly handed over data on both: the history of their Likes on Facebook and, through a series of questions, their true personality scores.

It’s easy to imagine how Likes and personality might be related. As the team pointed out in the paper they published the following year,18 people who like Salvador Dalí, meditation or TED talks are almost certainly going to score highly on openness to experience. Meanwhile, people who like partying, dancing and Snooki from the TV series Jersey Shore tend to be a bit more extraverted. The research was a success. With a connection established, the team built an algorithm that could infer someone’s personality from their Facebook Likes alone.
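In spirit, the resulting predictor is little more than a weighted sum over Likes. The pages and weights below are invented for illustration; the real model learned its weights statistically from tens of thousands of quiz-takers’ actual scores.

```python
# Invented weights for a single trait: openness to experience.
# Positive weights push the estimate up, negative ones pull it down.

OPENNESS_WEIGHTS = {
    "Salvador Dali": 0.8,
    "TED Talks": 0.6,
    "Meditation": 0.5,
    "Jersey Shore": -0.4,
}

def openness_estimate(likes, baseline=0.5):
    """Start at a population baseline and average in each known Like."""
    known = [OPENNESS_WEIGHTS[l] for l in likes if l in OPENNESS_WEIGHTS]
    if not known:
        return baseline  # no signal: return the baseline unchanged
    score = baseline + sum(known) / (2 * len(known))
    return max(0.0, min(1.0, score))  # clamp to the 0-1 range

print(round(openness_estimate(["Salvador Dali", "TED Talks"]), 2))
# prints 0.85
```

Run the same trick once per trait and you have a full five-dimensional personality estimate from nothing but a list of Likes.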

By the time their second study appeared in 2014,19 the research team were claiming that if you could collect 300 Likes from someone’s Facebook profile, the algorithm would be able to judge their character more accurately than their spouse could.

Fast-forward to today, and the academic research group – the Psychometrics Centre at Cambridge University – have extended their algorithm to make personality predictions from your Twitter feed too. They have a website, open to anyone, where you can try it for yourself. Since my Twitter profile is open to the public anyway, I thought I’d try out the researchers’ predictions myself, so I uploaded my Twitter history and filled out a traditional questionnaire-based personality study to compare. The algorithm managed to assess me accurately on three of the five traits. Although, as it turns out, according to the traditional personality study I am much more extraverted and neurotic than my Twitter profile makes it seem.fn3

All this work was motivated by how it could be used in advertising. So, by 2017,20 the same team of academics had moved on to experimenting with sending out adverts tailored to an individual’s personality traits. Using the Facebook platform, the team served up adverts for a beauty product to extraverts using the slogan ‘Dance like no one’s watching (but they totally are)’, while introverts saw an image of a girl smiling and standing in front of the mirror with the phrase ‘Beauty doesn’t have to shout.’

In a parallel experiment, targets high in openness-to-experience were shown adverts for crossword puzzles using an image with the text: ‘Aristoteles? The Seychelles? Unleash your creativity and challenge your imagination with an unlimited number of crossword puzzles!’ The same puzzles were advertised to people low in openness, but using instead the wording: ‘Settle in with an all-time favorite! The crossword puzzle that has challenged players for generations.’ Overall, the team claimed that matching adverts to a person’s character led to 40 per cent more clicks and up to 50 per cent more purchases than using generic, unpersonalized ads. For an advertiser, that’s pretty impressive.

All the while, as the academics were publishing their work, others were implementing their methods. Among them, so it is alleged, was Cambridge Analytica during their work for Trump’s election campaign.

Now, let’s backtrack slightly. There is little doubt that Cambridge Analytica were using the same techniques as my imaginary luxury travel agency Fry’s. Their approach was to identify small groups of people who they believed to be persuadable and target them directly, rather than send out blanket advertising. As an example, they discovered that there was a large degree of overlap between people who bought good, American-made Ford motor cars and people who were registered as Republican party supporters. They then set about finding people who had a preference for Ford, but weren’t known Republican voters, to see if they could sway their opinions using all-American adverts that tapped into that patriotic emotion. In some sense, this is no different from a candidate identifying a particular neighbourhood of swing voters and going door-to-door to persuade them one by one. And online, it’s no different from what Obama and Clinton were doing during their campaigns. Every major political party in the Western world uses extensive analysis and micro-targeting of voters.

But, if the undercover footage recorded by Channel Four News is to be believed, Cambridge Analytica were also using personality profiles of the electorate to deliver emotionally charged political messages – for example, finding single mothers who score highly on neuroticism and preying on their fear of being attacked in their own home to persuade them into supporting a pro-gun-lobby message. Commercial advertisers have certainly used these techniques extensively, and other political campaigns probably have, too.

But on top of all that, Cambridge Analytica are accused of creating adverts and dressing them up as journalism. According to one whistleblower’s testimony to the Guardian, one of the most effective ads during the campaign was an interactive graphic titled ‘10 inconvenient truths about the Clinton Foundation’.21 Another whistleblower went further and claimed that the ‘articles’ planted by Cambridge Analytica were often based on demonstrable falsehoods.22

Let’s assume, for the sake of argument, that all the above is true: Cambridge Analytica served up manipulative fake news stories on Facebook to people based on their psychological profiles. The question is, did it work?

Micro-manipulation

There is an asymmetry in how we view the power of targeted political adverts. We like to think of ourselves as independently minded and immune to manipulation, and yet imagine others – particularly those of a different political persuasion – as being fantastically gullible. The reality is probably something in between.

We do know that the posts we see on Facebook have the power to alter our emotions. A controversial experiment run by Facebook employees in 2013 manipulated the news feeds of 689,003 users without their knowledge (or consent) in an attempt to control their emotions and influence their moods.23 For one group of users, the experimenters suppressed any friends’ posts that contained positive words; for another, they suppressed those containing negative words; then they watched to see how the unsuspecting subjects reacted in each case. Users who saw less negative content in their feeds went on to post more positive stuff themselves. Meanwhile, those who had positive posts hidden from their timeline went on to use more negative words themselves. Conclusion: we may think we’re immune to emotional manipulation, but we’re probably not.

We also know from the Epstein experiment described in the ‘Power’ chapter that just the ordering of pages on a search engine can be enough to tip undecided voters into favouring one candidate over another. We know, too, from the work done by the very academics whose algorithms Cambridge Analytica repurposed, that adverts are more effective if they target personality traits.

Put together, all this does build a strong argument to suggest that these methods can have an impact on how people vote, just as they do on how people spend their money. But – and it’s quite a big but – there’s something else you need to know before you make your mind up.

All of the above is true, but the actual effects are tiny. In the Facebook experiment, users were indeed more likely to post positive messages if they were shielded from negative news. But the difference amounted to less than one-tenth of one percentage point.

Likewise, in the targeted adverts example, the makeup sold to introverts was more successful if it took into account the person’s character, but the difference it made was minuscule. A generic advert got 31 people in 1,000 to click on it. The targeted ad managed 35 in 1,000. Even that figure of a 50 per cent improvement, boldly emblazoned across the top of the academic paper, actually refers to an increase from 11 clicks in 1,000 to 16.
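For what it’s worth, those uplifts are easy to verify for yourself. Here is a quick calculation using only the click counts quoted above (the arithmetic is mine, for illustration – it isn’t taken from the paper):

```python
# Click-through counts from the targeted-advertising study quoted above,
# each out of 1,000 impressions.
generic_clicks, targeted_clicks = 31, 35      # the main example
headline_generic, headline_targeted = 11, 16  # the '50 per cent' figure

# Relative improvement of the targeted ad over the generic one.
uplift = (targeted_clicks - generic_clicks) / generic_clicks
headline_uplift = (headline_targeted - headline_generic) / headline_generic

print(f"31 -> 35 clicks per 1,000: {uplift:.1%} improvement")
print(f"11 -> 16 clicks per 1,000: {headline_uplift:.1%} improvement")
```

Run it and the ‘50 per cent’ headline turns out to be a round-up of roughly 45 per cent – a reminder that even the paper’s boldest claim rests on five extra clicks per thousand.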

The methods can work, yes. But the advertisers aren’t injecting their messages straight into the minds of a passive audience. We’re not sitting ducks. We’re much better at ignoring advertising, or at putting our own spin on propaganda, than the people sending those messages would like us to be. In the end, even with the best, most deviously micro-profiled campaigns, only a small amount of influence will leak through to the target.

And yet, potentially, in an election those tiny slivers of influence might be all you need to swing the balance. In a population of tens or hundreds of millions, those one-in-a-thousand switches can quickly add up. And when you remember that, as Jamie Bartlett pointed out in a piece for the Spectator, Trump won Pennsylvania by 44,000 votes out of six million cast, Wisconsin by 22,000, and Michigan by 11,000, perhaps margins of less than 1 per cent might be all you need.24
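The scale of that arithmetic is worth seeing on paper. Here is a back-of-envelope sketch using the Pennsylvania figures above; the flip rates are hypothetical, chosen only to show how quickly small fractions add up:

```python
# Back-of-envelope: how far does a tiny persuasion rate go in a state
# the size of Pennsylvania? Turnout and margin are the figures quoted
# in the text; the flip rates themselves are hypothetical.
votes_cast = 6_000_000   # Pennsylvania, 2016
actual_margin = 44_000

for per_thousand in (1, 2, 4):
    flipped = per_thousand / 1000 * votes_cast
    # Each flipped voter moves the margin by two:
    # one candidate loses a vote, the other gains it.
    margin_swing = 2 * flipped
    print(f"{per_thousand}-in-1,000 flipped: margin moves by "
          f"{margin_swing:,.0f} votes (actual margin: {actual_margin:,})")
```

On these hypothetical numbers, flipping four voters in every thousand – about the uplift in the makeup-advert example – would move the margin by 48,000 votes, more than the 44,000 that decided the state.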

The fact is, it’s impossible to tell just how much of an effect all this had in the US presidential election. Even if we had access to all of the facts, we can’t look back through time and untangle the sticky web of cause and effect to pinpoint a single reason for anyone’s voting decisions. What has gone has gone. What matters now is where we go in the future.

Rate me

It’s important to remember that we’ve all benefited from this model of the internet. All around the world, people have free and easy access to instant global communication networks, the wealth of human knowledge at their fingertips, up-to-the-minute information from across the earth, and unlimited usage of the most remarkable software and technology, built by private companies, paid for by adverts. That was the deal that we made. Free technology in return for your data and the ability to use it to influence and profit from you. The best and worst of capitalism in one simple swap.

We might decide we’re happy with that deal. And that’s perfectly fine. But if we do, it’s important to be aware of the dangers of collecting this data in the first place. We need to consider where these datasets could lead – even beyond the issues of privacy and the potential to undermine democracy (as if they weren’t bad enough). There is another twist in this dystopian tale. An application for these rich, interconnected datasets that belongs in the popular Netflix show Black Mirror, but exists in reality. It’s known as Sesame Credit, a citizen scoring system used by the Chinese government.

Imagine every piece of information that a data broker might have on you collapsed down into a single score. Everything goes into it. Your credit history, your mobile phone number, your address – the usual stuff. But all your day-to-day behaviour, too. Your social media posts, the data from your ride-hailing app, even records from your online matchmaking service. The result is a single number between 350 and 950 points.

Sesame Credit doesn’t disclose the details of its ‘complex’ scoring algorithm. But Li Yingyun, the company’s technology director, did share some examples of what might be inferred from its results in an interview with the Beijing-based Caixin Media. ‘Someone who plays video games for ten hours a day, for example, would be considered an idle person. Someone who frequently buys diapers would be considered as probably a parent, who on balance is more likely to have a sense of responsibility.’25

If you’re Chinese, these scores matter. If your rating is over 600 points, you can take out a special credit card. Above 666 and you’ll be rewarded with a higher credit limit. Those with scores above 650 can hire a car without a deposit and use a VIP lane at Beijing airport. Anyone over 750 can apply for a fast-tracked visa to Europe.26

It’s all fun and games now while the scheme is voluntary. But when the citizen scoring system becomes mandatory in 2020, people with low scores stand to feel the repercussions in every aspect of their lives. The government’s own document on the system outlines examples of punishments that could be meted out to anyone deemed disobedient: ‘Restrictions on leaving the borders, restrictions on the purchase of … property, travelling on aircraft, on tourism and holidays or staying in star-ranked hotels.’ It also warns that in the case of ‘gravely trust breaking subjects’ it will ‘guide commercial banks … to limit their provision of loans, sales insurance and other such services’.27 Loyalty is praised. Breaking trust is punished. As Rogier Creemers, an academic specializing in Chinese law and governance at the Van Vollenhoven Institute at Leiden University, puts it: ‘The best way to understand it is as a sort of bastard love child of a loyalty scheme.’28

I don’t have much comfort to offer in the case of Sesame Credit, but I don’t want to fill you completely with doom and gloom, either. There are glimmers of hope elsewhere. However grim the journey ahead appears, there are signs that the tide is slowly turning. Many in the data science community have known about and objected to the exploitation of people’s information for profit for quite some time. But until the furore over Cambridge Analytica these issues hadn’t drawn sustained, international front-page attention. When that scandal broke in early 2018, the general public saw for the first time how algorithms are silently harvesting their data, and recognized that, without oversight or regulation, this harvesting could have dramatic repercussions.

And regulation is coming. If you live in the EU, a new piece of legislation – the General Data Protection Regulation, or GDPR – has recently come into force that should make much of what data brokers are doing illegal. In theory, they will no longer be allowed to store your data without an explicit purpose. They won’t be able to infer information about you without your consent. And they won’t be able to get your permission to collect your data for one reason, and then secretly use it for another. That doesn’t necessarily mean the end of these kinds of practices, however. For one thing, we often don’t pay attention to the T&Cs when we’re clicking around online, so we may find ourselves consenting without realizing. For another, the identification of illegal practices and enforcement of regulations remains tricky in a world where most data analysis and transfer happens in the shadows. We’ll have to wait and see how this unfolds.

Europeans are the lucky ones, but there are those pushing for regulation in America, too. The Federal Trade Commission published a report condemning the murky practices of data brokers back in 2014, and since then has been actively pushing for more consumer rights. Apple has now built ‘intelligent tracking prevention’ into the Safari browser. Firefox has done the same. Facebook is severing ties with its data brokers. Argentina and Brazil, South Korea and many more countries have all pushed through GDPR-like legislation. Europe might be ahead of the curve, but there is a global trend that is heading in the right direction.

If data is the new gold, then we’ve been living in the Wild West. But I’m optimistic that – for many of us – the worst will soon be behind us.

Still, we do well to remember that there’s no such thing as a free lunch. As the law catches up and the battle between corporate profits and social good plays out, we need to be careful not to be lulled into a false sense of privacy. Whenever we use an algorithm – especially a free one – we need to ask ourselves about the hidden incentives. Why is this app giving me all this stuff for free? What is this algorithm really doing? Is this a trade I’m comfortable with? Would I be better off without it?

That is a lesson that applies well beyond the virtual realm, because the reach of these kinds of calculations now extends into virtually every aspect of society. Data and algorithms don’t just have the power to predict our shopping habits. They also have the power to rob someone of their freedom.