As far as Alex and his manager knew, a man in Milwaukee owned the Middleton Theater. Given that renovations and even casual upkeep were nonexistent, he apparently used it as a tax write-off. Although they never saw the owner, the regional supervisor, Jerry, often stopped by unannounced. During these surprise visits, the manager would drop whatever he was doing, pull Alex off his stool, and hand him a broom, yelling “Jerry’s here!!” The manager, a former army private, used to lecture Alex on being prepared, but it was obvious that the surprise visits left him with a tangle of nerves. Being prepared was not high on the list around the Middleton Theater.
Speaking of being unprepared, 2016 was a rough year for prediction. You might have noticed. Consider that on June 24, 2016, when 52 percent of British voters chose to leave the European Union, few, if any, in the British government were ready for it. Rather, they were stunned because almost every poll had shown the United Kingdom remaining safely in the European Union. One pro-“Leave” member of Parliament reportedly said, “The Leave campaign doesn’t have a post-Brexit plan. Number 10 should have had a plan.” The member was correct: 10 Downing Street did not have a plan, aside from the prime minister immediately tendering his resignation. Not much of a “plan,” but when every preelection poll indicated that “Leave” would fail, why do much planning?
Meanwhile, in the United States, the major media were busy leveraging every poll into highly precise yet ultimately inaccurate predictions for the November presidential election. Most prognostications had Donald Trump with less than a 15 percent chance of winning, and several, including the HuffPost Pollster, had him with less than a 5 percent chance. FiveThirtyEight’s Nate Silver, who in 2012 had correctly called all fifty states, did much better than most, but even on the eve of the 2016 election he had Hillary Clinton with a commanding sixty-five to thirty-five lead. “If you want to put your faith in the numbers,” proclaimed the (pro-Clinton) Huffington Post on November 5, three days before the election, “you can relax. She’s got this.” Uh, got what? After the election, everyone asked how the “science” could be so wrong. How could major polls, including Silver’s, underestimate Trump’s performance by a percentage point or more in thirty-plus states?
Amid all the data-driven probabilities, filmmaker Michael Moore published an essay in July 2016 predicting that “Donald J. Trump is going to win in November.” His essay laid out his reasoning: a neglected working-class population in the former industrial parts of the country, resentment of elite politicians, and other factors similar to those that had led to Brexit. Moore was basing his prediction on his qualitative theory of causation, an anachronism for many big-data analysts. “Causation is for other people,” the former chief analytics officer for New York City, Mike Flowers, famously said in 2011. “We have real problems to solve. I can’t dick around, frankly, thinking about other things like causation right now.” Guess we know where he stands.
How did we get to this point? How will it affect cultural transmission if people base their decisions on aggregate data rather than on personal experience or a theory of causation? Let’s go back to 2009, when researchers at Google were already using Google Trends to help predict influenza outbreaks, travel patterns, and housing prices. In the case of automobile sales, they had a causal model, and it was straightforward: a 1 percent increase in Google searches for “Ford” predicted a 0.5 percent increase in sales of Fords. Later, researchers at Yahoo! showed that searches about certain NASDAQ-100 stocks preceded correlated changes in their trading volumes, typically by a day and never by more than three days.
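That search-to-sales relationship is just an elasticity, and you can estimate one in a few lines of code. The sketch below fits a log-log regression on invented monthly data (the numbers and variable names are ours, not Google’s); the slope is the elasticity, and a slope of 0.5 means a 1 percent rise in searches goes with a 0.5 percent rise in sales.

```python
import numpy as np

# Hypothetical illustration (invented data, not Google's): estimate the
# search-to-sales elasticity with a log-log regression on monthly figures.
rng = np.random.default_rng(0)
searches = rng.uniform(80, 120, size=48)                          # index of "Ford" search volume
sales = 1000 * searches ** 0.5 * rng.lognormal(0, 0.02, size=48)  # built with an elasticity of 0.5

# In log space the model is log(sales) = a + b * log(searches), where b is the elasticity.
b, a = np.polyfit(np.log(searches), np.log(sales), deg=1)
print(f"estimated elasticity: {b:.2f}")  # ~0.5: a 1 percent rise in searches,
                                         # about a 0.5 percent rise in sales
```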
Investors have always talked about the “mood” of the market, so by 2010, when there were already tens of millions of tweets per day, it was not too surprising to see the paper “Twitter Mood Predicts the Stock Market” appear. Johan Bollen and his colleagues at Indiana University looked at Twitter content from six word bags—“calm,” “alert,” “sure,” “vital,” “kind,” and “happy”—and found a stretch of time, between December 1 and December 19, 2008, when “calm” words on Twitter had an accuracy of around 87 percent in predicting the daily ups and downs of the Dow Jones Industrial Average (DJIA). Bollen’s group used a statistical method called Granger causality analysis, which tells you whether changes in one time series—say, Twitter mood or the DJIA—consistently precede a corresponding change in the other time series. If so, we say the first series “Granger-causes” the other, meaning that although A precedes B, it doesn’t necessarily cause B. It might—and in many instances, probably does—but that’s not the same as demonstrating cause and effect.
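For readers who want to see what a Granger test looks like in practice, here is a minimal sketch using the statsmodels library and made-up series; it is not Bollen’s pipeline, and the variable names are ours. The test asks whether adding lagged values of a “calm” score improves a forecast of DJIA returns beyond what the returns’ own history already provides.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Synthetic illustration (not Bollen's data): does a daily "calm" score help
# predict next-day DJIA returns beyond what the returns' own past provides?
rng = np.random.default_rng(1)
n = 250
calm = rng.normal(size=n)
returns = 0.4 * np.roll(calm, 1) + rng.normal(scale=0.5, size=n)  # returns lag "calm" by one day
returns[0] = rng.normal(scale=0.5)   # first day has no prior "calm" value to depend on

# statsmodels tests whether the SECOND column Granger-causes the FIRST.
data = pd.DataFrame({"djia_returns": returns, "calm": calm})
results = grangercausalitytests(data[["djia_returns", "calm"]], maxlag=3)
# A small p-value at some lag means "calm" consistently precedes (Granger-causes)
# the returns, which, as noted above, still falls short of cause and effect.
```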
Let’s take a look at another study, which found that three-week changes in Google search volume for “debt”—which you can look up on Google Trends—would have predicted how the DJIA moved in the weeks that followed. If you had used this as your investment strategy between 2004 and 2011, you would have made a 300 percent profit. A plausible causal model was offered for the phenomenon—namely, that periods of lower prices are preceded by periods of concern, captured by searches for “debt.” Here the strategy is to find the predictive pattern first and then see whether the causation seems plausible. Following on from the “debt” study, researchers used Wikipedia to sort a huge number of words into word bags to do with politics, business, sports, religion, and the like. They found that for the period 2004–2012, the Google search volume of political or business topics—but not the other topics—could have predicted large stock moves five to ten weeks in advance and, had it been traded on, made money.
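A backtest of this kind of rule is easy to sketch. The version below uses invented weekly series and a simplified form of the strategy rather than the study’s actual data or code: bet on a fall in the index after “debt” searches rise relative to their recent average, and bet on a rise after they fall.

```python
import numpy as np

# Schematic backtest on made-up weekly data (not the study's data or code):
# short the DJIA for a week when "debt" searches rose relative to their
# three-week average, go long when they fell.
rng = np.random.default_rng(2)
weeks = 400
debt_searches = rng.normal(size=weeks).cumsum() + 50                 # weekly search-volume index
djia = 10_000 * np.exp(rng.normal(0, 0.02, size=weeks).cumsum())     # weekly closing level

capital = 1.0
for t in range(3, weeks - 1):
    signal = debt_searches[t] - debt_searches[t - 3:t].mean()        # change vs. recent weeks
    weekly_return = djia[t + 1] / djia[t] - 1
    # Concern rising? Bet on a fall. Concern easing? Bet on a rise.
    capital *= (1 - weekly_return) if signal > 0 else (1 + weekly_return)

print(f"final capital relative to start: {capital:.2f}")
```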
Big-data analysts obviously delight in replacing causation with data-driven correlation, but critics jump on such a substitution. A particularly dogged critic of the Bollen word study, using the Internet pseudonym “Lawly Wurm,” pointed out that the test period of fifteen trading days might have been cherry-picked to maximize the prediction. Also, since the measure of success was correctly picking a coin flip (DJIA up or down) thirteen days out of fifteen—86.7 percent correct—Lawly Wurm pointed out that the chance of this happening randomly, if tried fifty times, would be about one in six. But how could you try fifty times if you’re using historical data, which are real as opposed to simulated? One way would be to test multiple hypotheses: try different word categories and also slide the time window around to capture the most accurate fifteen days in 2008.
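Lawly Wurm’s figure is easy to check. Getting at least thirteen of fifteen fair coin flips right in a single window has a probability of about 0.37 percent, and the chance of that happening at least once across fifty independent tries is roughly one in six:

```python
from math import comb

# Probability of getting at least 13 of 15 fair coin flips "right" in one window...
p_one_window = sum(comb(15, k) for k in (13, 14, 15)) / 2**15   # ~0.0037

# ...and of seeing that happen at least once across 50 independent tries
# (say, 50 different word bags or sliding time windows).
p_fifty_tries = 1 - (1 - p_one_window) ** 50
print(round(p_one_window, 4), round(p_fifty_tries, 2))   # ~0.0037 and ~0.17, about one in six
```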
If after-the-fact explanation is a problem, why not just predict during an event? On election night in the United States, there were many prediction machines that you could follow live. As the election results came in, one prominent “prediction meter” slid steadily from a 96 percent chance that Clinton would win before any results were in, down to 4 percent late that night, and finally to 0 percent by the next morning. Back in the day, we called this watching the game, not prediction. That said, sports media now cover games with their own dynamic prediction charts. In 2016 college football, Kansas beat Texas (Mike, being a University of Texas graduate, was pretty bummed) for the first time since 1938—this despite Kansas having won only four Big 12 games in the previous seven years and coming off a perfect zero-and-twelve season the year before. In the first quarter, a prominent sports website put Texas’s chances of winning at 96 percent, and later, with two minutes left in the game and a three-point lead, Texas was still given a 96 percent chance. By overtime, however, when Kansas was getting ready to kick the game-winning field goal, Texas had suddenly plunged to only a 6 percent chance of winning—still kind of a high percentage, given that Kansas was kicking from Texas’s eight-yard line. After Kansas kicked the winning field goal, ending the game, Texas was given a 0 percent chance of winning. Seems like a pretty safe “prediction.”
By the time a football game is down to a field goal try from the eight-yard line, almost everybody will converge on the same prediction. They are all watching the same game, not playing in it. If they are all playing in the game they are betting on, though, like stockbrokers do, prediction is much harder or even impossible. Think of what Yogi Berra supposedly said: “No one goes there anymore; it’s too crowded.” In the classic “El Farol problem,” modeled by economist Brian Arthur in the 1990s, each person tries to predict how full the El Farol Bar in Santa Fe, New Mexico, will be before deciding whether or not to go. If you think it will be less than 60 percent full, you’ll go, but if you think it’ll be over 60 percent full, you’ll avoid the crowd and stay home. If everyone predicts less than 60 percent, they all go, and if they think everybody is going, then no one shows up. Arthur showed that while the mean attendance converges to the threshold value—in this case, 60 percent—it never settles because everyone is comparing recent results to their predictions of other people’s predictions.
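A stripped-down version of the El Farol model fits in a few dozen lines. In the sketch below (our simplification, not Arthur’s original predictor set), each agent holds a random handful of simple predictors, acts on whichever has been most accurate so far, and goes to the bar only if that predictor forecasts attendance below the 60-person threshold. In Arthur’s runs, mean attendance hovered around the threshold without ever settling down.

```python
import random

N, THRESHOLD, WEEKS = 100, 60, 300
random.seed(3)

# A pool of simple predictors of next week's attendance, each built from the
# attendance history h: "same as k weeks ago", "average of the last k weeks",
# and "mirror around 50" (simplified stand-ins for Arthur's predictors).
def k_weeks_ago(k): return lambda h: h[-k]
def average_of(k):  return lambda h: sum(h[-k:]) / k
def mirror():       return lambda h: 100 - h[-1]

POOL = ([k_weeks_ago(k) for k in (1, 2, 3, 4)]
        + [average_of(k) for k in (2, 3, 4, 8)] + [mirror()])

# Each agent holds a random handful of predictors and always acts on the one
# that has been most accurate so far.
agents = [{"predictors": random.sample(POOL, 3), "errors": [0.0, 0.0, 0.0]}
          for _ in range(N)]
history = [44, 78, 56, 15, 23, 67, 84, 34]   # arbitrary seed attendance history

for week in range(WEEKS):
    all_predictions = []
    attendance = 0
    for a in agents:
        preds = [f(history) for f in a["predictors"]]
        best = min(range(3), key=lambda i: a["errors"][i])
        all_predictions.append(preds)
        attendance += preds[best] < THRESHOLD   # go only if you expect it uncrowded
    for a, preds in zip(agents, all_predictions):
        for i, p in enumerate(preds):           # update each predictor's running error
            a["errors"][i] += abs(p - attendance)
    history.append(attendance)

print("mean attendance over the last 100 weeks:", sum(history[-100:]) / 100)
```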
Of course, there are lots of ways for human beings to find a good bar, since we are not the mindless agents in the El Farol problem. We have plenty of good algorithms working to solve our coordination problems. They sort our luggage at the airport, protect our credit cards, and help (some of) us with online dating. The algorithms that work on Wall Street are known as high-frequency traders (HFTs). Sometimes HFTs get a bit hyperactive, and lacking creativity or patience, they can get us into trouble. For example, at 2:32 p.m. (EST) on May 6, 2010, according to the Securities and Exchange Commission report, automated HFTs were programmed to sell seventy-five thousand E-mini futures contracts, valued at about $4 billion, in twenty minutes. Between 2:32 and 2:45 p.m., other HFTs bought many of the contracts and then traded over twenty-seven thousand of them in fourteen seconds, between 2:45:13 and 2:45:27. With prices dropping by 5 percent in four minutes, the Chicago Mercantile Exchange automatically triggered a five-second halt at 2:45:28, but it was too late to prevent a spread to New York, where the DJIA fell by nearly 1,000 points in under twenty minutes. Procter & Gamble reportedly fell from $62 per share to $39 before recovering, while Accenture fell from $40 per share to 1¢ and then recovered back to $40. By 3 p.m., the “flash crash” was over, with the Dow recovering to finish down 348 points for the day—still the second-largest one-day drop it had ever experienced.
So much for leaving those pesky HFTs unsupervised! As with sports and elections, everyone understood what happened after the fact, even if few could immediately agree on exactly how it happened. What they could agree on, though, was the frightening speed of it all, with key events measured in milliseconds: three successive HFT sell-offs reportedly took place at exactly 44.075, 48.250, and 50.475 seconds past 2:42 p.m. that afternoon. Herding occurs in financial trading, but HFTs accelerate it, as if we are fast-forwarding through a zombie movie.
A bit like algorithms, financial traders who send instant messages to each other tend to synchronize their trading activity, as a Northwestern University study showed, and the more traders are in sync with other traders, the more money they tend to make. What we see from all this is that responding to the previous moment, whether to predict the next moment or copy the most recent success, is often the best short-term strategy for the individual. It was for London trader Navinder Singh Sarao, who used spoofing algorithms to make $40 million in the flash crash (he later pleaded guilty to fraud)—but it is not a good long-term strategy for the community or society. If prediction is competitive, then most of us lose, and even the prize for the winners gets smaller and smaller, as predictions become more and more shortsighted.
Instead of using big data to predict collective behavior, perhaps we should use it to understand that behavior. Mass Twitter data show the routine rhythms of Internet users: sleeping, waking, swearing, commuting, picking up children, and going out at night. Nello Cristianini and his team at Bristol University trained a neural network to recognize different categories of clothing in hundreds of thousands of archived images so that it could distinguish a coat from a jacket, a T-shirt, and so on. How did they do this? By testing a rule on a known pattern, then calculating how far off the rule is, adjusting the rule to get closer, and repeating the cycle about half a million times. When they then applied the trained network to a couple hundred thousand publicly available images of pedestrians walking past a street webcam in Brooklyn, it found that most people wear T-shirts and short dresses in the summer, sweaters in the fall, and coats and jackets in the winter.
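That cycle of testing a rule, measuring how far off it is, adjusting it, and repeating is the heart of neural-network training. The sketch below is a schematic stand-in rather than Cristianini’s actual network: a single-layer classifier on invented “image features,” updated about half a million times.

```python
import numpy as np

# Schematic version of the cycle described above (not Cristianini's network):
# test the rule on a known example, measure the error, nudge the rule, repeat.
rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 20))            # stand-in image features
true_w = rng.normal(size=20)
y = (X @ true_w > 0).astype(float)         # stand-in labels, e.g. "coat" vs. "jacket"

w = np.zeros(20)                           # the "rule" starts out knowing nothing
for step in range(500_000):                # "about half a million times"
    i = rng.integers(len(X))
    prediction = 1 / (1 + np.exp(-X[i] @ w))   # test the rule on a known example
    error = prediction - y[i]                  # how far off is it?
    w -= 0.1 * error * X[i]                    # adjust the rule to get closer

accuracy = np.mean(((1 / (1 + np.exp(-X @ w))) > 0.5) == y)
print(f"training-set accuracy after half a million updates: {accuracy:.2f}")
```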
Again, as we pointed out in chapter 6 when we discussed some of Cristianini’s findings, one might be tempted to say, “That’s it? People wear jackets in the winter? We already knew that.” But this is just the beginning. “It is easy to imagine a software infrastructure observing hundreds of webcams,” Cristianini and his colleagues concluded in their report, “trying to detect changes, trends, and events.” This sounds weird and more than a little ominous—sort of like the television show Person of Interest—but before we get too bent out of shape, let’s recognize that plenty of people are signing up voluntarily for observation. An example is the “intelligent home.” Jason Slosberg, a former medical doctor, is now CEO of LinkBee, which aims to equip homes with light bulbs that carry sensors to detect climate conditions, light levels, and pollutants such as pollen in order to improve air quality, energy efficiency, and home security. But it goes much further. By comparing several streams of data from the bulbs, including people’s movements, LinkBee wants to infer a resident’s physical and mental state. At a central computational center, a neural network tries to work out your mood or an urgent physical condition, such as hypothermia, a potential stroke, incapacitation, or unconsciousness. When the bulbs detect an anomaly, Slosberg said, “the intelligent home can contact the caregiver to alert them of a potential medical condition.”
Even if you don’t volunteer to be remotely observed at home through your light bulbs, your health is already monitored in other ways. People tend to Google their symptoms weeks or months before they finally go to the doctor. If people are tweeting a lot of aggressive and stress-related words in a certain zip code, the area tends to have more heart attacks. Researchers at the University of Pennsylvania assembled a bag of words related to hostility—mostly nasty, four-letter words—and other word bags for skilled occupations (“conference,” “staff,” and “council”), interpersonal tension (“hate” and “jealous”), positive experiences (“fabulous,” “hope,” and “wonderful”), and optimism (“overcome,” “strength,” and “faith”). Counting up these words by US county, they found that counties tweeting about skilled occupations, positive experiences, or optimism have lower rates of atherosclerotic heart disease. Twitter words actually predicted heart disease as well as—or even better than—standard risk factors such as smoking, hypertension, or obesity.
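The tallying step behind results like these is straightforward: count how often words from each bag appear in a county’s tweets, divide by the county’s total word count, and then correlate those rates with the county’s disease statistics. Here is a toy sketch with invented tweets and the example words quoted above:

```python
from collections import Counter

# Toy illustration of the word-bag tallying step (made-up tweets; the bags
# reuse the example words quoted in the text).
WORD_BAGS = {
    "skilled_occupations": {"conference", "staff", "council"},
    "optimism": {"overcome", "strength", "faith"},
    "interpersonal_tension": {"hate", "jealous"},
}

def bag_rates(tweets):
    """Fraction of a county's tweet words that fall in each bag."""
    words = [w for t in tweets for w in t.lower().split()]
    counts = Counter(words)
    total = sum(counts.values())
    return {bag: sum(counts[w] for w in vocab) / total
            for bag, vocab in WORD_BAGS.items()}

county_tweets = ["Great conference with the staff today",
                 "I hate this commute",
                 "faith and strength will overcome"]
print(bag_rates(county_tweets))
# In the study, rates like these, computed county by county, were then
# correlated with each county's rate of atherosclerotic heart disease.
```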
So many of these predictions are short term, like our horizons in chapter 6. What about longer-term traditions? The Twitter data tell a traditional story, too, in the interpersonal tension and atherosclerosis that run across the Rust Belt and into Appalachia. This suggests we might look for a connection between long-term economics and word usage. Maybe the emotion words of a generation, aggregated on a national scale, are correspondingly biased. After all, we have historical eras called the Gay Nineties, the Depression, and the Fabulous Fifties. Thanks to Google’s book-scanning project, which began in the early 2000s, we have annual counts of every English word in millions of books over a three-hundred-year period. Counting words in these data from several word bags—“anger,” “disgust,” “fear,” “joy,” “sadness,” and “surprise,” from a resource called WordNet-Affect—Alberto Acerbi and Vasileios Lampos found that emotion words in books declined in relative frequency throughout the twentieth century. This was true for both nonfiction and fiction books. Interestingly, this overall decline was due to a decline in positive emotion words, which had started in the early nineteenth century, with little change in negative emotion words over the two centuries.
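The measure behind that finding is simply a relative frequency: for each year, the number of occurrences of words in an emotion bag divided by all the words printed that year. A toy version, with invented counts standing in for the Google Books n-gram tables:

```python
# Toy version of the relative-frequency measure: per year, (occurrences of words
# in an emotion bag) / (all words printed that year). All counts here are invented;
# the real inputs are the Google Books n-gram tables and the WordNet-Affect bags.
joy_bag = {"joy", "cheerful", "delight"}       # a stand-in for the "joy" word bag

ngram_counts = {                               # {year: {word: count}}, hypothetical numbers
    1900: {"joy": 120, "cheerful": 80, "delight": 60, "the": 50_000},
    2000: {"joy": 90,  "cheerful": 40, "delight": 30, "the": 90_000},
}
totals = {1900: 1_000_000, 2000: 2_000_000}    # total words scanned per year (invented)

for year in sorted(ngram_counts):
    bag_count = sum(ngram_counts[year].get(w, 0) for w in joy_bag)
    print(year, bag_count / totals[year])      # relative frequency of "joy" words that year
```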
This kind of instant feedback among economics, politics, and verbiage is worth discussing. How does it change things? We’ll consider that in the next chapter, but we can’t help but briefly mention one of the coolest pieces of twenty-first-century technology yet: the Versatile Extra-Sensory Transducer (VEST) technology engineered at Baylor College of Medicine in Houston that translates words, via sound sensors, into specific patterns of vibration on the body. Its inventor, David Eagleman, predicts that political leaders may someday be outfitted with an Internet-connected VEST when giving a live speech: “Twitter is giving you that feedback immediately. So you’re plugged into the consciousness of thousands of people, maybe hundreds of thousands of people all at once hearing your speech, and you can say, ‘Ooh, that didn’t go over so well.’” Given how much both candidates depended on social media for the immediate feedback it provides, we would’ve paid big money to see Clinton and Trump debate each other in the 2016 presidential race while wearing VESTs.