Preface

Errors using inadequate data are much less than those using no data at all.

—Charles Babbage

It’s about 7:30 a.m. on October 26, 2011, and I’m driving on The Strip in Las Vegas, Nevada. No, I’m not about to play craps or see Celine Dion. (While very talented, she’s just not my particular brand of vodka.) I’m going for a more professional reason. Sometime in mid-2011, I started hearing more and more about something called Big Data. On that October morning, I was invited to IBM’s Information on Demand (IOD) conference. It was high time that I learned more about this new phenomenon, and there’s only so much you can do in front of a computer.

Beyond my insatiable quest for knowledge on all matters technology, truth be told, I went to IOD for a bunch of other reasons. First, it was convenient: The Strip is a mere fifteen minutes from my home. Second, the price was right: I was able to snake my way in for free. It turns out that, since I write for a few high-profile sites, some people think of me as a member of the media. (Funny how I never would have expected that ten years ago, but far be it from me to look a gift horse in the mouth.) Third, it was a good networking opportunity and my fourth book, The Age of the Platform, had just been published. I am familiar enough with the book business to know that authors have to get out there if they want to generate a buzz and move copies. These were all valid reasons to hop in my car, but for me there was an extra treat. I had the opportunity to meet and listen firsthand to the conference’s two keynote speakers: Michael Lewis (one of my favorite writers) and a man by the name of Billy Beane.

For his part, Lewis wasn’t at IOD to promote his latest opus as I was. On the contrary, he was there to speak about his 2003 book Moneyball: The Art of Winning an Unfair Game. The book had been enjoying a huge commercial resurgence of late, thanks in no small part to the recent film of the same name starring some guy named Brad Pitt. I hadn’t read Moneyball in some years, but I remember breezing through it. Lewis’s writing style is nothing if not engaging. (He even made subprime mortgages and synthetic collateralized debt obligations [CDOs] interesting in The Big Short.)

I’ve always been a bit of a stats geek, and Moneyball instantly hit a nerve with me. It told the story of Beane, the general manager (GM) of the budget-challenged Oakland A’s. Despite his team’s financial limitations, he consistently won more games than most other mid-market teams—and even franchises like the New York Yankees that effectively printed their own money. The obvious question was how? Beane bucked convention and routinely ignored the advice of long-time baseball scouts, often earning their derision in the process. Instead, Beane predicated his management style on a rather obscure, statistics-laden field called sabermetrics. He signed free agents who he believed were undervalued by other teams. That is, he sought to exploit market inefficiencies.

One of Beane’s favorite bargains: a relatively cheap player with a high on-base percentage (OBP).[5] In a nutshell, Beane’s simple and irrefutable logic could be summarized as follows: players more likely to get on base are more likely to score runs. By extension, higher-scoring teams tend to win more games than their lower-scoring counterparts. But Beane didn’t stop there. He was also partial to players (again, only at the right price) who didn’t swing at the first pitch. Beane liked hitters who consistently made opposing pitchers work deep into the count. These patient batters were more likely to tire opposing pitchers and, in turn, give everyone on the A’s better pitches to hit. (Again, more runs would result, as would more wins.)

Figure P.1 Michael Lewis and Billy Beane with Katty Kay at IBM Information on Demand 2011 [1]

Source: Todd Watson

Back then, evaluating players based on unorthodox stats like these was considered heresy in traditional baseball circles. And that resistance was not confined to people outside the A’s organization. In the late 1990s and early 2000s, a conflict within the A’s organization was growing between Beane and his most visible employee: manager Art Howe. A former infielder with three teams over twelve years, Howe for one wasn’t on board with Beane’s unconventional program, to put it mildly. As Lewis tells it in Moneyball, Howe was nothing if not old school. He certainly didn’t need some newfangled, stat-obsessed GM telling him the X’s and O’s of baseball.

Oakland’s internal conflict couldn’t persist; a GM and manager have to be on the same page in all sports, and baseball is no exception. Rather than fire Howe outright (with the A’s eating his $1.5 million salary), Beane got creative, as he is wont to do. He cajoled the New York Mets into taking Howe off the A’s hands, not that the Mets needed much convincing. The team soon signed its new leader to a then-gaudy four-year, $9.4 million contract. After all, Howe had won a more-than-respectable 53 percent of his games with the small-market A’s and he just looks managerial. The man has a great jaw. Imagine what Howe could do for a team with a big bankroll like the Mets.

Howe’s tenure with the Mets was ignominious. The team won only 42 percent of its games on Howe’s watch. After two seasons, the Mets realized what Beane had long known: Howe and his managerial jaw were much better in theory than in practice. In September 2004, the Mets parted ways with their manager.

While Beane may have been the first GM to embrace sabermetrics, he soon had company. His success bred many disciples in the baseball world and beyond. Count among them Theo Epstein, currently the President of Baseball Operations for the Chicago Cubs. In his previous role as GM of the Boston Red Sox, Epstein even hired Bill James, the godfather of sabermetrics. And it worked. Epstein won two World Series for the Sox, breaking the franchise’s 86-year drought. Houston Rockets GM Daryl Morey is bringing Moneyball concepts to the NBA. As a November 2012 Sports Illustrated article points out, the MBA grad takes a radically different approach to player acquisition and development compared to his peers.[2]

And then there’s the curious case of Kevin Kelley, the head football coach at the Pulaski Academy, a high school in Little Rock, Arkansas. Kelley isn’t your average coach. The man “stopped punting in 2005 after reading an academic study on the statistical consequences of going for the first down versus handing possession to the other team.”[3] Coach Kelley simply refuses to punt. Ever. Even if it’s fourth and 20 from his own ten-yard line. But it gets even better. Ever the contrarian, after Pulaski scores, Kelley has his kicker routinely attempt onside kicks to get the ball right back. In one game, Kelley’s team scored twenty-nine points before the opponent even touched the football![4] The results? The Bruins have won multiple state championships using their coach’s unconventional style.

So why were Lewis and Beane the keynote speakers at IOD, a corporate information technology (IT) conference? Because, as Moneyball demonstrates so compellingly, today new sources of data are being used across many different fields in very unconventional and innovative ways to produce astounding results—and a swath of people, industries, and established organizations are finally starting to realize it.

This book explains why Big Data is a big deal. For example, residents in Boston, Massachusetts, are automatically reporting potholes and road hazards via their smartphones. Progressive Insurance tracks real-time customer driving patterns and uses that information to offer rates truly commensurate with individual safety. HR departments are using new sources of information to make better hiring decisions. Google accurately predicts local flu outbreaks based on thousands of user search queries. Amazon provides remarkably insightful, relevant, and timely product recommendations to its hundreds of millions of customers. Quantcast lets companies target precise audiences and key demographics throughout the Web. NASA runs contests via gamification site TopCoder, awarding prizes to those with the most innovative and cost-effective solutions to its problems. Explorys offers penetrating and previously unknown insights into health care behavior.

How do these organizations and municipalities do it? Technology is certainly a big part, but in each case the answer lies deeper than that. Individuals at these organizations have realized that they don’t have to be statistician Nate Silver to reap massive benefits from today’s new and emerging types of data. And each of these organizations has embraced Big Data, allowing them to make astute and otherwise impossible observations, actions, and predictions.

It’s time to start thinking big.

This book is about an unassailably important trend: Big Data, the massive amounts, new types, and multifaceted sources of information streaming at us faster than ever. Never before have we seen data with the volume, velocity, and variety of today. Big Data is no temporary blip of a fad. In fact, it is only going to intensify in the coming years, and its ramifications for the future of business are impossible to overstate.

Put differently, Big Data is becoming too big to ignore. And that sentence, in a nutshell, summarizes this book.

Phil Simon
Henderson, NV
March 2013

NOTES

1. Watson, Todd, “Information on Demand 2011: A Data-Driven Conversation with Michael Lewis & Billy Beane,” October 26, 2011, http://turbotodd.wordpress.com/2011/10/26/information-on-demand-2011-a-data-driven-conversation-with-michael-lewis-billy-beane/, retrieved December 11, 2012.

2. Ballard, Chris, “Lin’s Jumper, GM Morey’s Hidden Talents, More Notes from Houston,” November 30, 2012, http://sportsillustrated.cnn.com/2012/writers/chris_ballard/11/30/houston-rockets-jeremy-lin-james-harden-daryl-morey/index.html, retrieved December 11, 2012.

3. Easterbrook, Gregg, “New Annual Feature! State of High School Nation,” November 15, 2007, http://sports.espn.go.com/espn/page2/story?page=easterbrook/071113, retrieved December 11, 2012.

4. Wertheim, Jon, “Down 29-0 Before Touching the Ball,” September 15, 2012, http://sportsillustrated.cnn.com/2011/writers/scorecasting/09/15/kelley.pulaski/index.html, retrieved December 11, 2012.

5. For those of you not familiar with the term, OBP measures how often a batter reaches base. It counts hits, walks, and times hit by a pitch. Beane also sought out players with high on-base plus slugging (OPS) figures. OPS equals the sum of a player’s OBP and slugging percentage (total bases divided by at bats).
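
For readers who prefer to see the arithmetic, here is a minimal sketch of the two stats described in this note. It is an illustration, not anything the A’s actually ran. The note above gives only the numerator for OBP; the denominator used here (at bats plus walks plus times hit by a pitch plus sacrifice flies) is the standard scoring definition and is an assumption relative to this text. The BattingLine class, its field names, and the sample season are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Hypothetical container for one player's season totals."""
    at_bats: int
    singles: int
    doubles: int
    triples: int
    home_runs: int
    walks: int
    hit_by_pitch: int
    sacrifice_flies: int

    @property
    def hits(self) -> int:
        return self.singles + self.doubles + self.triples + self.home_runs

    @property
    def total_bases(self) -> int:
        # One base per single, two per double, and so on.
        return (self.singles + 2 * self.doubles
                + 3 * self.triples + 4 * self.home_runs)


def obp(line: BattingLine) -> float:
    """On-base percentage: times on base divided by chances to reach base.

    The denominator is the standard definition, assumed here because the
    note only lists what counts as reaching base.
    """
    reached = line.hits + line.walks + line.hit_by_pitch
    chances = (line.at_bats + line.walks
               + line.hit_by_pitch + line.sacrifice_flies)
    return reached / chances if chances else 0.0


def slg(line: BattingLine) -> float:
    """Slugging percentage: total bases divided by at bats."""
    return line.total_bases / line.at_bats if line.at_bats else 0.0


def ops(line: BattingLine) -> float:
    """On-base plus slugging: the sum of OBP and slugging percentage."""
    return obp(line) + slg(line)


# Made-up season totals, purely for illustration.
example = BattingLine(at_bats=500, singles=100, doubles=30, triples=2,
                      home_runs=18, walks=80, hit_by_pitch=5,
                      sacrifice_flies=5)

if __name__ == "__main__":
    print(f"OBP {obp(example):.3f}  SLG {slg(example):.3f}  "
          f"OPS {ops(example):.3f}")
```

Note how the eighty walks in the made-up season lift OBP without touching slugging percentage, which is exactly the kind of contribution Beane believed other teams were underpricing.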