The pagination of this electronic edition does not match the edition from which it was created. To locate a specific entry, please use your e-book reader’s search tools.
A/B testing
ABCs of, 209–21
and addictions, 219–20
and Boston Globe headlines, 214–17
in digital world, 210–19
downside to, 219–21
and education/learning, 276
and Facebook, 211
future uses of, 276, 277, 278
and gaming industry, 220–21
and Google advertising, 217–19
importance of, 214, 217
and Jawbone, 277
and politics, 211–14
and television, 222
Abdulkadiroglu, Atila, 235–36
abortion, truth about, 147–50
Adamic, Lada, 144
Adams, John, 78
addictions
and A/B testing, 219–20
See also specific addiction
advertising
and A/B testing, 217–19
causal effects of, 221–25, 273
and examples of Big Data searches, 22
Google, 217–19
and Levitt-electronics company, 222, 225, 226
and movies, 224–25
and science, 273
and Super Bowl games, 221–26
TV, 221–26
African Americans
and Harvard Crimson editorial about Zuckerberg, 155
income and, 175
and origins of notable Americans, 182–83
and truth about hate and prejudice, 129, 134
See also “nigger”; race/racism
age
and baseball fans, 165–69, 165–66n
and lying, 108n
and origins of political preferences, 169–71
and predicting future of baseball players, 198–99
of Stormfront members, 137–38
and words as data, 85–86
See also children; teenagers
Aiden, Erez, 76–77, 78–79
alcohol
as addiction, 219
and health, 207–8
AltaVista (search engine), 60
Alter, Adam, 219–20
Amatriain, Xavier, 157
Amazon, 20, 203, 283
American Pharoah (Horse No. 85), 22, 64, 65, 70–71, 256
Angrist, Joshua, 235–36
anti-Semitism. See Jews
anxiety
data about, 18
and truth about sex, 123
AOL, and truth about sex, 117–18
AOL News, 143
art, real life as imitating, 190–97
Ashenfelter, Orley, 72–74
Asher, Sam, 202
Asians, and truth about hate and prejudice, 129
asking the right questions, 21–22
assassinations, 227–28
Atlantic magazine, 150–51, 152, 202
Australia, pregnancy in, 189
auto-complete, 110–11, 116
Avatar (movie), 221–22
Bakshy, Eytan, 144
Baltimore Ravens-New England Patriots games, 221, 222–24
baseball
and influence of childhood experiences, 165–69, 165–66n, 171, 206
and overemphasis on measurability, 254–55
predicting a player’s future in, 197–200, 200n, 203
and science, 273
scouting for, 254–55
zooming in on, 165–69, 165–66n, 171, 197–200, 200n, 203
basketball
pedigrees and, 67
predicting success in, 33–41, 67
and socioeconomic background, 34–41
Beane, Billy, 255
Beethoven, Ludwig van, zooming in on, 190–91
behavioral science, and digital revolution, 276, 279
Belushi, John, 185
Benson, Clark, 217
Berger, Jonah, 91–92
Bezos, Jeff, 203
bias
implicit, 134
language as key to understanding, 74–76
omitted-variable, 208
subconscious, 132
See also hate; prejudice; race/racism
Big Data
and amount of information, 15, 21, 59, 171
and asking the right questions, 21–22
and causality experiments, 54, 240
definition of, 14, 15
and dimensionality, 246–52
and examples of searches, 15–16
and expansion of research methodology, 275–76
and finishing books, 283–84
future of, 279
Google searches as dominant source of, 60
honesty of, 53–54
importance/value of, 17–18, 29–33, 59, 240, 265, 283
limitations of, 20, 245, 254–55, 256
powers of, 15, 17, 22, 53–54, 59, 109, 171, 211, 257
and predicting what people will do in future, 198–200
as revolutionary, 17, 18–22, 30, 62, 76, 256, 274
as right data, 62
skeptics of, 17
and small data, 255–56
subsets in, 54
understanding of, 27–28
See also specific topic
Bill & Melinda Gates Foundation, 255
Billings (Montana) Gazette, and words as data, 95
Bing (search engine), and Columbia University-Microsoft pancreatic cancer study, 28, 30
Black, Don, 137
Black Lives Matter, 12
Blink (Gladwell), 29–30
Bloodstock, Incardo, 64
bodies, as data, 62–74
Boehner, John, 160
Booking.com, 265
books
conclusions to, 271–72, 279, 280–84
digitalizing, 77, 79
number of people who finish, 283–84
borrowing money, 257–61
Bosh, Chris, 37
Boston Globe, and A/B testing, 214–17
Boston Marathon (2013), 19
Boston Red Sox, 197–200
brain, Minsky study of, 273
Brazil, pregnancy in, 190
breasts, and truth about sex, 125, 126
Brin, Sergey, 60, 61, 62, 103
Britain, pregnancy in, 189
Bronx Science High School (New York City), 232, 237
Buffett, Warren, 239
Bullock, Sandra, 185
Bundy, Ted, 181
Bush, George W., 67
business
and comparison shopping, 265
reviews of, 265
See also corporations
butt, and truth about sex, 125–26
Calhoun, Jim, 39
Cambridge University, and Microsoft study about IQ of Facebook users, 261
cancer, predicting pancreatic, 28–29, 30
Capital in the 21st Century (Piketty), 283
casinos, and price discrimination, 263–65
causality
A/B testing and, 209–21
and advertising, 221–25
and Big Data experiments, 54, 240
college and, 237–39
correlation distinguished from, 221–25
and ethics, 226
and monetary windfalls, 229
natural experiments and, 226–28
and power of Big Data, 54, 211
and randomized controlled experiments, 208–9
reverse, 208
and Stuyvesant High School study, 231–37, 240
Centers for Disease Control and Prevention, 57
Chabris, Christopher, 250
Chance, Zoë, 252–53
Chaplin, Charlie, 19
charitable giving, 106, 109
Chen, M. Keith, 235
Chetty, Raj, 172–73, 174–75, 176, 177, 178–80, 185, 273
children
abuse of, 145–47, 149–50, 161
and benefits of digital truth serum, 161
and child pornography, 121
decisions about having, 111–12
height and weight data about, 204–5
of immigrants, 184–85
and income distribution, 176
and influence of childhood experiences, 165–71, 165–66n, 206
intelligence of, 135
and origins of notable Americans, 184–85
parent prejudices against, 134–36, 135n
physical appearance of, 135–36
See also parents/parenting; teenagers
cholera, Snow study about, 275
Christians, and truth about hate and prejudice, 129
Churchill, Winston, 169
cigarette economy, Philippines, 102
cities
and danger of empowered government, 267, 268–69
predicting behavior of, 268–69
zooming in on, 172–90, 239–40
Civil War, 79
Clemens, Jeffrey, 230
Clinton, Bill, searches for, 60–62
Clinton, Hillary. See elections, 2016
A Clockwork Orange (movie), 190–91
cnn.com, 143, 145
Cohen, Leonard, 82n
college
and causality, 237–39
and examples of Big Data searches, 22
college towns, and origins of notable Americans, 182–83, 184, 186
Colors (movie), 191
Columbia University, Microsoft pancreatic cancer study and, 28–29, 30
comparison shopping, 265
conclusions
benefits of great, 281–84
to books, 271–72, 279, 280–84
characteristics of best, 272, 274–79
importance of, 283
as pointing way to more things to come, 274–79
purpose of, 279–80
Stephens-Davidowitz’s writing of, 271–72, 281–84
condoms, 5, 122
Congressional Record, and Gentzkow-Shapiro research, 93
conservatives
and origins of political preferences, 169–71
and parents’ prejudice against children, 136
and truth about the internet, 140, 141–44, 145
and words as data, 75–76, 93, 95–96
consumers. See customers/consumers
contagious behavior, 178
conversation, and dating, 80–82
corporations
consumers’ blows against, 265
danger of empowered, 257–65
reviews of, 265
correlations
causation distinguished from, 221–25
and predicting the stock market, 245–48, 251–52
counties, zooming in on, 172–90, 239–40
Country Music Radio, 202
Craigslist, 117
creativity, and understanding the world, 280, 281
crime
alcohol as contributor to, 196
and danger of empowered government, 266–70
and prison conditions, 235
violent movies and, 193, 194–95, 273
Cundiff, Billy, 223
curiosity
and benefits of digital truth serum, 162, 163
Levitt views about, 280
about number of people who finish books, 283–84
and understanding the world, 280, 281
cursing, and words as data, 83–85
customers/consumers
blows against businesses by, 265
and price discrimination, 265
truth about, 153–57
Cutler, David, 178
Dahl, Gordon, 191–93, 194–96, 196–97n, 197
Dale, Stacy, 238
Dallas, Texas, “Large and Complex Datasets” conference (1977) in, 20–21
data
amount/size of, 15, 20–21, 30–31, 53, 171
benefits of expansion of, 16
bodies as, 62–74
collecting the right, 62
government, 149–50, 266–70
importance of, 26
individual-level, 266–70
as intimidating, 26
Levitt views about, 280
as money-maker, 103
nontraditional sources of, 74
pictures as, 97–102, 103
reimagining of what qualifies as, 55–103
sources of, 14, 15
speed for transmitting, 55–59
and understanding the world, 280
what counts as, 74
words as, 74–97
See also Big Data; data science; small data; specific data
data science
as changing view of world, 34
and counterintuitive results, 37–38
economists’ role in development of, 228
future of, 281
goal of, 37–38
as intuitive, 26–33
and who is a data scientist, 27
dating
and examples of Big Data searches, 22
physical appearance and, 82, 120n
and rejection, 120n
and Stormfront members, 138–39
and truth about hate and prejudice, 138–39
and truth about sex, 120n
and words as data, 80–86, 103
Dawn of the Dead (movie), 192
death, and memorable stories, 33
DellaVigna, Stefano, 191–93, 194–96, 196–97n, 197
Democrats
core principles of, 94
and origins of political preferences, 170–71
and words as data, 93–97
See also specific person or election
depression
Google searches for, 31, 110
and handling the truth, 158
and lying, 109, 110
and parents’ prejudice against children, 136
developing countries
economies of, 101–2, 103
investing in, 251
digital truth serum
abortion and, 147–50
and child abuse, 145–47, 149–50
and customers, 153–57
and Facebook friends, 150–53
and handling the truth, 158–63
and hate and prejudice, 128–40
and ignoring what people tell you, 153–57
incentives and, 109
and internet, 140–45
sex and, 112–28
sites as, 54
See also lying; truth
digital world, randomized experiments in, 210–19
dimensionality, curse of, 246–52
discrimination
and origins of notable Americans, 182–83
price, 262–65
See also bias; prejudice; race/racism
DNA, 248–50
Dna88 (Stormfront member), 138
doctors, financial incentives for, 230, 240
Donato, Adriana, 266, 269
doppelgangers
benefits of, 263
and health, 203–5
and hunting on social media, 201–3
and predicting future of baseball players, 197–200, 200n, 203
and price discrimination, 262–63, 264
zooming in on, 197–205
dreams, phallic symbols in, 46–48
drugs, as addiction, 219
Duflo, Esther, 208–9, 210, 273
Earned Income Tax Credit, 178, 179
economists
and number of people finishing books, 283
role in data science development of, 228
as soft scientists, 273
See also specific person
economy/economics
complexity of, 273
of developing countries, 101–2, 103
of Philippines cigarette economy, 102
and pictures as data, 99–102
and speed of data, 56–57
and truth about hate and prejudice, 139
See also economists; specific topic
Edmonton, water consumption in, 206
EDU STAR, 276
education
and A/B testing, 276
and digital revolution, 279
and overemphasis on measurability, 253–54, 255–56
in rural India, 209, 210
small data in, 255–56
state spending on, 185
and using online behavior as supplement to testing, 278
See also high school students; tests/testing
Eisenhower, Dwight D., 170–71
elections
and order of searches, 10–11
predictions about, 9–14
voter turnout in, 9–10
elections, 2008
and A/B testing, 211–12
racism in, 2, 6–7, 12, 133, 134
and Stormfront membership, 139
elections, 2012
and A/B testing, 211–12
predictions about, 10
racism in, 2–3, 8, 133, 134
Trump and, 7
elections, 2016
and lying, 107
mapping of, 12–13
polls about, 1
predicting outcome of, 10–14
and racism, 8, 11, 12, 14, 133
Republican primaries for, 1, 13–14, 133
and Stormfront membership, 139
voter turnout in, 11
electronics company, and advertising, 222, 225, 226
“Elite Illusion” (Abdulkadiroglu, Angrist, and Pathak), 236
Ellenberg, Jordan, 283
Ellerbee, William, 34
Eng, Jessica, 236–37
environment, and life expectancy, 177
EPCOR utility company, 193, 194
EQB, 63–64
equality of opportunity, zooming in on, 173–75
Error Bot, 48–49
ethics
and Big Data, 257–65
and danger of empowered government, 267
doppelganger searches and, 262–63
empowered corporations and, 257–65
and experiments, 226
hiring practices and, 261–62
and IQ/DNA study results, 249
and paying back loans, 257–61
and price discrimination, 262–65
and study of IQ of Facebook users, 261
Ewing, Patrick, 33
experiments
and ethics, 226
and real science, 272–73
See also type of experiment or specific experiment
Facebook
and A/B testing, 211
and addictions, 219, 220
and hiring practices, 261
and ignoring what people tell you, 153–55, 157
and influence of childhood experiences data, 166–68, 171
IQ of users of, 261
Microsoft-Cambridge University study of users of, 261
“News Feed” of, 153–55, 255
and overemphasis on measurability, 254, 255
and pictures as data, 99
and “secrets about people,” 155–56
and size of Big Data, 20
and small data, 255
as source of information, 14, 32
and truth about customers, 153–55
truth about friends on, 150–53
and truth about sex, 113–14, 116
and truth about the internet, 144, 145
and words as data, 83, 85, 87–88
The Facebook Effect: The Inside Story of the Company That Is Connecting the World (Kirkpatrick), 154
Facemash, 156
faces
black, 133
and pictures as data, 98–99
and truth about hate and prejudice, 133
Farook, Rizwan, 129–30
Father’s Day advertising, 222, 225
50 Shades of Gray, 157
financial incentives, for doctors, 230, 240
First Law of Viticulture, 73–74
food
and phallic symbols in dreams, 46–48
predictions about, 71–72
and pregnancy, 189–90
football
and advertising, 221–25
zooming in on, 196–97n
Freakonomics (Levitt), 265, 280, 281
Freud, Sigmund, 22, 45–52, 272, 281
Friedman, Jerry, 20, 21
Fryer, Roland, 36
Gabriel, Stuart, 9–10, 11
Gallup polls, 2, 88, 113
gambling/gaming industry, 220–21, 263–65
“Gangnam Style” video, Psy, 152
Garland, Judy, 114, 114n
Gates, Bill, 209, 238–39
gays
in closet, 114–15, 116, 117, 118–19, 161
and dimensions of sexuality, 279
and examples of Big Data searches, 22
and handling the truth, 159, 161
in Iran, 119
and marriage, 74–76, 93, 115–16, 117
mobility of, 113–14, 115
population of, 115, 116, 240
and pornography, 114–15, 114n, 116, 117, 119
in Russia, 119
stereotype of, 114n
surveys about, 113
teenagers as, 114, 116
and truth about hate and prejudice, 129
and truth about sex, 112–19
and wives’ suspicions of husbands, 116–17
women as, 116
and words as data, 74–76, 93
Gelles, Richard, 145
Gelman, Andrew, 169–70
gender
and life expectancy, 176
and parents’ prejudice against children, 134–36, 135n
of Stormfront members, 137
See also gays
General Social Survey, 5, 142
genetics, and IQ, 249–50
genitals
and truth about sex, 126–27
See also penis; vagina
Gentzkow, Matt, 74–76, 93–97, 141–44
geography
zooming in by, 172–90
See also cities; counties
Germany, pregnancy in, 190
Ghana, pregnancy in, 188
Ghitza, Yair, 169–70
Ginsberg, Jeremy, 57
girlfriends, killing, 266, 269
girls, parents’ prejudice against young, 134–36
Gladwell, Malcolm, 29–30
Gnau, Scott, 264
gold, price of, 252
The Goldfinch (Tartt), 283
Goldman Sachs, 55–56, 59
Google
advertisements about, 217–19
and amount of data, 21
and digitalizing books, 77
Mountain View campus of, 59–60, 207
See also specific topic
Google AdWords, 3n, 115, 125
Google Correlate, 57–58
Google Flu, 57, 57n, 71
Google Ngrams, 76–77, 78, 79
Google searches
advantages of using, 60–62
auto-complete in, 110–11
differentiation from other search engines of, 60–62
as digital truth serum, 109, 110–11
as dominant source of Big Data, 60
and the forbidden, 51
founding of, 60–62
and hidden thoughts, 110–12
and honesty/plausibility of data, 9, 53–54
importance/value of, 14, 21
polls compared with, 9
popularity of, 62
power of, 4–5, 53–54
and speed of data, 57–58
and words as data, 76, 88
See also Big Data; specific search
Google STD, 71
Google Trends, 3–4, 3n, 6, 246
Gottlieb, Joshua, 202, 230
government
danger of empowered, 266–70
and predicting actions of individuals, 266–70
and privacy issues, 267–70
spending by, 93, 94
and trust of data, 149–50
and words as data, 93, 94
“Great Body, Great Sex, Great Blowjob” (video), 152, 153
Great Recession, and child abuse, 145–47
The Green Monkey (Horse No. 153), 68
gross domestic product (GDP), and pictures as data, 100–101
Gross National Happiness, 87, 88
Guttmacher Institute, 148, 149
Hannibal (movie), 192, 195
happiness
and pictures as data, 99
See also sentiment analysis
Harrah’s Casino, 264
Harris, Tristan, 219–20
Harry Potter and the Deathly Hallows (Rowling), 88–89, 91
Hartmann, Wesley R., 225
Harvard Crimson, editorial about Zuckerberg in, 155
Harvard University, income of graduates of, 237–39
hate
and danger of empowered governments, 266–67, 268–69
truth about, 128–40, 162–63
See also prejudice; race/racism
health
and alcohol, 207–8
and comparison of search engines, 71
and digital revolution, 275–76, 279
and DNA, 248–49
and doppelgangers, 203–5
methodology for studies of, 275–76
and speed of data transmission, 57
zooming in on, 203–5, 275
See also life expectancy
health insurance, 177
Henderson, J. Vernon, 99–101
The Herd with Colin Cowherd, McCaffrey interview on, 196n
Herzenstein, Michal, 257–61
Heywood, James, 205
high school students
testing of, 231–37, 253–54
and truth about sex, 114, 116
high school yearbooks, 98–99
hiring practices, 261–62
Hispanics, and Harvard Crimson editorial about Zuckerberg, 155
Hitler, Adolf, 227
hockey match, Olympic (2010), 193, 194
Horse No. 85. See American Pharoah
Horse No. 153 (The Green Monkey), 68
horses
and Bartleby syndrome, 66
and examples of Big Data searches, 22
internal organs of, 69–71
pedigrees of, 66–67, 69, 71
predicting success of, 62–74, 256
searches about, 62–74
hours, zooming in on, 190–97
housing, price of, 58
Human Genome Project, 248–49
Human Rights Campaign, 161
humankind, data as means for understanding, 16
humor/jokes, searches for, 18–19
Hurricane Frances, 71–72
Hurricane Katrina, 132
husbands
wives’ descriptions of, 160–61, 160–61n
and wives’ suspicions about gayness, 116–17
Hussein, Saddam, 93, 94
ignoring what people tell you, 153–57
immigrants, and origins of notable Americans, 184, 186
implicit association test, 132–34
incentives, 108, 109
incest, 50–52, 54, 121
income distribution, 174–78, 185
India
education in rural, 209, 210
pregnancy in, 187, 188–89
and sex/porn searches, 19
Indiana University, and dimensionality study, 247–48
individuals, predicting the actions of, 266–70
influenza, data about, 57, 71
information. See Big Data; data; small data; specific source or search
Instagram, 99, 151–52, 261
Internal Revenue Service (IRS), 172, 178–80. See also taxes
internet
as addiction, 219–20
browsing behavior on, 141–44
as dominated by smut, 151
segregation on, 141–44
truth about the, 140–45
See also A/B testing; social media; specific site
intuition
and A/B testing, 214
and counterintuitive results, 37–38
data science as, 26–33
and the dramatic, 33
as wrong, 31, 32–33
IQ/intelligence
and DNA, 249–50
of Facebook users, 261
and parents’ prejudice against children, 135
Iran, gays in, 119
Iraq War, 94
Irresistible (Alter), 219–20
Islamophobia
and danger of empowered governments, 266–67, 268–69
See also Muslims
Ivy League schools
income of graduates from, 237–39
See also specific school
Jacob, Brian, 254
James, Bill, 198–99
James, LeBron, 34, 37, 41, 67
Jawbone, 277
Jews, 129, 138
Ji Hyun Baek, 266
Jobs, Steve, 185
Johnson, Earvin III, 67
Johnson, Lyndon B., 170, 171
Johnson, “Magic,” 67
jokes
and dating, 80–81
and lying, 109
nigger, 6, 15, 132, 133, 134
and truth about hate and prejudice, 132, 133, 134
Jones, Benjamin F., 227, 228, 276
Jordan, Jeffrey, 67
Jordan, Marcus, 67
Jordan, Michael, 40–41, 67
Jurafsky, Dan, 80
Kadyrov, Akhmad, 227
Kahneman, Daniel, 283
Kane, Thomas, 255
Katz, Lawrence, 243
Kaufmann, Sarah, 236–37
Kawachi, Ichiro, 266
Kayak (website), 265
Kennedy, John F., 170, 171, 227
Kerry, John, 8, 244
King John (Shakespeare), 89–90
King, Martin Luther Jr., 132
King, William Lyon Mackenzie (alias), 138–39
Kinsey, Alfred, 113
Kirkpatrick, David, 154
Klapper, Daniel, 225
Knight, Phil, 157
Kodak, and pictures as data, 99
Kohane, Isaac, 203–5
Krueger, Alan B., 56, 238
Ku Klux Klan, 12, 137
Kubrick, Stanley, 190–91
Kundera, Milan, 233
language
and digital revolution, 274, 279
emphasis in, 94
as key to understanding bias, 74–76
and paying back loans, 259–60
and traditional research methods, 274
and U.S. as united or divided, 78–79
See also words
learning. See education
Lemaire, Alain, 257–61
Levitt, Steven, 36, 222, 254, 280, 281. See also Freakonomics
liberals
and origins of political preferences, 169–71
and parents’ prejudice against children, 136
and truth about the internet, 140, 141–45
and words as data, 75–76, 93, 95–96
library cards, and lying, 106
life, as imitating art, 190–97
life expectancy, 176–78
Linden, Greg, 203
listening, and dating, 82n
loans, paying back, 257–61
Los Angeles Times, and Obama speech about terrorism, 130
lotteries, 229, 229n
Luca, Michael, 265
Lycos (search engine), 60
lying
and age, 108n
and incentives, 108
and jokes, 109
to ourselves, 107–8, 109
and polls, 107
and pornography, 110
prevalence of, 21, 105–12, 239
and racism, 109
reasons for, 106, 107, 108, 108n
and reimagining data, 103
and search information, 5–6, 12
and sex, 112–28
by Stephens-Davidowitz, 282n
and surveys, 105–7, 108, 108n
and taxes, 180
and voting behavior, 106, 107, 109–10
“white,” 107
See also digital truth serum; truth; specific topic
Ma-Kellams, Christine, 266
Macon County, Alabama, successful/notable Americans from, 183, 186–87
Malik, Tashfeen, 129–30
Manchester University, and dimensionality study, 247–48
Massachusetts Institute of Technology, Pantheon project of, 184–85
Matthews, Dylan, 202–3
McCaffrey, Ed, 196–97n
McFarland, Daniel, 80
McPherson, James, 79
measurability, overemphasis on, 252–56
“Measuring Economic Growth from Outer Space” (Henderson, Storeygard, and Weil), 99–101
media
bias of, 22, 74–77, 93–97, 102–3
and examples of Big Data searches, 22
owners of, 96
and truth about hate and prejudice, 130, 131
and truth about the internet, 143
and words as data, 74–77, 93–97
See also specific organization
Medicare, and doctors’ reimbursements, 230, 240
medicine. See doctors; health
Messing, Solomon, 144
MetaCrawler (search engine), 60
Mexicans, and truth about hate and prejudice, 129
Michel, Jean-Baptiste, 76–77, 78–79
Microsoft
and Cambridge University study about IQ of Facebook users, 261
Columbia University pancreatic cancer study and, 28–29, 30
and typing errors by searchers, 48–50
Milkman, Katherine L., 91–92
Minority Report (movie), 266
Minsky, Marvin, 273
minutes, zooming in on, 190–97
Moneyball, Oakland A’s profile in, 254, 255
Moore, Julianne, 185
Moskovitz, Dustin, 238–39
movies
and advertising, 224–25
and crime, 193, 194–95, 273
violent, 190–97, 273
zooming in on, 190–97
See also specific movie
msnbc.com, 143
murder
and danger of empowered government, 266–67, 268–69
See also violence
Murdoch, Rupert, 96
Murray, Patty, 256
Muslims
and danger of empowered governments, 266–67, 268–69
and truth about hate and prejudice, 129–31, 162–63
Nantz, Jim, 223
National Center for Health Statistics, 181
National Enquirer magazine, 150–51, 152
national identity, 78–79
natural experiments, 226–28, 229–30, 234–37, 239–40
NBA. See basketball
neighbors, and monetary windfalls, 229
Netflix, 156–57, 203, 212
Netzer, Oded, 257–61
New England Patriots-Baltimore Ravens games, 221, 222–24
New Jack City (movie), 191
New York City, Rolling Stones song about, 278
New York magazine, and A/B testing, 212
New York Mets, 165–66, 167, 169, 171
New York Post, and words as data, 96
New York Times
Clinton (Bill) search in, 61
and IQ/DNA study results, 249
and Obama speech about terrorism, 130
Stephens-Davidowitz’s first column about sex in, 282
Stormfront users and, 137, 140, 145
and truth about internet, 145
types of stories in, 92
vaginal odors story in, 161
and words as data, 95–96
New York Times Company, and words as data, 95–96
New Yorker magazine
Duflo study in, 209
and Stephens-Davidowitz’s doppelganger search, 202
News Corporation, 96
newslibrary.com, 95
Nielsen surveys, 5
Nietzsche, Friedrich, 268
Nigeria, pregnancy in, 188, 189, 190
“nigger”
and hate and prejudice, 6, 7, 131–34, 244
jokes, 6, 15, 132, 133, 134
motivation for searches about, 6
and Obama’s election, 7, 244
and power of Big Data, 15
prevalence of searches about, 6
and Trump’s election, 14
night light, and pictures as data, 100–101
Nike, 157
Nixon, Richard M., 170, 171
numbers, obsessive infatuation with, 252–56
Obama, Barack
and A/B testing, 211–14
campaign home page for, 212–14
elections of 2008 and, 2, 6–7, 133, 134, 211–12
elections of 2012 and, 8–9, 10, 133, 134, 211–12
and racism in America, 2, 6–7, 8–9, 12, 134, 240, 243–44
State of the Union (2014) speech of, 159–60
and truth about hate and prejudice, 130–31, 133, 134, 162–63
Ocala horse auction, 65–66, 67, 69
Oedipal complex, Freud theory of, 50–51
OkCupid (dating site), 139
Olken, Benjamin A., 227, 228
127 Hours (movie), 90, 91
Optimal Decisions Group, 262
Or, Flora, 266
Ortiz, David “Big Papi,” 197–200, 200n, 203
“out-of-sample” tests, 250–51
Page, Larry, 60, 61, 62, 103
pancreatic cancer, Columbia University-Microsoft study of, 28–29
Pandora, 203
Pantheon project (Massachusetts Institute of Technology), 184–85
parents/parenting
and child abuse, 145–47, 149–50, 161
and examples of Big Data searches, 22
and prejudice against children, 134–36, 135n
Parks, Rosa, 93, 94
Parr, Ben, 153–54
Pathak, Parag, 235–36
PatientsLikeMe.com, 205
patterns, and data science as intuitive, 27, 33
Paul, Chris, 37
paying back loans, 257–61
PECOTA model, 199–200, 200n
pedigrees
of basketball players, 67
of horses, 66–67, 69, 71
pedometer, Chance emphasis on, 252–53
penis
and Freud’s theories, 46
and phallic symbols in dreams, 46–47
size of, 17, 19, 123–24, 124n, 127
“penistrian,” 45, 46, 48, 50
Pennsylvania State University, income of graduates of, 237–39
Peysakhovich, Alex, 254
phallic symbols, in dreams, 46–48
Philadelphia Daily News, and words as data, 95
Philippines, cigarette economy in, 102
physical appearance
and dating, 82, 120n
and parents’ prejudice against children, 135–36
and truth about sex, 120, 120n, 125–26, 127
physics, as science, 272–73
pictures, as data, 97–102, 103
Pierson, Emma, 160n
Piketty, Thomas, 283
Pinky Pizwaanski (horse), 70
pizza, information about, 77
PlentyOfFish (dating site), 139
Plomin, Robert, 249–50
political science, and digital revolution, 244, 274
politics
and A/B testing, 211–14
complexity of, 273
and ignoring what people tell you, 157
and origin of political preferences, 169–71
and truth about the internet, 140–44
and words as data, 95–97
See also conservatives; Democrats; liberals; Republicans
polls
Google searches compared with, 9
and lying, 107
reliability of, 12
See also specific poll or topic
Pop-Tarts, 72
Popp, Noah, 202
Popper, Karl, 45, 272, 273
PornHub (website), 14, 50–52, 54, 116, 120–22, 274
pornography
as addiction, 219
and bias of social media, 151
and breastfeeding, 19
cartoon, 52
child, 121
and digital revolution, 279
and gays, 114–15, 114n, 116, 117, 119
honesty of data about, 53–54
and incest, 50–52
in India, 19
and lying, 110
popular videos on, 152
popularity of, 53, 151
and power of Big Data, 53
search engines for, 61n
and truth about sex, 114–15, 117
unemployed and, 58, 59
Posada, Jorge, 200
poverty
and life expectancy, 176–78
and words as data, 93, 94
See also income distribution
predictions
and data science as intuitive, 27
and getting the numbers right, 74
and what counts as data, 74
and what vs. why it works, 71
See also specific topic
pregnancy, 20, 187–90
prejudice
implicit, 132–34
of parents against children, 134–36, 135n
subconscious, 134, 163
truth about, 128–40, 162–63
See also bias; hate; race/racism; Stormfront
Premise, 101–2, 103
price discrimination, 262–65
prison conditions, and crime, 235
privacy issues, and danger of empowered government, 267–70
property rights, and words as data, 93, 94
proquest.com, 95
Prosper (lending site), 257
Psy, “Gangnam Style” video of, 152
psychics, 266
psychology
and digital revolution, 274, 277–78, 279
as science, 273
as soft science, 273
and traditional research methods, 274
Quantcast, 137
questions
asking the right, 21–22
and dating, 82–83
race/racism
causes of, 18–19
elections of 2008 and, 2, 6–7, 12, 133
elections of 2012 and, 2–3, 8, 133
elections of 2016 and, 8, 11, 12, 14, 133
explicit, 133, 134
and Harvard Crimson editorial about Zuckerberg, 155
and lying, 109
map of, 7–9
and Obama, 2, 6–7, 8–9, 12, 133, 240, 243–44
and predicting success in basketball, 35, 36–37
and Republicans, 3, 7, 8
Stephens-Davidowitz’s study of, 2–3, 6–7, 12, 14, 243–44
and Trump, 8, 9, 11, 12, 14, 133
and truth about hate and prejudice, 129–34, 162–63
See also Muslims; “nigger”
randomized controlled experiments
and A/B testing, 209–21
and causality, 208–9
rape, 121–22, 190–91
Rawlings, Craig, 80
“rawtube” (porn site), 59
Reagan, Andy, 88, 90, 91
Reagan, Ronald, 227
regression discontinuity, 234–36
Reisinger, Joseph, 101–2, 103
relationships, lasting, 31–33
religion, and life expectancy, 177
Renaissance (hedge fund), 246
Republicans
core principles of, 94
and origins of political preferences, 170–71
and racism, 3, 7, 8
and words as data, 93–97
See also specific person or election
research
and expansion of research methodology, 275–76
See also specific researcher or research
reviews, of businesses, 265
“Rocket Tube” (gay porn site), 115
Rolling Stones, 278
Romney, Mitt, 10, 212
Roseau County, Minnesota, successful/notable Americans from, 186, 187
Runaway Bride (movie), 192, 195
sabermetricians, 198–99
San Bernardino, California, shooting in, 129–30
Sands, Emily, 202
science
and Big Data, 273
and experiments, 272–73
real, 272–73
at scale, 276
soft, 273
search engines
differentiation of Google from other, 60–62
for pornography, 61n
reliability of, 60
word-count, 71
See also specific engine
searchers, typing errors by, 48–50
searches
negative words used in, 128–29
See also specific search
“secrets about people,” 155–56
Seder, Jeff, 63–66, 68–70, 71, 74, 155, 256
segregation, 141–44. See also bias; discrimination; race/racism
self-employed people, and taxes, 178–80
sentiment analysis, 87–92, 247–48
sex
as addiction, 219
and benefits of digital truth serum, 158–59, 161
and childhood experiences, 50–52
condoms and, 5, 122
and digital revolution, 274, 279
and dimensions of sexuality, 279
during marriage, 5–6
and fetishes, 120
and Freud, 45–52
Google searches about, 5–6, 51–52, 114, 115, 117, 118, 122–24, 126, 127–28
and handling the truth, 158–59, 161
and Harvard Crimson editorial about Zuckerberg, 155
how much, 122–23, 124–25, 127
in India, 19
new information about, 19
oral, 128
and physical appearance, 120, 120n, 125–26, 127
and power of Big Data, 53
pregnancy and having, 189
Rolling Stones song about, 278
and sex organs, 123–24
Stephens-Davidowitz’s first New York Times column about, 282
and traditional research methods, 274
truth about, 5–6, 112–28, 114n, 117
and typing errors, 48–50
and women’s genitals, 126–27
See also incest; penis; pornography; rape; vagina
Shadow (app), 47
Shakespeare, William, 89–90
Shapiro, Jesse, 74–76, 93–97, 141–44, 235, 273
“Shattered” (Rolling Stones song), 278
shopping habits, predictions about, 71–74
The Signal and the Noise (Silver), 254
Silver, Nate, 10, 12–13, 133, 199, 200, 254, 255
Simmons, Bill, 197–98
Singapore, pregnancy in, 190
Siroker, Dan, 211–12
sleep
and digital revolution, 279
Jawbone and, 276–77
and pregnancy, 189
“Slutload,” 58
small data, 255–56
smiles, and pictures as data, 99
Smith, Michael D., 224
Snow, John, 275
Sochi, Russia, gays in, 119
social media
bias of data from, 150–53
doppelganger hunting on, 201–3
and wives’ descriptions of husbands, 160–61, 160–61n
See also specific site or topic
social science, 272–74, 276, 279
social security, and words as data, 93
socioeconomic background
and predicting success in basketball, 34–41
See also pedigrees
sociology, 273, 274
Soltas, Evan, 130, 162, 266–67
South Africa, pregnancy in, 189
Southern Poverty Law Center, 137
Spain, pregnancy in, 190
Spartanburg Herald-Journal (South Carolina), and words as data, 96
specialization, extreme, 186
speed, for transmitting data, 56–59
“Spider Solitaire,” 58
Stephens-Davidowitz, Noah, 165–66, 165–66n, 169, 206, 263
Stephens-Davidowitz, Seth
ambitions of, 33
lying by, 282n
mate choice for, 25–26, 271
motivations of, 2
obsessiveness of, 282, 282n
professional background of, 14
and writing conclusions, 271–72, 279, 280–84
Stern, Howard, 157
stock market
data for, 55–56
and examples of Big Data searches, 22
Summers-Stephens-Davidowitz attempt to predict the, 245–48, 251–52
Stone, Oliver, 185
Stoneham, James, 266, 269
Storegard, Adam, 99–101
stories
categories/types of, 91–92
viral, 22, 92
and zooming in, 205–6
See also specific story
Stormfront (website), 7, 14, 18, 137–40
stretch marks, and pregnancy, 188–89
Stuyvesant High School (New York City), 231–37, 238, 240
suburban areas, and origins of notable Americans, 183–84
successful/notable Americans
factors that drive, 185–86
zooming in on, 180–86
suffering, and benefits of digital truth serum, 161
suicide, and danger of empowered government, 266, 267–68
Summers, Lawrence
and Obama-racism study, 243–44
and predicting the stock market, 245, 246, 251–52
Stephens-Davidowitz’s meeting with, 243–45
Sunstein, Cass, 140
Super Bowl games, advertising during, 221–25, 239
Super Crunchers (Ayres), 264
Supreme Court, and abortion, 147
Surowiecki, James, 203
surveys
in-person, 108
internet, 108
and lying, 105–7, 108, 108n
and pictures as data, 97
skepticism about, 171
telephone, 108
and truth about sex, 113, 116
and zooming in on hours and minutes, 193
See also specific survey or topic
Syrian refugees, 131
Taleb, Nassim, 17
Tartt, Donna, 283
TaskRabbit, 212
taxes
cheating on, 22, 178–80, 206
and examples of Big Data searches, 22
and lying, 180
and self-employed people, 178–80
and words as data, 93–95
zooming in on, 172–73, 178–80, 206
teachers, using tests to judge, 253–54
teenagers
adopted, 108n
as gay, 114, 116
lying by, 108n
and origins of political preferences, 169
and truth about sex, 114, 116
See also children
television
and A/B testing, 222
advertising on, 221–26
Terabyte, 264
terrorism, 18, 129–31
tests/testing
of high school students, 231–37, 253–54
and judging teachers, 253–54
and obsessive infatuations with numbers, 253–54
online behavior as supplement to, 278
and small data, 255–56
See also specific test or study
Thiel, Peter, 155
Think Progress (website), 130
Thinking, Fast and Slow (Kahneman), 283
Thome, Jim, 200
Tourangeau, Roger, 107, 108
towns, zooming in on, 172–90
Toy Story (movie), 192
Trump, Donald
elections of 2012 and, 7
and ignoring what people tell you, 157
and immigration, 184
issues propagated by, 7
and origins of notable Americans, 184
polls about, 1
predictions about, 11–14
and racism, 8, 9, 11, 12, 14, 133, 139
See also elections, 2016
truth
benefits of knowing, 158–63
handling the, 158–63
See also digital truth serum; lying; specific topic
Tuskegee University, 183
Twentieth Century Fox, 221–22
Twitter, 151–52, 160–61n, 201–3
typing errors by searchers, 48–50
The Unbearable Lightness of Being (Kundera), 233
Uncharted (Aiden and Michel), 78–79
unemployment
and child abuse, 145–47
data about, 56–57, 58–59
unintended consequences, 197
United States
and Civil War, 79
as united or divided, 78–79
University of California, Berkeley, racism in 2008 election study at, 2
University of Maryland, survey of graduates of, 106–7
urban areas
and life expectancy, 177
and origins of notable Americans, 183–84, 186
vagina, smells of, 19, 126–27, 161
Varian, Hal, 57–58, 224
Vikingmaiden88, 136–37, 140–41, 145
violence
and real science, 273
zooming in on, 190–97
See also murder
voter registration, 106
voter turnout, 9–10, 109–10
voting behavior, and lying, 106, 107, 109–10
Vox, 202
Walmart, 71–72
Washington Post, and words as data, 75, 94
Washington Times, and words as data, 75, 94–95
wealth
and life expectancy, 176–77
See also income distribution
weather, and predictions about wine, 73–74
Weil, David N., 99–101
Weiner, Anthony, 234n
white nationalism, 137–40, 145. See also Stormfront
Whitepride26, 139
Wikipedia, 14, 180–86
wine, predictions about, 72–74
wives
and descriptions of husbands, 160–61, 160–61n
and suspicions about gayness of husbands, 116–17
women
breasts of, 125, 126
butt of, 125–26
genitals of, 126–27
violence against, 121–22
See also girls; wives; specific topic
words
and bias, 74–76, 93–97
and categories/types of stories, 91–92
as data, 74–97
and dating, 80–86
and digital revolution, 278
and digitalization of books, 77, 79
and gay marriage, 74–76
and sentiment analysis, 87–92
and U.S. as united or divided, 78–79
workers’ rights, 93, 94
World Bank, 102
World of Warcraft (game), 220
Wrenn, Doug, 39–40, 41
Yahoo News, 140, 143
yearbooks, high school, 98–99
Yelp, 265
Yilmaz, Ahmed (alias), 231–33, 234, 234n
YouTube, 152
Zayat, Ahmed, 63–64, 65
Zero to One (Thiel), 155
zooming in
on baseball, 165–69, 165–66n, 171, 197–200, 200n, 203, 206, 239
benefits of, 205–6
on counties, cities, and towns, 172–90, 239–40
and data size, 171, 172–73
on doppelgangers, 197–205
on equality of opportunity, 173–75
on gambling, 263–65
on health, 203–5, 275
on income distribution, 174–76, 185
and influence of childhood experiences, 165–71, 165–66n, 206
on life expectancy, 176–78
on minutes and hours, 190–97
and natural experiments, 239–40
and origins of political preferences, 169–71
on pregnancy, 187–90
stories from, 205–6
on successful/notable Americans, 180–86
on taxes, 172–73, 178–80, 206
Zuckerberg, Mark, 154–56, 157, 158, 238–39