THE NEW HAVEN EXPERIMENTS

Don Green was still new to the Yale political science department when he began to suspect that his chosen discipline was intellectually bankrupt. In the 1950s, political scientists had started talking like economists, describing politicians and citizens as rational beings who acted to maximize their self-interest. Voters were believed to peruse a ballot the same way they examined a store shelf, calculating the benefits each product presented and checking the box next to the one offering the best value. “Voters and consumers are essentially the same people,” the economist Gordon Tullock wrote in his 1976 book The Vote Motive. “Mr. Smith buys and votes; he is the same man in the supermarket and in the voting booth.” By the time Green began teaching in 1989, such thinking was pervasive among his peers. They saw politics as a marketplace where people and institutions competed for scarce power and resources with the clear, consistent judgment of accountants.

This detached view of human behavior was particularly galling to Green, who was trained as a political theorist but found his greatest joy amid sophisticated board games. Growing up in Southern California, Green had played Civil War and World War II games with his brothers, a diversion he partly credits for his later interest in politics and history. When he first arrived at Yale, Green bonded with students and colleagues through games, which filled the interstices between classes and office hours, with a single competitive session often stretching over weeks. In the late 1990s, Green was playing at his colonial home in New Haven with his seven-year-old son and five-year-old daughter, using the plastic construction toy K’nex to build a lattice-like structure. The kids imagined spiderlike monsters moving from one square to the next. From this Green began to envision a new board game, in which Erector-set-like limbs could be grafted onto basic checkers-style coins and every piece would become dynamic. Tinkering in his spare time, Green created a deceptively simple two-player game on a two-dimensional grid. At each turn, a participant could move one of his or her starting pieces or add a limb that would increase its power by allowing it to move in a new direction. “When you’re playing chess, you play the hand you’re dealt, where here you build your own pieces,” says Green. “Imagine a game of chess where all the pieces start out as pawns.” To bring his game to market, Green needed a prototype, so he taught himself woodworking and built a studio in his basement—the first time in his life, he realized, that he had done anything truly physical. Within a year, a Pennsylvania company had agreed to produce Octi—in which each turn required a player to make a choice between moving and building, all while trying to anticipate the opponent’s response. Green described it as “an abstract idea of a game about mobilization.”

Watching people play Octi only illustrated what Green already believed about their behavior. Even in a board game, human beings were incapable of logically assessing all of their options and making the optimal decision each time. Yet rational-choice scholars thought this was what people did every time they participated in politics—and what frustrated Green most was that these claims were purely speculative. The rational-choicers had built entire theoretical models to explain how institutions from Congress to the military were supposed to function. The more closely the rational-choice model was applied to the way politics actually worked, the less it seemed able to explain. In 1994, along with his colleague Ian Shapiro, Green coauthored a book titled Pathologies of Rational Choice Theory, in which he argued that the ascendant movement in political science rested on a series of assumptions that had not been adequately demonstrated through any real-world research. “There was reason to think the whole thing might be a house of cards,” says Green.

When political scientists did try to explain real-world events, Green didn’t think the results were much better. The principal tool of so-called observational research was correlation, a statistical method for seeking out connections between sets of data. Academics relied on a declaration of “statistical significance” to explain just about everything, yet demonstrating a correlation rarely illuminated much. For instance, one element that defined twentieth-century politics was the fact that people who lived in urban areas voted overwhelmingly Democratic. Were cities pulling their inhabitants to the left, or were liberal people drawn to cities? Or was there some other explanation altogether for the pattern? Perhaps most frustrating of all to Green and a junior colleague, Alan Gerber, was their discipline’s inability to justify the individual decision to vote at all. Casting a ballot is the basic act of political behavior in a democracy, and yet political science offered little explanation for why people would bother when there was no legal requirement to do so. After all, considering the economic logic favored by rational-choicers, voting carried a known set of costs (the time and inconvenience of registering, learning about the candidates and going to the polling station) and little in the way of benefits (a tiny probability that an individual’s vote would affect government policies). “There was good reason to think no one should vote,” Green says.
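
That rational-choice argument reduces to a single line of arithmetic. The sketch below uses the conventional notation of that literature—a probability of being decisive, a benefit, a cost—but the specific numbers are invented for illustration and appear nowhere in Green and Gerber’s account.

    # The standard rational-choice arithmetic of voting; all numbers are illustrative.
    p = 1 / 10_000_000   # chance of casting the decisive vote in a large electorate
    B = 10_000           # dollar-equivalent value to the voter of her candidate winning
    C = 20               # cost of registering, learning about candidates, getting to the polls

    expected_payoff = p * B - C
    print(expected_payoff)   # roughly -20: on this logic, voting never "pays"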

Political scientists had toyed with this question for a generation, and by 1998 the most sophisticated thinking relied on the proposition that, as election day approached, voters calculated the likelihood they might be the pivotal vote deciding the race. In other words, before changing her plans to stop at a local firehouse on a rainy Tuesday in November, a harried working mother paused to assess the likelihood that she would cast the tie-breaking vote in a race with thousands, or even millions, of other citizens each making his or her own simultaneous calculations. “Is that how a typical voter thinks when he’s casting his ballot?” Gerber asked.

If we can’t explain what makes people vote, he and Green thought, let’s see if we can change the calculus behind doing so. To bolster their claim that basic political theories were unproven in the real world, Green and Gerber decided to do something political scientists were not supposed to do. They would conduct an experiment.

IN THE LATE SUMMER of 1998, Green and Gerber sat in adjacent, wood-paneled offices at Yale’s Institution for Social and Policy Studies, sheltered in a Richardsonian Romanesque building that was once a clubhouse for the secret society Wolf’s Head, and scoured all they could find of the experimental tradition in political science. As an undergraduate at Yale, Gerber had learned how field experiments had been taken up by policymakers, notably those developing Lyndon B. Johnson’s Great Society, to test the effects of new social programs. Perhaps the most famous were a series of experiments coordinated in 1968 by the White House to test the viability of a so-called negative income tax. The experiment, designed by a graduate student at the Massachusetts Institute of Technology, would randomize households below the poverty line to receive bonus payments and then measure their levels of employment afterward. The target was a major behavioral riddle that vexed the welfare state—how could the government give aid without undercutting the motivation to work?—and an empirical approach to solving it proved popular across the ideological divide. Running the federal Office of Economic Opportunity for the two years in which it oversaw the experiments were its director, Donald Rumsfeld, and his assistant, Dick Cheney. “In the deep recesses of my mind was the notion that some kind of large-scale experimentation was a thing that social scientists at one point or another did,” says Gerber.

But that interest had never really pervaded the study of elections. Gerber and Green were surprised to find that the use of field experiments had begun, and effectively ended, with the publication of Harold Gosnell’s Getting Out the Vote in 1927, and they were excited by the possibilities that opened up for them. “There are very few things in academia that are more exciting,” says Green, “than doing things that either haven’t been done before or haven’t been done in a very long time.” So he and Gerber began to read more generally about the origins of field experiments in other areas. The term hinted at the history: the earliest randomized trials grew out of searches for fertilizer compounds conducted by nineteenth-century researchers for the nascent chemical industry.

Each season, scientists at the Rothamsted Agricultural Experimentation Station in England would take a blend of compounds such as phosphate and nitrogen salts, alter the ratio of the chemicals, and sprinkle it over plots of rye, wheat, and potato planted in the clay soil of the estate north of London. One year’s plant growth would be compared with the next, and the difference was recorded as an index of fertility for each chemical mixture. When the pipe-smoking mathematician R. A. Fisher arrived in 1919 and examined more than seventy years of experiments, he realized that the weather probably had had more to do with the variations in growth than the chemical blend. Even though Rothamsted researchers tried to account for the volume of rain in a given season, there were many other things that varied unpredictably and even imperceptibly from year to year, like soil quality or sun or insect activity. Fisher redrew the experiment so that different chemical ratios could be compared with one another simultaneously. He split existing plots into many small slivers and then randomly assigned them different types and doses of fertilizer that could be dispensed at the same time. The size and proximity of the plots ensured that, beyond the varying fertilizer treatments, they would all experience the same external factors.
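
The mechanics of that redesign are simple enough to sketch in a few lines. The snippet below illustrates only the logic—shuffle, then deal treatments out evenly—not Fisher’s actual procedure; the treatment names and plot counts are hypothetical.

    import random

    # Hypothetical treatments and plot slivers, for illustration only.
    treatments = ["phosphate-heavy", "nitrogen-heavy", "balanced", "no fertilizer (control)"]
    slivers = [f"plot-{i:02d}" for i in range(1, 25)]   # 24 imagined slivers

    random.seed(1919)        # fixed seed so the assignment can be reproduced
    random.shuffle(slivers)  # random order, so no systematic pattern in soil or sun

    # Deal the shuffled slivers round-robin across the treatments, like cards from a deck.
    assignment = {t: sorted(slivers[i::len(treatments)]) for i, t in enumerate(treatments)}

    for treatment, plots in assignment.items():
        print(f"{treatment}: {', '.join(plots)}")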

Not far from Fisher, a young economist named Austin Bradford Hill was growing similarly impatient with the limits of statistics to account for cause and effect in health care. In 1923, for example, Hill received a grant from Britain’s Medical Research Council that sent him to the rural parts of Essex, east of London, to investigate why the area suffered uncommonly high mortality rates among young adults. Hill returned from Essex with an explanation that had little to do with the quality of medical care: the healthiest members of that generation quickly left the countryside to live in towns and cities. The whole British medical system was built on similarly misleading statistics, and Hill worried that the faulty inferences drawn from them put people’s health at risk. Hill joined the Medical Research Council’s scientific staff and began writing articles in the Lancet explaining to doctors in straightforward language what concepts like mean, median, and mode meant.

But even as he worked to educate the medical community about how to use the statistics it had—most from the rolls of life and death maintained by national registrars—Hill knew the quality of the numbers themselves was a potentially bigger problem. In medicine, “chance was regarded as an enemy of knowledge rather than an ally,” writes historian Harry M. Marks. When clinicians ran controlled experiments, they looked to find two patients as similar as possible in every measurable respect, treat them differently, and attribute the outcome to the care they received. But Hill thought that this matching process—or alternating treatments on patients in the order they were admitted to a hospital—would always let uncontrolled variables leak in. “It is obvious that no statistician can be aware of all the factors that are, or may be, relevant,” he wrote.

In 1943, a New Jersey chemist isolated streptomycin, an antibiotic that put up a promising fight against tuberculosis, and the pharmaceutical manufacturer Merck began producing it in large volumes. After the war ended, several companies in the United Kingdom, where the disease killed twenty-five thousand residents annually, made plans to introduce their own streptomycin. Meanwhile, a Mayo Clinic tuberculosis researcher traveled to London and Oxford to trumpet findings from his successful laboratory experiments on guinea pigs. The Medical Research Council received fifty kilograms of streptomycin and was quickly overwhelmed by requests from tuberculosis patients for some of the miracle cure. For Hill, who had become honorary director of the council’s statistical research unit, the medicine shortage offered a promising opportunity to try a new type of experiment.

There was only a distant precedent for the idea of randomly splitting patients into separate groups and measuring the varying effects of treatments on each. The seventeenth-century Flemish physician and chemist Jan Baptista van Helmont had defended his technique by daring academic rivals to “take out of the hospitals, out of the camps, or from elsewhere, 200 or 500 poor People that have Fevers, Pleurisies, etc. Let us divide them into halfes, let us cast lots, that one half of them may fall to my share, and the other to yours … we shall see how many funerals both of us shall have.” Into the twentieth century, however, such controlled testing came to be seen as ethically dodgy, since it meant consciously denying the best known care to those who wanted it. But because there wasn’t enough streptomycin for everyone who requested it, the council had no choice but to leave people untreated. A decision to pass over some people randomly, Hill realized, offered an opportunity to improve the statistical quality of an important clinical experiment—and would also be a fairer method of distributing potentially lifesaving medicine.

Hill set out to translate Fisher’s technique from the farm to the hospital. Each of the 107 tuberculosis patients in Hill’s study was randomly assigned a number that put him in one of two treatment groups. The “S” cases were to receive two grams of streptomycin daily, spread over four doses, along with bed rest. The “C” cases were assigned only bed rest. Even once admitted to the hospital, a patient never learned which treatment he or she had been assigned.

After one year, Hill’s investigators reviewed the health of the whole sample: a majority of the patients assigned streptomycin, 56 percent, had improved their condition over the course of their hospitalization, compared with 31 percent of the control sample. Over the same period, 22 percent of the S cases had died, compared with 46 percent of the C cases. Since Hill had randomized the treatment, there was only one way to explain the result: the new medicine worked. British companies began manufacturing streptomycin, which became an essential tool in the bags of doctors fighting tuberculosis, since most of the others were scalpels to cut a hole in the patient’s chest and an air pump to collapse the infected lungs.
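
The arithmetic behind that conclusion is worth making explicit. A minimal sketch, using only the rates reported above (the group sizes are not given here, so no formal significance test is attempted):

    # Because patients were assigned at random, the simple difference between
    # groups can be read as the effect of the drug itself.
    improved = {"S": 0.56, "C": 0.31}   # share whose condition improved
    died     = {"S": 0.22, "C": 0.46}   # share who died over the same period

    improvement_effect = improved["S"] - improved["C"]
    mortality_effect = died["S"] - died["C"]

    print(f"Improvement attributable to streptomycin: {improvement_effect:+.0%}")
    print(f"Change in mortality: {mortality_effect:+.0%}")   # negative means fewer deaths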

In an era in which wonder drugs emerged from labs worldwide, such blind randomized-control experiments quickly became the dominant tool for demonstrating that a new treatment worked and carried no debilitating side effects. When, a few years later, Jonas Salk developed a vaccine for polio, a successful large-scale randomized experiment—involving 1.8 million children—was a natural way to test it. In 1962, the Food and Drug Administration changed its standards to require “adequate and well-controlled investigations,” and not merely clinical judgment, before approving a drug for wide use.

Gerber and Green believed they could bring this approach into politics. They wanted to explain electoral behavior with the same degree of authority that doctors now had in describing therapeutic care. Instead of patients, they would randomize individual households to separate the factors that affected voter participation. Most of the money in major campaigns was spent on television and radio, and it was impossible to treat one voter differently from a neighbor when broadcast waves covered a whole region. But individualized forms of contact—a canvasser’s knock on the door, a pamphlet arriving in the mail, live or recorded phone calls—could be easily isolated.

Gerber, three years younger than Green, was a political scientist by curiosity more than training. Gerber had arrived at Yale from MIT, where he earned a graduate degree in economics just as the school was becoming known as a center for cutting-edge research employing novel tools to examine subjects outside the typical bounds of economic study. Gerber and a classmate, Steve Levitt, kept being drawn to political questions, like whether the winner’s fund-raising advantage could explain the outcome of congressional elections. (Levitt later won a John Bates Clark Medal and cowrote the bestselling book Freakonomics, exploring subjects such as the hierarchy of drug gangs and the ethics of sumo wrestlers.)

In his dissertation, Gerber used economic techniques to answer the type of question usually left to political scientists or historians: what happened when the United States adopted the secret ballot in the 1880s? That moment, when Americans went from picking their candidates aloud in crowded pubs to making their selections in curtained solitude, was key in forming the country’s modern political culture, but it had never been analyzed in that way. “Until you read about the adoption of the secret ballot it would never occur to you that the secret ballot would need to be adopted,” says Gerber. He wanted to know whether the shift had had an impact on how many Americans turned out on election day, how incentives might have changed when voting was converted from a public act to a private one. “At the most general level it seems pretty obvious that the payoffs for voting are social and psychological, not instrumental,” says Gerber. “It seems very hard to imagine people figuring out the idea of the payoff for voting being literally your odds of being the pivotal vote.”

The world of elections was not an academic abstraction to Gerber, who had spent just enough time around political campaigns to be interested when Green suggested putting their methods to the test. The stories that stuck with him from his own campaign experiences were ones that revealed a deep crisis of knowledge among those who practiced politics for a living. In 1987, not long after graduating from Yale, a twenty-three-year-old Gerber went to work as the New Hampshire scheduling director on Paul Simon’s presidential campaign, responsible for managing the Illinois senator’s itinerary in the first primary state. One day, he fielded a call from Simon’s top Illinois-based consultant, a former journalist named David Axelrod, who was working on a batch of radio ads attacking one of Simon’s midwestern rivals, Missouri congressman Dick Gephardt. “He even supported the neutron bomb,” one of Axelrod’s scripts read.

“For reasons not entirely clear to me, he asked the scheduler,” recalls Gerber, referring to himself, “ ‘How do you think that’ll play in New Hampshire?’ ”

“I’m not sure people in New Hampshire will know what the neutron bomb is,” Gerber told Axelrod.

After he started teaching at Yale in 1993, Gerber interned for a summer in the Washington office of Democratic pollster Mark Mellman to get a different perspective on the way campaigns worked. One of the firm’s polls itemized a list of qualities and asked voters whether they would be more likely to support a candidate with each trait—including, to Gerber’s amusement, “doesn’t listen too much to polls.”

Just as Green was coming to question his discipline, some political scientists had begun to conclude that the whole political-consulting profession was a farce. With nearly a half century of rich electoral data and ever-better measurements of national conditions, researchers had concluded that they could explain presidential outcomes with a basic set of facts—primarily which party held power and how the economy fared while it did. Ads, debates, candidate speeches, and election organizing were mere spectacle at the margins of a predetermined outcome. The debate was summarized by an unusually succinct question: Do campaigns matter?

The more time Gerber spent within Mellman’s polling operation, the more he appreciated that political scientists themselves lacked the tools to ever arrive at a convincing answer. Much of what academics thought they knew came from exit polls and post-election surveys like the University of Michigan’s National Election Studies. Pollsters would ask people whether they had voted and whether they had been contacted by a campaign before the election, then look for a correlation between the two. Gerber saw flaws in this method. He assumed that the people who answered polls were more likely to be those reachable by campaigns. Logic also suggested that respondents highly attuned to politics were more likely than others to remember when they were contacted by campaigns. And campaigns decided which voters to contact in the first place based on their own calculations of who was more likely to vote. “If you put this all together, you get a causal explanation of who knows what?” says Gerber. “These are technical issues, but until they are resolved you have no good answer to the question you are trying to understand.”
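
Gerber’s objection can be made concrete with a toy simulation. In the sketch below every quantity is invented, and contact is built to have no effect at all on turnout; because the simulated campaign targets likely voters, the naive survey-style comparison still makes contact look powerful.

    import random

    random.seed(0)
    voters = []
    for _ in range(100_000):
        propensity = random.random()           # each voter's underlying likelihood of voting
        contacted = propensity > 0.6           # campaigns target the voters already likely to vote
        voted = random.random() < propensity   # turnout is driven only by propensity
        voters.append((contacted, voted))

    def turnout(group):
        return sum(voted for _, voted in group) / len(group)

    contacted_group = [v for v in voters if v[0]]
    uncontacted_group = [v for v in voters if not v[0]]

    print(f"Turnout among contacted voters:   {turnout(contacted_group):.1%}")    # about 80%
    print(f"Turnout among uncontacted voters: {turnout(uncontacted_group):.1%}")  # about 30%
    # The gap is entirely selection; contact changed no one's behavior by construction.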

SHORTLY BEFORE ELECTION DAY in 1998, Don Green and Alan Gerber walked through the streets of New Haven trying to monitor the dozens of students they had dispatched across the city, keeping an eye on approaching rain clouds and fretting that when they arrived there wouldn’t be enough umbrellas to keep everyone dry. They had recruited students off Yale bulletin boards, promising the generous pay of twenty dollars per hour, and assigned them to pairs according to buddy-system precepts. Where possible, Gerber and Green tried to hire local residents to serve as Sherpas in unfamiliar city neighborhoods. They checked in with their employees often and called everybody back in from the field at dusk. “We tried not to take chances,” says Green. The students were knocking on doors to encourage people to vote in an upcoming election; that Green referred to this as “dangerous work” was evidence of just how detached, and sheltered, political science had grown from the world it supposedly studied.

Gerber and Green had designed a field experiment to measure what effects, if any, the most fundamental campaign methods could have on an election’s outcome. They had selected three basic modes of voter contact—an oversized postcard arriving by mail, a scripted ring from a far-off call center employee, and a doorstep visit from a canvasser—and within each a series of different appeals to participate on November 3. One message pointed to an idea of civic duty, with an image of Iwo Jima, under the slogan “They fought … so we could have something to vote for.” Another raised themes of community solidarity: “When people from our neighborhood don’t vote we give politicians the right to ignore us.” The last emphasized the prospect of a close election, illustrated with a “Dewey Defeats Truman” headline. “Will yours be the deciding vote?” the postcard version asked. Various combinations of mode, message, and number of contacts were randomly deployed across thirty thousand New Haven voters scattered among twenty-nine of the city’s thirty wards. (To “get away from students,” as they later put it, the Yale professors removed the ward including the university from their study.) A control group would go without any contact. Afterward Gerber and Green would check the electoral rolls maintained by the town clerk to measure the influence of each type of contact on voter turnout.
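
The assignment logic behind that design can be sketched in a few lines. Everything below is a simplification: the cell labels are paraphrased from the description above, the number of contacts applied only to some modes, and the actual study did not weight every cell equally.

    import itertools
    import random

    modes    = ["mail", "phone", "canvass"]
    messages = ["civic duty", "neighborhood solidarity", "close election"]
    contacts = [1, 2, 3]   # e.g., up to three mailings per household

    # Cross mode, message, and number of contacts, and add an untreated control cell.
    cells = list(itertools.product(modes, messages, contacts)) + [("control",)]

    random.seed(1998)
    households = [f"household-{i:05d}" for i in range(30_000)]
    assignment = {h: random.choice(cells) for h in households}

    # After election day, turnout from the town clerk's rolls is compared cell by cell.
    print(assignment["household-00000"])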

Despite the national furor over the looming impeachment of Bill Clinton, there was little suspense about the outcomes of the top statewide races in Connecticut that fall. The state’s popular Republican governor, John Rowland, and its longtime Democratic senator, Chris Dodd, were both going to be comfortably reelected. But the experiment made no reference to the particulars of that year’s ballot, largely because Gerber and Green had chosen to partner with nonprofit groups prohibited by the tax code from taking a side in elections. That summer the two professors had presented their plan—which amounted, in essence, to creating their own political action committee for the sake of the experiment—to the local League of Women Voters chapter, which agreed to attach its name to the project. Then Gerber and Green found a Connecticut foundation willing to put up nearly fifty thousand dollars for the operation in the hopes that it would yield new strategies for increasing civic engagement after a generation of falling participation nationwide.

In that regard, the timing of Gerber and Green’s gambit was fortuitous; everybody wanted to understand why Americans seemed to be retreating from public life. Three years earlier, Harvard professor Robert Putnam had emerged as the most visible political scientist in the country on the basis of a journal article titled “Bowling Alone: America’s Declining Social Capital.” (It would later become the basis for a bestselling book.) Putnam looked at declining membership figures in nonpolitical community organizations—from bowling leagues to Elks Lodges and the League of Women Voters, whose ranks had shrunk by nearly half nationwide since the late 1960s—to argue that a distinctively American civil society had dissolved into a fizz of solitary entertainments and self-interest. Putnam mostly kept clear of electoral politics in his article, but the same pattern was apparent there, too: between 1960 and 1988, voter turnout rates in presidential campaign years fell by 12 percentage points. A director of the University of Michigan’s National Election Studies at the time, Steven Rosenstone, dove into decades’ worth of survey data in search of an explanation. In a 1993 book resulting from that effort, Rosenstone and John Mark Hansen spread responsibility widely. The electorate had expanded (with the constitutional lowering of the voting age to eighteen) while individual voters grew disengaged (they were less attached to candidates and parties, and had lost confidence in their electoral power). But much of the blame, according to Rosenstone and Hansen, belonged to politicians themselves for losing touch with voters as they embraced new media. Party organizations that had once mobilized votes by speaking directly with their constituents had receded into the background, replaced by candidate campaigns that chose to blare their messages over the airwaves.

In early 1999, Gerber and Green waited restlessly for local election authorities to update New Haven’s individual voter histories to reflect who had cast a ballot in November. They were in an unusual situation for a pair of political scientists: they did not know what they would be arguing, if anything at all, when it came time to publish the results. But they already had a hunch. In the days before the election, Gerber and Green were able to patch into the North Dakota call center they had hired to dial voters, and they were amazed by how perfunctory the exchanges were. The caller often sounded like he or she was rushing through the script to get to the end before the recipient hung up. (Most political call centers are paid based on the number of “completes” they fulfill.) Even when the caller successfully reached the end of her script, the academics listening in heard little that made them think the voter was being engaged by the appeal, or even listening. “There’s no way this can work,” Gerber told Green as they eavesdropped on one call.

He was right. When the results of the experiment came in, the phone calls showed no influence in getting people to vote. The direct-mail program increased turnout a modest but appreciable 0.6 percentage points for each postcard sent. (The experiment sent up to three pieces per household.) But the real revelation was in the group of voters successfully visited by one of the student teams: they turned out at a rate 8.7 percentage points higher than the control sample, an impact larger than the margin in most competitive elections. When Gerber and Green reread Rosenstone and Hansen, they began to question whether the authors had fully accounted for the historic drop in turnout during the late twentieth century. Maybe the issue wasn’t just that campaigns were mobilizing Americans less, but that even the new forms of individual contact lacked a personal touch. A message that may have once been spoken at the doorstep would now come facelessly by phone or mail. The professionalization of such consulting services, and the growth in campaign budgets to employ them, meant that it was often easier to find paid workers to deliver a message than to recruit and manage volunteers.
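
One wrinkle lurks behind the canvassing figure: canvassers reach only a fraction of the voters they are sent to, so the effect of being assigned a visit and the effect of an actual visit are different quantities. Gerber and Green’s published analysis handles that distinction; the sketch below shows only a bare-bones version of the standard adjustment, with made-up numbers.

    # All numbers are hypothetical, for illustration only.
    turnout_assigned_visit = 0.450   # turnout among households assigned a canvasser
    turnout_control        = 0.420   # turnout among the untreated control group
    contact_rate           = 0.30    # share of assigned households actually reached

    intent_to_treat = turnout_assigned_visit - turnout_control   # effect of being assigned a visit
    effect_of_visit = intent_to_treat / contact_rate             # estimated effect of an actual visit

    print(f"Effect of assignment to canvassing:  {intent_to_treat:+.1%}")
    print(f"Estimated effect of an actual visit: {effect_of_visit:+.1%}")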

Other academic research had shown that in-person appeals were particularly effective in encouraging other “prosocial” behaviors, like recycling newspapers or donating blood. How to get people to undertake activities that offered no individual-level benefits but helped the community as a whole was a conundrum that theorists called the problem of “collective action,” and it sat at the center of most rational-choice explorations of why people vote (or don’t). In their experiments, Gerber and Green concluded, the costs and benefits of voting had not been appreciably altered by the volunteers’ visit, but the visit had certainly changed whatever internal calculus people used when deciding whether to go to the polls. “There’s nothing about that which should make you more likely to vote,” says Green. “It was a collective-action problem before I showed up, and it’s a collective-action problem after I showed up.” But now his experiment was pointing to a potential solution: maybe one way you could get people to vote was simply to have other people ask them to.

The Yale researchers began turning their findings into an article with an understated title, “The Effects of Canvassing, Telephone Calls, and Direct Mail on Voter Turnout: A Field Experiment,” that reflected the modest pragmatism of their accomplishment. What had started as a pilot study to settle internecine disputes within their discipline turned out not to yield any bold theoretical insights. Instead, Gerber and Green had stumbled into almost embarrassingly practical but valuable lessons in street-level politics. They sent their paper to the American Political Science Review, the discipline’s most prestigious venue and the same one where Gosnell had published his work seventy-five years earlier. The paper was rejected. “In short, its findings are entirely confirmatory of previous work. The paper does not offer any new theory about voter behavior,” an anonymous peer reviewer wrote. “That said, I think the study is useful, and I wish the authors luck in getting it published elsewhere.”

Gerber and Green successfully appealed the decision, and in September 2000 their study appeared in the journal, though only under the secondary designation of “research note.” But the fusty peer-review standards did not much concern the campaign professionals whose methods were under examination, and who welcomed the Gerber-Green study with the same mixture of dread and flattery that washing machine dealers likely felt when the inaugural Consumer Reports came out. Political operatives had been trained to view political scientists with skepticism, even hostility. They saw academics as intellectual snobs with no practical experience, conjuring abstract models on college campuses far removed from the chaos and urgency of real campaigns. “Those smart guys speak that smart language. They collect smart theories to properly arrange their smart facts. Then they publish smart papers to make sure people know they are real smart,” says Tom Lindenfeld, a former Democratic National Committee campaign director and one of the party’s leading field tacticians. “The rest of us just know what works.”

The Gerber-Green experiments, though, were hard to overlook. The findings assailed many of the consulting class’s business models and provoked a minor civil war within it. Direct-mail vendors happily used the Gerber-Green findings to suggest that candidates would be wasting their money on phone calls. “It created a furor because what it effectively was saying was that a lot of the expenditures weren’t getting you bang for the buck,” says Ken Smukler, a Pennsylvania-based consultant.

As word of the New Haven findings circulated, battered photocopies of the article passed from one hand to the next like social science samizdat. “A lot of what gets done on campaigns gets done on the basis of anecdotal evidence, which often comes down to who is a better storyteller. Who tells a better story about what works and what doesn’t work?” says Christopher Mann, a former executive director of the New Mexico Democratic Party. “It might be that their phone script made a difference—or it might be that one was Alabama and one was Arkansas and they were fundamentally different races.”

Gerber and Green had identified the first tool that was able to satisfyingly disentangle cause and effect and demonstrate what actually won votes. “It became really obvious to me very quickly that my quibbles about what had been going on in campaigns were being addressed with field experiments,” says Mann. The next year he applied to graduate school at Yale so that he could study with Gerber and Green, who were eager to have someone with Mann’s political experience as they plotted new field trials. “They had a very clear sense that what they had done to that point was just scratching the surface,” says Mann. “They had an idea that there was an audience for this stuff among campaign folk. But they needed to understand how to ask the questions that mattered to campaigns and not just academia.”

In the fall of 2000, Gerber and Green were invited to speak to the Carnegie Corporation, one of many civic-minded institutions that had added dwindling voter turnout to their list of concerns over the course of the 1990s. Because the tax code allowed nonprofit organizations to run registration and turnout drives as long as they did not push a particular candidate, organizing “historically disenfranchised” communities (as Carnegie described them) became a backdoor approach to ginning up Democratic votes outside the campaign finance laws that applied to candidates, parties, and political action committees. Major liberal donors got into the GOTV game: Project Vote organized urban areas, Rock the Vote targeted the young, the NAACP National Voter Fund focused on African-Americans. “You were seeing much more energy devoted to turnout,” says Thomas Mann, a Brookings Institution scholar who hosted an event with Gerber and Green in a Capitol Hill committee room at the time. “They were putting resources into it, and didn’t have a very good way of measuring the effectiveness of it.”

When Gerber and Green stepped into a conference room at Carnegie, they unwittingly stumbled into an epic battle for resources within lefty interest groups. Ground-level field organizers had been losing their share of budgets to broadcast ads and commercial-style marketing campaigns, for what the organizers believed was no reason other than that mass-media platforms looked sexier. The Gerber-Green study demonstrating that door-knocking delivered results was the redemptive evidence for which they had long waited, salvation with footnotes. “Someone off to my right whispered, ‘This is like the Beatles,’ ” Green recalls of the Carnegie visit, the air particularly electric for a think-tank session. “It was only beginning to dawn on me why we were heroic figures.”