six

Development as Escape from Poverty

What vast amount of misery, ruin, loss, privations [people's banks] have either averted or removed, penetrating, wherever they have once gained a footing, into the smallest hovel, and bringing to its beggared occupant employment and the weapons wherewith to start afresh in the battle of life, it would tax the powers of even experienced economists to tell.

—HENRY WOLFF, 18961

Control groups in theory correct for the attribution problem by comparing people exposed to the same set of conditions and possible choices. However, control-group design is tricky, and skeptics hover like vultures to pounce on any weakness.

—ELISABETH RHYNE, 20012

In January 2008 I visited Cairo, Egypt, for a few days. My official purpose was to speak at a United Nations conference whose premise I did not understand and did not try very hard to understand. It took place just off Tahrir Square, in a high-rise hotel with burly security guards and a clientele of Saudis who came to pursue pastimes more effectively proscribed in their own capital.

Outside the conference hall, I devoted most of my waking hours to two contradictory activities. On a laptop computer in the hotel room, I worked to reconstruct and scrutinize what was then the most rigorous and influential study of the impacts of microcredit on borrowers, one that Grameen Bank founder Muhammad Yunus often cited as showing that 5 percent of his borrowers climb out of poverty each year.3 I found that as I progressed on the reconstruction, my doubts about the study intensified: it is one thing to show a correlation, another to prove causation. By extension, I increasingly doubted most other studies of the impact of microcredit, which were less rigorous. I began to see that after thirty years, there was little solid evidence that the microfinance movement lived up to its claims of reducing poverty.

But I also took time to visit a fast-growing microlender called the Lead Foundation. Happily, it was disbursement day at a branch office of the foundation in the poor district of Shoubra. Hundreds of women, clad modestly in hijabs, many towing their children, thronged the office lobby. The crowd overflowed into the hallways, down the stairwell, and onto the street. The women were waiting hours to get new loans. Some were there for the first time. The rest had just repaid smaller loans and were stepping up to the next. My Lead Foundation guide, who told me to call him George, ushered two groups of five women each into the branch director's office to talk with me. Through George's translation, I learned that every woman would use the credit to finance informal retail. In one group, Rasha and her sister Hala sold clothes and makeup, respectively; their cousin Doaa traded in women's accessories and scarves, while their aunt Samoh peddled clothing, too; and their neighbor in the same building, Anayat, sold bed sheets. Since the women spoke in the presence of bank employees, I only half-believed their stories. Possibly they were required to assert business activities in applying for the loans. But whatever they did with the credit, they sure seemed to want it.

I reflected on the absurdity of my situation. Should I tell these women, who seemed to be seizing loans to run the gauntlets of their lives, that on a computer back in my hotel room, I had performed conditional recursive, mixed-process, maximum-likelihood regressions on a cross-section of household data from Bangladesh in the early 1990s, and I wasn't so sure this microlending was a good idea? Of course not. Unless I had compelling evidence that microcredit harms, which I did not, who was I to tell them how to live their lives?

At the same time, I believed that aid-funded projects should be rigorously evaluated. Billions of dollars are given away every year to fight poverty—including from the U.S. Agency for International Development to the Lead Foundation—yet so little of this giving is guided by rigorous analysis of what works. Surely that is a recipe for massive waste.

So, in that branch director's office in the Shoubra district of Cairo, I hit a paradox. How could I square my concern about the lack of scientific evidence with the evidence before my eyes that something good was happening, something hard to gainsay?

My resolution of that paradox is the basis for the next three chapters, which confront head-on the question of whether microfinance “works.” I realized that in the global conversation about microfinance are embedded three distinct notions of success in microlending. Each revolves around a different concept of development: as escape from poverty, as freedom, or as industry building. Each leads to different questions that tend to be tested with different kinds of data. If microfinance's success at eliminating poverty had been incontrovertibly proven, perhaps it would trump consideration of the other two concepts. But as this chapter explains, I believe the proof is lacking.

That makes the other two concepts essential to a full assessment of microfinance. One of these—development as freedom—focuses on the extent to which microfinance gives people more or less control over their circumstances: Does it empower women? Does it entrap some in debt? Development as freedom resonates with the client perspectives in chapter 2. The other conception—development as industry building—focuses on when microfinance institutions enrich the economic fabric of nations. It resonates with the business perspective of chapter 5.

About a year after returning from Cairo, I toured the websites of American microfinance groups, all of which evinced confidence that microfinance helps people out of poverty. Kiva invited me to “lend to a specific entrepreneur in the developing world—empowering them to lift themselves out of poverty.” FINCA had launched a “historic campaign to create 100,000 Village Banks and lift millions out of poverty by 2010.” The Microcredit Summit Campaign had set a goal to “help 100 million families rise above the…$1 per day threshold by 2015.” Opportunity International stated simply, “Microfinance: A Solution to Global Poverty.” Not to be outdone, Acción International invited me to visit lendtoendpoverty.org and “Ask the World's Economic Leaders to Make Microfinance a Focus.”4

What to make of such claims? Common sense says that the effects of microcredit vary. If you lend three friends $1,000 each, they will do different things with the money and achieve different outcomes by luck or skill. One might pay heating bills. The other two might start catering businesses, one to succeed, one to fail. Credit is leverage. Just as bank loans let hedge funds bet bigger than they could with their capital alone, microcredit lets borrowers gain more and lose more than they could alone. Among the millions of borrowers, microcredit no doubt lifts some out of poverty even as it leaves others worse off. Less obvious are the average effects on such things as household income and enrollment of children in school.

Academics respond to such claims and complexities by calling for studies. Their impulse, a good one, is to go beyond a few haphazardly collected stories of microfinance users. They go to “the field,” gather data from a larger, more representative set of “subjects,” and analyze it systematically from thousands of miles away. Some practitioners who work daily with appreciative customers roll their eyes at the researchers who seek enlightenment in such a stilted mode. But pursuit of empirical truth is what researchers are trained for, and just as they should hesitate to tell practitioners how to do their jobs, practitioners should acknowledge researchers' competence in measuring microfinance's social return.

Dozens of studies have attempted to measure average effects of microfinance.5 Yet in the face of all that data collection, number crunching, and reporting, a recent World Bank review concluded that “the evidence…of favorable impacts from direct access of the poor to credit is not especially strong.”6 To help others understand that conclusion, in this chapter I introduce some statistical concepts that equip non-statisticians to view statistical studies in a properly critical light, and I then review the best studies.

Statisticians sometimes speak of research producing a “negative result.” By that, they do not mean that they have found a clear negative impact, such as of smoking on health, but that they have failed to find evidence of impact positive or negative. In that sense, the conclusion of this chapter is doubly negative: most studies of the effects of microcredit on poverty are not capable of producing credible evidence one way or the other; and the few that are capable find no clear impact. On the limited high-quality evidence so far available, the average impact of microcredit on poverty is about zero. In contrast, the one high-quality study of microsavings does find economic gains. Overall, the verdict on microfinance from research that directly assesses the impact on poverty is muted.

The Challenges of Studying Impacts

The year 2009 was a milestone in the study of whether microcredit reduces poverty—and it was a tough year for those who said it did. In the summer of 2009, results from the first randomized studies of the subject reached the public. (At the same time, economist Jonathan Morduch and I released a paper questioning the leading non-randomized study, using the analysis on which I had labored in Cairo.) The new randomized studies, more rigorous than older ones, threw microfinance promoters back on their heels: at least over the first fifteen to eighteen months of availability, microcredit appeared to make no dent in poverty rates. Predictably, newspapers accentuated the negative. “Perhaps Microfinance Isn't Such a Big Deal After All,” ran a headline in the Financial Times.7 Said the Boston Globe: “Billions of dollars and a Nobel Prize later, it looks like ‘microlending’ doesn't actually do much to fight poverty.”8

The next spring, recognizing an existential threat, six American microfinance groups banded together to release a joint statement on the power of microfinance. Going by word count, their largest countervolley was a sequence of vignettes of successful clients, one from each of the six signatories.9 In Peru, Delia Fontela used loans to build a rental house and educate her daughters. In Afghanistan, Roqia started a tailoring business. To me, the document betrayed a misunderstanding of the purpose of research, which is the pursuit of responsible generalizations. I blogged:

The question stands: what is the overall pattern of impacts?

Think about how you might answer that question with stories. How would you go about collecting the stories by which to judge? Probably you'd start to think about how to build a representative sample; how big you could practically make the sample; and how to distill the core elements of each story into a few common terms in order to allow summary statements, such as the percentage that are success stories. This kind of thinking—not fancy statistical arguments—is what these new microfinance impact studies are really about. It's hard to see how to avoid it if you want to understand the impacts of microfinance.10

There is little reason to doubt the six stories in that statement. But there must be many other kinds of microfinance stories, which is what makes finding the right generalizations so hard. Take the case of a loan to a woman. She might invest in a calf, and the calf might die. Or she might invest in vegetable trading, depressing market prices for vegetables and reducing other women's earnings. She might substitute the loan for credit from a moneylender, cutting her interest payments, but not immediately augmenting her capital. She might not invest, as we saw in chapter 2, but instead buy rice to feed her family. She might be empowered by the loan, gaining a measure of freedom from oppressive, gender-based obligations within her extended family or patron–client relationships outside the family. On the other hand, the woman might “pipeline” the loan, handing the cash to her husband and gaining little new power for herself. The mutual dependence of joint liability might unify her and her neighbors into a local political force or oppress them with peer pressure. Some of these outcomes would be unfortunate; others would be fortunate but would not fit the archetypal story. Among fifty borrowers within a single slum or village, all these things could happen.

A researcher wanting to make sense of this complexity by measuring average effects of microfinance must take three difficult analytical steps. First, the details of people's lives must be observed. That is not as easy as it might seem. One can live in a village for a year or dispatch surveyors door to door, but the information gleaned will never be complete or completely accurate. Second, researchers must estimate the counterfactual, what the lives of microfinance users would have been like without the microfinance. Measuring the world as it isn't is even harder than measuring it as it is. Third, since the complexity of the effects—different for each person—would exceed the grasp of the human mind, they must be distilled to a more manageable, yet ideally representative, set of stories and statistics.

Methods for Studying Impacts

In practice, then, social science research makes simplistic generalizations about variegated experiences using incomplete data about the world as it is and untestable assumptions about the world as it would have been. That does not make social science hopeless, but it does force choices in performing it. As in social science generally, researchers have studied the impacts of microfinance using several methods, which address the various impediments with varying degrees of success.

Researchers use two main methods to understand the impacts of microfinance. Qualitative research involves observing, speaking with, even living with, a few dozen or hundred people to grasp the complexities of a phenomenon of interest. Usually the core of qualitative research is a sequence of in-depth interviews with a defined set of subjects, such as women in a particular slum in Buenos Aires or vendors in a Ugandan market. The interviews can range from rigidly planned sequences of queries to natural conversations motivated by broad questions. Regardless, because of the intense exposure to one milieu, the researcher inevitably picks up extra information in unplanned ways.

All people do qualitative research to understand and navigate the world around them. Qualitative research exploits this innate human capacity to build up a rich picture of a particular setting. Done well, it penetrates the sheen of half-truths interviewees may serve up to transient surveyors. Its weaknesses are narrowness and subjectivity. “As the inclusion of the observer within the observed scene becomes more intense,” anthropologist Margaret Mead wrote, “the observations become unique.”11 Two researchers living in two different villages—or even the same village—might perceive different realities; how then to generalize from one or two specks on the globe?

The other major approach to research is quantitative. Numerical data are collected on a set of “observational units” (people, families, villages, countries), and then avowedly dispassionate, mathematical methods are used to extract signals from the noise of local happenstance. Because the question that drives this chapter is about the measured, average impact of microfinance, I focus on quantitative research. The next chapter, on development as freedom, draws more on qualitative work.

Researchers collecting data, whether quantitative or qualitative, face a trade-off between depth and breadth. Expanding data sets to cover more people reduces vulnerability to distortion by a few quirky instances. That is why pollsters interview thousands rather than dozens. By using larger, more representative samples, large-scale quantitative studies can do better at the third research step, generalizing across instances. The cost usually is a shallower understanding of the individuals studied. Qualitative researchers, in contrast, can collect quantitative data on small sets of subjects, and because these data can be carefully observed, they can be of high quality. In exchange for depth (precision), such researchers sacrifice breadth (representativeness).

Most heavily quantitative impact analysis is done on data from surveys of households or tiny businesses, collected through relatively superficial contacts between researcher and subject. At each doorstep, a surveyor pulls out a sheaf of blank questionnaires and begins to rattle off questions. The surveys that generated the data I studied in my Cairo hotel room asked some 400 questions, many of them detailed and intrusive. Are you married? How much do you earn? Which of your relatives own at least half an acre of land? Are you in debt to moneylenders?12 In asking many questions of many people, the surveyor might seem to dodge the trade-off I just asserted, between depth and breadth. But I can only guess how a poor Bangladeshi woman, perhaps illiterate, perhaps interrupted in her long gauntlet of daily chores, would greet such a peculiar visitor. Out of pride or fear, she might hide embarrassments or say what she thinks the surveyor wants to hear.

Debt in particular carries shame. Economists Dean Karlan and Jonathan Zinman wrote a paper called “Lying about Borrowing,” in which they reported that half of South Africans who had recently taken a high-interest, short-term loan kept this information from surveyors when asked about it.13 Such distortions, often hidden from the econometricians who analyze the data, make data collection an underappreciated art. A researcher for BRAC, a major microcreditor in Bangladesh, learned a surveying technique from her colleagues for determining whether a respondent borrows from more than one microcredit group. Since many adults answer “no” regardless of the truth—sensing that belonging to more than one group is frowned upon—a surveyor should ask the children: “Does your mommy go to Grameen meetings, too?” The little ones give extra meaning to the technical term for a survey respondent: “informant.”14

But not all survey data are so artfully gathered. Journalist Helen Todd and her husband David Gibbons provide an interesting example by collecting the answers to the same question in two ways. Recall from chapter 2 that they followed the lives of sixty-two women in Bangladesh in 1992, forty of whom were Grameen Bank members. At the end of the year, they used their detailed knowledge to classify the women according to how much clout they wielded over their husbands in matters of money. “Managing directors” dominated financial decisionmaking. “Bankers” and “partners” made decisions jointly with their husbands; bankers had a slight upper hand, and partners had a more equal relationship. “Cashbox” women were responsible for holding money in the home but handed it to their husbands or other relatives on request. “Cashbox plus” women had some small say in domestic spending decisions. According to their study, twenty-seven of the forty Grameen Bank members were at least partners, while only five of the twenty-two non-members had achieved this status (see table 6-1).15

Table 6-1. Influence of Sixty-Two Bangladeshi Women in Household Financial Decisions, 1992

Type Grameen members Non-Grameen members
No say 2 4
Cashbox 4 8
Cashbox plus 7 5
Partner 11 2
Banker 8 2
Managing director 8 1
Total 40 22

Source: Todd (1996), 86–89

Back at the start of the year, Todd, Gibbons, and the rest of the research team had surveyed the same sample in a more conventional fashion. They asked each woman such questions as whether she had an equal say in decisions about purchase of land and schooling of children, then recorded the answers on a form with tick marks. “Looking back at them a year later, the ticks bore only limited relation to the reality we had come to know. If they had any usefulness it was to demonstrate a cultural idea—the way our respondent felt families ought to behave, rather than the way they actually did.”16

Correlation versus Causation

Whatever balance researchers strike between breadth and depth of data collection, the second obstacle in analyzing impact remains: establishing the counterfactual. Indeed, this is probably the largest challenge to social science because it is at the heart of any attempt to understand causation. Somehow, the research must divine what a client's life would be like without microfinance.

Suppose a researcher finds, as Todd did, that borrowers earn more than non-borrowers. Absent certainty about how the borrowers would have fared without credit, many stories can explain this correspondence. Perhaps when women form their microcredit groups, they exclude the poorest as too risky—or the poorest exclude themselves—making borrowers richer on average to begin with. In fact, Todd saw this happen.17 From the perspective of impact measurement, the influence of affluence on borrowing constitutes “reverse causality.” Then there is the risk of “selection bias”: people who succeed financially are more likely to make scheduled repayments and continue borrowing, while those who struggle and drop out may fall off the researcher's radar. Then, too, microfinance organizations may operate only in places of relative affluence. This “endogenous program placement” could create a misleadingly positive correlation between microfinance and prosperity. More generally, there is the problem of “omitted variables,” factors left out of an analysis that are simultaneously influencing both borrowing and income. For example, suppose that a woman's hidden entrepreneurial talent simultaneously makes her more apt to borrow and more likely to climb out of poverty, yet the borrowing has no effect on her welfare. It could appear that borrowing is helping her even though it is not—and even if she says it does. Unmeasured, her entrepreneurial talent would be omitted from the statistical analysis. As explanations for the correlation between borrowing and income, all these scenarios compete with the hypothesis that credit is making borrowers better off.
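To make the omitted-variable scenario concrete, here is a minimal simulation sketch—entirely hypothetical, not drawn from any study discussed in this chapter—in which talent drives both borrowing and income while borrowing itself does nothing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical setup: talent raises both borrowing and income,
# but borrowing has NO causal effect on income.
talent = rng.normal(size=n)
borrowing = talent + rng.normal(size=n)
income = talent + rng.normal(size=n)

# Best-fit slope of income on borrowing.
slope = np.cov(borrowing, income)[0, 1] / np.var(borrowing, ddof=1)
print(f"apparent effect of borrowing on income: {slope:.2f} (true effect: 0)")
```

The regression credits borrowing with a sizable apparent effect—about 0.5 here—that belongs entirely to the omitted talent variable.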

In sum, it is one thing to observe that borrowing and income go together but quite another to conclude that the first raises the second. Quantitative researchers have developed techniques to rule out such alternative causal chains, to attack what they call “endogeneity.” Most of the techniques work worse in practice than in theory—and often worse than researchers seem to think, going by the write-ups in their papers.18 In the late 1990s, for instance, the Assessing the Impact of Microfinance Services project of the U.S. Agency for International Development recommended comparing old borrowers (ones in a program for a few years) to new ones, on the idea that if the old borrowers had never borrowed, they would look the same as the new ones in such terms as income, spending, and family size. The new borrowers would stand in for the old borrowers' counterfactuals. Karlan pointed out several dangers in this approach. Old borrowers might differ systematically from new ones: perhaps they are the pioneering risk takers, while the new ones, being more cautious, held back from credit at first. And the passage of time may weed out the unsuccessful clients, as was the case in Todd's sample. This makes it hard to attribute to microfinance any differences found between non-borrowers and “surviving” borrowers, such as those in table 6-1.19

Fancy Math: Why to Doubt Most Microfinance Impact Studies

Quantitative microfinance impact research provokes two opposite reactions from those untrained in statistics: sweeping credulity and sweeping cynicism. Some confidently invoke the reports, intoning that standard phrase, “Studies show…” Others dismiss economists as divorced from reality. The problem with both views is that they are not properly discriminating. Indeed there are good reasons to be skeptical of statistical research. But some studies are better than others. And one needs to understand why in order to judge what the research so far tells about the effects of microfinance on poverty. Here, I will describe three foibles of econometrics: the black box problem, data mining, and the hidden assumptions needed to interpret statistical correlation as causation. The natural habitat for all these ideas is the world of numbers, but all can be explained with a minimum of technical jargon.

The black box problem. In the 1980s and 1990s especially, the advent of powerful microcomputers and the fear of endogeneity led researchers to devise increasingly complicated mathematical techniques meant to purge endogeneity in its many forms. For example, in the study I scrutinized while in Cairo, Mark Pitt of Brown University and Shahidur Khandker of the World Bank deployed “Weighted Exogenous Sampling Maximum Likelihood–Limited Information Maximum Likelihood–Fixed Effects,” which is as complicated and ingenious as it sounds.20 Such analytical tools, typically packaged in bits of software, amplify some long-standing dangers in econometrics, the application of statistics to economics and other social sciences.

Among the simplest statistical techniques is ordinary least squares, which can be visualized as finding the line that best fits a set of data points on a scatter plot of, for example, household income versus microcredit borrowing. The slope of this “regression” line—uphill, downhill, or flat—indicates whether the overall relationship between the two variables is positive, negative, or nonexistent. In fact, even this staple of undergraduate textbooks is complicated enough that it is hard to perform by hand. Computers make it easy, but in the process of hiding the complexity of the analysis, they also hide the complexity of that which is analyzed.
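For reference, the slope that ordinary least squares picks is the standard textbook formula—with \(x\) as, say, borrowing and \(y\) as income:

$$\hat{\beta} = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}$$

A computer evaluates this in an instant, even though working it out by hand for hundreds of data points would be tedious.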

Figure 6-1. Four Realities, One Best-Fit Line


Source: Anscombe (1973).

Figure 6-1, based on a classic paper by Francis John Anscombe, illustrates.21 Each of the four scatter plots shows a hypothetical set of eleven observations on two variables, one on the horizontal axis, one on the vertical. Imagine them as total microcredit borrowing versus household income for eleven families. Each plot implies a different kind of relationship between the two variables. In the upper left, the data exhibit a linear relationship, if with some statistical noise. In the upper right, the linear relationship is perfect except for what looks like a data entry error. In the lower left, the relationship between income and borrowing is also mechanical but curved rather than straight. In the lower right, every household borrows the same amount except for one, whose income entirely controls the placement of the best-fit line. Here's the point of Anscombe's quartet: all four yield exactly the same numerical results if plugged into a computer program that does ordinary least squares line fitting. A researcher who merely looked at the numerical output could easily miss the real story. To look inside the black box, it's essential to make graphs like these. Unfortunately, more complicated techniques are harder to check with graphs. As a result, they tend to obscure the underlying statistical challenges as much as they resolve them.
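The quartet is small enough to check directly. A minimal sketch using Anscombe's published numbers (think of each x as borrowing and each y as income):

```python
import numpy as np

# Anscombe's (1973) quartet: four data sets, one best-fit line.
x = np.array([10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5], dtype=float)
quartet = {
    "I":   (x, np.array([8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68])),
    "II":  (x, np.array([9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74])),
    "III": (x, np.array([7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73])),
    "IV":  (np.array([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8], dtype=float),
            np.array([6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89])),
}

for name, (xs, ys) in quartet.items():
    slope, intercept = np.polyfit(xs, ys, 1)  # ordinary least squares fit
    print(f"set {name}: y = {intercept:.2f} + {slope:.3f}x")
# All four print (approximately) y = 3.00 + 0.500x.
```

Only a graph of each set reveals how differently the four realities generate that identical line.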

Data mining. Another danger amplified by technical sophistication is “data mining,” which is the process of sifting for certain conclusions, consciously or unconsciously. A statistical analysis can be run in a vast number of ways, varying the technique, the data points that are included, the variables that are controlled for, and so on. The laws of chance dictate that even if variables of interest are statistically unrelated, some ways of running the analysis will discover an improbable degree of correlation. Even a fair coin sometimes comes up heads five times in a row. And every step in the research process tends to favor the selection of regressions that show significant results, meaning those that are superficially difficult to ascribe to chance, even when they are random mutations.
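A simple simulation sketch—again hypothetical—shows how easily “significance” arises by chance when many specifications are tried on pure noise:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(0)
n_households, n_specs = 200, 100
income = rng.normal(size=n_households)  # pure noise outcome

false_positives = 0
for _ in range(n_specs):
    # A regressor with no true relationship to income.
    borrowing = rng.normal(size=n_households)
    result = linregress(borrowing, income)
    if result.pvalue < 0.05:
        false_positives += 1

print(f"{false_positives} of {n_specs} unrelated regressions look 'significant'")
# Expect about 5 of 100 by chance alone.
```

A researcher who runs a hundred variants and reports only the handful that clear the significance bar is, wittingly or not, harvesting those flukes.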

A researcher who has just assembled a data set on microcredit use among 2,000 households in Mexico or a complicated mathematical model of how microcredit boosts profits will feel a strong temptation to zero in on the preliminary regressions that show microcredit to be important. Research assistants may mine data unbeknownst to their supervisors. Then, tight for time, a researcher may be more likely to write up the projects with apparently strong correlations, deferring others with the best of intentions. And if two researchers with the highest standards study the same topic in somewhat different ways, the one finding the more significant result is more likely to win publication in a prestigious journal.22 Meanwhile, there is less career reward for spending time, as I have done, replicating and checking the work of others.23 Economist Edward Leamer characterized the situation this way in his call to “take the con out of econometrics”:

The econometric art as it is practiced at the computer terminal involves fitting many, perhaps thousands, of statistical models. One or several that the researcher finds pleasing are selected for reporting purposes. This search for a model is often well intentioned, but there can be no doubt that such a specification search invalidates the traditional theories of inference…. [A]ll the concepts of traditional theory…utterly lose their meaning by the time an applied researcher pulls from the bramble of computer output the one thorn of a model he likes best, the one he chooses to portray as a rose…. This is a sad and decidedly unscientific state of affairs we find ourselves in. Hardly anyone takes data analyses seriously. Or perhaps more accurately, hardly anyone takes anyone else's data analyses seriously. Like elaborately plumed birds who have long since lost the ability to procreate but not the desire, we preen and strut and display our t-values.24

Readers of research filter, too. Displaying a commitment to scientific evaluation unusual for a microfinance group, ten years ago Freedom from Hunger carefully studied its “credit with education” approach (see chapter 4) in Bolivia and Ghana. In Bolivia, the combination package apparently increased what the researchers termed intermediate indicators, such as breastfeeding. But with regard to the program's goals, “the evaluation research provide[d] little direct evidence of improved household food security and better nutritional status for children of mothers participating in the program.” For example, young children of mothers who took microcredit did not appear to gain more weight than children of mothers who did not.25 Yet a nine-page review of the literature on the impacts of microfinance published by CGAP cites only positive results from the Bolivia report, not mentioning the lack of evidence that Freedom from Hunger had reduced hunger.26 “We do have some use for these studies,” wrote Susy Cheston of Women's Opportunity Fund and Larry Reed of Opportunity International in a frank 1999 piece. “We quote liberally from them (as long as they are in our favor) when we apply for funding.”27

Instruments and assumptions. Alongside the black box problem and data mining stands one more side effect of technical sophistication: fancy math obscures crucial assumptions, giving econometric work a false appearance of objectivity.

One popular technique for assessing what is causing a correlation is “instrumental variables.” For example, and simplifying, Pitt and Khandker's study that I pored over in Cairo posits the following causal chain in Bangladesh in the 1990s:

Owning less than half an acre → Microcredit borrowing → Household welfare

The left arrow says that owning less than half an acre of land makes a family poor enough to be eligible for microcredit, a criterion enshrined in the law that created the Grameen Bank.28 The right arrow says that how much a family borrows influences how well it does on such measures of poverty as total household spending per week. Roughly speaking, Pitt and Khandker's key assumption is that no causal arrow leapfrogs directly from landholdings on the left to welfare on the right: the characteristic of owning less than half an acre is related to welfare only through borrowing.29 Thus if the two things on the ends of the diagram are statistically related, moving up and down in concert or counterpoint, both the intervening arrows must be at work—in particular, the right one: borrowing must be affecting welfare. Here, the characteristic of owning less than half an acre “instruments” borrowing; and having that arrow on the left lets us study the arrow of greatest interest, on the right.

Notice the reasoning. The authors assume that being below half an acre affects household well-being only through microcredit. That assumption plus data—an observed correlation between landholdings and welfare—leads to the conclusion that microcredit affects welfare. All reasoning proceeds this way: you have to assume something to conclude something. Euclid's classic tract on geometry begins with postulates, such as that any pair of points can be connected by a straight line. From that foundation, which is asserted, not proven, conclusions such as the Pythagorean theorem are deduced. Most social scientists, like Pitt and Khandker, make clear what they assume in order to reach their conclusions. Yet they often deemphasize the assumptions in abstracts and introductions, enough that the assumptions escape the notice of laypeople. This is unfortunate because some who lack the expertise to detect the assumptions are well qualified to check them.
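The logic can be sketched in a toy simulation. To be clear, this is a stand-in for the reasoning—a simple Wald-style instrumental-variables estimate on made-up data—not Pitt and Khandker's far more elaborate method:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000

# Hypothetical data: eligibility (owning < half an acre) shifts borrowing;
# unobserved ability taints the naive borrowing-welfare correlation.
eligible = rng.integers(0, 2, size=n)
ability = rng.normal(size=n)
borrowing = 2.0 * eligible + ability + rng.normal(size=n)
welfare = 0.5 * borrowing + ability + rng.normal(size=n)  # true effect: 0.5

# Naive best-fit slope is biased upward by shared ability.
naive = np.cov(borrowing, welfare)[0, 1] / np.var(borrowing, ddof=1)

# IV (Wald) estimate: the instrument's effect on welfare divided by
# its effect on borrowing.
iv = (welfare[eligible == 1].mean() - welfare[eligible == 0].mean()) / (
    borrowing[eligible == 1].mean() - borrowing[eligible == 0].mean())

print(f"naive slope: {naive:.2f}, IV estimate: {iv:.2f}, truth: 0.50")
```

The estimate is only as good as the assumption behind it: if eligibility touched welfare through any channel other than borrowing, the same arithmetic would silently mislead.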

There is an irony here. Implicitly or otherwise, social scientists evaluating microcredit chide the “believers”—those with fervent confidence that microcredit is a sure path to economic salvation. “You can't just make the leap of faith,” the researchers say. “You have to look at the evidence.” It turns out that social scientists take their own leaps of faith. And it is not always obvious that the assumptions they make are more credible than those of the “believers.” A researcher assumes that landholdings affect welfare only through microcredit. A microfinance believer just holds faith that microcredit improves welfare. How different are social scientists from the practitioners, who also must forge ahead on assumptions and who also allow their own biases to creep into how they interpret the evidence they have? The scientists do better only to the extent that their assumptions are more credible than the true believers'.

Randomization to the Rescue

So there are good reasons to distrust seemingly objective, quantitative studies of microfinance's effects. Is running the numbers therefore a hopeless pursuit? No. In the last ten years, a new movement has swept into the evaluation of aid-funded social programs. The core idea is to not merely observe reality but manipulate it—in a word, to experiment. Experiments can be designed by the researcher, as when microfinance is randomly offered to some people and not others, or they can arise “naturally,” as when a charter school admits students by lottery.30 Either way, what differentiates these setups is randomness in who gets access to the “treatment” (microfinance or enrollment at a charter school). And that reduces or eliminates several econometric bugaboos.

Virtues of Randomization. In the physical sciences, one can perform a controlled experiment with as few as two observational units if those units can be made identical except in the characteristic of interest. One can observe, for example, if black paper heats up faster in the sun than white. But human beings are not so uniform and malleable, so experiments in the social sciences, like tests of new drugs, work best with large samples and a dose of randomness. Suppose an experimenter chooses 1,000 people in a slum, and picks a random 500 for an offer of microcredit. No two individuals in this “randomized controlled trial” (RCT) are identical when it begins, but those offered credit would probably be statistically indistinguishable from those not, having the same average income, number of kids, entrepreneurial talent, and so on. Statistically, those not offered microcredit are the counterfactual for those offered. If systematic differences between the two groups then emerge, the only credible explanation this side of the supernatural is that the loans caused the difference.

Put another way, the computerized random number generator that decides who gets the microcredit is an excellent instrument for credit:

Random number generator → Microcredit borrowing → Household welfare

Unlike with land ownership, one can safely assume that the random number generator on the left only affects household welfare through microcredit. So if the two things on the ends prove statistically related, the experiment makes a strong case that both causal arrows are at work. In particular, microcredit is changing welfare. Notice that the assumption here is much more obvious than the conclusion. In that difference lies the contribution to knowledge.

RCTs also have the virtue of mathematical simplicity, which shrinks the scope for black box confusion and data mining. Analysis can consist of little more than comparing average outcomes between the treatment and control groups, either across the whole sample or within subgroups, such as households headed by women. Elementary school children learn that kind of math. When a valid analysis can be done so simply, that reduces the impulse to explore baroque econometric variations, which can obscure the underlying reality and, through data mining, generate spuriously strong results.
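A sketch of that simplicity, with invented numbers and an assumed true effect built in so the comparison has something to find:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical trial: 1,000 households, 500 randomly offered microcredit.
offered = np.zeros(1000, dtype=bool)
offered[rng.choice(1000, size=500, replace=False)] = True

# Invented outcome: weekly spending, with an assumed true effect of +5
# for those offered credit.
spending = rng.normal(100, 20, size=1000) + 5 * offered

# The whole analysis: a difference in group means, plus a standard error.
impact = spending[offered].mean() - spending[~offered].mean()
se = np.sqrt(spending[offered].var(ddof=1) / offered.sum() +
             spending[~offered].var(ddof=1) / (~offered).sum())
print(f"estimated impact: {impact:.1f} +/- {1.96 * se:.1f}")
```

Because assignment is random, no instrument hunting or endogeneity purging is needed; the control group's average is the counterfactual.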

For all these reasons, RCTs are the gold standard in medical research. A story shows why. By the late 1990s, half the postmenopausal women in America, on the advice of doctors, were taking artificial hormones to replace the natural ones dwindling in their bloodstreams.31 Doctors drew confidence from non-experimental “cohort studies” that tracked thousands of women over time; the women in these studies who took artificial hormones had less heart disease. But beginning in 1991, the U.S. government funded a major RCT-based research program called the Women's Health Initiative (WHI), which studied, among other things, the estrogen–progestin combination pill in women who still had their uterus. So harmful did the hormones prove in that trial that the WHI halted it early. Women taking replacement hormones suffered more heart disease, as well as more strokes and invasive breast cancers. (Death rates within the study period, however, were not higher.)32

The contradiction with the older cohort study results mystified experts.33 Perhaps women who responsibly reduced their risk of heart disease by eating well and not smoking were also more apt to follow the latest medical thinking by starting hormone replacement therapy, thus making it appear that the drugs reduced heart disease. After the RCT results were published, in 2002, drug companies slashed promotions of the drugs, and doctors slashed prescriptions.34 Breast cancer incidence fell at unprecedented speed, from 376 new cases per 100,000 among women 50 or older to 342 a year later (see figure 6-2).35

Figure 6-2. New Breast Cancer Cases per 100,000 among U.S. Women 50 and Older, 1975–2007


Source: Altekruse and others (2010), Tables 4.7, 4.8

Note: Includes invasive and in situ tumors. Excludes women who have had hysterectomies.

It is not certain that the release of the revolutionary findings caused the drop in cancer. However, a newer study tracked how fast heart disease, strokes, and cancer declined among WHI subjects who were taken off hormones after the original study was halted. Working backward, it estimated that hormone replacement therapy caused 200,000 extra breast cancer cases between 1992 and 2002 in the United States.36 Non-randomized studies cause cancer.

Seemingly, RCTs also ought to be the gold standard when it comes to testing which social programs are safe to provide to poor people. In fact, over the last decade, RCTs became a hot trend in development economics, the ticket to tenure at elite universities for young stars studying everything from deworming of children to bed net distribution.37 Given the weaknesses of non-experimental evaluations, the rapid rise of the “randomistas” seems fundamentally welcome. Today, hundreds of experiments are under way under the auspices of Innovations for Poverty Action in New Haven, the Abdul Latif Jameel Poverty Action Lab at Massachusetts Institute of Technology, and similar research centers. It was only a matter of time before they put microfinance to the randomized test.

Limits of Randomization. Yet the randomization movement also bears the marks of a fad and deserves a critical look. Indeed one mark is a brewing backlash. In the last few years, respected economists Angus Deaton, Martin Ravallion, James Heckman, and others have asked tough questions.38 One challenge they pose is perhaps academic: are RCT researchers doing science if they do not model what makes the people in their experiments respond as they do? Medical researchers confronted this question decades ago and reached a consensus that useful science runs along a spectrum. At one end are “pragmatic” studies that tell us the real-world harm and good done by a treatment. At the other are “explanatory” studies that help us understand the mechanisms that link cause and effect.39 To apply that perspective to microfinance, it is worthwhile and practical to study the bottom-line question of whether financial services make people better off as well as intermediate questions, such as whether the cause is higher business profits or less-volatile spending.

A problem more inherent in experimentation is that randomized trials are by construction special cases. Running a microfinance experiment requires the cooperation of microfinance providers, many of which view manipulation of their offerings as administratively burdensome or immoral. Arbitrarily offering a service to some people goes against their professional grain. “A bank for the poor such as Grameen would find it hard to justify giving credit randomly to a group of poor people and declining credit to others for methodological convenience,” explain Asif Dowla and Dipal Barua, two men associated with Grameen since its earliest days.40 By contrast, the randomistas can argue that it is immoral not to rigorously test a potentially harmful social program on a small number of people before unleashing it on the general population, just as for drugs. But the randomistas cannot experiment without the practitioners, so they search for clever ways to randomize without withholding services. Karlan and Zinman, the ones who later wrote “Lying about Borrowing,” persuaded lenders in South Africa and the Philippines to randomly “unreject” some people who had applied for loans.41 No one was randomly rejected. Researchers at the Massachusetts Institute of Technology worked with the Indian microfinance group Spandana to randomize which neighborhoods got microcredit first as it expanded across the city of Hyderabad in Andhra Pradesh. (More follows on these studies.)

Such ingenious opportunism comes at some cost in representativeness. Since most microfinance groups do not allow researchers to interfere with their decisionmaking, those that do may be atypical. Then, too, any evaluation applies at once to an intervention and an intervener. Two microcreditors can implement the same program differently. Accepting that microcredit helps some more than others, the overall effect depends as much on the creditor's ability to select those it will help as it does on the potential effects on each borrower. Perhaps Spandana only got the kinks out of its Hyderabad operation after the researchers finished their eighteen-month data collection. Also, as described in chapter 5, microcredit tends to start out with small loans to test borrowers' reliability. The ramp-up may occur after researchers leave, so that microfinance is only evaluated at low doses. Finally, some contexts resist experimental evaluation altogether. In Bangladesh, most people already have access to microfinance, making it almost impossible to construct a control group. In the very Mecca of microcredit, the benefits cannot be rigorously demonstrated.

All these caveats show that even the gold standard has its blemishes. Every approach to knowledge in the social sciences—qualitative or quantitative, experimental or otherwise—is compromised. Each has particular strengths, so each generally deserves a place in the evaluator's portfolio. That said, my own examination of the research makes me distrustful of quantitative, non-experimental studies that once garnered the most attention. I rank good qualitative research higher in credibility and randomized quantitative studies highest of all, and I look to both for insight.

Evaluating the Evaluations

With tongue in cheek, the late sociologist Peter Rossi once enunciated “The Iron Law of Evaluation and Other Metallic Rules.” His Iron Law says that on average the measured impact of a large-scale social program is zero. (One reason: the most obviously successful programs don't need evaluation studies.) His Stainless Steel Law warns that “the better designed the impact assessment of a social program, the more likely is the resulting estimate of net impact to be zero.” Rossi conceded, “The ‘iron’ in the Iron Law has shown itself to be somewhat spongy” because some programs have been clearly shown to work, but:

The Stainless Steel Law appears to be more likely to hold up…. This is because the fiercest competition as an explanation for the seeming success of any program—especially human services programs—ordinarily is either self or administrator-selection of clients. In other words, if one finds that a program appears to be effective, the most likely alternative explanation to judging the program as the cause of that success is that the persons attracted to that program were likely to get better on their own or that the administrators of that program chose those who were already on the road to recovery as clients. As the better research designs—particularly randomized experiments—eliminate that competition, the less likely is a program to show any positive net effect.42

Summarizing a career's worth of social program evaluation, Rossi's laws are food for thought as we turn to reviewing the available quantitative studies of the effects of microfinance. If the Stainless Steel Law holds up, the randomized studies will show smaller effects than the older non-randomized studies.

The Non-Randomized Studies: Less than Meets the Eye

Researchers and evaluators have performed dozens of microfinance impact studies over the last twenty-odd years.43 There is no shortage of statistical “evidence.” However, nearly all of it, in my view, is seriously flawed. I am indeed the sort of skeptic Elisabeth Rhyne wrote about, ready to pounce like a vulture on any weakness. But what can I say? I think the vultures are onto something. In my experience, the methodological weaknesses matter when it comes to deriving real-world meaning from statistics.44 The problem is not simply that researchers are careless, though they may feel pressure to cut corners and publicize only favorable conclusions. The main problem is one already described, that it is hard to infer causation from correlation in social systems. In families, villages, and nations, everything, roughly speaking, affects everything else. Causal arrows run every which way, competing to explain any superficial patterns discovered by number crunchers.

Here I review three non-randomized studies, the first by economist Brett Coleman, the second by Mark Pitt and Shahidur Khandker, the third solely by Khandker. One literature review published in 2007 speaks only of these three.45 And the chapter on impacts in the authoritative academic text The Economics of Microfinance casts the net only slightly wider, incorporating as well three studies from the Assessing the Impact of Microfinance Services project (the one that recommended comparing new and old borrowers)—but those mainly to illustrate the difficulties of measuring impact.46 These precedents should head off the appearance that I am biasing my own account by dwelling on studies with problems.

Coleman's Studies of Village Banks in Northeast Thailand in 1995–96. Microfinance first caught the attention of Brett Coleman in the early 1990s while he worked in Burkina Faso for Catholic Relief Services (CRS). In 1992, he designed CRS's first microfinance project in the country. He then returned to academia for a Ph.D. in economics at Berkeley and decided to build his thesis around microfinance. He noticed that microcredit repayment rates are typically close to perfect but assumed that many members have trouble making every payment. As a result, he wanted to study how villagers respond to borrowing peers in difficulty. Did borrowers informally insure each other, paying each other's loans from time to time? Did they lend each other money on the side? Or intensify the peer pressure? He also began to think about the question of impact on poverty.

While sitting in a Berkeley coffee shop, he hit upon a clever new way to attack the question. He then persuaded CRS to apply his method in a project run by two CRS-backed nongovernmental organizations (NGOs) doing village banking in Thailand. Normally, the NGOs visited villages slated for new banks to explain the program and help residents organize themselves into the groups. Then the NGOs would make the first loans. But in a handful of villages, at Coleman's request, the NGOs inserted a one-year delay: members were gathered into village banks in early 1995 but did not begin banking until 1996. That created the first experimental assessment of the impact of microfinance. During the one-year delay, Coleman could track the lives of the newly self-selected members, as well as their non-joining neighbors, both in villages with active banks and those with delayed banks. If members gained more relative to non-members in villages where the credit was flowing than in villages where it wasn't, that would suggest that credit helps.

Matters worked out rather differently. Coleman found that CRS's village banking was not operating as intended.47 Households that joined the village banks were richer than non-member households even before receiving support, with twice the average landholdings. “None of the villagers interviewed identified the village bank as a program that targeted the poor,” Coleman explains. “Frequently, the village chief's wife was the village bank president or held another influential committee position, and other wealthy leading women in the village also usually became committee members.”48 These influential members took out the largest loans, often under several different names. Coleman could detect no impact of borrowing on the rank-and-file members, perhaps because their loans were so small. However, those with longer tenure on the governing committee, who apparently got the most credit, saved more, spent more on schooling for their boys, and bought non-land assets, such as cows. The women in these households spent more on and earned more from small business activities—and engaged more in moneylending.49

The good news in Coleman's study is that credit appears to have helped people. The bad news is that they were hardly the “poorest of the poor” CRS had targeted.

Pitt and Khandker's Studies of Solidarity Group Microcredit in Bangladesh in the 1990s. As I discussed earlier in the chapter, Pitt and Khandker ran what long stood as the most influential effort to measure the impacts of microcredit on borrowers. In 1991 and 1992, the World Bank funded the Bangladesh Institute for Development Studies to field a team of surveyors in eighty-seven villages and city neighborhoods in Bangladesh. The Grameen Bank operated in some of the villages, BRAC in others, and a government program, the Bangladesh Rural Development Board, in still others. Some of the villages studied had no microcredit program. Some of the solidarity groups were male and some were female. Some of the households that got knocks on the door from surveyors had participated in the microcredit programs; some could have, being considered poor enough, but did not; some formally could not, but did anyway; and some could not and did not.50 The surveyors visited these households after each of the three main rice seasons, asking up to 400 questions at each stop.

The first published study of the data was a tour de force of economic reasoning and econometric analysis, and it won publication in the prestigious Journal of Political Economy. Pitt and Khandker found that “annual household consumption expenditure increases 18 taka for every 100 additional taka borrowed by women…compared with 11 taka for men.” This conclusion reinforced two beliefs about microfinance: it reliably reduced poverty and was especially effective when given to women.

Pitt and Khandker also found a complex and somewhat perplexing set of other effects. Lending to women made them less likely to use contraception and more likely to send their boys to school; and it reduced how much men in the household worked. Microcredit had no discernible effect on the work hours and non-land assets of women, or on the enrollment of girls in school.51

A pair of suppositions in the study set it apart from most that came before. These were the key assumptions that let the authors make the leap from correlation to causation. The first I already described: the rule barring those with more than half an acre of land created an artificial cut-off in the availability of microcredit. Intuitively speaking, households just under the line and households just over probably had similar economic prospects, yet officially one kind could get credit while the other could not. The comparison between them, as between Coleman's two groups, could reveal the effects of microcredit. Second, to differentiate the impacts of lending by gender, Pitt and Khandker used the fact that some villages had only male credit groups, some had only female, and some had both. They assumed that whatever determined the availability of credit to a particular gender affected welfare only via microcredit borrowing.52

Not long after Pitt and Khandker circulated their findings, Jonathan Morduch obtained and reanalyzed their data. He had already published a detailed analysis of the costs of subsidies for the Grameen Bank from such donors as the Ford Foundation and the government of Norway.53 Now he wanted to compare the costs to the benefits for poor people. In pursuit of a back-of-the-envelope grand total, he performed a variant of the Pitt and Khandker analysis that was much simpler and did not break out the impact by gender. To his surprise, the answer he obtained was, if anything, negative: credit seemed to do harm. Morduch investigated further and ended up questioning some of Pitt and Khandker's key assumptions. For example, 203 of the 905 households in the 1991–92 sample that owned more than half an acre used microcredit even though they were formally ineligible. They owned 1.5 acres on average.54 Evidently, credit officers were pragmatically deviating from the claimed eligibility rule so they could lend to people who seemed reliable and were poor by global standards. The cut-off in credit availability at the half-acre mark was not obvious the way Pitt and Khandker had implied.

As is common in such debates, Morduch did not question the correlations that Pitt and Khandker had found as much as he did their interpretation as proof of microcredit's impacts. Morduch offered some findings of his own, though they were predicated on some assumptions he had just challenged. He found that microcredit was helping families smooth their spending from season to season, perhaps assuring that they had enough to eat more of the time.55

A year later, Pitt fired off a point-by-point retort. In general, he argued that the assumptions Morduch questioned did not need to hold as exactly as Morduch implied and that if the assumptions were relaxed, the original results held up.56 This public debate became a sort of inconclusive trial: it lacked a judge and jury but had a big audience. The statistical arguments were beyond the ken of most people, like a duel between lawyers that degenerates into obscure points of procedure. And not even the litigants could fully explain what was going on because each used his own methods and did not, or could not, reconcile his results with the other's. Both their papers could be scrounged up on the Web, but neither was submitted to the verdict of a peer-reviewed journal.

As the American academics sparred to a murky draw in 1999, on the other side of the world the Bangladesh Institute for Development Studies fielded a new team to spread across the provinces and resurvey the 1,798 households last visited in 1992. For econometricians, tracking the same people over time creates tantalizing new analytical possibilities. In the case at hand, it allowed Khandker, writing on his own, to study how changes in microcredit use over the 1990s affected changes in household spending over the same years. To appreciate the significance of this tack, remember the hypothetical woman who was my example for bias from omitted variables: her aptitude for entrepreneurship simultaneously increased her borrowing and her income, creating a false appearance that the borrowing increased income. Suppose that she was in Khandker's data and that her entrepreneurial talent affected her borrowing and income equally at both ends of the 1990s. To be specific, assume her entrepreneurial pluck raised her daily income by $1 at both ends of the decade, from the $2 of the average woman in the study to $3 in 1992 and from $3 to $4 in 1999; and that her pluck raised her borrowing by $100, from the $200 it would have been if she were average to $300 in 1992 and from $300 to $400 in 1999. Looking just at the 1992 data, as in the original Pitt and Khandker paper, she would manifest as someone with higher borrowing than average and higher income, potentially creating that false appearance of microcredit reducing poverty. But if we looked at changes over time, she would no longer stand out among her peers in that misleading way: Her income would increase $1 over time (from $3 in 1992 to $4 in 1999), while the average income also would increase by $1 (from $2 to $3). And her borrowing would increase by $100 over time (from $300 to $400), while the average borrowing also would increase by $100 (from $200 to $300). By looking at changes, Khandker eliminated bias from omitted factors whose effects on borrowing and welfare are constant.
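
Since the argument is pure arithmetic, it can be checked in a few lines of code. This sketch merely restates the numbers from the hypothetical example above; nothing in it comes from the actual survey data.

```python
# The hypothetical entrepreneur from the text. Her "pluck" adds a
# constant $1/day to income and $100 to borrowing in both survey
# years; the averages rise on their own over the decade.
avg_income    = {1992: 2.0, 1999: 3.0}    # $/day for the average woman
avg_borrowing = {1992: 200, 1999: 300}    # $ for the average woman
her_income    = {y: v + 1.0 for y, v in avg_income.items()}     # 3.0, 4.0
her_borrowing = {y: v + 100 for y, v in avg_borrowing.items()}  # 300, 400

# Cross-section view (1992 only): she stands out on both counts,
# mimicking an effect of borrowing on income.
print(her_income[1992] - avg_income[1992])        # 1.0
print(her_borrowing[1992] - avg_borrowing[1992])  # 100

# Changes-over-time view: the constant effect of pluck cancels out.
print((her_income[1999] - her_income[1992])
      - (avg_income[1999] - avg_income[1992]))          # 0.0
print((her_borrowing[1999] - her_borrowing[1992])
      - (avg_borrowing[1999] - avg_borrowing[1992]))    # 0
```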

Using the lengthened data set, Khandker's estimates of the impacts of lending to women roughly lined up with Pitt and Khandker's earlier ones. But in contrast to the earlier study, he found no clear impact of microlending to men.57

While Khandker's study was strong by the standards of the time, doubts could still be raised about it. Why would propensity for entrepreneurship have constant effects in a changing country? GrameenPhone, for example, arrived on the scene in the late 1990s, creating a radical new business opportunity: “phone ladies” could buy a mobile phone with Grameen Bank credit, then rent it out like a pay phone.58 If our hypothetical woman's relative pluck had different impacts on her income or borrowing at different times, then the rationale for looking at changes over time broke down.

From my previous work on the effect of foreign aid on economic growth in receiving countries, I had learned not to trust a non-randomized econometric study until I had replicated and scrutinized it on my own computer. To make an informed judgment—to become an expert on the impacts of microfinance so I could write this book—I felt compelled to approach the leading microfinance studies in the same way. I embarked on a project to pick up where Morduch left off—to obtain the raw data; rebuild the data matrix used in the famous analyses; program my computer to implement Pitt and Khandker's sophisticated methods; rerun the analyses of Pitt, Khandker, and Morduch; and check them all for problems. In particular, I aimed to resolve the confusing stalemate between Morduch and Pitt. For a while, I followed in Morduch's footsteps. But I trod fresher ground, too, writing a computer program that enabled other people to apply the techniques Pitt and Khandker had pioneered.59 Eventually I joined with Morduch in improving the analysis, integrating it with his earlier work, and writing up the results.60

The tale from there took some unexpected twists. When, using my program, we ran the same methods on the same data, we got the opposite result from Pitt and Khandker. Taken at face value, our findings said that microcredit hurt Bangladeshi families when given to women. But we didn't believe that. Rather, additional statistical tests suggested that Pitt and Khandker's attack on reverse causation had not worked. The key pair of assumptions did not hold. That for us was the real bottom line, but it would have been stronger if we had closely matched the original results.

Then in the spring of 2011, Mark Pitt posted a reply, writing, “This response to Roodman and Morduch seeks to correct the substantial damage that their claims have caused to the reputation of microfinance as a means of alleviating poverty by providing a detailed explanation of why their replication of Pitt and Khandker (1998) is incorrect.”61 Pitt pointed out two key ways our “replication” departed from his original, only one of which Pitt and Khandker had documented in the relevant technical detail. Removing these discrepancies flipped our signs on credit back to positive, making it look once again as if credit helped. Probably if we had had access to Pitt's computer code the way he had access to ours, we would not have made these mistakes. For me, this vindicated the open approach to impacts research. By freely sharing our data and computer code online when we posted our working paper, we helped others find our mistakes and helped the research community reach more certain conclusions.62

Pitt's helpful critique, however, did not confront our conclusion that the survey data collected in Bangladesh simply could not be used to determine the impacts of microcredit on poverty. In fact, by correcting our error, he helped us to at last put his regression under the microscope. We found that the stark contradiction in sign reflected a deeper problem: trying to estimate the impacts of female and male borrowing separately was pushing the data farther than they could go, making the complex statistical procedure in Pitt and Khandker unstable. It tends to produce strongly positive or strongly negative results, neither of which therefore deserves much credence. Small changes in the data can cause the results to completely reverse.63
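
The flavor of that instability can be reproduced in miniature. The sketch below is not the Pitt and Khandker maximum-likelihood procedure; it is an ordinary regression in which two invented credit variables carry nearly the same information. The symptom is the same: re-estimating on slightly different data swings the separate coefficients wildly, even as their sum stays pinned down.

```python
# Stylized illustration of unstable estimates, not the actual
# Pitt-Khandker procedure. "Female" and "male" credit are nearly
# collinear here, so the data barely distinguish their separate effects.
import numpy as np

rng = np.random.default_rng(0)
n = 200
female = rng.normal(size=n)
male = female + rng.normal(scale=0.02, size=n)   # nearly collinear
y = 0.5 * female + 0.5 * male + rng.normal(size=n)
X = np.column_stack([female, male, np.ones(n)])

def fit(idx):
    b = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]
    return b[0], b[1]

bf, bm = fit(np.arange(n))
print(f"full sample: female={bf:+.1f}  male={bm:+.1f}  sum={bf+bm:+.2f}")
for seed in range(1, 5):   # bootstrap resamples = small data changes
    idx = np.random.default_rng(seed).integers(0, n, n)
    bf, bm = fit(idx)
    print(f"resample {seed}:  female={bf:+.1f}  male={bm:+.1f}  sum={bf+bm:+.2f}")
# The separate coefficients swing wildly from draw to draw, often
# flipping sign, while their sum barely moves.
```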

In our 2009 working paper, we went on to replicate Morduch's 1999 analysis and Khandker's 2005 one. The story in each case is distinctive, but the bottom line is the same: these studies can show correlations but cannot credibly prove causation.

Randomized Studies

The growing recognition of such difficulties is what gave rise to the new movement built on the idea that randomization could slice the Gordian knot of causality. In 2009, the movement arrived at microfinance's doorstep.

Karlan and Zinman's Study of a “Cash Lender” in South Africa. If one defines “microfinance” broadly, the first randomized study of microfinance took place four years earlier in South Africa. There, in 2005, Dean Karlan and Jonathan Zinman worked with a “cash lender,” not unlike a payday lender in the United States. The lender made individual, four-month loans at an interest rate that worked out to 226 percent per year, or an astonishing 586 percent with compounding. To randomize offers of credit while avoiding the ethical pitfalls of arbitrarily depriving some poor people of services, Karlan and Zinman used a technique mentioned earlier. A computerized credit scoring system screened loan applications by taking into account such factors as the borrower's income and repayment history. (Branch employees, however, could override the computer's credit scoring.) The company agreed to tweak the computer program to randomly unreject some applicants whose credit scores fell just below the approval threshold. The unrejected turned out to have incomes of about $6.50 per day per household member.64 That Karlan and Zinman studied people far above the benchmark of $1–2 per day we usually imagine as the target for microcredit is not a flaw, but it is worth keeping in mind when generalizing from the study. And it reflects a limitation of randomized credit scoring: scoring entails too much time-consuming data collection to be economical when the smallest loans are made to the poorest people. This randomization technique does not get at the impacts of microcredit on the poorest clients, most of whom borrow through groups.
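
A minimal sketch of the unrejection mechanism follows, assuming invented thresholds and a 50 percent unrejection probability; the real scoring model was the lender's own, so treat every number here as hypothetical.

```python
# Hypothetical sketch of "randomized unrejection." Thresholds and
# the 50 percent probability are invented for illustration.
import random

APPROVE_AT = 60       # normal approval threshold (hypothetical)
WINDOW_FLOOR = 50     # bottom of the randomization band (hypothetical)

def decide(score: float, rng: random.Random) -> str:
    if score >= APPROVE_AT:
        return "approve"          # clearly creditworthy: no experiment
    if score >= WINDOW_FLOOR:
        # Marginal applicants: randomly unreject half. Only this band
        # yields a treatment group and a comparable control group.
        return "approve" if rng.random() < 0.5 else "reject"
    return "reject"               # clearly uncreditworthy: no experiment

rng = random.Random(42)
for applicant, score in enumerate([72, 58, 55, 41, 53]):
    print(applicant, score, decide(score, rng))
```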

After setting their RCT top to spinning, Karlan and Zinman dispatched surveyors to the homes of accepted and rejected applicants alike. Neither the loan applications nor the surveyor visits took place all at once, so for a given applicant the gap between the two events ranged from six to twelve months. To prevent borrowers from tilting their answers, the surveyors did not reveal their connection to the lender—indeed could not, for they were unaware of it themselves.65

The data revealed that applicants unrejected by the computers were no more likely to be self-employed than rejected ones with similar credit scores. But they were 11 percentage points more likely to have a job. Apparently as a result, they were also 7 points more likely to be above the poverty line and 6 points less likely to report that someone in the household had gone to bed hungry in the last month.66 These gains are all the more astonishing in light of the interest rate near 600 percent per year. Although the exact mechanisms by which credit improved well-being are unclear, the data, backed by stories from borrowers, suggest that the key lay in helping a subset of clients get or hold jobs. They might have used the loans to buy required uniforms or sample kits for sales work, or to fix or buy a vehicle for getting to the job.67

In a separate study, Karlan and Zinman joined University of California, Berkeley, researchers Lia Fernald, Rita Hamad, and Emily Ozer to probe the psychological effects of borrowing using a battery of mental health assessment survey questions. Borrowers, especially men, appeared more stressed and depressed than non-borrowers.68 Perhaps holding down a job and paying off a loan impose a psychological burden even as they boost economic well-being. Or perhaps the stress effect came among people whom credit did not help to gain or retain a job.

In showing economic benefits, have Karlan and Zinman defied Rossi's Stainless Steel Law that the better the study, the less the impact found? Technically, no. Rossi wrote about social programs, interventions funded by governments or charities for the public good. But the South African company lent for profit. Still, the study does defy Rossi in spirit by showing how small loans can help some poor people, even at high interest rates. For understanding microfinance, the more important caveats are that the study subjects lived well above standard poverty lines of $1 and $2 per day, and their successes revolved around employment, not entrepreneurship, contradicting the popular image of microfinance. As we saw in chapter 2, a job is a distant dream for most of the world's poor. Overall, it is a striking finding: maybe small loans “work” best when they help moderately poor people get jobs rather than when they help the very poorest people start businesses.

Karlan and Zinman's Study of an Individual Lender in the Philippines. As this book was being written, more than thirty years after the birth of modern microcredit, the first results began emerging from randomized tests of what is more usually thought of as microfinance. In 2009 two studies of microcredit and one of microsavings appeared. So far the conclusions about microcredit do conform to Rossi's Laws.

Karlan and Zinman reprised their South Africa study in the Philippine capital of Manila with First Macro Bank, which makes individual microloans. Here, the typical borrower was a family with a sari-sari store—a corner store that sells cigarettes, basic foodstuffs, and other items. To merit consideration for a loan, and thus inclusion in the study, applicants had to be “18–60 years old; in business for at least one year; in residence for at least one year if owner or at least three years if renter; and [have a] daily income of at least 750 pesos [$34].” Since the subjects were already in business, the study checked whether credit helps people who are already entrepreneurs, not whether it helps people become entrepreneurs. Also, the income floor put the study population well above the Philippine average, as in South Africa. Per household member, the average income in the study worked out to $8 per day.69
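
The screening criteria translate almost word for word into a test. Here they are as a Python predicate; the argument names are mine, and the $34 figure comes from the purchasing power conversion cited in the endnote.

```python
# The study's stated eligibility rules, transcribed as a check.
# Field names are invented for illustration.
def eligible(age: int, years_in_business: float,
             years_at_address: float, owns_home: bool,
             daily_income_pesos: float) -> bool:
    return (18 <= age <= 60
            and years_in_business >= 1
            and years_at_address >= (1 if owns_home else 3)
            and daily_income_pesos >= 750)   # roughly $34 at 2006 PPP

print(eligible(35, 2, 4, False, 900))   # True
print(eligible(35, 2, 2, False, 900))   # False: renting under 3 years
```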

After follow-up waits ranging from eleven to twenty-two months, the survey teams succeeded in tracking down about 70 percent of applicants. The surveyors asked questions, observed the quality of borrowers' houses, and administered psychological tests. Karlan and Zinman checked some fifty-five outcomes for impact, ranging from whether someone in the household was working overseas to self-reported trust in acquaintances. Unlike in South Africa, almost none of the outcomes were perturbed by access to the credit. There was no apparent effect on household income, whether a household was above the poverty line, household food quality, whether the household included any students, or whether family members were prevented by lack of funds from visiting a doctor.

A few outcomes did appear to change. But most of these may be statistical mirages. For each variable, Karlan and Zinman checked whether the averages were different for accepted and rejected applicants. And they did so within five groups of people: all applicants, women only, men only, those above the median (50th-percentile) income, and those below. It emerged, for example, that computer-accepted male applicants were 18.5 percentage points less likely to have health insurance than computer-rejected men. In total, Karlan and Zinman performed 275 comparisons (fifty-five outcome indicators for each of the five groups).70 For each, they used standard methods to estimate the probability that they could get a difference as big as they actually got—if in fact the real effect was nil. When that probability was low, say under 5 percent, they marked the difference “significant at the 5 percent level.” But sometimes the improbable happens. If in fact First Macro's loans had no effect on anything, we would expect by chance alone that among the 275 treatment-control comparisons, 10 percent (27.5) would be significant at the 10 percent level, 5 percent (13.75) at the 5 percent level, and 1 percent (2.75) at the 1 percent level. Against these totals of 27.5, 13.75, and 2.75 are my tallies for the actual Karlan and Zinman significance results: 37, 17, and 5.71
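
The expected counts in that sentence are a one-line calculation, using only numbers already given in the text:

```python
# Expected "significant" results if the loans in fact changed nothing.
comparisons = 55 * 5              # 55 outcomes x 5 subgroups = 275
for level in (0.10, 0.05, 0.01):
    print(f"expected at the {level:.0%} level: {comparisons * level:.2f}")
# Prints 27.50, 13.75, and 2.75 -- against actual tallies of 37, 17, and 5.
```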

Thus, most of the “significant” results in Manila are probably flukes and should not be taken literally. Instead, one must look for consistent and plausible patterns in the results. Overall, I am persuaded that access to the credit led entrepreneurs to borrow more from formal institutions, such as First Macro Bank (unsurprisingly); cut back on employees, house renovation, and perhaps health insurance (perhaps a sign of belt-tightening at the time of a big investment partly financed by a new loan); express more trust in their neighbors (after experiencing that the bank is there to help); and borrow more easily from family and friends (having won the bank's seal of approval).

Banerjee, Duflo, Glennerster, and Kinnan's Study of Group Microcredit in India. The other randomized study of microcredit that appeared in 2009 reached people usually thought of as targets for microfinance. Economists Abhijit Banerjee, Esther Duflo, Rachel Glennerster, and Cynthia Kinnan of the Poverty Action Lab worked with the India-based Centre for Micro Finance and the microfinance group Spandana to randomize the rollout of Spandana's lending in Hyderabad. Hyderabad is the capital of Andhra Pradesh, putting it at the center of what was then India's microcredit boom. Spandana chose 104 areas of the city to expand into and then, in 2006 and 2007, started lending in a randomly chosen 52. A year later, surveyors visited more than 6,000 households in all 104 areas, restricting their visits to families that seemed more likely to borrow: ones who had lived in the area at least three years and had at least one working-age woman. The time between the arrival of Spandana in an area and the arrival of surveyors averaged about eighteen months. Respondents reported living on $3 a day on average.72 Twenty-seven percent of households in Spandana-served areas said they took microcredit, two-thirds of them from Spandana. In control areas, 18.7 percent took microcredit. The difference between these two rates of microcredit use, 8.3 percentage points, was the basis for assessing impacts.73

The impulse of microcredit did propagate through the lives of Hyderabad residents in ways picked up in the data, but not all the way to the indicators used to measure poverty. Households in Spandana-served areas were 1.7 percentage points more likely to have opened a business in the last year than in areas Spandana didn't serve. As a fraction of those who actually borrowed, the number opening businesses is closer to 5 percent. And households with more propensity to open their first business—as indicated by having more land, more working-age or literate women, and of course no business—were indeed more likely to do so if they were in an area Spandana served. Such households also spent more on “durables,” such as sewing machines, and cut back on “temptation goods,” such as snacks and cigarettes. Meanwhile, existing business owners increased profits. But the study found no certain effects on measures of poverty: total household spending per person, women's say in household spending decisions, health spending per person, children having major illnesses, school enrollment, and school spending.

Of course, absence of proof is not proof of absence. Perhaps the team failed to find impacts on poverty because the treatment group had almost as much access to microcredit as the control group. One way to investigate this issue is to look at the confidence intervals around the estimated effects. For example, Spandana's measured impact on household spending (which, recall, averaged $3 a day overall) was not exactly zero. Rather, with 95 percent confidence, it was somewhere between –12¢ and +28¢ per household member per day, with the best estimate being at the center of that range, +8¢.74 Thus even though the relatively modest difference in credit availability between the treatment and control groups reduces the power to detect impacts, the results were sharp enough to tell us that there is only a 2.5 percent chance that the true average effect is above +28¢ (and ditto for being below –12¢). Because zero is well within the 95 percent interval, standard practice in statistics is to conclude that convincing evidence of impact has not been found.
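
The interval can be reconstructed from the endnote, which reports a standard error of 46.221 rupees per month and a 2006 purchasing power parity factor of 15.06 rupees per dollar; the days-per-month figure below is my assumption.

```python
# Rebuilding the Hyderabad 95 percent confidence interval.
se_rupees_per_month = 46.221      # standard error, from the endnote
ppp_rupees_per_dollar = 15.06     # 2006 PPP factor, from the endnote
days_per_month = 365.25 / 12      # assumption: about 30.4

se_cents_per_day = (se_rupees_per_month / ppp_rupees_per_dollar
                    / days_per_month * 100)
point = 8                         # best estimate in cents per day
low = point - 1.96 * se_cents_per_day
high = point + 1.96 * se_cents_per_day
print(f"95% CI: {low:+.0f} to {high:+.0f} cents per day")  # about -12 to +28
```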

If microcredit clearly led to more businesses and more profits, why didn't household incomes rise with the same statistical obviousness? The survey data do not allow a definitive answer. One possible reason is that people who invested more money and time in their own businesses earned less outside the home. Another is that only about a third owned a business, so their gains were diluted once averaged over the whole sample. Perhaps a year was too short to wait to witness the full effects of credit. Or perhaps the welfare of those without enterprises went down as they borrowed to pay for televisions and makeup. Indeed, the microcredit market in Hyderabad blew apart in 2010 after the state government cracked down on the industry amid reports of overborrowing and suicide (more on that in chapter 8).

A Randomized Test of Microsavings. The brightest spot in the randomized impact research on microfinance also has been the least heralded—a trial of savings undertaken by two economists just finishing their dissertations. In the rural market town of Bumala, Kenya, on the road between the Kenyan capital of Nairobi and the Ugandan capital of Kampala, Pascaline Dupas and Jonathan Robinson worked with a local village bank to offer free savings accounts to randomly selected “market vendors, bicycle taxi drivers, hawkers, barbers, carpenters, and other artisans”—in other words, existing microentrepreneurs. The accounts paid no interest and in fact charged for withdrawals, a feature that helped people who were looking for the discipline to save.75 Uniquely, the research team followed up with the subjects not once, through the usual door-to-door survey, but daily over several months, through logbooks not unlike the financial diaries at the heart of the landmark 2009 book Portfolios of the Poor, introduced in chapter 2.76

Of the 122 people offered an account, about 67 opened one and actually used it, of whom only 54 made more than one deposit in six months. A relatively small group, mainly female, made most of the deposits and withdrawals. That's a small cohort with which to study impacts, yet the authors found significant differences. Within six months, women had invested more in their businesses, increased personal spending from 68¢ per day to 96¢, and increased food spending from $2.80 to $3.40 (with a typical family having two parents and three children).77 The savings accounts appeared to help women accumulate money for major purchases for their businesses, such as stock for their stores, which in turn may have increased profits. The pattern did not hold for men. The accounts may have helped primarily by giving women more control over their own impulses to spend in the moment or by giving them a way to deflect family requests for money. Especially if the latter, the women's gains may have come partly at the expense of relatives outside the study group.

Conclusion

The new generation of randomized studies marks a break with the past and holds real promise for revealing the impacts of microfinance. Yet each RCT makes only an incremental contribution to knowledge. It tells us a bit about what happened to particular groups of people at particular places and times. A good way to put such a study in perspective is to state its findings tightly. Take the Hyderabad study as an example: In 2007–08, about eighteen months after Spandana began operating in some areas of Hyderabad, among households that had lived in their area for at least three years and had at least one working-age woman, those in Spandana areas saw no changes in empowerment, health, education, and total spending, on average, that were so large as to defy attribution to pure chance, compared to those in areas Spandana would soon expand into. That conclusion is a far cry from its interpretation in many newspapers that microfinance doesn't work. But more microfinance RCTs are under way, and as their results come in and patterns emerge, they will give us a richer sense of the impacts of microfinance on its clients, their families, and their communities.

I draw two main lessons from this tour of the evidence. First, poor people are diverse, and so are the impacts of microcredit upon them. Thus, microcredit undoubtedly helps many people. A distinction that appears particularly important in the latest results is between entrepreneurs, who are a minority, and everyone else. Microcredit in Hyderabad and commitment microsavings in Bumala helped active entrepreneurs.

Second, there is no convincing evidence that microcredit raises incomes on average. While many have sought that Holy Grail and many have thought it found, it still eludes us. It is entirely possible that a majority of microfinance users do not invest in microenterprises but instead use the loans to smooth spending on necessities, as we saw in chapter 2. That could show up in the data as lower spending (net of interest payments) even as it delivers the real benefit of getting to eat every day.

The ambiguity about average impact arises in part from an opaque mixture of four factors:

—Different people use microfinance different ways.

—Even people who use it in the same way can experience different outcomes.

—Families, villages, and neighborhoods are complex webs of causal relationships, which are hard to disentangle.

—Average effects depend as much on the ability of microfinance institutions to select those most likely to use finance well as they do on the potential effects on each user.

These complexities no doubt obscure the real impact of microfinance; yet their significance should not be exaggerated. That cash lending in South Africa at 586 percent helped people get or keep jobs shines through in Karlan and Zinman's data. If the average benefits of microcredit for poorer people without much access to steady employment are as clear-cut, they, too, should shine through in RCTs. Until that happens, prudence calls for skepticism of any claims about the systematic transformative power of microfinance. So does common sense: industrialization rather than financial services for poor people has historically reduced poverty. And until researchers understand better how many are made worse off by microcredit in particular, we cannot be sanguine about the ethics of the intervention. Suppose microcredit lifts 90 percent of borrowers just above the poverty line and consigns the rest to a spiral of indebtedness and destitution. Is that a reasonable trade-off, or a bargain with the devil?

This question brings me back to my encounter with the women in Cairo. Should I have told them they were making a mistake, risking their families' finances on an unproven intervention? Can 150 million microcredit borrowers around the world all be wrong? I think not. Absent strong evidence of harm, it would be the height of arrogance to dispute their judgment. On the other hand, absent strong evidence of benefit, it is reasonable to ask whether the intervention deserves my tax dollars.

Think of the paradox this way. In the last decade, the mobile phone has spread like wildfire in poor nations—for example, in the Democratic Republic of Congo, a vast and war-torn nation with shards of government. Few doubt that this is a fundamentally good thing. Scarce are the skeptics who demand RCTs to prove that mobile telephony helps the poor. While it must be the case that some Congolese are wasting money chatting on the phone, interconnection adds radical new possibilities to life. Iqbal Quadir, the original visionary of mobile phones for poor people and the driving force behind the founding of GrameenPhone, says that “connectivity is productivity.”78 One new possibility, in fact, is a way to do financial business, represented by the wildly popular M-PESA money transfer system in Kenya. Moreover, the triumphs of M-PESA in Kenya, of the telecommunications company Celtel in the heart of Africa, and of GrameenPhone in Bangladesh are sources of indigenous pride. Most of the countries that are today rich got that way thanks to just such business successes, repeated in a thousand industries.

But along with mobile phones, another Western export has also overspread the developing world: cigarettes. Few doubt that this is a fundamentally bad thing. The scientific evidence on the dangers of smoking should trump any attempt to cast tobacco addiction as a victory for consumer autonomy and entrepreneurship.

So is microfinance more like cell phones or cigarettes? Unless or until randomized microfinance trials show strong average benefits or harm, the best answer to this question will come from evidence and analysis that is less compelling—but perhaps also more honest and insightful. The next two chapters explore two major themes in this vein, both hinted at in the mobile phone metaphor: the extent to which financial services give poor people more agency in their lives and the extent to which microfinance has enriched the economic fabric of nations.

 


1. Wolff (1896), 4.

2. Rhyne (2001), 188.

3. For example, see Muhammad Yunus, “The Grameen Bank,” Scientific American, November 1999, 118. The figure comes from Khandker (1998), 56, which extrapolates from Pitt and Khandker (1998).

4. See Kiva (j.mp/kVVJ0L); FINCA (j.mp/kwd44i); Microcredit Summit Campaign (j.mp/iJShlg); Opportunity International (j.mp/ipbaCp); Acción (j.mp/kZv9ob).

5. Morduch and Haley (2002).

6. Demirgüç-Kunt, Beck, and Honohan (2008), 99.

7. Tim Harford, “Perhaps Microfinance Isn't Such a Big Deal After All,” Financial Times, December 5, 2009.

8. Drake Bennett, “Small Change,” Boston Globe, September 20, 2009.

9. Acción International and others, “Measuring the Impact of Microfinance: Our Perspective,” joint statement, April 8, 2010 (j.mp/lZ1n83).

10. David Roodman, “Microfinance Groups, Feeling Misunderstood, Misunderstand Research,” Microfinance Open Book Blog, April 9, 2010 (j.mp/9TKsqH).

11. Quoted in Rahman (1999b), 28.

12. Shahidur R. Khandker, “Household Survey to Conduct Micro-Credit Impact Studies: Bangladesh, Variables List,” September 13, 2007 (j.mp/gPV2Kd).

13. Karlan and Zinman (2008a).

14. Mehnaz Rabbani, researcher, BRAC, Dhaka, interview with author, March 4, 2008.

15. Todd (1996), 86–89.

16. Todd (1996), 87. Emphasis in the original.

17. Todd (1996), 171–74; see also Ito (1999), 138–64.

18. Roodman (2007a, 2009); Roodman and Morduch (2009).

19. Karlan (2001). To be fair, the agency saw its approach as rough and ready—imperfect, but practical for microfinance groups to incorporate into their normal operations.

20. Pitt and Khandker (1998); Roodman (2011).

21. Anscombe (1973).

22. Sterling (1959); Feige (1975); Denton (1985).

23. Tullock (1959); Hamermesh (2007).

24. Leamer (1983). t-values measure statistical significance.

25. MkNelly and Dunford (1999), 89.

26. Littlefield, Morduch, and Hashemi (2003).

27. Cheston and Reed (1999), 8.

28. Grameen Bank Ordinance No. XLVI of 1983 (j.mp/nNoO2A).

29. More precisely, this is assumed to hold after linearly controlling for other variables, including log landholdings.

30. Angrist and Pischke (2010). They also embrace high-quality non-randomized methods such as regression discontinuity design.

31. Majumdar, Almasi, and Stafford (2004), 1987.

32. Writing Group for the Women's Health Initiative Investigators (2002).

33. Gina Kolata, “Hormone Studies: What Went Wrong?” New York Times, April 22, 2003.

34. Majumdar, Almasi, and Stafford (2004).

35. Altekruse and others (2010), Tables 4.7, 4.8.

36. Chlebowski and others (2009). Another theory attributes the rise to the declining rates of mammogram use, but the study finds that it cannot explain why the number of cancer cases fell so much once women were taken off the hormones.

37. Daniel Altman, “Small-Picture Approach to a Big Problem: Poverty,” New York Times, August 20, 2002.

38. Deaton (2009); Ravallion (2009); Heckman and Urzua (2009).

39. Schwartz and Lellouch (1967).

40. Dowla and Barua (2006), 34. Dowla studied under Yunus and served as the Grameen Project's accountant. Barua founded Grameen with Yunus and was its deputy managing director until 2010.

41. Karlan and Zinman (2010, 2011).

42. Rossi (1987).

43. Morduch and Haley (2002).

44. Easterly, Levine, and Roodman (2004); Roodman (2007a, 2007b, 2009); Roodman and Morduch (2009).

45. Meyer (2007).

46. Armendáriz and Morduch (2010), 267–315.

47. Coleman (2006) refines the analysis in Coleman (1999).

48. Coleman (2006), 1619.

49. Coleman (2006), 1635–38.

50. In addition to the eligibility criterion that Pitt and Khandker focused on for purposes of statistical analysis—ownership of half an acre or less—the law creating the Grameen Bank also defined as eligible households with non-land assets worth less than an acre of land.

51. Pitt and Khandker (1998); Pitt and others (1999).

52. More accurately, the presence in a village of credit groups by gender, as well as the characteristic of being above the half-acre line, were assumed exogenous after controlling for various village, household, and individual characteristics.

53. Morduch (1999).

54. Figures from Roodman and Morduch (2009).

55. Morduch (1998).

56. Pitt (1999).

57. Khandker (2005), 279.

58. Sullivan (2007).

59. Roodman (2011). The program is “cmp” and is freely available. It runs in Stata, a commercial statistical package.

60. Roodman and Morduch (2009).

61. Pitt (2011).

62. See David Roodman, “Response to Pitt's Response to Roodman and Morduch's Replication of…. etc.,” Microfinance Open Book Blog, March 31, 2011 (j.mp/gwgo0g).

63. Roodman and Morduch (2009).

64. Karlan and Zinman (2010). Interest rates are author's calculations, based on a flat rate of 11.75 percent per month.

65. Karlan and Zinman (2010), Web appendix table 2. The $6.50 value is based on monthly household income of 4,389 rand for applicants with a 50 percent chance of being unrejected, an average 5.6 people per household, and a purchasing power parity conversion factor of 3.87 rand per dollar for 2005 from World Bank (2011).

66. Karlan and Zinman (2010), Web appendix table 2.

67. Dean Karlan, professor, Department of Economics, Yale University, e-mail to author, July 23, 2009.

68. Fernald and others (2008).

69. Karlan and Zinman (2011). Figures of $34 and $8 use a 2006 purchasing power parity conversion factor of 22.18 pesos per dollar, the latter also using a mean monthly per capita household income of 5,301 pesos.

70. I exclude total formal borrowing as an outcome because it is more directly affected by getting a loan.

71. Tallies are based on the initial working paper (Karlan and Zinman 2009), which included a large set of results that was distilled for final publication in Karlan and Zinman (2011).

72. Based on a control group mean of 1,419 rupees per month at a 2006 purchasing power parity conversion factor of 15.06 rupees per dollar.

73. Banerjee and others (2009).

74. Technically, this statement is not quite correct because confidence intervals have a probabilistic interpretation relative only to the null hypothesis of no effect. Confidence intervals are ±1.96 standard errors, where a standard error is 46.221 rupees per month.

75. The village bank here is a financial services association, a type of institution in Kenya that resembles the nineteenth-century village banks more than the ones devised by John Hatch, in that members buy shares. See chapters 3 and 4.

76. Dupas and Robinson (2009); Collins and others (2009).

77. Using a 2006 purchasing power parity conversion factor of 30.8216 shillings per dollar.

78. Sullivan (2007), 10.