All sciences—social and natural—have their techniques for validating hypotheses. Economics is no exception. Its practitioners make heavy use of increasingly sophisticated statistical techniques to try to sort out cause and effect and to make predictions.
Except in the situations I discuss in the next chapter, empirically oriented economists do not have the luxury of conducting experiments to test their hypotheses, as do their counterparts in the hard physical sciences. Instead, economists must try to tease out relationships and infer behaviors from what is already going on in the real world, which cannot be stopped and restarted to suit the needs of economists who want to know what is really happening.
In this chapter, I will take you on a tour of the statistical method economists most commonly use—regression analysis—first by briefly explaining the concept and its origins, then by discussing its use during the heyday of large forecasting models (the era when I learned economics) and later during the waning popularity of those models. I will also discuss how the tools of economics have been used to analyze complex challenges and resolve real-world business disputes in what is called the economic consulting industry. I will then introduce you to the exciting world of sports analytics, in which statistical and economic methods have played a central role. I conclude with an application of the Moneyball concept, popularized by Michael Lewis, to policy and business.1
Consider this introduction to a basic statistical technique (which I promise will be painless) as a worthwhile investment in understanding the really fun stuff in the latter half of the chapter.
Suppose you are a farmer with a reasonably large amount of acreage and you grow corn. You have historical data on the amount you plant, the volume of your crop in bushels (we will ignore prices, since they are outside your control because you compete in a highly competitive market), and the amounts of fertilizer and insecticide you apply. Now suppose an agribusiness conglomerate comes to you and talks you into buying its special supplement, to be applied after planting, which it says will enhance your crop volume, based on data from the company's experience with other farms.
Months pass and you reap your crop. Amazingly, it’s up 10 percent compared to the year before. Can you say with confidence that the application of the supplement did it?
Of course you can't. A whole lot of things influence your crop output: some are within your control, like the fertilizer, the insecticide, and the supplement, while others are not, such as the amount of rain, the days of sun, the daily temperatures during the growing season, and so on. Ideally, you'd like to control for all factors other than the application of the supplement, so you can know with some degree of confidence whether and to what extent that supplement worked or didn't.
How would you go about addressing this challenge? Well, it turns out that some very smart statisticians in the early part of the nineteenth century developed techniques to enable you to do precisely this. Furthermore, these same techniques, known as multivariate regression analysis, have been taught to undergraduate and graduate students for decades in statistics and social science classes, in some cases even to advanced high school students.
Regression analysis enables economists (or other social and physical scientists) to understand the relationships among different variables. In the farming example above, an economist or statistician would estimate an equation to understand how different independent factors (such as the special supplement, the fertilizer, the amount of rain, the days of sun, the temperature, and so on) affect crop output—the dependent factor.
It turns out that when the data are collected and organized, an economist or statistician, or, frankly, many analysts with even less formal training, can estimate an equation to find out whether the 10 percent increase in your crop output was in fact caused primarily by the application of the special supplement, or whether the other factors had a larger effect. The beauty of regression analysis is that it enables the analyst to estimate the effect of each causal variable on the dependent variable you want or need to explain and influence, such as crop output, while controlling for the effects of all the other causal variables.
It could be the case, for example, that the special supplement contributed very little to your increased crop yield and that instead the amounts of fertilizer, rain, and sun were the most important factors. If that is true, you have valuable information: There’s no need to buy the special supplement. Simply keep using the fertilizer and make sure your corn gets enough water (the sun you can’t control). You could even cut back on the insecticide, since the regression results may have shown a negligible effect.
Besides helping you understand how different factors relate to each other, regression analysis can be used to predict the value of one variable if you know the values of the factors that you think affect it. For example, with regression analysis, you can get a fairly good idea of how much your crop will yield next season under different scenarios (a lot of rain, few days of sun, and so on) based on the historical data you've collected. That is the magic of statistical analysis.
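To make this concrete, here is a minimal sketch of the kind of regression a farmer (or an analyst working with the farm's records) might run, written in Python with the statsmodels library. The numbers, the column names, and the scenario are all invented for illustration; the point is only to show how the dependent variable (bushels) and the independent variables (the inputs and the weather) fit together.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical yearly records for one farm: the inputs applied, the weather,
# and the resulting crop. All numbers are invented for illustration.
data = pd.DataFrame({
    "fertilizer_tons": [20, 22, 21, 25, 24, 26, 27, 28, 30, 29],
    "insecticide_gal": [50, 48, 52, 55, 51, 53, 56, 54, 57, 58],
    "rain_inches":     [28, 31, 25, 33, 30, 27, 35, 32, 29, 34],
    "sun_days":        [90, 95, 88, 100, 93, 91, 104, 98, 94, 102],
    "supplement_lbs":  [0, 0, 0, 0, 0, 10, 12, 11, 13, 12],
    "bushels":         [9800, 10400, 9500, 11200, 10600, 10900,
                        11800, 11300, 11000, 11700],
})

# The dependent variable is what we want to explain (bushels); the independent
# variables are the candidate causes. add_constant adds an intercept term.
y = data["bushels"]
X = sm.add_constant(data.drop(columns="bushels"))

# Ordinary least squares estimates each input's effect on output while
# holding all of the other inputs constant.
model = sm.OLS(y, X).fit()
print(model.summary())

# The same fitted equation can predict output under a hypothetical scenario.
scenario = pd.DataFrame({
    "const": [1.0], "fertilizer_tons": [27], "insecticide_gal": [55],
    "rain_inches": [30], "sun_days": [95], "supplement_lbs": [0],
})
print("Predicted bushels without the supplement:", model.predict(scenario)[0])
```

The fitted coefficients are the estimated effects of each input, holding the others constant, and the final lines show how the same equation can be turned around to forecast output under a scenario of your choosing.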
In the real world, knowing not only the direction of the effect of a given factor on another (positively or negatively), but also the approximate size of that effect, can be extremely valuable. In a sense, many decisions in business, sports, public policy, and life involve questions about the unknown effect of changing one factor. Will adding the special supplement cause your crop output to increase? Will revenues increase if we raise ticket prices, and if so, by how much? What are the main determinants of economic growth? Regression analysis can be used to answer these and many other questions.
Although I spend the rest of this chapter highlighting some of the ways regression analysis and other statistical techniques have been used in the business world, I want to add a word of caution. There is an old saying, "There are lies, damn lies, and statistics," implying that, given enough time and ingenuity, one can prove just about anything with statistics. That overstates things, but there is some truth to the saying. Careful choice of the time periods and the specifications of the equations to be estimated can generate the results a particular researcher wants. Worse, in this age of essentially costless computing, it is easier than ever to engage in data mining: running regressions mindlessly to see which equations best fit the data and then proclaiming that one has found the truth.
The best way to guard against data mining, used in its pejorative sense (there is a more positive use of the term I will discuss later), is to test estimated equations out of sample, or in future periods after the period used to estimate the equation. Clearly, an equation that does a poor job predicting future values of the variable of interest, or the dependent variable, calls into question the value of the explanations it purports to advance from the historical data. Conversely, equations that do relatively well predicting future values inspire confidence in the validity of the regression results, which brings us to the first business use of regression analysis—predicting the future.
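In code, the out-of-sample check amounts to withholding the most recent observations when the equation is estimated and then seeing how well the fitted equation predicts them. Here is a minimal sketch that reuses the hypothetical crop data frame from the earlier example; the helper function and its names are mine, not a standard library routine.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

def out_of_sample_check(data: pd.DataFrame, target: str, holdout: int = 3):
    """Estimate a regression on the earlier observations, then see how well it
    predicts the held-out later ones. A model that fits history well but
    predicts the holdout poorly is a warning sign of data mining.
    (Illustrative helper, not a standard routine.)"""
    train, test = data.iloc[:-holdout], data.iloc[-holdout:]

    model = sm.OLS(train[target],
                   sm.add_constant(train.drop(columns=target))).fit()

    X_test = sm.add_constant(test.drop(columns=target), has_constant="add")
    predictions = model.predict(X_test)

    rmse = float(np.sqrt(np.mean((test[target].to_numpy()
                                  - np.asarray(predictions)) ** 2)))
    return model.rsquared, rmse

# Usage with the crop data frame from the earlier sketch:
# in_sample_r2, out_of_sample_rmse = out_of_sample_check(data, target="bushels")
```

A high in-sample R-squared paired with a large out-of-sample error is exactly the pattern the text warns about: an equation that explains the past but fails to predict the future.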
The business of economic forecasting as we know it today has its roots in the Keynesian revolution during the Great Depression and World War II. One of the pioneers of econometrics and macroeconomic forecasting was a Dutch economist, Jan Tinbergen, who is credited with developing the first national comprehensive model of how an entire economy works. Having done it first for his home country, the Netherlands, in 1936, Tinbergen produced a model of the American economy in 1938 for the League of Nations in Geneva, Switzerland.2 Tinbergen’s initial model is considered a precursor to the large forecasting models solved by computers today. In 1969, Tinbergen and Norwegian economist Ragnar Frisch shared the very first Nobel Prize awarded in economics, “for having developed and applied dynamic models for the analysis of economic processes.”3
American economist Lawrence Klein created the first large comprehensive forecasting model of the U.S. economy (see box that follows). Having studied for his PhD under Paul Samuelson at MIT during World War II, Klein built a more robust version of Tinbergen’s earlier model in order to estimate the impact of the government’s policies on the U.S. economy.4
By the 1960s, Klein was the undisputed star in the field of forecasting. Through a nonprofit organization set up within the University of Pennsylvania, known as Wharton Econometric Forecasting Associates (WEFA), he regularly produced and sold forecasts to both the private sector and governments around the world.5
During roughly this same period, Klein and other economists at the University of Pennsylvania collaborated with Franco Modigliani (another future Nobel Prize winner) and his colleagues at MIT, and with economists at the Federal Reserve Board to build the Penn–MIT–Fed macroeconomic model of the economy. That model has been successively refined through the years, but it is still the workhorse of the Fed staff in preparing their forecasts for the meetings of the Federal Open Market Committee, which sets monetary policy and conducts other business of the Fed.
Klein and WEFA's foray into econometric forecasting attracted other entrants. Among the more notable, and for a time the most successful, was Data Resources Inc. (DRI), founded by the late Harvard economist Otto Eckstein and Donald Marron, a former CEO of Paine Webber (a brokerage firm bought by UBS in 2000). During the 1970s and 1980s, DRI and WEFA were the dominant macroeconomic forecasting firms, projecting the outlook not only for the entire economy but also for specific industries. Both firms also provided one-off studies of particular subjects using their econometric engines—their large bodies of equations, based on regression analysis and historical data on multiple variables.
Through much of this period it seemed as if the macro models had unlimited futures, but then, as in other industries, disruptive technologies combined to make the macro forecasting business, as a commercial operation, much less profitable. One of these technologies was older and had been in use for some time before it helped seal the fate of the large macro models: software for regression analysis and other statistical techniques that individual users could run on their own mainframes and, later, minicomputers. One of the most popular programs of this genre, TSP (Time Series Processor), was developed by Robert Hall while he was a graduate student in economics at MIT. Hall is one of the nation's leading economists; he has long taught at Stanford and, at this writing, heads the committee of the National Bureau of Economic Research that pinpoints the dates at which expansions end (and recessions begin) and later begin again.
Hall's program, and the subsequent versions of TSP refined by Berkeley's Bronwyn Hall, were important, but it was a hardware innovation—the personal computer—combined with the statistical software packages then available that really disrupted the macro modelers. Armed with a PC, a statistics software app, and some data, virtually anyone with enough training could build his or her own, much smaller models without paying substantial annual sums to the macro modelers for either macro or industry-specific (micro) forecasts. And that is precisely what many customers of the macro modelers eventually did.
Macro-model customers moved away from the models for other reasons as well. For one thing, they were so large, with so many equations, that they were not transparent. Users couldn’t easily understand how a change in one or more of the input variables translated into changes in projected outputs. They simply had to trust the model, or the modeler, since it was also unclear how often and to what extent those running the models adjusted their forecasts with judgmental factors that reflected the modelers’ own beliefs about whether to trust the unadjusted projections of their models.
Another contributing reason for the decline in macro modeling was the so-called Lucas critique, outlined by eventual Nobel Prize winner Robert Lucas of the University of Chicago. Lucas demonstrated that fiscal and monetary policies were influenced by some of the factors driving the forecasts of macro models, so one could not draw reliable conclusions about the impacts of certain policy changes by using the models. In technical terms, Lucas showed that fiscal and monetary policies were not truly independent variables.
Another problem that has plagued not only the macro models but also users of regression analysis is how to distinguish between causation and correlation. Two or more variables may be highly correlated with the variable to be projected, say GDP, but it may not be clear they cause or determine GDP. Although Clive Granger developed a statistical method for addressing this problem—an achievement that earned him a Nobel—the macro models did not correct all of their equations for it.
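Granger's test, in rough terms, asks whether past values of one series help predict another series beyond what that series' own past already tells us; if not, the observed correlation is a weak basis for claiming causation. Here is a minimal sketch using the standard implementation in statsmodels, applied to two made-up series (the data and lag choice are purely illustrative).

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests

# Two made-up quarterly series, purely for illustration: does x help predict y
# beyond what y's own past values already tell us?
rng = np.random.default_rng(0)
x = rng.normal(size=200).cumsum()
y = np.roll(x, 2) + rng.normal(scale=0.5, size=200)  # y loosely follows x with a two-period lag

series = pd.DataFrame({"y": y, "x": x})

# The test compares a regression of y on its own lags with one that also
# includes lags of x, and asks whether the extra lags significantly improve the fit.
# (First column is the variable being predicted; second is the candidate cause.)
results = grangercausalitytests(series[["y", "x"]], maxlag=4)
```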
Yet another challenge to the macro models was posed by the rise of VAR models (technically vector autoregression models) that were statistically fancy ways of just extrapolating past data into the future. VAR models often outperformed the structural macro models. One of the leading exponents of VAR models is another Nobel Prize winner, Christopher Sims of Princeton University. Both VAR and the macro models had difficulty predicting turning points in the economy, or the beginnings of recessions or expansions.
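For the curious, here is what a small VAR looks like in practice: each variable is regressed on lagged values of itself and of every other variable in the system, and the fitted system is then rolled forward to generate a forecast. The two series below are invented stand-ins for the kinds of macro variables (GDP growth, inflation) a real model would use, and the two-lag choice is arbitrary.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Invented quarterly data standing in for, say, GDP growth and inflation.
rng = np.random.default_rng(1)
n = 120
gdp_growth = 2.5 + rng.normal(scale=1.0, size=n)
inflation = 2.0 + 0.3 * np.roll(gdp_growth, 1) + rng.normal(scale=0.5, size=n)
macro = pd.DataFrame({"gdp_growth": gdp_growth, "inflation": inflation})

# A VAR regresses each series on lagged values of every series in the system;
# the fitted system is then rolled forward to produce a forecast.
model = VAR(macro)
results = model.fit(2)  # two lags of each variable, chosen here for simplicity

# Forecast the next four quarters from the most recent observations.
forecast = results.forecast(macro.values[-results.k_ar:], steps=4)
print(pd.DataFrame(forecast, columns=macro.columns))
```

Nothing in the system "explains" anything in a structural sense; it simply projects the past forward, which is both the appeal and the limitation the text describes.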
The decline of the large-scale macro-model business has not ended forecasting, however. Numerous forecasters, on their own or working mostly for financial companies, offer forecasts built with PCs and off-the-shelf software and are routinely surveyed by popular news outlets such as the Wall Street Journal. At the same time, several large-scale commercial models remain (Moody's, Macroeconomic Advisers, and IHS, which bought Global Insight). The Fed and the International Monetary Fund, among other official entities, continue to use their own large-scale macro models.
Many businesses and other governmental organizations—notably the Congressional Budget Office and the Council of Economic Advisers—use an average of the major forecasts of key macroeconomic variables, such as GDP growth, inflation, and unemployment, compiled by Blue Chip Economic Indicators. This practice adapts the wisdom-of-crowds idea (really the wisdom of experts) to forecasting, an idea widely popularized by the journalist James Surowiecki of the New Yorker.11
In the 1970s the tools of economics, and particularly econometrics, began to be widely applied to solve real-world business and legal challenges by new firms in what is now known as the economic consulting industry.
Whereas the business of forecasting involves primarily macroeconomic models dealing with the national economy, the business of economic consulting involves the application of microeconomic tools to the challenges that individuals and firms face, rather than whole economies. In particular, the economic consulting firms formalized the business of providing economic expertise in an expanding array of legal disputes, addressing such questions as causation, valuation, and damages. Today, economists from various consulting firms are routinely used as experts, on both sides, in legal disputes involving antitrust, patent, discrimination, and torts (personal injuries) issues, among others. In addition, economists are frequently found in various regulatory proceedings at all levels of government.
The rise of economic consulting also coincided with the development and growth of the field of law and economics, taught in both law schools and economics departments. One of the fathers of law and economics, Richard Posner, has had a tremendous influence on the way many judges analyze and decide cases. Posner, a law professor with economic training who later was appointed to be a federal circuit judge, is widely regarded as the most prolific legal scholar and judge of his generation. In 1977, he cofounded, with his University of Chicago Law School colleague William Landes, Lexecon, which became one of the more successful economic consulting firms. Lexecon is now part of the global firm FTI Consulting.12 Other successful competitors in the economic consulting business include Analysis Group; The Brattle Group; Cornerstone; CRA International; Economists, Inc.; Navigant; and National Economic Research Associates, or NERA. (Full disclosure: during the course of my career I have had a part-time relationship, as many economists do, with several of these firms.)
The growth of the economic consulting industry as we know it today would not have been possible without the technological revolution of the past thirty years. In particular, many of the innovative tools and methods used in economic consulting, such as regression analysis, depend on the use of advanced computers, network services, and software to store and analyze large quantities of data.
The contribution of the economic consulting industry to the economy should be put into some perspective, however. Since the litigation-consulting component of the business is tied to specific disputes, the economists who participate in these matters largely assist in transferring wealth from one pocket to another; the work enhances the productive efficiency of the economy only to the extent that the quantification of damages helps the legal system deter undesirable behavior, which in turn encourages resources to move to more productive activities. Whatever the net impact of litigation consulting may be, it is incontestable that economic consultants could not do their jobs, and the audiences they address—judges, regulators, and sometimes legislators—could not interpret the consultants' work, without relying on the methods of analysis developed by academic economists and statisticians.
Earlier I referred to the practice of data mining in the pejorative sense in which it was used during much of my career. My, things have changed. With the rise of the Internet, mobile telephones, and the proliferation of various kinds of databases, both public and private, the term now has both negative and positive connotations, though both are very different from those associated with mindlessly running regressions. The negative associations overwhelmingly reflect concerns about intrusions on personal privacy by the government or private companies. The positive aspects of data mining, now associated with Big Data, relate to the ability of analysts to uncover patterns in very large data sets in a short period of time, patterns that can lead to new drugs and other products, new services, and new ways of producing or delivering them.
It is not my purpose here to debate the pros and cons of mining Big Data and how to limit its downsides, but rather simply to point out that the analytical techniques used for extracting useful information from large data sets include (but are not limited to) regression analysis in its various forms. These techniques are used by businesses analyzing customer behavior in the real world and on the Internet; by pharmaceutical companies looking for new cures; by meteorologists looking to improve their forecasts; by financial institutions seeking to improve their detection of fraud; and, as will be discussed in the next chapter, by firms conducting continuous experiments (often on the Internet) to refine their product and service offerings to consumers. Expect more uses and benefits from Big Data as more firms, and even the government, devote more resources to data analytics.
I am aware that the main practitioners of data mining are statisticians rather than economists. Indeed, Google, with a huge amount of data due to the vast number of searches conducted on its website each day, has many more statisticians than economists on its staff for analyzing data.14 Nonetheless, economists can be useful in structuring these analyses, highlighting what to look for, as well as in designing and interpreting the results of the experiments aimed at improving customer experiences. Businesses are increasingly recognizing this to be the case, and they are hiring economists in increasing numbers (after pretty much ignoring them in the preceding two decades).15
Love it or hate it, Big Data is here to stay. There is a growing literature on the topic that is difficult to keep up with. At this writing, I highly recommend two books cited in the endnote for those interested in the subject.16 An earlier book, Super Crunchers, by economist and lawyer Ian Ayres of Yale Law School, anticipated the growth of Big Data analytics and is also worth reading, if for no other reason than that he was way out front on this topic before it became as popular as it is today.17
One final observation about all this is worth noting. The Big Data movement came largely out of the business world rather than academia, and thus is the exception to the rule of this chapter (although it is roughly consistent with the course of events described in the topic areas covered by the next two chapters). At this writing, in mid-2014, universities are just beginning to catch up to industry’s need for a whole new generation of data scientists—individuals who have training in multiple fields, primarily statistics and computer science, but also economics and perhaps one or more of the physical or biological sciences. As just one example, Georgetown received a $100 million donation in September 2013 to launch a new public policy school, one of whose primary missions will be data analytics. Carnegie Mellon and the University of California at Berkeley already have made their marks in the field. I expect a growing number of other schools to join them in the years ahead.
It may not exactly be Big Data, but the data generated by athletic performances is certainly interesting to millions of Americans and the owners of the teams who put them on the field. Perhaps no sport is more measured or attracts more data geeks than professional baseball. The best-known sports geek of them all is Bill James, who helped launch the baseball data revolution from a makeshift office in the back of his house in Lawrence, Kansas. James and his craft were catapulted into fame by Michael Lewis in his book Moneyball (the basis for the movie of the same name).18
Moneyball entails the use of statistics to discover and exploit inefficiencies in the valuation of individual players (baseball players in the first instance) and to determine how and to what extent those players contribute to their teams' performance. The book Moneyball credits the Oakland Athletics and its general manager, Billy Beane, with being the first practitioners of this mode of analysis, but in fact other teams were making use of some of the same techniques, now also widely referred to as sabermetrics, at or near the same time.
The fundamental idea behind sabermetrics is to identify the key variables that contribute most to the performance of both players and teams. Since you are now familiar with the basic premise of regression analysis, you won't be surprised to learn that the Oakland A's used various forms of it to evaluate players' batting statistics (and also their fielding statistics) in college and the minor leagues in order to discover overlooked or undervalued players and build a relatively inexpensive winning team. Put another way, teams practicing moneyball use baseball data to find undervalued players in much the same way that Warren Buffett and other value investors use financial data to discover undervalued stocks (one of the topics covered in Chapter 8).
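A stripped-down sketch of the kind of regression that sits behind such player valuations: regress team runs scored on on-base percentage (OBP) and slugging percentage (SLG), then compare the coefficients to what the player market charges for each skill. The team-season numbers below are invented, and the A's actual models were of course far richer; this only illustrates the mechanics.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Invented team-season data: on-base percentage (OBP), slugging percentage (SLG),
# and runs scored. Real studies use many seasons of actual team data.
teams = pd.DataFrame({
    "obp":  [0.340, 0.325, 0.331, 0.352, 0.318, 0.345, 0.310, 0.336, 0.348, 0.322],
    "slg":  [0.440, 0.410, 0.435, 0.450, 0.400, 0.430, 0.395, 0.445, 0.455, 0.415],
    "runs": [820, 725, 780, 860, 685, 815, 655, 800, 865, 705],
})

# Regress runs on OBP and SLG. If OBP "buys" more runs than the player market
# charges for it relative to SLG, then high-OBP players are undervalued, which
# is the moneyball insight in a nutshell.
model = smf.ols("runs ~ obp + slg", data=teams).fit()
print(model.params)
print(model.summary())
```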
Today, virtually all baseball teams engage in some form of moneyball, although none to my knowledge uses it exclusively. More typical is the way the St. Louis Cardinals employ it: two different teams of experts, traditional scouts who rely on their gut and feel from observing young prospects, and the quants or analysts who go by the numbers, are mixed together to decide whom to draft and trade. The stakes are huge, and no one has yet perfected the art of picking all the right people. The Cardinals' scouting staff, for example, analyzed all of the baseball draft results from all teams between 1990 and 2013 and found that if a club signed nine players from a single year's draft (in which more than 20 players are taken by each team, counting all rounds) who eventually made it to the major leagues, that would put the team in the 95th percentile of all teams (namely, in the top two).19 Comparable data for just the more recent years, when presumably many, if not all, teams have been using analytic techniques to help them identify talent, are not available; one hopes the results are better. Still, picking young baseball players who are likely to have successful professional careers remains part art and part science, though moneyball techniques are pushing things in the scientific direction.
This is evident from the large and growing interest in sports analytics among fans of all types of sports teams, as well as among academic scholars. For example, if you’re into sports and want to know how a clever economist can come up with really interesting insights into what works and doesn’t, at least statistically, I highly recommend Scorecasting by Tobias Moskowitz (the economist) and L. Jon Wertheim, executive editor of Sports Illustrated.20 For a thorough discussion of the uses and limits of sabermetrics in baseball, where it all started, you can’t do better than The Sabermetrics Revolution: Assessing the Growth of Analytics in Baseball by Benjamin Baumer (a mathematician) and Andrew Zimbalist (one of the leading “sports economists” in the country, baseball in particular).21
The growing academic and real-world research in sports analytics turns out, not surprisingly, to be of more than academic interest. The annual MIT Sloan Sports Analytics Conference, for example, has become the premier forum for discussing the growing importance of the application of analytics to a range of sports. The conference has attracted growing numbers of attendees since its founding in 2006, and representatives from all major sports and all corners of the country come to it every year to discuss the latest trends and developments.
Can all sports be moneyballed? In other words, is it possible to apply analytics to the other major sports—football, basketball, hockey, and soccer—to discover and exploit different inefficiencies? Are individual sports like golf or tennis easier to moneyball? Are there certain sports that simply can’t be moneyballed? And perhaps most important to the vast majority of us who are not professional athletes, can or will a type of moneyball be used to assess our performance in the workplace?
The answers to all these questions seem to be yes, although it will take more time for analytical techniques to penetrate some sports than others. The speed of adoption in various sports will depend on the types of variables and metrics that are unique to each sport, and whether that data can be collected, analyzed, and exploited effectively. Some sports, like football, are more team-oriented and therefore have less individuality than baseball. The insights of moneyball and economics suggest, however, that there could be a lot of low-hanging analytical fruit in sports other than baseball, since they haven’t been explored as extensively yet.
Basketball is one sport outside of baseball where moneyball is starting to make inroads. At the 2011 MIT Sloan Sports Analytics Conference, for example, the backdrop in the main panel room featured a picture of Kobe Bryant taking a fadeaway shot just as Shane Battier was sticking his hand in Bryant’s face. The image illustrated a well-known analytical finding by the front office of the Houston Rockets (Battier’s team at the time): Bryant is a much less efficient scorer (as is likely the case with most other players) when a hand from an opposing player obstructs his view.22
Still, one sign that basketball has a way to go to catch up to baseball in analytical techniques is that professional basketball teams have been reluctant to discuss what measures they look at, citing the information as proprietary. Only when these measures become standardized and are widely adopted by all teams and made public, analogous to on-base percentage and similar publicly available baseball statistics, is moneyball likely to become part of the mainstream in basketball or any other sport.
As for the workplace, most companies already have a variety of ways in which they measure the performance of their employees, by both the quantity and the quality of their output. There is an entire human resources sub-industry that has grown up around this subject. As sports analytics becomes increasingly sophisticated and well accepted across a number of sports, do not be surprised if some of the lessons from the athletic world spill over into the corporate world (and perhaps vice versa).
I concede that after all this talk about sports, economics, and statistics, it may be something of a letdown to conclude this chapter by talking about regulation. But Cass Sunstein, one of the nation's leading legal scholars and a former regulatory official, has cleverly explained that the main task of regulators is, or should be, the practice of regulatory moneyball.24 Given the huge impact that federal and other regulations have on business and society, this topic alone would justify the trillion dollar label in the title of this book, and I hope that fact will pique your interest.
Yes, I did say trillion, and that and more may be the aggregate costs and benefits of the body of federal regulation alone, with even more once state and local regulation are counted. Admittedly, there remains a debate over the precise price tag, which I do not intend to resolve. My main purpose here is simply to focus on technique—the act of comparing the benefits and costs of rules before implementing them.
You would think such a simple idea—which many economists over many decades have championed—would not be controversial, but it has been one of the most contested notions in the policy arena over the past several decades. In fact, I began my career, after finishing law and graduate schools, as a staff economist at the Council of Economic Advisers (CEA) in 1977, when the political discussion about using cost-benefit analysis (CBA) in regulatory decision making, something which most people informally and routinely do in their everyday lives, was quite intense. The discussion and debate over CBA continues to this day.
Here’s how it all started. The precursor of CBA in the federal government was the inflation impact statement (IIS), which the administration of Gerald Ford required executive branch regulatory agencies to prepare before issuing final rules. The Carter administration, led by the Council of Economic Advisers, reformulated the IIS as something closer to a full cost-benefit analysis. CEA also headed a multi-agency Regulatory Analysis Review Group, which was formed to review the analyses of agencies’ proposed rules.
After President Reagan was elected, he further formalized the regulatory review process by issuing an executive order creating the Office of Information and Regulatory Affairs (OIRA) within the Office of Management and Budget. OIRA exists to this day and is viewed as an important institutional check on the quality of the cost-benefit analyses performed by executive branch regulatory agencies, whether or not the agencies' underlying statutes permit them to balance costs against benefits in issuing the rules themselves. Some agencies therefore do not use CBA to make decisions under certain statutes, although the analytical technique tends to find its way into decision making indirectly in many cases.
Although CBA has been controversial through the years—consumer and many environmental groups have generally opposed its use, while business has been more friendly—every president since Reagan, both Democratic and Republican, has reaffirmed and refined its implementation. Several contentious issues remain, however. One is the appropriate discount rate to apply to likely benefits and costs in future years (future values are discounted because a dollar today is more valuable than one received in later years). A second issue relates to the values assigned to avoiding deaths and injuries, in particular whether those values should be adjusted by age (if so, then a strictly economic calculus would assign greater value to avoiding deaths and injuries to younger people than to older ones).
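The mechanics of discounting are simple even if the choice of rate is not: a benefit of B dollars received t years from now is worth B/(1+r)^t today, so a higher discount rate r shrinks the present value of far-off benefits. Here is a short sketch with invented numbers for a hypothetical rule; the 3 percent and 7 percent rates are shown only to illustrate how sensitive the answer is to that choice.

```python
def present_value(amount: float, rate: float, year: int) -> float:
    """Discount a benefit or cost received 'year' years from now back to today."""
    return amount / (1.0 + rate) ** year

# Hypothetical rule (invented numbers, in millions of dollars): an up-front
# compliance cost today, followed by a stream of annual benefits.
upfront_cost = 100.0
annual_benefit = 12.0
horizon_years = 20

for rate in (0.03, 0.07):  # two illustrative discount rates
    benefits = sum(present_value(annual_benefit, rate, t)
                   for t in range(1, horizon_years + 1))
    print(f"Discount rate {rate:.0%}: net present value = {benefits - upfront_cost:,.1f}")
```

At 3 percent this hypothetical rule's benefits comfortably exceed its up-front cost; at 7 percent the margin narrows considerably, which is precisely why the choice of discount rate remains contested.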
Does regulatory moneyball have limits? Of course it does. Many benefits of regulatory rules, for example, cannot be monetized or quantified in any objective or scientific way. In addition, there are ethical issues involved in assigning values to lives, discounting them to take account of time value of money, or varying them by age.
In the end, however, regulatory moneyball (or CBA) is an input—a very important one, but not the only one—into regulatory decisions, just as real moneyball (sabermetrics) has become one, albeit not the only, important factor in the sports business.
It is hard to know where economics ends and statistics begins because the two fields are so intertwined. This is clearly the case in academia, where empirical economics is essentially applied statistics. In business, statistical analysis is becoming more important, especially in the age of big data. Companies using statisticians to refine their marketing or their production processes may not be aware of the close connection between economics and statistics. Nor may some sports enthusiasts be aware of the growing role of statistical analysis by the teams and players they root for. But one of the defining features of twenty-first century economies will be their reliance on and use of techniques for data analysis. Economists played a major role in this movement at its inception and will continue to help shape it in the future.
At the same time, economics as a separate academic discipline will also be affected and shaped by big data and the growing importance of analytical techniques in academia and the business world. I close the book in Chapter 16 with some thoughts about this topic.