4
What May and Can Be Forecasted?
FORECASTING IS UNAVOIDABLE IN portfolio management. Even when we do not rely on explicit forecasts such as “the stock market is expected to outperform bonds over the next year,” we use implicit forecasts. If we invest in the stock market, we implicitly forecast that it is a better investment than any alternative, such as a risk-free asset. For example, investing in an equity fund instead of a fixed income fund may be motivated by the fact that stocks have historically outperformed safer assets; we implicitly predict that historical average returns are adequate guidance for future relative performance and that equity prices incorporate an adequate risk premium.
What actually varies in forecasting is the degree of sophistication of the methodology used. The key question for investors is how much complexity and sophistication a forecasting methodology warrants. We touched on the subject of forecasting in chapter 3, but forecasting can be used for more than simply selecting fund managers. Whether you allocate between mutual funds, between different assets, or between risk factors, you need to forecast their respective rewards (expected returns) and risks to determine a properly balanced allocation among them.
It is important to begin by understanding what we are actually doing when we attempt to predict. If we could precisely forecast actual events, things would be extremely simple: we could simply adjust our portfolio according to what is going to happen, for example, invest all our wealth in the next Apple. But events are random, and by definition realizations of random variables cannot be forecasted; otherwise they would not be called random. What may be forecasted are the properties of the probability distributions associated with these events.
This basic distinction is crucial. Claiming an event is going to happen is meaningless unless we specify that it has a 100 percent probability of happening (i.e., it is the only event possible). What we may forecast are characteristics of the distribution of probabilities of different events, such as the expected distribution of returns for specific stocks, indices, or factors. For example, we may say that the expected return for the S&P 500 twelve months forward is 8 percent, with a 10 percent likelihood of a return below −10 percent and a 2.5 percent likelihood of a return below −20 percent, but we cannot forecast the actual realized returns. The distribution of possible returns may be characterized by its variance or its quantiles (e.g., the median, or the 1 percent or 5 percent quantiles used in finance for risk measures such as Value-at-Risk). Hence it is important to understand what is actually being forecasted.
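To make this concrete, the following minimal sketch expresses the forecast above as a distribution rather than a point estimate. The 8 percent mean comes from the example in the text; the normality assumption and the 14 percent volatility are illustrative choices of ours, picked so that the tail probabilities roughly match the numbers quoted above.

```python
# A distributional forecast: the 8% mean is from the text; normality and
# the 14% volatility are our illustrative assumptions.
from scipy.stats import norm

mu, sigma = 0.08, 0.14            # forecasted mean; assumed volatility
dist = norm(mu, sigma)

print(f"P(return < -10%): {dist.cdf(-0.10):.1%}")   # ~9.9%, close to 10%
print(f"P(return < -20%): {dist.cdf(-0.20):.1%}")   # ~2.3%, close to 2.5%

# The 5% quantile is the threshold used for a 95% Value-at-Risk
print(f"5% quantile (VaR threshold): {dist.ppf(0.05):.1%}")   # ~ -15%
```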
This distinction raises another crucial issue: assessing a forecast’s accuracy. If we forecast a high of 77 degrees for a summer day in New York but the temperature hits 80, we cannot conclude that our forecasting ability is poor. Only after a series of forecasts can our forecasting ability be evaluated. After fifty forecasts of daily temperature, are we on average correct? Is our average forecast error close to zero? Other criteria for evaluating forecasting accuracy include the variance of forecast errors and the magnitude of the worst forecast error.
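The evaluation itself is simple arithmetic, as the toy sketch below shows for the temperature example. The forecast and realized values are invented purely for illustration; what matters is that judgment requires a series of errors, not a single one.

```python
# Evaluating a series of forecasts: bias, error variance, worst miss.
# The numbers are invented for illustration.
import numpy as np

forecasts = np.array([77, 81, 74, 69, 85, 78])   # hypothetical forecast highs
actuals   = np.array([80, 79, 75, 71, 83, 80])   # hypothetical realized highs

errors = forecasts - actuals
print("average forecast error (bias):", errors.mean())       # close to zero?
print("variance of forecast errors:  ", errors.var(ddof=1))
print("worst absolute forecast error:", np.abs(errors).max())
```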
Despite the impossibility of forecasting actual realized values, many experts still attempt to make explicit point forecasts, such as “the S&P 500 will rise by 10 percent next year.” Hence, although we should now understand what may be forecasted, this chapter presents evidence on our inability to make explicit point forecasts and also deals with what can be forecasted. We start with the easiest way to make predictions: asking experts for theirs. Then we move to the other end of the spectrum: relying only on statistical predictions. We contrast the evidence on forecasts resulting from personal expertise with that on statistical models in different domains. We also discuss whether greater complexity and sophistication in forecasting methodologies lead to more precise and useful forecasts.
We Know Less Than We Think
The easiest route is to use predictions made by experts. If economists predict favorable economic conditions over the next year, we might allocate more to riskier assets in our portfolio. As will become clear, the identification of skilled professional forecasters is plagued with several problems, many of which are the same ones that complicate the selection of skilled fund managers discussed in chapter 3.
There is not yet good evidence that we are able to accurately forecast the dynamics of complex systems. Nassim Taleb describes our record concerning these forecasts as dismal and marred with retrospective distortions;1 we usually only understand risk after a significant negative event. In 2006, the political scientist Philip Tetlock of the University of Pennsylvania published a landmark book on the forecasting accuracy of 284 so-called experts who made their livings offering advice on political trends.2 The study contained no fewer than 28,000 predictions, gathered over a period of more than eighteen years, in which experts were asked to attribute probabilities to three alternative outcomes. These experts were generally no more skilled than the layperson. When experts determined that an event had no chance of occurring, it would occur 15 percent of the time. If they were 80 percent certain that an event would occur, it would occur 45 percent of the time. Twenty-five percent of events that were a sure thing would not occur at all. Their average performance was worse than if equal probabilities had been attributed to all events. These observations do not mean that experts lack a mastery of their subject matter; rather, this knowledge does not translate to better prediction making.
Nate Silver analyzed 733 political forecasts made by John McLaughlin and his panelists on the TV show The McLaughlin Group.3 On average, 39 percent of their predictions were completely true and 37 percent were completely false, with the balance accounted for by “mostly true,” “mostly false,” and in-between answers. Individual panelists did no better.
Economic forecasting is not much more accurate. In November 2014, the Montreal CFA Society invited former Federal Reserve chairman Ben Bernanke to speak to a crowd of nearly 1,200 individuals. Dr. Bernanke indicated that the economic models used at the Fed were little better than random. Yet many organizations make allocation decisions based on forecasts from experts who have, on average, less expertise and/or fewer resources than the Fed. In 1929, the Harvard Economic Society declared that a depression was “outside the range of probability.” Nearly eighty years later, 97 percent of economists surveyed by the Federal Reserve Bank of Philadelphia in November 2007 forecasted a positive growth rate for 2008;4 nearly two-thirds of professional forecasters expected growth above 2.0 percent.
There are those, though, whose predictions turn out to be correct; let’s consider the forecasts of four individuals credited with having foreseen the economic and financial crisis of 2007–2009.5
Between mid-2006 and early 2007, Med Jones of the International Institute of Management published a series of papers in which he argued that economic growth was less sustainable than commonly thought, fueled by household debt and a housing bubble. He specifically indicated that the highly rated mortgage-backed securities promoted by Wall Street were in fact very high-risk securities. In March 2007, he indicated to Reuters that “if people started to think there may be a lot of bankruptcies (in the subprime lending market), then you’re going to see the stock market sell off.” In early 2009, he also accurately predicted the bottom of the recession and anticipated modest recoveries in late 2010 and early 2011.
At a meeting of the International Monetary Fund in September 2006, the economist Nouriel Roubini laid out arguments for a recession in the United States deeper than that of 2001. In his full remarks, he argued that a deep decline in housing would significantly impair consumers and spread contagion to financial institutions and other market players.
Peter Schiff of Euro Pacific Capital, while on Fox News in December 2006, indicated that the U.S. economy was not strong, and as a result the housing market would decline significantly and unemployment would rise. Three of Schiff’s four fellow panelists on the program vehemently disagreed with him. In August 2007, a member of a similar panel referred to the subprime issue as minor; many agreed that the equity market was a buying opportunity, opposing Schiff.6 Finally, Dean Baker of the Center for Economic and Policy Research also predicted a housing bubble.
In analyzing successful predictions made by experts, there are three significant issues. First, there are approximately 14,600 economists in the United States alone, plus other noneconomist forecasters. Observing that some of them foretold the crisis is not surprising. This is the same problem we faced in chapter 3 when we wanted to distinguish skill from luck in fund managers’ performance. Second, it is unclear whether these forecasts estimated the depth of the crisis or translated into quantitative predictions usable for portfolio allocation. Third and most importantly, with the exception of Med Jones, the other three forecasters lack unblemished forecasting records.
For example, Dean Baker made his first prediction of a housing correction in 2002, more than five years prior to the actual event. When housing prices peaked sometime in 2006, they were more than 50 percent above their 2002 level, and nominal prices did not fall below the 2002 level even at the worst of the housing crisis. We can conclude that the 2002 forecast was premature. Furthermore, Baker is the same economist who criticized Bill Clinton in 1999 for proposing a fix for Social Security, arguing that the projected insolvency was extremely unlikely. He called it a phony crisis and projected surpluses for Social Security of $150 billion by 2008 (in 1999 dollars). Social Security has been running a deficit since 2010, which will only grow unless policy changes are implemented.7
Similarly, Nouriel Roubini predicted recessions in 2004, 2005, 2006, and 2007, yet failed to predict that the actual recession would be worldwide. In his own words: “I do not expect a global recession, I think it is going to be a slowdown.” He again forecasted recessions for 2011 and 2013, in addition to these post-crisis statements: “U.S. stocks will fall and the government will nationalize more banks as the economy contracts through the end of 2009” (March 2009),8 “We’re going into a recession based on my numbers” (August 2011),9 “Whatever the Fed does now is too little too late” (September 2011),10 “The worst is yet to come” (November 2012).11
Peter Schiff wrongly predicted rising post-crisis interest rates, hyperinflation, a falling dollar, gold nearing $5,000, and more. Since 2010, he has been forecasting some form of financial and economic collapse. His daily radio show, “The Peter Schiff Show,” was broadcast from 2010 to 2014 to sixty-eight stations in thirty states and up to 50,000 listeners online. His website advertised six books, five of which have the word “crash” in their titles. It would be imprudent to point to his success while ignoring the many failures. Anyone forecasting a financial catastrophe will eventually be proven right; statistically, such an event will happen again given a long enough horizon. However, a specific forecast is only useful to investors if accompanied by a reasonable timeline (being right within the next two years, not within the next ten), and a forecaster should only be given credibility after the fact if he identified the proper trigger (that is, was right for the right reason).
Tetlock classifies experts into two groups: “Foxes,” who have a more balanced approach and see shades of gray, and “Hedgehogs,” who do not allow for the possibility of being wrong. Foxes are cautious and less exciting. Hedgehogs, meanwhile, are confident entertainers. Those who rise to fame may be better at marketing themselves than at making forecasts. As investors during the 2007–2008 crisis, we would have needed to sift through thousands of forecasters to determine the worth of the individual forecasts of our chosen experts.
Issues of sample size also affect forecasting accuracy. There have been twenty-two recessions in the United States since 1900, according to the National Bureau of Economic Research. A New York Times article in October 2011 indicates that the Economic Cycle Research Institute, a private forecasting firm created by the late economist Geoffrey H. Moore, had correctly called all recessions over the previous fifteen years, a total of two events for the United States. The article states: “In the institute’s view, the United States, which is struggling to recover from the last downturn, is lurching into a new one…. If the United States isn’t already in a recession now it’s about to enter one.” An eventual recession is inevitable, but it has not happened yet (as of mid-2016). This is not to discount the ECRI, but to point to the need for a larger sample before rendering a definitive verdict.12
Prakash Loungani of the International Monetary Fund gave a damning assessment:13 only two of sixty recessions worldwide during the 1990s were predicted a year ahead of time, and only one-third were predicted by seven months ahead. Recessions are usually forecasted only upon reaching the cusp of disaster. Furthermore, the larger issue for asset managers is not only whether economists and other forecasters fail to anticipate a recession, but also whether they call recessions that never occur.
Overconfidence has negative effects on forecasting, as with investing. Ohio State University professor Itzhak Ben-David and Duke University professors John Graham and Campbell Harvey analyzed responses of chief financial officers (CFOs) of major American corporations to a quarterly survey run by Duke University since 2001.14 CFOs are asked their expectations for the return of the S&P 500 Index over the next year, as well as a 10 percent lower bound and a 90 percent upper bound, which should allow for only a 20 percent probability that the actual return falls outside the estimated range. By definition, overconfident individuals provide tight lower and upper estimates and are therefore wrong more than 20 percent of the time. In reality, market returns fell outside the stated limits 64 percent of the time.
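The calibration check behind this finding is easy to reproduce. The sketch below simulates market returns and scores an overconfident interval: an 80 percent interval should miss about 20 percent of the time. The return distribution and the interval width are our illustrative assumptions, not the survey’s actual data.

```python
# Coverage check for an 80% confidence interval. Returns are simulated;
# the tight +/-5% interval mimics an overconfident respondent.
import numpy as np

rng = np.random.default_rng(0)
realized = rng.normal(0.08, 0.15, size=10_000)   # assumed market returns

lower, upper = 0.08 - 0.05, 0.08 + 0.05          # overconfidently tight bounds

miss_rate = np.mean((realized < lower) | (realized > upper))
print(f"miss rate: {miss_rate:.0%}")             # far above the intended 20%
```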
These observations are entirely consistent with the work of Glaser, Langer, and Weber on overconfidence. In an experiment, they asked a group of professional traders and a group of students to complete three tasks:15
•   Answer knowledge questions (ten questions concerning general knowledge and ten questions concerning economics and finance knowledge).
•   Make fifteen stock market forecasts.
•   Predict trends in artificially generated charts. This approach neutralizes the “expertise” advantages that experts may have over nonexperts.
This experiment also required experts and students to provide a lower and upper estimate that had a 90 percent probability of including the right answer. All results confirmed overconfidence among experts and nonexperts. Interestingly, although traders are more confident than students, they are not more accurate. We can conclude that knowledge impacts confidence levels but does not necessarily improve accuracy.
Our choices in forecasting methodology also illustrate the difficulties of making accurate predictions. Dukes, Peng, and English surveyed the methodologies used by research analysts to make their investment recommendations.16 Did they use the theory-grounded dividend discount model (DDM) or another method? About 35 percent of analysts admitted using the DDM, though it was rarely the primary decision driver. The favorite approach (77 percent) was some form of price-earnings (PE) multiple applied to an earnings forecast. Though users of the DDM approach use longer time horizons to forecast earnings and dividends (five years), those who rely on a PE multiple approach use a short-term horizon (one or two years), an implementation ill-equipped to deal with regression to the mean.
Montier ranked portfolios into deciles by historical and forecasted PEs, evaluated their performance over a long period, and found no material difference.17 He found that analysts who prefer stocks with greater earnings growth expectations see historical earnings momentum as the most important determinant of future earnings momentum. Almost invariably, their top quintile of highest future earnings growth stocks is composed of securities having the highest historical five-year growth in earnings. This again illustrates that the concept of reversion to the mean is ignored. Montier reported that when analysts forecast earnings twelve months ahead, they were wrong by an average of 45 percent,18 concluding that most analysts simply follow trends.
The PE forecast approach is a shortcut methodology because a PE level incorporates implicit assumptions about earnings growth, inflation, and risk. However, although the DDM approach is conceptually superior, its application is challenging. The estimated fundamental valuation of a security derived from a DDM approach is extremely sensitive to the assumptions made about the growth rate in earnings and the discount rate. For example, let’s assume an analyst believes earnings will grow by 8 percent per year for five years and then by 5 percent afterward. Let’s also assume the risk premium on this security is 4 percent and the long-term risk-free rate is also 4 percent. What would be the approximate valuation difference if we were to instead assume a growth rate of 10 percent for the initial five years and a risk premium of 3.5 percent? The price valuation difference would be more than 31 percent. Hence analysts prefer the PE forecast approach because it is far easier to anchor the forecasted PE on the current PE and to limit the earnings forecast horizon to a few years than to meet all the requirements of a DDM implementation. However, these forecasts based on short-term considerations have limited long-term value to investors.
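The sensitivity claim above can be verified with a few lines of code. The sketch below implements a standard two-stage DDM under the chapter’s stated assumptions; the normalization of the starting dividend to 1 is our own choice and does not affect the percentage difference.

```python
# Two-stage dividend discount model reproducing the chapter's example.
# d0 = 1 is a normalization assumed for illustration.

def ddm_price(g1, g2, rf, premium, n=5, d0=1.0):
    """Price under growth g1 for n years, then g2 forever (Gordon growth)."""
    r = rf + premium                        # discount rate
    pv, d = 0.0, d0
    for t in range(1, n + 1):
        d *= 1 + g1
        pv += d / (1 + r) ** t              # present value of near-term dividends
    terminal = d * (1 + g2) / (r - g2)      # Gordon terminal value at year n
    return pv + terminal / (1 + r) ** n

base = ddm_price(g1=0.08, g2=0.05, rf=0.04, premium=0.040)
alt  = ddm_price(g1=0.10, g2=0.05, rf=0.04, premium=0.035)
print(f"valuation difference: {alt / base - 1:.1%}")   # roughly 31%
```

Running this gives a difference slightly above 31 percent, consistent with the text: small changes in two inputs move the “fundamental” value by nearly a third.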
We Know Less Than We Are Willing to Admit
Forecasting is difficult, and our failures stem from preferring one form of forecasting over another, using improper forecasting horizons, and being overconfident in forecast accuracy. Many investors have unreasonable expectations: they aim for positive and stable excess performance at short horizons.
Asset prices are influenced by fundamental factors and noise. However, the shorter the horizon, the greater the ratio of irrelevant noise to relevant information. What most significantly determines the pattern of returns on the S&P 500 at a one-second horizon? Is it noise or is it fundamental factors? What if we were to ask the same question for a horizon of ten seconds, a day, a month, a year, or ten years? After all, luck, which is performance attributed to noise, fades over time.
There is simply too much noise and volatility around expected returns at short horizons to showcase our predictive abilities. The daily volatility of the S&P 500 since 1980 is 1.11 percent. Observing a 2 percent move on a given day therefore does not even represent a two standard deviation move in the market (2 ÷ 1.11 ≈ 1.8). Thus, most of the time, such a movement does not require a unique explanation beyond the confounding effects of market participants’ actions during the day. We should not expect to necessarily find a rational explanation for the security price movements that we observe on a daily basis.
Yet financial news looks to experts for daily interpretations of market movements, and sometimes the same explanation can be used to justify opposite market movements, such as:
The equity market rose today because several indicators show improving prospects for economic growth.
The equity markets declined today because investors are concerned that improvement in some key indicators for economic growth may lead to rising interest rates and a less accommodating monetary policy.
Do market commentators know what is good for financial markets and the economy? Let’s consider the global market correction that occurred after Ben Bernanke made several announcements on June 19, 2013, concerning a hypothetical scenario for ending the unprecedented liquidity injection after the liquidity crisis.
the Committee reaffirmed its expectation that the current exceptionally low range for the funds rate will be appropriate at least as long as the unemployment rate remains above 6 and 1/2 percent so long as inflation and inflation expectations remain well-behaved in the senses described in the FOMC’s statement…assuming that inflation is near our objective at that time, as expected, a decline in the unemployment rate to 6 and 1/2 percent would not lead automatically to an increase in the federal funds rate target, but rather would indicate only that it was appropriate for the Committee to consider whether the broader economic outlook justified such an increase.
Importantly, as our statement notes, the Committee expects a considerable interval of time to pass between when the Committee will cease adding accommodation through asset purchases and the time when the Committee will begin to reduce accommodation by moving the federal funds rate target toward more normal levels.
Although the Committee left the pace of purchases unchanged at today’s meeting, it has stated that it may vary the pace of purchases as economic conditions evolve. Any such change will reflect the incoming data and their implications for the outlook, as well as the cumulative progress made toward the Committee’s objectives since the program began in September. Going forward, the economic outcomes that the Committee sees as most likely involve continuing gains in labor markets, supported by moderate growth that picks up over the next several quarters as the near-term restraint from fiscal policy and other headwinds diminish. We also see inflation moving back towards our two percent objective over time. If the incoming data are broadly consistent with this forecast, the Committee currently anticipates that it would be appropriate to moderate the monthly pace of purchases later this year; and if the subsequent data remain broadly aligned with our current expectations for the economy, we would continue to reduce the pace of purchases in measured steps through the first half of next year, ending purchases around midyear. In this scenario, when asset purchases ultimately come to an end, the unemployment rate would likely be in the vicinity of 7 percent, with solid economic growth supporting further job gains, a substantial improvement from the 8.1 percent unemployment rate that prevailed when the Committee announced this program.19
Here’s a quick summary of the above transcript excerpt:
•   When the unemployment rate is close to 6.5 percent (it was 7.6 percent at the time of the statement), the FOMC will determine whether or not circumstances dictate that the federal funds rate should increase.
•   A fairly long period will be observed between the end of the asset purchase program and any increase in the federal funds rate.
•   The committee made no change to the current asset purchase program.
•   If inflation increases toward 2 percent and the unemployment rate eases toward 7 percent, the asset purchase program may be brought to an end.
The market reacted badly to these statements. According to Paul Christopher, chief international strategist at Wells Fargo Advisors, “People aren’t sure that the economy is well enough for the Fed to pull back…. The market is signaling to the Fed that we don’t trust your assessment of the economy; we don’t trust your assessment of inflation.”20 Ben Bernanke responded to doubts of this sort as such:
If you draw the conclusion that I’ve just said that our policies, that our purchases will end in the middle of next year, you’ve drawn the wrong conclusion, because our purchases are tied to what happens in the economy.
In the end, the market reached a new high less than three weeks after the June 19 press conference. The asset purchase program was first reduced in December 2013 and ended in October 2014 as the unemployment rate declined more quickly than market participants or the Federal Reserve expected. The job gains in 2014 were the largest since 1999 and total employment surpassed the pre-crisis peak by 2.7 million.21 At that time, inflation was inching toward 2 percent, but the larger than expected decline in energy and commodity prices put significant downward pressure on inflation in late 2014 and in 2015. The Fed eventually raised rates only in December 2015.
Consider the following interpretations from Bloomberg news on July 3, 2013: “U.S. stock-index futures remained lower after a private report showed companies added more jobs than economists forecast last month, fueling concern the Federal Reserve will begin to reduce monetary stimulus.” A few hours later, the following comment was also made: “Stocks fell as turmoil in Egypt and political uncertainty in Portugal overshadowed better-than-estimated data on jobs growth and unemployment claims.” By the end of the day they had come full circle with the following: “U.S. stocks rise as jobs reports offset Egypt, Portugal.” The first statement makes the argument that equity markets declined because the risk of a reduction in monetary stimulus outweighed the benefits of better economic data, while the second and third statements support the argument that without the political uncertainty in Egypt and Portugal, the economic data would have supported rising equity markets. These contradictory statements raise another question: Why do we even ask for and read these comments? We should not assume that experts are correctly interpreting new information or even understanding its impact.
Overcoming Our Failures at Forecasting
Finucane and Gullion suggest that the key to a good decision process is “understanding information, integrating information in an internally consistent manner, identifying the relevance of information in a decision process, and inhibiting impulsive responding.”22
There is evidence that we can improve our abilities to forecast both political and economic events. The work of Philip Tetlock on the inability of experts to forecast political events has had interesting consequences, leading the U.S. intelligence community to question if forecasts could be improved. Therefore in 2011, the Intelligence Advanced Research Projects Agency (IARPA) launched the ACE (Aggregate Contingent Estimation) program with an eye toward enhancing the accuracy, precision, and timeliness of intelligence forecasts across a variety of sectors.
The ACE program funded a 2011 tournament of five forecasting teams with fifteen thousand participants. Participants were asked to predict whether or not approximately two hundred events would occur, such as: Would Bashar al-Assad still be president of Syria by January 31, 2012? There was also a control group selected by the intelligence community. One of those teams, under the umbrella of the “Good Judgment Project,” is based at the University of Pennsylvania and the University of California–Berkeley. The project is led by Philip Tetlock (a psychologist), Barbara Mellers (an expert on judgment and decision making), and Don Moore (an expert on overconfidence). In the first year, the Penn/Berkeley team far surpassed the competition. Their superiority was such that they were the only team that IARPA funded in the third year of the project.
According to Tetlock, “One thing that became very clear, especially after Gorbachev came to power and confounded the predictions of both liberals and conservatives, was that even though nobody predicted the direction that Gorbachev was taking the Soviet Union, virtually everybody after the fact had a compelling explanation for it. We seemed to be working in what one psychologist called an ‘outcome irrelevant learning situation.’ ”23
In his opinion, few pundits are willing to change their minds in response to new evidence. They are also unable to outperform a random decision process. Experts rarely admit “I was wrong”; rather, they claim “I would have been right if not for…” The success of Tetlock’s team can be partially attributed to a training model developed to teach forecasters how to choose relevant information and avoid bias. Some of the members engaged in narrative thinking while others engaged in probabilistic thinking. The latter were trained to turn hunches into probabilities and did better. For example, instead of basing their forecasts solely on their understanding of a situation, forecasters trained in probabilistic thinking would consider what has happened in other “similar” circumstances. Providing feedback on their performance was also helpful.
Tetlock found that among all the test subjects, the most successful predictions were made by a concentrated group of “super forecasters.” Their personality traits, rather than any specialized knowledge, allowed them to make predictions that outperformed the accuracy of several of the world’s intelligence services, despite forecasters lacking access to classified data. In the process of identifying better forecasters, hedgehogs disappeared and foxes thrived. Super forecasters are:
•   Open-minded and more likely to look for information that challenges their own opinions,
•   Trained in probabilistic reasoning and willing to collaborate with others to share information and discuss rationales, and
•   Inclined to update their predictions more often and to spend more time challenging themselves.
There is a clear parallel between political and financial/economic forecasting. Experts involved in financial and economic forecasts suffer from the same biases as political forecasters.
Few experts outperform the collective forecasts of laypersons. Michael Mauboussin discusses the conditions under which a crowd can predict with greater accuracy than experts.24 He refers to the work of Scott Page, a social scientist who developed a simple understanding of collective decision making. According to Page, the collective error is equal to the average individual error minus the diversity of predictions. To illustrate his point, he used jelly beans in a jar as an example: a group of individuals guessed the number of jelly beans in the jar; on average, individuals were wrong by more than 60 percent, but the average of all guesses was wrong by less than 3 percent. This concept is well known in financial circles; by aggregating all investors’ opinions through price formation, financial markets effectively provide superior forecasts that are hard to beat with any non-price-based variable. There are always outliers, but from experiment to experiment, these are rarely the same individuals. The greatest danger lies in homogeneity of opinions, when all agents think alike. Therefore we must not reject diversity, but embrace it.
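Page’s statement is in fact an algebraic identity, and a few lines of code make it tangible. The jelly bean count and the guesses below are invented for illustration; the point is that the two sides of the identity always agree, whatever the numbers.

```python
# Numerical check of Page's identity: collective squared error equals the
# average individual squared error minus the diversity of predictions.
# Truth and guesses are invented for illustration.
import numpy as np

truth = 850                                   # hypothetical jelly bean count
guesses = np.array([300, 1400, 600, 1100, 950, 500])

crowd = guesses.mean()
collective_error = (crowd - truth) ** 2
avg_individual_error = np.mean((guesses - truth) ** 2)
diversity = np.mean((guesses - crowd) ** 2)

print(collective_error)                       # same value both ways
print(avg_individual_error - diversity)
```

The crowd can only beat its average member by being diverse: shrink the spread of guesses without improving each one, and the collective error grows.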
Can Statistical Models Really Be Useful?
Can statistical models really outperform expert opinions, weed out conflicts of interest and marketing prowess, and distinguish luck from skill? Orley Ashenfelter, a Princeton economist, former editor of the American Economic Review, and wine enthusiast, has published a number of papers since the late 1980s on the economics of wine, including a controversial equation designed to forecast the value of Bordeaux Grands Crus.25 The equation takes just a few variables into account and explains more than 80 percent of the variation in prices: the age of the vintage, the average temperature during the growing season (April to September), the amount of rain at harvest time (August), and the amount of rain during the previous winter (October to March).
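An equation of this kind is simply an ordinary least squares regression on those four variables. The sketch below shows its shape; the data file and column names are hypothetical stand-ins, since Ashenfelter’s original dataset is not reproduced here.

```python
# Sketch of an Ashenfelter-style regression. The file and column names are
# hypothetical; the four predictors are those named in the text.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("bordeaux_vintages.csv")       # hypothetical dataset

X = sm.add_constant(df[["vintage_age", "growing_season_temp",
                        "harvest_rain", "winter_rain"]])
y = df["log_price"]                             # log of relative auction price

model = sm.OLS(y, X).fit()
print(model.summary())                          # per the text, R-squared > 0.8
```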
Unsurprisingly, the Bordeaux equation was not welcomed among wine experts. For example, Robert Parker, the world’s most influential wine critic, commented, “really a neanderthal way of looking at wine. It is so absurd as to be laughable.” In short, “an absolute total sham.”26 Normally, the market value of the wine is not known until the wine is finally released and traded, which usually happens three years after the harvest. Because young Bordeaux Grands Crus need a maturation period of eight to ten years to be drinkable, a professional assessment is needed to determine the expected value of the wine.
However, the Bordeaux equation provides better estimates of future prices than experts. Storchmann presents the following example: “In 1983, Parker deemed the 1975 vintage in Pomerol and St. Emilion outstanding and awarded it 95 out of 100 points. He also added that the wines were too tannic to be drunk and should be stored for a long time (a sign of a great vintage). However, as these wines matured Parker dramatically adjusted his rating. In 1989, he awarded this very vintage only 88 points and recommended that the wine should be drunk immediately rather than stored. That is, within six years, Parker’s 1975 vintage rating dropped from outstanding to below average. In contrast, the Bordeaux equation predicted the mediocre quality of this vintage already in 1975, immediately after the harvest.” Overall, the equation predicted 83 percent of the variance in the price of mature Bordeaux red wines at auction.27 Frank Prial of the New York Times wrote about the reasons wine experts attack the empirical approach:
Two reasons. Some elements of the wine trade are angry because the Ashenfelter equation could be helpful in identifying lesser vintages they have promoted…. Second, more seriously, he is accused of relegating the whole wine-tasting mystique to a minor role. Supposedly, the sipping, spitting, sniffling and note-taking so dear to wine romantics have all been rendered obsolete by mathematics.28
Simple statistical models can also be useful in predicting expected financial returns. Of course, expert opinion is still needed if only to design better models. When we decide to invest in passive funds, the implementation of a portfolio is far from simple. We still need to determine which factors or asset classes to buy, a topic we will cover in the next two chapters. Similarly, for the statistical predictive model, we still need to decide which variables to use and how to control for statistical problems that will arise.
To understand our current abilities, it is important to know where we came from. At first, finance academics thought that returns were unpredictable. This is the fundamental claim made in the investment classic A Random Walk Down Wall Street by Princeton University professor Burton Malkiel. In such a case, the sophistication needed is at its lowest level: to forecast expected returns, you only need to compute the historical average return, which is then taken as representative of future expected returns.
The 1980s saw the emergence of a wealth of evidence of in-sample (i.e., after the fact) return predictability. An important and recurring theme of economic models that embed this predictability is that it is closely tied to the business cycle. For example, investors become more risk averse during recessions, leading to a higher risk premium.29 Therefore recession forecasts also predict higher future average returns.
While these in-sample results already suffered from statistical biases,30 an influential and award-winning study in the Review of Financial Studies in 2008 revealed that none of the predictors commonly used led to out-of-sample performance (i.e., when used before the fact) better than simply using the historical average return of an asset.31
In response, we have witnessed in the last few years a proliferation of statistical techniques that do lead to significant out-of-sample performance predictability. In a sense, the study mentioned previously helped to mobilize and organize the literature on these enhanced statistical techniques in finance. All of these methods deal to some extent with two crucial issues: model uncertainty (i.e., how is a predictor related to future returns?) and parameter instability (i.e., does the relation between a predictor and future returns change over time?).
These techniques are manifold. First, imposing economically sound restrictions leads to better out-of-sample forecasting performance. For example, economic intuition says risk premiums should be positive, and therefore negative statistical forecasts should be set to zero. Another example consists of averaging predictions made using a variety of simple predictive models, rather than using one complicated model with many predictors. These techniques cleverly rely on a diversification argument: it is better to diversify model risks than to rely on one complex model that will overfit the data. Tetlock’s team in the Good Judgment Project similarly designed algorithms to combine individual forecasts, with the objective of achieving a prediction accuracy that is on average better than the forecast of any single participant. A minimal sketch of these two restrictions follows.
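The sketch combines the two ideas just described: average the forecasts of several simple one-predictor models, then truncate a negative equity-premium forecast to zero. The input forecasts are hypothetical placeholders.

```python
# Forecast combination with an economic restriction: average several simple
# model forecasts, then impose a non-negative risk premium.
import numpy as np

def combined_premium_forecast(single_model_forecasts):
    """Average simple forecasts (model diversification), floor at zero."""
    avg = np.mean(single_model_forecasts)
    return max(avg, 0.0)                 # economic restriction: premium >= 0

# e.g., hypothetical forecasts from dividend-yield, term-spread, and
# credit-spread models for next year's equity premium
print(combined_premium_forecast([0.031, -0.004, 0.018]))   # -> 0.015
```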
Some techniques go further by avoiding nearly any model estimation, for instance by fixing expected excess returns to the current dividend yield or payout yield. John Cochrane, in an excellent overview of the field of asset pricing, discusses how dividend yields predict expected returns in many markets.32
In the 1970s, when it was widely believed that equity returns could not be forecasted, a high dividend yield would have been interpreted as an indication of slower future dividend growth because higher dividends may indicate a low level of corporate investment. Alternatively, a low dividend yield would have been an indication of greater dividend growth. Therefore the dividend yield was believed to represent a view of future dividend growth, not future expected returns.
The data suggests the opposite. Higher dividend yields predict higher future excess returns and vice-versa. This pattern shows up in many markets:
•   Stocks: higher dividend yields signal greater expected returns, not lower dividend growth.
•   Treasuries: a rising yield curve signals greater one-year returns for long-term bonds, not higher future interest rates.
•   Bonds: increasing credit spreads over time signal greater expected returns, not higher default probabilities.
•   Foreign exchange: a positive interest rate spread signals greater expected returns, not exchange rate depreciation.
•   Sovereign debt: high levels of sovereign or foreign debt signal low returns, not higher government or trade surpluses.
•   Housing: high price/rent ratios signal low returns, not rising rents or prices that rise forever.33
These findings help explain why investors who use recent historical performance to make investment decisions usually do poorly. For example, increasing credit spreads lead to capital losses, which may cause some investors to exit the market, even though widening spreads make corporate bond investing more attractive going forward.
Other powerful but more involved techniques extract predictive information from the whole cross-section of underlying assets. For an excellent treatment of these and other techniques, see the review by David Rapach of Saint Louis University and Guofu Zhou of Washington University in the Handbook of Economic Forecasting.34
As an illustration, we provide the investment performance of an investor who invests only in the S&P 500 Index and short-term government bonds, using only information that would have been available at each point in time. To guide his investment decision, he uses a combination of two of the above techniques. The return on a stock index can be written (in logarithms) as the sum of three components: the growth rate of the price-earnings multiple, the growth rate of earnings, and a simple transformation of the dividend-to-price ratio. These three components can be forecasted with simple quantities as follows. First, we can safely assume that the price-earnings (PE) multiple cannot grow indefinitely in the long run; the collapse of PEs in the early 2000s following the technology bubble certainly illustrated this. We assume a constant PE and consequently set its growth rate to zero. Second, because growth rates in earnings are hard to predict, we use the twenty-year historical average. Third, we use the current dividend-to-price ratio as its own forecast. Finally, following basic finance intuition, we expect investors to require positive excess returns on the market and set the forecast to zero if it happens to be negative.35 We also do not allow for leverage.
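The following sketch assembles these three forecasts. The inputs are hypothetical; in the exercise above they would be recomputed from the data available at each rebalancing date. Taking log(1 + D/P) as the “simple transformation” of the dividend-to-price ratio is our reading of the standard sum-of-the-parts decomposition; the book’s exact specification is in its notes.

```python
# Sum-of-the-parts expected log return: constant PE (growth = 0),
# 20-year average earnings growth, and log(1 + D/P) for the yield term.
# Inputs below are hypothetical placeholders.
import numpy as np

def expected_log_return(earnings_growth_20y, dividend_price, risk_free):
    pe_growth = 0.0                             # assume a constant PE multiple
    dp_component = np.log(1 + dividend_price)   # current dividend-to-price ratio
    forecast = pe_growth + earnings_growth_20y + dp_component
    # restriction: a negative excess-return forecast is set to zero,
    # i.e., the total forecast is floored at the risk-free rate
    return max(forecast, risk_free)

print(expected_log_return(earnings_growth_20y=0.05,
                          dividend_price=0.02, risk_free=0.01))
```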
Are expected return forecasts useful for investors? Table 4.1 suggests that they are. Because we consider an investor who is only interested in maximizing his long-term compounded portfolio return,36 the results in table 4.1 are easily interpretable. Each value represents the return for which the investor would be indifferent between the portfolio and a risk-free asset that pays this return each period. For example, the second-to-last column in the first row shows that an investor using the optimal forecast every month to rebalance his portfolio achieves an annualized geometric return of 8.72 percent. This compares favorably to the 4.96 percent realized by an investor who uses the historical average available at each point in time (fourth column). The last column shows the difference between the two, which is 3.76 percent in this case. This difference can be understood as the maximum annual fee the investor would be willing to pay to obtain the expected return forecasts instead of using the historical averages. Finally, we also provide for comparison, in the second and third columns, the geometric return obtained by investing only in short-term government bonds or only in the stock market. Again, using the optimal statistical forecasts is useful.
TABLE 4.1 Geometric return improvement in portfolio using expected return forecast (1950–2015)
As return predictability varies with the investment horizon, we also consider an investor who forecasts the stock market return and rebalances his portfolio each quarter (second row) and each year (last row). The usefulness of our expected return forecast is present at all horizons. Despite the simplicity of it all, most investors would not apply such a process because it would entail significant tracking error against the standard fixed equity/bond allocation benchmarks.
A common theme of most of these forecasting methodologies is that they imply simple predictive models. That simplicity trumps complexity is far from unique to expected return prediction. The great psychologist Paul Meehl reviewed the results of twenty studies that analyzed whether subjective predictions by trained professionals were more accurate than simple scoring systems.37 The studies covered a wide range of forecasts, such as the grades of freshmen at the end of the school year, the likelihood of parole violations, success in pilot training, and so on. In all studies, the accuracy of experts was matched or surpassed by simple algorithms. Since then, such studies have been extended to forecasting medical issues (such as the longevity of cancer patients), prospects of business success, winners of football games, and more. A meta-analysis of 136 research studies shows that mechanical predictions of human behavior are equal or superior to clinical prediction methods in a wide range of circumstances (94 percent of such studies).38
That simple predictive models can be better or equivalent to subjective predictions of experts is sometimes difficult to accept. For example, Meehl indicates that some clinicians refer to statistical approaches as “mechanical, cut and dry, artificial, unreal, pseudoscientific, blind, etc.,” while qualifying their method as “dynamic, global, meaningful, holistic, subtle, rich, etc.” Asset managers using a discretionary approach sometimes make similar comments about quantitative and passive managers.
Even when a statistical approach can save lives, there is resistance to adopting it. Bloodstream infections in the United States kill 31,000 people a year. After the death of an eighteen-month-old infant at Johns Hopkins Hospital, Dr. Peter Pronovost designed a simple five-point checklist that has been very effective at reducing, if not eliminating, such infections at Johns Hopkins.39 Furthermore, the checklist was applied at more than one hundred ICUs in Michigan. The median infection rate at the typical ICU dropped from about 2.7 per 1,000 catheter-days to zero after three months. The results were presented in the December 2006 issue of the New England Journal of Medicine. Yet despite the logic and simplicity of it all, the checklist is not widely accepted because, in the opinion of some, many physicians do not like being monitored by nurses or otherwise being forced to follow a checklist.
As reported by Meehl and Grove, “The human brain is an inefficient device for noticing, selecting, categorizing, recording, retaining, retrieving and manipulating information for inferential purpose.”40 First, humans cannot assign optimal weights to variables (intuition only goes so far), even less so on a consistent basis. Second, according to Meehl, experts have a need to appear clever, which can be satisfied by relying on greater complexity. They feel they can override or outperform simple algorithms because they have more information. However, experts are also biased. They have their own agendas, personalities, misperceptions, dislikes, and career risks. They simply cannot accept the fact that in most cases simple statistical algorithms can outperform them. We should not shoot the messenger, deny our shortcomings, and pay higher fees for information that is nothing more than noise. To again quote Nassim Taleb: “Simplicity has been difficult to implement in life because it is against a certain brand of people who seek sophistication so they can justify their profession.”41
Again, we stress that we should not conclude from all prior statements that experts are useless. We need experts to identify and understand the primary factors that impact a variable. Isolating substance from noise is the work of experts. Again, according to Meehl and Grove, “The dazzling achievements of Western post-Galilean science are attributable not to our having better brains than Aristotle or Aquinas but to the scientific method of accumulating knowledge,” a process that is not well implemented in the financial industry at large, but may be found in specific knowledge-based firms. As mentioned in the introduction, we operate in an industry that resists accumulated knowledge when it threatens the existing business model.
Are We Any Good at Predicting Risk?
This chapter has focused thus far on predicting expected values: expected economic growth, expected market returns, expected wine prices, and so on. Expected returns are notoriously hard to predict, but it turns out that we are much better at predicting risk.
Consider volatility, a measure of the variability of returns around their expected values. Let’s first examine why we can do a better job predicting volatility than expected returns. Figure 4.1 contains what financial econometricians call auto-correlation functions. Auto-correlations measure the degree of co-movement of returns with their own lags; for example, how today’s return tends to be related to yesterday’s return or to the daily return five days ago. If auto-correlations of returns were large, it would be easy to predict future returns from past movements alone. The black line shows the auto-correlations of daily returns of the S&P 500. Unfortunately, they are all close to zero; it is difficult to infer a statistical relationship from past returns to predict future returns.
The dashed line shows the auto-correlations in absolute returns. Absolute daily returns are a proxy for volatility; indeed, a larger return in magnitude, regardless of whether it is positive or negative, is indicative of more variability. From this graph, we see that auto-correlation is positive and decays slowly as we increase the lag. Not only is the magnitude of today’s return related to yesterday’s, but also to five or twenty days ago. We usually refer to this phenomenon as volatility clustering: when risk is high in the markets, it tends to remain high.
FIGURE 4.1 Auto-correlation functions of daily returns and absolute returns of the S&P 500 Index (1927–2015)
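The computation behind figure 4.1 is straightforward, as the sketch below shows. The data file name is a hypothetical placeholder; any daily S&P 500 return series will reproduce the pattern.

```python
# Auto-correlations of returns versus absolute returns, as in figure 4.1.
# "sp500_daily_returns.csv" is a hypothetical data file (one return per line).
import numpy as np

def autocorr(x, lag):
    """Sample auto-correlation of the series x at the given lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

returns = np.loadtxt("sp500_daily_returns.csv")

for lag in (1, 5, 20):
    print(f"lag {lag:2d}: returns {autocorr(returns, lag):+.3f}, "
          f"|returns| {autocorr(np.abs(returns), lag):+.3f}")
# On real data, the first column is near zero while the second is positive
# and decays slowly: the volatility clustering described above.
```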
As an illustration of both the importance of volatility and our ability to predict it, we provide in table 4.2 the investment performance of an investor who invests only in the S&P 500 Index and short-term government bonds. To control risk in his portfolio, he wishes to target at all times a volatility level of 10 percent, which is similar to what is obtained on average by investing in a balanced portfolio.
Every month, he uses the predicted volatility from a statistical model that is slightly more sophisticated than using a moving average of volatility.42 The fourth column shows that the realized volatility by following this strategy from 1950 to 2015 would have been 9.33 percent, pretty impressive given the simplicity of our volatility model and the fact that our sample period contains the crash of 1987 and the economic crisis of 2007–2009.
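The mechanics of such a strategy are simple: scale the equity weight by the ratio of the target volatility to the forecasted volatility. The book’s exact volatility model is described only in its notes, so the sketch below substitutes an exponentially weighted moving average (EWMA) of squared returns, a common choice that is indeed slightly more sophisticated than a simple moving average; the decay parameter of 0.94 is the usual RiskMetrics value, our assumption.

```python
# Volatility targeting with an EWMA forecast (a stand-in for the book's
# model). The simulated return history is a placeholder for real data.
import numpy as np

def ewma_vol(returns, lam=0.94):
    """Annualized EWMA volatility forecast from daily returns."""
    var = np.var(returns[:20])               # initialize on the early sample
    for r in returns[20:]:
        var = lam * var + (1 - lam) * r ** 2
    return np.sqrt(var * 252)

def equity_weight(returns, target=0.10, max_weight=1.0):
    """Weight so predicted portfolio vol hits the target; no leverage."""
    return min(target / ewma_vol(returns), max_weight)

rng = np.random.default_rng(2)
daily = rng.normal(0.0003, 0.012, size=1000)  # placeholder return history
print(f"equity weight: {equity_weight(daily):.2f}")
# the remainder (1 - weight) is held in short-term government bonds
```

When forecasted volatility spikes, the equity weight shrinks mechanically, which is how the portfolio in figure 4.2 keeps its realized volatility near 10 percent.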
TABLE 4.2 Investment performance of a portfolio invested in the S&P 500 that targets a 10 percent annual volatility (1950–2015)
The targeted volatility compares favorably to the one obtained by using the historical volatility at each point in time as a forecast (6.92 percent) and to the one obtained by investing the whole portfolio in the market (14.55 percent). Table 4.2 also shows that controlling volatility would have cost the investor only 0.91 percent (7.35 percent to 6.44 percent) in average returns. Figure 4.2 shows the time-varying volatility of the market and of our portfolio targeting a 10 percent annual volatility.43 Clearly the market experiences spikes in risk; the effects of the oil crisis in 1974, the market crash in 1987, and the financial crisis in 2008 are evident. Comparatively, our portfolio using our simple volatility predictions achieves a volatility that is close to 10 percent in all periods. Clearly we can forecast volatility.
FIGURE 4.2 Time-varying volatility of the S&P 500 and the portfolio that targets a 10 percent annual volatility (1950–2015)
Again, we have provided an illustration based only on volatility. Risk is by no means only captured by volatility; a comprehensive portfolio management technique should take into account other measures of risk, especially measures of co-movements between different assets. The next two chapters will have more to say about risk measures. For an overview of models used for volatility and dependence, we suggest the excellent textbook Elements of Financial Risk Management by Peter Christoffersen.
Unfortunately, it has been our experience that many managers set their asset allocation strategies based on forecasts of events and explicit return forecasts (where the evidence of our ability to forecast is weaker), instead of building their strategy on the basis of risk forecasts (where the evidence of our ability to forecast is stronger). Obviously, it is a lot more interesting to discuss why we expect that Amazon will outperform Alphabet than to discuss how forecasting and managing volatility could improve compounded returns. The next two chapters will illustrate that it is unnecessary to do explicit return forecasts to outperform.