Nightingale, perched upon an oak, was seen by Hawk, who swooped down and snatched him. Nightingale, begging earnestly, besought Hawk to let him go, insisting he wasn’t big enough to satisfy the hunger of Hawk, who ought instead to pursue larger birds. Hawk replied, “I should indeed have lost my senses if I should let go food ready to my hand, for the sake of pursuing birds which are not even seen within sight.”
Undoubtedly you recognize the fable of “The Hawk and the Nightingale,” and you already know the moral of the story: “A bird in hand is worth two in the bush.”
The fable is credited to Aesop, a slave and storyteller believed to have lived in ancient Greece between 620 and 560 B.C.E. Since then, countless versions have been told. In The Boke of Nurture or Schoole for Good Maners (1530), Hugh Rhodes mentions, “a byrd in hand—is worth ten flye at large.” A few years later, John Heywood in his ambitiously titled A dialogue conteinyng the number in effect of all the proverbes in the Englishe tongue (1546) claims “Better one byrde in hand than ten in the wood.” Finally, John Ray in A Hand-book of Proverbs (1670) gives us the first fully developed written version, which remains the definitive interpretation: “A bird in the hand is worth two in the bush.” But my favorite version of Aesop’s fable comes from Warren Buffett: “A girl in a convertible is worth five in the phone book.”
I am quite sure that when Aesop wrote “The Hawk and the Nightingale” 2,600 years ago, he had no idea he was laying down one of the definitive laws of investing.
Listen to Buffett: “The formula we use for evaluating stocks and businesses is identical. Indeed, the formula for valuing all assets that are purchased for financial gain has been unchanged since it was first laid out by a very smart man in about 600 B.C.E. The oracle was Aesop and his enduring, though somewhat incomplete, insight was ‘a bird in the hand is worth two in the bush.’ To flesh out this principle, you must answer only three questions. How certain are you that there are indeed birds in the bush? When will they emerge and how many will there be? What is the risk-free interest rate? If you can answer these three questions, you will know the maximum value of the bush—and the maximum number of birds you now possess that should be offered for it. And, of course, don’t literally think birds. Think dollars.”1
Buffett goes on to say that Aesop’s investment axiom is immutable. And it matters not whether you apply the fable to stocks, bonds, manufacturing plants, farms, oil royalties, or lottery tickets. Buffett also points out that Aesop’s “formula” survived the advent of the steam engine, electricity, automobiles, airplanes, and the Internet. All you need to do, Buffett says, is insert the correct numbers and the attractiveness of all investment opportunities will be rank-ordered.
In this chapter we will take a close look at several mathematical concepts that are critical to smart investing: calculating cash flow discounts, probability theory, variances, regression to the mean, and uncertainty vis-à-vis risk. As in earlier chapters, we will peel back a few layers, learning where and how these concepts originated, how they have evolved over time, and how they contribute to an investor’s latticework of ideas.
You may recall that in our chapter on philosophy, we described John Burr Williams’s theory of discounted cash flow as the best model for determining value. We also acknowledged that applying the model is anything but easy. You have to calculate the future growth rate of the company. You have to determine how much cash the company will generate over its lifetime. And you have to apply the appropriate discount rate. (For the record, Buffett uses the risk-free rate, defined as the interest rate on the ten-year U.S. Treasury note, whereas modern portfolio theory adds an equity risk premium to this risk-free rate.)
Because of these challenges, many investors drop down one level of explanation and select one of the second-order models, perhaps price-to-earnings ratios or price-to-book ratios or dividend yields. Buffett gives no weight to these common investment yardsticks. Although these are mathematical ratios, he says, they tell you nothing about value. They are, at best, relative markers of value used by investors who are unable, or unwilling, to work with discounted cash-flow models.
Buffett gives a great deal of thought to the company he is going to invest in as well as the industry in which the company operates. He also closely examines the behavior of management, particularly how management thinks about allocating capital.2 These are all important variables, but they are largely subjective measurements. As such, they do not easily lend themselves to mathematical computation. In contrast, Buffett’s mathematical principles of investing are straightforward. He has often said he can do most business-value calculations on the back of an envelope. First, tabulate the cash. Second, estimate the growth probabilities of the cash coming and going over the life of the business. Then, discount the cash flows to present value.
For help with that last step, we start by looking backward to the Great Depression.
In 1923, a young John Burr Williams enrolled at Harvard University to study mathematics and chemistry. After graduation, he was snared by the stock market euphoria of the late 1920s and became a security analyst. Despite the bullish attitude of Wall Street, or perhaps because of it, Williams was puzzled by the lack of framework to determine a stock’s intrinsic value. After the crash of 1929 and the Great Depression that followed, Williams returned to Harvard as a PhD candidate in economics. He wanted to understand what caused the crash.
When it came time to pick a thesis topic, Williams consulted his advisor, Joseph Schumpeter. You may recall we met Professor Schumpeter in our chapter on biology. Considering Williams’s background as a security analyst, Schumpeter suggested he study the question of how to determine the intrinsic value of a common stock. In 1940, Williams won faculty approval for his doctorate. Amazingly, his dissertation, titled The Theory of Investment Value, was published in book form by Harvard University Press two years before he received his doctorate.
Williams’s first challenge was to counter the conventional view of most economists who believed financial markets and asset prices were largely determined by investors’ expectations for capital gains—all investors, collectively. In other words, prices were driven by opinion, not economics. This was similar to John Maynard Keynes’s famous “beauty-contest” doctrine. In Chapter 12 of The General Theory of Employment, Interest and Money (1936), Keynes offered his own explanation for the fluctuation of stock prices. He suggested investors pick stocks similarly to how a newspaper might run a beauty contest in which people are asked to pick the most beautiful woman from among six photographs. The trick to winning the contest, said Keynes, was not to pick the woman you thought was the most beautiful, but the one you thought everyone else would consider the most beautiful.
But Williams believed that prices in a financial market were ultimately a reflection of the asset’s value. Economics, not opinion. In making this statement, Williams turned attention away from the time series of markets (technical analysis) and instead sought to measure the underlying components of asset value. Rather than forecasting stock prices, Williams believed investors should focus on a corporation’s future earnings. He then proposed that the value of an asset could be determined using the “evaluation by the rule of present worth.” In other words, the intrinsic value of a common stock, for example, is the present value of the future net cash flows earned over the life of the investment.
In his book, Williams acknowledged that his theory was built on a foundation laid down by others. The pioneering step in measuring intrinsic value, he said, was taken in a 1931 book titled Stock Growth and Discount Tables by Guild, Weise, Heard, and Brown. In addition, Williams had the benefit of studying the mathematical appendix in The Nature of Dividends (1935), by G. A. D. Preinreich. Following the same trail, Williams showed in particular how a company’s dividends could be forecast using estimates of the company’s future growth. Although Williams did not originate the idea of “present value,” he is given credit for the concept of discounted cash flows, largely because of his approach to modeling and forecasting, an approach he called “algebraic budgeting.”
For those who are still fuzzy about the discounted present value of future cash, think about how a bond is valued. A bond has both a coupon and a maturity date; together they determine its future cash flows. If you discount each of the bond’s coupons, along with the principal repaid at maturity, at the appropriate rate and then add up the results, the price of the bond will be revealed. You determine the value of a business the same way. But instead of counting coupons, you are counting the cash flows the business will generate for a period into the future and then discounting each of them back to the present.
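For readers who like to see the arithmetic, here is a minimal sketch of the bond comparison in Python. The bond itself (a $1,000 face value, a 5 percent annual coupon, ten years to maturity) and the function name present_value are assumptions chosen for illustration; they do not come from Williams or Buffett. Each year’s cash is divided by one plus the discount rate, compounded for the number of years you must wait for it, and the discounted amounts are then added together.

```python
def present_value(cash_flows, rate):
    """Discount year-end cash flows (year 1, 2, ...) at a flat annual rate."""
    return sum(cf / (1 + rate) ** year
               for year, cf in enumerate(cash_flows, start=1))

# Hypothetical bond: $1,000 face value, 5% annual coupon, 10 years to maturity.
face, coupon_rate, years = 1000.0, 0.05, 10
bond_flows = [face * coupon_rate] * years
bond_flows[-1] += face  # principal is repaid along with the final coupon

price_at_par = present_value(bond_flows, 0.05)  # discount rate equals coupon rate
price_at_6 = present_value(bond_flows, 0.06)    # higher discount rate, lower price

print(round(price_at_par, 2), round(price_at_6, 2))
```

When the discount rate equals the coupon rate, the bond is worth its face value; at a 6 percent discount rate the same cash flows are worth roughly $926. Valuing a business follows the same steps, except that the list of cash flows is an estimate rather than a contract.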
You may be asking yourself, if the discounted present value of future cash flows is the immutable law for determining value, why do investors rely on relative valuation factors, second-order models? Because predicting a company’s future cash flows is so very difficult. We can calculate the future cash flows of a bond with near certainty—it’s a contractual obligation. But a business does not have a contractual obligation to generate a fixed rate of return. A business does the best it can, but many forces—the vagaries of the economy, the intensity of competitors, and innovators who have the ability to disrupt an industry—combine to make predictions about future cash flows less than precise. That doesn’t excuse us from making the effort, for as Buffett often quips, “I would rather be approximately right than precisely wrong.”
Yes, forecasting growth rates and cash flows gives us only an approximation. But there are mathematical models that can help us navigate these uncertainties and keep us on course for determining the true value of assets. These models help us quantify risk and put us in a better position to navigate our approximations.
We can trace the fundamental concept of risk back eight hundred years to the Hindu-Arabic numbering systems. But for our purposes, we know the serious study of risk began during the Renaissance. In 1654, the Chevalier de Méré, a French nobleman with a taste for gambling, challenged the famed French mathematician Blaise Pascal to solve a puzzle: “How do you divide the stakes of an unfinished game of chance when one of the players is ahead?”
Pascal was a child prodigy educated by his father, himself a mathematician and tax collector in Rouen, the capital of Upper Normandy. Early on, it was clear the younger Pascal was special. He discovered Euclidean geometry on his own by drawing diagrams on the tiles of his playroom floor. When he was sixteen, Pascal wrote a paper on the mathematics of the cone, a paper so advanced and detailed that it was said Descartes himself was impressed. At eighteen, Pascal began tinkering with what was called a calculating machine. After three years of work and over fifty prototypes, Pascal invented a mechanical calculator. Over the next ten years, he built twenty machines he called the “Pascaline.”
De Méré’s challenge was already well-known. Two hundred years earlier, the monk Luca Pacioli had posed the same question, and for two hundred years the answer remained hidden. Pascal was not deterred. Instead, he turned for help to Pierre de Fermat, a lawyer who was also a brilliant mathematician. Fermat invented analytical geometry and contributed to the early developments in calculus. In his spare time, Fermat worked on light refraction, optics, and research that sought to determine the weight of the earth. Pascal could not have picked a better intellectual partner.
Pascal and Fermat exchanged a series of letters, which ultimately formed the basis of what today is called probability theory. In Against the Gods, the brilliant treatise on risk, Peter Bernstein writes that this correspondence “signaled an epochal event in the history of mathematics and the theory of probability.”3 Although they attacked the problem differently—Fermat used algebra whereas Pascal turned to geometry—each was able to construct a system for determining the probability of several possible but not yet realized outcomes. Indeed, Pascal’s geometric triangle of numbers can be used today to solve many problems, including the probability that your favorite baseball team will win the World Series after losing the first game.
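The World Series example can be worked through in a few lines. The sketch below assumes a best-of-seven series in which every game is an even coin flip, an assumption made purely for illustration; the function name chance_of_winning is likewise hypothetical. It follows the logic of the Pascal and Fermat solution: imagine that the maximum number of remaining games is played out, then count the arrangements in which your team collects the wins it needs. Those counts are the entries in one row of Pascal’s triangle.

```python
from math import comb

def chance_of_winning(need, opponent_needs, p=0.5):
    """Probability of getting `need` wins before the opponent gets `opponent_needs`.

    At most need + opponent_needs - 1 further games can matter, so we imagine
    all of them being played and add up the cases with enough wins.
    """
    games = need + opponent_needs - 1
    return sum(comb(games, k) * p ** k * (1 - p) ** (games - k)
               for k in range(need, games + 1))

# After losing game one of a best-of-seven: we need 4 wins, the opponent needs 3.
row_six = [comb(6, k) for k in range(7)]  # row of Pascal's triangle for 6 games
after_losing_game_one = chance_of_winning(need=4, opponent_needs=3)

print(row_six, after_losing_game_one)
```

With six possible games left there are 64 equally likely sequences, and 15 + 6 + 1 = 22 of them contain at least four wins, so the chance of coming back is 22 in 64, a little over 34 percent.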
The contributions of Pascal and Fermat mark the beginning of what we now call decision theory—the process by which we can make optimal decisions even in the face of an uncertain future. “Making that decision,” wrote Bernstein, “is the essential first step in any effort to manage risk.”4
We now know probability theory is a potent instrument for forecasting. But, as we also know, the devil is in the details. In our case, the details are the quality of information, which forms the basis for the probability estimate. The first person to think scientifically about probabilities and information quality was Jacob Bernoulli, a member of the famed Dutch-Swiss family of mathematicians that also included both Johann and Daniel Bernoulli.
Jacob Bernoulli recognized the differences between establishing odds for a game of chance and odds for answering life’s dilemmas. As he pointed out, you do not need to actually spin the roulette wheel to figure out the odds of the ball landing on the number seventeen. However, in real life, relevant information is essential in understanding the probability of an outcome. As Bernoulli explained, nature’s patterns are only partly established, so probabilities in nature should be thought of as degrees of certainty, not as absolute certainty.
Although Pascal, Fermat, and Bernoulli are credited with developing the theory of probability, it was another mathematician, Thomas Bayes, who laid the groundwork for putting the theory into practical action.
Thomas Bayes (1701–1761) was both a Presbyterian minister and a talented mathematician. Born one hundred years after Fermat and seventy-eight years after Pascal, Bayes lived an unremarkable life in the British county of Kent, south of London. He was elected to membership in the Royal Society in 1742 on the basis of his treatise, published anonymously, about Sir Isaac Newton’s calculus. During his lifetime, he published nothing else in mathematics. However, he stipulated in his will that at his death a draft of an essay he had written and one hundred pounds sterling was to be given to Richard Price, a preacher in neighboring Newington Green. Two years after Bayes’s death, Price sent a copy of the paper, “Essay Towards Solving a Problem in the Doctrine of Chances,” to John Canton, a member of the Royal Society. In his paper, Bayes laid down the foundation for the method of statistical inference—the issue first proposed by Jacob Bernoulli. In 1764, the Royal Society published Bayes’s essay in its journal, Philosophical Transactions. According to Peter Bernstein, it was a “strikingly original piece of work that immortalized Bayes among statisticians, economists, and other social scientists.”5
Bayes’s theorem is strikingly simple: When we update our initial belief with new information, we get a new and improved belief. In Sharon Bertsch McGrayne’s thoughtful book on Bayes, The Theory That Would Not Die, she succinctly lays out the Bayesian process. “We modify our opinions with objective information: Initial Beliefs + Recent Objective Data = A New and Improved Belief.” Later mathematicians assigned terms to each part of the method. Prior is the probability of the initial belief; likelihood is the probability of a new hypothesis based on recent objective data; and posterior is the probability of a newly revised belief. McGrayne tells us “each time the system is recalculated, the posterior becomes the prior of the new iteration. It was an evolving system, with each bit of new information pushed closer and closer to certitude.”6 Darwin smiles.
Bayes’s theorem gives us a mathematical procedure for updating our original beliefs and thus changing the relevant odds. Here’s a short, easy example of how it works.
Let’s imagine that you and a friend have spent the afternoon playing your favorite board game and now, at the end of the game, are chatting about this and that. Something your friend says leads you to make a friendly wager: that with one roll of the die you will get a “6.” Straight odds are one in six, roughly a 17 percent probability. But then suppose your friend rolls the die, quickly covers it with her hand, and takes a peek. “I can tell you this much,” she says; “it’s an even number.” With this new information your odds change to one in three, a 33 percent probability. While you consider whether to change your bet, your friend teasingly adds: “And it’s not a 4.” Now your odds have changed again, to one in two, a 50 percent probability. With this very simple sequence, you have performed a Bayesian analysis. Each new piece of information affected the original probability.
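The die example can be written out as a short Python sketch. The function name update and the use of exact fractions are choices made here for illustration. Each clue either rules a face out or leaves it possible, which is the simplest case of a Bayesian update: the likelihood of each face is either zero or one, and the surviving probabilities are rescaled so they again add up to one.

```python
from fractions import Fraction

def update(prior, is_consistent):
    """Drop the faces the new clue rules out, then rescale what remains."""
    surviving = {face: p for face, p in prior.items() if is_consistent(face)}
    total = sum(surviving.values())
    return {face: p / total for face, p in surviving.items()}

# Prior: a fair die, each face equally likely.
prior = {face: Fraction(1, 6) for face in range(1, 7)}

after_even = update(prior, lambda face: face % 2 == 0)       # "it's an even number"
after_not_four = update(after_even, lambda face: face != 4)  # "and it's not a 4"

print(prior[6], after_even[6], after_not_four[6])
```

The posterior from the first clue becomes the prior for the second, and the chance of a six moves from 1/6 to 1/3 to 1/2.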
Bayesian analysis is an attempt to incorporate all available information into a process for making inferences, or decisions. Colleges and universities use Bayes’s theorem to help students learn decision making. In the classroom, the Bayesian approach is more popularly called the “decision tree theory,” in which each branch of the tree represents new information that, in turn, changes the odds in making decisions. “At Harvard Business School,” explains Charlie Munger, “the great quantitative thing that bonds the first-year class together is what they call decision tree theory. All they do is take high school algebra and apply it to real-life problems. The students love it. They’re amazed to find that high school algebra works in life.”7
Now let’s insert Bayes’s theorem into Williams’s discounted cash flow model (DCF). We already know one of the challenges of employing the DCF is the uncertainty of predicting the future. Probability theory and Bayes’s theorem help us overcome this uncertainty. Still, another criticism of the DCF model is that it’s a linear extrapolation of the economic return of a company operating in a nonlinear world. The model assumes the growth rate of cash will remain constant for the number of years you are discounting. But it is of course highly unlikely that any company will be able to produce a perfectly predictable and constant rate of return. The economy jumps up and down, consumers are fickle, and competitors are vigorous.
How does an investor compensate for all the possibilities?
The answer is to expand your decision tree to include various time horizons and growth rates. Let’s say you want to determine the value of a certain company, and you know it has grown its cash at a rate of 10 percent in the past. You might reasonably start with an assumption that the company has a 50 percent chance of generating the same growth rate over the next five years, a 25 percent chance of a 12 percent rate, and a 25 percent chance of growing at 8 percent. Then, because the economic landscape invites competition and innovation, you might lower the assumptions for years six through eight, giving it a 50 percent probability of 8 percent growth, a 25 percent probability of 6 percent, and 25 percent probability of 10 percent. Then break the growth assumptions again for years nine and ten.
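A decision tree of this kind can be sketched in a few lines of Python. The growth rates and probabilities below are the ones given in the example; the starting cash figure of 100, the variable names, and the assumption that the two stages are independent of one another are illustrative choices made here, not part of any formal method. The sketch weights each branch by its probability and then walks every path through the first two stages of the tree.

```python
from itertools import product

# Growth rate -> assumed probability, for each stage of the tree.
stages = {
    "years 1-5": {0.10: 0.50, 0.12: 0.25, 0.08: 0.25},
    "years 6-8": {0.08: 0.50, 0.06: 0.25, 0.10: 0.25},
}

def expected_growth(branches):
    """Probability-weighted growth rate for one stage."""
    assert abs(sum(branches.values()) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(rate * prob for rate, prob in branches.items())

expected = {name: expected_growth(b) for name, b in stages.items()}

# Enumerate all 3 x 3 paths, assuming the stages are independent.
start_cash = 100.0  # hypothetical starting cash flow
paths = []
for (g1, p1), (g2, p2) in product(stages["years 1-5"].items(),
                                  stages["years 6-8"].items()):
    year8_cash = start_cash * (1 + g1) ** 5 * (1 + g2) ** 3
    paths.append((p1 * p2, year8_cash))

expected_year8_cash = sum(prob * cash for prob, cash in paths)
print(expected, round(expected_year8_cash, 2))
```

The weighted growth rate works out to 10 percent for the first stage and 8 percent for the second. Extending the tree to years nine and ten means adding a third entry to stages and a third loop variable, and updating the tree means nothing more than changing the probabilities as new information arrives.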
There are two broad categories of probability interpretations. The first is called physical probabilities, more commonly referred to as frequency probabilities. They are commonly associated with systems that can generate tons of data over very long periods. Think roulette wheels, flipping coins, and card and dice games. But frequency probabilities can also include probability estimates for automobile accidents and life expectancy. Yes, cars and drivers are different, but there are enough similarities among people driving in a particular area that tons of data can be generated over a multiyear period that in turn will give you frequency-like interpretations.
When a sufficient frequency of events, along with an extended time period to analyze the results, is not available, we must turn to evidential probabilities, commonly referred to as subjective probabilities. It is important to remember that a subjective probability can be assigned to any statement whatsoever, even when no random process is involved, as a way to represent the “subjective” plausibility. According to the textbooks on Bayesian analysis, “if you believe your assumptions are reasonable, it is perfectly acceptable to make your subjective probability of a certain event equal to a frequency probability.”8 What you have to do is to sift out the unreasonable and illogical in favor of the reasonable.
A subjective probability, then, is not based on precise computations but is often a reasonable assessment made by a knowledgeable person. Unfortunately, when it comes to money, people are not consistently reasonable or knowledgeable. We also know that subjective probabilities can contain a high degree of personal bias.
Any time subjective probabilities are in use, it is important to remember the behavioral finance missteps we are prone to make and the personal biases to which we are susceptible. A decision tree is only as good as its inputs, and static probabilities—those that haven’t been updated—have little value. It is only through the process of continually updating probabilities with objective information that the decision tree will work.
Whether or not they recognize it, virtually all decisions investors make are exercises in probability. To succeed, it is critical that their probability statements combine the historical record with the most recent data available. That is Bayesian analysis in action.
Eight years after Claude Shannon wrote “A Mathematical Theory of Communication” (Chapter 5), a young scientist at Bell Labs, James Larry Kelly Jr., took Shannon’s celebrated paper and distilled from its findings a new probability theory.9
Kelly worked alongside Shannon at Bell Labs and thus had a close look at Shannon’s mathematics. Inside Shannon’s paper was a mathematical formula for the optimal amount of information that, considering the possibilities of success, can be pushed through copper wire. Kelly pointed out that Shannon’s various transmission rates and the possible outcomes of a chance event are essentially the same thing—probabilities—and the same formula could optimize both. He presented his ideas in a paper called “A New Interpretation of Information Rate.” Published in The Bell System Technical Journal in 1956, it opened a mathematical doorway that could help investors make portfolio decisions.10
The Kelly criterion, as applied to investing, is also known as the Kelly Optimization Model, which in turn is called the optimal growth strategy. It provides a way to determine, mathematically, the optimal size of a series of bets that would maximize the growth rate of a portfolio over time, and it’s based on the simple idea that if you know the probability of success, you bet the fraction of your bankroll that maximizes the growth rate. It is expressed as a formula: 2p − 1 = x; where 2 times the probability of winning minus 1 equals the percentage of one’s bankroll that should be bet. For example, if the probability of beating the house is 55 percent, you should bet 10 percent of your bankroll to achieve maximum growth of your winnings. If the probability is 70 percent, bet 40 percent. And if you know the odds of winning are 100 percent, the model would say, bet your entire bankroll.
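The formula is simple enough to check in a few lines of Python. In the sketch below, the function name kelly_fraction is hypothetical, and two assumptions are worth stating plainly: the 2p − 1 form applies to an even-money bet, where a win pays exactly what a loss costs, and the result is floored at zero on the reasoning that a player with no edge should not bet at all. That floor is an addition made here, not part of the formula as stated.

```python
def kelly_fraction(p_win):
    """Fraction of bankroll to bet on an even-money wager: 2p - 1.

    Floored at zero (an assumption): with no edge, bet nothing.
    """
    if not 0.0 <= p_win <= 1.0:
        raise ValueError("p_win must be a probability between 0 and 1")
    return max(0.0, 2 * p_win - 1)

for p in (0.55, 0.70, 1.00):
    print(f"p = {p:.2f} -> bet {kelly_fraction(p):.0%} of bankroll")
```

The three cases reproduce the figures in the text: 10 percent at a 55 percent probability, 40 percent at 70 percent, and the entire bankroll at certainty.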
Ed Oakley Thorp, a mathematics professor, blackjack player, hedge fund manager, and author, was the pioneer in applying the Kelly criterion to gambling halls as well as to the stock market. Thorp worked at MIT from 1959 to 1961. There he met Claude Shannon and read Kelly’s paper. He immediately set forth to prove to himself whether or not the Kelly criterion would actually work. Thorp learned Fortran so he could program the university’s computer to run countless equations on the probabilities of winning at blackjack using the Kelly criterion.
Thorp’s strategy was based on a simple concept. When the deck is rich with tens, face cards, and aces, the player has a statistical advantage over the dealer. If you assign a −1 for the high cards and +1 for the low cards, it’s quite easy to keep track of the cards dealt; just keep a running tally in your head, adding or subtracting as each card shows. When the count turns positive, you know there are more high cards yet to be played. Smart players would save their biggest bets for the tipping point at which the card count reached a high relative number.
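The running tally can be sketched as follows. The text says only that high cards count −1 and low cards +1; treating 2 through 6 as low and 7 through 9 as neutral is an assumption borrowed from the common Hi-Lo convention, and the function name and the sample hand are made up for illustration. This is a sketch of the bookkeeping, not a reconstruction of Thorp’s own programs.

```python
HIGH_CARDS = {"10", "J", "Q", "K", "A"}   # tens, face cards, and aces: -1
LOW_CARDS = {"2", "3", "4", "5", "6"}     # assumed low cards: +1

def running_count(cards_seen):
    """Tally the cards already dealt; 7, 8, and 9 are treated as neutral."""
    count = 0
    for card in cards_seen:
        if card in HIGH_CARDS:
            count -= 1
        elif card in LOW_CARDS:
            count += 1
    return count

dealt = ["2", "5", "K", "3", "9", "6", "A", "4"]  # hypothetical cards seen so far
print(running_count(dealt))
```

A positive tally means more low cards than high cards have left the deck, so the cards still to come are richer in tens, face cards, and aces. Across a complete deck the tally returns to zero, because the two groups are the same size.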
Thorp continued to devise card-counting schemes. He would tweak the computer programming language while using the Kelly criterion to determine the weight of each bet. He soon ventured to Las Vegas to test his theory in practice. Starting with $10,000, Thorp doubled his money in the first weekend. He claimed he could have won more but his winning streak caught the eye of the casino security, and he was tossed out.
Over the years, Thorp became a celebrity among the blackjack aficionados. His reputation skyrocketed when it was learned he actually used a wearable computer to play roulette. This device, codeveloped with Claude Shannon, was the first computer used in a casino and is now considered illegal. No longer able to apply his mathematical theories on gambling floors, Thorp tipped his hat to Las Vegas by writing Beat the Dealer in 1962, a New York Times best seller that sold over seven hundred thousand copies. Today, it is considered the original card-counting and betting-strategy manual.
Over the years, the Kelly criterion has become part of mainstream investment theory. Some believe that both Warren Buffett and Bill Gross, the famed bond portfolio manager at PIMCO, use Kelly methods in managing their portfolios. William Poundstone brought the Kelly criterion to a wider audience in his popular book, Fortune’s Formula: The Untold Story of the Scientific Betting System That Beat the Casinos and Wall Street (2005). However, despite its academic pedigree and simple formulation, I caution that the Kelly criterion should be used by only the most thoughtful investors and even then with reservation.
In theory, the Kelly criterion is optimal under two criteria: (1) the minimal expected time to achieve a level of winnings and (2) the maximal rate of wealth increase. For example, let’s say two blackjack players each have a $1,000 stake and twenty-four hours to play the game. The first player is limited to betting only one dollar on each hand dealt; player number two can alter the bet depending on the attractiveness of the cards. If the second player follows the Kelly approach and bets the percentage of the bankroll that reflects the probability of winning, it is likely that, at the end of twenty-four hours, he will have done much better than player number one.
Of course, the stock market is more complex than a game of blackjack, in which there is a finite number of cards and therefore a limited number of possible outcomes. The stock market has thousands of companies and millions of investors and thus a greater number of possible outcomes. Using the Kelly approach, in conjunction with the Bayes theorem, requires constant recalculations of the probability statement and adjustments to the investment process.
Because in the stock market we are dealing with probabilities that are less than 100 percent, there is always the possibility of realizing a loss. Under the Kelly method, if you calculated a 60 percent chance of winning you would bet 20 percent of your assets, even though there is a 2 in 5 chance of losing. It could happen.
Two caveats to the Kelly criterion that are often overlooked: You need (1) an unlimited bankroll and (2) an infinite time horizon. Of course, no investor has either, so we need to modify the Kelly approach. Again, the solution is mathematical in the form of simple arithmetic.
To avoid “gambler’s ruin,” you minimize the risk by underbetting—using a half-Kelly or fractional Kelly. For example, if the Kelly model were to tell you to bet 10 percent of your capital (reflecting a 55 percent probability of success), you might choose to invest only 5 percent (half-Kelly) or 2 percent (fractional Kelly). That underbet provides a margin of safety in portfolio management; and that, together with the margin of safety we apply to selecting individual stocks, provides a double layer of protection and a very real psychological level of comfort.
Because the risk of overbetting far outweighs the penalties of underbetting, investors should definitely consider fractional-Kelly bets. Unfortunately, minimizing your bets also minimizes your potential gain. However, because the relationship in the Kelly model is parabolic, the penalty for underbetting is not severe. A half-Kelly, which reduces the amount of the bet by 50 percent, reduces the potential growth rate by only 25 percent.
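The trade-off can be checked numerically. The sketch below uses the expected logarithmic growth per bet, the quantity the Kelly criterion maximizes, for an even-money wager with a 55 percent chance of winning. The function and variable names are hypothetical, and the even-money payoff is an assumption carried over from the 2p − 1 formula.

```python
from math import log

def growth_rate(fraction, p_win):
    """Expected log growth per even-money bet when staking `fraction` of the bankroll."""
    return p_win * log(1 + fraction) + (1 - p_win) * log(1 - fraction)

p = 0.55
full_kelly = 2 * p - 1                      # 10% of bankroll

g_full = growth_rate(full_kelly, p)
g_half = growth_rate(full_kelly / 2, p)     # half-Kelly: 5% of bankroll
g_double = growth_rate(full_kelly * 2, p)   # overbetting: 20% of bankroll

retained = g_half / g_full
print(f"half-Kelly keeps {retained:.0%} of the full-Kelly growth rate")
print(f"double-Kelly growth rate per bet: {g_double:.5f}")
```

Halving the bet keeps about three-quarters of the growth rate, which matches the 25 percent penalty described above. Doubling the bet, by contrast, drives the growth rate down to roughly zero, which is why overbetting is the more dangerous mistake.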
“The Kelly system is for people who simply want to compound their capital and see it grow to very large numbers over time,” said Ed Thorp. “If you have a lot of time and a lot of patience, then it’s the right function for you.”11
At age 40, Stephen Jay Gould, the famous American paleontologist and evolutionary biologist, was diagnosed with abdominal mesothelioma, a rare and fatal form of cancer, and was rushed into surgery. After the operation Gould asked his doctor what he could read to learn more about the disease. She told him there was “not much to be learned from the literature.”12
Undeterred, Gould headed to Harvard’s Countway medical library and punched “mesothelioma” into the computer. After spending an hour reading a few of the latest articles, Gould understood why his doctor was not so forthcoming. The information was brutally straightforward: mesothelioma was incurable, with a median life expectancy of only eight months. Gould sat stunned until his mind began working again. Then he smiled.
What exactly did an eight-month median mortality signify? The median, etymologically speaking, is the halfway point in a string of values. In any grouping, half the members of the group will be below the median and half above it. In Gould’s case, half of those diagnosed with mesothelioma would die in less than eight months and half would die sometime after eight months. (For the record, the other two measures of central tendency are mean and mode. Mean is calculated by adding up all the values and dividing by the number of cases, which gives a simple average. Mode refers to the most common value. For example, in the string of numbers 1, 2, 3, 4, 4, 4, 7, 9, 12, the number 4 is the mode.)
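The three measures are easy to compute for the string of numbers in the example. This short sketch uses Python’s standard statistics module; the variable names are arbitrary.

```python
from statistics import mean, median, mode

values = [1, 2, 3, 4, 4, 4, 7, 9, 12]

middle_value = median(values)   # the halfway point of the sorted values
common_value = mode(values)     # the value that appears most often
simple_average = mean(values)   # sum of the values divided by how many there are

print(middle_value, common_value, round(simple_average, 2))
```

Here the median and the mode are both 4, while the mean is 46 divided by 9, or about 5.1. The few large values at the end of the string pull the mean above the other two measures.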
Most people look on averages as basic reality, giving little thought to the possible variances. Seen this way, “eight months’ median mortality” meant he would be dead in eight months. But Gould was an evolutionary biologist and evolutionary biologists live in a world of variation. What interests them is not the average of what happened but the variation in the system over time. To them, means and medians are abstractions.
Most of us have a tendency to see the world along a bell-shaped curve with two equal sides, where mean, median, and mode are all the same value. But as we have learned, nature does not always fit so neatly along a normal, symmetrical distribution but sometimes skews asymmetrically to one side or the other. These distributions are called either right or left skewed depending on the direction of the elongation.
Gould the biologist did not see himself as the average patient of all mesothelioma patients but as one individual inside a population set of mesothelioma patients. With further investigation, he discovered that the life expectancy of patients was strongly right skewed, meaning that those on the plus side of the eight-month mark lived significantly longer than eight months.
What causes a distribution to skew either left or right? In a word, variation. As variation on one or the other side of the median increases, the sides of the bell curve are pulled either right or left. Continuing with our example, in Gould’s case, those patients who lived past the eight-month mark showed high variance (many of them lived not just more months but years), and that pulled the curve to form a right skew. In a right-skewed distribution, the measures of central tendency do not coincide; the median lies to the right of the mode and the mean lies to the right of the median.
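The ordering of mode, median, and mean in a right-skewed distribution can be seen in a small sketch. The survival times below are invented purely for illustration (they are not Gould's data or any real patient data); they were chosen so that the median is eight months and a few long survivors stretch the right side.

```python
from statistics import mean, median, mode

# Hypothetical survival times in months, made up for illustration only.
survival_months = [3, 5, 6, 6, 6, 7, 8, 9, 12, 18, 30, 60, 120]

most_common = mode(survival_months)
halfway = median(survival_months)
average = mean(survival_months)

print(f"mode={most_common}  median={halfway}  mean={average:.1f}")

# In a right-skewed sample the long tail drags the mean furthest to the right.
print("mode < median < mean:", most_common < halfway < average)
```

Half of this made-up group is gone by month eight, yet the handful who live for years pull the mean far above the median, which is the pattern Gould recognized.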
Gould began to think about the characteristics of those patients who populated the right skew of the distribution, those who exceeded the median life expectancy. Not surprisingly, they were young, generally in good health, and had benefited from early diagnosis. This was Gould’s own profile, and so he reasoned there was a good chance he would live well beyond the eight-month mark. Indeed, Gould lived for another twenty years.
“Our culture encodes a strong bias either to neglect or ignore variation,” Gould said. “We tend to focus instead on measures of central tendency, and as a result we make some terrible mistakes, often with considerable practical import.”13
The most important lesson investors can learn from Gould’s experience is to appreciate the differences between the trend of the system and trends in the system. Put differently, investors need to understand the difference between the average return of the stock market and the performance variation of individual stocks. One of the easiest ways for investors to appreciate the differences is to study sideways markets.
Most investors have experienced two types of stock markets—bull and bear—that go either up or down over time. But there is a third, less familiar type of market. It is called a “sidewinder” and it produces a sideways market—one that barely changes over time.
One of the more famous sideways markets occurred between 1975 and 1982. On October 1, 1975, the Dow Jones Industrial Average stood at 784. Nearly seven years later, on August 6, 1982, the Dow closed at exactly the same level: 784. Even though nominal earnings grew over the time period, the price paid for those earnings dropped. By the end of 1975, the trailing price-earnings multiple for the S&P 500 was almost 12 times earnings. By the fall of 1982, it had declined to nearly 7 times.
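The arithmetic of a flat price alongside growing earnings can be made explicit. Because price equals earnings times the multiple, an unchanged price with a falling multiple implies that earnings rose. The sketch below is only a rough illustration under loud assumptions: it borrows the Dow's price level and the S&P 500's multiples from the passage above even though they are different indexes, and it rounds "almost 12" and "nearly 7" to exactly 12 and 7.

```python
# Price = earnings x multiple, so implied earnings = price / multiple.
# Assumption: the multiples in the text are rounded to exactly 12 and 7,
# and one index's price is paired with another index's multiple.
price_1975 = 784.0
price_1982 = 784.0
multiple_1975 = 12.0
multiple_1982 = 7.0

implied_earnings_1975 = price_1975 / multiple_1975
implied_earnings_1982 = price_1982 / multiple_1982

earnings_growth = implied_earnings_1982 / implied_earnings_1975 - 1
multiple_change = multiple_1982 / multiple_1975 - 1
price_change = price_1982 / price_1975 - 1

print(f"implied earnings growth: {earnings_growth:.1%}")
print(f"change in multiple:      {multiple_change:.1%}")
print(f"change in price:         {price_change:.1%}")
```

Under these simplifications, earnings would have had to grow by about 71 percent for the price to stand still while the multiple shrank by about 42 percent. The two changes cancel exactly.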
Some stock market forecasters are drawing analogies between what happened then and what may be happening today. There are concerns about the rate of corporate profit growth against the backdrop of a weak global economic recovery. Others fear the massive stimulation provided by the monetary authorities will cause a rise in commodity prices, inflation, and a decline in the dollar. This will, in turn, feed back into the stock market, causing price-earnings multiples to fall. Ultimately, investors could face a prolonged period when the market barely budges, and when they are best advised to avoid stocks.
When I first heard that argument—that we might be facing a sideways market similar to the late 1970s and it was best to avoid stocks—I was puzzled. Was it really true that sideways markets are unprofitable for long-term investors? Warren Buffett, for one, had generated excellent returns during the period; so did his friend and Columbia University classmate Bill Ruane. From 1975 through 1982, Buffett generated a cumulative total return of 676 percent at Berkshire Hathaway; Ruane and his Sequoia Fund partner Rick Cunniff posted a 415 percent cumulative return. How did they manage these outstanding returns in a market that went nowhere? I decided to dig a little.
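To put the two cumulative figures on a per-year footing, a cumulative return can be converted to a compound annual rate. The sketch below assumes the period counts as eight full years (1975 through 1982 inclusive); the function name is illustrative, not taken from any source.

```python
def annualized_return(cumulative_pct: float, years: float) -> float:
    """Convert a cumulative percentage return into a compound annual rate."""
    growth_multiple = 1 + cumulative_pct / 100  # e.g. 676% means 7.76x
    return growth_multiple ** (1 / years) - 1

# Assumption: 1975 through 1982 is treated as eight full years.
YEARS = 8
buffett_annual = annualized_return(676, YEARS)
sequoia_annual = annualized_return(415, YEARS)

print(f"Berkshire Hathaway: {buffett_annual:.1%} per year")
print(f"Sequoia Fund:       {sequoia_annual:.1%} per year")
```

On that assumption the figures work out to roughly 29 percent and 23 percent a year, compounded, in a market whose index finished where it began.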
First, I examined the return performance of the 500 largest stocks in the market between 1975 and 1982. I was specifically looking for stocks that had produced outsized gains for shareholders. Over the 8-year period, only 3 percent of the 500 stocks went up in price by at least 100 percent in any one year. When I extended the holding period to 3 years, the results were more encouraging: Over rolling 3-year periods, 18.6 percent of the stocks, on average, doubled. That equals 93 out of 500. Then I extended the holding period to 5 years. Here the returns were eye-popping. On average, an astonishing 38 percent of the stocks went up 100 percent or more; that’s 190 out of 500.14
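One plausible way to run this kind of count is sketched below: for each rolling window, find the share of stocks whose price at least doubled, then average those shares across windows. This is an assumption about the method, not a description of how the study cited above was actually done, and the tickers and prices are invented.

```python
def average_share_doubled(prices_by_stock, window):
    """Average, across rolling windows, of the share of stocks that at least doubled."""
    n_points = len(next(iter(prices_by_stock.values())))
    shares = []
    for start in range(n_points - window):
        doubled = sum(
            1
            for prices in prices_by_stock.values()
            if prices[start + window] >= 2 * prices[start]
        )
        shares.append(doubled / len(prices_by_stock))
    return sum(shares) / len(shares)

# Hypothetical year-end prices for three made-up tickers (not real data).
toy_prices = {
    "AAA": [10, 11, 12, 13, 14, 15],
    "BBB": [10, 9, 12, 20, 26, 40],
    "CCC": [10, 10, 10, 10, 10, 10],
}

one_year = average_share_doubled(toy_prices, window=1)
three_year = average_share_doubled(toy_prices, window=3)
print(f"1-year windows: {one_year:.1%}   3-year windows: {three_year:.1%}")

# Turning the percentages quoted in the text into counts out of 500 stocks.
print(round(0.186 * 500), round(0.38 * 500))
```

Even in this toy set, no stock doubles in a single year while one of the three doubles over every 3-year window, which mirrors the way longer holding periods surfaced more winners.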
Putting it in Gould’s terms, investors who observed the stock market between 1975 and 1982 and focused on the market average came to the wrong conclusion. They wrongly assumed that the direction of the market was sideways, when in fact the variation within the market was dramatic and led to plenty of opportunities to earn high excess returns. Gould tells us “the old Platonic strategy of abstracting the full house as a single figure (an average) and then tracing the pathway of this single figure through time, usually leads to error and confusion.” Because investors have a “strong desire to identify trends,” it often leads them “to detect a directionality that doesn’t exist.” As a result, they completely “misread the expanding or contracting variation within a system.” “In Darwin’s world,” said Gould, “variation stands as the fundamental reality and calculated averages become abstractions.”15
On the first page of their seminal book Security Analysis, Benjamin Graham and David Dodd included a quote from Quintus Horatius Flaccus, better known as Horace (65–8 B.C.E.): “Many shall be restored that now are fallen and many shall fall that are now in honor.” Just as Aesop had no clue his fable about Hawk and Nightingale was the literary preamble to the discounted cash flow model, so too I am sure Horace had no idea he had just written down the narrative formula for regression to the mean.
Whenever you hear someone say, “It all averages out,” that’s a colloquial rendition of regression to the mean—a statistical phenomenon that, in essence, describes the tendency of unusually high or unusually low values to eventually drift back toward the middle. As used in investing, it suggests that very high or very low performance is not likely to continue and will probably reverse in a later period. (That’s why it is sometimes called reversion to the mean.) Regression to the mean, Peter Bernstein points out, is the core of several homilies, including “what goes up must come down,” “pride goeth before a fall,” and Joseph’s prediction to Pharaoh that seven years of famine would follow seven years of plenty. And, Bernstein tells us, it also lies at the heart of investing, for regression to the mean is a common strategy—often applied and sometimes overused—for picking stocks and predicting markets.
We can trace the mathematical discovery of regression to the mean to Sir Francis Galton, a British intellectual and cousin of Charles Darwin. (You may recall Galton and his ox-weighing contest in our chapter on sociology.) Galton had no interest in business or economics. Rather, one of his principal investigations was to understand how talent persisted in a family generation after generation, including in the Darwin clan.
Galton was the beneficiary of the work of a Belgian scientist named Lambert Adolphe Jacques Quetelet (1796–1874). Twenty-six years older than Galton, Quetelet had founded the Brussels Observatory and was instrumental in introducing statistical methods to the social sciences. Chief among his contributions was the recognition that normal distributions appeared rooted in social structures and the physical attributes of human beings.
Galton was enthralled with Quetelet’s discovery that “the very curious theoretical law of the deviation from the average—the normal distribution—was ubiquitous, especially in such measurements as body height and chest measurements.”16 Galton was in the process of writing Hereditary Genius, his most important work, which sought to prove that heredity alone was the source of special talents, not education or subsequent professional careers. But Quetelet’s deviation from the average stood in his way. The only way Galton could advance his theory was to explain how the differences within a normal distribution occurred. And the only way he could do this was to figure out how data arranged itself in the first place. In doing so, Galton made what Peter Bernstein calls an “extraordinary discovery” that has had vast influence in the world of investing.
Galton’s first experiments were mechanical. He invented the Quincunx, an unconventional pinball machine shaped like an hourglass with twenty pins stuck in the neck. Demonstrating his idea before the Royal Society, Galton showed that when he dropped balls at random they tended to distribute themselves in compartments at the bottom of the hourglass in a classic Gaussian fashion. Next he studied garden peas, or more specifically, the peas in the pod. He measured and weighed thousands of peas and sent ten specimens to friends throughout the British Isles with specific instructions on how to plant them. When he studied the offspring of the ten different groups, Galton found that their physical attributes were arranged in a normal, Gaussian distribution, just as the Quincunx would have predicted.
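The behavior of the Quincunx is easy to imitate. In the minimal sketch below, each ball meets a series of pins and bounces left or right with equal chance; its final compartment is simply the number of rightward bounces. The number of rows, the number of balls, and the random seed are arbitrary choices for illustration and do not describe Galton's actual device.

```python
import random

def drop_balls(n_balls, n_rows, seed=0):
    """Count how many balls land in each of the n_rows + 1 bottom compartments."""
    rng = random.Random(seed)
    counts = [0] * (n_rows + 1)
    for _ in range(n_balls):
        # Each pin sends the ball left or right with equal chance;
        # the compartment index is the number of rightward bounces.
        rights = sum(1 for _ in range(n_rows) if rng.random() < 0.5)
        counts[rights] += 1
    return counts

# Assumed, illustrative settings: 12 rows of pins and 10,000 balls.
counts = drop_balls(n_balls=10_000, n_rows=12)
for slot, count in enumerate(counts):
    print(f"{slot:2d} {'#' * (count // 50)}")
```

The printed bars pile up in the middle compartments and thin out toward the edges, the familiar bell shape that emerges when many small, independent left-or-right nudges are added together.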
What this experiment revealed, together with others including the study of height variation between parents and their children, became known as regression, or reversion, to the mean. “Reversion,” said Galton, “is the tendency of the ideal filial type to depart from the parent type, reverting to what may be roughly and perhaps fairly described as the average ancestral type.”17 If this process were not at work, explained Galton, then large peas would produce ever-larger peas and small peas would produce ever-smaller peas until we had a world that consisted of nothing but giants and midgets.
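Regression to the mean can be reproduced with a simple textbook model, sketched here under stated assumptions: each offspring's size is the population mean, plus a fraction of the parent's deviation from that mean, plus random noise. The mean of 100, the spread of 10, and the inheritance fraction of 0.5 are invented for illustration and are not Galton's measurements.

```python
import random

def simulate_generation(n, mean=100.0, sd=10.0, r=0.5, seed=0):
    """Return (parent, child) pairs where children inherit a fraction r of the parent's deviation."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        parent = rng.gauss(mean, sd)
        # Noise is scaled so children end up with the same overall spread as parents.
        noise = rng.gauss(0.0, sd * (1 - r ** 2) ** 0.5)
        child = mean + r * (parent - mean) + noise
        pairs.append((parent, child))
    return pairs

def average(xs):
    return sum(xs) / len(xs)

pairs = sorted(simulate_generation(20_000), key=lambda pair: pair[0])
largest = pairs[-2000:]   # the largest tenth of parents
smallest = pairs[:2000]   # the smallest tenth of parents

big_parents = average([p for p, _ in largest])
big_children = average([c for _, c in largest])
small_parents = average([p for p, _ in smallest])
small_children = average([c for _, c in smallest])

print(f"largest parents  {big_parents:.1f} -> their children {big_children:.1f}")
print(f"smallest parents {small_parents:.1f} -> their children {small_children:.1f}")
```

In this model the offspring of the largest parents are still larger than average, but less extreme than their parents, and the same holds in reverse for the smallest. The population as a whole keeps the same spread from one generation to the next, so it never splits into giants and midgets.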
J. P. Morgan was once asked what the stock market would do next. His response: “It will fluctuate.” No one at the time thought this was a backhanded way of describing regression to the mean. But this now-famous reply has become the credo for contrarian investors. They would tell you greed forces stock prices to move higher and higher from intrinsic value, just as fear forces prices lower and lower from intrinsic value, until regression to the mean takes over. Eventually, variance will be corrected in the system.
It is easy to understand why regression to the mean is slavishly followed on Wall Street as a forecasting tool. It is a neat and simple mathematical conjecture that allows us to predict the future. But if Galton’s Law is immutable, why is forecasting so difficult?
The frustration comes from three sources. First, reversion to the mean is not always instantaneous. Overvaluation and undervaluation can persist for a period longer—much longer—than patient rationality might dictate. Second, volatility is so high, with deviations so irregular, that stock prices don’t correct neatly or come to rest easily on top of the mean. Last, and most important, in fluid environments (like markets) the mean itself may be unstable. Yesterday’s normal is not tomorrow’s. The mean may have shifted to a new location.
In physics-based systems, the mean is stable. We can run a physics experiment ten thousand times and get roughly the same mean over and over again. But markets are biological systems. Agents in the system (investors) learn and adapt to an ever-changing landscape. The behavior of investors today, their thoughts, opinions, and reasoning, differs from that of investors of the last generation.
Up until the 1950s, the dividend yield on common stocks was always higher than the yield on government bonds. That’s because the generation that lived through the 1929 stock market crash and Great Depression demanded safety in the form of higher dividends if they were to purchase stocks over bonds. They may not have used the term, but in fact they employed a simple strategy of regression to the mean. When common stock yields approached or dipped below government bond yields, they sold stocks and bought bonds. Galton’s Law reset prices.
As economic prosperity returned in the 1950s, a generation removed from the painful stock market losses of the 1930s embraced common stocks. Had you held steadfast to the idea that common stock yields would revert back to levels higher than bond yields, you would have lost money. And an example from today’s market: In a striking turn of events, the dividend yields on many common stocks in 2011 were higher than the yield on 10-year U.S. Treasury notes. Following the regression approach, you would have sold bonds in favor of stocks. Yet as we move into 2012, bonds have continued to outpace stocks. How long will this economic deviation from the mean last? Or has the mean now shifted?
Most people think the S&P 500 Index is a passively managed basket of stocks that rarely changes. But that is untrue. Each year the selection committee at Standard & Poor’s subtracts companies and adds new ones; about 15 percent of the index, roughly 75 companies, is exchanged. Some companies exit the index because they have been taken over by another company. Others are removed because their declining economic prospects mean they no longer qualify for the largest 500 companies. The companies that are added are typically healthy and vibrant in industries that are having a positive impact on the economy. As such, the S&P 500 Index evolves in a Darwinian manner, populating itself with stronger and stronger companies—survival of the fittest.
Fifty years ago, the S&P 500 Index was dominated by manufacturing, energy, and utility companies. Today it is dominated by technology, health care, and financial companies. Because the return on equity for the latter three is higher than for the first three, the average return on equity of the index is higher today than it was thirty years ago. The mean has shifted. In the words of Thomas Kuhn, there has been a paradigm shift.
Overemphasizing the present without understanding the subtle shifts in composition can lead to perilous and faulty decisions. Although regression to the mean remains an important strategy, it is imperative that investors remember it is not inviolable. Stocks that are thought to be high in price can still move higher; stocks that are low in price can continue to decline. It is important to remain flexible in your thinking. Although reversion to the mean is the most likely outcome in markets, its presence is not sacrosanct.
Gottfried Leibniz (1646–1716), the German philosopher and mathematician, wrote, “Nature has established patterns originating in the return of events, but only for the most part.”18 The mathematics in this chapter is very much about helping investors better understand, so they can better anticipate, the “returning events.” Still, we are left with uncertainties, discontinuities, irregularities, volatilities, and fat tails.
Frank H. Knight (1885–1972) was an American economist who spent his career at the University of Chicago and is credited with founding the Chicago School of Economics. His students included Nobel laureates James Buchanan, Milton Friedman, and George Stigler. Knight is best known as the author of Risk, Uncertainty, and Profit, in which he seeks to distinguish between economic risk and uncertainty. Risk, he said, involves situations in which the outcome is unknown but is governed by probability distributions known at the outset. We may not know exactly what is going to happen, but based on past events and the probabilities assigned, we have a pretty good idea what is likely to happen.
Uncertainty is different. With uncertainty, we don’t know the outcome, but we also don’t know what the underlying distribution looks like, and that’s a bigger problem. Knightian uncertainty is both immeasurable and impossible to calculate. There is only one constant: surprise.
Nassim Nicholas Taleb, in his best-selling book The Black Swan: The Impact of the Highly Improbable (2007), has done much to reconnect investors to Knight’s notion of uncertainty. A “black swan,” as described by Taleb, is an event with three attributes: (1) “it is an outlier, as it lies outside the realm of regular expectations, because nothing in the past can convincingly point to its possibility, (2) it carries an extreme impact, (3) in spite of its outlier status, human nature makes us concoct explanations for its occurrence after the fact, making it explainable and predictable.”19
In The Black Swan, Taleb’s first goal was to help investors better appreciate the disproportionate role of events that are hard to predict, high impact, and rare (a swan born black), events well beyond the normal expectations we have for history, science, technology, and finance. Second, he wanted to draw attention to the fact that the probability of these ultrarare events cannot be computed with scientific methods, owing to the very nature of small probabilities. Lastly, he wanted to bring to light the psychological biases, the blindness, we have to uncertainty and history’s rare events.
According to Taleb, our assumptions about what is going to happen grow out of the bell-shaped curve of predictability, what he calls “Mediocristan.” Instead, the world is shaped by wild, unpredictable, and powerful events he calls “Extremistan.” In Taleb’s world, “history does not crawl, it jumps.”
The attack on Pearl Harbor in 1941 and the 9/11 terrorist attack on the World Trade Center are examples of black swan events. Both were outside the realm of expectation, both had extreme impact, and both were readily explainable after the fact. Unfortunately, the term black swan has become trivialized. The media are quick to attach the moniker to just about anything that is the least bit irregular, including freak snowstorms, earthquakes, and stock market volatility. It would be more appropriate to label these events “gray swans.”
Statisticians have a term for black swan events; it is called a fat tail. William Safire, New York Times columnist, explains the terminology: In a normal distribution, the bell curve is tall and wide in the middle and drops and flattens out at the bottom. The extremities at the bottom, either on the right side or the left, are called tails. When the tails balloon instead of vanishing in a normal distribution, the tails are designated as “fat.”20 Taleb’s black swan event shows up as a fat tail. In statistics, events that deviate from a normal distribution mean by five or more standard deviations are considered extremely rare.
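How rare is "extremely rare," and how much difference does a fat tail make? The sketch below compares the chance of a move of five or more standard deviations, in either direction, under a normal distribution and under a Laplace distribution scaled to the same standard deviation. The Laplace distribution is used here only as a convenient stand-in for a fatter-tailed curve; it is an assumption for illustration, not a claim about how markets are actually distributed.

```python
import math

def normal_two_sided_tail(k):
    """P(|Z| > k) for a standard normal variable."""
    return math.erfc(k / math.sqrt(2))

def laplace_two_sided_tail(k):
    """P(|X| > k) for a Laplace variable scaled to have standard deviation 1."""
    scale = 1 / math.sqrt(2)  # Laplace variance is 2 * scale**2
    return math.exp(-k / scale)

k = 5  # number of standard deviations from the mean
p_normal = normal_two_sided_tail(k)
p_laplace = laplace_two_sided_tail(k)

print(f"normal:  about 1 in {1 / p_normal:,.0f}")
print(f"Laplace: about 1 in {1 / p_laplace:,.0f}")
print(f"ratio:   {p_laplace / p_normal:,.0f}x more likely under the fatter tail")
```

Under the bell curve a five-standard-deviation event is a one-in-well-over-a-million occurrence. Under the fatter-tailed stand-in it is more than a thousand times as likely, even though both curves share the same mean and standard deviation.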
Like the term black swan, fat tail has become a part of the investing nomenclature. We hear constantly that investors cannot suffer another “left-tail” event. Institutional investors are now buying “left-tail” insurance; hedge funds are selling “left-tail” protection. Here again, I believe we are misusing terms. Today, any mild deviation from the norm is quickly labeled as a black swan or a fat tail.
Mathematics, like physics, has a seductive quality about it. Math leads us toward precision and away from ambiguity. Still, there is an uneasy relationship between quantifying the past in order to predict the future and holding subjective degrees of belief about what the future might hold. The economist and Nobel laureate Kenneth Arrow warns us that the mathematically driven risk management approach to investing contains the seeds of its own self-destructive technology. He writes, “Our knowledge of the way things work, in society or in nature, comes trailing clouds of vagueness. Vast ills have followed a belief in certainty.”21
This is not to say probability, variance, regression to the mean, and fat tails are useless. Far from it. These mathematical tools have helped us narrow the cone of uncertainty that exists in markets—but not eliminate it. “The recognition of risk management as a practical art rests on a simple cliché with the most profound consequences: when our world was created, nobody remembered to include certainty,” said Peter Bernstein. “We are never certain; we are always ignorant to some degree. Much of the information we have is either incorrect or incomplete.”22
Gilbert Keith Chesterton, the English literary critic and author of the Father Brown mysteries, captured our dilemma perfectly:
The real trouble with this world of ours is not that it is an unreasonable world, nor even that it is a reasonable one. The commonest kind of trouble is that it is nearly reasonable, but not quite. Life is not an illogicality; yet it is a trap for logicians. It looks just a little more mathematical and regular than it is; its exactitude is obvious, but its inexactitude is hidden; its wildness lies in wait.23