AUTHOR’S PREFACE
Science entered the nineteenth century with a firm philosophical vision that has been called the clockwork universe. It was believed that there were a small number of mathematical formulas (like Newton’s laws of motion and Boyle’s law for gases) that could be used to describe reality and to predict future events. All that was needed for such prediction was a complete set of these formulas and a group of associated measurements that were taken with sufficient precision. It took over forty years for popular culture to catch up with this scientific vision.
Typical of this cultural lag is the exchange between Emperor Napoleon and Pierre Simon Laplace in the early years of the nineteenth century. Laplace had written a monumental and definitive book describing how to compute the future positions of planets and comets on the basis of a few observations from Earth. “I find no mention of God in your treatise, M. Laplace,” Napoleon is reported to have said. “I had no need for that hypothesis,” Laplace replied.
Many people were horrified by the concept of a godless clockwork universe running forever without divine intervention, with all future events determined by events of the past. In some sense, the romantic movement of the nineteenth century was a reaction to this cold, exact use of reasoning. In the 1840s, however, a proof of this new science appeared that dazzled the popular imagination. Newton’s mathematical laws were used to predict the existence of another planet, and the planet Neptune was discovered in the place where these laws predicted it. Almost all resistance to the clockwork universe crumbled, and this philosophical stance became an essential part of popular culture.
However, if Laplace did not need God in his formulation, he did need something he called the “error function.” The observations of planets and comets from this earthly platform did not fit the predicted positions exactly. Laplace and his fellow scientists attributed this to errors in the observations, sometimes due to perturbations in the earth’s atmosphere, other times due to human error. Laplace threw all these errors into an extra piece (the error function) he tacked onto his mathematical descriptions. This error function sopped them up and left only the pure laws of motion to predict the true positions of celestial bodies. It was believed that, with more and more precise measurements, the need for an error function would diminish. With the error function to account for slight discrepancies between the observed and the predicted, early-nineteenth-century science was in the grip of philosophical determinism—the belief that everything that happens is determined in advance by the initial conditions of the universe and the mathematical formulas that describe its motions.
By the end of the nineteenth century, the errors had mounted instead of diminishing. As measurements became more and more precise, more and more error cropped up. The clockwork universe lay in shambles. Attempts to discover the laws of biology and sociology had failed. In the older sciences like physics and chemistry, the laws that Newton and Laplace had used were proving to be only rough approximations. Gradually, science began to work with a new paradigm, the statistical model of reality. By the end of the twentieth century, almost all of science had shifted to using statistical models.
Popular culture has failed to keep up with this scientific revolution. Some vague ideas and expressions (like “correlation,” “odds,” and “risk”) have drifted into the popular vocabulary, and most people are aware of the uncertainties associated with some areas of science like medicine and economics, but few nonscientists have any understanding of the profound shift in philosophical view that has occurred. What are these statistical models? How did they come about? What do they mean in real life? Are they true descriptions of reality? This book is an attempt to answer these questions. In the process, we will also look at the lives of some of the men and women who were involved in this revolution.
In dealing with these questions, it is necessary to distinguish among three mathematical ideas: randomness, probability, and statistics. To most people, randomness is just another word for unpredictability. An aphorism from the Talmud conveys this popular notion: “One should not go hunting for buried treasure, because buried treasure is found at random, and, by definition, one cannot go searching for something which is found at random.” But, to the modern scientist, there are many different types of randomness. The concept of a probability distribution (which will be described in chapter 2 of this book) allows us to put constraints on this randomness and gives us a limited ability to predict future but random events. Thus, to the modern scientist, random events are not simply wild, unexpected, and unpredictable. They have a structure that can be described mathematically.
Probability is the current word for a very ancient concept. It appears in Aristotle, who stated that “it is the nature of probability that improbable things will happen.” Initially, it involved a person’s sense of what might be expected. In the seventeenth and eighteenth centuries, a group of mathematicians, among them Fermat, Pascal, de Moivre, and two generations of the Bernoulli family, worked on a mathematical theory of probability, which started with games of chance. They developed some very sophisticated methods for counting equally probable events. De Moivre managed to insert the methods of calculus into these techniques, and the Bernoullis were able to discern some deep fundamental theorems, called the “laws of large numbers.” By the end of the nineteenth century, mathematical probability consisted primarily of such sophisticated tricks but lacked a solid theoretical foundation.
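Although this book contains no code, the Bernoullis’ laws of large numbers are easy to see in action with a short simulation: as the number of fair-coin tosses grows, the observed proportion of heads settles toward one-half. The sketch below is merely illustrative; the seed value and sample sizes are arbitrary choices, not anything from the historical work.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

# Law of large numbers: the proportion of heads in repeated
# fair-coin tosses drifts toward 1/2 as the number of tosses grows.
for n in (100, 10_000, 1_000_000):
    heads = sum(random.randint(0, 1) for _ in range(n))
    print(f"{n:>9} tosses: proportion of heads = {heads / n:.4f}")
```

Any single toss remains unpredictable; only the long-run proportion is constrained, which is exactly the sense in which random events “have a structure that can be described mathematically.”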
In spite of the incomplete nature of probability theory, it proved useful in the developing idea of a statistical distribution. A statistical distribution occurs when we are considering a specific scientific problem. For instance, in 1971, a study from the Harvard School of Public Health was published in the British medical journal Lancet, which examined whether coffee drinking was related to cancer of the lower urinary tract. The study reported on a group of patients, some of whom had cancer of the lower urinary tract and some of whom had other diseases. The authors of the report also collected additional data on these patients, such as age, sex, and family history of cancer. Not everyone who drinks coffee gets urinary tract cancer, and not everyone who gets urinary tract cancer is a coffee drinker, so there are some events that contradict their hypothesis. However, 25 percent of the patients with this cancer habitually drank four or more cups of coffee a day. Only 10 percent of the patients without cancer were such heavy drinkers of coffee. There would seem to be some evidence in favor of the hypothesis.
This collection of data provided the authors with a statistical distribution. Using the tools of mathematical probability, they constructed a theoretical formula for that distribution, called the “probability distribution function,” or just distribution function, which they used to examine the question. It is like Laplace’s error function, but much more complicated. The construction of the theoretical distribution function makes use of probability theory, and it is used to describe what might be expected from future data taken at random from the same population of people.
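The kind of comparison the Lancet authors made can be sketched in a few lines. The counts below are hypothetical, since the preface reports only the percentages (25 percent versus 10 percent), not the study’s sample sizes; the z-statistic simply measures how many standard errors separate the two observed proportions if one assumes no association between coffee and cancer.

```python
import math

# Hypothetical counts chosen to match the quoted percentages;
# the actual 1971 study's sample sizes are not given in this preface.
cancer_n, cancer_heavy = 100, 25    # 25% heavy coffee drinkers
control_n, control_heavy = 100, 10  # 10% heavy coffee drinkers

p1 = cancer_heavy / cancer_n
p2 = control_heavy / control_n

# Pooled proportion under the hypothesis of no association
p = (cancer_heavy + control_heavy) / (cancer_n + control_n)
se = math.sqrt(p * (1 - p) * (1 / cancer_n + 1 / control_n))

# z: how many standard errors separate the two observed proportions
z = (p1 - p2) / se
print(f"z = {z:.2f}")
```

A z-value this far from zero would be unlikely if heavy coffee drinking were unrelated to the cancer, which is the sense in which the data “would seem to be some evidence in favor of the hypothesis.”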
This is not a book about probability and probability theory, which are abstract mathematical concepts. This is a book about the application of some of the theorems of probability to scientific problems, the world of statistical distributions, and distribution functions. Probability theory alone is insufficient to describe statistical methods, and it sometimes happens that statistical methods in science violate some of the theorems of probability. The reader will find probability drifting in and out of the chapters, being used where needed and ignored when not.
Because statistical models of reality are mathematical ones, they can be fully understood only in terms of mathematical formulas and symbols. This book is an attempt to do something a little less ambitious. I have tried to describe the statistical revolution in twentieth-century science in terms of some of the people (many of them still living) who were involved in that revolution. I have only touched on the work they created, trying to give the reader a taste of how their individual discoveries fit into the overall picture.
The reader of this book will not learn enough to engage in the statistical analysis of scientific data. That would require several years of graduate study. But I hope the reader will come away with some understanding of the profound shift in basic philosophy that is represented by the statistical view of science. So, where does the nonmathematician go to understand this revolution in science? I think that a good place to start is with a lady tasting tea….