THE MARCH OF THE MARTINGALES
Congestive heart failure is one of the major causes of death in the world. Although it often attacks men and women in the prime of life, it is primarily a disease of old age. Among citizens of the United States above the age of sixty-five, congestive heart failure or its complications account for almost half the deaths. From the standpoint of public health, congestive heart failure is more than a cause of death; it is also a cause of considerable illness among the living. The frequent hospitalizations and the complicated medical procedures used to stabilize patients with congestive heart failure are a major factor in the overall cost of medical services in the country. There is an intense interest in finding effective outpatient care that can reduce the need for hospitalizations and improve the quality of life for these patients.
Unfortunately, congestive heart failure is not a simple disease that can be attributed to a single infective agent or that can be alleviated by blocking a particular enzymatic pathway. The primary symptom of congestive heart failure is the increasing weakness of the heart muscle. The heart becomes less and less able to respond to the subtle commands of hormones that regulate its rate and strength of contraction to meet the changing needs of the body. The heart muscle becomes enlarged and flabby. Fluid builds up in the lungs and ankles. The patient becomes breathless after slight exertion. The reduced amount of blood being pumped through the body means that the brain has a reduced blood supply when the stomach demands blood to digest a meal, and the patient becomes confused or dozes for long periods of time.
To maintain homeostasis, the life forces in the patient adapt to this decrease in heart output. In many patients, the balance of hormones that regulate the heart and other muscles changes to reach a somewhat stable state, where some of the hormonal levels and their responses are “abnormal.” If the physician treats this abnormal balance with drugs like beta-adrenergic agonists or calcium channel blockers, the result may be an improvement in the patient’s condition. Or, by tipping over that barely stable state, the treatment may drive the patient into further deterioration. One of the major causes of death among congestive heart failure patients used to be the buildup of fluid in their lungs (formerly called dropsy). Modern medicine makes use of powerful diuretics, which keep the fluid level down. In the process, however, these diuretics can, themselves, introduce problems in the feedback between hormones generated by the kidney and those generated by the heart in response.
The search continues for effective medical treatments to prolong the life of these patients, reduce the frequency of hospitalizations, and improve the quality of their lives. Since some treatments may have counterproductive effects on some patients, any clinical study of these treatments will have to take specific patient characteristics into account. In this way, the final analysis of data from such a study can identify those patients for whom the treatment is effective and those patients for whom it is countereffective. The statistical analysis of congestive heart failure studies can become exceedingly difficult.
When designing such a study, the first question is what to measure. We could, for instance, measure the average number of hospitalizations of patients on a given treatment. This is a rough overall measure that misses such important aspects as the patients’ ages, their initial health states, and the frequency and length of those hospitalizations. It would be better to consider the time course of each patient’s disease, accounting for the hospitalizations that might occur, how long they last, how long since the previous one, measurements of quality of life between hospitalizations, and adjusting all these outcomes for the patient’s age and other diseases that might be present. This might be the ideal from a medical point of view, but it poses difficult statistical problems. There is no single number to associate with each patient. Instead, the patient’s record is a time course of events, some of them repeated, others of which are measured by multiple measurements. The “measurements” of this experiment are multileveled, and the distribution function, whose parameters must be estimated, will have a multidimensional structure.
The solution to this problem begins with the French mathematician Paul Lévy. Lévy was the son and grandson of mathematicians. Born in 1886, he was identified early on as a gifted student. Following the usual procedure in France at the time, he was quickly moved through a series of special schools for the gifted and won academic honors. He received the Prix du Concours Général in Greek and mathematics while still in his teens; the Prix d’Excellence in mathematics, physics, and chemistry at Lycée Saint Louis; and a first Concours d’Entrée at the Ecole Normale Supérieure and at the Ecole Polytechnique. In 1912, at age twenty-six, he received his docteur des sciences degree, and his thesis was the basis of a major book he wrote on abstract functional analysis. By the time he was thirty-three, Paul Lévy was a full professor at the Ecole Polytechnique and a member of the Académie des Sciences. His work in the abstract theories of analysis made him world famous. In 1919, he was asked by his school to prepare a series of lectures on probability theory, and he began examining that subject in depth for the first time.
Paul Lévy was dissatisfied with probability theory as a collection of sophisticated counting methods. (Andrei Kolmogorov had not yet made his contribution.) Lévy looked for some underlying abstract mathematical concepts that might allow him to unify these many methods. He was struck by de Moivre’s derivation of the normal distribution and by the “folk theorem” among mathematicians that de Moivre’s result should hold for many other situations (what came to be called the “central limit theorem”). We have seen how Lévy (along with Lindeberg in Finland) in the early 1930s finally proved the central limit theorem and determined the conditions necessary for it to hold. In doing so, Lévy started with the formula for the normal distribution and worked backward, asking what were the unique properties of this distribution that would make it arise out of so many situations.
Lévy then approached the problem from the other direction, asking what it was about specific situations that led to the normal distribution. He determined that a simple set of two conditions will guarantee that data tend to be normally distributed. These two conditions are not the only ways the normal distribution can be generated, but Lévy’s proof of the central limit theorem established the more general set of conditions that is always needed. These two conditions were adequate for the situation where we have a sequence of randomly generated numbers, one following on the other:
1. The variability has to be bounded so individual values do not become infinitely large or small.
2. The best estimate of the next number is the value of the last number.
Lévy called such a sequence a “martingale.”
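The second condition can be checked by hand for the simplest example. The sketch below assumes a fair random walk whose steps are +1 or -1 with equal probability (an illustration chosen here, not an example from Lévy), so the size of each step is bounded. For every possible history it computes the exact average of the next value and compares it with the last value. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

STEPS = (-1, 1)  # assumption: fair +/-1 increments, each with probability 1/2


def conditional_mean_of_next(prefix):
    """Exact expected next position, given the increments seen so far."""
    current = sum(prefix)
    # Average the next position over the equally likely next increments.
    return sum(Fraction(current + s) for s in STEPS) / len(STEPS)


def is_martingale(n_steps):
    """True if, for every history of n_steps increments, the best estimate
    of the next value equals the last value (condition 2)."""
    return all(
        conditional_mean_of_next(prefix) == sum(prefix)
        for prefix in product(STEPS, repeat=n_steps)
    )


print(is_martingale(5))
```

Exact fractions are used instead of simulation so that the check does not depend on random sampling.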
Lévy appropriated the word martingale from a gambling term. In gambling, a martingale is a procedure wherein the gambler doubles his bet each time he loses. If he has a 50:50 chance of winning, his expected total loss after the next bet is equal to his loss so far. There are two other meanings to the word. One describes a device used by French farmers to keep a horse’s head down and to keep the animal from rearing. The farmer’s martingale keeps the horse’s head in a position so that it can be moved at random, but the most probable future position is the one the head is held in now. A third definition of the term is a nautical one. A martingale is a heavy piece of wood hung from the boom of a sail to keep the boom from swinging too far from one side to the other. Here, too, the last position of the boom is the best predictor of its next position. The word itself is derived from the inhabitants of the French town of Martique, who were legendary for their stinginess, so the best estimate of the little money they would give next week was the little they gave today.
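The arithmetic of the gambler's doubling scheme can be worked out directly. The sketch below assumes a starting stake of one unit and an even-money payout (both assumptions for illustration), and computes the expected cumulative loss one bet after a run of k straight losses. With a 50:50 chance of winning, that expectation should come out equal to the loss already suffered. The function names are hypothetical.

```python
from fractions import Fraction


def loss_after_losing_streak(k, stake=1):
    """Total lost after k straight losses when the bet doubles each time."""
    return stake * (2**k - 1)


def expected_loss_after_next_bet(k, stake=1, p_win=Fraction(1, 2)):
    """Expected cumulative loss one bet after a k-loss streak."""
    lost = loss_after_losing_streak(k, stake)
    next_bet = stake * 2**k
    loss_if_win = lost - next_bet    # negative: a win leaves him one stake ahead
    loss_if_lose = lost + next_bet
    return p_win * loss_if_win + (1 - p_win) * loss_if_lose


for k in range(5):
    print(k, loss_after_losing_streak(k), expected_loss_after_next_bet(k))
```

Changing `p_win` away from one half breaks the equality, which is why the fair-game assumption matters.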
Thus, the stingy inhabitants of Martique gave their name to a mathematical abstraction in which Paul Lévy developed the stingiest possible characteristics of a sequence of numbers that tend to have a normal distribution. By 1940, the martingale had become an important tool in abstract mathematical theory. Its simple requirements meant that many types of sequences of random numbers could be shown to be martingales. In the 1970s, Odd Aalen of the University of Oslo in Norway realized that the course of patient responses in a clinical trial was a martingale.
Recall the problems arising from a congestive heart failure study. The patient responses tend to be idiosyncratic. There are questions about how to interpret events like hospitalizations when they occur early in the study or later (when the patients have become older). There are questions about how to deal with the frequency of hospitalizations and the length of stay in the hospitals. All of these questions can be answered by viewing the stream of numbers taken over time as a martingale. In particular, Aalen noted, a patient who is hospitalized can be taken out of the analysis and returned to it when released. Multiple hospitalizations can be treated as if each one were a new event. At each point in time, the analyst need know only the number of patients still in the study (or returned to the study) and the number of patients who were initially entered into the study.
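One way to make this bookkeeping concrete is the estimator now usually called the Nelson-Aalen estimator of the cumulative hazard. The text above does not name it, so treat it as an illustrative stand-in rather than a description of Aalen's full method. At each event time it adds the number of events divided by the number of patients then at risk. The sketch below assumes each patient is recorded as one or more at-risk intervals, so a hospitalized patient drops out of the count and re-enters on release. The data and the function name are hypothetical.

```python
from fractions import Fraction


def nelson_aalen(intervals):
    """Cumulative hazard estimate from at-risk intervals.

    Each interval is (start, stop, event): the patient is at risk on
    (start, stop], and `event` is True if a hospitalization occurs at `stop`.
    """
    event_times = sorted({stop for _, stop, event in intervals if event})
    cumulative = Fraction(0)
    estimate = []
    for t in event_times:
        # Patients counted at time t: already entered, not yet removed.
        at_risk = sum(1 for start, stop, _ in intervals if start < t <= stop)
        events = sum(1 for _, stop, event in intervals if event and stop == t)
        cumulative += Fraction(events, at_risk)
        estimate.append((t, cumulative))
    return estimate


# Hypothetical follow-up times in months.
intervals = [
    (0, 3, True), (4, 9, True),  # patient A: hospitalized at 3, back at 4, again at 9
    (0, 6, True),                # patient B: hospitalized at 6
    (0, 12, False),              # patient C: no hospitalization during follow-up
]
for t, h in nelson_aalen(intervals):
    print(t, h)
```

Patient A contributes two separate intervals, so the second hospitalization is handled as a new event, in the spirit of the paragraph above.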
By the early 1980s, Aalen was working with Erik Anderson of the University of Aarhus in Denmark and Richard Gill of the University of Utrecht in the Netherlands, exploiting the insight that he had developed. In the first chapter of this book, I pointed out that scientific and mathematical research is seldom done alone. The abstractions of mathematical statistics are so involved that it is easy to make mistakes. Only by discussion and criticism among colleagues can many of these mistakes be found. The collaboration among these three, Aalen, Anderson, and Gill, provided one of the most fruitful developments of the subject in the final decades of the twentieth century.
The work of Aalen, Anderson, and Gill has been supplemented by that of Richard Olshen and his collaborators at the University of Washington and by Lee-Jen Wei at Harvard University, to produce a wealth of new methods for analyzing the sequence of events that occur in a clinical trial. L. J. Wei, in particular, has exploited the fact that the difference between two martingales is also a martingale, to eliminate the need to estimate many of the parameters of the model. Today, the martingale approach dominates the statistical analysis of long-term clinical trials of chronic disease.
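The fact that the difference between two martingales is again a martingale follows in one line from the linearity of conditional expectation. In the sketch below, the symbols are assumptions introduced for illustration: M_n and N_n are two martingales with respect to the same history, written as F_n.

```latex
\begin{aligned}
\mathbb{E}\left[ M_{n+1} - N_{n+1} \mid \mathcal{F}_n \right]
  &= \mathbb{E}\left[ M_{n+1} \mid \mathcal{F}_n \right]
   - \mathbb{E}\left[ N_{n+1} \mid \mathcal{F}_n \right] \\
  &= M_n - N_n .
\end{aligned}
```

So the best estimate of the next difference is the current difference, which is the defining property of a martingale.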
The legendary stinginess of the inhabitants of Martique was the starting point. A Frenchman, Paul Lévy, had the initial insights. The mathematical martingale passed through many other minds, with contributions from Americans, Russians, Germans, English, Italians, and Indians. A Norwegian, a Dane, and a Dutchman brought it to clinical research. Two Americans, one born in Taiwan, elaborated on their work. A complete listing of the authors of papers and books on this topic, which have emerged since the late 1980s, would fill many pages and involve workers from still other countries. Truly, mathematical statistics has become an international work in the making.