CHAPTER 4
RAKING OVER THE MUCK HEAP
Ronald Aylmer Fisher was twenty-nine years old when he moved with his wife, three children, and sister-in-law into an old farmhouse near the Rothamsted Agricultural Experimental Station, north of London, in the spring of 1919. By many measures, he could have been considered a failure in life. He had grown up as a sickly and lonely child with severe vision impairment. To protect his nearsighted eyes, his doctors had forbidden him to read by artificial light. He took early to mathematics and astronomy: he was fascinated with astronomy by age six, and at ages seven and eight he was attending popular lectures given by the famous astronomer Sir Robert Ball.
Fisher matriculated at Harrow, the renowned public school,6 where he excelled in mathematics. Because he was not allowed to use electric light, his mathematics tutor taught him in the evenings without pencil, paper, or any other visual aids. As a result, Fisher developed a deep geometric sense. In later years, his unusual geometric insights enabled him to solve many difficult problems in mathematical statistics. These insights were so obvious to him that he often failed to make them understandable to others. Other mathematicians would spend months or even years trying to prove something that Fisher claimed was obvious.
He entered Cambridge in 1909 and earned the prestigious title of wrangler in 1912. A student at Cambridge becomes a wrangler by passing a series of extremely difficult mathematics exams, both oral and written. This was something accomplished by no more than one or two classmen a year, and some years there were no wranglers. While still an undergraduate, Fisher published his first scientific paper, in which complicated iterative formulas were interpreted in terms of multidimensional geometric space. In this paper, what had hitherto been an exceedingly complicated method of computation was shown to be a simple consequence of that geometry. He stayed for a year after graduation to study statistical mechanics and quantum theory. By 1913, the statistical revolution had entered physics, and these were two areas where the new ideas were sufficiently well formulated to produce formal course work.
Fisher’s first job was in the statistical office of an investment company, which he left suddenly to do farmwork in Canada; again, he left his work suddenly to return to England at the beginning of the First World War. Although he qualified for a commission in the army, his poor eyesight kept him out of military service. He spent the war years teaching mathematics in a series of public schools, each experience worse than the previous one. He was short-tempered with students who could not understand what was to him obvious.
While still an undergraduate, Fisher had a note published in Biometrika, as mentioned in the previous chapter. As a result, Fisher met Karl Pearson, who introduced him to the difficult problem of determining the statistical distribution of Galton’s correlation coefficient. Fisher thought about the problem, cast it into a geometric formulation, and within a week had a complete answer. He submitted it to Pearson for publication in Biometrika; Pearson could not understand the mathematics and sent the paper to William Sealy Gosset, who also had difficulty understanding it. Pearson knew how to get partial solutions to the problem for specific cases. His method involved monumental amounts of calculation, and he set the workers in his biometrical laboratory to computing those specific answers. In every case, they agreed with Fisher’s more general solution. Still, Pearson did not publish Fisher’s paper. He urged Fisher to make changes and to reduce the generality of the work. Pearson held Fisher off for over a year, while he had his assistants (the “calculators”) compute an extensive table of the distribution for selected values of the parameters. Finally, he published Fisher’s work, but as a footnote to a larger paper in which he and one of his assistants displayed these tables. The result was that, to the casual reader, Fisher’s mathematical manipulations appeared to be a mere appendix to the more important and massive computational work done by Pearson and his coworkers.
Fisher never published another paper in Biometrika, although this was the preeminent journal in the field. In the following years, his papers appeared in the Journal of Agricultural Science, the Quarterly Journal of the Royal Meteorological Society, the Proceedings of the Royal Society of Edinburgh, and the Proceedings of the Society for Psychical Research. None of these are journals that one would normally associate with mathematical research. According to some who knew Fisher, these choices were made because Pearson and his friends effectively froze Fisher out of the mainstream of mathematical and statistical research. According to others, Fisher himself felt rebuffed by Pearson’s cavalier attitude and by his failure to get a similar paper published in the Journal of the Royal Statistical Society (the other prestigious journal in the field); and he proceeded to use other journals, sometimes paying the journal to have his paper appear in it.
Some of these early papers by R. A. Fisher are highly mathematical. The paper on the correlation coefficient, which Pearson finally published, is dense with mathematical notation; a typical page is half or more filled with formulas. There are also papers in which no mathematics appears at all. In one of them, he discusses the ways in which Darwin’s theory of random adaptation is adequate to account for the most sophisticated anatomical structures. In another, he speculates on the evolution of sexual preference. He joined the eugenics movement and, in 1917, published an editorial in the Eugenics Review, calling for a concerted national policy “to increase the birth-rate in the professional classes and among highly skilled artisans” and to discourage births among the lower classes. He argued in this editorial that governmental policies that provided welfare for the poor encouraged them to procreate and pass on their genes to the next generation, whereas the concerns of the middle class for economic security led to postponement of marriages and limited families. The end result, Fisher feared, would be that the nation selected the “poorest” genes for future generations and deselected the “better” genes. The question of eugenics, the movement to improve the human gene stock by selective breeding, would shape much of Fisher’s political thinking. During World War II, he would be falsely accused of being a fascist and squeezed out of all war-related work.
Fisher’s politics contrast with the political views of Karl Pearson, who flirted with socialism and Marxism, whose sympathies lay with the downtrodden, and who loved to challenge the entrenched “better” classes. Whereas Pearson’s political views had little obvious effect on his scientific work, Fisher’s concern over eugenics led him to put a great deal of effort into the mathematics of genetics. Starting with the (at that time) new idea that a specific characteristic of a plant or animal can be attributed to a single gene, which can occur in one of two forms, Fisher moved well beyond the work of Gregor Mendel,7 showing how to estimate the effects of neighboring genes on each other.
The idea that there are genes that govern the nature of life is part of the general statistical revolution of science. We observe characteristics of plants and animals that are called “phenotypes,” but we postulate that these phenotypes are the result of interactions among the genes with different probabilities of interaction. We seek to describe the distribution of phenotypes in terms of these underlying and unseen genes. In the late twentieth century, biologists identified the physical nature of these genes as segments of the hereditary molecule, DNA. We can read these genes to determine what proteins they instruct the cell to make, and we talk about these as real events. But what we observe is still a scatter of possibilities, and the segments of DNA we call genes are imputed from that scatter.
This book deals with the general statistical revolution, and R. A. Fisher played an important role in it. He was proud of his achievements as a geneticist, and about half of his output deals with genetics. We will leave Fisher the geneticist at this point and look at Fisher primarily in terms of his development of general statistical techniques and ideas. The germs of these ideas can be found in his early papers, but they were more fully developed as he worked at Rothamsted during the 1920s and early 1930s.
Although Fisher was ignored by the mathematical community during that time, he published papers and books that greatly influenced the scientists working in agriculture and biology. In 1925, he published the first edition of Statistical Methods for Research Workers. This book went through fourteen English-language editions and appeared in French, German, Italian, Japanese, Spanish, and Russian translations.
Statistical Methods for Research Workers was like no other mathematics book that had appeared before it. Usually, a book of mathematics presents theorems and their proofs, develops abstract ideas and generalizes them, and relates them to other abstract ideas. If there are applications in such books, they occur only after the mathematics has been fully described and proven. Statistical Methods for Research Workers begins with a discussion of how to create a graph from numbers and how to interpret that graph. The first example, occurring on the third page, displays the weight of a baby each week for the first thirteen weeks of life. That baby was Fisher’s firstborn, his son George. The succeeding chapters describe how to analyze data, giving formulas, showing examples, interpreting the results of those examples, and moving on to other formulas. None of the formulas is derived mathematically; they all appear without justification or proof. They are often accompanied by detailed instructions on how to carry out the computations on a mechanical calculator, but no proofs are given.
Despite, or perhaps because of, its lack of theoretical mathematics, the book was rapidly taken up by the scientific community. It met a serious need. It could be handed to a lab technician with minimum mathematical training, and that technician could use it. The scientists who used it took Fisher’s assertions as correct. The mathematicians who reviewed it looked askance at its audacious unproven statements, and many wondered how he had come to these conclusions.
During the Second World War, the Swedish mathematician Harald Cramér, isolated by the war from the international scientific community, spent days and weeks reviewing this book and Fisher’s published papers, filling in the missing steps of proofs and deriving proofs where none were indicated. In 1945, Cramér produced a book entitled Mathematical Methods of Statistics, giving formal proofs for much of what Fisher had written. Cramér had to choose among the many outpourings of this fertile genius, and a great deal that Fisher had written was not included in this book. Cramér’s book was used to teach a generation of new mathematicians and statisticians, and his redaction of Fisher became the standard paradigm. In the 1970s, L. J. Savage of Yale University went back to Fisher’s original papers and discovered how much Cramér had left out. He was amazed to see that Fisher had anticipated later work by others and had solved problems that were still thought to be unsolved at that time.
But all of this was still in the future in 1919, when Fisher abandoned his unsuccessful career as a schoolmaster. He had just finished a monumental work, combining Galton’s correlation coefficient and the gene theory of Mendelian heredity. The paper had been rejected by the Royal Statistical Society and by Pearson at Biometrika. Fisher heard that the Royal Society of Edinburgh was looking for papers to publish in its Transactions, but that the authors were expected to pay for the publication costs. Thus, he paid to have his next great mathematical work published in an obscure journal.
At this point, Karl Pearson, still impressed by young Fisher, came through with an offer to take him on as chief statistician at the Galton Biometrical Laboratory. The correspondence between the two men was cordial, but it was obvious to Fisher that Pearson was strong-willed and dominating. His chief statistician would, at best, be engaged in detailed calculations that were dictated by Pearson.
Fisher had also been contacted by Sir John Russell, head of the Rothamsted Agricultural Experimental Station. The Rothamsted station had been set up by a British maker of fertilizer on an old farm that had once belonged to the original owners of the fertilizer firm. The clay soil was not particularly suited to growing much of anything, but the owners had discovered how to combine crushed stone with acid to produce what was known as Super-Phosphate. The profits from the production of Super-Phosphate were used to establish an experimental station where new artificial fertilizers might be developed. For ninety years, the station ran “experiments,” testing different combinations of mineral salts and different strains of wheat, rye, barley, and potatoes. This had created a huge storehouse of data, exact daily records of rainfall and temperature, weekly records of fertilizer dressings and measures of the soil, and annual records of harvests—all of it preserved in leather-bound notebooks. Most of these experiments had not produced consistent results, but the notebooks had been carefully stored away in the station’s archives.
Sir John looked at this vast collection of data and decided that somebody might be hired to see what was in it, to take a sort of statistical look at these records. He inquired around, and someone recommended Ronald Aylmer Fisher. He offered Fisher a year’s employment at a thousand pounds; he could not offer more and could not guarantee that the job would last past that one year.
Fisher accepted Russell’s offer. He moved his wife, sister-in-law, and three children to the rural area north of London. They rented a farm next door to the experimental station, where his wife and sister-in-law tended a vegetable garden and kept house for him. He put on his boots and walked across the fields to the Rothamsted Agricultural Experimental Station and its ninety years of data, to engage in what he was later to call “raking over the muck heap.”