DOES SMOKING CAUSE CANCER?
In 1958, R. A. Fisher published a paper entitled “Cigarettes, Cancer, and Statistics” in the Centennial Review, and two papers in Nature entitled “Lung Cancer and Cigarettes?” and “Cancer and Smoking.” He then pulled these together, along with an extensive preface, in a pamphlet entitled “Smoking: the Cancer Controversy. Some Attempts to Assess the Evidence.” In these papers, Fisher (who was often photographed smoking a pipe) insisted that the evidence purporting to show that smoking caused lung cancer was badly flawed.
Nor was Fisher alone in his criticisms of the smoking/cancer studies at that time. Joseph Berkson, the head statistician at the Mayo Clinic and a leader among American biostatisticians, questioned the results. Jerzy Neyman had raised objections to the reasoning used in the studies that associated lung cancer and cigarette smoking. Fisher was the most strident in his criticism. As the evidence accumulated over the next few years, and both Berkson and Neyman appeared satisfied that the relationship was proved, Fisher remained adamant, actually accusing some of the leading researchers of doctoring their data. It became an embarrassment to many statisticians. At that time, the cigarette companies were denying the validity of the studies, pointing out that they were only “statistical correlations” and that there was no proof that cigarettes caused lung cancer. On the surface, it appeared that Fisher was agreeing with them. His arguments had the air of a polemic. Here, for instance, is a paragraph from one of his papers:
The need for such scrutiny [of the research that appeared to show the relationship] was brought home to me very forcibly about a year ago in an annotation published by the British Medical Association’s Journal, leading up to the almost shrill conclusion that it was necessary that every device of modern publicity should be employed to bring home to the world at large this terrible danger. When I read that, I wasn’t sure that I liked “all the devices of modern publicity,” and it seemed to me that a moral distinction ought to be drawn at this point … . [It] is not quite so much the work of a good citizen to plant fear in the minds of perhaps a hundred million smokers throughout the world—to plant it with the aid of all the means of modern publicity backed by public money—without knowing for certain that they have anything to be afraid of in the particular habit against which the propaganda is to be directed … .
Unfortunately, in his anger against the use of government propaganda to spread this fear, Fisher did not state his objections very clearly. It became the conventional wisdom that he was playing the role of a crotchety old man who did not want to relinquish his beloved pipe. In 1959, Jerome Cornfield joined with five leading cancer experts from the National Cancer Institute (NCI), the American Cancer Society, and the Sloan-Kettering Institute, to write a thirty-page paper that reviewed all the studies that had been published.
They examined Fisher’s, Berkson’s, and Neyman’s objections, along with objections raised by the Tobacco Institute (on behalf of the tobacco companies). They provided a carefully reasoned account of the controversy and showed that the evidence overwhelmingly supported the conclusion that “smoking is a causative factor in the rapidly increasing incidence of human epidermoid carcinoma of the lung.”
That settled the issue for the entire medical community. The Tobacco Institute continued to pay for full-page advertisements in popular magazines, which questioned the association as being only a statistical correlation, but no articles that questioned this finding appeared after 1960 in any reputable scientific journal. Within four years, Fisher was dead. He could not continue the argument, and no one else took it up.
Was it all a lot of nonsense put forward by an old man who wanted to smoke his pipe in peace, or was there something to Fisher’s objections? I have read Fisher’s smoking and cancer papers, and I have compared them to previous papers he had written on the nature of inductive reasoning and the relationship between statistical models and scientific conclusions. A consistent line of reasoning emerges. Fisher was dealing with a deep philosophical problem—a problem that the English philosopher Bertrand Russell had addressed in the early 1930s, a problem that gnaws at the heart of scientific thought, a problem that most people do not even recognize as a problem: What is meant by “cause and effect”? Answers to that question are far from simple.
Bertrand Russell may be remembered by many readers as a white-haired, grandfatherly looking but world-renowned philosopher, who lent his voice to the criticism of United States involvement in the war in Vietnam in the 1960s. By that time, Lord Russell had received both official and scholarly recognition
as one of the great minds of twentieth-century philosophy. His first major work, written with Alfred North Whitehead—who was many years his senior—dealt with the philosophical foundations of arithmetic and mathematics. Entitled Principia Mathematica, it tried to establish the basic ideas of mathematics, like numbers and addition, on simple axioms dealing with set theory.
One of the essential tools of the Russell-Whitehead work was symbolic logic, a method of inquiry that was one of the great new creations of the early twentieth century. The reader may recall having studied Aristotelian logic with examples like “All men are mortal. Socrates is a man. Therefore, Socrates is mortal.”
Although it has been studied for about 2,500 years, Aristotle’s codification of logic is a relatively useless tool. It belabors the obvious, sets up arbitrary rules as to what is logical and what is not, and fails to mimic the use of logic in mathematical reasoning, the one place where logic has been used to produce new knowledge. While students were dutifully memorizing categorizations of logic based on Socrates’s mortality and the blackness of raven feathers, the mathematicians were discovering new areas of thought, like calculus, with the use of logical methods that did not fit neatly into Aristotle’s categories.
This all changed with the development of set theory and symbolic logic in the final years of the nineteenth century and early years of the twentieth. In its earliest form, the one that Russell and Whitehead exploited, symbolic logic starts with atoms of thought known as “propositions.” Each proposition has a truth value called “T” or “F.” The propositions are combined and compared with symbols for “and,” for “or,” for “not,” and for “equals.” Because each of the atomic propositions has a truth value, any combination of
them has a truth value, which can be computed via a series of algebraic steps. On this simple foundation, Russell, Whitehead, and others were able to build combinations of symbols that described numbers and arithmetic and seemed to describe all types of reasoning.
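To make the mechanical character of this computation concrete, here is a small sketch in Python (obviously not anything Russell or Whitehead wrote, and the compound proposition chosen is arbitrary). It runs through every assignment of truth values to two atomic propositions and computes the truth value of a combination of them:

```python
from itertools import product

# Each atomic proposition is simply True ("T") or False ("F"); compound
# propositions are built with "and," "or," and "not," and their truth
# values follow mechanically from the truth values of the atoms.
for a, b in product([True, False], repeat=2):
    compound = (a and b) or (not a)        # an arbitrary compound proposition
    print(f"A={a!s:<5} B={b!s:<5}  (A and B) or (not A) = {compound}")
```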
All except one! There seemed to be no way to create a set of symbols that meant “A causes B.” The concept of cause and effect eluded the best efforts of the logicians to squeeze it into the rules of symbolic logic. Of course, we all know what “cause and effect” means. If I drop a glass tumbler on the bathroom floor, this act causes it to break. If the master restrains the dog whenever it goes in the wrong direction, this act causes the dog to learn to go in the right direction. If the farmer uses fertilizer on his crops, this act causes the crops to grow bigger. If a woman takes thalidomide during the first trimester of her pregnancy, this act causes her child to be born with attenuated limbs. If another woman suffers pelvic inflammation, it was because of the IUD she used. If there are very few women in senior management positions at the ABC firm, it was caused by prejudice on the part of the managers. If my cousin has a hair-trigger temper, this was caused by the fact that he was born under the sign of Leo.
As Bertrand Russell showed very effectively in the early 1930s, the common notion of cause and effect is an inconsistent one. Different examples of cause and effect cannot be reconciled to be based on the same steps of reasoning. There is, in fact, no such
thing as cause and effect. It is a popular chimera, a vague notion that will not withstand the batterings of pure reason. It contains an inconsistent set of contradictory ideas and is of little or no value in scientific discourse.
In place of cause and effect, Russell proposed the use of a well-defined concept from symbolic logic, called “material implication.” Using the primitive notions of atomic propositions and the connecting symbols for “and,” “or,” “not,” and “equals,” we can produce the concept that proposition A implies proposition B. This is equivalent to the proposition that not B implies not A. This begins to sound a little like the paradox that lies behind Bayes’s theorem (which we looked at in chapter 13). But there are very deep differences, which we will examine in a later chapter.
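Russell's equivalence can be checked by the same sort of brute-force enumeration. The sketch below (illustrative only, not part of Russell's own argument) defines material implication from "not" and "or" and confirms that "A implies B" and "not B implies not A" agree under every assignment of truth values:

```python
from itertools import product

def implies(a, b):
    # Material implication: "A implies B" is defined as "(not A) or B".
    return (not a) or b

# Check the contrapositive equivalence case by case.
for a, b in product([True, False], repeat=2):
    assert implies(a, b) == implies(not b, not a)
print("A implies B is equivalent to (not B) implies (not A) in every case.")
```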
In the late nineteenth century, the German physician Robert Koch proposed a set of postulates needed to prove that a certain infective agent caused a specific disease. These postulates required:
1. Whenever the agent could be cultured, the disease was there.
2. Whenever the disease was not there, the agent could not be cultured.
3. When the agent was removed, the disease went away.
With some redundancy, Koch was stating the conditions for material implication. This may be adequate to determine that a particular species of bacteria caused the infectious disease. When it comes to something like smoking and cancer, however, Koch’s postulates are of little value. Let us consider how well the connection between lung cancer and cigarette smoking fits Koch’s postulates (and hence Russell’s material implication). The agent is a history of cigarette smoking. The disease is human epidermoid carcinoma of the lung. There are cigarette smokers who do not get
lung cancer. Koch’s first postulate is not met. There are some people who get lung cancer who claim that they have not been smokers. If we are to believe their claims, Koch’s second postulate is not met. If we restrict the type of cancer to small oat-cell carcinoma, the number of nonsmokers with this disease appears to be zero, so maybe the second postulate has been met. If we take the agent away, that is, if the patient stops smoking, the disease may still come, and Koch’s third postulate is not met.
If we apply Koch’s postulates (and with them Russell’s material implication), then the only diseases that meet them are acute conditions that have been caused by specific infective agents that can be cultured from the blood or other fluids of the body. This does not hold for heart disease, diabetes, asthma, arthritis, or cancer in other forms.
Let us return to the 1959 paper by Cornfield and five prominent cancer specialists. One by one, they describe all the studies that had been run on the subject. First there was a study by Richard Doll and A. Bradford Hill, published in the British Medical Journal in 1952. Doll and Hill had been alarmed by the rapid rise in the number of patients dying from lung cancer in the United Kingdom. They located several hundred cases of such patients and matched them to similar patients (same age, sex,
socioeconomic status) who had been admitted to the same hospitals at the same time, but who did not have lung cancer. There were almost ten times as many smokers among the lung cancer patients as there were among the others (called the “controls” in such a study). By the end of 1958, there were five other studies of this nature, using patients in Scandinavia, the United States, Canada, France, and Japan. All of them showed the same results: a much greater percentage of smokers among the cancer patients than among the controls.
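The arithmetic of such a comparison is simple. The sketch below uses invented counts (not the actual Doll and Hill figures) to show how the percentage of smokers among the cases is set against the percentage among the controls, and how the comparison is commonly summarized as an odds ratio:

```python
# Hypothetical counts for a retrospective (case-control) comparison.
cases    = {"smoker": 95, "nonsmoker": 5}    # patients with lung cancer
controls = {"smoker": 70, "nonsmoker": 30}   # matched patients without it

p_cases    = cases["smoker"] / sum(cases.values())
p_controls = controls["smoker"] / sum(controls.values())

# The odds ratio is a standard summary of association for this design.
odds_ratio = (cases["smoker"] * controls["nonsmoker"]) / (
              cases["nonsmoker"] * controls["smoker"])

print(f"smokers among cases:    {p_cases:.0%}")
print(f"smokers among controls: {p_controls:.0%}")
print(f"odds ratio:             {odds_ratio:.1f}")
```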
These are called “retrospective studies.” They start with the disease and work backward to see what prior conditions are associated with the disease. They need controls (patients without the disease) to be sure that these prior conditions are associated with the disease and not with some more general characteristic of the patients. These controls can be criticized as not matching the disease cases. One prominent retrospective study was run in Canada on the effects of artificial sweeteners as a cause of bladder cancer. The study seemed to show an association between artificial sweeteners and bladder cancer, but a careful analysis of the data showed that the disease cases were almost all from low socioeconomic classes and the controls were almost all from upper socioeconomic classes. This meant that the disease cases and the controls were not comparable. In the early 1990s, Alvan Feinstein and Ralph Horwitz at the Yale Medical School proposed very rigid rules for running such studies to ensure that the cases and controls match. If we apply the Feinstein-Horwitz rules to these retrospective case-control cancer and smoking studies, all of them fail.
An alternative approach is the prospective one. In such a study, a group of individuals is identified in advance. Their smoking histories are carefully recorded, and they are followed to see what will become of them. By 1958, three independent prospective studies had been run. The first (reported by the same Hill and Doll who had done the first retrospective study) involved 50,000 medical doctors in the United Kingdom. Actually, in the Hill and Doll study,
the subjects were not followed over a long period of time. Instead, the 50,000 doctors were interviewed about their health habits, including their smoking habits, and followed for five years, as many of them began to come down with lung cancer. Now the evidence did more than suggest a relationship. They were able to divide the doctors into groups depending upon how much they smoked. The doctors who smoked more had greater probabilities of having lung cancer. This was a dose response, the key proof of an effect in pharmacology. In the United States, Hammond and Horn ran a prospective study (published in 1958) on 187,783 men, whom they followed for forty-four months. They also found a dose response.
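What a dose response looks like in such data can be sketched with invented figures (the numbers below are not taken from either report): the rate of lung cancer climbs steadily as the amount smoked increases.

```python
# Hypothetical dose-response table: (group, people followed, lung cancer cases).
groups = [
    ("nonsmoker",       40000,  4),
    ("light smoker",    30000, 18),
    ("moderate smoker", 20000, 30),
    ("heavy smoker",    10000, 32),
]

rates = [(name, cases / n) for name, n, cases in groups]
for name, rate in rates:
    print(f"{name:16s} {rate * 100000:6.0f} cases per 100,000")

# A dose response means the rate rises with the dose at every step.
assert all(r1 <= r2 for (_, r1), (_, r2) in zip(rates, rates[1:]))
```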
There are some problems with prospective studies, however. If the study is small, it may be dealing with a particular population. It may not make sense to extrapolate the results to a larger population. For instance, most of these early prospective studies were done with males. At that time, the incidence of lung cancer in females was too low to allow for analysis. A second problem with prospective studies is that it may take a long time for enough of the events (lung cancer) to occur to allow for sensible analysis. Both these problems are dealt with by following a large number of people. The large number gives credence to the suggestion that the results hold for a large population. If the probability of the event is small in a short period of time, following a large number of people for a short period of time will still produce enough events to allow for analysis.
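A back-of-the-envelope calculation with made-up numbers shows why sheer size compensates for a rare event and a short follow-up:

```python
# Expected number of events is roughly (people followed) x (annual risk) x (years).
people_followed = 200_000
annual_risk = 0.0005        # assumed probability of the event per person per year
years = 4

expected_events = people_followed * annual_risk * years
print(f"expected events: {expected_events:.0f}")   # about 400 -- enough to analyze
```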
The second Doll and Hill study used medical doctors because it was believed that their recollection of smoking habits could be relied upon and because their belonging to the medical profession made it virtually certain that all the lung cancers that occurred in the group would be recorded. Can we extrapolate the results from educated, professional doctors to what would happen to a dockhand with less than a high school education? Hammond and Horn used almost 200,000 men in hopes that their sample would be more representative—at the risk of getting less than accurate
information. At this point, the reader may recall the objection to Karl Pearson’s samples of data because they were opportunity samples. Weren’t these also opportunity samples?
To answer this objection, in 1958, H. F. Dorn studied the death certificates from three major cities and followed up with interviews of the surviving families. This was a study of all deaths, so it could not be considered an opportunity sample. Again, the relationship between smoking and lung cancer was overwhelming. However, the argument could be made that the interviews with surviving family members were flawed. By the time this study was run, the relationship between lung cancer and smoking was widely known. It was possible that surviving relatives of patients who had died of lung cancer would be more likely to remember that the patient had been a smoker than would relatives of patients who had died from other diseases.
Thus it is with most epidemiological studies. Each study is flawed in some way. For each study, a critic can dream up possibilities that might lead to bias in the conclusions. Cornfield and his coauthors assembled thirty epidemiological studies run before 1958 in different countries and concentrating on different populations. As they point out, it is the overwhelming consistency across these many studies, studies of all kinds, that lends credence to the final conclusion. One by one, they discuss each of the objections. They consider Berkson’s objections and show how one study or another can be used to address them. Neyman suggested that the initial retrospective studies could be biased if the patients who smoked lived longer than the nonsmokers and if lung cancer was a disease of old age. Cornfield et al. produced data about the patients in the studies to show that this was not a sensible description of those patients.
They addressed in two ways the question of whether the opportunity samples were nonrepresentative. They showed the range of patient populations involved, increasing the likelihood that the conclusions held across populations. They also pointed out that, if
the cause and effect relationship holds as a result of fundamental biology, then the patients’ different socioeconomic and racial backgrounds would be irrelevant. They reviewed toxicology studies, which showed carcinogenic effects of tobacco smoke on lab animals and tissue cultures.
This paper by Cornfield et al. is a classic example of how cause is proved in epidemiological studies. Although each study is flawed, the evidence keeps mounting, as one study after another reinforces the same conclusions.
A contrast to this can be seen in the attempts to indict Agent Orange as a cause for health problems Vietnam War veterans have suffered in later life. The putative agents of cause are contaminants in the herbicide that was used. Almost all studies have dealt with the same small number of men exposed in different ways to the herbicide. Studies in other populations did not support these findings. In the 1970s, an accident in a chemical factory in northern Italy resulted in a large number of people being exposed to much higher levels of the contaminant, with no long-term effects. Studies of workers on New Zealand turf farms who were exposed to the herbicide suggested an increase in a specific type of birth defect, but the workers were mostly Maoris, who have a genetically related tendency toward that particular birth defect.
Another difference between the smoking and the Agent Orange studies is that the putative consequences of smoking are highly specific (epidermoid carcinoma of the lung). The events that were supposedly caused by Agent Orange exposure consisted of a wide range of neurological and reproductive problems. This runs contrary to the usual finding in toxicology that specific agents cause specific types of lesions. For the Agent Orange studies, there is no indication of a dose response, but there are insufficient data
to determine the different doses to which individuals have been exposed. The result is a muddied picture, where objections like those of Berkson, Neyman, and Fisher have to go unaddressed.
With the analysis of epidemiological studies, we have moved a long way from the highly specific exactitude of Bertrand Russell and material implication. Cause and effect are now imputed from many flawed investigations of human populations. The relationships are statistical, where changes in the parameters of distributions appear to be related to specific causes. Reasonable observers are expected to integrate a large number of flawed studies and see the underlying common threads.
What if the studies have been selected? What if all that is available to the observer is a carefully selected subset of the studies that were actually run? What if, for every positive study that is published, a negative study was suppressed? After all, not every study gets published. Some never get written up because the investigators are unable or unwilling to complete the work. Some are rejected by journal editors because they do not meet the standards of that journal. All too often, especially when there is some controversy associated with the subject, editors are tempted to publish that which is acceptable to the scientific community and reject that which is not acceptable.
This was one of Fisher’s accusations. He claimed that the initial work by Hill and Doll had been censored. He tried for years to have the authors release detailed data to back up their conclusions. They had only published summaries, but Fisher suggested that these summaries had hidden inconsistencies that were actually in the data. He pointed out that, in the first Hill and Doll study, the authors had asked if the patients who smoked inhaled when they smoked. When the data are organized in terms of “inhalers” and “noninhalers,” the noninhalers are the ones with an excess of lung
cancer. The inhalers appear to have less lung cancer. Hill and Doll claimed that this was probably due to a failure on the part of the respondents to understand the question. Fisher scoffed at this and asked why they didn’t publicize the real conclusions of their study: that smoking was bad for you, but if you had to smoke, it was better to inhale than not to inhale.
To Fisher’s disgust, Hill and Doll left that question out of their investigation when they ran their prospective study on medical doctors. What else was being carefully selected? Fisher wanted to know. He was appalled that the power and money of government was going to be used to throw fear into the populace. He considered this no different from the use of propaganda by the Nazis to drive public opinion.
Fisher had also been influenced by Bertrand Russell’s discussion of cause and effect. He recognized that material implication was inadequate to describe most scientific conclusions. He wrote extensively on the nature of inductive reasoning and proposed that it was possible to conclude something in general about life on the basis of specific investigations, provided that the principles of good experimental design were followed. He showed that the method of experimentation, where treatments were randomly assigned to subjects, provided a logically and mathematically solid basis for inductive inference.
The epidemiologists were using the tools Fisher developed for the analysis of designed experiments, such as his methods of estimation and tests of significance. They were applying these tools to opportunity samples, where the assignment of treatment came not from some random mechanism external to the study but was an intrinsic part of the study itself. Suppose, he mused, that there was something genetic that caused some people to be smokers and others not to smoke. Suppose, further, that this same genetic disposition was involved in the occurrence of lung cancer. It was well known that many cancers have a familial component. Suppose, he said, this relationship between smoking and lung cancer arose because each was due to the same event, the same genetic disposition. To prove his case, he assembled data on identical twins and showed that there was a strong familial tendency for both twins to be either smokers or nonsmokers. He challenged the others to show that lung cancer was not similarly genetically influenced.
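Fisher's alternative can be illustrated with a toy simulation (the probabilities are invented, and this is not a calculation Fisher published): a hidden factor that inclines a person both toward smoking and toward lung cancer produces exactly the kind of association the epidemiologists observed, even though smoking plays no causal role at all.

```python
import random
random.seed(0)

# Toy model of Fisher's hypothesis: a hidden "genotype" makes a person
# both more likely to smoke and more likely to develop lung cancer.
# Smoking never enters the cancer mechanism. All probabilities are invented.
def person():
    genotype = random.random() < 0.3
    smokes = random.random() < (0.8 if genotype else 0.3)
    cancer = random.random() < (0.05 if genotype else 0.005)  # depends only on genotype
    return smokes, cancer

population = [person() for _ in range(200_000)]
smokers    = [p for p in population if p[0]]
nonsmokers = [p for p in population if not p[0]]

def rate(group):
    return sum(cancer for _, cancer in group) / len(group)

# Smokers show a markedly higher cancer rate, because the genotype lies
# behind both habits -- an association with no cause and effect in it.
print(f"cancer rate among smokers:    {rate(smokers):.4f}")
print(f"cancer rate among nonsmokers: {rate(nonsmokers):.4f}")
```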
On one side there was R. A. Fisher, the irascible genius who put the whole theory of statistical distributions on a firm mathematical footing, fighting one final battle. On the other side was Jerry Cornfield, the man whose only formal education was a bachelor’s degree in history, who had learned his statistics on his own, who was too busy creating important new statistics to pursue a higher degree. You cannot prove anything without a randomized experimental design, said Fisher. Some things do not lend themselves to such designs, but the accumulation of evidence should prove the case, said Cornfield. Both men are now deceased, but their intellectual descendants are still with us. These arguments resound in the courts, where attempts are made to prove discrimination on the basis of outcomes. They play a role in attempts to identify the harmful results of human activity on the biosphere. They are there whenever great issues of life and death arise in medicine. Cause and effect are not so simple to prove, after all.