A Statistically Significant Difference in Understanding the Scientific Process

Diane F. Halpern

Trustee Professor of Psychology and Roberts Fellow, Claremont McKenna College

Statistically significant difference—it’s a simple phrase that is essential to science and has become common parlance among educated adults. These three words convey a basic understanding of the scientific process, random events, and the laws of probability. The term appears almost everywhere that research is discussed—in newspaper articles, advertisements for “miracle” diets, research publications, and student laboratory reports, to name just a few of the many diverse contexts. It is a shorthand abstraction for a sequence of events that includes an experiment (or other research design), the specification of a null and alternative hypothesis, (numerical) data collection, statistical analysis, and the probability of an unlikely outcome. That’s a lot of science conveyed in a few words.

It would be difficult to understand the outcome of any research without at least a rudimentary understanding of what is meant by the conclusion that the researchers found or did not find evidence of a “statistically significant difference.” Unfortunately, the old saying that “a little knowledge is a dangerous thing” applies to the partial understanding of this term. One problem is that “significant” has a different meaning when used in everyday speech than when used to report research findings.

Most of the time, the word means that something important happened. For example, if a physician told you that you would feel significantly better following surgery, you would correctly infer that your pain would be reduced by a meaningful amount. But when used in “statistically significant difference,” “significant” means only that results at least this extreme would be unlikely to occur by chance if the null hypothesis were true; the results themselves may or may not be important. Moreover, sometimes the conclusion will be wrong, because the researchers can assert their conclusion only at some level of probability. “Statistically significant difference” is a core concept in research and statistics, but, as anyone who has taught undergraduate statistics or research methods can tell you, it is not an intuitive idea.
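
Both cautions can be made concrete with a small simulation. What follows is only an illustrative sketch, not anything drawn from a real study: the .05 threshold, the group sizes, the known standard deviation of 1, and the function names are all assumptions chosen for the example. The first function runs many experiments in which the null hypothesis is true and counts how often a "significant" difference appears anyway; the second shows that a trivially small difference can be highly significant when the samples are very large.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

ALPHA = 0.05  # conventional significance threshold (an assumption of this sketch)


def z_test_p(sample_a, sample_b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    se = sigma * sqrt(1 / len(sample_a) + 1 / len(sample_b))
    z = (fmean(sample_a) - fmean(sample_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_experiments=2000, n_per_group=30, seed=0):
    """Share of experiments declared significant when there is no real effect."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        # Both groups come from the same distribution: the null is true.
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if z_test_p(a, b) < ALPHA:
            hits += 1
    return hits / n_experiments


def p_for_tiny_effect(observed_diff=0.02, n_per_group=100_000, sigma=1.0):
    """p-value for a very small observed difference with very large groups."""
    se = sigma * sqrt(2 / n_per_group)
    z = observed_diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


rate = false_positive_rate()
tiny_p = p_for_tiny_effect()
print(f"Significant results with no real effect: {rate:.1%}")
print(f"p-value for a 0.02-standard-deviation difference: {tiny_p:.2g}")
```

Under these assumptions, roughly one experiment in twenty reports a difference that is not there, and a difference of two hundredths of a standard deviation passes the significance test easily even though few people would call it important.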

Although “statistically significant difference” communicates a cluster of ideas essential to the scientific process, many pundits would like to see it removed from our vocabulary, because it is frequently misunderstood. Its use underscores the marriage of science and probability theory, and despite its popularity, or perhaps because of it, some experts have called for a divorce, because the term implies something that it should not, and the public is often misled. In fact, experts are often misled as well. Consider this hypothetical example: In a well-done study that compares the effectiveness of two drugs relative to a placebo, it is possible that Drug X is statistically significantly different from a placebo and Drug Y is not, yet Drugs X and Y might not be statistically significantly different from each other. This could result when Drug X is statistically significantly different from a placebo at a probability level of p < .04 but Drug Y would be statistically significantly different from a placebo only at a probability level of p < .06, which is higher than most a priori levels used to test for statistical significance. If reading about this makes your head hurt, you are among the masses who believe they understand this critical shorthand phrase, which is at the heart of the scientific method, but who actually may have only a shallow level of understanding.
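
For readers who would like to see the Drug X and Drug Y example in numbers, here is a minimal sketch. The mean improvements and standard errors are invented purely for illustration, and the comparison uses a simple two-sided z-test on summary statistics, which is only one of several tests a real study might use.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05  # a common a priori significance level (assumed here)


def two_sided_p(mean_a, mean_b, se_a, se_b):
    """Two-sided p-value for the difference between two independent means."""
    z = (mean_a - mean_b) / sqrt(se_a**2 + se_b**2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical summary statistics: mean improvement in each arm of the study,
# with the same standard error for every arm. These numbers are made up.
SE = 1 / sqrt(2)
placebo, drug_x, drug_y = 0.0, 2.06, 1.89

p_x = two_sided_p(drug_x, placebo, SE, SE)   # Drug X versus placebo
p_y = two_sided_p(drug_y, placebo, SE, SE)   # Drug Y versus placebo
p_xy = two_sided_p(drug_x, drug_y, SE, SE)   # Drug X versus Drug Y

for label, p in [("X vs placebo", p_x), ("Y vs placebo", p_y), ("X vs Y", p_xy)]:
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"{label}: p = {p:.3f} ({verdict})")
```

With these invented numbers, Drug X clears the .05 threshold and Drug Y narrowly misses it, yet the direct comparison of the two drugs comes nowhere near significance. One result falling on each side of the threshold is not, by itself, evidence that the two drugs differ.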

A better understanding of the pitfalls associated with this term would go a long way toward improving our cognitive toolkits. If common knowledge of what this term means included the ideas that (a) the findings may not be important, and (b) conclusions based on finding or failing to find statistically significant differences may be wrong, then we would have substantially advanced our general knowledge. When people read or use the term “statistically significant difference,” it is an affirmation of the scientific process, which, for all its limitations and misunderstandings, is a substantial advance over alternative ways of knowing about the world. If we could just add these two key concepts to the meaning of that phrase, we could improve how the general public thinks about science.