CODING, COUNTING, CORRELATION, AND CAUSALITY
I have been speaking prose all my life, and didn’t even know it.
—Monsieur Jourdain, in Molière’s The Bourgeois Gentleman
Like Molière’s bourgeois gentleman, who was delighted to discover he had been speaking prose all his life, you may be surprised and pleased to discover that you’ve been making statistical inferences all your life. The goal of the next two chapters is to help you to make better statistical inferences and more of them.
Regardless of whether you think you know how to do statistics, you need to read these chapters.
This is true if either of the following is the case.
a) You don’t know much statistics. If that’s true, these chapters are the most painless way you’ll ever find to gain sufficient knowledge to be able to use statistics in everyday life. And you simply can’t live an optimal life in today’s world without basic knowledge of statistics.
You may feel that statistics is too boring or difficult for you to trudge through. My sympathies. When I was in college, I was desperate to become a psychologist, and that was going to be impossible unless I took a statistics course. But I had little math background, and for the first few weeks I was scared witless by what I thought was a course in mathematics. Eventually I realized that the math in basic inferential statistics doesn’t go much beyond knowing how to extract a square root. (These days all that requires is knowing where the square root button is on your calculator.) Some theorists believe statistics isn’t a branch of mathematics at all but rather a set of empirical generalizations about the world.
To relax you even more, I can tell you that all the statistical principles explained here—and they’re the ones that are most valuable for everyday life—are commonsensical. Or at least on a little reflection they satisfy common sense. You already know how to apply most of the principles in at least some circumstances, so many if not most of the shocks you’ll get in these chapters will be shocks of recognition.
b) You know a fair amount about statistics, or even a lot. If you quickly peruse the statistical terms in the next two chapters, you may feel that you have little to learn from them. I assure you that is not the case. Statistics is normally taught as if to prevent, if at all possible, its use in any domain except IQ tests and agricultural yields. But statistical competence will escape to an unlimited number of everyday life domains if you learn how to frame events in such a way that statistical principles are immediately relevant.
Psychology graduate students at most universities take two or more statistics courses during their first two years. Darrin Lehman, Richard Lempert, and I tested students on their ability to apply statistical principles to everyday life problems, and on their ability to critique scientific claims, at the beginning of their graduate careers and again two years later.1 Some students gain hugely in their ability to apply these principles to everyday life and some gain little.
The students who gain ability to apply statistics to everyday life events tend to be those in the so-called soft areas of psychology—social psychology, developmental psychology, and personality psychology. The low gainers are those in the hard areas of psychology—biopsychology, cognitive science, and neuroscience.
Since they’ve all taken the same statistics courses, why do the soft-area students learn more than the hard-area students? It’s because the soft-area students are constantly applying the statistics they’ve learned to everyday life kinds of events. Which behaviors of mothers are most associated with social confidence in infants? How do we code and measure mothers’ behaviors and how do we assess and measure social confidence? Do people change their evaluations of objects simply by virtue of being given the objects? How do we measure their evaluation of objects? How much more talking in small groups is done by extroverts compared to introverts? How should we code amount of talking: Percent of time each person talks? Number of words? Should we count interruptions separately?
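To make the coding question concrete, here is a minimal sketch of how one might tally the three candidate measures of talking mentioned above. The speakers, the turns, and the function name `code_talk` are all invented for illustration; nothing here comes from an actual study.

```python
from collections import defaultdict

# Hypothetical coded turns from one small-group session:
# (speaker, seconds spoken, word count, whether the turn began as an interruption)
turns = [
    ("Ana", 30, 75, False),
    ("Ben", 10, 20, True),
    ("Ana", 40, 110, False),
    ("Ben", 20, 45, False),
]

def code_talk(turns):
    """Summarize each speaker's talking under three different codings."""
    totals = defaultdict(lambda: {"seconds": 0, "words": 0, "interruptions": 0})
    for speaker, seconds, words, interrupted in turns:
        entry = totals[speaker]
        entry["seconds"] += seconds
        entry["words"] += words
        entry["interruptions"] += int(interrupted)

    all_seconds = sum(entry["seconds"] for entry in totals.values())
    return {
        speaker: {
            "percent_time": 100 * entry["seconds"] / all_seconds,
            "words": entry["words"],
            "interruptions": entry["interruptions"],
        }
        for speaker, entry in totals.items()
    }

summary = code_talk(turns)
for speaker, measures in summary.items():
    print(speaker, measures)
```

The three measures need not agree about who talks most, which is exactly why the choice of coding has to be made deliberately before any statistical rule is applied.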
In short, the soft-area students learn to do two things that this chapter will help you to do: (1) frame everyday life events in such a way that the relevance of statistical principles is obvious and you can make contact with them, and (2) code the events in such a way that approximate versions of statistical rules can be applied to them. The next two chapters do that with anecdotes and realistic problems that can crop up in everyday life. The chapters are intended to help you build statistical heuristics: rules of thumb that will suggest correct answers for an indefinitely large number of everyday life events. These statistical heuristics will shrink the range of events to which you apply only intuitive heuristics, such as the representativeness and availability heuristics. Intuitive heuristics of that kind invade the space of events for which only statistical heuristics are appropriate.
Two years of thinking about rats or brains or memory for nonsense syllables produces little improvement in ability to apply statistical principles to everyday life events. Students in the hard areas of psychology may learn scarcely more than students in chemistry and law. I found that chemistry and law students gain literally nothing over two years in the way of ability to apply statistics to the everyday world.
I also studied medical students, expecting that they would gain very little in ability to think statistically about everyday life problems. I was wrong. The students improved a fair amount. I attended the University of Michigan’s medical school for a few days to find out what might account for the improvement. To my surprise, the medical school does require some training in statistics, in the form of a pamphlet that is handed out early on. Probably much more important than the rather minimal formal training in statistics is the fact that students learn about medical conditions and human behavior in potentially quantifiable ways and reason about them in explicitly statistical terms. “The patient has symptoms A, B, and C and does not have D and E. What is the likelihood that the patient has Disease Y? Disease Z? Disease Z, you say? You’re probably wrong about that. Disease Z is quite rare. If you hear hoofbeats, think horses, not zebras. What tests would you want to order? Tests Q and R, you say? You’re wrong. Those tests are not very statistically reliable; moreover they’re quite expensive. You might order test M or N, which are cheap and statistically reliable, but neither is a very valid predictor of either disease Y or disease Z.”
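The horses-not-zebras advice can be sketched as a small calculation. The numbers below are made up purely for illustration, and the sketch assumes, for simplicity, that the patient has exactly one of the two candidate diseases. The function name `relative_plausibility` is likewise hypothetical. Each disease is weighted by how common it is (its base rate) times how well it accounts for the symptoms.

```python
def relative_plausibility(base_rates, symptom_fit):
    """Weight each candidate by base rate times fit to the symptoms,
    then rescale so the weights sum to 1 (Bayes' rule among the candidates)."""
    weights = {d: base_rates[d] * symptom_fit[d] for d in base_rates}
    total = sum(weights.values())
    return {d: w / total for d, w in weights.items()}

# Hypothetical figures: Disease Y is a hundred times more common than Disease Z,
# even though Z fits this pattern of symptoms somewhat better.
base_rates = {"Y": 0.05, "Z": 0.0005}   # assumed share of patients with each disease
symptom_fit = {"Y": 0.5, "Z": 0.9}      # assumed chance of these symptoms given the disease

posterior = relative_plausibility(base_rates, symptom_fit)
for disease, p in posterior.items():
    print(f"Disease {disease}: {p:.3f}")
```

Under these assumed numbers, the common disease remains by far the better bet despite the rarer one matching the symptoms more closely. The better fit of the zebra does not make up for its rarity.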
Once you have the knack of framing real-world problems as statistical ones and coding their elements in such a way that statistical heuristics can be applied, those principles seem to pop up magically to help you solve a given problem—often without your conscious awareness that you’re applying a rough-and-ready version of a statistical principle.
I’ll introduce in ordinary language some basic statistical principles that have been around for one hundred years or more. Scientists in many fields use these concepts to determine how confident they can be that they’ve characterized an object in the right way, to estimate the strength of relationships between events of various kinds, and to try to determine whether those relationships are causal. As we’ll see, they can also be used to illuminate everyday problems and help us make better decisions at work and at home.