Statistics can be helpful, and sometimes essential, in order to characterize something accurately. Statistics are equally valuable for determining whether there’s a relationship between one thing and another. As you might guess, being sure about whether a relationship exists or not can be even more problematic than characterizing a given thing accurately.
You have to characterize things of type 1 correctly as well as things of type 2. Then you have to count how frequently type 1 things occur with type 2 things, how frequently type 1 things do not occur with type 2 things, and so on. If the variables are continuous, the job gets still harder: you have to figure out whether greater values for type 1 things are associated with greater values for type 2 things. When stated in this abstract way, it seems pretty clear that we’re going to have big problems in estimating the degree of association between variables. And in fact, our problems with detecting covariation (or correlation) are really very severe. And the consequences of being off base with our estimate can be very serious.
Correlation
Have a look at Table 3 below. Is symptom X associated with disease A? Put another way, is symptom X diagnostic of disease A?
The way to read Table 3 is to note that twenty of the people who have disease A have symptom X and eighty of the people who have disease A don’t have symptom X; ten of the people who don’t have the disease have the symptom and forty of them don’t have it. On the face of it, this would seem to be the simplest covariation detection task you could present to people. The data are dichotomous (either/or). You don’t have to collect information, or code the data points and assign numerical values to them, or remember anything about the data. You don’t have any prior beliefs that might make you predisposed to see one pattern versus another; and the data are set up for you in summary form. How do people perform on this very basic covariation detection task?
Pretty badly, actually.
A particularly common failing is to rely exclusively on the “Present/Yes” cell of the table. “Yes, the symptom is associated with the disease. Some of the people with symptom X have the disease.” This tendency is an example of the confirmation bias—a tendency to look for evidence that would confirm a hypothesis while failing to look for evidence that might disconfirm it.
Other people who look at the table pay attention only to two cells. Some of these conclude that the symptom is associated with the disease “because more people who have the disease have the symptom than do people who do not have the disease.” Others conclude that the symptom is not associated with the disease “because more people with the disease don’t have the symptom than do have it.”
Without having been exposed to some statistics, very few people understand that you have to pay attention to all four cells in order to be able to answer the simple question about association.
You have to compute the ratio comparing the number of people who have the disease and also have the symptom with the number of people who have the disease and don’t have the symptom. You then compute the ratio comparing the number of people who don’t have the disease but do have the symptom with the number of people who don’t have the disease and don’t have the symptom. Since the two ratios are the same, you know that the symptom is not associated with the disease any more than it is associated with not having the disease.
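The four-cell comparison just described can be sketched in a few lines of code. The counts are the ones given above for Table 3 (the table itself is not reproduced here):

```python
# The four cells of the fourfold table, as described in the text:
disease_with_symptom = 20
disease_without_symptom = 80
no_disease_with_symptom = 10
no_disease_without_symptom = 40

# Ratio of symptom to no-symptom among people WITH the disease.
ratio_diseased = disease_with_symptom / disease_without_symptom        # 20/80 = 0.25
# The same ratio among people WITHOUT the disease.
ratio_healthy = no_disease_with_symptom / no_disease_without_symptom   # 10/40 = 0.25

# The two ratios are identical, so the symptom tells you nothing
# about whether a person has the disease.
print(ratio_diseased, ratio_healthy)
```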
You might be alarmed to know that most people, including doctors and nurses whose daily lives are concerned with treatment of disease, usually fail to get the right answer when examining tables like Table 3.[1] For example, you can show them a table indicating how many people with a disease got better with a particular treatment and how many didn’t, and how many people with the disease who didn’t get the treatment got better and how many didn’t. Doctors will sometimes assume that a particular treatment helps people because more people with the treatment got better than didn’t. Without knowing the ratio of the untreated who got better to the untreated who didn’t, no conclusion whatsoever is possible. A table like this, incidentally, is sometimes called a 2 × 2 table and sometimes a fourfold table.
There’s a neat little statistic called chi square that tells us the probability that a difference between the two proportions as large as the one observed could have arisen by chance. We say that the relationship is real if the difference between the two proportions is statistically significant.
A typical criterion for saying that an association is significant is whether the test (chi square or any other statistical test) shows that the observed degree of association could have happened by chance no more than five times in one hundred. If so, we say it’s significant at the .05 level. Significance tests can be applied not only to dichotomous (either/or) data but also to continuous data.
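For a fourfold table the chi-square statistic has a simple closed form, and for one degree of freedom the p-value can be computed from the complementary error function in the standard library. A minimal sketch, using the counts given above for Table 3 (one illustrative way to run the test, not the only one):

```python
from math import erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Chi-square statistic and p-value (df = 1) for a fourfold table.

    a, b: symptom present / absent among those with the disease
    c, d: symptom present / absent among those without the disease
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# For the Table 3 counts the statistic is exactly zero (p = 1):
# the data give no reason at all to think the symptom and the
# disease are associated.
print(chi_square_2x2(20, 80, 10, 40))  # (0.0, 1.0)
```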
When the variables are continuous and we want to know how closely they’re associated with one another, we apply the statistical technique of correlation. Two variables that are obviously correlated are height and weight. Not perfectly correlated of course, because we can think of many examples of short people who are relatively heavy and tall people who are relatively light.
A variety of different statistical procedures can tell us just how close the association between two variables is. A frequently used technique for examining the degree of association of continuous variables is one called the Pearson product moment correlation. A zero correlation means there is no association at all between two variables. A correlation of +1 means there is a perfect positive association between two variables: as values on variable 1 go up, values on variable 2 go up to an exactly corresponding degree. A correlation of −1 means there is a perfect negative association.
Figure 3. Scatterplots and correlations.
Figure 3 shows visually, on so-called scatterplots, how strong a correlation of a given magnitude is. The individual graphs are called scatterplots because they show the degree of scatter away from a straight-line, perfect relationship.
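The Pearson coefficient described above is straightforward to compute directly. A minimal sketch; the height and weight numbers below are made up purely for illustration:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# A perfect positive association gives +1, a perfect negative one -1:
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 4))   # 1.0
print(round(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]), 4))   # -1.0

# Imperfectly related variables (e.g., heights and weights) land in between:
heights = [60, 62, 64, 66, 68, 70, 72]
weights = [120, 155, 140, 160, 150, 180, 200]
print(round(pearson_r(heights, weights), 2))
```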
A correlation of .3 is barely detectable visually, but it can be very important practically. A correlation of .3 corresponds to the predictability of income from IQ,[2] and of graduate school performance from college grades.[3] The same degree of predictability holds for the extent to which incipient cardiovascular illness is predicted by the degree to which an individual is underweight, average, or overweight.
A correlation of .3 is no joke: it means that if someone is at the 84th percentile (one SD above the mean) on variable A, the person would be expected to be at about the 62nd percentile (.3 SD above the mean) on variable B. That’s a lot better predictability for variable B than you have when you don’t know anything about variable A. In that case you have to guess the 50th percentile for everybody—the mean of the distribution of variable B. That could easily be the difference between having your business thrive or go belly-up.
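The arithmetic behind that claim can be checked with the normal distribution, expressible via the error function in the standard library. The regression prediction for someone one SD above the mean on A is r × 1 = 0.3 SD above the mean on B:

```python
from math import erf, sqrt

def percentile(z):
    """Percentile rank of a z-score under the normal curve."""
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))

r = 0.3
z_a = 1.0        # one SD above the mean on variable A
z_b = r * z_a    # regression prediction for variable B: 0.3 SD above the mean

print(round(percentile(z_a)))  # 84
print(round(percentile(z_b)))  # 62
```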
A correlation of .5 corresponds to the degree of association between IQ and performance on the average job. (The correlation is higher for demanding jobs and lower for jobs that are not very demanding.)
A correlation of .7 corresponds to the association between height and weight—substantial but still not perfect. A correlation of .8 corresponds to the degree of association you find between scores on the math portion of the Scholastic Aptitude Test (SAT) at one testing and scores on that test a year later—quite high, but still leaving plenty of room for differences between the two scores.
Correlation Does Not Establish Causality
Correlation coefficients are one step in assessing causal relations. If there is no correlation between variable A and variable B, there (probably) is no causal relation between A and B. (An exception would be when there is a third variable C that masks the correlation between A and B when there is in fact a causal relation between A and B.) If there is a correlation between variable A and variable B, this doesn’t establish that variation in A causes variation in B. It might be that A causes B or B causes A, and the association could also be due to the fact that both A and B are associated with some third variable C, and there is no causal connection between A and B at all.
Pretty much everyone with a high school education recognizes the truth of these assertions—in the abstract. But often a given correlation is so consistent with plausible ideas about causation that we tacitly accept that the correlation establishes that there is a causal relation. We are so good at generating causal hypotheses that we do so almost automatically. Causal inferences are often irresistible. If I tell you that people who eat more chocolate have more acne, it’s hard to resist the assumption that something about chocolate causes acne. (It doesn’t, so far as is known.) If I tell you that couples who make elaborate wedding preparations have longer-lasting marriages, it’s natural to start wondering just what it is about elaborate weddings that makes for longer marriages. In fact, a recent article in a distinguished newspaper reported on the correlation and then went on to speculate about why serious work on planning weddings would make the marriage last longer. But if you think about the correlation long enough, you’ll realize that elaborate wedding preparations aren’t a random event; rather, they’re obviously going to be more likely for people with more friends, more time to be together, more money, and goodness knows what else. Any of those things, or more likely all of them, could be operating to make marriages more lasting. To pull one fact out of that tangled web and start to speculate on its causal role makes little sense.
Consider the associations in Box 1, all of which are real. You’ll see that for some the implied causal link seems highly plausible and for others the implied link is highly implausible. Whether you think the implied causal link is plausible or not, see whether you can come up with explanations of the following types: (1) A causes B, (2) B causes A, or (3) something correlated with both A and B is causal and there is no causal link at all between A and B. See some possible answers in Box 2.
Box 1. Thinking About Correlations: What Causal Relationships Could Be Going On?
1. Time magazine reported that attempts by parents to control the portions their children eat will cause the children to become overweight. If the parents of overweight children stop trying to control their portions, will the children get thinner?
2. Countries with higher average IQs have higher average wealth measured as per capita gross domestic product (GDP). Does being smarter make a country richer?
3. People who attend church have lower mortality rates than those who don’t.[4] Does this mean that belief in God makes people live longer?
4. People who have dogs are less likely to be depressed. If you give a dog to a depressed person, will the person get happier?
5. States with abstinence-only sex education have higher homicide rates. Does abstinence-only sex education cause aggression? If you give more informative sex education to students in those states, will the homicide rate go down?
6. Intelligent men have better sperm—more sperm and more mobile sperm.[5] Does this indicate that attending college, which makes people smarter, also improves sperm quality?
7. People who smoke marijuana are subsequently more likely to use cocaine than people who don’t smoke marijuana. Does marijuana use cause cocaine use?
8. Ice cream consumption and polio were almost perfectly correlated in the 1950s, when polio was a serious threat. Would it have been a good public health move to outlaw ice cream?
Box 2. Possible Answers to Questions About Correlations in Box 1
1. It could be that parents try to control the portions that children eat if they’re overweight. If so, the direction of causation runs opposite from Time magazine’s hypothesis. You don’t make the child obese by trying to control portions; you try to control portions if the child is obese. It could also be the case that less happy, more stressful families have more controlling parents and their children are more likely to be overweight, but there’s no causal connection between food policing behavior on the part of the parent and weight of the child.
2. It could be that richer countries have better education systems and hence produce people who get higher IQ scores. In that case, it’s wealth that causes intelligence rather than the other way around. It’s also possible that some third factor, such as physical health, influences both variables. (All three of these causal relations are real, incidentally.)
3. It could be that healthier people engage in more social activities of all kinds, including going to church. If so, the direction of causation runs opposite to the one implied: One reason people go to church is that they’re healthy, but going to church doesn’t make them any healthier. Or it could be that an interest in social activities such as going to church causes people both to participate in more social activities and to be healthier.
4. It could be that people who are depressed are less likely to do anything fun, such as buying a pet. If so, the direction of causation is opposite to the one implied: depression makes you less likely to get a pet. (But in fact, giving a pet to a depressed person does improve the person’s mood, so pets can indeed be good for your mental health; it’s just that the correlation between the two doesn’t prove that.)
5. It could be that states that are poorer are more likely to have higher homicide rates and states that are poorer are more likely to have abstinence-only sex education. Indeed, both are true. So there may be no causal connection at all between sex education and homicide. Rather, poverty or low educational levels or something associated with them may be causally linked to both.
6. It could be that greater physical health helps people to be smarter and helps sperm to be of better quality. Or some other factor could be associated with both intelligence and sperm quality, such as drug or alcohol use. So there might be no causal connection between intelligence and sperm quality.
7. It could be that people who take any kind of drug are more sensation seeking than other people and therefore engage in many kinds of stimulating behavior that are against the law. Smoking marijuana may not cause cocaine use, and cocaine use may not cause marijuana use. Rather, the third factor of sensation seeking may influence both.
8. Ice cream consumption and polio were correlated highly in the 1950s because polio is easily communicated in swimming pools. And both ice cream and swimming become more common as the weather gets warmer.
Illusory Correlation
I can’t stress enough how important it is to actually collect data in a systematic way and then carry out calculations in order to determine how strong the association is between two variables. Just living in the world and noticing things can leave you with a hopelessly wrong view about the association between two events. Illusory correlation is a real risk.
If you think it’s plausible that there’s a positive relation between two variables (the more A, the more B), your casual observations are likely to convince you that you’re right. This is often the case not only when there is in fact no positive correlation between the variables but even when there is actually a negative correlation. Noticing and remembering the cases that support your hypothesis more than ones that don’t is another aspect of confirmation bias.
Conversely, if a link is implausible, you’re not likely to see it even if the link is fairly strong. Psychologists have placed pigeons in an apparatus with a food pellet dispenser and a disk on the floor that can be lit up. The pellet dispenser will deliver a pellet if the disk is lit and the pigeon does not peck at it. If the pigeon does peck at it, there will be no food pellet. A pigeon will starve to death before it discovers that not pecking at a lighted disk will result in its getting food. Pigeons haven’t made it this far by finding it plausible that not pecking something is likely to result in getting food.
People can find it as hard as pigeons to overcome presuppositions.
Experimenters have presented clinical psychologists with a series of Rorschach inkblot responses allegedly made by different patients, with the patients’ symptoms printed along with the patients’ alleged responses.[6] One card might show a patient who (a) saw genitals in the inkblot and (b) had problems with sexual adjustment. After perusing the set, psychologists are quite likely to report that patients who see genitals are likely to have problems with sexual adjustment, even when the data are rigged to indicate that such patients are less likely to have problems with sexual adjustment. It’s just too plausible that sexual adjustment problems might be associated with being hypervigilant about genitals, and the positive instances stand out.
When you tell the psychologists that they’re mistaken and that the series shows a negative association between seeing genitals and having problems with sexual adjustment—that patients who see genitals are actually less likely to have problems with sexual adjustment—the psychologists may scoff and tell you that in their clinical experience it is the case that people with sexual adjustment problems are particularly likely to see genitals in Rorschach blots. No, it isn’t. When you actually collect the data you find no such association.
In fact, virtually no response to any Rorschach card tells you anything at all about a person.[7] Hundreds of thousands of hours and scores of millions of dollars were spent using the test before anyone bothered to see whether there was any actual association between responses and symptoms. And then for decades after the lack of association was established, the illusion of correlation kept the test in circulation, and more time and money were wasted.
I don’t mean to pick on psychologists and psychiatrists with these examples. Undergraduates make exactly the same errors that clinicians do in experiments on illusory correlation using the Rorschach, reporting that seeing genitals goes with sexual problems, seeing funny-looking eyes goes with paranoia, seeing a weapon goes with hostility.
These findings can be summarized by saying that if a person (or other organism) is prepared to see a given relationship, that relationship is likely to be seen even when it’s not in the data.[8] If you’re counterprepared to see a given relationship, you’re likely to fail to see it even when it’s there. Cats will learn to pull a string to get out of a box; they will not learn that licking themselves will get them out of a box. Dogs can readily learn to go to the right to get food rather than to the left if a speaker sounds a tone on the right; only with great difficulty will dogs learn which direction to go when a higher-pitched tone indicates food is on the right and a lower tone indicates food is on the left. It just seems more likely that spatial cues are related to spatial events than it is that pitch cues are related to spatial events.
Our old friend the representativeness heuristic generates infinite numbers of prepared relationships. Genitals are representative of anything having to do with sex. Eyes are representative of suspicion. Weapons are representative of hostility. The availability heuristic also does a good job of creating prepared relationships. Films and cartoons show people with funny-looking eyes (squinting, rolling, etc.) when they’re suspicious.
What if a person is neither prepared nor counterprepared to see a relationship?
What would happen if, for example, a person listened to a bunch of people say the first letter of their names and then sing a musical note—and was then asked whether there was a relationship between the position of the letter in the alphabet and the duration of the musical note?
How high does the correlation between such arbitrarily paired events have to be before people can reliably detect it?
The answer is that the correlation has to be about .6—slightly higher than the .5 correlation shown in Figure 3.[9] And that’s when the data come to the person all at once and the person is doing his level best to see what the relationship is. As a practical matter, this finding means that you can’t rely on your belief that there’s a correlation between two variables unless the association is quite strong—higher than many of the correlations on which we base the choices in our daily lives. You’ve got to be systematic to get it right: observe, record, and calculate or you’re just blowing smoke.
An Exception
There’s one important exception to the rule that covariation is very difficult to detect accurately. When two events—even arbitrary ones—occur close together in time, the covariation will usually be noticed. If you switch on a light just before delivering an electric shock to a rat, the rat will quickly learn the association between the light and the shock. But even for this sort of highly dramatic pairing of events, there is a sharp decline in ability to learn as a function of the time interval between two events. Animals—and humans—don’t learn associations between arbitrarily paired events if you go much beyond a couple of minutes.
Reliability and Validity
Many years ago, a friend of mine and his wife had been trying to have a baby. After several years without success, they finally went to a fertility specialist. The news was not good. My friend’s sperm count was “too low to result in impregnation by normal means.” My friend asked the physician how reliable the test was. “Oh, it’s very reliable,” said the physician. What he meant was: the test doesn’t make mistakes—it gives you the true score. He was using the term “reliable” in its lay sense of accuracy.
Reliability is the degree to which measurement of a particular variable gives the same value across occasions, or the degree to which one type of measure of a variable gives the same value as another type of measure of that variable.
Measures of height have a reliability (correlation across occasions) of virtually 1. IQ measured across occasions separated by a couple of weeks is around .9. IQ as measured by two different tests typically indicates reliability of more than .8. Two different dentists will agree about extent of decay in a tooth with a reliability of less than .8.[10] This means that not all that infrequently your tooth gets filled by Dentist Smith whereas Dentist Jones would have let it be. For that matter, any given dentist’s judgments don’t correlate perfectly with her own judgments on different occasions. Dr. Jones may fill a tooth on Friday that she would have left undrilled on Tuesday.
How about the reliability of sperm counts? Reliability for any given type of test for sperm count is low,[11] and reliability as indicated by the degree to which you get the same result with different measures is also low. Different ways of measuring sperm count at the same time can come up with quite different results.[12]
Validity is typically also measured by correlations. The validity of a measure is the degree to which it measures what it’s supposed to measure. IQ tests have substantial validity—around .5—as measured by the degree to which IQ scores correlate with GPA in grade school. (In fact, it was the desirability of predicting school performance that motivated the early twentieth-century French psychologist Alfred Binet to create the first IQ test.)
Please note the extremely important principle that there can be no validity without reliability. If a given person’s judgment about a variable is utterly inconsistent (for example, a correlation of zero between the person’s judgments about the level of variable A on one occasion and the level of variable A on another occasion), that person’s judgments can have no validity, that is, they can’t predict the level of another variable B with any accuracy at all.
If test X and test Y, both supposed to measure a given variable, don’t agree beyond a chance level, then at most one of those tests can have any validity. Conversely, there can be very high reliability with no validity whatsoever. Two people can agree perfectly on the degree of extroversion characteristic of each of their friends, and yet neither observer may be able to accurately predict the degree of extroversion exhibited by his friends in any given situation (as judged by objective measures of extroversion such as talkativeness or assessments by psychological experts).
Handwriting analysts claim to be able to measure honesty, hardworkingness, ambition, optimism, and a host of other attributes. To be sure, any two handwriting analysts may agree with each other quite well (high reliability), but they’re not going to be able to predict any actual behavior related to personality (no validity). (Handwriting analysis can be quite useful for some purposes, though; for example, for medical diagnosis of a number of central nervous system maladies.)
Coding Is the Key to Thinking Statistically
I’m going to ask you some questions concerning your beliefs about what you think the correlation is between a number of pairs of variables. The way I’ll do that is to ask you how likely it is that A would be greater than B on one occasion given that A was greater than B on another occasion. Your answers in probability terms can be converted to correlation coefficients by a mathematical formula.
Note that if you say “50 percent” for a question below, you’re saying that you think there’s no relationship between behavior on one occasion and behavior on another. If you say “90 percent,” you’re saying that there is an extremely strong relationship between behavior on one occasion and behavior on another. For the first question below about spelling ability, if you think that there is no consistency between spelling performance on one occasion and spelling performance on another occasion, you would say “50 percent.” If you think that there is an extremely strong relationship between spelling performance on one occasion and spelling performance on another spelling test, you might say “90 percent.” Commit yourself: write down your answer for each of the questions below or at least say your answer out loud.
1. If Carlos gets a higher grade on a spelling test than Craig at the end of the first month of fourth grade, what is the likelihood that Carlos will get a higher grade on a spelling test at the end of the third month?
2. If Julia scores more points than Jennifer in the first twenty games of the basketball season, what is the likelihood that she will score more points in the second twenty games?
3. If Bill seems friendlier than Bob on the first occasion you encounter him, what is the likelihood that he will seem friendlier on the second occasion?
4. If Barb behaves more honestly than Beth in the first twenty situations in which you observe them (paying a fair share of the bill, cheating or not while playing a board game, telling the truth about a grade in a class, etc.), what is the likelihood that Barb will behave more honestly than Beth in the second twenty situations in which you observe them?
Table 4 presents the correlations corresponding to percentage estimates of the kind you just made.
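The exact conversion behind Table 4 isn’t given here, but one standard mapping, which assumes the underlying differences are bivariate normal, is Sheppard’s orthant formula: the repeat probability is p = 1/2 + arcsin(r)/π, which inverts to r = sin(π(p − 1/2)). A sketch under that assumption:

```python
from math import asin, pi, sin

def prob_to_r(p):
    """Correlation implied by P(A beats B again | A beat B once) = p,
    under the bivariate-normal assumption (Sheppard's formula)."""
    return sin(pi * (p - 0.5))

def r_to_prob(r):
    """Inverse direction: repeat probability implied by correlation r."""
    return 0.5 + asin(r) / pi

print(prob_to_r(0.5))            # 0.0 -- "50 percent" means no relationship
print(round(prob_to_r(0.9), 2))  # 0.95 -- "90 percent" implies a very strong one
print(round(r_to_prob(0.3), 2))  # 0.6 -- a .3 correlation buys only a 60/40 edge
```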
It so happens that I know the answers to these questions based on studies that have been conducted.[13] I know the correlation between performance on one spelling test and performance on another and between the average of twenty spelling tests and the average of another twenty spelling tests, between how friendly a person seems on one occasion and how friendly a person seems on another occasion and between friendliness averaged over twenty situations and then over another twenty situations, and so on.
I’m betting that your answers showed the following pattern.
1. Your answers indicate that you think the correlation between basketball performance in twenty games and performance in another twenty games is high, and higher than the correlation between scores on one spelling test and scores on another.
2. Your answers indicate that you think that the correlation between friendliness on one occasion and friendliness on another occasion is quite high, and about as high as the correlation between honesty on twenty occasions and honesty on another twenty occasions.
3. Your answers indicate that the correlations for traits are higher than the correlation for abilities.
At any rate, that describes the guesses of the college student participants in the experiment that I did with Ziva Kunda.[14]
Take a look at Figure 4. Note that people’s guesses about behaviors that reflect abilities (averaging over the actual data for spelling and basketball) are close to the facts. The correlation between behavior (spelling or points scored in basketball) in one situation and another is moderately large—about .5. And people’s guesses about the magnitude of that relationship are right on the money.
Figure 4. People’s guesses about correlations based on small and large amounts of data for abilities (averaged over spelling and basketball) and for traits (averaged over friendliness and honesty).
There is also pretty good recognition of the role of the law of large numbers in affecting correlations. If you look at scores summing across many behaviors and correlate them with the sum of another large batch of behaviors, the correlations are higher. People don’t recognize how very much higher the correlation across summed behaviors is, but they do recognize that behavior over twenty occasions gives you a substantially better prediction for the next twenty occasions than behavior on one single occasion does for another single occasion.
Contrast the accuracy for abilities with the hopeless inaccuracy for traits. People think that honesty in one situation is correlated with honesty in another, and friendliness in one situation is correlated with friendliness in another, to the tune of .8! That is grievously wrong. The correlation between behavior on one occasion that reflects any personality trait whatsoever and behavior on another occasion reflecting that trait is typically .1 or less and virtually never exceeds .3. The error here is colossal and full of implications for everyday life that were discussed in the previous chapter. We think we can get a very good bead on someone’s traits by observing their behavior in a single situation that taps that trait. This mistake is part and parcel of the fundamental attribution error, compounded by our failure to recognize that the law of large numbers applies to personality estimates just as it does to ability estimates. We think we learn much more than we do from a small sample of a person’s behavior because we are inclined to underestimate the possible role of the context and because we think behavior on one occasion is sufficient to make a good prediction about behavior on the next, possibly quite different occasion.

Moreover, there is virtually no recognition of the effect of increasing the number of observations. If you observe people’s trait-related behaviors over twenty occasions and correlate that total with the total of behaviors over another twenty occasions, you do indeed get very high correlations. The problem is that people believe the predictability that only a large number of observations of trait-related behavior can provide is already there in a small number of observations!
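The aggregation effect just described can be sketched with the Spearman-Brown prophecy formula, which gives the correlation between sums of n observations when single observations correlate at r. This assumes equally correlated occasions; the per-occasion value of .1 is the one given in the text:

```python
def spearman_brown(r_single, n):
    """Correlation between two sums of n observations each, given the
    correlation r_single between single observations
    (Spearman-Brown prophecy formula)."""
    return n * r_single / (1 + (n - 1) * r_single)

# A per-occasion trait correlation of .1 looks hopeless for prediction...
print(round(spearman_brown(0.1, 1), 2))   # 0.1
# ...but aggregate over twenty occasions and the correlation between
# the two twenty-occasion totals becomes substantial.
print(round(spearman_brown(0.1, 20), 2))  # 0.69
```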
Why is there such radically different accuracy at the level of single occasions measuring abilities and single occasions measuring traits? And why is there fairly decent recognition of the role of the law of large numbers in producing accurate measures for abilities but virtually no recognition at all for traits?
It’s all in the coding. For many if not most abilities we know what the units are for measuring behavior and we can actually give them numbers: proportion of words spelled correctly; percentage of free throws made. But what are the appropriate units for judging friendliness? Smiles per minute? “Good vibes” per social exchange? How do we compare the ways that people manifest friendliness at Saturday night parties with the ways they show friendliness in Monday afternoon committee meetings? The types of behavior that people engage in are so different for the two types of circumstances that the things we’re labeling as evidence of friendliness in one situation are quite different from what we’re using as indicators of friendliness in another situation. And to try to give numbers to the friendliness indicators for situation A is difficult or impossible. Even if we could give numbers to them, we wouldn’t know how to compare them to the numbers we have for the friendliness indicators for situation B.
What’s the cure for the error with traits? We’re not going to be able to define the relevant units of behavior with much accuracy, and we’re not going to give them numbers even if we could. Psychologists do this in the context of research, but if we made such measurements, we couldn’t mention it to a single soul because that person would think we were crazy. (“I’m giving Josh a score of 18 on friendliness of smiles at the meeting based on multiplying number of upward bends of the lips times the angle of each bend. Wait. Come back. Where are you going?”)
The most effective way to avoid making unjustifiably strong inferences about someone’s personality is to remind yourself that a person’s behavior can only be expected to be consistent from one occasion to another if the context is the same. And even then, many observations are necessary for you to have much confidence in your prediction.
It may help to remember that you are not all that darned consistent. I’d bet that people who have met you in some situations have regarded you as pretty nice and people who have seen you in other situations have regarded you as not so nice at all. And I’d bet further that you couldn’t blame people in those situations for reaching those conclusions given the evidence available to them. Just remember that it’s the same for that guy you just met. You can’t assume that you would experience his personality the same way in the next, perhaaps rather different, situation in which you might encounter him.
More generally, know what you can code and what you can’t. If you can’t code or assign numbers to the event or behavior in question offhand, try the exercise of attempting to think of a way to code for it. The sheer effort it would take to do this is likely to alert you to the fact that you’re susceptible to overestimating consistency of the event or behavior.
The best news I can offer you about the topics in this chapter and the preceding one is that, although I’ve shown how you can think statistically in just a tiny number of domains where you didn’t previously, I know from my research on teaching people how to reason statistically that just a few examples in two or three domains are sufficient to improve people’s reasoning for an indefinitely large number of events, even if they bear little resemblance to the ones I taught them about.
When I teach the law of large numbers with problems that people are inclined to reason about statistically anyway, such as lotteries and coin tosses, their inferences for the kinds of events they only sometimes think about probabilistically, such as objectively scorable abilities, improve.15 Their inferences for the sorts of things they rarely think about statistically, such as personality traits, also improve. The same is true if I teach using just objectively scorable examples about abilities or teach using more subjective, difficult-to-score examples. Teaching about problems of one type improves reasoning about other, very different types.
Summing Up
Accurate assessment of relationships can be remarkably difficult. Even when the data are collected for us and summarized, we’re likely to guess wrongly about the degree of covariation. Confirmation bias is a particularly likely failing: if some As are Bs, that may be enough for us to say that A is associated with B. But an assessment of whether A is associated with B requires comparing two ratios from a fourfold table.
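The fourfold-table comparison can be made concrete with the numbers from Table 3 earlier in the chapter (a minimal sketch; the variable names are mine):

```python
# The Table 3 counts: 20 people have disease A with symptom X, 80 have A
# without X; 10 without the disease have X, 40 without the disease don't.
# The confirmation-bias shortcut stops at "some As are Bs"; the correct
# test compares the two ratios.
symptom_given_disease = 20 / (20 + 80)      # P(symptom | disease)    = 0.2
symptom_given_no_disease = 10 / (10 + 40)   # P(symptom | no disease) = 0.2

associated = symptom_given_disease != symptom_given_no_disease
print(symptom_given_disease, symptom_given_no_disease, associated)
# -> 0.2 0.2 False
```

The two ratios are identical, so symptom X is not diagnostic of disease A at all, despite the twenty cases in the upper-left cell that make the association look compelling.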
When we try to assess correlations for which we have no anticipations, as when we try to estimate the correlation between meaningless or arbitrarily paired events, the correlation must be very high for us to be sure of detecting it. Our covariation detection abilities are very poor for events separated in time by more than just a few minutes.
We’re susceptible to illusory correlations. When we try to assess the correlation between two events that are plausibly related to each other—for which we’re prepared to find a positive correlation—we’re likely to believe there is such a correlation even when there isn’t. When the events aren’t plausibly related, we’re likely to fail to see a positive correlation even when a relatively strong one exists. Worse—we’re capable of concluding there is a positive relationship when the real relationship is negative and capable of concluding there is a negative relationship when the real relationship is positive.
The representativeness heuristic underlies many of our prior assumptions about correlation. If A is similar to B in some respect, we’re likely to see a relationship between them. The availability heuristic can also play a role. If the occasions when A is associated with B are more memorable than occasions when it isn’t, we’re particularly likely to overestimate the strength of the relationship.
Correlation doesn’t establish causation, but if there’s a plausible reason why A might cause B, we readily assume that correlation does indeed establish causation. A correlation between A and B could be due to A causing B, B causing A, or something else causing both. We too often fail to consider these possibilities. Part of the problem here is that we don’t recognize how easy it is to “explain” correlations in causal terms.
Reliability refers to the degree to which a case gets the same score on two occasions or when measured by different means. Validity refers to the degree to which a measure predicts what it’s supposed to predict. There can be perfect reliability for a given measuring instrument but no validity for the instrument. Two astrologers can agree perfectly on the degree to which Pisces people are more extroverted than Geminis—and there most assuredly is no validity to such claims.
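The astrologer example can be turned into a toy simulation (the setup is my own, purely illustrative): two raters who apply the same arbitrary rule agree with each other perfectly, yet the rule predicts nothing.

```python
# Illustrative sketch (assumptions mine): perfect reliability, zero validity.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
birth_month = rng.integers(1, 13, size=n)   # the arbitrary input to the "rule"
extroversion = rng.normal(size=n)           # the real outcome, unrelated to it

rule = lambda month: (month % 12) / 11.0    # the shared astrological "rule"
rater_1, rater_2 = rule(birth_month), rule(birth_month)

# Reliability: the two raters agree perfectly (correlation = 1.0)
reliability = np.corrcoef(rater_1, rater_2)[0, 1]

# Validity: the scores don't predict the outcome at all (correlation ~ 0)
validity = np.corrcoef(rater_1, extroversion)[0, 1]

print(round(reliability, 2), round(validity, 2))
```

Agreement between measurements tells you the instrument is consistent; only its correlation with the thing it is supposed to predict tells you whether it measures anything real.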
The more codable events are, the more likely it is that our assessments of correlation will be correct. For readily codable events such as those determined by ability, our assessment of correlations across two occasions can be quite accurate. And we recognize that the average of many events is a better predictor of the average of many other events of the same kind than measurement of a single event is for another single event—when the events in question are influenced by some ability. Even for abilities, though, gain in predictability from observation on one occasion to predictability based on the average of many occasions tends to be substantially greater than we realize. Our assessments of the strength of relationships based on difficult-to-code events such as those related to personality can be wildly off the mark, and we show little or no recognition of the extent to which observations of many such events are a far better guide to future behavior than are observations of a few such events.
Caution and humility are called for when we try to predict future trait-related behavior from past trait-related behavior unless our sample of behavior is large and obtained in a variety of situations. Recognizing how difficult it is to code behavior of a particular kind may alert us to the possibility that our predictions about that kind of behavior are particularly susceptible to error. Reminding ourselves of the concept of the fundamental attribution error may help us to realize that we may be overgeneralizing.