CHAPTER 6

THINKING AS HYPOTHESIS TESTING

Contents

UNDERSTANDING HYPOTHESIS TESTING

Explanation, Prediction, and Control

Inductive and Deductive Methods

Operational Definitions

Independent and Dependent Variables

Measurement Sensitivity

POPULATIONS AND SAMPLES

Biased and Unbiased Samples

Sample Size

Variability

SCIENCE VERSUS SCIENCE FICTION

Amazing and Not True

DETERMINING CAUSE

Isolation and Control of Variables

Three-Stage Experimental Designs

Using the Principles of Isolation and Control

Prospective and Retrospective Research

Correlation and Cause

Illusory Correlation

Validity

Convergent Validity

Illusory Validity

Reliability

THINKING ABOUT ERRORS

Experience is an Expensive Teacher

Anecdotes

SELF-FULFILLING PROPHECIES

OCCULT BELIEFS AND THE PARANORMAL

Conspiracy Theories

THINKING AS AN INTUITIVE SCIENTIST

CHAPTER SUMMARY

TERMS TO KNOW

Suppose that the following is true: You are seriously addicted to heroin and you have two choices of treatment programs.

Program #1: This program is run by former heroin addicts. Your therapist will be a recovered addict who is the same age as you. The literature about this program states that among those who stay with the program for at least one year, the success rate is very high (80%). One of the biggest advantages of this program is that your therapist knows what it’s like to be seriously addicted and can offer you insights from his own recovery.

Program #2: The therapists in this program have studied the psychology and biology of heroin addiction. The success rate that they provide is much lower than that provided for Program #1 (30%), but the percentage of successes is based on everyone who enters treatment, not just those who are still using the program after one year. Your therapist has never been addicted to heroin, but has studied various treatment options.

This is an important decision for you. Which do you choose?
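One way to sharpen this comparison is to put both success rates on the same footing: the rate per person who enters each program. Here is a minimal Python sketch. The 80% and 30% figures come from the scenario above; the 60% first-year dropout rate for Program #1 is an invented assumption, used only to show how the conversion works.

```python
# The 80% figure for Program #1 counts only people who stayed at least a year,
# so we convert it to a rate per entrant before comparing it to Program #2's
# 30%, which is already a per-entrant figure. The 60% dropout rate is an
# invented assumption; the scenario does not report one.

def per_entrant_rate(rate_among_stayers, dropout_rate):
    """Success rate among everyone who enters, assuming dropouts never recover."""
    return rate_among_stayers * (1.0 - dropout_rate)

program_1 = per_entrant_rate(0.80, dropout_rate=0.60)  # 80% of the 40% who stay
program_2 = 0.30                                       # already a per-entrant rate

print(f"Program #1, per entrant: {program_1:.0%}")  # → 32%
print(f"Program #2, per entrant: {program_2:.0%}")  # → 30%
```

Under this invented dropout rate the two programs are nearly tied, and with a higher dropout rate Program #1 would look worse. The point is not the particular numbers but that the two advertised rates, as stated, are not comparable.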

Understanding Hypothesis Testing

Research is an intellectual approach to an unsolved problem, and its function is to seek the truth.

—Paul D. Leedy (1981, p. 7)

Much of our thinking is like the scientific method of hypothesis testing. A hypothesis is usually a belief about a relationship between two or more variables. In order to understand the world around us, we accumulate observations, formulate beliefs or hypotheses (the singular is hypothesis), and then observe whether our hypotheses are confirmed or disconfirmed. Hypothesis testing is thus one way of finding out about the “way the world works.” Formulating hypotheses and making systematic observations that could confirm or disconfirm them is the same method that scientists use when they want to understand events in their academic domain. When thinking is done in this manner, it has much in common with the experimental methods used by scientists.

Explanation, Prediction, and Control

Government policies—from teaching methods in schools to prison sentencing to taxation—would also benefit from more use of controlled experiments.

—Timo Hannay (2012, p. 26)

There is a basic need to understand events in life. How many times have you asked yourself questions like, “Why did my good friends get divorced when they seemed perfect for each other?” or “How can we understand why the son of the U.S. Surgeon General, the chief doctor in the United States, is addicted to illegal drugs?” When we try to answer questions like these, we often function as an “intuitive scientist.” Like the scientist, we have our own theories, which are explanations about the causes of social and physical events. It is important to be able to explain why people react in certain ways (e.g., He’s a bigot. She’s tired and cranky after work.), to predict the results of our actions (e.g., If I don’t study, I’ll fail. If I wear designer clothes, people will think I’m cool.), and to control some of the events in our environment (e.g., In order to get a good job in business, I’ll have to do well in my accounting course.).

The goal of hypothesis testing is to make accurate predictions about the portion of the world we are dealing with (Holland, Holyoak, Nisbett, & Thagard, 1993). In order to survive and function with maximum efficiency, we must reduce the uncertainty in our environment. One way to reduce uncertainty is to observe sequences of events with the goal of determining predictive relationships. Children, for example, may learn that an adult will appear whenever they cry; your dog may learn that when he stands near the kitchen door, you will let him out; and teenagers may learn that their parents will become angry when they come home late. These are important predictive relationships because they reduce the uncertainty in the environment and allow us to exercise some control over our lives. The process that we use in determining these relationships is the same one that is used when medical researchers discover that cancer patients will go into remission following chemotherapy or that longevity is associated with certain lifestyles. Because the processes are the same, some of the technical concepts in scientific methods are applicable to practical everyday thought.

Inductive and Deductive Methods

Inductive reasoning is a major aspect of cognitive development and plays an important role in both the development of a system of logical thought processes and in the acquisition of new information.

—James Pellegrino and Susan Goldman (1984, p. 143)

Sometimes a distinction is made between inductive and deductive methods of hypothesis testing (see the chapter on reasoning skills). In the inductive method, you observe events and then devise a hypothesis about the events you observed. For example, you might notice that Armaund, a retired man whom you know, likes to watch wrestling on television. Then you note that Minnie and Sue Ann, who are retired older adults, also like to watch wrestling on television. On the basis of these observations, you would hypothesize (invent a hypothesis or explanation) that older people like to watch wrestling. In this way, you would work from your observations to your hypothesis. The inductive method is sometimes described as “going from the specific to the general.” In a classic book entitled Induction (Holland et al., 1986), the authors argue that the inductive process is the primary way in which we learn about the nature of the world. As explained in Chapter 5, with inductive reasoning, if your premises are true and the reasoning is valid, then you can decide that the conclusion is probably correct. So, for example, if you are a juror, you can decide “beyond a reasonable doubt” that the defendant is guilty, or, using the example just given, that older people like to watch wrestling. By contrast, with deductive methods, if the premises are true and the syllogism is valid, then the conclusion must be true.

Although a distinction is usually made between inductive and deductive reasoning, in real life they are just different phases of the hypothesis-testing method. Often people observe events, formulate hypotheses, observe events again, reformulate hypotheses, and collect even more observations. The question of whether the observations or the hypothesis comes first is moot because our hypotheses determine what we choose to observe, and our observations determine what our hypotheses will be. It is like the perennial question of which came first, the chicken or the egg. Each process is dependent on the other for its existence. In this way, observing and hypothesizing recycle, with the observations changing the hypotheses and the hypotheses changing what gets observed.

If you are a Sherlock Holmes fan, you will recognize this process as one that was developed into a fine art by this fictional detective. He would astutely note clues about potential suspects. For example, Sherlock Holmes could remember that the butler had a small mustard-yellow stain on his pants when it is well known that you don’t serve mustard with wild goose, which was the main course at dinner that evening. He would use such clues to devise hypotheses like, “the butler must have been in the field where wild mustard plants grow.” The master sleuth would then check for other clues that would be consistent or inconsistent with this hypothesis. He might check the butler’s boots for traces of the red clay soil that surrounds the field in question. After a circuitous route of hypotheses and observations, Sherlock Holmes would announce, “The butler did it.” When called on to explain how he reached his conclusion, he would utter his most famous reply, “It’s elementary, my dear Watson.”

Many of our beliefs about the world were obtained with the use of inductive and deductive methods, much like the great Sherlock Holmes. We use the principles of inductive and deductive reasoning to generate and evaluate beliefs. Holmes was invariably right in the conclusions he drew. Unfortunately, it is only in the realm of fiction that mistakes are never made because conclusions that result from inductive reasoning can never be known with absolute certainty. Let’s examine the components of the hypothesis testing process to see where mistakes can occur.

Operational Definitions

Scientific reasoning and everyday reasoning both require evidence-based justification of beliefs, or the coordination of theory and evidence.

—Deanna Kuhn (1993, p. 74)

An operational definition tells us how to recognize and measure a concept. For example, if you believe that successful women are paid high salaries, then you will have to define “successful” and “high salary” in ways that will allow you to identify who is successful and who receives a high salary. If you have already read the chapter “The Relationship between Thought and Language” (Chapter 3), then you should recognize the need for operational definitions as being the same as the problem of vagueness. You would need to provide some statement like, “Successful individuals are respected by their peers and are famous in their field of work.” You will find that it is frequently difficult to provide good operational definitions for terms. I can think of several people who are not at all famous, but who are successful by their own and other definitions. If you used the operational definition that requires fame as a component of success, then you would conclude that homemakers, skilled craftspeople, teachers, nurses, and others could not be “successful” based on this definition. Thus, this would seem to be an unsatisfactory operational definition. Suppose, however, for purposes of illustration, that we use this operational definition to classify people into “successful” and “unsuccessful” categories.

How would you operationally define “paid a high salary?” Suppose you decided on “earns at least $2,000 per week.” Once these terms are operationally defined, you could go around finding out whether successful and unsuccessful women differ in how much they are paid. Operational definitions are important. Whenever you hear people talking about “our irresponsible youth,” “knee-jerk liberals,” “bleeding hearts,” “red-necks,” “reactionaries,” “fascists,” or “feminists,” ask them to define their terms operationally. You may find that the impact of their argument is diminished when they are required to be precise about their terms.
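An operational definition is, in effect, an explicit decision rule that any observer could apply. Here is a minimal Python sketch of the “high salary” definition just given; the $2,000-per-week cutoff comes from the text above, and the names are mine, chosen for illustration.

```python
# Operational definition of "paid a high salary": earns at least $2,000 per week.
# The cutoff comes from the text; the function and constant names are illustrative.
HIGH_SALARY_CUTOFF = 2000  # dollars per week

def is_paid_high_salary(weekly_salary):
    """True exactly when the operational definition is met."""
    return weekly_salary >= HIGH_SALARY_CUTOFF

print(is_paid_high_salary(2500))  # → True
print(is_paid_high_salary(1500))  # → False
```

The virtue of writing the definition this explicitly is that any two observers applying the rule will classify the same people the same way, which is exactly what a good operational definition must guarantee.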

Many arguments hinge on operational definitions. For example, consider the debate over whether homosexuality is a mental disorder. The issue turns on how the key terms are operationally defined. What defines a “mental disorder”? Who gets to decide how “mental disorder” should be defined? Does homosexuality possess the defining characteristics of a mental disorder? The vitriolic arguments about whether abortion is murder can be transformed into calmer arguments over what is the appropriate definition of murder, and, again, the more important question of who is the right authority to define what constitutes murder. Thus, with critical thinking, explosive divisions over issues like abortion will not be resolved, but their character will be changed as people consider what is really being argued about. Although most people tend to think that topics like the need for operational definitions are only relevant when discussing research, in fact, these topics are useful everyday thinking skills. Consider this example from a computerized learning game (Halpern et al., 2012): Imagine how you would respond to “My roommate and I got into an argument yesterday over who was more influential on hip hop: James Brown or Stevie Wonder.” You might respond, “You know, you could have resolved the argument by using what scientists call an operational definition.” When you use operational definitions, you avoid the problems of ambiguity and vagueness. Try, for example, to write operational definitions for the following terms: love, prejudice, motivation, good grades, sickness, athletic, beautiful, and maturity.

Independent and Dependent Variables

Psychologists have been making the case for the ‘nothing special’ view of scientific thinking for many years.

—David Klahr and Herbert Simon (2001, p. 76)

A variable is any measurable characteristic that can take on more than one value. Examples of variables are gender (female and male), height, political affiliation (Republican, Democrat, Communist, etc.), handedness (right, left, ambidextrous), and attitudes towards traditional sex roles (could range from extremely negative to extremely positive). When we test hypotheses, we begin by choosing the variables of interest.

In the opening scenario of this chapter, you were asked to determine which of the two programs would more likely help you kick your heroin habit. In this example, there are two variables: type of treatment, which is the independent variable, or the one that is under your control (Program #1 or Program #2), and recovery, which is the dependent variable, or the one that you believe will change as a result of the different treatments (you will either recover from the addiction or you will not). You want to select the program that is more likely to help you to recover. In the jargon of hypothesis testing, you want to know which level of the independent variable will have a beneficial effect on the dependent variable.

The next step in the hypothesis-testing process is to define the variables operationally. Suppose we decide to define “recovery” as staying drug-free for at least two years and “not recovering” as staying drug-free for less than two years, which would include never being drug-free. It is important to think critically about operational definitions for your variables. If they are not stated in precise terms, the conclusions you draw from your study may be wrong.

Measurement Sensitivity

When we measure something, we systematically assign a number to it for the purpose of quantification. Someone who is taller than you is assigned a greater number of inches of height than you are; if not, the concept of height would be meaningless.

When we think as scientists and collect information in order to understand the world, we need to consider how we measure our variables. For example, suppose you believe that love is like a fever, and that people in love have fever-like symptoms. To find out if this is true, you could conduct an experiment, taking temperatures from people who are in love and comparing your results to the temperatures of people who are not in love. How will you measure temperature? Suppose that you decide to use temperature headbands that register body temperature with a band placed on the forehead. Suppose further that these bands measure temperature to the nearest degree (e.g., 98°, 99°, 100°, etc.). If being in love does raise your body temperature, but only raises it one-half of a degree, you might never know this if you used headband thermometers. Headband thermometers just wouldn’t be sensitive enough to register the small increment in body temperature. You would incorrectly conclude that love doesn’t raise body temperatures, when in fact it may have. As far as I know, this experiment has never been done, but it is illustrative of the need for sensitive measurement in this and similar situations.
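The headband problem can be simulated. In the sketch below, all of the temperatures are invented: I assume a true half-degree difference and place the two groups at 98.8° and 99.3° purely for illustration. Rounding each reading to the nearest whole degree, as a headband thermometer effectively does, nearly erases a difference that precise measurement detects.

```python
# Invented data: a true 0.5-degree difference between groups, with a small
# amount of person-to-person variation. All numbers are illustrative.
import random

random.seed(0)
not_in_love = [random.gauss(98.8, 0.1) for _ in range(1000)]
in_love     = [random.gauss(99.3, 0.1) for _ in range(1000)]

def mean(xs):
    return sum(xs) / len(xs)

# Difference as seen by a precise thermometer vs. a whole-degree headband.
precise_diff = mean(in_love) - mean(not_in_love)
coarse_diff  = (mean([round(t) for t in in_love]) -
                mean([round(t) for t in not_in_love]))

print(f"difference with precise thermometers: {precise_diff:.2f} degrees")
print(f"difference with whole-degree bands:   {coarse_diff:.2f} degrees")
```

In the simulation, the real half-degree effect survives precise measurement but almost vanishes under whole-degree rounding, which is exactly how an insensitive instrument can make a true hypothesis look false.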

Populations and Samples

People make innumerable decisions daily about other people that affect their lives and careers. These decisions are inevitably fraught with errors of judgment that reflect ignorance, personal biases, or stereotypes. …

—W. Grant Dahlstrom (1993, p. 393)

In deciding which heroin treatment program to enter, or for that matter, which college to attend or which job to accept, you are making a bet about the future, which necessarily involves uncertainty. Hypothesis testing principles are used to reduce uncertainty. We cannot eliminate uncertainty, but we can use the principles of hypothesis testing to help us make the best choice. In the example at the beginning of the chapter, you would have to examine and evaluate information about the success rate of both programs. You would then use this information to make your decision.

The group that we want to know about is called a population. Since we obviously cannot study every heroin addict to determine which program has more successes, we need to study a subset of this population. A subset of a population is called a sample. In this example, all of the people who entered each of the programs constitute the sample.

Biased and Unbiased Samples

Self-selected samples are not much more informative than a list of correct predictions by a psychic.

—John Allen Paulos (2001, p. 152)

We want our sample to be representative of our population. To be representative, the addicts in our sample would need to be both female and male, from all socioeconomic levels, all intellectual levels, rural and urban areas, and so on, assuming that heroin addicts come from all of these demographic groups. We need representative samples so that we can generalize our results and decide, on average, that one program is more successful than the other. Generalization refers to using the results obtained with a sample to infer that similar results would be obtained from the population if everyone in the population had been measured. Generalizations are valid only if the sample is representative of the population.

What happens when the sample is not representative of the population? Suppose that one program is very expensive and one is county-run to serve the poor. These are examples of biased samples. Because they are not representative or unbiased, you could not use these samples to draw conclusions about the population of all heroin addicts. Far too often, mistakes occur because a sample was biased. One of the biggest fiascoes in sampling history probably occurred in 1936 when the Literary Digest mailed over 10 million straw ballots to people’s homes in order to predict the winner of the presidential election that was to be held that year. The results from this large sample were clear-cut: The next president would be Alf Landon. What, you don’t remember learning about U.S. President Landon? I am sure that you do not because Franklin Delano Roosevelt was elected president of the United States that year. What went wrong? The problem was in how the Literary Digest sampled voters. They mailed ballots to subscribers to their literary magazine, to people listed in the phone book, and to automobile owners. Remember, this was 1936, and only the affluent belonged to the select group of people who subscribed to literary magazines, or had phones, or owned automobiles. They failed to sample the large number of poorer voters, many of whom voted for Roosevelt, not Landon. Because of biased sampling, they could not generalize their results to the voting patterns of the population. Even though they sampled a large number of voters, the results were wrong because they sampled in a biased way.
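A toy simulation shows how sampling only the affluent can flip the predicted winner. Every number below is invented for illustration; only the qualitative pattern mirrors the Literary Digest story.

```python
# Invented electorate: 30% affluent (mostly Landon voters), 70% poor (mostly
# Roosevelt voters). The proportions are made up; the pattern is the point.
import random

random.seed(1)
population = []
for _ in range(30_000):
    population.append(("affluent", "Landon" if random.random() < 0.70 else "Roosevelt"))
for _ in range(70_000):
    population.append(("poor", "Landon" if random.random() < 0.20 else "Roosevelt"))

def landon_share(voters):
    return sum(vote == "Landon" for _, vote in voters) / len(voters)

# Unbiased poll: sample voters completely at random.
fair_poll = random.sample(population, 10_000)

# Digest-style poll: only people on the "phone owners" (affluent) list get ballots.
phone_list = [person for person in population if person[0] == "affluent"]
biased_poll = random.sample(phone_list, 10_000)

print(f"Landon's true share:       {landon_share(population):.0%}")   # roughly 35%
print(f"Landon in the fair poll:   {landon_share(fair_poll):.0%}")    # roughly 35%
print(f"Landon in the biased poll: {landon_share(biased_poll):.0%}")  # roughly 70%
```

Notice that the biased poll has a perfectly respectable sample size. It is the selection rule, not the count, that produces the wrong answer.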

It is often not easy to recognize the profound effect that biased sampling can have on the information that we receive. For example, phone-in polls are very popular, probably because someone makes money from the phone calls. Suppose that a phone-in poll shows that 75% of the people who responded to a question about same-sex marriage were opposed to it. What can we conclude from this poll? Absolutely nothing! Polls of this sort are called “SLOPs,” which stands for Selected Listener Opinion Polls, an acronym that also describes their worth. Only people with extreme views on a topic will take the time and incur the expense to call in their opinions. Even though these polls are usually preceded with warnings like “this is a nonscientific survey,” the announcer then goes on to present meaningless results as though they could be used to gauge public opinion.

Another pitfall in sampling is the possibility of confounding. Confounds confuse the interpretation of results because their influence is not easily separated from the influence of the independent variable. Suppose that patients in the two hypothetical heroin treatment programs differ systematically in more than one way: Program #1 provides peer counseling and its addicts are very wealthy, while Program #2 provides a different type of treatment and its addicts are very poor. In that case, we cannot determine whether any differences in recovery rate are due to the type of treatment or to the income levels of the patients. Because you cannot separate the effect of type of treatment from the effect of income, you could not use these results to decide which treatment is more successful.

Usually, scientists use convenience samples. They study a group of people who are readily available. The most frequently used subjects in psychology experiments are college students and rats. The extent to which you can generalize from these samples depends on the research question. If you want to understand how the human visual system works, college students should be useful as subjects, especially if you want to know about young, healthy eyes. If, on the other hand, you want to understand sex-role stereotyping or attitudes toward subsidized health insurance for the elderly, college students would not be a representative sample. In this case, you could only generalize the results you obtained to college students.

There has been much debate over the issue of establishing a voucher system as a means of paying for K–12 education, especially in my home state of California. As you may know, some people believe that education would improve if parents received vouchers in an amount that is equal to what the state pays to educate a child in the public schools. The parents could then use this voucher to select any school that they deemed best for their children. This is a complex issue as proponents argue that the competition would improve all schools, and opponents argue that wealthy parents would supplement the voucher and send their children to private schools, while the poor parents would have to use the vouchers at cheaper and inferior schools. I do not want to debate the issue of vouchers here, but I do want to repeat an advertisement that was continually seen during the preelection period. It went something like this:

The public schools in California are doing a poor job at educating our children. Did you know that California high school students score much lower than high school students from Mississippi on the college entrance examinations?

There are many ways the thinking in this advertisement could be criticized (including the obvious slur on the state of Mississippi), but for the purpose of this discussion, consider only the nature of the samples that are being compared. Only students who are planning on attending college take the college entrance examinations. A much greater proportion of high school students in California take these examinations than those in Mississippi. Although I do not know what the actual figures are, suppose that the top 40% of California high school graduates take these exams, but only the top 10% of Mississippi high school graduates take these exams. Can you see why you would expect Mississippi students to score higher because of the bias in sampling? There are other reasons why we might expect these results, which do not relate directly to the quality of education. California has many recent immigrants, which means many students whose English is not as good as that of native English speakers. This fact would also lower state-wide averages. Again, this is a sampling problem because the comparison is not being made between comparable groups that differ only on the variable of interest (the state in which the education was obtained). Of course, it is possible that students in Mississippi are getting a better education than those in California, but we cannot conclude this from these data.

Sample Size

Given a thimbleful of facts, we rush to make generalizations as large as a tub.

—Gordon Allport (1954, p. 8)

The number of subjects you include in your sample is called the sample size. Suppose that treatment Program #1 had six patients/participants and Program #2 had 10 patients/participants. Both of these numbers are too small to determine the success rate of the treatments. When scientists conduct experiments, they often use large numbers of subjects because the larger the sample size, the more confident they can be in generalizing the findings to the population. This principle is called the law of large numbers, and it is an important statistical law. If, for some reason, they cannot use a large number of subjects, they may need to be more cautious or conservative in the conclusions that they derive from their research. Although a discussion of the number of subjects needed in an experiment is beyond the scope of this book, it is important to keep in mind that for most everyday purposes, we cannot generalize about a population by observing how only a few people respond.
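The law of large numbers is easy to see in simulation. A sketch, assuming a treatment with a true 30% success rate (the figure is borrowed from the opening scenario; everything else is invented):

```python
# How far does a sample's success rate typically land from the true rate?
# We simulate many samples of each size and average the miss distances.
import random

random.seed(2)
TRUE_RATE = 0.30  # assumed true success rate, for illustration

def observed_rate(n):
    """Success rate seen in one random sample of n patients."""
    return sum(random.random() < TRUE_RATE for _ in range(n)) / n

def typical_error(n, trials=1000):
    """Average distance between a size-n sample's rate and the true rate."""
    return sum(abs(observed_rate(n) - TRUE_RATE) for _ in range(trials)) / trials

print(f"typical error with n = 10:   {typical_error(10):.3f}")
print(f"typical error with n = 1000: {typical_error(1000):.3f}")
```

With only 10 patients, the observed rate routinely misses the true rate by 10 percentage points or more; with 1,000 patients, it rarely misses by more than 1 or 2. This is why the six- and ten-patient samples above cannot tell us which treatment works better.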

Suppose this happened to you:

After months of deliberation over a new car purchase, you finally decided to buy the fuel-efficient Ford Focus. You found that both Consumer Reports and Road and Track magazines gave the Focus a good rating. It is priced within your budget, and you like its “sharp” appearance. On your way out the door to close the deal, you run into a close friend and tell her about your intended purchase. “A Focus!” she shrieks. “My brother-in-law bought one and it’s a tin can. It’s constantly breaking down on the freeway. He’s had it towed so often that the rear tires need replacing.”

What do you do?

Most people would have a difficult time completing the purchase because they are insufficiently sensitive to sample size issues. The national magazines presumably tested many cars before they determined their rating. Your friend’s brother-in-law is a single subject. You should place greater confidence in results obtained with large samples than in results obtained with small samples (assuming that the “experiments” were equally good). Yet, many people find the testimonial of a single person, especially someone they know, more persuasive than information gathered from a large sample, especially when they prefer the results obtained from the small sample.

We tend to ignore the importance of having an adequately large sample size when we function as intuitive scientists. This is why testimonials are so very powerful in persuading people what to do and believe. However, testimonials are based on the experiences of only one person, and often that person is being paid to say that some product or purchase is good. I would like to dismiss testimonials and similar sorts of “evidence” as hogwash that no one would fall for, but I know differently. A family member spent over $300 in calls to psychics when struggling with decisions regarding her critically ill husband. This was money that she did not have for advice that was, at best, harmless, and, at worst, caused her to ignore the recommendations of the hospital staff. I was later told that psychics are not permitted to predict that anyone will die, so they gave her false hope, which made the death even more difficult to bear. I am telling you this true personal anecdote because I hope that it will cause you to think about the sort of evidence you would need before spending hundreds of dollars on advice from a paid stranger who has no credentials or training in psychology or science.

Variability

But all evolutionary biologists know that variation itself is nature’s only irreducible essence. Variation is the hard reality, not a set of imperfect measures for a central tendency. Means and medians are the abstractions.

—Stephen Jay Gould (1985, para. 14)

The term variability is used to denote the fact that all people are not the same. Suppose that you know someone who “drank a six-pack” twice a day and lived to be 100 years old. Does this mean that the hype about the negative effects of alcohol on health is wrong? Of course not! The effect of alcohol on health was determined by numerous separate investigators using large numbers of subjects. Not everyone responds in the same way, or maintains the same opinion, or has the same abilities. It is important to remember the role of variability in understanding results. Our studies can tell us, with some probability, what is generally true, but there will be individuals who do not conform to the usual pattern. There are people who drink heavily and live to a ripe old age, but this does not mean that the studies that show that heavy drinking causes many terminal illnesses are wrong. It just means that people are different.
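The heavy-drinking centenarian is exactly what variability predicts. In the sketch below the lifespans are invented (group means of 68 and 76 years, chosen only for illustration), yet even with a solid eight-year difference between averages, many individuals in the less healthy group outlive the average member of the healthier one.

```python
# Invented lifespans: the group means differ by 8 years, but individuals
# vary widely around their group mean. All numbers are illustrative.
import random

random.seed(3)
heavy_drinkers = [random.gauss(68, 10) for _ in range(10_000)]
nondrinkers    = [random.gauss(76, 10) for _ in range(10_000)]

def mean(xs):
    return sum(xs) / len(xs)

# Heavy drinkers who outlive the *average* nondrinker.
outliers = sum(age > mean(nondrinkers) for age in heavy_drinkers)

print(f"average lifespan, heavy drinkers: {mean(heavy_drinkers):.1f}")
print(f"average lifespan, nondrinkers:    {mean(nondrinkers):.1f}")
print(f"heavy drinkers who outlived the nondrinker average: {outliers}")
```

In this simulation, roughly one heavy drinker in five outlives the average nondrinker, yet the group-level conclusion about drinking and health still stands. A single long-lived drinker disconfirms nothing.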

There are many examples where people generalize inappropriately from small samples. In a tragic example, the teenage son of a prominent politician in the United States committed suicide after taking a commonly prescribed acne medication. The family was absolutely convinced that the suicide was a result of depression caused by the drug, and they began an active campaign to get the drug taken off the market. In their view, their son had been a normal teen until he took the medication, and soon after taking the medication, he committed suicide. Large-scale studies did not support their belief that teens taking this drug were more likely to be depressed or commit suicide than similar teens not taking this drug. However, for the family of this teen, their own tragic experience was far more convincing than data collected on thousands of teens.

People’s willingness to believe that results obtained from a small sample can be generalized to the entire population is called the law of small numbers (Tversky & Kahneman, 1974). In fact, we should be more confident when predicting to or from large samples than from small samples. In an experimental investigation of this phenomenon (Quattrone & Jones, 1980), college students demonstrated their belief that if one member of a group made a particular decision, then the other members of that group would make the same decision. This result was especially strong when the college students were observing the decisions of students from other colleges. Thus, it is easy to see how a belief in the law of small numbers can maintain prejudices and stereotypes. We tend to believe that the actions of a single group member are indicative of the actions of the entire group. Have you ever heard someone say, “___________s (fill in your group) are all alike”? An acquaintance once told me that all Jamaicans are sneaky thieves. She came to this conclusion after having one bad experience with a person from Jamaica. Expressions like this one are manifestations of the law of small numbers. Can you see how the law of small numbers can also explain the origin of many prejudices like racism? A single memorable event involving a member of a group with which we have little contact can color our beliefs about the other members of that group. Generally, when you collect observations about people and events, it is important to collect a large number of observations before you reach a conclusion.

There is one exception to the general principle that we need large samples in order to make valid generalizations about a population. The one exception occurs when everyone in the population is exactly the same. If, for example, everyone in the population of interest responded exactly the same way to any question (e.g., Do you approve of the death penalty?) or any treatment (e.g., had no “heart attacks” when treated with a single aspirin), then sample size would no longer be an issue. Of course, all people are not the same. You may be thinking that this was a fairly dumb statement because everyone knows that people are different. Unfortunately, research has shown that most of us tend to underestimate the variability of groups with which we are unfamiliar.

Minorities who are members of any group often report that the leader or other group members will turn to them and ask, “What do African Americans (or women, or Hispanics, or Asians, or whatever the minority is) think about this issue?” It is as though the rest of the group believes that the few minority members of their group can speak for the minority group as a whole. This is a manifestation of the belief that groups other than the ones to which we belong are much more homogeneous (less variable) than the groups to which we belong. The ability to make accurate predictions depends, in part, on the ability to make accurate assessments of variability. It is important to keep the concept of variability in mind whenever you are testing hypotheses, either formally in a research setting, or informally as you try to determine relationships in everyday life.

Science versus Science Fiction

Science is a method for understanding; it is a way of “knowing” about the world. Data (or evidence) are at the heart of the scientific method but not just “any old” data or evidence—data or evidence that were collected under controlled conditions, data that can be openly scrutinized, and data that can be replicated. Consider, for example, the hypothesis that there are psychics who can accurately predict future events. What sort of data or evidence would convince you that at least some people have psychic powers?

Amazing and Not True

I have been told by believers in psychics that they have good data—a psychic made amazingly true predictions about their lives. Testimonials by one or even several people are offered as evidence that psychics exist. Believers claim that they “know” psychics have true powers because they “saw it with their own eyes.” (If you have already read the chapter on memory, then you know that eyewitness testimony can be highly inaccurate, an important fact, but not the main point in this discussion.) I like to remind them that they also saw Mr. Ed, the talking horse who had his own television show, and a talking pig in the children’s classic movie Babe, just to name two examples; do they believe that there are horses and pigs that can talk?

I like to begin my college classes in critical thinking by demonstrating my own amazing psychic powers. Each semester, before the start of the first class, I check my class roster and then look up some of the students in the university records. The records include date of birth and sometimes information about siblings (if they are also students), so I can determine the astrological sign and some family information for the students enrolled in my class. I then amaze students by asking a series of questions that lead up to my ability to name students I have never met and tell them their astrological sign and some family information. (Let’s see—I am getting the feeling that you are a Taurus. Your name—don’t tell me—it’s something like Jean or Jane, no wait, it’s Jackie, isn’t it? I also see a family picture with a younger sister in it. Is that correct? And her name is Latisha, right? And so on.) It is really an easy trick; yet, students wonder how I was able to know so much about people I have never met. The fact that I could tell students their name, astrological sign, and other information might look like evidence in support of psychic phenomena, and without appropriate controls, it is a very easy trick.

Cold Reading

There are many people who claim to be psychics or clairvoyants—some even claim to be able to speak with the dead. I think we can all understand the pain of grieving relatives who really want to believe that they can talk to the dead. Televised shows have sometimes shown a portion of the séance or session with the dead. Typically, viewers will see only a few minutes of a session that may last for several hours—usually a portion where the psychic made a successful prediction like, “the person you are trying to contact—is it a parent?” What we do not see is the many instances where questions like this one are incorrect. As explained earlier, James Randi has offered a $1,000,000 challenge to anyone who, “under proper observing conditions,” can show evidence of “paranormal, supernatural, or occult power.” The Australian Skeptics Society also offers a cash prize of $100,000 for any proven demonstration of psychic powers. No one has collected either of these two prizes. Ray Hyman, a psychologist who studies ways to debunk psychics, explains how psychics “read” your mind, “speak with the dead,” and stage similar feats. The technique is called “cold reading.” The psychic relies on body language and basic statistical information (e.g., common boys’ names that start with a known letter). He offers these 13 points for amazing your own friends with cold reading (Hyman, 1977):

1.  Look confident—act as though you believe in what you are doing.

2.  Use polls, surveys, and statistical abstracts—predict attitudes from a person’s educational level and religion.

3.  Stage the reading—make no excessive claims; be modest.

4.  Gain the subject’s cooperation—make the subject try to find truth in what you say.

5.  Use a gimmick—crystal balls and palm reading are good.

6.  Have a list of stock phrases available—there are fortune telling and other manuals that have lots of good ones.

7.  Use your eyes—observe clothing, jewelry, and speech; these can be good hints.

8.  Fish—get the subject to tell about him or herself.

9.  Listen—subjects want to talk about themselves.

10.  Be dramatic—ham it up for effect.

11.  Pretend to know more than you are saying—the subject will assume you do.

12.  Flatter the subject—people love flattery.

13.  Always tell the subject what she or he wants to hear.

Hyman also suggests some “stock spiels” that make a plausible case for your ability to “read” people. Here is one example. As you read it, ask yourself whether it accurately reflects your personality:

Some of your aspirations tend to be pretty unrealistic. At times you are extroverted, affable, and sociable, while at other times you are introverted, wary, and reserved. You have found it unwise to be too frank in revealing yourself to others. You pride yourself on being an independent thinker and do not accept others’ opinions without satisfactory proof. You prefer a certain amount of change and variety and become dissatisfied when hemmed in by restrictions and limitations. At times you have serious doubts as to whether you have made the right decision or done the right thing. Disciplined and controlled on the outside, you tend to be worrisome and insecure on the inside.

People who talk to the dead, read palms or tea leaves (the pattern of tea leaves left in a cup after you drink from it), or report visits from beings from other planets use similar methods when providing evidence of their unusual abilities. Remember, just because it looks like someone has made a rabbit appear from the inside of a hat does NOT mean that is what really happened. Magic tricks and science fiction can be great fun, but they are not substitutes for critical thinking. If there is only one goal that I could have for readers of this book, it is that you come to value data and evidence for the purpose of making sound decisions. It may be the most critical part of critical thinking.

Determining Cause

•  Do you believe that children who are neglected become teenage delinquents?

•  Does jogging relieve depression?

•  Will a diet that is low in fat increase longevity?

•  Do clothes make the man?

•  Will strong spiritual beliefs give you peace of mind?

•  Does critical thinking instruction improve how you think outside the classroom?

All of these questions concern a causal relationship in which one variable (e.g., neglect) is believed to cause another variable (e.g., delinquency). What sort of information do we need to determine the truth of causal relationships?

Isolation and Control of Variables

Stop and think for a minute about the way you would go about deciding if neglecting children causes them to become delinquent when they are teenagers. You could decide to conduct a long-term study in which you would divide children into groups—telling some of their parents to cater to their every need, others to neglect them occasionally, and still others to neglect their children totally. You could require everyone to remain in their groups, catering to or neglecting their children as instructed until the children reach their teen years, at which time you could count up the number of children in each group who became delinquents. Remember, of course, that you would have to define operationally the term “delinquent.” This would be a good, although totally unrealistic, way to decide if neglect causes delinquency. It is a good way because this method would allow you to control how much neglect each child received and to isolate the cause of delinquency, as this would be the only systematic difference among the people in each group. It is unrealistic to the point of being ludicrous, because very few people would comply with your request to cater to or neglect their children. Furthermore, it would also be unethical to ask people to engage in potentially harmful behaviors.

In some experimental settings, it is possible to isolate and control the variables that interest you. If you wanted to know if grading students for course work will make college students work harder and therefore learn more, you could randomly assign college students to different grading conditions. Half the students could be graded as pass or fail (no letter grades), while the other students would receive traditional letter grades (A, B, C, D, or F). At the end of the semester, all students would take the same final exam. If the average final exam scores of the students who received letter grades were statistically significantly higher than those of the students in the pass/fail condition, we could conclude that grades do result in greater learning. (See the chapter on probability for a discussion of significant differences.)

Can you see why it is so important to be able to assign students at random to either the graded or pass/fail conditions instead of just letting them pick the type of grading they want? It is possible that the students who would pick the pass/fail grading are less motivated or less intelligent than the students who would prefer to get grades or vice versa. If the students could pick their own grading condition, we would not know if the differences we found were due to the differences in grading practices, due to differences in motivation or intelligence, or due to some other variable that differs systematically as a function of which grading condition the students select. If we cannot use random assignment, then we usually cannot make any causal claims.

Let’s return to the question of whether child neglect causes delinquency. Given the constraint that you cannot tell parents to neglect their children, how would you go about deciding if child neglect causes delinquency? You could decide to find a group of parents and ask each about the amount of care he or she gives to each child. Suppose you found that, in general, the more that children are neglected, the more likely they are to become teenage delinquents. Because you lost control over your variables by not assigning parents to catering and neglecting groups, it is not possible, on the basis of this study alone, to conclude that neglect causes delinquency. It is possible that parents who neglect their children differ from caring parents in other ways. Parents who tend to neglect their children may also encourage drug use, or engage in other lifestyle activities that contribute to the development of teenage delinquency. A point that is made in several places in this book is that just because two variables occur together (in this example, neglect and delinquency), it does not necessarily mean that one caused the other to occur.

Three-Stage Experimental Designs

When researchers want to be able to make strong causal claims, they use a three-stage experimental design. An experimental design is a plan for how observations will be made.

1.  The first stage involves creating different groups that are going to be studied. In the example about the effect of pass/fail grading on how much is learned, the two groups would be those who receive a letter grade and those who receive a grade of either “pass” or “fail.” It is important that the two groups differ systematically only on this dimension. You would not want all the students in the letter grade group to take classes taught by Professor Longwinded while those in the pass/fail group take classes taught by Professor Mumbles. One professor may be a better teacher, and students may learn more in one condition than the other because of this confounding variable. One way to avoid this confound is to assign half of the students in each class to each grading condition with the assignment of students to either group done at random. Strong causal claims will involve equating the groups at the outset of the experiment. The random assignment of subjects to groups is essential in discovering cause and effect.

2.  The second stage involves the application of the “experimental treatment.” If we were conducting a drug study, one group would receive the drug, and the other group would not receive the drug. Usually, the “nondrug” group would receive a placebo, which would look and/or taste like the drug, but would be chemically inert. The reason for using a placebo is to avoid any effects of subjects’ beliefs or expectancies. We know that placebos can have positive effects on a variety of symptoms (Bishop, Jacobson, Shaw, & Kaptchuk, 2012). The topic of expectancies and the way they can bias results is discussed later in this chapter.

3.  Evaluation is the final phase. Measurements are taken and two (or more) groups are compared on some outcome measure. In the grading example, final examination scores for students in the letter grade group would be compared to the scores of students in the pass/fail group. If students in the graded group performed significantly better than the students in the other group, then we would have strong support for the claim that one grading method caused students to study harder and learn more than the other.
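The three stages can be sketched in code. In this Python sketch, the student roster and exam scores are simulated placeholders (random numbers), not data from any real study; only the structure—random assignment, treatment, comparison—is the point:

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

students = [f"student_{i}" for i in range(40)]  # hypothetical class roster

# Stage 1: create the groups by random assignment, equating them at the outset.
random.shuffle(students)
graded, pass_fail = students[:20], students[20:]

# Stage 2: apply the experimental treatment. Here we merely simulate final-exam
# scores under each condition; in a real study these would be measured outcomes.
scores = {s: random.gauss(78, 8) for s in graded}
scores.update({s: random.gauss(74, 8) for s in pass_fail})

# Stage 3: evaluation -- compare the groups on the outcome measure.
graded_mean = statistics.mean(scores[s] for s in graded)
pass_fail_mean = statistics.mean(scores[s] for s in pass_fail)
print(round(graded_mean, 1), round(pass_fail_mean, 1))
```

A real evaluation would also ask whether the difference between the two means is statistically significant, as the chapter on probability discusses.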


Figure 6.1 A real advertisement for a placebo, including its sole ingredient, sugar, and the (puzzling) additional fact that it is “not for human consumption.” (http://www.etsy.com/listing/99763771/placebo-max-strength, with permission by Darren Cullen)

Of course, we are not always able to equate groups at the outset and randomly assign subjects to groups, but when we can, results can be used to make stronger causal claims than in less controlled conditions.

Consider this hypothetical example:

Researchers at Snooty University have studied the causes of divorce. They found that 33% of recently divorced couples reported that they had serious disagreements over money during the two-year period that preceded the divorce. The researchers concluded that disagreements over money are a major reason why couples get divorced. They go on to suggest that couples should learn to handle money disagreements as a way of reducing the divorce rate.

What, if anything, is wrong with this “line of reasoning”? Plenty. First, we have no data from a comparison group that did not divorce (i.e., no control group). Maybe 33% of all families disagree about money; maybe the number is even higher in families that stay together. Second, there is no reason to believe that disagreements over money caused or even contributed to divorce. Maybe couples in the process of breaking up disagree more about everything. Third, there is the problem of retrospective research, a topic that is discussed in the next section. Studies like this one are found everywhere from radio talk shows to news reports, scientific journals, and people’s casual examinations of life. If you rely on the principles of hypothesis testing to interpret findings like this one, you are less likely to be bamboozled.

Using the Principles of Isolation and Control

In an earlier chapter, I presented Piaget’s notion that people who have attained the highest level of cognitive development can reason about hypothetical situations. Piaget called the highest level of cognitive development the Formal Stage of thought. He developed several different tasks that could be used to identify people who could think at this level. If you already read Chapter 4, then you will recall the “combinatorial reasoning” task devised by Piaget. It required a planful and orderly procedure for combining objects. Another one of Piaget’s tasks involved using the principles of isolation and control that are integral to hypothesis testing. Try this task.

Bending Rods: This task is to determine which of several variables affects the flexibility of rods. Imagine that you are given a long vertical bar with 12 rods hanging from it. Each rod is made of brass, copper, or steel. The rods come in two lengths and two thicknesses. Your task is to find which of the variables (material, length, or thickness) influences how much the rods will bend. You can test this by pressing down on each rod to see how much it bends. You may perform as many comparisons as you like until you can explain what factors are important in determining flexibility. It may help you to visualize the setup as presented in Figure 6.2.

What do you need to do to prove that length, or diameter, or the material the rods are constructed from or some combination of these variables is important in determining flexibility? Stop now and write out your answer to this problem. Do not go on until you have finished this problem.

Bending Rods: How did you go about exploring the effect of length, diameter, and material on rod flexibility? In order to solve this problem, you had to consider the possible factors that contribute to rod flexibility and then systematically hold constant all of the variables except one. This is a basic concept in experimental methods. If you wanted to know if material was an important factor, which rods would you test? You would bend a brass rod, a copper rod, and a steel rod of the same length and diameter. This would hold constant the length and diameter variables while testing the material variable. One possible test would be to compare flexibility among the short, wide brass, copper, and steel rods. Similarly, if you wanted to find out if length is important, you would bend a short rod and a long rod of the same diameter and material. An example of this would be to compare the short, wide copper rod with the long, wide copper rod.


Figure 6.2 Bending rods. How would you determine whether material, length, or thickness affects rod flexibility?

How would you decide if diameter influences rod flexibility? By now it should be clear that you would compare two rods of the same material and length but different diameters. You could test this by comparing a short, wide steel rod with a short, thin steel rod. Thus, you should be able to recognize that the same principles used in hypothesis testing were needed in this task and be able to apply them correctly in order to solve this seemingly unrelated problem.
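The “hold everything else constant” rule in the bending-rods task can be expressed mechanically. This Python sketch enumerates the 12 rods and lists the fair comparisons for any one factor, that is, pairs of rods that differ on that factor and on nothing else:

```python
from itertools import combinations, product

# The 12 rods: 3 materials x 2 lengths x 2 thicknesses.
rods = list(product(["brass", "copper", "steel"],
                    ["short", "long"],
                    ["thin", "wide"]))

def fair_tests(factor):
    """Pairs of rods that isolate `factor`: the two rods differ on that
    attribute and are identical on every other attribute."""
    idx = {"material": 0, "length": 1, "thickness": 2}[factor]
    pairs = []
    for a, b in combinations(rods, 2):
        differing = [i for i in range(3) if a[i] != b[i]]
        if differing == [idx]:
            pairs.append((a, b))
    return pairs

print(len(fair_tests("length")))     # 6 fair length comparisons
print(len(fair_tests("material")))   # 12 fair material comparisons
```

Any comparison outside these lists confounds two factors at once, which is exactly the error the task is designed to catch.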

Prospective and Retrospective Research

Consider a medical example: Some health psychologists believe that certain stressful experiences can cause people to develop cancer. If this were your hypothesis, how would you determine its validity? One way would be to ask cancer patients if they had anything particularly stressful happen to them just before they were diagnosed as having cancer. If the stress caused the cancer, it would have to precede the development of cancer. When experiments are conducted in this manner, they are called retrospective experiments. Retrospective experiments look back in time to understand causes for later events. There are many problems with this sort of research. As discussed in the memory chapter, memories are selective and malleable. It is possible that knowledge of one’s cancer will change how one’s past is remembered. Moderately stressful events like receiving a poor grade in a college course may be remembered as being traumatic. Happier events, like getting a raise, may be forgotten. It is even possible that the early stages of the cancer were causing stress instead of the stress causing the cancer. Thus, it will be difficult to determine if stress causes cancer from retrospective research.

A better method for understanding causative relationships is prospective research. In prospective research, you identify possible causative factors when they occur and then look forward in time to see if the hypothesized result occurs. In a prospective study, you would have many people record stressful life events when they occur (e.g., death of a spouse, imprisonment, loss of a job) and then see which people develop cancer. If the people who experience more stressful events are more likely to develop cancer, this result would provide support for your hypothesis.

Most of the research we conduct as intuitive scientists is retrospective. We often seek explanations for events after they have occurred. How many times have you tried to understand why a seemingly angelic child committed a serious crime, or why a star rookie seems to be losing his touch, or why the underdog in a political race won? Our retrospective attempts at explanations are biased by selective memories and lack of systematic observations. (See the section on hindsight in the decision-making skills chapter for a related discussion.)

Correlation and Cause

The process by which children turn experience into knowledge is exactly the same, point for point, as the process by which those whom we call scientists make scientific knowledge.

—John Holt (1964, p. 93)

What you are about to read is absolutely true: As the weight of children increases, so does the number of items that they are likely to get correct on standardized tests of intelligence. In other words, heavier children answer more questions correctly than lighter ones. Before you start stuffing mashed potatoes into your children in an attempt to make them smarter, stop and think about what this means. Does it mean that gaining weight will make children smarter? Certainly not! Children get heavier as they get older, and older children answer more questions correctly than younger ones.

In this example, the variables weight and number of questions answered correctly are related. An increase in one variable is associated with an increase in the other variable—as weight increases the number of questions answered correctly concomitantly increases. Correlated variables are two or more variables that are related. If you’ve already read the previous chapter on analyzing arguments, then you should recognize this concept as the Fallacy of False Cause.

People frequently confuse correlation with causation. Consider the following example: Wally and Bob were arguing about the inheritance of intelligence. Wally thought about everyone he knew and concluded that since smart parents tend to have smart children, and dumb parents tend to have dumb children, intelligence is therefore an inherited characteristic. Bob disagreed with Wally’s line of reasoning, although he concurred with the facts that Wally presented. He agreed that if the parents score high on intelligence tests, then their children will also tend to score high, and if the parents score low on intelligence tests, then their children will also tend to score low. When two measures are related in this way, that is, they tend to rise and fall together, they have a positive correlation. Although parents’ intelligence and their children’s intelligence are positively correlated, we cannot infer that parents caused their children (through inheritance or any other means) to be intelligent. It is possible that children affect the intelligence of their parents, or that both are being affected by a third variable that has not been considered. It is possible that diet, economic class, or other lifestyle variables determine intelligence levels, and since parents and children eat similar diets and have the same economic class, they tend to be similar in intelligence. In understanding the relationship between two correlated variables, it is possible that variable A caused the changes in variable B (A→B), or that variable B caused the changes in variable A (B→A), or that both A and B caused changes in each other (A→B and B→A), or that both were caused by a third variable C (C→A and C→B). Of course, it is also possible that you found two variables that are related just by chance, and if you repeated the study, they would not be correlated.
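The third-variable possibility (C→A and C→B) is easy to demonstrate with simulated data. In this Python sketch, A and B never influence each other at all, yet they come out strongly correlated because both are driven by C (all numbers are simulated purely for illustration):

```python
import random

random.seed(7)  # fixed seed so the sketch is repeatable

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# C drives both A and B; neither A nor B appears in the other's formula.
c = [random.gauss(0, 1) for _ in range(5_000)]
a = [ci + random.gauss(0, 0.5) for ci in c]
b = [ci + random.gauss(0, 0.5) for ci in c]

print(round(pearson(a, b), 2))  # strongly positive, with no A->B or B->A link
```

The correlation comes out around .8, which is exactly what a naive observer would take as evidence that A causes B.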

Let’s consider a different example. Many people have taken up jogging in the belief that it will help them to lose weight. The two variables in this example are exercise and weight. I have heard people argue that because there are no fat athletes (except perhaps sumo wrestlers), exercise must cause people to be thin. I hope that you can think critically about this claim. It does seem to be true that exercise and weight are correlated. People who tend to exercise a great deal also tend to be thin. This sort of correlation, in which the tendency to be high on one variable (exercise) is associated with the tendency to be low on the other variable (weight) is called a negative correlation. Let’s think about the relationship between exercise and weight. There are several possibilities: (1) It is possible that exercise causes people to be thin; or (2) It is possible that people who are thin tend to exercise more because it is more enjoyable to engage in exercise when you are thin; or (3) It is possible that a third variable, like concern for one’s health, or some inherited trait, is responsible for both the tendency to exercise and the tendency to be thin. Perhaps there are inherited body types that naturally stay thin and also are graced with strong muscles that are well suited for exercise.

If you wanted to test the hypothesis that exercise causes people to lose weight, then you would use the three design stages described earlier. If the subjects who were assigned at random to the exercise group were thinner after the treatment period than those in a no-exercise condition, then you could make a strong causal claim for the benefits of exercise in controlling weight.

Actually, the question of causation is usually complex. It is probably more accurate to use the word “influence” instead of cause because there is usually more than a single variable that affects another variable. Recent research has found that when states decriminalize the use of marijuana, traffic fatalities are reduced (Anderson & Rees, 2011). How can that happen? According to the researchers, alcohol use decreases when marijuana is legally available, and it is the decrease in alcohol consumption that reduces traffic fatalities. In this example, Variable A, decriminalizing marijuana, caused a change in a third variable, alcohol consumption, and it was the third variable that influenced Variable B, the reduction in traffic fatalities.


Dilbert by Scott Adams. Used with permission by Universal Press Syndicate.

Illusory Correlation

An amusing anecdote of attributing cause to events that occur together was presented by Munson (1976):

A farmer was traveling with his wife on a train when he saw a man across the aisle take something out of a bag and begin eating it. “Say, Mister,” he asked, “What’s that thing you’re eating?”

“It’s a banana,” the man said, “Here, try one.”

The farmer took it, peeled it, and just as he swallowed the first bite, the train roared into a tunnel. “Don’t eat any, Maude,” he yelled to his wife. “It’ll make you go blind!” (p. 277)

Do blondes really have more fun? A popular advertisement for hair dye would like you to believe that having blonde hair will cause you to have more fun. Many people believe that they see many blondes having fun; therefore, blondes have more fun than brunettes and redheads. The problem with this sort of observation is that there are many blondes who are not having “more fun” (a term badly in need of an operational definition) than brunettes, but because they are at home or in other places where you are unlikely to see them, they don’t get considered. The term illusory correlation refers to the erroneous belief that two variables are related when, in fact, they are not. Professionals and nonprofessionals alike maintain beliefs about relationships in the world. These beliefs guide the kinds of observations we make and how we determine if a relationship exists between two variables.

Validity

The validity of a measure is usually defined as the extent to which it measures what you want it to measure. If I wanted to measure intelligence and measured the length of your big toe, this would obviously be invalid. Other examples of validity are less obvious. A popular radio commercial touting the benefits of soup points out that tomato soup has more Vitamin A than eggs. This is true, but it is not a valid measure of the goodness of tomato soup. Eggs are not a good source of Vitamin A. Thus, the wrong comparisons were made, and the measure does not support the notion that soup is an excellent food. If you have already read the previous chapter “Analyzing Arguments,” then you should realize that the claim that tomato soup has more Vitamin A than eggs does not support the conclusion that “soup is good food.” It may well be true that soup is an excellent source of vitamins, but claims like this one do not support that conclusion.

Convergent Validity

When several different measures all converge onto the same conclusion, the measures are said to have convergent validity. If, for example, you wanted to measure charisma—the psychological trait that is something more than charm that people as diverse as Beyoncé, Justin Bieber, and Jennifer Lopez are said to possess—you would need convergent validity for your measure. People who scored high on your charisma test should also be the ones who are selected for leadership positions and have other personality traits that are usually associated with charisma. If the class wallflower scored high on your test of charisma, you would need to rethink the validity of your test.

Intuitive scientists also need to be mindful of the need for convergent validity. Before you decide that your classmate, Willa Mae, is shy because she hesitates to talk to you, you need to determine if she acts shy with other people in other places. If she frequently speaks up in class, you would not want to conclude that she is a shy person because this inconsistency in her behavior would signal a lack of convergent validity. The idea of convergent validity is very similar to the topic of convergent argument structures that was presented in the previous chapter. If you have already read the chapter “Analyzing Arguments”, then you should recall that the strength of an argument is increased when many premises support (or converge on) a conclusion. This is exactly the same situation as when several sources of evidence support the same hypothesis. The language used in these two chapters is different (support for a conclusion versus support for a hypothesis), but the underlying ideas are the same: the more reasons or evidence we can provide for believing that something is true, the greater the confidence we can have in our belief.

Illusory Validity

Everyone complains of his memory and no one complains of his judgment.

—François La Rochefoucauld (1613–1680)

Both professionals and nonprofessionals place great confidence in their conclusions about most life events, even when their confidence is objectively unwarranted. Overconfidence in judgments is called illusory validity. In an experimental investigation of this phenomenon, Oskamp (1965) found that as clinicians were given more information about patients, they became more confident in the judgments they made about those patients. What is interesting about this result is that they did not become more accurate in their judgments, only more confident that they were right. Why do we persist in placing confidence in fallible judgments? A primary factor is the selective nature of memory. Consider this personal vignette: As a child, I would watch Philadelphia Phillies baseball games on television with my father. As each batter would step up to home plate, my father would excitedly yell, “He’s going to hit a home run, I just know it!” Of course, he was usually wrong. (Phillies fans had to be tough in the 60s.) On the rare occasions when a Phillies batter actually did hit a home run, my father would talk about it for weeks. “Yep, I knew as soon as he stepped up to home plate that he would hit a home run. I can always tell just by looking at the batter.” In this instance, and countless others, we selectively remember our successful judgments and forget our unsuccessful ones. This tends to bolster confidence in the judgments we make.

A second reason for the illusion of validity is the failure to seek or consider disconfirming evidence. (See the chapter on decision making for an additional discussion of this phenomenon.) This is the primary reason why people tend to believe that variables are correlated when they are not. Suppose that you have the job of personnel officer in a large corporation. Over a period of a year, you hire 100 new employees for your corporation. How would you go about deciding if you’re making good (valid) hiring decisions? Most people would check on the performance of the 100 new employees. Suppose that you did this and found that 92% of the new employees were performing their jobs in a competent, professional manner. Would this bolster your confidence in your judgments? If you answered yes to this question, you forgot to consider disconfirming evidence. What about the people you didn’t hire? Have most of them gone on to become Vice Presidents at General Motors? If you found that 100% of the people you did not hire are superior employees at your competitor’s corporation, you would have to revise your confidence in your judgmental ability.

Part of the reason that we fail to utilize disconfirming evidence is that it is often not available. Personnel officers do not have information about the employees they did not hire. Similarly, we do not know much about the person we chose not to date, or the course we did not take, or the house we did not buy. Thus, on the basis of partial information, we may conclude that our judgments are better than they objectively are.

In a scathing review of the Rorschach test, commonly known as the inkblot test because subjects are asked to tell what they see in amorphous, symmetrical blots of ink, Dawes (1994) concluded that it is not a valid measure of mental functioning. A recent review of the Rorschach arrived at the same conclusion (Wood, Nezworski, Lilienfeld, & Garb, 2008). That is, there is no evidence that it is useful in diagnosing or treating mental disorders (although it is possible to determine if someone gives unusual answers). This means that the Rorschach has no validity. Despite these empirical results, Dawes reports that some psychotherapists respond to this fact with, “Yes, I know that it has no validity, but I find it useful.” Do you see why this is a ridiculous statement? If it has no validity, then it cannot be useful. If therapists believe that it is useful, they are fooling themselves and demonstrating the phenomenon of illusory validity. It may seem useful because they interpret the responses in ways that they believe make sense, but its only real value is as a clear demonstration of the biases that we maintain.

Why do some psychologists continue to use projective tests when many reviews of the literature show that these tests are not valid? Most psychologists genuinely want to do a good job, and many believe that the tests are valid despite the lack of data to support that conclusion. They believe these tests are valid because they “feel or look right”—the results confirm what they believe to be true. Few people will cast aside their own sense of what is true for a mass of cold and impersonal data collected by and from people they don’t know. In general, the scientific method is not valued when it conflicts with our personal belief system. Humans are ill-equipped to use data from large samples, perhaps because the only information that was available to us for most of the history of humankind came from personal experience or the second-hand personal experience of those we know. It is important that we use critical thinking skills for conclusions that we like and do not like.

Reliability

Both law and science must weigh the reliability of evidence, the trustworthiness of experts, and the probability that something is true beyond a reasonable doubt.

—K. C. Cole (1995, June 29, p. B2)

The reliability of a measure is the consistency with which it measures what it is supposed to measure. If you used a rubber ruler that could stretch or shrink to measure the top of your desk, you would probably get a different number each time you measured it. Of course, we want our measurements to be reliable.

Researchers in the social and physical sciences devote a great deal of time to the issue of reliable measurement. We say that an intelligence test, for example, is reliable when the same person obtains scores that are in the same general range whenever she takes the test. Few of us even consider reliability when we function as intuitive scientists. When we decide if a professor or student is prejudiced, we often rely on one or two samples of behavior without considering if the individual is being assessed reliably.
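One common way researchers quantify test-retest reliability is to correlate two administrations of the same test. The following is a minimal sketch with made-up scores (the numbers are hypothetical, not from the text), computing the Pearson correlation between a first and second testing:

```python
from math import sqrt

# Hypothetical scores for five people who took the same intelligence
# test twice. A reliable test yields a high test-retest correlation.
t1 = [100, 110, 95, 120, 105]
t2 = [102, 108, 97, 118, 106]

m1, m2 = sum(t1) / len(t1), sum(t2) / len(t2)
cov = sum((a - m1) * (b - m2) for a, b in zip(t1, t2))
r = cov / sqrt(sum((a - m1) ** 2 for a in t1) *
               sum((b - m2) ** 2 for b in t2))
print(f"test-retest correlation r = {r:.2f}")
```

An r near 1.0 means that people keep roughly the same relative standing from one administration to the next; measurements taken with the rubber ruler described above would produce a much lower correlation.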

Thinking about Errors

To a scientist a theory is something to be tested. He seeks not to defend his beliefs, but to improve them. He is, above everything else, an expert at changing his mind.

—Wendell Johnson (1946, p. 39)

When we try to understand relationships by devising and testing hypotheses, we will sometimes be wrong. This idea is expanded on more fully in the next chapter, which concerns understanding probabilities. For now, consider this possibility: Suppose that you drive into work every day with a friend. Every morning you stop at a drive-through window and buy coffee. You decide that, instead of hassling every morning with who will pay (“I’ll get it—No, no let me”), you will flip a coin. When the outcome is heads, he will pay; when the outcome is tails, you will pay. Sounds fair enough, but on nine of the last ten days, the coin landed with tails up. Do you think that your friend is cheating?

The truth is that your friend is either cheating or he is not cheating. Unfortunately, you don’t know which is true. Nevertheless, you need to make a decision. You will decide either that he is cheating or he is not cheating. Thus, there are four possibilities: (1) He is cheating, and you correctly decide that he is cheating; (2) He is not cheating, and you correctly decide that he is not cheating; (3) He is cheating, and you incorrectly decide that he is not cheating; and (4) He is not cheating, and you incorrectly decide that he is cheating. With these four possibilities, there are two ways that you can be right and two ways that you can be wrong. These four combinations are shown in Table 6.1.

As you can see from Table 6.1, there are two different ways that we can make errors in any hypothesis-testing situation. These two different errors are not equally “bad.” It is far worse to decide that your friend is cheating when he is not (especially if you accuse him of cheating) than it is to decide that he is not cheating when he is. Thus, you would want stronger evidence to decide that he is cheating than you would want to decide that he is not cheating. In other words, you need to consider the relative “badness” of different errors when testing hypotheses.

If you take a course in statistics or experimental design, you’ll find that the idea of error “badness” is handled by requiring different levels of confidence for different decisions. The need to consider different types of errors is found in many contexts. A basic principle of our legal system is that we have to be very certain that someone has committed a crime (beyond a reasonable doubt) before we can convict her. By contrast, we do not have to be convinced beyond a reasonable doubt that she is innocent because wrongly deciding that someone is innocent is considered a less severe error than wrongly deciding that someone is guilty. Similarly, when you are testing hypotheses informally, you also need to be aware of the severity of different types of errors. Before you decide, for example, that no matter how hard you study, you’ll never pass some course or that the medicine you are taking is or is not making you better, you need to consider the consequences of right and wrong decisions. Some decisions require that you should be more certain about being correct than others.
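The suspicion raised by nine tails in ten flips can be quantified. As a sketch of the calculation a formal test would rest on (the chapter itself does not carry it out), here is the probability of a result at least this extreme from a fair coin:

```python
from math import comb

# Each specific sequence of 10 fair flips has probability (1/2)**10.
# Count the sequences with exactly 9 and exactly 10 tails.
p = sum(comb(10, k) for k in (9, 10)) / 2**10
print(f"P(at least 9 tails in 10 fair flips) = {p:.4f}")
```

A fair coin produces nine or more tails in ten flips only about 1% of the time, so the evidence against your friend is fairly strong; whether it is strong *enough* to accuse him depends, as argued above, on how serious a false accusation would be.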

Table 6.1: Four possible outcomes for the “Who Buys the Coffee” Example

                             You decide:
The truth is:                He is cheating         He is not cheating

He is cheating.              Correct Decision!      An Error!

He is not cheating.          A Serious Error!       Correct Decision!

Note. The error associated with deciding that he is cheating is more serious than the error associated with deciding that he is not cheating. Because of the difference in the severity of the errors, you will want to be more certain when deciding that he is cheating than when deciding that he is not cheating.

Experience is an Expensive Teacher

Perhaps one of our greatest strengths is our ability to explain. In one study, participants were confronted with the following puzzle: “If a pilot falls from a plane without a parachute, then the pilot dies. This pilot didn’t die. Why not?” A perfectly valid deduction would be that the pilot must not have fallen out of the plane without his parachute, but many participants relied instead on inductive reasoning to answer the question. They came up with creative explanations, like “The plane was on the ground and he didn’t fall far,” “The pilot fell into deep snow or a deep cushion,” and, Johnson-Laird’s favorite, “The pilot was already dead.” Humans are extraordinarily good at this kind of reasoning—we can explain just about anything.

—Philip Johnson-Laird (2011, para.7)

Suppose that your friend shares her “secret” for curing a cold—she rubs her stomach with garlic and the symptoms of the cold go away. You are dubious, but she persists, “I know it works. I’ve tried it, and I have seen it work with my own eyes.” I am certain that there are many people who would respond to this testimonial by rubbing garlic over their stomach, just as there are many people who willingly swallow capsules filled with ground rhinoceros penis to increase their sexual potency, megavitamins to feel less tired, and ginseng root for whatever ails them. You may even join the ranks of those who tout these solutions because sometimes your cold will get better after rubbing yourself with garlic—sometimes a desired effect follows some action (like taking capsules of ground rhinoceros penis). But, did the action cause the effect that followed it? This question can only be answered using the principles of hypothesis testing. Personal experience cannot provide the answer.

Dawes (1994) corrected the famous expression that we attribute to Benjamin Franklin. It seems that Franklin did not say that “experience is the best teacher”; instead, he said “experience is a dear teacher,” with the word “dear” meaning expensive or costly. Sometimes, we are able to use systematic feedback about what works and what does not work so that we can use our experience to improve at some task, but it is also possible to do the same thing over and over without learning from experience. We are far better off using information that is generated by many people to determine causal relationships than to rely on personal experience with all of its biases and costs.

Anecdotes

The recent medical controversy over whether vaccinations cause autism reveals a habit of human cognition—thinking anecdotally comes naturally, whereas thinking scientifically does not.

—Michael Shermer (July 25, 2008, para. 1)

I have a dear friend who is 80 years old and in great health. Guess what—he smoked a pack of cigarettes just about every day for the last 60 years! Yes, this is a true story. Can I conclude that cigarette smoking is (a) good for your health, (b) good for the health of some people, or (c) none of the above? The answer is (drum roll please) none of the above. The problem is that the image of my friend is so vivid and personal to me that it is tempting to let it override the massive research literature and conclusions of just about every credible medical society that smoking is a major cause of cancer, heart disease, and premature death. An isolated anecdote can be found to support almost any point of view, and even though you may already know this, it is very hard to discredit anecdotal evidence (Cox & Cox, 2001).

Maybe using smoking as an example of anecdotal thinking is unfair because of the widespread antismoking campaigns. So, how about wheatgrass? (You can fill in your own examples here.) You probably know someone who swears by the positive health benefits of wheatgrass. It is easy to find testimonials from people who claim that they were sick and then got better after drinking a juice made from this type of grass. The National Council Against Health Fraud (www.ncahf.org) counters claims that it can “detox” your body and takes the clear position that there is no evidence that it is beneficial. (You can check them out—they are a credible medical authority.)

Several years ago, it was my great honor to present testimony to the United States House of Representatives Committee on Science about using principles from the science of learning as part of educational reform. On the way into the hearing room, a helpful legislative aide coached me about how to provide effective testimony to the Committee on Science. More than once she warned me not to present too many numbers because the Committee Members tended to get bored and confused by data. She advised that a good story works best. I could not believe that I should tell anecdotes in an attempt to persuade the United States House of Representatives Committee on Science that educational methods supported by research findings were more likely to provide beneficial outcomes than those that are not. Wouldn’t such an approach be insulting to the highest elected officials in the United States who are the national guardians of science? I now know the answer to what I had intended to be a rhetorical question. Unfortunately, public policies are often made by anecdotes. We like stories—they make abstract concepts come alive and provide flesh and bones to colorless data. Astute readers will realize that I began this paragraph with an anecdote. A single vivid example can often outweigh a huge body of data collected from a random sample of a population, even though anecdotes are self-selected, based on a sample size of 1, subject to all of the biases of memory, and likely to be atypical because they would not be told if they represented an expected outcome. It can take some practice, but overcoming the tendency to prefer anecdotes to the conclusions from carefully controlled research is an important critical thinking skill.

The need to develop the habit of thinking like a scientist can be seen in this eloquent quote from Bloom (2012):

Consider science. Plainly, scientists are human and possess the standard slate of biases and prejudices and mindbugs. This is what skeptics emphasize when they say that science is “just another means of knowing” or “just like religion.” But science also includes procedures—such as replicable experiments and open debate—that cultivate the capacity for human reason. Scientists can reject common wisdom, they can be persuaded by data and argument to change their minds. It is through these procedures that we have discovered extraordinary facts about the world.

Self-Fulfilling Prophecies

Science is not simply a collection of facts; it is a discipline of thinking about rational solutions to problems after establishing the basic facts derived from observations. It is hypothesizing from what is known to what might be, and then attempting to test the hypotheses.

—Rosalyn S. Yalow (quoted in Smith, 1998, p. 42)

In a classic set of experiments, Robert Rosenthal, a well-known psychologist, and his colleagues (Rosenthal & Fode, 1963) had their students train rats to run through mazes as part of a standard course in experimental psychology. Half of the students were told that they had rats that had been specially bred to be smart at learning their way through mazes, while the other half of the students were told that they had rats that had been specially bred to be dumb at this task. As you probably expected, the students with the bright rats had them out-performing the dull rats in a short period of time. These results are especially interesting because there were no real differences between the two groups of rats. Rosenthal and Fode lied about the rats being specially bred. All of the rats were the usual laboratory variety. They had been assigned at random to either group. If there were no real differences between the groups of rats, how do we explain the fact that students who believed they had been given bright rats had them learn the maze faster than the other group?

The term “self-fulfilling prophecies” has been coined as a label for the tendency to act in ways that will lead us to find what we expected to find. I do not know what the students did to make the rats learn faster in the “bright” group or slower in the “dull” group. Perhaps the bright group was given extra handling or more food in the goal box. (When rats learn to run through mazes, they are rewarded with food when they reach the goal box.) Maybe the students given the “dull” rats dropped them harshly into the maze or were not as accurate in the records that they kept. Whatever they did, they somehow influenced their experimental results so that the results were in accord with their expectations.

If self-fulfilling prophecies can influence how rats run through mazes, what sort of an effect will it have on everyday thinking and behavior? Earlier in this chapter, illusory correlations were discussed as the tendency to believe that events that you are observing are really correlated because you believe that they should be. Psychologists are becoming increasingly aware of the ways that personal convictions direct our selection and interpretation of facts. When you function as an intuitive scientist, it is important to keep in mind the ways we influence the results we obtain.

One way to eliminate the effects of self-fulfilling prophecies is with double blind procedures. Let’s consider a medical example. There are probably 100 home remedies for the common cold. How should we decide which, if any, actually relieve cold symptoms? Probably, somewhere, sometime, someone gave you chicken soup when you had a cold. Undoubtedly, you got better. Almost everyone who gets a cold gets better. The question is, “Did the chicken soup make you better?” This is a difficult question to answer because if you believe that chicken soup makes you better, you may rate the severity of your symptoms as less severe even when there was no real change. This is just another example of self-fulfilling prophecies. The only way to test this hypothesis is to give some people chicken soup and others something that looks and tastes like chicken soup and then have each group rate the severity of their cold symptoms. In this example, all of the subjects are unaware or blind to the nature of the treatment they are receiving. It is important that the experimenters also be unaware of which subjects received the “real” chicken soup so that they do not inadvertently give subtle clues to the subjects. Experiments in which neither the subjects nor the experimenters know who is receiving the treatment are called double blind experiments.
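The logistics of a double blind study can be sketched in a few lines. The participant IDs and group sizes below are hypothetical; the point is that a coded assignment key, held by a third party until the data are collected, keeps both subjects and experimenters blind:

```python
import random

random.seed(42)  # for a reproducible sketch

# Twenty hypothetical participants, half assigned at random to real
# chicken soup and half to an indistinguishable placebo soup.
participants = [f"P{i:02d}" for i in range(1, 21)]
conditions = ["real soup"] * 10 + ["placebo soup"] * 10
random.shuffle(conditions)

# The key linking IDs to conditions stays with a third party.
key = dict(zip(participants, conditions))

# Subjects and experimenters see only the coded IDs, so neither
# can tell who received the real treatment.
blinded_view = list(key.keys())
print(blinded_view[:3])
```

Only after all symptom ratings are recorded is the key opened to compare the two groups, so no one’s expectations can leak into the measurements.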

Although the chicken soup example may seem a little far-fetched, the need for double blind procedures is critical in deciding whether any drug or treatment is working. Formal laboratory research on drugs that may be effective against AIDS or cancer always uses double blind procedures. Most people, however, do not apply these same standards when making personal decisions, such as which type of psychotherapy is effective or whether massive doses of a vitamin or advice from a palm reader will improve some aspect of their life. Before you decide to see a therapist who claims to be able to improve your diabetes by manipulating your spine or to engage in screaming therapy to improve your self-confidence, look carefully for double blind studies that support the use of the proposed therapy.

Occult Beliefs and the Paranormal

“Media distortions, social uncertainty, and deficiencies of human reasoning seem to be at the basis of occult beliefs.”

—Barry Singer and Victor Benassi (1981, p. 49)

Do you believe that houses can be haunted, or that the position of the stars and planets can affect people’s lives, or that extraterrestrial beings invaded the earth, or perhaps that witches are real and not just a Halloween fantasy (Lyons, 2005)? If you answered “yes” to any of these, you are not alone. Although a recent Gallup Poll showed that the proportion of the general population in the United States, Canada, and Great Britain who believe in these paranormal phenomena is well below a majority, there is a sizeable number of believers for each of these paranormal (outside of normal) phenomena. One possible explanation for these beliefs is the increase in the number of television shows about these topics, some of which have an almost “news-like” appearance to them, that carry the message that beings from outer space are living among us or that other paranormal phenomena (e.g., ghosts) are real. Often these shows are narrated by people who appear to be honest and sincere. Few people remember the difference between actors and scientists when the actors provide a good imitation of something that looks like science.

How can we understand these beliefs when there is no good evidence that they have any basis in fact (Shermer, 1997; Stanovich & West, 2008)? In our attempt to make sense out of events in the world, we all seek to impose a meaningful explanation, especially for unusual events. Have you thought about a friend whom you have not seen in many years and then received a phone call from him? Did you ever change your usual route home from school or work and then learn that there was a tragic accident that you probably would have been in if you had not changed your route? What about stories of people who recover from a deadly disease after they use imagery as a means of healing? We are all fascinated by these unusual events and try to understand them. Support for paranormal experiences comes from anecdotes. Can you understand how small sample sizes (usually a single example), retrospective review (in hindsight we seek explanations that are available in memory), illusory correlations, self-fulfilling prophecies, difficulty in understanding probabilities, and other cognitive biases contribute to the popularity of paranormal beliefs? It is a fact that there is no positive evidence whatsoever for the existence of any psychic abilities. Remember—no one has collected the million-dollar prize offered by the James Randi Foundation or other similar foundations for credible evidence of psychic or other paranormal phenomena. There are many anecdotes, but there has never been a statistically significant finding of psychic power that has been duplicated in another independent laboratory. “Anecdotes do not make a science” (Shermer, Benjamin, & Randi, 2011). They are, however, powerful in directing what we believe to be true.

There are many real mysteries in the world and much that we do not understand. It is possible that someone has found a strange herbal cure for cancer, or that the lines on the palms of our hand or pattern of tea leaves in our tea cups are indicators of important life events, but if these are “real” phenomena, then they will hold up under the bright lights of double-blind, controlled laboratory testing. We can all laugh at the predictions made by various “psychics,” such as the prediction that Fidel Castro would move to Beverly Hills after his government was overthrown and the many predictions about Princess Diana that included almost everything imaginable except her tragic death (Emery, 2001). We need to become much more skeptical when a friend tells us that crystals have healing powers or vitamin E can be used to revive those who have recently died. This topic is also addressed in the next chapter where I discuss how to reason with probabilities.

Conspiracy Theories

A conspiracy theory is an explanation for something that is based on the idea that there is a secret group that was responsible for that event. Common examples of conspiracy theories are that President Kennedy’s assassination was the result of a covert military operation and that the rapid spread of AIDS was caused by a secret plot of a powerful (and homophobic) group. According to Barkun (2006), conspiracy theories share four characteristics: they (a) challenge the predominant theories, (b) rely on secret knowledge and flimsy evidence, (c) do not entertain doubt or permit rebuttals, and (d) divide the world into black or white categories of good and evil. In the United States, the assassination of President Kennedy was momentous. It is hard to think that something so important was caused by just one or a few people. Similarly, the spread of AIDS has decimated many groups of people. As humans, we search for reasons to understand events, and momentous events seem to require weighty reasons.

Although not quite the same, it is useful to think about the vast array of so-called “cures” for horrible diseases that are touted on the Internet and in other outlets as similar to conspiracy theories. In these cases, the “conspiracy” is some group of physicians or some government agency who is keeping knowledge of the cure from the people who need it. Imagine finding the cure for such heartbreaking disorders as autism and such diseases as cancer and Alzheimer’s. Many people believe that they have. The proponents of these cures claim to have evidence that the medical establishment (note the negative language that suggests an impersonal corporation) will not pay attention to, and of course, the ones claiming to have cures are the “good guys” and those who disagree with them are the “bad guys.” They often argue that the group that is suppressing news of the cure is doing so for financial gain—arguing that doctors who specialize in cancer, for example, would go broke if the rest of us knew about the secret cure. A quick search on Google will reveal many such claims (maybe as many as 100 “cures” for autism), and it is not surprising that desperate families fall for these phony cures. Infomercials, which are commercials that look a lot like regular programming, show miraculous recoveries. If there were a cure for any of these diseases, you would read about it on the front page of every reputable newspaper and medical journal, and not learn about it from someone who is selling the cure on late-night television or a web site that no reputable organization (e.g., reputable organizations have credentials similar to those found in the National Autism Association and National Cancer Association) endorses.

A colleague (Larry Alferink at Illinois State University) suggested several flags for skepticism when reading about these miracle cures, including a high degree of self-promotion (buy my vitamins, DVDs, books, etc.), vague references to “peer-reviewed journals” that turn out not to have the rigorous review process of highly regarded medical journals, and references to published research that cannot be verified. As one medical commentator noted (Burton, 2009, para. 6): “To support his fringe opinion, Dr. [name deleted] has used what I refer to as a cut-and-paste technique; he takes isolated observations out of context to generate a theory not proven or justified by the findings.”

Thinking as an Intuitive Scientist

Most people seem to believe that there is a difference between scientific thinking and everyday thinking. … But, the fact is, these same scientific thinking skills can be used to improve the chances of success in virtually any endeavor.

—George (Pinky) Nelson (1998, April 29, p. A14)

One theme throughout this chapter is that everyday thinking has much in common with the research methods used by scientists when they investigate phenomena in their academic domains. Many of the pitfalls and problems that plague scientific investigations are also common in everyday thought. If you understand and avoid some of these problems, you will be a better consumer of research and a better intuitive scientist.

When you are evaluating the research claims of others or when you are asserting your own claims, there are several questions to keep in mind:

1.  What was the nature of the sample? Was it large enough? Was it biased?

2.  Are the variables operationally defined? What do the terms mean?

3.  Were the measurements sensitive, valid, and reliable? Were the appropriate comparisons made to support the claims?

4.  Were extraneous variables controlled? What are other plausible explanations for the results?

5.  Do the conclusions follow from the observations?

6.  Are correlations being used to support causative arguments?

7.  Is disconfirming evidence being considered?

8.  How could the experimenter’s expectancies be biasing the result?

Let’s apply these guidelines to the choice of treatment programs that was presented at the opening of this chapter. The scenario presented you, as a heroin addict, with two choices of treatment programs. The first program is run by former heroin addicts and the second is run by therapists. First, what is the evidence for success rates? Although Program #1 cites a much higher success rate than Program #2, these numbers cannot be used to compare the two programs because Program #1 gives the success for those who stayed with the program at least one year, and we have no information about how many dropped out before achieving the one-year mark. Thus, the success rate for Program #1 is not a valid measure of success. We also have no information about how likely someone is to maintain recovery without treatment. In other words, there is no control group against which to measure the efficacy of treatment. Unfortunately, there is no information about the sample size because we are not told how many patients entered each program. If this were a real decision, then you would ask for this information. So far, there is little to go on.
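To see why Program #1’s 80% cannot be compared with Program #2’s 30%, consider some hypothetical numbers (the scenario itself gives none):

```python
# Hypothetical: suppose 100 addicts enter Program #1, but only 20 of
# them are still enrolled after one year (the rest dropped out).
entrants = 100
still_enrolled_after_one_year = 20
success_rate_among_stayers = 0.80   # the rate Program #1 advertises

# Successes counted the way Program #2 counts them: against everyone
# who entered, not just those who stayed.
successes = still_enrolled_after_one_year * success_rate_among_stayers
rate_for_all_entrants = successes / entrants
print(rate_for_all_entrants)   # 0.16 -- below Program #2's 0.30
```

Under these assumed dropout numbers, the “80% program” actually helps only 16% of the people who enter it, which is worse than Program #2’s 30%. The real dropout figures could make the comparison come out either way, which is exactly why you would need to ask for them.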

I have found that most people like the idea that the therapist is a recovered addict who “has been there himself.” The problem with this sort of qualification is that his anecdotes about “what worked for him” may be totally worthless.

Dawes (1994) is highly critical of the sort of reasoning that leads people to believe that a former addict is a good choice for a counselor. As Dawes noted, the thinking that goes into this evaluation is something like this:

The therapist was an addict.

He did X and recovered.

If I do X, then I will also recover.

I hope that you can see that this is very weak evidence. If you have already read the chapter on reasoning, then you will recognize this as a categorical syllogism, and an invalid one. You have a single individual (a sample size of one), all the biases of memory, no independent verification that X is useful, the problem of illusory correlation, and more. Of course, this individual could be an excellent therapist, but with the information you are given, there is no reason to expect it. On the other hand, the therapist who has studied the psychology and biology of addiction should know about different treatment options, theories of addiction, and, most importantly, the success rates for a variety of treatments based on results from large samples of addicts. This is an important point. Try posing the question that I used at the opening of this chapter to friends and relatives. You will probably find a bias toward selecting the recovered addict as a therapist.

If you scrutinize your own conclusions and those of others with the principles of hypothesis testing in mind, you should be able to defend yourself against invalid claims and improve your own ability to draw sound conclusions from observations.

Chapter Summary

•  Much of our everyday thinking is like the scientific method of hypothesis testing. We formulate beliefs about the world and collect observations to decide if our beliefs are correct.

•  In the inductive method, we devise hypotheses from our observations. In the deductive method, we collect observations that confirm or disconfirm our hypotheses. Most thinking involves an interplay of these two processes so that we devise hypotheses from experience, make observations, and then, on the basis of our observations, redefine our hypotheses.

•  Operational definitions are precise statements that allow the identification and measurement of variables.

•  Independent variables are used to predict or explain dependent variables. When we formulate hypotheses, we want to know about the effect of the independent variable on the dependent variable(s).

•  When we draw conclusions from observations, it is important to utilize an adequately large sample size because people are variable in the way they respond. Most people are too willing to generalize results obtained from small samples.

•  In order to generalize about a population, the sample needs to be representative of the population. You need to ask if the sample you are using is biased in any way before making generalizations.

•  In determining if one variable (e.g., smoking) causes another variable (e.g., lung cancer) to occur, it is important to be able to isolate and control the causal variables. Strong causal claims require the three-stage experimental design that was described in this chapter.

•  In everyday contexts, we often use retrospective techniques to understand what caused an event to occur. This is not a good technique because our memories tend to be selective and malleable and because we have no objective, systematic observations of the cause. Prospective techniques, which record events when they occur and then see if the hypothesized result follows, are better methods for determining cause–effect relationships.

•  Variables that are related so that changes in one variable are associated with changes in the other variable are called correlated variables. Correlations can be positive, as in the relationship between height and weight (taller people tend to weigh more; shorter people tend to weigh less), or negative, as in the relationship between exercise and weight (people who exercise a great deal tend to be thin, and those who exercise little tend to be heavy).

•  A common error is to infer a causative relationship from correlated variables. It is possible that variable A caused variable B, or that variable B caused variable A, or that A and B influenced each other, or that a third variable caused them both.

•  The belief that two variables are correlated when they are not (illusory correlation) is another type of error that is common in human judgment.

•  It is important that you use measurements that are sensitive, valid, and reliable or the conclusions you draw may be incorrect. Few people consider the importance of measurement issues when they draw everyday conclusions about the nature of the world.

•  Although many of our judgments lack validity, people report great confidence in them. This is called illusory validity.

•  Inadvertently, we may act in ways that will lead us to confirm or disconfirm hypotheses according to our expectations. These are called self-fulfilling prophecies.

The following skills for determining whether a conclusion is valid were presented in this chapter. Review each skill and be sure that you understand how and when to use it.

The skills involved when thinking as an intuitive scientist include:

•  recognizing the need for and using operational definitions

•  explaining the need to isolate and control variables in order to make strong causal claims

•  checking for adequate sample size and unbiased sampling when a generalization is made

•  describing the relationship between any two variables as positive, negative, or unrelated

•  recognizing the limits of correlational reasoning

•  seeking convergent validity to increase your confidence in a decision

•  checking for and understanding the need for control groups

•  being aware of the bias in most estimates of variability

•  considering the relative “badness” of different sorts of errors

•  determining how self-fulfilling prophecies could be responsible for experimental results or everyday observations

•  knowing when causal claims can and cannot be made.

Terms to Know

Check your understanding of the concepts presented in this chapter by reviewing their definitions. If you find that you’re having difficulty with any term, be sure to reread the section in which it is discussed.

Hypothesis. A set of beliefs about the nature of the world, usually concerning the relationship between two or more variables.

Hypothesis Testing. The scientific method of collecting observations to confirm or disconfirm beliefs about the relationships between variables.

Inductive Method. A method of formulating hypotheses in which you observe events and then devise a hypothesis about the events you observed.

Deductive Method. A method of testing hypotheses in which you formulate a hypothesis that you believe to be true and then infer consequences from it. Systematic observations are then made to verify if your hypothesis is correct.

Operational Definition. An explicit set of procedures that tell the reader how to recognize and measure the concept in which you are interested.

Variable. A quantifiable characteristic that can take on more than one value (e.g., height, gender, age, race).

Independent Variable. The variable that is selected (or manipulated) by the experimenter who is testing a hypothesis to see if changes in the independent variable will result in changes in the dependent variable. For example, if you want to know if people are more readily persuaded by threats or rational appeals, you could present either a threatening message or a rational appeal to two groups of people (the message type is the independent variable) and then determine how much their attitudes toward the topic have changed (the dependent variable).

Dependent Variable. The variable that is measured in an experiment to determine if its value depends on the independent variable. Compare with independent variable.

Population. For statistical and hypothesis testing purposes, a population is the entire group of people (or animals or entities) in which one is interested and to which one wishes to generalize.

Sample. A subset of a population that is studied in order to make inferences about the population.

Representative Sample. A sample that is similar to the population in important characteristics, such as the proportion of males and females, socioeconomic status, and age.

Generalization. Using the results obtained in a sample to infer that similar results would have been obtained from the population if everyone in the population had been measured. (When used in the context of problem solving, it is a strategy in which the problem is considered as an example of a larger class of problems.)

Biased Sample. A sample that is not representative of the population from which it was drawn.

Confounding. When experimental groups differ in more than one way, it is not possible to separate the effects due to each variable. For example, if you found that teenage girls scored higher on a test of verbal ability than preteen boys, you wouldn’t know if the results were due to sex differences or age differences between the two groups.

Convenience Sample. A sample made up of people who are readily available to participate in an experiment. Such samples may be biased because they may not be representative of the population from which they were drawn.

Sample Size. The number of people selected for a study.

Subject. A person, animal, or entity who serves as a participant in an experiment.

Variability. Term to denote the fact that people (and animals) differ in the way they respond to experimental stimuli.

Random Sample. A sample in which everyone in a population has an equal chance of being selected.

Law of Small Numbers. The willingness to believe that results obtained from a few subjects can be generalized to the entire population.
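Why small samples are untrustworthy can be demonstrated with a short simulation (a sketch of my own, not from the chapter): we draw repeated random samples from a population in which exactly half the members have some trait, and compare how far the sample estimates stray from the true 50% at different sample sizes.

```python
# Sketch: small samples give highly variable estimates of a population
# proportion, which is why the "law of small numbers" is a fallacy.
import random

random.seed(1)  # fixed seed so the demonstration is reproducible

def sample_proportion(n):
    """Proportion of 'successes' in a random sample of size n,
    drawn from a population where the true proportion is 0.5."""
    return sum(random.random() < 0.5 for _ in range(n)) / n

for n in (5, 50, 500):
    estimates = [sample_proportion(n) for _ in range(10)]
    spread = max(estimates) - min(estimates)
    print(f"n = {n:>3}: ten estimates range over a spread of {spread:.2f}")
```

The estimates from large samples cluster tightly around .50, while the small-sample estimates scatter widely, so a generalization based on a handful of cases can easily be far from the truth.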

Retrospective Research. A method of conducting research in which, after an event has occurred, the experimenter looks backward in time to determine its cause.

Prospective Research. A method of conducting research in which possible causative factors of an event are identified before the event occurs. Experimenters then determine if the hypothesized event occurs.

Correlated Variables. Two or more variables that are related. See positive correlation and negative correlation.

Positive Correlation. Two or more variables that are related so that increases in one variable occur concomitantly with increases in the other variable, and decreases in one variable occur with decreases in the other.

Negative Correlation. Two or more variables that are related such that increases in one variable are associated with decreases in the other variable.
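The two kinds of correlation can be illustrated with a small computation. The sketch below calculates the Pearson correlation coefficient from its definition; the height, weight, and exercise data points are invented for illustration only:

```python
# Pearson correlation coefficient, computed from its definition,
# applied to invented data showing a positive and a negative correlation.
from math import sqrt

def pearson_r(xs, ys):
    """Correlation between two equal-length lists: +1 is a perfect
    positive relationship, -1 a perfect negative one, 0 no relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

height = [60, 64, 68, 72, 76]        # inches (invented data)
weight = [120, 140, 155, 175, 190]   # pounds (invented data)
exercise = [0, 2, 4, 6, 8]           # hours per week (invented data)
weight2 = [200, 180, 165, 150, 140]  # pounds (invented data)

print(pearson_r(height, weight))     # close to +1: a positive correlation
print(pearson_r(exercise, weight2))  # close to -1: a negative correlation
```

Taller people in this toy data set weigh more (r near +1), while people who exercise more weigh less (r near -1), matching the height–weight and exercise–weight examples in the chapter summary.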

Illusory Correlation. The belief that two variables are correlated, when in fact they are uncorrelated.

Validity. The extent to which a measure (e.g., a test) is measuring what you want it to.

Convergent Validity. The use of several different measures or techniques that all suggest the same conclusion.

Illusory Validity. The belief that a measure is valid (measures what you want it to) when, in fact, it is not. This belief causes people to be overconfident in their judgments.

Reliability. The consistency of a measure (e.g., a test) on repeated occasions.

Double-Blind Procedures. An experimental paradigm in which neither the subjects nor the person collecting data know the treatment group to which the subject has been assigned.

Self-Fulfilling Prophecy. The tendency to act in ways that influence experimental results so that we obtain results that are consistent with our expectations.

Conspiracy Theory. An explanation for an event based on the belief that a secret group is responsible for it.