12. Don’t Ask, Can’t Tell

How many questionnaire and survey results about people’s beliefs, values, or behavior will you read during your lifetime in newspapers, magazines, and business reports? Thousands, surely. You may even create some of these surveys yourself in order to get information that is important for your business, school, or charitable organization.

Most of us tend to read survey results rather uncritically. “Hmm, dear, I see in the Times that 56 percent of Americans favor tax increases for creating more national parks.” Ditto for questions we create ourselves and the answers our respondents give us.

So far, all the methods I’ve discussed are applicable to pretty much everything—animal, vegetable, or mineral. We can do A/B testing on rats, learn from natural experiments about factors influencing corn yields, and do multiple regression studies of factors associated with water purity. Now I’d like to look at methodological difficulties in measuring specifically human variables. Unlike rats, corn, or water, people can tell you in verbal form (oral or written) about their attitudes, emotions, needs, goals, and behavior. And they can tell you what the causal influences on these variables are. In this chapter you’ll see just how misleading such reports can be, which won’t be surprising given what you read in Part I about our limited accessibility to the factors that influence our behavior. The chapter will show you how a variety of behavioral measures can provide much more trustworthy answers to questions about people’s attributes and states than their verbal reports can.

You’ll also get some tips about experiments you can do on yourself to learn what kinds of things influence your attitudes, behaviors, and physical and emotional health. Correlational evidence about yourself can be just as misleading as correlational evidence about anything else. Experiments on yourself can produce evidence that is accurate and compelling.

Constructing Attitudes on the Fly

The following examples may make you pause before trusting self-reported verbal answers, and may help you consider how best to get useful information about people’s attitudes and beliefs. The examples may also increase your doubts about people’s explanations of the causal influences on their judgments and behavior.

Q.  Suppose I ask you about three positive events in your life and then ask you about your life satisfaction; or I ask you about three negative events and then about your life satisfaction. In which case do you report greater life satisfaction?

A.  Whatever you guessed about the effect of asking about positive versus negative events, I’m sorry to tell you your answer is wrong. It all depends on whether those events I asked you about were in the recent past or happened five or so years ago. Your life seems worse if you’ve just contemplated some lousy things that have happened lately than if you’ve contemplated some good things that have happened lately.1 No surprise there. But the reverse is true if you contemplate events from five years ago. Your life seems good compared to the bad things that happened in your past. And your life seems not so great compared to the wonderful things that used to happen. (This helps to explain the otherwise puzzling fact that for members of the Greatest Generation, life satisfaction is greater the worse their experiences during the Depression had been.)2

Q.  Your cousin from Omaha calls you and asks you how things are going. Is your answer influenced by whether it’s sunny and warm versus cloudy and cold where you are?

A.  Turns out that it depends. If the weather is nice, you’re more likely to say things are going well than if the weather is lousy. Well, of course. But … if your cousin inquires first about the weather in your city today and then asks you how things are going, there is no effect of the weather on your report of how things are going.3 Why? Psychologists say that when prompted to think about the weather, we discount some of our mood as being weather-related and add or subtract happiness points accordingly. In effect: “Life seems to be going pretty well, but probably part of the reason I feel that way is that it’s seventy degrees and sunny out, so I guess things are just so-so.”

Q.  What do you suppose is the correlation between satisfaction with one’s marriage and satisfaction with one’s life as a whole?

A.  This seems like a fairly easy thing to examine. We can ask people about their satisfaction with their lives and then ask them about their satisfaction with their marriages. The higher the correlation between the two, the greater we might assume the impact of marriage satisfaction on life satisfaction to be. That correlation has been examined.4 The correlation is .32, indicating a modestly important effect of marriage satisfaction on life satisfaction as a whole. But suppose we reverse the question order and ask people how satisfied they are with their marriages before we ask how satisfied they are with their lives. Now the correlation is .67, indicating a major effect of marriage quality on life quality. So whether Joe tells us that life is good or just all right depends—and depends heavily—on whether you just asked him how good his marriage is. This phenomenon, like many others discussed in this chapter, shows the effect of verbal priming of the type discussed in Chapter 1 on people’s reports about their attitudes. Other phenomena show the influence of context of the kind discussed in Chapter 2 on reports about attitudes.

The likely reason question order is so important is that asking first about marriage makes it highly salient, so it heavily influences the respondent’s feelings about life overall. If you don’t ask first about marriage, the respondent considers a broader range of things, and that wider set of influences figures into the assessment of life satisfaction. So just how important, really, is marriage quality for life quality? There can be no answer to that question. At any rate, not by asking questions of this kind. If the apparent importance of marriage quality for life quality is so malleable, then we’ve learned little about the reality.

But the truth is, the answer to just about every question concerning attitudes and behavior can be pushed around—often by things that seem utterly fortuitous or silly.

Suppose I ask you how favorable you are toward politicians. But before I do, I point out that the average rating of politicians given by other people is 5 on a scale of 1–6, with higher numbers being more favorable. Or I point out that the average rating of politicians is 2 on that scale. You will rate politicians higher in the first case than in the second. Some of that is due to sheer conformity. You don’t want to seem an oddball. But more interesting, announcing others’ ratings tacitly changes not just your judgment of politicians but your assumptions about the kind of politicians I’m asking about.5 If I tell you most people have a high opinion of politicians, I’ve implied that by “politicians” I mean statesmen on the order of Churchill or Roosevelt. If I tell you that most people have a low opinion of politicians, I have tacitly implied that by “politicians” I mean hacks and chiselers. I’ve literally changed what it is you’re making your judgment about.

What percent of Americans are in favor of the death penalty? In the abstract, a majority. For any given case, a minority. The more details we present about the crime, the criminal, and the circumstances, the less inclined respondents are to be willing to execute the perpetrator.6 Remarkably, that’s true even for the most heinous of crimes, such as a criminal who rapes women and then kills them. The more details you give about the perpetrator’s character and life history, the more reluctant people are to favor the death penalty. This is true even when that information is overwhelmingly negative.

What percent of Americans support abortion? Here I close the blinds and ask, sotto voce, “What do you want it to be?” According to a 2009 Gallup poll, 42 percent of Americans say they are “pro-choice” as opposed to “pro-life.”7 So 42 percent of Americans support abortion. But according to another Gallup poll the same year, 23 percent believe that abortion should be legal in all circumstances and 53 percent believe that abortion should be legal under certain circumstances.8 So 76 percent of Americans support abortion. I have no doubt that we could get that percentage higher still if we asked whether the respondent favors abortion in the case of rape, in the case of incest, or in order to save the life of the mother. If the respondent replies yes to any of those questions, we can record the respondent as favoring abortion. So whether less than half the population supports abortion or a heavy majority supports it is entirely a matter of question wording.

A host of studies by psychologists show that people don’t carry all their attitudes around with them in a mental file drawer. “How do I feel about abortion? Hmm. I’ll check. Let’s see: abortion, attitudes toward. Ah yes, here I have it. I’m moderately opposed.”

Instead, many attitudes are extremely context dependent and constructed on the fly. Change the context and you change the expressed attitude. Sadly, even trivial-seeming circumstances such as question wording, the type and number of answer categories used, and the nature of the preceding questions are among the contextual factors that can profoundly affect people’s reports of their opinions. Even reports about attitudes of high personal or social importance can be quite mutable.

What Makes You Happy?

Verbal reports about attitudes are susceptible to a host of other methodological problems. People lie about some things. Sex. Money. People want to look good in their own eyes and in the eyes of others. This social desirability bias often causes people to accentuate the positive and eliminate the negative. But lies and trying to look good are really the least of our problems in finding out the truth about people’s attitudes and behavior, and why they believe what they believe and do what they do.

At least we’re pretty good at knowing what makes us happy or unhappy. Or are we?

Consider the following factors and the degree to which each seems to influence your mood on a given day. Let’s see how accurately you can assess what causes your mood to fluctuate. Rate the importance of each item on a scale of 1 (very little) to 5 (a great deal).

1. How well your work went

2. Amount of sleep you got the preceding night

3. How good your health is

4. How good the weather is

5. Whether you had any sexual activity

6. Day of the week

7. If you are a woman—stage of the menstrual cycle

No matter what you said, there’s no reason to believe it’s accurate. At any rate, we know that’s the case for Harvard women.9 Psychologists asked students to report at the end of each day, for two months, the quality of their mood. Respondents also reported the day of the week, the amount of sleep they had the night before, what their health status was, whether they had had any sexual activity, what stage of the menstrual cycle they were in, and so forth. At the end of the two months, participants were asked how each of the factors tended to affect their mood.

The participants’ answers to these questions made it possible to find out two things: (1) how much participants thought each factor affected their mood, and (2) how well each factor actually predicted their mood. Did these self-reports reflect the actual correlations between reported factors and reported moods?

As it turned out, participants were not accurate at all. There was zero correlation between a factor’s actual effect on mood (based on the daily ratings) and participants’ beliefs about the degree to which variations in the factor influenced variations in mood. Literally no correspondence at all. If the woman said day of the week was very important, the actual association between day of the week and mood was as likely to be low as high. If the woman said sexual activity was not very important, the actual correlation between sexual activity and mood was as likely to be high as low.
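For readers who like to see the arithmetic, here is a minimal sketch in Python of how such an analysis could be run. The diary data, the set of factors, and the belief ratings are all invented for illustration, not taken from the Harvard study; only the logic mirrors it: compute each factor’s actual correlation with daily mood, then ask whether believed importance tracks actual importance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two months of daily diary entries for one participant:
# a mood rating plus the factors she logged each day (all made up).
days = 60
sleep_hours = rng.normal(7, 1, days)                 # hours slept the night before
day_of_week = np.arange(days) % 7                    # 0 = Monday ... 6 = Sunday
weather = rng.uniform(0, 10, days)                   # pleasantness of the weather
mood = 0.5 * sleep_hours + rng.normal(0, 1, days)    # in this toy data, mood tracks sleep

factors = {"sleep": sleep_hours,
           "weekend": (day_of_week >= 5).astype(float),
           "weather": weather}

# (1) The "actual" effect of each factor: its correlation with daily mood.
actual_effect = {name: np.corrcoef(values, mood)[0, 1]
                 for name, values in factors.items()}

# (2) The participant's end-of-study beliefs about how much each factor
# matters, on a 1-5 scale (illustrative numbers only).
believed_effect = {"sleep": 2, "weekend": 5, "weather": 4}

# How well do beliefs track reality? Correlate the two sets of numbers
# across factors. In the study described above, that correspondence
# turned out to be essentially zero.
names = list(factors)
accuracy = np.corrcoef([actual_effect[n] for n in names],
                       [believed_effect[n] for n in names])[0, 1]
print(actual_effect, believed_effect, accuracy)
```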

There was an even more embarrassing finding. (Embarrassing to the participants, but also to everybody else, since there’s no reason to assume Harvard women are uniquely lacking in insight into the causes of their mood.) Jane’s self-reports about the relative influence of the factors affecting her mood were no more accurate than her guesses about the effects of those factors on a typical Harvard woman’s mood. In fact, her guesses about the typical student were pretty much the same as her guesses about herself.

Clearly, we have theories about what affects our moods. (Goodness knows where we get them all.) When asked how various things affect our mood, we consult these theories. We’re unable to access the facts, even though it feels as though we can.

I’m tempted to say we don’t know what makes us happy. That goes too far, of course. What we can say is that our beliefs about the relative importance of different events affecting our well-being are poorly calibrated with their actual importance. Of course there’s nothing unique about the factors affecting mood. As you read in Chapter 8 on correlations, detecting correlations of any kind is not one of our strong suits.

The lesson of the Harvard study is a general one. Psychologists find that our reports about the causes of our emotions, attitudes, and behavior can be quite untrustworthy, as was first shown in Part I.

The Relativity of Attitudes and Beliefs

First man: “How’s your wife?”

Second man: “Compared to what?”

—Old vaudeville routine

Test the validity of your opinions about ethnic and national differences by answering the following questions:

Who values being able to choose personal goals more: Chinese or Americans?

Who are more conscientious: Japanese or Italians?

Who are more agreeable: Israelis or Argentineans?

Who are more extroverted: Austrians or Brazilians?

I’m betting you didn’t guess that Chinese value choosing their own goals more than Americans,10 or that the Italians are more conscientious than the Japanese, the Israelis more agreeable than the Argentineans, or the Austrians more extroverted than the Brazilians.11

How do we know these differences exist? People from those countries tell us so themselves.

How could people’s beliefs about their values and personalities differ so much from popular opinion? (And for that matter, from the opinions of academic experts who are highly familiar with each of the cultural pairs above.)

People’s answers about their own values, traits, and attitudes are susceptible to a large number of artifacts. (The word “artifact” has two dimly related meanings. In archaeology, the word refers to an object created by humans, for example, a piece of pottery. In scientific methodology, the word refers to a finding that is erroneous because of some unintended feature of the measurement process, often intrusive human action.)

In the case of the cultural comparisons above, the discrepancy between people’s self-reports about their characteristics and our beliefs about the characteristics of people of their nationality is due to the reference group effect.12 When you ask me about my values, my personality, or my attitudes, I base my answer in part on a tacit comparison with some group that is salient to me, for example because I’m a member of it. So an American, asked how important it is to be able to choose her own goals, implicitly compares herself to other Americans, and perhaps to other Jewish Americans, and perhaps to other Jewish American females in her college. So compared to other Americans (or Jews, or Jewish females, or Jewish females at Ohio State), choosing her own goals doesn’t seem like all that big a deal to her. The Chinese respondent is comparing himself to other Chinese, or other Chinese males, or other Chinese males at Beijing Normal University—and it may seem to him that he cares more about choosing his own goals than do most people in his reference group.

One reason we know that tacit comparison with a reference group is a big factor in producing these self-reports (Austrians more extroverted than Brazilians, etc.) is that the differences disappear when you make the reference group explicit. European Americans at Berkeley rate themselves as more conscientious than do Asian Americans at Berkeley, but not when you have both groups compare themselves to the explicit reference group of “typical Asian American Berkeley students.”13

Other things being equal, people in most cultures believe they are superior to most others in their group. This self-enhancement bias is sometimes known as the Lake Wobegon effect, after Garrison Keillor’s mythical town where “all the children are above average.” Seventy percent of American college students rate themselves as above average in leadership ability, and only 2 percent rate themselves below average.14 Virtually everyone self-rates as above average in “ability to get along with others.” In fact, 60 percent say they are in the top 10 percent and 25 percent say they are in the top 1 percent!

Degree of self-enhancement bias differs substantially across cultures and across subgroups within a given culture. No one seems to top Americans in this respect, whereas East Asians often show a contrary effect, namely a modesty bias.15 So on any self-assessment of an attribute with a value component (leadership, ability to get along with others), Westerners will rate themselves higher than East Asians do. Americans will rate themselves as better leaders than Koreans do, and Italians will rate themselves as more conscientious than Japanese do.

Many other artifacts find their way into self-reports. These include what’s called acquiescence response set or agreement response bias. This is the tendency to say yes to everything. As you might expect, yea-saying is more common among polite East Asians and Latin Americans than it is among frank Europeans and European Americans. There are also individual differences within a culture in tendency to agree. Fortunately, there’s a way to counteract this: investigators can counterbalance response categories so that half the time respondents get a high score on some dimension—extroversion versus introversion, for example—by agreeing with a statement and half the time by disagreeing with a statement. (“I like to go to large parties” versus “I don’t like to go to large parties.”) This cancels out any bias to agree with statements in general. The counterbalancing correction is well known to all social scientists but is surprisingly often neglected by them.
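To make the counterbalancing idea concrete, here is a minimal sketch in Python. The item wordings, the 1-to-5 agreement scale, and the responses are invented for illustration; the point is simply how reverse-scoring the negatively keyed items cancels out a blanket tendency to agree.

```python
# Counterbalanced extroversion items: half are keyed "+" (agreement means more
# extroverted) and half "-" (agreement means less extroverted). Invented wording.
items = [
    ("I like to go to large parties.", "+"),
    ("I don't like to go to large parties.", "-"),
    ("I seek out the company of strangers.", "+"),
    ("I prefer to spend evenings alone.", "-"),
]

SCALE_MAX = 5  # responses run 1 (strongly disagree) to 5 (strongly agree)

def extroversion_score(responses):
    """Average the items after reverse-scoring the negatively keyed ones,
    so a pure tendency to agree with everything cancels out."""
    scored = []
    for (_text, key), answer in zip(items, responses):
        scored.append(answer if key == "+" else SCALE_MAX + 1 - answer)
    return sum(scored) / len(scored)

# A yea-sayer who agrees with everything lands in the middle of the scale,
# rather than looking like an extreme extrovert.
print(extroversion_score([5, 5, 5, 5]))   # -> 3.0
print(extroversion_score([5, 1, 5, 1]))   # a genuinely extroverted pattern -> 5.0
```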

Talking the Talk Versus Walking the Walk

But is there a better way to compare people, groups, and whole cultures than just by asking them? You bet there is. Behavioral measures, especially those taken when people don’t realize they’re being observed, are much less susceptible to artifacts of all kinds.

Rather than ask people how conscientious they are, you can measure how conscientious they are by examining their grades (or better, their grades controlling for their cognitive ability scores), the neatness of their rooms, how likely they are to be on time for an appointment or a class, and so on. We can also examine the conscientiousness of whole cultures by measuring such proxies for conscientiousness as speed of postal delivery, accuracy of clocks, on-time record of trains and buses, longevity, and number of questions people answer on a lengthy and boring questionnaire. (Incidentally, the correlation between the math scores of different nations and the number of tedious questions they answer on an interminable questionnaire is extremely high.)

Remarkably, it turns out that when we examine behavior to find out how conscientious people of different countries are, we find that the less conscientious a nation is as measured by behavioral indices, the more conscientious its citizens are as measured by self-report!16

When it comes to the measurement of virtually any psychological variable, I follow the maxim that you should trust behavior (including physiological behavior such as heart rate, cortisol output, and the activity of different brain regions) more than responses to concrete scenarios (descriptions of situations followed by measures of expected or preferred outcomes or behaviors by the self or others). In turn, you should trust scenario responses more than verbal reports about beliefs, attitudes, values, or traits.

I wouldn’t want you to doubt every verbal report you see in the media, or to conclude that you can’t construct a useful questionnaire of your own. If you want to find out whether your employees would rather have the picnic on a Saturday or a Sunday, you don’t have to worry much about whether their answers are valid.

But even for expressions of preference, you can’t necessarily trust self-reports. As Steve Jobs said, “It’s not the customers’ job to know what they want.” Henry Ford remarked that if he had asked people what they wanted in the way of transportation, they would have said “faster horses.” And Realtors have an expression: “Buyers are liars.” The client who assures you she must have a ranch house falls in love with a 1920s Tudor. The client who pines after a modern steel and glass edifice ends up with a faux adobe house.

Finding out people’s preferences is a tricky matter for businesses. Even the best thought-out focus group can come a cropper. Henry’s successors at Ford Motor Company had a fondness for focus groups, in which a group of people are quizzed by corporate representatives and by each other; the organizers use the expressed preferences to establish what new goods or services would be likely to succeed. Automotive legend has it that in the mid-1950s, Ford had the idea of removing the center post from a four-door sedan to see whether its sporty appearance would appeal to buyers. The people they gathered for the focus groups thought the idea was a bad one: “Why, it hasn’t got a center post.” “It looks weird.” “I don’t think it would be safe.” General Motors skipped the focus groups and went straight into production with a center-post-free Oldsmobile, calling it a four-door hardtop convertible. It was a huge success. The hardtop experience apparently didn’t cause Ford to rethink how much attention they should pay to focus groups. The company doubled down on them in making their decision to market the 1950s Edsel—the very icon of product failure.17

The take-home lesson of this section: whenever possible, don’t listen too much to people talking the talk; watch them walk the walk.

More generally, the chapters in this section constitute a sermon about the need to get the best possible measures of any variable that we care about and find the best possible means to test how it’s related to other variables. In the great chain of investigation strategies, true experiments beat natural experiments, which beat correlational studies (including multiple regression analyses), which, any day, beat assumptions and Man Who statistics. Failure to use the best available scientific methodology can have big costs—for individuals, for institutions, and for nations.

Experiments on Yourself

As shown by the Harvard study of women asked to assess factors influencing their moods, we are in as much trouble detecting correlations in our own lives as in other areas. Fortunately, we can do experiments with ourselves as the subject and get better information about what makes us tick.

What factors make it difficult to fall asleep? Is coffee in the morning helpful for efficiency during the day? Do you get more useful work done in the afternoon if you take a catnap after lunch? Are you more effective if you skip lunch? Does yoga improve well-being? Does the Buddhist practice of “loving-kindness”—visualizing smiling at others, reflecting on their positive qualities and their acts of generosity, and repeating the words “loving-kindness”—bring you peace and relieve you of anger toward others?

A problem with experiments on the self is that you’re dealing with an N of 1. An advantage, however, is that experiments on the self automatically have a within-subject, before/after design, which can improve accuracy because of the reduction in error variance. You can also keep confounding variables to a minimum. If you’re looking to discover the effect of some factor on you, try to keep everything else constant across the study period when you’re comparing the presence of the factor versus its absence. That way you can have a fairly good experiment. Don’t take up yoga at the same time as you move from one house to another or break up with your boyfriend. Arrange to start yoga when a proper before/after design is possible. Monitor your physical and emotional well-being, the quality of your relations with others, and your effectiveness at work for a few weeks before taking up yoga, and use the same measures for a few weeks after taking it up. Simple three-point scales provide adequate measures of these things. At the end of the day, rate your well-being: (1) not great, (2) okay, (3) very good. Get the mean on each variable for the days before taking up yoga and for the days after. (And hope nothing big happens in your life to muddy the waters.)
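As a small illustration of that bookkeeping, with entirely made-up ratings on the three-point scale just described, the before/after comparison amounts to nothing more than two means:

```python
from statistics import mean

# Hypothetical daily well-being ratings: 1 = not great, 2 = okay, 3 = very good.
before_yoga = [2, 1, 2, 3, 2, 2, 1, 2, 2, 3, 2, 2, 1, 2]   # two weeks before starting
after_yoga  = [2, 3, 2, 3, 3, 2, 3, 2, 3, 3, 2, 3, 3, 2]   # two weeks after starting

print("mean before:", round(mean(before_yoga), 2))
print("mean after: ", round(mean(after_yoga), 2))
```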

Often you can do better than the before/after study. You can take advantage of random assignment to condition. If you try to figure out whether coffee in the morning improves your efficiency, don’t just drink coffee haphazardly. If you do, any number of confounding variables can distort the test results. If you drink coffee only when you feel particularly groggy in the morning, or only on a day when you have to be at the top of your form at work, your data are going to be a mess, and any lesson you’ll think you’ve learned will likely be off the mark. Literally flip a coin as you walk into the kitchen—heads you have coffee, tails you don’t. Then keep track—in writing!—of your efficiency during the day. Use a three-point scale: not very efficient, fairly efficient, very efficient. Then after a couple of weeks do a tally. Calculate the mean effectiveness on days with and without coffee.
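Here is a minimal sketch of that coffee experiment in Python. The coin flip, the three-point efficiency scale, and the tally follow the procedure just described, while the logged ratings are invented for illustration.

```python
import random
from statistics import mean

def flip_for_today():
    """Decide this morning's condition by coin flip: heads you have coffee, tails you don't."""
    return "coffee" if random.random() < 0.5 else "no coffee"

print("Today's condition:", flip_for_today())

# A log built up over a couple of weeks: (condition, efficiency rating), where
# efficiency is 1 = not very efficient, 2 = fairly efficient, 3 = very efficient.
log = [
    ("coffee", 3), ("no coffee", 2), ("coffee", 2), ("no coffee", 1),
    ("coffee", 3), ("coffee", 2), ("no coffee", 2), ("no coffee", 2),
    ("coffee", 3), ("no coffee", 1), ("coffee", 2), ("no coffee", 2),
]

# The tally: mean effectiveness on days with and without coffee.
for condition in ("coffee", "no coffee"):
    ratings = [rating for cond, rating in log if cond == condition]
    print(condition, "mean efficiency:", round(mean(ratings), 2))
```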

The same experimental procedure works for any number of things that are candidates for influencing your well-being or effectiveness. And don’t kid yourself that you can figure these things out without being systematic about random assignment to condition and rigorously keeping track with decent measures of outcomes.

It’s eminently worth doing experiments like this because there are actually big individual differences in things such as the effects of coffee, the degree of benefit from both endurance training and weight training, and whether peak work efficiency is in the morning, afternoon, or evening. What works for Jill or Joe may not work for you.

Summing Up

Verbal reports are susceptible to a huge range of distortions and errors. We have no file drawer in our heads out of which to pull attitudes. Attitude reports are influenced by question wording, by previously asked questions, and by “priming” with incidental situational stimuli present at the time the question is asked. Attitudes, in other words, are often constructed on the fly and subject to any number of extraneous influences.

Answers to questions about attitudes are frequently based on tacit comparison with some reference group. If you ask me how conscientious I am, I will tell you how conscientious I am compared to other (absent-minded) professors, my wife, or members of some group who happen to be salient because they were around when you asked me the question.

Reports about the causes of our behavior, as you learned in Chapter 3 and were reminded of in this chapter, are susceptible to a host of errors and incidental influences. They’re frequently best regarded as readouts of theory, innocent of any “facts” uncovered by introspection.

Actions speak louder than words. Behavior is a better guide to understanding people’s attitudes and personalities than are verbal responses.

Conduct experiments on yourself. The same methodologies that psychologists use to study people can be used to study yourself. Casual observation can mislead about what kinds of things influence a given outcome. Deliberate manipulation of something, with condition decided upon randomly, plus systematic recording, can tell you things about yourself with an accuracy unobtainable by simply living your life and casually observing its circumstances.