The Secret Life of Pronouns

Notes

In addition to the extensive comments and feedback from Cindy Chung, Sam Gosling, and Ruth Pennebaker, a number of other people read all or parts of the manuscript, including David Beaver, Molly Ireland, Jeff Hancock, Maureen O’Sullivan, Rich Slatcher, and Yla Tausczik. Thank you all.

PREFACE

ix Estimates of the number of words in the average person’s vocabulary range from thirty thousand to a hundred thousand. Part of the problem is in the definition of a word. For example, do you count singular and plural? Different spellings of the same word? How about a word that you might use that isn’t in the dictionary, such as “fancify” (as in, to make something fancy)? For a discussion of the number of function words, see Chung and Pennebaker (2007).

x One of the most exciting breakthroughs in language analysis is the recent release of the Google Books corpus. In a stunning article by Jean-Baptiste Michel and his colleagues in the journal Science, the authors analyzed the words from over five million books, or 4 percent of all the books that have ever been printed. Focusing on language use over the last two hundred years, the authors were able to examine a wide range of historical trends. For example, how frequently has Freud or Darwin been mentioned? How long does fame last? How has language been evolving?

I urge you to try out Google Labs’ program Books Ngram Viewer at ngrams.googlelabs.com/. The study of history will never be the same.

xi Some of the best popular books on language that combine a rich knowledge of language with basic social and psychological questions include Goffman’s Forms of Talk, Lakoff and Johnson’s Metaphors We Live By, George Miller’s The Science of Words, Pinker’s The Language Instinct, Tannen’s You Just Don’t Understand, and Wierzbicka’s Understanding Cultures Through Their Key Words.

CHAPTER 1: DISCOVERING THE SECRET LIFE OF THE MOST FORGETTABLE WORDS

3 The website Analyzewords.com is one of several experimental sites that we have been developing. This one was the result of a collaboration among Chris Wilson, a writer at Slate.com; my daughter, Teal Pennebaker, who is a world-class expert on social media; and my longtime computer guru and collaborator Roger Booth. The program analyzes the recent history of anyone’s Twitter posts. Using algorithms we have developed over several years of studying language and personality, we are able to estimate personality profiles by language use. As with all experimental systems, please don’t take the results too seriously.

4 The expressive writing research has a rich history. The basic idea is that if people are asked to write about a traumatic experience for fifteen to twenty minutes a day for three or four consecutive days, they later show improvements in physical and mental health compared to others who have been asked to write about superficial topics. To get a better sense of the literature, see my popular book Opening Up as well as articles by Frattaroli (2006) and Pennebaker and Chung (2011). For a list of articles, including many that you can read for free, go to the publications link on my website, www.psy.utexas.edu/pennebaker.

6 For technical information about the development of the LIWC program, see the articles associated with Linguistic Inquiry and Word Count on the publications link of my website, www.psy.utexas.edu/pennebaker. The LIWC program is commercially available from www.liwc.net. All profits from the sales of the program are returned to the University of Texas at Austin graduate program in the Department of Psychology.

LIWC is by no means the only text analysis program around. Dozens of different programs are now available. Programs that are particularly useful for people interested in language and psychology include:

Art Graesser’s Coh-Metrix program (cohmetrix.memphis.edu), which calculates the degree to which any groups of texts are readable and coherent.

Rod Hart’s DICTION program (www.dictionsoftware.com) was designed to capture the verbal tone of messages, with particular focus on political speeches.

Tom Landauer’s Latent Semantic Analysis (lsa.colorado.edu) is a suite of programs that allows people to compare the similarity of any texts with other texts.

Mike Scott’s Wordsmith program (www.lexically.net) is a wonderful all-purpose word analysis tool. Although not tailored for advanced psychological analyses, it is a nice word counting system.

10–12 The findings concerning positive emotions, negative emotions, and change in cognitive words were first published by Pennebaker, Mayne, and Francis in 1997. See also Moore and Brody (2009) and Graham, Glaser, Loving, Malarkey, Stowell, and Kiecolt-Glaser (2009).

12–13 The findings on pronouns and changing perspectives were based on an article by Sherlock Campbell and me, published in 2003.

13 One of the more interesting discoveries about expressive writing was reported in an important paper by Youngsuk Kim. In her expressive writing project, bilingual Korean-English and Spanish-English students wrote either in their native language, second language, or both. Youngsuk had all participants wear a portable tape recorder for two days prior to writing and, again, a month after the writing experiment. Compared to people who did not write, participants who wrote about emotional upheavals spent more time with others, talked more, and laughed more. Although the effects were modest, those who wrote in both their native and learned languages tended to benefit the most.

14 Cheryl Hughes and Martha Francis were the first two students to earn doctoral degrees in psychology at Southern Methodist University. The results of Hughes’s attempts to change people’s thinking by influencing their word choice were the basis of her doctoral dissertation in 1994. More recently, Yi-Tai Seih, Cindy Chung, and I asked people to write in either the first person (as if they were simply describing their experience), second person (as if they were talking into a mirror), or third person (as if they were watching themselves in a movie) in their expressive writing. In another study, people were asked to change perspectives from essay to essay. To our surprise, people reported that writing in any perspective was helpful, as was switching perspectives. Our current thinking is that enforced perspective switching is probably good for some people and not others. Not the kind of groundbreaking conclusion we were looking for.

CHAPTER 2: IGNORING THE CONTENT, CELEBRATING THE STYLE

19 The drawing is from the Thematic Apperception Test by Henry A. Murray, Card 12F, Cambridge, MA, Harvard University Press.

20 Throughout this book, I include quotations from people who have been in my studies or classes, from text on the Internet, or even from conversations or e-mails from friends or family members. In all cases, all identifying information has been removed or altered.

22 In this book, the terms style, function, and stealth words are used interchangeably. They have many other names as well—junk words, particles, and closed-class words. Linguists tend to disagree about the precise definitions of each of these overlapping terms.

26 The table of function word usage rate is based on analyses from our own data representing language samples from over two thousand conversations, two hundred novels, twenty thousand blogs, and thousands of college student and adult writing samples. For more information, see Pennebaker, Chung, Ireland, Gonzales, and Booth (2007).
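A usage rate of this kind is simply the percentage of all words in a text that fall into a given category. The following is a minimal sketch of that calculation, not the LIWC implementation: the tiny `FUNCTION_WORDS` dictionary, the tokenizer, and the function names are all hypothetical stand-ins chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; real function-word
# dictionaries contain far more categories and entries.
FUNCTION_WORDS = {
    "pronouns": {"i", "me", "my", "you", "he", "she", "it", "we", "they"},
    "articles": {"a", "an", "the"},
    "prepositions": {"to", "of", "in", "for", "on", "with"},
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rates(text, categories=FUNCTION_WORDS):
    """Return each category's share of total words, as a percentage."""
    tokens = tokenize(text)
    total = len(tokens)
    if total == 0:
        # No words at all: report a zero rate for every category.
        return {name: 0.0 for name in categories}
    counts = Counter(tokens)
    return {
        name: 100.0 * sum(counts[word] for word in words) / total
        for name, words in categories.items()
    }


if __name__ == "__main__":
    for name, rate in category_rates("I gave the book to you").items():
        print(f"{name}: {rate:.1f}%")
```

Dividing by the total word count is what makes rates comparable across texts of very different lengths, such as a blog post and a novel.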

28 Political campaigns are wonderful to analyze. Presidential candidates are under the spotlight most every day, whether giving speeches, doing interviews, or simply talking with people on the street. Most of their words are transcribed, making it easy for text analysis experts to track language over the course of the campaign. For some analyses of the 2004 presidential election, see an article by Rich Slatcher, Cindy Chung, Lori Stone, and me. You might also be interested in a site that David Beaver (a linguist from the University of Texas), Art Graesser (cognitive scientist, University of Memphis), Jeff Hancock (communications expert, Cornell), and I have created devoted to political issues, www.wordwatchers.wordpress.com.

29 One of the finest books that discuss the roles of the brain and language is George Miller’s The Science of Words. It is the perfect introduction to the topic for someone without a great deal of background in the area.

31 The story of Phineas Gage is discussed in George Miller’s book. Alexander Luria described Pavlov’s experiences with his dogs in his classic 1973 book.

32–34 Knowing what pronouns and other words refer to within a given communication based on previous shared knowledge is called common ground. It is so central to communication that it has been studied across various fields and has been written about extensively in psycholinguistics by Herb Clark as a theory of common ground, in linguistics by David Beaver as presupposition theory, and in the cognitive sciences by Phil McCarthy and his colleagues as givenness/newness.

35–36 Unfortunately, the cross-language research is not covered in any detail in this book. My former student Nairán Ramírez-Esparza is discovering some remarkable changes in the ways people think as they switch from English to Spanish and vice versa. For example, bilingual speakers feel that they are more outgoing when speaking English and more reserved when speaking Spanish even though their behaviors show the opposite. Working with Cindy Chung and others, Nairán finds that the ways people think about a topic in Spanish are subtly different than the ways they think about it in English. See also work by Kashima and Kashima (1998).

I’m pretty certain that there is a legal requirement for anyone who writes about language to mention the Whorf Hypothesis (or, if you are an anthropologist, the Sapir-Whorf Hypothesis). In line with these regulations, the extreme version of the Whorf Hypothesis is that one’s language and vocabulary dictate one’s perception. So, for example, if I didn’t have a word for the color orange, I wouldn’t be able to perceive the color. Due to an interesting historical movement in psychology and linguistics, the Whorf Hypothesis was roundly dismissed, beginning in the 1960s. New and more interesting versions of the hypothesis keep returning. The research of Nairán, Lera Boroditsky, and many others is now showing that, in some cases, our language can direct our attention, thinking, and memory.

CHAPTER 3: THE WORDS OF SEX, AGE, AND POWER

40–44 The best summary of our research on sex differences and language is a paper by Matt Newman, Carla Groom, Lori Handelman, and me published in 2008.

41 When people sit in front of a mirror and complete a questionnaire, they use more words like I and me than when the mirror is not present (Davis and Brock, 1975). The nature of self-focus among women has been demonstrated in a series of studies by Barbara Fredrickson, Tomi-Ann Roberts, and their colleagues. An interesting take on people’s stereotypes of the ways pronouns work has been shown by Wendi Gardner and her colleagues.

45 Ask any group of people in any culture and they will agree that women talk far more than men. A group of my former students (Matthias Mehl, Simine Vazire, Nairán Ramírez-Esparza, and Rich Slatcher) and I ran six experiments in the United States and Mexico where we asked almost four hundred college students to wear a digital tape recorder for two to four days. Everything they said was later transcribed. Surprisingly, across all studies men and women spoke almost exactly the same number of words per day. It is very rare for a study that found absolutely nothing to be published, but the premier journal Science did so in 2007.

46–47 The blog project where we identified authors was conducted by Shlomo Argamon, Moshe Koppel, Jonathan Schler, and me and reported in papers that came out in 2007 and 2009.

49–51 The script paper by Molly Ireland and me is currently under review.

57–60 In addition to the Pennebaker, Groom, Loew, and Dabbs article, see recent work by Jon Maner’s lab as well as by Robert Josephs and his colleagues dealing with testosterone and behavior.

60–63 As we get older, our personalities change. Richard Robins and his colleagues have shown higher self-esteem, John Loehlin and Nick Martin reported increases in emotional stability, and Seider et al. (2009) found greater use of relational pronouns with increasing age. Much of this section is based on an article by Pennebaker and Stone (2003).

67 In the last twenty years, research on social class and physical health has ballooned within the United States. Some excellent articles on the topic include papers by Edith Chen, Karen Matthews, and Thomas Boyce as well as a classic by Nancy Adler and her colleagues.

68–70 There have been virtually no scientifically sound studies tracking natural language use in the home among people of differing social classes. The Hart and Risley study is promising, as is an earlier one by Brandis and Henderson in 1970.

CHAPTER 4: PERSONALITY: FINDING THE PERSON WITHIN

76–82 Most of the initial work on personality and language was the result of a collaboration with Laura King. You can see the actual analyses by reading the Pennebaker and King (1999) paper. A number of people have expanded on this work, especially in how language is related to self-reported personality. Two great projects have been published, one by Francois Mairesse and colleagues and another by Tal Yarkoni. Psychologists Lisa Fast and David Funder, as well as Matthias Mehl, Sam Gosling, and I, have published on word use and personality. See also the creative work of Jon Oberlander, Alastair Gill, and Scott Nowson.

85–90 Cindy Chung’s insight into the Meaning Extraction Method, or MEM, is now being used by labs around the world. It essentially allows researchers to extract themes from text automatically. One of the slickest applications is in pulling out themes written in different languages. For example, we can have the computer analyze thousands of blogs in, say, Finnish. Using the MEM, we can pull out several themes that are popular in Finnish blogs. All we have to do at the end of the project is to find someone who speaks Finnish so that he or she can tell us what we have discovered. The first MEM paper was by Chung and Pennebaker (2008).

97 Some of the criticisms of the TAT mirror those about the Rorschach. For example, the traditional judge-based method of grading essays is subjective. Two people could look at the same writing sample and come away with very different interpretations. Another problem is that when you ask the same person to write a story about the same picture on two occasions, they usually think up a new story to tell, often with different themes. Traditional personality researchers want tests that provide the same results on different occasions. Unfortunately, people are too inventive—they like to create new stories, especially if they have to tell them to the same people.

99 The bin Laden project was published by Pennebaker and Chung (2008). Cindy Chung, Art Graesser, Jeff Hancock, David Beaver, and I have all been immersed in a broader endeavor to learn how leaders change in their rhetoric over time. Hitler, Mao, Castro, as well as less extreme western leaders share changes in language use prior to going to wars, after being attacked, and as their regimes begin to fail.

100 To learn more about other text analysis methods, refer back to the notes for page 6 of the book.

CHAPTER 5: EMOTION DETECTION

106 Many people have argued that emotions affect the ways we think. Barbara Fredrickson, for example, has proposed a broaden-and-build model of positive emotions. Her studies suggest that when people are happy, they are able to pull back, see a broader perspective, and build new ideas. Thomas Borkovec has found that people who are sad or depressed tend to focus their negative feelings on past events, whereas people who are anxious are more worried about future events. Recently, Charles Carver and Eddie Harmon-Jones have provided compelling evidence to suggest that the emotion of anger activates brain areas associated with approach—much like positive emotions. Feelings of sadness or anxiety are linked to brain areas typically associated with avoidance.

108–110 The links between depression and self-focused attention have been independently discovered by several labs over the last thirty years. Walter Weintraub, one of the founding fathers of word analysis, was the first to report it in his 1981 book on verbal behavior. Tom Pyszczynski and Jeff Greenberg developed a general theory of self-focus as a cause of depression. In addition to the suicidal poet project by Shannon Stirman and me published in 2001, a paper by Rude, Gortner, and Pennebaker (2004) found depressed college students used more I-words in their essays about their college experience than non-depressed students. See also recent work by Kaufman and Sexton on self-focus among suicidal poets.

110–114 The Giuliani project reference is Pennebaker and Lay (2002).

115 Dan Wegner’s research on thought suppression is ingenious on several levels. Drawing on Tolstoy’s childhood experience, Wegner asked students to not think of a white bear for as little as a minute or two. In doing so, he found that they actually started thinking of bears at high rates. More striking, when students were told that they no longer had to suppress their white bear thoughts, many students reported thinking of white bears at even higher rates. Wegner’s early studies led him to develop a series of important theories about how the mind monitors information.

117–121 A number of investigators are now using blogs and other social media to track the emotional states of large communities of people. For example, Peter Dodds and Christopher Danforth have done this to track happiness, as has Adam Kramer; Bob Kraut, a pioneer of Internet research in psychology, has examined bulletin message boards to track emotional support in online communities; Elizabeth Lyons, Matthias Mehl, and I have analyzed blogs by pro-anorexics to assess their emotional profiles; Markus Wolf, Cindy Chung, and Hans Kordy have analyzed e-mails by psychotherapy patients to their therapists.

122 Over the years, my students and I have studied several large upheavals. Most of this work has attempted to understand how large-scale emotional events unfold over time. To read more about them, see the papers on the Texas A&M bonfire disaster (Gortner and Pennebaker, 2003); the death of Princess Diana (Stone and Pennebaker, 2002); the Loma Prieta earthquake and the Persian Gulf War (Pennebaker and Harber, 1993); and September 11, 2001 (Cohn, Mehl, and Pennebaker, 2004; Mehl and Pennebaker, 2003).

122–123 My colleague Darren Newtson and I traveled around Oregon and Washington for about a month after Mount St. Helens erupted. In towns that had experienced a great deal of damage, we randomly interviewed people by going door-to-door and by making random phone calls. One of the more interesting findings was that the more damage that had been done to a community, the more the residents wanted to talk. Those communities that were close to the volcano but that escaped damage were the most suspicious of outsiders and were least likely to agree to be interviewed. See my earlier book Opening Up for a more detailed account.

123 A particularly promising development in social psychology concerns “affective forecasting.” According to its founders, Dan Gilbert, Tim Wilson, and others, people are remarkably bad at guessing how they will react to a future event. In general, people predict that they will be more distressed about a traumatic experience than they actually would be.

124 Societies frequently adopt well-intentioned interventions that don’t work. In addition to CISD, abstinence-only sex education and D.A.R.E. have generally caused more problems than they solved. See Timothy D. Wilson’s book Redirect.

125 Cindy Chung’s dissertation involved tracking 186 bloggers on a blog community devoted to dieting for one year.

126 Are secrets always toxic? Actually, no. In fact, secrets may be bad but telling others can sometimes be worse. For a wonderful summary of the psychology of secrets, see Anita Kelly’s book and articles.

126–127 The story of Laura was first described in my book Opening Up.

127 People have a powerful urge to talk about emotional events. Bernard Rimé and his colleagues find that people across all cultures typically confide in others for virtually all types of emotional events. One of my favorite findings is that if you have an emotional experience and you tell someone about it and you ask your friend to keep it secret, your friend will tell two or three others about it. Even hearing about a secret is an emotional event that the listener needs to share.

CHAPTER 6: LYING WORDS

131 The research on the physiology of confession involved people talking about the most traumatic experience of their lives into a tape recorder. All potential identifying information from the story has been changed. The account comes from an article by Pennebaker, Hughes, and O’Heeron (1987).

134 For anyone seeking a discussion of self-deception in English literature, I recommend the writer and scholar Jim Magnuson. His wisdom on this and other topics has been invaluable to me in our discussions over the years.

140 Somewhere between self-deception, deception, and marketing is the concept of “spin.” In the political world, spin is defined as the art of glossing over the truth. Indeed, after an important debate or speech, political operatives will often meet with members of the press to spin the speech of their own candidate to make it sound even better than it was and spin the opponent’s words to sound worse. In fact, a cutting-edge computational linguist at Queen’s University in Canada, David Skillicorn, has developed a model to identify spin based on linguistic features associated with deception (research.cs.queensu.ca/home/skill/election/election.html).

140–141 In his Introductory Lectures on Psychoanalysis, Freud devotes a surprising amount of time to slips of the tongue, or, as he refers to them, parapraxes. In retrospect, you can understand why he used parapraxes as a way to introduce his general theory. Everyday slips of the tongue are common and reveal certain truths about what people are really thinking.

142–143 I’m indebted to Melanie Greenberg for allowing me to reanalyze her language data.

144–146 A fascinating account of the Stephen Glass case is available at www.rickmcginnis.com/articles/Glassindex.htm. For the analyses, I only examined thirty-nine of his forty-one stories (one was co-written by someone else and the other was made up exclusively of published quotes).

Somehow relevant to this discussion is a wonderful quote by author Mary McCarthy. In 1979, she was interviewed by the television host Dick Cavett about the author Lillian Hellman, who had a reputation for fabricating some of her stories. When asked about Hellman’s work, McCarthy snorted, “Every word she writes is a lie, including and and the.” Hellman subsequently filed a defamation suit against McCarthy, which was dropped after Hellman’s death in 1984.

149 There are several other fascinating papers that are worth mentioning in the literature on high-stakes, real-world deception detection using computerized text analysis. For example, David Skillicorn and his colleagues, as well as Max Louwerse, Gun Semin, and their colleagues, along with other labs around the world have examined the e-mails between Enron employees during the decade leading up to the company’s bankruptcy in 2001 due to its fraudulent accounting activities. In addition, David Larcker and Anastasia Zakolyukina used computerized text analysis to classify deceptive versus truthful chief executives in their quarterly earnings conference calls.

152–153 The project by Denise Huddle and me has not yet been submitted for publication. One additional finding is particularly noteworthy. Recall that I-words signal innocence. Closer inspection indicated that it wasn’t all I-words. In fact, the more that defendants used the actual word I (and I’ll, I’m, I’d, etc.), the more likely they were to be innocent. The word me, by contrast, was used more by the truly guilty.

155 The dating project is by Toma, Hancock, and Ellison (2008).

165 Discrepancy words, or modal verbs, are words that mark a distinction between an ideal and a real state. “I should be eating vegetables” indicates that I’m not eating them (reality) but the ideal person would be. Columbia University psychologist Tory Higgins has developed an elaborate theory around the self-discrepancy idea that has implications for goals, motivations, and mental health. The ideal-real discrepancy is also inherent in Robert Wicklund’s work on self-awareness.

167 I love performatives. Not only are they psychologically interesting but they are at the center of some eye-opening debates in philosophy. Check out the work of the philosopher John Searle and also John L. Austin.

168 Although not discussed here, another verbal feature of deception is the use of words such as um or er. A recent article by Joanne Arciuli and her colleagues is worth reading. Also, see Michael Erard’s book Um.

CHAPTER 7: THE LANGUAGE OF STATUS, POWER, AND LEADERSHIP

176 Recall that pronouns tell us where people are paying attention. Many of the pronoun effects we see with humans match the gaze findings among nonhuman primates. For a delightful analysis of status hierarchies in chimpanzees, see the work of Frans de Waal. In humans, visual cues to dominance have been discussed by Jack Dovidio and his colleagues.

188–189 John Dean, personal communications, August 30, 2002.

190 I’m indebted to Ethan Burris and his colleagues for allowing us to reanalyze their data for this project.

192 The leadership literature has grown increasingly complex. To get a flavor of some of the directions, see the pioneering work of Fred Fiedler. A summary of research dealing with leadership among women and men is best captured by Alice Eagly and her colleagues. David Waldman and his group have done a nice job of examining leadership attributes.

194 Another example of language shifts across languages and cultures was discovered by Doug Sofer as part of his dissertation. He found that the use of the first-person singular in letters to the presidents of the South American country Colombia between 1944 and 1958 differed by social class. As you might guess, the lower the social class of the author, the higher their use of I-words. See also the seminal work of Howard Giles on accommodation theory, a framework for understanding when people shift in their language to match their interaction partners.

194 Thanks to George Theodoridis for his observations on Ancient Greek language (personal communication, November 6, 2009).

195 Note that there are other language dimensions that we find to be linked to status. Although the effects are modest, people with lower status tend to use the following word categories more: negations, impersonal pronouns, tentative words, swear words.

CHAPTER 8: THE LANGUAGE OF LOVE

197–198 The IM interaction was part of the Slatcher and Pennebaker (2006) project. The on-air fight between Elisabeth Hasselbeck and Rosie O’Donnell that follows had been brewing for several months. O’Donnell had already announced that she was leaving the show in three weeks. The overall interaction between the two women evidenced a language style matching (or LSM) score of .94—which is exceedingly high.
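These notes do not spell out how an LSM score such as .94 is calculated. The sketch below assumes the formula as it is commonly given in the published LSM papers: for each function-word category, one minus the absolute difference between the two speakers' usage rates divided by their sum (plus a small constant to avoid division by zero), averaged across categories. The function names and the sample rates are hypothetical, and the exact category list used in the research is not reproduced here.

```python
def lsm_category(rate_a, rate_b):
    """Matching score for one function-word category (rates in percent).

    Assumed formula: 1 - |a - b| / (a + b + 0.0001). The small constant
    only guards against division by zero when neither speaker uses the
    category at all.
    """
    return 1.0 - abs(rate_a - rate_b) / (rate_a + rate_b + 0.0001)


def lsm_score(rates_a, rates_b):
    """Average the per-category matching scores over shared categories."""
    shared = sorted(rates_a.keys() & rates_b.keys())
    if not shared:
        raise ValueError("The two speakers share no categories to compare.")
    return sum(lsm_category(rates_a[c], rates_b[c]) for c in shared) / len(shared)


if __name__ == "__main__":
    # Made-up usage rates (percent of total words) for two speakers.
    speaker_a = {"pronouns": 16.0, "articles": 6.0, "prepositions": 12.0}
    speaker_b = {"pronouns": 15.0, "articles": 7.0, "prepositions": 13.0}
    print(round(lsm_score(speaker_a, speaker_b), 2))
```

Under this formula, a score near 1 means the two speakers use every category at almost identical rates, and a score near 0 means one speaker uses categories the other barely touches.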

202–204 The research on the mirror neuron system continues to be controversial. There is an increasing number of studies that demonstrate highly specialized brain activity in Broca’s area that reflects behavioral mimicking. The primary objection to the mirror neuron approach is that no consistent theory or model explains how it works or how it is related to cognitive activity. Particularly interesting papers are available by Rizzolatti and Craighero (2004), Kimberly Montgomery and her colleagues (2009), and Kotz et al. (2010).

207 The LSM project dealing with liars was conducted by Hancock, Curry, Goorha, and Woodworth (2008). For a report on additional text analyses of the same data, see Duran, Hall, McCarthy, and McNamara (2010).

208 Despite the intuitive appeal of multitasking, the evidence is clear that it is not an effective technique to accomplish even moderately complex tasks. A particularly convincing case of the downside of multitasking has been published by Ophir, Nass, and Wagner (2009).

211 The speed-dating project had a complicated history. Paul Eastwick, a faculty member at Texas A&M, visited our department in the spring of 2010 to describe some speed-dating research he had been doing with a colleague of his, Eli Finkel, who is at Northwestern. Molly Ireland was fascinated by his talk and asked if he would be interested in applying the LSM methodology to the speed-dating transcripts. Within a few days, Molly’s analyses yielded the remarkable finding that LSM in speed-dating conversations was a powerful predictor of later dates. Molly then added the speed-dating analyses to Richard Slatcher’s IM project (see p. 212) and, in record time, submitted the paper to a top journal, where it was accepted and published. The resulting paper is Ireland, Slatcher, Eastwick, Scissors, Finkel, and Pennebaker (2011).

212–215 The IM project was initially published as Slatcher and Pennebaker (2006) and then, with the reanalyses of the data, as Ireland, Slatcher, et al. (2011).

216 John Gottman’s research on relationships has a number of practical applications for making good marriages. In addition to his books and articles, New York Times writer Tara Parker-Pope has written a balanced book on marriage and relationships that relies on some of the most recent research.

218–223 The analyses of Elizabeth Barrett and Robert Browning, Sylvia Plath and Ted Hughes, and Sigmund Freud and Carl Jung were part of a paper published by Molly Ireland and me in 2010.

CHAPTER 9: SEEING GROUPS, COMPANIES, AND COMMUNITIES THROUGH THEIR WORDS

228 Several studies have tracked language use and its relationship with successful marriages. Not surprisingly, use of pronouns, especially we-words, between the couples is a reliable predictor. See the work of Seider and colleagues (2009) and of Rachel Simmons, Peter Gordon, and Diane Chambless (2005).

229 The project linking pronoun use among couples and heart failure was conducted by Rohrbaugh and colleagues.

The Sexton and Helmreich project focused only on flight simulation studies. Later analyses by Brian Sexton found links between low we-word use and human error in the cockpit voice recordings of planes that had crashed (personal communication, April 20, 2010). See also the work of Foushee and Helmreich.

232 One of the more interesting approaches to studying natural interactions was pioneered by Bill Ickes, a social psychologist at the University of Texas at Arlington. In a typical study, pairs of students are instructed to visit Ickes’s research lab to participate in a conversation. After both complete questionnaires and a consent form to be videotaped, the experimenter tries to begin filming and then “discovers” that his camera is broken. The experimenter leaves the lab, claiming he’s going to find a technician. The students remain in the lab and usually begin talking with one another.

What they don’t know is that another hidden camera is taping their interaction. Later, the students are told about the hidden camera and are asked to rate their interaction on a minute-by-minute basis. Ickes is able to see how the two people were thinking about each other as their conversation unfolded. Bill has kindly allowed us to analyze some of his interactions. I strongly recommend his recent book, Strangers in a Strange Lab.

And while we are talking about real-world approaches to studying the behavior of people, I insist that you check out the work of Sandy Pentland and Roz Picard, who are at MIT’s Media Laboratory. Together and separately, the two have devised a striking number of methods that track how people see and emotionally react to their worlds as they go about daily life.

232–235 One way to think about the increase in we-words over time is that the longer people talk with others, the more their identities become fused. Bill Swann and his colleagues have been conducting a number of imaginative projects tracking identity fusion. For example, making people more aware of their own group increases the likelihood that they will endorse fighting and dying for it.

233 The national defense project was run by Andrew Scholand, Yla Tausczik, and me and funded by Sandia National Laboratories. The research tracking twenty professional therapists over three years was conducted by Susan Odom and Stephanie Rude. The findings are reported in Odom’s dissertation, which was completed in 2006.

234–235 Drops in suicide rates following terrorist attacks have been reported by Emad Salib and his colleagues. Additional findings about language and psychological changes following the subway bombings in Madrid in 2004 have been reported by Itziar Fernandez, Dario Paez, and me. The written essays used to track language changes among New Orleans residents after Hurricane Katrina were collected by Sandy Hartman.

238–239 A former graduate student of mine, Amy Gonzales, conducted a complex laboratory experiment where groups of students had to work together either in face-to-face groups or in online groups. The details are reported in Gonzales, Hancock, and Pennebaker (2010). A second project, which was described earlier, was run with business school students by Ethan Burris and his colleagues. The two lab studies are consistent with some fascinating real-world projects conducted by Paul Taylor and his colleagues. For example, Taylor found higher LSM levels in the transcripts of successful hostage negotiations between police and hostage-takers in the UK relative to unsuccessful hostage negotiations.

240–243 The Craigslist project is part of a larger study focusing on measures of community cohesiveness. The primary team members include Cindy Chung, Yla Tausczik, and me. We are indebted to Mark Hayward for his help in providing the relevant Gini statistics.

243–247 The word-catching research is based on an archive of tape recordings I collected between 1990 and 2010. The analyses include 1,162 conversational files of people in the real world having natural conversations. Discriminant analyses (for you statistics fans out there) show that cross-validation classifications are accurate at 80 to 84 percent for anywhere from five to seven settings, where 16 to 20 percent is chance.

248 One of my favorite language maps tracks the usage of the words pop, soda, and Coke as generic names for soft drinks. Check out www.popvssoda.com.

248–253 One of the giants in the world of sociolinguistics is William Labov from the University of Pennsylvania. Labov has pioneered ways to track how word usage and accents change across regions and time. Some of his early work, for example, examined language differences within blocks and neighborhoods of large cities. Later, he began to focus on much broader trends across the United States.

Due in large part to Labov’s influence, the University of Pennsylvania has taken an important lead in advancing our knowledge of social communication and language use. It is home to the Linguistic Data Consortium, or LDC (www.ldc.upenn.edu), which houses one of the largest text archives in the world. In addition, Mark Liberman, a particularly thoughtful linguist, has created Language Log, a highly influential blog site (languagelog.ldc.upenn.edu).

249–251 The This I Believe project has been growing in multiple directions. Cindy Chung, Jason Rentfrow, and I have been developing detailed maps of language use across the United States based on both function words and content words.

251–252 A particularly hot approach to text analysis examines how people use emotion words in their blogs, tweets, or other communications. Although sentiment analysis focuses only on people’s use of positive and negative emotion words, it can provide a general overview of the happiness of cities, regions, or entire countries. For a discussion, see the work of Adam Kramer, Jason Rentfrow, and also Alex Wright’s article in the New York Times. Also, check out a truly wonderful book by Eric Weiner, The Geography of Bliss, on one man’s attempt to understand why some countries are happier than others.

252 In deducing the linguistic fingerprint of the Texas high schools, discriminant analyses showed that we could accurately classify students at a 19 to 20 percent rate, where 11 percent was chance.

CHAPTER 10: WORD SLEUTHING

258–261 Matching blog entries to specific authors can be done in a number of ways. In the chapter, we try to match blogs written today with those written many years ago by the same authors. This is much harder than matching blogs written by authors at about the same time. In fact, think back to the example of the twenty bloggers. Imagine we have, say, ten blog entries on consecutive days from each of the twenty people. We pull out one of the ten entries for each person and put this into a separate stack. The goal is to match the twenty “orphan” entries with the twenty bloggers by reading the nine blog entries of known authorship. Our computer does a much better job at guessing which orphan entry goes with which blogger. The overall hit rate is closer to 58 percent (where 5 percent is chance).

262–265 In addition to the work of Adair and of Mosteller and Wallace dealing with the Federalist Papers, be sure to see recent articles by Patrick Juola (2006) and by Jeff Collins and his colleagues (2004).

265 Pardon me for a minute while I have a little chat with the twenty people on Earth who really, really want to know the methods for analyzing the Federalist Papers. The cross-validation approach is based on discriminant analyses assuming equal group size. The original function-word assignment method, which assigned all unknown texts to Madison, correctly classified 92.4 percent of the original essays and 86.4 percent for cross-validation. The numbers for function words plus punctuation were 98.5 percent and 84.8 percent. Analyses based on the fourteen “tell” words used a binary procedure (was the word used or not within an essay) and yielded both classification and cross-validation accuracies of 98.5 percent. The one assignment error was for essay forty-one, which is attributed to Madison. The tell-word analyses estimated that Hamilton was the author of 49, 52 through 57, and 63, and that Madison was the author of 50, 51, and 62.

Whereas Hamilton claimed credit for all eleven of the unknown manuscripts, he reported that three additional ones were jointly written by Madison and himself. Madison’s later recollection was that he (Madison) had written them with some supplemental comments by Hamilton. All linguistic analyses show that the jointly written papers were completely different from either Hamilton’s or Madison’s solo-authored pamphlets. Given this, I tend to side with Hamilton’s accounts of the authorship issue rather than with Madison’s.
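
For readers who want to see what the binary tell-word procedure in this note might look like in practice, here is a minimal sketch in Python. It only illustrates the "was the word used or not within an essay" coding step. The function name and the two example words are placeholders chosen for illustration, not the fourteen tell words used in the actual analyses, and the discriminant analysis that would follow is not shown.

```python
import re

def binary_features(text, tell_words):
    """Code each tell word as 1 if it appears anywhere in the text, else 0."""
    # Lowercase the text and split it into simple word tokens.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return {word: int(word in tokens) for word in tell_words}

# Placeholder word list, NOT the fourteen tell words from the analyses.
example_words = ["upon", "whilst"]

# Made-up sentence used only to show the coding step.
essay = "Upon reflection, the plan rests upon the consent of the people."

features = binary_features(essay, example_words)
print(features)
```

Each essay becomes a short row of zeros and ones, one per tell word, no matter how many times a word appears. Those rows would then be handed to whatever classifier is being used.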

265–267 A recent project by Terry Pettijohn and Donald Sacco (2009) analyzed the lyrics of number one Billboard songs between 1955 and 2003. They discovered that during economic downturns, people preferred lyrics that were more complex, social, and future oriented.

268 There are several ways to determine if collaborations result in average or synergistic language use. Consider how John Lennon and Paul McCartney used present-tense verbs in their lyrics. For their individually written songs, Lennon consistently used more of them than McCartney (15.8 percent versus 13.7 percent). According to the average-person hypothesis, their collaboration should have resulted in songs that ranged between 13.7 and 15.8 percent present-tense verbs. In fact, the Lennon-McCartney eyeball-to-eyeball collaborations resulted in songs with 17.6 percent present-tense verbs. In this case, Lennon fell between McCartney and Lennon-McCartney, which makes him the average writer. We can calculate the percentage of time that Lennon, McCartney, and Lennon-McCartney produced songs that were in the middle of the other two linguistically. For the Beatles, the statistically average author was Lennon 50.6 percent of the time, McCartney 36.1 percent, and Lennon-McCartney 13.3 percent. For the Federalist Papers, it was Hamilton 39.5 percent of the time, Madison 53.9 percent, and Hamilton-Madison 6.6 percent. In other words, when collaborating, Lennon-McCartney and Hamilton-Madison were far more extreme than either author on his own.
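
The middle-of-three logic in this note can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the book. The function names are invented here, and only the present-tense figures come from the note. The tallying function assumes you would supply one set of three rates for each language dimension you want to compare.

```python
from collections import Counter

def middle_author(rates):
    """Return the author whose rate falls between the other two.

    `rates` maps exactly three author labels to a percentage.
    """
    ordered = sorted(rates, key=rates.get)
    return ordered[1]

def is_synergistic(solo_a, solo_b, joint):
    """True if the joint rate lies outside the range set by the two solo rates."""
    low, high = min(solo_a, solo_b), max(solo_a, solo_b)
    return not (low <= joint <= high)

def middle_shares(dimensions):
    """Percentage of dimensions on which each author is the middle one."""
    counts = Counter(middle_author(d) for d in dimensions)
    return {author: 100.0 * n / len(dimensions) for author, n in counts.items()}

# Present-tense verb rates reported in the note.
present_tense = {"Lennon": 15.8, "McCartney": 13.7, "Lennon-McCartney": 17.6}

print(middle_author(present_tense))      # Lennon
print(is_synergistic(15.8, 13.7, 17.6))  # True
```

Passing a list of such dictionaries, one per language dimension, to `middle_shares` gives the kind of percentage breakdown reported above. Ties between authors are not handled in this sketch.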

270 N-gram analyses have been used to characterize authors. For example, Art Graesser and his colleagues have developed speech-act classifiers that assess the first three words in a sentence to determine what type of sentence is being uttered (e.g., “Are you here?” “Here you are!” “You are here.”). Their speech-act classifier can be used to determine the relative status of two interactants.

270–271 Another way to think about language use is to listen to how presidents create stories about themselves. Dan McAdams has spent much of his career analyzing the stories people tell to get a better sense of their personality. His most recent work is a fascinating analysis of George W. Bush.

272 Perhaps the best source for presidential documents is through the American Presidency Project, directed by Gerhard Peters at the University of California at Santa Barbara. Peters and his collaborators are bringing together one of the largest archives of presidential documents, including speeches, interviews, press conferences, and much more. For more information, go to www.presidency.ucsb.edu/.

273 The figure is based on summing the standardized scores (z-scores) for personal pronouns and total emotion word use. To make all the numbers positive values, a constant of 3.0 was added to the resultant z-scores.
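
For the statistically curious, the calculation behind the figure can be sketched as follows. This is a rough illustration only. The function names and the sample numbers are made up for the example, and the use of the sample standard deviation is an assumption, since the note does not say which version was used.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to mean 0 and (sample) standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # assumption: sample rather than population SD
    return [(v - m) / s for v in values]

def social_emotional_index(pronouns, emotion_words, constant=3.0):
    """Sum the two z-scores for each speaker and add a constant."""
    z_pronouns = z_scores(pronouns)
    z_emotion = z_scores(emotion_words)
    return [p + e + constant for p, e in zip(z_pronouns, z_emotion)]

# Hypothetical percentages for three speakers, not real presidential data.
pronoun_rates = [8.0, 10.0, 12.0]
emotion_rates = [3.0, 4.0, 5.0]

print(social_emotional_index(pronoun_rates, emotion_rates))
```

Adding 3.0 shifts every score upward by the same amount, so the ordering of the speakers and the gaps between them stay the same.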

274 Although Franklin Roosevelt’s press conferences have been transcribed, they have also been heavily edited. FDR had arrangements with press members so that large blocks would be off the record. In terms of social-emotional language, his was the lowest of any modern president. However, because his language records are so heavily edited, they have not been included in the press conference corpus.

275 Bosch quote from the 2000 program notes of the PBS documentary series Reagan. www.pbs.org/wgbh/amex/reagan/filmmore/description.html

275–277 The Obama missing-I case was originally reported on Mark Liberman’s blog, the Language Log, at http://languagelog.ldc.upenn.edu/nll/?p=1651. The I-word press conference data includes thirty-five press conferences or meetings of Obama from his inauguration in January 2009 through May 2010. Note that Mark Liberman, the founder of the Language Log blog, reported comparable findings in his analysis of Obama’s speeches.

276 The quotations about Obama’s use of I-words were written by George Will in the Washington Post, June 7, 2009, and by Stanley Fish in the New York Times, June 7, 2009.

281 An increasing number of researchers are trying to determine if it is possible to predict terrorism, extremism, and violent behavior through language analysis. Allison Smith, who now works for the Department of Homeland Security, has analyzed both violent and nonviolent extremist groups around the world and found that the ways they express themselves are quite different. For example, those groups that make the most references to in-group affiliations and power or dominance are the ones most likely to engage in violent behaviors.

283 Off the southeast tip of Australia is Tasmania, an Australian island the size of England. One of Tasmania’s first and, arguably, most famous explorers was Henry Hellyer. To read an interesting account of how and why he died, see a recent analysis of his letters and journals by Jenna Baddeley, Gwyneth R. Daniel, and me.

284–287 The research on the admissions essays comes with several caveats. In general, traditional academic markers such as college board scores and high school rank are correlated with college grades at levels higher than our language measures (multiple Rs for traditional measures = .435; for language alone = .233; for both = .455, based on an N of 23,794).

APPENDIX: A HANDY GUIDE FOR SPOTTING AND INTERPRETING FUNCTION WORDS IN THE WILD

292 I’m indebted to Claude M. Chemtob for providing a large number of transcripts from people suffering from PTSD.