Chapter Ten

Foundations of Indirect Assessment

Daniel J. Neller1

The uselessness of interviews is matched only by the unwavering confidence that most of us put in them.

—Samuel D. Gosling

Psychologists across the globe routinely assess individuals whom they have never personally interviewed. Some describe these assessments as having been conducted “at-a-distance” or “remotely” (e.g., Post, 1979, 2003a). Others describe them as having been conducted “indirectly” (e.g., Williams, Picano, Roland, & Bartone, 2012), the term used in this chapter. Common features distinguish indirect assessments from more traditional, “direct” assessments. Among other features, indirect assessments often:

  1. are conducted pursuant to legal or other regulatory authorization;
  2. rest upon analyses of extensive collateral material (e.g., reviews of files and interviews of third-party informants);
  3. rely heavily upon a combination of deductive reasoning (i.e., from generally accepted facts to a specific conclusion), inductive reasoning (i.e., from specific observations to broad generalizations), and abductive reasoning (i.e., from incomplete observations to likeliest possible explanations); and
  4. result in conclusions and recommendations intended to serve one or more practical purposes (e.g., diagnosis and disposition), but not necessarily or exclusively to improve the well-being of the subject of the assessment.

Indirect assessments are conducted across almost all specialty areas of psychological practice (see, e.g., Neller, 2016). Accordingly, this review draws from undertakings as seemingly diverse as leader and criminal profiling, clinical diagnosis and formulation, personnel selection, deception detection, and violence risk assessment; its conclusions rest largely on trends suggested by meta-analyses, narrative reviews, and seminal works. This breadth of coverage is intended to help psychologists think about the potential utility of indirect assessments in general rather than in any specific area of practice.

The chapter begins with a discussion of contexts that gave rise to professionally performed indirect assessments several decades ago. Next, it covers reliability and validity of several procedures that might shape our understanding of the potential reliability and validity of indirect assessments across contexts. Then it addresses ethical issues relevant to indirect assessments. The chapter concludes with a discussion of foundational principles that might prove useful to practitioners who conduct indirect assessments across a wide variety of settings.

Historical Context

The application of psychological research and assessment to issues involving law enforcement, national defense, and national security can be traced to the turn of the 20th century. As early as 1908, Harvard University professor and American Psychological Association (APA) president Hugo Munsterberg advocated for the use of psychology to enhance aspects of police investigations, including interrogations, to help courts determine the veracity of confessions and the accuracy of eyewitness testimony, and to support the legal system’s efforts to prevent crime. A mere decade later, shortly after the United States had entered World War I, another Harvard professor and APA president, Robert Yerkes (1917, 1918), urged psychologists to render “all possible assistance” to the service of national security and defense. Applying psychology to these kinds of practical problems proved highly successful, affirming the value of the nascent science and profession (e.g., Benjamin, 1986; Kohler, 1943).

The successful application of psychology to practical problems continued into World War II, evidently with support from the APA. Harvard professor Gordon Allport proclaimed in his APA presidential address, “From agencies of government, industry, education, and human welfare come daily appeals for assistance in their service to mankind. Psychology, as a science … can be justified only by giving mankind practical control over its destinies” (1940, p. 23). Allport concluded that the ultimate success of the field should be measured by its ability to understand, predict, and ultimately control human behavior.

Psychologists embraced Allport’s call (Capshew & Hilgard, 1992). In World War II they investigated the appeal of Nazism to the German population, the probable response of Germans to particular types of propaganda, and the effect of strategic bombing on Germans’ morale (Abbott, 1980). Psychological consultants to the Office of Strategic Services (OSS), a predecessor of the Central Intelligence Agency, held seminars in an effort to improve U.S. leaders’ understanding of a single German whom they had never personally examined, Adolf Hitler (Hoffman, 1992).

At least two psychological profiles of Hitler were generated during World War II. One was authored primarily by Walter Langer, a former professor of psychology at Harvard and the first person without a medical degree to become a member of the American Psychiatric Association. Head of the Research and Analysis section of the OSS, Langer considered a number of possible behaviors in which Hitler might engage as the tide turned against Germany. The most plausible outcome, he predicted, was death by suicide (Langer, Murray, Kris, & Lewin, 1943).

According to a declassified article authored by Post (1979), psychologists continued to assess foreign leaders after the end of World War II. As examples, a psychological profile of Soviet first secretary Nikita Khrushchev was constructed for President John F. Kennedy; and, in anticipation of the Camp David summit, profiles of Egyptian president Anwar Sadat and Israeli prime minister Menachem Begin were constructed for President Jimmy Carter (Winter, 2013). The successful use of psychological profiles led to their acceptance as “a requisite for each summit meeting and a required resource for managing politico-military crises” (Post, 2003b, p. 59).

Just as psychology was applied to complex international matters in the wake of World War II, it also was applied to complicated domestic matters—the identification and apprehension of unknown criminal subjects (Kapardis, 2017). A law enforcement agency in the United States first sought expertise in this domain in the midst of a series of New York bombings that had begun in earnest in 1950. Highly motivated to stop the “Mad Bomber,” a detective consulted James Brussel, a psychiatrist in independent practice who also served as the New York assistant commissioner of mental hygiene.

Analyzing available evidence, Brussel (1968) elegantly used deductive, inductive, and abductive reasoning to generate multiple inferences about the unknown subject. Brussel stated the subject was likely an Eastern European male, between ages 40 and 50, with an athletic build. He inferred the subject was likely a stickler for rules and order, outwardly polite and proper in all his dealings, and a regular parishioner of a Catholic Church. He concluded the subject likely had been an exemplary employee who had begun the bombing campaign after developing a long-standing grievance against his former employer. He also stated the subject likely had an extensive history of civil litigation but no history of arrests.

Brussel described the subject as likely aloof, paranoid, and grandiose. He stated the subject likely had no history of intimate relationships and lived alone or with an older female relative in a house in Bridgeport, Connecticut. As a final detail, Brussel legendarily told the referring detective, “One more thing… When you catch him—and I have no doubt you will—he’ll be wearing a double-breasted suit… And it will be buttoned” (p. 46). Brussel offered courses of action that might draw out the Mad Bomber. When law enforcement took the Mad Bomber into custody a short time later, the profile proved to be highly accurate, even down to the buttoned double-breasted suit.

Indirect assessments of criminal subjects were increasingly used for law enforcement purposes following the arrest of the Mad Bomber (e.g., Woodworth & Porter, 1999). A widely recognized example in the national security arena is that of Theresa Squillacote. A senior staff attorney in the Department of Defense, Squillacote was suspected of having spied for the Soviet bloc for decades (Mickolus, 2015). A psychologist helped law enforcement identify ways in which Squillacote’s “emotional vulnerabilities,” including her attraction to fantasy and intrigue, might improve the chances of a successful sting operation (Ewing, 2002). The sting was effective, leading to Squillacote’s arrest for, and ultimate conviction of, espionage (United States v. Squillacote, 2001).

As illustrated by these case examples, indirect assessments have a long and storied history in national security, national defense, and law enforcement—the very areas to which the status and perceived utility of psychology are inextricably linked (e.g., Staal, Neller, & Krauss, 2018). From the generation of psychological profiles of foreign leaders to unidentified criminal subjects to suspected spies, indirect assessments have proven to be a highly successful tool within individual cases. But their use is by no means limited to these circumscribed areas. The next section draws on research from other areas of psychology, showing that indirect assessments are likely as reliable and valid as direct assessments across diverse areas of practice.

Research Foundations

Psychological assessment is “the systematic measurement of a person’s behavior and variables associated with variance in behaviors as well as the inferences and judgments based on those measurements” (Haynes, Smith, & Hunsley, 2011, p. 2). It involves the use of multiple sources of information gathered from methods that vary in their degree of objectivity (Matarazzo, 1990; McIntyre & Miller, 2007). To be conducted soundly, it requires (1) an understanding of cognition, emotion, and behavior; (2) a grasp of measurement, statistics, and research methods; (3) an appreciation for the distinct type and quality of data generated by different sources and methods of information; and (4) the ability to think clearly about data in context (Meyer et al., 2001).

At first blush, unobtrusively assessing subjects in a reliable way—and making accurate inferences about them without ever having met them in person—may seem to be a highly challenging endeavor. More than a half-century ago, however, thoughtful researchers and practitioners recognized that the task is less complex, and potentially less error-prone, than assessment that incorporates face-to-face interaction (e.g., Webb, Campbell, Schwartz, & Sechrest, 1966). The early insights of these researchers and practitioners are corroborated by more recent scholarship (e.g., Hill, White, & Wallace, 2014).

To be sure, the sources of error introduced by an interview are innumerable. The interview subject might intentionally distort the information she provides (e.g., Rogers, 2018). Even if her self-report is credible, her personal biases and limited introspection ability may nevertheless preclude her from providing accurate information about her history, present mental state, or future intentions (Nisbett & Wilson, 1977; Tversky & Kahneman, 1974; Wilson, 2009; Wilson & Dunn, 2004). The dynamic nature of the subject–interviewer interaction itself presents additional and perhaps unpredictable challenges to effective data collection and analysis (cf. Campbell, 1958).

Risk for error is also introduced by the behaviors and cognitive processes of the interviewer himself. Potential for error increases with each question the interviewer chooses; each cue to which he attends, records, and later recalls; and each inference he later makes (e.g., Arkes, 1981; Borum, Otto, & Golding, 1993; Garb, 2005; but see Blumenthal-Barby & Krieger, 2015). Because many interview questions are conceived spontaneously during an unscripted interaction with another person, they can range from shrewdly diagnostic to wholly uninformative. This increased risk for error raises the odds that misinformation will be collected and erroneous inferences will be made.

Many practitioners nevertheless continue to view interviews as an essential—even the foundational—component of sound psychological assessment (e.g., Jones, 2010; Sommers-Flanagan & Sommers-Flanagan, 1999). Such confidence in their value is misplaced (e.g., Dana, Dawes, & Peterson, 2013). Inferences based on interviews are often unreliable, and classification decisions based on them are often less accurate than those based on other methods.

In the following section, I discuss studies that address the reliability of decisions often made on the basis of interviews alone: clinical diagnoses, case formulations, and personality appraisals. Next, I turn to validity of inferences based on interviews. I then review the reliability and validity of procedures used in two areas in which some psychologists commonly render opinions without interviewing subjects: deception detection and violence risk assessment.

Reliability. As part of routine duties, many psychologists render clinical diagnoses of mental disorders as set forth in the Diagnostic and Statistical Manual of Mental Disorders (Evans et al., 2013; also see Wright et al., 2017), currently in its fifth edition (DSM-5; American Psychiatric Association, 2013a). Research indicates the reliability coefficients of some of the most common DSM-5 clinical diagnoses are quite low. As examples, pooled kappa coefficients (κ) for interview-based diagnoses of major depressive disorder, generalized anxiety disorder, and alcohol use disorder do not exceed 0.40 for adults (Clarke et al., 2013; Regier et al., 2013). Medical diagnoses with κ values in this range are typically described as having “questionable” reliability (Kraemer, 2014).
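
To make the statistic concrete, the following minimal sketch computes Cohen’s kappa for two hypothetical raters assigning diagnostic labels. The data are invented for illustration, and the function implements the standard chance-correction formula, not the pooling procedure of the cited field trials.

```python
# Minimal illustration of Cohen's kappa: agreement between two raters,
# corrected for the agreement expected by chance alone.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement for two raters' paired labels."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected if each rater labeled independently at base rates.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical diagnostic labels for eight cases (MDD, GAD, or no diagnosis).
rater_1 = ["MDD", "GAD", "MDD", "none", "MDD", "none", "GAD", "none"]
rater_2 = ["MDD", "MDD", "MDD", "none", "GAD", "none", "GAD", "MDD"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.43, despite 62.5% raw agreement
```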

The questionable reliability of mental disorder diagnoses is confined neither to the current edition of the DSM (Rettew, Lynch, Achenbach, Dumenci, & Ivanova, 2009) nor to clinical conditions (e.g., Faust & Ziskin, 1988). For instance, in a recent systematic review Samuel (2015) showed that treating clinicians’ diagnoses of specific personality disorders (PDs) are just as unreliable as those of clinical disorders; treating clinicians’ diagnostic agreement with other sources is even lower (κ = 0.26).

Some practitioners dismiss the questionable reliability of clinical diagnoses by advocating for the superiority and importance of case formulations. Beyond mere diagnosis, case formulations integrate psychological theory, research, and idiographic data to provide an enriched conceptualization of an individual: the development of the individual’s specific characteristics or problems; the contexts or conditions under which those problems are maintained; and predictions about changes that might occur in the future. As with diagnoses of clinical conditions and PDs, high confidence in the ability to generate reliable case formulations is unsupported.

Flinn, Braham, and das Nair (2015) systematically reviewed studies that had addressed the reliability of practitioners’ case formulations. They found few studies to be methodologically rigorous: most relied on small samples of practitioners and students, and only a minority used blinding. Reliability of practitioners’ case formulations also varied considerably across studies. Although the specific impact of interview data on reliability was not reported, reliability of case formulations did not improve with more data (e.g., test results and audio-visual recordings) or when formulations were restricted to discrete areas (e.g., overt problems).

One discrete area in which some practitioners contend they achieve highly reliable judgments from interviews is personality appraisal. But a body of research indicates the level of consensus achieved when raters assess most personality traits is neither impressive nor meaningfully improved by modest increases in acquaintance with the rated subject (Kenny, 1994). Even when rated from the best information sources, the simplest personality traits require at least five independent raters (Connelly & Ones, 2010) or substantial contact (i.e., over the course of years; Kenny, Albright, Malloy, & Kashy, 1994) to achieve minimally acceptable levels of reliability. Indeed, in routine practice, interviews of subjects are unlikely to increase reliability of personality judgments beyond that which can be obtained from other informants (Achenbach, Krukowski, Dumenci, & Ivanova, 2005).
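
The arithmetic behind the multiple-rater requirement can be illustrated with the Spearman-Brown formula, which projects the reliability of the average of k independent raters from a single-rater reliability. The starting value of 0.40 below is an illustrative figure in the range reported for single observers, not a quantity taken from the cited studies.

```python
# Spearman-Brown projection: reliability of the mean of k independent
# raters, given the reliability of a single rater (illustrative values).
def pooled_reliability(r_single, k):
    return k * r_single / (1 + (k - 1) * r_single)

for k in (1, 2, 5, 10):
    print(k, round(pooled_reliability(0.40, k), 2))
# -> 1 0.4, 2 0.57, 5 0.77, 10 0.87: roughly five independent raters are
# needed before the aggregate judgment clears conventional thresholds.
```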

Across numerous domains, practitioners can reasonably expect to rate subjects at least as reliably without an interview as with an interview, so long as they use sound procedures and rely on appropriate sources of information. For instance, practitioners can expect to rate neuroticism, extraversion, and agreeableness as reliably as close acquaintances can, provided the practitioners rely exclusively on audio cues (Connelly & Ones, 2010; see Table 10.1). Ratings of another personality construct, psychopathy, are at least as reliable without an interview as with an interview, provided sufficient file information and a structured scheme are utilized (e.g., Wong, 1988). When the rating scheme is structured appropriately, personality traits and PDs thought to be relevant to espionage cases are also rated reliably without interviews (i.e., Pearson r and intraclass correlation coefficients ≥ 0.80; Lenzenweger, Knowlton, & Shaw, 2014). Coupled with the limits of interview- and self-report-based methods of data collection, findings such as these strongly support less traditional assessments of personality traits and disorders (e.g., Marcus & Zeigler-Hill, 2016).

Table 10.1 Reliability Estimates: Sources and Observation Methods for Indirect Assessment of the Big Five Personality Traits

Dimension                        r (SD)         k     N
Neuroticism (All)                0.33 (0.14)    72    13,458
  Friends                        0.38 (0.11)    16     3,102
  Family                         0.37 (0.16)     5       774
  Strangers                      0.23 (0.15)    41     3,723
    Audio cues only              0.32 (0.14)     9       315
    Natural behavior             0.32 (0.16)    15     2,136
Extraversion (All)               0.43 (0.13)    82    12,438
  Friends                        0.46 (0.08)    16     3,111
  Family                         0.45 (0.08)     5       774
  Strangers                      0.40 (0.17)    49     4,238
    Natural behavior             0.50 (0.10)    16     2,124
    Activity (audio + visual)    0.48 (0.11)    19     2,388
    Audio cues only              0.45 (0.25)    10       393
    Prescribed behavior          0.45 (0.06)     3       267
Openness (All)                   0.32 (0.13)    53     7,990
  Friends                        0.43 (0.05)     9     2,077
  Family                         0.38 (0.07)     2       185
  Strangers                      0.30 (0.17)    31     3,601
    Personal object              0.42 (0.12)     5       412
Agreeableness (All)              0.32 (0.14)    83    10,689
  Friends                        0.34 (0.11)    20     3,263
  Cohabitators                   0.33 (0.06)     8     1,172
  Strangers                      0.27 (0.16)    48     4,094
    Audio cues only              0.35 (0.28)    10       393
    Activity (audio + visual)    0.31 (0.12)    19     2,424
Conscientiousness (All)          0.36 (0.13)    64    11,523
  Friends                        0.37 (0.08)    20     3,394
  Strangers                      0.28 (0.15)    35     3,466
    Activity (audio + visual)    0.35 (0.13)    15     2,260
    Personal object              0.33 (0.11)     5       412

Note: Connelly and Ones (2010) reported observed and corrected mean interrater reliability coefficients by observer source. r = mean observed interrater reliability coefficient; SD = observed standard deviation of interrater reliability coefficient; k = number of independent samples contributing data; N = sample size.

Validity. As discussed, practitioners’ inferences based on interviews share a limited amount of variance with inferences based on other sources. This finding raises at least two questions. First, are practitioners’ inferences based on interviews (often drawn from relatively brief, unstructured contacts with subjects in a single context) more valid than inferences based on other sources (e.g., standardized tests, interviews of third parties whose contact with the subject is comparatively greater and spans multiple contexts, comprehensive reviews of files covering several years and areas of life)? Second, even if practitioners’ interview-based inferences are less valid than those based on other sources, might they still add accurate and unique information beyond that which is gleaned from other sources?

Connelly and Ones (2010) provide a partial answer to these questions in the context of personnel selection. In their systematic review and quantitative synthesis of more than 200 independent samples and 40,000 targets (i.e., “subjects,” as used in this chapter), they found that others’ ratings of several personality characteristics predicted job performance more strongly than did self-ratings of those same characteristics. Especially strong correlations were found between other-rated conscientiousness and job performance, as well as other-rated openness and job performance. The addition of self-ratings to other-ratings did not add incrementally to the prediction of job performance.

These findings underscore the relatively limited value of interview data for the prediction of job-related outcomes (Morris, Daisley, Wheeler, & Boyer, 2015; cf. McDaniel, Whetzel, Schmidt, & Maurer, 1994). This is especially true for unstructured interviews (Schmidt & Hunter, 1998). To be sure, a practitioner can expect to predict job performance as well from a combination of general mental ability (GMA) test scores and unstructured interview data as from a combination of GMA scores and any one of a number of alternative variables, including scores on measures of Openness and Conscientiousness (Schmidt, Oh, & Shaffer, 2016).2

Notably, practitioners can expect to make valid inferences about a subject’s openness and conscientiousness on the basis of extraordinarily brief encounters (Ambady & Rosenthal, 1992; Slepian, Bogart, & Ambady, 2014). If the subject is unavailable, practitioners might instead make inferences on the basis of his personal documents, such as autobiographies, diaries, or letters (Allport, 1942; Borkenau, Mosch, Tandler, & Wolf, 2016). Alternatively, they might examine his social media postings (Stoughton, Thompson, & Meade, 2013). Or they might consider the variety of books on his office shelves or degree of organization and clutter in his workspace (for a review, see Gosling, 2008). Even his garbage might reveal accurate information about him and his pattern of life; as explained by Rathje and Murphy (2001, p. 54), “What people have owned—and thrown away—can speak more eloquently, informatively, and truthfully about the lives they lead than they themselves ever may.”

Deception detection. Laypersons from many and diverse cultures agree that certain behaviors signal deception. As examples, most people believe that when others lie, they often make poor eye contact or avert their gaze; shift body posture or touch their face; or exhibit such speech disturbances as pauses, “ah” utterances, or rate changes (Global Deception Research Team [GDRT], 2006). These beliefs are shared by presumed experts in deception detection, such as law enforcement officers, intelligence officers, polygraphers, and psychologists (Bogaard, Meijer, Vrij, & Merckelbach, 2016; Ekman & O’Sullivan, 1991; Stromwall, Granhag, & Hartwig, 2004). They are reinforced by pop-culture guides that purportedly teach the public at large to detect deception (e.g., Craig, 2012). Yet a sizable body of research clearly indicates many commonly held “signs” of deception do not meaningfully discriminate truths from lies (DePaulo et al., 2003).

Accompanying the widespread misunderstanding of behavioral cues are many misconceptions about the conditions thought to impact a person’s ability to detect deception. For instance, law enforcement officers, prosecutors, and judges believe deception is more easily and accurately detected by conducting face-to-face interviews than by merely observing videotapes (Stromwall & Granhag, 2003). But a sizable body of research refutes this belief (e.g., Hartwig & Granhag, 2015). Not only can reliable truth–lie discriminations be made on the basis of limited to no contact with a subject (DePaulo et al., 2003), but accuracy does not increase with added exposure time (Bond & DePaulo, 2008). Indeed, deception judgments are at least as accurate when based on transcripts of interactions as when based on the interactions themselves (Bond & DePaulo, 2006; Hartwig & Bond, 2014).

Table 10.2 displays the magnitude of effect sizes of many empirically based cues to deception. It is readily apparent that none of them requires face-to-face contact with subjects; rather, all of them can be based exclusively on observation (also see Aamodt & Custer, 2006). Although not large in an absolute sense, the effect sizes are comparable to those found in other areas of applied psychology (e.g., Richard, Bond, & Stokes-Zoota, 2003). Collectively, this body of research suggests that practitioners can expect to make reasonably reliable and valid judgments about deception, whether or not they have face-to-face contact with a subject.

Table 10.2 Validity Estimates: Behaviors More Suggestive of Deception Than Truthfulness

Cue                                       d (95% CI)          Q       k     N

Verbal
Less verbal and vocal immediacy           0.55 (0.41–0.70)    26.3*    7   373
Less likely to admit lack of recall       0.42 (0.15–0.70)    18.7*    5   183
Less time talking                         0.35 (0.16–0.54)     8.1     4   207
More external associations                0.35 (0.02–0.67)     2.1     3   112
More discrepancies, ambivalence           0.34 (0.20–0.48)    14.3*    7   243
Fewer details                             0.31 (0.21–0.38)    76.2*   24   883
More verbal and vocal uncertainty         0.30 (0.17–0.43)    11.0    10   329
Fewer spontaneous corrections             0.29 (0.02–0.56)     3.8     5   183
More vocal tension                        0.26 (0.13–0.39)    25.4*   10   328
Less logical                              0.25 (0.04–0.46)    21.5*    6   223
Less plausible                            0.23 (0.11–0.36)    13.1     9   395
Less verbal and vocal involvement         0.21 (0.08–0.34)     5.8     7   384
More word and phrase repetitions          0.21 (0.02–0.41)     0.5     4   100
More negative statements, complaints      0.21 (0.09–0.32)    21.5*    9   397
Higher voice pitch, frequency             0.21 (0.08–0.34)    31.2*   12   294

Nonverbal
Less cooperative in general               0.66 (0.38–0.93)    11.2*    3   222
Greater pupil dilation                    0.39 (0.21–0.56)     1.1     4   328
More signs of nervousness, tension        0.27 (0.16–0.38)    37.3*   16   571
Raised chin                               0.25 (0.12–0.37)    31.9*    4   286

Note: DePaulo et al. (2003) systematically reviewed and quantitatively analyzed 116 studies that had compared the behaviors of adults who were lying with the behaviors of adults who were telling the truth. North American students, most of whom had no motivation to tell successful lies, comprised the substantial majority of the 120 independent samples. Two cues were more strongly related to deception when message senders were motivated to succeed with their lies than when they had no motivation to succeed: higher vocal frequency or pitch (d = 0.59, CI = 0.31–0.88, Q = 9.7, k = 6) and increased nervousness or tension (d = 0.35, CI = 0.11–0.58, Q = 23.4*, k = 8). d = weighted standardized mean difference; CI = confidence interval; Q = homogeneity statistic, where an asterisk indicates considerable differences across samples; k = number of independent effect sizes; N = total number of participants in the studies.
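
To give the entries in Table 10.2 an intuitive scale, a standardized mean difference can be converted into the probability that a randomly selected deceptive statement outscores a randomly selected truthful one on the cue—equivalently, an area under the ROC curve—under the common assumption of normal, equal-variance distributions. The sketch below applies the conversion AUC = Φ(d/√2) to three cues from the table.

```python
# Convert a standardized mean difference (d) to the probability that a
# randomly chosen deceptive statement exceeds a randomly chosen truthful
# one on the cue: AUC = Phi(d / sqrt(2)), assuming normal distributions
# with equal variances. The d values come from Table 10.2.
from math import erf, sqrt

def d_to_auc(d):
    z = d / sqrt(2)
    return 0.5 * (1 + erf(z / sqrt(2)))  # standard normal CDF at z

for cue, d in [("Less cooperative in general", 0.66),
               ("Fewer details", 0.31),
               ("Higher voice pitch, frequency", 0.21)]:
    print(f"{cue}: AUC = {d_to_auc(d):.2f}")
# -> 0.68, 0.59, 0.56: each cue is modestly, not dramatically, better
# than the 0.50 expected by chance.
```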

Violence risk assessment. Psychologists assess violence risk across a variety of contexts (e.g., Heilbrun, 2009; Mills, Kroner, & Morgan, 2011). In some contexts, risk assessments are highly formal, deliberate, and comprehensive (e.g., civil commitment, bond, criminal sentencing, and parole). In other contexts, they are informal and intuitive, with the practitioner’s ultimate judgment often inferable only from the resulting disposition (e.g., emergency room discharge and end of therapy session). Violence risk assessment methods vary in accordance with these degrees of formality (Mrad & Neller, 2015).

In formal, high-stakes settings, experts commonly use actuarial models to assess risk for violence (e.g., Jackson & Hess, 2007). Actuarial risk assessment instruments (ARAIs) combine statistically derived variables to produce numerical probability statements. By contrast, unstructured clinical judgments (UCJs) involve nonstandardized collection and combination of data, and result in largely impressionistic conclusions (Dawes, Faust, & Meehl, 1989). Whereas ARAIs ordinarily can be scored without interview data, UCJs rest heavily upon impressions formed from interviews.
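
The actuarial logic can be made concrete with a deliberately schematic sketch: statistically derived items are coded from records, weighted, summed, and mapped onto the outcome rate observed for that score band in a construction sample. The items, weights, and rates below are hypothetical placeholders, not those of any published instrument.

```python
# Schematic of actuarial scoring only; items, weights, and outcome rates
# are hypothetical, not those of any published risk instrument.
RISK_ITEMS = {
    "prior_violent_offense": 2,  # weights would be statistically derived
    "age_under_30": 1,
    "never_married": 1,
    "substance_misuse": 1,
}

# Experience table: total score -> violence rate observed in a
# (hypothetical) construction sample.
EXPERIENCE_TABLE = {0: 0.07, 1: 0.12, 2: 0.21, 3: 0.35, 4: 0.48, 5: 0.55}

def actuarial_estimate(case_file: dict) -> float:
    """Sum the weighted items coded from records; look up the outcome
    rate attached to that score band. No interview data are required."""
    score = sum(w for item, w in RISK_ITEMS.items() if case_file.get(item))
    return EXPERIENCE_TABLE[score]

print(actuarial_estimate({"prior_violent_offense": True, "age_under_30": True}))  # 0.35
```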

More than a half-century of research indicates actuarial models are at least as accurate as—and in many cases more accurate than—UCJs for drawing a wide range of inferences (Grove & Meehl, 1996; Grove, Zald, Lebow, Snitz, & Nelson, 2000). ARAI scores are more reliable than UCJ-based risk inferences (American Psychological Association, 2011; also see Singh, Serper, Reinharth, & Fazel, 2011). They also yield higher validity coefficients (Ægisdottir et al., 2006). In the assessment of risk for sexually violent recidivism, for instance, the mean effect size of ARAI scores is roughly 50 percent larger than the mean effect size of UCJ-based inferences (Hanson & Morton-Bourgon, 2009). The validity of ARAI scores for the prediction of general violence is about as high as the validity of mammograms for the detection of breast cancer (Fazel, Singh, Doll, & Grann, 2012; Mushlin, Kouides, & Shapiro, 1998).

Perhaps the most persuasive evidence of the accuracy of indirect assessments and limited value of interview data with regard to violence risk is gleaned from a meta-analysis conducted by Campbell, French, and Gendreau (2009). In 88 truly prospective studies, mean effect sizes of violence predictions made exclusively on the basis of file reviews were more than twice as large in magnitude as those made exclusively on the basis of interviews. When added to file reviews, interview data did not meaningfully increase the accuracy of predictions of community recidivism, and they significantly reduced the accuracy of predictions of institutional violence. These findings support the view that ARAIs are good enough—and interviews poor enough—that practitioners can justify relying exclusively on the former and eliminating the latter when assessing violence risk (cf. Quinsey, Harris, Rice, & Cormier, 2006).

Ethical Considerations

In 1964, roughly a decade after the APA published its first Ethics Code, Fact Magazine surveyed psychiatrists regarding the fitness of Senator Barry Goldwater to serve as U.S. president. Nearly 2,500 psychiatrists responded to the survey. Roughly half opined Goldwater was unfit. The remainder were split between the opinion that he was fit and the view that they lacked sufficient information to make a judgment about his fitness.

The event embarrassed a number of physicians, and it outraged Goldwater and members of the public (Kroll & Pouncey, 2016). Nearly 10 years later, in 1973, the American Psychiatric Association (ApA) published the so-called Goldwater Rule, currently worded as follows:

On occasion psychiatrists are asked for an opinion about an individual who is in the light of public attention or who has disclosed information about himself/herself through public media. In such circumstances, a psychiatrist may share with the public his or her expertise about psychiatric issues in general. However, it is unethical for a psychiatrist to offer a professional opinion unless he or she has conducted an examination and has been granted proper authorization for such a statement. (Section 7.3; ApA, 2013b; see Stone [2018] for a thoughtful review)

A half-century later, psychiatrists and other mental health professionals conducted indirect assessments of President Donald Trump, some of which were collected and published as a single volume (Lee, 2017). That same year, ApA (2017) reaffirmed psychiatrists’ obligation to continue to follow the Goldwater Rule, offering the following rationale: professional opinions offered without direct interviews (1) compromise the integrity of the physician and profession, (2) have the potential to stigmatize people with mental illness, and (3) violate the principle of informed consent.3

In the context of indirect assessment, psychologists’ obligation to maintain integrity rests chiefly with their duties to strive for accuracy and to honestly acknowledge their limits (Mossman, 1994; see Meloy, 2004, for applied examples). Research findings already discussed in this chapter clearly refute any blanket argument that interviews are necessary for accurate assessment (for an excellent review, see Lilienfeld, Miller, & Lynam, 2018). Therefore, ApA’s first concern about indirect assessments can be largely dismissed as a misunderstanding regarding the state of the science.

The second concern expressed by the ApA, avoiding stigma, has no logical connection to the issue at hand. It seems based more on the public image of psychiatry and psychiatric patients than on any serious consideration of ethical principles or standards. Furthermore, because diagnoses are not rendered in many contexts in which indirect assessments are conducted, stigma-related issues can be minimized if not completely avoided. This means ApA’s second concern can also be readily dismissed from the present discussion. Accordingly, this section addresses mainly the third reason expressed by the ApA, informed consent, then segues into discussion of harms that might occur in connection with indirect assessments (Acklin, 2018).

In the first edition of their seminal work, published shortly after the Fact Magazine survey was conducted but well before the Goldwater Rule was implemented, Webb et al. (1966) deliberately avoided grappling with ethical issues that might arise from the use of unobtrusive measurement in the social sciences. The second edition of their work was published less than a decade after the Goldwater Rule was formulated. In it, they devoted an entire chapter to the issue (Webb, Campbell, Schwartz, Sechrest, & Grove, 1981).

Webb et al. (1981) identified two primary ethical issues to consider—the subject’s right to privacy and, like the ApA (2013b), the investigator’s need to obtain informed consent. They explicitly rejected the notion of any right to privacy in contexts involving analysis of the behavior of public figures or “spying … in some parts of the criminal justice system” (p. 148). They also identified problems in attempting to apply the doctrine of informed consent to all situations, acknowledging that informed consent may sometimes be impossible to obtain. Even if informed consent is feasible, the person from whom consent should be obtained is not always clear (Staal, 2018; Staal & Greene, 2015).

In a number of circumstances, the subject of the assessment is not the same person from whom consent should be sought (Koocher, 2009). This arises, for example, in contexts where the client is a third party rather than the subject of the assessment (see, e.g., Greenberg & Shuman, 1997; Monahan, 1980; Strasburger, Gutheil, & Brodsky, 1997). Such are the contexts in which indirect assessments are ordinarily performed (Morgan et al., 2006).

The discomfort that some psychologists may experience while conducting assessments without a subject’s informed consent is not based on any prohibition from the APA’s Ethics Code. Indeed, the APA’s Ethics Code explicitly states consent is not required in a variety of circumstances, such as when assessments are (1) mandated by law or governmental regulation, (2) implied because they are performed as a routine institutional or organizational activity, or (3) rendered moot because they are focused on the subject’s decisional capacity (2002, 9.03(a); as amended, 2016).

Furthermore, the APA’s Ethics Code does not require psychologists to interview individuals before offering opinions about them (Canter, Bennett, Jones, & Nagy, 1996; cf. Miller & Evans, 2004). Indeed, no previous version of the APA’s Ethics Code has ever included such a mandate (Myers, Neller, de Leeuw, & McDonald, 2017). This is true even for situations in which the subject of the assessment could be harmed (DeMatteo, Neller, Supnick, McGarrah, & Keane, 2017; also see Koocher, 2009).

Psychologists who conduct indirect assessments despite potential harms to the subject evidently place more weight on the concerns of their clients and society than on the interests of these non-client individuals (e.g., Behnke, 2006; Ewing, 2002; Gravitz, 2009). They are not alone. Psychologists practice ethically in multiple areas in which their actions might harm others (e.g., Neller, 2016). The placement of greater weight on the interests of their client and society over any potential or actual harms to an individual subject does not, in and of itself, violate the APA’s Ethics Code (Grisso, 2001; Staal, 2018; Staal & Greene, 2015).

In his APA presidential address, Gerald Koocher (2007) thoughtfully addressed the issue confronted by psychologists who practice in circumstances that might result in harm to others. He explained, “At times, avoiding all harm becomes impossible, and we must attempt instead to minimize harm resulting from our work. At the same time that we strive to establish relationships of trust with our clients, we must remain mindful of our professional and scientific responsibilities to society and our communities” (p. 379).

Irrespective of any potential harms, if a psychologist determines the collection of interview data is unreasonable (Schlesinger, 2017), inadvisable, precluded by the nature of the services (Canter et al., 1996), impractical, or otherwise unwarranted, she simply explains this to her client and collects information from other sources; determines if the information is sufficient for offering opinions; and, if the information is sufficient, offers data-driven opinions with appropriate disclaimers (Foote, 2017; see also 9.01(b) and 9.01(c) of the APA’s Ethics Code, as well as Guideline 9.03 of the Specialty Guidelines for Forensic Psychology [APA, 2013]). Depending on the quality of information, those opinions may be based on a record review, the results of a structured tool that can be completed without an interview, interviews of collateral sources, or any reasonable combination thereof (see Bush, Connell, & Denney, 2006; Lilienfeld et al., 2018; Neller, 2017). In the next section, practitioners will find additional guidance intended to help them improve their indirect assessments.

Principles for Practice

So far, this chapter has addressed history and context relevant to indirect assessment, with a focus on national defense, national security, and law enforcement. It has presented findings that show indirect assessments can be sufficiently reliable and valid for practice across multiple specialty areas, including but not limited to clinical, personnel, and forensic settings. It has also discussed ethical issues that might have particular relevance to indirect assessments.

The current section presents 10 foundational principles that have the potential to enhance indirect assessments, irrespective of the specific setting in which they are conducted. The set of principles is not exhaustive. But the principles are common to many diverse areas, including but not limited to psychobiography (Ponterotto, 2014), clinical psychology (Haynes et al., 2011), and forensic psychology (Heilbrun, 2001; Heilbrun, Grisso, & Goldstein, 2009; Heilbrun, Marczyk, & DeMatteo, 2002). Their commonality across diverse settings suggests they are both generally accepted and potentially useful across multiple specialty areas.

  1. Clearly identify the primary client and reason(s) for the assessment. Prior to commencing any assessment, a practitioner clearly establishes his or her primary client and the objective(s) of the assessment (Monahan, 1980). Doing so helps the practitioner discern if he or she is competent to conduct the assessment. It also helps the practitioner anticipate complications that may arise, as well as manage and meet expectations the client may have. The relational and task-oriented expectations are usually straightforward in a traditional treatment-related context: Ordinarily, the primary client is a patient of the practitioner or the patient of one of the practitioner’s colleagues; the practitioner conducts the assessment to improve his understanding of the patient so treatment can be effectively planned (Groth-Marnat, 1999).
    Making such a determination can be slightly more complicated in less traditional situations, especially when third parties request services. In a forensic setting, for instance, the subject of the assessment is not the primary client; instead, a judge or legal representative usually is (Greenberg & Shuman, 1997; Strasburger et al., 1997). In these less traditional contexts, particularly when the client is a third party, the specific issue(s) that led to the assessment must sometimes be carefully drawn out, clarified, and refined (Melton et al., 2007). Failure to do so can lead to products that lack utility or, worse, are detrimental to the client.
  2. Structure the assessment in accordance with its purpose. The practitioner’s approach to the assessment—the specific information she seeks, the methods she uses to seek it, the framework she adopts to understand and analyze the data—is guided by the client’s question(s). Incorporation of established models is one strategy a practitioner might use to guide her approach. Doing so simplifies and clarifies the task at hand. Failing to use an established model (or utilizing a poor model) can lead to misunderstanding and, as a result, poor decision making (Heilbrun, 2001).
    For instance, if a client seeks a broad understanding of a foreign leader to enhance impending negotiations or other interactions, the practitioner might follow a model proposed by Winter (2013): Gather all available information on the leader; rate the leader’s personality traits in line with an empirically supported model of personality; study the social context that shapes the leader’s behavior; make judgments regarding the content and complexity of the leader’s cognition (e.g., locus of control, self-esteem, self-confidence, and values); make inferences about the leader’s motives related to achievement, affiliation, and power; and offer recommendations for interacting with and influencing the leader. The practitioner will likely choose a simpler, more focused model if the client is seeking a more targeted assessment (e.g., Grisso, 1986).
  3. Use relevance and accuracy as guideposts for collecting and reporting information. The practitioner strives to collect accurate information that directly or indirectly addresses the client’s specific question(s). She generally refrains from seeking information that comes from unreliable sources or that will not help answer the client’s specific question(s). The importance of following this principle increases with the sensitivity of the information. For instance, if information on family functioning, unusual sexual practices, or clinical diagnoses will not directly or indirectly answer the referral question, the practitioner does not seek it. And if this information is inadvertently uncovered, she does not report it.
  4. When deciding what to report, weigh the probative value of the information against its potentially prejudicial impact. The practitioner refrains from reporting data that will likely interfere with the chief goal of most indirect assessments, that is, providing information and analysis that will improve the client’s decision making. For example, in some circumstances, a subject’s racially or ethnically charged remark can strongly influence a client. If a subject were to make a charged remark that has no bearing on the assessment objectives, the practitioner omits it from any report. To do otherwise would run the risk of drawing the client’s attention to irrelevant data and unduly influencing the client’s decision. By following this principle, the practitioner focuses her assessment on essential issues and improves the likelihood that her assessment will ultimately be of value to the client.
  5. Separate facts from inferences. The practitioner identifies facts, that is, a collection of information that is indisputably true. Based on those facts, he makes inferences, or reasoned conclusions, such as speculations about the origin, development, or meaning of the facts; generation of ideas about what else might be true; or predictions about future behavior. The separation of facts from inferences helps the practitioner think clearly about available data—what he knows and does not know, what he thinks the data mean, and what else he needs to know to answer the client’s question(s). It might also generate new questions from the client. By clearly distinguishing facts from inferences, the practitioner increases transparency and clearly communicates limitations.
  6. Use both group-based and individual-based data. In general, predictions and other types of classification decisions rest on three types of information: (1) prior probability estimates, that is, prevalence or incidence rates; (2) evidence relevant to the individual case; and (3) accuracy of classification methods (Meehl & Rosen, 1955; see the sketch following this list). For example, as part of a suicide autopsy, a practitioner might be asked if a treating clinician should have foreseen the suicide event (e.g., White, 1999). In answering this question, the practitioner culls research findings that will help him derive informed estimates of the incidence rates of suicide in the general population and among smaller groups of people who share a number of characteristics of the decedent (e.g., the suicide rate among elderly white men). He searches for suicide risk and protective factors, then applies these group data to the individual case. Focusing further on individual-based data, he examines test scores and behavior patterns that preceded and followed any suicide-related acts previously committed by the decedent. When feasible, he seeks and applies estimates of true- and false-positive rates associated with each risk factor, protective factor, and relevant behavior sample. When all of these data are combined, he is in a position to offer estimates of the decedent’s probability of suicide before the act was committed. Even when precise statistics are unavailable, the process of thinking about prior probabilities, nomothetic and idiographic data, and classification accuracy can improve decision making across multiple areas (Gigerenzer, 2002; Silver, 2015; Tetlock & Gardner, 2015).
  7. Use deductive, inductive, and abductive reasoning. A practitioner reaches conclusions in many ways when conducting an indirect assessment. In deductive reasoning, she applies generally accepted facts to reach a conclusion. In inductive reasoning, she uses specific observations to make broad generalizations about what is likely to be true. In abductive reasoning, she uses incomplete observations to discern the likeliest possible explanations or conclusions. Each form of reasoning plays an integral role.
    For example, a practitioner might be asked to indirectly and retrospectively assess whether a man understood he was committing a wrongful act when he allegedly murdered someone. The practitioner knows that most defendants who are referred for these types of assessments understand the wrongfulness of their acts (Cochrane, Grisso, & Frederick, 2001; Warren, Murrie, Chauhan, Dietz, & Morris, 2003); the practitioner knows of no meaningful differences between the defendant and other defendants referred for these assessments, so she defaults to the position that he likely understood the wrongfulness of the act (deduction). She subsequently analyzes crime scene evidence, where she observes video footage of the alleged offender putting on a mask and gloves before entering the victim’s home at night, then fleeing from the scene with a weapon in his hand; she infers that he wore a mask and gloves to avoid identification, entered the home at night to avoid detection, and fled the scene to avoid apprehension—all actions an offender might take if he knows the wrongfulness of the act (induction). When the practitioner subsequently learns that the murder weapon was found in a nearby dumpster, she infers that the defendant likely discarded the weapon upon fleeing the scene, another indication that he understood the wrongfulness of the act (abduction). The use of these three forms of reasoning—deductive, inductive, abductive—enables the practitioner to compensate for the limitations of any single approach to data analysis (see, e.g., Heuer, 1999).
  8. Consider convergent and divergent validity while testing multiple hypotheses. The use of two or more independent sources and methods to assess each variable of interest is a well-established tenet of assessment. A variety of sources and methods, such as hidden or contrived observation, trace analysis, or record review, might be used to assess traits indirectly (Webb et al., 1966). Consider, for example, a practitioner who is retained to help a company decide which job applicants will be offered positions. Based on research findings, the practitioner expects cognitive ability and motivation to explain almost all of the variance in training success and job performance (Van Iddekinge, Aguinis, Mackey, & DeOrtentiis, 2018). To indirectly assess ability, he might rely on two historical variables found in records: school grades (Roth et al., 2015) and level of educational attainment (Ritchie & Tucker-Drob, 2018). He might likewise make reasonable inferences about motivation level by examining involvement in extracurricular activities, prior evaluations of job performance, and information contained in letters of recommendation. By assessing each construct with distinct measures, the practitioner can increase the likelihood of accurate prediction.
  9. Seek peer consultation when possible and appropriate. The potential benefits of peer consultation are innumerable (e.g., Bennett et al., 2006). Among them, the process itself often involves “thinking out loud,” which can help a practitioner clarify her own thoughts about a case. It also offers a practitioner the opportunity to glean insights from outside observers, such as untested alternative hypotheses, places to look for additional sources of information, and enhanced interpretation of extant data. Furthermore, it may decrease the likelihood that a malpractice claim will be successful if later filed against the practitioner. For all of these reasons, when available, peer consultation is a staple of indirect assessment.
  10. Honestly and openly acknowledge limits. Psychological science is rife with limitations (e.g., Ferguson, 2015). When acknowledging limits of the data, sources, and methods on which he relies, a practitioner helps his client establish an appropriate level of confidence. Moreover, honest and open acknowledgment of actual limitations keeps him from overselling his value; this can have the effect of ensuring his findings are not accorded undue weight while simultaneously enhancing his credibility.
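
The base-rate reasoning in Principle 6 reduces to a minimal application of Bayes’ theorem. In the sketch below, the prevalence and classification-accuracy figures are hypothetical placeholders for the estimates a practitioner would cull from the research literature.

```python
# Minimal sketch of the base-rate reasoning in Principle 6: combining a
# prior probability with the accuracy of an indicator via Bayes' theorem.
# All numbers are hypothetical placeholders.
def posterior_probability(base_rate, true_positive_rate, false_positive_rate):
    """P(outcome | indicator present), by Bayes' theorem."""
    joint_true = base_rate * true_positive_rate
    joint_false = (1 - base_rate) * false_positive_rate
    return joint_true / (joint_true + joint_false)

# A rare outcome (1% prior) flagged by an indicator that detects 80% of
# true cases but also fires in 15% of non-cases:
print(round(posterior_probability(0.01, 0.80, 0.15), 3))  # 0.051
# Even a seemingly strong indicator leaves the posterior low when the
# prior is low, which is why group-based rates must anchor the judgment.
```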

Summary

Indirect assessments have a long and storied history. Widely performed by practitioners operating across multiple specialty areas, they are supported by a large body of research. And despite the mistaken beliefs of some, they are not prohibited by any version of the APA’s Ethics Code. The material discussed in this chapter is intended to stimulate thinking about indirect assessments. The principles presented here have the potential to enhance quality of indirect assessments, irrespective of the settings in which they are conducted. As such, by generally following them, practitioners might improve the decisions made by their clients.

Notes

1. Russ Palarea is acknowledged for his contributions. Additional colleagues offered helpful comments on earlier drafts. Opinions are not necessarily shared by them or any organization with which I am affiliated. Mistakes are mine.
2. When combined with GMA, other individual variables that predict performance at least as well as the combination of GMA and interviews include the following: scores on tests of integrity, interests, or emotional intelligence; reference checks or biographical data; and grade point average.
3. Psychologists have no obligation to follow ethical principles or standards promulgated by the ApA, and the subjects of most indirect assessments are not public figures. The similarities of professions and circumstances suggest it is nevertheless worthwhile to at least consider ApA’s rationale for the Goldwater Rule. This point is supported by a 2016 press release from APA, in which then-president Susan McDaniel mischaracterized psychologists’ ethical obligations related to the topic.

References

Aamodt, M., & Custer, H. (2006). Who can best catch a liar? A meta-analysis of individual differences in detecting deception. The Forensic Examiner, 15(1), 6–11.

Abbott, P. S. (1980). Social and behavioral sciences contributions to the realities of warfare. In J. K. Arima (Ed.), What is military psychology? Symposium proceedings (pp. 27–32). Monterey, CA: Naval Postgraduate School.

Achenbach, T. M., Krukowski, R. A., Dumenci, L., & Ivanova, M. Y. (2005). Assessment of adult psychopathology: Meta-analyses and implications of cross-informant correlations. Psychological Bulletin, 131(3), 361–382.

Acklin, M. W. (2018). Beyond the boundaries: Ethical issues in the practice of indirect personality assessment in non-health-service psychology. Journal of Personality Assessment. Advance online publication. Retrieved from http://dx.doi.org/10.1080/00223891.2018.1522639

Ægisdottir, S., White, M. J., Spengler, P., Maugherman, A., Anderson, L., Cook, R., Nichols, C. R., … Rush, J. D. (2006). The meta-analysis of clinical judgment project: Fifty-six years of accumulated research on clinical versus statistical prediction. Counseling Psychologist, 34, 341–382.

Allport, G. W. (1940). The psychologist’s frame of reference. Psychological Bulletin, 37(1), 1–28.

Allport, G. W. (1942). The use of personal documents in psychological science. New York: Social Science Research Council.

Ambady, N., & Rosenthal, R. (1992). Thin slices of expressive behavior as predictors of interpersonal consequences: A meta-analysis. Psychological Bulletin, 111(2), 256–274.

American Psychiatric Association. (2013a). Diagnostic and statistical manual of mental disorders, fifth edition. Washington, DC: Author.

American Psychiatric Association. (2013b). Principles of medical ethics with annotations especially applicable to psychiatry. Arlington, VA: American Psychiatric Association.

American Psychiatric Association. (2017). ApA reaffirms support for Goldwater Rule. Retrieved from https://www.psychiatry.org/newsroom/news-releases/apa-reaffirms-support-for-goldwater-rule

American Psychological Association. (2002). Ethical principles of psychologists and code of conduct. American Psychologist, 57, 1060–1073.

American Psychological Association. (2013). Specialty guidelines for forensic psychology. American Psychologist, 68(1), 7–19.

Arkes, H. R. (1981). Impediments to accurate clinical judgment and possible ways to minimize their impact. Journal of Consulting and Clinical Psychology, 49(3), 323–330.

Behnke, S. (2006). Psychological ethics and national security: The position of the American Psychological Association. European Psychologist, 11(2), 153–156.

Benjamin, L. T. (1986). Why don’t they understand us? A history of psychology’s public image. American Psychologist, 41(9), 941–946.

Bennett, B. E., Bricklin, P. M., Harris, E., Knapp, S., VandeCreek, L., & Younggren, J. N. (2006). Assessing and managing risk in psychological practice: An individualized approach. Rockville, MD: The Trust.

Blumenthal-Barby, J. S., & Krieger, H. (2015). Cognitive biases and heuristics in medical decision making: A critical review using a systematic search strategy. Medical Decision Making, 35(4), 539–557.

Bogaard, G., Meijer, E. H., Vrij, A., & Merckelbach, H. (2016). Strong, but wrong: Lay people’s and police officers’ beliefs about verbal and nonverbal cues to deception. PLoS ONE, 11(6), 1–19.

Bond, C. F., & DePaulo, B. M. (2006). Accuracy of deception judgments. Personality and Social Psychology Review, 10(3), 214–234.

Bond, C. F., & DePaulo, B. M. (2008). Individual differences in judging deception: Accuracy and bias. Psychological Bulletin, 134(4), 477–492.

Borkenau, P., Mosch, A., Tandler, N., & Wolf, A. (2016). Accuracy of judgments of personality based on textual information on major life domains. Journal of Personality, 84(2), 214–224.

Borum, R., Otto, R., & Golding, S. (1993). Improving clinical judgment and decision making in forensic evaluation. The Journal of Psychiatry & Law, 21(1), 35–76.

Brussel, J. A. (1968). Casebook of a crime psychiatrist. New York: Bernard Geis Associates.

Bush, S. S., Connell, M. A., & Denney, R. L. (2006). Ethical practice in forensic psychology: A systematic model for decision making. Washington, DC: American Psychological Association.

Campbell, D. T. (1958). Systematic error on the part of human links in communication systems. Information and Control, 1, 334–369.

Campbell, M. A., French, S., & Gendreau, P. (2009). The prediction of violence in adult offenders: A meta-analytic comparison of instruments and methods of assessment. Criminal Justice and Behavior, 36(6), 567–590.

Canter, M. B., Bennett, B. E., Jones, S. E., & Nagy, T. F. (1996). Ethics for psychologists: A commentary on the APA Ethics Code. Washington, DC: American Psychological Association.

Capshew, J. H., & Hilgard, E. R. (1992). The power of service: World War II and professional reform in the American Psychological Association. In R. Evans, V. Sexton, & T. Cadwallader (Eds.), The American Psychological Association: A historical perspective (pp. 149–175). Washington, DC: American Psychological Association.

Clarke, D. E., Narrow, W. E., Regier, D. A., Kuramoto, S. J., Kupfer, D. J., Kuhl, E. A., … Kraemer, H. C. (2013). DSM-5 field trials in the United States and Canada, part I: Study design, sampling strategy, implementation, and analytic approaches. American Journal of Psychiatry, 170, 43–58.

Cochrane, R. E., Grisso, T., & Frederick, R. I. (2001). The relationship between criminal charges, diagnoses, and psycholegal opinions among federal pretrial defendants. Behavioral Sciences and the Law, 19(4), 565–582.

Connelly, B. S., & Ones, D. S. (2010). Another perspective on personality: Meta-analytic integration of observers’ accuracy and predictive validity. Psychological Bulletin, 136(6), 1092–1122.

Craig, D. (2012). Detect deceit: How to become a human lie detector in under 60 minutes. New York: Skyhorse Publishing.

Dana, J., Dawes, R., & Peterson, N. (2013). Belief in the unstructured interview: The persistence of an illusion. Judgment and Decision Making, 8, 512–520.

Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668–1674.

DeMatteo, D., Neller, D. J., Supnick, J., McGarrah, N., & Keane, T., moderated by Harvey, S. (2017, August). Consultation and ethical practice: Dilemmas in forensic, operational and police psychology. Presented at the annual convention of the American Psychological Association, Washington, DC.

DePaulo, B. M., Malone, B. E., Lindsay, J. J., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129(1), 74–118.

Ekman, P., & O’Sullivan, M. (1991). Who can catch a liar? American Psychologist, 46(9), 913–920.

Evans, S. C., Reed, G. M., Roberts, M. C., Esparza, P., Watts, A. D., Correia, J. M., … Saxena, S. (2013). Psychologists’ perspectives on the diagnostic classification of mental disorders: Results from the WHO-IUPsyS Global Survey. International Journal of Psychology, 48(3), 177–193.

Ewing, C. P. (2002). Findings in spy case limit confidentiality of psychotherapy. Monitor on Psychology, 33(7), 26.

Faust, D., & Ziskin, J. (1988). The expert witness in psychology and psychiatry. Science, 241(4861), 31–35.

Fazel, S., Singh, J. P., Doll, H., & Grann, M. (2012). Use of risk assessment instruments to predict violence and antisocial behaviour in 73 samples involving 24,827 people: Systematic review and meta-analysis. British Medical Journal, 345(e4692), 1–12.

Ferguson, C. J. (2015). “Everybody knows psychology is not a real science”: Public perceptions of psychology and how we can improve our relationship with policymakers, the scientific community, and the general public. American Psychologist, 70(6), 527–542.

Flinn, L., Braham, L., & das Nair, R. (2015). How reliable are case formulations? A systematic literature review. British Journal of Clinical Psychology, 54, 266–290.

Foote, B. (2017). Expert commentary: Providing opinions of persons not examined. In G. Pirelli, R. Beattey, & P. Zapf (Eds.), The ethical practice of forensic psychology: A casebook (pp. 295–296). New York: Oxford University Press.

Garb, H. N. (2005). Clinical judgment and decision making. Annual Review of Clinical Psychology, 1, 67–89.

Gigerenzer, G. (2002). Calculated risks: How to know when numbers deceive you. New York: Simon & Schuster.

Global Deception Research Team. (2006). A world of lies. Journal of Cross-Cultural Psychology, 37(1), 60–74.

Gosling, S. (2008). Snoop: What your stuff says about you. New York: Basic Books.

Gravitz, M. A. (2009). Professional ethics and national security: Some current issues. Consulting Psychology Journal: Practice and Research, 61(1), 33–42.

Greenberg, S. A., & Shuman, D. W. (1997). Irreconcilable conflict between therapeutic and forensic roles. Professional Psychology: Research and Practice, 28(1), 50–57.

Grisso, T. (1986). Evaluating competencies: Forensic assessments and instruments. New York: Plenum Press.

Grisso, T. (2001). Reply to Schafer: Doing harm ethically. The Journal of the American Academy of Psychiatry and the Law, 29, 457–460.

Groth-Marnat, G. (1999). Handbook of psychological assessment (3rd ed.). New York: John Wiley & Sons, Inc.

Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293–323.

Grove, W. M., Zald, D. H., Lebow, B. S., Snitz, B. E., & Nelson, C. (2000). Clinical versus mechanical prediction: A meta-analysis. Psychological Assessment, 12(1), 19–30.

Hanson, R. K., & Morton-Bourgon, K. E. (2009). The accuracy of recidivism risk assessments for sexual offenders: A meta-analysis of 118 prediction studies. Psychological Assessment, 21(1), 1–21.

Hartwig, M., & Bond, C. F. (2014). Lie detection from multiple cues: A meta-analysis. Applied Cognitive Psychology, 28(5), 661–676.

Hartwig, M., & Granhag, P. A. (2015). Exploring the nature and origin of beliefs about deception: Implicit and explicit knowledge among lay people and presumed experts. In P. A. Granhag, A. Vrij, & B. Verschuere (Eds.), Detecting deception: Current challenges and cognitive approaches (pp. 125–154). West Sussex, UK: John Wiley & Sons, Ltd.

Haynes, S. N., Smith, G. T., & Hunsley, J. D. (2011). Scientific foundations of clinical assessment. New York: Routledge.

Heilbrun, K. (2001). Principles of forensic mental health assessment. New York: Kluwer Academic/Plenum Publishers.

Heilbrun, K. (2009). Evaluation of risk for violence in adults. New York: Oxford University Press.

Heilbrun, K., Grisso, T., & Goldstein, A. M. (2009). Foundations of forensic mental health assessment. New York: Oxford University Press.

Heilbrun, K., Marczyk, G. R., & DeMatteo, D. (2002). Forensic mental health assessment: A casebook. New York: Oxford University Press.

Heuer, R. J. (1999). Psychology of intelligence analysis. Washington, DC: Central Intelligence Agency Center for the Study of Intelligence.

Hill, A., White, M., & Wallace, J. C. (2014). Unobtrusive measurement of psychological constructs in organizational research. Organizational Psychology Review, 4(2), 148–174.

Hoffman, L. E. (1992). American psychologists and wartime research on Germany, 1941–1945. American Psychologist, 47(2), 264–273.

Jackson, R. L., & Hess, D. T. (2007). Evaluation for civil commitment of sex offenders: A survey of experts. Sexual Abuse: A Journal of Research and Treatment, 19, 425–448.

Jones, K. D. (2010). The unstructured clinical interview. Journal of Counseling & Development, 88, 220–226.

Kapardis, A. (2017). Offender-profiling today: An overview. In C. Spinellis, N. Theodorakis, E. Billis, & G. Papadimitrakopoulos (Eds.), Europe in crisis: Crime, criminal justice, and the way forward—Essays in honour of Nestor Courakis, Vol. II (pp. 739–754). Athens, Greece: Ant. N. Sakkoulas Publishers, L. P.

Kenny, D. A. (1994). Interpersonal perception: A social relations analysis. New York: The Guilford Press.

Kenny, D. A., Albright, L., Malloy, T. E., & Kashy, D. A. (1994). Consensus in interpersonal perception: Acquaintance and the Big Five. Psychological Bulletin, 116(2), 245–258.

Kohler, W. (1943). A perspective on American psychology. Psychological Review, 50(1), 77–79.

Koocher, G. P. (2007). Twenty-first century ethical challenges for psychology. American Psychologist, 62(5), 375–384.

Koocher, G. P. (2009). Ethics and the invisible psychologist. Psychological Services, 6(2), 97–107.

Kraemer, H. C. (2014). The reliability of clinical diagnoses: State of the art. Annual Review of Clinical Psychology, 10, 111–130.

Kroll, J., & Pouncey, C. (2016). The ethics of APA’s Goldwater Rule. The Journal of the American Academy of Psychiatry and the Law, 44, 226–235.

Langer, W. C., Murray, H. A., Kris, E., & Lewin, B. D. (1943). A psychological analysis of Adolph Hitler: His life and legend. Washington, DC: Office of Strategic Services.

Lee, B. (Ed.). (2017). The dangerous case of Donald Trump: 27 psychiatrists and mental health experts assess a president. New York: St. Martin’s Press.

Lenzenweger, M. F., Knowlton, P. D., & Shaw, E. D. (2014, June). Toward an empirically based taxonomy for espionage: A new rating system and multivariate statistical results. Paper presented at the 2nd Annual National Security Psychology Symposium, Chantilly, VA.

Lilienfeld, S. O., Miller, J. D., & Lynam, D. R. (2018). The Goldwater Rule: Perspectives from, and implications for, psychological science. Perspectives on Psychological Science, 13(1), 3–27.

Marcus, D. K., & Zeigler-Hill, V. (2016). Understanding the dark side of personality: Reflections and future directions. In V. Zeigler-Hill & D. Marcus (Eds.), The dark side of personality: Science and practice in social, personality, and clinical psychology (pp. 363–374). Washington, DC: American Psychological Association.

Matarazzo, J. (1990). Psychological assessment versus psychological testing: Validation from Binet to the school, clinic, and courtroom. American Psychologist, 45(9), 999–1017.

McDaniel, M. A., Whetzel, D. L., Schmidt, F. L., & Maurer, S. D. (1994). The validity of employment interviews: A comprehensive review and meta-analysis. Journal of Applied Psychology, 79(4), 599–616.

McIntyre, S. A., & Miller, L. A. (2007). Foundations of psychological testing: A practical approach (2nd ed.). Thousand Oaks, CA: Sage Publications.

Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52(3), 194–216.

Meloy, J. R. (2004). Indirect personality assessment of the violent true believer. Journal of Personality Assessment, 82(2), 138–146.

Melton, G. B., Petrila, J., Poythress, N. G., Slobogin, C., Lyons, P. M., & Otto, R. K. (2007). Psychological evaluations for the courts: A handbook for mental health professionals and lawyers (3rd ed.). New York: The Guilford Press.

Meyer, G. J., Finn, S. E., Eyde, L. D., Kay, G. G., Moreland, K. L., Dies, R. R., Eisman, E. J., Kubiszyn, T. W., & Reed, G. M. (2001). Psychological testing and psychological assessment: A review of evidence and issues. American Psychologist, 56(2), 128–165.

Mickolus, E. (2015). The counter-intelligence chronology: Spying by and against the United States from the 1700s through 2014. Jefferson, NC: McFarland & Company, Inc.

Miller, C., & Evans, B. B. (2004). Ethical issues in assessment. In M. Hersen (Ed.), Psychological assessment in clinical practice: A pragmatic guide (pp. 21–31). New York: Taylor & Francis Books, Inc.

Mills, J. F., Kroner, D. G., & Morgan, R. D. (2011). Clinician’s guide to violence risk assessment. New York: Guilford Press.

Monahan, J. (1980). Report of the Task Force on the role of psychology in the criminal justice system. In J. Monahan (Ed.), Who is the client? The ethics of psychological intervention in the criminal justice system (pp. 1–17). Washington, DC: American Psychological Association.

Morgan, C. A., Gelles, M. G., Steffian, G., Temporini, H., Fortunati, F., Southwick, S., Feuerstein, S., & Carie, V. (2006). Consulting to government agencies—indirect assessments. Psychiatry, 3(2), 24–28.

Morris, S. B., Daisley, R. L., Wheeler, M., & Boyer, P. (2015). A meta-analysis of the relationship between individual assessments and job performance. Journal of Applied Psychology, 100(1), 5–20.

Mossman, D. (1994). Is expert psychiatric testimony fundamentally immoral? International Journal of Law and Psychiatry, 17(4), 347–368.

Mrad, D. F., & Neller, D. J. (2015). Legal, clinical, and scientific foundations of violence risk assessment. In C. Pietz & C. Mattson (Eds.), Violent offenders: Understanding and assessment (pp. 329–341). New York: Oxford University Press.

Munsterberg, H. (1908/2018). On the witness stand: Essays on psychology and crime. New York: The McClure Company (reprinted in London, England by Forgotten Books).

Mushlin, A. I., Kouides, R. W., & Shapiro, D. E. (1998). Estimating the accuracy of screening mammography: A meta-analysis. American Journal of Preventive Medicine, 14, 143–153.

Myers, C. A., Neller, D. J., de Leeuw, J., & McDonald, S. (2017, June). Indirect assessment: An ethics discussion. Presented at the 5th Annual National Security Psychology Symposium, Chantilly, VA.

Neller, D. J. (2016). Developments that threaten forensic psychology. The Specialist, 36(1), 30–34.

Neller, D. J. (2017). Expert commentary: Maintaining the scope of the evaluation and testing rival hypotheses. In G. Pirelli, R. Beattey, & P. Zapf (Eds.), The ethical practice of forensic psychology: A casebook (pp. 289–290). New York: Oxford University Press.

Nisbett, R. E., & Wilson, T. D. (1977). Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84(3), 231–259.

Ponterotto, J. G. (2014). Best practices in psychobiographical research. Qualitative Psychology, 1(1), 77–90.

Post, J. M. (1979). Personality profiles in support of the Camp David Summit. Studies in Intelligence, 23, 1–5.

Post, J. M. (2003a). Assessing leaders at a distance: The political personality profile. In J. Post (Ed.), The psychological assessment of political leaders (pp. 69–104). Ann Arbor, MI: University of Michigan Press.

Post, J. M. (2003b). Leader personality assessments in support of government policy. In J. Post (Ed.), The psychological assessment of political leaders (pp. 39–61). Ann Arbor, MI: University of Michigan Press.

Quinsey, V. L., Harris, G. T., Rice, M. E., & Cormier, C. A. (2006). Violent offenders: Appraising and managing risk (2nd ed.). Washington, DC: American Psychological Association.

Rathje, W., & Murphy, C. (2001). Rubbish! The archaeology of garbage. Tucson, AZ: The University of Arizona Press.

Regier, D. A., Narrow, W. E., Clarke, D. E., Kraemer, H. C., Kuramoto, S. J., Kuhl, E. A., & Kupfer, D. J. (2013). DSM-5 field trials in the United States and Canada, part II: Test-retest reliability of selected categorical diagnoses. American Journal of Psychiatry, 170, 59–70.

Rettew, D. C., Lynch, A. D., Achenbach, T. M., Dumenci, L., & Ivanova, M. Y. (2009). Meta-analyses of agreement between diagnoses made from clinical evaluations and standardized diagnostic interviews. International Journal of Methods in Psychiatric Research, 18(3), 169–184.

Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7(4), 331–363.

Ritchie, S. J., & Tucker-Drob, E. M. (2018). How much does education improve intelligence? A meta-analysis. Psychological Science, 29(8), 1358–1369.

Rogers, R. (2018). An introduction to response styles. In R. Rogers & S. Bender (Eds.), Clinical assessment of malingering and deception (4th ed., pp. 3–17). New York: The Guilford Press.

Roth, B., Becker, N., Romeyke, S., Schafer, S., Domnick, F., & Spinath, F. M. (2015). Intelligence and school grades: A meta-analysis. Intelligence, 53, 118–137.

Samuel, D. B. (2015). A review of the agreement between clinicians’ personality disorder diagnoses and those from other methods and sources. Clinical Psychology: Science and Practice, 22, 1–19.

Schlesinger, L. B. (2017). Expert commentary: Providing opinions of persons not examined. In G. Pirelli, R. Beattey, & P. Zapf (Eds.), The ethical practice of forensic psychology: A casebook (pp. 293–295). New York: Oxford University Press.

Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124(2), 262–274.

Schmidt, F. L., Oh, I.-S., & Shaffer, J. A. (2016). Working paper: The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 100 years of research findings. Unpublished manuscript.

Silver, N. (2015). The signal and the noise: Why so many predictions fail—but some don’t. New York: Penguin Books.

Singh, J. P., Serper, M., Reinharth, J., & Fazel, S. (2011). Structured assessment of violence risk in schizophrenia and other psychiatric disorders: A systematic review of the validity, reliability, and item content of 10 available instruments. Schizophrenia Bulletin, 37(5), 899–912.

Slepian, M., Bogart, K., & Ambady, N. (2014). Thin-slice judgments in the clinical context. Annual Review of Clinical Psychology, 10, 131–153.

Sommers-Flanagan, R., & Sommers-Flanagan, J. (1999). Clinical interviewing (2nd ed.). New York: John Wiley & Sons.

Staal, M. A. (2018). Applied psychology under attack: A response to the Brookline principles. Peace and Conflict: Journal of Peace Psychology, 24(4), 439–447.

Staal, M. A., & Greene, C. (2015). An examination of “adversarial” operational psychology. Peace and Conflict: Journal of Peace Psychology, 21(2), 264–268.

Staal, M. A., Neller, D., & Krauss, D., moderated by Harvey, S. (2018, August). Developing specialty practice guidelines—The case for operational psychology. Panel discussion at the annual convention of the American Psychological Association, San Francisco, CA.

Stone, A. (2018). The psychiatrist’s Goldwater Rule in the Trump era. Retrieved from https://www.lawfareblog.com/psychiatrists-goldwater-rule-trump-era

Stoughton, J. W., Thompson, L. F., & Meade, A. W. (2013). Big Five personality traits reflected in job applicants’ social media postings. Cyberpsychology, Behavior, and Social Networking, 16(11), 800–805.

Strasburger, L. H., Gutheil, T. G., & Brodsky, A. (1997). On wearing two hats: Role conflict in serving as both psychotherapist and expert witness. American Journal of Psychiatry, 154, 448–456.

Stromwall, L. A., & Granhag, P. A. (2003). How to detect deception? Arresting the beliefs of police officers, prosecutors and judges. Psychology, Crime & Law, 9, 19–36.

Stromwall, L. A., Granhag, P. A., & Hartwig, M. (2004). Practitioners’ beliefs about deception. In P. Granhag & L. Stromwall (Eds.), The detection of deception in forensic contexts (pp. 229–250). Cambridge, UK: Cambridge University Press.

Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The art and science of prediction. New York: Penguin Random House, LLC.

Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124–1131.

United States v. Squillacote, 221 F.3d 542 (4th Cir. 2000).

Van Iddekinge, C. H., Aguinis, H., Mackey, J. D., & DeOrtentiis, P. S. (2018). A meta-analysis of the interactive, additive, and relative effects of cognitive ability and motivation on performance. Journal of Management, 44(1), 249–279.

Warren, J. I., Murrie, D. C., Chauhan, P., Dietz, P. E., & Morris, J. (2003). Opinion formation in evaluating sanity at the time of the offense: An examination of 5175 pre-trial evaluations. Behavioral Sciences and the Law, 22(2), 171–186.

Webb, E. J., Campbell, D. T., Schwartz, R. D., & Sechrest, L. (1966). Unobtrusive measures: Nonreactive research in the social sciences. Chicago, IL: Rand McNally College Publishing.

Webb, E. J., Campbell, D. T., Schwartz, R. D., Sechrest, L., & Grove, J. B. (1981). Nonreactive measures in the social sciences (2nd ed.). Boston, MA: Houghton Mifflin Company.

White, T. W. (1999). How to identify suicidal people: A systematic approach to risk assessment. Philadelphia, PA: The Charles Press, Publishers.

Williams, T. J., Picano, J. J., Roland, R. R., & Bartone, P. (2012). Operational psychology: Foundation, applications, and issues. In J. H. Laurence & M. D. Matthews (Eds.), Oxford library of psychology. The Oxford handbook of military psychology (pp. 37–49). New York: Oxford University Press.

Wilson, T. D. (2009). Know thyself. Perspectives on Psychological Science, 4(4), 384–389.

Wilson, T. D., & Dunn, E. W. (2004). Self-knowledge: Its limits, value, and potential for improvement. Annual Review of Psychology, 55, 493–518.

Winter, D. G. (2013). Personality profiles of political elites. In L. Huddy, D. Sears, & J. Levy (Eds.), The Oxford handbook of political psychology (2nd ed., pp. 423–458). New York: Oxford University Press.

Wong, S. (1988). Is Hare’s Psychopathy Checklist reliable without the interview? Psychological Reports, 62(3), 931–934.

Woodworth, M., & Porter, S. (1999). Historical foundations and current applications of criminal profiling in violent crime investigations. Expert Evidence, 7, 241–264.

Wright, C. V., Beattie, S. G., Galper, D. I., Church, A. S., Bufka, L. F., Brabender, V. M., & Smith, B. L. (2017). Assessment practices of professional psychologists: Results of a national survey. Professional Psychology: Research and Practice, 48(2), 73–78.

Yerkes, R. M. (1917). Psychology and national service. Psychological Bulletin, 14, 259–262.

Yerkes, R. M. (1918). Psychology in relation to the War. Psychological Review, 25, 85–115.