Chapter 6

        The background assumptions of measurement practices in psychological assessment and psychiatric diagnosis

        Jared W. Keeley

6.1 Introduction

The validity of mental health diagnoses was questioned long before the systems were formalized into official documents like the Diagnostic and Statistical Manual of Mental Disorders (DSM) of the American Psychiatric Association or the International Classification of Diseases (ICD) of the World Health Organization. For example, Hippocrates’ humoral account of mental illness was contested by the contemporaneous Cnidian school, which argued that mental illness came from external disease agents rather than a constitutional imbalance (Weckowicz and Liebel-Weckowicz 1990). The debate arose from a difference in the two schools’ fundamental ontological assumptions, which translated into disagreements about appropriate medical diagnosis and intervention. Even in its relatively short 200-year history (Lewis 1941; Menninger et al. 1963), psychiatry has gone through multiple periods of severe criticism of the validity of its diagnostic concepts, each roughly coinciding with a major shift in the process and procedure of diagnosis (Hempel 1964; Houts 2000; Menninger et al. 1963; Szasz 1961). The ontological assumptions inherent in the procedures of diagnosis can have a profound impact upon the validation of those diagnoses.

    With the advent of new editions of both DSM and ICD, the time is ripe to consider the implications of the validity of the concepts included in these manuals. While other chapters in this volume address the finer points of changing models of validation in the context of psychiatric diagnosis, this chapter will focus on some of the background epistemic values and ontological assumptions present in psychological assessment and the process of diagnosing. Assessment is critical to understanding diagnostic validity, because it is through the measurement process, both in research and in clinical work, that information comes to bear for providing evidence for the validation of a diagnostic concept or the basis of a diagnostic decision. However, the assumptions of the measurement process can—in no small part—reciprocally shape the structure of the diagnostic entity. Certain assumptions restrict the range of possible diagnostic structures (or at least make certain structures more likely). Further, these measurement assumptions do not exist in a vacuum. Rather, they are partially determined by the decisions made by human beings, either acting as clinicians or researchers. Thus, some of the factors of human cognition undergird the measurement assumptions and exert an additional indirect influence upon the validity of diagnostic constructs. Diagnosis is a practical endeavor. The clinicians who use these diagnostic constructs are subject to quirks and biases of human cognition that limit the validity of different kinds of diagnostic constructs.

    This chapter will begin by outlining some of the influences human cognition might exert upon measurement concepts in the process of diagnosis. These influences form an ontological loop that reifies some of the assumptions made by the developers of psychological assessment methods and the clinicians who use them. The chapter will then progress to discuss some of the key assumptions made in diagnostic assessment practices and how those assumptions impact diagnostic validation.

6.2 The Interaction of Diagnostic Process and Measurement

Diagnostic validity involves the process of diagnosis as much as it does the ontological assumptions surrounding the underlying concepts. Both the process and the ontology include a variety of epistemological assumptions about how we come to know anything and the optimal way of doing so. Ideally, a thorough deconstruction and examination of those processes could help elucidate problems or missteps in diagnostic validation as those constructs and measurements are initially developed. However, if simply understanding the assumptions were enough to develop and maintain diagnostic validity, then the problems of the field would have been solved long ago. There are additional processes that influence clinicians and researchers regarding diagnostic validation. Specifically, the cognitive processes involved in making a diagnosis (in a practical, day-to-day manner) also impact assumptions made about the underlying measurement properties.

    Even for clinicians who eschew formal diagnosis on theoretical grounds, some level of diagnostic process is fundamental to all clinical acts. Without understanding the nature of the person’s presenting complaint (i.e., an assessment), it is impossible to make determinations about a course of clinical action (unless one literally assumes all people and all problems are the same and provides a “one size fits all” approach to treatment). Thus, the way in which a clinician conceptualizes the information provided by an assessment has implications for later clinical decisions. Just as there are a number of ontological and epistemological assumptions underlying the measurement process, similar assumptions affect the clinician’s decision-making process.

    For example, a clinician’s assumptions about the ontology of mental disorders will shape what information he or she gathers in an assessment. If a clinician believes that psychiatric symptoms are fundamentally different from normal processes, he or she will look for discrete or categorical demarcations between them. In contrast, a clinician who believes psychiatric symptoms vary continuously in the population will subscribe to a dimensional model of psychopathology. These two clinicians are more likely to adopt assessment strategies that are compatible with their underlying assumptions and conceptualizations. Thus, the first clinician may be more comfortable with a checklist of the presence or absence of symptoms. The second clinician might be more likely to include self-report inventories that provide dimensional scores referenced against a normative population. The irony here is that the choice of assessment measure provides information that reinforces the clinician’s initial assumptions. For example, by constantly seeing the dimensionality of a client’s information, the second clinician is further convinced that his or her assumptions have been validated. However, both categorical and dimensional assumptions can be applied to every client. Of course, the process being described here is reification, which has long been recognized in the mental health literature as a problem for clinical science and practice (Hyman 2010).

    Mental health diagnosis is not unique; reification pervades all of human experience. The process of a clinician (or scientist) reifying a diagnostic concept has its roots in cognitive-perceptual processes. Experience shapes perception in a top-down fashion, whereby cognitive knowledge structures and expectations influence the way in which information is received and processed (Kveraga et al. 2007).

    What aspects of clinicians’ experience are most relevant to shaping how they perceive disorders? I will divide these influences into two domains: cognitive processes common to all human beings, and processes specific to the education of mental health professionals. I hold no illusion that these are the only two influences, or that they are mutually exclusive in any meaningful way. I use the distinction merely as a contrivance for elucidating their effects.

6.2.1 Common Features of Human Cognition

There are some inherent commonalities in the way human beings operate generally that are relevant to the process of diagnosis. First, human beings tend to organize their world in categorical, hierarchical ways. A variety of work from anthropology (e.g., Atran 1990; Berlin 1992) and cognitive psychology (Coley et al. 2004; Deneault et al. 2005; Johnson and Mervis 1997; Lopez et al. 1997; Medin et al. 1997; Shafto and Coley 2003) has found that human beings organize the living world in very particular ways. Specifically, they organize living things into groups (categories) that reflect meaningful perceptual or pragmatic properties of the group. These groups are organized in a hierarchical fashion, such that smaller groups (e.g., dogs) are placed within larger groups (e.g., mammals).

    Interestingly, categorizing things into groups seems to be the way children learn these concepts, and their capacity for understanding the world in categorical terms mimics the inherent way in which human beings assimilate language in those early years. Thus, it is not surprising that children grow up to organize other aspects of their experience in similar ways (e.g., where things are in the grocery store or mental disorders). These sorts of categorical hierarchies seem to serve the function of making access to information easier and faster by virtue of simplifying the wide array of information present in the world (Biederman et al. 1999; Rosch et al. 1976). Flanagan and Blashfield (2007) have applied this notion of “folk taxonomies” to how mental health professionals organize mental disorders, and indeed, clinicians follow many of the same “natural” patterns that arise from common human mechanisms for organizing objects and constructs in the world.

    If there is something “natural” or inherent to utilizing categorical organizations (regardless of whether it is genetically or culturally driven), then it stands to reason that clinicians—being human beings—will default to organizing mental disorders in categories. Thus, an intuitive understanding of non-categorical models of mental disorder (regardless of their scientific validity) may be more difficult to acquire. Indeed, a preponderance of evidence suggests that many (if not most) mental disorder concepts should reasonably be considered dimensional at a latent level (Eaton et al. 2011; Haslam et al. 2012; Wright et al. 2013). The point is not that this evidence is mistaken. Rather, the point is that using dimensional organizational structures for mental disorders, even if they better reflect the scientific validity of the concept, may be an uphill battle for human cognition. Preserving all the additional information of a dimension may overload clinicians’ cognitive capacities, such that they will implicitly (or even explicitly) try to simplify the amount of information for easier storage and use.

    A second factor affecting the diagnostic process for clinicians is the overconfidence bias. Again, the overconfidence bias is not something unique to mental health; rather, it is common to human cognition in nearly all settings (West and Stanovich 1997). In essence, after people make a judgment, they tend to be more confident in that judgment than their accuracy would justify. In other words, people are often wrong, but they nearly always believe they are right once they have committed to a decision. Clinicians are the same after making diagnoses. Reliability statistics show that clinicians, in real-world settings, can have relatively low agreement on their diagnoses (especially for common conditions like Major Depressive Disorder; Regier et al. 2013). However, individual clinicians continue to be relatively confident in their diagnoses (Smith and Dumont 2002). Thus, their overconfidence in the correctness of their diagnoses may lead clinicians to ignore disconfirming evidence, further reifying their understanding of the diagnostic construct and how they assess it.

6.2.2 Education and Reification

It is reasonable to assume that clinicians’ extensive training has an impact on the way they think. Indeed, a common definition of learning is an enduring change in cognition or behavior (Barker 2001). The educational process for mental health professionals has many goals, including gaining an understanding of the variety and forms of psychopathology, developing skills in case conceptualization, and learning how to formulate a treatment plan (e.g., see the “cube model” for a representation of educational goals for psychologists; Fouad et al. 2009; Rodolfa et al. 2005). Presumably, the educational process results in greater knowledge about these topics, although the various mental health disciplines are relatively unsystematic in demonstrating that they have met these goals. Nonetheless, if a clinician has never been exposed to a way of thinking about diagnosis (e.g., to dimensional models of psychopathology), it stands to reason that the clinician will be less likely to implement such a model in his or her practice. Thus, the educational system might place limits or boundaries upon the range of responses a clinician can utilize. If clinicians are to expand those boundaries, it will have to occur after completing a degree, through continuing education or self-study.

    Most educational systems provide some level of exposure to formal medical nosologies such as the DSM or ICD. One purpose of formalizing a classification system is to standardize the language used by professionals (Keeley et al. in press). Thus, indoctrination into the field requires some exposure to the common language of that field, including diagnostic terms like “Major Depressive Disorder” and “Schizophrenia.” The structure of diagnoses in these manuals tends to follow a set of underlying ontological assumptions (Sadler 2005). First, the DSM and ICD operate from the medical model, in which a disorder is an imperfect pattern of commonly co-occurring signs and symptoms that represents a (sometimes presumed) pathological process. When individuals are trained with these manuals, the idea of a syndrome becomes, to some degree, inherent in their representations of disorders. They then organize their understanding of psychopathology around these patterns of behavior in the overall landscape of all mental health symptoms.

    Exposure to this model begins the process of reification. If nothing else, the DSM and ICD serve as a baseline against which alternative models of psychopathology are compared. Clinicians likely use DSM and ICD to structure their understanding of psychopathology when they start to see actual patients. Thus, rather than forming a bottom-up understanding of symptoms and their arrangement, students start with top-down concepts that they “discover” in their patients. The educational process, by virtue of coming first, inherently imposes some arrangement upon clinicians’ perceptions. To be clear, I am not stating that the alternative of having each clinician construct a bottom-up classificatory system for psychopathology based upon individual experience is desirable. Such a state of affairs would create mass confusion in the field, as evidenced by the state of mental health services prior to DSM-I (see Grob 1991 and Houts 2000 for commentary). Rather, I am claiming that there will be some a priori biases to clinicians’ views by virtue of having been trained to be a clinician.

    Nevertheless, clinicians’ experience beyond their education seems to have an impact on their understanding of diagnoses. For example, a series of studies examining how mental health professionals would organize a classification of mental disorders has shown that clinicians do reproduce some familiar aspects of the DSM or ICD. However, they also show unique (and consistent) features that are not present in the DSM or ICD (Flanagan et al. 2008; Reed et al. 2013; Roberts et al. 2012). For example, clinicians do not preserve personality disorders as a coherent group. Rather, they tend to spread those disorders throughout the classification based upon the disorder’s phenomenology (e.g., Avoidant Personality Disorder is placed with anxiety disorders, Schizoid Personality Disorder is placed with psychotic disorders, Borderline Personality Disorder is placed with mood disorders; Flanagan and Blashfield 2006).

6.3 Education in Measurement and Assessment Basics

Clinicians who undergo psychological or academic psychiatric training are exposed to a variety of information about measurement. These experiences also shape their decisions about diagnosis and assessment. This information will form a necessary backbone for investigating the underlying ontological assumptions and epistemic values that are part of diagnostic validity. From a practical standpoint, the process of diagnosis requires an initial period of information-gathering (a.k.a., an assessment). That assessment, whether explicitly called such or not, is inherently a measurement process. Information is gathered, and that information has particular properties.

    Common practices for diagnostic assessment vary widely across mental health disciplines and professionals. Some ways of gathering information include diagnostic interviews, clinician-administered instruments, self-report questionnaires, neuropsychological tests (often performance based), projective tests, and many others. Other procedures, although not commonly used, might have potential as sources of diagnostic information, including neuroimaging scans, genetic testing, observation of the person’s social interactions or occupational performance, etc. Depending upon the presenting complaint of the person, a mental health professional might use any or all of these sources of information. Regardless of the source, the information can be classified into two kinds: qualitative and quantitative.

    Qualitative information retains its original form (usually verbal language, although it could be another form, like a drawn picture or a facial expression) and—unlike quantitative information—is not assigned a numeric value. There is nothing inherent about a response that requires it to be treated qualitatively or quantitatively. Assessment responses traditionally treated as qualitative could be assigned a value; usually that process is simply considered too difficult or burdensome to be useful.

    Quantitative information, on the other hand, is codified and assigned a numerical value. Quantitative responses may take on any of a set of values, and thus are appropriately termed “variables.” Variables may take one of four measurement scales: nominal, ordinal, interval, or ratio. Each measurement scale represents a different level of assumption about the kind of information portrayed by the numerical value. Nominal values provide no other information than group membership, such as 1 = Canadian, 2 = American, and 3 = Mexican. In a true nominal scale, there are no justifiable values in between the discrete groups. An ordinal scale adds a comparative value to the groups and ranks them. However, with an ordinal scale, the relative distance between the groups is unknown. The distance between 1 and 2 might or might not be the same as the distance between 2 and 3. Interval and ratio scales preserve the spacing between units, such that the movement from 1 to 2 to 3 on the scale represents equal amounts of difference. The difference between the two is that ratio scales have a true zero point, meaning that meaningful ratios can be computed from the numbers.
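
    To make these distinctions concrete, the brief sketch below (in Python, with invented codes and values) shows which summary operations each scale licenses; the particular numbers carry no substantive meaning.

```python
# Illustrative sketch of the four measurement scales (all values invented).
import statistics

# Nominal: numbers are arbitrary group labels; only counting is meaningful.
nationality = [1, 2, 2, 3, 1]           # 1=Canadian, 2=American, 3=Mexican
counts = {code: nationality.count(code) for code in set(nationality)}
# statistics.mean(nationality) would run, but the result is meaningless.

# Ordinal: ranks order the groups, but spacing between ranks is unknown.
severity = [1, 3, 2, 2]                 # 1=mild < 2=moderate < 3=severe
median_severity = statistics.median(severity)   # order-based summaries only

# Interval: equal spacing, but no true zero (0 degrees C is not "no heat").
temps_c = [10.0, 20.0, 30.0]
mean_temp = statistics.mean(temps_c)    # means and differences are meaningful,
# but 20 C is NOT "twice as hot" as 10 C: ratios are not licensed.

# Ratio: equal spacing plus a true zero, so ratios are meaningful.
symptom_counts = [2, 4, 8]
ratio = symptom_counts[1] / symptom_counts[0]   # 4 really is twice 2
```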

    Another important distinction that will be necessary for some of the discussion to follow is the difference between an indicator of a diagnostic concept and the concept itself. Under the DSM and ICD systems, individual symptoms and signs are taken as imperfect indicators of a disorder. The “reality” of that disorder may take on more or less meaning depending upon one’s ontological stance about measurement. The range of stances goes all the way from a belief that diagnoses correspond to real entities that are discoverable and even locatable, to a belief that they are instruments that serve a variety of practical purposes (Zachar and Kendler 2007). Regardless of the level of meaning one places on the disorder, the syndrome represents a conglomeration of symptoms and signs that seem to go together. It is important not to conflate any individual indicator (e.g., depressed mood) with the diagnostic construct (e.g., Major Depressive Disorder).

    Further, the structure of the disorder is—in part—assumed, but it can also be subjected to empirical investigation, assuming at least two competing models can be identified. The most common competing models (although not the only two) are categorical and dimensional approaches (Widiger and Samuel 2005). Categorical models assume that the diagnostic construct refers to a distinct group in a population, whereas dimensional models assume the disorder varies continuously across all individuals in the population. There are a number of models that exist between these two extremes (see Haslam 2002). A variety of data analytic strategies, including taxometrics (Meehl 1995; Ruscio et al. 2006) and factor mixture analysis (Muthen 2006; Muthen and Muthen 2010), have been developed to test these alternative models.
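
    As a toy illustration of how such competing models can be pitted against each other (a drastic simplification of taxometrics and factor mixture analysis, using simulated rather than clinical data), one can fit one- and two-group mixture models to the same scores and ask which the data prefer:

```python
# Toy contrast of categorical vs. dimensional latent structure on simulated
# scores; a drastic simplification of taxometrics/factor mixture analysis.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Dimensional world: severity varies continuously in a single population.
dimensional = rng.normal(loc=0.0, scale=1.0, size=(1000, 1))

# Categorical (taxonic) world: a small latent class sits apart from the rest.
base = rng.normal(0.0, 1.0, size=(900, 1))
taxon = rng.normal(4.0, 1.0, size=(100, 1))
categorical = np.vstack([base, taxon])

for label, data in [("dimensional", dimensional), ("categorical", categorical)]:
    bic_1 = GaussianMixture(n_components=1, random_state=0).fit(data).bic(data)
    bic_2 = GaussianMixture(n_components=2, random_state=0).fit(data).bic(data)
    best = "two latent groups" if bic_2 < bic_1 else "one latent continuum"
    print(f"{label} data: BIC favors {best}")
```

    Real taxometric and factor mixture procedures are considerably more involved, but the underlying question is the same: does the evidence favor a latent group or a latent continuum?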

6.4 Scientists are People, Too

All the factors discussed thus far regarding how clinicians’ perceptual and cognitive processes shape their diagnostic decisions also apply to the researchers who undergird the mental health field. The scientists who develop assessment measures or who investigate psychopathological constructs are subject to the same biases. As such, these individuals’ assumptions about the structure, nature, and cause of mental disorders may affect the measures they create and the constructs they investigate. If a scientist believes in the inherently categorical nature of mental disorders, then his or her measurement choices in constructing an assessment tool could reflect that ontological assumption. Scientific investigations of the ontological structure of the construct would then be all but forced to find support for its categorical nature if they used that measure.

    Similarly, researchers may adopt a measurement technology that presumes a specific ontology (like a yes or no question, discussed further in section 6.5). In that case, they may unintentionally adopt an ontological stance that shapes the diagnostic construct. Whether the researcher intentionally adopts a certain kind of measurement due to his or her ontological beliefs or simply selects a technology that does so implicitly, the measurement influences upon diagnostic validity form an ontological loop with those who create them, perpetuating ontological and epistemological assumptions about mental disorder. The process of diagnosis is reciprocally influenced by (and influences) the validation of the concept it is designed to assess. To borrow a quote from Wittgenstein, “Show me how you are searching and I will tell you what you are looking for” (1975, p. 67).

6.5 Assessment Options and Their Measurement Characteristics

The following sections will address two of the more common assessment styles and the ontological assumptions of each. The measurement choices employed in an assessment strategy simultaneously structure what sort of information is available and define what kinds of diagnostic constructs can be detected.

6.5.1 Diagnostic Interviews

A common method of gathering diagnostic information is a clinical interview. The nature of that interview can vary from an unstructured format where the clinician chooses the topics and questions based upon the quality of the interaction, to structured interviews where the wording, order, and scope of the questions are set. Arguments for the use of unstructured interviews tend to highlight the importance of qualitative information not otherwise gathered in structured interviews (Segal et al. 2012), particularly for the purpose of treatment planning or rapport building (Segal et al. 2008). These arguments state that the process of quantifying the interaction in an interview inherently loses information. Thus, they ontologically privilege the quality of information that is not typically preserved in quantitative codes (such as emotional tone, or the interviewer’s emotional reaction; Churchill 2006). However, an unstructured interview does not presume any particular ontological structures for the diagnostic constructs it investigates.

    One criticism of unstructured interviews is that individual clinicians might inadvertently miss a topic area due to theoretical dispositions, overshadowing by more florid symptoms, or other factors (Segal et al. 2012). Similarly, the way in which a question is phrased might pull for particular responses from the interviewee (Rogers 2001). For example, asking, “How bad is your depression?” might pull for a more negative and thereby pathological response from an interviewee than, “How is your depression?” Therefore, by setting the wording, ordering, and number of questions, structured interviews purportedly create a more reliable diagnostic outcome.

    Structured interviews also contain a number of ontological assumptions. The first regards the domain of content. In designing an interview to ensure complete coverage of a topic area, its authors assume that their definition of the boundary (and thereby also the components) of the domain is complete. For example, the Anxiety Disorders Interview Schedule for DSM-IV (ADIS-IV; Brown et al. 1994) is designed to be a comprehensive examination of DSM-IV anxiety disorders. Because it measures DSM-IV disorders, the validity of the interview is inherently limited by the validity of the diagnostic choices made by the individuals who developed the DSM-IV. Whether an interview is based upon the DSM or some other scheme, the validity of the measurement is inherently based upon the choices the researcher made when defining those boundaries. The biases and assumptions guiding the individuals who define the diagnostic construct are perpetuated when a researcher adopts that construct as the domain of an assessment instrument.

    A second ontologically loaded assumption of structured interviews is contained in the nominal or ordinal coding of the questions. Often, the response to an inquiry about a particular symptom is “yes, it is present” or “no, it is not.” This sort of information is consistent with a nominal, categorical definition of the symptom (though not necessarily with a categorical definition of the diagnostic construct, an equivocation that is often made). The consequence of using the yes/no response alternatives is that there are no (or negligible) options in between the two poles. The symptom is present or it is not, much in the same way a medical patient either has an abscess or does not. A partial abscess is still an abscess.

    Some psychiatric symptoms are readily amenable to that sort of measurement scheme. For example, a person either does or does not have hallucinations. “Partial hallucinations” are not given that name; rather they are labeled “unusual perceptual experiences” or “illusions,” because the experience is considered qualitatively different. There might be additional, dimensional information about the severity or frequency of the hallucinations, but their presence (which is the information relevant for current diagnostic definitions) is categorical.

    Some diagnostic interviews utilize in-between response options to address symptoms that are not as clear-cut. For example, the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I; First et al. 2002) includes coding options for absent, below threshold, or present. Including a “below threshold” option allows the measurement to capture in-between states where some of the symptom is present, but it has not yet reached diagnostic levels. For example, many individuals might experience depressed mood, but that mood might not be present “most of the day, nearly every day,” as required for a Major Depressive Episode (APA 2013). Thus, the interview can capture the information that some level of the symptom was present, but differentiate it from the higher level required for diagnosis.

    When an in-between coding option is available to interviewers, there is an implicit ontological implication that the symptom exists at varying levels. The three-point coding scale preserves the ordinal nature of the distinctions (absent < sub-threshold < present); however, it loses information about the distance between the three points. Is a sub-threshold symptom closer to absent or to present? Just how frequent is the individual’s depressed mood? For symptoms of this sort, a categorical/discrete measurement scale is less justifiable than in cases where there is a clear separability to the state.

    In instances where there is not a clear demarcation in the nature of the symptom, a measurement scheme that divides the symptom must be based upon some convention. In other words, where on the continuous distribution of depressed mood does the interviewer decide that the level of mood is indicative of disorder? Often, these operational definitions are made explicit as part of developing the interview schedule. Other times, the threshold is more implicit, or developed during standardization training. These sorts of decisions may be entirely justifiable on the basis of some value-statement, such as increased risk of harm (to self or others), impairment in important daily tasks or functions, or distress to the person. However, the category created by the cut-point is not natural, in that the measurement itself did not suggest where the line should be drawn. Rather, the cut-point is practical. Other authors (Haslam 2002; Zachar 2000) have termed this sort of category a “practical kind.”
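
    A minimal sketch of such a convention follows; the frequency cut-points are invented for illustration and are not drawn from any published interview schedule.

```python
# Hypothetical coding rule mapping a continuous symptom report (days with
# depressed mood in the past two weeks) onto the ordinal absent /
# sub-threshold / present scheme. The cut-points (0 and 12) are invented
# conventions for illustration, not published thresholds.

def code_depressed_mood(days_out_of_14: int) -> int:
    """Return 0 = absent, 1 = sub-threshold, 2 = present."""
    if days_out_of_14 == 0:
        return 0
    elif days_out_of_14 < 12:   # some disturbance, short of "nearly every day"
        return 1
    else:                       # meets the conventional frequency bar
        return 2

# Two very different presentations collapse into the same code:
print(code_depressed_mood(1), code_depressed_mood(11))   # both -> 1
# The ordinal code preserves order but discards distance, exactly the
# information loss described above.
```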

    Structured interviews are not required to use nominal or ordinal measurement schemes; the use of such schemes reflects the ontological assumptions of the authors creating them or the technology available to them. For example, an interviewer could rate the client’s depression on a scale from 1 to 10—a dimensional measurement—instead of using an ordinal set of thresholds. However, the symptoms, in and of themselves, rarely demonstrate a clear structure that obviously lends itself to one sort of measurement versus another. The choice of measurement probably reflects as much about the interview developer’s ontological stance and purpose in creating the assessment as it does about the nature of the symptom. As discussed previously, some reciprocal process of reification probably plays a role in the author’s assumptions about the nature of the symptoms and choice of how to measure them. The zeitgeist of the field also plays a role, in that cutting-edge assessment technology (like Item Response Theory) probably influences individuals’ selection of measurement options. New technologies are developed to remedy the problems of the old, and thus are preferred. Nevertheless, each measurement technology has its own implied ontology. Often the developer of an assessment method may not be aware of the implications of that method, and thus adopts unintended ontological boundaries.

    Further, the best structure of the symptom (again, be it in dimensional or categorical terms) need not correspond to the structure of the diagnostic concept. Categorical measurement of symptoms could support a dimensional diagnostic concept, and dimensional assessment of symptoms could correspond to a diagnostic category. For example, if ten symptoms of a disorder were determined categorically (present/absent), the description of the disorder overall could still be dimensional (a score ranging from 0 to 10), with more symptoms indicating more severe cases. Conversely, dimensionally assessed symptoms could be used to produce a categorical diagnosis, such as a taxon defined by individuals scoring at the high end of two or more dimensions. The best measurement scheme for the indicators of a disorder may be the same as or different from the assumed structure of the diagnostic construct. However, those choices are often conflated, leading researchers and clinicians to infer the structure of a diagnostic construct from the measurement of its components.
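
    Both directions can be stated schematically in a few lines (all symptom data and thresholds below are invented):

```python
# Schematic sketch: symptom-level and disorder-level structure are independent.

# (a) Categorical symptoms -> dimensional disorder score: ten present/absent
# judgments summed into a 0-10 severity dimension.
symptoms = [1, 0, 1, 1, 0, 0, 1, 1, 0, 1]    # 1 = present, 0 = absent
severity = sum(symptoms)                      # dimensional score, here 6/10

# (b) Dimensional symptoms -> categorical diagnosis: a taxon defined by high
# standing on at least two continuous dimensions (thresholds hypothetical).
dimensions = {"anhedonia": 7.5, "fatigue": 8.2, "guilt": 3.1}   # 0-10 ratings
n_elevated = sum(score >= 7.0 for score in dimensions.values())
diagnosis = n_elevated >= 2                   # categorical yes/no, here True

print(severity, diagnosis)
```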

6.5.2 Self-Report Questionnaires

Perhaps the second most common assessment method is the self-report questionnaire. There are two major types of self-report measures: (a) large inventories (e.g., the Minnesota Multiphasic Personality Inventory-2 [MMPI-2]; Butcher et al. 1989) that assess a variety of constructs simultaneously, and (b) small scales designed to assess a single construct (e.g., the Beck Depression Inventory-II [BDI-II]; Beck et al. 1996) or perhaps a few closely related constructs. Both rest on some similar measurement assumptions, but they also incorporate some subtle differences.

    In self-report questionnaires, each item is considered an imperfect indicator of the target construct (Embretson and Reise 2000). The combination of the items (because each represents an aspect of the construct) then creates a representation of the construct. Because each item is quantitatively measured, the sum or average of the numerical values of the items creates a continuous measurement scale (a.k.a., a dimension). The appropriate scaling (e.g., ordinal or interval) of that variable rests upon what ontological assumptions are made about the items. For example, if each item is considered to be a roughly equal indicator of the construct, then summing the items creates a scale with equal units. Classical test theory tends to assume items are equal indicators (Nunnally and Bernstein 1994), but they need not be. Item response theory is more flexible in allowing items to be weighted. However, allowing the indicators of the construct to be unequal requires additional assumptions before the scale can be considered interval (Embretson and Reise 2000). Averaging items for a total scale score necessitates at least interval data. Thus, any scale that uses an averaged score assumes interval properties for its items—an assumption that may be justified or unjustified. If the items are treated as unequal, and their aggregate as an ordinal scale, then the only mathematically justifiable combination of those items is a sum. However, the danger of a summed score is that its wide range of possible values invites the assumption that it is interval.
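
    These scaling commitments can be made explicit in a short sketch (item responses and weights are invented; the weights merely stand in for IRT-style discrimination parameters):

```python
# Invented responses to a five-item scale (0-3 Likert codes).
items = [2, 3, 1, 2, 0]

# Summing requires only that higher codes mean "more": the classical-test-
# theory default, and defensible even for ordinal items.
total = sum(items)                     # 8

# Averaging implicitly treats the codes as interval (equal spacing), because
# a mean trades units of one item against units of another.
average = sum(items) / len(items)      # 1.6 -- meaningful only if interval

# Unequal indicators: a weighted sum, used here as a crude stand-in for IRT
# scoring (which in practice estimates a latent trait rather than summing
# weighted responses). The weights are invented to mimic discrimination.
weights = [1.4, 0.6, 1.0, 1.2, 0.8]
weighted = sum(w * x for w, x in zip(weights, items))

print(total, average, weighted)
```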

    Paralleling the previous discussion of interviews, the individual items of a scale may or may not have the same measurement scale as the overall diagnostic concept. Some inventories use dichotomous response options, like true/false or present/absent. Perhaps the best-known user of this response style is the Minnesota Multiphasic Personality Inventory (MMPI). Each item is presented in a true/false format, where the individual must make a judgment about whether that characteristic is true of him- or herself. Thus, each indicator is assumed to follow a categorical present/absent structure. This ontological assumption may not be representative of the nature of the symptom (i.e., there may be true variance in severity between presence and absence) or of the nature of the respondent’s decision-making process (Morey 1996). For this reason, many self-report questionnaires, like the Personality Assessment Inventory (PAI; Morey 1996), include a Likert-style range of responses in which indicators are scored as 0, 1, 2, or 3.

    For the MMPI, on some questions people earn one point by agreeing with the item and on others by disagreeing with the item. All the item scores are summed to create a dimensional scale of the variable in question. For example, Scale 2 (Depression) ranges from 0 to 57 in its raw responses, which are then converted into standardized, norm-referenced scores. The fact that these scores are converted to a standardized distribution with a known mean and standard deviation implies interval assumptions about the scale. Scores form a continuous (if not entirely normal) distribution across the population. This conceptualization of depression then assumes a dimensional structure to the construct, and further assumes that each item is an equal indicator of the construct (which is likely not true; see Aggen et al. 2005).

    Even though the scales of the MMPI assume dimensional structures, they also employ cut-points, making them practical kinds as described earlier. Based upon the normative likelihood of individuals endorsing a similar number of items, the authors of the test assign a cut-point. Individuals above the cut-point are interpreted to have problems significant enough to warrant intervention, whereas those below do not. In other words, the dimensional scheme is converted into a category for practical purposes like the decision to initiate treatment. That said, it is a practical decision, and the line could easily be drawn in another place with similar results. Interestingly, the scales are not always used in a purely categorical way by clinicians. If an individual’s score falls close to the cut-off, the clinician reserves the right of “clinical judgment” to interpret the meaning of that score in its context, and may consider it strong enough to indicate treatment. Similarly, the dimensional nature of the scale continues to be interpreted as an indicator of severity. In other words, individuals with scores over the cut-point are sorted based upon the relative degree of how high their scores are, and that relative information may be used to inform the clinical picture and intensity of intervention.
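
    The scoring logic described in the last two paragraphs (keyed true/false items, a summed raw score, linear standardization, and a cut-point) can be compressed into a brief sketch. The item keys, norms, and threshold below are invented stand-ins, not actual MMPI materials; only the T-score formula (mean 50, standard deviation 10 in the normative sample) is a standard psychometric transformation.

```python
# Compressed sketch of keyed true/false scoring with norm-referenced
# conversion. Item keys and norms are invented, not actual MMPI materials.

responses = {1: True, 2: False, 3: True, 4: True}   # item: respondent answer
keyed_true = {1, 4}     # items scored when endorsed "true" (hypothetical)
keyed_false = {2, 3}    # items scored when answered "false" (hypothetical)

raw = sum(1 for item, ans in responses.items()
          if (ans and item in keyed_true) or (not ans and item in keyed_false))

# Linear T-score: mean 50, SD 10 in the normative sample (norms invented).
# Standardizing in this way already presumes interval properties.
norm_mean, norm_sd = 2.1, 0.9
t_score = 50 + 10 * (raw - norm_mean) / norm_sd

# A cut-point converts the dimension into a practical kind; the specific
# threshold here is illustrative.
clinically_elevated = t_score >= 65
print(raw, round(t_score, 1), clinically_elevated)
```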

    Finally, there is an important ontological distinction to be made between multidimensional scales and unidimensional scales. Multidimensional scales assume that constructs have multiple components (e.g., verbal and non-verbal intelligence for IQ). Unidimensional scales assume constructs are homogeneous. Multidimensional scales face an extra hurdle in establishing their validity for a purpose (like diagnosis) because measurement error can be attributed to multiple sources. If the author of the scale assumes the ontological structure of the construct is multifaceted, and thereby includes multiple components in the scale, the total scale that combines those elements may be flawed if any one of those components is not functioning adequately. Take, for example, a measure of depression that includes cognitive, affective, and physiological components in its definition. If the assessment of the physiological component of the construct is unreliable or poorly measured, then the total scale is similarly compromised, even though the problem might be masked by the adequate functioning of the other two areas.
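
    The masking problem can be simulated directly. In the toy simulation below (all reliabilities invented), a noisy physiological component drags down a depression composite while the two sound components keep the composite looking respectable:

```python
# Toy simulation: an unreliable component degrades a multidimensional
# composite, and the sound components partially mask the damage.
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_depression = rng.normal(size=n)

# Three component scores, each = truth + measurement noise. The
# physiological component gets much larger noise (all values invented).
cognitive     = true_depression + rng.normal(scale=0.5, size=n)
affective     = true_depression + rng.normal(scale=0.5, size=n)
physiological = true_depression + rng.normal(scale=3.0, size=n)

composite = (cognitive + affective + physiological) / 3

for name, score in [("cognitive", cognitive), ("affective", affective),
                    ("physiological", physiological), ("composite", composite)]:
    r = np.corrcoef(score, true_depression)[0, 1]
    print(f"{name:>13}: r with true construct = {r:.2f}")
# The composite still correlates respectably with the truth, so the weak
# physiological component is easy to overlook -- the masking described above.
```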

    However, a unidimensional scale, which assumes a unitary structure to its construct, can be judged by its merits as if all measurement error pertains to the same construct. This sort of reasoning has led some measurement theorists to insist that it is only justifiable to construct measurements of unitary constructs (Smith and Zapolski 2009). In other words, we can only validate the meaning of one construct at a time. To validate a multidimensional construct, one must break it down into each of its components, to ensure that each component is functioning appropriately—hence, creating a series of unidimensional scales, in essence. This viewpoint reflects an eliminativist scientific realism whereby the constructs do not correspond to reality, but their components do.

    However, this reductionistic argument could be taken a step further. Unidimensional scales are determined to be unidimensional based upon factor analysis, i.e., the covariation among all of the individual items on the scale can be accounted for by a single latent factor. However, it is never the case that each item is equally correlated with the assumed latent construct. Some items are better indicators than others. If the reductionist argument is taken to its extreme, we must validate individual items. Indeed, that is the approach taken within Item Response Theory (Embretson and Reise 2000), where each item is evaluated based upon its measurement characteristics. Even so, an instrument could go through a process of validation and yet not be valid because of its ontological assumptions. The reductionistic argument contains an ontological assumption that psychopathology can be broken into meaningful, separable elements, and that the combination of those elements in an additive fashion creates the total meaning of the diagnostic concept. That sort of measurement strategy removes any interactive or causal effects among the elements of a diagnostic construct, such that its whole might be more than the sum of its parts (Keeley et al. 2013; Kim and Ahn 2002). By way of analogy, a reductionist argument would say the concept of clinical depression is no more than separable components of cognition, affect, and physiology. However, most theorists would readily agree that cognition affects affect, affect affects physiology, and so on, such that the interaction of the components is an important part of understanding the concept. Unidimensional measurement strategies fail to capture those sorts of ontological assumptions. Thus, multidimensional measurement approaches have been developed and are beginning to be applied to this problem (Mok and Xu 2013; Wang et al. 2004).
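
    The item-by-item validation strategy mentioned above can be made concrete with the two-parameter logistic (2PL) model, a standard IRT formulation in which each item carries its own discrimination and difficulty parameters; the item names and parameter values below are invented for illustration.

```python
# Minimal two-parameter logistic (2PL) IRT sketch: each item has its own
# discrimination (a) and difficulty (b), so items are unequal indicators.
# Item names and parameters below are invented for illustration.
import math

def p_endorse(theta: float, a: float, b: float) -> float:
    """Probability of endorsing an item given latent severity theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

items = {
    "depressed mood":    (2.0, 0.0),   # sharp indicator near average severity
    "sleep disturbance": (0.7, -0.5),  # weak, diffuse indicator
    "suicidal ideation": (1.5, 2.0),   # informative only at high severity
}

for theta in (-1.0, 0.0, 1.0, 2.0):
    probs = {name: round(p_endorse(theta, a, b), 2)
             for name, (a, b) in items.items()}
    print(f"theta={theta:+.1f}: {probs}")
# Validation proceeds item by item: a flat response curve (low a) marks a
# poor indicator regardless of how the total scale behaves.
```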

6.6 Conclusion

The validity of diagnostic concepts cannot be disentangled from the way in which they are conceptualized and measured, i.e., the process of validation. There will never be a single or all-purpose “objective” truth to mental disorder concepts, because there are legitimate reasons to use them for different purposes. Sometimes more information is necessary, as when using a continuous variable in research to predict a low base-rate outcome like suicide, justifying a dimensional measurement of a construct. Other times, all that information is unnecessary, because a clinical decision is dichotomous (Do I hospitalize this patient or not?). In those situations, gathering the extra dimensional information would waste time and resources; hence a categorical measurement process is preferred. The same clinical phenomenon might be legitimately conceptualized differently in different contexts for different purposes. Thus, the validity of a diagnostic construct is pluralistic, and impossible to disentangle from the measurement and pragmatic contexts in which it is used.

    The validity of diagnostic constructs is a reciprocal process involving many ontological, epistemological, and measurement-based assumptions. This chapter argues that the assumptions made in the measurement process, both in terms of constructing measures and using them practically in diagnosis, influence the assumptions made about the ontology of the diagnostic construct, and vice versa. The result is a situation where there will never be a single estimate of the validity of a diagnostic concept; rather, there will be many, based upon the intended use and desired properties of the concept.

References

Aggen, S. H., Neale, M. C., and Kendler, K. S. (2005). DSM criteria for major depression: Evaluating symptom patterns using latent-trait item response models. Psychological Medicine, 35, 475–87.

American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders, Fifth edition. Washington, DC: American Psychiatric Association.

Atran, S. (1990). Cognitive foundations of natural history: Towards an anthropology of science. Cambridge, England: Cambridge University Press.

Barker, L. (2001). Learning and behavior: Biological, psychological, and sociocultural perspectives (3rd edn). Upper Saddle River, NJ: Prentice-Hall.

Beck, A. T., Steer, R. A., and Brown, G. K. (1996). Manual for the Beck Depression Inventory II. San Antonio, TX: Psychological Corporation.

Berlin, B. (1992). Ethnobiological classification. Princeton, New Jersey: Princeton University Press.

Biederman, I., Subramaniam, S., Bar, M., et al. (1999). Subordinate-level object classification reexamined. Psychological Research, 62, 131–53.

Brown, T. A., DiNardo, P. A., and Barlow, D. H. (1994). Anxiety Disorders Interview Schedule for DSM-IV (ADIS-IV). Albany, NY: Graywind Publications.

Butcher, J. N., Dahlstrom, W. G., Graham, J. R., et al. (1989). The Minnesota Multiphasic Personality Inventory-2 (MMPI-2): Manual for administration and scoring. Minneapolis, MN: University of Minnesota Press.

Churchill, S. (2006). Phenomenological analysis: Impression formation during a clinical assessment interview. In C. Fischer (ed.), Qualitative research methods for psychologists: Introduction through empirical studies (pp. 79–110). San Diego, CA: Elsevier Academic Press.

Coley, J., Hayes, B., Lawson, C., et al. (2004). Knowledge, expectations, and inductive reasoning within conceptual hierarchies. Cognition, 90, 217–53.

Deneault, J., Ricard, M., and Rimouski, P. (2005). The effect of hierarchical levels of categories on children’s deductive inferences about inclusion. International Journal of Psychology, 40, 65–79.

Eaton, N. R., Krueger, R. F., South, S. C., et al. (2011). Contrasting prototypes and dimensions in the classification of personality pathology: Evidence that dimensions, but not prototypes, are robust. Psychological Medicine, 41, 1151–63.

Embretson, S. E., and Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.

First, M. B., Spitzer, R. L., Gibbon, M., et al. (2002). Structured clinical interview for DSM-IV-TR Axis I Disorders, research version, patient edition. New York: Biometrics Research, New York State Psychiatric Institute.

Flanagan, E. H., and Blashfield, R. K. (2006). Do clinicians see Axis I and Axis II as different kinds of disorders? Comprehensive Psychiatry, 47, 496–502.

Flanagan, E. H., and Blashfield, R. K. (2007). Clinicians’ folk taxonomies of mental disorders. Philosophy, Psychiatry, and Psychology, 14, 249–69.

Flanagan, E. H., Keeley, J. W., and Blashfield, R. K. (2008). An alternative hierarchical organization of the mental disorders of DSM-IV. Journal of Abnormal Psychology, 117, 693–8.

Fouad, N. A., Grus, C. L., Hatcher, R. L., et al. (2009). Competency benchmarks: A model for understanding and measuring competence in professional psychology across training levels. Training and Education in Professional Psychology, 3, S5–S26.

Grob, G. (1991). Origins of DSM-I: A study in appearance and reality. American Journal of Psychiatry, 148, 421–31.

Haslam, N. (2002). Kinds of kinds: A conceptual taxonomy of psychiatric categories. Philosophy, Psychiatry, and Psychology, 9, 203–17.

Haslam, N., Holland, E., and Kuppens, P. (2012). Categories versus dimensions in personality and psychopathology: A quantitative review of taxometric research. Psychological Medicine, 42, 903–20.

Hempel, C. (1964). Fundamentals of taxonomy. In Aspects of scientific explanation and other essays in the philosophy of science. New York: The Free Press.

Houts, A. (2000). Fifty years of psychiatric nomenclature: Reflections on the 1943 War Department Technical Bulletin, Medical 203. Journal of Clinical Psychology, 56, 935–67.

Hyman, S. E. (2010). The diagnosis of mental disorders: The problem of reification. Annual Review of Clinical Psychology, 6, 155–79.

Johnson, K., and Mervis, C. (1997). Effects of varying levels of expertise on the basic level of categorization. Journal of Experimental Psychology: General, 126, 248–77.

Keeley, J., DeLao, C. S., and Kirk, C. L. (2013). The commutative property in comorbid diagnosis: Does A+B = B+A? Clinical Psychological Science, 1, 16–29.

Keeley, J., Morton, H., and Blashfield, R. K. (in press). Classification. In P. Blaney, R. Krueger, and T. Millon (eds), Oxford textbook of psychopathology (3rd edn). New York: Oxford University Press.

Kim, N. S., and Ahn, W. (2002). Clinical psychologists’ theory-based representations of mental disorders predict their diagnostic reasoning and memory. Journal of Experimental Psychology: General, 131, 451–76.

Kveraga, K., Ghuman, A. S., and Bar, M. (2007). Top-down predictions in the cognitive brain. Brain and Cognition, 65, 145–68.

Lewis, N. (1941). A short history of psychiatric achievement with a forecast for the future. New York: W. W. Norton & Company, Inc.

Lopez, A., Atran, S., Coley, J., et al. (1997). The tree of life: Universal and cultural features of folk-biological taxonomies and inductions. Cognitive Psychology, 32, 251–95.

Medin, D., Lynch, E., Coley, J., et al. (1997). Categorization and reasoning among tree experts: Do all roads lead to Rome? Cognitive Psychology, 32, 49–96.

Meehl, P. E. (1995). Bootstraps taxometrics: Solving the classification problem in psychopathology. American Psychologist, 50, 266–75.

Menninger, K., Mayman, M., and Pruyser, P. (1963). The vital balance: The life process in mental health and illness. New York: The Viking Press.

Mok, M., and Xu, K. (2013). Using multidimensional Rasch to enhance measurement precision: Initial results from simulation and empirical studies. Journal of Applied Measurement, 14, 27–43.

Morey, L. (1996). An interpretive guide to the Personality Assessment Inventory (PAI). Lutz, Florida: Psychological Assessment Resources, Inc.

Muthen, B. (2006). Should substance use disorders be considered as categorical or dimensional? Addiction, 101, 6–16.

Muthen, L. K., and Muthen, B. O. (2010). Mplus User’s Guide, Sixth edition. Los Angeles, CA: Muthen & Muthen.

Nunnally, J. C., and Bernstein, I. H. (1994). Psychometric theory, Third edition. New York: McGraw-Hill.

Reed, G., Roberts, M., Keeley, J., et al. (2013). Mental health professionals’ natural taxonomies of mental disorders: Implications for clinical utility of ICD-11 and DSM-5. Journal of Clinical Psychology, 69, 1191–212.

Regier, D. A., Narrow, W. E., Clarke, D. E., et al. (2013). DSM-5 field trials in the United States and Canada, part II: Test–retest reliability of selected categorical diagnoses. American Journal of Psychiatry, 170, 59–70.

Roberts, M., Reed, G., Medina-Mora, M., et al. (2012). A global clinicians’ map of mental disorders to improve ICD-11. International Review of Psychiatry, 24, 578–90.

Rodolfa, E. R., Bent, R. J., Eisman, E., et al. (2005). A cube model for competency development: Implications for psychology educators and regulators. Professional Psychology: Research and Practice, 36, 347–54.

Rogers, R. (2001). Handbook of diagnostic and structured interviewing. New York: Guilford.

Rosch, E., Mervis, C., Gray, W., et al. (1976). Basic objects in natural categories. Cognitive Psychology, 8, 382–439.

Ruscio, J., Haslam, N., and Ruscio, A. M. (2006). Introduction to the taxometric method. Mahwah, NJ: Erlbaum.

Sadler, J. Z. (2005). Values and psychiatric diagnosis. Oxford: Oxford University Press.

Segal, D. L., Maxfield, M., and Coolidge, F. L. (2008). Diagnostic interviewing. In M. Hersen and A. M. Gross (eds), Handbook of clinical psychology, Volume 1: Adults (pp. 371–94). Hoboken, NJ: John Wiley & Sons.

Segal, D. L., Mueller, A. E., and Coolidge, F. L. (2012). Structured and semistructured interviews for differential diagnosis: Fundamentals, applications, and essential features. In M. Hersen and D. C. Beidel (eds), Adult Psychopathology and Diagnosis, Sixth edition. Hoboken, New Jersey: John Wiley & Sons.

Shafto, P., and Coley, J. (2003). Development of categorization and reasoning in the natural world: Novices to experts, naïve similarity to ecological knowledge. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 641–9.

Smith, G. T., and Zapolski, T. C. (2009). Construct validation of personality measures. In J. Butcher (ed.), Oxford handbook of personality assessment (pp. 81–98). New York: Oxford University Press.

Smith, J. D., and Dumont, F. (2002). Confidence in psychodiagnosis: What makes us so sure? Clinical Psychology and Psychotherapy, 9, 292–8.

Szasz, T. (1961). The myth of mental illness: Foundations of a theory of personal conduct. New York: Hoeber-Harper.

Wang, W., Chen, P., and Cheng, Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9, 116–36.

Weckowicz, T., and Liebel-Weckowicz, H. (1990). A history of great ideas in abnormal psychology. Amsterdam, Holland: North-Holland.

West, R. F., and Stanovich, K. E. (1997). The domain specificity and generality of overconfidence: Individual differences in performance estimation bias. Psychonomic Bulletin & Review, 4, 387–92.

Widiger, T. A., and Samuel, D. B. (2005). Diagnostic categories or dimensions: A question for DSM-V. Journal of Abnormal Psychology, 114, 494–504.

Wittgenstein, L. (1975). Philosophical remarks. Oxford: Blackwell Publishing Ltd.

Wright, A. G., Krueger, R. F., Hobbs, M. J., et al. (2013). The structure of psychopathology: Toward an expanded quantitative empirical model. Journal of Abnormal Psychology, 122, 281–94.

Zachar, P. (2000). Psychiatric disorders are not natural kinds. Philosophy, Psychiatry, and Psychology, 7, 167–82.

Zachar, P., and Kendler, K. (2007). Psychiatric disorders: A conceptual taxonomy. American Journal of Psychiatry, 164, 557–65.