The importance of structural validity
Kathryn L. Jacobs and Robert F. Krueger
In this chapter, we will be discussing the role of structural validity in current psychiatric nosology. Structural validity, as we define it, is how closely the organizational structure of a set of definitions of psychiatric disorders matches how the disorders present in patient samples. We will discuss how structural validity has been neglected in the current nosology in favor of what we label as external validity, and what the possible consequences are in regards to the diagnosis and treatment of psychiatric disorders. We will then suggest possible changes that could be made to the current nosology to improve the structural validity of psychiatric diagnoses.
The first step in discussing current psychiatric nosology is to examine how it was initially developed. We therefore choose to focus first on reliability, and the role it has historically had in shaping psychiatric nosology. We believe that a discussion of the significant influence that reliability had in the formation of current diagnostic categories will give our readers a base of knowledge necessary to move forward into the discussion about structural validity.
Reliability is often a chief concern in the development or refinement of a diagnostic system. In the case of the development of the different iterations of the DSM, this is apparent when looking back at previous editions. In the first and second DSM publications, disorders were described in a very literary style in paragraph form. Diagnoses involved matching patients with the best fitting description. This made diagnoses highly subjective and hard to replicate from one clinician to another (Spitzer et al. 1978). An unreliable diagnostic system means that a patient cannot be sure of the diagnosis they are given, as it could change from clinician to clinician. This instability in diagnoses can complicate treatment, as different diagnoses presumably have different optimal treatments.
The shift toward greater reliability can be seen in the DSM-III, DSM-IV, and DSM-IV-TR. In these later editions, descriptions of mental disorders shifted away from the paragraph form seen in earlier publications. The new format presented mental disorders as polythetic-dichotomies. Mental disorders in the early versions of the DSM had previously been dichotomous, meaning that a patient was diagnosed as either having a given disorder or not having it; there was no possible middle ground. The polythetic approach to mental disorders, however, was introduced in the DSM-III, and changed the way that mental disorders were described. This approach was applied to many of the new diagnostic categories described in the DSM-III, and carried over into the DSM-IV and DSM-IV-TR (text revision). In this new description, in order to have a given disorder, a patient had to present with a certain number of symptoms out of a larger group. For example, in order to be diagnosed with a major depressive episode, a patient would have to present with at least five of nine listed symptoms, including “depressed mood most of the day” or “markedly diminished interest or pleasure” (fourth ed.; text rev.; DSM-IV-TR; American Psychiatric Association 2000). The severity of the symptoms was not taken into account, nor was any specific combination of symptoms beyond the requirement that at least one of the two symptoms mentioned above be present. There is a specific boundary (from four to five symptoms) that a patient must cross before they can receive the diagnosis. The rigid use of these boundaries in clinical practice cut diagnoses into distinct, non-overlapping categories.
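The polythetic-dichotomous rule described above can be made concrete with a minimal sketch. It assumes the DSM-IV-TR convention that at least one of the two quoted symptoms must be among the five; the symptom labels, the constant names, and the function name are hypothetical and serve only as illustration.

```python
from typing import Set

# Hypothetical labels: only the first two symptoms are quoted in the text;
# the remaining seven are placeholders, not DSM wording.
CORE_SYMPTOMS: Set[str] = {"depressed_mood", "diminished_interest"}
ALL_SYMPTOMS: Set[str] = CORE_SYMPTOMS | {f"symptom_{i}" for i in range(3, 10)}
THRESHOLD = 5  # "at least five of nine listed symptoms"


def meets_polythetic_criteria(present: Set[str]) -> bool:
    """Return True if the symptom count crosses the threshold.

    The result is dichotomous: severity is ignored, and four symptoms
    yield the same answer as zero symptoms.
    """
    endorsed = present & ALL_SYMPTOMS
    has_core = bool(endorsed & CORE_SYMPTOMS)  # assumed "at least one of two" rule
    return len(endorsed) >= THRESHOLD and has_core


four = {"depressed_mood", "symptom_3", "symptom_4", "symptom_5"}
five = four | {"symptom_6"}
print(meets_polythetic_criteria(four), meets_polythetic_criteria(five))
```

A patient with four symptoms and a patient with five differ by a single item, yet the function places them on opposite sides of the diagnostic boundary, which is the property the text refers to as a dichotomy.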
This new format was designed to improve the reliability of the DSM diagnoses, as it reduced the number of subjective decisions that a clinician had to make in regards to a patient (Grove et al. 1981). These reductions in subjectivity led to greater agreement amongst clinicians when given a particular patient to diagnose (Lobbestael et al. 2011). In specifying the definitions for mental disorders to a distinct set of required symptoms, the reliability of the diagnoses was improved.
This new system, created in order to enhance reliability in diagnoses of mental disorders, focused on external validity to support its new definitions. In our definition of external validity, we are referring to judging diagnoses by how they predict external criteria (e.g., functioning). This is of course an important consideration in diagnosing a mental disorder, but is not entirely sufficient. This is because diagnostic concepts can predict external variables, yet still not be organized in a way that reflects the empirical organization of disorders. The shift in format from paragraph descriptions to polythetic-dichotomies in the DSM made clear to clinicians what to focus on when considering a diagnosis. This was a major step forward in the mental health field, and helped clear up many ambiguities in definitions of mental disorders. This approach used external validity as the benchmark for judging the quality of these new definitions.
For a more concrete example, consider again the diagnosis of a major depressive episode. As noted before, in order to receive a diagnosis of a major depressive episode, a patient must present with five of the nine given symptoms listed in the DSM-IV-TR. In order to judge this diagnosis in terms of external validity, one would have to examine the patient’s life outside of the clinic. Do they have impaired functioning? Are multiple aspects of their life affected, such as work, home, and school? If they answer yes to these questions, one might judge the diagnosis of a major depressive episode as valid. This kind of impairment is what a clinician would expect given a diagnosis of a major depressive episode; therefore, the diagnosis seems to fit.
There is one central problem with using external validity as the primary support for a current diagnostic system. This problem is that external validity uses circular reasoning to back up its claims. A diagnostic construct is considered valid when it can be shown that the patients present with the expected clinical impairments, but those impairments are only expected because that is how the diagnosis was conceptualized in the first place. Many different diagnoses—mental disorders as seemingly different as major depressive episode and cocaine intoxication—can present with very similar impairment symptoms (work functioning, sleep disturbances). Yet despite being different diagnoses, these same external correlates can be used as support for a diagnosis of either (fourth ed.; text rev.; DSM-IV-TR; American Psychiatric Association 2000). This example illustrates the problems inherent in using external validity as a sole source of support for a diagnostic system. External validity is too broad to be used as a primary means of validating a classification system. While it is useful for judging which cases should be included in the broader category of “mental disorders,” alone it is insufficient to validate which specific diagnosis a given patient should receive.
If external validity alone is not enough to validate a set of psychiatric diagnoses, then what can be done to improve these diagnoses? We believe a plausible solution would be to look at structural validity. The definition of structural validity originated in a 1957 monograph written by Jane Loevinger. In it, Loevinger describes a type of validity focused on evaluating and improving psychological test structure. Examining the structural validity of a test would involve looking at the test questions and how they correlate with one another, and comparing those correlations to real-world behaviors. If the behaviors that correlate in real life correspond to correlating test questions, then the test is structurally valid.
For example, consider a psychological test designed to measure aggression. For the purpose of this example, we will define aggression as a single, discrete concept, rather than a diffuse family of behaviors. In real life, one might expect a number of physically aggressive acts committed against a romantic partner in a given period of time to correlate positively with the number of verbal threats made to co-workers. In this example, in order to have a structurally valid test, the test questions aimed at measuring romantic aggression should correlate positively with the test questions aimed at measuring co-worker aggression.
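A minimal sketch of this check follows. The scores are invented purely for illustration (six hypothetical respondents), and the function names are assumptions rather than part of Loevinger's method; the point is only that the sign of the correlation between item sets is compared with the sign of the correlation between the corresponding behaviors.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def structure_matches(item_r: float, behavior_r: float) -> bool:
    """Crude check: do test items covary in the same direction as behaviors?"""
    return (item_r > 0) == (behavior_r > 0)


# Invented data for six hypothetical respondents.
partner_items = [0, 1, 1, 2, 3, 4]    # scores on romantic-aggression questions
coworker_items = [1, 1, 2, 2, 4, 5]   # scores on co-worker-aggression questions
partner_acts = [0, 0, 1, 1, 2, 3]     # observed physically aggressive acts
coworker_threats = [0, 1, 1, 2, 2, 4] # observed verbal threats

item_r = pearson_r(partner_items, coworker_items)
behavior_r = pearson_r(partner_acts, coworker_threats)
print(structure_matches(item_r, behavior_r))
```

A real evaluation would compare the full pattern and magnitude of correlations across many items, not just the sign of a single pair.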
In our case, we will be using a slightly different definition of structural validity. The idea is analogous to that of Loevinger, but has been modified somewhat to be relevant to the nosology of mental disorders rather than tests. Structural validity, as presented in the remainder of this chapter, is how closely the organizational structure of a set of definitions of psychiatric disorders matches how the disorders present themselves in clinical samples.
For an example, consider the diagnoses of Major Depressive Disorder (MDD) and Generalized Anxiety Disorder (GAD). In the current organizational structure, these are defined as separate, unrelated illnesses. The recent publication of the DSM-5 has arranged the disorders to be located next to each other in the text, but they are still considered separate diagnoses (American Psychiatric Association 2013). To examine the structural validity of these diagnoses, one must look at how they manifest in patient populations. A study conducted via survey to examine comorbidity found the prevalence of MDD in a general population to be 8.5 percent (Kessler et al. 1999). Generalized Anxiety Disorder occurred at a rate of 1.3 percent (Kessler et al. 1999). According to probability theory, the joint probability of any two independent events is the probability of the first multiplied by the probability of the second. Given the individual probabilities listed above, if MDD and GAD were indeed independent, one would expect the prevalence of patients with co-occurring MDD and GAD to be 0.085 × 0.013, which is equal to approximately 0.0011, or 0.11 percent. The actual comorbidity, according to the survey, was 2.3 percent, approximately 20 times the expected rate (Kessler et al. 1999).
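The arithmetic behind this comparison can be reproduced in a few lines. The prevalence figures are those quoted above from Kessler et al. (1999); the variable names are illustrative only.

```python
# Prevalence figures quoted in the text (Kessler et al. 1999).
p_mdd = 0.085      # Major Depressive Disorder
p_gad = 0.013      # Generalized Anxiety Disorder
observed = 0.023   # observed co-occurrence of MDD and GAD

# Under independence, P(MDD and GAD) = P(MDD) * P(GAD).
expected = p_mdd * p_gad

# How many times more often the disorders co-occur than independence predicts.
ratio = observed / expected

print(f"expected under independence: {expected:.2%}")
print(f"observed / expected: {ratio:.1f}")
```

A ratio far above 1 indicates that the two diagnoses are not independent in the surveyed population.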
This seems to indicate that the definitions of MDD and GAD need to change if they are to be considered structurally valid. We will explore a possible alternative for the diagnoses of MDD and GAD later on in this chapter. By separating disorders and making them as specific as possible, the current nosological system for psychiatric disorders fails to consider structural validity, instead opting for external validity to support its definitions.
We mentioned in the previous section that Loevinger’s definition of structural validity was originally written to examine the validity of psychological tests. Her definition of structural validity fits with the manner in which psychological tests are developed. In Loevinger’s definition, structural validity is determined by looking at how the test structure correlates with how behaviors are structured in the real world. When psychological tests are first developed, researchers come up with a broad array of questions that could possibly relate to the psychological construct being tested. They use these questions with an initial group of subjects, and then narrow down the question pool by looking at the correlations between the answers and the real-life behavior they are trying to study.
In test development, structural validity is defined as “the degree to which scores of a questionnaire are an adequate reflection of the dimensionality of the construct to be measured” (Elbers et al. 2012). This type of validity is conceptually different from external validity. In external validity, a diagnosis is validated simply based upon the fact that it correlates with expected external criteria (e.g., functioning), without consideration as to the structure of these external correlates. In considering structural validity, researchers and clinicians must also focus on the underlying structure of the diagnoses, and whether or not the patterns in the diagnostic space match the patterns seen in patient populations.
Structural validity is considered very important in test development. This is seen in a review of a self-report questionnaire examining fatigue in patients suffering from Parkinson’s disease. Elbers et al. (2012) examined 31 different questionnaires focusing on fatigue in patient populations. The researchers evaluated the different questionnaires based on a number of criteria with the goal of more clearly defining fatigue in a clinical setting. In this evaluation, structural validity, or how well the scores of the questionnaire are an adequate reflection of the dimensionality of fatigue as a construct, was considered crucial in order to consider the questionnaire a good indicator of fatigue symptoms.
This method of test development is data-driven, meaning that the test structure changes accordingly when unexpected correlations are found in the initial test subjects. The naming of new psychiatric disorders, on the other hand, follows an expert-driven approach. This means that instead of drawing from data to develop criteria, experts in the field create delineations between symptoms as they perceive them to be arranged in real life.
For example, consider a disorder that has been conceptualized and studied by a small group of researchers. This disorder is characterized by the intense desire to have one’s limb(s) amputated (First 2004). The limbs are not diseased in any way, and the researchers claim that the desire is not systematically related to any documented sexual fetish. In the consideration of adding this desire for amputation as a separate disorder to future publications of the DSM, we can see the expert-driven approach taken to naming new psychiatric disorders. In this case, a small group of researchers noticed a trend among a group of patients. This trend was the desire to have a healthy limb removed. The researchers then sought out other people who shared this desire. They posted on forums, searched chat rooms, and asked current patients if they knew of anyone else who shared their desire for amputation. In doing so, they amassed a small group of people who seemed to share similar symptoms. The researchers categorized this disorder, considering it separate from any disorder included in the current nosological system, and proposed that it be considered for inclusion in future nosology.
There are several differences between this expert-driven approach seen in the proposal of the addition of an amputation-affinity disorder and the data-driven approach used to create new psychological tests. In the development of tests, a battery of questions is created with the goal of capturing all aspects of a construct. The construct is then refined based upon the patterns of correlations (between items, and between items and behaviors) that occur in the population.
In contrast, in the expert-driven development of the definition of the amputation disorder, researchers actively sought out patients who showcased the set of symptoms that the researchers were looking for. The definition or categorization of the amputation disorder was not changed based on varying symptoms in a large group of people, because anyone who displayed symptoms that were not consistent with what the researchers were looking for was excluded from the study.
The problem with this expert-driven method is that it can miss many of the intricacies and variations in human behavior, normal and abnormal. By artificially delineating between people who have an affinity for amputation and people who do not (according to the definitions they themselves created), researchers miss out on a possible spectrum of thoughts present in subjects not considered for the study. If there was a person who had once considered amputation, but had decided against it, they would not be included anywhere in the scientific documentation, even though this could be an important middle ground in studying the phenomenon of feeling detached from one’s own limbs. Instead, researchers looking to name a new psychiatric disorder pick only those subjects who fit the description that the researchers themselves created.
As a contrast, consider an example of how to approach this diagnosis from a data-driven, structurally valid perspective. In this approach, researchers or clinicians looking to create a new diagnosis would need to look at a much broader range of the population. Instead of focusing only on those who are currently considering voluntary amputation, they would broaden their search to, for example, people who had once considered amputation, people who felt separated from their limbs, and any number of other related criteria that the researchers proposed. They would then collect data on all of the subjects exhibiting these symptoms, including data on all other psychiatric symptoms not included in the original data set. This data could include possible depressive symptoms, anxious symptoms, and any other psychiatric symptoms of note. The researchers would then look at the structure of the data in the sample collected, examining correlative trends between symptoms. This would allow the researchers to discover if this diagnosis was simply a part of a larger group of symptoms (like anxiety), or if it did indeed represent a separate and distinct set of symptoms warranting a new diagnosis.
To further explore the importance of a structurally valid system of nosology, we will focus on the definitions of personality disorders in the DSM-IV-TR and DSM-5. The personality disorders, like many of the other diagnoses in the DSM-IV-TR, were treated as polythetic-dichotomies. As described earlier, a polythetic-dichotomy is an organizational system in which a diagnosis is either present or not (there is no middle ground), and a certain number of criteria have to be met in order to qualify for the diagnosis. Even if, as mentioned in the DSM-IV-TR, the diagnoses were not meant as hard, non-negotiable categories (fourth ed.; text rev.; DSM-IV-TR; American Psychiatric Association 2000), clinicians and researchers often reified these categories in practice, despite there being evidence that such reifications were premature, given the state of our understanding of mental disorders (Hyman 2010).
In the case of the personality disorders, this organizational system proved to be particularly troubling to many in the field. The personality disorders in the DSM-IV-TR were split into ten categories a priori, a pattern which does not seem to represent what is found in nature (Krueger et al. 2011a; Widiger et al. 2009). These boundaries, inconsistent with what seems to occur naturally, lead to large comorbidity between different personality disorders (Skodol et al. 2011), as well as frequent use of the Not Otherwise Specified (NOS) category (Widiger and Samuel 2005). This lack of specificity leads to poor coverage in the field of personality disorders, leaving 40 percent of patients with a PD not covered by current criteria boundaries (Livesley 2012). This neglect of structural validity is beneficial to neither patients nor researchers, as a substantial portion of the clinical population is being either missed or misrepresented. Through the pursuit of data-derived diagnoses, the DSM could have its structure reflect the natural organization of mental disorders.
One possible consequence of a structurally invalid nosology is that actual correlations between diagnoses are hidden because clinicians believe that these diagnoses are separate entities, leading them to ignore areas of overlap. As mentioned in the previous section, current nosology and improper use of the DSM with regard to personality disorders have led not only to a very high comorbidity rate, but also to a high percentage of patients categorized as having a Personality Disorder Not Otherwise Specified (Widiger and Samuel 2005). This suggests that these categories are not properly representing how personality disorders are structured in the population. If these disorders were indeed unrelated, such a high comorbidity rate would not occur.
Use of the DSM verbatim, without consideration as to the underlying intentions of the authors, has the potential to be extremely costly to the scientific community. If a system of classification is not examined for structural validity, many years and millions of dollars in grant money could be wasted examining differences between diagnoses that in reality show no distinct boundary. This is especially true if research groups exclude subjects who exhibit comorbid diagnoses. If two diagnoses are related, showing high comorbidity rates, excluding subjects who are comorbid for both diagnoses artificially separates subjects into two groups that should be more closely linked. If new diagnostic criteria are created by expert opinion rather than derived from data, possible links are more likely to be missed, particularly if the experts in question are used to seeing only one type of patient.
All of these negative consequences of a structurally invalid nosology hurt not only researchers, but patients themselves. Consider, from our examples, a patient who displays both Avoidant Personality Disorder and Dependent Personality Disorder. Depending on the clinician that the patient visits, they may receive a diagnosis of Avoidant PD, a diagnosis of Dependent PD, both, or neither. Their diagnoses may change depending on which clinician they see, when they are seen, or what symptoms they choose to highlight when being interviewed. This burden of uncertainty and multiple disparate diagnoses does not help the patient to understand their mental illness, and is not helpful in leading towards a treatment. Given this example, one can see how having a structurally valid and concise diagnosis containing the symptoms shared by both personality disorders could help greatly in reducing the stigma associated with multiple diagnoses, as well as in developing more specific treatment options.
From the previous examples given, one can see that having a structurally valid nosology would be greatly beneficial to the field of mental health. Having the terminology of diagnoses follow the patterns that are displayed by the data would indicate a more structurally valid nosology. However, given how data is collected, one can see that this goal is not easy to accomplish. In order to begin data collection, researchers must have a hypothesis in place to build upon. In the past, this has led to data collection aimed at confirming mental health diagnoses that were initially developed by expert opinion. This limits the type of data that can be collected, as the data will follow the patterns that are laid out by the diagnoses delineated in the hypothesis. In order to create a more structurally valid system of diagnoses, a more fluid system of diagnostic development is needed. Taking from the methods of test development described earlier in this chapter, researchers can use available data to modify the current system of nosology into a system more focused on structural validity.
With the recent development of the DSM-5, many researchers have been addressing the question of how to develop a more structurally valid system of nosology. With regard to personality disorders, one of the large changes was made based on the fact that patterns of comorbidity in PDs mimic patterns seen in personality traits (Krueger et al. 2011b). Using this information, work group members developed a new “trait based” system to categorize personality disorders (American Psychiatric Association 2013). Rather than representing the symptom space with ten categorically distinct syndromes (borderline, histrionic, etc.), the system models it using a profile of twenty-five pathological traits (emotional lability, anxiousness, etc.) that can be grouped into five broader domains (e.g., negative affectivity). This new system both explained a good portion of the variance in the DSM-IV PDs and provided incremental information about an individual’s disorder using a measure of severity of impairment (Hopwood et al. 2012).
This new trait-based system for personality disorders presented in the DSM-5 is a large step forward for those interested in creating a structurally valid nosological system. However, there are still some problems with the way the DSM-5 presents personality disorders. The personality disorders are split into two sections, with the new trait-based system printed in a later section of the book, labeled as the “alternative model” (American Psychiatric Association 2013). Written along with all of the other diagnoses are the original ten personality disorders, reprinted from the DSM-IV-TR. This poses a problem, as the empirically flawed DSM-IV diagnostic criteria are presented as a valid diagnostic system. In order to get closer to a structurally valid system of nosology, current terminology needs to be allowed to change in the direction that is informed by the data available.
As a possible example for how structural validity could inform the organization of mental disorders in future DSM publications, consider the example from earlier in the chapter involving Major Depressive Disorder and Generalized Anxiety Disorder. As stated earlier, these two disorders are considered separate and distinct by current DSM nosology. Despite being considered independent, MDD and GAD co-occur at a rate much higher than would be expected by chance. In a structurally valid nosology, this overlap would have to be taken into account. For example, a new organizational structure could focus on a general pool of both depressive and anxious symptoms. These symptoms would be weighted by severity, with more severe symptoms considered to be of more importance. A patient would be rated on the number and severity of their symptoms, and given two scores.
The first score would represent the type of symptoms present (on a scale of depressive–anxious, with a midpoint representing an even ratio of depressive to anxious symptoms). The idea of combining depressive and anxious symptoms was supported by Goldberg (2010), who suggested that considering the two categories simultaneously could increase the effectiveness of treatment in a clinical setting. Goldberg proposes that a diagnosis such as “anxious depression” could be given in an applicable situation, saying that failing to take into account the anxiety of a clinically depressed patient may lead to improper treatment.
The second score would represent severity (from mild to severe). The idea of dimensional ratings based on severity was supported by Brown and Barlow (2009). The authors stated that keeping current categories of mental disorders intact, while adding a dimensional assessment of severity, would allow diagnoses to be more accurately based on data while keeping the clinical utility of hard categories (Brown and Barlow 2009). This proposed system would better take into account the large overlap between depressive and anxious symptoms, while still preserving the fact that there are important differences between the two categories.
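One way to picture the two scores is the following sketch. It is an illustration under stated assumptions, not a proposed instrument: the severity weights, the scale running from -1 (purely depressive) through 0 (the midpoint) to +1 (purely anxious), the use of summed weights as the severity score, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Symptom:
    kind: str        # "depressive" or "anxious"
    severity: float  # hypothetical weight, e.g. 1.0 (mild) to 3.0 (severe)


def two_scores(symptoms: List[Symptom]) -> Tuple[float, float]:
    """Return (type_score, severity_score) for a pooled symptom list.

    type_score: -1.0 = only depressive, 0.0 = even balance, +1.0 = only anxious.
    severity_score: sum of severity weights, so it reflects both the number
    and the severity of symptoms (an assumed operationalization).
    """
    depressive = sum(s.severity for s in symptoms if s.kind == "depressive")
    anxious = sum(s.severity for s in symptoms if s.kind == "anxious")
    total = depressive + anxious
    if total == 0:
        return 0.0, 0.0  # no symptoms: no type, no severity
    return (anxious - depressive) / total, total


patient = [
    Symptom("depressive", 3.0),
    Symptom("depressive", 1.0),
    Symptom("anxious", 2.0),
]
print(two_scores(patient))
```

Because both scores are continuous, a patient with mixed symptoms receives one profile rather than two separate categorical diagnoses.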
This example could also be considered on a broader scale. In their review, Barlow et al. (2014) highlight the underlying role of neuroticism in a large cluster of diagnoses represented in the current DSM, including depressive and anxiety disorders. They suggest that treatment of neuroticism directly via much broader treatment methods may benefit a large patient population. This method of treatment, like the hypothetical example presented earlier, incorporates the information about the high comorbidity rates seen in depressive and anxious disorders by targeting the personality style linking these diagnoses together.
We believe structural validity to be extremely important to consider when developing and evaluating psychiatric nosology. Current approaches to the development of mental disorder definitions focus primarily on reliability and external validity, which has led to diagnoses with considerable clinical overlap. This overlap, with many patients receiving two or more diagnoses, could conceivably lead to stigmatization, increased health care costs, and confusion regarding the mental health care field in general. It is therefore imperative that the organizational structure of mental disorder definitions match as closely as possible how the symptoms present in a research setting. The expert-driven approach to diagnoses, with experts providing definitions by consensus, was abandoned years ago by the medical community (Sackett et al. 1996). It is time for the psychiatric community to follow suit, and utilize a data-driven approach to the classification of mental disorders.
Aina, Y., and Susman, J. L. (2006). Understanding comorbidity with depression and anxiety disorders. The Journal of the American Osteopathic Association, 106(5),
American Psychiatric Association (2000). Diagnostic and statistical manual of mental disorders (Fourth edition, text rev.). Washington, DC: American Psychiatric Association.
American Psychiatric Association (2013). Diagnostic and statistical manual of mental disorders (Fifth edition). Arlington, VA: American Psychiatric Publishing.
Barlow, D. H., Sauer-Zavala, S., Carl, J. R., Bullis, J. R., and Ellard, K. K. (2014). The nature, diagnosis, and treatment of neuroticism: Back to the future. Clinical Psychological Science, 2, 344–65.
Brown, T. A., and Barlow, D. H. (2009). A proposal for a dimensional classification system based on the shared features of the DSM-IV anxiety and mood disorders: Implications for assessment and treatment. Psychological Assessment, 21(3), 256–71.
Elbers, R. G., Van Wegen, E. E. H., Verhoef, J., et al. (2012). Reliability and structural validity of the Multidimensional Fatigue Inventory (MFI) in patients with idiopathic Parkinson’s disease. Parkinsonism & Related Disorders, 18(5), 532–6.
First, M. (2004). Desire for amputation of a limb: paraphilia, psychosis, or a new type of identity disorder. Psychological Medicine, (6), 919–28.
Goldberg, D. (2010). Should our major classifications of mental disorders be revised? The British Journal of Psychiatry, 196, 255–6.
Grove, W. M., Andreasen, N. C., McDonald-Scott, P., et al. (1981). Reliability studies of psychiatric diagnosis theory and practice. Archives of General Psychiatry, 38(4), 408–13.
Hopwood, C. J., Thomas, K. M., Markon, K. E., et al. (2012). DSM-5 personality traits and DSM-IV personality disorders. Journal of Abnormal Psychology, 121(2), 424–32.
Hyman, S. (2010). The diagnosis of mental disorders: The problem of reification. Annual Review of Clinical Psychology, 6, 155–79.
Kessler, R. C., DuPont, R. L., Berglund, P., et al. (1999). Impairment in pure and comorbid generalized anxiety disorder and major depression at 12 months in two national surveys. The American Journal of Psychiatry, 156(12), 1915–23.
Krueger, R. F., Eaton, N. R., Clark, L. A., et al. (2011a). Deriving an empirical structure of personality pathology for DSM-5. Journal of Personality Disorders, 25(2), 170–91.
Krueger, R. F., Eaton, N. R., Derringer, J., et al. (2011b). Personality in DSM-5: Helping delineate personality disorder content and framing the metastructure. Journal of Personality Assessment, 93(4), 325–31.
Livesley, J. (2012). Tradition versus empiricism in the current DSM-5 proposal for revising the classification of personality disorders. Criminal Behaviour and Mental Health, 22(2), 81–90.
Lobbestael, J., Leurgans, M., and Arntz, A. (2011). Inter-rater reliability of the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID I) and Axis II Disorders (SCID II). Clinical Psychology & Psychotherapy, 18, 75–9.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635–94.
Sackett, D. L., Rosenberg, W. M., Gray, J. A., et al. (1996). Evidence-based medicine: what it is and what it isn’t. British Medical Journal, 312, 71–2.
Skodol, A. E., Clark, A., Bender, D. S., et al. (2011). Proposed changes in personality and personality disorder assessment and diagnosis for DSM-5 Part I: Description and rationale. Personality Disorders: Theory, Research, and Treatment, 2(1), 4–22.
Spitzer, R. L., Endicott, J., and Robins, E. (1978). Research Diagnostic Criteria Rationale and Reliability. Archives of General Psychiatry, 35(6), 773–82.
Widiger, T. A., Livesley, J., and Clark, L. (2009). An integrative dimensional classification of personality disorder. Psychological Assessment, 21(3), 243–55.
Widiger, T. A., and Samuel, D. B. (2005). Diagnostic categories or dimensions? A question for the Diagnostic and Statistical Manual of Mental Disorders—fifth edition. Journal of Abnormal Psychology, 114(4), 494–504.