Alternative Perspectives on Psychiatric Validation: DSM, ICD, RDoC, and Beyond

Chapter 1

Introduction: The concept of validation in psychiatry and psychology

Peter Zachar and Assen Jablensky

1.1 Introduction

The roots of validity lie in logic, referring to whether an instance of reasoning conforms to correct rules (formal validity) and to whether the conclusion is true (material validity). How we progressed from logical validity to a problem about the validity of diagnostic constructs is not a simple tale. Although the path from logic to the current notions of validation in psychiatry travels through the science of psychological measurement, one has to be careful about construing parallel developments in psychiatry and clinical psychology as causally related and thereby inferring connections that never existed.

Psychologists first used reliability and validity as two interchangeable terms for “adequacy” when thinking about the technology of measuring inferred psychological attributes (Leuba 1899; Starch 1915). Subsequently, they employed them to distinguish measuring a psychological attribute consistently (reliability) from measuring it accurately (validity) (Thurstone 1931; Adams 1936). As we shall see, in psychology the problem of accurately measuring psychological attributes came to be seen as the problem of measuring theoretical constructs, whereas in psychiatry the primary concern was one of confirming disease status. Over the years, however, the validity problem in psychiatry has also evolved into a problem about theoretical constructs.

1.2 Validity in Mid-Twentieth-Century Science and Philosophy

In the middle years of the twentieth century, the school of logical positivism was the dominant approach in the philosophy of science. One of the goals of this school was to elucidate the logical structure of scientific reasoning. It therefore made sense for the logical positivists to refer to the validity of scientific theories. According to them, validity was largely formal. For example, on the logical positivists’ account, confirmation and explanation depended on conforming to proper logical syntax.

The positivist (or empiricist) aspect of this school held that science seeks to discover and systematize regularities in the network of observations that are part of experience. Logical positivism also updated empiricism to better conform to twentieth-century science (especially relativity theory and quantum physics). Networks of scientific concepts, the logical positivists agreed, also contain theoretical constructs such as force and electron (in physics) or general intelligence (in psychology).

What does psychiatric diagnostic classification look like from the perspective of such an empiricism? According to this particular empiricist view, in psychiatry a regular pattern of characteristic self-disturbances, hallucinations, delusions, and a decline in functioning is given a name such as “schizophrenia.” In the most minimalist form of empiricism, schizophrenia is a descriptive term (or inductive summary) that refers only to the pattern of observed signs and reported symptoms.

Less minimally, schizophrenia is a theoretical construct that enables clinicians to organize signs and symptoms into a coherent framework. The construct of schizophrenia also has surplus meaning by virtue of its association with other theoretical constructs such as “psychosis” and “disease.” In general, empiricists are instrumentalist and anti-realist about theoretical constructs, viewing them as they do socioeconomic status. A person’s socioeconomic status is not a cause of income level and educational attainment; rather, it is a handy abbreviation for income and educational attainment patterns in a population.

According to Markus and Borsboom (2013), the psychologists who introduced the notion of construct validity increasingly went beyond the empiricism of the logical positivists and adopted scientific realism about psychological attributes such as intelligence, extroversion, and schizophrenia. According to realism about constructs, differences in test scores are caused by people’s position on the psychological attribute being measured. These attributes are considered to exist independently of being measured.

1.3 Science and Validity in Psychiatry

For nearly the entire twentieth century psychologists debated whether the latent variable of general intelligence is a real attribute/natural kind or a mathematical construct whose meaning changes depending upon how it is measured. Proposed mid-century largely to address the clinical constructs measured by instruments such as the Rorschach Inkblot Test and the Minnesota Multiphasic Personality Inventory (MMPI), the notion of construct validity redrew the lines of the ongoing debate. After the lines were redrawn, schizophrenia and hysteria were declared to be unproblematical constructs—but constructs that cannot be reduced to how they are measured and that can refer to something real.¹

The term construct validity was introduced in an American Psychological Association Technical Report in 1954. The committee that prepared this report was chaired by Lee Cronbach. According to Cronbach (1989), the idea of construct validation was proposed by committee member Paul Meehl. It had been worked out in cooperation with Meehl’s colleagues at the Minnesota Center for the Philosophy of Science. Meehl expanded on these ideas with Cronbach in a 1955 article titled Construct validity in psychological tests. One of the main ideas of this article was that the validation of a test is analogous to the validation of a theory in science (according to the strictures of logical positivism/empiricism with scientific realism tacked on).

If Cronbach and Meehl’s article was a watershed event for construct validity in psychology, Robins and Guze’s (1970) article The establishment of diagnostic validity in psychiatric illness: Its application to schizophrenia played a similar role in psychiatry. In their article Robins and Guze said that diagnosis must be a scientific classification, and valid classification is essential to science. Rather than worry about the validity of the diagnosis of a single patient as would be typical in medicine, they were concerned about the validity of schizophrenia—and later about classification in general (Woodruff et al. 1974).

Most commentators consider this article to be an attempt to resurrect a psychiatry of disease entities similar to that advocated by Emil Kraepelin. Kraepelin proposed that dementia praecox (renamed schizophrenia by Bleuler in 1908) and manic depressive illness were two different entities, with the first having a deteriorating course and the second involving recovery and recurrence over time. In this tradition, Robins and Guze’s over-arching construct was “psychiatric illness.” They proposed five groups of validators (clinical description, laboratory studies, differentiation from other disorders, studies of outcome, and family studies), each of which was a prediction about what would be observed if a diagnostic construct such as schizophrenia conformed to their illness construct.

In the 1950s there was little interest in diagnosis among American psychiatrists, with one important exception being a group of scientifically oriented psychiatrists at Washington University in St. Louis. Subsequently named the neo-Kraepelinians, this group included Robins and Guze. They introduced the concept of diagnostic validity to describe the research programs that nosologically oriented psychiatrists were already conducting (Goodwin et al. 1969; Purtell et al. 1951; Robins and Mensh 1954). Validity was also a helpful term for encouraging psychiatrists to conduct research that could disprove Szasz’s (1961) claims about mental illness being a myth (or a theoretical fiction).

To what extent did the articulation of construct validity in clinical psychology influence the conceptualization of diagnostic validity in psychiatry? It is worth noting that Samuel Guze’s early research included a study of the validity of The Taylor Anxiety Scale (Matarazzo et al. 1955). In that article Guze and his colleagues claimed to be evaluating construct validity (as described in the 1954 technical report), defining validity as confirming theory-based predictions. This mingles Robins’ natural history of disease approach, the predictive validity notion that preceded the work of Cronbach and Meehl, and construct validation.

Another factor influencing the establishment of a psychiatric research program on diagnostic validity was the emphasis placed in the 1970s on the evaluation of reliability as it was assessed statistically by psychologists (Ash 1949; Kendler et al. 2010). The Washington University group’s operationalization of diagnostic constructs (called the Feighner criteria) and Columbia University psychiatrist Robert Spitzer’s advocacy of measuring reliability using Cohen’s kappa culminated in the publication of the third edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-III) in 1980 (Feighner et al. 1972; Spitzer et al. 1967).²
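As an illustration of the statistic Spitzer advocated, the following minimal Python sketch computes Cohen’s kappa for two raters who each assign one diagnosis per patient. Kappa is the observed agreement corrected for the agreement expected by chance from each rater’s base rates. The function name, the diagnostic labels, and the ratings are hypothetical and are not drawn from any DSM field trial.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters (Cohen's kappa)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters give the same diagnosis.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


# Hypothetical ratings of ten patients (illustrative data only).
rater_1 = ["scz"] * 5 + ["bp"] * 5
rater_2 = ["scz"] * 4 + ["bp"] * 5 + ["scz"]
print(round(cohens_kappa(rater_1, rater_2), 2))  # 0.6
```

In this made-up example the raters agree on 8 of 10 cases, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement rate.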

After the DSM-III was published its proponents claimed that reliability had been achieved. Once the psychologists’ more scientific approach to reliability was implemented in psychiatry, the reliability–validity distinction came for free, and with it came the notion that securing validity is the next task.

In principle, the problem of reliability has been solved. . . . However, reliability does not guarantee validity. While reliability is a necessary precursor to establishing the validity of psychopathologic classes, special efforts are required for validity research (Klerman 1986).

In making this claim, Klerman was relying upon the psychometric principle that reliability sets a ceiling on validity, since the latter cannot be meaningfully explored unless the variable or entity under consideration can be defined in robust and reproducible terms. Despite this bridge back to psychometrics, the meaning of construct validity for psychologists and diagnostic validity for psychiatrists continued to differ. The nuts-and-bolts language of measuring abstract constructs (or latent variables) is not one that psychiatrists tended to use (Blashfield and Livesley 1991). Meehl’s metaphysical elaborations such as “surplus meaning” and “nomological networks” were not carried over to psychiatry. For psychiatrists, who were inspired by biomedicine rather than physics, diagnosis was about identifying disease entities (nosological realism in Rodrigues and Banzato’s terms in Chapter 3). Yet as we will see, the notion that psychiatric classifications are constructs that may or may not represent real clinical entities has gradually been working its way into psychiatric thinking.
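One common way to state the ceiling principle comes from classical test theory, in which the observed correlation between a measure and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below assumes that formulation (Klerman himself gives no formula); the function name and the numbers are illustrative only.

```python
import math


def max_validity(reliability_measure, reliability_criterion=1.0):
    """Upper bound on an observed validity coefficient in classical test theory.

    Assumes the standard attenuation bound r_xy <= sqrt(r_xx * r_yy).
    """
    for r in (reliability_measure, reliability_criterion):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(reliability_measure * reliability_criterion)


# A diagnosis with reliability 0.64, compared against a perfectly reliable
# criterion, cannot correlate with that criterion above about 0.8.
print(round(max_validity(0.64), 2))  # 0.8
```

On this reading, improving reliability raises the ceiling but does nothing by itself to show that the ceiling is reached, which is why separate validity research is still required.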

1.4 The Failure to Validate as Expected

Each of Robins and Guze’s validators can be considered a standard of adequacy that a diagnostic construct must meet. Documenting that the construct does meet a standard is called validation. In recent years a variety of standards have been articulated.

1. A diagnosis is valid if it can be confirmed by a test that is independent of the diagnostic criteria (e.g., a biopsy validates a diagnosis of cancer).

2. A diagnostic criterion is valid if it is a sensitive indicator of a disorder.

3. A diagnostic criterion is valid if it is a specific indicator, distinguishing true cases of the disorder from other disorders.

4. A diagnostic construct is valid if it representatively samples the psychological and behavioral features of the disorder.

5. A diagnostic construct is valid if it refers to an integrated syndrome (a pattern of intercorrelated symptoms and a predictable time course) that supports a distinction between cases and non-cases (i.e., a natural clinical grouping).

6. A diagnostic construct is valid if it allows professionals to make non-trivial inferences about patients that contribute to the description, management, or treatment of the problem. Non-trivial means the inferences are not deducible from the definition of the construct.

7. A diagnostic construct is valid if it is psychometrically unidimensional.

8. A diagnostic construct is valid if it corresponds to a unique (identity-determining) etiology (preferably biological).

9. A diagnostic construct is valid if it refers to an objective dysfunction that is harmful to its bearer.

10. A diagnostic construct is valid if its internal structure corresponds to how symptoms are structured in the population.
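Standards 2 and 3 can be made concrete with a small calculation. The Python sketch below assumes that an independent reference standard (as in standard 1) is available for every patient, which is precisely what is often lacking in psychiatry. The function name and the data are hypothetical.

```python
def sensitivity_specificity(has_disorder, meets_criterion):
    """Return (sensitivity, specificity) of a criterion against a reference standard."""
    if len(has_disorder) != len(meets_criterion):
        raise ValueError("inputs must describe the same patients")
    tp = fn = tn = fp = 0
    for truth, flagged in zip(has_disorder, meets_criterion):
        if truth and flagged:
            tp += 1  # true case correctly flagged by the criterion
        elif truth:
            fn += 1  # true case missed by the criterion
        elif flagged:
            fp += 1  # non-case wrongly flagged
        else:
            tn += 1  # non-case correctly not flagged
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Ten hypothetical patients: 4 true cases and 6 non-cases per the reference test.
truth = [True] * 4 + [False] * 6
flags = [True, True, True, False, True, False, False, False, False, False]
sens, spec = sensitivity_specificity(truth, flags)
print(round(sens, 2), round(spec, 2))  # 0.75 0.83
```

A criterion can score well on one of these measures and poorly on the other, which is why the two standards are listed separately.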

Arguably, one of the most significant developments of the past 30 years is how difficult it has been to decisively validate the constructs of the DSM using the Robins and Guze standards. As regards the mental disorders section of the International Classification of Diseases (ICD-10), such validation has not been attempted.

For instance, doubt has been raised about the distinction between schizophrenia and manic depression (bipolar disorder) as assessed by every Robins and Guze standard (Kendell 1991; Bentall et al. 1988). Even when these constructs are carefully defined to emphasize severe pathology, symptom overlap is extensive (description, differentiation), clinical course is highly variable (outcome), they share genetic vulnerabilities not only with each other but with other disorders as well (family studies, differentiation), and no identity-determining biological pathology has been identified (lab studies) (Jablensky 2010; Greene 2007; Craddock and Owen 2005).

What complicates this issue is that rather than being conclusively falsified, Kraepelin’s original dichotomy (dementia praecox versus manic depressive insanity) has garnered mixed support. Kraepelin’s data set consisted of semi-structured case summaries written on Zählkarten (counting cards). Jablensky et al. (1993) obtained access to all the Zählkarten for the year 1908. They recoded them using the symptoms and syndromes assessed in the Present State Examination (PSE) with the goal of studying what groups would emerge from objective statistical analyses of the original clinical data. Using both discriminant function analysis and cluster analysis, Jablensky et al. were able to corroborate Kraepelin’s dichotomy using PSE symptoms. They found that the dementia praecox category was more restrictive and homogeneous than ICD-9 schizophrenia, placing greater emphasis on alteration of personality and disorders of affect, speech, movement, and volition. On the other hand, the manic depressive category was very broad, consisting of typical cases as well as a residue of mixed and borderline cases. Using grade of membership analysis, Jablensky and Woodbury (1995) found that pure types of dementia praecox, bipolar affective disorder, and unipolar affective disorder could be detected, but 30 percent of these cases had significant symptom overlap with one of the other types. For all types, the presence of catatonia was associated with overlap.

Kendell and Jablensky (2003) claimed that when the Robins and Guze validators were first proposed, it was assumed that the delineation of clinical entities would readily follow. However, they noted, this assumption proved to be unfounded. Kendell and Jablensky worried that this troubling outcome was being minimized in part because of ambiguity in the use of the term “valid.” In practice, to say that the concept of schizophrenia is valid had come to mean that its use in psychiatry is clinically justified. In their view, such a broad use of the term validity encouraged the reification of constructs in which anything that was clinically useful would be considered to be a distinct entity.

According to Kendell and Jablensky, whether something such as schizophrenia is what psychometricians call a taxon is an empirical question. It depends on whether a syndrome has natural boundaries with other disorders and with normality and, if no boundary exists, whether it possesses defining characteristics (pathognomonic signs) that are qualitatively distinct. Although the “taxonicity” of disorders had not yet been confirmed, Kendell and Jablensky believed that the methodology to do so was available. They proposed that the concept of validity be limited to the assessment of taxonicity, whereas the degree to which a diagnostic construct was clinically informative be termed utility.³

Kendell and Jablensky’s article was published 33 years after Robins and Guze. Only seven years later Steven Hyman (2010) claimed that it was time to acknowledge that the DSM and ICD classifications are interfering with making progress on a more scientifically valid classification system. Illustrating construct-oriented thinking, Hyman noted that classifications are cognitive structures imposed on data to achieve important goals. According to Hyman, the DSM classifications, which are the products of the attempt to increase inter-rater reliability with reference to surface characteristics (observable signs and reportable symptoms), have not resulted in the discovery of etiology and pathogenesis, leave many cases unclassified despite a proliferation of specific diagnostic categories, and have produced confusing comorbidities.

Although Hyman has not proposed immediately replacing the current system with an alternative, he does not believe that it can be made valid by incremental changes. In agreement with his colleagues at the U.S. National Institute of Mental Health (NIMH), Hyman believes that a classification derived from the Research Domain Criteria Project (RDoC) could eventually replace or complement the DSM and ICD classifications. A collaborative effort between psychiatrists and psychologists, the ambitious RDoC program aims to bridge the knowledge gap between psychiatry and the recent groundbreaking advances in neuroscience and genomics (such as the ENCODE project, the Brain Activity Map project, and the next-generation sequencing of whole genomes of psychiatric patients), and to translate them into “personalized” diagnostic formulations and targeted prevention or treatment. Ideally, this will result in a “functional psychopathology” that may lead to recasting the taxonomy of mental disorders (Jablensky and Waters 2014). If successful, psychiatric constructs will be more amenable to being explained with reference to genetic and physiological mechanisms (Cuthbert and Insel 2010; Insel and Cuthbert 2010; Sanislow et al. 2010).

1.5 Extraordinary Science

Massimiliano Aragona (2009) has written about how the problem of extensive comorbidity in the DSM can be conceptualized in Kuhnian terms. More generally, the neo-Kraepelinian paradigm established by Robins and Guze and institutionalized in the DSM has resulted in so many problems and inconsistencies that a crisis of confidence has become widespread, as indicated by the many criticisms that the DSM has attracted from inside the field. The problems with the current diagnostic paradigm, some psychiatrists believe, have become so great that a significant paradigm shift may be required (Kupfer et al. 2002; Hyman 2010; Frances 2009).

According to Kuhn (1962), although a crisis of confidence does not invariably result in paradigm shifts, it is reliably associated with a transition from a period of normal science (where the paradigm serves as an integrating framework in which questions are asked and answered) to a period of extraordinary science. The defining features of the fragmented periods called extraordinary science include a) a lack of agreement on what are the most appropriate methodologies, b) magnification of the problems that define the crisis into the most important problems of the discipline, c) the generation of speculative new theories, and d) a dramatic increase of interest in exploring the philosophical assumptions of the discipline.

In the course of his own career, Kraepelin came to doubt the usefulness of the disease entity model for psychiatry—a model which he himself had borrowed from Karl Kahlbaum (Kendler and Jablensky 2011). For instance, the failure to discover etiology and pathogenesis, and even to establish a firm boundary between the normal and the abnormal, was of great concern. In 1920 Kraepelin acknowledged that schizophrenia and manic depression were not separate entities as originally proposed. To some extent, Eugen Bleuler’s early work on the group of schizophrenias in 1908, the establishment of Adolf Meyer’s psychobiology in 1910–30, and Karl Jaspers’ pluralistic model of general psychopathology in 1913 can also be seen as responses to this earlier crisis.

In fact, the neo-Kraepelinian paradigm itself was promulgated as a Kuhnian revolution that sought to replace the DSM systems that were developed using the nosological theories of Meyer and Freud (Klerman 1986). Aragona in Chapter 2, however, argues that the DSM-I and the DSM-II were both more Kraepelinian than not in many respects. The same might be said for many currently proposed alternatives. For example, a renewed commitment to Robins and Guze’s emphasis on laboratory studies (for etiology and pathogenesis) is a primary justification for claiming that a paradigm shift is needed. Attempts to take psychiatry beyond the traditional medical model are probably more thoroughly revolutionary, but this too is not a new development, nor is it even anti-Kraepelinian. Trained in experimental psychology under the scientific pluralist Wilhelm Wundt, Kraepelin rejected the reductive analysis of psychiatry associated with Griesinger and Wernicke, asserting that psychiatry was inherently psychological and best kept distinct from neurology (Kendler and Jablensky 2011; Engstrom forthcoming).

The important question is whether basic science has become (or soon will be) advanced enough to make this current crisis more than an expectable oscillation in a long historical cycle. We should all hope so—especially if it leads to progress in the treatment and management of psychiatric distress.

Before bringing this introduction to a close and proceeding to our chapter summaries (the final chapter by Stoyanov and Aragona offers an integrative overview of the book), we would like to say more about extraordinary science and the exploration of philosophical assumptions. It is our view that extraordinary science should explicitly include a scholarly attempt to examine the history and the philosophy of the discipline. This requires not only the increased interest in historical and philosophical questions on the part of those inside the discipline as Kuhn described, but the inclusion of philosophical thinking drawn from sources outside the discipline.

In addition, we suggest that the philosophical problems of psychiatric classification and nosology are important not only for psychiatry and clinical psychology, but for philosophy itself. According to Popper (1963), from Plato onward the genuine problems of philosophy have originated in the philosophical problems of other disciplines, particularly the sciences.

The degeneration of philosophical schools [into pseudo-problems and meaningless babble about words] in its turn is the consequence of the mistaken belief that one can philosophize without having been compelled to philosophize by problems which arise outside philosophy. . . . Genuine philosophical problems are always rooted in urgent problems outside philosophy, and they die if these roots decay (p. 95).

Although helpful to psychiatry, the resulting philosophical work should be more than a handmaiden to psychiatry. Ideally it would be formulated to also make a contribution to philosophy.

In conclusion, this volume in the International Perspectives in Philosophy & Psychiatry book series is offered neither as a comprehensive handbook on the problems of psychiatric validation, nor as a partisan view on how this most recent crisis of confidence should be resolved. It is more along the lines of an invitation for those with different scholarly skills to participate in a debate on a significant scientific, philosophical, and sociocultural problem—how to classify those signs and symptoms that constitute what we call psychiatric distress and impairment.

1.6 Chapter Summaries

In Chapter 2, Massimiliano Aragona critically analyzes several received views about important historical changes in psychiatric classification, especially the DSM-III revolution and the hoped-for DSM-5 and RDoC revolutions. With respect to the DSM-III’s so-called neo-Kraepelinian revolution, he shows that, despite their usual association with the theories of Meyer and Freud, the DSM-I and the DSM-II were developed in conformity with Kraepelinian (or conventional European) assumptions. In fact, an examination of the disorders listed in the first two editions of the DSM reveals that they were more similar to Emil Kraepelin’s system than were DSM-III and its successors. Even the DSM-III’s emphasis on diagnostic reliability was not new; rather, what was new was the neopositivist attempt to establish reliability by using operational criteria that can be algorithmically applied.

Aragona argues that because each new manual is contrasted with its predecessor in order to establish its superiority, the fundamental continuity between all the manuals tends to be ignored. What has remained the same throughout the various revisions is a Kraepelinian model of validity par excellence, which holds that similar cases should ideally be grouped together with respect to shared underlying biopathological processes. The difference between the older and the newer approaches is whether the organization of the phenomenal/descriptive level of analysis is expected to lead to an underlying etiology or to follow from it. He closes by suggesting an alternative to the Kraepelinian model of validity that is more constructivist and less realist in its aspirations.

Adriano Rodrigues and Claudio Banzato, in Chapter 3, demarcate two domains of validation—the diagnostic and the nosological. In the diagnostic domain they place the various types of validity assessed in psychometrics such as content, concurrent, predictive, and construct validity. The goals of diagnostic validation are to assess how well the diagnostic criteria represent the construct of interest and the extent to which they are able to correctly sort people into cases and non-cases. In their view the methodology for assessing diagnostic validity is well understood and should be applied to every disorder construct.

Nosological validation, however, is a different matter. By nosological validation they mean the extent to which it is reasonable to include a diagnostic category in the classification system. The best methodology for assessing nosological validity, they note, is more subject to debate. In this domain they distinguish two strategies which they name the pragmatic conception and the realistic conception. According to the pragmatic conception, a disorder category is valid if it is clinically useful. According to the realistic conception, a disorder category is valid if it is a real kind in J. S. Mill’s sense of the term, i.e., instances of the kind systematically share features that are not included in the definition of the kind. The defining features of real kinds are also expected to cluster together more often than they would be expected to by chance.

They argue that both the pragmatic and realistic approaches should be applied to the validation of every disorder because they are non-redundant: one cannot be reduced to the other. Categories can have clinical utility without being real kinds, and a category may be a real psychiatric kind even though its clinical usefulness does not depend upon its reality (i.e., what makes it real, such as having a specific genetic etiology, provides no guidance on how to manage or treat cases). In their view, clinical utility is an important consideration in both the pragmatic conception and the realistic conception of validation and is therefore the most important validator. With some caveats as to whether the abstract psychological constructs of psychiatry can be squeezed into the notion of a real kind without loss of information, they also acknowledge that a system containing only real kinds may prove to be the most useful in the long run.

For those of us who have been working in the philosophy of psychiatry area over the past decade, the National Institute of Mental Health’s Research Domain Criteria (RDoC) could be considered to be an implementation of Dominic Murphy’s (2006) vision for a scientific psychiatry. Therefore, Murphy’s evaluation of the RDoC approach to validation in Chapter 4 is a significant statement. Murphy argues that validation in psychiatry should involve discovering how disorders are produced by the causal structure of the world, and that these causal structures (preferably mechanistic in nature) can be interpreted as being real. This, he says, is a kind of Big-V validity, or the view that to validate a disorder is to show that it is real.

He contrasts his views with those of pragmatists such as Kenneth Schaffner. Pragmatists often say that valid classifications are derived from utility considerations, the articulation of goals and purposes, and background assumptions rather than the discovery of what is objectively there independent of human interests. Murphy argues that utility, purposes, and conceptual interpretations are all important in the development of psychiatric constructs as the pragmatists claim, but the resulting constructs are not necessarily incompatible with a scientific realism that seeks to discover what is really there.

However, in contrast to another important feature of the RDoC vision, Murphy articulates why the legitimacy of scientific realism about causal structures does not confer a similar legitimacy upon scientific realism about psychopathology. The concept of dysfunction/pathology, he argues, depends on normative assumptions. Dysfunctions are unwanted conditions similar to how weeds are unwanted plants. Why they are unwanted depends on contingent human interests. In fact, it is impossible to empirically (Big-V) validate a claim that a particular condition is really a dysfunction. This problem, he argues, is not specific to psychiatry, but also holds for other types of pathology, including cancer.

In Chapter 5 Nigel Sabbarton-Leary, Lisa Bortolotti, and Matthew Broome explore validity with respect to the concept of a natural kind. According to them, a natural kind is a part of the furniture of reality that reflects divisions in the world that can be considered to exist independently of human classification practices. Examples of natural kinds include electrons, carbon, and physical diseases such as syphilis. Because many of the features we want from a valid disorder construct are possessed by natural kinds, including uniformity among cases with respect to causal origin (etiology), development (time course), and response to intervention (treatment strategies), one way to validate a mental disorder construct is to show that it refers to a natural kind. It is thought by some (especially Szaszians) that only natural kinds are appropriate objects for medical attention.

Some arguments against the existence of natural kinds in psychiatry refer to the lack of a known etiology for most disorders. Other arguments point to different possibilities for lumping and splitting disorders based on extra-empirical considerations. Sabbarton-Leary, Bortolotti, and Broome, however, argue that clear examples of natural kinds in the rest of medicine do not require that there be a confirmed (and specific) etiology or even a firmly decided lumping and splitting of cases, nor should there be for natural kinds in psychiatry. Without taking a stance on the best ontological theory of natural functions, they argue that valid mental disorders represent objective biological dysfunctions (or failures of natural functions). On any construal, objective biological dysfunctions are natural kinds. They are proper objects of scientific investigation and good targets for pharmacological treatment.

The problem is what to do about cases that refer not to objective biological dysfunctions, but to harmful (yet natural) alterations in functioning such as persistent bereavement, or to maladaptive and sub-optimal functioning that represents a violation of norms (such as conduct disorder). These, too, seem like allowable targets for psychiatric attention, but often of a non-pharmacological nature. Given that these alterations are parasitic on natural functions, Sabbarton-Leary, Bortolotti, and Broome claim that they can be classified as para-natural kinds. In the philosophy of science, a para-natural kind refers to something like an electron hole: the lack of an electron in a shell where the laws of physics would allow one to be. The electron hole is not an entity in the world, but it is a mind-independent regularity about which reliable inferences can be made (parasitic upon the nature of electrons). Another example of a para-natural kind is cold, which, according to the kinetic theory of thermodynamics, is the absence of heat. Likewise, the absence of normal function represented by bereavement can also be considered a para-natural kind.

In summary, they claim that two categories of mental kind should be recognized in psychiatry. Mental disorders refer to sub-optimal functioning that represents objective biological dysfunctions. Mental harms refer to cognitive-emotional states that are harmful to their bearers, but that lack a causally potent biological etiology. These harms, too, can be subject to a validation process, but for para-natural kinds the boundaries between sub-optimal and optimal functioning may be fuzzy. When this is the case, additional conceptual analysis is needed to decide if and when a particular alteration should be subject to clinical attention.

Jared Keeley, in Chapter 6, explores what he calls the ontological loop that exists between the practice of diagnosis/assessment and our ideas about the nature of psychopathology. This loop, argues Keeley, is one of the sources of reification in psychiatry and psychology. Keeley illustrates how our assumptions about the nature of psychopathology influence the diagnostic and assessment strategies we use, and how the results obtained from using these strategies are taken to represent a validation of those ontological assumptions.

Furthermore, some of the ontological assumptions we bring to the study of psychopathology are imparted to us during our education, rooted in the measurement techniques available to us. Although measurement can occur in many ways, Keeley focuses on the ontological assumptions encoded in qualitative interviews, structured diagnostic interviews, and self-report psychological tests. To take one example, those who study depression using psychological tests weight every item equally and treat them as additive, making it likely that in using the test they will validate the theory that depression in the population varies continuously from lower to higher scores. A different measurement ontology in which not all symptoms are treated as equal and are assigned different weights when they interact with other symptoms under specific conditions would prescribe and subsequently validate a different ontology for depression. If all we need is severity information, then the additive ontology is perfectly adequate—but it may not be adequate for all purposes we may adopt.
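The contrast between the two measurement ontologies can be made concrete with a small sketch. The item names, weights, and interaction term below are hypothetical illustrations, not taken from Keeley's chapter or from any actual depression inventory. The sketch shows only that two respondents who are indistinguishable under an equally weighted additive score can differ sharply once items are weighted unequally and allowed to interact.

```python
# Hypothetical symptom ratings (0-3); item names are illustrative only.
ITEMS = ["low_mood", "insomnia", "fatigue", "hopelessness"]

def additive_score(ratings):
    """Sum score: every item is weighted equally and items do not interact."""
    return sum(ratings[item] for item in ITEMS)

# Assumed weights, chosen only to illustrate unequal weighting.
WEIGHTS = {"low_mood": 2.0, "insomnia": 0.5, "fatigue": 0.5, "hopelessness": 2.0}

def weighted_score(ratings):
    """Alternative score: unequal weights plus one assumed interaction term."""
    base = sum(WEIGHTS[item] * ratings[item] for item in ITEMS)
    # Assumed interaction: low mood counts for more when hopelessness is also present.
    interaction = 1.5 * ratings["low_mood"] * ratings["hopelessness"]
    return base + interaction

person_a = {"low_mood": 3, "insomnia": 0, "fatigue": 0, "hopelessness": 3}
person_b = {"low_mood": 0, "insomnia": 3, "fatigue": 3, "hopelessness": 0}

print(additive_score(person_a), additive_score(person_b))
print(weighted_score(person_a), weighted_score(person_b))
```

Under the additive score the two respondents receive the same total, which is all that a severity-only purpose requires. Under the weighted score they separate, which is the sense in which a choice of scoring rule already carries an assumption about what depression is.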

In Chapter 7 Ivana Marková and German Berrios critically examine the assumption that progress in the validation of psychiatric symptoms can invariably be made by using neuroimaging data to localize symptoms onto underlying neural structures and pathways. According to this assumption, those symptoms that cannot be captured using the gold standard of imaging data are likely invalid.

Focusing on subjective complaints reported by the patients, Marková and Berrios refer to these kinds of symptoms as hybrid objects. By hybrid object they mean that the conversion of a raw experience into a report of a subjective mental symptom gains structure from multiple sources.

These sources can roughly be grouped into the categories of the biological and semantic. The biological refers to neural signals and the semantic refers to an envelope of meaning that symptoms gain from concepts, communities, comrades, cultures, and so on. The upshot or sense of the symptom as a whole, they say, is sometimes located in the neural signaling, but at other times is located in the semantic element. When the sense of the symptom is located in the semantic element, imaging will be an inadequate validator. As a consequence, psychiatric symptoms are not all equally appropriate for imaging research, and it would behoove researchers to gain a better understanding of symptom structure in order to avoid continually entering blind alleys.

Drozdstoj Stoyanov, Stefan Borgwardt, and Somogy Varga begin Chapter 8 by exploring the explanatory gap that exists between the underlying metaphysics of clinical psychology–psychopathology and the neurosciences, which together comprise the amalgam discipline of psychiatry. In their telling, the tests of the clinical psychologists and structured interviews of the psychiatrists represent decontextualized excerpts drawn from patient narratives (e.g., “I feel tired most mornings”). On the level of meaning these reports are much like the hybrid objects described by Marková and Berrios in their chapter and therefore are closely allied with the hermeneutical approaches of the humanities. As measurements they are typically conceptualized within the instrumentalist framework of scientific empiricism.

In contrast, the data of the neurosciences are not embedded in a hermeneutical space of reasons and narrative meanings. Additionally, the neurosciences are typically thought of using the assumptions of scientific realism rather than instrumentalism. The problem emphasized in this chapter is that the validation of psychological tests in particular occurs internal to the testing domain. Test scales are validated with reference to clinical interviews or other tests. It would be better to validate these reports by reference to something external to the framework such as the data of neuroscience, but it has not been possible to cross the explanatory gap using the kind of data that is conventionally collected by researchers. What is needed, they claim, is a program of translational validation.

One way to do this, they argue, is to align the information in the two domains by focusing on the simultaneous measurement of occurrent states by both self-report and imaging modalities. If some key methodological guidelines are followed, they claim that the imaging data would represent instantaneous states. The analog in the testing framework would not be an aggregate multi-item measure like those preferred by psychologists nor a general claim like “I feel tired most mornings,” but a report of an occurrent state such as “I feel restless.” Their hypothesis is that this kind of report might be better validated by imaging data, and if the translation is successful, the imaging data would itself gain clinical meaning.

The second section of the book is brought to a close in Chapter 9, where Michael Loughlin and Andrew Miles explore our inevitable reliance on norms in making distinctions between health and illness. They argue that we cannot validate a particular condition as truly disordered if our normative judgments about “disorder” cannot also be true or false. They refer to this position as realism about value. One of the reasons that progress on the validity problem has stalled, they claim, is that psychiatrists have accepted a modern philosophical framework in which “truth aptness” is classified with “observable and measurable” under the auspices of the objective. This leaves the contrasting concept of the subjective with “values,” “opinion,” and “preference.” Identifying psychiatric disorders, they say, is irreducibly moral (value-laden), but moral reasoning is not just a matter of subjective preference. It, too, can be correct or incorrect.

One influential manifestation of this modern framework, they suggest, is scientism—which they define as the philosophical doctrine that those approaches to knowledge generation that want to be considered as discovering truth must establish their scientific credentials, and that subjective values must be minimized if we want to know what is true. In their view, this framework is a philosophical mistake that needs to be rectified. In fact, psychiatrists may learn to accept this value-free view of truth and the objective intellectually, but cannot actually do so when practicing psychiatry.

They close by addressing the criticism that their view may encourage authoritarian practice, justifying the use of psychiatry to impose values on others. While they accept that such “totalitarian practice” has occurred in the history of psychiatry, their view, they maintain, does not justify such practice. They argue that tolerance is also an objective value—and a good. They agree that the practices of authoritarian or totalitarian psychiatry are wrong, but not just because we do not like them. They are objectively and truly wrong.

Bridging the more philosophical chapters in the second section with those in the third section that are progressively more concerned with the everyday work of clinicians, James Phillips begins Chapter 10 by observing that with the recent publication of the DSM-5, diagnostic validity as conceptualized in the Robins and Guze paradigm still has not been secured. He also notes, with some puzzlement, that the DSM-5 process began with the hope that validity might be secured with a paradigm shift, but ended with the marketing of an evaluation of reliability. Even more puzzling, the DSM-5 architects claimed that they could study reliability because adequate validity had been established. Of this, Phillips is not convinced.

He proceeds to describe three developments in the Robins and Guze paradigm, which he calls strong syndromal validity, weak syndromal validity, and the RDoC project. Strong syndromal validity refers to Kendell and Jablensky’s emphasis on syndromes that possess natural boundaries with other syndromes and with normality, a feature that is either present or not. Weak syndromal validity refers to Kenneth Kendler’s emphasis on examining a plurality of possible validators, which can result in there being degrees of validity. One of the issues with pluralism is that the validators may not agree on the placement of category boundaries, and selecting which validator to weight higher is often a non-empirical decision. In RDoC the validator that is given the most weight is the discovery of underlying neural mechanisms. RDoC jettisons the description of syndromes in favor of studying symptomatic expressions of general psychological processes.

Next Phillips turns to a more philosophical evaluation of these diagnostic validity paradigms as expressions of the medical model. First, in all three paradigms etiology is a potentially important validator. They all assume that etiology is to be understood in a bottom-up fashion—from cause to phenomenon—but this may not be justified when one considers systems theory. In systems theory etiological factors interact in complicated ways, and their causal roles can sometimes only be understood in terms of the larger system in which they are embedded.

This complexity is most problematic for the RDoC matrix, which attempts to regiment the domain into distinct symptom boxes that are then explained at multiple levels of analysis. From an integrated systems approach, however, these boxes are defined by somewhat artificial boundaries. Second, despite its use of psychological symptoms, the medical model typically does not consider the implications of the psychological being emergent. One example is identity disturbance. As an emergent process, identity disturbance is both a symptom and a pathological process. The best treatment strategies may not involve direct interventions on underlying biological realizers. Even when medication is used, its role in the complex system cannot be understood in a bottom-up fashion.

In Chapter 11, Kathryn Jacobs and Robert Krueger argue that diagnostic constructs (and diagnostic systems) should be assessed by examining their structural validity. Structural validity refers to how well the internal structure of a construct (or system) corresponds to how symptoms are actually structured in the population. They contrast this data-driven approach with the expert-driven approach that is more commonly used in psychiatry. In the expert-driven approach, one or more clinicians group a number of cases together based on noticed similarities. They give that symptom pattern a name and then study it by looking for other cases that match that pattern. The problem is that validation research becomes an exercise in confirmatory hypothesis testing and is oriented to what clinicians expect to find. For instance, often a disorder such as depression is considered validated when it is shown to be associated with some external criterion such as work impairment. Two problems are that this so-called “external criterion” was likely one of the implicit considerations used in identifying cases (so validation is circular) and it tends to be associated with many conditions, not only depression (so validation is unspecific). This makes validation somewhat illusory.

Furthermore, if depression and generalized anxiety disorder are defined as separate and distinct entities and their defining symptoms are those which support diagnostic specificity, then actual and important overlap between diagnosed cases will not be noticed or, if noticed, will be considered perplexing. When a diagnostic system is structurally invalid, clinicians will be constantly confronted with cases that do not fit any known disorder, or that reflect a confusing mix of disorders. The problem is that once people learn common diagnostic distinctions, such as major depression versus generalized anxiety and schizophrenia versus bipolar disorder, they will see them out there in the world, even though the world is not structured in that way. With increased structural validity, what mental health professionals expect and what they actually find would be more closely aligned.

How can such a lofty goal be accomplished? Jacobs and Krueger suggest that the methods psychologists use to develop psychological tests offer a readily available technology for enhancing the structural validity of psychiatric diagnoses. In the concluding sections of the chapter they describe the attempt to develop a more structurally valid model for diagnosing personality disorders in DSM-5 and suggest how something similar could be implemented for depression and generalized anxiety disorder.

Robert Cloninger, in Chapter 12, argues that psychiatrists’ assumptions about how to validate a classification system have been at one time or another mistaken, partial, and lacking in vision. According to him, much of the recent validation literature is characterized by simplified assumptions about causality, armchair debates between advocates of categorical approaches and dimensional model monists, and a loss of sight of the persons who are subject to classification.

As an alternative he offers a comprehensive theoretical and scientifically informed approach called the psychobiological model of personality. One of the advantages of this model, he claims, is that it contains the resources to systematically validate the distinction between psychiatric health and illness, and to ground that distinction in a model of personality functioning in general. To do so, Cloninger distinguishes three domains: temperament, character, and the narrative self. Personality is what emerges from the interaction of these three domains in dynamic social contexts.

The four temperaments are basic to all vertebrate life forms and evolved early on. They are harm avoidance, novelty seeking, reward dependence, and persistence. Character evolved later in mammals and includes cooperativeness, self-directedness, and self-transcendence (in modern humans). Both temperament and character are moderately heritable and both interact in development, although character is more plastic and culturally embedded. The healthy personality is the result of being at a “golden mean” on the four temperaments combined with high values on the three character traits.

In his telling, the key feature of psychopathology is a deficiency in one or more character traits. Together, temperament and character influence, but do not fully determine, the narrative self. When healthy, the narrative self utilizes the creative capacity of self-transcendence to evolve in unique ways and is more closely associated with the concept of flourishing. He concludes the chapter by describing the ways in which the psychobiological model is superior to the factor analytic models that dominate contemporary psychopathology.

One of the new features of the DSM-III in 1980 was the introduction of multiaxial diagnosis. In addition to identifying a patient’s psychiatric disorder, mental health professionals were asked to identify personality traits and types, other medical conditions, and psychosocial stressors and then to rate the patient’s general functioning. The goal was to convert the narrower activity of diagnosis into conceptual case formulation. One of the major changes in the DSM-5 was the elimination of the multiaxial system. In opposition to this change, in Chapter 13 Juan Mezzich and Ihsan Salloum argue that the multiaxial model was itself too narrow. Valid case conceptualization can only be achieved through a widening of diagnostic scope. Their proposed widening is termed Person-centered Integrative Diagnosis. The conceptual foundation of this more comprehensive approach is that of person-centered medicine, a holistic model that rejects the assumption that one can understand the nature of a disorder independently of the person who has the disorder. The fragmentation of mental health care, they argue, is one consequence of the DSM goal of diagnosing disorders, not persons. What would be better is to diagnose a person’s whole health.

Their development of an explicit and usable new diagnostic paradigm that is person-centered and integrative was informed by surveys of what clinicians, patients, and third-party stakeholders want from a diagnostic model. Their proposed model is divided into three pillars. The first pillar is named Broad Informational Domains, of which there are three: health status, the experience of health, and contributors to health. Each domain is divided into “ill health” and “positive health.” Included in this first pillar are psychiatric diagnostic categories and states of health, but also contributors to illness and health, and an attempt to understand what it is like for the person to be sick or well, and how they understand the condition themselves. This aspect of the model was the basis of the Latin American Guide of Psychiatric Diagnosis, Revised Version (GLADP-VR), which has been developed by the Latin American Psychiatric Association. The second pillar is named Pluralistic Descriptive Procedures. In addition to categories and dimensions, it uses narratives to depict an individual patient. The third pillar is named Partnership for Evaluation. In this pillar the person who is being evaluated is seen as a participant in the diagnostic process whose values and preferences help determine the clinical recommendations.

In Chapter 14, René J. Muller contends that the DSM-5 is an invitation to get a psychiatric diagnosis wrong. The problem, he says, is that this approach conceptualizes all mental illnesses as natural disease entities and it assumes that people with the same clinical presentation share the same underlying pathological process. He argues that a better alternative would be the diagnostic model promulgated by Adolf Meyer at the Johns Hopkins University School of Medicine during the first four decades of the twentieth century, and more recently by Paul R. McHugh, chair of the Hopkins psychiatry department from 1975 to 2001.

According to Meyer, a diagnostic assessment should focus on the person, and not attempt to fit symptoms into categories of psychopathology, though these categories can be useful when not taken too literally. Influenced by William James and John Dewey (Dewey was a personal friend), Meyer believed that a person’s psychology and biology are inseparable, and that persons are inseparable from their environments. He held that most psychiatric illnesses can be understood as psychological reactions to negative life-events that a person refuses to—or is unable to—work through. These reactions are defensive alterations in thought, emotion, and behavior, and the diagnostician’s task is to understand their psychobiological origin and meaning.

McHugh and Slavney (1998) have partially systematized Meyer’s psychobiology by identifying four “perspectives of psychiatry” from which a patient’s pathology may be viewed. Each perspective offers a different conceptual model for evaluating a mental illness. Starting with the McHugh–Slavney perspectives and Meyer’s psychobiology, Muller has developed a guide to classification and diagnosis that redubs the four perspectives as “domains” to emphasize the existential niches inhabited by those who are mentally ill.

The first domain includes maladaptive reactions to challenge and stress, where a return to normalcy is delayed by a defensive stance. The varied manifestations of depression and anxiety fall here, along with dissociative disorders and some types of psychosis, as well as severe obsession–compulsion. These problems may be biologically abetted, but are ultimately the result of a person succumbing to what Meyer called the “bad habits” of mental life.

Illnesses of the second domain derive from psychobiological deficits of intellect and personality. The personality disorders and autism reside here. So do impediments to impulse control, and to reading and learning. It seems plausible, claims Muller, that deficits in the second domain could make a person more likely to succumb to maladaptive reactions of the first domain.

Third domain reactions are actively acquired and self-destructive habits. Addictions, psychosomatic conversion, sexual paraphilia, anorexia, bulimia, and self-injury are the major exemplars. Those with deficits in the second domain are also more susceptible to acquiring bad habits of the third domain.

Only in the fourth domain do we find the type of psychopathology that, to use Meyer’s term, “impinges” on a person from the outside, as happens with diseases such as epilepsy. Some psychotic deviations from normality are undoubtedly brain diseases, though many are due to first domain reactions—a distinction to which the DSM-5 is blind.

Muller argues that in the DSM-5, the emphasis on reliability (agreement among clinicians) leaves out so much of what is contributing to a person’s illness that validity (the truth about the illness) is compromised, and often sacrificed altogether. The psychobiological tack taken in Muller’s Four Domains of Mental Illness (FDMI) implicitly follows the conviction that the etiology and meaning of symptoms can be understood, and that this understanding can be validated, using existential and pragmatic criteria.

The book concludes in Chapter 15 with an overview by Drozdstoj St. Stoyanov and Massimiliano Aragona. Referring to the ideas of the Bulgarian philosopher Azarya Polikarov, they seek a middle ground between a radical pluralist approach to validation and a radical unificationist approach. Their middle ground is one in which diverse notions of validation can work together and, in some cases, be combined into a more parsimonious menu of approaches.

Notes

1. Borsboom et al. (2003) and Murphy (2006) critique Cronbach and Meehl for being too wedded to logical positivism and not adequately realist. This is a fair reading, but if one considers concurrent work by Meehl, such a reading is harder to sustain (MacCorquodale and Meehl 1948; Meehl 1962). The first part of Cronbach and Meehl’s article was largely realist in tone, referring to the psychological attributes that account for test performance.

2. The DSM-III of 1980 was preceded in 1978 by the U.S. National Institute of Mental Health’s Research Diagnostic Criteria (RDC), which was the result of a collaborative effort by Spitzer, Robins, and colleagues.

3. Similar to Kendell and Jablensky (2003), Borsboom et al. (2004) are critical of how the term validity is used. They claim that in psychological testing the concept of validity has become so broad that every important test-related issue is relevant to validity. Instead, they propose making the term more precise. According to them, validity should refer to whether the variations in the psychological attribute in question causally produce the variations in the measured outcomes.

References

Adams, H. F. (1936). Validity, reliability, and objectivity. Psychological Monographs, 47, 329–50.

American Psychological Association. (1954). Technical recommendations for psychological tests and diagnostic techniques. Psychological Bulletin, 51, 1–38.

Aragona, M. (2009). The role of comorbidity in the crisis of the current psychiatric classification system. Philosophy, Psychiatry, & Psychology, 16, 1–11.

Ash, P. (1949). The reliability of psychiatric diagnoses. The Journal of Abnormal and Social Psychology, 44, 272–6.

Bentall, R. P., Jackson, H. F., and Pilgrim, D. (1988). Abandoning the concept of “schizophrenia”: some implications of validity arguments for psychological research into psychotic phenomena. The British Journal of Clinical Psychology, 27, 303–24.

Blashfield, R. K., and Livesley, W. J. (1991). Metaphorical analysis of psychiatric classification as a psychological test. Journal of Abnormal Psychology, 100, 262–70.

Borsboom, D., Mellenbergh, G. J., and van Heerden, J. (2003). The theoretical status of latent variables. Psychological Review, 110, 203–19.

Borsboom, D., Mellenbergh, G. J., and van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061–71.

Craddock, N., and Owen, M. J. (2005). The beginning of the end for the Kraepelinian dichotomy. The British Journal of Psychiatry, 186, 364–6.

Cronbach, L. J. (1989). Lee J. Cronbach. In G. Lindzey (ed.), A History of Psychology in Autobiography (pp. 62–93). Stanford, CA: Stanford University Press.

Cronbach, L. J., and Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.

Cuthbert, B., and Insel, T. (2010). The data of diagnosis: New approaches to psychiatric classification. Psychiatry, 73, 311–14.

Engstrom, E. J. (forthcoming). The history of psychiatry as interdisciplinary history: on attitudes toward philosophy and psychology in German psychiatry 1867–1917. In K. S. Kendler and J. Parnas (eds), Philosophical Issues in Psychiatry III: Nature and sources of historical change. Oxford, UK: Oxford University Press.

Feighner, J. P., Robins, E., Guze, S. B., et al. (1972). Diagnostic criteria for use in psychiatric research. Archives of General Psychiatry, 26, 57–63.

Frances, A. (2009). Whither DSM-V? The British Journal of Psychiatry, 195, 391–2.

Goodwin, D. W., Guze, S. B., and Robins, E. (1969). Follow-up studies in obsessional neurosis. Archives of General Psychiatry, 20, 182–7.

Greene, T. (2007). The Kraepelinian dichotomy: The twin pillars crumbling? History of Psychiatry, 18, 361–79.

Hyman, S. E. (2010). The diagnosis of mental disorders: The problem of reification. Annual Review of Clinical Psychology, 6, 155–79.

Insel, T., and Cuthbert, B. (2010). Research Domain Criteria (RDoC): Toward a new classification framework for research on mental disorders. American Journal of Psychiatry, 167, 748–50.

Jablensky, A. (2010). The diagnostic concept of schizophrenia: its history, evolution, and future prospects. Dialogues in Clinical Neuroscience, 12, 271–87.

Jablensky, A., Hugler, H., Von Cranach, M., et al. (1993). Kraepelin revisited: A reassessment and statistical analysis of dementia praecox and manic-depressive insanity in 1908. Psychological Medicine, 23, 843–58.

Jablensky, A., and Waters, F. (2014). RDoC: a roadmap to pathogenesis? World Psychiatry, 13, 43–4.

Jablensky, A., and Woodbury, M. A. (1995). Dementia praecox and manic-depressive insanity in 1908: A Grade of Membership analysis of the Kraepelinian dichotomy. European Archives of Psychiatry and Clinical Neuroscience, 245, 202–9.

Kendell, R. E. (1991). The major functional psychoses: Are they independent entities or part of a continuum? Philosophical and conceptual issues underlying the debate. In A. Kerr and H. McClelland (eds), Concepts of Mental Disorder: A Continuing Debate (pp. 1–16). London: Gaskell/Royal College of Psychiatrists.

Kendell, R. E., and Jablensky, A. (2003). Distinguishing between the validity and utility of psychiatric diagnoses. American Journal of Psychiatry, 160, 4–12.

Kendler, K. S., and Jablensky, A. (2011). Kraepelin’s concept of psychiatric illness. Psychological Medicine, 41, 1119–26.

Kendler, K. S., Muñoz, R. A., and Murphy, G. (2010). The development of the Feighner criteria: a historical perspective. The American Journal of Psychiatry, 167, 134–42.

Klerman, G. L. (1986). Historical perspectives on psychopathology. In T. Millon and G. L. Klerman (eds), Contemporary Directions in Psychopathology: Toward the DSM-IV (pp. 3–28). New York: Guilford Press.

Kuhn, T. S. (1962). The Structure of Scientific Revolutions. Chicago: University of Chicago Press.

Kupfer, D. J., First, M. B., and Regier, D. A. (2002). A Research Agenda for DSM-V. Washington, DC: American Psychiatric Association.

Leuba, J. H. (1899). On the validity of the Griesbach method of determining fatigue. Psychological Review, 6, 573–98.

MacCorquodale, K., and Meehl, P. E. (1948). On a distinction between hypothetical constructs and intervening variables. Psychological Review, 55, 95–107.

Markus, K. A., and Borsboom, D. (2013). Frontiers of Test Validity Theory. New York: Routledge.

Matarazzo, J. D., Guze, S. B., and Matarazzo, R. G. (1955). An approach to the validity of the Taylor Anxiety Scale: scores of medical and psychiatric patients. The Journal of Abnormal and Social Psychology, 51, 276–80.

McHugh, P. R., and Slavney, P. R. (1998). The Perspectives of Psychiatry. Baltimore: Johns Hopkins University Press.

Murphy, D. (2006). Psychiatry in the Scientific Image. Cambridge, MA: The MIT Press.

Popper, K. (1963). Conjectures and Refutations: The Growth of Scientific Knowledge. London: Routledge.

Purtell, J. J., Robins, E., and Cohen, M. E. (1951). Observations on clinical aspects of hysteria. Journal of the American Medical Association, 146, 902–9.

Robins, E., and Guze, S. B. (1970). Establishment of diagnostic validity in psychiatric illness: Its application to schizophrenia. American Journal of Psychiatry, 126, 983–6.

Robins, E., and Mensh, I. N. (1954). Prediction in the clinical method and interrelations of biochemistry, psychiatry, and psychology. The Journal of Abnormal and Social Psychology, 49, 435–42.

Sanislow, C. A., Pine, D. S., Quinn, K. J., et al. (2010). Developing constructs for psychopathology research: Research domain criteria. Journal of Abnormal Psychology, 119, 631–9.

Spitzer, R. L., Cohen, J., and Fleiss, J. L. (1967). Quantification of agreement in psychiatric diagnosis: a new approach. Archives of General Psychiatry, 17, 83–7.

Starch, D. (1915). The measurement of efficiency in reading. Journal of Educational Psychology, 6, 1–24.

Szasz, T. S. (1961). The Myth of Mental Illness. New York: Harper & Row.

Thurstone, L. L. (1931). Relation between reliability and validity of a test. In The Reliability and Validity of Tests: Derivation and Interpretation of Fundamental Formulae Concerned with Reliability and Validity of Tests and Illustrative Problems. Ann Arbor, MI: Edwards Brothers.

Woodruff, R., Goodwin, D. W., and Guze, S. (1974). Psychiatric Diagnosis. New York: Oxford University Press.