Mad Science

4 And DSM Said: Let There Be Disorder

DSM-III exemplified the antiscientific and irrational approach to perfection; it does not discuss or argue, but lays down the law as if it were dealing with a paradigm which no rational person could quarrel with. The fact that such an approach and such an empty, atheoretical, and antiexperimental system can find acceptance in psychiatry says more about the nature of modern psychiatry than any critic, however hostile, might be able to say.

Eysenck, 1986, p. 95

Introduction

The Diagnostic and Statistical Manual of Mental Disorders, also known as the DSM, is owned and published by the American Psychiatric Association, the APA. For sixty years, the manual has been the classification system of mental illness in the United States. It is revised every fifteen years or so by a committee of psychiatrists appointed by the APA.

From 1952 until 1980, the DSM never merited serious attention. The manual was a slim, eighty-page, spiral-bound, administrative codebook. Its primary use was to provide code numbers for the diagnoses that patients received in treatment institutions. It was based on the common-sense judgments of a small group of appointed psychiatrists. There was no pretense that DSM’s usefulness extended beyond administrative convenience. It was not a textbook used to train mental health professionals, not a guide for research, not a tool to decide who would receive services for personal problems, not an instrument of corporate or public bureaucracies, not the subject of controversy, and certainly not a document that its authors or overseers claimed as a scientific achievement.

That all changed dramatically in 1980 with the publication of the third edition, DSM-III. In this chapter and the next one, we will examine that transformation, its scientific claims, its social consequences, and how the manual’s weaknesses are being exploited as the controversial fifth edition is being readied for publication in 2013.

The current controversies encompass a concern about the meaning, nature, and boundaries of madness. As described in chapters 1 and 2, even the terms most people use—which do not have satisfying definitions—are controversial: insanity, mental disease, brain disease, brain-based disorder, psychopathology, mental illness, and mental disorder. These terms refer to some presumed, ill-defined pathological internal condition in individuals that causes personal or social concern. We will continue to use these terms interchangeably, as all of them remain in play in discussions of madness.

The story in this and the next chapter will center on the evolution of the DSM, which represents psychiatry’s answer to questions about the meaning of madness. Rather than recounting scientific progress in recognizing and fighting disease, as the NIMH and the leaders of the psychiatric establishment might suggest, the story is a multilayered tale of competing interests, professional politics, commercial interests, and pseudoscientific claims. It is also an account of cultural transformation: how a widening array of human anguish, misbehavior, and travail has come to be viewed as the result of brain disease.

Psychiatrists, although they constitute a minority of mental health workers, literally “own” the contemporary meaning of madness.¹ That ownership is the cornerstone of their profession: its area of responsibility and control. As such, psychiatry must maintain firm control over the meaning of madness and ensure that it is shared among experts who recognize it as legitimate and that it remains accepted by the public. In the last forty years, psychiatry periodically has had serious trouble maintaining agreements about the meaning of madness and the methods for recognizing it. The common site for those troubles has been the DSM. Some historical context will help.

In politics and our personal lives, we tolerate vague language and concepts held together by no more than tacit agreements in order to facilitate social cohesion and avoid potentially interminable disagreements. Scientific inquiry, however, requires that concepts meet higher standards than those expected of political slogans or marketing pitches. Scientific ideas must be subject to systematic, rigorous, and independent scrutiny, verifying that they have a good correspondence with the real world. Psychiatry is the leading mental health profession, and, as in medicine in general, the meaning of the core concept of disease and illness must be broadly shared, specific in content, and have a consistent and verifiable basis in reality. Psychiatrists must establish that they know what mental illness is and are able to recognize people who are mentally ill and those who are not.

In the language of science, having a firm grip on the meaning and recognition of madness requires that any classification of mental illnesses, such as the DSM, must possess both reliability and validity. We examine these different but related requirements in this and the following chapter. For now, suffice it to say that the validity of psychiatric diagnoses refers to whether the definition and meaning of mental illness can be objectively shown to refer to something that is factually true. The reliability of diagnoses involves establishing whether mental health professionals can independently agree on which diagnosis applies to people evaluated. Agreement among observers, however, does not guarantee truth, because people could agree on something that is factually incorrect. But if people cannot agree, for example, on who should be diagnosed as bipolar, it becomes very difficult to establish the meaning and nature of that disorder (i.e., the validity of that diagnosis). Reliability should be much easier to establish than validity.
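
The distinction between agreement and truth can be made concrete with a small computational sketch. The Python example below is an illustration only, not a method taken from the DSM or from any study cited here. It computes Cohen's kappa, a commonly used chance-corrected agreement statistic, for invented ratings, and then compares those ratings with a hypothetical reference standard (`truth`). All names and data are assumptions made for illustration, and the very availability of such a reference standard is itself an assumption that does not hold for the diagnoses discussed in this chapter.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters who labeled the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's own label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


def accuracy(labels, truth):
    """Share of cases where a rater's label matches the reference standard."""
    return sum(l == t for l, t in zip(labels, truth)) / len(truth)


# Invented data: six people, a hypothetical reference standard, three raters.
truth = ["none", "none", "none", "none", "bipolar", "bipolar"]
rater_1 = ["bipolar", "bipolar", "none", "none", "bipolar", "none"]
rater_2 = list(rater_1)  # agrees with rater_1 on every case
rater_3 = ["none", "bipolar", "none", "bipolar", "bipolar", "none"]

kappa_agree = cohens_kappa(rater_1, rater_2)
kappa_partial = cohens_kappa(rater_1, rater_3)
accuracy_rater_1 = accuracy(rater_1, truth)

print(f"kappa (rater 1 vs 2): {kappa_agree:.2f}")
print(f"kappa (rater 1 vs 3): {kappa_partial:.2f}")
print(f"rater 1 vs hypothetical truth: {accuracy_rater_1:.2f}")
```

In this invented example the first two raters agree perfectly (a kappa of 1.00) yet match the hypothetical reference standard in only half the cases, which is the sense in which reliability does not guarantee validity. The third rater agrees with the first only modestly (a kappa of about 0.33), a situation in which questions about validity can hardly be posed at all.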

Who Is Mad?

Let us begin with a news story that is typical of how madness is commonly represented in the media. In June 2005, the New York Times reported, based on a recently published study (Kessler et al., 1994), that psychiatric researchers had estimated half the American population has had or will have a mental disorder at some time in their life and that more than one quarter of the population had at least one mental disorder in any given year (Carey, 2005). The thrust of the story, and that of many other news reports, is that madness is even more widespread than previously thought, that many people do not recognize their illnesses, and that many of those who are ill are not getting the proper treatment. The striking claim in this instance is the remarkably high proportion of mentally ill that the researchers claimed to have discovered. As described in chapter 2, several generations ago, less than one fifth of one percent of the American population was described as mentally ill—primarily those in state asylums.

The numbers pointing to this epidemic of madness came from a telephone survey of the general population, one of the most influential surveys ever undertaken by psychiatry. Certainly, we are familiar with telephone polls of our political opinions and expected votes in elections, during which our answers are recorded. These polls are usually surprisingly accurate, because people understand the questions and candidly reveal how they intend to vote. The New York Times article, however, is addressing a much more complex foundational claim by psychiatry and the NIMH (which sponsored the research); namely, that researchers can ask people a few questions, actually identify if they are mad, and distinguish one form of madness from another.

Psychiatric telephone interviews, such as the one reported, do not ask respondents if they think that they are mentally ill or about which mental illnesses they have had in the past. Instead, telephone interviewers ask randomly selected adults a series of structured questions about rather common behaviors, derived from the long lists of behaviors and emotions that the DSM lists as symptoms of mental disorder. Computers then count the responses of those interviewed to determine whether the answers match the DSM criteria for a mental disorder.

Whether this sort of counting is science or propaganda depends entirely on whether the concept of mental illness represented by DSM can withstand scrutiny and whether there is persuasive empirical evidence that the diagnoses of mental illnesses in DSM and the criteria that define them are credible and can be used consistently. Thus, the DSM with its list of disorders and symptoms is the scientific cornerstone on which psychiatry stands.

When public health officials want to check the prevalence of a disease, they typically have some reliable method of determining who has and who does not have the disease. For example, populations are screened, not by asking people questions about their feelings, but by using blood or other biological tests to determine the prevalence of HIV/AIDS, H1N1 influenza, pneumonia, cancer, and many other diseases. That is because the presence of diseases is identified by examining living cells. Not so with the “diseases” of madness. There are no biological tests, markers, or pathophysiology for any of the diagnoses listed in DSM.

Even in the twenty-first century, little is known about the presumptive causes of mental disorders in DSM. Although the psychiatric establishment increasingly claims that mental disorders have some biological causes and has had considerable success in making the public think that this assertion has been scientifically proven, there is little evidence for this. This particular sleight of hand can be seen in the evolution of DSM. In the 1980 and prior editions of DSM, the manual set aside a section for “organic mental disorders” (with known etiology, e.g., delirium caused by alcohol withdrawal or dementia caused by a cerebrovascular accident). DSM claimed that these disorders were associated with “transient or permanent dysfunction of the brain” (APA, 1980, p. 101).

In the most recent edition, the approach is different. The term organic mental disorder has been expunged: “The term organic mental disorder is no longer used in DSM-IV because it incorrectly implies that ‘nonorganic’ mental disorders do not have a biological basis” (APA, 2000, p. 135). Little more is said, as if somehow science had established in the intervening twenty years the biological basis for all mental disorders. In one sentence, DSM appears to disguise ignorance of the etiology for what it calls mental disorders as firm knowledge of their biological basis. The false implication by the APA is that knowledge of some pathophysiology of neurological diseases such as Alzheimer’s or observations that some people become physiologically dependent on various drugs are analogous to knowledge of the “biological basis” of all mental disorders listed in DSM. We shall have more to say on this issue in this and the chapters that follow.

Who is counted as mentally ill depends entirely on the frequently changing list of disorders and an incredibly long checklist of behaviors, mood states or feelings, and verbalizations that DSM considers as symptoms of mental disorder. We and many other scholars (Carlat, 2010; Frances, 2010b; Greenberg, 2010; Horwitz, 2002, 2010; Horwitz & Wakefield, 2012) argue that there are now higher estimates of mental disorders in the United States because the APA keeps adding new disorders and more behaviors to the diagnostic manual. These are some of the new disorders that have been added to DSM since 1979: panic disorder, generalized anxiety disorder, posttraumatic stress disorder, social phobia, borderline personality disorder, substance use disorders, gender identity disorder, eating disorders, conduct disorder, oppositional defiant disorder, identity disorder, acute stress disorder, sleep disorders, nightmare disorder, rumination disorder, and sexual disorders, including inhibited sexual desire disorder, premature ejaculation disorder, male erectile disorder, and female sexual arousal disorder.

Each new and old disorder comes with a list of “diagnostic criteria,” the behavioral checklists that constitute the heart of DSM. As any reader would quickly recognize, these are not lists of strange, bizarre, or inexplicable behaviors. Rather, they appear to describe behavior and traits frequently self-reported or observed in everyday life. In the DSM, however, just about any behavior can qualify as a “symptom.” Here is a selection from hundreds of behaviors listed in the DSM as criteria of mental illnesses: restlessness, irritability, sleeping too much or too little, eating too much or too little, difficulty concentrating, increasing goal-directed activity, fear of social situations, feeling morose, indecisiveness, impulsivity, self-dramatization, using physical appearance to draw attention to self, being inappropriately sexually seductive or provocative, requiring excessive admiration, having a sense of entitlement, lacking empathy, being envious of others, arrogance, being afraid of being criticized in public, feeling personally inept, being afraid of rejection or disapproval, finding it hard to express disagreement, being excessively devoted to work and productivity, and being preoccupied with details, rules, and lists.

For children, signs of disorder occur when kids are deceitful, break rules, can’t sit still or wait in lines, have trouble with math, don’t pay attention to details, don’t listen, don’t do homework, lose their school assignments or pencils, or speak out of turn. Granted, one momentary feeling or behavior is not expected to qualify you officially for a DSM mental disorder; it requires small clusters of them, usually for several weeks, accompanied by some claimed serious discomfort to you or those around you. Nevertheless, these additional criteria are regularly ignored when there are perceived short-term benefits to be gained from diagnosing someone with a mental disorder.

These new illnesses and extensive lists of behaviors are part of the reason why almost everyone has by now noted that there appears to be an epidemic of mental disorders in the United States (Whitaker, 2010). But the manufactured epidemic is about to spread even further. The early draft of DSM-5 (February 2010) contains proposals to broaden the definition of mental disorder, lower the thresholds for many disorder categories, and create even more illnesses, though a few existing disorders are also slated for termination. We will have more to say about these matters later in this and the next chapter. Such expansion has its skeptics, who are suspicious of the motivations of the APA and the drug companies, which eye the expanding sweep of mental disorders the way a lumber company eyes redwood forests. But unlike typical challenges to the physical or political environment, in the mental health arena there are no legions of watchdogs challenging this medicalization of human foibles. The public’s acquiescence in accepting brain disease as an explanation for personal troubles is a puzzle that we will address later. For now, we should review how a small, largely ignored administrative codebook became a massive compendium of human troubles that reshaped psychiatry, mental health services, and the public culture. Although the psychiatric enterprise claimed that this transformation was based on science, we will see that, again, the scientific evidence is elusive.

A Short History of Psychiatric Diagnosis

During the middle of the twentieth century, American psychiatry was heavily influenced by psychoanalysis and paid little attention to official classifications of mental disorders. Diagnostic categories were of minor importance, useful more for administrative counting purposes, first in state asylums and later in clinics (e.g., see DSM-I and DSM-II). Diagnostic labels (such as Hysterical Personality or Depressive Neurosis, from DSM-II) were not emphasized in treatment, research, or training in any of the mental health professions. Diagnoses were uninteresting and banal, except as administrative shorthand or as a way to discuss the more interesting psychological dynamics in a patient’s biography and, most importantly, were never claimed to be scientifically valid.

In an ethnographic study of the training of psychiatrists, the anthropologist T. M. Luhrmann (2000) captures the mind-set of those learning to be psychodynamic psychotherapists. Such a psychiatrist

. . . learns to construct complex accounts of his patients’ lives. He thinks in terms of the way his patients are with other people and in terms of the emotions and unconscious motivations that lead his patients to hurt themselves. Here there is no clear-cut line between health and illness. What is wrong with a patient is that his interactions with other people go or have gone awry, and being a good psychiatrist involves understanding how and why. (p. 83)

By contrast, Luhrmann observed that the biomedical, post-DSM-III psychiatrist in training

. . . learns to memorize patterns and starts to use them in a rough-and-ready way. He learns to think in terms of disease and to see those diseases as quickly and as convincingly as a bird-watcher identifies different birds. For him, what is wrong with a patient is that the patient has a disease, and being a good psychiatrist involves seeing the patient in terms of the disease. For him there is a clear-cut difference between illness and health. (p. 83)

Until 1980 mental disorders were defined as whatever a committee of psychiatrists listed in the DSM. No definition of mental illness or mental disorder was offered in the manual, and few people were troubled by the absence. The public seemed content to leave the matter to the discretion of psychiatrists. Even without a definition and only vague descriptions in the manual, troubled people were referred to mental health providers, patients received diagnoses and were treated, students were recruited and trained to offer therapies, and research about madness continued to receive funding. In fact, as we describe in chapter 2, the size and scope of the psychiatric enterprise grew briskly following World War II without much clarity about what constituted mental illness or how to distinguish it from a vast array of other problems in living—or even whether it was necessary to make such distinctions.

Developing a definition of madness as mental illness/disorder or establishing its validity, when so little was known scientifically about it, was a conceptual and practical morass. As long as no one demanded clarity, ambiguity had advantages that precision of thought might lose. For example, without any scientific or conceptual guidance, the early editions of the DSM published in 1952 (DSM-I) and 1968 (DSM-II) were at liberty to list all manner of mental disorder categories, such as Inadequate Personality, Gross Stress Reaction, Group Delinquent Reaction of Childhood, and Homosexuality. The manuals were silent on what principles were followed in deciding to include these or any categories. Madness was whatever psychiatrists listed in DSM, whether a diagnosis of some type of neurosis, psychosis, affective or personality disorder, or one of the “transient situational disturbances,” most of which came with only several sentences of description. Notably, there was little public or professional concern regarding psychiatry’s singular authority over these matters.

Psychiatric diagnosis is a form of classification, an attempt to group human behaviors that may be disturbing or unwanted into categories of disease that appear to or actually share essential features (a conjunctive category, as discussed in chapter 2). Classification is essential for all science. There could be many bases of classification of diseases in medicine and psychiatry, but the most profound categorizations are grounded in an understanding of the causes of disease (their etiology), because such categories may suggest effective approaches to prevent or treat what is obviously undesirable. Developing a classification of mental disorders based on causes, however, has been impossible for several reasons. First, we don’t know what madness is, whether it exists as an entity, whether what are labeled as mental illnesses are “illnesses” in any medically valid sense, or whether the concept itself is a semantic trash bin for all that is behaviorally inexplicable (a disjunctive category, as also discussed in chapter 2). Second, one can’t discover causes for something one can’t define or identify. Third, one might classify according to the essential features or characteristics of the disorders, but there has been virtually no agreement, throughout history and even in the modern age, about what these essential features might be, as the changing diagnostic criteria of the DSMs over only the past thirty years testify. Nonetheless, as we have seen in chapter 2, putative classifications of madness have existed since ancient times (Zilboorg, 1941). During the twentieth century, two dominant approaches to diagnosis coexisted, each spawned by men born in 1856: Emil Kraepelin and Sigmund Freud.

Creating a Disease

Emil Kraepelin (1856–1926) was a German physician who developed the hypothetical construct of dementia praecox, from which the construct of schizophrenia is derived. Kraepelin postulated the existence of dementia praecox as some kind of metabolic disorder as he went about studying patients in his asylum who he thought suffered from the disorder (Boyle, 2002, pp. 3ff). The asylum inmates were a heterogeneous population of imprisoned people with a variety of problems who exhibited a diversity of behaviors. Kraepelin’s general approach was to observe the inmates’ behaviors to find similarities in the way the behaviors changed over time. He wanted to study the whole course of the presumed disease in order to discover distinctive patterns, natural groupings, or entities. He believed that the behavioral patterns he hoped to find would later be discovered to have distinctive biological antecedents and cerebral pathology, as well as a similar onset, course, and outcome (Boyle, 2002, p. 11).² In this particular ambition, he failed. But what he succeeded in accomplishing was the creation of an illness label, dementia praecox, which he and other interested physicians presumed to represent a discrete disorder with some underlying biological cause, modeled in part on another presumed mental illness, general paresis, which later was found to be a late stage of the brain disease neurosyphilis. His observations and descriptions of those he assumed had dementia praecox did not constitute scientific evidence that dementia praecox existed as a discrete entity. Its existence was presumed a priori (Boyle, 2002, p. 49). This tactic has been used with all subsequent psychiatric diagnoses, none of which were grounded in any convincing evidence of ontological reality.

Kraepelin’s approach allowed a fundamental confusion to occur, blurring the difference between, on the one hand, establishing scientifically the existence of a disease by noting within the general population a previously unobserved and non-randomly occurring discrete grouping of physical signs (a syndrome), and on the other hand, assuming the existence of a disease and then describing various odd and disturbing social behaviors as symptomatic of this construction. This confusion provided a platform in the 1970s for the expansion of psychiatric diagnosis. As we will see, Kraepelin (and those later referred to as neo-Kraepelinians) labeled selected troublesome or unusual behaviors the symptoms of hypothetical diseases and then took these symptoms as evidence of the existence of the disease, leaving for later the discovery of the hypothesized causative connections between these “symptoms” and the disease. Making these huge leaps possible is the expectation that at some future point the physical evidence of these diseases will be known, as occurred in the past when new technologies or methods, such as autopsies, provided ways of peering into the body itself and uncovered evidence for physical pathology or brain lesions. Indeed, throughout the twentieth century, new technologies were being developed. The discovery of microorganisms by Pasteur, Koch, and others led to the delineation of many clinical diseases and their prevention. Later, with X-rays, biochemical tests, and diagnostic radiology, the biological confirmation of the correlations between pathological processes and clinical symptoms could be established. Specific biological tests, such as the Wassermann Test for syphilis, were then developed as the definitive methods of documenting internal pathology (Klerman, 1986, pp. 11–12).

Kraepelin’s approach to classification was through detailed description of the behaviors constituting presumed diseases, the existence of which was left for future medical researchers to document scientifically. His process was to assume that some particular disease existed, identify people who presumably had the disease, then describe their unusual or problematic behaviors and their evolution. We will discuss the contemporary counterpart of this approach shortly.

Making Madness Normal

A contemporary of Kraepelin, Sigmund Freud (1856–1939) was a neurologist also interested in bizarre behaviors. Unlike Kraepelin, however, Freud would become a household name and cultural icon by the middle of the twentieth century. His work influenced not only psychiatric theory and the mental health professions but also culture, art, and literature around the world, whereas Kraepelin remained known only to a few psychiatric scholars.

Freud’s clinical work, unlike Kraepelin’s, was conducted not among patients in asylums but among individual outpatients with less severe problems, in the sense that these problems rarely brought them into open conflicts with authorities. More important, Freud’s adult patients could hire and pay him directly and be free to reject his diagnoses or advice if they so wished, something none of Kraepelin’s patients/inmates could possibly do. Freud had little interest in classification or the detailed description of symptom patterns. His studies were intensive, in-depth, conversational explorations of what Freud speculated were the intrapsychic worlds of patients. By means of his case studies and conversations, which he later renamed psychoanalysis, Freud explored the presumed psychological and cultural dynamics of human behavior, in particular how hypothesized instinctive, biological urges of humans interacted with the demands of civilization to shape human character. Whereas Kraepelin was a describer and classifier of bizarre, stigmatized, and unwanted behaviors, Freud was a theoretician of the human condition itself. Whereas Kraepelin was interested in the madness and degenerative brain diseases affecting people sent to insane asylums, Freud attempted to explain the psychodynamics of everyday life and civilization.

Freud asserted that curious, strange, or seemingly harmless behavior was potentially an expression of psychopathology. Even though his theories of mind and psychopathology have fallen out of fashion in most countries, a devoted core of his followers, with modified psychodynamic approaches, remains and in certain places thrives, and his work continues to have a lasting impact on psychiatric diagnosis. There are three main elements to his contemporary influence.

First was his ability to make us suspicious about the meaning of behavior. Nothing was what it seemed. Ostensibly loving behaviors could mask aggression; fastidiousness could cover impending disorderliness; a slip of the tongue could harbor clues to unconscious, perverted desires. Using the invented notions of unconscious desires and psychic conflicts; the tug-of-war of id, ego, and superego; and the control functions of defense mechanisms, he argued that what we consciously espouse and how we behave is unconsciously connected to the same psychological dynamics that cause more obvious and severe problems. Psychopathology, he suggested, could be lying just beneath the surface of our most mundane behaviors and feelings. Every behavior might be a possible symptom of latent mental illness.

The second element of his contemporary influence is related to the first. If any behavior might be a symptom, then potentially everyone might be a little mad. In suggesting that psychopathology was part of the bargain of being human, Freud, in Civilization and Its Discontents, explained madness as an expected and inevitable part of life. The psychic energy and unconscious conflicts that were required to repress our most primitive urges so that human society was possible, he conjectured, affected our cognitive and emotional lives in ways that could produce anxieties, emotional turmoil, and troublesome or inexplicable behaviors. Resolving these dysfunctional conflicts and subduing the troubling symptoms, Freud and his followers proposed, required in-depth psychoanalysis. The popularity of psychoanalysis grew into a burgeoning potpourri of therapies by the 1970s, not only for the “worried well” and middle class, but for those exhibiting bizarre behaviors. Although the evidence for the “effectiveness” of these therapies or their comparative advantages remained scientifically thin, seeing common behaviors as psychopathology grew in popularity. (In chapter 6 we will see that this is similar to the later popularity of drugs: abundant but thin scientific evidence and growing popularity.)

The third way in which Freud transformed how we viewed disturbing behavior was to further medicalize it. As a neurologist, Freud relied on the language and methods of medicine to understand and describe human behaviors. The medical language of “psychopathology” largely supplanted other forms of discourse about human troubles (e.g., the moral, the spiritual, the legal). Undesirable behaviors, such as excessive elation, sexual interest in children, aggressiveness, or heavy drinking, instead of being viewed as unwise, immoral, criminal, or sinful (the product of poor upbringing, weak character, or willful deviance), were transformed into expressions of illnesses of the psyche (or the more familiar reification, mind). Using the idiom of medicine, Freud may have surmised, allowed for seemingly simple explanations that would deeply resonate with people. Conveniently, Freud offered a remedy: his own style of treatment. Patients had illnesses which were not entirely controllable on their own. Those afflicted were not fully responsible for their actions, which were based on activity in the “unconscious.” Their behaviors were not willed, and they were no more culpable for them than a person is for contracting pneumonia or polio. People with troubles should seek treatment. Freud’s later insight (Freud, 1969, p. 92), that the work of the psychoanalyst was like that of “secular pastoral work,” has often been repeated but was drowned by the more powerful mythic image of the physician manipulating unconscious material and restoring the helpless patient to healthy functioning.

In different ways Kraepelin and Freud prepared American culture for the great transformation in psychiatric diagnoses that began in the 1970s and fully bloomed after the publication of DSM-III in 1980. What they both contributed was a way to classify hypothesized mental disorders by minute descriptions of the behaviors of those thought to be afflicted, a large list of labels for presumed illnesses, the use of medical and medical-sounding language to describe hypothesized groupings of common behaviors as psychopathology, and the notion that countless forms of psychopathology—mental illnesses—were ubiquitous among the populace, lying beneath the surface and requiring medical detection and treatment.

The Vulnerabilities of Mid-Century Psychiatry

By the mid-1950s, with the help of Kraepelin and Freud, American psychiatry had acquired a vast territory of behaviors as its appropriate domain. By 1960 mental disorders had entered the mainstream of American life: they fell under the jurisdiction of the newly established National Institute of Mental Health, were the subject of a major report (Joint Commission, 1961) by a presidential commission appointed by Dwight Eisenhower, and were the focus of President Kennedy’s initiatives to establish community mental health centers across the nation. There were troubling currents, however, within this recognition and expansion of psychiatry.

American psychiatry was always a peculiar profession, whose members initially were the managers of state insane asylums. Within medicine, psychiatry was situated at the bottom of the totem pole of medical specialties. For the general US public, which typically venerates physicians and accords them power and status much higher than physicians have enjoyed in other industrial countries, psychiatrists—shrinks—were frequently the butt of jokes and were depicted in movies and television as impish inquisitors quick to make wild and seemingly nutty inferences about the underlying motivations of everyday troubles but also willing to use coercion and brutality (see, for example, the 1948 movie, The Snake Pit). This occasional disparagement reflected an ambivalent acceptance of the expanded domain of psychiatry and psychoanalysis and recognition of the overt coercive functions of psychiatrists.

Even Freud’s success in influencing American psychiatry was a two-edged sword. On the one hand, problems labeled as mental illness gained more national attention. On the other hand, the methods of treatment proposed by psychoanalysts looked like one-sided, unstructured, nonmedical conversations. For example, psychoanalytic treatment involved no physical examination, no stethoscope or other medical instruments, no probes of the body or laboratory tests to determine the nature of the ailment or its disappearance. Instead, the therapy required that the patient merely lie on a couch and talk to the analyst, who would occasionally make a comment or ask a question. The invention of this sort of “talking therapy” was a creative attempt to reframe interpersonal help giving. Talking to others about personal problems, however, was hardly novel. It was a universal behavior used by troubled people in soliciting help from family, friends, clergy, and counselors of all kinds. As we described in chapter 2, psychoanalysis and its psychotherapy offshoots had the challenge of reinventing talking and discussion as an instrument of medical treatment.

Although Freud was a physician, offering talking therapy did not require the therapist to have the skills of medical doctors, and even Freud, as we mentioned, thought that it was akin to pastoral counseling. The expansion of outpatient, office-based psychotherapy made psychiatric treatment more accessible for many people, but it also attracted the involvement of other non-physicians as purveyors of mental health treatment. Starting with funding to World War II veterans from the GI bill, clinically oriented psychologists, social workers, and various counselors swarmed to the growing enterprise, far outnumbering psychiatrists (Goleman, 1990). The threat was not so much that other professions competed for scarce patients, but that these competitors brought with them various nonmedical interventions that they systematized as therapies, such as family therapy, gestalt therapy, milieu therapy, reality therapy, cognitive-behavioral therapy, and many, many others. These new forms of interpersonal helping, often used by psychiatrists as well, further de-medicalized therapy, showing that mental disorders and their treatment need not be the unique province of medicine or psychiatry. Could not a social worker as well as a psychiatrist help a family with an unruly child or a substance-abusing spouse? What, if any, unique role did psychiatrists play with regard to troubled clients who were not seen to need modification by drugs or electroshock? How were medical credentials and training relevant to the new, popular interventions?

By the 1960s psychiatry had succeeded in expanding its domain and acquiring new federal recognition. With the increased political salience, however, came increased scrutiny of its scientific grounding, effectiveness, and social purposes. At the historic moment of psychiatry’s broader influence and institutional expansion, the essential core of the psychiatric enterprise came under attack, producing a broad crisis for the profession. The center of this crisis revolved around the seemingly simple, internal matter of the definition and diagnosis of mental disorder.

The Perfect Psychiatric Storm

Despite its greater prominence by 1960, psychiatry was still less prestigious than other fields of medicine, in which there were scientific discoveries of the causes of common diseases and breakthroughs in the prevention and effective treatment of many of them, such as the development of antibiotics and the near eradication of polio with a vaccine. In psychiatry, there were few solid examples of scientific “progress” or breakthrough discoveries, and, despite its political success, psychiatry remained a low-status branch of medicine. Furthermore, with the movement away from institutional care in state hospitals and toward community treatment (see chapter 3), psychiatry’s identity was blending into the amorphous, somewhat undifferentiated mental health professions.

In part, these shortcomings of psychiatry stemmed from the fact that American psychiatry never needed to have a strong scientific infrastructure. Most psychiatric residents were trained to be therapists rather than scholars and researchers. Because Freud and his many adherents relied on case studies of individual patients, there was no well-established public health tradition of either epidemiological studies of the incidence and prevalence of mental disorders in the broad populations or of comparative clinical trials—carefully controlled experiments of the outcomes of different treatments.

At the same time, the massive discharge of residents of state asylums and the broad promises and claims about the wonders of community treatment (see chapter 3 for details) produced visibility that wasn’t sought. For example, by the early 1970s there were constant news stories about the emergence of the visible homeless, many of whom might have resided in asylums in prior decades. In addition, doubts were raised about the clinical effectiveness or practicality of psychoanalytic therapy (Eysenck, 1952). But profound new challenges were brewing. The challenges came from many quarters and shook the foundations of psychiatry, particularly its scientific credibility, and created the perfect psychiatric storm. The questions had been around the halls of academia for decades but had generally been ignored. What exactly was mental illness? Why was it considered to lie within the jurisdiction of medicine? What evidence existed that psychiatrists could distinguish the mentally ill from the sane in ways that ordinary people could not? Paradoxically, Freud’s legacy had so broadened the meaning of madness that psychiatry’s primary construct had become even more ambiguous. (Although in America psychoanalysis had initially coupled with psychiatry, requiring psychoanalysts to possess a medical degree and usually a psychiatric specialty, in most of the rest of the world, professionals of all stripes and even laypersons could call themselves psychoanalysts after a course of study and personal analysis.) Within psychiatry itself, some mainstream figures recognized that the profession, on the verge of becoming more nationally prominent, was vulnerable in terms of its ability to define, identify, and diagnose mental illness (Kendell, 1975; Zigler & Phillips, 1961). Psychiatry’s vulnerability was revealed sharply by a number of frontal assaults within a decade.

Assaults from the Academy

The physician and psychoanalyst Thomas Szasz, who became the most persistent and prolific critic of institutional psychiatry, argued vigorously that what psychiatrists were calling mental illnesses had no underlying physiological dysfunctions but were personally or socially unwanted, unpleasant, devalued behaviors (Szasz, 1961). Mental illnesses, Szasz argued, were just the latest nineteenth- and twentieth-century bogus claims by psychiatrists for defining as “medical” problems what were, in reality, social, economic, ethical, and moral problems that all individuals face in the course of fashioning their existence and attempting to assert their autonomy. He asserted that mental illness is a “myth,” in the sense that it is only a metaphor, an inappropriate analogy for describing human problems in living as if they constituted medical diseases, but he was not denying that these problems existed. This was a frontal attack on the Kraepelinian ambition. Szasz’s publications quickly became some of the most widely read and cited critiques of psychiatry. His attack hit a nerve among psychiatrists, because no consensus existed on a definition of mental illness, and no physical evidence existed for classifying problems in living as bodily diseases requiring the attention of physicians. Although Szasz was quickly marginalized by the psychiatric establishment, his arguments were not easily rebutted and, partly because of his prodigious output and his longevity (1920–2012), still persist in the vigorous contemporary debates about psychiatric diagnosis, as they do in this book.

Social and behavioral scientists of many stripes joined the assault. Sociologists who had an interest in social responses to deviant behavior pushed the criticism of psychiatry further. Erving Goffman, with his collection of illuminating essays, Asylums: Essays on the Social Situation of Mental Patients and Other Inmates, was one of the first to cast a skeptical eye on psychiatric practices (Goffman, 1961). State hospitals were categorized, not with other medical institutions, but with “total institutions,” such as prisons, monasteries, and battleships, in which near total control over every aspect of the lives of inmates transforms their identities. Within these institutions, patients had “moral careers” and had to devise ways of coping as they contended with the “daily round of petty contingencies to which they were subject” (p. x). Psychiatrists were part of the “tinkering trades,” tinkering with hospitalized patients and their problems. As his language makes clear, Goffman debunked psychiatric practice, placing it not in the realm of science and medicine but squarely among agencies of coercive social control in charge of managing unsightly, unpleasant, and unwanted behavior.

This was a consistent theme among many other sociologists at the time, who discussed psychiatric treatment squarely as another form of social control, albeit under the guise of treatment, and theorized about its possible harms to people who might get labeled as mentally ill, including stigma and social rejection. One particularly provocative and widely read sociologist, Thomas Scheff (1966), proposed that mental illness was a form of “residual deviance,” a mere collection of unusual behaviors that fit into no other social categories. Further, he suggested that being labeled as mentally ill could set in place a chain of social dynamics that could produce the very problems that psychiatrists were trying to treat. Scheff took the heart of psychiatrists’ domain—the construct of mental illness—and insisted that it really served as a mask for society’s ignorance about the nature of these unusual behaviors. Psychiatric diagnoses were only fancy labels for describing behaviors that baffled the public and mental health professionals.

Such disconcerting critiques of psychiatry came from other academic quarters as well. Foucault’s Madness and Civilization (1965), Ivan Illich’s Medical Nemesis (1975), and other popular works offered powerful critiques of psychiatry and medicine applied to the social world. R. D. Laing in The Politics of Experience (1967) turned madness on its head by suggesting that schizophrenia was an adaptive response to a disordered society. One young psychiatrist argued that psychiatry and the medical model it had adopted were dying and deserved a good Irish wake (Torrey, 1974).³ These cultural critiques of psychiatry coincided with criticism that came from behavioral psychologists (Eysenck, 1952; Ullmann & Krasner, 1969), who argued that some of the “truths” of psychoanalytic theory (still at that time the main adopted paradigm of academic psychiatry) were bogus and that psychiatric practice appeared to be therapeutically ineffective.

Overall, these critiques targeted the heart of psychiatry. They suggested that psychiatry’s core concepts were myths, that psychiatry’s relationship to medical science had only historical connections, that psychiatry was more aptly characterized as a vast system of coercive social management, and that its paradigmatic practice methods (the talking cure and psychiatric confinement) were ineffective or worse.

For a decade or so, these criticisms stayed confined to academic debates among scholars critically sifting through evidence and arguments in the struggle to assert their views or find the truth. As serious as the conflicts were and as important to the psychiatric enterprise, they were generally ignored or treated as nihilistic. Unfortunately for the American Psychiatric Association, these debates broke into public view in ways that were entirely unexpected and anxiety-provoking to the psychiatric establishment. Two seemingly unrelated challenges to psychiatry erupted within a few years of each other, which opened a Pandora’s box of vulnerability.

Public Embarrassment

The first challenge began in June 1969 with a police raid on the Stonewall Bar in New York City’s Greenwich Village, a gathering spot for homosexuals. The raid incited a riot in which the gay community fought back, which symbolically marked the beginning of a new, more assertive phase in the struggle for gay rights. Psychiatry was not the target of the gay community’s wrath, although for some years gay activists had been raising questions with the medical and psychiatric associations about the psychoanalytic interpretations of homosexuality as a form of psychopathology (Bayer, 1981; Bayer & Spitzer, 1982). But within a year of the Stonewall Riot, the gay activists began confronting the American Psychiatric Association openly at its annual meetings, demanding that the APA remove homosexuality as a mental disorder in DSM. Most embarrassingly, their challenges were staged to attract media attention, which they did with smashing success. The novelty of the challenge was not lost on the media or the public. At a time when most gays were still in the closet, here was an activist gay group not simply revealing their homosexual identities publicly but challenging the psychiatric profession on scientific grounds. An entire group of people labeled as mentally ill by the American Psychiatric Association was disputing its psychiatric diagnosis. At the core of their challenge was a simple, easy-to-understand question: why was homosexuality a mental illness?

The question revived a serious and complex problem for the APA. What is a mental disorder? What are the criteria used to diagnose it? Are these criteria social and moral or medical in nature? What is the rationale for including some deviant behaviors and excluding others as illnesses in the official manual, the DSM? The gay community asked publicly and forcefully whether American psychiatry knew the difference between a mental illness and normality or social deviance. It was a question directed at the scientific integrity of psychiatry.

The unwanted media coverage demonstrated that the APA wasn’t able to manage public relations nearly as well as the gay activists (Kutchins & Kirk, 1997). The APA and the DSM provided no immediate or persuasive answers. The second edition of the DSM (DSM-II), which was the ostensible target of the dispute, provided no guidance about why homosexuality was listed as a disorder. What was presumably a question that psychiatry had answered on the basis of psychiatric science was unmasked as a question of social values and political negotiations. After several years of embarrassing turmoil and fumbling by the APA, the resolution of the issue came not from science or new research, but from a vote of the members of the APA to drop homosexuality as a mental disorder in DSM. While this succeeded in ending the immediate controversy with the gay activists, dropping the diagnosis appeared to confirm that the constructions and the definitions of mental disorders were arbitrary matters—of personal opinions, group politics, and special interests in the APA—rather than matters of medicine, science, and evidence.

As with all political disputes among contending groups, there was behind-the-scenes scheming, negotiation, and compromise. In this case, a little-known psychiatrist from New York, Robert Spitzer, who initially believed that homosexuality was a mental illness, stepped forward as a mediator to manage the conflict about whether the diagnosis should be retained in DSM. Over a period of several years beginning in 1972, Spitzer inserted himself into the swirling controversy, meeting with gay activists, APA leaders, and psychiatrists on both sides of the battle on removing homosexuality from the DSM. He organized discussion sessions at psychiatric meetings and drafted compromise statements in an attempt to find a presentable rationale for removing the diagnosis from the DSM. In the end, he and others politically engineered the settlement to exclude homosexuality as a mental illness (Bayer, 1981; Kutchins & Kirk, 1997). Although the conflict did not disappear altogether for many years, it did bring an end to embarrassing publicity over this dispute. More importantly for Spitzer, he debuted in a role as master of diagnostic disputes, a role that he would reenact many times in the future.

The second eruption came on the heels of the dispute about homosexuality. It was initiated by a cleverly titled article, “Being Sane in Insane Places,” that appeared in the world’s most prestigious scientific journal, Science (Rosenhan, 1973). The article reported a study by Stanford University psychologist David Rosenhan in which he conspired with colleagues and graduate students to get them admitted to several psychiatric hospitals, without the hospitals knowing about the study. The pseudo-patients posed as having a minor symptom (hearing a “thud” sound or the word empty) at the emergency room. All were diagnosed as schizophrenic (one as manic-depressive) and hospitalized. All were prescribed medications. During their hospitalizations, the staff did not recognize them as sane or as inappropriately hospitalized, and pathologized many of their ordinary behaviors, such as asking staff when they would be released or writing their field notes. When discharged, the pseudo-patients were diagnosed with “schizophrenia in remission.” Because it was an intriguing, easy-to-understand study and the findings were so striking, it was widely cited. The study reinforced the view that psychiatric judgments were not merely inadequate but almost laughable. Once again, the target of the joke was the scientific pretense of psychiatric diagnosis: psychiatrists could not distinguish the sane from the insane (or, at the least, the study demonstrated that it was easy to fake mental illness). Published in Science, the study’s challenge could not be ignored. And it wasn’t.

Two years after the Rosenhan study, a forceful rebuttal and defense of diagnosis appeared in a prominent psychology journal, the Journal of Abnormal Psychology, written by Robert Spitzer (1975), who himself had become concerned about the scientific basis of psychiatric diagnosis. By 1975, however, Spitzer was assuming a leadership role in revising the DSM for the American Psychiatric Association. He used the controversy created by Rosenhan’s article not only to defend psychiatry but also to advance his own, as yet still private, agenda for reforming the diagnostic manual.

Rise of an Entrepreneur of Controversy

Robert Spitzer was a most unlikely rescuer of American psychiatry. His contribution to psychiatric history in the late twentieth century was not for developing some innovative theory of mental disorder, a new therapeutic approach, influential teaching, or significant scientific research. Nor was it established by his personal pedigree. His place in psychiatric history came about partially by chance and very much because of his bureaucratic persistence, single-mindedness, intelligence, and robust skills as a political actor. His major achievement, as first glimpsed in the homosexuality and Rosenhan disputes, was as an entrepreneur of diagnostic controversies. The achievement reached its height as he created and managed a daunting series of political compromises between 1975 and 1980 that resulted in the landmark third edition of the diagnostic manual, the modern DSM (APA, 1980), which established the character of DSM for over thirty years into our day. He achieved this feat by working tirelessly on the technical classification of mental disorders, a task peripheral to psychiatric practitioners and largely ignored in academic psychiatry.

In 1966, Spitzer, who had trained as a psychoanalyst, became associated with this obscure codebook almost by accident in the Columbia University cafeteria while chatting with a colleague, Ernest Gruenberg, who was chairing the small group appointed to update the first 1952 edition of DSM (APA, 1952). Gruenberg invited Spitzer, who had no particular expertise regarding diagnosis, to be notetaker on the DSM-II committee. He did well in that role, and, when DSM-II was published in 1968, he was listed as a consultant. He also was called on to author an article describing the new edition to colleagues (Spitzer & Wilson, 1968, 1969). The second edition was infused with psychoanalytic terms and assumptions, reflecting the theoretical orientation of the leaders of the APA. This updating for the second edition of the diagnostic manual received very little notice in psychiatry and almost no publicity. As we have mentioned, classification of mental illness was not viewed as a vital or interesting topic.

But within a few years of the publication of DSM-II (1968), with the growing criticism of psychiatry in academic circles, the successful challenge to drop the homosexuality diagnosis, and the Rosenhan article in Science, psychiatric diagnosis would unexpectedly become ground zero for a major struggle within psychiatry. Spitzer, who had defended the APA and the DSM against the challenges by the gay community and Rosenhan, had made himself the APA’s house expert on diagnosis. When, in the 1970s, the APA decided it was time to revise the diagnostic manual, it asked Spitzer to serve as chair of the task force to produce the third edition.

At the time, psychoanalysis remained the reigning paradigm, although a re-emerging academic wing within psychiatry was emphasizing biological aspects of behavior and the growing use of psychotropic drugs. Spitzer, however, had become keenly interested in diagnosis and had been prepping for this assignment to revise DSM. He took on the task with gusto, taking strategic advantage of the fact that no one expected much from the work of this internal APA committee. As we will explain, the result of his work during the next five years made diagnosis singularly significant in psychiatry; reshaped training, research, and practice; and elevated the status of American psychiatry in ways that were completely unexpected. It also made the APA an unprecedented bundle of money. To many of his research colleagues, Spitzer became a psychiatric hero, “the most important psychiatrist of our time,” according to one of his colleagues (Frances, 2010a).

The Rhetoric of Reliability and the Modern DSM

Because past revisions of the diagnostic manual were considered of little significance, Spitzer was given a free hand in appointing members to the DSM-III task force and to the many committees that were used in revising the manual. Selecting members of the task force was critical to his efforts. He drew heavily on a relatively small group of like-minded research psychiatrists (and a few psychologists) who also had concerns about the weaknesses of the manual (Kirk & Kutchins, 1992; Millon, 1983, 1986; Spiegel, 2005). Under Spitzer’s firm leadership, within a year the task force had outlined a radically different architecture for the third edition of the DSM. In the following four years, they furnished this new edifice with thousands of new diagnostic details and procedures that would in time require political support for the transformation and the approval by various APA committees. Having recently emerged victorious from controversies surrounding homosexuality, Spitzer was undoubtedly aware of the struggles ahead as opponents of the revisions got wind of the radical changes that were being undertaken.

The Reliability Rationale

Since he had absolutely no mandate from the APA to engage in a major transformation of the DSM, Spitzer had to develop a compelling rationale. What was so wrong with DSM-II that required a total makeover? His answer, carefully developed over several years, was that the current classification system lacked “reliability” (Spitzer & Fleiss, 1974). This reliability problem had been a concern to only a few research-oriented psychiatrists and psychologists, such as the ones Spitzer had appointed to the task force. As mentioned previously, reliability refers to diagnostic consistency among clinicians: can psychiatrists using the DSM independently reach the same diagnoses for the same patients? If they can agree on the diagnoses of patients, there is diagnostic reliability. If there is substantial disagreement about the diagnoses for the same patients, the diagnostic system is said to be unreliable. Reliability is a minimal expectation for a science-based profession and is often easily achieved. Unreliability in diagnoses would impede research on the causes of disorders, on effective treatment for particular disorders, and on studies of the prevalence of disorders. Further, communication among clinicians about patients and their problems would be stymied, and the veracity of medical records and reimbursements for services for mental disorders would all be called into question. Psychiatry, which was expected to have responsibility for diagnosing and categorizing mental disorders, had the most to lose if its diagnostic system was considered unreliable.

It is important to realize, however, that reliability of diagnoses (consistency) was a simple problem compared to the specter of demonstrating the empirical existence of a putative category labeled mental illness, particularly since psychiatry’s weak and inadequate definitions of mental illness were the target of critics. In the 1970s, as currently, the meaning or validity of mental illness was in dispute.

All members of Spitzer’s task force understood the problem of having an unreliable DSM. Spitzer seized on this vulnerability and used it to gain crucial leverage in transforming the DSM. Nancy Andreasen, a prominent schizophrenia researcher and one of the initial members of Spitzer’s task force, reports that at the very first meeting of the task force, as participants were introducing themselves and indicating what changes they thought needed to be made, the problem of reliability was raised, and, by the end of the very first meeting, there was unanimous agreement to create a different kind of manual (Andreasen, 1984, p. 155).

Kappa: Measuring Unreliability

Spitzer had been issuing warnings about unreliability even before the task force convened and continued to do so until DSM-III was approved and published. One of his earliest warnings came in 1974 in the influential British Journal of Psychiatry (Spitzer & Fleiss, 1974). Along with his coauthor, the respected statistician Joseph Fleiss, Spitzer cautioned that any classification system, including DSM, had to be reliable. They emphasized that any unreliability in DSM would undermine its validity. Members of the task force, whom Spitzer had selected, needed little instruction about this problem; they were researchers and understood that to the extent that a classification system is unreliable, its validity is weakened. If people with the same troubles cannot be sorted into the same diagnostic categories, the validity of those categories is completely undermined. And validity is precisely the ingredient required for the scientific integrity of the concept of “mental illness.” It was the validity of mental illness that had been questioned by Szasz, by critics of psychiatry, by the gay activists, and by Rosenhan, among many others. Improving reliability, the task force knew, would prop up the pivotal professional construct of madness until its validity could be more securely established.

Surely, the task force knew that not all problems confronting psychiatry could be addressed even by a major revision of DSM. But any major renovation, like the one contemplated by Spitzer and his inner circle, had to have a persuasive scientific rationale that would counter the substantial psychoanalytic faction of the profession who would oppose the changes they would make in DSM. Improving diagnostic reliability became that scientific rationale and defense of their plans.

Selecting diagnostic reliability for leverage had several advantages. First, no one could argue with the basic scientific principle of improving reliability. Second, an emerging series of studies were suggesting that diagnostic reliability was problematic, which provided some independent support for defining unreliability as a crucial problem. Third, focusing on the consistent use of diagnoses appeared to be a far easier problem to resolve than the conceptual challenges from Szasz, the cultural critics, or the various social scientists questioning the validity of the construct of mental illness. Fourth, unreliability appeared to many researchers to be largely a technical problem—getting agreement on diagnoses among clinicians—that seemed to be within the expertise of those who composed the DSM task force.

Spitzer began a campaign to elevate awareness of the importance of reliability so that he could use it effectively as the primary justification for the renovation of DSM. Having gained respect within the APA by being both an insider and a defender of psychiatric diagnosis, his opinion about the perils of unreliability had credibility. In his earlier article in the British Journal of Psychiatry (Spitzer & Fleiss, 1974), he had reviewed seven available studies on diagnostic reliability and concluded that diagnostic reliability was troublingly low. The authors stressed that, without solving the unreliability problem, psychiatry was playing into the hands of the profession’s severest critics. Using the data in the reviewed studies, they claimed that actual psychiatric diagnosis was in terrible shape and threatened the validity of diagnosis. If this conclusion was not alarming enough, they suggested that in routine clinical settings, unlike the research settings of the reviewed studies, reliability was probably worse.

Spitzer and Fleiss used a statistic, called kappa, as a tool to summarize the seven reliability studies, which had been published from the 1950s to the early 1970s. Researchers determine a kappa score by having pairs of clinicians independently assess a series of people coming into a mental health facility. They then tally the extent to which the two clinicians make the same or different diagnoses for each client. Of course, as in many games, there will be some level of agreement just by chance. Kappa is a statistical computation that discounts mere agreement by chance (it is called a chance-corrected measure) and provides a resulting measure of how close these pairs of clinicians come to achieving perfect (i.e., 100 percent) diagnostic agreement. Perfect agreement—perfect reliability—would result in a kappa score of 1.0. Levels of agreement that are no better than chance would have kappas of 0.0. Kappa thus provides a method of measuring how much agreement is reached by clinicians and of comparing the results of different diagnostic studies with each other.
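The computation just described can be sketched in a few lines of Python. This is a minimal illustration of the standard two-rater form of the statistic (Cohen's kappa), offered on the assumption that it matches the description above; the function name `cohen_kappa`, the diagnostic labels, and the ten hypothetical patients are invented for the example and are not data from the studies Spitzer and Fleiss reviewed.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters judging the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)

    # Observed agreement: share of cases given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: probability that the raters would coincide if each
    # handed out labels independently at his or her own observed base rates.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_chance = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical diagnoses of ten patients by two clinicians (illustrative only).
clinician_1 = ["schizophrenia"] * 4 + ["depression"] * 4 + ["neurosis"] * 2
clinician_2 = (["schizophrenia"] * 3 + ["depression"] * 4 + ["neurosis"] * 3)

kappa = cohen_kappa(clinician_1, clinician_2)
print(round(kappa, 2))
```

In this made-up case the two clinicians agree on eight of ten patients, yet kappa comes out lower than 0.80, because roughly a third of their agreement would be expected by chance alone.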

Spitzer and Fleiss used this new tool to examine the level of reliability for different broad diagnostic (not the more specific) categories. This is similar to asking people to identify makes of automobiles using broad categories of automobile makers such as Ford, General Motors, or Toyota, rather than specific categories of cars, such as Mustang, Buick, or Camry. Using general categories will always produce higher agreement/reliability scores. For example, if two people looked at the same car and one said it was a Corvette and the other said it was an Impala, their responses would nonetheless be considered to indicate perfect agreement, since both cars are made by General Motors.
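The effect of scoring by broad category can be sketched with the automobile analogy. The mapping and the observations below are hypothetical, and the sketch reports raw percent agreement rather than kappa. Merging categories can only keep or raise raw agreement, since any pair that matched on the specific label still matches on the broad one.

```python
# Hypothetical mapping from specific models to broad makers (illustrative only).
MAKER = {
    "Corvette": "General Motors",
    "Impala": "General Motors",
    "Buick": "General Motors",
    "Mustang": "Ford",
    "Camry": "Toyota",
}

def raw_agreement(labels_a, labels_b):
    """Share of cases on which two observers gave the same label."""
    return sum(a == b for a, b in zip(labels_a, labels_b)) / len(labels_a)

# Two observers looking at the same four cars (made-up observations).
observer_1 = ["Corvette", "Mustang", "Camry", "Buick"]
observer_2 = ["Impala", "Mustang", "Camry", "Corvette"]

specific = raw_agreement(observer_1, observer_2)
broad = raw_agreement([MAKER[m] for m in observer_1],
                      [MAKER[m] for m in observer_2])
print(specific, broad)
```

Here the observers agree on only half of the specific models, yet they are scored as agreeing on every car once the models are collapsed into makers.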

In the Spitzer and Fleiss review and in most subsequent reliability studies, reliability scores are usually reported for general diagnostic categories. In this case, they reported that no general categories had kappa levels that were “good,” which appeared to mean scores somewhat above 0.75. Six kappa scores, averaging 0.74, were considered “only satisfactory,” 12 with an average of 0.56 were “no better than fair,” and the remaining 31, averaging 0.37, were described as “poor.” The overall mean for the nine general diagnostic categories was 0.53. Figure 4.1 presents a summary of the Spitzer and Fleiss data.

Figure 4.1
Interpretation of Kappa Reliability Levels: Early Studies (1950–1974)

*Data reported in Spitzer & Fleiss (1974), British Journal of Psychiatry.

These damning conclusions were published just as Spitzer was being asked to chair the DSM-III task force. His attack on the reliability of DSM-II gave him the leverage he needed in the anticipated struggle to develop a new approach to diagnosis. It was a risky tactic, unless Spitzer and others had reasons to think that they could remedy the reliability problem. In fact, several small groups of research psychiatrists at Washington University and Columbia University had been working for years on just that problem and were about ready to announce in a series of important journal articles that their new approach to diagnosis remedied the problem of unreliability (Kirk & Kutchins, 1992). This group of like-minded experts formed an “invisible college” that eventually became referred to as neo-Kraepelinians (Andreasen, 1984; Blashfield, 1982; Klerman, 1986; Millon, 1986). They worked on codifying psychiatric diagnosis by creating a system of describing symptomatic behaviors, as a means of grouping patients into homogeneous categories that would facilitate the conduct of research to identify how patients differed biologically from normal people. This was the fulfillment of Kraepelin’s agenda. The emphasis on improving the reliability of diagnosis, rather than on directly attempting to understand the causes and dynamics of disturbed behaviors (which Kraepelin assumed were at root distinct biological diseases), radically distinguished the neo-Kraepelinians from the professionally dominant psychoanalysts.

The neo-Kraepelinians had an explanation for why psychiatrists often couldn’t agree on diagnoses: the diagnostic process was unregulated, unsystematic, and messy. For example, there was no standard method of conducting a psychiatric interview to reach a diagnosis. Clinicians gathered information about a patient’s condition in disparate ways: they asked different questions, in different sequences, and with varying levels of probing. This resulted in different amounts and types of information from a patient. Clinicians also varied in how carefully they reviewed the patient’s existing psychiatric records and in whether they gathered information from other sources, such as family members, teachers, employers, or other physicians. Consequently, independent clinicians were likely to each have information about a patient that differed in quality, extensiveness, and amount of detail. To Spitzer and these research psychiatrists, such different interviewing practices explained why diagnoses might be so inconsistent, so unreliable.

But there was a more profound reason for unreliability. Even if clinicians obtained exactly the same information, there were no established criteria for determining whether the person’s behaviors constituted a mental illness, and if so, which type of illness. In short, there were no guiding criteria for making a diagnosis.

While these ambiguities were of great concern to Spitzer and the neo-Kraepelinians, they were not particularly troubling to psychotherapists, who viewed differences in clinical style as part of each therapist’s methods of establishing rapport with different clients in order to provide effective help. Therapists had their own individual ways of interviewing, of understanding the meaning of what they learned from clients, and their own sense of what the diagnosis might be. The challenge for the DSM-III task force was to figure out how they could change the diagnostic thinking and behavior of all mental health practitioners, a debate that continues still (Jaffe, 2010).

Creating Descriptive Diagnosis

The solution that Spitzer and the task force created would become known as descriptive diagnosis. For the first time, they developed specific behavioral criteria to use for each diagnosis. These diagnostic criteria would represent specific rules for determining if a mental disorder was present and, if so, which one. They expected, with justification, that this standardization would improve reliability. By establishing new criteria for each mental disorder, they expected that these criteria would provide a framework to structure the assessment and diagnostic interviewing process. Structured interview guides were nothing new to social scientists, who for decades had used structured questionnaires in surveys to obtain more complete information in a standard format from each respondent. With these two diagnostic innovations—diagnostic criteria and structured interviews—Spitzer fully expected to greatly improve the reliability of DSM, and with it, the scientific reputation of psychiatry, and to pave the way for the rise of biopsychiatry.⁴

Task Force Struggles

The process of constructing the radically new manual and building sufficient support for it was a five-year, full-time struggle for Spitzer. Few research psychiatrists would have had the creativity, energy, fortitude, and political skill to accomplish such a revision, and fewer still would have been willing to devote their entire professional life to such a complicated challenge.

Although prior revisions of the manual had been simple administrative updates of a minor pamphlet, this third revision by comparison was a monumental struggle. As Spitzer and the task force created additional working groups and began circulating drafts of the work-in-progress, controversies erupted in print and behind the scenes, within psychiatry and elsewhere, demanding Spitzer’s attention. For example, Spitzer offered a new definition of mental disorder, defined as a “subset of medical disorders,” which caused an eruption by clinical psychologists who perceived this as an unfounded jurisdictional expansion, sweeping all manner of behavioral problems into medicine. Psychoanalysts were frequently critics of Spitzer’s efforts, and they vigorously challenged the removal of the cherished term neurosis from the manual. Compromises had to be fashioned. Similarly, many groups with an interest in particular “disorders” lobbied the task force: gay activists monitored efforts to reinsert homosexuality as a disorder; black psychiatrists proposed that racism be included as a mental disorder; veterans pushed successfully for adding posttraumatic stress disorder; and so on. None of this whirl of interest groups and politicking had characterized prior revisions of DSM. All the while, Spitzer steadfastly held to his position that revising psychiatric diagnosis and DSM was a scientific enterprise and no longer an arbitrary exercise in stigma-producing labels.

The Unveiling of DSM-III

After five years and epic struggles, the third edition of the manual, DSM-III, was finally published. DSM-III (APA, 1980) contained features never seen in any previous edition. First among these was its sheer size: a whopping five hundred pages, more than three times the length of the 1968 second edition (DSM-II). It also sold for more than ten times the prior edition’s price. The second notable feature was the listing of hundreds of names of individual psychiatrists, not only of the task force members, but of the APA officers, and every member of seventeen advisory committees, many formed after most of the major structural decisions about the manual had been made. This filled the first five pages of the new manual, deliberately conveying the impression that the manual was the product of a broad array of psychiatrists, not the handiwork of a few. The listings were also a bid for legitimacy, as the renovation that was being published had been very controversial. The third novel feature was a notable twelve-page introduction to DSM-III, written by Spitzer, which covered the history of the manual, the background for the third edition, the process of its development, and its new features, which were many, including a new multi-axial system, expanded text for each disorder, and scores of new mental disorders.

For the first time in any edition of DSM, in his introduction Spitzer emphasized that this edition was based on data gathered in field trials:

“ . . . interest in the development of this manual is due to awareness that DSM-III reflects an increased commitment in our field to reliance on data as the basis for understanding mental disorders” (APA, 1980, p. 1).

This important claim is echoed several times in his introduction. In justifying the character of the new manual, he notes that other contemporary classifications (like the ICD-9) were not “sufficiently detailed for clinical and research use” and did “ . . . not make use of such recent major methodological developments as specified diagnostic criteria” (p. 2). He describes ten goals that the task force sought to achieve. The first was clinical usefulness, and the second was the reliability of diagnostic categories (p. 2). He acknowledges that even though the task force tried to rely on research evidence, task force members often differed in their interpretations of the findings (p. 3). No earlier edition of the manual had even invoked research or data. The great bulk of the new manual described the nearly three hundred specific mental disorders. Some were making their debut:

•Post-traumatic Stress Disorder, which was described as “symptoms following a psychologically traumatic event that is generally outside the range of usual human experience. The characteristic symptoms involve reexperiencing the traumatic events; numbing of responsiveness to, or reduced involvement with, the external world; and a variety of autonomic, dysphoric, or cognitive symptoms” (p. 236).

•Borderline Personality Disorder, which was described as a disorder “in which there is instability in a variety of areas, including interpersonal behavior, mood, and self-image. No single feature is invariably present” (p. 321).

And some old favorites (e.g., Inadequate Personality Disorder) were retired. The full description of Inadequate Personality Disorder in the previous edition, DSM-II, stated that it was

•a behavior pattern “characterized by ineffectual responses to emotional, social, intellectual and physical demands. While the patient seems neither physically nor mentally deficient, he does manifest inadaptability, ineptness, poor judgment, social instability, and lack of physical and emotional stamina” (p. 44).

All of the above examples are similarly vague and ambiguous. But the most significant difference between the old and new editions was that, for each disorder, the new DSM-III provided detailed lists of diagnostic criteria, borrowing heavily from the work of the St. Louis and New York groups. These criteria constituted the necessary and sufficient behaviors/signs/symptoms that must be observed or reported in order to use the diagnoses appropriately. They also constituted the technical strategy designed to bolster the impaired status of psychiatric diagnosis. Figure 4.2 uses a type of depression to illustrate some of the dramatic changes from DSM-II to DSM-III.

Figure 4.2
Contrast of DSM-II and DSM-III

There are several noteworthy features in this comparison. The description in DSM-II was only one sentence, contained no specific criteria, included concepts based on psychoanalytic notions (e.g., neurosis, internal conflict, love object), and revealed that depression was a reaction to something. None of this appears in DSM-III, which provides instead time frames and a list of supposedly more precise behaviors or feelings, with a minimum number required to make the diagnosis (at least three). This specificity was designed to aid clinicians in making diagnoses and to greatly improve consistency (i.e., reliability).

A Stunning Success?

The publication of the new manual was accompanied by the rhetoric of scientific victory. In the introduction to DSM-III and in other articles, Spitzer is explicit that the diagnostic criteria were to serve “ . . . as guides for making each diagnosis since such criteria enhance interjudge diagnostic reliability” (APA, 1980, p. 8). He announces that for the first time, drafts of DSM had been tested in reliability field trials, involving over 12,000 patients who were evaluated by approximately 550 clinicians in 212 different facilities (p. 5). He claims that the data from these field trials indicate “far greater reliability” than had previously been obtained with DSM-II (p. 5). One publication about the field trials stressed that reliability was encouraging, was much better than had been expected, and much higher than before (Spitzer, Forman, & Nee, 1979). In other articles, interviews, and presentations around the time of DSM-III’s publication, Spitzer and others repeated the claims of improved reliability (Spitzer, Williams, & Skodal, 1980; Talbot, 1980). By 1982 the reliability of DSM-III was said to be “extremely good” (Hyler, Williams, & Spitzer, 1982, p. 1276). At every turn, Spitzer and others used the prior reliability problems with diagnosis to promote the new manual as a great scientific advance. A few years later Spitzer boasted that the adoption of DSM-III had marked a signal achievement for psychiatry and represented an advance toward the fulfillment of the scientific aspirations of the profession and would eliminate the disarray surrounding psychiatric diagnosis (Bayer & Spitzer, 1985, p. 187).

Spitzer’s rise to a position of control over psychiatric diagnosis had been solidified by his use of the problem of unreliability. Improving reliability had been his quest; it served as the scientific linchpin for the thorough renovation of the manual. The unique inclusion in the manual of the reliability appendix was of singular symbolic value. The data tables in the appendix were used as evidence that the task force and Spitzer had delivered on their promise. It was this claim of greater reliability that appeared to justify the diagnostic revolution.

This was a momentous scientific and political claim. The euphoria was not Spitzer’s alone. It was infectious among those celebrating the DSM-III. Gerald Klerman, the highest-ranking psychiatrist in the federal government when DSM-III was published, gushed:

In my opinion, the development of DSM-III represents a fateful point in the history of the American psychiatric profession . . . [the adoption of the new manual] represents a significant reaffirmation on the part of American psychiatry to its medical identity and its commitment to scientific medicine. (Klerman, 1984, p. 539)

Similarly, psychiatrist Jerrold Maxmen claimed in The New Psychiatry that DSM-III marked “the ascendance of scientific psychiatry” and that more than any other single event it demonstrated that psychiatry “had indeed undergone a revolution” (Maxmen, 1985, p. 35). With regard to the core problem of unreliability, Klerman made it clear: “In principle, the problem of reliability has been solved” (Klerman, 1984, 1986, pp. 25, 541).

Even critics of DSM-III readily accepted the claims that reliability was no longer problematic (Andreasen, 1984; Carson, 1991; Michels, 1984; Vaillant, 1984). For thirty years, there has been no diminishing of the belief that the reliability problem had been greatly reduced, if not solved (Buckley, Michels, & MacKinnon, 2006). In 2002 those who would become the architects of DSM-5 continued to praise DSM-III, saying that “the major advantage of adopting [DSM-III] was its improved reliability over prior classification systems” (Kupfer, First, & Regier, 2002, p. xviii). In the same book, others reaffirm that “when DSM-III was published in 1980, one of its most important advantages was a radical improvement in the reliability of psychiatric diagnosis” (Rounsaville et al., 2002, p. 13). And in March 2011, Carol Bernstein, president of the American Psychiatric Association, praised DSM-III for addressing “the pressing problem of interrater reliability in psychiatric diagnosis” and claimed that DSM-III “contributed significantly to improved diagnostic agreement” (Bernstein, 2011).

The Mad Science of Diagnostic Reliability

All of these recent claims about DSM-III’s radical improvement in reliability were made without citing a single study or source of evidence. There was another problem with all these claims: reliability had not improved with DSM-III.

How could such misinformation be so widely believed and disseminated? Apparently, no one critically examined the field trial data or tried to compare them with the earlier reliability levels of DSM-II. If anyone had, the justification for the diagnostic revolution would have collapsed. What the data actually show undermines what has been claimed about reliability for over thirty years. We will briefly present these data in summary fashion (for an earlier detailed analysis, see Kirk and Kutchins, 1992). Four original published sources report the results of different aspects of the DSM-III field trials (APA, 1980; Hyler et al., 1982; Spitzer & Forman, 1979; Spitzer et al., 1979). In addition, there is one unpublished summary (Williams, 1982) regarding personality disorders, which was found in a letter in the APA’s DSM archives in Washington, DC. From the data reported in these sources, all of them developed by the DSM task force leaders, we compare the earlier, pre-DSM-III studies of reliability with the reliability of DSM-III.

The field trial data were collected in 1978–1979 and originally reported in two phases, first in an earlier report and then in an appendix to DSM-III. Within each phase, the results were organized separately for adults and children and for the major diagnostic categories of mental disorder (axis I) and the personality disorders (axis II). In summarizing these data, we retain the distinctions between adults and children, and between the major disorder categories and the personality disorders, but for simplicity we will combine the data from the two phases. Figure 4.3 consists of a bar graph of the average (mean) kappas (recall the earlier description of this statistic in which higher kappa scores represent better agreement, i.e., higher reliability) from the field trials for the diagnoses for adults. On the far left are the interpretative standards developed by Spitzer and Fleiss for the earlier reliability studies that we described previously (from figure 4.1). On the right are the DSM-III field trial data and a subsequent reliability study of DSM conducted several years later.

Several cautions should be raised about these data. First, these reliability studies were undertaken while the manual was being developed. The reports of the design, implementation, and findings were at times inconsistent and ambiguous. Second, aggregating data from many different clinical sites around the country, with little oversight or monitoring, creates various methodological problems (e.g., differences in base rates of disorders at different sites) that complicate the interpretation of the reliability statistics (kappa). Third, as mentioned before, the kappa measures of reliability in the field trials are based on the more forgiving general diagnostic categories, not on the specific diagnoses made by clinicians. For example, there are twelve types of personality disorder (e.g., antisocial, narcissistic, compulsive, and so forth). Even if two clinicians disagreed on which type of personality disorder a patient had, they still would have been scored as having perfect (kappa 1.0) agreement if they agreed that the patient had some type of personality disorder. Kappas based on the general class of disorder are liberal interpretations of reliability and are always much higher than the level of agreement with specific diagnoses. For all these reasons, it is prudent to view all the reliability measures as crude overestimates using crude data. Accordingly, we will use some ranges of average kappas calculated in different reasonable ways to describe reliability levels.
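The base-rate caution can be illustrated with a short Python sketch. The two sites and their counts are hypothetical, chosen only so that both show the same 90 percent raw agreement on whether a single disorder is present; the function name `kappa_from_2x2` is ours.

```python
def kappa_from_2x2(both_yes, a_only, b_only, both_no):
    """Cohen's kappa for two raters judging one disorder as present or absent."""
    n = both_yes + a_only + b_only + both_no
    p_observed = (both_yes + both_no) / n
    # Each rater's own rate of saying "present".
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    # Agreement expected if the two raters judged independently at those rates.
    p_chance = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical site where the disorder is common (about half of 100 patients).
common_site = kappa_from_2x2(both_yes=45, a_only=5, b_only=5, both_no=45)
# Hypothetical site where the disorder is rare (about one in ten of 100 patients).
rare_site = kappa_from_2x2(both_yes=5, a_only=5, b_only=5, both_no=85)
print(round(common_site, 2), round(rare_site, 2))
```

With identical raw agreement, kappa comes out near 0.80 at the first site and near 0.44 at the second, because agreeing that a rare disorder is absent is largely what chance alone would produce. Pooling sites with different base rates therefore blurs what an overall kappa means.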

With that in mind, we now turn to figure 4.3, which contains the summaries of the reliability of adult diagnoses. For the major mental disorder categories, in both phases of the field trials, depending on how you calculate, the overall average kappa ranged from 0.55 to 0.72. This range of average scores comes from three different methods of calculating overall means. First, there are the overall kappas reported for the field trials (0.68 and 0.72 for the two phases). The method used to calculate these overall kappas is not reported, although each appears to be a kappa weighted by the number of patients in each category.⁵

Figure 4.3
Comparison of Reliability Levels (Kappa) for Early Studies and DSM-III

*DSM-III field trial data for the major diagnostic categories and personality disorders are taken from appendix F in DSM-III (APA, 1980) and from a letter in the APA archives (Williams, 1982). Data for the Written Vignette Study come from the Archives of General Psychiatry (Hyler, Williams & Spitzer, 1982). Data for the DSM-III-R Multi-Site Study come from the Archives of General Psychiatry (Williams et al., 1992).

A second method is to calculate kappa for all the major categories without weighting by simply adding their kappas and dividing by the number of total categories (0.63 and 0.55 for each phase). In addition, since many of the major categories had very few patients, the kappas are quite unstable estimates. So a third method was to recalculate the average kappas for categories that had at least six patients in each phase (0.59 and 0.66). These various kappas are in the range of kappas presented in figure 4.3 for major diagnostic categories for the field trials (from a low of 0.55 to a high of 0.72). The dark vertical blocks in figure 4.3 provide the general range of kappas from all the available sources, calculated in these different ways.
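The three ways of averaging can be sketched as follows. The category names, kappas, and patient counts are invented for illustration and are not the field trial data; the six-patient cutoff follows the text, and the function names are ours.

```python
# Hypothetical (category, kappa, number of patients) rows; NOT the field trial data.
rows = [
    ("category A", 0.80, 120),
    ("category B", 0.60, 40),
    ("category C", 0.20, 3),
    ("category D", 0.40, 10),
]

def weighted_mean(rows):
    """Method 1: each category's kappa counts in proportion to its patients."""
    total = sum(n for _, _, n in rows)
    return sum(k * n for _, k, n in rows) / total

def unweighted_mean(rows):
    """Method 2: add the kappas and divide by the number of categories."""
    return sum(k for _, k, _ in rows) / len(rows)

def unweighted_mean_min_n(rows, min_patients=6):
    """Method 3: as method 2, but drop categories with too few patients."""
    kept = [row for row in rows if row[2] >= min_patients]
    return unweighted_mean(kept)

print(round(weighted_mean(rows), 2),
      round(unweighted_mean(rows), 2),
      round(unweighted_mean_min_n(rows), 2))
```

On these made-up numbers the three methods give roughly 0.72, 0.50, and 0.60. Weighting lets a few large categories dominate the summary, which is why reporting a range across methods is more informative than any single overall figure.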

For the adult personality disorders, the kappa range represents the overall kappas from the appendix of DSM-III (0.56 and 0.65 for the two phases) and a mean (0.50) calculated from the list of individual kappas (n = 12) provided in the archival material, which covered both phases of the field trials.

The case summary study, which used written vignettes instead of live interviews and was conducted as part of the field trials, provided an additional source of information about diagnostic agreement among clinicians. It found lower levels of agreement (weighted kappa 0.47; unweighted mean kappa 0.30).

In other words, the DSM-III field trial data for adults do not support the claims that were made about far greater reliability. If they had, the three dark vertical bars in the center of figure 4.3 would have all been above the dotted horizontal line which Spitzer and Fleiss (1974) had earlier claimed represented “good reliability.” In fact, by visual inspection one can clearly see that reliability remained at the same low level that existed before DSM-III. Let us recall that all these data were collected by the originators of DSM-III, who had a professional and personal stake in demonstrating superior reliability levels.

Perhaps because of these equivocal and disappointing results, another major reliability study was conducted several years after the publication of DSM-III by Spitzer and his associates. The later study, which its authors describe as the most rigorous and comprehensive reliability assessment ever conducted of DSM, gathered data in 1985–1986, using multiple clinical sites in Germany and the United States. It was published in 1992 (Williams et al., 1992), twelve years after the release of DSM-III. The findings were presented for two samples, one of 390 “patients” already in treatment (average weighted kappa of 0.61 and a range of overall kappas from the five sites of 0.49 to 0.67) and one of 202 people from a “community sample” (weighted kappa 0.37 and a range of overall kappas from the two sites of 0.32 to 0.38). These results are shown in the far right section of figure 4.3, under the heading Multi-Site Study. As the two dark vertical bars show, this later study again documents that no general improvement in reliability occurred in comparison to DSM-II.

The results do not get any better when we examine the reliability reports for children; in fact, they are worse. Unfortunately, the field trials included few children: only seventy-one in phase I and fifty-five in phase II. Most of the diagnostic categories had fewer than five children, and many had only one or two, too few to allow for any meaningful recalculations. Here again, we combine both phases and provide the summary kappas that were offered in the DSM appendix on the field trials. For the major categories of disorder (0.72 and 0.43) and personality disorders (0.66 and 0.55), lower kappas were obtained in the second phase. Figure 4.4 provides these estimates. In addition, using only the categories with a larger number of children, we present the range of phase I and phase II kappas for five of the most common children’s disorders. For example, Attention Deficit Disorder in the two phases had kappas of 0.58 and 0.50. The five vertical dark bars clearly show that none of the results reach kappa levels in the “good” range, and they are no better and perhaps worse than the disappointing adult results.

Figure 4.4
Reliability Levels (Kappa) for Children’s Diagnoses with DSM-III

*Data for children’s diagnoses come from appendix F in DSM-III (APA, 1980, p. 471). For the specific diagnoses the range of kappas comes from the means of the first and second phases of the field trial.

Achieving Success with Failure

As we have seen, the actual reliability data for the new DSM look remarkably similar to those for the old DSM, clustering in the range of “no better than fair” and “poor.” Almost none of the kappas fall above 0.70, a range earlier described by Spitzer as “good reliability.” In sum, the often praised “far higher” or “extremely good” reliability of the modern DSM has become an institutionalized myth, unsupported by any convincing data, although those exaggerations conveniently served the ambitious purposes of the architects of DSM. The reliability myth, in fact, serves as a prototype of how science is used and misused in revising the manual: the scientific data really don’t matter that much. The APA and others announced to the world that DSM-III was a scientific slam dunk, when, in fact, the ball didn’t even come close to the basket.

Once success in solving the reliability problem had been claimed, the topic was largely abandoned, perhaps conveniently because no one knew how to fix it. There was a plan funded by the MacArthur Foundation to conduct new reliability studies for DSM-IV, but the leaders quietly abandoned the effort without an explanation; no published results ever appeared (Kirk & Kutchins, 1992; Spiegel, 2005).

In subsequent decades, little evidence has appeared that mental health professionals in typical clinical settings can achieve high reliability with psychiatric diagnoses using any of the newer editions of the DSM. On the contrary, the few subsequent studies that bear some resemblance to earlier studies of the whole classification system continued to report diagnostic unreliability at levels similar to the pre-DSM-III studies (Regier, Kaelber, Roper, Rae, & Sartorius, 1994; Sartorius et al., 1993, p. 116; Shear et al., 2000). The reliability of children’s diagnoses is equally dismal (Spitzer, Davies, & Barkley, 1990; Kirk, 2004; Lahey, Applegate, Barkley, et al., 1994; Boyle et al., 1993; Ezpeleta, de la Osa, Domenech, Navarro, & Losilla, 1997; Kirk & Hsieh, 2004).

Other studies of diagnostic reliability appear occasionally in the journal literature, but these generally have limited relevance to the reliability of the DSM classification system as a whole as used by general mental health practitioners. Typically, these studies test the reliability of special measuring scales developed to identify a particular disorder or measure the consistency of diagnoses made under highly specialized circumstances or at special clinics focused only on a particular narrow group of disorders. For example, Mary Zanarini and her colleagues (Zanarini & Frankenburg, 2001; Zanarini et al., 2000) in several articles report generally high reliability scores (kappa) for personality disorders and other nonpsychotic disorders with DSM-IV. The underlying motive for the studies appeared to be to demonstrate that personality disorders could be diagnosed as reliably as other DSM disorders.

A careful reading of the studies reveals how high reliabilities were accomplished and why they cannot be generalized to general diagnostic practice. The results were obtained by maintaining rigorous control over the making of diagnoses. The study took place in university-related clinics and hospitals at Brown, Columbia, Harvard, and Yale, under the close supervision of the project coordinators. The study was, by design, a study of the reliability of two highly structured and well-developed interview protocols, the Structured Clinical Interview for Axis I Disorders (SCID-I) and the Diagnostic Interview for Personality Disorders (DIPD-R), one of them (for personality disorders) developed by the lead author. Interviewers were graduate students (presumably in relevant disciplines) who were selected by the coordinators and underwent a week of intensive training in the use of the instruments, which contained over three hundred questions. This training was followed by additional training, supervised practice interviews, and close supervision throughout the study. Each interviewer had to be “certified” before participating in the study. The instruments not only structure the questioning of clients but provide a specific scoring system that produces the “diagnosis.” To test for inter-rater reliability, those trained to use the instrument participated in conjoint interviews (in which one person conducted the interview and the other observed), and each completed the instruments. At other times, those trained watched videotapes of prior interviews and their scores were compared. The patients for these interviews were selected by the project coordinator at each site.

Many structural and procedural aspects of these studies may account for the generally high reliability scores, among them the use of structured protocols that deliberately focus each interview, the careful selection of patients and interviewers, the use of conjoint interviews (which, technically, are not independent observations), the highly specialized nature of the research settings with highly motivated participants, and the leadership of the person who developed one of the instruments and had an obvious stake in its success. Under these special conditions and full technical control over the interviews and the scoring of answers, it is expected that interviewers would often reach agreement. Indeed, it would be surprising if they did not.

Does this mean that DSM itself is used reliably? None of these special research conditions apply to the real world of psychiatric diagnosis, in which individual clinicians with varying levels of training and perspectives are mandated to use DSM to screen a diverse array of clients walking into medical clinics, mental health centers, family service agencies, and so forth. Clinicians generally have different styles of interviewing, ask different questions, pursue different avenues for obtaining client information, and are supervised loosely, if at all. We will have more to say about this in the next chapter when we discuss DSM-5.

The Zanarini et al. (2000) diagnostic studies used the most highly trained and supervised interviewers working under the most controlled and protected circumstances. Comparing the Zanarini et al. studies to actual clinical practice is like comparing professional racecar drivers circling a race track at 150 miles per hour to granny moving her Volvo through city traffic. All are driving automobiles, just as all clinicians are making diagnoses, but no one ought to claim that there is much in common in their skills or in the setting.

Such studies of diagnostic instruments, whatever their usefulness for research purposes, have little bearing on the reliability of DSM as it is used by practitioners who are not mandated to use structured interview protocols and who see diverse patients walking into a clinic. As far as we know, there have been no new DSM reliability studies of the magnitude that Spitzer and his colleagues conducted over thirty years ago.⁶

The summary point is that there is little evidence that the signature objective of DSM-III to improve the reliability of psychiatric diagnosis was achieved. What was stated over a decade ago (Kutchins & Kirk, 1997) remains true:

Mental health clinicians independently interviewing the same person in the community are as likely to agree as disagree that the person has a mental disorder and are as likely to agree as disagree on which of the over 300 disorders is present. . . . If the unreliability of diagnosis were widely recognized and if there were no scientific patina to DSM, the use of everyday behaviors as indicators of mental disorders would be more rigorously questioned by the public. The illusion that psychiatrists are in agreement when making diagnoses creates the appearance of a united professional consensus. In fact, there is considerable professional confusion. (p. 53)

The claims of achieving diagnostic reliability with the new editions of DSM are a prime example of mad science: a triumph of hope, marketing, and political rhetoric over the scientific evidence. After five years of bitter struggle to build support for the new DSM-III, its developers were certainly not about to reveal that the old bogeyman of unreliability was still around, big as life. Too much investment had already been made in the new manual, too many opponents defeated, too many reputations in jeopardy, and the business of diagnosis had to move forward. In addition, few in the mental health professions understood the then-arcane language of reliability statistics or were in any position to assess the veracity of the developers’ claims. In the process of promoting DSM-III, the claims were more the advertisements of entrepreneurs preparing to launch a new product in which all their capital had been invested. Exaggeration, distortion, and bias should have been expected. In fact, it is a pattern that is replicated in the exaggerated claims of the effectiveness of community treatment (chapter 3) and drugs (chapters 6 and 7).

Twenty-five years after the launch of DSM-III, Spitzer admitted that he failed to solve the problem. “To say that we’ve solved the reliability problem is just not true,” he said plainly (Spiegel, 2005, p. 63). “It’s been improved . . . it’s certainly not very good. There’s still a real problem, and it’s not clear how to solve the problem.” Allen Frances, who worked on DSM-III and later chaired the task force that developed DSM-IV, pointed out in a 2005 interview that “without reliability the system is completely random, and the diagnoses mean almost nothing, because they’re falsely labeling. You’re better off not having a diagnostic system” (Spiegel, 2005). Nevertheless, he still claimed inaccurately that reliability had been improved, but acknowledged the ploy:

To my way of thinking, the reliability of the DSM—although improved—has been oversold by some people. . . . From a cultural standpoint, reliability was a way of authenticating the DSM as a radical innovation. In a vacuum, to create criteria that were based on accepted wisdom as a first stab was fine, as long as you didn’t take it too seriously. (p. 63)

Unfortunately, the vast majority took it very seriously. In the early twenty-first century, with the modern DSM completely institutionalized, its architects and entrepreneurs Spitzer and Frances could admit that diagnostic unreliability remains just as it was in the 1950s, a perplexing and unsolved problem that continues to undermine the validity of DSM.

The flawed new manual, however, became an undisputed marketing success, a runaway psychiatric bestseller. The APA sold 830,000 copies of DSM-III (Carey, 2008) and, given the price, grossed tens of millions of dollars. The APA commissioned a sequel from Spitzer, published in 1987 as DSM-III-R, released DSM-IV in 1994, and then issued a text revision in 2000 (DSM-IV-TR). DSM can now be found on the bookshelf of every mental health professional in America. It is reported that each year, sales of DSMs exceed $6.5 million (Greenberg, 2011).

The once obscure codebook is now an important instrument in shaping public mental health policy, guiding research and training in psychiatry and in much of psychology and social work, and determining the prevalence of mental disorders in the population. Moreover, because of insurance reimbursement requirements, DSM supplies the criteria for the financing of psychiatric treatment, hospitalization, and the use of medications. Furthermore, its categories have completely seeped into public discourse and shaped how behavioral problems of many kinds are described, understood, and anticipated to manifest themselves. Absolutely none of these indicators of acceptance and popularity pertained to its predecessor, DSM-II.

Troubling Consequences of Success

The success of DSM-III has spawned a cluster of unanticipated and very troubling consequences for people in need of help, for clinicians, and for our society. These will be explored in the next chapter. First, we will examine whether the descriptive approach to diagnosis introduced by DSM-III improved the validity of diagnosis as anticipated. Second, we will describe the alliance of American institutions that exploited DSM for their own purposes, while ensuring its success in the marketplace. For example, the APA embraced the new manual for restoring scientific respectability to psychiatry. Mental health clinicians and universities found that DSM provided a scientific-looking book for labeling human troubles and rapidly adopted it to authenticate their training and practices. The National Institute of Mental Health made the use of DSM’s criteria a requirement for funding research. The politically powerful drug industry rapidly exploited DSM’s expansive list of mental disorders and behaviors that could be labeled as symptoms and treated with expensive psychotropic drugs. The health insurance industry relied on the DSM to guide decisions about which disorders they would cover. The media, in news articles, magazines, and on television, provided a steady diet of stories about the “discovery” of new disorders and purported scientific advances in psychiatry and the wonders of new medications targeting the newly discovered disorders. We will then address the latest edition of the manual, DSM-5, and its controversies as it makes its way to publication in 2013. Is the new edition likely to be a diagnostic breakthrough or the inevitable collapse of descriptive psychiatry after a thirty-year reign? Finally, the social harms of DSM will be addressed.

Notes

1. The primary professionals working in mental health include psychiatrists, psychologists, social workers, and nurses. The number of professionals in each profession is elusive, depending on what levels of education are considered and which mental health service settings are included. One generally accepted estimate reports that in 2000 there were approximately 40,000 psychiatrists, 70,000 nurses, 70,000 social workers, and 88,000 licensed psychologists working in mental health services (Mechanic, 2008, pp. 12–17). Regardless of how these estimates are made, psychiatrists are always a small minority of mental health service providers in America.

2. In this regard, his model for mental illness was based on his work with Alois Alzheimer, who, in his work on cadavers, identified the neurological disorder later known as Alzheimer’s, as well as general paresis.

3. E. Fuller Torrey later became one of the most outspoken promoters of coercive and involuntary treatment in psychiatry, and an adherent of the view that schizophrenia had a viral origin. The private nonprofit Stanley Medical Research Institute, which he directs, has amassed one of the largest banks of brains in the world, and has spent over $300 million since 1989 on research on the “brain diseases” of schizophrenia, bipolar disorders, and depression, according to the Institute’s website (http://www.stanleyresearch.org), but with no diagnostic brain lesions discovered so far through this effort.

4. Indeed, within a decade biopsychiatry had taken hold, as President George H. W. Bush and the NIMH proclaimed that the 1990s would be the “Decade of the Brain,” and by 2005, the director of NIMH argued that psychiatrists needed to be trained as neuroscientists, not psychotherapists (Insel & Quirion, 2005).

5. Weighted kappa has technical advantages and disadvantages. The advantage is giving greater weight to those diagnostic categories in which more patients were placed. The disadvantage is that if many patients received their diagnoses in clinical sites with higher base rates for particular disorders (e.g., patients who were referred to an anxiety disorder clinic and were therefore more likely to be diagnosed with anxiety disorders), the overall kappas from many different sites would be artificially inflated. Unweighted kappa gives equal status to each general diagnostic category.

6. At least we were unable to locate any, and our queries to key DSM-IV participants revealed none.
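
The base-rate problem described in note 5 can be made concrete with a small calculation. The sketch below is an illustration only: it uses the standard two-rater Cohen's kappa rather than the particular weighting used in the field trials, and the two "sites" and all patient counts are invented for the example, not taken from any study. It shows two hypothetical clinics in which raters agree barely better than chance, yet the pooled data yield a kappa that looks respectable, simply because one clinic sees mostly anxiety cases and the other sees few.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] is the number of patients placed in category i by
    rater 1 and in category j by rater 2.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    # Proportion of patients on whom the two raters agree.
    observed = sum(table[i][i] for i in range(k)) / n
    # Agreement expected by chance, from each rater's marginal totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical counts (assumptions for illustration only).
# Categories: index 0 = "anxiety disorder", index 1 = "other".
site_a = [[80, 9], [9, 2]]   # anxiety clinic: very high base rate
site_b = [[2, 9], [9, 80]]   # general clinic: very low base rate

# Pool the two sites by adding the tables cell by cell.
pooled = [[a + b for a, b in zip(row_a, row_b)]
          for row_a, row_b in zip(site_a, site_b)]

kappa_a = cohen_kappa(site_a)
kappa_b = cohen_kappa(site_b)
kappa_pooled = cohen_kappa(pooled)

print(f"site A kappa: {kappa_a:.2f}")        # about 0.08
print(f"site B kappa: {kappa_b:.2f}")        # about 0.08
print(f"pooled kappa: {kappa_pooled:.2f}")   # about 0.64
```

In every table the raters agree on 82 percent of patients. Within each site nearly all of that agreement is what chance alone would produce given the site's base rate, so kappa stays near zero. Pooling changes the chance baseline to 50 percent, and the same raw agreement now appears as a kappa of about 0.64.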

References

American Psychiatric Association. (1952). Diagnostic and statistical manual: Mental disorders. Washington, DC: APA.

American Psychiatric Association. (1980). Diagnostic and statistical manual of mental disorders, Third edition (DSM-III). Washington, DC: American Psychiatric Association.

American Psychiatric Association. (2000). Diagnostic and statistical manual of mental disorders, Fourth edition, text revision (DSM-IV-TR). Washington, DC: American Psychiatric Association.

Andreasen, N. C. (1984). The broken brain: The biological revolution in psychiatry. New York: Harper & Row.

Bayer, R. (1981). Homosexuality and American psychiatry: The politics of diagnosis. Princeton, NJ: Princeton University Press.

Bayer, R., & Spitzer, R. (1982). Edited correspondence on the status of homosexuality in DSM-III. Journal of the History of the Behavioral Sciences, 18, 32–52.

Bayer, R., & Spitzer, R. (1985). Neurosis, psychodynamics, and DSM-III: A history of the controversy. Archives of General Psychiatry, 42, 187–195.

Bernstein, C. A. (2011, March 4). Meta-structure in DSM-5 process. Psychiatric Times, 7.

Blashfield, R. K. (1982). Feighner et al., invisible colleges, and the Matthew effect. Schizophrenia Bulletin, 8(1), 1–6.

Boyle, M. (2002). Schizophrenia: A scientific delusion? (2nd ed.). East Sussex, UK: Routledge.

Boyle, M., Offord, D., Racine, Y., Sanford, M., Szatmari, P., Fleming, J., et al. (1993). Evaluation of the diagnostic interview for children and adolescents for use in general population samples. Journal of Abnormal Child Psychology, 21(6), 663–681.

Buckley, P. J., Michels, R., & MacKinnon, R. A. (2006). Changes in the psychiatric landscape. American Journal of Psychiatry, 163, 757–760.

Carey, B. (2005, June 7). Most will be mentally ill at some point, study says. New York Times, 1.

Carey, B. (2008, December 18). Psychiatrists revise the book of human troubles. New York Times. Retrieved from http://www.nytimes.com/2008/12/18/health/18psych.html?pagewanted=all

Carlat, D. (2010). Unhinged: The trouble with psychiatry—A doctor’s revelations about a profession in crisis. New York: Free Press.

Carson, R. (1991). Dilemmas in the pathway of the DSM-IV. Journal of Abnormal Psychology, 100, 302–307.

Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of Consulting Psychology, 16, 319–323.

Eysenck, H. J. (1986). A critique of contemporary classification and diagnosis. In T. Millon & G. Klerman (Eds.), Contemporary directions in psychopathology (pp. 73–98). New York: Guilford Press.

Ezpeleta, N., de la Osa, N., Domenech, J., Navarro, J., & Losilla, J. (1997). Diagnostic agreement between clinicians and the Diagnostic Interview for Children and Adolescents—DICA-R—in an outpatient sample. Journal of Child Psychology and Psychiatry, 38(4), 431–440.

Foucault, M. (1965). Madness and civilization: A history of insanity in the age of reason. New York: Random House.

Frances, A. (2010a, December 22). The most important psychiatrist of our time. Psychiatric Times. Retrieved from http://www.psychiatrictimes.com/blog/couchincrisis/content/article/10168/1766926?_EXT_4_comsort=of

Frances, A. (2010b, July 6). Normality is an endangered species: Psychiatric fads and overdiagnosis. Psychiatric Times. Retrieved from http://www.psychiatrictimes.com/display/article/10168/1598676

Freud, S. (1969). The question of lay analysis. New York: W. W. Norton.

Goffman, E. (1961). Asylums: Essays on the social situation of mental patients and other inmates. Garden City, NY: Anchor Books.

Goleman, D. (1990, May 17). New paths to mental health put strains on some healers. New York Times, A1, B12.

Greenberg, G. (2010). Manufacturing depression: The secret history of a modern disease. New York: Simon & Schuster.

Greenberg, G. (2011, January). Inside the battle to define mental illness. Wired. Retrieved from http://www.wired.com/magazine/2010/12/ff_dsmv/

Horwitz, A. V. (2002). Creating mental illness. Chicago: University of Chicago Press.

Horwitz, A. V. (2010). How an age of anxiety became an age of depression. Milbank Quarterly, 88(1), 112–138.

Horwitz, A. V., & Wakefield, J. C. (2012). All we have to fear: Psychiatry’s transformation of natural anxieties into mental disorders. New York: Oxford University Press.

Hyler, S., Williams, J., & Spitzer, R. (1982). Reliability in the DSM-III field trials. Archives of General Psychiatry, 39, 1275–1278.

Illich, I. (1975). Medical nemesis: The expropriation of health. New York: Pantheon.

Insel, T. R., & Quirion, R. (2005). Psychiatry as a clinical neuroscience discipline. JAMA, 294(17), 2221–2224.

Jaffe, E. (2010, January 11). Of two minds. Los Angeles Times, E1, E6–E7.

Joint Commission on Mental Illness and Health. (1961). Action for mental health. New York: Basic Books.

Kendell, R. E. (1975). The role of diagnosis in psychiatry. Oxford: Blackwell Scientific Publications.

Kessler, R., McGonagle, K., Zhao, S., Nelson, C., Hughes, M., Eshleman, S., et al. (1994). Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders in the United States: Results from the National Comorbidity Survey. Archives of General Psychiatry, 51, 153–164.

Kirk, S. A. (2004). Are children’s DSM diagnoses accurate? Brief Treatment and Crisis Intervention, 4, 255–270.

Kirk, S. A., & Hsieh, D. K. (2004). Diagnostic consistency in assessing conduct disorder: An experiment on the effect of social context. American Journal of Orthopsychiatry, 74(1), 43–55.

Kirk, S. A., & Kutchins, H. (1992). The selling of DSM: The rhetoric of science in psychiatry. Hawthorne, NY: Aldine de Gruyter.

Klerman, G. (1984). The advantages of DSM-III. American Journal of Psychiatry, 141, 539–542.

Klerman, G. (1986). Historical perspectives on contemporary schools of psychopathology. In T. Millon & G. Klerman (Eds.), Contemporary directions in psychopathology. New York: Guilford Press.

Kupfer, D. J., First, M. B., & Regier, D. A. (2002). Introduction. In D. J. Kupfer, M. B. First, & D. A. Regier (Eds.), A research agenda for DSM-V (pp. xv–xxiii). Washington, DC: American Psychiatric Association.

Kutchins, H., & Kirk, S. A. (1997). Making us crazy: DSM: The psychiatric bible and the creation of mental disorders. New York: Free Press.

Lahey, B., Applegate, B., Barkley, R., Garfinkel, B., McBurnett, K., Kerdyk, L., et al. (1994). DSM-IV field trials for oppositional defiant disorder and conduct disorder in children and adolescents. American Journal of Psychiatry, 151, 1163–1171.

Lahey, B., Applegate, B., McBurnett, K., Biederman, J., Greenhill, L., Hynd, G., et al. (1994). DSM-IV field trials for attention-deficit/hyperactivity disorder in children and adolescents. American Journal of Psychiatry, 151, 1673–1685.

Laing, R. D. (1967). The politics of experience. Baltimore: Penguin Books.

Luhrmann, T. M. (2000). Of two minds: The growing disorder in American psychiatry. New York: Knopf.

Maxmen, G. (1985). The new psychiatrists. New York: New American Library.

Mechanic, D. (2008). Mental health and social policy: Beyond managed care (5th ed.). Boston: Pearson.

Michels, R. (1984). First rebuttal. American Journal of Psychiatry, 141, 548–551.

Millon, T. (1983). DSM-III: An insider’s account. American Psychologist, 38, 804–815.

Millon, T. (1986). On the past and future of the DSM-III: Personal recollections and projections. In T. Millon & G. Klerman (Eds.), Contemporary directions in psychopathology: Toward the DSM-IV (pp. 29–70). New York: Guilford Press.

Regier, D., Kaelber, C., Roper, M. T., Rae, D., & Sartorius, N. (1994). The ICD-10 clinical field trial for mental and behavioral disorders: Results in Canada and the United States. American Journal of Psychiatry, 151, 1340–1350.

Rosenhan, D. L. (1973). On being sane in insane places. Science, 179, 250–258.

Rounsaville, B., Alarcon, R., Andrews, G., Jackson, J. S., Kendell, R. E., & Kendler, K. (2002). Basic nomenclature issues for DSM-V. In D. J. Kupfer, M. B. First, & D. A. Regier (Eds.), A research agenda for DSM-V (pp. 1–29). Washington, DC: American Psychiatric Association.

Sartorius, N., Kaelber, C. T., Cooper, J. E., Roper, M. T., Rae, D. S., Gulbinat, W., et al. (1993). Progress toward achieving a common language in psychiatry: Results from the field trial of the clinical guidelines accompanying the WHO classification of mental and behavioral disorders in ICD-10. Archives of General Psychiatry, 50, 115–125.

Scheff, T. J. (1966). Being mentally ill: A sociological theory. Chicago: Aldine.

Shear, M., Greeno, C., Kang, J., Ludewig, D., Frank, E., Swartz, H., et al. (2000). Diagnosis of nonpsychotic patients in community clinics. American Journal of Psychiatry, 157(4), 581–587.

Spiegel, A. (2005, January 3). The dictionary of disorder. New Yorker, 56–63.

Spitzer, R. (1975). On pseudoscience in science, logic in remission, and psychiatric diagnosis: A critique of Rosenhan’s “On being sane in insane places.” Journal of Abnormal Psychology, 84, 442–452.

Spitzer, R., Davies, M., & Barkley, R. (1990). The DSM-III-R field trial of disruptive behavior disorders. Journal of the American Academy of Child and Adolescent Psychiatry, 29, 690–697.

Spitzer, R., & Forman, J. (1979). DSM-III field trials: II. Initial experiences with the multiaxial system. American Journal of Psychiatry, 136, 818–820.

Spitzer, R., Forman, J., & Nee, J. (1979). DSM-III field trials: I. Initial interrater diagnostic reliability. American Journal of Psychiatry, 136, 815–817.

Spitzer, R., & Wilson, P. T. (1968). A guide to the American Psychiatric Association’s new diagnostic nomenclature. American Journal of Psychiatry, 124, 1616–1629.

Spitzer, R., & Wilson, P. T. (1969). DSM-II revisited: A reply. International Journal of Psychiatry, 7, 421–426.

Spitzer, R. L., & Fleiss, J. L. (1974). A re-analysis of the reliability of psychiatric diagnosis. British Journal of Psychiatry, 125, 341–347.

Spitzer, R. L., Williams, J., & Skodol, A. (1980). DSM-III: The major achievements and an overview. American Journal of Psychiatry, 137, 151–164.

Szasz, T. (1961). The myth of mental illness: Foundations of a theory of personal conduct. New York: Hoeber-Harper.

Talbot, J. (1980). An in-depth look at DSM-III: An interview with Robert Spitzer. Hospital and Community Psychiatry, 31, 25–32.

Torrey, E. F. (1974). The death of psychiatry. New York: Penguin.

Ullmann, L. P., & Krasner, L. (1969). A psychological approach to abnormal behavior. Englewood Cliffs, NJ: Prentice-Hall.

Vaillant, G. (1984). The disadvantages of DSM-III outweigh its advantages. American Journal of Psychiatry, 141, 542–545.

Whitaker, R. (2010). Anatomy of an epidemic: Magic bullets, psychiatric drugs, and the astonishing rise of mental illness in America. New York: Crown.

Williams, J., Gibbon, M., First, M., Spitzer, R. L., Davies, L., Borus, J., et al. (1992). The structured clinical interview for DSM-III-R (SCID): II. Multi-site test-retest reliability. Archives of General Psychiatry, 49, 630–636.

Zanarini, M. C., & Frankenburg, F. R. (2001). Attainment and maintenance of reliability of Axis I and II disorders over the course of a longitudinal study. Comprehensive Psychiatry, 42(5), 369–374.

Zanarini, M. C., Skodol, A., Bender, D., Dolan, R., Sanislow, C., Schaefer, E., et al. (2000). The collaborative longitudinal personality disorders study: Reliability of Axis I and II diagnoses. Journal of Personality Disorders, 14(4), 291–299.

Zigler, E., & Phillips, L. (1961). Psychiatric diagnosis: A critique. Journal of Abnormal and Social Psychology, 63, 607–618.

Zilboorg, G. (1941). A history of medical psychology. New York: Norton.