CHAPTER 7

Measure

AN ILLNESS becomes an object of knowledge as it is identified, as its causes are discovered, and as methods of prevention, treatment, or cure are developed. Measurement is a second route to knowledge, and the two routes cross. For example, the causal story about multiple personality is bolstered by measurements used to establish that dissociation comes in degrees, so that children with a strong innate predisposition to dissociate may use that as a device to cope with trauma. Thus Putnam writes that “central to the concept of the adaptive function(s) of dissociation is the idea that dissociative phenomena exist on a continuum.”1

Why does he think that there is a continuum? He cites two sources of evidence. First, hypnotizability in the general population forms a continuum from those who are highly resistant to those who are hypnotized at the wave of a hand. It is postulated that there is an analogy between susceptibility to hypnotism and tendency to dissociate. “The second line of evidence supporting the concept of a continuum of dissociative experiences … comes from surveys using the Dissociative Experiences Scale.”2 That scale was the first objective measure of dissociative experiences.

The continuum of dissociative experiences has become something of an accepted fact within the multiple personality movement. It has been criticized from outside. Fred Frankel, a psychiatrist who is an expert on clinical and experimental uses of hypnotism, cautions against equating hypnotizability scores with dissociative capacity, and warns against the ready assumption that hypnotizability itself is a single phenomenon, so that everyone is simply more or less hypnotizable. Thus he doubts Putnam’s first line of evidence.3 He also queries the second line for reasons that I will soon mention. Unlike Frankel I am less concerned to question Putnam’s continuum hypothesis than to show how creating systems of measurement, such as the Dissociative Experiences Scale, can bring a fact—the fact of a dissociative continuum—into being.

The past ten years have seen rapid development of quantitative measures of dissociation and multiplicity. That makes the study of the disorder more and more like other branches of empirical psychology. To avoid getting lost in statistical details, I shall focus on two related items: Putnam’s continuum hypothesis and the very first method of measuring dissociation, the Dissociative Experiences Scale (“DES”) published by Putnam and Eve Bernstein Carlson in 1986. These two authors used their scale to test their hypothesis that “the number and frequency of experiences and symptoms attributable to dissociation lie along a continuum.”4

I approach matters this way for several reasons. First, it allows us to focus on the logical nature of the concept “dissociation.” Is it well represented by a linear continuum? Second, the continuum hypothesis, established by the use of objective questionnaires, leads on to the objective theory of causation. A third reason for my approach is that testing hypotheses has cachet in itself: thanks to Karl Popper’s influential philosophy of science, testing hypotheses is widely regarded as the sine qua non of objective science. Bernstein and Putnam stated two hypotheses they “sought to test,” one of which was the continuum hypothesis. Their work thereby acquired the tone of Popper’s hard-nosed science, yet these authors did not, in fact, test their hypotheses at all. Finally, Colin Ross asserted in 1994 that “over the past ten years, the MPD literature has evolved from prescientific to scientific status.”5 By studying the DES and related statistical tests we shall be able to form a just appreciation of this scientific status.

Empirical psychology has created its own genre of objectivity, the questionnaire subjected to standardized scoring and statistical comparisons.6 Best known to most of us are IQ tests. Nothing could seem much further from multiple personality than the intelligence quotient. Yet by what ought to be the sheerest coincidence, the early histories of the two are intertwined. Alfred Binet is usually regarded as the founder of intelligence testing; descendants of the Stanford-Binet tests are still in use. Early in his career, before he turned to intelligence, Binet was writing about multiple personality.7 He studied hypnotism intensively and discussed its ability to produce alter states. He was up to his neck in one of the zanier types of research, metallotherapy, in which hysterical symptoms were relieved by the application of different metals to different parts of the body. The very first truly multiple personality, the subject of chapter 12 below, was made multiple by metallotherapy.

Morton Prince, the great American pioneer of multiplicity, during a visit to France where his mother was to be treated for neurasthenia, took the opportunity to study under Binet. H. H. Goddard, whose 1921 patient Bernice was among the last of the first American wave of multiples (and whom I use as an example in my final chapter), also began his career under Binet. He returned to America to develop the low end of intelligence testing and invented the word “moron.” Goddard’s measures of feeblemindedness showed that nearly all immigrants from central and southern Europe were unintelligent. It was surely Binet-2, measurer of intelligence, and not Binet-1, student of multiple personality, who left his mark on the larger history of psychology, yet Binet-1 would surely be delighted at the way in which testing, which Binet-2 fostered, has now found a niche in what Binet-1 called “alterations of the personality.”

Psychologists often refer to tests and questionnaires as instruments. That makes us think of the material apparatus of chemistry or physics. The analogy usefully points to one of the central methodological practices of the natural sciences, what the philosopher of science Nicholas Jardine has called calibration.8 When a new kind of instrument is introduced for purposes of measurement, it has to be calibrated against old measurements or judgments. The atomic clock may supersede astronomical clocks, but it must also give very much the same readings of time as previous instruments. And we should be able to explain how it differs and show why its revised measurements of time are to be preferred.

The expression used in psychology is not calibration but validation. A key phrase is “construct validity”; I shall avoid that language, for although it is standard in experimental psychology, its usage is largely restricted to that field. Psychologists talk of instruments and call the Dissociative Experiences Scale an instrument. What we do as we begin to use ordinary instruments—such as physical science apparatus—is to calibrate, not to validate. No one ever talked about validating the atomic clock. Of course “validity” is a value word: a validated instrument or construct is all right. But when we look at how the Dissociative Experiences Scale is said to be validated, we see something very ordinary, unproblematic, and untechnical.9 We see that it is checked and calibrated against prior expert judgments and diagnoses, just as the atomic clock was calibrated to prior astronomical judgments of time. We check, for example, to see that people diagnosed as multiples score highly on the DES, and that scores do not correlate with traits thought to be irrelevant to dissociation.

The history of intelligence testing has been a history of calibration of instruments. Binet was immured in a world dense with scholastic examinations. None was more uniform and impersonal than those administered by the French educational bureaucracy. Binet had qualms about the system, especially in regard to less gifted children, but he did not flaunt his doubts. His measures of “intelligence” had to agree, generally, with preexisting judgments and then be adapted at the margins. Had he declared that many children who could not cope with French elementary education were intelligent, he would have been mocked. Had he said that the better students at the lycées were stupid, he would have been reviled. He had perhaps measured something, generous people might have said, but not intelligence. (Compare: if the atomic clock did not calibrate with sun time, it might measure something, but not what we call time.) Binet’s great innovation, the testing of intelligence, made sense only against a background of shared judgments about intelligence, and it had to agree with them by and large, and also to explain when it disagreed. Who shared the judgments? Those who matter, namely, the educators, other civil servants, and Binet’s peers in the middle classes of society.

Despite the sometimes unattractive features of the history of intelligence testing, there was seldom a deep problem about calibration. This was because, at any time, there was a body of agreed judgments and discriminations of intelligence to which the IQ tests were calibrated. Sometimes prior judgments were modified in the light of test results, and sometimes tests were revised as a result of calibration failure.10 Most of the sciences work that way, although each has its own traditions and terminology. One result of calibration was that prior judgments became both sharpened and objectified. What were once discriminations made by suitably educated or trained individuals were turned into impartial, distant, nonsubjective measures of intelligence. Intelligence became an object, independent of any human opinions. Empirical psychology has regularly achieved objectivity by following this route. The pattern for the objectification of multiple personality by measurement had been established for decades when Putnam and Bernstein introduced the DES.

Two types of questionnaires are used for multiple personality. One type is self-administered. A subject answers some printed questions and is scored accordingly. The DES was the first example of this type of test; two additional ones are now being studied.11 These tests are said to be intended for screening only, and not for diagnosis. A more searching type of probe is based on a set of questions printed in a manual; an interviewer puts the questions to the subject and records responses, which are then scored. It is proposed that such questionnaires can be used for tentative diagnosis.

These questionnaires are research tools for studying dissociation. They may also be used to select and screen subjects who will be examined further. They can be used for surveys of chosen populations—psychiatric inpatients, college students, or randomly selected city-dwellers—in order to discover the incidence or distribution of dissociative experiences. The questionnaires are sometimes presented as instruments for routine screening or tentative diagnosing in a hospital or outpatient clinic. There is no information on the extent to which they are so used, outside of research settings. Their day-to-day (nonresearch) use is encouraged less in clinics than at some of the small workshops for therapists that take place all over the continent. As Putnam himself has regretfully noted, such workshops often do not involve actual clinical work or follow-up training.12 That is one way in which the questionnaires objectify and legitimate multiple personality—the therapist is made to feel that she is using a scientific tool. An anthropologist observing the practices of designing and testing questionnaires might suggest that the primary function served is not to provide a working tool for the hospital admissions department or for the clinic. It is rather to establish the objectivity of knowledge about dissociative disorders.

Dissociation questionnaires are checked and calibrated through a comparison between scores and diagnoses made by qualified personnel. There are incidental but necessary checks. Do subjects held to be normal respond roughly in the same way when asked to fill in the questionnaire a second time, a few months later? As successive questionnaires in the field are developed, each is calibrated with previous questionnaires and further clinical judgments. Hence a network of mutually consistent and self-confirming testing devices is set in place. For example, the results of an interview questionnaire are compared to those of a self-administered questionnaire, and both are compared to expert clinical judgment.

There is a superficial but grave problem about the calibration of dissociation questionnaires. To what agreed judgments should they be calibrated? In the field of dissociative disorders there is no body of agreed judgment. Many leading psychiatrists say there is no such field. What we are observing is not the calibration of dissociative scales to judgments shared by students of the human mind and its pathologies. Instead, the scales are calibrated to the judgments of a movement within psychiatry. They are presented as objective, scientific results like any other. Formally, the procedures of calibration are no different from those used in other branches of psychology and clinical medicine. The problem is that they are not calibrated to an independent standard.

The issue of independence is seldom addressed squarely. Responses to the DES made by psychiatric patients in seven different establishments in North America were compared, in part to check on independence. At each of these seven centers, patients were selected, tested on the scale, and independently diagnosed. According to the authors of the study, “We can safely say that the DES data collected in this study were unrelated to the diagnostic process.” The paper was written by the statistician Eve Bernstein Carlson, six psychiatrists from six of the seven centers, and an expert on testing from the seventh center. I have more to say about the seventh center later in this chapter. The six psychiatrists are six leading researchers on multiple personality, mostly past or future presidents of the ISSMP&D, each running a clinic or research center studying or treating multiple personality.13 They conclude: “The Dissociative Experiences Scale items do not measure the diagnostic criteria for multiple personality, and Dissociative Experiences Scale data collected in this study were unrelated to the diagnostic process.” But the conclusion cannot be drawn that the diagnoses and the scale were in any ordinary sense independent. This was a calibration of an in-house scale against in-house diagnoses—in places where multiple personality behavior was acknowledged, elicited, encouraged, and even fostered. At many other centers one would have had zero diagnoses of multiple personality.

Calibrating the atomic clock involves going to the experts, the astronomers, so why not have experts on multiple personality calibrate the dissociation scale? The comparison fails. There is no sizable body of astronomers—let alone a majority—who disagree with standard solar and astronomical time measurements. An unkind skeptic might compare calibration based on the judgments of multiple personality experts to calibrating a clock on the basis of the judgments of sophisticated flat-earthers who hold that the regularity of solar time is an illusion. Their time has no regular connection with solar time, or even with lunar time. An internal consistency might be established between their new clock and their “time,” but so what?

Internal consistency does have a power of its own. Once we have enough internally consistent tests, once we apply a routine battery of statistical comparisons, once we produce a sufficient number of charts and graphs, then, so long as we use the mantra of statistical degrees of significance, the entire structure does seem to become objective. Let’s see how this happens in practice.

Bernstein Carlson and Putnam published their initial results in 1986. Their slightly revised questionnaire of 1993 begins with the following instructions.14

This questionnaire consists of 28 questions about experiences that you may have in your daily life. We are interested in how often you have these experiences. It is important, however, that your answers show how often these experiences happen to you when you are not under the influence of alcohol or drugs. To answer the questions, please determine to what degree the experience described in the question applies to you and circle the number to show what percentage of the time you have the experience.

Then the subject is given a choice of percentages, 0 percent, 10 percent, 20 percent, etc. Some of the questions involve what we often call daydreaming, absentmindedness, or being caught up in a story. How often do you find you can’t recall whether or not you mailed the letter you intended to post? How often do you have the experience, when taking a trip in a car, bus, subway, or whatever, of suddenly realizing that you don’t remember part or all of the trip? How often, when watching TV or a movie, do you lose track of what is going on around you?

Some questions involve classic aspects of prototypical multiple personality: Being accused of lying, when you don’t think you lied. Finding unfamiliar things among your belongings. Discovering evidence that you’ve done something you can’t recall doing. Having no memory of an important event in your life, such as a wedding or graduation. Being approached by people you don’t know who call you by name. Failing to recognize friends or family members.

Other questions involve what is called depersonalization or derealization. Depersonalization is listed in both DSM-III and DSM-IV as a dissociative disorder, but this diagnosis has a complex history. It appears with other types of disorder, and is held by some theorists of dissociation not to be a dissociative disorder at all. The issues, both historical and diagnostic, lead in so many directions that I decided not to discuss them in this book. In the dissociation questionnaire, depersonalization or derealization is broached by questions about whether one has the feeling that other people or objects are not real—or that one is not real oneself. Do you feel your body is not your own? Do you look in the mirror and not recognize yourself? Do you sometimes have the feeling that you are standing next to yourself, or watching yourself, as if you were another person?

One of the odd things about such questionnaires is that they cannot be taken literally. Even the directions that I just quoted are puzzling. The investigators want to know “how often” you have certain experiences, but two sentences later they ask “to what degree” you have these experiences. These are two materially different questions, yet you have only one “percentage” to answer with.15 The ambiguity poses no practical problem, though: no one has any trouble completing the questionnaire. What the test determines is simply a set of responses to 28 printed sentences. And it is very clear, in a nonliteral way, what the questions are getting at.

The very directness of the questions unfortunately means that anyone who catches on and who wants to reply as if ill, pretend to be well, or otherwise play the fool can easily do so. This was confirmed in an experiment in which one group of student nurses was asked to answer straightforwardly, a second group to answer as if they had problems (“to fake bad”), and a third group to answer as if they were supernormal (“to fake good”). Nurses in a final group were asked “to fake MPD.” Without further instruction, the nurses produced the profiles requested.16 It is not only experimental subjects who behave like this. There is a feedback effect from the questionnaire to potential multiple personality patients. Richard Kluft has remarked whimsically that “many ‘well-travelled’ dissociative disorder patients have become overly familiar with the DES, and may enter the clinician’s office with a copy of their last DES as one of the many exhibits in their bulging files.”17

It is hardly the fault of Bernstein and Putnam that their questionnaire has had an effect on patient symptomatology. Their initial research was purely scientific in intention. Their first experiment used 34 normal adults, 31 college undergraduates aged 18–22, 14 alcoholics, 24 phobic-anxious patients, 29 agoraphobics, 10 post-traumatic stress disorder patients, 20 schizophrenics, and 20 patients with multiple personality disorder. The patients had been diagnosed by authorized clinics, hospitals, or research groups.

Scores on the 28 questions are averaged for a final score out of 100. Normal adults and alcoholics scored about 4, phobics about 6, college students about 14, and schizophrenics about 20. Post-traumatic stress disorder patients scored 31.35 and multiples scored 57.06. Thus the test seems to sort diagnosed multiples from diagnosed schizophrenics, although, as we shall see in chapter 9, the borders between schizophrenia and multiplicity are contested.
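The arithmetic of the scale can be made explicit. The sketch below, with wholly invented responses (the profile is hypothetical, not data from the 1986 study), shows how 28 item percentages are averaged into a single score out of 100:

```python
# Illustrative DES-style scoring. Each of the 28 items is answered with
# a percentage from 0 to 100; the overall score is simply the mean of
# the item responses.

def des_score(responses):
    """Average 28 item percentages into a single score out of 100."""
    if len(responses) != 28:
        raise ValueError("the scale has 28 items")
    if any(r < 0 or r > 100 for r in responses):
        raise ValueError("each response is a percentage between 0 and 100")
    return sum(responses) / len(responses)

# A hypothetical respondent who circles 0 on most items and 10 or 20 on
# a few absorption-type items -- roughly the profile of the normal
# adults in the 1986 study, who averaged about 4.
hypothetical = [0] * 20 + [10] * 6 + [20] * 2
print(round(des_score(hypothetical), 1))  # -> 3.6
```

Note how forgiving the averaging is: a respondent can reach a low overall score in many different ways, which already hints at the factor-analytic troubles discussed later in this chapter.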

It was no miracle that diagnosed multiples scored so highly. Numerous questions on the test correspond to the 1980s prototype for multiplicity. Moreover, these questions specifically draw attention to aspects of multiplicity that are emphasized in clinical treatment for the illness, so that the diagnosed patients know when to score themselves highly. The authors themselves note such learning effects.18 But there is nothing illicit about choosing such questions. The point of the test design is to include questions on which multiples score highly.

Some of the results may nevertheless have nothing to do with dissociation. Thus the college students score far more highly than normal adults, and are not so far short of schizophrenics. A number of other studies find a high degree of dissociation among college students. Does this show that students are abnormally dissociative? Or does it show that young people, especially those pursuing university education, daydream, are imaginative, can become absorbed in what they are doing? I dread the thought of teaching a class with an average score on the DES of less than 15.19

Bernstein and Putnam obtained fascinating data. Karl Popper taught that there is a difference between mere data collection and the testing of hypotheses. He counted only hypothesis testing as scientific. Bernstein and Putnam would seem to have honored his precept, for they “sought to test two general hypotheses.” The first is the hypothesis “that the number and frequency of experiences attributable to dissociation lie along a continuum.” The idea is easy to understand: almost everyone dissociates from time to time, some people dissociate quite a lot, and multiples dissociate more than anyone else. It is not so easy to turn that into a testable hypothesis.

What would be a precise version of the continuum hypothesis? One is that dissociative tendencies are what logicians call well-ordered. That is, we can say of any two people that they are equally dissociative, or that one is more dissociative than the other. Anyone who completes all 28 questions gets a score between 0 and 100. The scores of different people automatically “lie along a continuum.” That is a result of the test design. Hence the well-ordering version of the continuum hypothesis was not tested.

Under the assumption—by no means a negligible one—that dissociation is well-ordered, we can frame a second continuum hypothesis. There are no holes in the test results. That is, for any degree of dissociation, some people are dissociated to that degree. This no-gap hypothesis can be stated precisely.20 It is part of what Bernstein and Putnam had in mind. It is a very weak hypothesis, tested by noting whether, for each segment between the lowest and highest score observed in a given population, at least one person has a score in that segment. Bernstein and Putnam did not bother to test the no-gap hypothesis, probably because it is so uninteresting.
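The no-gap check is so mechanical that it can be written out in a few lines. This sketch, on invented scores and with an arbitrary segment width of ten points, simply asks whether every segment between the lowest and highest observed score is occupied:

```python
def has_no_gaps(scores, width=10):
    """True if every width-point segment between the lowest and highest
    observed score contains at least one score."""
    lo, hi = min(scores), max(scores)
    seg = lo
    while seg < hi:
        if not any(seg <= s < seg + width for s in scores):
            return False
        seg += width
    return True

# Invented scores: a continuous-looking run passes; a run with a hole
# between the teens and the fifties fails.
print(has_no_gaps([2, 7, 14, 23, 31, 38, 45]))  # True
print(has_no_gaps([2, 7, 14, 51, 58]))          # False
```

The choice of segment width is itself a free parameter, which is one more reason the hypothesis is so weak: make the segments wide enough and almost any data set passes.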

They were preoccupied by other questions. They noted that many authorities on dissociation assume that virtually everyone is a little bit dissociative. Under the assumption that dissociation is well-ordered, we can frame a no-threshold hypothesis. Groups of what psychiatrists classify as normal people have on average a nonzero dissociative score.21 Bernstein and Putnam’s normal adults did score positively, averaging about 4. This was not, however, a test of the no-threshold hypothesis, because the result depended so heavily on the choice of questions. If they had used a suitable subset of the 28 questions, virtually all people called normal would score zero. How often do you look in the mirror and not recognize yourself? How often do you fail to recognize close family and friends, whom you’ve seen recently, and whom you meet again in ordinary circumstances? If the test had used only questions like that, there would have been a sharp threshold, with the normals on one side and some very disturbed people on the other. Instead the authors included questions bearing on absentmindedness, daydreaming, self-absorption, and fantasy. As Frankel noted, almost two-thirds of the items on the questionnaire “can be readily explained by the manner in which subjects recall memories, apply or redistribute attention, use their imagination, or direct or monitor control.”22 The no-threshold hypothesis was not tested because questions were included that would preclude a break between those who score zero and those who score positively.23

A fourth interpretation of the continuum hypothesis is that not only are there no gaps in degrees of dissociative experience, but there is also a smooth flow of dissociative experiences from those of normal people to those of multiples. Call this the smooth hypothesis. There are many ways to be smooth. Suppose we drew a bar graph of discriminable scores or groups of scores. Then the most natural way to understand the vague word “smooth” would be that the graph looks like a slope, up or down, or like a hill, or like a valley.24 That gives four possible hypotheses; many people would expect a hill. The hill hypothesis for a chosen population is that a bar graph of dissociative scores forms a hill. Such hypotheses are tested on a random sample from the population. Bernstein and Putnam did not randomly sample any whole population but instead took volunteers from a number of specific populations, such as college students or phobics. Hence they did not test the hill hypothesis.
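One crude way of making “hill-shaped” precise is to ask whether binned scores rise to a single highest bar and then fall. The sketch below, on invented bar heights, implements that one reading; it is only one of many possible explications of the vague word “smooth”:

```python
def looks_like_a_hill(counts):
    """True if bar heights rise (weakly) to a single highest bar and
    then fall (weakly) -- one crude reading of 'hill-shaped'."""
    peak = counts.index(max(counts))
    rising = all(counts[i] <= counts[i + 1] for i in range(peak))
    falling = all(counts[i] >= counts[i + 1]
                  for i in range(peak, len(counts) - 1))
    return rising and falling

# Invented bar heights for binned dissociation scores.
print(looks_like_a_hill([1, 4, 9, 12, 7, 3, 1]))  # True: a hill
print(looks_like_a_hill([9, 3, 1, 1, 3, 9]))      # False: a valley
```

Under this reading a pure upward or downward slope also counts as a degenerate hill, which shows how much interpretive work the choice of definition does before any data are consulted.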

I have now distinguished four versions of the continuum hypothesis. Bernstein and Putnam did not test the well-ordering hypothesis because they designed a test that gave well-ordered results. They did not test the intrinsically uninteresting no-gap hypothesis—they could have done so, but they did not mention it. They did not test the no-threshold hypothesis because they included questions that prejudge the issue. They did not test the smooth hypothesis because they did not randomly sample any whole population. They said that they “sought to test” the continuum hypothesis, but they did not do so.

“Testing hypotheses” is one of the activities commonly supposed to make work count as scientific. Bernstein and Putnam head one section of their paper with the title “Hypotheses to be Tested,” yet in this paper the authors did not report any tests of their hypotheses. An anthropologist observing psychological testing practices might go so far as to suggest that it is part of the way in which such papers are assessed and used that no one raises questions such as, did you test the hypothesis you said you tested? Once you have said you are testing a hypothesis, it is as if you have done it. The peer referees and the journal editor do not look to see if you have tested the hypothesis. They look to see if you have used various prescribed statistical procedures. No one asks about the meaning of those procedures.

This is even more apparent when we turn to the second of the two hypotheses that the authors “sought to test.” “The second hypothesis is that the distribution of dissociative experiences in the population would not follow a Normal probability (Gaussian) curve but would exhibit a skewed distribution similar to that observed for the ‘trait’ of hypnotic susceptibility.”25 Normal distributions are the most commonly used probability distributions; they are often described as “bell shaped,” and they are always symmetrical about their mean, whereas a skewed distribution has a longer tail on one side. Evidently Bernstein and Putnam expected the distribution of experiences to look like a hill, but not to be Gaussian. Their hypothesis is about a population. They do not say which one, but it might be the population of the United States, or the population of patients admitted to psychiatric care in Washington, D.C. Such hypotheses can only be tested on a random sample of the population. Bernstein and Putnam, who did not randomly sample any population, did not test this hypothesis.
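Skewness, at least, has a standard numerical measure. The sketch below computes the usual moment-based coefficient on invented scores bunched near zero with a long right tail, the qualitative shape the authors expected; zero indicates symmetry, a positive value a long right tail:

```python
from statistics import mean, stdev

def sample_skewness(xs):
    """Crude moment-based skewness: about 0 for symmetric data,
    positive when the right-hand tail is long."""
    m, s = mean(xs), stdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

# Invented scores bunched near zero with one long right tail.
scores = [2, 3, 4, 4, 5, 6, 8, 12, 20, 45]
print(sample_skewness(scores) > 0)                    # True: right-skewed
print(abs(sample_skewness([1, 2, 3, 4, 5])) < 1e-9)   # True: symmetric
```

A formal test of the Gaussian hypothesis would require more than this (a goodness-of-fit test on a random sample), but even this simple coefficient is more than the 1986 paper reported.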

Yet they say something very curious in this connection. They present a graph of scores for all subjects. It peaks at about 10 percent. The authors write, “Clearly this distribution is not normal.”26 By “this distribution” they mean the distribution of scores from their population of 34 normal nonstudent adults, 31 college students, 20 schizophrenics, 20 severe multiples, 14 alcoholics, 53 phobics, 20 multiples, and 10 people diagnosed with post-traumatic stress disorder. It makes no sense to talk about the probability distribution or sampling distribution of a population constituted in these proportions.27

Bernstein and Putnam’s second hypothesis is testable, and so is the “hill” version of the continuum hypothesis. The first random sample of a population tested by the DES consisted of 1,055 citizens of Winnipeg, Manitoba, and did apparently result in a smooth hill-shaped curve.28 It has not been determined whether the hill is Gaussian, although the authors do say that the curve qualitatively resembles curves for susceptibility to hypnotism, which are said not to be Gaussian. No one bothered to look into these matters, because the hypothesis of a continuum of dissociative experiences had already become a fact.

The DES inspired a welter of new instruments. There are several new self-report scales, and there are interview-type questionnaires. Thus Ross and his coworkers developed a Dissociative Disorders Interview Schedule tied to DSM-III diagnostic criteria.29 They have asserted that this interview schedule is more reliable at detecting multiple personality disorder than are other questionnaire tests for other disorders.30 Marlene Steinberg designed a schedule keyed to DSM-III-R, and then one for DSM-IV.31 The most extended set of mutual calibrations has been conducted in the Netherlands.32

One standard statistical procedure is factor analysis. It is a technique to assess the extent to which the variability of a trait in a population can be attributed to a number of distinct causes. The factors are ranked according to their impact in producing variability. Not only has the DES been made the object of factor analysis, but different self-report scales have been studied to see how they elicit different factors. Carlson et al. identified three factors in a population of clinical and nonclinical subjects. “The first factor was thought to reflect amnesic dissociation,” the second, “absorption and imaginative involvement,” the third, “depersonalization and derealization.”33 With nonclinical subjects the chief factor identified is called “an absorption and changeability factor.”

Ross’s group found that dissociation scores in Winnipeg were produced by three factors that they called “absorptive-imaginative involvement,” “activities of dissociated states,” and “depersonalization-derealization.”34 Ray and colleagues found that scores on the DES could be attributed to seven factors ordered as follows: “(1) Fantasy/Absorption; (2) Segment Amnesia, (3) Depersonalization, (4) In situ Amnesia, (5) Different Selves, (6) Denial and (7) Critical Events.” But scores on another self-report scale for dissociation could be attributed to six factors that they called “(1) Depersonalization, (2) Process Amnesia, (3) Fantasy/Daydream, (4) Dissociated Body Behaviors, (5) Trance and (6) Imaginary Companions.”35

It is familiar to statisticians that factor analysis is a remarkably useful tool when in safe hands, but that its use also demands a considerable amount of good sense.36 This is a miscellaneous stew of “factors”—after duplication is eliminated, there appear to be at least eleven of them. If they mean anything at all, they seem to suggest that the original continuum hypothesis is false. This is because low scores on the DES may be attributable to factors quite distinct from the factors that account for high scores. Before these studies were published, Frankel wrote that a “distinct qualitative difference between subjects with high and low scores has not been ruled out.”37 Has that qualitative difference now been confirmed by factor analysis? No, because one doubts that these analyses, taken together, confirm anything.

Questionnaires about dissociation should help to answer a different kind of question. How common is pathological dissociation? A number of authors have suggested that a score above 30 is a sign of pathology or, more specifically, of multiple personality. Ross conjectured that the incidence of multiple personality in North America may be as high as 2 percent. He proposed that the incidence among college students may be as high as 5 percent; subsequently he and his colleagues have suggested that the rate may be even higher.38 In a letter published in a British journal, Ross, writing from Canada, implied that 5 percent of “all individuals admitted to an acute care adult psychiatric in-patient unit in Britain or South Africa … [would] meet DSM-III-R criteria for MPD.” A second Canadian doctor, Lal Fernando, replied in exasperation, “Considering the fact that the majority of psychiatrists on both sides of the Atlantic have never seen or diagnosed a case of MPD, I find these figures and predictions incredible.”39 This is a stark statement of the problem of calibration that I mentioned earlier. Fernando need not disagree with Ross’s statistical analysis. He is questioning the calibration itself.

We can well imagine that if clinicians trained by Ross were to take over a South African hospital, they would find that 5 percent of patients admitted were multiples. The problem for Fernando and many other doctors is that the DES is not calibrated against judgments made by a consensus in the psychiatric community, but against the judgments of psychiatrists who are advocates of multiple personality. The nearest we get to an outside opinion is the seven-center study mentioned above. The authors, as I noted, include six leading multiple personality researchers. What about the seventh center, McLean Hospital in Belmont, Massachusetts? It has a dissociative disorders unit directed by James Chu. Chu has published favorably on the diagnosis of multiple personality and has written about how difficult it is for some patients to face up to their multiplicity.40 Thus he is not a skeptic, but he has warned against overdiagnosis. In the clinical approach to dissociation he recommends treating other disorders first and minimizing the expression of dissociative symptoms.41 He insists strongly on patient responsibility. The coauthor for the seven-center study from McLean Hospital was a colleague of Chu’s who had supervised the testing.42

The six centers other than McLean provided the study with 953 patients, 227 of whom were diagnosed with multiple personality disorder. McLean provided 98 patients, only one of whom was diagnosed with multiple personality, and that patient was excluded from the results. Patients with diagnoses of illnesses that are not widely regarded as “dissociative” consistently had higher DES scores at McLean than similarly diagnosed patients at the six other centers. On the other hand, patients at McLean with what are often urged to be dissociation-prone disorders—post-traumatic stress disorder, eating disorders—had lower DES scores than their counterparts at the other centers. Qualitatively speaking, these results from McLean are the opposite of those from the other six centers. But that hospital is not an environment hostile to multiple personality or dissociation. As soon as we edge even a very short distance away from absolute commitment to multiple personality, the scores and their relations to diagnoses begin to change radically.

Thus the very study intended to clinch the “validity of the dissociative experiences scale in screening for multiple personality disorder” reveals that there is a serious problem about calibration. A logic textbook has described one type of fallacy as the fallacy of the self-sealing argument. This is an argument whose only confirmation is provided by itself.43 The “construct validity” of multiple personality is daringly close to being self-sealing. When the seal is torn only a little, to admit the patients from McLean Hospital, the problem is plain for all to see.

I shall conclude with one other aspect of measurement. The DES is proposed as a screening instrument, comparable to routine screening for an infectious disease using a blood sample. Suppose we are told that an instrument is right 99 percent of the time—in the following sense. Ninety-nine percent of diseased people who are tested show up as diseased; 99 percent of well people who are tested show up as well. On hearing that the screen picks me as diseased, I am mortally afraid. But if the disease is very rare in the whole population of which I am a member, and I am not a member of a more vulnerable subpopulation, my fear may be unjustified. For suppose only one person in 100,000 has the disease. Then after the instrument has surveyed a million people, it will have picked 99 percent of the sick people as sick (that is, 10 people) and 1 percent of the remaining 999,990 will also be picked as diseased. This means that in total about 9,999 healthy people are found to be ill. Hence in this extreme case, the screen picks about 10,009 as diseased, but only 10 of them actually are. Nearly all the picks are false positives. Exactly this argument was used against universal indiscriminate AIDS screening.44
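The arithmetic can be checked directly. A short sketch in Python, using exactly the numbers in the text (99 percent accuracy both ways, one diseased person in 100,000, a million people screened):

```python
population = 1_000_000
prevalence = 1 / 100_000   # one person in 100,000 has the disease
sensitivity = 0.99         # P(test says diseased | really diseased)
specificity = 0.99         # P(test says well | really well)

diseased = population * prevalence                 # 10 people
true_positives = diseased * sensitivity            # 9.9, "about 10"
false_positives = (population - diseased) * (1 - specificity)  # 9,999.9

picked = true_positives + false_positives
print(f"picked as diseased: {picked:,.0f}")
print(f"of whom actually diseased: {true_positives:.1f}")
print(f"false positive share: {false_positives / picked:.1%}")
```

More than 99.9 percent of those the screen picks out are false positives, exactly as the text argues: with a rare enough disease, even a highly accurate screen mostly flags healthy people.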

When we want to understand a test result, the bottom line is not “the probability that the test picks a person as ill, when they are ill.” Instead we want to know the probability that a person is ill, given that the test says she is ill. In symbols the bottom line is not

(1) Probability (test says person is ill / person is really ill)

but

(2) Probability (person is really ill / test says person is ill).

To calculate (2) we need to know the “base rate” of the disease in a chosen population, that is, the rate at which the illness actually occurs in that population. In a famous series of papers Amos Tversky and Daniel Kahneman showed that one of the most common fallacies in thinking about probabilities is the failure to take the base rate into account.45

When the DES is used as a screening instrument, a high enough score is taken to indicate multiple personality. Carlson et al. urge a cutoff point of 30: score over 30, and the DES says you are a multiple. How good a screen is this? We can work out (2) using an elementary rule of probability. It requires three items: (a) the population being screened, (b) the base rate of multiple personality in that population, and (c) the ability of the screen to pick a multiple as a multiple, and the ability to pick a nonmultiple as a nonmultiple—in effect (1) above.

Carlson and her colleagues present such a calculation. They do not actually state (a), the population for which they are doing the calculation, but since their study is about psychiatric patients, it must be the population at present in psychiatric treatment (say, in the United States). Their data tell them (c), because they applied the DES to patients who were independently diagnosed. They found that 80 percent of diagnosed multiples score over 30, and 80 percent of nonmultiples score below 30. So now we have (a) and (c), and lack only (b), the base rate in the population of psychiatric patients.

Carlson et al. use a base rate of 5 percent, which means that one in twenty psychiatric patients is a multiple. They do not state where this figure comes from. This is the figure Ross expected and Fernando found preposterous. On the basis of this figure the probability of a psychiatric patient’s being a multiple, given a score above 30 on the DES, is 17 percent. The remaining 83 percent of patients picked as multiples are not multiples. This may not be troubling, since many of these false positives may have other dissociative problems such as post-traumatic stress disorder.
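The calculation itself is a one-line application of Bayes’ rule and is easy to reproduce. A sketch in Python, using the figures reported in the text (80 percent both ways, a 5 percent base rate); the function name is mine:

```python
def prob_multiple_given_high_score(base_rate, sensitivity=0.8, specificity=0.8):
    """Bayes' rule: P(really a multiple | DES score over 30)."""
    true_positives = base_rate * sensitivity            # multiples who score over 30
    false_positives = (1 - base_rate) * (1 - specificity)  # nonmultiples who do too
    return true_positives / (true_positives + false_positives)

ppv = prob_multiple_given_high_score(0.05)  # Carlson et al.'s 5 percent base rate
print(f"{ppv:.0%}")                         # prints 17%
```

With a base rate of 1 percent the same formula yields a probability of about 4 percent, which is why everything in the dispute between Ross and Fernando turns on where the base rate comes from.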

But where did this 5 percent figure come from?46 The majority of psychiatrists would very much doubt that 5 percent of psychiatric patients have multiple personalities. At McLean one patient of the selected 98 had multiple personality, but many psychiatrists would doubt that even a rate of 1 in 98, or 1 percent, is typical. With a base rate of 1 percent, it would follow that about 96 percent of psychiatric patients screened as multiples are “false positives.” If we thought that the base rate for multiple personality were a good deal less than what is found at a hospital near Boston with a dissociative disorders unit, then we would expect almost all the people picked by the DES as multiples to be false positives.

My purpose has been only to show how the measurement of multiple personality legitimates multiple personality and turns it into an object of knowledge. It happens to have been easier than might be expected because of the way that statistics are so often used in psychology. We have long had a multitude of highly sophisticated statistical procedures. We now have many statistical software packages. Their power is incredible, but the pioneers of statistical inference would have mixed feelings, for they always insisted that people think before using a routine. In the old days routines took endless hours to apply, so one had to spend a lot of time thinking in order to justify using a routine. Now one enters data and presses a button. One result is that people seem to be cowed into not asking silly questions, such as: What hypothesis are you testing? What distribution is it that you say is not normal? What population are you talking about? Where did this base rate come from? Most important of all: Whose judgments do you use to calibrate scores on your questionnaires? Are those judgments generally agreed to by the qualified experts in the entire community?

The increasingly massive array of “instruments” for assessing multiple personality has a primary function that is seldom acknowledged. They make the field of multiple personality look like the rest of empirical psychology, and thereby turn the study of the disorder into an objective science. Many sociologists of science, and a few philosophers, have recently welcomed the idea that scientific knowledge is a social construction. They contend that science does not discover facts but constructs them. I am not arguing such a case in the present chapter. More traditional students of scientific method, variously called logical empiricists or scientific realists, hold that scientists aim at discovering facts, at finding out the truth. It is precisely these traditional thinkers who would be thunderstruck at the practices I have just described.

I have focused on the continuum hypothesis about dissociation because, as Putnam saw from the start, it is absolutely fundamental. Multiple personality may be an important object for psychiatric study almost no matter how rare it is. Even if the incidence rate among psychiatric inpatients were not 5 percent but .05 percent, it is still a striking phenomenon. The present theory invokes a cause, child abuse, and invokes a continuum of dissociative experiences. “Dissociation” is a technical word, put to use in psychology by Pierre Janet, and almost immediately dropped by him. It caught on. But there is not one definite thing that the word “dissociation” was invented to name. It is not as if Janet designated something, leaving us with the task of finding out what it is. On the contrary, we can use the word “dissociation” in any way that is useful. But a problem arises when it seems to many observers that “dissociative experiences” is used to refer to a great many experiences that have singularly little in common with each other. The whole machinery of the DES has been constructed—quite literally constructed—in order to make it appear to be an objective fact that there is a continuum of one and the same kind of experience, the dissociative experience. Once one dismantles that construction, it is not so clear that there is one kind of experience there to study. Until 1994 there was an International Society for the Study of Multiple Personality and Dissociation. It had something to study, namely, multiple personality. But now we have the International Society for the Study of Dissociation. It is less than clear that there is a distinct object, named “dissociation,” there to be studied.