Healing practices have been endemic to every human civilization since earliest times (A. K. Shapiro & Shapiro, 1997; Wampold & Imel, 2015). Whether these practices have truly healed is a matter of some interest, and some have claimed that many healing practices actually have been harmful. Acupuncture in ancient China may have killed many due to homologous serum jaundice, likely introduced through unsterilized needles, and George Washington may have died as a result of dehydrating treatments (e.g., bloodletting) administered for a respiratory condition (A. K. Shapiro & Shapiro, 1997; Wampold, 2001a). The development of the randomized control group in the early 20th century allowed for rigorous tests of the efficacy of various practices in agriculture, education, medicine, and psychology (Danziger, 1990; Gehan & Lemak, 1994; A. K. Shapiro & Shapiro, 1997). Among healing practices, only modern medicine and psychotherapy have been subjected to systematic controlled research on the effectiveness of their interventions (Wampold, 2007), and both have been shown to be effective, distinguishing them as the only healing practices scientifically shown to work. This chapter reviews the literature showing that psychotherapy is indeed effective. The chapter then addresses the more difficult question of whether some approaches to psychotherapy are more effective than others. It ends with a discussion of research on psychotherapy in the real world.
In the first decades of psychotherapy, its benefits were established primarily through the presentation of cases treated successfully with a particular treatment or technique. Psychoanalysis was supported by several perspicuous cases treated by Sigmund Freud, including “Anna O.,” “Dora,” “Frau N.,” “Little Hans,” “Ratman,” and “Wolfman.” During the first half of the 20th century, research designs were developed to test the efficacy of various interventions and, more importantly, were applied in the area of medicine. The randomized double-blind, placebo-controlled group design became the gold standard for testing the efficacy of medicines (Gehan & Lemak, 1994; A. K. Shapiro & Shapiro, 1997); indeed, the design has been required by the Food and Drug Administration for the approval of drugs for three decades. Not long after the development of this design, it was recommended that it be used to investigate the effects of psychotherapy (Rosenthal & Frank, 1956), and the use of randomized designs in psychotherapy research has increased ever since.
At the advent of the randomized design, our understanding of the effects of psychotherapy was, to say the least, ambiguous. On the one hand, Hans Eysenck, in a series of books and articles, claimed that psychotherapy was not beneficial and likely was harmful (Eysenck, 1952, 1961, 1966; Wampold, 2001b). His claim was based on an examination of the rate of spontaneous remission of untreated patients, derived from two samples: (a) “severe neurotics” receiving institutional custodial care and (b) disability claims of “psychoneurotics.” On comparing rates of recovery in psychotherapy studies with rates of spontaneous remission, he found that recovery rates in psychotherapy groups were smaller than the rates of spontaneous remission. Indeed, he claimed, “There . . . appears to be an inverse correlation between recovery and psychotherapy; the more psychotherapy, the smaller the recovery rate” (Eysenck, 1952, p. 322). On the other hand, psychotherapy researchers who reviewed the same literature reached exactly the opposite conclusion: Psychotherapy was indeed effective (Bergin, 1971; Luborsky, Singer, & Luborsky, 1975; Meltzoff & Kornreich, 1970). Needless to say, the controversy was not particularly beneficial to a field trying to establish its legitimacy as a healing practice.
Every controversy has its subtext. At the time of this controversy, the predominant approach to psychotherapy was either psychoanalytic or eclectic. Behavioral approaches to psychotherapy were emerging and struggling to be accepted as legitimate. Eysenck and others (e.g., Rachman, 1971) were making the claim that behavior therapy (as opposed to psychotherapy) was scientific and, consequently, was superior to other approaches. In his later analyses, Eysenck (1961, 1966) made the claim that whereas psychotherapy (i.e., psychoanalysis and eclectic therapy) was ineffective or harmful, behavior therapy was remarkably effective. The historical context of Eysenck’s claims has been discussed in some detail (M. L. Smith, Glass, & Miller, 1980; Wampold & Imel, 2015).
A major event in the debate about the effects of psychotherapy was the development of meta-analysis as a means to objectively synthesize the results of many studies. In 1977, M. L. Smith and Glass (M. L. Smith & Glass, 1977; see also M. L. Smith, Glass, & Miller, 1980) collected all controlled psychotherapy outcome research, calculated an effect size for each treatment, and averaged the effect sizes to estimate the degree to which the outcomes of clients receiving psychotherapy exceeded the outcomes of clients who did not receive psychotherapy. The intricacies of meta-analysis need not concern us here, but it is important to note that one of the lasting contributions of these earliest years of meta-analysis was Glass’s (1976) development of a standardized index of effect size; that is, an index that could be calculated regardless of the outcome measures used in the study. The effect size indexed the degree to which the treatment group exceeded the control group in standard deviation units. In their comprehensive meta-analysis, M. L. Smith and Glass determined that the outcomes of clients receiving psychotherapy were superior to the outcomes of those not receiving any treatment by .8 standard deviation units. As the discussion that follows makes clear, an effect size of .8 is remarkably large!
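The logic of Glass’s standardized index can be sketched in a few lines; the numbers below are hypothetical, chosen only to illustrate the arithmetic, and the function name is ours, not Glass’s.

```python
def glass_delta(mean_treatment, mean_control, sd_control):
    """Glass's standardized effect size: the difference between group means
    expressed in control-group standard deviation units."""
    return (mean_treatment - mean_control) / sd_control

# Hypothetical outcome scores: treated clients average 58, untreated clients
# average 50, with a control-group standard deviation of 10.
effect_size = glass_delta(58.0, 50.0, 10.0)  # .8 standard deviation units
```

Because the index is in standard deviation units, effects from studies using entirely different outcome measures can be averaged on a common scale.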
To the behaviorists, the M. L. Smith and Glass (1977) findings were disturbing because they showed not only that psychotherapy was remarkably effective but also that behavior therapy was not substantially superior to other psychotherapies (a question considered in the next section). Consequently, the debate about the validity of the M. L. Smith and Glass meta-analysis became heated, with criticisms both of meta-analysis as a method and of M. L. Smith and Glass’s application of it. One cause of the differences between the pre–meta-analytic reviews and the M. L. Smith and Glass meta-analysis concerned the decisions about which studies to include or exclude. All of the earlier reviews attempted to exclude poorly designed studies, but unsurprisingly, there were distinct differences of opinion about the matter of quality. It is interesting to observe that the studies that supported the opposition in these debates often were excluded because of their (purportedly) poor quality (M. L. Smith et al., 1980). M. L. Smith and Glass attempted to address that issue by including all studies, regardless of quality, and then determining whether the quality of the study moderated the effect size. In that way, the question of whether better designed studies favored a particular treatment (e.g., behavioral) could be answered (as it turned out, better designed studies did not produce greater effects for behavioral treatments). Nevertheless, M. L. Smith and Glass were criticized for omitting several important behavioral studies and for including studies that were poorly designed (Andrews & Harvey, 1981; Eysenck, 1978, 1984; Landman & Dawes, 1982). Moreover, critics claimed that many of the participants in the studies meta-analyzed were only mildly distressed and were not seeking treatment (e.g., psychology undergraduates). However, when those issues were subsequently addressed by critics of M. L. Smith and Glass in meta-analyses of good-quality studies that treated clients with significant psychological problems who were seeking treatment, the results were extraordinarily consistent with the M. L. Smith and Glass result of an effect size in the neighborhood of .8 (e.g., Andrews & Harvey, 1981; Dawes, 1994). Additional meta-analyses of psychotherapy outcomes have produced further evidence that psychotherapy vis-à-vis no treatment yields effects in the neighborhood of .80 (Wampold, 2001b; Wampold & Imel, 2015).
We now turn to the question of whether an effect for psychotherapy of .80 is compelling. Is psychotherapy marginally, moderately, or extraordinarily effective? Of course, such evaluations are subjective, but recasting the effect size in various ways and comparing it with effects produced in other contexts provides a good sense of its magnitude. An effect of .80 is equivalent to saying that 13% of the variability in outcomes is determined by whether or not one receives psychotherapy (Wampold & Imel, 2015); this is not particularly comforting on the face of it, because it means that 87% of the variability in mental health outcomes is not associated with whether a treatment is received! But conclusions should not be reached too quickly. An effect size of .80 also means that the average client receiving psychotherapy will be better off than 79% of those who do not receive treatment (M. L. Smith et al., 1980; Wampold & Imel, 2015); that commonsense interpretation is particularly impressive. Most people in distress would be willing to receive a treatment with such odds.
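Both recastings of the .80 effect follow from standard effect-size conversions; the sketch below simply reproduces that arithmetic (it illustrates the conversions under normal-distribution assumptions, it is not a computation from the original data).

```python
from math import erf, sqrt

d = 0.80  # effect of psychotherapy vs. no treatment, in SD units

# Proportion of outcome variability associated with treatment: convert d to a
# correlation via the standard formula r = d / sqrt(d^2 + 4), then square it.
r = d / sqrt(d**2 + 4)
variance_explained = r**2  # roughly 13%-14%

# Percentile standing of the average treated client within the untreated
# distribution: the standard normal cumulative distribution evaluated at d.
percentile = 0.5 * (1 + erf(d / sqrt(2)))  # roughly .79
```

The two figures cited in the text (13% of variability; better off than 79% of untreated clients) are thus two views of the same underlying effect.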
Not only is psychotherapy effective, but it also appears that for most disorders, psychotherapy is as effective as pharmacological treatments (Barlow, Gorman, Shear, & Woods, 2000; Cuijpers et al., 2013; de Maat et al., 2008; Hollon, Stewart, & Strunk, 2006; Huhn et al., 2014; Imel, Malterer, McKay, & Wampold, 2008; Mitte, 2005; Mitte, Noack, Steil, & Hautzinger, 2005; Robinson, Berman, & Neimeyer, 1990; Spielmans, Berman, & Usitalo, 2011). Moreover, it appears that when psychotherapy and medications are withdrawn (i.e., the psychotherapy is terminated or the course of medication is finished), the effects of psychotherapy are longer lasting (Hollon et al., 2006) in that, at various times following the end of treatment, a greater number of clients who have been on medication relapse. It appears that psychotherapy provides clients with skills for coping with the world and with their disorder. Moreover, clients who have received previous courses of medication become resistant to additional courses of medication, whereas they do not become resistant to additional courses of cognitive therapy (Leykin et al., 2007).
It is a safe conclusion that, as a general class of healing practices, psychotherapy is remarkably effective. In clinical trials, psychotherapy results in benefits that far exceed those experienced by clients who do not receive psychotherapy. Indeed, psychotherapy is more effective than many commonly used evidence-based medical practices, some of which have onerous side effects and are quite expensive (Wampold, 2007; Wampold & Imel, 2015). In addition, psychotherapy is as effective as medications for prevalent mental disorders, its effects are longer lasting, and clients do not become resistant to additional courses of it. This leads us to the question of whether some types of psychotherapy are more effective than others.
The brief history of psychotherapy presented earlier showed that (a) many psychotherapies have been developed over the years, (b) advocates of these psychotherapies often make claims of superiority, (c) various schemes have been used to differentiate treatments based on the extent of the benefits experienced by clients, and (d) the debates about these issues have been contentious. Indeed, as discussed previously, claims of superiority of a particular psychotherapy have characterized the field from its origins, with Freudians arguing with other Freudians, behaviorists criticizing psychoanalysis, and so on. Indeed, Eysenck’s (1952, 1961, 1966) claims of the ineffectiveness of psychotherapy were an attempt to show the superiority of behavioral therapy, based on the scientific principles of learning theory, over psychotherapy, based on mentalistic and unscientific principles. It is unsurprising that advocates of a treatment are convinced that their preferred treatment is as effective as or more effective than other treatments; people advocating any claim generally are convinced of its worth. And, indeed, it is a good thing that those who develop a psychological treatment and those who practice it are enthusiastic supporters of the treatment, as discussed in the previous chapter.
Despite the flaws in many aspects of Eysenck’s (1952, 1961, 1966) claims, his reviews were the first to use evidence from measures of outcomes of psychotherapy to address the question of the relative efficacy of various forms of psychotherapy (Wampold, 2013). This section reviews the evidence, beginning with the meta-analyses of M. L. Smith and Glass (1977; M. L. Smith et al., 1980) to the present, to show a result that may be surprising to some: When treatments that are intended to be therapeutic are compared, few differences among treatments are evident. First, results for psychotherapy are presented generally, followed by results for specific disorders.
M. L. Smith and Glass’s (1977) original meta-analysis addressed the question of which type of psychotherapy was most effective. Given that their meta-analysis was the most rigorous and comprehensive review of psychotherapy outcomes up to that time, their evidence provided the most scientifically valid answer to the question. The strategy used in this meta-analysis was to classify each of the nearly 800 effects obtained from psychotherapy outcome research into one of 10 classes of therapy. They found that about 10% of the variability in effects was due to the type of therapy, which on its face suggested that some types are more effective than others. Adlerian therapy, rational emotive therapy, systematic desensitization, and behavior modification were the most effective, having effect sizes in excess of .70. However, that analysis suffered from many problems, some of which M. L. Smith and Glass recognized and corrected.
A primary issue with M. L. Smith and Glass’s (1977) attempt to compare classes of treatments was that the effects in each class were derived primarily from comparisons of a treatment with a no-treatment control. Thus, the effects in each class (e.g., systematic desensitization and Adlerian psychotherapy) were derived from different studies, and those studies involved different disorders or problems, different dependent measures, different types of clients, different degrees of study quality, and so forth. M. L. Smith et al. (1980) sought to statistically control for those differences by coding characteristics of the studies. It turned out that controlling for the reactivity of the outcome measures, which they defined as those measures that “reveal or closely parallel the obvious goals or valued outcomes of the therapist or experimenter” (M. L. Smith et al., 1980, p. 66), eliminated the differences among classes (i.e., studies with behavioral treatments used more reactive measures and had larger effects). That is, the advantage for some classes of treatment was associated with studies that used more reactive measures. M. L. Smith and Glass’s conclusion, after statistically controlling for reactivity of the measures and other variables, was that classes generally were equivalent. With regard to behavioral and dynamic therapies, they noted the following:
In the original uncorrected data, the behavioral therapies did enjoy an advantage in the magnitude of effect because of more highly reactive measures. Once this advantage was corrected, reliable differences between the two classes [i.e., behavioral and dynamic] disappeared. (p. 105)
M. L. Smith and Glass (1977) concluded, “Despite volumes devoted to the theoretical differences among different schools of psychotherapy, the results of research demonstrate negligible differences in the effects produced by different therapy types” (p. 760).
It was this finding—that behavioral treatments were clearly not more effective than other treatments—rather than the finding that psychotherapy was effective that instigated many of the criticisms of the M. L. Smith and Glass meta-analyses (Eysenck, 1978; Rachman & Wilson, 1980; Wampold, 2013; G. T. Wilson, 1982; G. T. Wilson & Rachman, 1983).
As recognized by M. L. Smith, Glass, and Miller (1980), the best way to control for most confounding variables was to aggregate only those studies that directly compared two treatments, because the measures used, the quality of the design, the types of clients, the disorder treated, and so forth would be equivalent for each comparison. M. L. Smith and Glass (1977) attempted to do that. Although the results were consistent with their general conclusions, there were many problems with their analysis, not the least of which was that they had few studies in their database that compared treatments fairly (Wampold & Imel, 2015).
D. A. Shapiro and Shapiro (1982a, 1982b) sought to address the confound problem by examining studies that directly compared two treatments, including the behavioral studies that M. L. Smith and Glass (1977) were criticized for omitting (Rachman & Wilson, 1980). All of the studies in the D. A. Shapiro and Shapiro meta-analysis also contained a no-treatment control, and they found an overall effect size consistent with the .80 found by M. L. Smith and Glass (Wampold, 2001b; Wampold & Imel, 2015). The results of that meta-analysis were complex because the direct comparisons between classes of treatments were few for some classes (e.g., no studies compared dynamic therapy with systematic desensitization, whereas 24 studies compared systematic desensitization with relaxation). Nevertheless, among classes of treatments, excluding treatments that had minimal aspects of a “real” psychotherapy, only 2 of 14 comparisons were significant (i.e., cognitive therapy was superior to systematic desensitization, and mixed therapies were superior to systematic desensitization). The superiority of cognitive therapy to systematic desensitization appears to have been an anomaly, because other meta-analyses found no differences between these two classes but did find a large effect of researcher allegiance (J. S. Berman, Miller, & Massman, 1985).
One of the problems with the D. A. Shapiro and Shapiro (1982a, 1982b) meta-analysis was the classification of treatments into categories. The first issue was that it often is difficult to classify treatments into categories, and the reliability of that process was suspect (Baardseth et al., 2013; Wampold, Flückiger, et al., 2017; Wampold et al., 2010). The second issue was that the classification strategy prevented tests of the treatments within classes, even though such treatments could be quite different (e.g., Freudian psychoanalysis and short-term focused dynamic therapy would most likely both be classified as “dynamic therapies”). The third issue was that pairwise comparisons among classes of therapies yielded many statistical tests (if there are k classes of treatments, there are k(k − 1)/2 comparisons, which is a large number for even a moderate number of classes; e.g., six classes of treatments yield 15 statistical comparisons).
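The combinatorial point is easy to verify; a trivial sketch (the function name is ours, used only for illustration):

```python
def n_pairwise_comparisons(k):
    """Number of distinct pairwise comparisons among k classes of treatments:
    k choose 2, i.e., k * (k - 1) / 2."""
    return k * (k - 1) // 2

comparisons = n_pairwise_comparisons(6)  # six classes yield 15 comparisons
```

The count grows quadratically with the number of classes, which is why conducting a separate statistical test for every pair quickly inflates the number of tests performed.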
Wampold et al. (1997) devised a meta-analytic procedure that avoided the problems of classifying treatments, as well as other problems. They collected all comparisons of treatments that were intended to be therapeutic. This inclusion criterion was invoked to eliminate control group treatments that were designed to rule out particular common factors, such as a relationship with an empathic healer. Such control treatments, often called psychological placebos, alternative treatments, common factor controls, or supportive counseling, have no reasonable rationale that can be communicated to clients, proscribe therapists from discussing certain topics, and contain no ingredients based on psychological principles (see Wampold et al., 1997; Yulish et al., 2017). Including only treatments intended to be therapeutic eliminated a problem that arose previously, when several classes of psychotherapies may have contained treatments that clearly were not “real” psychotherapies, complicating interpretation of D. A. Shapiro and Shapiro’s (1982a, 1982b) “mixed” and “minimal” classifications. It does not make much sense to make claims about the efficacy of psychotherapy when treatments are included that would not qualify as psychotherapy, given the usual definition, as presented in Chapter 1.
Following the lead of D. A. Shapiro and Shapiro (1982b) and the advice of Shadish and Sweeney (1991), Wampold et al. (1997) analyzed only studies that directly compared two or more psychotherapies intended to be therapeutic. They collected all such comparative studies published from 1970 to 1995 in the six premier journals on psychotherapy outcome research; that collection yielded 277 direct comparisons. Analyzing those direct comparisons raised its own issues: Wampold et al.’s (1997) primary solution was to examine the distribution of the effects rather than their mean. If the dodo bird conjecture is true (see Chapter 2, this volume) and all psychotherapies intended to be therapeutic are equally effective, then occasionally, due to sampling error, a study will appear to demonstrate a relatively large difference between two treatments; but, on the whole, most studies will reveal differences close to zero (i.e., some large effects will be present simply due to chance).
Wampold et al. (1997) found that, when modeled in this way, most comparisons between treatments yielded effects close to zero, and the few larger effects were not unexpected given the role of random sampling. That is to say, the obtained distribution of effects from those comparisons provided no evidence that some treatments were more effective than others (in more technical language, the effects were homogeneously distributed around zero). Marcus, O’Connell, Norris, and Sawaqdeh (2014) replicated and extended that meta-analysis; they found small differences among treatments, with cognitive behavior therapy (CBT) slightly superior to other treatments (but see Wampold, Flückiger, et al., 2017). However, the superiority of CBT was due to four studies that compared a treatment focused on the symptoms of a disorder with an unfocused treatment. It is becoming clear that treatments focused on the client’s problems produce more symptom reduction than do unfocused treatments (Yulish et al., 2017).
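The distributional logic can be illustrated with a small simulation; the arm size and random seed below are arbitrary choices of ours, not values from the meta-analysis. If all treatments are truly equally effective, observed between-treatment effects scatter around zero with a spread set by sampling error, and a few apparently large effects still arise purely by chance.

```python
import math
import random

random.seed(0)  # arbitrary seed, only for reproducibility of the illustration

# Under the dodo bird conjecture, the true between-treatment effect is zero,
# so each observed effect is pure sampling error. For a standardized mean
# difference with n clients per arm, the standard error is roughly sqrt(2/n).
n_per_arm = 30                    # hypothetical number of clients per arm
se = math.sqrt(2 / n_per_arm)     # about .26

# Simulate 277 direct comparisons, matching the count in Wampold et al. (1997).
effects = [random.gauss(0.0, se) for _ in range(277)]

mean_effect = sum(effects) / len(effects)  # close to zero
largest = max(abs(e) for e in effects)     # a few sizeable effects, by chance
```

The point of examining the whole distribution, rather than any single comparison, is that isolated "large" differences are exactly what sampling error predicts even when no true differences exist.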
For some, the various meta-analyses of different types of psychotherapy provided evidence that the dodo was correct—“Everybody has won, and all must have prizes”—as Rosenzweig (1936, p. 412) suggested. Of course, to those who staunchly believed that some treatments were more effective than others (e.g., advocates of empirically supported treatments [ESTs]), this conclusion has been uncomfortable. In response to the question about whether ESTs are more effective than non-ESTs, Ollendick and King (2006) stated, “At some level, the answer to this question is patently obvious and resoundingly in the affirmative” (p. 308). The divergence of the conclusions from the evidence on the relative efficacy of different psychotherapies is well illustrated by a chapter that presented the pro and con arguments for the superiority of ESTs (Wampold, Ollendick, & King, 2006). Because this issue is far from settled, for those beginning to learn various approaches to psychotherapy, it seems unwarranted at this point in time to limit this endeavor to the ESTs.
There is an important and valid criticism of the meta-analyses of M. L. Smith and Glass (M. L. Smith & Glass, 1977; M. L. Smith et al., 1980), D. A. Shapiro and Shapiro (1982b), and Wampold et al. (1997). In each of these meta-analyses, studies were aggregated without regard to the disorder being treated, which
is akin to asking whether insulin or an antibiotic is better, without knowing the condition for which these treatments are to be given. . . . Alternatively, researchers should begin with a problem and ask how treatments compare in their effectiveness for that problem. (DeRubeis, Brotman, & Gibbons, 2005, p. 175)
This was a sentiment echoed by others (e.g., Crits-Christoph, 1997) and is not a criticism easily dismissed. The next section addresses this criticism.
Due to space constraints, the literature for all diagnoses cannot be examined. Therefore, the review here is limited to the most prevalent mental disorders—depression and anxiety disorders—and to posttraumatic stress disorder (PTSD), substance use disorders, personality disorders, and childhood disorders.
By 1998, many treatments were designated as ESTs for depression, including behavior therapy, cognitive therapy, interpersonal therapy, brief dynamic therapy, reminiscence therapy (for geriatric populations), self-control therapy, and social problem-solving therapy (Chambless et al., 1998). Presently, the Society of Clinical Psychology (Division 12 of the American Psychological Association) lists on its website 13 psychological treatments with strong or modest research support (https://www.div12.org/psychological-treatments/disorders/depression) that span the array of theories, including cognitive, third-wave CBT, dynamic, humanistic, behavioral, and interpersonal therapy. Thus, it appears that a variety of treatments, based on a variety of theories, have been shown to be effective for the treatment of depression.
Meta-analyses of clinical trials of depression have consistently verified, with some qualifications, that all treatments of depression are equally effective. An early meta-analysis (Robinson et al., 1990) classified treatments into four categories: (a) cognitive; (b) behavioral; (c) cognitive–behavioral; and (d) verbal, which contained dynamic, humanistic, and experiential treatments. Generally, they found that behavioral, cognitive–behavioral, and cognitive treatments were superior to the general verbal therapies and that cognitive–behavioral treatment was superior to behavioral treatment. Two critical issues—ones that were discussed in the previous section—complicated the interpretation of the results. First, many of the verbal therapies in those comparisons likely were not treatments that really were intended to be therapeutic; that is, some of the verbal therapies were, in actuality, control treatments meant to control for common factors, such as meeting with an empathic therapist. As has been discussed, this type of treatment typically does not have a cogent rationale and offers few actions that practicing psychotherapists would consider therapeutic. A later meta-analysis found that cognitive therapy for depression was superior to “other” therapies, which were noncognitive, nonbehavioral treatments (Gloaguen, Cottraux, Cucherat, & Blackburn, 1998). However, many of the “other” treatments were not intended to be therapeutic; when those treatments were omitted, cognitive therapy was not superior to “other” treatments intended to be therapeutic (Wampold, Minami, Baskin, & Callen Tierney, 2002).
The second, and not unrelated, issue was researcher allegiance. It is well documented that the allegiance of the researcher exerts quite large and robust effects on the results of studies; that is, studies conducted by an advocate of a particular treatment consistently find effects favoring that treatment (J. S. Berman et al., 1985; Luborsky et al., 1999; Munder, Brütsch, Leonhart, Gerger, & Barth, 2013; Munder, Flückiger, Gerger, Wampold, & Barth, 2012; Munder, Gerger, Trelle, & Barth, 2011). The explanation for allegiance effects is somewhat ambiguous, leaving the following question unanswered: How does the allegiance of the researcher translate into larger effects for the favored treatment? There are several possibilities, including translation of researcher allegiance into therapist allegiance (e.g., the therapists in the study know which is the preferred treatment, as would be the case when the researcher trains and supervises the therapists), a study design that favors one treatment (e.g., the preferred treatment has a greater dose of therapy), or a poorly constructed comparison treatment (e.g., the therapists in the comparison are proscribed from commonly used therapeutic actions; Munder, Brütsch, et al., 2013; Munder, Flückiger, et al., 2012; Munder, Gerger, et al., 2011). The poorly constructed comparison treatment explanation results in the inclusion of treatments not intended to be therapeutic, which often populate classes of treatments labeled as “verbal therapies” or “other therapies.” When Robinson et al. (1990) took into account the allegiance of the researcher, none of the differences among the various categories was significantly different from zero. That is to say, allegiance accounted for all of the differences among the classes of treatments of depression.
Subsequent meta-analyses of treatments of depression have confirmed that, in general, there are no differences among treatments for depression (Barth et al., 2013; Cuijpers, Driessen, et al., 2012; Cuijpers, van Straten, Andersson, & van Oppen, 2008; Driessen et al., 2017; Wampold, Minami, et al., 2002). However, some studies have shown the superiority of two treatments (i.e., interpersonal therapy and behavioral activation) to CBT for severe depression (Dimidjian et al., 2006; Elkin et al., 1995), although the sizes of the effects were not large.
Many have argued that it is unsurprising that treatments for depression generally are equivalent because acute depression is responsive to interventions. On the other hand, anxiety disorders often are used to make claims about the superiority of some treatments over others.
The progression of meta-analyses seen for depression, in which problems in earlier analyses were addressed by later ones, has not generally occurred for anxiety disorders. However, there is sufficient evidence to draw some tentative conclusions. As this section shows, there is insufficient evidence to conclude that any particular treatment for any anxiety disorder is clearly superior to any other treatment, although some treatments have sufficient evidence to conclude that they are effective.
In 2001, Wampold (2001b) reviewed all of the meta-analyses for the treatment of anxiety disorders and found little evidence for the superiority of any treatment. However, for most anxiety disorders, there was an insufficient number of direct comparisons among treatments intended to be therapeutic to be conclusive. Here, the research since that time is reviewed briefly.
Tolin (2010) conducted a meta-analysis of CBT versus other psychotherapies by examining studies in which two or more bona fide treatments were directly compared, a commendable feature, as discussed earlier. Chief among the findings was that CBT was superior to other therapies for anxiety and depression. The findings for depression were surprising, given that they contradicted the evidence reviewed earlier. The results for anxiety were based on only four dated studies (published in 1967, 1972, 1994, and 2001). The paucity of direct comparisons of CBT and other treatments for anxiety examined by Tolin (2010) was due to a liberal definition of what constituted CBT (e.g., eye movement desensitization and reprocessing [EMDR] was classified as CBT). Baardseth et al. (2013) reviewed all direct comparisons of treatments for anxiety and classified a treatment as CBT or not based on a survey of CBT experts; they found that CBT was not superior to other treatments. Tolin (2014) reanalyzed the anxiety trials and again claimed the superiority of CBT, but when an error was corrected, the evidence for the superiority of CBT disappeared (Tolin, 2015).
A persistent claim is that CBT is superior to other treatments for anxiety disorders. The basis for such claims rests only on disorder-specific symptoms measured at termination for those who completed treatment, and when effects were detected, they were small; moreover, there were methodological problems with the claims of CBT superiority (Wampold, Flückiger, et al., 2017). However, for many anxiety disorders, including generalized anxiety disorder, obsessive-compulsive disorder (note: no longer an anxiety disorder in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders; American Psychiatric Association, 2013), and panic disorder, various behavioral and cognitive–behavioral treatments are the most widely developed and tested treatments. That is to say, the CBT treatments have not been shown to be superior to other treatments intended to be therapeutic, but they have been shown, in general, to be superior to no-treatment controls or to other types of controls, such as psychological placebo controls. There does seem to be a reliable finding that CBT is superior to relaxation for some anxiety disorders (Montero-Marin, Garcia-Campayo, López-Montoyo, Zabaleta-del-Olmo, & Cuijpers, 2017), although the effects are small and must be considered in light of the fact that relaxation was not intended to be therapeutic for some disorders and typically was not focused on reducing particular anxiety symptoms (see Yulish et al., 2017).
PTSD, which no longer is classified as an anxiety disorder, is unusual in that it is the only mental disorder that requires an event to have taken place, in this case, a traumatic event (i.e., death, serious injury, or sexual violence). Perhaps because of its prevalence and importance for military personnel, veterans, and victims of interpersonal and sexual violence, many clinical trials have involved an array of treatments for PTSD, including CBT (Foa, Hembree, et al., 2005; Foa, Rothbaum, Riggs, & Murdock, 1991), EMDR (Rothbaum, Astin, & Marsteller, 2005; F. Shapiro, 1989), cognitive therapy without exposure (Tarrier et al., 1999), hypnotherapy (Brom, Kleber, & Defares, 1989), psychodynamic therapy (Brom et al., 1989), interpersonal therapy (Markowitz et al., 2015), and present-centered therapy (McDonagh et al., 2005). Clearly, the theoretical rationales of these treatments have varied widely, including treatments based on conditioning, cognitive restructuring, psychodynamic, and neuropsychological paradigms. Some of the treatments were intentionally designed to exclude exposure and/or cognitive restructuring (e.g., McDonagh et al., 2005; Tarrier et al., 1999), some are what several researchers have characterized as scientifically unjustified (e.g., EMDR; Herbert et al., 2000), and some are based on “old-fashioned” mentalistic constructs (e.g., hypnotherapy and dynamic therapies). Because PTSD is a disorder that is attributable to a discrete event or series of events, it would seem that treatments based on a scientific psychological explanation would be developed and that these treatments would be more effective than other treatments. Nevertheless, Benish, Imel, and Wampold (2008) used the meta-analytic methods of Wampold et al. (1997) to aggregate effects from all studies that directly compared two or more treatments intended to be therapeutic for PTSD.
Their findings, both for PTSD symptom measures and for all measures, were that there was little evidence that treatment differences existed and, if they did, the effects were small. Powers, Halpern, Ferenschak, Gillihan, and Foa (2010) meta-analytically found that prolonged exposure, the most thoroughly tested CBT treatment for PTSD, was not superior to other treatments intended to be therapeutic. It appears that the current evidence for the treatment of PTSD does not support the conclusion that any one particular treatment is superior to any other. Given the variety of treatments studied, it could be concluded that all treatments for PTSD are equally effective, provided they have a cogent theoretical rationale, are delivered by competent therapists who believe the treatment will be effective, and are delivered to clients who seek out treatment.
Treatments for substance use disorders span a wide range, including CBT, motivational interviewing, 12-step programs, and social skills training. Well-designed trials have found few differences among various treatments (Project Match Research Group, 1997). Again, the most recent meta-analysis that examined direct comparisons of treatments for alcohol use disorders found no differences among this wide array of treatments on either alcohol use measures or abstinence measures (Imel, Wampold, Miller, & Fleming, 2008; W. R. Miller, 2016), a result not inconsistent with previous meta-analyses in this area.
The most widely known systematic treatment for a personality disorder, developed by Linehan (1993), is dialectical behavior therapy (DBT) for borderline personality disorder. DBT, a third-wave behavioral treatment, contains cognitive-behavioral components and mindfulness techniques to regulate emotion, provide client support, and increase therapists' self-efficacy and skills for treating clients with borderline personality features. DBT has been shown to be effective relative to various control conditions, including treatment-as-usual (TAU) by expert community therapists (Linehan et al., 2006). However, psychodynamic theorists have developed several treatments for personality disorders (Clarkin, Levy, Lenzenweger, & Kernberg, 2007). In a comparison of DBT with transference-focused psychotherapy, a psychodynamic therapy, it was found that both of these structured treatments were effective across multiple outcome domains; however, transference-focused psychotherapy was superior to DBT in terms of impulsivity, anger, irritability, and aggressiveness (Clarkin et al., 2007). Although there are relatively few studies of treatment for personality disorders, particularly compared with mood disorders, Leichsenring and Leibing (2003) conducted a meta-analysis of psychodynamic treatments and CBT for personality disorders and found that the effects for psychodynamic treatments generally were as large as or larger than the effects for CBT. However, many of the studies in this meta-analysis were not well controlled, and few studies directly compared CBT with psychodynamic therapies. Budge et al. (2013), using the meta-analytic methods of Wampold et al. (1997), did find that some treatments for personality disorders were more effective than others, although the results were driven by two studies in which one type of dynamic therapy was superior to another type.
It does not appear that any one particular treatment for personality disorders is more effective than another, although it does appear that psychodynamic treatments hold their own and may even be superior to CBT.
A number of meta-analyses have been conducted on psychotherapies for children. It is difficult to compare the effects obtained for treatment of children with effects for adults because the manner in which the effect sizes are calculated differs, partly due to study design and to advancements in statistical methods over the years. Nevertheless, it appears that psychotherapy is effective with children, although perhaps not as effective as with adults (Weisz, McCarty, & Valeri, 2006; Weisz, Weiss, Han, Granger, & Morton, 1995). Some have debated whether cognitive and behavioral treatments are more effective than other treatments for children. In 1995, Weisz et al. conducted a meta-analysis and concluded that behavioral treatments were superior to nonbehavioral treatments for children. It was claimed that this superiority may be artifactual, but on examination of one such artifact, the quality of the studies, Weiss and Weisz (1995) found that it did not threaten the superiority of behavioral treatments. A later meta-analysis found that CBT was not superior to noncognitive treatments for children with depression (Weisz, McCarty, et al., 2006). The lack of direct comparisons in these meta-analyses poses a threat to the validity of the conclusions, however—a problem that some recent meta-analyses have addressed. Spielmans, Pasek, and McFall (2007) aggregated the effects for direct comparisons of CBT and other treatments for children with depression and anxiety. They found that CBT was superior to treatments not intended to be therapeutic but was not superior to other treatments when those treatments were intended to be therapeutic for the disorder. S. D. Miller, Wampold, and Varhely (2008) analyzed all studies between 1980 and 2005 that directly compared two treatments intended to be therapeutic for children with depression, anxiety, conduct disorder, and attention-deficit/hyperactivity disorder.
In contrast to Wampold et al. (1997), they found that the effects were not homogeneously distributed about zero, indicating that perhaps some treatments were more effective than others. However, the differences among treatments were explained completely by researcher allegiance: Studies produced effects in favor of a treatment when the researcher had an allegiance to that treatment. The effect for differences detected by S. D. Miller, Duncan, and Hubble (2007) was, at most, very small and similar to that found for adults. As is the case for adults, it does not appear that any one approach to treating children is more effective than others.
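The test of whether comparative effects are "homogeneously distributed about zero" can be illustrated with a standard fixed-effect homogeneity (Q) statistic. The sketch below is not the authors' code, and the effect sizes and variances are hypothetical; it simply shows the computation such meta-analyses rely on.

```python
# Illustrative sketch (hypothetical data): Cochran's Q test of whether a set
# of comparative effect sizes is homogeneous about their weighted mean, as in
# meta-analyses of direct treatment comparisons.

def q_statistic(effects, variances):
    """Return the inverse-variance weighted mean effect and Cochran's Q."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    q = sum(w * (d - mean) ** 2 for w, d in zip(weights, effects))
    return mean, q

# Hypothetical standardized mean differences (d) from k = 5 direct comparisons
effects = [0.10, -0.05, 0.20, 0.00, -0.10]
variances = [0.04, 0.05, 0.04, 0.06, 0.05]

mean_d, q = q_statistic(effects, variances)
# Q is compared with the chi-square critical value on k - 1 = 4 df
# (9.49 at alpha = .05); Q below that value is consistent with homogeneity.
print(round(mean_d, 3), round(q, 3))  # → 0.042 1.31
```

Here Q falls well below the critical value, the pattern consistent with the "dodo bird" finding; a Q exceeding it, as in S. D. Miller et al. (2008), would suggest real differences (or, as they showed, an artifact such as allegiance).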
It appears that, in general, and for specific disorders, no treatment has been consistently shown to be superior to any other treatment intended to be therapeutic. There are limitations to this conclusion, however. Some demonstrations, albeit weak, have shown that some treatments are superior. For example, in a meta-analysis of five studies, Siev and Chambless (2007) found that CBT was superior to relaxation for panic disorder with agoraphobia on some outcome domains. CBT for severe depression seems to be less effective than some alternatives (i.e., behavioral activation and interpersonal therapy). Psychodynamic treatment seems to be more effective than a CBT variant for borderline personality disorder, at least in one trial, and for personality disorders in general, in a meta-analysis. However, when contrasted with the many studies and many meta-analyses that have shown no differences among treatments, it is difficult to make much of the small differences among treatments that have been found.
Most of the evidence about psychotherapy reviewed in this chapter has been derived from clinical trials. Because of the nature of the delivery of mental health services in naturalistic settings, including the confidentiality of the service and the relative autonomy afforded professionals, little was known for a long time about services provided in practice settings. This is beginning to change as more systems of care assess outcomes and as various national surveys of the treatment of mental disorders have been conducted. The major findings of this services research are summarized briefly here.
The most important question is whether psychotherapy delivered in naturalistic settings is effective. Although reviews of controlled research have indicated that psychotherapy is remarkably effective, there is no guarantee that it works as well when delivered in real-world practice settings. Three strategies address this issue. The first is to assess the degree to which treatments delivered in controlled research resemble that which would be delivered in practice; this is sometimes referred to as clinical representativeness. In general, it has been found that the degree to which a treatment is representative of clinic settings is unrelated to the effect size produced; that is, those treatments that resemble treatments in practice are no less effective than those delivered in highly controlled laboratory settings (Shadish, Matt, et al., 1997; Shadish, Navarro, Matt, & Phillips, 2000).
The second strategy is to implement a treatment in a field setting and compare outcomes with TAU (sometimes called usual care or standard care). Typically, the treatment imported to the field setting is an EST or an evidence-based treatment (EBT) developed and tested in more controlled conditions. The hypothesis of TAU studies is that the quality of service would be improved by transporting ESTs to field settings (Minami & Wampold, 2008). For a number of disorders, for both children and adults, studies have found that ESTs or EBTs delivered in field settings produce better outcomes than TAUs (Addis et al., 2004; Budge et al., 2013; Linehan et al., 2006; Weisz, Jensen-Doss, & Hawley, 2006). However, there are a number of caveats here. Often, the therapists in the EST or EBT condition receive extra supervision and training, often provided by the developer of the EST or EBT. In other cases, the TAU is not a psychotherapy (e.g., it is a support group) and involves much less contact with the client. Recent reviews have established that TAU provides less service or no service at all, that the therapists in the TAU condition have less training and supervision, and that TAU is inferior in several other ways (Budge et al., 2013; Spielmans, Gatlin, & McFall, 2010; Wampold et al., 2011). Given the structural inferiority of TAU conditions, it is difficult to draw definitive conclusions about the superiority of ESTs or EBTs; however, from these reviews, it appears that any superiority is small and may be due to these structural aspects of the studies.
The third strategy to assess outcomes in clinical settings is known as benchmarking. The idea of benchmarking is to calculate the effect of psychotherapy in clinical trials (e.g., the effect from pre- to posttherapy) and then calculate the same effect in a naturalistic setting. In an early benchmarking study, Weersing and Weisz (2002) calculated the benchmark via meta-analytic methods and found that TAU for 67 children produced effects that were closer to the effects of no-treatment controls than to those of treatments in clinical trials. Unfortunately, benchmarking requires large samples to provide reliable estimates, and the methods for comparing effects are complex.
Minami and colleagues developed benchmarking strategies (Minami, Serlin, Wampold, Kircher, & Brown, 2008) and meta-analytically created benchmarks for the treatment of depression in adults (Minami, Wampold, Serlin, Kircher, & Brown, 2007). Using data from a managed care environment in which outcomes were assessed, based on several thousand clients who had a diagnosis of depression, the results obtained in practice met or exceeded the benchmarks established in clinical trials (Minami, Wampold, et al., 2008).
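The core benchmarking computation is a standardized pre- to posttreatment change score from a naturalistic sample, compared against a benchmark aggregated from clinical trials. The sketch below uses entirely hypothetical scores and a hypothetical benchmark value, not the published Minami et al. figures; it only illustrates the logic of the comparison.

```python
# Illustrative sketch (hypothetical data, not the published benchmarks):
# a pre-post effect size from a naturalistic sample compared with a
# clinical-trial benchmark.

from statistics import mean, stdev

def prepost_effect_size(pre_scores, post_scores):
    """Standardized pre-post change: (M_pre - M_post) / SD_pre,
    so that symptom reduction yields a positive effect."""
    return (mean(pre_scores) - mean(post_scores)) / stdev(pre_scores)

# Hypothetical depression-scale scores at intake and at termination
pre = [28, 31, 25, 30, 27, 33, 29, 26]
post = [25, 29, 21, 27, 24, 30, 26, 23]

d_practice = prepost_effect_size(pre, post)
trial_benchmark = 1.0  # hypothetical benchmark aggregated from clinical trials

# An effect at or above the benchmark suggests that practice outcomes
# meet those obtained in controlled research.
print(d_practice >= trial_benchmark)  # → True
```

In actual benchmarking, the naturalistic effect is tested against the benchmark with a noninferiority-style criterion on large samples, which is what makes the method statistically demanding.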
It appears that treatment in practice is effective, even if not quite as effective as treatments delivered in clinical trials. There is some evidence that TAU might be inferior to ESTs or EBTs transported to practice settings, but the differences are small, and the extra training, supervision, and increased intensity of treatment render the comparisons difficult to interpret. However, the studies showing that ESTs or EBTs are superior to some TAUs raise the possibility that some treatments in practice are more effective than others: Is the dodo bird conclusion applicable to field settings? Again, evidence from field settings is difficult to obtain, but efforts have been made to address this question for psychotherapy provided in the context of primary care in the National Health Service (NHS) in the United Kingdom, where outcomes are assessed routinely. Providers have indicated whether they deliver CBT, person-centered therapy, or psychodynamic therapy. In two studies with large samples, the outcomes produced by providers generally were equivalent across the three treatments (Stiles, Barkham, Mellor-Clark, & Connell, 2008; Stiles, Barkham, Twigg, Mellor-Clark, & Cooper, 2006). In another large sample from the NHS, Pybis, Saxon, Hill, and Barkham (2017) found that CBT and generic counseling were equivalent, although the effects were achieved with fewer sessions for the generic counseling. These findings suggest that the general equivalence of treatments found in clinical trials also holds in practice.
In many quarters, there is a sense that the dose of therapy needs to be limited because clients will tend to use more psychotherapy than is necessary to treat the disorder. However, the evidence has not supported this view. In the aforementioned benchmarking study (Minami, Wampold, et al., 2008), the clients, whose service was not limited, used, on average, nine sessions to meet the depression benchmarks, whereas the average number of sessions in the clinical trials used to create the benchmarks was 16 (Minami, Wampold, et al., 2008). That is to say, not only is psychotherapy delivered in practice effective, but it also is efficient! Stiles, Barkham, Connell, and Mellor-Clark (2008) found that clients and therapists appropriately adjusted the length of therapy to fit clients’ needs, and when clients made sufficient improvement, they terminated therapy (see also Baldwin, Berkeljon, Atkins, Olsen, & Nielsen, 2009; Owen et al., 2015).
Although, historically, there have been debates about the effects of psychotherapy, the research clearly demonstrates that psychotherapy is a remarkably effective treatment—more effective than many medical practices and as effective as medications for most mental disorders. Nevertheless, it appears, with some possible exceptions, that treatments intended to be therapeutic are generally equally effective, both across disorders and for specific disorders. In general, no one theory produces outcomes superior to any other. It also appears that psychotherapy delivered in practice settings is effective—as effective or nearly as effective as psychotherapy provided in clinical trials. Although it is now well established that psychotherapy is remarkably effective in clinical trials and in practice, it is critical to know how psychotherapy works and how services can be improved—the topic of the next chapter.