17
Which appeals to you more: research on programs with obvious practical applications to human problems or research undertaken to better understand life’s many mysteries with little or no practical utility? If helping people is the main reason you are motivated to learn research methods, this chapter should be of special interest.
In the early 1970s, a program for curbing juvenile delinquency was initiated in a prison in Rahway, New Jersey, that came to be known as Scared Straight.1 It received much publicity and inspired similar programs in many other states (Miller & Hoelter, 1979) and even other countries (Homant & Osowski, 1982, 55; Lewis, 1983, 210). Scared Straight targets first-time nonviolent offenders in their early to mid-teens (Finckenauer, 1982). The rationale behind the program is that many delinquents are oblivious to the severe legal consequences awaiting persistent criminal conduct, and that these consequences can be made clearer if young offenders visit an adult prison and receive a down-to-earth talk from inmates about what life is like behind bars.
In the typical Scared Straight program, fifteen to twenty young delinquents board a bus and are driven to a maximum security prison. For an hour or so in the morning, they get a fairly standard tour of the prison. After lunch, the youths are taken to a room where they are confronted by four or five shouting, cursing prison inmates chosen for their no-nonsense attitudes and their skill in describing the brutal realities of prison life (Lundman, 1984, 136; Waters & Wilson, 1979). The couple of hours with the inmates typically ends with stern warnings that the youngsters themselves can expect to be behind bars in a few more years unless they change their ways. It is common for the youngsters, clowning and jovial in the morning, to be sobbing before the session with the inmates is over.
Does the Scared Straight program work? An American television documentary hailed it as ‘‘90 percent successful,’’ and showed that most of the participating youths believed that the experience made a big difference in their lives (Finckenauer, 1982, 211; Rice, 1980; Waters & Wilson, 1979). This type of anecdotal evidence is known as testimonial evidence (i.e., evidence based on impressionistic statements made by people closely associated with a program or treatment strategy).
Although certainly worth noting, especially in the initial stages of a program’s evaluation, testimonial evidence can be unreliable and sometimes very misleading (Dewsbury, 1984, 184). Why? Not only are there no control subjects in testimonial evidence, but the participants in treatment programs often give favorable opinions even when the programs are shown in scientific terms to be ineffective (Peele, 1983, 41). Furthermore, for programs designed to reform delinquents, enthusiasm about whatever treatment is being administered is one of the criteria used by corrections officials to decide who is ready for release. Sensing that this is part of the game that must be played, participants in Scared Straight could be misleading officials and even themselves about the program’s effectiveness.
Before describing findings from the research on Scared Straight, stop for a moment and think about how to objectively answer the following question: Does Scared Straight work? First, one must define ‘‘work’’ in a way that can be measured objectively. In this regard, most people agree that the main goal of correctional treatment is to reduce subsequent arrests following prison release (Petersilia & Turner, 1991). Therefore, rearrest rates could be used to operationalize the dependent variable.
The next major step might be to find a sizable number of delinquents and randomly assign half of them to a Scared Straight experience and the other half to some sort of control condition (e.g., take them fishing for a day). If random assignment is not possible, one might use a quasi-experimental design involving two carefully matched groups of delinquents.
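To make the logic of such a design concrete, the following is a minimal Python sketch using entirely hypothetical numbers. The group sizes, the simulated rearrest probability, and all variable names are illustrative assumptions rather than data from any actual Scared Straight study; the sketch simply shows how randomly assigned treatment and control groups might be compared on rearrest rates with a simple two-proportion z-test.

```python
# A minimal sketch (hypothetical data) of the design described above:
# randomly assign delinquents to a Scared Straight condition or a control
# condition, then compare twelve-month rearrest rates between the groups.
import random
from math import sqrt, erf

random.seed(42)

youths = list(range(200))                          # 200 hypothetical delinquents
random.shuffle(youths)
treatment, control = youths[:100], youths[100:]    # random assignment

# Hypothetical follow-up outcomes: True = rearrested within 12 months.
rearrest = {i: random.random() < 0.35 for i in youths}

def rearrest_rate(group):
    return sum(rearrest[i] for i in group) / len(group)

p1, p2 = rearrest_rate(treatment), rearrest_rate(control)
n1, n2 = len(treatment), len(control)

# Two-proportion z-test for the difference in rearrest rates.
p_pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-tailed normal approx.

print(f"Scared Straight rearrest rate: {p1:.2f}")
print(f"Control rearrest rate:         {p2:.2f}")
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

The same comparison could, in principle, be made between carefully matched rather than randomly assigned groups, although, as discussed in chapter 13, causal conclusions from such a quasi-experimental design would be weaker.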
Thus far, six studies of Scared Straight have been conducted. One of these studies reported that participants had significantly lower rearrest rates than a comparison group (Langer, 1981). A second study found no difference between the exposure and comparison groups (Vreeland, 1981). A third reported significantly higher rearrest rates among the youths with the Scared Straight experience compared with those without it (Finckenauer, 1982, 134). In a fourth study, results were presented separately by sex. For females, no significant effects were detected, while males with the Scared Straight experience had significantly higher rearrest rates than those without the experience (Buckner & Chesney-Lind, 1983).
The remaining two studies of Scared Straight involved after-only experimental designs. One study made its assessment after a six-month follow-up (Yarborough, 1979), and the other after twelve months (Lewis, 1983). Both studies found no significant differences in overall recidivism rates between the experimentals and the controls. In summary, only one of the six studies of the effects of Scared Straight found the program to be effective in reducing recidivism rates. Additional research may still be worth undertaking, but at this point most scientific evidence does not support the view that Scared Straight reduces the probability of subsequent rearrest (McCord, 1992, 230). Notice that this conclusion stands in stark contrast to nearly all of the anecdotal evidence.
The scientific literature contains numerous examples of empirical research confirming anecdotal impressions, but, in the case of Scared Straight, the bulk of the scientific evidence runs contrary to the anecdotal impressions of its effectiveness in reducing future offending among participants. With this example as a backdrop, it is useful to make a distinction between two types of research.
CONCEPTUALIZING EVALUATION RESEARCH
Evaluation research is concerned with empirically documenting that programs designed to create change actually do so. In the social sciences, this means that (1) some sort of human problem is identified, (2) a remedial program is conceptualized and then implemented to deal with the problem, and (3) an assessment is made as to whether the program accomplished its goal of reducing the problem.
Nearly all evaluation research attempts to answer causal questions. As emphasized in chapter 13, the best way to answer causal questions is through the use of controlled experiments. Thus, the ‘‘gold standard’’ for conducting an evaluation of the effectiveness of a program undertaken to alleviate a human problem is controlled experimentation (Glaser, 1978, 88; Sheldon & Parke, 1975, 695). The essential features of the program become the independent variable, and the hoped-for outcome constitutes the dependent variable. If controlled experimentation is not feasible, researchers then typically fall back on some sort of quasi-experimental design.
TERMINOLOGY SURROUNDING EVALUATION RESEARCH
Fundamental to the concept of evaluation research is the distinction between basic and applied research. Basic (or pure) research is undertaken to expand scientific knowledge in some area, often with no practical goal in mind. Applied research, on the other hand, is undertaken to find solutions to an identifiable problem (Rossi & Freeman, 1989, 420; Rottenberg, 1988, 390). As a major type of applied research, evaluation research assesses how well a particular program intended to alleviate an identified problem actually accomplishes its objective. The research on the effectiveness of the Scared Straight program for reducing reoffending among delinquents is an example of evaluation research.
While the distinction between applied and basic research is worth making, there are two reasons for recognizing that the distinction can sometimes be blurred. First, some research serves both practical and pure objectives, and thus can fit into either category (Rossi & Freeman, 1989, 420). Second, much of today’s applied research is built on the foundations laid years earlier by basic research.
In program evaluation, research subjects are often referred to by other names, such as clients, patients, or program participants. This renaming reflects the helping orientation of the programs being evaluated, rather than an emphasis purely on expanding scientific understanding.
Types of Applied Research
Applied research is divided into three categories: epidemiological (or diagnostic), feasibility, and evaluation. Each is described in this section.
Epidemiological research assesses the prevalence of a problem. It typically consists of a survey based on a representative (or near-representative) sample that allows a researcher to estimate the proportion of a population that exhibits the problem. In clinical settings, the equivalent of epidemiological research is called diagnostic research.
For example, many children born to mothers who consume even moderate amounts of alcohol during pregnancy have been shown to suffer major physical and neurological damage (Abel, 1984; Fried, 1984, 91; Rosett & Weiner, 1984). As part of public health efforts to reduce alcohol drinking during pregnancy, epidemiological surveys have been conducted in several countries to estimate the extent to which pregnant women drink, and to identify women with the greatest need for intervention (Barrison et al., 1985, 17; Moss & Hensleigh, 1988; Rubin et al., 1988).
The findings from epidemiological research can be used in at least four ways. First, they can help determine the full extent to which a prevention or treatment program is needed. Second, such research can help identify subpopulations that are most in need of intervention. Third, the findings can help researchers identify causes of the problem and thereby devise the most effective remedies. And, fourth, when repeated over several time frames, epidemiological research can help determine the extent to which a remedial program was successful.
Feasibility research provides estimates of the time, effort, and expense involved in producing changes in whatever problems have been identified. Like epidemiological research, feasibility research often involves surveying a population. (Sometimes epidemiological and feasibility surveys are conducted together.) Feasibility research pieces together cost estimates of various courses of action and the benefits each is expected to produce. This research often includes recommendations regarding the facilities and personnel that will be required to combat the problem targeted for remedy. A great many feasibility studies have been conducted to examine whether public policies enacted for the betterment of society or for deterring potential offenders are cost-effective ways of preventing criminal or delinquent behavior.
Evaluation (or evaluative) research assesses the effectiveness of programs intended to alleviate social, health, or interpersonal problems. In criminal justice, evaluation research has been applied to the numerous programs set up to rehabilitate offenders as part of their punishment or as aftercare.
Criteria Used in Program Assessment
Programs subjected to evaluation may be assessed in two ways: one is impressionistic, the other objective.
Impressionistic program assessment involves asking either the program participants or the staff to gauge how well they believe the program met its intended goals. If only a small number of people provide this information, it is tantamount to anecdotal evidence, but if large numbers of systematically selected participants or staff provide the assessment, the information can constitute scientifically valid evidence of a program’s effects.
The value of impressionistic evidence is certainly enhanced by having assessments of similar programs for comparison. For example, suppose that at the end of a training program for police officers, participants complete a form in which they rate how well the program provided them with information they need to better perform their duties. If, on a 10-point scale, the average assessment the first time the program is offered is 5.5, compared with 7.8 the second time it is offered, evaluators would have reason to believe that the program had improved in accomplishing its goal. Nevertheless, such data are still impressionistic in nature.
Objective program assessment occurs when a more direct measure of program effectiveness is used. Regarding the hypothetical program for police officers, an objective assessment of the program’s effectiveness might involve administering a test of what the program was designed to teach. The test could be taken at the beginning and the end of the program to determine how much scores had actually improved.
If the program being evaluated is of a clinical nature, each patient might be interviewed to assess the severity of his or her symptoms before and after completion of the treatment regimen. One could even randomly divide clients who were eligible for treatment into two groups, and then give different types of treatment to each group. Such a set of procedures—two groups receiving different treatment and both being measured before and after treatment—would constitute the use of a classical experimental design for evaluating a clinical program.
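As a rough illustration of how such before-and-after measurements might be analyzed, here is a minimal Python sketch using hypothetical symptom-severity scores. The numbers, the group sizes, and the use of an independent-samples t statistic on change scores are all assumptions made for illustration, not a prescription for any particular clinical evaluation.

```python
# A minimal sketch (hypothetical numbers) of a classical experimental design:
# symptom severity is measured before and after treatment for randomly formed
# experimental and control groups, and mean improvement scores are compared.
from statistics import mean, stdev
from math import sqrt

# Hypothetical (before, after) symptom-severity scores; higher = more severe.
experimental = [(24, 11), (30, 14), (22, 12), (28, 15), (26, 10), (31, 16)]
control      = [(25, 20), (29, 24), (23, 19), (27, 25), (24, 18), (30, 26)]

def changes(pairs):
    return [before - after for before, after in pairs]   # improvement scores

exp_change, ctl_change = changes(experimental), changes(control)

# Independent-samples t statistic (equal-variance form) on the change scores.
n1, n2 = len(exp_change), len(ctl_change)
sp2 = ((n1 - 1) * stdev(exp_change) ** 2 +
       (n2 - 1) * stdev(ctl_change) ** 2) / (n1 + n2 - 2)
t = (mean(exp_change) - mean(ctl_change)) / sqrt(sp2 * (1 / n1 + 1 / n2))

print(f"Mean improvement, experimental group: {mean(exp_change):.1f}")
print(f"Mean improvement, control group:      {mean(ctl_change):.1f}")
print(f"t({n1 + n2 - 2}) = {t:.2f}")
```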
Process and Impact Evaluation Research
Evaluation research may be either of a process or an impact nature (Aultman-Bettridge, 1998; Card et al., 1992, 77). Process (or on-going) evaluation research is concerned with how well a program functions. As a program functions over time and information accumulates regarding its effects, those administering the program can make changes with an eye toward continual improvement.
Impact (or summative) evaluation research is applied not to continually functioning programs, but to ones that have designated end points. For example, studies of the effectiveness of Scared Straight constitute impact evaluation research, because each group of youths who visits the prison (or a control condition) can be studied independently. To assess such programs, researchers stipulate an objective for the program and then collect evidence as to whether that goal has in fact been reached.
The essential difference between process evaluation research and impact evaluation research is this: The former is on-going, and thus program assessments must be made while the program is continuing to function. Often, in process evaluation research, an assessment is made as the program functions, adjustments are made in the hope of improving the program, and then at some later time an additional round of assessments and adjustments occurs. Because process evaluations continue over time, and because there is rarely a control program for comparison, the most common design is some sort of before-after no control group design, as sketched below.
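The following minimal sketch shows how a single round of such a before-after (no control group) assessment might be analyzed. The participant ratings and the use of a paired t statistic are hypothetical choices for illustration only; an actual process evaluation might rely on quite different measures.

```python
# A minimal sketch (hypothetical ratings) of a before-after no control group
# design: the same participants rate the program before and after an
# adjustment, and a paired t statistic tests whether ratings improved.
from statistics import mean, stdev
from math import sqrt

# Hypothetical ratings of the program (0-10 scale) before and after an
# adjustment made by the program's administrators.
before = [5.0, 6.5, 4.5, 5.5, 6.0, 5.0, 4.0, 6.5]
after  = [6.5, 7.0, 6.0, 7.5, 6.5, 7.0, 5.5, 8.0]

diffs = [a - b for a, b in zip(after, before)]   # change for each participant
n = len(diffs)

t = mean(diffs) / (stdev(diffs) / sqrt(n))       # paired-samples t statistic

print(f"Mean change: {mean(diffs):.2f} points on the 10-point scale")
print(f"t({n - 1}) = {t:.2f}")
```

Note that, without a control group, such a design cannot rule out the possibility that the observed change would have occurred even without the adjustment to the program.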
Impact evaluations, on the other hand, can be carried out with a wider variety of experimental and quasi-experimental design forms. For example, a study was undertaken to determine how well a welfare program for adolescent mothers with dependent children helped the mothers’ transition to self-sufficiency (Lie & Moroney, 1992). This study used a classical experimental design in which mothers in the experimental group received intensive social work services and a group of control mothers received the typical low level of services. At a two-year follow-up, significantly more mothers in the experimental group than in the control group were self-sufficient.
HISTORY OF EVALUATION RESEARCH
With isolated exceptions, evaluation research first began to appear in the social sciences in the 1950s (Rossi & Freeman, 1989, 23). At least three unrelated events gave impetus to its emergence, each of which is described below.
The Effectiveness of Psychotherapy
One event was the publication of a review article in the 1950s by an English psychologist, Hans Eysenck (1952) (pronounced: I’ sink). A storm of controversy surrounded the article’s suggestion that psychotherapy, which at the time was the most widely used treatment for mental illness, might have few or no beneficial effects (Barlow & Hersen, 1984, 12; Phares, 1979, 457).
Eysenck reached two conclusions: First, despite the widespread use of psychotherapy throughout the first half of the twentieth century, very little was known from a scientific standpoint about its effects. Second, what could be gleaned from the available research was that psychotherapy had few beneficial effects, except possibly for persons with the least serious forms of mental disturbances. These issues, incidentally, continue to be debated in the scientific literature (Brown, 1987; Elson, 1992; Phares, 1979, 468; Sloane et al., 1976).
Eysenck’s conclusions were criticized in part because of the wide diversity of therapies that he and others subsumed under the category of psychotherapy (Kazdin, 1989). Critics also contended that the unfavorable scientific results could reflect more about the quality of the research than about the ineffectiveness of psychotherapy (reviewed by Barlow & Hersen, 1984, 21). Nevertheless, more recent reviews have also concluded that the evidence is still not strong in support of claims that psychotherapy by itself has significant beneficial effects on serious forms of mental illness (Elson, 1992; Seligman, 1995).
The Behaviorist Movement in Psychology
The second event that helped bring about an increase in evaluation research was the behaviorist movement, which first began in the field of psychology in the 1920s. Behaviorism is an approach to the study of behavior that focuses on observable and measurable aspects of behavior, rather than on the subjective and mental aspects (Cooper et al., 1987, 7; Griffin, 1985, 615; Sperry, 1987, 37).
Most research conducted by behaviorists during the 1930s and 1940s involved recording simple behavior patterns in laboratory animals, such as pressing a lever to obtain a food pellet (Skinner, 1966, 21). The emphasis placed on careful measurement of individual behavior became a hallmark of behaviorism (Barlow & Hersen, 1984, 29). Over the next couple of decades, behaviorists moved much of their research out of animal laboratories and into clinical practice, with treatments for such things as phobias and childhood behavior problems (Cooper et al., 1987, 12). Behaviorists maintained their emphasis on careful measurement, along with an insistence on documenting any claims that a particular form of treatment was effective. This type of emphasis was still fairly new to clinical practice in the social and behavioral sciences.
Governmental Efforts to Alleviate Social Problems
In the 1960s, evaluation research received a major boost in the United States when the Kennedy and Johnson administrations declared ‘‘war on poverty’’ and on various related social problems (e.g., crime, teenage pregnancy) (Coleman, 1990, 134; Rossi & Freeman, 1989, 27).
Funding for these programs was usually dispensed to governmental agencies or to nonprofit organizations in the form of grants. As part of the application process, applicants had to carefully describe their program as well as how the program’s effects would be measured. While many of the evaluations were anecdotal or impressionistic, others were based on rigorous experimental designs in which the dependent variables were carefully measured (Dixon & Wright, 1975, 58).
Most of the programs initiated in the 1960s were disappointing in terms of their impact on America’s social problems (Rossi et al., 1978; Sheldon & Parke, 1975, 694). As a result, many social scientists found themselves rethinking the theories on which these programs were based (Alexander, 1972; Etzioni, 1973, 1977).
One of the most thoroughly researched programs to have originated during the Kennedy and Johnson administrations was one known as Head Start. This program is given special attention here for two reasons: First, it has been the object of much evaluation research. Second, these evaluation efforts highlight some of the complexities surrounding evaluation research, both from a scientific standpoint and in terms of the social and political decision making that has ensued.
As originally formulated, the most specific purpose of Head Start was to prevent poor and underprivileged children from falling behind middle and upper status children in academic achievement (Zigler & Muenchow, 1992). The primary means to achieve this goal involved providing underprivileged children with the sort of cultural and educational experiences that middle and upper status children typically receive at home.
Regarding its primary objective, several studies have shown that in the first three or four years of grade school, Head Start children performed as well on standardized tests as middle and upper status children, while poor children without the Head Start experience did not. However, beyond primary school, Head Start children generally slip back in academic achievement to levels comparable to poor children without Head Start (Becker & Gersten, 1982; Bentler & Woodward, 1978; Haskins, 1989; Head Start Bureau, 1985; Holden, 1990, 1400; Kantrowitz & McCormick, 1992; Miller & Bizzell, 1984; Scheirer, 1978, 56; Seitz et al., 1981; Stephan, 1986; Westinghouse Learning Corporation, 1969). In other words, beyond the fourth or fifth grade, Head Start students do not perform significantly better academically than do students from the same neighborhoods and socioeconomic background who have no Head Start experience.
Despite this disappointing evidence, Head Start programs have remained very popular among the general public and the politicians responsible for the programs’ financing. Funding for America’s Head Start (and related) programs has increased almost every year since these programs first began in the 1960s (Leslie, 1989; Rothbart, 1975, 23).
Social scientists have been puzzled, if not dismayed, by the inconsistency between the evidence that Head Start has not accomplished its main objective and the fact that it continues to receive both popular and political support. At least three factors may explain this paradox and offer lessons to those interested in evaluation research.
First, people who are not trained in science often become confused by the bewildering array of claims and counterclaims surrounding scientific and anecdotal evidence. Especially when the evidence is counterintuitive, people often trust their intuition rather than evidence based on sound research methodology.
Second, when confronted by serious problems, people prefer to do something rather than nothing, even if the chance of success is only slight. This tendency to grasp at straws not only typifies how some people attempt to solve social problems, but also helps to account for the vast expenditures every year on quack medical cures.
Third, even if a program does not accomplish its original intended objective, there are sometimes other benefits that may justify its continuation (Bee, 1981, 285; Scheirer, 1978). For instance, Head Start provides relatively safe and well-managed day care for poor families (especially single mothers), thereby freeing them to secure employment. In addition, Head Start appears to have positive effects on the physical health of participants (Stephan, 1986). Also, studies of a program similar to Head Start (although more expensive) found that, even though achievement on standardized tests was not affected, several other behavior patterns improved. Specifically, when compared with students of similar backgrounds during adolescence, recipients of the preschool programs had significantly lower high school dropout rates, lower unemployment, fewer teenage pregnancies, and lower arrest rates (Barnett, 1992; Berrueta-Clement et al., 1984; Schweinhart et al., 1993). However, at least two partial replications failed to confirm some of the conclusions regarding these beneficial effects (Holden, 1990, 1402; Reynolds et al., 1998).
TYPES OF PROGRAMS EVALUATED
The types of programs subjected to evaluation research in the social sciences may be divided into three fairly distinct categories. These are prevention programs, treatment programs, and improvement-oriented programs. Because the research designs utilized in assessing these three types of programs are often different, each is discussed separately in the following sections.
Evaluation of Prevention Programs
Prevention programs, of course, are designed to prevent some type of problem from developing or at least from becoming worse. Most of the prevention programs studied by social scientists address social or mental health problems such as poverty, crime, drug addiction, interpersonal conflicts, and emotional difficulties.
Two types of prevention programs are generally recognized: primary and secondary. Primary prevention programs are those aimed fairly indiscriminately at some population, whereas secondary prevention programs are more narrowly focused with respect to a high-risk population. To make this distinction clearer, suppose a program planner wanted to prevent teenagers from smoking. If he or she approached this goal by targeting all teens in a high school district, the approach would be termed primary prevention. However, if the planner targeted only teenagers who were at the highest risk of becoming smokers, such as those doing poorly in school and those whose parents smoke, the program would be termed secondary prevention. As readers might suspect, there are prevention programs that do not easily fit either of these conceptual categories. Nonetheless, the distinction is still useful.
Evaluation of Primary Prevention Programs
Examples of primary prevention programs that have been subjected to scientific evaluation regarding their effectiveness are presented in table 17.1. These programs include television advertising campaigns designed to discourage youth from smoking or from using illegal drugs. Other examples come from research on programs intended to improve automotive safety. These include studies undertaken to determine the effects of raising and lowering the legal drinking age (Smith & Burvill, 1986), of varying the penalties for drunk driving (Forcier et al., 1986; Ross, 1984; Ross et al., 1982), and of mandating seat belt usage (Conybeare, 1980; Robertson, 1978).
As with all evaluation research, the ideal design for assessing the effectiveness of prevention programs is some type of controlled experiment. Nevertheless, it is often necessary to make design compromises due to ethical, financial, or time considerations. Thus, quasi-experimental designs are used fairly often in evaluating prevention programs (Campbell & Boruch, 1975; Peterson & Remington, 1989).
Evaluation of Secondary Prevention Programs
Table 17.2 presents examples of prevention programs of a secondary nature that have been evaluated using various experimental or quasi-experimental designs. Examples of secondary prevention programs include those designed to deal with various forms of delinquency and criminality.
Evaluation of Treatment Programs
Programs tailored to help people recover from an illness or overcome a problem are termed treatment programs. Such programs can be administered to single individuals or to groups of individuals. Treatment programs that are most closely related to the social sciences are ones designed to help those with mental illness; criminality; drug dependency; poverty; and interpersonal difficulties, including marital and family discord.
It goes without saying that not all programs designed to alleviate problems actually accomplish their intended goal. As with prevention programs, decisions about whether or not a treatment program actually works should be based on some sort of objective measure of the dependent variable rather than on anecdotal testimonials or the goodwill and enthusiasm of those who designed and administered the program.
An example of a treatment program is the one discussed at the beginning of this chapter—Scared Straight. Notice that this program had much to be said on its behalf, including common sense (i.e., why shouldn’t it work?). Nevertheless, when Scared Straight was subjected to the unflinching scrutiny of evaluation research, nearly all of the evidence cast doubt on its ability to reduce subsequent delinquency among youth who had already gotten into legal trouble. As shown in table 17.3, many types of treatment programs have been evaluated scientifically, especially in the fields of psychiatry, psychology, criminal justice, and social work. Again, the best design for assessing the effectiveness of a treatment program is that of a controlled experiment. For most treatment programs, a classical design is best. In such a design, after measuring the dependent variable for randomly selected experimental and control groups, the experimental group receives exposure to a new treatment program, while the control group receives some type of conventional treatment.
Evaluation of Improvement-Oriented Programs
Evaluation research is sometimes applied to aspects of human behavior or institutional functioning for which improvement is always desired, but no specific problem has been identified. For instance, a corporate executive might be interested in identifying ways of cutting costs or increasing profits even if the company had been doing fairly well in recent years.
Improvement-oriented evaluation research is common in the field of education. For example, beginning in the 1960s, professors at many American colleges began having students complete course evaluation forms at the end of the term so that the quality of their courses could be assessed. The reasoning behind this movement was that such feedback could be used to objectively monitor changes in teaching quality. Whether such an effect has been achieved is difficult to demonstrate scientifically, but student evaluations of instructional quality are a widely used example of evaluation research in higher education. Table 17.4 displays some other examples of improvement-oriented evaluation efforts.
LOCATING REPORTS OF EVALUATION RESEARCH
Especially in the past twenty-five years, a number of journals have specialized in publishing the results of evaluation research (see the list at the end of this chapter). Nevertheless, reports of many valuable program evaluations fail to be published in conventional journals (or books). Instead, these studies are largely confined to what are called in-house (or intra-agency) documents. Consequently, in-house documents can be a valuable source of information about the effectiveness of many programs. At the same time, their marginal status as ‘‘legitimate’’ publications means that they have not been subjected to the normal peer-review scrutiny of articles appearing in scientific journals.
Currently, even the most sophisticated computerized literature searches contain little information about in-house documents. In the future, a central clearinghouse for such documents may be established on the World Wide Web. In the meantime, those interested in the effectiveness of most agency-sponsored programs must use some ingenuity in locating relevant documents. It is often a good idea to e-mail agencies that would be likely to operate programs of interest, asking them for any relevant documents and individual contacts.
Can one cite and reference in-house documents? Yes, but care should be taken to reference them well enough so as to maximize the possibility of readers locating the document. Providing details about the agency responsible for the report is particularly important, including any numbers assigned to the report for cataloguing purposes (and even a website containing the report).
PROGRAM EVALUATION: DOING IT RIGHT
Conducting an objective and meaningful program evaluation requires major commitments of time and energy. The following three steps are among the most important to keep in mind:
First, researchers wishing to undertake a program evaluation need to become acquainted with what others working in the same or similar areas have already discovered. This means scouring libraries, surfing the web, and corresponding with agencies where prior studies might have been conducted.
Second, researchers should determine in writing the objectives of any program to be evaluated in conjunction with the program’s designers and administrators, and then devise a careful plan for measuring each of those objectives. This means that program outcomes must be specific and measurable. For instance, if a program is being designed to ‘‘improve student learning,’’ program administrators and the evaluating researchers need to agree on how such ‘‘improvements’’ will be objectively measured. Will the measurement be made in terms of grades, examinations, or assessments made by students?
Third, an on-going plan for collecting the data bearing on program objectives should be devised prior to the beginning of the program. Many well-intentioned evaluation efforts have failed because the collection of data needed to objectively assess the program unraveled as the program got underway.
PROGRAM EVALUATION: A SOURCE OF TENSION
Conducting scientifically sound evaluations of programs can be challenging not just from a technical design standpoint, but also because those who oversee and manage the day-to-day functioning of programs have a stake in seeing to it that the evaluations produce favorable results. Evaluators, on the other hand, need to maintain an objective impartiality in making their assessments of program effectiveness. This means that if a program does not accomplish its intended objectives, those who designed and carried out the program need to be informed of this fact so that they can either abandon it or at least substantially modify it. Resistance to such conclusions can be substantial, especially when reputations, future funding, and even jobs are on the line (Campbell, 1969, 428).
Some have noted that those who perform the most competent program evaluations tend to have personalities that differ substantially from those of the practitioners who design and manage the programs (Briar, 1980, 31; Rothman, 1980, 75). For this reason, some have recommended that those conducting evaluation research should never work directly for those who are managing the programs being evaluated, and should even avoid sharing work space that brings them into day-to-day contact with one another (Marques et al., 1993, 210).
CLOSING THOUGHTS ABOUT EVALUATION RESEARCH
Evaluation research can be both extremely rewarding and very frustrating. Reward comes from knowing that the power of the scientific method is being harnessed to help improve the human condition.
Frustration, especially in nonclinical evaluation research, can come from the difficulties experienced in securing the necessary cooperation within a large organization long enough to objectively evaluate a program. Another source of frustration is that decisions about whether to retain a program are often made without seriously considering evidence from evaluation reports (Rossi & Freeman, 1989, 29; Rothbart, 1975, 25). It should be kept in mind that agency heads and program managers often have little training in social research methods, and must often make decisions on grounds having little to do with program effectiveness (Halpert, 1973, 377).
When confronted with organizational obstacles, researchers need to remain calm and professional. The wheels of science grind slowly and finely, whereas administrative decisions often take place quickly and on the basis of very incomplete knowledge. Ultimately, researchers should remember that their main function is to provide honest and objective assessments. How those assessments are utilized is largely an administrative decision. Sometimes evaluation efforts do not have much of an impact when they first appear, but may be very influential years later.
Some programs are retained far beyond the point of being cost-effective simply because of political and bureaucratic ‘‘inertia’’ (Arnhoff, 1975, 1277). For some of these programs, state and federal legislators have passed so-called sunset laws. Such laws automatically terminate the funding for programs after a specified number of years unless explicit steps are taken to maintain funding.
Social science researchers need to be cautioned against overselling the value of programs for dealing with major social problems. As noted earlier, many who proposed solutions to social problems in the 1960s lost credibility in the face of hard evidence that many of the programs instituted during those years were ineffective (Etzioni, 1973, 1977). If there is a lesson to be learned, it is that social scientists should be cautious in claiming to have solutions to the problems they study.
SUMMARY
Applied research is undertaken to help deal with ‘‘real world’’ problems. In the social sciences, most problems are of an individual, social, or health nature. The flip side of applied research is basic (or pure) research, which is undertaken to better understand a phenomenon with no immediate intention of altering it.
Three types of applied research are recognized: epidemiological, feasibility, and evaluative. Epidemiological (or diagnostic) research is undertaken to identify the extent of a problem at one or more points in time. Feasibility research is used to help develop a plan for effectively dealing with a problem. Evaluation (or evaluative) research refers to studies designed to assess how well a remedial (or treatment) program is accomplishing its intended objectives. Ideally, evaluation research is based on experimental designs, although quasi-experimental designs are also widely used.
Evaluation research can be divided into two categories: clinical and institutional. Clinical evaluation research evaluates the effectiveness of individualized treatment for persons who have voluntarily sought help. The history of clinical evaluation research can be traced back to events in the 1950s.
Methodologically, two main categories of institutional evaluation studies can be identified: process (or on-going) and summative. Process evaluations usually involve some aspect of the continued functioning of an organization. Most process evaluation studies utilize before-after no control group experimental designs. Summative studies employ a wide range of experimental designs and are applied to programs that have a logical end point, after which assessments of effectiveness are made.
Three events are associated with the emergence of evaluation research in social science. The first has to do with a widely read 1950s review article that challenged the view that scientific evidence existed to support the perception that psychotherapy is effective in treating major forms of mental illness. Following that article, reliance on this assumption gradually gave way to more scientific and quantitative methods of assessing psychotherapy’s effectiveness, both independently of and in conjunction with pharmacological treatment. Second, the behaviorist movement in the 1950s and 1960s began pushing clinical treatment in the same direction: toward more careful measurement and greater documentation of a program’s effects on behavior.
Third, beginning in the 1960s, two consecutive U.S. presidential administrations (the Kennedy and Johnson administrations) sought to fund governmental programs to combat a host of social problems such as crime, illiteracy, and poverty. Funding for the functioning of these programs and provisions for documenting the effects of these programs, whether positive or negative, made program administrators accountable in ways that had never occurred before. In fact, much of the evaluation research ended up casting doubt upon the effectiveness of many, if not most, of these early programs, one of the most famous of which was Head Start, a program that remains mired in questions about its effectiveness to this day.
Three program categories in the field of evaluation research can be identified: prevention programs, treatment programs, and improvement-oriented programs. Preventive programs are subdivided into primary and secondary prevention programs. Primary prevention has to do with population-based programs where only a small proportion of people exhibit the characteristics to be prevented. Secondary prevention targets subpopulations in which, without program exposure, many would be expected to exhibit the characteristics to be prevented. Treatment programs are aimed specifically at helping people who have the full-blown symptoms of some mental or behavioral malady. Improvement-oriented programs are geared toward improving behavior (e.g., teaching) or institutional functioning (e.g., serving public demands for better mail service) that is not usually in crisis, but is still in need of greater efficiency.
Overall, evaluation research has become popular in the social sciences since the middle of the twentieth century. It is often driven not by theory and idle curiosity, but by the real needs of humanity for health and comfort.
SUGGESTED READINGS
Caro, F. G. (1977). Readings in evaluation research (2nd ed.). Scranton, PA: Basic Books. (This book of readings provides helpful guidelines and examples of evaluation research, especially in the field of education.)
Epstein, I., & Tripodi, T. (1977). Research techniques for program planning, monitoring, and evaluation. New York: Columbia University Press. (This is one of the most comprehensive and clearly written general texts available on evaluation research and related topics.)
Patton, M. Q. (2001). Qualitative evaluation and research methods (3rd ed.). Beverly Hills, CA: Sage. (This is an excellent sourcebook for providing an overview of the fundamentals of evaluation research as well as how this research links up with program planning and implementation.)
Numerous social science journals specialize in publishing evaluation and other applied research. The following are the titles of many of these journals, with separate listings for clinical and institutional areas.
Journals Specializing in Client-Targeted Evaluation Research
Cognitive Therapy and Research
Behavioral Psychotherapy
Behavioral Research and Therapy
Clinical Social Work Journal
Journal of Applied Behavior Analysis
Journal of Applied Psychology
Journal of Applied Social Psychology
Journal of Behavior Therapy and Experimental Psychiatry
Journal of Clinical Psychiatry
Journal of Clinical Psychology
Journal of Consulting and Clinical Psychology
Journal of Consulting Psychology
Journal of Substance Abuse Treatment
Research on Social Work Practice
Journals Specializing in Population-Based Evaluation Research
American Journal of Public Health
Education Studies
Evaluation and the Health Professions
Evaluation and Program Planning
Evaluation Quarterly
Evaluation Review
Journal of Early Intervention
Journal of Safety Research
Policy Science
Prevention
Prevention in the Human Services
Preventive Medicine
Public Health
Public Welfare
Social Indicators Research