NOT LONG AGO, I am told, a widely respected middle school teacher in Wisconsin, famous for helping students design their own innovative learning projects, stood up at a community meeting and announced that he "used to be" a good teacher. These days, he explained, he just handed out textbooks and quizzed his students on what they had memorized. The reason was very simple. He and his colleagues were increasingly being held accountable for raising test scores. The kind of wide-ranging and enthusiastic exploration of ideas that once characterized his classroom could no longer survive when the emphasis was on preparing students to take a standardized examination. Because the purveyors of Tougher Standards had won, the students had lost.
I don't know how many teachers across the country would identify with this story—either because they have already thrown up their hands, as this man did, or because they struggle every Monday morning to try to avoid his fate. But I do know this: the issue of standardized testing is not reserved for bureaucrats and specialists. All of us with children need to make it our business to understand just how much harm these tests are doing. They are not an inevitable part of "life" or even a necessary part of school; they are a relatively recent invention that gets in the way of our kids' learning. Their impact is deep, direct, and personal. Every time we judge a school on the basis of a standardized test score—indeed, every time we permit our children to participate in these mass testing programs—we unwittingly help to make our schools just a little bit worse.
In case I am being too subtle here, let me state clearly that I think standardized testing is a very bad thing, and the more familiar you become with it, the more appalled you are likely to be. I am not talking about the kind of tests invented by individual teachers for their classes but about those prepared by giant companies and taken by thousands of students across schools, districts, and even states. Similarly, I am not primarily interested in the tests used for admission to college, like the SAT, although there is plenty wrong with these exams, too.1 Of more immediate concern are those that begin much earlier, tests used all over the country, like the Iowa and Comprehensive Tests of Basic Skills (ITBS and CTBS) and the Metropolitan, Stanford, and California Achievement Tests, as well as those developed for use in just one state, such as the ISAT in Illinois, the TAAS in Texas, the MEAP in Michigan, and so on.
The case against exams like these "may be as intellectually and ethically rigorous as any argument made about social policy in the past 20 years," says one writer, "but such testing continues to dominate the education system.... We are a nation of standardized-testing junkies."2 Estimates of how many times students in the United States sit down to take these tests every year vary from 40 million to 400 million. It is clear, though, that no other nation in the world does anything like this to its children.3 Yet despite requirements for some students to take several standardized tests a year,4 and although even young children are routinely subjected to these tests5 (in the face of explicit appeals by experts to stop), the trend, incredibly, is for even more testing.
In some cases, this trend reflects a deliberate strategy, part of an educational philosophy based on getting students to memorize a bunch of basic facts and skills. Standardized tests generally go hand-in-hand with that kind of teaching—and with a system of carrot-and-stick control to mandate that kind of teaching. In other cases, the use of such tests reflects no particular endorsement of a style of instruction or testing—only a vague desire to hold schools accountable coupled with a total ignorance of other ways of achieving that goal. For example, civil rights groups and sympathetic judges who are understandably outraged by disparities among school systems in the same state (or even county) may uncritically use standardized tests to indicate how much progress has been made to "close the gap" between black and white neighborhoods or poor and rich districts—all the while apparently unaware of how much harm they are doing by legitimating and perpetuating a reliance on such testing.
Standardized tests persist and proliferate for other reasons, too. First, they are enormously profitable for the corporations that prepare and grade them. (More often than not, these companies simultaneously sell teaching materials designed to raise scores on their own tests.) Second, they appeal to school systems because they're efficient, and the worst tests are usually the most efficient. It is fast, easy, and therefore relatively inexpensive6 to administer a multiple-choice exam that arrives from somewhere else and is sent back to be graded by a machine at lightning speed. There is little incentive to replace these tests with more meaningful forms of assessment that require human beings to evaluate the quality of students' accomplishments. In the words of Norman Frederiksen, a specialist in the measurement of learning with the Educational Testing Service, "Efficient tests tend to drive out less efficient tests, leaving many important abilities untested—and untaught."7
Anyone trying to account for the popularity of standardized tests may also want to consider our cultural penchant for attaching numbers to things. One writer has called it a "prosaic mentality": a preoccupation with that which can be seen and measured.8 Any aspect of learning (or life) that resists being reduced to numbers is regarded as vaguely suspicious. By contrast, anything that appears in numerical form seems reassuringly scientific; if the numbers are getting larger over time, we must be making progress. Concepts like intrinsic motivation and intellectual exploration are hard for the prosaic mind to grasp, whereas test scores, like sales figures or votes, can be calculated and charted and used to define success and failure. The more tests we make kids take, the more precise our knowledge about who has learned well, who has taught well, which districts are in trouble, and even which schools (in this brave new world of for-profit education) will survive another day.
In a broad sense, it is easier to measure efficiency than effectiveness, easier to rate how well we are doing something than to ask if what we are doing makes sense. But the heirs of Descartes and Bacon, Skinner and Taylor, rarely make such distinctions. More to the point, they fail to see how the process of coming to understand ideas in a classroom is not always linear or quantifiable. Contrary to virtually every discussion of education by the Tougher Standards contingent, meaningful learning does not proceed along a single dimension in such a way that we can nail down the extent of improvement. In fact, as Linda McNeil has observed, "Measurable outcomes may be the least significant results of learning."9 (That sentence ought to be printed out in 36-point Helvetica and hung in the office of every person in the country involved with school reform.) To talk about what happens in schools as moving forward or backward in specifiable degrees is not only simplistic, in the sense that it fails to capture what is actually going on; it is destructive, because it can change what is going on for the worse. Once teachers and students are compelled to focus only on what lends itself to quantification, such as the number of grammatical errors in a composition or the number of state capitals memorized, the process of thinking has been severely compromised.
If it is worth reflecting critically on our infatuation with numbers, it is at least as important to examine our assumptions about standardized tests in particular. Such tests are commonly justified on the basis of providing us with objective information about teaching and learning, but the precise score assigned to a student or school is meaningless until we know the content of the test and whether it is a valid measure of learning. Similarly, you can call these tests "objective" in the sense that they are scored by machines, but it was people who wrote the questions (which may be biased or murky or stupid) and people who decided to include them on the exam. As one writer put it, "Judgment was used in the choice of items and that judgment decided which bubble would count and which would not, and hence what the score would be. The people exercising the judgment are too far out of the picture to have faces and personalities, so it is easy to act as if they do not exist."10
Beyond the test-makers, we need to look at the test-takers. Those scientific-sounding results are actually the product of rows of real students scrunched into desks, frantically filling in bubbles. As soon as we focus on this human part of the testing process, the significance of the scores becomes dubious. For example, test anxiety has grown into a subfield of educational psychology, and its prevalence means that the tests that produce this reaction are not giving us a good picture of what many students really know and can do. The more a test is made to "count"—in terms of being the basis for promoting or retaining students, for funding or closing down schools—the more that anxiety is likely to rise and the less valid the scores become.
Then there are the students who don't take the tests seriously. A friend of mine remembers neatly filling in those ovals with his pencil in such a way that they made a picture of a Christmas tree. (He was assigned to a low-level class as a result, since his score on a single test was all the evidence anyone needed to judge his capabilities.) Even those test-takers who are not quite so creative may just guess wildly, fill in ovals randomly, or otherwise blow off the whole exercise, understandably regarding it as a waste of time. In short, it may be that a good proportion of students either couldn't care less about the tests, on the one hand, or care so much that they choke, on the other. Either way, their scores aren't very meaningful. Anyone who can relate to these descriptions of what goes through the minds of real students on test day ought to think twice before celebrating a high score, complaining about a low one, or using standardized tests to judge schools.
Can tests be reliable indicators despite these factors? Perhaps, but it is an open secret among educators that much of what the scores are indicating is just the socioeconomic status of the students who take them. One educator suggests we should save everyone a lot of time and money by eliminating standardized tests, since we could get the same results by asking a single question: "How much money does your mom make?...OK, you're on the bottom."11 (In the case of the SATs, the scores reflect not only family income [see note 1, par. 4] but also the proportion of the eligible population that actually took the test.) The larger point is that "a ranking of states, districts, or schools by test scores is too crude a measure to offer any insight about the quality of education" because other factors, having nothing to do with instruction, contribute significantly to those scores.12
Some standardized tests aren't this bad. They're worse. I have in mind the tests that by design "provide little or no information about ... what the individual can do. They tell that one student is more or less proficient than another, but do not tell how proficient either of them is with respect to the subject matter tasks involved."13 Those are the words of a psychologist named Robert Glaser, who many years ago coined the term "norm-referenced" to describe this kind of test and contrasted it with a "criterion-referenced" test, which does compare each individual to a given standard. Believe it or not, the majority of states today rely on norm-referenced tests,14 that is, tests that aren't intended to find out how much students know. These tests were created only to find out how well your child does compared to every other child taking the test, which is usually reported as a percentile.
Think for a moment about the implications of this fact. No matter how many students take the test, no matter how well or poorly they were taught, no matter how difficult the questions are, the pattern of results is guaranteed to be the same: exactly 10 percent of those who take the test will score in the top 10 percent, and half will always fall below the median. That's not because our schools are failing; that's because of what "median" means. A good score on a norm-referenced test means "better than other people," but we don't even know how much better. It could be that everyone's actual scores are all pretty similar, in which case the distinctions between them are meaningless—rather like saying I'm the tallest person on my block even though I'm only half an inch taller than the shortest person on my block.
More important, even if the top 10 percent did a lot better than the bottom 10 percent, that still doesn't tell us anything at all about how well they did in absolute terms, such as how many questions they got right. Maybe everyone did reasonably well; maybe everyone blew it. We don't know. Norm-referenced tests aren't meant to tell us how well a student did or how much of a body of knowledge was effectively learned. To use them for that purpose is, in the words of a leading authority on the subject, "like measuring temperature with a tablespoon."15 Yet they are used for exactly that purpose all across the United States.16
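For readers who like to see the arithmetic, here is a minimal sketch in Python. The two groups of raw scores are invented for illustration, and the percentile formula (percent of test-takers scoring strictly lower) is only one common convention, not necessarily the one any particular test publisher uses. The point it illustrates is that a group in which everyone did well and a group in which everyone did badly produce identical percentile reports.

```python
from statistics import median


def percentile_ranks(raw_scores):
    """Percent of test-takers who scored strictly below each student."""
    n = len(raw_scores)
    return [100 * sum(other < s for other in raw_scores) / n for s in raw_scores]


# Hypothetical raw scores (questions right out of 50) for two invented groups.
strong_group = [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
weak_group = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

strong_ranks = percentile_ranks(strong_group)
weak_ranks = percentile_ranks(weak_group)

# The percentile reports are indistinguishable, although the raw scores are not.
same_report = strong_ranks == weak_ranks

# Half of each group falls below its own median, whatever the raw scores are.
below_median_strong = sum(s < median(strong_group) for s in strong_group)
below_median_weak = sum(s < median(weak_group) for s in weak_group)

print(same_report, below_median_strong, below_median_weak)
```

Nothing in the percentile list reveals which group answered 41 or more questions correctly and which answered 10 or fewer.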
We've already bumped up against some of these same criticisms in the context of surveys that rank students from different countries. Exactly the same points apply to ranking schools or districts or states. A reasonably informed person would not care how her child's school did compared to other schools in the area; a reasonably conscientious journalist would not dream of publishing something so meaningless and misleading. The only thing that should count is how many questions on a test were answered correctly (assuming they measured important knowledge). By the same token, the news that your state moved up this year from thirty-seventh in the country to eighteenth says nothing about whether its schools are really improving: for all you know, the schools in your state are in worse shape than they were last year, but those in other states slid even further.17
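The gap between a ranking and an absolute result can be sketched the same way. The four states and their average scores below are made up, and the ranking rule (one plus the number of states with a strictly higher average) is an assumption chosen for simplicity.

```python
def rank_of(state, averages):
    """Rank 1 is the highest average; count the states that scored strictly higher."""
    return 1 + sum(avg > averages[state] for avg in averages.values())


# Invented average scores (percent of questions correct) for four made-up states.
last_year = {"Your State": 60, "State B": 70, "State C": 65, "State D": 62}
this_year = {"Your State": 55, "State B": 50, "State C": 52, "State D": 58}

old_rank = rank_of("Your State", last_year)
new_rank = rank_of("Your State", this_year)
got_worse = this_year["Your State"] < last_year["Your State"]

print(old_rank, "->", new_rank, "| absolute score fell:", got_worse)
```

In this invented case the state climbs from last place to second even though its students answered fewer questions correctly than the year before.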
Even that doesn't tell the whole story. When specialists sit down to design a norm-referenced test, they're not interested in making sure the questions cover what is most important for students to know. Rather, their goal is to include questions that some test-takers—not all of them, and not none of them—will get right. They don't want everyone to do well. Furthermore, they want each question to be answered correctly by the same students who get most of the other questions right. The ultimate objective, remember, is not to evaluate how well the students were taught but to separate them, to get a range of scores. If a certain question is included in a trial test and almost everyone gets it right—or, for that matter, if almost no one gets it right—that question will likely be tossed out. Whether it is reasonable for kids to get it right is completely irrelevant. Moreover, the questions that "too many" students will answer correctly are probably those that deal with the content teachers have been emphasizing in class because they thought it was important. So norm-referenced tests are likely to include a lot of trivial stuff that isn't emphasized in school because that helps to distinguish one student from another.18
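A rough sketch of that screening step might look like the following. The trial results and the cut-off values are hypothetical, and the sketch models only the "not too many, not too few" criterion, not the further requirement that each item track the overall score.

```python
# Hypothetical trial-test results: 1 = correct, 0 = incorrect, one entry per student.
trial_results = {
    "content stressed in class": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "obscure detail": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "nearly impossible item": [1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}

# Assumed cut-offs, chosen only for illustration.
MIN_CORRECT, MAX_CORRECT = 0.2, 0.8


def proportion_correct(answers):
    """Share of students who answered the item correctly."""
    return sum(answers) / len(answers)


# Keep only the items that split the group; drop the ones almost everyone
# (or almost no one) answered correctly.
kept = [
    name
    for name, answers in trial_results.items()
    if MIN_CORRECT <= proportion_correct(answers) <= MAX_CORRECT
]
dropped = [name for name in trial_results if name not in kept]

print("kept:", kept)
print("dropped:", dropped)
```

Under these assumed cut-offs, the question on material the teacher stressed is discarded precisely because the students learned it, while the obscure detail survives because it sorts them.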
Given that scores from norm-referenced tests are widely regarded as if they said something meaningful about how our children (and their schools) are doing, they are not only dumb but dangerous. And the harm ramifies through the whole system in a variety of ways. First, these tests contribute to the already pathological competitiveness of our culture, which leads us to regard others as obstacles to our own success—with all the suspicion, envy, self-doubt, and hostility that rivalry entails. The process of assigning children to percentiles helps to ensure that schooling is more about triumphing over everyone else than about learning.
Second, because every distribution of scores will contain a bottom, it will always appear that some kids are doing terribly. That, in turn, reinforces a sense that the schools are failing. Worse, it contributes to the insidious assumption that some children just can't learn—especially if the same kids always seem to fall below the median. (This conclusion, based on a misunderstanding of statistics, is then defended as "just being realistic.") Parents and teachers may come to believe this falsehood, and so too may the kids themselves. They may figure: no matter how much I improve, everyone else will probably get better too, and I'm always going to be at the bottom. Thus, why bother trying? Conversely, a very successful student, trained to believe that rankings are what matter, may be confident of remaining at the top and therefore have no reason to do as well as possible. (Remember: excellence and victory are two completely different goals.) For both groups, it is difficult to imagine a more powerful demotivator than norm-referenced testing.
One more disturbing consequence: teachers and administrators who are determined to outsmart the test—or who are under significant pressure to bring up their school's rank—may try to adjust the curriculum in order to bolster their students' scores. (More about this later.) But if the tests emphasize relatively unimportant knowledge that's designed for sorting, then "teaching to the test" isn't going to improve the quality of education. It may have exactly the opposite effect.
Even though they suffer from the more general problems with standardized testing, criterion-referenced exams make more sense than the norm-referenced kind. At least they're set up so everyone theoretically could do very well (or very badly); it's not a zero-sum game. But in practice these tests may be treated as though they were norm-referenced. This can happen if parents or students aren't helped to understand that a score of 80 percent refers to the proportion of questions answered correctly, leaving them to assume that it refers to a score better than 80 percent of the other test-takers.19 Worse yet, criterion-referenced tests may be turned into the norm-referenced kind if newspapers publish charts showing how every school or district ranks on the same test, thus calling attention to what is least significant. (One expert on testing suggests that if newspapers insist on publishing such a chart, they should at least run it where it belongs, in the sports section.)20
Still, the main point is that when the tests themselves have been designed specifically as sorting devices, the harm is damn near inescapable. It is a point that almost everyone should be able to understand, yet our children continue to be subjected to tests like the ITBS that are both destructive and ridiculously ill suited to the purposes for which they are used. And they will continue to be used—and the scores will continue to be published—until you and I stop responding to the results by saying, "Ninety-fifth percentile! That's terrific!" or "Bottom quartile? What went wrong?" and start responding to such numbers by saying, "Wait a minute. What difference does that make? Do they think we're idiots?"
Even standardized tests that are criterion- rather than norm-referenced tend to be contrived exercises that tell us very little about the intellectual capabilities that matter most. What they primarily seem to be measuring is how much a student has crammed into his short-term memory. Lauren Resnick concedes that some standardized tests contain "isolated items that test students' critical thinking and reasoning knowledge," but she explains that they nevertheless fail to offer students the opportunity "to carry out extended analyses, to solve open-ended problems, or to display command of complex relationships, although these abilities are at the heart of higher-order competence."21
Resnick points out that what generally passes for a test of reading comprehension is a series of separate questions about short passages on random topics. These questions "rarely examine how students interrelate parts of the text and do not require justifications that support the interpretations"; indeed, the whole point is the "quick finding of answers rather than reflective interpretation." Tests of writing, meanwhile, are positively laughable: they are about memorizing the mechanics of grammar and punctuation, often requiring that students correct mistakes in isolated sentences. To state the obvious, "Recognizing other people's errors and choosing the correct alternatives are not the same processes as those needed to produce good written language."22
In mathematics, the story is much the same. An analysis of the most widely used standardized math tests found that only 3 percent of the questions required "high level conceptual knowledge" and only 5 percent tested "high level thinking skills such as problem solving and reasoning."23 Typically, the tests aim to make sure that students have memorized a series of procedures, not that they understand what they are doing. They also end up measuring knowledge of arbitrary conventions (such as the accepted way of writing a ratio or knowing that "<" means "less than") more than a capacity for logical thinking.24 Even those parts of math tests that have names like "Concepts and Applications" are "still given in multiple-choice format, are computational in nature, and test for knowledge of basic skills through the use of traditional algorithms."25 As for science, the parts of standardized examinations devoted to this subject often amount to nothing more than a vocabulary test. Multiple-choice questions that focus on "excruciatingly boring material" fail to judge students' capacity to think and wind up driving away potential future scientists, according to the president of the National Academy of Sciences.26
The point here is not that standardized tests are too hard or too easy. The problem is not the difficulty level, per se, but that they are geared to a different, less sophisticated kind of knowledge. And the more this is so, the more teaching comes to imitate these tests as teachers are steered away from helping kids learn how to think. Indeed, the students who ace these tests are often those who are least interested in learning and most superficially engaged in what they are doing. This is not just my opinion: studies of elementary, middle school, and high school students have all found a statistical association between high scores on standardized tests and relatively shallow thinking. Not only are these examinations not about deep understanding—they seem to be about its opposite.27
Perhaps this is why, as Piaget pointed out years ago, "anyone can confirm how little the grading that results from examinations corresponds to the final useful work of people in life."28 But never mind their inability to predict what students will be able to do later; they don't even capture what students can do today. In fact, we could say that such tests fail in two directions at once. On the one hand, they overestimate what some students know: those who score well often understand very little of the subject in question. They may be able to find a synonym or antonym for a word without being able to use it properly in a sentence. Older students may have memorized the steps of comparing the areas of two geometric figures without really understanding geometry at all. Younger children may be able to correctly "write 8 next to a picture of eight ice cream cones" while continuing to believe that eight of them spread out are more than eight crowded together.29 Students may even be able to "psych out" the test itself by ascertaining which kinds of answers are usually incorrect or what the writers of the test are looking for.
On the other hand, standardized tests underestimate what other students know because, as any teacher can tell you, very talented kids often get low scores. It is true in writing—"countless cases of magnificent student writers whose work was labeled as 'not proficient' because it did not follow the step-by-step sequence of what the test scorers (many of whom are not educators, by the way) think good expository writing should look like."30 It is true in reading: a first-grade teacher in Ohio, frustrated that an excellent reader was being placed in a remedial class because he didn't perform well on a standardized test, showed an administrator the books this boy could read as well as an entire book he had written. This evidence was brushed aside and she was told just to look at the test scores. As she recalls the conversation, "When I pointed out that there wasn't even any reading on this so-called reading-readiness test, well then [the administrator] said maybe that was the problem—that I should spend more time getting them ready to read rather than having the kids read."31
What we have here is a double indictment of standardized testing. "Pupils who read widely and with good comprehension may be undervalued, while pupils who perform well on isolated skill tests but who can't or don't care to read are lulled into complacency."32 The same is true in math. One group of researchers described a fifth-grader who flawlessly marched through the steps of subtracting 2⅚ from 3⅓, ending up quite correctly with 3/6 and then reducing that to ½. But successfully performing this final reduction doesn't mean he understood that the two fractions were equivalent. In fact, he remarked in an interview that ½ was larger than 3/6 because "the denominator is smaller so the pieces are larger."
Meanwhile, one of his classmates, whose answer had been marked wrong because it hadn't been expressed in the correct terms, clearly understood the underlying concepts. Intrigued, the researchers then proceeded to interview a number of fifth-graders about another topic, division, and discovered that 41 percent had memorized the process without really grasping the idea, whereas 11 percent understood the concept but made minor errors that resulted in getting the wrong answers. A standardized test therefore would have misclassified more than half of these students.33
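Both pieces of arithmetic in this account are easy to check. The short Python sketch below uses the standard library's exact fractions. It is offered only as a verification of the numbers quoted above, not as a reconstruction of how the researchers worked, and the variable names are my own.

```python
from fractions import Fraction

# The fifth-grader's subtraction: 3 1/3 minus 2 5/6.
three_and_a_third = 3 + Fraction(1, 3)
two_and_five_sixths = 2 + Fraction(5, 6)
difference = three_and_a_third - two_and_five_sixths

# 3/6 and 1/2 name the same quantity, contrary to what the student said.
equivalent = Fraction(3, 6) == Fraction(1, 2)

# Students a right-or-wrong score would misjudge on the division items:
# those who memorized without understanding plus those who understood but slipped.
memorized_without_understanding = 41  # percent
understood_but_wrong_answer = 11      # percent
misclassified = memorized_without_understanding + understood_but_wrong_answer

print(difference, equivalent, misclassified)
```

The difference comes out to exactly one half, the two fractions compare as equal, and the two misjudged groups together make up 52 percent of the students interviewed.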
As disturbing as all of this may be, we can dig even deeper, looking not only at the specific questions that appear on such tests but at the format and nature of the tests themselves. It is the very features of standardized testing we take for granted that ultimately undermine their usefulness:
***
Put these last few points together and you have a scenario that is not merely disagreeable but ludicrously contrived. After all, how many jobs demand that employees come up with the right answer on the spot, from memory, while the clock is ticking? (I can think of one or two, but they are the exceptions that prove the rule.) How often are we forbidden to ask coworkers for help or to depend on a larger organization for support—even in a society that worships self-sufficiency? And when someone is going to judge the quality of your work, whether you are a sculptor, a lifeguard, a financial analyst, a housekeeper, a professor, a refrigerator repairman, a reporter, or a therapist, how common is it for you to be given a secret pencil-and-paper exam? Isn't it far more likely that the evaluator will look at examples of what you've already done or perhaps watch you perform your normal tasks? To be consistent, those educational critics who indignantly insist that schools should be doing more to prepare students for the real world ought to be manning the barricades to demand an end to these artificial exercises called standardized tests.
Of course, anyone who reads through this list may be inclined to wonder, "Well, how could you have a standardized test that wasn't concerned only with right answers or wasn't secret or timed or whatever?" Indeed, you probably couldn't. But here is a very different question: How could you devise a way of figuring out how well students are learning, or schools are teaching, that didn't have these features? As we'll see in the final chapter, this question does have an answer. But it's critical that we frame the issue in these broader terms, that this becomes our point of departure, because only then are we free to look beyond—and solve the problems created by—standardized tests.
Lately, some opponents of standardized testing have invoked a bucolic saying: "You don't fatten a steer by weighing it." The point, of course, is that merely measuring something, such as students' learning, doesn't in itself lead to any change in what is being measured. This is true as far as it goes, but the metaphor is poorly chosen44 because it implies that testing has no impact on education. The reality is that it almost always does have an impact, increasingly by design. Unfortunately, the impact is usually negative.
Consider the messages that standardized testing communicates to children about the nature of learning. Because a premium is placed on remembering facts in many of these tests, students may come to think that this is what really matters—and they may even come to develop a "quiz show" view of intelligence that confuses being smart with knowing a lot of stuff. Because the tests are timed, students may be encouraged to see intelligence as a function of how quickly people can do things. Because the tests often rely on a multiple-choice format, students may infer "that a right or wrong answer is available for all questions and problems" in life and "that someone else already knows the answer to [all these questions], so original interpretations are not expected; the task is to find or guess the right answer, rather than to engage in interpretive activity."45
If we're looking for more direct harms, we could just tally up the time students and teachers waste actually taking these tests. But where our kids really pay the price is with what comes before and because of the tests. "It is the hours spent practicing types of questions that might appear on the tests and the days denying students enrichment options that are truly meaningful that make proficiency tests so harmful and invasive," as one educator put it.46 In schools around the country, the content and style of teaching are being placed in the service of the tests. Teachers often feel obliged to set aside other subjects for weeks at a time in order to teach test-taking skills. Sometimes the tests hijack the entire curriculum as schools are transformed into giant test-prep centers. When students will be judged on the basis of a multiple-choice test, teachers may use multiple-choice activities beforehand. It is not uncommon to find instruction "in the same format as the test rather than [in a form] used in the real world. For example, teachers reported giving up essay tests because they are inefficient in preparing students for multiple-choice tests."47 The assignments (in class and to take home) may change as well. It's not unusual to hear of schools where teachers are required to use multiple-choice formats in their teaching.48 This has aptly been called the "dumbing down" of instruction, although curiously not by the conservative critics with whom that phrase is normally associated.
More striking, either because they think it is best for their students or because they have a gun to their heads, teachers will dispense with poetry and focus on prose, breeze through the Depression and linger on the Cold War, cut back on social studies to make room for more math—depending on what they think will be emphasized on the tests. They may even put all instruction on hold and spend time giving practice tests.
The defenders of standardized testing don't try to deny that it forces schools to reconfigure the curriculum; indeed, they cheerfully acknowledge this. "There's nothing wrong with teaching to the test. That is what education is all about," declared Robert V. Antonucci, the former commissioner of education for Massachusetts.49 An article in the American School Board Journal prefers the euphemism "curriculum alignment" and insists that it's paying off ... as measured exclusively by test scores!50
It is a relatively new idea—that tests should be used not only to measure but to mandate—but in scarcely a generation it has come to be taken for granted. Sometimes it is done deliberately, perhaps because it offers policymakers "one of the few levers on the curriculum that [they] can control."51 Other times, educators and parents simply realize that a test emphasizes isolated facts rather than critical thinking and figure the curriculum had better be retooled accordingly. Either way, the tail of testing is now wagging the educational dog.
You'd think that when officials sit down to formulate an education policy, they would begin by agreeing on some broad outlines of what students ought to know and be able to do, and only then address the question of measuring how successfully this is happening. The reality, though, often seems to be exactly the opposite: "What can be measured reliably and validly becomes what is important to know," as critics of one state's reform efforts observed.52 It's rather like the old joke about the fellow who was looking around for his lost keys one night, explaining to a passerby that he was searching the sidewalk right near a streetlight not because that was where he dropped them but because "the light is better here."
A more indirect effect of the same mentality can be glimpsed in Ohio, where the pressure to boost proficiency test scores has led to changes in how teachers of children from age 9 to 14 are certified by the state. Teachers have been forced to specialize in only two content areas (such as math and science), which means that the kind of departmentalization that has created such a fragmented educational experience in high school may now happen, thanks to testing pressures, as early as fourth grade.53
In some states, as the following chapter will explain, officials have turned up the heat by creating "high-stakes" testing programs. But even without this added pressure, it's virtually inevitable that teachers will feel compelled to change what they do. As one principal in Virginia put it, "We know what's being tested. So now we know what we have to teach."54 In fact, when teachers from forty-eight schools around the country were surveyed a few years ago, nearly all "reported spending substantial time (a week or more) giving students worksheets that review test content, giving students practice with the item formats likely to be on the test, and directly teaching test-taking strategies."55 And teachers understand the costs of doing this. In another survey, 60 percent of the math teachers and 63 percent of the science teachers described the "negative effects of a testing program on curriculum or student learning," citing the "narrowing and fragmenting of the curriculum" among other consequences.56
We've already seen some illustrations of the limits of these tests. If students get higher scores on math exams for memorizing techniques than for understanding concepts, which do you think teachers will emphasize? If a teacher sees first-graders being penalized for having spent time reading rather than being drilled on reading-readiness skills, what direction will her class take the following year? Teachers all over the country struggle with variations of this dilemma, worrying not only about their own jobs but about the short-term price their students may have to pay for more authentic learning. The choices are grim: either the teachers capitulate, or they struggle courageously to resist this, or they find another career.57
I remember visiting a school in Illinois a few years ago where the battle had already been lost. The district was under a court order to bring up its test scores and had bought a packaged program called "Success for All" to do just that (see [>]n53). The result resembled a factory more than a place of learning, with children being exhorted to succeed and perform and achieve. (By contrast, words like "curiosity," "discovery," and "exploration" were nowhere to be seen or heard.) In room after room I saw children correcting punctuation, answering plot questions about story fragments, and completing worksheets full of multiplication problems and those all-too-familiar analogy questions.
Of course I hadn't seen the school earlier, so I can't say for sure that its patron saint was John Dewey before it became Stanley Kaplan. In other places, though, the shift is visible to the naked eye:
And so it goes. "Everywhere we turned," one group of educators reported in 1998, "we heard stories of teachers who were being told, in the name of 'raising standards,' that they could no longer teach reading using the best of children's literature but instead must fill their classrooms and their days with worksheets, exercises, and drills." The result in any given classroom was that "children who had been excited about books, reading with each other, and talking to each other were now struggling to categorize lists of words."59
In some classes, of course, it was never otherwise. The very raison d'être of high school advanced placement (AP) courses, for example, has always been to prepare students for a test. (There has been much discussion about who gets to take these classes60 but very little about their basic purpose and high-powered drill-and-skill method of instruction.) One writer suggests that teachers ought to just issue a formal declaration of surrender to the Educational Testing Service and be done with it.61
Even in classes less noticeably ravaged by the imperatives of test preparation, there are hidden costs—opportunities missed, intellectual roads not taken. For one thing, teachers are less likely to work together in teams.62 For another, in each classroom, "the most engaging questions kids bring up spontaneously—'teachable moments'—become annoyances."63 Excitement about learning pulls in one direction; covering the material that will be on the test pulls in the other. No wonder elementary school teachers in one state overwhelmingly denounced the effects of a new testing program, saying in a recent survey that "morale has sunk, practice tests are soaking up teaching time, and students are more anxious about school than ever before."64
Sometimes teachers feel they must depart from teaching testable facts in testlike fashion ... in order to impart advice about test-taking, per se. The first thing to be said about using school for this purpose is that it is an egregious waste of our children's time—although the fault lies not with the teachers who do it but with the tests themselves and those who pay inordinate attention to the results. Second, it is educationally "harmful if students transfer [these test-taking tactics] to other classroom activities."65 We don't want kids to get in the habit of skimming a book, looking for facts they might be asked on a test, instead of really thinking about and responding to what they're reading.
Third, even if clever strategies (for example, skipping to the questions first, then going back to the passage to find the answers) are effective, this means that, to some extent at least, a high test score reflects not knowledge or intelligence but good test-taking skills. If one can be successful by engaging in what some have called "legal cheating"—if we can indeed raise students' scores by teaching them tricks or by cramming them full of carefully chosen information at the last minute—this should be seen not as an endorsement of such methods but as a devastating revelation about how little these tests are really telling us.
Linda Darling-Hammond offers this analogy: Suppose it has been decided that hospital standards must be raised, so all patients must now have their temperatures taken on a regular basis. Shortly before the thermometers are inserted, the doctors run around giving out huge doses of aspirin and lots of cold drinks. Remarkably, then, it turns out that no one is running a fever! The quality of hospital care is at an all-time high!66 What is really going on, of course, is completely different from providing good health care and assessing it accurately—just as teaching to the test is completely different from providing good instruction and assessing it accurately. "By focusing on improving test scores," two researchers warn, "only test scores, and not schools themselves, will improve."67
Notice that scores typically plummet whenever a state or district decides to administer a new test. (And the headlines read: Our schools are failing! Our students are ignorant!) After a few years, the scores begin to rise as students and teachers get used to the test. (And the headlines read: Our schools are improving! Tougher standards are effective!)68 Another kind of evidence comes from stories like the one about a junior high school in New Jersey where an intensive test-prep effort succeeded in producing the highest scores in the area—after which one third of the students required remedial classes when they got to high school.69 They weren't helped to learn; they were helped to get good scores, which did them no good and may even have done them considerable harm.
What all this means can be summarized in a sentence: At best, high test scores for a given school or district are probably meaningless; at worst, they're actually bad news because of the kind of teaching that was done to produce those scores.
To talk about the kind of teaching that was done is to talk about the kind of teaching that was not done. The first thing to go in a school or district where these tests matter a lot is a more vibrant, integrated, active, effective kind of instruction. A Cambridge, Massachusetts, teacher of seventh- and eighth-grade students70 can tick off exactly what she's had to sacrifice in order to prepare her students for that state's new test. Class meetings to build community, learn democratic skills, and solve problems together? No time. The flexibility to depart from the lesson plan and discuss important current events? Forget it; today's news won't be on the exam. A while back, this teacher had devised a remarkable unit in which every student picked an activity that he or she cared about and then proceeded to become an expert in it. Each subject, from baking to ballet, was researched intensively, described in a detailed report, and taught to the rest of the class. The idea was to hone research and writing skills as well as to help each student feel like an expert in something and to heighten everyone's appreciation for the craft involved in activities they may not have thought much about. In short, it was the kind of academic experience that people look back on years later as a highlight of their time in school. But now her students won't have the chance: "Because we have so much content material to cover, I don't have the time to do it," this teacher says ruefully. "I mean, I've got to do the Industrial Revolution because it's going to be on the test."
One leading educational organization has noted that "ironically, the calls for excellence in education that have produced widespread reliance on standardized testing may have had the opposite effect—mediocrity."71 But it's important to point out that this effect shows up in some areas more than others. The noted high school reformer Ted Sizer recalls a conversation he once had with a top school official who was "proposing to test the kids until they begged for mercy." It turned out, you may not be surprised to learn, that this administrator sent his own kids to private schools where standardized tests were exceedingly rare if they were used at all and, incidentally, where excellence meant an emphasis on discovery rather than a "back-to-basics" mentality.72
The point isn't that this official was caught in a political faux pas for pulling his children out of the public schools he was supposed to be defending. Rather, he seemed to think the traditional approach to education, including a heavy diet of standardized testing, is for other people's children—and, as it turns out, particularly for children of color. Even apart from charges that some standardized tests are biased against minorities because of their content,73 such tests—with all the implications for teaching they carry—are more likely to be used and emphasized in schools with higher percentages of minority students.74 The result is that even people who are understandably desperate to improve inner-city schools wind up making the problem worse when they cause reform efforts to be framed in terms of improving standardized test scores.
Indeed, the whole conversation about improving education in this country has been narrowed by the use of these tests. The more that scores are emphasized, as Sherman Dorn at the University of South Florida has pointed out, the less we discuss the proper goals of schooling. Now it's just a matter of finding the most efficient means for what has become the de facto goal: doing better on tests. Furthermore, we tend to stop using (or developing) other ways for evaluating classroom practices and student learning: "As long as a school or teacher has adequate test scores, what happens in the classroom is irrelevant," Dorn remarks. Similarly, poor test scores are viewed as indicators that change is needed, "no matter what happens in the classroom."75
For a parent, the implications of all this are straightforward. In the words of Gail Jones, a professor of education at the University of North Carolina, "The bottom line is, do you want to have a child who can take tests well or do you want to have a well-educated child?"76