CHAPTER 11

The Facts About Teachers and Test Scores

CLAIM  Teachers determine student test scores, and test scores may be used to identify and reward effective teachers and to fire those who are not effective.

REALITY  Test scores are not the best way to identify the best teachers.

Many educators hoped that No Child Left Behind’s emphasis on high-stakes testing would diminish when Barack Obama was elected. Unfortunately, President Obama’s Race to the Top adopted the same test-based accountability as NCLB. The two programs differed in one important respect: where NCLB held schools accountable for low scores, Race to the Top held both schools and teachers accountable. States were encouraged to create data systems to link the test scores of individual students to individual teachers. If the students’ scores went up, the teacher was an “effective” teacher; if the students’ scores did not go up, the teacher was an “ineffective” teacher. If schools persistently had low scores, the school was a “failing” school, and its staff should be punished. No excuses for failure, said the corporate reformers; we can’t wait, children have only one chance. We must close their schools and fire their teachers now, for the sake of the children.

Educators say that every child can learn, but they understand that children learn at different rates and that some inevitably learn more than others. Educators recognize that some children have more advantages and a faster start than others. Some have disabilities that interfere with their learning. Not all children start at the same place and not all children end at the same place, and test scores are not the only way to determine whether children are learning. Even the Obama administration’s use of “growth scores,” which measure increases in test scores from year to year, places far too much emphasis on standardized tests.

In the corporate reform mythology, every child can learn, and there can be no excuses for those who don’t. If they don’t get higher test scores every year, it is the fault of their teachers, whose expectations are low. Anyone who suggests that students’ family life or poverty might have anything to do with their test scores is just making excuses for bad teachers. Reformers believe that “highly effective” teachers can cause their students’ scores to go up every year. Lesser teachers cannot; those who can’t produce, they believe, must be found and fired without delay. Reformers say that an effective teacher can bring about three times as much learning in a year as a hapless, ineffective teacher. Or they say that three effective teachers in a row will change the life chances of their students, but a run of ineffective teachers will ruin students’ lives forever.

By the spring of 2010, the narrative of the corporate reformers was fully formed. In Central Falls, Rhode Island, a tiny and impoverished district, the local superintendent threatened to fire every teacher in the high school because its test scores were low. The state superintendent of schools supported the idea; so did Secretary of Education Arne Duncan and President Obama. None of the teachers received individual evaluations, but our nation’s leaders agreed they should all be fired. A cover story in Newsweek spelled out the reform narrative. On the cover was emblazoned the headline “The Key to Fixing American Education,” and behind it, written again and again, as if on a blackboard, was the solution: “We must fire bad teachers, we must fire bad teachers.”1

Where did these ideas come from? One important source is the work of the statistician William Sanders in Tennessee, who began his career advising agricultural and manufacturing industries. Sanders claimed that his statistical modeling could determine how much “value” a teacher added to her students’ testing performance. By monitoring each student’s progress on standardized tests from year to year, Sanders figured, he could isolate the “value added” by that student’s teacher. Because each student was compared with his or her own prior test scores, Sanders reasoned, the student’s racial and socioeconomic characteristics became unimportant. In effect, Sanders treated student learning as a finite quantity, with the teacher as the variable. The students’ test score increases or losses could be attributed to the teacher. In his studies, an effective teacher was one who produced large test score gains year after year. Based on Sanders’s work, reformers concluded that three effective teachers in a row could close the achievement gap.2

Other economists said it might take four great teachers in a row, or even five great teachers in a row, to close the gaps, but the reformers usually preferred to stay with the claim of three years. (Of course, if all children have a great teacher, and all children are making the same gains, the achievement gap won’t close, but that’s another issue.) The reformers often repeat the claim that three “great” or “effective” teachers in a row would close the test score gap between black and white children, between rich and poor children, between Hispanic and white children. Michelle Rhee cited this supposed finding many times; she said, for example, at the University of Southern California in 2011: “We know for poor minority children, if they have three highly effective teachers in a row, versus three ineffective teachers in a row, it can literally change their life trajectory.”3

Arne Duncan often said something similar: “Three great teachers in a row, and the average child will be a year and a half to two grade levels ahead. Three bad teachers in a row, and that average child might be so far behind they might never catch up.”4

A variation on the same theme is that a great teacher produces three times as much learning in a year as a poor teacher. Or, put another way, the students of the great teacher get a test score gain of eighteen months in a year, while the students of the poor teacher learn only six months’ worth of whatever they studied in a year. The Stanford economist Eric Hanushek wrote in 2010 that the difference in effectiveness among teachers “is truly large, with some teachers producing 1½ years of gain in achievement in an academic year while others with equivalent students produce only ½ year of gain. In other words, two students starting at the same level of achievement can know vastly different amounts at the end of a single academic year due solely to the teacher to which they are assigned. If a bad year is compounded by other bad years, it may not be possible for the student to recover.”5 Perhaps such “great” teachers exist, but there is no evidence that they exist in great numbers or that they can produce the same feats year after year for every student.6

In the fall of 2010, the documentary film Waiting for “Superman” popularized the idea in the national media that American public education was in desperate condition because there were so many bad teachers in the schools. At the same time, a group of urban superintendents led by Joel Klein and Michelle Rhee published a “manifesto” about how to fix the schools, which asserted: “So, where do we start? With the basics. As President Obama has emphasized, the single most important factor determining whether students succeed in school is not the color of their skin or their ZIP code or even their parents’ income—it is the quality of their teacher.”7 Their manifesto asserted that teachers’ credentials, experience, and education were irrelevant in judging their quality. The only thing that matters, they argued, is “performance,” meaning the test scores of their students.

Klein and Rhee misquoted President Obama. President Obama had said that “the biggest ingredient in school performance is the teacher. That’s the biggest ingredient within a school. But the single biggest ingredient is the parent.”8 Richard Rothstein described the Klein-Rhee manifesto as a “caricature” and added:

Decades of social science research have demonstrated that differences in the quality of schools can explain about one-third of the variation in student achievement. But the other two-thirds is attributable to non-school factors … What President Obama means is that if a child’s parents are poorly educated themselves and don’t read frequently to their young children, or don’t use complex language in speaking to their children, or are under such great economic stress that they can’t provide a stable and secure home environment or proper preventive health care to their children, or are in poor health themselves and can’t properly nurture their children, or are unable to travel with their children or take them to museums and zoos and expose them to other cultural experiences that stimulate the motivation to learn, or indeed live in a zip code where there are no educated adult role models and where other adults can’t share in the supervision of neighborhood youth, then children of such parents will be impeded in their ability to take advantage of teaching, no matter how high quality that teaching may be.9

Social scientists generally agree that students’ families (especially family income, which determines advantages and opportunity) have an even bigger impact on student performance than their school or teachers. According to some economists, family accounts for about 60 percent of the variation in test scores; the school (its leadership, its staff, its resources, its programs, and such matters as the presence or absence of peer effects, that is, the presence or absence of willing students) is responsible for about 20–25 percent of the variation. Within the 20–25 percent attributable to the school, teachers are the biggest component affecting how students perform on tests, possibly as much as 15 percent. President Obama accurately said that the teacher matters most within the school, but “the biggest ingredient” in students’ academic performance is their family.10 (Personally, I am skeptical about these precise statistical calculations about large and complex human activities, but I am not an economist, so what do I know?)

Yet the myth persists that the teacher is primarily responsible for student scores and that great teachers can overcome the influence of family, poverty, disability status, language proficiency, and students’ own levels of interest and ability. Certainly, there are many people whose lives were changed by one teacher, but their stories typically describe teachers who were unusually inspiring, not “the teacher who raised my test scores to the top.” Teachers do have the power to change lives. But after more than a decade of No Child Left Behind, researchers are still searching for a nonselective school or a district where every student, regardless of his or her starting point, has achieved proficiency on state tests because that school or that district has only effective teachers.

Despite the absence of evidence, the claims persist. On its Web site, Michelle Rhee’s organization StudentsFirst says, “Research shows that a highly effective teacher generates 50% more learning than an average teacher. Conversely, an ineffective teacher generates 50% less learning than an average teacher. This means that kids learn three times more in a highly effective teacher’s classroom than in an ineffective teacher’s classroom.”11 Presumably, if a school hired and retained only those highly effective teachers, there would be dramatic gains in student test scores for all students. But Rhee doesn’t seem to understand that very few teachers get the same high test score gains year after year. In 2012, Melinda Gates said in a television interview, “An effective teacher in front of a student, that student will make three times the gains in a school year that another student will make.”12 She said that the job of the Gates Foundation is “to make sure we create a system where we can have an effective teacher in every single classroom across the United States.”
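The “three times” figure is arithmetic on the two 50 percent figures, not a separate finding. A minimal sketch, assuming that a year of learning can be expressed as a single number relative to an average teacher (an assumption the claim itself requires, and one this chapter questions):

```python
# Learning per year, measured relative to an average teacher (average = 1.0).
# Treating learning as one number is an assumption made only for illustration.
AVERAGE = 1.0
highly_effective = AVERAGE * 1.5  # "50% more learning than an average teacher"
ineffective = AVERAGE * 0.5       # "50% less learning than an average teacher"

ratio = highly_effective / ineffective
print(ratio)  # 3.0

# The same ratio falls out of Hanushek's figures: 1.5 years of gain versus 0.5.
hanushek_ratio = 1.5 / 0.5
print(hanushek_ratio)  # 3.0
```

The ratio holds only if the same teacher produces the same gain every year, which is exactly the point in dispute.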

Given the reformers’ conviction that the teacher is the key to raising test scores dramatically for every student, they had to find a strategy to identify those highly effective teachers and get rid of those who didn’t have the right stuff.

In his academic studies and in Waiting for “Superman,” Eric Hanushek proposed that public schools should fire the 5–10 percent of teachers whose students got the lowest scores. If that happened, he said, the United States would rise nearly to the top of international test rankings. Moreover, he argued, replacing those bottom-of-the-barrel teachers with average teachers would add trillions of dollars to the nation’s gross domestic product. He wrote:

U.S. achievement could reach that in Canada and Finland if we replaced with average teachers the least effective 5 to 7 percent of teachers, respectively. Assuming the lower-bound estimate of teachers’ impact, U.S. achievement could reach that in Canada and Finland if we replaced with average teachers the least effective 8 to 12 percent of teachers, respectively …

Closing the achievement gap with Finland would, according to historical experience, have astounding benefits, increasing the annual growth rate of the United States by 1 percent of GDP. Accumulated over the lifetime of somebody born today, this improvement in achievement would amount to nothing less than an increase in total U.S. economic output of $112 trillion in present value. (That was not a typo—$112 trillion, not billion.)13

Hanushek suggested that there were three ways to get this dramatic improvement in teacher quality. One was to recruit higher-caliber teachers; another was to improve the skills of current teachers. But he maintained that both these methods had been tried and found inadequate. Instead, he recommended “deselection” of the bottom teachers based on their performance, defined as the test scores of their students. But school districts and states would need to change their policies, he believed, to attract and retain the kinds of teachers who could produce amazing test scores:

They would need recruitment, pay, and retention policies that allow for the identification and compensation of teachers on the basis of their effectiveness with students. At a minimum, the current dysfunctional teacher-evaluation systems would need to be overhauled so that effectiveness in the classroom is clearly identified. This is not an impossible task. The teachers who are excellent would have to be paid much more, both to compensate for the new riskiness of the profession and to increase the chances of retaining these individuals in teaching. Those who are ineffective would have to be identified and replaced. Both steps would be politically challenging in a heavily unionized environment such as the one in place today.

Although Hanushek is associated with the Hoover Institution at Stanford, his views were embraced by the Obama administration’s Race to the Top program and lauded by Republican governors across the nation, such as Scott Walker in Wisconsin, John Kasich in Ohio, Mitch Daniels in Indiana, Jeb Bush and his successor, Rick Scott, in Florida, and Chris Christie in New Jersey. Even Democratic governors like Dannel Malloy in Connecticut and Andrew Cuomo in New York endorsed the belief that low test scores were caused by “bad” or “ineffective” teachers, not by poverty and not by the relationship between resources and student needs.

Hanushek’s theory that test scores will improve by “deselecting” teachers whose students receive low test scores got a huge boost in 2012 with the highly publicized release of a study by the economists Raj Chetty and John N. Friedman of Harvard University and Jonah E. Rockoff of Columbia University. The Chetty study reviewed the records of students and teachers in the 1990s, before the advent of high-stakes testing, and concluded that students who had an effective teacher for a single year would have higher lifetime earnings and other benefits. The study was announced on the front page of The New York Times, where one of the authors said, “The message is to fire people sooner rather than later.” The study said that replacing a poor teacher with an average teacher would raise a single classroom’s lifetime earnings by $266,000.14 President Obama was so impressed by the Chetty study that he referred to it a few weeks later in his State of the Union address, saying, “We know a good teacher can increase the lifetime income of a classroom by over $250,000.”

However, critics were quick to raise questions about the study. They said that the authors may have confused correlation with causation (a class that gets higher test scores is also likelier to go to college and earn more) and that a large-scale study cannot pinpoint the effects of individual teachers. More than one critic pointed out that a lifetime gain of $266,000 for a class of twenty-six children, engaged in the labor force for forty years, translated to about $250 a year, or $5 a week. It would be even less for a larger class. As Bruce Baker observed, “What this boils down to is that a student can get a lifetime boost of $5 a week if we now spend billions of dollars on value-added rating systems. Maybe. Or maybe not.”15
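The critics’ arithmetic can be checked directly. The sketch below is a plain division with no discounting or inflation adjustment, and it assumes a 52-week year; the function name and parameters are illustrative, not drawn from the study.

```python
def per_student_gain(class_gain, class_size, working_years, weeks_per_year=52):
    """Split a classroom's lifetime earnings gain into per-student amounts.

    Plain division: no discounting or inflation adjustment (an assumption
    of this sketch). weeks_per_year=52 is likewise an assumption.
    """
    per_student = class_gain / class_size
    per_year = per_student / working_years
    per_week = per_year / weeks_per_year
    return per_student, per_year, per_week


# Figures from the text: $266,000 per class, 26 children, 40 working years.
lifetime, yearly, weekly = per_student_gain(266_000, 26, 40)
print(f"per student, lifetime: ${lifetime:,.0f}")
print(f"per student, per year: ${yearly:,.0f}")
print(f"per student, per week: ${weekly:.2f}")

# A larger class spreads the same classroom gain more thinly.
_, _, weekly_30 = per_student_gain(266_000, 30, 40)
print(f"per week in a class of 30: ${weekly_30:.2f}")
```

For a class of twenty-six, this works out to roughly $256 a year, or just under $5 a week, in line with the figures the critics cited.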

None of the enthusiasts of value-added assessment recognized that nations at the top of the international league tables did not get there by “deselecting” teachers whose students got low test scores. Nations such as Finland, Canada, Japan, and South Korea spend time and resources improving the skills of their teachers, not selectively firing them in relation to student test scores.

Nonetheless, what entered the reform lexicon was a fixed belief that bad teachers must be found out and fired.

But then came the knotty problem: How can a school district measure teacher quality? How can district leaders know which teachers should get bonuses and which should be fired? The only way to answer these questions, reformers believe, is to collect test scores every year and then see which teachers got those big gains and which ones didn’t. Then rank the teachers from top to bottom. Once the ranking is done, according to reform theory, the teachers whose students got the big gains get bonuses and the ones whose students got no gains get fired. Eventually, if this is done consistently, the district ends up with only great teachers.

Some districts and states have already collected enough data to rank teachers by the test score gains of their students. Whether the rankings are accurate or not, some teachers have gotten bonuses and some have been fired. But no district has yet demonstrated the reformers’ thesis that firing teachers based on student test scores will bring about great increases for the district. Despite the oft-repeated claims by reformers that three years in a row of great teachers will close the gap, no school district has ever done it, not even districts with a superintendent and school board fully supportive of the corporate reform faith and without a teachers’ union to stand in the way. It remains a theory based on speculation, not evidence.

One reason it is hard to prove the theory is that the ratings are unstable from year to year. A teacher may be rated effective one year but ineffective the next. And the fact that the top-rated teachers produce gains large enough to close the achievement gap in three to five years doesn’t necessarily matter much if you cannot identify the teachers who have this impact year after year. Only a small proportion of teachers gets big test score gains year after year, so it may be difficult to find enough of them to staff an entire school, let alone an entire school district. As Matthew Di Carlo of the Shanker Institute has pointed out, “Because of the imprecision of these growth models, various sources of bias, and year-to-year variation in students and conditions, very few teachers manage to be ‘top’ teachers for three, four or five consecutive years. A huge chunk of the ‘top’ teachers in year one are average—or even below average—in year two. Even more of them fall out of the ‘top’ bracket in the third, fourth, and fifth years.”16
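A toy simulation can show why a noisy rating makes multi-year “top” streaks rare. This is a hedged sketch, not a reconstruction of any district’s growth model: it assumes each teacher has a fixed true effect, that each year’s rating is that effect plus fresh random noise, and that “top” means the highest-rated 20 percent. Every name and parameter here is hypothetical.

```python
import random


def top_streak_shares(n_teachers=1000, years=5, noise_sd=1.0,
                      top_share=0.2, seed=0):
    """Share of year-one 'top' teachers still rated 'top' in every year so far.

    Hypothetical model: rating = fixed true effect + new random noise each year.
    Returns one share per year; the first entry is always 1.0.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0.0, 1.0) for _ in range(n_teachers)]
    cutoff = int(n_teachers * top_share)
    streak = None  # teachers rated 'top' in every year seen so far
    shares = []
    for _ in range(years):
        ratings = [t + rng.gauss(0.0, noise_sd) for t in true_effect]
        ranked = sorted(range(n_teachers), key=lambda i: ratings[i], reverse=True)
        top = set(ranked[:cutoff])
        streak = top if streak is None else streak & top
        shares.append(len(streak) / cutoff)
    return shares


# With no noise the ranking never changes, so every 'top' teacher stays on top.
print(top_streak_shares(noise_sd=0.0))
# With noise, the share can only hold steady or shrink from year to year.
print(top_streak_shares(noise_sd=1.0))
```

How quickly the share shrinks depends entirely on the assumed noise level, so the output illustrates the mechanism Di Carlo describes rather than estimating any real district’s numbers.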

Another reason it is hard to prove the theory is that teachers are not factory workers who can be shifted from spot to spot as if they were on an assembly line. The teacher who is highly effective in one school may not be equally effective in another. But we can’t know for sure, because no one has tried to move teachers around to prove the theory that three great teachers in a row will close the achievement gap for an entire school or district. Not yet, anyway.

While it seems certain that some teachers are excellent and others are not, the theory is based on some wobbly claims. The very concept of value-added assessment reflects the mind-set of statisticians and economists who measure productivity gains. A farmer plants corn of a certain variety in a certain type of soil, treats it with certain conditions, and then measures the growth of the crop to determine the worthiness of the treatment. In the context of value-added assessment, the teacher is the treatment. If the teacher is effective, the corn grows to a certain height. If the teacher is not, the corn does not grow or grows very little.

But children are not corn. They are not seeds or plants with fixed characteristics. Children’s lives are not static. They have crises and ups and downs in their home lives and their personal lives. Maybe their parents got divorced. Maybe a parent lost her job. Maybe a student broke up with her boyfriend or totaled the family car. Maybe a family member died. Maybe the family moved to a new home. Maybe they were evicted from their home. These changes affect motivation, attention, and school performance. Children are not crops. They are not empty vessels waiting to be filled by a teacher.

In addition, the conditions for the teacher do not remain static. There may be more or fewer high-scoring students assigned to the teacher’s class. Class size may increase because of state budget cuts. The curriculum and instructional materials may be better or worse this year. The school leader may change and be more or less supportive. Valued colleagues may retire. The school climate may be tranquil or disruptive. Any number of changes in the school may affect the teacher’s classroom, the availability of resources and support, and ultimately the test scores of students.

The problems with value-added assessment are legion. Students are not randomly assigned, so teachers face different challenges every year. An excellent teacher may have a highly motivated group of students one year, while an equally effective teacher may be assigned a class with two or three troublemakers, who disrupt the class. Some teachers are deliberately assigned high-performing or low-performing students, or choose to teach one group or the other. One teacher gets great results, the other does not, but they faced different challenges, and the comparison is unfair.

The American Educational Research Association (AERA) and the National Academy of Education (NAE) prepared a joint statement about the problems with value-added assessment. They found that students’ test scores are influenced by far more than their teacher, and the various statistical models don’t account for all these factors. The other factors include:

• school factors such as class sizes, curriculum materials, instructional time, availability of specialists and tutors, and resources for learning (books, computers, science labs, and more)

• home and community supports or challenges

• individual student needs and abilities, health, and attendance

• peer culture and achievement

• prior teachers and schooling, as well as other current teachers

• differential summer learning loss, which especially affects low-income children

• the specific tests used, which emphasize some kinds of learning and not others, and which rarely measure achievement that is well above or below grade level17

Value-added ratings, they emphasized, are not stable. They vary from class to class, from year to year, and from one way of measuring to another. There are different ways to calculate value added, and the results will vary depending on which method is used. Different applications of value-added methodology produce different teacher ratings. When students take a different test, the teacher ratings also change.

The report by these two professional associations found that “teachers’ value-added ratings are significantly affected by differences in the students who are assigned to them.” Students are not randomly assigned. Those who teach students who are English-language learners or who have disabilities or who are homeless or who have poor attendance might have lower value-added ratings. Also, teachers of the gifted are likely to see small value added, because their students begin with high scores. “Even when the model includes controls for prior achievement and student demographic variables, teachers are advantaged or disadvantaged based on the students they teach.” Thus, to the extent that teachers’ job evaluations and compensation are tied to value-added measures, they may feel encouraged to avoid the neediest students, the students who are going to jeopardize their reputations, their careers, and their salaries.

Advocates of value-added assessment claim that they want to improve education for the neediest students by identifying the most effective teachers. They presume that over time, as the weakest teachers are fired, only effective teachers will remain. But given the instability of the measures, and the threat to teachers’ livelihoods, value-added assessment may well harm the most vulnerable students. Current levels of inequality will deepen if teachers are incentivized to shun the students with the highest needs. Schools in high-poverty districts already have difficulty retaining staff and replacing them. Who will want to teach in schools that are at risk of closing because of the students they enroll?

The very concept of “value added” assumes that it is possible to isolate the effects of a single teacher on student achievement. But, says the joint AERA-NAE panel, this is overly simplistic:

No single teacher accounts for all of a student’s learning. Prior teachers have lasting effects, for good or ill, on students’ later learning, and current teachers also interact to produce students’ knowledge and skills. For example, the essay writing a student learns through his history teacher may be credited to his English teacher, even if she assigns no writing; the math he learns in his physics class may be credited to his math teacher. Specific skills and topics taught in one year may not be tested until later, if at all. Some students receive tutoring, as well as help from well-educated parents. A teacher who works in a well-resourced school with specialist supports may appear to be more effective than one whose students don’t receive these supports.

Children are not corn or tomatoes, and no statistical methodology can successfully control for all the factors that influence changes in students’ test scores. When analyzing the growth of cornstalks, we take into account the quality of the seeds, soil, water, wind, sunlight, weather, nutrients, pests, and perhaps other factors as well as the skill of the farmer. Measuring learning is far more complex than measuring agricultural production and involves many more factors because human beings are even less predictable than plants.

The champions of value-added assessment could learn from Harvey Schmidt and Tom Jones, whose musical The Fantasticks got it right:

               Plant a radish.

               Get a radish.

               Never any doubt.

               That’s why I love vegetables;

               You know what you’re about!

               Plant a turnip.

               Get a turnip.

               Maybe you’ll get two.

               That’s why I love vegetables;

               You know that they’ll come through!

               They’re dependable!

               They’re befriendable!

               They’re the best pal a parent’s ever known!

               While with children,

               It’s bewilderin’.

               You don’t know until the seed is nearly grown

               Just what you’ve sown.

Another complicating factor in the creation of value-added rankings is that they are based completely on standardized tests. But are the tests robust enough to serve as a proxy for teacher quality? The tests are not barometers or yardsticks. They are designed and constructed by humans and subject to error. Testing experts warn about measurement error, statistical error, human error, and random error. Given all the problems with standardized tests, and given the limited range of knowledge and skills that they test, can we be sure that they are truly an adequate or appropriate measure of student learning or teacher effectiveness? One can easily imagine a teacher who spends most of the year drilling her students to take the state tests. That teacher may get a high value-added rating yet be an uninspiring teacher. Do we want to honor and reward only those teachers who excel at teaching to the test? Or do we want to honor those teachers who are best at getting their students to think and ask good questions?

The cardinal rule of psychometrics is this: a test should be used only for the purpose for which it is designed. The tests are designed to measure student performance in comparison to a norm; they are not designed to measure teacher quality or teacher “performance.” Teaching is multifaceted and complex. Good teachers want students to participate in discussion and debate in the classroom; they want students to be active and engaged learners and to take the initiative in exploring more than what was assigned. Can standardized, multiple-choice tests accurately reflect teacher quality? What students have learned may be gauged more accurately by their classroom work and by their independent projects—their essays, their research papers, and other demonstrations of their learning—than by their test scores.

Certainly teachers should be evaluated, but evaluating them by the rise or fall of their students’ test scores is fraught with perverse consequences. It encourages teaching to multiple-choice tests; narrowing the curriculum only to the tested subjects; gaming the system by states and districts to inflate their scores; and cheating by desperate educators who don’t want to lose their jobs or who hope to earn a bonus. When the tests become more important than instruction, something fundamental is amiss in our thinking.

Some districts and states are trying to avoid narrowing the curriculum by expanding testing beyond reading and mathematics; they intend to test the arts, physical education, science, and everything else that is taught. They are doing this to create the data to evaluate all teachers. Students will be tested more so their teachers can be evaluated more. As the current national obsession with testing intensifies, we can expect to see more testing, more narrowing of the curriculum, more narrowing of instruction to only what is tested, more cheating, and less attention to teaching students to think, to discuss, to consider different ways to solve problems, and to be creative.

Linda Darling-Hammond, a Stanford University professor who is one of the nation’s leading experts on the subject of preparing and evaluating teachers, lost her enthusiasm for evaluation by test scores as she saw the confusing and misleading results in such places as Tennessee, Houston, the District of Columbia, and New York City.18

Darling-Hammond concluded that the teacher ratings “largely reflect whom a teacher teaches, not how well they teach. In particular, teachers show lower gains when they have large numbers of new English-learners and students with disabilities than when they teach other students. This is true even when statistical methods are used to ‘control’ for student characteristics.”

Why punish teachers for choosing to teach the students with the greatest needs or for being assigned to a class with such students?

If the goal of teacher evaluation is to help teachers improve, this method doesn’t work. It doesn’t provide useful information to teachers or show them how to improve their practice. It just labels and ranks them in ways that teachers find demeaning and humiliating. Darling-Hammond noted that Houston used a value-added method to fire a veteran who had been the district’s teacher of the year. Another teacher in Houston said: “I teach the same way every year. [My] first year got me pats on the back. [My] second year got me kicked in the backside. And for year three, my scores were off the charts. I got a huge bonus. What did I do differently? I have no clue.”

In 2010, the Los Angeles Times commissioned its own value-added analysis, based on nothing but test scores, and published the rankings of thousands of teachers. This initiated a national controversy about the ethics of publishing teachers’ job ratings. No one claimed that instruction improved as a result.19 The flaws of value-added analysis set off another heated debate in early 2012, when the New York City Department of Education publicly released the names and ratings of thousands of teachers. Rupert Murdoch’s New York Post filed a freedom-of-information request for the ratings, which the teachers’ union opposed, citing the ratings’ inaccuracy. Mayor Michael Bloomberg contended that parents and the public had a right to know the teacher ratings. After the newspaper won the court battle with the union, the scores were released and widely published. The Department of Education warned the public about a large margin of error: On a 100-point scale, the margin of error in mathematics was 35 percentage points; the margin of error in reading was 53 points. In other words, a teacher of mathematics might be ranked as a 50 but might in fact be anywhere from the 15th percentile to the 85th percentile. In reading, the same teacher might improbably be at the −3rd percentile or the +103rd percentile, which demonstrates how useless the rankings were.20
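The arithmetic behind those ranges can be checked with a short sketch. The function names (`percentile_range`, `clamp`) are illustrative only and are not part of any Department of Education method; the sketch simply adds and subtracts the reported margin from a rank of 50, on the assumption that the margin applies symmetrically.

```python
def percentile_range(rank, margin):
    """Return the (low, high) band implied by a rank and a +/- margin of error."""
    return rank - margin, rank + margin


def clamp(value, low=0, high=100):
    """Restrict a value to the valid percentile scale (0 to 100)."""
    return max(low, min(high, value))


# Margins of error reported in the text, applied to a teacher ranked at 50.
for subject, margin in [("mathematics", 35), ("reading", 53)]:
    low, high = percentile_range(50, margin)
    print(f"{subject}: raw band {low} to {high}, "
          f"valid band {clamp(low)} to {clamp(high)}")
```

Once the reading band is clamped to the valid scale, it covers every possible percentile from 0 to 100, so a rating of 50 says nothing about where that teacher actually stands.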

The New York Post printed a story and photograph of the city’s “best” teacher and its “worst” teacher. The teacher who was allegedly the worst was hounded by reporters at her home, as was her father. A few days later, it was revealed that she was a teacher of new immigrant students, who left her class as they learned English. She worked in a good school, and the principal said she was an excellent teacher. What was gained by giving her a low rating and putting her name and photograph in the newspaper? She suffered public humiliation because she taught English-language learners.21

Stated as politely as possible, value-added assessment is bad science. It may even be junk science. It is inaccurate, unstable, and unreliable. It may penalize those teachers who are assigned to teach weak students and those who choose to teach children with disabilities, English-language learners, and students with behavioral problems, as well as teachers of gifted students who are already at the top.

So we circle back to the assertion that is common among reformers, that three great teachers in a row will close the achievement gap. Is it true? It is possible, but there is no statistical method today that can accurately predict or identify which teachers are “great” teachers. If by “great” we mean teachers who awaken students’ desire to learn, who kindle in their students a sense of excitement about learning, scores on standardized tests do not identify those teachers. Nothing about a multiple-choice test is suited to finding the most inspiring and the most dedicated teachers in every school. In every school, students, teachers, and supervisors know who those teachers are. We need more of them. We will not get them by continuing to turn teachers into testing technicians or judging teachers by inappropriate statistical models.

If by “great” we mean teachers who can get their students to produce higher scores every time they are tested, the current value-added assessments may identify some teachers who can do this. But, to my knowledge, there is no school in which every teacher achieves this target. Claiming, as reformers do, that one day every classroom will have a teacher who can produce extraordinary test score gains for every student, no matter what his or her circumstances, is simply not leveling with the American public. No nation in the world has achieved 100 percent proficiency. And no other nation in the world evaluates its teachers by the rise or fall of their students’ test scores.

It is not even clear that this is a worthy goal.

Aside from the absence of evidence for this way of evaluating teachers, there remains the essential question of why scores on standardized tests should displace every other goal and expectation for schools: character, knowledge, citizenship, love of learning, creativity, initiative, and social skills.