CHAPTER 5

The Facts About Test Scores

CLAIM  Test scores are falling, and the educational system is broken and obsolete.

REALITY  Test scores are at their highest point ever recorded.

Critics have complained for many years that American students are not learning as much as they used to or that academic performance is flat. But neither of these complaints is accurate.

We have only one authoritative measure of academic performance over time, and that is the National Assessment of Educational Progress, known as NAEP (pronounced “nape”). NAEP is part of the U.S. Department of Education. It has an independent governing board, called the National Assessment Governing Board. By statute, the governing board is bipartisan and consists of teachers, administrators, state legislators, governors, businesspeople, and members of the general public.

President Clinton appointed me to that board, and I served on it for seven years. I know that the questions asked on its examinations are challenging. I am willing to bet that most elected officials and journalists today would have a hard time scoring well on the NAEP tests administered across the nation to our students. Every time I hear elected officials or pundits complain about test scores, I want to ask them to take the same tests and publish their scores. I don’t expect that any of them would accept the challenge.

Critics may find this hard to believe, but students in American public schools today are studying and mastering far more difficult topics in science and mathematics than their peers forty or fifty years ago. People who doubt this should review the textbooks in common use then and now or look at the tests then and now. If they are still in doubt, I invite them to go to the NAEP Web site and review the questions in math and science for eighth-grade students. The questions range from easy to very difficult. Surely an adult should be able to answer them all, right? You are likely to learn, if you try this experiment, that the difficulty and complexity of what is taught today far exceed anything the average student encountered in school decades ago.

NAEP is central to any discussion of whether American students and the public schools they attend are doing well or badly. It has measured reading and math and other subjects over time. It is administered to samples of students; no one knows who will take it, no one can prepare to take it, no one takes the whole test. There are no stakes attached to NAEP; no student ever gets a test score. NAEP reports the results of its assessments in two different ways.

• One is by scale scores, ranging from 0 to 500. Scale scores reflect what students know and can do. A scale score is like a scale that tells you how much you weigh but offers no judgment about what you should weigh.

• The other is achievement levels, in which the highest level is “advanced,” then “proficient,” then “basic,” and last “below basic.” Achievement levels are judgments set by external panels that determine what students should know and be able to do.

To see how these two measures work, consider the reporting of scores for fourth-grade mathematics. If we were looking at the scale scores, we would learn that the scale score in the year 2000 was 226; by 2011, it was 241. The score is higher, but there is no qualitative judgment about what it ought to be. The maximum on the scale is 500, but there is no expectation that the nation will one day score 500 or that a score of 241 can be translated to mean 241/500. It is not a grade of 48 percent. It is not a passing grade or a failing grade. It is a trend line, period.

If you take the same fourth-grade mathematics report and look at the achievement levels, you will learn that 65 percent scored at basic or above in 2000, and 82 percent were at basic or above in 2011. Unlike the scale score, which shows only the direction of the trend, the achievement levels represent a judgment about how well students are performing.

The NAEP governing board authorized the establishment of achievement levels in the early 1990s in the hope that they would give the public a clearer understanding of student performance than scale scores alone. Critics of the achievement levels complained at the time that the process was rushed and that the standards might be flawed and unreasonably high. But a member of the governing board, Chester E. Finn Jr., said it was necessary to move forward promptly and not to let the perfect become the “enemy of the good” for fear of sacrificing “the sense of urgency for national improvement.”1

The critics were right. The achievement levels have not led to better understanding. Instead, the public is confused about what expectations are appropriate. The achievement levels present a bleak portrait of what students know and can do and, like No Child Left Behind, create the expectation that all students ought to be proficient.

All definitions of education standards are subjective. People who set standards use their own judgment to decide what students ought to know and how well they should know it. People use their own judgment to decide the passing mark on a test. None of this is science. It is human judgment, subject to error and bias; the passing mark may go up or down, and the decision about what students should know in which grades may change, depending on who is making the decisions and whether they want the test to be hard or easy or just right. All of these are judgmental decisions, not science.
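To see how much rides on these judgment calls, here is a minimal sketch, using entirely made-up scale scores and made-up cut points rather than actual NAEP data or NAEP’s standard-setting procedure. The same group of students can look successful or unsuccessful depending solely on where a panel places the bar:

```python
# Illustrative sketch only: hypothetical scale scores and hypothetical cut points,
# not actual NAEP data and not NAEP's standard-setting procedure.

def percent_at_or_above(scores, cut_score):
    """Share of students whose scale score meets or exceeds the cut score."""
    return 100 * sum(s >= cut_score for s in scores) / len(scores)

# A made-up distribution of fifteen scale scores on the 0-500 scale.
scores = [195, 210, 222, 228, 231, 237, 240, 244, 249, 252, 258, 264, 271, 280, 295]

# The same students look very different depending on where the judges set the bar.
for label, cut in [("lenient cut", 230), ("middling cut", 245), ("strict cut", 260)]:
    print(f"{label} at {cut}: {percent_at_or_above(scores, cut):.0f}% labeled proficient")
```

With these invented numbers, the same fifteen students come out roughly 73 percent, 47 percent, or 27 percent “proficient.” Nothing about the students changes; only the judgment about where to set the cut score does.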

Here are definitions of NAEP’s achievement levels:

“Advanced” represents a superior level of academic performance. In most subjects and grades, only 3–8 percent of students reach that level. I think of it as A+. Very few students in any grade or subject score “advanced.”

“Proficient” represents solid achievement. The National Assessment Governing Board (NAGB) defines it as “solid academic performance for each grade assessed. This is a very high level of academic achievement. Students reaching this level have demonstrated competency over challenging subject matter, including subject matter knowledge, application of such knowledge to real-world situations, and analytical skills appropriate to the subject matter.” From what I observed as a member of the NAGB who reviewed questions and results over a seven-year period, a student who is “proficient” earns a solid A or at least a strong B+.

“Basic,” as defined by the NAGB, is “partial mastery of prerequisite knowledge and skills that are fundamental for proficient work at each grade.” In my view, the student who scores “basic” is probably a B or C student.

“Below basic” connotes students who have a weak grasp of the knowledge and skills that are being assessed. This student, again in my understanding, would be a D or below.

The film Waiting for “Superman” misinterpreted the NAEP achievement levels. Davis Guggenheim, the film’s director and narrator, used the NAEP achievement levels to argue that American students were woefully undereducated. The film claimed that 70 percent of eighth-grade students could not read at grade level. That would be dreadful if it were true, but it is not. NAEP does not report grade levels (grade level describes the midpoint of student performance in a grade: half of students score above it and half below). Guggenheim assumed that students who were not “proficient” on the NAEP were “below grade level.” That is wrong. Actually, 76 percent of eighth graders on NAEP score at basic or above, and 24 percent are below basic. It would be good to reduce the proportion who are “below basic,” but it is 24 percent, not the 70 percent that Guggenheim claimed.2

Michelle Rhee, the former chancellor of the District of Columbia public schools, makes the same error in her promotional materials for her advocacy group called StudentsFirst. She created this organization after the mayor of Washington, D.C., was defeated and she resigned her post. StudentsFirst raised millions of dollars, which Rhee dedicated to a campaign to weaken teachers’ unions, to eliminate teachers’ due process rights, to promote charter schools and vouchers, and to fund candidates who agreed with her views. Her central assertion is that the nation’s public schools are failing and in desperate shape. Her new organization claimed, “Every morning in America, as we send eager fourth graders off to school, ready to learn with their backpacks and lunch boxes, we are entrusting them to an education system that accepts the fact that only one in three of them can read at grade level.” Like Guggenheim, she confuses “grade level” with “proficiency.” The same page has a statement that is more accurate, saying, “Of all the 4th graders in the U.S., only ⅓ of them are able to read this page proficiently.” That’s closer to the NAEP definition, yet it is still a distortion, akin to saying it is disappointing that only ⅓ of the class earned an A. But to deepen the confusion, the clarifying statement is followed by “Let me repeat that. Only one in three U.S. fourth-graders can read at grade level. This is not okay.” So, two out of three times, Rhee confuses “proficiency” (which is a solid A or B+ performance) with “grade level” (which means average performance).3

What are the facts? Two-thirds of American fourth graders were reading at or above basic in 2011; one-third were reading below basic. Thirty-four percent achieved “proficiency,” which is solid academic performance, equivalent to an A. Three-quarters of American eighth graders were reading at or above basic in 2011; a quarter were reading below basic. Thirty-four percent achieved “proficiency,” equivalent to a solid A. (See graph 5; graphs 5–41 appear in the appendix.)

Unfortunately, you can’t generate a crisis atmosphere by telling the American public that there are large numbers of students who don’t earn an A. They know that. That is common sense. Ideally, no one would be “below basic,” but that lowest rating includes children who are English-language learners and children with a range of disabilities that might affect their scores. Only in the dreams of policy makers and legislators is there a world where all students reach “proficiency” and score an A. If everyone scored an A or not less than a B+, the reformers would be complaining about rampant grade inflation—and they would be right.

In recent years, reformers have complained that student achievement has been flat for the past twenty years. They make this claim to justify their demand for radical, unproven strategies like privatization. After all, if we have spent more and more and achievement has declined or barely moved for two decades, then surely the public educational system is “broken” and “obsolete,” and we must be ready to try anything at all.

This is the foundational claim of the corporate reform movement.

But it is not true.

Let’s look at the evidence.

NAEP has tested samples of students in the states and in the nation every other year since 1992 in reading and mathematics.

Here is what we know from NAEP data. There have been significant increases in both reading and mathematics, more in mathematics than in reading. The sharpest increases were registered in the years preceding the implementation of NCLB, from 2000 to 2003.4

Reading scores in fourth grade have improved slowly, steadily, and significantly since 1992 for almost every group of students. (See graph 6.)

• The scale scores in reading show a flat line, but this is misleading. Every group of students saw gains, but the overall line looks flat because of an increase in the proportion of low-scoring students. This is known to statisticians as Simpson’s paradox.5 (A numerical sketch of this effect follows this list.)

• The proportion of fourth-grade students who were proficient or advanced increased from 1992 to 2011. In 1992, 29 percent of students were proficient or above; in 2011, it was 34 percent.

• The proportion of fourth-grade students who were “below basic” declined from 38 percent in 1992 to 33 percent in 2011.

• The scores of white students, black students, Hispanic students, and Asian students in fourth grade were higher in 2011 than in 1992. The only group that saw a decline was American Indian students.6 (See graphs 7, 8, 9, and 10, which show rising scores for whites, blacks, Hispanics, Asians, but not for American Indians.)
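Here is a minimal numerical sketch of Simpson’s paradox, using invented scores and group sizes rather than actual NAEP figures. Every group improves, yet the overall average stays flat because the lower-scoring group becomes a larger share of the test takers:

```python
# Hypothetical numbers chosen only to illustrate Simpson's paradox,
# not actual NAEP results.

def overall_average(groups):
    """Combined average score, given (number of test takers, average score) pairs."""
    total_students = sum(n for n, _ in groups)
    return sum(n * score for n, score in groups) / total_students

# Earlier year: 70 higher-scoring students averaging 225, 30 lower-scoring averaging 195.
earlier = [(70, 225), (30, 195)]
# Later year: BOTH groups gain 6 points, but the lower-scoring group grows to half the sample.
later = [(50, 231), (50, 201)]

print(overall_average(earlier))  # 216.0
print(overall_average(later))    # 216.0 -- the combined average looks flat
```

In this made-up example, both groups gain six points, yet the combined average is 216 in both years. That is the pattern in the reading data: gains within every group, masked in the overall average by a growing share of lower-scoring students.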

Reading scores in eighth grade have improved slowly, steadily, and significantly since 1992 for every group of students.

• The proportion of eighth-grade students who were proficient or advanced increased from 1992 to 2011. In 1992, 29 percent of students were proficient or above; in 2011, it was 34 percent. (See graph 11.)

• The proportion of eighth-grade students who were “below basic” declined from 31 percent in 1992 to 24 percent in 2011.

• The scores of white students, black students, Hispanic students, Asian students, and American Indian students in eighth grade were higher in 2011 than in 1992. (See graphs 12, 13, 14, and 15.)

Don’t believe anyone who claims that reading has not improved over the past twenty years. It isn’t true. NAEP is the only gauge of change over time, and it shows slow, steady, and significant increases. Students of all racial and ethnic groups are reading better now than they were in 1992. And that’s a fact.

Mathematics scores in fourth grade have improved dramatically from 1990 to 2011.

• The proportion of fourth-grade students who were proficient or advanced increased from 1990 to 2011. In 1990, 13 percent of students were proficient or above; in 2011, it was 40 percent. (See graph 2.)

• The proportion of fourth-grade students who were “below basic” declined from 50 percent in 1990 to an astonishingly low 18 percent in 2011.

• The scores of white students, black students, Hispanic students, Asian students, and American Indian students in fourth grade were higher in 2011 than in 1992. (See graphs 16, 17, 18, 19, and 20.)

Mathematics scores in eighth grade have improved dramatically from 1990 to 2011. (See graph 21.)

• The proportion of eighth-grade students who were proficient or advanced increased from 1990 to 2011. In 1990, 15 percent were proficient or above; in 2011, it was 35 percent. (See graph 22.)

• The proportion of eighth-grade students who were “below basic” declined from 48 percent in 1990 to 27 percent in 2011. (See graph 22.)

• The scores of white students, black students, Hispanic students, Asian students, and American Indian students in eighth grade were higher in 2011 than in 1992. (See graphs 23, 24, 25, 26, and 27.)

As it happens, there is another version of NAEP that the federal government has administered since the early 1970s. The one I described before is known as the “main NAEP.” It tests students in grades 4 and 8; scores on the main NAEP reach back to 1990 or 1992, depending on the subject. It is periodically revised and updated.

The alternative form of NAEP is called the “long-term trend assessment.” It dates back to the early 1970s and tests students who are ages nine, thirteen, and seventeen (which roughly correspond to grades 4, 8, and 12). The long-term trend NAEP contains large numbers of questions that have been used consistently for more than forty years. Unlike the main NAEP, the content of the long-term trend NAEP seldom changes, other than to remove obsolete terms like “S&H Green Stamps.” The long-term trend NAEP is administered to scientific samples of students every four years.

Both the main NAEP and the long-term trend NAEP show steady increases in reading and mathematics. Neither shows declines. The long-term tests hardly ever change, so they provide a consistent yardstick over the past four decades.

Here are the changes in the long-term trend data in mathematics, from 1973 to 2008:7

The overall score does not reflect the large gains that were made over the past four decades, again because of Simpson’s paradox. Each of the four major groups of students saw significant gains. (See graphs 28 and 29.)

White students over the past forty years show impressive gains: age nine, up 25 points; age thirteen, up 16 points; age seventeen, up 4 points.

Black students over the past forty years show remarkable gains: age nine, up 34 points; age thirteen, up 34 points; age seventeen, up 17 points.

Hispanic students also show remarkable gains: age nine, up 32 points; age thirteen, up 29 points; age seventeen, up 16 points.

On the main NAEP, from 1990 to 2011, here are the data for mathematics:

White students: fourth grade, up 29 points; eighth grade, up 23 points. (See graphs 16 and 23.)

Black students: fourth grade, up 36 points; eighth grade, up 25 points. (See graphs 17 and 24.)

Hispanic students: fourth grade, up 29 points; eighth grade, up 24 points. (See graphs 18 and 25.)

Asian students: fourth grade, up 31 points; eighth grade, up 28 points. (See graphs 19 and 26.)

In reading, the changes are less dramatic, but they are steady and significant.

On the long-term trend assessments, these were the changes in reading from 1971 to 2008:

White students: age nine, up 14 points; age thirteen, up 7 points; age seventeen, up 4 points.

Black students: age nine, up 34 points; age thirteen, up 25 points; age seventeen, up 28 points.

Hispanic students: age nine, up 25 points; age thirteen, up 10 points; age seventeen, up 17 points.

Compare this with gains on the main NAEP reading from 1992 to 2011:

White students: fourth grade, up 7 points; eighth grade, up 7 points. (See graphs 7 and 12.)

Black students: fourth grade, up 13 points; eighth grade, up 12 points. (See graphs 8 and 13.)

Hispanic students: fourth grade, up 9 points; eighth grade, up 11 points. (See graphs 9 and 14.)

Asian students: fourth grade, up 19 points; eighth grade, up 7 points. (See graphs 10 and 15.)

NAEP data show beyond question that test scores in reading and math have improved for almost every group of students over the past two decades: slowly and steadily in the case of reading, dramatically in the case of mathematics. Students know more and can do more in these two basic skills subjects now than they could twenty or forty years ago.

Why the difference between the two subjects? Reading is influenced to a larger extent by differences in home conditions than mathematics is. Put another way, students learn language and vocabulary at home and in school; they learn mathematics in school. Students can improve their vocabulary and background knowledge by reading literature and history at school, but their starting point in reading is influenced more by home and family than it is in mathematics.

So the next time you hear someone say that the system is “broken,” that American students aren’t as well educated as they used to be, that our schools are failing, tell that person the facts. Test scores are rising. Of course, test scores are not the only way to measure education, but to the extent that they matter, they are improving. Our students have higher test scores in reading and mathematics than they did in the early 1970s or the early 1990s. Of course, we can do better. Students should be writing more and reading more and doing more science projects and more historical research papers and should have more opportunities to engage in the arts.

But let’s recognize the progress that our educators and students have made, give credit where credit is due, and offer educators the encouragement and support to continue their important work.