September 5, 2006

Jay Mathews discusses the prospects for national testing

Sunday's column by Washington Post writer Jay Mathews focuses on renewed arguments for a national test that every student would take. The fulcrum of the argument is the dramatic difference between the proportion of students each state labels proficient based on its own test and the proportion labeled proficient on the National Assessment of Educational Progress. I think. I suspect this is a piece largely engaging in celebrity-wonk showcasing (no chance that a president with approval ratings in the 30s will successfully push any major education initiative that rocks political pathways, as I explain below), but I'll address two issues raised in the article.

Standards vs. Cut-scores

Jerry Bracey has been going after NAEP's value-laden labels for the achievement categories for a decade or more, and his pre-publication e-mail to Jay Mathews gives you a sense of what he thinks of this reporting. Essentially, Bracey's argument is and has been that NAEP's labels don't mean much: The governing board set "basic" to be fairly close to the mean scores many years back, which makes one wonder what they thought the majority of students were doing in school. That's an interesting counterweight to the hype in this story (and others) that students in the U.S. are mediocre.

The truth is that neither point has much weight except for political symbolism. The levels or bands on NAEP are ordinal in the sense that scores in higher bands represent greater achievement on the NAEP scale than scores in lower bands. Yes, Bracey is correct that the values chosen have no inherent meaning. Moses did not come down from the mountain and have tablets written that such-and-such a score on the NAEP is proficiency. That's true of all cut-scores. But I think Bracey is missing the forest for the trees.

The greater sin of the reporting by Mathews is the confusion of standard-setting with cut-score setting. The rhetorical flourishes to justify a national test this go-round (oh, my, the states and feds don't agree on the proportion proficient!) imply that if the stats disagree, the states must not be setting standards, and a test must do that.

That's balderdash or policy bravado, and I'm not sure which. Oh, and it might just be sloppy reporting, too.

For more than 15 years, the education policy ether has been filled with discussion of standards and alignment. Standards are sets of statements of what we think students should learn and be able to demonstrate. Here is an example of a standard one might write for history: "Students should know the reasons for the writing of the Declaration of Independence and the ways in which people have used its rhetoric for political and social purposes." I don't think any state has such a standard, because it crosses historical periods and countries, but you could write one.

In the theory of action of standards and alignment, the establishment of standards would determine both the focus of instruction and the scope of assessment. That hypothetical history standard would probably force a reorganization of history teaching if it were central to a course, for example, because it would break down the neat compartmentalization of the Declaration in the 18th-century "unit" (though there might be a text that pushes students to ask such questions). Back when Lauren Resnick was our standards theory king, the tests were supposed to be challenging, assessing higher-order thinking, and so we needn't worry too much if teachers taught to the test. The 1994 reauthorization of the Elementary and Secondary Education Act had all of that high-falutin' stuff in there, taking the standards piece of the Goals 2000 legislation and saying, "Thou must apply this standards stuff to schools with high concentrations of poverty, too." In 1997, the reauthorization of the federal special-education law (the Individuals with Disabilities Education Act) added "... and to students with disabilities as well."

How far we have fallen, if one of our star education reporters can't see the differences among standards, tests, and cut scores.

I don't know if Mathews is accurately reflecting the views of those he interviewed (I figure that if a journalist gets 70% of the facts correct, he or she is doing a decent job), but any attempt to use a test to "set standards" is getting things backwards. Don't we first decide what we want students to do? Of course, part of the problem with the debate is how the backwards reasoning is promoted by the AYP requirements in No Child Left Behind. You want high-stakes testing? Fine: Let's see how You, Ms. Education Commissioner/Superintendent, game the system. First, you say you agree with the goal that 100% of students will demonstrate proficiency in reading and math by 2014. Then you quickly ask your staff what cut score would allow your state to declare AYP in most schools for a number of years, until the policy changes or you retire or run for higher office, as you intended to anyway. Next step: Define proficiency so you get that cut score, and then finally figure out some way to tie that notion of proficiency to your standards. If you're currently paying for off-the-shelf commercial tests, or you haven't written standards, you get to design this set of links from scratch.

No, not all states have done this. Some have overpromised and are reaping the rewards of such expectations by having the state label most schools as Peachy-Keen while the state AYP definition labels most as Not Meeting AYP. But the debate still revolves around differences in labels rather than whether instruction is decent and what evidence exists. And the talk about a national test says nothing about the expectations we should hold for students and schools. Does anyone think that any sort of national test would mean we'd first have an exercise in setting national education standards? Whooooooooooeeeeeee! We tried that in the late 1980s and early 1990s, and the result was some mediocre standards, some decent standards, and a set of history standards that was viciously lied about by the future Second Lady.

So if there were a national test every child takes, I predict that there would be a yawning gap between the test and any sense of real standards or expectations.

(For those who think of standards as a term of art in assessment with a different definition, please accept my apologies. I don't think there is an agreed-upon term-of-art called standards, so I'm going with the wonkish use of the word over the past 15 years. But I'll be happy to be educated on this.)

"Local control"

The phrase "local control" is policy shorthand that refers to the fact that education is not mentioned in the Constitution, that it really did use to be controlled at the community level, and that members of Congress jealously guard their states' ability to decide on policy within the state. Mathews's confusion is understandable, but it's important to note here that local control in education politics at the federal level refers more specifically to state decision-making and not control by a community. As an historian, I'm not sure how fair it is to term state politics local, since states wrested control away from rural and truly local communities early in the 20th century.

We would be more accurate if we talked about state police powers rather than local control, but I suspect we won't be very accurate here. Why should education policy debates be accurate?

But, in any case, Mathews and other observers are correct that the idea of national tests currently has about as good odds of coming true as John Kerry's getting 70% of the vote in Houston. Whether you call it state police powers or local control, few politicians with state or local constituencies (i.e., those in Congress) will have the stomach for much centralization at this point.

Also see commentary by Jim Horn. Update: also see Andy Rotherham's follow-up, which refers to a July 2006 explanation of cut-scores he wrote and the Fordham Foundation piece on standards and tests. The one thing I noticed right away was Checker's and his coauthors' inattention to baseball pop culture. The phrase is not "If you build it, they will come" but "If you build it, he will come." Mathews also implied that all of the wonkish folks surveyed in this not-quite-Delphi-questionnaire process were coauthors or endorsers of the Finn et al. approach, which is not true.

