January 3, 2006

Nichols, Glass, and Berliner

Today, Education Policy Analysis Archives publishes High-Stakes Testing and Student Achievement by Sharon L. Nichols (U. Texas San Antonio), Gene V Glass, and David C. Berliner (both of Arizona State University). It's a provocative article. The authors report that an extensive search for relationships between high-stakes testing and state-level performance on NAEP, from the late 1980s through a few years ago, found evidence that high-stakes testing increased achievement only for fourth-grade math, and then only in a limited fashion. Because I commented on an earlier draft of this document, one that appeared as a report in September (my name's in black and white as a reviewer on the last page of the appendices), readers may well be curious about the process followed on receipt of the manuscript. There needs to be some care when members of a board submit a manuscript to the journal and when an editor has seen a manuscript before.

I received the manuscript in late March and assigned it to five reviewers, sending each a blinded copy. Because of the manuscript's length, I gave reviewers a considerable part of the summer to return remarks. After receiving them, I stripped the names off the comments and sent the compilation to a trusted member of the editorial board who knew the identities of neither the authors nor the reviewers. I asked the board member to make a decision on publication as a proxy editor: publish essentially as is, ask the authors to revise and resubmit, or reject. Based on the comments and his or her own sense of the manuscript, the proxy editor for this manuscript decided to ask for a revision. I conveyed the proxy editor's statement, comment, and all of the reviews (a total of 18 pages) to the authors in early September.

When the authors returned a revised version to me in the fall, it was several revisions removed from the version I had commented on earlier in the process. I decided that it would be acceptable, on balance, to make the final decision myself, based on the revisions and the earlier comments of the reviewers and my proxy editor, Les McLean of the University of Toronto. (There is also a question of how fair it would have been to Professor McLean to ask him to continue to serve as a proxy editor for such a long manuscript.) The result is a long article but one that is considerably tighter and stronger than either the report that I saw in draft form or any of the versions in between. I take full responsibility for the decisions first to remove myself from the initial review and then to make a final publication decision.

This article continues a debate over the effects of high-stakes testing, in the forum of a professional journal where the merits of the research can properly be debated. Nichols, Glass, and Berliner develop two new measures of pressure: a cross-sectional measure relying on accumulated ratings of many reviewers, which they call the APR, and a proxy longitudinal measure using an expert whose cross-sectional ratings correlate positively with their APR. This article will not end the debate, and I don't think that the authors expect it to. But it continues the development of rating high-stakes pressure in a new direction, and it contributes significantly to this literature. I look forward to the continued debate.

Update (3:16 pm EST): Thanks to A.G. Rud, a member of the editorial board who caught a misspelling.

