August 5, 2010

"Overcaffeinated value-added enthusiasts" and the public

I've developed a fondness for Rick Hess's phrasing. Whether or not I agree with him, I have to smile when he describes advocates of value-added-or-bust as "overcaffeinated value-added enthusiasts." This is in the context of the back-and-forth over value-added measures in the District of Columbia teacher firings (see blog entries by Aaron Pallas, Hess, Pallas, and Hess, from which I drew the overcaffeinated term). What we're seeing here is the beginnings of a public dialog over the technical details of value-added measures, whether in DC or here in Florida (see today's St. Pete Times story on two audit reports over Florida measures, plus articles from Jacksonville and Miami over continuing questions).

Or, rather, we're not seeing much of a dialog, more of a he said-she said dynamic. Pallas wrote from what was publicly available (which was as simplistic as what I read about Bill Sanders' techniques in newspapers in 1990s Tennessee), Hess criticized him for insufficient due diligence for an academic blogger, and we're now into round two on who owes whom what on transparency. Florida is slightly different in the actors: most of the critics this summer have been superintendents, worried about whether problems with the underlying test scores or value-added measures will end up shaming their elementary schools (and them) with lower ratings on the state's accountability system. But it's still he said-she said with the auditors saying they found no problems and the superintendents still having reservations.

What we're missing is clear reporting on the technical issues, but don't blame the reporters. In some cases, there is poor planning by state departments of education (or the DC schools, in this summer's news), so there's nothing clear and accurate and easy to communicate. In other cases, as in those jurisdictions using Bill Sanders' techniques, you've got a proprietary model that the public isn't allowed to inspect. And then there's the simple fact that there is no single holy grail of value-added measures, and that the inherent error issues tend to be underplayed because standard errors and measurement error are eyeglaze-inducing even if they're important. So the reporting is a far cry from the sensawunda reporting on scientists who uncovered BP's lowball estimates of the gusher's output: oh, wow, there are pools of oil under the surface? oh, wow, you can estimate flows from the speed of particles in a fluid? Nothing like that exists on value-added or growth measures.
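
To make the eyeglaze-inducing part a little more concrete, here is a minimal sketch in Python of what a standard error does to a firmly drawn cut line. Everything in it is an assumption for illustration: the function names, the cutoff of 0.0, and the estimates and standard errors are made up, not taken from IMPACT, Florida's system, or any Sanders model, and it treats the error around each estimate as roughly normal.

```python
from statistics import NormalDist


def interval_95(estimate, std_error):
    """Return an approximate 95% interval around a point estimate.

    Assumes the estimate's error is roughly normal (an illustrative
    assumption, not a property of any particular value-added model).
    """
    z = NormalDist().inv_cdf(0.975)  # about 1.96
    return (estimate - z * std_error, estimate + z * std_error)


def straddles_cutoff(estimate, std_error, cutoff):
    """True if the 95% interval contains the cutoff, meaning the data
    cannot clearly place the estimate on one side of the line."""
    low, high = interval_95(estimate, std_error)
    return low <= cutoff <= high


# Hypothetical numbers: two estimates, same standard error, same cut line.
CUTOFF = 0.0
for label, estimate, std_error in [("near the line", 0.05, 0.10),
                                   ("far from the line", 0.50, 0.10)]:
    low, high = interval_95(estimate, std_error)
    print(f"{label}: estimate={estimate:+.2f}, "
          f"interval=({low:+.3f}, {high:+.3f}), "
          f"straddles cutoff: {straddles_cutoff(estimate, std_error, CUTOFF)}")
```

With these made-up numbers, the interval around the first estimate runs from below the cutoff to above it, so the point estimate alone says little about which side of the line it belongs on; the second can be placed with more confidence. That is the kind of caveat that tends to vanish once a rating is published.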

Some part of the situation is inevitable when a technical apparatus becomes a tool of political discussion. I don't mean the partisan politicization of statistics (though that happens) but the fact that even mildly controversial bills do not pass in many legislative bodies unless there is a certain amount of pathos in the debate, and the exaggeration of debate tends to drown out the caveats for anything. There are plenty of very careful statisticians out there who can tell you the issues with value-added or growth measures. They're not quoted in news stories, because no editor is going to let "mixed-model BLUE algorithms tend to swallow dependent-variable variance before you get to the effect measures" appear in a newspaper. So there's a mismatch between the technical issues and the level of discussion. You shouldn't need someone with the skills of Robert Krulwich to report on technical measures affecting public policy, but that's where we are.

That feeds into the dichotomous debate that is dominated by the "let the measures work" and the "it's imperfect, so toss it out" arguments. As I wrote a year ago,

The difficulty in looking coldly at messy and mediocre data generally revolves around the human tendency to prefer impressions of confidence and certainty over uncertainty, even when a rational examination and background knowledge should lead one to recognize the problems in trusting a set of data. One side of that coin is an emphasis on point estimates and firmly-drawn classification lines. The other side is to decide that one should entirely ignore messy and mediocre data because of the flaws. Neither is an appropriate response to the problem.

When the rubber meets the road, you're sometimes going to get the firmly-drawn classification lines in Florida that lead people to nitpick technical details (I wonder how many of the superintendents griping this summer have bonuses tied to school grades), and you're going to get nebulous debates when systems such as IMPACT are not accompanied by technical transparency. This just doesn't work for me, and it shouldn't for you, either.

Posted in Education policy on August 5, 2010 10:54 AM