May 2, 2010

The theater of basing a majority of evaluation on test scores

Now that SB 6 is dead, a governor's task force on RTTT has reached a compromise in a single day, and there appears to be some direction for teacher evaluation in Florida that is acceptable to Florida's K-12 teachers unions, it's time to take stock of the rhetorical stance of SB 6 supporters that a "majority" of a teacher's evaluation had to depend on student test scores. I've seen this pop up in other states, so it's a common rhetorical stance. Let's get a few things off the table first: this is not based on any research, and the supporters have no clearer idea of what "majority of a teacher's evaluation" might mean than supporters of the "65% solution" had of what spending money in a classroom meant. For that matter, neither did I as a skeptic (about either proposal).

So the "majority on test scores" stance is political, then. That's fine as a minimal statement; almost all decisions about pay structures are political in a broad sense rather than based on research, and to some extent they're reactive. Teacher pay scales became standardized to protect bureaucratic structures from (and sometimes in response to) accusations of corruption, and the single salary schedule is a response historically to gross pay inequity.

I'll go further: I don't think there's a way to avoid political values embedded in pay structures. Once you involve public money and a service most people connect with citizenship (education), you've got politics, however well structured and justified by reference to neutral statements of organizational need. On that level, performance pay is justifiable from the sense of satisfying public perceptions about how teachers should be paid. That was explicit in Denver's ProComp plan: the voters approved higher taxes in return for a performance-pay structure.

The problem with the "majority based on test score" position is twofold. One is the obvious one: it's divisive, and many parents and other community members are offended by the idea. Here, Diane Ravitch spoke for millions when she criticized SB 6. But there's another problem: it obscures the evaluation process rather than clarifying it. By reference to an implied point-based system, it fails to focus on what matters in a teacher evaluation system in terms of either an algorithm or underlying concepts.

I've written a bit about point-based systems, and because the focus of my paper was elsewhere, I didn't have a chance to talk about the limit of point-based scoring systems: it matters not where you can earn points but where you might lose points. I learned this in high school when I was a debater: individual raters have an implied comfortable range for scores, and it's the range of scores that matters, not the total number of points available in different categories. If raters have different effective ceilings as well as ranges (i.e., it is impossible for people to earn perfect scores with some raters, while others commonly hand out full marks), then the raters with the largest ranges of scores exert more power over final results than raters who have a very narrow range.

Similarly, components of any point-based system will have differential impact on final results when they have broader ranges in practice, regardless of the proportion of the scale that derives from individual components. Imagine a teacher evaluation system with 100 points. Suppose 60 points come from student test scores, and the range is restricted for most teachers to between 52 and 60 points. On the other hand, suppose 30 points in this hypothetical evaluation come from direct observation, and the range of scores is between 10 and 30 (and more than a handful of teachers may earn the low score). Which component has the greatest influence on final results? It's the 30-point direct-observation component in this thought experiment, because in this hypothetical example teachers can lose more than twice as many points there (up to 20) as through student test scores (up to 8).

But the "majority of evaluation" rhetoric does more than obscure the real power in point-based systems: it obscures the question of what teachers are responsible for. "Outcomes!" says the supporter. Right, I say: that doesn't say a darned thing about the types of outcomes that will make the difference in evaluation. In Florida, Louisiana, and other states where people have pushed a majority-from-test-scores approach, the push has been to create a mandate and defer the implementation to a regulatory process. That's a nice illusionist's trick if you can get away with it, but the process of implementation always mediates absolutist mandates, and by deferring it the legislature gives up control over what mediates the test scores.

There are three ways I can see that test scores' impact on evaluations would be mediated in any system (and yes, I'm including SB 6 here): ad hoc (i.e., caprice), by reference to student disadvantage (i.e., blame-shifting), or by reference to teacher behaviors in classrooms (i.e., standards of practice). Without any legislative guidance, ad hoc and capricious mediation is likely (probably by the temperament and philosophy of the administrator with the greatest authority over evaluation). More destructive than ad hoc mediation would be blame-shifting: a teacher would be held blameless if someone else/something else (poverty, language, presumed parental neglect, etc.) can be blamed instead. Bad, bad idea.

Of the three options that come to mind tonight, mediating test scores by professional standards of practice seems the most productive. But then that raises the central question: if the use of test scores is inevitably subject to mediation, and the best choice for that mediation is through professional standards of practice, why not base evaluation on professional standards of practice to begin with--for example, to let an evaluation that documents effective practice create the rebuttable presumption of effectiveness?

The answer here is twofold. One part is that there are no agreed-upon standards of practice for teaching more generally, other than crude and obvious standards (don't beat your students) or standards defined by reference to effects (keep your students' attention). The other is that even if there were agreed-upon standards of practice, the process would be sufficiently messy as to irritate the sensibilities of those who advocate the putatively cleaner "majority from test score" approach.

The result is that instead of getting a messy but constructive system based on developing standards of practice, any system that putatively bases the majority of a teacher's evaluation on test scores is going to get ad hoc or blame-shifting mediation through the back door.

Update: Linda Perlstein noticed the 50% rhetoric and should get credit for the pattern recognition. Consultants' advice? Hmmn... looks like an interlocking-directorate phenomenon (no conspiracy needed).

Posted in Accountability Frankenstein on May 2, 2010 10:24 PM