## July 25, 2009

### Temporizing and teasing on tests and teacher evaluation

I still don't have time to expand at length on combining qualitative and quantitative sources of data for teaching evaluation, but given the hoopla surrounding the draft Race to the Top regulations, I should at least provide an update, or rather a bit of a tease for what's developing into a short paper-to-be. In addition to my fairly general understanding of some technical issues, I'm developing the argument that any point-based system for combining professional judgment and test scores needs to avoid fixed weights for the components of the system.

The explanation is not that technical, and I can sketch it here: the benefit of a truly Bayesian approach to using test scores to evaluate teachers is a reciprocal relationship between the decision-making authority of professional judgment and the power of other data (including test scores). A forceful judgment by professionals reduces the power of test scores in such a system, while tepid judgments increase the power of test scores. That is one possible solution to the thorny question of relative weights: if educators are willing to judge their own, test scores are less important (addressing the concerns of teachers unions and many administrators), but if educators are *not* willing to judge their own, test scores are more important (addressing the concerns of those criticizing the very low proportion of teachers given poor evaluations).
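To make the reciprocal relationship concrete, here is a minimal illustrative sketch (my own, not part of the original argument): treat professional judgment as a normal prior and the test-score estimate as normal data, and combine them with the standard precision-weighted update. A forceful judgment means a confident prior (small standard deviation), a tepid judgment a diffuse one. All the numbers are hypothetical, chosen only to show the mechanism.

```python
def posterior_rating(prior_mean, prior_sd, score, score_sd):
    """Precision-weighted (normal-normal Bayesian) combination of a
    professional-judgment prior and a test-score estimate."""
    prior_prec = 1.0 / prior_sd ** 2   # confidence of the judgment
    score_prec = 1.0 / score_sd ** 2   # reliability of the score
    return (prior_prec * prior_mean + score_prec * score) / (prior_prec + score_prec)

# A forceful judgment (tight prior) barely moves toward the test score...
forceful = posterior_rating(prior_mean=2.0, prior_sd=0.2, score=4.0, score_sd=1.0)

# ...while a tepid judgment (diffuse prior) lets the test score dominate.
tepid = posterior_rating(prior_mean=2.0, prior_sd=3.0, score=4.0, score_sd=1.0)

print(round(forceful, 2), round(tepid, 2))  # → 2.08 3.8
```

The same prior mean and the same test score produce very different results: the decisive judgment holds the rating near 2.0, while the hesitant one drifts most of the way to the score of 4.0.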

In a point-based system with fixed weights (or fixed percentages of the total) assigned to individual components, you don't have a structure with a reciprocal relationship between the exercise of professional judgment and the authority of test-score data. But I think the dynamic benefits of a Bayesian approach can be created in a point system, as long as the weights are not fixed. I need to think through the potential approaches, but it's possible.

There: that's the tease.

Posted in Accountability Frankenstein on July 25, 2009 3:16 PM