February 14, 2007

Parsing growth and grossing the Parthenon

Kevin Carey criticizes Leo Casey's take on growth measures for evaluating teacher effectiveness. Casey cited a 2003 RAND Corp. study that cast doubt on using student-achievement growth measures to evaluate teachers (something the Aspen Commission has pushed).

Carey makes two points:

  1. How to use the imperfect data tools we currently have is a policy decision. For Carey, that kind of decision-making includes acknowledging and discounting technical flaws (in more than one sense of discounting).
  2. One possible reason there is little evidence that growth models can be used to judge teachers is resistance to their use in the U.S., outside Tennessee and a few other jurisdictions.

I completely agree with #1. Policymakers have the authority to tangle with the technical details of policy and with the implications of those details. Not only do I have no problem with this claim, but I argue the same point in Accountability Frankenstein. But that authority also implies a responsibility to exercise it, and I hope Carey understands that placing this marker down means he'll hold policymakers to making reasonable judgments on those technical details: no hand-waving, no displacing responsibility onto invisible bureaucracies, right? Of course, I doubt he or anyone else can point to a legislature that has set a cut score for any graduation or teacher-competency test... or bar exam, electrical contracting exam, general contractor's license exam, etc. No, I'm not arguing that legislatures should really do that, but Carey's point rests entirely on theoretical authority and barely acknowledges that legislatures generally do displace responsibility for technical details.

The second point is something I'm going to quibble with. Yes, Tennessee has had something called "value-added assessment" since the early 1990s, but I have yet to see evidence that Bill Sanders' system consistently distinguishes more than a small proportion of teachers (as either good or bad) from the vast, vast majority, and that's even assuming the validity of the TerraNova test results in Tennessee. Sanders acknowledges as much; it's partly an artifact of multilevel modeling in general, which tends to swallow a good portion of the variance originally in the data.

The "resistance" point makes sense only if you restrict the claim to the U.S., since the U.K. has been attempting multilevel modeling of longitudinal achievement far longer than anyone in the U.S. Go ask Harvey Goldstein what he'd say from the U.K. experience, or read his papers, such as Using Pupil Performance Data for Judging Schools and Teachers (PDF). Basic point: there's still little evidence that growth models are the holy grail for either school-level or teacher-level accountability. (Credit Goldstein for using "holy grail" to describe various fantasies of growth-model advocates.)

Extra credit to anyone who knows why I used "Parthenon" in the title of an entry referring to Tennessee, apart from the obvious spoonerism.

Posted in Accountability Frankenstein on February 14, 2007 8:05 PM