June 19, 2006

NBPTS gets snookered on evaluation

What was I going to say after waiting several weeks to discuss the hullaballoo over the Sanders report for the National Board for Professional Teaching Standards? It's been a quiet month in Lake Wobegon... no, that's not it. I've been busy. Yes, that's it. Let's get the links out of the way first, from Eduwonk to AFT, and Barnett Berry (and you can follow the links snowball-wise from there). Here's the gist: The NBPTS wanted to commission a study of the effects of certification on student achievement—or, more causally, evidence on whether nationally-certified teachers were more effective than non-certified teachers—and they chose someone with a certain amount of cachet nationally because he's been effective at promoting growth analysis (and his statistical model, specifically). Then the NBPTS looked foolish for appearing to quash the study, so they released it, but only after first releasing a summary and general criticism. They've got ostrich egg on their faces, collectively.

Before I read the study, I was prepared to say something like the following, given my prior criticism of Sanders: There's a difference between accountability and general research. Any accountability algorithm has to be public and transparent, to be fair and to be consistent with the goals of accountability (which include public and transparent information about student achievement). But while Sanders' model is proprietary (generally a bad thing, in my view), a version of it exists in the SAS Institute's PROC MIXED procedure, and statisticians have been able to play around with that enough to know how it behaves (and can reasonably extrapolate to Sanders, even if we'd all prefer he'd come clean). So let's treat this study as we treat all research: respect rigor and look for incremental contributions even if we might quibble about method.

I've read the study report now, and while I still see a difference between what we should expect for an accountability mechanism and what we might see as valuable in a single research project, I'm disappointed with the public version of the paper itself. As usual, Sanders included no demographic information as covariates, and the paper itself has very little information otherwise on methods. There is far more information in Dan Goldhaber's paper on national certification.

Here's the rub, from a reader's perspective: Sanders claims in large part (as he has before) that the primary difference between the papers is that he has random effects for teachers. Theoretically pure, I guess—I don't mean to ridicule the rationale for mixed models (there are good reasons to want to use them, if you have the data and the estimates converge), but unless you have data that you rework in different ways, you can't tell that it's the multilevel, mixed-effects model that makes the difference. Maybe it's in how you work with the scales (as some have suggested), or the covariates included, or the samples. I'd love to see each set of researchers hand their data over to the others. Then you'd get some better idea of how this all works (or doesn't).
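To make the point concrete, here's a toy simulation of my own (not drawn from either study, and the numbers are invented): if certified teachers tend to be assigned students with higher prior scores, then whether the analysis includes a prior-score covariate changes the estimated "certification effect" substantially—even with no random-effects machinery in sight. Which is why you can't pin the disagreement between papers on the mixed model alone without reanalyzing the same data.

```python
# Toy simulation (hypothetical numbers): the estimated "certification effect"
# depends on whether prior achievement is included as a covariate.
import numpy as np

rng = np.random.default_rng(0)
n_teachers, n_students = 200, 25

certified = rng.random(n_teachers) < 0.5
# Selection: certified teachers tend to get students with higher prior scores.
prior_mean = 50 + 5 * certified
# Teacher quality here is unrelated to certification: true effect is zero.
teacher_effect = rng.normal(0, 2, n_teachers)

cert_col, prior_col, score_col = [], [], []
for t in range(n_teachers):
    prior = rng.normal(prior_mean[t], 10, n_students)
    score = prior + teacher_effect[t] + rng.normal(0, 5, n_students)
    cert_col.extend([float(certified[t])] * n_students)
    prior_col.extend(prior)
    score_col.extend(score)

cert = np.array(cert_col)
prior = np.array(prior_col)
score = np.array(score_col)

def ols_coef(y, *covariates):
    """Least-squares fit with intercept; returns all coefficients."""
    X = np.column_stack([np.ones(len(y))] + list(covariates))
    return np.linalg.lstsq(X, y, rcond=None)[0]

naive = ols_coef(score, cert)[1]            # omits prior achievement
adjusted = ols_coef(score, cert, prior)[1]  # controls for prior achievement
print(f"naive estimate: {naive:.2f}, adjusted estimate: {adjusted:.2f}")
```

The naive model attributes the selection effect (about 5 points here) to certification; the adjusted model recovers something near the true zero. Two defensible-looking specifications, two very different conclusions—and neither one involves random effects.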

In the end, NBPTS suckered themselves into hiring a researcher for the wrong reasons without having a solid contract requiring that the resulting report would have enough information to be credible, and then they compounded the secrecy of Sanders's shop by not releasing the report immediately. Bad move, guys, all around.

Incidentally, don't ask me to comment much on the ABCTE study that has occasionally been touted as evidence of its effectiveness. Balderdash. Participants were already-certified teachers. If anything, it's just a validity study for the test itself.

Policy research is hard. It's very tempting to exaggerate the importance of any single study and to forget the incremental nature of the enterprise. The Coleman study in 1966 didn't prove anything. Neither did Goldhaber last year or Sanders et al. this year. And if you read someone's work without seeing any discussion of limitations, caveat lector. But that's always been true.

Posted in Education policy on June 19, 2006 11:39 PM