August 5, 2010

"Overcaffeinated value-added enthusiasts" and the public

I've developed a fondness for Rick Hess's phrasing. Whether or not I agree with him, I have to smile when he describes advocates of value-added-or-bust as "overcaffeinated value-added enthusiasts." This is in the context of the back-and-forth over value-added measures in the District of Columbia teacher firings (see blog entries by Aaron Pallas, Hess, Pallas, and Hess, from which I drew the overcaffeinated term). What we're seeing here is the beginnings of a public dialog over the technical details of value-added measures, whether in DC or here in Florida (see today's St. Pete Times story on two audit reports on Florida measures, plus articles from Jacksonville and Miami over continuing questions).

Or, rather, we're not seeing much of a dialog, more of a he said-she said dynamic. Pallas wrote from what was publicly available (which was as simplistic as what I read about Bill Sanders' techniques in newspapers in 1990s Tennessee), Hess criticized him for insufficient due diligence for an academic blogger, and we're now into round two on who owes whom what on transparency. Florida is slightly different in the actors: most of the critics this summer have been superintendents, worried about whether problems with the underlying test scores or value-added measures will end up shaming their elementary schools (and them) with lower ratings on the state's accountability system. But it's still he said-she said with the auditors saying they found no problems and the superintendents still having reservations.

What we're missing is clear reporting on the technical issues, but don't blame the reporters. In some cases, there is poor planning by state departments of education (or the DC schools, in this summer's news), so there's nothing clear and accurate and easy to communicate. In other cases, as in those jurisdictions using Bill Sanders' techniques, you've got a proprietary model that the public isn't allowed to inspect. And then there's the simple fact that there is no single holy grail of value-added measures, and that inherent error issues tend to be underplayed because standard errors and measurement error are eyeglaze-inducing even when they're important. So the reporting is a far cry from the sensawunda reporting on scientists who uncovered BP's lowball estimates of the gusher's output: oh, wow, there are pools of oil under the surface? oh, wow, you can estimate flows from the speed of particles in a fluid? Nothing like that exists on value-added or growth measures.

Some part of the situation is inevitable when a technical apparatus becomes a tool of political discussion. I don't mean the partisan politicization of statistics (though that happens) but the fact that even mildly controversial bills do not pass in many legislative bodies unless there is a certain amount of pathos in the debate, and the exaggeration of debate tends to drown out the caveats for anything. There are plenty of very careful statisticians out there who can tell you the issues with value-added or growth measures. They're not quoted in news stories, because no editor is going to let "mixed-model BLUE algorithms tend to swallow dependent-variable variance before you get to the effect measures" appear in a newspaper. So there's a mismatch between the technical issues and the level of discussion. You shouldn't need someone with the skills of Robert Krulwich to report on technical measures affecting public policy, but that's where we are.
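Since standard errors rarely survive the trip into a news story, a toy simulation can show why they matter. This is a hypothetical sketch, not any state's actual model, and every parameter below is invented for illustration: teacher "effects" estimated from a single class of 25 students carry standard errors large enough that most teachers end up statistically indistinguishable from average.

```python
import random

random.seed(42)

# Invented parameters for a hypothetical district
N_TEACHERS, CLASS_SIZE = 100, 25
STUDENT_SD = 15.0   # assumed student-level score noise
TEACHER_SD = 3.0    # assumed spread of true teacher effects

true_effects = [random.gauss(0, TEACHER_SD) for _ in range(N_TEACHERS)]

# Each teacher's estimated effect is the mean of noisy student scores
estimates = [
    sum(t + random.gauss(0, STUDENT_SD) for _ in range(CLASS_SIZE)) / CLASS_SIZE
    for t in true_effects
]

# Standard error of a class mean of 25 students
se = STUDENT_SD / CLASS_SIZE ** 0.5   # 15 / 5 = 3.0 points

# How many teachers have a 95% interval that excludes zero, i.e.,
# are statistically distinguishable from "average"?
distinguishable = sum(abs(e) > 1.96 * se for e in estimates)
print(f"SE of each estimate: {se:.1f} points")
print(f"{distinguishable} of {N_TEACHERS} teachers distinguishable from average")
```

In runs like this, the large majority of the simulated teachers fall inside the uncertainty band, which is exactly the caveat that gets drowned out in the pathos of legislative debate.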

That feeds into the dichotomous debate that is dominated by the "let the measures work" and the "it's imperfect, so toss it out" arguments. As I wrote a year ago,

The difficulty in looking coldly at messy and mediocre data generally revolves around the human tendency to prefer impressions of confidence and certainty over uncertainty, even when a rational examination and background knowledge should lead one to recognize the problems in trusting a set of data. One side of that coin is an emphasis on point estimates and firmly-drawn classification lines. The other side is to decide that one should entirely ignore messy and mediocre data because of the flaws. Neither is an appropriate response to the problem.

When the rubber meets the road, you're sometimes going to get the firmly-drawn classification lines in Florida that lead people to nitpick technical details (I wonder how many of the superintendents griping this summer have bonuses tied to school grades), and you're going to get nebulous debates when systems such as IMPACT are not accompanied by technical transparency. This just doesn't work for me, and it shouldn't for you, either.

July 30, 2010

"Pushback" week

It's almost as if Nick Anderson and Ruth Marcus worked at the same paper, because "pushback" appears to be the talking point of the week on education policy. Yesterday, Anderson reported, President Obama "pushed back" against some civil-rights groups' criticism of Race to the Top, and Marcus applauded him when the president "took the opportunity to push back." Oh, wait: they do work for the same paper. Well, at least we know that at the Post, some colleagues talk with each other, unlike the one who fired Dave Weigel last month and the other who hired him this month. Then again, the fools at the Post, Inc., appear to be management and bull-male columnists, not rank-and-file reporters.

There are four major stories that dominated national education news in the past week, at least as far as I was paying attention:

  • The drama surrounding the civil-rights group report and non-presser and the two major education speeches this week by Duncan and Obama.
  • Continuing problems in trying to attach state aid to federal bills (after the emergency war appropriations, there's the stalled small-business aid bill, which had jobs money attached).
  • Michelle Rhee's plans to fire several hundred teachers based on the IMPACT evaluation system.
  • The New York state testing cut-score embarrassment.

Pushback was used in the Post's coverage of the first story, but I think you can say it's a theme for the week. House and Senate members are now in almost open warfare over education jobs riders to bills (possibly extending to the FMAP aid to states on Medicaid, stuck in Congress since early this year). There is debate over how many teachers Rhee is firing and how bad a system IMPACT is. And Joel Klein is twisting himself in knots trying to explain how the mistakes in proficiency rates that he used to puff up his record really aren't a problem and, uh, Lady Gaga shows how good the New York City schools are. I'm half-expecting him to talk about New York's smoggy, swampy beauty; the East River, though, doesn't it split Park Slope from the Palisades? Someone get Bill Shatner to read Joel Klein's ratiocinations!

Some things behind the headlines that seem obvious to this historian:

  • Part of the loose (and fragile) coalition criticizing the Obama administration's turnaround policy stems from unions concerned about due process for employees and community-based organizations worried about the closure of public facilities in poor neighborhoods and the role of public employment in providing a leg up to the middle class. That's not new, and it's complicated. The civil-rights group interest in public employees can be salutary (my understanding is that Black teachers were a solid core of local NAACP chapters in the mid-20th century) but sometimes at cross-purposes with other interests: I heard informally from some observers that part of the pushback against the decentralization of Chicago schools in the late 1980s was the role of the central school bureaucracy in providing a leg up into the middle class, and the reduction of the central bureaucracy threatened those positions. Today, the invisible risk is the position of minority teachers' aides and other non-certified employees. My guess is that they've been disproportionately affected by school-system layoffs that try to hold onto classroom teachers.
  • I still don't have a clue how much test scores played a role in the firing of DC teachers, and my guess is that you don't, either. IMPACT included test scores, but you'd have to look at the details of individual employees to know whether an individual firing is a case where all the indicators (including the required five observations) pointed in the direction of an incompetent teacher or whether test scores trumped supervisory judgment for any. Normally employers have broad discretion in evaluation systems, but the failure to bargain IMPACT may put the DCPS in some jeopardy of an unfair labor practice finding. (That depends on both the structure of DC collective-bargaining law and the details of what happened with IMPACT and WTU's requests for bargaining.) Double jeopardy for Michelle Rhee: the inclusion of the pseudoscientific "learning styles" in the IMPACT observation system. My guess is that the AFT (the national affiliate for the Washington Teachers Union) can quickly get their hands on well-known psychologists to rip that to shreds for any teachers where the tipping factor was a supervisor's judgment that they didn't cater to student "learning styles."
  • Joel Klein's dancing around the cut-score fiasco in New York illustrates once again that the performative setting of cut scores is often a result of the tension between bravado and "reform testosterone," on the one hand, and politically acceptable failure and the political need to game the system, on the other. We'd like to think that cut-score setting is arbitrary in the sense of arbitration, but it's too often arbitrary in the sense of caprice and politics. Two years ago, Jennifer Jennings and I wrote a commentary for Teachers College Record ($$ required) about the dangers of trusting threshold-based proficiency percentages as opposed to central tendencies such as means and medians, with New York City as the object lesson. She's too mature for this, but I have no such reticence with last week's revelations: nyah nyah nyah, we told you so. And from those of us who warned years ago about the fragility of growth/value-added statistics? Same message.
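The threshold-versus-central-tendency point can be shown with a toy example (all scores invented): two cohorts can post identical percent-proficient figures while their means tell a very different story, because any gain that doesn't cross the cut score is invisible to a threshold measure.

```python
# Hypothetical proficiency cut score and two invented cohorts of ten
# students each. Cohort 2 shows large gains among low scorers, but no
# one crosses the cut, so "percent proficient" is flat.
CUT = 65

year1 = [40, 45, 50, 55, 60, 70, 75, 80, 85, 90]
year2 = [50, 55, 60, 62, 64, 70, 75, 80, 85, 90]

def pct_proficient(scores, cut=CUT):
    """Share of students at or above the cut score, as a percentage."""
    return 100 * sum(s >= cut for s in scores) / len(scores)

def mean(scores):
    return sum(scores) / len(scores)

print(f"Year 1: mean {mean(year1):.1f}, proficient {pct_proficient(year1):.0f}%")
print(f"Year 2: mean {mean(year2):.1f}, proficient {pct_proficient(year2):.0f}%")
```

The mirror image also holds: nudging a handful of students sitting just below the cut can make "proficiency" jump while the mean barely moves, which is the gaming incentive the commentary warned about.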

Bottom line here for administrators: test-based measures should be used as grounds to fire teachers or administrators only where they strongly point in the same direction as observation-based evaluation instruments that were developed with some common sense, bargained with unions, and purged of crap such as learning styles.

July 26, 2010

"Opportunity to learn" revived?

As Ed Week's Michele McNeil is reporting, a coalition of civil rights groups has issued a white paper today through a (new?) organization, the National Opportunity to Learn Campaign. Last night, Diane Ravitch was tweeting her reading of the paper as a gentle but firm rebuke of the Obama administration's approach to accountability. To some extent, I think she's right: the 17-page report refers to the inappropriateness of judging schools and teachers primarily by test scores, but only in passing.

For the longer and more committed passage criticizing policy prejudices towards school closures, I read the argument differently, because of the other arguments in the paper in favor of more money for early childhood education, wraparound care programs, and NCLB's public-school choice provisions and against budget cuts. And then there's the name that's a throwback to early-90s arguments in favor of opportunity to learn standards. To me, that all looks like a straightforward community-civil-rights approach more than an argument against high-stakes testing. In that context, the argument against school closure is an argument against withdrawing resources from a community institution that may be one of the few public facilities in a poor neighborhood.

That also fits with how the coalition's paper addresses Race to the Top: don't withhold resources or programs from poor children. Instead, combine formula grants with conditions. Notably, the paper states that a limited competition is acceptable, suggesting that the constituent organizations would not directly oppose Race to the Top as long as its structure does not permanently replace formula grants in ESEA. I know what others are going to say in response: we have plenty of conditions on federal funding, but the federal government almost never penalizes states for falling down on the job.

To a great extent, the politics of and posturing around education reform are all depressing to me: education reform policies are dwarfed by the state of the country's economy right now. In fact, that's a crucial part of the argument of the Broader, Bolder Approach. So why not focus your efforts on the national economy right now? Or, if not the national economy, on the states, where the real action is going to happen over the next few years?

I think the coalition is moving about 15 months too late, if the key movers intended to shape federal policy. It's very likely that there won't be more RTTT, there won't be ESEA reauthorization, and there won't be a heck of a lot of things that should be happening from the perspectives of a variety of people on different sides of this debate. I wish I had been wrong a month ago, but it looks more and more that I was right in predicting that David Obey's gambit last month was a stupid gamble instead. I was wrong in guessing that Obey would be frustrating George Miller, but I think I'm right on the general picture. To be clear, it's far from the biggest SNAFU of the Congressional session: that's the too-small size of the stimulus in early 2009 and the failure of the White House to nominate (or recess-appoint) enough Fed governors. But I'm still depressed, and puzzled by the strategic choices.

(One final puzzle is the group's website. The contact information is for the Schott Foundation in Massachusetts, which is consistent with the few blog entries (written by Michael Holzman) and the press-kit stuff. But there are no staff members or individuals listed on the website, just organizations. The whois entry for otlcampaign.org shows that the domain name has existed since sometime in 2009, but it's registered through a proxy, and the Internet Archive has no history of the website (blocked at the site). This is all perfectly legal, but it's odd.)  

July 24, 2010

Firings in DC

Andy Rotherham is correct that the termination notices in the DC public schools this week included about a third of the total who had not met licensure standards, and a greater number who were rated in the lowest classification in the annual evaluations. Nonetheless, what is newsworthy about the terminations is the public nature of outright firing of a chunk of teachers for nonperformance. It wasn't the firing of a third of the district's teachers, but significantly less than 10%. Let's assume a similar number of those given notice of "underperformance" this year either quit or are fired next year. That would be the firing of around 13-16% of the teachers for nonperformance in two years. It's noticeable.

By itself, the number is neither good nor bad, though many will argue the point either way without additional information. I say we wait. First, we wait for the Washington Teachers Union to sort through the information to see if any teachers were fired without the five classroom observations required for the evaluations. The grievance mechanism that exists in the union contract is on procedural grounds, and here we'll see how careful Rhee's bureaucrats have been. Then, we wait to see if there are any examples of firings that don't meet a basic smell test--anyone who had won teaching awards and plaudits but was given low ratings for reasons of favoritism or obviously inappropriate application of student test scores. Either procedural errors or plausible miscarriages of justice are reasonable grounds on which the union will fight for members, and it has an ethical obligation to do so.

Nor is that willingness to fight for individual members inconsistent with a union's willingness to try different methods of evaluation. My chapter can and does file grievances when we think an individual's procedural rights were violated in the tenure review process. That says nothing about the standards of review. It says that we'll fight for the integrity of the review process.

July 16, 2010

Gates in Tampa ... no, my daughter's school!

Two chances in one week to provide personal perspective on Gates' philanthropy. Along with a few thousand other AFT delegates, I saw Gates's speech last Saturday. Today's comment comes via the Business Week article on the Gates Foundation's education program. The article is one of the better journalistic portraits of the foundation, including historical perspective by Maris Vinovskis and some technical perspectives from Howard Wainer and Daniel Koretz. And then in the second half, the article quotes some teachers such as JoAnn Parrino and Kathy Jones. I expected the article to quote either Hillsborough superintendent MaryEllen Elia or Hillsborough Classroom Teachers Association president Jean Clements, and then suddenly the focus was on some teachers at Chamberlain High School, where my daughter graduated in the spring. Yes, she had both Parrino and Jones, as well as a few others mentioned indirectly in the article as Daniel Golden followed Hillsborough's Gates project staff into a teacher meeting at the high school.

Both teach AP social studies courses, Parrino with human geography (taken by ninth graders at Chamberlain) and economics (I forget whether it's micro or macro). Jones teaches the world and European history classes. Both have their student admirers within the school. In the article, Parrino is quoted in favor of random classroom visits, and Jones on a different topic: whether there is such a thing as a year-over-year growth measure when the class is a one-year class such as a topical social studies class. And the music teachers apparently scoffed at the notion that their competence can be measured by student performance on an end-of-semester music theory test. Most of the teachers I've met at the school are reasonably thoughtful at the least, and the article begins to touch on their perspectives and skepticism.

What is notable is that none of the discussion Golden reports is the type of "we can't be expected to do great things with poor kids" excuse that's the common straw-man argument by advocates of high stakes testing. Jones is right to be skeptical that there is any competent value-added measure for history, and the band and chorus teachers are absolutely correct that a music-theory class is an awful measure of their competence. Want to know what a Florida band or orchestra or chorus director pushes their students to perform in? Music Performance Assessments, or MPAs. These are juried festivals of school groups, and teachers in Hillsborough take them very seriously. To use music-theory paper exams instead of MPAs is a pedagogical crime. Do you think the Hillsborough High School band director should be judged by how well my son and his fellow sax players know a Neapolitan 6th, or how well they can blend in a performance of "Take the A Train"?

At some point, advocates of using student outcomes as part of teacher evaluation need to get some sense about implementation. Hillsborough is clunking along right now, and it'll need to adjust things on that part of the evaluation system. The rigid "everyone has to be evaluated in the same way even if it makes no sense" system is not viable in the long term. But it's what the mantra of "50% must be on student outcomes" will lead to unless Charlie Barone and others come out in favor of common sense in the use of student outcomes, and that includes telling their friends when they're wrong in a formulaic approach.

July 14, 2010

Fat tails and audit trails in Florida test scores

I'm starting the day behind on a bunch of things, thanks to a week at the AFT convention in Seattle and the beauteous handling of bad weather by Delta. I arrived in Tampa about 23 hours after leaving Seattle, and let's leave it at that.

So I'm a bit behind on the background behind the evolving controversy over test scores in Florida. NCS Pearson was way, way late on releasing scores, and part of the reason was what Florida DOE officials called glitches in the demographic files Pearson had on students, or how test scores are tied to students and then teachers.

I have a sneaking suspicion that's also behind the controversy that's developing, as first the urban and then a bunch of other system superintendents complained that the proportion of elementary students not making adequate progress year-to-year just didn't fit with any sense of reality (on the low side). Head to the St Pete Times for the published stories and blog entries, including new complaints that the organization auditing Pearson's work is a subcontractor of Pearson, but here's the reason why I suspect the demographic files are a good starting point: Florida's "growth" measure is not the mean or median growth year-over-year on some vertical scale, nor is it a regression-based measure of deviation from some version of expected growth. Instead, it is a jerry-built dichotomous variable of whether an individual student made a particular growth benchmark in a year: yes/no.

It's been a few years since I looked at the details of this "growth" definition, but any measure based on thresholds has some inherent sensitivity to variability around the relevant threshold. In the case of Florida's growth measure, the vulnerability is going to be less about the construction of a particular scale at a single point on an individual test, because the measure depends on a student's prior-year score. So a psychometric vulnerability is going to come from two sources: the general characteristics of the tests in the two years, and the added variability you get from comparing scores across two years (there's measurement error in both scores, and the measurement error when you compare the scores is going to be greater than the measurement error in either the base year or the following year).
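The two-year point follows from how independent measurement errors combine: the standard error of a difference score is the square root of the sum of the squared standard errors of the two tests. A minimal sketch, with invented SEM values rather than anything from Florida's technical manuals:

```python
import math

# Assumed standard errors of measurement (SEM) for the two tests being
# compared; these numbers are invented for illustration.
se_2009 = 12.0   # SEM of the prior-year score
se_2010 = 12.0   # SEM of the current-year score

# Errors in independent measurements add in quadrature, so the gain
# score is noisier than either score alone.
se_gain = math.sqrt(se_2009**2 + se_2010**2)
print(f"SEM of each year's score: {se_2009:.0f}; SEM of the gain: {se_gain:.1f}")
```

With equal SEMs, the gain score's error is about 41% larger than either year's, which is one reason a yes/no benchmark on individual student gains is a jumpier indicator than it looks.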

Since the two-year-variability issue has been a fact of life for this measure for a number of years, I would be surprised if that were the issue. So then the question is whether this year's fourth- or fifth-grade reading test scores have unusual distributions that would cause interesting problems at the thresholds for "making gains" for students who were low-performing in the prior year. A particularly fat tail at the low end might cause that, but that's speculation, and I suspect an obviously fat-tailed distribution would have been picked up by the main auditor, Buros.

But you can have a non-psychometric wrench in the works, because Florida's dichotomous variable is highly sensitive to one other matter: the correct matching of student test scores from year to year. If the student data files were messed up, and student scores from 2009 were matched to the incorrect student scores from 2010, you'd have all sorts of problems with growth. I strongly suspect that's what tipped off problems with the data files earlier in the spring. If the failures were general, you'd have a skewed distribution of the dichotomous growth variable as the lowest-performing students from 2009 would be the most likely to be matched (incorrectly) to higher scores in 2010 and vice versa, so the first clue would be markedly high growth indicators for 2009's low-performing students and markedly low growth indicators for 2009's high-performing students.
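The mismatch artifact described above can be sketched with a quick simulation (all parameters invented, and the scrambling here is total rather than the partial failures a real glitch would produce): randomly mis-linking 2009 records to 2010 scores makes 2009's low performers look like spectacular gainers, the pattern I'd expect as the first clue.

```python
import random

random.seed(7)

# Invented population: 10,000 students, 2009 scores roughly normal,
# true 2010 score = prior score + modest real growth + noise.
N = 10_000
scores_2009 = [random.gauss(500, 50) for _ in range(N)]
scores_2010 = [s + 20 + random.gauss(0, 20) for s in scores_2009]

def gain_rate(pairs, benchmark=10):
    """Share of (2009, 2010) pairs whose gain meets the benchmark."""
    return sum((b - a) >= benchmark for a, b in pairs) / len(pairs)

correct = list(zip(scores_2009, scores_2010))

# Simulate scrambled matching: each 2009 record linked to a random
# student's 2010 score.
shuffled_2010 = scores_2010[:]
random.shuffle(shuffled_2010)
mismatched = list(zip(scores_2009, shuffled_2010))

# Focus on 2009's low performers (bottom quartile).
cut = sorted(scores_2009)[N // 4]
low = [(a, b) for a, b in correct if a <= cut]
low_mis = [(a, b) for a, b in mismatched if a <= cut]

print(f"Low performers 'making gains', correct match:   {gain_rate(low):.0%}")
print(f"Low performers 'making gains', scrambled match: {gain_rate(low_mis):.0%}")
```

Under scrambling, a low 2009 scorer is paired with an average 2010 score, so the apparent gain balloons; high 2009 scorers show the mirror-image collapse. That the districts are reporting the opposite pattern is precisely why this simple story doesn't fit and the physical artifacts need examining.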

But that's not what school districts are reporting: they're reporting unusually low growth proportions for low-performing students from 2009. I can think of a few different ways you'd have that after Pearson tried to correct any obvious problems it saw earlier, but that's speculation. What needs to happen is an examination of the physical artifacts from this year for a sample of schools: the booklets, the student demographic sheets, and the score sheets. We're talking about more than a million students tested, but we can start with a sample of schools that the urban-system superintendents are worried about and track the data from beginning to end with a small enough set to see exactly what happened to the satisfaction of local school officials, policymakers, and the general public.

And if Pearson destroyed all physical artifacts so you can't trace the path of data? Cue "expensive lawyer" music...

July 12, 2010

Gates speech at AFT

Originally written Saturday, July 10: I've figured out how to hang this electronic device onto the back of the chair in front of me while my old PDA foldable keyboard is synced and sitting on my lap, so I can write this blog entry in the middle of the AFT session. AFL-CIO President Richard Trumka gave a spirited speech before lunch, and then the floor approved a resolution on teacher evaluation without amendment.

This afternoon, we started with resolutions on community support and career/technical education (CTE) programs. For the most part, the resolutions this afternoon were neither going to be the controversial resolutions nor the controversial part of the afternoon session, which was Bill Gates' appearance at the convention. Very popular was a resolution urging public meetings for the national commission on fiscal responsibility and reform and giving AFT an official position in favor of progressive effective tax policy instead of Social Security benefits cuts that are regressive. As I've written before, a number of people simultaneously want policies that would end in significant layoffs of teachers over 50 and also significantly reduce pension benefits and contributions to public-employee pensions. Evidently, there is some group of self-defined reformers who are in fear that somewhere, someone is enjoying a retirement free from fear of destitution.

The Gates appearance started at 4:15. From what a colleague told me later, he helicoptered over from his island estate. Randi Weingarten at first started speaking from the sheet announcing Innovation Fund awardees and then turned to introducing Gates. She took care to quote from Gates's annual letter at points where he specified opposition to solitary use of test scores to evaluate teachers and supported evaluation as a tool to help most teachers. With a smattering of boos, Weingarten smiled and said, "I thought you guys were leaving," referring to the threats of a boycott by the small dissenting caucus By Any Means Necessary (BAMN). The majority of delegates roared. Later, there were about 25 delegates out of several thousand present who walked out as Gates stood at the podium. So much for the huge boycott of Gates's speech...

Gates started by publicly congratulating AFT for the approval of the resolution on teacher evaluation/development and on steps taken thus far, including the AFT locals who are working with the Gates Foundation on specific programs. He mixed in some misleading statements about "declining" graduation rates (as opposed to stagnation) with some fair statements and a clear statement that teachers must be included in reform. He spent a few moments discussing the failed small-schools initiative. The greatest applause lines came when Gates criticized the existing record of poor administrators' evaluations and when he acknowledged that people who have never taught in a classroom do not understand how difficult teaching can be.

The BAMN protesters then had pretty awful timing, coming back towards the hall shouting protests ... just as Gates said some teachers have challenges with students who are bored or engage in disruptive behavior. The hall erupted in laughter at the irony.

Gates's weakest argument was the individual teacher equivalent of effective-schools rhetoric: see what teachers do when students demonstrate great achievement. It's a high-risk claim, to assert that the development of a teacher evaluation system can also document which a priori behaviors are best. What may be easier is the collection of videos of different teachers, with a broad enough sample that some will turn out to be great teachers. Gates also highlighted two project districts in AFT: Hillsborough, Florida, and Pittsburgh, Pennsylvania. As is common with description of risky projects in early days, the rhetoric was a bit breathless, and I could hear a few oohs and boos in the audience when he mentioned merit pay, Race to the Top, and tying tenure to student achievement.

Gates ended with the obligatory reference to Al Shanker and the need for teacher voice in reform. "Don't give it back, take the risk, and keep it up." "No other union is doing what you are to make this [reform] happen."

Additional thoughts a few days later: Gates got some personal mileage by appearing at AFT. He spoke with a few reporters afterwards, and his appearance generated some newspaper stories at the St. Pete Times and Washington Post that were more about the Gates Foundation than the AFT convention. At AFT, I don't think delegates had their minds changed much by Gates, since they were likely to be aware of what he's done and where he agrees and disagrees with them.

Gates's rhetoric is compartmentalized. In a good part of what he said, teachers were at the center of what he describes as reform, including teacher evaluation. But then the sore-thumb statement popped out about tying due-process protections to student test scores, unmediated by professional judgment. It's as if there's a switch inside his head, where he can talk either about test scores or about better evaluation of teacher practice. Reform rhetoric as a quantum effect? I don't know. But it's poor strategizing and a poor contribution to discussion. One of the wealthiest men in the world should be able to be more sophisticated.

June 19, 2010

What uses of test scores will pass legal muster in teacher evaluations?

Legal considerations on the use of test score derived stats in teacher evaluation: Scott Bauries started an interesting discussion June 2 of value-added measures and teacher evaluations from a legal perspective. It's very important to read the comment thread, as he's challenged on his conclusions by Bruce Baker and Preston Green, especially with regard to disparate-impact claims. Bauries claims that employers need to defend the procedural due process but are probably safer on the substance, regardless of the problems with value-added measures.

Reading the main entry and discussion, I lean strongly towards Bauries's conclusion, with one important caveat (below). My impression of the 2000 G. I. Forum v. Texas Education Agency case on the disparate impact of high-stakes graduation tests, which the state won, was that the plaintiffs were not prepared for the last burden-switching test on disparate impact. My rough impression of disparate-impact claims of illegal discrimination based on the Civil Rights Act: it's a series of penalty kicks/shots in soccer/hockey, or maybe the games with alternating possession in overtime. I'm not a lawyer, and this is primarily based on my understanding of Title VI rather than Title VII law, but on to the probably-inapt analogy: First, the plaintiffs try to demonstrate that a mechanism such as a test affected a property interest of the plaintiffs and had a disparate impact on one of the protected classes. If the plaintiffs succeed, the defendant tries to demonstrate that the mechanism meets an important interest, was properly constructed and applied, and that members of the affected class had a fair chance at succeeding in the mechanism.

So far, we're describing lots of situations that have evolved in the past 25-30 years, especially with high stakes testing. Debra P. v. Turlington established the basic federal expectations in terms of student tests, and as a number of states created a new round of graduation tests in the 1990s, they relied on Debra P. v. Turlington as a guide to meeting the basic questions and getting to the final round all tied up. And this sort of makes sense if you think about the maturity of various mechanisms: you can argue that there is a rational state interest in a certain outcome (an adequate measure of achievement in the case of graduation requirements), and then satisfying the "fair chance at succeeding" is often a question of satisfying a set of criteria rather than perfection and that's often a reflection of the organization's experience and capacity.

The final test is whether there is a better option: could the defendant have feasibly chosen an alternative mechanism that satisfies the same interest with less disparate impact? I've never read all of the materials in the G.I. Forum case, but the following is a key passage in Judge Prado's ruling:

The Plaintiffs were able to show that the policies are debated and debatable among learned people. The Plaintiffs demonstrated that the policies have had an initial and substantial adverse impact on minority students. The Plaintiffs demonstrated that the policies are not perfect. However, the Plaintiffs failed to prove that the policies are unconstitutional, that the adverse impact is avoidable or more significant than the concomitant positive impact, or that other approaches would meet the State's articulated legitimate goals. In the absence of such proof, the State must be allowed to design an educational system that it believes best meets the need of its citizens. (emphasis added)

In the end, the plaintiffs' lawyers in the Texas case were unable to provide a clear alternative to high-stakes testing that they could demonstrate was both feasible (i.e., wouldn't cost an arm and a leg) and would have a lower disparate impact. I'm not too worried about the state-interest prong, since you can usually construct alternative mechanisms that have facial validity and roughly the same "noise" as whatever you're arguing against. The not-an-arm-and-a-leg criterion is tougher to meet if you're arguing for portfolios, since they increase costs... though from a relatively low per-pupil base. Ultimately, though, it is hard to argue that a prospective alternative would result in a lower disparate impact if it is only prospective, and thus you have no evidence whether the protected class you're worried about would be helped by the alternative.

So in the discussion over at EdJurist, Bauries's clinching argument is really that for all their flaws, value-added measures are going to look reasonable to a judge in that they try to adjust for incoming achievement of students and plaintiffs will have to put forward an alternative with concrete evidence that the alternative does a demonstrably better job at treating teachers fairly. The catch-22: without a working model of alternatives with that record, plaintiffs are going to be sunk on disparate-impact claims.

Bruce Baker has followed up on Bauries with a set of tongue-in-cheek, impossible criteria for making the use of value-added measures reasonably fair. I understand the temptation, but he's on to one thing: ultimately, local K-12 unions will have to figure out how to respond. That will include whether to have separate evaluation procedures for the 20% of teachers for whom value-added measures are even possible, how to mix the data, and so forth.

And now for the caveat: a good part of the legal consequences of using student test scores for personnel decisions will depend on how stupid local administrators are in the first jurisdictions to use them, and the first that are challenged. I can imagine districts where administrators are careful to fire experienced teachers only where there is a record of several years of low statistical measures of student achievement and only where that is consistent with low marks in other areas, such as administrator and peer observations. I can also imagine districts where administrators purge teachers based on a single year's worth of data and with no checks of consistency with other sources of information. If the legal tests are in jurisdictions with the first set of practices, they're far more likely to pass muster than if the first cases are for terminations that don't meet a basic smell test of rationality.

June 9, 2010

Get your performance-pay evaluation report bingo cards here

So another few evaluation reports have been released with little evidence of student achievement flowing from performance-pay systems. This is going to sound like a broken record from me, but I don't make too much out of one or two studies in policy research. These studies on systems in Chicago and New York confirm something any historian (or anyone who's read education historians) could have predicted: even if there is some benefit from changing a pay system, it's a darned hard thing to try. This is one of the reasons why I dislike the boutique, closed evaluation tradition in education research: every evaluation collects data, walls it off, and then presents (only) conclusions to the public. When there are millions of dollars being spent through the Teacher Incentive Fund in addition to privately-funded efforts (or any program with an interesting but untested theory of action), there have to be data archives so that other researchers (those not on the original evaluation team) can conduct secondary analyses.

But having put forward these caveats, I'm going to guess that most studies of performance pay are going to show negligible effects on test scores. This may be my inner cynic (okay, not very inner), but the long-term questions on performance-pay policies revolve less around whether performance pay is consistent with proponents' theory of action than around whether the politics demand something regardless of effects, and what is workable from a variety of standpoints.

June 4, 2010

More on so-called "side deals"

Andy Rotherham has responded to my blog entry from early this morning. Let me skip for now the question of why he was the sole person quoted in the article and address the local MOUs in Florida on Race to the Top. Rotherham wrote in part, "If these agreements have no bearing on the state's application or implementation then why go through the laborious exercise of crafting them[?]" I wasn't in the room for any of these, but having observed Florida schools for almost 15 years, I can imagine a number of reasons, including distrust in some county of the judgment of FEA President Andy Ford and other task-force participants regarding the clause excluding non-mandatory subjects of bargaining from impasse. I said as much in my prior post. I'm not a labor lawyer, and neither is Andy Rotherham, but I do know something about the dynamics within FEA, where there is often a healthy internal debate. The argument that local MOUs are an inherent evasion of the grant requires examination of the actual language at issue.


Now, to my comment about Rotherham's being the sole source for Wednesday's story in the St. Petersburg Times. Why did it seem curious to me? Partly it's a matter of sensitivity to these issues on a number of fronts. For more than a year, Rick Hess has been pointing out the potential for all sorts of perception problems with a competitive process that's the result of (enormous) discretionary authority. In April, Liam Goldrick noted that the New Teacher Project was both advising several states on RTTT and commenting on the process (something he thought was unwise from an organizational standpoint).

So when I read a story with a single source commenting on the issue, where the source may have had business interests at stake, where the disclosure of that in the story was vague, and where the phrase "some say" had only that source as documentation, it looked odd. It was a good journalist quoting someone who is open about disclosure in every direct piece of writing of his I've read. And it wasn't the larger point. But I don't remember anyone correcting the impression earlier in the spring that TNTP had been a disinterested observer.

I could also have thought of a number of reasons why there weren't more people quoted or more disclosure about Bellwether (Rotherham's firm): maybe Matus got the information late in the day and couldn't reach anyone but Rotherham; maybe the material was in the submitted story and an editor chopped out the additional disclosure; maybe Rotherham didn't have access to the text of the local MOUs or a copy of the state MOU and was relying on Matus's over-the-phone description of "hey, this looks like it could be different, especially in Hernando."

But on the other hand (I think we're on my fourth hand here), the failure to correct the stuff on TNTP is a lapse for education journalism more generally, and it has to stop. So I decided to note what I had observed, call it minor compared with the other issues, and go on. If it looks like I'm being hard on the Times, it's because this local newspaper is one of the top papers in the country on education, and I think I can expect great reporting. But this is a minor error, I meant the observation as such, and I explicitly said so. If I were going to point out that the alleged transparency problem with Florida's application is a distraction until better researched, maybe I needed to be consistent and explicitly say that my concerns along parallel lines were less important, nu?

My central point was that absent some more solid analysis of what the local MOUs actually meant and whether they conflicted with the application packet, the larger issues were not about procedure but the sustainability of whatever happened with RTTT, assuming it was beneficial. Here's what I wrote this morning:

It's a legitimate question to ask what the right balance is in collective bargaining on the scope of bargaining, on the relative power of the parties, and on state law that can essentially dictate terms and conditions of employment outside bargaining.... It's also a legitimate question to ask about the commitment of parties to reform after the money runs out.... Those issues are still out there, and they're out there whether or not a particular state has an MOU like Florida's.

And here's what Andy Rotherham wrote:

[T]here are two big outstanding questions on RTT that we won't know the answers to for several years: First, how durable will the policy changes be?  Will states relax things when the money is gone and/or will "loser" states undo the reforms they put in place in an effort to win? Prize theory is built on the idea that the progress generated in an effort to win is built upon.  That idea has not been fully tested yet in the public/political sphere.

Surprise: we agree on the importance of that question! No, it's not really a surprise. It's just that a lot of electrons have lost their lives this week in what thus far looks like a nonstory.

Correcting the facts on so-called RTTT "side deals"

In Wednesday's paper, St. Petersburg Times reporter Ron Matus relied on the sloppy language "some say" to spin a molehill into a mountain over county-specific MOUs between school boards and local FEA affiliates. In the article, as well as in two blog entries Wednesday and Thursday, Matus stated that there were a number of counties with local memoranda of understanding (or MOUs) and quoted one individual who said that the existence of what Matus called "side deals" might be a problem for Florida's application. Matus stated that his source, Andy Rotherham, had helped other states with RTTT applications, but the article did not state whether that help came through consultancy contracts (i.e., whether Rotherham's new organization had its reputation and business at stake in competition with Florida's RTTT application).

The omission of any mention of Rotherham's business concern is minor compared with the failure of the Times to look at the content of the side deals and see whether they modified the obligations of local parties vis-a-vis the state Memorandum of Understanding that most districts and unions signed across Florida. Since Ed Week has gotten into the story, albeit without quoting Rotherham, it's important to look at the facts.


First, the issue of impasse. Language from the state MOU (part of the RTTT application):

Only the elements of this MOU which are contained in existing law are subject to the provisions of section 447.403, Florida Statutes. (p. 3)

Explanation: F.S. 447.403 is the part of Florida's public-employee collective-bargaining law that covers impasse. In other words, if there's a part of the MOU that is not already a term and condition of employment under Florida law, it's not susceptible to the impasse procedure. That point is clarified in the attachment the Times education blog noted was part of many counties' documentation.

The Broward side agreement in its entirety:

Any items relating to the RTTT Application or Plan that are unsuccessfully negotiated between the parties specifically for the purpose of applying for or receiving the RTTT grant award will not be subject to the impasse procedures set forth in Chapter 447.

Can someone explain to me why this is any different from the state MOU? But there's a second issue that Ed Week's Michele McNeil discussed: "these side deals also say that any changes successfully negotiated because of Race to the Top will expire once the funding does," and she refers to the Hernando County MOU. But on p. 4 of the state MOU, Part IV explicitly states that the state MOU expires "upon the expiration of the grant project period, or upon mutual agreement of the parties, whichever occurs first."

The question one might logically ask is why some counties and locals felt they needed extra language. The FEA had a long weekend of discussion among state and local leaders about the state RTTT application this time around, and from news coverage it looks like FEA President Andy Ford was strongly encouraging locals to sign on. The reason was pretty clear: he had had a seat at the table in the task force Charlie Crist set up in the week after Crist vetoed SB 6. When you've had a hand in crafting changes, you've got a stake in success. In addition, the additional language took care of one of the legal concerns of FEA bargaining-support staff, because the MOU from the first application in Florida looked like it might give school boards the ability to impose contracts on matters beyond what is currently in state bargaining law. Unlike in many northern states, Florida school boards have the authority to impose contract terms at impasse for the duration of a fiscal year, but only on mandatory terms and conditions of employment as defined in Florida law.

In January, FEA had cautioned locals not to sign the MOU, and it crafted language for the few locals who wanted to sign (including one large county, Hillsborough). The language FEA crafted for the locals in the first RTTT round? It specifically exempted issues from impasse when the issues were not already in state law. (I don't have the exact wording in front of me, but I am sure an intrepid reporter could ask the Hillsborough press rep for it.) In that case, it's clear that the local MOUs created legal conditions different from what would have existed with a signed state MOU and no local MOU. So when similar language appeared in the state MOU this round, why did some local teachers unions sign essentially redundant local MOUs? Let's just say there was a generous level of suspicion about the process.

The greatest problem with this coverage of the county-specific MOUs is that it's a distraction from serious issues of reform implementation with RTTT. The issues Matus and McNeil have raised in the context of local MOUs exist with the state MOU. But instead of focusing on the substance, the reporters are focusing on a process issue. It's a legitimate question to ask what the right balance is in collective bargaining on the scope of bargaining, on the relative power of the parties, and on state law that can essentially dictate terms and conditions of employment outside bargaining. In a state like New York, bargaining authority leans more toward unions than in Florida, and likewise state law. Northern states are the ones that have seniority-preference laws that trump the bargaining process, and Florida has had several statutes trying to mandate all sorts of things unions would be very unlikely to agree to in local collective bargaining.

It's also a legitimate question to ask about the commitment of parties to reform after the money runs out. That is one of the critical questions with the DC teachers contract: what happens if/when the billionaires pull out? The billionaires' support of DC, along with RTTT, presents a theory of action all about inertia: if we can just budge districts away from current practices, we'll accomplish long-term structural changes. In contrast, Denver's ProComp came in the context of a permanent funding stream and a political deal with voters: give us the money permanently, and there is a permanent change in compensation practices.

Those issues are still out there, and they're out there whether or not a particular state has an MOU like Florida's. I understand why this reporting on process exists, especially in a rush to print news, but am disappointed that two good reporters have perseverated on an apparent process issue without checking the details of their assumptions (i.e., whether the local so-called "side deals" are substantively different from the state MOU).

Disclosure: I am a former member of the FEA governing board. (I have not corresponded with elected FEA leaders about the reporting on this story, but I want to be open about my former position within my state affiliate.)

Update (9 am EDT): After I posted this earlier in the morning, Valerie Strauss published an entry on the topic in the Washington Post blog she writes, largely repeating what the Times had said. I also corresponded in the last hour with one reporter interested in the story, and one of the empirical questions is whether the local MOUs in Florida are more like Broward's (short and redundant) or more like Hernando's (much longer, with the elements McNeil discussed in her blog entry yesterday afternoon). There's also a broader question about state administrative authority. Suppose a superintendent of a district in any state receiving RTTT funds decides she or he isn't going to follow one of the requirements, and just doesn't put it in writing. Does the state's obligation to eliminate that district's participation and cut off its funding change? As I said earlier this morning, the broader and more interesting questions are not really about local MOUs.

May 2, 2010

The theater of basing a majority of evaluation on test scores

Now that SB 6 is dead, that a governor's task force on RTTT came to a compromise in a single day, and that it looks like there is some direction for teacher evaluation in Florida acceptable to Florida's K-12 teachers unions, it's time to take stock of the rhetorical stance of SB 6 supporters that a "majority" of a teacher's evaluation had to depend on student test scores. I've seen this pop up in other states, so it's a common rhetorical stance. Let's get a few things off the table first: this is not based on any research, and the supporters have no clearer idea of what "majority of a teacher's evaluation" might mean than supporters of the "65% solution" had of what spending money in a classroom meant. For that matter, neither did I as a skeptic (about either proposal).

So the "majority on test scores" stance is political, then. That's fine as a minimal statement; almost all decisions about pay structures are political in a broad sense rather than based on research, and to some extent they're reactive. Teacher pay scales became standardized to protect bureaucratic structures from (and sometimes in response to) accusations of corruption, and the single salary schedule is a response historically to gross pay inequity.

I'll go further: I don't think there's a way to avoid political values embedded in pay structures. Once you involve public money and a service most people connect with citizenship (education), you've got politics, however well structured and justified by reference to neutral statements of organizational need. On that level, performance pay is justifiable from the sense of satisfying public perceptions about how teachers should be paid. That was explicit in Denver's ProComp plan: the voters approved higher taxes in return for a performance-pay structure.

The problem with the "majority based on test scores" position is twofold. One is the obvious one: it's divisive, and many parents and other community members are offended by the idea. Here, Diane Ravitch spoke for millions when she criticized SB 6. But there's another problem: it obscures the evaluation process rather than clarifying it. By implying a point-based system, it fails to focus on what matters in a teacher evaluation system, in terms of either an algorithm or underlying concepts.

I've written a bit about point-based systems, and because the focus of my paper was elsewhere, I didn't have a chance to talk about the limit of point-based scoring systems: it matters not where you can earn points but where you might lose points. I learned this in high school when I was a debater: individual raters have an implied comfortable range for scores, and it's the range of scores that matters, not the total number of points available in different categories. If raters have different effective ceilings as well as ranges (i.e., it is impossible for people to earn perfect scores with some raters, while others commonly hand out full marks), then the raters with the largest ranges of scores exert more power over final results than raters who have a very narrow range.

Similarly, components of any point-based system will have differential impact on final results when they have broader ranges in practice, regardless of the proportion of the scale that derives from each component. Imagine a teacher evaluation system with 100 points. Suppose 60 points come from student test scores, and the range for most teachers is restricted to between 52 and 60 points. Suppose, on the other hand, that 30 points in this hypothetical evaluation come from direct observation, and the range of scores is between 10 and 30 (and more than a handful of teachers may earn the low score). Which component has the greatest influence on final results? It's the 30-point direct-observation component, because in this hypothetical example teachers can lose more than twice as many points there as through student test scores.
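The arithmetic of that thought experiment can be sketched in a few lines of code (a minimal sketch using the hypothetical numbers above; the component names and figures are illustrations, not any real district's system):

```python
# Sketch of the thought experiment above (hypothetical numbers from the text):
# a component's practical influence depends on the range of points teachers
# actually lose, not on its nominal share of the 100-point scale.

components = {
    # name: (nominal_max, practical_low, practical_high)
    "student test scores": (60, 52, 60),
    "direct observation":  (30, 10, 30),
}

for name, (nominal_max, low, high) in components.items():
    spread = high - low  # points a teacher can actually lose in practice
    print(f"{name}: nominal weight {nominal_max} pts, practical spread {spread} pts")

# Test scores: nominal 60 points but a spread of only 8; observation: nominal
# 30 points but a spread of 20. The observation component separates teachers
# across 2.5x as many points, so it dominates the final ranking despite its
# smaller nominal weight.
```

The design point is that ranking power comes from variance, not from nominal weight: a component on which nearly everyone scores near the ceiling barely moves the final ordering, whatever its share of the scale.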

But the "majority of evaluation" rhetoric does more than obscure the real power in point-based systems: it obscures the question of what teachers are responsible for. "Outcomes!" says the supporter. Right, I say: that doesn't say a darned thing about the types of outcomes that will make the difference in evaluation. In Florida, Louisiana, and other states where people have pushed a majority-from-test-scores approach, the push has been to create a mandate and defer the implementation to a regulatory process. That's a nice illusionist's trick if you can get away with it, but the process of implementation always mediates absolutist mandates, and in deferring the details the legislature gives up control over what mediates the test scores.

There are three ways I can see that test scores' impact on evaluations would be mediated in any system (and yes, I'm including SB 6 here): ad hoc (i.e., caprice), by reference to student disadvantage (i.e., blame-shifting), or by reference to teacher behaviors in classrooms (i.e., standards of practice). Without any legislative guidance, ad hoc and capricious mediation is likely (probably by the temperament and philosophy of the administrator with the greatest authority over evaluation). More destructive than ad hoc mediation would be blame-shifting: a teacher would be held blameless if someone else/something else (poverty, language, presumed parental neglect, etc.) can be blamed instead. Bad, bad idea.

Of the three options that come to mind tonight, mediating test scores by professional standards of practice seems the most productive. But then that raises the central question: if the use of test scores is inevitably subject to mediation, and the best choice for that mediation is through professional standards of practice, why not base evaluation on professional standards of practice to begin with--for example, by letting an evaluation that documents effective practice create a rebuttable presumption of effectiveness?

The answer here is twofold: one is that there are no agreed-upon standards of practice for teaching more generally, other than crude and obvious standards (don't beat your students) or standards defined by reference to effects (keep your students' attention). The other explanation is that even if there were agreed-upon standards of practice, the process would be sufficiently messy as to irritate the sensibilities of those who advocate the putatively cleaner "majority from test scores" approach.

The result is that instead of a messy but constructive system based on developing standards of practice, any system that putatively bases the majority of a teacher's evaluation on test scores is going to get ad hoc or blame-shifting mediation through the back door.

Update: Linda Perlstein noticed the 50% rhetoric and should get credit for the pattern recognition. Consultants' advice? Hmmn... looks like an interlocking-directorate phenomenon (no conspiracy needed).

April 22, 2010

Dorn reviews Ravitch

My review of Diane Ravitch's new book is now up at the Education Review website. I should have finished it a few weeks ago, but the fragmentation of my time this spring has interrupted all sorts of usually-short-term projects, such as book reviews.

If there is one benefit to the delay, it was my ability to watch the sales keep racking up while the book climbed several bestseller lists. At one level, I think, "I wish my book on the topic had sold a tenth as many copies!" But that's silly; I'm glad someone was able to meet the clear need for this book in a way that's been rewarded.

Bottom line of the review: read the book. In writing the review, I made the choice to skip much of the contemporary discussions around the book and focus on Ravitch's historical arguments. As usual (with Ravitch), she writes a highly appealing argument, and it's important to look at the claims dispassionately. I should say that I dearly wish she were correct in her claim that Lynne Cheney's attack on the voluntary national history standards in the 1990s was a primary cause of mediocre curriculum standards and our current policy obsession with high stakes testing. At the time (as a new scholar in the field) I was very upset with Cheney's distortions of the record, and at one level it is attractive to see her in the villain's role. But I think it's more complicated.

April 15, 2010

Misinterpretations of Crist's veto, and where to go next

I suspect that a number of observers will spin Charlie Crist's veto of Senate Bill 6 to the point where the representation doesn't come close to reality. By a quirk of timing, I was in Tallahassee today talking with legislators and staffers in the morning. In other words, I was at Ground Veto. Yep: I came, and Charlie caved. No, that would be a post hoc fallacy, even if his veto message used the same word (overreach) that I used to describe the bill. Wait: he used a hyphen (over-reach). Or maybe I don't own the term, and the idea had been floating around the state for the last few weeks, including in newspaper editorials, and it was one of the options available for a governor vetoing the bill. So I can't claim credit as being the person who killed the bill, though I was one of thousands who contacted Crist in the last week.

In the meantime people are spinning this as the Event that Destroyed Florida Education, or the Victory of the Union(s), or the Resuscitation of Crist's Senate Campaign. Maybe one or all of those labels is true, but I doubt more than one is. (To calculate the probabilities, we need to use quantum spin dynamics, a new field that melds political science with nuclear physics.) Whoa, friends, and maybe you should take a step back. Here are the reasons why Crist vetoed the bill:

  • Thousands of Floridians from both major parties contacted Crist to urge a veto.
  • His sisters who teach probably told him they hated the bill.
  • The Republican legislators and former Governor Bush who were pushing the bill had largely sided against him in the primary against Marco Rubio.
  • Crist prefers consensual processes.

Crist's veto kills this particular bill, in this form. It does not signal a victory of teachers unions over performance pay, and it does not mean that the Florida Education Association will oppose either performance pay or alterations in the process leading to due-process protections. In fact, if you're on Facebook and "friends" with Andy Ford (he's a nice guy, and the ironic quotation marks are about FB, not Andy), go ahead and see what anti-SB 6 groups he joined... and which he didn't. If you're a reporter, go ahead and talk with Commissioner Smith and ask him to repeat the first thing Ford said at discussions about Race to the Top.

Where do we go from here? It depends largely on whether the FEA executive cabinet will support Andy Ford in negotiating with other stakeholders and politicians, on what the administrator and school board associations push for, and whether the business groups or the Republican sponsors of SB 6 are willing to negotiate in good faith. Here are some obvious questions that don't correspond with any hypothesized litmus tests:

  • Can the key parties agree that a performance-pay framework can exist?
  • Can the parties agree that a performance-pay framework cannot force budget cuts to current operations?
  • Can the parties agree on a performance-pay framework that addresses student outcomes on a "pass a smell test" basis but does not depend on blue-sky assumptions about assessment for students with disabilities, English language learners, and every subject in the curriculum?
  • Can the parties agree that teachers should not automatically receive continuing-contract status (with due process protections) without a more serious evaluation than usually exists (i.e., by default after three years regardless of the scope of evaluation)?
  • Can the parties agree on the scope of personnel contracts that can be negotiated at the local level?
  • Can the parties agree on what due process protections are workable for experienced teachers who have demonstrated effectiveness in the classroom?
  • Can the parties agree on what must be part of teacher evaluations and the range of options for those evaluations?
  • Can the parties agree on what constitutes a proof of concept for their pet ideas?

Disclosure: I am a 14-year member of the United Faculty of Florida and thus a member of FEA. I am firmly convinced that if you are a Florida teacher and want a future with no performance pay, and if you somehow persuade your local and state leaders to agree with you, you will be at the policy table... as the meal. I am equally convinced that if you are Jeb Bush or one of his close friends and want a future with no job security for teachers beyond a single year, you will succeed... in turning a great number of people who would otherwise agree with you into political enemies. And if you think that there can either be a future in state education policy with no high-stakes tests or a future in state education policy where there is a quantified high-stakes test for every subject and grade level... well, I'm not legally licensed to give my opinion of that response.

In other words, many of the questions above have yes as an answer, but only if people who would otherwise hold extreme positions are willing to work on problems rather than positions.

April 1, 2010

Hilda Turner and why teachers are skeptical of John Thrasher's motives

In Tampa, there is a five-year-old elementary school named after the late Hilda Turner. The students attending Turner Elementary may not know why it's named after her, or who she was. Most legislators in the capitol probably don't know about her case against the all-white Hillsborough school board in the early 1940s, or why the long history of politicized teacher evaluations gives Florida teachers reason to believe that Senator John Thrasher's bill is an attack on them.

But my friend and colleague Barbara Shircliffe knows, and she reminded me of the case today. She published a history of Tampa's desegregation case a few years ago (The Best of That World), and she's currently researching the history of teacher desegregation in the South. In the early 1940s, teachers across the South faced a split between what the federal courts had decreed and what the reality on the ground was. In 1940, Melvin Alston won a lawsuit against the Norfolk, Virginia, schools over their separate salary schedules for white and black teachers, when the federal 4th Circuit court ruled that unequal salaries were unlawful. (In the decision linked above is the salary schedule showing that high school teachers were paid more than elementary teachers, that men in high schools were paid more than women teaching in high school, and that white teachers were paid more than black teachers.)

But most school systems didn't change anything until they were sued, and it took quite a spine for a teacher to take on her or his employer. Maybe the teaching shortage of WW2 made a difference. Certainly the fact that black soldiers were bleeding for their country played a role in growing militance (including the "Double V" campaign of the Pittsburgh Courier). Or maybe this sham of an evaluation for Hilda Turner in 1942 kicked her into action (Turner v. Board of Public Instruction, reference exhibit 3). The case quickly became messy and ugly, and I'm going to leave the story of that for my colleague's next book. But this wasn't isolated. Black teachers in Florida were treated unfairly and unequally for decades, often by their white colleagues. It probably wasn't until the mid- and late-1960s that teachers of all races in Florida started working together to address teaching conditions in the schools.

Nor were the types of spurious judgments in that evaluation uncommon. The fact that an annual evaluation was one of the lawsuit exhibits may be a legal quirk (since it was damning evidence of how the system treated black educators). But it also illustrates the controlling way that systems treated all teachers, and that continued for decades. In the 1950s and 1960s, teachers were subject to attacks by the state's anticommunist legislative committee, run by Charley Johns, which eventually turned to outing gay teachers. (If I remember correctly, current U.S. Rep. Bill Young was a member of that committee when he was a state legislator starting out in politics.) Teachers in general were attacked in 1968 for striking, and gay teachers were the target of another attack in the 1970s by Anita Bryant. In the following decade the state imposed a generic evaluation instrument (the Florida Performance Measurement System), designed before the recognition that there was subject-specific expertise in teaching. And all of that came before the Sunshine State Standards in the mid-1990s, Jeb Bush's A+ accountability program, vouchers, No Child Left Behind, the Bush Recession of 2008, and finally John Thrasher's bill. I can point to a number of events or policies that supported teachers, but the backdrop has consistently been one of blaming and judging teachers.

Because there has never been a sufficiently well-grounded system of teacher evaluation, the experience of teachers on the ground has been one of ineffective, useless evaluations... or worse. And what teachers see in Senator Thrasher's bill is the "worse" category. Combined with the elimination of tenure (a topic for another entry), the mandate of a formulaic approach to teacher evaluation is too much for many teachers to swallow. This is not the result of hyperbole on the part of the Florida Education Association. It is the result of Florida's history of education.

(For more on the local context of Turner's actions, see Doris Weatherford's history of women in Tampa, pp. 287-288.)

March 30, 2010

Race to the Top winners and losers

So officially, Delaware and Tennessee won (note, Andy Smarick: I spelled both states and your name correctly). But in the side competition (including brackets and sidebar bets), who won and lost?

Those who predicted political decision-making were wrong. I know Mike Petrilli has wondered whether politics intervened in the reviewing process (and thought the secrecy of reviewer identity was political suicide). When New York, Ohio, and Illinois are frozen out, it's hard to spin the choice of Delaware and Tennessee as political (though Petrilli takes a half-hearted stab at it). Addendum: Rick Hess takes a firmer stab at it, though I think you could take any possible RttT awardee list and fabricate a post hoc "this was all politics" explanation.

Those who predicted a "low bar" in getting money were wrong. In the end, when Arne Duncan said USDOE would give the money to a small number of states, he meant it.

Those who predicted "reforminess" as the secret criterion were wrong. All the cool kids were assuming Florida and Louisiana would win because, well, they're the fair-haired boys this year. Wrong! While stakeholder buy-in (or the lack thereof by Florida's unions) was part of the reason for Florida's four-place finish, there were other ways Florida's application lost points, and Michelle Rhee's application for DC fell at the bottom of the Tweet 16.

Here's who won in the side competition: the reviewers. At least at first reading, the reviewers' comments on Florida's application were serious in comparing the application to the scoring guidelines. I'm sure you can quibble with scores here and there, but I think any sane journal editor might be tempted to kill to have this quality of effort from manuscript referees.

Especially in Florida, there's a great deal of second-guessing and spinning after the announcement of results. I'm tempted to pitch in, but I'll decline, at least for today.

March 25, 2010

In better news, bipartisan bill passes Florida Senate reforming high school testing

In addition to Senate Bill 6, the Senate also passed an amended form of Senate Bill 4, which moves the state's high school testing program away from comprehensive exams in 10th and 11th grade and towards end-of-course (EOC) exams. Senators from both parties finally "get it" that the so-called comprehensive science exam was counterproductive, and a well-implemented EOC exam system is significantly better than the one-size-fits-none eleventh-grade test. But that doesn't mean the bill is perfect: FSU physics professor Paul Cottle has been diligent in explaining his concerns with dilatory clauses placed in the bill that eliminate any deadlines for physical-science exams.

It's important to keep in mind that only part of the purpose of these exams is to encourage students to go into STEM fields, though raising the floor of the science courses students take matters in part to reduce inequalities in access to lab-based courses. The larger reason for pushing all students to take more math and science courses is that they are going to be adults when they leave school, citizens who vote on issues where they should be informed. I want elementary-school teachers to have stronger math and science backgrounds, and so should you. I'd like someone in charge of a venture fund or pension fund to be able to recognize fraudulent science claims without wasting other people's money. And when my oldest nephew finishes his graduate program in astrophysics, I want a ready source of groupies fanatics educated readers willing to pay oodles of money to listen to him talk about microwave interferometers and the early universe.

Okay, maybe the last isn't a public purpose. But the rest is. We all benefit when high school students have a well-rounded academic education not only in "skills" such as reading and arithmetic but in history, literature, math, and science, and moving from the FCAT to EOC exams is the right step.

Florida Senate overreaches on changes to regulation of teaching

Yesterday, the Florida Senate voted for Senate Bill 6, which would dramatically change the structure of teacher evaluation, contracts, pay, and licensure in the state. A few amendments were approved on the floor of the senate, but only three appear substantive, and the largest changes happened in committee, in part to address concerns about the constitutionality of the initial bill.

Like the Washington Post's Valerie Strauss, most observers have focused on the evaluation, pay, and contract issues, and that's because the intent of the bill is to eliminate any form of tenure, to reorient evaluation around student test scores, and to eliminate the ability of school boards to pay teachers in part based on experience. For a variety of reasons, legislation such as SB 6 is policy overreaching, and as it has in several other ways in the past decade, Florida has gone far beyond any other state in education policy. In part because the bill is so hostile towards the Florida Education Association, I suspect that some observers will praise the senate even if this turns out to be horrid policy. That way lies Thrasymachus, and it's not pretty.

SB 6 is overreaching. Instead of reducing the protections of tenure, it eliminates all meaningful due process related to job security. Instead of mandating that student outcome data be a part of teacher evaluation, it requires that test scores form the majority of any teacher evaluation system. Instead of moderating the influence of job experience on pay, it completely prohibits any such factor being used.

As a result of this overreaching, school boards are going to be motivated to work with teachers unions on workarounds for most of these issues. For each area where school boards and union locals agree the state has gone too far, they'll figure out another way to provide for some job security, to moderate the effect of test scores on evaluations, or to create a legally defensible proxy for experience in salary structures and call it performance-based pay. It took me about 10 minutes to come up with a few mechanisms for these issues, and I'm not nearly as clever as highly-motivated union officials and superintendents. But as a result, you're going to see highly variable treatment of teachers across the state, which I don't think is the intent of legislators.

There is only one area where the state has an undisputed right to regulate teaching, either in Florida or elsewhere, and that's in licensure. Regardless of what happens in collective bargaining at the local level, any state can decide who has the right to be licensed as a teacher, and at least at first, the part of SB 6 that is least amenable to mediating influences is the requirement that teachers demonstrate effectiveness to have their professional certifications renewed. Does that mean that it will be tied closely to test scores? That's what I fear. While there's a substantial academic literature on the problems with using either test scores or growth measures, Daniel Willingham's video remains the clearest short explanation for a lay audience. But I'm sure there's going to be lots of testosterone-laced talk about getting tough on teachers, at least until the State Board of Education has to decide what proportion of experienced teachers it will refuse to relicense... and then faces the lawsuits and the backlash from parents and districts.

I expect that I might find a few additional nuggets of unworkable details in the bill, but that's the big picture. If the Florida House passes SB 6 without substantial changes, there's going to be a great deal of turmoil in schools over the next few years, and until the questions the bill raises about local bargaining authority and the use of test scores in teacher evaluation are settled, the bill will impose a substantial cost in instability.

March 23, 2010

The sugar-daddy amendment to SB 6

Note (March 25, 2010): This entry was written on March 23, before the Senate adopted the Thrasher/Crist amendment. For my thoughts about the version that passed the senate on March 24, see my entry describing it as overreaching.

Among the amendments to Florida Senate Bill 6 filed today is a short amendment sponsored by John Thrasher (Jacksonville) and Victor Crist (Tampa) to address a concern I raised Saturday (and I assume others have also raised): As originally filed and then approved by state senate committees, Senate Bill 6 would essentially punish the Hillsborough (Tampa) school system for having won a Gates Foundation grant, because carving out a portion of teacher evaluation for trained observers would reduce the amount accounted for by student outcomes below the statutory minimum in the bill.

So along comes the amendment with a possible solution to this individual problem: a school district can apply to the State Board of Education for an exemption if its system is constructed in various ways that match Hillsborough's situation... including the first requirement: "Any school district that received a grant of at least $75 million from a private foundation for the purpose of improving the effectiveness of teachers within the school district may seek an annual exemption..."

In other words, only Hillsborough need apply. If you've got a sugar daddy, you're eligible for the exemption. If you don't, even if you're a school system willing to invest your own money in a similar system meeting all the other requirements, you can kiss any exemption goodbye.

March 20, 2010

Would Florida SB 6 criminalize Gates grant to Hillsborough schools?

Note (March 25, 2010): This entry was written on March 20, about an earlier version of Senate Bill 6. Early this week, the bill was modified to allow Hillsborough to seek an exemption; the amendment was crafted so that no other district could apply, even if it replicated Hillsborough's efforts using local funding. For my thoughts about the version that passed the senate on March 24, see my entry describing it as overreaching.

In the past year, supporters of using student test scores to help evaluate teachers have expressed incredulity when some teachers union officials have been opposed to those moves in states such as California. "We're not even talking about having test scores dominate all evaluation!" has been the tone of such comments, "but student achievement should be one of the important factors."

Whether or not you agree with that position, it's intellectually defensible. This month, though, I suspect DFER members and Obama administration officials are going to do their best to avoid writing or speaking about Florida Senate Bill 6, which takes the approach that student test scores should be an absolute criterion for continuing professional licensure, and undefined "learning gains" should "comprise more than 50 percent of the determination of the classroom teacher's performance" (ll. 1197-1198 of the 3/19/10 version), no matter what subject the teacher is responsible for and whether anything like a value-added measure is technically feasible.

This majority-of-evaluation position is essentially what the state department of education wanted districts and locals to sign off on for Race to the Top, and Commissioner Smith's public support of Senate Bill 6's approach is inconsistent with his earlier claims in December and early January that the department would be flexible about how districts and unions could implement the RTTT MOU. As the head of the Florida superintendents association wrote in a letter to the commissioner, "you and your staff have emphasized flexibility in implementing these elements" (Bill Montford to Eric Smith, January 8, 2010).

In fact, Senate Bill 6 is less flexible than the text of the Memorandum of Understanding on the use of student outcome data for teacher evaluation. Here is the relevant MOU paragraph:

(D)(2)(ii)(1). Utilizes the Department-selected teacher-level student growth measure cited in (D)(2)(i) as the primary factor of the teacher and principal evaluation system. Primary is defined as greater than 50% of the evaluation. However, an LEA that completed renegotiation of its collective bargaining agreement between July 1, 2009, and December 1, 2009, for the purpose of determining a weight for student growth as the primary component of its teacher and principal evaluations, is eligible for this grant as long as the student growth component is at least 40% and is greater than any other single component of the evaluation.

The second sentence beginning with However appears to be framed specifically to allow Hillsborough County to participate; Hillsborough and its teachers union won one of the Gates Foundation multimillion-dollar grants in the fall, and one of the provisions of the grant is to construct teacher evaluation around three components: student data, an administrative review, and observations from a trained classroom instruction evaluator (the last part of the Gates initiative to develop such evaluation expertise). And in the January letter noted above, Montford wrote that all districts should be able to do what Hillsborough and its union had agreed to for the Gates grant.
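To make the MOU's two-pronged rule concrete, here is a minimal sketch in Python. This is entirely my own illustration, not language or code from the MOU; the dictionary keys and the function name are hypothetical.

```python
# Illustrative sketch (my own code, not from the MOU): checking whether a
# district's evaluation weights satisfy the MOU paragraph quoted above.
# Component names such as "growth" are hypothetical labels.
def mou_eligible(weights, renegotiated_in_window=False):
    """weights maps component name -> share of the evaluation (shares sum to 1.0);
    the student growth component is assumed to be keyed 'growth'."""
    growth = weights.get("growth", 0.0)
    others = [share for name, share in weights.items() if name != "growth"]
    # Default rule: growth must be "primary," defined as greater than 50%.
    if growth > 0.5:
        return True
    # Exception: an LEA that renegotiated its contract between July 1 and
    # December 1, 2009, qualifies if growth is at least 40% and is greater
    # than any other single component.
    if renegotiated_in_window:
        return growth >= 0.4 and all(growth > share for share in others)
    return False

# A Hillsborough-style arrangement: growth at 40%, with administrative-review
# and trained-observer components at 30% each.
hillsborough = {"growth": 0.40, "admin_review": 0.30, "observation": 0.30}
print(mou_eligible(hillsborough))                               # fails the default rule
print(mou_eligible(hillsborough, renegotiated_in_window=True))  # qualifies via the exception
```

Under SB 6's flat more-than-50-percent mandate, by contrast, the same 40/30/30 split would simply be out of compliance; there is no second prong.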

So what happens if Senate Bill 6 passes? Well, there goes any value of the Gates award in Hillsborough; the arrangement in Hillsborough would violate the law because less than 50% of the teacher evaluation structure would use student outcomes. Is this really what DFER and the Obama administration want? Teachers union and district take a risky step in a joint commitment; state punishes district.

Keep in mind that SB 6 is a moving target: on Thursday, a state senate committee changed the bill to eliminate constitutionally dubious provisions in the original that would have forced local school districts to raise taxes if they didn't do what the bill required and that would have tied half of teacher pay to test scores. And thus far there is no House companion. But the teacher-evaluation and licensure components of SB 6 are based on a fantasy of assessment data and state authority that is unrealistic and is a slap in the face of administrators and teachers who are working at the ground level to develop better teacher evaluation systems.

I can't expect Commissioner Smith to acknowledge openly that his public support of SB 6 is a political calculation: he has no choice if he wants to keep his job. His capitulation is sad, since I like Smith and he's done a considerable amount of work in the background to educate members of the state Board of Education and legislators. But those outside Florida are free to criticize overreaching on teacher evaluation proposals, and this is a chance for them to prove that they are not as absolutist as teacher union activists in California and other states claim. So, is anyone from DFER or the Obama administration willing to speak up against the excesses of SB 6?

March 19, 2010

ESEA reauthorization blueprint, the CliffsNotes version

I have several meetings today, but I want to write down my thoughts on Duncan's ESEA reauthorization "blueprint" before I forget them. As I wrote over the weekend, Mike Petrilli is reading the substance of the blueprint correctly; the Obama administration is proposing that federal policy walk back a few steps from NCLB's absolutist mechanisms and disentangle the different issues involved in accountability. Petrilli is also correct in seeing a connection between the administration's ESEA reauthorization proposal and the promises by both Duncan and Russlyn Ali to be more aggressive in the department's Office of Civil Rights (OCR). That's essentially the implicit deal the administration is putting out for review by stakeholders: "We won't force states to label the majority of schools as failing, but we will require states to intervene in the worst 5% of schools in each state, and we will be aggressive in monitoring equity issues in other schools." 

At least in theory, this fits with my argument in Accountability Frankenstein that schools have three different types of challenges: the challenge of truly mismanaged schools in crisis, the challenge of inequality, and the challenge of making sure the next generation is smarter and wiser than we are. I argued that NCLB tried to address all of those challenges with the same mechanisms, and it looks like the Obama administration is recognizing that they need different policy approaches: requiring states to identify 5% of schools in crisis, using OCR to address inequality, and pushing for common curriculum standards for the next-generation challenge. 

That's not saying that the proposed mechanisms are going to work. I am less worried about using testing to screen for schools in crisis than others, but I agree with Diane Ravitch that educational euthanasia is a simplistic response. That doesn't mean that states should allow schools with deep problems to fester but that both states and the federal government need to be much more humble about their ability to "turn around" schools in crisis or even replace them with putatively brand-new schools. It's the proposed four-option turnaround mandate in the blueprint that bears the most resemblance to NCLB's cookie-cutter interventions, and that's a matter of deep concern for me. 

Then there is the effective-teachers piece of the blueprint, which is less bureaucratic than NCLB's "highly qualified teacher" approach and the trigger for NEA's and AFT's critical responses to the blueprint (though I think Andy Rotherham is correct that the Obama administration's pushing of a health-care excise tax, abandonment of the Employee Free Choice Act, and passiveness with regard to NLRB appointments are definitely playing a role). The blueprint is very general with regard to its treatment of teacher effectiveness, and it could be consistent either with something like the Toledo peer-review system and Denver's ProComp, or with the problematic Senate Bill 6 in this year's Florida legislature.

The generally positive response to Duncan's presentations this week (especially from rural-state senator Tom Harkin) suggests that Duncan's hit a number of right notes, at least politically. That's not the same as effective policy, and it's a long way from a 40-something-page document to a law.

March 14, 2010

Petrilli nails ESEA reauthorization proposal

After finishing the last entry, I realized I should write something about Friday's USDOE proposal on ESEA reauthorization. But procrastination is sometimes a serendipitous thing, thanks to the Fordham blog: Mike Petrilli's analysis is correct, at least on first approximation. A narrative framework is not statutory language, Duncan's proposal isn't George Miller's, and there are other Beelzebubs squatting in the filigree, but I had the same general reaction Petrilli did.

I'll write more about ESEA reauthorization later in the week.

March 8, 2010

Sour-grapes agreement

Michael Olneck and Peter Sacks turn petty in letters to the editor about Diane Ravitch that the New York Times printed today. Wow. I agree with Ravitch on a number of things and disagree with her on a number of things, some within our shared area of expertise (history of education) and some outside it. But I'm not sure why Sacks in particular is turning on the venom spigot. Well, actually, I do have some hypotheses about the general hostility to her I've occasionally seen (as opposed to disagreement): she caricatured the field of history of education in a sloppy late-70s publication sponsored by the National Academy of Education, and along with Patricia Graham she was one of the first women to get high-status national recognition in the 1970s for work in education policy at the national level, which heretofore had been a male bastion. (Graham was director of NIE from 1977 to 1979.) The first is a seriously flawed work, but that's several decades in the past, and in any case, a particular work should stand or fall on its own merits. I've never seen the second item discussed or even acknowledged.

There's a related issue here, which is Ravitch's position outside traditional faculty. As far as I'm aware, she's never had a tenure-track or tenured faculty position, and she's one of the few historians who can say that they published their dissertation commercially before receiving the Ph.D. (The Great School Wars was published in 1974; Ravitch received her Ph.D. from Columbia in 1975). For the most part, her books are far more widely read than those of us who have full-time faculty positions, and I think she and Graham are the only historians of education to have held political appointments in the federal government. That's an interesting combination of insider and outsider positions. 

When Meier and Ravitch started their joint blog/conversation three years ago, I briefly referred to this history in writing, "Regardless of various professional views of her scholarship, Ravitch is a recognized voice on education policy. There are plenty of people I correspond with who have fewer claims to expertise, so I can either have a snit-fit about that or deal, and at this point, having a snit-fit is darned close to sexism and uber-testosterone in education policy studies." I'm sorry Olneck and Sacks, and especially Sacks, have made a different choice.

For the record, Sacks is factually wrong when he states, "Dr. Ravitch fashioned herself into the Ayn Rand of educational policy and rose to fame as a result of a free-market ideology that came into fashion in George W. Bush's administration." Ravitch's appointment was during the first Bush administration, and whatever you might think of Ravitch's historical arguments in different books, she's a much better writer than Rand.

February 11, 2010

Additional thoughts on performance pay politics

An addendum to my entry earlier this morning: I think that there is a politically-robust rationale for performance-pay policies, but it's not at the level of incentives usually used as the justification. The more plausible rationale for performance-pay policies is at the level of public-sector accountability: most people with jobs do not expect identical salaries or salaries based on a formula, and small variations based on something other than seniority and educational credentials might boost the facial validity of public-sector HR practices.

Note that this is not an argument that business practices are always incentive-based (or should be: witness AIG as a disaster stemming from short-term incentives) or even widely varying. In some cases--large law firms, for example--entry-level professionals receive step pay increases in their first few years akin to teachers' step increases. But if I were to ask the head of the Florida Council of 100, Susan Story, whether she'd stop advocating performance pay even if the research consensus in a few years were solidly against its doing anything for student achievement, my guess is that she'd still push for some form of performance pay.

The discourse around this is somewhat similar to other comparisons people make between their lives and public policy: when policies look like you're pushing the cart and someone else paid by public funds isn't, you're less likely to maintain support for it. A friend of mine visited a newspaper columnist some years ago to complain about an article the columnist had written regarding AFDC (the federal welfare program before 1996). Don't you understand the factual errors with all of the myths about welfare? my friend asked. Sure, said the columnist, but you don't understand why public attitudes have changed: as the majority of mothers now have to find their own child-care arrangements while they're working, they're going to be far less sympathetic towards women who aren't willing to work or perceived as not willing to work.

I don't agree with the columnist's thumbnail history of public attitudes towards federal welfare policies, or with the assumption that women on welfare have not historically wanted to work. But there is a significant grain of truth there: when the majority of mothers work while their children are young, and they have to find and pay for child care and wrestle with the stress involved in that, those mothers are not going to want to see that they're pushing the cart and others aren't. For similar reasons, those who oppose any performance pay have an uphill road telling people who work in environments with non-step-like pay arrangements that somehow public schools should be arranged differently.

Why the Teacher Incentive Fund and Race to the Top are long-term dead ends for merit-pay advocates

The apparent push in the proposed 2011 Obama budget for an enlarged Teacher Incentive Fund on the heels of Race to the Top makes me think that merit-pay/performance-pay advocates may be spreading their political capital very thin on teacher evaluation. Most advocates of paying teachers in part based on test scores are also advocates of using test scores in part to evaluate teachers more broadly, especially dividing probationary teachers from teachers with a right to due process before dismissal. And they're trying to do both. Smart or stupid? I think it's counterproductive for several reasons:

  1. The research on benefits of individual-teacher performance pay is limited. Very limited and quite mixed. Putting all your chips on a huge expansion of experimental performance-pay schemes? You may not get what you want, and public evaluations may doom the politics. (Think Reading First, though the allegations of corruption set the stage in that case for death-by-evaluation.)
  2. Grant programs end. If the expansion of performance-pay programs relies on temporary revenue, then the program may well die along with the extra revenue. Denver's teachers union and district worked together on a long-term political deal: performance pay that teachers helped develop tied to a long-term boost in revenue. That's not the structure of RTTT, TIF, or the Gates Foundation grants.
  3. Real-life performance-pay bonus budgets are stingy. The best example of that reality is here in Florida, where the state budget for school-based rewards for test scores has been no greater than $100/student (for a school) since the late 1990s. While my undergraduate students sometimes enter my classes thinking that a huge amount of school budgets are based on test scores, in reality that's no more than about 1.5-2% of per-pupil expenditures in Florida (and that's for the schools that receive the money). When this money is distributed to staff (sometimes it is, sometimes it isn't), it's in the form of bonuses, not additions to base salaries. The fiscal and political reality is that the only way to permanently boost base salaries substantially based on test scores is to give the money to a tiny fraction of teachers, and that's a recipe for political disaster (and legal problems).
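The arithmetic behind that 1.5-2% figure is easy to check. Here is a back-of-envelope sketch; the per-pupil spending levels are my own assumptions for illustration, not figures from this post.

```python
# Back-of-envelope check of how large a $100-per-student reward is relative
# to per-pupil spending. The spending levels below are illustrative
# assumptions, not official Florida figures.
reward_per_student = 100.0  # dollars per student in a rewarded school

for per_pupil_spending in (5000, 6500, 8000):
    share = reward_per_student / per_pupil_spending
    print(f"${per_pupil_spending:,} per pupil: reward is {share:.2%} of spending")
```

At any plausible spending level, the reward stays in the low single digits of percent, which is the point: these budgets cannot fund substantial permanent raises across the board.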

The last point is one I am surprised opponents of performance pay have not raised sufficiently, and here's how I thought someone would have put it by now: Okay, so you want to pay teachers well if their students learn a great deal? Wonderful. So if students perform at a very high level, you're willing to raise taxes to reward teachers for that accomplishment? Liberal advocates of performance pay would probably answer "yes, if..." I don't think fiscal conservatives who are performance-pay advocates have thought through the dilemma on that point very clearly; either the answer is that you're willing to raise taxes, or it's that you have low expectations for schoolchildren.

Eventually, I suspect that advocates of performance pay will have to decide whether they want to put all of their political capital into pay schemes that are fragile or into hiring and retention issues. The proposed ballooning of TIF is a sign that no one in Washington is thinking about the political balance of these issues in the long run. 

Disclosure: I'm a member of a higher ed union that has long had a contract with merit pay and considerable differences in pay by rank and discipline. K-12 is a very different world in this regard. 

Note: I started this entry on Tuesday, and because I forgot to change the "publish date" (which Movable Type usually sets at the time you started an entry, not published it), it first appeared as if it were published Tuesday. My editing fault, not your faulty memory. Now, your forgetting to read all of my books and articles? That's a different story.

January 9, 2010

Spot temperature:Climate::Test score:____________

I fully expect that within a week (if not already) some climate-change skeptic will use the cold wave currently freezing much of the country as an argument that climate change really isn't happening. And every time there's a vicious cold snap in winter or a cooler-than-average summer we get the argument. And some reporter and editor decides to devote part of the ever-shrinking news hole to bad coverage of the issue, while a relative handful of reporters use the question as an opportunity to educate readers about the difference between weather and climate.

Today, I'm sitting in central Florida with more layers on than I usually need in early January. It's colder weather than usual. But we're in a warming climate, because in the long run of decades (or centuries) the current cold wave is just noise, and the trend is towards a warmer atmosphere. "Just noise," you may be thinking through chattering teeth, "tell my heating bill that it's just noise." The current cold wave is nasty for individuals today (and a few days more), but it's temporary.

The variability of weather makes sense to most people because we have enough experience to distinguish between spot temperatures and broader patterns. We know that temperatures have daily and seasonal cycles. But the cyclical nature of weather does not give us enough background to grasp climate change. For that, you need data. A lot of data. A lot of data from a lot of places and times, of different sorts, with a number of experts sifting through it.

And even then you get climate-change conspiracy theorists, including someone who's evidently a hacker.

You can probably guess the logical analogue here: we do not have anywhere near the same density of data on student achievement that we have on climate, and yet we draw bold conclusions about the underlying achievement from a relative paucity of noisy data. As I wrote in August, we need to learn how to make decisions with noisy data. But in terms of broad trends in achievement, it is a bad habit of Americans to equate the latest test scores with long trends. 
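To make the noise-versus-trend point concrete, here is a small simulation in Python. The trend size and noise level are invented for the sketch; the point is only that a trend emerges from many readings, not from any single one:

```python
import random

# Toy illustration: a modest upward trend buried in year-to-year noise.
# Both the trend size and the noise level are invented numbers.
random.seed(1)
years = list(range(30))
true_trend = 0.1  # assumed underlying gain per year
series = [true_trend * t + random.gauss(0, 0.5) for t in years]

# An ordinary least-squares slope over all 30 points recovers the trend,
# even though any single year's reading may sit below earlier ones.
n = len(years)
mean_t = sum(years) / n
mean_y = sum(series) / n
slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(years, series)) \
        / sum((t - mean_t) ** 2 for t in years)
print(f"estimated trend per year: {slope:.3f}")
```

With thirty noisy observations, the fitted slope lands close to the assumed trend; with two or three observations, it would be anybody's guess. That is the situation with most achievement data.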

And that doesn't even touch the question of whether test scores are like temperature readings. Ah, but they are, if you're talking about the thermometers in your backyard and mine: placed at different heights, in different conditions (sheltered, out in the open, shade vs. sun), and of different ages (and thus with different consistency of readings across the years). I am sure that backyard thermometers in these varied conditions are highly correlated in the sense that when it's colder, they're all colder, and when it's warmer, they're all warmer, so the correlations across time are likely to be very high. But I wouldn't use them in any scientific research.

Stay warm, and have whatever hot beverage you like!

December 13, 2009

Turnaround or abandonment in NYC?

The extent of school closings in New York City is becoming evident, and after JD2718's posts on the subject over the past half-week, UFT's Leo Casey provides an overview and alleges an ulterior motive (to create available space for other purposes, not to improve education).

I'm far from NYC and can't speak from close knowledge of the city schools, and I'm still grading student work so I have no time to read extensively. But this is an important story and rolling conflict, and there are a few predictions I'll hazard:

  • At least one conservative will commit rampant inconsistency by simultaneously (or nearly simultaneously) weeping over the demise of the DC voucher program and applauding Klein for his bold moves, repeating the double standard on the issue I have described before.
  • A small handful of schools may be preserved through fairly heroic efforts, but most of the closures will stand.
  • There will be no effective way to hold Tweed responsible for consistency and rationality in its school opening/closing decisions. 

In truth, many administrators engage in maneuvers that appear as arbitrary as Klein's closures do, but rarely is it on such a scale or so visible beyond the locality.

December 5, 2009

Are central Florida schools flouting Florida law limiting test-prep?

I have heard from teachers and students in three area districts (Hillsborough, Pinellas, and Hernando counties) that secondary teachers in some subjects are being ordered to spend the first 10 minutes of class suspending the curriculum and teaching material from another class. In the case of two counties (Pinellas and Hernando), I have heard stories that math teachers are being asked to teach 10 minutes of reading--not include word problems in math, which is certainly appropriate, but teach reading (a subject very few of them would have certification in). In one county (Hillsborough), I have heard a report from a student that a high-school anatomy teacher has been asked to spend 10 minutes reviewing other science subjects (and the emphasis appears to be in chemistry), probably to prepare students for the 11th grade FCAT science comprehension exam.

In 2008, the Florida legislature added a section to the existing law on assessment (F.S. 1008.22(4), if you're curious), specifying limits to what schools can do to prepare for tests, specifically

STATEWIDE ASSESSMENT PREPARATION; PROHIBITED ACTIVITIES.--Beginning with the 2008-2009 school year, a district school board shall prohibit each public school from suspending a regular program of curricula for purposes of administering practice tests or engaging in other test-preparation activities for a statewide assessment.

There are a number of exceptions to this prohibition--school districts can distribute sample test books, teach test-taking skills in limited quantities, etc.--but the spirit is clear: schools are not supposed to be engaging in test-prep that is a substitute for instruction. And taking time away from math class to teach reading, or away from anatomy to teach chemistry, looks like it's clearly prohibited.

It's also counterproductive from an administrative standpoint: if you wanted to add reading instruction, why would you ask a math teacher to do it? I should be clear: these are unconfirmed reports rather than documented examples. But if the reports are true, this clearly looks like an end-run around ordinary curriculum policies (which require a certain amount of instruction in each class) to squeeze in more instruction or test-prep for high-stakes subjects.

There is one additional legal problem with this practice: there are both state and federal policies about teacher qualifications. I bet it's illegal in a number of respects to assign math teachers to teach reading and then report that everyone instructing in a subject is properly certified.

I have contacted the three districts in question to ask where the policies required by the law are. If you are aware of any specific examples (and I would need the school, date, class, period, and witness for sufficient documentation), please contact me by e-mail (sherman dottish-thingie dorn at-symbol-stuff gmail.com).

November 12, 2009

Race to the Top: review, revise, redux

I am in California this weekend for the Social Science History Association annual meeting, where we get to talk about Maris Vinovskis's book on the last quarter century of school reform, and since one of my copanelists Saturday morning is Jennifer Jennings, I finally get to meet the sociologist-formerly-known-as-Eduwonkette in person, face to face. Because several family members live in Costa Mesa, I also get to enjoy Kean Coffee about 20 miles south of the conference hotel/cruise ship (when the heck did the SSHA officers decide to book the Queen Mary??!).

While the focus of the book panel will be ... well, Maris's book, I'm sure we'll be talking about Obama education policy at some point, including Race to the Top. I was rushing around last night not getting enough done, so I didn't have a chance to do more than casually skim the stuff that's now available on the revised final guidelines. A few initial thoughts:

  • Bottom line? No idea. I traveled west and had coffee (see above), so I don't have a bad case of jet lag, but I've been on planes for 7 hours today. 
  • I very much like the competitive priority on STEM fields. That uses a standard device for focusing grant-writers' minds in USDOE competitions (the bonus points for meeting a competitive priority). (Disclosure: it looks like my state's department of education is following the push a bunch of us have been making about using Race to the Top funds for end of course exams, especially in science.)
  • From the list of changes made, it looks like a lot of political calculation went into deciding which changes were needed to keep stakeholders in the game and what had to stay the same to satisfy policy goals.
  • Duncan is not anal retentive enough to make the points add up to a "nice round number." I have a suspicion this is deliberate, and if so I think I know the reason why.
  • People who focus on the total potential range of points for each section are missing an important feature of point distributions in scoring systems: it's the actual range, not the potential range, that matters for rankings. If one component runs 58 points from top to bottom on paper but the scoring leaves a real-life spread of 10 points, it doesn't matter that the component is worth 58 points; it could have been worth anything from 10 to 58. So what matters is how the reviewing panel looks at everything.
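A toy calculation may make that last point clearer. The components, point values, and scores below are invented for illustration, not drawn from the actual Race to the Top rubric:

```python
# Invented applicants and scores, purely for illustration.
# Component A is worth up to 58 points, but reviewers cluster their scores;
# component B is worth only 20 points, but reviewers use the whole scale.
applicants = {
    "State 1": {"A": 50, "B": 2},
    "State 2": {"A": 48, "B": 20},
    "State 3": {"A": 55, "B": 8},
}

for comp in ("A", "B"):
    vals = [s[comp] for s in applicants.values()]
    print(f"component {comp}: actual range = {max(vals) - min(vals)}")

totals = {name: s["A"] + s["B"] for name, s in applicants.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
print("ranking:", ranking)
```

Here the 20-point component, not the 58-point one, decides the final order, because its realized spread is larger: State 2 trails on A but wins overall on the strength of B.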

If we have time, I'll try to persuade Jennings to put on her Eduwonkette cape and save the state where I grew up. But I think California's problems are beyond what even a brilliant sociologist can solve. At least I get to see family members, which is worth the jet lag I'll be fighting in the next week.

October 29, 2009

Channeling Jerry Bracey on "proficiency": it's political, not scientific

One of the late Jerry Bracey's hobbyhorses was the pretense that the NAEP achievement level labels were scientific, as he argued in 1999: "The standards have generally been the object of scorn and derision from the psychometric community." He was fond of quoting the 1999 report on NAEP proficiency levels, esp. from p. 162: "Standards-based reporting is intended to be useful in communicating student results, but the current process for setting NAEP achievement levels is fundamentally flawed." So when NCES issues a report comparing the implied theta-values of cut-scores for proficiency on state assessments to the theta-values of cut scores for proficiency on NAEP and both Ed Week and the Christian Science Monitor report on the paper with a straight face, we're obviously seeing one place where Bracey's voice is already missing.

I think Jerry perseverated on this issue, to the detriment of a sensible argument about political judgments. The larger, inescapable point is that cut scores are set arbitrarily, and there is no way to avoid that fact. Those who support setting achievement levels hope and pray that they're arbitrary in the sense of arbitration and careful judgment, not capricious. But they are arbitrary, and even more so the labels assigned to them. What we know is that someone who scores at a "proficient" level on NAEP is scoring higher than someone in the "basic" band. That's all we know from those labels: ordinality. Moses did not come down from Mount Sinai with NAEP scores carved in tablets. 
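To see how much rides on an arbitrary threshold, consider this sketch in Python. The scale scores and cut points are entirely invented:

```python
# Ten hypothetical scale scores; moving the cut score changes the
# "percent proficient" label without any student learning more or less.
scores = [212, 225, 238, 244, 251, 263, 270, 281, 290, 301]

for cut in (240, 250, 260):
    pct = sum(s >= cut for s in scores) / len(scores)
    print(f"cut score {cut}: {pct:.0%} labeled proficient")
```

Nudging the cut by ten points swings the headline figure from 70% to 50% "proficient" with identical student performance, which is exactly why the label itself carries only ordinal information.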

So what do we do with the inherently political nature of those labels? As I have argued in Accountability Frankenstein, the caution with which we use the judgments on cut scores should depend on the stakes of their use. If they're used to target resources, that's one thing (resources are going to be targeted in some manner), but the more that someone's job depends on them, the more wary we should be of how we set thresholds. 

Today, however, NAEP labels and cut-scores are serving a purely performative act, to stigmatize states for their political response to NCLB. I hereby propose that we have the following new labels for NAEP achievement levels: 

[Image: NAEP-achievement-levels.jpg, the proposed satirical labels]

I think that's in the spirit of the day's report...

Correction: I assumed that NCES was using detailed data from the state assessments to estimate IRT parameters. Silly me. They were using distributional data for linkage. Oops... my fault for forgetting the methods from the last such report. I'll let the measurement folks argue about the methods used here. 

October 14, 2009

The comparability fly in the Ouchi/principal-autonomy ointment

Yesterday from a "stakeholders" meeting (I think at the USDOE), Charlie Barone tweets,

Richard Laine of Wallace Foundation: forthcoming Rand study will show [principal] autonomy in hiring a key factor in student achievement.

I've been expecting something like this for a while, not because I'm connected to a RAND insider (I'm not) but because this is the obvious new form of decentralization, one that would marry the 1980s-90s site-based management fad with new managerial fads in education.

To some extent I am attracted to Bill Ouchi's argument about principal autonomy leading to lower total student load. Ouchi's claim about total student load is essentially one of Ted Sizer's central arguments from Horace's Compromise: that the number of students a teacher sees is a key factor in the ability to push student achievement. But... and here's a fairly important but... Ouchi's work is tantalizing rather than definitive (because it has not been replicated substantially in terms of total student load), and the temptation to manage large urban districts as "portfolios" with quasi-independent school-level management may push a single form of decentralization at the cost of comparability in expenses and access to great teachers.

What the heck do I mean by that? In a sentence, we may not want principals to have complete autonomy in a task where they have relatively weak skills: knowing which novice teachers are going to be great teachers.


Everyone and her or his grandmother is focusing on the problem of where senior teachers work. This is an intellectual sleight of hand if you simultaneously argue that teachers with seniority are taking advantage of contracts with seniority privileges on transfer to avoid schools that need them and also insinuate that experience means nothing. Let me get this straight: we need to prevent experienced teachers from exploiting labor-market choice to move to schools with more comfortable teaching situations because... they're not inherently any better than teachers with only a few years of experience? This is an inconsistency ripe for Jon Stewart-like treatment.

More important than the intellectual sleight of hand is the way that this argument ignores an opportunity for a simple but politically sensitive intervention that could simultaneously improve the lives of poor children and new teachers: create regional new-teacher clearinghouses and matching services. Here's the thought experiment: far from decentralizing, I think it would be a healthy system to require that new teachers go into a large regional market, where vacancies for relatively new teachers (e.g., those with fewer than three years of experience) would be filled through a matching process akin to the matching of med-school graduates to residencies. This would require collective bargaining and regional agreements between districts (or changes to statute), but here's the idea:

Brand-new teacher's perspective: A new teacher registers with the regional teaching market clearinghouse, with all of the stuff you'd want applicants to provide. The clearinghouse is directly tied to vacancies in the region, and that would probably include multiple districts in most parts of the country. The clearinghouse matches teachers to jobs for the first year. The teachers and administrators are told, explicitly, "This is a one-year arrangement. In the second year, the teacher is headed to a new school, and the administrator provides an evaluation knowing that the teacher is not coming back to that school until at least two years down the road." And that's what happens. At the end of the first year, the clearinghouse matches jobs to teachers who want to continue teaching and whom the first-year administrators recommend continue. Same with the end of the second year. And the clearinghouse's job is to make sure that by the end of a new teacher's third year, that teacher has worked in multiple settings, with different characteristics of students (at least within the range of the region), in areas of the teacher's documented expertise (i.e., no out-of-field matches). 

At the end of Year 3? Open market in the spring, in most places, and administrators wanting to hire on the open market must hire teachers with at least three years' experience -- in other words, teachers for whom there is a record of evaluations from different administrators and for whom there is a record of performance for students in different settings (within the range of the region's student population). Schools are allowed to hire teachers who worked in their schools before... if the now-third-year teacher wants to work there again.
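The residency-style matching invoked above can be sketched with the Gale-Shapley deferred-acceptance algorithm, the same family of methods behind the real medical-residency match. Everything below -- the teacher and school names, the preference lists, the one-seat-per-school simplification -- is hypothetical; a real clearinghouse would also enforce certification areas and the rotation rules described above:

```python
def deferred_acceptance(teacher_prefs, school_prefs):
    """Match each teacher to one school (one seat each); teachers propose."""
    free = list(teacher_prefs)                 # teachers not yet matched
    next_choice = {t: 0 for t in teacher_prefs}
    matched = {}                               # school -> teacher
    rank = {s: {t: i for i, t in enumerate(prefs)}
            for s, prefs in school_prefs.items()}
    while free:
        t = free.pop(0)
        s = teacher_prefs[t][next_choice[t]]   # propose to next school on list
        next_choice[t] += 1
        if s not in matched:
            matched[s] = t                     # seat open: tentatively accept
        elif rank[s][t] < rank[s][matched[s]]:
            free.append(matched[s])            # school trades up; bump holder
            matched[s] = t
        else:
            free.append(t)                     # rejected; t tries next school
    return {t: s for s, t in matched.items()}

# Hypothetical preference lists for three new teachers and three vacancies.
teacher_prefs = {
    "Ana": ["North", "South", "East"],
    "Ben": ["South", "North", "East"],
    "Cy":  ["South", "East", "North"],
}
school_prefs = {
    "North": ["Cy", "Ana", "Ben"],
    "South": ["Ben", "Cy", "Ana"],
    "East":  ["Ana", "Ben", "Cy"],
}
print(deferred_acceptance(teacher_prefs, school_prefs))
```

The resulting assignment is stable: no teacher and school both prefer each other to their assigned match, which is the property that keeps a clearinghouse from unraveling into side deals.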

Benefit to teachers: first-year teachers stuck with horrible administrators (or generally toxic environments) know that they'll be moving on if they survive. They'll get experience with multiple settings where they'll be able to demonstrate their chops. At the end of their third year, they'll have some variation in experience with administration to be able to judge people better when applying in an open-market situation. Disadvantage to teachers: if you happen to get lucky and get a great job in Year One, you have to move on.... and let another new teacher get the benefit of that experience.

Benefit to administrators: because new teachers are forced to move on after a year, honest evaluations are less likely to result in social backlashes. When you hire on the open market, you'll know you'll have evaluations and (where this is gathered) other performance data that is from school settings with a range of student populations. Disadvantage: you don't get to hire absolutely new teachers; you get whom you get, and if you were great spotters of talent, or you think you're better than the average principal at spotting good talent, you'll be upset.

(Personally, I think I would prefer this as an administrator: if you've read Moneyball, you know the sabermetricians' rule of thumb: you can predict a baseball player's professional performance from college experience, but someone straight out of high school is just a raw bet. Why would you want the authority to make hires in the situation where you're almost guaranteed to be a worse judge of talent/skill than in any other personnel situation? Then again, I'm sure many principals think of themselves like the [very poorly predicting] old scouts of baseball, making seat-of-the-pants judgments.)

Advantages for systems: See advantages for administrators above. In addition, you have lower risk from variation in administrators' skills in talent judgment, while principals would still have the autonomy to pick more experienced teachers once those teachers have enough of a record for administrators to see who has more talent. You could also get development of evaluation skills in a regional context without diseconomies of scale. If clearinghouses have to track teachers, they could also be tasked with additional evaluation responsibilities across a region.

Advantage for relatively poor systems: you know that wealthier districts will not be able to be as much of a magnet for new teachers, because of regional rotation, and you could push administrators to do what is necessary to convince teachers that they want to return to your district after their initial three-year rotation is done.

Disadvantages: there would need to be legal agreements to cover this, and there would be some logistical challenges in identifying vacancies (and making sure those vacancies are reported accurately and promptly) as well as in the operation of a clearinghouse. School districts would have to delegate hiring authority for some of their jobs to a regional body, and if school systems really thought that they were hot stuff in terms of talent scouting, that might be hard to swallow. (See above and Moneyball on the egos of baseball scouts and possibly school administrators.)

Disadvantage for wealthy districts: poof goes your advantage in recruiting brand-new and relatively new teachers, because they'll spend some time in your districts but also some time in poorer districts.

Now, the payoff in terms of debates about comparability: a regional new-teacher clearinghouse/matching process would instantly equalize a significant part of the teaching staff across a region, because of rotation among jobs and districts. Yes, there would still be an advantage of wealthier districts in attracting teachers with three or more years of experience, but poorer districts would know that they at least have a shot of persuading new teachers that they can make a good career inside a district... if the relatively new teachers have an experience that is supportive. 

Remember that this is a thought experiment: I don't know of any places with regional new-teacher clearinghouses/matching services, and I dreamed it up out of whole cloth (plus some inspiration from what happens with med-school students). But I think it points out a structural problem with giving principals complete autonomy: there is no balancing out of regional needs. Equality of opportunity would depend entirely on the skills of individual principals, and while principals are extraordinarily important, that's putting a heck of a lot of eggs in a single basket. If you care about making sure that a broad range of students have access to great teachers, there are serious dangers in the Ouchi principal-autonomy approach.

October 8, 2009

First, find me a box of cereal that squirms and drips snot in winter

Congratulations to former Florida Governor Jeb Bush, who knows a critical rule of politics: declare victory whenever you can, no matter whether you were right. I am quite serious about his political acumen: his push of a system that assigned letter grades to schools was ingenious politics. And Bush deserves credit for supporting a research technical assistance center in Florida as well as funding for reading coaches. But Jeb Bush's comments to the Jeb Bush Celebration Conference this week had an interesting quip:

Frankly, if Walmart can track a box of cereal from the manufacturer to the check-out line, schools should be able to track the academic growth of a student from the time they step in the classroom until they graduate.

I am firmly in favor of using longitudinal data, but this comment is cheerleading and not serious discussion. There are significant challenges in the creation, maintenance, and use of longitudinal data systems, and Walmart-style tracking logistics don't touch the greater ones.

October 3, 2009

Child murder, Chicago style

Chicago teacher Deborah Lynch pointed out in a Sun-Times opinion piece yesterday that one of the Chicago schools' "turnaround targets" this fall has been Fenger High School, near the gang fight that led to Derrion Albert's death and the school where she implies many of the combatants attend. (Hat tip/alternative source.)

I am not saying that knowing the kids better could have averted the melee and tragic death of last week, obviously. But trouble had been brewing at the school even before last week. Staff reported a riot the previous week inside the building, involving teachers being hit, and that two different police stations had to be called in to quell the disturbance. Those are the times when the staff members draw on their relationships with kids to urge restraint, to urge calm and peace, to try to talk things out rather than fight things out. Those are the times when a seasoned staff can identify strategies and resources to address and prevent further problems.

Lynch's argument is interesting and plausible. I'd be cautious of taking it at face value, but don't toss it out the window. As far as I am aware, there is nothing either to contradict or to support the claim that the length of time a staff (as a whole) has spent in a school is predictive of the general school environment. I suspect it depends on the staff; experienced good teachers and staff are going to have the types of relationships with students that Lynch describes.

But there is another important limit to Lynch's argument, and I'm thinking about the debate that's usually focused on academics rather than violence: the relationship between schools and the rest of students' lives. I suspect that if George Schmidt is correct that the police congregated around Fenger rather than following potential combatants, then any immediate investigation needs to focus largely on the tactical decisions of the police. It's possible that no matter what happened in the school, the gang fight would have occurred unless police decisions had been different.

The murder of Derrion Albert is representative of one fact: in violent neighborhoods students are usually safer in school than out of school. A skilled set of professionals can make it so kids are safe in school, safe enough to focus on school. And it's much harder to bring peace to a violent neighborhood without involving schools. What happens inside the classroom can change the conversations that happen outside school boundaries, but there are no guarantees. What if Fenger had not been the target of a turnaround effort: would Albert still be alive? I don't know. 

Update (October 7): More on MSNBC, and more focused, on the rearrangement of enrollment patterns.

September 2, 2009

"Lake Wobegon" Klein

From pp. 68-69 of Accountability Frankenstein:

The complexity of an accountability system can also help muffle opposition to accountability if it gives a reasonable chance for students or schools to be successful in the system's labeling... the political potential to muffle opposition within a system may be more important than the technical qualities of a system, for schools typically trumpet any positive label on any website, pamphlet, or streetside marquis. All three of these states provide evidence of the capacity for complex systems to muffle dissent. In North Carolina, the majority of schools have received some recognition award in every single year of its accountability system's history. In Florida's system, 13% earned recognition in its first year, 1999, but that proportion rapidly grew, and a majority of schools received recognition awards in each of the years from 2003 to 2006. In California, 47% of California's schools earned statewide recognition in 2002, and two thirds of the schools in the Los Angeles Unified School District earned recognition.

I don't know why anyone would suspect that there is any political convenience involved in having the single letter grades assigned to a whole slew of NYC schools jump to A, but it's not isolated to New York. It's just that New York has overtaken Lake Wobegon as a symbol of overestimation of results. Then again, since Garrison Keillor spends several months a year in New York, maybe it's highly appropriate.

August 30, 2009

Race to the Top comment sausage

A friend of mine from Chicago introduced me to the term link sausage for a blog entry that is not much more than a set of links. Here are links to various comments on Race to the Top (a tiny slice of the well over a thousand comments submitted):

As I expected, others have started to chime in on the NEA comments. The New York Times took the comments as a sign of obstinacy. Former Park Ridge Education Association president Fred Klonsky wrote,

While it seems to me that it is late in coming, the letter from Brilliant is well deserved, and [Sherman] Dorn's comments notwithstanding, I think it reflects the views of the NEA membership. At least among those who have been following the debate.

I think that was my point: the comments reflected the views of a large slice of the NEA membership, but not in a productive fashion, and I fear that on balance it will harm the concrete interests of teachers (both in and out of the NEA) no matter how you want to define those interests. 

Note: As Klonsky points out in comments, he's not an ex-president (yet). The error is all mine, from sloppy reading of his about page.

August 28, 2009

I'm commenting on Race to the Top, and I want a pony, too!

Impressions of a quick skim of 20 or so comments on the draft Race to the Top regs:

  • I couldn't find the national AFT comments anywhere.
  • Thus far, the two sets of technical comments by the Learning Disabilities Association of America and the group of academics with Kane, Staiger, and several others (uploaded by Thomas Kane), respectively, earn my "okay, you guys read the regulations and targeted your comments" award. Whether you agree with them or not, the comments were shrewd and focused. (I happen to like most of the comments, which are practical and sensible on the whole.)
  • The New Teacher Project signed onto the multi-organization letter that was essentially a vague "okay, we agree with this" note (with the advice for the USDOE to be selective in the first round), and then submitted comments that were, ahem, not nearly as far in the opposite direction as NEA's but bewildering in their unbridled confidence in the suggestions made. TNTP staff, please read the comments written by Kane et al. You're smart, and they're smart, and they're much closer to the mark than you were this week. At least you don't come close to winning the second "I'm commenting on Race to the Top, and I want a pony, too!" award (the first went to the NEA). 
  • I think that the California Teachers Association (the NEA affiliate in California) avoided the factual blunder in the NEA comments of asserting that Race to the Top is a mandate. Instead, they asked what states would have to give up in return for the money. In this case, they were deeply, deeply concerned with the threat to federalism embedded in asking that a state be able to link teacher and student records. That concern would be more plausible if TNTP's comments were enacted, but either the draft regs or Kane et al.'s suggestions are reasonable in an imperfect world.
  • One state department of education accidentally sent the USDOE its cover letter to a national organization telling the national organization it was sharing its reg comments, in the place where it was supposed to upload comments. No signs of actual comments on the regs (thus far today). Ouch! I suspect there are similar technical glitches in other places.

I didn't comment. This is the first week of classes, and I'm a firm believer in the biggest bang for my buck (or hour).

August 23, 2009

NEA's comments: righteousness over responsibility to members?

I'm an NEA member, through my membership in the United Faculty of Florida. I'm a skeptic and critic of high-stakes accountability. Wrote a book and a few articles on the topic. And I am astounded at the NEA's comments on the Race to the Top draft regulations. (Hat tip.)

It is one thing to submit a righteous objection to the entire program if you are an individual with no responsibilities but to your conscience and your personal judgment of posterity. It is an entirely different thing when you represent several million teachers and you submit a document that for all intents and purposes appears to have an internal audience inside the NEA. That's nice, in the worst sense of the word "nice," because NEA staff had a responsibility to protect and advance their members' interests, not indulge any of our fantasies. To put it bluntly, on what planet would this regulatory comment have any effect on the final regs?

Let me be clear on my perspective as an NEA member and as an observer of political processes: There are lots of reasonable individual passages within the document, but you don't submit a manifesto when you comment on regs as an organization. You don't submit a manifesto that covers up any potential for effectiveness with what amounts to political poison. And you don't submit a manifesto that undermines your credibility. 

Two examples will have to suffice, because there's only so much I can wince at publicly: "we cannot support yet another layer of federal mandates" (from p. 2), or with regard to the creation of statewide longitudinal data systems, opposition to "[i]gnoring states' rights to enact their own laws and constitutions" (p. 24). The problem with these claims (and attendant tone of outrage) is that Race to the Top is not a mandate. Love it or hate it, it's something states must apply for. 

There were certainly alternatives available to the NEA, including the following choices:

  • Realpolitik: nudge the regs a bit to help state and local affiliates.
  • Legal: set up a legal challenge after final publication.
  • Abstinence: if you need to make a statement of conscience, declare that "we have serious doubts that this program will substantially help schools and will not participate in the regulatory comment process." 

I may be dead wrong about this, and there may be some uber-secret strategy behind this comment, but from where I sit at the end of the summer, it looks like the first major move by the new president of one of my national affiliates has been a bunch of wasted electrons.

August 16, 2009

What "multiple measures" looks like in reality

Friday's Sun-Sentinel article on the new evaluation scale for Florida high schools shows what happens when a state moves away from general-assessment test scores as the be-all and end-all of accountability. In this case, Florida's new scale for high schools rewards schools for graduating more students (especially those who struggle with the state assessments), for enrolling students in challenging courses, for student success in those courses, and for student success in voc-ed certification programs.

How are Broward County schools responding?

At South Broward High School in Hollywood, students will get the chance to take additional AP classes, such as human geography, world history, music theory and macroeconomics, in addition to more traditional offerings such as AP English and biology, said principal Alan Strauss.

They're also ready to better monitor performance of at-risk students and ensure the entire senior class is ready to graduate, Strauss said. "I say overall I would hold myself accountable for grad rate and preparing my kids for college," Strauss said. "I don't find a problem with that. I think that's what my job should be."

Surprise, surprise! A more balanced accountability mechanism leads to planning a more balanced set of programs for students. I can quibble with loads of details on the new scale, but the direction is the right one, and I think we'll know in a few years how this is going. I'll stick my neck out and predict the evidence will be reasonably good (in terms of outcomes). A small step for a single state, a giant step for accountability options.

August 13, 2009

How can we use bad measures in decisionmaking?

I had about 20 minutes of between-events time this morning and used it to catch up on two interesting papers on value-added assessment and teacher evaluation--the Jesse Rothstein piece using North Carolina data and the Koedel-Betts replication-and-more with San Diego data. 

Speaking very roughly, Rothstein used a clever falsification test: if the assignment of students to fifth grade is random, then you shouldn't be able to use fifth-grade teachers to predict test-score gains in fourth grade. At least with the set of data he used in North Carolina, you could predict a good chunk of the variation in fourth-grade test gains knowing who the fifth grade teachers were, which means that a central assumption of many value-added models is problematic.

Cory Koedel and Julian Betts's paper replicated and extended the analysis using data from San Diego. They were able to confirm with different data that using a single year's worth of data led to severe problems with the assumption of close-to-random assignment. They also claimed that using more than one year's worth of data smoothed out the problems.
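The logic of the falsification test can be illustrated with a toy simulation of my own construction (not Rothstein's actual model): if fifth-grade classrooms are filled at random, knowing a student's future teacher should explain almost none of the variance in fourth-grade gains, while assignment that tracks prior performance lets the future teacher spuriously "predict" past gains.

```python
# Toy sketch of a Rothstein-style falsification test (my illustration, not
# the paper's model). Future-teacher indicators should have essentially no
# power to explain past gains if assignment is random.
import random
import statistics

random.seed(0)
n_students, n_teachers = 1000, 20
group_size = n_students // n_teachers
gains4 = [random.gauss(0.0, 1.0) for _ in range(n_students)]  # 4th-grade gains

def r_squared(gains, teacher_ids):
    """Share of variance in gains explained by teacher indicators
    (between-teacher sum of squares over total sum of squares)."""
    overall = statistics.fmean(gains)
    ss_total = sum((g - overall) ** 2 for g in gains)
    ss_between = 0.0
    for t in set(teacher_ids):
        group = [g for g, tid in zip(gains, teacher_ids) if tid == t]
        ss_between += len(group) * (statistics.fmean(group) - overall) ** 2
    return ss_between / ss_total

# Random assignment to next year's (fifth-grade) teachers.
random_ids = [random.randrange(n_teachers) for _ in range(n_students)]

# Tracked assignment: rank students by prior gains, fill classrooms in order.
order = sorted(range(n_students), key=lambda i: gains4[i])
tracked_ids = [0] * n_students
for rank, i in enumerate(order):
    tracked_ids[i] = rank // group_size

print(round(r_squared(gains4, random_ids), 3))   # near zero: assumption holds
print(round(r_squared(gains4, tracked_ids), 3))  # large: assumption violated
```

In this toy setup, the random-assignment R-squared hovers near the chance level while the tracked-assignment R-squared is large, which is the shape of the evidence Rothstein reported against the close-to-random-assignment assumption.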

Apart from the specifics of this new aspect of the value-added measure debate, it pushed my nose once again into the fact that any accountability system has to address the fact of messy data.


Let's face it: we will never have data that are so accurate that we can worry about whether the basis for a measure is cesium or ytterbium. Generally, the rhetoric around accountability systems has been either "well, they're good enough and better than not acting" or "toss out anything with flaws," though we're getting some new approaches, or rather older approaches introduced into national debate, as with the June Broader, Bolder Approach paper and this morning's paper on accountability from the Education Equality Project.

Now that we have the response by the Education Equality Project to the Broader, Bolder Approach on accountability more specifically, we can see the nature of the debate taking shape. Broader, Bolder is pushing testing-and-inspections, while Education Equality is pushing value-added measures. Incidentally, or perhaps not, the EEP report mentioned Diane Ravitch in four paragraphs (the same number of paragraphs I spotted with references to President Obama) while including this backhanded, unfootnoted reference to the Broader, Bolder Approach:

While many of these same advocates criticize both the quality and utility of current math and reading assessments in state accountability systems, they are curiously blithe about the ability of states and districts to create a multi-billion dollar system of trained inspectors--who would be responsible for equitably assessing the nation's 95,000 schools on a regular basis on nearly every dimension of school performance imaginable, no matter how ill-defined.

I find it telling that the Education Equality Project folks couldn't bring themselves to acknowledge the Broader, Bolder Approach openly or the work of others on inspection systems (such as Thomas Wilson). Listen up, EEP folks: Acknowledging the work of others is essentially a requirement for debate these days. Ignoring the work of your intellectual opponents is not the best way to maintain your own credibility. I understand the politics: the references to Ravitch indicate that EEP (and Klein) see her as a much bigger threat than Broader, Bolder. This is a perfect setup for Ravitch's new book, whose title is modeled after Jane Jacobs's fight with Robert Moses. So I don't think in the end that the EEP gang is doing themselves much of a favor by ignoring BBA.

Let's return to the substance: is there a way to think coherently about using mediocre data that exist while acknowledging we need better systems and working towards them? I think the answer is yes, especially if you divide the messiness of test data into separate problems (which are not exhaustive categories but are my first stab at this): problems when data cover a too-small part of what's important in schooling, and problems when the data are of questionable trustworthiness.

Data that cover too little

As Daniel Koretz explains, no test currently in existence can measure everything in the curriculum. The circumscribed nature of any assessment may be tied to the format of a test (a paper-and-pencil test cannot assess the ability to look through a microscope and identify what's on a slide), to test specifications (which limit what a test measures within a subject), or to the subjects covered by a testing system. Some of the options:

  • Don't worry. Don't worry about or dismiss the possibility of a narrowed curriculum. Advantage: simple. Easy to spin in a political context. Disadvantage: does not comport with the concerns of millions of parents concerned about a narrowed curriculum.
  • Toss. Decide that the negative consequences of accountability outweigh any use of limited-purpose testing. Advantage: simple. Easy to spin in a political context. Disadvantage: does not comport with the concerns of millions of parents concerned about the quality of their children's schooling.
  • Supplement. Add more information, either by expanding the testing or by expanding the sources of information. Advantage: easy to justify in the abstract. Disadvantages: requires more spending for assessment purposes, either for testing or for the type of inspection system Wilson and BBA advocate (though inspections are not nearly as expensive as the EEP report claims without a shred of evidence). If the supplementation proposal is for more testing, this will concern some proportion of parents who do not like the extent of testing as it currently exists.

Data that are of questionable trustworthiness

I'm using the term trustworthiness instead of reliability because the latter is a term of art in measurement, and I mean the category to address how accurately a particular measure tells us something about student outcomes or any plausible causal connection to programs or personnel. There are a number of reasons why we would not trust a particular measure to be an accurate picture of what happens in a school, ranging from test conditions or technical problems to test-specification predictability (i.e., teaching to the test over several years) and the global questions of causality.

The debate about value-added measures is part of a longer discussion about the trustworthiness of test scores as an indication of teacher quality and a response to arguments that status indicators are neither a fair nor accurate way to judge teachers who may have very different types of students. What we're learning is a confirmation of what I wrote almost 4 years ago: as Harvey Goldstein would say, growth models are not the Holy Grail of assessment. Since there is no Holy Grail of measurement, how do we use data that we know are of limited trustworthiness (even if we don't know in advance exactly what those limits are)?

  • Don't worry. Don't worry about or dismiss the possibility of making the wrong decision from untrustworthy data. Advantage: simple. Easy to spin in a political context. Disadvantage: does not comport with the credibility problems of historical error in testing and the considerable research on the limits of test scores.
  • Toss. Decide that the flaws of testing outweigh any use of messy data. Advantage: simple in concept. Easy to spin in a political context. Easy to argue if it's a partial toss justified for technical reasons (e.g., small numbers of students tested). Disadvantage: does not comport with the concerns of millions of parents concerned about the quality of their children's schooling. More difficult in practice if it's a partial toss (i.e., if you toss some data because a student is an English language learner, because of small numbers tested, or for other reasons).
  • Make a new model. Growth (value-added) models are the prime example of changing a formula in response to concerns about trustworthiness (in this case, global issues about achievement status measures). Advantage: makes sense in the abstract. Disadvantage: more complicated models can undermine both transparency and understanding, and claims about superiority of different models become more difficult to evaluate as the models become more complex. There ain't no such thing* as a perfect model specification.
  • Retest, recalculate, or continue to accumulate data until you have trustworthy data. Treat testing as the equivalent of a blood-pressure measurement: if you suspect a reading is not to be trusted, take the blood pressure again in a few minutes or, in the school context, test the student again in a few months or another year. Advantage: can wave hands broadly and talk about "multiple years of data" and refer to some research on multiple years of data. Disadvantage: retesting or reassessment works best with a certain density of data points, and the critical density will depend on context. This works with some versions of formative assessment, where one questionable datum can be balanced out by longer trends. It's more problematic with annual testing, for a variety of reasons, though accumulating multiple years of data can reduce uncertainties.
  • Model the trustworthiness as a formal uncertainty. Decide that information is usable if there is a way to accommodate the mess. Advantage: makes sense in the abstract. Disadvantage: the choices are not easy, and the way you choose to model uncertainty has consequences, whether that means adjusting cut scores or data presentation by measurement/standard errors, using fuzzy-set algorithms, applying Bayesian reasoning, or relying on political mechanisms to reduce the influence of a specific measure when trustworthiness decreases.

Even if you haven't read Accountability Frankenstein or other entries on this blog, you have probably already sussed out my view that both "don't worry" and "toss" are poor choices in addressing messy data. All other options should be on the table, usable for different circumstances and in different ways. Least explored? The last idea, modeling trustworthiness problems as formal uncertainty. I'm going to part from measurement researchers and say that the modeling should go beyond standard errors and measurement errors, or rather head in a different direction. There is no way to use standard errors or measurement errors to address issues of trustworthiness that go beyond sampling and reliability issues, or to structure a process to balance the inherently value-laden and political issues involved here. 
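One of the formal-uncertainty options mentioned above (adjusting classifications by measurement error) can be sketched concretely. This is a hypothetical rule of my own construction, not any state's actual formula: a school or teacher gets a high-stakes label only when the score clears the cut by more than the error band.

```python
# Hypothetical sketch: classify against a cut score only when measurement
# error allows a confident call. The 1.96 band (a conventional 95% interval)
# and all numbers are my illustration, not any real accountability formula.
def classify(score, std_error, cut, z=1.96):
    """Label a score relative to a cut, treating scores whose error band
    straddles the cut as genuinely uncertain."""
    margin = z * std_error
    if score - margin > cut:
        return "above"
    if score + margin < cut:
        return "below"
    return "indeterminate"  # withhold a high-stakes label

print(classify(310, 5, 300))  # above
print(classify(288, 5, 300))  # below
print(classify(303, 5, 300))  # indeterminate
```

The design choice worth noticing: the "indeterminate" category is exactly what point estimates and firmly drawn classification lines erase, and deciding what to do with that category is a political question, not a statistical one.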

The difficulty in looking coldly at messy and mediocre data generally revolves around the human tendency to prefer impressions of confidence and certainty over uncertainty, even when a rational examination and background knowledge should lead one to recognize the problems in trusting a set of data. One side of that coin is an emphasis on point estimates and firmly-drawn classification lines. The other side is to decide that one should entirely ignore messy and mediocre data because of the flaws. Neither is an appropriate response to the problem.

* A literary reference, not an illiteracism.

August 12, 2009

Belated kudos to Broader, Bolder and to Fordham

In the whirlwind of my obligations this year, my reading has lagged, and I am late in recommending and praising two reports published in the first half of 2009:

  • The Broader, Bolder Approach's accountability report, published in late June. This report suggests combining the use of achievement test data and on-site school inspections for school-level accountability. For those who have read Accountability Frankenstein, you'll know that I agree with those ideas. This report addresses the central gap in the original Broader, Bolder manifesto, and I am delighted to have read the proposal.
  • In March, the Fordham Institute published a report recommending a scaled approach to accountability when private schools take public dollars. Their proposal is roughly that the more dependent a private school is on public funding, the more the school has to provide data and be accountable in a way similar or parallel to local public schools.

Both are thoughtful, well-reasoned brief arguments, and they move each debate in interesting directions. Whether or not you agree with the conclusions, you'll have things to think about.

Updated: Aaaaargh! Six days later, I realize I've been calling the group the Bolder, Broader Approach instead of the other way around. Dear readers: when I make a stupid error, please point it out as soon as you see it.

Proposed ground rules on teacher evaluation and test discussion

Seeing how too many writers about Race to the Top, tests, and teacher evaluation would have taken actions in the Cuban Missile Crisis that would have led to nuclear war--i.e., seeing the worst in opponents, or maybe seeing posturing as the best path forward for themselves personally or for their positions (sound like the health-care debate-cum-food-fight?)--I am hereby proposing the following ground rules/stipulations:

  1. The modal forms of teacher evaluation used in K-12 schools are not useful.
  2. Some aspect of student performance (abstracted from all measurement questions and concerns about flawed tests) should matter in teacher evaluation.
  3. At least one problem of including student performance in teacher evaluation is how to use messy and flawed data. This comes from the fact that current tests are flawed. Heck, all tests are going to be imperfect and create the dilemma that Diane Ravitch referred to this morning. But plenty of today's tests should embarrass anyone who approved their use.
  4. Yes, people who disagree with you have used inane arguments, and some of them might even have gotten some provisions through a legislature by logrolling. I know I can say the same about your putative allies. Let's call each other out on those moves, and then move on to the substantive issues. Doing more than calling people out on that at the time (i.e., holding grudges) is playing the game of "your side is dirtier than mine," and you will inevitably lose that game, especially if there's an historian in the room (and in addition to me, there's also Diane Ravitch, Larry Cuban, Maris Vinovskis, and others who can quickly point out where folks have played dirty political pool for decades, though many of us will just call it the standard operating procedure in education politics). See reference above to Cuban Missile Crisis. If Reagan could make an arms-control treaty with Gorbachev, we can all be a little more mature in our disagreements.
Anyone who has broken these ground rules or is going to break the ground rules in the near future is currently in a grace period thanks to my staying away from blogging much in the past few weeks. But if I have time in the fall, I'll write a weekly entry on who's doing the best and worst jobs of fighting fairly on this issue.

August 4, 2009

Your personal, homemade commission on tenure and test scores

Sick of finger-pointing in the absence of a New York state commission to study how to use test scores in teacher evaluation (including tenure) decisions? Look no further! In this space, we will be conducting our own homegrown commission over the next three months. No need for the New York Assembly and Senate to act! We'll do it ourselves.

What? you say. You're in Florida. Well, yes, but everyone knows that Florida is just the Southern branch of New York. My father grew up on Flatbush Avenue and graduated from Lincoln High School. He was in New York City for his residency in pediatrics (with an office in Bellevue, but that's another story). The Yankees' spring training home? Eight miles from my house. 

And if that doesn't convince you, you should know that Alexander Russo runs his blog on Chicago schools from Long Island. If he can do that, I can run a citizens' commission for New York from here (and then someone in Chicago can run something in Florida).

Apply in comments: name, role in New York education, what you'll bring to the table.

July 27, 2009

Talking turkey on "Race to the Top"

The hoopla surrounding the draft "Race to the Top" guidelines has obscured the long-game strategy involved here. If you think about the structure of the funds--more discretionary money than the U.S. Department of Education has ever had before, a competitive grant system, and a set of priorities that the Duncan department has been signaling for six months--there are two guesses I have about the broader goals:

  1. The double-shot of grants over the next year is intended to be the first of two or three shots of large amounts of discretionary money for the department.
  2. Duncan's learned about vicarious reinforcement and intends to use it here.

The obvious initial "winners" will be states such as Florida which have a number of the required elements in place and are ready to go on a few payoff projects. But there will also be a few very large states left in the cold (and without that extra funding) after these first two rounds of awards. What if California is one of those states out in the cold? Or New York? There will be local pressure from school boards and administrators on members of Congress to continue feeding money to the department until their states land at least one award.

In the long game, the fact that Race to the Top can't bail California out is not really the issue, and I disagree with Mike Klonsky's assumption that this is an attempt to starve the states into submission. While I think a number of people would have preferred a larger ARRA stimulus fund, I don't think you can claim that the Obama administration has acted at all as if it wants thousands of teachers fired. Far more likely is the ordinary political dynamics of federal programs: no one wants to be without a slice of the pie. For these reasons, if it were legal to place a bet of this kind, I'd give rather interesting odds that California loses out big in the first two swats at Race to the Top money. 

And speaking of misdirected Mikes, Mike Antonucci is wrong about the teachers union dynamics in Race to the Top. While my higher-ed local has both the AFT and NEA as affiliates, I'm generally out of the loop on national headquarters stuff, but I can see the writing on the wall: one of the unions may well push in the regulatory process to increase the leverage of state affiliates, not to eliminate the requirement on linkability of teachers to student data. The best thing that the national affiliates can do is help state affiliates' negotiating position with their own state departments of education. If two states' applications are similar, but only one has a letter of support from their state affiliate's (or affiliates') elected officers, both the NEA and AFT need the state with union support in the application to have an advantage. (There are some interesting dynamics here vis-a-vis merged state affiliates, but the larger incentive at the national level is to help all state affiliates.)

July 25, 2009

Temporizing and teasing on tests and teacher evaluation

I still don't have time to expand at length on combining qualitative and quantitative sources of data for teaching evaluation, but given the hoopla surrounding the draft Race to the Top regulations, I should at least provide an update, or rather a bit of a tease for what's developing into a short paper-to-be. In addition to my fairly general understanding of some technical issues, I'm developing the argument that any point-based system for combining professional judgment and test scores needs to avoid fixed weights for the components of the system.

The explanation is not that technical, and I can sketch it here: the benefit of a truly Bayesian approach to using test scores to evaluate teachers is a reciprocal relationship between the decision-making authority of professional judgment and the power of other data (including test scores). A forceful judgment by professionals reduces the power of test scores in such a system, while tepid judgments increase the power of test scores. That is one possible solution to the thorny question of relative weights: if educators are willing to judge their own, test scores are less important (addressing the concerns of teachers unions and many administrators), but if educators are not willing to judge their own, test scores are more important (addressing the concerns of those who criticize the very low proportion of teachers given poor evaluations). 

In a point-based system with fixed weights (or fixed percentages of the total) assigned to individual components, you don't have a structure with a reciprocal relationship between the exercise of professional judgment and the authority of test-score data. But I think the dynamic benefits of a Bayesian approach can be created in a point system, as long as the weights are not fixed. I need to think through the potential approaches, but it's possible.
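One way to make that reciprocal relationship concrete is a toy weighting rule of my own construction (not the formula in the paper-to-be): the further a professional rating sits from a noncommittal midpoint, the more weight the rating receives, and test scores absorb the remainder.

```python
# Toy sketch of a point system with non-fixed weights (my construction,
# for illustration only). A forceful professional judgment earns weight;
# a tepid one cedes weight to test-score data.
def combined_score(prof_rating, test_score, midpoint=50.0, max_weight=0.8):
    """prof_rating and test_score on a common 0-100 scale."""
    # "Forcefulness": 0 at the noncommittal midpoint, 1 at either extreme.
    forcefulness = abs(prof_rating - midpoint) / midpoint
    w_prof = max_weight * forcefulness  # weight on professional judgment
    w_test = 1.0 - w_prof               # remainder goes to test data
    return w_prof * prof_rating + w_test * test_score

# A tepid rating (55) leaves test scores dominant...
print(combined_score(prof_rating=55, test_score=80))
# ...while a forceful rating (95) largely overrides them.
print(combined_score(prof_rating=95, test_score=40))
```

Nothing here requires Bayesian machinery; the point is only that a point system can mimic the reciprocal dynamic once you let the weights move.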

There: that's the tease.

July 13, 2009

AFT QuEST presentation slides on performance pay

I am not in DC, but I do catch things online: the presentation slides for the AFT QuEST session on performance pay are available, and while Edward Tufte thinks Powerpoint is awful, a stack of straightforward, well-written slides provides a wonderful vicarious outline for those of us who Were Not There.

July 10, 2009

Those evil union supporters who denigrate objective measures...

Quick: who said the following recently?

We do see the incredible power of setting stretch goals. But if you set a goal that's really not within reach, people will just give up on it and you really don't have a goal. We've seen this over and over. I think there's as much talking down of goals around here as there is of actually saying, "You're not thinking big enough."

Oh, this evil denigrator of the value of objective goals. From the text, you might conclude that this person is a teacher union supporter who will die before wanting to break down the firewall between teacher records and student test scores.

Except that the speaker was Wendy Kopp, head of Teach for America and someone who said later in the interview that she is an advocate of using data and setting goals. But there's an important piece here about motivations and goals. No, I don't have answers for the K-12 world, but as I will continue to state until someone proves me wrong, there is something deeply wrong when an historian knows more about the relevant goals and motivation literature than most of the people who advocate setting extremely high goals in education.

Combining qualitative and quantitative evidence for teacher evaluation: What does "predominant" mean?

According to Gotham Schools, former NSVF and current USDOE official Joanne Weiss "said the Obama administration aims to reward states that use student achievement as a 'predominant' part of teacher evaluations with the extra stimulus funds" (emphasis added). I followed up with a USDOE representative, who emphasized after talking with Weiss that she meant a predominant part, not the predominant part of teacher evaluations, and that is how Walz reported the comment. The department representative added that department leaders "consider it illogical to remove student achievement from teacher evaluation, and we want states and districts to remove any existing barriers."

This came on the heels of TNTP's Widget Effect argument and Joan Baratz-Snowden's Fixing Tenure. I know that the political context of Weiss's remarks is to push the Duncan line that New York State's moratorium on the use of test scores in personnel decisions is wrong, and if necessary Weiss will bar New York from the Race to the Top funds if the legislature doesn't get its act in gear. Stand in line, please; I have a feeling a few million New Yorkers have the first dibs on dunking the entire state senate in the Hudson near Albany sometime in late November.

Back to policy, though: the word predominant perked up my ears because the Florida legislature's language has evolved from language involving the dominance of student achievement to quantification. The current language on personnel evaluation is a legacy of language first written in 1999:

The assessment must primarily use data and indicators of improvement in student performance assessed annually as specified in s. 1008.22 and may consider results of peer reviews in evaluating the employee's performance. [emphasis added]

The current performance-pay language in Florida includes the Merit Award Program, which stipulates that for the purposes of merit pay, achievement data "shall be weighted at not less than 60 percent of the overall evaluation" (F.S. 1012.225(3)(c)).

I need to think about this in some depth, but it strikes me that the Florida legislature mandated one of several options to use in combining quantitative and qualitative judgments of teacher effectiveness, the point system. You can probably come up with other variations that meet the statutory language, but my guess is that any real-world implementation would almost always be a linear combination of different subscores, and I will use incredibly technical measurement language to call it the point system of combining different sources of information about teaching effectiveness. But that's not the only one, and I am always troubled when a clunky system is chosen as the default because it is the first option rather than a deliberate decision among options. I understand why a point system is in the bureaucratic and political gravity well, and it may well be that this particular clunky point system is the best option. However, it should be considered in comparison with what other clunky systems might be appropriate.
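In code, one reading of the statutory point system amounts to nothing more than a fixed linear combination. This minimal sketch uses the 60 percent floor from the Merit Award Program language; the subscore values and names are invented for illustration.

```python
# Minimal sketch of a fixed-weight point system (the statutory 60% floor
# on achievement data; subscore values invented for illustration).
def merit_score(achievement, other_judgments, w_achievement=0.60):
    """Linear combination of subscores on a common 0-100 scale."""
    return w_achievement * achievement + (1 - w_achievement) * other_judgments

# With fixed weights, strong professional judgments can never outweigh
# the achievement component's statutory share:
print(merit_score(achievement=70, other_judgments=90))
```

The contrast with a non-fixed-weight or Bayesian scheme is exactly the rigidity on display here: no matter how forceful the professional judgment, its influence is capped at 40 percent.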

For example, there is also the holistic review of teacher effectiveness, such as exists in the new Green Dot-UFT collective bargaining agreement teacher evaluation system. There's no specific way that test scores inherently enter the judgment as such, though the implication is that teachers will have to show that they use assessment to shape instructional practices (what's called action research in the document, at the very least).

But those aren't all: a flow-chart is at least theoretically possible, though I do not have a real-life example. Yes, there are process flow-charts such as exists in Denver (and in the Green Dot system), but it's a flow-chart essentially describing when and how you schedule meetings, not how you make decisions in a meeting. (Step 1: Can you understand this chart? Yes: read the rest of it while walking to your secretary's desk; no: pretend to read it while walking to your secretary's desk. Step 2a [at secretary's desk]...)

Most theoretical: a Bayesian bump algorithm. I am guessing that there is a high probability that any subjective Bayesian statistician reading this blog will have thought of this idea already, but I'll adjust that guess after some data comes in. Since even well-trained evaluators are making subjective judgments about people, you could treat a principal's or peer's judgment as a prior judgment about the probability that a teacher should be retained/rewarded, given help, or fired. In the Bayesian world, that prior judgment can and should be shifted based on data, to form a posterior estimate of the probabilities of what should be done (you can play with a Bayesian calculator here, in a medical-test context). That adjustment is why I'm calling it a "bump" -- start with a professional assessment on various grounds and allow that to be bumped somewhat by test data, with the magnitude of the bumping depending on the data. Going down this path would involve some interesting studies, and it would probably be working with Bayesian posterior odds (which provide an interesting possible back door to a point system). This is a little out of my league in terms of specific characteristics, but the Bayesian perspective on statistics makes it possible to combine qualitative and quantitative data in a framework that already exists.
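For readers who want the arithmetic, the bump can be written in posterior-odds form. The prior and likelihood ratio below are invented numbers for illustration, not estimates from any real evaluation system.

```python
# Hypothetical illustration of the "Bayesian bump": a principal's or peer's
# judgment sets a prior probability that a teacher is effective, and test
# data shift it via a likelihood ratio (invented numbers, for illustration).
def bayesian_bump(prior, likelihood_ratio):
    """Update P(effective) using posterior odds = prior odds x likelihood
    ratio, then convert back to a probability."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)

# A confident prior judgment (0.9) is bumped only modestly by mildly
# contrary test data (likelihood ratio 0.5)...
print(round(bayesian_bump(0.9, 0.5), 3))  # 0.818
# ...while a fence-sitting prior (0.5) moves much more.
print(round(bayesian_bump(0.5, 0.5), 3))  # 0.333
```

This is the reciprocal dynamic in miniature: the more forceful the professional judgment (a prior far from 0.5), the less the same test data can move the conclusion.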

So we have four large categories of ways to combine essentially qualitative and quantitative data. While I am busy reading student work and doing other stuff in the next week, you all have a chance to dive in and describe what you think are strengths and weaknesses of each approach, as well as any additional categories (or disagreements with my classification entirely). After I have a weekend and get other tasks finished, I will return to explain (a) why a Bayesian approach is not only philosophically appropriate but serves the needs of unions, students, and anyone Alexander Russo describes as reformy; (b) why a Bayesian approach is not that different from a point system, at least in theory; and (c) what characteristics you would look for in a point system for teacher evaluation to meet the political interests described in (a).

July 8, 2009

A word to the wise on accountability

Dear fellow Americans who support equal education and are inclined to attack teachers unions when you get frustrated (e.g., Charles Barone and Citizens' Commission on Civil Rights):

  1. Borg-like rhetoric ("Those who resist the school reform movement are going to find they are on the wrong side of history. They may affect the pace of reform, but not its inexorable direction") is not likely to convince anyone that they're wrong and you're right. It's not even close to the level of Rod Paige's NEA = terrorist remark, but it's still intemperate. And I don't know about you, but the last degree I earned came with a beautiful, shiny rearview mirror, not a crystal ball.
  2. I'm persuadable that NEA staff and national leaders made some incredibly stupid/venal moves in trying to shift policy in the backrooms of power (which apparently are no longer smoke-filled), that the AFT may have made (fewer) such moves, and that locals and state affiliates of both national affiliates also make stupid/venal moves at varying rates depending on location and internal union politics. But a report that essentially treats policy concerns and backroom politics as identical? It strikes me as shoddy analysis, for several reasons. First, it's scattershot, which undermines the credibility of what probably would be stronger arguments on more narrow grounds. Second, it misunderstands the nature of organizations, assuming that unions have intentions rather than internal politics, agreed-upon positions, strategies, and tactics. Third, if you criticize both regular and backroom politics, you're implicitly committing yourself not to do much politicking on your own part.

Every few years you see a wavelet of attacks on teachers unions, and I am assuming that this is part of a new one. Sometimes it's just a coincidence, and I hope that's the case in the entries linked above... and here.

Addendum: Charles Barone takes me to task on two items; in comments I say he's right on one and wrong on the other, but you'll have to read what he writes rather than my summary.

June 30, 2009

Grading reports that grade states, which have schools that grade

It's now a PR cliche in education wonkery: grade states. Issue grades, and that's a hook for reporters to write stories about the reports, because the reporters at daily metros can say, "[Your state's name here] receives 'F' in think tank report on education." But beyond the PR value of grades, it's facile, which is why I'm surprised Education Sector gave in to this particular venal sin in its report on states' higher-ed accountability policies. C'mon folks: can't you figure out a more substantive way of evaluating states? At the very least, this is so 1990s.

So I'm thinking about developing a report over the next year that grades think-tank reports that issue grades for states on some matter of education, where of course schools have teachers who grade students. Among the standards will be the following:

Clear standards for grades: a year before the report is issued, does the entity that issues the report publish grading standards or criteria?

A - Entity publishes grading standards with sufficient criterion specificity that an outside observer would not be surprised at the grade a state receives the next year. (Note: this is a low bar, not requiring agreement with grades.)

B - Entity publishes standards, but standards are too vague to provide benchmarks for policy progress.

C - Entity has previously published reports issuing grades to states but changed the standards, or described the project and the areas where states would be graded but published no standards for those areas.

D - Entity has previously published the existence of the report project, but there is no previous publication of intent to grade states in this area of policy.

F - Report appears out of the blue with no publication of intent in this area.

Okay, folks: where does today's Education Sector report fit? How about Ed Week's annual Quality Counts phonebook? Fordham's reports that issue grades?

And, yes, if I'm serious about this, that implies I have to develop some more grading criteria. After all, it would be most interesting and ironic if I created a report that contained the mechanism by which the report itself could be torn apart. Hint, hint, ...

June 26, 2009

How to steer CYA-oriented bureaucracies, or why NCLB supporters need to think about libel law

Someone at USDOE sent me an invitation to listen to the June 14 phone conference where Arne Duncan explained how disappointed he was in Tennessee, Indiana, and other states with charter caps, let alone states such as Maine with no charter law, and how that disappointment might be reflected in the distribution (or lack of distribution) of "Race to the Top" funds (applications available in October, due in December, with the first round of funding out in February 2010). There are a few details that reporters didn't ask about (Duncan's somewhat surprising statement that a good state charter law would set some barriers for entry rather than establish a "Wild West of charter schools," and the way that small charter schools and charter schools with grade configurations outside state testing programs can stay off the radar for accountability purposes), but I was not surprised that two Tennessee reporters were called on for questions.

But apart from the selection of reporters for questions, the phone presser and other DOE moves made me think about the various uses of power in education-policy federalism. In limited ways, explicit mandates can be effective, if there is a sustained willingness within the USDOE (and esp. OCR) to make painful examples of the nastier school systems that try to evade those mandates. Offering technical assistance is another method, and despite the massive conflict-of-interest problems in Reading First, I agree with one of the researchers in the field who thinks that Reading First did improve primary-grade reading instruction, on balance. (Thumbnail version: hourslong scripts, ugh; explicit instruction in phonemic awareness and some other fluency components, obviously necessary.)


But neither heavy-handed mandates nor technical assistance can do everything, and neither taps the greatest motivation of both defensive and hubris-oriented bureaucracies: risk management. If you are a public school teacher or administrator, my guess is that you can identify some fairly silly action by your district that was motivated almost entirely by CYA motives, and if you can marry those CYA activities to pedagogy, you've been lucky or have a black belt in administrative maneuvering. (If you have such victories, please describe them in comments! Otherwise, we'll all wallow in the shared misery of observing defensive administration and the all-too-frequent ensuing train wreck.)

I think the federal government can shape bureaucratic behavior for the good by harnessing that risk management and structuring accountability policies around it. And here's the lesson I take from my high-school journalism class in ninth grade 30 years ago: libel law in the U.S. generally recognizes the truth as a positive defense against libel allegations. That seems like a backwards way to frame the legal issue -- after all, isn't it common sense that a publication is libelous only if it's false? -- but the notion of a positive defense gives an individual or organization a way to behave that is both professionally appropriate and also builds a legal defense aligned with professional expectations. Because the truth is a positive defense against libel claims, even an idiotic general counsel for a newspaper or publisher looks to the professionally appropriate standard: is there documentation that the published work is true?

Sometimes a positive defense is not explicitly part of jurisprudence but evolves as practical guidance for clinical legal work and internal advice for school systems. Observing procedural and professional niceties creates exactly that type of positive defense in special education law. There is nothing in federal special education law to carve out an explicit positive defense for school-system behavior, but many articles written by Mitchell Yell over the past few decades constitute a convincing case that school systems now have a de facto positive defense: professional documentation of decisionmaking and scrupulous adherence to procedural requirements are a positive defense against a broad range of allegations by parents of and advocates for students with disabilities.

Yell has argued (persuasively) that due-process hearing officers and judges use procedural adherence and professional documentation as a filter in special education cases. If a school district can document that it has paid attention to procedural mandates and has met professional standards for documenting decision-making, then hearing officers and judges are extremely reluctant to look at the substantive merits of those decisions. But if a school district has ignored standard procedural expectations that most districts meet, or if a school district has kept no or inadequate documentation of its decision-making rationale, then all bets are off and a hearing officer or judge will be much less likely to defer to the school district on professional judgments.

In essence, Yell implies, school districts can avoid adverse judgments if they pay attention to timelines and other procedural niceties and if they keep teachers and principals on their toes about current "best practices" as well as deadlines, notices, etc. Not all districts are aware of this positive defense, and I suspect that some enterprising special education researchers could make a mint running seminars: "How never to get sued again."

More broadly, I'm beginning to think that the construction of a positive defense against charges of incompetence would be healthy for school systems and state policies. The devil would definitely be in the details, but instead of being frustrated by a consistently observed school system behavior, maybe we should take advantage of that consistency.

June 25, 2009

See-no-knowledge in education policy?

I seem to be reading several "we don't know anything so let's plow ahead" arguments in education think-tankery, from Mike Petrilli's argument that because we don't currently have a solid research base about how to turn schools around, we shouldn't try, to Kevin Carey's consistent argument in Education Sector's blog that because there is no research consensus about predictors of good teaching (and considerable research suggesting that there is not a link between effectiveness and countable items like years of experience beyond the first few or graduate degrees), it makes better sense to let people into teaching and then evaluate their effectiveness.

Fortunately, that's not the approach of the Institute of Education Sciences under John Easton, which has just announced a large research initiative on turning around schools. I suspect that both Petrilli and Carey would acknowledge that research in difficult topics is a good thing and argue that IES initiatives are different from policy, because sometimes you have to make decisions based on the state of knowledge you have, not the ... oh, shoot, there's Donald Rumsfeld phrasing again. But you probably know what I mean: Petrilli and Carey's stances are policy stances based on topic-specific agnosticism, not opposition to research.

But there's a serious question buried here: on big questions of policy, where you have to make choices, and the research is nondirective, how do you make decisions? I think the answer has to be incrementally, to allow research to catch up and influence policy later. If you make a huge political and institutional commitment to a policy path that has no research support and no ethical/legal obligation, then you're committing millions of children and hundreds of thousands of educators to a path that is very hard to change later. 

For that reason, while I think Arne Duncan's four-choice speech earlier this week is not based on research, and Petrilli is correct that there is no particular reason to believe that charter schools will somehow rescue the education of students otherwise stuck in horrible circumstances, the policy itself is good largely because it doesn't make hard and fast commitments to a particular path. The good thing about a charter is that it can be revoked, and in states such as Florida where there is a single authorizer for a geographic area (here, the county school boards), authorizers can be reasonably aggressive in shutting down shady or incompetent operations. So I share Petrilli's skepticism, but precisely because I am skeptical of any particular approach to schools in crisis, and because Duncan is being wishy-washy, I will applaud the Secretary for being wishy-washy. 

Update: I first used the term "know-nothingism" in the title. Ugh. Bad move for an historian. Petrilli and Carey are not members of the 19th century anti-immigrant party. Mea culpa.

June 18, 2009

The world is complicated, part 752

So the Center for Research on Education Outcomes has a report on charter-school performance, the Center on Education Policy has released a report on student achievement trends, NAEP released art-education data, and the spin has begun. Missing from almost all the reporting: statements about the extent of peer review for any of these reports. I'm not too worried about the professionalism of these reports, since I know that the Department of Education always has an internal review process, CEP usually asks researchers in the area to review draft reports, and I would be surprised if CREDO did not have a pre-publication review process. However, the failure to report on the extent of peer review is a continuing and glaring omission in the reporting of education research.

In terms of the substance of the reports, I'm up to my eyeballs in prior commitments, but it's clear from the brief reading I have been able to do that the findings for all three reports are more complicated than the spin emanating from many of The Usual Suspects.* That's not news, I know, but I am the King of Things That Are Obvious Once He States Them, and I have a job to do.

* a great name for an a cappella group, if you happen to be starting one up.

June 13, 2009

On graduation rates and auditing state databases

I sympathize with Florida's Deputy Commissioner of Education Jeff Sellers, finding himself defending the state's official graduation rate the week that Education Week published its Swanson-index issue and pointed to Florida as a low-graduation state, using numbers far below the state's official numbers.

Some perspective: Florida's official graduation rate is inflated, but it's still better than Swanson's. Florida's graduation rate does more than Swanson (i.e., does anything) to adjust for student transfers and the fact that ninth-grade enrollment numbers overestimate the number of first-time ninth graders. 

Because of Florida's state-level database and the programming/routine that already exists, Florida is much closer to the new federal regulatory definition of a graduation rate than many other states, and Commissioner Eric Smith has been preparing the state board and other interested parties for the likely effect of the change on the official published rate -- i.e., that the rate will be a visible quantum lower than the currently-published rates (and largely for the reasons I have explained in the 2006 paper linked above). So in a few years we'll get a closer estimate of graduation from a lay understanding (the proportion of 9th graders who graduate 4, 5, or 6 years later).
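In code, that lay understanding reduces to a simple cohort calculation, and writing it out shows why withdrawal codes matter so much: verified transfers out leave the denominator, so a dropout miscoded as a transfer silently inflates the rate. A toy sketch of my own (the status labels are illustrative, not any state's actual schema):

```python
def cohort_graduation_rate(cohort):
    """Longitudinal rate in the lay sense: of the students who entered
    9th grade together, what share graduated within the window?
    Students verified as transferring out leave the denominator;
    everyone else (graduates, dropouts, still-enrolled) stays in it."""
    stayed = [s for s in cohort if s != 'transferred_out']
    grads = [s for s in stayed if s == 'graduated']
    return len(grads) / len(stayed)

# Six entering 9th graders: three graduate, one drops out, one is still
# enrolled, one transfers out (and leaves the denominator).
rate = cohort_graduation_rate(
    ['graduated', 'graduated', 'dropped_out', 'transferred_out',
     'still_enrolled', 'graduated'])
print(round(rate, 2))  # 0.6
```

Relabel the dropout as a transfer in that list and the rate jumps to 0.75 with no change in reality -- which is the whole argument for auditing the codes.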

The point in the St Pete Times interview where I winced was Sellers's answer to the question of how the state (and the general public) knows that the exit codes entered for a student are accurate: Sellers said that his department conducts an "audit from a data perspective."

That statement is misleading. It is technically true that there is an audit in two senses: each school district is required to check its data for accuracy before sending the data to the state's servers, and the state conducts a search of students reported as withdrawn in one county to see if they entered another county system before labeling them dropouts. But while I have seen reference to checking that the withdrawal codes are correct, I have not seen any evidence that such checks have actually occurred, and I have been unable to find that evidence anywhere on the Florida Department of Education website. That doesn't mean that it doesn't happen, but call me a touch skeptical. Without random checks, there is no guarantee that a 16-year-old coded as a transfer to another school actually was a transfer.

Given Florida's long experience with a state-managed education database, the lack of published audits of this process should caution us about the magic of state databases. They are important, but they need to be done properly. It makes sense to talk about the internal and external checks that should happen as other states construct databases and all states start to conform to the mandated longitudinal graduation rate:

  • Districts will need to be the first party to check accuracy, both in preventing mistakes/fraud and in conducting consistency checks--are there any records which claim that a 45-year-old is attending kindergarten, for example? The first is supposed to happen in Florida, and I suspect that counties catch the low-hanging fruit in terms of errors. But the accuracy check on withdrawal codes is the type of check that requires extensive follow-up to document whether a student identified as a transfer did in fact enroll in another school.
  • States will also need to conduct accuracy and consistency checks, though a state will necessarily be far less likely than school districts to catch outright fraud in claiming students transferred when they did not. 
  • States will also have to conduct the cross-checking that Florida currently performs every year and that I describe above: which students move between districts in the same state, but are counted as dropouts because a county only looks at its own students.
  • Finally, the auditing of transfer records would be MUCH easier if there were a standard way for school districts and individual schools to request the transfer of a student record and simultaneously use that authenticated request as verification that a transfer code is appropriate.

This is an incomplete list, but it's a start.
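The district-level checks can be sketched as code. This is a toy illustration with made-up field names and codes, not Florida's actual schema; the second check also gestures at the idea of treating an authenticated records request from a receiving school as verification of a transfer code.

```python
from datetime import date

def audit_record(record, as_of=date(2009, 6, 13)):
    """Flag records that fail simple internal-consistency checks --
    the low-hanging fruit a district-level audit should catch.
    Field names and codes are illustrative only."""
    errors = []
    # Consistency check: is the student's age plausible for the grade?
    age = (as_of - record['birth_date']).days // 365
    if record['grade'] == 'K' and not 4 <= age <= 7:
        errors.append(f'age {age} is implausible for kindergarten')
    # Accuracy check: a transfer code should be backed by evidence
    # that another school actually requested the student's records.
    if (record.get('withdrawal_code') == 'W-transfer'
            and not record.get('records_request_received')):
        errors.append('transfer code with no records request '
                      'from a receiving school')
    return errors

suspect = {'birth_date': date(1964, 1, 15), 'grade': 'K',
           'withdrawal_code': 'W-transfer'}
print(audit_record(suspect))  # flags both problems
```

The first check is cheap and automatable; the second is the one that, as the bullet list notes, otherwise requires extensive manual follow-up.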

June 8, 2009

No one ever accused Arne Duncan of impersonating an education researcher

Hopefully some day we can track kids from pre-school to high-school and from high school to college and college to career. Hopefully we can track good kids to good teachers and good teachers to good colleges of education.

This was an excerpt from a speech Duncan gave today to IES staff about the need to use data warehouses to link individual teachers and test scores and then use that linkage to evaluate teachers (hat tip). Oh, yes, and do it based on research. Some day, Secretary Duncan, but tying an individual teacher to student performance is not something that you can assert is based on research available today. It is more wishful thinking than anything else. The best apparent on-the-ground research of this type with teacher education is nonetheless full of caveats. And that's on a program-level scale, not on the level of the teacher. 

I'd accuse Duncan of spouting fuzzy logic, but fuzzy logic (the real stuff, research-wise, using fuzzy sets) may be one tool we use to get out of this dilemma.
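For the curious: the appeal of fuzzy sets here is that they let you say a teacher belongs to the set "effective" to degree 0.5 rather than forcing a yes/no call, and evidence from different sources can be combined with fuzzy AND/OR operators. A toy sketch of my own (the membership shapes and every number are made up for illustration):

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function: rises on [a, b], equals 1 on
    [b, c], falls on [c, d], and is 0 outside [a, d]."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    return (x - a) / (b - a) if x < b else (d - x) / (d - c)

# Degree of membership in "effective" from two evidence sources,
# each on a 0-1 scale.
from_scores = trapezoid(0.3, 0.0, 0.6, 1.0, 1.0)        # weak value-added evidence
from_observation = trapezoid(0.8, 0.2, 0.7, 1.0, 1.0)   # strong observation evidence
# min is the standard fuzzy AND: effective by BOTH kinds of evidence.
print(min(from_scores, from_observation))  # 0.5
```

The payoff is the same one the Bayesian framing offers: a formal way to combine qualitative judgment and quantitative data without pretending either is binary.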

June 1, 2009

The Procrustean bed of teacher tests

Mike Petrilli's stab at the Sonia Sotomayor nomination via the Massachusetts teacher tests is a little askew, and I'm surprised he didn't look at an obvious dilemma that's deeper than the politics of a judicial nomination. Several former teachers have sued the state (and Pearson) for what they claim is a discriminatory impact of teacher tests given the disproportionate failure rate of minority teachers. This is the employee side of impact-analysis law that most school lawyers probably know better under the graduation-exam cases in Florida and Texas.

The landmark case here is Debra P. v. Turlington, which led to a number of federal decisions that guide the use of tests that have disparate impact in schools. To wit, tests with disparate impact by protected classes are acceptable if...

  • There is a rational state purpose for imposing them (guaranteeing graduates' skills, in the Debra P. case)
  • There is sufficient notice to those affected
  • Those affected have a reasonable opportunity to learn the material on the test (the key reason for delaying graduation test applications in Florida, where federal judges did not want to hold the victims of segregation responsible for the unconstitutional behavior of schools)
  • The application of the test is professionally done (I'm bundling together several separate issues, including the composition of the test, defensible setting of cut scores, multiple opportunities to retake the tests, etc.)
  • There is no better way to meet the state's purpose that also reduces the disparate impact.

In the employment context, Petrilli is probably correct that the translation of the first item is essentially whether the test is a reasonable proxy for necessary teacher qualifications. But there is almost no way that anyone engaged in the current debate over teacher qualifications can defend these tests or defend the teachers' lawsuit without some fairly severe inconsistencies.
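One note that may help readers follow the litigation: in employment cases, disparate impact is conventionally screened with the EEOC's "four-fifths rule" from the Uniform Guidelines on Employee Selection Procedures -- a rule of thumb that triggers scrutiny, not a legal conclusion. A sketch with hypothetical numbers (not the actual Massachusetts pass rates):

```python
def four_fifths_screen(passed_a, took_a, passed_b, took_b):
    """Compare group A's pass rate to group B's. Under the EEOC
    rule of thumb, a ratio below 0.8 (four-fifths) is conventional
    evidence of adverse impact worth closer scrutiny."""
    ratio = (passed_a / took_a) / (passed_b / took_b)
    return ratio, ratio < 0.8

# Hypothetical pass rates: 54% for the protected group, 90% overall.
ratio, flagged = four_fifths_screen(54, 100, 90, 100)
print(round(ratio, 2), flagged)  # 0.6 True
```

Clearing (or failing) this screen is only the first step; the litigation then turns on the justification questions in the bullet list above.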

Consider first the folks who take the approach that we should not care who enters teaching as long as we measure student achievement and make personnel decisions as a result. Several (whom I will not name, to protect the guilty) have accused the Highly Qualified Teacher standards in NCLB of obsessing about inputs (i.e., what teachers know) in contrast to outputs (what students learn). Anyone in this camp should abhor the Massachusetts teacher tests (and all teacher tests) because they continue the "let's look at the teacher qualifications absent the kids" approach, and we should be moving away from proxies for teacher effectiveness.

But the lawyers for the teachers and their supporters are not in much better shape, logic-wise. It is going to be very difficult to knock the legs out from under the state's teacher testing program. They have to argue that the tests are a poor proxy for teacher skill, or that the tests were poorly constructed, or that there is a better option with a reduced disparate impact. If they cannot convince a judge that the tests were constructed and administered unprofessionally, the lawyers are going to be in the uncomfortable spot of arguing that the testing is an inferior proxy for judging teacher quality, in contrast to ... [The conclusion is left as an exercise for the reader.]

Summary: If you are in favor of judging teachers by student learning, then content-testing knowledge is a poor proxy by your own arguments. If you are against the content-based testing, then you have to come up with a better standard that will hold up in court. No, I don't think there's a way out of this for anyone with skin in the game, but if there is no summary dismissal and no evidence of rank incompetence in test construction, the fireworks will be interesting to watch.

Texas, South Carolina, Missouri, and Alaska

I know that the reports of the common-standards agreement shepherded by the Council of Chief State School Officers and the National Governors Association describe a few different reasons why four states have not joined in a standards framework that is probably going to be about as close to a less-is-more approach as one can get in a bureaucratic standards document. Yes, I know Texas has just drafted standards (as has Florida, which is joining), that Missouri is searching for a new state superintendent (my guess is others are as well), that South Carolina has Mark Sanford (which is enough for any state to deal with), and that we haven't heard from Alaska. But here are my imaginary real reasons why these states have opted out (thus far):

  • Other states refused to agree that everyone in the country would have to pronounce Harry Truman's state as mizZURah.
  • Texas would have to admit that bidness is not a word.
  • South Carolina did not get its way that there would be history standards with the required benchmark, "All six-year-olds will understand that each state is required to have at least one completely nutty elected official at all times, and this is a heritage of the Founders." 
  • There was a riot, not when Alaska insisted that NAEP math exams all use the Iditarod as an example of measure, rate, and general all-round toughness (other states just wanted to add their own events), but instead fisticuffs broke out when the Alaska rep. insisted that the current accepted size of the Earth was incorrect because if it was as large as most people thought, then you couldn't see Russia from your house.

Unfortunately, I suspect that the truth is far less entertaining. That's okay. We still have Joe Biden and George Will to mangle the facts in an interesting way.

Addendum: Lest anyone think I am making fun of other states, I should be very clear: I grew up in California in the 1970s, and I now live in Florida. That's enough ridiculous states to live in for a lifetime!

May 12, 2009

Should artists know something about money?

It's cringing time for this union activist: "Teaching is an art, not a business," wrote Hans, commenting this evening on a story about a judicial mandate prohibiting a UTLA one-day strike this Friday. That statement is irrelevant in the specific context (teacher layoffs), is a false dichotomy, and is wrong-headed in other ways. Let's start with the literal claim that art is incompatible with business. The daughter of a friend and colleague went to SMU on a dance scholarship. She was smart, and after a minor injury she decided to get some business training; she is now an administrator in an art-related New York nonprofit. Artists and nonprofits need people who are passionate about art and can also manage money (ask members of the Florida Orchestra, which I hear is surviving in this economy because its new executive director is very competent).

Or to take another example, there's a wonderful segment of Stuart Math's documentary on desegregation in Shaker Heights, Ohio, where one of the old-time activists describes a post-WW2 meeting of residents who were trying to figure out how to create a stable housing market, and a business owner said, "You know, we can be liberal and effective, too."  And they were, running a neighbor-managed real-estate outfit that was crucial in maintaining a stable, desegregated, prosperous community.

So much for the claim that art can't be business and warm-hearted liberals can't think in terms of getting stuff done. But the whole premise is wrong; I don't think teaching is an art. You can make a good argument that teaching is a craft, but there has to be solid practice at the bottom of it. In addition, anyone who is skeptical of the value of high-stakes testing, as I am, has to have something that's just a tad, a teeny, a tiny bit more astute than a statement that screams, "Just let me do what I want when I'm paid with the public purse." That's nuts, both philosophically and politically.

May 11, 2009

"Governance reform" is not reform

While New York rages over mayoral control, which is all the rage, schools in Pinellas County are headed towards The New Site Based Management, which was the rage in the late 1980s and early 1990s and which Bill Ouchi hopes will be the rage again.

While there are plenty of ways that governance can affect the classroom, I am consistently underwhelmed by the argument that governance reform improves what happens in the classroom. I've seen it all before.

May 5, 2009

Florida could still jump forward on end-of-course exams

The St. Pete Times is reporting that the death of the Florida House bill mandating end-of-course (EOC) exams in high school, starting in science, is the death of end-of-course exams, at least for this year. I'm not so sure. If I remember correctly, the legislature authorized EOC exams in principle last year, and there is an alternative funding mechanism: stimulus dollars. Embedded in the stimulus bill is section 14006, which is part of the $5 billion discretionary amount given the U.S. Department of Education. The state's application for state stabilization funds probably satisfies the nominal requirement for Florida to be eligible for a state incentive fund, if the state asks for incentive funds to develop EOC exams. This is precisely the type of project that the state incentive fund is designed for: it would replace the single comprehensive test with a number of tests tied to specific courses. And instead of upsetting science teachers whose subjects were not included in the first round (the filed House bill excluded physics and the earth sciences), the state could develop a full range of EOC exams in science. Seems like an obvious "yes, we'll do that" to me.

I could be wrong; there may be legitimate reasons not to apply for state incentive funds to develop EOC exams. What surprises me is that during the legislative session there was no public discussion, at least none I am aware of, about the possibility of using federal stimulus dollars to develop EOC exams, even though it has been an obvious possibility, at least to me. Has any reporter asked Commissioner Eric Smith about this? Is there any legislator or legislative aide who has asked about it?

April 6, 2009

One teacher's response to Ron Matus's article

There's been lots of coverage of Ron Matus's March 29 story on firing teachers in Florida, but there's been no follow-up online about the letters to the editor that were printed April 4 (last Saturday), and at this point, I can't even find the letters on the Times website. But I think one needs to be highlighted, because it's from a teacher and makes a few important points:

The premise in the article [by Ron Matus] is that tenure makes it too hard to fire bad teachers, yet the few examples given don't demonstrate that, but rather, simply show inaction on the part of school districts.

If the writer had found districts attempting, but failing, to fire bad teachers, he might have a point. I see this drive to get rid of tenure as an effort to instill fear in teachers and keep them silent. Teachers living in fear for their jobs can't afford to speak out.

Getting rid of tenure (read: due process) might make it easier to dismiss the rare teacher who shouldn't be in the profession. It would also make it easier to dismiss the good teachers--even the great ones, because the great ones are the ones who stand up and advocate for their students, themselves and their profession, and in doing so sometimes step on toes...

John Perry, Tampa

I've known John Perry for a number of years; he's an activist in the Hillsborough Classroom Teachers Association, but I don't think he was when we met. I think Perry's wrong about the order of magnitude of "the rare teacher who shouldn't be in the profession" (emphasis added), but since a good portion of teachers leave the field within a few years, I don't think that there's a shortage of ways to discourage teachers from continuing.

More broadly speaking, I think the more sophisticated critics of teachers and their unions understand that administrators are the ones who fail to fire teachers. But Perry's other point is important: while K-12 teachers do not have academic freedom in the same sense that higher-ed faculty do, K-12 teachers are the ones I often hear a certain style of reformer praise for precisely the type of dissent that would be in danger without due process.

So let me phrase the question in the following way: does anyone want administrators to be able to fire teachers summarily after teachers do the following?

  • Refuse to change a grade to let an athlete play.
  • Complain that the new math textbook series is confusing to new teachers and likely to lead to poor teaching.
  • Sign and date a request that a child be evaluated for eligibility for special education services.
  • Complain when girls have fewer opportunities than boys.*

As far as I am aware, the only case above for which K-12 teachers are clearly protected when they speak out is the last one, and that's because of a Supreme Court decision stemming from Title IX; I suspect that they are likely to be protected if they push for assessment to gain services for a child, but I don't know of anything as clear-cut as a Supreme Court decision. And I don't see people who are in favor of "tenure reform" rushing to replace workplace due process with greater whistleblower protections.

April 1, 2009

Sharpton paid off? Please tell me this is an April Fool's joke

The New York Daily News is reporting this morning that former NYC Schools Chancellor Harold Levy is involved in a $500,000 set of payoff donations to the Rev. Al Sharpton's organization, with payments beginning shortly after Sharpton and Joel Klein launched the Education Equality Project in June 2008. With friends like Levy,...

In other news, I am hereby announcing my support for the public flogging of teachers whose students' test scores decrease from year to year, my hope that NYC invests an additional $1 billion in the ARIS system, my trust in the market to determine the true worth of schools within a voucherized environment, and my death last Thursday from reading Michel Foucault. In lieu of flowers, my family is asking that donations be made in my name to the John Birch Society, except for my son, who would appreciate iTunes cash cards instead.

Okay, it looks like the DN story is serious. Yikes. That'll take the wind out of the Education Equality Project (EEP) conference starting today. Then again, maybe "eep!" is the reaction of participants and fans of the Klein-Sharpton effort.

March 30, 2009

Seattle will be drier

I spent some time this weekend finishing the first complete draft of a talk I'm giving in Seattle on Thursday. I'm going to be heading there while a few thousand historians are leaving Seattle after the end of the Organization of American Historians meeting. I'm either expecting to find a time machine or I am heading there for a different meeting (Council for Exceptional Children). Last time I was in Seattle, it was wetter and colder than what's forecasted for the middle of this week. We had a drenching rain in Tampa this morning, so things will even out in my personal experience this week, even if not for the world.

I hope my neighbors weren't paying close attention while I was timing the draft. I don't read papers word-for-word, but I wanted to get a sense of how far I'm off on time, so I read it aloud while alternating between the laundry room and the kitchen.

Oh, the topic? Accountability and students with disabilities. I think I know how I'm ending the hour, but the cliffhanger before the third set of commercials is the tough part right now, and I haven't yet decided if Jason's going to live. If he does, I'm going to have to tear up the last act and start fresh. I've given a spoiler, haven't I?

More seriously, this talk is giving me the opportunity and prod to think through some connections between areas of education politics that I mentally put on "percolate": the democratic rationale for public education, tensions between public and private purposes of schooling, and what technocratic mechanisms may be useful for (and in what circumstances). When I get back, I have to think about potential outlets and how to get a potential coauthor to give up enough time to participate (and the value involved in that). 

The only serious performance question I have is the extent of corny jokes and how far I can/should push them.

  • An RTI Tier 2 intervention plan and a Writ of Mandamus walk into a bar...
  • Peter Singer dies and finds himself at the Pearly Gates facing St. Peter: "So your most important goal right now is to avoid pain?" St. Peter begins...
  • How many IEP team members does it take to screw in a lightbulb?...
  • A rabbi, a minister, and a psychometrist are in a rowboat in the middle of the lake...

Maybe not those jokes.

March 17, 2009

Longitudinal data systems, good; unique teacher linkage, bad

Diane Ravitch's blog entry this morning seriously disparages the value of longitudinal data systems, including the linking of teachers to students, and John Thompson's entry discusses the abuse of data by administrators. Essentially, both Ravitch and Thompson fear the brain-dead or conscious abuse of data to judge teachers out of context. That's also the reason why NYSUT (the New York state joint NEA-AFT affiliate) worked hard to convince the legislature to put a moratorium on using test scores to make tenure decisions; Joel Klein was moving very quickly, and I think UFT and NYSUT had good reason to believe that without the moratorium, there would be substantial abuses of test data in NYC (and elsewhere) in tenure decisions. 

My take: longitudinal data systems are a good thing, but linking teachers to students is a much more fragile undertaking.

Florida has a longitudinal data system that began in the early 1990s and has been used for 10 years to judge schools based on test data. Approximately ten years ago, I sat in a windowless room in Tallahassee as a Florida DOE staff member discussed the new A-plus system and a variety of technical decisions tied to it, for which he had brought stakeholders and a few yahoos from around the state to give advice. I was one of the unpaid yahoos who had the great joy of flying in tiny airplanes several hundred miles a few times a year to give advice on the matters.

We had so many matters to discuss that one minor conversation was almost overlooked: a state mandate that required that the FDOE link each student to a teacher primarily responsible for reading and math. One state official showed us a draft form and then explained the concerns he had about it: in his view, the state that had tried that a few years earlier (Tennessee) had multiple conceptual difficulties connecting individual teachers to individual students. But they had run roughshod over those concerns, and he anticipated that Florida would do the same.

It wasn't a matter of letting teachers off the hook (this now-retired professional staffer is what I think of as an accountability hawk) but logic and sense. How many physics and chemistry teachers help students understand algebra better? How many history teachers help students with writing or reading? For students receiving special education services in a pull-out system, do you want only the special educator to be responsible for a subject, or do you want both the general-ed classroom teacher and the special educator to have responsibility? This spring, my wife (a math major and special educator) is tutoring a local child in math on weekends or evenings; so who should get credit for how he performed on testing in the last week, his teachers in school or my wife? Today, you can add NCLB supplemental educational services (or after-school tutoring) to the mix. 

The larger point: even if you decide to wave away the concerns of Richard Rothstein and others, even if you focus entirely on what happens in academic environments, it is fallacious to link every student performance with a single teacher. If we are providing the appropriate supports for children, then the students with the lowest performance are the ones for whom such unique linkage assumptions are the least justifiable, because they may be receiving academic support from general education classroom teachers, from special educators, from after-school tutors, and maybe mentors or other providers in neighborhood support organizations (such as Geoffrey Canada's). Today, I do not think one can parcel out responsibility without making assumptions that have no basis in empirical research. Those who support individual teacher linkage have the burden to demonstrate otherwise.

March 12, 2009

Joel Klein as DM

John Thompson's blog entry today, God Does Not Play Dice, is in response to Charles Barone's Ed Sector report on value-added or growth models used for high-stakes accountability. (It's on my to-read list along with the IES/Mathematica study on teacher ed programs and various other things.) Thompson describes a number of caveats and then says,

...none of my objections would be major if the model was used for purposes of diagnosis, science, or a "consumers' report." We should pursue social science fearlessly, but we must not play dice with the lives of teachers by evaluating them with some theoretical work in progress.

That plays off Einstein's quip, "God does not play dice," in reference to quantum mechanics. That comment always made me think that if God does not play dice, maybe God forces you to pick up the dice and roll.

And that gave me the image of Joel Klein as Dungeonmaster.

A troll has just entered your classroom. He has a mace, a strength of 11, and 16 hit points.

After the Cafeteria Blob you threw at us, I only have 4 hit points, and I lost my Spitball Blocking spell.

Fight or run away?

Better fight; if I run away, I lose the Memo Spindle.

Better hope you're lucky. You need to roll a 17 to block the mace, 20 to break it.

But you're only giving me a D12!!

This is New York. You're tough enough. Roll.

March 10, 2009

Get Accountability Frankenstein for $10!!

Information Age Publishing is having a ten-year anniversary sale where you can get 10 or more books from their catalog for $10 each. Their authors, editors, and series editors include Gene Glass, Ernie House, Erwin Johanningmeier, Terry Richardson, Tom Popkewitz, Kathy Borman, Kenneth Wong, Jaekyung Lee, Maurice Berube, V.P. Franklin, Carol Camp Yeakey, and many others.

March 2, 2009

Take a breath (if you don't have asthma) and go on

I don't have asthma, but as my head cold morphs into the ordinary misery of seasonal allergies, I realize it's a darned nuisance not to be able to breathe comfortably. With luck I'll shortly be back to normal (or at least for what passes as normal for me), and in times like these, it pays to take a deep breath on receipt of almost any news and criticism. Evidently, my perspective lies somewhere between former Hill staffer and new DFER policy guru Charles Barone and NYC union activist Norm Scott, because I'm getting dished on by both. I'm not going to use the lazy journalist's excuse, "Because both sides are criticizing me, I must be right," in part because I'm not a journalist, in part because it's easily possible to be wrong about multiple things at once, and in part because while I disagree with Barone's and Scott's posts, they (generally) have the guts to say where they disagree with me. Oh, yeah, and they spell my name right. That counts for a lot with me.

Barone criticizes me (and others) for writing too much from an adult's perspective. I've written about that topic before (at length in Accountability Frankenstein and in more digestible chunks in One-Blog Schoolhouse), so let me provide a somewhat different gloss here: I could easily turn my blog over to several guest writers, my children and their friends. I suspect Barone's response to their criticisms of high-stakes testing would be, "Well, I know a little more about the world and your own best interest than you do." That statement would be absolutely right (at least in the first half) and an absolutely adult perspective.

(Incidentally, I agree with his substantive point in his entry that teacher happiness is not the point of either education policy or teacher education. I don't think that you can usually have effective teaching with completely miserable teachers, but I suspect or at least hope Barone would agree with me, and there's plenty of ground between avoiding total misery for teachers and seeing their euphoria as the primary goal of policy.)

Scott criticizes me (and others) for ignoring the fact that Arne Duncan was flawed as head of the Chicago Public Schools. Er, no. I'm fairly sure I'd have disagreed with him on a number of his decisions in the same way that I am fairly confident on where I'll disagree with him on federal education policy. But that open expectation of some disagreement does not mean the Obama administration is evil. Scott asks, "Exactly how much 'context' do these people need?" I'd say 20 years of Republican presidencies divided by 8 years of Bill Clinton. In comparison with Bill Clinton on the whole, Obama is good. And in contrast to the others, he's very, very good. That doesn't mean that I'm going to stay quiet when I think the administration is doing something wrong. It means I do have some perspective. Breathe, folks, breathe. For those who are worried about Arne Duncan, I think you'd do much better to put your energies into worrying about Timothy Geithner instead.

February 25, 2009

On exaggerations in the service of bitterness

Today, Charles Barone indulged in some recriminations about the use of test data to evaluate teachers: "In fact, in many states there is tremendous pressure to pass legislation which assures a firewall-like separation between teachers and student performance. Such laws have already passed in California, New York, and Wisconsin; ..."

But let's examine that claim with regard to New York, about which others such as Kevin Carey and Jennifer Jennings wrote last April. The language:

3012b. Minimum Standards for Tenure Determinations for Teachers.

(a) A superintendent of schools or district superintendent of schools, prior to recommending tenure for a teacher, shall evaluate all relevant factors, including the teacher's effectiveness over the applicable probationary period, or over three years in the case of a regular substitute with a one-year probationary period, in contributing to the successful academic performance of his or her students. When evaluating a teacher for tenure, each school district and board of cooperative educational services shall utilize a process that complies with subdivision (b) of this section.

(b) The process for evaluation of a teacher for tenure shall be consistent with article 14 of the Civil Service Law and shall include a combination of the following minimum standards:

(1) evaluation of the extent to which the teacher successfully utilized analysis of available student performance data (for example: State test results, student work, school-developed assessments, teacher-developed assessments, etc.) and other relevant information (for example: documented health or nutrition concerns, or other student characteristics affecting learning) when providing instruction but the teacher shall not be granted or denied tenure based on student performance data;

(2) peer review by other teachers, as far as practicable; and

(3) an assessment of the teacher's performance by the teacher's building principal or other building administrator in charge of the school or program, which shall consider all the annual professional performance review criteria set forth in section 100.2(o)(2)(iii)(b)(1) of the Regulations of the Commissioner.

The part added last spring is the closing clause of paragraph (1), barring a grant or denial of tenure "based on student performance data"; the rest remains, including clear references to student performance. How are we supposed to read the combination of "the extent to which the teacher successfully utilized analysis of available student performance data... when providing instruction" together with the ban on granting or denying tenure "based on student performance data"? I'm not a lawyer, but obviously there has to be data for one to judge teachers on how well they use the data. My reading (which I think is plausible) is that one couldn't make a blanket decision based only on test scores, but one could grant or deny tenure based on how well a teacher used the data in adjusting instruction. This latter is pretty close to the best-world scenario of Response to Intervention (RTI) policy, which has a lot of research at least in core areas in elementary schools. In comments on Barone's entry, I wrote,

I think we may be reading the same legal language with very different lenses. To me, the tenure-qualifications language in NY state essentially conforms with RTI -- teachers have to show that they can use data. Those upset with the added language for this year -- which bars a brain-dead statistical formula -- must think it would be as appropriate and also easier to define effectiveness with test scores as what is currently allowed/required by law. Me? I don't think there's anything that's easy here to implement in a fair way, and there ain't yet no Holy Grail. I also suspect that there is no provision in NY law that prohibits the type of analysis of teacher education that Louisiana has been building for the last 5-7 years. Either I'm reading your definition of a firewall too broadly, or I'm misreading NY law.

Here is Barone's response, word-for-word (note especially his comment on how rarely I use the words student or child):

It seems to depend on how you define "brain dead." The data can't be used, thoughtfully or otherwise, to inform tenure decisions. Whether there is a holy grail, or it hasn't been found, remains to be seen. But surely everyone agrees that poor and minority kids are getting the short end of the stick, and data available now can and should be used to help level the playing field for kids while we adults have our fun little debates. I notice you rarely use the word student or child, unless you are quoting me. I think we need to err on the side of the kids for a while even if it makes adults uncomfortable. If we wait for there to be a consensus among academics, today's kindergartners will be collecting Social Security before anything is done. If then.

The "bitterness" in the title of this entry refers to this response. I'm disappointed by Barone's avoidance of the substantive topic by applying a rhetorical litmus test (how often I mention children in my blog), as well as the politician's logic here (something must be done; this is something; so we must do it). But let me get to the point: Barone is misreading the law. Data can be used to inform tenure decisions, and in fact, they must be, because the law requires that part of the tenure decision depend on teacher use of data. No data, no use of data -- no tenure. It may not be Barone's picture of how data informs a personnel decision, but Barone's claim is just plain wrong.

Addendum: In comments, Barone argues that the New York state law is clear and bars use of test data for making tenure decisions. Here's the way to decide it:

1) Does New York law prohibit a district from denying tenure because a teacher refuses to implement Response to Intervention practices?

2) Is Response to Intervention something based on student performance data?

If the answers are "no" and "yes," respectively, I'm right. Any other combination, and Barone is right. Let's try another scenario:

Main office conference room, where the assistant principal is meeting with a new teacher. "Let's look at your students' last quizzes and talk about where they learned the material well, and where you might want to reteach."

The teacher holds up his hand. "Wait a minute. Am I going to be judged based on what I say in this meeting?"

The assistant principal nods her head. "Part of how I judge your effectiveness is how you respond to student needs. C'mon. Let's just look at the quizzes."

"No way. State law forbids the use of student performance data in tenure decisions. I'm talking with my union rep!"

If Barone is right in the global sense, this conversation could really happen. But I don't think it could (or has). When Barone claimed that New York had put a "firewall" between teachers and performance data, I know he was thinking in the narrow sense of "if students perform poorly on standardized tests, then we should be able to deny tenure." But regardless of whether that is a good or bad policy, that's not the only way one can connect teachers and student performance. Expecting teachers to look at student performance and change instruction based on data is a second way, and New York does not bar it. Looking at teacher education and student performance is a third way, and New York does not bar it. Which of those three is good policy is an interesting and debatable question, but what is not debatable is that all three connect teachers to data.

February 20, 2009

Technology and assessment

Education Sector's new report Beyond the Bubble is shorter than I had expected, so I finished it while watching the end of my son's tae kwon do class last night. It looks to be a decent summary of the optimistic side of technology-and-assessment literature. Its tone is, "Yes, we can dramatically change and improve assessment with technology that is either just about to come online or that deserves some investment." And I think that for some things, that's absolutely right: an online/computerized science exam could have color images of tissue slides, videos of animal behavior, and so forth. But, while author Bill Tucker bowed his head in the direction of friendly technoskeptic Larry Cuban, there are some flies in the ointment:

  • Students with disabilities. This is true for pencil-and-paper tests as well, but when all you have is black ink, there are a few issues you don't have to worry about that on-screen designers do: red-green color blindness, epilepsy and screen movement, etc. The half-page on universal design is good, and any CFP will need to specify (and budget for) disability/accessibility awareness.
  • Code creep. I don't mean internet safety but the fact that programming languages grow up and die. We've gone from Perl to Python, from HTML to XML, and languages and interfaces will continue to evolve. I wonder how many of the cases pointed to in the report are essentially one-off projects that will die at some point because the platform no longer exists. (Any readers remember Infocom's text games?)
  • Holy Grail syndrome, also known as a belief in "the leap in cognitive science that will allow perfect, automatic scoring of essays is just around the corner." Same with the great and brilliant analysis of hundreds of microstate data that a single student can generate in a simulation environment. I trust colleagues who work in cognitive psychology to do some great things in the next decade, but this seems a bit utopian. Okay, more than a bit.

All of this doesn't say we shouldn't be engaged in using technology, but maybe we should work along two tracks: encourage the fast, frequent, and flexible for now and also invest in the medium- and long-term projects.

There is something that the paper never addresses: intellectual-property rights. Part of the imprisonment of assessment in an oligopoly is the ownership of assessment materials, backed up by the fear of security problems. (Here's reality for you: the day after a state test is given, assume NO security for that test. None. Despite all the laws. Just give that idea up, folks, unless you believe in the tooth fairy, have never heard of BitTorrent, and don't think college students ever cheat.) I am curious what various folks' positions are on open-source assessment. I am not entirely sure what it would consist of, or how it would meet adequate technical standards, but it's tough to argue that despite the testing industry's oligopoly status, a brand-new investment will suddenly erase both the proprietary rights of the major firms and the start-up threshold for the creation of commercially-viable products.

February 6, 2009

Klein compares Bloomberg to Putin

No, he didn't, but at the mayoral-control hearing in Albany, according to the indefatigable Elizabeth Green,

Klein defended himself passionately, arguing that mayoral control is a democratic governance structure, not an authoritarian one, as some members painted it.

The logic here is weak: under that view, a plebiscite dictatorship is democratic because every few years the head honcho could be kicked out of office. 

I think there are multiple reasonable approaches to the policy question, such as UFT's "you need two (more) righteous people to save Gotham" proposal of giving the mayor a plurality on the main policymaking body (so the mayor and chancellor would have to convince 2 out of the other 8 members) or something that would give an independent body subpoena authority and the responsibility and right to issue reports on the schools.

But the gist is to inject public accountability beyond the one-person constituency of Joel Klein. I'm a little curious why advocates of mayoral control don't grasp the fundamental irony that you don't create accountability by removing it. There are multiple ways of addressing the messiness of urban politics, but if the appointed chancellor has spent several years ignoring parents, he's getting his natural comeuppance today.

UTLA and "benchmark" or "periodic" testing

Last week, the United Teachers of Los Angeles called for the cessation of every-few-months testing in the district. The response of the district: such testing is an important tool in improving student achievement, which they know because schools with such testing have had annual-test scores higher than schools without such testing.

The flaw in the district's reasoning is left as an exercise for the reader, because I'm more concerned at the moment about what this debate shows about our attitudes towards assessment. UTLA is wrong to attack frequent testing on principle, though I think they may have a good point about this type of assessment. Such periodic assessments may help schools target assistance to students, or they may serve primarily to mimic the state test and encourage teaching to the test (the predictive success of which principals would know by results on the quarterly assessments). Without knowing more about the details, you can't say which is which, and both phenomena are possible (including in the same school).

What concerns me is the direction in which the machinery of testing is taking formative evaluation. There's a lot of research to suggest that when used to guide instruction, frequent assessment can dramatically change results. There are a number of technical questions about so-called formative assessment (or progress monitoring) that are the domains of researchers in the area: how to create material sufficiently related to key skills or the curriculum, how to create assessments where score movement is both meaningful and sensitive to change, how to gauge appropriate change, how to structure the feedback given to teachers, and so forth. My reading of the literature (which is not complete) is that the most powerful uses of formative assessment require very frequent, very short assessments--on the order of once or twice a week, and about the same length as your typical elementary-school spelling test (i.e., a few minutes at most). 

So what do we see as the evolving, bureaucratic version of formative assessment? Long tests taken every few months. That's better than once a year in terms of frequency, but it's still a blunt instrument and absorbs a large chunk of time. The reason for this preference is obvious: a large, unwieldy school system can organize systematic evaluation/feedback around quarterly tests. That's doable. But organizing around something that's taken weekly and would often require data entry (e.g., a one-minute fluency score for first- and second-graders)? That's a different kettle of fish.

That doesn't mean it's impossible. It's easy, if you're a principal who's willing to devote the right resources. Consider reading fluency, for example. (I'm not saying that fluency is more important than comprehension. I just have the experience with this to imagine what I'd do as a principal.) Teach a paraprofessional to have every first- and second-grade student in the school read to them one minute a week on a sample reading passage (there are sets of roughly equivalent passages one can purchase for this purpose). Have them enter the data through a Google Docs form, a SurveyMonkey survey, or some other tool that will send the data to a spreadsheet. Get someone to program the results so that you can show data per child with trend lines and sort by grade, classroom, etc. For a few extra lines of code, you could add locally-weighted regression trends to be really fancy, but that's beside the point.
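To make the weekly-trends idea concrete, here is a minimal sketch of the back-end arithmetic, with hypothetical students and scores (the names, numbers, and the plain least-squares trend standing in for fancier locally-weighted regression are all illustrative, not a recommendation of any particular tool):

```python
# A sketch of per-child trend reporting from weekly one-minute fluency
# scores. In practice the rows would come from the spreadsheet that the
# Google Docs form or SurveyMonkey survey feeds.
from collections import defaultdict

# Each row mimics one spreadsheet entry from the weekly form:
# (student, week number, words correct per minute) -- hypothetical data.
rows = [
    ("Ana", 1, 38), ("Ana", 2, 41), ("Ana", 3, 45), ("Ana", 4, 47),
    ("Ben", 1, 22), ("Ben", 2, 21), ("Ben", 3, 24), ("Ben", 4, 23),
]

def slope(points):
    """Least-squares slope: average gain in words/minute per week."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in points)
    den = sum((x - mean_x) ** 2 for x, _ in points)
    return num / den

# Group the flat spreadsheet rows by student.
scores = defaultdict(list)
for student, week, wcpm in rows:
    scores[student].append((week, wcpm))

for student, points in sorted(scores.items()):
    print(f"{student}: trend {slope(points):+.1f} words/min per week")
```

A dozen lines like these, pointed at the live spreadsheet, would give a principal the per-child trend lines described above; the sorting by grade or classroom is a matter of adding one more column to the form.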

Here's the point: this is not rocket science, this does not require a gazillion-dollar software package from TestPublisher Inc., and it's very different from the type of quarterly testing that superintendents are buying into in a big way (including that gazillion-dollar software package from TestPublisher Inc.). It's very different from the quarterly testing that UTLA is protesting.

So, Ramon Cortines, here's my challenge: can you document that the quarterly-testing regime is better than the weekly-quiz-plus-trends proposal I've outlined above? The second can fit easily into the routines of any school. The second can start conversations EVERY WEEK at a school. The second is MUCH cheaper. It's also less sexy: no giant software packages manipulable from the front office, no instantly-printable pastel-colored graphs that demonstrate what kids were able to do on a test six weeks ago. You'd definitely give up the flashy for the mundane. But prove to me that the flashy is better than the mundane.

February 5, 2009

What personality is your Performance-Pay Attitude? (and other mixed metaphors)

Since other bloggers I read have used various quizzes to spice up their entries, or maybe do something online while they're waiting for a bus, here is the all-purpose Performance-Pay Personality Quiz. Oh, wait: "personality" isn't quite appropriate here. But to mix metaphors, what personality is YOUR attitude towards performance pay?

  1. Do you think that there is ever a justification for some teachers' being paid more than others?
    • 1 point -- A paycheck is performance pay: either pay people a good wage for doing their job, or fire them for not doing it.
    • 4 points -- Some differential pay is required to encourage teachers to take hard-to-staff jobs (either by subject or school), and that's more important than merit pay.
    • 7 points -- On balance, performance pay would be a good thing, but it's not the most important thing to change in schools.
    • 10 points -- Performance pay or bust: I'll throw everything else out the window to get it!
  2. What's the most important motivation for teachers and administrators?
    • 1 point -- They love children; that's their only motivation.
    • 2 points -- Personal integrity is a more powerful motivator than salary. Teachers need salaries, but if you can show teachers how to feel better about the job they're doing (including showing them how to do a better job), you can move mountains.
    • 3 points -- Money's an important part of the picture. It's not the only thing, and seeing money as the only motivational tool would be foolish public policy, but to ignore it would be wrong.
    • 4 points -- There's nothing like money to get people's attention, and teachers are people.
  3. How important is it for education policy to encourage educators to work together?
    • 1 -- Teachers are not islands: rewarding individuals will kill the type of mentoring and sharing that's essential for professional development. Doubt me? Go ask stock-market traders who entered their career recently whether individual rewards encouraged their elders to mentor them... or spend every second on the floor trying to make a buck.
    • 2 -- Cooperation is crucial. It's not everything, since all teachers have strengths and weaknesses, and we don't want a school full of Stepford Teachers, but I worry that too much emphasis on individual recognition will discourage teachers from talking to each other, and from any chance that teachers will hold each other accountable.
    • 3 -- Teachers' talking in a lounge is like little kids' hugging each other. Often it's wonderful, but you sometimes worry what they're sharing. Individual recognition is pretty important to give credibility to the better and more professional teachers.
    • 4 -- Teachers go it alone anyway: recognizing their achievement as individuals is unlikely to harm the type of substantive collaboration that happens rarely.
  4. What is the right balance between judging teachers based on the professional judgment of peers and using student performance?
    • 1 -- Peer judgment: they're the ones who know what good teaching looks like, and what we care about is whether teachers are teaching well.
    • 2 -- Er... wouldn't peers be interested in what students are learning? Student performance should be part of the mix, as one springboard for evaluation. But peer judgment should be central.
    • 3 -- Student performance should anchor qualitative judgments of teaching. Yes, peers can judge teachers, but student performance should be central.
    • 4 -- Skip the peers. What matters is whether students are learning.
  5. How ready is the technology of testing to use in judging individual teacher and school performance?
    • 1 -- When the solid historical record of more than a century shows that people have abused tests in every decade, we should assume that tests will be misused, and it's the burden of high-stakes testing advocates to show otherwise.
    • 2 -- Tests are useful, but we're far from being sure that tests tell us what most politicians think they tell us.
    • 3 -- They're imperfect, but we need to start using test scores to judge effectiveness now because we can't wait for tests to be perfect to look at performance.
    • 4 -- They're just fine, and they have been for years.
  6. What role should collective bargaining play in education reform?
    • 1 -- Collective bargaining is crucial to protecting due process and teacher rights, and if possible to block stupid reforms.
    • 2 -- Collective bargaining is crucial to protecting due process and teacher rights, and unions can play an important part of reform.
    • 3 -- Collective bargaining is primarily an obstacle to important reform. Where unions will accept reforms, great. Where they won't, federal and state governments have powerful incentives to change the balance of power at the local level.
    • 4 -- Federal and state governments should do their best to break unions, because they do nothing good. Break them, circumvent them, discredit them with their bargaining units.
  7. What should be the ceiling in terms of paying for performance (both the total amount of money and how many teachers should be eligible)?
    • 1 -- Arguments in favor of performance pay are a cover for not wanting to pay teachers more. Those who work with children are generally underpaid, and while performance pay looks like it's in "the children's interest," in reality it's another way of being cheap.
    • 2 -- Part of my skepticism about performance pay is the assumption that only 10-25% of teachers should receive it. To these brilliant people, I ask: "Okay, suppose there's performance pay and every student meets whatever is your definition of proficiency by 2014. Does that mean you'd be willing to double teacher pay for that result, or is this an education-reform shell game?"
    • 3 -- Part of my acceptance of performance pay is looking at the numbers: there are lots of students, and it's almost impossible to staff every classroom with a brilliant and greatly-skilled teacher. So let's pay the great ones the best. "In a perfect world we'd double teacher pay" is another way of saying "never."
    • 4 -- Competition is the best way to motivate individuals, and you're going to get little competition if everyone can earn a bonus. Limit performance pay to the top slice of teachers.

Psychometrics-free labels to share with frenemies and colleagues:

7-11: You are Alfie Kohn. You'd really like the testing industry to suffer an ignominious death, and anyone who thinks that using tests will improve schooling is smoking something fairly powerful.

12-16: You are Reg Weaver. You are publicly skeptical of merit pay, you think most designed systems are going to be disasters, but you're also going to hold your nose and support teachers who decide it's in their best interests.

17-23: You are Randi Weingarten. You know that the American public is used to people making more money if they do a better job, but you're skeptical of most performance-pay plans in operation today. You think collective bargaining is the best way to moderate the more idiotic ideas surrounding teacher pay and to protect the legitimate interests of teachers and communities.

24-28: You are Thomas Toch. You're well aware of the flaws of testing and accountability systems, but you think moving in the direction of performance pay is essential, and you will trust that the system can be improved incrementally once it's started in the right direction.

29-34: You are Michelle Rhee. The day that teachers have a starkly uneven pay scale, the day that school districts fire a fifth of their teachers, and the day that unions are decertified around the country will be the day you will not only take up that Newsweek broom again but dance with it a la Fred Astaire. 

(Don't like the questions? Fine: make up your own completely unscientific spoof of internet quizzes!)

January 13, 2009

Oversight boondoggle

Last week the Wall Street Journal lambasted Florida Governor Charlie Crist for failing to appeal a ruling that struck down the Florida Schools of Excellence Commission as an unconstitutional infringement on the powers of county school boards in Florida. The legislature wanted to set up the FSEC as a second authorizer of charter schools in case county boards were unfair and refused to let enough charter schools open. This bewildered me because Florida has no statutory cap and there are a few hundred charter schools in the state.

This afternoon, I remembered a blog entry written by St. Pete Times reporters in December: the FSEC has been spending the people's money like it was water, racking up almost half a million dollars in expenses over two fiscal years without authorizing a single charter school that has yet opened its doors. 

Isn't the Wall Street Journal supposed to have a conservative fiscal philosophy?

January 12, 2009

Deantidisestablishmentarianism in education policy rhetoric

Joel Klein and Al Sharpton wrote an open letter to Barack Obama and Arne Duncan that appeared this morning in the Wall Street Journal. And I have just a few questions about this:

  • How can the sitting chancellor and a long-time civil-rights activist claim to be railing against "the entrenched education establishment" when you could reasonably conclude that they are The Establishment?
  • Why do they think that placing a column in the WSJ establishes their anti-establishment street cred? That newspaper isn't exactly an underground pamphlet.
  • Isn't Klein the type of guy who already has Arne Duncan's cell number? They're fellow urban superintendents, they've talked at meetings, and you assume he could call Duncan up at any time, and probably get Obama's number as well. So why do they need this open letter--do they feel this deep psychological need to pose as Village Voice rebels with a cause?

Klein and Sharpton are setting up a straw-man opponent. In my masters class in the fall, one of my students argued that accountability is well-entrenched as part of the public-school policy script. Whether you want to use Tyack and Cuban's "grammar of schooling" or Mary Metz's "real school" language, I think there's a case to be made that anyone who claims that accountability is "new" is in denial and as punishment should have to watch three or four consecutive playings of an inane 1980s adolescent-rebellion film.

So someone who is less establishment than Joel Klein would be... anyone? Anyone?

Second thought: For a few years, I've had the suspicion that the public "letter to the next president" was a bit precious (in the pejorative sense). The collections of letters to the president published after the end of an administration are usually drawn from the sample of correspondence from ordinary Americans that the White House staff select for a president to read as a reality check. Even if Klein gets some credit in my book for having a salary far less than what either New York financiers or university presidents are commonly receiving these days, in no way could one call Joel Klein or Al Sharpton "ordinary Americans."

So if Joel Klein gets to write a "letter to the next president," though we all know he could call Obama up with ideas about either antitrust policy (his Clinton-era gig) or education policy (his current gig), then the gloves are off. I'm writing a letter, too! And you know from my loving hardass manifesto that I intend to bring some style to it. So here's the rule for 2009, for all of you: Staid pretentious public letters to the new president are out. Your job is to write the most outlandish letters that tell the truth. Come on: it's going to be the Obama era. You can say it.

One more update: Apparently Margaret Spellings doesn't have Arne Duncan's cell number, either! Or at least she's pretending not to. Isn't it so nice of major papers to devote part of their ever-shrinking news hole to long classified ads from major policy honchos who can't navigate their cell-phone menus? Though I think the following would have been free on Craigslist: "Arne: call me. Margaret." What? The Post may have been joking? Oh, yeah, and that's a good use of newsprint...

December 17, 2008

Okay, it's Arne Duncan. Back to the substance already, willya?

The following is one of those trick questions you should never answer: Was Arne Duncan appointed because he's a cipher/Rorschach test for those with an axe to grind in national education politics, or is he an appointee primarily because of his personal and political connections? In between other tasks, I've been reading the comments flying past at half the speed of light, and after the most sensible and well-grounded supporting piece I've seen yet (disclosure: I'm a sometimes contributor to the blog), I've been reminded of Stephen Carter's response when asked if he ever benefited from affirmative action: so what?

So what if he's a policy cipher? He won't be making decisions by himself, and if anyone has a bully pulpit on education, it's going to be Duncan's boss. What matters is the collective decision-making, including the debate over the hard decisions to be taken with NCLB. 

So what if his appointment is far more closely tied to networking than many of the other Cabinet appointees? He'll now be in a far more public and less insulated role than as aide to Paul Vallas or the CPS head serving at the pleasure of Richard Daley. He'll rise or fall on his own merits, at this point.

As I wrote six weeks ago, let's move on to some discussion that is less personality-based.

November 23, 2008

When the news hole shrinks, any mention is a blessing... well, sort of

Adam Emerson used to be the Tampa Tribune's higher-ed reporter. As the Tribune's owner Media General has been laying off reporters and editors left and right over the past year, assignments have shifted, and Emerson now has the K-12 education beat. So when he called me up with the news, it was also to ask about Florida's graduation rate. Basic story: in the last week, the Florida Department of Education released its annual data on graduation. They published two sets of statistics, both including and excluding GEDs from the number of students in each cohort receiving a diploma. They did not publish the alternate rate that they will have to start publishing in a few years, where the students who drop out to take GEDs will still be part of the cohort schools are responsible for. Some progress in transparency is still progress, and as I told Emerson, Florida's education commissioner is smoothly preparing both his board and the public for when the official graduation rate drops because of the change in definition. I suspect he may also be giving signals to the superintendents around the state that they'll no longer be able to hide problems with the dropout-to-adult-GED path or with GEDs.
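The mechanics behind that coming drop are simple arithmetic. Here's a minimal sketch with entirely hypothetical numbers (nothing below is Florida data): once students who leave for a GED stay in the cohort, the denominator grows and the published rate falls, even though nothing about the students changed.

```python
# Hypothetical cohort, purely to illustrate the change in definition.
cohort = 1000        # students entering 9th grade together
diplomas = 720       # standard diplomas four years later
ged_leavers = 80     # students who left school to pursue an adult GED

# Current practice: GED-bound leavers drop out of the cohort,
# so the rate looks higher.
rate_leavers_removed = diplomas / (cohort - ged_leavers)   # ~0.783

# Forthcoming definition: schools stay responsible for them.
rate_leavers_retained = diplomas / cohort                  # 0.72
```

Same diplomas, same students; only the accounting changes, which is exactly why a commissioner would want to prepare the board and the public before the number moves.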

We talked about this and other topics in a longish phone call, and as I usually do, I wished him well on the story, especially on getting enough space for it. Well, Emerson's story is now published, and in a 130-word story, my name is in there three times. He's a good reporter, and any gap between the published story and the first paragraph above is entirely a matter of the space he had to tell the story. I like seeing my name in print as much as the next yahoo, but yeow, that's a rapidly-shrinking news hole.

November 15, 2008

NCLB music

Bill Wraga, at work a mild-mannered U. of Georgia faculty member, has recently uploaded the latest NCLB/ed reform song I've come across. Some others:

This is certainly not the first time that education issues have been set to song. Doggerel is a longstanding tradition among students around the world, and sometimes it's a ritual. (One of the traditions at Bryn Mawr College is the three evenings in the year when most of the undergraduates gather and sing a bunch of songs about campus life, with lyrics in both English and Greek.) Tom Paxton is a wonderful songwriter, but his song is not his best. I'm hoping to find a set of lyrics for ed reform that has a bit of whimsy, is set to "I'll Fly Away," or is written by students.

October 31, 2008

Happy Halloween, and now read my book!

Charles Barone chose Halloween to point to my proposal for post-NCLB federal accountability policy. For the record, despite what the picture on my website implies, I really look like the hunk of handsomeness that's at the top of Barone's entry (well, on the right side of the picture). I appreciate the link and hope folks will leave a comment on Barone's entry. (Commenting here won't count.)

Federal influence

Mike Petrilli asks one right question: where can the federal government influence behavior, and what are the tradeoffs? I'm especially delighted that the research in question is about desegregation. As I've written before, the argument against top-down reform by David Tyack and Larry Cuban is smart, sensible, detailed, and fits with an enormous amount of historiography... but it doesn't address desegregation. I'm not headed entirely towards Nudge territory, though I much enjoyed the book, and part of the reason is that there is a role for top-down policy imposition. We just have to be very careful about how that power is used.

NCLB regs and graduation rates

A few quick ones this morning, while my brain warms up... So the new NCLB regulations are out. (Or, rather, they were out a few days ago, but I've been putting out fires while in the midst of a cold, and this was a lower priority.) Atlanta Journal-Constitution reporter Laura Diamond asked on Wednesday, Will NCLB changes improve grad rates? The obvious answer is yes and no: yes, the measures mandated by the federal government will be much better than the goat-rodeo world of dropout measures that currently exists, but, no, better measures will not move the world in themselves. After almost two decades of looking at attainment and dropout-prevention and -remediation programs, I am no longer surprised when people look to vocational education, personal counseling, and (these days) credit-recovery programs as solutions to dropping out. They may all be good on a small-scale basis with some students, but I worry when people reinvent the wheel and think they're hot stuff.

September 12, 2008

Shared responsibilities III: The next ESEA

Over the summer, Charles Barone challenged me to put up or shut up on NCLB/ESEA. I immediately said that was fair; Accountability Frankenstein had a last chapter that was general, not specific to federal law. I'm stuck in an airport lounge waiting for a late flight, so I have an occasion to write this now. Because I'm on battery power, I'm going to focus on the test-based accountability provisions rather than other items such as the high-quality teaching provisions. Let me identify what I find valuable in No Child Left Behind:

  • Disaggregation of data
  • Public reporting
I think most people who don't have their egos invested in NCLB recognize that its Rube Goldberg proficiency definition has no serious intellectual merit and has been a practical nightmare. Yet there is a policy dynamic that observers in the peanut gallery like me can recognize: states game any system, and that gaming undermines the credibility of states with those inside the Beltway. So there's a solid justification for a continued regulatory regime if it is sane and recognizable as such by most parents and teachers (i.e., the connotation of "loving hardass" that I meant in a prior post and that some readers have recognized). I'll have to write another entry on why I think David Figlio is wrong and why teachers are not magisters economici, but incentives just don't appear to be doing that much. An appropriate regulatory regime has to make it easier to be a good educator than a bad one, make it easier for states to support good instruction than to game the system, and be reasonably flexible when the specific regulatory mechanisms clearly need adjusting.

So where do we go from here? I don't think trying to tinker with the proficiency formula makes sense: none of the alternatives look like they'll be that much more rational. What needs more focus is what happens when the data suggest that things are going wrong in a school or system. On that, I think the research community is clear: no one has a damned clue what to do. There are a few turnaround miracles, but these are outliers, and billions of dollars are now being spent on turnaround intervention with scant research support. To be honest, I don't care what screening mechanism is used as long as (a) the screening mechanism is used in that way and in that way only: to screen for further investigation/intervention; (b) the screening mechanism has a reasonable shot of identifying a set of schools that a state really does have the capacity to help change things -- if 0 schools are identified, that's a problem, but it's also a problem if 75% of schools are identified for a "go shoot the principal today" intervention; (c) we put more effort and money into changing instruction than into weighing the pig or putting lipstick on it. Never mind that I'm vegetarian; this is a metaphor, folks.

So, to the mechanisms:

  • A "you pick your own damned tool" approach to assessment: States are required to assess students in at least core academic content areas in a rigorous, research-supported manner and use those assessments as screening mechanisms for intervention in schools or districts. Those assessments must be disaggregated publicly, disaggregation must figure somehow into the screening decisions, and state plans must meet a basic sniff test on results: if fewer than 5-10% of schools are identified as needing further investigation, or more than 50%, there's something obviously wrong with the state plan, and it has to be changed. The feds don't mandate whether proficiency or scale scores are used; as far as the feds are concerned, it's a state decision whether to use growth. But a state plan HAS to disaggregate data, that disaggregation HAS to count, and the results HAVE to meet the basic sniff test.
  • A separate filter on top of the basic one to identify serious inequalities in education. I've suggested using the grand-jury process as a way for even the wealthiest suburban district to be held to account if they're screwing around with racial/ethnic minorities, English language learners, or students with disabilities. I suspect that there are others, but I think a bottom line here is the following: independence of makeup, independent investigatory powers (as far as I'm aware, in all states grand juries have subpoena power), and public reporting.
  • Each state has to have a follow-up process when a school is screened into investigation either by the basic tool noted above or through the separate filter on inequality. That follow-up process must address both curriculum content and instructional techniques and have a statewide technical support process. At the same time, the federal government needs to engage in a large set of research to figure out what works in intervention. We have no clue, dear reader, and most "turnaround consultants" are the educational equivalents of snake-oil peddlers. That shames all of us.
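The sniff test in the first bullet is concrete enough to state as code. A minimal sketch, with the 5% and 50% thresholds taken from the bullet above (the function name and default values are my invention, not any regulatory language):

```python
def plan_passes_sniff_test(n_flagged, n_schools, low=0.05, high=0.50):
    """A state plan fails the sniff test if it flags almost no schools
    for further investigation, or flags most of them."""
    share = n_flagged / n_schools
    return low <= share <= high
```

So a state flagging 2 of its 1,000 schools fails (nothing is ever wrong?), as does one flagging 800 (the tool is broken); flagging 150 passes. The point of putting it this bluntly is that the check requires no psychometrics at all, only a willingness to say that implausible results void the plan.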
The gist here is that we stop worrying about perfecting testing and statistical mechanisms as long as they are viewed properly as screening devices. Despite the reasoned criticisms of threshold criteria (e.g., proficiency), the problem is not that they exist but that these mostly jerry-built devices are relied upon for the types of judgments that make many of us wince and that the results fail the common-sense sniff test. As long as the federal government tries to legislate a Rube Goldberg mechanism, it will have little legitimacy, and states will continue to be able to wiggle away from responsibilities when they're not doing stupid things to schools. (Yes, both can happen at the same time.) Much wiser is to shift responsibility onto states for making the types of political decisions that this involves, as long as the results look and smell reasonable.

Doing so will also allow the federal government to focus on what it's largely ignored for years: no one knows how to improve all schools in trouble (and here I mean the organizational remedies -- there's plenty of research on good instruction). Instead of pretending that we do and enforcing remedies with little basis in research, maybe we should leave that as an open, practical question and... uh... do some research?

September 9, 2008

Cold permutations

First, to provide a minor update on this morning's news items:

  • Semi-success on the reserving-time front. I had a lunch meeting and then a 3 pm meeting, and the time in between was too short to do much, so I exchanged one parking sticker for another. Whee. At least my wonderful grad student assisting with the journal did a monster job helping on a long MS, giving my head-cold-affected mind a much easier job going through the next article. I WILL climb on top of this mountain of work. Just not today.
  • It's a semi-full-blown cold now. Proof: I should be asleep, and I'm exhausted, but I can't sleep.


I've been trying to wrap my mind around permutation tests and exchangeability for about a week, and I figure that my typical head-cold mentality may be my best shot at it, both because of the orthogonal way I think way too late on a head-cold evening and because, once I'm up this late and in this state, no student or MS author wants me making decisions right now. (For the record, I'm on antihistamines. I know, I know: Never take Benadryl and grade. No. That's not funny, not even in my state of mind.)

A few weeks ago, I was pondering the NYC achievement gap controversy, a debate over the summer that among other things spawned a Teachers College Record commentary by Jennifer Jennings and me (available just to subscribers for now, but to the world in a few weeks). And while the limits on TCR commentaries and op-eds require a fairly narrow argument, I kept thinking about trends and time series data as I looked at the New York City Department of Education's claims. I kept thinking to myself, There has to be something an historian can contribute to this debate that is specific to the way historians think. I'll probably write something at length when I'm more coherent and have some time, but there was an obvious answer that came to mind: to historians, the order of events matters. An argument about causality depends on contingency, which depends on a sequence. (Historians often focus on contingency rather than causality, except when we're playing the counterfactual game. The obvious answer to the question, "What caused Gore's defeat in 2000?" is "everything, or almost everything.") The sequence doesn't prove causality (or contingency), but it's necessary.

That logic is usually not applied in policy. In the case of New York City, as is typical in this type of reform publicity, someone points to a time series of data and claims, "Aha! See this trend? Ignore its tentative nature: it's PROOF that we're on the right track." One obvious problem with the NYC data is the reliance on threshold-passing percentages; that's the focus of the TCR commentary. But the NYC Department of Education made claims about the achievement gap more broadly, and the data is a lot messier than the folks in Tweed would admit. Below are three permutations of the "z-scores" of achievement gaps (the differences in Black-White means on the 4th-grade state math tests, scaled to the population's standard deviation). One is the real time series that runs between 2002 and 2008. The other two are permutations. Before you look for the data (it's on p. 13 of the PDF file linked above), see if you can tell the differences among them, and which is the observed order:

Series A: 0.74, 0.79, 0.73, 0.67, 0.72, 0.67, 0.71
Series B: 0.79, 0.67, 0.72, 0.67, 0.71, 0.74, 0.73
Series C: 0.79, 0.72, 0.71, 0.74, 0.73, 0.67, 0.67

My professional judgment as an historian is also common sense: if the order of events does not make a discernible difference, even if you ignore measurement error and standard errors, then it's hard to conclude that there's a trend. How to test that is the realm of statistics, and when I explained the issue to my colleagues Jeffrey Kromrey and John Ferron, the answer from them was clear: permutation tests. That's a general family of nonparametric inference tests that is the formal version of the question I asked: if you jumble up the data in all the possible ways they could be permuted, and if you look at a particular measure of interest (a test statistic), where in the distribution of all permutations does the observed data set fall? In the case of the 4th-grade Black-White gap on New York state math tests measured as a z-score, we have 7 data points, which have 7! = 5040 permutations. If you compute an appropriate test statistic for each permutation and the observed time series falls within about 125 places of either end of that distribution, it lies outside the middle 95% of permutations.
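At n = 7, you don't even need cleverness: brute-force enumeration of all 5040 orderings is instant. Here's a minimal sketch; the observed ordering (the first seven values listed above) and the choice of an OLS slope on the time index as the test statistic are illustrative assumptions on my part, not anything from the city's report:

```python
from itertools import permutations

# Illustrative: treat the first series above as the observed ordering.
gaps = (0.74, 0.79, 0.73, 0.67, 0.72, 0.67, 0.71)

def slope(ys):
    """OLS slope of ys regressed on the time index 0..n-1."""
    n = len(ys)
    xbar = (n - 1) / 2
    ybar = sum(ys) / n
    num = sum((i - xbar) * (y - ybar) for i, y in enumerate(ys))
    den = sum((i - xbar) ** 2 for i in range(n))
    return num / den

observed = slope(gaps)

# 7! = 5040 orderings: small enough to enumerate the null exactly.
null_dist = [slope(p) for p in permutations(gaps)]

# Two-sided p-value: share of orderings with a slope at least as
# extreme as the observed one.
p_value = sum(abs(s) >= abs(observed) for s in null_dist) / len(null_dist)
```

Because 0.67 repeats, `itertools.permutations` yields some duplicate orderings; that's harmless here, since duplicates just weight the null distribution the way ties should. Choosing the slope as the statistic is exactly the "art as well as science" part: a different statistic asks a different question of the same jumbled data.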

No, I haven't had the time or inclination to follow up, learn how to calculate one of the possible test statistics, and figure out how to get the R statistics program to run a permutation test. There are two problems, as I've learned from my colleagues: choosing the right test statistic is a matter of art as well as science, and there may be a problem with exchangeability. As far as I understand it, exchangeability is a less restrictive assumption than the standard "independent, identically distributed" (i.i.d.) sampling assumption in parametric inferential statistics. From what I understand, the practical definition of exchangeability means roughly that you could theoretically swap any of the data points without changing the joint distribution. Again, if I understand correctly, one situation that violates the assumption of exchangeability is autocorrelated data, i.e., when one data point influences the next one (or the next few). And if there's anything that's likely to be autocorrelated, it's a time series. That's not a serious problem if you're just looking to see whether a trend exists at all; for that purpose, autocorrelation is a form of trend (though an artifactual one). But if you're trying to make causal inferences or anything more complicated when there's autocorrelation (e.g., testing whether achievement levels or trend slopes differ before and after a policy change), I think you have to throw permutation tests out the window.
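A quick way to see why time series threaten exchangeability is to look at lag-1 autocorrelation, i.e., how strongly each point predicts the next. A minimal sketch (the two toy series are my own, purely for illustration):

```python
def lag1_autocorr(xs):
    """Lag-1 autocorrelation: covariance of adjacent points over the
    series variance. Positive values mean each point resembles the next."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i + 1] - mean) for i in range(n - 1))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

# A steady trend is strongly autocorrelated even though no point
# "causes" the next: that's the artifactual trend mentioned above.
trend = [1, 2, 3, 4, 5, 6, 7]
noise = [1, -1, 1, -1, 1, -1, 1]

trend_ac = lag1_autocorr(trend)   # positive
noise_ac = lag1_autocorr(noise)   # negative
```

The trend comes out clearly positive and the alternating series clearly negative, though with only seven points any such estimate is itself noisy, which is part of why jumbling the order of an autocorrelated series changes its joint distribution and breaks the exchangeability assumption.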

And that's such a shame, because the concept is still right when extended beyond the question of a trend: if a policy makes a difference, then it should make a difference on which side of the policy change you're sitting. So if you're a clever person with statistics, please provide some ideas in comments for where to go with this or if, as I suspect, the best we can do with permutation tests is ruling out possible trends/autocorrelation.

September 8, 2008

Monday bits

I didn't have time this weekend to write a lengthy, thoughtful post, or even a lengthy and thoughtless piece, so you get bits this morning.

  • Reserving Mondays: I've shut off my e-mail for now to get some editing tasks done, and I'll see if I can reserve Mondays for selfish purposes for the entire semester. Wish me luck on this one!
  • Honesty: the Palm Beach Post's editorial board approves a draft change in calculating graduation rates in Florida. Kudos to Florida's commissioner of education, Eric Smith, for pushing this. (Disclosure: I've given a few ideas to the state department of ed on options for how to handle graduation in 5, 6 years, etc.)
  • Sunday morning grading: I got out to a coffeehouse early yesterday to read my first batch of undergraduate papers. Several brought smiles to my face with great writing, provocative ideas, or both. That's a good sign for the semester.
  • Fetishized vs. nonfetishized curricula: I wonder how the history of the Core Knowledge Foundation would have been different if E.D. Hirsch had thought to frame the issue not just as accumulating tiny bits of knowledge (how Herbartian of him!) and instead had framed it as a matter of both a knowledge base in different disciplines and the heuristic frameworks of those disciplines.
  • I know I have at least a below-the-radar version of a head cold because I've had moments of earache in the last day, I had less energy over the weekend than I normally do, and I was sure last night that a mashup of Timothy Burke's guide to historical arguments and Atlas Games's Once Upon a Time would make a great introduction to historiography.

September 1, 2008

Shared responsibilities for children II: The loving hardass manifesto

Back in June, I briefly noted the potential political dynamics of the dueling manifestoes associated with the Broader, Bolder Approach to Education and the Education Equality Project, apologized for overplaying that analysis, and wrote an entry to talk broadly about shared responsibilities and education as part of the state. I've promised but have not followed through on my own manifesto, and it's now long past time for that. So, without further ado...


The Loving Hardass Manifesto*

I'm going to cut the shared-responsibility issue in a way that doesn't avoid the hard problems. Essentially, wherever your work touches children's lives, you're responsible for busting your butt without ruining your health or life. Unlike the Education Equality Project manifesto, I do not think that teachers are all-powerful or all-responsible. They're very important and responsible, but not for everything. Unlike the Broader, Bolder Approach, I do not think we can avoid central questions about accountability within school by reference to the other legitimate needs of children outside of schools. Yes, children have lives outside school, but it's acceptable to focus on what happens inside schools for the things schools are responsible for. And unlike Barack Obama, I am not going to say that both statements are right. Both statements are partially right. And while I know and admire several people who have signed one or the other statement, I will not sign either one, because both are flawed.

Let me start with the Project crowd. If you're a politician or administrator and believe that everything you've done is perfect, with no regrets, and all the evidence points in your favor, I hope you brought enough to share, because whatever you're smoking, I want to try it. Using only the high-quality evidence that is in your favor (and here I mean David Figlio-quality evidence), you can make a claim that high-stakes accountability leads to modest improvement in outcomes. But that's about it.

If you're a civil-rights activist and think that the best way to improve schools is to lambaste teachers and their representatives, I have a year for you: 1968. And a book: Tyack and Cuban's Tinkering toward Utopia. I have plenty more to suggest, but I figure that's enough.

But I'm also disappointed by the Broader, Bolder Approach. Everything it says about putting education in the context of broader government programs for children is correct. And yet, if its purpose is to get us to think in a different way about accountability and NCLB, it underwhelms. There's something odd about a statement on school accountability that has precisely one paragraph suggesting vague ways to change how accountability should work within schools.

Let's think about some basic facts: most kids come to school with families they go home to at night. If the children and their teachers are lucky, their families will only have the ordinary neuroses that God or Woody Allen placed there. If the children are unlucky, they'll also deal with poverty, disability, abuse, negligence, or having Paris Hilton as a distant relative. If you're a teacher, you can gripe about the families, but it's probably best not to, for a few reasons:

  • Your complaining to peers will not improve the parenting of anyone.
  • We've heard it before, and it wasn't convincing the last time, either.
  • If you complain about the parents, you will be depriving your students of their internationally-recognized right to be the first to complain to a therapist about how they were brought up. Really: it's in the UN Charter, under "Psychotherapy as an Adolescent," right above the bit about iPods and PlayStations. Go look it up if you doubt me.

I just lied. You may not have caught this, but the 1959 Declaration of the Rights of the Child does not mention the right to criticize parents in therapy or the right to consumer electronics. There isn't a single mention of either Apple or Microsoft, a shameful omission which Bill Gates is working hard to remedy. But until then, children only have the recognized right to things such as health care, food, shelter, the care of parents or other responsible adults, freedom from discrimination, and education.

I don't know if you've noticed this, but as a society we're not doing so well on fulfilling these rights. 600 million Chinese citizens use cell phones, and in a country that is far wealthier, we've still got millions of children without health care. It used to be that American parents would shame their kids into eating everything at dinner by pointing out that children around the world were starving. That makes you wonder what Chinese parents tell their children to shame them. Maybe they say, "Take your vaccination and stop crying: Kids are getting sick in America!"

Since the dueling manifestoes appeared in June, I've been scratching my head. The broader, bolder approach is fine as a statement of broad social policy but it doesn't work in terms of day-to-day accountability. You are responsible for the people who are in your life. When my children have been sick, and I've taken them to their doctors, I've never once been asked, "How are they doing in math?" and then had a doctor refuse to treat my child because they're not yet evaluating double integrals. They treat the kid in front of them the best they can. My father was a pediatrician and allergist who treated both wealthy families from one side of town and working-class families from another part of town. He never complained about the families from one side or the other. He just treated them.

But that doesn't mean my father had absolute responsibility, either. He was expected to be a professional, to keep up with the literature, and to follow standards of medical practice. But there has never been a "Health Care Equality Project" whose primary activities were to take pot-shots at doctors, call them "interests who seek to preserve a failed system," and want to pay doctors by a handful of measures of the health of their patients. My father was never paid by how much his patients weighed that year, or by how many tissues they used because of colds. We already have accounting-driven health care, and I don't know of any doctors or patients who think it's a good idea.

We also don't have ridiculous fads in medicine. Well, we do, but it's generally called the X diet (for various string values of X), or "alternative medicine," for those who think that if you dilute some processed duck liver by 30 or 40 orders of magnitude, your body will react in any way other than, "I'm sorry if you paid for that sugar pill instead of your mortgage, but the best I can do right now is a placebo effect. I hope you like it." In education, we have far more fads. If we had as many fads in medicine as we do in education, people would think that wearing uniforms made you thinner.
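The dilution joke rests on simple arithmetic: a mole of active ingredient contains roughly 6×10^23 molecules, so diluting by 30 orders of magnitude leaves, in expectation, far less than one molecule per dose. A back-of-the-envelope sketch (the starting quantity of one mole is my own assumption for illustration):

```python
# Back-of-the-envelope arithmetic: start from one mole of "active
# ingredient" and dilute by 30 and then 40 orders of magnitude.
AVOGADRO = 6.022e23  # molecules per mole

def molecules_left(orders_of_magnitude, moles=1.0):
    """Expected molecules remaining after a 10**n-fold dilution."""
    return moles * AVOGADRO / 10 ** orders_of_magnitude

print(molecules_left(30))  # ~6e-07 -- far less than one molecule
print(molecules_left(40))  # ~6e-17 -- effectively certain to be zero
```

In other words, at those dilutions you are statistically guaranteed to be swallowing nothing but the solvent, which is the placebo-effect punchline above.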

So there is something about the dueling manifestoes that just does not seem real to me. It's not that I am immune to their appeal. I want there to be equal education. And I've already written in many places that schooling needs to be thought of in the context of all the state structures that touch kids' lives. But it's still not resonating with me. My generation of the family takes care of these issues collaboratively. My oldest brother has been a lawyer, lobbyist, and think-tank staff member on health-care policy, which takes care of one right. I teach and write about education. The rest of the immediate family's a bunch of layabouts who do nothing other than have jobs and take care of their families, but Stan and I, we're holding our own on this caring-for-children thing, and if your family isn't, don't blame us. We are the Broader, Bolder Approach. But we're both going on diets soon, so that will change.

Back to the central point about responsibility. The hard task that both manifestoes avoid is defining what we really should expect from schools. I don't know: maybe "bust your butts" isn't something people say in polite company. And it's even harder to define in practice. But since the people who signed the Education Equality Project say they're in favor of holding people accountable, here's my charge: go define what "bust your butts" means in ways that are realistic, or fold your tent. I suggest you start by talking with teachers and parents, not among yourselves. This is just one (loving hardass) reader's response, but I know you can do it, or I wouldn't insist on it.

And for the Broader, Bolder crowd, you know you can do better. As a group, you include a bunch of incredibly well-read, smart researchers. And you're right on putting schooling in a broader context. But you just fell down on the accountability part. That one short paragraph on accountability? Please reread it. Really. You think that was the best you could do? You KNOW what you'd say to a grad student who had that fluff in a dissertation. Revise and resubmit, because I know you can get this up to your usual standards.

And the rest of you in the peanut gallery? Don't think that we can rest on our laurels, either. The folks I'm criticizing at least had the energy and guts to put pen to paper. What have you done to define "bust your butts"?

And, yes, this means that I need to look back at the last chapter of Accountability Frankenstein and see if it needs to be sharper. A commenter some months ago said it was not specific to NCLB, and that's a fair enough point. I wanted the book to be about accountability in general, but if I really know my stuff, I should be able to apply it in specific situations. Want a specific list of changes that should happen with the next reauthorization of ESEA? Coming up this fall...

* While I was drafting this in bits and pieces, I pondered whether to use the term hardass, but since Bob Sutton has written the book The No Asshole Rule and Harry Frankfurt's On Bullshit won a book award, I don't think I'm going that far out on a limb. A loving hardass knows that holding people to standards can be in their best interest. So for everyone who signed one of the manifestoes and thinks I'm nuts here: you're wrong. And in two years, you'll thank me for this.

August 27, 2008

Two interviews to read today

A few shout-outs while I'm still juggling a few hundred tasks the first week of classes:

I can now bury my head in my own details, knowing that the education blogule is going strong without me.

August 5, 2008

Two brief comments

I promised not to comment on anything during my two-week break, but the NewTalk NCLBfest made me wonder who's missing from this debate. Your observations in the comments are most welcome.

Also, I think I may have alienated my family forever by going against their advice and buying a Sony Reader. Even my technophile son thinks I'm nuts. But the EPAA MS authors will probably appreciate my carrying their stuff with me to various short-reading opportunities.

August 1, 2008

A higher-ed unionist's view of the performance-pay debate

Corey Bunje Bower criticized a Newsweek column by Jonathan Alter and responded as follows to Alter's slur against teacher unions:

Perhaps the most ridiculous thing that Alter writes -- and the statement that gives away the ideological underpinnings of his argument if anybody wasn't already aware -- is that unions "still believe that protecting incompetents is more important than educating children." Unions are far from perfect, and this is far from the most inflammatory rhetoric that I've read about them, but it's still sheer and utter nonsense.... Though more polite, it's the intellectual equivalent of calling somebody with whom you disagree a [N]azi or a terrorist.

If I were a union leader, however, I would mull over Alter's final point.... the general idea that unions could view submitting their members to more scrutiny in exchange for higher pay is something on which both sides might find some common ground.

I suppose I qualify as a union leader, albeit in higher ed, so I'll take the bait. Disclosure: my faculty union was the one to propose merit pay at the table many years ago, and university faculty are more likely to approve of something called merit pay because there is a tradition of peer review for tenure/promotion. (Our collective bargaining agreement provides for general due process and substantive standards but leaves specific procedures for annual reviews to department votes.) So while I am skeptical of several top-down proposals for (and policies encouraging) performance pay in K-12, that skepticism comes from seeing problems in practice rather than from visceral opposition to merit pay. As the car ads say, your mileage may vary.


There are two policy issues here: one is how to think about teacher pay and working conditions in general, and the other is the question of collective bargaining at the local level (and the centralization/local question more generally). In Accountability Frankenstein, I wrote about high-stakes accountability advocates' simplistic and often flawed grasp of motivation. To put it briefly, even if we had a Holy Grail measure of "teacher contribution to learning," that wouldn't be a sufficient justification for relying on test scores for teacher pay. No one yet knows what works best, and a top-down approach would short-circuit even the most rabid merit-pay advocate's interest in finding out what works, in much the same way that NCLB's proficiency measure foreclosed alternative ways to examine student achievement (including quantitative measures such as average scale score, medians, percentile splits, etc.). Essentially, those interested in performance pay have to make the policy choice between experimentation and a crusade. So to all 0.379 Capitol Hill staffers and campaign advisors reading this blog, you should be wary of federal mandates: if you mandate the wrong formula, everyone will pay the price for Beltway arrogance, and you'll endanger the political legitimacy of the idea for the long term.
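The point about alternative quantitative measures can be made concrete: a proficiency rate and an average scale score can disagree about which of two classrooms is doing better. A toy sketch, with the scores and the cutoff entirely invented:

```python
# Toy example (all scores and the cutoff are invented): the same two
# classrooms can rank differently under percent-proficient vs. mean score.
CUTOFF = 300

def mean(scores):
    return sum(scores) / len(scores)

def pct_proficient(scores):
    """Fraction of scores at or above the proficiency cutoff."""
    return sum(s >= CUTOFF for s in scores) / len(scores)

class_a = [290, 295, 305, 310]   # clustered near the cutoff
class_b = [240, 305, 306, 307]   # one very low score, three just over

print(mean(class_a), pct_proficient(class_a))  # 300.0 0.5
print(mean(class_b), pct_proficient(class_b))  # 289.5 0.75
# By average scale score, class A looks better; by proficiency rate, class B.
```

A system that reports only percent-proficient ignores everything that happens away from the cutoff, which is part of why locking in one measure forecloses the others.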

Caution about top-down mandates also fits with the local nature of collective bargaining and the affiliate structure in American unions. Despite what people may claim about the NEA's visceral opposition to merit pay, the big picture is more complicated: locals have negotiated performance pay or merit pay or whatever you want to call it, and the governance structures of both the NEA and the AFT commit the national affiliates to support collective bargaining at the local level. (There are also the merged locals and state affiliates that belong to both national affiliates.) That federal structure means that the NEA and AFT support what local leaders decide in terms of bargaining strategy and the agreements that the parties ratify at the local level. Where local leadership negotiates performance pay, the state and national affiliates support that. And where local leadership decides not to negotiate performance pay, the affiliates support that, too. (See a March 2008 column from NEA Today for an example of recent rhetoric that illustrates this complexity.) The more accurate policy position of both the NEA and AFT is that they oppose top-down mandates of performance pay, including how it is structured. The AFT is not officially skeptical of performance pay, but both national affiliates work with and for the locals. If you believe that either national teachers union can dictate bargaining positions to locals, e-mail me about my deep-discount sale price on the Brooklyn Bridge.

The second question about performance pay is thus the degree to which there should be centralized decision-making in education, and that is true for collective bargaining as well as for other matters of policy. It is not necessarily a matter of offering a grand bargain to Randi Weingarten and Dennis Van Roekel, because the bargain for some segments of a national union may be anathema to others. Let me put forward a pro-performance-pay, pro-union person's pipe-dream proposal that would serve someone's interests as a union leader, and you may understand: If I were a K-12 union leader in Florida, I would definitely listen to a national policy proposal that would tie some incentives for performance pay (bargained at the local level) to the degree to which a state had the following in place:

  • Collective-bargaining rights for public employees
  • Card-check procedures for certification of public employee unions
  • Binding arbitration for first contracts after a certain length of bargaining (say, 6-12 months)
  • Fair share in a bargaining unit that is represented by a union
Florida currently has one of those (collective bargaining rights for public employees), but gaining the others would be a pretty good trade in return for negotiating some version of performance pay (assuming it's not something that looks like the awful stuff that Florida has tried in recent years). To someone in a state like Florida, that looks like a possible deal. Framed as an incentive, it doesn't step on constitutional toes, but it gives more options to states that respect unions and collective bargaining. On the other hand, that's an awful deal to a union leader sitting in a state that already has fair share as well as collective bargaining. To someone who is opposed to any performance pay in such a state, that proposal looks closer to an insult than a serious attempt at a grand bargain.

As a result of this pattern, where different circumstances lead to different views of policy by local union leaders, you can have leaders sitting in different places, each of whom has a deserved reputation for being able to craft a deal with administrators, but where they have very different views of policy proposals. Ultimately, someone who wants performance pay in K-12 schools has to understand the fact that national affiliates support locals, and that the needs of locals will vary by state environment.

July 28, 2008

Ocala rethinks high grade-retention rates

In the late 1990s, Florida instituted a requirement that third-graders reach a certain test threshold in reading or be held back in third grade. Now the Marion County school district (which includes Ocala) is rethinking grade retention where it can (hat tip), after realizing it had several hundred middle-school students who could legally drive.

The research on retention is fairly clear: if you have the choice between holding a student back a grade and praying they somehow improve, on the one hand, and promoting the student and praying that they somehow improve, on the other, the better long-term choice is to promote the student and pray. Then again, my colleague Sister Jerome Leavy would point out that while plenty of Catholic schoolteachers believe in the power of prayer, you gotta do some teaching, and that's a poor way to frame public policy questions. Retention/promotion questions are an administrative distraction from the need to identify children who need help and intervene early.

July 23, 2008

Review of "Accountability Frankenstein"

As far as I'm aware, Teachers College Record recently published the first review of Accountability Frankenstein. From the comments by Dick Schutz, "If you are in any way concerned with the status and future of US el-hi education, you owe it to yourself to read this book." You can read the review to see where he thinks I got things right and wrong.

Crisis rhetoric, attention seeking, and capacity building

Berliner and Biddle's The Manufactured Crisis was the independent reading choice of several students in my summer doctoral course, and as they have been writing comments on the book in the last week, I have been thinking about the split retrospective view of the 1983 A Nation at Risk report, produced by the National Commission on Excellence in Education. The report has been on the receiving end of a tremendous amount of criticism by Berliner, Biddle, Jerry Bracey, and many others.

Of the various criticisms of the report, two stick fairly well: the report was thin on legitimate evidence of a decline in school performance, and the declension story is ahistorical. First, the report relied on a poor evidentiary record, using problematic statistics such as the average annual decline in SAT scale scores from 1964 to 1975, statistics the report's authors claimed were proof of declining standards in schools. (Why this was flawed is left as an exercise for the reader.) Using this evidence, the report claimed that

... the educational foundations of our society are presently being eroded by a rising tide of mediocrity that threatens our very future as a Nation and a people. What was unimaginable a generation ago has begun to occur--others are matching and surpassing our educational attainments.

If an unfriendly foreign power had attempted to impose on America the mediocre educational performance that exists today, we might well have viewed it as an act of war. As it stands, we have allowed this to happen to ourselves. We have even squandered the gains in student achievement made in the wake of the Sputnik challenge. Moreover, we have dismantled essential support systems which helped make those gains possible. We have, in effect, been committing an act of unthinking, unilateral educational disarmament.

Where do I start with the problems here: the war-like rhetoric, the implication that we don't want the rest of the world's education to improve, the bald assertion that there is any solid evidence of student achievement gains post-1958 that can be attributed to Sputnik, or the assumption that if there were low expectations observable in the early 1980s it must have been a decline from previous times instead of a generally anti-intellectual culture?
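As a head start on the exercise above: one widely cited flaw is a composition effect. As the pool of SAT takers broadened over those years, the overall average could fall even if every subgroup's performance held perfectly steady. A toy sketch, with the groups, scores, and shares all invented:

```python
# Toy composition effect (all numbers invented): the overall average falls
# as a lower-scoring group becomes a larger share of test-takers, even
# though neither group's mean score changes at all.
MEAN_A = 550.0  # stable mean for the traditional test-taking group
MEAN_B = 450.0  # stable mean for the newer, broader group

def overall_mean(share_b):
    """Weighted mean when group B is `share_b` of all test-takers."""
    return (1 - share_b) * MEAN_A + share_b * MEAN_B

print(overall_mean(0.2))  # 530.0 with a narrow pool
print(overall_mean(0.5))  # 500.0 after the pool broadens
# A 30-point "decline" with zero change in either group's performance.
```

Treating that kind of average as proof of declining school standards is exactly the evidentiary problem with the report.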

But 25 years after the report's release, it is easy to poke holes in the report and poke fun at its hyperbolic rhetoric. What the last few weeks have brought home for me is the very different perceptions of the report. Berliner, Biddle, Bracey, and other critics are absolutely right that the report is factually and conceptually flawed. And yet there are many people involved with the commission who not only thought they were factually correct, they thought that the report's purpose was to help public schooling. If you read various accounts of the commission's work, it is clear that they thought the report was necessary to build political support for school reforms.

Part of the report's creation lies in the campaign promise of President Ronald Reagan to abolish the federal Department of Education. In this regard, his first Secretary of Education Terrel Bell brilliantly outmaneuvered Reagan, and within a few months of the report's release, it was clear that the report had resonated with newspaper editorial boards and state policymakers. Even without it, given the Democratic majority in the House and the presence of several moderate Republicans in the Senate, it was unlikely that Congress would abolish the department. After it, the idea was largely unthinkable.

But the motives of Bell and the commission members were clearly not about saving an administrative apparatus. They were true believers in reform, and if all of the recommendations had been followed, today we would have a much more expansive school system. (The recommendations included 200- or 220-day school calendars and 11-month teacher contracts.) Some of the recommendations were followed, primarily expanding high school course-taking requirements and standardized testing, as well as the experiments in teacher career ladders in several states. But the guts of the implemented recommendations were already in the works or in the air: I remember that California state Senator Gary Hart had been pushing an increase in graduation requirements, a bill that passed in 1983. (This is not the same Gary Hart as the famous one from Colorado.) While I could have graduated from high school in 1983 with one or two semesters of math (I forget which), students in my former high school now must take several years of math. (As others have pointed out, one of the unintended beneficial consequences of raising course-taking requirements was dramatically reducing the gender differences in math and science course taking. Richard Whitmire, take note: Terrel Bell is the villain!)

Lest some people not know or have forgotten, A Nation at Risk was not the only major mid-80s report on public schooling. Others were written from a variety of perspectives: Ernest Boyer's High School, Ted Sizer's Horace's Compromise, Arthur Powell et al.'s The Shopping-Mall High School, and John Goodlad's A Place Called School. All were published in 1983 or 1984. All were earnest. All were more thoughtful than A Nation at Risk. I suspect that if Two Million Minutes had been made and released at the same time (if with different non-U.S. countries and different students), it would have fit into that cache of reform reports very well.

Those other reports did not gain the same attention as A Nation at Risk, and I am not certain that any of the reports dramatically changed the policy options discussed at the state level. Changed course requirements and testing were prominent parts of the discussion before the reports, and they were the primary consequences of state-level reforms in the 1970s and 1980s. What the body of reports did instead was push the idea that schools needed reforming. On that score, I think they succeeded, even if several of the report writers (Sizer and Goodlad) became horrified at the direction of reform policies.

Today, we have a new set of actors making similar claims about the need to reform schools: did you receive the e-mail from Strong American Schools/Ed in '08 that I did yesterday? If you didn't, here's the text:

We are only as strong as our schools, and our schools are failing our children.

Consider:
  • Almost 70% of America's eighth-graders do not read at grade level.
  • Our 15-year-olds rank 25th in math and 21st in science.
  • America showed no improvement in its post-secondary graduation rate between 2000 and 2005.
We know that the nations with the best schools attract the best jobs. If those jobs move to other countries, our economy, our lives and our children will suffer.

For that reason, Strong American Schools launched a new campaign this week to combat the crisis in our public schools.

Click on the image below to view our television advertisement:

Please join us. Tell your governors, your state and national representatives and senators that you want a change for stronger schools.

Make your voice heard.

The ad's rhetoric is definitely in line with A Nation at Risk, down to the tagline: "As our schools go, so goes our country." It's tired rhetoric at this point, and I think it's important to understand why the folks behind Strong American Schools are keeping at it, though they've gained no traction in making education a highly visible part of the presidential campaign thus far: as with the major figures in A Nation at Risk, they are true believers in reform to increase the capacity of regulators.

But Strong American Schools has now become a shadow of A Nation at Risk, itself the least substantive of the mid-1980s reports on American schooling. Instead of making specific claims or recommendations, they're pushing "a change for stronger schools," or rather attention. To do so, they claim a crisis, though this is probably the worst time to claim that weak education is the cause of what Phil Gramm calls our "mental recession": to anyone who looks at the current state of the world, our economic woes are the consequences of the subprime mortgage crisis and energy prices (which themselves are driven by the growing Chinese and Indian economies). In 1983, the economy was out of recession. I just don't think the world will realign itself in the same way as in the 1980s. That doesn't mean that there isn't a tie between education and the economy in the long term, but it's diffuse rather than mechanical.

And there's another question here: is it ethical or even helpful to claim that a long-term problem is an acute crisis, just to gain public attention for an issue? We've gone down this road many times before, and I just don't see where it helps in the long term.

July 21, 2008

The higher-ed split among conservatives

One could probably have predicted today's Inside Higher Ed article describing how several conservative academics criticized the current push for quantitative assessment of higher ed. I didn't, but if you did, give yourself a pat on the back.

The article describes a panel on Friday sponsored by the American Academy of Distance Learning (more about that later) where the former head of Margaret Spellings's Office of Postsecondary Education and the executive director of the National Association of Scholars ripped Spellings and her allies for pushing standardized tests in higher ed to the detriment of liberal arts. According to the article, Diane Auer Jones was more diplomatic than Peter Wood, but both complained that the push for accountability was turning reductionist. In this regard, I think Wood's reported comments are on the money: today, the policy rhetoric on higher education is vocational, and that threatens to make the defense of a liberal-arts education more difficult. He ties it to the push for accountability in higher education, and I've had similar concerns about calls for standardized testing as the primary accountability mechanism for colleges.

The predictability comes in the split among conservatives, one that Wood ties back to a "practical"/"classical" distinction in the late 18th century. The Spellings Commission report ignored fundamental tensions in American higher education, and one interesting feature of the report is the invisibility of the curriculum. The report's rhetoric was tied closely to economics, and I suspect that Jones's resignation in May on a matter of principle was the result of a long-simmering frustration among some conservative academics, not an isolated event. No party or political coalition is monolithic, and I've heard several current and former Capitol Hill staffers from Democratic offices who were far closer to Spellings on higher-ed accountability than either Jones or Wood. And I'm closer to Jones and Wood at least on this issue, though I'm a Democrat.

And now the coda: The building frustration among some conservatives that I'm inferring here may explain why Jones and Wood were willing to use the sponsorship of a proprietary university's president's shadow accreditation office. I've tried to look for the "American Academy of Distance Learning," which seemed to be an odd outfit to sponsor a talk about standardized testing and the liberal arts.

I found an American Academy of Distance Learning (or at least a reference to its tax-exempt status) headquartered in Denver, but Dick Bishirjian runs the proprietary Yorktown University, which is in Denver... at the same address as AADL, down to the same suite number. But the media advisory for the panel lists AADL with a Norfolk post office box. Bishirjian also appears to be the president of the American Academy of Privatization, a proponent of "privatization training for public officials." I'm not sure what that means, precisely, but the P.O. box for it is the same as that given in the media advisory for AADL. In other words, it looks like Bishirjian has a mail drop in Norfolk and office space in Denver. That's an amazingly slim infrastructure to run a university and two other organizations... or at least to claim so.

A July 10 Denver Post article gives a little more information about Yorktown, at least in relationship to Republican Senate candidate Bob Schaffer, who served on Yorktown's board of trustees for several years. Yorktown apparently has a single graduate program and only a few dozen students. Given the plaudits for Bishirjian by Paul Weyrich earlier this month on David Horowitz's website, it looks like Bishirjian had enormous difficulties gaining accreditation. So... is his sponsorship of the forum for Jones and Wood something that's tied to his proprietary institution's interests? I don't know if either Jones or Wood is aware of Bishirjian's background or the disconnect between his proprietary institution's curriculum and their arguments, but this is definitely one of the odder sets of bedfellows I've seen in higher education.

July 17, 2008

Teachers and the public sphere

Partially drafted in Chicago Sunday evening, July 13, and revised July 17:

I'm listening to Susan Ohanian at the moment, talking to a group of about 50 AFT delegates and others. Ohanian is a well-known opponent of NCLB and academic standards and was invited to speak at an event sponsored by the AFT Peace and Freedom Caucus (which should sound familiar to NEA national delegates, who can sign up for an NEA Peace and Freedom Caucus as well). As I've written elsewhere, Ohanian is right in several things and wrong in others. (Go read our books to figure out where we agree and disagree; I like her as a person, and she raises important questions about the purpose of education and high-stakes testing.) But I'm more interested this evening in the audience after she and the other speaker (the leader of an independent teachers union in Puerto Rico) finish. The AFT crowd neither applauded nor booed this morning when Barack Obama talked about merit pay in his live-feed speech to the convention floor. (The crowd went to its feet and cheered loudly when he first appeared and cheered again loudly at the end, and applauded at various points in the 10-minute speech. As Mike Antonucci has noted, it's essentially the same speech he gave to NEA, the one that had NEA California delegates booing, so we have an interesting comparison point.) But since a strong positive reaction followed Ohanian's statement that it was wrong for Obama to claim that teachers are the most important influence on children, I'm fascinated.

Part of the reason why I'm fascinated is because I think Ohanian's arguments are inconsistent. Ohanian worried about the statement by Obama that "the single most important factor in determining a child's achievement is not the color of their skin or where they come from; it's not who their parents are or how much money they have. It's who their teacher is." Ohanian argued that this statement is rhetoric that sets up blaming teachers for all sorts of problems they are not responsible for. A few minutes later, she claimed that the real danger of high-stakes accountability was the destruction of children's imaginations and the creation of a compliant workforce. But there's a logical inconsistency here: how can schools create worker robots if they are not powerful in shaping the lives of children?

I worry (and said so towards the end of the event) that Ohanian's criticism undercuts arguments about the importance of the public sphere. You can say that teachers are not crucial to children's lives, but then it's hard to argue that schools should be well-funded. You can say that teachers are not crucial, but then it's hard to argue against all sorts of problematic policy proposals that take authority away from teachers or that position teachers' professional judgment as irrelevant. Ohanian was nodding in acknowledgment at the time, so I think (or I hope) she knows that her impromptu remarks were not consistent with either her deeper views of schooling or those of most teachers.

As it turned out, my initial impression of the crowd was wrong: there was a lively discussion after the speakers finished, with plenty of dissent from Ohanian's arguments. So in one sense, I never had my question answered: what drew some of the delegates to agree with the remarks by Ohanian that concerned me the most?

July 15, 2008

Know what union membership means before you write, Ray

Ray Fisman wrote a laudatory article released Friday by Slate about NYC's P.S. 49 principal Anthony Lombardi, an article with themes remarkably similar to what Robert Kolker wrote for New York Magazine in 2003, even down to quoting Randi Weingarten calling Lombardi a tyrant without crediting Kolker. Fisman links to an Inside Schools page summarizing P.S. 49 data and using Kolker's quotation, again without credit. C'mon, Mr. Fisman: if I can find the source by Googling, why couldn't you? (Given that flaw, I am doubtful of Fisman's claim that Lombardi was ever "at the top of the teachers-union hit list." Evidence of any such list, or just colorful language to cover up a reporter's lassitude?)

But the passage that had me laughing was the following bit of ignorance:

Currently, New York City teachers get their union cards their first day on the job. In theory they're on probation for three years after that, but in practice very few are forced out. Lombardi suggests replacing this system with an apprenticeship program. Rather than requiring teaching degrees (which don't seem to improve value-added all that much), new recruits would have a couple of years of in-school training. There would then come a day of reckoning, when teachers-to-be would face a serious evaluation before securing union membership and a job for life.

Here is a fundamental conflation of tenure and union membership, or of union membership with the legal protections of a collective bargaining agreement, or of "serious evaluation" with something unspecified. I'm not sure where the root of the error lies, but I do know one thing that is true everywhere, as far as I can tell: union membership does not change your legally recognized rights under a collective bargaining agreement. It does other things that are important (a greater chance of gains at the bargaining table through solidarity, access to specific benefits provided by the union beyond CBA protections, etc.), but Fisman just doesn't know what he's talking about here.

And then Joanne Jacobs repeats the error. Wince time...

July 9, 2008

Can reporters raise their game in writing about education research?

I know that I still owe readers the ultimate education platform and the big, hairy erratum I promised last month, but the issue of research vetting has popped up in the education blogule*, and it's something I've been intending to discuss for some time, so it's taking up my pre-10:30-am time today. In brief, Eduwonkette dismisses the new Manhattan Institute report on Florida's high-stakes testing regime as thinktankery, drive-by research with little credibility because it hasn't been vetted by peer review. Later in the day, she modified that to explain why she was willing to promote working papers published through the National Bureau of Economic Research or the RAND Corporation: they have a vetting process for researchers or reports, and their track record is longer. Jay Greene (one of the Manhattan Institute report's authors and a key part of the think tank's stable of writers) replied with probably the best argument against Eduwonkette (or any blogger) in favor of using PR firms to publicize unvetted research: as with blogs, releasing unvetted reports involves a tradeoff between review and publishing speed, a tradeoff that reporters and other readers are aware of.

Releasing research directly to the public and through the mass media and internet improves the speed and breadth of information available, but it also comes with greater potential for errors. Consumers of this information are generally aware of these trade-offs and assign higher levels of confidence to research as it receives more review, but they appreciate being able to receive more of it sooner with less review.

In other words, caveat lector.


We've been down this road before with blogs in the anonymous Ivan Tribble column in fall 2005, responses such as Timothy Burke's, a second Tribble column, another round of responses such as Miriam Burstein's, and an occasional recurrence of sniping at blogs (or, in the latest case, Laura Blankenship's dismay at continued sniping). I could expand on Ernest Boyer's discussion of why scholarship should be defined broadly, or Michael Berube's discussion of "raw" and "cooked" blogs, but if you're reading this entry, you probably don't need all that. Suffice it to say that there is a broad range of purpose and quality in blogging: some blogs, such as The Valve or the Volokh Conspiracy, have become lively places for academics, while others, such as The Panda's Thumb, are more of a site for the public intellectual side of academics. These are retrospective judgments that are only possible after many months of consistent writing in each blog.

This retrospective judgment is a post facto evaluation of credibility, an evaluation that is also possible for institutional work. That judgment is what Eduwonkette is referring to when making a distinction between RAND and NBER, on the one hand, and the Manhattan Institute, on the other. Because of previous work she has read, she trusts RAND and NBER papers more. (She's not alone in that judgment of Manhattan Institute work, but I'm less concerned this morning with the specific case than the general principles.)

If an individual researcher needed to rely on a track record to be credible, we'd essentially be stuck in the intellectual equivalent of country clubs: only the invited need apply. That exists to some extent with citation indices such as Web of Science, but it's porous. One of the most important institutional roles of refereed journals and university presses is to lend credibility to new or unknown scholars who do not have a preexisting track record. To a sociologist of knowledge, refereeing serves a filtering purpose to sort out which researchers and claims to knowledge will be able to borrow institutional credibility/prestige.

Online technologies have created some cracks in these institutional arrangements in two ways: reducing the barriers to entry for new credibility-lending arrangements (i.e., online journals such as the Bryn Mawr Classical Review or Education Policy Analysis Archives) and making large banks of disciplinary working papers available for broad access (such as NBER in economics or arXiv in physics). To some extent, as John Willinsky has written, this ends up in an argument over the complex mix of economic models and intellectual principles. But its more serious side also challenges the refereeing process. To wit: in judging a work, how much are we to rely on pre-publication reviewing and how much on post-publication evaluation and use?

To some extent, the reworking of intellectual credibility in the internet age will involve judgments of status as well as intellectual merit. To avoid doing so risks the careers of new scholars and status-anxious administrators, which is why Harvard led the way on open-access archiving for "traditional" disciplines and Stanford has led the way on open-access archiving for education, and I would not be surprised at all if Wharton or Chicago leads in an archiving policy for economics/business schools. Older institutions with little status at risk in open-access models might make it safer for institutions lower in the higher-ed hierarchy (or so I hope). (Explaining the phenomenon of anonymous academic blogging is left as an exercise for the reader.)

But the status issue doesn't address the intellectual question. If not for the inevitable issues of status, prestige, credibility, etc., would refereeing serve a purpose? No serious academic believes that publication inherently blesses the ideas in an article or book; publishable is different from influential. Nonetheless, refereeing serves a legitimate human side of academe, the networking side that wants to know which works have influenced others, which are judged classics, ... and which are judged publishable. Knowing that an article has gone through a refereeing process comforts the part of my training and professional judgment that values a community of scholarship with at least semi-coherent heuristics and methods. That community of scholarship can be fooled (witness Michael Bellesiles and the Bancroft Prize), but I still find it of some value.

Beyond the institutional credibility and community-of-scholarship issues, of course we can read individual works on their own merit, and I hope we all do. Professionally educated researchers have more intellectual tools that we can bring to bear on working papers, think-tank reports, and the like. And that's our advantage over journalists: we know the literature in our area (or should), and we know the standard methodological strengths and weaknesses in the area (or should). On the other hand, journalists are paid to look at work quickly, while I always have competing priorities the day a think-tank report appears.

That gap provides a structural advantage to at least minimally-funded think tanks: they can hire publicists to push reports, and reporters will always be behind the curve in terms of evaluating the reports. More experienced reporters know a part of the relevant literature and some of the more common flaws in research, but the threshold for publication in news is not quality but newsworthiness. As news staffs shrink, individual reporters find that their beats become much larger, time for researching any story shorter, and the news hole chopped up further and further. (News blogs solve the news-hole problem but create one more burden for individual reporters.)

Complicating reporters' lack of time and research background is the limited pool of researchers who carve out time for reporters' calls and who understand their needs. In Florida, I am one of the usual suspects for education policy stories because I call reporters back quickly. While a few of my colleagues disdain reporting or fear being misquoted, the greater divide is cultural: reporters need contacts to respond within hours, not days, and they need something understandable and digestible. If a reporter leaves me a message and e-mails me about a story, I take some time to think about the obvious questions, figure out a way of explaining a technical issue, and try to think about who else the reporter might contact. It takes relatively little time, most of my colleagues could outthink me in this way, and somehow I still get called more than hundreds of other education or history faculty in the state. But enough about me: the larger point is that reporters usually have few contacts who have both the expertise and time to read a report quickly and provide context or evaluation before the reporter's deadline. Education Week reporters have more leeway because of the weekly cycle, but when the goal of a publicist is to place stories in the dailies, the publicist holds all the advantages over general-assignment reporters or reporters new to the education beat.

In this regard, the Hechinger Institute's workshops provide some important help to reporters, but everything I have read about the workshops suggests they are oriented to current topics, to providing ideas for stories, and to general "what's hot" context, rather than to helping reporters respond to press releases. Yet reporters need help from a research perspective that's still geared to their needs. So let me take a stab at what should appear in reporting on any research in education, at least from my idiosyncratic reader's perspective. I'll use the reporter's 5 W's, split into publication and methods issues:

  • Publication who: authors' names and institutional affiliations (both employer and publisher) are almost always described.
  • Publication what: title of the work and conclusions are also almost always described. Reporters are less successful in describing the research context, or how an article fits into the existing literature. Press releases are rarely challenged on claims of uniqueness or what is new about an article, and think-tank reports are far less likely than refereed articles or books to cite the broadly relevant literature. When reporters call me, they frequently ask me to evaluate the methods or meaning but rarely explicitly ask me, "Is this really new?" My suggested classification: a study is entirely new, replicates or confirms existing research, or runs counter to existing research. Reporters could address this problem by asking sources about uniqueness, and editors should demand it.
  • Publication when: publication date is usually reported, and occasionally the timing context becomes the story (as when a few federal reports were released on summer Fridays).
  • Publication where: rarely relevant to reporters, unless the institutional sponsor or author is local.
  • Publication why: Usually left implicit or addressed when quoting the "so what?" answer of a study author. Reporters could explicitly state whether the purpose of a study is to answer fundamental issues (such as basic education psychology), applied (as with teaching methods), attempting to influence, etc.
  • Publication how: Usually described at a superficial level. Reporters leave the question of refereeing as implicit: they will mention a journal or press, but I rarely see an explicit statement that a publication is either peer-reviewed or not peer-reviewed. There is no excuse for reporters to omit this information.
  • Content who: the study participants/subjects are often described if there's a coherent data set or number. Reporters are less successful in describing who are excluded from studies, though this should be important to readers and reporters could easily add this information.
  • Content what: how a researcher gathered data and broader design parameters are described if simple (e.g., secondary analysis of a data set) or if there is something unique or clever (as with some psychology research). More complex or obscure measures are usually simplified. This problem could be addressed, but it may be more difficult with some studies than with others.
  • Content when: if the data is fresh, this is generally reported. Reporters are weaker when describing reports that rely on older data sets. This is a simple issue to address.
  • Content where: Usually reported, unless the study setting is masked or an experimental environment.
  • Content why: Reporters usually report the researchers' primary explanation of a phenomenon. They rarely write about why the conclusion is superior to alternative explanations, either the researchers' explanations or critics'. The one exception to this superficiality is on research aimed at changing policy; in that realm, reporters have become more adept at probing for other explanations. When writing about non-policy research, reporters can ask more questions about alternative explanations.
  • Content how: The details of statistical analyses are rarely described, unless a reporter can find a researcher who is quotable on it, and then the reporting often strikes me as conclusory, quoting the critic rather than explaining the issue in depth. This problem is the most difficult one for reporters to address, both because of limited background knowledge and also because of limited column space for articles.

Let's see how reporters did in covering the new Manhattan Institute report, using the St Petersburg Times (blog), Education Week (blog thus far), and New York Sun (printed). This is a seat-of-the-pants judgment, but I think it shows the strengths and weaknesses of reporting on education research:


Criterion       Times (blog)    Ed Week (blog)  Sun

Publication
  Who           Acceptable      Acceptable      Acceptable
  What          Weak            Acceptable      Weak
  When          Acceptable      Acceptable      Acceptable
  Where         N/A             N/A             N/A
  Why           Implicit only   Implicit only   Implicit only
  How           Acceptable      Absent          Absent

Content
  Who           Acceptable      Acceptable      Acceptable
  What          Weak            Weak            Weak
  When          Acceptable      Acceptable      Acceptable
  Where         Acceptable      Acceptable      Acceptable
  Why           Weak            Acceptable      Weak
  How           Weak            Weak            Weak

Remarks: I rated the Times and Sun items as weak in "publication what" because there was no attempt to put the conclusions in the broader research context. All pieces implied rather than explicitly stated that the purpose of the report was to influence policy (specifically, to bolster high-stakes accountability policies). Only the Times blog noted that the report was not peer-reviewed. All three had "weak" in "content what" because none of them described the measures (individual student scale scores on science adjusted by standard deviation). Only the Ed Week blog entry mentioned alternative hypotheses. None described the analytical methods in depth.

While some parts of reporting on research are hard to improve on a short deadline (especially describing regression discontinuity analysis or evaluating the report without the technical details), the Ed Week blog entry was better than the others in several areas, with the important exception of describing the non-refereed nature of the report. So, education reporters: can you raise your game?

* - Blogule is an anagram of globule and connotes something less global than blogosphere. Or at least I prefer it. Could you please spread it?

July 8, 2008

300 v. 10,000 and the broader discussion of performance pay

A bit more on Obama, performance pay, and the NEA: I commented yesterday about the Mike Antonucci video of Obama's speech to the representative assembly and the light round of boos when he mentioned performance pay (or merit pay or differential pay: take your pick, it doesn't change the substantive matters). Antonucci responds with more about his impression of the response (whether boos or cheers were louder for Obama, for which segments, etc.). I wasn't there, so I'll take his word that I miscounted from the spectacular audio on YouTube. I'm not sure that matters much either for the politics (which is that Obama is popular among teachers, but he and union leaders disagree most about performance pay) or for the substantive policy.

Charles Barone updated his entry on the matter twice, and here's the relevant matter:

I and many of the people who were passing this around are a little more skeptical than Sherman about what is needed to effect the kind of change Obama is talking about. The teacher quality problem is national. And urgent. It requires a national solution, which is frankly long overdue

Here we see what I explain to my undergraduate students: NCLB and education politics more generally have created a vicious circle of distrust. Because of how states respond to NCLB (some of which is pushed by the law and some a matter of state choice), teachers and parents at the local level have an increasingly negative view of NCLB and states. And because of the same choices, national policymakers and the Beltway view states and local actors with even more distrust.

The argument that Problem X "requires a national solution" is more a reflection of this distrust than a result of serious research or policy perspectives about the role of the federal government. (See Manna, McGuinn, DeBray-Pelot, Kaestle, and others on federalism in education policy.) The federal government can do many things, and some things it must do, but federal education law is pretty blunt. It has never been a policy scalpel. And everything we know about performance pay and merit pay is that the details matter a great deal, a situation where federal mandates would be disastrous and eventually undercut any transient support for merit pay.

I know that the details matter from my observations of a cudgel-like mandate in my own state and also from my own experience with merit pay in higher ed: my colleagues generally like merit pay because departments are in control of the procedures and vote on them. Test scores play no role, and support for merit pay would evaporate if any of the K-12 schemes involving those were floated here. The most quantitatively-oriented department chair I know is least confident about evaluations of teaching and most confident on research, for a variety of reasons. Even so, my colleagues also support across-the-board raises (salaries at USF are in the fourth quintile of research-extensive universities, in terms of the national distribution) and compression-inversion remedies.

July 7, 2008

300 booing is somehow more important than 10,000 delegates

Former Hill staffer Charles Barone wrote early this morning that a video of Barack Obama's speech to the NEA Representative Assembly last week was being watched closely by "Congressional staff and education policy folks." Barone highlights a point in the speech where Obama says he is in favor of performance pay and where you can hear some booing in the background. "Pretty striking, booing a plan to give teachers who do more work, attain certain skills, or take tough assignments more money."

Barone is taking that moment far out of context, and so is anyone who draws a similar conclusion: what sounds like several hundred people booing is in a hall of about 10,000 delegates, and the cheers at other moments easily outweighed the booing. Even the laughter at Barack's comment after that moment was far louder. Bargaining performance pay is a hot topic among teacher union officers, and it should be clear that many union leaders are highly skeptical of any and all performance pay plans. I don't want to paper that over. There are plenty of reasons for union officials to be skeptical, given the history of arbitrary administrative evaluations before unionization, pay plans that have been imposed without bargaining, or pressure tactics that can undermine local bargaining. On the other hand, I can think of several locals (including those in the NEA) who have bargained performance pay when they have been part of its development.

In the end, Barone's comment is sad evidence of a Beltway mentality: Hill staffers know best. Neither members of Congress nor local school board members nor union leaders inherently know best. Where that type of arrogance rears its head, it undermines what should be happening: discussion.

(Disclosure: My own faculty union was the first to propose merit pay many years ago in the statewide contract, and of all the locally-derived money at USF for collectively bargained raises since our first local contract in 2004, two thirds has been for merit pay.)

June 13, 2008

I was manifest(o)ly wrong

Several days ago, I echoed Steve Diamond's argument that the dueling manifestoes this week are related to "the battle for the soul of Barack Obama." Larry Mishel took me to task in comments, and I will now publicly apologize, since David Brooks has now made the same point Diamond and I did. In his Manichean spin, Brooks claims that one cannot agree with both manifestoes, and that they represent the status quo camp and the reform camp. But wait: isn't NCLB the status quo, and high-stakes accountability the status quo in many states before that? And how does Brooks' one-or-the-other story jibe with Arne Duncan's being a signatory on both? (And per Eduwonk's offhand remark, do we really need another controversial local superintendent bumped up to Secretary of Education?) Quick, everyone: post sentries at the camp entrances!

June 12, 2008

Shared responsibilities for children I

I had intended to blog about the responsibilities of schools for a few weeks, since Harry Brighouse responded last month to April's Richard Rothstein-Rick Hess(-and-others) debate and Matthew Yglesias responded a week later to Ezra Klein's comments on education and the economy. I've been swamped by other things and am writing this first entry (of two) during a fragment of my day when I can't do anything else productive. (This is the background piece: the Uber Education Manifesto Du Jour With Humor will be the second entry.) But, in any case, this goes back at least a few weeks before this week's manifestoes presented Tuesday and yesterday. Then again, I suppose we should really go back to Richard Rothstein's Class and Schools (2004). Or maybe Berliner and Biddle's The Manufactured Crisis (1995). But that's only the recent lineage. Other ideas that will appear later in this entry come from Jennifer Hochschild and Nathan Scovronick, Michael Katz, Miriam Cohen, and Stephen Provasnik, among other historians and social scientists who have written about education as part of the state for about 40 years or more. Well, that's not quite accurate: the current line of academic writings is 40 years old, but the North American debates they've covered are several hundred years old. In other words, the relative responsibility of schools for academic achievement is not something that's new or newly struggled over. My goal in this entry is to identify three key issues underlying the current (and older) debate.

Probably the most important issue is the role of schools in citizenship and the welfare state. Because schooling became closely tied to the rhetoric of citizenship two out of the three times that the franchise expanded dramatically in the past two centuries, we think of education today as a birthright. Primary education became common in the U.S. earlier than in other early-industrializing countries, and as a result education is the primary form of social citizenship in this country. As Hochschild and Scovronick note, we imbue education with many of the same functions that a broader welfare state serves in other industrialized countries: education is supposed to advance economic opportunity, better health, happier lives, and so forth. (The last, most corrupt form of progressive curriculum ideas was called the Life Adjustment movement, and it was the reductio ad absurdum of education as a substitute for broader social citizenship.) So now schools are supposed to do everything from resuscitate the economy to save lives to ... oh, I don't know, cure split ends. There is a legitimate and identifiable human capital consequence to education, but the rhetoric on that is overblown. There is an inevitable temptation to see education as the cure for all ills, and the politics of education is liberally infected with panacea attribution disease. One part of the serious debate over accountability is the precise role of schools, and that is intimately tied to questions about the extent of the American welfare state.

One complication in thinking about education is the fact that elementary and secondary schooling is among the most equally distributed resources in the United States. In the states with the worst inequality in school spending, you'll see maybe two or even three times as much spending for some children as for others. Think about the distribution of other resources: access to health care, housing, transportation. All are distributed less equally than schools, because schooling is part of the democratic state and a right of citizenship by politics and state constitutions. That fact does not excuse educational inequality, but it's something we don't talk about openly or think about clearly.

I think there's a way out from the quagmire I've identified above: schools, other agencies, and families share responsibilities for children. Each is independently responsible for a reasonable but critical role in the lives of young people. Schools are not time machines: they cannot go back and undo what happened or didn't happen in earlier years, nor can they provide health care, clean air, and so forth. Nor can they take over the lives of children. But neither can schools or teachers use the rest of children's lives as excuses; you take the students you have and move them. Period. The same is true for parents: they're not responsible for teaching their children calculus. But neither are they supposed to sit on their butts when things go wrong in schools, nor is it responsible to neglect their children. Oh, yes, and you're responsible for talking with people in the other roles, too.

There is a crucial advantage of having twin principles (responsibilities for both coordination and independent functioning): It fits with the broad sense of U.S. parents and other adults that both families and schools are responsible for academic achievement. I've pointed out this apparent inconsistency for several years, but in reality it's not an inconsistency. It reflects one reasonable solution to the dilemma: we're all supposed to be responsible.

But there's a sticking point in this grand ideal: given that schools have a serious but limited responsibility, how do we define the scope of that responsibility? Let's assume (for now) that we're concerned primarily with academic achievement. What exactly do we want schools to do? The final issue I want to identify is the series of shortcuts we take when talking about standards, proficiency, expectations, and any synonym you can find to the general concept of what we want children to learn. I have made the following point in Accountability Frankenstein among other places, and no one has even challenged me on it: almost every policy displaces the hard choices about expectations into a different forum. That doesn't mean that I have no expectations for my children or for schools. It just means that the process of turning rhetoric into policy mechanism removes the definition of academic expectations from public debate. Some of us say we want "high standards," but that does not say a single thing except in the politics of symbolism. Reformulating the concept doesn't help: growth models are equally suspect. In short, "proficiency" is a cipher.

Oh, damn: and there you thought I was headed into a Grand Bargain, a reasonable solution to all the fighting over accountability? Unfortunately, I'm an historian, not a Nobel Peace Prize winner. And I have somewhere to be in a few minutes. But do not fear: for those who grumble about the lack of specifics in this week's manifestoes or this entry, just hold on (or read the last chapter in my book, which is available without waiting for the second entry on this topic).

June 10, 2008

Missing out of the action, still

I'm swamped by work, so I'm afraid I'm going to be missing the party today on A Broader, Bolder Approach to Education, the collaborative statement on education policy headed up by the Economic Policy Institute. Eduwonkette praises the statement. Richard Lee Colvin cautiously praises the emphasis on early childhood education while noting that it is likely to be controversial. Sara Mead's view is highly mixed. Eduwonk and Mike Petrilli are outright cynical.

I'm going to be late in responding to this (and other major stories such as Ed Week's grad-rate release last week). I'd give my brief gloss on the topic, but I've already written a book on accountability, and I'm too exhausted right now for pithy comments.

May 28, 2008

The test-prep nightmare

Over at Ed Sector's blog, former ES intern Danny Rosenthal describes how a test-prep nightmare unfolded in his Texas school. Towards the beginning of the entry, he writes,

I'm OK with test prep. When standardized tests are well-crafted, as they are in my state, teachers should use tests to shape their classroom instruction. Done thoughtfully, "teaching to the test" is a good idea. But at my school, and others in Houston, we execute test prep so poorly that it ends up hurting students more than it helps them.

The concrete description in the rest of the entry shows what happens in the school where he teaches:

... the sticker exercise told us little about our students' needs...

Mostly, teachers made worksheets with questions only loosely related to each other taken from previous TAKS tests, or, in some cases, from math textbooks that are largely unaligned with the TAKS test. Think panicked college students poring over Cliffs Notes for the wrong novel.

Sometimes, the school made all math teachers work off of the same worksheets, regardless of the fact that they taught different subjects....

Our test prep worksheets aim to review important skills. But oftentimes students have not learned these skills in the first place. And the worksheets don't fix that....

Students choose not to try mostly because they think they have no chance to succeed. That's not their fault. At Hastings, we are far too willing to exchange gimmicky test-prep and other instructional shortcuts for real teaching.

Rosenthal's vision of teaching-to-the-test done right is in line with the argument of Lauren Resnick, if TAKS were such a "good test" (many would disagree), and if that incentive pushed the type of instruction Rosenthal prefers (i.e., good instruction). But that's far too rare.

May 21, 2008

Qualitative data on schools

Yesterday's story in the Washington Post (hat tip) on in-person reviews of schools by external committees is one step in the right direction for accountability: using in-person eyeballs instead of just statistical eyeballs to see what should be done. Rhee sent teams of people into schools she wanted to change. There are some questions I still have after reading the article: why only one- and two-day visits? what did the DC teachers union think of the reviews? what did other stakeholders think? But even if there were flaws with this process, having students, parents, and educators visit schools to provide a snapshot is dramatically different from just looking at test scores and prescribing a cookie-cutter "fix."

(Note: Ken DeRosa pointed out the false dichotomy I had when rushing this entry through yesterday, and I trust this is now more "just.")

May 19, 2008

Political science/political philosophy and education policy

I was going to spend some time last night connecting my weekend entry on hubris to the debate over whether a preponderance-of-evidence standard is right for policy, when I discovered that the macrotheoretical gap had already been filled by Leo Casey's point about seeing like a state, not like an educator. I'm expecting two quick-tongued responses today from other bloggers, but I hope that there is more than a fast wit applied to an argument about the way that states behave and how that shapes education policy debate. I didn't use James Scott's book in Accountability Frankenstein, but I easily could have (and probably should have).

That's probably one logical direction for some good academic work to head in, after the solid work done by Manna, McGuinn, and DeBray (three new scholars: go buy their books!). Education governance is such a complicated mess that it's a wonderful place for academics who think about school reform to play.

April 20, 2008

The Indiana Jones response to philosophy-of-research blogging

Kevin Carey has his say on a preponderance-of-evidence standard on policy propositions (in response to an Eduwonkette discussion of growth measures). Skoolboy responds. I wouldn't go all ad-lib-for-convenience on you all if it weren't 11:20 at night, but I'm tired, and since this is a meta-discussion about judging teachers based on test scores, I'll just say this: It already happens (firing educators based on test scores), it's called reconstitution, and the evidence of its success is mediocre at best. We don't need to go all meta- when there's experience at hand... or specific proposals such as New York City's (which Skoolboy points out fails the sniff test of basic algebra).

If anyone were tempted to go meta-, I'd point out that there is no such thing as a monolithic social scientist's frame for policy. Then again, I'm not only an alleged social scientist, I'm a card-carrying member of the Social Science History Association and have a degree in one of those odd number-crunching realms (demography).

April 13, 2008

Legislative rolling and the New York budget language on tenure

One more thought on the New York state budget's language placing a moratorium on using test scores to deny teachers tenure: I'm wondering how much of the ire directed at the legislature and the calumny aimed at NYSUT (the state teachers union affiliate) is about the process of how this happened—i.e., without the "right" people in control or at the table.

I suspect the substance of the language is all about the waiting game going on with the end of Michael Bloomberg's second term as New York mayor. The use of value-added measures as the sole or a primary tenure criterion is now blocked until after Bloomberg is out of office (and after Joel Klein is also likely to be gone as schools chancellor). Whatever decisions are taken after the moratorium ends will be taken by other people, in other political circumstances.

And it's that fact that makes me wonder about the undiscussed process issue. For the last seven and a half years, plenty of players were ignored in education policymaking. That's why the legislature approved mayoral control: to remove large bunches of stakeholders from the decision-making, in hopes that putting power in the hands of one person (Mayor Bloomberg) would aid significant reform. The political regime that followed that decision is something I'll leave to others to describe (and I suspect it would make a great dissertation for someone in the New York area), but the whole point of mayoral control was to remove people from the policymaking process.

So what happened in Albany? According to the critics of the decision who blamed NYSUT, the teachers union used every lobbying trick at their disposal to hide this provision in the budget while it was being drafted/finalized, while others (Bloomberg and allies) were left out of the process. The tone used by DFER head Joe Williams is one of anger and surprise, a "we was robbed" attitude. One informal term for being robbed and beaten up in the process is "being rolled," and that's much the impression I get from the critics of the language, especially the New York Daily News's referring to Albany as in the midst of a "legislative crime wave." No one likes to be rolled politically, but the irony here is that many of those who disapprove of being rolled in Albany haven't said boo about others' being rolled in NYC.

April 9, 2008

There it ain't -- a rap on The Quick and the Ed's knuckles

In The Quick and the Ed today, Kevin Carey boldly overclaims:

The Times is reporting that, at the behest of the teachers unions, last-minute language was snuck into the New York State budget providing that "teacher[s] shall not be granted or denied tenure based on student performance data." There's really not much one can add to that; it's hard to imagine a more unambiguous declaration of the union's total disregard for student learning when its members' jobs are at stake.

I suppose there really isn't much to add except that the Times article clearly states that the provision in question is not a ban but a two-year moratorium. It's hard to imagine a more unambiguous declaration of the union's caution about buying into rash schemes, and it puzzles me why Carey would make such an obvious omission in a way that undercuts his argument. See Eduwonkette for more links.

April 3, 2008

A dozen questions for an official graduation rate

When the OMB clears the draft regs on counting dropouts, we can expect another wave of stories on graduation rates and what they all mean. Sharp reporters and other observers will ask the following questions of the draft regs:

  1. Does the definition of graduation include or exclude non-standard completion categories such as GEDs and "certificates of completion"?
  2. How does the definition of graduation handle students with disabilities with a modified curriculum (that is, with an emphasis on functional rather than academic goals)?
  3. Is the mandatory measure a longitudinal statistic such as the NGA compact or a synthetic measure such as Chris Swanson's Cumulative Promotion Index? (I will assume until proven wrong that it is a longitudinal measure.)
  4. Regardless of the measure proposed, how many states have data systems that can produce the statistics required?
  5. How does the measure address transfers, homeschooling, migration, and mortality?
  6. For the adjustments proposed for transfers, homeschooling, migration, and mortality, are there any requirements that states audit the corresponding codes in their data systems?
  7. How does the proposed measure handle grade retention (e.g., multiple years in ninth grade)?
  8. Does the proposed measure forbid a state from using the Florida tactic of calling a dropout a transfer if the dropout immediately enrolled in a GED program?
  9. How does the proposed measure handle students who graduate in five years?
  10. Do the proposed regs require that school districts and schools must meet benchmarks in graduation in the same way that they must meet benchmarks with % 'proficient'?
  11. If there are such required benchmarks, is there any supporting research to suggest that the status or improvement benchmarks are realistic?
  12. In crafting the draft regs, did the Department of Education consult with more than two of the researchers recognized to have published in the relevant area, such as Chris Swanson, Rob Warren, Melissa Roderick, Russell Rumberger, Bob Hauser, Michelle Fine, or Gary Orfield? I'm an historian, and we're generally trotted out as mantel decorations for such affairs, if at all, but there are plenty of solid researchers in the area who could be consulted. And if you're a reporter, you need to line up a few of those folks to be ready to respond to draft regs.
I'm exhausted from a third straight fragmented day, looking forward to a fourth one... but I suspect the above set of questions covers much of the ground on the anticipated draft regs defining an official graduation rate.
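For readers unfamiliar with the synthetic measure named in question 3, here is a minimal sketch of how Swanson's Cumulative Promotion Index works as I understand it: chain the grade-to-grade promotion ratios from two adjacent years with the diploma-to-senior ratio, treating them as a synthetic cohort. All enrollment and diploma counts below are invented for illustration.

```python
# Hypothetical sketch of a Cumulative Promotion Index (CPI) calculation.
# All numbers are invented; this is an illustration, not official methodology.

def cumulative_promotion_index(enroll_t, enroll_t1, diplomas_t):
    """Estimate a graduation rate from two adjacent years of fall enrollment.

    enroll_t, enroll_t1: dicts mapping grade (9-12) to enrollment in
    year t and year t+1; diplomas_t: diplomas awarded in year t.
    """
    ratio_9_10 = enroll_t1[10] / enroll_t[9]     # promotion, grade 9 to 10
    ratio_10_11 = enroll_t1[11] / enroll_t[10]   # promotion, grade 10 to 11
    ratio_11_12 = enroll_t1[12] / enroll_t[11]   # promotion, grade 11 to 12
    ratio_12_grad = diplomas_t / enroll_t[12]    # seniors who earn diplomas
    return ratio_9_10 * ratio_10_11 * ratio_11_12 * ratio_12_grad

enroll_2006 = {9: 1000, 10: 850, 11: 780, 12: 720}
enroll_2007 = {9: 1020, 10: 880, 11: 760, 12: 700}
print(round(cumulative_promotion_index(enroll_2006, enroll_2007, 650), 3))
```

The appeal of a synthetic measure is that it needs only two years of cross-sectional enrollment data rather than a student-level longitudinal system, which is exactly why question 4 above (how many states can actually track individual students?) matters.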

April 1, 2008

Gradu[r]ated

So U.S. Secretary of Education Margaret Spellings Announces Department Will Move to a Uniform Graduation Rate, Require Disaggregation of Data (the true title of the press release today announcing imminent-but-not-published draft regs defining a graduation rate and only a few words away from the type of book title that would cure almost any insomnia). And George Miller huffs some that it wasn't bipartisan (hat tip to David Hoff on the Miller statement). So what's the buzz about?

  1. Spellings is channeling Adlai Stevenson's approach to governance and proudly announcing bold action on issues that are almost consensual and would happen without her intervention.
  2. Especially for this particular issue, the devil is in the details. Florida has a longitudinal graduation measure, but that doesn't mean it's accurate. If the regulatory language released in draft form would allow Florida to keep doing what it's doing officially, you won't see much in the form of transparency (and at least with two issues, you may see things get worse).
  3. Spellings is hoping the gravitas and charm of Colin Powell rubs off. Admittedly, Powell hasn't (yet) been on NPR's Wait, wait, ...

Maybe this is more evidence that Spellings will run for elected office in Texas and claim that she created growth measures, differentiated consequences, and airtight graduation rates. At least she's not claiming to have invented the Internet...

March 19, 2008

"Differentiated accountability"

Alexander Russo links to news coverage of the Margaret Spellings announcement yesterday that maybe not all AYP failures are the same, and other bloggers have weighed in as well.

Spellings went to growth pilots, waivers (or turning the other cheek) to allow tutoring before choice, and now differing judgments on failure to meet AYP after others talked about the ideas for years. I think Spellings is just channeling Adlai Stevenson, who once quipped that leadership is seeing where the crowd is heading and getting in front of it.

(Does anyone know the exact wording or source for that?)

Florida ed policy and politics

The legislative session is in full swing (or a more colorful noun), and a bunch of things are in the air either in Tallahassee or elsewhere:

1. Both houses of the state legislature are considering bills to change the role of state testing (FCAT), either by adding other information to the labeling of high schools (the senate's approach) or by a compromise bill that discourages test-prep and sets more specific grade-level standards (the proposal in the house).

2. The ACLU sues Palm Beach County for its low high school graduation rate. Superintendent Art Johnson suggests it's the state's fault for not providing enough money (scroll down for "But the superintendent..."). (Disclosure: A 2006 paper of mine is mentioned in both stories.)

3. Something that wasn't covered in my local papers in January: Holmes County administrators have banned students from displaying anything related to gay pride. The ACLU of Florida sued. I suspect this one's a no-brainer in a bench trial: in the majority opinion in Morse v. Frederick, Chief Justice Roberts made a distinction between what he thought of as the political speech of Tinker and the display of "Bong Hits 4 Jesus."

The only interest the Court discerned underlying the school's actions [in Tinker] was the "mere desire to avoid the discomfort and unpleasantness that always accompany an unpopular viewpoint," or "an urgent wish to avoid the controversy which might result from the expression." Tinker, 393 U. S., at 509, 510. That interest was not enough to justify banning "a silent, passive expression of opinion, unaccompanied by any disorder or disturbance." Id., at 508.

I think that reasoning clearly applies in this case.

March 11, 2008

Defending Effective Accountability and Assessment Practices

Saturday, March 29, 2008
10:45-12:15
Hilton Washington

Defending Effective Accountability and Assessment Practices is the title of the session I'm a participant in at the NEA/AFT Higher Education Joint Conference.

From what I understand, the tentatively-slated participants include staff members of two institutional associations as well as us faculty. As soon as I have permission to post those names, I'll do that.

February 28, 2008

Is the blind spot on higher-ed accountability that big?

In all the kerfuffle over the senior theses of Hillary Clinton and Michelle Obama, I hope I am not the only person asking the other question that I think is obvious and to the point: What do the theses tell us about the state of undergraduate education for Princeton and Wellesley students at the time?

Similarly, all those who huff and puff about higher-ed accountability are ignoring a huge source of information on the quality of graduate education: dissertations. Want to know what the expectations of students are really like? Go read what students create, when they know it's going in the library, going to be microfilmed, or going to be available electronically to the world.

February 25, 2008

NCLB and where we sit

In my undergraduate social foundations class, I spend some time explaining the politics of accountability. For the last few years, a critical mass of students (either a majority or a vocal minority) have consistently opposed accountability, taking on the mantle of professionalism, and it's my job to rattle their cages and make them see things using at least one other lens.

I usually explain things in words something like the following:

Views of accountability depend dramatically on where you are. At the classroom level, teachers trust what they do and would like to trust parents but aren't exactly sure. Parents may want to trust teachers, if their children's experiences have generally been decent, or may be entirely untrusting if not. Principals generally trust their own judgment and would like to trust teachers but have a supervisory responsibility (and the level of supervision they exercise will depend rather dramatically on a variety of factors).

Once you get above the level of the school, each level tends to want to impose some accountability on the level below it. For NCLB purposes, the key issue is the state/feds split: in a number of states, officials in the state capitol don't trust local districts and feel that it is their responsibility to regulate the districts, while a number of federal officials are skeptical that states will do the right thing unless there is a federal level of accountability.

NCLB forced states to define a variety of measures and set targets for those measures. At the local level, the state plan is often viewed as onerous, unreasonable, and inflexible. But the state plans are inherently compromises, and so various parties in Washington have looked at the state plans with skepticism.

For example, let's take a look at graduation, which states often defined to mean one minus the proportion of high school students identified as dropouts. That too-easily-falsifiable "dropout rate" is very low in many places, for reasons largely unrelated to the actual proportion of teenagers who graduate from high school, and the official graduation rate if defined as the complement will be wildly inflated.
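The arithmetic behind that inflation is easy to see with a toy calculation (all figures hypothetical): an annual "dropout rate" counts only students coded as dropouts in a single year, while a cohort faces that risk over four years of high school, and many leavers are coded as transfers rather than dropouts in the first place.

```python
# Toy numbers showing why "1 - dropout rate" overstates graduation.
# All figures here are hypothetical.

annual_dropout_rate = 0.03                        # 3% coded as dropouts each year
official_grad_rate = 1 - annual_dropout_rate      # the complement definition: 97%

# Following one cohort through four years of the same annual risk:
cohort_retention = (1 - annual_dropout_rate) ** 4   # compounding shrinks the cohort

print(f"official: {official_grad_rate:.1%}, cohort: {cohort_retention:.1%}")
# prints: official: 97.0%, cohort: 88.5%
```

And that gap understates the problem, since the official annual rate depends entirely on how exiting students are coded.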

To local residents and some educators, it looks like the state is hiding a sizable dropout rate, which many view as a consequence of out-of-control accountability systems. That's the type of local or educator-centered view many of you have described.

But you also need to look at it from a federal perspective, from those who see state plans and state commitments with enormous skepticism. To them, what would be the logical conclusion drawn about such graduation rates?

Linda McNeil et al.'s recent article on high-stakes accountability in Texas and Charles Barone's entry today, The Games States Play: Graduation Rates, are Exhibits A and B the next time I have this discussion.

Wrong incentive structure for community colleges/technical training

George R. Boggs and Marlene B. Seltzer describe Washington State's incentive structure designed to encourage community colleges to push completion:

Washington's community and technical colleges will receive extra money for students who earn their first 15 and first 30 college credits, earn their first 5 credits of college-level math, pass a pre-college writing or math course, make significant gains in certain basic skills tests, earn a degree or complete a certificate. Colleges also will be rewarded for students who earn a GED through their programs.

On the one hand, focusing on proximate measures on the way to degrees makes enormous sense, at least if we trust Cliff Adelman's work. On the other hand, I worry that such an incentives structure will affect standards in institutions with weak faculty governance and protection of academic freedom: "We need these students to pass these credits, or we lose money."

Better incentive structure: if public funding plus current tuition is sufficient for an institution's operating expenses (a rather big if, as I'm aware in Florida), keep hands off the potentially perverse incentives inside the curriculum and give students an incentive to do well by keeping tuition stable as long as they make steady progress towards degrees. In other words, tuition stability (or a cap on rising tuition) is guaranteed if students are doing well.
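As a minimal sketch of that tuition-stability rule: the progress threshold (24 credits per year) and the cap on annual increases (6%) below are invented for illustration; the entry proposes the mechanism, not these numbers.

```python
# A hypothetical sketch of a tuition-stability incentive. The threshold
# and cap values are invented; only the mechanism comes from the entry.

def next_year_tuition(current_tuition, credits_earned,
                      progress_threshold=24, annual_increase_cap=0.06):
    """Freeze tuition for students making steady progress toward a degree;
    otherwise allow at most the capped annual increase."""
    if credits_earned >= progress_threshold:
        return current_tuition                      # steady progress: no increase
    return current_tuition * (1 + annual_increase_cap)

print(next_year_tuition(4000, 27))   # on track: tuition frozen
print(next_year_tuition(4000, 15))   # behind pace: capped increase applies
```

The design point is that the incentive lands on the student's progress, not on instructors' grading, which sidesteps the pressure on course standards described above.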

The institutional incentives then can be geared towards summary graduation measures, to some extent. Florida's universities are having their first bite of outcome incentives this year, but the budget cut is swamping the effects of it. (Here's the motivational undermining: You don't starve people and then tell them they can earn a little bit of pin money if they work harder. At this point, at least for the universities, it's a matter of looking to the future and probably a system negotiation about formulae.)

There's a lot more to be said about higher-ed accountability, including Gerald Graff's commentary on assessment and Erin O'Connor's response, but I have to chair a proposal defense in 10 minutes...

Update (2/27): Kevin Carey responds:

I'd like to propose that people be more judicious and precise in their use of the term "perverse incentives" by not applying it to any incentive that could theoretically cause someone to act in bad faith.

I'm not going to split hairs by pointing out the adverb potentially up in the original entry (okay, originally potential and then changed to potentially); if I understand it correctly, Carey's argument is that we should not say something is a perverse incentive unless we can really point to the evidence of strong corrupting influences. In this case, my argument is about the pressures on instructors, not students (something different from what Carey inferred). Are colleges susceptible to such corruption when institutional stakes are tied to individual course grades? The scandals each year tied to athletics (e.g., FSU and tutors who helped athletes cheat) tell me the answer is yes.

Teacher performance-pay distributions in Tampa

Yesterday and today, the St. Petersburg Times has been covering the distribution of performance pay among different schools in Hillsborough County (one of the few in Florida where the union and school board agreed to the state's merit-pay provisions). See the main story from yesterday and also a tale of two teachers, a basic Q&A sidebar, and then play around with school-level statistics.

What the Times has documented is that teachers were more likely to receive the bonuses in schools where students are more likely to be from well-off families. The district says they'll tinker with the formula for next year. While I love David Tyack and Larry Cuban's book with tinkering in the title, I'm skeptical that tinkering will work in this case.

February 14, 2008

Helen Ladd's common-sense approach

I'm biased because I've made the same recommendations: In a late January Ed Week commentary I should have pointed to earlier, the Duke University professor says we should be Rethinking the Way We Hold Schools Accountable.

February 12, 2008

On excuses for unintended consequences

Oh, my: I head out of town for a week, and when I get back there's a trail of tearful blog entries on curriculum narrowing.

While there is some question about the extent of curriculum narrowing that followed NCLB (see: no causal language there), the basic argument in these entries is over whether NCLB creates incentives to narrow the curriculum and the extent to which the variation in curriculum narrowing shows that schools don't have to narrow the curriculum to do well on tests.

(...except for Eduwonk's red herring about low bars, which essentially is that because states can set relatively low thresholds for proficiency, that eliminates the incentive to narrow curriculum, stuff test-prep into the kids up the wazoo, etc. No economist or behaviorist would accept an argument of "hey, the marginal change required is low, so that doesn't create an incentive for changed behavior." Either would reply that's a question that should be left to evidence, not speculation. I'm not an economist or a behaviorist, but I don't buy the hand-waving about low bars, either. And, as 'kette points out, isn't NCLB supposed to change behavior? You can't simultaneously say NCLB is changing some behavior you like without acknowledging that it has the potential to provoke behavior we don't like.)

If we agree that thousands of schools are making poor decisions in response to the pressure of test-based accountability, then the operative question is, How do we help schools and educators make better decisions? Charles Barone and others suggest we hold up exemplars and say, "Follow them." That's the effective-schools-literature strategy, and we've paddled that boat since the late 1970s without getting where we want, so we know at least that it's not enough. Robert Pondiscio and other core-knowledge or other-curriculum standards folks would say, "Build the curriculum, and they will follow." That's a step towards regulating input more than outcomes, which I suspect will not be politically viable, but I may be wrong. George Miller, Ted Kennedy, and others propose to increase the number of measures used, with legislative language that assumes that AYP can be finely tuned. I don't buy that argument: test-based accountability is a cudgel, not a scalpel. My instinct is to say, Watch the decision-making, but that's because I distrust black-box handwaving, and I know it's hard to operationalize a procedural standard within a test-prep culture.

The meta-political question is deeper and one that I think most people understand in spots if not generally: you either own reform or you lose the reformer label. If you do not acknowledge problems through implementation and own them, you give up a huge chunk of credibility. Whether I agree with them on an issue or not, I give credit to Ed Trust for occasionally identifying problems with implementation and deciding to own the issue (e.g., growth models). They haven't done that with 100%-proficiency goals or test-prep (yet), but it's a healthy dynamic where they have done it. You could say the same with Fordham and curriculum-narrowing (or Diane Ravitch with the same issue plus test-prep). Or Miller and Kennedy and 100% proficiency (though their concrete ideas on those points are Rube-Goldbergesque).

I haven't seen that nearly as much with Barone, Eduwonk, or some others, and the failure to own problems with NCLB ignores the fundamental fact of post-NCLB politics: Parents of public-school children are far more skeptical of test-based accountability than they were 5 years ago. Own the problems or lose control.

February 11, 2008

Probably not what Tallahassee or Beltway policy wonks intended

So some Florida teachers were fired because they were abusing students, letting a classroom get out of hand, not being prepared ... but the state has forced the reinstatement of the teachers because the districts did not rely on test scores to make the personnel decisions.

Can someone explain to me how this makes sense?

February 7, 2008

One more follow-up on Kennedy/Miller endorsement and NCLB politics

Just one more datum on the speculation about what the Kennedy and Miller endorsements of Obama mean for NCLB (little, I've said before). Let's suppose for a moment that all this is true, and that the stars are lining up behind Obama from the Democratic Forces for NCLB. If you believe that and the bundling hypothesis about donations to campaigns, and if you know where Bill Gates stands, where do you think the majority of donations from Microsoft employees would be going?

Wrong: Clinton.

February 3, 2008

Matt Miller's fallacy

I must have had a busy month to wait several weeks before correcting the record on Matt Miller's Atlantic article, First, Kill All the School Boards. The real problem, he says, is all of those selfish, parochial school board members and the unions who manipulate them. He paints a romantic picture of Horace Mann, repeats both the truthful and the hoary cliches of the past quarter-century of school reform, and calls for nationalizing education.

To put it briefly, Miller falls into the standard "let's fix the governance structure" fallacy of a certain chunk of education reform wannabes. I just don't buy it. If school-board parochialism were the main problem, then we'd find Hawai'i's schools outdoing the rest of the country because of its unitary system. Or we'd find Southern states outdoing the North because many of them have mostly county systems, in contrast to Northern and Western states with tiny, fragmentary districts. Or New York City's system would be perfect today because of the elimination of the elected school boards through mayoral control. I'm sure that there are governance changes that would matter, but this one? It's bold, provocative, simple, and not very helpful.

Miller refers to a comparative study of education policymaking by economist Ludger Woessmann, and I need to track that down, but I suspect it will support Miller's argument less than he thinks, at least from other writings of Woessmann that I've come across. We'll see.  In the meantime, here's a bit of cold water on the everyone-has-national-standards argument, taken from Accountability Frankenstein:

[N]ot all industrialized countries have a national curriculum framework: Spain and Hungary have a common core, but regions have the authority to adjust the core curriculum or add to it. Italy's and Argentina's curriculum planning has become less centralized in the past decade. Australia, Canada, Germany, and Switzerland have federal systems, like that in the U.S., where there is no central curriculum authority (Chisolm, 2005; Gvirtz & Beech, 2004; Jansen, 1999; O'Donnell, 2001). Even among countries with a centralized curriculum, the focus varies widely (Holmes & McLean, 1992). The United States is not out of step with the world, because there is no international consensus on the appropriate control of curriculum and expectations (or standards), let alone the content.

February 2, 2008

Bill Clinton's Ego, redux

I think Leo Casey is wrong about the politics of Bill Clinton's slamming Ted Kennedy. Since I agree with Leo on a large swath of education policy, including the effects of NCLB, I should explain a bit. For the most part, Hillary Clinton and Barack Obama share significant rhetoric on education and quite a bit of fuzziness on the details. They've both said NCLB has serious flaws, but it hasn't been a focus of their campaigns. That's not much of a surprise, because, despite the efforts of Ed in '08, education is not a huge issue in the campaign. (Bill Gates, get in line behind the folks who want a presidential debate around science.)

Over the past few weeks, both George Miller and Ted Kennedy have endorsed Obama. Has Obama said he agrees with Miller and Kennedy about NCLB? No, not to my knowledge. Maybe he did a backroom deal with both of them about reauthorization, but I've already explained why I think that's not the likely reason for both endorsements.

After being chastised for going after Obama directly and crudely in South Carolina, Bill Clinton did his best to undermine the endorsement of a liberal icon, by linking Kennedy to Bush:

No Child Left Behind was supported by George Bush and Senator Ted Kennedy and everybody in between.

Let me make this clear: I don't think Bill Clinton gives a hoot about NCLB right now, but if he can use it to smear Kennedy and undermine that endorsement, he will. To that end, I think Charles Barone's line-by-line response is tangential. The only phrase that Bill Clinton wanted to get out was "George Bush and Senator Ted Kennedy." Yeah, he can spin a policy tale out of that, but that's not the point.

I know that Hillary Clinton freely acknowledges that she cannot carry a tune in a bucket, but in this case, it's Bill Clinton who's tone-deaf.

February 1, 2008

At least Timothy Leary chose to drop out...

I think I understand Leary's choices, or at least the temptation: It's the end of two very tiring days, when I had a chance to talk for a few hours with one of the folks who tore down Florida's old Pork Chop Gang. Short story: an undergraduate I've been mentoring for a few semesters had an internship with the law firm of this Florida political hero, and after e-mailing back and forth, he needed some questions answered about the background of his senior thesis. So he proposed a joint meeting, first scheduled at the law firm and then moved to my office. I was expecting it to go about 90 minutes. It lasted 150 minutes instead. So we got off on various tangents, since he had the personal experience and I had the history, but the student said it was worth it. I had several meetings today (some planned, some impromptu, some deferred). Lots of things delayed, which is my life these days.

But even if deferred for a few days, the new English-language article of EPAA is out: Avoidable Losses: High-Stakes Accountability and the Dropout Crisis. Its authors combined interview work with following students in Texas as they were left behind in 9th grade and then dropped out. This is very difficult work to do, and the findings are provocative. Two stand out for me: that principals know they are choosing between education and satisfying the test-score gods, and reluctantly choose to satisfy the gods; and that to students, there is no distinction between accountability and all the practices that alienate many of them from high school. To the students in this Texas school district in the late 1990s and early 2000s, there was a single massive bureaucracy that held them back, denied them opportunities in part to game the system, and never told them that their education was being sacrificed in the name of pressure whose putative goal was to ensure that they were not denied educational opportunities.

Whether you agree with the article's authors or not, I suspect it will be discussed vigorously, which is all to the good. A few years after Jennifer Booher-Jennings' article on triage in Texas, one of the models for NCLB continues to be a focus of criticism and debate.

(No, I've never taken illegal drugs, nor have I ever been tempted to, in reality. But I live on antihistamines when I have a cold...)

Evaluating college teaching

Since my energy is now sapped, I'll address Eduwonkette's four questions from yesterday:

1) How should learning be evaluated in college?

There are two separate questions here (what did individual students learn? and what did groups of students learn?), though I think Eduwonkette is asking more about personnel evaluation. Both can be evaluated using similar questions and data (including student work!), as long as you acknowledge that classroom dynamics can change things quite a bit. Usually, the first question is tied to students' individual grades, and the second is water-cooler (or coffee-urn) talk among colleagues: how was your class in HVN 101 this semester, better than HLL 666 last semester? Faculty rarely get to ask the second question in more systematic ways.

2) Are course evaluations a fair and comprehensive measure of college teaching?

Eduwonkette is either asking a trick question or conflating the end-of-course surveys that students take with either course evaluation or personnel evaluation. Students are evaluating their own experiences throughout a term, so the survey is more a chance for them to express conclusions they have already reached, at least if the survey items are tangentially related to their concerns. Evaluating a course should involve student feedback but also something about what students learned, not just what they felt or expressed. And evaluating faculty as employees involves additional layers: their contributions to a course, other information and context often unknown to students, let alone research or service assignments.

3) What should universities do with student course evaluations?

See above on my desire to ban evaluation as the term used for student surveys. But to answer the substantive question: the surveys should be written with input from faculty, include an item on how much effort the student expended on the course (for a few reasons), be available to students (except for surveys of graduate-student instructors, who are students as well as employees and thus should have some privacy protections), and be part of program and personnel evaluations.

4) What are the potential risks/benefits to students and profs of making them public?

When I was a student, I found the comments far more telling than the numbers. But I suspect that this doesn't have to be theoretical or based on anecdote: there have to be institutions where the survey responses are public, and where one could study the consequences. See above on the graduate-student privacy concerns I have.

January 31, 2008

Higher education and the wrong battle

At Education Sector, Kevin Carey (a 4 out of 5 in my book) has an institutionalist lens that is sometimes incisive (4.5 out of 5), sometimes frustrating (2 of 5), and occasionally both. Such as his complaint yesterday about the "Higher Ed Lobby" (my quotation marks, which are probably 1 out of 5 on style). Here's the gist of his complaint about accreditation agency politics:

But accreditation does a terrible job of creating or providing any kind of public, comparable information about institution-level academic quality.

I'd rate that comment as a 3 out of 5, and the post in general a 2.5 (in comparison with Eduwonkette, whose posts are averaging about 4.87 in the last few months). There are multiple arguments layered into that one statement, but let me focus on two:

  • Lax accreditation has played a significant role in letting the quality of (undergraduate) instruction be lower than it could be.
  • What we need to improve undergraduate instruction is predigested comparisons of quality between institutions.

Thus, yesterday's statement of principles by the Association of American Colleges and Universities and the Council for Higher Education Accreditation is unlikely to satisfy Carey's concerns because it resists the notion that creating quantitative comparisons of student outcomes is a necessary part of the accreditation process. Delving into the broader issue at length requires more energy and time than I have this morning, but I'll put out a few counterclaims:

  • As long as millions of parents and students perceive that they are buying a degree from a college, there will be an inevitable tension between credentialism and the "use value" of a college education. In this environment, accreditation has to answer the face-value "does this college provide an opportunity to learn, and is the degree legitimate?" question.
  • The most savvy students and parents want more than U.S. News rankings, but they're not going to give a hoot about what irks Carey and me about the rankings. Instead, savvy students and parents want to know what happens in the classroom, the lab, the studio, and the field. A case in point: last year, one teen acquaintance of mine was looking for colleges with performing arts programs. In the end, she was accepted to two schools with outstanding reputations, one with local connections that are unbeatable in this subfield, and the other that's in another region, perfectly reputable, but without those networking opportunities. She had the opportunity for one last visit to each place, and what made the difference was watching students rehearse and perform. There was no faux objectivity. My young friend watched students work and decided that the less-networked place had the better education because there was a pop to the work in one place that just didn't exist in the other.

My friend and her parents (whom I've known for years) cared about comparisons, but not predigested ones. They made their own ranking. Kevin Carey, Charles Miller, and others may want to see predigested measures, but they'll be swimming upstream against credentialism, against the needs of students and families who really do want information about educational quality, and against the professional judgment of faculty. Framing the issue as one of the White Hats against the Higher Ed Lobby does everyone a disservice.

One more thing: Last week I tried an experiment and allowed readers to rate my posts on a 1-5 scale. I tried priming the pump by rating a few of them (no, not all 5's), but no one else participated, and I pulled that option. I guess maybe some people are interested in ratings, but not my blog's readers.

January 30, 2008

Chemistry or test-prep?

In Palm Beach County, high schools are ditching real science for FCAT prep. And I thought the election results were the most depressing news of the morning!

January 29, 2008

Alfie Kohn and Diane Ravitch agree!

This week, the zeitgeist in education news is paying students for test scores, as in the Baltimore Sun article yesterday or the USA Today story, but so-called incentive programs have been in the news before and criticized before: See criticisms of Pizza Hut's "Book It" program or Barry Schwartz's column last July, which scored New York City's initiative to pay students for test scores. While such programs sound good in theory (reward kids for doing well!), they rub a number of people the wrong way, including Elena Silva of Education Sector, Diane Ravitch, Eduwonkette, and even conservative Liam Julian, who criticized such programs last year (though I'm linking to my blog entry because the original column has suffered linkrot). And virtually the whole education world knows about Alfie Kohn's opposition to tangible incentives. So what could possibly bring folks of very different stripes together? After all, as Robert Pondiscio points out, isn't giving one incentive the same as giving any incentive, and all we're doing is haggling over the price?

First, a bit of disillusionment: while Kohn and Ravitch both talk about intrinsic rewards, I suspect only one of them will agree with the second half of the reasoning below.

There are two problems with paying students cash for achievement. One is that these programs are not finely calibrated. Whether they reward status achievement (straight As or a certain score on standardized tests) or some sort of growth/effort, there are going to be some rewarded students who did not work hard for the reward and other unrewarded students who probably deserve it. Two consequences flow from that fact. First, students will perceive it as unfair, once the money is doled out. Well, maybe we should be teaching teenagers that "merit pay" isn't always distributed on an equitable basis (see Robert Dreeben's work), but I suspect a program that doesn't pass the adolescent sniff test for fairness will alienate rather than motivate students, with the consequences magnified because of the money stakes. In addition to the fairness issue, there is the research question of whether rewarding students' focused effort and improvement is better or worse than rewarding status. Most program administrators probably make decisions based on seat-of-the-pants judgments rather than the research.
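The calibration problem can be sketched numerically. This is a toy simulation with invented numbers (the effort scale, the noise level, and the cutoff are all assumptions, not drawn from any real program): if a program rewards students on a noisy observed score, some deserving students are inevitably missed while some of the rewarded did not earn it.

```python
import random

random.seed(0)

# Hypothetical illustration (all numbers invented): each student has a true
# effort/achievement level, but any reward program observes it only through
# a noisy test score. Reward the top observed scorers and misclassification
# follows directly from the measurement error.
N = 10_000
students = []
for _ in range(N):
    true_level = random.gauss(0, 1)                 # what we'd like to reward
    observed = true_level + random.gauss(0, 0.5)    # score with measurement error
    students.append((true_level, observed))

cutoff = 1.0  # reward anyone whose *observed* score clears this bar
rewarded = {i for i, (_, obs) in enumerate(students) if obs >= cutoff}
deserving = {i for i, (tru, _) in enumerate(students) if tru >= cutoff}

missed = len(deserving - rewarded) / len(deserving)
undeserved = len(rewarded - deserving) / len(rewarded)
print(f"deserving students who go unrewarded: {missed:.0%}")
print(f"rewarded students below the bar on true level: {undeserved:.0%}")
```

Both fractions come out well above zero at any plausible noise level, which is the adolescent sniff test in miniature: the unfairness is built into the measurement, not the students' perceptions.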

There is a second problem with paying students cash for achievement, and that is the question of the reward itself: will it promote continued effort, or will it be tangential to effort? A case in point from my own experience as a parent, and that of many other parents: you go to the library with your elementary-school child and borrow some books that the child chooses. You all return home. The child reads the book. What is the reward for the child's reading the book? My wife and I didn't think about it at the time in this way, but what our children chose was to return to the library to get more books. The reward was another library trip, which promoted reading. Many math teachers have bonus questions on tests to keep some occupied when they finish the main questions earlier than other students. But the bonus questions also reward completing the test by giving the students more opportunity to challenge themselves. Students of moderate means who work their tail off in high school should be rewarded by an opportunity to attend college at reduced cost (a scholarship), which promotes learning. And so forth.

From this, I'd argue that the more fundamental problem with rewarding achievement with cash is that such rewards do not promote additional learning. While Roland Fryer (the designer of NYC's incentives program) is obviously a very smart new scholar, he is thinking of the rewards from a fairly narrow perspective, assuming that all incentives are fungible and ignoring the post-award uses of rewards. We know that Pizza Hut is engaged in marketing rather than a promotion of reading because it rewards kids with pizza instead of with books. And we'll see appropriate incentives when their use is intimately tied to additional effort.

January 28, 2008

Party trumps policy

Last night, Leo Casey hypothesized on Edwize that Kennedy's endorsement of Obama was related to NCLB. Like Scott Elliott (a reporter with the Dayton Daily News), I'm skeptical. While George Miller and Ted Kennedy have both endorsed Obama and are major figures in NCLB politics, they are also stalwarts in the Democratic caucuses on each side of Capitol Hill, and a significant obligation of such folks is to defend the Congressional majority. The defense of that majority will depend on how well Democratic candidates perform in historically Republican states. As Matthew Yglesias has pointed out, within the Democratic party, Obama is convincing officeholders in Republican-dominated states that he can not only win the White House but help Democratic candidates for lower offices.

That potential contrasts with one of the signal legacies of the (Bill) Clinton administration, a cannibalization of the party by the top of the ticket. While Bill Clinton's fortunes thrived, the Democratic party's did not. I don't think Hillary Clinton is nearly as egotistical as her husband, but downticket potential is probably more important to endorsements than the few inches that separate Clinton and Obama on No Child Left Behind.

January 23, 2008

Value-added, with botulism

Before Kevin Carey proclaims that value-added [method] comes of age, he might want to read the real true facts behind the New York City teacher value-added project, wherein we learn that the city's great statistical experts thought three children were enough of a sample on which to base a teacher evaluation, or maybe the ethical problems with the NYC project, or maybe even my comments on value-added or growth measures in Accountability Frankenstein.

No matter what else you can say about growth measures, NYC's project is about the worst example I can imagine to use if one wanted to push the approach.
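The sample-size complaint is easy to quantify. A back-of-the-envelope sketch (all numbers invented; the point is the square-root law, not the specific scale): if individual student score gains have a standard deviation of about 15 points, the uncertainty in a "teacher effect" estimated from only three students swamps most plausible differences between teachers.

```python
import math

# Invented scale for illustration: individual student gains with a
# standard deviation of 15 points. The standard error of a teacher's
# mean gain shrinks only with the square root of the number of students.
student_sd = 15.0

for n in (3, 10, 30, 100):
    se = student_sd / math.sqrt(n)   # standard error of the mean
    half_width = 1.96 * se           # approximate 95% confidence half-width
    print(f"n={n:3d}: estimate pinned down only to ±{half_width:.1f} points")
```

With three students the half-width is about ±17 points on this scale, which is why basing a teacher evaluation on that sample is statistically indefensible no matter how sophisticated the model around it.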

Update I: Carey responds in his post:

It might [have methodological problems, in NYC], I don't know, I guess we'll find out. But, per above, methodological issues can be worked out, and anyone who thinks the hysterical reaction to the value-added initiative stems from a deep and abiding concern for statistical integrity is willfully not paying attention.

The claim that "methodological issues can be worked out" is evidence that Carey hasn't read the writings of professional researchers who point out that growth models are no holy grail. I am one of those who have written about the difficulties inherent in growth models, but certainly not the only one.

And my response isn't hysterical; it's simply disgusted with the latest shenanigans from Tweed. The title comes from a wordplay (when food "comes of age," you don't really want it).

Update II: Best comment in response to Eduwonkette: skoolboy, who writes, "I'd characterize the New York City Department of Education as loving data but hating research."

January 20, 2008

Where does effective reform come from?

Thursday, Andy Rotherham challenged historians of education:

[H]ere's a question for the historians that might help explain why education does careen from one thing to the next. What are the most compelling examples of where the education system has reformed itself in ways that have demonstrably benefited students? Haven't most of the reforms, for good and ill, come from influences on the outside, whether higher ed leaders, business, etc...?

I'm not sure Rotherham was responding to Diane Ravitch's plaintive query fairly (I read Ravitch's argument to be that the content of Michael Bloomberg's and Joel Klein's reform ideas is nonsense), but let me answer the question as best I can. As David Tyack and Larry Cuban point out in Tinkering toward Utopia (1995), we sometimes mistake noise for reform. Well, that's not quite their point: they argue in an early chapter that you have to distinguish between cycles of reform rhetoric and institutional trends. We can't look just at the visible reforms, the ones that have someone shouting from the rooftops about them. In other words, the only reforms that might pop up on Rotherham's radar screen would come either from outside reformers or from the louder inside advocates.

But "the most compelling examples of where the education system has reformed itself" might lie precisely in institutional trends that are tough to identify as coming from a specific set of pressures. I would argue that on the whole, elementary schools treat children much better than they did a century ago: only rare beatings (which provoke outright shaming if they become public), much less physical punishment, and a much higher proportion of teachers who understand better ways of motivating kids. That doesn't mean that everyone is perfect, just much better on the whole than teachers from a few generations ago.

One could make a pretty good case that the consistent rise in NAEP math scores in many states is the result of changing practice. As I've argued before, the National Council of Teachers of Mathematics is not perfect, especially in how it communicates ideas, but my guess is that math instruction is slowly shifting, with more use of manipulatives and other varied repertoires in early grades and also in early childhood settings. Again, nothing is perfect, but as a child I never encountered the easy introduction to graphing that my own son had when he was in preschool in the 1990s. (It involved tasting fruits and vegetables, with children in the class putting up an icon of the food when they liked the taste. The result was a vertical bar chart of preferences by food.) I don't think that came from outside schools.

That doesn't exonerate school officials. I've criticized Tyack and Cuban's incrementalist framework, using desegregation as the obvious counter-example. But that history doesn't quite provide an argument in favor of mapping business rhetoric onto schools. Among other things, there's only one city I know where desegregation was supported by the business community: Charlotte. And where were today's advocates of high-stakes accountability in the 1980s and early 1990s, as Presidents Reagan and Bush were appointing federal judges who eventually undermined and reversed the pressures for desegregation? I think only Miller and Kennedy get credit there, and I can think of several who actively tried to undermine desegregation.

I'm not sure that Rotherham's question is even a relevant one: the fact that we can find a few examples of where outside pressure was absolutely appropriate doesn't mean that it's a panacea. Sometimes the "I'm an outsider" and "reform is inevitable" rhetoric trumps informed judgment. If "I'm a professional; trust me" is fallacious, so is "I'm a businessman; trust me."

January 17, 2008

Ranking creates perverse incentives; ranking of lunchtime and liberal-arts colleges, doubly so

Inside Higher Ed has a great article today, Potemkin Rankings, on how Washington and Jefferson College did everything you'd normally think is right to improve how it looks to outsiders and still sank in the U.S. News & World Report rankings. The short story: W&J recruited like crazy to increase the applicant pool and managed to increase selectivity while starting to increase enrollment, hold down the full-price tuition, and still maintain a good faculty-student ratio. Because other liberal-arts colleges increased their endowments and tuition faster, W&J sank in the resources area and thus in the U.S. News ranking.

The problem here is not just with U.S. News. You can find it with almost any system that reduces a complex set of data to a simple ranking. Because the quality of any complex service never varies along a single dimension, there will be inconsistencies in any reductive ranking depending on the relative importance assigned to different factors in the final (reduced) rating. This year, Education Week's Quality Counts report includes a "weight your own factors" feature, where you can re-rate an individual state based on your own idea of how important different elements in the Ed Week database are. Well, not really: it looks like the mix within an individual subscale remains the same in the summary number, even if you can come up with different subscale scores. And there's no way to see how the rankings might change based on different weights. (I guess the Ed Week editors didn't really want people to look too closely at the rankings, or at how robust/fragile they might be.)
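A toy example makes the weight-sensitivity concrete. Everything here is invented (two hypothetical colleges, two factors); the point is that identical data yield opposite rankings under different, equally defensible weightings:

```python
# Two hypothetical colleges scored on two factors (all numbers invented).
colleges = {
    "College A": {"selectivity": 0.9, "resources": 0.4},
    "College B": {"selectivity": 0.6, "resources": 0.8},
}

def rank(weights):
    """Order colleges by a weighted sum of their factor scores."""
    scores = {
        name: sum(weights[k] * v for k, v in factors.items())
        for name, factors in colleges.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Same underlying data, opposite conclusions:
print(rank({"selectivity": 0.7, "resources": 0.3}))  # College A first
print(rank({"selectivity": 0.3, "resources": 0.7}))  # College B first
```

Any single published ranking quietly commits to one of these weightings, which is exactly the choice the Quality Counts feature lets you adjust within subscales but not across them.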

January 8, 2008

Sixth anniversary present for NCLB

So the Sixth Circuit Court of Appeals has revived the 2005 "unfunded mandate" NCLB lawsuit, and here is where things get interesting, because the original complaint is an interesting argument about statutory limits to the power of the purse, tied specifically to NCLB language that lifted mandates that were not paid for. Given the language of the appeals decision, this is going to be a lot more interesting on reargument, and with the current composition of the Supreme Court, I refuse to hazard any prediction about ultimate disposition.

But it won't get to the Supreme Court, because NCLB will be rewritten before it gets that far. Here are the real consequences of the lawsuit: If the plaintiffs win at the lower-court level or if the Sixth Circuit steps in for the plaintiffs in a substantive manner (as opposed to the procedural decision this week), that victory would shift the initiative in reauthorization. On the one hand, those critical of NCLB provisions will be able to be patient, in contrast to supporters of most of the current structure. On the other hand, without the pressure ratcheting up on schools, NCLB critics may not have quite as much organizing energy behind their battle, and that energy may shift to those who support most of the status quo.

January 7, 2008

Ted Kennedy and frames: 51 to go

Last Thursday, I recklessly created a set of predictions for major 2008 education stories and in the top item (on NCLB) wrote,

If I were a senior member of an education committee, I'd work throughout the year to establish some consensus that would hold at least reasonably well no matter what the results of the election.

Lo! and behold! Ted Kennedy has fulfilled my prediction in less than a week with today's Washington Post op-ed column. To be honest, that's only in the first week, but I suspect we'll see plenty of such efforts in the next 51.

December 21, 2007

Guesting on Edwize!

I've gone and committed guest blogging over at the UFT blog Edwize. The gist of the argument is that Joel Klein's pulling a Microsoft-like maneuver with accountability.

And he's the guy who prosecuted Microsoft for antitrust violations.

December 8, 2007

Waiting for the criticism of Winerip

Michael Winerip reports tomorrow on a new ETS report by Paul Barton and Richard Coley, The Family: America's Smallest School. Shades of Moynihan's response to Coleman, anyone? (And does anyone else know the reference for that?)

I expect the blogs next week will be full of criticisms, at least of Winerip's reporting if not the report. It'll be interesting to see if there's some substantive discussion along with the criticism.

Update: Charles Barone was first off the blocks on this. I wish he weren't so consistently sarcastic; it distracts from the analytical points he's making about Winerip and ETS, and those points are important, if not as much of a trump as he implies.

December 7, 2007

Whose values would be valued in a neoliberal education world: Michelle Rhee's or Marc Dean Millot's?

Marc Dean Millot explains why he's a critic of DC Chancellor Michelle Rhee (hat tip), and here's the key paragraph:

What I see in Chancellor Rhee's approach, abetted, permitted or endorsed by Mayor Fenty, is 1) insensitivity and arrogance towards others, combined with 2) a reliance on fear to control staff, and 3) a considerable willingness not to apply analogous performance criteria and public criticism to themselves. Managers cannot be harder and harsher with others than they are on themselves and expect support from their staff, respect from their board, or trust from the public. And managers without all three cannot succeed in a turn-around.

There are three points here. One is the immediate and obvious one: Humiliation and denigration are not great motivators, nor is "making an example of" a significant proportion of the people you work with. I don't know Rhee, but this is not the first time I've seen reports of her approach to people being problematic. And Millot is right on the general principle.


The second point is that mayoral control of schools is no panacea and often a fig-leaf reform. As Monday's Washington Post story on the matter indicates, politics don't disappear with mayoral control. And that's why I was disappointed to see the brief mention of David Tyack's One Best System in Wong, Shen, Anagnostopolous, and Rutledge's new book, The Education Mayor. Tyack showed how governance reformers in the early 20th century claimed to be "taking politics out of school" in changing ward-based urban school boards to nonpartisan boards often appointed by courts or mayors. Wong et al. seriously misread Tyack in claiming that the historical lesson is that we need to keep politics out of school. Tyack documented how the new boards may have been nonpartisan but were certainly political, elitist, highly connected, and contributors to instead of brakes on bureaucracy. We have seen plenty of the last (continuing bureaucracy) in Chicago and New York City, where mayoral control appears to have changed the address of the bureaucracy instead of the basic facts. Beyond the obscuring of bureaucratic continuation, the arguments in favor of mayoral control contain a romantic view that is all too familiar to historians: change the structure and you can reduce if not eliminate the presumably nasty consequences of education politics. There are at least two fallacies in this romantic view: An unrealistic view of structural change as a panacea, and the blithe assumption that we'd want public education without politics. As long as education is tied to citizenship, politics will inevitably be involved, and that's not a bad thing. (You think Brown v. Board of Education and Title VI of the Civil Rights Act of 1964 weren't political??)

The third point is obvious today but subtler when looking at the long term (or the longue durée, if you're a devotee of the French Annales school): there is a distinction between policy and approaches to handling people, and you don't know what will win out in the end. You can agree with the policy orientations of people whom you'd never trust (Millot's response to Rhee), and you can see and admire the human qualities of people with whom you have fundamental policy disagreements (me and Mike Huckabee, to take one example; I mean my view of him, not the converse). Often, the historical perspective focuses on the policy issues instead of the person, in part because extant records that focus on personality are often sensationalist instead of subtle. One exception is the record of a few common-school reformers from the early 19th century, whose views on "school management" were an intimate and conscious part of their oeuvre. While one or two of the crankier education historians from the 1970s portrayed Horace Mann and his ilk as 19th century Darth Vaders, top-down class-oriented stealers of democracy, the truth that good historians of various stripes recognize is that a number of class-conscious reformers had a serious argument about the need to be kinder to students. One of the arguments for women as teachers was that they'd be more nurturing. (Sexist? Yes. Motivated by some understanding that beating kids isn't great? Absolutely. Ignores the fact that in the 19th century, women as well as men beat students? You bet.) And Mann is famous for pointing out that Massachusetts teachers regularly beat and humiliated students... and for arguing that such mistreatment was unnecessary and wrong.

That fact notwithstanding, Mann, Henry Barnard, and others still fit into a broad movement of 19th century social reformers who held a set of overlapping traits, which in retrospect we associate with northern Whig parties, the growth of merchant capitalism, concerns about poverty and social disorder, a belief in the ability of the state to address such concerns, and an environmentalist analysis of social problems. When most educational historiography mentions Michael Katz's The Irony of Early School Reform, it is usually in reference to the vote abolishing the high school in Beverly, Massachusetts, but the Beverly story is only the first of three parts. The other two sections emphasize the rise and fall of environmental thinking in the mid-19th century. By the 1870s and 1880s, the optimistic environmentalism of a few decades before had been overshadowed by Social Darwinism and "scientific charity." Katz argued that reformatories and other social reforms overpromised, ignored the corrupting influences of institutions, and underestimated the expense of running truly beneficial programs. (Disclosure: I'm a Katz student, or I was in grad school.)

Mann's twelve reports are the most interesting body of common-school reform writing to me, in part because there is so much complexity to them. He wanted teachers to be kinder to kids and to use more effective teaching methods. He certainly fit comfortably into the world of early- and mid-19th century Whig reformers, belonging to a temperance society and playing a key role in the creation of a state asylum while in the Massachusetts legislature. That reformist attitude was perfectly consistent with the background fear of social disorder. In a letter to a friend, Mann explained his acceptance of the Board of Education secretary position by saying, "Having found the present generation composed of materials almost unmalleable, I am about transferring my efforts to the next. Men are cast-iron; children are wax." Maybe he was influenced by religious riots in Massachusetts in the prior few years, but in any case that fear lasted until his very last report in 1848, which resonated with the news of revolution in Europe and the publication of the Communist Manifesto. We had to have common schooling, Mann said, or else we would have classes bent on mutual conflict:

Now, surely, nothing but Universal Education can counter-work this tendency to the domination of capital and the servility of labor. If one class possesses all the wealth and the education, while the residue of society is ignorant and poor, it matters not by what name the relation between them may be called; the latter, in fact and in truth, will be the servile dependents and subjects of the former.

For students of 19th century history, this should be familiar; it is an echo of the developing free-labor ideology in the North. And as Maris Vinovskis has pointed out, Mann had an approach to education that approximated human capital arguments:

But if education be equably diffused, it will draw property after it, by the strongest of all attractions; for such a thing never did happen, and never can happen, as that an intelligent and practical body of men should be permanently poor. Property and labor, in different classes, are essentially antagonistic; but property and labor, in the same class, are essentially fraternal.

Educate the tykes, and they'll all have some prosperity and a stake in society. But Mann's fear is less about the South than events across the Atlantic:

The people of Massachusetts have, in some degree, appreciated the truth, that the unexampled prosperity of the State,-its comfort, its competence, its general intelligence and virtue,-is attributable to the education, more or less perfect, which all its people have received; but are they sensible of a fact equally important?-namely, that it is to this same education that two thirds of the people are indebted for not being, to-day, the vassals of as severe a tyranny, in the form of capital, as the lower classes of Europe are bound to in the form of brute force.

To Mann, poverty and conflict lurk under the surface of an industrial economy, something that only education can forestall. This was not the naked instrumentalism that Bowles, Gintis, and others claimed in the 1970s, but neither were common-school reformers unconnected to early 19th century industrialization: they were intimately invested in it and saw education's connections to it in multiple ways, including ameliorating social tensions.

In the long run, the more child-friendly views of Mann did not become a part of bureaucratic school culture. As hundreds of my students have pointed out to me over the years, common school reforms were far more successful in changing the structure of schools than in directly affecting the cultural practices inside a classroom. Some things changed, certainly: as other historians (e.g., David Tyack and Larry Cuban) note, chalkboards slowly became institutionalized in school construction, and Mann's view of an 'unvarnished' Bible reading instead of sectarian instruction remained the norm into the early 1960s. But those were compartmentalized practices, the type of add-on that Larry Cuban has frequently noted is easier for schools to accommodate. (Note: I am dramatically underestimating the issues involved in shifting away from sectarian instruction.)

One operative question that 1970s and 1980s historians wrestled with is the extent to which the growth of bureaucracy and the decline of early 19th century environmentalism were the consequence of early industrial capitalism. We have a much richer and more complex picture of 19th century school history today, and yet that question remains (or should remain) interesting. The truly large-factory model of education tried in early 19th century cities died as many schools shifted from monitorial schools to smaller, self-contained classes and choral recitation. On the one hand, one could argue that the organization of graded elementary schools in many ways mirrored the less-mechanized and smaller factories in the U.S. better than it did some of the much larger factories in England, where monitorial instruction was invented. But an argument emphasizing the parallel between graded elementary schools and factories overemphasizes the importance of larger cities, when much of early industrialization happened in towns rather than the largest cities.

And that city-town distortion ignores rural places. As Nancy Beadie's recent research uncovers, the building of schools in small towns and rural places may have been as important a part of local economic development in indirect terms as in any human capital effects. The marshaling of local resources for something as simple as church or school buildings required a complex web of economic and social relationships, quasi-private loan networks and reciprocal property relationships that helped incorporate small towns and rural places into a regional economic watershed. ("Watershed" is an unfortunately naturalized metaphor, but I'm not sure there are better alternatives: web and ecology are as inapt.) There's far more to industrialization than building schools, but Beadie's work shows the potential subtlety of schooling's effects and the relationship between economic life and formal education.

And even the subtler views skip some important topics, including the role of mid-19th century higher education, a fuzzily-bordered sector that included institutions called academies, high schools, normal schools, and colleges. And then there's the growth of Sunday schools, and the links between Northern missionary groups and Reconstruction education. So I'm still feeling a bit at sea, wanting a more synthetic interpretive history of 19th century education that wrestles with the bigger economic questions.

What is unquestionable is that Mann's kinder, gentler school didn't survive in the nascent bureaucracy that he helped build. School bureaucracies were easily corrupted into hierarchies that held low expectations for the poorest students. We have the historical example of a structurally-oriented school reformer who still held complex views about what should happen inside the classroom, views that did respect the potential and humanity of children in ways that we should not ignore. Yet his humane vision of schools lost out, at least for most of a century. The structure he imagined did not require humane treatment of its inhabitants.

So today, as we witness another experimental phase in the structure of American education, I read Marc Dean Millot's blogging with both a smile and heartache. Millot writes with passion about treating people with respect. Yet he is in favor of building the same type of structure that Michelle Rhee favors. Whose ways of treating humans would win out in that structure?

November 29, 2007

Wherein we excoriate Everyday Mathematics and also demonstrate the plausibility of letting secondary-grade students use calculators

As Joanne Jacobs notes (hat tip), some of the questions on the NYC-used and Texas-rejected Everyday Mathematics series are just absurd: if math were a color, a food, a type of weather, or a political party, ... oh, wait. We have a mashup: if your political party were a color, what would it be?

I've never seen any of that particular series, but it was mentioned in a comment thread on an entry about communicating math standards (a post from two months ago). I wonder if the most vociferous ideological complaints about Everyday Mathematics are by folks who would disagree with letting kids use calculators on tests. I'm very sympathetic to that argument from one perspective: children should learn fluency in tasks such as multiplication. (We have a copy of Bill Handley's Speed Mathematics book in our house, and I absorbed a few ideas from Jakow Trachtenberg's book when I was a child.)

But at the same time, not having calculators leaves multiple-choice problems vulnerable to testwise strategies.  I don't know which states have exams with two- and three-digit multiplication problems, but the following is a fairly easy example of finding the right answer without doing the problem.

Consider an extreme example: 47,583 x 97,621. We know three facts about the answer:

  • The last digit of the answer is 3. (Multiply last digits.)
  • The answer is a multiple of 9. (Cast out nines from the two numbers.)
  • The first digit of the answer is 4. (Estimating: 4.7 × 9.7 ≈ 46 and 4.8 × 9.8 ≈ 47, so the product is between 4.5 and 4.8 billion and begins with 4.)

With that information, I probably don't have to perform any calculations other than addition and single-digit multiplication (1*3, 0*7, and 4*1). 
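A minimal sketch of that testwise strategy in Python (not from the original post; the multiple-choice options other than the true answer are invented for illustration):

```python
def plausible(x, y, candidate):
    """Apply the three pencil-and-paper checks to a hypothetical
    multiple-choice option for the product x * y."""
    # Last-digit check: multiply the factors' last digits.
    if candidate % 10 != (x % 10) * (y % 10) % 10:
        return False
    # Casting out nines: remainders mod 9 must be consistent.
    if candidate % 9 != ((x % 9) * (y % 9)) % 9:
        return False
    # Leading-digit estimate: scale each factor to a two-digit
    # mantissa and compare the first digit of the rough product.
    est = (x / 10 ** (len(str(x)) - 2)) * (y / 10 ** (len(str(y)) - 2))
    return str(candidate)[0] == str(est)[0]

# The worked example from the post, plus two invented wrong options:
options = [4645100043, 4645100042, 4645100033]
survivors = [n for n in options if plausible(47583, 97621, n)]  # → [4645100043]
```

Only the true answer survives all three filters; the second option fails the last-digit check, and the third fails casting out nines.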

I wrote all of the above before calling up my computer's calculator. For those who are curious, the answer is 4,645,100,043. That happens to be 9*516,122,227, no remainder.

Are these really the type of skills such tests are designed to measure? I'm not saying the skills are bad to have: estimation is very important, and casting out nines is an excellent check on answers. But there is a rather romantic notion floating around that somehow, if we buckle down and remove calculators from the hands of kids in all situations, men will be real men, women will be real women, and international math and science scores will be real international math and science scores (apologies to Douglas Adams fans).

Somewhere between Everyday Mathematics and macho attitudes towards calculators, there must be sanity.


Addendum/explanation of why casting out nines works as a check on multiplication. Let X=9x+a and Y=9y+b, where a and b are the ordinary remainders when you divide X and Y by 9 (so 0 ≤ a, b ≤ 8).

X*Y = (9x + a)*(9y + b) = 81xy + 9(ay + bx) + ab. Since the first two terms are multiples of 9, the remainder of X*Y when divided by 9 equals the remainder of ab when divided by 9. This works with any chosen divisor, but since we normally work in base 10, 9 is the easiest number to use: because 10 leaves a remainder of 1 when divided by 9, a number's digit sum has the same remainder mod 9 as the number itself, so you can find remainders just by summing digits. (If your species generally had Z fingers and therefore used a base Z system, you'd probably be casting out Z-1's.)
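A quick brute-force check of the algebra, in Python (a sketch I've added, not part of the original post):

```python
def digit_sum_remainder(n):
    """Sum the decimal digits of n, mod 9. Because 10 leaves a
    remainder of 1 when divided by 9, this equals n % 9."""
    return sum(int(d) for d in str(n)) % 9

# Verify that (9x + a)(9y + b) and ab leave the same remainder mod 9,
# over a small grid of factors.
for X in range(1, 200):
    for Y in range(1, 200):
        assert (X * Y) % 9 == ((X % 9) * (Y % 9)) % 9

# Verify that digit sums track remainders mod 9.
for X in range(1, 2000):
    assert digit_sum_remainder(X) == X % 9
```

Running this silently passes every assertion, which is exactly what the (9x + a)(9y + b) expansion predicts.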

November 26, 2007

Eduwonkette on NAEP Exemptions

It's not part of her theme this week (exploring Fordham and Ogbu's "acting white" hypothesis), but her post on Lies, Damned Lies, and NAEP Exemptions is still required reading, following up on Elizabeth Green's story in the New York Sun on the large number of exemptions in New York City's urban NAEP testing.

November 11, 2007

Finger-pointing 101

Charles Barone responds to news of the delay of NCLB reauthorization with a lament that (at least in his view) unions are crowing over a political victory. He broadens the field a tiny bit and then engages in a touch of nostalgia for times that never were:

...in the education arena, there was a time when the mantra was that "politics should stop at the schoolhouse door." No one ever reached perfection on that. But it was aspired to or at the very least given lip-service. Now, however, such principles are dismissed with impunity. Politics, campaign contributions, and interpersonal feuds have taken over the entire schoolhouse and are staging a sit-in.

If one defines politics entirely as partisanship in an electioneering context, Barone might be partially right. There are plenty of examples of bipartisan support for various education policies in history. But he might be wrong even in that vein: witness bipartisan support for the College Cost Reduction and Access Act.

As important, though, is the fact that Barone views this issue ahistorically and narrowly. Since the Progressive Era, the cry "get politics out of education" has been a common rhetorical trump card that has often meant "get all the political views except mine out of education." For that reason alone, I am skeptical of various claims on that front.

In this particular context (reauthorization arguments), Barone is engaging in a fairly unsubtle form of finger-pointing: who's to blame for the death of reauthorization? I'm unconvinced that Miller-McKeon was enough of an improvement on virtually any front to rush it through. But beyond the issues, if you really want to point fingers, there are a few complicating factors. First is the distribution of blame: if one wants to call NEA obstinate, one has to explain why Educator Roundtable has rounded on NEA, why Ed Trust doesn't deserve equal blame for appearing equally obstinate, and why Bush doesn't deserve blame for Department of Ed appointments who allowed cronyism to poison the waters (Neil Bush and COWs, the inadequate control of conflict-of-interest issues with Reading First, and so on).

Even if one wanted to get around the finger-pointing, there remains the fact that the political landscape of accountability has changed: Parents are changing their views of teaching to the test. Any reauthorization that does not address that issue will be politically risky, because most parents really do not want schools turned into test-prep factories (a term Diane Ravitch uses).

November 9, 2007

Janie's mother endorses cliff-diving

"You're too young for make-up, Sweetie. Wait 'til you're sixteen."
"I'm not Janie's mother. I don't do this to be mean."
"If those clothes fit any tighter, you would bust out every seam!"
When did my mother slip inside of me?
--- Brenda Sutton, Mama's Hands

For those of you who truly wanted a test of the famous parental Socratic question--"and if Janie jumped off a cliff, would you do it, too?"--we now have a natural experiment. The University of Wisconsin system has committed to the Voluntary System of Accountability, including standardized testing of learning outcomes (hat tip: Zach Blattner).

The Voluntary System of Accountability is a joint effort by the American Association of State Colleges and Universities and the National Association of State Universities and Land-Grant Colleges to respond to pressures for accountability in higher education. Much of it makes sense except for a rather premature (even nuttily premature) inclusion of standardized testing as a proxy for learning outcomes. Only one of the VSA "learning outcomes" tests has been reviewed by the Buros Mental Measurements Yearbook, and the one that was reviewed (Collegiate Assessment of Academic Proficiency) had a fair assessment ($$) from the standpoint of the VSA:

The validity section of the technical manual is quite brief, and the data provided are not particularly encouraging. There is no information with regard to content validity except the suggestion that each institution should conduct its own content validity assessment.... A major concern regarding content validity of the CAAP relates to the coverage of the CAAP to what is taught in college.... There are skills measures that are certainly important to the social sciences, but the work and tools of the social scientist (hypothesis generation and testing, interpretation of statistical data, the search for alternative explanations of findings, etc.) are fundamentally absent from the assessment.

Only a few weeks after Miami Dade College's internally-developed portfolio system received positive attention from Margaret Spellings, Wisconsin is essentially drinking the Kool-Aid of poorly-constructed standardized testing as a proxy for accountability. When a young friend of mine had to choose between two schools where she was interested in a performing-arts major, she visited the schools, sat in classes, talked with students, and watched performances. Despite Kevin Carey's desire that she and her family use someone else's ranking to make decisions on college, she used the criterion that made sense: see what students are doing in the field she intends to study. AASCU and NASULGC have made a poor choice, one that risks pouring millions of dollars into the companies that produce those tests while doing little to bring serious accountability to higher education.

November 8, 2007

The Bloomberg-Klein attack on Diane Ravitch

The key clause from Diane Ravitch's reflections on the smear campaign aimed at her recently:

... if they could silence me, I would serve as an example to anyone else who criticized them.

Ravitch is right: as a well-known, respected, and outspoken critic, she is the safest of Klein's critics. A visible attack on her is an attack on all who are more vulnerable. In addition, the sad fact about attempts to intimidate people is that an unsuccessful attack on Ravitch still accomplishes part of its end, by making other critics think twice or three times before opening their mouths.

November 6, 2007

What not to do on pay-for-performance

A new report on pay-for-performance plans (by Joan Baratz-Snowden) was released by the Center on American Progress, and if you strip out all the political and other analysis, here's the gist of the report: We know what not to do on pay for performance. That's important: I'm glad to see my state described as the poster child for ill-advised impositions (we've had several), but Baratz-Snowden's acknowledgment of the thinness of research is reflected in her references, which include only a handful of refereed articles or other similarly-reviewed research papers. That's not her fault: it reflects the simple fact that there is little professional research documenting salutary effects of any pay-for-performance policies (regardless of details). Until we get something on that order, any prescriptions for what to do in a positive sense are foolhardy, let alone inserting an oxymoronic phrase like "proven" strategies into NCLB (from the Miller-McKeon draft language on performance pay). It's a little tough to mandate "proven strategies" on performance pay when there aren't any.

November 4, 2007

NCLB reauthorization dead until 2008

One Nevada newspaper is reporting that the Senate Won't Take Up NCLB this year (hat tip: Michele McLaughlin). This wasn't hard to predict, to be honest. Once we get into 2008, the legislative calendar will become increasingly bogged down with other matters, and while individual legislators (including chairs) may have an incentive to move bills, an increasing number of legislators and advocacy groups will want to wait until after the 2008 elections.

In many ways, the Senate's move may make George Miller's job easier in the House, since the debate becomes more about long-term questions than short-term (and jerry-built) fixes. I'll keep my prediction from 2006: by the end of next year, growth models will look much less like a "fix" than they did at the beginning of this year.

October 30, 2007

Diane Ravitch's disillusionment

From Diane Ravitch's latest entry in Bridging Differences:

Now that the president and the U.S. Department of Education have made it their business to show that federal legislation can and will raise test scores, every release of NAEP data is accompanied by a press statement from the U.S. Secretary of Education that magnifies slight gains as huge achievements. This is troublesome. It is troublesome because the federal government's role as the honest, impartial collector and distributor of information gets corrupted when it acts as a cheerleader. And it is troublesome because it is unrealistic to expect test scores to make major leaps in a few years. When they do, one should suspect chicanery of some kind.

Sharon Nichols and David Berliner make the same point about almost all high-stakes testing in Collateral Damage.

October 19, 2007

On metaphors and people

A few days ago I commented on an Eduwonk entry about Michelle Rhee's wanting more convenient dismissal options for non-unionized central-office staff... and teachers, in part to give some positive reinforcement for the decision to allow comments and in part because there are some interesting ideas in the entry that I wanted to follow up on. (You'll have to go there to see the comments.)

But I looked back at the entry last night, and upon rereading, the last paragraph stuck in my craw:

In the case of D.C., this debate is actually larger than whether Michelle Rhee will be able to fire some people from the central office and some low-performing teachers. It's a proxy for how hard she (and Mayor Fenty) will push on the schools. If they lose this one it's an enormous setback and the wait them out game will start in earnest. If they win, they might not have to fire so many people anyway because it will be a clear signal that business as usual is over. For Rhee, a lot riding on this. Insert your own metaphor here.

While we may think partly in metaphors, I'd prefer to think of debates over the terms and conditions of work in something other than a metaphorical sense. Maybe this is because I like the second formulation of Kant's categorical imperative (the one about treating people as ends in themselves, never merely as means), and if so, I'm a softie for unreadable German philosophers. But I don't think either children or adults are metaphorical vehicles. They're people, and we should talk about them as such.

Beyond that, I think Andy Rotherham is mistaken here about the use of power. I've known plenty of people in academe and the K-12 world who have paid far too much attention to symbols of power, from the all-too-important brush-off in person to stressing the importance of a particular goal far beyond what it can possibly mean in reality. Power is also more subtle than the imposition of one's will through forceful means. The principal who inspires and convinces a school's teachers to work their tails off is more powerful than any petty tyrant who might occupy the same office. The true setback in DC would be if Rhee focuses more on acquiring power than on using it wisely.

Addendum: I realized a fast read of this entry may lead readers to erroneously conclude I think Andy Rotherham is into power games. That's not my argument or assumption at all; I suspect that in his own work environment, Andy pays attention to the interpersonal touch and not to imposition of his will on the people who report to him. Maybe the same should be true in school systems...

October 15, 2007

President Bush guarantees irrelevance on NCLB

President Bush has promised a veto of any NCLB reauthorization with significant changes he would interpret as weakening the bill's accountability provisions. The policy influence of this White House continues to recede.

And along with the veto threat, the president decided to misinterpret the concerns many have with teaching to the test:

People say, well, they're just teaching to test. Uh-uh. We're teaching a child to read so they can pass a reading test.

That is the type of petulant rhetoric that ignores a broad current of dissatisfaction with instruction that is unproductive by any reasonable standard. But his rhetoric is perfectly consistent with the president's general belief that reality has a well-known liberal bias.

Source: President Bush Discusses The Budget

Three shots at graduation rates

Below are three different takes on graduation rates and the Miller-McKeon discussion draft (which includes an elaborate definition of graduation rates and a 2.5% improvement target folded into AYP). This is partly a short description of my reaction to that piece of the discussion draft and partly an experiment in using different multimedia (including Youtube mashups).

Youtube video (straight)

Video with internal object tagging

Video with rebuttal

October 11, 2007

So, um, ... how about writing about MY book?

Kevin Carey wrote yesterday:

At any given moment, there's a limited amount of room in the general consciousness for books about education, and over the past few months a lot of that space has been occupied by Linda Perlstein's new book, Tested. Which, as I explain in my review in this month's Washington Monthly, is too bad.

Fair enough in terms of wishing for different apportionment of air time. Perlstein has the advantage of a mainstream (i.e., large corporate) publisher and publicist. So, Kevin (and anyone else who wishes public attention paid to other materials), why not review some recent books on accountability that are more substantive and analytical?

<whistles and walks away to work on journal editing>

October 5, 2007

Backlash against formative assessment

As reported in the Orlando Sentinel education blog, some educators are worried about the time occupied by tests given throughout the year, tests that school districts hope will track predicted scores on the spring tests in Florida (FCAT):

In plain language there are 8 full student days wasted on these tests. By the time FCAT comes around the students are burned out and I have a strong feeling that they will not be giving 100% on the FCAT. (a correspondent with the reporters)

It's hard to know how to evaluate that claim without knowing more specifics, but there's a fine line between not assessing students enough and wasting time. If you give students a five-minute math quiz every Friday for tracking purposes (apart from any unit tests), that's maybe 10 minutes for test administration a week (at least for students; this doesn't count grading). I think that's reasonable. On the other hand, I wouldn't want to see such quizzes last for 40 minutes every week unless they're very good tests. But then again, in my high school U.S. history class, we wrote an essay a week arguing about the interpretation of the topic of the week. Multiply 45 minutes times the 30-33 weeks that the full curriculum was in force (apart from short weeks and the very start and end of the school year), and that's well over 20 hours of testing in a year on that subject alone. But those were very good tests, as activities in and of themselves.

The danger of very long tests in multiple-choice formats is that they aren't very good, and the school district employee quoted above may well be right: the sheer volume of such testing can alienate students very quickly. (If you disagree, try filling out your income taxes every month as a formative exercise.) And then the longer-term danger is that such effects can undermine the use of formative assessment even when it does have a light footprint in the classroom.

October 2, 2007

The adults v. children meme of facile ed policy talk, part 375

Ruben Navarette (hat tip) captures a thumbnail historical myth embodied in the "adults vs. children" theme in accountability talk:

Public schools have, for generations, crafted an environment that caters to the needs and wants of the adults who work in the schools rather than those of the children who attend them.

As Seymour Sarason has observed, children-first rhetoric such as Navarette voices is actively hostile to reform because it fails to acknowledge some truths about schools as organizations. (Sarason contrasts K-12 schools with higher education, where I work.) Elementary and secondary schools are environments that are about the least adult-friendly you can imagine, outside sweatshops. Where else can adults be vulnerable to being hit by children, be told when they can go to the bathroom, and be told that their own intellectual development does not serve the organization's interests?

Of course schools serve multiple purposes and interests, and yes, one needs to work with that dynamic. But you don't work with the dynamic by setting off one group entirely against another, and that is what Navarette implies: It's a grudge match, teachers vs. students.

September 25, 2007

NAEP scores out

The National Assessment of Educational Progress (2007) scores are out, and here's a quick response on reading for the country and Florida:

1) The U.S. Department of Education report focuses on feel-good comparisons with 2005, when looking back further gives a different picture. Yes, in fourth grade reading for the country's children, the average scale score has gone up 2 points since 2005, but the improvement was better in the four years before 2002 (4 points 1998-2002, vs. 2 points 2002-2007). And in eighth grade, the report claims improvement since 2005, but there's been a slight average scale score decline since 2002. In general, fourth-grade reading has been on a gentle upward slope for the past decade while eighth-grade reading is stagnant. In addition, in most areas there has been no closing of the achievement gap since 1992. (The only achievement gap to show a decline either since 1992 or 2005 was the White-Black comparison in fourth-grade reading.) The take-home story today is that the nation's reading achievement provides no clear evidence that No Child Left Behind has dramatically changed elementary and middle-school reading proficiency.

2) Florida's reading achievement is mixed. There appears to be a long-term improvement in reading in fourth grade but stagnant reading scores in eighth grade since 2002. (There was a decline between 2002 and 2005 and then an increase, so the average scale score in 2007 was 1 point below 2002.) There was a slight increase in the proportion of students excluded from testing, but it's hard to know how that might have affected scale scores. Today's report also gives no trend data by population subgroup, so we can tell nothing about changes in achievement gaps in Florida from today's report.

3) If you look at Florida scores by achievement levels, the conclusions you draw depend on which grade and level you pick. Fourth grade: In both the second (proficient) and third (basic) levels, there is a long-term increase in the proportion of students achieving that level, but the second level's upward trend started in 1998, while the third level's upward trend started in 1994.   Eighth grade: There's been stagnation since 2002 no matter which level you examine, after a four-year uptick.

I'd like to get inside the data more, but the NAEP Data Explorer server is now very busy.

September 20, 2007

Vi8gra for ur tests

In response to the growing arguments over the Miller-McKeon Title II proposal (i.e., encouragement of performance pay), Eduwonk (Andy Rotherham) writes

... until education becomes a field that is comfortable with the idea of performance, it's a field that is in some trouble.

This may say something about my spam filter, but that phrase brought up images of all the potential spam about "your test score size," "plez ur schl brd," etc. Of course, neither schools nor teachers would ever spend money on charlatans promising to increase test scores...

Oh, wait.  Yeah.

Well, at least I've got another few lines for an accountability stand-up routine, and all I had to do was have a mild emergency yesterday and be away from my computer all day.

More seriously, in practice these performance-pay plans are complicated and often undercut the claims of proponents that they will reward teachers who work hard in difficult circumstances. Or, as the Orlando Sentinel reported September 9 about the Orange County (Fla.) plan,

...teachers at predominantly white and affluent schools were twice as likely to get a bonus as teachers from schools that are predominantly black and poor.

Finally, I'll repeat this until I'm blue in the face: Everyone else in favor of performance pay on principle or faith, please show me up and read the literature on goal-setting. I don't want to be better read on this than you.

September 14, 2007

The b-word and education politics

Blogger KDeRosa calls George Miller "a Whiny Bitch" (hat tip), which makes me wonder where this performative name-calling came from (see Andy Rotherham's running joke about Rick Hess, though I don't know where that came from, either, and I thought it would be more appropriate to call him Rick "Baby" Hess, since that's the interjection he uses frequently).

Fundamentally, KDeRosa is trying to slap down Miller's rhetoric ($ after today), which in turn is an effort to pivot around Spellings's claims that the Miller-McKeon draft is too wimpy when Miller pointed out that the federal DOE isn't exactly clean on loopholes.

I never knew that education politics was a macho sport. Maybe we can get it on ESPN now, if we can get a little more trash-talking? Or will they be serving diabetes-sized buckets of cheap beer at the next Washington education think-tank gathering?

Take home message: Any day of the week, I'll take ideas over name-calling. What about you?

Five-Year Plans and Ed Trust flexibility

Trust it to AFT's Michele McLaughlin to find the hidden item in the Ed Trust statement on Miller-McKeon's draft Title I language. Like many others, I had focused on the more belligerent language earlier:

Although the staff draft creates an accountability fig-leaf by preserving the requirement that all students reach proficiency in reading and mathematics by the 2013-14 school year, the heart of the law has been hollowed out.

Sting! But McLaughlin notes the following:

"Additional funding may be included, but money is not the sticking point," says [Ed Trust VP Amy] Wilkins. "The 2013-14 deadline for proficiency is a powerful disincentive to raising standards. If we are going to ask states - and students - to climb a higher mountain, we need to give them more time to get there, and this bill draft does not do that."

McLaughlin correctly notes the hint at flexibility that I (and almost everyone else) missed. In testifying at Monday's hearing, Ed Trust's President Kati Haycock largely ignored Title I to focus on teacher issues. With the exception of data issues, the only pieces of Title I mentioned in her testimony were parts related to which teachers are where.

Hmmn...

I'll ignore the positioning/politicking questions to focus on one thing: There appears to be one less visible supporter for the rigid Five-Year Plannish elements of NCLB.

September 11, 2007

Jack Jennings is right (part 1)

I didn't have my computer for part of the evening, but I did have a way to record my thoughts on Jack Jennings's testimony yesterday at the NCLB reauthorization hearing.

NCLB hearing testimony

I'm trying to find all of the written statements for yesterday's NCLB hearing. Thus far, I have the following:


Correction: All of the testimony is listed on the committee's hearing page, which also includes a video archive of the day. Hat tip: Alexander Russo.

September 10, 2007

Stalemate talk or spin?

Another bit from Alexander Russo today, stemming from an NPR story:

[This is the] first word I've heard of that Spellings is saying she'd rather have the current NCLB than the Miller draft. Saber-rattling? Maybe. But for those who are most worried about multiple measures and all the rest, it's going to be a serious consideration.

I can't believe that Spellings would play that game, because she'll be gone within two years (sooner if she's really looking for a university leadership position). Stalemate will give more time for parents to decide that test-based statistical judgments are a poor idea. Stalemate = greater likelihood of defeat, for Spellings at least. Or stalemate shoves the responsibility for defending the current structure onto Achieve and Education Trust.

Or maybe this is Spellings' way to set up a spin when/if reauthorization doesn't happen this year, much akin to a song a friend of mine wrote: I Meant To Do That.

NCLB reauthorization hearing

Since he's a former Hill staffer, I'd pay attention to Alexander Russo's comments on today's House NCLB reauthorization hearing.

By having everyone speak, the committee pretty much ensures a certain amount of cacophony. And by putting Kati Haycock -- one of the draft's most vocal critics -- off in the teacher quality corner, the committee sends a clear message that it doesn't like being called out.

I'm more confident of Russo's first conclusion than his second one: NEA and AFT's representatives are on the teaching/school leadership (not teacher quality) panel. While analyzing a witness list is akin to reading tea leaves a la old-style Kremlinology, maybe that's appropriate for a law whose numerical goals seem awfully Five-Year-Plannish.

Update: podcast available! (Thank my poor ergonomic awareness last week for this one...)

September 8, 2007

NCLB "Shrinklits" spin

I've just finished a substantial detail-oriented task (took about a day or more spread over the past week), and I am just too tired right now to read and talk about the Miller-McKeon discussion draft sequel, esp. Title II. I'm far too tired now to analyze the various spins that people have tried out on the Title I part of the draft, let alone Title II. I'll offer a few Shrinklits versions, and you pick which one you want to use:

  1. It was the best of laws, it was the worst of laws.
  2. All happy reforms are alike; each unhappy reform is unhappy in its own way.
  3. Quickly, word got to the villagers and everybody in the village rushed to the newspaper to see Anansi's school listed under "needs improvement." It was such a shame for Anansi, he ran away and hid in a corner of his room. That is why he is always in the top corner of rooms and why he hides from us.
  4. As someday it may happen that a scapegoat must be found,
    I've got a little list. I've got a little list
    of overblown pol gasbags who might well be DC-bound
    and that never would be missed. They never would be missed!
  5. I am, indeed, mighty world-destroying Discourse,
    Here made part of U.S. Code for destroying the school.
    Even without Statute, none of the teachers here
    Arrayed within the foolish classrooms shall stay in their professions another 37 nanoseconds.
  6. It was a dark and stormy reauthorization.

Any others?

September 4, 2007

Miller-McKeon draft thoughts

So how was your Labor Day? I spent part of mine combing through the NCLB reauthorization discussion draft made available a week ago. (My spouse and I agree that we don't engage in paid work on legal holidays, but we're allowed to do anything that's fun or citizenship oriented. So we're well-acquainted with various loopholes... call it 'gaming the system' if you wish. I called this citizenship, not fun. Yes, I did spend time with my family, with a good book, and in something creative.)

If you want to read my scribbles, you can look at my comments on the Miller-McKeon reauthorization draft (PDF, 12 MB). The first page is my attempt to cross-reference common criticisms of NCLB with pages/sections of the discussion draft that may address those criticisms. The rest are all of the pages of the draft (well, two pages per sheet) with my comments. The file is about 150 pages long, because I didn't scan the sheets I didn't have comments on. Please remember that I was (and still am) not happy with the short turnaround time for comments, so you'll find plenty of snark and a few comments that indicate I need to look up things to check whether the draft has changed language, etc.

I hope to have something more analytical within a day or three, but the first page shows my thinking that this draft attempts to address the vast majority of criticisms in different ways. That statement doesn't say anything about how well the draft addresses criticisms, but with a few notable exceptions, the draft does tackle the well-known gripes.

The exceptions (and these are important):

  1. How NCLB has been followed by the transformation of large numbers of schools into test-prep factories. (This is separate from the issue of curriculum-narrowing.)
  2. The mandate of a limited menu of fundamentally unproven restructuring options (made even more restrictive under the discussion draft).
  3. The failure to hold SES providers accountable in a timely way.
  4. The waste of the 20% set-aside provision for schools in the "needs improvement" category.
  5. The fundamentally arbitrary nature of defining levels of proficiency.

The discussion draft fails to address any of these five criticisms. These are all substantive problems, well-known to anyone who's dealt with NCLB, and the failure to even acknowledge #1 in any way shows how the Beltway conventional wisdom has its head in the sand on test-prep. But despite my somewhat cynical disappointment on these matters, to my surprise, my impression is that the discussion draft provides a reasonable basis for negotiating reauthorization. Of the items listed above, I suspect the only non-negotiable item from the inside-the-Beltway perspective is #4, and I think that is the least important issue to address in the short term (i.e., reauthorization).

August 30, 2007

Parsing Miller/McKeon

Kevin Carey takes the first crack at a moderately detailed description of the Miller/McKeon discussion draft of NCLB II. From the few sections I've skimmed, I think he's done a good job at description. I agree on some things, disagree on others. The one solid thing I've seen was a definition of a reasonable cohort-plus calculation of graduation rates, though it says something about Washington that this took ... how many pages?  (I forget, and I have to do something else rather urgently tonight, so you can count the pages yourself. I think it's section 1124.)

Having read through Carey's description and quickly skimmed through the assessment/AYP language, my first thought is that this draft is essentially establishing negotiations over what level of failure is politically acceptable.

August 28, 2007

Eight days to read 400+ pages

Time to call BS on George Miller and Buck McKeon: the release of an initial draft of NCLB's reauthorization was accompanied by a letter saying the public would have eight days to comment. I guess they gave us a day beyond a week to accommodate the Labor Day holiday.

So much for wanting public input.

Update: Andy Rotherham (Eduwonk) points out that the 8-day window isn't a hardship for those inside the Beltway:

First, there has been a lot of opportunity for input so far --and interest groups working on this full time for months -- so this is not really the first cut, second a lot of the language isn't new anyway and it's 400 pages of legislative text, which is different than 400 pages of prose, and third, if people have to read a little over Labor Day that's OK, the staff working on this has worked weekends all summer, one weekend won't kill anyone,...

Okay, so those who have had advance pre-draft drafts and are used to legislative language can skim through this to spot their favorite and hated items. But this still leaves out about 290,000,000 U.S. residents who haven't had those opportunities and don't have that background. My point stands: the 8-day window is a clear indication that this is an inside game.

(And, yes, I'll squeeze in some time to read it, but I'm not doing it on Rehoboth Beach or anywhere else that the Beltway set go for Labor Day Weekend.)

Parents change their minds on teaching to the test

Since 2002, the annual fall release of results from the Phi Delta Kappa/Gallup Poll of public attitudes towards public education has become increasingly focused on NCLB. Today's release (hat tip) is no exception, and my guess is that most reporters will run with the results of the first section on NCLB and accountability.

My nomination for most significant result is from Table 14, asked of those who agreed in a prior question that "standardized tests encourage teachers to 'teach to the test,' that is, concentrate on teaching their students to pass the tests rather than teaching the subject." The majorities answering yes to that first question (in Table 13) haven't changed much between 2003 (when 68% of public-school parents and 64% of adults without children in school said yes, standardized testing encouraged teaching to the test) and 2007 (with 75% and 66% of each group saying testing encouraged teaching to the test).

While a clear majority has always seen testing as encouraging teaching to the test, American adults have changed their mind on whether that is good or not. In 2003, 40% of surveyed parents with children in public schools thought that teaching to the test was a good thing. This fits in well with arguments by David Labaree, Jennifer Hochschild, and Nathan Scovronick that a good part of the appeal of public schooling is to serve private purposes, giving children a leg up in a competitive environment. In that context, it makes enormous sense to value teaching to the test, since many parents understand how college admissions tests are related to access to selective institutions and scholarships. While 58% of public-school parents thought that teaching to the test was a bad idea in 2003, a sizable minority thought it was just fine.

That opinion has changed, dramatically. In the 2007 poll, only 17% of public-school parents thought that teaching to the test was a good thing. Fewer than one-half of one percent had no opinion, and 83% of public-school parents thought that teaching to the test is a bad thing. Adults who did not have children in school also have changed their minds, with 22% of those surveyed this year thinking that teaching to the test is a good thing.

This question was asked separately from the issue of narrowing the curriculum. While there may be some spillage or confusion of issues, I think the sea change is a warning to advocates of high-stakes test-only accountability: Few parents see benefits in sending their children to test-prep factories. Fix that consequence or see the political foundations of accountability crumble.

August 16, 2007

Multiple issues in multiple measures

In his July 30 statement at the National Press Club, House Education and Labor Committee Chair George Miller said that his plans for reauthorizing the No Child Left Behind Act included the addition of multiple measures, an incantation that has provoked more Sturm und Drang in national education politics than if Rep. Miller had stood at the podium and revealed he was a Visitor from space. While Congress is in recess this month, the politics of reauthorization continue. I'll parse the debate over multiple measures or multiple sources of evidence, and then I'll foolishly predict NCLB politics over the next month or so.

The different issues

Calculating AYP

At one level, the discussion appears entirely to focus on the determination of adequate yearly progress. Add measures and you "let schools off the hook," according to Education Trust (with similar noises from the Chamber of Commerce's Arthur Rothkopf [RealAudio file; hat tip]). No escape hatch, promised Miller when asked. Maybe if you add measures, there are more ways to fail AYP, as one reporter noted at the press conference; not so, said Miller, for we'll figure out some way so that the extra measures only get you over the hump if you're almost there. Since AYP is the largest chunk of NCLB politics, all of the talking points are familiar. In the end, this piece of the debate will get bundled into the most likely package that includes growth measures.

Teaching to the test

As the Forum on Educational Accountability and last week's letter from civil rights groups have argued, narrow measures of learning tend to distort how schools behave in several ways, from narrowing the taught curriculum to teaching test-taking skills and engaging in various forms of triage. One argument in favor of multiple sources of evidence is Lauren Resnick's old one: a better test is likely to encourage better behavior by schools, both in terms of better assessments and school indicators that penalize schools for triage. To the extent that more input dilutes the incentive for systems to attend to single indicators, that may be true. On the other hand, multiple sources of evidence by themselves will not eliminate the corrupting effect of brain-dead accountability formulas, and to some extent the resolution of the debate over AYP can blunt the effect of multiple sources of evidence. On the third hand, I suspect most of those who support multiple sources of evidence are adults and prefer some improvement over none. Including multiple sources of evidence will not eliminate the deleterious side effects of high-stakes testing, but it should ameliorate them.

Improving the quality of exams and their cost

Connecticut's NCLB lawsuit is based on the claim that the federal government has not provided enough support for the state to develop its performance-heavy exam for all the required grades. The feds allegedly told Connecticut that it doesn't need to use the performance-heavy exams, claiming that an off-the-shelf commercial test system would work just fine. After investing state money and political capital in the performance exams, Connecticut officials were rather peeved. The Title I Monitor nailed this issue in May, noting that the argument over multiple measures is in part a matter of the quality of assessments and cost. The Monitor also noted a level of denial in the US Department of Education that should be familiar to Bush-watchers:

[A] senior ED staffer acknowledged the benefits of states using varying assessment formats compared to a single test, but challenged the idea that costs and timelines are a barrier to states developing tests with multiple formats.

And the escalation in Iraq is currently providing an environment conducive to the reconciliation of factions. Right. Officials from a variety of states and a number of players in Washington agree that NCLB has essentially stressed if not broken the testing industry's credibility and infrastructure, and the inclusion of multiple measures is part of the negotiations over how much Washington will pay for better assessments.

Reframing accountability

One doesn't have to agree with George Lakoff's version of framing to recognize that the politics of accountability are driven by assumptions about the need for centralization and authoritarian/bureaucratic discipline. These themes are obvious in the dominant inside-the-Beltway narrative about NCLB: We can't trust the states. The best argument for this position is Jennifer Hochschild's thesis in The New American Dilemma (1984), a claim that sometimes we need a non-pluralistic tool to advance democratic aims, a contradiction she saw in desegregation. But we don't have an open debate about this dilemma. We didn't have it about desegregation, and we certainly don't have it about accountability.

Instead of reflecting some honesty about policy dilemmas, the arguments defending No Child Left Behind today are generally at the soundbite level. A common metaphor used by many supporters of NCLB relies on time, such as the Education Trust's organizing an administrators' letter several years ago warning against a thinly veiled attempt to "turn back the clock." "A step forward" is another phrase that the same letter uses to describe NCLB, and Education Trust's response to the Forum on Educational Accountability proposals describes them as "a giant step backward." This is an ad hominem metaphor: It says, "Our opponents are Luddites. They are not to be trusted to defend anything except their own narrow and short-sighted interests."

The other language commonly used by NCLB supporters is a simple assertion that they own accountability. Anyone who disagrees with them is against accountability. Together, these bits of accountability language imply that there is one true accountability and that NCLB skeptics like me are apostates or blasphemers. Pardon me, but I don't believe in an accountability millennium. 

To shift the debate away from accountability millennialism, critics of NCLB have to provide a counter-narrative. Both the August 7 civil rights-group letter and the August 13 researchers' letter (or the letter signed mostly by researchers) describe the current NCLB implementation with words such as discourage, narrowed, and fail. In its August 2 recommendations for reauthorization, the Forum on Educational Accountability uses the words build, support, and strengthen. The Forum and the August 7 letter also use a single word to describe the best use of assessment: tool. In their recommendations, the Forum and its allies use an architectural metaphor: we need to strengthen the system while keeping it mostly intact. The criticisms directed against multiple-choice statistics aren't part of that story, though I suppose a purist would insist on including them, somehow described as undermining the foundations, eroding the footings, blowing out a window, or some such.

I don't know to what extent the argument over multiple measures will shift the larger debate, but it is potentially the most far-reaching consequence of the letter.

Where we're headed in the short term

My guess is that Miller's September draft will bless consortia of states that develop assessments with more performance, authorize funding for more (but not all) of that test development if small states work in consortia, and promise to pay for almost all of the infrastructure needed to track student data.

We will also see the true character of high-stakes advocates in Education Trust and the Chamber of Commerce. The Education Trust is now under the greatest pressure of its existence over both growth measures and the issue of multiple measures. In Washington, almost no one gets their way all the time. How people negotiate and handle compromise reveals their true character.

August 7, 2007

A conversation with Doug Christensen

The audio (mp3) of a discussion last Saturday among Nebraska Education Commissioner Doug Christensen, Maryland teacher Ken Bernstein, and me is now available online. The discussion was recorded at a session of YearlyKos in Chicago.

August 6, 2007

Framing NCLB debates

Matthew Yglesias has a point about the details of NEA's No Contractor Left Behind flyer, passed out liberally at YearlyKos this weekend. Yglesias notes that the message of the flyer relies on sloppy reasoning and is more sensationalist than sensible.

I'm worried by something else about the flyer: it's irrelevant to NCLB policy debates. As I've argued before, you can agree with the conflict-of-interest argument 100% and decide that the appropriate response is to build in more procedural safeguards against such dealings, not change the structure of NCLB. Fundamentally, it's a waste of NEA's resources to push this, and as a member, I'm ashamed at the poor decision-making.

But I think I understand why NEA staff have pushed it anyway: it holds a certain appeal for those of us angry with the Bush shenanigans. Mike Klonsky's entry on the matter demonstrates the appeal that the flyer holds for some.

(Incidentally, for those who know of Yglesias's relationship with Sara Mead, this isn't a devious insider plan to discredit the NEA. If I were really devious and wanted NCLB to be reauthorized intact, I'd encourage the NEA to waste even more resources on this nonsense. There are real conflicts of interest, but that's not a wise political focus if you want to change policy structures.)

And now, back to editing a 104-page manuscript for EPAA. It's a good one, but as I've discovered, giving detailed suggestions when accepting a manuscript is labor-intensive for all its efficiency. I need to take breaks from the close reading/editing, and the blog will get the benefit of that.

August 4, 2007

YearlyKos

I'm @ the Midway airport, waiting to board. Good trip, combining friends, touristing, politics, union stuff, academics, and even a bunch of cool new t-shirts. Packed 50 hrs!

I'll blog more extensively when I get home and take care of some other tasks, but I had a good time. The session with Nebraska's commissioner of ed was the most consistently substantive, and an unexpected bonus was listening to a conversation between him and George Lakoff. More later!

P.S. No education question in the first part of the pres. candidates' forum, before I had to leave for the airport.

August 1, 2007

The celebrity-faculty fallacy

As noted in The Gradebook, Stanley Fish has now waded into ($) the Florida higher-ed funding battle.  Like many of our fellow Florida faculty, Fish says we can't simultaneously have great, universal, and really cheap higher education. Yet Kevin Carey has a point: Fish's proposed solution is a search for celebrity faculty:

Five straight years of steadily increased funding, tuition raises and high-profile faculty hires would send a message that something really serious is happening. Ten more years of the same, and it might actually happen.

Fish followed the same formula when Arts and Sciences Dean at UIC. A large part of his modus operandi was symbolic and cultural, but a substantial chunk was trying to snag Big Fish. Fish's fishing spent resources that could have been used to hire and reward wonderful and less-famous faculty.

Florida has tried the Famous Faculty Fishing expedition before, among other things with FSU hiring Nobel Prize winner John Robert Schrieffer, who later killed one person and injured several others while driving. His shenanigans are proof that neither universities nor famous faculty are idiot-proof. There is a point in recruiting famous people, so long as the resources devoted to such efforts do not drain the ability of an institution to reward and retain the vast majority of faculty who neither win Nobel prizes nor write best-sellers.

Florida loses 15% of its faculty every year, essentially serving as a farm league for other regions. Hiring a few famous faculty will not stop that attrition, and if it absorbs too much of the university system's resources, such a concentration of resources will prevent us from holding onto the hundreds of darned good faculty we already have.

Sara Mead: We gotta walk the walk

Last night, former Education Sector staff member Sara Mead wrote an important blog entry as a guest for Eduwonk. She pointed out that teachers unions are not the opponents of reforms such as charter schools and performance pay that Andy Rotherham (Eduwonk), Joe Williams (Democrats for Education Reform), and others support.

Liberal education reformers need to win hearts and minds by engaging with reform-wary lefties and taking their concerns seriously--not just calling them union hacks or accusing them of not caring about kids. We need to engage in honest self-reflection and be willing to make changes in response to valid critiques from the left. We need to avoid allying with "friends" who undermine our credibility as proponents of social justice. We need to make common cause with other progressive advocates for kids--those working on health care, childcare, and juvenile justice, for instance--rather than undermining them.

We'll see whether others share Mead's perspective; I hope so, but I am especially skeptical that Williams will change his standard rhetorical approach.  Whether that cripples the organization he directs is an open question.

July 30, 2007

George Miller press conference

I'm listening right now to California Rep. George Miller's press conference previewing his NCLB reauthorization bill. My first impression is that he's overpromising. (He's also talking too quickly, but his audience is a group of reporters, and the faster he talks, the more questions they can ask.)

Here come the questions, reported here as topics and answers.

  1. Timing on the bill in the House: Miller waffles while saying the goal for passage out of the House is still September.
  2. Who would decide what a "better test" is: Miller says he is responding to claims that state tests are of poor quality and acknowledges the need for funding for such tests. He says the bill includes a provision for K-12/university/business partnerships to create tests that assess "college-ready" or "work-ready" skills and knowledge. In other words, he didn't respond to the question, other than saying he wasn't for national standards/tests.
  3. Other measures' relationships to AYP and growth: Miller talks about a college-prep curriculum, etc.  In response to a follow-up to the waffling, Miller says schools would still have to perform well on reading and math tests, and adding other measures is "not an escape hatch."
  4. Is the bill bipartisan: Miller says yes.
  5. Performance pay and test scores: Miller says some portion "has to be tied to student achievement," and he refers to "a growth model." "We would honor... collective bargaining agreements [and would not] upset those." He says he understands the reservations given the history of merit pay as an "arbitrary system of rewarding friends." He then talks about needing to create careers for teachers that look like other careers, where teachers can be rewarded for their efforts, time, etc.
  6. Choice options: Miller says it's under discussion, not resolved, about supplemental education services and public-school choice. He acknowledges the difficulty of providing choice in "jammed" districts and says the bill will reverse the order of interventions. He says he is concerned with the lack of accountability for supplemental education services.
  7. English language learners and testing: He implies that tools exist for assessing the skills of English language learners, including tests in other languages (and he notes that other countries somehow have assessment in non-English languages, because most of them don't have English as the official language). Miller mentions the "p" word (portfolios). Wow.
  8. Administration response: Miller discusses ongoing discussions with US DOE and "talks nice" about his relationship with Secretary Spellings.
  9. Spending issues: Miller says he doesn't know what additional spending is required by the bill. (WHAT???) He then talks variously about "strategic investments," the lag in spending after the first year or two of NCLB, assistance to students who move, formative testing, and then blathers a bit about the need for data systems. "There's no point in going to a growth model if you don't know where your students have been." Then he mentions supplemental appropriations for education, I think gratuitously.
  10. Portfolios? George Miller says he knows that's a minefield and will have to get back to the reporter. (Okay, it only took three questions for someone to follow up.)
  11. If additional data is not "an escape hatch" on accountability, doesn't that just add more ways for schools to fail? Miller says no... and doesn't say anything substantive. He acknowledges such a system is "not easily constructed." In response to an inaudible follow-up, Miller says a student would have to be close on reading or math, and the system would have to be (nonspecifically) complementary.
  12. Is bipartisanship still possible? Miller says "This bill will test that." He then waxes optimistic. Laugh line: "There are no short answers from me. I'm the Joe Biden of education."
  13. Rural districts and flexibility: Miller waffles for a few minutes.
  14. Adding assessments: Miller says that's a state decision. (I don't understand the context for this answer.) Miller then starts talking about teaching to the test (finally!). Miller acknowledges that concern but claims schools have been successful on the narrow measures without narrowing the curriculum. Miller says "more time on task" in reading and math isn't all that education should be, but then repeats Riley's "learn to read, then read to learn" adage.
  15. Specialized support services for health (nurses, psychologists): No.

Later today there's supposed to be a written summary of bill provisions on Miller's website (or maybe the committee website).

July 21, 2007

Reading... but not what you think

Yes, I was at a local bookstore at midnight, getting two copies of Harry Potter and the Deathly Hallows. But I'm in my office this afternoon, reading student papers. I don't get the Harry Potter until I'm done.

In other news, Jeff Solochek reports correctly that I'm now the representative of the Florida Coalition for Assessment Reform on the Florida DOE's advisory committee looking at the FCAT. I've worked with FCAR co-founder Gloria Pipkin before on a few matters, and I was flattered to receive her request. This is an interesting challenge for me, and Gloria and I took a few steps to make sure that the FCAR board was comfortable with my particular take on accountability.

There are a few things that Solochek didn't get quite right. I don't think of myself as an "FCAT critic" but as a critic of the current uses of the FCAT. The conflation of the test with the policy is interesting...

The more serious problem is the way that the Gradebook's thumbnail of my portrait is all fuzzy. You can compare it to the image in the top left corner of this page and see what you think. But I understand the need for thumbnails, and I am here providing a slim, 100-by-100 portrait that should accommodate virtually any blog's storage limits:

Simpsonized Dorn portrait
(after Simpsonization)

Enough silliness.  Back to reading!

July 18, 2007

NCLB identifies wrong target for students with disabilities

Erin Dillon's short piece yesterday, Labeled: The Students Behind NCLB's 'Disabilities' Designation, is a response to criticism of NCLB as unrealistic about the achievement of students with disabilities. Dillon argues that because approximately half of students with disabilities are identified as having learning disabilities, and because of the overrepresentation of minorities in special education, the critics are wrong.  Specifically, she writes that "the majority of special education students have disabilities that do not preclude them from reaching grade-level standards."

There are several issues here:

  • Do schools use special education as an excuse not to educate students identified as having disabilities?
  • Should schools be pushed to educate students with disabilities better?
  • Can students with disabilities reach the proficiency standard identified by states?
  • Is NCLB the best current tool to prod states and schools to educate students with disabilities better?

Dillon's answer to all of those questions is yes, and the clear implication is that the answers are linked: either you answer yes to all of them or no to all of them.

I disagree with that assumption. More specifically, I'd say yes, yes, sometimes, and no. Let's at least acknowledge the fictitious nature of a "grade-level standard"; in reality, states set arbitrary proficiency thresholds, though we can agree those thresholds divide the range of achievement into two ordinal categories. Given that fact, there is no guarantee that such thresholds are plausible for all students, regardless of the help provided. NCLB critics are correct in pointing out that 100% proficiency is an unrealistic standard in itself.

That fact does not mean that schools should be let off the hook, and NCLB's defenders are correct that having different standards for students with disabilities is dangerous. Yet you have to have different standards. And in Accountability Frankenstein, I have acknowledged the implausibility of using the response to formative assessment as a summative tool. 

A plausible way out is to allow students with disabilities to take different grade-level tests under a few conditions:

  • The student then follows the sequence of grade-level tests up each year (so that if a student is taking a 3rd-grade test in 4th grade, it's a 4th grade test the following year, etc.)
  • There are negotiated limits on the proportions of students allowed to take tests 1 or more years behind grade level
  • There is research to document what proportion of students we should expect to need behind-grade-level tests, with such research informing future limits on such exemptions.

If we are stuck with mediocre to awful annual testing, we should at least do it as sensibly as possible.

Mea culpa: I misread Ms. Dillon's name as Eric. My apologies!

Update: Dillon responds.

July 16, 2007

NCLB reauthorization blog at Ed Week

There's a new Ed Week blog: David Hoff's NCLB: Act II, entirely about reauthorization.

In many ways, this is a wise focus for an education beat blog. Several other blogs have died out when key reporters have moved on, as they typically do after some (relatively short) period of time. Having a blog with a relatively clear criterion for the end of its life gives a reporter some knowledge of when the more intense activity of blogging as well as reporting will end...

... except that this one is about NCLB's reauthorization. Are Ed Week and Hoff betting that it'll happen this year?  If not, I'm guessing this will have a 30-month life. (And I'm not alone.)

July 12, 2007

Read the law, Inky editorial board!

So the Philly Inquirer's editorial board thinks No Child Left Behind should include graduation rates? (HT: NEA's NCLB blog.) AYP already does:

... [AYP] includes graduation rates for public secondary school students (defined as the percentage of students who graduate from secondary school with a regular diploma in the standard number of years) ... (1111(B)(2)(c)(vi))

Wince. Wince. Wince.

Writing or faux writing?

Recent events have brought home the fact that the Florida Department of Education expanded the performance writing exam for students in 4th, 8th, and 10th grades so that they have to answer compartmentalized grammar questions that I thought were obsolete: identify the correctly punctuated sentence, fill in the blank with the right word, etc. As my wife points out, it's not enough that students show they can write an essay. They have to show that they can do bellwork, too.

It's clearly a double standard. Florida students have to complete sentences, while Scooter Libby doesn't.

(Inspired by an item by Conan O'Brien earlier this week: blame him.)

July 7, 2007

Appearing in Chicago on education policy...

On August 3 and 4, I'm talking in the Hyatt McCormick Place Convention Center at YearlyKos, the convention sprout of partisan DailyKos, on the panels Education Uprising: Education for Democracy (Friday afternoon, 1-2:15) and Rethinking Educational Accountability (Saturday morning, 10:30-11:30). Both are substantially the logistical work of teacherken, a fellow Haverford College graduate, an active blogger at DailyKos, and a teacher in Virginia. He has wrangled the opportunity and has been herding a bunch of cats for the last seven months.

The first panel comes from an ambitious set of DKos diarists who set about to ... rethink education. I promised to (and did, with some effort) complete some initial diaries on the history of education as an initial perspective. The entries that followed partially diverge in their approaches to education reform, but they are sufficiently interesting and ... humanistic ... to justify the panel. (See the reference to teacherken's herding cats above.)

The second panel includes teacherken, me, and Doug Christensen, Nebraska's Commissioner of Education. Christensen battled with the U.S. Department of Education to allow Nebraska to use a set of locally-developed tests for NCLB purposes. It should be a very interesting conversation.

June 19, 2007

Don't Know Much about NCLB (but have opinions)

The ETS poll on attitudes towards the No Child Left Behind Act is garnering quite a bit of attention, such as from Ed Week. I love pollsters' schizophrenia, simultaneously asserting that the public doesn't know much about NCLB and then talking about public opinion blithely (or is it breezily?). 

The statement designed to gauge how much knowledge affects judgment is fascinating. Here's the description of NCLB used to assess before/after evaluations:

The No Child Left Behind Act provides federal funds for school districts with poor children in order to close achievement gaps. It also requires states to set standards for education and to test students each year to determine whether the standards are being met by all students. In addition, No Child Left Behind provides funding to help teachers become highly qualified. It also provides additional funding and prescribes consequences to schools that fail to achieve academic targets set by their state.

What's missing includes several specifics that matter a great deal in the lives of students and teachers: the AYP calculations, the sequence of prescribed consequences, the definition of "highly qualified teachers," etc. And the end of the briefing powerpoint provides clear evidence that respondents had strong reactions to the different ways NCLB could be framed. Arguments that NCLB encourages teaching to the test were fairly persuasive, and the most persuasive supportive arguments were general (the identification of schools that need intervention, having curriculum standards, and the possibility of improving the law).

Once again, the American public shows itself to be torn over education reform. Any surprise in that finding?

June 16, 2007

Flubbing the FCAT reality check

Only months after Governor Jeb Bush left office, the facade of accurate test scores has fallen away from Florida's high-stakes accountability system. After a drop in third-grade scores from 2006 to 2007, questions were raised about the rise in 2006 FCAT scores in third grade. With some pressure, the Florida Department of Education acknowledged in the spring that the norming for 2006 was incorrect.

Because third-grade test scores have so many consequences, tied to student promotion, the school grades in Florida's accountability system, and judgments of Adequate Yearly Progress for NCLB purposes, the inaccuracies are not just an embarrassment.

This problem was inevitable at some point; test scoring errors are a periodic news item in the U.S., if not in every state every year. But because we have chosen to base so many consequences on single tests, the consequences of the errors are magnified.

Scoring errors do not have the same type of consequence where the issue is a graduation exam. I am skeptical of graduation exit exam policies, but because states allow students to retake graduation exams multiple times, an error that places one test score too low requires a student to take the retest. The existence of graduation exam retesting takes some of the sting out of the inevitable errors in testing procedures. No such buffer is there for promotion policies that rely on a single test or on statistical school accountability policies.

The expansion of discussion from the technical review to the broader uses of the test is not surprising this year (it would have been last year) and quite welcome.

June 13, 2007

Margaret Spellings Walks into a Bar

Okay, Alexander Russo: I'll bite on Susan Ohanian's request for a punch line.

Margaret Spellings walked into a bar.

Barkeep sighed with relief. "At least she's not going to give another speech about 'razing the bar.'"

June 11, 2007

National Standards as Policy Machismo

Alexander Russo and I agree on National (Yawn) Standards (Again) (his title), regarding last week's CEP report on state proficiency percentage trends, the NCES comparison of state proficiency cut-scores and NAEP cut-scores, and the politics of the double-report week. In a different way, I also agree with U.S. Secretary of Education Margaret Spellings in her dissing of national standards. Same (in yet a third way) with the Education Sector's Danny Rosenthal. And I disagree with all of them.

Russo is right on the politics of national standards: dead for now. He's at his best in pegging the accountability politics, and since that's his focus in the last few weeks, I'll give him a pass for now on where I disagree with him. Spellings is right that the federal government does a better job of collecting data than telling the states what to do. She's wrong that the federal government does a better job of telling the states what to do when it's labeled NCLB. Rosenthal is correct that there is a difference between setting curriculum standards and setting cut scores. He's wrong in asserting that the cut scores are what is important.

The cut-score debate would be a silly one except for the stakes involved in the states and the way that cut scores frame the education policy debate inside the Washington, D.C., beltway. As anyone who has taken elementary statistics should know, dividing an interval scale into several tiers creates an ordinal scale. Whether one labels the tiers Expert, Proficient, Basic, and Below Basic; Red, Orange, Yellow, Green, and Blue; or Venti, Grande, and Tall, attaching values to ordinal tiers doesn't tell us anything about the tiers themselves other than that someone wanted to label them.
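The interval-to-ordinal point can be made concrete in a few lines. A minimal sketch, with hypothetical cut points and tier labels (not any state's actual values):

```python
# Cutting an interval-scale score into tiers yields only an ordinal scale:
# the ordering survives, but the labels carry no information of their own.
import bisect

def to_tier(score, cuts, labels):
    """Map an interval-scale score to an ordinal tier label."""
    return labels[bisect.bisect_right(cuts, score)]

cuts = [300, 500, 700]        # hypothetical cut scores on a 0-1000 scale
scores = [250, 450, 650, 850]
for labels in (["Below Basic", "Basic", "Proficient", "Expert"],
               ["Red", "Orange", "Yellow", "Green"]):
    print([to_tier(s, cuts, labels) for s in scores])
# Both runs assign the same tiers in the same order; only the arbitrary
# labels differ.
```

Whatever the labels, the only recoverable information is which side of each cut a score falls on; nothing about the labels themselves tells you whether the cuts are rigorous.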

Confusing cut scores with rigor is an act of policy machismo, not common sense. "Yo Mama's so wimpy, she's satisfied with Mississippi's cut scores."

June 5, 2007

No one knows NCLB's effects

In its new report, Answering the Question That Matters Most: Has Student Achievement Increased Since No Child Left Behind?, the Center on Education Policy provides the obvious answer: No one knows. There is some evidence of increasing achievement on some [later: many state test] measures, but attributing that to NCLB is foolhardy.

With such ambiguous conclusions, this report will probably be spun harder than an Elvis single. The White House will somehow claim that it completely vindicates the No Child approach, some opponents of NCLB will claim that 5 years after passage, the lack of solid evidence in its favor should tip Congress towards repeal of the more forceful accountability provisions, and the rest of us will wonder why there's a press conference on a non-finding.

(Hat tip: Ron Matus.)

Update: David Hoff's Ed Week article and Eduwonk have different emphases from mine--I guess I'm jaded by reports such as this. Hoff's article does a very decent job of describing the different reactions, but I disagree with the instant-pundit perception that this will shape the NCLB reauthorization debate significantly. I think that's said the same day that every new report is issued, and if all of the pundits were right, we'd have dozens of incredibly influential reports.  But they can't all be influential individually. CEP has more than the usual gravitas, but we'll just have to see...

June 2, 2007

Bankrupt my pants and tags, NCLB news, and old-fashioned American ambivalence

One of my English friends created a not-quite-acronym in spring 2006 that adequately describes the phenomenon of having been too busy to read one's blog roll, a translation from

been away, not catching up on the flist [friends' list, one's blogroll on Livejournal], point me at it if there's anything you need me to see

to

BANCUOTFPMAIITAYNMTS

which he noted reads disturbingly like 'bankrupt my pants.' So in answer to Mike Antonucci's question of where has all the blogging gone, our collective pants are officially bankrupt. I still need to write about the 2006 scoring errors in the FCAT, a story that continues to unfold, but I have a number of other priorities. Or I've Been Away, Not Catching Up on the News Blogging until My Important Tasks Are Gone, Sorry. The acronym of that is BANCUNBUMITAGS, which I am reading loosely as bankrupt my tags.

I'm currently in ChainCafe, trying to finish a conference paper due Monday, after Tropical Storm Barry swept through Tampa. I also have journal editing to do, teaching stuff, union stuff, not to mention trying to spend time with my family. My tags are clearly bankrupt. I hope yours still have some credit.

In the meantime, here is a quick analysis of the story bandied about this week: that the majority of Americans would like No Child Left Behind changed or repealed. Eduwonk correctly points out that "change" is fairly nonspecific. As I've pointed out before, polling over the past few years consistently shows deep ambivalence about who is responsible for addressing educational inequality and achievement. Depending on the wording of the question, you can conclude that the public thinks families are far more responsible than schools for failures of achievement, or the other way around. I'll admit that my reading is idiosyncratic: another explanation is that Americans are fairly clear on what they think and the wording is the issue. I think that puts a little more weight on question wording than is warranted, but your mileage may differ.

May 25, 2007

Holding my fingers back, regretfully

I'd love to type a few hundred words on the major testing story in Florida this week, the Department of Education's acknowledgment Wednesday that they blew the scores on last year's tests. But here's why I haven't:

  • This summer I'm teaching in Sarasota Wednesday evenings, and all that afternoon and evening I was either prepping, in class, or driving between Sarasota and Tampa.
  • Yesterday was the last school day for my children, and I picked up one child at school and then had my beautiful, wonderful children with me for the rest of the day. My daughter decided that she wanted me to drive her to the afternoon martial-arts class instead of go with her mom to the evening class, and my son had baseball practice a little later. (His Little League team is in the county's championship, having won their local league.)
  • I'm on deadline with a paper for the meeting of the Society for the History of Childhood and Youth, "Comparative Educational Attainment Portraits 1940-2002." The paper has to be uploaded by a week from today.

That, plus a few other obligations, has meant that I've chosen not to engage in link sausage or discussion.  And when I discovered this morning that I had the same attainment figures for Venezuelan natives in Venezuela and Mexican natives in the U.S., that gave me some confirmation that I need to concentrate on the task at hand.  (Short explanation: I put a duplicate file of the Venezuela SAS file in the folder where I kept the other materials and mixed them up. All sorted out now.)

(For the 2.5 readers keeping track of my research, this is an extension of last year's Social Science History Association paper, where I tried out some new estimation techniques on U.S. census data. That worked quite well, so now I'm using openly-available historical international census data.)

May 22, 2007

Stop the presses: Conservative blasts merit pay!

Fordham Foundation guy-at-the-podcast Liam Julian criticizes merit pay in a Tallahassee Democrat op-ed. Oh, wait: he's criticizing payments to students based on test scores, not teachers.

Julian argues that the distinction between the two is in the difference between the roles: "Teachers are doing a job and students are receiving a service." Two points here: First, both teachers and students are working, or so I'd like to see. My parents forbade me and my siblings from working during the school year, and they told us, "School is your job." If anything, the argument that students should only have long-term rewards for test scores implies that somehow teachers have shorter memories than students. Second, I'm not sure it's easy to stuff that commodification/reward genie back in the bottle, after Pizza Hut awards for reading, breakfasts and luncheons for honor-roll members, National Merit Scholarships based on test scores, etc.

May 17, 2007

Florida NRT scores

Today, the Florida Department of Education released the scores from this spring's norm-referenced testing in reading and math. The basic story: nothing clear. Some scores went up slightly in terms of percentile rank, some went down, and a number stayed about the same. The spin from the department is that the state's percentile rank is higher than the norming population's. That's probably true of all states using the SAT-10: the Lake Wobegon Effect, to use Cannell's term.

Obvious caveat: I've never seen any educational purpose to norm-referenced testing. The only purpose of norm-referenced testing that I've heard argued forcefully and well is the cynical one, to market your district or state as "above-average."  

April 18, 2007

New podcast for Accountability Frankenstein

New podcast (finally!) at Accountability Frankenstein. It's about 4 minutes long (what will be close to the average length), and it has a trivia contest. Go, listen! You might win a copy of the book if you can answer the question correctly!

April 15, 2007

The politics of cut scores

Yesterday, the St. Petersburg Times published Letitia Stein's story describing the twists and turns of setting cut scores on Florida's 10th-grade tests. The deeper story is not the inherent tensions in setting cut scores. There's no doubt that cut scores are arbitrary. The question is whether they are arbitrary in the sense of arbitrary and capricious or arbitrary in the sense of arbitration.

In the end, it's the use that matters. To distribute scarce resources (e.g., interventions), one could justify cut scores. But when careers and diplomas are on the line, the thresholds are far harder to justify.

Update: The original post was written quickly, and one issue I forgot to note was the article's taking the norm-referenced test at face value.  In Florida, students take a test that is supposed to be aligned with state standards, but they also take a few subtests of the Stanford Achievement Tests in math and reading.  Note the word few there; one should be very hesitant to take changes (or stability!) in aggregate norm-referenced test scores at face value in any case, and using a few subscales for this purpose is ... well ... hard to explain.

March 30, 2007

Diane Ravitch: Feds should do only what they can

Diane Ravitch's Huffington Post entry last Sunday is both a solid description of what NCLB requires for accountability and a sharp criticism.  The money quote:

In the future, the federal government should do only what the federal government can competently do. Its historic role has been three-fold: one, to collect and disseminate information about the condition and progress of education in these United States; two, to write checks to help schools educate specific groups of students, especially those who are poor and have disabilities; and three, to enforce civil rights laws.

The comments tend to agree with Ravitch but with more extreme rhetoric.

March 24, 2007

Accountability Frankenstein is now printed!

Copies of Accountability Frankenstein appeared in my mailbox this afternoon.  Hurrah!  You can buy it directly from the publisher.

(Yes, it's available at Amazon.com, but for small publishers, I understand that Amazon is much like Wal-Mart, and I want to make clear that I'm putting the link in for convenience, not because I prefer you use Amazon.  Certainly, the publisher would prefer you use his site!)

March 19, 2007

News item: Oregon Department of Education falls off continental shelf

According to a quick Google News search, no paper outside Oregon has picked up the story about the testing disaster in the state. Not Ed Week, nor any of the metro dailies.  Wow.

For the reporters who read this blog (all 1.5 of you), um, er, ... isn't the contracting controversy and the massive disruption enough to sell the story to your editor?

Emory talk April 4 on accountability and expertise

I'll be in Atlanta in a few weeks to talk about School Accountability and the Problem of Professional Expertise. The specifics:

11:30am -1:00 pm
Room 206, Tarbutton Hall
Emory University (map)

Sociology, History, and Educational Studies are sponsoring my visit, and I'm delighted to have the chance to talk about my work.

March 18, 2007

Oregon, paper-and-pencil testing, and legal action

The Oregon Department of Education's March 14 statement makes clear that the rumors of the disruption of online testing as a negotiating tactic by Vantage Learning have some solidity. As Deputy Superintendent Ed Dennis testified on Thursday, after months of conflict over contractual issues, the technical problems mysteriously appeared right after the state department of education essentially said, "No. Finito.  We have another vendor we have a contract with.  That's it." So the department faced a choice between not having testing (losing millions of dollars of federal funding) and scrambling to conduct paper-and-pencil testing. Or caving in to the company's ... well, it's not blackmail or greenmail.  Maybe blue-screen-mail?

Oh, yes, and there's now a lawsuit that the state has filed against Vantage.

March 17, 2007

Oregon moves back to paper-and-pencil tests

After the disaster with Vantage Learning's online testing system, the Oregon Department of Education has decided to return to paper-and-pencil testing for the rest of the year. This change is going to be disruptive; how disruptive, and what the effects on accountability measures will be, I don't know.

March 15, 2007

Republicans are revolting

Please don't make the obvious joke: Mel Brooks had the best shot at it many years ago. But the Washington Post story on Peter Hoekstra's NCLB-emasculation bill is a telling indicator of the president's declining moral and political authority within his own party. In 2000, he convinced voters he was a "compassionate conservative," and his (deserved or undeserved) claim to being an education reformer was a critical part of that image.

At this point, he has convinced voters and members of his own party that he lacked the other, crucial C: competence. Within the party, that's a matter of political competence. For GOP members of Congress, the fallout is that they are paying far more attention to their constituents than they are to the White House.

March 14, 2007

NCLB "just like a communist country"?

In today's Washington Post story on NCLB and the 100% proficiency goal, Peter Hoekstra (R-Mich.) is quoted with the inevitable comparison to outlandish claims from the Soviet era:

"It's just like a communist country saying that they used to have 100 percent participation in elections," Hoekstra said. "You knew it wasn't true, but a bureaucrat could come up with that answer. And that's what will happen here."

I thought Hoekstra would say something about Five-Year Plans, the Great Leap Forward, and so forth, but instead it's about electoral participation.  Sheesh, guy: If you're going to try the rhetorical roundhouse punch, do it with gusto.  This wimpiness is obviously why Republicans lost Congress in November.  (No, it's not, but I refuse to let this good line go unused.)

My concern with the debate as portrayed in the Post story is that it's all black-and-white rhetoric: either our opponents are unreasonable or they are lowering expectations. It's easy to say that we want students to have a "world class education" (whatever that is) or to be "proficient" (whatever that means), and it's tempting to say that instead we should be rewarding "growth" (whatever we decide that might be).  Nowhere in the debate is the hard work of deciding what we should expect from students.

Here's an exercise for the reader: take the best work from a sample of graduating students at the nearest high school. Get 15 members of the community (some educators, some politically involved, some small business owners, others), and have them look at the work and then answer the following questions:

  • Is the school expecting the right things from students?
  • Are students meeting those expectations?
  • If the answer to either of the first two questions is no, what should happen next?

I expect that the discussion will be long and interesting, but not easy.

March 10, 2007

Electronic Oregon testing system "crashes and burns"

In Oregon, the online system of testing students broke down this week. The Associated Press's Aaron Clark reports that the problems may be part of hardball negotiating tactics of Vantage Learning while it tries to wrest what it can from the state on the contract that's ending this year. One Oregon teacher described the situation as "crashing and burning all over the state."

If there is documentation that the problems are deliberate decisions of Vantage Learning, I suspect the company will have a very hard time getting contracts. In fact, even if the problems aren't deliberate, the company is in serious trouble in terms of documenting its capacity.

This story underlines the reality of testing for the last 40 years or more: State departments of education don't create tests.  They manage contracts with outside companies.

Update: The Oregon Department of Education has set a deadline of March 13 for making a decision about how to proceed.

Accountability Frankenstein: the podcast

The Accountability Frankenstein podcast is now online with its first episode (a reading of the book preface, with the permission of the publisher). Successive podcasts will be released approximately twice a month for the next half-year or so (possibly longer).

March 6, 2007

Enron and the social meaning of cheating

Kevin Carey's entry today at The Quick and the Ed references a comment I made yesterday about Enron and WorldCom, in a discussion about test preparation. I'm not sure if he realized he was taking my reference out of context, but he makes an important argument, even if it's only half of the picture.


  • Taking my comment out of context: This is a relatively minor point. I was arguing that not all judgments are amenable to representative data collection, or require such data before we become concerned about something (in this case, test prep; in 2001, corporate fraud). Carey took my mention of Enron and WorldCom and went off in a different direction...
  • An important argument that Carey makes is that the popular demand after the corporate accounting scandals a half decade ago was to insist on more accountability. That's true. However,...
  • It's only half of the picture: The "corporate accountability" rhetoric since 2001 has not been about making corporations more responsive to short-term investor demands but about addressing the gaming-the-system issue. Sarbanes-Oxley does not change the requirement that corporations report revenues and other financial information truthfully and transparently, but it adds additional teeth.  Sarbanes-Oxley also does not intensify any of the investor-corporation dynamics that Enron and WorldCom were responding to. So while the rhetoric was about more accountability, it didn't change the larger picture and wasn't the same type of accountability we think of in education. (For the record, I'm all in favor of accuracy and transparency in whatever data is made available publicly.)

As far as I'm aware, apart from the different argument I made, the only reference to Enron and Worldcom fitting Carey's description is in Collateral Damage, and it's in a passage where Nichols and Berliner are discussing the difference between acknowledging a social phenomenon and excusing it. A Google search on several terms brings up exactly two pages, though maybe different keyword configurations might bring up others. 

And without the detritus of the trope, the question remains: apart from investigating specific incidents, what is the social meaning of cheating and gaming the system?

March 5, 2007

Some typical responses to concerns about test-prep

Both Eduwonk and This Week in Education are minimizing the concerns over test-prep that are illustrated by the Washington Post "bubble kids" story over the weekend. Eduwonk (aka Andy Rotherham) calls it "hand-wringing and whining," and TWiE (aka Alexander Russo) says it's essentially revisiting the issue "whose scope and depth and negative impact remain not entirely clear or documented in this story."

For several reasons, I'm reading those as honest responses rather than spin. I don't understand the minimization, but I think they come by it honestly. On the other hand, there are several political reasons to pay attention to the issue. First, the "this is not a problem" stance wears thin pretty quickly when the reality of parents' and kids' lives looks different. Reformers stop looking like reformers when they stop trying to capture problems and own the solutions to them. As far as I'm aware, no strong NCLB advocate has attempted to suggest solutions to the proliferation of test-prep. The only one I know who has acknowledged the problem is Diane Ravitch, to her credit.

The second political reason to pay attention to the issue is the forthcoming arrival of an important book on the topic. No, I'm not talking about Accountability Frankenstein (though I'd love to be proven wrong about that) but Collateral Damage, by Sharon Nichols and David Berliner. Apart from the various surveys of teachers cited in the book, it includes voluminous documentation of anecdotes.  The plural of anecdotes is not representative data, but there are enough concerns over the past 5 years that we can say those who ignore test preparation and other side-effects of high-stakes testing are ignoring reality

... unless any of those happened to say that the fraud at WorldCom and Enron wasn't a reason to be concerned about corporate misdeeds. Then at least they can say they were consistent.

Update: Eduwonk updates his entry to write "it's an issue but my point is that it's not inherent in the policy."

I agree with him that it's possible to have a school in a high-stakes system that doesn't have weeks of test-prep, and at some level it's an administrative decision to respond to pressure in that way. On the other hand, the combination has led to widespread dysfunctional behavior, and I'm not sure it's fruitful asking whether it's high-stakes accountability or the underlying system behavior that's "responsible" for test-prep. That's sort of like asking whether it's the ammonia or the bleach that's the cause of the fumes.

March 4, 2007

The bubble kids

Daniel de Vise's story in the Washington Post today is an illustration of high-stakes testing triage, describing what happened in January in Wood Middle School in Rockville, Maryland:

Principal Renee Foose told teachers to cross off the names of students who had virtually no chance of passing and those certain to pass. Those who remained, children on the cusp between success and failure, would receive 45 minutes of intensive test preparation four days a week, until further notice.

Jennifer Booher-Jennings has documented the same behavior in a school in Texas, along with language I thought was only my invention ("bubble" referring to "teams on the bubble" for the NCAA men's basketball tournament selection process), and these are only the best-documented cases of what happens across the country.

Note what did not happen: kids' being targeted for instruction based on need. Instead, only those "on the bubble" received extra attention, and it is very clear that in this school, triage was for test-preparation purposes.
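For readers who want the triage logic spelled out, here is a minimal sketch of the band-around-the-cut selection the story describes. The names, scores, and thresholds are hypothetical, and the schools in question used teacher judgment rather than a formula:

```python
# Educational triage: only students predicted to score near the passing
# cut ("on the bubble") get extra test prep; everyone else is crossed off,
# regardless of instructional need.
def bubble_students(predicted_scores, cut, band):
    """Return students whose predicted score is within `band` points of `cut`."""
    return sorted(name for name, score in predicted_scores.items()
                  if abs(score - cut) <= band)

predicted = {"Ana": 62, "Ben": 41, "Cruz": 58, "Dee": 95}
print(bubble_students(predicted, cut=60, band=5))  # → ['Ana', 'Cruz']
```

Note that Ben, the student furthest below the cut and arguably most in need, is excluded: that is exactly the perversity of test-score triage.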

Making sure that the testing environment is distraction-free?  Sure. Making sure that kids are familiar with the test format? Absolutely. Spending 3 hours a week on test preparation unconnected with the curriculum?  Nuts.

February 21, 2007

At least this test-preparation didn't steal instructional time...

According to the St. Pete Times blog The Gradebook, a small-town Florida elementary principal and several of her teachers prayed for student success the Friday evening before the state writing test early this month.

Then they anointed student desks with prayer oil.

Apart from the church-state issues involved—praying for your students on your own time is constitutionally protected, but I suspect that distributing prayer oil on the desks of students who are from different religions or no religion isn't—I know plenty of Catholic educators who believe in the power of prayer but don't think that prayer is a substitute for instruction.

February 20, 2007

Politician's logic on school reform

According to the St. Pete Times Gradebook, the Florida House education committee chair is interested in mandating single-sex classrooms and uniforms for low-rated schools: "Something has to be done to improve D and F schools. What we're doing now is not enough."

As fans of Yes, Prime Minister's Antony Jay would recognize, this is politician's logic: Something must be done; this is something; therefore, we must do it. Even when there is solid research showing that uniform policies are only symbolic.

February 14, 2007

Parsing growth and grossing the Parthenon

Kevin Carey criticizes Leo Casey's take on growth measures to evaluate teacher effectiveness.  Casey cited a 2003 RAND Corp. study which cast doubt on the use of student-achievement growth measures to evaluate teachers (something pushed by the Aspen Commission).


Carey makes two points:

  1. How to use the imperfect data tools we currently have available is a policy decision. To Kevin Carey, this type of decision-making includes issues such as the acknowledgment and discounting of technical flaws (in multiple possible meanings of discounting).
  2. One possible reason for the lack of evidence of growth models' ability to be used to judge teachers is resistance to their use in the U.S., except for Tennessee and a few other jurisdictions.

I completely agree with #1. It is within policymakers' authority to tangle with the technical details of policy and the implications of those technical details. Not only do I have no problems with this claim, but I argue that point in Accountability Frankenstein. But the authority also implies responsibility to do so, and I hope Carey understands that placing the marker down at this point means that he'll be holding policymakers to making reasonable judgments on those technical details: No hand-waving and displacing responsibility onto invisible bureaucracies, right? Of course, I doubt he or anyone else can point to any legislature that has set a cut-score for any graduation or teacher competency test... or bar exam, electrical contracting exam, general contractors' license exam, etc. No, I'm not arguing that legislatures should really do that, but Carey's point is all on theoretical authority and very little on acknowledging the fact that legislatures generally do displace responsibility for technical details.

The second point is something I'm going to quibble about. Yes, Tennessee has had something called "value-added assessment" since the early 1990s, but I have yet to see any evidence that Bill Sanders' system consistently distinguishes anything more than a small proportion of teachers from the vast, vast majority (as either good or bad), and that's even assuming the validity of the TerraNova test results in Tennessee. Sanders acknowledges as much; it's partly an artifact of any multilevel modeling (which tends to swallow a good portion of the variance originally in the data).
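The variance-swallowing point can be seen in a toy simulation (with illustrative numbers I chose for the sketch, not Sanders' actual model or Tennessee data): the empirical-Bayes shrinkage used in multilevel models pulls noisy teacher estimates toward the mean, so only a small share of teachers end up statistically distinguishable from average.

```python
import random

# A minimal sketch of shrinkage in a value-added-style model.
# All parameter values below are illustrative assumptions.
random.seed(1)

n_teachers = 500
true_sd = 5.0    # assumed spread of true teacher effects (test-score points)
noise_sd = 5.0   # assumed sampling noise in an observed class average

true_effects = [random.gauss(0, true_sd) for _ in range(n_teachers)]
observed = [t + random.gauss(0, noise_sd) for t in true_effects]

# Shrinkage factor = signal variance / (signal + noise variance);
# each noisy estimate is pulled toward the overall mean by this factor.
reliability = true_sd**2 / (true_sd**2 + noise_sd**2)
shrunken = [reliability * x for x in observed]

# Posterior SD of a teacher's effect after shrinkage.
post_sd = (reliability * noise_sd**2) ** 0.5

# Call a teacher "distinguishable" if the shrunken estimate sits more
# than two posterior SDs from average; only a small share will qualify.
flagged = sum(1 for s in shrunken if abs(s) > 2 * post_sd)
print(f"shrinkage factor: {reliability:.2f}")
print(f"distinguishable teachers: {flagged} of {n_teachers}")
```

Even with noise no larger than the true spread of teacher effects, the flagged group is a small fraction of the total, which is the pattern the blog entry describes.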

The "resistance" point only makes sense if you're restricting us to the U.S., since the U.K. has been attempting multilevel modeling of longitudinal achievement much longer than anyone in the U.S. Go ask Harvey Goldstein what he'd say from the U.K. experience, or read his papers, such as Using Pupil Performance Data for Judging Schools and Teachers (PDF). Basic point: There's still little evidence that growth models are the holy grail either for school-level or teacher-level accountability. (Credit Goldstein for using "holy grail" to refer to various fantasies of growth-model advocates.)

Extra credit to anyone who knows why I used "Parthenon" in the title of an entry referring to Tennessee, apart from the obvious spoonerism.

The "tough love" talk begins on Aspen Commission

The spin has begun, with Diana Schemo's NYT article Tougher Standards Urged for Federal Education Law, Leo Casey's pointing out that growth models can't accurately measure teacher contributions to student achievement,  Michele McLaughlin's identifying problems with the commission's way of framing teacher effectiveness, Andy Rotherham's saying it has plenty of small actionable ideas, and Kevin Carey applauding it for pushing the idea of changing the way teacher quality is defined.

Those of us peons who didn't get access to the report until yesterday's release and who have day jobs (though I started my drive down to the Sarasota-Manatee campus at 6:20 to get here in time for a search committee meeting) will have to digest it in chunks. At least my prediction of a recommendation to include growth models came true, but that wasn't much of a risk. Here are the themes I've identified in the report:

  1. Not enough: That phrase or variants of it pop up repeatedly, with the implication that while NCLB had great goals, neither all of the framework nor the implementation was great.
  2. Effective: This is the most obvious theme that is targeted within a specific area (effective teachers and principals).
  3. Knowledge and tools: I've only spotted this phrase once as a unit, but the idea is sprinkled throughout, especially with the explicit push for formative assessment.

What's missing?

  • Any way of addressing teaching to the test, test preparation, and other unintended consequences. The introduction briefly discusses concerns on p. 19 and then quickly dismisses them with a straw-man argument about rumors (on the bottom of p. 19) and displacing the responsibility away from NCLB and accountability policies in general. The failure to address these unintended consequences is a huge missed opportunity for the Aspen Commission to gain "classroom cred" on the realities of high-stakes accountability.
  • A discussion of the set-aside requirements for schools identified as "needs improvement." In at least one set of recommendations (on tutoring), the commission's recommendations would actively make the problem worse, forcing school districts to waste the funding held in the set-aside through an entire year.

I'll have plenty to say about this report as I chug through it. 

February 13, 2007

Commission Withholds Final Report for NCLB Reauthorization - Aspen Institute

I know that's not the title of the webpage, which properly is Commission Releases Final Report...  Nonetheless, when the livecast link is dead, ... (I've left messages with two staff members and hope that they can correct the technical problem for the video archive.) 

The report is available online (though not on the Aspen Institute site).

February 12, 2007

Tomorrow's news

Looks like the Aspen Institute's NCLB commission will be presenting its recommendations tomorrow in a live webcast, though I'm not sure if the 9:30 a.m. start time is EST or MST. Update: In the comments, Aspen Institute staff note that the starting time is 9:30 a.m. Eastern time, right at the end of a class for me. Do I put the webcast on at the end of class and invite students to watch?

What I do find interesting is the fact that the key stakeholders who will be present are the chairs and ranking members from both relevant Congressional committees.  No signs of a White House presence at the event. 

My prediction: One recommendation will be for a growth-based AYP, but that's a pretty safe prediction. See my stated concerns about growth models as a Holy Grail for accountability. I also suspect the recommendations will mention flexibility on what happens after identification of schools as needing improvement. Beyond that (especially the Title I set-aside), I have no idea.

February 7, 2007

First page proofs of Accountability Frankenstein corrected

I misunderstood some of the directions on sending back corrections of page proofs, so finishing the first set of page proofs took a few extra hours I didn't anticipate today.  But I've sent the corrected PDFs back to the publisher, so Accountability Frankenstein is one more step to publication.

Ugly arguments against NCLB

There are plenty of ways I can criticize NCLB and its implementation, but to whine that it drains resources for the gifted is one of the more disturbing arguments I've read (and today's story by Joseph Berger isn't the first time it's appeared in the New York Times).  Particularly wince-inducing passages...


Even critics of No Child Left Behind say there is no educational goal more important than helping the nation's poorly performing students read and calculate competently. But in a world of scarce resources, a balance has to be struck so that programs for the gifted are not frozen out. After all, many students nurtured by such programs will one day concoct the technology and dream up the ideas that will keep America competitive.

Apart from the blatant editorializing (which source said "a balance has to be struck"?), is there any evidence that adults who were in gifted programs years earlier are the primary source of tech innovation and that, to the extent that they are, it was the existence of gifted ed that's responsible?

Michael J. Petrilli, vice president for national programs at the Thomas B. Fordham Foundation, which supports educational research, said cuts in programs for the gifted hurt "low-income children with tons of potential who may not be getting the attention they deserve."

First, the wording above suggests that Fordham is like the Spencer Foundation (which really does fund education research), but Fordham is a think tank. Second, Petrilli implies that if gifted programs weren't cut, they'd be serving millions of poor students. No, they wouldn't: gifted programs serve a very low percentage of students, and the vast majority of "low-income children with tons of potential" are outside elementary and middle-school gifted programs. The better bet for advancing these children's interests is to improve general academics, not nurture boutique programs for a few.

Survival of gifted programs is not just a matter of money; they have long been a target of complaints that they are elitist, and violate the bedrock egalitarianism that created public schools in the first place.

Yes, Joseph Berger is right on the general criticism, but the history is off. I'm not Paul Violas, but to claim that schools were founded entirely on an egalitarian ethic ignores much of the historical research on the topic over the past 40 years.

Lost in the debate, champions of the gifted say, is that exceptional intellects need hand-holding as much as those below average, that they get restless or disheartened working with material they long ago conquered. Jane Clarenbach, public education director of the National Association for Gifted Children, said research shows that 20 percent of the nation's three million gifted students will drop out before graduating from high school.

There is a grain of truth here hidden by dunes of slipshod reasoning. The grain of truth is that there are plenty of children in school who are bored because they face no challenge at the moment, and some proportion of them get into trouble as a result. My spouse calls this group "Devil's workshop children," and we've known a few. An absolutely legitimate purpose of any gifted education program is to identify those children and make sure they don't have idle hands. (For those who know special education, this redefinition would be the gifted version of response to intervention eligibility criteria.)

But that's just a grain of truth. One fundamental problem with the "gifted kids will drop out if we don't give them extra services" argument is that programs devoting resources solely to students labeled gifted through so-called IQ and other testing commonly concentrate on the elementary and middle-school years, long before anyone drops out of school.

Then there's the argument I made earlier: the better route to serving these students (and all others) is by improving the general education curriculum so that no one is bored or alienated.

Nancy Eastlake, coordinator for West Hartford's gifted programs, points out that so-called pullout programs are often criticized "as fluffy activities." Yet, she argued that "when you have children research a topic of great personal interest, that's solid, good learning."

Yes... and students outside the gifted programs don't want to learn about a "topic of great personal interest"?

I understand the parents' dilemma when gifted-education programs exist: do you hold your individual child's interests hostage to the larger principles? In general, the answer is going to be no. I know enough relatives who have been in gifted programs or have placed their children in gifted programs to understand the reasoning.

But many of the opportunities that draw parents into such programs should be available more broadly. One of my daughter's best friends is in all advanced coursework this year, but she was not in a gifted program. Another friend from elementary school (also not in its gifted program) shifted to advanced math in middle school. ("About time!" was my thought at the time.)

Irony: This story appeared one day after the release of the latest statistical report on the nation's largest challenging general-curriculum program, Advanced Placement testing. While I do not think AP programs are the be-all and end-all of academic challenges, their recent history demonstrates that a school can open up challenging opportunities by having counselors broaden rather than narrow the funnel in their gatekeeping role.

In summary, critics of NCLB need the "NCLB hurts gifted ed" argument like we need Charles Murray's "help." And now, if you'll excuse me, I'll return to correcting the first page proofs of Accountability Frankenstein.

Update: See my defense of boutique education, including gifted education, written a few days after this entry. Can't say I'm not finessing things...

February 4, 2007

Cheating/test-prep in Dayton, Ohio, school

It happens every so often: we hear about administrators or teachers who copy items from secure tests and then teach students using those items. (See several item comparisons.) In this case, the alleged miscreant was an administrator of a charter school who passed the items on to a consultant who did the teaching and later fired an underling who identified the problem. This behavior isn't isolated to charter schools, and as Sharon Nichols and David Berliner note in Collateral Damage (to be released in March), there is a range of "cheating" responses to high-stakes pressures. Some of the cheating is clearly self-serving (as in the case in Dayton, if the evidence bears out the allegations).

But all of the non-curriculum responses undermine the trustworthiness of test scores as indicators.

February 2, 2007

In B'ham, 2

Here is why you present material at a conference: I write a book and at the end of the process, there are inevitably some loose ends.  I go to a conference, present the material with some of the loose ends and say, "Hey, here's a loose end.  Anyone have an idea of how to tie it up?"

And this morning two people did.

I also found a way to end my talk with the words, "football will save American education." Since I'm in Alabama, I took it as a challenge to see whether I could do that ethically, truthfully, and artistically. Gauging by the reaction, I think I succeeded. But it was the feedback that was more valuable.

February 1, 2007

Off to Birmingham!

I'm off to Birmingham today for the Southeast Philosophy of Education Society meeting, where I'm talking about expertise and testing. This is a more laid-back academic meeting, and it's a trip where I get to meet one person I've corresponded with for a while but have never seen face-to-face. Should be interesting, and I'll report on the results!

January 25, 2007

The "Not Ready for Prime Time" NCLB proposal

Diana Schemo's article, Bush Proposes Broadening the No Child Left Behind Act, is a sequel to the many front-page stories about how the president spent half of his State of the Union address Tuesday talking about education and ...

Oh, yeah. 

I'm not sure what to make of the Bush trial balloons. (Better to try balloons than Scooter Libby, maybe?) Probably the most irrelevant was the proposal for a rewritten NCLB to allow the trumping of collective bargaining agreements. It's irrelevant not only because the Democrats control Congress but also because some states have constitutional provisions protecting collective-bargaining rights of public employees.  (No, I don't know which, though Florida is one of them.) And I don't know if the U.S. Department of Education has any lawyers on staff these days, but my best understanding is that the federal government can't use the power of the purse to rewrite state constitutions.

Vouchers are also dead for this session, and if anyone in the White House cared about charter schools, they wouldn't have mentioned the charter-cap issue, since they should know darned well that anything possibly smacking of privatization would be tainted by White House endorsement at this point.

My complete gut-level instinct here, which is probably wrong? The White House staff doesn't care a whit about education, and after writing a few lines in Tuesday's address (and nothing about higher education), they told Margaret Spellings that she could push whatever she wanted to.

And I think she just dropped the odds of NCLB reauthorization this year down to longshot status.

January 9, 2007

BCS and NCLB

Right now, the teachers who work at Boise Senior High School have to be some of the most anti-formula individuals in the whole country. First, they're dinged on AYP because the school missed Idaho's targets for one population group (students with disabilities). Now, their beloved local Boise State Broncos are the only undefeated Division I-A football team in the country, but they're not the recognized national champion because the BCS formula gave the University of Florida (and several other BCS-conference teams) a statistical advantage, leaving the University of Florida as Ohio State's matchup in the national championship game last night. Florida trounced Ohio State, and Boise State ended up ranked 5th or 6th (depending on the poll).

So what's the connection with NCLB?


Here's the money quote from Dan Wetzel:
There is no way, no formula, no mix of opinion polls and computers that ever consistently can select the top two teams... No matter how hard it tries, this championship system turns up paper contenders as often as not (Oklahoma 2004, 2003; Nebraska 2001).

As every college football fan from 5 to 95 knows, the BCS formula is not only rigged for certain conferences but inherently arbitrary. The defenders of the system pretend that having a statistical formula means that the system is objective, and it certainly puts out a number. But the existence of a set of numbers does not mean that the formula is the best way to resolve a national championship in a fair, independent manner.

That fact is why most sane individuals—those who aren't tied to the most powerful conferences—favor a playoff, because in sports the way you have an independent judgment of which team is better is to play the game. Only in chess tournaments and Division I-A football do you decide who is the best by asking your TI-93.

One of the problems with formula-based accountability systems is the conflation of statistics with independence from conflict of interest. There is a tremendous need for mechanisms that hold schools accountable in ways they would not be unless someone were looking over their shoulders. That's a requirement for accountability independent of the political and other interests of school systems as organizations and collections of people.

Currently, high-stakes accountability addresses the need for independence through formulae: insert numbers, retrieve judgment. The argument in favor of such an approach is that it removes the inherent conflict of interest in having educators judge their own work.

The problem with this argument is that it assumes that statistics are the only way to fashion an independent judgment of school effectiveness. In other walks of life, though, we don't require statistics to remove the conflict of interest from judgments. In sports, you play a game, and the referee or umpire is the neutral dispute-resolution mechanism. (Unless you're into Fantasy Baseball, but I'm not talking about cults here.) In law, you go to court, and a hearing officer, arbitrator, or judge makes a judgment.

The key word here is judgment, that sometimes ineffable quality that allows humans to synthesize information and make a decision. In sports and courts, statistics are tools but not trump cards. Baseball managers have a wealth of statistics at their command, but I will take the late Billy Martin's judgment over any crude number-cruncher today. Judges will hear testimony from experts wielding tables and graphs, but the decisions tumble from their computers as a stream of words, not charts.

I have not only a history degree but a master's in demography, and I am not denigrating the value of well-crafted measures (not that most high-stakes test statistics deserve that description). But statistics cannot replace thought, and I am afraid that we have seen that in school accountability policies.

At least football is just a game. But No Child Left Behind's adequate-yearly-progress standard, a brain-dead mechanism that analysts knew was a problem in 2001? That's federal law.


What is it about the beginning of my semester that a Kappan article by Rick "Tough Love" Hess and Andy "Eduwonk" Rotherham is dangling in front of me while I have a gazillion things to do? World, will you stop putting these temptations in my path?

January 2, 2007

Boundaries, agendas, and meta-narratives

Kevin Carey has an interesting discussion about policy perspectives and POV boundaries in the context of a broader discussion about the role of teacher unions. (Minor point here: to his good, bad, and good and bad perspectives on unionization, I'd add a fourth: look at the d***ed specifics. Also see Michele McLaughlin's response, which I'll just respond to as editor of Education Policy Analysis Archives: Hey, submit stuff for peer review here! Disclosure: I'm a union member affiliated with both the NEA and AFT as well as an education [and maybe even an educational] historian.)

Carey's looking at it from a policy wonk's (and think tank staffer's) perspective: how do you move ideas?  In the long term, you try to reshape political agendas, and Carey's argument about pushing perspective boundaries around is about agenda shaping...


... which brings me to two political books on NCLB published in 2006, Paul Manna's School's In and Patrick McGuinn's No Child Left Behind and the Transformation of Federal Education Policy, 1965-2005. Despite the fact that schools are part of the unrecognized welfare state in the U.S., education politics have gotten precious little attention from grand(ly)-theorizing political scientists. I'm an historian, not a political scientist, but I think Jennifer Hochschild's The New American Dilemma (1984) and Ira Katznelson and Margaret Weir's Schooling for All (1985) were the last books that took school politics as important, serious evidence about American political structures. Manna and McGuinn's books should end that drought and spark interesting dialog.

To put it briefly (and do great violence to their arguments), McGuinn's and Manna's books are part of ongoing arguments about what shapes agendas, something that has been challenged/reworked by Frank Baumgartner and Bryan Jones's Agendas and Instability in American Politics (1993). McGuinn argues that NCLB came about with a change in policy regimes, which I read as a dominant meta-narrative about policy. To him, federal policymakers were finally fed up with state intransigence on accountability in the late 1990s, and members of both parties were happy to jump on board the NCLB bandwagon, an event that would have been unthinkable 7-8 years before. To McGuinn, the underlying story about education policy shifted over 7-8 years, a change that involved partisan politics as well as the arguments of key players in Washington. McGuinn's focus is at the national level, and most of his evidence is there.

In contrast to McGuinn, Manna explicitly focuses on the interrelationship of federal and state actors, and as a result his story is different. To him, states were active in the 1990s, and they were willing to borrow strength from the federal government in building an agenda and let the feds borrow it from them as well, either in the political rationale for action or the capacity for action. So to Manna, NCLB represents the hidden strength of governors, subtly letting the federal government claim all sorts of honors as long as it served their purposes. The reverse is true, at least in theory, but Manna tends to write his story from a state POV, while McGuinn's POV is clearly at the federal level.

Each book has some strengths in terms of detail. McGuinn's interviews with selected key federal actors provide retrospectives that I don't think you'll get anywhere else. The description of the AYP-definition train wreck in 2001 is Manna's surprise contribution. But the larger clash is one of levels of government and emphasis on meta-narratives vs. initiative. McGuinn's eye is on the federal level, while Manna's is on the interplay between federal and state. McGuinn focuses on the policy regime (what I think of as meta-narrative), while Manna focuses on who has the initiative in agenda-setting.

Each book also has flaws I found irritating. For McGuinn, the national teacher union affiliates are shadowy figures who are recalcitrant, anti-reform, anti-accountability, but he never provides any details, even though he had an NEA lobbyist as an informant. In McGuinn's account, Shanker's activism in the late 80s and most of the 90s was invisible, Bob Chase didn't exist, and he must not have asked his NEA interviewee any hard questions. For his part, Manna's depiction of education's importance at the federal level relies on one of the more trite types of political-science evidence: counts of words in presidential speeches. Someone looking at both books would wonder why Manna failed to look at legislation (which McGuinn at least touches on in some depth, even if he ignored the issue of classroom space from the 1950s). In a book devoted to the interplay of different levels, that odd reliance on symbolic speech is... well, odd.

One last thing: Neither discusses the other's ideas much, though I suspect they know of each other's work (McGuinn had read Manna's dissertation, at least). I would love to get both of them in a room, have them talk about the issues, decide what things they really disagree on and why, and get the recording online. But both books should be required readings in education policy programs, in part for the substantive background on NCLB and in part for their very different and interesting uses of federal education policy to illuminate political dynamics.

December 28, 2006

Preface passage: accountability and test contracts

My job here is not quite done: I need to write the acknowledgments and complete the marketing questionnaire. But the rest is done. Done. Done! (at least until the copyeditor comes back with 1001 queries)

The following is a short passage from the preface. I scrolled down and up fairly randomly and then snipped a bit from the paragraph:

Many states established a link between testing and school accountability by the end of the 1970s, though the local history varied.... When creating testing programs, state departments of education knew they did not have to write or administer the testing program itself, because the testing industry already existed from the decades-long relationships with local school districts. State officials thus became and remain contract managers, overseeing a private company assigned the responsibility of creating, printing, and scoring tests.

There will be more stuff on the book's themes in the near future, but that's enough for now. 

December 27, 2006

Down to the scutwork

I think the last substantive revisions are done. Left to do:

  1. Write a paragraph of acknowledgments at the end of the preface.
  2. Check the last two chapters, the last ones that I haven't yet, for awkward phrasing (such as unnecessary, the stuff you don't need, parenthetical comments).
  3. Check references against the text.
  4. Complete the marketing questionnaire for the publisher.

Then it's time to upload the manuscript and questionnaire.

December 25, 2006

Down to the last few tasks...

I hope everyone who celebrates Christmas is having (or has had) a wonderful day. For those who aren't but keep the Julian calendar system, happy Isaac Newton's birthday! For those of us who operate on the Gregorian (modern) calendar and who don't celebrate Christmas, have a great day anyway. (For most of us, then, Newton's birthday will fall on January 4.)

It's raining in Tampa and for me a day to finish off this head cold (I hope!) and knock off a few more tasks for the book. The current to-do list:

  1. In chapter 5, expand section on the outcomes of high-stakes accountability
  2. In chapter 5, tie child-saving better to the overarching argument, perhaps switching it to appear later in the chapter
  3. Cite in-press MS and other materials on consequences of high-stakes testing; find Jones, Jones, & ?.
  4. Insert more from Frankenstein itself
  5. Check references against text.
  6. Complete marketing questionnaire

If you're comparing this to the prior entry, I've completed several pedestrian note-checking tasks and still have one substantial chunk of writing left along with one organizational issue, two polishing tasks, a required bit of drudgery (which will wait until I can get the 67-page references section printed at work tomorrow), and the task that most academic authors stink at (can I remember everyone and everything news of this should go to?).

December 24, 2006

Expertise tackled; the world is next

Whew! I've split up the former chapter 2 into two chapters, expanded the analysis of expertise and the political drives for test-score accountability in the new chapter 2, and polished the new chapter 3 a bit. The rest of the revision checklist is now much narrower:

  1. Change chapter summaries in preface and new chapter 6
  2. Expand section on the outcomes of high-stakes accountability in chapter 5
  3. Cite in-press MS and other materials on consequences of high-stakes testing; find Jones, Jones, & ?.
  4. Insert more from Frankenstein itself
  5. Explain technocracy in the preface
  6. Be clearer on how the goals follow A Nation at Risk in chapter 4 section, "From expectations of schools to..."
  7. Better citations for historical material
  8. Check that each excerpt is < 150 words
  9. Complete marketing questionnaire
  10. Tie child-saving better to the overarching argument in chapter 5, perhaps switching it to appear later in the chapter
  11. In Chapter 5, watch the conflation of working-class with immigrant
  12. Chapter 6, explain Lake Wobegon phenomenon better (more?)
  13. Chapter 6: revise "no illusion" passage
  14. Look for substitute Webster quotes in 1965 volume.

Time for me to head home (I'm currently in a chain cafe). We don't celebrate Christmas, but I think my family will want to see me sometime today.  For those who do celebrate it, have a safe and merry Christmas.

December 21, 2006

Theodore Porter, Trust in Numbers, and picking the right fights

On the way to and from my mother-in-law's house today, I finished Theodore Porter's Trust in Numbers (1995). (I should say that I finished it while my spouse was driving!) Though I was distraught this morning at Porter's style, I slogged through a book I knew was important. And the book has plenty of food for thought. But the (dis)organization remained problematic, and not surprisingly, the book reviews varied fairly dramatically in how they read the main argument. In particular, the reviews in the Economic History Review and Technology and Culture read Porter's book as less deterministic than I thought he was in the end.


That determinism is a critical question. Is autonomy such a driving force that weak disciplines and administrative apparatuses under political threat will resort to statistics as a buffering mechanism to protect autonomy, even while higher-status disciplines or bureaucracies can still turn to networks of trust and rely on elite status? If so, then test-score accountability was inevitable, as Stephen Turner suggests. But I think the details in Porter's book belie that argument of virtual inevitability (which Porter makes clear, I think, in the second-to-last chapter). As Porter notes, weights and measures have historically been more negotiable than we assume today, and his description of the origins of the Chicago futures market is a fascinating tale of contingent events. There was nothing inevitable in it.

We don't have to look at NCLB and debates over NAEP to see how flexible truth is and how porous the factual claims that permeate education are. Evidence of how negotiable education "facts" are lies in the current debate over measuring graduation—or, as is more common, mismeasuring graduation. There is no agreement on how to measure graduation, the sides are frequently identified as biased on other issues (support of public schools v. vouchers), and even the terms of the debate are vigorously argued, which suggests that education facts are not completely behind the boundaries of expertise.

The debatability of education facts suggests another way of looking at accountability: given the fact that accountability systems will produce arguments, maybe one way of thinking about them is to structure the system so you get the argument that you want. If proponents of high-stakes accountability are sick of educators responding to accountability by blaming parents, maybe they should look in the mirror: didn't the system predictably set up that argument?  And if so, what's the argument that you want to have? 

Maybe it's because my father grew up on Flatbush Avenue, but I don't think there's anything wrong with a good argument, as long as it's about the right things. Do we really want to keep arguing about whether the scores mean something or who's responsible? I can predict continuing arguments precisely on these issues for as long as accountability is based entirely on test scores. I know of one commendable accountability mechanism—Rhode Island's site-visit system—that produces enormous discomfort in schools that are judged wanting and some arguments, but I think they're arguments worth having, about the nature of the school, what isn't happening, and what could be happening. Those arguments can only happen if you get beyond test scores.

Kids, don't write these sentences at home

I was wondering why I just couldn't get far in Theodore Porter's Trust in Numbers (1995) until I came across the following (on p. 15):

Lorraine Daston instances Charles Dufay, a French researcher of the 1720s and 1730s, to epitomize a different experimental ideal.

So it's not my lagging attention due to a virus. It's just subtly awful writing. Porter's book is looming large in the revisions of Accountability Frankenstein, but his frequently elliptical (dis)organization and style make the reading painful.

December 20, 2006

Danziger, Constructing the Subject, and the dangers of following the trail

Thanks to a trail of other readings, I'm now delving into Theodore Porter's Trust in Numbers (1995) and Kurt Danziger's Constructing the Subject (1990), both relatively dense books discussing topics on the edges of my concerns with testing and professional expertise. While reading the page proofs of a book that will be coming out in just a few months, I've already had one basic assumption rattled (it's a minor point in the book, David, but it forces me to rethink the question of psychometrics as a profession and how we treat teachers). Then I picked up Stephen Turner's Liberal Democracy 3.0 (2003), whose provocative arguments about expertise and democratic political theory I've written about elsewhere (on Education Policy Blog).


So in this trail of expertise, professional history, and our social trust in test scores, I've come to two very different chunks of the literature. Theodore Porter has written two books on the social history of statistics, one on The Rise of Statistical Thinking (1988) in the 19th century and a second (Trust in Numbers) that is a little broader and more ambitious in its argument. I've set that aside fairly early to tackle the Danziger book, a brilliant little book that rocks you with a gem of insight in every chapter. Danziger argues that Wundt's laboratory circle in Leipzig both established the concept of the subject and became an alternative view of the subject (where the experimenter and observer frequently exchanged roles) to the later, more common notion of the subject as someone of a different social status and knowledge position than the experimenter (and report author).

One point that is both suggestive and devastating is Danziger's suggestion that schools may have influenced the path of psychology as much as the other way around, for three reasons: first, schools created a huge resource of subjects once those became defined as a separate social group from experimenters; second, schools became a target of marketing of applied research; and third, in their dramatic expansion in the late 19th century and the organization around bureaucratic forms (graded multi-classroom schools, for example), the new bureaucratic school systems both produced and consumed huge numbers of the type of population statistics that are akin to censuses, creating the idea that one could capture the sense of schools and children with a sort of social census. That statistical consumption may have shaped psychology's turn from reporting the introspective observations of individuals to the reporting of aggregate statistics, what Danziger calls a "psychological census."

In turn, this broad (and ironic) argument brings me to two other issues: John Dewey and Daniel Calhoun. Most people in education describe Dewey as a sort of demi-god, creating a humane vision of education. What my colleague Erwin Johanningmeier argues is that Dewey used schools as a way to inform his writings on pragmatism more than attempting to define what schools should do. I suspect this may be a matter of different perspectives on the same writings, but Johanningmeier's argument parallels Danziger's.

The second is that Danziger cites Calhoun's The Intelligence of a People (1973), of which Dorothy Ross aptly said, "Any reader who spends a few minutes with Calhoun's ... book will learn that it is infuriatingly difficult of access." She also noted, again accurately, "But it will repay the reader's persistence." So I need to delve back into that (which I haven't touched since grad school). There are two copies on the shelf in my library: BF431 .C256. Please don't grab both, as I need them. 

December 18, 2006

So there's debate among NCLB critics? Big deal...

The reaction of Mike Antonucci to news that Susan Ohanian calls the NEA's leadership distancing themselves from the "dismantle NCLB" petition "bullying"? "[F]un watching NEA and Ohanian thrash it out for themselves." (Hat tip: Eduwonk.)

Maybe Antonucci is unfamiliar with something called deliberative decisionmaking, which involves discussion, decisions, and acknowledgment of dissent, but the NEA has had that on NCLB, in its last annual meeting. The NEA is acting consistently with the decision of its 9,000-plus delegates.  Ohanian doesn't have to like it (she obviously doesn't), but I don't think that disagreements are a problem in public debate.  We call that democracy in this part of the world.

Disclosure: I've already commented on the petition.

Miami Herald "legacy" piece on Jeb Bush

Marc Caputo has a smart article, Exam has changed how teachers teach, on the FCAT. He points out the double standard on accountability (how voucher schools are not accountable in the same way) and how Jeb Bush could be considered an education policy hero if only kids graduated from high school at the end of fourth grade.

Note the letter in response by John Kirtley, Tampa millionaire, Republican contributor, and now vice chair of the Alliance for School Choice.

December 14, 2006

The problem with the McGraw-Hill conflict-of-interest argument

Since Stephen Metcalf's 2002 article on family ties between the McGraw-Hill publishing company and the Bush family, it has become a minor cottage industry to assert that the (quite possible) conflict of interest is evidence of the inherent corruption of No Child Left Behind. (See the Students against Testing, DailyKos diary entry, Jim Trelease, and Jim Horn pages on this as examples.) The same narrative has been played out with the Inspector General's report on Reading First and conflicts of interest. (See the response by Jim Horn, as an example.)

I don't think anyone outside a small circle will contest the problems with conflicts of interest in education programs. But I also don't think that basing criticism of accountability on conflicts of interest will work. Conflict-of-interest stories are a recurring theme in the politics of liberal democracies, and there is a standard solution: require arm's-length decision-making. There is nothing inherent in the existence of a conflict of interest that dooms the program touched (though the stench can force restructuring or at least a fig-leaf version of reform). Gary Stager's column on Reading First illustrates that. He's disgusted with what he sees as corruption, but it's all within the normal liberal parameters of wanting clean policy that's based on science.

Note: I've put up a few more extensive discussions of testing and their role in a democracy over at Education Policy Blog, which is up for a 2006 Weblog award (vote for us!). See part 1 and part 2 of that longer discussion.

December 13, 2006

Peter Boyle, RIP

Wouldn't you know it: I'm revising the manuscript to Accountability Frankenstein, and Peter Boyle dies.

For those who don't know, he was The Creature in Young Frankenstein.

December 12, 2006

NCLB odds

Eduwonk Andy Rotherham lays out the first set of NCLB reauthorization odds this morning. He and I agree that the most likely scenario is everyone's punting until after the '08 election. But he puts straight reauthorization as more likely than a dramatic revision by what he calls NEA's "rewrit[ing] the law to its liking" or a "Conservatives['] rollback [of] the federal role in elementary and secondary education." I think all of that is equally unlikely, perhaps because Rotherham and I have different political sources (and his are much, much closer to "the action").

If the stars line up, reauthorization may happen in 2007 with minor revisions, probably adding a growth component and changing the consequences of AYP failure. But that's looking unlikely to me, primarily because the Democrats will have a harder time finding internal unity on this issue, and there are other issues that have higher apparent political salience.

What is more likely is that we'll end up with a two-year discussion of NCLB in Congress, with no reauthorization in sight once we hit 2008 but with some bipartisan agreement on some substantial changes after the 2008 election. If we get into 2008, I suspect The Powers That Be (of various sorts) will realize that growth models don't solve the underlying political questions. Congresscritters on both sides of the aisles may well acknowledge the need to revise or toss the current AYP formula and probably invent a whole new mechanism based on some state's version of accountability. The real wonkish readers here may recognize an opportunity for strategists to start planning a la John Kingdon's triple-stream approach.

None of this necessarily addresses what should happen, of course, though it affects the likelihood of what should happen happening.

November 25, 2006

Draft of Accountability Frankenstein completed

The manuscript is done, or at least the initial drafting is done. It's a relatively short MS—approximately 68,000 words, excluding references. In revision, that will probably stretch to 70,000, but it's unlikely to get much larger. (December 28 update: The revised manuscript is about 73,000 words, with 12,800 words in the references section.) In contrast, Jennifer Hochschild and Nathan Scovronick's The American Dream and the Public Schools (2003) is 97,000 words long, excluding references. Their book has about 200 pp. of text.

This is the last call for anyone who wants to read parts of or the entire manuscript. If you have the time to read it and return thoughtful criticism to me by December 8, you will have my eternal gratitude and a copy of the published book. Just e-mail me with your interest.

I thought I'd finish yesterday, but I had a tough nut to crack with readings on expertise. Is the problem of our reliance on test scores and policy autopilot one of kowtowing to experts? It didn't match up well to the concerns of the existing literature on technocracy and professional expertise (whether critics such as Frank Fischer or more measured analysts such as Stephen Turner, who is a fellow faculty member at USF). I've decided that the authority given psychometricians is less the cognitive authority that Merton and Turner discuss than a referential authority ("the Thing exists, and there are Experts who can handle it") and the blithe assumption that test scores mean something concrete. While I have some concerns about the closed process, the greater danger is the assumption of the reality of test scores beyond a limited heuristic purpose.

But my conceptual wrestling is over, or at least the first round. Sometime in the next few days I need to start cleaning up the references section, which is currently 66 double-spaced pages long (about 14,000 words). It'll get a page or two longer as I flesh out some sketchy citations, but most of it is wrestling things into APA shape. And then on to substantive revisions.

November 24, 2006

Brief gloss on pragmatism vs. sociology of knowledge

I'm now done with drafting everything in Accountability Frankenstein except one section, on the problem of expertise in society. For a variety of reasons, I've been thinking about this in connection with John Dewey, Walter Lippmann, Karl Mannheim, and William James. And I've figured out the difference between the sociologist of knowledge and the pragmatist philosopher.

The sociologist of knowledge said, "Science and other endeavors of discovery are inevitably bound up with the social context in which they arise. Even the basic questions are interwoven not only with the state of knowledge at the time but the circumstances of the researchers and thinkers themselves."

The pragmatist said, "Yes, and that's good."

The sociologist of knowledge shook her head and said, "You don't understand."

The pragmatist said, "Yes, I do, and in case my reasoning's shoddy, I've got the epistemologists just in case."

The sociologist of knowledge stared back. "That's an unfair trump card. Epistemologists are just rationalizing what people do anyway."

The pragmatist smiled and whispered, "Well, yes, of course. That's what we designed them for. And isn't your whole modus operandi a trump card to put social processes above science?"

The sociologist of knowledge narrowed her eyes. "The only reason why you said that is because you think your mother never loved you."

November 22, 2006

More on teaching to the test

Jal Mehta (guest blogger for Eduwonk) continues the blogule discussion of teaching to the test with a lament about the inability to follow through on the New Standards Project desire for high-quality tests that were not easily susceptible to test-prep.

For some reason, this strikes me as very similar to the laments I hear about the death of trolleys in mid-20th century L.A., except that nostalgia for the heady days of the early 90s seems, well, a bit misplaced. For that matter, so is the nostalgia for the Red Cars, which were geared as much to opening up the San Fernando Valley (see a system map from 1910) as to mass transit, and probably more geared towards speculative land development.


There are a few things that are important about the proposal to create demanding performance tests:
  • When tried, performance tests have been expensive, and the psychometric qualities controversial, to say the least.
  • Very quickly, states figured out they could 'hybridize' the idea (to use Larry Cuban's expression) by incorporating some performance items in a test that would remain mostly multiple-choice, satisfying the demand for some performance items while lowering the cost and the statistical problems (in the eyes of such officials); here, Florida was a leader.
  • Where it was tried more extensively, it's hard to see how the existence of performance tasks dramatically changed the dynamics of high-stakes systems. Of the states that tried performance items, one system was killed for political reasons (California's). The history of Kentucky's KIRIS system gets read in many different ways, but that was a substantial package of reforms, from which pulling out the test format and other characteristics is hard to justify, analytically.
  • The proposal for demanding performance assessments demonstrates that focusing on tests puts the cart before the horse. Suppose we established a mandatory history test in Florida that would be essay-based. Take any item in the national history standards (which are essentially essay prompts), and make a student write on that for an hour. (Example: Evaluate how minorities organized to gain access to wartime jobs in WW2 and how they confronted discrimination.) That's meaningful and demanding, and getting students to the point where they could succeed on such a task might require most of a decade in terms of changing the curriculum, textbooks, and history-class routines. But the sequence that Tucker and Resnick suggested would bollix that up—we are so focused on short-term changes in test results that everyone would assume that lousy scores for several years means that history teachers aren't changing things, even if there's a deliberate effort to change practices.

I'd love to believe the New Standards theory of action, because it's comforting to think that we just have to craft the right test. Nor am I saying that we should be happy with what currently exists! But I'm afraid the New Standards story is a bit of a fairy tale. You can't just tweak the test and expect the rest to follow.

All right: enough procrastinating. Time to get back to the last chapter and describe how we can save the world...

November 21, 2006

The last chapter problem

In Improving Poor People (1997), Michael Katz wrote:

Historians and other social scientists who offer interpretive accounts of social issues always face a "last chapter" problem. Readers expect them to extract clear lessons from history, offer unambiguous recommendations, and foresee the future. My standard response—my role is to analyze and explain the problem; I have no special expertise in devising solutions—although honest, rarely satisfies. When historians tack on a set of conclusions, more often than not they appear utopian, banal, not very different from what others have suggested, marginally related to the analysis that precedes them and far less subtle. The reason, of course, is that no set of recommendations flows directly from any historical analysis. Understanding the origins and dimensions of a social issue can lead in very different policy directions. (p. 7)

As someone (Groucho Marx?) once said, I resemble those remarks! I've spent 80% (or more) of my current book-in-progress analyzing high-stakes accountability from different perspectives (often historically rooted), and I'm now in the last chapter. Do I repeat what Katz said and wash my hands of any specific recommendations? I can't. I'm too deeply into this and, what's more important, a significant part of my argument is that test experts have no business trying to decide what a democratic process should craft.  To say that I demur because I am not an expert would be hypocritical! So I will take a citizen's and not an historian's right to make recommendations, however rooted they are in my sense of humanity's quirks and the institutional and political legacies we have inherited.


However, Katz's warnings about utopianism, banality, and the disconnect from the rest of the book are well-warranted. I have no magic charms against banality, but I can take a few steps against the others. After returning home bleary-eyed after 10 pm last night, I told my spouse I had just spent a few hours skimming over the chapters already drafted so I could be consistent.  She nodded, "Readers might have a few concerns if you're essentially making a new and completely different argument in the last chapter." 

And to make sure that I don't step towards utopianism, I will describe three utopian accountability mechanisms that will not appear in the book. Correction: One does appear in the book, largely to explain why it wouldn't work. (Why these are utopian is left as an exercise for the reader.)

  • A recursive system based entirely on formative assessment: teachers analyze student data formatively, then principals analyze teachers formatively using how teachers use data formatively, and those over principals analyze principals formatively using... you get the picture.
  • A high-tech way of finding out what students are working on: sample the written work of five students in each grade daily.  Have a random draw of students in the morning, get them to turn in the previous evening's homework and anything completed that day, cover up their names and the teachers' names, scan their written work, and upload it to a central server that's entirely public.
  • TeacherCam: A video camera in every classroom and in the hallways, allowing the public to see what happens anywhere in any school.
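The daily draw in the second mechanism is, whatever its other problems, mechanically trivial. A toy sketch in Python (the roster structure, function name, and sample size are all hypothetical):

```python
import random

def daily_draw(roster_by_grade, k=5, seed=None):
    """Pick k students per grade for the day's anonymized work sample.

    roster_by_grade maps a grade label to a list of student identifiers;
    seed makes the draw reproducible for testing.
    """
    rng = random.Random(seed)
    return {grade: rng.sample(students, min(k, len(students)))
            for grade, students in roster_by_grade.items()}

# A hypothetical one-grade roster of 30 students:
sampled = daily_draw({"3": [f"student{i}" for i in range(30)]}, k=5, seed=1)
```

The sampling is the easy part; scanning, redacting, and publishing the work is where the utopianism comes in.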

Now that I've gotten that out of my system, it's time to write about something that's workable.

November 19, 2006

NCLB prediction: 20% impurities

Take a gander at the NCLB reauthorization recommendations from the National Association of State Title I Directors, compiled early in October. Most of the recommendations are a standard opening gambit, but then there are recommendations 4-6:


  4. Districts may pay for reasonable administrative costs from the 20
    percent set aside to implement supplemental educational services
    (SES) and public school choice.
  5. SEAs receive additional federal funding to fulfill their
    administrative responsibilities for public school choice and
    supplemental educational services (SES).
  6. Provided the LEA is in compliance with parent notification
    requirements and there is a good faith effort to expend the funds,
    with SEA approval, districts are permitted to annually exceed the
    15 percent carryover by the amount of unspent Title I funds set
    aside for public school choice and supplemental educational
    services (SES).

Then look at the National School Board Association's NCLB Recommendation #8: provide school choice and tutoring only to the subgroups not meeting AYP.

Now what in the world is going on here?  From the education blogule, I thought that the big fights over NCLB were going to be about growth models, ELL testing, instructional-level testing for students with disabilities, etc. And they are, but not in the next two years. I'll stick to my (and others') predictions that NCLB will not be reauthorized until 2009 but will be extended until then, as long as there is a tacit agreement between Congress and the White House about not changing the statutory language on AYP. So I'm fairly sure that's what will happen, in terms of the apparently major stuff. I may not be 99% pure or sure, but I'm close, at least on the second adjective.

On the other hand... there has been considerable griping about the rigidity of the Needs Improvement intervention list, and even a (non-statutory?) waiver allowing a few places to reverse tutoring and choice. So the White House will probably accept an appropriations extension with more flexibility in the list.

Given that fact, and depending on the signals given out by the White House, school boards and the states might push for what looks like minor statutory changes but would dramatically change the consequences of NCLB:

  • There will be heavy insider negotiations about the 20% set-aside restrictions. (What? you ask.  Explained below...)
  • There will be heavy insider negotiations about targeting choice and tutoring, aiming directly at the group identified as not meeting AYP.

In both cases, the federal law creates huge logistical problems for school districts.  Consider this year's pupil-allocation letter from my state's Title I director to the local district Title I directors. It explains how the 20% set-aside rule works: school districts must set aside 20% of Title I funds (or an equivalent amount) to be used for various consequences, including "corrective action." But they don't know how they're supposed to use that money. 5% is supposed to go for tutoring (for students signed up for it), and 5% is supposed to go to transportation for public-choice plans. But the district cannot plan for that money until well after the rest of the budget planning process is over, because each district must wait for parents to exercise options. For districts whose public-choice and tutoring provisions are underenrolled, that set-aside leaves a huge chunk of change sitting around, doing nothing. Title I directors are probably aghast at the waste mandated by federal processes. They could imagine plenty of ways to spend that money... but by the time they've waited for parents, often there's nothing productive they can do with the funds, and they can't carry over that money to the next year beyond a 15% limit. Of course, districts that have huge enrollment for choice or tutoring find that they are not paid nearly as much by the federal government as the programs cost. It's a horrible Catch-22.
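To see why underenrollment strands money, here's a back-of-the-envelope sketch of the set-aside arithmetic in Python. The 20% set-aside, the 5% pieces for tutoring and choice transportation, and the 15% carryover cap come from the description above; the function name and the sample dollar figures are hypothetical.

```python
def set_aside_position(allocation, ses_spent, transport_spent):
    """Return (required set-aside, unspent set-aside, amount stranded
    beyond the carryover cap) for a district's Title I allocation."""
    set_aside = 0.20 * allocation            # mandatory reservation
    ses_reserved = 0.05 * allocation         # supplemental educational services
    transport_reserved = 0.05 * allocation   # public-school-choice transportation
    # Spending on each piece is limited by what parents actually signed up for.
    spent = min(ses_spent, ses_reserved) + min(transport_spent, transport_reserved)
    unspent = set_aside - spent
    carryover_cap = 0.15 * allocation        # maximum carryover to next year
    stranded = max(0.0, unspent - carryover_cap)  # can't be spent or carried over
    return set_aside, unspent, stranded

# An underenrolled district with a $10M allocation:
required, unspent, stranded = set_aside_position(10_000_000, 150_000, 100_000)
# required ≈ $2.0M, unspent ≈ $1.75M, stranded beyond the cap ≈ $0.25M
```

Even in this toy version, a district with weak sign-ups is sitting on almost two million dollars it cannot plan around, and a quarter-million it can neither spend nor carry forward.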

My guess is that school officials will push for minor tinkering in statutory language to get rid of the 20% problem and to help districts whose first corrective programs (choice and tutoring) are oversubscribed. Will they be happy if NCLB is reauthorized just with those changes? Probably not. But they'll be happy NCLB isn't getting worse (yet), and if they can start a 2009 reauthorization having solved the 20% set-aside and oversubscription problems, many school officials will be relieved.

Carrots and sticks in education

I'm now more convinced than ever that discussions of consequences in K-12 accountability systems happen in bunkers, without enough understanding of how things work together (or don't).


But first, a digression about literature searches. I'm in the middle of writing chapter 4 of Accountability Frankenstein, a chapter tentatively titled "Consequential Thinking," about the consequence systems that make accountability high-stakes. I have my own historian's spin on this, but the general arguments I've read in various places are heavy on management jargon, moderate on cognitive-psych jargon (extrinsic vs. intrinsic motivations), and very thin on research in fields that should take this on (industrial-organizational psychology and industrial-relations research). Something about this impression struck me as wrong, so I tried to find anything I could in "I-O" psychology to bring to bear. (Odden and Kelley's 2002 book discusses theories of motivation, drawn from I-O and other realms, but they're an exception, I'm not sure how many policymakers have that book, and I have a few concerns about that chapter that I won't get into.)

Consider, for example, goal-setting theory, which posits that establishing goals is crucial for motivating most people on their jobs. But how do I search for that in the literature on teacher pay or school probation? Goal setting is a generic enough term (in contrast to expectancy theory--see one MBA-ish website for a brief explanation, and then please send me a better URL in comments!), and looking for it was frustrating.

But there's a marvelous trick to use: citation indices. The best known is ISI's Web of Science, which includes the Science Citation Index and the Social Science Citation Index. There's also a citation tracker in Google Scholar. The most common use of citation indices some years ago was to find out who cites whom, in terms of influence on a field of study. (Junior faculty going up for tenure generally do this in the year before going up.) But there's another use: looking for literature that cites a germinal article. The most recent "classic" article on goal-setting theory is Locke and Latham's "Building a Practically Useful Theory of Goal Setting and Task Motivation" (American Psychologist, 2002). Is there anything on teacher pay, merit pay, performance pay, or similar topics that cites this article?  Yes.  Marsden and Belfield's Pay for performance where output is hard to measure: the case of performance pay for school teachers (a book chapter published this year) cites Locke and Latham in its literature review.

The fact that few in the area cite this work (or a 1990 book by the same authors) is troubling.  There's the Odden & Kelley book and there's also a 1998 Heneman article on Charlotte's school-based performance pay system. Heneman's use of the 1990 book is especially problematic:

Moreover, borrowing from goal-setting theory (Locke & Latham, 1990), characteristics of the student achievement goals may help shape expectancy perceptions in terms of intensity, focus, and persistence. In particular, goals that are perceived as meaningful, clear, specific, and challenging will foster high expectancy perceptions by teachers. (p. 45)

Unfortunately, there's also a body of literature contrasting goal-setting for simple tasks (e.g., can unionized truck drivers get higher loads on their trucks, to save a company time?) with the more complex tasks involved in teaching. My understanding is that while goal-setting is important, its effect depends dramatically on context. Distant goals are less useful than proximate ones, and goals involving complex skills may be better combined with specific goals targeting the acquisition of strategies to accomplish the distant, slightly vague goals in a complex area.

My conclusion: the performance-pay advocates are even more in isolated bunkers than I expected, with folks not looking closely at relevant literature bases elsewhere. And that doesn't even touch the question of whether anybody discusses carrots and sticks in combination.  But that's for another entry...

November 15, 2006

Test-prep debate

Craig Jerald's comments on test-prep sparked Matt Yglesias's discussion and a debate about test prep, and then Jerald's follow-up.

I don't think there's that much disagreement about the facts: some schools respond to test pressures in inappropriate ways by narrowing the curriculum or engaging in instruction that they'd stop (and often do) right after testing. The question is why and how to get better responses.

Some advocates of high-stakes testing say that the problem lies with administrators, not tests. To some extent, that's correct: the existence of a test does not require drill-and-kill responses. But on the other hand, as Yglesias notes, you can't expect the existence of a test by itself to change dysfunctional behavior into functional behavior. Schools that are truly in crisis are generally not that way because their teachers are lazy but because many principals and teachers don't know how to teach any better.

Compounding this problem (and the failure of high-stakes policies to acknowledge or address it) is another simple, nasty fact: the test-prep genie is out of the bottle. Not only do people generally acknowledge that somewhere, somehow, people are getting higher test scores without learning or knowing more, but they think it's a good idea! Ask parents if they think high schools should teach kids how to score highly on the SATs and ACTs, and the answer will be, Heck, yes! I want my son/daughter to get scholarships, and it's the responsibility of schools to prepare kids for college. Because we often associate schooling with the private interests of getting ahead (i.e., social mobility), and because test-prep is framed as an activity that benefits individual mobility, many parents and others view it as the job of schools to prepare kids for testing.

Regardless of the origins, the long and short of it is that parents and others see test-prep as a legitimate activity in response to high-stakes pressures. Any advocate of high-stakes testing who does not address that fact is failing to follow a simple rule: you need genuine compliance with what you'd like to happen in classrooms more than you need paper compliance.

November 6, 2006

Student suggestions on writing

This semester, I promised my masters-course students that they'd see every draft chapter of Accountability Frankenstein, and I've given them credit for ripping the chapters apart as they've come to them.

It was one of the best things I've ever done in my career.

Not only has the pressure pushed me to write faster than I otherwise would (and think of it as part of my teaching load), but I've had some wonderful suggestions. I've peeked at the comments this week (some students have finished writing them though they're not due until tomorrow night), and one student suggested I put chapter 3 at the start of the book. I've skimmed through the chapter and apart from one passage that I'll have to shift to another chapter, it works better in the sequence of presentation. Wow. Thanks, C. P.

October 23, 2006

Book progress pleasurable but costly

I have one paragraph left before I finish the third chapter of Accountability Frankenstein. Unfortunately for my sanity and time, I think I'll be reading the equivalent of 3-4 books to write that paragraph (international perspectives on curriculum decision-making). But I'm quite happy with how the chapter ended. I've divided our education woes into three fundamental questions (generational challenges, inequalities, and true crises) and tried to think through and write clearly about setting goals (standards would be the current buzzword).

If I've done my job right, forceful advocates of both high-stakes testing and no standards at all will be gritting their teeth at the end of the chapter, but most should acknowledge that they have to defend their views in different ways because of it.

At this point, I think the manuscript is a little over 200 pages. Approximately 150 pages are text, and a little over 50 pages comprise the draft references section.

The next chapter will require additional reading in an area I don't usually read (industrial and organizational psychology). Time to bone up!

October 22, 2006

On standards, the curriculum, and acorns

One of my frustrations with some types of education policy writing is its irritatingly acontextual nature, as if nothing but that era (usually This Era) and the conceptualization in use at the time (i.e., a particular buzzword) is relevant for the question at hand. The writing-that-frustrates-Dorn can often be very detailed, accurate, and descriptive, but in a flat, largely uninteresting way. Make connections! one part of my mind screams as I plod through the piece. But, inevitably, the only connections made are to last year, or to nearby states, and only with regard to the buzzword-in-focus.

The standards movement is one of those buzzwords that is a particular magnet for acontextual writing. Writing that assumes meaningful curriculum development didn't exist before the late 1980s makes me want to pull my hair out. No, not really; I just squirm in my seat, with muscles in my torso and arms tensing. There are multiple problems with the term standards movement, including the elision of different types of expectations (the purpose of schools with our expectations of student performance) and the elision of two separate developments in the late 1980s and early 1990s (performance assessment commonly associated with the New Standards Project and Lauren Resnick, on the one hand, with efforts to create state or national curricula, on the other).


But the greater problem with most writing on standards is a failure both to look at curriculum history broadly conceived and also to think comparatively, with the U.S. as one of many countries with curriculum policy. Advocates of standards often talk about the need for alignment: we test what we say we expect from students, which should have something to do with what we plan for them to learn. Curriculum-studies folks would point out that this is parallel to their observations for many decades that there are different levels of curriculum. Terms such as the formal curriculum, the taught curriculum, and the tested curriculum abound in curriculum writings, and essentially the argument for alignment is that the formal, tested, and taught curriculum should be identical. Alignment is really about aligning different types or levels of curriculum. In the abstract, that's fine as far as it goes, but alignment doesn't guarantee that the learned curriculum will be the same, nor that alignment will eliminate the hidden curriculum.

The ahistorical nature of most writing on the standards movement is more problematic. It is true that the early 1990s was the first time when we could witness most states trying to write formal curriculum expectations across a range of academic subjects. But states have written expectations into specific parts of the curriculum before, and somehow advocates for aligning different curriculum levels haven't been interested in looking at that history. The narrow definition of alignment also assumes that those who have gone before and only focused on one type of curriculum (say, the tested curriculum if you look at minimum competency testing) didn't know what they were doing in terms of their effects. And I suspect that's baloney: Even if a particular effort only targeted one link in the desired chain from expectations to what happens in student minds, advocates often have had a very clear idea of what they hoped would happen. Connecticut common-school advocate Henry Barnard, for example, hoped that blindly-graded admissions testing to high schools would drive the curriculum in grammar schools, even when only a small minority attended high schools at the time. We cannot clearly identify what is truly new in the last 15-20 years of curriculum (including the standards movement, for want of a better term) unless we look at the history with more than very narrowly-defined questions.

So, too, with international perspectives. Advocates of standards and alignment occasionally will refer to the existence of a national curriculum in other countries, most famously France, but it is not true that every other industrialized country has a long history of a centralized curriculum. I am not a comparativist, but I have a sneaking suspicion that parents have often thought that parts of the French national curriculum are compartmentalized drivel, but that's less important than the little bit of skepticism we need about the inevitability of centralized curriculum. (We can talk about de Tocqueville's model of history later.) For decades, West Germany had a clearly-articulated lack of national curriculum in reaction to the national curriculum of the Nazis. After the end of apartheid, South Africans (of all ethnic and racial groups) started looking at their prior national curriculum with considerable shame. Looking internationally, I don't get the sense that the U.S. is out of step with some universal consensus on curriculum centralization.

In fact, as my astute spouse has pointed out to me on occasion, we have a nationalized curriculum in the oddest places. One of the cultural norms of elementary schools in the U.S. is to teach about the calendar. We want young children to get a sense of time, and one way to help them understand the concept of a year is to talk about seasons.  But the way we do so, in all parts of the country, is tied to temperate parts of the country.  In southern California and Florida, kids learn about temperate climates--that leaves turn colors in fall, that it snows in winter, and so forth. In Florida??? Leaves don't turn colors in the fall here, and deciduous trees often drop their leaves in February (especially oaks). In Florida, fall is the time of year when acorns fall, and when it gets a bit drier and more comfortable after Halloween. And don't even talk to me about teaching kids about snow. But you'll see plastic colored maple leaves in Florida classrooms this time of year. I remember the same growing up in southern California, and I suspect it's also the same in many Hawaii, Arizona, and New Mexico classrooms. We don't follow Noah Webster's exhortations to speak with the same accent, but we have the same plastic maple leaves!  

(A slightly different version appears on Education Policy Blog.)

October 6, 2006

Statistical magic and record linkage

Highly recommended link on a way-cool statistical technique in record linkage: The Bristol Observatory, where Steven Banks and John Pandiani have developed probabilistic population estimation, using two data sets with just birthdates. It's not really magic but relies on a classic puzzle in probability (and an elementary one to solve, apparently).

Banks and Pandiani developed this technique to solve a serious evaluation problem with mental health programs: how do you identify who used two services, or showed up in two different places, if the two agencies cannot reveal personally-identifiable information for privacy reasons? 

They went around that problem by rephrasing it: the operative question for program evaluation is not who shows up in two places but how many. The first requires invading privacy to some extent. The second, not at all. Their technique requires information only about birthdates and such other nonidentifiable information as would allow them to subdivide a population for greater accuracy, but no names, addresses, phone numbers, or Social Security numbers. They don't even need to know the unduplicated birthdates. It also bypasses all the attendant problems of keeping separate databases up-to-date.

Is this applicable to education research and my own work? Well, suppose you want to know if a specific intervention leads kids to graduate from high school, but the local school district (or some relevant agency) won't release identifiable information.  All you need is the birthdates, sex, and maybe ethnicity of those who graduate from the district (though since ethnicity is more malleable than sex, that's a problem), and you can estimate the numbers of graduates who also came from your participants (or a segment of your participants).
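To make the birthday-matching idea concrete, here's a minimal sketch of the underlying probability, not Banks and Pandiani's patented procedure, and with all counts hypothetical: if birthdates are roughly uniform over D possible dates, the expected number of distinct dates among n people is D(1 − (1 − 1/D)^n), and that relationship can be inverted to estimate n from a count of distinct birthdates alone.

```python
import math

def estimate_population(distinct_dates, possible_dates=365 * 80):
    """Estimate how many unique people produced `distinct_dates` distinct
    birthdates, assuming birthdates are uniform over `possible_dates`
    possible dates (here, 80 birth years' worth of days). Inverts the
    expectation E[distinct] = D * (1 - (1 - 1/D)**n) for n."""
    D = possible_dates
    if not 0 <= distinct_dates < D:
        raise ValueError("distinct_dates must be in [0, possible_dates)")
    return math.log(1 - distinct_dates / D) / math.log(1 - 1 / D)

# Hypothetical use: estimate how many people were served by both of two
# agencies, using only counts of distinct birthdates -- no names exchanged.
n_a = estimate_population(1180)       # distinct birthdates at agency A
n_b = estimate_population(950)        # distinct birthdates at agency B
n_union = estimate_population(1905)   # distinct birthdates in the pooled list
served_by_both = n_a + n_b - n_union  # inclusion-exclusion on the estimates
```

With real data one would subdivide by sex or other nonidentifying fields and estimate each stratum separately, which is the kind of refinement for "greater accuracy" the entry mentions.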

Banks and Pandiani have patented their work, so someone wanting to use this specific procedure needs to work with them, but there is another technique that is similar and publicly usable. I'll post on that one after I've had a chance to absorb it. (I have a demography masters and can read statistical explanations, but sometimes I need more time to absorb them.)

But definitely go to Banks and Pandiani's website.  And check out the video, which explains the principles of their technique!

September 11, 2006

NCLB reading for the week

As suggested by Eric in one of Jenny D.'s Hechinger Institute confab entries, here's a recommendation for Hayes Mizell's speech in 2003. Mizell's explanation of NCLB's origins ("For decades, local policymakers and school officials turned a blind eye to a set of vexing problems in public education") assumes there was no such thing as accountability before 2002, and I doubt his suggestion for creativity will solve the many problems with NCLB, but his argument about the dangers of both conspiracy theories and blind compliance efforts is well taken.

September 9, 2006

Backpedaling on the achievement gap

I've been puzzled by Kevin Carey's blog entries Thursday and Friday, in response to a Richard Rothstein essay and AFT's NCLBlog, respectively. Here's the critical claim of Carey (from Thursday):

NCLB is not based on the premise that good schools can erase the achievement gap. It's based on the premise that good schools can raise disadvantaged student performance to a defined level, proficiency.

In a word, dear readers, this is baloney (technical educational policy term). The mechanisms of AYP notwithstanding, rhetoric about NCLB has consistently been about the achievement gap. I'm not sure if Carey is echoing Fordham's Michael Petrilli (see my earlier entry on that), but when anyone starts to lowball expectations, it's, well, it's, well, ... soft bigotry? I'll avoid the purple prose and just note that defenders of the AYP mechanism and high-stakes testing are engaging in this rhetorical dance because NCLB and most accountability frameworks avoid concrete discussions of standards. We must have them, proficiency must be defined, but your everyday Joe wouldn't know what that means. Heck, I don't.

I'm not sure if we're headed towards the worst possible outcome of oversold education reforms (regression towards deterministic views of human capacity), but when defenders of high-stakes accountability start backpedaling as fast as they can from the rhetorical framework that's been the political underpinnings of NCLB, it's not good news. No, this is not earthshaking stuff, but I don't think it's helpful.

September 7, 2006

What do you think of this accountability mixed metaphor?

I'm now done with the preface and chapters 1-2 of Accountability Frankenstein, and I just noticed the following mixed metaphor:

For a school official to say blithely that elementary teachers should deemphasize -ing word endings in language arts because -s and -es endings are on the test but -ing endings are not is letting the fear of test consequences drive common sense out the window (Tobin & Winchester, 2004)

Normally, you don't drive in and out of windows. But on the other hand, maybe the idea of a car going through a window is precisely the image I want to convey. 

September 5, 2006

Jay Mathews discusses the prospects for national testing

Sunday's column by Washington Post writer Jay Mathews focuses on renewed arguments for a national test that every student would take. The dramatic difference between the proportions of students in various states' being labeled proficient based on the state test vs. the National Assessment of Educational Progress is the fulcrum of the argument. I think. I suspect this is a piece largely engaging in celebrity-wonk showcasing (no chance that a president with approval ratings in the 30s will successfully push any major education initiative that rocks political pathways, as I explain below), but I'll address two issues raised in the article.


Standards vs. Cut-scores

Jerry Bracey has been going after NAEP's value-laden labels for the achievement categories for a decade or more, and his pre-publication e-mail to Jay Mathews gives you a sense of what he thinks of this reporting. Essentially, Bracey's argument is and has been that NAEP's labels don't mean much: The governing board set "basic" to be fairly close to the mean scores many years back, which makes one wonder what they thought the majority of students were doing in school. That's an interesting counterweight to the hype in this story (and others) that students in the U.S. are mediocre.

The truth is that neither point has much weight except for political symbolism. The levels or bands on NAEP are ordinal in the sense that scores in higher bands represent greater achievement on the NAEP scale than scores in lower bands. Yes, Bracey is correct that the values chosen have no inherent meaning. Moses did not come down from the mountain and have tablets written that such-and-such a score on the NAEP is proficiency. That's true of all cut-scores. But I think Bracey is missing the forest for the trees.

The greater sin of the reporting by Mathews is the confusion of standard-setting with cut-score setting. The rhetorical flourishes to justify a national test this go-round (oh, my, the states and feds don't agree on the proportion proficient!) imply that if the stats disagree, the states must not be setting standards, and a test must do that.

That's balderdash or policy bravado, and I'm not sure which. Oh, and it might just be sloppy reporting, too.

For more than 15 years, the education policy ether has been filled with discussion of standards and alignment. Standards are sets of statements of what we think students should learn and be able to demonstrate. "Students should know the reasons for the writing of the Declaration of Independence and the ways in which people have used its rhetoric for political and social purposes" is an example of a standard one might write for history. I don't think any state has such a standard, because it crosses historical periods and countries, but you could write such a standard.

In the theory of action of standards and alignment, the establishment of standards would determine both the focus of instruction and the scope of assessment. That hypothetical history standard would probably force a reorganization of history teaching if it were central to a course, for example, because it would break down the neat compartmentalization of the Declaration in the 18th-century "unit" (though there might be a text that pushes students to ask such questions). Back when Lauren Resnick was our standards theory king, the tests were supposed to be challenging, assessing higher-order thinking, and so we needn't worry too much if teachers taught to the test. The 1994 reauthorization of the Elementary and Secondary Education Act had all of that high-falutin' stuff in there, taking the standards piece of the Goals 2000 legislation and saying, Thou must apply this standards stuff to schools with high concentrations of poverty, too. In 1997, the reauthorization of the federal special-education law (the Individuals with Disabilities Education Act) added ... and to students with disabilities as well.

How far we have fallen, if one of our star education reporters can't see the differences among standards, tests, and cut scores.

I don't know if Mathews is accurately reflecting the views of those he interviewed (I figure that if a journalist gets 70% of the facts correct, he or she is doing a decent job), but any attempt to use a test to "set standards" is getting things backwards. Don't we first decide what we want students to do? Of course, part of the problem with the debate is how the backwards-reasoning is promoted by the AYP requirements in No Child Left Behind. You want high-stakes testing? Fine: Let's see how You, Ms. Education Commissioner/Superintendent, game the system. First, you say you agree with the goal that 100% of students will demonstrate proficiency in reading and math by 2014. Then you quickly ask your staff what cut score would allow your state to declare AYP in most schools for a number of years until the policy changes or you intend to retire or run for higher office, anyway. Next step: Define proficiency so you get that cut score, and then finally figure out some way to tie that notion of proficiency to your standards. If you're currently paying for off-the-shelf commercial tests, or you haven't written standards, you get to design this set of links from scratch.

No, not all states have done this. Some have overpromised and are reaping the rewards of such expectations by having the state label most schools as Peachy-Keen while the state AYP definition labels most as Not Meeting AYP. But the debate still revolves around differences in labels rather than whether instruction is decent and what evidence exists. And the talk about a national test says nothing about the expectations we should hold for students and schools. Does anyone think that any sort of national test would mean we'd first have an exercise in setting national education standards? Whooooooooooeeeeeee! We tried that in the late 1980s and early 1990s, and the results were some mediocre standards, some decent standards, and a set of history standards that was viciously lied about by the future Second Lady.

So if there were a national test every child takes, I predict that there would be a yawning gap between the test and any sense of real standards or expectations.

(For those who think of standards as a term of art in assessment with a different definition, please accept my apologies. I don't think there is an agreed-upon term-of-art called standards, so I'm going with the wonkish use of the word over the past 15 years. But I'll be happy to be educated on this.)

"Local control"

The phrase "local control" is policy short-hand that refers to the fact that education is not mentioned in the constitution, that it really was once controlled at the community level, and that members of Congress jealously guard their states' ability to decide on policy within the state. Mathews's confusion is understandable, but it's important to note here that local control in education politics at the federal level refers more specifically to state decision-making and not control by a community. As an historian, I'm not sure how fair it is to term state politics local, since states wrested control away from rural and truly local communities early in the 20th century.

We would be more accurate if we talked about state police powers rather than local control, but I suspect we won't be very accurate here. Why should education policy debates be accurate?

But, in any case, Mathews and other observers are correct that the idea of national tests currently has about as good odds of coming true as John Kerry's getting 70% of the vote in Houston. Whether you call it state police powers or local control, few politicians with state or local constituencies (i.e., those in Congress) will have the stomach for much centralization at this point.

Also see commentary by Jim Horn. Update: also see Andy Rotherham's followup, which refers to a July 2006 explanation of cut-scores he wrote and the Fordham Foundation piece on standards and tests. The one thing I noticed right away was Checker's and his coauthors' inattention to baseball pop culture. The phrase is not "If you build it, they will come" but "If you build it, he will come." Mathews also implied that all of the wonkish folks surveyed in this not-quite-Delphi-questionnaire process were coauthors or endorsers of the Finn et al. approach, something that is not true.

September 3, 2006

Losing our way

Shameless plug: Fellow Haverford grad Ken Bernstein has a new post up at The Wall of Education called I think we have lost our way. Provocative, thoughtful stuff, regardless of the extent to which you agree with his perspective.

(Disclosure: I'm a fellow writer on this multiblog.)

August 22, 2006

Bulleted ed-policy sliders

Charter-school report? PDK/Gallup poll? I've been preparing for the semester and don't have time for a long entry on this, but here are a few quick (and idiosyncratic) perspectives on the big education news today:

  • While the responses on the poll to reform vs. alternative system have remained somewhat consistent over the years Gallup has included the question, the responses on vouchers specifically are more volatile. My prediction: no one else pays attention to the up-and-down nature of the results.
  • It looks like the public, along with some charter-school advocates and skeptics, doesn't understand what charter schools are.
  • The public retains inconsistent views on accountability, at least regarding both expectations and mechanisms.
  • This summer has been the Education Department's Zen PR season, effectively highlighting the importance of newly-released studies by not.

Consider these to be my gnomic utterances for the month.

August 9, 2006

Necessary but Dull Wednesday

Today I'm pulling my own teeth, metaphorically speaking. No, I'm not talking about my checkup this afternoon but working on the references to Accountability Frankenstein. After finishing the text of chapter 2, there are a bunch of citations that I didn't yet have in the references file, others I need to complete, and so forth. This is also a decent time to dump in citations for the first chapter that I was lazy about, get in there things I know I'll need later, and so forth. There are 25 pages of citations right now, after a chunk of the work.

Blogging as a break from the tedium? How did you ever guess?

Thus far, I'm pleased with the structure I've set for the chapters. In both the first and second chapters, the end leads naturally into the beginning of the next chapter. While I paid attention to the beginnings and ends of the chapters in Creating the Dropout, my nose had not experienced the full olfactory sensation of being at style's ground level until I read Joe Williams' Style: Toward Clarity and Grace. It's a wonderful book, and I'll be reviewing it before I turn in a final manuscript.

August 6, 2006

Want to get a copy of Accountability Frankenstein without paying?

I'm still on chapter 2, which has ballooned beyond the size I anticipated. I'm at the fun stage, filling in holes that largely involve writing what I already know. It's akin to putting the last pieces into a jigsaw puzzle. There are three substantive chunks left and then I need to redraft the chapter conclusion. Then it's on to chapter 3!

But to the offer: I'm interested in getting outside perspectives on chapter drafts as I finish them. 2-3 readers per chapter would be useful. And here's the deal: if you volunteer to read a chapter and I choose you, I'd send you a draft chapter and would need a 1-2 page response (strengths/weaknesses) in 2-3 weeks. When the book is completely drafted, you get the file (though I wouldn't be looking for in-depth critiques then). Then, when the book comes out, you get a copy. And you get into the acknowledgments, of course.

If you're interested, please e-mail me.

(P.S. Did anyone notice the typo in the original version of this entry? Are there any left?)

August 5, 2006

Was there a difference between intelligence and achievement tests 80 years ago?

Today's passage touched briefly on the overlapping development of intelligence and achievement tests early in the 20th century. Most of the historiography has focused on so-called IQ tests, their flaws, and their political uses. But at the same time that school districts were purchasing millions of IQ tests, they were also purchasing millions of the early achievement tests in academic subjects as well as the early achievement batteries. Lewis Terman had a heavy hand in prominent tests on both sides (the Stanford-Binet Intelligence Test and the Stanford Achievement Tests), and I wonder whether there is much evidence that school districts saw a difference between the two in the 1920s and 1930s. Chris Mazzeo's 2001 article in Teachers College Record documents the uses of testing for guidance purposes in the first half-century of standardized testing, and that could use either IQ or achievement tests.

Aggregate achievement test results were reported for what one might call quasi-evaluative purposes (sometimes publicly in the Progressive Era, often internally through the mid-20th century, as I've seen in archives). But I wonder if that was mostly an afterthought piggybacking on using testing for tracking purposes, at least initially. By the 1960s, I know some state officials were interested in using achievement testing for evaluative purposes (what we call accountability today), and I wonder if anyone might be able to trace that development in a concrete case or two.

But I wonder if there's an important lesson in the overlapping of so-called intelligence and achievement testing. We see that overlap in the ambiguity in the meaning of the SATs, which has stood at different times for Scholastic Aptitude, Scholastic Achievement, and Scholastic Assessment Test. The primary motivation for purchasing tests 80 years ago could have been very similar. And we know that the skills involved in profitable test-publishing (both in test construction and also marketing/contracting) were similar. Hmmn...

July 26, 2006

Claudio Sanchez says "HLM"

Today's the day that NPR education reporter Claudio Sanchez said hierarchical linear modeling on air. This story was really about the public brouhaha over the did-she-or-didn't-she-downplay-the-study controversy on the public/private analysis of NAEP, but it's nice for the phrase to be out in public. Maybe this means that Sanchez will report on Jenny D.'s dissertation? Maybe we'll have comic books with Harvey Goldstein as multilevel superhero, meta-methodological wrestling matches between Larry Hedges and Robert Slavin, or poetry slams about evaluation perspectives with Michael Scriven, Michael Q. Patton, and Patricia J. Rogers.

Probably not, but one can hope, right?

But since Claudio Sanchez came out of the closet as a multilevel-modeling fanboy, I need to get cracking on the section of chapter 2 that covers growth models.

And, yes, I did look after my wonderful 11- and 14-year-old offspring while Elizabeth was at work today. Why do you ask?

More on the politics of thresholds

Just a few days after I wrote about cut scores or thresholds, uber-conservative Charles Murray criticizes NCLB for relying on them for proficiency labels (subscription required), accountability nihilist Susan Ohanian praises him, and Fordham Foundation staffer Michael Petrilli talks back at Murray. (Hat tip: Andrew Rotherham.) Update: Manhattan Institute's Jay Greene and Marcus Winters have also responded to Murray.

The most fascinating part of Petrilli's column is the bit at the end where he entirely eschews closing the achievement gap:

Sure, there will always be a bell curve, but couldn't better instruction, higher expectations, and well-prepared teachers move the entire curve to the right, getting most or all students past the "proficient" line? That's exactly what NCLB is aiming to do, rhetoric about closing the achievement gap aside.... schools [do not] have to close the gap in the average performance of subgroups. And for good reason. No one would support a policy that gave schools an incentive to hold down the performance of white students in order to show gains in closing the achievement gap.

Retired philosopher Tom Green wrote about this phenomenon in a 1980 book, Predicting the Behavior of the Educational System, and his argument went roughly like this: Any education system is pushed by external forces to make sure that the vast majority of the middle class (or the equivalent in a particular society) get a certain normative level of education. At some critical point, the normative level of education rises to the next system level. Richard Freeman took this up a few years earlier from a different perspective, with an argument about credentialism, in The Overeducated American (1976).

I think Green underestimates the relationship between labor markets and secondary/tertiary schooling, but I never expected an accountability jingoist to make his argument as if it were a good thing. Petrilli essentially transfers Green's argument about attainment to achievement: to Petrilli, we want the existing achievement gaps to continue, just as long as achievement generally increases.

This is what we call a shell game, folks. NCLB has been (and continues to be) advocated for on the basis of closing the achievement gap, ending the "soft bigotry of low expectations," as President Bush is wont to say. Now we get a different story, and it's as ugly as you can get in education: Inequality's okay. It's good. It's politically necessary to preserve it.

And this fascinating twist by a Fordham staff member makes the politics of such mundane things as cut scores so important. After a bit of reflection (11:40 pm): Will this shift be noticed by folks like Andy Rotherham, who wrote last year about the importance of putting equity over the conventional wisdom about giftedness? Petrilli's stance doesn't count as progressive in my book, except in a minimalist sense. The big picture, however, is that there are always underlying tensions behind the statistical choices made for accountability. Sometimes, they erupt into public view, as with complaints in New York City last December about the priorities of the Bloomberg administration. I'd rather have the battles out in public, to be honest, rather than obscured by pseudo-technocratic babble about proficiency levels. Isn't accountability about transparency?

(Incidentally, Petrilli's wrong about NAGB's great wisdom, but that's a different question. The best you can say about NAEP's or anyone else's different levels is that they're ordinal at the macro level.)

July 22, 2006

Cut scores and democracy

This evening, I've been trying to boil down the controversy over cut scores (jump-started by Gene Glass in 1978, in the first wave of minimum-competency tests), and I've been trying to pay close attention to the counter-arguments. The one that pulls at me most is one advanced separately by Michael Scriven and James Popham in the same issue of the Journal of Educational Measurement where Glass lambasted the techniques used for standard-setting: the use of arbitrary cut-scores is justified not by their technical merits but by their use to improve education.


I'm not sure that's exactly fair as a characterization of their common position—Popham was saying that doubts about a defensible cut-score definition should be weighed against the potential loss of educational benefits from going without one (in the case of minimum competency tests), and Scriven was doing his Scriven-ish (or maybe Scriven-er) thing of discussing alternatives in the case of admissions tests (i.e., that the predictive validity of the underlying measure was minimally sufficient to make decisions).

On the other hand, there was a common assumption behind both remarks—the case for the use of cut scores depends on consequences. If the use of a cut score improves education, then the process used to set the cut score does not need to meet high professional standards. I have heard the same argument from many others. Why quibble with test scores, if they give schools a kick in the pants? These are advocates of a scarlet-letter policy, to shame schools into improvement.

There are a number of problems with this kick-in-the-pants approach to accountability and the glossing over of technical problems. Fundamentally, it sees the ranking as the purpose of testing, completely unconnected to information about student performance. Using the ends to justify standardized tests and cut scores for high-stakes purposes displays a profound distrust of democratic processes. If this argument is correct—that the pressure applied to schools is more important than the basis for such pressure—then we can only reform a critical government institution through deception, deception practiced by an expert profession.

This is a chilling thought.

July 20, 2006

A bookstore provides the secret to standardized testing

I've been working today in the air-conditioned comfort of a bookstore café, eating and drinking my way through the next few pages. My task today: finish the description of standardized test characteristics key to accountability. Two of them—comparability and consistency—set the stage for test preparation. Because tests sort students either explicitly or implicitly, and because they do so on a consistent basis (which I consider to be broader than the technical reliability of test scores), someone can predict test construction and figure out how to beat the test. Princeton Review has used Adam Robinson's "Joe Bloggs" approach to make millions off middle-class families' test anxieties for a quarter-century now. The existence of test preparation both reinforces the belief that one can game the system without actually teaching students more about a subject and creates a certain problem with test scores: how much is the test score a reflection of what students know (which we'd like to measure—wishing away a bunch of other test issues), and how much is it a reflection of test-wiseness (which we should expect to be distributed unequally in society)? Whoops.

I've known about the Joe Bloggs approach since the 1980s but was curious how Princeton Review modified it for computer-administered "adaptive tests," where you can't rely on a paper-and-pencil progression of difficulty. I wrote most of the passage, and then I took a break and walked around the books. Are there any Princeton Review books targeted at adaptive tests? Yes! And it turns out that while the Joe Bloggs technique is a little less useful for adaptive tests, it still personifies error as Joe Bloggs's answers, the relative attractiveness of different answers in the context of item difficulty.

A quick trip back to the café and a sentence or two more, and the passage was complete.

On the multiblog The Wall of Education today is my entry on another subject, Russ Whitehead Doesn't Know Daniel. Unfortunately, Jack wasn't available to be ignored by Dr. Whitehead.

July 16, 2006

Making sense of expertise

I'm back to rolling on the book today—deleted some outline stuff, but I added about 5-6 pages, the organization is much clearer (I hope!), and I know much more clearly where I'm going with the rest of the chapter.

Most of what I added today concerned the birth of a testing industry as part of the growth of expert professions in the late 19th and early 20th centuries. One notable trait of professions, as many have observed (and I'm racking my brains to think of a few I might link to), is that professionals often have more of an affinity to their peers in other organizations than to immediate coworkers or their workplace. So historians think of themselves as historians and gather together with historians in other universities rather than gather with engineers. This is an oversimplification, of course, but there is one functional truth to this conventional wisdom: expertise is helpful in both managing and controlling inter-organizational tasks.

And that's one of the open secrets of education: public schools are often as much about their relationships with contractors as about what happens inside a classroom. This set of relationships includes the test industry. I know that David Tyack disparaged it as an "interlocking directorate," and he had a point, but I think he could have gone further if he had looked at that phenomenon a little more analytically. It sounds remarkably like the classic "iron-triangle" relationship popular among political scientists (or that was popular among them a few decades ago).

And now... off the laptop for a bit.

The politics of composite scales

The logjam is gone now, at least for chapter 2, thanks to a discussion with a classroom teacher (my wife). Talking the ideas over last night made me realize I had been dancing around the central dilemma of relying on tests: The same qualities that satisfy technical requirements for psychometrics distance us from knowing concretely what children do. As someone with statistical training, I know all the advantages of test construction that leads to a composite scale that ranks individuals in a consistent fashion. (Here, it doesn't matter particularly whether we are ranking individuals against each other or against a particular standard.)

Yet such a composite scale cannot be the basis for a transparent accountability system. As far as I am aware, no test currently used by any state—nor NAEP—can tell us how many 9-year-olds know their multiplication tables. We can quibble all we want about whether such a statistic captures what we want of children's knowledge of math—thus, the advantage of a composite scale—yet turning achievement into a composite scale creates a distance between what children do and what we know about their skills. Imagine for a second if we did have such knowledge: we'd probably debate whether knowing multiplication tables is a reasonable goal for 9-year-olds, where we should want automaticity in arithmetic, whether multiplication tables say anything about math as a more complex subject in general, etc. That debate would be raucous, splintered, and absolutely appropriate for a democracy. That we don't have such a debate about expectations impoverishes us and our schools.

Focusing on different aspects of testing—the sociology of knowledge (e.g., Noel Wilson's critique of psychometrics), how test construction provides an opening for test-prep, the political legacy of IQ tests, etc.—is important, and I will address several in the chapter, but I needed someone to push me on the central issues. Thanks, Elizabeth!

July 15, 2006

John Kingdon and accountability

John Kingdon's Agendas, Alternatives, and Public Policies (1984) is now a classic in political science and policy analysis, arguing that issues attain policy salience when the streams of publicly-recognized problems, policy solutions, and politics intersect. My colleagues Larry Johnson and Kathy Borman used Kingdon's framework to analyze higher-ed politics in Florida (and the temporary absence of a statewide governing board for our universities here).

Kingdon's contribution to policy analysis is the framework he provides for analyzing contingency in policy creation and the adept way that his argument handles the existence of policy entrepreneurs. I've been trying to figure out why Kingdon's approach still doesn't appeal to me with long-term questions such as the shape of high-stakes accountability, and I think I have it (though I should look in the poli sci literature to see who has more sophisticated critiques): Kingdon's framework alone cannot easily explain long-term patterns in a political system. Why doesn't the United States have a European-like welfare state? Theda Skocpol's classic Protecting Soldiers and Mothers has plenty of contingency, but I don't think she cited Kingdon at all. You think about problem, solution, and politics streams and ... rrrrrgggggg. Nothing there on the long-term radar screen. There certainly have been policy entrepreneurs, as the new-institutionalism literature points out, but Kingdon's analysis doesn't necessarily point in a single direction for the long term. There's nothing wrong in that, of course, but it's a limitation.

Time to read Julian Zelizer's 2004 article on political science and history, I think. There are plenty of syllabi with both Kingdon and the new institutionalists (e.g., a 2000 policy course syllabus at the University of Illinois at Chicago), so I suspect the relationship between the two hasn't been ignored.

Practically speaking, for me the question is the degree of contingency in the development of high-stakes accountability systems. Kingdon's approach doesn't quite ring true here, but I should be able to wring something useful out of it, and I can't quite yet.

June 29, 2006

Testing as technology

I've spent my book-time today pondering the nature of testing as technology. For the first time, I'm using concept-mapping software (CMAP), because, well, I haven't been able to get my mind to think about this rigorously, and anything's worth a shot.

So it's been fruitful. One thought in my head before today, which I knew was incomplete, was the technical focus on consistency, consistency among different levels of objects (content standards and item specification, item specification and items, items and total tests) as well as the type of consistency we think of as the technical term reliability. Item response theory (which I only grasp in a general way, having had no practice in it at all) is a tool in service to this consistency.

Trying to wrap my head around this made me think of the obvious criticism of this focus on consistency—the way our real-life skills are not consistent, despite the conformity of tests to this consistency standard. But that doesn't really touch on the core tensions between technologies (such as testing) and democratic politics (no matter what your political theory). Consistency is more a matter of minimum quality, a key concept in the same way that optimization is a driving force in engineering. It doesn't really tell you anything about the politics of technocracy.

So there are two thoughts that are running through my head this evening, after this exercise:

  • The control and oversight in testing has some interesting professional characteristics—internal accountability (within an organization) is critical, the standards are technical, market demands shape behavior (even if the buyers are generally public agencies), and the public legitimacy of testing is crucial to the survivability of the whole enterprise.
  • I need to discuss political-science notions of the iron triangle of industry, Congress, and regulation.

I have nothing more useful or synthetic right now, other than just wanting to jot these down to ponder overnight. Tomorrow is my son's last morning in the summer chess camp (more about chess and history after that's done), so I'll spend the morning in some place trying to ponder/write, pick him up, and then head to campus for a few errands. But this is enough to ponder.

June 28, 2006

Growth models, technocracy, democracy, and algebra

I had thought that my being anxious and out of sorts for several days was from that line-drive hit into my leg two weeks ago and the resulting swelling and discomfort. But no—it was from the interruption of the book-writing for other things that are quite valuable (the journal, the article on migration and graduation estimates, and a few other tidbits) but that stopped my writing momentum. I think I have it back now, and I'm much happier. I know—I should be delighted that I've accomplished so much with even a minor injury. But I have a compulsion (hopefully not a disorder!) to get this book done.

Current status: I have a contract I need to sign with one publisher (hurrah! I'll add that information as soon as I receive my completely-signed copies), and I'm on chapter 2 right now. That's the chapter on the relationship between technical expertise and democracy, one that explores the politics of accountability statistics and how that is rooted in a long-term tension between technocracy and democracy. The first section of the chapter explores the Progressive-Era origins of prestigious technical expertise and the ambivalence our society has with expertise (with IQ testing as a prominent example). The second section of the chapter explains the organizational life of testing and how the fragility of the world of testing undermines our ability to use high-stakes testing with confidence. I suspect I need to add a separate section on some of the stuff that didn't fit in the first chapter, on the civil-rights meme that's only just emerged as a major rationale for high-stakes testing.

This morning, I'm working on the last section, on growth modeling. I've discussed growth models before. It's a paradigm* of the dilemma in balancing both technocracy and democracy. My goal is to describe both the technical difficulties with growth measures and the way that the holy grail of growth has obscured the political questions involved: how we set expectations for schools and students.

Addendum (added after a few minutes' thought waiting in a coffee line): the tricky part of this chapter is figuring out what technical discussion is necessary without going over the heads of the potential audience. What can I assume? Practically speaking, I think I need to assume some knowledge of plain multivariate regression. These days, principals and superintendents need some statistical reading skills (though I won't call it statistical literacy) not to be bowled over by waves of school accountability statistics and bad research.

And there's another place where algebra has some use in everyday life. If you understand a linear equation, you can understand multivariate regression. This means that administrators, and any teacher (at any level!) who wants to be an administrator, need to know algebra.
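As a sketch of that claim: the one-variable linear equation from algebra class and a multivariate regression equation differ only in the number of slope terms (plus the error term that statisticians tack on):

```latex
% The linear equation from algebra class:
y = mx + b
% Multivariate regression: the same form, with more slopes and an error term:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon
```

Each $\beta_j$ is just another slope $m$, the intercept $\beta_0$ plays the role of $b$, and $\varepsilon$ collects everything the predictors don't explain.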

* Paradigm is now associated with Thomas Kuhn's (mis)usage of the word to mean "social model." I'm using the older meaning.

June 17, 2006

In NSA and McGraw-Hill, we face problems of both technology and democracy

An Associated Press story today on using cryptography to allow more data-mining while (at least in theory) protecting privacy has an important point buried deep in the story: many of us just don't trust the cryptography. Without some external, independent, transparent review of the technology, a good many of us don't want the NSA or other federal agencies having access to our phone or credit-card records without a court-approved search warrant. This is a problem where issues of technology and democracy collide.

So, too, with testing. Whenever stories appear, such as the Florida Commissioner of Education's see-no-evil approach to the qualifications of test graders, we see a collision between issues of technology and democracy. Democracy demands transparency, independence, and accountability (precisely those qualities that lead proponents to defend high-stakes standardized tests). But the tests are produced and graded in secret, and every scoring error and other foul-up that is revealed from behind the veil creates the clear impression that these folks just can't be trusted with something as important as accountability.

The problem is that testing and accountability is a case where both the technology and the democratic issues are important. This is something that is hard both for defenders of high-stakes testing and some opponents to grasp. You can't go full-bore with high-stakes accountability without understanding the serious limits of any assessment. But it's close to Luddism to reject any attempt at assessment. It's tempting, certainly, given the sad history of test misuse. But this is a dilemma we have to tackle, both democratically and technocratically.

Longitudinal student database glitches

It's rare when you can combine SAS (stats package) geekery and education policy analysis, so I have to take advantage of this opportunity. This morning, I had a discussion with a staff member of the Florida Department of Education, specifically in its data-warehouse unit. Very kindly, the staff in the data warehouse allow state researchers to use the data (once identifying information such as the real school ID, name, etc., is removed). Over the past 17 months, I've been playing off and on with several data sets they sent me from the 1999-2000 and 2000-01 school years, as I've been tinkering with my ideas for measuring graduation and other attainment indicators. Someone pointed out that the enrollment numbers I was working with for 1999-2000 were a chunk smaller than the next year's (over 10%). That's embarrassing! I finally did some follow-up (checked through the monthly figures) and discussed this with my acquaintance in the FDOE.


I learned today that the data set I was using (what's called the attendance file, which has enrollment and disenrollment dates) is not what the FDOE uses. For their annual enrollment count, they use a database of students uploaded by each district from those students enrolled in the relevant week (e.g., Oct. 11-15, 1999). But this enrollment file doesn't have dates of entry and doesn't always have an exit date. And the attendance file (that I was using) isn't as reliable as the enrollment file (according to my informant). Practically speaking, after I merge the sets of data, I'm left with a record of students for whom I sometimes have enrollment/disenrollment dates and codes and sometimes don't.

My strategy is fairly simple at this point: after merging the data, I impute one of the end-points for the enrollment interval and then impute the enrollment length. Because of the structure of the data (monotone, for my readers who know multiple imputation), I'm first imputing the withdrawal date and then the length of enrollment. I'm too tired to follow up with the analysis tonight, so that will wait until tomorrow.

But the gaps in coverage for enrollment are significant for anyone who thinks building a longitudinal database with individual-student records is easy to any degree. Florida has been at this longer than anyone (I think a few years longer than Texas), and we still have problems. Essentially, the data is split among many tables, and key information is entered by poorly-paid data-processing clerks at each school without significant edit checks in the software. Sometimes, that leads to records that are just silly: I found a few individuals whose records show they were born before 1900 or after 2000, including one child born in 2027 whose enterprising parents or grandparents enrolled her or him about a quarter-century before her or his birth. Now, that problem could be solved with a simple software check on dates, including an explicit question along the lines of "The data you entered indicate that this student is X years old. Is that correct?" Others are harder: as my acquaintance told me, records that should be uploaded (attendance records for students who are enrolled in a school) aren't.
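A minimal sketch of the kind of edit check I have in mind—the field names and the age bounds are my own illustration, not anything from the FDOE systems:

```python
from datetime import date

def check_birth_date(birth_date, entry_date, min_age=3, max_age=25):
    """Return a confirmation prompt if the implied age looks implausible,
    or None if the age falls within the plausible range."""
    age = entry_date.year - birth_date.year
    if age < min_age or age > max_age:
        return ("The data you entered indicate that this student is "
                f"{age} years old. Is that correct?")
    return None  # plausible age; no prompt needed

# The child "born" in 2027 but enrolled in October 1999 gets flagged:
prompt = check_birth_date(date(2027, 5, 1), date(1999, 10, 11))
```

A check this crude would catch the pre-1900 and post-2000 birth years, though it can do nothing about records that simply never get uploaded.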

And that doesn't touch the questions of auditing the withdrawal codes (how do we know someone showed up at another school when they said they were transferring, not dropping out?) or anything that touches on the longitudinal record of achievement. Please remember that Florida is one of the best-case scenarios for data integrity, as there's considerable investment in this data in terms of infrastructure, training, and an incremental approach to adding elements. Even with that, it's clunky and prone to errors—errors that might appear small but affect everything we have come to assume about schools (i.e., the official statistics).

Update: I forgot the SAS geeking. Last night I discovered PROC MI and PROC MIANALYZE, two procedures that make much easier the type of multiple imputation Rubin's (1987) book describes. I realized this morning that I had made an error in the merging by including records for which one of the other variables clearly indicated the student had not attended, and so there was a spurious set of rare cases with withdrawal but not entry dates. Removing those cases means that the dates are missing for the same set of cases. Technically, I can impute either variable first and then impute the length of enrollment. (Quick logic puzzle for the reader: why wouldn't I just want to impute the two dates independently?)

The other information I have: school (and county), school year, race, gender, ethnicity, birth year and month, lunch-program participation, and grade (first grade, second, etc.). Obviously, the imputation has to be done separately by year (otherwise I might have starting and ending dates in the wrong academic year), and I could have separate imputations by county. I'm using predictive mean matching for the endpoint date (to avoid dates that are beyond the ends of the school year—I'm so glad my campus's version of SAS has that option), and I'm not sure whether to use predictive mean matching or straight regression for the interval. The obvious thing is to try it different ways and see if it makes a difference.

Further update: Oh, rats. Imputing dates doesn't work, because either a regression or a predictive mean matching system gives me dates that are about 250 days apart (give or take a few days), no matter what, because the vast majority of students are in the same school for the whole academic year. That gives me less variation than calculating the variable of interest (was the person in school on day X in that year) and imputing that variable directly, so I'm going with that. But the nasty bit is that there is a lower proportion of 1999-2000 records needing such imputation than 2000-01 records. This doesn't take care of the undercoverage in the 1999-2000 record set. Yes, it's a problem.
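For the curious, here is a toy sketch of what imputing the yes/no enrollment flag directly amounts to—a hot-deck draw from observed cases in the same stratum, not the PROC MI machinery itself, and the grade/year strata are illustrative rather than the actual FDOE layout:

```python
import random
from collections import defaultdict

def impute_enrolled_flag(records, rng=random.Random(20060617)):
    """records: list of dicts with 'grade', 'year', and 'enrolled'
    (True/False/None). Returns a new list in which each None is replaced
    by a draw from observed donors in the same grade-year stratum."""
    donors = defaultdict(list)
    for r in records:
        if r["enrolled"] is not None:
            donors[(r["grade"], r["year"])].append(r["enrolled"])
    imputed = []
    for r in records:
        r = dict(r)  # copy so the original records stay untouched
        if r["enrolled"] is None:
            pool = donors.get((r["grade"], r["year"]))
            if pool:  # draw an observed value from the same stratum
                r["enrolled"] = rng.choice(pool)
        imputed.append(r)
    return imputed
```

In a real multiple-imputation run, you'd repeat the draw M times and combine the M completed analyses with Rubin's rules (which is what PROC MIANALYZE handles).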

June 16, 2006

Standardized testing and accountability have a complicated history

As I've caught up on other work tasks, caught a line drive in my leg, and caught up with some reading, I've only pecked away at the remaining tasks on chapter 1 (on the political roots of accountability). Well, not really—I've received quite a bit of advice on the history of standardized testing and accountability and skimmed a lot of sources. As usual when I hit this type of passage, one subtopic (the history of standardized testing) deserves a full-length history in itself, and while there are plenty of books on parts of it (Gould on IQ, Lemann on the SATs [with plenty of qualms by historians], etc.), there isn't a well-researched academic history of standardized testing as far as I'm aware. But that's not the book I'm writing. I only have a few pages for it.

The themes that are popping up are going to be familiar to historians:

  • the way that experiences with testing establish a "grammar of schooling" for standardized tests (see Tyack and Cuban's book for a discussion of that term)
  • long-term business relationships between testing firms and states before the 1960s established an easy route to accountability
  • social networks from early-20th-century researchers to late-20th-century folks in the guts of state departments of education (e.g., the three-degree difference from Charles Judd to Tom Fisher, the head of state testing in Florida who retired just a few years ago)
  • resistance to the use of testing to judge schools in the 1960s (with the National Assessment of Educational Progress) and later
  • two key researchers outside colleges of education who carried ideas into the policy realm (James Coleman and Pat Moynihan)

To what extent were civil-rights motives important to the start of the modern accountability movement? I'm seeing considerably mixed evidence, and I have to make a judgment call on that, obviously. As an historian, it's one of those big-picture questions that I prefer to sleep on. I suspect it requires a nuanced reading of the evidence that just isn't in my head yet.

June 12, 2006

No writing yesterday, and we'll see about today

I've spent the past 36 hours doing non-writing work when I've had the chance (which is complicated by dropping my daughter at camp, then worrying about said daughter because of Tropical Storm [soon-to-be Hurricane] Alberto), plus other things. So with regards to chapter 1, I've been going through the 10 books I brought home to read as well as an article. Then there's other work stuff, of course. I suspect I'll get at least an hour or two to write in the evening.

June 10, 2006

Chapter 1 with identifiable holes

I spent a few hours writing and then more time organizing what remains to be done with chapter 1 (on the political origins of accountability) and then diving into the library for resources. So I'll take home 10 books to read and an additional list of things to finish. But the first chunk of the book may be done by the end of next week.

June 8, 2006

More on incrementalism or tinkering

Since I discussed incrementalism a few days ago, let me continue in that vein here and both praise and criticize the most prominent historians who advocate incremental school reform. Eleven years ago, David Tyack and Larry Cuban's Tinkering toward Utopia appeared, as a short but pithy analysis of top-down school reform efforts. It was not strictly geared at accountability as such (the book came out 7 years before NCLB!), but it is still one of the most popular books on education reform, both in course assignments and in popular outlets.

I'll gladly admit to being one of those who assigns it in a course. There is nothing else in print that makes such a tightly-constructed argument about the dangers of utopian education reform. They pick wonderful examples from the past and make solid historical arguments that many readers see as subtle (such as the point that schools change reforms as much as the other way around). The book easily deserved the praise it has received and more.

And yet...


There is something quintessentially late-20th century in the book, not in the explicit arguments but in the basic assumption that incremental reform is more likely to stick and be successful than dramatic reform. Far from criticizing tinkering, they see real value in it and deliberately choose something as mundane as minor changes to classroom design to illustrate what they see as the ideal scale for school reform.

The fly in the ointment here—only made possible because the book is so fine—is desegregation. Desegregation is a perfect counter-example of a reform that was far from incremental in its plans or eventual execution (even if you or I might have wanted it to be different in several ways). There was really no way to "tinker" towards desegregation. Well, okay, there was, but that was the reason why there was no substantive desegregation for the first ten years after Brown. In her 1984 book The New American Dilemma, Jennifer Hochschild argued that desegregation was most effective (and had the least disruptive implementation) when it was sudden, without compromise, and affected the lives of the youngest children immediately. She pointed out that these traits conflicted with classic pluralist doctrine in political science, which emphasizes compromise and incrementalism as a fundamental feature of the American political system.

In Tinkering toward Utopia, at least, Tyack and Cuban write as pluralists, emphasizing incrementalism and compromise. Yet I think they would acknowledge that desegregation was a good thing, a necessary event. But I don't know how they would reconcile the two.

June 7, 2006

The choice of standardized testing

Last night, I was thinking about the moment in the early- to mid-1970s when state legislatures began experimenting with some form of accountability (as they then termed it, perhaps borrowing from Leon Lessinger, an associate commissioner of education in the Nixon administration). For example, Florida passed something called an accountability act in 1971, Governor Rubin Askew talked about it in speeches in his first term, and the legislature tried different things over the next several years, including requiring more detailed reporting of spending (from the fiscal metaphor of accountability) and choosing standardized testing as the main measure of academic achievement.

I don't think anyone has adequately explained that choice. In the 1970s, standardized testing was coming under fairly harsh criticism for both its construction (claims that they were generally biased in content) and use (especially the group administration of IQ tests frequently used as screening devices for special education). It was in the 1970s that Congress changed the requirements for special-education assessment. From the general criticism, I sometimes wonder if one of the motivations for ETS's famous 1975-76 "blue-ribbon" panel analyzing the SAT decline was a subtle way of relegitimizing the SATs. (No, I don't have time to look into ETS's archives for that.) So why did legislatures such as Florida's choose standardized testing? Last night, my wife gave the usual answers (it's cheaper and easier to number-crunch with them), but that doesn't quite satisfy me as an historian, in part because those are ahistorical claims and in part because I'm not sure where I'd find evidence to confirm those hypotheses.

Any other possible answers?

June 5, 2006

The politics of incrementalism

In the last few hours, I've been thinking quite a bit about the political viability of incrementalism in school reform. Tyack and Cuban argued for it in Tinkering toward Utopia (1995), and while the book is still very popular, their argument didn't win the day in school reform. The AYP requirements of No Child Left Behind look incremental, until you think about the 100%-proficiency deadline of 2014. Proponents of NCLB certainly haven't touted it as incremental, either.

Could incrementalism survive the political cauldron of education politics and the organizational cauldron of school systems?


Let's consider both the incrementalism of summative evaluation, such as what Robert Linn has proposed, and also the incrementalism of formative evaluation, such as progress monitoring (formerly known as curriculum-based measurement). Linn proposed that targets for student achievement have a foundation in what real schools were achieving rather than figures pulled out of a hat (to use the gentler analogy). Those who work on progress monitoring, such as Stan Deno (who wrote the germinal article in the 1980s), argue that long-term outcomes for students with disabilities are dramatically improved if teachers make inductive decisions based on trends from frequent assessment. This is all based in solid research and the observations of many gray eminences.

And yet I worry about the political or organizational viability, for two reasons.

  1. Incrementalism is not part of most adults' mental images of a real school (to borrow from Mary Haywood Metz) or the grammar of schooling (from Tyack and Cuban), both of which are instead deeply imprinted with abstract notions of absolute standards (schools give grades, and 90% is an A, even if we don't know what the scale referent is).
  2. School systems are now deeply enmeshed in behavior that is summative and non-incremental in nature and that takes gaming the system (or test-prep) as a legitimate response to virtually any policy.

The mental images of school

I've been struggling with a counter-image or contrary meme to schooling is grading. I am less concerned in this context with the utility of grading than with the way it suffocates alternative concepts of evaluation. I think I have a tentative idea for Linn's summative incrementalism: what we're really trying to do in education reform is boost the average knowledge of this generation of children above that of their parents, and this will require a generation to accomplish.

But let's try on a few for size specifically for progress monitoring:

  • The stock market. Daily stock prices allow one to track the market value of a company and respond accordingly—except day trading is a foolish activity, and I'd like to keep the question of incrementalism separate from issues of competition in schooling.
  • Baseball (or other sports) training stats. This morning, my son started a summer baseball camp, and the coaches there used a radar gun to measure his bat speed and throwing speed. By the end of the week, I'm sure he'll have improved, and that's a parallel to progress monitoring... except that the stats in baseball camp are used primarily to confirm the value of the camp and boost confidence. I'm not sure coaches would use the trends in these stats to change strategy.
  • Weather and other environmental data. We certainly keep track of all sorts of weather data daily (and even hourly) and respond accordingly (changing one's clothing or deciding whether to take an umbrella depending on the forecast). But we don't generally assume we can shape the weather in a short-term way (and the debates about human influence on global warming have no real parallel to education reform).
  • American can-do problem-solving. There is plenty of literature on American positive attitudes, especially problem-solving ones, such as David Potter's People of Plenty (though he thought this had a serious downside). Maybe one can look at progress monitoring, and ideally a teacher's use of the data, as the educational equivalent of problem-solving... except that the techie connotations of that are of one-time problem-solving: someone sees a technical problem and quickly figures out a clever workaround. That doesn't really have a parallel with progress monitoring.
  • A right to know/forewarning. As a parent of an 8th grader, I've had a few moments in the last 2-3 years when I really wanted more information at my fingertips, as my daughter heads into the age where she no longer babbles everything that happened that day. Most of her academic teachers in middle school posted homework assignments online—something for which I was very grateful. And in the last year, most also posted grades online, and I had a similar reaction. Transparency is certainly an American value... but this doesn't quite get at the inductive response to data that's the key to progress monitoring.

As you can tell, I'm still looking for hooks for progress monitoring. More ideas are welcome!

Summative, non-incremental, and gaming behaviors

I deeply fear that test-prep and other such behaviors are genies that are out of the bottle. I have heard second- or third-hand that teachers have told the Florida Center for Reading Research staff of their efforts to prep students for DIBELS by teaching lessons geared specifically to the test's format, that they want to use DIBELS data to retain some kindergarteners, and that some principals want to use DIBELS data to reward or punish teachers professionally. I understand the staff was (properly) horrified by these ideas. And yet the ideas come directly from current behaviors in schools. Formative, incremental, and inductive behavior is neither rewarded nor experienced very frequently in schools. There needs to be a way to explain the difference. Something Elizabeth said this evening suggested the following: "We're testing the instruction, not the kids." But I just don't know if it would stick.

Ideas welcome here, too!

June 4, 2006

Accountability jingoists and nihilists

I've been searching for a while for how to explain the Manichean-style debate we've been having on accountability, and I think I have it. Those who advocate for high-stakes testing as currently wrought (or who seek an intensification of it) are accountability jingoists. As with foreign-policy jingoism, accountability jingoism is belligerent in tone and identifies disagreement as either foolish or undermining American values. On the other hand, some of those who disagree with the current moment in high-stakes testing are accountability nihilists, who turn in frustration to a denial of anything associated with the current regime.

You can see this "the other side is full of villains and fools" rhetoric in a recent Eduwonk post and in the song in Kathy Emery and Susan Ohanian's Why Is Corporate America Bashing Our Public Schools? (on p. 4, singable to "If You're Happy and You Know It, Clap Your Hands"):

If you cannot find Osama, test the kids.
If the market hurts your Mama, test the kids.
If the CEOs are liars,
putting schools on funeral pyres,
screaming, "Vouchers, we desire!"
test the kids.

I understand the value of snarky comments in blogs and outlandish words in songs, and I have agreed with both Andrew Rotherham and Susan Ohanian about various matters, but these are not isolated examples of Manichean rhetoric. In the long run, I don't think that either jingoism or nihilism helps us get to saner accountability policies. While my sensibilities in many ways are close to the nihilists—I see the current system of accountability as out of balance—I disagree with a broader worldview that fails to acknowledge that there might be something of value or politically rooted (in a positive sense) about accountability politics.

June 3, 2006

The critics of standards and accountability

I have started to write a new book, titled Accountability Frankenstein, that I hope to finish by the end of the summer and get to a press shortly afterwards. If I can stay disciplined and write a few hours every day, I should accomplish my goal. And, on the way, I'll be putting in a few teasers of short excerpts or paraphrases of some material. Today's is the first...


Susan Ohanian, Alfie Kohn, Marion Brady, and many others were criticizing accountability systems well before I did. But, unlike these critics, I do not see standards as evil in themselves. Others, though, see the technocracy of accountability as unnatural. Ohanian, for example, calls advocates of high-stakes testing standardistos. For years, Kohn has railed against the use of rewards and punishments at all in schools, whether for individual students or for educators. Brady has argued that standards themselves, commonly rooted in conventional disciplinary definitions, are inappropriate.

I think I understand these humanistic critics of standards and accountability. To use technocratic tools to improve education threatens to remove the individualism, spontaneity, and joy of the best education many of us have experienced. The real world, to Brady and others, is more complicated than our compartmentalized curriculum. Good education, to Ohanian and Kohn, relies on something more than scripting. Most of us have had our “aha” or eureka moments, when something a teacher said gave us a new perspective, or when we finally understood a concept we had struggled with. And in most cases, those moments did not come in scripted lessons or during fill-in-the-bubble tests. To many critics of high-stakes accountability, trying to improve education by standardizing it is an obscene marriage of technocracy and democracy.

November 28, 2005

Growth models

All right—having beaten a future article for Education Policy Analysis Archives halfway into shape, I'm taking some time for relaxation and my sidewise way of looking at education policy, or at what passes for it.

Since the announcement this month that the Department of Education would promote the piloting of so-called growth models of accountability, there have been a number of reactions, many of them skeptical, from George Miller and Ted Kennedy, the Citizens' Commission on Civil Rights (a private organization, despite the similarity in names to the official U.S. Civil Rights Commission), the Education Trust, and Clintonista Andrew Rotherham, who points out that only a few states have anything close to the sufficient longitudinal-database elements to carry this off.

While a few journalists have had a reaction-fest with this, there has been no acknowledgment of the existing literature on so-called growth models, their political implications, or the gaps in the literature....

I'll state up front that it's fine to focus on political questions—moreover, I've argued in The Political Legacy of School Accountability Systems that the political questions are the important ones, ultimately, and it's impossible to have a technocratic solution to political problems—just so long as you don't ignore the technical issues (and for that, see Linn, 2004). Haycock of the Education Trust is ultimately right about the focus on philosophical questions, regardless of whether I might agree with her on specifics.

Big political questions

So what are the policy/political questions? A few to consider:

  • The dilemma between setting absolute standards and focusing on improvements. As Hochschild and Scovronick (2003) have pointed out, there's a real tension between the two, and it's impossible to resolve that tension completely. On the one hand, there are concrete skills adults need to be decent citizens (yea, even productive ones). On the other hand, focusing entirely on absolute standards without acknowledging the work that many teachers do with students with low skills is unfair to the teachers who voluntarily choose to work in hard environments. And, no, I'm not going to take BS from either side claiming, on the one hand, that we need to be kind to kids (and deny them the skills they need??) or, on the other hand, that we need to take a No Excuses approach toward those lazy teachers (and who are you going to find to teach in high-poverty schools when the teachers you've insulted have left??)
  • The question of how much improvement to expect. Here, Bill Sanders' model (we'll take it on faith for the moment that he's accurately representing his model—more later on this point) is close to an average of one-year's-growth-per-year-in-school (see Ballou, Sanders, & Wright, 2004, for the most recent article on his approach). But for students who are behind either their peers or where we'd like them to be, Haycock is right: one year's growth is not enough (see Fuchs et al., 1993, for a more technical discussion and the National Center on Student Progress Monitoring for resources).
  • The tension between the public equity purposes of schooling and the private uses of schooling to gain or maintain advantages. Here's one thought experiment: Try telling wealthy suburban parents, We want your kids to improve this year, but not too much because we want poor kids in the city or older suburb nearby to catch up with your children in achievement and life chances. If anyone can keep a straight face while claiming the parents so told would just sit back and say, Sure. That's right, I have some land to sell you in Florida.
  • Where is intervention best applied? Andrew Rotherham's false dichotomy between demographic determinists and accountability hawks aside, arguments by David Berliner are about where to intervene to improve children's learning, not about giving up. (I should state here that of course I have heard teachers and some of my students fall into the trap of this dichotomy, but that's a constructed dynamic from which we can and must escape. To dismiss Berliner and others as if they fall into the trap is to shut off one escape route. Shame on those who carelessly elide the two.)
  • Assumptions that technocratically-triggered sanctions based on (either) growth or absolute formulae work. I have yet to be convinced that such a kick-in-the-pants effect is strong enough or free of side effects. This is not to say that I don't believe in coercion. I am just a believer in shrewd coercion, not the application of statistical tubafors (you'll have to search for the term on that page).

Statistical issues with multilevel modeling

Among education researchers, the tool of choice right now for measuring growth is probably so-called multilevel modeling. Why multilevel modeling became the tool of choice is probably an accident of recent educational history (the more recent pushes for accountability combined with the development of multilevel statistical tools), but it allows a variety of accommodations to the real life of schools, where students are affected not only by a teacher but also by a classroom environment in common with other kids, as well as by the school and their own (and their families') characteristics. That's a mouthful and only skims the surface.
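
Since multilevel models are hard to see in the abstract, here is a toy sketch of the core move of a random-intercept model, with entirely invented numbers (my own illustration, not any state's actual model): each school's observed mean gets shrunk toward the grand mean in proportion to how much of the variance actually sits between schools rather than between students.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-level setting: 40 schools, 25 students each. True school effects
# are small relative to student-level noise, mimicking the situation where
# most variance sits below the school level.
n_schools, n_students = 40, 25
school_effects = rng.normal(0, 2.0, n_schools)          # between-school sd = 2
scores = school_effects[:, None] + rng.normal(0, 10.0, (n_schools, n_students))

school_means = scores.mean(axis=1)
grand_mean = scores.mean()

# Empirical-Bayes shrinkage toward the grand mean, the heart of a
# random-intercept multilevel model:
var_within = scores.var(axis=1, ddof=1).mean() / n_students
var_between = max(school_means.var(ddof=1) - var_within, 0.0)
weight = var_between / (var_between + var_within)
shrunken = grand_mean + weight * (school_means - grand_mean)

# Shrinkage pulls the estimated "school effects" toward zero spread
print(round(float(np.std(school_means - grand_mean)), 2),
      round(float(np.std(shrunken - grand_mean)), 2))
```

The second number printed is always smaller than the first: the model deliberately discounts each school's raw mean by how noisy it is, which is also why so many published effect estimates end up statistically indistinguishable from one another.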

Of multilevel modeling pioneers, the best of the bunch by far (beyond Bryk and Raudenbush, whose names are most familiar in the U.S.) is Harvey Goldstein, whose downloadable papers are a treasure-trove of introductory material for those who have some statistical background. The Centre for Multilevel Modelling (which he founded) is one broader source, as are UCLA's multilevel modeling page and Wolfgang Ludwig-Mayerhofer's. A Journal of Educational and Behavioral Statistics special issue on value-added assessment (Spring 2004) is now required reading for anyone looking at multilevel modeling and the question of adjustment for demographic factors.

But there are both technical and policy/political issues with the use of multilevel modeling software (and I use that more generic term rather than referring to specific software packages or procedures). Let me first address some of the technical issues:

  • Vertical scaling. In some statistical packages, there is a need for a uniform scale, where the achievement of students at different grades and ages is measured on the same scale. That way, the score of a student who is 7 can be compared to an 8-, 9-, or 10-year-old's achievement, allowing comparison across grades. This is not necessary with packages that use prior scores as covariates, but anything that looks at a measure of growth in some way strongly begs for a uniform (or vertical) scale. There are problems with such vertical scaling, stemming from the fact that it is very, very difficult to do the type of equating across different grades (and equivalent curricula!) that is necessary to put students on a single scale. Learning and achievement are not like weight, where you can put a 7-year-old and a 17-year-old on the same scale. Essentially, equating is a piecemeal process of pinning together a few points of separate scales (each more closely normed). At least two consequences follow:
    1. Measurement errors in a vertical scale will be larger than errors in a single-grade scale, which test manufacturers have far more experience norming.
    2. The interpretation of differences on a vertical scale will be rather difficult. One reason is the change in academic expectations across grades, unless you narrow testing to a limited range of skills. But the other reason is subtler: the construction of a vertical scale can only be guaranteed to be monotonic (higher scores in a single-grade test will map to higher scores in the cross-grade, vertical scale), not linear. There will almost inevitably be some compression and expansion of the scale relative to single-grade test statistics. That nonlinearity is not a problem for estimation (since models of growth can easily be nonlinear). But the possibility of compression and expansion makes interpretation of growth difficult. Does 15-point growth between ages 10 and 11 mean the same thing as 15-point growth between ages 15 and 16? Who the heck knows!
  • Swallowing variance. As Tekwe et al. (2004) point out in a probably-overlooked part of their article, the more complex models of growth swallow a substantial part of the available variance before getting to the "effects" of individual schools and teachers. This is inevitable with any statistical estimation technique with multiple covariates (or factors, independent variables, or whatever else you want to call them), but it has some serious consequences for using growth models for accountability purposes. It erodes the legitimacy of such accountability models among statistically literate stakeholders, who see that most variance is accounted for (even if in a noncausal sense) by factors other than schools and teachers. In addition, this process leaves the effect estimates for individual teachers and schools very close to zero and to each other. Thus, with Sanders' model used in Tennessee, the vast majority of effects for teachers (in publicly released distributions) are statistically indistinguishable. Never mind all my other concerns about judging teachers by technocracy: this just isn't a powerful tool even for summative judgments.
  • Convergence of estimates. In the packages I know, the models don't always converge (result in stable parameter estimates), given the data. Researchers with specific, focused questions will often fiddle manually with equations and the variables to achieve convergence, but you can't really do idiosyncratic adjustments in an accountability system that claims to be stable and uniform over time—or, rather, you shouldn't make such idiosyncratic adjustments and keep a straight face in claiming that the results are uniform and stable over time.
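
To see why a monotonic-but-nonlinear vertical scale muddies interpretation, here is a toy sketch. The square-root transform is invented for illustration (no real test uses it); it simply stands in for the compression that cross-grade equating can introduce at the top of a scale.

```python
import numpy as np

# Invented monotone-but-nonlinear "vertical scale": a square-root transform
# standing in for the compression a real equating process can introduce
def vertical_scale(raw_score):
    return 100.0 * np.sqrt(raw_score)

# Two students each gain the same 15 raw points, at different levels
young_growth = vertical_scale(115) - vertical_scale(100)   # age 10 -> 11
older_growth = vertical_scale(415) - vertical_scale(400)   # age 15 -> 16

# Monotonicity holds, but equal raw growth looks unequal on the scale
print(round(float(young_growth), 1), round(float(older_growth), 1))
# → 72.4 37.2
```

The younger student's 15-point raw gain looks nearly twice as large on the vertical scale as the older student's identical gain, which is exactly the interpretive trap described above.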

Political complications of multilevel models

In addition to the technical considerations, there are issues with multilevel modeling that are more political in nature than technical/statistical:

  • Omissions of student data. This is true of any accountability system that allows exemptions, but it's especially true of any model of growth that omits students who move between test dates. It's a powerful incentive for schools to perform triage on marginal students in high school, either subtly or openly. I've heard of such triage efforts in Florida, though it's hard to demonstrate intentionality. But even apart from the incentive for triage, it's hard to claim that any accountability system targets the most vulnerable when those are frequently the students who move between schools, systems, and states. And the more years included in a model, the less that movers count in accountability.
  • The complexity factor. Technical issues with complex statistical models are, well, complex and difficult to understand without some statistical background, and such complexity requires sufficient care with educating policymakers. That's especially important with growth models, which are pretty easy to sell to lawmakers who may be looking for a technocratic model that they don't have to think too hard about. Here's a reasonable test: will Andrew Rotherham's blog ever mention the technical problems with growth models? Will the briefs put out by various education policy think tanks explain the technical issues, or will they prove the term to be an oxymoron?
  • Proprietary software. I think that William Sanders still holds all data and the internal workings of his package to be proprietary trade secrets, even though they're used as public accountability mechanisms in Tennessee, at least (anywhere else, dear readers?) (Fisher, 1996). How can anyone justify using a secret algorithm for public policy in an environment (education) where everyone expects transparency, and where transparency is the justification for accountability itself? (For other commentaries about Sanders' model, see Alicias, 2005; Camilli, 1996; Kupermintz, 2003, and an older description of my own involvement in the earlier discussions of Tennessee's system. For his own description, see Ballou, Sanders, & Wright, 2004; Sanders & Horn, 1998.)

Life-course models

One of my concerns with the increasingly complex world of statistical models of growth is their amazing disconnect from fields that should be natural allies. We have great statistical packages that are incredibly complex, but some days they seem more like solutions in search of problems than a logical outgrowth of the need to model growth and development in children.

As stated earlier, one problem is the attempt to put student skills, knowledge, and that vague thing we call achievement in an area onto one scale. Unlike weight, there isn't a cognitive measuring tool I'm aware of in which all children would have interpretable scores—nonzero measures on an equal-interval scale, to choose one goal. But for now, let's assume that someday psychometricians find the Holy Grail of vertical scales (or maybe that would be a Holy Belay Line to climb down after scaling the...). Even waving away that problem, I'm still troubled by the almost gory use of statistical packages without some thought about the underlying models.

Even if one were interested largely in describing rather than modeling growth, you could start with nonparametric tools such as locally-weighted regression (or LOESS) and move on to functional data analysis. Those areas of statistics seem logical ways to approach the types of longitudinal analysis that the call for modeling growth seems to require.
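
As a sketch of that descriptive approach, here is a bare-bones locally-weighted linear regression written from scratch (a LOESS-style toy, not a substitute for a production implementation), run on an invented growth trajectory:

```python
import numpy as np

def loess(x, y, x_eval, frac=0.5):
    """Bare-bones locally-weighted linear regression (LOESS-style sketch)."""
    n = len(x)
    k = max(2, int(np.ceil(frac * n)))            # points per neighborhood
    fitted = []
    for x0 in x_eval:
        dist = np.abs(x - x0)
        h = np.sort(dist)[k - 1]                  # local bandwidth
        w = np.clip(1 - (dist / h) ** 3, 0, 1) ** 3   # tricube weights
        sw = np.sqrt(w)                           # weighted least squares
        X = np.column_stack([np.ones(n), x])
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
        fitted.append(beta[0] + beta[1] * x0)     # local line at x0
    return np.array(fitted)

# Invented trajectory: noisy scores that rise steeply and then flatten
rng = np.random.default_rng(1)
ages = np.linspace(6, 17, 60)
scores = 200 + 30 * np.log(ages) + rng.normal(0, 2, ages.size)
smooth = loess(ages, scores, ages, frac=0.4)
```

The point of the nonparametric tool is that no growth function is assumed in advance: the smoothed curve simply describes whatever age-associated shape is in the data.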

Then there is demography. I'll admit I'm a bit partial to it (having a masters from Penn's demography group), but few education researchers have any formal training in a field whose model assumptions are closer to epidemiology and statistical engineering analysis than psychometrics. In demography, the basic conceptual apparatus revolves around analyzing the risk of events that a population is exposed to. The bread and butter of demography are births and deaths, or fertility and mortality. The fundamental measure is the event-occurrence rate, and the conceptual key to mathematical demography is the assumption that behind any living population is a corresponding stationary population equivalent, a hypothetical or synthetic cohort that one can conceive as exposed to the conditions in a population in a period of time rather than conditions a birth cohort experiences. It's as if you had a time machine at the end of Dec. 31, 1997, and a group of 1000 babies born all at the first instant of January 1, 1997, would be flipped back to the beginning of the year for all who survived to the end. It's an imaginary, lifelong version of Groundhog Day, but one with the happy consequence that the synthetic cohort would never hear of Monica Lewinsky. What happens to that synthetic cohort never happens to a real birth cohort, but it does capture the population characteristics of 1997. You can find the U.S. period life table for 1997 online in a PDF file, with absolutely no mention of Monica Lewinsky. (There is much I'm omitting in this description of a stationary population equivalent, I know!)
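
A minimal sketch of the synthetic-cohort idea, with invented death probabilities (not the actual 1997 life table): take one period's age-specific probabilities of dying and apply them, age by age, to a hypothetical cohort of 1,000 newborns.

```python
# Invented age-specific death probabilities q(x) for one period (ages 0-4)
q = [0.007, 0.0005, 0.0003, 0.0004, 0.001]

survivors = [1000.0]               # l(0): the radix of the life table
for qx in q:
    # each year, the synthetic cohort loses the period's share at that age
    survivors.append(survivors[-1] * (1 - qx))

# l(x): how many of 1,000 synthetic newborns reach each birthday
print([round(lx, 1) for lx in survivors])
# → [1000.0, 993.0, 992.5, 992.2, 991.8, 990.8]
```

No real birth cohort ever experiences exactly these conditions; the survivorship column summarizes the period's conditions as if a single cohort lived through them, which is the whole trick of the stationary population equivalent.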

Demography offers a few aids to this business of modeling growth, because its bailiwick is looking at age-associated processes. Or, as a program officer for the National Institute on Aging explained at a conference session I attended a few weeks ago, aging is a lifelong process. Trite, I know, but it's something that the growth-modeling wannabes should learn from, for two reasons.

One is the equally obvious (almost Yogi Berra-esque) observation that as children grow older, their ages get bigger. Unfortunately, most school statistics are reported by administrative grade, not age, which makes comparability on almost any subject (from graduation to achievement) virtually impossible. The only reputable source of national information about achievement that I'm aware of based on age, not grade, is the NAEP Long-Term Trends reports, pegged to 9-, 13-, and 17-year-olds tested in various years from 1971 to 2004. Some school statistics used to be reported by age—age-grade tables, which I'm finally figuring out how to use reasonably. But you could have some achievement testing conducted by age and ... well, enough of that rant.

The broader use of demography should be the set of perspectives and tools that demographers have developed for measuring and modeling lifelong processes. Social historians have an awkward term for this—life-course analysis. What changes and processes occur over one's life, and how do you analyze them? Some education researchers acknowledge at least a chunk of this perspective, most notably in the literature on retention, where you cannot take achievement in a specific grade's curriculum as evidence of the (in)effectiveness of retention in improving achievement. You can only find out the answer by looking at what happens to children as they grow older.

Some of the more sophisticated mathematical models of population processes have direct parallels in education that could be explored fruitfully. To take one example unrelated to achievement growth, parity progression (women's moves from having 0 children to 1 to 2 to ...) is an analog of progression through grades, and more could be done with using parity progression ratio estimates to see what happens with grade progression.
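
A toy version of that analogy (with invented enrollment counts, not real data): grade-progression ratios computed the same way demographers compute parity progression ratios, as the proportion moving from one state to the next.

```python
# Invented enrollment counts for one cohort tracked across grades 9-12;
# progression ratios are the grade-transition analog of parity progression
enrollment = {9: 1000, 10: 940, 11: 870, 12: 810}

grades = sorted(enrollment)
progression = {
    (g, g + 1): enrollment[g + 1] / enrollment[g]
    for g in grades[:-1]
}

for (g_from, g_to), ratio in progression.items():
    print(f"grade {g_from} -> {g_to}: {ratio:.3f}")
```

Just as parity progression ratios expose where childbearing careers stop, the grade-to-grade ratios expose where students fall out of the pipeline, which a single graduation rate hides.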

But, to growth... variable-rate demographic models hold considerable promise, at least in theory, for analyzing changes from cross-sectional data. In the standard (multilevel model) view, you focus on longitudinal data and toss cross-sectional information, because (you think) there is no way to separate cohort effects from real growth effects. Aha! but here demography has an idea—stationary population equivalents—and a tool—variable-rate modeling. While the risk model of demography requires proportionate changes, natural logs, and e to the power of ... well, you get the idea. I'm going to provide a brief sketch and two possible directions; for more details, see Chapter 8 of Preston, Heuveline, and Guillot (2001). (And remember, we're magically waving away all psychometric concerns. We'll get back to that a bit later.)

We're going to consider the measured achievement of 10-year-olds in 2005 (on a theoretically perfect vertically-scaled instrument) in two different ways, one related to changes among 10-year-olds and a second way, in the experience of this cohort, and use that to relate observed information from two cross-sectional testing administrations to the underlying population dynamics (in this case, achievement growth through childhood).

First, let's compare the achievement of 10-year-olds in 2006 to 10-year-olds in 2005. It doesn't matter whose is better (or if they're equal). My son is now 10 years old (and will still be 10 for the next round of annual tests here in Florida), so let's suppose that the achievement of 10-year-olds in 2006 is higher than that of 10-year-old students the year before. Then we could think of achievement as follows:
The achievement of 10-year-olds in 2006 = achievement of 10-year-olds in 2005 and some growth factor in achievement among 10-year-olds between 2005 and 2006
For now, it doesn't matter whether the "and" refers to an additive growth factor, a proportionate one, or some other function. And if the 10-year-olds in 2005 did better, the growth factor is negative, so it doesn't matter who did better.

Second, let's compare the achievement of 10-year-olds in 2006 to 9-year-olds in 2005 in a parallel way:

The achievement of 10-year-olds in 2006 = achievement of 9-year-olds in 2005 and some growth factor in achievement between the ages of 9 and 10 for 2005-06.
Note: this "growth factor" is part of the underlying population characteristic that we are interested in (implied growth in achievement between ages, across the ages of enrollment).

Now, let's combine the two statements into one:

the achievement of 10-year-olds in 2005 and some growth factor in achievement among 10-year-olds between 2005 and 2006 =
the achievement of 9-year-olds in 2005 and some growth factor in achievement between the ages of 9 and 10 for 2005-06.
Without assuming any specific function here, this statement explains the relationship between cross-sectional information across ages as one that combines changes within a single age (across the period) and changes across ages (within the period). Demographers' models of population numbers and mortality are proportional, so the "and" in both cases is a multiplicative function. But one could assume an additive function, or something else entirely (a variety of functions), and the concept would still work. Once one estimates the changes within single years of age, one can accumulate those differences and, within the model, estimate the underlying achievement growth between ages, which is the critical information of interest. When the interval between test administrations is equal to the interval between the ages (four years, for NAEP long-term trends), then the additive version with linear interpolation of age-specific change measures is identical to the change between 9-year-olds in 1980 and 13-year-olds in 1984, and so on. But this method allows estimating those period-specific rates when the test dates aren't as convenient, and the exponential estimates are different.
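
Here is the additive version of that combined statement in code, using invented achievement averages on an imaginary, perfectly vertical scale (a sketch of the bookkeeping, not of any real estimation):

```python
# Invented cross-sectional averages on an imaginary perfect vertical scale
achievement = {
    2005: {9: 210.0, 10: 219.0},
    2006: {9: 212.0, 10: 222.0},
}

# Change within a single age (10-year-olds), across the period 2005-2006
period_change_age10 = achievement[2006][10] - achievement[2005][10]

# Implied growth across ages 9 -> 10 over 2005-06: the quantity of interest
age_growth_9_to_10 = achievement[2006][10] - achievement[2005][9]

# The combined statement, additive form: both sides equal A10(2006)
lhs = achievement[2005][10] + period_change_age10
rhs = achievement[2005][9] + age_growth_9_to_10
print(lhs == rhs)   # True
```

With only two cross-sections and two ages the identity is trivially true by construction; the payoff comes when many ages and periods are chained together and the within-age period changes are interpolated to recover between-age growth from inconvenient test dates.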

Of course, this assumes perfect measurement, something that I'd be very cautious of, especially given the paucity of data sets apart from the NAEP long-term trends tables. I've played around with those, and the additive and proportionate models come up with virtually identical results with national totals, assuming linear change in the age-specific growth measures (since we only have measures for 9-, 13-, and 17-year-olds).

[Figure NAEPmath.gif: NAEP long-term trend mathematics estimates. Units for the vertical axis come from the NAEP scale.]

[Figure NAEPreading.gif: NAEP long-term trend reading estimates. Changing the interpolation of age-specific growth rates to a polynomial fit doesn't change the additive model much; it shrinks the estimates of growth in the exponential model a bit but doesn't change trends. And, yes, I'm aware of the label problem: arithmetic should be additive or linear.]

There are odd results (does anyone know why the reading results were unusually high in 1992? are the results for 17-year-olds in 2004 unusually low for any reason? I was using the bridge results), and there are all sorts of caveats for this type of analysis, from the complexity of estimating standard errors for derived data to changes in the administration for students with disabilities to the comparability of the 2004 results, and I'm sure there's more. The point is that demographic methods provide some feasible tools precisely suited to looking at age-related processes, if we'd only look.

References

Alicias, E. R. Jr. (2005). Toward an objective evaluation of teacher performance: The use of variance partitioning analysis, VPA. Education Policy Analysis Archives, 13(30).

Ballou, D., Sanders, W., & Wright, P. (2004). Controlling for student background in value-added assessment of teachers. Journal of Educational and Behavioral Statistics, 29(1), 37–65.

Camilli, G. (1996). Standard errors in educational assessment: A policy analysis perspective. Education Policy Analysis Archives, 4(4).

Fisher, T. H. (1996, January). A review and analysis of the Tennessee Value-Added Assessment System. Part II. Nashville, TN: Comptroller of the Treasury.

Fuchs, L. S., Fuchs, D., Hamlett, C. L., Walz, L., & Germann, G. (1993). Formative evaluation of academic progress: How much growth can we expect? School Psychology Review, 22, 27–48.

Hochschild, J. L., & Scovronick, N. B. (2003). The American dream and the public schools. New York: Oxford University Press.

Kupermintz, H. (2003). Teacher effects and teacher effectiveness: A validity investigation of the Tennessee value-added assessment system. Educational Evaluation and Policy Analysis, 25(3), 287–298.

Linn, R. L. (2004). Accountability models. In S. H. Fuhrman & R. F. Elmore (Eds.), Redesigning accountability systems for education (pp. 73–95). New York: Teachers College Press.

Preston, S.H., Heuveline, P., & Guillot, M. (2001). Demography: Measuring and modeling population processes. Malden, MA: Blackwell Publishers.

Sanders, W. L., & Horn, S. P. (1998). Research findings from the Tennessee Value-Added Assessment System (TVAAS): Implications for educational evaluation and research. Journal of Personnel Evaluation in Education, 12(3), 247–256.

Tekwe, C. D., Carter, L. R., Ma, C., Algina, J., Lucas, M. E., Roth, J., Ariet, M., Fisher, T., & Resnick, M. B. (2004). An empirical comparison of statistical models for value-added assessment of school performance. Journal of Educational and Behavioral Statistics, 29(1), 11–35.

Update! (12/2)

Today, the Financial Times is publishing an article on the UK system of league tables, and reporter Robert Matthews cites Harvey Goldstein extensively. Thanks to Crooked Timber for the tip.

Update (12/8)

I foolishly forgot to mention a 2004 RAND publication, Evaluating Value-Added Models for Teacher Accountability, which describes the limits of growth models for accountability. Thanks to UFT's Edwize blog for pointing it out (though I have a few bones to pick with the larger post; no time for that right now...).

Update (12/13)

To his credit, Andrew Rotherham discusses two technical issues with growth models (longitudinal databases and vertical scaling of measures).