July 30, 2010

You're telling me I can't teach everything I know in a semester?

I've been revising my plans for the upcoming fall undergrad history of ed class based on a bunch of things that have been percolating in my head this summer, including the need to recertify it for my university's gen-ed program (or at least apply for recertification), the Utah Tuning project in history and what we expect from undergraduates, some thoughts about formative assessment in history, and other items. As a result, I've tinkered with the major writing assignments and the exam item structures, linked some of the weekly work more tightly to a major writing assignment, changed how I address attendance, bit the bullet on students and laptops, and then realized I have 28 class sessions (T-Th class, so Veterans Day and Thanksgiving are gone).

Hoo, boy, that's a limit. So I decided how I'd handle some of the longer primary-source documents, tied the shorter ones I wanted everyone to read to a calendar, and looked again. Still "hoo, boy." So I created a table to sketch out each day of class, the immediate upcoming assignments (i.e., how students might look at the near-term future for a class), and the theme of readings. In went the obvious topics I address every semester, at the logical locations. This morning, I did the rest: figure out in more detail than I ever have before what I could do with each class session--the stuff that will take up a whole class, the stuff that won't and what combinations would work where, the topics where simulation/debriefing would make sense, topics for "fishbowl" discussions, topics for certain types of structured activities, and one week where I'm not sure what I'm doing in detail but I know how to set up the intellectual puzzle.

Apart from that week in mid-October where I'm a bit at loose ends, there are still plans I need to make in the ordinary course of things: how to guide students for certain activities I haven't tried before, or haven't tried with a particular topic, how to set days up (with a motivating issue/puzzle, by foreshadowing something earlier in the semester, by tying it into a major course theme, by tying it to a major assignment, etc.), and so forth. Nonetheless, this is fairly detailed. I won't follow this plan to the letter, guaranteed, but this has been a very useful planning activity, in part to guarantee that as many loose ends are tied up as possible by the end of the semester. One of the conclusions I drew from reading the draft and final state social-studies standards a few years ago is that I share the "topic-a-week" symptom of my fellow historians: I'm competent at addressing the topic du jour, and I can tie things together impromptu when it's appropriate and obvious. But taking an intro/survey course and making those connections explicit, so that the intellectual core of a class is clear and the work for students is as easy as possible (and by this I mean easy to accomplish, not easy by lowering the bar)? I needed to carve out a few days and be selfish for that.

The reward for students should be a better course. The immediate reward for me: in addition to generally liking teaching this class, I've also got something very specific to be excited about for every class.

"Pushback" week

It's almost as if Nick Anderson and Ruth Marcus worked at the same paper, because "pushback" appears to be the talking point of the week on education policy. Yesterday, Anderson reports, President Obama "pushed back" against some civil-rights groups' criticism of Race to the Top, and Marcus applauded him when the president "took the opportunity to push back." Oh, wait: they do work for the same paper. Well, at least we know that at the Post, some colleagues talk with each other, unlike the one who fired Dave Weigel last month and the other who hired him this month. Then again, the fools at the Post, Inc., appear to be management and bull-male columnists, not rank-and-file reporters.

There are four major stories that dominated national education news in the past week, at least as far as I was paying attention:

  • The drama surrounding the civil-rights group report and non-presser and the two major education speeches this week by Duncan and Obama.
  • Continuing problems in trying to attach state aid to federal bills (after the emergency war appropriations, there's the inability to break the small business aid bill, which had jobs money attached).
  • Michelle Rhee's plans to fire several hundred teachers based on the IMPACT evaluation system.
  • The New York state testing cut-score embarrassment.

Pushback was used in the Post's coverage of the first story, but I think you can say it's a theme for the week. House and Senate members are now in almost open warfare over education jobs riders to bills (possibly extending to the FMAP aid to states on Medicaid, stuck in Congress since early this year). There is debate over how many teachers Rhee is firing and how bad a system IMPACT is. And Joel Klein is twisting himself in knots trying to explain how the mistakes in proficiency rates that he used to puff up his record really aren't a problem and, uh, Lady Gaga shows how good the New York City schools are. I'm half-expecting him to talk about New York's smog swampy beauty, the East River though, doesn't it split the Park Slope from the Palisades? Someone get Bill Shatner to read Joel Klein's ratiocinations!

Some things behind the headlines that seem obvious to this historian:

  • Part of the loose (and fragile) coalition criticizing the Obama administration's turnaround policy stems from unions concerned about due process for employees and community-based organizations worried about the closure of public facilities in poor neighborhoods and the role of public employment in providing a leg up to the middle class. That's not new, and it's complicated. The civil-rights group interest in public employees can be salutary (my understanding is that Black teachers were a solid core of local NAACP chapters in the mid-20th century) but sometimes at cross-purposes with other interests: I heard informally from some observers that part of the pushback against the decentralization of Chicago schools in the late 1980s was the role of the central school bureaucracy in providing a leg up into the middle class, and the reduction of the central bureaucracy threatened those positions. Today, the invisible risk is the position of minority teachers' aides and other non-certified employees. My guess is that they've been disproportionately affected by school-system layoffs that try to hold onto classroom teachers.
  • I still don't have a clue how much test scores played a role in the firing of DC teachers, and my guess is that you don't, either. IMPACT included test scores, but you'd have to look at the details of individual employees to know whether an individual firing is a case where all the indicators (including the required five observations) pointed in the direction of an incompetent teacher or whether test scores trumped supervisory judgment for any. Normally employers have broad discretion in evaluation systems, but the failure to bargain IMPACT may put the DCPS in some jeopardy of an unfair labor practice finding. (That depends on both the structure of DC collective-bargaining law and the details of what happened with IMPACT and WTU's requests for bargaining.) Double jeopardy for Michelle Rhee: the inclusion of the pseudoscientific "learning styles" in the IMPACT observation system. My guess is that the AFT (the national affiliate for the Washington Teachers Union) can quickly get their hands on well-known psychologists to rip that to shreds for any teachers where the tipping factor was a supervisor's judgment that they didn't cater to student "learning styles."
  • Joel Klein's dancing around the cut-score fiasco in New York illustrates once again that the performative setting of cut scores is often a result of the tension between bravado and "reform testosterone," on the one hand, and politically acceptable failure and the political need to game the system, on the other. We'd like to think that cut-score setting is arbitrary in the sense of arbitration, but it's too often arbitrary in the sense of caprice and politics. Two years ago, Jennifer Jennings and I wrote a commentary for Teachers College Record ($$ required) about the dangers of trusting threshold-based proficiency percentages as opposed to central tendencies such as means and medians, with New York City as the object lesson. She's too mature for this, but I have no such reticence with the last week's revelations: nyah nyah nyah, we told you so. And from those of us who warned years ago about the fragility of growth/value-added statistics? same message.

Bottom line here for administrators: test-based measures should only be used as a case to fire teachers or administrators where they strongly point in the same direction as observation-based evaluation instruments that are developed with some common sense and with unions, and that excise crap such as learning styles.

July 26, 2010

"Opportunity to learn" revived?

As Ed Week's Michele McNeil is reporting, a coalition of civil rights groups has issued a white paper today through a (new?) organization, the National Opportunity to Learn Campaign. Last night, Diane Ravitch was tweeting her reading of the paper as a gentle but firm rebuke of the Obama administration's approach to accountability. To some extent, I think she's right: the 17-page report referred to the inappropriateness of judging schools and teachers primarily by test scores, but that was a brief reference.

For the longer and more committed passage criticizing policy prejudices towards school closures, I read the argument differently, because of the other arguments in the paper in favor of more money for early childhood education, wraparound care programs, and NCLB's public-school choice provisions and against budget cuts. And then there's the name that's a throwback to early-90s arguments in favor of opportunity to learn standards. To me, that all looks like a straightforward community-civil-rights approach more than an argument against high-stakes testing. In that context, the argument against school closure is an argument against withdrawing resources from a community institution that may be one of the few public facilities in a poor neighborhood.

That also fits with how the coalition's paper addresses Race to the Top: don't withhold resources or programs from poor children. Instead, combine formula grants with conditions. Notably, the paper states that a limited competition is acceptable, suggesting that the constituent organizations would not directly oppose Race to the Top as long as its structure does not permanently replace formula grants in ESEA. I know what others are going to say in response: we have plenty of conditions on federal funding, but the federal government almost never penalizes states for falling down on the job.

To a great extent, the politics of and posturing around education reform are all depressing to me: education reform policies are dwarfed by the state of the country's economy right now. In fact, that's a crucial part of the argument of the Broader, Bolder Approach. So you should maybe focus your efforts on the national economy right now? Or if not the national economy, maybe focus on states, where the real action is going to happen over the next few years?

I think the coalition is moving about 15 months too late, if the key movers intended to shape federal policy. It's very likely that there won't be more RTTT, there won't be ESEA reauthorization, and there won't be a heck of a lot of things that should be happening from the perspectives of a variety of people on different sides of this debate. I wish I had been wrong a month ago, but it looks more and more that I was right in predicting that David Obey's gambit last month was a stupid gamble instead. I was wrong in guessing that Obey would be frustrating George Miller, but I think I'm right on the general picture. To be clear, it's far from the biggest SNAFU of the Congressional session: that's the too-small size of the stimulus in early 2009 and the failure of the White House to nominate (or recess-appoint) enough Fed governors. But I'm still depressed, and puzzled by the strategic choices.

(One final puzzle is the group's website. The contact information is for the Schott Foundation in Massachusetts, which is consistent with the few blog entries (written by Michael Holzman) and the press-kit stuff. But there are no staff members or individuals listed on the website, just organizations. The whois entry for otlcampaign.org shows that the domain name has existed since sometime in 2009, but it's registered through a proxy, and the Internet Archive has no history of the website (blocked at the site). This is all perfectly legal, but it's odd.)  

The passive-aggressive student

I'm working this year with several thoughtful, independent-minded grad students, and this afternoon I realized that I've been quite lucky the last few years in terms of my experiences with grad students (and I hope the converse as well!). Since that hasn't always been the case, I thought I'd put down my thoughts while there wasn't anyone I was advising to whom this applied: how can an advisor explain what's not working for a grad student beyond low grades?

I've been thinking specifically about two former grad students on whose committees I sat (not as advisor). (Some details have been obscured to protect the guilty.) In one case, I had already been a bit concerned because in one of my classes, the student had turned in a paper that didn't meet my standards, and the student struggled to revise in response to quite specific feedback. So we get to comps: the student's comprehensive exam essays were largely unresponsive to the questions, though the questions were worded broadly enough to give the student a chance to show off what she or he knew about the field and how he or she could think either synthetically or critically. We (the committee) gave the student a chance to revise: the revision was nonresponsive to our concerns. A second revision?  We hemmed, hawed, and passed the student (barely). Those of you who know grad school can probably figure out what happened next: the student crapped out in the thesis phase.

The other student was someone who seemed to waver between wanting to dive into reading parts of the field she or he was deeply interested in, on the one hand, and skimming by with only the required readings of other classes/topics, on the other hand. I know of at least a few faculty who said, roughly, "If you want to be a faculty member someday (the person's stated goal), you'll need to be well-read in your field, broadly understood." Time comes for the comprehensive exam and the person's weaknesses shine through: the references are to a very small handful of readings rather than to broad areas of the literature. Because many of our college's comps are closed-book, limited-time (three days of exams, each with one question and a half-day to type an answer), I don't expect students to remember dozens and dozens of names. But maybe I have a reasonable expectation that a doctoral student will know and talk about more than three pieces of the literature per day???!!!

Okay, so these students were over the line in terms of not meeting expectations, but it wasn't entirely clear until comps that they'd screw up so badly. Ultimately, it was their choice to screw up, but I wonder if their choice might have been different with different faculty behavior. And I think part of my role as a faculty member is not only to set expectations but identify problems earlier than we sometimes do. Catching problems that late is NOT good for the student or the program. For a few years I've been thinking about it as a "damnit, listen to the faculty" issue, but that's not fair or appropriate for several reasons. What if a faculty member is totally nuts in advice? And what about conscientious dissent on a matter of intellectual controversy? So there has to be a different way of explaining where a grad student is off the tracks.

Today, I'm thinking of it as a matter of being passive-aggressive: saying "yeah, yeah, I'm listening," and then not changing behavior at all or responding substantively to feedback on papers. There are lots of reasons for grad students to be passive-aggressive, from the power dynamics at a university to our collective experiences with impersonal institutions and supervisors who are unreasonable/unreasoning. But it's generally dysfunctional both for students and for professional environments. I have no problems with students who disagree with me and revise papers to strengthen their arguments. But someone who tinkers here and there and doesn't respond substantively to feedback? That gives me a rending-garments feeling: this isn't what I put time into advising for.

So the next time I face passive-aggressive behavior from a grad student, I'm inclined to say something like the following:

The last time we talked, I gave you some specific advice. In what you sent me recently, I don't see evidence of your response to the advice. That means I don't see evidence either of your changing something to address my advice because you agreed with it or your working to demonstrate I'm wrong if you didn't. I have no problems if you can persuade me through your intellectual work that my advice was wrong, but there was a reason why I gave you advice. I hope I don't have to cite chapter and verse from those who study academics to persuade you that the way a community of scholars works best is by conversation. Being largely nonresponsive to feedback on an intellectual matter is a very effective way of telling me that you don't want to be in that community, if you had wanted to send that message. But if you don't want to send that message, you need to engage in the type of conversation that being a grad student requires.

I haven't wanted to say that for several years, and I hope I never again feel that I'm facing a passive-aggressive grad student. But my opportunities for advising doctoral students are relatively limited, and I may be off my rocker. For those who have advised far more grad students than I, does this make sense?

July 24, 2010

Firings in DC

Andy Rotherham is correct that the termination notices in the DC public schools this week included about a third of the total who had not met licensure standards, and a greater number were rated in the lowest classification in the annual evaluations. Nonetheless, what is newsworthy about the terminations is the public nature of outright firing of a chunk of teachers for nonperformance. It wasn't the firing of a third of the district teachers, but significantly less than 10%. Let's assume a similar number of those given notice of "underperformance" this year either quit or are fired next year. That would be the firing of around 13-16% of the teachers for nonperformance in two years. It's noticeable.

By itself, the number is neither good nor bad, though many will argue the point either way without additional information. I say we wait. First, we wait for the Washington Teachers Union to sort through the information to see if any teachers were fired without the five classroom observations required for the evaluations. The grievance mechanism that exists in the union contract is on procedural grounds, and here we'll see how careful Rhee's bureaucrats have been. Then, we wait to see if there are any examples of firings that don't meet a basic smell test--anyone who had won teaching awards and plaudits but was given low ratings for reasons of favoritism or obviously inappropriate application of student test scores. Either procedural errors or plausible miscarriages of justice are reasonable grounds on which the union will fight for members, and on which it has an ethical obligation to do so.

Nor is that willingness to fight for individual members inconsistent with a union's willingness to try different methods of evaluation. My chapter can and does file grievances when we think an individual's procedural rights were violated in the tenure review process. That says nothing about the standards of review. It says that we'll fight for the integrity of the review process.

July 23, 2010

A more realistic view of standards

This week I've been spending most of each day in a workshop on the Spanish Civil War for area history teachers. In it, teachers learn about the war in general and also the involvement of American volunteers for both medical services and fighting on the Republican side (what's now known as the Abraham Lincoln Brigade). We've given them a number of books and other resources. They've had a chance to hear from and ask questions of an author of several books on the war and the aftermath (Peter Carroll), read both books and a wide sampling of primary sources, and yesterday they visited Ybor City's Centro Asturiano and listened to some older Tampa residents who had both direct and vicarious experiences of the war (such as that of Aida Gonzalez). Today they worked on developing specific lessons or assignments based on what they've been learning, such as DBQ exercises for Advanced Placement classes.

For those who have run or participated in such summer workshops, this is probably familiar (with the exception of hearing from eyewitnesses or participants, in the case of workshops on the Civil War, ancient civilizations, and the like). We've had some wonderful classroom teachers as participants in the two years I've been involved, and they tell us they appreciate both the chance to learn about a subject in depth and our treatment of them as adults. I just get to tag along, except for the bit about standards. And since there's an ongoing discussion of whether the common core standards in math and reading adopted by a majority of states mean much, maybe a practical discussion might help.

I'm not a "social studies methods" specialist, but when we were planning last summer's workshop, I knew what was missing: a connection for teachers between what they were learning in the week and the new state social studies standards in high school. I think this is all that justified my presence in the workshop because when I write, "we were planning," I am using "we" in the social convention form, not in the "I earned a significant chunk of the plaudits" form. Most of the credit for this goes to Peter Carroll, the USF history chair Fraser Ottanelli, and a former area teacher who is an adjunct at USF, Robert Alicea. I see the beautiful plans (okay, they were somewhat fuzzy until things fell into place in a practical schedule) and think, "Ah! They're missing the help-the-teachers-with-bureaucracy part. I can do that." And occasionally chime in to expand discussion.

Keep in mind that the participants in the workshop were already high school history teachers, the vast majority with experience in AP classes or in an International Baccalaureate program. We don't need to tell them how to plan a year, and we'd have been a fifth wheel had we done so. Especially in AP courses (and most especially for the drink-from-the-firehose AP world history course), teachers have to manage the coverage issue very carefully, and in many cases teachers explicitly used the materials for 1-3 days last year. (One teacher regularly has after-school enrichment opportunities, where he walked students through the James Lardner papers as an extended exercise in primary sources that tell a story.) So why hand out standards lists?

Last year, there were three reasons for me to sort through the new standards, identify which ones were related to the Spanish Civil War, and then sort those by some obvious themes (the narrative within Spanish history, world context, American involvement, art and popular media use, and historical skills). First, the state had approved the standards in 2008, but there had been almost no professional development, and this was an opportunity to show teachers how the standards were written in a context where they have some use and aren't just a verbiage dump on teachers. (Teachers will know what I mean by that.) Second and most immediately, I reorganized the benchmarks so that they would help teachers generate ideas for lessons, assignments, or other ways to use the materials. In reality, I suspect I didn't need to do that much, since the primary sources and talking with eyewitnesses to history are far better inspiration than standards. Third, showing teachers how to tie a specific lesson to official state standards lets them justify doing what they think is professionally appropriate. In a large high school, an assistant principal for curriculum isn't going to push anything like a pacing calendar on teachers in most subjects, but some of them will ask what standards are met by a lesson, assignment, unit plan, etc. Giving teachers standards gives them something to put at the top of their plans as an official stamp of approval on lessons. (Well, it does if the standards make sense: the benchmarks mentioning Franco, the lead-up to World War 2, the Spanish-American War,* or the social movements coming out of the Great Depression are going to make more sense here than a benchmark on early federal history.)

This year we have some middle-school teachers, something I didn't know until Monday. So I felt horribly guilty when I realized my organized handout for high-school teachers was useless for them, except as an illustration of what high school teachers would expect from students. On top of that, the middle-school curriculum is up in the air with a legislative mandate to teach civics in seventh grade. That doesn't change anything about the middle-school standards, because civics is always going to be somewhere in social-studies standards (and is prominent in the middle-school benchmarks). But it does mean that many districts don't yet know how they're going to organize the middle-school curriculum into specific courses, though the standards provide some clear direction and emphasis on history and civics (ancient civilizations in sixth grade, U.S. history through 1877 in eighth grade, and now obviously civics in seventh grade). So teachers who had been focusing quite a bit on geography? They'll have to retool, and for now a great deal of their own initiative may seem like a waste if they'll be moving in different directions in a year or two. Yes, I've gone through the middle-school standards this week and identified a few dozen with clear connections to the Spanish Civil War, ironically more in social sciences than in history because of the topics selection for middle-grades standards. (Example: map use. Military maps in the war, historical maps as secondary sources, socially generated maps such as the map of mass grave sites and other war-related sites in Spain.) But the dynamics of "the curriculum is up in the air" are still prominent.

This is a commonplace about life on the ground with curriculum. The abstract talk about standards and alignment ignores the multiple layers that shape the taught curriculum, from idiosyncratic course expectations (e.g., the more deterministic nature of AP classes) to legislative mandates, textbook choices, the item specifications on state assessments, and the program du jour of the district that gobbles up curriculum either directly (Hillsborough County bought into the Springboard program several years ago, a decision that diverts a day or three each quarter for its mandates, if within the curriculum) or by absorbing time (by adding a tangential curricular module such as anti-drug education and forcing administrators to stuff it in some class). Curriculum mandates and pressures metastasize.

As a result of these multiple mandates and pressures, I am less persuaded than others either by the argument in favor of a common core curriculum or the philosophical or political arguments against a common core curriculum. First, the idea that something is truly a "core" that will only take up a small part of the year is pure bunkum; given the other structures of school, anything called a "core" will inevitably become "pretty close to all." And even then, there will be much slop between the formal expectation and what happens in a classroom and also what's assessed. Yet I am also unpersuaded by the argument that teachers should not have a structured curriculum, or that somehow a set of curriculum standards is evil. As I've written before, the first round of state curriculum standards was generally awful, but I don't think you could have expected them to be good, so that doesn't tell you what standards might look like, and there are now some reasonable examples of the right balance between generality and specificity. (My historical cynicism is out in force this morning.) Yes, standards advocates make a weak argument with the international comparison rationale (the claim that our chief international competitors have national standards, so we must, too), but that's not the central argument for curriculum structure. The most important arguments for some curriculum structure are (a) requiring teachers to design curriculum from scratch is cruel and unusual punishment; and (b) there are some overlapping content areas that most students would find fairly practical to get under their belt.

Why do I believe that requiring teachers to design curriculum from scratch is abusive? I'm a Ph.D., with more specialized expertise than the bulk of the American population, and I would find it extraordinarily challenging to design all of my classes from scratch every semester. I don't; and I would view it as an exploitation of junior faculty to ask a new assistant professor at a research university to prepare an entire curriculum from scratch at the same time she or he has to gin up a research program. There's one college I know that talks about creating new courses on a regular basis, Evergreen State College, but even there the courses (or "programs," as they're called at Evergreen) regularly reappear so a faculty member isn't completely designing things from scratch. And most of the faculty there are veteran teachers, and the programs are commonly cotaught by at least two faculty. For K-12? Let's just say we're putting in a whole week so teachers can design a single lesson or assignment each (and then share the fruits of the work). It is one thing to point out that many veteran teachers can design a class; it is another thing entirely to suggest that all teachers have to.

In addition, there is a legitimate argument that some overlapping content is important for students. It is an easier argument to suggest common material for a field such as math or U.S. history than in areas such as world history or English literature, which is why I'm using the term overlapping. But the point still exists that high school students are meeting some common expectations when they can correctly manipulate an algebraic expression, explain how evolution complicates medical treatment, and talk intelligently about the historical struggles of Americans to get the country to fulfill its ideals. (And for those who are wondering why I think algebra is a sensible expectation, it's less important for someone to be able to solve word problems about westbound trains from Chicago than to understand what Paul Krugman means by lower bounds for effective interest-rate policy.)

As I stated above, I'm a bit cynical about structural school reform, and I do not believe there is One True Way of constructing standards. To take U.S. history as an example, generally most state standards (and the effort by Crabtree and Nash) use a fairly common approach to periodization and important questions, and one that starts from the centrality of the nation-state. One could imagine equally legitimate approaches that focus on international context, and you can find such syllabi for college classes. But you have to construct a course around something, preferably something coherent, and the most common approach is not evil just by its being common. The practical question is how much of an overlap we truly need, understanding that every time we say X needs to be a common part of the curriculum, we're squeezing out something else.

* The U.S. victory in the Spanish-American War eliminated the bulk of the remaining Spanish empire, leaving a social and structural problem for the Spanish army at the beginning of the 20th century, stuffed as it was with a high proportion of officers and a much shrunken set of territories to control.

July 16, 2010

Gates in Tampa ... no, my daughter's school!

Two chances in one week to provide personal perspective on Gates's philanthropy. Along with a few thousand other AFT delegates, I saw Gates's speech last Saturday. Today's comment comes via the Business Week article on the Gates Foundation's education program. The article is one of the better journalistic portraits of the foundation, including historical perspective by Maris Vinovskis and some technical perspectives from Howard Wainer and Daniel Koretz. And then in the second half, the article quotes some teachers such as JoAnn Parrino and Kathy Jones. I expected the article to quote either Hillsborough superintendent MaryEllen Elia or Hillsborough Classroom Teachers Association president Jean Clements, and then suddenly the focus was on some teachers at Chamberlain High School, where my daughter graduated in the spring. Yes, she had both Parrino and Jones, as well as a few others mentioned indirectly in the article as Daniel Golden followed Hillsborough's Gates project staff into a teacher meeting at the high school.

Both teach AP social studies courses, Parrino with human geography (taken by ninth graders in Chamberlain) and economics (I forget whether it's micro or macro). Jones teaches the world and European history classes. Both have their student admirers within the school. In the article, Parrino is quoted in favor of random classroom visits, and Jones on a different topic, whether there is such a thing as a year-over-year growth measure when the class is a one-year class such as a topical social studies class. And the music teachers apparently scoffed at the notion that their competence can be measured by student performance on an end-of-semester music theory class. Most of the teachers I've met at the school are reasonably thoughtful at the least, and the article begins to touch on their perspectives and skepticism.

What is notable is that none of the discussion Golden reports is the type of "we can't be expected to do great things with poor kids" excuse that's the common straw-man argument by advocates of high stakes testing. Jones is right to be skeptical that there is any competent value-added measure for history, and the band and chorus teachers are absolutely correct that a music-theory class is an awful measure of their competence. Want to know what a Florida band or orchestra or chorus director pushes their students to perform in? Music Performance Assessments, or MPAs. These are juried festivals of school groups, and teachers in Hillsborough take them very seriously. To use music-theory paper exams instead of MPAs is a pedagogical crime. Do you think the Hillsborough High School band director should be judged by how well my son and his fellow sax players know a Neapolitan 6th, or how well they can blend in a performance of "Take the A Train"?

At some point, advocates of using student outcomes as part of teacher evaluation need to get some sense about implementation. Hillsborough is clunking along right now, and it'll need to adjust things on that part of the evaluation system. The rigid "everyone has to be evaluated in the same way even if it makes no sense" system is not viable in the long term. But it's what the mantra of "50% must be on student outcomes" will lead to unless Charlie Barone and others come out in favor of common sense in the use of student outcomes, and that includes telling their friends when they're wrong in a formulaic approach.

July 14, 2010

Fat tails and audit trails in Florida test scores

I'm starting the day behind on a bunch of things, thanks to a week at the AFT convention in Seattle and the beauteous handling of bad weather by Delta. I arrived in Tampa about 23 hours after leaving Seattle, and let's leave it at that.

So I'm a bit behind on the background to the evolving controversy over test scores in Florida. NCS Pearson was way, way late on releasing scores, and part of the reason was what Florida DOE officials called glitches in the demographic files Pearson had on students, or how test scores are tied to students and then teachers.

I have a sneaking suspicion that's also behind the controversy that's developing, as first the urban and then a bunch of other system superintendents complained that the proportion of elementary students not making adequate progress year-to-year just didn't fit with any sense of reality (on the low side). Head to the St Pete Times for the published stories and blog entries, including new complaints that the organization auditing Pearson's work is a subcontractor of Pearson, but here's the reason why I suspect the demographic files are a good starting point: Florida's "growth" measure is not the mean or median growth year-over-year on some vertical scale, nor is it a regression-based measure of deviation from some version of expected growth. Instead, it is a jerry-built dichotomous variable of whether an individual student made a particular growth benchmark in a year: yes/no.

It's been a few years since I looked at the details of this "growth" definition, but any measure based on thresholds is inherently sensitive to variability around the relevant threshold. In the case of Florida's growth measure, the vulnerability is going to be less around the construction of a particular scale at a point in an individual test because the measure depends on a student's prior-year score. So a psychometric vulnerability is going to come from two sources: the general characteristics of tests in two years, and the added variability that you get from comparing scores in two years (there's measurement error in both scores, and the measurement error when you compare the scores is going to be greater than the measurement error in either base year or following year).
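To put rough numbers on that last point, here is a minimal sketch. It assumes the errors in the two years are independent and roughly normal, which is a simplification and not a description of how Florida's scale works. The function names, the error sizes, and the benchmark are all made up for illustration.

```python
import math


def sem_of_gain(sem_prior: float, sem_current: float) -> float:
    """Standard error of a gain score (current minus prior year).

    Assumes the two years' measurement errors are independent,
    so their variances add.
    """
    return math.sqrt(sem_prior ** 2 + sem_current ** 2)


def prob_meets_benchmark(true_gain: float, benchmark: float, sem_gain: float) -> float:
    """Chance that the observed gain clears a yes/no benchmark.

    Assumes normally distributed error around the true gain.
    """
    z = (benchmark - true_gain) / sem_gain
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


if __name__ == "__main__":
    # Hypothetical scale-score errors of 30 and 40 points in the two years.
    sem = sem_of_gain(30.0, 40.0)
    print("error of the gain score:", sem)
    # A student whose true gain sits exactly on a hypothetical benchmark of 100.
    print("chance of a 'yes':", prob_meets_benchmark(100.0, 100.0, sem))
```

Under these assumptions the error in the gain is larger than the error in either year's score, and a student whose true gain sits right at the benchmark has a coin-flip chance of being counted as making progress.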

Since the two-year-variability issue has been a fact of life for this measure for a number of years, I would be surprised if that were the issue. So then the question is whether this year's fourth- or fifth-grade reading test scores have unusual distributions that would cause interesting problems at the thresholds for "making gains" for students who were low-performing in the prior year. A particularly fat tail at the low end might cause that, but that's speculation, and I suspect an obviously fat-tailed distribution would have been picked up by the main auditor, Buros.

But you can have a non-psychometric wrench in the works, because Florida's dichotomous variable is highly sensitive to one other matter: the correct matching of student test scores from year to year. If the student data files were messed up, and student scores from 2009 were matched to the incorrect student scores from 2010, you'd have all sorts of problems with growth. I strongly suspect that's what tipped off problems with the data files earlier in the spring. If the failures were general, you'd have a skewed distribution of the dichotomous growth variable as the lowest-performing students from 2009 would be the most likely to be matched (incorrectly) to higher scores in 2010 and vice versa, so the first clue would be markedly high growth indicators for 2009's low-performing students and markedly low growth indicators for 2009's high-performing students.
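A toy simulation shows the pattern that a general matching failure would produce. Everything in it is invented for illustration: the score scale, the noise levels, and especially the growth rule, which simply counts any student whose 2010 score is at least the 2009 score. That is not Florida's actual benchmark definition.

```python
import random


def simulate_growth_shares(n_students=10000, mismatch=True, seed=0):
    """Return the share of 'yes' growth flags for the bottom and top
    quarters of 2009 scorers.

    Hypothetical setup: scores are ability plus noise on an arbitrary
    scale, and 'growth' means the 2010 score is at least the 2009 score.
    """
    rng = random.Random(seed)
    ability = [rng.gauss(0.0, 1.0) for _ in range(n_students)]
    score_2009 = [a + rng.gauss(0.0, 0.3) for a in ability]
    score_2010 = [a + 0.1 + rng.gauss(0.0, 0.3) for a in ability]

    if mismatch:
        # Simulate a general matching failure: each 2009 record is
        # paired with some other student's 2010 score at random.
        rng.shuffle(score_2010)

    # Rank students by their 2009 score and take the bottom and top quarters.
    order = sorted(range(n_students), key=lambda i: score_2009[i])
    quarter = n_students // 4
    low, high = order[:quarter], order[-quarter:]

    def share(indices):
        hits = sum(score_2010[i] >= score_2009[i] for i in indices)
        return hits / len(indices)

    return share(low), share(high)


if __name__ == "__main__":
    for flag in (False, True):
        low, high = simulate_growth_shares(mismatch=flag)
        print(f"mismatch={flag}: low quarter {low:.2f}, high quarter {high:.2f}")
```

With random mismatching, the bottom quarter from 2009 shows far more "growth" than the top quarter, and the gap is much wider than with correct matching.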

But that's not what school districts are reporting: they're reporting unusually low growth proportions for low-performing students from 2009. I can think of a few different ways you'd have that after Pearson tried to correct any obvious problems it saw earlier, but that's speculation. What needs to happen is an examination of the physical artifacts from this year for a sample of schools: the booklets, the student demographic sheets, and the score sheets. We're talking about more than a million students tested, but we can start with a sample of schools that the urban-system superintendents are worried about and track the data from beginning to end with a small enough set to see exactly what happened to the satisfaction of local school officials, policymakers, and the general public.

And if Pearson destroyed all physical artifacts so you can't trace the path of data? Cue "expensive lawyer" music...

July 12, 2010

Gates speech at AFT

Originally written Saturday, July 10: I've figured out how to hang this electronic device onto the back of the chair in front of me while my old PDA foldable keyboard is synced and sitting on my lap, so I can write this blog entry in the middle of the AFT session. AFL-CIO President Richard Trumka gave a spirited speech before lunch, and then the floor approved a resolution on teacher evaluation without amendment.

This afternoon, we started with resolutions on community support and career/technical education (CTE) programs. For the most part, the resolutions this afternoon were neither going to be the controversial resolutions nor the controversial part of the afternoon session, which was Bill Gates' appearance at the convention. Very popular was a resolution urging public meetings for the national commission on fiscal responsibility and reform and giving AFT an official position in favor of progressive effective tax policy instead of Social Security benefits cuts that are regressive. As I've written before, a number of people simultaneously want policies that would end in significant layoffs of teachers over 50 and also significantly reduce pension benefits and contributions to public-employee pensions. Evidently, there is some group of self-defined reformers who are in fear that somewhere, someone is enjoying a retirement free from fear of destitution.

The Gates appearance started at 4:15. From what a colleague told me later, he helicoptered over from his island estate. Randi Weingarten at first started speaking from the sheet announcing Innovation Fund awardees and then turned to introducing Gates. She took care to quote from Gates's annual letter at points where he specified opposition to solitary use of test scores to evaluate teachers and supported evaluation as a tool to help most teachers. With a smattering of boos, Weingarten smiled and said, "I thought you guys were leaving," referring to the threats of a boycott by the small dissenting caucus By Any Means Necessary (BAMN). The majority of delegates roared. Later, there were about 25 delegates out of several thousand present who walked out as Gates stood at the podium. So much for the huge boycott of Gates's speech...

Gates started by publicly congratulating AFT for the approval of the resolution on teacher evaluation/development and on steps taken thus far, including the AFT locals who are working with the Gates Foundation on specific programs. He mixed in some misleading statements about "declining" graduation rates (as opposed to stagnation) with some fair statements and a clear statement that teachers must be included in reform. He spent a few moments discussing the failed small-schools initiative. The greatest applause lines came when Gates criticized the existing record of poor administrators' evaluations and when he acknowledged that people who have never taught in a classroom do not understand how difficult teaching can be.

The BAMN protesters then had pretty awful timing, coming back towards the hall shouting protests ... just as Gates said some teachers have challenges with students who are bored or engage in disruptive behavior. The hall erupted in laughter at the irony.

Gates's weakest argument was the individual teacher equivalent of effective-schools rhetoric: see what teachers do when students demonstrate great achievement. It's a high-risk claim, to assert that the development of a teacher evaluation system can also document which a priori behaviors are best. What may be easier is the collection of videos of different teachers, with a broad enough sample that some will turn out to be great teachers. Gates also highlighted two project districts in AFT: Hillsborough, Florida, and Pittsburgh, Pennsylvania. As is common with descriptions of risky projects in their early days, the rhetoric was a bit breathless, and I could hear a few oohs and boos in the audience when he mentioned merit pay, Race to the Top, and tying tenure to student achievement.

Gates ended with the obligatory reference to Al Shanker and the need for teacher voice in reform. "Don't give it back, take the risk, and keep it up." "No other union is doing what you are to make this [reform] happen."

Additional thoughts a few days later: Gates got some personal mileage by appearing at AFT. He spoke with a few reporters afterwards, and his appearance generated some newspaper stories at the St. Pete Times and Washington Post that were more about the Gates Foundation than the AFT convention. At AFT, I don't think delegates had their minds changed much by Gates, since they were likely to be aware of what he's done and where he agrees and disagrees with them.

Gates's rhetoric is compartmentalized. In a good part of what he said, teachers were at the center of what he describes as reform, including teacher evaluation. But then the sore-thumb statement popped out about tying due-process protections to student test scores, unmediated by professional judgment. It's as if there's a switch inside his head, where he can talk either about test scores or about better evaluation of teacher practice. Reform rhetoric as a quantum effect? I don't know. But it's poor strategizing and a poor contribution to discussion. One of the wealthiest men in the world should be able to be more sophisticated.

Brief note justifying the lack of productivity this week

I have gotten less done this week in Seattle than I expected. I did the right thing and walked around the city as much as I could when not in the AFT convention's business meetings. Since Seattle has real hills, this is healthy exercise, and I used the Pike Place Market six blocks from the convention center as an excuse to get some exercise every day. But that means that I spent at least an hour or two walking every day instead of reading, blogging, etc. I also spent a few hours last night taking a bus to the Greenwood area to watch a coffeeshop concert some acquaintances were performing in.

My shins and calves are telling me exactly how many hills I've walked up and down, and they've been doing so every night. Again, this is good. It is also absolutely exhausting. So my thoughts on Bill Gates's speech yesterday will not be posted tonight. And I'll be spending some time tomorrow reading a doctoral student's proposal rather than having it read already. And so on and so forth. I have to figure out how to handle the jet lag when I get home, but I did not eat convention-hotel food, nor was I sedentary.

July 10, 2010

Convention dynamics

This is my second AFT convention. I observed the 2006 NEA representative assembly as a microphone volunteer, and I've participated in a number of other organizational meetings of various sorts over the years. With both the NEA and AFT, there's an internal meta-procedural discussion about the openness of the meeting itself, and that happened in today's business sessions with complaints ranging from inadequate discussion on an item (and specifically, a request for more discussion that did not point any fingers) to assertions that people were engaging in tactics to control debate (and alleging poor intent as well as procedural abuse). I've read and heard different allegations about business meetings of the NEA Representative Assembly, the Modern Language Association, the American Historical Association, etc., and maybe it's time for perspective on organizational politics.

First, most organizations have structured discussions. Sometimes they're structured intentionally, as when there are formal rules, either all written out or with a baseline set of rules (commonly Robert's) and organization-specific variations. Sometimes the structures are more cultural than formal, as with consensus in Quaker circles. But you can't have an unstructured discussion of organizational policy and direction, even if you figure out what it is after the fact. And sometimes people gripe about the specific structures, even if the nature of a particular structure is a matter of an arbitrary choice that you have to take rather than the One True System for running organizations.

Picky example: who gets to speak on the floor of the AFT and NEA, and how someone is recognized. In AFT, the presiding officer rotates recognition among a set of numbered microphones. Whoever is at the front of the line for the microphone next in line is recognized. In NEA, people who wish to speak go to a microphone, complete a slip that the microphone volunteer calls in to a set of staff on the dais, and the slips recorded at the dais are passed to the presiding officer, who calls on the next person eligible in the NEA's rules rotation (giving immediate preferences to points of information or order and afterwards alternating perspectives). If you have a large enough group of people, you can greatly influence speaking (i.e., "control the mics") in either organization.

All organizations have filters on the types of proposals that are considered seriously. Sometimes the filters are at the bare-minimum "if this doesn't require us to act illegally" sort, sometimes at the "we'll have a mechanism to educate the body on what this entails," and sometimes at the "we have an explicit filter mechanism" level. My friends in MLA report that it's fairly close to the minimal filter level. My experience is that NEA has the "education filter" mechanism, which comes generally through recommendations of the executive committee and each state affiliate and this year also operated through the estimated cost figures that the resolutions committee attached to each new business item (and almost anything with a price greater than $1000 lost, from what I heard). AFT uses committees that meet the first day of the convention as a formal filter mechanism. If you're not lucky or not part of a group with a well-executed mic tactic, your best chance for influencing resolutions is before floor action, in the NEA by joining the resolutions committee or by participating actively in a state caucus, in the AFT by participating actively in one of the convention committees.

All organizations rely on the parliamentary/procedural skills of the presiding officer to let most of the time focus on substantive discussion. When people lose on a substantive issue, sometimes they want to use procedural error as a trump card, and a good presiding officer persuades the vast majority of participants that they've had their say even if they didn't get their way. That doesn't hold if there truly are internal provocateurs who have no intention of losing gracefully, but if sane people are grumbling, there is some skill missing. Reg Weaver ran the NEA RA with a great deal of humor. I haven't seen Dennis Van Roekel in action on that front (I'm not an NEA delegate for my local for this biennium), and I've heard conflicting stories about his skill as a parliamentarian. I suspect there's a backstory behind Randi Weingarten's use of a professional parliamentarian this year, and I think she's exercised some good judgment in spots and made some choices I would not have in others.

There are three magic phrases I try to keep in the back of my head when I'm running a meeting: "If there are no objections," "Let me suggest what you can do within the rules," and "I'm terribly sorry I have to stop you/ask you to hold on for a second." The first is to accomplish something probably noncontroversial--my most common use is when someone suggests "a friendly amendment," which doesn't exist in Robert's Rules, and I have to say, "Well, there's nothing called a friendly amendment in this set of rules, but does anyone object to the amendment that X proposed?" The second phrase lets someone know that the specific mechanism they want to use is inappropriate but that there's a way for them to make a proposal if they take the chair's advice. The third is the hardest to use because it really does require interruption. I'm awful at interrupting even the most tangential comments, because many of our parents raised us with the standard that interruption is rude, and what harm can a few seconds of diversion do? It's an awful choice. When more than one microphone is live, the interruption of a speaker is not only very vivid, but if the speaker refuses to let the presiding officer explain the interruption, you get more time absorbed. There were several times I thought Randi should have interrupted a speaker, but that's an after-the-fact judgment, and I've also seen hard feelings at other meetings come from pretty firm interruptions that the membership clearly thought were too rough. It's a hard line to walk.

Or maybe the floor sergeants at arms need to have some yellow cards and red cards, and if you're issued one red or two yellows, you're suspended from floor participation for a day...

Update: In his Twitter stream, Mike Klonsky claimed that AFT was a "tightly-controlled convention" because not all resolutions were debated on the floor. Aaagh. This is precisely the type of misunderstanding that people have when they view their experience in one organization as setting the norm for other organizations. NEA lets delegates send any new business item to the floor with fifty signatures out of 10,000 delegates at the convention. AFT requires all resolutions be filed by a certain date several months before the meeting and then go through an assigned committee on the first day of the convention. Every delegate is assigned to a committee, and while I don't know how many delegates get their first choice of a committee, I have both times. The AFT's rules and the limited floor time mean that not every resolution is heard on the floor. That doesn't mean that AFT is "tightly controlled" any more than NEA's rules require anarchy on the floor. If you persuade the majority of delegates at an AFT convention committee to recommend approval of a resolution and that the resolution should be a high priority, it comes to the floor.

When a resolution is not debated on the floor, it means you didn't persuade the people in the room to send it to the floor. That means you lost the parliamentary debate. In AFT, delegates don't have the right to debate every motion. Those are the rules currently operating in AFT. In NEA, you get to debate almost every motion. Just because you think the California delegation is hogging the floor doesn't mean you get to control debate; it means you didn't get yourself and others organized to file new business items before resolution sponsors from California. If you get yourself organized first, you get your new business items heard first. Those are the rules currently operating in NEA.

To those who don't like the AFT rules or don't like the NEA rules, all I can say is quit your bellyaching and persuade the majority of delegates that you're right. You want to demonstrate the susceptibility of NEA rules to silliness? Fine: get enough delegates together next year to sponsor 200 new business items filed on the first day. I bet that forces the organization to change the rules. You want to demonstrate the problems of limited debate in AFT? Fine: persuade your local or state delegation to submit so many fabulous resolutions in one area typically debated first that the convention committee that is the obvious place for all of the resolutions finishes later on the first day of the 2012 meeting than the start of the Progressive Caucus meeting that day. Incidentally, you're not going to persuade me to help you with either strategy; I'm not a believer in the One True Parliamentary Rules. I'm just pointing out that there are ways of making the point about the structure of the rules in a way that follows the rules and makes your fellow delegates see your point. Those who gripe about the putatively "tightly-controlled" AFT floor debates or the "anarchic" NEA just prefer the organizations they understand. Fine. But stop assuming that your experience is the One True Way.

July 3, 2010

Happy George day!

Today is the Day Between for those of us who live in the U.S., though I suspect for many it's more of a Carsickness holiday, or a How in the Heck Do I Light the Charcoal Again? day (though sometimes people delay that joy until the next day). July 3 is the Day Between, the day between July 2, the day the Continental Congress formally approved a resolution of independence, and July 4, its approval of the Declaration of Independence. Yeah, I know: Lexington and Concord were in April 1775, and the formalities came a year later, with the articulated argument last of all. What can I say? Maybe we're just a nation of shoot first, answer questions later.

This year, July 3 falls on a weekend here in North America. Before you let the day go by with preparations for tomorrow, give a thought to two important Georges in history and their actions associated with July 3. On July 3, 1775, George Washington took command of the Continental Army. And on July 3, 1863, Union forces under George Meade's command destroyed Pickett's Charge and ended the battle of Gettysburg.

In the case of Washington's command, he had been appointed by the Continental Congress several weeks before, in the middle of June, and it took that time for him to travel to the main rebels' army in New England. Planning a war in the 18th century was a plodding affair. The battles of Lexington and Concord had been in April, and the Continental Congress started meeting in May. The first British reinforcements arrived in late May, and the next clash was June 17, at Breed's Hill (now called the battle of Bunker Hill, I think because that was the first target of orders to help with the siege of Boston). So the first major battle of the war (and with surprisingly high British casualties) happened just the day after Washington accepted the commission of the Congress.

Meade's command of troops in July 1863 was almost accidental, since he was a replacement for Joseph Hooker and notified of his new appointment in late June, just a few days before Gettysburg. I'm not a military historian, so I'll let others judge Meade's command at Gettysburg in the context of his earlier commands of smaller groups and his post-Gettysburg career. But at least in the first week of his command of the Army of the Potomac, Meade helped save a nation.

