June 21, 2006

Ed Week grad rates: GIGO for Detroit

Paul Gazzerro of S&P's School Matters data compilation service thought I was wrong in asserting that the major problem with the Ed Week Detroit graduation "rate" for 2003 was not accounting for migration. True, I used all the enrollment data from all of PK-12, not just high school,* but Gazzerro then made the error of assuming that Swanson was using the 2001-02 and 2002-03 sets from the US DOE's Common Core of Data. He wasn't. He was using 2002-03 and 2003-04. But I'm glad Gazzerro pushed me to look at the CCD Detroit data, because it shows exactly how bollixed up the Swanson method can be. Added Thursday: the main issue here is the original data. It may not be a true test of the algorithm, because the data from schools can be so unreliable. See below on procedural issues.

The Swanson formula has the following in the numerator:

Diplomas (end of year 1) * 12th-grade enrollment * 11th * 10th (all enrollments from the fall of year 2; only the diploma count comes from year 1).

In the denominator is the following:

12th enrollment (fall of year 1) * 11th * 10th * 9th (all from the fall of year 1).
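For anyone who wants to check the arithmetic, here is a minimal sketch of that ratio in Python. The function name `swanson_cpi` and its argument layout are my own shorthand, not anything from Swanson's shop, and the sample counts in the usage lines are made up purely to show the mechanics.

```python
from math import prod


def swanson_cpi(diplomas_y1, fall_y2, fall_y1):
    """Ratio of the numerator and denominator products described above.

    diplomas_y1: diplomas awarded at the end of year 1
    fall_y2:     fall enrollment in year 2 for grades 12, 11, 10
    fall_y1:     fall enrollment in year 1 for grades 12, 11, 10, 9
    """
    if len(fall_y2) != 3 or len(fall_y1) != 4:
        raise ValueError("expected 3 year-2 counts and 4 year-1 counts")
    numerator = diplomas_y1 * prod(fall_y2)
    denominator = prod(fall_y1)
    return numerator / denominator


# Hypothetical district: 100 students per grade in year 1,
# with 90 percent making it through each step.
example = swanson_cpi(90, (90, 90, 90), (100, 100, 100, 100))
print(f"{example:.2%}")
```

Because every count enters as a straight multiplier, a single inflated enrollment figure in either year moves the whole ratio.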

So for Detroit for the 2003 Swanson CPI, year 1 is 2002-03 and year 2 is 2003-04, and here are the details:

Numerator: 5,975 * 5,244 * 7,421 * 9,899 = 2,301,729,842,459,100
Denominator: 6,020 * 7,795 * 11,275 * 20,025 = 10,595,017,688,062,500

The ratio is 21.7%.

For Detroit the prior year, year 1 is 2001-02 and year 2 is 2002-03, and here are the details:

Numerator: 5,540 * 6,020 * 7,795 * 11,275 = 2,931,155,954,650,000
Denominator: 4,618 * 6,355 * 9,291 * 14,494 = 3,952,029,707,502,060

The ratio is 74.2%. How in the heck could Detroit go from 74.2% graduation to 21.7% graduation in a single year? In reality, Detroit reported a bulge in enrollment at all high-school grades in 2002-03, and the 22% rate is an artifact of that. I don't know if the bulge came from some amazing (and unbelievable) transient surge in population or just lousy record-keeping by Detroit or the state of Michigan. Update: Okay. I'm fairly certain it's bad data.
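One way to see where the collapse comes from is to regroup the same eight numbers into grade-to-grade steps, pairing each year-2 count with the year-1 grade just below it. The pairing and the names (`DETROIT`, `stage_ratios`, `cpi`) are my own framing for illustration; the product is algebraically the same ratio as above. A rough sketch:

```python
# Counts as quoted in this post. Each entry holds:
# (diplomas at end of year 1,
#  fall year-2 enrollment for grades 12, 11, 10,
#  fall year-1 enrollment for grades 12, 11, 10, 9)
DETROIT = {
    2002: (5540, (6020, 7795, 11275), (4618, 6355, 9291, 14494)),
    2003: (5975, (5244, 7421, 9899), (6020, 7795, 11275, 20025)),
}


def stage_ratios(diplomas_y1, fall_y2, fall_y1):
    """Split the overall ratio into four grade-to-grade steps."""
    g12_y1, g11_y1, g10_y1, g9_y1 = fall_y1
    g12_y2, g11_y2, g10_y2 = fall_y2
    return {
        "9th -> 10th": g10_y2 / g9_y1,
        "10th -> 11th": g11_y2 / g10_y1,
        "11th -> 12th": g12_y2 / g11_y1,
        "12th -> diploma": diplomas_y1 / g12_y1,
    }


def cpi(ratios):
    """Multiply the step ratios back together."""
    result = 1.0
    for value in ratios.values():
        result *= value
    return result


for year, counts in DETROIT.items():
    ratios = stage_ratios(*counts)
    steps = ", ".join(f"{name}: {value:.3f}" for name, value in ratios.items())
    print(f"{year}: {steps} | CPI = {cpi(ratios):.1%}")
```

Two things stand out in the step ratios. For 2002 the last step is above 1, since the 5,540 diplomas exceed the 4,618 twelfth graders reported that fall. For 2003 the 9th-to-10th step falls from about 0.78 to about 0.49, because the 20,025 ninth graders reported for 2002-03 sit in its denominator.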

But I'll repeat: the Detroit data is useless, and I'm surprised no one in Swanson's shop even took the basic step of looking at the prior year's CPIs to see if, maybe, possibly, there might be some instability in the numbers. Added Thursday: Part of this problem comes from the nature of the Common Core of Data (CCD) as an unaudited database, or rather one in which each state is responsible for correcting its own figures. Bad record-keeping by a state = bad data. Recent years of data (including 2002-03, with the should-be-infamous Detroit enrollment) are explicitly noted as preliminary by the CCD, but those are the figures that everyone uses to update CCD-based measures. In sociology and demography, you always look at time series of the raw numbers and assume that you need to smooth the data to some extent, or you're likely to be tripped up by the vagaries of administrative record-keeping. (Age-heaping, for example, is the phenomenon of people rounding their ages to the nearest 5 in areas without birth registration systems or cultural celebrations of birthdays.) Big lesson here: be wary of CCD figures. My instinct was to pile on here about the failure to accommodate migration, but I was wrong. Yes, I think there's still a problem in not adjusting for migration (and Larry Mishel and Joydeep Roy would point to the 9th-grade enrollment figures as a problem), but I may have caught the problem with Detroit only because I was sensitive to the implications there. In reality, it's a problem of bad data.
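The kind of basic look at the raw numbers I have in mind can be very simple. Here is a rough sketch that compares fall enrollment by grade across two years and flags big swings. The 15% cutoff and the names (`ENROLLMENT`, `THRESHOLD`, `flag_jumps`) are arbitrary choices of mine for illustration, not an established standard; the counts are the ones quoted in this post.

```python
# Fall enrollment by grade, as quoted in this post (CCD figures).
ENROLLMENT = {
    "2001-02": {9: 14494, 10: 9291, 11: 6355, 12: 4618},
    "2002-03": {9: 20025, 10: 11275, 11: 7795, 12: 6020},
}

THRESHOLD = 0.15  # hypothetical cutoff for a "suspicious" one-year change


def flag_jumps(before, after, threshold=THRESHOLD):
    """Return {grade: proportional change} for grades that moved too much."""
    flags = {}
    for grade in sorted(before.keys() & after.keys()):
        change = after[grade] / before[grade] - 1
        if abs(change) > threshold:
            flags[grade] = change
    return flags


flags = flag_jumps(ENROLLMENT["2001-02"], ENROLLMENT["2002-03"])
for grade, change in flags.items():
    print(f"grade {grade}: {change:+.1%}")
```

With these figures every high-school grade trips the flag, which is the bulge described above. A screen like this cannot say whether the cause is migration or record-keeping; it only says someone should look before publishing a rate.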

Update (2:30 pm Thursday): I just received an e-mail from Chris Swanson: "We're taking a closer look at Detroit and a couple other places. One of the things I would like to build into our online database is a set of flags or notes to call attention to situations like this." Good.

Update (2:50 pm Thursday): Thanks to an informant with information about Michigan, it turns out that 2002-03 was the first year of a new data-collection system, and the CCD data differ somewhat from the corrected figures released in 2005 for that year:

Grade | Enrollment reported to CCD | Enrollment reported in 2005

On the other hand, that correction only raises the 2003 CPI to 28.2% and lowers the 2002 CPI to 62.2%. There's still stuff wrong with the data. (Given that this is Detroit, as one correspondent put it, the arrival of a newly installed school CEO in 2002-03 may have led to exaggerated pupil counts.)

* — It's quite true that migration rates vary by age, and one would not want to use elementary-aged data and extrapolate to adolescents without a huge caveat. On the other hand, there is no way to separate attrition and returns from transfers out and in at the high school ages without an audit of the records. In the case of Detroit, I suspect we just have bad, unaudited data, not a migration issue. But this provides a pretty good example of how sensitive these formulas can be to misstatements of migration.

