March 17, 2008

Grad rates of the fanciful and research kind

While Erin Dillon and Inside Higher Ed are creating mash-ups of the NCAA hoops brackets and college graduation rates, I've been working on reframing the CPI whenever I can't handle fighting fires any more. This evening I'm working with the 2,900 U.S. counties that have upper-grade enrollments reported for 2001-02 through 2005-06 and regular diplomas reported for 2002-03 through 2004-05 (the diploma counts are released with the 2003-04 through 2005-06 enrollment data, thanks to the CCD's reporting lag). That leaves out 249 counties (about 8%), including all of Alabama, Hawaii, New York, and Wisconsin, whose county reporting to the Common Core of Data had a gap in at least one variable for at least one year.
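For readers who haven't seen it, Swanson's Cumulative Promotion Index multiplies one year's grade-to-grade promotion ratios (next fall's enrollment in grade g+1 over this fall's enrollment in grade g) by the ratio of diplomas to 12th-grade enrollment: CPI_t = (D_t / E12_t) × (E12_{t+1} / E11_t) × (E11_{t+1} / E10_t) × (E10_{t+1} / E9_t). The Python sketch below computes that for one county-year and checks whether a county has the complete enrollment-and-diploma panel described above. It is only an illustration under stated assumptions: the dictionary layout, the year labels, diplomas keyed by the year they were awarded (not the file year they were reported in), and the names `swanson_cpi`, `next_year`, and `has_complete_panel` are all hypothetical, not the CCD's file format or the analysis behind this post.

```python
from typing import Dict, Iterable, Optional

# Hypothetical layout (an assumption, not the CCD file format):
# fall enrollment by grade, keyed by school year, and regular diplomas
# keyed by the school year in which they were awarded.
Enrollment = Dict[str, Dict[int, float]]  # "2003-04" -> {9: n, 10: n, 11: n, 12: n}
Diplomas = Dict[str, float]               # "2003-04" -> diplomas awarded that year

UPPER_GRADES = (9, 10, 11, 12)


def next_year(year: str) -> str:
    """Return the following school-year label, e.g. '2003-04' -> '2004-05'."""
    start = int(year[:4]) + 1
    return f"{start}-{(start + 1) % 100:02d}"


def swanson_cpi(enr: Enrollment, dipl: Diplomas, year: str) -> Optional[float]:
    """Swanson's CPI for one county and school year t.

    CPI_t = (D_t / E12_t) * (E12_{t+1} / E11_t)
            * (E11_{t+1} / E10_t) * (E10_{t+1} / E9_t)
    Returns None when a required count is missing or zero.
    """
    try:
        now, later = enr[year], enr[next_year(year)]
        ratios = [
            dipl[year] / now[12],   # diplomas per 12th grader
            later[12] / now[11],    # 11th -> 12th grade
            later[11] / now[10],    # 10th -> 11th grade
            later[10] / now[9],     # 9th -> 10th grade
        ]
    except (KeyError, ZeroDivisionError):
        return None
    cpi = 1.0
    for r in ratios:
        cpi *= r
    return cpi


def has_complete_panel(enr: Enrollment, dipl: Diplomas,
                       enr_years: Iterable[str], dipl_years: Iterable[str]) -> bool:
    """True if every required year has grades 9-12 enrollment and a diploma count."""
    return (all(y in enr and all(g in enr[y] for g in UPPER_GRADES) for y in enr_years)
            and all(y in dipl for y in dipl_years))


if __name__ == "__main__":
    # Made-up counts for one illustrative county.
    enr = {"2003-04": {9: 120, 10: 110, 11: 100, 12: 95},
           "2004-05": {9: 125, 10: 112, 11: 102, 12: 96}}
    dipl = {"2003-04": 90}
    print(swanson_cpi(enr, dipl, "2003-04"))  # about 0.787
    print(swanson_cpi(enr, dipl, "2004-05"))  # None: no 2005-06 enrollment
    print(round(249 / (2900 + 249), 3))       # 0.079, the share of counties dropped
```

Returning None rather than raising keeps a county with a reporting gap easy to filter out, which mirrors the decision to drop counties with any missing year.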

[Graph: predicted vs. estimated Swanson CPIs, U.S. counties]

r² = .84 both for the log equation (not shown) and for the back-transformed comparison of predicted vs. estimated smoothed CPIs. Ignore that incredible outlier in the upper right-hand corner. I'll hunt it down, remove it, and redo the analysis; I expect the result to be almost the same, except for a slightly lower r² and a more evenly distributed graph. Two points here:

  • Again, this is not to suggest that one can accurately estimate graduation rates from the cross-sectional formula I'm working with. Rather, the biases one should worry about with the CPI are mostly captured by whatever biases show up in the cross-sectional data.
  • This graph was produced with R, the open-source package from the R Project for Statistical Computing. Despite what others have reported, the learning curve for R is not that steep. If you have to pay for SAS or SPSS, use moderate-sized data sets, and expect to work in quantitative fields for the next few decades, switching now will probably pay off in the long run.
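The r² comparison described above (fit on the log scale, back-transform, compare predicted with estimated CPIs, then drop the outlier and refit) can be sketched in a few lines of Python. The post doesn't spell out the cross-sectional formula's predictors, so `x` here is a single hypothetical predictor; r² is computed as the squared Pearson correlation; and "outlier" is assumed to mean the point with the largest absolute residual on the log scale. The function names are illustrative, not taken from the R session that produced the graph.

```python
import math
from typing import List, Sequence, Tuple


def ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Squared Pearson correlation between observed and predicted values."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    vo = sum((o - mo) ** 2 for o in obs)
    vp = sum((p - mp) ** 2 for p in pred)
    return cov * cov / (vo * vp)


def fit_log_model(x: Sequence[float], cpi: Sequence[float]):
    """Fit log(CPI) = a + b*x (hypothetical single predictor), back-transform with exp().

    Returns (a, b, r2 on the log scale, r2 of predicted vs. estimated CPI).
    """
    log_cpi = [math.log(c) for c in cpi]
    a, b = ols(x, log_cpi)
    pred_log = [a + b * xi for xi in x]
    pred = [math.exp(p) for p in pred_log]
    return a, b, r_squared(log_cpi, pred_log), r_squared(cpi, pred)


def drop_worst_outlier(x: Sequence[float], cpi: Sequence[float]
                       ) -> Tuple[List[float], List[float], int]:
    """Remove the point with the largest absolute log-scale residual (an assumed rule)."""
    a, b, _, _ = fit_log_model(x, cpi)
    resid = [abs(math.log(c) - (a + b * xi)) for xi, c in zip(x, cpi)]
    worst = resid.index(max(resid))
    keep = [i for i in range(len(x)) if i != worst]
    return [x[i] for i in keep], [cpi[i] for i in keep], worst


if __name__ == "__main__":
    # Made-up data: CPI = exp(-1 + 0.1 * x), with one point inflated by half.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    cpi = [math.exp(-1 + 0.1 * v) for v in x]
    cpi[2] *= 1.5
    print(fit_log_model(x, cpi))
    x2, cpi2, worst = drop_worst_outlier(x, cpi)
    print(worst, fit_log_model(x2, cpi2))  # worst == 2 for this data
```

Whether dropping a point raises or lowers r² depends on where it sits: a far-out point that lies near the trend inflates r², so removing it can lower r² slightly, as expected here.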
Posted on March 17, 2008 8:21 PM