October 6, 2006

Statistical magic and record linkage

Highly recommended link on a way-cool statistical technique in record linkage: The Bristol Observatory, where Steven Banks and John Pandiani have developed probabilistic population estimation, using two data sets with just birthdates. It's not really magic but relies on a classic puzzle in probability (and an elementary one to solve, apparently).

Banks and Pandiani developed this technique to solve a serious evaluation problem with mental health programs: how do you identify who used two services, or showed up in two different places, if the two agencies cannot reveal personally identifiable information for privacy reasons?

They got around that problem by rephrasing it: the operative question for program evaluation is not who shows up in two places but how many. The first requires invading privacy to some extent. The second, not at all. Their technique requires information only about birthdates and such other nonidentifiable information as would allow them to subdivide a population for greater accuracy, but no names, addresses, phone numbers, or Social Security numbers. They don't even need to know the unduplicated birthdates. It also bypasses all the attendant problems of keeping separate databases up-to-date.
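
For the curious, here is a rough sketch of the kind of arithmetic that birthday-puzzle reasoning makes possible. It is not Banks and Pandiani's patented procedure, just the textbook occupancy calculation: if n people have birthdates spread evenly over D possible dates, you expect about D * (1 - (1 - 1/D)^n) distinct birthdates, and that can be turned around to estimate n from a count of distinct birthdates. The function names, the even-spread assumption, and the sample counts are all mine, purely for illustration, and the sketch says nothing about margins of error.

```python
import math


def estimate_people(distinct_birthdates, possible_birthdates):
    """Estimate how many people produced a given count of distinct birthdates.

    Assumes (hypothetically) that birthdates are spread evenly over
    `possible_birthdates` dates, e.g. 365 for a single birth-year cohort.
    Inverts E[k] = D * (1 - (1 - 1/D)**n) to solve for n.
    """
    k, d = distinct_birthdates, possible_birthdates
    if d < 2 or not 0 <= k < d:
        # If every possible date is taken, the estimate is unbounded.
        raise ValueError("need possible_birthdates >= 2 and 0 <= distinct < possible")
    return math.log(1 - k / d) / math.log(1 - 1 / d)


def estimate_overlap(distinct_a, distinct_b, distinct_combined, possible_birthdates):
    """Estimate how many people appear in both data sets A and B.

    `distinct_combined` is the count of distinct birthdates after pooling
    the two lists. Overlap = people in A + people in B - people in either.
    """
    n_a = estimate_people(distinct_a, possible_birthdates)
    n_b = estimate_people(distinct_b, possible_birthdates)
    n_either = estimate_people(distinct_combined, possible_birthdates)
    return n_a + n_b - n_either


if __name__ == "__main__":
    # Made-up counts for one birth-year cohort (365 possible dates).
    overlap = estimate_overlap(154, 123, 204, 365)
    print(f"Estimated people in both data sets: {overlap:.1f}")
```

Subdividing the population (by birth year, sex, and so on) and adding up the estimates keeps each cell far from the point where nearly every possible birthdate is already taken, which is where an estimate like this gets shaky.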

Is this applicable to education research and my own work? Well, suppose you want to know if a specific intervention leads kids to graduate from high school, but the local school district (or some relevant agency) won't release identifiable information.  All you need is the birthdates, sex, and maybe ethnicity of those who graduate from the district (though since ethnicity is more malleable than sex, that's a problem), and you can estimate the numbers of graduates who also came from your participants (or a segment of your participants).

Banks and Pandiani have patented their work, so someone wanting to use this specific procedure needs to work with them, but there is another technique that is similar and publicly usable. I'll post on that one after I've had a chance to absorb it. (I have a master's in demography and can read statistical explanations, but sometimes I need more time to absorb them.)

But definitely go to Banks and Pandiani's website.  And check out the video, which explains the principles of their technique!

Posted in Accountability Frankenstein on October 6, 2006 4:32 PM