November 25, 2009

My phone number is more accurate than your research

One minor irritation while I've been an editor this past half-decade has been the occasional sign that a manuscript author dumped the entire output of SPSS or SAS into a table, complete with 10 digits of specificity. Ten digits! Yes, these folks who compute regression coefficients, p-values, and R² to the tenth digit based on a sample of 157 individuals are wizards of the inferential algorithm. (My apologies for the sloppiness: 157.0000000 individuals.)

There's only one problem: my phone number is more accurate, having 11 digits. Here it is in all its glory: 1-813-974-9482. (That's my office number, incidentally, and if you're tempted to call it, you will receive a very nice recording of me pointing out that my e-mail is the better method to reach me, especially the day before Thanksgiving, though I don't mention Thanksgiving in the recording because it would be obsolete for most of the year.) I know that most North American chauvinists would think of my phone number as having only ten digits, but it really has 11. In North America, you can omit the leading 1, but that's a printing convention that's local rather than universal. Henceforth I will print my office number in all of my statistically-oriented manuscripts. And since, as we all know, he who can compute social-science numbers to the mostest digits wins, I shall be crowned Social Science Heavyweight at the next World Social Sciency Championship.

You laugh at me? Okay, here's the test: remove the last digit and see if your results stand. I'll bet they do. Then remove the last digit from my number and try to reach my office line. Ha! In my phone number, the last digit matters. Not so in your statistics.

Think about the last time you scanned a table with the results of an inferential statistical procedure. What do the digits of inferential procedural results tell you? The sign and the first nonzero digit tell you the direction of the relationship, the order of magnitude, and a rough scale within that order of magnitude. What does the second nonzero digit tell you? Generally in education research, the first digit is important, the second is close to bullshit, and given the frailties of even well-designed research, the third is far beyond bullshit. I remember some years ago when I submitted a manuscript to a journal edited by Howard Wainer, and if I recall correctly one of his editorial remarks declining the manuscript was his stance that in general no statistic should be printed with more than two nonzero digits.
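For anyone who wants to apply the two-digit rule before pasting package output into a table, here is a minimal Python sketch. The function name `round_sig` and the sample numbers are my own illustrations, not anything SPSS or SAS provides, and "two nonzero digits" is read here as two significant digits counted from the first nonzero one.

```python
from math import floor, log10


def round_sig(x: float, digits: int = 2) -> float:
    """Round x to `digits` significant digits (illustrative helper)."""
    if x == 0:
        return 0.0
    # Power of ten of the first nonzero digit, e.g. -2 for 0.0313.
    exponent = floor(log10(abs(x)))
    # Keep `digits` digits starting from that position.
    return round(x, digits - 1 - exponent)


# Hypothetical values of the kind a stats package dumps to ten digits.
raw = {
    "coefficient": 0.4731958274,
    "p_value": 0.0312847561,
    "r_squared": 0.1873920045,
}

rounded = {name: round_sig(value) for name, value in raw.items()}
print(rounded)
```

With these made-up inputs the table would show 0.47, 0.031, and 0.19, and nothing a reader could actually act on has been lost.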

That's not quite true with descriptive statistics tables, where the measures may change only in the third or fourth digit. But it is certainly true with a vast array of inferential statistical claims where authors simply don't think before dumping the results, and that extends beyond manuscripts to the printing of various official statistics. There is one purpose I can imagine to printing meaningless digits: to check for fraud. But maybe we can stop at the second or third digit, or just provide all the meaningless digits in a public archive?

Incidentally, as long as the table is formatted as a table (not with tabs, dear author), I can export to Excel and round to an appropriate number of digits. No big deal to me as an editor, which is why it's a minor irritation... just something that makes me a tad more skeptical of newly-submitted manuscripts. But the general use of Too Many Digits (TMD) is a debasement of the public use of statistics (not that it was ever that high to begin with).

Posted in The academic life on November 25, 2009 10:20 AM