Just saw an amazing XKCD comic and it reminded me to write a post that I had been thinking about for a while. First, the comic:
This falls under the sad but true category. Nobody publishes a paper or writes an article saying that nothing was statistically significant; that isn't interesting. But if you run a bunch of tests at the 95% level, each one has a 5% chance of coming out "significant" purely by chance. Go ahead: generate 100 pairs of random number series and run a regression on each. A few of the slopes will be statistically significant even though we know no real relationship exists, because the data were generated randomly.
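The experiment above can be sketched in a few lines of Python (the sample size, seed, and use of `scipy.stats.linregress` are my own arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Regress y on x for 100 independent pure-noise datasets;
# the true slope is zero in every single one.
false_positives = 0
for _ in range(100):
    x = rng.normal(size=30)
    y = rng.normal(size=30)  # y has no relationship to x
    result = stats.linregress(x, y)
    if result.pvalue < 0.05:  # slope looks "statistically significant"
        false_positives += 1

# With alpha = 0.05 we expect roughly 5 spurious hits out of 100.
print(false_positives)
```

Run it a few times with different seeds and the count will bounce around, but it will rarely be zero.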
With that in mind, I'll finally lay out the thesis of this post: confidence intervals are bullshit. This isn't to say they aren't useful tools, but the whole idea of boiling statistical significance down to a binary decision (inside or outside the interval, or equivalently a p-value smaller or larger than your alpha) is naive. At this point I should note two things:
- The concept of statistical significance is incredibly nuanced and not well understood by most.
- If you don’t understand my point about p-values and confidence intervals being equivalent you need to retake intro stat (or hire me to consult for you).
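For anyone who wants the equivalence made concrete, the sketch below runs a two-sided one-sample t-test and builds the matching 95% confidence interval from the same t distribution; the sample and seed are arbitrary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.3, scale=1.0, size=25)  # arbitrary sample
mu0 = 0.0  # null-hypothesis mean

# Two-sided one-sample t-test of H0: mean == mu0.
t, p = stats.ttest_1samp(x, mu0)

# 95% confidence interval for the mean, built from the same t distribution.
lo, hi = stats.t.interval(0.95, len(x) - 1, loc=x.mean(), scale=stats.sem(x))

# The two decisions always agree: p < 0.05 exactly when mu0 lies outside the CI.
print(p < 0.05, not (lo <= mu0 <= hi))
```

Whatever data you feed it, the two booleans printed at the end will match: rejecting at alpha = 0.05 and falling outside the 95% interval are the same decision.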
Unfortunately, the phrase “statistically significant” was given a definition by lawyers years ago, which until April 2, 2011, was law. This is particularly scary because a very large fraction of lawyers have never taken (or passed) an intro stat course. Before April 2, the phrase implied statistical significance at the 95% level. The Supreme Court case Matrixx Initiatives v. Siracusano (No. 09-1156) overturned this silly convention. To understand why this matters, let's first recall the definition of a 95% confidence interval (more than half of intro stat students get this wrong on both the midterm and the final). It is a procedure for estimating an interval for the value of a parameter (e.g., the proportion of people who will vote for Obama, or the average height of men), such that at least 95% of the intervals generated will contain the true value of the parameter. It is not correct to say: “There is a 95% chance that my interval contains the true parameter.” Each interval either contains the true value or it doesn't. You can't be 95% pregnant; you either are or you are not.
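The "procedure" framing can be checked by simulation: draw many samples from a distribution with a known mean, build a 95% interval from each, and count how often the intervals capture the truth. A rough sketch (the distribution, sample size, and seed are arbitrary assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
true_mean = 10.0  # known only because we control the simulation
n_sims = 1000
covered = 0

for _ in range(n_sims):
    sample = rng.normal(loc=true_mean, scale=2.0, size=40)
    # Standard t-based 95% interval for the mean.
    lo, hi = stats.t.interval(0.95, len(sample) - 1,
                              loc=sample.mean(), scale=stats.sem(sample))
    if lo <= true_mean <= hi:
        covered += 1

# Roughly 95% of the 1000 intervals should contain the true mean.
print(covered / n_sims)
```

Note that the 95% describes the long-run behavior of the procedure across repetitions; any single interval from the loop either contains 10.0 or it doesn't.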
A 95% CI implies a 20:1 ratio for false positives, which is an entirely reasonable convention. The Supreme Court simply ruled that if you estimate a 19.9:1 false-positive ratio for a particular side effect (technically less than 20:1, but in practice the same thing), you can't simply ignore that information or withhold it from consumers of your product, shareholders, doctors, the FDA, etc. Under the old interpretation, if your drug had some horrible side effect (heart attack, stroke, death, etc.), it was okay not to report it as long as the evidence fell just short of that ratio. Any reasonable person understands the problem with that, and now the Supreme Court agrees. I have no idea whether this will be followed or implemented in practice, but it's at least a step in the right direction and a moral victory for statisticians everywhere.
For those curious, more background on the court case can be found in this WSJ article.