# Defining statistical significance?

**Posted:** April 26th, 2011 |

**Author:** Alex Braunstein |

**Filed under:** Statistics | 4 Comments »

Just saw an amazing XKCD comic and it reminded me to write a post that I had been thinking about for a while. First the comic:

*[xkcd comic]*

This falls under the sad but true category. People never publish papers or write articles saying that nothing was statistically significant; that’s not interesting. If you run a bunch of tests at the 95% confidence level on pure noise, each has a 5% chance of coming out “significant” due to random chance alone. Go ahead: generate 100 pairs of random series and run a regression on each. A few of the coefficients will be statistically significant, even though we know no real relationship exists, because the data were generated randomly.
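The experiment above can be sketched in a few lines. This is a hedged illustration, not anything from the original post: the dataset sizes and seed are arbitrary, and `pearsonr` is used because its p-value matches the slope test in a simple one-variable regression.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_datasets, n_points, alpha = 100, 50, 0.05

false_positives = 0
for _ in range(n_datasets):
    x = rng.standard_normal(n_points)
    y = rng.standard_normal(n_points)  # independent of x by construction
    _, p = pearsonr(x, y)              # same p-value as the slope t-test in simple regression
    if p < alpha:
        false_positives += 1

# Every true effect is exactly zero, yet roughly 5 of the 100 tests
# come out "significant" at the 5% level, varying run to run.
print(false_positives)
```

Rerunning with different seeds shifts the count, but on average about 5 of the 100 null regressions will clear the significance bar.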

With that in mind, I’ll finally lay out the thesis of this post: confidence intervals are bullshit. This isn’t to say they aren’t useful tools, but the entire idea of boiling statistical significance down to a binary decision (in or out of the interval, or equivalently a p-value larger or smaller than your alpha) is naive. At this point I should note two things:

- The concept of statistical significance is incredibly nuanced and not well understood by most.
- If you don’t understand my point about p-values and confidence intervals being equivalent you need to retake intro stat (or hire me to consult for you).

Unfortunately, the phrase “statistically significant” was given a definition by lawyers years ago, which until April 2, 2011, was effectively law. This is particularly scary because some very large fraction of lawyers have never taken (or passed) an intro stat course. Before April 2, the phrase implied statistical significance at the 95% level. The Supreme Court case Matrixx Initiatives v. Siracusano, 09-1156, overturned this silly convention. To understand why this is important, let’s first remember the definition of a 95% confidence interval (more than half of intro stat students get this wrong on both the midterm and the final). It is a procedure for estimating an interval for the value of a parameter (e.g., the proportion of people who will vote for Obama, the average height of men, etc.), where at least 95% of the intervals generated will contain the true value of the parameter. It is not correct to say: “There is a 95% chance that my interval contains the true parameter.” Each interval either contains the true value or it doesn’t. You can’t be 95% pregnant; you either are or you are not.
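The coverage definition above can be checked by simulation. A minimal sketch, with invented numbers: draw many samples from a population whose mean we know, build the usual 95% interval for each, and count how often the interval contains the truth. Each individual interval either covers or it doesn’t; only the long-run rate is 95%.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mean, sigma, n, trials = 10.0, 2.0, 40, 10_000
z = 1.96  # normal critical value for 95% confidence

covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sigma, size=n)
    half_width = z * sample.std(ddof=1) / np.sqrt(n)
    lo, hi = sample.mean() - half_width, sample.mean() + half_width
    covered += (lo <= true_mean <= hi)  # this one interval either contains 10.0 or it doesn't

print(covered / trials)  # close to 0.95
```

Note the statement being verified is about the procedure, not about any single interval: the fraction of intervals that cover 10.0 approaches 95%.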

A 95% CI implies a 20:1 ratio for false positives. This is entirely reasonable. The Supreme Court simply ruled that if you estimate a 19.9:1 ratio for a false positive of a particular side effect (which is technically less than 20:1 but in practice the same), you can’t simply ignore that info or withhold it from consumers of your product, shareholders, doctors, the FDA, etc. Under the old standard, if your drug had some horrible side effect (heart attack, stroke, death, etc.), it was okay not to report it as long as the evidence fell slightly short of this ratio. Any reasonable person understands this, and now the Supreme Court agrees. No idea whether this will be followed or implemented in practice, but it’s at least a step in the right direction and a moral victory for statisticians everywhere.
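To see how arbitrary the binary cutoff is, here is a sketch with invented numbers (not from the Matrixx case): 1,000 patients, a 0.2% background adverse-event rate, so about 2 events are expected by chance. One extra observed event flips the verdict from “not significant” to “significant,” even though the evidence barely changes.

```python
from scipy.stats import binomtest

n, background = 1000, 0.002  # hypothetical trial size and background rate

# One-sided exact test: is the observed event rate higher than background?
p_five = binomtest(5, n, background, alternative="greater").pvalue
p_six = binomtest(6, n, background, alternative="greater").pvalue

print(round(p_five, 3))  # just above 0.05: "not significant," could go unreported
print(round(p_six, 3))   # well below 0.05: "significant"
```

Five events versus six is statistically almost the same evidence, yet the 0.05 threshold treats them as categorically different outcomes.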

For those curious, more background on the court case can be found in this WSJ article.

**Comments**

I’m surprised you don’t advocate a Bayesian viewpoint here. That’s basically what people are slipping into when they misinterpret confidence intervals.

The fact that there is a definite true answer but one that’s hidden from you seems to drive home the point that the uncertainty is due to gaps in your knowledge.

That legal business is weird — good to know about, thanks — but it seems like the abuse of 0.05 alpha is basically universal.

Like David, I’m surprised you didn’t mention the Bayesian perspective here. Given the title of your dissertation, I would expect you not to take the frequentist hard line.

You said: “It is not correct to say: ‘There is a 95% chance that my interval contains the true parameter.’ Each interval either contains the true value or it doesn’t.”

What if I generate millions of confidence intervals based on independent samples of data and then give you one of the CIs at random? Surely then you’d say that there is a 95% chance that your interval contains the true parameter. That’s essentially what we’re dealing with when we look at all of science and try to determine which results are real.

Also, I think the point of the xkcd comic was to point out the need for doing meta-analyses and using corrections for multiple tests, such as the Bonferroni correction.
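A minimal sketch of the Bonferroni correction this comment mentions, with invented p-values: when running m tests, compare each p-value to alpha/m instead of alpha, so the chance of any false positive across all m tests stays at most alpha.

```python
alpha, m = 0.05, 20                       # e.g. 20 tests run in one experiment
p_values = [0.002, 0.01, 0.04, 0.20]      # hypothetical p-values from those tests

naive = [p for p in p_values if p < alpha]           # uncorrected: 3 "discoveries"
bonferroni = [p for p in p_values if p < alpha / m]  # corrected threshold is 0.0025

print(naive)       # [0.002, 0.01, 0.04]
print(bonferroni)  # [0.002]
```

Bonferroni is the bluntest of the multiple-testing corrections (it can be conservative when tests are correlated), but it directly addresses the run-100-regressions problem from the post.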

[…] discipline, the US legal system, pharmaceutical companies, and the FDA is 1 in 20. I’ve ranted about the definition of statistical significance before, but think of it this way. If one in every 20 things your friend told you was wrong, would […]