Convenient Definitions of Statistical Significance

Posted: July 9th, 2012 | Filed under: Statistics

This week there has been tremendous (even excessive) buzz in the popular media about the Higgs boson (aka The G-d Particle). Though the result was unofficially reported back in December, this was still an exciting day for the physics community and general science nerds like me.

What struck me most about the article was the reason for the timing of the announcement. The two teams of 3,000+ scientists attempting to experimentally “prove” the existence of this particle decided to wait until “the likelihood that their signal was a result of a chance fluctuation was less than one chance in 3.5 million.”

While this may seem a bit too cautious to some, people with a Ph.D. in physics are generally smarter than either of us, so we should trust them on how to define the correct standard of proof, right? Okay, does one in a million sound more reasonable? One in 1,000? The currently accepted standard across essentially every other academic discipline, the US legal system, pharmaceutical companies, and the FDA is 1 in 20. I’ve ranted about the definition of statistical significance before, but think of it this way. If one in every 20 things your friend told you was wrong, would you trust them? If one in every 20 medications or vitamins you took had harmful and unreported side effects, would you still take them? If one in every 20 natural gas pockets that are tapped leaked poison into your drinking water, would you let them drill in your backyard?
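These thresholds map directly onto tails of the normal distribution: the physicists’ one-in-3.5-million figure is the one-sided tail probability beyond five standard deviations (“five sigma”), while the 1 in 20 standard sits at only about 1.64 sigma. A quick stdlib-only sketch of the conversion (the function names here are my own):

```python
import math

def one_sided_p(z):
    """One-sided tail probability P(Z > z) for a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def z_for_p(p, lo=0.0, hi=10.0):
    """Invert one_sided_p numerically with a simple bisection."""
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2
        if one_sided_p(mid) > p:   # tail still too fat; need a larger z
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The particle-physics "five sigma" standard:
p5 = one_sided_p(5)
print(f"5 sigma -> p = {p5:.3g}, about 1 in {1 / p5:,.0f}")

# The common academic standard, expressed in sigmas:
print(f"p = 0.05 -> about {z_for_p(0.05):.2f} sigma")
```

Running this recovers both numbers from the post: five sigma works out to roughly one chance in 3.5 million, while 1 in 20 is barely past a sigma and a half.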

While these examples are obviously artificial, they raise the question: is the 1 in 20 standard stringent enough? If scientists use a one in 3.5 million standard to announce the existence of a particle, something which honestly doesn’t directly impact any of us at all, why are we using a 1 in 20 standard for deciding which drugs are safe, something which could sicken, injure, or kill us or our loved ones? I think that’s one solid reason to re-evaluate, but here are several billion others: Diet Drug Wins FDA’s Approval. Pharmaceutical companies are heavily incentivized to prove that their drugs are safe; otherwise they lose millions in research and development. Even if you believe the 1 in 20 threshold is stringent enough, it needs to be properly followed, which I’m not convinced is always the case. Not only are many drugs later taken off the market, but pharmaceutical companies are routinely fined for illegal marketing and false claims about their drugs. GlaxoSmithKline was just slapped with a several-billion-dollar fine, the largest of its kind.

There isn’t a magical threshold I’m suggesting for your statistical tests. In my own professional life and research I tend to use 1 in 100 (a p-value of .01), but it changes drastically based on the particular application. Keep the following things in mind when deciding your own statistical burden of proof:

  • What are the implications of a false positive? If it’s not a big deal, a less stringent threshold is probably fine (e.g., testing alternate website designs or fantasy sports predictions).
  • Consider the amount of data you have. Are you not seeing statistical significance because you don’t have much data yet, or do you have tons of data and the correlation just doesn’t seem to be there? Plotting your significance level over time can help with this, though it isn’t very precise. Does it look like it’s converging smoothly, or is it “freaking out?”
  • Add a few random noise variables as predictors. If they are more significant than the variable you care about, be skeptical.
  • Think about how many variables you have. The more you have, the more likely one will spuriously appear “significant.” Use Bonferroni (though this is almost always too aggressive), Scheffé, or other corrections to adjust for the problem of “multiple comparisons.”
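To make the “plot your significance over time” idea concrete, here’s a small simulation (pure standard library; the setup and function names are my own, not from any particular study) that tracks the running p-value of a one-sample z-test as observations stream in. With a small real effect, the trajectory is jumpy early on and then settles downward as data accumulates:

```python
import math
import random

def running_pvalues(data):
    """Two-sided z-test p-values for "mean != 0", recomputed after each
    observation using the running sample standard deviation (a reasonable
    approximation once n is moderately large)."""
    out = []
    n, total, total_sq = 0, 0.0, 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
        if n < 10:                       # too little data for a stable estimate
            continue
        mean = total / n
        var = (total_sq - n * mean * mean) / (n - 1)
        z = mean / math.sqrt(var / n)
        out.append((n, math.erfc(abs(z) / math.sqrt(2))))   # two-sided tail
    return out

random.seed(0)
stream = [random.gauss(0.2, 1.0) for _ in range(2000)]   # small true effect
pvals = running_pvalues(stream)
for n, p in pvals[::400]:
    print(f"n = {n:5d}   p = {p:.4g}")
```

In a null or very noisy setting the same plot tends to wander without converging — the “freaking out” pattern that should make you hold off on declaring anything.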
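The noise-variable check can also be sketched in a few lines. Here I compare the correlation p-value of a simulated real predictor against a pure-noise column, using a Fisher z-transform approximation for the p-value; the data, effect size, and function names are all mine, chosen only for illustration:

```python
import math
import random

def corr_pvalue(x, y):
    """Pearson correlation and an approximate two-sided p-value via the
    Fisher z-transform (reasonable for n in the hundreds)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))

random.seed(1)
n = 500
signal = [random.gauss(0, 1) for _ in range(n)]
outcome = [s * 0.3 + random.gauss(0, 1) for s in signal]   # real, weak effect
noise = [random.gauss(0, 1) for _ in range(n)]             # pure noise column

r_sig, p_sig = corr_pvalue(signal, outcome)
r_noise, p_noise = corr_pvalue(noise, outcome)
print(f"real predictor:  r = {r_sig:+.3f}, p = {p_sig:.2g}")
print(f"noise predictor: r = {r_noise:+.3f}, p = {p_noise:.2g}")
```

If a column you know is random noise comes out looking as “significant” as your predictor of interest, that’s a strong hint your model or your threshold is fooling you.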
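And the multiple-comparisons point is easy to demonstrate by simulation: run 20 tests on pure noise at the usual 1-in-20 level and at least one will come up “significant” in roughly 64% of experiments; Bonferroni’s alpha/m adjustment brings that back down to about 5%. A sketch, with simulation parameters I made up for illustration:

```python
import random

# Simulate many experiments, each running 20 independent tests on pure noise.
# With no real effects, every result below the threshold is a false positive.
random.seed(2)
m_tests, alpha, n_experiments = 20, 0.05, 10_000
bonferroni_alpha = alpha / m_tests      # Bonferroni-adjusted per-test level

plain = bonf = 0
for _ in range(n_experiments):
    pvals = [random.random() for _ in range(m_tests)]  # null p-values ~ Uniform(0, 1)
    plain += any(p < alpha for p in pvals)
    bonf += any(p < bonferroni_alpha for p in pvals)

rate_plain = plain / n_experiments
rate_bonf = bonf / n_experiments
print(f"chance of >=1 false positive at alpha = 0.05: {rate_plain:.1%}")
print(f"same, with Bonferroni (alpha/20 per test):    {rate_bonf:.1%}")
```

The 64% figure is just 1 − 0.95^20 — which is exactly why a model with dozens of predictors will almost always hand you something “significant” if you don’t adjust.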

Happy testing!
