A few of my favorite intro stat graphics

Posted: April 28th, 2011 | Filed under: Statistics

Intro stat is a tough class to teach. Most students don’t want to be there, are math-phobic, and/or are bad at math. I always tried to mix up the class with fun examples that students could relate to (e.g., beer and frat parties) to keep it interesting. Here are a few of my favorite graphics from the visualization section of the class.

I should note that none of these are my original graphics. I have received them in emails from several people, so I have no idea how to correctly attribute them.

Defining statistical significance?

Posted: April 26th, 2011 | Filed under: Statistics

Just saw an amazing XKCD comic, and it reminded me to write a post I had been thinking about for a while. First, the comic:

XKCD Comic on Statistical Significance

This falls under the sad-but-true category. People never publish papers or write articles saying that nothing was statistically significant; that’s not interesting. If you compute a bunch of 95% confidence intervals for effects that are truly zero, each one has a 5% chance of looking “significant” by random chance alone. Go ahead: generate 100 pairs of independent random variables and run a regression on each pair. A few of the slopes (about 5, on average) will be statistically significant, even though we know there is no real relationship because the data were generated randomly.
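Here is a minimal sketch of that experiment using only the Python standard library. The sample size (1,000 points per regression), the seed, and the function names are assumptions for illustration. To avoid needing SciPy, it uses the normal approximation to the t distribution for the critical value, which is very close when n is this large.

```python
import math
import random
from statistics import NormalDist, fmean


def slope_t_stat(x, y):
    """t-statistic for the slope in a simple linear regression of y on x."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)  # sample correlation
    # For simple regression, the slope t-test is equivalent to this test on r.
    return r * math.sqrt((n - 2) / (1 - r * r))


def count_false_positives(n_regressions=100, n_points=1000, alpha=0.05, seed=0):
    """Regress pure noise on pure noise and count 'significant' slopes."""
    rng = random.Random(seed)
    # Normal approximation to the two-sided t critical value (about 1.96).
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(n_regressions):
        x = [rng.gauss(0, 1) for _ in range(n_points)]
        y = [rng.gauss(0, 1) for _ in range(n_points)]  # independent of x
        if abs(slope_t_stat(x, y)) > critical:
            hits += 1
    return hits


print(count_false_positives())
```

The printed count varies with the seed but hovers around 5 out of 100, which is exactly the 5% false positive rate you signed up for when you chose alpha = 0.05.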

With that in mind, I’ll finally lay out the thesis of this post: confidence intervals are bullshit. This isn’t to say they aren’t useful tools, but boiling statistical significance down to a binary decision (in or out of the interval, or equivalently, a p-value smaller or larger than your alpha) is naive. At this point I should note two things:

  1. The concept of statistical significance is incredibly nuanced and not well understood by most.
  2. If you don’t understand my point about p-values and confidence intervals being equivalent you need to retake intro stat (or hire me to consult for you).
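To make the equivalence in point 2 concrete, here is a minimal sketch for the simplest case: a one-sample z-test with a known standard deviation (an assumption chosen so that only the standard library is needed). The 95% interval excludes the hypothesized mean exactly when the two-sided p-value is below 0.05. The function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_interval(xbar, sigma, n, level=0.95):
    """Two-sided z confidence interval for a mean, sigma assumed known."""
    z = STD_NORMAL.inv_cdf(1 - (1 - level) / 2)
    se = sigma / n ** 0.5
    return xbar - z * se, xbar + z * se


def z_test_p_value(xbar, sigma, n, mu0=0.0):
    """Two-sided p-value for H0: mean == mu0, sigma assumed known."""
    se = sigma / n ** 0.5
    z = (xbar - mu0) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def same_decision(xbar, sigma, n, mu0=0.0, alpha=0.05):
    """True when 'mu0 is outside the CI' and 'p < alpha' agree."""
    lo, hi = z_interval(xbar, sigma, n, level=1 - alpha)
    outside = not (lo <= mu0 <= hi)
    significant = z_test_p_value(xbar, sigma, n, mu0) < alpha
    return outside == significant


# Sample means from -1 to 1 with sigma = 1 and n = 25: the two rules agree.
print(all(same_decision(m / 100, 1.0, 25) for m in range(-100, 101)))
```

The same duality holds for t-tests and many other tests; the z-test is just the easiest case to write down without extra libraries.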

Unfortunately, the phrase “statistically significant” was given a definition by lawyers years ago, which until March 22, 2011, was effectively law. This is particularly scary because a very large fraction of lawyers have never taken (or passed) an intro stat course. Before March 22, the phrase implied statistical significance at the 95% level. The Supreme Court’s decision in Matrixx Initiatives v. Siracusano (No. 09-1156) overturned this silly convention. To understand why this matters, let’s first review the definition of a 95% confidence interval (more than half of intro stat students get this wrong on both the midterm and the final). It is a procedure for estimating an interval of values for a parameter (e.g., the proportion of people who will vote for Obama, the average height of men), such that, over repeated samples, 95% of the intervals generated will contain the true value of the parameter. It is not correct to say, “There is a 95% chance that my interval contains the true parameter.” Once computed, each interval either contains the true value or it doesn’t. You can’t be 95% pregnant; you either are or you aren’t.
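The repeated-sampling definition is easy to check by simulation. This is a minimal sketch that assumes normally distributed data with a known standard deviation; the “true” average height of 175 cm, the sigma of 7 cm, and the sample size of 50 are made-up illustration values, not real data. Each simulated interval either contains the true mean or it doesn’t, but about 95% of them do.

```python
import random
from statistics import NormalDist, fmean


def coverage_rate(true_mean=175.0, sigma=7.0, n=50, trials=10_000,
                  level=0.95, seed=1):
    """Fraction of simulated z-intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sigma / n ** 0.5
    contains = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = fmean(sample)
        # Any single interval either contains the true mean or it doesn't.
        if xbar - half_width <= true_mean <= xbar + half_width:
            contains += 1
    return contains / trials


print(coverage_rate())
```

The 95% describes the long-run behavior of the procedure, not the probability attached to any one interval you happen to compute.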

A 95% CI implies a 1-in-20 (5%) chance of a false positive when there is no real effect. This is entirely reasonable. The Supreme Court simply ruled that if the evidence for a particular side effect falls just short of that bar (say, a 1-in-19.9 chance of a false positive, which technically misses the 1-in-20 cutoff but in practice is the same), you can’t simply ignore that information or fail to report it to consumers of your product, shareholders, doctors, the FDA, etc. Under the old standard, if your drug had some horrible side effect (heart attack, stroke, death, etc.), it was okay not to report it as long as the evidence fell slightly short of this cutoff. Any reasonable person can see the problem with that, and now the Supreme Court agrees. I have no idea whether this will be followed in practice, but it’s at least a step in the right direction and a moral victory for statisticians everywhere.
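Written as false positive rates, the cutoff and the borderline case from the paragraph above compare as follows (the 19.9 figure is the post’s hypothetical example):

```latex
\alpha = 1 - 0.95 = 0.05 = \frac{1}{20},
\qquad
\frac{1}{19.9} \approx 0.0503 > 0.05
```

A difference of about 0.0003 in the false positive rate is the entire gap between “must report” and “can ignore” under a strict 95% rule.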

For those curious, more background on the court case can be found in this WSJ article.

Coachella 2011

Posted: April 22nd, 2011 | Filed under: Uncategorized

I went to Coachella for the first time this year, and it blew my mind. I’d try to describe it, but no description I’d heard came close to doing it justice. I’ll just say that if you love music, you should go. I saw the following bands:

Rural Alberta Advantage, Moving Units, Warpaint, Lauryn Hill, Interpol, Cut Copy, Beardyman, Trampled by Turtles, Freelance Whales, Foals, Two Door Cinema Club, Broken Social Scene, One Day as a Lion, Fedde Le Grand, Animal Collective, Arcade Fire, Good Old War, Eliza Doolittle, Menomena, Jack’s Mannequin, City and Colour, Jimmy Eat World, Best Coast, The National

My top 5:

  1. Arcade Fire
  2. Two Door Cinema Club
  3. Good Old War
  4. Jimmy Eat World
  5. Freelance Whales

Honorable mentions: City and Colour, Interpol, and Trampled by Turtles. Animal Collective was the only band I hated. They were horrible.

I watched 36 hours of music in 3 days, the temperature was over 100°F each day, and I walked more than 20 miles between the houses and stages. Coachella was exhausting but exhilarating. I’ve already started a mental countdown to next year’s festival.

My new laptop

Posted: April 20th, 2011 | Filed under: Uncategorized

I finally bought a new personal laptop after having the same first-generation Intel MacBook Pro since early 2006. Technically it still worked, in the sense that it turned on, but it only ran while plugged in, even with a new battery. It was time. After going back and forth between a MacBook Air and the new MacBook Pro, I ultimately ended up with the latter: the 15″ model with the high-resolution anti-glare display, because:

  1. I didn’t want to worry about an external monitor, and 13.3″ just didn’t seem big enough
  2. quad-core i7 >> Core 2 Duo
  3. anti-glare makes a huge difference
  4. I can’t really tell the difference between a 3 lb and a 5.5 lb laptop when it’s in my laptop bag

This laptop is FAST. Unfortunately, upgrading laptops is a huge pain. I made a checklist of content and programs to transfer or install, so I figured I would share it as well:

  1. Quicksilver
  2. R
  3. Terminal settings and Homebrew (because I’m so 1337)
  4. TeXShop
  5. Chrome
  6. numpy/scipy
  7. music, pictures, and some old documents, though hopefully I’ll be moving most of this to the cloud soon

Am I missing anything? Any ideas for cool laptop decorations?

Installing numpy and scipy on Snow Leopard

Posted: April 20th, 2011 | Filed under: Uncategorized

If you want to install numpy and scipy on Snow Leopard, don’t bother checking out and building the standard packages yourself. It’s frustrating and involves building gfortran and a few other dependencies by hand. I fiddled with this for about half an hour before discovering Chris Fonnesbeck’s Scipy Superpack for Mac OS X. He should be given a medal.
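Whichever way you install them, a quick sanity check is to confirm that Python can find and import both packages and to print their versions. This is a generic sketch using only the standard library, not something that ships with the Superpack; the `check_packages` helper is hypothetical.

```python
import importlib
import importlib.util


def check_packages(names=("numpy", "scipy")):
    """Map each package name to its version, or None if it isn't installed."""
    status = {}
    for name in names:
        if importlib.util.find_spec(name) is None:
            status[name] = None  # not found on sys.path
        else:
            module = importlib.import_module(name)
            # Not every module defines __version__, so fall back gracefully.
            status[name] = getattr(module, "__version__", "unknown")
    return status


print(check_packages())
```

A `None` next to either name means that interpreter can’t see the package, which usually points to an install that went into a different Python than the one you’re running.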