From Karen to Katie: Using Baby Names to Study Cultural Evolution

Posted: July 10th, 2012 | Filed under: Statistics

I’m happy to announce that my paper From Karen to Katie: Using Baby Names to Understand Cultural Evolution has been accepted for publication in the journal Psychological Science. The article also received some press in Time magazine.

In this paper we track the evolution of baby names from 1882 through 2006. First, we demonstrate that names increase in popularity when other, phonetically similar names have been popular in previous years (the more you hear it, the more you like it). Second, we find the effects to be non-linear. In particular, over-popularity hurts adoption: names and sounds can be too popular (think of how annoying it was to have 8 people named Alex in your class). Third, we find that the strength of these relationships varies with position in the name. The effect is strongest for the first phonemes/sounds and weaker for ending phonemes/sounds (often rhyming) and internal phonemes/sounds.

Finally, we confirmed the initial study by considering the impact of hurricane names on baby name frequency the following year. Hurricane names are a good case study because they are picked ahead of time and are effectively an exogenous shock to the system. We found that increased severity (which correlates with increased mentions) yielded larger increases in the frequency of similar names the following year.

This paper was a fun exercise in Bayesian hierarchical modeling. I’ll call it a zero-inflated Poisson regression, with all the usual priors.
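For readers who haven't met the term, here is a minimal sketch of the zero-inflated Poisson likelihood on its own. It is not the model from the paper: the hierarchy, covariates, and priors are left out, and the names `zip_pmf`, `zip_log_likelihood`, `lam`, and `pi` are hypothetical, chosen only for illustration.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson.

    With probability pi the count is a structural zero; otherwise it is
    drawn from an ordinary Poisson with mean lam.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1 - pi) * poisson
    return (1 - pi) * poisson


def zip_log_likelihood(counts, lam, pi):
    """Log-likelihood of a list of observed counts under fixed lam and pi."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


if __name__ == "__main__":
    counts = [0, 0, 0, 1, 0, 3, 2, 0, 0, 4]  # made-up data with extra zeros
    print(zip_log_likelihood(counts, lam=2.0, pi=0.5))
```

In a regression setting, `lam` (and possibly `pi`) would depend on predictors, and a Bayesian treatment would put priors on those coefficients.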

We received a shout out from Wharton, which is always fun.

If you want to build models like this or read up on Bayesian Statistics, here are 3 suggested books:


Convenient Definitions of Statistical Significance

Posted: July 9th, 2012 | Filed under: Statistics

This week there has been tremendous (even excessive) buzz in the popular media about the Higgs Boson (aka The G-d Particle). Though unofficially reported back in December, it was still an exciting day for the Physics community and general science nerds like me.

What struck me most about the article was the reason for the timing of the announcement. The two teams of 3,000+ scientists attempting to experimentally “prove” the existence of this particle decided to wait until “the likelihood that their signal was a result of a chance fluctuation was less than one chance in 3.5 million.”

While this may seem a bit too cautious to some, people with a PhD in Physics are generally smarter than either of us, so we should trust them on how to define the correct standard of proof, right? Okay, does one in 1 million sound more reasonable? One in 1,000? The currently accepted standard across essentially every other academic discipline, the US legal system, pharmaceutical companies, and the FDA is 1 in 20. I’ve ranted about the definition of statistical significance before, but think of it this way. If one in every 20 things your friend told you was wrong, would you trust them? If one in every 20 medications or vitamins you took had harmful and unreported side effects, would you still take them? If one in 20 natural gas pockets that are mined leaked poison into your drinking water, would you let them drill in your backyard?
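For reference, the one in 3.5 million figure matches what particle physicists call “five sigma”: the one-sided tail probability of a normal distribution beyond five standard deviations. The 1 in 20 standard corresponds to roughly 1.96 standard deviations in a two-sided test. A quick sketch to check the arithmetic (the function name is mine, and treating the test statistic as normal is an assumption):

```python
import math


def one_sided_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    p_five_sigma = one_sided_tail(5.0)
    print(f"5 sigma, one-sided: p = {p_five_sigma:.3e}, "
          f"about 1 in {1 / p_five_sigma:,.0f}")

    p_conventional = 2 * one_sided_tail(1.96)
    print(f"1.96 sigma, two-sided: p = {p_conventional:.4f}, "
          f"about 1 in {1 / p_conventional:.0f}")
```

Seen this way, the gap between the two standards is about three standard deviations, which is what turns 1 in 20 into 1 in several million.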

While these examples are obviously artificial, they raise the question: is the 1 in 20 standard stringent enough? If scientists use a one in 3.5 million standard to announce the existence of a particle, something which honestly doesn’t directly impact any of us at all, why are we using a 1 in 20 standard for deciding which drugs are safe, something which could sicken, injure, or kill us or our loved ones? I think that’s one solid reason to reevaluate, but here are several billion others: Diet Drug Wins FDA’s Approval. Pharmaceutical companies are heavily incentivized to prove that their drugs are safe; otherwise they lose millions in research and development. Even if you believe the 1 in 20 threshold is stringent enough, it needs to be properly followed, which I’m not convinced is always the case. Not only are many drugs later taken off the market, but pharmaceutical companies are routinely fined for illegal marketing and false claims about their drugs. GlaxoSmithKline was just slapped with a several billion dollar fine, the largest of its kind.

I’m not suggesting a magical threshold for your statistical tests. In my own professional life and research, I tend to use 1 in 100 (a p-value threshold of 0.01), but it changes drastically based on the particular application. Keep the following things in mind when you are deciding your own statistical burden of proof:

  • What are the implications of a false positive? If it’s not a big deal, a less stringent threshold is probably fine (e.g. testing alternate website designs or fantasy sports predictions).
  • Consider the amount of data you have. Are you not seeing statistical significance because you don’t have much data yet, or do you have tons of data and the correlation just doesn’t seem to be there? Plotting your significance level over time can help with this, but in general it isn’t very precise. Does it look like it’s converging smoothly or is it “freaking out?”
  • Add a few random noise variables as predictors. If they are more significant than the variable you care about, be skeptical.
  • Think about how many variables you have. The more you have, the more likely one will spuriously be “significant.” Use Bonferroni (though this is almost always too conservative), Scheffé, or other intervals to adjust for the problem of “multiple comparisons.”
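To make the last point concrete, here is a small sketch of how quickly false positives pile up, and what the Bonferroni adjustment does about it. It assumes the tests are independent and that p-values are uniform when there is no real effect; the function names are hypothetical.

```python
import random


def familywise_error_rate(alpha, m):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m


def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps the overall error rate at or below alpha."""
    return alpha / m


def simulate_any_false_positive(alpha, m, trials=20000, seed=0):
    """Monte Carlo check: draw m pure-noise p-values per trial and count the
    trials in which at least one falls below alpha."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(m)) for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    alpha, m = 0.05, 20
    print("analytic:  ", familywise_error_rate(alpha, m))
    print("simulated: ", simulate_any_false_positive(alpha, m))
    print("Bonferroni per-test threshold:", bonferroni_threshold(alpha, m))
```

With 20 noise variables tested at 1 in 20, you are more likely than not to find at least one “significant” result. The Bonferroni threshold fixes that, at the cost of making real effects harder to detect.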

Happy testing!


Fibonacci Sequence as a conversion for miles and km

Posted: July 7th, 2012 | Filed under: Statistics

The Fibonacci sequence gives a good approximation for converting miles to kilometers: a Fibonacci number of miles is roughly the next Fibonacci number of kilometers.

1, 1, 2, 3, 5, 8, 13, 21, …

1 mile = 1.61 km
2 miles = 3.22 km
3 miles = 4.83 km
5 miles = 8.05 km
8 miles = 12.87 km
13 miles = 20.92 km
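If you want to check the list yourself, here is a short sketch that pairs each Fibonacci number of miles with the next Fibonacci number as the approximate kilometers, alongside the exact conversion. The function names are my own, and the table starts at 2 miles since the repeated 1s at the start of the sequence make the first pairing ambiguous.

```python
KM_PER_MILE = 1.609344  # exact definition of the international mile


def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]


def conversion_table(rows):
    """Rows of (miles, Fibonacci estimate in km, exact km rounded to 2 places)."""
    fib = fibonacci(rows + 3)
    return [
        (miles, approx_km, round(miles * KM_PER_MILE, 2))
        for miles, approx_km in zip(fib[2:], fib[3:])
    ]


if __name__ == "__main__":
    for miles, approx_km, exact_km in conversion_table(5):
        print(f"{miles:>3} miles ~ {approx_km:>3} km (exact: {exact_km} km)")
```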

You can generate the sequence on your own by starting with two 1’s and then adding together the last two numbers to get the next. The ratio between consecutive numbers in the sequence converges to the Golden Ratio, phi:

$$\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618$$

which is of course numerically very close to the actual conversion factor, 1.609344 kilometers per mile.
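Here is a quick sketch of that convergence, along with how far the Golden Ratio actually sits from the true conversion factor. The helper name `fibonacci_ratios` is hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the Golden Ratio
KM_PER_MILE = 1.609344


def fibonacci_ratios(n):
    """Ratios of consecutive Fibonacci numbers: 1/1, 2/1, 3/2, 5/3, ..."""
    a, b = 1, 1
    ratios = []
    for _ in range(n):
        ratios.append(b / a)
        a, b = b, a + b
    return ratios


if __name__ == "__main__":
    for i, ratio in enumerate(fibonacci_ratios(10), start=1):
        print(f"ratio {i:>2}: {ratio:.6f}")
    print(f"phi:      {PHI:.6f}")

    gap = (PHI - KM_PER_MILE) / KM_PER_MILE
    print(f"phi exceeds the true factor by {gap:.2%}")
```

The gap works out to a bit over half a percent, so the trick slightly overestimates kilometers as the numbers get large, which is fine for road signs and running routes.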