Stat 101 in a Blog Post

Posted: March 28th, 2011 | Filed under: Statistics

I taught Stat 111 (the equivalent of Stat 101 for the College of Arts and Sciences) at the University of Pennsylvania during the summer of 2009. Here are my lecture notes, problem sets, and practice midterm and final exams. We used JMP as our statistical software package; a 30-day free trial is normally available on the JMP site.

There are a few errors in the slides that I didn’t bother fixing, but if you are looking for another perspective on Stat 101 or more examples, this may help. We used Moore and McCabe’s Introduction to the Practice of Statistics as the textbook, but the lectures are mostly self-contained.

Lecture Notes:

  1. Lecture 1
  2. Lecture 2
  3. Lecture 3
  4. Lecture 4
  5. Lecture 5
  6. Lecture 6
  7. Lecture 7
  8. Lecture 8
  9. Lecture 9
  10. Lecture 10
  11. Lecture 11
  12. Lecture 12
  13. Lecture 13
  14. Lecture 14
  15. Lecture 15
  16. Lecture 16
  17. Lecture 17
  18. Lecture 18

Problem Sets:

  1. Homework 1
  2. Homework 2 (files: Dangers, Fuel Consumption, Fuel Economy)
  3. Homework 3
  4. Homework 4
  5. Homework 5

Midterm and Final Prep and Solutions:

  1. Sample Midterm
  2. Sample Midterm Solutions
  3. Midterm Scores
  4. Sample Final
  5. Sample Final Solutions

Playing around with mongodb

Posted: March 25th, 2011 | Filed under: mongodb

In my first post on this blog, I set a goal of filling the holes in my tech education. Goals 4 and 5 were to play around with MapReduce/Hadoop for large-scale data processing and to gain basic MySQL knowledge. I decided to squish them into one and try out MongoDB, which is deployed at companies like Shutterfly, Foursquare, Intuit, and others.

So let’s start from scratch. Download the correct distribution for your platform here and untar it (your version number may be different):

curl <download URL for your platform> > mongo.tgz
tar xzf mongo.tgz

Now make a directory for the data, open a new terminal window in the same working directory, and start the MongoDB server (mongod):

mkdir -p /data/db
./mongodb-osx-x86_64-1.8.0/bin/mongod

If you see "Error: couldn’t connect to server shell/mongo.js:79", it’s because you ran ./mongodb-osx-x86_64-1.8.0/bin/mongo (the shell), which connects to an already running server, instead of first running ./mongodb-osx-x86_64-1.8.0/bin/mongod, which starts the server.
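If you are unsure whether mongod is actually up before launching the shell, one quick check is whether anything is accepting TCP connections on MongoDB’s default port, 27017. Below is a minimal Python sketch assuming a local server on that default port; server_is_listening is a hypothetical helper, and an open port only shows that some process is listening there, not that it is mongod.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 27017, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper: 27017 is MongoDB's default port, but an open port
    only shows that *some* process is listening there.
    """
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved
        return False


if __name__ == "__main__":
    if server_is_listening():
        print("Something is listening on 27017; try the mongo shell.")
    else:
        print("Nothing on 27017; start mongod first.")
```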

In your other open terminal, start the shell:

./mongodb-osx-x86_64-1.8.0/bin/mongo

Then you are good to start creating entries (for example, the document { a : 1 }) and querying.

That’s it. Also, here is a handy MySQL-to-MongoDB conversion chart. I’m going to work on a fun and quasi-advanced stat function, but I’ll write that up in a separate post.
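A document like { a : 1 } is also the shape of a MongoDB query: it matches every stored document whose field a equals 1, roughly what SELECT * ... WHERE a = 1 does in MySQL. To make that correspondence concrete without a running server, here is a toy in-memory sketch in Python. ToyCollection and its methods are hypothetical; it handles only top-level equality plus the $gt and $lt comparison operators, and it is neither pymongo’s API nor MongoDB’s actual matching engine.

```python
from typing import Any, Dict, Iterator, List, Optional

Document = Dict[str, Any]


class ToyCollection:
    """Hypothetical in-memory stand-in for a MongoDB collection (illustration only)."""

    def __init__(self) -> None:
        self._docs: List[Document] = []

    def insert(self, doc: Document) -> None:
        # Store a shallow copy so later edits to the caller's dict don't leak in
        self._docs.append(dict(doc))

    @staticmethod
    def _matches(doc: Document, query: Document) -> bool:
        for field, cond in query.items():
            # Simplification: a missing field never matches in this sketch
            if field not in doc:
                return False
            value = doc[field]
            if isinstance(cond, dict):
                # Operator form: {"a": {"$gt": 1}} is roughly WHERE a > 1
                for op, operand in cond.items():
                    if op == "$gt":
                        ok = value > operand
                    elif op == "$lt":
                        ok = value < operand
                    else:
                        raise ValueError(f"operator not supported in this sketch: {op}")
                    if not ok:
                        return False
            elif value != cond:
                # Plain equality: {"a": 1} is roughly WHERE a = 1
                return False
        return True

    def find(self, query: Optional[Document] = None) -> Iterator[Document]:
        # An empty query matches every document, like SELECT * without a WHERE
        query = query or {}
        return (d for d in self._docs if self._matches(d, query))


coll = ToyCollection()
coll.insert({"a": 1})
coll.insert({"a": 2, "b": "x"})
print(list(coll.find({"a": 1})))           # [{'a': 1}]
print(list(coll.find({"a": {"$gt": 1}})))  # [{'a': 2, 'b': 'x'}]
```

In the real shell the same queries are passed as JavaScript objects to a collection’s find method; the sketch just mirrors the matching logic in plain Python.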

Bayesian computation is so hot right now!

Posted: March 23rd, 2011 | Filed under: Python, Statistics

For everyone really into baseball AND Bayes’ Theorem, this post is for you. I finally got around to posting the Python code implementing the algorithm described in my paper, A Point-Mass Mixture Random Effects Model for Pitching Metrics. Sabermetrics, the study of statistical patterns in baseball, is a huge mess. Everyone is proposing new metrics, and those evaluating them usually don’t have the credentials or skills to do so correctly. In this paper, we take a statistically rigorous approach and argue that a metric must (i) show a large fraction of players who differ from the league average and (ii) give high confidence about which players are not league average. We, of course, define these requirements rigorously in the paper.
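To make requirements (i) and (ii) concrete, here is a deliberately simplified sketch; it is not the model or the sampler from the paper. Assume a player’s observed deviation y from the league average is normal around a true effect theta with known variance s2, and that theta is exactly zero (league average) with probability 1 - p and drawn from N(0, tau2) otherwise. Then p plays the role of the fraction of players who differ from league average in (i), and the posterior probability that theta is not zero is the per-player confidence in (ii). With all variances known this probability has a closed form; the released code runs a sampler instead, so the function name and parameter values here are hypothetical.

```python
import math


def prob_not_league_average(y: float, s2: float, tau2: float, p: float) -> float:
    """Posterior P(theta != 0 | y) in a toy point-mass mixture model (illustration only).

    Hypothetical simplified model, all variances known:
        y | theta ~ N(theta, s2)
        theta     ~ (1 - p) * PointMass(0) + p * N(0, tau2)
    y is a player's observed deviation from the league average.
    """
    if not (0.0 < p < 1.0) or s2 <= 0.0 or tau2 <= 0.0:
        raise ValueError("need 0 < p < 1 and positive variances")
    total = tau2 + s2
    # log of N(y; 0, tau2 + s2) / N(y; 0, s2): Bayes factor for theta != 0 vs theta == 0
    log_bf = 0.5 * y * y * (1.0 / s2 - 1.0 / total) + 0.5 * math.log(s2 / total)
    log_odds = math.log(p / (1.0 - p)) + log_bf
    # Numerically stable logistic transform of the posterior log-odds
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)


# Same observed deviation, noisier metric: less confidence the player differs from average
print(prob_not_league_average(2.0, s2=1.0, tau2=4.0, p=0.5))  # about 0.69
print(prob_not_league_average(2.0, s2=4.0, tau2=4.0, p=0.5))  # about 0.48
```

The second call shows the intuition behind (ii): a metric whose sampling noise is large relative to the spread of true effects cannot confidently separate players from the league average.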

The .tar.gz contains 5 files:

  1. – the main function that runs the sampler
  2. – class and function definitions
  3. BABIP.csv – a data file for the BABIP (batting average on balls in play) metric
  4. pitching_column_info.csv – index file needed because we were doing all runs concurrently on the Wharton Grid
  5. – shell script for running the sampler under some default parameters for BABIP
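For readers new to BABIP, the commonly used formula is BABIP = (H - HR) / (AB - K - HR + SF): hits other than home runs, divided by at-bats that put the ball in play (excluding strikeouts and home runs) plus sacrifice flies. Below is a minimal Python helper; it takes the counts directly rather than assuming anything about the column layout of BABIP.csv, and the numbers in the example are made up.

```python
def babip(hits: int, home_runs: int, at_bats: int, strikeouts: int, sac_flies: int) -> float:
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF).

    For a pitcher, pass the hits, home runs, at-bats, etc. *allowed*.
    """
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


# Made-up season line: 150 H, 20 HR, 600 AB, 120 K, 5 SF  ->  130 / 465
print(babip(150, 20, 600, 120, 5))
```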

A companion manuscript, A Bayesian Variable Selection Approach to Major League Baseball Hitting Metrics, is currently under review.

I <3 Baseball

Posted: March 23rd, 2011 | Filed under: Uncategorized

It’s been five months since the last baseball game. We are all a little older, perhaps a little wiser, and I have finally gotten over the Phillies’ devastating defeat in the NLCS and the heartbreak that followed, made worse by weeks of celebration outside my apartment on Valencia Street in San Francisco. No really, I’m over it. I won’t go on about the excessive amount of PHILLIES PHEVER I have right now, but I will share this chart, which made me giggle, of which team you should cheer for.

Speaking at the Wharton Global Alumni Forum

Posted: March 20th, 2011 | Filed under: Uncategorized

Exciting news! I have been asked to speak at the Wharton Global Alumni Forum in San Francisco on June 23-24, 2011. I will be giving a talk titled Evaluating Social Search, in which I will discuss my experiences leading the evaluation of this new Google ranking feature.

The abstract:

Google has been hard at work integrating a social layer on top of search and its various properties, but not all social content is created equal. In this talk we discuss the re-launch of Social Search from a quality evaluation perspective. We briefly address the general quality evaluation framework, several difficulties introduced by social and other personalized content, and resolutions to several of these issues. Finally, we discuss a machine learning application within the Social Search launch, which matches Google profiles and third party profiles to increase coverage of this new and very successful ranking feature.

The full schedule for my session can be found here.

American Option Valuation Code

Posted: March 20th, 2011 | Filed under: Uncategorized

I finally added the American option valuation code from my Master’s thesis. The Python library provides call and put valuations, as well as most Greeks, based on the following papers:

  1. N. Ju and R. Zhong. "An Approximate Formula for Pricing American Options." Journal of Derivatives, 7(2), 1999, pp. 31-40.
  2. David S. Bunch and Herb Johnson. "The American Put Option and Its Critical Stock Price." Journal of Finance, Oct. 2000, pp. 2333-2356.
  3. S. P. Zhu. "A New Analytical-Approximation Formula for the Optimal Exercise Boundary of American Put Options." International Journal of Theoretical and Applied Finance, 9(7), 2006, pp. 1141-1177.
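The papers above give fast analytic approximations. A common way to sanity-check approximations like these is a Cox-Ross-Rubinstein binomial tree, which is slow but converges to the American price as the number of steps grows. The sketch below shows that standard benchmark; it is not code from the thesis library, the function name and parameters are hypothetical, and it assumes a non-dividend-paying stock with constant interest rate and volatility.

```python
import math


def crr_put(S: float, K: float, r: float, sigma: float, T: float,
            steps: int = 500, american: bool = True) -> float:
    """Put price on a Cox-Ross-Rubinstein binomial tree (no dividends).

    With american=True, early exercise is checked at every node.
    """
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))   # up factor
    d = 1.0 / u                           # down factor
    disc = math.exp(-r * dt)              # one-step discount factor
    q = (math.exp(r * dt) - d) / (u - d)  # risk-neutral probability of an up move
    # Payoffs at maturity; node j has j up moves
    values = [max(K - S * u**j * d**(steps - j), 0.0) for j in range(steps + 1)]
    # Roll back through the tree, overwriting values in place
    for i in range(steps - 1, -1, -1):
        for j in range(i + 1):
            cont = disc * (q * values[j + 1] + (1.0 - q) * values[j])
            if american:
                exercise = K - S * u**j * d**(i - j)
                cont = max(cont, exercise)
            values[j] = cont
    return values[0]


euro = crr_put(100.0, 100.0, 0.05, 0.2, 1.0, american=False)
amer = crr_put(100.0, 100.0, 0.05, 0.2, 1.0)
print(euro, amer, amer - euro)  # last value: the early-exercise premium
```

With american=False the tree should land close to the Black-Scholes European put (about 5.57 for these inputs), and the American value should exceed it; that gap is exactly what the analytic approximations try to capture cheaply.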

I wrote this code just after learning Python, so its readability is very poor and it is not object-oriented. Cleaning it up isn’t a high priority for me right now, but I had received a few emails requesting the code, so I wanted to put it up. The code requires my Black-Scholes Python library, which provides the standard Black-Scholes Greeks, and it also requires SciPy for some numerical integration and optimization calls. I hope you find it useful despite these limitations.
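For context on that dependency, the standard Black-Scholes European prices and a few first-order Greeks can be written with only the standard library, using math.erf for the normal CDF. This is a generic textbook sketch for a non-dividend-paying stock, not the API of the Black-Scholes library mentioned above; the function name and the returned dictionary keys are hypothetical.

```python
import math


def norm_cdf(x: float) -> float:
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x: float) -> float:
    # Standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def black_scholes(S: float, K: float, r: float, sigma: float, T: float) -> dict:
    """European call/put prices and basic Greeks on a non-dividend-paying stock."""
    sqrt_t = math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    disc_k = K * math.exp(-r * T)  # present value of the strike
    call = S * norm_cdf(d1) - disc_k * norm_cdf(d2)
    put = disc_k * norm_cdf(-d2) - S * norm_cdf(-d1)
    return {
        "call": call,
        "put": put,
        "call_delta": norm_cdf(d1),
        "put_delta": norm_cdf(d1) - 1.0,
        "gamma": norm_pdf(d1) / (S * sigma * sqrt_t),  # same for call and put
        "vega": S * norm_pdf(d1) * sqrt_t,             # per unit (not per 1%) of volatility
    }


res = black_scholes(S=100.0, K=100.0, r=0.05, sigma=0.2, T=1.0)
print(res["call"], res["put"])  # about 10.45 and 5.57
```

Put-call parity (call - put = S - K e^(-rT)) holds exactly here, which is a handy check when comparing against any other pricing code.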

Filling the holes in my tech education

Posted: March 6th, 2011 | Filed under: Uncategorized

People, especially my mom, believe that I understand all tech because I work at Google. This is false. Other than writing code in Python, R, and some C++ in graduate school (and one crappy, simple HTML GeoCities site when I was 12), my “tech savviness” is pretty limited. When the printer is broken at work, I just wait and assume it will be fixed eventually. If it’s urgent, maybe I’ll try turning the printer off and then back on again.

I’d like to change this, at least for certain areas. I’ve decided to beef up my knowledge in:

  1. Webpage design/UI with WordPress and other platforms (this post/site is a first step towards this goal)
  2. More generic web programming skills, such as CSS and JSON
  3. Social web APIs: Twitter, Facebook, LinkedIn, etc.
  4. MapReduce and Hadoop for large-scale data processing
  5. Basic MySQL knowledge

Put more simply, if I had an amazing idea for a website or business, I should be able to prototype it in a few weekends or weeks. I’d also like to increase my online presence on sites like Twitter and Quora and to speak at several conferences. Let me know if you think I’m missing anything!