How to hire a data scientist or statistician

Posted: August 9th, 2011 | Filed under: R, Statistics | 7 Comments »

In April, I interviewed with Chomp, Bing, Facebook, Foursquare, Zynga, and several other companies. Each company repeatedly expressed the difficulty of finding qualified candidates in the areas of data science and statistics. While everyone who interviewed me was incredibly talented and passionate about their work, few knew how to correctly interview a data scientist or statistician.

Hilary Mason (@hmason), chief scientist at bitly, created an excellent graphic describing where data science sits (though I think math should be replaced by statistics).

I am obviously biased with respect to the importance of statistics given my education, though other people seem to agree with me. During interviews, we tend to ask either questions that play to our individual strengths or brainteasers. Though easier, this approach is fundamentally wrong. Interviewers should outline the skills required for the role and ask questions that ensure the candidate possesses all the necessary qualifications. If your interview questions don’t have a specific goal in mind, they are shitty interview questions. This means, by definition, that most brainteaser and probability questions are shitty interview questions.

A data scientist or statistician should be able to:

  • pull data from, create, and understand SQL and NoSQL databases (and the relative advantages of each)
  • understand and construct a good regression
  • write their own map/reduce
  • understand CART, boosting, and Random Forests (maybe SVMs) and fit them in R or some other open source implementation
  • take a project from start to finish, without the help of an engineer, and create actionable recommendations

Good Interview Questions

Below are a few of the interview questions I’ve heard or used over the past few years. Each has a very specific goal in mind, which I enumerate in the answer. Some are short and very easy; some are very long and can be quite difficult.

Q: How would you calculate the variance of the columns of a matrix (called mat) in R without using for loops?

A: This question establishes familiarity with R by indirectly asking about one of the biggest flaws of the language. If the candidate has used it for any non-trivial application, they will know the apply function and will bitch about the slowness of for loops in R. The solution is:

apply(mat, 2, var)  # MARGIN = 2 applies var to each column

Q: Suppose you have a .csv file with two columns, the first containing first names and the second containing last names. Write some code to create a .csv file with last names as the first column and first names as the second column.

A: You should know basic cat, awk, grep, sed, etc.

cat names.csv | awk -F',' '{print $2","$1}' > flipped_names.csv

Q: Explain map/reduce and then write a simple one in your favorite programming language.

A: This establishes familiarity with map/reduce. See my previous blog post.
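
For what it’s worth, here is a minimal sketch of the idea in R, using the built-in Map and Reduce functions on the canonical word count example (a real implementation would distribute the map and reduce steps across many machines):

# Toy map/reduce word count with R's built-in Map and Reduce
docs <- c("the cat sat", "the cat ran", "the dog sat")

# Map step: emit a named vector of (word, count) pairs for each document
mapped <- Map(function(doc) c(table(strsplit(doc, " ")[[1]])), docs)

# Reduce step: merge two count vectors, summing counts for shared words
reduce_counts <- function(a, b) {
  words <- union(names(a), names(b))
  sapply(words, function(w) sum(a[w], b[w], na.rm = TRUE))
}
Reduce(reduce_counts, mapped)  # counts across all docs: the=3, cat=2, sat=2, ran=1, dog=1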

Q: Suppose you are Google and want to estimate the click through rates (CTR) on your ads. You have 1000 queries, each of which has been issued 1000 times. Each query shows 10 ads and all ads are unique. Estimate the CTR for each ad.

A: This is my favorite interview question for a statistician. It doesn’t tackle one specific area, but gets at the depth of statistical knowledge they possess. Only good candidates receive this question. The candidate should immediately recognize this as a binomial trial, so the maximum likelihood estimator of the CTR is simply (# clicks)/(# impressions). This question is easily followed up by mentioning that click through rates are empirically very low, so this will estimate many CTRs at 0, which doesn’t really make sense. The candidate should then suggest altering the estimate by adding pseudo counts: (# clicks + 2)/(# impressions + 4). This is called the Wilson estimator and shrinks your estimate towards .5. Empirically, this does much better than the MLE. You should then ask if this can be interpreted in the context of Bayesian priors, to which they should respond, “Yes, this is equivalent to a prior of beta(2,2), which is the conjugate prior for the binomial distribution.”
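
A quick check in R makes that last equivalence concrete: with a Beta(2,2) prior and k clicks in n impressions, the posterior is Beta(2 + k, 2 + n - k), whose mean is exactly the pseudo-count estimate (the numbers below are made up):

clicks <- 3; impressions <- 1000
clicks / impressions                  # MLE: 0.003
(clicks + 2) / (impressions + 4)      # pseudo-count estimate: ~0.00498
(2 + clicks) / (2 + 2 + impressions)  # posterior mean under Beta(2,2): identical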

The discussion can be led multiple places from here. You can discuss: a) other shrinkage estimators (this is an actual term in Statistics, not a Seinfeld reference; see Stein estimators for further reading), b) pooling results from similar queries, c) use of covariates (position, ad text, query length, etc.) to assist in prediction, and d) methods for prediction: logistic regression, more complicated ML models, etc. A strong candidate can talk about this problem for at least 15 to 20 minutes.

Q: Suppose you run a regression with 10 variables and 1 is significant at the 95% level. Suppose you then find 10% of the data had been left out randomly and had their y values deleted. How would you predict their y values?

A: I would be very careful about doing this unless it’s sensationally predictive. If one generates 10 variables of random noise and regresses them against white noise, there is a ~40% chance (1 - .95^10 ≈ .40) that at least one will be significant at a 95% confidence level. This question helps me understand whether the candidate truly understands regression. I also usually ask about regression diagnostics and assumptions.
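
A quick simulation (my own sketch, not part of the question) confirms the figure:

set.seed(1)
hits <- replicate(1000, {
  y <- rnorm(100)                              # white noise response
  X <- matrix(rnorm(100 * 10), ncol = 10)      # 10 columns of pure noise
  p <- summary(lm(y ~ X))$coefficients[-1, 4]  # slope p-values, intercept dropped
  any(p < 0.05)
})
mean(hits)  # ~0.40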

Q: Suppose you have the option to go into one of two bank branches. Branch one has 10 tellers, each with a separate queue of 10 customers; branch two has 10 tellers sharing one queue of 100 customers. Which do you choose?

A: This question establishes familiarity with a wide range of basic stat concepts: mean, variance, waiting times, central limit theorem, and the ability to model and then analyze a real world situation. Both options have the same mean wait time. The latter option has smaller variance, because you are averaging the wait times of 100 individuals before you rather than 10. One can fairly argue about utility functions and the merits of risk seeking behavior over risk averse behavior, but I’d go for same mean with smaller variance (think about how maddening it is when another line at the grocery store is faster than your own).
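
Here is a back-of-the-envelope simulation of my mental model of the two branches, assuming exponential service times and approximating your shared-queue wait as the total work ahead of you split evenly across the 10 tellers (both assumptions mine, ignoring real queueing dynamics):

set.seed(1)
separate <- replicate(10000, sum(rexp(10)))        # 10 services at your own teller
shared   <- replicate(10000, sum(rexp(100)) / 10)  # 100 services split across 10 tellers
c(mean(separate), mean(shared))  # both ~10
c(var(separate), var(shared))    # ~10 vs ~1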

Q: Explain how a Random Forest differs from a normal regression tree.

A: This question establishes familiarity with two popular ML algorithms. “Normal” regression trees have some splitting rule based on the decrease in mean squared error or some other measure of error or misclassification. The tree grows until the next split decreases error by less than some threshold. This often leads to overfitting, and trees fit on data sets with large numbers of variables can completely leave out many of those variables. Random Forests are an ensemble of fully grown trees. For each tree, a subsample of the variables and a bootstrap sample of the data are taken, a tree is fit, and the trees are then averaged together. Generally this prevents overfitting and allows all variables to “shine.” If the candidate is familiar with Random Forests, they should also know about partial dependence plots and variable importance plots. I generally ask this question of candidates who I fear may not be up to speed with modern techniques. (Some implementations do not grow trees fully, but the original implementation of Random Forests does.)
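
If you want to poke at these plots yourself, here is a minimal sketch using the randomForest package (assuming it and MASS are installed; the Boston housing data is just a convenient stand-in):

library(randomForest)
library(MASS)  # Boston housing data

# Ensemble of trees, each fit to a bootstrap sample with random variable subsets
fit <- randomForest(medv ~ ., data = Boston, importance = TRUE)

varImpPlot(fit)                  # variable importance plot
partialPlot(fit, Boston, lstat)  # partial dependence on one predictor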

Bad Interview Questions

The following are probability and intro stat questions that are not appropriate for a data scientist or statistician role. Candidates should have learned this material in intro statistics. Asking them is like asking an engineering candidate for the complexity of binary search (O(log n)).

Q: Suppose you are playing a dice game; you roll a single die, then are given the option to re-roll a single time after observing the outcome. What is the expected value of the game, assuming you play optimally?

A: The expected value of a dice roll is 3.5 = (1+2+3+4+5+6)/6, so you should opt to re-roll only if the initial roll is a 1, 2, or 3. If you re-roll (which occurs with probability .5), the expected value of that roll is 3.5, so the expected value is:

4 * 1/6 + 5 * 1/6 + 6 * 1/6 + 3.5 * .5 = 4.25
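
A one-line sanity check in R:

sum(4:6) / 6 + 0.5 * 3.5  # 4.25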

Q: Suppose you have two variables, X and Y, each with standard deviation 1. Then, X + Y has standard deviation 2. If instead X and Y had standard deviations of 3 and 4, what is the standard deviation of X + Y?

A: Variances are additive (for independent variables); standard deviations are not. The first example was a trick! Assuming X and Y are independent, sd(X+Y) = sqrt(Var(X+Y)) = sqrt(Var(X) + Var(Y)) = sqrt(sd(X)*sd(X) + sd(Y)*sd(Y)) = sqrt(3*3 + 4*4) = 5.

A few closing notes

Don’t ask anything about traversing a tree or graph structure that you learned in your algorithms class. This is a question for a software engineer, not a data scientist or statistician. If you are a software engineer interviewing a data scientist, ask your data scientist friends for questions beforehand. I do this when I interview software engineers, and it’s a much better experience for everyone involved. If you don’t know any data scientists, feel free to steal these or email me for more. Finally, I’d love to hear about your favorite interview questions, worst interview experiences, or anything else related to this topic.


June App Search Analytics

Posted: July 28th, 2011 | Filed under: Chomp, Statistics | No Comments »

Each month I create an App Search Analytics report for Chomp. You can read the full details on the Chomp blog, but I wanted to quickly mention some of the monthly highlights here.

June App Search Analytics Highlights

Indirect Content Privacy Surveys: Measuring Privacy Without Asking About It

Posted: June 25th, 2011 | Filed under: Google, Privacy, Statistics | No Comments »

Awesome news! My most recent publication, Indirect Content Privacy Surveys: Measuring Privacy Without Asking About It, became a featured publication on the Google Research Homepage. In case they take it down, I have included screenshots of the homepage (the paper is in the bottom right-hand corner):

and the tweet announcing the post:

This is exciting for a couple of reasons. First, recognition by Google is great validation of the importance of the work. Googlers publish tons of papers, and it’s a great honor to have mine showcased on the front page of the research blog. Second, should I ever want to return to academia, this paper now adds much more academic “street cred.” Finally, a press piece about the article was written by Thomas Claburn (@ThomasClaburn). This is the first time anything I’ve written has received any sort of non-academic press coverage. His article can be found here. Thomas did contact me for comment several hours before the article went out, but my coauthors and I were unable to run things through the necessary PR people in time.

I’m not really going to comment on his article, because I am no longer at Google, don’t want to take on the role of spokesperson, and technically anything I say on the subject should go through the Google press/PR people. I’ll simply say that understanding your user is key to making ANY good product. Laura, Jessica, and I didn’t write this paper or conduct this research with any specific agenda or to right any wrong. We wanted to understand how users feel about and share their content, so we asked. Interesting patterns in their responses emerged, so we investigated and reported our findings. That’s it.

Though I have a disclaimer in my “about” section, I want to again emphasize that all opinions expressed in this post are strictly my own. In particular, they do not reflect those of any past, present, or future employer, especially Google.


Update: Wharton Global Alumni Forum 2011

Posted: June 19th, 2011 | Filed under: Chomp, Statistics | No Comments »

Update! The subject of my talk at the Wharton Global Alumni Forum in San Francisco June 23-24, 2011, has changed. The new title and abstract can be found below.

Searching by Function: App Search v. Web Search

The age of apps has officially arrived. There are more than 500,000 apps for iOS and nearly 300,000 on Android. Last year alone 8 billion apps were downloaded, and nearly that many have been downloaded this year. Growth in this multi-billion dollar industry continues to accelerate. Navigating this massive landscape has become a real problem for smartphone users, because traditional keyword-based web search algorithms perform poorly on apps: consumers search by app function, which requires a very different approach to ranking. In this talk we dissect current issues with app search and discuss several solutions implemented at Chomp.


Disregard anything said by the International Air Transport Association

Posted: June 13th, 2011 | Filed under: Statistics | No Comments »

Today I read the least consequential and most pointless news article in the history of journalism: Is It Really Safe to Use a Cellphone on a Plane? The article enumerates some recent findings of the International Air Transport Association concerning the dangers of using personal electronic devices on airplanes (not cellphones specifically). The agency concludes that 75 events over the years 2003–2009 have possibly been linked to these devices. It then provides some scary quotes about the use of personal electronic devices. My favorite was: “A clock spun backwards and a GPS in cabin read incorrectly while two laptops were being used nearby.” Sounds like a voiceover clip from a crappy B movie.

There are approximately 32,000 flights over the US every day, which totals 81,760,000 over the course of the study. 75 incidents imply an incident rate of roughly .0001%. Which is more likely: instruments working slightly less than 99.9999% of the time, or cell phones causing instruments to break, but only .0001% of the time?
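
The arithmetic, for the skeptical:

32000 * 365 * 7         # 81,760,000 flights over 2003-2009
75 / (32000 * 365 * 7)  # ~9.2e-07, i.e. roughly .0001%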

So who wrote this journalistic gem? ABC journalists Brian Ross (@brianross), who usually produces very high quality work, and Avni Patel. ABC’s own expert John Nance explains, “If an airplane is properly hardened, in terms of the sheathing of the electronics, there’s no way interference can occur.” If your own expert thinks the report is wrong, why report on it? Is this really the only story they could come up with? How about reporting on something of value, rather than going for shock value and misleading headlines? In the words of my parents, “I’m not upset, just disappointed.” I’d love to get my hands on this report, but apparently it’s a “confidential industry study,” which I believe is code for “embarrassing and wrong.”


Why the WHO cellphone and brain cancer study is meaningless

Posted: June 2nd, 2011 | Filed under: Statistics | No Comments »

Not surprisingly, everyone has massively overreacted to the WHO cellphone and cancer report (including my mother). Here is a sampling of my favorite headlines from yesterday:

The report actually said that radiation from cellphones is a Group 2B carcinogen. To put this in context, I have listed the full range of classifications and a few items from each:

  • Group 1: The agent is carcinogenic to humans (tobacco)
  • Group 2A: The agent is probably carcinogenic to humans (frying things, indoor combustion aka having a fireplace)
  • Group 2B: The agent is possibly carcinogenic to humans (pickled vegetables, alcoholic beverages)
  • Group 3: The agent is not classifiable as to its carcinogenicity to humans (caffeine, Tylenol, hydrogen peroxide)
  • Group 4: The agent is probably not carcinogenic to humans (caprolactam)

Only one of the 900 tested items has ever been classified Group 4. To me, the preceding fact and this scale sound like a bunch of lawyers covering their asses.

After reading the entire report, I found their conclusion was based on one study, ending in 2004, which found a 40% increase in cancer among heavy users (>30 minutes per day). This is only one report of hundreds, each with several age groups, several levels of usage, and many other variables. Think of it this way: if you took 10,000 people, gave them each a quarter, and asked them to flip it 10 times, would you be that surprised if one flipped 9 or 10 heads? Of course not; you would attribute it to randomness. The WHO or anyone else claiming that cellphone radiation and brain cancer are linked because of one group in one study is like claiming the person who flipped 9 or 10 heads is a skilled magician who could repeat this feat over and over again. Science is repeatable. If only one study finds this link when hundreds have tried, it’s not very believable.
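
The coin flip analogy is easy to verify (a quick check of my own):

p_extreme <- pbinom(8, 10, 0.5, lower.tail = FALSE)  # P(9 or 10 heads) ~0.011
1 - (1 - p_extreme)^10000  # chance at least one of 10,000 flippers does it: ~1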

CNET posted an excellent and lengthy article with the radiation levels of a long list of cell phones. If you are curious (or really bored), here is the list of all things currently classified by the WHO, the full WHO press release, and the conflicts of interest for those on the panel. The “official” results will be posted in the academic journal The Lancet Oncology.

Finally, I promise that not every one of my posts will be titled “Why X is meaningless.”


Why your Klout score is meaningless

Posted: June 1st, 2011 | Filed under: Klout, Statistics | 45 Comments »

As a Ph.D. statistician and search quality engineer, I know a lot about how to properly measure things. In the past few months I’ve become an active Twitter user and very interested in measuring the influence of individuals. Klout provides a way to measure influence on Twitter using a score, also called Klout, that ranges from 0 to 100. Light users score below 20, regular users around 30, and celebrities start around 75. Naturally, I was intrigued by the Klout measurement, but a careful analysis revealed some serious issues with the score.

Everything in life can be measured. Some quantities live on natural measurement scales: height, weight, temperature, etc. Some quantities are derived measurements: happiness, deliciousness, hunger, etc. Though all are useful, research has repeatedly shown derived measurements to be inconsistent and individually untrustworthy. Specifically, if two individuals tell you their happiness levels are an 8 and a 9 on a scale of 10, we have no way to know:

  • what this means for each individual without significant amounts of context
  • which individual is “happier” even if 8 is less than 9

I argue that Klout is far more similar to a derived measurement and has several suboptimal properties. Specifically, there are 3 basic, desirable properties the Klout score should satisfy:

  1. Ordering by Klout should make sense in the real world – the score should roughly represent the degree to which one is influential or has clout
  2. The score should not be easy to game – people should not be able to hack their Klout in a few days by getting bots to RT, squatting on hashtags, or simply connecting a Facebook account
  3. The score should be monotonic – if another member has higher stats than me in ALL categories, then he/she should have a higher score (a toy check of this rule is sketched below)
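
To make rule 3 precise, here is a toy check (all names and numbers below are made up for illustration): user b “dominates” user a when b is at least as high on every stat and strictly higher on at least one; if b’s score still isn’t higher, monotonicity is violated.

dominates <- function(b, a) all(b >= a) && any(b > a)
violates_monotonicity <- function(stats_a, stats_b, score_a, score_b) {
  dominates(stats_b, stats_a) && score_b <= score_a
}

a <- c(followers = 1000, total_rts = 50, unique_mentions = 20)
b <- c(followers = 9000, total_rts = 400, unique_mentions = 90)
violates_monotonicity(a, b, score_a = 45, score_b = 45)  # TRUE: b dominates, same score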

To demonstrate the issues Klout has with these principles, we provide 4 groups of Klout score comparisons:

  • a set of individuals with Klout in the 40-49 range
  • a set of individuals with Klout in the 55-64 range
  • a set of individuals with Klout in the 70-79 range
  • a set of individuals with Klout >= 80

The four groups were chosen to span the Klout range and contain bloggers, executives, tech pundits, and celebrities of varying levels of activity in social media, notoriety, influence, importance, etc.

Group 1 (Klout 40-49)

Alex Braunstein (me), @alexbraunstein – Statistician, Research Scientist at Chomp, ex-Googler
Ben Keighran, @benkeighran – the CEO of Chomp
Binh Tran, @binhtran – the co-founder and CTO of Klout
Chomp, @chomp – app search engine
Vic Gundotra, @vicgundotra – SVP and head of social at Google
Carla Borsoi, @u_m – VP of Consumer Insights at AOL

Let’s consider a few pairwise comparisons. First, Ben’s stats dominate mine except for likes per post and comments per post, yet his Klout score is 7 points lower than mine. Next, Binh’s stats completely dominate my own in EVERY category, often by very large factors, yet we have identical Klout scores. Carla’s stats also completely dominate mine, but her score is lower. Finally, consider Chomp and Vic Gundotra. Vic’s stats blow Chomp’s out of the water, yet his Klout score is lower. In the “real world” sense of the word clout, Vic should dominate this group. The group 1 comparisons demonstrate the Klout score violating rules 1 and 3 from above.

Group 2 (Klout 55-64)

Paul Graham, @paulg – the fearless leader of Y Combinator.
Y Combinator, @ycombinator – startup incubator
500 Startups, @500startups – startup incubator
Adria Richards, @adriarichards – a tech consultant and popular blogger (also my roommate)
Stefanie Michaels, @adventuregirl – go to person for everything travel

In group 2, my roommate has a higher Klout score than Paul Graham? Really? By 5 points? Paul has 6x more followers, 2x the total RTs, and 4x as many unique RTs, but he hasn’t linked his FB account. Adria has incredibly low FB stats (she uses it sparingly), but apparently that still gives her a tremendous boost. Adding a FB account is far too easy a way to game your score higher. I understand that Klout wants to incentivize the attachment of FB accounts and keep growing virally, but this aspect of the Klout score seems broken. Additionally, the pairwise comparison of Y Combinator and Paul is confusing. Paul’s stats are much higher, but they are assigned the same score. One could argue that different, perhaps more Klout-tastic, people are following Y Combinator; however, I find that unlikely given that Paul is in charge of it. Finally, it’s wrong that Adventure Girl’s Klout is so low. She has been named one of the top 100 people on Twitter, has been featured in Time magazine, etc., but her Klout is only two points higher than Adria’s.

Group 3 (Klout 70-79)

Tim Ferriss, @tferriss – author of the 4 Hour Workweek and 4 Hour Body
Jack Dorsey, @jack – Executive Chairman of Twitter and CEO of Square
Matt Cutts, @mattcutts – head of web spam team at Google
MG Siegler, @parislemon – my favorite writer for Techcrunch
Klout, @klout – the service I’m trashing in this post
David Pogue, @pogue – tech guy from the NYT
Jeffrey Zeldman, @zeldman – designer, writer, and publisher

Things get very confusing in this group. Jack Dorsey’s stats dominate those of David Pogue, but his score is 4 points lower. Matt Cutts has 4000 more total RTs but 1.5M fewer followers relative to Jack Dorsey, so his Klout score is 1 point higher? I’ll go out on a limb and state that 4000 incremental RTs seem FAR less valuable than 1.5M incremental followers. Klout, the company, has fewer followers, total RTs, and unique RTers by a factor of at least 6, but 7K more unique mentioners, so Klout’s Klout score is 4 points higher than Jack’s? But if unique mentions are so valuable, how can Jack Dorsey have a lower score than Matt Cutts when he has 16K additional unique mentioners? This is just the start of the inconsistencies.

Without FB, MG Siegler’s score would likely be 10 points lower. Jeffrey Zeldman’s blog is super high quality, but does he deserve to have more Klout than David Pogue? Again, Facebook puts him over the top. I think that Klout’s score is far too high, though perhaps it’s not surprising that Klout does well on its own metric. Finally, I included Tim Ferriss not just because I’m a huge fan, but because his stats provide an interesting counterpoint for even more interesting pairwise comparisons. They will lead you to several more contradictions concerning the relative value of followers, RTs, unique RTers, and unique mentioners.

Group 4 (Klout >= 80)

Robert Scoble, @scobleizer – blogger, tech evangelist, and author
Perez Hilton, @perezhilton – master of celebrity gossip
Charlie Sheen, @charliesheen – #winning
Guy Kawasaki, @guykawasaki – entrepreneur and former Chief Evangelist at Apple
Justin Bieber, @justinbieber – never saying never

The pairings of Scoble/Hilton and Sheen/Kawasaki again demonstrate the severe miscalibration regarding Facebook scores. Also, I’m not sure I trust any system that has Justin Bieber as the most influential.

In conclusion, there are some serious inconsistencies with Klout that render it nearly meaningless in some circumstances. It often does not correctly order individuals in terms of how influential they are, is easy to game higher simply by adding a Facebook account, and does not respect some very basic monotonicity rules. Put simply, it acts like a derived measurement. From this analysis, I have gleaned the following rough rules of thumb for understanding your Klout score:

  • Connecting an additional account (i.e., Facebook) will ALWAYS increase your Klout.
  • The degree to which your followers are influential seems to be irrelevant or to matter very little.
  • The differential between the number of people you follow and the number who follow you seems to be irrelevant or to matter very little.
  • In terms of value to your Klout score: follow < RT < unique RT < unique mention, but this can be inconsistent.
  • In terms of value to your Klout score: like < comment, but this can be inconsistent.

To be fair, Klout does not want their score to be completely transparent. Then it would be easy to rip off and even easier to game. That being said, it should be possible to respect the three conditions I enumerated and still keep a lid on the secret sauce. As I have time, I’m going to mess around with the Klout API a bit and gather more comprehensive data to further demonstrate the points made in this post, including a similar study concerning the Klout of companies/brands. Additionally, I will submit several questions regarding my analysis to Klout CEO Joe Fernandez’s Klout chat, and hope the company follows up. I’ll post any details/answers I receive here.

More info about Klout can be found in Techcrunch articles about their initial launch, series A funding, series B funding, addition of Facebook to their ranking, and their crunchbase profile.

As any good statistician should, I need to qualify my analysis. There is of course selection bias in the examples enumerated above. Though not always this egregious, these head-scratching scores are the rule, not the exception. All data was pulled on 5/29/11 and may not reflect current scores. Finally, please remember that this is my personal blog and reflects my opinion alone. In particular, it does not reflect the opinions of any employer past, present, or future.


A few of my favorite intro stat graphics

Posted: April 28th, 2011 | Filed under: Statistics | No Comments »

Intro stat is a tough class to teach. Most students don’t want to be there, are mathphobic, and/or are bad at math. I always tried to mix up the class with fun examples that students could relate to (e.g., beer and frat parties) to keep it interesting. Here are a few of my favorite graphics from the visualization section of the class.


I should note that none of these are my original graphics. I have received them in emails from several people, so I have no idea how to correctly attribute them.


Defining statistical significance?

Posted: April 26th, 2011 | Filed under: Statistics | 4 Comments »

Just saw an amazing XKCD comic, and it reminded me to write a post I had been thinking about for a while. First, the comic:

XKCD Comic on Statistical Significance

This falls under the sad-but-true category. People never publish papers or write articles saying that nothing was statistically significant. That’s not interesting. If you fit a bunch of 95% confidence intervals, each has a 5% chance of coming up “significant” due to random chance alone. Go ahead: generate 100 pairs of random vectors and run a regression on each. A few of the parameters will be statistically significant even though we know they are not, because the data were generated randomly.
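
If you don’t believe me, here is that experiment in R:

set.seed(1)
p_vals <- replicate(100, {
  x <- rnorm(50); y <- rnorm(50)         # pure noise on both sides
  summary(lm(y ~ x))$coefficients[2, 4]  # p-value for the slope
})
sum(p_vals < 0.05)  # ~5 "significant" regressions by chance alone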

With that in mind, I’ll finally lay out the thesis of this post: confidence intervals are bullshit. This isn’t to say they aren’t useful tools, but the entire idea of boiling statistical significance down to a binary decision (in or out of the interval, or equivalently a p-value larger or smaller than your alpha) is naive. At this point I should note two things:

  1. The concept of statistical significance is incredibly nuanced and not well understood by most.
  2. If you don’t understand my point about p-values and confidence intervals being equivalent you need to retake intro stat (or hire me to consult for you).

Unfortunately, the phrase “statistically significant” was given a definition by lawyers years ago, which until April 2, 2011, was law. This is particularly scary because a very large fraction of lawyers have never taken (or passed) an intro stat course. Before April 2, the phrase implied statistical significance at the 95% level. The Supreme Court case Matrixx Initiatives v. Siracusano, 09-1156 overturned this silly convention. To understand why this is important, let’s first remember the definition of a 95% confidence interval (more than half of intro stat students get this wrong on both the midterm and final). It is a procedure for estimating an interval of values for a parameter (e.g., the proportion of people who will vote for Obama, the average height of men, etc.), where at least 95% of the intervals generated will contain the true value of the parameter. It is not correct to say: “There is a 95% chance that my interval contains the true parameter.” Each interval either contains the true value or it doesn’t. You can’t be 95% pregnant; you either are or you are not.

A 95% CI implies a 20:1 ratio for false positives. This is entirely reasonable. The Supreme Court simply ruled that if you estimate a 19.9:1 ratio for a false positive of a particular side effect (which is technically less than 20:1 but in practice the same), you can’t simply ignore that info or withhold it from consumers of your product, shareholders, doctors, the FDA, etc. Previously, if your drug had some horrible side effect (heart attack, stroke, death, etc.), it was okay not to report it as long as you fell slightly short of this ratio. Any reasonable person understands this, and now the Supreme Court agrees. I have no idea whether this will be followed/implemented in practice, but it’s at least a step in the right direction and a moral victory for statisticians everywhere.

For those curious, more background on the court case can be found in this WSJ article.


Stat 101 in a Blog Post

Posted: March 28th, 2011 | Filed under: Statistics | No Comments »

I taught Stat 111 (equivalent of Stat 101 for the College of Arts and Sciences) at the University of Pennsylvania during the Summer of 2009. Here are my notes, problem sets, and practice midterm and final exams. We used JMP as our statistical software package. A 30 day free trial is normally available on their site.

There are a few errors in the slides that I didn’t bother fixing, but if you are looking for another perspective on stat 101 or more examples, this may help. We used Moore and McCabe, but the lectures are mostly self contained.

Lecture Notes:

  1. Lecture 1
  2. Lecture 2
  3. Lecture 3
  4. Lecture 4
  5. Lecture 5
  6. Lecture 6
  7. Lecture 7
  8. Lecture 8
  9. Lecture 9
  10. Lecture 10
  11. Lecture 11
  12. Lecture 12
  13. Lecture 13
  14. Lecture 14
  15. Lecture 15
  16. Lecture 16
  17. Lecture 17
  18. Lecture 18

Problem Sets:

  1. Homework 1
  2. Homework 2
    Data: Dangers, Fuel Consumption, Fuel Economy

  3. Homework 3
  4. Homework 4
    Data: Golf

  5. Homework 5

Midterm and Final Prep and Solutions:

  1. Sample Midterm
  2. Solutions
  3. Midterm Scores
  4. Sample Final
  5. Solutions