Indirect Content Privacy Surveys: Measuring Privacy Without Asking About It

Posted: June 25th, 2011 | Author: | Filed under: Google, Privacy, Statistics | No Comments »

Awesome news! My most recent publication, Indirect Content Privacy Surveys: Measuring Privacy Without Asking About It, became a featured publication on the Google Research Homepage. For when they take it down, I have included screenshots of the homepage (the paper is in the bottom right-hand corner):

and tweet announcing its post:

This is exciting for a couple of reasons. First, recognition by Google is great validation of the importance of the work. Googlers publish tons of papers, and it’s a great honor to have mine showcased on the front page of the research blog. Second, should I ever want to return to academia, this paper now adds much more academic “street cred.” Finally, a press piece about the article was written by Thomas Claburn (@ThomasClaburn). This is the first time anything I’ve written has ever received any sort of non-academic press coverage. His article can be found here. Thomas did contact me for comment several hours before he sent the article, but my coauthors and I were unable to run things through the necessary PR people.

I’m not really going to comment on his article, because I am no longer at Google, don’t want to take on the role of spokesperson, and technically anything I say on the subject should go through the Google press/PR people. I’ll simply say that understanding your user is key to making ANY good product. Laura, Jessica, and I didn’t write this paper or conduct this research with any specific agenda or to right any wrong. We wanted to understand how users feel about and share their content, so we asked. Interesting patterns in their responses emerged, so we investigated and reported our findings. That’s it.

Though I have a disclaimer in my “about” section, I want to again emphasize that all opinions expressed in this post are strictly my own. In particular, they do not reflect those of any past, present, or future employer, especially Google.


Klout perks: Nudie jeans party at Rolo SF

Posted: June 24th, 2011 | Author: | Filed under: Klout | No Comments »

Thursday June 23, I attended my first Klout perk party. Apparently I’ve built up enough Klout by bashing Klout to deserve an invite. The event was showcasing Nudie Jeans at the store Rolo in the SOMA district of San Francisco.


They had free food:

and a DJ:

If you’ve ever wondered what I’d look like as a hipster, here’s a shot of me in some very tight hipster jeans.

For those curious, this pair is the Slim Jim Org. Dry Dark, which can be yours for $179. Overall it was a fun event and one that may signify a move by Klout into the local space. This was the first event of its kind.


An interview with Klout

Posted: June 20th, 2011 | Author: | Filed under: Klout | 2 Comments »

After my initial Klout blog posts, I followed up with their Marketing Manager Megan Berry @meganberry and Director of Ranking Ash Rust @ashrust in a series of emails. I sent them a barrage of questions; below are their answers.

Q: How will Klout deal with identical individuals across multiple networks?
K: We look at each platform holistically to try and determine what the signals of influence are. We then perform sophisticated analysis to weight the different platforms appropriately for each person.
A: No information was conveyed in this answer. In academia and finance the word “sophisticated” is a completely loaded term roughly translating to “it’s actually trivially easy, but I think you are too stupid to understand.”

Q: My Foursquare friends are a strict subset of my Facebook and Twitter friends. Will they double count?
K: Follower and friend count are really not part of what we do — it is about ability to drive action.
A: I thought this was a decent answer. I have a few friends that are very active on foursquare, but relatively inactive on Twitter. If my activity drives actions on both Twitter and foursquare, I should get more credit.

Q: CEO Joe Fernandez stated: “Klout Score is not about followers or your activity level but about how people react to your content.” This is a bit vague. Now that we have more than 140 characters, can you elaborate a bit?
K: Yes, we don’t believe that followers or friends are a good measure of influence. Instead we’re looking at the engagement you get from people (i.e. RTs, @msgs, likes, etc.) and how influential those people are.
A: I agree that followers/friends should be secondary to actions indicating that individuals are actively engaged with your content (i.e. RTs, @msgs, likes, etc.). I also agree that a Robert Scoble or Michael Arrington RT or @ reply should be worth far more than my mom RTing something I say.

Q: Why do you feel you are better than competitors such as Peerindex and twittergrader?
K: We are the emerging standard in this industry — we are used by over 2000 applications and major brands to understand and measure online influence.
A: Worst answer ever, even worse than your parents saying “because I said so.” I more or less agree with their assessment that they are the emerging industry standard; however, would they still be if they hadn’t gotten a 1 year jump start over all other competitors? What if PeerIndex’s infrastructure were more scalable and could handle the same scale as Klout? I expected some statement assessing their relative quality in terms of ranking or infrastructure, not a catch-22 or tautological response.

Q: I think the new K+ system is awesome, but am worried about spam. What steps are you taking to ameliorate the risk?
K: We’re watching this very carefully to understand how people are using it. We also limit people to 5 +K’s a day.
A: Here’s my favorite example so far:

This wasn’t spam; the label was generated by Klout for Daniel Bogan @waferbaby. I strongly believe that Daniel is authoritative on unicorns, so I even voted for his Klout in this area.

His current Klout topics (which still include unicorns) are available here. It also seems like Klout slightly changed their shade of orange from the pics above.

On a more serious note, I think the K+ system is an incredibly important step in the evolution of Klout. The next step is to provide topic specific scores. Using these they can start to tackle the holy grail of influence measures: individualized influence scores. A serious problem with existing Klout scores is that they remove individual context from the equation. Justin Bieber will forever have a Klout of 0 for me, even though his systemwide Klout is 100. It would be easy to check that Justin Bieber has no Klout for the topics I am interested in (apps, startups, statistics, mongodb, etc.). Doing this for every user, however, is much more computationally expensive and harder to get right. Tweets, Facebook posts, etc. are not labeled with topics, so these must be inferred. This is a VERY difficult problem due to the small amount of text. I’m sure Klout and PeerIndex are both working very hard to tackle the problem. Whoever gets that right will take the “influence” market.
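To make the idea of an individualized score concrete, here is a minimal sketch. It assumes topic specific scores already exist (inferring them is the hard part) and simply averages them over the topics one reader cares about. The function name, the averaging rule, and all of the numbers are my own made-up illustration, not anything Klout or PeerIndex has described.

```python
def personalized_score(topic_scores, my_interests):
    """Average a person's topic-specific scores over the topics I care about.

    topic_scores: dict mapping topic -> score on a 0-100 scale (hypothetical).
    my_interests: list of topics the reader is interested in.
    Topics the person has no score for count as 0.
    """
    if not my_interests:
        return 0.0
    total = sum(topic_scores.get(topic, 0.0) for topic in my_interests)
    return total / len(my_interests)


# Made-up topic scores purely for illustration.
pop_star = {"music": 100.0, "celebrity": 100.0}
data_blogger = {"statistics": 60.0, "mongodb": 40.0, "startups": 20.0}

my_interests = ["apps", "startups", "statistics", "mongodb"]

pop_star_for_me = personalized_score(pop_star, my_interests)
data_blogger_for_me = personalized_score(data_blogger, my_interests)
print(pop_star_for_me, data_blogger_for_me)
```

With these invented inputs, the pop star scores 0 for me despite a perfect score in his own topics, which is exactly the behavior a systemwide number cannot capture.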

Here are several questions they refused to answer:

  • What % of Klout systemwide is attributed to Twitter v Facebook?
  • Will adding another network ALWAYS increase your score? If not always, empirically, what % of the time does an increase occur?
  • In a recent Kloutchat, the statement was made: “Nearly 50 variables in generating klout score but it all boils down to how people react to your content” What’s the most interesting/surprising variable that you are willing to divulge?

It’s not surprising that Megan and Ash did not answer all of my questions. I think the answer to the first question is roughly 85/15, but they won’t say that publicly because a) it might piss off Facebook if they think they are “underweighted” and b) they don’t want people outside the company arguing about how this should be weighted. I’m sure they have had tons of discussions about this internally. For the second question, the answer is yes, until Klout tells us otherwise. I won’t go on a rant about the silliness of this; however, if you want to boost your score, attach your FB account. I’ll do a longer “Klout SEO” post in a few weeks.

I sent PeerIndex CEO Azeem Azhar the same group of questions, and will post his responses when I hear back. Next post, I’ll answer the question: “If you were creating your own Klout/PeerIndex/Twittergrader competitor, what signals would you use?” I bet I can come up with a set of 50 variables very close to those used by Klout.


Update: Wharton Global Alumni Forum 2011

Posted: June 19th, 2011 | Author: | Filed under: Chomp, Statistics | No Comments »

Update! The subject of my talk at the Wharton Global Alumni Forum in San Francisco June 23-24, 2011, has changed. The new title and abstract can be found below.

Searching by Function: App Search v. Web Search

The age of apps has officially arrived. There are more than 500,000 apps for iOS and nearly 300,000 on Android. Last year alone 8 billion apps were downloaded, and nearly that many have been downloaded this year. Growth in this multi-billion dollar industry continues to accelerate. Navigating this massive landscape has become a real problem for smartphone users, because traditional keyword-based search algorithms fail to perform as efficiently. Consumers search by app function, requiring a very different approach to ranking. In this talk we dissect current issues with app search and discuss several solutions implemented at Chomp.


Disregard anything said by the International Air Transport Association

Posted: June 13th, 2011 | Author: | Filed under: Statistics | No Comments »

Today I read the least consequential and most pointless news article in the history of journalism: Is It Really Safe to Use a Cellphone on a Plane? The article enumerates some recent findings of the International Air Transport Association concerning the danger of using personal electronic devices on airplanes (not cellphones specifically). The agency concludes that 75 events over the years 2003 – 2009 have possibly been linked to these devices. It then provides some scary quotes about the use of personal electronic devices. My favorite was: “A clock spun backwards and a GPS in cabin read incorrectly while two laptops were being used nearby.” Sounds like a voiceover clip from a crappy B movie.

There are approximately 32,000 flights over the US every day, which totals 81,760,000 over the course of the study. 75 incidents implies an incident rate of .0001%. Which is more likely: instruments working slightly less than 99.9999% of the time, or cell phones causing instruments to break, but only .0001% of the time?
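For anyone who wants to check the arithmetic, here is a quick sketch. It takes the 32,000 flights per day figure at face value, counts 2003 through 2009 as 7 years of 365 days (ignoring leap days), and assumes every one of the 75 events was a genuine incident.

```python
flights_per_day = 32000
years = 7            # 2003 through 2009, inclusive
days_per_year = 365  # leap days ignored, as in the estimate above

total_flights = flights_per_day * days_per_year * years
incidents = 75

# Incident rate expressed as a percentage of all flights.
rate_percent = 100.0 * incidents / total_flights

print(total_flights)           # 81760000
print("%.6f%%" % rate_percent)  # 0.000092%, which rounds to .0001%
```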

So who wrote this journalistic gem? ABC journalists Brian Ross @brianross (who usually produces very high quality work) and Avni Patel. ABC’s own expert John Nance explains, “If an airplane is properly hardened, in terms of the sheathing of the electronics, there’s no way interference can occur.” If your own expert thinks the report is wrong, why report on it? Is this really the only story they could come up with? How about reporting on something of value, rather than going for shock value and misleading headlines? In the words of my parents, “I’m not upset, just disappointed.” I’d love to get my hands on this report, but apparently it’s a “confidential industry study,” which I believe is code for “embarrassing and wrong.”


Comparing Klout competitors and alternatives: PeerIndex and Twittergrader

Posted: June 8th, 2011 | Author: | Filed under: Klout, PeerIndex | 2 Comments »

After spending two blog posts on the shortcomings of Klout, it only seems fair that I look into the quality of its competitors PeerIndex and Twitter Grader.

Like Klout, both consume your Twitter stream and provide a summary score of your “influence” between 0 and 100. PeerIndex also allows you to connect your Facebook, LinkedIn and Quora accounts, while Twitter Grader is limited to Twitter. In my previous post I claimed that any good measure of influence should have the following properties:

  1. Ordering should make sense in the real world – the score should roughly represent the degree to which one is influential or has clout
  2. The score should not be easy to game – people should not be able to hack their score in a few days by getting bots to RT, squatting on hashtags, or simply connecting a Facebook account
  3. The score should be monotonic – if another member has higher stats than me in ALL categories, then he/she should have a higher score

The primary competitor is PeerIndex, so we will start by seeing if their score satisfies the above conditions better than Klout’s does. Here is my PeerIndex summary:

Though PeerIndex satisfies the three rules better overall, the first thing I noticed is a bug: my PeerIndex is reported as 50 at the top of this screenshot and 44 at the bottom. PeerIndex focuses on three areas: Activity, Authority, and Audience (as compared to Network Influence, Amplification Probability, and True Reach on Klout). These individual numbers are also different at the top and bottom of the screenshot. Though I don’t have a great idea of how these scores are calculated, Activity and Audience (followers) sound much more easily gamed to me. Authority is in theory a great measure, but in practice it falls short. I encountered several other bugs. Mouseover elements in the graph don’t always disappear, and some accounts just aren’t able to be added to the comparison. @Williamsonwines (my favorite winery) and @uberjon, a product marketing executive at Facebook and X-Googler, are two examples. Several other issues and bugs are highlighted in the comparison screenshot below:

PeerIndex claims that @feltron should have an audience score of 0; however, the well known data analyst and designer had 11,219 followers and was listed 907 times at the time of this post. @coachella, the epic music festival in Indio, CA, has an activity score of 0, though the account has tweeted 436 times, including several times immediately preceding this blog post. Finally, @MummNapaWinery has 0’s across the board even though the account has 312 tweets, 3,122 followers, and is one of my favorite wineries in Napa. I also find it ironic and mildly awkward that PeerIndex gives itself a low authority score and Klout a slightly higher one.

For due diligence purposes, I performed the same group comparisons as in my original Klout post, but I didn’t find the same monotonicity issues. As a result, they are harder to poke fun at, so I placed them at the bottom of this post. You will notice a few individuals are left out relative to the original Klout comparisons. The UI requires 6 or fewer people, so I took a few out randomly. If you stare at the authority scores in these examples long enough, I think you’ll agree that they are kind of wonky (check out Vic Gundotra and Carla Borsoi, VPs at Google and AOL, respectively). Put more succinctly:

PeerIndex Advantages:

  • No or fewer monotonicity issues
  • Facebook, LinkedIn, and Quora already integrated (Klout is just adding LinkedIn)
  • Twitter Elite Lists are VERY interesting: Top Users, Top Women, Top Brands, Top Cities

PeerIndex Disadvantages:

  • Can’t see or track your score over time
  • Fewer details in summary
  • VERY slow to update (~7 days initially and updates only every couple of days)
  • Authority score needs more explanation

In summary, PeerIndex is a legitimate competitor, but they need to fix the user-facing bugs I highlighted and really speed up their scoring cycle. If users show up to the site and can’t immediately access their score, PeerIndex will have HUGE retention issues. One of the most underappreciated features of Klout, regardless of your feelings concerning the legitimacy of its score, is its infrastructure. The fact that they can ingest the Twitter and Facebook streams, process the data, and update every day is an incredible engineering feat, especially for a startup of its size. Perhaps there is an unseen tradeoff between speed and quality of score within Klout and PeerIndex.

Next we consider Twitter Grader.


Their approach is totally opaque. The summary simply lists stats that I can get from my own twitter account, a number from 0 to 100, and a relative rank. I have no idea where this rank comes from, especially because there are way more than 9 million people on Twitter. Perhaps this is the number of accounts ever scored on Twitter Grader? I certainly hope not, because that would be incredibly biased. They do provide an article in their help center: How does Twitter Grader Calculate Twitter Rankings. Honestly their score may be great, but I just don’t have enough information and their site is lacking many of the features found in PeerIndex and Klout.

Twitter Grader Advantages:

  • Pulls your data and calculates score instantly
  • Associated with Hubspot (which has a great reputation)
  • It only uses twitter (see also disadvantages)

Twitter Grader Disadvantages:

  • It only uses twitter
  • Can’t see or track your score over time
  • Lacks features

In my next post on the subject, I’ll have more followup from the ranking folks at Klout and the CEO of PeerIndex.

As previously mentioned, the PeerIndex score comparisons mirroring those from the original Klout post can be found below:

Comparison 1

Alex Braunstein (me), @alexbraunstein – Statistician, Research Scientist at Chomp, X-Googler
Binh Tran, @binhtran – the co-founder and CTO of Klout
Chomp, @chomp – app search engine
Vic Gundotra, @vicgundotra – SVP and head of social at Google
Carla Borsoi, @u_m – VP of Consumer Insights at AOL

Comparison 2

Paul Graham, @paulg – the fearless leader of Y Combinator.
Y Combinator, @ycombinator – startup incubator
500 Startups, @500startups – startup incubator
Adria Richards, @adriarichards – a tech consultant and popular blogger (also my roommate)

Comparison 3

Tim Ferriss, @tferriss – author of the 4 Hour Workweek and 4 Hour Body
Matt Cutts, @mattcutts – head of web spam team at Google
MG Siegler, @parislemon – my favorite writer for Techcrunch
Klout, @klout – the service I’m trashing in this post
Jeffrey Zeldman, @zeldman – designer, writer, and publisher

Comparison 4

Robert Scoble, @scobleizer – blogger, tech evangelist, and author
Perez Hilton, @perezhilton – master of celebrity gossip
Charlie Sheen, @charliesheen – #winning
Guy Kawasaki, @guykawasaki – entrepreneur and former Chief Evangelist at Apple
Justin Bieber, @justinbieber – never saying never


Pymongo: distinct items and an example map reduce on subset of db

Posted: June 7th, 2011 | Author: | Filed under: mongodb, pymongo, Python | 6 Comments »

I’ve been playing around with Pymongo for a few weeks now, and I’m slowly discovering quirks and differences in syntax relative to the mongodb shell. The two I’ll cover in this post are:

  1. Using distinct in pymongo
  2. map reduce example in pymongo on a queried subset of the db

Using distinct in pymongo

Let’s say you want to group users by some sort of id on a day (I’ll use May 21 as an example). From the mongodb shell this command is simply:

db.raw_data.distinct("id", {"_date": "2011-05-21"})

Running this command from within a Python file yields the following error:

File "test.py", line 13, in <module>
foo = db.raw_data.distinct("id", {"_date": "2011-05-21"})
TypeError: distinct() takes exactly 2 arguments (3 given)

It turns out Pymongo makes you do the find first and then call distinct on the resulting records:

foo = db.raw_data.find({"_date": "2011-05-21"}).distinct("id")

This is exactly what the mongodb shell interpreter is doing; it’s just annoying that the syntax is different.
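If it helps to see the filter-then-distinct logic without a database, here is a pure Python sketch of the same two steps on a list of dicts. The find and distinct helpers and the sample records are made up for illustration; they only mimic the exact-match query used above and are not part of Pymongo.

```python
# Made-up records standing in for the raw_data collection.
records = [
    {"id": "a", "_date": "2011-05-21"},
    {"id": "b", "_date": "2011-05-21"},
    {"id": "a", "_date": "2011-05-21"},
    {"id": "c", "_date": "2011-05-22"},
]


def find(docs, query):
    """Keep documents whose fields exactly match every key/value in query."""
    return [d for d in docs if all(d.get(k) == v for k, v in query.items())]


def distinct(docs, key):
    """Return the unique values of key, in first-seen order."""
    seen = []
    for d in docs:
        if key in d and d[key] not in seen:
            seen.append(d[key])
    return seen


# Step 1: filter by date. Step 2: take the distinct ids of what is left.
ids = distinct(find(records, {"_date": "2011-05-21"}), "id")
print(ids)
```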

map reduce example in pymongo on a queried subset of the db

Following the example from my previous post, you simply add query = {} and add out = for the collection in which you want your results to end up. Most of the examples I found on Stack Overflow or personal blogs were wrong or tried to pass these parameters in together. I tried roughly 865868 variations, and what I have below is the only combination that worked.

#!/usr/bin/env python
from bson.code import Code  # for some this needs to be pymongo.bson
from pymongo import Connection

# code for example map/reduce
db = Connection().map_reduce_example
map = Code("function () {"
           "  emit(this.id, 1);"
           "}")
reduce = Code("function (key, values) {"
              "  var total = 0;"
              "  for (var i = 0; i < values.length; i++) {"
              "    total += values[i];"
              "  }"
              "  return total;"
              "}")

# code without query
result = db.things.map_reduce(map, reduce, "map_reduce_example")

# code with simple query
result = db.raw_data.map_reduce(map, reduce, out="map_reduce_example", query={"date": "2011-05-21"})

# code with query that grabs all records from May 2011
result = db.raw_data.map_reduce(map, reduce, out="map_reduce_example2", query={"date": {"$regex": "^2011-05"}})
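To see what this map/reduce actually computes, here is a pure Python sketch that runs the same logic (emit a 1 per id, then sum per key) over an in-memory list restricted by an exact-match query. The map_reduce helper and the sample documents are hypothetical stand-ins of my own, not Pymongo calls, and the sketch does not handle operators such as $regex.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn, query=None):
    """Apply map_fn to docs matching query, group by key, then reduce each group."""
    query = query or {}
    grouped = defaultdict(list)
    for doc in docs:
        # Exact-match filter only, mirroring query={"date": "..."} above.
        if all(doc.get(k) == v for k, v in query.items()):
            for key, value in map_fn(doc):
                grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def map_fn(doc):
    # Same as the JavaScript map: emit(this.id, 1)
    yield doc["id"], 1


def reduce_fn(key, values):
    # Same as the JavaScript reduce: total up the emitted values
    return sum(values)


# Made-up documents standing in for the raw_data collection.
raw_data = [
    {"id": "a", "date": "2011-05-21"},
    {"id": "a", "date": "2011-05-21"},
    {"id": "b", "date": "2011-05-21"},
    {"id": "a", "date": "2011-05-22"},
]

counts = map_reduce(raw_data, map_fn, reduce_fn, query={"date": "2011-05-21"})
print(counts)
```

The result is a count of records per id for the queried day, which is what ends up in the output collection in the real version.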

This may seem like a silly and simple blog post to write, but this wasn’t documented anywhere else online, so I wanted to save 5-10 minutes for anyone else who hits the distinct error or is trying to run a map reduce with a query.


Klout reacts

Posted: June 2nd, 2011 | Author: | Filed under: Klout | 5 Comments »

Seems like I struck a nerve with my earlier post about Klout. Binh Tran, the CTO of Klout, Megan Berry, marketing manager at Klout, and Klout itself are all now following me on Twitter. In addition, I received a lengthy response from Ash Rust, the Director of Ranking at Klout, which I have included in full at the end of the post.

Ash’s three main points were:

  • Klout is just beginning and has flaws
  • Your Klout Score is about quality not quantity
  • Adding additional networks should increase your Klout

I appreciate Ash’s candid response. On the first point, he’s right. It’s unrealistic for me or anyone to expect perfection of Klout or any of the competing metrics/companies, especially at this relatively early stage of Klout. Think about Google 2 years after it launched and how far it’s come since then. Still, companies need to know what users find wrong with their products to iterate and improve. I didn’t receive hundreds of RTs because my writing was so exceptional or witty; I received them because I articulated a set of issues seen by others in Klout.

Ash’s second and third points seem to contradict each other. Categorically stating that adding another network will always increase your score seems to be a victory for quantity, not quality. In the Klout Chat yesterday, CEO Joe Fernandez announced that Foursquare and LinkedIn will soon be added. Without the proper Twitter/Facebook balance in the current system, I worry that adding two additional networks to the mix will exacerbate existing issues. Additionally, I wonder how Klout will deal with identical individuals across multiple networks. My Foursquare friends are a strict subset of my Facebook and Twitter friends. Do they double count? Should I get any credit at all for adding friends already accounted for elsewhere in the system?
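The double counting worry is easy to illustrate. The sketch below compares naively summing audience sizes per network with counting unique people across networks. The names and sets are invented, and I have no idea which of the two (if either) Klout actually does.

```python
# Made-up audiences; Foursquare is a strict subset of the other two combined.
facebook = {"ann", "bob", "cat", "dan"}
twitter = {"bob", "cat", "eve"}
foursquare = {"bob", "cat"}

# Naive approach: every network adds to the total, overlaps included.
naive_total = len(facebook) + len(twitter) + len(foursquare)

# De-duplicated approach: each person counts once, however many networks they are on.
unique_total = len(facebook | twitter | foursquare)

print(naive_total, unique_total)
```

Under the naive sum, attaching Foursquare raises the total even though it contributes no new people; under the union it changes nothing.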

As I have time over the next few days, I’ll gather a few unanswered questions from the #Kloutchat in addition to others I have. I’ll send Ash a list that they will hopefully answer. Let me know if you have a few to add to the list.

As promised, here is Ash’s full response:

Hi Alex,
I’m the Director of Ranking here at Klout and wanted to respond directly to some of the points you raised here.
1) Thanks a lot for writing this.
It’s great feedback on the understandability of our score and mirrors a lot of the (intense) debates we have internally around how the score works and what data to deliver to our users.
2) Klout is just beginning.
We believe we’re at the very first stage of development for this paradigm, much like online document search was in 1998 when Google was founded, so we can expect some growing pains especially given the volume of data we process. That said, we know we need to do better and we’re working hard to improve, we have a team of excellent scientists working on improving the score.
3) Your Klout score is about quality not quantity.
While some users may have amassed many thousands of friends and followers, those people may not be listening or may not even be real people at all; this is why we use our own audience metric: True Reach. We also assess the influence of each person in your audience, so if someone you interact with is very influential that can have a much larger impact on your score than a group of people with lower levels of influence; for example if @BarackObama retweets you, it’ll increase your score more than if I do.
4) Adding additional networks to your Klout should increase your score.
We can only measure the data we have. If you add a network, like Facebook, and are influencing people on that network, then it should increase your score; assuming you’re influencing people on that network. If I influence 10 people on Facebook and then add my Twitter account, where I influence 3 people, Klout can now see me influencing 13 people, hence the score increase.
I hope this answers some of your questions and please feel free to follow up with me directly.
Thanks
—-
Ash Rust
Director of Ranking | Klout
http://klout.com
ash [at] klout dot com
@AshRust


Why the WHO cellphone and brain cancer study is meaningless

Posted: June 2nd, 2011 | Author: | Filed under: Statistics | No Comments »

Not surprisingly, everyone has massively overreacted to the WHO cellphone and cancer report (including my mother). Here is a sampling of my favorite headlines from yesterday:

The report actually said: radiation from cellphones is a class 2B carcinogen. To put this in context, I have listed the full range of classifications and a few items from each:

  • Group 1: The agent is carcinogenic to humans (tobacco)
  • Group 2A: The agent is probably carcinogenic to humans (frying things, indoor combustion aka having a fireplace)
  • Group 2B: The agent is possibly carcinogenic to humans (pickled vegetables, coffee)
  • Group 3: The agent is not classifiable as to its carcinogenicity to humans (caffeine, Tylenol, hydrogen peroxide)
  • Group 4: The agent is probably not carcinogenic to humans (caprolactam)

Only one of the 900 tested items has ever been classified group 4. To me, the preceding fact and this scale sound like a bunch of lawyers covering their asses.

After reading the entire report, I found that their conclusion was based on one study, ending in 2004, which found a 40% increase in cancer among heavy users (>30 minutes per day). This is only one report of hundreds, each with several age groups, several levels of usage, and many other variables. Think of it this way. Suppose you took 10,000 people, gave them each a quarter, and asked them to flip it 10 times. Would you be that surprised if one flipped 9 or 10 heads? Of course not; you would attribute it to randomness. The WHO or anyone else claiming that cellphone radiation and brain cancer are linked because of one group in one study is like claiming the person who flipped 9 or 10 heads is a skilled magician who could repeat this feat over and over again. Science is repeatable. If only one study finds this link when hundreds have tried, it’s not very believable.
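The coin flip analogy can be made exact. Assuming fair coins and independent flips, this short sketch computes the chance that one person gets 9 or 10 heads in 10 flips and how many of the 10,000 people we would expect to do so.

```python
from math import comb  # Python 3.8+

n_flips = 10
n_people = 10000

# P(9 or 10 heads in 10 fair flips) = (C(10,9) + C(10,10)) / 2^10 = 11/1024
p_extreme = (comb(n_flips, 9) + comb(n_flips, 10)) / 2 ** n_flips

# Expected number of people with such a streak, and the chance at least one has it.
expected_people = n_people * p_extreme
p_at_least_one = 1 - (1 - p_extreme) ** n_people

print(round(p_extreme, 4))      # 0.0107
print(round(expected_people))   # 107
```

So it is not just unsurprising that one person flips 9 or 10 heads; roughly a hundred of them should, purely by chance.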

CNET posted an excellent and lengthy article with the radiation levels of a long list of cell phones. If you are curious (or really bored), here are the list of all things currently classified by the WHO, the full WHO press release, and the conflicts of interest for those on the panel. The “official” results will be posted in the academic journal The Lancet Oncology.

Finally, I promise that not every one of my posts will be titled “Why X is meaningless.”


Why your Klout score is meaningless

Posted: June 1st, 2011 | Author: | Filed under: Klout, Statistics | 45 Comments »

As a PhD statistician and search quality engineer, I know a lot about how to properly measure things. In the past few months I’ve become an active Twitter user and very interested in measuring the influence of individuals. Klout provides a way to measure influence on Twitter using a score also called Klout. The range is 0 to 100. Light users score below 20, regular users around 30, and celebrities start around 75. Naturally, I was intrigued by the Klout measurement, but a careful analysis revealed some serious issues with the score.

Everything in life can be measured. Some quantities live on natural measurement scales: height, weight, temperature, etc. Some quantities are derived measurements: happiness, deliciousness, hunger, etc. Though all of these are useful measurements, research has repeatedly shown derived measurements to be inconsistent and not trustworthy individually. Specifically, if two individuals tell you their happiness levels are an 8 and a 9 on a scale of 10, we have no way to know:

  • what this means for each individual without significant amounts of context
  • which individual is “happier” even if 8 is less than 9

I argue that Klout is far more similar to a derived measurement and has several suboptimal properties. Specifically, there are 3 basic, desirable properties the Klout score should satisfy:

  1. Ordering by Klout should make sense in the real world – the score should roughly represent the degree to which one is influential or has clout
  2. The score should not be easy to game – people should not be able to hack their klout in a few days by getting bots to RT, squatting on hashtags, or simply connecting a Facebook account
  3. The score should be monotonic – if another member has higher stats than me in ALL categories, then he/she should have a higher score
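The monotonicity property in rule 3 can be checked mechanically. Here is a minimal sketch that flags every pair where one account beats another in ALL stat categories yet does not have a higher score. The stat names, numbers, and scores are made up for illustration; they are not real Klout data, and the check assumes every account reports the same categories.

```python
def dominates(stats_a, stats_b):
    """True if a is strictly higher than b in every category."""
    return all(stats_a[k] > stats_b[k] for k in stats_b)


def monotonicity_violations(users):
    """Return (dominant, dominated) pairs where the dominant score is not higher.

    users: dict mapping name -> (stats dict, score).
    """
    violations = []
    for name_a, (stats_a, score_a) in users.items():
        for name_b, (stats_b, score_b) in users.items():
            if name_a == name_b:
                continue
            if dominates(stats_a, stats_b) and score_a <= score_b:
                violations.append((name_a, name_b))
    return violations


# Hypothetical accounts: (stats, score).
users = {
    "user_a": ({"followers": 500, "retweets": 40, "mentions": 25}, 45),
    "user_b": ({"followers": 5000, "retweets": 400, "mentions": 300}, 45),
    "user_c": ({"followers": 100, "retweets": 60, "mentions": 10}, 30),
}

violations = monotonicity_violations(users)
print(violations)
```

Here user_b beats user_a in every category but has the same score, so that pair is flagged. user_a and user_c are not comparable (each wins at least one category), so rule 3 says nothing about them.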

To demonstrate the issues Klout has with these principles, we provide 4 groups of Klout score comparisons:

  • a set of individuals with Klout in the 40-49 range
  • a set of individuals with Klout in the 55-64 range
  • a set of individuals with Klout in the 70-79 range
  • a set of individuals with Klout >= 80

The four groups were chosen to span the Klout range and contain bloggers, executives, tech pundits, and celebrities of varying levels of activity in social media, notoriety, influence, importance, etc.

Group 1 (Klout 40-49)

Alex Braunstein (me), @alexbraunstein – Statistician, Research Scientist at Chomp, X-Googler
Ben Keighran, @benkeighran – the CEO of Chomp
Binh Tran, @binhtran – the co-founder and CTO of Klout
Chomp, @chomp – app search engine
Vic Gundotra, @vicgundotra – SVP and head of social at Google
Carla Borsoi, @u_m – VP of Consumer Insights at AOL

Let’s consider a few pairwise comparisons. First, Ben’s stats dominate mine excepting likes per post and comments per post; however, his Klout score is 7 points lower than mine. Next, Binh’s stats completely dominate my own in EVERY category, often by very large factors, yet we have identical Klout scores. Carla’s stats also completely dominate mine, but her score is lower. Finally, consider Chomp and Vic Gundotra. Vic’s stats blow Chomp out of the water, yet his Klout score is lower. In the “real world” sense of the word clout, Vic should dominate this group. The group 1 comparisons demonstrate the Klout score violating rules 1 and 3 from above.

Group 2 (Klout 55-64)

Paul Graham, @paulg – the fearless leader of Y Combinator.
Y Combinator, @ycombinator – startup incubator
500 Startups, @500startups – startup incubator
Adria Richards, @adriarichards – a tech consultant and popular blogger (also my roommate)
Stefanie Michaels, @adventuregirl – go to person for everything travel

In group 2, my roommate has a higher Klout score than Paul Graham? Really? By 5 points? Paul has 6x more followers, 2x total RTs, and 4x as many unique RTs, but he hasn’t linked his FB account. Adria has incredibly low FB stats (she uses it sparingly), but apparently that still gives her a tremendous boost. Adding a FB account is far too easy a way to game your score higher. I understand that Klout wants to incentivize the attachment of FB accounts and keep growing virally, but this aspect of the Klout score seems broken. Additionally, the pairwise comparison of Y Combinator and Paul is confusing. Paul’s stats are much higher, but they are assigned the same score. One could argue that different, perhaps more Klout-tastic, people are following Y Combinator; however, I find that unlikely given that Paul is in charge of it. Finally, it’s wrong that Adventure Girl’s Klout is so low. She has been named one of the top 100 people on Twitter, has been featured in Time magazine, etc., but her Klout is only two points higher than Adria’s.

Group 3 (Klout 70-79)

Tim Ferriss, @tferriss – author of the 4 Hour Workweek and 4 Hour Body
Jack Dorsey, @jack – Executive Chairman of Twitter and CEO of Square
Matt Cutts, @mattcutts – head of web spam team at Google
MG Siegler, @parislemon – my favorite writer for Techcrunch
Klout, @klout – the service I’m trashing in this post
David Pogue, @pogue – tech guy from the NYT
Jeffrey Zeldman, @zeldman – designer, writer, and publisher

Things get very confusing in this group. Jack Dorsey’s stats dominate those of David Pogue, but his score is 4 points lower. Matt Cutts has 4000 more total RTs but 1.5M fewer followers relative to Jack Dorsey, so his Klout score is 1 point higher? I’ll go out on a limb and state that 4000 incremental RTs seem FAR less valuable than 1.5M incremental followers. Klout, the company, has fewer followers, total RTs, and unique RTers by a factor of at least 6, but 7K more unique mentioners, so Klout’s Klout score is 4 points higher than Jack’s? But if unique mentions are so valuable, how can Jack Dorsey have a lower score than Matt Cutts when he has 16K additional unique mentioners? This is just the start of the inconsistencies.

Without FB, MG Siegler’s score would likely be 10 points lower. Jeffrey Zeldman’s blog is super high quality, but does he deserve to have more Klout than David Pogue? Again, Facebook puts him over the top. I think that Klout’s score is far too high, though perhaps it’s not surprising that Klout does well on its own metric. Finally, I included Tim Ferriss not just because I’m a huge fan, but because his stats provide an interesting counterpoint for even more pairwise comparisons. Those comparisons will lead you to several more contradictions concerning the relative value of followers, RTs, unique RTers, and unique mentioners.

Group 4 (Klout >= 80)

Robert Scoble, @scobleizer – blogger, tech evangelist, and author
Perez Hilton, @perezhilton – master of celebrity gossip
Charlie Sheen, @charliesheen – #winning
Guy Kawasaki, @guykawasaki – entrepreneur and former Chief Evangelist at Apple
Justin Bieber, @justinbieber – never saying never

The pairings of Scoble/Hilton and Sheen/Kawasaki again demonstrate the severe miscalibration regarding Facebook scores. Also, I’m not sure I trust any system which has Justin Bieber as most influential.

In conclusion, there are some serious inconsistencies with Klout that render it nearly meaningless in some circumstances. It often does not correctly order individuals in terms of how influential they are, is easy to game higher simply by adding a Facebook account, and does not respect some very basic monotonicity rules. Put simply, it acts like a derived measurement. From this analysis, I have gleaned the following rough rules of thumb for understanding your Klout score:

  • Connecting an additional account (e.g., Facebook) will ALWAYS increase your Klout.
  • The degree to which your followers are influential seems to be irrelevant or matter very little
  • The differential between the number of people you follow and the number of people who follow you seems to be irrelevant or matter very little.
  • In terms of value to your Klout score: follow < RT < unique RT < unique mention but this can be inconsistent
  • In terms of value to your Klout score: like < comment but this can be inconsistent

To be fair, Klout does not want their score to be completely transparent. Then it would be easy to rip off and even easier to game. That being said, it should be possible to respect the three conditions I enumerated and still keep a lid on their secret sauce. As I have time, I’m going to mess around with the Klout API a bit and gather more comprehensive data to further demonstrate the points made in this post, including a similar study concerning the Klout of companies/brands. Additionally, I will submit several questions regarding my analysis to the Klout chat hosted by Joe Fernandez (the CEO of Klout), and hope the company follows up. I’ll post any details/answers I receive here.

More info about Klout can be found in Techcrunch articles about their initial launch, series A funding, series B funding, addition of Facebook to their ranking, and their crunchbase profile.

As any good statistician should, I need to qualify my analysis. There is of course selection bias in the examples enumerated above. Although most cases are not as egregious, head-scratching scores like these are the rule, not the exception. All data was pulled on 5/29/11 and may not reflect current scores. Finally, please remember that this is my personal blog and reflects my opinion alone. In particular, it does not reflect the opinions of any employer past, present, or future.