mongorestore error: don’t know what to do with…

Posted: August 19th, 2011 | Filed under: mongodb

I like to develop for mongodb on my local machine to make sure everything is fast and won’t nuke one of our production dbs. To do this I often use the mongodump command to grab a collection and then load it locally so I can work with a snapshot of production data. Here is a problem I ran into recently that I thought would be worth blogging about. First I dumped the collection rob_job_metric:

mongodump --db data --collection rob_job_metric --out - > rob_job_metric.mongodump

After moving it to my local machine, I tried mongorestore:

alex@Alexander-Braunsteins-iMac-2:scripts> ~/Downloads/mongodb-osx-x86_64-1.8.1/bin/mongorestore ~/Desktop/scripts/rob_job_metric.mongodump
connected to: ###########
don’t know what to do with [/Users/alex/Desktop/scripts/rob_job_metric.mongodump]

Really, mongodb? I’m pretty sure you know what to do with this file, since you just created it. After messing around a bit, I discovered that mongorestore requires the file name to end in .bson, even though the file was already in BSON format, just not named that way:

alex@Alexander-Braunsteins-iMac-2:scripts> mv rob_job_metric.mongodump rob_job_metric.mongodump.bson
alex@Alexander-Braunsteins-iMac-2:scripts> ~/Downloads/mongodb-osx-x86_64-1.8.1/bin/mongorestore ~/Desktop/scripts/rob_job_metric.mongodump.bson
connected to: ###########
Wed Aug 17 10:29:29 /Users/alex/Desktop/scripts/rob_job_metric.mongodump.bson
Wed Aug 17 10:29:29 going into namespace [scripts.rob_job_metric.mongodump]
22752893/23620007 96%
Wed Aug 17 10:29:32 113872 objects found
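The rename works because a single-collection mongodump is just BSON documents written back to back, so the extension says nothing about the contents. As a minimal sketch (the helper name count_bson_documents is hypothetical, and it only walks each document's length prefix rather than fully parsing it), you can confirm a dump is BSON no matter what it is called:

```python
import struct
from typing import BinaryIO


def count_bson_documents(f: BinaryIO) -> int:
    """Count documents in a stream of concatenated BSON documents.

    Every BSON document starts with a little-endian int32 giving its total
    size in bytes (including those 4 bytes) and ends with a 0x00 byte.
    """
    count = 0
    while True:
        header = f.read(4)
        if not header:
            return count  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<i", header)
        if size < 5:  # smallest legal document is the empty one: 5 bytes
            raise ValueError(f"invalid document size {size}")
        body = f.read(size - 4)
        # The rest of the document must be present and end in the 0x00 terminator
        if len(body) != size - 4 or body[-1:] != b"\x00":
            raise ValueError("truncated or malformed document")
        count += 1
```

For example, opening rob_job_metric.mongodump in "rb" mode and passing the file object in should, assuming the dump is complete, give the same number as the "objects found" line mongorestore prints.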


Pymongo: distinct items and an example map reduce on subset of db

Posted: June 7th, 2011 | Filed under: mongodb, pymongo, Python

I’ve been playing around with Pymongo for a few weeks now, and I’m slowly discovering quirks and differences in syntax relative to the mongodb shell. The two I’ll cover in this post are:

  1. Using distinct in pymongo
  2. map reduce example in pymongo on a queried subset of the db

Using distinct in pymongo

Let’s say you want the distinct values of some user id among the records from a single day (I’ll use May 21 as an example). From the mongodb shell this command is simply:

db.raw_data.distinct("id", {"_date": "2011-05-21"})

Running the same command from within a Python file yields the following error:

File "test.py", line 13, in <module>
foo = db.raw_data.distinct("id", {"_date": "2011-05-21"})
TypeError: distinct() takes exactly 2 arguments (3 given)

It turns out Pymongo makes you do the find() first and then call distinct() on the resulting cursor:

foo = db.raw_data.find({"_date": "2011-05-21"}).distinct("id")

The result is the same as the shell command; it’s just annoying that the syntax is different.
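To make the semantics concrete, here is a rough pure-Python analogue of find(query).distinct(key) over an in-memory list of dicts. The helper distinct_in_memory is hypothetical: it handles only exact-match queries on top-level fields and ignores MongoDB’s special treatment of array values. The trailing comments show the real PyMongo calls; the two-argument Collection.distinct form is something newer PyMongo releases accept, as far as I know, so check it against your installed version.

```python
from typing import Any, Dict, Iterable, List


def distinct_in_memory(docs: Iterable[Dict[str, Any]], key: str,
                       query: Dict[str, Any]) -> List[Any]:
    """Rough in-memory analogue of find(query).distinct(key).

    Hypothetical helper: exact-match queries on top-level fields only.
    """
    values: List[Any] = []
    for doc in docs:
        # Keep the document only if every query field matches exactly
        if all(field in doc and doc[field] == wanted
               for field, wanted in query.items()):
            # Collect each value of `key` once, in first-seen order
            if key in doc and doc[key] not in values:
                values.append(doc[key])
    return values


# Against a live server, the PyMongo calls would be:
#   db.raw_data.find({"_date": "2011-05-21"}).distinct("id")  # form used in this post
#   db.raw_data.distinct("id", {"_date": "2011-05-21"})       # newer PyMongo (assumption)
```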

map reduce example in pymongo on a queried subset of the db

Following the example from my previous post, you simply add a query={...} keyword argument to select the records and an out="..." argument naming the collection in which you want your results to end up. Most of the examples I found on Stack Overflow or personal blogs were wrong or tried to pass these parameters in together. I tried roughly 865868 variations, and what I have below is the only combination that worked.

#!/usr/bin/env python
from bson.code import Code  # for some this needs to be pymongo.bson
from pymongo import Connection  # 2011-era API: MongoClient replaced Connection in PyMongo 3.0, and map_reduce was removed in 4.0
# code for example map/reduce
db = Connection().map_reduce_example
map = Code("function () {"
           "  emit(this.id, 1);"
           "}")
reduce = Code("function (key, values) {"
              "  var total = 0;"
              "  for (var i = 0; i < values.length; i++) {"
              "    total += values[i];"
              "  }"
              "  return total;"
              "}")

# code without query
result = db.things.map_reduce(map, reduce, "map_reduce_example")

# code with simple query
result = db.raw_data.map_reduce(map, reduce, out="map_reduce_example", query={"date": "2011-05-21"})

# code with query that grabs all records from May 2011
result = db.raw_data.map_reduce(map, reduce, out="map_reduce_example2", query={"date": {"$regex": "^2011-05"}})
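The map and reduce functions above count records per id. Here is a minimal in-memory Python sketch of the same computation, so you can see what MongoDB does with them. The names mapper, reducer, and map_reduce_in_memory are hypothetical, and re.search on a "date" string is only a rough stand-in for the $regex query: map emits (id, 1) for each matching record, the emitted values are grouped by key, and reduce sums each group.

```python
import re
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple


def mapper(doc: Dict[str, Any]) -> Iterator[Tuple[Any, int]]:
    # Same idea as the JavaScript map function: emit(this.id, 1)
    yield doc.get("id"), 1


def reducer(key: Any, values: List[int]) -> int:
    # Same idea as the JavaScript reduce function: sum the emitted values.
    # Summing is associative, so re-reducing partial totals gives the same answer.
    return sum(values)


def map_reduce_in_memory(docs: Iterable[Dict[str, Any]], date_regex: str) -> Dict[Any, int]:
    """Count records per id among docs whose "date" matches date_regex."""
    pattern = re.compile(date_regex)
    grouped: Dict[Any, List[int]] = defaultdict(list)
    for doc in docs:
        date = doc.get("date")
        # Rough stand-in for query={"date": {"$regex": date_regex}}
        if isinstance(date, str) and pattern.search(date):
            for key, value in mapper(doc):
                grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}
```

MongoDB stores each result in the out collection as a document of the form {"_id": <id>, "value": <count>}. Because it may call reduce several times on partial results for the same key, reduce has to give the same answer when fed its own output; a plain sum does.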

This may seem like a silly and simple blog post to write, but this wasn’t documented anywhere else online, so I wanted to save anyone else hitting the distinct error, or trying to run a map reduce with a query, 5-10 minutes.


More fun with mongodb, map/reduce, and sorting records by value using pymongo

Posted: May 11th, 2011 | Filed under: mongodb, pymongo

A couple of weeks ago I promised a fun application in mongodb. That time has arrived. Suppose you have a collection of records and you want to group by some id (canonicalized url, user id, checkin venue) and see how many user actions (perhaps clicks, status updates, or checkins for the three examples listed) are associated with each. Those familiar with mongodb would ask, “Why not do this with the group() function?” It’s limited to 20,000 unique keys, and many applications have more than this, so map/reduce is the way to go. Below I provide code to do this map reduce and then sort by value. From the shell, sorting is done as:

result.find().sort({value: -1})

but if you run this from within Python using the pymongo driver, you will receive the error:

TypeError: if no direction is specified, key_or_list must be an instance of list

If this occurs, pass the key and the direction as separate arguments instead:

result.find().sort(u'value', -1)

Some more hardcore/sophisticated mongodb examples/applications will be coming soon!

#!/usr/bin/env python

from bson.code import Code
from pymongo import Connection  # 2011-era API: PyMongo 3.0+ uses MongoClient instead

# code for example map/reduce
db = Connection().map_reduce_example
map = Code("function () {"
           "  emit(this.id, 1);"
           "}")
reduce = Code("function (key, values) {"
              "  var total = 0;"
              "  for (var i = 0; i < values.length; i++) {"
              "    total += values[i];"
              "  }"
              "  return total;"
              "}")
result = db.things.map_reduce(map, reduce, "myresults")

# sort by the reduced count, largest first (key and direction as separate arguments)
for doc in result.find().sort(u'value', -1):
    print(doc)
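PyMongo’s Cursor.sort accepts either a single key plus a direction, as above, or a list of (key, direction) pairs such as sort([(u'value', -1)]). The list form is what you need to sort on more than one field. The sketch below is a hypothetical in-memory helper, sort_docs, that applies a spec of that list shape to plain dicts so you can see how a multi-key sort behaves. It assumes every document has every sort field and that the values of each field are mutually comparable.

```python
from typing import Any, Dict, List, Sequence, Tuple


def sort_docs(docs: List[Dict[str, Any]],
              spec: Sequence[Tuple[str, int]]) -> List[Dict[str, Any]]:
    """Order docs according to a sort spec like [("value", -1), ("_id", 1)].

    Python's sort is stable (also with reverse=True), so sorting by the last
    key first and the first key last produces the combined ordering.
    """
    result = list(docs)
    for field, direction in reversed(spec):
        # direction -1 means descending, 1 means ascending
        result.sort(key=lambda doc: doc[field], reverse=(direction == -1))
    return result
```

For example, sort_docs(docs, [("value", -1), ("_id", 1)]) puts the biggest counts first and breaks ties by ascending id.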


Playing around with mongodb

Posted: March 25th, 2011 | Filed under: mongodb

In my first post on this blog, I set a goal of filling the holes in my tech education. Goals 4 and 5 were to play around with MapReduce/Hadoop for large-scale data processing and to gain MySQL knowledge. I decided to squish them both into one and try out mongodb, which is deployed at companies like Shutterfly, Foursquare, Intuit, and others.

So let’s start from scratch. Download the correct distribution for your platform and untar it (your version number may be different):

curl http://downloads.mongodb.org/osx/mongodb-osx-x86_64-1.8.0.tgz > mongo.tgz
tar xzf mongo.tgz

Now make a directory for the data and start the mongod server. It keeps running in this terminal, so also open a second terminal window in the same working directory:

mkdir -p /data/db
./mongodb-osx-x86_64-1.8.0/bin/mongod

If you see "Error: couldn’t connect to server 127.0.0.1 shell/mongo.js:79", it’s because you ran ./mongodb-osx-x86_64-1.8.0/bin/mongo, the shell client that connects to an already-running server, rather than ./mongodb-osx-x86_64-1.8.0/bin/mongod, which starts the server itself.
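One quick way to tell whether the server is actually up is to check whether anything accepts TCP connections on mongod’s default port, 27017. This is a minimal standard-library sketch; the helper name port_is_listening is hypothetical, and a successful connection only shows that something is listening on that port, not that it is mongod.

```python
import socket


def port_is_listening(host: str = "127.0.0.1", port: int = 27017,
                      timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable
        return False
```

If port_is_listening() returns False, mongod isn’t running (or is on another port), and the shell will fail with the error above.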

In your other terminal, start the shell:

./mongodb-osx-x86_64-1.8.0/bin/mongo

Then you are good to start creating entries and querying:

db.foo.save( { a : 1 } )
db.foo.find()
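From Python, the same two commands look roughly like this, assuming PyMongo 3.0 or later (where insert_one replaced the older insert/save calls). The function name save_and_find is hypothetical; it takes the collection as a parameter so you can pass MongoClient("localhost", 27017).test.foo, since the mongo shell uses the test database by default.

```python
from typing import Any, Dict, List


def save_and_find(collection: Any) -> List[Dict[str, Any]]:
    """Python counterpart of db.foo.save({a: 1}) followed by db.foo.find().

    `collection` is expected to be a PyMongo Collection (PyMongo 3.0+).
    """
    collection.insert_one({"a": 1})   # insert a new document, like save() does for one without an _id
    return list(collection.find())    # fetch every document in the collection
```

With a server running, importing MongoClient from pymongo and printing save_and_find(MongoClient("localhost", 27017).test.foo) should show the inserted document along with the _id MongoDB generated for it.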

That’s it. Also, here is a handy MySQL to mongodb conversion chart. I’m going to work on a fun and quasi-advanced stat function, but I’ll write that up in a separate post.