Pymongo: distinct items and an example map reduce on subset of db

Posted: June 7th, 2011 | Filed under: mongodb, pymongo, Python

I’ve been playing around with Pymongo for a few weeks now, and I’m slowly discovering quirks and differences in syntax relative to the mongodb shell. The two I’ll cover in this post are:

  1. Using distinct in pymongo
  2. Map reduce example in pymongo on a queried subset of the db

Using distinct in pymongo

Let’s say you want the distinct user ids seen on a given day (I’ll use May 21 as an example). From the mongodb shell the command is simply:

db.raw_data.distinct("id", {"_date": "2011-05-21"})

Running the same command from within a Python file yields the following error:

File "test.py", line 13, in
foo = db.raw_data.distinct("id", {"_date": "2011-05-21"})
TypeError: distinct() takes exactly 2 arguments (3 given)

It turns out Pymongo makes you run the find first and then call distinct on the resulting cursor:

foo = db.raw_data.find({"_date": "2011-05-21"}).distinct("id")

This does the same thing as the mongodb shell command; it’s just annoying that the syntax is different.
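To make it concrete what the find-then-distinct call returns, here is a rough sketch in plain Python: filter the documents by the query, then collect the unique values of one field. The helper name distinct_after_find and the sample documents are made up for illustration. It is not how pymongo or the server implements distinct, and it only handles exact-match queries.

```python
def distinct_after_find(docs, query, key):
    """Plain-Python stand-in for collection.find(query).distinct(key).

    Hypothetical helper, not part of pymongo; exact-match queries only.
    """
    seen = []
    for doc in docs:
        # keep the document only if every query field matches exactly
        if all(doc.get(field) == value for field, value in query.items()):
            # record the value of `key` the first time we see it
            if key in doc and doc[key] not in seen:
                seen.append(doc[key])
    return seen


# made-up sample data standing in for db.raw_data
raw_data = [
    {"id": "a", "_date": "2011-05-21"},
    {"id": "b", "_date": "2011-05-21"},
    {"id": "a", "_date": "2011-05-21"},
    {"id": "c", "_date": "2011-05-22"},
]

foo = distinct_after_find(raw_data, {"_date": "2011-05-21"}, "id")
print(foo)  # ['a', 'b']
```

The duplicate "a" is collapsed and "c" is dropped because it falls on a different day, which is the behaviour you get from the pymongo one-liner above.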

Map reduce example in pymongo on a queried subset of the db

Following the example from my previous post, you simply add a query = {...} keyword argument for the filter and an out = keyword argument naming the collection in which you want your results to end up. Most of the examples I found on Stack Overflow or personal blogs were wrong or tried to pass these parameters in together. I tried roughly 865868 variations and what I have below is the only combination that worked.

#!/usr/bin/env python
from bson.code import Code # for some this needs to be pymongo.bson
from pymongo import Connection

# code for example map/reduce
db = Connection().map_reduce_example
map = Code("function () {"
           "  emit(this.id, 1)"
           "}")
reduce = Code("function (key, values) {"
              "  var total = 0;"
              "  for (var i = 0; i < values.length; i++) {"
              "    total += values[i];"
              "  }"
              "  return total;"
              "}")

# code without query
result = db.things.map_reduce(map, reduce, "map_reduce_example")

# code with simple query
result = db.raw_data.map_reduce(map, reduce, out = "map_reduce_example", query = {"date": "2011-05-21"})

# code with query that grabs all records from May 2011
result = db.raw_data.map_reduce(map, reduce, out = "map_reduce_example2", query = {"date": {"$regex": "^2011-05"}})
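If you want to sanity-check what the map and reduce functions above should produce for a given query, the same logic can be sketched in plain Python without a database. Everything here (the matches and map_reduce helpers, the sample documents) is made up for illustration. The matcher only understands exact matches and {"$regex": ...}, and unlike this sketch the server may skip calling reduce for keys that only have a single emitted value.

```python
import re
from collections import defaultdict


def matches(doc, query):
    """Tiny query matcher: exact match or {"$regex": pattern} per field."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict) and "$regex" in cond:
            if not isinstance(value, str) or re.search(cond["$regex"], value) is None:
                return False
        elif value != cond:
            return False
    return True


def map_reduce(docs, map_fn, reduce_fn, query=None):
    """Run map_fn on matching docs, group emitted values by key, reduce each group."""
    emitted = defaultdict(list)
    for doc in docs:
        if matches(doc, query or {}):
            for key, value in map_fn(doc):
                emitted[key].append(value)
    return {key: reduce_fn(key, values) for key, values in emitted.items()}


def map_fn(doc):
    # same idea as emit(this.id, 1)
    yield doc["id"], 1


def reduce_fn(key, values):
    # same idea as summing values[i] in the JavaScript reduce
    return sum(values)


# made-up sample data standing in for db.raw_data
raw_data = [
    {"id": "a", "date": "2011-05-21"},
    {"id": "a", "date": "2011-05-21"},
    {"id": "b", "date": "2011-05-30"},
    {"id": "c", "date": "2011-06-01"},
]

one_day = map_reduce(raw_data, map_fn, reduce_fn, query={"date": "2011-05-21"})
all_may = map_reduce(raw_data, map_fn, reduce_fn, query={"date": {"$regex": "^2011-05"}})
print(one_day)  # {'a': 2}
print(all_may)  # {'a': 2, 'b': 1}
```

The two calls mirror the "simple query" and "all records from May 2011" cases: the query only decides which documents reach the map step, and the counting is otherwise unchanged.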

This may seem like a silly and simple blog post to write, but this wasn’t documented anywhere else online, so I wanted to save 5-10 minutes for anyone else who hits the distinct error or is trying to run a map reduce with a query.


6 Comments on “Pymongo: distinct items and an example map reduce on subset of db”

  1. Sal said at 3:14 pm on July 18th, 2011:

    Nope, not silly. Thank you! I have failed to find any pymongo examples of result code beyond the simplest (map,reduce) stuff.

    Also, the comments in collection.py for map_reduce are helpful:
    https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/collection.py

  2. Shadab said at 7:04 am on January 22nd, 2014:

    Indeed helped me save my head tearing time.
    Thanks :-)

  3. Eric Brown said at 7:43 am on April 21st, 2014:

    Silly? Nope. Here it is 2014 and there is very little Python Map/Reduce stuff on the interwebz. Thanks for sharing.

  4. Kam said at 4:47 am on July 23rd, 2014:

    Like other posters said, not silly. I was trying to figure out why distinct() kept throwing that error. Thanks to your post I figured it out quicker, before I rage quit.

  5. Dan Marsh said at 11:17 am on August 4th, 2014:

    Wow! What a mega-fail on the part of pymongo to not have that documented anywhere. Unbelievable.

  6. kikuso said at 11:12 am on February 8th, 2015:

    Definitively not silly at all.

    Already 2015 and this is still one of very few useful documents about map reduce // pymongo.

    THANKS A LOT

