Event Logging: Analyzing Funnels, Retention & Viral Spread

Notes on the "Event Logging: Analyzing Funnels, Retention & Viral Spread" presentation at MongoSF:
  • Major questions:
    • Who does what, and how? = Funnels
    • How valuable are groups of users? = Virality
    • Are our changes working? = Retention / Funnel Conversion
  • Dream: A general framework for creating, deploying and analyzing A/B tests in terms of Funnel, Virality and Retention
  • Hadoop jobs are time-intensive to write and buggy, and Hadoop only works really well for matrix inversions.
  • Counting how many events occur per bucket:
    • SQL: select event_name, bucket, count(*) from events group by event_name, bucket;
    • Mongo, small data sets: collection.group
    • Mongo, larger data sets: mapReduce
    • mapReduce lets you do arbitrary data processing in both the map and the reduce steps
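The map and reduce steps for this count can be sketched in plain Python to show the logic without a MongoDB server. This is a minimal in-memory sketch, not Mongo's mapReduce API; the event fields `event_name` and `bucket` mirror the SQL columns above, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: each event is a dict with "event_name" and "bucket".
Event = Dict[str, str]
Key = Tuple[str, str]


def map_event(event: Event) -> Iterable[Tuple[Key, int]]:
    # Emit one (key, 1) pair per event, keyed like the SQL GROUP BY.
    yield (event["event_name"], event["bucket"]), 1


def reduce_counts(key: Key, values: List[int]) -> int:
    # Sum the partial counts for one key; summing is associative,
    # so partial results could be re-reduced safely.
    return sum(values)


def count_events_per_bucket(events: Iterable[Event]) -> Dict[Key, int]:
    # Shuffle step: group mapped values by key, then reduce each group.
    grouped: Dict[Key, List[int]] = defaultdict(list)
    for event in events:
        for key, value in map_event(event):
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in grouped.items()}


events = [
    {"event_name": "signup", "bucket": "A"},
    {"event_name": "signup", "bucket": "A"},
    {"event_name": "signup", "bucket": "B"},
    {"event_name": "purchase", "bucket": "A"},
]
print(count_events_per_bucket(events))
# {('signup', 'A'): 2, ('signup', 'B'): 1, ('purchase', 'A'): 1}
```

Keeping the reduce a plain sum is what makes it safe to run incrementally over new batches of events.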
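  • Mongo can do data processing in real time by incrementing counters with an upsert as each event arrives: event_db.event_counts.update(query, {"$inc": {count_key: 1}}, upsert=True, multi=True), where query selects the counter document(s) to increment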
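The upsert-and-increment pattern can be modeled in memory as follows. This is a hypothetical sketch: it assumes one counter document per day (the `doc_id` values are made up) holding one field per `count_key`, and the `event_name:bucket` key scheme is an illustration, not something from the talk. Against a real server, current PyMongo expresses the same operation as `update_one(query, {"$inc": {count_key: 1}}, upsert=True)`, or `update_many(...)` for the multi-document case.

```python
from typing import Dict

# In-memory stand-in for the event_counts collection:
# maps a (hypothetical) per-day document id to that document's counter fields.
event_counts: Dict[str, Dict[str, int]] = {}


def make_count_key(event_name: str, bucket: str) -> str:
    # Hypothetical key scheme combining event name and A/B bucket.
    return f"{event_name}:{bucket}"


def record_event(doc_id: str, event_name: str, bucket: str) -> None:
    # Mimics update({"_id": doc_id}, {"$inc": {count_key: 1}}, upsert=True):
    # create the document if it is missing, then create-or-increment the field.
    doc = event_counts.setdefault(doc_id, {})
    count_key = make_count_key(event_name, bucket)
    doc[count_key] = doc.get(count_key, 0) + 1


record_event("day-1", "signup", "A")
record_event("day-1", "signup", "A")
record_event("day-1", "purchase", "B")
print(event_counts)
# {'day-1': {'signup:A': 2, 'purchase:B': 1}}
```

Because the counters are updated as events arrive, reading the current totals is a single document lookup rather than a batch job.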
  • Map/Reduce is pretty fast in Mongo
  • When you want to count unique users, break the map/reduce work into two passes:
    • First, grab all the user objects and pull their events through, aggregating by user: "this user in this bucket hit these steps"
    • Then, sum the steps across users
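The two passes above can be sketched in Python, assuming (hypothetically) that each event carries `user_id`, `bucket` and `step` fields. The first pass builds one set of steps per user and bucket, so repeated hits collapse; the second pass counts each user at most once per step. Plain dictionaries stand in for the map/reduce output collections here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Event = Dict[str, str]


def steps_by_user(events: Iterable[Event]) -> Dict[Tuple[str, str], Set[str]]:
    # Pass 1: aggregate by user -> "this user in this bucket hit these steps".
    # A set means repeated hits of the same step by one user collapse.
    hits: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for event in events:
        hits[(event["user_id"], event["bucket"])].add(event["step"])
    return hits


def unique_users_per_step(
    hits: Dict[Tuple[str, str], Set[str]]
) -> Dict[Tuple[str, str], int]:
    # Pass 2: sum the steps, counting each user once per (bucket, step).
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for (_user, bucket), steps in hits.items():
        for step in steps:
            totals[(bucket, step)] += 1
    return dict(totals)


events = [
    {"user_id": "u1", "bucket": "A", "step": "view"},
    {"user_id": "u1", "bucket": "A", "step": "view"},  # repeat, counted once
    {"user_id": "u1", "bucket": "A", "step": "signup"},
    {"user_id": "u2", "bucket": "A", "step": "view"},
    {"user_id": "u3", "bucket": "B", "step": "view"},
]
print(sorted(unique_users_per_step(steps_by_user(events)).items()))
# [(('A', 'signup'), 1), (('A', 'view'), 2), (('B', 'view'), 1)]
```

Since each user contributes to a step at most once, the totals are unique-user counts, which is what funnel conversion needs: in this data, half of bucket A's viewers signed up.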
  • As soon as we realized how fast Mongo's map/reduce was, we started running periodic map/reduce jobs.
  • Big deletes slow Mongo down. If you have a data set that can grow very large and that you need to delete from, use a capped collection; capped collections are great for log file analysis.
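A capped collection is a fixed-size collection that keeps documents in insertion order and overwrites the oldest ones once it is full, so old log entries age out without explicit deletes. In PyMongo one is created with `db.create_collection("events", capped=True, size=...)`, where `size` is in bytes. The sketch below does not touch MongoDB; it only models that eviction behavior with a bounded `deque`, and the `CappedLog` class is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List


class CappedLog:
    """Fixed-size, insertion-ordered log: once full, each insert evicts the oldest entry."""

    def __init__(self, max_docs: int) -> None:
        # Bounded by document count here; a real capped collection is bounded by bytes.
        self._docs: Deque[Dict[str, str]] = deque(maxlen=max_docs)

    def insert(self, doc: Dict[str, str]) -> None:
        # No explicit delete needed: deque(maxlen=...) drops the oldest item.
        self._docs.append(doc)

    def find_all(self) -> List[Dict[str, str]]:
        # Natural (insertion) order, oldest first.
        return list(self._docs)


log = CappedLog(max_docs=3)
for i in range(5):
    log.insert({"event": f"e{i}"})
print([d["event"] for d in log.find_all()])
# ['e2', 'e3', 'e4']
```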
  • Replaced a memcached server with MongoDB