Event Logging: Analyzing Funnels, Retention & Viral Spread
Notes on the "Event Logging: Analyzing Funnels, Retention & Viral Spread" presentation at MongoSF:
- Major questions:
  - Who does what, and how? = Funnels
  - How valuable are groups of users? = Virality
  - Are our changes working? = Retention / Funnel Conversion
- Dream: a general framework for creating, deploying and analyzing A/B tests in terms of funnel, virality and retention
- Hadoop is time-intensive to write, buggy, and only works really well for matrix inversions
- Count how many events per bucket?
  - select event_name, bucket, count(*) from events group by event_name, bucket;
  - Mongo: for small data sets, collection.group
  - Mongo: for larger data sets, mapReduce
- Mongo can do data processing in real time: event_db.event_counts.update({ "$inc": {count_key: 1}}, upsert=True, multi=True)
- Map/Reduce is pretty fast in Mongo
  - You can do any amount of data processing in the map and in the reduce
- When you want to do unique counting, you break up your map reduce tasks:
  - Grab all the user objects and pull through the events (aggregate by user): "this user in this bucket hit these steps"
  - Sum the steps
- As soon as we realized that Mongo Map Reduce was really fast, we ended up running periodic map reduce jobs
- Big deletes slow things down in Mongo. If you have a data set that can grow very large and that you need to delete things from, use a capped collection; those are great for log file analysis
- Replaced a memcached server with MongoDB
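
The two counting approaches in these notes (plain event counts per bucket, and unique per-user funnel counts) can be illustrated without a database. The following is only a rough in-memory sketch in plain Python, not the presenters' Mongo map/reduce code. The field names (user, bucket, event_name), the sample events and the step names are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical event log; field names and values are illustrative assumptions.
EVENTS = [
    {"user": "u1", "bucket": "A", "event_name": "visit"},
    {"user": "u1", "bucket": "A", "event_name": "visit"},   # repeat visit
    {"user": "u1", "bucket": "A", "event_name": "signup"},
    {"user": "u2", "bucket": "A", "event_name": "visit"},
    {"user": "u3", "bucket": "B", "event_name": "visit"},
    {"user": "u3", "bucket": "B", "event_name": "signup"},
    {"user": "u3", "bucket": "B", "event_name": "invite"},
]

# Hypothetical funnel steps.
STEPS = ("visit", "signup", "invite")


def count_events(events):
    """Raw counts, like:
    select event_name, bucket, count(*) from events group by event_name, bucket;
    """
    return Counter((e["event_name"], e["bucket"]) for e in events)


def funnel_by_bucket(events, steps):
    """Unique counts in two passes: aggregate by user, then sum the steps."""
    # Pass 1: "this user in this bucket hit these steps".
    steps_by_user = defaultdict(set)
    for e in events:
        if e["event_name"] in steps:
            steps_by_user[(e["user"], e["bucket"])].add(e["event_name"])

    # Pass 2: sum the steps, counting each user at most once per step.
    totals = defaultdict(Counter)
    for (_user, bucket), hit in steps_by_user.items():
        for step in hit:
            totals[bucket][step] += 1
    return {bucket: dict(counts) for bucket, counts in totals.items()}


if __name__ == "__main__":
    print(count_events(EVENTS))
    print(funnel_by_bucket(EVENTS, STEPS))
```

In this sample, the raw count for ("visit", "A") is 3 because u1 visited twice, while the funnel counts only 2 unique users at the visit step in bucket A. That difference is why unique counting needs the extra per-user pass.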