MongoDB Administration

Notes on the "Administration" presentation at Mongo SF

  • Mongo runs on almost every platform
  • Tools
  • mongod - database server (like mysqld)
  • mongo - shell (like mysql)
  • mongo uses stdout very heavily
    • Tries to make the logs really useful - verbosity level can be set 0-10
  • Use netstat to tell how many open connections have been made to the database
  • Try out the shell
  • mongod --fork = runs as a forked server
  • To back up the database, fsync + lock could be a better approach: http://www.mongodb.org/display/DOCS/fsync+Command
  • Please use replication
  • Delayed replication lets you keep a replica that runs hours behind
  • iostat -x 2 - to look at disk I/O
  • By default mongo listens on port 27017
  • webconsole tells you how long things are going to take
  • db.serverStatus() gives you all the mongostat information
  • monitoring for mongo: munin / ganglia / nagios / cacti
  • db.help() shows database commands
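
The netstat tip above can be scripted. The following is a minimal sketch, assuming Linux-style `netstat -tn` output (columns: Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and the default mongod port 27017. The function name and the sample output are made up for illustration; other platforms format netstat differently.

```python
def count_mongo_connections(netstat_output, port=27017):
    """Count ESTABLISHED TCP connections whose local address ends in :port."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue  # skips headers and non-TCP lines
        local_address, state = fields[3], fields[5]
        if local_address.rsplit(":", 1)[-1] == str(port) and state == "ESTABLISHED":
            count += 1
    return count


# Made-up sample output, for illustration only.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:27017          10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:27017          10.0.0.9:51240          ESTABLISHED
tcp        0      0 10.0.0.5:22             10.0.0.7:40022          ESTABLISHED
tcp        0      0 10.0.0.5:27017          10.0.0.8:49888          TIME_WAIT
"""

print(count_mongo_connections(SAMPLE))
```

In practice the text would come from running netstat on the database host; a count like this could feed one of the monitoring tools listed above.
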

Event Logging: Analyzing Funnels, Retention & Viral Spread

    Notes on the "Event Logging: Analyzing Funnels, Retention & Viral Spread" presentation at MongoSF:
    • Major questions:
    • Who does what, and how? = Funnels
    • How valuable are groups of users? = Virality
    • Are our changes working? = Retention / Funnel Conversion
  • Dream: A general framework for creating, deploying and analyzing A/B tests in terms of Funnel, Virality and Retention
  • Hadoop is time-intensive to write for, buggy, and only works really well for matrix inversions.
  • Count how many events per bucket?
    • select event_name, bucket, count(*) from events group by event_name, bucket;
    • Mongo: for small data sets, collection.group
    • Mongo: for larger data sets, mapReduce
    • you can do any amount of data processing in the map and in the reduce
  • Mongo can do data processing in real time: event_db.event_counts.update(spec, {"$inc": {count_key: 1}}, upsert=True, multi=True), where spec is the query that selects the counter document(s)
  • Map/Reduce is pretty fast in Mongo
  • When you want to do unique counting, you break up your map reduce tasks
    • Grab all the user objects, pull through the events (aggregate by user) - "this user in this bucket hit these steps"
    • Sum the steps
  • As soon as we realized that Mongo Map Reduce was really fast, we ended up running periodic map reduce jobs.
  • Big deletes slow things down in Mongo. If you have a data set that can grow very large and that you need to delete from, use a capped collection; those are great for log file analysis.
  • Replaced a memcached server with MongoDB
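
The per-bucket counting and the two-step unique counting described above can be sketched in memory. This is an illustration of the idea only, not MongoDB's mapReduce API: the `map_reduce` helper, the event documents, and their field names are all assumptions, and a real Mongo reduce function has extra constraints (it may be re-run on its own output) that this sketch ignores.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Tiny in-memory map/reduce: group emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return {key: reduce_fn(key, values) for key, values in groups.items()}


# Hypothetical event documents; the field names are assumptions.
events = [
    {"user": "a", "event_name": "signup", "bucket": "control"},
    {"user": "a", "event_name": "signup", "bucket": "control"},
    {"user": "b", "event_name": "signup", "bucket": "test"},
    {"user": "b", "event_name": "invite", "bucket": "test"},
    {"user": "c", "event_name": "signup", "bucket": "test"},
]

# Plain event counts per (event_name, bucket), like the SQL GROUP BY.
event_counts = map_reduce(
    events,
    lambda e: [((e["event_name"], e["bucket"]), 1)],
    lambda key, values: sum(values),
)

# Unique counting, step 1: aggregate by user to get the steps each user hit.
steps_by_user = map_reduce(
    events,
    lambda e: [((e["user"], e["bucket"]), e["event_name"])],
    lambda key, values: sorted(set(values)),
)

# Unique counting, step 2: sum the steps, counting each user once per step.
unique_counts = map_reduce(
    steps_by_user.items(),
    lambda item: [((step, item[0][1]), 1) for step in item[1]],
    lambda key, values: sum(values),
)

print(event_counts)
print(unique_counts)
```

User "a" fires signup twice, so the raw count and the unique count for (signup, control) differ; that difference is why the unique count needs the intermediate per-user pass.
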

Ruby Development and MongoMapper

    Notes on the "Ruby Development and MongoMapper" session at MongoSF:
    • Posterous uses mongomapper
    • Schema is versioned with the rest of the code
    • MongoMapper includes "Set" and "Date" types that aren't natively supported by Mongo
    • You can define your own types: self.to_mongo / self.from_mongo
  • MongoMapper can do typeless
  • Uses a fork of validatable to do validations
  • Callbacks (e.g. before_validation) are taken from ActiveSupport
  • Relationships work for both documents and embedded documents
  • MongoMapper is all plugin based, plugins are super easy to write
  • MongoMapper joint plugin -> gridfs handling
  • MongoMapper::Document.append_extensions(Module)
  • http://mongotips.com/
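
The to_mongo / from_mongo idea mentioned above is a pair of conversion hooks: one turns an application value into something the database can store, the other reverses it on load. Below is a rough Python analogy, not MongoMapper's Ruby API; the `TagSet` class and the sample document are hypothetical.

```python
class TagSet:
    """Hypothetical custom type: a set in application code, a list in the database."""

    @staticmethod
    def to_mongo(value):
        # There is no set type on the storage side, so store a sorted list instead.
        return sorted(set(value or []))

    @staticmethod
    def from_mongo(value):
        # Rebuild the set when a document is loaded.
        return set(value or [])


document = {"title": "MongoSF notes", "tags": TagSet.to_mongo({"ruby", "mongodb"})}
loaded_tags = TagSet.from_mongo(document["tags"])

print(document["tags"])
print(loaded_tags == {"ruby", "mongodb"})
```

Sorting in to_mongo is a design choice made here so the stored list is deterministic; nothing in the notes says MongoMapper does the same.
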

Real-Time Ecommerce Analytics at Gilt Group

    Notes from "Real-Time Ecommerce Analytics at Gilt Group" from MongoSF
    • Real-time analytics is a sweet spot for MongoDB
    • Gilt gets a ton of traffic at noon, that's when their sales start.
    • "None of our items page, because our users like to scan through"
    • "Things sell out, we have limited depth."
    • How do we improve the conversion of our gifts section?
    • Capture data in mongo
    • Analyze w/ Map Reduce
    • Update TXN systems
    • Repeat
    • Takes 30-35 minutes, mostly running map-reduce... we didn't want every visit to the page to be different... so we ran every 30 minutes
  • "We put Mongo on the server in December, haven't logged into the server since"
  • Pageview sends analytics info via AJAX
  • "Good failure mode" everything works except we don't get analytics
  • We just put into Mongo, if it's too slow, we'll add more nodes and run map reduce over the nodes
  • "Nothing production should hit a relational database at Gilt"
    • Key transaction systems run on Voldemort
  • Map-Reduce is waaay easier in Mongo than it is in Hadoop.  Running a Map-Reduce job in Mongo takes 1-2 seconds.
  • "Relational databases for transactional systems don't work."
  • Node.js mongo
    • var mongo = require("lib/mongodb")
  • All node and mongo is run on a single EC2 server and runs about 2% of CPU
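
The "good failure mode" noted above, where everything keeps working except analytics, amounts to treating the analytics write as best-effort. Here is a minimal sketch of that pattern; the function names, the `sink` callable, and the page handler are hypothetical and not taken from the talk.

```python
import logging

logger = logging.getLogger("analytics")


def track_pageview(sink, event):
    """Try to record an analytics event; never let a failure reach the caller."""
    try:
        sink(event)
        return True
    except Exception:
        # Analytics is best-effort: log the problem and carry on.
        logger.warning("analytics write failed; dropping event", exc_info=True)
        return False


def render_page(sink, page_id):
    """Hypothetical page handler: the page renders whether or not tracking works."""
    track_pageview(sink, {"page": page_id})
    return "<html>page %s</html>" % page_id


def broken_sink(event):
    """Stand-in for an unreachable analytics store."""
    raise ConnectionError("analytics store unreachable")


stored = []
print(render_page(stored.append, 1))  # event recorded
print(render_page(broken_sink, 2))    # event dropped, page still renders
```
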

Practical Ruby Projects with Mongo DB

    Notes from the "Practical Ruby Projects" session at MongoSF
    • MongoDB doesn't have joins; if you want to do a join, you have to do it yourself.
    • If scaling out is easy, who cares about the DB getting "too large"?
      • Why do I need to use less space and keep things in 3NF?
      • Who cares about saving space?  "I only care because MySQL is a pain to scale horizontally."
    • If you need db level transactions you shouldn't use Mongo?
    • I never use logs; logs are only used when something goes wrong
    • Capped collection:
      • Fixed-size, auto-age out collections
      • Fixed insertion order
      • Super fast (faster than normal writes)
      • Ideal for logging and caching
    • bunyan - thin ruby layer around a MongoDB capped collection - http://github.com/ajsharp/bunyan
    • "I hate around filters"
    • Mongolytics - http://github.com/tpitale/mongolytics
    • db.posts.ensureIndex({'posts': 1})
    • db.posts.find({'tags.name': 'ruby'}) <== killer feature (search on properties of an array of embedded documents)
    • If you are going to exceed Mongo's 4 MB document size limit, use GridFS - http://www.mongodb.org/display/DOCS/GridFS+Specification
    • We use FactoryGirl for all our tests using MongoMapper - http://github.com/thoughtbot/factory_girl
    • Mongoid is another Mongo object mapper that's similar to ActiveRecord
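
The capped-collection behaviour listed above (fixed size, automatic age-out, fixed insertion order) can be imitated in memory with a bounded deque. This is only a stand-in to show the semantics: the `CappedLog` class is hypothetical, and it caps by document count, whereas a real capped collection is sized in bytes.

```python
from collections import deque


class CappedLog:
    """In-memory stand-in for a capped collection, capped by document count."""

    def __init__(self, max_docs):
        # A bounded deque drops the oldest entry once it is full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        # Documents come back in insertion order.
        return list(self._docs)


log = CappedLog(max_docs=3)
for n in range(5):
    log.insert({"n": n})

print(log.find())
```

No explicit delete is ever issued; old entries simply fall off the front, which is what makes this shape attractive for logging.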

Zero to Mongo in 60 Hours

    Notes from the Zero to Mongo in 60 Hours session of MongoSF.
    • MyPunchbowl.com uses Mongo for analytics and data tracking.
    • "MongoDB gives me the warm fuzzies that Rails did."
    • http://railstips.org/blog/archives/2009/12/18/why-i-think-mongo-is-to-databases-what-rails-was-to-frameworks/
    • Very good support: had founders respond in 60 seconds at 11:30 at night.
    • mongo-ruby-driver - for playing with Mongo on Ruby
    • mongo shell - runs javascript
    • mongo-java-driver - is way faster than mongo-ruby-driver
    • mongo_mapper - is becoming the standard Ruby object mapper
    • "Documents" look like JSON objects, you don't need to define a schema
    • No transactions, has atomic operations instead ("upsert")
    • Indexes in Mongo rock!
    • Can create composite indexes
    • Can add deep, embedded indexes on inner objects
  • Flexibility and performance in querying and aggregating
  • '$inc' => mongodb increment builtin
  • config/environments/test.rb -> $mongo_db = Mongo::Connection.new.db 'mongo-sf-db'
  • No transactions, so it needs to clean-up after itself
  • No database migrations are needed; use them only to add indexes (but MongoMapper can do that for you)
  • mongoexport to dump database (replication seems iffy)
  • Updating the db requires downtime
  • Mongo has an ordered hash (ruby)
  • Can do range based queries
  • mongosphinx - mongo's ad hoc full text search is ok, but lacks infix
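
The "upsert" plus '$inc' combination noted above means: create the document if it is missing, then add to a counter field, treating a missing field as zero. The sketch below emulates that on a plain dict to show the semantics. The `inc_upsert` helper and the key names are hypothetical; in MongoDB the whole step is applied by the server as one atomic operation on the document, which this single-threaded emulation does not attempt to reproduce.

```python
def inc_upsert(collection, key, field, amount=1):
    """Emulate an upserting $inc update on a dict of documents keyed by _id."""
    # Upsert: create the document if it does not exist yet.
    doc = collection.setdefault(key, {"_id": key})
    # $inc: a missing field is treated as 0 before adding.
    doc[field] = doc.get(field, 0) + amount
    return doc


counters = {}
inc_upsert(counters, "signup:test", "count")
inc_upsert(counters, "signup:test", "count")
inc_upsert(counters, "invite:test", "count", amount=5)

print(counters)
```
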

Limited Liability Companies - How long does it take?

    It's currently averaging 41 days to create an LLC in California: www.sos.ca.gov/business/be/mail-processing-times.htm

    Unbelievable.

    ---------- Forwarded message ----------
    From: Partnerships@sos.ca.gov <partnershipsmail@sos.ca.gov>
    Date: Wed, Apr 28, 2010 at 2:15 PM

    We received the referenced LLC document on March 30th, however, the document has not been processed.  Please note, due to the large volume of business entity filings received, response times will vary.  All documents received in this office are processed in chronological order.  Current mail processing times are provided on the Secretary of States Business Entities Mail Processing Times webpage at www.sos.ca.gov/business/be/mail-processing-times.htm.

     Business Filings
    (916) 653-2318


    Submitted: 4/22/2010 11:17 AM
    ------------------------------------------------------------------------
    Customer's Message:

    Hi,

    I mailed in the articles of incorporation for an LLC on March 26, but haven't heard back from the state yet.  How long does it typically take to get a response?

    Thanks!
    Gerad