Terry : MongoDB

MongoDB

What is MongoDB?

Overview

MongoDB is a document database that provides high performance, high availability, and easy scalability.

Document Database

  • Documents (objects) map nicely to programming language data types.
  • Embedded documents and arrays reduce need for joins.
  • Dynamic schema makes polymorphism easier.

High Performance

  • Embedding makes reads and writes fast.
  • Indexes can include keys from embedded documents and arrays.
  • Optional streaming writes (no acknowledgements).

High Availability

  • Replicated servers with automatic master failover.

Easy Scalability

  • Automatic sharding distributes collection data across machines.
  • Eventually-consistent reads can be distributed over replicated servers.

Key MongoDB Features

MongoDB focuses on flexibility, power, speed, and ease of use:

  • Flexibility
    MongoDB stores data in JSON documents (which we serialize to BSON). JSON provides a rich data model that seamlessly maps to native programming language types, and the dynamic schema makes it easier to evolve your data model than with a system with enforced schemas such as an RDBMS.
  • Power
    MongoDB provides a lot of the features of a traditional RDBMS such as secondary indexes, dynamic queries, sorting, rich updates, upserts (update if document exists, insert if it doesn’t), and easy aggregation. This gives you the breadth of functionality that you are used to from an RDBMS, with the flexibility and scaling capability that the non-relational model allows.
  • Speed/Scaling
    By keeping related data together in documents, queries can be much faster than in a relational database where related data is separated into multiple tables and then needs to be joined later. MongoDB also makes it easy to scale out your database. Autosharding allows you to scale your cluster linearly by adding more machines. It is possible to increase capacity without any downtime, which is very important on the web when load can increase suddenly and bringing down the website for extended maintenance can cost your business large amounts of revenue.
  • Ease of use
    MongoDB works hard to be very easy to install, configure, maintain, and use. To this end, MongoDB provides few configuration options, and instead tries to automatically do the “right thing” whenever possible. This means that MongoDB works right out of the box, and you can dive right into developing your application, instead of spending a lot of time fine-tuning obscure database configurations.

Operations

MongoDB is a server process that runs on Linux, Windows and OS X. It can run as either a 32-bit or a 64-bit application. 64-bit mode is recommended, since in 32-bit mode MongoDB is limited to a total data size of about 2GB across all databases.
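Before choosing a build, it can help to check whether the host is 64-bit. The following is a minimal sketch, not an official check: the function name host_bits is made up for illustration, and the list of 64-bit machine names is an assumption that may be incomplete.

```shell
#!/bin/sh
# Guess the word size of the host from the machine name reported by uname.
# Assumption: the names listed below cover the 64-bit platforms of interest.
host_bits() {
  case "$(uname -m)" in
    x86_64|amd64|aarch64|arm64|ppc64*|s390x) echo 64 ;;
    *) echo 32 ;;
  esac
}

bits=$(host_bits)
if [ "$bits" = "64" ]; then
  echo "64-bit host: use the 64-bit MongoDB build"
else
  echo "32-bit host: total data size is limited to about 2GB"
fi
```

Note that uname -m describes the kernel, so a 32-bit MongoDB build installed on a 64-bit host is still subject to the 2GB limit.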

The MongoDB process listens on port 27017 by default (note that this can be set at start time - please see mongod options for more information).

Clients connect to the MongoDB process, optionally authenticate themselves if security is turned on, and perform a sequence of actions, such as inserts, queries and updates.

MongoDB stores its data in files (default location is /data/db/), and uses memory mapped files for data management for efficiency.

MongoDB can also be configured for data replication.

For more information on MongoDB administration, please see the administration guide.

Installation

Ubuntu - Upstart (/etc/init)

Add the 10gen APT repository

echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" | sudo tee /etc/apt/sources.list.d/mongodb.list

Add the 10gen GPG key

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10

Install the latest stable MongoDB

# Update the system
sudo apt-get update && sudo apt-get dist-upgrade
# Install mongoDB package
sudo apt-get install mongodb-10gen

For Red Hat Enterprise Linux & Oracle Linux

NOTE: Package options

The 10gen repository contains three packages

  • mongodb-10gen
    This package contains the latest stable release. Use this for production deployments.
  • mongodb20-10gen
    This package contains the stable release of v2.0 branch.
  • mongodb18-10gen
    This package contains the stable release of v1.8 branch.

Configure MongoDB

These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. The control script is at /etc/init.d/mongodb.

This MongoDB instance stores its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and runs as the mongodb user account.

Note: If you change the user that runs the MongoDB process, you will need to modify the access control rights to the /var/lib/mongodb and /var/log/mongodb directories.

MongoDB default ports

By default, MongoDB listens for connections on the following ports

  • 27017
    This is the default port for mongod and mongos instances. You can change this port with port or --port.
  • 27018
    This is the default port when running with --shardsvr runtime operation or shardsvr setting.
  • 27019
    This is the default port when running with --configsvr runtime operation or configsvr setting.
  • 28017
    This is the default port for the web status page. This is always accessible at a port that is 1000 greater than the port determined by port.
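The "1000 greater" rule for the web status page can be sketched as simple arithmetic. The function name web_status_port is made up for illustration.

```shell
#!/bin/sh
# The web status page listens on the mongod port plus 1000.
web_status_port() {
  echo $(( $1 + 1000 ))
}

web_status_port 27017   # default mongod port   -> 28017
web_status_port 27018   # default --shardsvr port -> 28018
```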

By default MongoDB programs (i.e. mongos and mongod) will bind to all available network interfaces (i.e. IP addresses) on a system. To change this use --bind_ip when starting mongod from command line or set bind_ip = IP_ADDRESS in mongodb.conf.

Controlling MongoDB

Starting MongoDB

sudo service mongodb start

You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.
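The log check can be scripted. This is a rough sketch that assumes mongod writes a line containing "waiting for connections" once it is ready; the function name mongod_started is made up for illustration.

```shell
#!/bin/sh
# Succeed if the given log file contains the startup message.
# Assumption: mongod logs "waiting for connections" when it is ready.
mongod_started() {
  grep -q "waiting for connections" "$1"
}

log=/var/log/mongodb/mongodb.log
if [ -r "$log" ] && mongod_started "$log"; then
  echo "mongod is up"
else
  echo "startup message not found in $log"
fi
```

Because the log file is appended to across restarts, a match may come from an earlier start, so check the timestamp of the matching line as well.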

Stopping MongoDB

sudo service mongodb stop

Restarting MongoDB

sudo service mongodb restart

Controlling mongos

As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems where mongod runs. You can use the mongodb script referenced above to derive your own mongos control script.

Using MongoDB

Among the tools included with the MongoDB package is the mongo shell (JavaScript Shell). Connect to the MongoDB instance by running

mongo

This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database.

> db.test.save( { a: 1 } )
> db.test.find()

See more

Using the mongo Shell

mongo Shell JavaScript Quick Reference

Database From File System Perspective

Database test on the file system

/var/lib/mongodb
.
├── journal
│   ├── j._0
│   ├── lsn
│   ├── prealloc.1
│   └── prealloc.2
├── mongod.lock
├── test.0
├── test.1
└── test.ns

Files explained

  • test.0 and test.1
    These are pre-allocated data files. With smallfiles = true, they are 16MB and 32MB respectively.
  • test.ns
    The ".ns" files are namespace files. Each collection and each index counts as a namespace. Each namespace is 628 bytes, and the .ns file is 16MB by default. Thus, if every collection had one index, we could create up to roughly 12,000 collections. The --nssize parameter allows you to increase this limit. The maximum .ns file size is 2GB.
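The namespace limit can be checked with a little arithmetic. This sketch simply divides the file size by the entry size and ignores any overhead inside the .ns file, so treat the result as an upper bound. The 12,000 figure quoted above is best read as a conservative round number.

```shell
#!/bin/sh
# Upper bound on namespaces in a default-sized .ns file.
ns_file_bytes=$(( 16 * 1024 * 1024 ))   # 16MB default .ns file
ns_entry_bytes=628                      # size of one namespace entry

max_namespaces=$(( ns_file_bytes / ns_entry_bytes ))
# One collection plus one index uses two namespaces.
max_collections=$(( max_namespaces / 2 ))

echo "namespaces:  $max_namespaces"     # 26715
echo "collections: $max_collections"    # 13357
```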

Security

For Linux, use iptables (userspace tool for netfilter) to secure MongoDB.

See iptables

Refer to Configure Linux iptables Firewall for MongoDB for details.

Backup and Restore

Backup strategy for MongoDB.

mongodump and mongorestore

Use mongodump and mongorestore to Backup and Restore MongoDB Databases

Basic mongodump operations

The mongodump utility can perform a live backup of data or can work against an inactive set of database files. The mongodump utility can create a dump for an entire server/database/collection (or part of a collection using a query), even when the database is running and active. If you run mongodump without any arguments, the command connects to the local database instance (e.g. 127.0.0.1 or localhost) and creates a database backup named dump/ in the current directory.

The format of data created by mongodump tool from the 2.2 distribution or later is different and incompatible with earlier versions of mongod.

To limit the amount of data included in the database dump, you can specify --db and --collection as options to the mongodump command.

mongodump --collection collection --db test

This command creates a dump of the collection named collection from the database test in a dump/ subdirectory of the current working directory.

Point in Time Operation Using Oplogs

Use the --oplog option with mongodump to collect the oplog entries to build a point-in-time snapshot of a database within a replica set. With --oplog, mongodump copies all the data from the source database as well as all of the oplog entries from the beginning of the backup procedure until the backup procedure completes. This backup procedure, in conjunction with mongorestore --oplogReplay, allows you to restore a backup that reflects a consistent and specific moment in time.

Create Backups Without a Running mongod Instance

If the MongoDB instance is not running, use the --dbpath option to specify the location of the MongoDB instance’s database files. mongodump reads from the data files directly with this operation. This locks the data directory to prevent conflicting writes. The mongod process must not be running or attached to these data files when you run mongodump in this configuration.

Consider the following example:

mongodump --dbpath /srv/mongodb

Create Backups from Non-Local mongod Instances

The --host and --port options for mongodump allow you to specify a non-local host to connect to capture the dump.

Consider the following example:

mongodump --host drbd1.au.oracle.com --port 27017 --username user --password pass --out /opt/backup/mongodump-$(date -d "today" +"%Y%m%d")

On any mongodump command you may, as above, specify username and password credentials to authenticate to the database.
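The date-stamped output directory used above can be factored into a small helper so the same name is easy to rebuild at restore time. This is only a sketch: the function name backup_dir and the HOST placeholder are made up for illustration.

```shell
#!/bin/sh
# Build a date-stamped dump directory name.
# $1: base directory; $2: optional YYYYMMDD stamp (defaults to today).
backup_dir() {
  stamp=${2:-$(date +%Y%m%d)}
  printf '%s/mongodump-%s\n' "$1" "$stamp"
}

backup_dir /opt/backup 20130226   # prints /opt/backup/mongodump-20130226

# Usage with mongodump (not run here):
#   mongodump --host HOST --port 27017 --out "$(backup_dir /opt/backup)"
```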

Restore a Database using mongorestore

The mongorestore utility restores a binary backup created by mongodump.

Consider the following example command

mongorestore dump-2013-02-26/

Here, mongorestore imports the database backup located in the dump-2013-02-26 directory to the mongod instance running on the localhost interface. By default, mongorestore looks for a database dump in the dump/ directory and restores that. To restore to a non-default host, use the --host and --port options to specify the host to connect to.

Consider the following example

mongorestore --host drbd1.au.oracle.com --port 27017 --username user --password pass /opt/backup/mongodump-2013-02-26

On any mongorestore command you may specify username and password credentials, as above.

Restore Point in Time Oplog Backup

If you created your database dump using the --oplog option to ensure a point-in-time snapshot, call mongorestore with the --oplogReplay option, as in the following example:

mongorestore --oplogReplay

You may also consider using the mongorestore --objcheck option to check the integrity of objects while inserting them into the database, or you may consider the mongorestore --drop option to drop each collection from the database before restoring from backups.
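The point-in-time dump and restore steps can be combined into one script. The sketch below prints the commands rather than running them unless DRY_RUN=0 is set; the run helper and the dump_dir variable are made up for illustration, while the options are the ones named above.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

dump_dir=${dump_dir:-dump}   # dump/ is the default mongodump output directory

# Dump all data plus the oplog entries written while the dump runs.
run mongodump --oplog --out "$dump_dir"

# Restore: drop existing collections, check objects, then replay the oplog.
run mongorestore --drop --objcheck --oplogReplay "$dump_dir"
```

Keep in mind that --drop removes each existing collection before restoring it, so review the printed commands before setting DRY_RUN=0.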

Restore a Subset of data from a Binary Database Dump

mongorestore also includes the ability to apply a filter to all input before inserting it into the new database.

Consider the following example

mongorestore --filter '{"field": 1}'

Here, mongorestore only adds documents to the database from the dump located in the dump/ folder if the documents have a field named field that holds a value of 1. Enclose the filter in single quotes (e.g. ') to prevent the filter from interacting with your shell environment.

Restore without a Running mongod

mongorestore can write data to MongoDB data files without needing to connect to a mongod directly.

mongorestore --dbpath /srv/mongodb --journal

Here, mongorestore restores the database dump located in the dump/ folder into the data files located at /srv/mongodb. Additionally, the --journal option ensures that mongorestore records all operations in the durability journal. The journal prevents data file corruption if anything (e.g. power failure, disk failure, etc.) interrupts the restore operation.

File System snapshot

Filesystem snapshots, or "block-level" backup methods, use system-level tools to create copies of the device that holds MongoDB’s data files. These methods complete quickly and work reliably, but require more system configuration outside of MongoDB.

Snapshots work by creating pointers between the live data and a special snapshot volume. These pointers are theoretically equivalent to hard links.

As the working data diverges from the snapshot, the snapshot process uses a copy-on-write strategy. As a result the snapshot only stores modified data.

After making the snapshot, you mount the snapshot image on your file system and copy data from the snapshot. The resulting backup contains a full copy of all data.

  • ZFS - Solaris or FreeBSD / FreeNAS
  • Btrfs - Linux
  • Linux - LVM snapshot
  • DRBD (Distributed Replicated Block Device) - Linux (Network based RAID-1)
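For the LVM entry in the list above, the snapshot, mount, copy and cleanup steps might look like the following. This is a sketch under stated assumptions, not a tested procedure: the volume group, logical volume, snapshot name, snapshot size, mount point and archive path are all hypothetical, and it assumes journaling is enabled with the journal on the same volume as the data files. Commands are printed rather than run unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Print each command instead of running it unless DRY_RUN=0.
DRY_RUN=${DRY_RUN:-1}
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

vg=vg0              # hypothetical volume group
lv=mongodb          # hypothetical logical volume holding /var/lib/mongodb
snap=mdb-snap01     # hypothetical snapshot name
mnt=/mnt/mdb-snap   # hypothetical mount point

# 1. Create a copy-on-write snapshot of the data volume.
run lvcreate --size 100M --snapshot --name "$snap" "/dev/$vg/$lv"

# 2. Mount the snapshot read-only and copy the data out.
run mkdir -p "$mnt"
run mount -o ro "/dev/$vg/$snap" "$mnt"
run tar -czf "/opt/backup/$snap.tar.gz" -C "$mnt" .

# 3. Unmount and discard the snapshot.
run umount "$mnt"
run lvremove -f "/dev/$vg/$snap"
```

The snapshot size only needs to hold the blocks that change while the snapshot exists, but if it fills up the snapshot becomes unusable, so size it generously on a busy system.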

For Btrfs, create a snapshot, which is a subvolume with some initial contents. It can be archived and sent over the network (ZFS-like send/receive is still under way). Restoring is easy: just use rsync.

MongoDB Drivers and Client Libraries

Java

Ruby

Monitoring

Utilities

  • mongotop
  • mongostat
  • REST Interface (port 28017)
    Need to set rest = true in mongodb.conf

Statistics

  • serverStatus
  • replSetGetStatus
  • dbStats
  • collStats

3rd Party Tools

  • Ganglia
  • Motop
  • mtop
  • Munin
  • Nagios
  • Zabbix

More: DevOps

Web Interface (REST)

Enable Web Interface (REST), in /etc/mongodb.conf add

rest = true

Restart MongoDB (mongod)

Simple REST Interface examples

Troubleshooting

1. Cannot start mongodb. Run tail -f /var/log/mongodb/mongodb.log, start mongodb again, and look for output like the following

Wed Feb 13 11:24:45 [initandlisten] ERROR: Insufficient free space for journal files
Wed Feb 13 11:24:45 [initandlisten] Please make at least 3379MB available in /var/lib/mongodb/journal or use --smallfiles
Wed Feb 13 11:24:45 [initandlisten] 
Wed Feb 13 11:24:45 [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
Wed Feb 13 11:24:45 dbexit:

/var/lib does NOT have enough space for pre-allocation.

Edit /etc/mongodb.conf, add the following, and start mongodb again (remember to delete /var/lib/mongodb/journal to free up disk space).

smallfiles = true
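To catch this problem before starting mongod, the free space can be checked up front. The sketch below uses the 3379MB figure from the log above; the function names free_mb and has_free_mb are made up for illustration, and the fallback to / exists only so the example runs on a machine without /var/lib/mongodb.

```shell
#!/bin/sh
# Print free space, in MB, on the filesystem holding the given path.
free_mb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeed if at least $2 MB are free on the filesystem holding $1.
has_free_mb() {
  [ "$(free_mb "$1")" -ge "$2" ]
}

dir=/var/lib/mongodb
[ -d "$dir" ] || dir=/   # fall back so the sketch runs anywhere

if has_free_mb "$dir" 3379; then
  echo "enough space for default journal files under $dir"
else
  echo "less than 3379MB free under $dir: free space or set smallfiles = true"
fi
```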

Admin web console on port 28017.

2. Cannot connect to mongodb from a different host using mongo shell

If /etc/mongodb.conf sets bind_ip = 127.0.0.1, mongodb binds to the loopback interface ONLY, which is a sensible default for security reasons. Comment out the following lines to bind all interfaces

#bind_ip = 127.0.0.1
#port = 27017

NOTE: it is a good practice to bind ONLY 1 interface to limit access to mongodb.
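To see which setting is in effect, the configuration file can be inspected for an uncommented bind_ip line. This is a minimal sketch that assumes the key = value format shown above; the function name active_bind_ip is made up for illustration.

```shell
#!/bin/sh
# Print the active (uncommented) bind_ip value from a mongodb.conf-style file,
# or "all interfaces" when no bind_ip line is active.
active_bind_ip() {
  value=$(sed -n 's/^[[:space:]]*bind_ip[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1)
  echo "${value:-all interfaces}"
}

conf=/etc/mongodb.conf
if [ -r "$conf" ]; then
  active_bind_ip "$conf"
fi
```

This reads the file only. A --bind_ip option given on the mongod command line would not show up here.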

Reference

Introduction to MongoDB

The MongoDB Manual

SQL to MongoDB Mapping Chart

Using the mongo Shell