MongoDB
- MongoDB
- What is MongoDB?
- Key MongoDB Features
- Installation
- Configure MongoDB
- MongoDB default ports
- Controlling MongoDB
- Using MongoDB
- Database From File System Perspective
- Security
- Backup and Restore
- MongoDB Drivers and Client Libraries
- Monitoring
- Web Interface (REST)
- Troubleshooting
What is MongoDB?
Overview
MongoDB is a document database that provides high performance, high availability, and easy scalability.
Document Database
- Documents (objects) map nicely to programming language data types.
- Embedded documents and arrays reduce the need for joins.
- Dynamic schema makes polymorphism easier.
High Performance
- Embedding makes reads and writes fast.
- Indexes can include keys from embedded documents and arrays.
- Optional streaming writes (no acknowledgements).
High Availability
- Replicated servers with automatic master failover.
Easy Scalability
- Automatic sharding distributes collection data across machines.
- Eventually-consistent reads can be distributed over replicated servers.
Key MongoDB Features
MongoDB focuses on flexibility, power, speed, and ease of use:
- Flexibility
MongoDB stores data in JSON documents (which are serialized to BSON). JSON provides a rich data model that maps seamlessly to native programming language types, and the dynamic schema makes it easier to evolve your data model than in a system with enforced schemas such as an RDBMS.
- Power
MongoDB provides many of the features of a traditional RDBMS, such as secondary indexes, dynamic queries, sorting, rich updates, upserts (update if the document exists, insert if it doesn’t), and easy aggregation. This gives you the breadth of functionality you are used to from an RDBMS, with the flexibility and scaling capability that the non-relational model allows.
- Speed/Scaling
By keeping related data together in documents, queries can be much faster than in a relational database, where related data is separated into multiple tables and must be joined later. MongoDB also makes it easy to scale out your database. Autosharding allows you to scale your cluster linearly by adding more machines. Capacity can be increased without downtime, which matters on the web, where load can rise suddenly and taking a site down for extended maintenance can cost a business significant revenue.
- Ease of use
MongoDB aims to be easy to install, configure, maintain, and use. To this end, it provides few configuration options and instead tries to do the “right thing” automatically whenever possible. MongoDB therefore works out of the box, and you can start developing your application instead of spending time fine-tuning obscure database settings.
Operations
MongoDB is a server process that runs on Linux, Windows and OS X. It can run as either a 32-bit or a 64-bit application. 64-bit mode is recommended, because in 32-bit mode MongoDB is limited to a total data size of about 2GB across all databases.
The MongoDB process listens on port 27017 by default (this can be changed at start time; see the mongod options for more information).
Clients connect to the MongoDB process, optionally authenticate themselves if security is turned on, and perform a sequence of actions, such as inserts, queries and updates.
MongoDB stores its data in files (the default location is /data/db/) and uses memory-mapped files for efficient data management.
MongoDB can also be configured for data replication.
For more information on MongoDB administration, please see the administration guide.
Installation
Ubuntu - Upstart (/etc/init)
Add the 10gen APT repository
echo "deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen" | sudo tee /etc/apt/sources.list.d/mongodb.list
Add the 10gen GPG key
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
Install the latest stable MongoDB
# Update the system
sudo apt-get update && sudo apt-get dist-upgrade
# Install the MongoDB package
sudo apt-get install mongodb-10gen
For Red Hat Enterprise Linux & Oracle Linux
NOTE: Package options
The 10gen repository contains three packages
- mongodb-10gen
This package contains the latest stable release. Use this for production deployments.
- mongodb20-10gen
This package contains the stable release of the v2.0 branch.
- mongodb18-10gen
This package contains the stable release of the v1.8 branch.
Configure MongoDB
These packages configure MongoDB using the /etc/mongodb.conf file in conjunction with the control script. The control script is at /etc/init.d/mongodb.
This MongoDB instance stores its data files in /var/lib/mongodb and its log files in /var/log/mongodb, and runs under the mongodb user account.
Note: If you change the user that runs the MongoDB process, you will need to modify the access control rights on the /var/lib/mongodb and /var/log/mongodb directories.
MongoDB default ports
By default, MongoDB listens for connections on the following ports:
- 27017
This is the default port for mongod and mongos instances. You can change this port with the port setting or the --port option.
- 27018
This is the default port when running with the --shardsvr runtime option or the shardsvr setting.
- 27019
This is the default port when running with the --configsvr runtime option or the configsvr setting.
- 28017
This is the default port for the web status page. The web status page is always accessible at a port 1000 greater than the port determined by the port setting.
By default, MongoDB programs (i.e. mongos and mongod) bind to all available network interfaces (i.e. IP addresses) on a system. To change this, use --bind_ip when starting mongod from the command line, or set bind_ip = IP_ADDRESS in mongodb.conf.
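The relationship between the main port and the web status port described above can be checked with a little shell arithmetic. This is only a minimal sketch: `web_status_port` is a hypothetical helper name, not something shipped with MongoDB.

```shell
# Hypothetical helper (not a MongoDB tool): the web status page listens
# 1000 ports above whatever port mongod was started on.
web_status_port() {
    echo $(( $1 + 1000 ))
}

web_status_port 27017   # default mongod port
web_status_port 27018   # port used with --shardsvr
```

If mongod is started on a non-default port, the same offset applies to the web status page.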
Controlling MongoDB
Starting MongoDB
sudo service mongodb start
You can verify that mongod has started successfully by checking the contents of the log file at /var/log/mongodb/mongodb.log.
Stopping MongoDB
sudo service mongodb stop
Restarting MongoDB
sudo service mongodb restart
Controlling mongos
As of the current release, there are no control scripts for mongos. mongos is only used in sharding deployments and typically does not run on the same systems where mongod runs. You can use the mongodb script referenced above to derive your own mongos control script.
Using MongoDB
Among the tools included with the MongoDB package is the mongo shell (JavaScript shell). Connect to the MongoDB instance by running:
mongo
This will connect to the database running on the localhost interface by default. At the mongo prompt, issue the following two commands to insert a record in the “test” collection of the (default) “test” database.
> db.test.save( { a: 1 } )
> db.test.find()
See more
mongo Shell JavaScript Quick Reference
Database From File System Perspective
Database test on the file system
/var/lib/mongodb
.
├── journal
│   ├── j._0
│   ├── lsn
│   ├── prealloc.1
│   └── prealloc.2
├── mongod.lock
├── test.0
├── test.1
└── test.ns
Files explained
- test.0 and test.1
These are pre-allocated data files. With smallfiles = true, they are 16MB and 32MB.
- test.ns
The ".ns" files are namespace files. Each collection and each index counts as a namespace. Each namespace entry is 628 bytes, and the .ns file is 16MB by default. Thus, if every collection had one index, you could create up to about 12,000 collections. The --nssize parameter allows you to increase this limit. The maximum .ns file size is 2GB.
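The namespace arithmetic above can be reproduced directly. The following sketch divides the default .ns file size by the per-namespace entry size. The variable names are illustrative only. Plain division gives a ceiling somewhat above the roughly 12,000 collections quoted above, so treat the printed numbers as upper bounds rather than exact limits.

```shell
NS_FILE_BYTES=$(( 16 * 1024 * 1024 ))   # default .ns file size (16MB)
NS_ENTRY_BYTES=628                      # size of one namespace entry

# Upper bound on the namespaces that fit in the file, by simple division.
max_namespaces=$(( NS_FILE_BYTES / NS_ENTRY_BYTES ))

# A collection with one index consumes two namespaces.
max_collections=$(( max_namespaces / 2 ))

echo "namespaces: $max_namespaces"
echo "collections with one index each: $max_collections"
```

Changing `NS_FILE_BYTES` shows how a larger --nssize value would raise the ceiling.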
Security
For Linux, use iptables (userspace tool for netfilter) to secure MongoDB.
See iptables
Refer to Configure Linux iptables Firewall for MongoDB for details.
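As a rough illustration of the kind of rule the referenced guide describes, the sketch below prints (but does not apply) a pair of iptables rules that admit traffic to the default mongod port from a single client address. The helper name and the 192.0.2.10 address are placeholders, and the rules only restrict access if the chains' default policy drops other traffic.

```shell
# Print iptables rules that admit one client to mongod on port 27017.
# Nothing is executed; review the output and run it as root if suitable.
# Argument: the client's IP address (placeholder value used below).
print_mongod_rules() {
    echo "iptables -A INPUT -s $1 -p tcp --destination-port 27017 -m state --state NEW,ESTABLISHED -j ACCEPT"
    echo "iptables -A OUTPUT -d $1 -p tcp --source-port 27017 -m state --state ESTABLISHED -j ACCEPT"
}

print_mongod_rules 192.0.2.10
```

Printing the rules first makes it easier to check them before changing a live firewall.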
Backup and Restore
Backup strategy for MongoDB.
mongodump and mongorestore
Use mongodump and mongorestore to back up and restore MongoDB databases.
Basic mongodump operations
The mongodump utility can perform a live backup of data or can work against an inactive set of database files. It can create a dump of an entire server, database, or collection (or part of a collection, using a query), even when the database is running and active. If you run mongodump without any arguments, the command connects to the local database instance (e.g. 127.0.0.1 or localhost) and creates a database backup named dump/ in the current directory.
To limit the amount of data included in the database dump, you can specify --db and --collection as options to the mongodump command.
mongodump --collection collection --db test
This command creates a dump of the collection named collection from the database test in a dump/ subdirectory of the current working directory.
Point in Time Operation Using Oplogs
Use the --oplog option with mongodump to collect the oplog entries needed to build a point-in-time snapshot of a database within a replica set. With --oplog, mongodump copies all the data from the source database as well as all of the oplog entries from the beginning of the backup procedure until the backup procedure completes. This backup procedure, in conjunction with mongorestore --oplogReplay, allows you to restore a backup that reflects a consistent and specific moment in time.
Create Backups Without a Running mongod Instance
If the MongoDB instance is not running, use the --dbpath option to specify the location of the instance’s database files. In this mode mongodump reads from the data files directly and locks the data directory to prevent conflicting writes. The mongod process must not be running or attached to these data files when you run mongodump in this configuration.
Consider the following example:
mongodump --dbpath /srv/mongodb
Create Backups from Non-Local mongod Instances
The --host and --port options for mongodump allow you to specify a non-local host to connect to when capturing the dump.
Consider the following example:
mongodump --host drbd1.au.oracle.com --port 27017 --username user --password pass --out /opt/backup/mongodump-$(date -d "today" +"%Y%m%d")
On any mongodump command you may, as above, supply username and password credentials for database authentication.
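The dated output directory used in the example above can be factored into a small helper so that a nightly job always writes to a fresh directory. This is a minimal sketch: the function names and the db.example.com host are hypothetical, and the command is printed for review rather than executed.

```shell
# Hypothetical helper: build a dated output directory such as
# /opt/backup/mongodump-20130226 (base directory taken from the example above).
backup_dir() {
    echo "/opt/backup/mongodump-$(date +%Y%m%d)"
}

# Print the mongodump command instead of running it.
# Arguments: host, port. The host name below is a placeholder.
print_dump_command() {
    echo "mongodump --host $1 --port $2 --out $(backup_dir)"
}

print_dump_command db.example.com 27017
```

Credentials are left out of the sketch on purpose. Add --username and --password as in the example above if authentication is enabled.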
Restore a Database using mongorestore
The mongorestore utility restores a binary backup created by mongodump.
Consider the following example command
mongorestore dump-2013-02-26/
Here, mongorestore imports the database backup located in the dump-2013-02-26 directory to the mongod instance running on the localhost interface. By default, mongorestore looks for a database dump in the dump/ directory and restores that. If you wish to restore to a non-default host, the --host and --port options allow you to specify the host and port to restore to.
Consider the following example
mongorestore --host drbd1.au.oracle.com --port 27017 --username user --password pass /opt/backup/mongodump-2013-02-26
On any mongorestore command you may specify username and password credentials, as above.
Restore Point in Time Oplog Backup
If you created your database dump using the --oplog option to ensure a point-in-time snapshot, call mongorestore with the --oplogReplay option, as in the following example:
mongorestore --oplogReplay
You may also consider using the mongorestore --objcheck option to check the integrity of objects while inserting them into the database, or you may consider the mongorestore --drop option to drop each collection from the database before restoring from backups.
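The dump and restore halves of a point-in-time backup belong together, which the following sketch makes explicit by printing both commands for the same directory. The helper name and the /opt/backup/pit path are placeholders, and nothing is executed.

```shell
# Print the two halves of a point-in-time backup for review.
# Argument: directory to dump into and later restore from (placeholder path).
print_oplog_backup_plan() {
    echo "mongodump --oplog --out $1"
    echo "mongorestore --oplogReplay $1"
}

print_oplog_backup_plan /opt/backup/pit
```

The first command is run at backup time against a replica set member. The second is run later, when the backup has to be restored.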
Restore a Subset of data from a Binary Database Dump
mongorestore can also apply a filter to all input before inserting it into the new database.
Consider the following example
mongorestore --filter '{"field": 1}'
Here, mongorestore only adds documents to the database from the dump located in the dump/ folder if the documents have a field named field that holds a value of 1. Enclose the filter in single quotes (e.g. ') to prevent the filter from interacting with your shell environment.
Restore without a Running mongod
mongorestore can write data to MongoDB data files without needing to connect to a mongod directly.
mongorestore --dbpath /srv/mongodb --journal
Here, mongorestore restores the database dump located in the dump/ folder into the data files located at /srv/mongodb. Additionally, the --journal option ensures that mongorestore records all operations in the durability journal. The journal prevents data file corruption if anything (e.g. power failure, disk failure, etc.) interrupts the restore operation.
File System snapshot
Filesystem snapshots, or "block-level" backup methods, use system-level tools to create copies of the device that holds MongoDB’s data files. These methods complete quickly and work reliably, but require more system configuration outside of MongoDB.
Snapshots work by creating pointers between the live data and a special snapshot volume. These pointers are theoretically equivalent to hard links.
As the working data diverges from the snapshot, the snapshot process uses a copy-on-write strategy. As a result the snapshot only stores modified data.
After making the snapshot, you mount the snapshot image on your file system and copy data from the snapshot. The resulting backup contains a full copy of all data.
- ZFS - Solaris or FreeBSD / FreeNAS
- Btrfs - Linux
- Linux - LVM snapshot
- DRBD (Distributed Replicated Block Device) - Linux (Network based RAID-1)
For Btrfs, a snapshot is a subvolume with some initial contents. It can be archived and sent over the network (ZFS-like send/receive is still under development). To restore, copy the data back with rsync.
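For the LVM option in the list above, the snapshot, mount, and copy workflow might look like the following. This is a sketch under assumptions: the volume group, logical volume, mount point, and destination are placeholders, and the 100M snapshot size is arbitrary; it must be large enough to hold all changes made while the copy runs. It also assumes the journal lives on the same logical volume as the data files. The function only prints the plan and executes nothing.

```shell
# Print an LVM snapshot backup plan for review; nothing is executed.
# Arguments: volume group, logical volume holding the data files,
# and a destination directory. All three values are placeholders.
print_lvm_backup_plan() {
    vg=$1
    lv=$2
    dest=$3
    snap="${lv}-snap"
    echo "lvcreate --size 100M --snapshot --name $snap /dev/$vg/$lv"
    echo "mkdir -p /mnt/$snap"
    echo "mount /dev/$vg/$snap /mnt/$snap"
    echo "rsync -a /mnt/$snap/ $dest/"
    echo "umount /mnt/$snap"
    echo "lvremove -f /dev/$vg/$snap"
}

print_lvm_backup_plan vg0 mongodb /opt/backup/snapshot
```

Removing the snapshot at the end matters, because a snapshot left in place keeps accumulating copy-on-write data.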
MongoDB Drivers and Client Libraries
Monitoring
Utilities
- mongotop
- mongostat
- REST Interface (port 28017)
Requires rest = true in mongodb.conf
Statistics
- serverStatus
- replSetGetStatus
- dbStats
- collStats
3rd Party Tools
- Ganglia
- Motop
- mtop
- Munin
- Nagios
- Zabbix
More: DevOps
Web Interface (REST)
To enable the web interface (REST), add the following to /etc/mongodb.conf:
rest = true
Restart MongoDB (mongod)
Simple REST Interface examples
- Get the contents of a collection (note the trailing slash)
http://hostname:28017/databaseName/collectionName/
- Add a limit
http://hostname:28017/databaseName/collectionName/?limit=-10
- Skip
http://hostname:28017/databaseName/collectionName/?skip=5
- Query {a : 1}, {name : terry}
http://hostname:28017/databaseName/collectionName/?filter_a=1
http://hostname:28017/databaseName/collectionName/?filter_name=terry
- Separate conditions with an &
http://hostname:28017/databaseName/collectionName/?filter_name=terry&limit=-10
- Same as db.$cmd.findOne({listDatabases:1}) on the admin database in the shell
http://hostname:28017/admin/$cmd/?filter_listDatabases=1&limit=1
- Count documents in a collection
http://host:port/databaseName/$cmd/?filter_count=collectionName&limit=1
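The URL patterns above all follow the same shape, which a small helper can assemble. This is a minimal sketch: `rest_url` is a hypothetical function name, and it assumes the web interface is on the default port 28017.

```shell
# Hypothetical helper: build a simple REST interface URL.
# Arguments: host, database, collection, optional query string.
rest_url() {
    url="http://$1:28017/$2/$3/"
    if [ -n "${4:-}" ]; then
        url="$url?$4"
    fi
    echo "$url"
}

rest_url localhost test test
rest_url localhost test test "filter_a=1&limit=-10"
```

The printed URL can then be opened in a browser or fetched with a tool such as curl. Quote the URL on the command line so the shell does not interpret the & character.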
Troubleshooting
1. MongoDB does not start. Run tail -f /var/log/mongodb/mongodb.log, start MongoDB again, and look for output like the following:
Wed Feb 13 11:24:45 [initandlisten] ERROR: Insufficient free space for journal files
Wed Feb 13 11:24:45 [initandlisten] Please make at least 3379MB available in /var/lib/mongodb/journal or use --smallfiles
Wed Feb 13 11:24:45 [initandlisten]
Wed Feb 13 11:24:45 [initandlisten] exception in initAndListen: 15926 Insufficient free space for journals, terminating
Wed Feb 13 11:24:45 dbexit:
The file system holding /var/lib does not have enough free space for journal pre-allocation.
Edit /etc/mongodb.conf, add the following line, and start MongoDB again (remember to delete /var/lib/mongodb/journal first to free up disk space).
smallfiles = true
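The check for this failure can be scripted instead of reading the log by eye. The sketch below searches a log file for the journal error text shown above. `journal_space_error` is a hypothetical helper name, and the log path is the packaged default mentioned earlier.

```shell
# Hypothetical helper: succeed if the given log file contains the
# journal free-space error shown above.
journal_space_error() {
    grep -q "Insufficient free space for journal" "$1"
}

LOG=/var/log/mongodb/mongodb.log
if [ -r "$LOG" ] && journal_space_error "$LOG"; then
    echo "journal pre-allocation failed: free up space or set smallfiles = true"
fi
```

The readability test keeps the sketch quiet on machines where the log file does not exist.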
Admin web console on port 28017.
2. Cannot connect to MongoDB from a different host using the mongo shell
If /etc/mongodb.conf contains bind_ip = 127.0.0.1, MongoDB listens on the loopback interface only, for security reasons. Comment out the line as shown here to bind to all interfaces.
#bind_ip = 127.0.0.1
#port = 27017
NOTE: It is good practice to bind to only one interface to limit access to MongoDB.
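To see which setting is actually in effect, the configuration file can be inspected for an uncommented bind_ip line. This is a minimal sketch: `active_bind_ip` is a hypothetical helper, and it assumes the key = value layout used in the snippets in this document.

```shell
# Hypothetical helper: print the active (uncommented) bind_ip value from a
# configuration file, or "all interfaces" when no such line is present.
active_bind_ip() {
    value=$(sed -n 's/^[[:space:]]*bind_ip[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1)
    echo "${value:-all interfaces}"
}

CONF=/etc/mongodb.conf
if [ -r "$CONF" ]; then
    active_bind_ip "$CONF"
fi
```

A commented-out line such as #bind_ip = 127.0.0.1 is ignored, so the helper reports "all interfaces" in that case.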