The Mongo database system

The Mongo database system, MongoDB, has been in development since 2007 and has become the most widely used NoSQL database system. The name is a substring of the word humongous, suggesting that it works well with very large datasets. (It has also been suggested that the name comes from Mongo, a character played by the NFL defensive tackle Alex Karras in the Mel Brooks movie Blazing Saddles.) The system is described as a document-oriented database.

You can download MongoDB from https://www.mongodb.com/download-center.

See the Appendix for details on installing MongoDB.

After installing MongoDB, start the database system with the mongod command, as shown in Figure 10-3. We are using the command line here: the Terminal app on a Mac, the Command Prompt on a PC, or a shell window on a UNIX box.

Figure 10-3. Starting the Mongo database system from the command line

The output continues for many more lines, and then it pauses.

Set that command window aside, open a new one, and then execute the mongo command, as shown in Figure 10-4:

Figure 10-4. Starting the MongoDB shell from the command line

The mongo command starts the MongoDB shell, which allows you to execute MongoDB commands. Notice that the command prompt changes to an angle bracket, >.

Notice the two startup warnings. The first refers to access control and can be remedied by defining an administrative user with a password. For details, go to:

https://docs.mongodb.com/master/tutorial/enable-authentication/

The second warning refers to your operating system's limit on the number of files that a user can have open at a time, 256 files in this case. For details, go to:

https://docs.mongodb.com/manual/reference/ulimit/

Figure 10-5. Creating a database in MongoDB

The commands executed in Figure 10-5 create a MongoDB database with a collection of three documents. The name of the database is friends.

The show dbs command at first shows only two databases in the system: the default databases, admin and local. The use friends command switches to the friends database; MongoDB actually creates the database when data is first stored in it. Next, we create three documents, named tom, ann, and jim, and add them to a collection named friends with the db.friends.insert() statement. Note that the statements are terminated with a semicolon, as in Java, although the JavaScript-based shell does not strictly require it. The last show dbs command confirms that the friends database has been created.
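Because the figure is not reproduced here, the following is only a sketch of the kind of statements such a session might contain, written as a script rather than typed interactively. The keys fname, sex, and dob are the ones queried later in this section; the values are invented for illustration, and the helper insertFriends() is hypothetical. In a script, db.getSiblingDB('friends') plays the role of the interactive use friends command.

```javascript
// Three documents as plain JavaScript objects. The field values are made up.
const tom = { fname: 'Tom', sex: 'M', dob: '1986-03-14' };
const ann = { fname: 'Ann', sex: 'F', dob: '1988-09-02' };
const jim = { fname: 'Jim', sex: 'M', dob: '1979-11-23' };

// Insert the three documents into the friends collection of `database`.
// `database` is assumed to behave like the shell's `db` object.
function insertFriends(database) {
  [tom, ann, jim].forEach((doc) => database.friends.insert(doc));
}

// The global `db` exists only inside the MongoDB shell; elsewhere this is skipped.
if (typeof db !== 'undefined') {
  // Script equivalent of typing `use friends` at the shell prompt.
  insertFriends(db.getSiblingDB('friends'));
}
```

Newer versions of the shell also provide insertOne() and insertMany(), which can be used in place of insert().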

The find() command is used in Figure 10-6.

Figure 10-6. Using the MongoDB find() and getTimestamp() methods

The find() method lists the documents in the friends collection. Note that each of the three documents has been given an ObjectId for its _id field. This object contains a timestamp of the instant when the document was created, and it identifies the machine and process that created it. As shown, the getTimestamp() method returns the timestamp stored in the referenced object.
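To make the embedded timestamp concrete, here is a minimal sketch in plain JavaScript (runnable in Node.js, for example) that decodes it by hand. It relies on the fact that the first 4 bytes of an ObjectId hold the creation time in seconds since the Unix epoch. The function name objectIdTimestamp and the sample id are illustrative; the id is not one of those shown in Figure 10-6.

```javascript
// Decode the creation time from a 24-character hexadecimal ObjectId string.
// The first 4 bytes (8 hex digits) hold the seconds since the Unix epoch.
function objectIdTimestamp(hexId) {
  if (!/^[0-9a-fA-F]{24}$/.test(hexId)) {
    throw new Error('expected 24 hexadecimal characters');
  }
  const seconds = parseInt(hexId.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

// An arbitrary example id, chosen only for illustration.
const sampleId = '507f1f77bcf86cd799439011';
console.log(objectIdTimestamp(sampleId).toISOString());
```

Inside the shell there is no need for this: calling getTimestamp() on the ObjectId gives the same information directly.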

NoSQL databases are structurally different from relational databases (Rdbs) in many fundamental ways, but logically, there is some correspondence with their data structures. This is summarized in Table 10-2.

Relational Database       MongoDB
database                  database
table (relation)          collection
row (record, tuple)       document
entry (field, element)    key-value pair

Table 10-2. Rdb and MongoDB data structures

A relational database can be thought of mainly as a set of tables; similarly, a MongoDB database is a set of collections. An Rdb table is a set of rows, each adhering to the datatype schema defined for that table. Analogously, a collection is a set of documents, each stored in BSON (binary JSON) format; JSON (JavaScript Object Notation) files are discussed in Chapter 2, Data Preprocessing. Finally, an Rdb table row is a sequence of data entries, one for each column, while a MongoDB document is a set of key-value pairs.
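The contrast between a row and a document can be sketched in plain JavaScript. This is only an illustration of the two shapes of data, not of how either kind of database stores them; the column names, values, and the helper rowToDocument() are invented for the example.

```javascript
// A relational row: one value per column, in the order fixed by the table schema.
const columns = ['fname', 'sex', 'dob'];
const row = ['Tom', 'M', '1986-03-14'];

// A document: a set of key-value pairs that carries its own keys.
const documentTom = { fname: 'Tom', sex: 'M', dob: '1986-03-14' };

// Another document in the same collection may have a different set of keys.
const documentAnn = { fname: 'Ann', sex: 'F', dob: '1988-09-02', phone: '555-0100' };

// Convert a row to a document by pairing each column name with its value.
function rowToDocument(columnNames, values) {
  const doc = {};
  columnNames.forEach((name, i) => { doc[name] = values[i]; });
  return doc;
}

console.log(rowToDocument(columns, row));
console.log(Object.keys(documentAnn));
```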

This data model is known as a document store, and is used by some of the other leading NoSQL database systems, such as IBM Domino and Apache CouchDB. In contrast, Apache Cassandra and HBase use the column store data model, where a column is defined to be a key-value pair with a timestamp.

Notice how much freer the data design process is with a NoSQL database as opposed to an Rdb. The Rdb requires a rather tightly defined preliminary data architecture that specifies tables, schema, datatypes, keys, and foreign keys before we can even create the tables. The only preliminary data design needed with MongoDB, before we begin inserting data, is the definitions of the collections.

In an Rdb w, if z is a data value for column y in table x, then z can be referenced as w.x.y.z; that is, the names of the database, the table, and the column can serve as namespace identifiers. In contrast, the corresponding namespace identifier in a MongoDB database would be w.x.y.k.z, where x is the name of the collection, y is the name of the document, and k is the key in the key-value pair for z.

We saw previously that the MongoDB shell automatically creates a collection the first time a document is inserted into it: the statement db.friends.insert() created a collection named friends, which here happens to have the same name as the database itself. The code in Figure 10-7 illustrates how to create a collection explicitly, in this case one named relatives:

Figure 10-7. Creating a separate MongoDB collection

The show collections command shows that we have those two collections now.
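In script form, the same two steps might look like the sketch below. The createCollection() and getCollectionNames() methods are part of the shell's database object; the wrapper function createRelatives() is hypothetical, and getCollectionNames() is used here as the script counterpart of the interactive show collections command.

```javascript
// Create the relatives collection explicitly, then return all collection names.
// `database` is assumed to behave like the shell's `db` object.
function createRelatives(database) {
  database.createCollection('relatives');
  return database.getCollectionNames();
}

// The global `db` exists only inside the MongoDB shell; elsewhere this is skipped.
if (typeof db !== 'undefined') {
  printjson(createRelatives(db.getSiblingDB('friends')));
}
```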

Next, we insert three documents into our relatives collection. Notice the following features:

The next example, shown in Figure 10-8, illustrates a compound query:

Figure 10-8. A MongoDB compound query

The argument to the find() command contains a conjunction of two conditions. The conjunction is the logical AND operator, written as && in Java and as $and in MongoDB. In this example, the two conditions are that 'sex' is 'M' and 'dob' > '1985-01-01'. In other words, "Find all male friends who were born after January 1, 1985".

The appended pretty() method simply tells the shell to use multi-line formatting for the results.
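A query of this kind might be written as in the following sketch. The filter uses the field names and the date given in the text; the names maleBornAfter1985 and findYoungMales() are invented for the example. It assumes that dob is stored as a string in year-month-day form, so that string comparison agrees with date order.

```javascript
// Filter: sex is 'M' AND dob is later than '1985-01-01'.
const maleBornAfter1985 = {
  $and: [
    { sex: 'M' },
    { dob: { $gt: '1985-01-01' } },
  ],
};

// Run the compound query against the friends collection of `database`.
// `database` is assumed to behave like the shell's `db` object.
function findYoungMales(database) {
  return database.friends.find(maleBornAfter1985);
}

// The global `db` exists only inside the MongoDB shell; elsewhere this is skipped.
if (typeof db !== 'undefined') {
  // Print each matching document in multi-line form.
  findYoungMales(db.getSiblingDB('friends')).forEach(printjson);
}
```

When every condition refers to a different key, the same conjunction can also be written without $and, as { sex: 'M', dob: { $gt: '1985-01-01' } }.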

In MongoDB, the two logical operators AND and OR are written as $and and $or. The six comparison operators, <, ≤, >, ≥, ≠, and =, are written as $lt, $lte, $gt, $gte, $ne, and $eq; equality is usually written simply as a key-value pair, key: value.
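To show what these operators mean, here is a toy evaluator in plain JavaScript that checks a single document against a filter. It is a teaching sketch only, not MongoDB's implementation: it handles flat documents with string or number values and ignores arrays, nested documents, and MongoDB's rules for comparing values of different types. The names comparators, matches(), and sampleFriend are invented for the example.

```javascript
// The six comparison operators, applied to a field value `a` and an operand `b`.
const comparators = {
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $ne: (a, b) => a !== b,
  $eq: (a, b) => a === b,
};

// Return true when `doc` satisfies every condition in `filter`.
function matches(doc, filter) {
  return Object.keys(filter).every((key) => {
    const condition = filter[key];
    if (key === '$and') return condition.every((f) => matches(doc, f));
    if (key === '$or') return condition.some((f) => matches(doc, f));
    if (condition !== null && typeof condition === 'object') {
      // { field: { $gt: value, ... } }: every listed operator must hold.
      return Object.keys(condition).every((op) => {
        if (!(op in comparators)) throw new Error('unsupported operator: ' + op);
        return comparators[op](doc[key], condition[op]);
      });
    }
    // { field: value } is plain equality.
    return doc[key] === condition;
  });
}

const sampleFriend = { fname: 'Jim', sex: 'M', dob: '1979-11-23' };
console.log(matches(sampleFriend, { $or: [{ sex: 'F' }, { dob: { $lt: '1985-01-01' } }] })); // true
console.log(matches(sampleFriend, { sex: 'M', dob: { $gte: '1985-01-01' } })); // false
```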

After getting used to the syntax, you could almost guess the right form for the update operation. It is illustrated in Figure 10-9:

Figure 10-9. Using the MongoDB update() method

Here we have added a phone number to our friend Tom's document.

You can use the update() method to change existing fields or to add a new one.

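Note that, by default, update() changes only the first document whose existing data satisfy the first argument; passing the option {multi: true} as a third argument makes it change every matching document. With that option, if we had used {'sex':'M'} instead of {'fname':'Tom'}, that phone number would have been added to both documents whose sex field is M.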
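An update of this kind might be sketched as follows. The helper functions addPhone() and addPhoneToAllMales() and the phone number are invented for the example; update(), the $set operator, and the multi option are part of the shell's collection API. The sketch uses $set so that only the phone field is touched, and it passes { multi: true } where every matching document should be changed, since update() otherwise modifies only the first match.

```javascript
// Add (or change) the phone field of the first document with the given fname.
// `database` is assumed to behave like the shell's `db` object.
function addPhone(database, fname, phone) {
  // $set touches only the named field and leaves the rest of the document intact.
  return database.friends.update({ fname: fname }, { $set: { phone: phone } });
}

// Add the same phone number to every document whose sex field is 'M'.
function addPhoneToAllMales(database, phone) {
  // multi: true applies the change to all matching documents, not just the first.
  return database.friends.update({ sex: 'M' }, { $set: { phone: phone } }, { multi: true });
}

// The global `db` exists only inside the MongoDB shell; elsewhere this is skipped.
if (typeof db !== 'undefined') {
  addPhone(db.getSiblingDB('friends'), 'Tom', '555-0100');
}
```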

When you have finished using mongo, execute the quit() method to terminate the session and return to the OS command line. This is illustrated in Figure 10-10:

Figure 10-10. A complete MongoDB shell session

The online MongoDB Manual at https://docs.mongodb.com/manual/introduction/ is an excellent source of examples and further information.