SQL versus NoSQL

Databases are generalized data structures. Both store data, either internally in memory or externally on disk or in the cloud. As data containers, they have a logical structure and a physical structure.

Consider the simplest of data structures: a one-dimensional array a[] of strings. The logical structure of this is shown in Figure 10-2.

Figure 10-2. An array of strings

It is an object, referenced by the variable a. Inside that object is a sequence of numbered storage compartments, each capable of holding a string object.

However, the physical structure, hidden from the programmer, is a sequence of bytes in memory. Using two-byte Unicode characters, the system will allocate 16 bytes to encode the eight characters of the strings. Elsewhere, it will also store information such as the name of the array (a), the datatype of the elements being stored (String), and the hexadecimal starting location of the sequence of 16 bytes.
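The contrast between the two views can be sketched in a few lines of Java. The array contents below are hypothetical, chosen only so that the strings hold eight characters in total, and the class and method names are illustrative. A real JVM also stores object headers and references, and may encode strings more compactly, so this counts only the two-byte character encodings described above.

```java
import java.nio.charset.StandardCharsets;

class ArrayLayout {
    // Physical view (simplified): bytes needed to encode every character
    // of every string as a two-byte UTF-16 code unit, with no byte-order mark.
    static int payloadBytes(String[] a) {
        int total = 0;
        for (String s : a) {
            total += s.getBytes(StandardCharsets.UTF_16BE).length;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical contents: four strings, eight characters in total.
        String[] a = {"AB", "CD", "EF", "GH"};

        // Logical view: numbered compartments, each holding a string.
        for (int i = 0; i < a.length; i++) {
            System.out.println("a[" + i + "] = " + a[i]);
        }

        // Simplified physical view: 8 characters x 2 bytes each.
        System.out.println("payload bytes = " + payloadBytes(a));
    }
}
```

The programmer works only with the indexed compartments; the byte count is something the runtime manages behind the scenes.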

The same dichotomy holds for database structures, except that the actual storage is on disk (or in the cloud) and its complexity is an order of magnitude greater. Fortunately, software designers and engineers can work with the logical structure most of the time.

As discussed in Chapter 5, Relational Databases, the logical structure of a relational database is a collection of tables and associated links. These are maintained by the database system, which is controlled by programs mostly written in the SQL query language.

A NoSQL database does not use tables to store its data. Its logical structure can be imagined as a large collection of key-value maps, each stored as a separate document. As we have just seen, this is similar to relational tables with specified key attributes. But, without the use of SQL, the database is less tightly structured. The trade-off is that, for the kind of operations needed for web-based software, the system will be more flexible and efficient, especially with very large datasets.
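To make the key-value picture concrete, here is a minimal Java sketch that models a collection as a list of maps, one map per document. It is only an in-memory analogy and uses no database API; the class name, the helper method, and the field names (id, title, tags) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DocumentSketch {
    // Builds one "document": a map from field names to values.
    static Map<String, Object> document(String id, String title) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);        // hypothetical key attribute
        doc.put("title", title);  // hypothetical field
        return doc;
    }

    public static void main(String[] args) {
        // A "collection" is simply a group of such documents.
        List<Map<String, Object>> collection = new ArrayList<>();

        Map<String, Object> first = document("b1", "Data Structures");
        Map<String, Object> second = document("b2", "Databases");

        // Unlike rows in a relational table, documents in the same
        // collection need not share the same set of fields.
        second.put("tags", List.of("sql", "nosql"));

        collection.add(first);
        collection.add(second);

        for (Map<String, Object> doc : collection) {
            System.out.println(doc);
        }
    }
}
```

The second document carries a field that the first lacks, which is the looser structure the paragraph above describes. A relational table would instead require every row to follow one fixed set of columns.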

Relational databases and the SQL language were developed in the 1970s. They became the standard database environment for the secure management of stable institutional data. However, with the advance of Web commerce, the demand for data management shifted: datasets became much larger and much more dynamic. NoSQL database systems responded to that shift in requirements.

There are several popular NoSQL database systems in use, including MongoDB, Cassandra, HBase, and Oracle NoSQL Database. We shall examine MongoDB in the following sections.