Chapter 4. Database

All non-trivial abstractions, to some degree, are leaky.

—Joel Spolsky

For many developers, Rails starts with the database. One of the most compelling features of Rails is ActiveRecord, the object-relational mapping(ORM) layer. ActiveRecord does such a good job of hiding the gory details of SQL from the programmer that it almost seems like magic.

However, as Joel Spolsky says, all abstractions are leaky. There is no perfectly transparent ORM system, and there never will be, due to the fundamentally different nature of the object-oriented and relational models. Ignore the underlying database at your own peril.

The Rails community has been built around the MySQL database management system (DBMS[30] ) for years. However, there are still a lot of misconceptions surrounding DBMSs, especially when used with Rails. While MySQL has its place, it is certainly not the only option. In the past few years, support for other databases has vastly grown. I encourage you to keep an open mind throughout this chapter, and weigh all criteria before making a decision on a DBMS.

Rails supports many DBMSs; at the time of this writing, DB2, Firebird, FrontBase, MySQL, OpenBase, Oracle, PostgreSQL, SQLite, Microsoft SQL Server, and Sybase are supported. You will probably know if you need to use a DBMS other than the ones mentioned here. Check the RDoc for the connection adapter for any caveats specific to your DBMS; some features such as migrations are only supported on a handful of connection adapters.

I list PostgreSQL[31] first because it is my platform of choice. It is one of the most advanced open source databases available today. It has a long history, dating back to the University of California at Berkeley's Ingres project from the early 1980s. In contrast to MySQL, Postgres has supported advanced features such as triggers, stored procedures, custom data types, and transactions for much longer.

PostgreSQL's support for concurrency is more mature than MySQL's. Postgres supports multiversion concurrency control (MVCC), which is even more advanced than row-level locking. MVCC can isolate transactions, using timestamps to give each concurrent transaction its own snapshot of the data set. Under the Serializable isolation level, this prevents such problems as dirty reads, nonrepeatable reads, and phantom reads. [32]See the upcoming sidebar, "Multiversion Concurrency Control," for more information about MVCC.

One advantage that PostgreSQL may have in the enterprise is its similarity to commercial enterprise databases such as Oracle, MS SQL Server, or DB2. Although Postgres is not by any means a clone or emulation of any commercial database, it will nevertheless be familiar to programmers and DBAs who have experience with one of the commercial databases. It will also likely be easier to migrate an application from Postgres to (say) Oracle than from MySQL to Oracle.

PostgreSQL has an unfortunate reputation for being slow. It got this reputation because the default configuration is optimized for performance on a tiny machine. Therefore, it will perform fairly consistently out of the box on a server with as little as 64 MB of RAM or as much as 64 GB. Like any database, Postgres must be tuned for any serious use. The official documentation at http://www.postgresql.org/docs/ has lots of great information on performance tuning.

One disadvantage of using PostgreSQL is that it has a smaller community around it. There are more developers, especially in the Rails world, workingwith MySQL. There are more tested solutions built around MySQL than PostgreSQL. The company behind MySQL, MySQL AB, provides commercial support for its product. There is no such centralized support structure for Postgres, as there is no single company behind PostgreSQL; however, there are several companies that specialize in Postgres consulting and provide support contracts.

The MySQL DBMS is controversial. Some hold it to be a toy, while others consider it to be a good foundation for web applications. Nevertheless, MySQL is the dominant DBMS in use for Rails web applications today, and it has improved greatly between versions 3 and 5.

Part of the Rails scalability mantra is "shared nothing": each application server should be able to stand on its own. Thus, you can throw five of them behind a load balancer and it doesn't matter if a user is served by different servers throughout the course of a session. However, the bottleneck is the database. A big assumption of this shared-nothing architecture is that the application servers all share a database. If you use a database that doesn't have great support for concurrency, you will have problems.

Old versions of MySQL had some fairly serious issues, many revolving around the issue of data integrity and constraints. The problem was not so much that the issues existed as that MySQL's developers seemed to have an attitude of "you aren't going to need it." Even transactions are not supported with the default storage engine (MyISAM) to this day. In versions prior to 5.0, there were many bugs that would silently discard incorrect data rather than raising an error. To be fair, new versions of MySQL are addressing a lot of its issues. I would still recommend PostgreSQL as a general rule where speed is not the primary criterion, since it has had enterprise-level features for much longer. If you use MySQL, take these recommendations:

MySQL does have some clear advantages over PostgreSQL in some situations. On the whole, MySQL tends to be faster. For many web applications, query speed may be the most important factor. MySQL also has more stable, tested replication and clustering options available. MySQL is also somewhat better at handling binary data stored in the database (we discuss this at length later in the chapter). For many web applications, MySQL may be a clear win.



[31] Technically pronounced "post-gres-Q-L," and usually just called "Postgres." This is a contender for the least intuitive name in computing today. It has its roots in PostgreSQL's long-ago predecessor, Postgres, which did not support SQL.

[32] For a detailed look at how Postgres handles concurrency, including a summary of the potential problems and how Postgres handles them, see the documentation at http://www.postgresql.org/docs/8.2/interactive/transaction-iso.html.

[33] ACID stands for Atomic, Consistent, Isolated, and Durable, which are necessary properties for transactional integrity within a database. See http://en.wikipedia.org/wiki/ACID for a full definition and explanation.