Chapter 4. Database

All non-trivial abstractions, to some degree, are leaky.

For many developers, Rails starts with the database. One of the most compelling features of Rails is ActiveRecord, the object-relational mapping(ORM) layer. ActiveRecord does such a good job of hiding the gory details of SQL from the programmer that it almost seems like magic.

However, as Joel Spolsky says, all abstractions are leaky. There is no perfectly transparent ORM system, and there never will be, due to the fundamentally different nature of the object-oriented and relational models. Ignore the underlying database at your own peril.

Database Management Systems

The Rails community has been built around the MySQL database management system (DBMS^[30] ) for years. However, there are still a lot of misconceptions surrounding DBMSs, especially when used with Rails. While MySQL has its place, it is certainly not the only option. In the past few years, support for other databases has vastly grown. I encourage you to keep an open mind throughout this chapter, and weigh all criteria before making a decision on a DBMS.

Rails supports many DBMSs; at the time of this writing, DB2, Firebird, FrontBase, MySQL, OpenBase, Oracle, PostgreSQL, SQLite, Microsoft SQL Server, and Sybase are supported. You will probably know if you need to use a DBMS other than the ones mentioned here. Check the RDoc for the connection adapter for any caveats specific to your DBMS; some features such as migrations are only supported on a handful of connection adapters.

PostgreSQL

I list PostgreSQL^[31] first because it is my platform of choice. It is one of the most advanced open source databases available today. It has a long history, dating back to the University of California at Berkeley's Ingres project from the early 1980s. In contrast to MySQL, Postgres has supported advanced features such as triggers, stored procedures, custom data types, and transactions for much longer.

PostgreSQL's support for concurrency is more mature than MySQL's. Postgres supports multiversion concurrency control (MVCC), which is even more advanced than row-level locking. MVCC can isolate transactions, using timestamps to give each concurrent transaction its own snapshot of the data set. Under the Serializable isolation level, this prevents such problems as dirty reads, nonrepeatable reads, and phantom reads. ^[32]See the upcoming sidebar, "Multiversion Concurrency Control," for more information about MVCC.

One advantage that PostgreSQL may have in the enterprise is its similarity to commercial enterprise databases such as Oracle, MS SQL Server, or DB2. Although Postgres is not by any means a clone or emulation of any commercial database, it will nevertheless be familiar to programmers and DBAs who have experience with one of the commercial databases. It will also likely be easier to migrate an application from Postgres to (say) Oracle than from MySQL to Oracle.

PostgreSQL has an unfortunate reputation for being slow. It got this reputation because the default configuration is optimized for performance on a tiny machine. Therefore, it will perform fairly consistently out of the box on a server with as little as 64 MB of RAM or as much as 64 GB. Like any database, Postgres must be tuned for any serious use. The official documentation at http://www.postgresql.org/docs/ has lots of great information on performance tuning.

One disadvantage of using PostgreSQL is that it has a smaller community around it. There are more developers, especially in the Rails world, workingwith MySQL. There are more tested solutions built around MySQL than PostgreSQL. The company behind MySQL, MySQL AB, provides commercial support for its product. There is no such centralized support structure for Postgres, as there is no single company behind PostgreSQL; however, there are several companies that specialize in Postgres consulting and provide support contracts.

MySQL

The MySQL DBMS is controversial. Some hold it to be a toy, while others consider it to be a good foundation for web applications. Nevertheless, MySQL is the dominant DBMS in use for Rails web applications today, and it has improved greatly between versions 3 and 5.

Part of the Rails scalability mantra is "shared nothing": each application server should be able to stand on its own. Thus, you can throw five of them behind a load balancer and it doesn't matter if a user is served by different servers throughout the course of a session. However, the bottleneck is the database. A big assumption of this shared-nothing architecture is that the application servers all share a database. If you use a database that doesn't have great support for concurrency, you will have problems.

Old versions of MySQL had some fairly serious issues, many revolving around the issue of data integrity and constraints. The problem was not so much that the issues existed as that MySQL's developers seemed to have an attitude of "you aren't going to need it." Even transactions are not supported with the default storage engine (MyISAM) to this day. In versions prior to 5.0, there were many bugs that would silently discard incorrect data rather than raising an error. To be fair, new versions of MySQL are addressing a lot of its issues. I would still recommend PostgreSQL as a general rule where speed is not the primary criterion, since it has had enterprise-level features for much longer. If you use MySQL, take these recommendations:

Use version 5.0 or later. Many of the issues that existed with previous versions have been fixed or improved in 5.0 and newer versions.
Use InnoDB for absolutely anything where data integrity or concurrency matter. MyISAM, the default engine on most MySQL installations, does not support features that most RDBMSs consider essential: foreign key constraints, row-level locking, and transactions. In most business environments, these features are non-negotiable. InnoDB is a journaled storage engine that is much more resilient to failures. Rails does the right thing here and defaults to the InnoDB storage engine when creating tables.
Unfortunately, InnoDB can be much slower than MyISAM, and the table sizes are usually several times larger. MyISAM is usually faster when reads vastly out-number writes or vice versa, while InnoDB is generally faster when reads and writes are balanced. It all comes down to the requirements of the specific application; these are general rules. You should always benchmark with your real data, and an accurate sample of queries and statements you will be issuing, in a realistic environment.
There are a few exceptions to this guideline: MyISAM may be a better choice if you need full-text indexing(which is only supported on MyISAM tables at this time). In addition, if raw speed of reads or writes is the primary concern, MyISAM can help. For example, a logging server for web analytics might use MyISAM tables: you want to be able to dump logs into it as fast as possible, and reads are performed far less often than writes.
Set the SQL mode to TRADITIONAL. This can be accomplished with the following command:
```
	SET GLOBAL sql_mode='TRADITIONAL';
```
This will make MySQL a little bit more strict, raising errors on incorrect data rather than silently discarding it.

MySQL does have some clear advantages over PostgreSQL in some situations. On the whole, MySQL tends to be faster. For many web applications, query speed may be the most important factor. MySQL also has more stable, tested replication and clustering options available. MySQL is also somewhat better at handling binary data stored in the database (we discuss this at length later in the chapter). For many web applications, MySQL may be a clear win.

SQLite

SQLite is a minimalist database that is excellent for small projects. Although it does not support many fancy features, it is a great choice for projects that will not grow very large. It supports ACID transactions ^[33] out of the box. SQLite is a library that is linked into your program; there is no server process to speak of. The library code residing in your application's process space accesses a database file.

SQLite provides no concurrency, as there is no server process to enforce the ACID properties. Therefore, it uses file-level locking: the entire database file is locked at the filesystem level during a transaction. Still, for many small applications, it fits the bill perfectly. It is a good replacement for data that may have been stored in flat files, as it supports most of the SQL-92 standard and would be easy to migrate to a more traditional DBMS as needs grow.

Microsoft SQL Server

Though Rails grew up in the Linux/Unix world, it has developed great community support for the Windows platform as well. Not only are Microsoft SQL Server database connections supported in Rails, there are also provisions for connecting to SQL Server from a Linux-based systems as well, using the FreeTDS library. Instructions are at http://wiki.rubyonrails.org/rails/pages/HowtoConnectToMicrosoftSQLServerFromRailsOnLinux; FreeTDS is available from http://www.freetds.org/.

From a Windows client, the standard approach is to use Ruby-DBI (a Ruby database-independent adapter) with ADO. The configuration looks like this:

	development:
	  adapter: sqlserver
	  host: server_name
	  database: my_db
	  username: user
	  password: pass

Your configuration may vary, depending on the version of SQL Server and the ADO libraries you have installed. Once the database configuration is in place, the standard ActiveRecord API methods can be used to manipulate data.

Oracle

Rails supports Oracle versions 8i, 9i, and 10g through the ruby-oci8 library, http://rubyforge.org/projects/ruby-oci8/ which supports the OCI8 API. Windows, Linux, and OS X are supported as clients. The connection configuration is fairly standard, using oci as the connection adapter name.

However, the Oracle client library still maps net service names to connection specifications, so the host parameter provides a service name rather than a physical hostname:

	development:
	  adapter: oci
	  host: ORCL
	  username: user
	  password: pass

The ORCL in the preceding configuration corresponds to an entry in the TNSNAMES. ORA file, which will look something like this:

	ORCL =
	  (DESCRIPTION =
	    (ADDRESS_LIST =
	      (ADDRESS = (PROTOCOL = TCP)(HOST = srv)(PORT = 1521))
	    )
	    ...
	  )

Alternatively, you can provide the connection specification on one line with the Rails database configuration:

	development:
	  dapter: oci
	  host: (DESCRIPTION = (ADDRESS_LIST = (...)))
	  username: user
	  password: pass

The connection setup is the hardest part. Once the database is connected, Rails supports Oracle connections just as it does connections to any other DBMS. Stored procedures and other Oracle-specific syntax are available through the standard methods that expose an SQL interface, such as ActiveRecord::Base.find_by_sql.

^[30] Informally, DBMSs are often referred to as "databases." Consistent with industry practices, in this book "database management system" refers to the software package or installation thereof, while "database" refers to the actual data set being managed.

^[31] Technically pronounced "post-gres-Q-L," and usually just called "Postgres." This is a contender for the least intuitive name in computing today. It has its roots in PostgreSQL's long-ago predecessor, Postgres, which did not support SQL.

^[32] For a detailed look at how Postgres handles concurrency, including a summary of the potential problems and how Postgres handles them, see the documentation at http://www.postgresql.org/docs/8.2/interactive/transaction-iso.html.

^[33]ACID stands for Atomic, Consistent, Isolated, and Durable, which are necessary properties for transactional integrity within a database. See http://en.wikipedia.org/wiki/ACID for a full definition and explanation.