RAID

The Redundant Array of Inexpensive Disks (RAID) approach is the standard way to address both the performance and reliability limitations of individual disk drives. A RAID array puts many disks, typically of exactly the same configuration, into a set that acts like a single disk, but with enhanced performance, reliability, or both. In some cases, the extra reliability comes from computing what's called parity information for writes to the array. Parity is a form of checksum on the data, which allows reconstruction even if some of the information is lost. RAID levels that use parity are space-efficient at writing data in a way that will survive drive failures, but the parity computation overhead can be significant for database applications.
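To make the parity idea concrete, here is a minimal sketch of how XOR-based parity (the scheme used by RAID 5-style arrays) allows a lost block to be rebuilt. The block contents and the three-data-disk stripe are hypothetical, purely for illustration:

```python
# Hypothetical stripe: one block per data disk in a 3-disk set.
data_blocks = [b"\x10\x20", b"\x33\x44", b"\x0f\xf0"]

def xor_blocks(blocks):
    """Compute the byte-wise XOR of a list of equal-length blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            result[i] ^= byte
    return bytes(result)

# The parity block written to the parity drive is the XOR of all
# data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing disk 1: XOR the surviving data blocks with the
# parity block to reconstruct the missing one.
surviving = [data_blocks[0], data_blocks[2], parity]
reconstructed = xor_blocks(surviving)
assert reconstructed == data_blocks[1]
```

The same XOR work has to happen on every write, which is why parity-based levels trade space efficiency for write overhead.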

The most common basic forms of RAID arrays used are as follows:

- RAID 0 (striping): data is spread across multiple drives for higher performance, but a single drive failure loses the entire array.
- RAID 1 (mirroring): every write goes to two drives, so the array survives the failure of either, at the cost of half the raw capacity.
- RAID 10 (1+0): mirrored pairs that are then striped, combining the reliability of mirroring with the performance of striping.
- RAID 5: striping with one drive's worth of parity, surviving a single drive failure.
- RAID 6: striping with two drives' worth of parity, surviving two simultaneous drive failures.

Differences between RAID 5 and RAID 6 can be found at http://www.freeraidrecovery.com/library/raid-5-6.aspx.

To be fair, in any disk performance comparison, you need to consider that most systems are going to get their net performance from several disks, such as in a RAID array. Since SATA disks are individually cheaper, you might be able to purchase considerably more of them for the same budget than if you had picked SAS instead. In cases where your application benefits usefully from being spread over more disks, being able to get more of them per dollar can result in an overall faster system. Note that the upper limit here will often be your server's physical hardware.
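The more-spindles-per-dollar trade-off can be sketched with some back-of-the-envelope arithmetic. All prices, throughput figures, and the bay limit below are hypothetical assumptions, purely for illustration:

```python
# Hypothetical per-drive prices and sequential throughput figures.
budget = 2400  # dollars
sata = {"price": 100, "mb_per_sec": 150}  # cheaper, slower individual drive
sas = {"price": 400, "mb_per_sec": 250}   # pricier, faster individual drive

def aggregate_throughput(drive, budget, max_bays=16):
    """Aggregate sequential throughput if reads stripe evenly across
    every drive the budget buys, capped by the server's drive bays."""
    count = min(budget // drive["price"], max_bays)
    return count, count * drive["mb_per_sec"]

sata_count, sata_mb = aggregate_throughput(sata, budget)  # 16 drives, 2400 MB/s
sas_count, sas_mb = aggregate_throughput(sas, budget)     # 6 drives, 1500 MB/s
# With these assumed numbers, the larger SATA set out-streams the SAS set,
# but only if the workload actually scales across that many spindles
# and the chassis has the bays to hold them.
```

Note how the bay cap bites here: the budget would buy 24 SATA drives, but only 16 fit, which is exactly the kind of physical limit the text describes.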

You only have so many storage bays or controller ports available, and larger enclosures can cost more, both up-front and over their lifetime. It's easy to find situations where a smaller number of faster drives, which SAS provides, is the better way to go. This is why it's so important to constantly benchmark both hardware and your database application, to get a feel for how well performance improves as the disk count increases.