PostgreSQL uses a write-ahead log (WAL) to write data in a way that survives a database or hardware crash. This is similar to the log buffer or REDO log found in other databases. The database documentation covers the motivation and implementation of the WAL at https://www.postgresql.org/docs/current/static/wal-intro.html.
To quote from the introduction:

"WAL's central concept is that changes to data files (where tables and indexes reside) must be written only after those changes have been logged, that is, after log records describing the changes have been flushed to permanent storage."

This procedure ensures that if your application has received a COMMIT for a transaction, that transaction is on permanent storage and will not be lost even if there is a crash. This satisfies the durability portion of the ACID (atomicity, consistency, isolation, durability) expectations databases aim to satisfy.
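You can watch that logging happen from any session. Here is a minimal psql sketch; note that on PostgreSQL 10 and later the relevant function is pg_current_wal_insert_lsn(), while older releases named it pg_current_xlog_insert_location():

```sql
-- Watch the WAL insert position advance as changes are logged
SELECT pg_current_wal_insert_lsn();   -- note the starting position
CREATE TABLE wal_demo (i integer);
INSERT INTO wal_demo SELECT generate_series(1, 1000);
SELECT pg_current_wal_insert_lsn();   -- the position has moved forward
```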
The tricky part of the WAL implementation is the "flushed to permanent storage" part, which you might be surprised to learn takes several pages to cover in this chapter alone, and comes back up again in later chapters, too.
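As a quick preview of where those details live, the following psql sketch shows the standard PostgreSQL parameters that control whether and how WAL writes reach permanent storage; each is covered in detail later:

```sql
-- Standard PostgreSQL parameters controlling WAL flush behavior
SHOW fsync;               -- on: WAL writes are forced out to permanent storage
SHOW synchronous_commit;  -- on: COMMIT does not return until the WAL flush completes
SHOW wal_sync_method;     -- which operating system call is used to force the flush
```

The first two are durability-versus-speed trade-offs; the third selects the mechanism used to perform the flush.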
Hard drive reliability studies
Expected reliability is also an important thing to prioritize. There have been three excellent studies of large numbers of disk drives published in the last few years:
- Google: Failure Trends in a Large Disk Drive Population (http://research.google.com/archive/disk_failures.pdf)
- Carnegie Mellon study: Disk failures in the real world (http://www.cs.cmu.edu/~bianca/fast07.pdf)
- University of Wisconsin-Madison and Network Appliance: An Analysis of Data Corruption in the Storage Stack (long version: http://www.usenix.org/event/fast08/tech/full_papers/bairavasundaram/bairavasundaram.pdf; shorter version: http://www.usenix.org/publications/login/2008-06/openpdfs/bairavasundaram.pdf)
Neither the Google nor the Carnegie Mellon study shows any significant bias toward the SCSI/SAS family of disks being more reliable. But the University of Wisconsin/NetApp study suggests that "SATA disks have an order of magnitude higher probability of developing checksum mismatches than Fibre Channel disks." That matches the idea suggested above: error handling under SAS is usually more robust than on similar SATA drives. Since SAS drives are also more expensive, whether this improved error handling is worth paying for depends on your business requirements for reliability, and those requirements may not be the same for every database server you run. Systems where a single master database feeds multiple slaves, for example, will obviously favor putting the better components in the master.
You can find both statistically reliable and unreliable hard drives with either type of connection. One good practice is to only deploy drives that have been on the market long enough that you can get good data from actual customers on the failure rate. Newer drives using newer technology usually fail more often than slightly older designs that have been proven in the field, so if you can't find any reliability surveys, that's a reason to be suspicious. Reliability matters more than raw speed here: the data is the most precious thing a database manages, so selecting a reliable drive is a critical part of any database deployment.