Chapter 7. Backups

There are so many enemies of your data. When it comes to disks, it’s not a question of whether your hard drives will fail but when. Beyond hard drive failure, there are rm, dd, and a number of other Linux commands that are incredibly efficient at destroying your data. Just ask a good friend of mine who was trying to clean up his MP3 directory. A number of us were helping him perfect a find script that would delete every file in his MP3 directory that did not end in .mp3. Despite our warnings to test the script with echo first, he ran the full command: find . -type f ! -name '*.mp3' -exec rm -f {} \;. At first it appeared to be working, until he discovered he hadn’t run the command in his MP3 directory; he had run it in ~, his home directory. True, he had cleaned up his MP3 directory, along with the rest of his files. The bottom line is that the only real way to ensure that your data is safe is to back it up.
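Testing with echo first would have exposed the problem before anything was deleted. The following is a minimal sketch of that habit, assuming a hypothetical MP3_DIR variable (set here to a throwaway temporary directory so the demo is harmless). Naming the directory explicitly, rather than relying on whatever directory you happen to be in, also removes the exact mistake in the story.

```shell
#!/bin/sh
# Hypothetical stand-in for the MP3 directory: a throwaway temp dir.
MP3_DIR=$(mktemp -d)
touch "$MP3_DIR/song.mp3" "$MP3_DIR/cover.jpg" "$MP3_DIR/notes.txt"

# Name the directory explicitly instead of relying on ".", and let
# ${MP3_DIR:?} abort the script if the variable is ever empty or unset.
# Dry run first: echo prints each rm command instead of running it.
find "${MP3_DIR:?}" -type f ! -name '*.mp3' -exec echo rm -f {} \;

# Only after reviewing that list, run the same command without echo.
find "${MP3_DIR:?}" -type f ! -name '*.mp3' -exec rm -f {} \;
```

The dry run prints one rm -f command per file it would delete (here cover.jpg and notes.txt). Only if that list contains nothing you want to keep should you run the second command.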

There are any number of ways to back up data under Ubuntu, and in this chapter I cover a graphical tool called BackupPC. I also discuss some commonsense backup tips and describe how to create a full image of a drive or partition. I include some special considerations for when you’re backing up a database. By the end of the chapter, if you haven’t set up a backup system yet, I hope you will be encouraged by how easy it is under Ubuntu.

Backup Principles

There are a number of principles that should guide you when you set up your backup strategy. Most of these are common sense, but they bear repeating:

Back up data to a separate system.

That separate system might be a separate drive, a tape, or ideally a completely separate host. The point is never to back up a drive’s data to that same drive. You really want your backups to be as far removed from the system as possible. Even for my personal data at home, I have a backup system in place that copies my most important files to a server out of state. That way, if my house burned down or serious file system corruption hit my server, my important data would still exist.

Test your backups.

If you haven’t successfully restored from a backup, you haven’t truly backed anything up. After you set up a backup system, make sure that you can actually restore from it, and then test your restore process periodically afterward. The worst time to find out a backup didn’t work is when you really need a file.
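A restore test can be as simple as restoring into a scratch directory and comparing the result with the original. This sketch uses tar only as a stand-in for whatever backup tool you actually use; the directory and file names are hypothetical, and everything lives in a temporary directory.

```shell
#!/bin/sh
set -e
# Hypothetical data directory and scratch restore area (temp dirs for the demo).
WORK=$(mktemp -d)
mkdir -p "$WORK/data/docs"
echo "quarterly report" > "$WORK/data/docs/report.txt"
echo "config" > "$WORK/data/settings.conf"

# Back up: archive the data directory, stored relative to $WORK.
tar -czf "$WORK/backup.tar.gz" -C "$WORK" data

# Restore into a separate scratch directory, never over the live data.
mkdir "$WORK/restore"
tar -xzf "$WORK/backup.tar.gz" -C "$WORK/restore"

# Compare the restored tree with the original; diff -r lists any
# missing or changed files and exits non-zero if it finds any.
if diff -r "$WORK/data" "$WORK/restore/data"; then
    echo "restore test passed"
else
    echo "restore test FAILED" >&2
    exit 1
fi
```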

RAID is not a substitute for backups.

A common mistake among beginning administrators is to treat RAID as a backup. RAID provides redundancy for hard disks so that if a particular disk fails, your data remains safe on the other disks. RAID does not protect you from a user deleting a file or, worse, from complete file system corruption. In a RAID mirror, every write goes to all drives in the mirror, so bad data written to one drive is simply replicated to the others. On top of this, it’s not unheard of for a RAID controller to die and write bad data to the disks as it goes down. In any of these cases, if you did not keep a backup separate from your RAID, your data would be gone.

Create full and incremental backup schedules.

The majority of files on a server tend to stay the same, particularly the core OS files. For this reason, most administrators opt for a combination of full backups (a complete copy of every file) at a longer interval, such as weekly, and incremental backups (only the files that have changed since the last backup) at a shorter interval, usually daily. Since incremental backups generally involve fewer files, they take up less space and are faster to complete. Just keep in mind that when you restore multiple files, some of them may not be in the latest incremental backup, because they haven’t changed since an earlier full or incremental backup. If you aren’t sure every file you need made it into the last incremental, the safe approach is to restore from the full backup and then apply each subsequent incremental in order.
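GNU tar, the tar that ships with Ubuntu, can produce this kind of full-plus-incremental set through its --listed-incremental (-g) option, which records file state in a snapshot file between runs. The sketch below is a minimal illustration under assumed names (a data directory and a data.snar snapshot file inside a temporary directory), not a complete backup system; it also follows the restore order described above.

```shell
#!/bin/sh
set -e
# Demo layout in a temp dir; "data" stands in for the directory you back up.
WORK=$(mktemp -d)
mkdir -p "$WORK/data"
echo "v1" > "$WORK/data/unchanged.txt"
echo "v1" > "$WORK/data/edited.txt"

# Full (level 0) backup: the snapshot file does not exist yet, so GNU tar
# archives everything and records file state in data.snar.
tar -czf "$WORK/full.tar.gz" -g "$WORK/data.snar" -C "$WORK" data

sleep 1   # make the following changes clearly newer than the full backup
echo "v2" > "$WORK/data/edited.txt"
echo "new" > "$WORK/data/added.txt"

# Incremental backup: reusing the same snapshot file makes tar store only
# files changed since the previous run (and updates data.snar again).
tar -czf "$WORK/incr1.tar.gz" -g "$WORK/data.snar" -C "$WORK" data

# Restore: the full backup first, then every incremental in order.
# -g /dev/null tells tar to treat the archives as incremental on extraction.
mkdir "$WORK/restore"
for archive in full incr1; do
    tar -xzf "$WORK/$archive.tar.gz" -g /dev/null -C "$WORK/restore"
done
```

In a real weekly and daily rotation, you would start each week’s full backup with a fresh snapshot file (for example, by removing the old one) and then reuse that file for each daily incremental.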

Decide how often to back up.

A common question is “How often should I back up?” The basic answer is another question: “How much work can you afford to lose?” Many organizations can stand losing up to a day’s work, so they back up nightly. If you can afford to lose only a few hours of work, then you need to back up your data every few hours.

Archive your backups.

While it would be nice to save backups forever, the reality is that backups can consume an incredible amount of space, and you may be able to keep only a month’s worth on your system before you run out. Even so, consider archiving old backups to separate storage, such as a tape, a USB drive, or even DVDs that you label and store in a vault. Many organizations keep a month’s worth of backups online and then archive a full backup every month, every quarter, or every year. That way they have a snapshot of their data at that point in time, so even if the backup server itself were to catch fire, a version of the data would still be available.
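Retention and archiving are usually handled by the backup tool itself, but the idea is easy to sketch by hand. This example assumes a hypothetical layout in which each backup is a *.tar.gz file in a single directory; it copies one backup to separate archive storage and then prunes anything older than 30 days. Both directories are temporary stand-ins.

```shell
#!/bin/sh
set -e
# Hypothetical backup directory, filled with dated demo files.
BACKUP_DIR=$(mktemp -d)
ARCHIVE_DIR=$(mktemp -d)   # stand-in for a tape, USB drive, and so on
KEEP_DAYS=30

touch -d "2 days ago"  "$BACKUP_DIR/daily-recent.tar.gz"
touch -d "45 days ago" "$BACKUP_DIR/daily-old.tar.gz"

# Copy a full backup to separate archive storage before it ages out.
cp -p "$BACKUP_DIR/daily-recent.tar.gz" "$ARCHIVE_DIR/monthly.tar.gz"

# Delete backups last modified more than KEEP_DAYS days ago
# (find counts age in whole days).
find "$BACKUP_DIR" -type f -name '*.tar.gz' -mtime +"$KEEP_DAYS" -delete
```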

Drive Imaging

An image is a complete bit-for-bit copy of a drive. Once you image a drive, the image should be indistinguishable from the original. One of the most reliable, if wasteful, methods for backing up a system is to take an image of its drives; it is wasteful because an image copies every block, including empty space. Even if you don’t use drive imaging as your backup strategy, you will find a number of other circumstances where drive images come in handy, from cloning a system to file system recovery to forensics.
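Because an image is a bit-for-bit copy, you can verify one directly. This sketch uses a small regular file as a hypothetical stand-in for a drive, so it needs no root access and touches no real disks; cmp and sha256sum then confirm that the copy matches the original.

```shell
#!/bin/sh
set -e
# A small regular file stands in for a drive, so no root access is needed.
WORK=$(mktemp -d)
dd if=/dev/urandom of="$WORK/fake-drive.img" bs=1024 count=512 2>/dev/null

# Take the image.
dd if="$WORK/fake-drive.img" of="$WORK/copy.img" bs=64K 2>/dev/null

# cmp exits 0 only if the two files are identical byte for byte.
cmp "$WORK/fake-drive.img" "$WORK/copy.img" && echo "image matches original"

# Matching checksums give the same answer and can be recorded for later checks.
sha256sum "$WORK/fake-drive.img" "$WORK/copy.img"
```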

Warning

When imaging a drive, it’s important that the drive not be in use. If the drive changes while you image it, you can’t guarantee that the image is consistent, so be sure that every file system on the drive is unmounted. This requirement is yet another reason why most people don’t use imaging as their primary backup strategy.
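A quick check before imaging can catch a mounted file system. This is a minimal sketch with a hypothetical helper named is_mounted: it scans /proc/mounts (or a file you pass in, which makes it easy to try out) for any mount whose source starts with the device path, so checking /dev/sdb also catches /dev/sdb1 and /dev/sdb2.

```shell
#!/bin/sh
# is_mounted DEVICE [MOUNTS_FILE]
# Succeeds if any entry in MOUNTS_FILE (default /proc/mounts) has a source
# that starts with DEVICE, so /dev/sdb also matches /dev/sdb1, /dev/sdb2, ...
is_mounted() {
    awk -v dev="$1" 'index($1, dev) == 1 { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}

# Hypothetical demo: a fake mounts file in which a partition of /dev/sdb is mounted.
DEMO_MOUNTS=$(mktemp)
printf '%s\n' '/dev/sdb1 /mnt/data ext4 rw,relatime 0 0' > "$DEMO_MOUNTS"

if is_mounted /dev/sdb "$DEMO_MOUNTS"; then
    echo "/dev/sdb has a mounted file system; unmount it before imaging" >&2
fi
```

On a real system, is_mounted /dev/sdb && echo "unmount it first" checks the live /proc/mounts. The check is deliberately simple: it matches by prefix, so /dev/sda would also match /dev/sdaa, and it won’t notice file systems mounted through LVM or other device-mapper names under /dev/mapper, or active swap partitions, which do not appear in /proc/mounts.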

The classic UNIX imaging tool is dd, and you will find it on just about any Linux system, including every Ubuntu server. This straightforward and blunt tool, in its most basic form, reads an input file block by block and writes an exact copy of it to an output file. If you had two drives of identical size, /dev/sda and /dev/sdb, here is the command to image sda to sdb:

$ sudo dd if=/dev/sda of=/dev/sdb
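Since dd overwrites the target without asking, it is worth checking sizes before a drive-to-drive copy: the target must be at least as large as the source. This sketch wraps the command above in a hypothetical clone function that compares sizes first, using blockdev --getsize64 for block devices and stat for regular files (so it can be tried safely on image files). The 4 MiB block size is an assumed, commonly used value; dd’s default of 512 bytes works but is usually much slower.

```shell
#!/bin/sh
# Print the size in bytes of a block device or a regular file.
size_of() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"   # block devices (needs root)
    else
        stat -c %s "$1"             # regular files, such as image files
    fi
}

# clone SRC DST: copy SRC to DST with dd, but only if DST is at least
# as large as SRC.
clone() {
    src=$1 dst=$2
    src_size=$(size_of "$src") || return 1
    dst_size=$(size_of "$dst") || return 1
    if [ "$dst_size" -lt "$src_size" ]; then
        echo "refusing: $dst ($dst_size bytes) is smaller than $src ($src_size bytes)" >&2
        return 1
    fi
    # Larger blocks are usually much faster than dd's 512-byte default.
    dd if="$src" of="$dst" bs=4M
}
```

Run as root, clone /dev/sda /dev/sdb behaves like the original dd command but refuses to run if sdb is smaller. On current Ubuntu releases, GNU dd also accepts status=progress to report progress during long copies.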

Of course, dd can use any file as its input or output, so instead of imaging to another drive, you could image to a file. This is particularly handy for forensics, when you might have a number of file system images stored on a single large USB drive. Assuming you have mounted your USB drive at /media/disk1, here is how you could image /dev/sda to a file on that drive: