Using tar to Create and Unpack Archives

tar (Section 38.2) is a general-purpose archiving utility capable of packing many files into a single archive file, retaining information such as file permissions and ownership. The name tar stands for tape archive, because the tool was originally used to archive files as backups on tape. However, use of tar is not at all restricted to making tape backups, as we'll see.

The format of the tar command is:

tar functionoptions files...

where function is a single letter indicating the operation to perform, options is a list of (single-letter) options to that function, and files is the list of files to pack or unpack in an archive. (Note that function is not separated from options by any space.)

function can be one of:

c: Create a new archive.
x: Extract files from an archive.
t: List the contents of an archive.
r: Append files to the end of an archive.
u: Update files that are newer than those in the archive.
d: Compare files in the archive to those in the filesystem.

The most commonly used functions are create (c), extract (x), and table-of-contents (t).
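As a rough illustration of these functions, here is a minimal sketch that creates an archive, appends to it, lists it, and extracts it again. The scratch directory made with mktemp and the names demo, one.txt, and two.txt are assumptions chosen for the example, not anything tar requires.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work" || exit 1
mkdir demo
echo "first" > demo/one.txt
echo "second" > demo/two.txt

tar cf demo.tar demo/one.txt      # c: create an archive holding one file
tar rf demo.tar demo/two.txt      # r: append a second file to the end
listing=$(tar tf demo.tar)        # t: list the table of contents
echo "$listing"

rm -rf demo
tar xf demo.tar                   # x: extract everything again
restored=$(cat demo/two.txt)
echo "$restored"

cd / && rm -rf "$work"            # clean up the scratch directory
```

Note that r works only on an uncompressed archive; appending to a gzipped tar file is generally not possible.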

The most common options are:

v: Prints verbose information when packing or unpacking archives. This makes tar show the files it is archiving or restoring. It is good practice to use this option so that you can see what actually happens, though if you're using tar in a shell script you might skip it so as to avoid spamming the user of your script.
k: Keeps any existing files when extracting — that is, prevents overwriting any existing files contained within the tar file.
f filename: Specifies that the tar file to be read or written is filename.
z: Specifies that the data to be written to the tar file should be compressed or that the data in the tar file is compressed with gzip. (Not available on all tars.)

There are other options, which we cover in Section 38.5. Section 38.12 has more information about the order of tar options, and Section 39.3 has a lot more about GNU tar.
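To make the z and k options concrete, the sketch below compresses an archive with gzip and then shows that k leaves an existing file alone. It assumes a tar that supports z and has gzip available; the directory notes and the file a.txt are made up for the example.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir notes
echo "original" > notes/a.txt

# z: compress the archive with gzip while creating it.
tar czf notes.tar.gz notes

# Change the file on disk after it has been archived.
echo "edited locally" > notes/a.txt

# k: refuse to overwrite files that already exist. Some tars (GNU tar,
# for one) report each such file as an error and exit non-zero, so we
# tolerate a failure status here.
tar xzkf notes.tar.gz || echo "tar declined to overwrite existing files"

kept=$(cat notes/a.txt)
echo "$kept"                      # the local edit survives

cd / && rm -rf "$work"
```

Without k, the extraction would have replaced notes/a.txt with the archived copy.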

Although the tar syntax might appear complex at first, in practice it's quite simple. For example, say we have a directory named mt, containing these files:

rutabaga% ls -l mt
total 37
-rw-r--r--   1 root     root           24 Sep 21  1993 Makefile
-rw-r--r--   1 root     root          847 Sep 21  1993 README
-rwxr-xr-x   1 root     root         9220 Nov 16 19:03 mt
-rw-r--r--   1 root     root         2775 Aug  7  1993 mt.1
-rw-r--r--   1 root     root         6421 Aug  7  1993 mt.c
-rw-r--r--   1 root     root         3948 Nov 16 19:02 mt.o
-rw-r--r--   1 root     root        11204 Sep  5  1993 st_info.txt

We wish to pack the contents of this directory into a single tar archive. To do this, we use the following command:

tar cf mt.tar mt

The first argument to tar is the function (here, c, for create) followed by any options. Here, we use the one option f mt.tar, to specify that the resulting tar archive be named mt.tar. The last argument is the name of the file or files to archive; in this case, we give the name of a directory, so tar packs all files in that directory into the archive.

Note that the first argument to tar must be a function letter followed by a list of options. Because of this, there's no reason to use a hyphen (-) to precede the options as many Unix commands require. tar allows you to use a hyphen, as in:

tar -cf mt.tar mt

but it's really not necessary. In some versions of tar, the first letter must be the function, as in c, t, or x. In other versions, the order of letters does not matter as long as there is one and only one function given.

The function letters as described here follow the so-called "old option style." There is also a newer "short option style," in which you precede the function options with a hyphen. On some versions of tar, a "long option style" is available, in which you use long option names with two hyphens. See the manpage or info page (Section 2.9) for tar for more details if you are interested.
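The three styles can be compared side by side. This is only a sketch: the long option names --create, --list, and --file are the ones GNU tar uses, and we assume your tar accepts them; the archive names old.tar, short.tar, and long.tar are invented for the example.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir mt
echo "all:" > mt/Makefile

tar cf old.tar mt                    # old option style: no hyphen
tar -cf short.tar mt                 # short option style: one hyphen
tar --create --file=long.tar mt      # long option style: two hyphens

# List each archive using the same style that created it.
old=$(tar tf old.tar)
short=$(tar -tf short.tar)
long=$(tar --list --file=long.tar)
echo "$old"

cd / && rm -rf "$work"
```

All three commands should produce archives with identical tables of contents; only the spelling of the request differs.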

It is often a good idea to use the v option with tar to list each file as it is archived. For example:

rutabaga% tar cvf mt.tar mt
mt/
mt/st_info.txt
mt/README
mt/mt.1
mt/Makefile
mt/mt.c
mt/mt.o
mt/mt

On some tars, if you use v multiple times, additional information will be printed, as in:

rutabaga% tar cvvf mt.tar mt
drwxr-xr-x root/root         0 Nov 16 19:03 1994 mt/
-rw-r--r-- root/root     11204 Sep  5 13:10 1993 mt/st_info.txt
-rw-r--r-- root/root       847 Sep 21 16:37 1993 mt/README
-rw-r--r-- root/root      2775 Aug  7 09:50 1993 mt/mt.1
-rw-r--r-- root/root        24 Sep 21 16:03 1993 mt/Makefile
-rw-r--r-- root/root      6421 Aug  7 09:50 1993 mt/mt.c
-rw-r--r-- root/root      3948 Nov 16 19:02 1994 mt/mt.o
-rwxr-xr-x root/root      9220 Nov 16 19:03 1994 mt/mt

This is especially useful as it lets you verify that tar is doing the right thing.

In some versions of tar, f must be the last letter in the list of options. This is because tar expects the f option to be followed by a filename — the name of the tar file to read from or write to. If you don't specify f filename at all, tar uses a default tape device (some versions of tar use /dev/rmt0 for historical reasons regardless of the OS; some have a slightly more specific default). Section 38.5 talks about using tar in conjunction with a tape drive to make backups.

Now we can give the file mt.tar to other people, and they can extract it on their own system. To do this, they would use the command:

tar xvf mt.tar

This creates the subdirectory mt and places all the original files into it, with the same permissions as found on the original system. The new files will be owned by the user running tar xvf (you) unless you are running as root, in which case the original owner is generally preserved. Some versions require the o option to set ownership. The x function stands for "extract." The v option is used again here to list each file as it is extracted. This produces:

courgette% tar xvf mt.tar
mt/
mt/st_info.txt
mt/README
mt/mt.1
mt/Makefile
mt/mt.c
mt/mt.o
mt/mt
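If you want to convince yourself that permissions survive the round trip, a small check along these lines may help. It is a sketch under assumptions: the umask of 022 is set only to make the result predictable (a stricter umask can remove permission bits when a non-root user extracts), and the mode 640 and the file name README are arbitrary.

```shell
umask 022                                 # assumed umask for a predictable result
work=$(mktemp -d)
cd "$work" || exit 1
mkdir mt
echo "data" > mt/README
chmod 640 mt/README

before=$(ls -l mt/README | cut -c1-10)    # permission string before archiving
tar cf mt.tar mt
rm -rf mt
tar xf mt.tar
after=$(ls -l mt/README | cut -c1-10)     # permission string after extracting
echo "$before -> $after"

cd / && rm -rf "$work"
```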

We can see that tar saves the pathname of each file relative to the location where the tar file was originally created. That is, when we created the archive using tar cf mt.tar mt, the only input filename we specified was mt, the name of the directory containing the files. Therefore, tar stores the directory itself and all of the files below that directory in the tar file. When we extract the tar file, the directory mt is created and the files are placed into it, which is the exact inverse of what was done to create the archive.

If you were to pack up the contents of your /bin directory with the command:

tar cvf bin.tar /bin

you could cause terrible mistakes later. Extracting a tar file packed with the absolute pathname /bin could trash the contents of your /bin directory. If you want to archive /bin, you should create the archive from the root directory, /, using the relative pathname (Section 1.16) bin (with no leading slash). If you really do want to overwrite /bin, cd to / before extracting the tar file. Section 38.11 explains the problem and lists workarounds.
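The relative-pathname approach looks roughly like this. To keep the sketch safe to run, a scratch directory held in the hypothetical variable root stands in for /, and a second scratch directory, dest, plays the part of the place where the archive is unpacked; the script hello is invented for the example.

```shell
# $root stands in for / so that nothing under the real /bin is touched.
root=$(mktemp -d)
mkdir "$root/bin"
echo "#!/bin/sh" > "$root/bin/hello"

# Create the archive from the top directory, naming bin with no leading slash.
(cd "$root" && tar cf "$root/bin.tar" bin)

stored=$(tar tf "$root/bin.tar")
echo "$stored"                    # stored names are relative to $root

# Extraction lands under whichever directory you cd to first.
dest=$(mktemp -d)
(cd "$dest" && tar xf "$root/bin.tar")
found=$(ls "$dest/bin")
echo "$found"

rm -rf "$root" "$dest"
```

Because the stored names are relative, the person extracting the archive decides where the files go simply by choosing the directory to extract in.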

Another way to create the tar file mt.tar would be to cd into the mt directory itself, and use a command such as:

tar cvf mt.tar *

This way the mt subdirectory would not be stored in the tar file; when extracted, the files would be placed directly in your current working directory. One fine point of tar etiquette is always to pack tar files so that they contain a subdirectory, as we did in the first example with tar cvf mt.tar mt. Then, when the archive is extracted, the subdirectory is created and the files are placed in it. The files are tucked out of the way instead of being scattered through the current working directory, which prevents confusion. This also saves the person doing the extraction the trouble of having to create a separate directory (should they wish to do so) to unpack the tar file. Of course, there are plenty of situations where you wouldn't want to do this. So much for etiquette.
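The difference between the two ways of packing shows up in the table of contents. The sketch below builds both kinds of archive from the same files; the names with-dir.tar and flat.tar are assumptions made for the example, and the flat archive is written to the parent directory purely to keep the two archives side by side.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir mt
echo "all:" > mt/Makefile
echo "read me" > mt/README

# Packed from the parent directory: every name carries the mt/ prefix.
tar cf with-dir.tar mt

# Packed from inside mt with a wildcard: the shell expands * to the
# filenames, so the stored names have no directory prefix.
(cd mt && tar cf ../flat.tar *)

with_dir=$(tar tf with-dir.tar)
flat=$(tar tf flat.tar)
echo "with subdirectory:"; echo "$with_dir"
echo "flat:";              echo "$flat"

cd / && rm -rf "$work"
```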

When creating archives, you can, of course, give tar a list of files or directories to pack into the archive. In the first example, we gave tar the single directory mt, but in the previous paragraph we used the wildcard *, which the shell expands into the list of filenames in the current directory.

Before extracting a tar file, it's usually a good idea to take a look at its table of contents to determine how it was packed. This way you can determine whether you do need to create a subdirectory yourself where you can unpack the archive. A command such as:

tar tvf tarfile

lists the table of contents for the named tarfile. Note that when using the t function, only one v is required to get the long file listing, as in this example:

courgette% tar tvf mt.tar
drwxr-xr-x root/root         0 Nov 16 19:03 1994 mt/
-rw-r--r-- root/root     11204 Sep  5 13:10 1993 mt/st_info.txt
-rw-r--r-- root/root       847 Sep 21 16:37 1993 mt/README
-rw-r--r-- root/root      2775 Aug  7 09:50 1993 mt/mt.1
-rw-r--r-- root/root        24 Sep 21 16:03 1993 mt/Makefile
-rw-r--r-- root/root      6421 Aug  7 09:50 1993 mt/mt.c
-rw-r--r-- root/root      3948 Nov 16 19:02 1994 mt/mt.o
-rwxr-xr-x root/root      9220 Nov 16 19:03 1994 mt/mt

No extraction is being done here; we're just displaying the archive's table of contents. We can see from the filenames that this file was packed with all files in the subdirectory mt, so that when we extract the tar file, the directory mt will be created, and the files placed there.
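If you check archives often, the inspection can be scripted. The following is one possible sketch, not a standard tool: it takes the first pathname component of every entry in the table of contents and reports whether they all share a single top-level name. The variable names tops, count, and verdict are invented here, and entries stored with a leading ./ would need extra handling.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir mt
echo "int main(void) { return 0; }" > mt/mt.c
echo "read me" > mt/README
tar cf mt.tar mt

# Take the first path component of every entry and collapse duplicates.
tops=$(tar tf mt.tar | cut -d/ -f1 | sort -u)
count=$(printf '%s\n' "$tops" | wc -l | tr -d ' ')

if [ "$count" -eq 1 ]; then
    verdict="everything unpacks under $tops"
else
    verdict="make a directory first: $count top-level entries"
fi
echo "$verdict"

cd / && rm -rf "$work"
```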

You can also extract individual files from a tar archive. To do this, use the command:

tar xvf tarfile files

where files is the list of files to extract. As we've seen, if you don't specify any files, tar extracts the entire archive.

When specifying individual files to extract, you must give the full pathname as it is stored in the tar file. For example, if we wanted to grab just the file mt.c from the previous archive mt.tar, we'd use the command:

tar xvf mt.tar mt/mt.c

This would create the subdirectory mt and place the file mt.c within it.
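A short sketch confirms that only the named member is restored. The archive is rebuilt in a scratch directory first so the example is self-contained; the file contents are placeholders.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir mt
echo "int main(void) { return 0; }" > mt/mt.c
echo "read me" > mt/README
tar cf mt.tar mt
rm -rf mt                         # remove the originals before extracting

# Name the member exactly as the table of contents shows it.
tar xf mt.tar mt/mt.c

extracted=$(ls mt)
echo "$extracted"                 # README was not named, so it is not restored

cd / && rm -rf "$work"
```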

tar has many more options than those mentioned here. These are the features that you're likely to use most of the time, but GNU tar, in particular, has extensions that make it ideal for creating backups and the like. See the tar manpage or info page (Section 2.9) and the following chapter for more information.

MW, MKD, and LK