Chapter 8
Working with Text Files

A proficient Linux systems administrator is capable of using at least one text editor in Linux. This skill is needed to manage daily work such as modifying configuration files and creating shell scripts. You have the choice of several editors. Many individuals find a particular editor whose functionality they love and use that one exclusively. This chapter provides a brief sampling of two text editors that are popular with admins.

When managing any computer server, you'll often have files that contain large amounts of text data. It's typically difficult to handle the information and make it useful. It's easy to get data overload when working with system commands. Fortunately, Linux provides several command‐line utilities that help you manage large amounts of data.

The vim Editor

When managing a Linux system, it's wise to gain proficiency using at least one text editor. Using features such as searching, cutting, and pasting allows you to modify configuration files more quickly. In Chapter 19, “Writing Scripts,” you'll need the text editor skills that you learn in this chapter to quickly create Bash shell scripts. This section walks you through the basics of using the vim editor, which is typically available for most Linux distros.

Checking Your vim Editor Program

Before you begin your exploration of the vim editor, it's a good idea to understand what “flavor” of vim your Linux system has installed. Some distributions install a fully functioning vim editor by default, but others do not, which can cause you some difficulties.

Three commands can help you determine whether your Linux distro has a fully functioning vim editor.

  • The alias Command To display whether a command calls a different command and/or uses additional command options, you type alias command‐name at the command‐line interface (CLI). For our purposes here, we need to see if the typical command to run the vim editor, vi, is aliased to the vim command or not. Thus, alias vi will let us know this information. If vi is not aliased, it's our first clue that a fully functioning vim editor is probably not installed.
  • The which Command This particular command is useful in many circumstances. When you type which command‐name , it shows you exactly where the program associated with the command‐name resides in the Linux virtual directory system. For our investigation, we'll need to use the which vim command to get the location, so we can properly use the next command, readlink.
  • The readlink Command When researching whether you have a fully functioning vim editor, you need to determine if a symbolic link is involved with the vim program name. Sometimes when a vim editor is installed, the vim program file, which by all initial appearances is fully functioning, is symbolically linked to a lesser vim flavor. We could adopt the ls ‐l technique used in Chapter 7, “Exploring Linux File Management,” to investigate the soft links, but vim file soft links are often chained so that one soft link points to another soft link, which points to an additional soft link, and so on. In this case, it's best to use the readlink ‐f command, which will quickly find the final file in a chain of links.

Using these three commands to check the vim editor on a CentOS distribution reveals the following:

$ alias vi
alias vi='vim'
$
$ which vim
/usr/bin/vim
$
$ readlink -f /usr/bin/vim
/usr/bin/vim
$

The vi command is aliased, which is a good sign. And there are no soft links that indicate that the vim editor program is linked to a less than full‐featured editor program. From the investigation here, you can rest assured that this CentOS distribution has a fully functioning vim editor.

Running these same commands on this Ubuntu distribution shows different results:

$ alias vi
-bash: alias: vi: not found
$
$ which vim
/usr/bin/vim
$
$ readlink -f /usr/bin/vim
/usr/bin/vim.basic
$

Notice that there is no alias for the vi command, but the /usr/bin/vim program file points to /usr/bin/vim.basic. If you have the vim.basic program as shown here on your Ubuntu system, you are OK. However, if in your investigation you find the vim.tiny program, you'll want to install the vim package to get vim.basic so that you can follow along with the vim editor examples in this chapter. Package installation for Ubuntu distributions was covered in Chapter 3, “Installing and Maintaining Software in Ubuntu.”

Setting up an alias for the vim command on your Ubuntu system is fairly easy: just type alias vi='vim' on your system. However, this alias will not survive once you log out of the system. You'll need to add it to one of your login startup files, covered in Chapter 13, “Managing Users and Groups.”
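As a rough sketch, the which and readlink steps described earlier can be combined into one small shell function. The function name resolve_cmd is made up for this illustration, and it uses the shell's command -v builtin in place of which. It assumes the name you pass is a program found on your PATH rather than an alias or a shell builtin.

```shell
# resolve_cmd is a hypothetical helper name, not a standard Linux command.
# It prints the final file a command name points to after every
# symbolic link in the chain has been followed.
resolve_cmd() {
    # command -v prints the path of a program found on the PATH
    cmd_path=$(command -v "$1") || { echo "$1: not found" >&2; return 1; }
    # readlink -f follows the whole chain of soft links to the last file
    readlink -f "$cmd_path"
}

# Prints the resolved path of vim, or a "not found" message if vim is absent
resolve_cmd vim || true
```

If the path printed ends in vim.tiny, that is the same signal described above that the full vim package still needs to be installed.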

Using the vim Editor

To start using the vim text editor, type vim or vi, depending on your distribution, followed by the name of the file you want to edit or create. Figure 8.1 shows a vim text editor screen in action with a text file that was previously created.

FIGURE 8.1 Using the vim text editor

In Figure 8.1, the file being edited is the editorTestFile.txt file. The vim editor works the file data in a memory buffer, and this buffer is displayed on the screen. If you open vim without a filename, or the filename you entered doesn't yet exist, vim starts a new buffer area for editing.

The vim editor has a message area near the bottom line. If you have just opened an already created file, it will display the filename along with the number of lines and characters read into the buffer area. If you are creating a new file, you will see [New File] in the message area.

The vim editor has three standard modes.

  • Command Mode This is the mode vim uses when you first enter the buffer area; this is sometimes called normal mode. Here you enter keystrokes to enact commands. For example, pressing the J key will move your cursor down one line. This is the best mode to use for quickly moving around the buffer area.
  • Insert Mode Insert mode is also called edit or entry mode. This is the mode where you can perform simple editing. There are not many commands or special mode keystrokes. You enter this mode from Command mode by pressing the I key. At this point, the message -- INSERT -- will display in the message area. You leave this mode by pressing the Esc key.
  • Ex Mode This mode is sometimes also called colon commands because every command entered here is preceded with a colon (:). For example, to leave the vim editor when you have made no changes, you type :q and press the Enter key. To leave and discard unsaved changes, you type :q! instead.

Since you start in Command mode when entering the vim editor's buffer area, it's good to understand a few of the commonly used commands to move around in this mode. Table 8.1 contains several commands for moving around in the editor.

TABLE 8.1: Commonly Used vim Command Mode Moving Commands

KEYSTROKE DESCRIPTION
h Move cursor left one character.
l Move cursor right one character.
j Move cursor down one line (the next line in the text).
k Move cursor up one line (the previous line in the text).
w Move cursor forward one word to front of next word.
e Move cursor to end of current word.
b Move cursor backward one word.
^ Move cursor to first nonblank character of line (use 0 for the very beginning of the line).
$ Move cursor to end of line.
gg Move cursor to the file's first line.
G Move cursor to the file's last line.
nG Move cursor to file line number n.
Ctrl+B Scroll up almost one full screen.
Ctrl+F Scroll down almost one full screen.
Ctrl+U Scroll up half of a screen.
Ctrl+D Scroll down half of a screen.
Ctrl+Y Scroll up one line.
Ctrl+E Scroll down one line.

Quickly moving around in the vim editor buffer is useful. However, there are also several editing commands that help to speed up your modification process. For example, by moving your cursor to a word's first letter and typing cw (lowercase), the word is deleted, and you are placed into Insert mode. You can then type in the new word and press Esc to leave Insert mode.

Once you have made any needed text changes in the vim buffer area, it's time to save your work. You can type ZZ in Command mode to write the buffer to disk and exit the vim editor.

The third vim mode, Ex mode, has additional handy commands. You must be in Command mode to enter into Ex mode. You cannot jump from Insert mode to Ex mode. Therefore, if you're currently in Insert mode, press the Esc key to go back to Command mode first.

Table 8.2 shows several Ex commands that can help you manage your text file. Notice that all the keystrokes include the necessary colon (:) to use Ex commands.

TABLE 8.2: Commonly Used vim Ex Mode Commands

KEYSTROKES DESCRIPTION
:x Write buffer to file and quit editor.
:wq Write buffer to file and quit editor.
:wq! Write buffer to file and quit editor (overrides protection).
:w Write buffer to file and stay in editor.
:w! Write buffer to file and stay in editor (overrides protection).
:q Quit editor without writing buffer to file.
:q! Quit editor without writing buffer to file (overrides protection).
:! command Execute shell command and display results, but don't quit editor.
:r! command Execute shell command and include the results in editor buffer area.
:r file Read file contents and include them in editor buffer area.

After reading through the various mode commands, you may see why some people despise the vim editor. There are a lot of obscure commands to know. However, some people love the vim editor because it is so powerful.

It's tempting to learn only one text editor and ignore the others. But knowing at least two text editors is useful in your day‐to‐day Linux work. For complex editing and writing programs, the vim editor is one of the most popular text editors. However, if you just need to make a small change to a file, the nano text editor shines. We cover this one next.

The nano Editor

In contrast to vim, which is a complicated editor with powerful features, nano is a simple editor. For individuals who need a simple console mode text editor that is easy to navigate, nano is the tool to use. It's also a great text editor for those who are just starting on their Linux command‐line adventure.

The nano text editor is installed on most Linux distributions by default. Everything about the nano text editor is easy. To open a file at the command line with nano, enter nano filename .

If you start nano without a filename or if the file doesn't exist, nano simply opens a new buffer area for editing. If you specify an existing file on the command line, nano reads the entire contents of the file into a buffer area, where it is ready for editing, as shown in Figure 8.2.

FIGURE 8.2 The nano editor window

Notice that various commands, each with a brief description, are shown at the bottom of the nano editor window. These commands are the nano control commands. The caret (^) symbol shown represents the Ctrl key. Therefore, ^X stands for the keyboard sequence Ctrl+X. Though the nano control commands list capital letters in the keyboard sequences, you can use either lowercase or uppercase characters for control commands.

Having most of the basic commands listed right in front of you is great—no need to memorize what control command does what. Table 8.3 presents the most common nano control commands.

TABLE 8.3: nano Common Control Commands

COMMAND DESCRIPTION
Ctrl+C Displays the cursor's position within the text editing buffer
Ctrl+G Displays nano 's main help window
Ctrl+J Justifies the current text paragraph
Ctrl+K Cuts the text line and stores it in the cut buffer
Ctrl+O Writes out the current text editing buffer to a file
Ctrl+R Reads a file into the current text editing buffer
Ctrl+T Starts the available spell checker
Ctrl+U Pastes the text stored in the cut buffer at the current line
Ctrl+V Scrolls text editing buffer to the next page
Ctrl+W Searches for word or phrases within the text editing buffer
Ctrl+X Closes the current text editing buffer, exits nano, and returns to the shell
Ctrl+Y Scrolls the text editing buffer to previous page

The control commands listed in Table 8.3 are really all you need. However, if you desire more powerful control features than those listed, nano has them. To see more control commands, press Ctrl+G in the nano text editor to display its main help window containing additional control commands.

A few of these additional control commands are called Meta‐key sequences. In the nano documentation, they are denoted by the letter M. For example, you'll find the key sequence to undo the last task denoted as M‐U in the nano help system. But don't press the M key to accomplish this. Instead, M represents the Esc, Alt, or Meta key, depending on your keyboard's configuration. Thus, you might press the Alt+U key combination to undo the last task within nano.

Using text editors at the CLI is one way to work with text files. The rest of this chapter explores additional methods you can use to manipulate text file data.

Working with Data Files

When you have a large amount of data, it's often difficult to handle the information and make it useful. The Linux system provides several CLI tools to help you manage large amounts of data. This section covers the basic commands that every system administrator—as well as any everyday Linux user—should know how to use to make their lives easier.

Sorting

Often, to understand the data within text files, you need to reformat file data in some way. The sort utility sorts a file's data. It makes no changes to the original file. It only reads the file, sorts its data, and displays the sorted data to STDOUT (covered in Chapter 6, “Working with the Shell”).

If you want to order a file's content alphabetically, simply enter the sort command followed by the name of the file you want to sort.

$ nano alphabetKey.txt
$ cat alphabetKey.txt
Alpha
Tango
Sierra
Bravo
Foxtrot
Echo
$
$ sort alphabetKey.txt
Alpha
Bravo
Echo
Foxtrot
Sierra
Tango
$

It's pretty simple. However, things aren't always as easy as they appear. Take a look at this example:

$ nano numberKey.txt
$ cat numberKey.txt
1 One
2 Two
4 Four
3 Three
10 Ten
20 Twenty
100 One Hundred
$
$ sort numberKey.txt
1 One
10 Ten
100 One Hundred
2 Two
20 Twenty
3 Three
4 Four
$

If you were expecting the numbers to sort in numerical order, you were disappointed. By default, the sort command interprets numbers as characters, producing a sorted output that you may not want. To sort the lines by their numerical values instead, add the ‐n option to the command.

$ sort -n numberKey.txt
1 One
2 Two
3 Three
4 Four
10 Ten
20 Twenty
100 One Hundred
$

There are several useful sort parameters you can use depending on what kind of sort is needed. Table 8.4 shows commonly used options.

TABLE 8.4: Commonly Used sort Command Options

SINGLE DASH DOUBLE DASH DESCRIPTION
‐b ‐‐ignore‐leading‐blanks Ignore leading blanks when sorting.
‐d ‐‐dictionary‐order Consider only blanks and alphanumeric characters; don't consider special characters.
‐f ‐‐ignore‐case By default, sort orders capitalized letters first. This parameter ignores case.
‐g ‐‐general‐numeric‐sort Use general numerical value to sort.
‐i ‐‐ignore‐nonprinting Ignore nonprintable characters in the sort.
‐k ‐‐key=POS1[,POS2] Sort based on position POS1, and end at POS2 if specified.
‐n ‐‐numeric‐sort Sort by string numerical value.
‐o ‐‐output=file Write results to file specified.
‐r ‐‐reverse Reverse the sort order (descending instead of ascending).
‐t ‐‐field‐separator=SEP Specify the character used to distinguish key positions.
‐z ‐‐zero‐terminated End all lines with a NULL character instead of a new line.
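To see a few of these options working together, consider the following sketch. The users.txt file, its name:number contents, and the workdir variable are all invented for this illustration.

```shell
# Hypothetical sample data: name:number pairs, created only for this sketch.
workdir=$(mktemp -d)
printf '%s\n' 'carol:1002' 'alice:1000' 'root:0' 'bob:1001' > "$workdir/users.txt"

# -t ':' sets the field separator, -k 2 sorts on the second field,
# -n compares that field numerically, and -r puts the highest value first.
sorted_desc=$(sort -t ':' -k 2 -n -r "$workdir/users.txt")
echo "$sorted_desc"
```

Dropping the -r option from this command returns the same lines in ascending order, with the root:0 line first.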

Viewing sorted data is helpful, but what do you do if you want to keep that sorted data? STDOUT redirection (covered in Chapter 6) can help here:

$ nano numberKeySciFi.txt
$ cat numberKeySciFi.txt
1984 101
Wars 1138
Pi 3.14
Trek 1701
Back 88
$
$ sort -n -t ' ' -k 2 numberKeySciFi.txt > sortedSciFi.txt
$ cat sortedSciFi.txt
Pi 3.14
Back 88
1984 101
Wars 1138
Trek 1701
$
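The ‐o option from Table 8.4 is an alternative to STDOUT redirection. The following sketch re-creates the numberKeySciFi.txt data in a temporary directory (the workdir variable is only a convenience for this example) and runs the same sort with ‐o. One practical difference is that ‐o may safely name the input file as the output file, whereas redirecting sort's output onto its own input file empties that file before sort can read it.

```shell
# Re-create the chapter's sample data in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' '1984 101' 'Wars 1138' 'Pi 3.14' 'Trek 1701' 'Back 88' \
    > "$workdir/numberKeySciFi.txt"

# Same sort as before, but -o writes the result instead of redirection.
sort -n -t ' ' -k 2 -o "$workdir/sortedSciFi.txt" "$workdir/numberKeySciFi.txt"

# -o can also overwrite the input file itself, sorting it in place.
sort -n -t ' ' -k 2 -o "$workdir/numberKeySciFi.txt" "$workdir/numberKeySciFi.txt"

cat "$workdir/numberKeySciFi.txt"
```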

Keeping sorted data is especially handy when you've used a complex sort like the previous one. Another useful function when dealing with text data is searching for it. We'll cover that topic next.

Searching

You may need to locate a text file within the virtual directory structure, or just search through a file for text. Either way, Linux provides you lots of options to accomplish your task.

LOCATING FILES

A simple utility to use in finding files quickly is the locate program. What makes it fast is that this command searches a database that is pre‐filled with filenames and their locations.

To find a file with the locate command, just enter locate followed by the file's name you want to find. If the file is on your system and you have permission to view it, the locate utility will display the file's directory path and name as demonstrated here on an Ubuntu distribution:

$ locate .bash_history
/home/sysadmin/.bash_history
$

Another nice feature of locate is that it uses a pattern to find files. This allows you to employ partial filenames and regular expressions (covered later in this chapter) and, with the command options, ignore case. Table 8.5 shows a few of the more commonly used locate command options.

TABLE 8.5: The locate Command's Commonly Used Options

SHORT LONG DESCRIPTION
‐A ‐‐all Display only filenames that match all the patterns, instead of displaying files that match only one pattern in the pattern list.
‐b ‐‐basename Display only filenames that match the pattern and do not include any directory names that match the pattern.
‐c ‐‐count Display only the number of files whose name matches the pattern instead of displaying filenames.
‐i ‐‐ignore‐case Ignore case in the pattern for matching filenames.
‐q ‐‐quiet Do not display any error messages, such as permission denied, when processing.
‐r ‐‐regexp R Use the regular expression, R , instead of the pattern list to match filenames.
‐w ‐‐wholename Display filenames that match the pattern and include any directory names that match the pattern. This is default behavior.

Where you can run into problems with locate is when a file is newly created. Here's an example that uses the touch command to create a file and then tries to find it with the locate utility:

$ touch newFile.txt
$ ls newFile.txt
newFile.txt
$
$ locate newFile.txt
$

When a file is newly created (or downloaded), often it is not yet listed in the locate database. Typically, this database is updated only periodically. Also, when you have a newly installed Linux system, the database may not even yet exist!

To fix both of these issues, you'll need to obtain super user privileges and run the updatedb command. This will update the database, named /var/lib/mlocate/mlocate.db (or some variation), or create and update it.

$ sudo updatedb
[sudo] password for sysadmin:
$
$ locate newFile.txt
/home/sysadmin/newFile.txt
$

That's much better! Now that the database is updated, the newly created file can be found by the locate utility.

The locate command is useful when you want to find files by their name, but it's not useful when you're trying to find a file based on its size or who owns it. This is where the find command can help.

FINDING FILES

The find command is flexible. It allows you to locate files based on metadata, such as who owns the file, when the file was last modified, permissions set on the file, and so on. Its command‐line format is a little different from that of locate.

find [path] [option] [expression]

The path argument designates a starting point directory. The find command searches through that directory and all its subdirectories (recursively) for the file or files you seek. You can use a single period ( . ) to designate your present working directory as the starting point directory.

The expression command argument and its preceding option control what type of filters are applied to the search as well as any settings that may limit the search. Table 8.6 shows the more commonly used option and expression combinations.

TABLE 8.6: The find Command's Commonly Used Options and Expressions

OPTION EXPRESSION DESCRIPTION
‐cmin n Display names of files whose status changed n minutes ago.
‐empty N/A Display names of files that are empty and are either a regular file or a directory.
‐gid n Display names of files whose group ID is equal to n .
‐group name Display names of files whose group is name .
‐inum n Display names of files whose inode number is equal to n .
‐maxdepth n When searching for files, traverse down into the starting point directory's tree only n levels.
‐mmin n Display names of files whose data changed n minutes ago.
‐name pattern Display names of files whose name matches pattern. Shell wildcard characters (such as * and ?) may be used in the pattern, which should be enclosed in quotation marks to avoid unpredictable results. Replace ‐name with ‐iname to ignore case.
‐nogroup N/A Display names of files where no group name exists for the file's group ID.
‐nouser N/A Display names of files where no username exists for the file's user ID.
‐perm mode Display names of files whose permissions match mode . Either octal or symbolic modes may be used.
‐size n Display names of files whose size matches n . Suffixes can be used to make the size more human readable, such as G for gigabytes.
‐user name Display names of files whose owner is name .
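The following sketch shows why quoting the ‐name pattern matters and how ‐iname and ‐maxdepth change the results. The directory layout and filenames are invented for this illustration and are built in a temporary directory.

```shell
# Build a small, hypothetical directory tree to search.
workdir=$(mktemp -d)
mkdir -p "$workdir/sub/deeper"
touch "$workdir/notes.txt" "$workdir/README.TXT" "$workdir/sub/deeper/old.txt"

# Quotes keep the shell from expanding *.txt before find sees it.
name_matches=$(find "$workdir" -name '*.txt')

# -iname ignores case, so README.TXT is matched as well.
iname_matches=$(find "$workdir" -iname '*.txt')

# -maxdepth 1 keeps find from descending into subdirectories.
top_level=$(find "$workdir" -maxdepth 1 -name '*.txt')

echo "$name_matches"
echo "$iname_matches"
echo "$top_level"
```

Without the quotes, the shell would replace *.txt with any matching filenames in your present working directory before find ever runs, which is the source of the unpredictable results mentioned in Table 8.6.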

One nice feature of find is that it will display all the files in your present working directory (and any subdirectories) that have no data in them:

$ find . -empty
[…]
./newFile.txt
./.local/share/nano
[…]
$

Unlike locate, you can quickly find files that were newly created without updating a database.

$ touch anotherNewFile.txt
$
$ find /home/sysadmin -name anotherNewFile.txt
/home/sysadmin/anotherNewFile.txt
$

You can search for files whose status was recently changed, such as when data has been added to the file.

$ nano anotherNewFile.txt
$
$ find /home/sysadmin -cmin 1
[…]
/home/sysadmin/anotherNewFile.txt
$
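The numeric argument to time-based options such as ‐cmin and ‐mmin also accepts a leading minus or plus sign, meaning less than or more than n. The sketch below uses ‐mmin rather than ‐cmin, because a file's modification time can be set directly with touch ‐t. The filenames are invented for this illustration.

```shell
workdir=$(mktemp -d)

# fresh.txt gets the current time; stale.txt is back-dated to 1 Jan 2000.
touch "$workdir/fresh.txt"
touch -t 200001010000 "$workdir/stale.txt"

# -mmin -5 matches files whose data changed less than 5 minutes ago.
recent=$(find "$workdir" -name '*.txt' -mmin -5)

# -mmin +5 matches files whose data changed more than 5 minutes ago.
older=$(find "$workdir" -name '*.txt' -mmin +5)

echo "recent: $recent"
echo "older:  $older"
```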

You can run these find command searches from any location within the virtual directory system. However, you may want to use super user privileges to get accurate results and avoid an overload of permission denied messages.

$ sudo find / -name mlocate.db
[sudo] password for sysadmin:
/var/lib/mlocate/mlocate.db
$

Both locate and find are useful for discovering a file's location or performing a basic file analysis. However, neither of these commands can search through and display a file's contents. We'll cover a utility that does offer that feature next.

SEARCHING FOR AND THROUGH FILES

When you need a utility that lets you search for files that contain certain data, grep is the winner. The command‐line format for the grep command is as follows:

grep [options] pattern [file]

In the following example, the text files we've used or created for this chapter are listed in a single‐column format via the ls ‐1 *.txt command. Two of those files contain the word Pi, but which ones? The grep command can easily determine the correct answer.

$ ls -1 *.txt
alphabetKey.txt
anotherNewFile.txt
editorTestFile.txt
editorTestFileNano.txt
newFile.txt
numberKey.txt
numberKeySciFi.txt
sortedSciFi.txt
$
$ cat numberKeySciFi.txt
1984 101
Wars 1138
Pi 3.14
Trek 1701
Back 88
$
$ cat sortedSciFi.txt
Pi 3.14
Back 88
1984 101
Wars 1138
Trek 1701
$
$ grep Pi *.txt
numberKeySciFi.txt:Pi 3.14
sortedSciFi.txt:Pi 3.14
$

In the preceding example, the pattern used with grep was Pi, and all the text files in the current directory, *.txt, were searched. The grep command lists the search results by displaying the name of each file that contains the pattern, followed by the entire text line that has the pattern.

There are some nice options you can use in your grep searches. Table 8.7 shows some of the more commonly used grep utility options.

TABLE 8.7: The grep Command's Commonly Used Options

SHORT LONG DESCRIPTION
‐c ‐‐count Display a count of text file records that contain a PATTERN match.
‐d action ‐‐directories= action When a file is a directory, if action is set to read, read the directory as if it were a regular text file; if action is set to skip, ignore the directory; and if action is set to recurse, act as if the ‐R, ‐r, or ‐‐recursive option was used.
‐E ‐‐extended‐regexp Designate the PATTERN as an extended regular expression.
‐i ‐‐ignore‐case Ignore the case in the PATTERN as well as in any text file records.
‐R, ‐r ‐‐recursive Search a directory's contents, and for any subdirectory within the original directory tree, consecutively search its contents as well (recursively).
‐v ‐‐invert‐match Display only text file's records that do not contain a PATTERN match.
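As a small sketch of the ‐c, ‐i, and ‐v options, the following example searches a made-up file in the style of /etc/passwd. The passwd.sample filename and its three records are invented for this illustration.

```shell
# Hypothetical sample records, created only for this sketch.
workdir=$(mktemp -d)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
    'sysadmin:x:1000:1000::/home/sysadmin:/bin/bash' \
    > "$workdir/passwd.sample"

# -c prints only the number of matching records.
bash_count=$(grep -c bash "$workdir/passwd.sample")

# -i ignores case, so an uppercase pattern still matches.
bash_count_nocase=$(grep -i -c BASH "$workdir/passwd.sample")

# -v shows only the records that do NOT contain the pattern.
non_bash=$(grep -v bash "$workdir/passwd.sample")

echo "$bash_count $bash_count_nocase"
echo "$non_bash"
```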

When searching through larger sections of the virtual directory structure for files containing certain data, it's a good idea to employ the ‐d skip option so that grep doesn't complain at you when it encounters a directory file.

$ grep -d skip sysadmin /etc/*
grep: /etc/at.deny: Permission denied
/etc/group:sysadmin:x:1000:
[…]
/etc/passwd:sysadmin:x:1000:1000:[…]:/home/sysadmin:/bin/bash
[…]
$

Notice that the grep utility found the word sysadmin in two files. This is a real time‐saver when you're trying to locate data. However, also notice that a Permission denied message was produced. You'll need to use super user privileges to search through files that require higher permission levels to look through.

You can also use grep to conduct searches on one particular file. Often in a large file, you have to look for a specific line of data buried somewhere in the middle of the file. Instead of manually scrolling through the entire file, you can let the grep command search for you.

$ grep bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
sysadmin:x:1000:1000:[…]:/home/sysadmin:/bin/bash
$

When looking for a particular piece of data whose case you cannot remember, use the ‐i option to make grep case‐insensitive.

$ sudo grep -d skip Ubuntu-Server /etc/*
[sudo] password for sysadmin:
$
$ sudo grep -i -d skip Ubuntu-Server /etc/*
/etc/hostname:ubuntu-server
/etc/hosts:127.0.1.1 ubuntu-server
$

The ‐d skip and ‐i options, along with super user privileges, make your grep search results cleaner and provide you with faster results.

You can conduct rather complex searches with grep by using regular expressions. The grep utility can even handle extended regular expressions, if you use the ‐E option.

$ grep -E "(^root|^sysadmin)" /etc/passwd
root:x:0:0:root:/root:/bin/bash
sysadmin:x:1000:1000:[…]:/home/sysadmin:/bin/bash
$

The grep ‐E command is the more modern version of the egrep utility. The two are functionally the same, but egrep is now deprecated. When a command is deprecated, this means that it may not be available in the future, so you should stop using it as soon as possible and start using its modern equal.
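One thing to watch with a pattern such as (^root|^sysadmin) is that it matches any record that merely begins with those characters. The following sketch, which uses an invented passwd.sample file containing a hypothetical rootless account, shows how adding the colon field delimiter to the extended regular expression tightens the match.

```shell
# Hypothetical sample records, created only for this sketch.
workdir=$(mktemp -d)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'rootless:x:1001:1001::/home/rootless:/bin/bash' \
    'sysadmin:x:1000:1000::/home/sysadmin:/bin/bash' \
    > "$workdir/passwd.sample"

# Matches every record that starts with root or sysadmin, including rootless.
loose=$(grep -E -c '(^root|^sysadmin)' "$workdir/passwd.sample")

# The trailing colon requires the whole first field to be root or sysadmin.
strict=$(grep -E -c '^(root|sysadmin):' "$workdir/passwd.sample")

echo "loose=$loose strict=$strict"
```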

Compressing

Linux contains several file compression utilities that allow you to easily compress large files into smaller files that take up less space. While this may sound great, it often leads to confusion and chaos when you're trying to determine which utility to use. The following popular utilities are available on Linux:

  • gzip
  • bzip2
  • xz

The advantages and disadvantages of each of these data compression methods are explored in this section.

  • gzip The gzip utility was developed in 1992 as a replacement for the old compress program. Achieving text‐based file compression rates of 60–70 percent, gzip has long been a popular data compression utility. To compress a file, simply type in gzip followed by the file's name. The original file is replaced by a compressed version with a .gz filename extension. To reverse the operation, type in gunzip followed by the compressed file's name.
  • bzip2 Developed in 1996, the bzip2 utility offers higher compression rates than gzip but takes slightly longer to perform the data compression. There was a bzip program, but it had some patent issues, so bzip2 was created to replace it.

    The bzip2 utility employs multiple layers of compression techniques and algorithms. Until 2013, this data compression utility was used to compress the Linux kernel for distribution. To compress a file, simply type in bzip2 followed by the file's name. The original file is replaced by a compressed version with a .bz2 file extension. To reverse the operation, type in bunzip2 followed by the compressed file's name, which decompresses the data.

  • xz Developed in 2009, the xz data compression utility quickly became popular among Linux administrators. It boasts a higher default compression rate than bzip2 and gzip. In 2013, the xz compression utility replaced bzip2 for compressing the Linux kernel for distribution. To compress a file, simply type in xz followed by the file's name. The original file is replaced by a compressed version with an .xz file extension. To reverse the operation, type in unxz followed by the compressed file's name.

It's helpful to see a side‐by‐side comparison of some of the compression utilities using their defaults. Here is a compression comparison example on an Ubuntu distribution:

$ ls -hs /var/log/syslog
344K /var/log/syslog
$
$ cp /var/log/syslog syslog1
$ cp /var/log/syslog syslog2
$ cp /var/log/syslog syslog3
$
$ gzip syslog1
$ bzip2 syslog2
$ xz syslog3
$
$ ls -hs syslog?.*
72K syslog1.gz  40K syslog2.bz2  32K syslog3.xz
$

In the preceding example, first the /var/log/syslog file size is shown, which is 344 K. (You can use /var/log/lastlog in place of /var/log/syslog for this comparison on a CentOS distribution.) Then the file is copied three times to the local directory using a new filename each time. Next, three compression utilities are used. After the files are compressed with the various utilities, another ls ‐hs command displays the compressed files' names and their sizes. You can see that the xz program produces the highest compression of this file, because its file, syslog3.xz, is the smallest in size.
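Because these utilities replace the original file by default, the comparison above worked on copies. An alternative is sketched here with gzip's ‐c option, which writes the compressed data to STDOUT and leaves the original file alone. The numbers.txt file is invented for this illustration and is generated with the seq command.

```shell
# Create a hypothetical text file to compress.
workdir=$(mktemp -d)
seq 1 5000 > "$workdir/numbers.txt"

# -c sends the compressed data to STDOUT, so the original file is kept.
gzip -c "$workdir/numbers.txt" > "$workdir/numbers.txt.gz"

# -t tests the integrity of the compressed file.
gzip -t "$workdir/numbers.txt.gz" && echo "compressed file is intact"

# Compare sizes in bytes.
orig_size=$(wc -c < "$workdir/numbers.txt")
gz_size=$(wc -c < "$workdir/numbers.txt.gz")
echo "original: $orig_size bytes, compressed: $gz_size bytes"

# Decompress to STDOUT and confirm the data matches the original.
gunzip -c "$workdir/numbers.txt.gz" | cmp - "$workdir/numbers.txt" \
    && echo "round trip matches"
```

The final cmp check also demonstrates what lossless compression means in practice: the decompressed data is byte-for-byte identical to the original.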

Compression goes hand in hand with backing up files, because the resulting file containing a backup is often rather large. We'll cover backing up files next.

Archiving

Backing up files is often called archiving, especially in the Linux world. There are several programs you can employ for managing backups. Some of the more popular products are Amanda, Bacula, Bareos, Duplicity, and BackupPC. Yet, often these GUI and/or web‐based programs have command‐line utilities at their core, which include the following:

  • cpio
  • dd
  • rsync
  • tar

The tar command was originally used to write files to a tape device for archiving. However, it can also write the output to a file, which has become a popular way to archive data in Linux, and that's the command we'll focus on in this chapter.

The tar command copies the selected files and stores them in a single file. This file is called a tar archive file. If this archive file is compressed using a data compression utility, the compressed archive file is called a tarball.

The tar program has several useful options. Table 8.8 describes the more commonly used ones for creating data backups.

TABLE 8.8: The tar Command's Commonly Used Archive Creation Options

SHORT LONG DESCRIPTION
‐c ‐‐create Creates a tar archive file. The backup can be a full or incremental backup, depending upon the other selected options.
‐u ‐‐update Appends files to an existing tar archive file, but only copies those files that were modified since the original archive file was created
‐g ‐‐listed‐incremental Creates an incremental or full archive based upon metadata stored in the provided file
‐z ‐‐gzip Compresses a tar archive file into a tarball using gzip
‐j ‐‐bzip2 Compresses a tar archive file into a tarball using bzip2
‐J ‐‐xz Compresses a tar archive file into a tarball using xz
‐v ‐‐verbose Displays each file's name as each file is processed

Notice that there are some compression options in Table 8.8. When you use a compression utility along with an archive and restore program for data backups, it is vital that you use a lossless compression method. A lossless compression is just as it sounds: no data is lost. The gzip, bzip2, and xz utilities provide lossless compression. Obviously, it is important not to lose data when doing backups!

To create an archive using the tar utility, you combine the command with a few options, the archive's filename, and the files to include:

$ ls n*.txt
newFile.txt  numberKey.txt  numberKeySciFi.txt
$
$ tar -cvf archive.tar n*.txt
newFile.txt
numberKey.txt
numberKeySciFi.txt
$

In the preceding example, three options are used.

  • The ‐c option creates the tar archive.
  • The ‐v option displays the filenames as they are placed into the archive file.
  • The ‐f option designates the archive filename, which is archive.tar.

Though not required, it is considered good form to use the .tar extension on tar archive files. The example command's last argument designates the files to copy into this archive.

If you are backing up lots of files or large amounts of data, it is a good idea to employ a compression utility. This is easily accomplished by adding an additional switch to your tar command options. Here gzip compression is used to create a tarball:

$ ls -hs /var/log/syslog.*
132K /var/log/syslog.1  168K /var/log/syslog.2.gz
$
$ tar -zcvf syslog.tar.gz /var/log/syslog.*
tar: Removing leading `/' from member names
/var/log/syslog.1
tar: Removing leading `/' from hard link targets
/var/log/syslog.2.gz
$
$ ls -hs syslog.tar.gz
196K syslog.tar.gz
$

There are a couple of things to note in this example. First, look at the tar: Removing leading messages. The tar utility strips off the first forward slash (/) in filenames so that the files are stored with relative paths and can be restored under any directory in the future. If that forward slash were left in place, the files could only go back to their original location in the virtual directory structure, which is not very flexible.

The next thing to note in the preceding example is that the tarball filename has the .tar.gz file extension. It is considered good form to use the .tar extension and tack on an indicator showing the compression method that was used. However, you can shorten it to .tgz if desired.
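To see how the compression switches and file extensions line up, you can archive the same file several ways and compare sizes. This is a sketch only: numbers.txt is a hypothetical sample file, and because bzip2 and xz are not installed on every system, the commands check for them first.

```shell
# Work in a scratch directory.
work=$(mktemp -d)
cd "$work"

# Generate some sample data to archive.
seq 1 20000 > numbers.txt

tar -cf  numbers.tar    numbers.txt   # plain archive, no compression
tar -czf numbers.tar.gz numbers.txt   # gzip tarball

# Create bzip2 and xz tarballs only if those utilities are available.
command -v bzip2 > /dev/null && tar -cjf numbers.tar.bz2 numbers.txt
command -v xz    > /dev/null && tar -cJf numbers.tar.xz  numbers.txt

# Show the size of each result in blocks.
ls -s numbers.tar*
```

The compressed files should be noticeably smaller than numbers.tar, though the exact sizes depend on the data and the utility versions.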

Whenever you create data backups, it is a good practice to verify them. Table 8.9 provides some tar command options for viewing and verifying data backups.

TABLE 8.9: The tar Command's Commonly Used Archive Verification Options

SHORT LONG DESCRIPTION
‐d ‐‐compare, ‐‐diff Compares a tar archive file's members with external files and lists the differences
‐t ‐‐list Displays a tar archive file's contents
‐W ‐‐verify Verifies each file as the file is processed. This option cannot be used with the compression options.

Backup verification can take several different forms. You might ensure that the desired files (sometimes called members) are included in your backup by using the ‐v option on the tar command in order to watch the files being listed as they are included in the archive file. You can also verify that desired files are included in your backup after the fact. Use the ‐t option to list a tarball or archive file's contents, as shown here:

$ tar -tf archive.tar
newFile.txt
numberKey.txt
numberKeySciFi.txt
$
$ tar -tf syslog.tar.gz
var/log/syslog.1
var/log/syslog.2.gz
$
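The ‐d and ‐W options in Table 8.9 can be tried out in the same way. The following sketch assumes GNU tar and uses hypothetical filenames. It creates one archive with verification turned on, and then uses the compare option on a second archive before and after the file on disk changes.

```shell
# Work in a scratch directory.
work=$(mktemp -d)
cd "$work"
echo "original" > notes.txt

# Create an archive and verify it as part of the same command.
tar -cWf verified.tar notes.txt

# Create a second archive to compare against the file on disk.
tar -cf notes.tar notes.txt

# Nothing has changed yet, so the compare exits with status 0.
tar -df notes.tar && echo "archive matches disk"

# Change the file on disk and compare again; tar reports what differs
# and exits with a nonzero status.
echo "edited after the backup" > notes.txt
tar -df notes.tar || echo "archive differs from disk"
```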

Table 8.10 lists some of the options that you can use with the tar utility to restore data from a tar archive file or tarball. Several options used to create the backup, such as ‐g, are also available when restoring data.

TABLE 8.10: The tar Command's Commonly Used File Restore Options

SHORT LONG DESCRIPTION
‐x ‐‐extract, ‐‐get Extracts files from a tarball or archive file and places them in the current working directory
‐z ‐‐gunzip Decompresses files in a tarball using gunzip
‐j ‐‐bzip2 Decompresses files in a tarball using bunzip2
‐J ‐‐xz Decompresses files in a tarball using unxz

Extracting files from an archive or tarball is fairly simple using the tar utility. Here is an example of extracting files from our previously created tarball:

$ mkdir Extract
$ mv syslog.tar.gz Extract/
$ cd Extract
$
$ tar -zxvf syslog.tar.gz
var/log/syslog.1
var/log/syslog.2.gz
$
$ ls -F
syslog.tar.gz  var/
$
$ ls var/log/
syslog.1  syslog.2.gz
$

In the previous example, a new subdirectory, Extract, is created. The tarball is moved to the new subdirectory, and then the files are restored from the tarball. Notice that instead of putting the files in the top level of the new subdirectory, Extract, they were instead placed in the var/log subdirectory. That's because tar removed the leading forward slash of the original files but kept the rest of the directory reference. This is a rather useful feature of the tar utility.
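Instead of moving the tarball and changing directories, you can ask tar to switch to a target directory before extracting by using the ‐C option. Here is a minimal sketch with hypothetical directory and file names. It archives a file by its absolute path and then restores it under a separate restore directory.

```shell
# Work in a scratch directory.
work=$(mktemp -d)
cd "$work"
mkdir -p src restore
echo "key=value" > src/app.conf

# Archive using an absolute path; tar strips the leading / and warns
# about it on stderr, which is discarded here.
tar -czf conf.tar.gz "$work/src" 2> /dev/null

# -C makes tar change into restore/ before extracting the files.
tar -xzf conf.tar.gz -C restore

# The file reappears under restore/, below its original path minus
# the leading slash.
find restore -name app.conf
```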

Using the tar command is a simple way to bundle a set of files into a single backup file. You can also create archive files of entire directory structures: when you name a directory, tar includes everything beneath it. This is a common method for distributing source code files for open source applications in the Linux world.
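As a brief illustration of archiving a directory structure, the following sketch builds a tiny, hypothetical project tree and packs it into a gzip tarball. The directory and file names are made up for the example.

```shell
# Work in a scratch directory.
work=$(mktemp -d)
cd "$work"

# Build a small directory tree to archive.
mkdir -p project/src project/docs
echo 'int main(void) { return 0; }' > project/src/main.c
echo "Build notes" > project/docs/README

# Naming the directory tells tar to include everything beneath it.
tar -czf project.tar.gz project

# List the tarball's contents to confirm the whole tree was captured.
tar -tzf project.tar.gz
```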

The Bottom Line

  • Use the vim editor's basic features. The vim editor is one of the most popular text editors in use. Though it can be tricky to use, modifying text files using vim is worth the time to learn. Grasping the basics of the vim editor is all that is needed for a system admin.
    • Master It Imagine that you just opened up a configuration file in the vim editor. You only want to quickly add a paragraph of comments to the top of the file. What editor commands can you employ to accomplish this task quickly?
  • Employ the nano editor for everyday text file editing. The nano text editor is a simple and quick editor to use in your daily work. You can quickly get into a file, make any needed modifications, save your work, and go on with other tasks. It's a favorite editor of system administrators because of its simplicity.
    • Master It You need to quickly edit a text file by copying two lines of text from the top of the file to the bottom of the text file. Assuming you are already in the nano editor with this file in the buffer, what editor commands covered in this chapter can you use to accomplish this task quickly?
  • Find data in a text file, and reduce its size. To quickly find files that contain certain data, grep is the utility to learn. Because it can conduct both simple and complex searches, locating the information or the files you need is a snap.
    • Master It You need to find all the files in the /etc directory (but not its subdirectories) that contain the word host. The search must be case‐insensitive, and you don't want to see any error messages concerning directory files. Assuming you need to use the sudo command along with your grep command, what will your command look like to conduct this search?
  • Back up and organize text file data. The tar utility has been around for a long time. It provides useful options to create archive files. While tar has the ability to compress files on the fly, you can also use the gzip, bzip2, and xz compression utilities to compress tar archive files as well as other files.
    • Master It You created a tar archive file, myArchive.tar, but did not compress it with a tar option, because you needed to verify each file as it was processed with the ‐W option. Now that the archive file was successfully created and verified, what command will you use to compress it to the highest level, and what will the resulting file's name be?