Chapter 8
Working with Text Files

A proficient Linux system administrator can use at least one text editor. This skill is needed for daily work such as modifying configuration files and creating shell scripts. You have a choice of several editors, and many people find one whose functionality they love and use it exclusively. This chapter provides a brief sampling of two text editors that are popular with admins.

When managing any server, you'll often deal with files that contain large amounts of text data, and it can be difficult to handle that information and make it useful. System commands can also easily bury you in output. Fortunately, Linux provides several command-line utilities that help you manage large amounts of data.

The vim Editor

When managing a Linux system, it's wise to gain proficiency using at least one text editor. Using features such as searching, cutting, and pasting allows you to modify configuration files more quickly. In Chapter 19, “Writing Scripts,” you'll need the text editor skills that you learn in this chapter to quickly create Bash shell scripts. This section walks you through the basics of using the vim editor, which is available for most Linux distributions.

Checking Your vim Editor Program

Before you begin exploring the vim editor, it's a good idea to find out what “flavor” of vim your Linux system has installed. Some distributions install a fully functioning vim editor, while others install only a stripped-down version, which can cause you some difficulties.

Three commands can help you determine whether your Linux distro has a fully functioning vim editor.

The alias Command To display whether a command calls a different command and/or uses additional command options, you type alias command‐name at the command‐line interface (CLI). For our purposes here, we need to see if the typical command to run the vim editor, vi, is aliased to the vim command or not. Thus, alias vi will let us know this information. If vi is not aliased, it's our first clue that a fully functioning vim editor is probably not installed.
The which Command This particular command is useful in many circumstances. When you type which command‐name, it shows you exactly where the program associated with command‐name resides in the Linux virtual directory system. For our investigation, we'll use the which vim command to get the program's location, so we can properly use the next command, readlink.
The readlink Command When researching whether you have a fully functioning vim editor, you need to determine whether a symbolic link is involved with the vim program name. Sometimes when a vim editor is installed, the vim program file, which by all initial appearances is fully functioning, is symbolically linked to a lesser vim flavor. We could adopt the ls -l technique used in Chapter 7, “Exploring Linux File Management,” to investigate the soft links, but vim file soft links are often chained, so that one soft link points to another soft link, which points to yet another, and so on. In this case, it's best to use the readlink -f command, which quickly finds the final file in a chain of links.

Using these three commands to check the vim editor on a CentOS distribution reveals the following:

$ alias vi
alias vi='vim'
$
$ which vim
/usr/bin/vim
$
$ readlink -f /usr/bin/vim
/usr/bin/vim
$

The vi command is aliased, which is a good sign. And readlink shows no chain of soft links leading from the vim program to a less-than-full-featured editor. From this investigation, you can rest assured that this CentOS distribution has a fully functioning vim editor.

Running these same commands on an Ubuntu distribution shows different results:

$ alias vi
-bash: alias: vi: not found
$
$ which vim
/usr/bin/vim
$
$ readlink -f /usr/bin/vim
/usr/bin/vim.basic
$

Notice that there is no alias for the vi command, but the /usr/bin/vim program file points to /usr/bin/vim.basic. If you have the vim.basic program as shown here on your Ubuntu system, you are OK. However, if in your investigation you find the vim.tiny program, you'll want to install the vim package to get vim.basic so that you can follow along with the vim editor examples in this chapter. Package installation for Ubuntu distributions was covered in Chapter 3, “Installing and Maintaining Software in Ubuntu.”
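The which and readlink checks above can be bundled into a small Bash function. This is a minimal sketch rather than a tool from the chapter: the function name vim_flavor is hypothetical, it uses the shell builtin command -v in place of which, and it assumes a minimal build can be recognized by "tiny" in the resolved filename, as with Ubuntu's vim.tiny. It skips the alias check, because aliases defined in your interactive shell aren't visible to a script.

```shell
#!/usr/bin/env bash
# Sketch: report which vim build a command name finally resolves to.
# Assumption: a minimal build has "tiny" in its resolved filename
# (for example, /usr/bin/vim.tiny on Ubuntu).
vim_flavor() {
    local prog=${1:-vim} path target
    # command -v is the shell builtin counterpart of `which`
    if ! path=$(command -v "$prog"); then
        echo "$prog: not found in PATH"
        return 1
    fi
    # readlink -f follows every link in a chain to the final file
    target=$(readlink -f "$path")
    case ${target##*/} in
        *tiny*) echo "$target: minimal build; install the vim package" ;;
        *)      echo "$target: not a tiny build (likely full-featured)" ;;
    esac
}
```

After pasting the function into your shell (or sourcing a file that contains it), run vim_flavor vim. A message mentioning a minimal build suggests that you should install the vim package.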

Setting up an alias for the vim command on your Ubuntu system is fairly easy: just type alias vi='vim' on your system. However, this alias will not survive once you log out of the system. You'll need to add it to one of your login startup files, covered in Chapter 13, “Managing Users and Groups.”
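One way to make the alias permanent is to append it to a startup file only if it isn't there already, so repeated runs don't pile up duplicate lines. This sketch assumes your interactive Bash shells read ~/.bashrc; the function name add_vi_alias is hypothetical, and it accepts a different file as its first argument.

```shell
#!/usr/bin/env bash
# Sketch: append alias vi='vim' to a startup file exactly once.
# Assumption: your interactive Bash shells read ~/.bashrc.
add_vi_alias() {
    local rc=${1:-"$HOME/.bashrc"}
    local line="alias vi='vim'"
    # -q quiet, -x match the whole line, -F treat the text literally
    if grep -qxF "$line" "$rc" 2>/dev/null; then
        echo "alias already present in $rc"
    else
        printf '%s\n' "$line" >> "$rc"
        echo "alias added to $rc"
    fi
}
```

Run add_vi_alias, then type source ~/.bashrc (or log out and back in) so the alias takes effect.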

Using the vim Editor

To start using the vim text editor, type vim or vi, depending on your distribution, followed by the name of the file you want to edit or create. Figure 8.1 shows a vim text editor screen in action with a text file that was previously created.

FIGURE 8.1 Using the vim text editor

In Figure 8.1, the file being edited is editorTestFile.txt. The vim editor works with the file's data in a memory buffer, and this buffer is displayed on the screen. If you open vim without a filename, or the filename you entered doesn't yet exist, vim starts a new buffer area for editing.

The vim editor displays messages in a message area on the bottom line of the screen. If you have just opened an existing file, it displays the filename along with the number of lines and characters read into the buffer area. If you are creating a new file, you will see [New File] in the message area.

The vim editor has three standard modes.

Command Mode This is the mode vim uses when you first enter the buffer area; it is sometimes called normal mode. Here you enter keystrokes to enact commands. For example, pressing the j key moves your cursor down one line. This is the best mode to use for quickly moving around the buffer area.
Insert Mode Insert mode is also called edit or entry mode. This is the mode where you can perform simple editing; there are few commands or special keystrokes here. You enter this mode from Command mode by pressing the i key. At this point, the message -- INSERT -- is displayed in the message area. You leave this mode by pressing the Esc key.
Ex Mode This mode is sometimes also called colon commands because every command entered here is preceded with a colon (:). For example, to leave the vim editor when you have no unsaved changes, you type :q and press the Enter key. To quit and discard any unsaved changes, type :q! instead.

Since you start in Command mode when entering the vim editor's buffer area, it's good to understand a few of the commonly used commands to move around in this mode. Table 8.1 contains several commands for moving around in the editor.

TABLE 8.1: Commonly Used vim Command Mode Moving Commands

KEYSTROKE	DESCRIPTION
h	Move cursor left one character.
l	Move cursor right one character.
j	Move cursor down one line (the next line in the text).
k	Move cursor up one line (the previous line in the text).
w	Move cursor forward one word to front of next word.
e	Move cursor to end of current word.
b	Move cursor backward one word.
^	Move cursor to the first nonblank character of the line.
$	Move cursor to end of line.
gg	Move cursor to the file's first line.
G	Move cursor to the file's last line.
`n`G	Move cursor to file line number `n`.
Ctrl+B	Scroll up almost one full screen.
Ctrl+F	Scroll down almost one full screen.
Ctrl+U	Scroll up half of a screen.
Ctrl+D	Scroll down half of a screen.
Ctrl+Y	Scroll up one line.
Ctrl+E	Scroll down one line.

Quickly moving around in the vim editor buffer is useful. However, there are also several editing commands that help speed up your modification process. For example, if you move your cursor to a word's first letter and type cw (change word), the word is deleted and you are placed in Insert mode. You can then type the new word and press Esc to leave Insert mode.

Once you have made any needed text changes in the vim buffer area, it's time to save your work. You can type ZZ in Command mode to write the buffer to disk and exit the vim editor.

The third vim mode, Ex mode, has additional handy commands. You must be in Command mode to enter into Ex mode. You cannot jump from Insert mode to Ex mode. Therefore, if you're currently in Insert mode, press the Esc key to go back to Command mode first.

Table 8.2 shows several Ex commands that can help you manage your text file. Notice that all the keystrokes include the necessary colon (:) to use Ex commands.

TABLE 8.2: Commonly Used vim Ex Mode Commands

KEYSTROKES	DESCRIPTION
:x	Write buffer to file and quit editor.
:wq	Write buffer to file and quit editor.
:wq!	Write buffer to file and quit editor (overrides protection).
:w	Write buffer to file and stay in editor.
:w!	Write buffer to file and stay in editor (overrides protection).
:q	Quit editor without writing buffer to file.
:q!	Quit editor without writing buffer to file (overrides protection).
:! `command`	Execute shell `command` and display results, but don't quit editor.
:r! `command`	Execute shell `command` and include the results in editor buffer area.
:r `file`	Read `file` contents and include them in editor buffer area.

After reading through the various mode commands, you may see why some people despise the vim editor. There are a lot of obscure commands to know. However, some people love the vim editor because it is so powerful.

It's tempting to learn only one text editor and ignore the others. But knowing at least two text editors is useful in your day‐to‐day Linux work. For complex editing and writing programs, the vim editor is one of the most popular text editors. However, if you just need to make a small change to a file, the nano text editor shines. We cover this one next.

The nano Editor

In contrast to vim, which is a complicated editor with powerful features, nano is a simple editor. For individuals who need a simple console mode text editor that is easy to navigate, nano is the tool to use. It's also a great text editor for those who are just starting on their Linux command‐line adventure.

The nano text editor is installed on most Linux distributions by default. Everything about the nano text editor is easy. To open a file at the command line with nano, enter nano filename.

If you start nano without a filename or if the file doesn't exist, nano simply opens a new buffer area for editing. If you specify an existing file on the command line, nano reads the entire contents of the file into a buffer area, where it is ready for editing, as shown in Figure 8.2.

FIGURE 8.2 The nano editor window

Notice at the bottom of the nano editor window, various commands with a brief description are shown. These commands are the nano control commands. The caret (^) symbol shown represents the Ctrl key. Therefore, ^X stands for the keyboard sequence Ctrl+X. Though the nano control commands list capital letters in the keyboard sequences, you can use either lowercase or uppercase characters for control commands.

Having most of the basic commands listed right in front of you is great—no need to memorize what control command does what. Table 8.3 presents the most common nano control commands.

TABLE 8.3: nano Common Control Commands

COMMAND	DESCRIPTION
Ctrl+C	Displays the cursor's position within the text editing buffer
Ctrl+G	Displays `nano`'s main help window
Ctrl+J	Justifies the current text paragraph
Ctrl+K	Cuts the text line and stores it in the cut buffer
Ctrl+O	Writes out the current text editing buffer to a file
Ctrl+R	Reads a file into the current text editing buffer
Ctrl+T	Starts the available spell checker
Ctrl+U	Pastes the text stored in the cut buffer at the current line
Ctrl+V	Scrolls text editing buffer to the next page
Ctrl+W	Searches for word or phrases within the text editing buffer
Ctrl+X	Closes the current text editing buffer, exits `nano`, and returns to the shell
Ctrl+Y	Scrolls the text editing buffer to previous page

The control commands listed in Table 8.3 are really all you need. However, if you desire more powerful control features than those listed, nano has them. To see more control commands, press Ctrl+G in the nano text editor to display its main help window containing additional control commands.

A few of these additional control commands are called Meta‐key sequences. In the nano documentation, they are denoted by the letter M. For example, you'll find the key sequence to undo the last task denoted as M‐U in the nano help system. But don't press the M key to accomplish this. Instead, M represents the Esc, Alt, or Meta key, depending on your keyboard's configuration. Thus, you might press the Alt+U key combination to undo the last task within nano.

Real World Scenario

CREATING AND MODIFYING A FILE WITH THE NANO TEXT EDITOR

When you're learning to use text editors, experience is the best tutor. Follow these steps to create, save, and then modify a text file with the nano utility:

1. Log in to a Linux system using the sysadmin account and the password you created for it.
2. Create a new text file with nano by typing nano QuestAns.txt and pressing the Enter key.
3. From “The Bottom Line” section of this chapter, type in the text of the “Master It” paragraph concerning the vim editor and then the text of the “Master It” paragraph for the nano editor.
4. Save the file by pressing the Ctrl+O key combination, and then press Enter when the editor asks you for the file's name.
5. Exit the editor by pressing the Ctrl+X key combination.
6. Determine the solutions for both “Master It” questions before you return to modify the QuestAns.txt text file.
7. Return to edit the text file by typing nano QuestAns.txt and pressing the Enter key.
8. Add each solution under its appropriate “Master It” paragraph in the text file. Try to use as many of the key combinations in Table 8.3 as possible to move through the editor.
9. When you are finished adding the solutions, save the file by pressing the Ctrl+O key combination, and then press Enter when the editor asks you for the file's name.
10. Exit the editor by pressing the Ctrl+X key combination.

Using text editors at the CLI is one way to work with text files. The rest of this chapter explores additional methods you can use to manipulate text file data.

Working with Data Files

When you have a large amount of data, it's often difficult to handle the information and make it useful. The Linux system provides several CLI tools to help you manage large amounts of data. This section covers the basic commands that every system administrator—as well as any everyday Linux user—should know how to use to make their lives easier.

Sorting

Often, to understand the data within text files, you need to reformat file data in some way. The sort utility sorts a file's data. It makes no changes to the original file. It only reads the file, sorts its data, and displays the sorted data to STDOUT (covered in Chapter 6, “Working with the Shell”).

If you want to order a file's content alphabetically, simply enter the sort command followed by the name of the file you want to sort.

$ nano alphabetKey.txt
$ cat alphabetKey.txt
Alpha
Tango
Sierra
Bravo
Foxtrot
Echo
$
$ sort alphabetKey.txt
Alpha
Bravo
Echo
Foxtrot
Sierra
Tango
$

It's pretty simple. However, things aren't always as easy as they appear. Take a look at this example:

$ nano numberKey.txt
$ cat numberKey.txt
1 One
2 Two
4 Four
3 Three
10 Ten
20 Twenty
100 One Hundred
$
$ sort numberKey.txt
1 One
10 Ten
100 One Hundred
2 Two
20 Twenty
3 Three
4 Four
$

If you were expecting the numbers to sort in numerical order, you were disappointed. By default, the sort command compares numbers as characters, one character at a time, so 10 and 100 sort ahead of 2. If you want the lines sorted by their numerical values, add the -n option to the command.

$ sort -n numberKey.txt
1 One
2 Two
3 Three
4 Four
10 Ten
20 Twenty
100 One Hundred
$
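If you'd rather not type the sample data into nano, printf can recreate numberKey.txt, and the two orderings can then be compared side by side. Note that sort's character ordering depends on your locale settings; this sketch assumes LC_ALL=C (plain byte order) so the first sort matches the output shown earlier on any system.

```shell
# Recreate numberKey.txt without an editor: printf writes one argument per line
printf '%s\n' '1 One' '2 Two' '4 Four' '3 Three' '10 Ten' '20 Twenty' \
    '100 One Hundred' > numberKey.txt

# LC_ALL=C forces byte-by-byte ordering, so "10 Ten" lands before "2 Two"
LC_ALL=C sort numberKey.txt

# -n compares the leading number instead: 1, 2, 3, 4, 10, 20, 100
sort -n numberKey.txt
```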

There are several useful sort parameters you can use depending on what kind of sort is needed. Table 8.4 shows commonly used options.

TABLE 8.4: Commonly Used sort Command Options

SINGLE DASH	DOUBLE DASH	DESCRIPTION
`-b`	`--ignore-leading-blanks`	Ignore leading blanks when sorting.
`-d`	`--dictionary-order`	Consider only blanks and alphanumeric characters; don't consider special characters.
`-f`	`--ignore-case`	Ignore case by folding lowercase letters to uppercase. (In the C locale, `sort` otherwise orders capitalized letters first.)
`-g`	`--general-numeric-sort`	Sort by general numerical value (also handles floating-point numbers and scientific notation, such as 1.5e3).
`-i`	`--ignore-nonprinting`	Ignore nonprintable characters in the sort.
`-k`	`--key=POS1[,POS2]`	Sort on a key that starts at position `POS1` and ends at `POS2` if specified (otherwise at the end of the line).
`-n`	`--numeric-sort`	Sort by string numerical value.
`-o`	`--output=file`	Write results to the specified `file`.
`-r`	`--reverse`	Reverse the sort order (descending instead of ascending).
`-t`	`--field-separator=SEP`	Specify the character used to separate fields (key positions).
`-z`	`--zero-terminated`	Treat lines as ending with a NUL character instead of a newline.
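Here are a few of the options from Table 8.4 in action. The fruit.txt file is a hypothetical scratch file, and LC_ALL=C is again set so that the case-related orderings in the comments are repeatable. The last command sorts /etc/passwd on its third colon-separated field (the user ID); writing the key as 3,3n confines the key to field 3 and applies numeric comparison to that field alone.

```shell
# Hypothetical scratch file with mixed-case words
printf '%s\n' banana Cherry apple > fruit.txt

LC_ALL=C sort fruit.txt       # byte order, capitals first: Cherry apple banana
LC_ALL=C sort -f fruit.txt    # -f ignores case: apple banana Cherry
LC_ALL=C sort -r fruit.txt    # -r reverses the order: banana apple Cherry

# -t ':' splits fields on colons; -k 3,3n sorts numerically on field 3 only
sort -t ':' -k 3,3n /etc/passwd | head -n 5
```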

Viewing sorted data is helpful, but what do you do if you want to keep that sorted data? STDOUT redirection (covered in Chapter 6) can help here:

$ nano numberKeySciFi.txt
$ cat numberKeySciFi.txt
1984 101
Wars 1138
Pi 3.14
Trek 1701
Back 88
$
$ sort -n -t ' ' -k 2 numberKeySciFi.txt > sortedSciFi.txt
$ cat sortedSciFi.txt
Pi 3.14
Back 88
1984 101
Wars 1138
Trek 1701
$
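One caution when keeping sorted data: redirecting output into the same file you are sorting empties that file, because the shell truncates it before sort reads it. The -o option avoids this problem, since sort reads all of its input before writing the output file. This sketch works on a hypothetical copy, sciFiCopy.txt, so that numberKeySciFi.txt stays unsorted for the later examples.

```shell
# Hypothetical copy of the SciFi data, so the original file is left alone
printf '%s\n' '1984 101' 'Wars 1138' 'Pi 3.14' 'Trek 1701' 'Back 88' > sciFiCopy.txt

# Don't do this: the shell empties sciFiCopy.txt before sort reads it
#   sort -n -t ' ' -k 2 sciFiCopy.txt > sciFiCopy.txt

# -o writes the output only after all input is read, so in-place is safe
sort -n -t ' ' -k 2 -o sciFiCopy.txt sciFiCopy.txt
cat sciFiCopy.txt
```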

Keeping sorted data is especially handy when you've used a complex sort like the previous one. Another useful function when dealing with text data is searching for it. We'll cover that topic next.

Searching

You may need to locate a text file within the virtual directory structure or search through a file for text. Either way, Linux provides lots of options to accomplish your task.

LOCATING FILES

A simple utility to use in finding files quickly is the locate program. What makes it fast is that this command searches a database that is pre‐filled with filenames and their locations.

To find a file with the locate command, just enter locate followed by the name of the file you want to find. If the file is on your system and you have permission to view it, the locate utility displays the file's directory path and name, as demonstrated here on an Ubuntu distribution:

$ locate .bash_history
/home/sysadmin/.bash_history
$

Another nice feature of locate is that it uses a pattern to find files. This allows you to employ partial filenames and regular expressions (covered later in this chapter) and, with the command options, ignore case. Table 8.5 shows a few of the more commonly used locate command options.

TABLE 8.5: The locate Command's Commonly Used Options

SHORT	LONG	DESCRIPTION
`-A`	`--all`	Display only filenames that match all the patterns, instead of displaying files that match only one pattern in the pattern list.
`-b`	`--basename`	Match the pattern against only the base filename, not against any directory names in the path.
`-c`	`--count`	Display only the number of files whose name matches the pattern instead of displaying filenames.
`-i`	`--ignore-case`	Ignore case in the pattern for matching filenames.
`-q`	`--quiet`	Do not display any error messages, such as `permission denied`, when processing.
`-r`	`--regexp R`	Use the regular expression, `R`, instead of the pattern list to match filenames.
`-w`	`--wholename`	Match the pattern against the whole path name, including directory names. This is default behavior.

Where you can run into problems with locate is when a file is newly created. Here's an example that uses the touch command to create a file and then tries to find it with the locate utility:

$ touch newFile.txt
$ ls newFile.txt
newFile.txt
$
$ locate newFile.txt
$

When a file is newly created (or downloaded), often it is not yet listed in the locate database. Typically, this database is updated only periodically. Also, when you have a newly installed Linux system, the database may not even yet exist!

To fix both of these issues, obtain super user privileges and run the updatedb command. It updates the locate database, named /var/lib/mlocate/mlocate.db (or some variation), creating the database first if it doesn't exist.

$ sudo updatedb
[sudo] password for sysadmin:
$
$ locate newFile.txt
/home/sysadmin/newFile.txt
$

That's much better! Now that the database is updated, the newly created file can be found by the locate utility.

The locate command is useful when you want to find files by their name, but it's not useful when you're trying to find a file based on its size or who owns it. This is where the find command can help.

FINDING FILES

The find command is flexible. It allows you to locate files based on metadata, such as who owns the file, when the file was last modified, the permissions set on the file, and so on. Its command-line format is a little different:

find [path] [option] [expression]

The path argument is the starting point directory. The find command searches that directory and all its subdirectories (recursively) for the file or files you seek. You can use a single period (.) to designate your present working directory as the starting point directory.

The expression command argument and its preceding option control what type of filters are applied to the search as well as any settings that may limit the search. Table 8.6 shows the more commonly used option and expression combinations.

TABLE 8.6: The find Command's Commonly Used Options and Expressions

OPTION	EXPRESSION	DESCRIPTION
`-cmin`	`n`	Display names of files whose status changed `n` minutes ago. Prefix `n` with a minus sign (`-n`) for less than `n` minutes ago or a plus sign (`+n`) for more than `n` minutes ago.
`-empty`	N/A	Display names of files that are empty and are either a regular file or a directory.
`-gid`	`n`	Display names of files whose group ID is equal to `n`.
`-group`	`name`	Display names of files whose group is `name`.
`-inum`	`n`	Display names of files whose inode number is equal to `n`.
`-maxdepth`	`n`	When searching for files, traverse down into the starting point directory's tree only `n` levels.
`-mmin`	`n`	Display names of files whose data changed `n` minutes ago. As with `-cmin`, `-n` and `+n` mean less than and more than `n` minutes ago.
`-name`	`pattern`	Display names of files whose name matches `pattern`. Shell wildcard characters, such as `*` and `?`, may be used in the `pattern`, which should be enclosed in quotation marks so the shell doesn't expand the wildcards before `find` sees them. Replace `-name` with `-iname` to ignore case.
`-nogroup`	N/A	Display names of files where no group name exists for the file's group ID.
`-nouser`	N/A	Display names of files where no username exists for the file's user ID.
`-perm`	`mode`	Display names of files whose permissions match `mode`. Either octal or symbolic modes may be used.
`-size`	`n`	Display names of files whose size matches `n`. Suffixes can be used to make the size more human readable, such as G for gigabytes.
`-user`	`name`	Display names of files whose owner is `name`.
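The following sketch tries several Table 8.6 options in a hypothetical scratch directory, findDemo, whose contents are known in advance. It also uses the minus-sign prefix that find accepts on numeric arguments (-5 means "less than 5 minutes ago").

```shell
# Hypothetical scratch tree: two empty files and one file with data
mkdir -p findDemo/sub
touch findDemo/empty.txt findDemo/sub/deep.txt
echo "some data" > findDemo/full.txt

# Quote the pattern so the shell passes *.txt to find unexpanded
find findDemo -name '*.txt'

# -maxdepth 1 stays in findDemo itself and skips findDemo/sub
find findDemo -maxdepth 1 -name '*.txt'

# -empty lists the zero-length files (empty.txt and sub/deep.txt)
find findDemo -empty

# -mmin -5: data changed less than 5 minutes ago (directories included)
find findDemo -mmin -5
```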

For example, the -empty option makes find display all the files (and directories) in your present working directory and its subdirectories that have no data in them:

$ find . -empty
[…]
./newFile.txt
./.local/share/nano
[…]
$

Unlike locate, find can quickly locate newly created files without a database update:

$ touch anotherNewFile.txt
$
$ find /home/sysadmin -name anotherNewFile.txt
/home/sysadmin/anotherNewFile.txt
$

You can search for files whose status was recently changed, such as when data has been added to the file.

$ nano anotherNewFile.txt
$
$ find /home/sysadmin -cmin 1
[…]
/home/sysadmin/anotherNewFile.txt
$

You can run these find searches from any starting point in the virtual directory system. However, you may want to use super user privileges to get complete results and avoid a flood of Permission denied messages.

$ sudo find / -name mlocate.db
[sudo] password for sysadmin:
/var/lib/mlocate/mlocate.db
$
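If you don't have super user privileges, an alternative is to discard the error messages by redirecting STDERR (covered in Chapter 6) to /dev/null. This sketch assumes the locate database lives under /var/lib and that its filename ends in locate.db (mlocate.db on many systems, plocate.db on some newer ones), so it may print nothing on systems that use a different name or have no database yet. Note that find still skips the directories it can't read; redirection only hides the messages.

```shell
# Send STDERR (file descriptor 2) to /dev/null so "Permission denied"
# messages disappear and only matching paths are printed
find /var/lib -name '*locate.db' 2>/dev/null

# -user narrows a search to files owned by one account (here, yourself)
find /tmp -maxdepth 1 -user "$(whoami)" 2>/dev/null
```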

Both locate and find are useful for discovering a file's location or performing a basic file analysis. However, neither of these commands can search through and display a file's contents. We'll cover a utility that does offer that feature next.

SEARCHING FOR AND THROUGH FILES

When you need a utility that lets you search for files that contain certain data, grep is the winner. The command‐line format for the grep command is as follows:

grep [options] pattern [file]

In the following example, the text files we've used or created for this chapter are listed in a single-column format via the ls -1 *.txt command. Two of those files contain the word Pi, but which ones? The grep command can easily determine the correct answer.

$ ls -1 *.txt
alphabetKey.txt
anotherNewFile.txt
editorTestFile.txt
editorTestFileNano.txt
newFile.txt
numberKey.txt
numberKeySciFi.txt
sortedSciFi.txt
$
$ cat numberKeySciFi.txt
1984 101
Wars 1138
Pi 3.14
Trek 1701
Back 88
$
$ cat sortedSciFi.txt
Pi 3.14
Back 88
1984 101
Wars 1138
Trek 1701
$
$ grep Pi *.txt
numberKeySciFi.txt:Pi 3.14
sortedSciFi.txt:Pi 3.14
$

In the preceding example, the pattern used with grep was Pi, and all the text files in the current directory, *.txt, were searched. The grep command lists the search results by displaying the name of each file that contains the pattern, followed by the entire text line in which the pattern appears.

There are some nice options you can use in your grep searches. Table 8.7 shows some of the more commonly used grep utility options.

TABLE 8.7: The grep Command's Commonly Used Options

SHORT	LONG	DESCRIPTION
`-c`	`--count`	Display a count of text file records that contain a `PATTERN` match.
`-d action`	`--directories=action`	When a file is a directory: if `action` is set to `read`, read the directory as if it were a regular text file; if `action` is set to `skip`, ignore the directory; and if `action` is set to `recurse`, act as if the `-r` or `--recursive` option was used.
`-E`	`--extended-regexp`	Designate the `PATTERN` as an extended regular expression.
`-i`	`--ignore-case`	Ignore the case in the `PATTERN` as well as in any text file records.
`-r`, `-R`	`--recursive`, `--dereference-recursive`	Search a directory's contents and, recursively, the contents of every subdirectory within that directory tree. The `-R` form also follows all symbolic links.
`-v`	`--invert-match`	Display only text file records that do not contain a `PATTERN` match.
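Here are several of the Table 8.7 options applied to a hypothetical grepDemo directory that holds a copy of the SciFi data plus an empty subdirectory, sub, which shows what -d skip suppresses.

```shell
# Hypothetical demo directory: one data file plus an empty subdirectory
mkdir -p grepDemo/sub
printf '%s\n' '1984 101' 'Wars 1138' 'Pi 3.14' 'Trek 1701' 'Back 88' > grepDemo/sciFi.txt

grep -c Pi grepDemo/sciFi.txt     # -c: count of matching lines
grep -v Pi grepDemo/sciFi.txt     # -v: only the lines WITHOUT a match
grep -i pi grepDemo/sciFi.txt     # -i: lowercase "pi" still matches "Pi"
grep -r Pi grepDemo               # -r: search every file under grepDemo
grep -d skip Pi grepDemo/*        # -d skip: no "Is a directory" message for sub
```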

When searching through larger sections of the virtual directory structure for files containing certain data, it's a good idea to employ the -d skip option so that grep doesn't complain when it encounters a directory.

$ grep -d skip sysadmin /etc/*
grep: /etc/at.deny: Permission denied
/etc/group:sysadmin:x:1000:
[…]
/etc/passwd:sysadmin:x:1000:1000:[…]:/home/sysadmin:/bin/bash
[…]
$

Notice that the grep utility found the word sysadmin in multiple files, including /etc/group and /etc/passwd. This is a real time-saver when you're trying to locate data. However, also notice that a Permission denied message was produced. You'll need to use super user privileges to search through files that require higher permission levels to read.

You can also use grep to conduct searches on one particular file. Often in a large file, you have to look for a specific line of data buried somewhere in the middle of the file. Instead of manually scrolling through the entire file, you can let the grep command search for you.

$ grep bash /etc/passwd
root:x:0:0:root:/root:/bin/bash
sysadmin:x:1000:1000:[…]:/home/sysadmin:/bin/bash
$

When looking for a particular piece of data whose case you cannot remember, use the -i option to make grep case-insensitive.

$ sudo grep -d skip Ubuntu-Server /etc/*
[sudo] password for sysadmin:
$
$ sudo grep -i -d skip Ubuntu-Server /etc/*
/etc/hostname:ubuntu-server
/etc/hosts:127.0.1.1 ubuntu-server
$

The -d skip and -i options, along with super user privileges, make your grep search results cleaner and provide you with faster results.

REGULAR EXPRESSIONS

Instead of looking for a word using grep, you can conduct a search using a regular expression. A regular expression is a pattern template that a utility, such as grep, uses to filter text. Basic regular expressions (BREs) include characters such as a dot followed by an asterisk (.*) to represent any number of characters (including none) and a single dot (.) to represent exactly one character. They may also use brackets to match any one character from a set, such as [aeiou], or from a range of characters, such as [A-Za-z]. To find text file records that begin with particular characters, precede them with a caret (^) symbol. To find text file records that end with particular characters, follow them with a dollar sign ($) symbol.

In addition to BREs, there are extended regular expressions (EREs), which have more complex pattern templates. When using grep with EREs, you must add the -E option to the command. An ERE pattern example is "word1|word2", which tells grep that if a text line contains either word1 or word2, display it. If you'd like a more in-depth look at regular expressions, our favorite resource is Chapter 20 in the book Linux Command Line and Shell Scripting Bible by Blum and Bresnahan (Wiley, 2021).

By using regular expressions, you can conduct rather complex searches with grep, including searches that use extended regular expressions via the -E option:

$ grep -E "(^root|^sysadmin)" /etc/passwd
root:x:0:0:root:/root:/bin/bash
sysadmin:x:1000:1000:[…]:/home/sysadmin:/bin/bash
$
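To try the BRE and ERE features described above without touching /etc/passwd, this sketch writes three passwd-style lines into a hypothetical passwdSample.txt file and runs several patterns of each kind against it.

```shell
# Hypothetical sample file with three passwd-style lines
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
    'sysadmin:x:1000:1000::/home/sysadmin:/bin/bash' > passwdSample.txt

grep '^root' passwdSample.txt            # ^ anchors the match at the line's start
grep 'nologin$' passwdSample.txt         # $ anchors the match at the line's end
grep '^[ds]' passwdSample.txt            # [ds] matches one character: d or s
grep 'r..t' passwdSample.txt             # each . matches exactly one character
grep '^[a-z]*:x:1000:' passwdSample.txt  # [a-z]* matches zero or more lowercase letters
grep -E '^(root|sysadmin):' passwdSample.txt  # ERE alternation with |
```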

The grep -E command is the modern replacement for the egrep utility. The two are functionally the same, but egrep is now deprecated. A deprecated command may not be available in the future, so you should stop using it as soon as possible and switch to its modern equivalent.

Compressing

Linux contains several file compression utilities that allow you to easily compress large files into smaller files that take up less space. While this may sound great, it often leads to confusion and chaos when you're trying to determine which utility to use. The following popular utilities are available on Linux:

gzip
bzip2
xz

The advantages and disadvantages of each of these data compression methods are explored in this section.

gzip The gzip utility was developed in 1992 as a replacement for the old compress program. Achieving text‐based file compression rates of 60–70 percent, gzip has long been a popular data compression utility. To compress a file, simply type in gzip followed by the file's name. The original file is replaced by a compressed version with a .gz filename extension. To reverse the operation, type in gunzip followed by the compressed file's name.
bzip2 Developed in 1996, the bzip2 utility offers higher compression rates than gzip but takes slightly longer to perform the data compression. There was a bzip program, but it had some patent issues, so bzip2 was created to replace it.
The bzip2 utility employs multiple layers of compression techniques and algorithms. Until 2013, this data compression utility was used to compress the Linux kernel for distribution. To compress a file, simply type in bzip2 followed by the file's name. The original file is replaced by a compressed version with a .bz2 file extension. To reverse the operation, type in bunzip2 followed by the compressed file's name, which decompresses the data.
xz Developed in 2009, the xz data compression utility quickly became popular among Linux administrators. It boasts a higher default compression rate than bzip2 and gzip. In 2013, the xz compression utility replaced bzip2 for compressing the Linux kernel for distribution. To compress a file, simply type in xz followed by the file's name. The original file is replaced by a compressed version with an .xz file extension. To reverse the operation, type in unxz followed by the compressed file's name.

It's helpful to see a side‐by‐side comparison of some of the compression utilities using their defaults. Here is a compression comparison example on an Ubuntu distribution:

$ ls -hs /var/log/syslog
344K /var/log/syslog
$
$ cp /var/log/syslog syslog1
$ cp /var/log/syslog syslog2
$ cp /var/log/syslog syslog3
$
$ gzip syslog1
$ bzip2 syslog2
$ xz syslog3
$
$ ls -hs syslog?.*
72K syslog1.gz  40K syslog2.bz2  32K syslog3.xz
$

In the preceding example, the size of the /var/log/syslog file is shown first: 344K. (You can use /var/log/lastlog in place of /var/log/syslog for this comparison on a CentOS distribution.) The file is then copied three times to the local directory, using a new filename each time. Next, each copy is compressed with a different utility. Finally, another ls -hs command displays the names and sizes of the compressed files. The xz program compresses this file the most, because its output file, syslog3.xz, is the smallest.
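You do not need to copy the file three times to compare the utilities. Each utility's -c option writes the compressed data to standard output and leaves the original file alone, so a small script can compare sizes directly. This is a sketch that assumes Bash and the three utilities. size_demo.txt is a hypothetical sample file that the script generates when you don't pass a filename.

```shell
#!/bin/bash
# Sketch: compare compressed sizes without creating copies of the original.
# The default target is a hypothetical sample file; pass a path (for
# example /var/log/syslog, if you can read it) to use your own file.
set -e
file="${1:-size_demo.txt}"

if [ ! -f "$file" ]; then
    # Generate a sample log-like text file when no file was supplied.
    for i in $(seq 1 5000); do
        echo "Jan  1 12:00:00 host sshd[$i]: Accepted publickey for sysadmin"
    done > "$file"
fi

printf '%-8s %10s\n' "method" "bytes"
printf '%-8s %10s\n' "none" "$(wc -c < "$file")"
for tool in gzip bzip2 xz; do
    # -c sends the compressed data to stdout, so the original file is
    # untouched and only the byte count is recorded.
    printf '%-8s %10s\n' "$tool" "$("$tool" -c "$file" | wc -c)"
done
```

Run it as, for example, bash compare.sh /var/log/syslog. The script name is also hypothetical.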

Compression goes hand in hand with backing up files, because the resulting file containing a backup is often rather large. We'll cover backing up files next.

Archiving

Backing up files is often called archiving, especially in the Linux world. There are several programs you can employ for managing backups. Some of the more popular products are Amanda, Bacula, Bareos, Duplicity, and BackupPC. However, these GUI and/or web‐based programs often have command‐line utilities at their core, including the following:

cpio
dd
rsync
tar

The tar command was originally used to write files to a tape device for archiving. However, it can also write the output to a file, which has become a popular way to archive data in Linux, and that's the command we'll focus on in this chapter.

The tar command copies the selected files and stores them in a single file. This file is called a tar archive file. If this archive file is compressed using a data compression utility, the compressed archive file is called a tarball.
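A tarball does not have to be created in one step. This sketch assumes Bash and GNU tar. It first builds an uncompressed tar archive file and then compresses that file with gzip, which produces the same kind of .tar.gz tarball that tar's own compression option creates. The tarball_demo directory and its files are hypothetical.

```shell
#!/bin/bash
# Sketch: build a tar archive file, then compress it into a tarball.
# tarball_demo and its files are hypothetical practice names.
set -e
rm -rf tarball_demo
mkdir -p tarball_demo/files
echo "first file"  > tarball_demo/files/one.txt
echo "second file" > tarball_demo/files/two.txt

# Step 1: copy the files into a single, uncompressed tar archive file.
(cd tarball_demo && tar -cf files.tar files)

# Step 2: compress the archive; gzip replaces files.tar with files.tar.gz.
gzip tarball_demo/files.tar

# The result is a tarball, which tar can list directly.
tar -tzf tarball_demo/files.tar.gz
```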

The tar program has several useful options. Table 8.8 describes the more commonly used ones for creating data backups.

TABLE 8.8: The tar Command's Commonly Used Archive Creation Options

SHORT	LONG	DESCRIPTION
`-c`	`--create`	Creates a `tar` archive file. The backup can be a full or incremental backup, depending upon the other selected options.
`-u`	`--update`	Appends files to an existing `tar` archive file, but only those files that are newer than the copies already in the archive.
`-g`	`--listed-incremental`	Creates an incremental or full archive based upon metadata stored in the provided file.
`-z`	`--gzip`	Compresses a `tar` archive file into a tarball using `gzip`.
`-j`	`--bzip2`	Compresses a `tar` archive file into a tarball using `bzip2`.
`-J`	`--xz`	Compresses a `tar` archive file into a tarball using `xz`.
`-v`	`--verbose`	Displays each file's name as each file is processed.
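The -g option in Table 8.8 is easier to understand once you see it work. The following sketch assumes GNU tar, and incr_demo, data, and snapshot.snar are hypothetical names. On the first run, tar finds no snapshot file and makes a full backup. On the second run, using the same snapshot file, it archives only what changed.

```shell
#!/bin/bash
# Sketch of a full backup followed by an incremental one with GNU tar's
# -g (--listed-incremental) option. incr_demo, data/, and snapshot.snar
# are hypothetical practice names.
set -e
rm -rf incr_demo
mkdir -p incr_demo/data
echo "unchanged" > incr_demo/data/a.txt
echo "version 1" > incr_demo/data/b.txt

(
  cd incr_demo
  # The snapshot file does not exist yet, so tar makes a full backup
  # and records the files' metadata in snapshot.snar.
  tar -czf full.tar.gz -g snapshot.snar data

  # Wait so the new modification time is clearly later, then change one file.
  sleep 1
  echo "version 2" > data/b.txt

  # Same snapshot file again: only files changed since the last run
  # (plus directory entries) are archived.
  tar -czf incr.tar.gz -g snapshot.snar data

  echo "full backup:";        tar -tzf full.tar.gz
  echo "incremental backup:"; tar -tzf incr.tar.gz
)
```

In this run, the incremental listing should include data/b.txt but not data/a.txt.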

Notice that there are some compression options in Table 8.8. When you use a compression utility along with an archive and restore program for data backups, it is vital that you use a lossless compression method. A lossless compression is just as it sounds: no data is lost. The gzip, bzip2, and xz utilities provide lossless compression. Obviously, it is important not to lose data when doing backups!
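One way to confirm that these utilities are lossless is to send a file through each one and back, then compare the result with the original byte for byte. This is a minimal sketch that assumes Bash, cmp (from GNU diffutils), and the three compression utilities. lossless_demo and data.bin are hypothetical names.

```shell
#!/bin/bash
# Sketch: confirm that gzip, bzip2, and xz are lossless by round-tripping
# a file through each one and comparing the result byte for byte.
# lossless_demo and data.bin are hypothetical practice names.
set -e
rm -rf lossless_demo
mkdir lossless_demo

# Any file works; random bytes make a stricter test than plain text.
head -c 100000 /dev/urandom > lossless_demo/data.bin

for tool in gzip bzip2 xz; do
    # -c writes the compressed stream to stdout and leaves the input alone;
    # -d -c decompresses that stream back to stdout.
    "$tool" -c lossless_demo/data.bin | "$tool" -d -c > "lossless_demo/restored.$tool"
    # cmp -s exits with status 0 only if the two files are identical.
    if cmp -s lossless_demo/data.bin "lossless_demo/restored.$tool"; then
        echo "$tool: identical after round trip"
    else
        echo "$tool: MISMATCH" >&2
        exit 1
    fi
done
```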

To create an archive with the tar utility, you supply a few options, the archive's filename, and the files to include:

$ ls n*.txt
newFile.txt  numberKey.txt  numberKeySciFi.txt
$
$ tar -cvf archive.tar n*.txt
newFile.txt
numberKey.txt
numberKeySciFi.txt
$

In the preceding example, three options are used.

The -c option creates the tar archive.
The -v option displays the filenames as they are placed into the archive file.
The -f option designates the archive filename, which is archive.tar.

Though not required, it is considered good form to use the .tar extension on tar archive files. The example command's last argument designates the files to copy into this archive.

If you are backing up lots of files or large amounts of data, it is a good idea to employ a compression utility. This is easily accomplished by adding an additional switch to your tar command options. Here gzip compression is used to create a tarball:

$ ls -hs /var/log/syslog.*
132K /var/log/syslog.1  168K /var/log/syslog.2.gz
$
$ tar -zcvf syslog.tar.gz /var/log/syslog.*
tar: Removing leading `/' from member names
/var/log/syslog.1
tar: Removing leading `/' from hard link targets
/var/log/syslog.2.gz
$
$ ls -hs syslog.tar.gz
196K syslog.tar.gz
$

There are a couple of things to note in this example. First, look at the tar: Removing leading messages. The tar utility strips the leading forward slash (/) from filenames so that the files can later be restored anywhere. If that forward slash were left in place, the files could only go back to their original location in the virtual directory structure, which is not very flexible.
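To see why stripping the leading slash makes an archive portable, the following sketch stores a file by its absolute path and then restores the same archive in two different directories. It assumes Bash and GNU tar. /tmp/anywhere_src, anywhere_demo, restoreA, and restoreB are hypothetical names.

```shell
#!/bin/bash
# Sketch: because tar stores tmp/anywhere_src/note.txt instead of
# /tmp/anywhere_src/note.txt, one archive can be restored under any
# directory. All names here are hypothetical practice names.
set -e
rm -rf /tmp/anywhere_src anywhere_demo
mkdir -p /tmp/anywhere_src anywhere_demo/restoreA anywhere_demo/restoreB
echo "sample" > /tmp/anywhere_src/note.txt

# tar warns "Removing leading `/' from member names" and stores a relative name.
tar -cf anywhere_demo/note.tar /tmp/anywhere_src/note.txt
tar -tf anywhere_demo/note.tar        # tmp/anywhere_src/note.txt

# Extract the same archive in two different places.
(cd anywhere_demo/restoreA && tar -xf ../note.tar)
(cd anywhere_demo/restoreB && tar -xf ../note.tar)
ls anywhere_demo/restoreA/tmp/anywhere_src anywhere_demo/restoreB/tmp/anywhere_src
```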

The next thing to note in the preceding example is that the tarball filename has the .tar.gz file extension. It is considered good form to use the .tar extension and tack on an indicator showing the compression method that was used. However, you can shorten it to .tgz if desired.

Whenever you create data backups, it is a good practice to verify them. Table 8.9 provides some tar command options for viewing and verifying data backups.

TABLE 8.9: The tar Command's Commonly Used Archive Verification Options

SHORT	LONG	DESCRIPTION
`-d`	`--compare` `--diff`	Compares a tar archive file's members with external files and lists the differences.
`-t`	`--list`	Displays a tar archive file's contents.
`-W`	`--verify`	Verifies each file as the file is processed. This option cannot be used with the compression options.

Backup verification can take several different forms. You might confirm that the desired files (sometimes called members) are included in your backup by using the -v option on the tar command to watch the files being listed as they are added to the archive file. You can also check after the fact that the desired files are included in your backup. Use the -t option to list a tarball's or archive file's contents, as shown here:

$ tar -tf archive.tar
newFile.txt
numberKey.txt
numberKeySciFi.txt
$
$ tar -tf syslog.tar.gz
var/log/syslog.1
var/log/syslog.2.gz
$
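You can try the -d and -W options from Table 8.9 on a throwaway archive. This sketch assumes Bash and GNU tar. It verifies an uncompressed archive as it is written, compares the archive with the files on disk, and then edits one file so that the comparison reports a difference. The verify_demo directory and its files are hypothetical.

```shell
#!/bin/bash
# Sketch: verify an archive while writing it (-W), then compare it with
# the files on disk (-d). verify_demo and its files are hypothetical.
set -e
rm -rf verify_demo
mkdir verify_demo

(
  cd verify_demo
  echo "alpha" > alpha.txt
  echo "beta"  > beta.txt

  # -W re-reads the archive right after writing it (uncompressed only).
  tar -cWf backup.tar alpha.txt beta.txt

  # Nothing has changed yet, so -d reports no differences and exits 0.
  tar -df backup.tar && echo "archive matches the files on disk"

  # Change a file (its size changes too), then compare again. tar now
  # reports the difference and exits nonzero, which || catches so that
  # set -e does not stop the script.
  echo "beta, edited" > beta.txt
  tar -df backup.tar || echo "differences found"
)
```

Because -d exits with a nonzero status when it finds differences, a script can test its result in an if statement to flag a stale backup.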

Table 8.10 lists some of the options that you can use with the tar utility to restore data from a tar archive file or tarball. Several options used to create the backup, such as -g, are also available when restoring data.

TABLE 8.10: The tar Command's Commonly Used File Restore Options

SHORT	LONG	DESCRIPTION
`-x`	`--extract` `--get`	Extracts files from a tarball or archive file and places them in the current working directory.
`-z`	`--gunzip`	Decompresses files in a tarball using `gunzip`.
`-j`	`--bzip2`	Decompresses files in a tarball using `bunzip2`.
`-J`	`--xz`	Decompresses files in a tarball using `unxz`.

Extracting files from an archive or tarball is fairly simple using the tar utility. Here is an example of extracting files from our previously created tarball:

$ mkdir Extract
$ mv syslog.tar.gz Extract/
$ cd Extract
$
$ tar -zxvf syslog.tar.gz
var/log/syslog.1
var/log/syslog.2.gz
$
$ ls -F
syslog.tar.gz  var/
$
$ ls var/log/
syslog.1  syslog.2.gz
$

In the previous example, a new subdirectory, Extract, is created. The tarball is moved to the new subdirectory, and then the files are restored from the tarball. Notice that instead of being placed at the top level of Extract, the files were placed in its var/log subdirectory. That's because tar removed the leading forward slash of the original files but kept the rest of the directory reference. This is a rather useful feature of the tar utility.

Real World Scenario

TRYING THE TAR COMMAND

Archiving data is a large part of protecting information. The Linux tar utility is a popular way to archive directory structures into a single file that can easily be ported to another system. Practicing the use of this particular command will help you become more efficient in your day‐to‐day system administration tasks.

The following steps will take you through archiving and restoring files:

Log into a Linux system using the sysadmin account and the password you created for it.
Copy the /etc/passwd file to use for backup practice by typing cp /etc/passwd $HOME/passwd1.txt and pressing Enter.
Do this again to make a second file by typing cp /etc/passwd $HOME/passwd2.txt and pressing Enter.
You need one more file, so type cp /etc/passwd $HOME/passwd3.txt and press Enter.
Check to ensure you have three password files by typing ls passwd?.txt and pressing Enter.
Now that you have some files to archive, use the tar utility to make a tarball. Type tar -zcvf PArchive.tgz passwd?.txt and press Enter. You should see the three files' names displayed as they are processed by the tar command.
Review the PArchive.tgz tarball's contents by typing tar -tf PArchive.tgz and pressing Enter. You should see the same three filenames that the previous command displayed.
Before you decompress and extract the tarball's files, you'll need a place to put them. Create a new directory in your present working directory by typing mkdir PArchiveDir and pressing Enter.
Copy the tarball file to the new subdirectory by typing cp PArchive.tgz PArchiveDir/ and pressing Enter.
Move your current working directory into the new subdirectory. Type cd PArchiveDir and press Enter.
Double‐check that you are in the correct directory by typing pwd and pressing Enter. You should see something similar to /home/sysadmin/PArchiveDir displayed.
Now you can decompress and unpack the tarball in this subdirectory. Type tar -zxvf PArchive.tgz and press Enter. You should see the three files' names displayed as they are processed by the tar command.
Double‐check that the files were extracted by typing ls and pressing Enter. Do you see the three password files as well as the PArchive.tgz file? You should.
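If you want to repeat the exercise later, you can collect the steps above into a script. This sketch assumes Bash and GNU tar. It works in a hypothetical scratch directory, scenario_demo, under your current directory rather than directly in $HOME.

```shell
#!/bin/bash
# Sketch of the Real World Scenario steps as one script. It works in a
# hypothetical scratch directory, scenario_demo, instead of $HOME.
set -e
rm -rf scenario_demo
mkdir scenario_demo

(
  cd scenario_demo

  # Make three copies of /etc/passwd to practice on.
  for n in 1 2 3; do
      cp /etc/passwd "passwd$n.txt"
  done
  ls passwd?.txt

  # Create a gzip-compressed tarball and review its contents.
  tar -zcvf PArchive.tgz passwd?.txt
  tar -tf PArchive.tgz

  # Copy the tarball to a new subdirectory and extract it there.
  mkdir PArchiveDir
  cp PArchive.tgz PArchiveDir/
  cd PArchiveDir
  pwd
  tar -zxvf PArchive.tgz
  ls
)
```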

Using the tar command is a simple way to bundle multiple files into a single backup file. You can also create archive files of entire directory structures. This is a common method for distributing source code files for open source applications in the Linux world.

The Bottom Line

Use the vim editor's basic features. The vim editor is one of the most popular text editors in use. Though it can be tricky to use, learning to modify text files with vim is worth the time. Grasping the basics of the vim editor is all that is needed for a system admin.
- Master It Imagine that you just opened up a configuration file in the vim editor. You only want to quickly add a paragraph of comments to the top of the file. What editor commands can you employ to accomplish this task quickly?
Employ the nano editor for everyday text file editing. The nano text editor is a simple and quick editor to use in your daily work. You can quickly get into a file, make any needed modifications, save your work, and go on with other tasks. It's a favorite editor of system administrators because of its simplicity.
- Master It You need to quickly edit a text file by copying two lines of text from the top of the file to the bottom of the text file. Assuming you are already in the nano editor with this file in the buffer, what editor commands covered in this chapter can you use to accomplish this task quickly?
Find data in a text file, and reduce its size. To quickly find files that contain certain data, the grep command is a utility worth learning. With its ability to conduct simple or complex searches, locating the information or the files you need is a snap.
- Master It You need to find all the files in the /etc directory (but not its subdirectories) that contain the word host. The search must be case‐insensitive, and you don't want to see any error messages concerning directory files. Assuming you need to use the sudo command along with your grep command, what will your command look like to conduct this search?
Back up and organize text file data. The tar utility has been around for a long time. It provides useful options to create archive files. While tar has the ability to compress files on the fly, you can also use the gzip, bzip2, and xz compression utilities to compress tar archive files as well as other files.
- Master It You created a tar archive file, myArchive.tar, but did not compress it with a tar option, because you needed to verify each file as it was processed with the -W option. Now that the archive file has been successfully created and verified, what command will you use to compress it to the highest level, and what will the resulting file's name be?