Chapter 10 showed you how to edit text files using different editor programs available in the Linux environment. These editors enable you to easily manipulate text contained in a text file by using simple commands or mouse clicks.
There are times, however, when you'll find yourself wanting to manipulate text in a text file on the fly, without having to pull out a full-fledged interactive text editor. In these situations, it would be useful to have a simple command line editor that could easily format, insert, modify, or delete text elements automatically.
The Linux system provides two common tools for doing just that. This section describes the two most popular command line editors used in the Linux world, sed and gawk.
The sed editor is called a stream editor, as opposed to a normal interactive text editor. In an interactive text editor, such as vim, you interactively use keyboard commands to insert, delete, or replace text in the data. A stream editor edits a stream of data based on a set of rules you supply ahead of time, before the editor processes the data.
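The sed editor can manipulate data in a data stream based on commands you either enter into the command line or store in a command text file. The sed editor does these things:
1. Reads one data line at a time from the input
2. Matches that data with the supplied editor commands
3. Changes data in the stream as specified in the commands
4. Outputs the new data to STDOUT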
After the stream editor matches all the commands against a line of data, it reads the next line of data and repeats the process. After the stream editor processes all the lines of data in the stream, it terminates.
Because the commands are applied sequentially line by line, the sed editor makes only one pass through the data stream to make the edits. This makes the sed editor much faster than an interactive editor and allows you to quickly make changes to data in a file on the fly.
Here's the format for using the sed command:
sed options script file
The options parameters allow you to customize the behavior of the sed command and include the options shown in Table 19.1.
Table 19.1 The sed Command Options
Option | Description
-e script | Adds commands specified in the script to the commands run while processing the input
-f file | Adds the commands specified in the file to the commands run while processing the input
-n | Suppresses the automatic output of each line; output appears only when a print command requests it
The script parameter specifies a single command to apply against the stream data. If more than one command is required, you must use either the -e option to specify them in the command line or the -f option to specify them in a separate file. Numerous commands are available for manipulating data. We examine some of the basic commands used by the sed editor in this chapter and then look at some of the more advanced commands in Chapter 21.
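As a small sketch of how the -n option from Table 19.1 behaves, the following pairs it with the p (print) command, which prints the addressed line. The sample text and the variable names are invented for illustration.

```shell
# Sample input invented for this sketch
input=$(printf 'line one\nline two\nline three\n')

# Without -n, sed would echo every line; with -n, only the explicit
# p command addressed to line 2 produces output.
second=$(printf '%s\n' "$input" | sed -n '2p')
echo "$second"
```

Dropping -n from the same command would print all three lines, with line two appearing twice.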
By default, the sed editor applies the specified commands to the STDIN input stream. This allows you to pipe data directly to the sed editor for processing. Here's a quick example demonstrating how to do this:
$ echo "This is a test" | sed 's/test/big test/'
This is a big test
$
This example uses the s command in the sed editor. The s command substitutes a second text string for the first text string pattern specified between the forward slashes. In this example, the words big test were substituted for the word test.
When you run this example, it should display the results almost instantaneously. That's the power of using the sed editor. You can make multiple edits to data in about the same time it takes for some of the interactive editors just to start up.
Of course, this simple test just edited one line of data. You should get the same speedy results when editing complete files of data:
$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
$
$ sed 's/dog/cat/' data1.txt
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
$
The sed command executes and returns the data almost instantaneously. As it processes each line of data, the results are displayed. You'll start seeing results before the sed editor completes processing the entire file.
It's important to note that the sed editor doesn't modify the data in the text file itself. It only sends the modified text to STDOUT. If you look at the text file, it still contains the original data:
$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
$
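Because sed writes only to STDOUT, one common way to keep the edited text is to redirect that output to a second file. The sketch below assumes temporary files created with mktemp standing in for data1.txt and a hypothetical output file.

```shell
# Temporary files stand in for data1.txt and an output file (hypothetical names)
src=$(mktemp)
dst=$(mktemp)
printf 'The quick brown fox jumps over the lazy dog.\n' > "$src"

# Redirect STDOUT to capture the edited stream; the source file is untouched
sed 's/dog/cat/' "$src" > "$dst"

original=$(cat "$src")
edited=$(cat "$dst")
echo "$original"
echo "$edited"
rm -f "$src" "$dst"
```

Redirecting to a different file matters here: redirecting back onto the input file would truncate it before sed reads it.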
To execute more than one command from the sed command line, just use the -e option:
$ sed -e 's/brown/green/; s/dog/cat/' data1.txt
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
The quick green fox jumps over the lazy cat.
$
Both commands are applied to each line of data in the file. The commands must be separated with a semicolon, and there shouldn't be any spaces between the end of the command and the semicolon.
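Another way to supply several commands, sketched here, is to repeat the -e option once per command instead of joining them with semicolons. The sample line mirrors the contents of data1.txt.

```shell
line='The quick brown fox jumps over the lazy dog.'

# Each -e adds one command; sed runs them in the order given
result=$(echo "$line" | sed -e 's/brown/green/' -e 's/dog/cat/')
echo "$result"
```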
Instead of using a semicolon to separate the commands, you can use the secondary prompt in the bash shell. Just enter the first single quotation mark to open the sed program script (sed editor command list), and bash continues to prompt you for more commands until you enter the closing quotation mark:
$ sed -e '
> s/brown/green/
> s/fox/elephant/
> s/dog/cat/' data1.txt
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
$
You must remember to finish the command on the same line where the closing single quotation mark appears. After the bash shell detects the closing quotation mark, it processes the command. After it starts, the sed command applies each command you specified to each line of data in the text file.
Finally, if you have lots of sed commands you want to process, it is often easier to just store them in a separate file. Use the -f option to specify the file in the sed command:
$ cat script1.sed
s/brown/green/
s/fox/elephant/
s/dog/cat/
$
$ sed -f script1.sed data1.txt
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
The quick green elephant jumps over the lazy cat.
$
In this case, you don't put a semicolon after each command. The sed editor knows that each line contains a separate command. As with entering commands on the command line, the sed editor reads the commands from the specified file and applies them to each line in the data file.
We'll look at some other sed editor commands that come in handy for manipulating data in the “Commanding at the sed Editor Basics” section. Before that, let's quickly look at the other Linux data editor.
Although the sed editor is a handy tool for modifying text files on the fly, it has its limitations. Often, you need a more advanced tool for manipulating data in a file, one that provides a more programming-like environment allowing you to modify and reorganize data in a file. This is where gawk comes in.
The gawk program is the GNU version of the original awk program in Unix. The gawk program takes stream editing one step further than the sed editor by providing a programming language instead of just editor commands. Within the gawk programming language, you can do the following:
- Define variables to store data.
- Use arithmetic and string operators to operate on data.
- Use structured programming concepts, such as if-then statements and loops, to add logic to your data processing.
- Generate formatted reports by extracting data elements from the data file and repositioning them in another order or format.
The gawk program's report-generating capabilities are often used for extracting data elements from large bulky text files and formatting them into a readable report. The perfect example of this is formatting log files. Trying to pore through lines of errors in a log file can be difficult. The gawk program allows you to filter just the data elements you want to view from the log file, and then you can format them in a manner that makes reading the important data easier.
Here's the basic format of the gawk program:
gawk options program file
Table 19.2 shows the options available with the gawk program.
Table 19.2 The gawk Options
Option | Description
-F fs | Specifies a field separator for delineating data fields in a line
-f file | Specifies a file name to read the program from
-v var=value | Defines a variable and default value used in the gawk program
-mf N | Specifies the maximum number of fields to process in the data file
-mr N | Specifies the maximum record size in the data file
-W keyword | Specifies the compatibility mode or warning level for gawk
The command line options provide an easy way to customize features in the gawk program. We'll look more closely at these as we explore gawk.
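As a quick illustration of the -v option from Table 19.2, this sketch passes a value into the program from the command line. The variable name and value are invented, the BEGIN keyword it relies on is covered later in this section, and the AWK fallback to the system awk is an assumption made so the sketch still runs where gawk is not installed.

```shell
# Use gawk when present; fall back to the system awk (portability assumption)
AWK=$(command -v gawk || command -v awk)

# -v assigns the variable before the program starts, so BEGIN can already use it
greeting=$("$AWK" -v name="Rich" 'BEGIN {print "Hello, " name}')
echo "$greeting"
```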
The power of gawk is in the program script. You can write scripts to read the data within a text line and then manipulate and display the data to create any type of output report.
A gawk program script is defined by opening and closing braces. You must place script commands between the two braces ({}). If you incorrectly use a parenthesis instead of a brace to enclose your gawk script, you get error messages, similar to the following:
$ gawk '(print "Hello World!"}'
gawk: (print "Hello World!"}
gawk: ^ syntax error
Because the gawk command line assumes that the script is a single text string, you must also enclose your script in single quotation marks. Here's an example of a simple gawk program script specified on the command line:
$ gawk '{print "Hello World!"}'
The program script defines a single command, the print command. The print command does what it says: It prints text to STDOUT. If you try running this command, you'll be somewhat disappointed, because nothing happens right away. Because no filename was defined in the command line, the gawk program retrieves data from STDIN. When you run the program, it just waits for text to come in via STDIN.
If you type a line of text and press the Enter key, gawk runs the text through the program script. Just like the sed editor, the gawk program executes the program script on each line of text available in the data stream. Because the program script is set to display a fixed text string, no matter what text you enter in the data stream, you get the same text output:
$ gawk '{print "Hello World!"}'
This is a test
Hello World!
hello
Hello World!
This is another test
Hello World!
To terminate the gawk program, you must signal that the data stream has ended. At a terminal, pressing the Ctrl+D key combination on an empty line signals End-of-File (EOF) for keyboard input. Using that key combination terminates the gawk program and returns you to a command line interface prompt.
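When the input arrives through a pipe rather than the keyboard, no key combination is needed, because the end of the piped data marks the end of the stream. A minimal sketch, assuming two invented input lines and falling back to the system awk if gawk is not installed:

```shell
# Use gawk when present; fall back to the system awk (portability assumption)
AWK=$(command -v gawk || command -v awk)

# The pipe closes when printf finishes, which signals the end of the data
out=$(printf 'first\nsecond\n' | "$AWK" '{print "Hello World!"}')
echo "$out"
```

The program prints its fixed string once per input line and then exits on its own.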
One of the primary features of gawk is its ability to manipulate data in the text file. It does this by automatically assigning a variable to each data element in a line. By default, gawk assigns the following variables to each data field it detects in the line of text:
- $0 represents the entire line of text.
- $1 represents the first data field in the line of text.
- $2 represents the second data field in the line of text.
- $n represents the nth data field in the line of text.
Each data field is determined in a text line by a field separation character. When gawk reads a line of text, it delineates each data field using the defined field separation character. The default field separation character in gawk is any whitespace character (such as the tab or space characters).
Here's an example gawk program that reads a text file and displays only the first data field value:
$ cat data2.txt
One line of test text.
Two lines of test text.
Three lines of test text.
$
$ gawk '{print $1}' data2.txt
One
Two
Three
$
This program uses the $1 field variable to display only the first data field for each line of text.
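Field variables can also be combined and reordered in a single print command. The following sketch uses an invented input line with mixed spaces and a tab to show that runs of whitespace act as one separator; it falls back to the system awk if gawk is not installed.

```shell
# Use gawk when present; fall back to the system awk (portability assumption)
AWK=$(command -v gawk || command -v awk)

# Runs of spaces and tabs count as a single separator by default.
# The comma in print places a single space between the two values.
fields=$(printf 'One   line\tof test text.\n' | "$AWK" '{print $3, $1}')
echo "$fields"
```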
If you're reading a file that uses a different field separation character, you can specify it by using the -F option:
$ gawk -F: '{print $1}' /etc/passwd
root
bin
daemon
adm
lp
sync
shutdown
halt
mail
[...]
This short program displays the first data field in the password file on the system. Because the /etc/passwd file uses a colon to separate the data fields, if you want to separate each data element, you must specify it as the field separation character in the gawk options.
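The same option works for any single separator character. As a small sketch, here it is applied to an invented comma-separated record (falling back to the system awk if gawk is not installed):

```shell
# Use gawk when present; fall back to the system awk (portability assumption)
AWK=$(command -v gawk || command -v awk)

# Comma-separated sample record invented for illustration
second=$(echo 'alpha,beta,gamma' | "$AWK" -F, '{print $2}')
echo "$second"
```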
A programming language wouldn't be very useful if you could only execute one command. The gawk programming language allows you to combine commands into a normal program. To use multiple commands in the program script specified on the command line, just place a semicolon between each command:
$ echo "My name is Rich" | gawk '{$4="Christine"; print $0}'
My name is Christine
$
The first command assigns a value to the $4 field variable. The second command then prints the entire line. Notice from the output that the gawk program replaced the fourth data field in the original text with the new value.
You can also use the secondary prompt to enter your program script commands one line at a time:
$ gawk '{
> $4="Christine"
> print $0}'
My name is Rich
My name is Christine
$
After you open the single quotation mark, the bash shell provides the secondary prompt to prompt you for more data. You can add your commands one at a time on each line until you enter the closing single quotation mark. Because no filename was defined in the command line, the gawk program retrieves data from STDIN. When you run the program, it waits for text to come in via STDIN. To exit the program, just press the Ctrl+D key combination to signal the end of the data.
As with the sed editor, the gawk program allows you to store your programs in a file and refer to them in the command line:
$ cat script2.gawk
{print $1 "'s home directory is " $6}
$
$ gawk -F: -f script2.gawk /etc/passwd
root's home directory is /root
bin's home directory is /bin
daemon's home directory is /sbin
adm's home directory is /var/adm
lp's home directory is /var/spool/lpd
[...]
Christine's home directory is /home/Christine
Samantha's home directory is /home/Samantha
Timothy's home directory is /home/Timothy
$
The script2.gawk program script uses the print command again to print the /etc/passwd file's home directory data field (field variable $6) and the userid data field (field variable $1).
You can specify multiple commands in the program file. To do so, just place each command on a separate line. You don't need to use semicolons:
$ cat script3.gawk
{
text = "'s home directory is "
print $1 text $6
}
$
$ gawk -F: -f script3.gawk /etc/passwd
root's home directory is /root
bin's home directory is /bin
daemon's home directory is /sbin
adm's home directory is /var/adm
lp's home directory is /var/spool/lpd
[...]
Christine's home directory is /home/Christine
Samantha's home directory is /home/Samantha
Timothy's home directory is /home/Timothy
$
The script3.gawk program script defines a variable to hold a text string used in the print command. Notice that gawk programs don't use a dollar sign when referencing a variable's value, as a shell script does.
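The distinction matters because a dollar sign in front of a variable means something different in gawk: it selects the field whose number the variable holds. A minimal sketch with an invented input line and variable name (falling back to the system awk if gawk is not installed):

```shell
# Use gawk when present; fall back to the system awk (portability assumption)
AWK=$(command -v gawk || command -v awk)

# n holds the number 2, so print n shows 2, while $n means "field number 2"
out=$(echo 'red green blue' | "$AWK" '{n = 2; print n; print $n}')
echo "$out"
```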
The gawk program also allows you to specify when the program script is run. By default, gawk reads a line of text from the input and then executes the program script on the data in the line of text. Sometimes, you may need to run a script before processing data, such as to create a header section for a report. The BEGIN keyword is used to accomplish this. It forces gawk to execute the program script specified after the BEGIN keyword, before gawk reads the data:
$ gawk 'BEGIN {print "Hello World!"}'
Hello World!
$
This time the print command displays the text before reading any data. However, after it displays the text, it quickly exits, without waiting for any data.
The reason for this is that the BEGIN keyword only applies the specified script before it processes any data. If you want to process data with a normal program script, you must define the program using another script section:
$ cat data3.txt
Line 1
Line 2
Line 3
$
$ gawk 'BEGIN {print "The data3 File Contents:"}
> {print $0}' data3.txt
The data3 File Contents:
Line 1
Line 2
Line 3
$
Now after gawk executes the BEGIN script, it uses the second script to process any file data. Be careful when doing this; both of the scripts are still considered one text string on the gawk command line. You need to place your single quotation marks accordingly.
Like the BEGIN keyword, the END keyword allows you to specify a program script that gawk executes after reading the data:
$ gawk 'BEGIN {print "The data3 File Contents:"}
> {print $0}
> END {print "End of File"}' data3.txt
The data3 File Contents:
Line 1
Line 2
Line 3
End of File
$
When the gawk program is finished printing the file contents, it executes the commands in the END script. This is a great technique to use to add footer data to reports after all the normal data has been processed.
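An END script can also report something computed while the data was being read. The sketch below counts input lines in a variable and prints the total as a footer; the variable name count and the sample lines are invented, and the command falls back to the system awk if gawk is not installed.

```shell
# Use gawk when present; fall back to the system awk (portability assumption)
AWK=$(command -v gawk || command -v awk)

# BEGIN initializes the counter, the main script runs once per line,
# and END prints the total after the last line has been read.
report=$(printf 'Line 1\nLine 2\nLine 3\n' | "$AWK" '
BEGIN {count = 0}
{count = count + 1}
END {print "Lines read: " count}')
echo "$report"
```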
You can put all these elements together into a nice little program script file to create a full report from a simple data file:
$ cat script4.gawk
BEGIN {
print "The latest list of users and shells"
print " UserID \t Shell"
print "-------- \t -------"
FS=":"
}
{
print $1 " \t " $7
}
END {
print "This concludes the listing"
}
$
This script uses the BEGIN script to create a header section for the report. It also defines a special variable called FS. This is yet another way to define the field separation character. This way, you don't have to depend on the script's user to define the field separation character in the command line options.
Here's a somewhat truncated output from running this gawk program script:
$ gawk -f script4.gawk /etc/passwd
The latest list of users and shells
UserID Shell
-------- -------
root /bin/bash
bin /sbin/nologin
daemon /sbin/nologin
[...]
Christine /bin/bash
mysql /bin/bash
Samantha /bin/bash
Timothy /bin/bash
This concludes the listing
$
As expected, the BEGIN script created the header text, the program script processed the information from the specified data file (the /etc/passwd file), and the END script produced the footer text. The \t within the print command produces some nicely formatted tabbed output.
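To try the FS assignment without touching the real password file, the following sketch feeds two invented passwd-style records through a one-line program that sets FS in a BEGIN script (falling back to the system awk if gawk is not installed).

```shell
# Use gawk when present; fall back to the system awk (portability assumption)
AWK=$(command -v gawk || command -v awk)

# Two passwd-style records invented for this sketch
rows=$(printf 'root:x:0:0:root:/root:/bin/bash\nbin:x:1:1:bin:/bin:/sbin/nologin\n' |
  "$AWK" 'BEGIN {FS=":"} {print $1 "\t" $7}')
echo "$rows"
```

Because FS is set inside the program, the command line needs no -F option.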
This gives you a small taste of the power available when you use simple gawk scripts. Chapter 22 describes some more basic programming principles available for your gawk scripts, along with some even more advanced programming concepts you can use in your gawk program scripts to create professional looking reports from even the most cryptic data files.
The key to successfully using the sed editor is to know its myriad commands and formats, which help you to customize your text editing. This section describes some of the basic commands and features you can incorporate into your script to start using the sed editor.
You've already seen how to use the s command to substitute new text for the text in a line. However, a few additional options are available for the substitute command that can help make your life easier.
There's a caveat to how the substitute command replaces matching patterns in the text string. Watch what happens in this example:
$ cat data4.txt
This is a test of the test script.
This is the second test of the test script.
$
$ sed 's/test/trial/' data4.txt
This is a trial of the test script.
This is the second trial of the test script.
$
The substitute command works fine in replacing text in multiple lines, but by default, it replaces only the first occurrence in each line. To get the substitute command to work on different occurrences of the text, you must use a substitution flag. The substitution flag is set after the substitution command strings:
s/pattern/replacement/flags
Four types of substitution flags are available:
- A number, indicating the pattern occurrence for which new text should be substituted
- g, indicating that new text should be substituted for all occurrences of the existing text
- p, indicating that the contents of the original line should be printed
- w file, which means to write the results of the substitution to a file
In the first type of substitution, you can specify which occurrence of the matching pattern the sed editor should substitute new text for:
$ sed 's/test/trial/2' data4.txt
This is a test of the trial script.
This is the second test of the trial script.
$
As a result of specifying a 2 as the substitution flag, the sed editor replaces the pattern only in the second occurrence in each line. The g substitution flag enables you to replace every occurrence of the pattern in the text:
$ sed 's/test/trial/g' data4.txt
This is a trial of the trial script.
This is the second trial of the trial script.
$
The p substitution flag prints a line that contains a matching pattern in the substitute command. This is most often used in conjunction with the sed -n option:
$ cat data5.txt
This is a test line.
This is a different line.
$
$ sed -n 's/test/trial/p' data5.txt
This is a trial line.
$
The -n option suppresses output from the sed editor. However, the p substitution flag outputs any line that has been modified. Using the two in combination produces output only for lines that have been modified by the substitute command.
The w substitution flag produces the same output but stores the output in the specified file:
$ sed 's/test/trial/w test.txt' data5.txt
This is a trial line.
This is a different line.
$
$ cat test.txt
This is a trial line.
$
The normal output of the sed editor appears in STDOUT, but only the lines that include the matching pattern are stored in the specified output file.
Sometimes, you run across characters in text strings that aren't easy to use in the substitution pattern. One popular example in the Linux world is the forward slash (/).
Substituting pathnames in a file can get awkward. For example, if you wanted to substitute the C shell for the bash shell in the /etc/passwd file, you'd have to do this:
$ sed 's/\/bin\/bash/\/bin\/csh/' /etc/passwd
Because the forward slash is used as the string delimiter, you must use a backslash to escape it if it appears in the pattern text. This often leads to confusion and mistakes.
To solve this problem, the sed editor allows you to select a different character for the string delimiter in the substitute command:
$ sed 's!/bin/bash!/bin/csh!' /etc/passwd
In this example, the exclamation point is used for the string delimiter, making the pathnames much easier to read and understand.
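Other characters work as the delimiter too, as long as the same character is used in all three positions. This sketch uses a vertical bar on a single passwd-style entry copied from the Samantha example in this chapter, so it can be run without reading /etc/passwd.

```shell
entry='Samantha:x:502:502::/home/Samantha:/bin/bash'

# A vertical bar as the delimiter avoids escaping the slashes in the paths
changed=$(echo "$entry" | sed 's|/bin/bash|/bin/csh|')
echo "$changed"
```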
By default, the commands you use in the sed editor apply to all lines of the text data. If you want to apply a command only to a specific line or a group of lines, you must use line addressing.
There are two forms of line addressing in the sed editor:
- A numeric range of lines
- A text pattern that filters out a line
Both forms use the same format for specifying the address:
[address]command
You can also group more than one command together for a specific address:
address {
command1
command2
command3
}
The sed editor applies each of the commands you specify only to lines that match the address specified. This section demonstrates using both of these addressing techniques in your sed editor scripts.
When using numeric line addressing, you reference lines using their line position in the text stream. The sed editor assigns the first line in the text stream as line number one and continues sequentially for each new line.
The address you specify in the command can be a single line number or a range of lines specified by a starting line number, a comma, and an ending line number. Here's an example of specifying a line number to which the sed command will be applied:
$ sed '2s/dog/cat/' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
$
The sed editor modified the text only in line two per the address specified. Here's another example, this time using a range of line addresses:
$ sed '2,3s/dog/cat/' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy dog.
$
If you want to apply a command to a group of lines starting at some point within the text, but continuing to the end of the text, you can use the special address, the dollar sign:
$ sed '2,$s/dog/cat/' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
The quick brown fox jumps over the lazy cat.
$
Because you may not know how many lines of data are in the text, the dollar sign often comes in handy.
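The dollar sign can also stand alone as an address, in which case the command applies only to the last line. A minimal sketch with three invented lines:

```shell
data=$(printf 'lazy dog.\nlazy dog.\nlazy dog.\n')

# $ on its own addresses only the final line of the stream.
# Single quotes keep the shell from treating $s as a variable.
last_only=$(printf '%s\n' "$data" | sed '$s/dog/cat/')
echo "$last_only"
```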
The other method of restricting which lines a command applies to is a bit more complicated. The sed editor allows you to specify a text pattern that it uses to filter lines for the command. This is the format:
/pattern/command
You must encapsulate the pattern you specify in forward slashes. The sed editor applies the command only to lines that contain the text pattern you specify.
For example, if you want to change the default shell for only the user Samantha, you'd use the sed command:
$ grep Samantha /etc/passwd
Samantha:x:502:502::/home/Samantha:/bin/bash
$
$ sed '/Samantha/s/bash/csh/' /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
[...]
Christine:x:501:501:Christine B:/home/Christine:/bin/bash
Samantha:x:502:502::/home/Samantha:/bin/csh
Timothy:x:503:503::/home/Timothy:/bin/bash
$
The command was applied only to the line with the matching text pattern. Although using a fixed text pattern may be useful for filtering specific values, as in the userid example, it's somewhat limited in what you can do with it. The sed editor uses a feature called regular expressions in text patterns to allow you to create patterns that get pretty involved.
Regular expressions allow you to create advanced text pattern–matching formulas to match all sorts of data. These formulas combine a series of wildcard characters, special characters, and fixed text characters to produce a concise pattern that can match just about any text situation. Regular expressions are one of the scarier parts of shell script programming, and Chapter 20 covers them in great detail.
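As a small taste of what a regular expression adds to an address, the sketch below anchors the pattern to the start of the line with the caret character, so only the account named root matches. Both records are invented, including the hypothetical sysroot account.

```shell
# Two passwd-style records invented for this sketch
accounts=$(printf 'root:x:0:0:root:/root:/bin/bash\nsysroot:x:600:600::/home/sysroot:/bin/bash\n')

# ^root: matches only lines that start with "root:", so sysroot is left alone
edited=$(printf '%s\n' "$accounts" | sed '/^root:/s/bash/csh/')
echo "$edited"
```

A plain /root/ address would have matched both lines, because the text root also appears inside sysroot.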
If you need to perform more than one command on an individual line, group the commands together using braces. The sed editor processes each command listed on the address line(s):
$ sed '2{
> s/fox/elephant/
> s/dog/cat/
> }' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown elephant jumps over the lazy cat.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
$
Both commands are processed against the address. And of course, you can specify an address range before the grouped commands:
$ sed '3,${
> s/brown/green/
> s/lazy/active/
> }' data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick green fox jumps over the active dog.
The quick green fox jumps over the active dog.
$
The sed editor applies all the commands to all the lines in the address range.
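Grouping works with text pattern addresses as well as numeric ones. In this sketch, with two invented lines, both substitutions run only on the line that contains the word slow.

```shell
data=$(printf 'The quick brown fox.\nThe slow brown fox.\n')

# Both substitutions run only on lines containing "slow"
grouped=$(printf '%s\n' "$data" | sed '/slow/{
s/brown/green/
s/fox/elephant/
}')
echo "$grouped"
```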
The text substitution command isn't the only command available in the sed editor. If you need to delete specific lines of text in a text stream, you can use the delete command.
The delete command, d, pretty much does what it says. It deletes any text lines that match the addressing scheme supplied. Be careful with the delete command, because if you forget to include an addressing scheme, all the lines are deleted from the stream:
$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
$
$ sed 'd' data1.txt
$
The delete command is obviously most useful when used in conjunction with a specified address. This allows you to delete specific lines of text from the data stream, either by line number:
$ cat data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
$
$ sed '3d' data6.txt
This is line number 1.
This is line number 2.
This is line number 4.
$
or by a specific range of lines:
$ sed '2,3d' data6.txt
This is line number 1.
This is line number 4.
$
or by using the special last-line address, the dollar sign:
$ sed '3,$d' data6.txt
This is line number 1.
This is line number 2.
$
The pattern-matching feature of the sed editor also applies to the delete command:
$ sed '/number 1/d' data6.txt
This is line number 2.
This is line number 3.
This is line number 4.
$
The sed editor removes the line containing text that matches the pattern you specify.
You can also delete a range of lines using two text patterns, but be careful if you do this. The first pattern you specify “turns on” the line deletion, and the second pattern “turns off” the line deletion. The sed editor deletes any lines between the two specified lines (including the specified lines):
$ sed '/1/,/3/d' data6.txt
This is line number 4.
$
In addition, you must be careful because the delete feature “turns on” whenever the sed editor detects the start pattern in the data stream. This may produce an unexpected result:
$ cat data7.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 1 again.
This is text you want to keep.
This is the last line in the file.
$
$ sed '/1/,/3/d' data7.txt
This is line number 4.
$
The second occurrence of a line with the number 1 in it triggered the delete command again, deleting the rest of the lines in the data stream, because the stop pattern wasn't recognized. Of course, the other obvious problem occurs if you specify a stop pattern that never appears in the text:
$ sed '/1/,/5/d' data7.txt
$
Because the delete feature “turned on” at the first pattern match, but never found the end pattern match, the entire data stream was deleted.
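One way to reduce the risk of these surprises is to make the start and stop patterns as specific as possible. The sketch below assumes invented marker lines, BEGIN-DRAFT and END-DRAFT, and anchors each pattern to the whole line so that stray matches are unlikely.

```shell
# Sample text with hypothetical marker lines around the part to remove
text=$(printf 'keep 1\nBEGIN-DRAFT\ndrop me\nEND-DRAFT\nkeep 2\n')

# ^ and $ require each marker to be the entire line, not just part of one
kept=$(printf '%s\n' "$text" | sed '/^BEGIN-DRAFT$/,/^END-DRAFT$/d')
echo "$kept"
```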
As you would expect, like any other editor, the sed editor allows you to insert and append text lines to the data stream. The difference between the two actions can be confusing:
- The insert command (i) adds a new line before the specified line.
- The append command (a) adds a new line after the specified line.
What is confusing about these two commands is their formats. In their standard form, you can't write these commands on a single command line. You must put the text to insert or append on a separate line by itself (GNU sed also accepts the text directly after the backslash, which is the form the next two examples use). Here's the format for doing this:
sed '[address]command\
new line'
The text in new line appears in the sed editor output in the place you specify. Remember that when you use the insert command, the text appears before the data stream text:
$ echo "Test Line 2" | sed 'i\Test Line 1'
Test Line 1
Test Line 2
$
And when you use the append command, the text appears after the data stream text:
$ echo "Test Line 2" | sed 'a\Test Line 1'
Test Line 2
Test Line 1
$
When you use the sed editor from the command line interface prompt, you get the secondary prompt to enter the new line of data. You must complete the sed editor command on this line. After you enter the ending single quotation mark, the bash shell processes the command:
$ echo "Test Line 2" | sed 'i\
> Test Line 1'
Test Line 1
Test Line 2
$
This works well for adding text before or after the text in the data stream, but what about adding text inside the data stream?
To insert or append data inside the data stream lines, you must use addressing to tell the sed editor where you want the data to appear. You can specify only a single line address when using these commands. You can match either a numeric line number or a text pattern, but you cannot use a range of addresses. This is logical, because you can only insert or append before or after a single line, and not a range of lines.
Here's an example of inserting a new line before line 3 in the data stream:
$ sed '3i\
> This is an inserted line.' data6.txt
This is line number 1.
This is line number 2.
This is an inserted line.
This is line number 3.
This is line number 4.
$
Here's an example of appending a new line after line 3 in the data stream:
$ sed '3a\
> This is an appended line.' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is an appended line.
This is line number 4.
$
This uses the same process as the insert
command; it just places the new text line after the specified line number. If you have a multiline data stream, and you want to append a new line of text to the end of a data stream, just use the dollar sign, which represents the last line of data:
$ sed '$a\
> This is a new line of text.' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is a new line of text.
$
The same idea applies if you want to add a new line at the beginning of the data stream. Just insert a new line before line number one.
To insert or append more than one line of text, end each line of new text with a backslash, except for the last line:
$ sed '1i\
> This is one line of new text.\
> This is another line of new text.' data6.txt
This is one line of new text.
This is another line of new text.
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
$
Both of the specified lines are added to the data stream.
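The examples so far use line numbers, but a text pattern address works the same way. Here is a small sketch that inserts two lines before the line matching a pattern; the sample data is generated inline as a stand-in for data6.txt, and the variable names are assumptions made for the illustration.

```shell
# Sample data standing in for data6.txt (four numbered lines).
data=$(printf 'This is line number %s.\n' 1 2 3 4)

# Insert two lines before the line that matches "number 3".
# Every line of new text except the last ends with a backslash.
result=$(printf '%s\n' "$data" | sed '/number 3/i\
First inserted line.\
Second inserted line.')

printf '%s\n' "$result"
```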
The change
command allows you to change the contents of an entire line of text in the data stream. It works the same way as the insert
and append
commands, in that you must specify the new line separately from the rest of the sed
command:
$ sed '3c\
> This is a changed line of text.' data6.txt
This is line number 1.
This is line number 2.
This is a changed line of text.
This is line number 4.
$
In this example, the sed
editor changes the text in line number 3. You can also use a text pattern for the address:
$ sed '/number 3/c\
> This is a changed line of text.' data6.txt
This is line number 1.
This is line number 2.
This is a changed line of text.
This is line number 4.
$
The text pattern change
command changes every line in the data stream that matches the pattern:
$ cat data8.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is line number 1 again.
This is yet another line.
This is the last line in the file.
$
$ sed '/number 1/c\
> This is a changed line of text.' data8.txt
This is a changed line of text.
This is line number 2.
This is line number 3.
This is line number 4.
This is a changed line of text.
This is yet another line.
This is the last line in the file.
$
You can use an address range in the change
command, but the results may not be what you expect:
$ sed '2,3c\
> This is a new line of text.' data6.txt
This is line number 1.
This is a new line of text.
This is line number 4.
$
Instead of changing both lines with the text, the sed
editor uses the single line of text to replace both lines.
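If the goal is to rewrite each line in the range separately, one possible approach (a sketch, not the only option) is to apply the substitution command over the same range and replace the whole line. The sample data below is generated inline as a stand-in for data6.txt, and the variable names are illustrative.

```shell
# Sample data standing in for data6.txt (four numbered lines).
data=$(printf 'This is line number %s.\n' 1 2 3 4)

# c with a range collapses lines 2 and 3 into one line of new text.
collapsed=$(printf '%s\n' "$data" | sed '2,3c\
This is a new line of text.')

# s over the same range rewrites each line individually instead.
each=$(printf '%s\n' "$data" | sed '2,3s/.*/This is a new line of text./')

printf '%s\n' "$each"
```

The collapsed result assumes GNU sed, matching the output shown above; the substitution form keeps the line count unchanged.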
The transform
command (y
) is the only sed
editor command that operates on a single character. The transform
command uses the format:
[address
]y
/inchars
/outchars
/
The transform
command performs a one-to-one mapping of the inchars
and the outchars
values. The first character in inchars
is converted to the first character in outchars
. The second character in inchars
is converted to the second character in outchars
. This mapping continues throughout the length of the specified characters. If the inchars
and outchars
are not the same length, the sed
editor produces an error message.
Here's a simple example of using the transform
command:
$ sed 'y/123/789/' data8.txt
This is line number 7.
This is line number 8.
This is line number 9.
This is line number 4.
This is line number 7 again.
This is yet another line.
This is the last line in the file.
$
As you can see from the output, each instance of the characters specified in the inchars
pattern has been replaced by the character in the same position in the outchars
pattern.
The transform
command is a global command; that is, it performs the transformation on any character found in the text line automatically, without regard to the occurrence:
$ echo "This 1 is a test of 1 try." | sed 'y/123/456/'
This 4 is a test of 4 try.
$
The sed
editor transformed both instances of the matching character 1 in the text line. You can't limit the transformation to a specific occurrence of the character.
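A common use of the one-to-one mapping is case conversion. The sketch below maps every lowercase letter to its uppercase counterpart and also shows that mismatched lengths make the sed command fail. The variable names (lower, upper, shouted, mismatch) are assumptions made for this illustration.

```shell
lower=abcdefghijklmnopqrstuvwxyz
upper=ABCDEFGHIJKLMNOPQRSTUVWXYZ

# Map each lowercase letter to the uppercase letter in the same position.
shouted=$(echo "This is a test." | sed "y/$lower/$upper/")
printf '%s\n' "$shouted"

# inchars and outchars of different lengths make sed exit with an error.
mismatch=ok
if ! echo "abc" | sed 'y/abc/xy/' 2>/dev/null; then
    mismatch=error
fi
printf '%s\n' "$mismatch"
```

Double quotes are used around the first script so that the shell expands the two variables before sed sees the command.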
The “Introducing more substitution options” section showed you how to use the p
flag with the substitution command to display lines that the sed
editor changed. In addition, three commands can be used to print information from the data stream:
The p command to print a text line
The equal sign (=) command to print line numbers
The l (lowercase L) command to list a line
The following sections look at these three printing commands in the sed
editor.
Like the p
flag in the substitution
command, the p
command prints a line in the sed
editor output. On its own, this command doesn't offer much excitement:
$ echo "this is a test" | sed 'p'
this is a test
this is a test
$
All it does is print the data text that you already know is there. The most common use for the print
command is printing lines that contain matching text from a text pattern:
$ cat data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
$
$ sed -n '/number 3/p' data6.txt
This is line number 3.
$
By using the -n
option on the command line, you can suppress all the other lines and print only the line that contains the matching text pattern.
You can also use this as a quick way to print a subset of lines in a data stream:
$ sed -n '2,3p' data6.txt
This is line number 2.
This is line number 3.
$
You can also use the print
command when you need to see a line before it gets altered, such as with the substitution
or change
command. You can create a script that displays the line before it's changed:
$ sed -n '/3/{
> p
> s/line/test/p
> }' data6.txt
This is line number 3.
This is test number 3.
$
This sed
editor command searches for lines that contain the number 3 and executes two commands. First, the script uses the p
command to print the original version of the line; then it uses the s
command to substitute text, along with the p
flag to print the resulting text. The output shows both the original line text and the new line text.
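In a script, the same grouped commands can also be written on one line by separating them with semicolons. This is a sketch using inline sample data in place of data6.txt; the variable names are illustrative.

```shell
# Sample data standing in for data6.txt (four numbered lines).
data=$(printf 'This is line number %s.\n' 1 2 3 4)

# Print the matching line, then substitute and print the changed version.
# Commands inside the braces are separated by semicolons.
result=$(printf '%s\n' "$data" | sed -n '/3/{p;s/line/test/p;}')

printf '%s\n' "$result"
```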
The equal sign
command prints the current line number for the line within the data stream. Line numbers are determined by using the newline character in the data stream. Each time a newline character appears in the data stream, the sed
editor assumes that it terminates a line of text:
$ cat data1.txt
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
The quick brown fox jumps over the lazy dog.
$
$ sed '=' data1.txt
1
The quick brown fox jumps over the lazy dog.
2
The quick brown fox jumps over the lazy dog.
3
The quick brown fox jumps over the lazy dog.
4
The quick brown fox jumps over the lazy dog.
$
The sed
editor prints the line number before the actual line of text. The equal sign
command comes in handy if you're searching for a specific text pattern in the data stream:
$ sed -n '/number 4/{
> =
> p
> }' data6.txt
4
This is line number 4.
$
By using the -n
option, you can have the sed
editor display both the line number and text for the line that contains the matching text pattern.
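Dropping the p command leaves only the line numbers, which can be handy in scripts. The sketch below collects the numbers of matching lines and, using the dollar sign address, the number of the last line. The sample data and variable names are assumptions made for the illustration.

```shell
# Sample data standing in for data6.txt (four numbered lines).
data=$(printf 'This is line number %s.\n' 1 2 3 4)

# Print only the line numbers of lines that match the pattern.
matches=$(printf '%s\n' "$data" | sed -n '/number [24]/=')

# With the $ address, = prints the number of the last line (the line count).
count=$(printf '%s\n' "$data" | sed -n '$=')

printf '%s\n' "$matches"
printf '%s\n' "$count"
```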
The list
command (l
) allows you to print both the text and nonprintable characters in a data stream. Nonprintable characters are shown using either their octal values preceded by a backslash, or the standard C-style nomenclature for common nonprintable characters, such as \t
for tab characters:
$ cat data9.txt
This	line	contains	tabs.
$
$ sed -n 'l' data9.txt
This\tline\tcontains\ttabs.$
$
The tab character locations are shown with the \t
nomenclature. The dollar sign at the end of the line indicates the newline character. If you have a data stream that contains another control character, the list
command displays it using its C-style escape code, or the octal code if no such escape exists:
$ cat data10.txt
This line contains an escape character.
$
$ sed -n 'l' data10.txt
This line contains an escape character. \a$
$
The data10.txt
file contains the bell (alert) control code, which generates a bell sound. When you use the cat
command to display the text file, you don't see the control code; you just hear the sound (if your speakers are turned on). However, using the list
command, you can display the control code used.
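To see how different nonprintable characters are rendered, the following sketch builds a test line with printf rather than relying on a prepared file. It assumes GNU sed, which shows the bell character as \a and falls back to an octal code such as \033 for the ESC character. The variable name listed is illustrative.

```shell
# printf builds one line containing a tab (\t), a bell (\a), and an ESC byte (\033).
listed=$(printf 'tab\there\a\033\n' | sed -n 'l')

# The list command shows each of them in visible form, followed by $ for the newline.
printf '%s\n' "$listed"
```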
The substitution
command contains flags that allow you to work with files. There are also regular sed
editor commands that let you do that without having to substitute text.
The w
command is used to write lines to a file. Here's the format for the w
command:
[address
]w filename
The filename
can be specified as either a relative or absolute pathname, but in either case, the person running the sed
editor must have write permissions for the file. The address can be any type of addressing method used in sed
, such as a single line number, a text pattern, or a range of line numbers or text patterns.
Here's an example that writes only the first two lines of a data stream to a text file:
$ sed '1,2w test.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
$
$ cat test.txt
This is line number 1.
This is line number 2.
$
Of course, if you don't want the lines to display on STDOUT
, you can use the -n
option for the sed
command.
This is a great tool to use if you need to create a data file from a master file on the basis of common text values, such as those in a mailing list:
$ cat data11.txt
Blum, R Browncoat
McGuiness, A Alliance
Bresnahan, C Browncoat
Harken, C Alliance
$
$ sed -n '/Browncoat/w Browncoats.txt' data11.txt
$
$ cat Browncoats.txt
Blum, R Browncoat
Bresnahan, C Browncoat
$
The sed
editor writes to a destination file only the data lines that contain the text pattern.
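The same idea extends to splitting one master file into several files in a single pass. The sketch below uses a temporary directory and a file named members.txt, both of which are assumptions made for this illustration. Each w command goes in its own -e option because the filename runs to the end of the command.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
printf '%s\n' 'Blum, R Browncoat' 'McGuiness, A Alliance' \
    'Bresnahan, C Browncoat' 'Harken, C Alliance' > "$workdir/members.txt"

# One w command per -e option: the filename extends to the end of the command.
sed -n -e "/Browncoat/w $workdir/Browncoats.txt" \
       -e "/Alliance/w $workdir/Alliance.txt" "$workdir/members.txt"

browncoats=$(cat "$workdir/Browncoats.txt")
alliance=$(cat "$workdir/Alliance.txt")
printf '%s\n' "$browncoats" "$alliance"

# Clean up the scratch directory.
rm -r "$workdir"
```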
You've already seen how to insert data into and append text to a data stream from the sed
command line. The read
command (r
) allows you to insert data contained in a separate file.
Here's the format of the read
command:
[address
]r filename
The filename
parameter specifies either an absolute or relative pathname for the file that contains the data. You can't use a range of addresses for the read
command. You can only specify a single line number or text pattern address. The sed
editor inserts the text from the file after the address.
$ cat data12.txt
This is an added line.
This is the second added line.
$
$ sed '3r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is an added line.
This is the second added line.
This is line number 4.
$
The sed
editor inserts into the data stream all the text lines in the data file. The same technique works when using a text pattern address:
$ sed '/number 2/r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is an added line.
This is the second added line.
This is line number 3.
This is line number 4.
$
If you want to add text to the end of a data stream, just use the dollar sign address symbol:
$ sed '$r data12.txt' data6.txt
This is line number 1.
This is line number 2.
This is line number 3.
This is line number 4.
This is an added line.
This is the second added line.
$
A cool application of the read
command is to use it in conjunction with a delete
command to replace a placeholder in a file with data from another file. For example, suppose that you had a form stored in a text file that looked like this:
$ cat notice.std
Would the following people:
LIST
please report to the ship's captain.
$
The form letter uses the generic placeholder LIST
in place of a list of people. To insert the list of people after the placeholder, you just use the read
command. However, this still leaves the placeholder text in the output. To remove that, just use the delete
command. The result looks like this:
$ sed '/LIST/{
> r data11.txt
> d
> }' notice.std
Would the following people:
Blum, R Browncoat
McGuiness, A Alliance
Bresnahan, C Browncoat
Harken, C Alliance
please report to the ship's captain.
$
Now the placeholder text is replaced with the list of names from the data file.
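The same placeholder technique can be wrapped in a script. This is a minimal sketch that creates its own form and name list in a temporary directory; the file name names.txt and the variable names are hypothetical, and the behavior shown assumes GNU sed.

```shell
# Build a small form and a name list in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'Would the following people:' 'LIST' \
    "please report to the ship's captain." > "$workdir/notice.std"
printf '%s\n' 'Blum, R' 'Harken, C' > "$workdir/names.txt"

# r queues names.txt for output; d then drops the LIST line itself.
letter=$(sed "/LIST/{
r $workdir/names.txt
d
}" "$workdir/notice.std")

printf '%s\n' "$letter"

# Clean up the scratch directory.
rm -r "$workdir"
```

The r command sits on its own line because everything after r up to the end of the line is taken as the filename.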
Shell scripts can do lots of work on their own, but it's often difficult to manipulate data with just a shell script. Linux provides two handy utilities to help with handling text data. The sed
editor is a stream editor that quickly processes data on the fly as it reads it. You must provide the sed
editor with a list of editing commands, which it applies to the data.
The gawk
program is a utility from the GNU organization that mimics and expands on the functionality of the Unix awk
program. The gawk
program contains a built-in programming language that you can use to write scripts to handle and process data. You can use the gawk
program to extract data elements from large data files and output them in just about any format you desire. This makes processing large log files a snap, as well as creating custom reports from data files.
A crucial element of using both the sed
and gawk
programs is knowing how to use regular expressions. Regular expressions are key to creating customized filters for extracting and manipulating data in text files. The next chapter dives into the often misunderstood world of regular expressions, showing you how to build regular expressions for manipulating all types of data.