Chapter 11. Introduction to Shell Scripts

image with no caption

If you can enter commands into the shell, you can write shell scripts (also known as Bourne shell scripts). A shell script is a series of commands written in a file; the shell reads the commands from the file just as it would if you typed them into a terminal.

Bourne shell scripts generally start with the following line, which indicates that the /bin/sh program should execute the commands in the script file. (Make sure that no whitespace appears at the beginning of the script file.)

#!/bin/sh

The #! part is called a shebang; you’ll see it in other scripts in this book. You can list any commands that you want the shell to execute following the #!/bin/sh line. For example:

#!/bin/sh
#
# Print something, then run ls

echo About to run the ls command.
ls

After creating a shell script and setting its permissions, you can run it by placing the script file in one of the directories in your command path and then running the script name on the command line. You can also run ./script if the script is located in your current working directory, or you can use the full pathname.

As with any program on Unix systems, you need to set the executable bit for a shell script file, but you must also set the read bit in order for the shell to read the file. The easiest way to do this is as follows:

$ chmod +rx script

This chmod command allows other users to read and execute script. If you don’t want that, use the absolute mode 700 instead (and refer to 2.17 File Modes and Permissions for a refresher on permissions).

With the basics behind us, let’s look at some of the limitations of shell scripts.

The Bourne shell manipulates commands and files with relative ease. In 2.14 Shell Input and Output, you saw the way the shell can redirect output, one of the important elements of shell script programming. However, the shell script is only one tool for Unix programming, and although scripts have considerable power, they also have limitations.

One of the main strengths of shell scripts is that they can simplify and automate tasks that you can otherwise perform at the shell prompt, like manipulating batches of files. But if you’re trying to pick apart strings, perform repeated arithmetic computations, or access complex databases, or if you want functions and complex control structures, you’re better off using a scripting language like Python, Perl, or awk, or perhaps even a compiled language like C. (This is important, so we’ll repeat it throughout the chapter.)

Finally, be aware of your shell script sizes. Keep your shell scripts short. Bourne shell scripts aren’t meant to be big (though you will undoubtedly encounter some monstrosities).

One of the most confusing elements of working with the shell and scripts is when to use quotation marks (or quotes) and other punctuation, and why it’s sometimes necessary to do so. Let’s say you want to print the string $100 and you do the following:

$ echo $100
00

Why did this print 00? Because the shell saw $1, which is a shell variable (we’ll cover it soon). So you might think that if you surround it with double quotes, the shell will leave the $1 alone. But it still doesn’t work:

$ echo "$100"
00

Then you ask a friend, who says that you need to use single quotes instead:

$ echo '$100'
$100

Why did this particular incantation work?

When you use quotes, you’re often trying to create a literal, a string that you want the shell to pass to the command line untouched. In addition to the $ in the example that you just saw, other similar circumstances include when you want to pass a * character to a command such as grep instead of having the shell expand it, and when you need to need to use a semicolon (;) in a command.

When writing scripts and working on the command line, just remember what happens whenever the shell runs a command:

Problems involving literals can be subtle. Let’s say you’re looking for all entries in /etc/passwd that match the regular expression r.*t (that is, a line that contains an r followed by a t later in the line, which would enable you to search for usernames such as root and ruth and robot). You can run this command:

$ grep r.*t /etc/passwd

It works most of the time, but sometimes it mysteriously fails. Why? The answer is probably in your current directory. If that directory contains files with names such as r.input and r.output, then the shell expands r.*t to r.input r.output and creates this command:

$ grep r.input r.output /etc/passwd

The key to avoiding problems like this is to first recognize the characters that can get you in trouble and then apply the correct kind of quotes to protect the characters.

Most shell scripts understand command-line parameters and interact with the commands that they run. To take your scripts from being just a simple list of commands to becoming more flexible shell script programs, you need to know how to use the special Bourne shell variables. These special variables are like any other shell variable as described in 2.8 Environment and Shell Variables, except that you cannot change the values of certain ones.

When a Unix program finishes, it leaves an exit code for the parent process that started the program. The exit code is a number and is sometimes called an error code or exit value. When the exit code is zero (0), it typically means that the program ran without a problem. However, if the program has an error, it usually exits with a number other than 0 (but not always, as you’ll see next).

The shell holds the exit code of the last command in the $? special variable, so you can check it out at your shell prompt:

$ ls / > /dev/null
$ echo $?
0
$ ls /asdfasdf > /dev/null
ls: /asdfasdf: No such file or directory
$ echo $?
1

You can see that the successful command returned 0 and the unsuccessful command returned 1 (assuming, of course, that you don’t have a directory named /asdfasdf on your system).

If you intend to use the exit code of a command, you must use or store the code immediately after running the command. For example, if you run echo $? twice in a row, the output of the second command is always 0 because the first echo command completes successfully.

When writing shell code that aborts a script abnormally, use something like exit 1 to pass an exit code of 1 back to whatever parent process ran the script. (You may want to use different numbers for different conditions.)

One thing to note is that some programs like diff and grep use nonzero exit codes to indicate normal conditions. For example, grep returns 0 if it finds something matching a pattern and 1 if it doesn’t. For these programs, an exit code of 1 is not an error; grep and diff use the exit code 2 for real problems. If you think a program is using a nonzero exit code to indicate success, read its manual page. The exit codes are usually explained in the EXIT VALUE or DIAGNOSTICS section.

The Bourne shell has special constructs for conditionals, such as if/then/ else and case statements. For example, this simple script with an if conditional checks to see whether the script’s first argument is hi:

#!/bin/sh
if [ $1 = hi ]; then
   echo 'The first argument was "hi"'
else
   echo -n 'The first argument was not "hi" -- '
   echo It was '"'$1'"'
fi

The words if, then, else, and fi in the preceding script are shell keywords; everything else is a command. This distinction is extremely important because one of the commands is [ $1 = "hi" ] and the [ character is an actual program on a Unix system, not special shell syntax. (This is actually not quite true, as you’ll soon learn, but treat it as a separate command in your head for now.) All Unix systems have a command called [ that performs tests for shell script conditionals. This program is also known as test and careful examination of [ and test should reveal that they share an inode, or that one is a symbolic link to the other.

Understanding the exit codes in 11.4 Exit Codes is vital, because this is how the whole process works:

  1. The shell runs the command after the if keyword and collects the exit code of that command.

  2. If the exit code is 0, the shell executes the commands that follow the then keyword, stopping when it reaches an else or fi keyword.

  3. If the exit code is not 0 and there is an else clause, the shell runs the commands after the else keyword.

  4. The conditional ends at fi.

You’ve seen how [ works: The exit code is 0 if the test is true and nonzero when the test fails. You also know how to test string equality with [ str1 = str2 ]. However, remember that shell scripts are well suited to operations on entire files because the most useful [ tests involve file properties. For example, the following line checks whether file is a regular file (not a directory or special file):

[ -f file ]

In a script, you might see the -f test in a loop similar to this next one, which tests all of the items in the current working directory (you’ll learn more about loops in general shortly):

for filename in *; do
    if [ -f $filename ]; then
        ls -l $filename
        file $filename
    else
       echo $filename is not a regular file.
    fi
done

You can invert a test by placing the ! operator before the test arguments. For example, [ ! -f file ] returns true if file is not a regular file. Furthermore, the -a and -o flags are the logical “and” and “or” operators (for example, [ -f file1 -a file2 ]).

There are dozens of test operations, all of which fall into three general categories: file tests, string tests, and arithmetic tests. The info manual contains complete online documentation, but the test(1) manual page is a fast reference. The following sections outline the main tests. (I’ve omitted some of the less common ones.)

There are two kinds of loops in the Bourne shell: for and while loops.

The Bourne shell can redirect a command’s standard output back to the shell’s own command line. That is, you can use a command’s output as an argument to another command, or you can store the command output in a shell variable by enclosing a command in $().

This example stores a command inside the FLAGS variable. The bold in the second line shows the command substitution.

#!/bin/sh
FLAGS=$(grep ^flags /proc/cpuinfo | sed 's/.*://' | head -1)
echo Your processor supports:
for f in $FLAGS; do
    case $f in
        fpu)   MSG="floating point unit"
               ;;
        3dnow) MSG="3DNOW graphics extensions"
               ;;
        mtrr)  MSG="memory type range register"
               ;;
        *)     MSG="unknown"
               ;;
    esac
    echo $f: $MSG
done

This example is somewhat complicated because it demonstrates that you can use both single quotes and pipelines inside the command substitution. The result of the grep command is sent to the sed command (more about sed in 11.10.3 sed), which removes anything matching the expression .*:, and the result of sed is passed to head.

It’s easy to go overboard with command substitution. For example, don’t use $(ls) in a script because using the shell to expand * is faster. Also, if you want to invoke a command on several filenames that you get as a result of a find command, consider using a pipeline to xargs rather than command substitution, or use the -exec option (see 11.10.4 xargs).

It’s sometimes necessary to create a temporary file to collect output for use by a later command. When making such a file, make sure that the filename is distinct enough that no other programs will accidentally write to it.

Here’s how to use the mktemp command to create temporary filenames. This script shows you the device interrupts that have occurred in the last two seconds:

#!/bin/sh
TMPFILE1=$(mktemp /tmp/im1.XXXXXX)
TMPFILE2=$(mktemp /tmp/im2.XXXXXX)

cat /proc/interrupts > $TMPFILE1
sleep 2
cat /proc/interrupts > $TMPFILE2
diff $TMPFILE1 $TMPFILE2
rm -f $TMPFILE1 $TMPFILE2

The argument to mktemp is a template. The mktemp command converts the XXXXXX to a unique set of characters and creates an empty file with that name. Notice that this script uses variable names to store the filenames so that you only have to change one line if you want to change a filename.

Another problem with scripts that employ temporary files is that if the script is aborted, the temporary files could be left behind. In the preceding example, pressing CTRL-C before the second cat command leaves a temporary file in /tmp. Avoid this if possible. Instead, use the trap command to create a signal handler to catch the signal that CTRL-C generates and remove the temporary files, as in this handler:

#!/bin/sh
TMPFILE1=$(mktemp /tmp/im1.XXXXXX)
TMPFILE2=$(mktemp /tmp/im2.XXXXXX)
trap "rm -f $TMPFILE1 $TMPFILE2; exit 1" INT
--snip--

You must use exit in the handler to explicitly end script execution, or the shell will continue running as usual after running the signal handler.

Say you want to print a large section of text or feed a lot of text to another command. Rather than use several echo commands, you can use the shell’s here document feature, as shown in the following script:

#!/bin/sh
DATE=$(date)
cat <<EOF
Date: $DATE

The output above is from the Unix date command.
It's not a very interesting command.
EOF

The items in bold control the here document. The <<EOF tells the shell to redirect all lines that follow the standard input of the command that precedes <<EOF, which in this case is cat. The redirection stops as soon as the EOF marker occurs on a line by itself. The marker can actually be any string, but remember to use the same marker at the beginning and end of the here document. Also, convention dictates that the marker be in all uppercase letters.

Notice the shell variable $DATE in the here document. The shell expands shell variables inside here documents, which is especially useful when you’re printing out reports that contain many variables.

Several programs are particularly useful in shell scripts. Certain utilities such as basename are really only practical when used with other programs, and therefore don’t often find a place outside shell scripts. However, others such as awk can be quite useful on the command line, too.

The sed program (sed stands for stream editor) is an automatic text editor that takes an input stream (a file or the standard input), alters it according to some expression, and prints the results to standard output. In many respects, sed is like ed, the original Unix text editor. It has dozens of operations, matching tools, and addressing capabilities. As with awk, entire books have been written about sed including a quick reference covering both, sed & awk Pocket Reference, 2nd edition, by Arnold Robbins (O’Reilly, 2002).

Although sed is a big program, and an in-depth analysis is beyond the scope of this book, it’s easy to see how it works. In general, sed takes an address and an operation as one argument. The address is a set of lines, and the command determines what to do with the lines.

A very common task for sed is to substitute some text for a regular expression (see 2.5.1 grep), like this:

$ sed 's/exp/text/'

So if you wanted to replace the first colon in /etc/passwd with a % and send the result to the standard output, you’d do it like this:

$ sed 's/:/%/' /etc/passwd

To substitute all colons in /etc/passwd, add a g modifier to the end of the operation, like this:

$ sed 's/:/%/g' /etc/passwd

Here’s a command that operates on a per-line basis; it reads /etc/passwd and deletes lines three through six and sends the result to the standard output:

$ sed 3,6d /etc/passwd

In this example, 3,6 is the address (a range of lines), and d is the operation (delete). If you omit the address, sed operates on all lines in its input stream. The two most common sed operations are probably s (search and replace) and d.

You can also use a regular expression as the address. This command deletes any line that matches the regular expression exp:

$ sed '/exp/d'

When you have to run one command on a huge number of files, the command or shell may respond that it can’t fit all of the arguments in its buffer. Use xargs to get around this problem by running a command on each filename in its standard input stream.

Many people use xargs with the find command. For example, the script below can help you verify that every file in the current directory tree that ends with .gif is actually a GIF (Graphic Interchange Format) image:

$ find . -name '*.gif' -print | xargs file

In the example above, xargs runs the file command. However, this invocation can cause errors or leave your system open to security problems, because filenames can include spaces and newlines. When writing a script, use the following form instead, which changes the find output separator and the xargs argument delimiter from a newline to a NULL character:

$ find . -name '*.gif' -print0 | xargs -0 file

xargs starts a lot of processes, so don’t expect great performance if you have a large list of files.

You may need to add two dashes (--) to the end of your xargs command if there is a chance that any of the target files start with a single dash (-). The double dash (--) can be used to tell a program that any arguments that follow the double dash are filenames, not options. However, keep in mind that not all programs support the use of a double dash.

There’s an alternative to xargs when using find: the -exec option. However, the syntax is somewhat tricky because you need to supply a {} to substitute the filename and a literal ; to indicate the end of the command. Here’s how to perform the preceding task using only find:

$ find . -name '*.gif' -exec file {} \;

The exec command is a built-in shell feature that replaces the current shell process with the program you name after exec. It carries out the exec() system call that you learned about in Chapter 1. This feature is designed for saving system resources, but remember that there’s no return; when you run exec in a shell script, the script and shell running the script are gone, replaced by the new command.

To test this in a shell window, try running exec cat. After you press CTRL-D or CTRL-C to terminate the cat program, your window should disappear because its child process no longer exists.

Say you need to alter the environment in a shell slightly but don’t want a permanent change. You can change and restore a part of the environment (such as the path or working directory) using shell variables, but that’s a clumsy way to go about things. The easy way around these kinds of problems is to use a subshell, an entirely new shell process that you can create just to run a command or two. The new shell has a copy of the original shell’s environment, and when the new shell exits, any changes you made to its shell environment disappear, leaving the initial shell to run as normal.

To use a subshell, put the commands to be executed by the subshell in parentheses. For example, the following line executes the command uglyprogram in uglydir and leaves the original shell intact:

$ (cd uglydir; uglyprogram)

This example shows how to add a component to the path that might cause problems as a permanent change:

$ (PATH=/usr/confusing:$PATH; uglyprogram)

Using a subshell to make a single-use alteration to an environment variable is such a common task that there is even a built-in syntax that avoids the subshell:

$ PATH=/usr/confusing:$PATH uglyprogram

Pipes and background processes work with subshells, too. The following example uses tar to archive the entire directory tree within orig and then unpacks the archive into the new directory target, which effectively duplicates the files and folders in orig (this is useful because it preserves ownership and permissions, and it’s generally faster than using a command such as cp -r):

$ tar cf - orig | (cd target; tar xvf -)

If you need to include another file in your shell script, use the dot (.) operator. For example, this runs the commands in the file config.sh:

. config.sh

This “include” file syntax does not start a subshell, and it can be useful for a group of scripts that need to use a single configuration file.

The read command reads a line of text from the standard input and stores the text in a variable. For example, the following command stores the input in $var:

$ read var

This is a built-in shell command that can be useful in conjunction with other shell features not mentioned in this book.

The shell is so feature-rich that it’s difficult to condense its important elements into a single chapter. If you’re interested in what else the shell can do, have a look at some of the books on shell programming, such as Unix Shell Programming, 3rd edition, by Stephen G. Kochan and Patrick Wood (SAMS Publishing, 2003), or the shell script discussion in The UNIX Programming Environment by Bran W. Kernighan and Rob Pike (Prentice Hall, 1984).

However, at a certain point (especially when you start using the read built-in), you have to ask yourself if you’re still using the right tool for the job. Remember what shell scripts do best: manipulate simple files and commands. As stated earlier, if you find yourself writing something that looks convoluted, especially if it involves complicated string or arithmetic operations, you should probably look to a scripting language like Python, Perl, or awk.