Chapter 9. Debugging Shell Programs

We hope that we have convinced you that the Korn shell can be used as a serious Unix programming environment. It certainly has plenty of features, control structures, etc. But another essential part of a programming environment is a set of powerful, integrated support tools. For example, there is a wide assortment of screen editors, compilers, debuggers, profilers, cross-referencers, etc., for languages like C, C++ and Java. If you program in one of these languages, you probably take such tools for granted, and you would undoubtedly cringe at the thought of having to develop code with, say, the ed editor and the adb machine-language debugger.

But what about programming support tools for the Korn shell? Of course, you can use any editor you like, including vi and Emacs. And because the shell is an interpreted language, you don’t need a compiler.^[113] But there are no other tools available. The most serious problem is the lack of a debugger.

This chapter addresses that lack. The shell does have a few features that help in debugging shell scripts; we’ll see these in the first part of the chapter. The Korn shell also has a couple of new features, not present in most Bourne shells, that make it possible to implement a full-blown debugging tool. We show these features; more importantly, we present kshdb, a Korn shell debugger that uses them. kshdb is basic yet quite usable, and its implementation serves as an extended example of various shell programming techniques from throughout this book.

Basic Debugging Aids

What sort of functionality do you need to debug a program? At the most empirical level, you need a way of determining what is causing your program to behave badly and where the problem is in the code. You usually start with an obvious what (such as an error message, inappropriate output, infinite loop, etc.), try to work backwards until you find a what that is closer to the actual problem (e.g., a variable with a bad value, a bad option to a command), and eventually arrive at the exact where in your program. Then you can worry about how to fix it.

Notice that these steps represent a process of starting with obvious information and ending up with often obscure facts gleaned through deduction and intuition. Debugging aids make it easier to deduce and intuit by providing relevant information easily or even automatically, preferably without modifying your code.

The simplest debugging aid (for any language) is the output statement, print in the shell’s case. Indeed, old-time programmers debugged their Fortran code by inserting WRITE cards into their decks. You can debug by putting lots of print statements in your code (and removing them later), but you will have to spend lots of time narrowing down not only what exact information you want but also where you need to see it. You will also probably have to wade through lots and lots of output to find the information that you really want.

Set Options

Luckily, the shell has a few basic features that give you debugging functionality beyond that of print. The most basic of these are options to the set -o command (as covered in Chapter 3). These options can also be used on the command line when running a script, as Table 9-1 shows.

Table 9-1. Debugging options

set -o option	Command-line option	Action
noexec	-n	Don’t run commands; check for syntax errors only
verbose	-v	Echo commands before running them
xtrace	-x	Echo commands after command-line processing

The verbose option simply echoes (to standard error) whatever input the shell gets. It is useful for finding the exact point at which a script is bombing. For example, assume your script looks like this:

fred
bob
dave
pete
ed
ralph

None of these commands are standard Unix programs, and they all do their work silently. Say the script crashes with a cryptic message like “segmentation violation.” This tells you nothing about which command caused the error. If you type ksh -v scriptname, you might see this:

fred
bob
dave
segmentation violation
pete
ed
ralph

Now you know that dave is the probable culprit — though it is also possible that dave bombed because of something it expected fred or bob to do (e.g., create an input file) that they did incorrectly.

The xtrace option is more powerful: it echoes each command and its arguments, after the command has been through parameter substitution, command substitution, and the other steps of command-line processing (as listed in Chapter 7). If necessary, the output is quoted in such a way as to allow it to be reused later as input to the shell.

Here is an example:

$ set -o xtrace
$ fred=bob
+ fred=bob
$ print "$fred"
+ print bob
bob
$ ls -l $(whence emacs)
+ whence emacs
+ ls -l /usr/bin/emacs
-rwxr-xr-x    2 root     root      3471896 Mar 16 20:17 /usr/bin/emacs
$

As you can see, xtrace starts each line it prints with +. This is actually customizable: it’s the value of the built-in shell variable PS4.^[114] If you set PS4 to "xtrace-> " (e.g., in your .profile or environment file), you’ll get xtrace listings that look like this:

$ ls -l $(whence emacs)
xtrace-> whence emacs
xtrace-> ls -l /usr/bin/emacs
-rwxr-xr-x    2 root     root      3471896 Mar 16 20:17 /usr/bin/emacs
$

An even better way of customizing PS4 is to use a built-in variable we haven’t seen yet: LINENO, which holds the number of the currently running line in a shell script. Put this line in your .profile or environment file:

PS4='line $LINENO: '

We use the same technique as we did with PS1 in Chapter 3: using single quotes to postpone the evaluation of the string until each time the shell prints the prompt. This prints messages of the form line N: in your trace output. You could even include the name of the shell script you’re debugging in this prompt by using the positional parameter $0:

PS4='$0 line $LINENO: '

As another example, say you are trying to track down a bug in a script called fred that contains this code:

dbfmq=$1.fmq
...
fndrs=$(cut -f3 -d' ' $dfbmq)

You type fred bob to run it in the normal way, and it hangs. Then you type ksh -x fred bob, and you see this:

+ dbfmq=bob.fmq
...
+ + cut -f3 -d

It hangs again at this point. You notice that cut doesn’t have a filename argument, which means that there must be something wrong with the variable dbfmq. But it has executed the assignment statement dbfmq=bob.fmq properly... ah-hah! You made a typo in the variable name inside the command substitution construct.^[115] You fix it, and the script works properly.

When set at the global level, the xtrace option applies to the main script and to any POSIX-style functions (those created with the name () syntax). If the code you are trying to debug calls function-style functions that are defined elsewhere (e.g., in your .profile or environment file), you can trace through these in the same way with an option to the typeset command. Just enter the command typeset -ft functname, and the named function will be traced whenever it runs. Type typeset +ft functname to turn tracing off. You can also put set -o xtrace into the function body itself, which is good when the function is within the script being debugged.

The last option is noexec, which reads in the shell script and checks for syntax errors but doesn’t execute anything. It’s worth using if your script is syntactically complex (lots of loops, code blocks, string operators, etc.) and the bug has side effects (like creating a large file or hanging up the system).

You can turn on these options with set -o in your shell scripts, and, as explained in Chapter 3, turn them off with set +o option. For example, if you’re debugging a script with a nasty side effect, and you have localized it to a certain chunk of code, you can precede that chunk with set -o xtrace (and, perhaps, close it with set +o xtrace) to watch it in more detail.
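As a small illustration of bracketing a suspect chunk, the sketch below turns tracing on for just two statements and uses the LINENO-based PS4 shown earlier. The function name trace_chunk and the variables inside it are made up for this example. We run the body in a subshell (note the parentheses) on the assumption that you don't want the option or the PS4 change to leak into the calling shell.

```shell
# trace_chunk is a hypothetical name used only for this illustration.
# The parentheses make the body a subshell, so PS4 and the xtrace
# setting revert automatically when it finishes.
trace_chunk() (
    PS4='line $LINENO: '        # single quotes: expanded at trace time
    base=report                 # not traced: xtrace is still off
    set -o xtrace               # start tracing the suspect chunk
    file=$base.txt
    : "would process $file"
    set +o xtrace               # stop tracing
    : "not traced"
)

trace_chunk                     # trace output goes to standard error
```

The trace lines appear on standard error, each carrying the line N: prefix. The statements before set -o xtrace and after set +o xtrace produce no trace output.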

Note

The noexec option is a “one-way” option. Once turned on, you can’t turn it off again! That’s because the shell only reads commands and doesn’t execute them. This includes the set +o noexec command you’d want to use to turn the option off. Fortunately, this only applies to shell scripts; the shell ignores this option when it’s interactive.

Fake Signals

A more sophisticated set of debugging aids is the shell’s “fake debugging signals,” which can be used in trap statements to get the shell to act under certain conditions. Recall from the previous chapter that trap allows you to install some code that runs when a particular signal is sent to your script.

Fake signals act like real ones, but they are generated by the shell (as opposed to real signals, which the underlying operating system generates). They represent runtime events that are likely to be interesting to debuggers — both human ones and software tools — and can be treated just like real signals within shell scripts. The four fake signals and their meanings are listed in Table 9-2.

Table 9-2. Fake signals

Fake signal	When sent
EXIT	The shell exits from a function or script
ERR	A command returns a non-zero exit status
DEBUG	Before every statement (after in ksh88)
KEYBD	When reading characters in the editing modes (not for debugging)

The KEYBD signal is not used for debugging. It is an advanced feature, for which we delay discussion until Chapter 10.

EXIT

The EXIT trap, when set, runs its code when the function or script within which it was set exits. Here’s a simple example:

function func {
    trap 'print "exiting from the function"' EXIT
    print 'start of the function'
}

trap 'print "exiting from the script"' EXIT
print 'start of the script'
func

If you run this script, you see this output:

start of the script
start of the function
exiting from the function
exiting from the script

In other words, the script starts by setting the trap for its own exit. Then it prints a message and finally calls the function. The function does the same — sets a trap for its exit and prints a message. (Remember that function-style functions can have their own local traps that supersede any traps set by the surrounding script, while POSIX functions share traps with the main script.)

The function then exits, which causes the shell to send it the fake signal EXIT, which in turn runs the code print "exiting from the function". Then the script exits, and its own EXIT trap code is run. Note also that traps “stack;” the EXIT fake signal is sent to each running function in turn as each more recently called function exits.

An EXIT trap occurs no matter how the script or function exits, whether normally (by finishing the last statement), by an explicit exit or return statement, or by receiving a “real” signal such as INT or TERM. Consider the following inane number-guessing program:

trap 'print "Thank you for playing!"' EXIT

magicnum=$(($RANDOM%10+1))
print 'Guess a number between 1 and 10:'
while read guess'?number> '; do
    sleep 10
    if (( $guess == $magicnum )); then
        print 'Right!'
        exit
    fi
    print 'Wrong!'
done

This program picks a number between 1 and 10 by getting a random number (via the built-in variable RANDOM, see Appendix B), extracting the last digit (the remainder when divided by 10), and adding 1. Then it prompts you for a guess, and after 10 seconds, it tells you if you guessed right.

If you did, the program exits with the message, “Thank you for playing!”, i.e., it runs the EXIT trap code. If you were wrong, it prompts you again and repeats the process until you get it right. If you get bored with this little game and hit CTRL-C while waiting for it to tell you whether you were right, you also see the message.
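Because the EXIT trap runs however the script ends, it is a natural place for cleanup code. The following sketch is ours, not part of any program in this book: the name cleanup_demo and the scratch file name are hypothetical, it uses printf rather than print so that it also runs in other POSIX-style shells, and it runs in a subshell so that the trap stays confined to the demonstration.

```shell
# cleanup_demo and the scratch file name are hypothetical.
cleanup_demo() (
    tmpfile=${TMPDIR:-/tmp}/exitdemo.$$
    # Remove the scratch file no matter how this subshell ends.
    trap 'rm -f "$tmpfile"; printf "cleaned up\n"' EXIT

    printf 'scratch data\n' > "$tmpfile"
    [ -f "$tmpfile" ] && printf 'working\n'
    exit 3                      # explicit exit: the trap still runs
)

status=0
cleanup_demo || status=$?
printf 'status %s\n' "$status"
```

The trap code does not change the exit status, so the caller still sees the value given to exit.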

ERR

The fake signal ERR enables you to run code whenever a command in the surrounding script or function exits with non-zero status. Trap code for ERR can take advantage of the built-in variable ?, which holds the exit status of the previous command. It survives the trap and is accessible at the beginning of the trap-handling code.

A simple but effective use of this is to put the following code into a script you want to debug:

function errtrap {
    typeset es=$?
    print "ERROR: Command exited with status $es."
}

trap errtrap ERR

The first line saves the nonzero exit status in the local variable es.

For example, if the shell can’t find a command, it returns status 127. If you put the code in a script with a line of gibberish (like “lskdjfafd”), the shell responds with:

scriptname: line N: lskdjfafd: not found
ERROR: Command exited with status 127.

N is the number of the line in the script that contains the bad command. In this case, the shell prints the line number as part of its own error-reporting mechanism, since the error was a command that the shell could not find. But if the nonzero exit status comes from another program, the shell doesn’t report the line number. For example:

function errtrap {
    typeset es=$?
    print "ERROR: Command exited with status $es."
}

trap errtrap ERR

function bad {
    return 17
}

bad

This only prints ERROR: Command exited with status 17.

It would obviously be an improvement to include the line number in this error message. The built-in variable LINENO exists, but if you use it inside a function, it evaluates to the line number in the function, not in the overall file. In other words, if you used $LINENO in the print statement in the errtrap routine, it would always evaluate to 2.

To get around this problem, we simply pass $LINENO as an argument to the trap handler, surrounding it in single quotes so that it doesn’t get evaluated until the fake signal actually comes in:

function errtrap {
    typeset es=$?
    print "ERROR line $1: Command exited with status $es."
}
trap 'errtrap $LINENO' ERR
...

If you use this with the above example, the result is the message, ERROR line 12: Command exited with status 17. This is much more useful. We’ll see a variation on this technique shortly.

This simple code is actually not a bad all-purpose debugging mechanism. It takes into account that a nonzero exit status does not necessarily indicate an undesirable condition or event: remember that every control construct with a conditional (if, while, etc.) uses a nonzero exit status to mean “false.” Accordingly, the shell doesn’t generate ERR traps when statements or expressions in the “condition” parts of control structures produce nonzero exit statuses.

But a disadvantage is that exit statuses are not as uniform (or even as meaningful) as they should be, as we explained in Chapter 5. A particular exit status need not say anything about the nature of the error or even that there was an error.
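To see the difference between a failing command and a false condition, consider the sketch below. It reuses the errtrap idea from above, but the wrapper name err_demo is hypothetical, printf stands in for print so that the code also runs in other shells that support the ERR trap, and the whole thing runs in a subshell so that the trap does not outlive the demonstration.

```shell
# err_demo is a hypothetical wrapper; errtrap follows the pattern above.
err_demo() (
    errtrap() {
        typeset es=$?           # save the status before anything changes it
        printf 'ERROR line %s: Command exited with status %s.\n' "$1" "$es"
    }
    trap 'errtrap $LINENO' ERR

    if false; then :; fi        # nonzero status in a condition: no trap
    false                       # ordinary failing command: trap runs
    printf 'still running\n'    # the trap reports; it does not stop the script
)

err_demo
```

Only the bare false produces an ERROR line. Note that the script carries on afterward; if you want it to stop, the trap handler has to exit explicitly.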

DEBUG

The final debugging-related fake signal, DEBUG, causes the trap code to be run before every statement in the surrounding function or script.^[116] This has two possible uses. First is the use for humans, as a sort of a “brute force” method of tracking a certain element of a program’s state that you notice is going awry.

For example, you notice that the value of a particular variable is running amok. The naive approach would be to put in lots of print statements to check the variable’s value at several points. The DEBUG trap makes this easier:

function dbgtrap {
    print "badvar is $badvar"
}

trap dbgtrap DEBUG

... Section of code in which problem occurs ...

trap - DEBUG            # turn off DEBUG trap

This code prints the value of the wayward variable before every statement between the two traps.
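Here is a filled-in version of that skeleton that you can actually run. The names watch_demo and count are hypothetical, printf replaces print for portability, and the subshell body is our own choice to keep the trap from affecting anything else.

```shell
# watch_demo and count are hypothetical names for this illustration.
watch_demo() (
    count=0
    trap 'printf "count is %s\n" "$count"' DEBUG

    count=$((count + 1))        # a report appears for each of these
    count=$((count + 1))

    trap - DEBUG                # turn off DEBUG trap
    count=$((count + 1))        # no report for this statement
    printf 'final count is %s\n' "$count"
)

watch_demo
```

The reports stop once the trap is removed, so the last increment shows up only in the final message.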

The second and far more important use of the DEBUG trap is as a primitive for implementing Korn shell debuggers. In fact, it would be fair to say that the DEBUG trap reduces the task of implementing a useful shell debugger from a large-scale software development project to a manageable exercise. We will get to this shortly.

Signal delivery order

It is possible for multiple signals to arrive simultaneously (or close to it). In that case, the shell runs the trap commands in the following order:

DEBUG
ERR
Real Unix signals, in order of signal number
EXIT

Discipline Functions

In Chapter 4, we introduced the Korn shell’s compound variable notation, such as ${person.name}. Using this notation, ksh93 provides special functions, called discipline functions, that give you control over variables when they are referenced, assigned to, and unset. Simple versions of such functions might look like this:

dave=dave                       # Create the variable
function dave.set {             # Called when dave is assigned to
    print "dave just got assigned '${.sh.value}'"
}

function dave.get {             # Called when $dave retrieved
    print "dave's value referenced, it's '$dave'"    # this is safe

    .sh.value="dave was here"   # Change what $dave returns, dave not changed
}

function dave.unset {           # Called when dave is unset
    print "goodbye dave!"
    unset dave                  # actually make dave go away
}

Note

The unset discipline function must actually use the unset command to unset the variable — this does not cause an infinite loop. Otherwise, the variable won’t be unset, which in turn leads to very surprising behavior.

Here is what happens once all of these functions are in place:

$ print $dave
dave's value referenced, it's 'dave'                    From dave.get
dave was here                                           From print
$ dave='who is this dave guy, anyway?'
dave just got assigned 'who is this dave guy, anyway?'  From dave.set
$ unset dave
goodbye dave!                                           From dave.unset
$ print $dave

$

Discipline functions may only be applied to global variables. They may not be used with local variables — those you create with typeset inside a function-style function.

Table 9-3 summarizes the built-in discipline functions.

Table 9-3. Predefined discipline functions

Name	Purpose
`variable` `.get`	Called when a variable’s value is retrieved. Assigning to `.sh.value` changes the value returned but not the variable itself.
`variable` `.set`	Called when a variable is assigned to. `${.sh.value}` is the new value being assigned. Assigning to `.sh.value` changes the value being assigned.
`variable` `.unset`	Called when a variable is unset. This function must use unset on the variable to actually unset it.

As we’ve just seen, within the discipline functions, there are two special variables that the shell sets which give you information, as well as one variable that you can set to change how the shell behaves. Table 9-4 describes these variables and what they do.

Table 9-4. Special variables for use in discipline functions

Variable	Purpose
`.sh.name`	The name of the variable for which the discipline function is being run.
`.sh.subscript`	The current subscript for an array variable. (The discipline functions apply to the entire array, not each subscripted element.)
`.sh.value`	The new value being assigned in a set discipline function. If assigned to in a get discipline function, changes the value returned.

At first glance, it’s not clear what the value of discipline functions is. But they’re perfect for implementing a very useful debugger feature, called watchpoints. We’re now ready to get down to writing our shell script debugger.

A Korn Shell Debugger

Commercially available debuggers give you much more functionality than the shell’s set options and fake signals. The most advanced have fabulous graphical user interfaces, incremental compilers, symbolic evaluators, and other such amenities. But just about all modern debuggers — even the more modest ones — have features that enable you to “peek” into a program while it’s running, to examine it in detail and in terms of its source language. Specifically, most debuggers let you do these things:

Specify points at which the program stops execution and enters the debugger. These are called breakpoints.
Execute only a bit of the program at a time, usually measured in source code statements. This ability is often called stepping.
Examine and possibly change the state of the program (e.g., values of variables) in the middle of a run, i.e., when stopped at a breakpoint or after stepping.
Specify variables whose values should be printed when they are changed or accessed. These are often called watchpoints.
Do all of the above without having to change the source code.

Our debugger, called kshdb, has these features and a few more. Although it’s a basic tool, without too many bells and whistles, it is not a toy. This book’s web site, http://www.oreilly.com/catalog/korn2/, has a link for a downloadable copy of all the book’s example programs, including kshdb. If you don’t have access to the Internet, you can type or scan the code in. Either way, you can use kshdb to debug your own shell scripts, and you should feel free to enhance it. This is version 2.0 of the debugger. It includes some changes suggested to us by Steve Alston, and the watchpoints feature is brand new. We’ll suggest some enhancements at the end of this chapter.

Structure of the Debugger

The code for kshdb has several features worth explaining in some detail. The most important is the basic principle on which it works: it turns a shell script into a debugger for itself, by prepending debugger functionality to it; then it runs the new script.

The driver script

Therefore the code has two parts: the part that implements the debugger’s functionality, and the part that installs that functionality into the script being debugged. The second part, which we’ll see first, is the script called kshdb. It’s very simple:

# kshdb -- Korn Shell debugger
# Main driver: constructs full script (with preamble) and runs it

print "Korn Shell Debugger version 2.0 for ksh '${.sh.version}'" >&2
_guineapig=$1
if [[ ! -r $1 ]]; then      # file not found or not readable
    print "Cannot read $_guineapig." >&2
    exit 1
fi
shift

_tmpdir=/tmp
_libdir=.                   # set to real directory upon installation
_dbgfile=$_tmpdir/kshdb$$   # temp file for script being debugged (copy)
cat $_libdir/kshdb.pre $_guineapig > $_dbgfile
exec ksh $_dbgfile $_guineapig $_tmpdir $_libdir "$@"

kshdb takes as argument the name of the script being debugged, which, for the sake of brevity, we’ll call the guinea pig. Any additional arguments are passed to the guinea pig as its positional parameters. Notice that ${.sh.version} indicates the version of the Korn shell for the startup message.

If the argument is invalid (the file isn’t readable), kshdb exits with an error status. Otherwise, after an introductory message, it constructs a temporary filename like we saw in Chapter 8. If you don’t have (or don’t have access to) /tmp on your system, you can substitute a different directory for _tmpdir.^[117] Also, make sure that _libdir is set to the directory where the kshdb.pre and kshdb.fns files (which we’ll see soon) reside. /usr/share/lib is a good choice if you have access to it.

The cat statement builds the temp file: it consists of a file that we’ll see soon called kshdb.pre, which contains the actual debugger code, followed immediately by a copy of the guinea pig. Therefore the temp file contains a shell script that has been turned into a debugger for itself.
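The prepending trick is easy to try on a small scale. The sketch below is not kshdb itself: the name prepend_demo, the two tiny stand-in files, and the use of mktemp -d and sh (rather than ksh) are all assumptions made so that the example is self-contained.

```shell
# prepend_demo and its file names are hypothetical stand-ins.
prepend_demo() (
    dir=$(mktemp -d) || exit 1
    trap 'rm -rf "$dir"' EXIT

    # Stand-in for kshdb.pre: use the fixed argument, then shift it away.
    cat > "$dir/pre" <<'EOF'
echo "preamble: original script is $1"
shift
EOF

    # Stand-in for the guinea pig.
    cat > "$dir/guinea" <<'EOF'
echo "guinea pig sees: $*"
EOF

    # Preamble first, then the script, exactly as the driver does.
    cat "$dir/pre" "$dir/guinea" > "$dir/combined"
    sh "$dir/combined" guinea alpha beta
)

prepend_demo
```

The preamble consumes its own argument and shifts it out of the way, so the code that follows sees only alpha and beta as its positional parameters.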

exec

The last line runs this script with exec, a statement that we haven’t seen yet. We’ve chosen to wait until now to introduce it because — as we think you’ll agree — it can be dangerous. exec takes its arguments as a command line and runs the command in place of the current program, in the same process. In other words, the shell running the above script will terminate immediately and be replaced by exec’s arguments. The situations in which you would want to use exec are few, far between, and quite arcane — though this is one of them.

In this case, exec just runs the newly constructed shell script, i.e., the guinea pig with its debugger, in another Korn shell. It passes the new script three arguments — the names of the original guinea pig ($_guineapig), the temp directory ($_tmpdir), and the directory where kshdb.pre and kshdb.fns are kept — followed by the user’s positional parameters, if any.

exec can also be used with just an I/O redirector; this causes the redirector to take effect for the remainder of the script or login session. For example, the line exec 2>errlog at the top of a script directs the shell’s own standard error to the file errlog for the entire script. This can also be used to move the input or output of a coprocess to a regular numbered file descriptor. For example, exec 5<&p moves the coprocess’s output (which is input to the shell) to file descriptor 5. Similarly, exec 6>&p moves the coprocess’s input (which is output from the shell) to file descriptor 6. The predefined alias redirect='command exec' is more mnemonic.
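The redirection-only form is simple to demonstrate. In this sketch the name redirect_demo and the log file (created with mktemp, which we assume is available) are hypothetical, and the subshell body keeps the redirection from affecting your own session.

```shell
# redirect_demo and errlog are hypothetical names for this illustration.
redirect_demo() (
    errlog=$(mktemp) || exit 1
    exec 2>"$errlog"            # no command: just redirect stderr from here on

    echo "warning: something odd" >&2
    echo "normal output"
    echo "errlog holds: $(cat "$errlog")"
    rm -f "$errlog"
)

redirect_demo
```

Everything written to standard error after the exec line lands in the log file, while standard output is untouched.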

The Preamble

Now we’ll see the code that gets prepended to the script being debugged; we call this the preamble. It’s kept in the following file, kshdb.pre, which is also fairly simple:

# kshdb preamble for kshdb version 2.0
# prepended to shell script being debugged
# arguments:
# $1 = name of original guinea-pig script
# $2 = directory where temp files are stored
# $3 = directory where kshdb.pre and kshdb.fns are stored

_dbgfile=$0
_guineapig=$1
_tmpdir=$2
_libdir=$3
shift 3                         # move user's args into place

. $_libdir/kshdb.fns            # read in the debugging functions
_linebp=
_stringbp=
let _trace=0                    # initialize execution trace to off

typeset -A _lines
let _i=1                        # read guinea-pig file into lines array
while read -r _lines[$_i]; do
    let _i=$_i+1
done < $_guineapig

trap _cleanup EXIT              # erase files before exiting
let _steps=1                    # no. of stmts to run after trap is set
LINENO=0
trap '_steptrap $LINENO' DEBUG

The first few lines save the three fixed arguments in variables and shift them out of the way, so that the positional parameters (if any) are those that the user supplied on the command line as arguments to the guinea pig. Then the preamble reads in another file, kshdb.fns, that contains the meat of the debugger as function definitions. We put this code in a separate file to minimize the size of the temp file. We’ll examine kshdb.fns shortly.

Next, kshdb.pre initializes the two breakpoint lists to empty and execution tracing to off (see below), then reads the guinea pig into an array of lines. We do the latter so that the debugger can access lines in the script when performing certain checks, and so that the execution trace feature can print lines of code as they execute. We use an associative array to hold the shell script source, to avoid the built-in (if large) limit of 4096 elements for indexed arrays. (Admittedly our use is a bit unusual; we use line numbers as indices, but as far as the shell is concerned, these are just strings that happen to contain nothing but digits.)
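The line-reading loop can be tried in isolation. This sketch assumes a shell with associative arrays (ksh93, or bash 4 and later); the names src_lines, n, and line are hypothetical, and a here-document stands in for the guinea pig file. Unlike the preamble, it sets IFS to empty for the read, which is our own variation to preserve leading whitespace.

```shell
typeset -A src_lines            # hypothetical array name
n=0
while IFS= read -r line; do     # IFS= keeps indentation; -r keeps backslashes
    n=$((n + 1))
    src_lines[$n]=$line         # the subscript is just a string of digits
done <<'EOF'
count=0
    print "indented line"
EOF

echo "read $n lines; line 2 is: ${src_lines[2]}"
```

Looking up a line by number is then a single array reference, which is all that the execution trace and the string-breakpoint check need.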

The real fun begins in the last group of code lines, where we set up the debugger to start working. We use two trap commands with fake signals. The first sets up a cleanup routine (which just erases the temporary file) to be called on EXIT, i.e., when the script terminates for any reason. The second, and more important, sets up the function _steptrap to be called before every statement.

_steptrap gets an argument that evaluates to the number of the line in the guinea pig that was just executed. We use the same technique with the built-in variable LINENO that we saw earlier in the chapter, but with an added twist: if you assign a value to LINENO, it uses that as the next line number and increments from there. The statement LINENO=0 re-starts line numbering so that the first line in the guinea pig is line 1.

After the DEBUG trap is set, the preamble ends. The DEBUG trap executes before the next statement, which is the first statement of the guinea pig. The shell thus enters _steptrap for the first time. The variable _steps is set up so that _steptrap executes its last elif clause, as you’ll see shortly, and enters the debugger. As a result, execution halts just before the first statement of the guinea pig is run, and the user sees a kshdb> prompt; the debugger is now in full operation.

Debugger Functions

The function _steptrap is the entry point into the debugger; it is defined in the file kshdb.fns, listed in its entirety at the end of this chapter. Here is _steptrap:

# Here before each statement in script being debugged.
# Handle single-step and breakpoints.
function _steptrap {
    _curline=$1                       # arg is no. of line that just ran
    (( $_trace )) && _msg "$PS4 line $_curline: ${_lines[$_curline]}"
    if (( $_steps >= 0 )); then       # if in step mode
        let _steps="$_steps - 1"      # decrement counter
    fi

    # first check: if line num breakpoint reached
    if _at_linenumbp; then
        _msg "Reached line breakpoint at line $_curline"
        _cmdloop                      # breakpoint, enter debugger

    # second check: if string breakpoint reached
    elif _at_stringbp; then
        _msg "Reached string breakpoint at line $_curline"
        _cmdloop                      # breakpoint, enter debugger

    # if neither, check whether break condition exists and is true
    elif [[ -n $_brcond ]] && eval $_brcond; then
        _msg "Break condition '$_brcond' true at line $_curline"
        _cmdloop                      # break condition, enter debugger

    # finally, check if step mode and number of steps is up
    elif (( _steps == 0 )); then      # if step mode and time to stop
        _msg "Stopped at line $_curline"
        _cmdloop                      # enter debugger
    fi
}

_steptrap starts by setting _curline to the number of the guinea pig line that just ran. If execution tracing is turned on, it prints the PS4 execution trace prompt (a la xtrace mode), the line number, and the line of code itself.

Then it does one of two things: enter the debugger, the heart of which is the function _cmdloop, or just return so that the shell can execute the next statement. It chooses the former if a breakpoint or break condition (see below) has been reached, or if the user stepped into this statement.
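The countdown logic is easier to follow in miniature. The sketch below keeps only the step counter from _steptrap and prints a message where the real function would call _cmdloop. The names step_demo, steptrap, and steps are hypothetical, and we assume a shell whose DEBUG trap runs before each statement, as in ksh93.

```shell
# step_demo, steptrap, and steps are hypothetical names.
step_demo() (
    steps=2                     # stop when this many statements have come up
    steptrap() {
        if [ "$steps" -ge 0 ]; then
            steps=$((steps - 1))        # decrement counter
        fi
        if [ "$steps" -eq 0 ]; then
            echo "Stopped at line $1"   # stand-in for entering _cmdloop
        fi
    }
    trap 'steptrap $LINENO' DEBUG

    : one
    : two
    : three
    trap - DEBUG
)

step_demo
```

The counter reaches zero exactly once, so the message appears a single time. After that the counter goes negative and steptrap simply returns, which is what lets the rest of the script run uninterrupted.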

Commands

We’ll explain shortly how _steptrap determines these things; now we’ll look at _cmdloop. It’s a typical command loop, resembling a combination of the case statements we saw in Chapter 5 and the calculator loop we saw in Chapter 8.

# Debugger command loop.
# Here at start of debugger session, when breakpoint reached,
# after single-step.  Optionally here inside watchpoint.
function _cmdloop {
    typeset cmd args

    while read -s cmd"?kshdb> " args; do
        case $cmd in
        \#bp ) _setbp $args ;;       # set breakpoint at line num or string.
        \#bc ) _setbc $args ;;       # set break condition.
        \#cb ) _clearbp ;;           # clear all breakpoints.
        \#g  ) return ;;             # start/resume execution
        \#s  ) let _steps=${args:-1} # single-step N times (default 1)
               return ;;
        \#wp ) _setwp $args ;;       # set a watchpoint
        \#cw ) _clearwp $args ;;     # clear one or more watchpoints

        \#x  ) _xtrace ;;            # toggle execution trace
        \#\? | \#h ) _menu ;;        # print command menu
        \#q  ) exit ;;               # quit
        \#*  ) _msg "Invalid command: $cmd" ;;
        *  ) eval $cmd $args ;;      # otherwise, run shell command
        esac
    done
}

At each iteration, _cmdloop prints a prompt, reads a command, and processes it. We use read -s so that the user can take advantage of command-line editing within kshdb. All kshdb commands start with # to prevent confusion with shell commands. Anything that isn’t a kshdb command (and doesn’t start with #) is passed off to the shell for execution. Using # as the command character prevents a mistyped command from having any ill effect when the last case catches it and runs it through eval. Table 9-5 summarizes the debugger commands.

Table 9-5. kshdb commands

Command	Action
`#bp` `N`	Set breakpoint at line N.
`#bp` `str`	Set breakpoint at next line containing str.
`#bp`	List breakpoints and break condition.
`#bc` `str`	Set break condition to str.
`#bc`	Clear break condition.
`#cb`	Clear all breakpoints.
`#g`	Start or resume execution (go).
`#s` [`N`]	Step through N statements (default 1).
`#wp` [`-c`] `var` `get`	Set a watchpoint on variable var when the value is retrieved. With -c, enter the command loop from within the watchpoint.
`#wp` [`-c`] `var` `set`	Set a watchpoint on variable var when the value is assigned. With -c, enter the command loop from within the watchpoint.
`#wp` [`-c`] `var` `unset`	Set a watchpoint on variable var when the variable is unset. With -c, enter the command loop from within the watchpoint.
`#cw` `var discipline`	Clear the given watchpoint.
`#cw`	Clear all watchpoints.
`#x`	Toggle execution tracing.
`#h`, `#?`	Print a help menu.
`#q`	Quit.
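
To see the dispatch order of the case statement in isolation, here is a minimal sketch. The function classify is hypothetical (it is not part of kshdb); it only reports which branch a line of input would take, and it uses a here-document in place of an interactive read.

```shell
# Hypothetical helper, not part of kshdb: report how a case statement
# shaped like the one in _cmdloop would treat one line of input.
classify() {
    # read puts the first word in cmd and the rest of the line in args
    read -r cmd args <<EOF
$1
EOF
    case $cmd in
    \#bp | \#bc | \#cb | \#wp | \#cw | \#x | \#h | \#\? )
        echo "debugger command, stay in loop: $cmd" ;;
    \#g | \#s | \#q )
        echo "debugger command, leave loop: $cmd" ;;
    \#* )
        echo "invalid command: $cmd" ;;
    * )
        echo "shell command: $cmd $args" ;;
    esac
}

classify '#bp 15'
classify '#s 3'
classify '#bq'
classify 'print $count'
```

Note that the \#* pattern must come after the valid commands and before the catch-all *, just as in _cmdloop; otherwise a mistyped #bq would either shadow the real commands or fall through to the shell.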

Before we look at the individual commands, it is important that you understand how control passes through _steptrap, the command loop, and the guinea pig.

_steptrap runs before every statement in the guinea pig as a result of the trap ... DEBUG statement in the preamble. If a breakpoint has been reached or the user previously typed in a step command (#s), _steptrap calls the command loop. In doing so, it effectively interrupts the shell that is running the guinea pig to hand control over to the user.^[118]

The user can invoke debugger commands as well as shell commands that run in the same shell as the guinea pig. This means that you can use shell commands to check values of variables, signal traps, and any other information local to the script being debugged.

The command loop runs, and the user stays in control, until the user types #g, #s, or #q. Let’s look in detail at what happens in each of these cases.

#g has the effect of running the guinea pig uninterrupted until it finishes or hits a breakpoint. But actually, it simply exits the command loop and returns to _steptrap, which exits as well. The shell takes control back; it runs the next statement in the guinea pig script and calls _steptrap again. Assuming that there is no breakpoint, this time _steptrap just exits again, and the process repeats until there is a breakpoint or the guinea pig is done.
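
The round trip between the shell and the trap handler can be observed with a much-reduced stand-in for _steptrap. This is only a sketch: it assumes a shell that supports the DEBUG fake signal (ksh93, or bash), and the handler name steptrap and the variable _seen are made up for the illustration. The handler never enters a command loop, so it behaves as if the user had typed #g with no breakpoints set.

```shell
_seen=""
steptrap() {                     # hypothetical, much-reduced _steptrap
    _seen="$_seen $1"            # just record the line number and return
}

trap 'steptrap $LINENO' DEBUG    # run the handler around every statement
a=1
b=2
trap - DEBUG                     # switch the trap off again

echo "trap was called for lines:$_seen"
```

The exact line numbers printed depend on where the statements sit in the file and on the shell in use, so we do not show the output here.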

Stepping

When the user types #s, the command loop code sets the variable _steps to the number of steps the user wants to execute, i.e., to the argument given. Assume at first that the user omits the argument, meaning that _steps is set to 1. Then the command loop exits and returns control to _steptrap, which (as above) exits and hands control back to the shell. The shell runs the next statement and returns to _steptrap, which sees that _steps is 1 and decrements it to 0. Then the third elif conditional sees that _steps is 0, so it prints a “stopped” message and calls the command loop.

Now assume that the user supplies an argument to #s, say 3. _steps is set to 3. Then the following happens:

After the next statement runs, _steptrap is called again. It enters the first if clause, since _steps is greater than 0. _steptrap decrements _steps to 2 and exits, returning control to the shell.
This process repeats, another step in the guinea pig is run, and _steps becomes 1.
A third statement is run and we’re back in _steptrap. _steps is decremented to 0, the third elif clause is run, and _steptrap breaks out to the command loop again.

The overall effect is that three steps run and then the debugger takes over again.
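
The countdown is easy to check by hand with a small simulation. The function simulate_steps below is hypothetical; it copies only the two tests on _steps from _steptrap and ignores breakpoints and break conditions.

```shell
# Hypothetical simulation: after the user types "#s N", which call of
# the trap handler re-enters the command loop?
simulate_steps() {
    _steps=$1
    call=1
    while [ "$call" -le 10 ]; do
        if [ "$_steps" -ge 0 ]; then      # in step mode: count down
            _steps=$((_steps - 1))
        fi
        if [ "$_steps" -eq 0 ]; then      # counter used up: stop here
            echo "stopped at trap call $call"
            return
        fi
        call=$((call + 1))
    done
    echo "never stopped"
}

simulate_steps 1      # prints: stopped at trap call 1
simulate_steps 3      # prints: stopped at trap call 3
```

Once the debugger has stopped, the next call decrements _steps to -1, which takes it out of step mode until the user types #s again.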

Finally, the #q command exits. The EXIT trap then calls the function _cleanup, which just erases the temp file as the entire program exits.

All other debugger commands (#bp, #bc, #cb, #wp, #cw, #x, and shell commands) cause the shell to stay in the command loop, meaning that the user prolongs the interruption of the shell.

Breakpoints

Now we’ll examine the breakpoint-related commands and the breakpoint mechanism in general. The #bp command calls the function _setbp, which can set two kinds of breakpoints, depending on the type of argument given. If it is a number, it’s treated as a line number; otherwise, it’s interpreted as a string that the breakpoint line should contain.

For example, the command #bp 15 sets a breakpoint at line 15, and #bp grep sets a breakpoint at the next line that contains the string grep — whatever number that turns out to be. Although you can always look at a numbered listing of a file,^[119] string arguments to #bp can make that unnecessary.

Here is the code for _setbp:

# Set breakpoint(s) at given line numbers or strings
# by appending patterns to breakpoint variables
function _setbp {
    if [[ -z $1 ]]; then
        _listbp
    elif [[ $1 == +([0-9]) ]]; then  # number, set bp at that line
        _linebp="${_linebp}$1|"
        _msg "Breakpoint at line " $1
    else                             # string, set bp at next line w/string
        _stringbp="${_stringbp}$@|"
        _msg "Breakpoint at next line containing '$@'."
    fi
}

_setbp sets the breakpoints by storing them in the variables _linebp (line number breakpoints) and _stringbp (string breakpoints). Both have breakpoints separated by pipe character delimiters, for reasons that will become clear shortly. This implies that breakpoints are cumulative; setting new breakpoints does not erase the old ones.

The only way to remove breakpoints is with the command #cb, which (in function _clearbp) clears all of them at once by simply resetting the two variables to null. If you don’t remember what breakpoints you have set, the command #bp without arguments lists them.
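
Here is a simplified sketch of how the two lists accumulate. The function setbp is a hypothetical stand-in for _setbp: it prints no messages, and it uses a portable case test (any non-digit character means "string") in place of the Korn shell pattern +([0-9]).

```shell
_linebp=
_stringbp=

setbp() {                                          # simplified _setbp
    case $1 in
    '' )        return ;;                          # real _setbp lists here
    *[!0-9]* )  _stringbp="${_stringbp}$*|" ;;     # any non-digit: string
    * )         _linebp="${_linebp}$1|" ;;         # all digits: line number
    esac
}

setbp 15
setbp grep
setbp 19

echo "line breakpoints:   $_linebp"      # 15|19|
echo "string breakpoints: $_stringbp"    # grep|
```

Each call appends to a list and leaves the earlier entries alone, which is why clearing requires resetting the whole variable.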

The functions _at_linenumbp and _at_stringbp are called by _steptrap after every statement; they check whether the shell has arrived at a line number or string breakpoint, respectively.

Here is _at_linenumbp:

# See if next line no. is a breakpoint.
function _at_linenumbp {
    [[ $_curline == @(${_linebp%\|}) ]]
}

_at_linenumbp takes advantage of the pipe character as the separator between line numbers: it constructs a regular expression of the form @( N1 | N2 |...) by taking the list of line numbers _linebp, removing the trailing |, and surrounding it with @( and ). For example, if $_linebp is 3|15|19|, the resulting expression is @(3|15|19).

If the current line is any of these numbers, the conditional becomes true, and _at_linenumbp also returns a “true” (0) exit status.
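
The following sketch exercises the same pattern outside the debugger. It assumes ksh93 or a version of bash that accepts @(...) patterns inside [[ ]]; the function at_linenumbp is a hypothetical variant that takes the line number as an argument instead of reading $_curline.

```shell
shopt -s extglob 2>/dev/null    # bash only; ksh93 has no shopt and needs none

_linebp='3|15|19|'

at_linenumbp() {                # hypothetical: line number passed as $1
    [[ $1 == @(${_linebp%\|}) ]]
}

for n in 1 15 5 19; do
    if at_linenumbp "$n"; then
        echo "line $n: breakpoint"
    else
        echo "line $n: no breakpoint"
    fi
done
```

Line 5 is reported as having no breakpoint even though 15 is in the list, because the pattern must match the entire line number, not just part of it.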

The check for a string breakpoint works on the same principle, but it’s slightly more complicated; here is _at_stringbp:

# Search string breakpoints to see if next line in script matches.
function _at_stringbp {
    [[ -n $_stringbp && ${_lines[$_curline]} == *@(${_stringbp%\|})* ]]
}

The conditional first checks if $_stringbp is non-null (meaning that string breakpoints have been defined). If not, the conditional evaluates to false, but if so, its value depends on the pattern match after the && — which tests the current line to see if it contains any of the breakpoint strings.

The expression on the right side of the double equal sign is similar to the one in _at_linenumbp above, except that it has * before and after it. This gives expressions of the form *@( S1 | S2 |...)*, where the Ss are the string breakpoints. This expression matches any line that contains any one of the possibilities in the parentheses.

The left side of the double equal sign is the text of the current line in the guinea pig. So, if this text matches the regular expression, we’ve reached a string breakpoint; accordingly, the conditional expression and _at_stringbp return exit status 0.
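
As a sketch of the string test on its own, again assuming ksh93 or a bash that accepts @(...) inside [[ ]], the hypothetical function below takes the text of a line as its argument rather than looking it up in the _lines array.

```shell
shopt -s extglob 2>/dev/null    # bash only; ksh93 has no shopt and needs none

_stringbp='grep|let count|'

at_stringbp() {                 # hypothetical: line text passed as $1
    [[ -n $_stringbp && $1 == *@(${_stringbp%\|})* ]]
}

if at_stringbp '    let count++'; then
    echo "first line: string breakpoint"
fi
if ! at_stringbp 'print -n "$fname  "'; then
    echo "second line: no breakpoint"
fi
```

Because of the surrounding asterisks, the breakpoint string may appear anywhere in the line, including (as we will see in the sample session) inside a comment.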

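_steptrap tests each condition separately, so that it can tell you which kind of breakpoint stopped execution. In both cases, it calls the main command loop.

Watchpoints

The #wp command calls the function _setwp, which sets a watchpoint by defining a discipline function for the named variable. Here is the code: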

# Set a watchpoint on a variable
# usage: _setwp [-c] var discipline
# $1 = variable
# $2 = get|set|unset
typeset -A _watchpoints
function _setwp {
    typeset funcdef do_cmdloop=0
    if [[ $1 == -c ]]; then
        do_cmdloop=1
        shift
    fi

    funcdef="function $1.$2 { "

    case $2 in
    get)    funcdef+="_msg $1 \(\$$1\) retrieved, line \$_curline"
            ;;
    set)    funcdef+="_msg $1 set to "'${.sh.value}'", line \$_curline"
            ;;
    unset)  funcdef+="_msg $1 cleared at line \$_curline"
            funcdef+=$'\nunset '"$1"
            ;;
    *)      _msg invalid watchpoint function $2
            return 1
            ;;
    esac

    if ((do_cmdloop)); then
        funcdef+=$'\n_cmdloop'
    fi
    funcdef+=$'\n}'

    eval "$funcdef"

    _watchpoints[$1.$2]=1
}

This function illustrates several interesting techniques. The first thing it does is declare some local variables and check if it was invoked with the -c option. This indicates that the watchpoint should enter the command loop.

The general idea is to build up the text of the appropriate discipline function in the variable funcdef. The initial value is the function keyword, the discipline function name, and the opening left curly brace. The space following the brace is important, so that the shell will correctly recognize it as a keyword.

Then, for each kind of discipline function, the case construct appends the appropriate function body to the funcdef string. The code uses judiciously placed backslashes to get the correct mixture of immediate and delayed shell variable evaluation. Consider the get case: for the \(, the backslash stays intact for use as a quoting character inside the body of the discipline function. For \$$1, the quoting happens as follows: the \$ becomes a $ inside the function, while the $1 is evaluated immediately inside the double quoted string.

In the case that the -c option was supplied, it uses the $'...' notation to append a newline and a call to _cmdloop to the function body, and then at the end appends another newline and closing right brace. Finally, by using eval, it installs the newly created function.

For example, if -c was used, the text of the generated get function for the variable count ends up looking like this:

function count.get { _msg count \($count\) retrieved, line $_curline
_cmdloop
}
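
The mixture of immediate and delayed evaluation can be tried without discipline functions at all. The sketch below is not kshdb code: make_reporter and the generated report_NAME functions are hypothetical, and a literal newline stands in for the $'\n' notation so that the example also runs in shells that lack it.

```shell
make_reporter() {
    # $1 expands now, while the string is being built; \$ leaves a
    # literal $ behind, so the variable is looked up only when the
    # generated function runs.
    funcdef="report_$1() { echo \"$1 is \$$1\""
    funcdef="$funcdef
}"
    eval "$funcdef"             # install the newly built function
}

count=7
make_reporter count             # defines report_count
count=8
report_count                    # prints: count is 8
```

The function prints 8, not 7, which shows that the value of count was not frozen into the function text when make_reporter ran.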

At the end of _setwp, _watchpoints[$1.$2] is set to 1. This creates an entry in the associative array _watchpoints indexed by discipline function name. This conveniently stores the names of all watchpoints for when we want to clear them.

Watchpoints are cleared with the #cw command, which in turn runs the _clearwp function. Here it is:

# Clear watchpoints:
# no args: clear all
# two args: same as for setting: var get|set|unset
function _clearwp {
    if [ $# = 0 ]; then
        typeset _i
        for _i in ${!_watchpoints[*]}; do
            unset -f $_i
            unset _watchpoints[$_i]
        done
    elif [ $# = 2 ]; then
        case $2 in
        get | set | unset)
            unset -f $1.$2
            unset _watchpoints[$1.$2]
            ;;
        *)  _msg $2: invalid watchpoint
            ;;
        esac
    fi
}

When invoked with no arguments, _clearwp clears all the watchpoints, by looping over all the subscripts in the _watchpoints associative array. Otherwise, if invoked with two arguments, the variable name and discipline function, it unsets the function using unset -f. In either case, the entry in _watchpoints is also unset.
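
The bookkeeping pattern (an associative array whose subscripts are function names) can be sketched separately. This example assumes a shell with associative arrays, such as ksh93 or bash 4 and later; the array registry and the functions demo_get and demo_set are made up for the illustration and are ordinary functions, not discipline functions.

```shell
typeset -A registry              # associative array: function name -> 1

demo_get() { echo "get called"; }
demo_set() { echo "set called"; }
registry[demo_get]=1
registry[demo_set]=1

# Loop over the subscripts, removing each function and its entry.
for name in "${!registry[@]}"; do
    unset -f "$name"             # remove the function itself
    unset "registry[$name]"      # and its bookkeeping entry
done

echo "entries left: ${#registry[@]}"    # entries left: 0
```

Storing only the subscripts is enough here; the value 1 is a placeholder, just as it is in _watchpoints.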

Limitations

kshdb was not designed to push the state of the debugger art forward or to have an overabundance of features. It has the most useful basic features; its implementation is compact and (we hope) comprehensible. But it does have some important limitations. The ones we know of are described in the list that follows:

String breakpoints cannot begin with digits or contain pipe characters (|) unless they are properly escaped.
You can only set breakpoints — whether line number or string — on lines in the guinea pig that contain what the shell’s documentation calls simple commands, i.e., actual Unix commands, shell built-ins, function calls, or aliases. If you set a breakpoint on a line that contains only whitespace or a comment, the shell always skips over that breakpoint. More importantly, control keywords like while, if, for, do, done, and even conditionals ([[...]] and ((...))) won’t work either, unless a simple command is on the same line.
kshdb does not “step down” into shell scripts that are called from the guinea pig. To do this, you have to edit your guinea pig and change a call to scriptname to kshdb scriptname.
Similarly, subshells are treated as one gigantic statement; you cannot step down into them at all.
The guinea pig should not trap on the fake signals DEBUG or EXIT; otherwise the debugger won’t work.
Variables that are typeset (see Chapter 4) are not accessible in break conditions. However, you can use the shell command print to check their values.
Command error handling is weak. For example, a non-numeric argument to #s will cause it to bomb.
Watchpoints that invoke the command loop are fragile. For ksh93m under GNU/Linux, trying to unset a watchpoint when in the command loop invoked from the watchpoint causes the shell to core dump. But this does not happen on all platforms, and this will eventually be fixed.

Many of these are not insurmountable; see the exercises.

A Sample kshdb Session

Now we’ll show a transcript of an actual session with kshdb, in which the guinea pig is (a slightly modified version of) the solution to Task 6-3. For convenience, here is a numbered listing of the script, which we’ll call lscol.

 1    set -A filenames $(ls $1)
 2    typeset -L14 fname
 3    let numfiles=${#filenames[*]}
 4    let numcols=5
 5
 6    for ((count = 0; $count < $numfiles ; )); do
 7        fname=${filenames[count]}
 8        print -n "$fname  "
 9        let count++
10        if (( count % numcols == 0 )); then
11            print           # newline
12        fi
13    done
14    
15    if (( count % numcols != 0 )); then
16        print
17    fi

Here is the kshdb session transcript:

$ kshdb lscol book
Korn Shell Debugger version 2.0 for ksh Version M 1993-12-28 m
Stopped at line 1
kshdb> #bp 4
Breakpoint at line  4
kshdb> #g
Reached line breakpoint at line 4
kshdb> #s
Stopped at line 6
kshdb> print $numcols
5
kshdb> #bc (( count == 10 ))
Break when true: (( count == 10 ))
kshdb> #g
appa.xml        appb.xml        appc.xml        appd.xml        appf.xml
book.xml        ch00.xml        ch01.xml        ch02.xml        ch03.xml
Break condition '(( count == 10 ))' true at line 10
kshdb> #bc
Break condition cleared.
kshdb> #bp newline
Breakpoint at next line containing 'newline'.
kshdb> #g
Reached string breakpoint at line 11
kshdb> print $count
10
kshdb> let count=9
kshdb> #g

ch03.xml        Reached string breakpoint at line 11
kshdb> #bp
Breakpoints at lines:
4
Breakpoints at strings:
newline
Break on condition:

kshdb> #g

ch04.xml        ch05.xml        ch06.xml        ch07.xml        ch08.xml
Reached string breakpoint at line 11
kshdb> #g

ch09.xml        ch10.xml        colo1.xml       copy.xml
$

First, notice that we gave the guinea pig script the argument book, meaning that we want to list the files in that directory. We begin by setting a simple breakpoint at line 4 and starting the script. It stops before executing line 4 (let numcols=5). We issue the #s command to single step through the command (i.e., to actually execute it). Then we issue a shell print command to show that the variable numcols is indeed set correctly.

Next, we set a break condition, telling the debugger to kick in when count is 10, and we resume execution. Sure enough, the guinea pig prints 10 filenames and stops at line 10, right after count is incremented. We clear the break condition by typing #bc without an argument, since otherwise the shell would stop after every statement until the condition becomes false.

The next command shows how the string breakpoint mechanism works. We tell the debugger to break when it hits a line that contains the string newline. This string is in a comment on line 11. Notice that it doesn’t matter that the string is in a comment — just that the line it’s on contains an actual command. We resume execution, and the debugger hits the breakpoint at line 11.

After that, we show how we can use the debugger to change the guinea pig’s state while running. We see that $count is still 10; we change it to 9. In the next iteration of the for loop, the script accesses the same filename that it just did (ch03.xml), increments count back to 10, and hits the string breakpoint again. Finally, we list breakpoints and resume execution until the script finishes, at which point it exits.

Exercises

We conclude this chapter with a few exercises, which are suggested enhancements to kshdb.

Improve command error handling in these ways:
Enhance the #cb command so that the user can delete specific breakpoints (by string or line number).
Remove the major limitation in the breakpoint mechanism:
1. Improve it so that if the line number selected does not contain an actual Unix command, the next closest line above it is used as the breakpoint instead.
2. Do the same thing for string breakpoints. (Hint: first translate each string breakpoint command into one or more line-number breakpoint commands.)
Implement an option that causes a break into the debugger whenever a command exits with nonzero status:
1. Implement it as the command-line option -e.
2. Implement it as the debugger commands #be (to turn the option on) and #ne (to turn it off). (Hint: you won’t be able to use the ERR trap, but bear in mind that when you enter _steptrap, $? is still the exit status of the last command that ran.)
Add the ability to “step down” into scripts that the guinea pig calls (i.e., shell subprocesses) as the command-line option -s. One way to implement this is to change the kshdb script so it plants recursive calls to kshdb in the guinea pig. You can do this by filtering the guinea pig through a loop that reads each line and determines, with the whence -v and file(1) (see the man page) commands, if the line is a call to another shell script.^[122] If so, prepend kshdb -s to the line and write it to the new file; if not, just pass it through as is.
Add support for multiple break conditions, so that kshdb stops execution when any one of them becomes true and prints a message that says which one is true. Do this by storing the break conditions in a colon-separated list or an array. Try to make this as efficient as possible, since the checking has to take place before every statement.
Add any other features you can think of.

Finally, here is the complete source code for the debugger function file kshdb.fns:

# Here before each statement in script being debugged.
# Handle single-step and breakpoints.
function _steptrap {
    _curline=$1                       # arg is no. of line about to run
    (( $_trace )) && _msg "$PS4 line $_curline: ${_lines[$_curline]}"
    if (( $_steps >= 0 )); then       # if in step mode
        let _steps="$_steps - 1"      # decrement counter
    fi

    # first check: if line num breakpoint reached
    if _at_linenumbp; then
        _msg "Reached line breakpoint at line $_curline"
        _cmdloop                      # breakpoint, enter debugger

    # second check: if string breakpoint reached
    elif _at_stringbp; then
        _msg "Reached string breakpoint at line $_curline"
        _cmdloop                      # breakpoint, enter debugger

    # if neither, check whether break condition exists and is true
    elif [[ -n $_brcond ]] && eval $_brcond; then
        _msg "Break condition '$_brcond' true at line $_curline"
        _cmdloop                      # break condition, enter debugger

    # finally, check if step mode and number of steps is up
    elif (( _steps == 0 )); then      # if step mode and time to stop
        _msg "Stopped at line $_curline"
        _cmdloop                      # enter debugger
    fi
}

# Debugger command loop.
# Here at start of debugger session, when breakpoint reached,
# after single-step.  Optionally here inside watchpoint.
function _cmdloop {
    typeset cmd args

    while read -s cmd"?kshdb> " args; do
        case $cmd in
        \#bp ) _setbp $args ;;       # set breakpoint at line num or string.
        \#bc ) _setbc $args ;;       # set break condition.
        \#cb ) _clearbp ;;           # clear all breakpoints.
        \#g  ) return ;;             # start/resume execution
        \#s  ) let _steps=${args:-1} # single-step N times (default 1)
               return ;;
        \#wp ) _setwp $args ;;       # set a watchpoint
        \#cw ) _clearwp $args ;;     # clear one or more watchpoints

        \#x  ) _xtrace ;;            # toggle execution trace
        \#\? | \#h ) _menu ;;        # print command menu
        \#q  ) exit ;;               # quit
        \#*  ) _msg "Invalid command: $cmd" ;;
        *  ) eval $cmd $args ;;      # otherwise, run shell command
        esac
    done
}

# See if next line no. is a breakpoint.
function _at_linenumbp {
    [[ $_curline == @(${_linebp%\|}) ]]
}

# Search string breakpoints to see if next line in script matches.
function _at_stringbp {
    [[ -n $_stringbp && ${_lines[$_curline]} == *@(${_stringbp%\|})* ]]
}

# Print the given message to standard error.
function _msg {
    print -r -- "$@" >&2
}

# Set breakpoint(s) at given line numbers or strings
# by appending patterns to breakpoint variables
function _setbp {
    if [[ -z $1 ]]; then
        _listbp
    elif [[ $1 == +([0-9]) ]]; then  # number, set bp at that line
        _linebp="${_linebp}$1|"
        _msg "Breakpoint at line " $1
    else                             # string, set bp at next line w/string
        _stringbp="${_stringbp}$@|"
        _msg "Breakpoint at next line containing '$@'."
    fi
}

# List breakpoints and break condition.
function _listbp {
    _msg "Breakpoints at lines:"
    _msg "$(print $_linebp | tr '|' ' ')"
    _msg "Breakpoints at strings:"
    _msg "$(print $_stringbp | tr '|' ' ')"
    _msg "Break on condition:"
    _msg "$_brcond"
}

# Set or clear break condition
function _setbc {
    if [[ $# = 0 ]]; then
        _brcond=
        _msg "Break condition cleared."
    else
        _brcond="$*"
        _msg "Break when true: $_brcond"
    fi
}

# Clear all breakpoints.
function _clearbp {
    _linebp=
    _stringbp=
    _msg "All breakpoints cleared."
}

# Toggle execution trace feature on/off
function _xtrace {
    let _trace="! $_trace"
    if (( $_trace )); then
        _msg "Execution trace on."
    else
        _msg "Execution trace off."
    fi
}

# Print command menu
function _menu {
    _msg 'kshdb commands:
    #bp N                     set breakpoint at line N
    #bp str                   set breakpoint at next line containing str
    #bp                       list breakpoints and break condition
    #bc str                   set break condition to str
    #bc                       clear break condition
    #cb                       clear all breakpoints
    #wp [-c] var discipline   set a watchpoint on a variable
    #cw var discipline        clear the given watchpoint
    #cw                       clear all watchpoints
    #g                        start/resume execution
    #s [N]                    execute N statements (default 1)
    #x                        toggle execution trace on/off
    #h, #?                    print this menu
    #q                        quit'
}

# Erase temp files before exiting.
function _cleanup {
    rm $_dbgfile 2>/dev/null
}

# Set a watchpoint on a variable
# usage: _setwp [-c] var discipline
# $1 = variable
# $2 = get|set|unset
typeset -A _watchpoints
function _setwp {
    typeset funcdef do_cmdloop=0
    if [[ $1 == -c ]]; then
        do_cmdloop=1
        shift
    fi

    funcdef="function $1.$2 { "

    case $2 in
    get)    funcdef+="_msg $1 \(\$$1\) retrieved, line \$_curline"
            ;;
    set)    funcdef+="_msg $1 set to "'${.sh.value}'", line \$_curline"
            ;;
    unset)  funcdef+="_msg $1 cleared at line \$_curline"
            funcdef+=$'\nunset '"$1"
            ;;
    *)      _msg invalid watchpoint function $2
            return 1
            ;;
    esac

    if ((do_cmdloop)); then
        funcdef+=$'\n_cmdloop'
    fi
    funcdef+=$'\n}'

    eval "$funcdef"

    _watchpoints[$1.$2]=1
}

# Clear watchpoints:
# no args: clear all
# two args: same as for setting: var get|set|unset
function _clearwp {
    if [ $# = 0 ]; then
        typeset _i
        for _i in ${!_watchpoints[*]}; do
            unset -f $_i
            unset _watchpoints[$_i]
        done
    elif [ $# = 2 ]; then
        case $2 in
        get | set | unset)
            unset -f $1.$2
            unset _watchpoints[$1.$2]
            ;;
        *)  _msg $2: invalid watchpoint
            ;;
        esac
    fi
}

^[113] Actually, if you are really concerned about efficiency, there are shell code compilers on the market; some convert shell scripts to C code that often runs quite a bit faster; however, these tools are usually for Bourne shell scripts. Other “compilers” simply convert the script into a binary form so that customers can’t read the program.

^[114] As with PS1 and PS3, this variable also undergoes parameter, command, and arithmetic substitution before its value is printed.

^[115] We should admit that if you turned on the nounset option at the top of this script, the shell would have flagged this error.

^[116] This is a notable change from ksh88, where the trap was run after each statement.

^[117] All function names and variables (except those local to functions) in kshdb have names beginning with an underscore (_), to minimize the possibility of clashes with names in the guinea pig. A more ksh93-oriented solution would be to use a compound variable, e.g., _db.tmpdir, _db.libdir, and so on.

^[118] In fact, low-level systems programmers can think of the entire trap mechanism as quite similar to an interrupt-handling scheme.

^[119] pr -n filename prints a numbered listing to standard output on System V-derived versions of Unix. Some very old BSD-derived systems don’t support it. If this doesn’t work on your system, try cat -n filename, or if that doesn’t work, create a shell script with the single line awk '{ printf("%d\t%s\n", NR, $0) }' $1

^[120] Bear in mind that if your break condition produces any standard output (or standard error), you will see it before every statement. Also, make sure your break condition doesn’t take a long time to run; otherwise your script will run very, very slowly.

^[121] Actually, by entering typeset -ft funcname, the user can enable tracing on a per-function basis, but it’s probably better to have it all under the debugger’s control.

^[122] Notice that this method should catch most separate shell scripts, but not all of them. For example, it won’t catch shell scripts that follow semicolons (e.g., cmd1; cmd2).