[Section 13.9 introduced a script called cgrep , a general-purpose, grep-like program built with sed. It allows you to look for one or more words that appear on one line or across several lines. This article explains the sed tricks that are necessary to do this kind of thing. It gets into territory that is essential for any advanced applications of this obscure yet wonderful editor. Section 34.14 through Section 34.17 have background information. — JP]
Let's review the two examples from Section 13.9. The first command below finds all lines containing the word system in the file main.c and shows 10 additional lines of context above and below each match. The second command finds all occurrences of the word "awk" where it is followed by the word "perl" somewhere within the next 3 lines:
cgrep -10 system main.c
cgrep -3 "awk.*perl"
Now the script, followed by an explanation of how it works:
case: Section 35.11
expr: Section 36.21
shift: Section 35.22
${?}: Section 36.7
\~..~: Section 34.8
"$@": Section 35.20
#!/bin/sh
# cgrep - multiline context grep using sed
# Usage: cgrep [-context] pattern [file...]

n=3
case $1 in -[1-9]*)
    n=`expr 1 - "$1"`
    shift
esac
re=${1?}; shift

sed -n "
    1b start
    : top
    \~$re~{
        h; n; p; H; g
        b endif
    }
        N
    : start
        //{ =; p; }
    : endif
        $n,\$D
        b top
" "$@"
The sed script is embedded in a bare-bones shell wrapper (Section 35.19) to parse out the initial arguments because, unlike awk and perl, sed cannot directly access command-line parameters. If the first argument looks like a -context option, variable n is reset to one more than the number of lines specified, using a little trick: the argument is treated as a negative number and subtracted from 1 (so -10 gives 1 - (-10), or 11). The pattern argument is then stored in $re, with the ${1?} syntax causing the shell to abort with an error message if no pattern was given. Any remaining arguments are passed as filenames to the sed command.
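To see the two wrapper tricks in isolation, here is a minimal sketch. The function names window_size and require_pattern are ours, made up for illustration; cgrep itself does this work inline. The first function shows the expr arithmetic, and the second shows ${1?} aborting when no argument is supplied (we run it in a subshell so that only the subshell dies).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; cgrep does this inline.

# Turn a -N option into the window size: 1 - (-N) = N + 1.
window_size() {
    expr 1 - "$1"
}

# ${1?...} makes the shell abort with a message if $1 is unset.
# The parentheses run it in a subshell, so only the subshell exits.
require_pattern() {
    ( re=${1?missing pattern}; printf '%s\n' "$re" )
}

window_size -10               # prints 11
window_size -3                # prints 4
require_pattern 'awk.*perl'   # prints awk.*perl
```

Calling require_pattern with no argument writes a complaint to standard error and returns a nonzero status, which is the behavior the wrapper relies on to reject a missing pattern.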
So that the $re and $n parameters can be embedded, the sed script is enclosed in double quotes (Section 27.12). We use the -n option because we don't want to print out every line by default, and because we need to use the n command in the script without its side effect of outputting a line.
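A small sketch may make both points concrete. The helper show_script is hypothetical (it is not part of cgrep); it simply prints two script lines the way the shell hands them to sed after double-quote processing, so you can see that $re and $n are expanded while \$ survives as a literal $ address. The last two commands compare the n command with and without -n.

```shell
#!/bin/sh
# Hypothetical helper: show the text sed receives once the shell
# has finished with a double-quoted script.
show_script() {
    n=$1
    re=$2
    printf '%s\n' "\~$re~p" "$n,\$D"
}

show_script 4 'awk.*perl'
# prints:
#   \~awk.*perl~p
#   4,$D

# Effect of -n on the n command:
printf '%s\n' a b | sed 'n;p'      # prints a, b, b
printf '%s\n' a b | sed -n 'n;p'   # prints only b
```

Without -n, the n command writes out the current line before fetching the next one, which is exactly the side effect the script needs to avoid.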
The sed script itself looks rather unstructured (it was actually designed using a flowchart), but the basic algorithm is easy enough to understand. We keep a "window" of n lines in the pattern space and scroll this window through the input stream. If an occurrence of the pattern comes into the window, the entire window is printed (providing n lines of previous context), and each subsequent line is printed until the pattern scrolls out of view again (providing n lines of following context). The sed idiom N;D is used to advance the window, with the D not kicking in until the first n lines of input have been accumulated.
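The N;D idiom is easier to follow when nothing else is going on. The sketch below is not taken from cgrep: it is the familiar tail-like use of the same idiom, wrapped in a hypothetical function called last_lines. It scrolls a window through the input and prints whatever is left in the window when the last line arrives.

```shell
#!/bin/sh
# Hypothetical helper: print the last $1 lines of standard input
# by scrolling a window through the pattern space.
last_lines() {
    n=`expr "$1" + 1`   # after N the window briefly holds one extra line
    # $q   - on the last input line, print the window and quit
    # N    - append the next input line to the window
    # n,$D - once n lines have been read, drop the oldest line
    # ba   - loop back to label a
    sed -e :a -e "\$q;N;$n,\$D;ba"
}

printf '%s\n' 1 2 3 4 5 6 | last_lines 3   # prints 4, 5 and 6
printf '%s\n' a b | last_lines 3           # prints a and b
```

As in cgrep, the D address starts at line n, so short inputs simply accumulate in the pattern space and nothing is discarded until the window is full.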
The core of the script is basically an if-then-else construct that decides whether we are currently "in context." (The regular expression here is delimited by tilde (~) characters because tildes are less likely to occur in the user-supplied pattern than slashes.) If we are still in context, then the next line of input is read and output, temporarily using the hold space to save the window (and effectively doing an N in the process). Otherwise, we append the next input line (N) and search for the pattern again (an empty regular expression means to reuse the last pattern). If it's now found, the pattern must have just come into view, so we print the current line number followed by the contents of the window. Subsequent iterations will take the "then" branch until the pattern scrolls out of the window.
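The two branches can be tried out separately. In this sketch, demo_hold and demo_reuse are hypothetical names of our own. The first runs the "then" branch's h; n; p; H; g sequence and then replaces the embedded newline with a plus sign so that the rebuilt pattern space is visible. The second shows an empty regular expression picking up the previous pattern after an N.

```shell
#!/bin/sh
# Hypothetical demos; neither function is part of cgrep.

# h  save the window in the hold space
# n  read the next line (silently, because of -n)
# p  print that new line
# H  append it to the saved window
# g  bring the enlarged window back into the pattern space
demo_hold() {
    sed -n 'h; n; p; H; g; s/\n/+/; p'
}
printf '%s\n' one two | demo_hold
# prints:
#   two
#   one+two

# If the line does not match foo, append the next line and test
# again; the empty // reuses the last regular expression (foo).
demo_reuse() {
    sed -n '/foo/!{
        N
        //p
    }'
}
printf '%s\n' a foo | demo_reuse
# prints:
#   a
#   foo
```

The first demo's second output line shows that the pattern space ends up holding both lines, just as a plain N would have left it, with the new line already printed along the way.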
— GU