You can use the grep (Section 13.2) option -c to tell you how many lines in a given file match a pattern, so you can also use it to find files that don't contain the pattern at all (i.e., a count of zero). This is a handy technique to package into a shell script.
Let's say you're indexing a DocBook (SGML) document and you want to make a list of files that don't yet contain indexing tags. What you need to find are files with zero occurrences of the string <indexterm>. (If your tags might be uppercase, you'll also want the -i option (Section 9.22).) The following command:
% grep -c "<indexterm>" chapter*
might produce the following output:
chapter1.sgm:10
chapter2.sgm:27
chapter3.sgm:19
chapter4.sgm:0
chapter5.sgm:39
...
This is all well and good, but suppose you need to check index entries in hundreds of reference pages. Well, just filter grep's output by piping it through another grep. The previous command can be modified as follows:
% grep -c "<indexterm>" chapter* | grep :0
This results in the following output:
chapter4.sgm:0
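To try the two-grep filter without touching real chapters, here is a minimal sketch. The scratch directory (made with mktemp -d, which is assumed to be available) and the three file names are invented for illustration. The only change from the command above is the $ anchor on the second grep, a small precaution so that :0 is matched only at the end of a line.

```shell
#!/bin/sh
# Hypothetical scratch files; the names and contents are illustrative only.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf '<indexterm>grep</indexterm>\n' > chapter1.sgm
printf 'no index tags here yet\n' > chapter2.sgm
printf '<indexterm>sed</indexterm>\n<indexterm>sh</indexterm>\n' > chapter3.sgm

# grep -c prints name:count for each file; the second grep keeps
# only the lines whose count is exactly zero.
unindexed=$(grep -c "<indexterm>" chapter* | grep ':0$')
echo "$unindexed"
```

Only chapter2.sgm has no matching lines, so it is the only name that survives the filter.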
Using sed (Section 34.1) to strip the trailing :0, you can save the output as a list of files. For example, here's a trick for creating a list of files that don't contain index macros:
% grep -c "<indexterm>" * | sed -n 's/:0$//p' > ../not_indexed.list
The sed -n command prints only the lines that end with :0; it also strips the :0 from the output so that ../not_indexed.list contains a list of files, one per line. For a bit of extra safety, we've added a $ anchor (Section 32.5) to be sure sed matches :0 only at the end of a line and not, say, in some bizarre filename that contains :0. (We've quoted (Section 27.12) the $ for safety, though it's not really necessary in most shells because $/ can't be a reference to a shell variable.) The .. pathname (Section 1.16) puts the not_indexed.list file into the parent directory. This is one easy way to keep grep from searching that file, but it may not be worth the bother.
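The effect of the $ anchor is easier to see with a deliberately awkward filename. The sketch below is only an illustration: the scratch directory (made with mktemp -d, assumed to be available), the book subdirectory, and the file names, including notes:0.sgm, are all made up. It runs the anchored command from above and, for comparison, an unanchored variant.

```shell
#!/bin/sh
# Hypothetical scratch layout; every name here is illustrative only.
dir=$(mktemp -d) || exit 1
mkdir "$dir/book" || exit 1
cd "$dir/book" || exit 1
printf '<indexterm>grep</indexterm>\n' > chapter1.sgm
printf 'nothing indexed yet\n' > chapter2.sgm
# A "bizarre" name containing :0; this file does contain an index tag.
printf '<indexterm>odd</indexterm>\n' > 'notes:0.sgm'

# Anchored: only lines ending in :0 are printed, with the :0 removed.
grep -c "<indexterm>" * | sed -n 's/:0$//p' > ../not_indexed.list

# Unanchored, for comparison: the :0 inside the odd filename also
# matches, so an indexed file leaks into the output in mangled form.
unanchored=$(grep -c "<indexterm>" * | sed -n 's/:0//p')

cat ../not_indexed.list
echo "unanchored output:"
echo "$unanchored"
```

With the anchor, the list holds only chapter2.sgm. Without it, the line for notes:0.sgm is also printed, with part of its name removed.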
To edit all files that need index macros added, you could type this:
% vi `grep -c "<indexterm>" * | sed -n 's/:0$//p'`
This command is more obvious once you start using backquotes a lot.
You can put the grep -c technique into a little script named vgrep with a couple of safety features added (for "$@", see Section 35.20; go to http://examples.oreilly.com/upt3 for more information on vgrep):
#!/bin/sh
case $# in
0|1) echo "Usage: `basename $0` pattern file [files...]" 1>&2; exit 2 ;;
2)  # Given a single filename, grep returns a count with no colon or name.
    grep -c -e "$1" "$2" | sed -n "s|^0\$|$2|p"
    ;;
*)  # With more than one filename, grep returns "name:count" for each file.
    pat="$1"; shift
    grep -c -e "$pat" "$@" | sed -n "s|:0\$||p"
    ;;
esac
Now you can type, for example:
% vi `vgrep "<indexterm>" *`
One of the script's safety features works around a problem that happens if you pass grep just one filename. In that case, most versions of grep won't print the file's name, just the number of matching lines. So the first sed command replaces an output line consisting of only the digit 0 with the filename.
The second safety feature is the grep -e option. It tells grep that the following argument is the search pattern, even if that pattern looks like an option because it starts with a dash (-). This lets you type commands like vgrep -0123 * to find files that don't contain the string -0123.
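Both safety features can be exercised in one short session. The sketch below is an illustration rather than the script itself: it wraps the same grep and sed commands in a shell function with the hypothetical name vgrep_demo so that the example is self-contained, and it uses invented files in a scratch directory (made with mktemp -d, assumed to be available).

```shell
#!/bin/sh
# Stand-in for the vgrep script, wrapped in a function so this sketch
# runs on its own. The name vgrep_demo is hypothetical.
vgrep_demo() {
    case $# in
    0|1) echo "Usage: vgrep_demo pattern file [files...]" 1>&2; return 2 ;;
    2)  # One file: grep prints a bare count, so swap a lone 0 for the name.
        grep -c -e "$1" "$2" | sed -n "s|^0\$|$2|p" ;;
    *)  # Several files: grep prints name:count, so strip a trailing :0.
        pat="$1"; shift
        grep -c -e "$pat" "$@" | sed -n "s|:0\$||p" ;;
    esac
}

# Hypothetical test files; the names and contents are illustrative only.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'part number -0123\n' > a.txt
printf 'nothing relevant\n' > b.txt

# The pattern starts with a dash; -e keeps grep from reading it as options.
one=$(vgrep_demo "-0123" b.txt)          # single-file case
many=$(vgrep_demo "-0123" a.txt b.txt)   # multiple-file case
echo "$one"
echo "$many"
```

In both cases the name printed is b.txt, the only file without the string -0123.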
—DG and JP