An outsider might imagine Unix as a nice, uniform command line experience across many different systems, helped by their compliance with the POSIX standards. But anyone who’s ever used more than one Unix system knows how much they can vary within these broad parameters. You’d be hard-pressed to find a Unix or Linux box that doesn’t have ls as a standard command, for example, but does your version support the --color flag? Does your version of the Bourne shell support variable slicing (like ${var:0:2})?
Perhaps one of the most valuable uses of shell scripts is tweaking your particular flavor of Unix to make it more like other systems. Although most modern GNU utilities run just fine on non-Linux Unixes (for example, you can replace clunky old tar with the newer GNU tar), often the system updates involved in tweaking Unix don’t need to be so drastic, and it’s possible to avoid the potential problems inherent in adding new binaries to a supported system. Instead, shell scripts can be used to map popular flags to their local equivalents, to use core Unix capabilities to create a smarter version of an existing command, or even to address the longtime lack of certain functionality.
There are several ways to add line numbers to a displayed file, many of which are quite short. For example, here’s one solution using awk:
awk '{ print NR": "$0 }' < inputfile
On some Unix implementations, the cat command has an -n flag, and on others, the more (or less, or pg) pager has a flag for specifying that each line of output should be numbered. But on some Unix flavors, none of these methods will work, in which case the simple script in Listing 4-1 can do the job.
#!/bin/bash
# numberlines--A simple alternative to cat -n, etc.

for filename in "$@"
do
  linecount="1"
  # IFS= preserves leading whitespace; -r keeps backslashes intact.
➊ while IFS= read -r line
  do
    echo "${linecount}: $line"
➋   linecount="$(( $linecount + 1 ))"
➌ done < "$filename"
done

exit 0
Listing 4-1: The numberlines script
There’s a trick to the main loop in this program: it looks like a regular while loop, but the important part is actually done < $filename ➌. It turns out that every major block construct acts as its own virtual subshell, so this file redirection is not only valid but also an easy way to have a loop that iterates line by line with the content of $filename. Couple that with the read statement at ➊—an inner loop that loads each line, iteration by iteration, into the line variable—and it’s then easy to output the line with its line number as a preface and increment the linecount variable ➋.
You can feed as many filenames as you want into this script. You can’t feed it input via a pipe, though that wouldn’t be too hard to fix by invoking a cat - sequence if no starting parameters are given.
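The fix hinted at here can be sketched as follows (a hypothetical variant, not the listing above): when no filenames are given, read standard input instead.

```shell
# Sketch of the suggested fix, assuming POSIX sh semantics: with no
# arguments, number stdin; otherwise number each named file in turn.
numberstream() {
    linecount=1
    while IFS= read -r line
    do
        echo "${linecount}: $line"
        linecount=$(( linecount + 1 ))
    done
}

numberlines2() {
    if [ $# -eq 0 ] ; then
        numberstream               # No files: number whatever arrives on the pipe.
    else
        for filename in "$@"
        do
            numberstream < "$filename"
        done
    fi
}
```

With this in place, both `numberlines2 alice.txt` and `grep Rabbit alice.txt | numberlines2` behave as you'd expect.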
Listing 4-2 shows a file displayed with line numbers using the numberlines script.
$ numberlines alice.txt
1: Alice was beginning to get very tired of sitting by her sister on the
2: bank, and of having nothing to do: once or twice she had peeped into the
3: book her sister was reading, but it had no pictures or conversations in
4: it, 'and what is the use of a book,' thought Alice 'without pictures or
5: conversations?'
6:
7: So she was considering in her own mind (as well as she could, for the
8: hot day made her feel very sleepy and stupid), whether the pleasure
9: of making a daisy-chain would be worth the trouble of getting up and
10: picking the daisies, when suddenly a White Rabbit with pink eyes ran
11: close by her.
Listing 4-2: Testing the numberlines script on an excerpt from Alice in Wonderland
Once you have a file with numbered lines, you can reverse the order of all the lines in the file, like this:
cat -n filename | sort -rn | cut -c8-
This does the trick on systems supporting the -n flag to cat, for example. Where might this be useful? One obvious situation is when displaying a log file in newest-to-oldest order.
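On a system whose cat lacks -n, the same trick can be approximated with awk supplying the line numbers (a sketch, not from the listings above):

```shell
# Reverse the lines of stdin: number each line with awk, sort the
# numbers in reverse, then strip the numbers back off with cut.
# A tab separator avoids guessing at cat -n's column width.
reverselines() {
    awk '{ print NR"\t"$0 }' | sort -rn | cut -f2-
}
```

Piping a log file through reverselines displays it newest-to-oldest, just as the cat -n pipeline does.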
One limitation of the fmt command and its shell script equivalent, Script #14 on page 53, is that they wrap and fill every line they encounter, whether or not it makes sense to do so. This can mess up email (wrapping your .signature is not good, for example) and any input file format where line breaks matter.
What if you have a document in which you want to wrap just the long lines but leave everything else intact? With the default set of commands available to a Unix user, there’s only one way to accomplish this: explicitly step through each line in an editor, feeding the long ones to fmt individually. (You could accomplish this in vi by moving the cursor onto the line in question and using !$fmt.)
The script in Listing 4-3 automates that task, making use of the shell ${#varname} construct, which returns the length of the contents of the data stored in the variable varname.
#!/bin/bash
# toolong--Feeds the fmt command only those lines in the input stream
#   that are longer than the specified length

width=72

if [ ! -r "$1" ] ; then
  echo "Cannot read file $1" >&2
  echo "Usage: $0 filename" >&2
  exit 1
fi

➊ while read input
do
  if [ ${#input} -gt $width ] ; then
    echo "$input" | fmt
  else
    echo "$input"
  fi
➋ done < "$1"

exit 0
Listing 4-3: The toolong script
Notice that the file is fed to the while loop with a simple < $1 associated with the end of the loop ➋ and that each line can then be analyzed by reading it with read input ➊, which assigns each line of the file to the input variable, line by line.
If your shell doesn’t have the ${#var} notation, you can emulate its behavior with the super useful “word count” command wc:
varlength="$(echo "$var" | wc -c)"
However, wc has an annoying habit of prefacing its output with spaces to get values to align nicely in the output listing. To sidestep that pesky problem, a slight modification is necessary to let only digits through the final pipe step, as shown here:
varlength="$(echo "$var" | wc -c | sed 's/[^[:digit:]]//g')"
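One caveat worth noting: echo appends a trailing newline, so the wc -c emulation above reports one more character than ${#var} does. Using printf instead sidesteps the extra newline, as this small sketch shows:

```shell
# strlen emulates ${#var} exactly: printf '%s' adds no trailing
# newline, and the sed step strips wc's leading space padding.
strlen() {
    printf '%s' "$1" | wc -c | sed 's/[^[:digit:]]//g'
}
```

Now strlen "hello" yields 5, matching what ${#var} would report.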
This script accepts exactly one filename as input, as Listing 4-4 shows.
$ toolong ragged.txt
So she sat on, with closed eyes, and half believed herself in
Wonderland, though she knew she had but to open them again, and
all would change to dull reality--the grass would be only rustling
in the wind, and the pool rippling to the waving of the reeds--the
rattling teacups would change to tinkling sheep-bells, and the
Queen's shrill cries to the voice of the shepherd boy--and the
sneeze
of the baby, the shriek of the Gryphon, and all the other queer
noises, would change (she knew) to the confused clamour of the busy
farm-yard--while the lowing of the cattle in the distance would
take the place of the Mock Turtle's heavy sobs.
Listing 4-4: Testing the toolong script
Notice that unlike a standard invocation of fmt, toolong has retained line breaks where possible, so the word sneeze, which is on a line by itself in the input file, is also on a line by itself in the output.
Many of the most common Unix and Linux commands were originally designed for slow, barely interactive output environments (we did talk about Unix being an ancient OS, right?) and therefore offer minimal output and interactivity. An example is cat: when used to view a short file, it doesn’t give much helpful output. It would be nice to have more information about the file, though, so let’s get it! Listing 4-5 details the showfile command, an alternative to cat.
#!/bin/bash
# showfile--Shows the contents of a file, including additional useful info

width=72

for input
do
  lines="$(wc -l < "$input" | sed 's/ //g')"
  chars="$(wc -c < "$input" | sed 's/ //g')"
  owner="$(ls -ld "$input" | awk '{print $3}')"

  echo "-----------------------------------------------------------------"
  echo "File $input ($lines lines, $chars characters, owned by $owner):"
  echo "-----------------------------------------------------------------"

  while read line
  do
    if [ ${#line} -gt $width ] ; then
      echo "$line" | fmt | sed -e '1s/^/  /' -e '2,$s/^/+ /'
    else
      echo "  $line"
    fi
➊ done < "$input"

  echo "-----------------------------------------------------------------"

➋ done | ${PAGER:-more}

exit 0
Listing 4-5: The showfile script
To simultaneously read the input line by line and add head and foot information, this script uses a handy shell trick: near the end of the script, it redirects the input to the while loop with the snippet done < $input ➊. Perhaps the most complex element in this script, however, is the invocation of sed for lines longer than the specified length:
echo "$line" | fmt | sed -e '1s/^/ /' -e '2,$s/^/+ /'
Lines greater than the maximum allowable length are wrapped with fmt (or its shell script replacement, Script #14 on page 53). To visually denote which lines are continuations and which are retained intact from the original file, the first output line of an excessively long line gets the usual two-space indent, but subsequent lines are prefixed with a plus sign and a single space instead. Finally, piping the output into ${PAGER:-more} displays the file with the pagination program set in the system variable $PAGER or, if that's not set, with the more program ➋.
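The :- form of parameter expansion is worth a quick illustration on its own (a standalone sketch):

```shell
# ${var:-default} yields the value of var when it's set and nonempty,
# and the fallback text otherwise.
PAGER=""
first="${PAGER:-more}"     # PAGER is empty, so the default applies
PAGER="less"
second="${PAGER:-more}"    # PAGER is set, so its value wins
```

Here first expands to "more" and second to "less", which is exactly the fallback behavior the script relies on.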
You can run showfile by specifying one or more filenames when the program is invoked, as Listing 4-6 shows.
$ showfile ragged.txt
-----------------------------------------------------------------
File ragged.txt (7 lines, 639 characters, owned by taylor):
-----------------------------------------------------------------
  So she sat on, with closed eyes, and half believed herself in
  Wonderland, though she knew she had but to open them again, and
  all would change to dull reality--the grass would be only rustling
+ in the wind, and the pool rippling to the waving of the reeds--the
  rattling teacups would change to tinkling sheep-bells, and the
  Queen's shrill cries to the voice of the shepherd boy--and the
  sneeze
  of the baby, the shriek of the Gryphon, and all the other queer
+ noises, would change (she knew) to the confused clamour of the busy
+ farm-yard--while the lowing of the cattle in the distance would
+ take the place of the Mock Turtle's heavy sobs.
Listing 4-6: Testing the showfile script
The inconsistency across the command flags of various Unix and Linux systems is a perpetual problem that causes lots of grief for users who switch between any of the major releases, particularly between a commercial Unix system (SunOS/Solaris, HP-UX, and so on) and an open source Linux system. One command that demonstrates this problem is quota, which supports full-word flags on some Unix systems but accepts only one-letter flags on others.
A succinct shell script (shown in Listing 4-7) solves the problem by mapping any full-word flags specified to the equivalent single-letter alternatives.
#!/bin/bash
# newquota--A frontend to quota that works with full-word flags a la GNU
# quota has three possible flags, -g, -v, and -q, but this script
# allows them to be '--group', '--verbose', and '--quiet' too.
flags=""
realquota="$(which quota)"
while [ $# -gt 0 ]
do
case $1
in
--help) echo "Usage: $0 [--group --verbose --quiet -gvq]" >&2
exit 1 ;;
--group) flags="$flags -g"; shift ;;
--verbose) flags="$flags -v"; shift ;;
--quiet) flags="$flags -q"; shift ;;
--) shift; break ;;
*) break; # Done with 'while' loop!
esac
done
➊ exec $realquota $flags "$@"
Listing 4-7: The newquota script
This script really boils down to a while statement that steps through every argument specified to the script, identifying any of the matching full-word flags and adding the associated one-letter flag to the flags variable. When done, it simply invokes the original quota program ➊ and adds the user-specified flags as needed.
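The mapping loop can be pulled out and exercised in isolation (a sketch; the echo here stands in for the final exec of the real quota binary):

```shell
# Translate full-word flags to their one-letter equivalents, then
# echo the rebuilt command line rather than exec'ing quota itself.
mapquotaflags() {
    flags=""
    while [ $# -gt 0 ]
    do
        case "$1" in
            --group)   flags="$flags -g"; shift ;;
            --verbose) flags="$flags -v"; shift ;;
            --quiet)   flags="$flags -q"; shift ;;
            *) break ;;
        esac
    done
    echo "quota$flags $*"
}
```

Calling mapquotaflags --group --quiet taylor produces "quota -g -q taylor", showing the translation at work.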
There are a couple of ways to integrate a wrapper of this nature into your system. The most obvious is to rename this script quota, then place this script in a local directory (say, /usr/local/bin), and ensure that users have a default PATH that looks in this directory before looking in the standard Linux binary distro directories (/bin and /usr/bin). Another way is to add system-wide aliases so that a user entering quota actually invokes the newquota script. (Some Linux distros ship with utilities for managing system aliases, such as Debian’s alternatives system.) This last strategy could be risky, however, if users call quota with the new flags in their own shell scripts: if those scripts don’t use the user’s interactive login shell, they might not see the specified alias and will end up calling the base quota command rather than newquota.
Listing 4-8 details running newquota with the --verbose and --quiet arguments.
$ newquota --verbose
Disk quotas for user dtint (uid 24810):
     Filesystem   usage   quota   limit   grace   files   quota   limit   grace
           /usr  338262  614400  675840          10703  120000  126000
$ newquota --quiet
Listing 4-8: Testing the newquota script
The --quiet mode emits output only if the user is over quota. You can see that this is working correctly from the last result, where we’re not over quota. Phew!
sftp, the secure version of the File Transfer Protocol program ftp, is included as part of ssh, the Secure Shell package, but its interface can be a bit confusing for users who are making the switch from the crusty old ftp client. The basic problem is that ftp is invoked as ftp remotehost, after which it prompts for account and password information. By contrast, sftp wants to know the account and remote host together on the command line and won't work properly (or as expected) if only the host is specified.
To address this, the simple wrapper script detailed in Listing 4-9 allows users to invoke mysftp exactly as they would have invoked the ftp program and be prompted for the necessary fields.
#!/bin/bash
# mysftp--Makes sftp start up more like ftp
/bin/echo -n "User account: "
read account
if [ -z "$account" ] ; then
exit 0; # Changed their mind, presumably
fi
if [ -z "$1" ] ; then
/bin/echo -n "Remote host: "
read host
if [ -z "$host" ] ; then
exit 0
fi
else
host=$1
fi
# End by switching to sftp. The -C flag enables compression here.
➊ exec sftp -C "$account@$host"
Listing 4-9: The mysftp script, a friendlier version of sftp
There’s a trick in this script worth mentioning. It’s actually something we’ve done in previous scripts, though we haven’t highlighted it for you before: the last line is an exec call ➊. What this does is replace the currently running shell with the application specified. Because you know there’s nothing left to do after calling the sftp command, this method of ending our script is much more resource efficient than having the shell hanging around waiting for sftp to finish using a separate subshell, which is what would happen if we just invoked sftp instead.
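The effect is easy to see in a throwaway subshell (a small sketch): once exec runs, nothing after it in that process ever executes.

```shell
# The parenthesized subshell is replaced wholesale by echo, so the
# second echo never gets a chance to run.
out="$( (exec echo "replaced"; echo "never printed") )"
```

After this, out contains only "replaced", confirming that exec swapped the subshell's process image for the echo command.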
As with the ftp client, if users omit the remote host, the script continues by prompting for a remote host. If the script is invoked as mysftp remotehost, the remotehost provided is used instead.
Let’s see what happens when you invoke this script without any arguments versus invoking sftp without any arguments. Listing 4-10 shows running sftp.
$ sftp
usage: sftp [-1246Cpqrv] [-B buffer_size] [-b batchfile] [-c cipher]
[-D sftp_server_path] [-F ssh_config] [-i identity_file] [-l limit]
[-o ssh_option] [-P port] [-R num_requests] [-S program]
[-s subsystem | sftp_server] host
sftp [user@]host[:file ...]
sftp [user@]host[:dir[/]]
sftp -b batchfile [user@]host
Listing 4-10: Running the sftp utility with no arguments yields very cryptic help output.
That’s useful but confusing. By contrast, with the mysftp script you can proceed to make an actual connection, as Listing 4-11 shows.
$ mysftp
User account: taylor
Remote host: intuitive.com
Connecting to intuitive.com...
taylor@intuitive.com's password:
sftp> quit
Listing 4-11: Running the mysftp script with no arguments is much clearer.
Invoke the script as if it were an ftp session by supplying the remote host, and it’ll prompt for the remote account name (detailed in Listing 4-12) and then invisibly invoke sftp.
$ mysftp intuitive.com
User account: taylor
Connecting to intuitive.com...
taylor@intuitive.com's password:
sftp> quit
Listing 4-12: Running the mysftp script with a single argument: the host to connect to
One thing to always think about when you have a script like this is whether it can be the basis of an automated backup or sync tool, and mysftp is a perfect candidate. So a great hack would be to designate a directory on your system, for example, then write a wrapper that would create a ZIP archive of key files, and use mysftp to copy them up to a server or cloud storage system. In fact, we’ll do just that later in the book with Script #72 on page 229.
Some versions of grep offer a remarkable range of capabilities, including the particularly useful ability to show the context (a line or two above and below) of a matching line in the file. Additionally, some versions of grep can highlight the region in the line (for simple patterns, at least) that matches the specified pattern. You might already have such a version of grep. Then again, you might not.
Fortunately, both of these features can be emulated with a shell script, so you can still use them even if you’re on an older commercial Unix system with a relatively primitive grep command. To specify the number of lines of context both above and below the line matching the pattern that you specified, use -c value, followed by the pattern to match. This script (shown in Listing 4-13) also borrows from the ANSI color script, Script #11 on page 40, to do region highlighting.
#!/bin/bash
# cgrep--grep with context display and highlighted pattern matches

context=0
esc="^["
boldon="${esc}[1m"
boldoff="${esc}[22m"
sedscript="/tmp/cgrep.sed.$$"
tempout="/tmp/cgrep.$$"

function showMatches
{
  matches=0

➊ echo "s/$pattern/${boldon}$pattern${boldoff}/g" > $sedscript

➋ for lineno in $(grep -n "$pattern" $1 | cut -d: -f1)
  do
    if [ $context -gt 0 ] ; then
➌     prev="$(( $lineno - $context ))"
      if [ $prev -lt 1 ] ; then
        # This results in "invalid usage of line address 0."
        prev="1"
      fi
➍     next="$(( $lineno + $context ))"

      if [ $matches -gt 0 ] ; then
        echo "${prev}i\\" >> $sedscript
        echo "----" >> $sedscript
      fi
      echo "${prev},${next}p" >> $sedscript
    else
      echo "${lineno}p" >> $sedscript
    fi
    matches="$(( $matches + 1 ))"
  done

  if [ $matches -gt 0 ] ; then
    sed -n -f $sedscript $1 | uniq | more
  fi
}

➎ trap "$(which rm) -f $tempout $sedscript" EXIT

if [ -z "$1" ] ; then
  echo "Usage: $0 [-c X] pattern {filename}" >&2
  exit 0
fi

if [ "$1" = "-c" ] ; then
  context="$2"
  shift; shift
elif [ "$(echo $1|cut -c1-2)" = "-c" ] ; then
  context="$(echo $1 | cut -c3-)"
  shift
fi

pattern="$1"; shift

if [ $# -gt 0 ] ; then
  for filename ; do
    echo "----- $filename -----"
    showMatches $filename
  done
else
  cat - > $tempout   # Save stream to a temp file.
  showMatches $tempout
fi

exit 0
Listing 4-13: The cgrep script
This script uses grep -n to get the line numbers of all matching lines in the file ➋ and then, using the specified number of lines of context to include, identifies a starting ➌ and ending ➍ line for displaying each match. These are written out to the temporary sed script defined at ➊, which executes a word substitution command that wraps the specified pattern in bold-on and bold-off ANSI sequences. That’s 90 percent of the script, in a nutshell.
The other thing worth mentioning in this script is the useful trap command ➎, which lets you tie events into the shell’s script execution system itself. The first argument is the command or sequence of commands you want invoked, and all subsequent arguments are the specific signals (events). In this case, we’re telling the shell that when the script exits, invoke rm to remove the two temp files.
What’s particularly nice about working with trap is that it works regardless of where you exit the script, not just at the very bottom. In subsequent scripts, you’ll see that trap can be tied to a wide variety of signals, not just SIGEXIT (or EXIT, or its numeric equivalent, 0). In fact, you can have different trap commands associated with different signals, so you might output a “cleaned-up temp files” message if someone interrupts a script with CTRL-C (which sends SIGINT), while that message wouldn’t be displayed on a regular (SIGEXIT) exit.
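A sketch of signal-specific handlers (hypothetical file names; run it as its own script to watch the EXIT trap fire when it finishes):

```shell
# One handler for CTRL-C (SIGINT), another for normal termination.
# The INT handler announces the cleanup; the EXIT handler runs on
# any normal exit, wherever in the script that happens.
tmpfile="/tmp/trapdemo.$$"
touch "$tmpfile"
trap 'echo "Cleaned up temp files"; rm -f "$tmpfile"; exit 1' INT
trap 'rm -f "$tmpfile"' EXIT
```

Until one of the traps fires, the temp file exists; whichever way the script ends, the file is removed.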
This script works either with an input stream, in which case it saves the input to a temp file and then processes the temp file as if its name had been specified on the command line, or with a list of one or more files on the command line. Listing 4-14 shows passing a single file via the command line.
$ cgrep -c 1 teacup ragged.txt
----- ragged.txt -----
in the wind, and the pool rippling to the waving of the reeds--the
rattling teacups would change to tinkling sheep-bells, and the
Queen's shrill cries to the voice of the shepherd boy--and the
Listing 4-14: Testing the cgrep script
A useful refinement to this script would return line numbers along with the matched lines.
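One way to sketch that refinement (a hypothetical helper, building on the same grep -n call the script already uses): keep the line-number prefix that grep -n produces instead of cutting it away.

```shell
# Show matches with their line numbers; grep -n emits "lineno:text"
# for each matching line in the named file.
numberedmatches() {
    grep -n "$1" "$2"
}
```

Folding this output format into showMatches would prefix each displayed match with its position in the file.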
Throughout the years of Unix development, few programs have been reconsidered and redeveloped more times than compress. On most Linux systems, three significantly different compression programs are available: compress, gzip, and bzip2. Each uses a different suffix (.Z, .gz, and .bz2, respectively), and the degree of compression can vary among the three programs, depending on the layout of data within a file.
Regardless of the level of compression, and regardless of which compression programs you have installed, working with compressed files on many Unix systems requires decompressing them by hand, accomplishing the desired tasks, and recompressing them when finished. Tedious, and thus a perfect job for a shell script! The script detailed in Listing 4-15 acts as a convenient compression/decompression wrapper for three functions you’ll often find yourself wanting to use on compressed files: cat, more, and grep.
#!/bin/bash
# zcat, zmore, and zgrep--This script should be either symbolically
#   linked or hard linked to all three names. It allows users to work with
#   compressed files transparently.

Z="compress"  ;  unZ="uncompress"  ;  Zlist=""
gz="gzip"     ;  ungz="gunzip"     ;  gzlist=""
bz="bzip2"    ;  unbz="bunzip2"    ;  bzlist=""

# First step is to try to isolate the filenames in the command line.
#   We'll do this lazily by stepping through each argument, testing to
#   see whether it's a filename. If it is and it has a compression
#   suffix, we'll decompress the file, rewrite the filename, and proceed.
#   When done, we'll recompress everything that was decompressed.

for arg
do
  if [ -f "$arg" ] ; then
    case "$arg" in
      *.Z)   $unZ "$arg"
             arg="$(echo $arg | sed 's/\.Z$//')"
             Zlist="$Zlist \"$arg\""
             ;;
      *.gz)  $ungz "$arg"
             arg="$(echo $arg | sed 's/\.gz$//')"
             gzlist="$gzlist \"$arg\""
             ;;
      *.bz2) $unbz "$arg"
             arg="$(echo $arg | sed 's/\.bz2$//')"
             bzlist="$bzlist \"$arg\""
             ;;
    esac
  fi
  newargs="${newargs:-""} \"$arg\""
done

case $0 in
  *zcat*  ) eval cat $newargs  ;;
  *zmore* ) eval more $newargs ;;
  *zgrep* ) eval grep $newargs ;;
  *       ) echo "$0: unknown base name. Can't proceed." >&2
            exit 1
esac

# Now recompress everything.

if [ ! -z "$Zlist" ] ; then
➊ eval $Z $Zlist
fi
if [ ! -z "$gzlist" ] ; then
➋ eval $gz $gzlist
fi
if [ ! -z "$bzlist" ] ; then
➌ eval $bz $bzlist
fi

# And done!

exit 0
Listing 4-15: The zcat/zmore/zgrep script
For any given suffix, three steps are necessary: decompress the file, rename the filename to remove the suffix, and add it to the list of files to recompress at the end of the script. By keeping three separate lists, one for each compression program, this script also lets you easily grep across files compressed using different compression utilities.
The most important trick is the use of the eval directive when recompressing the files ➊➋➌. This is necessary to ensure that filenames with spaces are treated properly. When the Zlist, gzlist, and bzlist variables are instantiated, each argument is surrounded by quotes, so a typical value might be "sample.c" "test.pl" "penny.jar". Because the list has nested quotes, invoking a command like cat $Zlist causes cat to complain that file "sample.c" wasn’t found. To force the shell to act as if the command were typed at a command line (where the quotes are stripped once they have been utilized for argument parsing), use eval, and all will work as desired.
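A quick standalone demonstration of why eval matters here (a sketch assuming /tmp is writable):

```shell
# Build a quoted list the same way the script does, for a filename
# containing a space, then let eval re-parse the whole command line.
fname="/tmp/has space.$$"
printf 'hello\n' > "$fname"
list="\"$fname\""
out="$(eval cat $list)"    # Without eval, cat would see two bogus names.
rm -f "$fname"
```

The eval pass strips the embedded quotes during re-parsing, so cat receives the single space-containing filename intact.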
To work properly, this script should have three names. How do you do that in Linux? Simple: links. You can use either symbolic links, which are special files that store the names of link destinations, or hard links, which are actually assigned the same inode as the linked file. We prefer symbolic links. These can easily be created (here the script is already called zcat), as Listing 4-16 shows.
$ ln -s zcat zmore
$ ln -s zcat zgrep
Listing 4-16: Symbolically linking the zcat script to the zmore and zgrep commands
Once that’s done, you have three new commands that have the same actual (shared) contents, and each accepts a list of files to process as needed, decompressing and then recompressing them when done.
The ubiquitous compress utility quickly shrinks down ragged.txt and gives it a .Z suffix:
$ compress ragged.txt
With ragged.txt in its compressed state, we can view the file with zcat, as Listing 4-17 details.
$ zcat ragged.txt.Z
So she sat on, with closed eyes, and half believed herself in
Wonderland, though she knew she had but to open them again, and
all would change to dull reality--the grass would be only rustling
in the wind, and the pool rippling to the waving of the reeds--the
rattling teacups would change to tinkling sheep-bells, and the
Queen's shrill cries to the voice of the shepherd boy--and the
sneeze of the baby, the shriek of the Gryphon, and all the other
queer noises, would change (she knew) to the confused clamour of
the busy farm-yard--while the lowing of the cattle in the distance
would take the place of the Mock Turtle's heavy sobs.
Listing 4-17: Using zcat to print the compressed text file
And then we can search for teacup again:
$ zgrep teacup ragged.txt.Z
rattling teacups would change to tinkling sheep-bells, and the
All the while, the file starts and ends in its original compressed state, shown in Listing 4-18.
$ ls -l ragged.txt*
-rw-r--r-- 1 taylor staff 443 Jul 7 16:07 ragged.txt.Z
Listing 4-18: The results of ls, showing only that the compressed file exists
Probably the biggest weakness of this script is that if it is canceled in midstream, the file isn’t guaranteed to recompress. A nice addition would be to fix this with a smart use of the trap capability and a recompress function that does error checking.
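A sketch of that addition (hypothetical; gzip stands in here for whichever compressor applies, and error checking is kept minimal):

```shell
# Recompress any listed files still sitting around decompressed;
# silently skip files that are already gone or already restored.
recompress_pending() {
    for f in "$@"
    do
        if [ -f "$f" ] ; then
            gzip "$f" || echo "recompress of $f failed" >&2
        fi
    done
}
# In the real script this would be registered with trap, e.g.:
#   trap 'recompress_pending $pendingfiles' EXIT INT
```

With the handler registered for both EXIT and INT, even a CTRL-C partway through leaves the files compressed.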
As highlighted in Script #33 on page 109, most Linux implementations include more than one compression method, but the onus is on the user to figure out which one does the best job of compressing a given file. As a result, users typically learn how to work with just one compression program without realizing that they could attain better results with a different one. Even more confusing is the fact that some files compress better with one algorithm than with another, and there’s no way to know which is better without experimentation.
The logical solution is to have a script that compresses files using each of the tools and then selects the smallest resultant file as the best. That’s exactly what bestcompress does, shown in Listing 4-19!
#!/bin/bash
# bestcompress--Given a file, tries compressing it with all the available
#   compression tools and keeps the compressed file that's smallest,
#   reporting the result to the user. If -a isn't specified, bestcompress
#   skips compressed files in the input stream.

Z="compress" gz="gzip" bz="bzip2"
Zout="/tmp/bestcompress.$$.Z"
gzout="/tmp/bestcompress.$$.gz"
bzout="/tmp/bestcompress.$$.bz"
skipcompressed=1

if [ "$1" = "-a" ] ; then
  skipcompressed=0 ; shift
fi

if [ $# -eq 0 ]; then
  echo "Usage: $0 [-a] file or files to optimally compress" >&2
  exit 1
fi

trap "/bin/rm -f $Zout $gzout $bzout" EXIT

for name in "$@"
do
  if [ ! -f "$name" ] ; then
    echo "$0: file $name not found. Skipped." >&2
    continue
  fi

  if [ "$(echo $name | egrep '(\.Z$|\.gz$|\.bz2$)')" != "" ] ; then
    if [ $skipcompressed -eq 1 ] ; then
      echo "Skipped file ${name}: It's already compressed."
      continue
    else
      echo "Warning: Trying to double-compress $name"
    fi
  fi

  # Try compressing all three files in parallel.
➊ $Z < "$name" > $Zout &
  $gz < "$name" > $gzout &
  $bz < "$name" > $bzout &

  wait  # Wait until all compressions are done.

  # Figure out which compressed best.
➋ smallest="$(ls -l "$name" $Zout $gzout $bzout | \
    awk '{print $5"="NR}' | sort -n | cut -d= -f2 | head -1)"

  case "$smallest" in
➌   1 ) echo "No space savings by compressing $name. Left as is."
        ;;
    2 ) echo Best compression is with compress. File renamed ${name}.Z
        mv $Zout "${name}.Z" ; rm -f "$name"
        ;;
    3 ) echo Best compression is with gzip. File renamed ${name}.gz
        mv $gzout "${name}.gz" ; rm -f "$name"
        ;;
    4 ) echo Best compression is with bzip2. File renamed ${name}.bz2
        mv $bzout "${name}.bz2" ; rm -f "$name"
  esac
done

exit 0
Listing 4-19: The bestcompress script
The most interesting line in this script is at ➋. This line has ls output the size of each file (the original and the three compressed files, in a known order), chops out just the file sizes with awk, sorts these numerically, and ends up with the line number of the smallest resultant file. If the compressed versions are all bigger than the original file, the result is 1, and an appropriate message is printed out ➌. Otherwise, smallest will indicate which of compress, gzip, or bzip2 did the best job. Then it’s just a matter of moving the appropriate file into the current directory and removing the original file.
The three compression calls starting at ➊ are also worth pointing out. These calls are done in parallel by using the trailing & to drop each of them into its own subshell, followed by the call to wait, which stops the script until all the calls are completed. On a uniprocessor, this might not offer much performance benefit, but with multiple processors, it should spread the task out and potentially complete quite a bit faster.
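The run-in-parallel pattern is worth seeing in isolation (a minimal sketch with stand-in jobs):

```shell
# Launch two independent jobs in the background with &, then block
# with wait until both have finished before reading their results.
out1="/tmp/par1.$$"
out2="/tmp/par2.$$"
( echo one > "$out1" ) &
( echo two > "$out2" ) &
wait    # Returns only after every background job has exited.
```

Only after wait returns is it safe to inspect out1 and out2, just as bestcompress inspects the three compressed temp files only after its wait.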
This script should be invoked with a list of filenames to compress. If some of them are already compressed and you want to try compressing them further, use the -a flag; otherwise they’ll be skipped.
The best way to demonstrate this script is with a file that needs to be compressed, as Listing 4-20 shows.
$ ls -l alice.txt
-rw-r--r-- 1 taylor staff 154872 Dec 4 2002 alice.txt
Listing 4-20: Showing the ls output of a copy of Alice in Wonderland. Note the file size of 154872 bytes.
The script hides the process of compressing the file with each of the three compression tools and instead simply displays the results, shown in Listing 4-21.
$ bestcompress alice.txt
Best compression is with compress. File renamed alice.txt.Z
Listing 4-21: Running the bestcompress script on alice.txt
Listing 4-22 demonstrates that the file is now quite a bit smaller.
$ ls -l alice.txt.Z
-rw-r--r-- 1 taylor wheel 66287 Jul 7 17:31 alice.txt.Z
Listing 4-22: Demonstrating the much-reduced file size of the compressed file (66287 bytes) compared to Listing 4-20