4
TWEAKING UNIX


An outsider might imagine Unix as a nice, uniform command line experience across many different systems, helped by their compliance with the POSIX standards. But anyone who’s ever used more than one Unix system knows how much they can vary within these broad parameters. You’d be hard-pressed to find a Unix or Linux box that doesn’t have ls as a standard command, for example, but does your version support the --color flag? Does your version of the Bourne shell support variable slicing (like ${var:0:2})?

Perhaps one of the most valuable uses of shell scripts is tweaking your particular flavor of Unix to make it more like other systems. Although most modern GNU utilities run just fine on non-Linux Unixes (for example, you can replace clunky old tar with the newer GNU tar), often the system updates involved in tweaking Unix don’t need to be so drastic, and it’s possible to avoid the potential problems inherent in adding new binaries to a supported system. Instead, shell scripts can be used to map popular flags to their local equivalents, to use core Unix capabilities to create a smarter version of an existing command, or even to address the longtime lack of certain functionality.

#27 Displaying a File with Line Numbers

There are several ways to add line numbers to a displayed file, many of which are quite short. For example, here’s one solution using awk:

awk '{ print NR": "$0 }' < inputfile

On some Unix implementations, the cat command has an -n flag, and on others, the more (or less, or pg) pager has a flag for specifying that each line of output should be numbered. But on some Unix flavors, none of these methods will work, in which case the simple script in Listing 4-1 can do the job.
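Where those flags are available, a quick sanity check looks something like the following. Note that nl is specified by POSIX, so it's often the most portable of the bunch:

```shell
# Quick ways to number lines where the commands exist; nl is POSIX
# and usually the most portable option.
printf 'alpha\nbeta\n' > /tmp/numdemo.$$
cat -n /tmp/numdemo.$$       # numbers every line on BSD and GNU cat
nl -ba /tmp/numdemo.$$       # -ba numbers blank lines as well
rm -f /tmp/numdemo.$$
```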

The Code

   #!/bin/bash

   # numberlines--A simple alternative to cat -n, etc.

   for filename in "$@"
   do
     linecount=1
     while IFS= read -r line
     do
       echo "${linecount}: $line"
       linecount="$(( linecount + 1 ))"
     done < "$filename"
   done
   exit 0

Listing 4-1: The numberlines script

How It Works

There’s a trick to the main loop in this program: it looks like a regular while loop, but the important part is actually the done < $filename at the end. It turns out that every major block construct acts as its own virtual subshell, so this file redirection is not only valid but also an easy way to have a loop that iterates line by line through the contents of $filename. Couple that with the read statement, an inner loop that loads each line, iteration by iteration, into the line variable, and it’s then easy to output the line with its line number as a preface and increment the linecount variable.

Running the Script

You can feed as many filenames as you want into this script. You can’t feed it input via a pipe, though that wouldn’t be too hard to fix by invoking a cat - sequence if no starting parameters are given.
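That fix is small enough to sketch here. The fallback value - is the conventional name for standard input, which cat understands; the function name is ours, not part of the original script:

```shell
# Sketch: fall back to "-" (standard input) when no filenames are
# given, so the numbering loop works in a pipeline too.
numberlines_stdin()
{
  for filename in "${@:--}"       # "-" substitutes only if no args given
  do
    linecount=1
    cat -- "$filename" | while IFS= read -r line
    do
      echo "${linecount}: $line"
      linecount="$(( linecount + 1 ))"
    done
  done
}
```

A call like `printf 'a\nb\n' | numberlines_stdin` then behaves just as the script does on a named file.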

The Results

Listing 4-2 shows a file displayed with line numbers using the numberlines script.

$ numberlines alice.txt
1: Alice was beginning to get very tired of sitting by her sister on the
2: bank, and of having nothing to do: once or twice she had peeped into the
3: book her sister was reading, but it had no pictures or conversations in
4: it, 'and what is the use of a book,' thought Alice 'without pictures or
5: conversations?'
6:
7: So she was considering in her own mind (as well as she could, for the
8: hot day made her feel very sleepy and stupid), whether the pleasure
9: of making a daisy-chain would be worth the trouble of getting up and
10: picking the daisies, when suddenly a White Rabbit with pink eyes ran
11: close by her.

Listing 4-2: Testing the numberlines script on an excerpt from Alice in Wonderland

Hacking the Script

Once you have a file with numbered lines, you can reverse the order of all the lines in the file, like this:

cat -n filename | sort -rn | cut -c8-

This does the trick on systems supporting the -n flag to cat, for example. Where might this be useful? One obvious situation is when displaying a log file in newest-to-oldest order.
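If your system has GNU coreutils, tac does the reversal directly, and an awk one-liner covers systems with neither tac nor cat -n. Both are worth knowing alongside the pipeline above:

```shell
# Two other ways to reverse a file's lines: GNU tac, and a portable
# awk fallback that buffers the file and prints it back to front.
printf 'first\nsecond\nthird\n' > /tmp/revdemo.$$
tac /tmp/revdemo.$$ 2>/dev/null || true   # GNU coreutils only
awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }' /tmp/revdemo.$$
rm -f /tmp/revdemo.$$
```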

#28 Wrapping Only Long Lines

One limitation of the fmt command and its shell script equivalent, Script #14 on page 53, is that they wrap and fill every line they encounter, whether or not it makes sense to do so. This can mess up email (wrapping your .signature is not good, for example) and any input file format where line breaks matter.

What if you have a document in which you want to wrap just the long lines but leave everything else intact? With the default set of commands available to a Unix user, there’s only one way to accomplish this: explicitly step through each line in an editor, feeding the long ones to fmt individually. (You could accomplish this in vi by moving the cursor onto the line in question and using !$fmt.)

The script in Listing 4-3 automates that task, making use of the shell ${#varname} construct, which returns the length of the contents of the data stored in the variable varname.
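As a quick illustration of that construct:

```shell
# ${#varname} returns the length of the variable's contents.
var="hello world"
echo "${#var}"    # prints 11
```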

The Code

   #!/bin/bash
   # toolong--Feeds the fmt command only those lines in the input stream
   #   that are longer than the specified length

   width=72

   if [ ! -r "$1" ] ; then
     echo "Cannot read file $1" >&2
     echo "Usage: $0 filename" >&2
     exit 1
   fi

    while read input
    do
      if [ ${#input} -gt $width ] ; then
        echo "$input" | fmt
      else
        echo "$input"
      fi
    done < "$1"

   exit 0

Listing 4-3: The toolong script

How It Works

Notice that the file is fed to the while loop with a simple < $1 associated with the end of the loop, and that each line can then be analyzed by the read input statement, which assigns each line of the file to the input variable, line by line.

If your shell doesn’t have the ${#var} notation, you can emulate its behavior with the super useful “word count” command wc. Note the use of printf rather than echo, so that no trailing newline gets included in the count:

varlength="$(printf '%s' "$var" | wc -c)"

However, wc has an annoying habit of prefacing its output with spaces to get values to align nicely in the output listing. To sidestep that pesky problem, a slight modification is necessary to let only digits through the final pipe step, as shown here:

varlength="$(printf '%s' "$var" | wc -c | sed 's/[^[:digit:]]//g')"

Running the Script

This script accepts exactly one filename as input, as Listing 4-4 shows.

The Results

$ toolong ragged.txt
So she sat on, with closed eyes, and half believed herself in
Wonderland, though she knew she had but to open them again, and
all would change to dull reality--the grass would be only rustling
in the wind, and the pool rippling to the waving of the reeds--the
rattling teacups would change to tinkling sheep-bells, and the
Queen's shrill cries to the voice of the shepherd boy--and the
sneeze
of the baby, the shriek of the Gryphon, and all the other queer
noises, would change (she knew) to the confused clamour of the busy
farm-yard--while the lowing of the cattle in the distance would
take the place of the Mock Turtle's heavy sobs.

Listing 4-4: Testing the toolong script

Notice that unlike a standard invocation of fmt, toolong has retained line breaks where possible, so the word sneeze, which is on a line by itself in the input file, is also on a line by itself in the output.

#29 Displaying a File with Additional Information

Many of the most common Unix and Linux commands were originally designed for slow, barely interactive output environments (we did talk about Unix being an ancient OS, right?) and therefore offer minimal output and interactivity. An example is cat: when used to view a short file, it doesn’t give much helpful output. It would be nice to have more information about the file, though, so let’s get it! Listing 4-5 details the showfile command, an alternative to cat.

The Code

   #!/bin/bash
   # showfile--Shows the contents of a file, including additional useful info

   width=72

   for input
   do
     lines="$(wc -l < "$input" | sed 's/ //g')"
     chars="$(wc -c < "$input" | sed 's/ //g')"
     owner="$(ls -ld "$input" | awk '{print $3}')"
     echo "-----------------------------------------------------------------"
     echo "File $input ($lines lines, $chars characters, owned by $owner):"
     echo "-----------------------------------------------------------------"
     while read line
     do
       if [ ${#line} -gt $width ] ; then
         echo "$line" | fmt | sed -e '1s/^/  /' -e '2,$s/^/+ /'
       else
         echo "  $line"
       fi
     done < "$input"

     echo "-----------------------------------------------------------------"

   done | ${PAGER:-more}

   exit 0

Listing 4-5: The showfile script

How It Works

To simultaneously read the input line by line and add head and foot information, this script uses a handy shell trick: near the end of the script, it redirects the input to the while loop with the snippet done < $input. Perhaps the most complex element in this script, however, is the invocation of sed for lines longer than the specified length:

echo "$line" | fmt | sed -e '1s/^/  /' -e '2,$s/^/+ /'

Lines greater than the maximum allowable length are wrapped with fmt (or its shell script replacement, Script #14 on page 53). To visually denote which lines are continuations and which are retained intact from the original file, the first output line of an overly long line has the usual two-space indent, but subsequent lines are prefixed with a plus sign and a single space instead. Finally, piping the output into ${PAGER:-more} displays the file with the pagination program set in the system variable $PAGER or, if that’s not set, the more program.
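The ${varname:-default} expansion used in that final pipe is worth a quick demonstration:

```shell
# ${varname:-default} substitutes a default when the variable is
# unset or empty, and the variable's own value otherwise.
unset PAGER
echo "${PAGER:-more}"    # prints "more"
PAGER=less
echo "${PAGER:-more}"    # prints "less"
```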

Running the Script

You can run showfile by specifying one or more filenames when the program is invoked, as Listing 4-6 shows.

The Results

$ showfile ragged.txt
-----------------------------------------------------------------
File ragged.txt (7 lines, 639 characters, owned by taylor):
-----------------------------------------------------------------
  So she sat on, with closed eyes, and half believed herself in
  Wonderland, though she knew she had but to open them again, and
  all would change to dull reality--the grass would be only rustling
+ in the wind, and the pool rippling to the waving of the reeds--the
  rattling teacups would change to tinkling sheep-bells, and the
  Queen's shrill cries to the voice of the shepherd boy--and the
  sneeze
  of the baby, the shriek of the Gryphon, and all the other queer
+ noises, would change (she knew) to the confused clamour of the busy
+ farm-yard--while the lowing of the cattle in the distance would
+ take the place of the Mock Turtle's heavy sobs.

Listing 4-6: Testing the showfile script

#30 Emulating GNU-Style Flags with quota

The inconsistency across the command flags of various Unix and Linux systems is a perpetual problem that causes lots of grief for users who switch between any of the major releases, particularly between a commercial Unix system (SunOS/Solaris, HP-UX, and so on) and an open source Linux system. One command that demonstrates this problem is quota, which supports full-word flags on some Unix systems but accepts only one-letter flags on others.

A succinct shell script (shown in Listing 4-7) solves the problem by mapping any full-word flags specified to the equivalent single-letter alternatives.

The Code

   #!/bin/bash
   # newquota--A frontend to quota that works with full-word flags a la GNU

   # quota has three possible flags, -g, -v, and -q, but this script
   #   allows them to be '--group', '--verbose', and '--quiet' too.

   flags=""
   realquota="$(which quota)"

   while [ $# -gt 0 ]
   do
     case $1
     in
       --help)      echo "Usage: $0 [--group --verbose --quiet -gvq]" >&2
                          exit 1 ;;
       --group)     flags="$flags -g";   shift ;;
       --verbose)   flags="$flags -v";   shift ;;
       --quiet)     flags="$flags -q";   shift ;;
       --)          shift;               break ;;
       *)           break;          # Done with 'while' loop!
     esac

   done

   exec $realquota $flags "$@"

Listing 4-7: The newquota script

How It Works

This script really boils down to a while statement that steps through every argument specified to the script, identifying any of the matching full-word flags and adding the associated one-letter flag to the flags variable. When done, it simply invokes the original quota program and adds the user-specified flags as needed.
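The same while/case mapping pattern works for any command whose flags you’d like to modernize. Here’s the skeleton in miniature, with hypothetical --verbose and --quiet mappings and the function name invented for the demo:

```shell
# Skeleton of the long-to-short flag mapping: recognized long flags
# accumulate in $flags; anything unrecognized ends the loop and is
# passed through untouched.
mapflags()
{
  flags=""
  while [ $# -gt 0 ]
  do
    case "$1" in
      --verbose) flags="$flags -v"; shift ;;
      --quiet)   flags="$flags -q"; shift ;;
      *)         break ;;
    esac
  done
  echo "$flags" "$@"
}
```

A real wrapper would hand the result to exec rather than echo it, exactly as newquota does.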

Running the Script

There are a couple of ways to integrate a wrapper of this nature into your system. The most obvious is to rename this script quota, place it in a local directory (say, /usr/local/bin), and ensure that users have a default PATH that looks in this directory before looking in the standard Linux binary distro directories (/bin and /usr/bin). Another way is to add system-wide aliases so that a user entering quota actually invokes the newquota script. (Some Linux distros ship with utilities for managing system aliases, such as Debian’s alternatives system.) The latter strategy is riskier, however, if users call quota with the new flags in their own shell scripts: if those scripts don’t use the user’s interactive login shell, they might not see the specified alias and will end up calling the base quota command rather than newquota.

The Results

Listing 4-8 details running newquota with the --verbose and --quiet arguments.

$ newquota --verbose
Disk quotas for user dtint (uid 24810):
     Filesystem   usage   quota   limit   grace   files   quota   limit   grace
           /usr  338262  614400  675840           10703  120000  126000
$ newquota --quiet

Listing 4-8: Testing the newquota script

The --quiet mode emits output only if the user is over quota. You can see that this is working correctly from the last result, where we’re not over quota. Phew!

#31 Making sftp Look More Like ftp

The secure version of the File Transfer Protocol ftp program is included as part of ssh, the Secure Shell package, but its interface can be a bit confusing for users who are making the switch from the crusty old ftp client. The basic problem is that ftp is invoked as ftp remotehost and it then prompts for account and password information. By contrast, sftp wants to know the account and remote host on the command line and won’t work properly (or as expected) if only the host is specified.

To address this, the simple wrapper script detailed in Listing 4-9 allows users to invoke mysftp exactly as they would have invoked the ftp program and be prompted for the necessary fields.

The Code

   #!/bin/bash

   # mysftp--Makes sftp start up more like ftp

   /bin/echo -n "User account: "
   read account

    if [ -z "$account" ] ; then
     exit 0;       # Changed their mind, presumably
   fi

   if [ -z "$1" ] ; then
     /bin/echo -n "Remote host: "
     read host
      if [ -z "$host" ] ; then
       exit 0
     fi
   else
     host=$1
   fi

   # End by switching to sftp. The -C flag enables compression here.

    exec sftp -C "$account@$host"

Listing 4-9: The mysftp script, a friendlier version of sftp

How It Works

There’s a trick in this script worth mentioning. It’s actually something we’ve done in previous scripts, though we haven’t highlighted it before: the last line is an exec call. What this does is replace the currently running shell with the application specified. Because you know there’s nothing left to do after calling the sftp command, this method of ending our script is much more resource efficient than having the shell hang around in a separate process waiting for sftp to finish, which is what would happen if we just invoked sftp directly.
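You can see the replacement behavior directly: once exec runs, nothing else in the calling shell ever executes.

```shell
# exec replaces the running shell with the named program, so the
# second echo is never reached.
bash -c 'exec echo "replaced the shell"; echo "never printed"'
```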

Running the Script

As with the ftp client, if users omit the remote host, the script continues by prompting for a remote host. If the script is invoked as mysftp remotehost, the remotehost provided is used instead.

The Results

Let’s see what happens when you invoke this script without any arguments versus invoking sftp without any arguments. Listing 4-10 shows running sftp.

$ sftp
usage: sftp [-1246Cpqrv] [-B buffer_size] [-b batchfile] [-c cipher]
          [-D sftp_server_path] [-F ssh_config] [-i identity_file] [-l limit]
          [-o ssh_option] [-P port] [-R num_requests] [-S program]
          [-s subsystem | sftp_server] host
       sftp [user@]host[:file ...]
       sftp [user@]host[:dir[/]]
       sftp -b batchfile [user@]host

Listing 4-10: Running the sftp utility with no arguments yields very cryptic help output.

That’s useful but confusing. By contrast, with the mysftp script you can proceed to make an actual connection, as Listing 4-11 shows.

$ mysftp
User account: taylor
Remote host: intuitive.com
Connecting to intuitive.com...
taylor@intuitive.com's password:
sftp> quit

Listing 4-11: Running the mysftp script with no arguments is much clearer.

Invoke the script as if it were an ftp session by supplying the remote host, and it’ll prompt for the remote account name (detailed in Listing 4-12) and then invisibly invoke sftp.

$ mysftp intuitive.com
User account: taylor
Connecting to intuitive.com...
taylor@intuitive.com's password:
sftp> quit

Listing 4-12: Running the mysftp script with a single argument: the host to connect to

Hacking the Script

One thing to always think about when you have a script like this is whether it can be the basis of an automated backup or sync tool, and mysftp is a perfect candidate. So a great hack would be to designate a directory on your system, for example, then write a wrapper that would create a ZIP archive of key files, and use mysftp to copy them up to a server or cloud storage system. In fact, we’ll do just that later in the book with Script #72 on page 229.

#32 Fixing grep

Some versions of grep offer a remarkable range of capabilities, including the particularly useful ability to show the context (a line or two above and below) of a matching line in the file. Additionally, some versions of grep can highlight the region in the line (for simple patterns, at least) that matches the specified pattern. You might already have such a version of grep. Then again, you might not.

Fortunately, both of these features can be emulated with a shell script, so you can still use them even if you’re on an older commercial Unix system with a relatively primitive grep command. To specify the number of lines of context both above and below the line matching the pattern that you specified, use -c value, followed by the pattern to match. This script (shown in Listing 4-13) also borrows from the ANSI color script, Script #11 on page 40, to do region highlighting.

The Code

   #!/bin/bash

   # cgrep--grep with context display and highlighted pattern matches

   context=0
   esc="$(printf '\033')"     # The literal ESC character the ANSI sequences need
   boldon="${esc}[1m" boldoff="${esc}[22m"
   sedscript="/tmp/cgrep.sed.$$"
   tempout="/tmp/cgrep.$$"

   function showMatches
   {
     matches=0

     echo "s/$pattern/${boldon}$pattern${boldoff}/g" > $sedscript

     for lineno in $(grep -n "$pattern" "$1" | cut -d: -f1)
     do
       if [ $context -gt 0 ] ; then
         prev="$(( $lineno - $context ))"

         if [ $prev -lt 1 ] ; then
           prev="1"     # A smaller value causes "invalid usage of line address 0."
         fi
         next="$(( $lineno + $context ))"

         if [ $matches -gt 0 ] ; then
           echo "${prev}i\\" >> $sedscript
           echo "----" >> $sedscript
         fi
         echo "${prev},${next}p" >> $sedscript
       else
         echo "${lineno}p" >> $sedscript
       fi
       matches="$(( $matches + 1 ))"
     done

     if [ $matches -gt 0 ] ; then
       sed -n -f $sedscript "$1" | uniq | more
     fi
   }

   trap "$(which rm) -f $tempout $sedscript" EXIT

   if [ -z "$1" ] ; then
     echo "Usage: $0 [-c X] pattern {filename}" >&2
     exit 0
   fi

   if [ "$1" = "-c" ] ; then
     context="$2"
     shift; shift
   elif [ "$(echo $1|cut -c1-2)" = "-c" ] ; then
     context="$(echo $1 | cut -c3-)"
     shift
   fi

   pattern="$1"; shift

   if [ $# -gt 0 ] ; then
     for filename ; do
       echo "----- $filename -----"
       showMatches "$filename"
     done
   else
     cat - > $tempout      # Save stream to a temp file.
     showMatches $tempout
   fi

   exit 0

Listing 4-13: The cgrep script

How It Works

This script uses grep -n to get the line numbers of all matching lines in the file and then, using the specified number of lines of context to include, identifies a starting and ending line for displaying each match. These are written out to the temporary sed script defined at the top of the showMatches function, which also performs a word substitution that wraps the specified pattern in bold-on and bold-off ANSI sequences. That’s 90 percent of the script, in a nutshell.

The other thing worth mentioning in this script is the useful trap command, which lets you tie events into the shell’s script execution system itself. The first argument is the command or sequence of commands you want invoked, and all subsequent arguments are the specific signals (events). In this case, we’re telling the shell that when the script exits, it should invoke rm to remove the two temp files.

What’s particularly nice about working with trap is that it works regardless of where you exit the script, not just at the very bottom. In subsequent scripts, you’ll see that trap can be tied to a wide variety of signals, not just EXIT (also written as SIGEXIT, or as its numeric equivalent, 0). In fact, you can have different trap commands associated with different signals, so you might output a “cleaned-up temp files” message if someone sends a SIGINT (CTRL-C) to a script, while that wouldn’t be displayed on a regular (EXIT) event.
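Here’s the EXIT behavior in isolation. The child shell below never calls rm in its normal flow, yet its temp file is gone by the time it terminates:

```shell
# An EXIT trap fires however the script terminates, so cleanup code
# registered with it runs even without an explicit call at the end.
tmpfile="$(mktemp)"
bash -c "trap 'rm -f $tmpfile' EXIT; echo \"working with $tmpfile\""
[ -f "$tmpfile" ] || echo "temp file already cleaned up"
```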

Running the Script

This script works either with an input stream, in which case it saves the input to a temp file and then processes the temp file as if its name had been specified on the command line, or with a list of one or more files on the command line. Listing 4-14 shows passing a single file via the command line.

The Results

$ cgrep -c 1 teacup ragged.txt
----- ragged.txt -----
in the wind, and the pool rippling to the waving of the reeds--the
rattling teacups would change to tinkling sheep-bells, and the
Queen's shrill cries to the voice of the shepherd boy--and the

Listing 4-14: Testing the cgrep script

Hacking the Script

A useful refinement to this script would return line numbers along with the matched lines.

#33 Working with Compressed Files

Throughout the years of Unix development, few programs have been reconsidered and redeveloped more times than compress. On most Linux systems, three significantly different compression programs are available: compress, gzip, and bzip2. Each uses a different suffix (.Z, .gz, and .bz2, respectively), and the degree of compression can vary among the three programs, depending on the layout of data within a file.

Regardless of the level of compression, and regardless of which compression programs you have installed, working with compressed files on many Unix systems requires decompressing them by hand, accomplishing the desired tasks, and recompressing them when finished. Tedious, and thus a perfect job for a shell script! The script detailed in Listing 4-15 acts as a convenient compression/decompression wrapper for three functions you’ll often find yourself wanting to use on compressed files: cat, more, and grep.

The Code

   #!/bin/bash

   # zcat, zmore, and zgrep--This script should be either symbolically
   #   linked or hard linked to all three names. It allows users to work with
   #   compressed files transparently.

    Z="compress";  unZ="uncompress"  ;  Zlist=""
   gz="gzip"    ; ungz="gunzip"      ; gzlist=""
   bz="bzip2"   ; unbz="bunzip2"     ; bzlist=""

   # First step is to try to isolate the filenames in the command line.
   #   We'll do this lazily by stepping through each argument, testing to
   #   see whether it's a filename. If it is and it has a compression
   #   suffix, we'll decompress the file, rewrite the filename, and proceed.
   #   When done, we'll recompress everything that was decompressed.

   for arg
   do
     if [ -f "$arg" ] ; then
       case "$arg" in
          *.Z) $unZ "$arg"
               arg="$(echo "$arg" | sed 's/\.Z$//')"
               Zlist="$Zlist \"$arg\""
               ;;

         *.gz) $ungz "$arg"
               arg="$(echo "$arg" | sed 's/\.gz$//')"
               gzlist="$gzlist \"$arg\""
               ;;

        *.bz2) $unbz "$arg"
               arg="$(echo "$arg" | sed 's/\.bz2$//')"
               bzlist="$bzlist \"$arg\""
               ;;
       esac
     fi
     newargs="${newargs:-""} \"$arg\""
   done

   case $0 in
      *zcat* ) eval cat $newargs                   ;;
     *zmore* ) eval more $newargs                  ;;
     *zgrep* ) eval grep $newargs                  ;;
           * ) echo "$0: unknown base name. Can't proceed." >&2
               exit 1
   esac

   # Now recompress everything.

    if [ ! -z "$Zlist" ] ; then
      eval $Z $Zlist
    fi
    if [ ! -z "$gzlist" ] ; then
      eval $gz $gzlist
    fi
    if [ ! -z "$bzlist" ] ; then
      eval $bz $bzlist
    fi

   # And done!

   exit 0

Listing 4-15: The zcat/zmore/zgrep script

How It Works

For any given suffix, three steps are necessary: decompress the file, rename the filename to remove the suffix, and add it to the list of files to recompress at the end of the script. By keeping three separate lists, one for each compression program, this script also lets you easily grep across files compressed using different compression utilities.

The most important trick is the use of the eval directive when recompressing the files. This is necessary to ensure that filenames with spaces are treated properly. When the Zlist, gzlist, and bzlist variables are instantiated, each argument is surrounded by quotes, so a typical value might be "sample.c" "test.pl" "penny.jar". Because the list has nested quotes, invoking a command like cat $Zlist results in cat complaining that file "sample.c" wasn’t found. To force the shell to act as if the command were typed at a command line (where the quotes are stripped once they have been utilized for argument parsing), use eval, and all will work as desired.
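A tiny experiment makes the quoting problem concrete (run in a scratch directory; the filename is invented for the demo):

```shell
# Without eval, the embedded quotes travel to ls as literal characters,
# so it looks for files named "two and words.txt" and fails. With eval,
# the shell reparses the line and the quotes group the words correctly.
cd "$(mktemp -d)"
touch "two words.txt"
filelist="\"two words.txt\""
ls $filelist 2>/dev/null || echo "plain ls fails"
eval ls $filelist
```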

Running the Script

To work properly, this script should have three names. How do you do that in Linux? Simple: links. You can use either symbolic links, which are special files that store the names of link destinations, or hard links, which are actually assigned the same inode as the linked file. We prefer symbolic links. These can easily be created (here the script is already called zcat), as Listing 4-16 shows.

$ ln -s zcat zmore
$ ln -s zcat zgrep

Listing 4-16: Symbolically linking the zcat script to the zmore and zgrep commands

Once that’s done, you have three new commands that have the same actual (shared) contents, and each accepts a list of files to process as needed, decompressing and then recompressing them when done.

The Results

The ubiquitous compress utility quickly shrinks down ragged.txt and gives it a .Z suffix:

$ compress ragged.txt

With ragged.txt in its compressed state, we can view the file with zcat, as Listing 4-17 details.

$ zcat ragged.txt.Z
So she sat on, with closed eyes, and half believed herself in
Wonderland, though she knew she had but to open them again, and
all would change to dull reality--the grass would be only rustling
in the wind, and the pool rippling to the waving of the reeds--the
rattling teacups would change to tinkling sheep-bells, and the
Queen's shrill cries to the voice of the shepherd boy--and the
sneeze of the baby, the shriek of the Gryphon, and all the other
queer noises, would change (she knew) to the confused clamour of
the busy farm-yard--while the lowing of the cattle in the distance
would take the place of the Mock Turtle's heavy sobs.

Listing 4-17: Using zcat to print the compressed text file

And then search for teacup again.

$ zgrep teacup ragged.txt.Z
rattling teacups would change to tinkling sheep-bells, and the

All the while, the file starts and ends in its original compressed state, shown in Listing 4-18.

$ ls -l ragged.txt*
-rw-r--r-- 1 taylor staff 443 Jul 7 16:07 ragged.txt.Z

Listing 4-18: The results of ls, showing only that the compressed file exists

Hacking the Script

Probably the biggest weakness of this script is that if it is canceled in midstream, the file isn’t guaranteed to recompress. A nice addition would be to fix this with a smart use of the trap capability and a recompress function that does error checking.
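One way to sketch that fix, reusing the script’s own variable names (the exact signal list is a judgment call, and the function name is ours):

```shell
# Sketch: register recompression for EXIT, INT (CTRL-C), and HUP so
# an interrupted run still restores the files. The Z/gz/bz commands
# and the *list variables mirror the ones in the script above.
Z="compress"  ; Zlist=""
gz="gzip"     ; gzlist=""
bz="bzip2"    ; bzlist=""

recompress()
{
  [ -n "$Zlist" ]  && eval "$Z"  $Zlist
  [ -n "$gzlist" ] && eval "$gz" $gzlist
  [ -n "$bzlist" ] && eval "$bz" $bzlist
  return 0
}

trap recompress EXIT INT HUP
```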

#34 Ensuring Maximally Compressed Files

As highlighted in Script #33 on page 109, most Linux implementations include more than one compression method, but the onus is on the user to figure out which one does the best job of compressing a given file. As a result, users typically learn how to work with just one compression program without realizing that they could attain better results with a different one. Even more confusing is the fact that some files compress better with one algorithm than with another, and there’s no way to know which is better without experimentation.

The logical solution is to have a script that compresses files using each of the tools and then selects the smallest resultant file as the best. That’s exactly what bestcompress does, shown in Listing 4-19!

The Code

   #!/bin/bash

   # bestcompress--Given a file, tries compressing it with all the available
   #   compression tools and keeps the compressed file that's smallest,
   #   reporting the result to the user. If -a isn't specified, bestcompress
   #   skips compressed files in the input stream.

   Z="compress"     gz="gzip"     bz="bzip2"
   Zout="/tmp/bestcompress.$$.Z"
   gzout="/tmp/bestcompress.$$.gz"
   bzout="/tmp/bestcompress.$$.bz"
   skipcompressed=1

   if [ "$1" = "-a" ] ; then
     skipcompressed=0 ; shift
   fi

   if [ $# -eq 0 ]; then
     echo "Usage: $0 [-a] file or files to optimally compress" >&2
     exit 1
   fi

   trap "/bin/rm -f $Zout $gzout $bzout" EXIT

   for name in "$@"
   do
     if [ ! -f "$name" ] ; then
       echo "$0: file $name not found. Skipped." >&2
       continue
     fi

     if [ "$(echo "$name" | egrep '(\.Z$|\.gz$|\.bz2$)')" != "" ] ; then
       if [ $skipcompressed -eq 1 ] ; then
         echo "Skipped file ${name}: It's already compressed."
         continue
       else
         echo "Warning: Trying to double-compress $name"
       fi
     fi

      # Try compressing all three files in parallel.
      $Z  < "$name" > $Zout  &
      $gz < "$name" > $gzout &
      $bz < "$name" > $bzout &

     wait  # Wait until all compressions are done.

      # Figure out which compressed best.
      smallest="$(ls -l "$name" $Zout $gzout $bzout | \
          awk '{print $5"="NR}' | sort -n | cut -d= -f2 | head -1)"

      case "$smallest" in
        1 ) echo "No space savings by compressing $name. Left as is."
           ;;
       2 ) echo Best compression is with compress. File renamed ${name}.Z
           mv $Zout "${name}.Z" ; rm -f "$name"
           ;;
       3 ) echo Best compression is with gzip. File renamed ${name}.gz
           mv $gzout "${name}.gz" ; rm -f "$name"
           ;;
       4 ) echo Best compression is with bzip2. File renamed ${name}.bz2
           mv $bzout "${name}.bz2" ; rm -f "$name"
     esac

   done

   exit 0

Listing 4-19: The bestcompress script

How It Works

The most interesting part of this script is the pipeline that computes smallest. It has ls output the size of each file (the original and the three compressed files, in a known order), chops out just the file sizes with awk, sorts these numerically, and ends up with the line number of the smallest resultant file. If the compressed versions are all bigger than the original file, the result is 1, and an appropriate message is printed out. Otherwise, smallest will indicate which of compress, gzip, or bzip2 did the best job. Then it’s just a matter of moving the appropriate file into the current directory and removing the original file.
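The pipeline is easier to follow in isolation. One caveat worth knowing: ls sorts its operands alphabetically, so the position-to-file mapping holds only when the listing order matches the order you expect. The hypothetical filenames below are chosen so alphabetical order and argument order agree:

```shell
# Pair each file's size with its listing position (NR), sort by size,
# and keep the position of the smallest file.
cd "$(mktemp -d)"
printf '%100s' ' ' > a.txt    # 100 bytes
printf '%5s'   ' ' > b.txt    # 5 bytes
printf '%50s'  ' ' > c.txt    # 50 bytes
smallest="$(ls -l a.txt b.txt c.txt | \
    awk '{print $5"="NR}' | sort -n | cut -d= -f2 | head -1)"
echo "$smallest"              # prints 2: b.txt was listed second
```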

The three compression calls, each ending with a trailing &, are also worth pointing out. The & drops each of them into its own subshell to run in parallel, and the subsequent call to wait stops the script until all of them have completed. On a uniprocessor, this might not offer much performance benefit, but with multiple processors, it should spread the task out and potentially complete quite a bit faster.
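The parallel pattern is easy to time for yourself (SECONDS is a bash builtin that counts elapsed seconds):

```shell
# Three one-second jobs run concurrently finish in about one second,
# not three; wait pauses until every background job is done.
start=$SECONDS
sleep 1 & sleep 1 & sleep 1 &
wait
echo "elapsed: $(( SECONDS - start ))s"
```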

Running the Script

This script should be invoked with a list of filenames to compress. If some of them are already compressed and you want to try compressing them further, use the -a flag; otherwise they’ll be skipped.

The Results

The best way to demonstrate this script is with a file that needs to be compressed, as Listing 4-20 shows.

$ ls -l alice.txt
-rw-r--r--  1 taylor  staff  154872 Dec  4  2002 alice.txt

Listing 4-20: Showing the ls output of a copy of Alice in Wonderland. Note the file size of 154872 bytes.

The script hides the process of compressing the file with each of the three compression tools and instead simply displays the results, shown in Listing 4-21.

$ bestcompress alice.txt
Best compression is with compress. File renamed alice.txt.Z

Listing 4-21: Running the bestcompress script on alice.txt

Listing 4-22 demonstrates that the file is now quite a bit smaller.

$ ls -l alice.txt.Z
-rw-r--r--  1 taylor  wheel  66287 Jul  7 17:31 alice.txt.Z

Listing 4-22: Demonstrating the much-reduced file size of the compressed file (66287 bytes) compared to Listing 4-20