B
BONUS SCRIPTS

Because we couldn’t say no to these gems! As we developed this second edition, we ended up writing a few more scripts for backup purposes. It turns out we didn’t need the spare scripts, but we didn’t want to keep our secret sauce from our readers.

The first two bonus scripts are for the systems administrators out there who have to manage moving or processing a lot of files. The last script is for web users always looking for the next web service that’s just begging to be turned into a shell script; we’ll scrape a website that helps us track the phases of the moon!

#102 Bulk-Renaming Files

Systems administrators are often tasked with moving many files from one system to another, and it’s fairly common for the files in the new system to require a totally different naming scheme. For a few files, renaming is simple to do manually, but when renaming hundreds or thousands of files, it immediately becomes a job better suited for a shell script.

The Code

The simple script in Listing B-1 takes two arguments for the text to match and replace, and a list of arguments specifying the files you want to rename (which can be globbed for easy use).

   #!/bin/bash
   # bulkrename--Renames specified files by replacing text in the filename

   printHelp()
   {
     echo "Usage: $0 -f find -r replace FILES_TO_RENAME*"
     echo -e "\t-f The text to find in the filename"
     echo -e "\t-r The replacement text for the new filename"
     exit 1
   }

   while getopts "f:r:" opt
   do
     case "$opt" in
       r ) replace="$OPTARG"    ;;
       f ) match="$OPTARG"      ;;
       ? ) printHelp            ;;
     esac
   done

   shift $(( OPTIND - 1 ))

   # Quote the variables so the tests still work if a value contains spaces.
   if [ -z "$replace" ] || [ -z "$match" ]
   then
     echo "You need to supply a string to find and a string to replace"
     printHelp
   fi

   # "$@" keeps filenames that contain spaces intact.
   for i in "$@"
   do
     newname=$(echo "$i" | sed "s/$match/$replace/")
     mv "$i" "$newname" && echo "Renamed file $i to $newname"
   done

Listing B-1: The bulkrename script

How It Works

We first define a printHelp() function that prints the arguments required and the purpose of the script, and then exits. After defining the function, the code iterates over the arguments passed to the script with getopts, as done in previous scripts, assigning values to the replace and match variables when their arguments are specified.

The script then checks that we have values for the variables we will use later. If either the replace or the match variable has a length of zero, the script prints an error telling the user that they need to supply a string to find and a string to replace. The script then prints the printHelp text and exits.

After verifying there are values for match and replace, the script begins iterating over the rest of the arguments specified, which should be the files to rename. We use sed to replace the first occurrence of the match string with the replace string in the filename and store the new filename in a bash variable. With the new filename stored, we use the mv command to move the file to the new filename and, if the move succeeds, print a message telling the user that the file has been renamed.
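One thing to keep in mind is that sed sees the whole path, so the first match could land in a directory name rather than in the filename. The sketch below shows one possible way to restrict the substitution to the last path component. The function name newName is hypothetical and is not part of Listing B-1.

```shell
# newName: hypothetical helper, not part of Listing B-1.
# Prints the path that results from replacing the first occurrence of
# $1 (a sed regular expression) with $2 in the filename part of path $3.
newName()
{
  dir=$(dirname "$3")      # Everything up to the last path component
  base=$(basename "$3")    # The filename itself
  printf '%s/%s\n' "$dir" "$(printf '%s\n' "$base" | sed "s/$1/$2/")"
}

newName dave brandon /home/dave/bulk/1_dave   # prints /home/dave/bulk/1_brandon
```

Because the match text is handed to sed as a regular expression, a find or replace string containing a slash would still need escaping before use.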

Running the Script

The bulkrename shell script takes the two string arguments and the files to rename (which can be globbed for easier use; otherwise, they’re listed individually). If invalid arguments are specified, a friendly help message is printed, as shown in Listing B-2.

The Results

   $ ls ~/tmp/bulk
   1_dave  2_dave  3_dave  4_dave
   $ bulkrename
   You need to supply a string to find and a string to replace
   Usage: bulkrename -f find -r replace FILES_TO_RENAME*
     -f The text to find in the filename
     -r The replacement text for the new filename
   $ bulkrename -f dave -r brandon ~/tmp/bulk/*
   Renamed file /Users/bperry/tmp/bulk/1_dave to /Users/bperry/tmp/bulk/1_brandon
   Renamed file /Users/bperry/tmp/bulk/2_dave to /Users/bperry/tmp/bulk/2_brandon
   Renamed file /Users/bperry/tmp/bulk/3_dave to /Users/bperry/tmp/bulk/3_brandon
   Renamed file /Users/bperry/tmp/bulk/4_dave to /Users/bperry/tmp/bulk/4_brandon
   $ ls ~/tmp/bulk
   1_brandon  2_brandon  3_brandon  4_brandon

Listing B-2: Running the bulkrename script

You can list the files to rename individually or glob them using an asterisk (*) in the file path, like we do in the second bulkrename command in Listing B-2. After being moved, each renamed file is printed to the screen with its new name to reassure the user that the files were renamed as expected.

Hacking the Script

Sometimes it may be useful to replace text in a filename with a special string, like today’s date or a timestamp. Then you’d know when the file was renamed without needing to specify today’s date in the -r argument. You can accomplish this by adding special tokens to the script that can then be replaced when the file is renamed. For instance, you could have a replace string containing %d or %t, which are then replaced with today’s date or a timestamp, respectively, when the file is renamed.

Special tokens like this can make moving files for backup purposes easier. You can add a cron job that moves certain files so the dynamic token in the filenames will be updated by the script automatically, instead of updating the cron job when you want to change the date in the filename.
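As a rough sketch of that idea, the replacement string could be passed through a small function before the rename loop starts. The name expandTokens and the two output formats (YYYY-MM-DD for %d, seconds since the epoch for %t) are assumptions made for illustration; date +%s is widely supported on GNU and BSD systems but is not guaranteed everywhere.

```shell
# expandTokens: hypothetical helper for the hack described above.
# Replaces every %d in $1 with today's date and every %t with a Unix
# timestamp. Both output formats are assumptions; adjust to taste.
expandTokens()
{
  today=$(date +%Y-%m-%d)
  now=$(date +%s)
  printf '%s\n' "$1" | sed -e "s/%d/$today/g" -e "s/%t/$now/g"
}

replace=$(expandTokens "backup_%d")
echo "$replace"   # for example, backup_2017-08-03
```

Expanding the tokens once, before the loop, means every file renamed in the same run gets the same date or timestamp.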

#103 Bulk-Running Commands on Multiprocessor Machines

When this book was first published, it was uncommon to have a multicore or multiprocessor machine unless you worked on servers or mainframes for a living. Today, most laptops and desktops have multiple cores, allowing the computer to perform more work at once. But sometimes programs you want to run are unable to take advantage of this increase in processing power and will only use one core at a time; to use more cores you have to run multiple instances of the program in parallel.

Say you have a program that converts image files from one format to another, and you have a whole lot of files to convert! Having a single process convert each file serially (one after another instead of in parallel) could take a long time. It would be much faster to split up the files across multiple processes running alongside each other.

The script in Listing B-3 shows how to run a given command on a directory of files in parallel, using up to a number of concurrent processes that you choose.

NOTE

If you don’t have multiple cores in your computer, or if your program is slow for other reasons, such as a hard drive access bottleneck, running parallel instances of a program may be detrimental to performance. Be careful with starting too many processes as it could easily overwhelm an underpowered system. Luckily, even a Raspberry Pi has multiple cores nowadays!

The Code

   #!/bin/bash
   # bulkrun--Iterates over a directory of files, running a number of
   #   concurrent processes that will process the files in parallel

   printHelp()
   {
     echo "Usage: $0 -p 3 -i inputDirectory/ -x \"command -to run/\""
     echo -e "\t-p The maximum number of processes to start concurrently"
     echo -e "\t-i The directory containing the files to run the command on"
     echo -e "\t-x The command to run on the chosen files"
     exit 1
   }

   while getopts "p:x:i:" opt
   do
     case "$opt" in
       p ) procs="$OPTARG"    ;;
       x ) command="$OPTARG"  ;;
       i ) inputdir="$OPTARG" ;;
       ? ) printHelp          ;;
     esac
   done

   if [[ -z $procs || -z $command || -z $inputdir ]]
   then
     echo "Invalid arguments"
     printHelp
   fi

   total=$(ls "$inputdir" | wc -l)
   files="$(ls -Sr "$inputdir")"

   for k in $(seq 1 $procs $total)
   do
     # Offsets 0 through procs-1 start exactly $procs processes per batch.
     for i in $(seq 0 $(( procs - 1 )))
     do
       if [[ $((i+k)) -gt $total ]]
       then
         wait
         exit 0
       fi

       file=$(echo "$files" | sed "$((i+k))q;d")
       echo "Running $command $inputdir/$file"
       # $command is left unquoted on purpose so its options are split.
       $command "$inputdir/$file" &
     done

     wait
   done

Listing B-3: The bulkrun script

How It Works

The bulkrun script takes three arguments: the maximum number of processes to run at any one time (-p), the directory containing the files to process (-i), and the command to run, which is suffixed with the filename to run on (-x). After going through the arguments supplied by the user with getopts, the script checks that the user supplied all three. If any of the procs, command, or inputdir variables is empty after processing the user arguments, the script prints an error message and the help text and then exits.

Once we know we have the variables needed to manage running the parallel processes, the real work of the script can start. First, the script determines the number of files to process and saves a list of the files, sorted from smallest to largest, for use later. Then the script begins a for loop that will be used to keep track of how many files it has processed so far. This for loop uses the seq command to iterate from 1 to the total number of files, using the number of processes that will run in parallel as the increment step.

Inside this is another for loop that tracks the number of processes starting at a given time. This inner for loop also uses the seq command, iterating from 0 to one less than the number of processes specified, with 1 as the default increment step, so each batch starts exactly that many processes. In each iteration of the inner for loop, a new file is pulled out of the file list, using sed to print only the line we want from the list of files saved at the beginning of the script, and the supplied command is run on the file in the background using the & sign. If the combined index runs past the last file, the script waits for any remaining background processes and exits.

When the maximum number of processes has been started in the background, the wait command tells the script to sleep until all the commands in the background have finished processing. After wait is finished, the whole workflow starts over again, starting more processes to work on more files. This is similar to how we quickly achieve the best compression in the script bestcompress (Script #34 on page 113).
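The batch-and-wait pattern can also be sketched on its own, without the file-list bookkeeping. This is a minimal illustration rather than the script itself: the function name runInBatches is hypothetical, and it assumes the command is a single word with no options, which keeps the sketch short.

```shell
# runInBatches: hypothetical helper showing the batch-and-wait pattern.
# Usage: runInBatches MAXPROCS COMMAND FILE...
# COMMAND is assumed to be a single word with no options.
runInBatches()
{
  procs=$1
  cmd=$2
  shift 2

  running=0
  for file in "$@"
  do
    "$cmd" "$file" &
    running=$(( running + 1 ))

    # Once a full batch is in flight, wait for all of it to finish.
    if [ "$running" -ge "$procs" ]
    then
      wait
      running=0
    fi
  done

  wait    # Catch the final, possibly partial, batch.
}

runInBatches 2 echo a b c d e
```

Within a batch the jobs run concurrently, so the order of their output is not guaranteed. One trade-off of this pattern is that a single slow job holds up the start of the next batch.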

Running the Script

Using the bulkrun script is pretty straightforward. The three arguments it takes are the maximum number of processes to run at any one time, the directory of files to work on, and the command to run on them. If you wanted to run the ImageMagick utility mogrify to resize a directory of images in parallel, for instance, you could run something like Listing B-4.

The Results

$ bulkrun -p 3 -i tmp/ -x "mogrify -resize 50%"
Running mogrify -resize 50% tmp//1024-2006_1011_093752.jpg
Running mogrify -resize 50% tmp//069750a6-660e-11e6-80d1-001c42daa3a7.jpg
Running mogrify -resize 50% tmp//06970ce0-660e-11e6-8a4a-001c42daa3a7.jpg
Running mogrify -resize 50% tmp//0696cf00-660e-11e6-8d38-001c42daa3a7.jpg
--snip--

Listing B-4: Running the bulkrun command to parallelize the mogrify ImageMagick command

Hacking the Script

It’s often useful to be able to specify a filename inside of a command, or use tokens similar to those mentioned in the bulkrename script (Script #102 on page 346): special strings that are replaced at runtime with dynamic values (such as %d, which is replaced with the current date, or %t, which is replaced with a timestamp). Updating the script so that it can replace special tokens in the command or in the filename with something like a date or timestamp as the files are processed would prove useful.

Another useful hack might be to track how long it takes to perform all the processing using the time utility. Having the script print statistics on how many files will be processed, or how many have been processed and how many are left, would be valuable if you’re taking care of a truly massive job.
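A very small sketch of the statistics idea follows. The helper name progressLine, the counter variables, and the stand-in filenames are all hypothetical; a real version would update the counter each time a batch of background jobs finishes.

```shell
# progressLine: hypothetical helper; prints how far along the job is.
# $1 = files finished so far, $2 = total number of files.
progressLine()
{
  echo "Processed $1 of $2 files, $(( $2 - $1 )) left"
}

start=$(date +%s)    # Seconds since the epoch when the job began

total=3
finished=0
for file in a.jpg b.jpg c.jpg    # Stand-ins for real filenames
do
  # A real script would run its command on "$file" here.
  finished=$(( finished + 1 ))
  progressLine "$finished" "$total"
done

echo "Elapsed: $(( $(date +%s) - start )) seconds"
```

For a quick measurement without changing the script at all, you can simply prefix the whole command with time, as in time bulkrun -p 3 -i tmp/ -x "mogrify -resize 50%".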

#104 Finding the Phase of the Moon

Whether you’re a werewolf, a witch, or just interested in the lunar calendar, it can be helpful and educational to track the phases of the moon and learn about waxing, waning, and even gibbous moons (which have nothing to do with gibbons).

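To make things complicated, the moon takes 27.32 days to orbit Earth, but its phases repeat on a cycle of about 29.53 days because Earth is moving around the sun at the same time, and the calendar date on which a given phase falls depends on your time zone. Still, given a specific date, it is possible to calculate the phase of the moon.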

But why go through all the work when there are plenty of sites online that already calculate the phase for any given date in the past, present, or future? For the script in Listing B-5, we’re going to utilize the same site Google uses if you do a search for the current phase of the moon: http://www.moongiant.com/.

The Code

   #!/bin/bash

   # moonphase--Reports the phase of the moon (really the percentage of
   #   illumination) for today or a specified date

   # Format of Moongiant.com query:
   #   http://www.moongiant.com/phase/MM/DD/YYYY

   # If no date is specified, use "today" as a special value.

   if [ $# -eq 0 ] ; then
     thedate="today"
   else
     # Date specified. Let's check whether it's in the right format.
      mon="$(echo "$1" | cut -d/ -f1)"
      day="$(echo "$1" | cut -d/ -f2)"
     year="$(echo "$1" | cut -d/ -f3)"

     if [ -z "$year" ] || [ -z "$day" ] ; then     # Zero length?
       echo "Error: valid date format is MM/DD/YYYY"
       exit 1
     fi
     thedate="$1" # No error checking = dangerous
   fi

   url="http://www.moongiant.com/phase/$thedate"
   pattern="Illumination:"

   # Split the matching line at commas, keep the piece that holds the
   # pattern, and strip everything except the digits.
   phase="$( curl -s "$url" | grep "$pattern" | tr ',' '\n' |
     grep "$pattern" | sed 's/[^0-9]//g')"

   # Site output format is "Illumination: <span>NN%\n<\/span>"

   if [ "$thedate" = "today" ] ; then
     echo "Today the moon is ${phase}% illuminated."
   else
     echo "On $thedate the moon = ${phase}% illuminated."
   fi

   exit 0

Listing B-5: The moonphase script

How It Works

As with other scripts that scrape values from a web query, the moonphase script revolves around identifying the format of different query URLs and pulling the specific value from the resultant HTML data stream.

Analysis of the site shows that there are two types of URLs: one that specifies the current date, simply structured as “phase/today”, and one that specifies a date in the past or future in the format MM/DD/YYYY, like “phase/08/03/2017”.

Specify a date in the right format and you can get the phase of the moon on that date. But we can’t just append the date to the site’s domain name without some error-checking, so the script splits the user input into three fields (month, day, and year) and then makes sure that the day and year values are not empty. There’s more error-checking that can be done, which we’ll explore in “Hacking the Script.”
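For a sense of what stricter checking might look like, here is a minimal sketch. The function name validDate is hypothetical and not part of Listing B-5, and the sketch assumes two-digit months and days, as in the example URL. It checks only the shape of the date and the ranges of the month and day, not days per month or leap years.

```shell
# validDate: hypothetical helper, not part of Listing B-5.
# Succeeds only if $1 looks like MM/DD/YYYY with a month of 01-12 and a
# day of 01-31. It does not check days per month or leap years.
validDate()
{
  # Overall shape: two digits, slash, two digits, slash, four digits.
  case "$1" in
    [0-9][0-9]/[0-9][0-9]/[0-9][0-9][0-9][0-9] ) ;;
    * ) return 1 ;;
  esac

  mon=$(echo "$1" | cut -d/ -f1)
  day=$(echo "$1" | cut -d/ -f2)

  case "$mon" in
    0[1-9] | 1[0-2] ) ;;
    * ) return 1 ;;
  esac

  case "$day" in
    0[1-9] | [12][0-9] | 3[01] ) ;;
    * ) return 1 ;;
  esac

  return 0
}

validDate "08/03/2017" && echo "ok"
validDate "8-3-17" || echo "rejected"
```

Matching with case patterns rather than numeric comparisons sidesteps any trouble with leading zeros, and the first pattern also rejects input with extra path components tacked on.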

The trickiest part of any scraper script is properly identifying the pattern that lets you extract the desired data. In the moonphase script, that’s the pattern variable, which is set to Illumination:. The longest and most complicated line is the one that assigns phase, where the script gets the page from the moongiant.com site and then uses a sequence of grep, tr, and sed commands to pull just the number out of the line that matches the pattern.
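To see the extraction step in isolation, you can feed the same pipeline a line of text instead of a live page. In the sketch below, the function name extractIllumination is hypothetical, and the sample line is made up to resemble the format noted in the script's comment; the real page may be laid out differently.

```shell
# extractIllumination: the extraction stage of the pipeline, reading
# HTML on standard input instead of calling curl.
extractIllumination()
{
  pattern="Illumination:"
  grep "$pattern" | tr ',' '\n' | grep "$pattern" | sed 's/[^0-9]//g'
}

# Hypothetical sample line; the real page may be laid out differently.
sample='"phase":"Waxing Gibbous","text":"Illumination: <span>74%\n<\/span>","age":"10 days"'

printf '%s\n' "$sample" | extractIllumination   # prints 74
```

The tr step matters because the final sed keeps every digit on the line: splitting at commas first ensures that digits from neighboring fields, such as the 10 in the sample, don't end up glued onto the percentage.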

After that, it’s just a matter of displaying the illumination level, either for today or the specified date, using the final if/then/else statement.

Running the Script

Without an argument, the moonphase script shows the percentage of lunar illumination for the current date. Specify any date in the past or future by entering MM/DD/YYYY, as shown in Listing B-6.

The Results

$ moonphase 08/03/2121
On 08/03/2121 the moon = 74% illuminated.

$ moonphase
Today the moon is 100% illuminated.

$ moonphase 12/12/1941
On 12/12/1941 the moon = 43% illuminated.

Listing B-6: Running the moonphase script

NOTE

December 12, 1941 is when the classic Universal horror film The Wolf Man was first released to movie theaters. And it wasn’t a full moon. Go figure!

Hacking the Script

From an internal perspective, the script could be greatly improved by having a better error-checking sequence, or even by just utilizing Script #3 on page 17. That would let users specify dates in more formats. Another improvement would be to replace the if/then/else statement at the end with a function that translates illumination level into more common moon phase phrases like “waning,” “waxing,” and “gibbous.” NASA has a web page you could use that defines the different phases: http://starchild.gsfc.nasa.gov/docs/StarChild/solar_system_level2/moonlight.html.
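A first pass at such a function might look like the sketch below. The name describeIllumination is hypothetical, and the cutoff percentages are illustrative assumptions rather than official definitions. Note that the illumination percentage alone can't tell a waxing moon from a waning one, so this sketch names only the shape.

```shell
# describeIllumination: hypothetical helper; the cutoff values below are
# illustrative assumptions, not official definitions.
# $1 = illumination as a whole-number percentage (0-100).
describeIllumination()
{
  if [ "$1" -le 2 ] ; then
    echo "new"
  elif [ "$1" -lt 48 ] ; then
    echo "crescent"      # Less than half illuminated
  elif [ "$1" -le 52 ] ; then
    echo "quarter"       # About half illuminated
  elif [ "$1" -lt 98 ] ; then
    echo "gibbous"       # More than half illuminated
  else
    echo "full"
  fi
}

describeIllumination 74   # prints gibbous
```

To add “waxing” or “waning,” one option would be to fetch the illumination for the following day as well and compare the two values.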