This appendix contains an assortment of bash scripts that illustrate how to solve some well-known tasks, such as recursion-based solutions for the GCD and LCM of two positive integers, as well as awk commands for processing multiple datasets in order to perform arithmetic calculations.
The shell scripts are grouped according to their respective chapters: for instance, awk-related bash scripts are listed in the section for Chapter 5. In some cases (such as Chapter 1), N/A is listed when there are no samples for a chapter. Keep in mind that the coverage (in terms of explanations) of the code samples in this appendix is fairly light: the assumption is that you have read the code samples in the chapters, thereby enabling you to understand the code here without in-depth explanations.
N/A
The examples in this appendix for Chapter 2 contain the following shell scripts for calculating Fibonacci numbers, the GCD and LCM of two positive integers, and the divisors of a positive integer:
Fibonacci.sh
gcd.sh
lcm.sh
Divisors2.sh
Listing A.1 displays the contents of Fibonacci.sh that computes the Fibonacci value of a positive integer.
#!/bin/sh

LOGFILE="/tmp/a1"
rm -f $LOGFILE 2>/dev/null

fib()
{
  if [ "$1" -gt 3 ]
  then
    echo "1 = $1 2 = $2 3 = $3" >> $LOGFILE
    decr1=`expr $2 - 1`
    decr2=`expr $3 - 1`
    decr3=`expr $3 - 2`
    echo "d1 = $decr1 d2 = $decr2 d3 = $decr3" >> $LOGFILE
    fib1=`fib $2 $3 $decr2`
    fib2=`fib $3 $decr2 $decr3`
    fib=`expr $fib1 + $fib2`
    echo $fib
  else
    if [ "$1" -eq 3 ]
    then
      echo 2
    else
      echo 1
    fi
  fi
}

echo "Enter a number: "
read num

# add code to ensure it's a positive integer
if [ "$num" -lt 3 ]
then
  echo "fibonacci $num = 1"
else
  decr1=`expr $num - 1`
  decr2=`expr $num - 2`
  echo "fibonacci $num = `fib $num $decr1 $decr2`"
fi
In case you don't already know, the Fibonacci sequence is defined as follows: F(1) = 1; F(2) = 1; and F(n) = F(n-1) + F(n-2) for n > 2.
Listing A.1 looks complicated, but in a sense it "extends" the technique shown in Listing 2.10 in Chapter 2. In particular, the code for calculating factorial values involves decrementing one variable, whereas calculating Fibonacci numbers involves decrementing two variables (which are called decr1 and decr2 in Listing A.1) in order to make recursive invocations of the fib() function.
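Keep in mind that recursion via command substitution spawns a subshell for every call, so the recursive version becomes slow for larger inputs. The following is a minimal iterative sketch (not part of Listing A.1, and using the same F(1) = F(2) = 1 convention):

#!/bin/sh
# iterative Fibonacci: a sketch that avoids recursive subshells
echo "Enter a number: "
read num

a=1
b=1
i=2
while [ $i -lt $num ]
do
  sum=`expr $a + $b`
  a=$b
  b=$sum
  i=`expr $i + 1`
done
echo "fibonacci $num = $b"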
Listing A.2 displays the contents of the shell script gcd.sh that computes the greatest common divisor (GCD) of two positive integers.
#!/bin/bash

function gcd()
{
  if [ $1 -lt $2 ]
  then
    result=`gcd $2 $1`
    echo $result
  else
    remainder=`expr $1 % $2`
    if [ $remainder == 0 ]
    then
      echo $2
    else
      echo `gcd $2 $remainder`
    fi
  fi
}

a="4"
b="20"
result=`gcd $a $b`
echo "GCD of $a and $b = $result"

a="4"
b="22"
result=`gcd $a $b`
echo "GCD of $b and $a = $result"

a="20"
b="3"
result=`gcd $a $b`
echo "GCD of $b and $a = $result"

a="10"
b="10"
result=`gcd $a $b`
echo "GCD of $b and $a = $result"
Listing A.2 is a straightforward implementation of the Euclidean algorithm (check Wikipedia for details) for finding the GCD of two positive integers. The output from Listing A.2 shows the GCD of several pairs of numbers, as shown here:
GCD of 4 and 20 = 4
GCD of 22 and 4 = 2
GCD of 3 and 20 = 1
GCD of 10 and 10 = 10
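If you prefer to avoid spawning an expr process for each arithmetic operation, the same algorithm can be written iteratively with bash arithmetic expansion. The following is a sketch (the sample invocation is just an illustration):

#!/bin/bash
# GCD via bash arithmetic expansion instead of expr (a sketch)
function gcd()
{
  local a=$1 b=$2
  while [ $b -ne 0 ]
  do
    local t=$((a % b))
    a=$b
    b=$t
  done
  echo $a
}

echo "GCD of 4 and 20 = `gcd 4 20`"   # prints 4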
Listing A.3 displays the contents of the shell script lcm.sh that computes the lowest common multiple (LCM) of two positive integers. This script includes the code from the shell script gcd.sh in order to compute the LCM of two positive integers.
#!/bin/bash

function gcd()
{
  if [ $1 -lt $2 ]
  then
    result=`gcd $2 $1`
    echo $result
  else
    remainder=`expr $1 % $2`
    if [ $remainder == 0 ]
    then
      echo $2
    else
      result=`gcd $2 $remainder`
      echo $result
    fi
  fi
}

function lcm()
{
  gcd1=`gcd $1 $2`
  lcm1=`expr $1 / $gcd1`
  lcm2=`expr $lcm1 \* $2`
  echo $lcm2
}

a="24"
b="10"
result=`lcm $a $b`
echo "The LCM of $a and $b = $result"

a="10"
b="30"
result=`lcm $a $b`
echo "The LCM of $a and $b = $result"
Notice that Listing A.3 contains the gcd() function to compute the GCD of two positive integers. This function is necessary because the next portion of Listing A.3 contains the lcm() function, which invokes the gcd() function, followed by a division and a multiplication in order to calculate the LCM of two numbers. The output from Listing A.3 displays the LCM of two pairs of numbers, as shown here:
The LCM of 24 and 10 = 120
The LCM of 10 and 30 = 30
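As a quick sanity check of the first line of output: gcd(24,10) = 2, so lcm(24,10) = (24/2)*10 = 12*10 = 120.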
Listing A.4 displays the contents of the shell script Divisors2.sh that calculates the prime factors of a positive integer.
#!/bin/bash

function divisors()
{
  div="2"
  num="$1"
  primes=""

  while (true)
  do
    remainder=`expr $num % $div`
    if [ $remainder == 0 ]
    then
      #echo "divisor: $div"
      primes="${primes} $div"
      num=`expr $num / $div`
    else
      div=`expr $div + 1`
    fi

    if [ $num -eq 1 ]
    then
      break
    fi
  done

  # use 'echo' instead of 'return'
  echo $primes
}

num="12"
primes=`divisors $num`
echo "The prime divisors of $num: $primes"

num="768"
primes=`divisors $num`
echo "The prime divisors of $num: $primes"

num="12345"
primes=`divisors $num`
echo "The prime divisors of $num: $primes"

num="23768"
primes=`divisors $num`
echo "The prime divisors of $num: $primes"
Listing A.4 contains the divisors() function, which consists primarily of a while loop that checks for divisors of num (initialized as the value of $1). The initial value of div is 2, and each time div divides num, the value of div is appended to the primes string and num is replaced by num/div. If div does not divide num, then div is incremented by 1. Note that the while loop in Listing A.4 terminates when num reaches the value of 1.
The output from Listing A.4 displays the prime divisors of 12, 768, 12345, and 23768, as shown here:
The prime divisors of 12: 2 2 3
The prime divisors of 768: 2 2 2 2 2 2 2 2 3
The prime divisors of 12345: 3 5 823
The prime divisors of 23768: 2 2 2 2971
The prime factors of 12 and 768 are computed in less than one second, but the calculation of the prime factors of 12345 and 23768 is significantly slower.
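The slowdown occurs because the loop tries every candidate divisor in increments of 1 and forks an expr process at each step. One possible optimization (a sketch, not part of Listing A.4) is to test only 2 and the odd numbers, and to stop once the candidate divisor exceeds the square root of the remaining value, at which point that value is itself prime:

#!/bin/bash
# a sketch: trial division up to the square root, skipping even candidates
function divisors()
{
  div="2"
  num="$1"
  primes=""

  while [ `expr $div \* $div` -le $num ]
  do
    if [ `expr $num % $div` -eq 0 ]
    then
      primes="${primes} $div"
      num=`expr $num / $div`
    else
      if [ $div -eq 2 ]
      then
        div="3"
      else
        div=`expr $div + 2`
      fi
    fi
  done

  # whatever remains (if greater than 1) is prime
  if [ $num -gt 1 ]
  then
    primes="${primes} $num"
  fi
  echo $primes
}

echo "The prime divisors of 23768: `divisors 23768`"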
The first example in this section illustrates how to determine which zip files contain SVG documents. The second example shows you how to check the entries in a log file (with simulated values). The third code sample shows you how to use the grep command in order to simulate a relational database consisting of three "tables," each of which is represented by a dataset.
Listing A.5 displays the contents of myzip.sh that produces two lists of files: the first list contains the names of the zip files that contain SVG documents, and the second list contains the names of the zip files that do not contain SVG documents.
foundlist="" notfoundlist="" for f in `ls *zip` do found=`unzip -v $f |grep "svg$"` if [ "$found" != "" ] then #echo "$f contains SVG documents:" #echo "$found" foundlist="$f ${foundlist}" else notfoundlist="$f ${notfoundlist}" fi done echo "Files containing SVG documents:" echo $foundlist| tr ' ' '\n' echo "Files not containing SVG documents:" echo $notfoundlist |tr ' ' '\n'
Listing A.5 searches ("looks inside") zip files for the hard-coded string svg. If you want to search for some other string in a set of zip files, manually replace this string with the other string. Alternatively, you can prompt users for a search string so that you do not need to make manual modifications to the shell script, as shown in the sketch below.
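A minimal sketch of the prompting approach (the variable name searchstr is just an illustration):

echo "Enter a search string: "
read searchstr

for f in `ls *zip`
do
  found=`unzip -v $f | grep "$searchstr"`
  if [ "$found" != "" ]
  then
    echo "$f contains $searchstr"
  fi
done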
For your convenience, Listing A.6 displays the contents of searchstrings.sh that illustrates how to enter one or more strings on the command line, in order to search for those strings in the zip files in the current directory.
foundlist="" notfoundlist="" if [ "$#" == 0 ] then echo "Usage: $0 <string-list>" exit fi zipfiles=`ls *zip 2>/dev/null` if [ "$zipfiles" = "" ] then echo "*** No zip files in `pwd` ***" exit fi for str in "$@" do echo "Checking zip files for $str:" for f in `ls *zip` do found=`unzip -v $f |grep "$str"` if [ "$found" != "" ] then foundlist="$f ${foundlist}" else notfoundlist="$f ${notfoundlist}" fi done echo "Files containing $str:" echo $foundlist| tr ' ' '\n' echo "Files not containing $str:" echo $notfoundlist |tr ' ' '\n' foundlist="" notfoundlist="" done
Listing A.6 first checks that at least one search string is specified on the command line, and then initializes the zipfiles variable with the list of zip files in the current directory. If zipfiles is empty, an appropriate message is displayed.
The next section of Listing A.6 contains a for loop that processes each argument that was specified on the command line. For each such argument, another for loop checks for the names of the zip files that contain that argument. If there is a match, the foundlist variable is updated; otherwise, the notfoundlist variable is updated. When the inner loop has completed, the names of the matching and non-matching files are displayed, and then the outer loop continues with the next command-line argument.
Although the preceding explanation might seem complicated, a sample output from launching Listing A.6 will clarify how the code works:
./searchstrings.sh svg abc
Checking zip files for svg:
Files containing svg:
Files not containing svg:
shell-programming-manuscript.zip
shell-progr-manuscript-0930-2013.zip
shell-progr-manuscript-0207-2015.zip
shell-prog-manuscript.zip
Checking zip files for abc:
Files containing abc:
Files not containing abc:
shell-programming-manuscript.zip
shell-progr-manuscript-0930-2013.zip
shell-progr-manuscript-0207-2015.zip
shell-prog-manuscript.zip
If you want to perform the search for zip files in subdirectories, modify the loop as shown here:
for f in `find . -print |grep "zip$"`
do
  echo "Searching $f..."
  unzip -v $f |grep "svg$"
done
If you have the Java SDK on your machine, you can also use the jar command instead of the unzip command, as shown here:
jar tvf $f |grep "svg$"
Listing A.7 displays the contents of skutotals.sh that calculates the number of units sold for each SKU in skuvalues.txt.
SKUVALUES="skuvalues.txt" SKUSOLD="skusold.txt" for sku in `cat $SKUVALUES` do total=`cat $SKUSOLD |grep $sku | awk '{total += $2} END {print total}'` echo "UNITS SOLD FOR SKU $sku: $total" done
Listing A.7 contains a for loop that iterates through the rows of the file skuvalues.txt and passes those SKU values – one at a time – to a command that combines the cat, grep, and awk commands. The purpose of this combination of commands is to 1) find the matching lines in skusold.txt, 2) compute the sum of the values in the second column, and 3) print the subtotal for the current SKU. In essence, this shell script prints the subtotals for each SKU value.
Launch skutotals.sh and you will see the following output:
UNITS SOLD FOR SKU 4520: 27
UNITS SOLD FOR SKU 5530: 17
UNITS SOLD FOR SKU 6550: 8
UNITS SOLD FOR SKU 7200: 90
UNITS SOLD FOR SKU 8000: 160
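Since the for loop re-reads skusold.txt once per SKU, you could also compute all of the subtotals in a single awk pass. The following sketch assumes that every SKU in skusold.txt is of interest (note that the subtotals might appear in a different order):

awk '{ total[$1] += $2 }
     END { for (sku in total) print "UNITS SOLD FOR SKU " sku ": " total[sku] }' skusold.txt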
We can generalize the previous shell script to take into account different prices for each SKU. Listing A.8 displays the contents of skuprices.txt.
4520 3.50
5530 5.00
6550 2.75
7200 6.25
8000 3.50
Listing A.9 displays the contents of skutotals2.sh that extends the code in Listing A.7 in order to calculate the revenue for each SKU.
SKUVALUES="skuvalues.txt" SKUSOLD="skusold.txt" SKUPRICES="skuprices.txt" for sku in `cat $SKUVALUES` do skuprice=`grep $sku $SKUPRICES | cut -d" " -f2` subtotal=`cat $SKUSOLD |grep $sku | awk '{total += $2} END {print total}'` total=`echo "$subtotal * $skuprice" |bc` echo "AMOUNT SOLD FOR SKU $sku: $total" done
Listing A.9 contains a slight enhancement: instead of computing the subtotal of the number of units sold for each SKU, the revenue for each SKU is computed, where the revenue for each item equals the price of the SKU multiplied by the number of units sold for that SKU. Launch skutotals2.sh and you will see the following output:
AMOUNT SOLD FOR SKU 4520: 94.50
AMOUNT SOLD FOR SKU 5530: 85.00
AMOUNT SOLD FOR SKU 6550: 22.00
AMOUNT SOLD FOR SKU 7200: 562.50
AMOUNT SOLD FOR SKU 8000: 560.00
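Note that expr performs only integer arithmetic, which is why Listing A.9 pipes the multiplication to the bc utility; for example:

echo "27 * 3.50" | bc    # prints 94.50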
Listing A.10 displays the contents of skutotals3.sh that calculates the minimum, maximum, average, and total number of units sold for each SKU in skuvalues.txt.
SKUVALUES="skuvalues.txt" SKUSOLD="skusold.txt" TOTALS="totalspersku.txt" rm -f $TOTALS 2>/dev/null ############################## #calculate totals for each sku ############################## for sku in `cat $SKUVALUES` do total=`cat $SKUSOLD |grep $sku | awk '{total += $2} END {print total}'` echo "UNITS SOLD FOR SKU $sku: $total" echo "$sku|$total" >> $TOTALS done ########################## #calculate max/min/average ########################## awk -F"|" ' BEGIN {first = 1;} {if(first) { min = max= avg = sum = $2; first=0; next}} { if($2 < min) { min = $2 } if($2 > max) { max = $2 } sum += $2 } END {print "Minimum = ",min print "Maximum = ",max print "Average = ",avg print "Total = ",sum } ' $TOTALS
Listing A.10 initializes some variables, followed by a for loop that invokes an awk command in order to compute subtotals (i.e., the number of units sold) for each SKU value. The next portion of Listing A.10 contains an awk command that calculates the minimum, maximum, average, and sum of the SKU units in the file $TOTALS.
Launch the script file in Listing A.10 and you will see the following output:
UNITS SOLD FOR SKU 4520: 27
UNITS SOLD FOR SKU 5530: 17
UNITS SOLD FOR SKU 6550: 8
UNITS SOLD FOR SKU 7200: 90
UNITS SOLD FOR SKU 8000: 160
Minimum = 8
Maximum = 160
Average = 60.4
Total = 302
This section shows you how to combine the grep and cut commands in order to keep track of a small database of customers, their purchases, and the details of their purchases, all of which are stored in three text files.
Keep in mind that there are many open source toolkits available that can greatly facilitate working with relational data and non-relational data. Those toolkits can be very robust and also minimize the amount of coding that is required.
Moreover, you can use the join command (discussed in Chapter 2) to perform SQL-like operations on datasets. Nevertheless, the real purpose of this section is to illustrate some techniques with grep that might be useful in your own shell scripts.
Listing A.11 displays the contents of the MasterOrders.txt text file.
M10000 C1000 12/15/2012
M11000 C2000 12/15/2012
M12000 C3000 12/15/2012
Listing A.12 displays the contents of the Customers.txt text file.
C1000 John Smith LosAltos California 94002
C2000 Jane Davis MountainView California 94043
C3000 Billy Jones HalfMoonBay California 94040
Listing A.13 displays the contents of the PurchaseOrders.txt text file.
C1000,"Radio",54.99,2,"01/22/2013" C1000,"DVD",15.99,5,"01/25/2013" C2000,"Laptop",650.00,1,"01/24/2013" C3000,"CellPhone",150.00,2,"01/28/2013"
Listing A.14 displays the contents of the MasterOrders.sh bash script.
# initialize variables for the three main files
MasterOrders="MasterOrders.txt"
CustomerDetails="Customers.txt"
PurchaseOrders="PurchaseOrders.txt"

# iterate through the "master table"
for mastCustId in `cat $MasterOrders | cut -d" " -f2`
do
  # get the customer information
  custDetails=`grep $mastCustId $CustomerDetails`

  # get the id from the previous line
  custDetailsId=`echo $custDetails | cut -d" " -f1`

  # get the customer PO from the PO file
  custPO=`grep $custDetailsId $PurchaseOrders`

  # print the details of the customer
  echo "Customer $mastCustId:"
  echo "Customer Details: $custDetails"
  echo "Purchase Orders: $custPO"
  echo "----------------------"
  echo
done
Listing A.14 initializes some variables for the orders, details, and purchase-related datasets. The next portion of Listing A.14 contains a for loop that iterates through the id values in the MasterOrders.txt file and uses each id to find the corresponding row in the Customers.txt file, as well as the corresponding row in the PurchaseOrders.txt file. Finally, the bottom of the loop displays the details of the information that was retrieved in the initial portion of the for loop. The output from Listing A.14 is here:
Customer C1000:
Customer Details: C1000 John Smith LosAltos California 94002
Purchase Orders: C1000,"Radio",54.99,2,"01/22/2013" C1000,"DVD",15.99,5,"01/25/2013"
----------------------

Customer C2000:
Customer Details: C2000 Jane Davis MountainView California 94043
Purchase Orders: C2000,"Laptop",650.00,1,"01/24/2013"
----------------------

Customer C3000:
Customer Details: C3000 Billy Jones HalfMoonBay California 94040
Purchase Orders: C3000,"CellPhone",150.00,2,"01/28/2013"
----------------------
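As mentioned earlier in this section, the join command can perform a similar customer lookup, provided that both files are sorted on the join field (which happens to be the case for these datasets). A minimal sketch is shown here:

# join field 2 of MasterOrders.txt with field 1 of Customers.txt
join -1 2 -2 1 MasterOrders.txt Customers.txt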
Listing A.15 displays the contents of CheckLogUpdates.sh that illustrates how to periodically check the last line in a log file to determine the status of a system. This shell script simulates the status of a system by appending a new row that is based on the current timestamp. The shell script sleeps for a specified number of seconds, and on the fifth iteration the script appends a row with an error status in order to simulate an error. In the case of a shell script that is monitoring a live system, the error code is obviously generated outside the shell script.
DataFile="mylogfile.txt" OK="okay" ERROR="error" sleeptime="2" loopcount=0 rm -f $DataFile 2>/dev/null; touch $DataFile newline="`date` SYSTEM IS OKAY" echo $newline >> $DataFile while (true) do loopcount=`expr $loopcount + 1` echo "sleeping $sleeptime seconds..." sleep $sleeptime echo "awake again..." lastline=`tail -1 $DataFile` if [ "$lastline" == "" ] then continue fi okstatus=`echo $lastline |grep -i $OK` badstatus=`echo $lastline |grep -i $ERROR` if [ "$okstatus" != "" ] then echo "system is normal" if [ $loopcount –lt 5 ] then newline="`date` SYSTEM IS OKAY" else newline="`date` SYSTEM ERROR" fi echo $newline >> $DataFile elif [ "$badstatus" != "" ] then echo "Error in logfile: $lastline" break fi done
Listing A.15 initializes some variables and then ensures that the log file mylogfile.txt is empty. After an initial line is added to this log file, a while loop sleeps periodically and then examines the final line of text in the log file. New text lines are appended to this log file, and when an error message is detected, the code exits the while loop. A sample invocation of Listing A.15 is here:
sleeping 2 seconds...
awake again...
system is normal
sleeping 2 seconds...
awake again...
system is normal
sleeping 2 seconds...
awake again...
system is normal
sleeping 2 seconds...
awake again...
system is normal
sleeping 2 seconds...
awake again...
system is normal
sleeping 2 seconds...
awake again...
Error in logfile: Thu Nov 23 18:22:22 PST 2017 SYSTEM ERROR
The contents of the log file are shown here:
Thu Nov 23 18:22:12 PST 2017 SYSTEM IS OKAY
Thu Nov 23 18:22:14 PST 2017 SYSTEM IS OKAY
Thu Nov 23 18:22:16 PST 2017 SYSTEM IS OKAY
Thu Nov 23 18:22:18 PST 2017 SYSTEM IS OKAY
Thu Nov 23 18:22:20 PST 2017 SYSTEM IS OKAY
Thu Nov 23 18:22:22 PST 2017 SYSTEM ERROR
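For a live system you would watch a log file that some other process is updating. One alternative (a sketch that reuses the idioms of Listing A.15) is to follow the file with tail -f and react to new lines as they arrive:

tail -f mylogfile.txt | while read line
do
  errstatus=`echo $line | grep -i error`
  if [ "$errstatus" != "" ]
  then
    echo "Error in logfile: $line"
    break
  fi
done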
N/A
This section of the Appendix contains an assortment of bash scripts that use awk in order to perform various tasks:
1) multiline.sh: convert multi-line records into single-line records
2) sumrows.sh: compute the total of each row in a dataset
3) genetics.sh: an example of the awk split() function
4) diagonal.sh: display the main diagonal and off-diagonal values, and also compute the sum of the main diagonal and off-diagonal values
5) rainfall1.sh, rainfall2.sh, and rainfall3.sh: calculate column and row averages from multiple files
6) linear-combo.sh: compute linear combinations of the columns in multiple datasets
The details of these shell scripts are discussed in the following sections.
Listing A.16 displays the contents of the dataset multiline.txt and Listing A.17 displays the contents of the shell script multiline.sh that combines multiple lines into a single record.
Mary Smith
999 Appian Way
Roman Town, SF 94234

Jane Adams
123 Main Street
Chicago, IL 67840

John Jones
321 Pine Road
Anywhere, MN 94949
Note that each record spans multiple lines that can contain whitespace, and records are separated by a blank line.
# Records are separated by blank lines
awk '
BEGIN { RS = "" ; FS = "\n" }
{
  gsub(/[ \t]+$/, "", $1)
  gsub(/[ \t]+$/, "", $2)
  gsub(/[ \t]+$/, "", $3)
  gsub(/^[ \t]+/, "", $1)
  gsub(/^[ \t]+/, "", $2)
  gsub(/^[ \t]+/, "", $3)
  print $1 ":" $2 ":" $3 ""
  #printf("%s:%s:%s\n",$1,$2,$3)
}
' multiline.txt
Listing A.17 contains a BEGIN block that sets RS (the record separator) to an empty string and FS (the field separator) to a linefeed. Doing so enables us to "slurp" multiple lines into the same record, using a blank line as the separator between records. The gsub() function removes leading and trailing whitespace and tabs from the three fields in the dataset. The output from launching Listing A.17 is here:
Mary Smith:999 Appian Way:Roman Town, SF 94234
Jane Adams:123 Main Street:Chicago, IL 67840
John Jones:321 Pine Road:Anywhere, MN 94949
Listing A.18 displays the contents of the dataset numbers.txt and Listing A.19 displays the contents of the shell script sumrows.sh that adds the fields in each record.
1 2 3 4 5
6 7 8 9 10
5 5 5 5 5
awk '{ for(i=1; i<=NF;i++) j+=$i; print j; j=0 }' numbers.txt
Listing A.19 contains a simple invocation of the awk command with a for loop that uses the variable j to hold the sum of the values of the fields in each record, after which the sum is printed and j is re-initialized to 0. The output from Listing A.19 is here:
15
40
25
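A natural counterpart (a sketch, not part of Listing A.19) sums each column instead of each row; for numbers.txt the column sums are 12 14 16 18 20:

awk '{ for (i=1; i<=NF; i++) col[i] += $i }
     END { for (i=1; i<=NF; i++) printf("%d ", col[i]); print "" }' numbers.txt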
The split() Function in awk
Listing A.20 displays the contents of the dataset genetics.txt and Listing A.21 displays the contents of the shell script genetics.sh that uses the split() function in order to parse the contents of a field in a record.
#extract rows with 'gene' and print rows and 'key' value
xyz3 GTF2GFF chro 55555 44444 key=chr1;Name=chr1
xyz3 GTF2GFF gene 77774 11111 key=XYZ123;NB=standard;Name=extra
xyz3 GTF2GFF exon 71874 12227 Super=NR_55555
xyz3 GTF2GFF exon 72613 12721 Super=NR_55555
xyz3 GTF2GFF exon 83221 14408 Super=NR_55555
xyz3 GTF2GFF gene 84362 29370 key=WASH7P;Note=extra;Name=ALPHA
xyz3 GTF2GFF exon 84362 14829 Super=NR_222222
# required output:
#xyz3:77774:XYZ123
#xyz3:84362:WASH7P

awk -F" " '
{
  if( $3 == "gene" ) {
    split($6, triplet, /[;=]/)
    printf("%s:%s:%s\n", $1, $4, triplet[2] )
  }
}
' genetics.txt
Listing A.21 matches input lines whose third field equals gene, after which the array triplet is populated with the components of the sixth field, using the characters ";" and "=" as delimiters within that field. The output consists of the first field, the fourth field, and the second element of the array triplet. The output from launching Listing A.21 is here:
xyz3:77774:XYZ123
xyz3:84362:WASH7P
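Incidentally, split() returns the number of components that it extracts, which you can verify with a one-line sketch:

echo "key=XYZ123;NB=standard;Name=extra" | awk '{ n = split($0, parts, /[;=]/); print n, parts[2] }'
# prints: 6 XYZ123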
Listing A.22 displays the contents of the dataset diagonal.csv and Listing A.23 displays the contents of the shell script diagonal.sh that displays the elements on the main diagonal and the off-diagonal, and also computes the sum of the elements on the main diagonal and the off-diagonal.
1,1,1,1,1
5,4,3,2,1
8,8,1,8,8
5,4,3,2,1
1,6,6,7,7
# NF is the number of fields in the current record.
# NR is the number of the current record/line
# (not the number of records in the file).
# In the END block (or the last line of the file)
# it's the number of lines in the file.
# Solution in R: https://gist.github.com/dsparks/3693115

echo "Main diagonal:"
awk -F"," '{ for (i=0; i<=NF; i++) if (NR >= 1 && NR == i) print $(i) }' diagonal.csv

echo "Off diagonal:"
awk -F"," '{print $(NF+1-NR)}' diagonal.csv

echo "Main diagonal sum:"
awk -F"," '
BEGIN { sum = 0 }
{
  for (i=0; i<=NF; i++) {
    if (NR >= 1 && NR == i) {
      sum += $i
    }
  }
}
END { printf ("sum = %s\n",sum) }
' diagonal.csv

echo "Off diagonal sum:"
awk -F"," '
BEGIN { sum = 0 }
{
  for (i=0; i<=NF; i++) {
    if(NR >= 1 && i+NR == NF+1) {
      sum += $i;
    }
  }
}
END { printf ("sum = %s\n",sum) }
' diagonal.csv
Listing A.23 starts with an awk command that contains a loop that matches the "diagonal" elements of the dataset, which is to say the first field of the first record, the second field of the second record, the third field of the third record, and so forth. This matching process is handled by the conditional logic inside the for loop.

The second part of Listing A.23 contains an awk command that prints the off-diagonal elements of the dataset, using a very simple print statement.

The third part of Listing A.23 contains an awk command with the same logic as the first awk command, which then calculates the cumulative sum of the diagonal elements.

The fourth part of Listing A.23 contains an awk command whose logic is similar to the first awk command, with the following variation:
if(NR >= 1 && i+NR == NF+1)
The preceding logic enables us to calculate the cumulative sum of the off-diagonal elements. The output from launching Listing A.23 is here:
Main diagonal:
1
4
1
2
7
Off diagonal:
1
2
1
4
1
Main diagonal sum:
sum = 15
Off diagonal sum:
sum = 9
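Since the dataset is square (NF equals the number of rows), both sums can also be computed in a single pass, as in this sketch:

awk -F"," '
{ main += $NR; off += $(NF+1-NR) }
END { print "main diagonal sum =", main; print "off diagonal sum =", off }
' diagonal.csv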
Listing A.24, Listing A.25, and Listing A.26 display the contents of the datasets rain1.csv, rain2.csv, and rain3.csv that are used in several shell scripts in this section.
1,0.10,53,15
2,0.12,54,16
3,0.19,65,10
4,0.25,86,23
5,0.18,57,17
6,0.23,79,34
7,0.34,66,21
1,0.00,63,24
2,0.02,64,25
3,0.09,75,19
4,0.15,66,28
5,0.08,67,36
6,0.13,79,23
7,0.24,68,25
1,1.00,83,34
2,0.02,84,35
3,1.09,75,19
4,0.15,86,38
5,1.08,87,36
6,0.13,79,33
7,0.24,88,45
Listing A.27 displays the contents of the shell script rainfall1.sh that sums the numbers in the corresponding columns of several CSV files and displays the results.
# => Calculate COLUMN averages for multiple files
#columns in rain.csv:
#DOW,inches of rain, degrees F, humidity (%)
#files: rain1.csv, rain2.csv, rain3.csv

echo "FILENAMES:"
ls rain?.csv

awk -F',' '
{
  inches+=$2
  degrees+=$3
  humidity+=$4
}
END {
  printf("FILENAME: %s\n", FILENAME)
  printf("inches: %.2f\n", inches/7)
  printf("degrees: %.2f\n", degrees/7)
  printf("humidity: %.2f\n", humidity/7)
}
' rain?.csv
Listing A.27 calculates the sums of the numbers in three columns (i.e., inches of rainfall, degrees Fahrenheit, and humidity as a percentage) in the datasets specified by the filename pattern rain?.csv, which in this particular example consists of the datasets rain1.csv, rain2.csv, and rain3.csv, and then divides each sum by 7 (the number of rows in each dataset). Thus, Listing A.27 can handle multiple datasets (rain1.csv through rain9.csv). You can generalize this example to handle any dataset that starts with the string rain and ends with the suffix csv by using the following pattern:

rain*.csv
The output from launching Listing A.27 is here:
FILENAMES:
rain1.csv rain2.csv rain3.csv
FILENAME: rain3.csv
inches: 0.83
degrees: 217.71
humidity: 79.43
Listing A.28 displays the contents of the shell script rainfall2.sh that averages the numbers in the corresponding rows of several CSV files and displays the results.
# => Calculate ROW averages for multiple files
#columns in rain.csv:
#DOW,inches of rain, degrees F, humidity (%)
#files: rain1.csv, rain2.csv, rain3.csv

awk -F',' '
{
  mon_rain[FNR]+=$2
  mon_degrees[FNR]+=$3
  mon_humidity[FNR]+=$4
  idx[FNR]++
}
END {
  printf("DAY INCHES DEGREES HUMIDITY\n")
  for(i=1; i<=FNR; i++){
    printf("%3d %-6.2f %-8.2f %-7.2f\n",
           i, mon_rain[i]/idx[i], mon_degrees[i]/idx[i], mon_humidity[i]/idx[i])
  }
}
' rain?.csv
Listing A.28 is similar to Listing A.27, except that this code sample uses the value of FNR (the record number within the current file) in order to calculate the average rainfall, degrees Fahrenheit, and percentage humidity for each day of the week across the three files. The output from launching Listing A.28 is here:
DAY INCHES DEGREES HUMIDITY
  1 0.37   66.33    24.33
  2 0.05   67.33    25.33
  3 0.46   71.67    16.00
  4 0.18   79.33    29.67
  5 0.45   70.33    29.67
  6 0.16   79.00    30.00
  7 0.27   74.00    30.33
Listing A.29, Listing A.30, and Listing A.31 display the contents of the datasets zain1.csv, zain2.csv, and zain3.csv that are used in an upcoming shell script in this section.
1,0.10,53,15
2,0.12,54,16
3,0.19,65,10
4,0.25,86,23
5,0.18,57,17
6,0.23,79,34
7,0.34,66,21
1,0.00,63,24
2,0.02,64,25
3,0.09,75,19
4,0.15,66,28
5,0.08,67,36
6,0.13,79,23
7,0.24,68,25
1,1.00,83,34
2,0.02,84,35
3,1.09,75,19
4,0.15,86,38
5,1.08,87,36
6,0.13,79,33
7,0.24,88,45
Listing A.32 displays the contents of the shell script rainfall3.sh that averages the numbers in the corresponding rows of several CSV files and displays the results.
# => Calculate ROW averages for multiple files (backtick)
#columns in rain.csv:
#DOW,inches of rain, degrees F, humidity (%)

# specify the list of CSV files (supports multiple patterns)
files=`ls rain*csv zain*csv`
echo "FILES: `echo $files`"

awk -F',' '
{
  mon_rain[FNR]+=$2
  mon_degrees[FNR]+=$3
  mon_humidity[FNR]+=$4
  idx[FNR]++
}
END {
  printf("DAY INCHES DEGREES HUMIDITY\n")
  for(i=1; i<=FNR; i++){
    printf("%3d %-6.2f %-8.2f %-7.2f\n",
           i, mon_rain[i]/idx[i], mon_degrees[i]/idx[i], mon_humidity[i]/idx[i])
  }
}
' `echo $files`
Listing A.32 performs the same calculations as Listing A.28, with the following variation: the datasets are specified by the variable files, which is defined via the filename patterns in `ls rain*csv zain*csv`. You can modify these patterns to include any list of files that needs to be processed. Notice that the final line of code in Listing A.32 uses backtick (command) substitution to expand the contents of the variable files:

' `echo $files`
As yet another variation, you can specify a file - let's call it filelist.txt - that contains a list of filenames that you want to process, and then replace the preceding line as follows:

' `cat filelist.txt`
The output from launching Listing A.32 is here:
FILES: rain1.csv rain2.csv rain3.csv zain1.csv zain2.csv zain3.csv
DAY INCHES DEGREES HUMIDITY
  1 0.37   66.33    24.33
  2 0.05   67.33    25.33
  3 0.46   71.67    16.00
  4 0.18   79.33    29.67
  5 0.45   70.33    29.67
  6 0.16   79.00    30.00
  7 0.27   74.00    30.33
Listing A.33 displays the contents of the shell script linear-combo.sh that computes various linear combinations of the columns in multiple datasets and displays one combined dataset as the output.
# => combinations of columns
awk -F',' '
{
  $2 += $3 * 2 + $4 / 2
  $3 += $4 / 3 + $2 * $2 / 10
  $4 += $2 + $3
  $1 += $2 * 3 - $4 / 10
  printf("%d,%.2f,%.2f,%.2f\n",$1,$2,$3,$4)
}
' rain?.csv
Listing A.33 processes the values of the datasets rain1.csv, rain2.csv, and rain3.csv whose contents are shown earlier in this section. The key observation is that the sequence of calculations in the body of the awk statement involves inter-dependencies.

Specifically, the value of $2 is a linear combination of the values of $3 and $4. Next, the value of $3 is a linear combination of the values of $4 and $2, where the latter is not the original value from the dataset but its newly calculated value. Third, the value of $4 is a linear combination of $2 and $3, both of which are calculated values rather than the values in the dataset. Finally, the value of $1 is a linear combination of the newly calculated values of $2 and $4.
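To make these inter-dependencies concrete, here is the arithmetic for the first row of rain1.csv (1,0.10,53,15), which produces the first line of the output below:

$2 = 0.10 + 53*2 + 15/2            = 113.60
$3 = 53 + 15/3 + 113.60*113.60/10  = 1348.496 (printed as 1348.50)
$4 = 15 + 113.60 + 1348.496        = 1477.096 (printed as 1477.10)
$1 = 1 + 113.60*3 - 1477.096/10    = 194.09 (printed as 194 via %d)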
As you can see, awk provides the flexibility to specify practically any combination of calculations (including non-linear combinations) in a very simple and sequential fashion. The output of Listing A.33 is here:
194,113.60,1348.50,1477.10
196,116.12,1407.72,1539.84
204,135.19,1895.97,2041.16
187,183.75,3470.07,3676.82
202,122.68,1567.70,1707.38
194,175.23,3160.89,3370.12
207,142.84,2113.33,2277.17
201,138.00,1975.40,2137.40
202,140.52,2046.92,2212.44
201,159.59,2628.23,2806.82
203,146.15,2211.32,2385.47
203,152.08,2391.83,2579.91
199,169.63,2964.10,3156.73
206,148.74,2288.69,2462.43
183,184.00,3479.93,3697.93
182,185.52,3537.43,3757.95
200,160.59,2660.25,2839.84
179,191.15,3752.50,3981.65
178,193.08,3826.99,4056.07
195,174.63,3139.56,3347.19
173,198.74,4052.76,4296.50
In this appendix, you saw examples of how to use some useful and versatile bash commands. First you saw examples of shell scripts for various tasks involving recursion, such as computing the GCD (greatest common divisor) and the LCM (lowest common multiple) of two positive integers, the Fibonacci value of a positive integer, and also the prime divisors of a positive integer.

Next you saw a bash script with the grep command, a while loop, and other constructs that append data to a log file, with logic to determine when to exit the bash script. In addition, you learned how to use the grep command to simulate a simple relational database.

In the final portion of this appendix you learned how to use awk to process records that span multiple lines, how to compute column sums and averages involving multiple datasets, and how to use awk-related functions such as gsub() and split(). Finally, you learned how to dynamically calculate various combinations of columns of numbers from multiple datasets.