Perl Boot Camp, Part 2: Variables and Data Types

Data types are the kinds of values Perl supports. Common data types include arbitrarily long strings (e.g., "hi, bob"), integers (e.g., 42), and floating-point numbers (e.g., 3.14). Perl is a loosely typed language, which means that Perl works hard to let you forget about what kind of data you're dealing with. For the most part, you will be dealing with strings, which plays to Perl's strengths. To manipulate data, you use variables. Table 41-1 lists the most common variable types in Perl. For the full story on Perl data types, read the perldata manpage.

Table 41-1. Common Perl variables

Name	Example	Description
`scalar`	`$lastname`, `$PI`	Holds single values
`array`	`@people`, `$people[0]`	Holds an ordered sequence of scalar values
`hash`	`%cgi_params`, `$cgi_params{'action'}`	Holds a set of key-value pairs

Scalars

When you want to store single values, like any of those given in the previous paragraph, you will use a scalar variable. Scalars are labeled with a $ followed by a letter and any sequence of letters, numbers, and underscores. Scalars defined at the top of scripts are often used as constants. You may need to tweak some of them, particularly those containing filesystem paths, to get third-party scripts to run on your system.

Of course, values can be compared to each other or added together. Perl has one set of relational operators that treats values as numbers and another set that treats values as strings. Although the operators differ, Perl makes scalar values do the right thing most of the time. For example, suppose you want to create a series of filenames of the form mail_num, where num runs from 1 to 10. The following code does this.

foreach my $num (1..10) {
   print "mail_" . $num . "\n";
}

Even though $num is a number, the string concatenation operator is able to use it as a string. Table 41-2 shows string operators, and Table 41-3 shows the numerical ones. See the perlop manpage for the full story.

Table 41-2. String operators

Operator	Example	Description
`.`	`$salutation . " Jones"`	String concatenation
`eq`	`$foo eq $bar`	String equality test
`ne`	`$bar ne $baz`	String inequality test
`gt`	`$name gt "Bob"`	True if left string comes after right in ASCII
`lt`	`$name lt "Xavier"`	True if left string comes before right in ASCII
`cmp`	`$name cmp "Wilson"`	Return -1 if left operand ASCII-sorts before the right; 0 if right and left are equal; 1 if right sorts before left
`lc`	`lc "Bob"`	Return an all-lowercase copy of the given string
`uc`	`uc "lorrie"`	Return an all-uppercase copy of the given string

Table 41-3. Numerical operators

Operator	Example	Description
`+`	`$a + 1`	Numerical addition
`-`	`$c - 2`	Numerical subtraction
`*`	`3 * $b`	Numerical multiplication
`/`	`4/$non_zero`	Numerical division
`++`	`$a++`	Autoincrement; adds one to a number
`==`	`$a == $b`	Numeric equality test
`!=`	`$p != $q`	Numeric inequality test
`<`	`$diff < 32`	Numeric less-than test
`>`	`$sum > 64`	Numeric greater-than test
`<=>`	`$sum <=> 64`	Return -1 if left is numerically less than right; 0 if left equals right; 1 if right is less than left
`<=`	`$sum <= 64`	True if left operand is numerically less than or equal to right
`>=`	`$sum >= 64`	True if left is numerically greater than or equal to right
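
The difference between the two operator families shows up as soon as values look like numbers. The following is a minimal sketch using made-up sample values ($x, $y, and @sizes are illustrative names, not taken from the tables) that compares the same data once as numbers and once as strings:

```perl
use strict;
use warnings;

# Hypothetical sample values chosen for illustration.
my $x = "10";
my $y = "10.0";

# == compares as numbers: 10 and 10.0 are the same number.
my $numeric_equal = ($x == $y) ? "yes" : "no";

# eq compares as strings: "10" and "10.0" differ character by character.
my $string_equal = ($x eq $y) ? "yes" : "no";

print "numerically equal: $numeric_equal\n";   # yes
print "equal as strings:  $string_equal\n";    # no

# The same distinction matters when sorting.
my @sizes     = (10, 9, 100);
my @by_string = sort { $a cmp $b } @sizes;     # 10 100 9
my @by_number = sort { $a <=> $b } @sizes;     # 9 10 100

print "string sort:  @by_string\n";
print "numeric sort: @by_number\n";
```

Choosing cmp when the data is numeric is an easy mistake to make, because the result is still a valid ordering, just not the one you wanted.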

You may have noticed that some of the operators in the previous tables were described as returning true or false values. A true value in Perl is any value that isn't false, and there are only four kinds of false values in Perl:

values that are the number zero or the string "0"
values that are empty strings
values that are undef
empty lists
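
To see these rules in action, here is a small sketch. The helper name truth is hypothetical and exists only for this illustration. One edge case worth knowing is that the string "0" is false, while other non-empty strings, even ones that look like zero such as "0.0", are true.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how Perl treats a scalar in a Boolean test.
sub truth {
    my ($value) = @_;
    return $value ? "true" : "false";
}

print "0     is ", truth(0),     "\n";   # false
print "''    is ", truth(''),    "\n";   # false
print "undef is ", truth(undef), "\n";   # false
print "'0'   is ", truth('0'),   "\n";   # false: the string "0" counts as zero
print "'0.0' is ", truth('0.0'), "\n";   # true: a non-empty string other than "0"
print "'abc' is ", truth('abc'), "\n";   # true

# An array in a Boolean test yields its length; an empty one gives 0, which is false.
my @empty = ();
my $empty_is = @empty ? "true" : "false";
print "empty array is $empty_is\n";      # false
```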

Like many other languages, Perl supports Boolean operators (see Table 41-4) that return true or false values. Typically, you encounter these in if statements like the following:

if ($temp < 30 && $is_rainy) {
  print "I'm telecommuting today\n";
}

Another common use of Boolean operators is to short-circuit two expressions. This is a way to prevent the right operand from executing unless the left operand returns a desired truth value. Consider the very ordinary case of opening a filehandle for reading. A common idiom to do this is:

open (FH, "filename") || die "Can't open file";

This short-cut operation depends on the open function returning a true value if it can open the requested file. Only if it cannot is the right side of the || operator executed (die prints whatever message you provide and halts the program).
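
The sketch below exercises the same idiom without halting the script. It makes a few assumptions that are not in the text above: the path is a made-up name presumed not to exist, the filehandle is a lexical variable opened with the three-argument form of open, and eval is used to trap the die so the message can be inspected. It also shows the other common use of ||, which is supplying a default value.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist, so that open fails.
my $path = "/no/such/dir/no-such-file";

# eval traps the die so the script can carry on and report the message.
my $ok = eval {
    open(my $fh, '<', $path) || die "Can't open $path: $!\n";
    close $fh;
    1;
};
my $error = $ok ? '' : $@;
print "open failed: $error" unless $ok;

# The same short-circuit supplies a default when the left side is false.
my $user_choice = '';
my $editor      = $user_choice || 'vi';
print "editor is $editor\n";   # editor is vi
```

Including $! in the message is a small addition that tells the reader why the open failed, not just that it did.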

Table 41-4. Boolean operators

Operator	Example	Description
`&&`	`$a && $b`	True if both $a and $b are true
`||`	`$a || $b`	True if either $a or $b is true
`!`	`!$a`	True if $a is false
`and`	`$a and $b`	Same as &&, but with a lower precedence
`or`	`$a or $b`	Same as ||, but with a lower precedence
`not`	`not $a`	Same as !, but with a lower precedence

Looking at Table 41-4, you will notice that there appear to be redundant operators. The operators that are English words have a lower precedence than the symbolic ones. Precedence is simply the order in which Perl evaluates the operators in an expression. You are probably familiar with precedence rules from mathematics:

1 + 2 * 3 + 4 = 11
(1 + 2) * (3 + 4) = 21

Example 41-2. Precedence

lc $a || "BB"   # like (lc $a) || ("BB")
lc ($a || "BB")

Because || has a lower precedence than the lc operator, the first line of Example 41-2 is a Boolean test between two expressions. In the second line, the Boolean || operator is used to supply a default argument to lc should $a be a false value.
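
To make the two readings concrete, the following sketch runs both forms on a false value. The variable is named $name here (an illustrative choice) and holds an empty string, so the fallback "BB" is used in both cases; only the placement of the parentheses decides whether lc is applied to it.

```perl
use strict;
use warnings;

my $name = '';   # a false value (the empty string)

# lc binds tighter than ||, so this is (lc $name) || "BB":
# lc returns '', which is false, and "BB" is returned unchanged.
my $first = lc $name || "BB";

# Parentheses let || pick the argument first; lc then lowercases it.
my $second = lc($name || "BB");

print "first:  $first\n";    # first:  BB
print "second: $second\n";   # second: bb
```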

Because Perl doesn't require parentheses around built-in operators and functions, you will often see code like:

open FH, "> " . "filename" or die "Can't open file";
print FH "[info]: disk write error\n";

Precedence ambiguities can be resolved by using parentheses where doubt occurs.

Although Perl has many special variables, the one you'll encounter most is $_. Many operators and functions, such as lc and print, will operate on $_ in the absence of an explicit parameter, as in Example 41-3.

Example 41-3. Simple echo loop

while(<>){
   print
}

In this example, every line read from standard input with the <> operator is available inside the while (Section 41.7) loop through $_. The print function, in the absence of an explicit argument, echoes the value of $_. Note that $_ can be assigned to (e.g., $_ = "Hello, Perl") just like any other scalar.
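
Input loops are not the only place $_ appears. As a rough sketch (the array names are invented for the example), a foreach loop without a named loop variable also places each element in $_, and lc picks it up when called with no argument:

```perl
use strict;
use warnings;

# Illustrative data; the names are not from the text.
my @mixed = ('ALPHA', 'Beta', 'gamma');
my @quiet;

# foreach without a loop variable sets $_ to each element in turn.
foreach (@mixed) {
    my $lower = lc;          # lc with no argument lowercases $_
    push @quiet, $lower;
}

# The statement-modifier form of foreach uses $_ as well.
print "$_\n" foreach @quiet;   # alpha, beta, gamma on separate lines
```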

Arrays

When you want to collect more than one value into a variable, you have two ways to go in Perl. If you need an ordered set of values, you will choose to use a Perl array. These variables start with @ and are followed by a label that follows the same convention as a scalar. Two global arrays have already been mentioned: @INC and @ARGV. Since arrays hold multiple values, getting and setting values is a little different from scalars. Here's an example of creating an array with values, looking at one, and assigning a new value to that array index.

@things    = ('phone', 'cat', 'hard drive');
print "The second element is: ", $things[1], "\n";

$things[1] = 'dog';
print "The second element is now: ", $things[1], "\n";

In the first line, the array @things is initialized with a list of three scalar values. Array indexes begin with zero, so the second element is accessed through the index value of 1. Arrays will grow as needed, so you could have added a fourth element like this:

$things[3] = 'DVD player';

Why is a $ used here and not @? Use @ only when referring to the whole array variable. Each element is a scalar whose name is $things[index]. This rule comes up again when dealing with hashes.
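
The following sketch explores how an array grows and how to ask for its size. It reuses @things from above but assigns to index 5 instead of 3 to show what happens to the skipped slots; the names $count and $last_index are chosen only for illustration.

```perl
use strict;
use warnings;

my @things = ('phone', 'cat', 'hard drive');

# Assigning past the current end grows the array; skipped slots hold undef.
$things[5] = 'DVD player';

my $count      = scalar @things;   # number of elements: 6
my $last_index = $#things;         # index of the last element: 5

print "count: $count, last index: $last_index\n";
print "slot 3 is ", (defined($things[3]) ? "defined" : "undef"), "\n";   # undef

# A negative index counts back from the end of the array.
print "last element: $things[-1]\n";   # DVD player
```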

Typically you will want to iterate through all the values in an array, which is done with loops (Section 41.7). Although there are several looping constructs, the most common idiom to examine all the values in an array sequentially is shown in Example 41-4.

Example 41-4. Using foreach to loop through an array

print "Paths Perl checks for modules\n";
foreach my $el (@INC) {
  print $el, "\n";
}

Lists are a data type that is closely related to arrays. Lists are sequences of scalar values enclosed in parentheses that are not associated with an array variable. They are used to initialize a new array variable. Common array operators are listed in Table 41-5.

my @primes     = (2,3,5,7,11,13);
my @empty_list = ( );

Table 41-5. Common array operators

Name	Example	Description
`pop`	`$last = pop @array;`	Return last element of array; remove that element from array
`push`	`push @array, @new_elements;`	Add the contents of `@new_elements` to the end of target array
`shift`	`$first = shift @array;`	Return the first element of array; shift all elements one index lower (removing the first element)
`unshift`	`unshift @array, @new_elements;`	Add `@new_elements` to the beginning of target array
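
A short sketch of the four operators in Table 41-5 working on one array; @queue and its contents are made up for the example, and the comments track the array's contents after each step.

```perl
use strict;
use warnings;

my @queue = ('b', 'c');

push @queue, 'd', 'e';       # add to the end:       b c d e
unshift @queue, 'a';         # add to the beginning: a b c d e

my $last  = pop @queue;      # removes and returns 'e'
my $first = shift @queue;    # removes and returns 'a'

print "first: $first, last: $last, remaining: @queue\n";
# first: a, last: e, remaining: b c d
```

Used in pairs, push with pop treats the array as a stack, while push with shift treats it as a queue.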

Hashes

Associative arrays, or hashes, are a collection of scalar values that are arranged in key-value pairs. Instead of using integers to retrieve values in a hash, strings are used. Hashes begin with %. Example 41-5 shows a hash variable in action.

Example 41-5. Using hashes

my %birthdays = (
                 'mom'    => 'JUN 14',
                 'archie' => 'JUN 12',
                 'jay'    => 'JUL 11',
                );

print "Archie's birthday is: ", $birthdays{'archie'}, "\n";
$birthdays{'joe'} = 'DEC 12';
print "My birthday is: ", $birthdays{'joe'}, "\n";

Hashes are a funny kind of list. When initializing a hash with values, it is common to arrange the list in key-value pairs. The strange-looking => operator is often called a "fat comma" because these two lines of Perl do the same thing:

%birthdays = ( 'jay' => 'JUL 11' );
%birthdays = ( 'jay', 'JUL 11');

Use the fat comma when initializing hashes since it conveys the association between the values better. As an added bonus, the fat comma makes unquoted barewords on its left into quoted strings.

Example 41-6 shows some quoting styles for hash keys.

Example 41-6. Various quoting styles for hash keys

my %baz = ( foo => 1,
            'bar', 2,
            'boz' => 3);

Unlike arrays, hashes use strings to index into the list. So to retrieve the birthday of "jay", put the key inside curly braces, like this:

print "Jay's birthday is: ", $birthdays{'jay'}, "\n";

Because Perl autoquotes a bareword used as a hash key, you may omit the quotes between the curly braces (e.g., $birthdays{jay}). Like arrays, hashes will grow as you need them to. Whenever you need to model a set or record the number of event occurrences, hashes are the variable to use.
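
Counting occurrences and modeling a set might look like the following sketch. The event names and the variables %count, %seen, and @unique are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical list of events to tally.
my @events = qw(login logout login error login error);

# Counting: a key that is not there yet starts from undef, which ++ treats as 0.
my %count;
$count{$_}++ foreach @events;

foreach my $event (sort keys %count) {
    print "$event: $count{$event}\n";
}
# error: 2
# login: 3
# logout: 1

# Modeling a set: the keys are the members; the values are just placeholders.
my %seen   = map { $_ => 1 } @events;
my @unique = sort keys %seen;
print "unique events: @unique\n";   # unique events: error login logout
```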

As with arrays, you will often need to iterate over the set of key-value pairs in a hash. Two common techniques for doing this are shown in Example 41-7. Table 41-6 lists common Perl hash functions.

Example 41-7. Iterating over a hash

my %example = (foo => 1, bar => 2, baz => 3);

while (my ($key, $value) = each %example) {
   print "$key has a value of $value\n";
}

foreach my $key (keys %example) {
  print "$key has a value of $example{$key}\n";
}

Table 41-6. Common Perl hash functions

Name	Example	Description
`delete`	`delete $hash{"key"}`	Delete the key-value pair from hash that is indexed on `key`
`each`	`($key, $value) = each %hash`	Return the next key-value pair in hash; the pairs aren't usefully ordered
`exists`	`print "key found" if exists $hash{"key"}`	Return true if hash has `key`, even if that key's value is undefined
`keys`	`@keys = keys %hash`	Return the list of keys in the hash; not ordered
`values`	`@values = values %hash`	Return the list of values in the hash; values will be in the same order as keys fetched by `keys %hash`
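
The difference between a key that exists and a value that is defined is easy to blur, so here is a brief sketch. The %settings hash and its keys are hypothetical; the loop sorts the keys because keys returns them in no useful order.

```perl
use strict;
use warnings;

# Hypothetical settings hash; 'color' is present but has no value yet.
my %settings = (editor => 'vi', color => undef, pager => 'less');

# exists checks for the key; defined checks the value stored under it.
my $has_color     = exists($settings{color})  ? 1 : 0;   # 1
my $color_defined = defined($settings{color}) ? 1 : 0;   # 0

# delete removes the pair and returns the value that was stored.
my $removed = delete $settings{pager};                    # 'less'

# Sort the keys for stable output.
foreach my $key (sort keys %settings) {
    my $value = defined($settings{$key}) ? $settings{$key} : '(undef)';
    print "$key = $value\n";
}
# color = (undef)
# editor = vi
```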

References

As odd as it may first seem, it is sometimes necessary to have variables for variables. A funny kind of scalar, a reference is a sort of IOU that promises where the original variable's data can be found. References are primarily used in three cases. First, because hashes and arrays store only scalar values, the only way to store one multivalued data type in another is to store a reference instead (see the perldsc manpage for more details). Second, when the size of a data structure makes a variable inefficient to pass into subroutines, a reference is passed instead. Third, because subroutines normally work on copies of their arguments, changes made to those copies never reach the calling context. If you give a subroutine a reference as an argument, it can change the referenced value in the caller. Consult the perlref and perlreftut manpages for more details on references.

Taking a reference to a variable is straightforward. Simply use the reference operator, \, to create a reference. For example:

$scalar_ref = \$bob;
$array_ref  = \@things;
$hash_ref   = \%grades;

You can even create references without variables:

$anonymous_array = [ 'Mojo Jo-Jo', 'Fuzzy Lumpkins', 'Him' ];
$anonymous_hash  = { 'pink'  => 'Blossom',
                     'green' => 'Buttercup',
                     'blue'  => 'Bubbles',
                   };

The square brackets return a reference to a new array holding the list that they surround. The curly braces create a reference to a hash. Arrays and hashes created in this way are called anonymous because there is no named variable to which these references refer.

There are two ways of dereferencing references (that is, getting back the original values). The first way is to use {}. For instance:

print "Your name is: ", ${$scalar_ref};

foreach my $el ( @{$anonymous_array} ) {
  print "Villain: $el\n";
}

while (my ($key, $value) = each %{$anonymous_hash}) {
  print "$key is associated with $value\n";
}

The second way, using ->, is useful only for references to collection types.

print "$anonymous_hash->{'pink'} likes the color pink\n"; # 'Blossom'
print "The scariest villain of all is $anonymous_array->[2]\n"; # 'Him'
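
Two of the uses for references described at the start of this section, nesting one collection inside another and letting a subroutine change the caller's data, can be combined in one sketch. The %pets hash and the add_pet subroutine are hypothetical names made up for this illustration.

```perl
use strict;
use warnings;

# A hash of arrays: each value is a reference to an anonymous array.
my %pets = (
    cats => [ 'Tom', 'Felix' ],
    dogs => [ 'Rex' ],
);

# Receives an array reference, so push changes the caller's data.
sub add_pet {
    my ($list_ref, $name) = @_;
    push @{$list_ref}, $name;
    return scalar @{$list_ref};   # new number of elements
}

my $dog_count = add_pet($pets{dogs}, 'Fido');

print "dogs: @{ $pets{dogs} } ($dog_count total)\n";   # dogs: Rex Fido (2 total)
print "first cat: $pets{cats}->[0]\n";                 # first cat: Tom
```

Passing $pets{dogs} hands over the reference itself, so no copy of the array is made, which also covers the efficiency case when the data is large.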