A Perl scalar variable holds a single value. An array holds an ordered list of zero or more scalars. A hash holds a collection of scalars as values, keyed by other scalars.
Although a scalar can be an arbitrary string, which allows complex data to be encoded into an array or hash, none of the three data types is well suited to representing complex interrelationships among data. This is a job for the reference. Let’s look at the importance of references by starting with an example.
Before the Minnow can leave on an excursion (e.g., a three-hour tour), every passenger and crew member should be checked to ensure they have all the required trip items in their possession. Let’s say that for maritime safety, every person on board the Minnow needs to have a life preserver, some sunscreen, a water bottle, and a rain jacket. You can write a bit of code to check for the Skipper’s supplies:
my @required = qw(preserver sunscreen water_bottle jacket);
my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
for my $item (@required) {
  unless (grep $item eq $_, @skipper) { # not found in list?
    print "skipper is missing $item.\n";
  }
}
The grep in a scalar context returns the number of times the expression $item eq $_ returns true, which is nonzero if the item is in the list and 0 if not.[14] If the value is 0, it’s false, and you print the message.
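A quick way to see what grep returns in scalar context is to assign its result to a scalar, which supplies the same scalar context that unless does. This is a small sketch; the @packed list, with its duplicate hat, is a made-up example rather than one of the passenger lists:

```perl
use strict;
use warnings;

my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
my @packed  = qw(hat hat jacket);   # hypothetical list containing a duplicate

# In scalar context, grep returns how many elements made the test true.
my $jacket_count = grep $_ eq 'jacket', @skipper;        # found once
my $water_count  = grep $_ eq 'water_bottle', @skipper;  # not found
my $hat_count    = grep $_ eq 'hat', @packed;            # found twice

print "jacket: $jacket_count, water_bottle: $water_count, hat: $hat_count\n";
```

Because the count can exceed 1 when an item is listed twice, the loop only relies on the count being nonzero (true) or zero (false).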
Of course, if you want to check on Gilligan and the Professor, you might write the following code:
my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
for my $item (@required) {
  unless (grep $item eq $_, @gilligan) { # not found in list?
    print "gilligan is missing $item.\n";
  }
}

my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
for my $item (@required) {
  unless (grep $item eq $_, @professor) { # not found in list?
    print "professor is missing $item.\n";
  }
}
You may start to notice a lot of repeated code here and decide that it would be best served by a subroutine:
sub check_required_items {
  my $who = shift;
  my @required = qw(preserver sunscreen water_bottle jacket);
  for my $item (@required) {
    unless (grep $item eq $_, @_) { # not found in list?
      print "$who is missing $item.\n";
    }
  }
}

my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
check_required_items("gilligan", @gilligan);
The subroutine is given five items in its @_ array initially: the name gilligan and the four items belonging to Gilligan. After the shift, @_ will have only the items. Thus, the grep checks each required item against the list.
So far, so good. You can check the Skipper and the Professor with just a bit more code:
my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
check_required_items("skipper", @skipper);
check_required_items("professor", @professor);
And for the other passengers, you repeat as needed. Although this code meets the initial requirements, you’ve got two problems to deal with:
To create @_, Perl copies the entire contents of the array to be scanned. This is fine for a few items, but if the array is large, it seems a bit wasteful to copy the data just to pass it into a subroutine.
Suppose you want to modify the original array to force the provisions list to include the mandatory items. Because you have a copy in the subroutine (“pass by value”), any changes made to @_ aren’t reflected automatically in the corresponding provisions array.[15]
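The following sketch illustrates both halves of this point, including the exception described in footnote 15. The subroutines try_to_add_jacket and replace_first_item are hypothetical, written only for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: pushing onto @_ grows only the subroutine's own list.
sub try_to_add_jacket {
    push @_, 'jacket';
    return scalar @_;
}

# Hypothetical helper: $_[0] is an alias for the caller's first element.
sub replace_first_item {
    $_[0] = 'blue_shirt';
}

my @gilligan = qw(red_shirt hat lucky_socks water_bottle);

my $inside = try_to_add_jacket(@gilligan);
print "inside: $inside items, outside: ", scalar(@gilligan), " items\n";

replace_first_item(@gilligan);
print "first item is now $gilligan[0]\n";
```

Pushing onto @_ changes only the subroutine’s argument list, so @gilligan keeps four items, while assigning to $_[0] does change $gilligan[0], because each element of @_ is an alias for the corresponding element passed in.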
To solve either or both of these problems, you need pass by reference rather than pass by value. And that’s just what the doctor (or Professor) ordered.
Among its many other meanings, the backslash (\) character is also the “take a reference to” operator. When you use it in front of an array name, e.g., \@skipper, the result is a reference to that array. A reference to the array is like a pointer: it points at the array, but is not the array itself.
A reference fits wherever a scalar fits. It can go into an element of an array or a hash, or into a plain scalar variable, like this:
my $reference_to_skipper = \@skipper;
The reference can be copied:
my $second_reference_to_skipper = $reference_to_skipper;
or even:
my $third_reference_to_skipper = \@skipper;
All three references are completely interchangeable. You can even say they’re identical:
if ($reference_to_skipper == $second_reference_to_skipper) {
  print "They are identical references.\n";
}
This equality compares the numeric forms of the two references. The numeric form of the reference is the unique memory address of the @skipper internal data structure, unchanging during the life of the variable. If you look at the string form instead, with eq or print, you get a debugging string:
ARRAY(0x1a2b3c)
which again is unique for this array because it includes the hexadecimal (base 16) representation of the array’s unique memory address. The debugging string also notes that this is an array reference. Of course, if you ever see something like this in your output, it almost certainly means there’s a bug; users of your program have little interest in hex dumps of storage addresses!
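Here is a sketch of the comparisons described above, using the names from the text plus a hypothetical $reference_to_gilligan for contrast. The address inside the debugging string changes from run to run, so the printed string is not shown:

```perl
use strict;
use warnings;

my @skipper  = qw(blue_shirt hat jacket preserver sunscreen);
my @gilligan = qw(red_shirt hat lucky_socks water_bottle);

my $reference_to_skipper        = \@skipper;
my $second_reference_to_skipper = $reference_to_skipper;  # a copy
my $third_reference_to_skipper  = \@skipper;              # new reference, same array
my $reference_to_gilligan       = \@gilligan;             # a different array

# == compares the numeric forms, i.e. the memory addresses.
print "copy matches\n"     if $reference_to_skipper == $second_reference_to_skipper;
print "new ref matches\n"  if $reference_to_skipper == $third_reference_to_skipper;
print "gilligan differs\n" if $reference_to_skipper != $reference_to_gilligan;

# The string form is the debugging string, such as ARRAY(0x1a2b3c).
my $as_string = "$reference_to_skipper";
print "$as_string\n";
```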
Because a reference can be copied, and passing an argument to a subroutine is really just copying, you can use this code to pass a reference to the array into the subroutine:
my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
check_required_items("The Skipper", \@skipper);

sub check_required_items {
  my $who = shift;
  my $items = shift;
  my @required = qw(preserver sunscreen water_bottle jacket);
  ...
}
Now $items in the subroutine will be a reference to the @skipper array. But how do you get from a reference back into the original array? By dereferencing the reference.
If you look at @skipper, you’ll see that it consists of two parts: the @ symbol and the name of the array. Similarly, the syntax $skipper[1] consists of the name of the array in the middle and some syntax around the outside to get at the second element of the array (index value 1 is the second element because you start counting index values at 0).
Here’s the trick: any reference to an array can be placed in curly braces and written in place of the name of an array, giving you a way to access the original array. That is, wherever you write skipper to name the array, you use the reference inside curly braces: { $items }. For example, both of these lines refer to the entire array:
@  skipper
@{ $items }
whereas both of these refer to the second item of the array:[16]
$  skipper [1]
${ $items }[1]
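As a quick check that the two spellings really are interchangeable, this sketch assumes $items holds \@skipper, as it would inside the subroutine after the second shift:

```perl
use strict;
use warnings;

my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
my $items   = \@skipper;

# Entire array: by name, then through the reference.
my @by_name = @skipper;
my @by_ref  = @{ $items };

# Second element (index 1): by name, then through the reference.
my $second_by_name = $skipper[1];
my $second_by_ref  = ${ $items }[1];

print "@by_ref\n";
print "$second_by_ref\n";
```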
By using the reference form, you’ve decoupled the code and the method of array access from the actual array. Let’s see how that changes the rest of this subroutine:
sub check_required_items {
  my $who = shift;
  my $items = shift;
  my @required = qw(preserver sunscreen water_bottle jacket);
  for my $item (@required) {
    unless (grep $item eq $_, @{$items}) { # not found in list?
      print "$who is missing $item.\n";
    }
  }
}
All you did was replace @_ (the copy of the provisions list) with @{$items}, a dereferencing of the reference to the original provisions array. Now you can call the subroutine a few times as before:
my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
check_required_items("The Skipper", \@skipper);

my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
check_required_items("Professor", \@professor);

my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
check_required_items("Gilligan", \@gilligan);
In each case, $items points to a different array, so the same code applies to different arrays each time it is invoked. This is one of the most important uses of references: decoupling the code from the data structure on which it operates so the code can be reused more readily.
Passing the array by reference fixes the first of the two problems mentioned earlier. Now, instead of copying the entire provision list into the @_ array, you pass a single element: a reference to that provisions array.
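To see the difference in what arrives in @_, this sketch calls a hypothetical count_args helper both ways. Passed by value, the array expands into one argument per element; passed by reference, it stays a single argument no matter how large the array is:

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many elements arrived in @_.
sub count_args {
    return scalar @_;
}

my @skipper = qw(blue_shirt hat jacket preserver sunscreen);

my $by_value     = count_args("The Skipper", @skipper);   # name plus 5 items
my $by_reference = count_args("The Skipper", \@skipper);  # name plus 1 reference

print "by value: $by_value, by reference: $by_reference\n";
```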
Could you have eliminated the two shifts at the beginning of the subroutine? Sure, at the expense of clarity:
sub check_required_items {
  my @required = qw(preserver sunscreen water_bottle jacket);
  for my $item (@required) {
    unless (grep $item eq $_, @{$_[1]}) { # not found in list?
      print "$_[0] is missing $item.\n";
    }
  }
}
You still have two elements in @_. The first element is the passenger or crew member name and is used in the error message. The second element is a reference to the correct provisions array, used in the grep expression.
Most of the time, the dereferenced array reference is contained in a simple scalar variable, such as @{$items} or ${$items}[1]. In those cases, the curly braces can be dropped, unambiguously, forming @$items or $$items[1].
However, the braces cannot be dropped if the value within the braces is not a simple scalar variable. For example, for @{$_[1]} from that last subroutine rewrite, you can’t remove the braces.
This rule also means that it’s easy to see where the “missing” braces need to go. When you see $$items[1], a pretty noisy piece of syntax, you can tell that the curly braces must belong around the simple scalar variable, $items. Therefore, $items must be a reference to an array.
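The following sketch exercises both halves of the rule. The @args array is a stand-in for @_ (it holds a name and a reference), so @{ $args[1] } plays the role of @{$_[1]} from the subroutine rewrite:

```perl
use strict;
use warnings;

my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
my $items   = \@skipper;

# Around a simple scalar variable, the braces are optional.
my $with_braces    = ${$items}[1];
my $without_braces = $$items[1];
my $size_short     = @$items;        # array in scalar context: its length

# Around anything else, such as an array element, the braces are required.
my @args  = ("The Skipper", $items);
my $count = @{ $args[1] };

print "$with_braces $without_braces $size_short $count\n";
```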
Thus, an easier-on-the-eyes version of that subroutine might be:
sub check_required_items {
  my $who = shift;
  my $items = shift;
  my @required = qw(preserver sunscreen water_bottle jacket);
  for my $item (@required) {
    unless (grep $item eq $_, @$items) { # not found in list?
      print "$who is missing $item.\n";
    }
  }
}
The only difference here is that the braces were removed for @$items.
You’ve seen how to solve the excessive copying problem with an array reference. Now let’s look at modifying the original array.
For every missing provision, push that provision onto an array, forcing the passenger to consider the item:
sub check_required_items {
  my $who = shift;
  my $items = shift;
  my @required = qw(preserver sunscreen water_bottle jacket);
  my @missing = ( );
  for my $item (@required) {
    unless (grep $item eq $_, @$items) { # not found in list?
      print "$who is missing $item.\n";
      push @missing, $item;
    }
  }
  if (@missing) {
    print "Adding @missing to @$items for $who.\n";
    push @$items, @missing;
  }
}
Note the addition of the @missing array. If you find any items missing during the scan, push them into @missing. If there’s anything there at the end of the scan, add it to the original provision list.
The key is in the last line of that subroutine. You’re dereferencing the $items array reference, accessing the original array, and adding the elements from @missing.
Without passing by reference, you’d modify only a local copy of the data, which has no effect on the original array.
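Stripped down to just the modification step, the pattern looks like this sketch; add_items is a hypothetical helper, not part of the provisions code:

```perl
use strict;
use warnings;

# Hypothetical helper: appends the remaining arguments to the array
# that the first argument refers to.
sub add_items {
    my $items = shift;     # reference to the caller's array
    push @$items, @_;      # modifies the original, not a copy
}

my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
add_items(\@gilligan, qw(preserver sunscreen jacket));

print scalar(@gilligan), " items: @gilligan\n";
```

After the call, @gilligan itself holds seven items, because push worked directly on the array that $items refers to.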
Also, @$items (and its more generic form @{$items}) works within a double-quoted string. Do not include any whitespace between the @ and the immediately following character, although you can include nearly arbitrary whitespace within the curly braces as if it were normal Perl code.
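For example, assuming $items refers to a short made-up version of @skipper, both forms interpolate, and the elements are joined with the list separator (a space by default), just as they would be for a named array:

```perl
use strict;
use warnings;

my @skipper = qw(blue_shirt hat jacket);   # shortened list for the example
my $items   = \@skipper;

my $short_form   = "Packed: @$items";
my $general_form = "Packed: @{ $items }";  # whitespace inside the braces is fine

print "$short_form\n";
print "$general_form\n";
```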
In this example, the array @_ contains two elements, one of which is a reference to an array. What if you take a reference to an array that also contains a reference to an array? You end up with a complex data structure, which can be quite useful.
For example, iterate over the data for the Skipper, Gilligan, and the Professor by first building a larger data structure holding the entire list of provision lists:
my @skipper = qw(blue_shirt hat jacket preserver sunscreen);
my @skipper_with_name = ("Skipper", \@skipper);
my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
my @professor_with_name = ("Professor", \@professor);
my @gilligan = qw(red_shirt hat lucky_socks water_bottle);
my @gilligan_with_name = ("Gilligan", \@gilligan);
At this point, @skipper_with_name has two elements, the second of which is an array reference, similar to what was passed to the subroutine. Now group them all:
my @all_with_names = (
  \@skipper_with_name,
  \@professor_with_name,
  \@gilligan_with_name,
);
Note that you have just three elements, each of which is a reference to an array, each of which has two elements: the name and its corresponding initial provisions. A picture of that is in Figure 3-1.
Figure 3-1. The array @all_with_names holds a multilevel data structure containing strings and references to arrays
Therefore, $all_with_names[2] will be the array reference for Gilligan’s data. If you dereference it as @{$all_with_names[2]}, you get a two-element array, "Gilligan" and another array reference.
How would you access that array reference? Using your rules again, it’s ${$all_with_names[2]}[1]. In other words, taking $all_with_names[2], you dereference it in an expression that would be something like $DUMMY[1] as an ordinary array, so you’ll place {$all_with_names[2]} in place of DUMMY.
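Putting the pieces together, this sketch builds the structure from the text and then takes it apart again with the curly-brace rules:

```perl
use strict;
use warnings;

my @skipper   = qw(blue_shirt hat jacket preserver sunscreen);
my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
my @gilligan  = qw(red_shirt hat lucky_socks water_bottle);

my @skipper_with_name   = ("Skipper",   \@skipper);
my @professor_with_name = ("Professor", \@professor);
my @gilligan_with_name  = ("Gilligan",  \@gilligan);

my @all_with_names = (
    \@skipper_with_name,
    \@professor_with_name,
    \@gilligan_with_name,
);

# Dereference the third element to get Gilligan's two-element array.
my @pair = @{ $all_with_names[2] };

# Or pick out just its second element: the provisions array reference.
my $provisions_ref = ${ $all_with_names[2] }[1];

print "$pair[0] brings ", scalar(@$provisions_ref), " items\n";
```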
How do you call the existing check_required_items() with this data structure? The following code is easy enough:
for my $person (@all_with_names) {
  my $who = $$person[0];
  my $provisions_reference = $$person[1];
  check_required_items($who, $provisions_reference);
}
This requires no changes to the subroutine. $person will be each of $all_with_names[0], $all_with_names[1], and $all_with_names[2], as the loop progresses. When you dereference $$person[0], you get “Skipper,” “Professor,” and “Gilligan,” respectively. $$person[1] is the corresponding array reference of provisions for that person.
Of course, you can shortcut this as well, since the entire dereferenced array matches the argument list precisely:
for my $person (@all_with_names) {
  check_required_items(@$person);
}
or even:
check_required_items(@$_) for @all_with_names;
As you can see, various levels of optimization can lead to obfuscation. Be sure to consider where your head will be a month from now when you have to reread your own code. If that’s not enough, consider the new person who takes over your job after you have left.
Look at the curly-brace dereferencing again. As in the earlier example, the array reference for Gilligan’s provision list is ${$all_with_names[2]}[1]. Now, what if you want to know Gilligan’s first provision? You need to dereference this item one more level, so it’s Yet Another Layer of Braces: ${${$all_with_names[2]}[1]}[0].
That’s a really noisy piece of syntax. Can you shorten that? Yes!

Everywhere you write ${DUMMY}[$y], you can write DUMMY->[$y] instead. In other words, you can dereference an array reference, picking out a particular element of that array by simply following the expression defining the array reference with an arrow and a square-bracketed subscript.
For this example, this means you can pick out the array reference for Gilligan with a simple $all_with_names[2]->[1], and Gilligan’s first provision with $all_with_names[2]->[1]->[0]. Wow, that’s definitely easier on the eyes.
If that wasn’t already simple enough, there’s one more rule: if the arrow ends up between “subscripty kinds of things,” like square brackets, you can also drop the arrow. $all_with_names[2]->[1]->[0] becomes $all_with_names[2][1][0]. Now it’s looking even easier on the eye.
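Using the same structure as before, this sketch checks that the full curly-brace form, the arrow form, and the shortened arrow form all reach the same element:

```perl
use strict;
use warnings;

my @skipper   = qw(blue_shirt hat jacket preserver sunscreen);
my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
my @gilligan  = qw(red_shirt hat lucky_socks water_bottle);

my @skipper_with_name   = ("Skipper",   \@skipper);
my @professor_with_name = ("Professor", \@professor);
my @gilligan_with_name  = ("Gilligan",  \@gilligan);
my @all_with_names = (\@skipper_with_name, \@professor_with_name, \@gilligan_with_name);

# Three spellings of Gilligan's first provision.
my $braces = ${ ${ $all_with_names[2] }[1] }[0];
my $arrows = $all_with_names[2]->[1]->[0];
my $short  = $all_with_names[2][1][0];

print "$braces $arrows $short\n";   # red_shirt red_shirt red_shirt
```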
The arrow has to be between subscripty things. Why wouldn’t it be between? Well, imagine a reference to the array @all_with_names:
my $root = \@all_with_names;
Now how do you get to Gilligan’s first item?
$root -> [2] -> [1] -> [0]
More simply, using the “drop arrow” rule, you can use:
$root -> [2][1][0]
You cannot drop the first arrow, however, because that would mean an array @root’s third element, an entirely unrelated data structure. Let’s compare this to the full curly-brace form again:
${${${$root}[2]}[1]}[0]
It looks much better with the arrow. Note, however, that no shortcut gets the entire array from an array reference. If you want all of Gilligan’s provisions, you say:
@{$root->[2][1]}
Reading this from the inside out, you can think of it like this:
Take $root.
Dereference it as an array reference, taking the third element of that array (index number 2).
Dereference that as an array reference, taking the second element of that array (index number 1).
Dereference that as an array reference, taking the entire array.
The last step doesn’t have a shortcut arrow form. Oh well.[17]
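Reaching the same structure through $root shows both rules at once in this sketch: the first arrow must stay, and getting the whole array still requires the @{ ... } wrapper:

```perl
use strict;
use warnings;

my @skipper   = qw(blue_shirt hat jacket preserver sunscreen);
my @professor = qw(sunscreen water_bottle slide_rule batteries radio);
my @gilligan  = qw(red_shirt hat lucky_socks water_bottle);

my @skipper_with_name   = ("Skipper",   \@skipper);
my @professor_with_name = ("Professor", \@professor);
my @gilligan_with_name  = ("Gilligan",  \@gilligan);
my @all_with_names = (\@skipper_with_name, \@professor_with_name, \@gilligan_with_name);

my $root = \@all_with_names;

my $first_item = $root->[2][1][0];       # the first arrow cannot be dropped
my @provisions = @{ $root->[2][1] };     # the whole array needs @{ ... }

print "$first_item\n";
print "@provisions\n";
```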
Just as you can take a reference to an array, you can also take a reference to a hash. Once again, you use the backslash as the “take a reference to” operator:
my %gilligan_info = ( name => 'Gilligan', hat => 'White', shirt => 'Red', position => 'First Mate', ); my $hash_ref = \%gilligan_info;
You can dereference a hash reference to get back to the original data. The strategy is similar to dereferencing an array reference. Write the hash syntax as you would have without references, and then replace the name of the hash with a pair of curly braces surrounding the thing holding the reference. For example, to pick a particular value for a given key, use:
my $name = $ gilligan_info { 'name' };
my $name = $ { $hash_ref } { 'name' };
In this case, the curly braces have two different meanings. The first pair denotes the expression returning a reference, while the second pair delimits the expression for the hash key.
To perform an operation on the entire hash, you proceed similarly:
my @keys = keys % gilligan_info;
my @keys = keys % { $hash_ref };
As with array references, you can use shortcuts to replace the complex curly-braced forms under some circumstances. For example, if the only thing inside the curly braces is a simple scalar variable (as shown in these examples so far), you can drop the curly braces:
my $name = $$hash_ref{'name'};
my @keys = keys %$hash_ref;
Like an array reference, when referring to a specific hash element, you can use an arrow form:
my $name = $hash_ref->{'name'};
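This small sketch collects the hash reference forms shown so far. Sorting the keys is an addition here, because a hash returns its keys in no particular order:

```perl
use strict;
use warnings;

my %gilligan_info = (
    name     => 'Gilligan',
    hat      => 'White',
    shirt    => 'Red',
    position => 'First Mate',
);
my $hash_ref = \%gilligan_info;

# One element, four ways: by name, full braces, braces dropped, arrow.
my $by_name     = $gilligan_info{'name'};
my $full_braces = ${ $hash_ref }{'name'};
my $no_braces   = $$hash_ref{'name'};
my $arrow       = $hash_ref->{'name'};

# The whole hash, through the reference.
my @keys = sort keys %$hash_ref;

print "$by_name $full_braces $no_braces $arrow\n";
print "@keys\n";   # hat name position shirt
```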
Because a hash reference fits wherever a scalar fits, you can create an array of hash references:
my %gilligan_info = (
  name     => 'Gilligan',
  hat      => 'White',
  shirt    => 'Red',
  position => 'First Mate',
);
my %skipper_info = (
  name     => 'Skipper',
  hat      => 'Black',
  shirt    => 'Blue',
  position => 'Captain',
);
my @crew = (\%gilligan_info, \%skipper_info);
Thus, $crew[0] is a hash reference to the information about Gilligan. You can get to Gilligan’s name via any one of:
${ $crew[0] } { 'name' }
my $ref = $crew[0]; $$ref{'name'}
$crew[0]->{'name'}
$crew[0]{'name'}
On that last one, you can still drop the arrow between “subscripty kinds of things,” even though one is an array bracket and one is a hash brace.
Let’s print a crew roster:
my %gilligan_info = (
  name     => 'Gilligan',
  hat      => 'White',
  shirt    => 'Red',
  position => 'First Mate',
);
my %skipper_info = (
  name     => 'Skipper',
  hat      => 'Black',
  shirt    => 'Blue',
  position => 'Captain',
);
my @crew = (\%gilligan_info, \%skipper_info);

my $format = "%-15s %-7s %-7s %-15s\n";
printf $format, qw(Name Shirt Hat Position);
for my $crewmember (@crew) {
  printf $format,
    $crewmember->{'name'},
    $crewmember->{'shirt'},
    $crewmember->{'hat'},
    $crewmember->{'position'};
}
That last part looks very repetitive. You can shorten it with a hash slice. Again, if the original syntax is:
@ gilligan_info { qw(name position) }
the hash slice notation from a reference looks like:
@ { $hash_ref } { qw(name position) }
You can drop the first brace pair because the only thing within is a simple scalar variable, yielding:
@ $hash_ref { qw(name position) }
Thus, you can replace that final loop with:
for my $crewmember (@crew) {
  printf $format, @$crewmember{qw(name shirt hat position)};
}
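Here is a sketch comparing the three slice spellings on the Skipper’s hash. A slice returns the values in the same order as the keys you list:

```perl
use strict;
use warnings;

my %skipper_info = (
    name     => 'Skipper',
    hat      => 'Black',
    shirt    => 'Blue',
    position => 'Captain',
);
my $hash_ref = \%skipper_info;

# The same slice, written three ways.
my @by_name     = @skipper_info{ qw(name position) };
my @full_braces = @{ $hash_ref }{ qw(name position) };
my @no_braces   = @$hash_ref{ qw(name position) };

print "@no_braces\n";   # Skipper Captain
```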
There is no shortcut form with an arrow (->) for array slices or hash slices, just as there is no shortcut for entire arrays or hashes.
A hash reference prints as a string that looks like HASH(0x1a2b3c), showing the hexadecimal memory address of the hash. That’s not very useful to an end user and only barely more usable to the programmer, except as an indication of the lack of appropriate dereferencing.
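The difference between a missing and a proper dereference shows up immediately inside a double-quoted string. In this sketch, the exact hex address varies between runs, so only its shape is described in the comment:

```perl
use strict;
use warnings;

my %gilligan_info = (name => 'Gilligan', position => 'First Mate');
my $hash_ref = \%gilligan_info;

my $forgot_to_dereference = "Name: $hash_ref";          # Name: HASH(0x...)
my $dereferenced          = "Name: $hash_ref->{name}";  # Name: Gilligan

print "$forgot_to_dereference\n";
print "$dereferenced\n";
```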
The answers for all exercises can be found in Section A.2.
How many different things do these expressions refer to?
$ginger->[2][1]
${$ginger[2]}[1]
$ginger->[2]->[1]
${$ginger->[2]}[1]
Using the final version of check_required_items, write a subroutine check_items_for_all that takes a hash reference as its only parameter, pointing at a hash whose keys are the people aboard the Minnow, and whose corresponding values are array references of the things they intend to bring on board.
For example, the hash reference might be constructed like so:
my @gilligan = ... gilligan items ...;
my @skipper = ... skipper items ...;
my @professor = ... professor items ...;
my %all = (
  "Gilligan" => \@gilligan,
  "Skipper" => \@skipper,
  "Professor" => \@professor,
);
check_items_for_all(\%all);
The newly constructed subroutine should call check_required_items for each person in the hash, updating their provisions list to include the required items.
[14] There are more efficient ways to check list membership for large lists, but for a few items, this is probably the easiest way to do so with just a few lines of code.
[15] Actually, assigning new scalars to elements of @_ after the shift modifies the corresponding variable being passed, but that still wouldn’t let you extend the array with additional mandatory provisions.
[16] Note that whitespace was added in these two displays to make the similar parts line up. This whitespace is legal in a program, even though most programs won’t use it.
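[17] It’s not that it hasn’t been discussed repeatedly by the Perl developers; it’s just that nobody had come up with a nice backward-compatible syntax with universal appeal. Perl 5.20 later added an experimental postfix dereference syntax, made stable in Perl 5.24, in which the entire array is $root->[2][1]->@* and a hash slice is $hash_ref->@{qw(name position)}.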