So far, you’ve seen references to arrays and hashes. You can also take a reference to a subroutine (sometimes called a coderef).
Why would you want to do that? Well, in the same way that taking a reference to an array lets you have the same code work on different arrays at different times, taking a reference to a subroutine allows the same code to call different subroutines at different times. Also, references permit complex data structures. A reference to a subroutine allows a subroutine to effectively become part of that complex data structure.
Put another way, a variable or a complex data structure is a repository of values throughout the program. A reference to a subroutine can be thought of as a repository of behavior in a program. The examples in this section show how this works.
The Skipper and Gilligan are having a conversation:
sub skipper_greets { my $person = shift; print "Skipper: Hey there, $person!\n"; } sub gilligan_greets { my $person = shift; if ($person eq "Skipper") { print "Gilligan: Sir, yes, sir, $person!\n"; } else { print "Gilligan: Hi, $person!\n"; } } skipper_greets("Gilligan"); gilligan_greets("Skipper");
This results in:
Skipper: Hey there, Gilligan! Gilligan: Sir, yes, sir, Skipper!
So far, nothing unusual has happened. Note however that Gilligan has two different behaviors, depending on whether he’s addressing the Skipper or someone else.
Now, have the Professor walk into the hut. Both of the Minnow crew greet the newest participant:
skipper_greets("Professor"); gilligan_greets("Professor");
which results in
Skipper: Hey there, Professor! Gilligan: Hi, Professor!
Now the Professor feels obligated to respond:
sub professor_greets { my $person = shift; print "Professor: By my calculations, you must be $person!\n"; } professor_greets("Gilligan"); professor_greets("Skipper");
resulting in:
Professor: By my calculations, you must be Gilligan!\n"; Professor: By my calculations, you must be Skipper!\n";
Whew. A lot of typing and not very general. If each person’s behavior is in a separate named subroutine and a new person walks in the door, you have to figure out what other subroutines to call. You could certainly do it with enough hard-to-maintain code, but you can simplify the process by adding a bit of indirection, just as you did with arrays and hashes.
First, let’s introduce the “take a reference to” operator. It actually needs no introduction because it’s that very same backslash again:
my $ref_to_greeter = \&skipper_greets;
You’re taking a
reference to the subroutine skipper_greets( )
.
Note that the preceding ampersand is mandatory here, and the lack of
trailing parentheses is also intentional. The reference to the
subroutine (a coderef) is stored within
$ref_to_greeter
, and like all other references, it
fits nearly anywhere a scalar fits.
There’s only one reason to get back to the original subroutine by dereferencing the coderef: to invoke it. Dereferencing a code reference is similar to dereferencing other references. First start with the way you would have written it before you heard of references (including the optional ampersand prefix):
& skipper_greets ( "Gilligan" )
Next, you replace the name of the subroutine with curly braces around the thing holding the reference:
& { $ref_to_greeter } ( "Gilligan" )
There you have it. This construct invokes the subroutine currently
referenced by $ref_to_greeter
, passing it the
single Gilligan
parameter.
But boy-oh-boy, is that ugly or what? Luckily, the same reference simplification rules apply. If the value inside the curly braces is a simple scalar variable, you can drop the braces:
& $ref_to_greeter ( "Gilligan" )
You can also flip it around a bit with the arrow notation:
$ref_to_greeter -> ( "Gilligan" )
That last form is particularly handy when the coderef is contained within a larger data structure, as you’ll see in a moment.
To have both Gilligan and the Skipper greet the Professor, you merely need to iterate over all the subroutines:
for my $greet (\&skipper_greets, \&gilligan_greets) { $greet->("Professor"); }
First,
inside the parentheses, you create a list of two items, each of which
is a coderef. The coderefs are then individually dereferenced,
invoking the corresponding subroutine and passing it the
Professor
string.
You’ve seen the coderefs in a scalar variable and as an element of a list. Can you put these coderefs into a larger data structure? Certainly. Create a table that maps people to the behavior they exhibit to greet others, and then rewrite that previous example using the table:
sub skipper_greets { my $person = shift; print "Skipper: Hey there, $person!\n"; } sub gilligan_greets { my $person = shift; if ($person eq "Skipper") { print "Gilligan: Sir, yes, sir, $person!\n"; } else { print "Gilligan: Hi, $person!\n"; } } sub professor_greets { my $person = shift; print "Professor: By my calculations, you must be $person!\n"; } my %greets = ( "Gilligan" => \&gilligan_greets, "Skipper" => \&skipper_greets, "Professor" => \&professor_greets, ); for my $person (qw(Skipper Gilligan)) { $greets{$person}->("Professor"); }
Note that $person
is a name, which you look up in
the hash to get to a coderef. Then you dereference that coderef,
passing it the name of the person being greeted, and you get the
correct behavior, resulting in:
Skipper: Hey there, Professor! Gilligan: Hi, Professor!
Now have everyone greet everyone, in a very friendly room:
sub skipper_greets { my $person = shift; print "Skipper: Hey there, $person!\n"; } sub gilligan_greets { my $person = shift; if ($person eq "Skipper") { print "Gilligan: Sir, yes, sir, $person!\n"; } else { print "Gilligan: Hi, $person!\n"; } } sub professor_greets { my $person = shift; print "Professor: By my calculations, you must be $person!\n"; } my %greets = ... as before ... my @everyone = sort keys %greets; for my $greeter (@everyone) { for my $greeted (@everyone) { $greets{$greeter}->($greeted) unless $greeter eq $greeted; # no talking to yourself } }
This results in:
Gilligan: Hi, Professor! Gilligan: Sir, yes, sir, Skipper! Professor: By my calculations, you must be Gilligan! Professor: By my calculations, you must be Skipper! Skipper: Hey there, Gilligan! Skipper: Hey there, Professor!
Hmm. That’s a bit complex. Let’s let them walk into the room one at a time:
sub skipper_greets { my $person = shift; print "Skipper: Hey there, $person!\n"; } sub gilligan_greets { my $person = shift; if ($person eq "Skipper") { print "Gilligan: Sir, yes, sir, $person!\n"; } else { print "Gilligan: Hi, $person!\n"; } } sub professor_greets { my $person = shift; print "Professor: By my calculations, you must be $person!\n"; } my %greets = ... as before ... my @room; # initially empty for my $person (qw(Gilligan Skipper Professor)) { print "\n"; print "$person walks into the room.\n"; for my $room_person (@room) { $greets{$person}->($room_person); # speaks $greets{$room_person}->($person); # gets reply } push @room, $person; # come in, get comfy }
The result is a typical day on that tropical island:
Gilligan walks into the room. Skipper walks into the room. Skipper: Hey there, Gilligan! Gilligan: Sir, yes, sir, Skipper! Professor walks into the room. Professor: By my calculations, you must be Gilligan! Gilligan: Hi, Professor! Professor: By my calculations, you must be Skipper! Skipper: Hey there, Professor!
In
that last example, subroutines such as professor_greets( )
were never called explicitly, but indirectly through the
coderef. Thus, you wasted some brain cells to come up with a name for
the subroutine used only in one other place, to initialize the data
structure.
But as you can have anonymous hashes and arrays, you can have anonymous subroutines!
Let’s add another island inhabitant: Ginger. But rather than define her greeting behavior as a named subroutine, create an anonymous subroutine:
my $ginger = sub { my $person = shift; print "Ginger: (in a sultry voice) Well hello, $person!\n"; }; $ginger->("Skipper");
An anonymous subroutine looks like:
sub { ... body of subroutine ... }
The value in $ginger
is a coderef, just as if you
had defined the following block as a subroutine and then taken a
reference to it. When you reach the last statement, you see:
Ginger: (in a sultry voice) Well hello, Skipper!
Although you kept the value in a scalar variable, you could have put
that sub { ... }
construct directly into the
initialization of the greetings hash:
my %greets = ( "Skipper" => sub { my $person = shift; print "Skipper: Hey there, $person!\n"; }, "Gilligan" => sub { my $person = shift; if ($person eq "Skipper") { print "Gilligan: Sir, yes, sir, $person!\n"; } else { print "Gilligan: Hi, $person!\n"; } }, "Professor" => sub { my $person = shift; print "Professor: By my calculations, you must be $person!\n"; }, "Ginger" => sub { my $person = shift; print "Ginger: (in a sultry voice) Well hello, $person!\n"; }, ); my @room; # initially empty for my $person (qw(Gilligan Skipper Professor Ginger)) { print "\n"; print "$person walks into the room.\n"; for my $room_person (@room) { $greets{$person}->($room_person); # speaks $greets{$room_person}->($person); # gets reply } push @room, $person; # come in, get comfy }
Notice how much it simplifies the code. The subroutine definitions are right within the only data structure that references them directly. The result is straightforward:
Gilligan walks into the room. Skipper walks into the room. Skipper: Hey there, Gilligan! Gilligan: Sir, yes, sir, Skipper! Professor walks into the room. Professor: By my calculations, you must be Gilligan! Gilligan: Hi, Professor! Professor: By my calculations, you must be Skipper! Skipper: Hey there, Professor! Ginger walks into the room. Ginger: (in a sultry voice) Well hello, Gilligan! Gilligan: Hi, Ginger! Ginger: (in a sultry voice) Well hello, Skipper! Skipper: Hey there, Ginger! Ginger: (in a sultry voice) Well hello, Professor! Professor: By my calculations, you must be Ginger!
Adding a few more castaways is as simple as putting the entry for the greeting behavior into the hash and adding them into the list of people entering the room. You get this scaling of effort because you’ve preserved the behavior as data over which you can iterate and look up, thanks to your friendly subroutine references.
A subroutine reference is often used for a callback. A callback defines what to do when a subroutine reaches a particular place in an algorithm.
For example, the
File::Find
module exports a
find
subroutine that can efficiently walk through
a given filesystem hierarchy in a fairly portable way. In its
simplest form, you give the find
subroutine two
parameters: a starting directory and “what to
do” for each file or directory name found
recursively below that starting directory. The “what
to do” is specified as a subroutine reference:
use File::Find; sub what_to_do { print "$File::Find::name found\n"; } my @starting_directories = qw(.); find(\&what_to_do, @starting_directories);
In this case,
find
starts at the current directory
(.) and locates each file or directory. For each
item, a call is made to the subroutine what_to_do( )
, passing it a few documented values through global
variables. In particular, the value of
$File::Find::name
is the item’s
full pathname (beginning with the starting directory).
In this case,
you’re passing both data (the list of starting
directories) and behavior as parameters to the
find
routine.
It’s a bit silly to invent a subroutine name just to use the name only once, so you can write the previous code using an anonymous subroutine, such as:
use File::Find; my @starting_directories = qw(.); find( sub { print "$File::Find::name found\n"; }, @starting_directories, );
You
could also use File::Find
to find out some other
things about files, such as their size. For the
callback’s convenience, the current working
directory is the item’s containing directory, and
the item’s name within that directory is found in
$_
.
Maybe you have noticed that, in the previous code,
$File::Find::name
was used for the
item’s name. So which name is real,
$_
or $File::Find::name
?
$File::Find::name
gives the name relative to the
starting directory, but during the callback, the working directory is
the one that holds the item just found. For example, suppose that you
want find
to look for files in the current working
directory, so you give it (".")
as the list of
directories to search. If you call find
when the
current working directory is /usr
,
find
looks below that directory. When
find
has located /usr/bin/perl
,
the current working directory (during the callback) is
/usr/bin
. $_
holds
"perl
“; $File::Find::name
holds
"./bin/perl
“, which is the name relative to the
directory in which you started the search.
All of this means that the file tests, such as -s
,
automatically report on the just-found item. Although this is
convenient, the current directory inside the callback is different
from the search’s starting directory.
What if you want to use
File::Find
to accumulate the total size of all
files seen? The callback subroutine doesn’t support
either parameters to be passed in, nor a result returned from the
subroutine. But that doesn’t matter. When
dereferenced, a subroutine reference can
“see” all visible lexical variables
when the reference to the subroutine is taken. For example:
use File::Find; my $total_size = 0; find(sub { $total_size += -s if -f }, "."); print $total_size, "\n";
As before, the find
routine is called with two
parameters: a reference to an anonymous subroutine and the starting
directory. When names are found within that directory (and its
subdirectories), the subroutine is called.
Note that the subroutine accesses the
$total_size
variable. This variable is declared
outside the scope of the subroutine but still visible to the
subroutine. Thus, even though find
invokes the
callback subroutine (and would not have direct access to
$total_size
), the callback subroutine accesses and
updates the variable.
The kind of subroutine that can access all lexical variables that existed at the time it was declared is called a closure (a term borrowed from the world of mathematics).
Furthermore, the access to the variable from within the closure ensures that the variable remains alive as long as the subroutine reference is alive. For example, let’s number the output files:[27]
use File::Find; my $callback; { my $count = 0; $callback = sub { print ++$count, ": $File::Find::name\n" }; } find($callback, ".");
Here, you
declare a variable to hold the callback. This variable cannot be
declared within the naked block (the block following that is not part
of a larger Perl syntax construct), or it would be recycled at the
end of that block. Next, the lexical $count
variable is initialized to 0. An anonymous subroutine is then
declared, and a reference to it is placed into
$callback
. This subroutine is a closure because it
refers to the lexical $count
variable.
At the end of the naked block, the
$count
variable goes out of scope. However,
because it is still referenced by subroutine in
$callback
, it stays alive, now as an anonymous
scalar variable.[28]
When the callback is invoked from find
, the value
of the variable formerly known as $count
is
incremented from 1 to 2 to 3, and so on.
Although a naked block worked nicely to define the callback, having a subroutine return that subroutine reference instead might be more useful:
use File::Find; sub create_find_callback_that_counts { my $count = 0; return sub { print ++$count, ": $File::Find::name\n" }; } my $callback = create_find_callback_that_counts( ); find($callback, ".");
It’s the same
process here, just written a bit differently. When you invoke
create_find_callback_that_counts( )
, a lexical
variable $count
is initialized to 0. The return
value from that subroutine is a reference to an anonymous subroutine
that is also a closure because it accesses the
$count
variable. Even though
$count
goes out of scope at the end of the
create_find_callback_that_counts( )
subroutine,
there’s still a binding between it and the returned
subroutine reference, so the variable stays alive until the
subroutine reference is finally discarded.
If you reuse the callback, the same
variable still has its most recently used value. The initialization
occurred in the original subroutine
(create_find_callback_that_counts
), not the
callback (unnamed) subroutine:
use File::Find; sub create_find_callback_that_counts { my $count = 0; return sub { print ++$count, ": $File::Find::name\n" }; } my $callback = create_find_callback_that_counts( ); print "my bin:\n"; find($callback, "bin"); print "my lib:\n"; find($callback, "lib");
This example prints consecutive numbers starting at 1 for the entries
below my
bin
, but then
continues the numbering when you start entries in
lib
. The same $count
variable
is used in both cases. However, if you invoke the
create_find_callback_that_counts( )
twice, you get
two different $count
variables:
use File::Find; sub create_find_callback_that_counts { my $count = 0; return sub { print ++$count, ": $File::Find::name\n" }; } my $callback1 = create_find_callback_that_counts( ); my $callback2 = create_find_callback_that_counts( ); print "my bin:\n"; find($callback1, "bin"); print "my lib:\n"; find($callback2, "lib");
In this case, you have two separate
$count
variables, each accessed from within their
own callback subroutine.
How would you get the total size of
all found files from the callback? Earlier, you were able to do this
by making $total_size
visible. If you stick the
definition of $total_size
into the subroutine that
returns the callback reference, you won’t have
access to the variable. But you can cheat a bit. For one thing, you
can determine that the callback subroutine is never called with any
parameters, so, if the subroutine is called with a parameter, you can
make it return the total size:
use File::Find; sub create_find_callback_that_sums_the_size { my $total_size = 0; return sub { if (@_) { # it's our dummy invocation return $total_size; } else { # it's a callback from File::Find: $total_size += -s if -f; } }; } my $callback = create_find_callback_that_sums_the_size( ); find($callback, "bin"); my $total_size = $callback->("dummy"); # dummy parameter to get size print "total size of bin is $total_size\n";
Distinguishing actions by the presence or
absence of parameters is not a universal solution. Fortunately, more
than one subroutine reference can be created in
create_find_callback_that_counts( )
:
use File::Find; sub create_find_callbacks_that_sum_the_size { my $total_size = 0; return(sub { $total_size += -s if -f }, sub { return $total_size }); } my ($count_em, $get_results) = create_find_callbacks_that_sum_the_size( ); find($count_em, "bin"); my $total_size = &$get_results( ); print "total size of bin is $total_size\n";
Because both subroutine references were
created from the same scope, they both have access to the same
$total_size
variable. Even though the variable has
gone out of scope before either subroutine is called, they still
share the same heritage and can use the variable to communicate the
result of the calculation.
The two subroutine references are not invoked by returning their references from the creating subroutine. The references are just data at that point. It’s not until you invoke them as a callback or an explicit subroutine derefencing that they actually do their duty.
What if you invoke this new subroutine more than once?
use File::Find; sub create_find_callbacks_that_sum_the_size { my $total_size = 0; return(sub { $total_size += -s if -f }, sub { return $total_size }); } ## set up the subroutines my %subs; foreach my $dir (qw(bin lib man)) { my ($callback, $getter) = create_find_callbacks_that_sum_the_size( ); $subs{$dir}{CALLBACK} = $callback; $subs{$dir}{GETTER} = $getter; } ## gather the data for (keys %subs) { find($subs{$_}{CALLBACK}, $_); } ## show the data for (sort keys %subs) { my $sum = $subs{$_}{GETTER}->( ); print "$_ has $sum bytes\n"; }
In the
“set up the subroutines” section,
you create three instances of callback-and-getter pairs. Each
callback has a corresponding subroutine to get the results. Next, in
the “gather the data” section, you
call find
three times with each corresponding
callback subroutine reference. This updates the individual
$total_size
variables associated with each
callback. Finally, in the “show the
data” section, you call the getter routines to fetch
the results.
The six subroutines (and the
three $total_size
variables they share) are
reference-counted. When %subs
goes away or is
modified, the values have their reference counts reduced, recycling
the contained data. (If that data also references further data, those
reference counts are also reduced appropriately.)
While the previous examples showed closure variables being modified,
closure variables are also useful to provide initial or lasting input
to the subroutine. For example, let’s write a
subroutine to create a File::Find
callback that
prints files exceeding a certain size:
use File::Find; sub print_bigger_than { my $minimum_size = shift; return sub { print "$File::Find::name\n" if -f and -s >= $minimum_size }; } my $bigger_than_1024 = print_bigger_than(1024); find($bigger_than_1024, "bin");
The 1024
parameter is passed into the print_bigger_than
,
which then gets shifted into the $minimum_size
lexical variable. Because you access this variable within the
subroutine referenced by the return value of the
print_bigger_than
variable, it becomes a closure
variable, with a value that persists for the duration of that
subroutine reference. Again, invoking this subroutine multiple times
creates distinct “locked-in” values
for $minimum_size
, each bound to its corresponding
subroutine reference.
Closures are “closed” only on lexical variables, since lexical variables eventually go out of scope. Because a package variable (which is a global) never goes out of scope, a closure never closes on a package variable. All subroutines refer to the same single instance of the global variable.
A subroutine doesn’t have to be an anonymous subroutine to be a closure. If a named subroutine accesses lexical variables and those variables go out of scope, the named subroutine retains a reference to the lexicals, just as you saw with anonymous subroutines. For example, consider two routines that count coconuts for Gilligan:
{ my $count; sub count_one { ++$count } sub count_so_far { return $count } }
If you place this code at the beginning of
the program, the variable $count
is declared, and
the two subroutines that reference the variable become closures.
However, because they have a name, they will persist beyond the end
of the scope (as do all named subroutines). Since the subroutines
persist beyond the scope and access variables declared within that
scope, they become closures and thus can continue to access
$count
throughout the lifetime of the program.
So, with a few calls, you can see an incremented count:
count_one( ); count_one( ); count_one( ); print "we have seen ", count_so_far( ), " coconuts!\n";
$count
retains its value between calls to
count_one( )
or count_so_far( )
, but no other section of code can access this
$count
at all.
In C, this is known as a static local variable: a variable that is visible to only a subset of the program’s subroutines but persists throughout the life of the program, even between calls to those subroutines.
What if you wanted to count down? Something like this will do:
{ my $countdown = 10; sub count_down { $countdown-- } sub count_remaining { $countdown } } count_down( ); count_down( ); count_down( ); print "we're down to ", count_remaining( ), " coconuts!\n";
That is, it’ll do as long as you put it near the
beginning of the program, before any invocations of
count_down( )
or count_remaining( )
. Why?
This block doesn’t work when you put it after those invocations because there are two functional parts to the first line:
my $countdown = 10;
One part is the declaration of
$countdown
as a lexical variable. That part is
noticed and processed as the program is parsed during the
compile phase. The second part is the assignment
of 10 to the allocated storage. This is handled as the code is
executed during the run phase. Unless the run
phase is executed for this code, the variable has its initial
undef
value.
One
practical solution to this problem is to change the block in which
the static local appears into a BEGIN
block:
BEGIN { my $countdown = 10; sub count_down { $countdown-- } sub count_remaining { $countdown } }
The BEGIN
keyword
tells the Perl compiler that as soon as this block has been parsed
successfully (during the compile phase), jump for a moment to run
phase and run the block as well. Presuming the block
doesn’t cause a fatal error, compilation then
continues with the text following the block. The block itself is also
discarded, ensuring that the code within is executed precisely once
in a program, even if it had appeared syntactically within a loop or
subroutine.
The answers for all exercises can be found in Section A.5.
The Professor modified some files on
Monday afternoon, and now he’s forgotten which ones
they were. This happens all the time. He wants you to make a
subroutine called gather_mtime_between
, which,
given a starting and ending timestamp, returns a pair of coderefs.
The first one will be used with File::Find
to
gather the names of only the items that were modified between those
two times; the second one should return the list of items found.
Here’s some code to try; it should list only items
that were last modified on the most recent Monday, although you could
easily change it to work with a different day. (You
don’t have to type all of this code. This program
should be available as the file named ex6-1.plx
in
the downloadable files, available on the O’Reilly
web site.)
Hint: You can find a
file’s timestamp (mtime
) with
code such as:
my $timestamp = (stat $file_name)[9];
Because it’s a slice, remember that those
parentheses are mandatory. Don’t forget that the
working directory inside the callback isn’t
necessarily the starting directory in which find
was called.
use File::Find; use Time::Local; my $target_dow = 1; # Sunday is 0, Monday is 1, ... my @starting_directories = ("."); my $seconds_per_day = 24 * 60 * 60; my($sec, $min, $hour, $day, $mon, $yr, $dow) = localtime; my $start = timelocal(0, 0, 0, $day, $mon, $yr); # midnight today while ($dow != $target_dow) { # Back up one day $start -= $seconds_per_day; # hope no DST! :-) if (--$dow < 0) { $dow += 7; } } my $stop = $start + $seconds_per_day; my($gather, $yield) = gather_mtime_between($start, $stop); find($gather, @starting_directories); my @files = $yield->( ); for my $file (@files) { my $mtime = (stat $file)[9]; # mtime via slice my $when = localtime $mtime; print "$when: $file\n"; }
Note the comment about DST. In many parts of the world, on the days when daylight savings time or summer time kicks in and out, the day is no longer 86,400 seconds long. The program glosses over this issue, but a more pedantic coder might take it into consideration appropriately.
[27] This code seems to have an extra
semicolon at the end of the line that assigns to
$callback
, doesn’t it? But
remember, the construct sub {
..
. }
is an expression. Its
value (a coderef) is assigned to $callback
, and
there’s a semicolon at the end of that statement.
It’s easy to forget to put the proper punctuation
after the closing curly brace of an anonymous subroutine
declaration.
[28] To be more accurate, the closure
declaration increases the reference count of the referent, as if
another reference had been taken explicitly. Just before the end of
the naked block, the reference count of $count
is
two, but after the block has exited, the value still has a reference
count of one. Although no other code may access
$count
, it will still be kept in memory as long as
the reference to the sub is available in $callback
or elsewhere.