Chapter 6. Subroutine References

So far, you’ve seen references to arrays and hashes. You can also take a reference to a subroutine (sometimes called a coderef).

Why would you want to do that? Well, in the same way that taking a reference to an array lets you have the same code work on different arrays at different times, taking a reference to a subroutine allows the same code to call different subroutines at different times. Also, references permit complex data structures. A reference to a subroutine allows a subroutine to effectively become part of that complex data structure.

Put another way, a variable or a complex data structure is a repository of values throughout the program. A reference to a subroutine can be thought of as a repository of behavior in a program. The examples in this section show how this works.

The Skipper and Gilligan are having a conversation:

sub skipper_greets {
  my $person = shift;
  print "Skipper: Hey there, $person!\n";
}

sub gilligan_greets {
  my $person = shift;
  if ($person eq "Skipper") {
    print "Gilligan: Sir, yes, sir, $person!\n";
  } else {
    print "Gilligan: Hi, $person!\n";
  }
}

skipper_greets("Gilligan");
gilligan_greets("Skipper");

This results in:

Skipper: Hey there, Gilligan!
Gilligan: Sir, yes, sir, Skipper!

So far, nothing unusual has happened. Note however that Gilligan has two different behaviors, depending on whether he’s addressing the Skipper or someone else.

Now, have the Professor walk into the hut. Both of the Minnow crew greet the newest participant:

skipper_greets("Professor");
gilligan_greets("Professor");

which results in

Skipper: Hey there, Professor!
Gilligan: Hi, Professor!

Now the Professor feels obligated to respond:

sub professor_greets {
  my $person = shift;
  print "Professor: By my calculations, you must be $person!\n";
}

professor_greets("Gilligan");
professor_greets("Skipper");

resulting in:

Professor: By my calculations, you must be Gilligan!\n";
Professor: By my calculations, you must be Skipper!\n";

Whew. A lot of typing and not very general. If each person’s behavior is in a separate named subroutine and a new person walks in the door, you have to figure out what other subroutines to call. You could certainly do it with enough hard-to-maintain code, but you can simplify the process by adding a bit of indirection, just as you did with arrays and hashes.

First, let’s introduce the “take a reference to” operator. It actually needs no introduction because it’s that very same backslash again:

my $ref_to_greeter = \&skipper_greets;

You’re taking a reference to the subroutine skipper_greets( ). Note that the preceding ampersand is mandatory here, and the lack of trailing parentheses is also intentional. The reference to the subroutine (a coderef) is stored within $ref_to_greeter, and like all other references, it fits nearly anywhere a scalar fits.

There’s only one reason to get back to the original subroutine by dereferencing the coderef: to invoke it. Dereferencing a code reference is similar to dereferencing other references. First start with the way you would have written it before you heard of references (including the optional ampersand prefix):

& skipper_greets ( "Gilligan" )

Next, you replace the name of the subroutine with curly braces around the thing holding the reference:

& { $ref_to_greeter } ( "Gilligan" )

There you have it. This construct invokes the subroutine currently referenced by $ref_to_greeter, passing it the single Gilligan parameter.

But boy-oh-boy, is that ugly or what? Luckily, the same reference simplification rules apply. If the value inside the curly braces is a simple scalar variable, you can drop the braces:

& $ref_to_greeter ( "Gilligan" )

You can also flip it around a bit with the arrow notation:

$ref_to_greeter -> ( "Gilligan" )

That last form is particularly handy when the coderef is contained within a larger data structure, as you’ll see in a moment.

To have both Gilligan and the Skipper greet the Professor, you merely need to iterate over all the subroutines:

for my $greet (\&skipper_greets, \&gilligan_greets) {
  $greet->("Professor");
}

First, inside the parentheses, you create a list of two items, each of which is a coderef. The coderefs are then individually dereferenced, invoking the corresponding subroutine and passing it the Professor string.

You’ve seen the coderefs in a scalar variable and as an element of a list. Can you put these coderefs into a larger data structure? Certainly. Create a table that maps people to the behavior they exhibit to greet others, and then rewrite that previous example using the table:

sub skipper_greets {
  my $person = shift;
  print "Skipper: Hey there, $person!\n";
}

sub gilligan_greets {
  my $person = shift;
  if ($person eq "Skipper") {
    print "Gilligan: Sir, yes, sir, $person!\n";
  } else {
    print "Gilligan: Hi, $person!\n";
  }
}

sub professor_greets {
  my $person = shift;
  print "Professor: By my calculations, you must be $person!\n";
}

my %greets = (
  "Gilligan" => \&gilligan_greets,
  "Skipper" => \&skipper_greets,
  "Professor" => \&professor_greets,
);

for my $person (qw(Skipper Gilligan)) {
  $greets{$person}->("Professor");
}

Note that $person is a name, which you look up in the hash to get to a coderef. Then you dereference that coderef, passing it the name of the person being greeted, and you get the correct behavior, resulting in:

Skipper: Hey there, Professor!
Gilligan: Hi, Professor!

Now have everyone greet everyone, in a very friendly room:

sub skipper_greets {
  my $person = shift;
  print "Skipper: Hey there, $person!\n";
}

sub gilligan_greets {
  my $person = shift;
  if ($person eq "Skipper") {
    print "Gilligan: Sir, yes, sir, $person!\n";
  } else {
    print "Gilligan: Hi, $person!\n";
  }
}

sub professor_greets {
  my $person = shift;
  print "Professor: By my calculations, you must be $person!\n";
}

my %greets = ... as before ...

my @everyone = sort keys %greets;
for my $greeter (@everyone) {
  for my $greeted (@everyone) {
    $greets{$greeter}->($greeted)
      unless $greeter eq $greeted; # no talking to yourself
  }
}

This results in:

Gilligan: Hi, Professor!
Gilligan: Sir, yes, sir, Skipper!
Professor: By my calculations, you must be Gilligan!
Professor: By my calculations, you must be Skipper!
Skipper: Hey there, Gilligan!
Skipper: Hey there, Professor!

Hmm. That’s a bit complex. Let’s let them walk into the room one at a time:

sub skipper_greets {
  my $person = shift;
  print "Skipper: Hey there, $person!\n";
}

sub gilligan_greets {
  my $person = shift;
  if ($person eq "Skipper") {
    print "Gilligan: Sir, yes, sir, $person!\n";
  } else {
    print "Gilligan: Hi, $person!\n";
  }
}

sub professor_greets {
  my $person = shift;
  print "Professor: By my calculations, you must be $person!\n";
}

my %greets = ... as before ...

my @room; # initially empty
for my $person (qw(Gilligan Skipper Professor)) {
  print "\n";
  print "$person walks into the room.\n";
  for my $room_person (@room) {
    $greets{$person}->($room_person); # speaks
    $greets{$room_person}->($person); # gets reply
  }
  push @room, $person; # come in, get comfy
}

The result is a typical day on that tropical island:

Gilligan walks into the room.

Skipper walks into the room.
Skipper: Hey there, Gilligan!
Gilligan: Sir, yes, sir, Skipper!

Professor walks into the room.
Professor: By my calculations, you must be Gilligan!
Gilligan: Hi, Professor!
Professor: By my calculations, you must be Skipper!
Skipper: Hey there, Professor!

In that last example, subroutines such as professor_greets( ) were never called explicitly, but indirectly through the coderef. Thus, you wasted some brain cells to come up with a name for the subroutine used only in one other place, to initialize the data structure.

But as you can have anonymous hashes and arrays, you can have anonymous subroutines!

Let’s add another island inhabitant: Ginger. But rather than define her greeting behavior as a named subroutine, create an anonymous subroutine:

my $ginger = sub {
  my $person = shift;
  print "Ginger: (in a sultry voice) Well hello, $person!\n";
};
$ginger->("Skipper");

An anonymous subroutine looks like:

sub { ... body of subroutine ... }

The value in $ginger is a coderef, just as if you had defined the following block as a subroutine and then taken a reference to it. When you reach the last statement, you see:

Ginger: (in a sultry voice) Well hello, Skipper!

Although you kept the value in a scalar variable, you could have put that sub { ... } construct directly into the initialization of the greetings hash:

my %greets = (

  "Skipper" => sub {
    my $person = shift;
    print "Skipper: Hey there, $person!\n";
  },

  "Gilligan" => sub {
    my $person = shift;
    if ($person eq "Skipper") {
      print "Gilligan: Sir, yes, sir, $person!\n";
    } else {
      print "Gilligan: Hi, $person!\n";
    }
  },

  "Professor" => sub {
    my $person = shift;
    print "Professor: By my calculations, you must be $person!\n";
  },

  "Ginger" => sub {
    my $person = shift;
    print "Ginger: (in a sultry voice) Well hello, $person!\n";
  },

);

my @room; # initially empty
for my $person (qw(Gilligan Skipper Professor Ginger)) {
  print "\n";
  print "$person walks into the room.\n";
  for my $room_person (@room) {
    $greets{$person}->($room_person); # speaks
    $greets{$room_person}->($person); # gets reply
  }
  push @room, $person; # come in, get comfy
}

Notice how much it simplifies the code. The subroutine definitions are right within the only data structure that references them directly. The result is straightforward:

Gilligan walks into the room.

Skipper walks into the room.
Skipper: Hey there, Gilligan!
Gilligan: Sir, yes, sir, Skipper!

Professor walks into the room.
Professor: By my calculations, you must be Gilligan!
Gilligan: Hi, Professor!
Professor: By my calculations, you must be Skipper!
Skipper: Hey there, Professor!

Ginger walks into the room.
Ginger: (in a sultry voice) Well hello, Gilligan!
Gilligan: Hi, Ginger!
Ginger: (in a sultry voice) Well hello, Skipper!
Skipper: Hey there, Ginger!
Ginger: (in a sultry voice) Well hello, Professor!
Professor: By my calculations, you must be Ginger!

Adding a few more castaways is as simple as putting the entry for the greeting behavior into the hash and adding them into the list of people entering the room. You get this scaling of effort because you’ve preserved the behavior as data over which you can iterate and look up, thanks to your friendly subroutine references.

A subroutine reference is often used for a callback. A callback defines what to do when a subroutine reaches a particular place in an algorithm.

For example, the File::Find module exports a find subroutine that can efficiently walk through a given filesystem hierarchy in a fairly portable way. In its simplest form, you give the find subroutine two parameters: a starting directory and “what to do” for each file or directory name found recursively below that starting directory. The “what to do” is specified as a subroutine reference:

use File::Find;
sub what_to_do {
  print "$File::Find::name found\n";
}
my @starting_directories = qw(.);

find(\&what_to_do, @starting_directories);

In this case, find starts at the current directory (.) and locates each file or directory. For each item, a call is made to the subroutine what_to_do( ), passing it a few documented values through global variables. In particular, the value of $File::Find::name is the item’s full pathname (beginning with the starting directory).

In this case, you’re passing both data (the list of starting directories) and behavior as parameters to the find routine.

It’s a bit silly to invent a subroutine name just to use the name only once, so you can write the previous code using an anonymous subroutine, such as:

use File::Find;
my @starting_directories = qw(.);

find(
  sub {
    print "$File::Find::name found\n";
  },
  @starting_directories,
);

You could also use File::Find to find out some other things about files, such as their size. For the callback’s convenience, the current working directory is the item’s containing directory, and the item’s name within that directory is found in $_.

Maybe you have noticed that, in the previous code, $File::Find::name was used for the item’s name. So which name is real, $_ or $File::Find::name?

$File::Find::name gives the name relative to the starting directory, but during the callback, the working directory is the one that holds the item just found. For example, suppose that you want find to look for files in the current working directory, so you give it (".") as the list of directories to search. If you call find when the current working directory is /usr, find looks below that directory. When find has located /usr/bin/perl, the current working directory (during the callback) is /usr/bin. $_ holds "perl“; $File::Find::name holds "./bin/perl“, which is the name relative to the directory in which you started the search.

All of this means that the file tests, such as -s, automatically report on the just-found item. Although this is convenient, the current directory inside the callback is different from the search’s starting directory.

What if you want to use File::Find to accumulate the total size of all files seen? The callback subroutine doesn’t support either parameters to be passed in, nor a result returned from the subroutine. But that doesn’t matter. When dereferenced, a subroutine reference can “see” all visible lexical variables when the reference to the subroutine is taken. For example:

use File::Find;

my $total_size = 0;
find(sub { $total_size += -s if -f }, ".");
print $total_size, "\n";

As before, the find routine is called with two parameters: a reference to an anonymous subroutine and the starting directory. When names are found within that directory (and its subdirectories), the subroutine is called.

Note that the subroutine accesses the $total_size variable. This variable is declared outside the scope of the subroutine but still visible to the subroutine. Thus, even though find invokes the callback subroutine (and would not have direct access to $total_size), the callback subroutine accesses and updates the variable.

The kind of subroutine that can access all lexical variables that existed at the time it was declared is called a closure (a term borrowed from the world of mathematics).

Furthermore, the access to the variable from within the closure ensures that the variable remains alive as long as the subroutine reference is alive. For example, let’s number the output files:[27]

use File::Find;

my $callback;
{
  my $count = 0;
  $callback = sub { print ++$count, ": $File::Find::name\n" };
}
find($callback, ".");

Here, you declare a variable to hold the callback. This variable cannot be declared within the naked block (the block following that is not part of a larger Perl syntax construct), or it would be recycled at the end of that block. Next, the lexical $count variable is initialized to 0. An anonymous subroutine is then declared, and a reference to it is placed into $callback. This subroutine is a closure because it refers to the lexical $count variable.

At the end of the naked block, the $count variable goes out of scope. However, because it is still referenced by subroutine in $callback, it stays alive, now as an anonymous scalar variable.[28]

When the callback is invoked from find, the value of the variable formerly known as $count is incremented from 1 to 2 to 3, and so on.

Although a naked block worked nicely to define the callback, having a subroutine return that subroutine reference instead might be more useful:

use File::Find;

sub create_find_callback_that_counts {
  my $count = 0;
  return sub { print ++$count, ": $File::Find::name\n" };
}

my $callback = create_find_callback_that_counts(  );
find($callback, ".");

It’s the same process here, just written a bit differently. When you invoke create_find_callback_that_counts( ), a lexical variable $count is initialized to 0. The return value from that subroutine is a reference to an anonymous subroutine that is also a closure because it accesses the $count variable. Even though $count goes out of scope at the end of the create_find_callback_that_counts( ) subroutine, there’s still a binding between it and the returned subroutine reference, so the variable stays alive until the subroutine reference is finally discarded.

If you reuse the callback, the same variable still has its most recently used value. The initialization occurred in the original subroutine (create_find_callback_that_counts), not the callback (unnamed) subroutine:

use File::Find;

sub create_find_callback_that_counts {
  my $count = 0;
  return sub { print ++$count, ": $File::Find::name\n" };
}

my $callback = create_find_callback_that_counts(  );
print "my bin:\n";
find($callback, "bin");
print "my lib:\n";
find($callback, "lib");

This example prints consecutive numbers starting at 1 for the entries below my bin, but then continues the numbering when you start entries in lib. The same $count variable is used in both cases. However, if you invoke the create_find_callback_that_counts( ) twice, you get two different $count variables:

use File::Find;

sub create_find_callback_that_counts {
  my $count = 0;
  return sub { print ++$count, ": $File::Find::name\n" };
}

my $callback1 = create_find_callback_that_counts(  );
my $callback2 = create_find_callback_that_counts(  );
print "my bin:\n";
find($callback1, "bin");
print "my lib:\n";
find($callback2, "lib");

In this case, you have two separate $count variables, each accessed from within their own callback subroutine.

How would you get the total size of all found files from the callback? Earlier, you were able to do this by making $total_size visible. If you stick the definition of $total_size into the subroutine that returns the callback reference, you won’t have access to the variable. But you can cheat a bit. For one thing, you can determine that the callback subroutine is never called with any parameters, so, if the subroutine is called with a parameter, you can make it return the total size:

use File::Find;

sub create_find_callback_that_sums_the_size {
  my $total_size = 0;
  return sub {
    if (@_) { # it's our dummy invocation
      return $total_size;
    } else { # it's a callback from File::Find:
      $total_size += -s if -f;
    }
  };
}

my $callback = create_find_callback_that_sums_the_size(  );
find($callback, "bin");
my $total_size = $callback->("dummy"); # dummy parameter to get size
print "total size of bin is $total_size\n";

Distinguishing actions by the presence or absence of parameters is not a universal solution. Fortunately, more than one subroutine reference can be created in create_find_callback_that_counts( ):

use File::Find;

sub create_find_callbacks_that_sum_the_size {
  my $total_size = 0;
  return(sub { $total_size += -s if -f }, sub { return $total_size });
}

my ($count_em, $get_results) = create_find_callbacks_that_sum_the_size(  );
find($count_em, "bin");
my $total_size = &$get_results(  );
print "total size of bin is $total_size\n";

Because both subroutine references were created from the same scope, they both have access to the same $total_size variable. Even though the variable has gone out of scope before either subroutine is called, they still share the same heritage and can use the variable to communicate the result of the calculation.

The two subroutine references are not invoked by returning their references from the creating subroutine. The references are just data at that point. It’s not until you invoke them as a callback or an explicit subroutine derefencing that they actually do their duty.

What if you invoke this new subroutine more than once?

use File::Find;

sub create_find_callbacks_that_sum_the_size {
  my $total_size = 0;
  return(sub { $total_size += -s if -f }, sub { return $total_size });
}

## set up the subroutines
my %subs;
foreach my $dir (qw(bin lib man)) {
  my ($callback, $getter) = create_find_callbacks_that_sum_the_size(  );
  $subs{$dir}{CALLBACK} = $callback;
  $subs{$dir}{GETTER} = $getter;
}

## gather the data
for (keys %subs) {
  find($subs{$_}{CALLBACK}, $_);
}

## show the data
for (sort keys %subs) {
  my $sum = $subs{$_}{GETTER}->(  );
  print "$_ has $sum bytes\n";
}

In the “set up the subroutines” section, you create three instances of callback-and-getter pairs. Each callback has a corresponding subroutine to get the results. Next, in the “gather the data” section, you call find three times with each corresponding callback subroutine reference. This updates the individual $total_size variables associated with each callback. Finally, in the “show the data” section, you call the getter routines to fetch the results.

The six subroutines (and the three $total_size variables they share) are reference-counted. When %subs goes away or is modified, the values have their reference counts reduced, recycling the contained data. (If that data also references further data, those reference counts are also reduced appropriately.)

While the previous examples showed closure variables being modified, closure variables are also useful to provide initial or lasting input to the subroutine. For example, let’s write a subroutine to create a File::Find callback that prints files exceeding a certain size:

use File::Find;

sub print_bigger_than {
  my $minimum_size = shift;
  return sub { print "$File::Find::name\n" if -f and -s >= $minimum_size };
}

my $bigger_than_1024 = print_bigger_than(1024);
find($bigger_than_1024, "bin");

The 1024 parameter is passed into the print_bigger_than, which then gets shifted into the $minimum_size lexical variable. Because you access this variable within the subroutine referenced by the return value of the print_bigger_than variable, it becomes a closure variable, with a value that persists for the duration of that subroutine reference. Again, invoking this subroutine multiple times creates distinct “locked-in” values for $minimum_size, each bound to its corresponding subroutine reference.

Closures are “closed” only on lexical variables, since lexical variables eventually go out of scope. Because a package variable (which is a global) never goes out of scope, a closure never closes on a package variable. All subroutines refer to the same single instance of the global variable.

A subroutine doesn’t have to be an anonymous subroutine to be a closure. If a named subroutine accesses lexical variables and those variables go out of scope, the named subroutine retains a reference to the lexicals, just as you saw with anonymous subroutines. For example, consider two routines that count coconuts for Gilligan:

{
  my $count;
  sub count_one { ++$count }
  sub count_so_far { return $count }
}

If you place this code at the beginning of the program, the variable $count is declared, and the two subroutines that reference the variable become closures. However, because they have a name, they will persist beyond the end of the scope (as do all named subroutines). Since the subroutines persist beyond the scope and access variables declared within that scope, they become closures and thus can continue to access $count throughout the lifetime of the program.

So, with a few calls, you can see an incremented count:

count_one(  );
count_one(  );
count_one(  );
print "we have seen ", count_so_far(  ), " coconuts!\n";

$count retains its value between calls to count_one( ) or count_so_far( ), but no other section of code can access this $count at all.

In C, this is known as a static local variable: a variable that is visible to only a subset of the program’s subroutines but persists throughout the life of the program, even between calls to those subroutines.

What if you wanted to count down? Something like this will do:

{
  my $countdown = 10;
  sub count_down { $countdown-- }
  sub count_remaining { $countdown }
}

count_down(  );
count_down(  );
count_down(  );
print "we're down to ", count_remaining(  ), " coconuts!\n";

That is, it’ll do as long as you put it near the beginning of the program, before any invocations of count_down( ) or count_remaining( ). Why?

This block doesn’t work when you put it after those invocations because there are two functional parts to the first line:

my $countdown = 10;

One part is the declaration of $countdown as a lexical variable. That part is noticed and processed as the program is parsed during the compile phase. The second part is the assignment of 10 to the allocated storage. This is handled as the code is executed during the run phase. Unless the run phase is executed for this code, the variable has its initial undef value.

One practical solution to this problem is to change the block in which the static local appears into a BEGIN block:

BEGIN {
  my $countdown = 10;
  sub count_down { $countdown-- }
  sub count_remaining { $countdown }
}

The BEGIN keyword tells the Perl compiler that as soon as this block has been parsed successfully (during the compile phase), jump for a moment to run phase and run the block as well. Presuming the block doesn’t cause a fatal error, compilation then continues with the text following the block. The block itself is also discarded, ensuring that the code within is executed precisely once in a program, even if it had appeared syntactically within a loop or subroutine.

The answers for all exercises can be found in Section A.5.

The Professor modified some files on Monday afternoon, and now he’s forgotten which ones they were. This happens all the time. He wants you to make a subroutine called gather_mtime_between, which, given a starting and ending timestamp, returns a pair of coderefs. The first one will be used with File::Find to gather the names of only the items that were modified between those two times; the second one should return the list of items found.

Here’s some code to try; it should list only items that were last modified on the most recent Monday, although you could easily change it to work with a different day. (You don’t have to type all of this code. This program should be available as the file named ex6-1.plx in the downloadable files, available on the O’Reilly web site.)

Hint: You can find a file’s timestamp (mtime) with code such as:

 my $timestamp = (stat $file_name)[9];

Because it’s a slice, remember that those parentheses are mandatory. Don’t forget that the working directory inside the callback isn’t necessarily the starting directory in which find was called.

use File::Find;
use Time::Local;

my $target_dow = 1;        # Sunday is 0, Monday is 1, ... 
my @starting_directories = (".");

my $seconds_per_day = 24 * 60 * 60;
my($sec, $min, $hour, $day, $mon, $yr, $dow) = localtime;
my $start = timelocal(0, 0, 0, $day, $mon, $yr);        # midnight today
while ($dow != $target_dow) {
  # Back up one day
  $start -= $seconds_per_day;        # hope no DST! :-)
  if (--$dow < 0) {
    $dow += 7;
  }
}
my $stop = $start + $seconds_per_day;

my($gather, $yield)  = gather_mtime_between($start, $stop);
find($gather, @starting_directories);
my @files = $yield->(  );

for my $file (@files) {
  my $mtime = (stat $file)[9];        # mtime via slice
  my $when = localtime $mtime;
  print "$when: $file\n";
}

Note the comment about DST. In many parts of the world, on the days when daylight savings time or summer time kicks in and out, the day is no longer 86,400 seconds long. The program glosses over this issue, but a more pedantic coder might take it into consideration appropriately.



[27] This code seems to have an extra semicolon at the end of the line that assigns to $callback, doesn’t it? But remember, the construct sub { ... } is an expression. Its value (a coderef) is assigned to $callback, and there’s a semicolon at the end of that statement. It’s easy to forget to put the proper punctuation after the closing curly brace of an anonymous subroutine declaration.

[28] To be more accurate, the closure declaration increases the reference count of the referent, as if another reference had been taken explicitly. Just before the end of the naked block, the reference count of $count is two, but after the block has exited, the value still has a reference count of one. Although no other code may access $count, it will still be kept in memory as long as the reference to the sub is available in $callback or elsewhere.