Chapter 11. Some Advanced Object Topics

You might wonder, “Do all objects inherit from a common class?” “What if a method is missing?” “What about multiple inheritance?” “How come we haven’t seen a reference to a filehandle yet?” Well, wonder no more. This chapter covers these subjects and more.

UNIVERSAL Methods

As you define classes, you create inheritance hierarchies through the global @ISA variables in each package. To search for a method, Perl wanders through the @ISA tree until it finds a match or fails.

If that search fails, however, Perl always looks in one special class called UNIVERSAL and invokes a method from there, if found, just as if it had been located in any other class or superclass.

One way to look at this is that UNIVERSAL is the base class from which all objects are derived. Any method you place here, such as:

sub UNIVERSAL::fandango {
  warn "object ", shift, " can do the fandango!\n";
}

lets you call the method on any object in your program as $some_object->fandango.

Generally, you should provide a fandango method for specific classes of interest, and then provide a definition in UNIVERSAL::fandango as a backstop, in case a more specific method can’t be found. A practical example might be a data-dumping routine for debugging or maybe a marshalling strategy to dump all application objects to a file. Simply provide the general method in UNIVERSAL and override it in the specific classes for unusual objects.
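The backstop-plus-override arrangement might look like the following minimal sketch. The method name debug_dump and the two tiny classes are made up for illustration; only the UNIVERSAL lookup itself comes from Perl.

```perl
use strict;
use warnings;

# Hypothetical method name: debug_dump is an illustration, not a built-in.
# General backstop, available to every object in the program.
sub UNIVERSAL::debug_dump {
  my $self = shift;
  return "generic object of class " . (ref $self || $self);
}

{
  package Horse;
  sub new {
    my ($class, $name) = @_;
    return bless { Name => $name }, $class;
  }
  # A more specific version for one class of interest.
  sub debug_dump {
    my $self = shift;
    return "Horse named $self->{Name}";
  }
}

{
  package Sheep;
  sub new {
    my ($class, $name) = @_;
    return bless { Name => $name }, $class;
  }
  # No debug_dump here, so Sheep falls back to UNIVERSAL::debug_dump.
}

my $horse_dump = Horse->new("Mr. Ed")->debug_dump;
my $sheep_dump = Sheep->new("Dolly")->debug_dump;
print "$horse_dump\n";
print "$sheep_dump\n";
```

The Horse gets its own report, while the Sheep quietly picks up the general one.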

Obviously, UNIVERSAL should be used sparingly because there’s only one universe of objects, and your fandango might collide with some other included module’s fandango. For this reason, UNIVERSAL is hardly used for anything except methods that must be completely, well, universal, such as those used during debugging.

Testing Your Objects for Good Behavior

Besides providing a place for you to put universally available methods, the UNIVERSAL package comes preloaded with two very useful utility methods: isa and can. Because these methods are defined in UNIVERSAL, they are automatically available to all objects.

The isa method tests whether a given class or instance belongs to a particular class or to a class that inherits from it. For example, continuing with the Animal family from the past chapters:

if (Horse->isa("Animal")) {    # does Horse inherit from Animal?
  print "A Horse is an Animal.\n";
}

my $tv_horse = Horse->named("Mr. Ed");
if ($tv_horse->isa("Animal")) { # is it an Animal?
  print $tv_horse->name, " is an Animal.\n";
  if ($tv_horse->isa("Horse")) { # is it a Horse?
    print "In fact, ", $tv_horse->name, " is a Horse.\n";
  } else {
    print "...but it's not a Horse.\n";
  }
}

This is handy when you have a heterogeneous mix of objects in a data structure and want to distinguish particular categories of objects:

my @horses = grep $_->isa("Horse"), @all_animals;

The result will be only the horses (or race horses) from the array. Compare that with:

my @horses_only = grep ref $_ eq "Horse", @all_animals;

which picks out just the horses because a RaceHorse won’t return Horse for ref.

In general, you shouldn’t use:

 ref($some_object) eq "SomeClass"

in your programs because it prevents future users from subclassing that class. Use the isa construct as given earlier.

One downside of the isa call here is that it works only on blessed references or scalars that look like class names. If you happen to pass it an unblessed reference, you get a fatal (but trappable) error of:

Can't call method "isa" on unblessed reference at ...

To call isa more robustly, don’t call it as a method. Instead, call it as a subroutine:

if (UNIVERSAL::isa($unknown_thing, "Animal")) {
  # ... it's an Animal! ...
}

This works regardless of what $unknown_thing contains.
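To see the difference between the two calling styles, here is a small sketch. The Animal and Horse packages are stripped-down stand-ins for the classes from earlier chapters, and the list of test values is arbitrary.

```perl
use strict;
use warnings;

{
  package Animal;
  sub new { my $class = shift; return bless {}, $class }
  package Horse;
  our @ISA = ('Animal');
}

# A blessed object, a class name, an unblessed reference,
# a string that names no class, and undef.
my @things = (Horse->new, 'Horse', [1, 2, 3], 'no such class', undef);

# Method form: dies on an unblessed reference, so trap it with eval.
my $method_ok = eval { $things[2]->isa('Animal'); 1 };
print "method call on an array ref died: $@" unless $method_ok;

# Subroutine form: returns false instead of dying.
my @answers = map { UNIVERSAL::isa($_, 'Animal') ? 1 : 0 } @things;
print "@answers\n";
```

Only the first two values count as Animals; the rest simply yield false.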

As in the case of isa, you can test for acceptable behaviors with the can method. For example:

if ($tv_horse->can("eat")) {
  $tv_horse->eat("hay");
}

If the result of can is true, then somewhere in the inheritance hierarchy, a class has defined an eat method. Again, the caveats about $tv_horse being only either a blessed reference or a class name as a scalar still apply, so the robust solution when you might deal with nearly anything looks like:

if (UNIVERSAL::can($tv_horse, "eat")) { ... }

Note that if you defined UNIVERSAL::fandango earlier, then:

 $object->can("fandango")

always returns true because all objects can do the fandango.
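The true value that can returns is a reference to the subroutine that implements the method, so you can hold on to it and call it directly. A minimal sketch, using a cut-down Horse class invented for the example:

```perl
use strict;
use warnings;

{
  package Horse;
  sub new { my $class = shift; return bless {}, $class }
  sub eat { my ($self, $food) = @_; return "eats $food" }
}

my $horse = Horse->new;

# can returns a reference to the method's subroutine (or false),
# so the result can be invoked directly on the object.
my $result = 'cannot eat';
if (my $eat = $horse->can('eat')) {
  $result = $horse->$eat('hay');
}
print "$result\n";

# A method that no class in the hierarchy defines yields false.
my $can_fly = $horse->can('fly') ? 'yes' : 'no';
print "can fly: $can_fly\n";
```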

AUTOLOAD as a Last Resort

After Perl searches the inheritance tree and UNIVERSAL for a method, it doesn’t just stop there if the search is unsuccessful. Perl repeats the search through the very same hierarchy (including UNIVERSAL), looking for a method named AUTOLOAD.

If an AUTOLOAD exists, the subroutine is called in place of the original method, passing it the normal predetermined argument list: the class name or instance reference, followed by any arguments provided to the method call. The original method name is passed in the package variable called $AUTOLOAD (in the package where the subroutine was compiled) and contains the fully qualified method name, so you should generally strip everything up to the final double colon if you want a simple method name.

The AUTOLOAD subroutine can execute the desired operation itself, install a subroutine and then jump into it, or perhaps just die if asked to perform an unknown method.
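As a rough illustration of what AUTOLOAD receives, the following sketch reports the contents of $AUTOLOAD and the argument list. The sing method is hypothetical and deliberately undefined. The early return for DESTROY is an addition of this sketch: Perl also looks for a DESTROY method when an object is destroyed, and that lookup ends up in AUTOLOAD too.

```perl
use strict;
use warnings;

{
  package Animal;
  our $AUTOLOAD;                    # set by Perl before AUTOLOAD runs
  sub new { my $class = shift; return bless {}, $class }
  sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://s;   # strip the package name
    return if $method eq 'DESTROY';          # object destruction lands here too
    return "full name: $AUTOLOAD, short name: $method, args: @_";
  }
}

# sing is not defined anywhere, so the call is routed to AUTOLOAD.
my $report = Animal->new->sing('la', 'la');
print "$report\n";
```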

One use of AUTOLOAD defers the compilation of a large subroutine until it is actually needed. For example, suppose the eat method for an animal is complex but unused in nearly every invocation of the program. You can defer its compilation as follows:

## in Animal
use Carp;                                # provides croak, used below
sub AUTOLOAD {
  our $AUTOLOAD;
  (my $method = $AUTOLOAD) =~ s/.*:://s; # remove package name
  if ($method eq "eat") {
    ## define eat:
    eval q{
      sub eat {
        ...
        long
        definition
        goes
        here
        ...
      }
    };                # End of eval's q{  } string
    die $@ if $@;                        # if typo snuck in
    goto &eat;                           # jump into it
  } else {                               # unknown method
    croak "$_[0] does not know how to $method\n";
  }
}

If the method name is eat, you’ll define eat (which had previously been held in a string but not compiled), and then jump into it with a special construct that replaces the current subroutine invocation with an invocation to eat.^[57] After the first AUTOLOAD hit, the eat subroutine is defined, so you won’t be coming back here. This is great for compile-as-you-go programs because it minimizes startup overhead.

For a more automated way of creating code to do this, which makes it easy to turn the autoloading off during development and debugging, see the AutoLoader and SelfLoader core module documentation.

Using AUTOLOAD for Accessors

Chapter 9 showed how to create color and set_color to get and set the color of an animal. If you had 20 attributes instead of one or two, the code would be painfully repetitive. However, using an AUTOLOAD method, you can construct the nearly identical accessors as needed, saving both compilation time and wear-and-tear on the developer’s keyboard.

Use a code reference as a closure to do the job. First, set up an AUTOLOAD for the object and define a list of hash keys for which you want trivial accessors:

sub AUTOLOAD {
  my @elements = qw(color age weight height);

Next, you’ll see if the method is a getter for one of these keys, and if so, install a getter and jump to it:

our $AUTOLOAD;
if ($AUTOLOAD =~ /::(\w+)$/ and grep $1 eq $_, @elements) {
  my $field = ucfirst $1;
  {
    no strict 'refs';
    *{$AUTOLOAD} = sub { $_[0]->{$field} };
  }
  goto &{$AUTOLOAD};
}

You need to use ucfirst because you named the method color to fetch the hash element called Color. The glob notation here installs a wanted subroutine as defined by the coderef closure, which fetches the corresponding key from the object hash. Consider this part to be magic that you just cut and paste into your program. Finally, the goto construct jumps into the newly defined subroutine.

Otherwise, perhaps it’s a setter:

if ($AUTOLOAD =~ /::set_(\w+)$/ and grep $1 eq $_, @elements) {
  my $field = ucfirst $1;
  {
    no strict 'refs';
    *{$AUTOLOAD} = sub { $_[0]->{$field} = $_[1] };
  }
  goto &{$AUTOLOAD};
}

If it is neither, death awaits:

  die "$_[0] does not understand $AUTOLOAD\n";
}

Again, you pay the price for the AUTOLOAD only on the first hit of a particular getter or setter. After that, a subroutine is now already defined, and you can just invoke it directly.
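Put together, the pieces might be assembled as in the sketch below. A few details are choices of this sketch rather than part of the text: the list of elements lives outside the subroutine, the new constructor is a bare-bones stand-in, DESTROY is ignored explicitly, and the closure is kept in $code so that the goto needs no symbolic reference.

```perl
use strict;
use warnings;

{
  package Animal;
  use Carp;
  our $AUTOLOAD;
  my @elements = qw(color age weight height);

  # Bare-bones constructor, just enough to try the accessors.
  sub new { my $class = shift; return bless { Color => 'brown' }, $class }

  sub AUTOLOAD {
    return if $AUTOLOAD =~ /::DESTROY$/;   # ignore object destruction
    my $code;
    if ($AUTOLOAD =~ /::(\w+)$/ and grep $1 eq $_, @elements) {
      my $field = ucfirst $1;
      $code = sub { $_[0]->{$field} };            # getter
    } elsif ($AUTOLOAD =~ /::set_(\w+)$/ and grep $1 eq $_, @elements) {
      my $field = ucfirst $1;
      $code = sub { $_[0]->{$field} = $_[1] };    # setter
    } else {
      croak "$_[0] does not understand $AUTOLOAD";
    }
    { no strict 'refs'; *{$AUTOLOAD} = $code; }   # install under the real name
    goto &$code;                                  # and jump into it
  }
}

my $horse  = Animal->new;
my $before = $horse->color;       # first call goes through AUTOLOAD
$horse->set_color('black');
my $after  = $horse->color;       # now an ordinary method call
print "$before -> $after\n";

my $installed = defined &Animal::color ? 'yes' : 'no';
print "Animal::color installed: $installed\n";
```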

Creating Getters and Setters More Easily

If all that coding for creating accessors using AUTOLOAD looks messy, rest assured that you really don’t need to tackle it, because there’s a CPAN module that does it a bit more directly: Class::MethodMaker.

For example, a simplified version of the Animal class might be defined as follows:

package Animal;
use Class::MethodMaker
  new_with_init => 'new',
  get_set => [-eiffel => [qw(color height name age)]],
  abstract => [qw(sound)],
;
sub init {
  my $self = shift;
  $self->set_color($self->default_color);
}
sub named {
  my $self = shift->new;
  $self->set_name(shift);
  $self;
}
sub speak {
  my $self = shift;
  print $self->name, " goes ", $self->sound, "\n";
}
sub eat {
  my $self = shift;
  my $food = shift;
  print $self->name, " eats $food\n";
}
sub default_color {
  "brown";
}

The getters and setters for the four instance attributes (name, height, color, and age) are defined automatically, using the method color to get the color and set_color to set the color. (The eiffel flag says “do it the way the Eiffel language does it,” which is the way it should be done here.) The messy blessing step is now hidden behind a simple new method. The initial color is defined as the default color, as before, because the init method is automatically called from new.

However, you can still call Horse->named('Mr. Ed') because it immediately calls the new routine as well.

The sound method is automatically generated as an abstract method. Abstract methods are placeholders, meant to be defined in a subclass. If a subclass fails to define the method, the method generated for Animal’s sound dies.
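To make the idea of an abstract method concrete, here is a hand-written stand-in. It is not the code that Class::MethodMaker generates, only a sketch of the same behavior, and the Blob subclass is invented to show what happens when a subclass forgets to supply sound.

```perl
use strict;
use warnings;

{
  package Animal;
  use Carp;
  sub new { my $class = shift; return bless {}, $class }
  # Hand-written stand-in for a generated abstract method:
  # it only complains, so subclasses are expected to override it.
  sub sound {
    my $either = shift;
    my $class  = ref $either || $either;
    croak "$class must define its own sound method";
  }
  sub speak {
    my $self = shift;
    return "goes " . $self->sound;
  }
}
{
  package Horse;
  our @ISA = ('Animal');
  sub sound { 'neigh' }
}
{
  package Blob;            # hypothetical subclass that forgot sound
  our @ISA = ('Animal');
}

my $horse_says = Horse->new->speak;

my $error = '';
eval { Blob->new->speak; 1 } or $error = $@;

print "Horse $horse_says\n";
print "Blob failed: $error";
```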

You lose the ability to call the getters (such as name) on the class itself, rather than an instance. In turn, this breaks your prior usage of calling speak and eat on generic animals, since they call the accessors. One way around this is to define a more general version of name to handle either a class or instance and then change the other routines to call it:

sub generic_name {
  my $either = shift;
  ref $either ? $either->name : "an unnamed $either";
}
sub speak {
  my $either = shift;
  print $either->generic_name, " goes ", $either->sound, "\n";
}
sub eat {
  my $either = shift;
  my $food = shift;
  print $either->generic_name, " eats $food\n";
}

There. Now it’s looking nearly drop-in compatible with the previous definition, except for those friend classes that referenced the attribute names directly in the hash as the initial-cap-keyed versions (such as Color) rather than through the accessors ($self->color).

That brings up the maintenance issue again. The more you can decouple your implementation (hash versus array, names of hash keys, or types of elements) from the interface (method names, parameter lists, or types of return values), the more flexible and maintainable your system becomes.

That flexibility is not free, however. The cost of a method call is higher than the cost of a hash lookup, so it may be acceptable (or even necessary) for a friend class to peer inside. You may have to pay the programmer-time price of development and maintenance so you don’t pay the runtime price of an overly flexible system.

On the other hand, don’t go overboard in the other direction. Many anecdotes float around about systems where everything was so indirected (to be flexible) that the system ran too slowly to be used.

Multiple Inheritance

How does Perl wander through the @ISA tree? The answer may be simple or complex. If you don’t have multiple inheritance (that is, if no @ISA has more than one element), it is simple: Perl simply goes from one @ISA to the next until it finds the ultimate base class whose @ISA is empty.

Multiple inheritance is more complex. It occurs when a class’s @ISA has more than one element. For example, suppose someone has given you an existing class called Racer, which has the basic abilities for anything that can race, so that it’s ready to be the base class for a runner, a fast car, or a racing turtle. With that, you can make the RaceHorse class as simply as this, maybe:^[58]

{
  package RaceHorse;
  our @ISA = qw{ Horse Racer };
}

Now a RaceHorse can do anything a Horse can do, and anything a Racer can do as well. When Perl searches for a method that’s not provided directly by RaceHorse, it first searches through all the capabilities of the Horse (including all its parent classes, such as Animal). When the Horse possibilities are exhausted, Perl turns to see whether Racer (or one of its parent classes) supplies the needed method. On the other hand, if you want Perl to search Racer and its parent classes before searching Horse, put them into @ISA in that order (see Figure 11-1).

Figure 11-1. A class may not need to implement any methods of its own if it inherits everything it needs from its parent classes through multiple inheritance
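The search order can be checked with a toy hierarchy. The method names speed and win, and the string each method returns, are invented for this sketch so that you can see which class answered.

```perl
use strict;
use warnings;

{
  package Animal;
  sub new   { my $class = shift; return bless {}, $class }
  sub speed { 'Animal::speed' }
  sub name  { 'Animal::name' }

  package Horse;
  our @ISA = ('Animal');
  sub name  { 'Horse::name' }

  package Racer;
  sub speed { 'Racer::speed' }
  sub win   { 'Racer::win' }

  package RaceHorse;
  our @ISA = qw(Horse Racer);
}

my $racer = RaceHorse->new;

# Ask for each method by name and record which class supplied it.
my @found = map { $racer->$_() } qw(name speed win);
print "$_\n" for @found;
```

Note that speed comes from Animal rather than Racer: Perl exhausts Horse and its parents before it ever looks at Racer. Reversing the order in @ISA would make Racer::speed win instead.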

References to Filehandles

So far, you’ve seen references to scalars, arrays, hashes, and subroutines. Another important value type in Perl is the filehandle.

However, a filehandle isn’t stored in a variable. The filehandle is the handle itself. You can’t take a reference directly to a filehandle.^[59] However, using the IO::File built-in class, you can create objects that act like filehandles within Perl. Here’s a typical use:

use IO::File;

my $fh = IO::File->new("/etc/passwd")
  or die "constructor failed: $!";

while (<$fh>) {        # $fh acts like any filehandle
  print "a password line is $_";
}

close $fh;             # nearly all built-ins can use IO::File

Here, $fh is constructed using the new class method of IO::File, and then used in places where ordinarily you’d use a traditional (bareword) filehandle. Furthermore, you also get some additional methods:

if ($fh->opened) { ... } # file is open

$fh->blocking(0);       # make I/O be "non-blocking" if supported

The core built-in operations that use filehandles can all use IO::File objects instead. If the IO::File object is within a simple scalar variable, you can always replace the filehandle with the scalar:

use IO::File;
my $fh = IO::File->new; # create unopened "filehandle" object

open $fh, ">my_new_file" or die "Cannot create: $!";
print $fh "$_\n" for 1..10;
close $fh;

An IO::File object automatically gets closed cleanly when destroyed, so you can simplify the previous code as:

use IO::File;
{
  my $fh = IO::File->new(">my_new_file")
    or die "Cannot create my_new_file: $!";
  print $fh "$_\n" for 1..10;
}

As $fh goes out of scope, the filehandle is automatically closed. Nice.
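One way to convince yourself of the automatic close is to read the file back after the block ends. In this sketch, a scratch directory from the core File::Temp module stands in for the current directory, and the mode is given as a separate "w" or "r" argument; both are choices made for the example.

```perl
use strict;
use warnings;
use IO::File;
use File::Temp qw(tempdir);

# Assumption: a scratch directory from File::Temp stands in for
# the current directory used in the text.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/my_new_file";

{
  my $fh = IO::File->new($path, "w")
    or die "Cannot create $path: $!";
  print $fh "$_\n" for 1..10;
}   # $fh goes out of scope here, which flushes and closes the file

# Reopen the file to confirm that all ten lines reached the disk.
my $in = IO::File->new($path, "r")
  or die "Cannot read $path: $!";
my @lines = <$in>;
print "wrote ", scalar(@lines), " lines\n";
```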

If the IO::File object is not named by a simple scalar variable, some operations require a slightly modified syntax to work. For example, suppose you want to copy every file matching the glob pattern *.input to a corresponding file whose suffix is .output, but do it in parallel. First, open all the files, both inputs and outputs:

my @handlepairs;
foreach my $file (<*.input>) {
  (my $out = $file) =~ s/\.input$/.output/;
  push @handlepairs, [
    (IO::File->new("<$file") || die "Cannot read $file: $!"),
    (IO::File->new(">$out") || die "Cannot create $out: $!"),
  ];
}

Now you have an array of references to arrays, each element of which is an IO::File object. Let’s pump the data:

while (@handlepairs) {
  @handlepairs = grep {
    if (defined(my $line = $_->[0]->getline)) {
      print { $_->[1] } $line;
    } else {
      0;
    }
  } @handlepairs;
}

As long as you have pairs, keep passing the list through the grep structure:

@handlepairs = grep { CONDITION } @handlepairs;

On each pass, only the handle pairs that evaluate as true in the grep CONDITION survive. Inside, you take the first element of each pair and try to read from it. If that’s successful, write that line to the second element of the pair (the corresponding output handle). If the print is successful, it returns true, which lets grep know that you want to keep that pair. If either the print fails or the getline returned undef, the grep sees the false value as an indication to discard that pair. Discarding the pair automatically closes both filehandles. Cool!

Note that you can’t use the more traditional filehandle read or filehandle print operations because the reading and writing filehandles weren’t in a simple scalar variable. Rewrite that loop to see if copying the handles is easier:

while (@handlepairs) {
  @handlepairs = grep {
    my ($IN, $OUT) = @$_;
    if (defined(my $line = <$IN>)) {
      print $OUT $line;
    } else {
      0;
    }
  } @handlepairs;
}

This scenario is arguably better. Most of the time, simply copying the complexly referenced value into a simple scalar is easier on the eyes. In fact, another way to write that loop is to get rid of the ugly if structure:

while (@handlepairs) {
  @handlepairs = grep {
    my ($IN, $OUT) = @$_;
    my $line;
    defined($line = <$IN>) and print $OUT $line;
  } @handlepairs;
}

As long as the reader understands that the and operator is a partial evaluator (it evaluates its right side only when its left side is true) and that print returns true when everything is OK, this is a fine replacement. Remember the Perl motto: “There’s more than one way to do it” (although not all of them are equally nice or legitimate).

Exercise

The answers for all exercises can be found in Section A.10.

Exercise [30 min]

The Professor has to read a log file that looks like:

Gilligan: 1 coconut
Skipper: 3 coconuts
Gilligan: 1 banana
Ginger: 2 papayas
Professor: 3 coconuts
MaryAnn: 2 papayas
...

He wants to write a series of files, called gilligan.info, maryann.info, and so on. Each file should contain all the lines that begin with that name. (Names are always delimited by the trailing colon.) At the end, gilligan.info should start with:

Gilligan: 1 coconut
Gilligan: 1 banana

Now the log file is large, and the coconut-powered computer is not very fast, so he wants to process the input file in one pass and write all output files in parallel. How does he do it?

Hint: Use a hash, keyed by the castaway name, holding IO::File objects for each output file. Create them as necessary.

^[57] Although goto is generally (and rightfully) considered evil, this form of goto, which gives a subroutine name as a target, is not really the evil goto; it’s the good goto.

^[58] If there is a conflict among the methods of Horse and Racer, or if their implementations aren’t able to work together, the situation can become much more difficult.

^[59] You can use the glob, take a reference to the glob, or take a reference to the I/O structure within a glob, but that’s still not a reference to the filehandle.