Chapter 9. Objects with Data

Using the simple syntax introduced in Chapter 8, you have class methods, (multiple) inheritance, overriding, and extending. You’ve been able to factor out common code and provide a way to reuse implementations with variations. This is at the core of what objects provide, but objects also provide instance data, which we haven’t even begun to cover.

Let’s look at the code used in Chapter 8 for the Animal classes and Horse classes:

{ package Animal;
  sub speak {
    my $class = shift;
    print "a $class goes ", $class->sound, "!\n"
  }
}
{ package Horse;
  @ISA = qw(Animal);
  sub sound { "neigh" }
}

This lets you invoke Horse->speak, which ripples upward to Animal::speak and calls back to Horse::sound to get the specific sound, producing the output:

a Horse goes neigh!

But all Horse objects would have to be absolutely identical. If you add a subroutine, all horses automatically share it. That’s great for making horses identical, but how do you capture the properties of an individual horse? For example, suppose you want to give your horse a name. There’s got to be a way to keep its name separate from those of other horses.

You can do so by establishing an instance. An instance is generally created by a class, much like a car is created by a car factory. An instance will have associated properties, called instance variables (or member variables, if you come from a C++ or Java background). An instance has a unique identity (like the serial number of a registered horse), shared properties (the color and talents of the horse), and common behavior (i.e., pulling the reins back tells the horse to stop).

In Perl, an instance must be a reference to one of the built-in types. Start with the simplest reference that can hold a horse’s name, a scalar reference:[37]

my $name = "Mr. Ed";
my $tv_horse = \$name;

Now $tv_horse is a reference to what will be the instance-specific data (the name). The final step in turning this into a real instance involves a special operator called bless:

bless $tv_horse, "Horse";

The bless operator follows the reference to find the variable it points to, in this case the scalar $name. Then it “blesses” that variable, turning $tv_horse into an object (a Horse object, in fact). Imagine that a little sticky-note that says Horse is now attached to $name.

At this point, $tv_horse is an instance of Horse.[38] That is, it’s a specific horse. The reference is otherwise unchanged and can still be used with traditional dereferencing operators.[39]
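
To check these claims for yourself, here is a minimal sketch. It assumes only the core Scalar::Util module, and the empty Horse package is a stand-in so that the class name exists. It shows that the blessing sticks to the referenced variable and that the reference still works with ordinary dereferencing:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

{ package Horse; }   # empty stand-in class, just so the name exists

my $name     = "Mr. Ed";
my $tv_horse = \$name;

my $before = ref $tv_horse;          # "SCALAR": a plain reference
bless $tv_horse, "Horse";
my $after  = ref $tv_horse;          # "Horse": the class it was blessed into

# A second reference to the same variable sees the blessing too,
# because the "sticky note" is on $name, not on $tv_horse.
my $other       = \$name;
my $other_class = blessed $other;    # "Horse"

# Ordinary dereferencing still works.
my $deref = $$tv_horse;              # "Mr. Ed"

print "$before -> $after; other: $other_class; name: $deref\n";
```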

The method arrow can be used on instances, as well as names of packages (classes). Let’s get the sound that $tv_horse makes:

my $noise = $tv_horse->sound;

To invoke sound, Perl first notes that $tv_horse is a blessed reference, and thus an instance. Perl then constructs an argument list, similar to the way an argument list was constructed when you used the method arrow with a class name. In this case, it’ll be just ($tv_horse). (Later you’ll see that arguments will take their place following the instance variable, just as with classes.)

Now for the fun part: Perl takes the class into which the instance was blessed, in this case Horse, and uses it to locate the subroutine to invoke the method, as if you had said Horse->sound instead of $tv_horse->sound. The purpose of the original blessing is to associate a class with that reference to allow the proper method (subroutine) to be found.

In this case, Horse::sound is found directly (without using inheritance), yielding the final subroutine invocation:

Horse::sound($tv_horse)

Note that the first parameter here is still the instance, not the name of the class as before. The return value is neigh, which ends up in the earlier $noise variable.

If Horse::sound had not been found, you’d wander up the @Horse::ISA list to try to find the method in one of the superclasses, just as for a class method. The only difference between a class method and an instance method is whether the first parameter is an instance (a blessed reference) or a class name (a string).[40]
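
The lookup rules can be sketched as follows. The invoked_with method is a hypothetical helper added only to report what arrived as the first parameter, and the code uses our @ISA so that it runs under use strict:

```perl
use strict;
use warnings;

{ package Animal;
  # Hypothetical helper: describes whatever arrived as the first parameter.
  sub invoked_with {
    my $first = shift;
    ref($first) ? "instance of " . ref($first) : "class $first";
  }
}
{ package Horse;
  our @ISA = ('Animal');
  sub sound { "neigh" }
}

my $name     = "Mr. Ed";
my $tv_horse = bless \$name, "Horse";

# sound is found directly in Horse; invoked_with is found via @Horse::ISA.
my $noise      = $tv_horse->sound;
my $via_object = $tv_horse->invoked_with;
my $via_class  = Horse->invoked_with;

print "$noise\n$via_object\n$via_class\n";
```

The same subroutine serves both calls. Only the first parameter differs.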

Because you get the instance as the first parameter, you can now access the instance-specific data. In this case, let’s add a way to get at the name:

{ package Horse;
  @ISA = qw(Animal);
  sub sound { "neigh" }
  sub name {
    my $self = shift;
    $$self;
  }
}

Now you call for the name:

print $tv_horse->name, " says ", $tv_horse->sound, "\n";

Inside Horse::name, the @_ array contains just $tv_horse, which the shift stores into $self. It’s traditional to shift the first parameter into a variable named $self for instance methods, so stay with that unless you have strong reasons otherwise. Perl places no significance on the name $self, however.[41]

Then $self is dereferenced as a scalar reference, yielding Mr. Ed. The result is:

Mr. Ed says neigh

If you constructed all your horses by hand, you’d most likely make mistakes from time to time. Making the “inside guts” of a Horse visible also violates one of the principles of OOP. That’s good if you’re a veterinarian but not if you just like to own horses. Let the Horse class build a new horse:

{ package Horse;
  @ISA = qw(Animal);
  sub sound { "neigh" }
  sub name {
    my $self = shift;
    $$self;
  }
  sub named {
    my $class = shift;
    my $name = shift;
    bless \$name, $class;
  }
}

Now with the new named method, build a Horse:

my $tv_horse = Horse->named("Mr. Ed");

You’re back to a class method, so the two arguments to Horse::named are "Horse" and "Mr. Ed". The bless operator not only blesses $name, it also returns the reference to $name, so that’s fine as a return value. And that’s how to build a horse.

You called the constructor named here so it quickly denotes the constructor’s argument as the name for this particular Horse. You can use different constructors with different names for different ways of “giving birth” to the object (such as recording its pedigree or date of birth). However, you’ll find that most people coming to Perl from less-flexible languages (such as Java or C++) use a single constructor named new, with various ways of interpreting the arguments to new. Either style is fine, as long as you document your particular way of giving birth to an object. Most core and CPAN modules use new, with notable exceptions, such as DBI’s DBI->connect(). It’s really up to the author. It all works, as long as it’s documented.
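
As a rough illustration of the two styles side by side, the sketch below keeps named and adds a new constructor. The name => ... argument convention is an assumption made for this example, not something the Horse class defines elsewhere:

```perl
use strict;
use warnings;

{ package Horse;
  sub named {
    my ($class, $name) = @_;
    bless \$name, $class;
  }
  # Hypothetical single-constructor style: new() interprets key/value
  # arguments and hands the real work to named().
  sub new {
    my ($class, %args) = @_;
    $class->named($args{name});
  }
  sub name {
    my $self = shift;
    $$self;
  }
}

my $by_named = Horse->named("Mr. Ed");
my $by_new   = Horse->new(name => "Trigger");

print $by_named->name, " and ", $by_new->name, "\n";
```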

Was there anything specific to Horse in that method? No. Therefore, it’s also the same recipe for building anything else inherited from Animal, so let’s put it there:

{ package Animal;
  sub speak {
    my $class = shift;
    print "a $class goes ", $class->sound, "!\n"
  }
  sub name {
    my $self = shift;
    $$self;
  }
  sub named {
    my $class = shift;
    my $name = shift;
    bless \$name, $class;
  }
}
{ package Horse;
  @ISA = qw(Animal);
  sub sound { "neigh" }
}

Ahh, but what happens if you invoke speak on an instance?

my $tv_horse = Horse->named("Mr. Ed");
$tv_horse->speak;

You get a debugging value:

a Horse=SCALAR(0xaca42ac) goes neigh!

Why? Because the Animal::speak routine expects a classname as its first parameter, not an instance. When the instance is passed in, the blessed scalar reference gets used as a string, which shows up as you saw it just now: similar to a stringified reference, but with the class name in front.

All you need to fix this is a way to detect whether the method is called on a class or an instance. The most straightforward way to find out is with the ref operator. This operator returns a string (the classname) when used on a blessed reference, and an empty string when used on a plain string (like a classname). Modify the name method first to notice the change:

sub name {
  my $either = shift;
  ref $either
    ? $$either                # it's an instance, return name
    : "an unnamed $either";   # it's a class, return generic
}

Here the ?: operator selects either the dereference or a derived string. Now you can use it with either an instance or a class. Note that you changed the first parameter holder to $either to show that it is intentional:

print Horse->name, "\n";      # prints "an unnamed Horse\n"

my $tv_horse = Horse->named("Mr. Ed");
print $tv_horse->name, "\n";   # prints "Mr. Ed\n"

and now you’ll fix speak to use this:

sub speak {
  my $either = shift;
  print $either->name, " goes ", $either->sound, "\n";
}

Since sound already worked with either a class or an instance, you’re done!
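
Putting the pieces together, a self-contained version of the class at this stage might look like the following sketch. It uses our @ISA so that it runs under use strict, which the listings in this chapter don't enable:

```perl
use strict;
use warnings;

{ package Animal;
  sub named {
    my ($class, $name) = @_;
    bless \$name, $class;
  }
  sub name {
    my $either = shift;
    ref($either)
      ? $$either                 # instance: the stored name
      : "an unnamed $either";    # class: a generic description
  }
  sub speak {
    my $either = shift;
    print $either->name, " goes ", $either->sound, "\n";
  }
}
{ package Horse;
  our @ISA = ('Animal');
  sub sound { "neigh" }
}

my $tv_horse = Horse->named("Mr. Ed");
$tv_horse->speak;    # prints "Mr. Ed goes neigh"
Horse->speak;        # prints "an unnamed Horse goes neigh"
```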

Let’s train your animals to eat:

{ package Animal;
  sub named {
    my $class = shift;
    my $name = shift;
    bless \$name, $class;
  }
  sub name {
    my $either = shift;
    ref $either
      ? $$either # it's an instance, return name
      : "an unnamed $either"; # it's a class, return generic
  }
  sub speak {
    my $either = shift;
    print $either->name, " goes ", $either->sound, "\n";
  }
  sub eat {
    my $either = shift;
    my $food = shift;
    print $either->name, " eats $food.\n";
  }
}
{ package Horse;
  @ISA = qw(Animal);
  sub sound { "neigh" }
}
{ package Sheep;
  @ISA = qw(Animal);
  sub sound { "baaaah" }
}

Now try it out:

my $tv_horse = Horse->named("Mr. Ed");
$tv_horse->eat("hay");
Sheep->eat("grass");

It prints:

Mr. Ed eats hay.
an unnamed Sheep eats grass.

An instance method with parameters gets invoked with the instance, and then the list of parameters. That first invocation is like:

Animal::eat($tv_horse, "hay");

The instance methods form the Application Programming Interface (API) for an object. Most of the effort involved in designing a good object class goes into the API design because the API defines how reusable and maintainable the object and its subclasses will be. Do not rush to freeze an API design before you’ve considered how the object will be used.

What if an instance needs more data? Most interesting instances are made of many items, each of which can in turn be a reference or another object. The easiest way to store these items is often in a hash. The keys of the hash serve as the names of parts of the object (also called instance or member variables), and the corresponding values are, well, the values.

How do you turn the horse into a hash?[42] Recall that an object is any blessed reference. You can just as easily make it a blessed hash reference as a blessed scalar reference, as long as everything that looks at the reference is changed accordingly.

Let’s make a sheep that has a name and a color:

my $lost = bless { Name => "Bo", Color => "white" }, "Sheep";

$lost->{Name} has Bo, and $lost->{Color} has white. But you want to make $lost->name access the name, and that’s now messed up because it’s expecting a scalar reference. Not to worry, because it’s pretty easy to fix up:

## in Animal
sub name {
  my $either = shift;
  ref $either
    ? $either->{Name}
    : "an unnamed $either";
}

named still builds a scalar sheep, so let’s fix that as well:

## in Animal
sub named {
  my $class = shift;
  my $name = shift;
  my $self = { Name => $name, Color => $class->default_color };
  bless $self, $class;
}

What’s this default_color? If named has only the name, you still need to set a color, so you’ll have a class-specific initial color. For a sheep, you might define it as white:

## in Sheep
sub default_color { "white" }

Then to keep from having to define one for each additional class, define a backstop method, which serves as the “default default,” directly in Animal:

## in Animal
sub default_color { "brown" }

Thus, all animals are brown (muddy, perhaps), unless a specific animal class gives a specific override to this method.

Now, because name and named were the only methods that referenced the structure of the object, the remaining methods can stay the same, so speak still works as before. This supports another basic rule of OOP: if the structure of the object is accessed only by the object’s own methods or inherited methods, there’s less code to change when it’s time to modify that structure.
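
Here is a sketch of the hash-based classes assembled into one runnable file. It peeks at the Color key directly only to show the defaults, and it uses our @ISA so that it runs under use strict:

```perl
use strict;
use warnings;

{ package Animal;
  sub named {
    my ($class, $name) = @_;
    my $self = { Name => $name, Color => $class->default_color };
    bless $self, $class;
  }
  sub name {
    my $either = shift;
    ref($either) ? $either->{Name} : "an unnamed $either";
  }
  sub default_color { "brown" }    # the "default default"
  sub speak {
    my $either = shift;
    print $either->name, " goes ", $either->sound, "\n";
  }
}
{ package Horse;
  our @ISA = ('Animal');
  sub sound { "neigh" }
}
{ package Sheep;
  our @ISA = ('Animal');
  sub sound { "baaaah" }
  sub default_color { "white" }    # class-specific override
}

my $tv_horse = Horse->named("Mr. Ed");
my $lost     = Sheep->named("Bo");

$tv_horse->speak;    # unchanged method, still works with hash-based objects
$lost->speak;

# Peeking at the hash directly, for illustration only.
print "$tv_horse->{Color} horse, $lost->{Color} sheep\n";
```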

Having all horses be brown would be boring. Let’s add a method or two to get and set the color:

## in Animal
sub color {
  my $self = shift;
  $self->{Color};
}
sub set_color {
  my $self = shift;
  $self->{Color} = shift;
}

Now you can fix that color for Mr. Ed:

my $tv_horse = Horse->named("Mr. Ed");
$tv_horse->set_color("black-and-white");
print $tv_horse->name, " is colored ", $tv_horse->color, "\n";

which results in:

Mr. Ed is colored black-and-white

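Because of the way the code is written, the setter also returns the updated value. Think about this (and document it) when you write a setter. What does the setter return? Common variations include the updated parameter (the same value that was passed in), the previous value, the object itself, and a success/fail code.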

Each has advantages and disadvantages. For example, if you return the updated parameter, you can use it again for another object:

$tv_horse->set_color( $eating->set_color( color_from_user() ));

The implementation given earlier returns the newly updated value. Frequently, this is the easiest code to write, and often the fastest to execute.

If you return the previous parameter, you can easily create “set this value temporarily to that” functions:

{
  my $old_color = $tv_horse->set_color("orange");
  # ... do things with $tv_horse ...
  $tv_horse->set_color($old_color);
}

This is implemented as:

sub set_color {
  my $self = shift;
  my $old = $self->{Color};
  $self->{Color} = shift;
  $old;
}

For more efficiency, you can avoid stashing the previous value when in a void context using the wantarray function:

sub set_color {
  my $self = shift;
  if (defined wantarray) {
    # this method call is not in void context, so
    # the return value matters
    my $old = $self->{Color};
    $self->{Color} = shift;
    $old;
  } else {
    # this method call is in void context
    $self->{Color} = shift;
  }
}
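
To watch the three contexts in action, you could instrument the setter with a counter, as in this sketch. The $stashes counter and the stash_count method are demo-only additions, and the constructor is cut down to the minimum needed:

```perl
use strict;
use warnings;

{ package Animal;
  my $stashes = 0;    # demo-only counter: how often the old value was saved

  sub named {
    my ($class, $name) = @_;
    bless { Name => $name, Color => "brown" }, $class;
  }
  sub set_color {
    my $self = shift;
    if (defined wantarray) {
      # scalar or list context: the caller wants the old value
      $stashes++;
      my $old = $self->{Color};
      $self->{Color} = shift;
      $old;
    } else {
      # void context: skip the stash
      $self->{Color} = shift;
    }
  }
  sub stash_count { $stashes }
}

my $tv_horse = Animal->named("Mr. Ed");

$tv_horse->set_color("orange");                 # void context
my $previous  = $tv_horse->set_color("black");  # scalar context
my ($earlier) = $tv_horse->set_color("grey");   # list context

print "previous: $previous, earlier: $earlier, stashes: ",
      Animal->stash_count, "\n";
```

Only the scalar and list calls bump the counter. The void-context call skips the stash entirely.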

If you return the object itself, you can chain settings:

my $tv_horse =
  Horse->named("Mr. Ed")
       ->set_color("grey")
       ->set_age(4)
       ->set_height("17 hands");

This works because the output of each setter is the original object, becoming the object for the next method call. Implementing this is again relatively easy:

sub set_color {
  my $self = shift;
  $self->{Color} = shift;
  $self;
}

The void context trick can be used here too, although with questionable value because you’ve already established $self.
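
A runnable sketch of the chained style follows. The set_age and set_height setters and the three getters are assumptions added so that the chain shown earlier has something to call:

```perl
use strict;
use warnings;

{ package Horse;
  sub named {
    my ($class, $name) = @_;
    bless { Name => $name }, $class;
  }
  # Each setter stores its argument and returns the object itself.
  sub set_color  { my $self = shift; $self->{Color}  = shift; $self }
  sub set_age    { my $self = shift; $self->{Age}    = shift; $self }
  sub set_height { my $self = shift; $self->{Height} = shift; $self }

  sub color  { $_[0]->{Color}  }
  sub age    { $_[0]->{Age}    }
  sub height { $_[0]->{Height} }
}

my $tv_horse =
  Horse->named("Mr. Ed")
       ->set_color("grey")
       ->set_age(4)
       ->set_height("17 hands");

print join(", ", $tv_horse->color, $tv_horse->age, $tv_horse->height), "\n";
```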

Finally, returning a success status is useful if it’s fairly common for an update to fail, rather than an exceptional event. The other variations would have to indicate failure by throwing an exception with die.
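
A setter that reports success might look like this sketch. The list of acceptable colors is invented for the example. A real class would need its own notion of what makes an update fail:

```perl
use strict;
use warnings;

{ package Animal;
  # Assumed for illustration: the only colors this class accepts.
  my %known_color = map { $_ => 1 } qw(brown white black grey);

  sub named {
    my ($class, $name) = @_;
    bless { Name => $name, Color => "brown" }, $class;
  }
  sub color { $_[0]->{Color} }
  sub set_color {
    my ($self, $color) = @_;
    return 0 unless defined $color && $known_color{$color};  # failure
    $self->{Color} = $color;
    return 1;                                                # success
  }
}

my $tv_horse = Animal->named("Mr. Ed");

my $ok  = $tv_horse->set_color("grey");      # accepted
my $bad = $tv_horse->set_color("plaid");     # rejected, color unchanged

print "ok=$ok bad=$bad color=", $tv_horse->color, "\n";
```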

In summary: use what you want, be consistent if you can, but document it nonetheless (and don’t change it after you’ve already released one version).

You might have obtained or set the color outside the class simply by following the hash reference: $tv_horse->{Color}. However, this violates the encapsulation of the object by exposing its internal structure. The object is supposed to be a black box, but you’ve pried off the hinges and looked inside.

One purpose of OOP is to enable the maintainer of Animal or Horse to make reasonably independent changes to the implementation of the methods and still have the exported interface work properly. To see why accessing the hash directly violates this, let’s say that Animal no longer uses a simple color name for the color, but instead changes to use a computed RGB triple to store the color (holding it as an arrayref), as in:

use Color::Conversions qw(color_name_to_rgb rgb_to_color_name);
...
sub set_color {
  my $self = shift;
  my $new_color = shift;
  $self->{Color} = color_name_to_rgb($new_color);  # arrayref
}
sub color {
  my $self = shift;
  rgb_to_color_name($self->{Color});               # takes arrayref
}

The old interface can be maintained if you use a setter and getter because they can perform the translations. You can also add new interfaces now to enable the direct setting and getting of the RGB triple:

sub set_color_rgb {
  my $self = shift;
  $self->{Color} = [@_];                # set colors to remaining parameters
}
sub get_color_rgb {
  my $self = shift;
  @{ $self->{Color} };                  # return RGB list
}

If you use code outside the class that looks at $tv_horse->{Color} directly, this change is no longer possible. It won’t work to store a string ('blue') where an arrayref is needed ([0,0,255]) or to use an arrayref as a string.
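
To try the changed representation without a conversion module, the sketch below substitutes a tiny lookup table. The My::ColorTable package and its four colors are assumptions standing in for the conversion functions:

```perl
use strict;
use warnings;

# Stand-in for the conversion functions (assumed; a tiny fixed table).
{ package My::ColorTable;
  my %rgb_for = (
    black => [0, 0, 0],
    white => [255, 255, 255],
    blue  => [0, 0, 255],
    brown => [165, 42, 42],
  );
  sub color_name_to_rgb {
    my $name = shift;
    my $rgb = $rgb_for{$name} or die "unknown color: $name\n";
    [@$rgb];                     # copy, so callers can't alter the table
  }
  sub rgb_to_color_name {
    my $rgb = shift;
    for my $name (sort keys %rgb_for) {
      return $name if join(",", @{ $rgb_for{$name} }) eq join(",", @$rgb);
    }
    return "unknown";
  }
}

{ package Animal;
  sub named {
    my ($class, $name) = @_;
    my $self = bless { Name => $name }, $class;
    $self->set_color("brown");
    $self;
  }
  sub set_color {
    my ($self, $new_color) = @_;
    $self->{Color} = My::ColorTable::color_name_to_rgb($new_color);
  }
  sub color {
    my $self = shift;
    My::ColorTable::rgb_to_color_name($self->{Color});
  }
  sub set_color_rgb { my $self = shift; $self->{Color} = [@_] }
  sub get_color_rgb { my $self = shift; @{ $self->{Color} } }
}

my $tv_horse = Animal->named("Mr. Ed");
$tv_horse->set_color("blue");
my $by_name = $tv_horse->color;            # old interface, still a name
my @triple  = $tv_horse->get_color_rgb;    # new interface, the raw triple
$tv_horse->set_color_rgb(255, 255, 255);
my $after_rgb = $tv_horse->color;

print "$by_name = (@triple); after RGB update: $after_rgb\n";
```

Callers that stick to color and set_color never notice that the stored value is now an arrayref.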

Because you’re going to play nice and always call the getters and setters instead of reaching into the data structure, getters and setters are called frequently. To save a teeny-tiny bit of time, you might see these getters and setters written as:

## in Animal
sub color {
  $_[0]->{Color}
}
sub set_color {
  $_[0]->{Color} = $_[1];
}

Here’s an alternate way to access the arguments: $_[0] is used in place, rather than with a shift. Functionally, this example is identical to the previous implementation, but it’s slightly faster, at the expense of some ugliness.

Another alternative to the pattern of creating two different methods for getting and setting a parameter is to create one method that notes whether or not it gets any additional arguments. If the arguments are absent, it’s a get operation; if the arguments are present, it’s a set operation. A simple version looks like:

sub color {
  my $self = shift;
  if (@_) {              # are there any more parameters?
    # yes, it's a setter:
    $self->{Color} = shift;
  } else {
    # no, it's a getter:
    $self->{Color};
  }
}

Now you can say:

my $tv_horse = Horse->named("Mr. Ed");
$tv_horse->color("black-and-white");
print $tv_horse->name, " is colored ", $tv_horse->color, "\n";

The presence of the parameter in the second line denotes that you are setting the color, while its absence in the third line indicates a getter.

While this strategy might at first seem attractive because of its apparent simplicity, it complicates the actions of the getter (which will be called frequently). This strategy also makes it difficult to search through your listings to find only the setters of a particular parameter, which are often more important than the getters. In fact, we’ve been burned by this in the past when a setter became a getter because another function returned more parameters than expected after an upgrade.
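
One way this kind of surprise can arise (not necessarily the incident described above) is when the argument comes from a function that returns an empty list. In this sketch, color_from_config is a hypothetical lookup that finds nothing:

```perl
use strict;
use warnings;

{ package Horse;
  sub named {
    my ($class, $name) = @_;
    bless { Name => $name, Color => "brown" }, $class;
  }
  sub color {
    my $self = shift;
    if (@_) {              # any more parameters?
      $self->{Color} = shift;   # setter
    } else {
      $self->{Color};           # getter
    }
  }
}

# Hypothetical lookup that finds nothing: a bare return yields
# an empty list in list context.
sub color_from_config { return }

my $tv_horse = Horse->named("Mr. Ed");

# Intended as a set, but the argument list is empty, so this is a get.
my $result = $tv_horse->color( color_from_config() );

print "result: $result, color is still: ", $tv_horse->color, "\n";
```

With a separate set_color method, the same call would at least attempt the update, making the missing value easier to notice.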

Setting the name of an unnameable generic Horse is probably not a good idea; neither is calling named on an instance. Nothing in the Perl subroutine definition says “this is a class method” or “this is an instance method.” Fortunately, the ref operator lets you throw an exception when called incorrectly. As an example of instance- or class-only methods, consider the following:

use Carp qw(croak);

sub instance_only {
  ref(my $self = shift) or croak "instance variable needed";
  # ... use $self as the instance ...
}

sub class_only {
  ref(my $class = shift) and croak "class name needed";
  # ... use $class as the class ...
}

Here, the ref function returns true for an instance or false for a class. If the undesired value is returned, you’ll croak, which has the added advantage of placing the blame on the caller, not on you. The caller will get an error message like this, giving the line number in their code where the wrong method was called:

instance variable needed at their_code line 1234.

While this seems like a good thing to do all the time, practically no CPAN or core modules add this extra checking. Maybe it’s only for the ultra-paranoid.
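
If you do want the checks, a sketch applied to a small Horse class could look like this. The set_name method is an assumption added for the example, and eval is used only to catch the exceptions so the script keeps running:

```perl
use strict;
use warnings;

{ package Horse;
  use Carp qw(croak);

  sub named {                       # class-only
    ref(my $class = shift) and croak "class name needed";
    my $name = shift;
    bless { Name => $name }, $class;
  }
  sub set_name {                    # instance-only (hypothetical method)
    ref(my $self = shift) or croak "instance variable needed";
    $self->{Name} = shift;
  }
  sub name { $_[0]->{Name} }
}

my $tv_horse = Horse->named("Mr. Ed");

# Correct use: no exception.
my $ok = eval { $tv_horse->set_name("Mister Ed"); 1 };

# Instance method called on the class: croaks.
my $class_call_failed = !eval { Horse->set_name("Nobody"); 1 };
my $class_error = $@;

# Class method called on an instance: croaks.
my $instance_call_failed = !eval { $tv_horse->named("Foal"); 1 };
my $instance_error = $@;

print "ok\n" if $ok;
print "caught: $class_error";
print "caught: $instance_error";
```

Each caught message names the file and line of the offending call, because croak reports from the caller's point of view.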

The answers for all exercises can be found in Section A.8.



[37] The simplest, but rarely used in real code for reasons you’ll see shortly.

[38] Actually, $tv_horse points to the object, but in common terms, you nearly always deal with objects by references to those objects. Hence, it’s simpler to say that $tv_horse is the horse, not “the thing that $tv_horse is referencing.”

[39] Although doing so outside the class is a bad idea, as you’ll see later.

[40] This is perhaps different from other OOP languages with which you may be familiar.

[41] If you come from another OO language background, you might choose $this or $me for the variable name, but you’ll probably confuse most other Perl OO-hackers.

[42] Other than calling on a butcher, that is.