Chapter 12. Using Modules

A module is a building block for your program: a set of related subroutines and variables packaged so it can be reused. This chapter looks at the basics of modules: how to bring in modules that others have written, and how to write modules of your own.

Sample Function-Oriented Interface: File::Basename

To understand what happens with use, look at one of the many modules included with a normal Perl distribution: File::Basename. This module parses file specifications into useful pieces in a mostly portable manner. The default usage:

use File::Basename;

introduces three subroutines, fileparse, basename, and dirname,^[60] into the current package: typically, main in the main part of your program. From this point forward, within this package, you can say: ^[61]

my $basename = basename($some_full_path);
my $dirname = dirname($some_full_path);

as if you had written the basename and dirname subroutines yourself, or (nearly) as if they were built-in Perl functions.^[62]
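As a quick illustration, here is a minimal sketch of the default import at work. The path is made up for this example, and the results in the comments of the usage note assume a Unix-like system:

```perl
use strict;
use warnings;
use File::Basename;    # default import: fileparse, basename, dirname

# Hypothetical path, chosen only for illustration.
my $some_full_path = '/home/gilligan/web_docs/photos/USS_Minnow.gif';

my $basename = basename($some_full_path);    # filename part
my $dirname  = dirname($some_full_path);     # directory part

# fileparse can also split off a suffix that matches a pattern.
my ($name, $dir, $suffix) = fileparse($some_full_path, qr/\.[^.]*/);

print "basename: $basename\n";
print "dirname:  $dirname\n";
print "pieces:   $name | $dir | $suffix\n";
```

On a Unix-like system, basename should come back as USS_Minnow.gif and dirname as /home/gilligan/web_docs/photos, with fileparse splitting the same path into USS_Minnow, the directory (with its trailing slash), and .gif.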

However, suppose you already had a dirname subroutine? You’ve now overwritten it with the definition provided by File::Basename! If you had turned on warnings, you’d see a message stating that, but otherwise, Perl really doesn’t care.

Selecting What to Import

Fortunately, you can tell the use operation to limit its actions. Do this by specifying a list of subroutine names following the module name, called the import list:

use File::Basename ("fileparse", "basename");

Now the use defines only the two given subroutines from the module, leaving your own dirname alone. Of course, this is awkward to type, so more often you’ll see this written as:

use File::Basename qw( fileparse basename );

In fact, even if there’s only one item, you tend to write it with a qw( ) list for consistency and maintenance; often you’ll go back to say “give me another one from here,” and it’s simpler if it’s already a qw( ) list.

You’ve protected the local dirname routine, but what if you still want the functionality provided by File::Basename’s dirname? No problem. Just spell it out in full:

my $dirname = File::Basename::dirname($some_path);
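To see the two dirname routines living side by side, consider this small sketch. The local dirname body and the path are stand-ins invented for the example:

```perl
use strict;
use warnings;
use File::Basename qw( fileparse basename );    # dirname is not imported

# A local routine with the same name as one in the module (stand-in body).
sub dirname { return 'my own dirname' }

my $some_path = '/home/gilligan/lib/Maps.pm';    # hypothetical path

my $mine   = dirname($some_path);                    # calls the local routine
my $theirs = File::Basename::dirname($some_path);    # calls the module's routine

print "local:  $mine\n";
print "module: $theirs\n";
```

Because dirname was left out of the import list, the unqualified call reaches the local routine, and no redefinition warning should appear.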

The list of names following use doesn’t change which subroutines are defined in the module’s package (in this case, File::Basename). You can always use the full name regardless of the import list, as in:^[63]

my $basename = File::Basename::basename($some_path);

In an extreme (but extremely useful) case, you can specify an empty list for the import list, as in:

use File::Basename (  );              # no import
my $base = File::Basename::basename($some_path);

An empty list is different from an absent list. An empty list says “don’t give me anything in my current package,” while an absent list says “give me the defaults.”^[64] If the module’s author has done her job well, the default will probably be exactly what you want.
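One way to convince yourself that the empty list really imports nothing is to ask Perl whether a basename subroutine exists in main. This is only a sketch, and the path is hypothetical:

```perl
use strict;
use warnings;
use File::Basename ();    # empty import list: nothing lands in main

# defined &name tests for a subroutine without calling it.
my $imported = defined &main::basename ? 1 : 0;

# The fully qualified name still works.
my $base = File::Basename::basename('/home/gilligan/lib/Maps.pm');

print "basename imported into main? $imported\n";
print "base: $base\n";
```

Here $imported should be 0, while the fully qualified call still returns Maps.pm.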

Sample Object-Oriented Interface: File::Spec

Contrast the subroutines imported by File::Basename with what another core (non-CPAN) module has by looking at File::Spec. The File::Spec module is designed to support operations commonly performed on file specifications. (A file specification is usually a file or directory name, but it may be a name of a file that doesn’t exist—in which case, it’s not really a filename, is it?)

Unlike the File::Basename module, the File::Spec module has a primarily object-oriented interface. Saying:

use File::Spec;

in your program imports no subroutines into the current package. Instead, you’re expected to access the functionality of the module using class methods:

my $filespec = File::Spec->catfile( $homedir{gilligan},
      'web_docs', 'photos', 'USS_Minnow.gif' );

This calls the class method catfile of the File::Spec class, building a path appropriate for the local operating system, and returns a single string.^[65] This is similar in syntax to the nearly two dozen other operations provided by File::Spec: they’re all called as class methods. No instances are ever created.

While it is never stated in the documentation, perhaps the purpose of creating these as class methods rather than as imported subroutines is to free the user of the module from having to worry about namespace collisions (as you saw for dirname in the previous section). The idea of an object class without objects, however, seems a bit off kilter. Perhaps that’s why the module’s author also provides a more traditional interface with:

use File::Spec::Functions qw(catfile curdir);

in which two of the File::Spec’s many functions are imported as ordinary callable subroutines:

my $filespec = catfile( $homedir{gilligan},
      'web_docs', 'photos', 'USS_Minnow.gif' );
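The two interfaces should give the same answer. Here is a short sketch that compares them, using relative path pieces so that no %homedir hash is needed:

```perl
use strict;
use warnings;
use File::Spec;                            # imports nothing
use File::Spec::Functions qw(catfile);     # imports catfile on request

my @parts = ('web_docs', 'photos', 'USS_Minnow.gif');

my $via_method   = File::Spec->catfile(@parts);    # class method call
my $via_function = catfile(@parts);                # ordinary subroutine call

print "method:   $via_method\n";
print "function: $via_function\n";
```

Which separator appears in the result depends on the operating system, but both calls should build the same string.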

A More Typical Object-Oriented Module: Math::BigInt

So as not to get dismayed about how “un-OO” the File::Spec module might be, let’s look at yet another core module, Math::BigInt:

use Math::BigInt;
my $value = Math::BigInt->new(2); # start with 2
$value->bpow(1000);               # take 2**1000
print $value->bstr(  ), "\n";       # print it out

Here, nothing is imported. The entire interface consists of class methods, such as new, called against the class name to create instances, and instance methods called against those instances.

The Differences Between OO and Non-OO Modules

A primarily OO module is distinguished from a primarily non-OO module in two ways:

A primarily OO module has functions that are meant to be called as class methods, possibly returning instances upon which you issue further instance method calls.
A primarily OO module generally doesn’t export any functions at all, making the import list rather irrelevant.

Because methods of the OO module are meant to be called as class methods, they should all set aside their first argument, which is the class name. This class name blesses a new instance but is otherwise ignored. Thus, you should not call OO modules as if they were functional modules, and vice versa. Stick with the design of the module.
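To see why mixing the two calling styles goes wrong, here is a small sketch with a made-up class, Island::Castaway, whose one method sets aside its first argument as the class name:

```perl
use strict;
use warnings;

{
    package Island::Castaway;    # hypothetical class for this sketch

    sub describe {
        my $class = shift;       # set aside the class name
        return "class=$class args=@_";
    }
}

# Called as a class method: Perl passes the class name first.
my $as_method = Island::Castaway->describe('Gilligan');

# Called as a plain function: 'Gilligan' is mistaken for the class name.
my $as_function = Island::Castaway::describe('Gilligan');

print "$as_method\n";
print "$as_function\n";
```

The method call reports the class as Island::Castaway with Gilligan as its argument. The function call reports the class as Gilligan with no arguments left, which is exactly the kind of confusion you avoid by sticking with the module's design.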

What use Is Doing

So, just what is that use doing? How does the import list come into play? Perl interprets the use list as a particular form of BEGIN block wrapped around a require and a method call. For example, the following two operations are equivalent:

use Island::Plotting::Maps qw( load_map scale_map draw_map );

BEGIN {
  require Island::Plotting::Maps;
  Island::Plotting::Maps->import( qw( load_map scale_map draw_map ) );
}

Break this code down piece by piece. First, the require. This require is a package-name require, rather than the string-expression require from earlier chapters. Each double colon is turned into the native directory separator (such as / for Unix-like systems), and the name is suffixed with .pm (for “perl module”). For this example on a Unix-like system, you end up with:

require "Island/Plotting/Maps.pm";

Recalling the operation of require from earlier, this means you look in the current value of @INC, checking through each directory for a subdirectory named Island that contains a further subdirectory named Plotting that contains the file named Maps.pm.^[66]

If an appropriate file isn’t found after looking at all of @INC, the program dies.^[67] Otherwise, the first file found is read and evaluated. As always with require, the last expression evaluated must be true (or the program dies),^[68] and once a file has been read, it will not be reread if requested again.^[69]
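You can watch this bookkeeping happen with a core module. The sketch below requires File::Basename at runtime and then looks in %INC, where Perl records each loaded file under its translated name:

```perl
use strict;
use warnings;

require File::Basename;    # package-name require, performed at runtime

# The key in %INC is the translated name: '::' becomes '/', plus '.pm'.
my $key         = 'File/Basename.pm';
my $loaded_from = $INC{$key};
print "loaded from: $loaded_from\n";

# A second require finds the %INC entry and does not read the file again.
my $again = require File::Basename;
print "second require returned a true value\n" if $again;
```

The path printed depends on where your Perl keeps its library, so expect it to differ from machine to machine.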

In the module interface, the require’d file is expected to define subroutines in the same-named package, not the caller’s package. So, for example, a portion of the File::Basename file might look something like this, if you took out all the good stuff:

package File::Basename;
sub dirname { ... }
sub basename { ... }
sub fileparse { ... }
1;

These three subroutines are then defined in the File::Basename package, not the package in which the use occurs. A require’d file must return a true value, so it’s traditional to use 1; as the last line of a module’s code.

How are these subroutines imported from the module’s package to the user’s package? That’s the second step inside the BEGIN block. A routine called import in the module’s package is called, passing along the entire import list. The module author is responsible for providing an appropriate import routine. It’s easier than it sounds, as discussed later in this chapter.

Finally, the whole thing is wrapped in a BEGIN block. This implies that the use operation happens at compile time, rather than runtime, and indeed it does. Thus, subroutines are associated with those defined in the module, prototypes are properly defined, and so on.
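If the compile-time claim seems abstract, this tiny sketch records the order in which two statements actually run. The package variable @order exists only for the demonstration:

```perl
use strict;
use warnings;

our @order;    # package variable, so it already exists at compile time

push @order, 'ordinary statement';      # executes at runtime
BEGIN { push @order, 'BEGIN block' }    # executes as soon as it is compiled

print join(' -> ', @order), "\n";       # prints "BEGIN block -> ordinary statement"
```

Even though the BEGIN block appears later in the file, it runs first, which is the same head start that use gets.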

Setting the Path at the Right Time

The downside of use being executed at compile time is that it also looks at @INC at compile time, which can break your program in hard-to-understand ways unless you take @INC into consideration.

For example, suppose you have your own directory under /home/gilligan/lib, and you place your own Navigation::SeatOfPants module in /home/gilligan/lib/Navigation/SeatOfPants.pm. Simply saying:

use Navigation::SeatOfPants;

is unlikely to do anything useful because @INC holds only the system directories (and, in Perl releases before 5.26, the current directory). However, even adding:

push @INC, "/home/gilligan/lib";   # broken
use Navigation::SeatOfPants;

doesn’t work. Why? Because the push happens at runtime, long after the use was attempted at compile time. One way to fix this is to add a BEGIN block around the push:

BEGIN { push @INC, "/home/gilligan/lib"; }
use Navigation::SeatOfPants;

Now the BEGIN block compiles and executes at compile time, setting up the proper path for the following use.

However, this is noisy and prone to require far more explanation than you might be comfortable with, especially for the maintenance programmer who has to edit your code later. Let’s replace all that clutter with a simple pragma:

use lib "/home/gilligan/lib";
use Navigation::SeatOfPants;

Here, the lib pragma takes one or more arguments and adds them at the beginning of the @INC array (think “unshift”).^[70] It does so because it is processed at compile time, not runtime. Hence, it’s ready in time for the use immediately following.
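A short sketch can confirm the timing. The BEGIN block below runs at compile time, right after the use lib, and checks whether the directory is already in @INC. The directory doesn't have to exist for the check to work:

```perl
use strict;
use warnings;
use lib '/home/gilligan/lib';    # processed at compile time

our $in_inc_at_compile_time;
BEGIN {
    # Count the @INC entries that match the requested directory.
    $in_inc_at_compile_time = grep { $_ eq '/home/gilligan/lib' } @INC;
}

print "in \@INC at compile time: ",
    ($in_inc_at_compile_time ? 'yes' : 'no'), "\n";
print "first entry of \@INC: $INC[0]\n";
```

The first answer should be yes, and on most setups the requested directory should also be the first entry of @INC.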

Because a use lib pragma will pretty much always have a site-dependent pathname, it is traditional and encouraged to put it near the top of the file. This makes it easier to find and update when the file needs to move to a new system or when the lib directory’s name changes. (Of course, you can eliminate use lib entirely if you can install your modules in one of the standard @INC locations, but that’s not always practical.)

Think of use lib as not “use this library,” but rather “use this path to find my libraries (and modules).” Too often, you see code written like:

use lib "/home/gilligan/lib/Navigation/SeatOfPants.pm"; # WRONG

and then the programmer wonders why it didn’t pull in the definitions. Be aware that use lib indeed runs at compile time, so this also doesn’t work:

my $LIB_DIR = "/home/gilligan/lib";
...
use lib $LIB_DIR;     # BROKEN
use Navigation::SeatOfPants;

Certainly the declaration of $LIB_DIR is established at compile time (so you won’t get an error with use strict, although the actual use lib should complain), but the actual initialization to the /home/gilligan/lib/ path happens at runtime. Oops, too late again!

At this point, you need to put something inside a BEGIN block or perhaps rely on yet another compile-time operation: setting a constant with use constant:

use constant LIB_DIR => "/home/gilligan/lib";
...
use lib LIB_DIR;
use Navigation::SeatOfPants;

There. Fixed again. That is, until you need the library to depend on the result of a calculation. (Where will it all end? Somebody stop the madness!) This should handle about 99 percent of your needs.
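For the remaining cases, where the path really is the result of a calculation, one possible approach is to do the calculation inside a BEGIN block. The sketch below assumes, purely for illustration, that the library lives in a lib directory under the user's home directory:

```perl
use strict;
use warnings;

my $lib_dir;
BEGIN {
    # Assumption for this sketch: modules live under $HOME/lib,
    # falling back to Gilligan's directory when HOME is not set.
    my $home = $ENV{HOME} // '/home/gilligan';
    $lib_dir = "$home/lib";
}
use lib $lib_dir;    # $lib_dir already has its value at compile time

print "searching $lib_dir for modules\n";
```

The declaration and the BEGIN block are both handled before use lib is compiled, so the variable holds a real path by the time the pragma needs it.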

Importing with Exporter

Earlier we skipped over that “and now magic happens” part where the import routine (defined by the module author) is supposed to take File::Basename::fileparse and somehow alias it into the caller’s package so it’s callable as fileparse.

Perl provides a lot of introspection capabilities. Specifically, you can look at the symbol table (where all subroutines and most variables are named), see what is defined, and alter those definitions. You saw a bit of that back in the AUTOLOAD mechanism earlier. In fact, as the author of File::Basename, if you simply want to force dirname, basename, and fileparse from the current package into the main package, you can write import like this:

sub import {
  no strict 'refs';
  for (qw(dirname basename fileparse)) {
    *{"main::$_"} = \&$_;
  }
}

Boy, is that cryptic! And limited. What if you didn’t want fileparse? What if you invoked use in a package other than main?
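A hand-rolled import can address both complaints by honoring the list it is given and by asking caller which package invoked it. The following is only a sketch of that idea, not how File::Basename does it. The package names Island::Tools and Castaway and the load_map body are invented for the example:

```perl
use strict;
use warnings;

BEGIN {
    package Island::Tools;    # hypothetical module, defined inline

    sub import {
        my ($class, @names) = @_;
        my $caller = caller;    # the package that asked for the import
        no strict 'refs';
        for my $name (@names) {
            # Alias only the requested names into the caller's package.
            *{"${caller}::$name"} = \&{"${class}::$name"};
        }
    }

    sub load_map { return "map of $_[0]" }    # stand-in body
}

my $map;
{
    package Castaway;    # a package other than main
    BEGIN { Island::Tools->import('load_map') }    # what use would do
    $map = load_map('the island');
}

print "$map\n";
```

This version installs load_map into Castaway and leaves main untouched. It still does no checking of the requested names, which is one more reason to prefer a ready-made import.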

Thankfully, there’s a standard import that’s available in the Exporter module. As the module author, all you do is add:

use Exporter;
our @ISA = qw(Exporter);

Now the import call to the package will inherit upward to the Exporter class, providing an import routine that knows how to take a list of subroutines^[71] and export them to the caller’s package.

@EXPORT and @EXPORT_OK

The import provided by Exporter examines the @EXPORT variable in the module’s package to determine which symbols are exported by default. For example, File::Basename might do something like:

package File::Basename;
our @EXPORT = qw( basename dirname fileparse );
use Exporter;
our @ISA = qw(Exporter);

The @EXPORT list both defines a list of available subroutines for export (the public interface) and provides a default list to be used when no import list is specified. For example, these two calls are equivalent:

use File::Basename;

BEGIN { require File::Basename; File::Basename->import }

No list is passed to import. In that case, the Exporter->import routine looks at @EXPORT and provides everything in the list.^[72]

What if you had subroutines you didn’t want as part of the default import but would still be available if requested? You can add those subroutines to the @EXPORT_OK list in the module’s package. For example, suppose that Gilligan’s module provides the guess_direction_toward routine by default but could also provide the ask_the_skipper_about and get_north_from_professor routines, if requested. You can start it like this:

package Navigate::SeatOfPants;
our @EXPORT = qw(guess_direction_toward);
our @EXPORT_OK = qw(ask_the_skipper_about get_north_from_professor);
use Exporter;
our @ISA = qw(Exporter);

The following invocations would then be valid:

use Navigate::SeatOfPants;  # gets guess_direction_toward

use Navigate::SeatOfPants qw(guess_direction_toward); # same

use Navigate::SeatOfPants
  qw(guess_direction_toward ask_the_skipper_about);

use Navigate::SeatOfPants
  qw(ask_the_skipper_about get_north_from_professor);
  ## does NOT import guess_direction_toward!

If any names are specified, they must come from either @EXPORT or @EXPORT_OK, so this request is rejected by Exporter->import:

use Navigate::SeatOfPants qw(according_to_GPS);

because according_to_GPS is in neither @EXPORT nor @EXPORT_OK.^[73] Thus, with those two arrays, you have control over your public interface. This does not stop someone from saying Navigate::SeatOfPants::according_to_GPS (if it existed), but at least now it’s obvious that they’re using something the module author didn’t intend to offer them.
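To try these rules without creating a separate file, you can define the package inline and call import yourself, as in this sketch. The subroutine bodies are placeholders invented for the example, and the explicit BEGIN block stands in for the use line:

```perl
use strict;
use warnings;

BEGIN {
    package Navigate::SeatOfPants;
    use Exporter;
    our @ISA       = qw(Exporter);
    our @EXPORT    = qw(guess_direction_toward);
    our @EXPORT_OK = qw(ask_the_skipper_about get_north_from_professor);

    # Placeholder bodies, invented for this sketch.
    sub guess_direction_toward   { return "probably toward $_[0]" }
    sub ask_the_skipper_about    { return "the Skipper shrugs about $_[0]" }
    sub get_north_from_professor { return 'north' }
}

# Stands in for "use Navigate::SeatOfPants;" with no import list.
BEGIN { Navigate::SeatOfPants->import }

my $guess = guess_direction_toward('the lagoon');
print "$guess\n";

# In @EXPORT_OK but never requested, so it is not in main.
my $has_skipper = main->can('ask_the_skipper_about') ? 1 : 0;
print "ask_the_skipper_about imported? $has_skipper\n";

# A name in neither array is refused; eval traps the resulting die.
my $accepted =
    eval { Navigate::SeatOfPants->import('according_to_GPS'); 1 } ? 1 : 0;
print "according_to_GPS accepted? $accepted\n";
```

Only guess_direction_toward arrives by default. The request for according_to_GPS should fail inside the eval, and Exporter also prints its complaint on standard error.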

As described in the Exporter manpage, a few shortcuts are available automatically. You can provide a list that is the same as asking for the default:

use Navigate::SeatOfPants qw(:DEFAULT);

or the default plus some others:

use Navigate::SeatOfPants qw(:DEFAULT get_north_from_professor);

These are rarely seen in practice. Why? The purpose of explicitly providing an import list is generally to control the subroutine names you use in your program. Those last examples do not insulate you from future changes to the module, which may export additional subroutines by default that could collide with your code.^[74]

In a few cases, a module may supply dozens or hundreds of possible symbols. These modules can use advanced techniques (described in the Exporter documentation) to make it easy to import batches of related symbols. For example, the core Fcntl module makes the flock constants available as a group with the :flock tag:

use Fcntl qw( :flock );        # import all flock constants
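As a small sketch of the tag in use, the following locks and unlocks a temporary file. The use of File::Temp here is a choice made for this example, to give the program a file it can safely lock:

```perl
use strict;
use warnings;
use Fcntl qw( :flock );           # LOCK_SH, LOCK_EX, LOCK_NB, LOCK_UN
use File::Temp qw( tempfile );

# A scratch file, removed automatically when the program exits.
my ($fh, $filename) = tempfile( UNLINK => 1 );

my $locked   = flock($fh, LOCK_EX) ? 1 : 0;    # take an exclusive lock
my $unlocked = flock($fh, LOCK_UN) ? 1 : 0;    # and release it again
close $fh;

print "locked: $locked, unlocked: $unlocked\n";
```

A single tag brings in all the LOCK_ constants, so there is no need to list them one by one.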

Exporting in a Primarily OO Module

As seen earlier, the normal means of using an object-oriented module is to call class methods and then methods against instances resulting from constructors of that class. This means that an OO module typically exports nothing, so you’ll have:

package My::OOModule::Base;
our @EXPORT = (  ); # you may even omit this line
use Exporter;
our @ISA = qw(Exporter);

As stated in Chapter 8, you can even shorten this down:

package My::OOModule::Base;
use base qw(Exporter);

What if you then derive a class from this base class? The most important thing to remember is that the import method must be inherited from the Exporter class, so you add it like so:

package My::OOModule::Derived;
use base qw(Exporter My::OOModule::Base);

However, wouldn’t the call to My::OOModule::Derived->import eventually find its way up to Exporter via My::OOModule::Base? Sure it would. So you can leave that out:

package My::OOModule::Derived;
use base qw(My::OOModule::Base);

Only the base classes at the top of the tree need specify Exporter and only when they derive from no other classes.

Please be aware of all the other reserved method names that can’t be used by your OO module (as described in the Exporter manpage). At the time of this writing, the list is export_to_level, require_version, and export_fail. Also, you may wish to reserve unimport because that routine is called when someone writes no in place of use. That usage is rare for user-written modules, however.

Even though an OO module typically exports nothing, you might choose to export a named constructor or management routine. This routine typically acts a bit like a class method but is meant to be called as a normal routine.

One example can be found in the LWP library (on the CPAN). The URI::URL module (now deprecated and replaced by the URI module) deals with uniform resource identifiers, most commonly seen as URLs such as http://www.gilligan.crew.hut/maps/island.pdf. You can construct a URI::URL object as a traditional object constructor with:

use URI::URL;
my $u = URI::URL->new("http://www.gilligan.crew.hut/maps/island.pdf");

The default import list for URI::URL also imports a url subroutine, which can be used as a constructor as well:

use URI::URL;
my $u = url("http://www.gilligan.crew.hut/maps/island.pdf");

Because this imported routine isn’t a class method, you don’t use the arrow method call to invoke it. Also, the routine is unlike anything else in the module: no initial class parameter is passed. Even though normal subroutines and method calls are both defined as subroutines in the package, the caller and the author must agree as to which is which.

The url convenience routine was nice, initially. However, it also clashed with the same-name routine in CGI.pm, leading to interesting errors (especially in a mod_perl setting). (The modern interface in the URI module doesn’t export such a constructor.) Prior to that, in order to prevent a clash, you had to remember to bring it in as:

use URI::URL (  );        # don't import "url"
my $u = URI::URL->new(...);

Custom Import Routines

Let’s use CGI.pm as an example of a custom import routine. Not satisfied with the incredible flexibility of the Exporter’s import routine, author Lincoln Stein created a special import for the CGI module.^[75] If you’ve ever gawked at the dizzying array of options that can appear after use CGI, it’s all a simple matter of programming.

As part of the extension provided by this custom import, you can use the CGI module as an object-oriented module:

use CGI;
my $q = CGI->new;         # create a query object
my $f = $q->param("foo"); # get the foo field

or a function-oriented module:

use CGI qw(param);        # import the param function
my $f = param("foo");     # get the foo field

If you don’t want to spell out every possible subfunction, bring them all in:

use CGI qw(:all);         # define "param" and 800-gazillion others
my $f = param("foo");

And then there are pragmata available. For example, if you want to disable the normal sticky field handling, simply add -nosticky into the import list:

use CGI qw(-nosticky :all);

If you want to create the start_table and end_table routines, in addition to the others, it’s simply:

use CGI qw(-nosticky :all *table);

Truly a dizzying array of options.

Exercise

The answers for all exercises can be found in Section A.11.

Exercise [15 min]

Take the library you created in Chapter 2 and turn it into a module you can bring in with use. Alter the invoking code so that it uses the imported routines (rather than the full path), and test it.

^[60]As well as a utility routine, fileparse_set_fstype.

^[61]The new symbols are available for all code compiled in the current package from this point on, whether it’s in this same file or not. However, these symbols won’t be available in a different package.

^[62]These routines pick out the filename and the directory parts of a pathname. For example, if $some_full_path were D:\Projects\Island Rescue\plan 7.rtf (presumably, the program is running on a Windows machine), the basename would be plan 7.rtf and the dirname would be D:\Projects\Island Rescue.

^[63]You don’t need the ampersand in front of any of these subroutine invocations because the subroutine name is already known to the compiler following use.

^[64]As you’ll see later in this chapter, the default list comes from the module’s @EXPORT array.

^[65]That string might be something like /home/gilligan/web_docs/photos/USS_Minnow.gif on a Unix system. On a Windows system, it would typically use backslashes as directory separators. As you can see, this module lets you write portable code easily, at least where file specs are concerned.

^[66]The .pm portion is defined by the interface and can’t be changed. Thus, all module filenames must end in dot-p-m.

^[67]Trappable with an eval, of course.

^[68]Again trappable with eval.

^[69]Thanks to the %INC hash.

^[70] use lib also unshifts an architecture-dependent library below the requested library, making it more valuable than the explicit counterpart presented earlier.

^[71]And variables, although far less common, and arguably the wrong thing to do.

^[72]Remember, having no list is not the same as having an empty list. If the list is empty, the module’s import method is simply not called at all.

^[73]This check also catches misspellings and mistaken subroutine names, keeping you from wondering why the get_direction_from_professor routine isn’t working.

^[74]For this reason, it is generally considered a bad idea for an update to a released module to introduce new default imports. If you know that your first release is still missing a function, though, there’s no reason why you can’t put in a placeholder: sub according_to_GPS { die "not implemented yet" }.

^[75]Some have dubbed this the “Lincoln Loader” out of simultaneous deep respect for Lincoln and the sheer terror of having to deal with something that just doesn’t work like anything else they’ve encountered.