This chapter looks at how to break up a program into pieces and includes some of the concerns that arise when you put those pieces back together again, or when many people work together on the same program.
Let’s say a famous sailor (we’ll call him “the Skipper”) uses Perl to help navigate his ocean-going vessel (call it “the Minnow”). The Skipper writes many Perl programs to provide navigation for all the common ports of call for the Minnow. He finds himself cutting and pasting a very common routine into each program:
    sub turn_towards_heading {
      my $new_heading = shift;
      my $current_heading = current_heading( );
      print "Current heading is ", $current_heading, ".\n";
      print "Come about to $new_heading ";
      my $direction = "right";
      my $turn = ($new_heading - $current_heading) % 360;
      if ($turn > 180) { # long way around
        $turn = 360 - $turn;
        $direction = "left";
      }
      print "by turning $direction $turn degrees.\n";
    }

This routine gives the shortest turn to make from the current heading (returned by the subroutine current_heading( )) to a new heading (given as the first parameter to the subroutine).
The first line of this subroutine might have read instead:
my ($new_heading) = @_;
This is mostly a style call: in both cases, the first parameter ends up in $new_heading. However, in later chapters, you’ll see that removing the items from @_ as they are identified does have some advantages. So, this book sticks (mostly) with the “shifting” style of argument parsing. Now back to the matter at hand...
Suppose that after having written a dozen programs using this routine, the Skipper realizes that the output is excessively chatty when he’s already taken the time to steer the proper course (or perhaps simply started drifting in the proper direction). After all, if the current heading is 234 degrees and he needs to turn to 234 degrees, you see:
    Current heading is 234.
    Come about to 234 by turning right 0 degrees.
How annoying! The Skipper decides to fix this problem by checking for a zero turn value:
    sub turn_towards_heading {
      my $new_heading = shift;
      my $current_heading = current_heading( );
      print "Current heading is ", $current_heading, ".\n";
      my $direction = "right";
      my $turn = ($new_heading - $current_heading) % 360;
      unless ($turn) {
        print "On course (good job!).\n";
        return;
      }
      print "Come about to $new_heading ";
      if ($turn > 180) { # long way around
        $turn = 360 - $turn;
        $direction = "left";
      }
      print "by turning $direction $turn degrees.\n";
    }
Great. The new subroutine works nicely in the current navigation program. However, because it had previously been cut-and-pasted into a half dozen other navigation programs, those other programs will still annoy the Skipper with extraneous turning messages.
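The turn arithmetic itself can be checked in isolation. The following is only a sketch: shortest_turn is a hypothetical helper that does not appear in the Skipper's programs, and it returns the direction and the number of degrees instead of printing them, so the three interesting cases (no turn, a turn across north to the right, and a turn across north to the left) are easy to try.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original text): compute the turn
# instead of printing it, so the arithmetic can be checked directly.
sub shortest_turn {
    my $current_heading = shift;
    my $new_heading     = shift;

    my $direction = "right";
    # With a positive right operand, Perl's % yields a value in 0..359
    # even when the difference is negative.
    my $turn = ($new_heading - $current_heading) % 360;
    if ($turn > 180) {    # long way around
        $turn      = 360 - $turn;
        $direction = "left";
    }
    return ($direction, $turn);
}

for my $case ([234, 234], [350, 10], [10, 350]) {
    my ($direction, $turn) = shortest_turn(@$case);
    print "From $case->[0] to $case->[1]: ",
        ($turn ? "turn $direction $turn degrees" : "on course"), "\n";
}
```

Keeping the calculation apart from the print statements is a design choice made here for checking the numbers; the Skipper's routine does both jobs in one place.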
You need a way to write the code in one place and then share it among many programs. And like most things in Perl, there’s more than one way to do it.
The Skipper can save disk space (and brainspace) by bringing the definition for turn_towards_heading out into a separate file. For example, suppose the Skipper figures out a half-dozen common subroutines related to navigating the Minnow that he seems to use in most or all of the programs he’s writing for the task. He can put them in a separate file called navigation.pl, which consists only of the needed subroutines.
But now, how can you tell Perl to pull in that program snippet from another file? You could do it the hard way:
    sub load_common_subroutines {
      open MORE_CODE, "navigation.pl" or die "navigation.pl: $!";
      local $/; # enable slurp mode, restored when the subroutine returns
      my $more_code = <MORE_CODE>;
      close MORE_CODE;
      eval $more_code;
      die $@ if $@;
    }
The code from navigation.pl is read into the $more_code variable. You then use eval to process that text as Perl code. Any lexical variables in $more_code will remain local to the evaluated code.[4] If there’s a syntax error, the $@ variable is set and causes the subroutine to die with the appropriate error message.
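To see those two behaviors without touching the filesystem, here is a minimal sketch in which a string stands in for the contents of navigation.pl. The names $units, heading_units, and broken are invented for the illustration.

```perl
use strict;
use warnings;

# Stand-in for the text of navigation.pl; the contents are invented
# for illustration.
my $more_code = <<'END_CODE';
my $units = "degrees";               # lexical: private to the eval
sub heading_units { return $units }  # named sub: installed in the package
1;
END_CODE

eval $more_code;
die $@ if $@;

# The subroutine is callable here, although $units itself is not visible.
print "Units: ", heading_units(), "\n";

# A syntax error in the string does not kill the program; it sets $@.
eval "sub broken { ";
print "Compile failed: ", ($@ ? "yes" : "no"), "\n";
```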
Now instead of a few dozen lines of common subroutines to place in each file, you simply have one subroutine to insert in each file.
But that’s not very nice, especially if you need to keep doing this kind of task repeatedly. Luckily, there’s (at least) one Perl built-in to help you out.
The Skipper has placed a few common navigation subroutines into navigation.pl. If the Skipper merely inserts:
do "navigation.pl"; die $@ if $@;
into his typical navigation program, it’s almost the same as if the eval code were executed earlier.[5]

That is, the do operator acts as if the code from navigation.pl were incorporated into the current program, although in its own scope block so that lexicals (my variables) and most directives (such as use strict) from the included file don’t leak into the main program.
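As a rough sketch of that behavior, the following program writes a throwaway navigation.pl into a temporary directory and pulls it in with do. The file contents, the $secret variable, and the use of File::Temp are assumptions made for the demonstration. The extra check on the return value covers the missing-file case that $@ alone does not report.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Build a throwaway library file; the path and contents are
# assumptions made only for this demonstration.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "navigation.pl");

open my $out, '>', $file or die "$file: $!";
print {$out} <<'END_LIB';
my $secret = "lexical in the library";   # does not leak out
sub current_heading { return 234 }       # becomes callable by the caller
1;
END_LIB
close $out or die "$file: $!";

# An absolute path makes do skip the library search path entirely.
my $result = do $file;
die $@ if $@;
die "Cannot read $file: $!" unless defined $result;

print "Heading from library: ", current_heading(), "\n";
```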
Now the Skipper can safely update and maintain only one copy of the common subroutines, without having to copy and recopy all the fixes and extensions into the many separate navigation programs he is creating and using. See Figure 2-1 for an illustration.
Of course, this requires a bit of discipline because breaking the expected interface of a given subroutine will now break many programs instead of just one.[6] You need to think carefully about how to design and write reusable components. We’ll presume the Skipper has had some experience at that.
Another advantage to placing some of the code into a separate file is that other programmers can reuse the Skipper’s routines and vice versa. For example, suppose the Skipper’s sidekick (we’ll call him “Gilligan”) writes a routine to drop_anchor( ) and places it in the file drop_anchor.pl.[7] Then, the Skipper can use the code with:
    do "drop_anchor.pl";
    die $@ if $@;
    ...
    drop_anchor( ) if at_dock( ) or in_port( );
Thus, the code is brought into separate files to permit easy maintenance and interprogrammer cooperation.
While the code brought in from a .pl file can have direct executable statements, it’s much more common to simply define subroutines that can be called by the code containing the do.

Going back to that drop_anchor.pl library for a second, imagine what would happen if the Skipper wrote a program that needed to “drop anchor” as well as navigate:
    do "drop_anchor.pl";
    die $@ if $@;
    do "navigate.pl";
    die $@ if $@;
    ...
    turn_towards_heading(90);
    ...
    drop_anchor( ) if at_dock( );
That works fine and dandy. The subroutines defined in both libraries are available to this program.
Suppose navigate.pl itself also pulls in drop_anchor.pl for some common navigation task. You’ll end up reading the file once directly, and then again while processing the navigation package. This will needlessly redefine drop_anchor( ). Worse than that, if warnings are enabled,[8] you’ll get a warning from Perl that you’ve redefined the subroutine, even though it’s the same definition.

What you need is a mechanism that tracks which files have been brought in and brings each one in only once. Perl has such an operation, called require. Change the previous code to simply:
require "drop_anchor.pl"; require "navigate.pl";
The require operator keeps track of the files it has read.[9] Once a file has been processed successfully, any further require operations on that same file are simply ignored. This means that even if navigate.pl contains require "drop_anchor.pl", the drop_anchor.pl file is brought in exactly once, and you’ll get no annoying warnings about duplicate subroutine definitions (see Figure 2-2). Most importantly, you’ll also save time by not processing the file more than once.
Figure 2-2. Once the drop_anchor.pl file is brought in, another attempt to require the file is harmless
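A small experiment, sketched here with a throwaway library file in a temporary directory, shows require skipping the second request. The $load_count counter and the file contents are invented for the demonstration; only the bookkeeping in %INC is Perl's own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

our $load_count = 0;   # package variable the library can bump

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "drop_anchor.pl");

open my $out, '>', $file or die "$file: $!";
print {$out} <<'END_LIB';
$main::load_count++;                      # count how often the file runs
sub drop_anchor { return "anchor down" }
1;
END_LIB
close $out or die "$file: $!";

require $file;   # first request: the file is compiled and run
require $file;   # second request: already recorded in %INC, so skipped

print "Loaded $load_count time(s)\n";
print "Recorded in \%INC: ", (exists $INC{$file} ? "yes" : "no"), "\n";
```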
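The require operator also has two additional features:

Any syntax error in the required file causes the program to die, so the many die $@ if $@ statements are unnecessary.

The last expression evaluated in the file must return a true value.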
Because of the second point, most files evaluated for require have a cryptic 1; as their last line of code. This ensures that the last evaluated expression is in fact true. Try to carry on this tradition as well.

Originally, the mandatory true value was intended as a way for an included file to signal to the invoker that the code was processed successfully and that no error condition existed. However, nearly everyone has adopted the die if ... strategy instead, deeming the “last expression evaluated is false” strategy a mere historic annoyance.
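The effect of the trailing 1; is easy to observe. In this sketch, write_library is a hypothetical helper and good.pl and bad.pl are invented files that differ only in their final expression; wrapping require in an eval block traps the failure so that the program can report it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

my $dir = tempdir(CLEANUP => 1);

# Hypothetical helper for this demo: write a small library file.
sub write_library {
    my ($name, $text) = @_;
    my $path = File::Spec->catfile($dir, $name);
    open my $out, '>', $path or die "$path: $!";
    print {$out} $text;
    close $out or die "$path: $!";
    return $path;
}

my $good = write_library("good.pl", "sub ahoy { 'ahoy' }\n1;\n");
my $bad  = write_library("bad.pl",  "sub avast { 'avast' }\n0;\n");

# eval BLOCK traps the die that require raises on failure.
my %loaded;
for my $path ($good, $bad) {
    my $ok = eval { require $path; 1 };
    $loaded{$path} = $ok ? 1 : 0;
    my $message = $ok ? "loaded: $path\n" : "failed: $@";
    print $message;
}
```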
So far, the examples have glossed over the directory structure of where the main code and the included files (either with do or require) are located. That’s because it “just works” for the simplest case, in which you have a program and its libraries in the same directory, and you run the program from that directory.
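Things get a bit more complicated when the libraries aren’t located in the current directory. In fact, Perl searches for libraries along a library search path (similar to what the shell does with the PATH environment variable). The current directory (represented in Unix by a single dot) is an element of the search path, so as long as your libraries are in your current working directory, everything is fine. (Perl 5.26 and later no longer put the current directory in the default search path; on those versions, name the file with an explicit path such as ./navigation.pl, or add the directory to the search path as described below.)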
The search path is given in the special @INC array. By default, the array contains the current directory and a half-dozen directories built in to the perl binary during the compilation of perl itself. You can see what these directories are by typing perl -V at the command line and noting the last dozen lines of the output. Also at the command line, you can execute the following to get just the @INC directories:[10]
perl -le 'print for @INC'
Except for . in that list, you probably won’t be able to write to any of the other directories, unless you’re the person responsible for maintaining Perl on your machine, in which case you should be able to write to all of them. The remaining directories are where Perl searches for system-wide libraries and modules, as you’ll see later.
Although you may not be able to alter the content of the directories named by @INC, you can alter @INC itself before the require, to bring in libraries from one or more directories of your choosing. The @INC array is an ordinary array, so have the Skipper add a directory below his home directory to the mix:
unshift @INC, "/home/skipper/perl-lib";
Now, in addition to searching the standard directories and the current directory, Perl searches the Skipper’s personal Perl library. In fact, Perl searches in that directory first, since it is the first one in @INC. By using unshift rather than push, any conflict in names between the Skipper’s private files and the system-installed files is resolved with the Skipper’s file taking precedence.
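Here is a minimal sketch of the same idea that runs anywhere: a temporary directory stands in for /home/skipper/perl-lib, and skipper_demo_lib.pl and weigh_anchor are names invented for the example. Because the file is requested by its bare name, Perl has to find it through @INC.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# A temporary directory stands in for /home/skipper/perl-lib.
my $lib_dir = tempdir(CLEANUP => 1);
my $file    = File::Spec->catfile($lib_dir, "skipper_demo_lib.pl");

open my $out, '>', $file or die "$file: $!";
print {$out} "sub weigh_anchor { return 'anchor aweigh' }\n1;\n";
close $out or die "$file: $!";

# Put the private directory at the front so it is searched first.
unshift @INC, $lib_dir;

# A bare filename triggers the search through @INC.
require "skipper_demo_lib.pl";

print "Result: ", weigh_anchor(), "\n";
print "Found at: $INC{'skipper_demo_lib.pl'}\n";
```

The unshift has to execute before the require does, which is why the two statements appear in that order.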
The Skipper must edit each program that uses the private libraries to include this line. If that seems like too much editing, the Skipper can instead set the PERL5LIB environment variable to the directory name. For example, in the C shell, it’d be:
setenv PERL5LIB /home/skipper/perl-lib
In Bourne-style shells, it’d be something like:
PERL5LIB=/home/skipper/perl-lib; export PERL5LIB
The advantage of using PERL5LIB is that the Skipper can set it once and forget it. The disadvantage comes when someone else (like Gilligan) comes along to execute the program. Unless Gilligan has also added the same PERL5LIB environment variable, the program will fail! Thus, while PERL5LIB is interesting for personal use, do not rely on it for programs you intend to share with others. (And don’t make your entire team of programmers add a common PERL5LIB variable. That’s just wrong.)

The PERL5LIB variable can include multiple directories, separated by colons. Any specified directory is inserted at the beginning of @INC.

While a system administrator might add a setting of PERL5LIB to a system-wide startup script, this process is generally frowned upon. The purpose of PERL5LIB is to enable nonadministrators to extend Perl to recognize additional directories. If a system administrator wants additional directories, he merely needs to recompile and reinstall Perl, answering the appropriate questions during the configuration phase.
If Gilligan recognizes that one of the Skipper’s programs is missing the proper directive, Gilligan can either add the proper PERL5LIB variable or invoke perl directly with one or more -I options. For example, to invoke the Skipper’s get_us_home program, the command line might be something like:
perl -I/home/skipper/perl-lib /home/skipper/bin/get_us_home
Obviously, it’s easier for Gilligan if the program itself defines the extra libraries. But sometimes just adding a -I fixes things right up.[11]
This works even if Gilligan can’t edit the Skipper’s program. He still has to be able to read it, of course, but Gilligan can use this technique to try a new version of his library with the Skipper’s program, for example.
Suppose that the Skipper has added all his cool and useful routines to navigation.pl and that Gilligan has incorporated the library into his own navigation package head_towards_island:
    #!/usr/bin/perl
    require 'navigation.pl';
    sub turn_toward_port {
      turn_towards_heading(compute_heading_to_island( ));
    }
    sub compute_heading_to_island {
      .. code here ..
    }
    .. more program here ..
Gilligan then has his program debugged (perhaps with the aid of a smart person whom we’ll call “the Professor”), and everything works well.
However, now the Skipper decides to modify his navigation.pl library, adding a routine called turn_toward_port that makes a 45-degree turn toward the left (known as “port” in nautical jargon).
Gilligan’s program will fail in a catastrophic way as soon as he tries to head to port: he’ll start steering the ship in circles! The problem is that the Perl compiler first compiles turn_toward_port from Gilligan’s main program; then, when the require is evaluated at runtime, turn_toward_port is redefined as the Skipper’s version. Sure, if Gilligan has warnings enabled, he’ll notice something is wrong, but why should he have to count on that?
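The order of events can be reproduced in a few lines. This is only a sketch: the bodies of both subroutines are invented stand-ins that return a description instead of steering anything, and the library is written to a temporary directory so that the example is self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Gilligan's definition, compiled along with the main program.
sub turn_toward_port { return "heading for the island port" }

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "navigation.pl");

# The Skipper's library reuses the same name (contents invented here).
open my $out, '>', $file or die "$file: $!";
print {$out} "sub turn_toward_port { return 'turning 45 degrees left' }\n1;\n";
close $out or die "$file: $!";

my $before = turn_toward_port();
require $file;                      # evaluated later, at runtime
my $after  = turn_toward_port();    # now the Skipper's version

print "Before require: $before\n";
print "After require:  $after\n";
```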
The problem is that Gilligan defined turn_toward_port as meaning “turn toward the port on the island,” while the Skipper defined it as “turn toward the left.” How do you resolve this?

One way is to require that the Skipper put an explicit prefix in front of every name defined in the library, say navigation_. Thus, Gilligan’s program ends up looking like:
    #!/usr/bin/perl
    require 'navigation.pl';
    sub turn_toward_port {
      navigation_turn_toward_heading(compute_heading_to_island( ));
    }
    sub compute_heading_to_island {
      .. code here ..
    }
    .. more program here ..
Clearly, the navigation_turn_toward_heading comes from the navigation.pl file. This is great for Gilligan, but awkward for the Skipper, as his file now becomes:
    sub navigation_turn_toward_heading {
      .. code here ..
    }
    sub navigation_turn_toward_port {
      .. code here ..
    }
    1;
Yes, every scalar, array, hash, filehandle, or subroutine now has to have a navigation_ prefix in front of it to guarantee that the names won’t collide with any potential users of the library. Obviously, for that old sailor, this ain’t gonna float his boat. So, what do you do instead?
If the name prefix of the last example didn’t have to be spelled out on every use, things would work much better. Well, you can improve the situation by using a package:
    package Navigation;
    sub turn_towards_heading {
      .. code here ..
    }
    sub turn_towards_port {
      .. code here ..
    }
    1;
The package declaration at the beginning of this file tells Perl to insert Navigation:: in front of most names within the file. Thus, the code above practically says:
    sub Navigation::turn_towards_heading {
      .. code here ..
    }
    sub Navigation::turn_towards_port {
      .. code here ..
    }
    1;
Now when Gilligan uses this file, he simply adds Navigation:: to the subroutines defined in the library, and leaves the Navigation:: prefix off for subroutines he defines on his own:
    #!/usr/bin/perl
    require 'navigation.pl';
    sub turn_toward_port {
      Navigation::turn_towards_heading(compute_heading_to_island( ));
    }
    sub compute_heading_to_island {
      .. code here ..
    }
    .. more program here ..
Package names are like variable names: they consist of alphanumerics and underscores, as long as you don’t begin with a digit. Also, for reasons explained in the perlmodlib documentation, a package name should begin with a capital letter and not overlap an existing CPAN or core module name. Package names can also consist of multiple names separated by double colons, such as Minnow::Navigation and Minnow::Food::Storage.
Nearly every scalar, array, hash, subroutine, and filehandle name[12] is actually prefixed by the current package, unless the name already contains one or more double-colon markers.
So, in navigation.pl, you can use variables such as:

    package Navigation;
    @homeport = (21.1, -157.525);
    sub turn_toward_port {
      .. code ..
    }
(Trivia note: 21.1 degrees north, 157.525 degrees west is the location of the real-life marina where the opening shot of a famous television series was filmed.)
You can refer to the @homeport variable in the main code as:
@destination = @Navigation::homeport;
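Putting the pieces together in a single file gives a runnable sketch of how the prefix keeps the two turn_toward_port routines apart. The subroutine bodies are invented placeholders, and @homeport is declared with our only so that the example passes use strict; the library code in the text runs without strict.

```perl
use strict;
use warnings;

package Navigation;

# "our" declares the package variable so that it passes "use strict";
# the library code in the text relies on strict being off.
our @homeport = (21.1, -157.525);

sub turn_toward_port { return "Navigation's turn_toward_port" }

package main;

sub turn_toward_port { return "main's turn_toward_port" }

my @destination = @Navigation::homeport;

print "Destination: @destination\n";
print "Unqualified: ", turn_toward_port(), "\n";
print "Qualified:   ", Navigation::turn_toward_port(), "\n";
```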
If every name has a package name inserted in front of it, what about names in the main program? Yes, they are also in a package, called main. It’s as if package main; were at the beginning of each file. Thus, to keep Gilligan from having to say Navigation::turn_towards_heading, the navigation.pl file can say:
sub main::turn_towards_heading { .. code here .. }
Now the subroutine is defined in the main package, not the Navigation package. This isn’t an optimal solution (you’ll see better solutions in Chapter 12), but at least there’s nothing sacred or terribly unique about main compared to any other package.
All files start as if you had said package main;. Any package directive remains in effect until the next package directive, unless that package directive is inside a curly-braced scope. In that case, the prior package is remembered and restored when the scope ends. Here’s an example:
    package Navigation;
    {  # start scope block
      package main;  # now in package main
      sub turn_towards_heading {  # main::turn_towards_heading
        .. code here ..
      }
    }  # end scope block
    # back to package Navigation
    sub turn_towards_port { # Navigation::turn_towards_port
      .. code here ..
    }
The current package is lexically scoped, similar to the scope of my variables, narrowed to the innermost-enclosing brace pair or file in which the package is introduced.
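One way to watch the current package change is the built-in __PACKAGE__ token, which the text has not introduced but which simply yields the name of the package in effect where it appears. The sketch below records it before, inside, and after a scope block; @seen is a name invented for the demonstration.

```perl
use strict;
use warnings;

package Navigation;

my @seen = (__PACKAGE__);           # "Navigation"

{                                   # start scope block
    package main;
    push @seen, __PACKAGE__;        # "main"
    sub turn_towards_heading { return __PACKAGE__ }
}                                   # end scope block

push @seen, __PACKAGE__;            # back to "Navigation"
sub turn_towards_port { return __PACKAGE__ }

print "Packages in order: @seen\n";
print "Heading sub lives in: ", main::turn_towards_heading(), "\n";
print "Port sub lives in:    ", Navigation::turn_towards_port(), "\n";
```

Note that @seen is a lexical, so it stays visible across the package changes within the file.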
Most libraries have only one package declaration at the top of the file. Most programs leave the package at the default main package. However, it’s nice to know that you can temporarily have a different current package.[13]
A lexical variable (a variable introduced with my) isn’t prefixed by the current package because package variables are always global: you can always reference a package variable if you know its full name. A lexical variable is usually temporary and accessible for only a portion of the program. If a lexical variable is declared, then using that name without a package prefix results in accessing the lexical variable. However, a package prefix ensures that you are accessing a package variable and never a lexical variable.

For example, suppose a subroutine within navigation.pl declares a lexical @homeport variable. Any mention of @homeport will then be the newly introduced lexical variable, but a fully qualified mention of @Navigation::homeport accesses the package variable instead.
    package Navigation;
    @homeport = (21.1, -157.525);
    sub get_me_home {
      my @homeport;
      .. @homeport .. # refers to the lexical variable
      .. @Navigation::homeport .. # refers to the package variable
    }
    .. @homeport .. # refers to the package variable
Obviously, this can lead to confusing code, so you shouldn’t introduce such a duplication needlessly. The results are completely predictable, though.
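A filled-in version of that outline makes the predictability concrete. This is a sketch under a few assumptions: the lexical array is given placeholder contents so the two variables can be told apart, get_me_home returns strings instead of navigating, and the package variable is declared with our so that the code runs under use strict.

```perl
use strict;
use warnings;

package Navigation;

our @homeport = (21.1, -157.525);   # package variable

sub get_me_home {
    my @homeport = ("lexical", "copy");   # hides the package variable here
    return ("@homeport", "@Navigation::homeport");
}

my ($lexical_view, $package_view) = get_me_home();
print "Unqualified inside sub: $lexical_view\n";
print "Qualified inside sub:   $package_view\n";
print "Unqualified outside:    @homeport\n";
```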
The answers for all exercises can be found in Section A.1.
The Oogaboogoo natives on the island have unusual names for the days and months. Here is some simple but not very well-written code from Gilligan. Fix it up, add a conversion function for the month names, and make the whole thing into a library. For extra credit, add suitable error checking and consider what should be in the documentation.
    @day = qw(ark dip wap sen pop sep kir);
    sub number_to_day_name { my $num = shift @_; $day[$num]; }
    @month = qw(diz pod bod rod sip wax lin sen kun fiz nap dep);
Make a program that uses your library and the following code to print out a message, such as Today is dip, sen 11, 2008, meaning that today is a Monday in August. (Hint: The year and month numbers returned by localtime may not be what you’d expect, so you need to check the documentation.)
my($sec, $min, $hour, $mday, $mon, $year, $wday) = localtime;
[4] Oddly, the variable $more_code is also visible to the evaluated code, not that it is of any use to change that variable during the eval.
[5] Except in regard to @INC, %INC, and missing file handling, which you’ll see later.
[6] In later chapters, you’ll see how to set up tests to be used while maintaining reused code.
[7] The .pl here stands for “perl library,” the common extension used for included Perl code. It is unfortunate that some non-Unix Perl vendors also use the same extension for the top-level Perl programs, because you then can’t tell whether something is a program or a library. If you have a choice, the experts recommend ending your program filenames with .plx (“Perl executable”), or better yet, with no extension at all unless your system requires one.
[8] You are using warnings, right? You can enable them with either -w or use warnings;.
[9] In the %INC hash, as described in the entry for require in the perlfunc documentation.
[10] On a Windows machine, use double quotes instead of single quotes on the command line.
[11] Extending @INC with either PERL5LIB or -I also automatically adds the version- and architecture-specific subdirectories of the specified directories. Adding these directories automatically simplifies the task of installing Perl modules that include architecture- or version-sensitive components, such as compiled C code.
[12] Except lexicals, as you’ll see in a moment.
[13] Some names are always in package main regardless of the current package: ARGV, ARGVOUT, ENV, INC, SIG, STDERR, STDIN, and STDOUT. You can always refer to @INC and be assured of getting @main::INC. The punctuation mark variables such as $_, $2, and $! are either all lexicals or forced into package main, so when you write $. you never get $Navigation::. by mistake.