Chapter 2. Building Larger Programs

This chapter looks at how to break up a program into pieces and includes some of the concerns that arise when you put those pieces back together again, or when many people work together on the same program.

The Cure for the Common Code

Let’s say a famous sailor (we’ll call him “the Skipper”) uses Perl to help navigate his ocean-going vessel (call it “the Minnow”). The Skipper writes many Perl programs to provide navigation for all the common ports of call for the Minnow. He finds himself cutting and pasting a very common routine into each program:

sub turn_towards_heading {
  my $new_heading = shift;
  my $current_heading = current_heading(  );
  print "Current heading is ", $current_heading, ".\n";
  print "Come about to $new_heading ";
  my $direction = "right";
  my $turn = ($new_heading - $current_heading) % 360;
  if ($turn > 180) { # long way around
    $turn = 360 - $turn;
    $direction = "left";
  }
  print "by turning $direction $turn degrees.\n";
}

This routine gives the shortest turn to make from the current heading (returned by the subroutine current_heading( )) to a new heading (given as the first parameter to the subroutine).
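To see why the modulus keeps working when the subtraction goes negative, here is a minimal sketch that returns the turn instead of printing it. The shortest_turn name and its two-argument interface are assumptions made for this illustration, not part of the Skipper’s programs. It relies on Perl’s % operator returning a non-negative result whenever the right operand is positive.

```perl
use strict;
use warnings;

# Hypothetical helper: takes both headings and returns (direction, degrees).
sub shortest_turn {
  my ($current_heading, $new_heading) = @_;
  my $turn = ($new_heading - $current_heading) % 360;  # always 0..359
  return ("right", $turn) if $turn <= 180;
  return ("left", 360 - $turn);                        # long way around
}

for my $case ([350, 10], [10, 350], [234, 234]) {
  my ($direction, $turn) = shortest_turn(@$case);
  print "$case->[0] to $case->[1]: turn $direction $turn degrees\n";
}
```

Crossing north from 350 to 10 gives (10 - 350) % 360, which is 20 rather than -340, so the routine reports a 20-degree turn to the right.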

The first line of this subroutine might have read instead:

my ($new_heading) = @_;

This is mostly a style call: in both cases, the first parameter ends up in $new_heading. However, in later chapters, you’ll see that removing the items from @_ as they are identified does have some advantages. So, this book sticks (mostly) with the “shifting” style of argument parsing. Now back to the matter at hand...
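The difference between the two styles shows up in what is left in @_ afterward. The following sketch uses two hypothetical subroutines, with_shift and with_list, that do nothing except report how many arguments remain:

```perl
use strict;
use warnings;

# Hypothetical subroutines for comparison only.
sub with_shift {
  my $new_heading = shift;   # removes the first argument from @_
  return scalar @_;          # number of arguments still unclaimed
}

sub with_list {
  my ($new_heading) = @_;    # copies the first argument, leaves @_ alone
  return scalar @_;
}

print "after shift: ", with_shift(90, "extra"), " left in \@_\n";
print "after list assignment: ", with_list(90, "extra"), " left in \@_\n";
```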

Suppose that after having written a dozen programs using this routine, the Skipper realizes that the output is excessively chatty when he’s already taken the time to steer the proper course (or perhaps simply started drifting in the proper direction). After all, if the current heading is 234 degrees and he needs to turn to 234 degrees, you see:

Current heading is 234.
Come about to 234 by turning right 0 degrees.

How annoying! The Skipper decides to fix this problem by checking for a zero turn value:

sub turn_towards_heading {
  my $new_heading = shift;
  my $current_heading = current_heading(  );
  print "Current heading is ", $current_heading, ".\n";
  my $direction = "right";
  my $turn = ($new_heading - $current_heading) % 360;
  unless ($turn) {
    print "On course (good job!).\n";
    return;
  }
  print "Come about to $new_heading ";
  if ($turn > 180) { # long way around
    $turn = 360 - $turn;
    $direction = "left";
  }
  print "by turning $direction $turn degrees.\n";
}

Great. The new subroutine works nicely in the current navigation program. However, because it had previously been cut and pasted into all the other navigation programs, those programs will still annoy the Skipper with extraneous turning messages.

You need a way to write the code in one place and then share it among many programs. And like most things in Perl, there’s more than one way to do it.

Inserting Code with eval

The Skipper can save disk space (and brainspace) by bringing the definition for turn_towards_heading out into a separate file. For example, suppose the Skipper figures out a half-dozen common subroutines related to navigating the Minnow that he seems to use in most or all of the programs he’s writing for the task. He can put them in a separate file called navigation.pl, which consists only of the needed subroutines.

But now, how can you tell Perl to pull in that program snippet from another file? You could do it the hard way:

sub load_common_subroutines {
  open MORE_CODE, "navigation.pl" or die "navigation.pl: $!";
  local $/; # enable slurp mode, restored when this subroutine returns
  my $more_code = <MORE_CODE>;
  close MORE_CODE;
  eval $more_code;
  die $@ if $@;
}

The code from navigation.pl is read into the $more_code variable. You then use eval to process that text as Perl code. Any lexical variables in $more_code will remain local to the evaluated code.^[4] If there’s a syntax error, the $@ variable is set and causes the subroutine to die with the appropriate error message.
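The scoping behavior can be checked without a separate file. In this sketch, the code string stands in for the contents of navigation.pl; the names $secret and answer are invented for the demonstration.

```perl
use strict;
use warnings;

# Stand-in for the text that would be read from navigation.pl.
my $more_code = q{
  my $secret = 42;              # lexical: exists only inside this eval
  sub answer { return 42 }      # named subroutine: installed in the current package
  1;
};

eval $more_code;
die $@ if $@;

# The subroutine defined by the evaluated text is now callable.
print "answer() returns ", answer(), "\n";

# A later eval cannot see $secret; under use strict it fails to compile.
my $leaked = eval q{ $secret };
print "the lexical did not leak\n" unless defined $leaked;
```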

Now instead of a few dozen lines of common subroutines to place in each file, you simply have one subroutine to insert in each file.

But that’s not very nice, especially if you need to keep doing this kind of task repeatedly. Luckily, there’s (at least) one Perl built-in to help you out.

Using do

The Skipper has placed a few common navigation subroutines into navigation.pl. If the Skipper merely inserts:

do "navigation.pl";
die $@ if $@;

into his typical navigation program, it’s almost the same as if the eval code were executed earlier.^[5]

That is, the do operator acts as if the code from navigation.pl were incorporated into the current program, although in its own scope block so that lexicals (my variables) and most directives (such as use strict) from the included file don’t leak into the main program.
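Here is a self-contained sketch of that behavior. So that it can run anywhere, it writes a tiny stand-in for navigation.pl into a temporary directory and loads it by its full path; the file contents, including the fixed heading of 234, are assumptions for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Create a throwaway library file to load.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "navigation.pl");
open my $out, '>', $file or die "$file: $!";
print {$out} <<'END_LIB';
my $private = "stays inside the file";   # lexical: not visible to the caller
sub current_heading { return 234 }       # subroutine: visible to the caller
1;
END_LIB
close $out or die "$file: $!";

my $result = do $file;
die $@ if $@;                                         # compile or runtime error
die "cannot read $file: $!" unless defined $result;   # file missing or unreadable

print "Current heading is ", current_heading(), ".\n";
```

The extra check on the return value covers the case in which the file cannot be read at all, which leaves $@ empty.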

Now the Skipper can safely update and maintain only one copy of the common subroutines, without having to copy and recopy all the fixes and extensions into the many separate navigation programs he is creating and using. See Figure 2-1 for an illustration.

Figure 2-1. The navigation.pl file being used by the other navigation programs

Of course, this requires a bit of discipline because breaking the expected interface of a given subroutine will now break many programs instead of just one.^[6] The Skipper will need to think carefully about how to design and write reusable, modular components. We’ll presume the Skipper has had some experience at that.

Another advantage to placing some of the code into a separate file is that other programmers can reuse the Skipper’s routines and vice versa. For example, suppose the Skipper’s sidekick (we’ll call him “Gilligan”) writes a routine to drop_anchor( ) and places it in the file drop_anchor.pl.^[7]

Then, the Skipper can use the code with:

do "drop_anchor.pl";
die $@ if $@;
...
drop_anchor(  ) if at_dock(  ) or in_port(  );

Thus, the code is brought into separate files to permit easy maintenance and interprogrammer cooperation.

While the code brought in from a .pl file can have direct executable statements, it’s much more common to simply define subroutines that can be called by the code containing the do.

Going back to that drop_anchor.pl library for a second, imagine what would happen if the Skipper wrote a program that needed to “drop anchor” as well as navigate:

do "drop_anchor.pl";
die $@ if $@;
do "navigate.pl";
die $@ if $@;
...
turn_towards_heading(90);
...
drop_anchor(  ) if at_dock(  );

That works fine and dandy. The subroutines defined in both libraries are available to this program.

Using require

Suppose navigate.pl itself also pulls in drop_anchor.pl for some common navigation task. You’ll end up reading the file once directly, and then again while processing navigate.pl. This will needlessly redefine drop_anchor( ). Worse than that, if warnings are enabled,^[8] you’ll get a warning from Perl that you’ve redefined the subroutine, even though it’s the same definition.

What you need is a mechanism that tracks which files have been brought in and brings each one in only once. Perl has such an operation, called require. Change the previous code to simply:

require "drop_anchor.pl";
require "navigate.pl";

The require operator keeps track of the files it has read.^[9] Once a file has been processed successfully, any further require operations on that same file are simply ignored. This means that even if navigate.pl contains require "drop_anchor.pl", the drop_anchor.pl file is brought in exactly once, and you’ll get no annoying warnings about duplicate subroutine definitions (see Figure 2-2). Most importantly, you’ll also save time by not processing the file more than once.
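A small experiment makes the bookkeeping visible. This sketch writes a stand-in drop_anchor.pl whose only job is to count how often it is processed; the $main::load_count variable is an assumption introduced for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Write a throwaway library that counts how many times it is processed.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "drop_anchor.pl");
open my $out, '>', $file or die "$file: $!";
print {$out} <<'END_LIB';
$main::load_count++;
1;
END_LIB
close $out or die "$file: $!";

require $file;   # processed: the count becomes 1
require $file;   # already recorded, so this one is skipped
print "after two requires: $main::load_count\n";

do $file;        # do keeps no record, so the file is processed again
print "after one more do: $main::load_count\n";

print "the file is recorded in \%INC\n" if exists $INC{$file};
```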

Figure 2-2. Once the drop_anchor.pl file is brought in, another attempt to require the file is harmless

The require operator also has two additional features:

Any syntax error in the required file causes the program to die, so the many die $@ if $@ statements are unnecessary.
The last expression evaluated in the file must return a true value.

Because of the second point, most files evaluated for require have a cryptic 1; as their last line of code. This ensures that the last evaluated expression is in fact true. Try to carry on this tradition as well.

Originally, the mandatory true value was intended as a way for an included file to signal to the invoker that the code was processed successfully and that no error condition existed. However, nearly everyone has adopted the die if ... strategy instead, deeming the “last expression evaluated is false” strategy a mere historic annoyance.
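To see what happens when the trailing true value is missing, the following sketch requires a throwaway file that ends in 0; instead. The file name no_true_value.pl is hypothetical, and the eval block is there only so the demonstration can report the failure rather than die from it.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# A library that defines a subroutine but ends with a false value.
my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, "no_true_value.pl");
open my $out, '>', $file or die "$file: $!";
print {$out} "sub drop_anchor { return 'anchor down' }\n0;\n";
close $out or die "$file: $!";

# Trap the failure so the message can be shown.
my $loaded = eval { require $file; 1 };
print "require failed: $@" unless $loaded;
```

Changing the last line of the file to 1; lets the require succeed.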

require and @INC

So far, the examples have glossed over the directory structure of where the main code and the included files (either with do or require) are located. That’s because it “just works” for the simplest case, in which you have a program and its libraries in the same directory, and you run the program from that directory.

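Things get a bit more complicated when the libraries aren’t located in the current directory. In fact, Perl searches for libraries along a library search path (similar to what the shell does with the PATH environment variable). In Perl releases before 5.26, the current directory (represented in Unix by a single dot) is an element of the search path, so as long as your libraries are in your current working directory, everything is fine. (Perl 5.26 and later leave the dot out of the default search path, so with those releases you need to write the filename with an explicit directory, as in require "./navigation.pl", or add the directory to the search path yourself.)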

The search path is given in the special @INC array. By default, the array contains the current directory and a half-dozen directories built in to the perl binary during the compilation of perl itself. You can see what these directories are by typing perl -V at the command line and noting the last dozen lines of the output. Also at the command line, you can execute the following to get just the @INC directories:^[10]

perl -le 'print for @INC'

Except for . in that list, you probably won’t be able to write to any of the other directories, unless you’re the person responsible for maintaining Perl on your machine, in which case you should be able to write to all of them. The remaining directories are where Perl searches for system-wide libraries and modules, as you’ll see later.

Extending @INC

Although you may not be able to alter the content of the directories named by @INC, you can alter @INC itself before the require, to bring in libraries from one or more directories of your choosing. The @INC array is an ordinary array, so have the Skipper add a directory below his home directory to the mix:

unshift @INC, "/home/skipper/perl-lib";

Now, in addition to searching the standard directories and the current directory, Perl searches the Skipper’s personal Perl library. In fact, Perl searches in that directory first, since it is the first one in @INC. Because the Skipper used unshift rather than push, any conflict in names between his private files and the system-installed files is resolved with the Skipper’s file taking precedence.
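The following sketch tries this out with a temporary directory standing in for /home/skipper/perl-lib. The library name minnow_demo.pl and its at_dock subroutine are placeholders created only for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Temporary stand-in for the Skipper's private library directory.
my $private_lib = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($private_lib, "minnow_demo.pl");
open my $out, '>', $file or die "$file: $!";
print {$out} "sub at_dock { return 1 }\n1;\n";
close $out or die "$file: $!";

# Put the private directory at the front of the search path.
unshift @INC, $private_lib;

# A bare filename is now found by searching @INC in order.
require "minnow_demo.pl";
print "found at $INC{'minnow_demo.pl'}\n";
print "at_dock() returns ", at_dock(), "\n";
```

Because require runs at runtime, the unshift only has to come before the require statement that depends on it.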

Extending @INC with PERL5LIB

The Skipper must edit each program that uses the private libraries to include this line. If that seems like too much editing, the Skipper can instead set the PERL5LIB environment variable to the directory name. For example, in the C shell, it’d be:

setenv PERL5LIB /home/skipper/perl-lib

In Bourne-style shells, it’d be something like:

PERL5LIB=/home/skipper/perl-lib; export PERL5LIB

The advantage of using PERL5LIB is that the Skipper can set it once and forget it. The disadvantage comes when someone else (like Gilligan) comes along to execute the program. Unless Gilligan has also added the same PERL5LIB environment variable, the program will fail! Thus, while PERL5LIB is interesting for personal use, do not rely on it for programs you intend to share with others. (And don’t make your entire team of programmers add a common PERL5LIB variable. That’s just wrong.)

The PERL5LIB variable can include multiple directories, separated by colons. Any specified directory is inserted at the beginning of @INC.
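One way to check the effect is to start a second perl with PERL5LIB set and ask it for its @INC. This is a sketch under a few assumptions: a Unix-like system, two temporary directories standing in for real library directories, and no -I options or other startup customization in play.

```perl
use strict;
use warnings;
use Config;
use File::Temp qw(tempdir);

# Two temporary directories stand in for real library directories.
my @extra = (tempdir(CLEANUP => 1), tempdir(CLEANUP => 1));

# Join them with the platform's path separator (a colon on Unix).
local $ENV{PERL5LIB} = join $Config{path_sep}, @extra;

# Run a child perl with that environment and collect its @INC.
open my $child, '-|', $^X, '-le', 'print for @INC'
  or die "cannot run $^X: $!";
chomp(my @child_inc = <$child>);
close $child or die "child perl failed";

# The two directories should appear at the front of the list.
print "$_\n" for @child_inc[0 .. 1];
```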

While a system administrator might add a setting of PERL5LIB to a system-wide startup script, this process is generally frowned upon. The purpose of PERL5LIB is to enable nonadministrators to extend Perl to recognize additional directories. If a system administrator wants additional directories, he merely needs to recompile and reinstall Perl, answering the appropriate questions during the configuration phase.

Extending @INC with -I

If Gilligan recognizes that one of the Skipper’s programs is missing the proper directive, Gilligan can either add the proper PERL5LIB variable or invoke perl directly with one or more -I options. For example, to invoke the Skipper’s get_us_home program, the command line might be something like:

perl -I/home/skipper/perl-lib /home/skipper/bin/get_us_home

Obviously, it’s easier for Gilligan if the program itself defines the extra libraries. But sometimes just adding a -I fixes things right up.^[11]

This works even if Gilligan can’t edit the Skipper’s program. He still has to be able to read it, of course, but Gilligan can use this technique to try a new version of his library with the Skipper’s program, for example.

The Problem of Namespace Collisions

Suppose that the Skipper has added all his cool and useful routines to navigation.pl and that Gilligan has incorporated the library into his own navigation package head_towards_island:

#!/usr/bin/perl

require 'navigation.pl';

sub turn_toward_port {
  turn_toward_heading(compute_heading_to_island(  ));
}

sub compute_heading_to_island {
  .. code here ..
}

.. more program here ..

Gilligan then has his program debugged (perhaps with the aid of a smart person whom we’ll call “the Professor”), and everything works well.

However, now the Skipper decides to modify his navigation.pl library, adding a routine called turn_toward_port that makes a 45-degree turn toward the left (known as “port” in nautical jargon).

Gilligan’s program will fail in a catastrophic way as soon as he tries to head to port: he’ll start steering the ship in circles! The problem is that the Perl compiler first compiles turn_toward_port from Gilligan’s main program; then, when the require is evaluated at runtime, turn_toward_port is redefined with the Skipper’s definition. Sure, if Gilligan has warnings enabled, he’ll notice something is wrong, but why should he have to count on that?
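The order of events can be reproduced in a single file. In this sketch, a string eval stands in for the require, since both compile their code at runtime; the returned strings are invented so that you can see which definition wins.

```perl
use strict;
use warnings;

# Gilligan's definition, compiled along with the rest of the main program.
sub turn_toward_port { return "heading for the port on the island" }

# Stand-in for: require 'navigation.pl';
# The string is compiled at runtime, after the definition above.
{
  no warnings 'redefine';   # silenced only to keep the demonstration quiet
  eval q{
    sub turn_toward_port { return "turning 45 degrees to the left" }
    1;
  } or die $@;
}

# The later definition has replaced Gilligan's.
print turn_toward_port(), "\n";
```

With the no warnings line removed, perl reports that the subroutine was redefined, which is the only hint Gilligan would get.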

The problem is that Gilligan defined turn_toward_port as meaning “turn toward the port on the island,” while the Skipper defined it as “turn toward the left.” How do you resolve this?

One way is to require that the Skipper put an explicit prefix in front of every name defined in the library, say navigation_. Thus, Gilligan’s program ends up looking like:

#!/usr/bin/perl

require 'navigation.pl';

sub turn_toward_port {
  navigation_turn_toward_heading(compute_heading_to_island(  ));
}

sub compute_heading_to_island {
  .. code here ..
}

.. more program here ..

Clearly, the navigation_turn_toward_heading comes from the navigation.pl file. This is great for Gilligan, but awkward for the Skipper, as his file now becomes:

sub navigation_turn_toward_heading {
  .. code here ..
}

sub navigation_turn_toward_port {
  .. code here ..
}

1;

Yes, every scalar, array, hash, filehandle, or subroutine now has to have a navigation_ prefix in front of it to guarantee that the names won’t collide with any potential users of the library. Obviously, for that old sailor, this ain’t gonna float his boat. So, what do you do instead?

Packages as Namespace Separators

If the name prefix of the last example didn’t have to be spelled out on every use, things would work much better. Well, you can improve the situation by using a package:

package Navigation;

sub turn_towards_heading {
  .. code here ..
}

sub turn_towards_port {
  .. code here ..
}

1;

The package declaration at the beginning of this file tells Perl to insert Navigation:: in front of most names within the file. Thus, the code above practically says:

sub Navigation::turn_towards_heading {
  .. code here ..
}

sub Navigation::turn_towards_port {
  .. code here ..
}

1;

Now when Gilligan uses this file, he simply adds Navigation:: to the subroutines defined in the library, and leaves the Navigation:: prefix off for subroutines he defines on his own:

#!/usr/bin/perl

require 'navigation.pl';

sub turn_toward_port {
  Navigation::turn_towards_heading(compute_heading_to_island(  ));
}

sub compute_heading_to_island {
  .. code here ..
}

.. more program here ..
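A compact way to convince yourself that the two names no longer collide is to put both packages in one file. This is a sketch for experimentation only; the returned strings are made up, and a real library would live in its own file.

```perl
use strict;
use warnings;

package Navigation;

# The Skipper's meaning: port is the left side of the ship.
sub turn_toward_port { return "turn 45 degrees to the left" }

package main;

# Gilligan's meaning: the port on the island.
sub turn_toward_port { return "head for the port on the island" }

print Navigation::turn_toward_port(), "\n";  # the Skipper's routine
print turn_toward_port(), "\n";              # Gilligan's routine
print main::turn_toward_port(), "\n";        # Gilligan's again, fully qualified
```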

Package names are like variable names: they consist of alphanumerics and underscores, as long as you don’t begin with a digit. Also, for reasons explained in the perlmodlib documentation, a package name should begin with a capital letter and not overlap an existing CPAN or core module name. Package names can also consist of multiple names separated by double colons, such as Minnow::Navigation and Minnow::Food::Storage.

Nearly every scalar, array, hash, subroutine, and filehandle name^[12] is actually prefixed by the current package, unless the name already contains one or more double-colon markers.

So, in navigation.pl, you can use variables such as:

package Navigation;
@homeport = (21.1, -157.525);

sub turn_toward_port {
  .. code ..
}

(Trivia note: 21.1 degrees north, 157.525 degrees west is the location of the real-life marina where the opening shot of a famous television series was filmed.)

You can refer to the @homeport variable in the main code as:

@destination = @Navigation::homeport;
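Here is the same idea as a single runnable file. The our and my declarations are additions needed only because the sketch enables use strict, which the listings above do not.

```perl
use strict;
use warnings;

package Navigation;

# our declares the package variable @Navigation::homeport under use strict.
our @homeport = (21.1, -157.525);

package main;

# From another package, the full name reaches the same variable.
my @destination = @Navigation::homeport;
print "Destination: @destination\n";
```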

If every name has a package name inserted in front of it, what about names in the main program? Yes, they are also in a package, called main. It’s as if package main; were at the beginning of each file. Thus, to keep Gilligan from having to say Navigation::turn_towards_heading, the navigation.pl file can say:

sub main::turn_towards_heading {
  .. code here ..
}

Now the subroutine is defined in the main package, not the Navigation package. This isn’t an optimal solution (you’ll see better solutions in Chapter 12), but at least there’s nothing sacred or terribly unique about main compared to any other package.

Scope of a Package Directive

All files start as if you had said package main;. Any package directive remains in effect until the next package directive, unless that package directive is inside a curly-braced scope. In that case, the prior package is remembered and restored when the scope ends. Here’s an example:

package Navigation;

{  # start scope block
  package main;  # now in package main

  sub turn_towards_heading {  # main::turn_towards_heading
    .. code here ..
  }

}  # end scope block

# back to package Navigation

sub turn_towards_port { # Navigation::turn_towards_port
  .. code here ..
}

The current package is lexically scoped, similar to the scope of my variables, narrowed to the innermost-enclosing brace pair or file in which the package is introduced.
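You can watch the current package change and change back by using the special __PACKAGE__ token, which Perl replaces with the name of the package in effect at that point. In this sketch, the @seen array is an invention used to record the names.

```perl
use strict;
use warnings;

package Navigation;

my @seen;                    # a lexical, so it is visible in the block below
push @seen, __PACKAGE__;     # Navigation

{
  package main;              # in effect only until the closing brace
  push @seen, __PACKAGE__;   # main
}

push @seen, __PACKAGE__;     # Navigation again

print "$_\n" for @seen;
```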

Most libraries have only one package declaration at the top of the file. Most programs leave the package at the default main package. However, it’s nice to know that you can temporarily have a different current package.^[13]

Packages and Lexicals

A lexical variable (a variable introduced with my) isn’t prefixed by the current package because package variables are always global: you can always reference a package variable if you know its full name. A lexical variable is usually temporary and accessible for only a portion of the program. If a lexical variable is declared, then using that name without a package prefix results in accessing the lexical variable. However, a package prefix ensures that you are accessing a package variable and never a lexical variable.

For example, suppose a subroutine within navigation.pl declares a lexical @homeport variable. Any mention of @homeport will then be the newly introduced lexical variable, but a fully qualified mention of @Navigation::homeport accesses the package variable instead.

package Navigation;
@homeport = (21.1, -157.525);

sub get_me_home {
  my @homeport;

  .. @homeport .. # refers to the lexical variable
  .. @Navigation::homeport .. # refers to the package variable

}

.. @homeport .. # refers to the package variable

Obviously, this can lead to confusing code, so you shouldn’t introduce such a duplication needlessly. The results are completely predictable, though.
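The outline above can be turned into a runnable sketch. The (0, 0) values given to the lexical array are placeholders chosen so that the two variables are easy to tell apart, and the our declaration is there only to satisfy use strict.

```perl
use strict;
use warnings;

package Navigation;

our @homeport = (21.1, -157.525);    # package variable

sub get_me_home {
  my @homeport = (0, 0);             # lexical with the same name (placeholder values)
  return ("@homeport",               # the unqualified name finds the lexical
          "@Navigation::homeport");  # the full name finds the package variable
}

package main;

my ($lexical, $package) = Navigation::get_me_home();
print "lexical: $lexical\n";
print "package: $package\n";
```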

Exercises

The answers for all exercises can be found in Section A.1.

Exercise 1 [30 min]

The Oogaboogoo natives on the island have unusual names for the days and months. Here is some simple but not very well-written code from Gilligan. Fix it up, add a conversion function for the month names, and make the whole thing into a library. For extra credit, add suitable error checking and consider what should be in the documentation.

@day = qw(ark dip wap sen pop sep kir);
sub number_to_day_name { my $num = shift @_; $day[$num]; }
@month = qw(diz pod bod rod sip wax lin sen kun fiz nap dep);

Exercise 2 [10 min]

Make a program that uses your library and the following code to print out a message, such as Today is dip, sen 11, 2008, meaning that today is a Monday in August. (Hint: The year and month numbers returned by localtime may not be what you’d expect, so you need to check the documentation.)

my($sec, $min, $hour, $mday, $mon, $year, $wday) = localtime;

^[4] Oddly, the variable $more_code is also visible to the evaluated code, not that it is of any use to change that variable during the eval.

^[5]Except in regard to @INC, %INC, and missing file handling, which you’ll see later.

^[6]In later chapters, you’ll see how to set up tests to be used while maintaining reused code.

^[7]The .pl here stands for “perl library,” the common extension used for included Perl code. It is unfortunate that some non-Unix Perl vendors also use the same extension for the top-level Perl programs, because you then can’t tell whether something is a program or a library. If you have a choice, the experts recommend ending your program filenames with .plx (“Perl executable”), or better yet, with no extension at all unless your system requires one.

^[8]You are using warnings, right? You can enable them with either -w or use warnings;.

^[9]In the %INC hash, as described in the entry for require in the perlfunc documentation.

^[10]On a Windows machine, use double quotes instead of single quotes on the command line.

^[11]Extending @INC with either PERL5LIB or -I also automatically adds the version- and architecture-specific subdirectories of the specified directories. Adding these directories automatically simplifies the task of installing Perl modules that include architecture- or version-sensitive components, such as compiled C code.

^[12]Except lexicals, as you’ll see in a moment.

^[13]Some names are always in package main regardless of the current package: ARGV, ARGVOUT, ENV, INC, SIG, STDERR, STDIN, and STDOUT. You can always refer to @INC and be assured of getting @main::INC. The punctuation mark variables such as $_, $2, and $! are either all lexicals or forced into package main, so when you write $. you never get $Navigation::. by mistake.