Chapter 17. Modules as Programs

Perl has excellent tools for creating, testing, and distributing modules. Perl’s also good for writing standalone programs that don’t need anything else to be useful, but the tools for standalone programs aren’t as good as those for modules, when they exist at all. I want my programs to use the module development tools and be testable in the same way as modules. To do this, I restructure my programs to turn them into modulinos.

The main Thing

Other languages aren’t as DWIM (Do What I Mean) as Perl, and they make us create a top-level subroutine that serves as the starting point for the application. In C or Java, I have to name this subroutine main:

/* hello_world.c */

#include <stdio.h>

int main ( void ) {
    printf( "Hello C World!\n" );

    return 0;
    }

Perl, in its desire to be helpful, already knows this and does it for me. My entire program is the main routine, which is how Perl ends up with the default package main. When I run my Perl program, Perl starts to compile the code it contains as if I had wrapped my main subroutine around the entire file.

In a module most of the code is in methods or subroutines, so, unlike a program, most of it doesn’t immediately execute. I have to call a subroutine to make something happen. Try that with your favorite module; run it from the command line. In most cases, you won’t see anything happen. I can use perldoc’s -l switch to locate the actual module file:

% perldoc -l Astro::Sunrise
/perls/perl-5.18.0/lib/site_perl/5.18.0/Astro/Sunrise.pm

When I run it, nothing happens:

% perl /perls/perl-5.18.0/lib/site_perl/5.18.0/Astro/Sunrise.pm
%

I can write my program as a module, then decide at runtime how to treat the code. If I run my file as a program it will act just like a program, but if I include it as a module, perhaps in a test suite, then it won’t run the code and it will wait for me to do something. This way I get the benefit of a standalone program while using the development tools for modules.

Backing Up

My first step takes me backwards in Perl evolution. I need to get that explicit main routine back, and then run it only when I decide I want to run it. For simplicity, I’ll do this with a “Just another Perl hacker” (JAPH) program, but develop something more complex later.

Normally, Perl’s version of “Hello World” is simple, but I’ve thrown in package main just for fun, and I use the string “Just another Perl hacker,” instead. I don’t need the package statement for anything other than reminding the next maintainer what the default package is. I’ll use this idea later:

#!/usr/bin/perl
package main;

print "Just another Perl hacker, \n";

Obviously, when I run that program, I get the string as output. I don’t want that in this case, though. I want it to behave more like a module so when I run the file, nothing appears to happen. Perl compiles the code but doesn’t have anything to execute. I wrap the entire program in its own subroutine:

#!/usr/bin/perl
package main;

sub run {
    print "Just another Perl hacker, \n";
    }

The print statement won’t run until I execute the subroutine, and now I have to figure out when to do that. I have to know how to tell the difference between a program and a module.

Who’s Calling?

The caller built-in tells me about the call stack, which lets me know where I am in Perl’s descent into my program. Programs and modules can use caller too; I don’t have to use it in a subroutine. If I use caller in the top level of a file I run as a program, it returns nothing because I’m already at the top level. That’s the root of the entire program. Since I know that for a file I use as a module caller returns something, and that when I call the same file as a program caller returns nothing, I have what I need to decide how to act depending on how I employ the module:

#!/usr/bin/perl
package main;

run() unless caller();

sub run {
    print "Just another Perl hacker, \n";
    }

I’m going to save this program in a file, but now I have to decide how to name it. Its dual nature doesn’t suggest a file extension, but I want to use this file as a module later, so I go along with the module file-naming convention, which adds a .pm to the name. That way, I can use it, and Perl can find it just as it finds other modules. Still, the terms program and module get in the way because it’s really both. It’s not a module in the usual sense, though, and I think of it as a tiny module, so I call it a modulino.

Now that I have my terms straight, I save my modulino as Japh.pm. It’s in my current directory, so I also want to ensure that Perl will look for modules there (i.e., it has “.” in the search path; perl v5.26 and later leave it out by default, so there I add -I. to the command line). I check the behavior of my modulino. First, I use it as a module. From the command line, I can load a module with the -M switch. I use a “null program,” which I specify with the -e switch. When I load it as a module nothing appears to happen:

% perl -MJaph -e 0
%

Perl compiles the module, then goes through the statements it can execute immediately. It executes caller, which returns the package name that loaded my modulino or undef if I ran it directly. Since the package name is true, the unless catches it and doesn’t call run(). I’ll do more with this in a moment.

Now I want to run Japh.pm as a program. This time, caller returns nothing because it is at the top level. This fails the unless check, and so Perl invokes run() and I see the output. The only difference is how I called the file. As a module it does module things, and as a program it does program things. Here I run it as a script and get output:

% perl Japh.pm
Just another Perl hacker,
%
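
To see both behaviors side by side without touching Japh.pm, here’s a minimal sketch that writes a throwaway modulino to a temporary directory and then runs it both ways. The DemoModulino name and the output_of helper are my own inventions for this illustration, and I assume a system where the list form of a piped open works; only core modules are involved:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

# Write a throwaway modulino; DemoModulino is a hypothetical name
my $dir  = tempdir( CLEANUP => 1 );
my $file = catfile( $dir, 'DemoModulino.pm' );

open my $fh, '>', $file or die "Could not write $file: $!";
print {$fh} <<'PERL';
package DemoModulino;

__PACKAGE__->run() unless caller();

sub run { print "Just another Perl hacker, \n" }

1;
PERL
close $fh or die "Could not close $file: $!";

# Run a command without a shell and return everything it prints
sub output_of {
    my @command = @_;
    open my $pipe, '-|', @command or die "Could not run @command: $!";
    my $output = do { local $/; <$pipe> };
    close $pipe;
    return defined $output ? $output : '';
    }

# As a program: caller() is empty at the top level, so run() fires
my $as_program = output_of( $^X, $file );

# As a module: the loader shows up in caller(), so nothing prints
my $as_module  = output_of( $^X, "-I$dir", '-MDemoModulino', '-e', '0' );

print "as program: [$as_program]\n";
print "as module:  [$as_module]\n";
```

The same file produces the message in the first case and an empty string in the second, which is the whole trick.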

Testing the Program

Now that I have the basic framework of a modulino, I can take advantage of its benefits. Since my program doesn’t execute if I include it as a module, I can load it into a test program without it doing anything immediately. I can use all of the Perl testing framework to test programs, too.

If I write my code well—separating things into small subroutines that only do one thing—I can test each subroutine on its own. Since the run subroutine does its work by printing, I use Test::Output to capture standard output and compare the result:

use Test::More tests => 2;
use Test::Output;

use_ok( 'Japh' );

stdout_is( sub{ main::run() }, "Just another Perl hacker, \n" );

This way, I can test each part of my program until I finally put everything together in my run() subroutine, which now looks more like what I would expect from a program in C, where the main loop calls everything in the right order.
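
If Test::Output isn’t installed, the same idea works with core modules only. This is a sketch under a couple of assumptions: Local::Japh stands in for the modulino, captured_stdout is a hypothetical helper, and it only catches a bare print to the currently selected filehandle, not output sent explicitly to STDOUT:

```perl
use strict;
use warnings;
use Test::More;

# A stand-in for the modulino; Local::Japh is a hypothetical package
{
    package Local::Japh;
    sub run { print "Just another Perl hacker, \n" }
}

# Run a code reference and return whatever it printed to the default handle
sub captured_stdout {
    my( $code ) = @_;

    my $buffer = '';
    open my $memory, '>', \$buffer
        or die "Could not open in-memory handle: $!";

    my $previous = select $memory;   # bare print now goes to $memory
    $code->();
    select $previous;                # restore the original default handle
    close $memory;

    return $buffer;
    }

is(
    captured_stdout( \&Local::Japh::run ),
    "Just another Perl hacker, \n",
    'run() prints the message'
    );

done_testing();
```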

Modules as Tests

So far my modulino concept is simple. It checks caller to see if it’s the top-level program or if it was loaded by something else. I can choose any condition and any action, though, to make my single file do something else.

Once installed, the tests for Perl modules don’t stick around. The CPAN client cleans up the test files along with the rest of the distribution files. What if I want to embed my tests in the code and have them execute under certain conditions? I can embed the tests in the module file. The Test::Inline module does this by embedding testing statements in code; I’d rather put everything in methods instead. I’ve wanted this for Perl since I first saw it in Python.

Here’s a small demonstration of the idea. I define some subroutines that know how to tell if they’re running in a certain fashion. To decide whether to run the tests, they check the CPANTEST environment variable. I use those subroutines to figure out which method I’ll execute. I still have the run method as before, but now I also have a test method. Since I have moved the caller checks into subroutines, I’ve introduced another level in the call stack, so I use caller(1) to look back one level:

package Modulino::Test;
use utf8;
use strict;
use warnings;

use v5.10;

our $VERSION = '0.10_01';

sub _running_under_tester {
    !! $ENV{CPANTEST}
    }

sub _running_as_app {
    ! defined scalar caller(1)
    }

sub _loaded_as_module {
    defined scalar caller(1);
    }

my $method = do {
       if( _running_under_tester()    ) { 'test' }
    elsif( _loaded_as_module()        ) { say "Loaded as module"; undef  }
    elsif( _running_as_app()          ) { 'run'  }
    else                                { undef  }
    };

__PACKAGE__->$method(@ARGV) if defined $method;

sub run {
    say "Running as program";
    }
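
The extra stack level is easy to see in isolation. In this sketch (frames_from_helper and wrapper are made-up names), a helper reports which levels of the call stack exist. When I run the file as a program, I’d expect the direct call to find nothing at level 1, while the call that goes through wrapper does:

```perl
use strict;
use warnings;

# Report which call-stack levels exist, as seen from inside a helper
sub frames_from_helper {
    my %seen = (
        level_0 => ( defined( scalar caller(0) ) ? 1 : 0 ),
        level_1 => ( defined( scalar caller(1) ) ? 1 : 0 ),
        );
    return \%seen;
    }

sub wrapper { frames_from_helper() }

my $direct  = frames_from_helper();  # called from file level
my $wrapped = wrapper();             # called from inside another subroutine

printf "direct:  level_0=%d level_1=%d\n", @{$direct}{qw(level_0 level_1)};
printf "wrapped: level_0=%d level_1=%d\n", @{$wrapped}{qw(level_0 level_1)};
```

Level 0 always exists inside the helper because calling the helper is itself a frame. That’s why the checks in Modulino::Test have to look at level 1 instead.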

In the test method, I get a list of other methods that I want to run. In this case, those are methods that start with _test_. Once I have all of those method names, I run each one inside Test::More’s subtest function, and I expect each _test_ method to output proper TAP:

sub test {
    say "Running as test";

    my( $class ) = @_;
    my @tests = $class->_get_tests;

    require Test::More;

    foreach my $test ( @tests ) {
        Test::More::subtest( $test => sub {
            my $rc = eval { $class->$test(); 1 };
            Test::More::diag( $@ ) unless defined $rc;
            } );
        }

    Test::More::done_testing();
    }

sub _get_tests {
    my( $class ) = @_;
    no strict 'refs';
    my $stub = $class . '::';
    my @tests =
        grep { defined &{"$stub$_"}    }
        grep { 0 == index $_, '_test_' }
        keys %{ "$stub" };

    say "Tests are @tests";
    @tests;
    }
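
The symbol table lookup in _get_tests can be tried on its own. This sketch uses a hypothetical Local::Example package with two _test_ subroutines, one ordinary subroutine, and a package variable whose name merely looks like a test. Only the defined subroutines with the right prefix should survive the two greps:

```perl
package Local::Example;    # hypothetical package for this sketch
use strict;
use warnings;

sub _test_alpha { 1 }
sub _test_beta  { 1 }
sub helper      { 1 }

# Has the prefix, but it is a scalar, not a subroutine
our $_test_data = 'not a test';

sub test_names {
    my( $class ) = @_;

    no strict 'refs';
    my $stash = $class . '::';

    # sort so the order doesn't depend on hash ordering
    return sort
        grep { defined &{"$stash$_"}    }   # keep only defined subroutines
        grep { 0 == index $_, '_test_' }    # keep names with the prefix
        keys %{ $stash };
    }

package main;

my @names = Local::Example->test_names;
print "Tests are @names\n";
```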

I have one test in this file, and it’s nothing fancy. I use some Test::More subroutines that don’t really test anything. This is just a demonstration that I can make these tests run:

sub _test_run {
    require Test::More;

    Test::More::pass();
    Test::More::pass();

    SKIP: {
        Test::More::skip( "These tests don't work", 2 );
        Test::More::fail();
        Test::More::fail();
        }
    }

1;

Putting this all together means I can run this module as a program with CPANTEST set to a true value:

% CPANTEST=1 perl -Ilib lib/Modulino/Test.pm 
Running as test
Tests are _test_run
    ok 1
    ok 2
    ok 3 # skip These tests don't work
    ok 4 # skip These tests don't work
    1..4
ok 1 - _test_run
1..1

Since the tests exist in the module (just as the documentation does), I can run them any time I like, including after dependency upgrades to see if my module still works. For some people, the extra cost of compilation might be worth that; if I had many tests I could store the code in a string and compile it on demand, so I wouldn’t have to compile it for normal runs.
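
Here’s a minimal sketch of that compile-on-demand idea. Local::LazyTests, compile_tests, and _test_sum are hypothetical names; the test code sits in a string, so perl doesn’t compile it until I ask for it with a string eval:

```perl
package Local::LazyTests;    # hypothetical package for this sketch
use strict;
use warnings;

# Test code kept as plain text; perl does not compile it at load time
my $test_source = <<'PERL';
sub _test_sum {
    require Test::More;
    Test::More::is( 1 + 1, 2, 'addition works' );
    }
PERL

# Compile the stored test code into the package, but only once
sub compile_tests {
    my( $class ) = @_;
    return 1 if $class->can( '_test_sum' );

    eval "package $class; $test_source; 1"
        or die "Could not compile embedded tests: $@";
    return 1;
    }

package main;

my $before = Local::LazyTests->can( '_test_sum' ) ? 1 : 0;
Local::LazyTests->compile_tests;
my $after  = Local::LazyTests->can( '_test_sum' ) ? 1 : 0;

print "before: $before, after: $after\n";
```

A normal run never calls compile_tests, so it never pays for the test code. The trade-off is that syntax errors in the tests stay hidden until the first time I compile them.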

If I want embedded tests, I’m not likely to want to copy the test runner code in every module. I can move most of this into another module that other modules can include.

Because the UNITCHECK block isn’t going to work from an included module, I have to adjust my technique. It’s not as easy to inspect caller while compiling; I’ll have to wait until everything is compiled. Not only that, all of the methods in the module have to be defined by the time the base module wants to test, since I want to get the test names by looking at the symbol list. I can use the common module at the end of the file so it does its work after everything else is compiled, or I can require it so it compiles during the run phase. Here’s what that looks like; it’s the same code but in a different file and with adjustments to get the right level of caller:

package Modulino::Base;
use utf8;
use strict;
no warnings;

use vars qw($VERSION);
use Carp;

our $VERSION = '0.10_01';

sub _running_under_tester { !! $ENV{CPANTEST} }

sub _running_as_app {
    my $caller = scalar caller(1);
    (defined $caller) && $caller ne 'main';
    }

# run directly
if( ! defined caller(0) ) {
    carp sprintf "You cannot run %s directly!", __PACKAGE__;
    }
# loaded from a module that was run directly
elsif( ! defined caller(1) ) {
    my @caller = caller(0);
    my $method = do {
           if( _running_under_tester()    ) { 'test' }
        elsif( _running_as_app()          ) { 'run'  }
        else                                { undef  }
        };

    if( $caller[0]->can( $method ) ) {
        $caller[0]->$method( @ARGV );
        }
    elsif( __PACKAGE__->can( $method ) ) { # faking inheritance
        __PACKAGE__->$method( $caller[0], @ARGV )
        }
    else {
        carp "There is no $method() method defined in $caller[0]\n";
        }
    }

sub test {
    my( $class, $caller ) = @_;

    my @tests = do {
        if( $caller->can( '_get_tests' ) ) {
            $caller->_get_tests;
            }
        else {
            $class->_get_tests( $caller );
            }
        };

    require Test::More;
    Test::More::note( "Running $caller as a test" );
    foreach my $test ( @tests ) {
        Test::More::subtest( $test => sub {
            my $rc = eval { $caller->$test(); 1 };
            Test::More::diag( $@ ) unless defined $rc;
            } );
        }

    Test::More::done_testing();
    }

sub _get_tests {
    my( $class, $caller ) = @_;
    print "_get_tests class is [$class]\n";
    no strict 'refs';
    my $stub = $caller . '::';
    my @tests =
        grep { defined &{"$stub$_"}    }
        grep { 0 == index $_, '_test_' }
        keys %{ "$stub" };

    @tests;
    }

1;
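
The timing matters enough that it’s worth seeing on its own. This sketch writes three throwaway modules to a temporary directory (Probe, WithUse, and WithRequire are all hypothetical names). Probe reports whether the package that loaded it already has a run method, and the two loaders differ only in whether they pull in Probe with use or with require at the top of the file:

```perl
use strict;
use warnings;

use File::Spec::Functions qw(catfile);
use File::Temp qw(tempdir);

my $dir = tempdir( CLEANUP => 1 );

sub write_module {
    my( $name, $source ) = @_;
    my $path = catfile( $dir, "$name.pm" );
    open my $fh, '>', $path or die "Could not write $path: $!";
    print {$fh} $source;
    close $fh or die "Could not close $path: $!";
    }

# Probe reports whether the package that loaded it already has run()
write_module( Probe => <<'PERL' );
package Probe;
my $loader = caller(0);
print "$loader ", ( $loader->can('run') ? 'has' : 'lacks' ), " run()\n";
1;
PERL

write_module( WithUse => <<'PERL' );
package WithUse;
use Probe;          # loads Probe while WithUse is still compiling
sub run { 1 }
1;
PERL

write_module( WithRequire => <<'PERL' );
package WithRequire;
require Probe;      # loads Probe at run time, after run() is compiled
sub run { 1 }
1;
PERL

# Run a command without a shell and return everything it prints
sub output_of {
    my @command = @_;
    open my $pipe, '-|', @command or die "Could not run @command: $!";
    my $output = do { local $/; <$pipe> };
    close $pipe;
    return defined $output ? $output : '';
    }

my $with_use     = output_of( $^X, "-I$dir", '-MWithUse',     '-e', '0' );
my $with_require = output_of( $^X, "-I$dir", '-MWithRequire', '-e', '0' );

print $with_use, $with_require;
```

With use at the top, the helper module runs before the rest of the loader exists, so it can’t find any methods. With require, the whole file is already compiled by the time the helper looks.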

I employ Modulino::Base with require so I can put it at the top of the file near the rest of the setup:

package Modulino::TestWithBase;
use utf8;
use strict;
use warnings;

use v5.10;

our $VERSION = '0.10_01';

require Modulino::Base;

...

I check that it still works:

% CPANTEST=1 perl -Ilib lib/Modulino/TestWithBase.pm
_get_tests class is [Modulino::Base]
# Running Modulino::TestWithBase as a test
    ok 1
    ok 2
    ok 3 # skip These tests don't work
    ok 4 # skip These tests don't work
    1..4
ok 1 - _test_run
1..1

Now that I’ve shown this, I will warn you about it. Many technical books show things the authors invented for the book, and this is no different. Most authors, on publishing the book, abandon the invention. Modulino::Demo, although on CPAN, is probably no different. It’s a simple concept that you can reinvent locally to get exactly what you need.

Creating a Program Distribution

There are a variety of ways to make a Perl distribution, and we covered these in Chapter 12 of Intermediate Perl. If I start with a program that I already have, I like to use my scriptdist program, which is available on CPAN (and beware, because everyone seems to write this program for themselves at some point). It builds a distribution around the program based on templates I created in ~/.scriptdist, so I can make the distro any way that I like, which also means that you can make yours any way you like, not just my way.

At this point, I need the basic tests and a Makefile.PL to control the whole thing, just as I do with normal modules. Everything ends up in a directory named after the program but with .d appended to it. I typically don’t use that directory name for anything other than a temporary placeholder, since I immediately import everything into source control:

% scriptdist Japh.pm
Quiet is 0
Home directory is /Users/Amelia
RC directory is /Users/Amelia/.scriptdist
Processing Japh.pm...
Install Module::Extract::Use to detect prerequisites
Install Module::Extract::DeclaredMinimumPerl to detect minimum versions
Making directory Japh.pm.d...
Making directory Japh.pm.d/t...
RC directory is /Users/Amelia/.scriptdist
cwd is /Users/Amelia/Desktop
Checking for file [.gitignore]... Adding file [.gitignore]...
Checking for file [.releaserc]... Adding file [.releaserc]...
Checking for file [Changes]... Adding file [Changes]...
Checking for file [MANIFEST.SKIP]... Adding file [MANIFEST.SKIP]...
Checking for file [Makefile.PL]... Adding file [Makefile.PL]...
Checking for file [t/compile.t]... Adding file [t/compile.t]...
Checking for file [t/pod.t]... Adding file [t/pod.t]...
Checking for file [t/test_manifest]... Adding file [t/test_manifest]...
Adding [Japh.pm]...
Copying script...
Opening input [Japh.pm] for output [Japh.pm.d/Japh.pm]
Copied [Japh.pm] with 0 replacements
Creating MANIFEST...
Initialized empty Git repository in /Users/Amelia/Desktop/Japh.pm.d/.git/
[master (root-commit) a799d24] Initial commit by /Users/Amelia/bin/perls/scriptdist 0.22
 10 files changed, 77 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 .releaserc
 create mode 100644 Changes
 create mode 100644 Japh.pm
 create mode 100644 MANIFEST
 create mode 100644 MANIFEST.SKIP
 create mode 100644 Makefile.PL
 create mode 100644 t/compile.t
 create mode 100644 t/pod.t
 create mode 100644 t/test_manifest
------------------------------------------------------------------
Remember to push this directory to your source control system.
In fact, why not do that right now?
------------------------------------------------------------------

Inside the Makefile.PL I have to make only a few minor adjustments to the usual module setup so it handles things as a program. I put the name of the program in the anonymous array for EXE_FILES, and ExtUtils::MakeMaker will do the rest. When I run make install, the program ends up in the right place (also based on the PREFIX setting):

WriteMakefile(
    'NAME'      => $script_name,
    'VERSION'   => '0.10',

    'EXE_FILES' =>  [ $script_name ],

    'PREREQ_PM' => {},

    'MAN1PODS'  => {
            $script_name => "\$(INST_MAN1DIR)/$script_name.1",
            },

    clean => { FILES => "*.bak $script_name-*" },
    );

An advantage of EXE_FILES is that ExtUtils::MakeMaker modifies the shebang line to point to the path of the perl binary that I used to run Makefile.PL. I don’t have to worry about the location of perl.

Once I have the basic distribution set up, I start off with some basic tests. I’ll spare you the details since you can look in scriptdist to see what it creates. The compile.t test simply ensures that everything at least compiles. If the program doesn’t compile, there’s no sense going on. The pod.t file checks the program documentation for Pod errors (see Chapter 14 for more details on Pod). These are the tests that clear up my most common mistakes (or, at least the ones I made most frequently before I started using these test files with all of my distributions).

Before I get started, I’ll check to ensure everything works correctly. Now that I’m treating my program as a module, I’ll test it every step of the way. The program won’t actually do anything until I run it as a program, though:

% cd Japh.pm.d
% perl Makefile.PL
Checking if your kit is complete...
Looks good
Writing Makefile for Japh.pm
Writing MYMETA.yml and MYMETA.json
% make test
cp Japh.pm blib/lib/Japh.pm
cp Japh.pm blib/script/Japh.pm
/usr/bin/perl -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/Japh.pm
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/compile.t .. ok
t/pod.t ...... ok
All tests successful.
Files=2, Tests=3,  0 wallclock secs ( 0.04 usr  0.02 sys +  0.13 cusr  0.02 csys =  0.21 CPU)
Result: PASS

Adding to the Script

Now that I have all of the infrastructure in place, I want to further develop the program. Since I’m treating it as a module, I want to add subroutines that I can call when I want it to do the work. These subroutines should be small and easy to test. I might even be able to reuse these subroutines by simply including my modulino in another program. It’s just a module, after all, so why shouldn’t other programs use it?

First, I move away from a hardcoded message. I’ll do this in baby steps to illustrate the development of the modulino, and the first thing I’ll do is move the actual message to its own subroutine. That hides the message to print behind an interface, and later I’ll change how I get the message without having to change the run subroutine. I’ll also be able to test message separately. At the same time, I’ll put the entire program in its own package, which I’ll call Japh. That helps compartmentalize anything I do when I want to test the modulino or use it in another program:

#!/usr/bin/perl

package Japh;

run() unless caller();

sub run {
    print message(), "\n";
    }

sub message {
    'Just another Perl hacker, ';
    }

I can add another test file to the t/ directory now. My first test is simple. I check that I can use the modulino and that my new subroutine is there. I won’t get into testing the actual message yet, since I’m about to change that:

# message.t
use Test::More tests => 2;

use_ok( 'Japh' );

ok( defined &Japh::message );

Now I want to be able to configure the message. At the moment it’s in English, but maybe I don’t always want that. How am I going to get the message in other languages? I could do all sorts of fancy internationalization things, but for simplicity I’ll create a file that contains the language, the template string for that language, and the locales for that language. Here’s a configuration file that maps the locales to a template string for that language:

en_US "Just another %s hacker, "
eu_ES "apenas otro hacker del %s, "
fr_FR "juste un autre hacker de %s, "
de_DE "gerade ein anderer %s Hacker, "
it_IT "appena un altro hacker del %s, "

I add some bits to read the language file. I need to add a subroutine, get_template, to read the file and return the template for my locale, and run has to pass that template to message. Since message now works from a template string, it uses sprintf to fill it in. I also add another subroutine, get_topic, to return the type of hacker I am. I won’t branch out into the various ways I can get the topic, although you can see how I’m moving the program away from doing (or saying) one thing to making it much more flexible:

sub run {
    my $template = get_template();

    print message( $template ), "\n";
    }

sub message {
    my $template = shift;

    return sprintf $template, get_topic();
    }

sub get_topic { 'Perl' }

sub get_template { ... }    # shown later
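
Before I get to the real get_template, here’s one way to turn the configuration text into a data structure. This is only a sketch: parse_templates is a hypothetical helper, it takes the text as a string rather than reading a file, and it quietly skips any line that doesn’t look like a locale followed by a quoted template:

```perl
use strict;
use warnings;

# Turn lines of the form   locale "template"   into a hash reference
sub parse_templates {
    my( $text ) = @_;

    my %template_for;
    foreach my $line ( split /\n/, $text ) {
        next unless $line =~ m/\A(\S+)\s+"(.*)"\s*\z/;
        $template_for{$1} = $2;
        }

    return \%template_for;
    }

my $templates = parse_templates( <<'CONFIG' );
en_US "Just another %s hacker, "
fr_FR "juste un autre hacker de %s, "
this line is not a valid entry
CONFIG

foreach my $locale ( sort keys %$templates ) {
    print "$locale: ", sprintf( $templates->{$locale}, 'Perl' ), "\n";
    }
```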

I can add some tests to ensure that my new subroutines work and also check that the previous tests still pass.

Being quite pleased with myself that my modulino now works in many languages and that the message is configurable, I’m disappointed to find out that I’ve just introduced a possible problem. Since the user can decide the format string, they can do anything that sprintf allows them to do, and that’s quite a bit. I’m using user-defined data to run the program, so I should really turn on taint checking (see Chapter 2), but even better than that, I should get away from the problem rather than trying to put a bandage on it.
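
A couple of harmless examples show how much control the format string hands over. In this sketch the template names (honest, padded, greedy) are made up, and I collect warnings in an array instead of letting them print. A width in the format inflates the output, and an extra conversion makes sprintf reach for an argument that isn’t there:

```perl
use strict;
use warnings;

# Collect warnings instead of printing them
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $topic = 'Perl';

# Hypothetical templates, as they might arrive from a configuration file
my %template = (
    honest => 'Just another %s hacker, ',
    padded => 'Just another %40s hacker, ',    # width pads the topic
    greedy => 'Just another %s %s hacker, ',   # asks for two arguments
    );

my %result;
foreach my $name ( sort keys %template ) {
    $result{$name} = sprintf $template{$name}, $topic;
    }

foreach my $name ( sort keys %result ) {
    printf "%-6s length %2d: [%s]\n",
        $name, length $result{$name}, $result{$name};
    }

print "warnings collected: ", scalar @warnings, "\n";
```

Neither of these is dangerous by itself, but a much larger width would make the program build a much larger string, and I’d rather not have to think about every directive a configuration file might contain.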

Instead of sprintf, I’ll use the Template module. My format strings will turn into templates:

en_US "Just another [% topic %] hacker, "
eu_ES "apenas otro hacker del [% topic %], "
fr_FR "juste un autre hacker de [% topic %], "
de_DE "gerade ein anderer [% topic %] Hacker, "
it_IT "appena un altro hacker del [% topic %], "

Inside my modulino, I’ll include the Template module and configure the Template parser so it doesn’t evaluate Perl code. I only need to change message, because nothing else needs to know how message does its work:

sub message {
    my $template = shift;

    require Template;

    my $tt = Template->new(
        INCLUDE_PATH => '',
        INTERPOLATE  => 0,
        EVAL_PERL    => 0,
        );

    # process returns false on failure, and error has the reason
    $tt->process( \$template, { topic => get_topic() }, \ my $cooked )
        or die $tt->error, "\n";

    return $cooked;
    }

Now I have a bit of work to do on the distribution side. My modulino now depends on Template, so I need to add that to the list of prerequisites. This way, CPAN (or CPANPLUS) will automatically detect the dependency and install it as it installs my modulino. That’s just another benefit of wrapping the program in a distribution:

WriteMakefile(
    ...

    'PREREQ_PM' => {
        Template => '0',
        },

    ...
    );

What happens if there is no configuration file, though? My message subroutine should still do something, so get_template gives it a default message, and it issues a warning with carp when it can’t open the file:

use File::Spec::Functions qw(catfile);
use Carp qw(carp);

sub get_template {
    my $default = "Just another [% topic %] hacker, ";

    my $file = catfile( qw( t config.txt) );

    my $fh;
    unless( open $fh, '<', $file ) {
        carp "Could not open '$file'";
        return $default;
        }

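    # the locale comes from an argument, then LANG, then a default
    my $locale = shift || $ENV{LANG} || 'en_US';
    while( <$fh> ) {
        chomp;
        my( $this_locale, $template ) = m/(\S+)\s+"(.*?)"/;
        next unless defined $this_locale;  # skip lines that don't match

        return $template if $this_locale eq $locale;
        }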

    return $default;
    }

You know the drill by now: the new additions to the program require more tests. Again, I’ll leave that up to you.

Finally, I need to test the whole thing as a program. I’ve tested the bits and pieces individually, but do they all work together? To find out, I use the Test::Output module to run an external command and capture the output. I’ll compare that with what I expect. How I do this for programs depends on what the particular program is supposed to actually do. To run my program inside the test file, I wrap it in a subroutine and use the value of $^X for the perl binary I should use (that will be the same perl binary that’s running the tests):

#!/usr/bin/perl

use File::Spec::Functions qw(catfile);

use Test::More 'no_plan';
use Test::Output;

my $script = catfile( qw(blib script Japh.pm ) );

sub run_program {
    print `$^X $script`;
    }

{ # test for US English
local %ENV;
$ENV{LANG} = 'en_US';

stdout_is( \&run_program, "Just another Perl hacker, \n" );
}

{ # test for Spanish
local %ENV;
$ENV{LANG} = 'eu_ES';

stdout_is( \&run_program, "apenas otro hacker del Perl, \n" );
}

{ # test with no LANG setting
local %ENV;
delete $ENV{LANG};

stdout_is( \&run_program, "Just another Perl hacker, \n" );
}

{ # test with nonsense LANG setting
local %ENV;
$ENV{LANG} = 'blah blah';

stdout_is( \&run_program, "Just another Perl hacker, \n" );
}
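
The environment handling in that test file can be checked separately from the modulino. In this sketch, lang_seen_by_child is a hypothetical helper that starts a child perl without a shell and reports the LANG value it inherits. I localize only the one environment variable instead of all of %ENV, which is a narrower technique than the one in the test file above:

```perl
use strict;
use warnings;

# Start a child perl and report the LANG value it inherits
sub lang_seen_by_child {
    open my $pipe, '-|', $^X, '-e',
        q{print exists $ENV{LANG} ? $ENV{LANG} : 'unset'}
        or die "Could not start $^X: $!";
    my $output = do { local $/; <$pipe> };
    close $pipe;
    return defined $output ? $output : '';
    }

my @seen;

{ # the child inherits the temporary value
local $ENV{LANG} = 'fr_FR';
push @seen, lang_seen_by_child();
}

{ # the child sees no LANG at all
local $ENV{LANG};
delete $ENV{LANG};
push @seen, lang_seen_by_child();
}

print "child saw: @seen\n";
```

When each block ends, local restores whatever LANG was before, so one test can’t leak its settings into the next.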

Distributing the Programs

Once I create the program distribution, I can upload it to CPAN (or anywhere else I like) so other people can download it. To create the archive, I do the same thing I do for modules. First, I run make disttest, which creates a distribution, unwraps it in a new directory, and runs the tests. That ensures that the archive I give out has the necessary files and everything runs properly (well, most of the time):

% make disttest

After that, I create the archive in whichever format I like:

% make tardist
==OR==
% make zipdist

Finally, I upload it to PAUSE and announce it to the world. In real life, however, I use my release utility that comes with Module::Release and this (and much more) all happens in one step.

As a module living on CPAN, my modulino is a candidate for review by CPAN Testers, the loosely connected group of volunteers and automated computers that test just about every module. They don’t test programs, but our modulino doesn’t look like a program.

There is a little-known area of CPAN called “scripts” where people have uploaded stand-alone programs without full distribution support. Kurt Starsinic did some work on it to automatically index programs by category, and his solution simply looks in the program’s Pod documentation for a section called “SCRIPT CATEGORIES”. If I wanted, I could add my own categories to that section, and the programs archive should automatically index those on its next pass:

=pod SCRIPT CATEGORIES

CPAN/Administrative

=cut

Summary

I can create programs that look like modules. The entire program (outside of third-party modules) exists in a single file. Although it runs just like any other program, I can develop and test it just like a module. I get all the benefits of both forms, including testability, dependency handling, and installation. Since my program is a module, I can easily reuse parts of it in other programs, too.