Chapter 9. Modifying and Jury-Rigging Modules

Although there are over 25,000 distributions in CPAN, sometimes it doesn’t have exactly what I need. Sometimes a module has a bug or needs a new feature. I have several options for fixing things, whether or not the module’s author accepts my changes. The trick is to leave the module source the same but still fix the problem.

I can do several things to fix a module, and no solution is the right answer for every situation. I like to go with the solutions that mean the least amount of work for me and the most benefit for the Perl community, although those aren’t always compatible. For the rest of this section, I won’t give you a straight answer. All I can do is point out some of the issues involved so you can figure out what’s best for your situation.

The least amount of work in most cases is to fix anything I need and send a patch to the author so that he can incorporate it in the next release of the module. There’s even a bug tracker for every CPAN module, and the module author automatically gets an email notifying him about the issue.

Sometimes the author is available, has time to work on the module, and releases a new distribution. In that case, I’m done. On the other hand, CPAN is mostly the result of a lot of volunteer work, so the author may not have enough free time to commit to something that won’t pay his rent or put food in his mouth. Even the most conscientious module maintainer gets busy sometimes.

To be fair, even the seemingly simplest fixes aren’t trivial matters to all module maintainers. Patches hardly ever come with corresponding updates to the tests or documentation, and the patches might have consequences to other parts of the modules or to portability. Furthermore, patch submitters tend to change the interface in ways that work for them but somehow make the rest of the interface inconsistent. Things that seem like five minutes to the submitter might seem like a couple of hours to the maintainer, so they make it onto the To-Do list, but not the Done list.

When I’ve made my fix I get the diff, which is just the parts of the file that have changed. The diff command creates the patch:

% diff -u original_file updated_file  original_file.diff>

The patch shows which changes someone needs to make to the original version to get my new version:

% diff -u -d ISBN.pm.dist ISBN.pm
--- ISBN.pm.dist        2007-02-05 00:26:27.000000000 -0500
+++ ISBN.pm     2007-02-05 00:27:57.000000000 -0500
@@ -59,8 +59,8 @@
        $self->{'isbn'}      = $common_data;
        if($isbn13)
        {
-               $self->{'positions'} = [12];
-               ${$self->{'positions'}}[3] = 3;
+               $self->{'positions'}    = [12];
+               $self->{'positions'}[3] = 3;
        }
        else
        { $self->{'positions'} = [9]; }

The author can take the diff and apply it to his source using the patch program, which can read the diff to figure out the file and what it needs to do to update it:

% patch < original_file.diff

That’s the low-tech version. With Git (and many other lightweight source control systems), I can import the distribution into Git, make my changes, and make my patch from that. Once I have the distribution downloaded and unpacked and I change into that directory, I set up the repository and get to work:

% git init; git add .; git commit -a -m 'Original sources'
... make changes ...
% git format-patch --stdout -1

Yanick Champoux turned this idea into Git::CPAN::Patch, which automates most of the process for me. It defines a Git command that I call as git-cpan, which handles downloading the latest version of the module and creating the repository:

% git-cpan clone Foo::Bar
% cd Foo::Bar
... make changes ...
% git commit -a -m 'I made some changes'

When I’m done working, it’s easy for me to send the patch:

% git-cpan sendpatch

The last resort is forking, or creating a parallel distribution next to the official one. This is a danger of any popular open source project, but it’s been only on very rare occasions that this has happened with a Perl module. PAUSE will allow me to upload a module with a name registered to another author. The module will show up on CPAN but PAUSE will not index it. Since it’s not in the index, the tools that work with CPAN won’t see it even though CPAN stores it. Sites such as CPAN Search and MetaCPAN may mark it as “unauthorized.”

If I fork, I don’t have to use the same module name as the original. If I choose a different name, I can upload my fixed module, PAUSE will index it under its new name, and the CPAN tools can install it automatically. Nobody knows about my module because everybody uses the original version with the name they already know about and the interface they already use. It might help if my new interface is compatible with the original module or at least provides some sort of compatibility layer.

I had to debug a problem with a program that used Email::Stuff to send email through Gmail. Just like other mail servers, the program was supposed to connect to the mail server and send its mail, but it was hanging on the local side. It’s a long chain of calls, starting at Email::Stuff, then going through Email::Simple, Email::Send::SMTP, Net::SMTP::SSL, and Net::SMTP, and ending up in IO::Socket::INET. Somewhere in there something wasn’t happening right. This problem, by the way, prompted my Carp modifications in Chapter 3 so I could see a full dump of the arguments at each level.

I finally tracked it down to something going on in Net::SMTP. For some reason, the local port and address, which should have been selected automatically, weren’t. Here’s an extract of the real new method from Net::SMTP:

package Net::SMTP;

sub new
{
 my $self = shift;
 my $type = ref($self) || $self;

 ...
 my $h;
 foreach $h (@{ref($hosts) ? $hosts : [ $hosts ]})
  {
   $obj = $type->SUPER::new(PeerAddr => ($host = $h),
                PeerPort => $arg{Port} || 'smtp(25)',
                LocalAddr => $arg{LocalAddr},
                LocalPort => $arg{LocalPort},
                Proto    => 'tcp',
                Timeout  => defined $arg{Timeout}
                        ? $arg{Timeout}
                        : 120
               ) and last;
  }

...
 $obj;
}

The typical call to new passes the remote hostname as the first argument, then a series of pairs after that. Since I don’t want the standard SMTP port for Google’s service I specify it myself:

my $mailer = Net::SMTP->new(
    'smtp.gmail.com',
    Port => 465,
    ...
    );

The problem comes in when I don’t specify a LocalAddr or LocalPort argument. I shouldn’t have to do that, and the lower levels should find an available port for the default local address. For some reason, these lines were causing problems when they didn’t get a number. They don’t work if they are undef, which should convert to 0 when used as a number, and should tell the lower levels to choose appropriate values on their own:

LocalAddr => $arg{LocalAddr},
LocalPort => $arg{LocalPort},

To investigate the problem, I wanted to change Net::SMTP, but I didn’t want to edit Net/SMTP.pm directly. I get nervous when editing standard modules. Instead of editing it, I’ll surgically replace part of the module. I want to handle the case of the implicit LocalAddr and LocalPort values but also retain the ability to explicitly choose them. I’ve excerpted the full solution to show the relevant parts:

BEGIN {
use Net::SMTP;

no warnings 'redefine';

*Net::SMTP::new = sub
{
print "In my Net::SMTP::new...\n";

package Net::SMTP;

# ... snip

my $hosts = defined $host ? $host : $NetConfig{smtp_hosts};
 my $obj;

 my $h;
 foreach $h (@{ref($hosts) ? $hosts : [ $hosts ]})
  {
   $obj = $type->SUPER::new(PeerAddr => ($host = $h),
                PeerPort => $arg{Port} || 'smtp(25)',
                $arg{LocalAddr} ? ( LocalAddr => $arg{LocalAddr} ) : (),
                $arg{LocalPort} ? ( LocalPort => $arg{LocalPort} ) : (),
                Proto    => 'tcp',
                Timeout  => defined $arg{Timeout}
                        ? $arg{Timeout}
                        : 120
               );

  last if $obj;
  }

# ... snip

 $obj;
}

To make everything work out I have to do a few things. First I wrap the entire thing in a BEGIN block so this code runs before anyone really has a chance to use anything from Net::SMTP. Inside the BEGIN, I immediately load Net::SMTP so anything it defines is already in place; I wouldn’t want Perl to replace all of my hard work by loading the original code on top of it. Immediately after I load Net::SMTP, I tell Perl not to warn me about what I’m going to do next. That’s a little clue that I shouldn’t do this lightly, but it’s not enough to stop me.

Once I have everything in place, I redefine Net::SMTP::new() by assigning to the typeglob for that name. The big change is inside the foreach loop. If the argument list didn’t have true values for LocalAddr and LocalPort, I don’t include them in the argument list to the SUPER class:

$arg{LocalAddr} ? ( LocalAddr => $arg{LocalAddr} ) : (),
$arg{LocalPort} ? ( LocalPort => $arg{LocalPort} ) : (),

Inside new, there’s a call to SUPER. Unlike what most people expect, that virtual method works with the current package and not the class of the object. As such, inside my subroutine I change the default package to Net::SMTP.

That’s a nifty trick. If $arg{LocalAddr} has a true value, it selects the first option in the conditional operator, so I include LocalAddr => $arg{LocalAddr} in the argument list. If $arg{LocalAddr} doesn’t have a true value, I get the second option of the ternary operator, which is just the empty list. In that case, the lower levels choose appropriate values on their own.

Now I have my fix to my Net::SMTP problem, but I haven’t changed the original file. Even if I don’t want to use my trick in production, it’s extremely effective for figuring out what’s going on. I can change the offending module and instantly discard my changes to get back to the original. It also serves as an example I can send to the module author when I report my problem.

I could have done this another way. I could copy the module source to a new location and add that new location to @INC. I then modify my copy, leaving the original untouched. This would work in the short term, but I’ll end up with leftover files that get in the way of other things.

I could have copied the original Net/STMP.pm to ~/lib/Net/SMTP.pm (in my home directory). I can modify and test the copy. If I forget to remove that file but have that directory in @INC for another module, I might unintentionally load the modified Net::SMTP with whatever unsupported or broken changes I made.

The best solution, if possible, is a subclass that inherits from the module I need to alter. My changes live in their own source files, and I don’t have to touch the source of the original module. We mostly covered this in our barnyard example in Intermediate Perl, so I won’t go over it again here.

Before I do too much work, I create an empty subclass. I’m not going to do a lot of work if I can’t even get it working when I haven’t changed anything yet. For this example, I want to subclass the Foo module so I can add a new feature. I can use the Local namespace, which should never conflict with a real module name. My Local::Foo module inherits from the module I want to fix, Foo, using the parent pragma:

package Local::Foo

use parent qw(Foo);

1;

If I’m going to be able to subclass this module, I should be able to simply change the class name I use, and everything should still work. In my program, I use the same methods from the original class, and since I didn’t actually override anything, I should get the exact same behavior as the original module. This is sometimes called the “empty” or “null subclass test”:

#!/usr/bin/perl

# use Foo
use Local::Foo;

#my $object = Foo->new();
my $object = Local::Foo->new( ... );

The next part depends on what I want to do. Am I going to completely replace a feature or method, or do I just want to add a little bit to it? I can add a method to my subclass. I probably want to call the SUPER method first to let the original method do its work:

package Local::Foo

use parent qw(Foo);

sub new {
    my( $class, @args ) = @_;

    ... munge arguments here

    my $self = $class->SUPER::new( @_ );

    ... do my new stuff here.
    }

1;

Sometimes this won’t work, though, because the original module can’t be subclassed, either by design or accident. For instance, the unsuspecting module author might have used the one-argument form of bless. Without the second argument, bless uses the current package for the object type. No matter what I do in the subclass, the one-argument bless will return an object that ignores the subclass:

sub new {
    my( $class, @args ) = @_;

    my $self = { ... };

    bless $self;
    }

To make this subclassable I need to use the first argument to new, which I stored in $class, as the second argument to bless:

sub new {  # in the subclass to make it subclassable!
    my( $class, @args ) = @_;

    my $self = { ... };

    bless $self, $class;
    }

The value in $class is the original class name that I used, not the current package. Unless I have a good reason to ignore the original class name, I should always use it with bless.

In testing this, there are two things I want to check. First, I need to ensure that inheritance works. That means that somewhere in the inheritance tree I find the parent class, Foo, as well as the class I used to create the object, Local::Foo:

# some file in t/
use Test::More;

my $object = Local::Foo->new();

foreach my $isa_class ( qw( Foo Local::Foo ) )
    {
    isa_ok( $object, $isa_class, "Inherits from $isa_class" );
    }

Normally, that should be enough. If I need the object to belong in a particular class rather than merely inherit from it, I can check the exact class using ref:

is( ref $object, 'Local::Foo', 'Object is type Local::Foo' );

The ref built-in isn’t as good as the blessed function from the Scalar::Util module that is included in Perl since 5.8. It does the same thing but returns undef if its argument isn’t blessed. That avoids the case of ref returning true for an unblessed reference:

use Scalar::Util qw(blessed);
is( blessed $object, 'Local::Foo', 'Object is type Local::Foo' );

Once I’m satisfied that I can make the subclass, I start to override methods in the subclass to get my desired behavior.

An ExtUtils::MakeMaker Example

Sometimes module authors know that their module won’t meet everyone’s needs, and they provide a way to get around the default behavior.

ExtUtils::MakeMaker works for most module installers but if it doesn’t do something that I need, I can easily change it through subclassing. To do this, ExtUtils::MakeMaker uses the special subclass name My. Before it calls its hardcoded methods, it looks for the same method names in the package My and will use those preferentially.

As MakeMaker performs its magic, it writes to the file Makefile according to what its methods tell it to do. What it decides to write comes from ExtUtils::MM_Any, the parent class for the magic, and then perhaps a subclass, such as ExtUtils::MM_Unix or ExtUtils::MM_Win32, that might override methods for platform-specific issues.

In my Test::Manifest module I want to change how testing works. I want the make test step to execute the test files in the order I specify rather than the order in which glob returns the filenames from the t/ directory. The function test_via_harness writes out a section of the Makefile. I know this because I look in the Makefile to find which bits do the part I want to change, then look for that text in the module to find the right function:

package ExtUtils::MM_Any;

sub test_via_harness {
    my($self, $perl, $tests) = @_;

    return qq{\t$perl "-MExtUtils::Command::MM" }.
        qq{"-e" "test_harness(\$(TEST_VERBOSE),
        '\$(INST_LIB)', '\$(INST_ARCHLIB)')" $tests\n};
}

After interpolations and replacements, the output in the Makefile shows up as something like this (although results may differ by platform):

test_dynamic :: pure_all
        PERL_DL_NONLAZY=1 $(FULLPERLRUN) "-MExtUtils::Command::MM" "-e"
        "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')"
        $(TEST_FILES)

After boiling everything down, a make test essentially runs a command that globs all of the files in the t/ directory and executes them in that order. This leads module authors to name their modules odd things like 00.load.t or 99.pod.t to make the order come out how they like:

perl -MExtUtils::Command::MM -e 'test_harness( ... )' t/*.t

It doesn’t matter much what test_harness actually does as long as my replacement does the same thing. In this case, I don’t want the test files to come from @ARGV because I want to control their order.

To change how that works, I need to get my function in the place of test_harness. By defining my own test_via_harness subroutine in the package MY, I can put any text I like in place of the normal test_via_harness. I want to use my function from Test::Manifest. I use the full package specification as the subroutine name to put it into the right namespace:

package Test::Manifest;

sub MY::test_via_harness {
    my($self, $perl, $tests) = @_;

    return qq|\t$perl "-MTest::Manifest" | .
        qq|"-e" "run_t_manifest(\$(TEST_VERBOSE), '\$(INST_LIB)', | .
        qq|'\$(INST_ARCHLIB)', \$(TEST_LEVEL) )"\n|;
    };

Instead of taking the list of files as arguments, in my run_t_manifest subroutine I call get_t_files(), which looks in the file t/test_manifest. Once run_t_manifest() has the list of files, it passes it to Test::Harness::runtests(), the same thing that the original test_harness() ultimately calls:

use File::Spec::Functions;

my $Manifest = catfile( "t", "test_manifest" );

sub run_t_manifest {
    ...;

    my @files = get_t_files( $level );

    ...;
    Test::Harness::runtests( @files );
    }

sub get_t_files {
    return unless open my $fh, $Manifest;

    my @tests = ();

    while( <$fh> ) {
        ...;

        push @tests, catfile( "t", $test ) if -e catfile( "t", $test );
        }
    close $fh;

    return wantarray ? @tests : join " ", @tests;
    }

In t/test_manifest I list the test files to run, optionally commenting lines I want to skip. I list them in any order I like, and that’s the order I’ll run them:

load.t
pod.t
pod_coverage.t
#prereq.t
new.t
feature.t
other_feature.t

By subclassing the module I don’t have to fool with ExtUtils::MakeMaker, which I certainly don’t want to do. I get the feature I want and I don’t break the module for anyone else. I still have the same ExtUtils::MakeMaker source that everyone else has. I go through the same process if I need to change any other behavior in ExtUtils::MakeMaker.

Instead of replacing a subroutine or method, I might just want to wrap it in another subroutine. That way I can inspect and validate the input before I run the subroutine and I can intercept and clean up the return value before I pass it back to the original caller. The basic idea looks like this:

sub wrapped_foo {
    my @args = @_;

    ...; # prepare @args for next step;

    my $result = foo( @args );

    ...; # clean up $result

    return $result;
    }

To do this right, however, I need to handle the different contexts. If I call wrapped_foo in list context, I need to call foo in list context too. It’s not unusual for Perl subroutines to have contextual behavior and for Perl programmers to expect it. My basic template changes to handle scalar, list, and void contexts:

sub wrapped_foo {
    my @args = @_;
    ...; # prepare @args for next step;
    if( wantarray ) {          # list context
        my @result = foo( @args );

        return @result;
        }
    elsif( defined wantarray ) { # scalar context
        my $result = foo( @args );
        ...; # clean up $result
        return $result;
        }
    else {                      # void context
        foo( @args );
        }
    }

It gets a bit more complicated than this, but Damian Conway makes it easy with Hook::LexWrap. He lets me add pre- and posthandlers that run before and after the wrapped subroutine, and he takes care of all of the details in the middle. His interface is simple; I use the wrap subroutine and provide the handlers as anonymous subroutines. The wrapped version is sub_to_watch(), and I call it as a normal subroutine:

#!/usr/bin/perl

use Hook::LexWrap;

wrap 'sub_to_watch',
    pre  => sub { print "The arguments are [@_]\n" },
    post => sub { print "Result was [$_[-1]]\n" };

sub_to_watch( @args );

Hook::LexWrap adds another element to @_ to hold the return value, so in my posthandler I look in $_[-1] to see the result.

I can use this to rewrite my divide example from Chapter 3. In that example, I had a subroutine to return the quotient of two numbers. In my made-up situation, I was passing it the wrong arguments, hence getting the wrong answer. Here’s my subroutine again:

sub divide {
    my( $n, $m ) = @_;
    my $quotient = $n / $m;
    }

Now I want to inspect the arguments before they go in, and see the return value before it comes back. If the actual arguments going in and the quotient match, then the subroutine is doing the right thing, but someone is using the wrong arguments. If the arguments are right but the quotient is wrong, then the subroutine is wrong:

#!/usr/bin/perl

use Hook::LexWrap;

sub divide {
    my( $n, $m ) = @_;
    my $quotient = $n / $m;
    }

wrap 'divide',
    pre  => sub { print "The arguments are [@_]\n" },
    post => sub { print "Result was [$_[-1]]\n" };

my $result = divide( 4, 4 );

After I wrap the subroutine, I call divide as I normally would. More importantly, though, I’m not changing my program for calls to divide, because Hook::LexWrap does some magic behind the scenes to replace the subroutine definition so that my entire program sees the wrapped version. I’ve changed the subroutine without editing the original source. Without (apparently) changing the subroutine, whenever I call it I get a chance to see extra output:

The arguments are [4 4 ]
Result was [1]

When I remove the wrap, I leave everything just as I found it and don’t have to worry about reverting my changes.

I don’t have to change module code to change how a module works. For an object-oriented module, I can create a subclass to change the parts I don’t like. If I can’t subclass it for some reason, I can replace parts of it, just as I can with any other module. No matter what I do, however, I usually want to leave the original code alone (unless it’s my module and I need to fix it) so I don’t make the problem worse.

The perlboot documentation has an extended subclassing example, although this document was removed in v5.16. You can read it in earlier versions of Perl. It’s also in Intermediate Perl.

I talk about Hook::LexWrap in “Wrapping Subroutines to Trace Code Execution”, The Perl Journal, July 2005.

Code::Splice and Monkey::Patch are interesting alternatives to Hook::Lex::Wrap, but they are fairly young experiments.

The documentation of diff and patch discuss their use. The patch man page is particularly instructive since it contains a section near the end that talks about the pragmatic considerations of using the tools and dealing with other programmers.