Although there are over 25,000 distributions in CPAN, sometimes it doesn’t have exactly what I need. Sometimes a module has a bug or needs a new feature. I have several options for fixing things, whether or not the module’s author accepts my changes. The trick is to leave the module source the same but still fix the problem.
I can do several things to fix a module, and no solution is the right answer for every situation. I like to go with the solutions that mean the least amount of work for me and the most benefit for the Perl community, although those aren’t always compatible. For the rest of this section, I won’t give you a straight answer. All I can do is point out some of the issues involved so you can figure out what’s best for your situation.
The least amount of work in most cases is to fix anything I need and send a patch to the author so that he can incorporate it in the next release of the module. There’s even a bug tracker for every CPAN module, and the module author automatically gets an email notifying him about the issue.
Sometimes the author is available, has time to work on the module, and releases a new distribution. In that case, I’m done. On the other hand, CPAN is mostly the result of a lot of volunteer work, so the author may not have enough free time to commit to something that won’t pay his rent or put food in his mouth. Even the most conscientious module maintainer gets busy sometimes.
To be fair, even the seemingly simplest fixes aren’t trivial matters to all module maintainers. Patches hardly ever come with corresponding updates to the tests or documentation, and the patches might have consequences to other parts of the modules or to portability. Furthermore, patch submitters tend to change the interface in ways that work for them but somehow make the rest of the interface inconsistent. Things that seem like five minutes to the submitter might seem like a couple of hours to the maintainer, so they make it onto the To-Do list, but not the Done list.
When I’ve made my fix I get the diff, which is just the parts of the file that have changed. The diff command creates the patch:
% diff -u original_file updated_file
original_file.diff>
The patch shows which changes someone needs to make to the original version to get my new version:
% diff -u -d ISBN.pm.dist ISBN.pm --- ISBN.pm.dist 2007-02-05 00:26:27.000000000 -0500 +++ ISBN.pm 2007-02-05 00:27:57.000000000 -0500 @@ -59,8 +59,8 @@ $self->{'isbn'} = $common_data; if($isbn13) { - $self->{'positions'} = [12]; - ${$self->{'positions'}}[3] = 3; + $self->{'positions'} = [12]; + $self->{'positions'}[3] = 3; } else { $self->{'positions'} = [9]; }
The author can take the diff and apply it to his source using the patch program, which can read the diff to figure out the file and what it needs to do to update it:
% patch < original_file.diff
That’s the low-tech version. With Git (and many other lightweight source control systems), I can import the distribution into Git, make my changes, and make my patch from that. Once I have the distribution downloaded and unpacked and I change into that directory, I set up the repository and get to work:
% git init; git add .; git commit -a -m 'Original sources'
... make changes ...% git format-patch --stdout -1
Yanick Champoux turned this idea into Git::CPAN::Patch
, which automates most of the process for me. It defines a
Git command that I call as git-cpan,
which handles downloading the latest version of the module and creating
the repository:
% git-cpan clone Foo::Bar
% cd Foo::Bar
... make changes ...% git commit -a -m 'I made some changes'
When I’m done working, it’s easy for me to send the patch:
% git-cpan sendpatch
If I can’t get the attention of the module maintainer, I might just make changes to the sources myself. Doing it this way usually seems like it works for a while, but when I update modules from CPAN, my changes disappear as a new version of the module overwrites the file. I can partially solve that by making my module version number very high, hoping an authentic version isn’t greater than the one I choose:
our $VERSION = 99999;
This has the disadvantage of making my job tough if I want to install an official version of the distribution that the maintainer has fixed. That version will most likely have a smaller number, so tools such as cpan and cpanminus will think my patched version is up-to-date and won’t install the seemingly older, but actually newer, version over it.
Other people who want to use my software might have the same problems, but they won’t realize what’s going on when things break after they update seemingly unrelated modules. Some software vendors get around this by creating a module directory about which only their application knows and putting all the approved versions of modules, including their patched versions, in that directory. That’s more work than I want, personally, but it does work.
If the module is important to you (or your business) and the author has disappeared, you might consider officially taking over its maintenance. Although every module on CPAN has an owner, the administrators of the Perl Authors Upload Server (PAUSE) can make you a co-maintainer, or even transfer complete ownership of the module to you.
The process is simple, although not automated. First, send a message to modules@perl.org inquiring about the module status. Often, the administrators can reach the author when you cannot because the author recognizes their names. Second, the admins will tell you to publicly announce your intent to take over the module, which really means to announce it where most of the community will see it. Next, just wait. This sort of thing doesn’t happen quickly, because the administrators give the author plenty of time to respond. They don’t want to transfer a module while an author’s on holiday!
Once you take over the module, though, you’ve taken over the module. You’ll probably find that the grass isn’t greener on the other side, and at least empathize with the plight of the maintainers of free software, starting the cycle once again.
The last resort is forking, or creating a parallel distribution next to the official one. This is a danger of any popular open source project, but it’s been only on very rare occasions that this has happened with a Perl module. PAUSE will allow me to upload a module with a name registered to another author. The module will show up on CPAN but PAUSE will not index it. Since it’s not in the index, the tools that work with CPAN won’t see it even though CPAN stores it. Sites such as CPAN Search and MetaCPAN may mark it as “unauthorized.”
If I fork, I don’t have to use the same module name as the original. If I choose a different name, I can upload my fixed module, PAUSE will index it under its new name, and the CPAN tools can install it automatically. Nobody knows about my module because everybody uses the original version with the name they already know about and the interface they already use. It might help if my new interface is compatible with the original module or at least provides some sort of compatibility layer.
I might just decide to not use a third-party module at all. If I write the module myself I can always find the maintainer. Of course, now that I’m the creator and the maintainer, I’ll probably be the person about whom everyone else complains. Doing it myself means I have to do it myself. That doesn’t quite fit my goal of doing the least amount of work. In very rare cases do these replacement modules catch on, and I should consider that before I do a lot of work.
I had to debug a problem with a program that used Email::Stuff
to send email through Gmail. Just like other mail servers,
the program was supposed to connect to the mail server and send its mail,
but it was hanging on the local side. It’s a long chain of calls, starting
at Email::Stuff
,
then going through Email::Simple
, Email::Send::SMTP
, Net::SMTP::SSL
, and Net::SMTP
, and ending up in IO::Socket::INET
. Somewhere in there something wasn’t happening right. This
problem, by the way, prompted my Carp
modifications in Chapter 3 so I could
see a full dump of the arguments at each level.
I finally tracked it down to something going on in Net::SMTP
.
For some reason, the local port and address, which should have been
selected automatically, weren’t. Here’s an extract of the real new
method from Net::SMTP
:
package Net::SMTP; sub new { my $self = shift; my $type = ref($self) || $self; ... my $h; foreach $h (@{ref($hosts) ? $hosts : [ $hosts ]}) { $obj = $type->SUPER::new(PeerAddr => ($host = $h), PeerPort => $arg{Port} || 'smtp(25)', LocalAddr => $arg{LocalAddr}, LocalPort => $arg{LocalPort}, Proto => 'tcp', Timeout => defined $arg{Timeout} ? $arg{Timeout} : 120 ) and last; } ... $obj; }
The typical call to new
passes
the remote hostname as the first argument, then a series of pairs after
that. Since I don’t want the standard SMTP port for Google’s service I
specify it myself:
my $mailer = Net::SMTP->new( 'smtp.gmail.com', Port => 465, ... );
The problem comes in when I don’t specify a LocalAddr
or LocalPort
argument. I shouldn’t have to do that,
and the lower levels should find an available port for the default local
address. For some reason, these lines were causing problems when they
didn’t get a number. They don’t work if they are undef
, which should convert to 0
when used as a number, and should tell the
lower levels to choose appropriate values on their own:
LocalAddr => $arg{LocalAddr}, LocalPort => $arg{LocalPort},
To investigate the problem, I wanted to change Net::SMTP
,
but I didn’t want to edit Net/SMTP.pm
directly. I get nervous when editing standard modules. Instead of editing
it, I’ll surgically replace part of the module. I want to handle the case
of the implicit LocalAddr
and LocalPort
values but also retain the ability to
explicitly choose them. I’ve excerpted the full solution to show the
relevant parts:
BEGIN { use Net::SMTP; no warnings 'redefine'; *Net::SMTP::new = sub { print "In my Net::SMTP::new...\n"; package Net::SMTP; # ... snip my $hosts = defined $host ? $host : $NetConfig{smtp_hosts}; my $obj; my $h; foreach $h (@{ref($hosts) ? $hosts : [ $hosts ]}) { $obj = $type->SUPER::new(PeerAddr => ($host = $h), PeerPort => $arg{Port} || 'smtp(25)', $arg{LocalAddr} ? ( LocalAddr => $arg{LocalAddr} ) : (), $arg{LocalPort} ? ( LocalPort => $arg{LocalPort} ) : (), Proto => 'tcp', Timeout => defined $arg{Timeout} ? $arg{Timeout} : 120 ); last if $obj; } # ... snip $obj; }
To make everything work out I have to do a few things. First I wrap
the entire thing in a BEGIN
block so this code runs before anyone really has a chance to use
anything from Net::SMTP
.
Inside the BEGIN
, I immediately load
Net::SMTP
so anything it defines is already in place; I wouldn’t want Perl to
replace all of my hard work by loading the original code on top of it.
Immediately after I load Net::SMTP
,
I tell Perl not to warn me about what I’m going to do next. That’s a
little clue that I shouldn’t do this lightly, but it’s not enough to stop
me.
Once I have everything in place, I redefine Net::SMTP::new()
by assigning to the typeglob
for that name. The big change is inside the foreach
loop. If the argument list didn’t have
true values for LocalAddr
and LocalPort
, I don’t include them in the argument
list to the SUPER
class:
$arg{LocalAddr} ? ( LocalAddr => $arg{LocalAddr} ) : (), $arg{LocalPort} ? ( LocalPort => $arg{LocalPort} ) : (),
Inside new
, there’s a call to
SUPER
. Unlike what most people expect,
that virtual method works with the current package and not the class of
the object. As such, inside my subroutine I change the default package to
Net::SMTP
.
That’s a nifty trick. If $arg{LocalAddr}
has a true value, it selects the
first option in the conditional operator, so I include LocalAddr => $arg{LocalAddr}
in the argument
list. If $arg{LocalAddr}
doesn’t have a
true value, I get the second option of the ternary operator, which is just
the empty list. In that case, the lower levels choose appropriate values
on their own.
Now I have my fix to my Net::SMTP
problem, but I haven’t changed the original file. Even if I don’t want to
use my trick in production, it’s extremely effective for figuring out
what’s going on. I can change the offending module and instantly discard
my changes to get back to the original. It also serves as an example I can
send to the module author when I report my problem.
I could have done this another way. I could copy the module source
to a new location and add that new location to @INC
. I then modify my copy, leaving the
original untouched. This would work in the short term, but I’ll end up
with leftover files that get in the way of other things.
I could have copied the original Net/STMP.pm to ~/lib/Net/SMTP.pm (in my home directory). I can
modify and test the copy. If I forget to remove that file but have that
directory in @INC
for another module, I
might unintentionally load the modified Net::SMTP
with whatever unsupported or broken changes I made.
The best solution, if possible, is a subclass that inherits from the module I need to alter. My changes live in their own source files, and I don’t have to touch the source of the original module. We mostly covered this in our barnyard example in Intermediate Perl, so I won’t go over it again here.
Before I do too much work, I create an empty subclass. I’m not going
to do a lot of work if I can’t even get it working when I haven’t changed
anything yet. For this example, I want to subclass the Foo
module so I can add a new feature. I can use
the Local
namespace, which should never
conflict with a real module name. My Local::Foo
module inherits from the module I
want to fix, Foo
, using the parent
pragma:
package Local::Foo use parent qw(Foo); 1;
If I’m going to be able to subclass this module, I should be able to simply change the class name I use, and everything should still work. In my program, I use the same methods from the original class, and since I didn’t actually override anything, I should get the exact same behavior as the original module. This is sometimes called the “empty” or “null subclass test”:
#!/usr/bin/perl # use Foo use Local::Foo; #my $object = Foo->new(); my $object = Local::Foo->new( ... );
The next part depends on what I want to do. Am I going to completely
replace a feature or method, or do I just want to add a little bit to it?
I can add a method to my subclass. I probably want to call the SUPER
method first to let the original method do
its work:
package Local::Foo use parent qw(Foo); sub new { my( $class, @args ) = @_; ... munge arguments here my $self = $class->SUPER::new( @_ ); ... do my new stuff here. } 1;
Sometimes this won’t work, though, because the original module can’t
be subclassed, either by design or accident. For instance, the
unsuspecting module author might have used the one-argument form of
bless
. Without the second argument,
bless
uses the current package for the
object type. No matter what I do in the subclass, the one-argument
bless
will return an object that
ignores the subclass:
sub new { my( $class, @args ) = @_; my $self = { ... }; bless $self; }
To make this subclassable I need to use the first argument to
new
, which I stored in $class
, as the second argument to bless
:
sub new { # in the subclass to make it subclassable! my( $class, @args ) = @_; my $self = { ... }; bless $self, $class; }
The value in $class
is the
original class name that I used, not the current package. Unless I have a
good reason to ignore the original class name, I should always use it with
bless
.
In testing this, there are two things I want to check. First, I need
to ensure that inheritance works. That means that somewhere in the
inheritance tree I find the parent class, Foo
, as well as the class I used to create the
object, Local::Foo
:
# some file in t/ use Test::More; my $object = Local::Foo->new(); foreach my $isa_class ( qw( Foo Local::Foo ) ) { isa_ok( $object, $isa_class, "Inherits from $isa_class" ); }
Normally, that should be enough. If I need the object to belong in a
particular class rather than merely inherit from it, I can check the exact
class using ref
:
is( ref $object, 'Local::Foo', 'Object is type Local::Foo' );
The ref
built-in isn’t as good as
the blessed
function from the Scalar::Util
module that is included in Perl since 5.8. It does the same
thing but returns undef
if its argument
isn’t blessed. That avoids the case of ref
returning true for an unblessed
reference:
use Scalar::Util qw(blessed); is( blessed $object, 'Local::Foo', 'Object is type Local::Foo' );
Once I’m satisfied that I can make the subclass, I start to override methods in the subclass to get my desired behavior.
Sometimes module authors know that their module won’t meet everyone’s needs, and they provide a way to get around the default behavior.
ExtUtils::MakeMaker
works for most module installers but if it doesn’t do
something that I need, I can easily change it through subclassing. To do
this, ExtUtils::MakeMaker
uses the special subclass name My
.
Before it calls its hardcoded methods, it looks for the same method
names in the package My
and will use
those preferentially.
As MakeMaker
performs its
magic, it writes to the file Makefile according to what its methods tell it to do. What it decides to
write comes from ExtUtils::MM_Any
, the parent class for the magic, and then perhaps a
subclass, such as ExtUtils::MM_Unix
or ExtUtils::MM_Win32
, that might override methods for platform-specific
issues.
In my Test::Manifest
module I want to change how testing works. I want the
make test
step to execute the test
files in the order I specify rather than the order in which glob
returns the filenames from the t/ directory. The function test_via_harness
writes out a section of the
Makefile. I know this because I
look in the Makefile to find which
bits do the part I want to change, then look for that text in the module
to find the right function:
package ExtUtils::MM_Any; sub test_via_harness { my($self, $perl, $tests) = @_; return qq{\t$perl "-MExtUtils::Command::MM" }. qq{"-e" "test_harness(\$(TEST_VERBOSE), '\$(INST_LIB)', '\$(INST_ARCHLIB)')" $tests\n}; }
After interpolations and replacements, the output in the Makefile shows up as something like this (although results may differ by platform):
test_dynamic :: pure_all PERL_DL_NONLAZY=1 $(FULLPERLRUN) "-MExtUtils::Command::MM" "-e" "test_harness($(TEST_VERBOSE), '$(INST_LIB)', '$(INST_ARCHLIB)')" $(TEST_FILES)
After boiling everything down, a make test
essentially runs a command that globs all of the files in the t/ directory and executes them in that order.
This leads module authors to name their modules odd things like
00.load.t or 99.pod.t to make the order come out how they
like:
perl -MExtUtils::Command::MM -e 'test_harness( ... )' t/*.t
It doesn’t matter much what test_harness
actually does as long as my
replacement does the same thing. In this case, I don’t want the test
files to come from @ARGV
because I
want to control their order.
To change how that works, I need to get my function in the place
of test_harness
. By defining my own
test_via_harness
subroutine in the
package MY
, I can put any text I like
in place of the normal test_via_harness
. I want to use my function
from Test::Manifest
.
I use the full package specification as the subroutine name to put it
into the right namespace:
package Test::Manifest; sub MY::test_via_harness { my($self, $perl, $tests) = @_; return qq|\t$perl "-MTest::Manifest" | . qq|"-e" "run_t_manifest(\$(TEST_VERBOSE), '\$(INST_LIB)', | . qq|'\$(INST_ARCHLIB)', \$(TEST_LEVEL) )"\n|; };
Instead of taking the list of files as arguments, in my run_t_manifest
subroutine I call get_t_files()
, which looks in the file
t/test_manifest. Once run_t_manifest()
has the list of files, it
passes it to Test::Harness::runtests()
, the same thing that
the original test_harness()
ultimately calls:
use File::Spec::Functions; my $Manifest = catfile( "t", "test_manifest" ); sub run_t_manifest { ...; my @files = get_t_files( $level ); ...; Test::Harness::runtests( @files ); } sub get_t_files { return unless open my $fh, $Manifest; my @tests = (); while( <$fh> ) { ...; push @tests, catfile( "t", $test ) if -e catfile( "t", $test ); } close $fh; return wantarray ? @tests : join " ", @tests; }
In t/test_manifest I list the test files to run, optionally commenting lines I want to skip. I list them in any order I like, and that’s the order I’ll run them:
load.t pod.t pod_coverage.t #prereq.t new.t feature.t other_feature.t
By subclassing the module I don’t have to fool with ExtUtils::MakeMaker
,
which I certainly don’t want to do. I get the feature I want and I don’t
break the module for anyone else. I still have the same ExtUtils::MakeMaker
source that everyone else has. I go through the same process if I need
to change any other behavior in ExtUtils::MakeMaker
.
For another example of subclassing, see Chapter 14, where I subclass Pod::Simple
. Sean Burke wrote the module specifically for others to subclass. Most
of this book started as pseudopod, a special O’Reilly Media variant of
plain ol’ documentation, and I created my own Pod::PseudoPod
subclasses to convert the source to HTML pages for the
website and for the final sources for the production team.
Instead of replacing a subroutine or method, I might just want to wrap it in another subroutine. That way I can inspect and validate the input before I run the subroutine and I can intercept and clean up the return value before I pass it back to the original caller. The basic idea looks like this:
sub wrapped_foo { my @args = @_; ...; # prepare @args for next step; my $result = foo( @args ); ...; # clean up $result return $result; }
To do this right, however, I need to handle the different contexts.
If I call wrapped_foo
in list context,
I need to call foo
in list context too.
It’s not unusual for Perl subroutines to have contextual behavior and for
Perl programmers to expect it. My basic template changes to handle scalar,
list, and void contexts:
sub wrapped_foo { my @args = @_; ...; # prepare @args for next step; if( wantarray ) { # list context my @result = foo( @args ); return @result; } elsif( defined wantarray ) { # scalar context my $result = foo( @args ); ...; # clean up $result return $result; } else { # void context foo( @args ); } }
It gets a bit more complicated than this, but Damian Conway makes it easy with Hook::LexWrap
. He lets me add pre- and posthandlers that run before and
after the wrapped subroutine, and he takes care of all of the details in
the middle. His interface is simple; I use the wrap
subroutine and provide the handlers as
anonymous subroutines. The wrapped version is sub_to_watch()
, and I call it as a normal
subroutine:
#!/usr/bin/perl use Hook::LexWrap; wrap 'sub_to_watch', pre => sub { print "The arguments are [@_]\n" }, post => sub { print "Result was [$_[-1]]\n" }; sub_to_watch( @args );
Hook::LexWrap
adds another element to @_
to hold the
return value, so in my posthandler I look in $_[-1]
to see the result.
I can use this to rewrite my divide
example from Chapter 3. In that example, I had a subroutine to return the
quotient of two numbers. In my made-up situation, I was passing it the
wrong arguments, hence getting the wrong answer. Here’s my subroutine
again:
sub divide { my( $n, $m ) = @_; my $quotient = $n / $m; }
Now I want to inspect the arguments before they go in, and see the return value before it comes back. If the actual arguments going in and the quotient match, then the subroutine is doing the right thing, but someone is using the wrong arguments. If the arguments are right but the quotient is wrong, then the subroutine is wrong:
#!/usr/bin/perl use Hook::LexWrap; sub divide { my( $n, $m ) = @_; my $quotient = $n / $m; } wrap 'divide', pre => sub { print "The arguments are [@_]\n" }, post => sub { print "Result was [$_[-1]]\n" }; my $result = divide( 4, 4 );
After I wrap
the subroutine, I
call divide
as I normally would. More
importantly, though, I’m not changing my program for calls to
divide
, because Hook::LexWrap
does some magic behind the scenes to replace the subroutine definition so
that my entire program sees the wrapped version. I’ve changed the
subroutine without editing the original source. Without (apparently)
changing the subroutine, whenever I call it I get a chance to see extra
output:
The arguments are [4 4 ] Result was [1]
When I remove the wrap
, I leave
everything just as I found it and don’t have to worry about reverting my
changes.
I don’t have to change module code to change how a module works. For an object-oriented module, I can create a subclass to change the parts I don’t like. If I can’t subclass it for some reason, I can replace parts of it, just as I can with any other module. No matter what I do, however, I usually want to leave the original code alone (unless it’s my module and I need to fix it) so I don’t make the problem worse.
The perlboot documentation has an extended subclassing example, although this document was removed in v5.16. You can read it in earlier versions of Perl. It’s also in Intermediate Perl.
I talk about Hook::LexWrap
in “Wrapping
Subroutines to Trace Code Execution”, The Perl
Journal, July 2005.
Code::Splice
and Monkey::Patch
are interesting alternatives to Hook::Lex::Wrap
, but they are fairly young
experiments.
The documentation of diff and patch discuss their use. The patch man page is particularly instructive since it contains a section near the end that talks about the pragmatic considerations of using the tools and dealing with other programmers.