Chapter 14. Essential Testing

As briefly described in Chapter 13, a distribution contains a testing facility invoked from make test. This testing facility permits a module author to write and run tests during development and maintenance, and permits the ultimate module installer to verify that the module works in the new environment.

Why have tests during development? One emerging school of thought states that the tests should be written first, even before the module is created, as a reflection of the module’s specification. Of course, the initial test run against the unwritten module will show nearly complete failure. However, as functionality is added, proper functionality is verified immediately. (It’s also handy to invoke the tests frequently as you code to make sure you’re getting closer to the goal, not breaking more things.)

Certainly, errors may be found in the test suite. However, the defect rate for tests is usually far lower than the defect rate for complex module code; if a test fails, it’s usually a good indication that there’s more work to be done.

But even when Version 1.0 of the module is finally shipped, there’s no need to abandon the test suite. Unless you code the mythical “bug-free module,” there will be bug reports. Each bug report can (and should) be turned into a test.[92] While fixing the bug, the remaining tests prevent regression to a less functional version of the code—hence the name regression testing.

Then there’s always the 1.1 or 2.0 releases to think about. When you want to add functionality, start by adding tests.[93] Because the existing tests ensure your upward compatibility, you can be confident that your new release does everything the old release did, and then some.

Good tests also give small examples of what you meant in your documentation, in case your writing isn’t clear.[94] Good tests also give confidence to the installer that this code is portable enough to work on both your system and his system, including all stated and unstated dependencies.

Testing is an art. Dozens of how-to-test books have been written and read, and often ignored. Mostly, it’s important to remember everything you have ever done wrong while programming (or heard other people do), and then test that you didn’t do it again for this project.

Test things that should break (throw exceptions or return false values) as well as things that should work. Test the edges. Test the middle. Test one more or one less than the edge. Test things one at a time. Test many things at once. If something should throw an exception, make sure it didn’t also negatively affect the state of the world before it threw the exception. Pass extra parameters. Pass insufficient parameters. Mess up the capitalization on named parameters. Throw far too much data at it. Throw far too little. Test what happens for undef. And so on.

For example, suppose that you want to test Perl’s sqrt function, which calculates square roots. It’s obvious that you need to make sure it returns the right values when its parameter is 0, 1, 49, or 100. It’s nearly as obvious that sqrt(0.25) should come out to be 0.5. You should also ensure that multiplying the value of sqrt(7) by itself gives something between 6.99999 and 7.00001.[95] You should make sure that sqrt(-1) yields a fatal error and that sqrt(-100) does too. See what happens when you call sqrt(&test_sub( )) and &test_sub returns the string "10000". What does sqrt(undef) do? How about sqrt( ) or sqrt(1,1)? Maybe you want to give the function a googol: sqrt( '1' . '0' x 100 ). Because this function is documented to work on $_ by default, you should ensure that it does so. Even a simple function such as sqrt should get a couple of dozen tests; if your code does more complex tasks than sqrt does, expect it to need more tests, too. There are never too many tests.
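
Here’s a minimal sketch of a few of those sqrt checks, written with the Test::More conventions introduced later in this chapter; the particular tolerance and labels are only illustrative choices:

use Test::More "no_plan";

is(sqrt(0), 0, 'sqrt(0) is 0');
is(sqrt(49), 7, 'sqrt(49) is 7');
is(sqrt(0.25), 0.5, 'sqrt(0.25) is 0.5');
cmp_ok(abs(sqrt(7) * sqrt(7) - 7), '<', 0.00001, 'sqrt(7) squared is close to 7');

my $negative = -1;
ok(!eval { sqrt($negative); 1 }, 'sqrt of a negative number is a fatal error');

$_ = 256;
is(sqrt, 16, 'sqrt works on $_ by default');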

If you write the code and not just the tests, think about how to get every line of your code exercised at least once for full code coverage. (Are you testing the else clause? Are you testing every elsif case?) If you aren’t writing the code or aren’t sure, use the code coverage facilities.[96]

Check out other test suites. The Perl distribution itself comes with thousands of tests, designed to verify that Perl compiles correctly on your machine in every possible way. Michael Schwern earned the title of “Perl Test Master” for getting the Perl core completely tested, and he still constantly beats the drum for “test! test! test!” in the community.

In summary, please write tests. Let’s see how this is done.

Tests are usually invoked (either for the developer or the installer) using make test. The Makefile invokes the test harness, which eventually gets around to using the Test::Harness module to run the tests.

Each test lives in a separate .t file in the t directory at the top level of the distribution. Each test is invoked separately, so an exit or die terminates only that test file, not the whole testing process.

The test file communicates with the test harness through simple messages on standard output. The three most important messages are the test count, a success message, and a failure message.

An individual test file consists of one or more tests. These tests are numbered as small integers starting with one. The first thing a test file must announce to the test harness (on STDOUT) is the expected test number range, as a string 1..n. For example, if there are 17 tests, the first line of output should be:

1..17

followed by a newline. The test harness uses the upper number here to verify that the test file hasn’t just terminated early. If the test file is testing optional things and has no testing to do for this particular invocation, the string 1..0 suffices.
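
For example, a test file might bail out cleanly like this when an optional prerequisite is missing (the LWP module here is only a placeholder for whatever your code optionally needs):

unless (eval { require LWP; 1 }) {
  print "1..0\n";   # nothing to test without LWP installed
  exit;
}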

After the header, individual successes and failures are indicated by messages of the form ok N and not ok N. For example, here’s a test of basic arithmetic. First, print the header:

print "1..4\n"; # the header

Now test that 1 plus 2 is 3:

if (1 + 2 == 3) {
  print "ok 1\n"; # first test is OK
} else {
  print "not ok 1\n"; # first test failed
}

You can also print just the "not " prefix when a test fails, and then print the rest of the message unconditionally.[97]

Don’t forget the space!

print "not " unless 2 * 4 == 8;
print "ok 2\n";

You could perhaps test that the results are close enough (important when dealing with floating-point values):

my $divide = 5 / 3;
print "not " if abs($divide - 1.666667) > 0.001; # too much error
print "ok 3\n";

Finally, you may want to deal with potential portability problems:

my $subtract = -3 + 3;
print +(($subtract eq "0" or $subtract eq "-0") ? "ok 4" : "not ok 4"), "\n";

As you can see, there are many styles for writing the tests. In ancient Perl development, you saw many examples of each style. Thanks to Michael Schwern and chromatic and the other Perl Testing Cabal members, you can now write these much more simply, using Test::Simple.

The Test::Simple module is included with the Perl distribution, starting in Perl 5.8.[98]

Test::Simple automates the boring task of writing “ok 1”, “ok 2”, “ok 3”, and so on, in your program. Test::Simple exports one subroutine, called (appropriately) ok. It’s best illustrated by example; you can rewrite the earlier code as:

use Test::Simple tests => 4;

ok(1 + 2 == 3, '1 + 2 == 3');
ok(2 * 4 == 8, '2 * 4 == 8');
my $divide = 5 / 3;
ok(abs($divide - 1.666667) < 0.001, '5 / 3 == (approx) 1.666667');
my $subtract = -3 + 3;
ok(($subtract eq "0" or $subtract eq "-0"), '-3 + 3 == 0');

Ahh. So much simpler. The use not only pulls the module in but also defines the number of tests. This generates the 1..4 header. Each ok test evaluates its first argument. If the argument is true, it prints the proper ok message. If not, it prints the proper not ok message. For this particular example, the output looks like:[99]

1..4
ok 1 - 1 + 2 == 3
ok 2 - 2 * 4 == 8
ok 3 - 5 / 3 == (approx) 1.666667
ok 4 - -3 + 3 == 0

The ok N messages are followed by the labels given as the second parameters. This is great for identifying each test, especially because the numbers 1 through 4 don’t appear in the test source anymore. The test harness ignores this information unless you invoke the tests as make test TEST_VERBOSE=1, in which case the information is displayed for each test.

What if a test fails? If you change the first test to 1 + 2 == 4, you get:

1..4
not ok 1 - 1 + 2 == 4
#     Failed test (1.t at line 4)
ok 2 - 2 * 4 == 8
ok 3 - 5 / 3 == (approx) 1.666667
ok 4 - -3 + 3 == 0
# Looks like you failed 1 tests of 4.

The ok 1 became not ok 1. But also notice the extra message indicating the failed test, including its file and line number. Messages preceded by a pound-sign comment marker are merely comments, and are (mostly) ignored by the test harness.

For many people, Test::Simple is simple enough to use for a wide range of tests. However, as your Perl hackery evolves, you’ll want to step up to the next level of Perl testing hackery as well, with Test::More.

Like Test::Simple, Test::More is included with the Perl distribution starting with Perl 5.8. The Test::More module is upward-compatible with Test::Simple, so you can simply change the module name to start using it. For the example so far, you can use:

use Test::More tests => 4;

ok(1 + 2 == 3, '1 + 2 == 3');
ok(2 * 4 == 8, '2 * 4 == 8');
my $divide = 5 / 3;
ok(abs($divide - 1.666667) < 0.001, '5 / 3 == (approx) 1.666667');
my $subtract = -3 + 3;
ok(($subtract eq "0" or $subtract eq "-0"), '-3 + 3 == 0');

You get nearly the same output you got with Test::Simple, but there’s that nasty little 4 constant in the first line. That’s fine once you ship the code, but while you’re testing, retesting, and adding more tests, it can be a bit painful to keep that number in sync with the tests. You can change it to no_plan,[100] as in:

use Test::More "no_plan";        # during development

ok(1 + 2 == 3, '1 + 2 == 3');
ok(2 * 4 == 8, '2 * 4 == 8');
my $divide = 5 / 3;
ok(abs($divide - 1.666667) < 0.001, '5 / 3 == (approx) 1.666667');
my $subtract = -3 + 3;
ok(($subtract eq "0" or $subtract eq "-0"), '-3 + 3 == 0');

The output is now rearranged:

ok 1 - 1 + 2 == 3
ok 2 - 2 * 4 == 8
ok 3 - 5 / 3 == (approx) 1.666667
ok 4 - -3 + 3 == 0
1..4

Note that the test count now appears at the end. The test harness knows that if it doesn’t see a header, it should expect a footer. If the count disagrees with the number of tests actually run, or there’s no footer (and no header), the result is considered broken. You can use this while developing, but be sure to put the final number of tests in the script before you ship it as real code.

But wait: there’s more (to Test::More). Instead of a simple yes/no, you can ask if two values are the same:

use Test::More "no_plan";

is(1 + 2, 3, '1 + 2 is 3');
is(2 * 4, 8, '2 * 4 is 8');

Note that you’ve gotten rid of numeric equality and instead asked if “this is that.” On a successful test, this doesn’t give much advantage, but on a failed test, you get much more interesting output. The result of this:

use Test::More "no_plan";

is(1 + 2, 3, '1 + 2 is 3');
is(2 * 4, 6, '2 * 4 is 6');

is the interesting:

ok 1 - 1 + 2 is 3
not ok 2 - 2 * 4 is 6
#     Failed test (1.t at line 4)
#          got: '8'
#     expected: '6'
1..2
# Looks like you failed 1 tests of 2.

Of course, this is an error in the test, but note that the output told you what happened: you got an 8 but were expecting a 6.[101] This is far better than just “something went wrong” as before. There’s also a corresponding isnt( ) when you want to compare for inequality rather than equality.
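
For example, here’s a quick sketch showing isnt( ) alongside is( ):

use Test::More "no_plan";

is(2 * 4, 8, '2 * 4 is 8');
isnt(2 * 4, 9, 'and 2 * 4 is not 9');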

What about that third test, where the value had to be less than a tolerance? Well, just use the cmp_ok routine instead:

use Test::More "no_plan";

my $divide = 5 / 3;
cmp_ok(abs($divide - 1.666667), '<' , 0.001,
  '5 / 3 should be (approx) 1.666667');

If the comparison given in the second argument fails when applied to the first and third arguments, you get a descriptive error message showing both values and the comparison, rather than a simple pass/fail as before.

How about that last test? You wanted to see if the result was a 0 or minus 0 (on the rare systems that give back a minus 0). You can do that with the like function:

use Test::More "no_plan";

my $subtract = -3 + 3;
like($subtract, qr/^-?0$/, '-3 + 3 == 0');

Here, you’ll take the string form of the first argument and attempt to match it against the second argument. The second argument is typically a regular expression object (created here with qr) but can also be a simple string, which is converted to a regular expression object. The string form can even be written as if it were (almost) a regular expression:

like($subtract, q/^-?0$/, '-3 + 3 == 0');

The advantage to using the string form is that it is portable back to older Perls.[102]

If the match succeeds, it’s a good test. If not, the original string and the regex are reported along with the test failure. You can change like to unlike if you expect the match to fail instead.
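
For example, here’s a small hypothetical pair exercising both directions against the same string:

use Test::More "no_plan";

like("Trigger", qr/^\w+$/, "the name is a single word");
unlike("Trigger", qr/\d/, "the name contains no digits");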

For object-oriented modules, you might want to ensure that object creation has succeeded. For this, isa_ok and can_ok give good interface tests:

use Test::More "no_plan";

use Horse;
my $trigger = Horse->named("Trigger");
isa_ok($trigger, "Horse");
isa_ok($trigger, "Animal");
can_ok($trigger, $_) for qw(eat color);

This results in:

ok 1 - The object isa Horse
ok 2 - The object isa Animal
ok 3 - Horse->can('eat')
ok 4 - Horse->can('color')
1..4

Here you’re testing that it’s a horse, but also that it’s an animal, and that it can both eat and return a color.[103]

You could further test to ensure that each horse has a unique name:

use Test::More "no_plan";

use Horse;

my $trigger = Horse->named("Trigger");
isa_ok($trigger, "Horse");

my $tv_horse = Horse->named("Mr. Ed");
isa_ok($tv_horse, "Horse");

# Did making a second horse affect the name of the first horse?
is($trigger->name, "Trigger", "Trigger's name is correct");
is($tv_horse->name, "Mr. Ed", "Mr. Ed's name is correct");
is(Horse->name, "a generic Horse");

The output of this is:

ok 1 - The object isa Horse
ok 2 - The object isa Horse
ok 3 - Trigger's name is correct
ok 4 - Mr. Ed's name is correct
not ok 5
#     Failed test (1.t at line 13)
#          got: 'an unnamed Horse'
#     expected: 'a generic Horse'
1..5
# Looks like you failed 1 tests of 5.

Oops! Look at that. You wrote a generic Horse, but the string really is an unnamed Horse. That’s an error in the test, not in the module, so you should correct that test error and retry. Unless, of course, the module’s spec actually called for 'a generic Horse'.

Again, don’t be afraid to just write the tests and test the module. If you get either one wrong, the other will generally catch it.

Even the use can be tested by Test::More:

use Test::More "no_plan";

BEGIN { use_ok("Horse") }

my $trigger = Horse->named("Trigger");
isa_ok($trigger, "Horse");
# .. other tests as before ..

The difference between doing this as a test and doing it as a simple use is that the test won’t completely abort if the use fails, although many other tests are likely to fail as well. It also counts as one of the tests, so you get a “test succeeded” for free just because the module compiles properly, which helps pad your success numbers for the weekly status report.

The use is placed inside a BEGIN block so any exported subroutines are properly declared for the rest of the program, as recommended by the documentation. For most object-oriented modules, this won’t matter because they don’t export subroutines.
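
If the module does export subroutines, use_ok accepts an import list as additional arguments. Here’s a minimal sketch using the core List::Util module purely as an example:

use Test::More "no_plan";

BEGIN { use_ok("List::Util", qw(sum max)) }

is(sum(1, 2, 3), 6, "sum was imported and works");
is(max(4, 7, 2), 7, "max was imported and works");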

If you write tests directly from the specification before you’ve written the code, the tests are expected to fail. You can place some of your tests inside a TODO block so they still count toward the test total but are marked as not yet expected to pass. For example, suppose you haven’t taught your horses how to talk yet:

use Test::More 'no_plan';

use_ok("Horse");
my $tv_horse = Horse->named("Mr. Ed");
TODO: {
  local $TODO = "haven't taught Horses to talk yet";

  can_ok($tv_horse, "talk");  # he can talk!
}
is($tv_horse->name, "Mr. Ed", "I am Mr. Ed!");

Here, the test is inside a TODO block, setting a package $TODO variable with the reason why the items are unfinished:[104]

ok 1 - use Horse;
not ok 2 - Horse->can('talk') # TODO haven't taught Horses to talk yet
#     Failed (TODO) test (1.t at line 7)
#     Horse->can('talk') failed
ok 3 - I am Mr. Ed!
1..3

Note that the TODO test counts toward the total number of tests. Also note that the message about why the test is a TODO test is displayed as a comment. The comment has a special form, noted by the test harness, so you will see it during a make test run.

You can have multiple TODO tests in a given block, but only one reason per block, so it’s best to group related tests together and use separate blocks for separate reasons.
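
For example, continuing with the Mr. Ed example, a sketch with two separate reasons might look like this (the extra method names are hypothetical):

TODO: {
  local $TODO = "haven't taught Horses to talk yet";

  can_ok($tv_horse, "talk");
  can_ok($tv_horse, "whinny");
}

TODO: {
  local $TODO = "snack preferences not implemented yet";

  can_ok($tv_horse, "favorite_snack");
}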

Initially, the h2xs program gives you a single testing file, t/1.t.[105] You can stick all your tests into this file, but it generally makes more sense to break the tests into logical groups.

The easiest way to add additional tests is to create t/2.t. That’s it—just bump the 1 to a 2. You don’t need to change anything in the Makefile.PL or in the test harness: the file is noticed and executed automatically.

You can keep adding files until you get to 9.t, but once you add 10.t, you might notice that it gets executed between 1.t and 2.t. Why? Because the tests are always executed in sorted order. This is a good thing because it lets you ensure that the most fundamental tests are executed before the more exotic tests, simply by controlling the names.

Many people choose to rename the files to reflect a specific ordering and purpose by using names like 01-core.t, 02-basic.t, 03-advanced.t, 04-saving.t, and so on. The first two digits control the testing order, while the rest of the name gives a hint about the general area of testing. Whatever plan you decide to use, stick with it, document it if necessary, and remember that the default order is controlled by the name.

One advantage to using the ok( ) functions (and friends) is that they don’t write to STDOUT directly, but to a filehandle secretly duplicated from STDOUT when your test script begins. If you don’t change STDOUT in your program, of course, this is a moot point. But let’s say you want to test a routine that writes something to STDOUT, such as making sure a horse eats properly:

use Test::More 'no_plan';
use_ok 'Horse';
isa_ok(my $trigger = Horse->named('Trigger'), 'Horse');

open STDOUT, ">test.out" or die;
$trigger->eat("hay");
close STDOUT;

open T, "test.out" or die;
my @contents = <T>;
close T;
is(join("", @contents), "Trigger eats hay.\n", "Trigger ate properly");

END { unlink "test.out" }  # clean up after the horses

Note that just before you start testing the eat method, you (re-)open STDOUT to your temporary output file. The output from this method ends up in the test.out file. Bring the contents of that file in and give it to the is( ) function. Even though you’ve closed STDOUT, the is( ) function can still access the original STDOUT, and thus the test harness sees the proper ok or not ok messages.

If you create temporary files like this, please note that your current directory is the same as the test script (even if you’re running make test from the parent directory). Also pick fairly safe cross-platform names if you want people to be able to use and test your module portably.
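
One way to sidestep both the naming and the cleanup, assuming the core File::Temp module is available on the Perls you care about, is to let it choose (and remove) the file for you. The middle of the earlier test might then look like this:

use File::Temp qw(tempfile);

# let File::Temp pick a safe, unique name; UNLINK => 1 removes it at exit
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh;

open STDOUT, ">$tmp_name" or die;
$trigger->eat("hay");
close STDOUT;

open T, $tmp_name or die;
my @contents = <T>;
close T;
is(join("", @contents), "Trigger eats hay.\n", "Trigger ate properly");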

The answers for all exercises can be found in Section A.12.



[92] If you’re reporting a bug in someone else’s code, you can generally assume that sending them a test for the bug will be appreciated. A patch would be appreciated even more!

[93] And writing the documentation at the same time, made easier by Test::Inline, as you’ll see later.

[94] Many modules we’ve used from the CPAN were documented more by test examples than by the actual POD. Of course, any really good example should be repeated in your module’s POD documentation.

[95] Remember, floating-point numbers aren’t always exact; there’s usually a little roundoff. Feel free to write your tests to require more precision than this test implies but don’t require more precision than you can get on another machine!

[96] Basic code coverage tools such as Devel::Cover are found in the CPAN.

[97] On some platforms, this may fail unnecessarily. For maximum portability, print the entire string of ok N or not ok N in one print step.

[98] Older Perl versions back to 5.004_03 can install the same module from the CPAN.

[99] Don’t be misled when reading the mathematics of the output. The first number and the dash on each ok line are just labels; Perl isn’t telling you that 1 - 1 + 2 == 3!

[100] You can do this with Test::Simple as well.

[101] More precisely: you got an '8' but were expecting a '6'. Did you notice that these are strings? The is test checks for string equality. If you don’t want that, just build an ok test instead. Or try cmp_ok, coming up in a moment.

[102] The qr// form wasn’t introduced until Perl 5.005.

[103] Well, you’re testing to see that it can('eat') and can('color'). You haven’t checked whether it really can use those method calls to do what you want!

[104] TODO tests require Test::Harness version 2.0 or later, which comes with Perl 5.8; for earlier Perl releases, it has to be installed from the CPAN.

[105] As of Perl 5.8, that is. Earlier versions create a test.pl file, which is still run from a test harness during make test, but its output isn’t captured in the same way.