NAME
Test::Builder2::Design - Explaining the design of Test::Builder2
DESCRIPTION
This document is about the design and philosophy of Test::Builder2 for those willing to contribute to the project, design their own advanced testing modules, extend Test::Builder2 and for the curious.
Those not already familiar with Test::Builder may wish to have a gander at it.
There is a glossary of terms at the end of this document.
What is Test::Builder2?
Test::Builder2 is a complete rewrite of Test::Builder with fewer assumptions about how testing is to be done and more extensibility.
So what is Test::Builder? It is the object which backs all the Perl testing modules worth knowing about. It provides the baseline functionality of testing, coordinates testing modules and formats the results as TAP. It is what allows you to use multiple Test:: modules, written independently by completely different authors, together in the same test script without them stepping on each other.
Fundamentally, TB2 coordinates these things:
- Record when a stream starts and ends
  This allows setting of the plan, if any, and maybe printing out headers.
- Record results
  Handing them to a TB2::History object.
- Format results
  Handing them to a TB2::Formatter object.
- Stream (output) results
  The TB2::Formatter hands them to a TB2::Streamer.
You hand TB2 the result of an assert, and it decides when and how to format and output it. That's about it (rjbs gets credit for bringing this revelation of simplicity).
There's the additional meta-responsibility of being the central point to coordinate extensions and provide hooks into testing events.
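The flow above can be sketched in a few lines of plain Perl. This is only an illustration of the record, format, stream hand-off, not the TB2 implementation: every Sketch::Pipeline:: package and all of its method names (emit, begin, result, end, add) are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for TB2::Streamer: decides where output goes.
package Sketch::Pipeline::Streamer;
sub new  { my $class = shift; return bless { out => [] }, $class }
sub emit { my($self, $text) = @_; push @{ $self->{out} }, $text; print $text; return }

# Hypothetical stand-in for TB2::Formatter: turns results into TAP-like text.
package Sketch::Pipeline::Formatter;
sub new {
    my($class, %args) = @_;
    return bless { streamer => $args{streamer}, count => 0 }, $class;
}
sub begin { my $self = shift; $self->{streamer}->emit("# stream begins\n"); return }
sub result {
    my($self, $result) = @_;
    my $status = $result->{pass} ? "ok" : "not ok";
    $self->{streamer}->emit(
        sprintf "%s %d - %s\n", $status, ++$self->{count}, $result->{name}
    );
    return;
}
sub end { my $self = shift; $self->{streamer}->emit("1..$self->{count}\n"); return }

# Hypothetical stand-in for TB2::History: remembers every result.
package Sketch::Pipeline::History;
sub new     { my $class = shift; return bless { results => [] }, $class }
sub add     { my($self, $result) = @_; push @{ $self->{results} }, $result; return }
sub results { my $self = shift; return @{ $self->{results} } }

# The builder only coordinates: record the result, then hand it on.
package Sketch::Pipeline::Builder;
sub new {
    my $class    = shift;
    my $streamer = Sketch::Pipeline::Streamer->new;
    return bless {
        history   => Sketch::Pipeline::History->new,
        formatter => Sketch::Pipeline::Formatter->new( streamer => $streamer ),
        streamer  => $streamer,
    }, $class;
}
sub stream_start { my $self = shift; $self->{formatter}->begin; return }
sub stream_end   { my $self = shift; $self->{formatter}->end;   return }
sub ok {
    my($self, $pass, $name) = @_;
    my $result = { pass => $pass ? 1 : 0, name => $name };
    $self->{history}->add($result);         # record
    $self->{formatter}->result($result);    # format, which then streams
    return $result;
}

package main;

my $builder = Sketch::Pipeline::Builder->new;
$builder->stream_start;
$builder->ok( 1 + 1 == 2, "addition works" );
$builder->ok( "a" eq "b", "strings match" );
$builder->stream_end;
```

Swapping the formatter or the streamer changes the output format or destination without touching the code that produces results, which is the point of keeping the three jobs separate.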
Design Goals
To understand Test::Builder2 you must understand its design goals.
Test::Builder2 takes a very long and broad view in its design. As such, many things which might seem overcomplicated at first glance are actually to cover an obscure, but important, condition.
Universal
If TB2 were a person, "thou shalt be universally applicable" would be tattooed upside down on its belly in flaming, writhing letters to remind it of its responsibility to all Perl users everywhere.
This is the goal which has the farthest reaching consequences. Every testing module worth knowing about is backed by TB1 and will be backed by TB2. As a result, TB2 has to work, it has to work everywhere Perl does and it has to be able to test every conceivable thing.
Every assumption TB2 makes about its environment, how testing is done or what is being tested eliminates that category from being easily tested by Perl programmers. It knocks the entire modern suite of Perl testing tools from their hands.
While TB2 has to be universal, extensions do not. TB2 will reject many features because they are not universally applicable, but extensions and test libraries built on top of it will not.
Portable
TB2 has to work in every environment in which Perl does. If it doesn't, TB2 de facto eliminates that environment from serious use by Perl. It can make no assumptions about the environment which are not universal. There are no "it'll work on most systems" compromises.
As far as Perl goes, TB2 supports back to 5.8.1. 5.8.0's threading is too unstable. 5.6 is missing too many features (the biggest is scalar reference filehandles, critical for debugging) and is all but extinct in the wild. Even Debian oldstable ships 5.8.8.
Reliable
There has to be total confidence in the test libraries, that a test failure indicates a failed test and not a bug in TB2. There are no "this works 99% of the time" implementations.
Extensible
Users must be able to write test libraries on top of TB2 that do pretty much anything.
There are three primary means of doing this. The most common will be writing a test module using Test::Builder2::Module. Users will write logic to test their statement, and then hand the result off to a TB2 object for processing.
The second is by adding roles to TB2. For those not familiar with roles, they're kind of like object plugins... sort of.
The third is by hooking into events, like when an assert starts or finishes.
See "Extending TB2" for details.
Multiple Formats
TB1 outputs only TAP. TB2 will output TAP by default, but can be extended to output any format desired. As such, its internal structures have to be free of assumptions. For example, TAP is one of the few testing systems which requires a test count and TB1 is riddled with that assumption.
See Test::Builder2::Formatter for details.
Multiple Streams
TB1 has limited support for outputting somewhere other than STDOUT and STDERR. TB2 will have full control over where and how output occurs.
See Test::Builder2::Streamer for details.
Minimal Assumptions
TB1 and Test::More avoid making assumptions about how and what you're testing, sticking to functions which provide unambiguous functionality and are universally applicable. That is, you're not going to find code that tests XML.
TB2 pushes this further, in ways already mentioned, and by stripping itself down at the core to just storing, formatting and streaming results. It does not provide anything but the equivalent of ok(). Additional functionality will be available in various roles which ship with TB2. Most library authors will use those roles by default, but radical extensions may use the stripped down TB2.
Two drivers for this are Fennec and the need for Test::Builder itself to use TB2. The less TB2 does the weirder test library authors can get.
No Dependencies
TB2 can have no external dependencies (except Perl itself). Dependencies may create circular dependencies, but more importantly they introduce unreliability into TB2.
Easy
It has to be very easy to write a basic testing module using TB2. The underlying complexity must be hidden from the casual test module author.
Extending TB2
Writing Test Libraries
The simplest way to extend TB2 will be to write a test library. Test::Builder2::Module provides the conveniences to the test author.
Writing TB2 Roles
For internal use, roles can be applied to the TB2 singleton to expand its functionality while leaving the TB2 class slim. Roles include TB2::Assert::More which will provide most of the Test::More functionality currently in TB1.
TB2 Events
The primary way test libraries will alter the behavior of TB2 is by hooking into test events, such as an assert ending or the plan being set. Rather than design an event callback system, method modifiers (provided by Mouse) will be used. Test libraries can simply put modifiers on public TB2 methods and TB2 methods will be decomposed to correspond with test events. This allows event modifiers and callbacks to stack in a sane way. Not having an explicit callback system allows test authors to extend TB2 in ways not anticipated. Finally, TB2 is not gummed up by checking for callbacks all over the place.
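Here is a rough sketch of how stacked method wrappers behave. To stay free of dependencies it does not use Mouse; wrap_method() is a hypothetical, hand-rolled stand-in for an "around" method modifier, and Sketch::Events with its assert_end method is invented for the example.

```perl
use strict;
use warnings;

# Invented builder class with one public method standing in for a test event.
package Sketch::Events;
sub new { my $class = shift; return bless { log => [] }, $class }
sub assert_end {
    my($self, $result) = @_;
    push @{ $self->{log} }, "assert_end: $result->{name}";
    return $result;
}

package main;

# Hypothetical stand-in for an "around" modifier: replace the method with
# a closure which receives the previous implementation as its first argument.
sub wrap_method {
    my($class, $method, $wrapper) = @_;
    my $orig = $class->can($method) or die "$class has no method $method";
    no strict 'refs';
    no warnings 'redefine';
    *{"${class}::${method}"} = sub { return $wrapper->( $orig, @_ ) };
    return;
}

# Two unrelated "test libraries" hook the same event.  Neither knows
# about the other, yet the wrappers stack.
wrap_method( 'Sketch::Events', 'assert_end', sub {
    my($orig, $self, $result) = @_;
    push @{ $self->{log} }, "library A saw $result->{name}";
    return $self->$orig($result);
} );
wrap_method( 'Sketch::Events', 'assert_end', sub {
    my($orig, $self, $result) = @_;
    my $ret = $self->$orig($result);
    push @{ $self->{log} }, "library B: " . ( $result->{pass} ? "passed" : "FAILED" );
    return $ret;
} );

my $tb = Sketch::Events->new;
$tb->assert_end( { name => "first assert", pass => 0 } );
print "$_\n" for @{ $tb->{log} };
```

The last wrapper installed runs outermost, so library B's code after the original call is the last thing to happen. A die-on-fail extension would want exactly that position.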
Use Cases
Diagnostics contain the file and line number where the user called the assert
When an assert fails, it should report its file and line number so the user can easily find the failing assert. This is one of the trickiest aspects of TB1 and TB2. It has its own section, "File and Line Information".
Test::NoWarnings
Must be able to add code at the start of a test suite (to capture warnings) and add an assertion at the end. It also must be able to add one to a user-set plan.
Test::Warn
This is the case where an assert called inside another assert is not part of the same stack.
See "File and Line Information" for details.
Testing a test
You should be able to easily write tests for test modules without having to hard code formatted results.
Test::Builder
Test::Builder and all derived modules should continue to work and be compatible with TB2.
Fennec
Fennec should be able to use TB2 for all its functionality.
Die on fail
It should be possible to have an extension which causes the test suite to halt upon the first failed assert and still receive all the diagnostics of that assert.
Debug on fail
It should be possible to have an extension which starts the debugger when an assert fails.
Action on failed (or passed) test stream
It should be possible to perform an action when a test stream completes in certain states. For example, beep on completion or send an email on failure.
Stacked Asserts
TB2 lets asserts build on other asserts by just calling them. For example, here is a simple implementation of is().
    install_test is => sub {
        my($have, $want, $name) = @_;
        my $result = ok( $have eq $want, $name );
        $result->diagnostics(
            have => $have,
            want => $want
        );
        return $result;
    };
is() uses ok() to do most of the work. Then it adds its diagnostics to the result (an overloaded object in TB2) and passes it along.
To accomplish this, TB2 records the stack of asserts being called and who called them. This turns out to be one of the more involved parts of TB2, but it's necessary for some critical features.
File and Line Information
One of the friendliest and trickiest features of TB1 is correctly reporting the file and line where an assert failed. This is tricky because asserts call asserts which call asserts which ultimately call ok() which does the actual passing or failing. It needs to know where the user originally called the assert, the "top" of the assert stack.
This is similar in functionality to Carp, but it has to be far more robust. It cannot guess based on crossing package boundaries, it has to know. TB1 accomplished this by keeping track of how far down the stack you are at any moment with $Level. This results in complicated accounting that's often not quite right, and it bubbles up to the user's level, who must remember to localize and increment $Level.
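For reference, this is what the $Level accounting looks like in a TB1-based test script today. It uses only Test::More and the documented $Test::Builder::Level variable; is_even() is a made-up assert for the example.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# A user-written wrapper around ok().  Without the $Level adjustment a
# failure would be reported at the ok() call inside is_even(), not at
# the line where the test script called is_even().
sub is_even {
    my($number, $name) = @_;

    # Tell Test::Builder there is one more stack frame between the
    # user's code and ok().  Every wrapper has to remember to do this.
    local $Test::Builder::Level = $Test::Builder::Level + 1;

    return ok( $number % 2 == 0, $name );
}

is_even( 4,  "4 is even" );
is_even( 10, "10 is even" );
```

Forget the local line, or wrap is_even() in yet another function without adding it again, and failures point at the wrong place.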
You must be able to trivially wrap an assert in another assert and still have the file and line number come out at the outermost assert, which is presumably the one the user wrote. This will allow users to quickly and trivially compose new domain-specific asserts without having to know about TB2.
For example, here is how is() would ideally be implemented by a user (ignoring diagnostics for the moment):
    sub is {
        my($have, $want, $name) = @_;
        return ok $have eq $want, $name;
    }
When is() fails, you want the diagnostics to contain the file and line number of the call to is(), not the call to ok() inside is().
Assert end actions
This goes beyond just file and line numbers. It also allows actions to happen when an assert fully completes, such as "die-on-fail". To expand on the is() example above:
    sub is {
        my($have, $want, $name) = @_;
        my $ok = ok $have eq $want, $name;
        $ok->diagnostics(
            have => $have,
            want => $want,
        );
        return $ok;
    }
is() takes the result from ok() (now an object) and adds its own diagnostics. If ok() were to fail, and die-on-fail is active, TB2 must know to wait until is() has had a chance to add its diagnostics and print the result before failing. Otherwise you don't get full diagnostics about the failure. TB2 must know that is() was called by the user and ok() was not.
Result output
Finally, TB2 must wait until the entire assert stack has had an opportunity to add diagnostics to the result before it can print the result. Why? TB1, only doing TAP, is fortunate in that it's a stream. It can print a result as soon as it gets it, and then append diagnostics onto the end. But this isn't true of other output formats. XML, for example, requires an opening and closing tag.
Declaring Asserts
We cannot assume that asserts are exported functions. Or that every function in a Test:: library is an assert. TB2 takes the approach of having a test library declare that a function is an assert. This is done with the least fanfare possible:
    package My::Test::Module;
    use Test::Builder2::Module;
    install_test is => sub {
        my($have, $want, $name) = @_;
        return ok $have eq $want, $name;
    };
install_test is exported by TB2::Module. It wraps the user's function in a little shim that triggers assert_start and assert_end events. It also records this fact in an assert stack: assert_start pushes onto the stack and assert_end pops it. The assert stack tracks where each assert was called. If an assert fails anywhere in the stack it can get the file and line number information from the top of the stack.
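A minimal sketch of such a shim, assuming nothing about the real TB2 internals: install_assert() is a hypothetical stand-in for install_test, results are plain hashes rather than objects, and the stack is a simple array. It shows both halves of the idea, recording the caller on assert_start and holding the result back until the outermost assert ends.

```perl
use strict;
use warnings;

my @assert_stack;   # one frame per assert currently running
my @output;         # what would be handed to the formatter

# Hypothetical stand-in for install_test: install $code as a function
# wrapped in a shim which brackets it with assert_start and assert_end.
sub install_assert {
    my($name, $code) = @_;
    no strict 'refs';
    *{"main::$name"} = sub {
        # assert_start: remember where this assert was called from
        my(undef, $file, $line) = caller;
        push @assert_stack, { name => $name, file => $file, line => $line };

        my $result = $code->(@_);

        # assert_end: only the outermost assert releases the result, so
        # every assert in the stack has had its chance to add diagnostics
        pop @assert_stack;
        if( !@assert_stack ) {
            push @output, sprintf "%s - %s at line %d",
                ( $result->{pass} ? "ok" : "not ok" ),
                $result->{name}, $result->{line};
        }
        return $result;
    };
    return;
}

install_assert ok => sub {
    my($pass, $name) = @_;
    my $user_call = $assert_stack[0];   # the outermost assert, the user's call
    return {
        pass        => $pass ? 1 : 0,
        name        => $name,
        line        => $user_call->{line},
        diagnostics => {},
    };
};

install_assert is => sub {
    my($have, $want, $name) = @_;
    my $result = ok( $have eq $want, $name );
    $result->{diagnostics} = { have => $have, want => $want };
    return $result;
};

my $call_line = __LINE__ + 1;
my $result    = is( "foo", "bar", "strings match" );
print "$_\n" for @output;
```

The failure is reported at the line of the is() call even though ok() did the failing, and the author of is() never touched a $Level variable. A die-on-fail action would hang off the same "stack is now empty" check.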
Multiple Stacks
Currently there is only one stack, see top_stack in TB2. This must be developed into multiple stacks to handle some use cases, the most important being Test::Warn and Test::Exception.
For example:
    #line 1
    warnings_is {
        is( $foo, $bar );
    } "something";
If the is() called inside warnings_is() fails, it should report diagnostics from line 2, not line 1 where warnings_is() is called. In addition, is() failing should not result in warnings_is() failing.
There is no way for TB2 to infer this special case; it must be declared by the author of warnings_is(). warnings_is() must tell TB2 that it should save its current assert stack and start a new one.
    use Test::Builder2::Module;
    install_test warnings_is => sub (&$) {
        my($code, $warning) = @_;

        # ...set up capturing warnings however...

        # Declare a new assert stack
        $Builder->start_assert_stack;

        # Run the code with that new stack
        $code->();

        # End the assert stack and go back to the old one
        $Builder->end_assert_stack;

        # Run an assert to check the warnings using the original stack
        return is( $captured_warnings, $warning );
    };
In effect, there is a stack of stacks maintained inside TB2. $Builder is provided by TB2::Module as a shortcut for $class->builder.
This will also allow tests to work in cooperative multitasking situations such as POE. One stack of asserts may start running only to yield control to another stack. The details are beyond the scope of this design, only that it is made possible.
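The stack of stacks can be modelled like this. Sketch::Stacks is invented for the example. Its start_assert_stack and end_assert_stack borrow the method names used above, but the bodies, and the helper methods current_stack, assert_start, assert_end and report_line, are assumptions rather than the TB2 implementation.

```perl
use strict;
use warnings;

package Sketch::Stacks;

# Start with a single, empty assert stack.
sub new { my $class = shift; return bless { stacks => [ [] ] }, $class }

# The current assert stack is always the last one in the list.
sub current_stack { my $self = shift; return $self->{stacks}[-1] }

sub start_assert_stack {
    my $self = shift;
    push @{ $self->{stacks} }, [];
    return;
}

sub end_assert_stack {
    my $self = shift;
    die "cannot end the only assert stack\n" if @{ $self->{stacks} } == 1;
    pop @{ $self->{stacks} };
    return;
}

sub assert_start {
    my($self, $name, $line) = @_;
    push @{ $self->current_stack }, { name => $name, line => $line };
    return;
}

sub assert_end { my $self = shift; return pop @{ $self->current_stack } }

# Failures are reported at the outermost assert of the *current* stack.
sub report_line { my $self = shift; return $self->current_stack->[0]{line} }

package main;

my $builder = Sketch::Stacks->new;

# Replay the warnings_is example: warnings_is is called on line 1 and
# the is() inside its block on line 2.
$builder->assert_start( warnings_is => 1 );
$builder->start_assert_stack;          # the block gets its own stack

$builder->assert_start( is => 2 );
my $inner_line = $builder->report_line;
$builder->assert_end;

$builder->end_assert_stack;            # back to the stack of warnings_is
my $outer_line = $builder->report_line;
$builder->assert_end;

print "inner is() reports line $inner_line, warnings_is() reports line $outer_line\n";
```

Because the inner is() runs on a fresh stack, it is the outermost assert there and reports its own line. Once that stack is ended, warnings_is() is back at the bottom of the original stack and reports line 1.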
Multiple asserts inside an assert
There is a final case to consider:
    install_test file_contents_ok => sub {
        my($file, $want) = @_;
        ok( open(my $fh, "<", $file), "open $file" );
        my $content = join "", <$fh>;
        ok( close $fh, "close $file" );
        return is $content, $want, "contents of $file";
    };
This assert has multiple asserts inside it, but the final one is the important one. In this case, TB2 must display them all as if they came from the point where file_contents_ok() was called. Also make sure that the results of the two ok()s get output. And do it all in the right order. This is an open problem.
Mouse
TB2 uses Mouse, which is the Moose interface without dependencies and sluggishness. It takes a great risk in relying on a complicated module. The decision was made on the speculation that if TB2 used a real object system that might allow the design to go in interesting directions not otherwise easily available.
Two user-visible design features have come out of this. First is that TB2's event system, rather than having explicit event callbacks, is modelled by simply wrapping public TB2 methods using Mouse method wrappers. This greatly simplifies designing and implementing test events and provides a more flexible system since users can safely wrap any public method rather than waiting for TB2 to add a hook. Hook points fall naturally out of decomposing the steps of handling results.
The second is the ability to compose TB2 with roles. Rather than adding functionality by subclassing, it can be added with roles. Subclassing is untenable in the long run. Test::A will want to use its TB2 subclass while Test::B will want to use its own TB2 subclass. They can't both have the singleton. Rather than come up with increasingly complicated ways to reconcile this, users can add functionality to TB2 with roles applied to the TB2 singleton. Roles add, rather than modify, functionality. Collisions will only occur in method names and will be very clear, about the same level of risk as exporting a function. Modification of existing functionality comes from the method modifiers outlined above.
Roles also let TB2 shed large amounts of TB1 functionality, leaving them to roles. For example, most of the helper assert methods in TB1 like is_eq and like are not present in core TB2, making it much simpler. They will be in something like TB2::More::Asserts which will probably be composed in by default.
Roles and wrapping methods allow TB2 to remain a singleton while being extensible by multiple authors without explicit coordination.
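To make the "add, rather than modify" point concrete, here is a dependency-free sketch. It does not use Mouse roles; apply_role() is a hypothetical helper which imitates just one aspect of role application, namely adding methods to the singleton's class and refusing loudly on a name collision. Sketch::Core and its methods are likewise invented.

```perl
use strict;
use warnings;

# Invented, stripped-down core with nothing but an ok() equivalent.
package Sketch::Core;
my $singleton;
sub singleton {
    my $class = shift;
    $singleton ||= bless {}, $class;
    return $singleton;
}
sub ok {
    my($self, $pass, $name) = @_;
    return { pass => $pass ? 1 : 0, name => $name };
}

package main;

# Hypothetical helper imitating role application: copy each method into
# the class, refusing to overwrite a method which already exists.
sub apply_role {
    my($class, %methods) = @_;
    for my $name (sort keys %methods) {
        die "method collision: $name\n" if $class->can($name);
        no strict 'refs';
        *{"${class}::${name}"} = $methods{$name};
    }
    return;
}

# One test library adds is_eq() on top of the core ok().
apply_role( 'Sketch::Core',
    is_eq => sub {
        my($self, $have, $want, $name) = @_;
        return $self->ok( $have eq $want, $name );
    },
);

my $tb     = Sketch::Core->singleton;
my $result = $tb->is_eq( "foo", "foo", "strings match" );

# A second library trying to add a method of the same name is told so
# immediately, instead of silently changing behavior for everyone.
my $collided = !eval {
    apply_role( 'Sketch::Core', is_eq => sub { } );
    1;
};
print "is_eq passed: $result->{pass}, collision detected: $collided\n";
```

Both libraries share the one singleton. The only way they can interfere is by picking the same method name, and that fails at load time where it is easy to see.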
Mouse Risk Mitigation
TB2 can have no dependencies, so it ships with its own copy of Mouse.
Mouse is a large, complicated system and TB2 has already hit bugs. It has also had breakages from one version of Mouse to the next. In order to avoid this, TB2 will ONLY use its shipped copy of Mouse. That is, TB2 will ship with a copy of Mouse matched to that particular release.
Finally, to avoid stepping on the installed copy of Mouse, TB2 will ship its version of Mouse as TB2::Mouse with all internal packages similarly changed. This will avoid colliding with Mouse both on disk and in memory.
Mouse In The Core?
If TB2 ships with Mouse, and TB2 ships in the core of Perl... does that mean core Perl will ship Mouse? No, it is not required that Perl ship Mouse. TB2 will require that core Perl ships TB2::Mouse, but as such it is an internal module of TB2 and should not be used by the public. Making Mouse publicly available in the core is a separate issue and would, in fact, require a separate copy of Mouse anyway.
Glossary
TB1
Refers to Test::Builder.
TB2
For brevity's sake, Test::Builder2 will be referred to as TB2. Similarly, sub-modules will be referred to as TB2::Foo even though it is really the long form Test::Builder2::Foo.
builder
Refers to the TB1 or TB2 object central to Perl's testing system. It gathers results, coordinates output, and handles the details of writing test functions so test module authors don't have to.
assert
An assertion is a single statement which is tested. It corresponds to a traditional ok function call.
result
The information from a single assert. Includes things like if it passed or failed, if it had any directives, its file and line number, and any additional user specified diagnostics.
directive
A flag which modifies the result in some way. For example, skip says that the test was skipped. todo means the test was expected to fail.
diagnostics
Additional structured information about a test result. For example, what file and line number it occurred on. Users can add their own diagnostics to a test result.
Test::Builder's "diagnostics" (output by diag()) are actually comments which were used to provide diagnostics. These were unparsable.
comment
A piece of information in the stream which may be parsed but is ignored and has no effect on the result of the stream.
note
A comment which is not normally shown to the user put there for debugging and informational purposes.
stream
The output from a single test unit. In a traditional Perl system this is the output from a single .t file.
suite
The complete set of tests run by a project. In a traditional Perl system this is all the .t files.
formatter
What takes the abstract result and turns it into parsable output. For example, Test::Builder2::Formatter::TAP turns test results into the familiar TAP.
See Test::Builder2::Formatter for details.
streamer
What takes the formatted stream and outputs it, usually to STDOUT and STDERR but it may capture it for debugging purposes, or send it as an email, or write it to a file, or all of that.
A formatter contains a streamer.
See Test::Builder2::Streamer for details.
TAP
Test Anything Protocol, the name for the usual ok 1 output you see from most Perl tests.
test
An ambiguous term often used to mean an assert, a stream or a suite. We'll avoid using it without qualification.