NAME

TAP::DOM - TAP as Document Object Model.

SYNOPSIS

# Create a DOM from TAP
use TAP::DOM;
use Data::Dumper;
my $tapdom = TAP::DOM->new( tap => $tap ); # same options as TAP::Parser
print Dumper($tapdom);

# Recreate TAP from DOM
my $tap2 = $tapdom->to_tap;

DESCRIPTION

The purpose of this module is

A) to define a reliable data structure (a DOM)
B) to create a DOM from TAP
C) to recreate TAP from a DOM

That is useful when you want to analyze the TAP in detail with "data exploration tools", like Data::DPath.

"Reliable" means that this structure is kind of an API that will not change, so your data tools can, well, rely on it.

METHODS

new

Constructor which immediately triggers parsing the TAP via TAP::Parser and returns a big data structure containing the extracted results.

Synopsis

my $tap;
{
  local $/; # slurp mode: read the whole file at once
  open (my $fh, '<', 't/some_tap.txt') or die "Cannot open t/some_tap.txt: $!";
  $tap = <$fh>;
  close $fh;
}
my $tapdata = TAP::DOM->new (
  tap                                  => $tap,
  disable_global_kv_data               => 1,
  put_dangling_kv_data_under_lazy_plan => 1,
  ignorelines                          => '(## |# Test-mymeta_)',
  dontignorelines                      => '# Test-mymeta_(tool1|tool2)_',
  preprocess_ignorelines               => 1,
  preprocess_tap                       => 1,
  usebitsets                           => 0,
  ignore                               => ['as_string'], # keep 'raw' which is the unmodified variant
  document_data_prefix                 => '(MyApp|Test)-',
  lowercase_fieldnames                 => 1,
  trim_fieldvalues                     => 1,
);

Arguments

ignore

Arrayref of field names to omit from the generated TAP::DOM. For example you can skip the as_string field which is often a redundant variant of raw.

ignorelines

A regular expression describing lines to ignore.

Be careful to not screw up semantically relevant lines, like indented YAML data.

The regex is internally prepended with a start-of-line ^ anchor.

dontignorelines (EXPERIMENTAL!)

This is the whitelist of lines that are not skipped when using the ignorelines blacklist.

The dontignorelines feature is HIGHLY EXPERIMENTAL, in particular in combination with preprocess_ignorelines.

Background: the preprocessing is done in a single regex operation for speed reasons, and to do that the dontignorelines regex is turned into a zero-width negative-lookahead condition and prepended before the ignorelines condition into a combined regex.

Without preprocess_ignorelines it is a relatively harmless additional condition during TAP line processing.

Survival tips:

  • have unit tests for your setup

  • do not use ^ anchors in either ignorelines or dontignorelines but rely on the implicitly prepended anchors.

  • write both ignorelines and dontignorelines so that they describe the line completely from its beginning (yet without the ^ anchor).

  • alternatively, do not use it at all but define ignorelines with your own zero-width negative look-around conditions

  • know the zero-width negative look-around conditions of the Perl version you use
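
The background described above can be illustrated with a small self-contained sketch. It does not use TAP::DOM; the variable names and the exact shape of the combined regex are assumptions made for illustration, not the module's actual implementation.

```perl
use strict;
use warnings;

# Hypothetical patterns, written without ^ anchors as recommended above.
my $ignorelines     = '(## |# Test-mymeta_)';
my $dontignorelines = '# Test-mymeta_(tool1|tool2)_';

# Assumed combination: anchor at line start, reject whitelisted lines via a
# zero-width negative lookahead, then match the blacklist and the rest of
# the line including its newline.
my $combined = qr/^(?!$dontignorelines)(?:$ignorelines).*\n?/m;

my $tap = join '', map { "$_\n" } (
    '1..1',
    '## some log line',
    '# Test-mymeta_tool1_version: 1.0',
    '# Test-mymeta_other: x',
    'ok 1 - foo',
);

# Remove all ignored lines in one global substitution.
(my $filtered = $tap) =~ s/$combined//g;
print $filtered;
```

In this sketch the "## " line and the "# Test-mymeta_other" line are dropped, while the whitelisted "# Test-mymeta_tool1_" line survives. A unit test of this kind for your own pattern pair is cheap insurance.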

usebitsets

Instead of having a lot of long boolean fields like

has_skip => 1
has_todo => 0

you can encode all of them into a compact bitset

is_has => $SOME_NUMERIC_REPRESENTATION

This field must be evaluated later with bit-comparison operators.

Originally meant as a memory-saving mechanism, it turned out not to be worth the hassle.

disable_global_kv_data

Early TAP::DOM versions put all lines like

# Test-foo: bar

into a global hash. Later versions place these fields as children under their parent ok/not ok line but still keep them globally for backwards compatibility. With this flag you can drop the redundant global hash.

But see also put_dangling_kv_data_under_lazy_plan.

put_dangling_kv_data_under_lazy_plan

This defines what to do when a key/value field from a line

# Test-foo: bar

appears without a parent ok/not ok line and the global kv_data hash is disabled. When this option is set it's placed under the plan as parent.

document_data_prefix

To interpret lines like

# Test-foo: bar

the document_data_prefix is by default set to Test- so that a key/value field

foo => 'bar'

is generated. However, you can provide a regular expression to allow a different prefix or several alternative prefixes.
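
A minimal sketch of how such a prefix regex can be applied to a comment line. The helper extract_kv is hypothetical and not part of TAP::DOM; the actual parsing in the module may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper: extract a key/value pair from a comment line, given a
# prefix regex in the style of the document_data_prefix option.
# Named captures are used so that groups inside the prefix regex do not
# shift the positions of the key and value captures.
sub extract_kv {
    my ($line, $prefix) = @_;
    return unless $line =~ /^#\s*(?:$prefix)(?<key>[^:]+):\s*(?<value>.*)$/;
    return ($+{key}, $+{value});
}

my @default = extract_kv('# Test-foo: bar',      'Test-');
my @custom  = extract_kv('# MyApp-version: 1.2', '(MyApp|Test)-');
my @nomatch = extract_kv('# Other-foo: bar',     '(MyApp|Test)-');

print "@default\n";
print "@custom\n";
print scalar(@nomatch), " fields from the non-matching line\n";
```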

document_data_ignore

This is another regex-based way to avoid generating particular fields. This regex is matched against the already extracted keys, and stops processing of this field for document_data and kv_data.

lowercase_fieldnames

If set to a true value all recognized fields are lowercased.

lowercase_fieldvalues

If set to a true value all recognized values are lowercased.

trim_fieldvalues

If set to a true value all field values are trimmed of trailing whitespace. Note that fields don't have leading whitespace as it's already consumed away after the fieldname separator colon :.
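
The effect of the three normalization options can be sketched as follows. The function normalize_kv is a hypothetical stand-in written for this illustration; only the option names are taken from the text above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of TAP::DOM: apply the normalization
# options to one already extracted key/value pair.
sub normalize_kv {
    my ($key, $value, %opt) = @_;
    $key   = lc $key    if $opt{lowercase_fieldnames};
    $value = lc $value  if $opt{lowercase_fieldvalues};
    $value =~ s/\s+\z// if $opt{trim_fieldvalues};   # trailing whitespace only
    return ($key, $value);
}

my ($key, $value) = normalize_kv(
    'Strange-Key', 'Some Value   ',
    lowercase_fieldnames => 1,
    trim_fieldvalues     => 1,
);
print "'$key' => '$value'\n";
```

Here the key is lowercased and the value loses its trailing spaces but keeps its casing, because lowercase_fieldvalues was not requested.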

All other provided parameters are passed through to TAP::Parser, see sections "HOW TO STRIP DETAILS" and "USING BITSETS". Usually the options are just one of those:

tap => $some_tap_string

or

source => $test_file

But there are more, see TAP::Parser.

to_tap

Called on a TAP::DOM object it returns a string that is TAP.

STRUCTURE

The data structure is basically a nested hash/array structure with keys named after the functions of TAP::Parser that you normally would use to extract results.

See the TAP example file in t/some_tap.txt and its corresponding result structure in t/some_tap.dom.

Here is a slightly commented and beautified excerpt of t/some_tap.dom. Because it was manually cleaned up for readability it might contain errors, so for final reference, dump a DOM yourself.

bless( {
 # general TAP stats:
 'version'       => 13,
 'plan'          => '1..6',
 'tests_planned' => 6,
 'tests_run'     => 8,
 'is_good_plan'  => 0,
 'has_problems'  => 2,
 'skip_all'      => undef,
 'parse_errors'  => 1,
 'parse_errors_msgs'  => [
                     'Bad plan.  You planned 6 tests but ran 8.'
                    ],
 'pragmas'       => [
                     'strict'
                    ],
 'exit'          => 0,
 'start_time'    => '1236463400.25151',
 'end_time'      => '1236463400.25468',
 # the used TAP::DOM specific options to TAP::DOM->new():
 'tapdom_config' => {
                     'ignorelines' => qr/(?-xism:^## )/,
                     'usebitsets' => undef,
                     'ignore' => {}
                    },
 # summary according to TAP::Parser::Aggregator:
 'summary' => {
                'status'          => 'FAIL',
                'total'           => 8,
                'passed'          => 6,
                'failed'          => 2,
                'all_passed'      => 0,
                'skipped'         => 1,
                'todo'            => 4,
                'todo_passed'     => 2,
                'parse_errors'    => 1,
                'has_errors'      => 1,
                'has_problems'    => 1,
                'exit'            => 0,
                'wait'            => 0,
                'elapsed'         => bless( [
                                             0,
                                             '0',
                                             0,
                                             0,
                                             0,
                                             0
                                            ], 'Benchmark' ),
                'elapsed_timestr' => ' 0 wallclock secs ( 0.00 usr +  0.00 sys =  0.00 CPU)',
              },
 # all recognized TAP lines:
 'lines' => [
             {
              'is_actual_ok' => 0,
              'is_bailout'   => 0,
              'is_comment'   => 0,
              'is_plan'      => 0,
              'is_pragma'    => 0,
              'is_test'      => 0,
              'is_unknown'   => 0,
              'is_version'   => 1,                      # <---
              'is_yaml'      => 0,
              'has_skip'     => 0,
              'has_todo'     => 0,
              'raw'          => 'TAP version 13',
              'as_string'    => 'TAP version 13',
             },
             {
               'is_actual_ok' => 0,
               'is_bailout'   => 0,
               'is_comment'   => 0,
               'is_plan'      => 1,                     # <---
               'is_pragma'    => 0,
               'is_test'      => 0,
               'is_unknown'   => 0,
               'is_version'   => 0,
               'is_yaml'      => 0,
               'has_skip'     => 0,
               'has_todo'     => 0,
               'raw'          => '1..6',
               'as_string'    => '1..6',
             },
             {
               'is_actual_ok' => 0,
               'is_bailout'   => 0,
               'is_comment'   => 0,
               'is_ok'        => 1,                     # <---
               'is_plan'      => 0,
               'is_pragma'    => 0,
               'is_test'      => 1,                     # <---
               'is_unknown'   => 0,
               'is_unplanned' => 0,
               'is_version'   => 0,
               'is_yaml'      => 0,
               'has_skip'     => 0,
               'has_todo'     => 0,
               'number'       => '1',                   # <---
               'type'         => 'test',
               'raw'          => 'ok 1 - use Data::DPath;',
               'as_string'    => 'ok 1 - use Data::DPath;',
               'description'  => '- use Data::DPath;',
               'directive'    => '',
               'explanation'  => '',
               '_children'    => [
                                  # ----- children are the subsequent comment/yaml lines -----
                                  {
                                    'is_actual_ok' => 0,
                                    'is_unknown'   => 0,
                                    'has_todo'     => 0,
                                    'is_bailout'   => 0,
                                    'is_pragma'    => 0,
                                    'is_version'   => 0,
                                    'is_comment'   => 0,
                                    'has_skip'     => 0,
                                    'is_test'      => 0,
                                    'is_yaml'      => 1,              # <---
                                    'is_plan'      => 0,
                                    'raw'          => '   ---
    - name: \'Hash one\'
      value: 1
    - name: \'Hash two\'
      value: 2
  ...',
                                    'as_string'    => '   ---
    - name: \'Hash one\'
      value: 1
    - name: \'Hash two\'
      value: 2
  ...',
                                    'data'         => [
                                                       {
                                                         'value' => '1',
                                                         'name' => 'Hash one'
                                                       },
                                                       {
                                                         'value' => '2',
                                                         'name' => 'Hash two'
                                                       }
                                                      ],
                                }
                              ],
             },
             {
               'is_actual_ok' => 0,
               'is_bailout'   => 0,
               'is_comment'   => 0,
               'is_ok'        => 1,                     # <---
               'is_plan'      => 0,
               'is_pragma'    => 0,
               'is_test'      => 1,                     # <---
               'is_unknown'   => 0,
               'is_unplanned' => 0,
               'is_version'   => 0,
               'is_yaml'      => 0,
               'has_skip'     => 0,
               'has_todo'     => 0,
               'explanation'  => '',
               'number'       => '2',                   # <---
               'type'         => 'test',
               'description'  => '- KEYs + PARENT',
               'directive'    => '',
               'raw'          => 'ok 2 - KEYs + PARENT',
               'as_string'    => 'ok 2 - KEYs + PARENT',
             },
             # etc., see the rest in t/some_tap.dom ...
            ],
}, 'TAP::DOM')                                          # blessed

NESTED LINES

As you can see above, diagnostic lines (comment or yaml) are nested into the line before under a key _children which simply contains an array of those comment/yaml line elements.

With this you can recognize where the diagnostic lines semantically belong.
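
A short sketch of walking that nesting. The structure below is a hand-built stand-in containing only the keys needed here, not a real TAP::DOM object; the key names follow the excerpt above.

```perl
use strict;
use warnings;

# Hand-built stand-in for a small part of a TAP::DOM structure.
my $dom = {
    lines => [
        { raw => '1..2', is_plan => 1, is_test => 0 },
        { raw => 'ok 1 - first', is_test => 1, number => 1,
          _children => [
              { raw => '# a diagnostic', is_comment => 1 },
              { raw => '# another one',  is_comment => 1 },
          ],
        },
        { raw => 'ok 2 - second', is_test => 1, number => 2 },
    ],
};

# Collect the raw diagnostic lines that belong to each test line.
my %diag_for;
for my $line (@{ $dom->{lines} }) {
    next unless $line->{is_test};
    $diag_for{ $line->{number} } =
        [ map { $_->{raw} } @{ $line->{_children} || [] } ];
}

printf "test %d: %d diagnostic line(s)\n", $_, scalar @{ $diag_for{$_} }
    for sort { $a <=> $b } keys %diag_for;
```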

HOW TO STRIP DETAILS

You can make the DOM a bit more terse (i.e., less blown up) if you do not need every detail.

Strip unnecessary TAP-DOM fields

For this provide the ignore option to new(). It is an array ref specifying keys that should not be contained in the TAP-DOM. Currently supported are:

has_todo
has_skip
directive
as_string
explanation
description
is_unplanned
is_actual_ok
is_bailout
is_unknown
is_version
is_comment
is_pragma
is_plan
is_test
is_yaml
is_ok
number
type
raw

Use it like this:

$tapdom = TAP::DOM->new (tap    => $tap,
                         ignore => [ qw( raw as_string ) ],
                        );

Strip unnecessary lines

You can ignore complete lines from the input TAP as if they did not exist by setting a regular expression in ignorelines. Of course you can break the TAP with this, so usually you only apply this to non-TAP lines or diagnostics you are not interested in.

My primary use-case is TAP with large parts of logfiles included with a prefixed "## " just for dual-using the TAP also as an archive of the log. When evaluating the TAP later I leave those log lines out because they only blow up the memory for the TAP-DOM:

$tapdom = TAP::DOM->new (tap         => $tap,
                         ignorelines => qr/^## /,
                        );

See t/some_tap_ignore_lines.t for an example.

Pre-process TAP

WARNING, experimental features!

  • preprocess_ignorelines

    By setting that option, ignorelines is applied to the input TAP text before it is parsed.

    This could help to speed up TAP parsing when there is a huge amount of non-TAP lines that the regex engine could throw away faster than TAP::Parser would parse them line by line.

    There is a risk: without that option, only lines are filtered that are already parsed as lines by the TAP parser. If applied before parsing, the regex could mis-match non-trivial situations.

  • preprocess_tap

    With this option, any lines that don't obviously look like TAP are stripped away.

    There is a substantial risk, though: the purely line-based regex processing could screw up when it mis-matches lines. Parsing TAP is not as obvious as it first seems. Just think of unindented YAML or indented YAML with strange multi-line spanning values at line starts, or the (non-standardized and unsupported) nested indented TAP. So be careful!

USING BITSETS

Option "usebitsets"

You can make the DOM even smaller by using the option usebitsets:

$tapdom = TAP::DOM->new (tap => $tap, usebitsets => 1 );

In this case all the 'has_*' and 'is_*' attributes are stored in a common bitset entry 'is_has' with their respective bits set.

This reduces the memory footprint of a TAP::DOM remarkably (for large TAP-DOMs ~40%) and is meant as an optimization option for memory constrained problems.

Access bitset attributes via methods

You can get the actual values of 'is_*' and 'has_*' attributes regardless of their storage as hash entries or bitsets by using the respective methods on single entries:

if ($tapdom->{lines}[4]->is_test) {...}
if ($tapdom->{lines}[4]->is_ok)   {...}
...

or with even less direct hash access

if ($tapdom->lines->[4]->is_test) {...}
if ($tapdom->lines->[4]->is_ok)   {...}
...

Access bitset attributes via bit comparisons

You can also use constants that represent the respective bits in expressions like this:

if ($tapdom->{lines}[4]{is_has} & $TAP::DOM::IS_TEST) {...}

And the constants can be imported into your namespace:

use TAP::DOM ':constants';
if ($tapdom->{lines}[4]{is_has} & $IS_TEST ) {...}
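
A self-contained sketch of how a single flag is tested in such a bitset. The bit values below are hypothetical and chosen only for illustration; the real constants are provided by TAP::DOM and may differ.

```perl
use strict;
use warnings;

# Hypothetical bit values, for illustration only.
my $IS_TEST  = 1 << 0;
my $IS_OK    = 1 << 1;
my $HAS_TODO = 1 << 2;

# A line that is a passing test without a TODO directive.
my $is_has = $IS_TEST | $IS_OK;

# Bitwise AND isolates a single flag; the result is non-zero only when
# that bit is set. A bitwise OR would be true for any non-zero constant.
my $is_test  = ($is_has & $IS_TEST)  ? 1 : 0;
my $has_todo = ($is_has & $HAS_TODO) ? 1 : 0;

print "is_test=$is_test has_todo=$has_todo\n";
```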

Tweak the resulting DOM

Lowercase all key:value fieldnames

By setting option lowercase_fieldnames all field names (hash keys) in document_data and kv_data are set to lowercase. This is especially helpful to normalize different casing like

# Test-Strange-Key: some value
# Test-strange-key: some value
# Test-STRANGE-KEY: some value

etc. all into

"strange-key" => "some value"

Lowercase all key:value values

By setting option lowercase_fieldvalues all field values in document_data and kv_data are set to lowercase. This is especially helpful to normalize different casing like

# Test-strange-key: Some Value
# Test-strange-key: Some value
# Test-strange-key: SOME VALUE

etc. all into

"strange-key" => "some value"

Warning: while the sister option lowercase_fieldnames above is obviously helpful to keep the information more together, this lowercase_fieldvalues option here should be used with care. You lose much more information here, which is usually better searched via case-insensitive options of the mechanism you use: regular expressions, Elasticsearch, etc.

Placing key:value pairs

Normally a key:value pair {foo => bar} from a line like

# Test-foo: bar

ends up as an entry in a hash kv_values under the entry before that line, which ideally is either a normal ok/not_ok line or a plan line.

If that's not the case then it is not clear where they belong. Early TAP::DOM versions put them under a global entry document_data.

However, this makes these entries appear inconsistently at different levels of the DOM, so you can suppress that old behaviour by setting disable_global_kv_data to 1.

However, with that option set, there can be lines that appear directly at the start with no preceding parent line, in case the plan comes at the end of the document. To not lose those key/value pairs they can be saved until the plan appears later and placed there. As this orders data inside the DOM differently from the original document you must explicitly request that behaviour by setting put_dangling_kv_data_under_lazy_plan to 1.

Summary: for consistency it is suggested to set both options:

disable_global_kv_data => 1,
put_dangling_kv_data_under_lazy_plan => 1
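
The placement rules described in this section can be modelled roughly as follows. This is a simplified sketch under stated assumptions: the input is a hand-made list of already classified items, the kv_data key name is assumed, and the loop is not the module's actual code.

```perl
use strict;
use warnings;

# Simplified, hand-made input in document order; the plan comes last.
my @parsed = (
    { type => 'kv',   key => 'suite',   value => 'demo' },  # dangling, no parent yet
    { type => 'test', raw => 'ok 1 - foo' },
    { type => 'kv',   key => 'elapsed', value => '3' },     # belongs to test 1
    { type => 'plan', raw => '1..1' },
);

my (@lines, @dangling);
for my $item (@parsed) {
    if ($item->{type} eq 'kv') {
        if (@lines) {
            # Attach to the preceding line.
            $lines[-1]{kv_data}{ $item->{key} } = $item->{value};
        }
        else {
            # No parent yet: keep it until the plan shows up.
            push @dangling, $item;
        }
        next;
    }
    push @lines, { type => $item->{type}, raw => $item->{raw} };
    if ($item->{type} eq 'plan') {
        $lines[-1]{kv_data}{ $_->{key} } = $_->{value} for @dangling;
        @dangling = ();
    }
}

print "$_->{raw}: ", join(',', sort keys %{ $_->{kv_data} || {} }), "\n"
    for @lines;
```

In this model the leading "suite" pair ends up under the late plan line, while "elapsed" stays with the test line it follows.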

ACCESSORS

end_time

exit

has_problems

is_good_plan

parse_errors

parse_errors_msgs

plan

pragmas

skip_all

start_time

summary

tapdom_config

document_data

A document can contain comment lines which actually contain key/value data, like this:

# Test-vendor-id:  GenuineIntel
# Test-cpu-model:  Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz
# Test-cpu-family: 6
# Test-flags.fpu:  1

Those lines are converted into a hash by splitting them at the : delimiter and stripping the # Test- prefix. The resulting data structure looks like this:

# ... inside TAP::DOM ...
document_data => {
                  'vendor-id' => 'GenuineIntel',
                  'cpu-model' => 'Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz',
                  'cpu-family' => 6,
                  'flags.fpu' =>  1,
                 },

tests_planned

tests_run

version

AUTHOR

Steffen Schwigon <ss5@renormalist.net>

COPYRIGHT AND LICENSE

This software is copyright (c) 2020 by Steffen Schwigon.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.