NAME

XML::SAX::ByRecord - Record oriented processing of (data) documents

SYNOPSIS

use XML::SAX::Machines qw( ByRecord ) ;

my $m = ByRecord(
    "My::RecordFilter1",
    "My::RecordFilter2",
    ...
    {
        Handler => $h, ## optional
    }
);

$m->parse_uri( "foo.xml" );

DESCRIPTION

XML::SAX::ByRecord is a SAX machine that treats a document as a series of records. Everything before and after the records is emitted as-is, while the records are excerpted into mini-documents and run one at a time through the filter pipeline contained in ByRecord.

The output document has exactly the same content before, after, and between the records as the input document, but each record has been run through the filter pipeline. So if a document has 10 records in it, the per-record filter pipeline will see 10 sets of ( start_document, body of record, end_document ) events. An example is below.

This has several use cases:

Big, record oriented documents

Big documents can be treated a record at a time with various DOM oriented processors like XML::Filter::XSLT.

Streaming XML

Small sections of an XML stream can be run through a document processor without holding up the stream.

Record oriented style sheets / processors

Sometimes it's just plain easier to write a style sheet or SAX filter that applies to a single record at a time, rather than having to run through a series of records.

Topology

Here's how the innards look:

  +-----------------------------------------------------------+
  |                  An XML::SAX::ByRecord                    |
  |    Intake                                                 |
  |   +----------+    +---------+         +--------+  Exhaust |
--+-->| Splitter |--->| Stage_1 |-->...-->| Merger |----------+----->
  |   +----------+    +---------+         +--------+          |
  |               \                            ^              |
  |                \                           |              |
  |                 +---------->---------------+              |
  |                   Events not in any records               |
  |                                                           |
  +-----------------------------------------------------------+

The Splitter is an XML::Filter::DocSplitter by default, and the Merger is an XML::Filter::Merger by default. The line that bypasses the "Stage_1 ..." filter pipeline is used for all events that do not occur in a record. All events that occur in a record pass through the filter pipeline.
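
The routing in the diagram can be imitated with a toy model in plain Perl. This is only a sketch of the idea, not how XML::SAX::ByRecord is implemented: events are plain strings, records are pre-marked as array references (the real Splitter has to find them in the event stream), and by_record and $uc are hypothetical names invented for this illustration.

```perl
use strict;
use warnings;

# Toy model only: an "event" is a string and a "record" is an array ref
# of events. by_record is a hypothetical helper, not part of any module.
sub by_record {
    my ( $stages, @input ) = @_;
    my @exhaust;
    for my $item ( @input ) {
        if ( ref $item ) {
            # Splitter: wrap the record as a stand-alone mini-document.
            my @doc = ( 'start_document', @$item, 'end_document' );
            # Stage_1 ... Stage_N: each stage maps an event list to an event list.
            for my $stage ( @$stages ) {
                @doc = $stage->( @doc );
            }
            # Merger: drop the per-record document events, keep the rest in place.
            push @exhaust, grep { !/^(?:start|end)_document$/ } @doc;
        }
        else {
            # Bypass line: events outside any record skip the stages.
            push @exhaust, $item;
        }
    }
    return @exhaust;
}

# A stage that uppercases character data and leaves tags and document
# events untouched.
my $uc = sub { map { ( /^</ || /_document$/ ) ? $_ : uc $_ } @_ };

my @out = by_record(
    [ $uc ],
    '<root>', ' a ', [ '<rec>', 'b', '</rec>' ], ' c ',
    [ '<rec>', 'd', '</rec>' ], ' e ', '</root>',
);
print join( '', @out ), "\n";   # prints "<root> a <rec>B</rec> c <rec>D</rec> e </root>"
```

Note that ' a ', ' c ' and ' e ' keep their case: they travel along the bypass line and never reach the stage.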

Example

Here's a quick little filter to uppercase text content:

package My::Filter::Uc;

use vars qw( @ISA );
@ISA = qw( XML::SAX::Base );

use XML::SAX::Base;

sub characters {
    my $self = shift;
    my ( $data ) = @_;
    $data->{Data} = uc $data->{Data};
    $self->SUPER::characters( @_ );
}

And here's a little machine that uses it:

use XML::SAX::Machines qw( Pipeline ByRecord );

my $out;
my $m = Pipeline(
    ByRecord( "My::Filter::Uc" ),
    \$out,
);

When fed a document like:

<root> a
    <rec>b</rec> c
    <rec>d</rec> e
    <rec>f</rec> g
</root>

the output looks like:

<root> a
    <rec>B</rec> c
    <rec>D</rec> e
    <rec>F</rec> g
</root>

and My::Filter::Uc received three sets of events:

start_document
start_element: <rec>
characters:    'b'
end_element:   </rec>
end_document

start_document
start_element: <rec>
characters:    'd'
end_element:   </rec>
end_document

start_document
start_element: <rec>
characters:    'f'
end_element:   </rec>
end_document
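
One way to observe these per-record event sets is a small logging filter. The sketch below assumes XML::SAX::Machines, XML::SAX::Writer and a SAX parser are installed; My::Filter::EventLog and its @seen array are hypothetical names made up for this example. The filter is passed as an object rather than a class name so that the script does not depend on the class being loadable from a file.

```perl
use strict;
use warnings;

# Hypothetical logging filter: notes each event name, then forwards the
# event unchanged to the next handler.
package My::Filter::EventLog;

use XML::SAX::Base;
our @ISA = qw( XML::SAX::Base );

our @seen;    # event names, in arrival order

sub start_document { my $s = shift; push @seen, 'start_document'; $s->SUPER::start_document( @_ ) }
sub start_element  { my $s = shift; push @seen, 'start_element';  $s->SUPER::start_element( @_ ) }
sub characters     { my $s = shift; push @seen, 'characters';     $s->SUPER::characters( @_ ) }
sub end_element    { my $s = shift; push @seen, 'end_element';    $s->SUPER::end_element( @_ ) }
sub end_document   { my $s = shift; push @seen, 'end_document';   $s->SUPER::end_document( @_ ) }

package main;

use XML::SAX::Machines qw( Pipeline ByRecord );

my $out = '';
my $m = Pipeline( ByRecord( My::Filter::EventLog->new ), \$out );

# Same shape as the example document: three <rec> children of the root.
$m->parse_string( '<root> a <rec>b</rec> c <rec>d</rec> e <rec>f</rec> g </root>' );

# One event name per line; each record should show up as its own
# start_document ... end_document run.
print "$_\n" for @My::Filter::EventLog::seen;
```

Text outside the records (a, c, e, g) should not appear in the log, since those events bypass the per-record pipeline.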

METHODS

new

my $d = XML::SAX::ByRecord->new( @channels, \%options );

Longhand for calling the ByRecord function exported by XML::SAX::Machines.
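
A minimal sketch of the longhand form, assuming XML::SAX::Writer is installed and used as the downstream handler. A plain XML::SAX::Base object stands in as a pass-through stage; any SAX filter could take its place. The input document is made up for this example.

```perl
use strict;
use warnings;

use XML::SAX::Base;
use XML::SAX::ByRecord;
use XML::SAX::Writer;

my $out = '';
my $w = XML::SAX::Writer->new( Output => \$out );

# Same machine as ByRecord( $filter, { Handler => $w } ) would build,
# constructed through the class method instead of the exported function.
my $m = XML::SAX::ByRecord->new(
    XML::SAX::Base->new,       # pass-through stage (illustrative)
    { Handler => $w },
);

$m->parse_string( '<root><rec>b</rec></root>' );
print $out, "\n";
```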

CREDIT

Proposed by Matt Sergeant, with advice from Kip Hampton and Robin Berjon.

Writing an aggregator.

To be written. In short, start_manifold_processing and end_manifold_processing need to be provided. See XML::Filter::Merger and its source code for a starter.
