NAME
YAX::Parser - fast pure Perl tree and stream parser
SYNOPSIS
use YAX::Parser;
my $xml_str = <<XML;
<?xml version="1.0" ?>
<doc>
<content id="42"><![CDATA[
This is a cdata section, so >>anything goes!<<
]]>
</content>
<!-- comments are nodes too -->
</doc>
XML
# tree parse - the common case
my $xml_doc = YAX::Parser->parse( $xml_str );
my $xml_doc = YAX::Parser->parse_file( $path );
# shallow parse
my @tokens = YAX::Parser->tokenize( $xml_str );
# stream parse
YAX::Parser->stream( $xml_str, $state, %handlers );
YAX::Parser->stream_file( '/some/file.xml', $state, %handlers );
DESCRIPTION
This module implements a fast DOM and stream parser based on Robert D. Cameron's regular expression shallow parsing grammar and technique. By design, it does not implement the full W3C DOM API and takes a more pragmatic approach instead: DOM trees are built with every node as an object, except for attributes, which are stored as a hash reference.
We also borrow some ideas from browser implementations. In particular, nodes that have an id attribute are keyed by that id in a table in the document, so you can say:
my $found = $xml_doc->get( $node_id );
Parsing is usually done by calling class methods on YAX::Parser. When invoked as a tree parser, it returns an instance of YAX::Document:
my $xml_doc = YAX::Parser->parse( $xml_str );
METHODS
See the "SYNOPSIS" for usage examples; here's just the list for now:
- parse( $xml_str )
Parse $xml_str and return a YAX::Document object.
- parse_file( $path )
Same as above, but reads the input from the file at $path.
- stream( $xml_str, $state, %handlers )
Although not its main focus, YAX::Parser also provides for stream parsing. It tries to be a bit more sane than Expat, in that it allows you to specify a state holder which can be anything and is passed as the first argument to the handler functions. A typical case is to use a hash reference with a stack (for tracking nesting):
my $state = { stack => [ ] };
All handler functions are optional, but the full list is:
my %handlers = (
    text => \&handle_text,           # called for text nodes
    elmt => \&handle_element_open,   # called for open tags
    elcl => \&handle_element_close,  # called for tag close
    decl => \&handle_declaration,    # called for declarations
    proc => \&handle_proc_inst,      # called for processing instructions
    pass => \&handle_passthrough,    # called when no handlers match
);
An element handler is passed the state, tag name and attributes hash:
sub handle_element_open {
    my ( $state, $name, %attributes ) = @_;
    if ( $name eq 'a' and $attributes{href} ) {
        ...
    }
}
Element close handlers take two arguments, the state and the tag name:
sub handle_element_close {
    my ( $state, $name ) = @_;
    die "not well formed" unless pop( @{ $state->{stack} } ) eq $name;
}
All other handlers take the state and the entire matched token:
sub handle_proc_inst {
    my ( $state, $token ) = @_;
    # extract the instruction body; bail out if the token does not match
    my ( $instr ) = $token =~ /^<\?(.*?)\?>$/s
        or return;
    ...
}
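The handlers above can be combined into a small nesting checker. What follows is a minimal sketch, not part of the module: the names on_open, on_close and the links key are hypothetical, and it assumes that elcl is called once for every element that elmt reported. The call to stream is guarded so the sketch still loads when YAX::Parser is not installed.

```perl
use strict;
use warnings;

# Open handler: remember the tag name so the close handler can check nesting,
# and collect href values of <a> elements (illustrative only).
sub on_open {
    my ( $state, $name, %attributes ) = @_;
    push @{ $state->{stack} }, $name;
    push @{ $state->{links} }, $attributes{href}
        if $name eq 'a' and defined $attributes{href};
}

# Close handler: the most recently opened tag must match the closing name.
sub on_close {
    my ( $state, $name ) = @_;
    my $open = pop @{ $state->{stack} };
    die "not well formed: </$name>\n"
        unless defined $open and $open eq $name;
}

my $state    = { stack => [], links => [] };
my %handlers = ( elmt => \&on_open, elcl => \&on_close );

# Only attempt the real call when YAX::Parser is available.
if ( eval { require YAX::Parser; 1 } ) {
    eval {
        YAX::Parser->stream( '<p><a href="/x">x</a></p>', $state, %handlers );
        print "links: @{ $state->{links} }\n";
        1;
    } or warn "stream failed: $@";
}
```

Keeping all mutable data in $state rather than in globals means the same handlers can be reused for several documents by passing a fresh hash reference each time.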
- stream_file( $path, $state, %handlers )
Same as above, but reads the input from the file at $path.
- tokenize( $xml_str )
Useful for quick and dirty tokenizing of $xml_str. Returns a list of tokens.
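To give a feel for what a shallow parse produces, here is a deliberately simplified illustration in plain Perl. It is not the grammar YAX::Parser uses (Cameron's expressions are far more thorough), and the function name shallow_tokenize is hypothetical; it only shows the idea of cutting markup into a flat list of tokens with one regular expression.

```perl
use strict;
use warnings;

# Simplified shallow tokenizer; an illustration only, NOT YAX's real grammar.
# Returns the markup split into a flat list of tag and text tokens.
sub shallow_tokenize {
    my ($xml) = @_;
    return $xml =~ m{
        ( <!--.*?-->            # comment
        | <!\[CDATA\[.*?\]\]>   # CDATA section
        | <\?.*?\?>             # processing instruction or XML declaration
        | <[^>]*>               # any other tag or declaration
        | [^<]+                 # run of text
        )
    }xsg;
}

my @tokens = shallow_tokenize('<doc><!-- hi --><a id="1">text</a></doc>');
print "$_\n" for @tokens;
```

No tree is built and nesting is not checked, which is what makes this style of parsing quick.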
SEE ALSO
LICENSE
This program is free software and may be modified and distributed under the same terms as Perl itself.
AUTHOR
Richard Hundt