NAME
XML::Essex - Essex XML processing primitives
SYNOPSIS
TODO
DESCRIPTION
Result Value
The value your Essex script returns is passed back to its caller. For handlers, this is usually "1" for success or some other value, such as a data structure that has been built or the result of a query.
For generators and filters, it is important that the result of the next filter's end_document() is returned at the end of your Essex script, so that results built by downstream handlers such as XML::Simple get back to the code that called the pipeline.
Errors should be reported using die().
Result Values
Essex is designed to Do The Right Thing for the vast majority of uses, so it manages result values automatically unless you take control. In overview, it manages the result value for a filter's processing run as follows:
Filters normally do not need to manage a result. The result from the next filter downstream will be returned automatically, or an exception will be thrown if an incomplete document is sent downstream.
Generators mostly act like filters, except that if a generator decides not to send any events downstream, it should either set a result value by calling result() with it, or return that result normally, just like a handler.
Handlers should either set a result value by calling result() with it, or return that result normally.
Generators, filters, and handlers should all die() on unexpected conditions and most error conditions (a FALSE or undefined result is not necessarily an error condition for a handler).
Generators and filters generally should not return a value of their own because this will surprise calling code which is expecting a return value of the type that the final SAX handler returns.
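To make the result-passing convention concrete, here is a pure-Perl sketch of a filter sitting in front of a handler. It assumes the convention the text relies on, namely that a handler's end_document() return value is what gets handed back to the caller. Demo::Filter, Demo::Counter and run_pipeline are hypothetical stand-ins, not Essex classes: the filter forwards events and returns the downstream result untouched, while the handler builds and returns its own value.

```perl
use strict;
use warnings;

# Hypothetical handler: counts start_element events and returns the
# count from end_document(), the way a SAX handler returns its result.
package Demo::Counter;

sub new           { my ($class) = @_; return bless { count => 0 }, $class }
sub start_element { my ($self) = @_; $self->{count}++; return }
sub end_document  { my ($self) = @_; return $self->{count} }

# Hypothetical filter: passes events downstream and, at the end, returns
# the downstream handler's result rather than a value of its own.
package Demo::Filter;

sub new { my ($class, $next) = @_; return bless { next => $next }, $class }

sub start_element {
    my ($self, @args) = @_;
    return $self->{next}->start_element(@args);
}

sub end_document {
    my ($self) = @_;
    return $self->{next}->end_document;
}

package main;

# Hypothetical driver: sends one start_element event per name through the
# chain, then returns whatever the head of the chain returns from end_document().
sub run_pipeline {
    my ($head, @names) = @_;
    $head->start_element({ Name => $_ }) for @names;
    return $head->end_document;
}

my $result = run_pipeline(Demo::Filter->new(Demo::Counter->new), qw(a b c));
print "result: $result\n";    # prints "result: 3"
```

Because Demo::Filter returns whatever Demo::Counter returns, adding or removing the filter does not change what the caller sees.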
Exported Functions
These are exported by default. Use the
use XML::Essex ();
syntax to avoid exporting any of them, or to export only the ones you want.
The following export tags are also defined:
:read get read_from parse_doc isa next_event path type xeof
:rules on
:write put write_to start_doc end_doc start_elt chars ...
so you can write
use XML::Essex qw( :read :rules );
for an Essex script that just handles input and uses some rules, or even:
use XML::Essex qw( parse_doc :rules );
for a purely rule-based script.
Importing only what you need is a little quicker and more memory efficient, and it can also allow XML::Essex to run more efficiently. If you don't import any output functions (see :write above), it will not load the output routines. The same goes for the input and rule-based APIs.
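Export tags of this kind are commonly declared through %EXPORT_TAGS with Perl's core Exporter module. The sketch below assumes that mechanism (the text does not say how Essex implements its tags) and uses a hypothetical package, Demo::EssexLike, whose empty stub subs stand in for a few of the functions listed above.

```perl
use strict;
use warnings;

# Hypothetical module declaring tag groups like the ones listed above.
# Only a few of the names are included, and the subs are empty stubs.
BEGIN {
    package Demo::EssexLike;
    use Exporter 'import';

    our @EXPORT = qw( get read_from parse_doc on put write_to );
    our %EXPORT_TAGS = (
        read  => [qw( get read_from parse_doc )],
        rules => [qw( on )],
        write => [qw( put write_to )],
    );

    sub get       { return }
    sub read_from { return }
    sub parse_doc { return }
    sub on        { return }
    sub put       { return }
    sub write_to  { return }
}

# Equivalent of "use Demo::EssexLike qw( :read :rules );" for a package
# defined in this same file.
BEGIN { Demo::EssexLike->import(qw( :read :rules )) }

# Report which of the stubs ended up in this package.
for my $name (qw( get parse_doc on put write_to )) {
    printf "%-10s %s\n", $name, __PACKAGE__->can($name) ? 'imported' : 'not imported';
}
```

With an empty import list, as in use XML::Essex ();, Perl skips the import call entirely, so nothing is exported.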
- get
-
my $e = get;
Returns the next SAX event. Sets $_ as an EXPERIMENTAL feature.
Throws an exception (which is silently caught outside the main code) on end of input.
See the isa() and type() functions and methods (in XML::Essex::Object) for how to test what was just gotten.
- read_from
-
read_from \*STDIN;       ## From a filehandle
read_from "-";           ## From \*STDIN
read_from "foo.xml";     ## From a file or URI (URI support is parser dependent)
read_from \$xml_string;  ## From a string
read_from undef;         ## STDIN or files named in @ARGV, as appropriate
Tells the next get() or parse_doc() to read from the indicated source.
Calling read_from automatically disassembles the current processing chain and builds a new one (just like Perl's open() closes an already open filehandle).
- push_output_filters
-
Adds an output filter to the end of the current list (just before the eventual writer). This can be a class name (which will be require()d unless the class can already new()) or a reference to a filter.
- parse_doc
-
Parses a single document from the current input. Morally equivalent to
get() while 1;
but exits normally (as opposed to throwing an exception) when the end of the document is reached. It is also slightly faster now, and hopefully more so once optimizations can be made. Used to read to the end of a document, primarily in rule-based processing ("on").
TODO: Allow parse_doc to take rules.
- put
-
Outputs one or more events. Usually these events are created by constructors like start_elt() (see XML::Generator::Essex for details) or are objects returned by the get() method.
- write_to
-
write_to \*STDOUT;       ## To a filehandle
write_to "-";            ## To \*STDOUT
write_to "foo.xml";      ## To a file or URI (URI support is parser dependent)
write_to \$xml_string;   ## To a string
Tells the next put() to write to the indicated destination.
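The relationship between get(), parse_doc() and the end-of-input exception can be modelled in a few lines of plain Perl. In this sketch, demo_get, demo_parse_doc, run_main and the sentinel class Demo::EndOfInput are hypothetical stand-ins, not Essex internals: demo_get dies with a sentinel object when the input runs out, demo_parse_doc loops like get() while 1 but treats that sentinel as a normal finish, and run_main mimics the code outside the main script that silently catches end of input while still propagating genuine errors.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Hypothetical sentinel class thrown when the input is exhausted.
package Demo::EndOfInput;
sub new { return bless {}, shift }

package main;

# Hypothetical input: a plain list of event type names.
my @queue;

# Stand-in for get(): returns the next event, also leaving it in $_,
# and dies with the sentinel once the input is exhausted.
sub demo_get {
    die Demo::EndOfInput->new unless @queue;
    $_ = shift @queue;
    return $_;
}

# True if an exception is the end-of-input sentinel.
sub is_end_of_input {
    my ($err) = @_;
    return blessed($err) && $err->isa('Demo::EndOfInput');
}

# Stand-in for parse_doc(): morally "demo_get() while 1", but returns
# normally at end of input. Any other exception is rethrown.
sub demo_parse_doc {
    my @seen;
    eval { push @seen, demo_get() while 1 };
    my $err = $@;
    die $err unless is_end_of_input($err);
    return @seen;
}

# Stand-in for the eval { ... } wrapper around a script's main code:
# end of input finishes the run quietly, any other error propagates.
sub run_main {
    my ($main) = @_;
    return 1 if eval { $main->(); 1 };
    my $err = $@;
    return 0 if is_end_of_input($err);
    die $err;
}

@queue = qw( start_document start_element end_element end_document );
my @events = demo_parse_doc();
print scalar(@events), " events read\n";    # prints "4 events read"

@queue = qw( start_document end_document );
run_main(sub { demo_get() while 1 });        # ends quietly, returns 0
```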
Miscellaneous
- isa
-
get until isa "start_elt" and $_->name eq "foo";
$r = get until isa $r, "start_elt" and $_->name eq "foo";
Returns true if the parameter is of the indicated object type. Tests $_ unless more than one parameter is passed.
- next_event
-
Like get() (see above), but does not remove the next event from the input stream.
get "start_document::*";
get if next_event->isa( "xml_decl" );
...process remainder of document...
- path
-
get "start_element::*" until path eq "/path/to/foo:bar"
Returns the path to the current element as a string.
- type
-
get until type eq "start_document";
$r = get until type $r eq "start_document";
Returns the type name of the object. This is the class name with a leading XML::Essex:: stripped off. This is a wrapper around the event's type() method.
- xeof
-
Returns TRUE if the last event has been read.
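Two of these helpers boil down to simple bookkeeping. The plain-Perl sketch below only approximates them under stated assumptions: it keeps a stack of open element names to build a path string (assuming slash-separated names from the root, as in the path example above) and derives a type name by stripping a leading XML::Essex:: from a class name. demo_start_element, demo_end_element, demo_path, demo_type and the class name XML::Essex::start_document are all hypothetical.

```perl
use strict;
use warnings;

# Names of the currently open elements, outermost first (hypothetical).
my @open;

sub demo_start_element { push @open, $_[0]; return }
sub demo_end_element   { pop @open; return }

# Stand-in for path(): the open elements joined with "/" from the root.
sub demo_path { return '/' . join '/', @open }

# Stand-in for type(): the class name with a leading "XML::Essex::"
# removed. Like the documented function, it falls back to $_.
sub demo_type {
    my $obj = @_ ? shift : $_;
    (my $name = ref $obj) =~ s/^XML::Essex:://;
    return $name;
}

demo_start_element($_) for qw( path to foo:bar );
print demo_path(), "\n";                     # prints "/path/to/foo:bar"

# The class name here is made up for the example.
$_ = bless({}, 'XML::Essex::start_document');
print demo_type(), "\n";                     # prints "start_document"
```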
Namespaces
If this section doesn't make any sense, see http://www.jclark.com/xml/xmlns.htm for your next dose of XML koolaid. If it still doesn't make any sense then ding me for writing gibberish.
Element names, attribute names, and PI targets returned by Essex are generated in one of three forms, depending on whether the named item has a namespace URI associated with it and whether the filter program has mapped that namespace URI to a prefix. You may also use any of these three forms when passing a name to Essex:
- "id"
-
If an attribute has no NamespaceURI or an empty string for a NamespaceURI, it will be returned as a simple string.
TODO: Add an option to enable this for the default namespace or for attrs in the element's namespace.
- "foo:id"
-
If the attribute is in a namespace and a namespace -> prefix mapping for it has been declared by the filter.
- "{http://foo/}id"
-
If the attribute is in a namespace with no prefix mapped to it by the filter.
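The three cases can be summarised as a small decision function. This sketch assumes the filter's own mappings are available as a plain hash of namespace URI => prefix; demo_name is a hypothetical helper for illustration, not part of the Essex API. Prefixes from the source document play no part in it; only the filter's mappings are consulted.

```perl
use strict;
use warnings;

# Hypothetical helper: render a (namespace URI, local name) pair in one of
# the three forms above, given the filter's own URI => prefix mappings.
sub demo_name {
    my ($uri, $local, $map) = @_;
    # No namespace, or an empty one: the bare local name, e.g. "id".
    return $local if !defined($uri) || $uri eq '';
    # Namespace mapped by the filter: "prefix:local", e.g. "foo:id".
    return $map->{$uri} . ':' . $local if exists $map->{$uri};
    # Namespace with no mapping: Clark notation, e.g. "{http://foo/}id".
    return '{' . $uri . '}' . $local;
}

my %map = ( 'http://foo/' => 'foo' );
print demo_name( undef,         'id', \%map ), "\n";   # prints "id"
print demo_name( 'http://foo/', 'id', \%map ), "\n";   # prints "foo:id"
print demo_name( 'http://bar/', 'id', \%map ), "\n";   # prints "{http://bar/}id"
```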
Namespace prefixes from the source document are ignored; there's no telling what prefix somebody might have used. Intercept the start_prefix_mapping and end_prefix_mapping events to follow the weave of source document namespace mappings.
When outputting events that belong to a namespace not in the source document, you need to put() the start_prefix_mapping and end_prefix_mapping events manually, and be careful to avoid existing prefixes from the document if need be while doing so. Future additions to Essex should make this easier and perhaps automatic.
Essex lets you manage namespace mappings by mapping, hiding, and destroying ( $namespace => $prefix ) pairs using the functions:
- namespace_map
-
aka: ns_map
my $map = ns_map( $ns1 => $prefix1, $ns2 => $prefix2, ... );
Creates a new set of mappings in addition to any that are already in effect. If a namespace is mapped to multiple prefixes, the last one created is used. The mappings stay in effect until the map object referred to by $map is destroyed.
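Tying a mapping's lifetime to an object, as ns_map does, is the familiar Perl scope-guard pattern: the constructor records the mappings and DESTROY removes them. The following is a minimal sketch of that pattern, assuming a per-namespace stack so that the most recently created mapping wins; Demo::NsMap, demo_prefix_for and %prefix_stack are hypothetical names, and this is not presented as how Essex itself implements ns_map.

```perl
use strict;
use warnings;

# Hypothetical registry: namespace URI => stack of prefixes, newest last.
my %prefix_stack;

# Prefix currently mapped to a namespace URI, or undef if none.
sub demo_prefix_for {
    my ($uri) = @_;
    my $stack = $prefix_stack{$uri};
    return undef unless $stack && @$stack;
    return $stack->[-1];
}

# Hypothetical stand-in for the object returned by ns_map(): creating it
# pushes the mappings, destroying it removes them again.
package Demo::NsMap;

sub new {
    my ($class, %pairs) = @_;
    push @{ $prefix_stack{$_} }, $pairs{$_} for keys %pairs;
    return bless { pairs => \%pairs }, $class;
}

sub DESTROY {
    my ($self) = @_;
    for my $uri (keys %{ $self->{pairs} }) {
        my $stack = $prefix_stack{$uri} or next;
        # Remove the newest matching entry, so maps may be released in any order.
        for my $i (reverse 0 .. $#$stack) {
            next unless $stack->[$i] eq $self->{pairs}{$uri};
            splice @$stack, $i, 1;
            last;
        }
    }
}

package main;

my $outer = Demo::NsMap->new( 'http://foo/' => 'foo' );
{
    my $inner = Demo::NsMap->new( 'http://foo/' => 'f2' );
    print demo_prefix_for('http://foo/'), "\n";   # prints "f2": newest mapping wins
}
print demo_prefix_for('http://foo/'), "\n";       # prints "foo": $inner is gone
undef $outer;
my $left = demo_prefix_for('http://foo/');
print defined $left ? "mapped\n" : "unmapped\n";  # prints "unmapped"
```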
Rule Based Processing
It is often advantageous to declare exceptional events that should be processed as they occur in the stream rather than testing for them explicitly everywhere they might occur in the script. This is done using the "on" function.
- on
-
on(
    "start_document::*" => sub { warn "start of document reached" },
    "end_document::*"   => sub { warn "end of document reached" },
);
This declares that the given rules should be in effect until the end of the document.
For now, this must be called before the first get() for predictable results.
Rules remain in effect after the main() routine has exited to facilitate pure rule based processing.
- xvalue
-
Returns the result of the expression that fired an action. Valid only within rules.
- xpush
-
Returns the result of the expression that fired an action. Valid only within rules.
- xpop
-
Returns the result of the expression that fired an action. Valid only within rules.
- xset
-
Returns the result of the expression that fired an action. Valid only within rules.
- xadd
-
Returns the result of the expression that fired an action. Valid only within rules.
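Rule-based processing amounts to a dispatch table consulted for every event as it streams past, which is why a script built on on() need not test for those events itself. The sketch below models that idea in plain Perl under stated assumptions: demo_on, demo_dispatch and %rules are hypothetical, events are plain hashes, and a pattern like "start_document::*" is taken to mean any event of that type, which may be much narrower than what on() really accepts.

```perl
use strict;
use warnings;

# Hypothetical rule table: event type => list of actions.
my %rules;

# Stand-in for on(): only the simplified "type::*" pattern is understood.
sub demo_on {
    my (@pairs) = @_;
    while (my ($pattern, $action) = splice @pairs, 0, 2) {
        my ($type) = $pattern =~ /^(\w+)::\*\z/
            or die "unsupported pattern '$pattern'\n";
        push @{ $rules{$type} }, $action;
    }
    return;
}

# Feed a stream of events past the rules; each matching action is called
# with the event, and the actions' return values are collected.
sub demo_dispatch {
    my @results;
    for my $event (@_) {
        my $actions = $rules{ $event->{type} } || [];
        push @results, $_->($event) for @$actions;
    }
    return @results;
}

demo_on(
    'start_document::*' => sub { 'start of document reached' },
    'end_document::*'   => sub { 'end of document reached' },
);

my @log = demo_dispatch(
    { type => 'start_document' },
    { type => 'start_element', name => 'foo' },   # no rule: ignored
    { type => 'end_document' },
);
print "$_\n" for @log;
```

The actions return strings here instead of calling warn, as the on() example does, only so their effect is easy to check.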
Event Constructors
These are exported by :write (in addition to being available individually).
- chars
-
aka: characters
- end_doc
-
aka: end_document
- end_elt
-
aka: end_element
- start_doc
-
aka: start_document
- start_elt
-
aka: start_element
- xml_decl
IMPLEMENTATION NOTES
XML::Essex is a source filter that wraps everything from the use line to the end of the file in an eval { ... } block.
LIMITATIONS
Stay tuned.
COPYRIGHT
Copyright 2002, R. Barrie Slaymaker, Jr., All Rights Reserved
LICENSE
You may use this module under the terms of the BSD, Artistic, or GPL licenses, any version.
AUTHOR
Barrie Slaymaker <barries@slaysys.com>