NAME
XML::Essex::Model - Essex objects representing SAX events and DOM trees
SYNOPSIS
Used internally by Essex; see below for external API examples.
DESCRIPTION
This page describes all of the events explicitly supported so far. Unsupported events are still handled as anonymous events; see XML::Essex::Event for details.
A short word on abbreviations
A goal of Essex is to allow code to be as terse or verbose as is appropriate for the job at hand, so almost every object may be abbreviated. For example, start_element may be abbreviated as start_elt, both for the isa() method/function and for class creation.
All objects are actually blessed into classes using the long name, like XML::Essex::start_element, even if you use an abbreviation like XML::Essex::start_elt->new to create them.
Stringification
All events are stringifiable for debugging purposes and so that attribute values, character data, comments, and processing instructions may be matched with Perl string operations like regular expressions and index(). It is usually more effective to use EventPath, but you can use stringified values for things like complex regexp matching.
It is unwise to match other events with string operators because no XML escaping of data is done: "&lt;" in an attribute value or character data is stringified as "<". So print()ing the three events associated with the sequence "<foo>&lt;bar/></foo>" yields "<foo><bar/></foo>", obviously not what the document intended. Given the rarity of such constructs in real-life XML, though, this is sufficient for debugging purposes, and it does make it easy to match against strings.
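The effect of unescaped stringification can be sketched with a toy class. This is not Essex code: Hypothetical::Event is an invented stand-in that only uses Perl's core overload pragma to return stored text as-is.

```perl
use strict;
use warnings;

{
    package Hypothetical::Event;    # invented for illustration, not part of Essex

    # Stringify to the stored text with no XML escaping at all.
    use overload '""' => sub { $_[0]{text} }, fallback => 1;

    sub new {
        my ( $class, $text ) = @_;
        return bless { text => $text }, $class;
    }
}

# The three events a parser would report for: <foo>&lt;bar/></foo>
my @events = (
    Hypothetical::Event->new("<foo>"),     # start_element
    Hypothetical::Event->new("<bar/>"),    # characters: the parser already turned &lt; into <
    Hypothetical::Event->new("</foo>"),    # end_element
);

# Joining the stringified events loses the distinction between markup and text.
my $printed = join "", @events;
print "$printed\n";

# String operators work directly on a stringified event.
my $matched = ( $events[1] =~ /bar/ ) ? 1 : 0;
print "matched: $matched\n";
```

The printed result looks like nested elements even though the middle event was character data, which is why string matching is best kept to character data, attribute values, comments, and processing instructions.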
Ordinarily, you tell get() what kind of object you want using an EventPath expression:
my $start_element = get "start_element::*";
You can also just get() whatever's next in the document or use a union expression. In this case, you may need to see what you've gotten. The isa() method (see below) and the isa() function (see XML::Essex) should be used to figure out what type of object is being used before relying on the stringification:
get until isa "chars" and /Bond, James Bond/;
get until type eq "characters" and /Bond, James Bond/;
get until isa( "chars" ) && /Bond, James Bond/;
get "text()" until /Bond, James Bond/;
This makes it easier to match character data, but other methods should be used to select things like start tags and elements:
get "start_element::*" until $_->name eq "address" && $_->{id} eq $id;
get "start_element::address" until $_->{id} eq $id;
The lack of escaping only affects stringification of objects, for instance:
warn $_; ## See what event is being dealt with right now
/Bond, James Bond/ ## Match current event
Things are escaped properly on output: put() emits properly escaped XML.
Some observations:
Stringifying an event does not produce a well-formed chunk of XML. Events must be emitted through a downstream filter.
Events with no natural XML representation--like start_document--stringify as their name: "start_document()". If it's not listed on this page, it stringifies this way.
Whitespace is inserted only where mandatory inside XML constructs, and is a single space. It is left unmolested in character data, comments, and processing instructions (other than <?xml ...?>, which is parsed by all XML parsers).
Attributes in start_element events are stringified in alphabetical order according to Perl's sort() function.
Processing instructions, including the <?xml ...?> declaration, often have things that look like attributes but are not, so the items above about whitespace and attribute sort order do not apply. Actually, the <?xml ...?> declaration is well defined and there will be only a single whitespace character, though the pseudo-attributes version, encoding and standalone will not be sorted.
No escapes are used. See above.
Character data is catenated, including mixed data and CDATA, into single strings. CDATA sections are tracked and may be analyzed.
Namespaces are stringified according to any prefixes that have been registered; otherwise they stringify in James Clark notation ("{}foo"), except for the empty namespace URI, which always stringifies as "" (i.e. no prefix). See XML::Essex's Namespaces section for details.
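The namespace rule in the last item can be sketched as a small function. This is an illustration only: %prefix_for and stringify_name() are hypothetical names, not Essex's API, and the real prefix registry lives in the Essex processor.

```perl
use strict;
use warnings;

# Hypothetical prefix registry: namespace URI => registered prefix.
my %prefix_for = ( 'http://foo/' => 'foo' );

# Return the stringified form of a namespaced name.
sub stringify_name {
    my ( $ns_uri, $local_name ) = @_;

    # The empty namespace URI never gets a prefix.
    return $local_name if !defined $ns_uri || $ns_uri eq '';

    # A registered prefix wins over James Clark notation.
    return $prefix_for{$ns_uri} . ':' . $local_name
        if exists $prefix_for{$ns_uri};

    # Fall back to James Clark notation: {uri}local_name
    return '{' . $ns_uri . '}' . $local_name;
}

print stringify_name( 'http://foo/', 'id' ), "\n";    # registered prefix
print stringify_name( 'http://bar/', 'id' ), "\n";    # no prefix registered
print stringify_name( '',            'id' ), "\n";    # empty namespace
```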
Common Methods
All of the objects in the model provide the following methods. These methods are exported as functions from the XML::Essex module for convenience (those functions are wrappers around these methods).
- isa
Returns TRUE if the object is of the type, abbreviated type, or class passed. So, for an object encapsulating a start_document event, returns TRUE for any of:
XML::Essex::Event ## The base class for all events
XML::Essex::start_document ## The actual class name
start_document ## The event type
start_doc ## The event type, abbreviated
- class
Returns the class name, such as XML::Essex::start_document.
- type
Returns the type name, such as start_document.
- types
Returns the class name, the type name and any abbreviations. The abbreviations are sorted from longest to shortest.
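How these four methods relate to one another can be sketched with a toy class. Everything here is an assumption made for illustration: Hypothetical::Event, its abbreviation table, and the method name is_a (used instead of isa so that Perl's built-in UNIVERSAL::isa is not overridden) are not taken from Essex.

```perl
use strict;
use warnings;

{
    package Hypothetical::Event;    # invented for illustration, not part of Essex

    # Long type name => abbreviations (illustrative subset only).
    my %abbrevs = (
        start_document => ['start_doc'],
        start_element  => ['start_elt'],
        characters     => ['chars'],
    );

    sub new {
        my ( $class, $type ) = @_;

        # Resolve an abbreviation to its long name before storing it.
        for my $long ( keys %abbrevs ) {
            $type = $long if grep { $_ eq $type } @{ $abbrevs{$long} };
        }
        return bless { type => $type }, $class;
    }

    # The long event type, e.g. "start_document".
    sub type { return $_[0]{type} }

    # Only builds the class-name string shown in the list above.
    sub class { return 'XML::Essex::' . $_[0]{type} }

    # Class name, type name, then abbreviations from longest to shortest.
    sub types {
        my $self = shift;
        my @abbr = sort { length($b) <=> length($a) }
            @{ $abbrevs{ $self->type } || [] };
        return ( $self->class, $self->type, @abbr );
    }

    # True for the base class, the class name, the type, or an abbreviation.
    sub is_a {
        my ( $self, $what ) = @_;
        return 1 if $what eq 'XML::Essex::Event';
        return ( grep { $_ eq $what } $self->types ) ? 1 : 0;
    }
}

my $e = Hypothetical::Event->new('start_doc');
print join( ", ", $e->types ), "\n";
print "is a start_doc: ", $e->is_a('start_doc'), "\n";
```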
start_document
aka: start_doc
my $e = start_doc \%values; ## %values is not defined in SAX1/SAX2
Stringifies as: start_document($reserved)
where $reserved is a character string that may someday include info passed in the start_document event, probably formatted as attributes.
xml_decl
aka: (no abbreviations)
my $e = xml_decl;
my $e = xml_decl
Version => "1",
Encoding => "UTF-8",
Standalone => "yes";
my $e = xml_decl {
Version => "1",
Encoding => "UTF-8",
Standalone => "yes"
};
Stringifies as: <?xml version="$version" encoding="$enc" standalone="$yes_or_no"?>
Note that this does not follow the sorted attribute order of start_element: the apparent attributes here are not really attributes, much like the pseudo-attributes found in some processing instructions.
end_document
aka: end_doc
my $e = end_doc \%values; ## %values is not defined in SAX1/SAX2
Stringifies as: end_document($reserved)
where $reserved is a character string that may someday include info passed in the end_document event, probably formatted as attributes.
start_element
aka: start_elt
my $e = start_elt foo => { attr => "val" };
my $e = start_elt $start_elt; ## Copy constructor
my $e = start_elt $end_elt; ## end_elt deconstructor
my $e = start_elt $elt; ## elt deconstructor
Stringifies as: <foo attr1="$val1" attr2="$val2">
The element name and any attribute names are prefixed according to namespace mappings registered in the Essex processor; the prefixes they had in the source document are ignored. If no prefix has been mapped, jclark notation ({http:...}foo) is used. Then they are sorted according to Perl's sort() function, so jclarked attribute names come last, as it happens.
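The reason jclarked names sort last is plain string ordering: "{" compares greater than any ASCII letter. A quick check with Perl's default sort(), using made-up attribute names:

```perl
use strict;
use warnings;

# Made-up attribute names in the three key formats.
my @names = ( 'id', '{http://foo/}id', 'foo:id', 'class' );

# Default sort() compares strings character by character.
my @sorted = sort @names;

print join( " ", @sorted ), "\n";    # prints: class foo:id id {http://foo/}id
```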
TODO: Support attribute ordering via consecutive {...} sets.
Attributes may be accessed using hash dereferences:
get "start_element::*" until $_->{id} eq "10"; ## No namespace prefix
get "start_element::*" until $_->{"{}id"} eq "10";
get "start_element::*" until $_->{"{http://foo/}id"} eq "10";
get "start_element::*" until $_->{"foo:id"} eq "10";
and the attribute names may be obtained by:
keys %$_;
Keys are returned in no predictable order; see Namespaces for details on the three formats keys may be returned in.
Methods
- name
Returns the name of the node according to the namespace stringification rules.
- jclark_name
Returns the name of the node in James Clark notation.
- jclark_keys
my @keys = $e->jclark_keys;
Returns a list of attribute names in jclark notation ("{...}name").
attribute
aka: attr
my $name_attr = $start_elt->{name};
my $attr = attr $name;
my $attr = attr $name => $value;
my $attr = attr {
LocalName => $local_name,
NamespaceURI => $ns_uri,
Value => $value,
};
Stringifies as its value: harvey
This is not a SAX event, but an object returned from within element or start_element objects that gives you access to the NamespaceURI, LocalName, and Value fields of the attribute. It does not give access to the Name or Prefix fields present in SAX events.
If you create an attribute with an explicitly undefined value, it will stringify as the undefined value. Attributes that are created without any Value field will be given the default value of "", including attributes that are autovivified. This allows
get "*" until $_->{id} eq "10";
to work. This has the side effect of adding an id="" attribute to all elements without an id attribute. To avoid the side effect, use the exists function to detect nonexistent attributes:
get "*" until exists $_->{id} and $_->{id} eq "10";
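The side effect and the exists guard can be reproduced with a tied hash. This is a sketch under assumptions, not Essex's implementation: Hypothetical::AutoAttrs is an invented class built on the core Tie::StdHash, and its FETCH simply creates missing keys with the default value "".

```perl
use strict;
use warnings;

{
    package Hypothetical::AutoAttrs;    # invented for illustration, not part of Essex
    require Tie::Hash;                  # core module that defines Tie::StdHash
    our @ISA = ('Tie::StdHash');

    # Reading a missing key creates it with the default value "".
    sub FETCH {
        my ( $self, $key ) = @_;
        $self->{$key} = "" unless exists $self->{$key};
        return $self->{$key};
    }
}

# Unguarded comparison: FETCH runs and creates id => "".
tie my %attrs, 'Hypothetical::AutoAttrs';
my $plain_match = ( $attrs{id} eq "10" ) ? 1 : 0;
my $created     = exists $attrs{id} ? 1 : 0;

# Guarded comparison: exists short-circuits, so FETCH never runs.
tie my %attrs2, 'Hypothetical::AutoAttrs';
my $guarded_match = ( exists $attrs2{id} && $attrs2{id} eq "10" ) ? 1 : 0;
my $created2      = exists $attrs2{id} ? 1 : 0;

print "unguarded: id created = $created\n";
print "guarded:   id created = $created2\n";
```

Both comparisons are false here, but only the unguarded one leaves an empty id attribute behind.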
end_element
aka: end_elt
my $e = end_element "foo";
my $e = end_element $start_elt;
my $e = end_element $end_elt;
my $e = end_element $elt;
Stringifies as: </foo>
See start_element for details on namespace handling.
element
aka: elt
my $e = elt foo => "content", $other_elt, "more content", $pi, ...;
my $e = elt foo => { attr1 => "val1" }, "content", ...;
Stringifies as: <foo attr1="val1">content</foo>
Never stringifies as an empty element tag (<foo/>), although downstream filters and handlers may choose to do that.
Constructs an element. An element is a sequence of events between a matching start_element and end_element, inclusive.
Attributes may be accessed using Perl hash dereferencing, as with start_element events; see "start_element" for details.
Content may be accessed using Perl array dereferencing:
my @content = @$_;
unshift @$_, "prefixed content";
push @$_, "appended content";
Note that
my $elt2 = elt $elt1; ## doesn't copy content, just name+attrs
only copies the name and attributes; it does not copy the content. To copy content, do either of:
my $elt2 = elt $elt1, @$elt1;
my $elt2 = $elt1->clone;
This is because the first parameter is converted to a start/end_element pair and any content is ignored. This is so that:
my $elt2 = elt $elt1, "new content";
creates an element with the indicated content.
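The copy semantics above, together with hash and array dereferencing on one object, can be sketched with Perl's core overload pragma. This is a toy model under stated assumptions: Hypothetical::Elt and its constructor are invented here and say nothing about how Essex implements elt internally.

```perl
use strict;
use warnings;

{
    package Hypothetical::Elt;    # invented for illustration, not part of Essex
    use Scalar::Util ();

    # The object is a blessed scalar ref, so %{} and @{} can both be
    # overloaded without the handlers recursing into themselves.
    use overload
        '%{}'    => sub { ${ $_[0] }->{attrs} },
        '@{}'    => sub { ${ $_[0] }->{content} },
        '""'     => sub { $_[0]->as_string },
        fallback => 1;

    sub new {
        my ( $class, $first, @content ) = @_;
        my $data = { name => undef, attrs => {}, content => [@content] };

        if ( Scalar::Util::blessed($first) && $first->isa($class) ) {
            # Copy name and attributes only; the source's content is ignored.
            $data->{name}  = $first->name;
            $data->{attrs} = { %$first };
        }
        else {
            $data->{name} = $first;

            # An optional leading hash ref supplies the attributes.
            $data->{attrs} = { %{ shift @{ $data->{content} } } }
                if ref( $data->{content}[0] ) eq 'HASH';
        }
        return bless \$data, $class;
    }

    sub name { return ${ $_[0] }->{name} }

    # Attributes in sort() order, content joined, never an empty-element tag.
    sub as_string {
        my $self  = shift;
        my $d     = $$self;
        my $attrs = join "",
            map { ' ' . $_ . '="' . $d->{attrs}{$_} . '"' }
            sort keys %{ $d->{attrs} };
        return '<' . $d->{name} . $attrs . '>'
            . join( "", @{ $d->{content} } )
            . '</' . $d->{name} . '>';
    }
}

my $elt1 = Hypothetical::Elt->new( foo => { id => "10" }, "content" );
push @$elt1, " more";                                   # content via array deref

my $elt2 = Hypothetical::Elt->new($elt1);               # name + attributes only
my $elt3 = Hypothetical::Elt->new( $elt1, @$elt1 );     # content copied too

print "$elt1\n$elt2\n$elt3\n";
```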
Methods
- jclark_keys
Returns the names of attributes as a list of JamesClarkified keys, just like start_element's jclark_keys().
- name
Returns the name of the node according to the namespace stringification rules.
- jclark_name
Returns the name of the node in James Clark notation.
characters
aka: chars
my $e = chars "A stitch", " in time", " saves nine";
my $e = chars {
Data => "A stitch in time saves nine",
};
Stringifies like a string: A stitch in time saves nine.
Character events are aggregated.
TODO: make that aggregation happen.
comment
aka: (no abbreviation)
my $e = comment "A stitch in time saves nine";
my $e = comment {
Data => "A stitch in time saves nine",
};
Stringifies like a string: A stitch in time saves nine.
Implementation Details
References and blessed, tied or overloaded SAX events.
Instances of the Essex object model classes carry a reference to the original data (SAX events), rather than copying it. This means that there are fewer copies (a good thing, though there is an increased cost of getting at any data in the events) and that upstream filters may send blessed, tied, or overloaded objects to us and they will not be molested unless the Essex filter messes with them. There is also an implementation reason for this: it makes overloading hash accesses like $_->{} easier to implement.
Passing an Essex event to a constructor for a new Essex event does result in a deep copy of the referenced data (via XML::Essex::Event::clone()).
Class files, or the lack thereof
The objects in the Essex object model are not available independently as class files. You must use XML::Essex::Model to get at them. This is because there is a set of event types used in almost all SAX filters and it is cheaper to compile one file containing these than to open multiple files.
This does not mean that all classes are loaded when XML::Essex::Model is use()ed or require()ed; rare events are likely to be autoloaded.
Class names
In order to allow
my $e = XML::Essex::start_elt( ... );
to work as expected--in case the calling package prefers not to import start_elt(), for instance--the objects in the model are all in the XML::Essex::Event::... namespace, like XML::Essex::Event::start_element.
TODO
- Allow escaping to be configured
- Allow " vs. ' for attr quotes to be configured.
- Allow CDATA to be tested for, either by stringifying it or by allowing it to be returned as an array or something.
COPYRIGHT
Copyright 2002, R. Barrie Slaymaker, Jr., All Rights Reserved
LICENSE
You may use this module under the terms of the BSD, Artistic, or GPL licenses, any version.
AUTHOR
Barrie Slaymaker <barries@slaysys.com>