NAME
YAX::Query - Query the YAX DOM
SYNOPSIS
use YAX::Query;
$q = YAX::Query->new( $node );
$q->select( $expr );
# method interface
$q->parent();
$q->descendants();
$q->children( $type );
$q->child( $tag_name );
$q->attributes;
$q->attribute( $name );
$q->filter( \&code );
DESCRIPTION
This module implements a tool for querying a YAX DOM tree. It supports an expression parser for simple querying of the DOM using an E4X-ish syntax, as well as a method interface.
It is useful to note that a YAX::Query object is a blessed array reference and that the resulting nodes matching the query are stored in this array reference. Therefore all query methods return the query object itself, and to access the results you simply inspect this object. For example, the following searches for all text nodes which are children of `em' elements, which in turn are children of all `div' descendants:
my $q = YAX::Query->new( $node );
$q->select(q{..div.em.#text});
for my $found ( @$q ) {
    # $found is a YAX::Text node
}
The select method returns the query object itself, so the following, which selects all `li' descendants which have a `foo' attribute equal to "bar", also works:
for my $item ( @{ $q->select(q{..li.(@foo eq "bar")}) } ) {
    ...
}
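The loops above can dereference the query object directly because the object is itself the result array. The following minimal sketch illustrates that pattern with a hypothetical `Toy::Query' class over plain strings. It is not the YAX::Query implementation, only an illustration of a blessed array reference whose methods return the object itself:

```perl
use strict;
use warnings;

package Toy::Query;    # hypothetical class, for illustration only

# The object is a blessed array reference holding the current result set.
sub new {
    my ( $class, @items ) = @_;
    return bless [ @items ], $class;
}

# Replace the contents in place and return the object, so calls can chain.
sub filter {
    my ( $self, $code ) = @_;
    @$self = grep { $code->( $_ ) } @$self;
    return $self;
}

package main;

my $q = Toy::Query->new( qw( div em li span ) );
$q->filter( sub { length $_[0] == 2 } )->filter( sub { $_[0] ne 'em' } );

# The results are read by dereferencing the object itself.
my @found = @$q;
print "$_\n" for @found;
```

Because every method returns the same object, a chain of calls and a sequence of separate calls leave the same contents in the array.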
QUERY EXPRESSIONS
A query expression is constructed of a sequence of tokens separated by a literal `.' (dot). Each successive token represents an operation on the resulting set of the application of the previous token's operation.
In the initial state, the set of nodes contains only the context node passed to the constructor: YAX::Query->new( $node ).
Filters are enclosed in `(' and `)', and generally contain Perl expressions with the exception that tokens of the form /\@(\w+)/ are replaced with $_->{$1} where `$_' is the current node in the loop which is applying the filter.
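The substitution described above can be sketched in plain Perl. The helper name `compile_filter' is hypothetical, and the approach (a regex substitution followed by a string eval) is an assumption about how such a filter could be built, not a copy of the module's lexer. Plain hash references stand in for element nodes:

```perl
use strict;
use warnings;

# Hypothetical helper: turn a filter expression into a code reference.
# Assumption: a regex substitution plus string eval; the real lexer may differ.
sub compile_filter {
    my ( $expr ) = @_;

    # Rewrite each @name token as $_->{name}.
    ( my $perl = $expr ) =~ s/\@(\w+)/\$_->{$1}/g;

    my $code = eval "sub { $perl }";
    die "cannot compile filter '$expr': $@" unless $code;
    return $code;
}

# Plain hash references stand in for element nodes here.
my @nodes = (
    { foo => 'bar', id => 'a' },
    { foo => 'baz', id => 'b' },
    { foo => 'bar', id => 'c' },
);

my $filter = compile_filter( q{@foo eq "bar"} );

# grep aliases $_ to each node, which is what the compiled filter reads.
my @match = grep { $filter->() } @nodes;
print "$_->{id}\n" for @match;
```

Since the expression ends up in a string eval in this sketch, it should only ever be given trusted input.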
The following is a list of valid tokens:
- '..'
-
descendants of the set
- '.*'
-
all element children of the set
- '.element_name'
-
all elements named element_name
- '.@*'
-
all attributes of the set
NOTE: This adds the hash reference of the element itself, and not a list of attribute values. Moreover, adding a node selector after this in sequence is meaningless since attributes cannot have children. An exception will be raised if this occurs.
- '.@attribute_name'
-
all attributes named attribute_name
NOTE: This adds a list of attribute values to the set. As above, node selectors following this are meaningless, and will raise an exception.
- '.parent()'
-
parent nodes of the set
- '.#text'
-
all text children
- '.#processing-instruction'
-
all processing instruction children
- '.#cdata'
-
all CDATA children
- '.#node'
-
all child nodes of the set
- '.#comment'
-
all comment children of the set
- '.( $expr )'
-
Apply the filter $expr by turning it into a Perl code reference. Expressions are Perl with the exception that tokens of the form /\@(\w+)/ are replaced with $_->{$1} where `$_' is the current node in the loop which is applying the filter.
- '[n]'
-
the n-th element of the set
METHODS
- new( $node )
-
Constructor.
- select( $expr )
-
Evaluates $expr and returns the query object itself. The results are simply the elements in the query object, which is a blessed array reference. This allows for chaining and piecemeal querying. The following shows some different ways of achieving the same thing:
my $q = YAX::Query->new( $node );
$q->select('..div.*'); # get all children of all `div' descendants
$q->filter( \&filter ); # filter the set obtained on the line above
$q->select('..div.*')->filter( \&filter ); # same as the two lines above
# or the equivalent
@ids = grep { filter( $_ ) } @{ $q->select('..div.*') };
- parent()
-
See `.parent()' above
- children( $type )
-
Selects child nodes of type $type (see YAX::Constants for valid types). The `#text', `#cdata', `#processing-instruction' and `#comment' selectors are implemented with children(...).
- child( $name )
-
Selects elements named $name.
- attribute( $name )
-
Selects attribute values named $name.
- attributes()
-
Selects the attributes hash for each element in the set.
- descendants()
-
Selects descendants for each element in the set.
- filter(\&code)
-
Applies the passed code reference to each element in the set, adding the element to the resulting set iff the code reference returns a true value.
BUGS AND LIMITATIONS
Syntax errors in the expressions are currently not handled very well. If the expression doesn't parse, an exception is raised, but because of the simplicity of the lexer, the information required to inform the user of exactly what went wrong is unavailable.
Changing this would require a more complex parser, which would significantly impact performance, so I'm reluctant to implement it, since query expressions tend to be short enough to debug by inspection.
Result sets from a query are not "live". That is, if a node is removed from or added to the DOM tree after the query is performed, these changes will not be reflected in the query result set.
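The snapshot behaviour can be pictured with plain Perl data. In this sketch a hypothetical hash with a `children' list stands in for the DOM tree, and a grep stands in for the query; neither reflects the module's internals:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tree: a parent hash with a list of children.
my $parent = {
    name     => 'ul',
    children => [ { name => 'li' }, { name => 'li' } ],
};

# The "query" copies the matching nodes into a new array: a snapshot.
my @result = grep { $_->{name} eq 'li' } @{ $parent->{children} };

# Modify the tree after the query has run.
push @{ $parent->{children} }, { name => 'li' };

# The snapshot still holds two nodes, while the tree now holds three.
printf "snapshot: %d, tree now: %d\n",
    scalar @result, scalar @{ $parent->{children} };
```

In this sketch, running the grep again after the change is what picks up the new node.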
SEE ALSO
t/03-query.t in the test suite for an extensive list of examples
AUTHOR
Richard Hundt
LICENSE
This program is free software and may be used and distributed under the same terms as Perl itself.