NAME
HTML::Object::XPath - HTML Object XPath Class
SYNOPSIS
use HTML::Object;
use HTML::Object::XQuery;
use HTML::Object::XPath;
my $this = HTML::Object::XPath->new || die( HTML::Object::XPath->error, "\n" );
my $p = HTML::Object->new;
my $doc = $p->parse_file( $path_to_html_file ) || die( $p->error );
# Returns a list of HTML::Object::Element objects matching the selector, which is
# converted into an XPath expression
my @nodes = $doc->find( 'p' );
# or directly:
use HTML::Object::XPath;
my $xp = HTML::Object::XPath->new;
my @nodes = $xp->findnodes( $xpath, $element_object );
VERSION
v0.2.0
DESCRIPTION
This module implements the XPath engine used by HTML::Object::XQuery to provide a jQuery-like interface to query the parsed DOM object.
METHODS
clear_namespaces
Clears all previously set namespace mappings.
exists
Provided with a path and a context, this returns true if the given path exists.
findnodes
Provided with a path and a context, this returns a list of the nodes found by path, optionally within the given context.
In scalar context it returns an HTML::Object::XPath::NodeSet object.
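The two calling contexts can be sketched as follows. This is a minimal illustration, not a reference usage: the sample markup and the temporary file are ours, and passing the parsed document itself as the context is an assumption based on the SYNOPSIS, which passes an element object.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use HTML::Object;
use HTML::Object::XQuery;
use HTML::Object::XPath;

# Write a small sample document to a temporary file (sample markup is ours)
my( $fh, $file ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print( $fh "<html><body><p>One</p><p>Two</p></body></html>" );
close( $fh );

my $p   = HTML::Object->new;
my $doc = $p->parse_file( $file ) || die( $p->error );
my $xp  = HTML::Object::XPath->new || die( HTML::Object::XPath->error, "\n" );

# List context: a plain list of the matching nodes
my @nodes = $xp->findnodes( '//p', $doc );

# Scalar context: an HTML::Object::XPath::NodeSet object
my $set = $xp->findnodes( '//p', $doc );

printf( "%d node(s) in list context, %d in the node set\n", scalar( @nodes ), $set->size );
```

Both calls run the same query, so the list length and the node set size should agree.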
findnodes_as_string
Provided with a path and a context, this returns the nodes found as a single string. The result is not guaranteed to be valid HTML though (it could, for example, be just text if the query returns attribute values).
findnodes_as_strings
Provided with a path and a context, this returns the nodes found as a list of strings, one per node found.
findvalue
Provided with a path and a context, this returns the result as a string (the concatenation of the values of the result nodes).
findvalues
Provided with a path and a context, this returns the values of the result nodes as a list of strings.
matches($node, $path, $context)
Provided with a node object, a path and a context, this returns true if the node matches the path.
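A short sketch of exists and matches used as boolean tests. The sample markup is ours, and using the parsed document as the context is an assumption rather than something this documentation states.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use HTML::Object;
use HTML::Object::XQuery;
use HTML::Object::XPath;

# Sample document written to a temporary file (sample markup is ours)
my( $fh, $file ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print( $fh "<html><body><div><p>Hello</p></div></body></html>" );
close( $fh );

my $p   = HTML::Object->new;
my $doc = $p->parse_file( $file ) || die( $p->error );
my $xp  = HTML::Object::XPath->new || die( HTML::Object::XPath->error, "\n" );

# exists: is there at least one node for this path?
my $has_paragraph = $xp->exists( '//p', $doc );
print( "The document contains a paragraph\n" ) if( $has_paragraph );

# matches: does this particular node satisfy the path?
my( $first ) = $xp->findnodes( '//p', $doc );
if( $first && $xp->matches( $first, '//div/p', $doc ) )
{
    print( "The first paragraph sits directly inside a div\n" );
}
```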
find
Provided with a path and a context, this returns either an HTML::Object::XPath::NodeSet object containing the nodes it found (or empty if no nodes matched the path), or one of HTML::Object::XPath::Literal (a string), HTML::Object::XPath::Number, or HTML::Object::XPath::Boolean. It should always return something, and you can use ->isa() to find out what it returned. If you need to check how many nodes it found, check $nodeset->size.
See HTML::Object::XPath::NodeSet.
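Because find can return several kinds of object, callers typically branch on the class. The sketch below shows one way to do that with ->isa(). The helper describe_result is hypothetical (it is not part of this module), the sample markup is ours, and support for the standard XPath functions count() and string() is assumed.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use Scalar::Util qw( blessed );
use HTML::Object;
use HTML::Object::XQuery;
use HTML::Object::XPath;

# Hypothetical helper: report which of the four result classes was returned
sub describe_result
{
    my $result = shift( @_ );
    return( 'not an object' ) unless( blessed( $result ) );
    for my $type ( qw( NodeSet Literal Number Boolean ) )
    {
        return( $type ) if( $result->isa( "HTML::Object::XPath::$type" ) );
    }
    # Fall back to the raw class name for anything unexpected
    return( ref( $result ) );
}

# Sample document written to a temporary file (sample markup is ours)
my( $fh, $file ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print( $fh "<html><body><p>One</p><p>Two</p></body></html>" );
close( $fh );

my $p   = HTML::Object->new;
my $doc = $p->parse_file( $file ) || die( $p->error );
my $xp  = HTML::Object::XPath->new || die( HTML::Object::XPath->error, "\n" );

for my $path ( '//p', 'count(//p)', 'string(//p)' )
{
    my $result = $xp->find( $path, $doc );
    printf( "%-12s => %s\n", $path, describe_result( $result ) );
}
```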
get_namespace ($prefix, $node)
Provided with a prefix and a node object, this returns the URI associated with the prefix for the node (mostly for internal usage).
get_var
Provided with a variable name, this returns the value of the XPath variable (mostly for internal usage).
getNodeText
Provided with a path, this returns the text string for a particular node. It returns a string, or undef if the node does not exist.
namespaces
Sets or gets a hash reference of namespace attributes.
new_expr
Creates a new HTML::Object::XPath::Expr object, passing it whatever arguments were provided, and returns the newly instantiated object, or undef upon error.
new_function
Creates a new HTML::Object::XPath::Function object, passing it whatever arguments were provided, and returns the newly instantiated object, or undef upon error.
new_literal
Creates a new HTML::Object::XPath::Literal object, passing it whatever arguments were provided, and returns the newly instantiated object, or undef upon error.
new_location_path
Creates a new HTML::Object::XPath::LocationPath object, passing it whatever arguments were provided, and returns the newly instantiated object, or undef upon error.
new_nodeset
Creates a new HTML::Object::XPath::NodeSet object, passing it whatever arguments were provided, and returns the newly instantiated object, or undef upon error.
new_number
Creates a new HTML::Object::XPath::Number object, passing it whatever arguments were provided, and returns the newly instantiated object, or undef upon error.
new_root
Creates a new HTML::Object::XPath::Root object, passing it whatever arguments were provided, and returns the newly instantiated object, or undef upon error.
new_step
Creates a new HTML::Object::XPath::Step object, passing it whatever arguments were provided, and returns the newly instantiated object, or undef upon error.
new_variable
Creates a new HTML::Object::XPath::Variable object, passing it whatever arguments were provided, and returns the newly instantiated object, or undef upon error.
set_namespace
Provided with a prefix and a URI, this sets the namespace prefix mapping to the URI.
Normally in HTML::Object::XPath the prefixes in XPath node tests take their context from the current node. This means that foo:bar will always match an element <foo:bar> regardless of the namespace that the prefix foo is mapped to (which might even change within the document, resulting in unexpected results). In order to make prefixes in XPath node tests actually map to a real URI, you need to enable that via a call to the set_namespace method of your HTML::Object::XPath object.
parse
Provided with an XPath expression, this returns a new HTML::Object::XPath::Expr object that can then be used repeatedly.
You can create an XPath expression from a CSS selector expression using HTML::Selector::XPath.
set_strict_namespaces
Takes a boolean value.
By default, for historical as well as convenience reasons, HTML::Object::XPath has a slightly non-standard way of dealing with the default namespace.
If you search for //tag, it will return elements named tag. As far as I understand it, if the document has a default namespace, this should not return anything. You would have to first call set_namespace, and then search using the namespace.
Passing a true value to set_strict_namespaces activates this stricter behaviour; passing a false value restores the default behaviour.
set_var
Provided with a variable name and its value, this sets an XPath variable (that can be used in queries as $var).
NODE STRUCTURE
All nodes have the same first 2 entries in the array: node_parent and node_pos. The type of the node is determined using the ref() function.
The node_parent entry always refers to the parent of the current node, except for the root node, which has undef there. The node_pos entry is the position of this node in the array that it is in (think: $node == $node->[node_parent]->[node_children]->[$node->[node_pos]] ).
Nodes are structured as follows:
Root Node
The root node is just an element node with no parent.
[
undef, # node_parent - check for undef to identify root node
undef, # node_pos
undef, # node_prefix
[ ... ], # node_children (see below)
]
Element Node
[
$parent, # node_parent
<position in current array>, # node_pos
'xxx', # node_prefix - namespace prefix on this element
[ ... ], # node_children
'yyy', # node_name - element tag name
[ ... ], # node_attribs - attributes on this element
[ ... ], # node_namespaces - namespaces currently in scope
]
Attribute Node
[
$parent, # node_parent - the element node
<position in current array>, # node_pos
'xxx', # node_prefix - namespace prefix on this element
'href', # node_key - attribute name
'ftp://ftp.com/', # node_value - value in the node
]
Text Nodes
[
$parent,
<pos>,
'This is some text' # node_text - the text in the node
]
Comment Nodes
[
$parent,
<pos>,
'This is a comment' # node_comment
]
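The layouts above can be tried out with plain array references. The following is a standalone sketch, not the module's own code: the index constants are defined locally under the slot names used in this section (how the library itself exposes them is an assumption), and the nodes are left unblessed, whereas the real nodes are told apart with ref().

```perl
use strict;
use warnings;

# Slot indices named after the entries described above (defined here for
# illustration only; the library may expose them differently)
use constant {
    node_parent   => 0,
    node_pos      => 1,
    node_prefix   => 2,
    node_children => 3,
    node_name     => 4,
    node_text     => 2,
};

# Root node: no parent, no position, no prefix, an empty list of children
my $root = [ undef, undef, undef, [] ];

# Element node <p> stored as the first child of the root
my $elem = [ $root, 0, undef, [], 'p', [], [] ];
push( @{$root->[node_children]}, $elem );

# Text node stored as the first child of <p>
my $text = [ $elem, 0, 'This is some text' ];
push( @{$elem->[node_children]}, $text );

# The root is identified by its undefined node_parent
sub is_root
{
    my $node = shift( @_ );
    return( !defined( $node->[node_parent] ) );
}

# Check that $node == $node->[node_parent]->[node_children]->[ $node->[node_pos] ]
sub position_is_consistent
{
    my $node = shift( @_ );
    return(1) if( is_root( $node ) );
    my $siblings = $node->[node_parent]->[node_children];
    return( $siblings->[ $node->[node_pos] ] == $node );
}

for my $node ( $root, $elem, $text )
{
    printf( "root: %s, position consistent: %s\n",
        ( is_root( $node ) ? 'yes' : 'no' ),
        ( position_is_consistent( $node ) ? 'yes' : 'no' ) );
}
```

Comparing two references with == compares their addresses, which is what the identity in the text relies on.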
AUTHOR
Jacques Deguest <jack@deguest.jp>
SEE ALSO
HTML::Object::XPath::Boolean, HTML::Object::XPath::Expr, HTML::Object::XPath::Function, HTML::Object::XPath::Literal, HTML::Object::XPath::LocationPath, HTML::Object::XPath::NodeSet, HTML::Object::XPath::Number, HTML::Object::XPath::Root, HTML::Object::XPath::Step, HTML::Object::XPath::Variable
COPYRIGHT & LICENSE
Copyright(c) 2021 DEGUEST Pte. Ltd.
You can use, copy, modify and redistribute this package and associated files under the same terms as Perl itself.