NAME

Mojo::DOM::Role::Analyzer - miscellaneous methods for analyzing a DOM

SYNOPSIS

use strict;
use warnings;
use Mojo::DOM;

my $html = '<html><head></head><body><p class="first">A paragraph.</p><p class="last">boo<a>blah<span>kdj</span></a></p><h1>hi</h1></body></html>';
my $analyzer = Mojo::DOM->with_roles('+Analyzer')->new($html);

# return the number of elements inside a dom object
my $count = $analyzer->at('body')->element_count;

# get the smallest containing dom object that contains all the paragraph tags
my $containing_dom = $analyzer->common_ancestor('p');

# compare DOM objects to see which comes first in the document
my $tag1 = $analyzer->at('p.first');
my $tag2 = $analyzer->at('p.last');
my $result = $analyzer->compare($tag1, $tag2);

# ALTERNATIVELY

$analyzer->at('p.first')->compare('p.last');    # 'p.last' is relative to root

# get the depth level of a dom object relative to root
# root node returns '1'
my $depth = $analyzer->at('p.first')->depth;

# get the deepest depth of the documented
my $deepest = $analyzer->deepest;

# SEE DESCRIPTION BELOW FOR MORE METHODS

DESCRIPTION

Operators

cmp

my $result = $dom1 cmp $dom2;

Compares the selectors of two $dom objects to determine which comes first in the dom. See compare method below for return values.

Methods

closest_down

my $closest_down_dom = $dom->at('h1')->closest_down('p');

Returns the node closest to the tag node of interest by searching downward through the DOM.

Note that "closest" is defined as the node highest in the DOM that is still below the tag node of interest (or, in the case of closeest_up lowest in the DOM but still above the tag node of interest), not by the shortest distance (number of "hops") to the other node.

For example, in the code below, the <h1> tag containing "Heading 1" is five hops away from the <p> tag, while the other <h1> tag is only two hops away. But despite being more hops away, the <h1> tag containing "Header 1" is considered to be closer.

<p>Paragraph</p>
<div><div><div><div><h1>Heading 1</h1></div></div></div></div>
<h1>Heading 2</h2>

closest_up

my $closest_up_dom = $dom->at('p')->closest_up('h1');

Returns the node closest to the tag node of interest by searching upward through the DOM.

See the closest_down method for the meaning of the "closest" node and how it is calculated.

common

$dom->at($tag1)->common($tag2)

$dom->common($dom1, $dom2)

$dom->common($selector_str1, $selector_str2)

$dom->at($tag1)->common

# Find the common ancestor for two nodes
my $common $dom->at('div.bar')->common('div.foo');    # 'div.foo' is relative to root

# OR

# Pass in two $dom objects
my $dom1 = $dom->at('div.bar');
my $dom2 = $dom->at('div.foo');
my $common = $dom->common($dom1, $dom2);

# OR

# Pass in two selectors
my $common = $dom->common($dom->at('p')->selector, $dom->at('h1')->selector);

# OR

# Find the common ancestor for all paragraph nodes with class "foo"
# This syntax is a wrapper for the Mojo::Collection::Role::Extra->common method
my $common = $dom->at('p.foo')->common;

Returns the lowest common ancestor node between two nodes or between a node and a group of nodes sharing the same selector.

See "common" in Mojo::Collection::Role::Extra for a similar method that invoked on Mojo::Collection objects.

compare

$dom->at($tag1)->compare($tag2)

compare($dom1, $dom2)

$dom1 cmp $dom2

$dom->at('p.first')->compare('p.last');    # 'p.last' is relative to root

# OR

my $dom1 = $dom->at('p.first');
my $dom2 = $dom->at('p.last');
my $result = $dom->compare($dom1, $dom2);

# OR with overloaded 'cmp' operator

my $result = $dom1 cmp $dom2;

Compares the selectors of two $dom objects to see which comes first in the DOM.

  • Returns a value of '-1' if the first argument comes before (is less than) the second.

  • Returns a value of '0' if the first and second arguments are the same.

  • Returns a value of '1' if the first argument comes after (is greater than) the second.

deepest

my $deepest_depth = $dom->deepest;

Finds the deeepest nested level within a node.

depth

my $depth = $dom->at('p.first')->depth;

Finds the nested depth level of a node. The root node returns 1.

distance

$dom->at($selector)->distance($selector)

$dom->at($selector)->distance($dom)

$dom->distance($dom1, $dom2)

Returns the distance, aka number of "hops," between two nodes.

The value is calculated by first finding the lowest common ancestor node for the two nodes and then getting the distance between the lowest common ancestor node and each of the two nodes. The two distances are then added togethr to determine the total distance between the two nodes.

element_count

$count = $dom->element_count;

Returns the number of elements in a dom object, including children of children of children, etc.

is_ancestor_to

$is_ancestor = $s->('h1')->is_ancestor_to('p.foo');

Returns true if a node is an ancestor to another node, false otherwise.

tag_analysis

@enclosing_tags = $dom->tag_analysis('p');

Searches through a DOM for tag nodes that enclose tags matching the given selector (see common_ancestor method) and returns an array of hash references with the following information for each of the enclosing nodes:

{
  "all_tags_have_same_depth" => 1,   # whether enclosed tags within the enclosing node have the same depth
  "avg_tag_depth" => 8,              # average depth of the enclosed tags
  "selector" => "body:nth-child(2)", # the selector for the enclosing tag
  "size" => 1                        # total number of tags of interest that are descendants of the enclosing tag
}

VERSION

version 0.013

SUPPORT

Perldoc

You can find documentation for this module with the perldoc command.

perldoc Mojo::DOM::Role::Analyzer

Websites

The following websites have more information about this module, and may be of help to you. As always, in addition to those websites please use your favorite search engine to discover more resources.

Source Code

The code is open to the world, and available for you to hack on. Please feel free to browse it and play with it, or whatever. If you want to contribute patches, please send me a diff or prod me to pull from your repository :)

https://github.com/sdondley/Mojo-DOM-Role-Analyzer

git clone git://github.com/sdondley/Mojo-DOM-Role-Analyzer.git

SEE ALSO

Mojo::DOM Mojo::Collection::Role::Extra

AUTHOR

Steve Dondley <s@dondley.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2020 by Steve Dondley.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.