
NAME

HTML::RelExtor - Extract "rel" and "rev" information from LINK and A tags.

SYNOPSIS

use HTML::RelExtor;

my $parser = HTML::RelExtor->new();
$parser->parse($html);

for my $link ($parser->links) {
    print $link->href, "\n" if $link->has_rel('nofollow');
}

my($canonical) = grep $_->has_rev('canonical'), $parser->links;
if ($canonical) {
    $shorten_url = $canonical->href;
}

DESCRIPTION

HTML::RelExtor is an HTML parser module that extracts relationship information ("rel" and "rev" attributes) from A and LINK tags.

METHODS

new
$parser = HTML::RelExtor->new();
$parser = HTML::RelExtor->new(base => $base_uri);

Creates a new HTML::RelExtor object, optionally with a base URI.

parse
$parser->parse($html);

Parses HTML content. See HTML::Parser for other method signatures.

links
my @links = $parser->links();
my @links = $parser->links(rel => 'alternate');
my @links = $parser->links(rev => 'canonical');

Returns a list of HTML::RelExtor::Link objects, one for each link that has a 'rel' or 'rev' attribute. When given a rel or rev parameter, returns only the links that have that rel or rev value.

# These are equivalent
@links = $parser->links(rel => 'alternate');
@links = grep $_->has_rel('alternate'), $parser->links;
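
As a minimal sketch of the filtering described above, the following parses a small document and selects links by relationship. The HTML snippet and the example.com URLs are made up for illustration; the call to eof is inherited from HTML::Parser and is added here only to flush any buffered input.

```perl
use strict;
use warnings;
use HTML::RelExtor;

# Hypothetical input: two LINK tags and two A tags, one of them without rel/rev.
my $html = <<'HTML';
<html><head>
<link rel="alternate" type="application/rss+xml" href="http://example.com/feed.xml">
<link rel="stylesheet" href="http://example.com/site.css">
</head><body>
<a href="http://example.com/login" rel="nofollow">Log in</a>
<a href="http://example.com/about">About</a>
</body></html>
HTML

my $parser = HTML::RelExtor->new();
$parser->parse($html);
$parser->eof;    # HTML::Parser method: signal end of document

# All links that carry a rel or rev attribute
my @all = $parser->links;

# Filter through the rel parameter, or with grep and has_rel
my @alternate = $parser->links(rel => 'alternate');
my @nofollow  = grep { $_->has_rel('nofollow') } $parser->links;

printf "%d links carry rel or rev\n", scalar @all;
print "feed: ",     $_->href, "\n" for @alternate;
print "nofollow: ", $_->href, "\n" for @nofollow;
```

The "About" link has neither rel nor rev, so it should not appear in any of the lists.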

HTML::RelExtor::Link METHODS

href
my $href = $link->href;

Returns the link's 'href' attribute.

tag
my $tag = $link->tag;

Returns the tag name of the link in lowercase, either 'a' or 'link'.

attr
my $attr = $link->attr;

Returns a hash reference of attributes of the tag.
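
For illustration, attributes other than href, rel and rev can be read from the hash reference returned by attr. This is a sketch: the LINK tag and its type and title values are invented sample input, and the lowercase keys assume HTML::Parser's default attribute-name handling.

```perl
use strict;
use warnings;
use HTML::RelExtor;

my $parser = HTML::RelExtor->new();
# Hypothetical LINK tag carrying extra attributes
$parser->parse('<link rel="alternate" type="application/rss+xml" '
             . 'title="News feed" href="http://example.com/feed.xml">');
$parser->eof;    # HTML::Parser method: signal end of document

my ($link) = $parser->links;
my $attr   = $link->attr;    # hash reference of the tag's attributes

print "tag:   ", $link->tag, "\n";
print "type:  $attr->{type}\n";
print "title: $attr->{title}\n";
```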

rel
my @rel = $link->rel;

Returns the list of values in the 'rel' attribute. If a link is written as <a rel="tag nofollow">blahblah</a>, the rel() method returns a list that contains tag and nofollow.
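
A short sketch of how a multi-valued rel attribute comes back as a list. The anchor tag and its URL are made-up sample input; only the methods documented in this section are used, plus eof from HTML::Parser.

```perl
use strict;
use warnings;
use HTML::RelExtor;

my $parser = HTML::RelExtor->new();
# Hypothetical anchor with two space-separated rel values
$parser->parse('<a href="http://example.com/tags/perl" rel="tag nofollow">perl</a>');
$parser->eof;    # HTML::Parser method: signal end of document

my ($link) = $parser->links;
my @rel    = $link->rel;    # one element per rel value

print "rel values: ", join(', ', @rel), "\n";
print "this is a tag link\n" if $link->has_rel('tag');
```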

rev
my @rev = $link->rev;

Returns the list of values in the 'rev' attribute.

has_rel
if ($link->has_rel('nofollow')) { }

A handy shortcut method to find out if a link has a specific relationship.

has_rev
if ($link->has_rev('canonical')) { }

A handy shortcut method to find out if a link has a specific reverse relationship.

text
my $text = $link->text;

Returns the text inside the tag. Only available for A tags; returns undef for LINK tags.
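
Because text returns undef for LINK tags, callers that handle both tag types may want to check for definedness first. A minimal sketch, using invented sample HTML:

```perl
use strict;
use warnings;
use HTML::RelExtor;

# Hypothetical input: one LINK tag and one A tag
my $html = <<'HTML';
<link rel="alternate" href="http://example.com/feed.xml">
<a href="http://example.com/alice" rel="friend">Alice</a>
HTML

my $parser = HTML::RelExtor->new();
$parser->parse($html);
$parser->eof;    # HTML::Parser method: signal end of document

for my $link ($parser->links) {
    my $text = $link->text;
    # LINK tags have no inner text, so fall back to a placeholder
    my $label = defined $text ? qq{"$text"} : '(no text)';
    printf "%-4s %s %s\n", $link->tag, $link->href, $label;
}

# Pick out one link of each kind for inspection
my ($anchor)   = grep { $_->tag eq 'a' }    $parser->links;
my ($link_tag) = grep { $_->tag eq 'link' } $parser->links;
```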

EXAMPLES

Collect A links tagged with rel="friend" used in XFN (XHTML Friend Network).

my $p = HTML::RelExtor->new();
$p->parse($html);

my @links = map { $_->href }
    grep { $_->tag eq 'a' && $_->has_rel('friend') } $p->links;

TODO

  • Accept callback parameter when creating a new instance.

AUTHOR

Tatsuhiko Miyagawa <miyagawa at bulknews.net>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

HTML::LinkExtor, HTML::Parser

http://www.w3.org/TR/REC-html40/struct/links.html

http://www.google.com/googleblog/2005/01/preventing-comment-spam.html

http://developers.technorati.com/wiki/RelTag

http://gmpg.org/xfn/11

http://shiflett.org/blog/2009/apr/save-the-internet-with-rev-canonical