NAME
HTML::RobotsMETA - Parse HTML For Robots Exclusion META Markup
SYNOPSIS
use HTML::RobotsMETA;
my $p = HTML::RobotsMETA->new;
my $r = $p->parse_rules($html);
if ($r->can_follow) {
    # follow links here!
} else {
    # can't follow...
}
DESCRIPTION
HTML::RobotsMETA is a simple HTML::Parser subclass that extracts robots exclusion information from meta tags. There's not much more to it ;)
DIRECTIVES
Currently HTML::RobotsMETA understands the following directives:
- ALL
- NONE
- INDEX
- NOINDEX
- FOLLOW
- NOFOLLOW
- ARCHIVE
- NOARCHIVE
- SERVE
- NOSERVE
- NOIMAGEINDEX
- NOIMAGECLICK
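To make the list above concrete, here is a minimal sketch of how the content attribute of a robots META tag breaks down into directive names. The helpers split_directives and expand_directives are hypothetical and are not part of HTML::RobotsMETA; the sketch assumes the usual robots META convention that ALL is shorthand for INDEX plus FOLLOW and NONE is shorthand for NOINDEX plus NOFOLLOW, and it does not claim to mirror how the module handles them internally.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::RobotsMETA: split the content
# attribute of a robots META tag into upper-cased directive names.
sub split_directives {
    my ($content) = @_;
    return grep { length $_ } map { uc $_ } split /[\s,]+/, $content;
}

# Hypothetical helper: expand the conventional shorthands. ALL becomes
# INDEX and FOLLOW, NONE becomes NOINDEX and NOFOLLOW (an assumption based
# on the common robots META convention). Other names pass through unchanged.
sub expand_directives {
    my @names = @_;
    my %shorthand = (
        ALL  => [qw(INDEX FOLLOW)],
        NONE => [qw(NOINDEX NOFOLLOW)],
    );
    my @expanded;
    for my $name (@names) {
        push @expanded, $shorthand{$name} ? @{ $shorthand{$name} } : $name;
    }
    return @expanded;
}

# Directive names are case-insensitive in practice, so normalize first.
my @directives = expand_directives(split_directives('none, NoArchive'));
print join(' ', @directives), "\n";
```

Normalizing to upper case before comparing keeps the lookup table small, since pages write these directives in any mix of cases.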
METHODS
new
Creates a new HTML::RobotsMETA parser. Takes no arguments.
parse_rules
Parses an HTML string for META tags and returns an HTML::RobotsMETA::Rules object, which you can query in conditionals later.
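As a rough illustration of parse_rules on a complete input, the sketch below feeds it a small HTML document. The document itself is made up for this example, and only the calls shown in the SYNOPSIS (new, parse_rules, can_follow) are used; any other query methods should be looked up in HTML::RobotsMETA::Rules.

```perl
use strict;
use warnings;
use HTML::RobotsMETA;

# Sample document, made up for this example. Its robots META tag allows
# indexing but asks crawlers not to follow links.
my $html = <<'HTML';
<html>
<head>
<title>Example</title>
<meta name="robots" content="index, nofollow">
</head>
<body><a href="/next">next</a></body>
</html>
HTML

my $parser = HTML::RobotsMETA->new;
my $rules  = $parser->parse_rules($html);

# can_follow is the only rules method shown in the SYNOPSIS, so it is
# the only one used here.
print $rules->can_follow ? "may follow links\n" : "must not follow links\n";
```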
parser
Returns the HTML::Parser instance to use.
get_parser_callbacks
Returns the callback specs to be passed to the HTML::Parser constructor.
TODO
Tags that specify the crawler name (e.g. <META NAME="Googlebot">) are not handled yet.
There might also be more obscure directives that I'm not aware of.
AUTHOR
Copyright (c) 2007 Daisuke Maki <daisuke@endeworks.jp>
SEE ALSO
HTML::RobotsMETA::Rules, HTML::Parser
LICENSE
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html