NAME
CrawlerCommons::RobotRules - the result of parsing a robots.txt file
SYNOPSIS
    use strict;
    use warnings;
    use feature 'say';

    use CrawlerCommons::RobotRules;
    use CrawlerCommons::RobotRulesParser;

    my $rules_parser = CrawlerCommons::RobotRulesParser->new;

    my $content      = "User-agent: *\r\nDisallow: *images";
    my $content_type = "text/plain";
    my $robot_names  = "any-old-robot";
    my $url          = "http://domain.com/";

    my $robot_rules =
      $rules_parser->parse_content($url, $content, $content_type, $robot_names);

    # obtain the 'mode' of the robot rules object
    say "Anything Goes!!!!"           if $robot_rules->is_allow_all;
    say "Nothing to see here!"        if $robot_rules->is_allow_none;
    say "Default robot rules mode..." if $robot_rules->is_allow_some;

    # are we allowed to crawl a URL? (returns 1 if so, 0 if not)
    say "We're allowed to crawl the index :)"
      if $robot_rules->is_allowed("https://www.domain.com/index.html");

    for ("http://www.domain.com/images/some_file.png",
         "http://www.domain.com/images/another_file.png") {
        say "Not allowed to crawl: $_" unless $robot_rules->is_allowed($_);
    }
DESCRIPTION
This object represents the result of parsing a single robots.txt file. It is returned by the parse_content method of CrawlerCommons::RobotRulesParser (see SYNOPSIS) and reports whether URLs may be crawled under the parsed rules.
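For example, the mode predicates shown in the SYNOPSIS reflect how the ruleset was produced. A minimal sketch, assuming this module follows the Java crawler-commons convention of returning an allow-all ruleset for empty robots.txt content (the URL and robot name are illustrative):

    use strict;
    use warnings;
    use feature 'say';

    use CrawlerCommons::RobotRulesParser;

    my $parser = CrawlerCommons::RobotRulesParser->new;

    # Empty content is assumed to yield an allow-all ruleset, as in the
    # Java crawler-commons library on which this module is modelled.
    my $rules = $parser->parse_content(
        "http://domain.com/robots.txt",   # source URL (illustrative)
        "",                               # empty robots.txt body
        "text/plain",
        "any-old-robot",
    );

    say "everything may be crawled" if $rules->is_allow_all;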
VERSION
Version 0.03
METHODS
my $true_or_false = $robot_rules->is_allowed( $url )
Returns 1 if we are allowed to crawl the URL represented by $url, and 0 otherwise. If is_allow_all() is true the method always returns 1, and if is_allow_none() is true it always returns 0. Otherwise it returns 1 when the URL's path matches an allow rule, or when no disallow rule matches it.
$url
The URL whose path is matched against this object's rules to decide whether crawling is permitted.
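As a concrete illustration of the behaviour described above, a minimal sketch (the robots.txt body and URLs are invented for the example, and the more specific allow rule is assumed to take precedence over the disallow rule, as in crawler-commons' longest-match handling):

    use strict;
    use warnings;
    use feature 'say';

    use CrawlerCommons::RobotRulesParser;

    my $parser = CrawlerCommons::RobotRulesParser->new;
    my $rules  = $parser->parse_content(
        "http://domain.com/robots.txt",
        "User-agent: *\r\nDisallow: /private/\r\nAllow: /private/public.html",
        "text/plain",
        "any-old-robot",
    );

    say $rules->is_allowed("http://domain.com/index.html");
    # 1: no disallow rule matches this path
    say $rules->is_allowed("http://domain.com/private/secret.html");
    # 0: matches "Disallow: /private/"
    say $rules->is_allowed("http://domain.com/private/public.html");
    # 1: the more specific allow rule applies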
AUTHOR
Adam Robinson <akrobinson74@gmail.com>