NAME

KeywordsSpider - web spider searching for keywords

SYNOPSIS

use KeywordsSpider;
KeywordsSpider::run(
  outfile => "output_test",
  infile => "export.sql",
  keyfile => "keywords_new",
  debug => 1,
  skip_ref_regexp => "(^http://trala|^null|twig.html\$)",
  allowed_keywords => "allowed_keywords",
  web_depth => 5
);

DESCRIPTION

KeywordsSpider is a web spider that reads URLs and keywords from files and writes the URLs matching the keywords to another file.

A referer can be specified for each website in the input file. Its domain is matched against the website's domain.

It crawls in 10 parallel processes. It takes file names as arguments and prepares the attributes for KeywordsSpider::Core.

ARGUMENTS

infile

File containing website and referer URLs, one comma-separated pair per line. For example:

'domain.sk/twig.html','null'
domain.sk,domain2.sk
another-domain.sk/twig.html,null
another-domain.sk/twig.html,http://trala.sk

Do not put a space after the comma. The apostrophes are optional.
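
The line format above can be read with a few lines of Perl. This is only a minimal sketch, not the module's own parser: parse_infile_line is a hypothetical helper, and it assumes that neither URL contains a comma.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KeywordsSpider: split one infile line
# into (website, referer) and drop the optional surrounding apostrophes.
sub parse_infile_line {
    my ($line) = @_;
    chomp $line;

    # Split on the first comma only; no space is expected after it.
    my ($website, $referer) = split /,/, $line, 2;

    for ($website, $referer) {
        next unless defined;
        s/^'//;    # leading apostrophe, if any
        s/'$//;    # trailing apostrophe, if any
    }
    return ($website, $referer);
}

for my $line ("'domain.sk/twig.html','null'", "domain.sk,domain2.sk") {
    my ($website, $referer) = parse_infile_line($line);
    print "website=$website referer=$referer\n";
}
```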

keyfile

File with newline-separated keywords. For example:

word1
wuord2
wiaord3

allowed_keywords

File with newline-separated keywords that do not trigger an ALERT in the output file. For example:

wuord2
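
One way to picture the effect of this list is as a filter over the keywords found on a page. The sketch below is an assumption about the behaviour, not the module's code: alerting_keywords is a hypothetical helper, and both lists are taken to be already loaded into arrays.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of KeywordsSpider: return the found
# keywords that are not on the allowed list (assumed to cause an ALERT).
sub alerting_keywords {
    my ($found, $allowed) = @_;
    my %is_allowed = map { $_ => 1 } @$allowed;
    return grep { !$is_allowed{$_} } @$found;
}

# word1 and wuord2 were found; only wuord2 is on the allowed list.
my @alerts = alerting_keywords([qw(word1 wuord2)], [qw(wuord2)]);
print "ALERT: @alerts\n" if @alerts;
```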

outfile

File the results are written to.

debug

Set to a true value to print debug messages to standard output. It is turned off by default.

skip_ref_regexp

You can specify various referers for the same website. If you don't want to crawl a specific domain, or any URL containing a given part, put a regular expression matching it here. For example:

(^http://trala|^null|twig.html\$)
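
To check what such a pattern matches before a run, it can be tried against a few sample strings. This is a minimal sketch: is_skipped is a hypothetical helper, and whether the module applies the pattern to the referer, the website URL, or both is not stated here. The pattern is built from the same double-quoted string as in the SYNOPSIS, where \$ yields a plain end-of-string anchor.

```perl
use strict;
use warnings;

# Same double-quoted string as in the SYNOPSIS; "\$" becomes the regex
# anchor "$" (unescaped, "$)" would be read as a Perl variable).
my $pattern = "(^http://trala|^null|twig.html\$)";
my $skip    = qr/$pattern/;

# Hypothetical helper, not part of KeywordsSpider.
sub is_skipped {
    my ($url) = @_;
    return $url =~ $skip ? 1 : 0;
}

for my $url ('http://trala.sk', 'null', 'domain2.sk', 'domain.sk/twig.html') {
    printf "%-22s %s\n", $url, is_skipped($url) ? 'skipped' : 'kept';
}
```

With this pattern, domain2.sk is the only one of the four sample strings that is kept.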

METHODS

run ARGS: runs the spider with the arguments described above.

COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

