NAME

WWW::PkgFind - Spiders given URL(s) downloading wanted files

SYNOPSIS

    my $Pkg = new WWW::PkgFind("foobar");

    $Pkg->depth(3);
    $Pkg->active_urls("ftp://ftp.somesite.com/pub/joe/foobar/");
    $Pkg->wanted_regex('patch-2\.6\..*gz', 'linux-2\.6\.\d+\.tar\.bz2');
    $Pkg->set_create_queue("/testing/packages/QUEUE");
    $Pkg->retrieve();

DESCRIPTION

TODO

FUNCTIONS

new([$pkg_name], [$agent_desc])

Creates a new WWW::PkgFind object.

package_name()

Gets/sets the package name.
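A minimal sketch of the constructor and accessor. The second constructor argument is assumed to describe the robot's user agent (inferred from the signature above, not documented here), and package_name() is assumed to act as a get/set accessor:

    use WWW::PkgFind;

    # Both arguments are optional; the agent description is assumed to
    # annotate the user agent string (inferred from the signature).
    my $Pkg = new WWW::PkgFind("foobar", "example-mirror-bot/1.0");

    print $Pkg->package_name(), "\n";   # "foobar"
    $Pkg->package_name("foobar-ng");    # assumed setter form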

depth()

wanted_regex()

not_wanted_regex()

rename_regex()

active_urls()

robot_urls()

files()

processed()
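These accessors are listed without descriptions. The sketch below shows how the first few are used in the SYNOPSIS; the remaining calls and their semantics are assumptions, not documented behaviour:

    $Pkg->depth(3);                              # spider this many levels deep
    $Pkg->active_urls("ftp://ftp.somesite.com/pub/joe/foobar/");
    $Pkg->wanted_regex('patch-2\.6\..*gz');
    $Pkg->not_wanted_regex('.*\.sign$');         # assumed: patterns to skip
    $Pkg->rename_regex('s/\.tgz$/.tar.gz/');     # assumed: rename rule for saved files
    my %robots = $Pkg->robot_urls();             # assumed: robots.txt cache, see get_file()
    my @found  = $Pkg->files();                  # assumed: files found so far
    my @done   = $Pkg->processed();              # assumed: URLs already visited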

set_create_queue($dir)

Specifies that the retrieve() routine should also create a symlink queue in the specified directory.
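For example (the queue path is illustrative):

    $Pkg->set_create_queue("/testing/packages/QUEUE");
    $Pkg->retrieve();
    # Downloaded files should now also be symlinked under
    # /testing/packages/QUEUE for further processing.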

set_debug($debug)

Sets the debug level. Set to 0 or undef to turn debugging off.
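For example:

    $Pkg->set_debug(1);       # enable debug output
    $Pkg->set_debug(undef);   # turn it back off (0 also works)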

want_file($file)

Checks the given file name against the regular expressions in the Pkg hash. Returns 1 (true) if the file matches at least one wanted regexp and none of the not_wanted regexps. If the file matches a not-wanted regexp, it returns 0 (false). If no regexp matches at all, it returns undef (false).
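A short sketch of the three outcomes, assuming wanted/not-wanted patterns like those above have been set. Note that 0 and undef are both false in Perl, so use defined() if you need to tell those two cases apart:

    $Pkg->wanted_regex('patch-2\.6\..*gz');
    $Pkg->not_wanted_regex('.*\.sign$');       # assumed accessor, see above

    $Pkg->want_file("patch-2.6.17.gz");        # 1:     matches a wanted regexp
    $Pkg->want_file("patch-2.6.17.gz.sign");   # 0:     matches a not-wanted regexp
    $Pkg->want_file("README");                 # undef: matches nothing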

get_file($url, $dest)

Retrieves the given URL, returning true if the file was successfully obtained and placed at $dest, false if something prevented this from happening.

get_file also checks for and respects robot rules, updating the $rules object as needed, and caching the URLs it has checked in %robot_urls. $robot_urls{$url} will be >0 if a robots.txt was found and parsed, <0 if no robots.txt was found, and undef if the URL has not yet been checked.
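A minimal sketch of calling get_file() directly (retrieve() normally drives this for you; the URL and destination below are illustrative):

    my $url  = "ftp://ftp.somesite.com/pub/joe/foobar/patch-2.6.17.gz";
    my $dest = "/tmp/patch-2.6.17.gz";

    if ($Pkg->get_file($url, $dest)) {
        print "Fetched $url to $dest\n";
    } else {
        warn "Could not fetch $url (failed download or disallowed by robots.txt)\n";
    }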

retrieve()

Spiders the active URLs, downloading the files that want_file() accepts (see also set_create_queue() above).

AUTHOR

Bryce Harrington <bryce@osdl.org>

COPYRIGHT

Copyright (C) 2006 Bryce Harrington. All Rights Reserved.

This script is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

perl