NAME
LWP::UserAgent::ProxyHopper - LWP::UserAgent with proxy-hopping
SYNOPSIS
use strict;
use warnings;
use LWP::UserAgent::ProxyHopper;
my $ua = LWP::UserAgent::ProxyHopper->new( agent => 'fox', timeout => 10 );
$ua->proxify_load;
for ( 1 .. 5 ) {
    my $response = $ua->proxify_get('http://www.privax.us/ip-test/');
    if ( $response->is_success ) {
        my $content = $response->content;
        if ( my ( $ip ) = $content
            =~ m|<p>.+?IP Address:\s*</strong>\s*(.+?)\s+|s
        ) {
            printf "\n\nSuccess!!!\n%s\n", $ip;
        }
        else {
            printf "Response is successful but it seems we got the wrong"
                . " page... here is what we got:\n%s\n", $content;
        }
    }
    else {
        print '[script] Network error: ' . $response->status_line . "\n";
    }
}
DESCRIPTION
The module is a subclass of LWP::UserAgent which adds extra functionality for making proxy-hopping requests. In other words, each request can be sent out through a different proxy server.
HOW GOOD IS IT?
Don't get your hopes up too high... unless you can feed the module 100% working and fast proxies. Even though the module does some basic checks on whether a request succeeded and blacklists proxies that appear to be really bad, there is still a good chance that either (a) your request will time out after several tries or, worse, (b) your request will succeed but return something other than what you expected, as some proxies tend to drop garbage on you. Your mileage will vary depending on the settings; it's a speed-for-quality trade-off.
HOW IT WORKS
The module fetches a list of proxy servers (see the proxify_load() method). When one of the proxify_*() request methods is called, it takes a proxy from the list and tries to make your request through that proxy. If the request succeeds, the module checks the response for a couple of known signs of a proxy returning something other than what you wanted, and retries the request with a different proxy if that is the case. If this check raises no suspicion, the result (an HTTP::Response object) is returned to you and the proxy that was used is put on a "working" list. If the request fails, the module does a basic check on the returned status code and decides whether to put the proxy on a "bad" list or a "real_bad" list, after which it retries. The number of retries is set by the retries argument to the proxify_load() method.
When the original proxy list is exhausted, the module makes a new list out of the proxies it previously put on the "working" list; if there are none, it uses the "bad" list, which might still contain working proxies. The "real_bad" list is never used. If neither the "working" nor the "bad" list has any proxies left, the module calls proxify_load() automatically with the same arguments you used last time, so your program can run for a long time with just one call to proxify_load() during startup.
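The fallback order described above can be modelled with plain arrays. This is a minimal sketch, not the module's actual code: the next_proxy() helper and the queue, working, bad and real_bad hash keys are hypothetical names, and the $reload callback merely stands in for the automatic proxify_load() call.

```perl
use strict;
use warnings;

# Hypothetical model of the lists described in HOW IT WORKS; these are
# not the module's internals. "queue" is the list requests draw from.
sub next_proxy {
    my ( $lists, $reload ) = @_;
    unless ( @{ $lists->{queue} } ) {
        if ( @{ $lists->{working} } ) {
            # First fallback: reuse proxies that worked before.
            $lists->{queue}   = $lists->{working};
            $lists->{working} = [];
        }
        elsif ( @{ $lists->{bad} } ) {
            # Second fallback: the "bad" list may still hold working proxies.
            $lists->{queue} = $lists->{bad};
            $lists->{bad}   = [];
        }
        else {
            # Nothing left: reload (stand-in for a repeated proxify_load()).
            # The "real_bad" list is never consulted.
            $lists->{queue} = $reload->();
        }
    }
    return shift @{ $lists->{queue} };
}

my %lists = (
    queue    => [],
    working  => ['http://192.0.2.1:8080/'],
    bad      => ['http://192.0.2.2:3128/'],
    real_bad => ['http://192.0.2.3:80/'],
);
my $reload = sub { ['http://192.0.2.4:8080/'] };

my @order = map { next_proxy( \%lists, $reload ) } 1 .. 3;
print "$_\n" for @order;
```

Running it prints the "working" proxy first, then the "bad" one, then the freshly reloaded one; the "real_bad" entry is never handed out.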
PROVIDED METHODS
The module is a subclass of LWP::UserAgent, so you can use any of LWP::UserAgent's methods as you would before. All the methods this module adds are prefixed with proxify_.
proxify_load
$your_ua->proxify_load; # plain defaults
$your_ua->proxify_load( # juicy override
    freeproxylists => 1,
    plan_b         => 1,
    proxy4free     => 0,
    timeout        => 20,
    debug          => 0,
    retries        => 5,
    extra_proxies  => [],
    schemes        => [ 'http', 'ftp' ],
    get_list_args  => {
        freeproxylists => [ type => 'anonymous' ],
        proxy4free     => [ [2,3] ],
    },
);
Instructs the object to load a list of proxies. You must call this method at least once before calling any of the other proxify_* request methods. The return value is an arrayref of proxy addresses of the form "http://122.122.122.122:8080/". The method will croak() if the proxy list is still empty after it has tried to fetch the proxy lists and added extra_proxies (see below). The method takes quite a few arguments, all given as key/value pairs and all optional. The possible arguments are as follows:
freeproxylists
$your_ua->proxify_load( freeproxylists => 1 );
Optional. The module uses the WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom modules to get proxy lists. If you set the freeproxylists argument to a false value, the module will not attempt to load any proxies from the http://freeproxylists.com/ website. Defaults to: 1
proxy4free
$your_ua->proxify_load( proxy4free => 0 );
Optional. The module uses the WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom modules to get proxy lists. If you set the proxy4free argument to a false value (which is the default), the module will not attempt to load any proxies from the http://www.proxy4free.com/ website. Defaults to: 0
plan_b
$your_ua->proxify_load( plan_b => 1 );
Optional. When set to a true value, enables a "Plan B" mechanism: when plan_b and freeproxylists are both set to true values and the fetch from http://freeproxylists.com/ did not give us any proxies, the module will fetch a list from the http://www.proxy4free.com/ website regardless of whether proxy4free is set to a true value. This is a fallback for the case where http://freeproxylists.com is down while proxy4free is set to a false value to speed up loading of the proxy list. Defaults to: 1 (enabled)
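The Plan B rule can be restated as a small decision function. This is a minimal sketch under stated assumptions: load_proxies() is a hypothetical name, and the fetch_fpl and fetch_p4f callbacks are stand-ins for the list fetches done through WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom. It only models which sources get fetched, not the module's real loading code (it does not croak on an empty list, for instance).

```perl
use strict;
use warnings;

# Hypothetical model of the plan_b rule; fetch_fpl / fetch_p4f are
# stand-ins for the two proxy list sites and return lists of proxies.
sub load_proxies {
    my (%args) = @_;
    my @proxies;
    my $fpl_count = 0;

    if ( $args{freeproxylists} ) {
        my @fpl = $args{fetch_fpl}->();
        $fpl_count = @fpl;
        push @proxies, @fpl;
    }

    # Plan B: freeproxylists was asked for but gave nothing.
    my $use_plan_b = $args{plan_b} && $args{freeproxylists} && !$fpl_count;
    if ( $args{proxy4free} || $use_plan_b ) {
        push @proxies, $args{fetch_p4f}->();
    }

    push @proxies, @{ $args{extra_proxies} || [] };
    return \@proxies;
}

my $down = sub { () };                          # freeproxylists.com gave nothing
my $p4f  = sub { ('http://192.0.2.10:80/') };

my $with_b = load_proxies( freeproxylists => 1, plan_b => 1, proxy4free => 0,
    fetch_fpl => $down, fetch_p4f => $p4f );
my $without_b = load_proxies( freeproxylists => 1, plan_b => 0, proxy4free => 0,
    fetch_fpl => $down, fetch_p4f => $p4f );

printf "plan_b on: %d proxies, off: %d proxies\n",
    scalar @$with_b, scalar @$without_b;
```

With plan_b enabled the empty freeproxylists result triggers the proxy4free fetch; with it disabled the list stays empty.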
timeout
$your_ua->proxify_load( timeout => 20 );
Optional. Takes a positive integer value which will be passed to the WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom constructors as a timeout argument. In other words, this specifies the timeout for proxy list fetching. Defaults to: 20
retries
$your_ua->proxify_load( retries => 5 );
Optional. Specifies how many times the module should retry proxify_* requests that don't look successful. Generally, setting the retries argument to a higher value yields more reliable requests but also slows down the request process. See the HOW IT WORKS section above for when the module retries a request. Defaults to: 5
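The retry behaviour can be pictured as a loop that takes a fresh proxy for every attempt. This is a minimal sketch, not the module's code: hop_request() is a hypothetical name, the $attempt callback stands in for the real request plus its "does this look right?" checks, and counting retries as "one initial attempt plus N retries" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical retry loop: one initial attempt plus up to $retries
# retries, each through the next proxy taken off @$queue. $attempt
# stands in for the real request and returns true on success.
sub hop_request {
    my ( $queue, $retries, $attempt ) = @_;
    my @tried;
    for ( 0 .. $retries ) {
        my $proxy = shift @$queue;
        last unless defined $proxy;      # ran out of proxies
        push @tried, $proxy;
        return ( $proxy, \@tried ) if $attempt->($proxy);
    }
    return ( undef, \@tried );
}

my @queue = map { "http://192.0.2.$_:8080/" } 1 .. 10;

# Pretend only the third proxy in the queue works.
my ( $used, $tried ) =
    hop_request( \@queue, 5, sub { $_[0] =~ m{//192\.0\.2\.3:} } );
printf "succeeded via %s after %d attempts\n", $used, scalar @$tried;
```

A higher retries value lets the loop survive more bad proxies, at the cost of more round trips per call.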
extra_proxies
$your_ua->proxify_load( extra_proxies => [] );
Optional. Takes an arrayref of proxy addresses in a format acceptable to LWP::UserAgent's proxy() method. These are extra proxies of your own for the module to use. You can set the freeproxylists and plan_b arguments to false values (leaving proxy4free at its false default) and put your own proxies into the extra_proxies arrayref, in which case the module will not even attempt to fetch any lists from proxy list sites (i.e. loading will be much faster). Defaults to: [] (no extra proxies)
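If your own proxies come as "host:port" strings, they can be turned into URLs of the same form proxify_load() returns before being handed over as extra_proxies. A minimal sketch; to_proxy_urls() is a hypothetical helper, not part of the module, and the proxify_load() call is shown only as a comment since it needs the module installed.

```perl
use strict;
use warnings;

# Hypothetical helper: turn "host:port" strings into proxy URLs of the
# form "http://122.122.122.122:8080/", skipping malformed entries.
sub to_proxy_urls {
    my @urls;
    for my $entry (@_) {
        my ( $host, $port ) = $entry =~ /^\s*([\w.-]+):(\d{1,5})\s*$/
            or next;
        next if $port < 1 || $port > 65535;
        push @urls, "http://$host:$port/";
    }
    return \@urls;
}

my $own = to_proxy_urls( '192.0.2.7:3128', 'proxy.example.com:8080', 'garbage' );
print "$_\n" for @$own;

# With the module, a "my own proxies only" load would then be:
# $ua->proxify_load( freeproxylists => 0, plan_b => 0, extra_proxies => $own );
```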
schemes
$your_ua->proxify_load( schemes => [ 'http', 'ftp' ] );
$your_ua->proxify_load( schemes => 'ftp' );
Optional. Specifies the first argument to pass to LWP::UserAgent's proxy() method (i.e. the schemes to proxy for); as the examples show, it can be a single scheme or an arrayref of schemes. Note: schemes other than 'http' were not tested and might not even work with the proxy lists the module fetches by default. Defaults to: http
get_list_args
$your_ua->proxify_load(
    get_list_args => {
        freeproxylists => [ type => 'anonymous' ],
        proxy4free     => [ [1,2] ],
    },
);
Optional. Lets you pass specific arguments to the get_list() methods of the WWW::FreeProxyLists::Com and WWW::Proxy4FreeCom modules used under the hood. The get_list_args argument takes a hashref with two keys, freeproxylists and proxy4free, whose values must be arrayrefs of arguments to give to the get_list() methods of the respective modules.
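Because each value is an arrayref holding an argument list, it is presumably dereferenced into get_list()'s arguments; that is an assumption read from the description above, not from the module's source. The sketch below uses a hypothetical get_list_stub() in place of the real get_list() methods to show what each module would then receive, including why proxy4free's value is a nested arrayref.

```perl
use strict;
use warnings;

my %get_list_args = (
    freeproxylists => [ type => 'anonymous' ],
    proxy4free     => [ [ 1, 2 ] ],
);

# Hypothetical stand-in for get_list(): returns the arguments it got.
sub get_list_stub { return [@_] }

my $fpl_args = get_list_stub( @{ $get_list_args{freeproxylists} } );
my $p4f_args = get_list_stub( @{ $get_list_args{proxy4free} } );

printf "freeproxylists get_list() gets %d args: %s\n",
    scalar @$fpl_args, join ', ', @$fpl_args;
printf "proxy4free get_list() gets %d arg, an arrayref [%s]\n",
    scalar @$p4f_args, join ',', @{ $p4f_args->[0] };
```

The outer brackets are the argument list; in the proxy4free case the single argument is itself an arrayref.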
debug
$your_ua->proxify_load( debug => 0 );
Optional. When set to a true value, makes the module carp() out some debugging info (including timing information for the processing of proxify_* request methods). Defaults to: 0
proxify_get
my $response = $your_ua->proxify_get('http://something.com/');
Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's get() method, except that proxify_get() will switch proxies before attempting the request.
proxify_post
my $response = $your_ua->proxify_post('http://something.com/');
Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's post() method, except that proxify_post() will switch proxies before attempting the request. Note: during my tests, almost all proxies from http://www.freeproxylist.com/ did not permit POST requests. You might have better luck setting proxy4free to a true value, setting freeproxylists to a false value and raising the retries argument (see the proxify_load() method above).
proxify_request
my $response = $your_ua->proxify_request( $req_obj );
Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's request() method, except that proxify_request() will switch proxies before attempting the request.
proxify_head
my $response = $your_ua->proxify_head('http://something.com/');
Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's head() method, except that proxify_head() will switch proxies before attempting the request.
proxify_mirror
my $response = $your_ua->proxify_mirror(
    'http://something.com/file.tar.gz',
    'here.tar.gz',
);
Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's mirror() method, except that proxify_mirror() will switch proxies before attempting the request. Note: use this method with caution, as some proxies return an HTML document instead of the actual content you requested.
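One way to guard against that is to sniff the mirrored file before trusting it. A minimal sketch, assuming the file is a gzip archive: looks_like_gzip() is a hypothetical helper that checks for the two gzip magic bytes (0x1f 0x8b), which an HTML page from a misbehaving proxy will not start with. The module call is shown only in comments.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: gzip data (and so a .tar.gz) starts with the
# magic bytes 0x1f 0x8b; an HTML page from a bad proxy does not.
sub looks_like_gzip {
    my ($file) = @_;
    open my $fh, '<:raw', $file or return 0;
    my $got = read $fh, my $magic, 2;
    close $fh;
    return defined $got && $got == 2 && $magic eq "\x1f\x8b";
}

# Simulate what a bad proxy might leave behind instead of the archive.
my ( $fh, $fake ) = tempfile( UNLINK => 1 );
print {$fh} "<html><body>Access denied</body></html>\n";
close $fh;
print looks_like_gzip($fake) ? "archive looks fine\n" : "not gzip data, discard it\n";

# With the module, after
#   my $res = $ua->proxify_mirror( 'http://something.com/file.tar.gz', 'here.tar.gz' );
# you could delete the file and try again when
#   $res->is_success && !looks_like_gzip('here.tar.gz')
```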
proxify_simple_request
my $response = $your_ua->proxify_simple_request('http://something.com/');
Must be called after a successful call to the proxify_load() method. The method is the same as LWP::UserAgent's simple_request() method, except that proxify_simple_request() will switch proxies before attempting the request.
proxify_list
my $proxies_list_ref = $your_ua->proxify_list;
Must be called after a successful call to the proxify_load() method. Takes no arguments and returns an arrayref of the proxies used internally for requests. This list shrinks as more requests are made (until it is depleted and reloaded; see the HOW IT WORKS section). Note: you can shift, push, etc. on this arrayref to dynamically set which proxies will be used. The proxy used on the next proxify_* request is the first element of this arrayref.
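For example, you can put a preferred proxy at the front of the queue and drop one you no longer trust. The sketch uses a plain arrayref as a stand-in for the one proxify_list() returns. The key design point is to edit the array in place (unshift, @$list = grep ...) rather than build a new arrayref, since a new arrayref would not be the one the object holds.

```perl
use strict;
use warnings;

# Stand-in for the arrayref from proxify_list(); with the module:
#   my $list = $ua->proxify_list;
my $list = [ map { "http://192.0.2.$_:8080/" } 1 .. 4 ];

# Make a preferred proxy the one used on the next proxify_* request.
unshift @$list, 'http://198.51.100.1:3128/';

# Drop an unwanted proxy, modifying the same array in place.
@$list = grep { $_ ne 'http://192.0.2.2:8080/' } @$list;

print "next proxy: $list->[0]\n";
printf "%d proxies queued\n", scalar @$list;
```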
proxify_working_list
my $proxies_working_list_ref = $your_ua->proxify_working_list;
Must be called after a successful call to the proxify_load() method. Takes no arguments and returns an arrayref of the proxies listed as "working". See the HOW IT WORKS section above for details. Note: you can shift, push, etc. on this arrayref to dynamically change it.
proxify_bad_list
my $proxies_bad_list_ref = $your_ua->proxify_bad_list;
Must be called after a successful call to the proxify_load() method. Takes no arguments and returns an arrayref of the proxies listed as "bad". See the HOW IT WORKS section above for details. Note: you can shift, push, etc. on this arrayref to dynamically change it.
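For instance, to give every "bad" proxy another chance right away instead of waiting for the main list to run dry, the bad list can be emptied into the main list in one step. Plain arrayrefs stand in for the ones returned by proxify_list() and proxify_bad_list(); splice with no offset empties the array in place and returns its former contents.

```perl
use strict;
use warnings;

# Stand-ins for the arrayrefs from the module; with it you would write:
#   my $list = $ua->proxify_list;
#   my $bad  = $ua->proxify_bad_list;
my $list = ['http://192.0.2.1:8080/'];
my $bad  = [ 'http://192.0.2.5:3128/', 'http://192.0.2.6:80/' ];

# Move everything from the bad list to the end of the main list.
push @$list, splice @$bad;

printf "main list: %d, bad list: %d\n", scalar @$list, scalar @$bad;
```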
proxify_real_bad_list
my $proxies_real_bad_list_ref = $your_ua->proxify_real_bad_list;
Must be called after a successful call to the proxify_load() method. Takes no arguments and returns an arrayref of the proxies listed as "real bad". See the HOW IT WORKS section above for details.
proxify_schemes
my $used_schemes = $your_ua->proxify_schemes;
$your_ua->proxify_schemes( [ 'http', 'ftp' ] );
Returns the current value of the proxify_load() method's schemes argument. If called with an optional argument, uses it as the new value. See the proxify_load() method above for details. Note: the value will be reset on the next proxify_load() call, which can happen automatically when the proxy lists are exhausted. See the HOW IT WORKS section for details.
proxify_retries
my $used_retries = $your_ua->proxify_retries;
$your_ua->proxify_retries( 10 );
Returns the current value of the proxify_load() method's retries argument. If called with an optional argument, uses it as the new value. See the proxify_load() method above for details. Note: the value will be reset on the next proxify_load() call, which can happen automatically when the proxy lists are exhausted. See the HOW IT WORKS section for details.
proxify_debug
my $used_debug = $your_ua->proxify_debug;
$your_ua->proxify_debug( 1 );
Returns the current value of the proxify_load() method's debug argument. If called with an optional argument, uses it as the new value. See the proxify_load() method above for details. Note: the value will be reset on the next proxify_load() call, which can happen automatically when the proxy lists are exhausted. See the HOW IT WORKS section for details.
proxify_current
my $current_proxy = $your_ua->proxify_current;
Takes no arguments and returns the last proxy used by the proxify_* request methods. Why is it called "current"? Because it can change several times during a single call to a proxify_* request method, depending on the retries argument (in the proxify_load() method).
AUTHOR
Zoffix Znet, <zoffix at cpan.org>
(http://zoffix.com, http://haslayout.net)
BUGS
Please report any bugs or feature requests to bug-lwp-useragent-proxyhopper at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=LWP-UserAgent-ProxyHopper. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc LWP::UserAgent::ProxyHopper
You can also look for information at:
RT: CPAN's request tracker
http://rt.cpan.org/NoAuth/Bugs.html?Dist=LWP-UserAgent-ProxyHopper
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
COPYRIGHT & LICENSE
Copyright 2008 Zoffix Znet, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.