NAME

WWW::Curl::UserAgent - UserAgent based on libcurl

VERSION

version 0.9.8

SYNOPSIS

use HTTP::Request;
use WWW::Curl::UserAgent;

my $ua = WWW::Curl::UserAgent->new(
    timeout         => 10000,
    connect_timeout => 1000,
);

$ua->add_request(
    request    => HTTP::Request->new( GET => 'http://search.cpan.org/' ),
    on_success => sub {
        my ( $request, $response ) = @_;
        if ($response->is_success) {
            print $response->content;
        }
        else {
            die $response->status_line;
        }
    },
    on_failure => sub {
        my ( $request, $error_msg, $error_desc ) = @_;
        die "$error_msg: $error_desc";
    },
);
$ua->perform;

DESCRIPTION

WWW::Curl::UserAgent is a web user agent based on libcurl. It works with HTTP::Request and HTTP::Response objects and handler callbacks. For simpler use, there is also a method that maps a single request to a response.

WWW::Curl provides access to libcurl, which handles connection keep-alive, parallel requests, asynchronous callbacks and much more. This package was written because WWW::Curl::Simple does not handle keep-alive correctly and does not support PUT, HEAD, DELETE and other request methods.

The simpler interface, request(), just returns an HTTP::Response for a given HTTP::Request. The usual approach, however, is to add as many requests with callbacks as your code allows and then call perform(). The callbacks are then executed one at a time as the responses arrive, starting with the first response received. request() cannot do this, because it takes no callbacks.

This library is in production use on https://www.xing.com.

CONSTRUCTOR METHODS

The following constructor methods are available:

$ua = WWW::Curl::UserAgent->new( %options )

This method constructs a new WWW::Curl::UserAgent object and returns it. Key/value pair arguments may be provided to set up the initial state. The default values are meant to follow the defaults of libcurl. The following options correspond to the attribute methods described below:

KEY                     DEFAULT
-----------             --------------------
user_agent_string       www.curl.useragent/$VERSION
connect_timeout         300
timeout                 0
parallel_requests       5
keep_alive              1
followlocation          0
max_redirects           -1
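As a sketch, the constructor call below passes every option from the table explicitly. The numeric values are the defaults listed above; the user agent string 'my-crawler/1.0' is a hypothetical example value, not the default.

```perl
use strict;
use warnings;
use WWW::Curl::UserAgent;

# Every option from the table, spelled out. Timeouts are in milliseconds.
my $ua = WWW::Curl::UserAgent->new(
    user_agent_string => 'my-crawler/1.0',    # hypothetical value, not the default
    connect_timeout   => 300,
    timeout           => 0,
    parallel_requests => 5,
    keep_alive        => 1,
    followlocation    => 0,
    max_redirects     => -1,
);

# The same names work as accessors afterwards.
printf "timeout=%d parallel_requests=%d\n",
    $ua->timeout, $ua->parallel_requests;
```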

ATTRIBUTES

$ua->connect_timeout / $ua->connect_timeout($connect_timeout)

Get/set the timeout in milliseconds for establishing the connection to the server. If the connection cannot be established within this time, the on_failure handler is called.

$ua->timeout / $ua->timeout($timeout)

Get/set the timeout in milliseconds for receiving the response. If the response is not received within this time, the on_failure handler is called.

$ua->parallel_requests / $ua->parallel_requests($parallel_requests)

Get/set the maximum number of requests performed in parallel. libcurl itself may run fewer requests in parallel than this number, but never more.

$ua->keep_alive / $ua->keep_alive($boolean)

Get/set whether TCP connections should be reused with keep-alive. If keep-alive is disabled, the TCP connection is forced to close after the response is received, and the corresponding "Connection: close" header is set. If keep-alive is enabled (the default), libcurl manages the connections.

$ua->followlocation / $ua->followlocation($boolean)

Get/set whether curl should follow redirects. The headers of intermediate redirect responses are discarded, so only the final response is passed to the corresponding handler.

$ua->max_redirects / $ua->max_redirects($max_redirects)

Get/set the maximum number of redirects. -1 (the default) means unlimited redirects; 0 means no redirects at all. If the maximum is reached, the on_failure handler is called.
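A minimal sketch of changing the redirect and keep-alive behaviour of an existing user agent with the setters described above. The limit of 5 redirects is an arbitrary example value.

```perl
use strict;
use warnings;
use WWW::Curl::UserAgent;

my $ua = WWW::Curl::UserAgent->new;

# Follow redirects, but call on_failure once more than 5 hops are needed.
$ua->followlocation(1);
$ua->max_redirects(5);

# Disable keep-alive: each connection is closed after its response
# and "Connection: close" is sent.
$ua->keep_alive(0);

printf "followlocation=%d max_redirects=%d keep_alive=%d\n",
    $ua->followlocation, $ua->max_redirects, $ua->keep_alive;
```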

$ua->user_agent_string / $ua->user_agent_string($user_agent)

Get/set the User-Agent string sent with each request.

$ua->request_queue_size

Get the number of queued requests that have not been performed yet.
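A short sketch showing that add_request() only queues work: the queue size grows until perform() is called. The example.com URLs are placeholders, and the per-request timeout of 2000 ms is an arbitrary value that demonstrates the optional override.

```perl
use strict;
use warnings;
use HTTP::Request;
use WWW::Curl::UserAgent;

my $ua = WWW::Curl::UserAgent->new;

for my $path ( '/a', '/b' ) {    # placeholder paths
    $ua->add_request(
        request    => HTTP::Request->new( GET => "http://example.com$path" ),
        timeout    => 2000,      # optional per-request override, milliseconds
        on_success => sub {
            my ( $request, $response ) = @_;
            print $response->status_line, "\n";
        },
        on_failure => sub {
            my ( $request, $err_msg, $err_desc ) = @_;
            warn "$err_msg: $err_desc\n";
        },
    );
}

# perform() has not been called, so both requests are still queued.
print $ua->request_queue_size, "\n";
```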

$ua->request( $request, %args )

Immediately performs a single HTTP::Request. Optional parameters override the user agent's settings for this request only. Possible options are:

connect_timeout
timeout
keep_alive
followlocation
max_redirects

Some example requests:

my $request = HTTP::Request->new( GET => 'http://search.cpan.org/');

my $response = $ua->request($request);
$response = $ua->request($request,
    timeout    => 3000,
    keep_alive => 0,
);

If an error occurs, e.g. a timeout, the returned HTTP::Response object has the status code 500, the short error description as its message and a longer error description as its content. request() runs perform() internally, so any requests already queued are performed, too.
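A minimal sketch of checking for such an error response. It assumes that nothing listens on port 1 of 127.0.0.1, so libcurl cannot connect and request() returns the 500 response described above instead of dying.

```perl
use strict;
use warnings;
use HTTP::Request;
use WWW::Curl::UserAgent;

my $ua = WWW::Curl::UserAgent->new;

# Assumption: port 1 on localhost is closed, so the connection fails.
my $response = $ua->request(
    HTTP::Request->new( GET => 'http://127.0.0.1:1/' ),
    connect_timeout => 1000,
    timeout         => 2000,
);

if ( $response->is_success ) {
    print $response->content;
}
else {
    # On a libcurl error: code 500, the short description as message
    # and the longer description as content.
    printf "%d %s\n%s\n", $response->code, $response->message, $response->content;
}
```

Note that a real 500 answer from the server also ends up in the else branch.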

$ua->add_request(%args)

Adds a request with callback handlers that are invoked when the response arrives. The on_success callback is called for every response that was read successfully, even those with HTTP error status codes. The on_failure handler is called when libcurl reports an error, e.g. a timeout or invalid curl settings. The parameters request, on_success and on_failure are mandatory; timeout, connect_timeout, keep_alive, followlocation and max_redirects are optional.

$ua->add_request(
    request    => HTTP::Request->new( GET => 'http://search.cpan.org/'),
    on_success => sub {
        my ( $request, $response, $easy ) = @_;
        print $request->as_string;
        print $response->as_string;
    },
    on_failure => sub {
        my ( $request, $err_msg, $err_desc, $easy ) = @_;
        # error handling
    }
);

As their last parameter, the callbacks receive the WWW::Curl::Easy object that was used to perform the request. It can be used to obtain information such as statistical data about the request.
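A sketch of reading timing statistics from that object. It assumes the callback signatures shown above and uses getinfo() with the libcurl constant CURLINFO_TOTAL_TIME, which WWW::Curl::Easy exports; the example.com URL is a placeholder.

```perl
use strict;
use warnings;
use HTTP::Request;
use WWW::Curl::Easy;    # exports the CURLINFO_* constants
use WWW::Curl::UserAgent;

my $ua = WWW::Curl::UserAgent->new( timeout => 5000 );
my $total_time;

$ua->add_request(
    request    => HTTP::Request->new( GET => 'http://example.com/' ),    # placeholder URL
    on_success => sub {
        my ( $request, $response, $easy ) = @_;
        $total_time = $easy->getinfo(CURLINFO_TOTAL_TIME);    # seconds, as a float
        printf "%s took %.3fs\n", $request->uri, $total_time;
    },
    on_failure => sub {
        my ( $request, $err_msg, $err_desc, $easy ) = @_;
        # getinfo() can also be called after a failed transfer.
        $total_time = $easy->getinfo(CURLINFO_TOTAL_TIME);
        warn "$err_msg after ${total_time}s\n";
    },
);
$ua->perform;
```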

This module supports chaining add_request calls. A request added within an on_success handler is started right away, within the perform() call that is already running. This is useful for reacting to a response immediately:

$ua->add_request(
    # $form and get_target_from() are placeholders for your own code
    request    => HTTP::Request->new( POST => 'http://search.cpan.org/', [], $form ),
    on_failure => sub { die },
    on_success => sub {
        my ( $request, $response ) = @_;

        my $target_url = get_target_from($response);
        $ua->add_request(
            request    => HTTP::Request->new( GET => $target_url ),
            on_failure => sub { die },
            on_success => sub {
                my ( $request, $response ) = @_;
                # actually do sth.
            }
        );
    },
);
$ua->perform; # executes both requests

$ua->add_handler($handler)

For more control over the handler, you can create a WWW::Curl::UserAgent::Handler yourself. The WWW::Curl::UserAgent::Request inside the handler requires all libcurl parameters to be passed explicitly, so that default values are not defined in two places. The WWW::Curl::UserAgent::Request also lets you modify the WWW::Curl::Easy object before it is performed.

my $handler = WWW::Curl::UserAgent::Handler->new(
    on_success => sub {
        my ( $request, $response, $easy ) = @_;
        print $request->as_string;
        print $response->as_string;
    },
    on_failure => sub {
        my ( $request, $err_msg, $err_desc, $easy ) = @_;
        # error handling
    },
    request    => WWW::Curl::UserAgent::Request->new(
        http_request    => HTTP::Request->new( GET => 'http://search.cpan.org/'),
        connect_timeout => $ua->connect_timeout,
        timeout         => $ua->timeout,
        keep_alive      => $ua->keep_alive,
        followlocation  => $ua->followlocation,
        max_redirects   => $ua->max_redirects,
    ),
);

$handler->request->curl_easy->setopt( ... );

$ua->add_handler($handler);

$ua->perform

Performs all queued requests. This method returns after all responses have been received and all handlers have been processed.
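A sketch of the usual pattern: queue several requests, then let perform() run them with up to parallel_requests transfers at a time and block until every callback has run. The example.com URLs are placeholders, and collecting results into a hash keyed by URL is just one possible design.

```perl
use strict;
use warnings;
use HTTP::Request;
use WWW::Curl::UserAgent;

my $ua = WWW::Curl::UserAgent->new(
    parallel_requests => 3,
    timeout           => 5000,
);

# Placeholder URLs; replace with your own.
my @urls = map { "http://example.com/?page=$_" } 1 .. 6;
my %status;    # URL => status line or error message

for my $url (@urls) {
    $ua->add_request(
        request    => HTTP::Request->new( GET => $url ),
        on_success => sub {
            my ( $request, $response ) = @_;
            $status{ $request->uri } = $response->status_line;
        },
        on_failure => sub {
            my ( $request, $err_msg ) = @_;
            $status{ $request->uri } = "failed: $err_msg";
        },
    );
}

# Blocks until every queued request has completed and its callback has run.
$ua->perform;

print "$_ => $status{$_}\n" for sort keys %status;
```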

BENCHMARK

A test with the tools/benchmark.pl script, sending GET requests to a simple echo API behind a load-balanced web server, on an Intel i5 M 520 running Fedora 19, gave the following results:

500 requests (sequentially, 500 iterations):
+-------------------------------+-----------+------+------+------------+------------+
|          User Agent           | Wallclock |  CPU |  CPU |  Requests  | Iterations |
|                               |  seconds  |  usr |  sys | per second | per second |
+-------------------------------+-----------+------+------+------------+------------+
| LWP::UserAgent 6.05           |    21     | 1.10 | 0.20 |    23.8    |    384.6   |
+-------------------------------+-----------+------+------+------------+------------+
| LWP::Parallel::UserAgent 2.61 |    20     | 1.13 | 0.22 |    25.0    |    370.4   |
+-------------------------------+-----------+------+------+------------+------------+
| WWW::Curl::Simple 0.100191    |    95     | 0.66 | 0.27 |     5.3    |    537.6   |
+-------------------------------+-----------+------+------+------------+------------+
| Mojo::UserAgent 4.83          |    10     | 1.19 | 0.08 |    50.0    |    393.7   |
+-------------------------------+-----------+------+------+------------+------------+
| WWW::Curl::UserAgent 0.9.6    |    10     | 0.55 | 0.06 |    50.0    |    819.7   |
+-------------------------------+-----------+------+------+------------+------------+

500 requests (5 in parallel, 100 iterations):
+-------------------------------+-----------+--------+--------+------------+------------+
|          User Agent           | Wallclock |   CPU  |   CPU  |  Requests  | Iterations |
|                               |  seconds  |   usr  |   sys  | per second | per second |
+-------------------------------+-----------+--------+--------+------------+------------+
| LWP::Parallel::UserAgent 2.61 |     10    |   1.26 |   0.26 |     50.0   |     65.8   |
+-------------------------------+-----------+--------+--------+------------+------------+
| WWW::Curl::Simple 0.100191    |    815    | 270.16 | 191.76 |      0.6   |      0.2   |
+-------------------------------+-----------+--------+--------+------------+------------+
| Mojo::UserAgent 4.83          |      3    |   1.03 |   0.04 |    166.7   |     93.5   |
+-------------------------------+-----------+--------+--------+------------+------------+
| WWW::Curl::UserAgent 0.9.6    |      3    |   0.42 |   0.06 |    166.7   |    208.3   |
+-------------------------------+-----------+--------+--------+------------+------------+
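The derived columns can be checked against the measured ones. The figures are consistent with requests per second = requests / wallclock seconds and iterations per second = iterations / (CPU usr + CPU sys); this formula is inferred from the numbers, not stated by the benchmark script. A small sketch that recomputes a few rows:

```perl
use strict;
use warnings;

# Inferred formulas for the derived columns:
#   requests/s   = requests   / wallclock seconds
#   iterations/s = iterations / (CPU usr + CPU sys)
sub rates {
    my ( $requests, $iterations, $wall, $usr, $sys ) = @_;
    return (
        sprintf( '%.1f', $requests / $wall ),
        sprintf( '%.1f', $iterations / ( $usr + $sys ) ),
    );
}

# Rows from the tables above: [name, requests, iterations, wall, usr, sys].
my @rows = (
    [ 'WWW::Curl::UserAgent (sequential)', 500, 500, 10, 0.55, 0.06 ],
    [ 'LWP::UserAgent (sequential)',       500, 500, 21, 1.10, 0.20 ],
    [ 'WWW::Curl::UserAgent (parallel)',   500, 100,  3, 0.42, 0.06 ],
    [ 'Mojo::UserAgent (parallel)',        500, 100,  3, 1.03, 0.04 ],
);

for my $row (@rows) {
    my ( $name, @measured ) = @$row;
    my ( $req_per_s, $iter_per_s ) = rates(@measured);
    print "$name: $req_per_s requests/s, $iter_per_s iterations/s\n";
}
```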

SEE ALSO

See HTTP::Request and HTTP::Response for a description of the message objects dispatched and received. See HTTP::Request::Common and HTML::Form for other ways to build request objects.

See WWW::Curl for a description of the settings and options possible on libcurl.

AUTHORS

  • Julian Knocke

  • Othello Maurer

COPYRIGHT AND LICENSE

This software is copyright (c) 2018 by XING AG.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.