NAME

DTA::CAB::Client::HTTP - generic HTTP server client for DTA::CAB

SYNOPSIS

##========================================================================
## PRELIMINARIES

use DTA::CAB::Client::HTTP;

##========================================================================
## Constructors etc.

$obj = CLASS_OR_OBJ->new(%args);

##========================================================================
## Methods: Generic Client API: Connections

$bool = $cli->connected();
$bool = $cli->connect();
$bool = $cli->disconnect();
@analyzers = $cli->analyzers();

##========================================================================
## Methods: Generic Client API: Queries

$data_str = $cli->analyzeData($analyzer, \$data_str, \%opts);
$doc = $cli->analyzeDocument($analyzer, $doc, \%opts);
$sent = $cli->analyzeSentence($analyzer, $sent, \%opts);
$tok = $cli->analyzeToken($analyzer, $tok, \%opts);

$fmt = $cli->getFormat(\%opts);
$response = $cli->analyzeDataRef($analyzer, \$data_str, \%opts);

##========================================================================
## Methods: Low-Level Utilities

$url      = $cli->lwpUrl($url);
$agent    = $cli->ua();
$rclient  = $cli->rclient();
$uriStr   = $cli->urlEncode(\%form);
$response = $cli->urequest($httpRequest);
$response = $cli->uhead($url, Header=>Value, ...);
$response = $cli->uget($url, $headers);
$response = $cli->upost( $url );
$response = $cli->uget_form($url, \%form);
$response = $cli->uxpost($url, \%form,  $content, @headers);

DESCRIPTION

Globals

Variable: @ISA

DTA::CAB::Client::HTTP inherits from DTA::CAB::Client, and optionally uses DTA::CAB::Client::XmlRpc for communication with an XML-RPC server.

Constructors etc.

new
$cli = CLASS_OR_OBJ->new(%args);

%args, %$cli:

(
 ##-- server
 serverURL => $url,             ##-- default: localhost:8000
 encoding => $enc,              ##-- default character set for client-server I/O (default='UTF-8')
 timeout => $timeout,           ##-- timeout in seconds, default: 300 (5 minutes)
 mode => $queryMode,            ##-- query mode: qw(get post xpost xmlrpc); default='xpost' (post with get-like parameters)
 post => $postmode,             ##-- post mode; one of 'urlencoded' (default), 'multipart'
 rpcns => $prefix,              ##-- prefix for XML-RPC analyzer names (default='dta.cab.')
 rpcpath => $path,              ##-- path part of URL for XML-RPC (default='/xmlrpc')
 format => $fmtName,            ##-- DTA::CAB::Format short name for transfer (default='json')
 cacheGet => $bool,             ##-- allow cached response from server? (default=1)
 cacheSet => $bool,             ##-- allow caching of server response? (default=1)
 ##
 ##-- debugging
 tracefh => $fh,                ##-- dump requests to $fh if defined (default=undef)
 testConnect => $bool,          ##-- if true connected() will send a test query (default=true)
 ##
 ##-- underlying LWP::UserAgent
 ua => $ua,                     ##-- underlying LWP::UserAgent object
 uargs => \%args,               ##-- options to LWP::UserAgent->new()
 ##
 ##-- optional underlying DTA::CAB::Client::XmlRpc
 rclient => $xmlrpc_client,     ##-- underlying DTA::CAB::Client::XmlRpc object
)

If $cli->{mode} is "xmlrpc", all method calls will be dispatched to the underlying DTA::CAB::Client::XmlRpc object $cli->{rclient}. See DTA::CAB::Client::XmlRpc for details. The rest of this manual page documents object behavior in "raw HTTP mode", in which $cli->{mode} is one of:

get

Queries are sent to the server using HTTP GET requests. Best if you are sending many short queries.

post

Queries are sent to the server using HTTP POST requests. Form data is encoded according to $cli->{post}.

xpost

Queries are sent to the server using HTTP POST requests, in which query options are passed directly in the request URL (as for GET requests), and the data to be analyzed is formatted and passed as the literal request content. This is the default query mode.
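To make the xpost layout concrete, the following is a minimal sketch using only core Perl. It is not the module's implementation: the helper names pct_encode() and xpost_parts(), the '/query' path, and the placeholder body are assumptions for illustration. It only shows where the query options and the data end up in such a request.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

##-- percent-encode one string as UTF-8 bytes (unreserved characters kept as-is)
sub pct_encode {
  my ($str) = @_;
  my $bytes = encode_utf8($str);
  $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
  return $bytes;
}

##-- hypothetical helper: options go into the URL, data stays in the body
sub xpost_parts {
  my ($url, $form, $content) = @_;
  my $query = join '&',
    map { pct_encode($_) . '=' . pct_encode($form->{$_}) } sort keys %$form;
  my $full = $query eq '' ? $url : "$url?$query";
  return ($full, $content);
}

##-- assumed server URL and analyzer name; 'DATA' stands in for a formatted document
my ($url, $body) = xpost_parts(
  'http://localhost:8000/query',
  { a => 'default', format => 'json' },
  'DATA',
);
print "POST $url\n\n$body\n";
```

In get mode the data itself would travel in the URL as well (as the q parameter), which is why get suits short queries and xpost suits whole documents.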

Methods: Generic Client API: Connections

connected
$bool = $cli->connected();

Returns true if a test query (HEAD) returns a successful response.

connect
$bool = $cli->connect();

Establishes a connection to the server by generating the underlying connection object ($cli->{ua} or $cli->{rclient}). In raw HTTP mode, this does nothing but create the LWP::UserAgent object.

disconnect
$bool = $cli->disconnect();

Deletes underlying LWP::UserAgent object.

analyzers
@analyzers = $cli->analyzers();

Appends '/list' to $cli->{serverURL} and parses the list of raw text lines returned; die()s on error.
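The connection methods above can be combined as in the following sketch. The helper list_analyzers() is hypothetical and accepts any object providing connect(), analyzers(), and disconnect() as documented here; since analyzers() die()s on error, the call is wrapped in eval so that disconnect() still runs.

```perl
use strict;
use warnings;

##-- hypothetical helper: connect, fetch analyzer names, always disconnect
sub list_analyzers {
  my ($cli) = @_;
  $cli->connect() or die "connect() failed\n";
  my @names = eval { $cli->analyzers() };
  my $err = $@;                ##-- save the error before doing anything else
  $cli->disconnect();
  die $err if $err;            ##-- re-throw after cleanup
  return @names;
}
```

With a real client, $cli would be the object returned by DTA::CAB::Client::HTTP->new(serverURL => $url), where $url points at a running server.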

Methods: Generic Client API: Queries

getFormat
$fmt = $cli->getFormat(\%opts);

Returns a new DTA::CAB::Format object appropriate for a $cli query with %opts.

analyzeDataRef
$response = $cli->analyzeDataRef($analyzer, \$data_str, \%opts);

Low-level wrapper for the various query methods. $analyzer is the name of an analyzer known to the server, \$data_str is a reference to a formatted buffer holding the data to be analyzed, and \%opts represent the query options (see below). Returns an HTTP::Response object representing the server response.

Client-Side Options
contentType => $mimeType,      ##-- Content-Type header to apply for mode='xpost'
qraw        => $bool,          ##-- if true, query is a raw untokenized string (default=false)
headers     => $headers,       ##-- additional HTTP headers (ARRAY or HASH or HTTP::Headers object)
cacheGet    => $bool,          ##-- locally override $cli->{cacheGet} (sets header 'Cache-Control: no-cache')
cacheSet    => $bool,          ##-- locally override $cli->{cacheSet} (sets header 'Cache-Control: no-store')

Server-Side Options
##-- query data, in order of preference
data => $docData,              ##-- document data; set from $data_ref (post, xpost)
q    => $rawQuery,             ##-- query string; set from $data_ref (get)
##
##-- misc
a => $analyzer,                ##-- analyzer name; set from $analyzer
format => $format,             ##-- I/O format
pretty => $level,              ##-- pretty-printing level
raw => $bool,                  ##-- if true, data will be returned as text/plain (default=$h->{returnRaw})

See DTA::CAB::Server::HTTP::Handler::Query for a full list of parameters supported by raw HTTP servers.

analyzeData
$data_str = $cli->analyzeData($analyzer, \$data_str, \%opts);

Wrapper for analyzeDataRef(); die()s on error.

You should pass $opts->{'Content-Type'} as some sensible value for the query data. If you don't, the Content-Type header will be 'application/octet-stream'.
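Because analyzeData() die()s on error, callers that want to continue after a failed query need to trap the exception. The following is a minimal sketch under that assumption; the helper name try_analyze() is hypothetical, and it works with any object offering analyzeData() with the signature documented above.

```perl
use strict;
use warnings;

##-- hypothetical helper: returns the analyzed string, or undef on failure
sub try_analyze {
  my ($cli, $analyzer, $data, $opts) = @_;
  my $out = eval { $cli->analyzeData($analyzer, \$data, $opts || {}) };
  if (!defined $out) {
    my $err = $@ || "unknown error\n";
    warn "analysis with '$analyzer' failed: $err";
    return undef;
  }
  return $out;
}
```

The options hash is passed through unchanged, so client-side and server-side options such as format or pretty can be supplied by the caller.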

analyzeDocument
$doc = $cli->analyzeDocument($analyzer, $doc, \%opts);

Implements DTA::CAB::Client::analyzeDocument.

analyzeSentence
$sent = $cli->analyzeSentence($analyzer, $sent, \%opts);

Implements DTA::CAB::Client::analyzeSentence.

analyzeToken
$tok = $cli->analyzeToken($analyzer, $tok, \%opts);

Implements DTA::CAB::Client::analyzeToken.

Methods: Low-Level Utilities

lwpUrl
$lwp_url = $cli->lwpUrl();
$lwp_url = $cli->lwpUrl($url);

Returns LWP-style URL $lwp_url for $url, which defaults to $cli->{serverURL}. Supports HTTP over UNIX sockets using various URL conventions:

  • apache mod_proxy style: unix:/path/to/unix/socket|http:///uri/path

  • LWP::Protocol::http::SocketUnixAlt style: http:/path/to/unix/socket//uri/path

  • native "http+unix" scheme: http+unix:/path/to/unix/socket//uri/path
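To illustrate how the three conventions encode a socket path and a URI path, here is a small sketch using plain regular expressions. It is not the code used by lwpUrl(); the helper name split_unix_url() is hypothetical, and it simply returns the two components, or an empty list for an ordinary URL.

```perl
use strict;
use warnings;

##-- hypothetical helper: ($socket_path, $uri_path) = split_unix_url($url)
sub split_unix_url {
  my ($url) = @_;
  ##-- apache mod_proxy style: unix:/socket|http://host/uri/path
  if ($url =~ m{^unix:(/[^|]*)\|https?://[^/]*(/.*)?$}) {
    return ($1, defined $2 ? $2 : '/');
  }
  ##-- http:/socket//uri/path and http+unix:/socket//uri/path
  ##   (a single leading slash distinguishes these from http://host/...)
  if ($url =~ m{^http(?:\+unix)?:(/(?!/).*?)//(.*)$}) {
    return ($1, "/$2");
  }
  return;  ##-- not a UNIX-socket URL under these conventions
}

for my $u ('unix:/path/to/unix/socket|http:///uri/path',
           'http:/path/to/unix/socket//uri/path',
           'http+unix:/path/to/unix/socket//uri/path') {
  my ($sock, $path) = split_unix_url($u);
  print "$u\n  socket=$sock path=$path\n";
}
```

In all three cases the first "//" after the socket path acts as the separator, so this sketch assumes the socket path itself contains no doubled slash.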

ua
$agent = $cli->ua();

Gets underlying LWP::UserAgent object, caching if required.

rclient
$rclient = $cli->rclient();

For xmlrpc mode, gets underlying DTA::CAB::Client::XmlRpc object, caching if required.

urlEncode
$uriStr = $cli->urlEncode(\%form);
$uriStr = $cli->urlEncode(\@form);
$uriStr = $cli->urlEncode( $str);

Encodes query form parameters or a raw string for inclusion in a URL.

urequest
$response = $cli->urequest($httpRequest);

Gets response for $httpRequest (an HTTP::Request object) using $cli->ua->request(). Also traces request to $cli->{tracefh} if defined.

urequest_unix
$response = $cli->urequest_unix($httpRequest);

Guts for urequest() over UNIX sockets using LWP::Protocol::http::SocketUnixAlt.

uhead
$response = $cli->uhead($url, Header=>Value, ...);

HEAD request.

uget
$response = $cli->uget($url, $headers);

GET request.

upost
$response = $cli->upost( $url );
$response = $cli->upost( $url,  $content, Header => Value,... );
$response = $cli->upost( $url, \$content, Header => Value,... );
$response = $cli->upost( $url, \%form,    Header => Value,... );

POST request. Specify 'Content-Type'=>'form-data' to get "multipart/form-data" forms.

uget_form
$response = $cli->uget_form($url, \%form);
$response = $cli->uget_form($url, \@form, @headers);

GET request for form data.

uxpost
$response = $cli->uxpost($url, \%form,  $content, @headers);
$response = $cli->uxpost($url, \%form, \$content, @headers);

POST request which encodes \%form in the URL (as for GET) and sends $content as the request content.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2010-2019 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

dta-cab-analyze.perl(1), dta-cab-convert.perl(1), dta-cab-http-server.perl(1), dta-cab-http-client.perl(1), dta-cab-xmlrpc-server.perl(1), dta-cab-xmlrpc-client.perl(1), DTA::CAB::Client(3pm), DTA::CAB::Server::HTTP(3pm), DTA::CAB::Server::HTTP::UNIX(3pm), DTA::CAB::Format(3pm), DTA::CAB(3pm), perl(1), ...
