SYNOPSIS

use PMLTQ::CGI;
PMLTQ::CGI::Configure({
 # options
});

my $cgi = CGI->new();
...
if ($request eq 'query') {
 resp_query($cgi);
} elsif ($request eq 'svg') {
 resp_svg($cgi);
} elsif ...

DESCRIPTION

This module is intended to be used in a FastCGI or Net::HTTPServer environment (see pmltq_http). It implements a REST web service and a web application to the PML-TQ engine driven by an SQL database (PMLTQ::SQLEvaluator).

WEB SERVICE

Individual types of request are implemented by the resp_* family of functions, which all assume a CGI-like object as their first and only argument.

The web service uses URLs of the form

http(s)://<host>:<port>/<method_prefix><method_name>?<arguments>

or

http(s)://<host>:<port>/<method_prefix><method_name>/<resource-path>?<arguments>

where method_prefix is an optional path prefix, typically empty (see method-prefix configuration option).

It is up to the HTTP server to do both user authentication and authorization to the individual web service methods.

WEB APPLICATION

Individual types of request are implemented by a wrapper app() function, whose first argument is a reference to a corresponding resp_* function (see "WEB SERVICE") and the second argument is a CGI-like object.

The web service uses URLs of the form

http(s)://<host>:<port>/<app_prefix>/<method_prefix><method_name>?<arguments>

or

http(s)://<host>:<port>/<app_prefix>/<method_prefix><method_name>/<resource-path>?<arguments>

where <app_prefix> is 'app' by default.

AUTHENTICATION AND AUTHORIZATION

The authorization to the web application depends on the HTTP server to do both autentication and authorization for all the web service requests and also the <app_prefix>/<method_prefix>login web application request. It is not required to do authorization for other <app_prefix>/<method_prefix>* requests.

The autentication and authorization data are stored in the <auth-file> configuration file, which contains user names, unencrypted passwords (optional), and server-ID based access lists for each user.

The HTTP server may use the auth() method provided by this module in order to obtain a password stored in the <auth-file> (this is what pmltq_http does). Alternatively, the passwords can be stored in the server's configuration, e,g. the .htaccess file, and the <auth-file> can be used just for authorization.

Each web application method (<app_prefix>/<method_prefix>*) first checks the user and session ID arguments (u and s) for validity and consults <auth-file> in order to determine if the user is authorized for the running instance. If the session is valid and the user authorized, the request is performed. Otherwise the client is redirected to the <app_prefix>/<method_prefix>login request.

The HTTP server should be configured so as to require HTTP password authentication for the <app_prefix>/<method_prefix>login request. If the HTTP server authorizes the client for the <app_prefix>/<method_prefix>login request, a new session is created for the user and the client is redirected to the web application start page (<app_prefix>/<method_prefix>form).

Updates to the <auth-file> apply immediately without needing to restart the service.

Each line in the <auth-file> may have one the following forms (empty and invalid lines are ignored):

# <comment> <username> : : <authorization> <username>: <password> <username>: <password> : <authorization>

where <authorization> is a comma-separated list of server IDs (see the server configuration option). If the list is preceded by the minus (-) sign, the user is authorized this service unless the server ID is present in the list. If this list is preceded by the plus (+) sign or no sign at all, the user is authorized to connect to this service, if and only if the server ID is present in the list. If the list <authorization> list is not present, the user is authorized to connect to any service.

The information about other services is also used when responding to the method "other"" in ", which returns basic information about other running instances (sharing the same <pid-dir> and <auth-file>, but typically running on different ports or using different prefixes) and whether the current user is authorized to use them or not.

INITIALIZATION

The module is initialized using a call to the Configure() function:

PMLTQ::CGI::Configure({...options...});

In a forking FastCGI or Net::HTTPServer implementation, this configuration is typically called just once prior to forking, so as only one PID file is created for this service (even if the service is handled by several forked instances).

The configuration options are:

static-dir => $dirname

Directory from which static content is to be served.

config-file => $filename

PML-TQ configuration file (in the PML format described by the pmltq_cgi_conf_schema.xml schema.

server => $conf_id

ID of the server configuration in the configuration file (see above).

pid-dir => $dirname

Directory where to store a PID file containing basic information about this running instance (to be used by other instances in order to provide a list of available services).

This directory is also used to create user session files which may be reused by other running services as well to provide a single-login access to a family of related PML-TQ services.

port => $port_number

Port number of this instance. This information is stored into a PID file and can be used by other running instances in order to determine the correct URL for the service provided by this instance.

query-log-dir => $dirname

Directory where individual user's queries are logged. The content of this directory is also used to retrieve previous user's queries.

auth-file => $filename

Path to a file containing user access configuration (note that cooperation with the HTTP server is required), see "AUTHENTICATION AND AUTHORIZATION".

tmp-dir => $dirname

A directory to use for temporary files.

google-translate => $bool

Add Google Translator service to the Toolbar of the sentence displayed with the result tree.

ms-translate => $api_key

Add Microsoft Bing Translator service to the Toolbar of the sentence displayed with the result tree. The argument must be a valid API key issued from Microsoft for the host that runs this HTTP service.

method-prefix => $path_prefix

Optional path to be used as a prefix to all method parts in the URLs. It is not recommended to use this parameter. If you must, make sure you add a trailing /. If set to foo/, the path part of the URL for the web service method 'query' (for example), will have the form of 'foo/query'. The corresponding web application path will be 'app/foo/query'.

debug => $bool

If true, the service logs some extra debugging information into the error log (STDERR).

FUNCTIONS

auth($unused,$user)

This helper function is designed for use with the RegisterAuth method of Net::HTTPServer. It retrieves password for a given user from the <auth-file> and returns ("401","") if user not found or not authorized to access this service instance (server ID), and ("200",$unencrypted_password) otherwise.

app($resp_sub, $cgi)

This function is intended as a wrapper for the requests handlers when called from the "WEB APPLICATION". It calls $resp_sub if valid authorized username and session-id were passed in the s and u parameters of the request, otherwise redirects the client to the URL of the login request.

Requests handled by this function accept the following additional parameters:

s - sessionID
u - username
resp_login($cgi)

This method implements response to the <app_prefix>/<method_prefix>login request. The request is assumed to be be protected by a HTTP authorization and should only be used in connection with the WEB APPLICATION.

It checks that a valid session file exists for the user exists in the pid_dir and creates a new one (pruning all invalid or expired session files for the user). Then it redirects the user to the "form"" in " method (providing a user name and session-id in the u and s arguments).

Note: this function does not implement authorization or authentication. It just creates a session for any user to which the HTTP server granted access to the login request; the HTTP server is responsible for granting access to authenticated users only and session validity checking mechanisms used by the app() function implementing the WEB APPLICATION are responsible for particular instance authorization based on the <auth-file> data.

resp_root($cgi)

This function is used to implement a request to the base URL (/). It redirects to <app-prefix>/form if a valid username and session-id is passed in the s and u URL parameters, otherwise redirects to <app-prefix>/login.

resp_<method>($cgi)

This family of functions implements individual types of WEB SERVICE requests described below. For the WEB APPLICATION, they should be called through the app() function documented above.

WEB APPLICATION API

The web application API is the same as that for the web service, described below, except that

s - sessionID
u - username

WEB SERVICE API

All methods of the web service accept both GET and POST requests; in the latter case, parameters can be passed both as URL parameters or as data. In both cases, the parameters must be encoded using th application/x-www-form-urlencoded format.

NOTE: we write method names as /METHOD, omitting any <method_prefix> specified in the configuration and adding a leading slash (to indicate that we are describing the REST web service API rather than Perl API). However, if a request method A returns (possibly embedded in some HTML code) an URL to a method B on this instance, the returned URL has the form of a relative ( B ) rather than absolute URL ( /B ), so if the original method was invoked e.g. as http://foo.bar:8082/app/A, the browser will reslove the returned URL to http://foo.bar:8082/app/B.

/about

Parameters:

      format - html|json|text
extended - 0|1

Returns information about this instance:

id       - ID
service  - base URL (hostname)
title    - full name
abstract - short description
moreinfo - a web URL with more information about the treebank database
featured - popularity index
/other

Parameters:

format - html|json|text

Returns information about other known PML-TQ services (sharing the same <pid-dir> and <auth-file>, but typically running on different ports or using different app or method prefixes):

id       - ID
service  - base URL (hostname)
port     - port
title    - full name
abstract - short description
moreinfo - a web URL with more information about the treebank database
access   - true if the user is authorized to use the instance
featured - popularity index
/past_queries

Parameters:

format - html|json|text
cb     - a string (typically JavaScript function name)
first  - first query to return
max    - max number of queries to return

Returns a list of users past queries. If format='json', the result is an array of arrays (pairs), each of which consists of the time they were last run (in seconds since UNIX epoch) and the query. If cb is passed, the JSON array is wrapped into a Javacript function whose name was passed in cb.

If format='text' the queries are returned as plain text, separated only by two empty lines.

The options first and max can be used to obtain only partial lists. For format='html', max defaults to 50.

/form

Parameters: none

Returns HTML with an empty PML-TQ query form, introduction and a few query examples generated for the actual treebank.

/query

Parameters:

format          - html|text
query           - string query in the PML-TQ syntax
limit           - maximum number of results to return for node queries
row_limit       - maximum number of results for filter queries
timeout         - timeout in seconds
query_submit    - name of the submit button (if contains the substring 'w/o',
                  the query is evaluated ignoring output filters, if any)

For queries returning nodes the output contains for each match a tuple of so called node handles of the matching nodes is returned. The tuple is ordered in the depth-first order of nesting of node selectors in the query. The handles can be passed to methods such as /node and /svg.

If format=text, the output consists of zero or more lines, each line consisting of TAB-separated columns. For queries with output filter, the columns are the values computed by the last filter, for queries returning nodes they are the node handles (so each line encodes the tuple of node handles as described above). In this case, the header 'Pmltq-returns-nodes' indicates whether the query returned nodes (value 1) or output filter results (value 0).

If format=html, the output is a web application page showing the query and the results. The web page depends on CSS styleheets and JavaScript code from the /static folder (i.e. it generates /static callbacks to this service). Most of the web-page functionality is implemented in the JavaScript file static/js/results.js. Tree results are encoded as node indexes in a JavaScript variable of the output web page and the browser performs callback /svg requests to this service in order to obtain a SVG rendering of the mathing tree.

[Node handles: For ordinary nodes, the handle has the form X/T or X/T@I where X is an integer (internal database index of the corresponding record), T is the name of the PML type of the node and the optional I value is the PML ID of the matched node (if available). For member objects (matching the member relation) the handle has the form X//T.]

/query_svg

Parameters:

query           - string query in the PML-TQ syntax

Returns an SVG document with the mime-type image/svg+xml rendering a graphical representation of the input PML-TQ query.

/svg

Parameters:

nodes           - a node handle (or a |-separated list of node handles)
tree_no         - tree number

Returns an SVG document with the mime-type image/svg+xml rendering a tree.

If tree_no is less or equal 0 or not specified, the rendered tree is the tree containing the node corresponding to the given node handle.

If tree_no is a positive integer N, the returned SVG is a rendering of Nth tree in the document containing the node corresponding to the given node handle.

Currently, if nodes contains a |-separated list of node handles, only the first handle in the list is used.

/n2q

Parameters:

format          - json|text
ids             - a |-separated list of PML node IDs
cb              - a string (typically JavaScript function name)
vars            - comma separated list of reserved selector names

Locates given nodes by their IDs in the database and suggests a PML-TQ query that cover this set of nodes as one of its matches (the query restricts the nodes based on most of their attributes and their mutual relationships). The returned query is formatted and indented so that there is e.g. at most one attribute test per line, tests for technical attributes (such as ID or order) are commented out, etc. The query also does not use any variable listed in the vars list.

The output for the text format is simply the query. For the json format it is either a JavaScript string literal with the 'text/x-json' mime-type, or, if the cb parameter was set, the output has the 'text/javascript' mime-type and consists of the string literal wrapped into a function call to the function whose name was passed in cb. For example, if the resulting query was 'a-node $a:= []' and 'show' was passed in cb, the the JavaScript code show('a-node $a:= []').

/data/<path>

Parameters: none

Verifies that <path> is a (relative) path of a PML document in the database (or related, e.g. a PML schema) and if so, returns the document indicating 'application/octet-stream' as mime-type.

/static/<path>

Parameters: none

Returns the content of <static-dir>/<path> guessing the mime-type based on the file extension, where <static-dir> is a pre-configured directory for static content.

/node

Parameters:

idx - a node handle
format - html|text

Resolves a given node handle (see "query"" in ") into a relative URL which points to the /data/<path> method and can be used to retrieve the document containing a given node. Usually, a fragment identifier is appended to the URL consisting either of the ID of the node or has the form N.M where N is the tree number and M is the depth-first order of the node in the tree.

/schema

Parameters:

name - name of the annotation layer (root element)

Returns a PML schema for the particular annotation layer. The schema (layer) is identified by the root name.

/type

Parameters:

type    - PML type name
format  - html|text

Returns a PML schema of the annotation layer which declares nodes of a given type.

/nodetypes

Parameters:

format  - html|text
layer   - name of the annotation layer (root element)

Returns a list of node types available the given annotation layer or on all layers if layer is not given. In 'text' format the types are returned one per line.

/relations

Parameters:

format   - html|text
type     - node type
category - implementation|pmlrf|both

Returns a list of specific (i.e. implementation-defined or PMLREF-based or both) PML-TQ relations that can start at a node of the given type (or any node if type not given).

/relation_target_types

Parameters:

format   - html|text
type     - node type
category - implementation|pmlrf|both

Returns target-node types for specific (implementation-defined or PMLREF-based or both) PML-TQ relations that can start at a node of the given type (or any node if type is not given).

The output for format='text' is one or more line, each consisting of a TAB-separated triplet ST, REL, TT where ST is the source node type (same as type if specified), REL is the name of the PML-TQ relation, and TT is a possible PML node type of a target node that can be in the relation R with nodes of type ST.

/version

Parameters:

format           - html|text
client_version   - version string

Checks compatibility of this version to the client version.

For format=text, returns the string COMPATIBLE (in cases that compatible client version string was passed) or INCOMPATIBLE (otherwise) and on the next line the version of the underlying PMLTQ::SQLEvaluator.

For format=html, returns the same information in a small HTML document.