NAME

  CWB::Web::Query - A simple CQP front-end for CGI scripts

SYNOPSIS

  use CWB::Web::Query;

  # typically, a query object is used for a single query only
  $query = new CWB::Web::Query 'HANSARD-E';

  # install HTML-producing error handler
  $query->on_error(sub { print "<h2>$_</h2>\n" for @_ });

  # result output settings
  $query->context('1 s', '1 s');    # left & right context
  $query->attributes('word', 'pos', 's'); # show which attributes
  $query->alignments('hansard-f');  # aligned corpora to show
  $query->structures('sitting');    # return annotated values of regions
  $query->reduce(10);               # return at most 10 matches

  # run query - returns list of result structs
  @matches = $query->query("[pos='JJ'] [pos='NN' & lemma='dog']");
  $nr_matches = @matches;

  # typical result processing loop
  for ($i = 0; $i < $nr_matches; $i++) {
    $nr = $i + 1;               # match number
    $m = $matches[$i];          # result struct
    $m->{'cpos'};               # corpus position of match
    $m->{'kwic'}->{'left'};     # left context (HTML encoded)
    $m->{'kwic'}->{'match'};    # match        ( ~      ~   )
    $m->{'kwic'}->{'right'};    # right context( ~      ~   )
    $m->{'hansard-f'};          # aligned region
    $m->{'data'}->{'sitting'};  # annotation of structural region
  }

  # closes down CQP server & deallocates memory
  undef $query;

DESCRIPTION

The CWB::Web::Query module is a simplified CQP front-end intended for use in CGI scripts. Typically, a CGI script will create a CWB::Web::Query object for a single query. It is possible to reuse query objects for further queries on the same corpus, though.

ERRORS

If the CWB::Web::Query module encounters an error condition, an error message is printed on STDERR and the program is terminated. A user-defined error handler can be installed with the on_error() method. In this case, the error callback function is passed the error message generated by the module as a list of strings.
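A handler of this kind might be sketched as follows. The helper name error_page() is hypothetical and not part of the module. The sketch assumes that error messages may contain markup characters (for instance from the query string) and therefore escapes them, and that the script has not sent any output before the error occurs.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the message lines into a complete HTML page.
sub error_page {
    my @msg = @_;
    my $html = "<html><body><h1>ERROR</h1>\n";
    foreach my $line (@msg) {
        # escape markup characters so the message is displayed literally
        (my $safe = $line) =~ s/([&<>])/'&#' . ord($1) . ';'/ge;
        $html .= "$safe<br>\n";
    }
    return $html . "</body></html>\n";
}

# Installing it on a query object $query (assumes no output was sent yet):
#   $query->on_error(sub {
#     print "Content-type: text/html\n\n", error_page(@_);
#   });
```

Building the page as a string keeps the handler itself short and lets the formatting be checked without a running CQP process.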

CORPUS REGISTRY

If you need to use a registry other than the default corpus registry, set the variable

  $CWB::Web::Query::Registry = "/path/to/my/registry";

This will affect all new CWB::Web::Query objects.

RESULT STRUCTURE

The query module's query() method returns a list of result structs corresponding to the matches of the query. A CGI script will usually iterate through the list with a loop similar to this:

    @result_list = $query->query(...);
    foreach $m (@result_list) {
      # code for processing match data in result struct $m 
    }

A result struct $m has the following fields:

$m->{'cpos'}

Corpus position of the first token in this match.

$m->{'kwic'}

Left context, match, and right context are returned in the subfields

   $m->{'kwic'}->{'left'}
   $m->{'kwic'}->{'match'}
   $m->{'kwic'}->{'right'}

as HTML-encoded text. Neither the match nor any keyword or target fields specified in the query are highlighted.

$m->{$aligned_corpus}

For each aligned corpus $aligned_corpus passed to the alignments() method, the field $m->{$aligned_corpus} contains the region aligned to the match as HTML-encoded text.

$m->{'data'}

The annotated values of structural attributes specified with the structures() method are returned in accordingly named subfields of the 'data' field. The returned values are not HTML-encoded.
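To illustrate how these fields fit together, the following minimal sketch formats one result struct as a line of HTML. The helper names format_match() and html_escape() and the sample data are hypothetical and not part of the module. The sketch assumes that structures('sitting') was requested. The 'kwic' fields are inserted unchanged because they are already HTML-encoded, whereas the 'data' value is escaped first.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: format one result struct as a KWIC line.
sub format_match {
    my ($m, $structure) = @_;
    my $kwic = $m->{'kwic'};
    # kwic fields are already HTML-encoded, so insert them unchanged
    my $line = "$kwic->{'left'} <b>$kwic->{'match'}</b> $kwic->{'right'}";
    # 'data' values are plain text and must be escaped before output
    my $label = html_escape($m->{'data'}->{$structure});
    return "<p>[$label, cpos $m->{'cpos'}] $line</p>\n";
}

# Made-up sample data in the shape described above
my $m = {
    'cpos' => 4711,
    'kwic' => { 'left' => 'the', 'match' => 'big dog', 'right' => 'barked' },
    'data' => { 'sitting' => 'A & B' },
};
print format_match($m, 'sitting');
```

Since the module does not highlight the match itself, the sketch wraps the 'match' field in a bold tag.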

METHODS

$query = new CWB::Web::Query $corpus;

Create a CWB::Web::Query object for CQP queries on corpus $corpus.

$query->on_error(\&error_handler);

Install error callback function. error_handler() is a user-defined subroutine, which will usually generate an HTML document from the error message passed by the CWB::Web::Query module. A typical error callback might look like this:

    sub error_handler {
      my @msg = @_;  # @msg holds the lines of the error message
      print "<html><body><h1>ERROR</h1>\n";
      print "$_<br>\n" for @msg;  # print @msg as individual lines
      print "</body></html>\n";
    }

$query->context($left, $right);

Left and right context returned by the query() method. $left and $right are passed to CQP for processing and hence must be specified in CQP format. Typical values are

    $query->context("10 words", "10 words");

for a fixed number of tokens and

    $query->context("1 s", "1 s");

to retrieve entire sentences.

$query->attributes($att1, $att2, ...);

Select attributes to display. Can include both positional and structural attributes.

$query->alignments($corpus, ...);

Specify one or more aligned corpora. Aligned regions in those corpora will be returned as HTML-encoded strings in the fields

     $m->{$corpus};
     ...

of a result struct $m.
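As a minimal sketch of how aligned regions might be displayed, the following code builds one table row per match, with the KWIC line next to its aligned region. The helper name parallel_rows() and the sample data are hypothetical. The sketch assumes that alignments('hansard-f') was set before the query was run.

```perl
use strict;
use warnings;

# Hypothetical helper: one table row per match, with the source-language
# KWIC line on the left and the aligned region on the right.
sub parallel_rows {
    my ($aligned_corpus, @matches) = @_;
    my @rows;
    foreach my $m (@matches) {
        my $k = $m->{'kwic'};
        my $source = join ' ', $k->{'left'}, $k->{'match'}, $k->{'right'};
        # both cells are already HTML-encoded by the module
        push @rows, "<tr><td>$source</td><td>$m->{$aligned_corpus}</td></tr>";
    }
    return @rows;
}

# Made-up sample result in place of the list returned by query()
my @matches = (
    { 'kwic' => { 'left' => 'a', 'match' => 'big dog', 'right' => 'barked' },
      'hansard-f' => 'un gros chien aboyait' },
);
print "<table>\n", join("\n", parallel_rows('hansard-f', @matches)), "\n</table>\n";
```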

$query->structures($att1, $att2, ...);

Specify structural attributes with annotated values. The annotated value of the $att1 region containing the match will be returned in

    $m->{'data'}->{$att1}

as plain text for further processing etc.

$query->reduce($n);

Return at most $n matches, selected at random from all matches of the query (hence repeated execution of the same query will produce different results). Deactivate with

    $query->reduce(0);

This method uses CQP's reduce command.

$query->cut($n);

Similar to the reduce() method, but returns the first $n matches found in the corpus instead of a random selection. The cut() method uses CQP's cut operator and is faster on slow machines. However, reduce() will usually yield more balanced results. Sometimes a combination of both can be useful, such as

    $query->cut(1000);     # stop after first 1000 matches,
    $query->reduce(50);    # but return only 50 of them

@results = $query->query($cqp_query);

Executes CQP query and returns a list of matches. See "RESULT STRUCTURE" for the format of the @results list.

COPYRIGHT

Copyright (C) 1999-2022 Stephanie Evert [http://purl.org/stephanie.evert]

This software is provided AS IS and the author makes no warranty as to its use and performance. You may use the software, redistribute and modify it under the same terms as Perl itself.