NAME

DiaColloDB::Relation - diachronic collocation db, relation API (abstract & utilities)

SYNOPSIS

##========================================================================
## PRELIMINARIES

use DiaColloDB::Relation;

##========================================================================
## Constructors etc.

$rel = $CLASS_OR_OBJECT->new(%args);

##========================================================================
## Relation API: creation

$rel = $CLASS_OR_OBJECT->create($coldb, $tokdat_file, %opts);
$rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);

##========================================================================
## Relation API: profiling

$mprf = $rel->profile($coldb, %opts);
$mprf = $rel->extend($coldb, %opts);
$mpdiff = $rel->compare($coldb, %opts);
$mpdiff = $rel->diff($coldb, %opts);

##========================================================================
## Relation API: default

\%slice2prf = $rel->subprofile1(\@tids, \%opts);
\%slice2prf = $rel->subprofile2(\%slice2prf, %opts);
\%slice2prf = $rel->subextend(\%slice2prf, \%opts);

\%qinfo = $rel->qinfo($coldb, %opts);
(\@q1strs,\@q2strs,\@qxstrs,\@fstrs) = $rel->qinfoData($coldb,%opts);

DESCRIPTION

DiaColloDB::Relation is a base class for low-level indices capable of returning raw frequency data suitable for constructing DiaColloDB::Profile::Multi objects. In addition to the API specification, the DiaColloDB::Relation package also provides several common utility methods used by native DiaColloDB index types.

Globals & Constants

Variable: @ISA

DiaColloDB::Relation inherits from DiaColloDB::Persistent.

Constructors etc.

new
$rel = $CLASS_OR_OBJECT->new(%args);

%args, object structure: none defined here; see subclass documentation for details.

Relation API: creation

create
$rel = $CLASS_OR_OBJECT->create($coldb, $tokdat_file, %opts);

Populate the relation database from $tokdat_file, a tt-style text file with lines of the form:

TID DATE	##-- single token
"\n"		##-- blank line ~ EOS (hard co-occurrence boundary)

%opts: clobber %$rel (options override the corresponding fields of %$rel)
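For illustration, a minimal sketch of how such a file can be split into sentences bounded by blank lines. The helper read_tokdat() is hypothetical and not part of the DiaColloDB API; it assumes TID and DATE are separated by whitespace and ignores any further columns.

```perl
use strict;
use warnings;

## Hypothetical helper (not DiaColloDB code): read a tt-style "tokdat" stream
## into sentences; each sentence is an array of [$tid,$date] pairs, and a
## blank line marks a hard co-occurrence boundary (EOS).
sub read_tokdat {
  my ($fh) = @_;
  my (@sents, @cur);
  while (defined(my $line = <$fh>)) {
    chomp $line;
    if ($line =~ /^\s*$/) {
      ## blank line: close the current sentence, if any
      push(@sents, [@cur]) if @cur;
      @cur = ();
      next;
    }
    my ($tid, $date) = split(' ', $line);   ## assumed whitespace-separated
    push(@cur, [$tid, $date]);
  }
  push(@sents, [@cur]) if @cur;   ## final sentence without trailing blank line
  return \@sents;
}
```

Co-occurrence counting would then only pair tokens within the same sentence, so the blank line acts as the hard boundary described above.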

union
$rel = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
  • merge multiple co-frequency indices into a new object

  • @pairs: array of pairs ([$argrel,\@ti2u],...), each pairing a relation object $argrel with a tuple-id map \@ti2u for $argrel

  • %opts: clobber %$rel

  • should implicitly flush the new relation index
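A minimal in-memory sketch of the merge step. This is hypothetical: each $argrel is modeled as a plain hash from "$i\t$j" tuple-id pairs to co-frequencies, and \@ti2u maps $argrel-local tuple-ids to tuple-ids of the new index. Real subclasses operate on their own on-disk structures and must also flush the result.

```perl
use strict;
use warnings;

## Hypothetical sketch (not DiaColloDB code): merge co-frequency hashes
## given as [$argrel, \@ti2u] pairs into one hash over unified tuple-ids.
sub union_cofreqs {
  my ($pairs) = @_;
  my %u;
  foreach my $pair (@$pairs) {
    my ($argrel, $ti2u) = @$pair;
    while (my ($key, $f) = each %$argrel) {
      my ($i, $j) = split(/\t/, $key);
      ## translate both local ids into the unified id-space, then accumulate
      $u{join("\t", $ti2u->[$i], $ti2u->[$j])} += $f;
    }
  }
  return \%u;
}
```

Because frequencies are summed after id translation, identical items from different input relations collapse into a single entry of the new index.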

Relation API: profiling

profile
$mprf = $rel->profile($coldb, %opts);

Get a relation-specific profile for selected items as a DiaColloDB::Profile::Multi object; called by DiaColloDB::profile().

%opts:

##-- selection parameters
query => $query,           ##-- target request ATTR:REQ...
date  => $date1,           ##-- string or array or range "MIN-MAX" (inclusive) : default=all
##
##-- aggregation parameters
slice   => $slice,         ##-- date slice (default=1, 0 for global profile)
groupby => $groupby,       ##-- string or array "ATTR1[:HAVING1] ...": default=$coldb->attrs; see groupby() method
##
##-- scoring and trimming parameters
eps     => $eps,           ##-- smoothing constant (default=0)
score   => $func,          ##-- scoring function (f|fm|lf|lfm|mi|ld) : default="f"
kbest   => $k,             ##-- return only $k best collocates per date (slice) : default=-1:all
cutoff  => $cutoff,        ##-- minimum score
global  => $bool,          ##-- trim profiles globally (vs. locally for each date-slice?) (default=0)
##
##-- profiling and debugging parameters
strings => $bool,          ##-- do/don't stringify (default=do)
fill    => $bool,          ##-- if true, returned multi-profile will have null profiles inserted for missing slices
onepass => $bool,          ##-- if true, use fast but incorrect 1-pass method (default=0; Cofreqs subclass only)
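The following sketch shows one plausible reading of the date and slice options. The helpers date_filter() and slice_label() are hypothetical; a plain string is treated as a single year, and slice labels are assumed to be the lower bound of each slice interval.

```perl
use strict;
use warnings;

## Hypothetical: turn a 'date' option into a predicate on (numeric) years.
## undef or '' matches everything, an ARRAY-ref matches the listed years,
## "MIN-MAX" matches the inclusive range, any other string a single year.
sub date_filter {
  my ($date) = @_;
  return sub { 1 } if !defined($date) || $date eq '';
  if (ref($date) eq 'ARRAY') {
    my %ok = map { ($_ => 1) } @$date;
    return sub { exists $ok{$_[0]} };
  }
  if ($date =~ /^\s*(\d+)\s*-\s*(\d+)\s*$/) {
    my ($min, $max) = ($1, $2);
    return sub { $_[0] >= $min && $_[0] <= $max };
  }
  return sub { $_[0] == $date };
}

## Hypothetical: label of the slice containing $date; slice=0 yields a
## single global slice (label 0).
sub slice_label {
  my ($date, $slice) = @_;
  return 0 if !$slice;
  return int($date / $slice) * $slice;
}
```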

The default implementation:

  • parses the request and extracts target tuple-ids,

  • calls $rel->subprofile1() to compute slice-wise joint frequency profiles (f12),

  • calls $rel->subprofile2() to compute independent collocate frequencies (f2), and finally

  • collects the result in a DiaColloDB::Profile::Multi object.

Default values for %opts should be set by a higher-level call, e.g. DiaColloDB::profile().
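The steps of the default implementation can be sketched as plain control flow. Here parse, sub1 and sub2 are hypothetical callbacks standing in for request parsing, subprofile1() and subprofile2(), and the result is a plain hash-ref rather than a DiaColloDB::Profile::Multi object.

```perl
use strict;
use warnings;

## Hypothetical sketch of the default profile() flow (control flow only).
sub profile_sketch {
  my (%args) = @_;
  my $tids      = $args{parse}->($args{query});       ## target tuple-ids
  my $slice2prf = $args{sub1}->($tids, \%args);       ## joint f12, by slice
  $slice2prf    = $args{sub2}->($slice2prf, %args);   ## add independent f2
  ## collect slice-wise profiles in slice-label order
  return { profiles => [map { $slice2prf->{$_} } sort { $a <=> $b } keys %$slice2prf] };
}
```

Keeping the f2 pass separate from the f12 pass is what lets single-pass relations fall back on the identity subprofile2().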

extend
$mprf = $rel->extend($coldb, %opts);

Get independent f2 frequencies for $opts{slice2keys} as a DiaColloDB::Profile::Multi object; called by DiaColloDB::extend().

%opts: as for profile(), also:

slice2keys => \%slice2keys, ##-- target f2-items by slice-label (REQUIRED)

The default implementation calls $rel->subextend().
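A minimal sketch of slice-wise f2 extension. The shapes are assumptions for illustration only: \%slice2keys maps each slice label to an array-ref of item keys, and the hypothetical callback $f2_of->($slice,$key) returns the independent frequency of one item in one slice.

```perl
use strict;
use warnings;

## Hypothetical sketch (not DiaColloDB code): build per-slice f2 profiles
## for exactly the requested keys.
sub extend_sketch {
  my ($slice2keys, $f2_of) = @_;
  my %slice2prf;
  while (my ($slice, $keys) = each %$slice2keys) {
    $slice2prf{$slice}{f2}{$_} = $f2_of->($slice, $_) foreach @$keys;
  }
  return \%slice2prf;
}
```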

compare
$mpdiff = $rel->compare($coldb, %opts);

Get a relation-specific comparison profile for selected items as a DiaColloDB::Profile::MultiDiff object.

%opts:

##-- selection parameters
(a|b)?query => $query,       ##-- target query as for parseRequest()
(a|b)?date  => $date1,       ##-- string or array or range "MIN-MAX" (inclusive) : default=all
##
##-- aggregation parameters
groupby      => $groupby,    ##-- string or array "ATTR1[:HAVING1] ...": default=$coldb->attrs; see groupby() method
(a|b)?slice  => $slice,      ##-- date slice (default=1, 0 for global profile)
##
##-- scoring and trimming parameters
eps     => $eps,           ##-- smoothing constant (default=0)
score   => $func,          ##-- scoring function (f|fm|lf|lfm|mi|ld) : default="f"
kbest   => $k,             ##-- return only $k best collocates per date (slice) : default=-1:all
cutoff  => $cutoff,        ##-- minimum score
global  => $bool,          ##-- trim profiles globally (vs. locally for each date-slice?) (default=0)
diff    => $diff,          ##-- low-level score-diff operation (diff|adiff|sum|min|max|avg|havg); default='adiff'
##
##-- profiling and debugging parameters
strings => $bool,          ##-- do/don't stringify (default=do)
onepass => $bool,          ##-- if true, use fast but incorrect 1-pass profiling method (default=0)
##
##-- subclass abstraction parameters
_gbparse => $bool,         ##-- if true (default), 'groupby' clause will be parsed only once, using $coldb->groupby() method
_abkeys  => \@abkeys,      ##-- additional key-suffixes KEY s.t. (KEY=>VAL) gets passed to profile() calls if e.g. (aKEY=>VAL) is in %opts

The default implementation just wraps the profile() method; default values for %opts should be set by a higher-level call, e.g. DiaColloDB::compare().
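To make the (a|b)? prefix convention concrete, here is a sketch of how %opts might be split into option hashes for the two profile() calls. The helper split_ab_opts() is hypothetical, and the precedence rule (an explicit aKEY/bKEY overrides a shared KEY) is an assumption based on the option descriptions above.

```perl
use strict;
use warnings;

## Hypothetical helper (not the DiaColloDB implementation): build the option
## hashes for the "a" and "b" profile() calls. For each suffix KEY in
## @$abkeys, an explicit aKEY/bKEY value takes precedence over a shared KEY;
## all other options are copied to both sides unchanged.
sub split_ab_opts {
  my ($opts, $abkeys) = @_;
  my %is_ab = map { ($_ => 1) } @$abkeys;
  my (%aopts, %bopts);
  foreach my $k (keys %$opts) {
    next if $is_ab{$k};                          ## shared KEY: handled below
    next if $k =~ /^[ab](.+)$/ && $is_ab{$1};    ## aKEY or bKEY: handled below
    $aopts{$k} = $bopts{$k} = $opts->{$k};
  }
  foreach my $key (@$abkeys) {
    foreach my $side (['a', \%aopts], ['b', \%bopts]) {
      my ($prefix, $dst) = @$side;
      my $val = $opts->{"$prefix$key"} // $opts->{$key};
      $dst->{$key} = $val if defined $val;
    }
  }
  return (\%aopts, \%bopts);
}
```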

diff
$mpdiff = $rel->diff($coldb, %opts);

Alias for compare().

Relation API: default

subprofile1
\%slice2prf = $rel->subprofile1(\@tids,\%opts);

Native index API low-level first-pass profiling function for joint frequency acquisition (f12); default implementation just throws an error.

subprofile2
\%slice2prf = $rel->subprofile2(\%slice2prf, %opts);

Native index API low-level second-pass profiling function for independent frequency acquisition (f2); default implementation just returns \%slice2prf, which is appropriate for relations that use a single-pass strategy to populate $prf->{f2} in their implementation of subprofile1().

subextend
\%slice2prf = $rel->subextend(\%slice2prf,\%opts);

Native index API low-level profile-extension function for slice-wise independent frequency acquisition (f2). The default implementation throws an error.
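A hypothetical stand-in mirroring the default behaviour of these methods (not DiaColloDB's source): subprofile1() and subextend() die, subprofile2() returns its input unchanged, and diff() is an alias for compare(). The package names My::Relation and My::Relation::OnePass are invented for illustration; the subclass shows a single-pass relation that fills f2 in subprofile1() and so can keep the inherited subprofile2().

```perl
use strict;
use warnings;

## Hypothetical abstract base mirroring the defaults described above.
package My::Relation;

sub new {
  my ($that, %args) = @_;
  my $self = { %args };
  return bless($self, ref($that) || $that);
}
sub subprofile1 { die ref($_[0]) . "::subprofile1(): not implemented\n" }
sub subprofile2 { my ($rel, $slice2prf, %opts) = @_; return $slice2prf }
sub subextend   { die ref($_[0]) . "::subextend(): not implemented\n" }
sub compare     { my ($rel, $coldb, %opts) = @_; return { compared => 1, %opts } }
{ no warnings 'once'; *diff = \&compare; }   ## diff() as an alias for compare()

## Hypothetical single-pass subclass: fills f12 and f2 in subprofile1(),
## so the inherited identity subprofile2() is sufficient.
package My::Relation::OnePass;
our @ISA = ('My::Relation');

sub subprofile1 {
  my ($rel, $tids, $opts) = @_;
  my %f = map { ($_ => 1) } @$tids;   ## toy counts: one occurrence per tuple-id
  my $slice2prf = { 0 => { f12 => {%f}, f2 => {%f} } };
  return $slice2prf;
}

package main;
```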

qinfo
\%qinfo = $rel->qinfo($coldb, %opts);

Get a query-info hash for profile administrivia (DDC KWIC links). %opts: as for profile(), additionally:

qreqs => \@areqs,      ##-- as returned by $coldb->parseRequest($opts{query})
gbreq => \%groupby,    ##-- as returned by $coldb->groupby($opts{groupby})

qinfoData
(\@q1strs,\@q2strs,\@qxstrs,\@fstrs) = $rel->qinfoData($coldb,%opts);

Parse @opts{qw(qreqs gbreq)} into conditions on w1 and w2 and into metadata filters (for DDC linkup). Call this from subclass qinfo() methods.

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2020 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

DiaColloDB::Persistent(3pm), DiaColloDB::Relation::Cofreqs(3pm), DiaColloDB::Relation::Unigrams(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB::Relation::DDC(3pm), DiaColloDB(3pm), perl(1), ...