NAME

DiaColloDB::Relation::Cofreqs - diachronic collocation db, profiling relation: native fixed-window co-frequency index

ALIASES

DiaColloDB::Relation::Cofreqs
DiaColloDB::Cofreqs

SYNOPSIS

##========================================================================
## PRELIMINARIES

use DiaColloDB::Relation::Cofreqs;

##========================================================================
## Constructors etc.

$cof = $CLASS_OR_OBJECT->new(%args);

##========================================================================
## I/O: open/close

$cof_or_undef = $cof->open($base,$flags);
$cof_or_undef = $cof->close();
$bool = $cof->opened();

##========================================================================
## I/O: header

@keys = $cof->headerKeys();
$bool = $cof->loadHeaderData($hdr);

##========================================================================
## I/O: text

$cof  = $cof->loadTextFh($fh,%opts);
$bool = $cof->saveTextFh($fh,%opts);

##========================================================================
## Relation API: creation

$cof = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
$cof = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);

##========================================================================
## Relation API: default

\%slice2prf = $cof->subprofile1(\@tids, \%opts);
\%slice2prf = $cof->subprofile2(\%slice2prf, \%opts);
\%slice2prf = $cof->subextend(\%slice2prf, \%opts);

\%qinfo = $rel->qinfo($coldb, %opts);

DESCRIPTION

DiaColloDB::Relation::Cofreqs is a DiaColloDB::Relation subclass for native indices over collocation frequencies within a fixed-length window of context words. Low-level index data is stored in three DiaColloDB::PackedFile objects (the r1, r2 and r3 members described under new()).

Only simple queries expressed as a disjunction of single-term conditions (i.e. those queries which evaluate to a set of term-tuples) are supported. Likewise, only groupby conditions over literal indexed term-attributes are supported.

Globals & Constants

Variable: @ISA

DiaColloDB::Relation::Cofreqs inherits from DiaColloDB::Relation.

Variable: $WANT_XS

Whether to use the optimized DiaColloDB::XS::CofUtils subroutines. Default: undef, meaning XS is used if available.
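
The XS/pure-perl choice controlled by $WANT_XS can be pictured with the following minimal sketch. It is not this package's code: the helper choose_impl() is hypothetical, and it assumes that a true $WANT_XS simply prefers XS when the module loads (the real behaviour when XS is requested but missing may differ).

```perl
use strict;
use warnings;

## Hypothetical helper: pick the XS or pure-perl (PP) code path.
##  $want mirrors $WANT_XS: undef = auto-detect, false = force PP, true = prefer XS
sub choose_impl {
  my ($want) = @_;
  return 'pp' if defined($want) && !$want;        # XS explicitly disabled
  ## try to load the optional XS module; a load failure is not fatal here
  my $have_xs = eval { require DiaColloDB::XS::CofUtils; 1 } ? 1 : 0;
  return $have_xs ? 'xs' : 'pp';                  # fall back to PP if XS is unavailable
}

print choose_impl(0), "\n";       # prints "pp"
print choose_impl(undef), "\n";   # "xs" only if DiaColloDB::XS::CofUtils is installed
```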

Constructors etc.

new
$cof = $CLASS_OR_OBJECT->new(%args);

%args, object structure:

##-- user options
class    => $class,      ##-- optional, useful for debugging from header file
base     => $basename,   ##-- file basename (default=undef:none); use files "${base}.dba1", "${base}.dba2", "${base}.dba3", "${base}.hdr"
flags    => $flags,      ##-- fcntl flags or open-mode (default='r')
perms    => $perms,      ##-- creation permissions (default=(0666 &~umask))
dmax     => $dmax,       ##-- maximum distance for co-occurrences (default=5)
fmin     => $fmin,       ##-- minimum pair frequency (default=0)
pack_i   => $pack_i,     ##-- pack-template for IDs (default='N')
pack_f   => $pack_f,     ##-- pack-template for frequencies (default='N')
pack_d   => $pack_d,     ##-- pack-template for dates (default='n')
keeptmp  => $bool,       ##-- keep temporary files? (default=false)
logCompat => $level,     ##-- log-level for compatibility warnings (default='warn')
logXS     => $level,     ##-- log-level for XS/PP dispatch (default='trace')
##
##-- size info (after open() or load())
size1    => $size1,      ##-- == $r1->size()
size2    => $size2,      ##-- == $r2->size()
size3    => $size3,      ##-- == $r3->size()
##
##-- low-level data
r1 => $r1,               ##-- pf: [$end2]            @ $i1				: constant (logical index)
r2 => $r2,               ##-- pf: [$end3,$d1,$f1]*   @ end2($i1-1)..(end2($i1+1)-1)	: sorted by $d1 for each $i1
r3 => $r3,               ##-- pf: [$i2,$f12]*        @ end3($d1-1)..(end3($d1+1)-1)	: sorted by $i2 for each ($i1,$d1)
N  => $N,                ##-- sum($f1)
version => $version,     ##-- file version, for compatibility checks
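
The three-level r1/r2/r3 layout can be illustrated with plain pack()/unpack() on in-memory strings. This is a sketch under stated assumptions, not the DiaColloDB::PackedFile API: the helpers build_index(), rec() and lookup_f12() are hypothetical, the default templates are used ('N' for IDs, offsets and frequencies, 'n' for dates), and each $end2/$end3 is read as the exclusive end offset of its block, so a block starts at the previous record's end offset (or at 0).

```perl
use strict;
use warnings;
use List::Util qw(max);

## Hypothetical record templates: r1=[$end2], r2=[$end3,$d1,$f1], r3=[$i2,$f12]
my %tmpl = (r1 => 'N', r2 => 'NnN', r3 => 'NN');
my %rlen = (r1 => 4, r2 => 10, r3 => 8);   # bytes per record for the templates above

## fetch record $i from the packed string $data
sub rec {
  my ($name, $data, $i) = @_;
  return unpack($tmpl{$name}, substr($data, $i * $rlen{$name}, $rlen{$name}));
}

## build packed r1/r2/r3 strings from $f1->{$i1}{$d1} and $f12->{$i1}{$d1}{$i2}
sub build_index {
  my ($f1, $f12) = @_;
  my ($r1, $r2, $r3) = ('', '', '');
  my ($n2, $n3) = (0, 0);
  for my $i1 (0 .. max(keys %$f1)) {                              # r1: one record per $i1
    for my $d1 (sort { $a <=> $b } keys %{ $f1->{$i1} || {} }) {  # r2 sorted by $d1
      my $pairs = $f12->{$i1}{$d1} || {};
      for my $i2 (sort { $a <=> $b } keys %$pairs) {              # r3 sorted by $i2
        $r3 .= pack($tmpl{r3}, $i2, $pairs->{$i2});
        ++$n3;
      }
      $r2 .= pack($tmpl{r2}, $n3, $d1, $f1->{$i1}{$d1});   # $end3 = r3 records so far
      ++$n2;
    }
    $r1 .= pack($tmpl{r1}, $n2);                           # $end2 = r2 records so far
  }
  return ($r1, $r2, $r3);
}

## look up f12 for ($i1,$d1,$i2); linear scans for brevity, although both
## levels are sorted and could be binary-searched
sub lookup_f12 {
  my ($r1, $r2, $r3, $i1, $d1, $i2) = @_;
  return 0 if ($i1 + 1) * $rlen{r1} > length($r1);        # $i1 out of range
  my $beg2 = $i1 > 0 ? (rec('r1', $r1, $i1 - 1))[0] : 0;
  my ($end2) = rec('r1', $r1, $i1);
  for my $j ($beg2 .. $end2 - 1) {                         # r2 block for $i1
    my ($end3, $d) = rec('r2', $r2, $j);
    next if $d != $d1;
    my $beg3 = $j > 0 ? (rec('r2', $r2, $j - 1))[0] : 0;
    for my $k ($beg3 .. $end3 - 1) {                       # r3 block for ($i1,$d1)
      my ($j2, $f12) = rec('r3', $r3, $k);
      return $f12 if $j2 == $i2;
    }
    return 0;
  }
  return 0;
}

my %f1  = (0 => {1990 => 10}, 1 => {1990 => 5, 2000 => 7});
my %f12 = (0 => {1990 => {1 => 3}}, 1 => {1990 => {0 => 3}, 2000 => {0 => 1, 2 => 4}});
my @r = build_index(\%f1, \%f12);
print lookup_f12(@r, 1, 2000, 2), "\n";   # prints 4
```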

DESTROY

Destructor implicitly calls close().

I/O: open/close

open
$cof_or_undef = $cof->open($base,$flags);
$cof_or_undef = $cof->open($base);
$cof_or_undef = $cof->open();

Opens the underlying index files.

close
$cof_or_undef = $cof->close();

Closes the underlying index files. Implicitly calls flush() if the index is opened for writing.

opened
$bool = $cof->opened();

Returns true iff the index is opened.

I/O: header

See also DiaColloDB::Persistent.

headerKeys
@keys = $cof->headerKeys();

Returns the keys to save in the header.

loadHeaderData
$bool = $cof->loadHeaderData($hdr);

Instantiates header data from $hdr; overrides the DiaColloDB::Persistent implementation.

I/O: text

loadTextFh
$cof = $cof->loadTextFh($fh,%opts);

Loads from a text filehandle as saved by saveTextFh(), with lines of the form:

N                       ##-- 1 field : N
FREQ ID1 DATE           ##-- 3 fields: un-collocated portion of f(ID1,DATE)
FREQ ID1 DATE ID2       ##-- 4 fields: co-frequency pair (ID2 >= 0)
FREQ ID1 DATE ID2 DATE2 ##-- 5 fields: redundant date (used by create(); DATE2 is ignored)
  • supports semi-sorted input: input fh must be sorted numerically by ($i1,$d1), and all $i2 for each ($i1,$d1)-pair must be adjacent (i.e. no intervening $j1 != $i1)

  • supports multiple lines for collocation-triples ($i1,$d1,$i2) provided the above conditions hold

  • supports loading of $cof->{N} from single-value lines

  • uses optimized DiaColloDB::XS::CofUtils::loadTextFhXS if available, otherwise pure-perl fallback loadTextFhPP() in this package.

  • %opts: clobber %$cof
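
A pure-perl reading of this line format might look like the following sketch. It is not loadTextFhPP(): the helper parse_cof_text() is hypothetical, it accumulates into hashes (so it does not need the semi-sorted order the real loader requires), and it assumes that f(ID1,DATE) is the sum of the 3-field remainder and all pair frequencies recorded for that (ID1,DATE).

```perl
use strict;
use warnings;

## Hypothetical parser for the 1-/3-/4-/5-field text format described above.
sub parse_cof_text {
  my ($fh) = @_;
  my (%f1, %f12);
  my $N;
  while (defined(my $line = <$fh>)) {
    chomp $line;
    next if $line =~ /^\s*$/;
    my @fields = split ' ', $line;
    if (@fields == 1) {                       # N
      $N = $fields[0];
    } elsif (@fields == 3) {                  # FREQ ID1 DATE: un-collocated remainder
      my ($freq, $i1, $d1) = @fields;
      $f1{$i1}{$d1} += $freq;
    } elsif (@fields == 4 || @fields == 5) {  # FREQ ID1 DATE ID2 [DATE2]: DATE2 ignored
      my ($freq, $i1, $d1, $i2) = @fields;
      $f1{$i1}{$d1}       += $freq;           # assumption: pair counts contribute to f1
      $f12{$i1}{$d1}{$i2} += $freq;           # repeated triples are summed
    } else {
      die "unexpected field count in line: $line\n";
    }
  }
  return ($N, \%f1, \%f12);
}

my $text = "20\n2 1 1990\n3 1 1990 2\n1 1 1990 2 1991\n";
open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
my ($N, $f1, $f12) = parse_cof_text($fh);
print "$N $f1->{1}{1990} $f12->{1}{1990}{2}\n";   # prints "20 6 4"
```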

loadTextFile_create
$cof = $cof->loadTextFile_create($fh,%opts);

backwards-compatible alias for loadTextFh().

saveTextFh
$bool = $cof->saveTextFh($fh,%opts);

Saves to a text filehandle, with lines of the form:

N                       ##-- 1 field : N
FREQ ID1 DATE           ##-- 3 fields: un-collocated portion of f(ID1,DATE)
FREQ ID1 DATE ID2       ##-- 4 fields: co-frequency pair (ID2 >= 0)

%opts:

i2s  => \&CODE,   ##-- code-ref for formatting indices; called as $s=CODE($i)
i2s1 => \&CODE,   ##-- code-ref for formatting item1 indices (overrides 'i2s')
i2s2 => \&CODE,   ##-- code-ref for formatting item2 indices (overrides 'i2s')
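
The precedence among these callbacks (i2s1 and i2s2 override i2s) can be sketched as below. The helper save_cof_text() and its in-memory hash arguments are hypothetical; the tab separator, the line order, skipping a zero remainder, and computing the 3-field line as f1 minus that (ID1,DATE)'s pair frequencies are all assumptions of this sketch.

```perl
use strict;
use warnings;

## Hypothetical writer for the 1-/3-/4-field text format.
sub save_cof_text {
  my ($fh, $N, $f1, $f12, %opts) = @_;
  my $id   = sub { $_[0] };
  my $i2s1 = $opts{i2s1} || $opts{i2s} || $id;   # item1 formatter
  my $i2s2 = $opts{i2s2} || $opts{i2s} || $id;   # item2 formatter
  print {$fh} "$N\n";
  for my $i1 (sort { $a <=> $b } keys %$f1) {
    for my $d1 (sort { $a <=> $b } keys %{ $f1->{$i1} }) {
      my $pairs = $f12->{$i1}{$d1} || {};
      my $rest  = $f1->{$i1}{$d1};
      $rest -= $_ for values %$pairs;            # un-collocated portion
      print {$fh} join("\t", $rest, $i2s1->($i1), $d1), "\n" if $rest > 0;
      for my $i2 (sort { $a <=> $b } keys %$pairs) {
        print {$fh} join("\t", $pairs->{$i2}, $i2s1->($i1), $d1, $i2s2->($i2)), "\n";
      }
    }
  }
  return 1;
}

my @words = qw(the cat sat);
my $out = '';
open(my $fh, '>', \$out) or die "cannot open in-memory handle: $!";
save_cof_text($fh, 20, {1 => {1990 => 6}}, {1 => {1990 => {2 => 4}}},
              i2s => sub { $words[$_[0]] });
close($fh);
print $out;   # prints "20", then "2<TAB>cat<TAB>1990", then "4<TAB>cat<TAB>1990<TAB>sat"
```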

Relation API: creation

create
$cof = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);

Populates the co-frequency index from $tokdat_file, a tt-style text file with lines of the form:

TID DATE	##-- single token
"\n"		##-- blank line ~ EOS (hard co-occurrence boundary)

%opts: clobber %$cof.
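
The fixed-window counting over such a file can be sketched roughly in pure perl. The helper count_window() is hypothetical and simplifies matters: it assumes a symmetric window of up to dmax tokens on either side, attributes each pair to the date of the ID1 token, and ignores fmin and all on-disk output.

```perl
use strict;
use warnings;

## Hypothetical fixed-window co-occurrence counter over "TID DATE" lines;
## blank lines end a sentence, so no pair spans a blank line.
sub count_window {
  my ($fh, $dmax) = @_;
  my (%f1, %f12, @sent);
  my $N = 0;
  my $flush = sub {                       # count all pairs in the buffered sentence
    for my $p (0 .. $#sent) {
      my ($i1, $d1) = @{ $sent[$p] };
      ++$f1{$i1}{$d1};
      ++$N;
      my $lo = $p - $dmax < 0      ? 0      : $p - $dmax;
      my $hi = $p + $dmax > $#sent ? $#sent : $p + $dmax;
      for my $q ($lo .. $hi) {
        next if $q == $p;                 # a token does not co-occur with itself
        ++$f12{$i1}{$d1}{ $sent[$q][0] };
      }
    }
    @sent = ();
  };
  while (defined(my $line = <$fh>)) {
    chomp $line;
    if ($line =~ /^\s*$/) { $flush->(); next; }   # EOS boundary
    my ($tid, $date) = split ' ', $line;
    push @sent, [$tid, $date];
  }
  $flush->();                             # last sentence may lack a trailing blank line
  return (\%f1, \%f12, $N);
}

my $tokdat = "1 2000\n2 2000\n3 2000\n\n1 2001\n3 2001\n";
open(my $fh, '<', \$tokdat) or die "cannot open in-memory handle: $!";
my ($f1, $f12, $N) = count_window($fh, 1);
print "$N $f12->{1}{2001}{3}\n";   # prints "5 1"
```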

union
$cof = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);

Merges multiple co-frequency indices into a new object. @pairs is an array of pairs ([$argcof,\@ti2u],...) of co-frequency relations $argcof and tuple-ID maps \@ti2u for $argcof. Implicitly flushes the new index.

%opts: clobber %$cof
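
The remapping step of such a merge can be sketched on plain hashes. The helper union_counts() is hypothetical; it assumes that each \@ti2u maps an input tuple-ID (array index) to the corresponding tuple-ID of the merged database, and that frequencies of colliding keys, as well as the N values, are added together.

```perl
use strict;
use warnings;

## Hypothetical merge of in-memory co-frequency tables.
##  @pairs: ([$cof, \@ti2u], ...) with
##  $cof = {N=>$N, f1=>{$i1=>{$d1=>$f}}, f12=>{$i1=>{$d1=>{$i2=>$f}}}}
sub union_counts {
  my @pairs = @_;
  my %u = (N => 0, f1 => {}, f12 => {});
  for my $pair (@pairs) {
    my ($cof, $ti2u) = @$pair;
    $u{N} += $cof->{N};
    for my $i1 (keys %{ $cof->{f1} }) {
      my $u1 = $ti2u->[$i1];                               # remap item1
      for my $d1 (keys %{ $cof->{f1}{$i1} }) {
        $u{f1}{$u1}{$d1} += $cof->{f1}{$i1}{$d1};
        my $p12 = $cof->{f12}{$i1}{$d1} || {};
        $u{f12}{$u1}{$d1}{ $ti2u->[$_] } += $p12->{$_} for keys %$p12;   # remap item2
      }
    }
  }
  return \%u;
}

my $cofA = {N => 10, f1 => {0 => {2000 => 4}}, f12 => {0 => {2000 => {1 => 2}}}};
my $cofB = {N => 6,  f1 => {1 => {2000 => 3}}, f12 => {1 => {2000 => {0 => 1}}}};
my $u = union_counts([$cofA, [5, 7]], [$cofB, [7, 5]]);
print "$u->{N} $u->{f1}{5}{2000} $u->{f12}{5}{2000}{7}\n";   # prints "16 7 3"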

Relation API: default

subprofile1
\%slice2prf = $cof->subprofile1(\@tids,\%opts);

Get slice-wise co-frequency profile(s) for tuple-IDs @tids. $cof must be opened. %opts: as for DiaColloDB::Relation::subprofile1().
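
Slice-wise profiling can be pictured as bucketing each date into a slice and summing frequencies per slice. This sketch is hypothetical: slice_profiles() works on plain hashes rather than the packed index, and it assumes slice keys are computed as int(DATE/slice)*slice, with slice=0 collapsing everything into a single slice 0.

```perl
use strict;
use warnings;

## Hypothetical slice-wise profile builder over in-memory tables
##  $f1->{$i1}{$d1}, $f12->{$i1}{$d1}{$i2}; returns {$slice => {f1=>..., f12=>{$i2=>...}}}
sub slice_profiles {
  my ($f1, $f12, $tids, $slice) = @_;
  my %slice2prf;
  for my $i1 (@$tids) {
    for my $d1 (keys %{ $f1->{$i1} || {} }) {
      my $s   = $slice ? int($d1 / $slice) * $slice : 0;   # assumed slice key
      my $prf = $slice2prf{$s} ||= { f1 => 0, f12 => {} };
      $prf->{f1} += $f1->{$i1}{$d1};
      my $pairs = $f12->{$i1}{$d1} || {};
      $prf->{f12}{$_} += $pairs->{$_} for keys %$pairs;
    }
  }
  return \%slice2prf;
}

my %f1  = (1 => {1991 => 4, 1998 => 2, 2003 => 5});
my %f12 = (1 => {1991 => {7 => 1}, 1998 => {7 => 2}, 2003 => {8 => 3}});
my $prf = slice_profiles(\%f1, \%f12, [1], 10);
print "$prf->{1990}{f1} $prf->{1990}{f12}{7} $prf->{2000}{f12}{8}\n";   # prints "6 3 3"
```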

subprofile2
\%slice2prf = $cof->subprofile2(\%slice2prf, \%opts);

Populate independent collocate frequencies in %slice2prf values. $cof must be opened. %opts: as for DiaColloDB::Relation::subprofile2().
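
Filling in the independent collocate frequencies amounts to summing f(ID2,DATE) over the dates that fall into each profile's slice. The helper fill_f2() is hypothetical, operates on plain hashes rather than the packed index, and reuses the assumed int(DATE/slice)*slice slicing.

```perl
use strict;
use warnings;

## Hypothetical: set $prf->{f2}{$i2} = sum of f1($i2,$d) over dates $d in the profile's slice
sub fill_f2 {
  my ($slice2prf, $f1, $slice) = @_;
  for my $s (keys %$slice2prf) {
    my $prf = $slice2prf->{$s};
    for my $i2 (keys %{ $prf->{f12} }) {
      my $f2 = 0;
      for my $d (keys %{ $f1->{$i2} || {} }) {
        my $ds = $slice ? int($d / $slice) * $slice : 0;   # slice of this date
        $f2 += $f1->{$i2}{$d} if $ds == $s;
      }
      $prf->{f2}{$i2} = $f2;
    }
  }
  return $slice2prf;
}

my %f1  = (7 => {1991 => 10, 1995 => 5, 2001 => 8});
my $s2p = { 1990 => { f1 => 6, f12 => {7 => 3} } };
fill_f2($s2p, \%f1, 10);
print $s2p->{1990}{f2}{7}, "\n";   # prints 15
```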

subextend
\%slice2prf = $cof->subextend(\%slice2prf,\%opts);

Populate independent collocate frequencies in %slice2prf values; wraps subprofile2().

qinfo
\%qinfo = $rel->qinfo($coldb, %opts);

Gets the query-info hash for profile administrivia (DDC hit links).

%opts: as for DiaColloDB::Relation::profile(), additionally:

qreqs => \@qreqs,      ##-- as returned by $coldb->parseRequest($opts{query})
gbreq => \%groupby,    ##-- as returned by $coldb->groupby($opts{groupby})

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2020 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

DiaColloDB::Relation(3pm), DiaColloDB::Relation::Unigrams(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB::Relation::DDC(3pm), DiaColloDB(3pm), perl(1), ...