NAME
DiaColloDB::Relation::Unigrams - diachronic collocation db, profiling relation: native unigram index
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::Relation::Unigrams;
##========================================================================
## Constructors etc.
$ug = $CLASS_OR_OBJECT->new(%args);
##========================================================================
## API: disk usage
@files = $obj->diskFiles();
##========================================================================
## I/O: open/close
$ug_or_undef = $ug->open($base,$flags);
$ug_or_undef = $ug->close();
$bool = $ug->opened();
##========================================================================
## I/O: header
@keys = $ug->headerKeys();
$bool = $ug->loadHeaderData($hdr);
##========================================================================
## I/O: text
$ug = $ug->loadTextFh($fh,%opts);
$ug = $ug->saveTextFh($fh,%opts);
##========================================================================
## Relation API: creation
$ug = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
$ug = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
##========================================================================
## Relation API: default
\%slice2prf = $rel->subprofile1(\@tids,\%opts);
\%qinfo = $rel->qinfo($coldb, %opts);
DESCRIPTION
DiaColloDB::Relation::Unigrams is a DiaColloDB::Relation subclass for native indices over attribute-tuple unigrams using the DiaColloDB::PackedFile API for low-level index data.
Globals & Constants
- Variable: @ISA
-
DiaColloDB::Relation::Unigrams inherits from DiaColloDB::Relation.
Constructors etc.
- new
-
$ug = $CLASS_OR_OBJECT->new(%args);
%args, object structure:
##-- user options
base      => $basename,  ##-- file basename (default=undef:none); use files "${base}.dba1", "${base}.dba2", "${base}.hdr"
flags     => $flags,     ##-- fcntl flags or open-mode (default='r')
perms     => $perms,     ##-- creation permissions (default=(0666 &~umask))
pack_i    => $pack_i,    ##-- pack-template for IDs (default='N')
pack_f    => $pack_f,    ##-- pack-template for frequencies (default='N')
pack_d    => $pack_d,    ##-- pack-template for dates (default='n')
keeptmp   => $bool,      ##-- keep temporary files? (default=false)
logCompat => $level,     ##-- log-level for compatibility warnings (default='warn')
##
##-- size info (after open() or load())
size1     => $size1,     ##-- == $r1->size()
size2     => $size2,     ##-- == $r2->size()
##
##-- low-level data
r1        => $r1,        ##-- pf: [$end2] @ $i1 : constant (logical index)
r2        => $r2,        ##-- pf: [$d1,$f1]* @ end2($i1-1)..(end2($i1+1)-1) : sorted by $d1 for each $i1
N         => $N,         ##-- sum($f1)
version   => $version,   ##-- file version, for compatibility checks
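The default templates 'N' and 'n' are standard Perl pack() templates for unsigned 32-bit and 16-bit big-endian integers. The sketch below uses only core pack() and unpack() to show what one (date, frequency) record might look like with those defaults. The combined template $pack_rec and the assumption that a record is simply a date followed by a frequency are illustrative, not taken from the module source.

```perl
use strict;
use warnings;

## Assumption: one r2-style record is a date followed by a frequency,
## packed with the documented default templates.
my ($pack_d, $pack_f) = ('n', 'N');   ##-- defaults from the option list
my $pack_rec = $pack_d . $pack_f;     ##-- hypothetical combined template

## pack a (date,frequency) pair and measure the record size
my $rec    = pack($pack_rec, 1999, 42);
my $reclen = length($rec);            ##-- 2 bytes ('n') + 4 bytes ('N')

## unpack the record again
my ($d1, $f1) = unpack($pack_rec, $rec);
print "reclen=$reclen date=$d1 freq=$f1\n";
```

With these defaults, dates are limited to the range 0..65535 and individual frequencies to 0..4294967295; larger values would need wider templates.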
- DESTROY
-
destructor implicitly calls close().
API: disk usage
- diskFiles
-
@files = $obj->diskFiles();
returns disk storage files, used by du() and timestamp()
I/O: open/close
- open
-
$ug_or_undef = $ug->open($base,$flags);
$ug_or_undef = $ug->open($base);
$ug_or_undef = $ug->open();
Opens underlying index files.
- close
-
$ug_or_undef = $ug->close();
Closes underlying index files. Implicitly calls flush() if index is opened for writing.
- opened
-
$bool = $ug->opened();
Returns true iff index is opened.
I/O: header
- headerKeys
-
@keys = $ug->headerKeys();
keys to save as header
- loadHeaderData
-
$bool = $ug->loadHeaderData($hdr);
instantiates header data from $hdr; overrides DiaColloDB::Persistent implementation.
I/O: text
- loadTextFh
-
$ug = $ug->loadTextFh($fh,%opts);
Loads from a text file as saved by saveTextFh().
Input fh must be sorted numerically by ($i1,$d1).
Supports multiple lines for pairs ($i1,$d1) provided the above condition(s) hold.
Supports loading of $ug->{N} from single-component lines.
%opts: clobber %$ug
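The numeric sort requirement on ($i1,$d1) can be met before writing the input file. The following is a minimal sketch in core Perl, assuming each record is held in memory as an array-ref [FREQ, ID1, DATE]; the helper names cmp_id_date() and is_sorted() are hypothetical and not part of the module.

```perl
use strict;
use warnings;

## Hypothetical comparator: numeric order by (ID1,DATE) for [FREQ,ID1,DATE] records
sub cmp_id_date {
  my ($x, $y) = @_;
  return $x->[1] <=> $y->[1] || $x->[2] <=> $y->[2];
}

## Hypothetical validator: true iff records are in non-decreasing (ID1,DATE) order
sub is_sorted {
  my @recs = @_;
  foreach my $i (1..$#recs) {
    return 0 if cmp_id_date($recs[$i-1], $recs[$i]) > 0;
  }
  return 1;
}

my @records = ([3,7,2001], [1,2,1999], [5,7,2001], [2,2,1998]);
my @sorted  = sort { cmp_id_date($a, $b) } @records;

## repeated (ID1,DATE) pairs such as (7,2001) remain adjacent after sorting
print join("\t", @$_), "\n" foreach @sorted;
```

Because repeated ($i1,$d1) pairs are allowed, the records do not need to be merged beforehand; sorting alone is enough to satisfy the stated condition.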
- saveTextFh
-
$bool = $ug->saveTextFh($fh,%opts);
save as text with lines of the form:
N              ##-- 1 field : N
FREQ ID1 DATE  ##-- 3 fields: unigram frequency for (ID1,DATE)
%opts:
i2s => \&CODE, ##-- code-ref for formatting indices; called as $s=CODE($i)
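To illustrate the two line types, here is a small reader sketch in core Perl. It assumes TAB-separated fields and that an i2s-style code-ref is applied to the ID1 column; both are assumptions, and parse_line(), %id2str and the sample lines are made up for the example.

```perl
use strict;
use warnings;

## Hypothetical parser for one line of the text format:
## returns ('N',$N) for 1-field lines, ('f',$freq,$id1,$date) for 3-field lines
sub parse_line {
  my $line = shift;
  chomp($line);
  my @f = split(/\t/, $line);   ##-- assumption: TAB-separated fields
  return ('N', $f[0]) if @f == 1;
  return ('f', @f)    if @f == 3;
  die "unexpected field count in line '$line'";
}

## Hypothetical i2s-style formatter: maps an integer ID to a string,
## falling back to the ID itself if no string is known
my %id2str = (0 => 'Haus', 1 => 'Baum');
my $i2s = sub { $id2str{$_[0]} // $_[0] };

my @lines = ("10\n", "7\t0\t1999\n", "3\t1\t2001\n");
foreach my $line (@lines) {
  my ($type, @vals) = parse_line($line);
  if ($type eq 'N') {
    print "total N = $vals[0]\n";
  } else {
    my ($freq, $id1, $date) = @vals;
    print "freq=$freq item=", $i2s->($id1), " date=$date\n";
  }
}
```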
Relation API: creation
- create
-
$ug = $CLASS_OR_OBJECT->create($coldb,$tokdat_file,%opts);
populates unigram database from $tokdat_file, a tt-style text file with lines of the form:
TID DATE  ##-- single token
"\n"      ##-- blank line ~ EOS (hard co-occurrence boundary)
%opts: clobber %$ug
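The counting that such an input file implies can be sketched in core Perl as follows. This is an in-memory illustration only, assuming TAB-separated TID and DATE fields; count_unigrams() is a hypothetical helper and says nothing about how create() is actually implemented.

```perl
use strict;
use warnings;

## Hypothetical in-memory counter: $f->{$tid}{$date} = frequency, plus total $N
sub count_unigrams {
  my $fh = shift;
  my %f;
  my $N = 0;
  while (defined(my $line = <$fh>)) {
    chomp($line);
    next if $line eq '';                    ##-- blank line: EOS, carries no token
    my ($tid, $date) = split(/\t/, $line);  ##-- assumption: TAB-separated fields
    ++$f{$tid}{$date};
    ++$N;
  }
  return (\%f, $N);
}

## read sample data from an in-memory filehandle
my $tokdat = "0\t1999\n1\t1999\n\n0\t1999\n0\t2001\n";
open(my $fh, '<', \$tokdat) or die "open failed: $!";
my ($f, $N) = count_unigrams($fh);
close($fh);

print "N=$N f(0,1999)=$f->{0}{1999}\n";
```

For unigram counts the blank EOS lines only need to be skipped, and the total $N equals the sum of all per-(TID,DATE) frequencies.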
- union
-
$ug = $CLASS_OR_OBJECT->union($coldb, \@pairs, %opts);
merge multiple unigram indices into new object.
@pairs is an array of pairs ([$argug,\@ti2u],...) of unigram relations $argug and tuple-id maps \@ti2u for $argug. Implicitly flushes the new index.
%opts: clobber %$ug
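The role of the tuple-id maps can be illustrated with plain hashes. In the sketch below each argument relation is stood in for by a hash { "TID\tDATE" => FREQ }, and each map is assumed to be a plain array indexed by the argument's tuple-ID that yields the ID in the merged index. Both representations and the helper union_counts() are hypothetical simplifications, not the module's on-disk structures.

```perl
use strict;
use warnings;

## Hypothetical merge: remap each argument's tuple-IDs via its map,
## then sum frequencies per (union-TID, DATE)
sub union_counts {
  my @pairs = @_;
  my %u;
  foreach my $pair (@pairs) {
    my ($arg, $ti2u) = @$pair;
    foreach my $key (keys %$arg) {
      my ($tid, $date) = split(/\t/, $key);
      $u{join("\t", $ti2u->[$tid], $date)} += $arg->{$key};
    }
  }
  return \%u;
}

my $ug1 = { "0\t1999" => 2, "1\t2000" => 1 };
my $ug2 = { "0\t1999" => 5, "1\t1999" => 3 };

## $ug1 keeps its IDs; $ug2 has its two IDs swapped in the merged index
my $u = union_counts([$ug1, [0,1]], [$ug2, [1,0]]);
print "$_ => $u->{$_}\n" foreach sort keys %$u;
```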
Relation API: default
- subprofile1
-
\%slice2prf = $ug->subprofile1(\@tids,\%opts);
Get slice-wise unigram profile(s) for tuple-IDs @tids. $ug must be opened.
%opts: as for DiaColloDB::Relation::subprofile1().
- subextend
-
\%slice2prf = $rel->subextend(\%slice2prf,\%opts);
Populate independent collocate frequencies in %slice2prf values. Override just returns a new empty DiaColloDB::Profile::Multi object.
- qinfo
-
\%qinfo = $rel->qinfo($coldb, %opts);
Get query-info hash for profile administrivia (ddc hit links).
%opts: as for profile(), additionally:
qreqs => \@qreqs,   ##-- as returned by $coldb->parseRequest($opts{query})
gbreq => \%groupby, ##-- as returned by $coldb->groupby($opts{groupby})
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
DiaColloDB::Relation(3pm), DiaColloDB::Relation::Cofreqs(3pm), DiaColloDB::Relation::TDF(3pm), DiaColloDB::Relation::DDC(3pm), DiaColloDB(3pm), perl(1), ...