NAME
DiaColloDB::Profile::Diff - diachronic collocation db, diff profiles
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::Profile::Diff;
##========================================================================
## Constructors etc.
$prf = $CLASS_OR_OBJECT->new(%args);
$dprf2 = $dprf->clone();
##========================================================================
## Basic Access
($prf1,$prf2) = $dprf->operands();
$bool = $dprf->empty();
##========================================================================
## I/O: JSON
$obj = $CLASS_OR_OBJECT->loadJsonData( $data,%opts);
##========================================================================
## I/O: Text
undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
$bool = $prf->saveTextFh($fh, %opts);
##========================================================================
## I/O: HTML
$bool = $prf->saveHtmlFile($filename_or_handle, %opts);
##========================================================================
## Compilation
$dprf = $dprf->populate();
$dprf = $dprf->compile($func,%opts);
$dprf = $dprf->uncompile();
$opname = $CLASS_OR_OBJECT->diffop($opNameOrAlias);
$opsub = $CLASS_OR_OBJECT->diffsub($opNameOrAlias);
$how = $CLASS_OR_OBJECT->diffpretrim($opNameOrAlias);
$key = $CLASS_OR_OBJECT->diffkbest($opNameOrAlias);
$diff = diffop_diff($ascore,$bscore);
$diff = diffop_sum($ascore,$bscore);
$diff = diffop_min($ascore,$bscore);
$diff = diffop_max($ascore,$bscore);
$diff = diffop_avg($ascore,$bscore);
$diff = diffop_havg($ascore,$bscore);
$diff = diffop_gavg($ascore,$bscore);
$diff = diffop_lavg($ascore,$bscore);
##========================================================================
## Trimming
\@keys = $dprf->which(%opts);
$dprf = $dprf->trim(%opts);
($pa,$pb) = $CLASS_OR_OBJECT->pretrim($pa,$pb,%opts);
##========================================================================
## Stringification
$dprf = $dprf->stringify( $obj);
##========================================================================
## Binary operations
$dprf = $dprf->_add($dprf2,%opts);
DESCRIPTION
DiaColloDB::Profile::Diff is a DiaColloDB::Profile subclass for representing low-level collocate frequency comparison data for a single date-slice, as arising from the comparison of two DiaColloDB::Profile objects.
Globals & Constants
- @ISA
-
DiaColloDB::Profile::Diff inherits from DiaColloDB::Profile.
- %DIFFOPS
-
Canonical diff-operation names keyed by alias.
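As a rough illustration of an alias-keyed lookup table, the following sketch inverts some of the alias lists documented for the diff option. The names %CANONICAL_ALIASES, %sketch_diffops and sketch_diffop() are hypothetical, and the real %DIFFOPS may be organized differently.

```perl
use strict;
use warnings;

# Hypothetical alias lists, copied from the documented 'diff' option values.
my %CANONICAL_ALIASES = (
  adiff => [qw(absolute-difference abs-difference abs-diff adiff adifference a-)],
  diff  => [qw(difference diff d minus -)],
  sum   => [qw(sum add plus +)],
  min   => [qw(minimum min <)],
  max   => [qw(maximum max >)],
  avg   => [qw(average avg mean)],
);

# Invert to (alias => canonical name), the shape described for %DIFFOPS.
my %sketch_diffops;
while (my ($name, $aliases) = each %CANONICAL_ALIASES) {
  $sketch_diffops{$_} = $name foreach @$aliases;
}

# Resolve an alias. Falling back to 'adiff' for unknown or missing aliases
# is an assumption based on the documented default of the 'diff' option.
sub sketch_diffop {
  my $alias = shift;
  return $sketch_diffops{ $alias // 'adiff' } // 'adiff';
}

print sketch_diffop('minus'), "\n";   # prints "diff"
print sketch_diffop('mean'),  "\n";   # prints "avg"
```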
Constructors etc.
- new
-
$prf = $CLASS_OR_OBJECT->new(%args);
$prf = $CLASS_OR_OBJECT->new($prf1,$prf2,%args)
%args, object structure:
##-- DiaColloDB::Profile::Diff
prf1  => $prf1,    ##-- 1st operand
prf2  => $prf2,    ##-- 2nd operand
diff  => $diff,    ##-- low-level score-diff binary operation (default='adiff')
##-- DiaColloDB::Profile keys
label => $label,   ##-- string label (used by Multi; undef for none(default))
#N    => $N,       ##-- OVERRIDE:unused: total marginal relation frequency
#f1   => $f1,      ##-- OVERRIDE:unused: total marginal frequency of target word(s)
#f2   => \%f2,     ##-- OVERRIDE:unused: total marginal frequency of collocates: ($i2=>$f2, ...)
#f12  => \%f12,    ##-- OVERRIDE:unused: collocation frequencies, %f12 = ($i2=>$f12, ...)
##
eps   => $eps,     ##-- smoothing constant (default=0: no smoothing)
score => $func,    ##-- selected scoring function ('f12', 'mi', or 'ld')
mi    => \%mi12,   ##-- DIFFERENCE: score: mutual information * logFreq a la Wortprofil; requires compile_mi()
ld    => \%ld12,   ##-- DIFFERENCE: score: log-dice a la Wortprofil; requires compile_ld()
fm    => \%fm12,   ##-- DIFFERENCE: score: frequency per million; requires compile_fm()
The diff option selects the function used to compute final scores from operand profiles. The default value is 'adiff'. Currently known values are:
adiff  # $score=$a-$b                  # aliases=qw(absolute-difference abs-difference abs-diff adiff adifference a-) ; select=kbesta
diff   # $score=$a-$b                  # aliases=qw(difference diff d minus -)
sum    # $score=$a+$b                  # aliases=qw(sum add plus +)
min    # $score=min($a,$b)             # aliases=qw(minimum min <)
max    # $score=max($a,$b)             # aliases=qw(maximum max >)
avg    # $score=avg($a,$b)             # aliases=qw(average avg mean)
havg   # $score~=harmonic_avg($a,$b)   # aliases=qw(harmonic-average harmonic-mean havg hmean ha h)
gavg   # $score~=geometric_avg($a,$b)  # aliases=qw(geometric-average geometric-mean gavg gmean ga g)
lavg   # $score~=log_avg($a,$b)        # aliases=qw(logarithmic-average logarithmic-mean log-average log-mean lavg lmean la l)
To avoid singularities resulting from sparse data, the havg and gavg operations actually compute the arithmetic average of the harmonic (resp. geometric) mean and the raw arithmetic mean; e.g.
score_havg($a,$b) =
( ($a<0 || $b<0 ? 0 : (2*$a*$b)/($a+$b))   ##-- harmonic mean
  + ($a+$b)/2                              ##-- arithmetic mean
)/2                                        ##-- average of harmonic- and arithmetic-means
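The smoothed harmonic average can be tried out with a small stand-alone sketch. The subs sketch_avg() and sketch_havg() are hypothetical re-implementations of the formula above, not the module's own diffop_avg() and diffop_havg(); the additional guard against a zero denominator is an assumption of this sketch.

```perl
use strict;
use warnings;

# Plain arithmetic mean of two scores.
sub sketch_avg {
  my ($x, $y) = @_;
  return ($x + $y) / 2;
}

# Average of the (clamped) harmonic mean and the arithmetic mean.
sub sketch_havg {
  my ($x, $y) = @_;
  # Harmonic mean is taken as 0 for negative operands, as in the formula;
  # the check for $x+$y==0 is an added assumption to avoid division by zero.
  my $hmean = ($x < 0 || $y < 0 || $x + $y == 0) ? 0 : (2 * $x * $y) / ($x + $y);
  return ($hmean + sketch_avg($x, $y)) / 2;
}

printf "avg(0,8)=%g avg(4,4)=%g\n",   sketch_avg(0, 8),  sketch_avg(4, 4);    # 4 and 4
printf "havg(0,8)=%g havg(4,4)=%g\n", sketch_havg(0, 8), sketch_havg(4, 4);   # 2 and 4
```

Under these assumptions the non-uniform pair (0,8) scores lower than the uniform pair (4,4), although both have the same arithmetic mean.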
The default diff operation is adiff, which selects those items with the greatest absolute differences among the (pre-trimmed) k-best items in its operand profiles. The sum and avg operations return equivalent rankings, but may assign undesirably high score values for non-uniform operand values (e.g. avg(0,8)=avg(4,4)=4, but only the latter configuration indicates similar collocation behavior in the operand profiles). The havg, gavg, and lavg operations attempt to address this shortcoming by penalizing non-uniform score-pairs, and tend to return similar rankings in the range [$a:$b].
- clone
-
$dprf2 = $dprf->clone();
$dprf2 = $dprf->clone($keep_compiled);
clones %$dprf; if $keep_compiled is true, compiled data is cloned too.
Basic Access
- operands
-
($prf1,$prf2) = $dprf->operands();
get operand profiles.
- empty
-
$bool = $dprf->empty();
returns true iff both operands are empty
I/O: JSON
- loadJsonData
-
$obj = $CLASS_OR_OBJECT->loadJsonData( $data,%opts);
guts for loadJsonString(), loadJsonFile()
I/O: Text
See also DiaColloDB::Persistent.
- saveTextHeader
-
undef = $CLASS_OR_OBJECT->saveTextHeader($fh, hlabel=>$hlabel, titles=>\@titles);
print column title header for text output.
- saveTextFh
-
$bool = $prf->saveTextFh($fh, %opts);
save flat TAB-separated text, format:
Na Nb F1a F1b F2a F2b F12a F12b SCOREa SCOREb SCOREdiff LABEL ITEM2...
%opts:
label  => $label,   ##-- override $prf->{label} (used by Profile::Multi), no tab-separators required
format => $fmt,     ##-- printf score formatting (default="%.4f")
header => $bool,    ##-- include header-row? (default=1)
hlabel => $hlabel,  ##-- prefix header item-cells with $hlabel (used by Profile::MultiDiff)
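The row layout can be illustrated with a stand-alone sketch that formats a single TAB-separated row in the documented column order. The sub sketch_text_row(), its argument names, and the sample values are all hypothetical; the real method reads these values from the operand profiles.

```perl
use strict;
use warnings;

# Format one row: raw frequencies, printf-formatted scores, label, item cells.
sub sketch_text_row {
  my (%o) = @_;
  my $fmt = $o{format} // '%.4f';   # documented default score format
  return join("\t",
    @o{qw(Na Nb F1a F1b F2a F2b F12a F12b)},
    (map { sprintf($fmt, $_) } @o{qw(SCOREa SCOREb SCOREdiff)}),
    ($o{label} // ''),
    @{ $o{item2} },
  ) . "\n";
}

# Made-up sample values, for illustration only.
my $row = sketch_text_row(
  Na => 1000, Nb => 2000, F1a => 10, F1b => 20, F2a => 5, F2b => 8,
  F12a => 3, F12b => 4, SCOREa => 1.5, SCOREb => 0.25, SCOREdiff => 1.25,
  label => '1990', item2 => ['Haus', 'NN'],
);
print $row;
```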
I/O: HTML
- saveHtmlFile
-
$bool = $prf->saveHtmlFile($filename_or_handle, %opts);
Save flat HTML table data with rows of the form
SCOREa SCOREb DIFF PREFIX? ITEM2...
If the verbose option is specified and true, the saved table has the form
Na Nb F1a F1b F2a F2b F12a F12b SCOREa SCOREb DIFF PREFIX? ITEM2...
Options %opts:
table   => $bool,    ##-- include <table>..</table> ? (default=1)
body    => $bool,    ##-- include <html><body>..</body></html> ? (default=1)
header  => $bool,    ##-- include header-row? (default=1)
verbose => $bool,    ##-- include verbose output? (default=0)
hlabel  => $hlabel,  ##-- prefix header item-cells with $hlabel (used by Profile::Multi), no '<th>..</th>' required
label   => $label,   ##-- prefix item-cells with $label (used by Profile::Multi), no '<td>..</td>' required
format  => $fmt,     ##-- printf score formatting (default="%.4f")
Compilation
- populate
-
$dprf = $dprf->populate();
$dprf = $dprf->populate($prf1,$prf2);
populates diff-profile by subtracting $prf2 scores from $prf1.
- compile
-
$dprf = $dprf->compile($func,%opts);
compile for score-function $func, one of qw(f fm mi ld); default='f'.
- uncompile
-
$dprf = $dprf->uncompile();
un-compiles all scores for $dprf
- diffop
-
$opname = $dprf->diffop();
$opname = $CLASS_OR_OBJECT->diffop($opNameOrAlias);
Returns canonical diff operation-name for $opNameOrAlias.
- diffsub
-
\&FUNC = $dprf->diffsub();
\&FUNC = $CLASS_OR_OBJECT->diffsub($opNameOrAlias);
Returns low-level binary diff operation for diff-operation $opNameOrAlias (default=$dprf->{diff}).
- diffpretrim
-
$how = $dprf->diffpretrim()
$how = $CLASS_OR_OBJECT->diffpretrim($opNameOrAlias)
Returns whether and how a diff operation $opNameOrAlias should pre-trim operand profiles. Returned value is one of:
'restrict'  # intersect defined collocates (min,avg,havg,gavg)
'kbest'     # union of k-best collocates (diff,adiff,max)
0           # don't pre-trim at all (everything else)
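The difference between the 'restrict' and 'kbest' strategies can be sketched on plain score hashes. The subs sketch_kbest_keys() and sketch_pretrim_keys() and the sample data are hypothetical; the sketch only computes which collocate keys would be retained in both operands, and is not the module's pretrim() implementation.

```perl
use strict;
use warnings;

# Keys of the $k highest-scoring items in a score hash.
sub sketch_kbest_keys {
  my ($scores, $k) = @_;
  my @sorted = sort { $scores->{$b} <=> $scores->{$a} || $a cmp $b } keys %$scores;
  return @sorted > $k ? @sorted[0 .. $k - 1] : @sorted;
}

# Sorted list of keys to retain in both operands for strategy $how.
sub sketch_pretrim_keys {
  my ($how, $sa, $sb, $k) = @_;
  my %keep;
  if ($how eq 'restrict') {
    # intersection: collocates defined in both operands
    $keep{$_} = 1 foreach grep { exists $sb->{$_} } keys %$sa;
  }
  elsif ($how eq 'kbest') {
    # union of the k-best collocates of each operand
    $keep{$_} = 1 foreach sketch_kbest_keys($sa, $k), sketch_kbest_keys($sb, $k);
  }
  else {
    # no pre-trimming: keep everything from either operand
    $keep{$_} = 1 foreach keys(%$sa), keys(%$sb);
  }
  my @keys = sort keys %keep;
  return @keys;
}

my %score_a = (x => 5, y => 3, z => 1);
my %score_b = (y => 4, z => 2, w => 9);
print join(' ', sketch_pretrim_keys('restrict', \%score_a, \%score_b, 1)), "\n";  # y z
print join(' ', sketch_pretrim_keys('kbest',    \%score_a, \%score_b, 1)), "\n";  # w x
```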
- diffkbest
-
$selector = $dprf->diffkbest();
$selector = $CLASS_OR_OBJECT->diffkbest($opNameOrAlias);
Returns 'kbest' selector appropriate for which() or trim() methods.
- diffop_diff
- diffop_sum
- diffop_min
- diffop_max
- diffop_avg
- diffop_havg
- diffop_gavg
- diffop_lavg
-
$diff = diffop_diff($ascore,$bscore)
Low-level diff-operation subs.
Trimming
- trim
-
$dprf = $dprf->trim(%opts);
trims profile and operands; %opts:
kbest  => $kbest,   ##-- retain only $kbest items (by score value)
kbesta => $kbesta,  ##-- retain only $kbesta items (by score absolute value)
cutoff => $cutoff,  ##-- retain only items with $prf->{$prf->{score}}{$item} >= $cutoff
keep   => $keep,    ##-- retain keys @$keep (ARRAY) or keys(%$keep) (HASH)
drop   => $drop,    ##-- drop keys @$drop (ARRAY) or keys(%$drop) (HASH)
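How kbest and kbesta differ can be seen on a small hash of difference scores. The sub sketch_which() and the sample data are hypothetical; the order in which the options are applied is an assumption of this sketch, and the keep and drop options are not modeled.

```perl
use strict;
use warnings;

# Select keys from a score hash by cutoff, then by kbesta or kbest.
sub sketch_which {
  my ($scores, %opts) = @_;
  my @keys = sort keys %$scores;
  @keys = grep { $scores->{$_} >= $opts{cutoff} } @keys if defined $opts{cutoff};
  if (defined $opts{kbesta}) {
    # rank by absolute score value, highest first
    @keys = sort { abs($scores->{$b}) <=> abs($scores->{$a}) } @keys;
    splice(@keys, $opts{kbesta}) if @keys > $opts{kbesta};
  }
  elsif (defined $opts{kbest}) {
    # rank by signed score value, highest first
    @keys = sort { $scores->{$b} <=> $scores->{$a} } @keys;
    splice(@keys, $opts{kbest}) if @keys > $opts{kbest};
  }
  return \@keys;
}

my %diff_scores = (Haus => 3, Baum => -5, Auto => 1, Hund => -0.5);
print "kbest=2:  ", join(' ', @{ sketch_which(\%diff_scores, kbest  => 2) }), "\n";  # Haus Auto
print "kbesta=2: ", join(' ', @{ sketch_which(\%diff_scores, kbesta => 2) }), "\n";  # Baum Haus
```

With signed ranking the strongly negative difference for Baum is dropped, whereas ranking by absolute value keeps it.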
- pretrim
-
($pa,$pb) = $CLASS_OR_OBJECT->pretrim($pa,$pb,%opts);
Perform pre-trimming on aligned profile pair ($pa,$pb) in the manner indicated by $CLASS_OR_OBJECT->diffpretrim($opts{diff}).
Stringification
- stringify
-
$dprf = $dprf->stringify( $obj);
$dprf = $dprf->stringify(\@key2str)
$dprf = $dprf->stringify(\&key2str)
$dprf = $dprf->stringify(\%key2str)
stringifies profile and operands (destructive) via $obj->i2s($key2), $key2str->($i2) or $key2str->{$i2}.
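The four accepted mapping types can be handled uniformly by dispatching on ref(). The following sketch uses hypothetical helpers sketch_key2str() and sketch_stringify() operating on a plain score hash; unlike the real method it returns a new hash instead of modifying a profile in place.

```perl
use strict;
use warnings;

# Turn any supported mapping into a code reference (key => string).
sub sketch_key2str {
  my $map = shift;
  return $map                       if ref($map) eq 'CODE';
  return sub { $map->[ $_[0] ] }    if ref($map) eq 'ARRAY';
  return sub { $map->{ $_[0] } }    if ref($map) eq 'HASH';
  return sub { $map->i2s($_[0]) };  # assumed: object providing an i2s() method
}

# Re-key a score hash by string labels.
sub sketch_stringify {
  my ($scores, $map) = @_;
  my $k2s = sketch_key2str($map);
  my %out;
  $out{ $k2s->($_) } = $scores->{$_} foreach keys %$scores;
  return \%out;
}

my %scores_by_id = (0 => 1.5, 2 => -0.5);
my $by_array = sketch_stringify(\%scores_by_id, [qw(Haus Baum Auto)]);
my $by_code  = sketch_stringify(\%scores_by_id, sub { "w$_[0]" });
print join(' ', map { "$_=$by_array->{$_}" } sort keys %$by_array), "\n";  # Auto=-0.5 Haus=1.5
```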
Binary operations
- _add
-
$dprf = $dprf->_add($dprf2,%opts);
adds $dprf2 operand frequency data to $dprf operands (destructive); implicitly un-compiles $dprf. %opts:
N  => $bool,  ##-- whether to add N values (default:true)
f1 => $bool,  ##-- whether to add f1 values (default:true)
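The effect of the N and f1 flags can be sketched on plain hashes that use the frequency keys listed for DiaColloDB::Profile (N, f1, f2, f12). The sub sketch_add() is hypothetical and models a single operand only; it is not the module's _add() and does not model un-compilation.

```perl
use strict;
use warnings;

# Add frequency data of $pb into $pa (destructive), honoring the N and f1 flags.
sub sketch_add {
  my ($pa, $pb, %opts) = @_;
  $pa->{N}  += $pb->{N}  if $opts{N}  // 1;   # flag defaults to true
  $pa->{f1} += $pb->{f1} if $opts{f1} // 1;   # flag defaults to true
  foreach my $field (qw(f2 f12)) {
    $pa->{$field}{$_} += $pb->{$field}{$_} foreach keys %{ $pb->{$field} };
  }
  return $pa;
}

# Made-up sample frequencies, for illustration only.
my $pa = { N => 100, f1 => 10, f2 => { x => 4 },         f12 => { x => 2 } };
my $pb = { N => 50,  f1 => 5,  f2 => { x => 1, y => 3 }, f12 => { y => 1 } };
sketch_add($pa, $pb, f1 => 0);
print "N=$pa->{N} f1=$pa->{f1} f2{x}=$pa->{f2}{x} f12{y}=$pa->{f12}{y}\n";  # N=150 f1=10 f2{x}=5 f12{y}=1
```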
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
DiaColloDB::Profile::MultiDiff(3pm), DiaColloDB::Profile(3pm), DiaColloDB(3pm), perl(1), ...