NAME
DiaColloDB::EnumFile - diachronic collocation db, symbol<->integer enum
SYNOPSIS
##========================================================================
## PRELIMINARIES
use DiaColloDB::EnumFile;
##========================================================================
## Constructors etc.
$enum = CLASS_OR_OBJECT->new(%args);
##========================================================================
## I/O: open/close (file)
$enum_or_undef = $enum->open($base,$flags);
$enum_or_undef = $enum->close();
$bool = $enum->opened();
$bool = $enum->dirty();
$bool = $enum->loaded();
$bool = $enum->rollback();
$bool = $enum->flush();
\@i2s = $enum->toArray();
$enum = $enum->fromArray(\@i2s);
$enum = $enum->fromHash(\%s2i);
$enum = $enum->fromEnum($enum2);
$bool = $enum->load();
$enum = $enum->save();
##========================================================================
## I/O: header
@keys = $enum->headerKeys();
$bool = $enum->loadHeaderData($hdr);
##========================================================================
## I/O: text
$enum = $CLASS_OR_OBJECT->loadTextFh($fh,%opts);
$bool = $enum->saveTextFh($fh,%opts);
##========================================================================
## Methods: population (in-memory only)
$size = $enum->size();
$newsize = $enum->setsize($newsize);
$newsize = $enum->addSymbols(@symbols);
$newsize = $enum->appendSymbols(@symbols);
$newsize = $enum->addEnum($enum2_or_undef);
##========================================================================
## Methods: lookup
$s_or_undef = $enum->i2s($i);
$i_or_undef = $enum->s2i($s);
\@is = $enum->re2i($regex);
DESCRIPTION
DiaColloDB::EnumFile provides an object-oriented interface to static symbol<->integer mappings using direct file I/O for lookup. See DiaColloDB::EnumFile::MMap for a fast implementation using mmap().
Globals & Constants
- Variable: @ISA
-
DiaColloDB::EnumFile inherits from DiaColloDB::Persistent.
Constructors etc.
- new
-
$enum = CLASS_OR_OBJECT->new(%args);
%args, object structure:
base => $base, ##-- database basename; use files "${base}.es", "${base}.esx", "${base}.eix", "${base}.hdr"
perms => $perms, ##-- default: 0666 & ~umask
flags => $flags, ##-- default: 'r'
pack_i => $pack_i, ##-- integer pack template (default='N')
pack_o => $pack_o, ##-- file offset pack template (default='N')
pack_l => $pack_l, ##-- string-length pack template (default='n')
pack_s => $pack_s, ##-- string pack template (default=undef) for text i/o
size => $size, ##-- number of mapped symbols, like scalar(@i2s)
utf8 => $bool, ##-- true iff strings are stored as utf8 (default, used by re2i())
##
##-- in-memory construction and caching
s2i => \%s2i, ##-- maps symbols to integers
i2s => \@i2s, ##-- maps integers to symbols
dirty => $bool, ##-- true if in-memory structures are not in-sync with file data
loaded => $bool, ##-- true if file data has been loaded to memory
shared => $bool, ##-- true to avoid closing filehandles on close() or DESTROY() (default=false)
##
##-- pack lengths (after open())
len_i => $len_i, ##-- packsize($pack_i)
len_o => $len_o, ##-- packsize($pack_o)
len_l => $len_l, ##-- packsize($pack_l)
len_sx => $len_sx, ##-- $len_o + $len_i
##
##-- filehandles (after open())
sfh => $sfh, ##-- $base.es : pack("(${pack_l}/A)*", @$i2s)
ixfh => $ixfh, ##-- $base.eix : [$i] => pack("${pack_o}", $offset_in_sfh_of_string_with_id_i)
sxfh => $sxfh, ##-- $base.esx : [$j] => pack("${pack_o}${pack_i}", $offset_in_sfh_of_string_with_sortindex_j_and_id_i, $i)
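The three-file layout described in the comments above can be illustrated with core pack() and unpack() alone. The following is a minimal sketch, not the module's own code: it builds the ".es", ".eix" and ".esx" contents as in-memory strings using the documented default templates, and the symbols and the helper name sketch_i2s() are hypothetical. The byte-wise sort order used for the ".esx" records is an assumption.

```perl
use strict;
use warnings;

## documented defaults: pack_i='N', pack_o='N', pack_l='n'
my ($pack_i, $pack_o, $pack_l) = ('N', 'N', 'n');
my @i2s = qw(pear apple fig);   ##-- hypothetical symbols; ID = array index

## "$base.es": length-prefixed strings in ID order
my $es = pack("(${pack_l}/A)*", @i2s);

## "$base.eix": one offset record per ID
my ($eix, $off, @offsets) = ('', 0);
foreach my $s (@i2s) {
  push(@offsets, $off);
  $eix .= pack($pack_o, $off);
  $off += length(pack("${pack_l}/A", $s));
}

## "$base.esx": (offset,id) records ordered by symbol (byte-wise order assumed)
my $esx = join('',
               map  { pack("${pack_o}${pack_i}", $offsets[$_], $_) }
               sort { $i2s[$a] cmp $i2s[$b] } (0..$#i2s));

## hypothetical helper: ID -> symbol by way of the offset index
sub sketch_i2s {
  my $i     = shift;
  my $len_o = length(pack($pack_o, 0));
  return undef if ($i < 0 || ($i+1)*$len_o > length($eix));
  my ($o) = unpack($pack_o, substr($eix, $i*$len_o, $len_o));
  my ($s) = unpack("${pack_l}/A", substr($es, $o));
  return $s;
}
```

With fixed-width records in the two index files, any ID or sort position can be reached with a single seek, which is presumably why the variable-length strings are kept in a separate file.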
- DESTROY
-
destructor implicitly calls close().
- promote
-
$enum = $enum->promote($class,$force)
Promotes $enum to class $class. If $force is false (default), promotion via ref($enum)."::MMap" will be disabled.
I/O: open/close (file)
See also DiaColloDB::Persistent.
- open
-
$enum_or_undef = $enum->open($base,$flags);
$enum_or_undef = $enum->open($base);
$enum_or_undef = $enum->open();
opens file(s), clears {loaded} flag.
- close
-
$enum_or_undef = $enum->close();
closes the enum, implicitly calling flush() if opened for writing.
- opened
-
$bool = $enum->opened();
returns true iff enum is opened.
- dirty
-
$bool = $enum->dirty();
returns true iff some in-memory structures haven't been flushed to disk.
- loaded
-
$bool = $enum->loaded();
returns true iff in-memory structures have been populated from disk.
- rollback
-
$bool = $enum->rollback();
drops in-memory structures
invalidates any old references to {s2i}, {i2s} (but doesn't empty them if you need to keep a reference).
clears {dirty} flag
- flush
-
$bool = $enum->flush();
$bool = $enum->flush($force);
flush in-memory structures to disk
no-op unless $force or $enum->dirty() is true
clobbers any old disk-file contents with in-memory maps
enum must be opened in write-mode
invalidates any old references to {s2i}, {i2s} (but doesn't empty them if you need to keep a reference)
clears {dirty} flag
- toArray
-
\@i2s = $enum->toArray();
return an ARRAY-ref representing the mapping; array items are still byte-encoded.
- fromArray
-
$enum = $enum->fromArray(\@i2s);
clobbers $enum contents, steals \@i2s
- fromHash
-
$enum = $enum->fromHash(\%s2i);
clobbers $enum contents, steals \%s2i
- fromEnum
-
$enum = $enum->fromEnum($enum2);
clobbers $enum contents, does NOT steal $enum2->{i2s}
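The relationship between the two in-memory maps, and the difference between stealing a reference and copying it, can be sketched in plain Perl. This is an illustration only: reading "steals" as "keeps the caller's reference instead of a copy" is an assumption, and all variable names are hypothetical.

```perl
use strict;
use warnings;

my @i2s = qw(a b c);                              ##-- ID => symbol
my %s2i = map { ($i2s[$_] => $_) } (0..$#i2s);    ##-- symbol => ID

## invert the hash back into an array
my @back;
$back[$s2i{$_}] = $_ foreach (keys %s2i);

## "stealing" (assumed meaning): both names refer to the same array
my $stolen = \@i2s;
## copying: later changes to @i2s are not visible through the copy
my $copied = [@i2s];

push(@i2s, 'd');
printf("stolen=%d copied=%d\n", scalar(@$stolen), scalar(@$copied));
```

Under that reading, a caller who passes \@i2s to fromArray() and keeps modifying the array would also be modifying the enum's contents.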
- load
-
$bool = $enum->load();
loads files to memory; enum must be opened
- save
-
$enum = $enum->save();
$enum = $enum->save($base);
saves enum to $base; really just a wrapper for open() and flush()
See also DiaColloDB::Persistent.
I/O: header
- headerKeys
-
@keys = $enum->headerKeys();
keys to save as header
- loadHeaderData
-
$bool = $enum->loadHeaderData($hdr);
instantiates header data from $hdr; overrides DiaColloDB::Persistent implementation.
I/O: text
- loadTextFh
-
$enum = $CLASS_OR_OBJECT->loadTextFh($filename_or_fh);
$enum = $CLASS_OR_OBJECT->loadTextFh($filename_or_fh, %opts);
Loads from text file with lines of the form "ID SYMBOL...", clobbering enum contents. %opts locally clobber %$enum, especially:
pack_s => $pack_s
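A reader for the "ID SYMBOL..." line format might look like the sketch below. It is not the module's parser: it assumes that ID and SYMBOL are separated by a single TAB and that the symbol is the rest of the line, it ignores pack_s entirely, and the sample data is hypothetical.

```perl
use strict;
use warnings;

## hypothetical input; a real caller would pass a filehandle on a text file
my $text = "0\tthe\n1\tNew York\n2\tfig\n";
open(my $fh, '<', \$text) or die("open failed: $!");

my (@i2s, %s2i);
while (defined(my $line = <$fh>)) {
  chomp($line);
  next if ($line eq '');
  ## assumption: TAB separator; split limit 2 keeps the symbol intact
  my ($i, $s) = split(/\t/, $line, 2);
  $i2s[$i] = $s;
  $s2i{$s} = $i;
}
close($fh);
```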
- saveTextFh
-
$bool = $enum->saveTextFh($fh,%opts);
saves to text file with lines of the form "ID SYMBOL..."
%opts locally clobber %$enum, especially:
pack_s => $pack_s
Methods: population (in-memory only)
- size
-
$size = $enum->size();
wraps {size} key
- setsize
-
$newsize = $enum->setsize($newsize);
really just wraps {size} key
- addSymbols
-
$newsize = $enum->addSymbols(@symbols);
$newsize = $enum->addSymbols(\@symbols);
adds all symbols in @symbols which don't already have an ID
enum must be loaded to memory
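The add-if-missing behaviour can be sketched over a plain hash and array. This is a stand-in, not the module's implementation: sketch_add_symbols() is a hypothetical helper, and assigning each new symbol the next free ID (the current size) is an assumption.

```perl
use strict;
use warnings;

my @i2s = qw(a b);
my %s2i = (a => 0, b => 1);

## hypothetical helper accepting either a list or an ARRAY-ref
sub sketch_add_symbols {
  my @symbols = (@_ == 1 && ref($_[0]) eq 'ARRAY') ? @{$_[0]} : @_;
  foreach my $s (@symbols) {
    next if (exists $s2i{$s});      ##-- already has an ID: leave it alone
    $s2i{$s} = scalar(@i2s);        ##-- assumed: next free ID
    push(@i2s, $s);
  }
  return scalar(@i2s);              ##-- new size
}

my $newsize = sketch_add_symbols(qw(b c a d));
```

Here 'b' and 'a' keep their existing IDs, while 'c' and 'd' are appended in argument order.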
- appendSymbols
-
$newsize = $enum->appendSymbols(@symbols);
$newsize = $enum->appendSymbols(\@symbols);
adds all symbols in @symbols in order, messily re-mapping them if they already have an ID.
- addEnum
-
$newsize = $enum->addEnum($enum2_or_undef);
ensures all symbols from $enum2_or_undef are defined (undef:'')
Methods: lookup
- i2s
-
$s_or_undef = $enum->i2s($i);
Returns symbol for ID $i, or undef if no such symbol exists. In-memory cache overrides file contents.
- s2i
-
$i_or_undef = $enum->s2i($s);
$i_or_undef = $enum->s2i($s, $ilo,$ihi);
Returns ID for symbol $s. Binary search between sorted symbol positions $ilo and $ihi (default=full enum). In-memory cache overrides file contents.
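The binary search over sorted symbol positions can be sketched against in-memory copies of the ".es" and ".esx" data. This is an illustration under stated assumptions rather than the module's code: sketch_s2i() is a hypothetical helper, $ihi is treated as exclusive, comparison is byte-wise with cmp, and the default templates 'N' and 'n' are hard-coded.

```perl
use strict;
use warnings;

my @i2s = qw(pear apple fig);            ##-- hypothetical symbols
my $es  = pack("(n/A)*", @i2s);          ##-- string data, ID order

## (offset,id) records ordered by symbol, as in "$base.esx"
my ($off, @offsets) = (0);
foreach my $s (@i2s) { push(@offsets, $off); $off += 2 + length($s); }
my $esx = join('',
               map  { pack("NN", $offsets[$_], $_) }
               sort { $i2s[$a] cmp $i2s[$b] } (0..$#i2s));
my $len_sx = 8;                          ##-- $len_o + $len_i for 'N','N'

## hypothetical helper: symbol -> ID, searching sort positions [$ilo,$ihi)
sub sketch_s2i {
  my ($s, $ilo, $ihi) = @_;
  $ilo = 0                         unless (defined $ilo);
  $ihi = length($esx) / $len_sx    unless (defined $ihi);
  while ($ilo < $ihi) {
    my $imid    = int(($ilo + $ihi) / 2);
    my ($o, $i) = unpack("NN", substr($esx, $imid*$len_sx, $len_sx));
    my ($smid)  = unpack("n/A", substr($es, $o));
    my $cmp     = ($s cmp $smid);
    return $i if ($cmp == 0);
    if ($cmp < 0) { $ihi = $imid; } else { $ilo = $imid + 1; }
  }
  return undef;                          ##-- no such symbol
}
```

Each probe reads one fixed-width index record and one string, so a lookup touches on the order of log2(size) records instead of scanning the whole string file.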
- re2i
-
\@is = $enum->re2i($regex);
Gets indices for all strings matching $regex.
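For an enum that is already in memory, the effect of a regex lookup can be approximated with grep over the ID-to-symbol array. The sketch below is only that approximation: sketch_re2i() is a hypothetical helper, the symbols are made up, and the utf8 decoding governed by the {utf8} flag is not modeled.

```perl
use strict;
use warnings;

my @i2s = qw(pear apple fig grape);   ##-- hypothetical symbols; ID = array index

## hypothetical helper: regex -> ARRAY-ref of matching IDs, in ID order
sub sketch_re2i {
  my $re = shift;
  $re = qr/$re/ unless (ref($re) eq 'Regexp');
  return [grep { defined($i2s[$_]) && $i2s[$_] =~ $re } (0..$#i2s)];
}

my $is = sketch_re2i(qr/ap/);
print(join(' ', @$is), "\n");
```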
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2015-2020 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
DiaColloDB::EnumFile::MMap(3pm), DiaColloDB::EnumFile::FixedLen(3pm), DiaColloDB::EnumFile::FixedMap(3pm), DiaColloDB::EnumFile::Tied(3pm), DiaColloDB::Persistent(3pm), DiaColloDB(3pm), perl(1), ...