NAME

DiaColloDB::EnumFile - diachronic collocation db, symbol<->integer enum

SYNOPSIS

##========================================================================
## PRELIMINARIES

use DiaColloDB::EnumFile;

##========================================================================
## Constructors etc.

$enum = CLASS_OR_OBJECT->new(%args);

##========================================================================
## I/O: open/close (file)

$enum_or_undef = $enum->open($base,$flags);
$enum_or_undef = $enum->close();
$bool = $enum->opened();
$bool = $enum->dirty();
$bool = $enum->loaded();
$bool = $enum->rollback();
$bool = $enum->flush();
\@i2s = $enum->toArray();
$enum = $enum->fromArray(\@i2s);
$enum = $enum->fromHash(\%s2i);
$enum = $enum->fromEnum($enum2);
$bool = $enum->load();
$enum = $enum->save();

##========================================================================
## I/O: header

@keys = $enum->headerKeys();
$bool = $enum->loadHeaderData($hdr);

##========================================================================
## I/O: text

$enum = $CLASS_OR_OBJECT->loadTextFh($fh,%opts);
$bool = $enum->saveTextFh($fh,%opts);

##========================================================================
## Methods: population (in-memory only)

$size = $enum->size();
$newsize = $enum->setsize($newsize);
$newsize = $enum->addSymbols(@symbols);
$newsize = $enum->appendSymbols(@symbols);
$newsize = $enum->addEnum($enum2_or_undef);

##========================================================================
## Methods: lookup

$s_or_undef = $enum->i2s($i);
$i_or_undef = $enum->s2i($s);
\@is = $enum->re2i($regex);

DESCRIPTION

DiaColloDB::EnumFile provides an object-oriented interface to static symbol<->integer mappings using direct file I/O for lookup. See DiaColloDB::EnumFile::MMap for a fast implementation using mmap().

Globals & Constants

Variable: @ISA

DiaColloDB::EnumFile inherits from DiaColloDB::Persistent.

Constructors etc.

new
$enum = CLASS_OR_OBJECT->new(%args);

%args, object structure:

base => $base,       ##-- database basename; use files "${base}.es", "${base}.esx", "${base}.eix", "${base}.hdr"
perms => $perms,     ##-- default: 0666 & ~umask
flags => $flags,     ##-- default: 'r'
pack_i => $pack_i,   ##-- integer pack template (default='N')
pack_o => $pack_o,   ##-- file offset pack template (default='N')
pack_l => $pack_l,   ##-- string-length pack template (default='n')
pack_s => $pack_s,   ##-- string pack template (default=undef) for text i/o
size => $size,       ##-- number of mapped symbols, like scalar(@i2s)
utf8 => $bool,       ##-- true iff strings are stored as utf8 (default, used by re2i())
##
##-- in-memory construction and caching
s2i => \%s2i,        ##-- maps symbols to integers
i2s => \@i2s,        ##-- maps integers to symbols
dirty => $bool,      ##-- true if in-memory structures are not in-sync with file data
loaded => $bool,     ##-- true if file data has been loaded to memory
shared => $bool,     ##-- true to avoid closing filehandles on close() or DESTROY() (default=false)
##
##-- pack lengths (after open())
len_i => $len_i,     ##-- packsize($pack_i)
len_o => $len_o,     ##-- packsize($pack_o)
len_l => $len_l,     ##-- packsize($pack_l)
len_sx => $len_sx,   ##-- $len_o + $len_i
##
##-- filehandles (after open())
sfh  => $sfh,        ##-- $base.es  : pack("(${pack_l}/A)*", @$i2s)
ixfh => $ixfh,       ##-- $base.eix : [$i] => pack("${pack_o}",          $offset_in_sfh_of_string_with_id_i)
sxfh => $sxfh,       ##-- $base.esx : [$j] => pack("${pack_o}${pack_i}", $offset_in_sfh_of_string_with_sortindex_j_and_id_i, $i)
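
The three-file layout described above can be sketched with in-memory buffers instead of real files. This is only an illustration under the documented default pack templates ('N', 'N', 'n'); the symbols and the variable names $es, $eix and $esx are hypothetical, and the module's actual I/O code may differ.

```perl
use strict;
use warnings;

# Assumed defaults from the table above: pack_i='N', pack_o='N', pack_l='n'.
my ($pack_i, $pack_o, $pack_l) = ('N', 'N', 'n');
my @i2s = ('pear', 'apple', 'fig');   # hypothetical symbols; ID = array index

# $base.es stand-in: length-prefixed strings, in ID order
my $es = pack("(${pack_l}/A)*", @i2s);

# $base.eix stand-in: for each ID, the offset of its string in $es
my $len_l = length(pack($pack_l, 0));
my ($off, @offsets) = (0);
for my $s (@i2s) {
  push @offsets, $off;
  $off += $len_l + length($s);
}
my $eix = pack("${pack_o}*", @offsets);

# $base.esx stand-in: (offset, ID) pairs ordered by symbol
my @sorted = sort { $i2s[$a] cmp $i2s[$b] } 0 .. $#i2s;
my $esx = pack("(${pack_o}${pack_i})*", map { ($offsets[$_], $_) } @sorted);

# Look up ID 1 through the offset index, roughly as a file-based i2s() might
my $len_o = length(pack($pack_o, 0));
my $o1 = unpack($pack_o, substr($eix, 1 * $len_o, $len_o));
my $s1 = unpack("${pack_l}/A", substr($es, $o1));
print "ID 1 => $s1 (offset $o1)\n";
```

Because every .eix record has the fixed width len_o, the record for ID $i sits at byte offset $i * len_o, which is what makes lookup by ID possible without loading the whole enum.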

DESTROY

destructor implicitly calls close().

promote
$enum = $enum->promote($class,$force)

Promotes $enum to class $class. If $force is false (default), promotion via ref($enum)."::MMap" will be disabled.

I/O: open/close (file)

See also DiaColloDB::Persistent.

open
$enum_or_undef = $enum->open($base,$flags);
$enum_or_undef = $enum->open($base);
$enum_or_undef = $enum->open();

opens file(s), clears {loaded} flag.

close
$enum_or_undef = $enum->close();

closes the enum, implicitly calling flush() if opened for writing.

opened
$bool = $enum->opened();

returns true iff enum is opened.

dirty
$bool = $enum->dirty();

returns true iff some in-memory structures haven't been flushed to disk.

loaded
$bool = $enum->loaded();

returns true iff in-memory structures have been populated from disk.

rollback
$bool = $enum->rollback();
  • drops in-memory structures

  • invalidates any old references to {s2i}, {i2s} (but doesn't empty them if you need to keep a reference).

  • clears {dirty} flag
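
One way to read "invalidates but doesn't empty" is that the caches are replaced with fresh containers rather than cleared in place. The sketch below shows that reading with a plain hash standing in for the enum object; it is an assumption about the behavior, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an enum object; only the cache keys matter here.
my $enum = { s2i => { a => 0, b => 1 }, i2s => ['a', 'b'], dirty => 1 };

my $old_i2s = $enum->{i2s};   # caller keeps a reference to the old cache

# Drop the caches by replacing them, not by emptying them in place
$enum->{s2i}   = {};
$enum->{i2s}   = [];
$enum->{dirty} = 0;

printf "old reference still holds %d symbols; enum cache holds %d\n",
  scalar(@$old_i2s), scalar(@{$enum->{i2s}});
```

Under this reading, a caller holding $old_i2s keeps its data, but that data is no longer connected to the enum.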

flush
$bool = $enum->flush();
$bool = $enum->flush($force);
  • flush in-memory structures to disk

  • no-op unless $force or $enum->dirty() is true

  • clobbers any old disk-file contents with in-memory maps

  • enum must be opened in write-mode

  • invalidates any old references to {s2i}, {i2s} (but doesn't empty them if you need to keep a reference)

  • clears {dirty} flag

toArray
\@i2s = $enum->toArray();

return an ARRAY-ref representing the mapping; array items are still byte-encoded.

fromArray
$enum = $enum->fromArray(\@i2s);

clobbers $enum contents, steals \@i2s

fromHash
$enum = $enum->fromHash(\%s2i);

clobbers $enum contents, steals \%s2i

fromEnum
$enum = $enum->fromEnum($enum2);

clobbers $enum contents, does NOT steal $enum2->{i2s}

load
$bool = $enum->load();

loads files to memory; enum must be opened

save
$enum = $enum->save();
$enum = $enum->save($base);

saves enum to $base; really just a wrapper for open() and flush()

See also DiaColloDB::Persistent.

I/O: header

headerKeys
@keys = $enum->headerKeys();

keys to save as header

loadHeaderData
$bool = $enum->loadHeaderData($hdr);

instantiates header data from $hdr; overrides DiaColloDB::Persistent implementation.

I/O: text

loadTextFh
$enum = $CLASS_OR_OBJECT->loadTextFh($filename_or_fh);
$enum = $CLASS_OR_OBJECT->loadTextFh($filename_or_fh, %opts);

Loads from a text file with lines of the form "ID SYMBOL...", clobbering enum contents. %opts locally clobber %$enum, especially:

pack_s => $pack_s

saveTextFh
$bool = $enum->saveTextFh($fh,%opts);
  • saves to a text file with lines of the form "ID SYMBOL..."

  • %opts locally clobber %$enum, especially:

    pack_s => $pack_s
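
The "ID SYMBOL..." text format can be illustrated with a small round trip over in-memory filehandles. The sketch assumes a TAB between ID and symbol, which the text above does not state, and it does not model the pack_s option.

```perl
use strict;
use warnings;

my @i2s = ('the', 'cat', 'sat');   # hypothetical symbols; ID = array index

# Write "ID<TAB>SYMBOL" lines to an in-memory filehandle (TAB is an assumption)
my $text = '';
open(my $out, '>', \$text) or die "open failed: $!";
print {$out} "$_\t$i2s[$_]\n" for 0 .. $#i2s;
close($out);

# Read the lines back, rebuilding both directions of the mapping
my (@i2s_in, %s2i_in);
open(my $in, '<', \$text) or die "open failed: $!";
while (my $line = <$in>) {
  chomp $line;
  my ($id, $sym) = split /\t/, $line, 2;   # limit 2 keeps any later tabs in the symbol
  $i2s_in[$id]  = $sym;
  $s2i_in{$sym} = $id;
}
close($in);

print "loaded ", scalar(@i2s_in), " symbols; 'cat' => $s2i_in{cat}\n";
```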

Methods: population (in-memory only)

size
$size = $enum->size();

wraps {size} key

setsize
$newsize = $enum->setsize($newsize);

really just wraps {size} key

addSymbols
$newsize = $enum->addSymbols(@symbols);
$newsize = $enum->addSymbols(\@symbols);
  • adds all symbols in @symbols which don't already have an ID

  • enum must be loaded to memory

appendSymbols
$newsize = $enum->appendSymbols(@symbols);
$newsize = $enum->appendSymbols(\@symbols);

adds all symbols in @symbols in order, messily re-mapping them if they already have an ID.
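
The difference between addSymbols() and appendSymbols() can be sketched on a plain in-memory structure. The helpers add_symbols() and append_symbols() are hypothetical stand-ins, and leaving the old {i2s} slot untouched on re-mapping is an assumption about what "messily" means.

```perl
use strict;
use warnings;

# Hypothetical in-memory enum
my %enum = (s2i => { a => 0, b => 1 }, i2s => ['a', 'b']);

# addSymbols-like: only symbols without an ID get a new one
sub add_symbols {
  my ($e, @syms) = @_;
  for my $s (@syms) {
    next if exists $e->{s2i}{$s};
    push @{$e->{i2s}}, $s;
    $e->{s2i}{$s} = $#{$e->{i2s}};
  }
  return scalar @{$e->{i2s}};
}

# appendSymbols-like: every symbol gets the next free ID, in order
sub append_symbols {
  my ($e, @syms) = @_;
  for my $s (@syms) {
    push @{$e->{i2s}}, $s;
    $e->{s2i}{$s} = $#{$e->{i2s}};   # re-maps $s if it already had an ID
  }
  return scalar @{$e->{i2s}};
}

my $n1 = add_symbols(\%enum, qw(b c));      # 'b' is skipped, 'c' gets ID 2
my $n2 = append_symbols(\%enum, qw(a d));   # 'a' is re-mapped to ID 3, 'd' gets ID 4
print "sizes: $n1, $n2; a => $enum{s2i}{a}\n";
```

In this sketch the old slot 0 of {i2s} still holds 'a' after the append, so the two directions of the mapping are no longer exact inverses.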

addEnum
$newsize = $enum->addEnum($enum2_or_undef);

ensures all symbols from $enum2_or_undef are defined (undef:'')

Methods: lookup

i2s
$s_or_undef = $enum->i2s($i);

Returns symbol for ID $i, or undef if no such symbol exists. In-memory cache overrides file contents.

s2i
$i_or_undef = $enum->s2i($s);
$i_or_undef = $enum->s2i($s, $ilo,$ihi);

Returns ID for symbol $s. Binary search between sorted symbol positions $ilo and $ihi (default=full enum). In-memory cache overrides file contents.
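
The binary search over sorted symbol positions can be sketched in memory as follows. The array @sx stands in for the sorted-index records, s2i_sketch() is a hypothetical helper, and treating [$ilo, $ihi) as a half-open range is an assumption; the module's own bounds convention is not stated here.

```perl
use strict;
use warnings;

my @i2s = ('pear', 'apple', 'fig', 'kiwi');   # hypothetical symbols; ID = index

# Sorted index: IDs ordered by symbol, standing in for the .esx records
my @sx = sort { $i2s[$a] cmp $i2s[$b] } 0 .. $#i2s;

# Binary search over sort positions [$ilo, $ihi); returns the ID or undef
sub s2i_sketch {
  my ($s, $ilo, $ihi) = @_;
  $ilo //= 0;
  $ihi //= scalar @sx;
  while ($ilo < $ihi) {
    my $mid = int(($ilo + $ihi) / 2);
    my $cmp = $s cmp $i2s[$sx[$mid]];
    return $sx[$mid] if $cmp == 0;
    if ($cmp < 0) { $ihi = $mid } else { $ilo = $mid + 1 }
  }
  return;   # no such symbol
}

printf "%s => %s\n", $_, s2i_sketch($_) // 'undef' for qw(kiwi pear plum);
```

Each probe compares against one symbol only, so a file-based version needs on the order of log2(size) reads per lookup rather than a full scan.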

re2i
\@is = $enum->re2i($regex);

Gets indices for all strings matching $regex.
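
A regex lookup of this kind amounts to scanning the symbols and collecting the matching IDs. The sketch below uses a hypothetical re2i_sketch() helper and also shows why the {utf8} flag matters: byte-encoded strings need decoding before a character-level pattern behaves as expected. How the module itself handles decoding is not shown in this document.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical byte-encoded symbols, like the items toArray() is said to return
my @i2s = map { encode('UTF-8', $_) } ("B\x{e4}r", "Bar", "Bier");

# Collect the IDs of all symbols matching $re; decode first if $utf8 is true
sub re2i_sketch {
  my ($re, $utf8) = @_;
  my @is = grep {
    my $bytes = $i2s[$_];                       # copy, so decoding cannot touch @i2s
    my $s = $utf8 ? decode('UTF-8', $bytes) : $bytes;
    $s =~ $re
  } 0 .. $#i2s;
  return \@is;
}

my $decoded = re2i_sketch(qr/^B.r$/, 1);
my $raw     = re2i_sketch(qr/^B.r$/, 0);
print "decoded: @$decoded; raw bytes: @$raw\n";
```

Without decoding, the two-byte UTF-8 sequence for the umlaut counts as two characters, so the pattern misses symbol 0.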

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2015-2020 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.

SEE ALSO

DiaColloDB::EnumFile::MMap(3pm), DiaColloDB::EnumFile::FixedLen(3pm), DiaColloDB::EnumFile::FixedMap(3pm), DiaColloDB::EnumFile::Tied(3pm), DiaColloDB::Persistent(3pm), DiaColloDB(3pm), perl(1), ...