NAME
Lucy::Index::DataWriter - Write data to an index.
SYNOPSIS
# Abstract base class.
DESCRIPTION
DataWriter is an abstract base class for writing index data, generally in segment-sized chunks. Each component of an index – e.g. stored fields, lexicon, postings, deletions – is represented by a DataWriter/DataReader pair.
Components may be specified per index by subclassing Architecture.
CONSTRUCTORS
new
    my $writer = MyDataWriter->new(
        snapshot   => $snapshot,      # required
        segment    => $segment,       # required
        polyreader => $polyreader,    # required
    );
Abstract constructor.
snapshot - The Snapshot that will be committed at the end of the indexing session.
segment - The Segment in progress.
polyreader - A PolyReader representing all existing data in the index. (If the index is brand new, the PolyReader will have no sub-readers).
ABSTRACT METHODS
add_segment
    $data_writer->add_segment(
        reader  => $reader,     # required
        doc_map => $doc_map,    # default: undef
    );
Add content from an existing segment into the one currently being written.
reader - The SegReader containing content to add.
doc_map - An array of integers mapping old document ids to new. Deleted documents are mapped to 0, indicating that they should be skipped.
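The doc_map convention can be modelled in plain Perl without loading Lucy. The sketch below is an illustration only: build_doc_map() is a hypothetical helper, the map is an ordinary array reference indexed by old document id, and it assumes document ids start at 1 and that new ids continue after an offset. It is not how Lucy itself constructs the map.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lucy): build a map from old document ids
# to new ones. Slot 0 is unused because ids are assumed to start at 1.
sub build_doc_map {
    my ( $doc_max, $deleted, $offset ) = @_;
    my %is_deleted = map { ( $_ => 1 ) } @$deleted;
    my @doc_map    = (0);
    my $next_id    = $offset;
    for my $old_id ( 1 .. $doc_max ) {
        # Deleted documents map to 0, meaning "skip this document".
        push @doc_map, $is_deleted{$old_id} ? 0 : ++$next_id;
    }
    return \@doc_map;
}

# Five documents, ids 2 and 4 deleted, and 10 documents already written.
my $doc_map = build_doc_map( 5, [ 2, 4 ], 10 );
print join( ', ', @$doc_map ), "\n";    # prints "0, 11, 0, 12, 0, 13"
```

A writer consuming such a map would copy document N only when the entry at index N is non-zero, and would write it under the id stored there.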
finish
$data_writer->finish();
Complete the segment: close all streams, store metadata, etc.
format
my $int = $data_writer->format();
Every writer must specify a file format revision number, which should increment each time the format changes. Responsibility for revision checking is left to the companion DataReader.
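As a rough illustration of the revision check that a companion reader might perform, the following plain-Perl sketch validates the "format" entry of a metadata hash. The names check_format() and $SUPPORTED_FORMAT are hypothetical, and the policy shown (reject anything newer than the supported revision) is an assumption rather than Lucy's documented behavior.

```perl
use strict;
use warnings;

# Hypothetical: the newest format revision this reader understands.
my $SUPPORTED_FORMAT = 2;

# Hypothetical check run against the metadata stored for a component.
sub check_format {
    my ($metadata) = @_;
    my $format = $metadata->{format};
    die "Missing 'format' in metadata\n" unless defined $format;
    die "Unsupported format revision: $format\n"
        if $format > $SUPPORTED_FORMAT;
    return $format;
}

print check_format( { format => 1 } ), "\n";    # prints "1"

# A revision newer than the reader supports is rejected.
my $ok = eval { check_format( { format => 3 } ); 1 };
print "rejected: $@" unless $ok;
```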
METHODS
delete_segment
$data_writer->delete_segment($reader);
Remove a segment’s data. The default implementation is a no-op, as all files within the segment directory will be automatically deleted. Subclasses which manage their own files outside of the segment system should override this method and use it as a trigger for cleaning up obsolete data.
reader - The SegReader for the segment whose data is to be removed, which must represent a segment that is part of the current snapshot.
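The kind of override described above might look like the following plain-Perl mock. It does not inherit from Lucy: ToyExternalWriter, side_file(), and the ".side" naming scheme are all hypothetical, and the mock takes a segment name string where a real subclass would receive a SegReader.

```perl
use strict;
use warnings;
use File::Temp qw( tempdir );
use File::Spec;

package ToyExternalWriter;

sub new {
    my ( $class, %args ) = @_;
    return bless { dir => $args{dir} }, $class;
}

# Hypothetical naming scheme: one side file per segment name,
# kept outside any segment directory.
sub side_file {
    my ( $self, $seg_name ) = @_;
    return File::Spec->catfile( $self->{dir}, "$seg_name.side" );
}

# Use the deletion notice as a trigger to clean up the external file.
sub delete_segment {
    my ( $self, $seg_name ) = @_;
    my $path = $self->side_file($seg_name);
    if ( -e $path ) {
        unlink $path or die "Can't unlink '$path': $!";
    }
    return;
}

package main;

my $dir    = tempdir( CLEANUP => 1 );
my $writer = ToyExternalWriter->new( dir => $dir );
my $path   = $writer->side_file('seg_1');
open my $fh, '>', $path or die "Can't open '$path': $!";
close $fh or die "Can't close '$path': $!";

$writer->delete_segment('seg_1');
my $state = -e $path ? 'still present' : 'removed';
print "$state\n";    # prints "removed"
```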
merge_segment
    $data_writer->merge_segment(
        reader  => $reader,     # required
        doc_map => $doc_map,    # default: undef
    );
Move content from an existing segment into the one currently being written.
The default implementation calls add_segment() then delete_segment().
reader - The SegReader containing content to merge, which must represent a segment that is part of the current snapshot.
doc_map - An array of integers mapping old document ids to new. Deleted documents are mapped to 0, indicating that they should be skipped.
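The documented default behavior of merge_segment() can be mirrored in a small plain-Perl mock. ToyWriter below is a hypothetical stand-in that does not inherit from Lucy and uses strings in place of SegReader objects; it only demonstrates the call order.

```perl
use strict;
use warnings;

# Plain-Perl stand-in for a DataWriter; not Lucy's implementation.
package ToyWriter;

sub new { return bless { log => [] }, shift }

sub add_segment {
    my ( $self, %args ) = @_;
    push @{ $self->{log} }, "add:$args{reader}";
    return;
}

sub delete_segment {
    my ( $self, $reader ) = @_;
    push @{ $self->{log} }, "delete:$reader";
    return;
}

# Mirrors the documented default: add the content, then drop the source.
sub merge_segment {
    my ( $self, %args ) = @_;
    $self->add_segment(%args);
    $self->delete_segment( $args{reader} );
    return;
}

package main;

my $writer = ToyWriter->new;
$writer->merge_segment( reader => 'seg_1', doc_map => undef );
print join( ' ', @{ $writer->{log} } ), "\n";    # prints "add:seg_1 delete:seg_1"
```

A subclass that overrides only add_segment() or delete_segment() would, under this model, still get a working merge_segment() without further changes.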
metadata
my $hashref = $data_writer->metadata();
Arbitrary metadata to be serialized and stored by the Segment. The default implementation supplies a hash with a single key-value pair for “format”.
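A subclass that wants to store more than the format revision could extend the default hash. The sketch below uses plain-Perl mock classes rather than Lucy itself; ToyBaseWriter, ToyCountingWriter, and the doc_count key are hypothetical, and the format revision is held in an ordinary hash field for simplicity.

```perl
use strict;
use warnings;

# Plain-Perl stand-in for the base class; not Lucy's implementation.
package ToyBaseWriter;

sub new {
    my ( $class, %args ) = @_;
    return bless { format => $args{format} }, $class;
}

# Default behavior as documented: a single "format" key-value pair.
sub metadata {
    my $self = shift;
    return { format => $self->{format} };
}

# Hypothetical subclass that records one extra statistic.
package ToyCountingWriter;
our @ISA = ('ToyBaseWriter');

sub metadata {
    my $self = shift;
    my $meta = $self->SUPER::metadata();
    $meta->{doc_count} = 42;    # illustrative value only
    return $meta;
}

package main;

my $meta = ToyCountingWriter->new( format => 3 )->metadata;
for my $key ( sort keys %$meta ) {
    print "$key=$meta->{$key}\n";    # prints "doc_count=42" then "format=3"
}
```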
get_snapshot
my $snapshot = $data_writer->get_snapshot();
Accessor for “snapshot” member var.
get_segment
my $segment = $data_writer->get_segment();
Accessor for “segment” member var.
get_polyreader
my $poly_reader = $data_writer->get_polyreader();
Accessor for “polyreader” member var.
get_schema
my $schema = $data_writer->get_schema();
Accessor for “schema” member var.
get_folder
my $folder = $data_writer->get_folder();
Accessor for “folder” member var.
INHERITANCE
Lucy::Index::DataWriter isa Clownfish::Obj.