NAME

Alvis::Buffer - Perl extension for buffering utilities for the Alvis pipeline

SYNOPSIS

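use Alvis::Buffer;
use Alvis::Pipeline;    # provides Alvis::Pipeline::Read

# Configure the buffer before calling any of its functions.
$Buffer::BUFFER = "/tmp/building.xml";
$Buffer::verbose++;
&Buffer::fix() or die "Cannot Buffer::fix";

$in = new Alvis::Pipeline::Read(host => "harvester.alvis.info",
                                port => 16716,
                                spooldir => "/home/alvis/spool");
while ($xml = $in->read(1)) {
    # clean_wrapping() is not documented in this module; presumably
    # it is supplied by the caller.
    &clean_wrapping(\$xml);
    &Buffer::add($xml);
    # Flush a batch once more than 1000 documents are buffered.
    if ( $Buffer::docs > 1000 ) {
        $filename = &Buffer::save();
        if ( !$filename ) {
            &Buffer::close();
            die "Cannot Buffer::save";
        }
    }
}
# Save whatever is left in the buffer, then close it.
$filename = &Buffer::save();
&Buffer::close();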

DESCRIPTION

This module provides a way of buffering Alvis XML into manageable chunks as it is read in from a pipeline (Alvis::Pipeline). Chunks can be controlled by file size or document count, but this is done externally to the module; the module simply provides a function to save the current buffer contents.

Files of collected Alvis XML documents, with appropriate XML header and footer parts, are saved in the relative directory "xml-add/" under the numbers 1, 2, 3, ... Each time a batch is saved, "xml-add/" is checked to see which number to use for the latest batch. If "xml-add/" is empty, then "xml/" is checked instead. Presumably, files in "xml-add/" are being processed into "xml/".
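The numbering rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the module's actual code: the helper names max_number() and next_number() are hypothetical, and the sketch assumes batches are named "N.xml" (as in "xml-add/$N.xml") and that a missing directory is treated the same as an empty one.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Alvis::Buffer: return the largest
# integer N for which a file "N.xml" exists in $dir, or 0 if none does.
sub max_number {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return 0;    # assumption: missing dir == empty
    my $max = 0;
    for my $name (readdir $dh) {
        next unless $name =~ /^(\d+)\.xml$/;
        $max = $1 if $1 > $max;
    }
    closedir $dh;
    return $max;
}

# Hypothetical helper: look in the "xml-add/" style directory first and
# fall back to the "xml/" style directory only when the first holds no
# numbered files. The next batch gets the following integer.
sub next_number {
    my ($add_dir, $done_dir) = @_;
    my $max = max_number($add_dir) || max_number($done_dir);
    return $max + 1;
}
```

With this sketch, next_number("xml-add", "xml") would return 1 when both directories hold no numbered files.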

The implementation is independent of any pipeline, and assumes a number of fixed directories. It also assumes that files are in UTF-8, and that documents are present in elements named <documentRecord>.

FUNCTIONS

fix()

&Buffer::fix() or die "Cannot Buffer::fix";

Performs basic initialisation and checks that the output buffer is OK, and loads the current document count and size into memory. Returns 1 if everything is OK, else 0.

If the program stops or aborts while the output buffer is still being built, a restart will safely recover the contents as long as no data was lost from the buffer file.

add()

&Buffer::add($xml);

Adds an XML chunk to the current buffer and updates the current document count and size in memory.

save()

$filename = &Buffer::save();
if ( !$filename ) {
    die "Cannot Buffer::save";
} else {
    print STDERR "New XML file $filename saved\n";
}

Saves the current buffer into "xml-add/" under an appropriate integer name, such as "xml-add/$N.xml", where $N is determined at the point of saving as the next integer after the largest one already in use. Returns the filename used if everything is OK, else returns undef. This function must be called explicitly, so the variables $Buffer::docs and $Buffer::size should be checked to determine when to call it.
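Since the decision of when to save is left to the caller, one possible policy is sketched below. The function should_save() and the limit names max_docs and max_size are hypothetical, chosen only for illustration; they are not provided by this module.

```perl
use strict;
use warnings;

# Hypothetical policy helper, not part of Alvis::Buffer: decide whether
# the buffer should be flushed, given the current document count and
# character count plus optional limits. A limit that is not supplied
# is simply ignored.
sub should_save {
    my ($docs, $size, %limit) = @_;
    return 1 if defined $limit{max_docs} && $docs >= $limit{max_docs};
    return 1 if defined $limit{max_size} && $size >= $limit{max_size};
    return 0;
}

# Possible use inside the read loop (threshold values are assumptions):
#
#   if ( should_save($Buffer::docs, $Buffer::size,
#                    max_docs => 1000, max_size => 50_000_000) ) {
#       my $filename = &Buffer::save() or die "Cannot Buffer::save";
#   }
```

Keeping the policy in a separate function leaves the buffer module unchanged and makes it easy to switch between a document-count limit, a size limit, or both.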

close()

&Buffer::close();

Close the output buffer.

VARIABLES and PARAMETERS

Global variables are of two kinds. The first kind defines general characteristics of the module's behaviour. These should be set by the user before the functions are called, although reasonable defaults are provided.

BUFFER

name of the output buffer file that collects XML chunks. Don't include the process-ID string "$$" in the name if you want this file to be found again after a restart.

text to enter at the front of a sequence of <documentRecord> elements.

text to enter at the end of a sequence of <documentRecord> elements.

verbose

set to a non-zero value if more debugging should be reported to STDERR.

Then there are variables that are read-only and describe the current buffer statistics.

size

Count of characters in the current buffer.

docs

Count of documents, as determined by occurrences of <documentRecord> elements.
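How the module counts documents internally is not shown here. As a rough illustration only, the number of <documentRecord> elements in a chunk could be estimated by counting opening tags, as in the hypothetical count_docs() below. This is a plain pattern match, not an XML parse, so it assumes the tag name never appears inside comments or CDATA sections.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Alvis::Buffer: count opening
# <documentRecord> tags in a string. The character class after the tag
# name keeps longer names such as <documentRecordSet> from matching,
# and closing tags do not match because they start with "</".
sub count_docs {
    my ($xml) = @_;
    my $count = () = $xml =~ /<documentRecord[\s>\/]/g;
    return $count;
}
```

A caller could apply count_docs() to each chunk before passing it to add(), for example to cross-check the value reported in $Buffer::docs.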

SEE ALSO

Alvis::Pipeline

AUTHOR

Wray Buntine, <buntine@hiit.fi>

COPYRIGHT AND LICENSE

Copyright (C) 2006 by Wray Buntine.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.4 or, at your option, any later version of Perl 5 you may have available.
