NAME

Paranoid::IO::FileMultiplexer - File Multiplexer

VERSION

$Id: lib/Paranoid/IO/FileMultiplexer.pm, 2.10 2022/03/08 00:01:04 acorliss Exp $

SYNOPSIS

$obj = Paranoid::IO::FileMultiplexer->new(
    file        => $fn,
    readOnly    => 0,
    perms       => $perms,
    blockSize   => $bsize,
    );

$header = $obj->header;

$rv = $obj->chkConsistency;
$rv = $obj->addStream($name);

$rv = $obj->strmSeek($sname, $pos, $whence);
$rv = $obj->strmTell($sname);
$bw = $obj->strmWrite($sname, $content);
$br = $obj->strmRead($sname, \$content, $bytes);
$bw = $obj->strmAppend($sname, $content);
$rv = $obj->strmTruncate($sname, $neos);

DESCRIPTION

This class produces file multiplexer objects that multiplex I/O streams into a single file. This allows I/O patterns that would normally be applied to multiple files to be applied to one, with full support for concurrent access by multiple processes on the same system.

At its most basic, one could use these objects as an archive format for multiple files. At its most complex, this could be a database backend file, similar to SQLite or Berkeley DB.

The file must reside on a file system that supports flock.

CAVEATS FOR USAGE

This class is built essentially as a block allocation tool, which has some side effects that must be anticipated. Full support is available for both 32-bit and 64-bit file systems, and files produced can be exchanged across both types of platforms with no special handling, at least until a file grows beyond the capabilities of a 32-bit platform. Similarly, files should be portable between big-endian and little-endian platforms.

That said, the simplicity of this design did require some compromises, the first being the number of supported "streams" that can be stored inside a single file. That is a function of the block size chosen for the file. All allocated streams are tracked in the file header block, so the number of streams is constrained by the number that can be recorded in that block.

Likewise, the maximum size of a stream is also limited by the block size, since the stream header block can only track so many block allocation tables (BATs), and each BAT can only track so many data blocks.

Practically speaking, for many use cases this should not be an issue, but you can get an idea of the impact on both 32-bit and 64-bit systems like so:

                    32b/4KB                 64b/4KB
--------------------------------------------------------------------------
Max File Size:      4294967295 (4.00GB)     18446744073709551615 (16.00EB)
Max Streams:        135                     135
Max Stream Size:    1052872704 (1004.10MB)  1052872704 (1004.10MB)

                    32b/8KB                 64b/8KB
--------------------------------------------------------------------------
Max File Size:      4294967295 (4.00GB)     18446744073709551615 (16.00EB)
Max Streams:        272                     272
Max Stream Size:    4294967295 (4.00GB)     8506253312 (7.92GB)

As you can see, 8KB blocks provide full utilization of your file system's capabilities on a 32-bit platform, but on a 64-bit platform you are still artificially capped in how much data can be stored in an individual stream. The number of streams will always be limited identically on both platforms, based on the block size.
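The stream-size figures in the tables above can be reproduced with simple arithmetic: a stream's capacity is the number of BATs its header can record, times the number of data blocks each BAT can record, times the block size, capped by what the platform can address. The sketch below assumes, by inference from the tables rather than from the module's source, that the stream header and each BAT record the same number of entries: 507 for 4KB blocks and 1019 for 8KB blocks. The subroutine name max_stream_size is hypothetical; for authoritative figures, use the model method of the object returned by header.

```perl
use strict;
use warnings;

# Hypothetical helper: estimate the largest stream a file can hold.
#   $entries - table entries per block (ASSUMED equal for the stream
#              header and each BAT; inferred from the documented limits)
#   $bsize   - block size in bytes
#   $bits    - 32 or 64, the platform's addressable file-offset width
sub max_stream_size {
    my ( $entries, $bsize, $bits ) = @_;

    # BATs per stream header * data blocks per BAT * bytes per block
    my $size = $entries * $entries * $bsize;

    # A 32-bit platform cannot address beyond 2**32 - 1 bytes
    my $cap = $bits == 32 ? 4294967295 : undef;

    return defined $cap && $size > $cap ? $cap : $size;
}

printf "%-8s %d\n", '32b/4KB', max_stream_size( 507,  4096, 32 );
printf "%-8s %d\n", '64b/4KB', max_stream_size( 507,  4096, 64 );
printf "%-8s %d\n", '32b/8KB', max_stream_size( 1019, 8192, 32 );
printf "%-8s %d\n", '64b/8KB', max_stream_size( 1019, 8192, 64 );
```

Because the capacity grows with the square of the entries per block, doubling the block size roughly multiplies the maximum stream size by eight.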

NOTE: The actual limits on file size depend not upon the native size of longs or quads, but upon the design of the file system itself. Some file systems designed for 32-bit processors reserved the highest bit, which made the highest addressable offset in a file 2GB instead of 4GB. Other file systems had limits that were a function of inode size and other aspects of the formatted file system. In short, the true limit on file size may be beyond this module's ability to detect and accommodate gracefully.

One more caveat should be noted regarding I/O performance. The supported block sizes are intentionally limited in hopes of avoiding double-write penalties due to block alignment issues on the underlying file system. At the same time, the block size also serves as a kind of crude tuning capability for the size of I/O operations. No individual I/O, whether read or write, will exceed the size of a block. You, as the developer, can call the class API with reads of any size you wish, of course, but behind the scenes they will be broken up into reads no larger than a block.

For these reasons, choose a block size that offers the best compromise between I/O performance and the number of streams (or the maximum stream size) you anticipate needing.
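To picture how a large request is carved up, the sketch below splits a read or write of $len bytes starting at stream offset $pos into pieces that never cross a block boundary, so no single piece exceeds the block size. It illustrates the idea under that assumption and is not the module's internal algorithm; split_io is a hypothetical name, and a real implementation would also have to map each stream-relative block to a physical block through the BATs.

```perl
use strict;
use warnings;

# Hypothetical helper: split a stream I/O request into block-bounded pieces.
# Returns a list of [ block_index, offset_in_block, length ] triples.
sub split_io {
    my ( $pos, $len, $bsize ) = @_;
    my @pieces;

    while ( $len > 0 ) {
        my $block  = int( $pos / $bsize );    # which stream block
        my $offset = $pos % $bsize;           # where inside that block
        my $room   = $bsize - $offset;        # bytes left in the block
        my $chunk  = $len < $room ? $len : $room;

        push @pieces, [ $block, $offset, $chunk ];
        $pos += $chunk;
        $len -= $chunk;
    }

    return @pieces;
}

# A 10,000-byte read starting at offset 1,000 with 4KB blocks
printf "block %d, offset %4d, %4d bytes\n", @$_
    for split_io( 1000, 10000, 4096 );
```

Only the first and last pieces can be partial blocks; everything in between is a full, block-aligned I/O.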

As a final note, remember that space is allocated to the file in block-sized chunks. That means creating a new file with a 1MB block size, containing one stream but with nothing written to that stream, will create a file 4MB in size. That is due to the preallocation of the file header, a stream header, the stream's first block allocation table, and an initial data block.
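The preallocation arithmetic is easy to check: one file header block, plus three blocks for every stream added (the stream header, its first BAT, and its first data block, as described under addStream). The sketch below assumes a freshly created file in which no stream has grown past its first data block; initial_file_size is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: on-disk size of a new file before any stream
# grows beyond its first data block (ASSUMES one file header block).
sub initial_file_size {
    my ( $bsize, $streams ) = @_;

    # 1 file header block + (stream header + BAT + data block) per stream
    return $bsize * ( 1 + 3 * $streams );
}

printf "%d bytes\n", initial_file_size( 1048576, 1 );    # 1MB blocks, one stream
printf "%d bytes\n", initial_file_size( 4096,    2 );    # 4KB blocks, two streams
```

With large blocks, this fixed overhead dominates for files holding many small streams, which is another reason to keep the block size modest unless large streams are expected.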

SUBROUTINES/METHODS

new

$obj = Paranoid::IO::FileMultiplexer->new(
    file        => $fn,
    readOnly    => 0,
    perms       => $perms,
    blockSize   => $bsize,
    );

This class method creates new objects for accessing the contents of the passed file. It will create a new file if missing, or open an existing file and retrieve its metadata for tuning.

Only the file name is mandatory. Block size defaults to 4KB, but if specified, can support from 4KB to 1MB block sizes, as long as the block size is a multiple of 4KB.
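If you want to validate a block size before passing it to new, the rule stated above (a multiple of 4KB between 4KB and 1MB, inclusive) fits in a few lines. valid_block_size is a hypothetical helper for illustration; it is not part of this class's API.

```perl
use strict;
use warnings;

use constant {
    MIN_BSIZE => 4096,       # 4KB
    MAX_BSIZE => 1048576,    # 1MB
};

# Hypothetical helper: true if $bsize satisfies the documented rule
sub valid_block_size {
    my ($bsize) = @_;

    # Reject undef and anything that is not a plain non-negative integer
    return 0 unless defined $bsize && $bsize =~ /\A\d+\z/;

    return $bsize >= MIN_BSIZE
        && $bsize <= MAX_BSIZE
        && $bsize % MIN_BSIZE == 0 ? 1 : 0;
}

for my $bsize ( 4096, 8192, 6144, 2097152 ) {
    printf "%7d: %s\n", $bsize, valid_block_size($bsize) ? 'ok' : 'rejected';
}
```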

header

$header = $obj->header;

This method returns a reference to the file header block object. Typically, this has no practical value to the developer, but the file header does provide a model method that returns a hash with some predicted sizing limitations. If you want to know the maximum number of supported streams or the maximum size of an individual stream, this could be useful. Calling any other method of that object, however, could corrupt your file.

chkConsistency

$rv = $obj->chkConsistency;

This method performs a high-level consistency check of the file structure. At this time it is limited to ensuring that every header block (file, stream, and BAT) has a viable signature, and all records inside those blocks are allocated and match signatures where appropriate.

If this method detects any inconsistencies it will mark the object as corrupted, which will prevent any further writes to the file in hopes that further corruption can be avoided.

The file format of this multiplexer is such that a good deal of data can be recovered even with the complete loss of the file header. Corruption in a stream header can even be recovered from. Only the loss of a BAT header can prevent data from being recovered, but even then that will only impact the stream it belongs to. It should not impact other streams.

Take this with a grain of salt, of course. There are always caveats to that rule, depending on whether the corruption was detected before any dangerous writes. Every read and write to a stream triggers a few basic consistency checks before proceeding, but they are not as thorough as this method's process, lest they have an adverse impact on performance.

This returns a boolean value.

addStream

$rv = $obj->addStream($name);

This method adds a stream to the file, triggering the automatic allocation of three blocks (a stream header, the first stream BAT, and the first data block). It returns a boolean value, denoting success or failure.

strmSeek

$rv = $obj->strmSeek($sname, $pos, $whence);

This method acts the same as the core sysseek, taking the same arguments, but with the stream name substituted for the file handle. Its return value is also the same.

Note that the position returned is relative to the data stream, not the file itself.

strmTell

$rv = $obj->strmTell($sname);

This method acts the same as the core tell, taking the same arguments, but with the substitution of the stream name for the file handle. Like strmSeek, the position returned is relative to the data stream, not the file itself.
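Because strmSeek and strmTell are described as mirroring the core calls, it helps to recall how those behave on an ordinary file handle. The sketch below uses only core Perl (Fcntl and File::Temp) on a temporary file standing in for a stream: sysseek returns the new position, or undef on failure, and reports position zero as the string "0 but true", so a successful seek to the start still tests as true. With unbuffered sys* I/O, the usual way to ask for the current position is sysseek with an offset of 0 and SEEK_CUR. If strmSeek's return value matches sysseek's as stated, test it for truth to detect failure and use it in numeric context for the position.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

# A scratch file standing in for a single stream
my ( $fh, $fn ) = tempfile( UNLINK => 1 );
syswrite $fh, 'Hello, world';

# Seeking to the start succeeds, but the position comes back as "0 but true"
my $start = sysseek $fh, 0, SEEK_SET;
print "start: '$start' (numeric ", $start + 0, ")\n";

# Seeking relative to the end returns the absolute new position
my $pos = sysseek $fh, -5, SEEK_END;
print "pos:   $pos\n";

# With sys* I/O, sysseek with offset 0 and SEEK_CUR acts as tell
my $here = sysseek $fh, 0, SEEK_CUR;
print "here:  $here\n";
```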

strmWrite

$bw = $obj->strmWrite($sname, $content);

This method acts similarly to a very simplified syswrite. It does not support length and offset arguments, only the content itself. It presumes that the stream position has been adjusted as needed prior to invocation.

This returns the number of bytes written. If everything is working appropriately, that should match the byte length of the content itself.

strmRead

$br = $obj->strmRead($sname, \$content, $bytes);

This method acts similarly to a very simplified sysread. It does not support offset arguments, only a scalar reference and the number of bytes to read. It also presumes that the stream position has been adjusted as needed prior to invocation.

This returns the number of bytes read. Unless you've asked for more data than has been written to the stream, this should match the number of bytes requested.
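The byte counts returned by strmWrite and strmRead are described as following their core counterparts. The core-Perl sketch below, run against a temporary file, shows the case the text warns about: asking for more bytes than remain yields a short count, and a read at the end of the data returns 0. Assuming strmRead behaves like sysread in this respect, code that needs an exact amount should compare the returned count against the request rather than assume it.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

my ( $fh, $fn ) = tempfile( UNLINK => 1 );
my $content = 'x' x 100;

# syswrite returns the number of bytes written (undef on error)
my $bw = syswrite $fh, $content;
print "wrote $bw bytes\n";

sysseek $fh, 90, SEEK_SET or die "seek failed: $!";

# Ask for 50 bytes when only 10 remain: the count comes back short
my $buf;
my $br = sysread $fh, $buf, 50;
print "read $br bytes\n";

# At the end of the data, sysread returns 0
my $eof = sysread $fh, $buf, 50;
print "read $eof bytes at end of data\n";
```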

strmAppend

$bw = $obj->strmAppend($sname, $content);

This method acts similarly to Paranoid::IO's pappend. It always seeks to the end of the written data stream before appending the requested content. Like strmWrite, it will return the number of bytes written. Like pappend, it does not move the stream position, should you perform additional writes or reads.
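The observable effect of strmAppend, writing at the end of the stream while leaving the current position where it was, can be approximated on an ordinary handle by saving the position, seeking to the end, writing, and restoring the position. append_keep_pos is a hypothetical name, and the sketch is not taken from Paranoid::IO or from this class. Unlike this class, which relies on flock for concurrent access, it performs no locking.

```perl
use strict;
use warnings;
use Fcntl qw(:seek);
use File::Temp qw(tempfile);

# Hypothetical helper: append $data at end of file without disturbing
# the caller's current position. No locking is performed.
sub append_keep_pos {
    my ( $fh, $data ) = @_;

    my $saved = sysseek $fh, 0, SEEK_CUR;    # remember where we are
    return unless defined $saved;

    sysseek $fh, 0, SEEK_END or return;
    my $bw = syswrite $fh, $data;

    sysseek $fh, $saved, SEEK_SET or return;    # restore the position
    return $bw;
}

my ( $fh, $fn ) = tempfile( UNLINK => 1 );
syswrite $fh, 'abcdef';
sysseek $fh, 2, SEEK_SET;

my $bw  = append_keep_pos( $fh, 'XYZ' );
my $pos = sysseek $fh, 0, SEEK_CUR;
print "appended $bw bytes, position still $pos\n";
```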

strmTruncate

$rv = $obj->strmTruncate($sname, $neos);

This method acts similarly to truncate. It returns a boolean value denoting failure or success.

DESTROY

Obviously, one would never need to call this directly, but it is documented here to inform the developer that once an object goes out of scope, it will call pclose on the file, explicitly closing and purging any cached file handles from Paranoid::IO's internal cache.

DEPENDENCIES

o   Carp

o   Fcntl

o   Paranoid

o   Paranoid::Debug

o   Paranoid::IO

o   Paranoid::IO::FileMultiplexer::Block::FileHeader

o   Paranoid::IO::FileMultiplexer::Block::StreamHeader

o   Paranoid::IO::FileMultiplexer::Block::BATHeader

BUGS AND LIMITATIONS

AUTHOR

Arthur Corliss (corliss@digitalmages.com)

LICENSE AND COPYRIGHT

This software is free software. Similar to Perl, you can redistribute it and/or modify it under the terms of either:

a)     the GNU General Public License
       <https://www.gnu.org/licenses/gpl-1.0.html> as published by the 
       Free Software Foundation <http://www.fsf.org/>; either version 1
       <https://www.gnu.org/licenses/gpl-1.0.html>, or any later version
       <https://www.gnu.org/licenses/license-list.html#GNUGPL>, or
b)     the Artistic License 2.0
       <https://opensource.org/licenses/Artistic-2.0>,

subject to the following additional term: No trademark rights to "Paranoid" have been or are conveyed under any of the above licenses. However, "Paranoid" may be used fairly to describe this unmodified software, in good faith, but not as a trademark.

(c) 2005 - 2021, Arthur Corliss (corliss@digitalmages.com) (tm) 2008 - 2021, Paranoid Inc. (www.paranoid.com)