NAME
DataStore::CAS::Simple - Simple file/directory based CAS implementation
VERSION
version 0.08
DESCRIPTION
This implementation of DataStore::CAS uses a directory tree where the filenames are the hexadecimal value of the digest hashes. The files are placed into directories named with a prefix of the digest hash to prevent too many entries in the same directory (which is actually only a concern on certain filesystems).
Opening a File returns a real perl filehandle, and copying a File object from one instance to another is optimized by hard-linking the underlying file.
# This is particularly fast:
my $cas1= DataStore::CAS::Simple->new( path => 'foo' );
my $cas2= DataStore::CAS::Simple->new( path => 'bar' );
$cas1->put( $cas2->get( $hash ) );
This class does not perform any sort of optimization on the storage of the content, neither by combining common sections of files nor by running common compression algorithms on the data.
TODO: write DataStore::CAS::Compressor or DataStore::CAS::Splitter for those features.
ATTRIBUTES
path
Read-only. The filesystem path where the store is rooted.
digest
Read-only. Algorithm used to calculate the hash values. This can only be set in the constructor when a new store is being created. Default is SHA-1.
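As a rough illustration of what a digest hash looks like, the sketch below uses the core Digest::SHA module directly. It does not go through DataStore::CAS::Simple, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex sha256_hex);

# Illustrative content; any byte string works.
my $content = "hello";

# Hexadecimal digests, the same kind of string used for file names in the store.
my $sha1   = sha1_hex($content);     # SHA-1 yields 40 hex characters
my $sha256 = sha256_hex($content);   # SHA-256 yields 64 hex characters

printf "SHA-1:   %s (%d chars)\n", $sha1,   length $sha1;
printf "SHA-256: %s (%d chars)\n", $sha256, length $sha256;
```

A longer digest produces longer file names, which is worth keeping in mind when choosing a fanout.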
fanout
Read-only. Returns arrayref of pattern used to split digest hashes into directories. Each digit represents a number of characters from the front of the hash which then become a directory name. The final element may be the character '=' to indicate the filename is the full hash, or '*' to indicate the filename is the remaining digits of the hash. '*' is the default behavior if the fanout does not include one of these characters.
For example, [ 2, 2 ] would turn a hash of "1234567890" into a path of "12/34/567890". [ 2, 2, '=' ] would turn a hash of "1234567890" into a path of "12/34/1234567890".
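The splitting rule described above can be sketched in a few lines of plain Perl. The helper split_hash_by_fanout is hypothetical and is not part of this module; it only mirrors the documented behavior of '=' and '*'.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of DataStore::CAS::Simple.
# Splits a digest string into path parts according to a fanout pattern.
sub split_hash_by_fanout {
    my ($hash, @fanout) = @_;
    # The last element may be '=' (full hash as filename) or '*' (remainder).
    # '*' is assumed when neither is given.
    my $tail = (@fanout && $fanout[-1] =~ /^[=*]$/) ? pop @fanout : '*';
    my $rest = $hash;
    my @parts;
    for my $n (@fanout) {
        die "hash too short for fanout\n" if length($rest) <= $n;
        # 4-argument substr removes the prefix from $rest and returns it.
        push @parts, substr($rest, 0, $n, '');
    }
    push @parts, ($tail eq '=' ? $hash : $rest);
    return @parts;
}

print join('/', split_hash_by_fanout('1234567890', 2, 2)), "\n";       # 12/34/567890
print join('/', split_hash_by_fanout('1234567890', 2, 2, '=')), "\n";  # 12/34/1234567890
```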
fanout_list
Convenience accessor for @{ $cas->fanout }
copy_buffer_size
Number of bytes to copy at a time when saving data from a filehandle to the CAS. This is a performance hint, and the default is usually fine.
storage_format_version
Hashref of version information about the modules that created the store. Newer library versions can determine whether the storage is using an old format using this information.
METHODS
new
$class->new( \%params | %params )
Constructor. It will load (and possibly create) a CAS Store.
If create is specified, and path refers to an empty directory, a fresh store will be initialized. If create is specified and the directory is already a valid CAS, create is ignored, as well as digest and fanout.
path points to the cas directory. Trailing slashes don't matter. You might want to use an absolute path in case you chdir later.
copy_buffer_size initializes the respective attribute.
The digest and fanout attributes can only be initialized if the store is being created. Otherwise, they are loaded from the store's configuration.
ignore_version allows you to load a Store even if it was created with a newer version of the DataStore::CAS::Simple package than the one you are now using (or by a different package entirely).
path_parts_for_hash
my (@path)= $cas->path_parts_for_hash($digest_hash);
Given a hash string, return the directory parts and filename where that content would be found. They are returned as a list. If the hash is not valid for this digest algorithm, this will throw an exception.
path_for_hash
my $path= $cas->path_for_hash($digest_hash);
my $path= $cas->path_for_hash($digest_hash, $create_dirs);
Given a hash string, return the path to the file, including $self->path. The second argument can be set to true to create any missing directories in this path.
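The effect of the second argument can be approximated with core modules. This is a minimal sketch, assuming the hash has already been split into parts; sketch_path_for is a hypothetical stand-in and not the module's actual implementation.

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical stand-in for path_for_hash. Takes the store root, a
# create-directories flag, and the already-split path parts.
sub sketch_path_for {
    my ($root, $create_dirs, @parts) = @_;
    my $file = pop @parts;                        # last part is the file name
    my $dir  = File::Spec->catdir($root, @parts);
    # Create missing intermediate directories only when asked to.
    make_path($dir) if $create_dirs && !-d $dir;
    return File::Spec->catfile($dir, $file);
}

my $root = tempdir(CLEANUP => 1);                 # throwaway directory for the demo
my $path = sketch_path_for($root, 1, qw( 12 34 567890 ));
print "directories created\n" if -d File::Spec->catdir($root, '12', '34');
```

Note that only the directories are created; the file itself does not exist until content is written to it.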
create_store
$class->create_store( %configuration | \%configuration )
Create a new store at a specified path. Configuration must include path, and may include digest and fanout. path must be an empty writeable directory, and it must exist. digest currently defaults to SHA-1. fanout currently defaults to [1, 2], resulting in paths like "a/bc/defg".
This method can be called on classes or instances.
You may also specify create => 1 in the constructor to implicitly call this method using the relevant parameters you supplied to the constructor.
get
See "get" in DataStore::CAS for details.
put
See "put" in DataStore::CAS for details.
put_scalar
See "put_scalar" in DataStore::CAS for details.
put_file
See "put_file" in DataStore::CAS for details. In particular, heed the warnings about using the 'hardlink' and 'reuse_hash' flags.
DataStore::CAS::Simple has special support for the flags 'move' and 'hardlink'. If your source is a real file on the same filesystem by the same owner and/or group, { move => 1 } will move the file instead of copying it. (If it is a different filesystem or ownership can't be changed, it gets copied and the original gets unlinked). If the file is a real file on the same filesystem with correct owner and permissions, { hardlink => 1 } will link the file into the CAS instead of copying it.
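The general "link if possible, otherwise copy" idea can be sketched with Perl's built-in link and the core File::Copy module. This is not the module's own code; link_or_copy is a hypothetical helper, and it skips the owner and permission checks described above.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: try a hard link first, fall back to a plain copy
# (for example when source and destination are on different filesystems).
sub link_or_copy {
    my ($src, $dst) = @_;
    return 'linked' if link($src, $dst);
    copy($src, $dst) or die "copy $src -> $dst failed: $!";
    return 'copied';
}

# Demo in a throwaway directory.
my $dir = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($dir, 'source.txt');
my $dst = File::Spec->catfile($dir, 'stored.txt');
open(my $out, '>', $src) or die "open $src: $!";
print {$out} "example content\n";
close($out) or die "close $src: $!";

my $how = link_or_copy($src, $dst);
print "file was $how\n";
```

A hard-linked file shares its data with the original, so later modifications to the source would also change the stored copy. That is one reason the warnings in DataStore::CAS about the 'hardlink' flag matter.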
new_write_handle
See "new_write_handle" in DataStore::CAS for details.
commit_write_handle
See "commit_write_handle" in DataStore::CAS for details.
validate
See "validate" in DataStore::CAS for details.
open_file
See "open_file" in DataStore::CAS for details.
iterator
See "iterator" in DataStore::CAS for details.
delete
See "delete" in DataStore::CAS for details.
FILE OBJECTS
File objects returned by DataStore::CAS::Simple have two additional attributes:
local_file
The filename of the disk file within DataStore::CAS::Simple's path which holds the requested data.
block_size
The block_size parameter from stat(), which might be useful for accessing the file efficiently.
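As a sketch of how a block size can guide reads, the example below takes the preferred I/O size from Perl's built-in stat (element 11 of the returned list) and reads a temporary file in chunks of that size. It does not use a File object from this module, and the 4096 fallback is an assumption for platforms where stat does not report a block size.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file with 10_000 bytes of data.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} 'x' x 10_000;
close($fh) or die "close: $!";

# Element 11 of stat() is the preferred I/O block size; it may be empty
# on some platforms, so fall back to an assumed 4096.
my $block_size = (stat $filename)[11] || 4096;

open(my $in, '<:raw', $filename) or die "open $filename: $!";
my $total = 0;
# read() returns the number of bytes read, 0 at end of file.
while (my $got = read($in, my $buf, $block_size)) {
    $total += $got;
}
close($in);

print "read $total bytes in chunks of $block_size\n";
```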
AUTHOR
Michael Conrad <mconrad@intellitree.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2023 by Michael Conrad, and IntelliTree Solutions llc.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.