NAME
FsDB - Use the filesystem as a DB
SYNOPSIS
use FsDB;
my %hash;
tie %hash, 'FsDB', "mydb";
$hash{ $key } = $value;
# If you are creating multiple thousands of entries:
tie %hash, 'FsDB', { dir=>"mydb", depth=>1 };
DESCRIPTION
FsDB uses the filesystem as a DBM or, more correctly, as a persistent key-value store. FsDB will create a file for each value stored in the DBM. The name of the file is a hash of the key.
FsDB uses a directory per database instead of a file. Each value is stored in a unique file. The unique filename is created by hashing the key with "murmur32" in Digest::MurmurHash3. The value is stringified and stored in the file. The opposite operations are done to retrieve a value: the unique filename is created with the hash, the file is read and the value is returned.
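The store and retrieve steps described above can be sketched with core modules only. This is an illustration of the idea, not FsDB's actual code: Digest::MD5 stands in for murmur32 so the sketch runs without extra modules, the value is assumed to be a plain string, and the helper names file_for, store_value and fetch_value are hypothetical.

```perl
use strict;
use warnings;
use Digest::MD5 qw( md5_hex );
use File::Temp qw( tempdir );

# Hypothetical helper: map a key to its file inside the database directory.
# FsDB uses murmur32; md5_hex is a stand-in chosen because it is a core module.
sub file_for {
    my( $dir, $key ) = @_;
    return "$dir/" . md5_hex( $key );
}

# Store: hash the key to get the filename, write the stringified value.
sub store_value {
    my( $dir, $key, $value ) = @_;
    my $file = file_for( $dir, $key );
    open my $fh, '>', $file or die "Unable to write $file: $!";
    binmode $fh;
    print {$fh} "$value";
    close $fh or die "Unable to close $file: $!";
    return;
}

# Fetch: hash the key again, read the file back, return its contents.
sub fetch_value {
    my( $dir, $key ) = @_;
    my $file = file_for( $dir, $key );
    return unless -e $file;          # no file means no such key
    open my $fh, '<', $file or die "Unable to read $file: $!";
    binmode $fh;
    local $/;                        # slurp the whole file
    my $value = <$fh>;
    close $fh;
    return $value;
}

my $dir = tempdir( CLEANUP => 1 );   # one directory per database
store_value( $dir, 'session-id', 'abc123' );
print fetch_value( $dir, 'session-id' ), "\n";
```

Because the filename depends only on the key, a lookup needs no index: it is one hash computation followed by one file read.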
FsDB is intended to be neither portable nor distributed.
FsDB's defaults (depth=0) are intended for situations where you only have a few keys, fewer than a few thousand. An example application would be persistent session information in a web application. Each session would be its own database (aka directory) and rarely will you need to write thousands of distinct keys.
The limit is what your filesystem can handle easily. For example, ext4 is fine up to a few thousand entries. Other filesystems will have different performance limits.
FsDB is surprisingly fast. Compare the wallclock column below; the rate after the @ is computed from CPU time only:
Benchmark: timing 5000 iterations of BerkeleyDB, DB_File, FsDB,
FsDB;depth=1, FsDB;depth=1;primed, GDBM_File, QDBM_File, SQLite_File...
BerkeleyDB: 62 wallclock secs ( 0.93 usr + 1.23 sys = 2.16 CPU) @ 2314.81/s (n=5000)
DB_File: 61 wallclock secs ( 0.77 usr + 1.17 sys = 1.94 CPU) @ 2577.32/s (n=5000)
FsDB: 3 wallclock secs ( 1.47 usr + 1.11 sys = 2.58 CPU) @ 1937.98/s (n=5000)
FsDB;depth=1: 8 wallclock secs ( 1.80 usr + 1.07 sys = 2.87 CPU) @ 1742.16/s (n=5000)
FsDB;depth=1;primed: 3 wallclock secs ( 1.74 usr + 1.06 sys = 2.80 CPU) @ 1785.71/s (n=5000)
GDBM_File: 44 wallclock secs ( 0.42 usr + 1.76 sys = 2.18 CPU) @ 2293.58/s (n=5000)
QDBM_File: 1 wallclock secs ( 0.11 usr + 0.16 sys = 0.27 CPU) @ 18518.52/s (n=5000)
(warning: too few iterations for a reliable count)
SQLite_File: 125 wallclock secs ( 5.76 usr + 2.76 sys = 8.52 CPU) @ 586.85/s (n=5000)
The above benchmarks were run on a VM with an ext4 filesystem on a qcow2 disk image. The host uses ext4 and NVMe. None of this really matters, as the operations are small enough to stay in memory buffers/cache.
Small rant
The use of DB or DBM in this module and others like it (DB_File, BerkeleyDB, DBM_File) is misleading. They are in fact persistent key-value stores.
METHODS
TIEHASH
my %hash;
tie %hash, 'FsDB', \%params;
tie %hash, 'FsDB', $dir, [IGNORED]; # compatible with DB_File et al
The first form is preferred. The second form makes FsDB a drop-in replacement for DB_File.
- dir
Directory where the database will be stored. This directory is created if it doesn't exist.
- depth
How many levels of subdirectories should be created.
depth=0 means that all the files are created in the top directory.
depth=1 means that the top directory will contain one level of subdirectories that will themselves contain the files.
The names of subdirectories are created by using 2 characters from the end of the hashed key. For example, if the hash is 02f456789:
depth=0, $dir/02f456789
depth=1, $dir/89/02f456789
depth=2, $dir/89/67/02f456789
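The directory layout in the example above can be reproduced with a few lines of Perl. This is a sketch of the naming rule as described, not FsDB's own code; the function name path_for is hypothetical, and the hash string is the illustrative 02f456789 used in the example paths.

```perl
use strict;
use warnings;

# Hypothetical helper: build the file path for a hashed key.
# Each level of depth takes the next 2 characters, working back from the
# end of the hash, and uses them as a subdirectory name.
sub path_for {
    my( $dir, $hash, $depth ) = @_;
    die "Hash '$hash' is too short for depth $depth\n"
        if length( $hash ) < 2 * $depth;
    my @parts = ( $dir );
    for my $level ( 1 .. $depth ) {
        push @parts, substr( $hash, -2 * $level, 2 );
    }
    return join '/', @parts, $hash;
}

for my $depth ( 0 .. 2 ) {
    print "depth=$depth, ", path_for( '$dir', '02f456789', $depth ), "\n";
}
```

The length check mirrors the requirement that the hash string be at least twice as long as the depth.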
OVERLOADING
The following methods are useful if you want to create a subclass and modify the behaviour of FsDB.
__hash
sub __hash
{
my( $self, $key ) = @_;
# ...
}
Allows you to change the hashing algorithm. Please return a string that is at least twice as long as "depth", because each level of subdirectories uses 2 characters of it.
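As an illustration, a subclass could swap in SHA-1 from the core Digest::SHA module. This is a sketch under assumptions: the package name My::FsDB is hypothetical, and it assumes any sufficiently long string is an acceptable return value. In real use FsDB must already be loaded with use FsDB; the -norequire flag only lets the sketch compile on its own.

```perl
package My::FsDB;
use strict;
use warnings;
use Digest::SHA qw( sha1_hex );

# Assumption: FsDB has been loaded elsewhere (use FsDB;).
# -norequire records the inheritance without loading the parent here.
use parent -norequire, 'FsDB';

# Return a 40-character hex string instead of the default 32-bit hash.
sub __hash
{
    my( $self, $key ) = @_;
    return sha1_hex( $key );
}

package main;

print My::FsDB->__hash( 'some key' ), "\n";
```

A 40-character hash is long enough for depths well beyond 4, which is where set_depth comes in.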
set_depth
sub set_depth
{
my( $self, $depth ) = @_;
# ...
}
If you change the hashing algorithm to one that returns more than 32 bits, you might want a depth of more than 4.
__freeze
sub __freeze
{
my( $self, $data ) = @_;
# ...
}
Allows you to change the serialization method. Note that $data will be an arrayref: the first element is the key, the second element is the value.
__thaw
sub __thaw
{
my( $self, $data ) = @_;
# ...
}
Allows you to change the serialization method. This is the inverse of __freeze. You should return an arrayref: the first element is the key, the second is the value.
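A matching pair of __freeze and __thaw overrides might look like the following, using the core JSON::PP module. This is a sketch under assumptions: the package name My::JsonFsDB is hypothetical, __freeze is assumed to return the string that gets written to the file, and __thaw is assumed to receive that same string. In real use FsDB must already be loaded with use FsDB.

```perl
package My::JsonFsDB;
use strict;
use warnings;
use JSON::PP ();

# Assumption: FsDB has been loaded elsewhere (use FsDB;).
use parent -norequire, 'FsDB';

# canonical gives a stable key order in the encoded text
my $json = JSON::PP->new->canonical;

# $data is [ $key, $value ]; assumed to return the text stored in the file
sub __freeze
{
    my( $self, $data ) = @_;
    return $json->encode( $data );
}

# $data is assumed to be the text read from the file; returns [ $key, $value ]
sub __thaw
{
    my( $self, $data ) = @_;
    return $json->decode( $data );
}

package main;

my $frozen = My::JsonFsDB->__freeze( [ 'colour', { name => 'red', rgb => [ 255, 0, 0 ] } ] );
my $pair   = My::JsonFsDB->__thaw( $frozen );
print "$pair->[0] => $pair->[1]{name}\n";
```

Whatever format is chosen, the two methods must round-trip: thawing a frozen pair has to give back the same key and value.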
SEE ALSO
perltie, BerkeleyDB, DB_File, GDBM_File, QDBM_File
AUTHOR
Philip Gwyn, <gwyn -AT- cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2023 by Philip Gwyn
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.26.3 or, at your option, any later version of Perl 5 you may have available.