NAME
Data::Unique - Module to check for duplicate items, with time-based expiration and disk persistence.
VERSION
Version 0.02
SYNOPSIS
Create a data structure that rejects duplicate entries (by key), stores arbitrary data with each key, and uses an expiration time to clean out old entries. This module uses Storable::AMF0 for persistence. After benchmarking various serialisation modules, it proved the best compromise between read and write speed for large quantities of data.
e.g.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use feature qw( say );
use Time::HiRes qw(gettimeofday usleep );
use Data::Unique;
my $filename = '/tmp/dedup.test';
my @dup;
my $dedup = Data::Unique->new( { expiration => 10, file => $filename, gc => 5 } );
for my $idx ( 1 .. 6 ) {
    my ( $seconds, $microseconds ) = gettimeofday;
    my $time = ( $seconds * 1000000 ) + $microseconds;
    say "$idx -> $time";
    $dedup->item( $time, { T => $idx } ) or say "no insertion ($time already present)";
    push @dup, $time if ( ( $idx % 2 ) == 0 );
    usleep 10;
}
say Data::Dumper::Dumper $dedup;
say "Number of item=".$dedup->scalar;
say "Expiration time ".$dedup->expiration;
say "Number of item=".$dedup->scalar;
say $dedup->expiration(6);
sleep 15;
#say "deleted item number=".$dedup->gc();
say "Number of item=".$dedup->scalar;
foreach my $ins (@dup) {
    say $dedup->item( $ins, { T => time } ) ? "inserting $ins" : "no insertion ($ins already present)";
}
say "Expiration time ".$dedup->expiration;
say "Number of item=".$dedup->scalar. ' => '.scalar( keys( %{ $dedup->{data} }));
SUBROUTINES/METHODS
new
Create a new Data::Unique object. The default values can be overridden by passing a hash reference of parameters:
my $dedup = Data::Unique->new(
    {
        expiration => 60,        # retention time in seconds; an item is removed once it has expired
        file       => $filename, # the file used for persistence
        gc         => 5          # number of operations between garbage collector runs (expiration checks)
    }
);
item
Add an item. Returns 1 on success, or 0 if the item is already present. The first parameter is the key tested for uniqueness. The second parameter is the data.
$dedup->item( $time, $data );
If no data is provided, only test whether the item is present.
$dedup->item( $time );
expiration
Return the expiration time, or modify it if a parameter is provided. If the expiration is modified, the garbage collector runs.
$dedup->expiration(6); # set the new expiration to 6 seconds
$exp = $dedup->expiration; # return the current expiration time
scalar
Return the number of items.
$nbr = $dedup->scalar;
This is a convenient shorthand for:
scalar keys %{ $self->{data} };
gc
Run the garbage collector to remove expired items, or modify the gc value if a parameter is provided. Whenever the garbage collector runs, a sync to disk is performed. The garbage collector runs automatically each time the number of item() calls reaches the gc value. If the value is 0, the garbage collector never runs automatically. If the value is negative, it is used as an expiration time when the garbage collector is run manually.
$dedup->gc(); # force the garbage collector to run
$dedup->gc(10); # change the gc value;
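The interplay between item(), the expiration time and the gc value can be illustrated with a small stand-alone model. This is only a sketch built on a plain hash, not the module's actual implementation: the names %seen, $ops, model_item and model_gc are hypothetical, and it assumes that entries are timestamped at insertion and that the collector is triggered once the number of item() calls reaches the gc value. A fake clock is passed in explicitly so the behaviour is deterministic.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-alone model; NOT the internals of Data::Unique.
my %seen;               # key => { time => insertion time, data => payload }
my $expiration = 10;    # seconds an entry is kept (assumed meaning)
my $gc_every   = 5;     # collector runs every N item calls; 0 disables it
my $ops        = 0;     # item calls since the last collector run

# Remove entries older than $expiration; return how many were removed.
sub model_gc {
    my ($now) = @_;
    my @expired = grep { $now - $seen{$_}{time} >= $expiration } keys %seen;
    delete @seen{@expired};
    $ops = 0;
    return scalar @expired;
}

# Insert $key unless already present; return 1 on insertion, 0 on duplicate.
sub model_item {
    my ( $key, $data, $now ) = @_;
    model_gc($now) if $gc_every > 0 && ++$ops >= $gc_every;
    return 0 if exists $seen{$key};
    $seen{$key} = { time => $now, data => $data };
    return 1;
}

my $t = 1_000;    # fake clock, in seconds
print model_item( 'a', { T => 1 }, $t ),      "\n";    # 1: inserted
print model_item( 'a', { T => 2 }, $t + 1 ),  "\n";    # 0: duplicate
print model_gc( $t + 11 ),                    "\n";    # 1: one expired entry removed
print model_item( 'a', { T => 3 }, $t + 11 ), "\n";    # 1: inserted again after expiry
```

In this model a key becomes insertable again only after a collector run has removed it, which is why a larger gc value trades fewer scans for a longer effective retention.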
sync
Write the data to disk. A sync is always performed when gc() runs. It can also be called manually (for example, if the gc value is too high and automatic syncs are therefore infrequent).
$dedup->sync();
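To show what a sync amounts to, the sketch below serialises a hash to disk and reads it back. It uses the core Storable module purely as a stand-in: the module is documented as using Storable::AMF0, whose calls and file format may differ. The data layout, the variable names and the temporary file are all assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Hypothetical data layout; not necessarily what Data::Unique stores.
my %data = (
    k1 => { T => 1 },
    k2 => { T => 2 },
);

# Temporary file standing in for the 'file' parameter.
my ( $fh, $file ) = tempfile( UNLINK => 1 );
close $fh;

nstore( \%data, $file );           # "sync": write the whole hash to disk
my $restored = retrieve($file);    # what a later process would load back

print scalar( keys %$restored ), " item(s) restored\n";
```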
AUTHOR
DULAUNOY Fabrice, <fabrice at dulaunoy.com>
BUGS
Please report any bugs or feature requests to bug-data-unique at rt.cpan.org, or through the web interface at https://rt.cpan.org/NoAuth/ReportBug.html?Queue=Data-Unique. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
TODO
Add more tests, add a delete method, and maybe add TIE support.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Data::Unique
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
LICENSE AND COPYRIGHT
This software is Copyright (c) 2019 by DULAUNOY Fabrice.
This is free software, licensed under:
The Artistic License 2.0 (GPL Compatible)