NAME
Archive::BagIt - The main module to handle bags.
VERSION
version 0.072
SOURCE
The original development version was on GitHub at http://github.com/rjeschmi/Archive-BagIt and may be cloned from there.
The current development version is available at https://art1pirat.spdns.org/art1/Archive-BagIt
Conformance to RFC8493
The module should fulfill the RFC requirements, with the following limitations:
- only UTF-8 encoding is supported
- only version 0.97 or 1.0 is allowed
- version 0.97 requires tag-/manifest-files with md5-fixity
- version 1.0 requires tag-/manifest-files with sha512-fixity
- BOM is not supported
- carriage returns in bagit-files are not allowed
- fetch.txt is unsupported
At the moment only Linux-style filepaths are supported.
For a more detailed overview, see the test suite under t/verify_bag.t and the corresponding test bags from the Library of Congress BagIt conformance test suite under bagit_conformance_suite/.
See https://datatracker.ietf.org/doc/rfc8493/?include_text=1 for details.
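Two of the limitations above (BOM and carriage returns) can be checked before a bag is handed to the module. The following is a minimal sketch using only core Perl; the helper name unsupported_features() is hypothetical and not part of Archive::BagIt.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Archive::BagIt: inspects the raw bytes of
# a tag or manifest file and names the unsupported features it contains.
sub unsupported_features {
    my ($bytes) = @_;
    my @found;
    push @found, 'BOM'             if $bytes =~ /\A\xEF\xBB\xBF/;  # UTF-8 byte order mark
    push @found, 'carriage return' if $bytes =~ /\r/;
    return @found;
}

# Demonstration on in-memory data; real input would be read with '<:raw'.
my $sample = "\xEF\xBB\xBF" . "BagIt-Version: 1.0\r\n";
print join(', ', unsupported_features($sample)), "\n";
```

For real files, the content should be read in raw mode so that no I/O layer hides the bytes being checked.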
TODO
FAQ
How to access the manifest-entries directly?
Try this:
foreach my $algorithm ( keys %{ $self->manifests }) {
    my $entries_ref = $self->manifests->{$algorithm}->manifest_entries();
    # $entries_ref is a hashref of the form:
    # $entries_ref->{$algorithm}->{$file} = $digest;
}
Similar for tagmanifests
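To illustrate how such a structure can be walked, here is a small sketch on sample data. It assumes the hashref shape described in the comment above; the helper manifest_lines() and the placeholder digests are hypothetical.

```perl
use strict;
use warnings;

# Sample data in the shape described above. The digests are placeholders,
# not real checksums.
my $entries_ref = {
    md5 => {
        'data/b.txt' => 'bbbb',
        'data/a.txt' => 'aaaa',
    },
};

# Hypothetical helper: turns the nested hashref into manifest-style lines
# of the form "<digest>  <file>", sorted by file name.
sub manifest_lines {
    my ($entries, $algorithm) = @_;
    my $files = $entries->{$algorithm} // {};
    return map { "$files->{$_}  $_" } sort keys %{$files};
}

print "$_\n" for manifest_lines($entries_ref, 'md5');
```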
How fast is Archive::BagIt::Fast?
It depends. On my system with an SSD and a 38MB bag with 48 payload files, the results for verify_bag() are:
       Rate  Base  Fast
Base   102%    --  -10%
Fast   125%   11%    --
On a network filesystem (CIFS, 1Gb) with the same bag:
       Rate  Fast  Base
Fast 2.20/s    --  -11%
Base 2.48/s   13%    --
But you should measure which variant is best for you. In general the default Archive::BagIt is fast enough.
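One way to take such a measurement yourself is the core Benchmark module, whose cmpthese() prints comparison tables of this kind. The sketch below is an assumption about how a measurement could be set up, not necessarily how the numbers above were produced; the function name compare_verify() and the iteration count are hypothetical.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Runs verify_bag() $count times per variant and prints a comparison table.
# The modules are loaded at call time, so the script compiles without them.
sub compare_verify {
    my ($count, $bag_dir) = @_;
    require Archive::BagIt;
    require Archive::BagIt::Fast;
    cmpthese($count, {
        Base => sub { Archive::BagIt->new($bag_dir)->verify_bag() },
        Fast => sub { Archive::BagIt::Fast->new($bag_dir)->verify_bag() },
    });
    return;
}

# Usage: perl bench_verify.pl /path/to/bag
compare_verify(20, $ARGV[0]) if @ARGV;
```

A fresh object is created per iteration so that no lazily cached state from an earlier run is reused.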
How to update an old bag of version v0.97 to v1.0?
You could try this:
use Archive::BagIt;
my $bag=Archive::BagIt->new( $my_old_bag_filepath );
$bag->load();
$bag->store();
How to create UTF-8 based paths under MS Windows?
For Windows versions before Windows 10: I have no idea, and suggestions for a portable solution are very welcome!
For Windows 10: according to https://superuser.com/questions/1033088/is-it-possible-to-set-locale-of-a-windows-application-to-utf-8/1451686#1451686 you have to enable UTF-8 support via 'System Administration' -> 'Region' -> 'Administrative' -> 'Region Settings' -> flag 'Use Unicode UTF-8 for worldwide language support'.
Hint: The better way is to use only portable filenames. See perlport for details.
SYNOPSIS
This module will hopefully help with the basic commands needed to create and verify a bag. It supports BagIt 1.0 according to RFC 8493 (https://tools.ietf.org/html/rfc8493).
To get started, you only need to know the following methods:
read a BagIt
use Archive::BagIt;
#read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt->new($bag_dir);
construct a BagIt around a payload
use Archive::BagIt;
my $bag2 = Archive::BagIt->make_bag($bag_dir);
verify a BagIt-dir
use Archive::BagIt;
# Validate a BagIt archive against its manifest
my $bag3 = Archive::BagIt->new($bag_dir);
my $is_valid1 = $bag3->verify_bag();
# Validate a BagIt archive against its manifest, report all errors
my $bag4 = Archive::BagIt->new($bag_dir);
my $is_valid2 = $bag4->verify_bag( {report_all_errors => 1} );
read a BagIt-dir, change something, store
Because all methods operate lazily, make sure to parse the bag *BEFORE* you modify it. Otherwise it will be overwritten!
use Archive::BagIt;
my $bag5 = Archive::BagIt->new($bag_dir); # lazy, nothing has been read yet
$bag5->load(); # this updates the object representation by parsing the given $bag_dir
$bag5->store(); # this writes the bag anew
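The order load, modify, store can be wrapped in a small helper. This is a sketch only: update_bag_info() is a hypothetical name, and it relies on the load(), add_or_replace_baginfo_by_key() and store() methods documented under METHODS.

```perl
use strict;
use warnings;

# Hypothetical helper: $bag is expected to be an Archive::BagIt object,
# e.g. created with Archive::BagIt->new($bag_dir).
# @pairs is a flat list of bag-info keys and values.
sub update_bag_info {
    my ($bag, @pairs) = @_;
    $bag->load();    # parse the existing bag first
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        $bag->add_or_replace_baginfo_by_key($key, $value);
    }
    $bag->store();   # then write the bag back
    return $bag;
}

# Example call (requires Archive::BagIt and an existing bag):
# update_bag_info(Archive::BagIt->new('/path/to/bag'),
#     'Source-Organization' => 'Example Org');
```

The pairs are passed as a list rather than a hash so that they are applied in the order given.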
METHODS
Constructor
The constructor creates a bag object from a single argument:
use Archive::BagIt;
#read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt->new($bag_dir);
or from named arguments:
use Archive::BagIt;
#read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt->new(
bag_path => $bag_dir,
);
The arguments are:
bag_path
- path to the bag directory
force_utf8
- if set, the warnings about non-portable filenames are disabled (default: warnings enabled)
The bag object will use $bag_dir, BUT an existing $bag_dir is not read. If you use store(), an existing bag will be overwritten!
See load() if you want to parse/modify an existing bag.
has_force_utf8()
Checks whether force_utf8 was set. If set, warnings about potential filepath problems are ignored.
bag_path([$new_value])
Getter/setter for bag path
metadata_path()
Getter for metadata path
payload_path()
Getter for payload path
checksum_algos()
Getter for registered Checksums
bag_version()
Getter for bag version
bag_encoding()
Getter for bag encoding.
HINT: the current version of Archive::BagIt only supports UTF-8, but the method could return other values depending on given Bags.
bag_info([$new_value])
Getter/Setter for bag info. Expects/returns an array of HashRefs implementing simple key-value pairs.
HINT: RFC8493 does not allow *reordering* of entries!
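The structure can be pictured with sample data. The sketch below assumes that each hashref holds exactly one label/value pair, which keeps repeated labels and their order intact; the helper values_for_label() is hypothetical and only mimics a lookup on plain data.

```perl
use strict;
use warnings;

# Sample bag info. ASSUMPTION: one label/value pair per hashref.
my @bag_info = (
    { 'Source-Organization' => 'Example Org' },
    { 'Contact-Name'        => 'Alice' },
    { 'Contact-Name'        => 'Bob' },
);

# Hypothetical helper: collects all values for a label, in original order.
sub values_for_label {
    my ($info, $label) = @_;
    return map { $_->{$label} } grep { exists $_->{$label} } @{$info};
}

print join(', ', values_for_label(\@bag_info, 'Contact-Name')), "\n";
```

A plain hash could not represent this data, because it would drop the second 'Contact-Name' entry and lose the ordering.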
has_bag_info()
returns true if bag info exists.
errors()
Getter to return collected errors after a verify_bag() call with option report_all_errors
warnings()
Getter to return collected warnings after a verify_bag() call
digest_callback()
This method can be reimplemented by derived classes to handle fixity checks in their own way. The getter returns an anonymous function with the following interface:
my $digest = $self->digest_callback;
&$digest( $digestobject, $filename);
This anonymous function MUST use the get_hash_string() function of the Archive::BagIt::Role::Algorithm role, which is implemented by each Archive::BagIt::Plugin::Algorithm::XXXX module.
See Archive::BagIt::Fast for details.
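A callback with this interface could look like the following sketch. It is not the module's own implementation. The package My::MockMD5 is a hypothetical stand-in for an algorithm plugin, and it is an ASSUMPTION here that get_hash_string() takes an open filehandle and returns the hex digest.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Hypothetical stand-in for an algorithm plugin object. ASSUMPTION:
# get_hash_string() takes an open filehandle and returns the hex digest.
package My::MockMD5 {
    sub new { return bless {}, shift }
    sub get_hash_string {
        my ($self, $fh) = @_;
        return Digest::MD5->new->addfile($fh)->hexdigest;
    }
}

# A callback with the documented interface: ($digestobject, $filename).
my $digest_callback = sub {
    my ($digestobj, $filename) = @_;
    open(my $fh, '<:raw', $filename) or die "cannot open '$filename': $!";
    my $digest = $digestobj->get_hash_string($fh);
    close($fh) or die "cannot close '$filename': $!";
    return $digest;
};

# Demonstration on a temporary file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh);
print {$tmp_fh} "hello\n";
close($tmp_fh) or die "cannot close temp file: $!";
my $hex = $digest_callback->(My::MockMD5->new, $tmp_name);
print "$hex\n";
```

A derived class would return such a sub from its own digest_callback, for example to read files in a different way while still delegating the hashing to the plugin object.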
get_baginfo_values_by_key($searchkey)
Returns all values which match $searchkey, undef otherwise
is_baginfo_key_reserved_as_uniq($searchkey)
returns true if key is reserved and should be unique
is_baginfo_key_reserved( $searchkey )
returns true if key is reserved
verify_baginfo()
checks baginfo-keys, returns true if all is fine; otherwise returns undef and the message is pushed to errors(). Warnings are pushed to warnings().
delete_baginfo_by_key( $searchkey )
deletes an entry of given $searchkey if exists
exists_baginfo_key( $searchkey )
returns true if a given $searchkey exists
append_baginfo_by_key($searchkey, $newvalue)
Appends a key value pair to bag_info.
HINT: check the return code to see whether the append was successful, because some keys need to be unique.
add_or_replace_baginfo_by_key($searchkey, $newvalue)
It replaces the first entry with $newvalue if $searchkey exists, otherwise it appends.
forced_fixity_algorithm()
Getter to return the forced fixity algorithm depending on BagIt version
manifest_files()
Getter to find all manifest-files
tagmanifest_files()
Getter to find all tagmanifest-files
payload_files()
Getter to find all payload-files
non_payload_files()
Getter to find all non payload-files
plugins()
Getter/setter to algorithm plugins
manifests()
Getter/Setter to all manifests (objects)
algos()
Getter/Setter to all registered Algorithms
load_plugins
By default SHA512 and MD5 will be loaded and therefore used. If you want to create a bag with only one specific checksum algorithm, you can use this method to (re-)register it. It expects a list of strings with namespaces of the form Archive::BagIt::Plugin::Algorithm::XXX, where XXX is your chosen fixity algorithm.
load()
Triggers loading of an existing bag
verify_bag($opts)
A method to verify a bag deeply. If $opts is set with {return_all_errors}, all fixity errors are reported. The default is to croak with an error message if any error is detected.
HINT: You might also want to check Archive::BagIt::Fast to see a more direct way of accessing files (and thus faster).
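Because verify_bag() croaks on errors by default, callers that must not die can wrap the call in eval. The following is a minimal sketch; verify_or_report() is a hypothetical helper and works with any object that provides a verify_bag() method.

```perl
use strict;
use warnings;

# Hypothetical helper: $bag is expected to be an Archive::BagIt object.
# Returns (1, undef) on success and (0, $message) if verify_bag() croaked.
sub verify_or_report {
    my ($bag) = @_;
    my $ok = eval { $bag->verify_bag(); 1 };
    return $ok ? (1, undef) : (0, $@ || 'unknown error');
}

# Example call (requires Archive::BagIt and an existing bag):
# my ($ok, $message) = verify_or_report(Archive::BagIt->new('/path/to/bag'));
# warn "bag is invalid: $message" unless $ok;
```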
calc_payload_oxum()
returns an array with the octet count and stream count of the payload directory
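For illustration, the two numbers behind Payload-Oxum (total bytes and number of files in the payload) can be computed with core modules alone. This sketch is not the module's implementation; payload_oxum() is a hypothetical helper, and the demo creates its own temporary payload directory.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: returns (octet count, stream count) for all regular
# files below $payload_dir.
sub payload_oxum {
    my ($payload_dir) = @_;
    my ($octets, $streams) = (0, 0);
    find({
        no_chdir => 1,
        wanted   => sub {
            return unless -f $_;   # count regular files only
            $octets += -s _;       # reuse the stat from the -f test
            $streams++;
        },
    }, $payload_dir);
    return ($octets, $streams);
}

# Demonstration with two small files (3 and 5 bytes).
my $dir = tempdir(CLEANUP => 1);
for my $spec (['a.txt', 'abc'], ['b.txt', 'hello']) {
    my ($name, $content) = @{$spec};
    my $path = File::Spec->catfile($dir, $name);
    open(my $fh, '>:raw', $path) or die "cannot write '$path': $!";
    print {$fh} $content;
    close($fh) or die "cannot close '$path': $!";
}
my ($octets, $streams) = payload_oxum($dir);
print "Payload-Oxum: $octets.$streams\n";
```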
calc_bagsize()
returns a string with the human-readable size of the payload
create_bagit()
creates a bagit.txt file
create_baginfo()
creates a bag-info.txt file
Hint: the entries 'Bagging-Date', 'Bag-Software-Agent', 'Payload-Oxum' and 'Bag-Size' will be automagically set, existing values in internal bag-info representation will be overwritten!
store()
stores a bagit object, if the bagit directory structure was already constructed.
init_metadata()
A constructor that will just create the metadata directory
This won't make a bag, but it will create the conditions to do that eventually
make_bag( $bag_path )
A constructor that will make and return a bag from a directory.
It expects that a preliminary bagit directory exists. If a data directory already exists, it assumes the directory is already a bag (no checking for invalid files in root).
AVAILABILITY
The latest version of this module is available from the Comprehensive Perl Archive Network (CPAN). Visit http://www.perl.com/CPAN/ to find a CPAN site near you, or see https://metacpan.org/module/Archive::BagIt/.
BUGS AND LIMITATIONS
You can make new bug reports, and view existing ones, through the web interface at http://rt.cpan.org.
AUTHOR
Rob Schmidt <rjeschmi@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2021 by Rob Schmidt and William Wueppelmann and Andreas Romeyke.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.