NAME

Archive::BagIt::Base - The common base for Archive::BagIt. This is the module for experts. ;)

VERSION

version 0.061

SYNOPSIS

This module provides the basic commands needed to create and verify a bag. This part supports BagIt 1.0 according to RFC 8493 ([https://tools.ietf.org/html/rfc8493](https://tools.ietf.org/html/rfc8493)).

You only need to know the following methods first:

read a BagIt

use Archive::BagIt::Base;

#read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt::Base->new($bag_dir);

construct a BagIt around a payload

use Archive::BagIt::Base;
my $bag2 = Archive::BagIt::Base->make_bag($bag_dir);

verify a BagIt-dir

use Archive::BagIt::Base;

# Validate a BagIt archive against its manifest
my $bag3 = Archive::BagIt::Base->new($bag_dir);
my $is_valid1 = $bag3->verify_bag();

# Validate a BagIt archive against its manifest, report all errors
my $bag4 = Archive::BagIt::Base->new($bag_dir);
my $is_valid2 = $bag4->verify_bag( {report_all_errors => 1} );

read a BagIt-dir, change something, store

Because all methods operate lazily, make sure to parse the relevant parts of the bag *BEFORE* you modify it. Otherwise it will be overwritten!

use Archive::BagIt::Base;
my $bag5 = Archive::BagIt::Base->new($bag_dir); # lazy, nothing happened
$bag5->load(); # this updates the object representation by parsing the given $bag_dir
$bag5->store(); # this writes the bag anew

NAME

Archive::BagIt::Base - The common base for both BagIt and dotBagIt

AUTHORS

Robert Schmidt, <rjeschmi at gmail.com>
William Wueppelmann, <william at c7a.ca>
Andreas Romeyke, <pause at andreas minus romeyke.de>

CONTRIBUTORS

Serhiy Bolkun
Russell McOrmond

SOURCE

The original development version was on GitHub at http://github.com/rjeschmi/Archive-BagIt and may be cloned from there.

The current development version is available at https://art1pirat.spdns.org/art1/Archive-BagIt

Conformance to RFC8493

The module should fulfill the RFC requirements, with the following limitations:

only encoding UTF-8 is supported
version 0.97 or 1.0 allowed
version 0.97 requires tag-/manifest-files with md5-fixity
version 1.0 requires tag-/manifest-files with sha512-fixity
BOM is not supported
Carriage returns in bagit files are not allowed
fetch.txt is unsupported

At the moment only Linux-style file paths are supported.

For a more detailed overview, see the test suite under t/verify_bag.t and the corresponding test bags from the Library of Congress BagIt conformance test suite under bagit_conformance_suite/.

See https://datatracker.ietf.org/doc/rfc8493/?include_text=1 for details.
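Two of the limitations above (no BOM, no carriage returns) can be checked on the raw bytes of a tag or manifest file before handing a bag to the module. The following is a minimal sketch, not the module's own check; the helper name check_tagfile_bytes is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Archive::BagIt): returns a list of
# problems found in the raw bytes of a tag or manifest file.
sub check_tagfile_bytes {
    my ($bytes) = @_;
    my @problems;
    # A UTF-8 byte order mark is the byte sequence EF BB BF at the start.
    push @problems, 'BOM found' if $bytes =~ /\A\xEF\xBB\xBF/;
    # Any carriage return (bare CR or CRLF line ending) is reported.
    push @problems, 'carriage return found' if $bytes =~ /\r/;
    return @problems;
}

my @ok  = check_tagfile_bytes("BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n");
my @bad = check_tagfile_bytes("\xEF\xBB\xBFBagIt-Version: 1.0\r\n");

print "clean file: ", scalar(@ok), " problem(s)\n";
print "bad file: ", join(', ', @bad), "\n";
```

In practice the bytes would be read from bagit.txt or a manifest file opened in raw mode.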

TODO

enhanced testsuite
reduce complexity
use modern perl code
add code to easily update outdated Bags to v1.0

FAQ

How to access the manifest-entries directly?

Try this:

foreach my $algorithm ( keys %{ $self->manifests }) {
    my $entries_ref = $self->manifests->{$algorithm}->manifest_entries();
    # $entries_ref returns a hashref of form:
    # $entries_ref->{$algorithm}->{$file} = $digest;
}

The same approach works for tagmanifests.

How fast is Archive::BagIt::Fast?

It depends. On my system with SSD and a 38MB bag with 48 payload files the results for verify_bag() are:

               Rate BaseParallel FastParallel         Base         Fast
BaseParallel 2.46/s           --          -2%         -52%         -57%
FastParallel 2.52/s           2%           --         -51%         -56%
Base         5.10/s         107%         102%           --         -10%
Fast         5.69/s         131%         125%          11%           --

On a network filesystem (CIFS, 1GB) with the same bag:

               Rate FastParallel         Fast BaseParallel         Base
FastParallel 1.97/s           --         -10%         -15%         -20%
Fast         2.20/s          12%           --          -6%         -11%
BaseParallel 2.33/s          18%           6%           --          -6%
Base         2.48/s          26%          13%           6%           --

But you should measure which variant is best for you. In general the default Archive::BagIt::Base is fast enough.

How to update an old bag of version v0.97 to v1.0?

You could try this:

use Archive::BagIt::Base;
my $bag=Archive::BagIt::Base->new( $my_old_bag_filepath );
$bag->load();
$bag->store();

METHODS

Constructor

The constructor creates a bag object from a single argument,

use Archive::BagIt::Base;

#read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt::Base->new($bag_dir);

or from key-value arguments

use Archive::BagIt::Base;

#read in an existing bag:
my $bag_dir = "/path/to/bag";
my $bag = Archive::BagIt::Base->new(
    bag_path => $bag_dir,
    parallel => 1
);

The arguments are:

bag_path - path to bag-directory
parallel - if set and Parallel::Iterator is available, files are verified in parallel. Hint: use it only for very large bags, because parallelization adds overhead

The bag object will use $bag_dir, BUT an existing $bag_dir is not read. If you use store() an existing bag will be overwritten!

See load() if you want to parse/modify an existing bag.

has_parallel()

Checks whether parallelization is possible.

bag_path([$new_value])

Getter/setter for bag path

metadata_path()

Getter for metadata path

payload_path()

Getter for payload path

checksum_algos()

Getter for registered Checksums

bag_version()

Getter for bag version

bag_encoding()

Getter for bag encoding.

HINT: the current version of Archive::BagIt::Base only supports UTF-8, but the method could return other values depending on given Bags.

bag_info([$new_value])

Getter/Setter for bag info. Expects/returns an array of HashRefs implementing simple key-value pairs.

HINT: RFC8493 does not allow *reordering* of entries!
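The structure described above can be pictured as an ordered list of single-pair hashrefs. The sketch below uses plain Perl data only and assumes that shape; the helper values_for_key is hypothetical and merely illustrates why an ordered list (rather than one flat hash) keeps repeated keys and entry order intact.

```perl
use strict;
use warnings;

# Assumed shape: an ordered list of hashrefs, one key-value pair each.
my @bag_info = (
    { 'Source-Organization' => 'Example Library' },
    { 'Bagging-Date'        => '2020-01-31' },
    { 'Contact-Name'        => 'A. Archivist' },
    { 'Contact-Name'        => 'B. Curator' },
);

# Hypothetical helper: collect all values for a key, keeping entry order.
sub values_for_key {
    my ($info_ref, $key) = @_;
    return map { $_->{$key} } grep { exists $_->{$key} } @{$info_ref};
}

my @contacts = values_for_key(\@bag_info, 'Contact-Name');
print join(', ', @contacts), "\n";
```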

errors()

Getter that returns the errors collected by a verify_bag() call made with the option report_all_errors

digest_callback()

This method can be reimplemented by derived classes to handle fixity checks in their own way. The getter returns an anonymous function with the following interface:

my $digest = $self->digest_callback;
$digest->( $digestobject, $filename );

This anonymous function MUST use the get_hash_string() function of the Archive::BagIt::Role::Algorithm role, which is implemented by each Archive::BagIt::Plugin::Algorithm::XXXX module.

See Archive::BagIt::Fast for details.

get_baginfo_values_by_key($searchkey)

Returns all values which match $searchkey, undef otherwise

is_baginfo_key_reserved_as_uniq($searchkey)

returns true if key is reserved and should be unique

is_baginfo_key_reserved( $searchkey )

returns true if key is reserved

verify_baginfo()

checks the baginfo keys, returns true if all are fine, otherwise returns undef and pushes the message to errors().

delete_baginfo_by_key( $searchkey )

deletes the entry for the given $searchkey if it exists

exists_baginfo_key( $searchkey )

returns true if a given $searchkey exists

append_baginfo_by_key($searchkey, $newvalue)

Appends a key value pair to bag_info.

HINT: check the return code to see whether the append was successful, because some keys need to be unique.

add_or_replace_baginfo_by_key($searchkey, $newvalue)

It replaces the first entry with $newvalue if $searchkey exists, otherwise it appends.

forced_fixity_algorithm()

Getter to return the forced fixity algorithm depending on BagIt version

manifest_files()

Getter to find all manifest-files

tagmanifest_files()

Getter to find all tagmanifest-files

payload_files()

Getter to find all payload-files

non_payload_files()

Getter to find all non payload-files

plugins()

Getter/setter to algorithm plugins

manifests()

Getter/Setter to all manifests (objects)

algos()

Getter/Setter to all registered Algorithms

load_plugins

By default SHA512 and MD5 are loaded and therefore used. If you want to create a bag with only one or a specific checksum algorithm, you can use this method to (re-)register it. It expects a list of strings with namespaces of the form Archive::BagIt::Plugin::Algorithm::XXX, where XXX is your chosen fixity algorithm.

load()

Triggers loading of an existing bag

verify_bag($opts)

A method to verify a bag deeply. If $opts is set with {return_all_errors}, all fixity errors are reported. The default is to croak with an error message if any error is detected.

HINT: You might also want to check Archive::BagIt::Fast to see a more direct way of accessing files (and thus faster).
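To make the idea of a fixity check concrete, here is a minimal standalone sketch that compares "digest path" manifest lines against SHA-512 digests computed with the core module Digest::SHA. It is not the module's implementation; the helper check_manifest_lines and the throwaway directory are assumptions for illustration. It collects every error instead of stopping at the first one.

```perl
use strict;
use warnings;
use Digest::SHA;
use File::Temp qw(tempdir);

# Hypothetical helper: compare each "digest  path" manifest line against
# the SHA-512 of the file below $bag_dir; returns a list of error strings.
sub check_manifest_lines {
    my ($bag_dir, @lines) = @_;
    my @errors;
    for my $line (@lines) {
        my ($expected, $path) = split /\s+/, $line, 2;
        my $file = "$bag_dir/$path";
        if (!-f $file) {
            push @errors, "missing file: $path";
            next;
        }
        # Read the file in binary mode and compute its hex digest.
        my $actual = Digest::SHA->new(512)->addfile($file, 'b')->hexdigest;
        push @errors, "digest mismatch: $path" if lc($expected) ne $actual;
    }
    return @errors;
}

# Build a tiny throwaway bag-like directory for the demonstration.
my $bag_dir = tempdir(CLEANUP => 1);
mkdir "$bag_dir/data" or die "cannot create data dir: $!";
open my $fh, '>:raw', "$bag_dir/data/a.txt" or die "cannot write: $!";
print {$fh} 'hello';
close $fh or die "cannot close: $!";

my $good   = Digest::SHA::sha512_hex('hello');
my @errors = check_manifest_lines(
    $bag_dir,
    "$good  data/a.txt",             # matches
    ('0' x 128) . "  data/a.txt",    # wrong digest
    "$good  data/missing.txt",       # file does not exist
);
print "$_\n" for @errors;
```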

calc_payload_oxum()

returns an array with octets and streamcount of payload-dir
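The two numbers correspond to the Payload-Oxum of RFC 8493, written as OctetCount.StreamCount. The following standalone sketch shows one way to compute them with the core module File::Find; the helper payload_oxum is hypothetical and is not the module's implementation.

```perl
use strict;
use warnings;
use File::Find qw(find);
use File::Temp qw(tempdir);

# Hypothetical helper: walk $payload_dir and return (octets, streamcount).
sub payload_oxum {
    my ($payload_dir) = @_;
    my ($octets, $streams) = (0, 0);
    find(
        {
            no_chdir => 1,
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;    # count regular files only
                $octets += -s $path;       # size in bytes
                $streams++;
            },
        },
        $payload_dir
    );
    return ($octets, $streams);
}

# Build a tiny throwaway payload to demonstrate the call.
my $dir = tempdir(CLEANUP => 1);
for my $spec (['a.txt', 'hello'], ['b.txt', 'bagit!']) {
    my ($name, $content) = @{$spec};
    open my $fh, '>:raw', "$dir/$name" or die "cannot write $name: $!";
    print {$fh} $content;
    close $fh or die "cannot close $name: $!";
}

my ($octets, $streams) = payload_oxum($dir);
print "Payload-Oxum: $octets.$streams\n";
```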

calc_bagsize()

returns a string with the human-readable size of the payload

create_bagit()

creates a bagit.txt file

create_baginfo()

creates a bag-info.txt file

store()

Stores a bag object if the bagit directory structure has already been constructed.

init_metadata()

A constructor that will just create the metadata directory

This won't make a bag, but it will create the conditions to do that eventually

make_bag( $bag_path )

A constructor that will make and return a bag from a directory.

It expects that a preliminary bagit directory exists. If a data directory already exists there, the directory is assumed to be a bag already (there is no check for invalid files in the root).

AVAILABILITY

The latest version of this module is available from the Comprehensive Perl Archive Network (CPAN). Visit http://www.perl.com/CPAN/ to find a CPAN site near you, or see https://metacpan.org/module/Archive::BagIt/.

BUGS AND LIMITATIONS

You can make new bug reports, and view existing ones, through the web interface at http://rt.cpan.org.

AUTHOR

Rob Schmidt <rjeschmi@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2020 by Rob Schmidt and William Wueppelmann and Andreas Romeyke.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
