NAME
sasbactrl.pl - command line interface to SeeAlso::Source::BeaconAggregator and auxiliary classes
SYNOPSIS
DESCRIPTION
This module allows a collection of BEACON files (cf. http://de.wikipedia.org/wiki/Wikipedia:BEACON) to be used as a SeeAlso::Source (probably in the context of a SeeAlso::Server application). To that end it implements the four methods documented in SeeAlso::Source.
The BEACON files (lists of non-local identifiers of a certain type documenting the coverage of a given online database plus means for access) are imported by the methods provided by SeeAlso::Source::BeaconAggregator::Maintenance.pm, usually by employing the script sasbactrl.pl as command line client.
Serving formats other than SeeAlso, or providing a BEACON file for this SeeAlso service, is achieved by using SeeAlso::Source::BeaconAggregator::Publisher.
USAGE
Use the new() method inherited from SeeAlso::Source::BeaconAggregator to access an existing database or create a new one.
Database Methods
init( [ %options] )
Sets up and initializes the database structure for the object. This has to be done once after creating a new database and after upgrading this module.
Valid options include:
- verbose
- prepareRedirs
- identifierClass
The repos table contains as columns all valid beacon fields plus the following administrative fields which have to be prefixed with "_" in the interface:
- seqno
-
Sequence number: Is incremented on any successful load
- alias
-
Unique key: On update, older sequences with the same alias are automatically discarded. Most methods take an alias as argument, thus obviating the need to determine the sequence number.
- sort
-
optional sort key
- uri
-
Overrides the #FEED header for updates
- ruri
-
Real uri from which the last instance was loaded
- ftime
-
Fetch time: Timestamp of when this instance was loaded.
Clear this or mtime to force automatic reload.
- fstat
-
Short statistics line of last successful reload on update.
- mtime
-
Modification time: Timestamp of the file / HTTP object from which this instance was loaded. Identical to ftime if no timestamp is provided.
Clear this or ftime to force automatic reload on update.
- utime
-
Timestamp of last update attempt
- ustat
-
Short status line of last update attempt.
- counti
-
Identifier count
- countu
-
Unique identifier count
- admin
-
Just to store some remarks.
The beacons table stores the individual beacon entries from the input files. Its columns are:
- hash
-
Identifier. If a (subclass of) C<SeeAlso::Source::Identifier> instance is provided, this will be transformed by the C<hash()> method.
- seqno
-
Sequence number of the beacon file in the database
- altid
-
optional identifier from an alternative identifier system for use with ALTTARGET templates.
- hits
-
optional number of hits for this identifier in the given resource
- info
-
optional information text
- link
-
optional explicit URL
The osd table contains key, val pairs for various metadata concerning the collection as a whole, notably the values needed for the Open Search Description and the header fields needed when publishing a beacon file for this collection.
The admin table stores (unique) key, val pairs for general persistent data. Currently the following keys are defined:
- DATA_VERSION
-
Integer version number to migrate database layout.
- IDENTIFIER_CLASS
-
Name of the Identifier class to be used.
- REDIRECTION_INDEX
-
Controls creation of an additional index for the altid column (facilitates reverse lookups as needed for clustering).
deflate()
Maintenance action: performs VACUUM, REINDEX and ANALYZE on the database.
Handling of beacon files
loadFile ( $file, $fields, %options )
Reads a physical beacon file and stores it with a new Sequence number in the database.
Returns a triple:
my ($seqno, $rec_ok, $message) = $db->loadFile($file, $fields, %options);
$seqno is undef on error.
$seqno and $rec_ok are zero if no action was taken; $message then contains an explanation.
$seqno is a positive integer if something was loaded: the "Sequence Number" (internal unique identifier) for the representation of the beacon file in the database.
- $file
-
File to read: Must be a beacon file
- $fields
-
Hashref with additional meta and admin fields to store
- Supported options:
-
verbose => (0|1)
If the file does not contain a minimal correct header (e.g. it is an empty file or an HTML error page caught accidentally), no action is performed.
Otherwise, a fresh SeqNo (sequence number) is generated and the meta fields and BEACON lines are stored in the appropriate tables in the database.
If the _alias field is provided, existing database entries for this alias are updated, and identifiers no longer accounted for are eventually discarded.
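The three outcomes of the return triple described above can be told apart as in the following minimal sketch. The helper describe_load_result is hypothetical and not part of the module; it also assumes that $rec_ok counts the records stored successfully and that $message may be missing on error. It needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): turns the triple
# returned by loadFile into a one-line status string.
sub describe_load_result {
    my ($seqno, $rec_ok, $message) = @_;

    # undef sequence number: an error occurred
    return 'error: ' . ($message // 'no message') unless defined $seqno;

    # zero sequence number: nothing was done, $message explains why
    return 'skipped: ' . ($message // 'no message') if $seqno == 0;

    # positive sequence number: the file was stored under this seqno
    # (assumption: $rec_ok is the number of records stored)
    return "loaded as sequence $seqno ($rec_ok records ok)";
}

print describe_load_result(undef, undef, 'cannot open file'), "\n";
print describe_load_result(0, 0, 'no valid header'), "\n";
print describe_load_result(17, 1234, ''), "\n";
```

In real use the three arguments would come straight from a loadFile call on the database object.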
processbeaconheader($self, $fieldref, [ %options] )
Internal subroutine used by loadFile.
- $fieldref
-
Hash with raw fields.
- Supported options:
-
verbose => (0|1)
Show seqnos of old instances that match the alias.
update ($sq_or_alias, $params, %options)
Loads a beacon file into the database, possibly replacing a previous instance.
Some magic is employed to autoconvert ISO-8859-1 or doubly UTF-8 encoded files back to UTF-8.
Returns undef if something goes wrong or if the file has not been modified since the last load; otherwise returns a pair (new sequence number, number of lines imported).
- $sq_or_alias
-
Sequence number or alias: Used to determine an existing instance.
- $params
-
Hashref, containing
agent => LWP::UserAgent to use
_uri => Feed URL to load from
- %options
-
verbose => (0|1)
force => (0|1)
Incorporates a new beacon source from a URI into the database or updates an existing one. For HTTP URIs care is taken not to reload an unmodified BEACON feed (unless the 'force' option is provided).
If the feed appears to be newer than the previously loaded version it is fetched, some UTF-8 adjustments are performed if necessary, then it is stored to a temporary file and from there finally processed by the loadFile method above.
The URI to load is determined by the following order of precedence:
_uri Option
admin field uri stored in the database
meta field #FEED taken from the database
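The order of precedence listed above can be sketched in plain Perl. This is an illustration only: resolve_feed_uri and the three hashrefs are hypothetical stand-ins for the parameter hashref and the stored admin and meta fields, and treating empty strings as "not set" is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): picks the URI to load
# from, following the documented order of precedence.
sub resolve_feed_uri {
    my ($params, $admin, $meta) = @_;

    # 1. _uri given in the parameter hashref
    # 2. admin field "uri" stored in the database
    # 3. meta field #FEED stored in the database
    for my $candidate ($params->{_uri}, $admin->{uri}, $meta->{FEED}) {
        return $candidate if defined $candidate && length $candidate;
    }
    return;    # no source known
}

my $uri = resolve_feed_uri(
    {},                                            # no _uri passed
    { uri  => 'http://example.org/override.txt' }, # admin field
    { FEED => 'http://example.org/beacon.txt' },   # meta field
);
print "$uri\n";    # the admin uri takes precedence over #FEED
```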
Typical use is with an alias, not with a sequence number:
$db->update('whatever');
Can be used to initially load beacon files from URIs:
$db->update("new_alias", {_uri => $file_uri} );
unload ( [ $seqno_or_alias, %options ] )
Deletes the sequence(s).
- $seqno_or_alias
-
numeric sequence number, Alias or SQL pattern.
- Supported options:
-
force => (0|1)
Needed to purge the complete database ($seqno_or_alias empty) or to purge more than one sequence ($seqno_or_alias yields more than one seqno).
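The condition under which force is required can be expressed as a small predicate. This is a sketch under assumptions: unload_requires_force is a hypothetical helper, and the list of matching sequence numbers is presumed to have been determined beforehand.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): true if unload would
# need force => 1 for the given argument and matching sequence numbers.
sub unload_requires_force {
    my ($seqno_or_alias, @matching_seqnos) = @_;

    # empty argument: the complete database would be purged
    return 1 unless defined $seqno_or_alias && length $seqno_or_alias;

    # pattern yielding more than one sequence
    return @matching_seqnos > 1 ? 1 : 0;
}

print unload_requires_force('',      1, 2, 3), "\n";
print unload_requires_force('foo%',  4, 5),    "\n";
print unload_requires_force('myalias', 7),     "\n";
```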
purge ( $seqno_or_alias[, %options ] )
Deletes all identifiers matching the given pattern from the database, but leaves the stored header information intact, so that it can be updated automatically.
Methods for headers
($rows, @oldvalues) = headerfield ( $sq_or_alias, $key [, $value] )
Gets or sets a meta or admin entry for the constituent file indicated by $sq_or_alias.
($resultref, $metaref) = headers ( [ $seqno_or_alias ] )
Iterates over all
For each iteration returns two hash references:
listCollections ( [ $seqno_or_alias ] )
Iterates over all sequences and returns on each call an array of Seqno, Alias, Uri, Modification time, Identifier count and Unique identifier count.
Returns undef if done.
Statistics
idStat ( [ $seqno_or_alias, %options ] )
Count identifiers for the given pattern.
idCounts ( [ $pattern, %options ] )
Iterates through the entries according to the optional id filter expression.
For each iteration the call returns a triple consisting of (identifier, number of rows, and sum of all individual counts).
idList ( [ $pattern ] )
Iterates through the entries according to the optional selection.
For each iteration the call returns a tuple consisting of the identifier and a list of array references (Seqno, Hits, Info, explicit Link, AltId), or the empty list if finished.
Hits, Info, Link and AltId are normalized to the empty string if undefined (or < 2 for hits).
It is important to finish all iterations before calling this method for "new" arguments:
1 while $db->idList(); # flush pending results
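The normalization rule for Hits, Info, Link and AltId stated above can be illustrated as follows. This is a sketch, not the module's code: normalize_entry is a hypothetical helper operating on one (Seqno, Hits, Info, Link, AltId) record.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): applies the documented
# normalization to one record and returns it as an array reference.
sub normalize_entry {
    my ($seqno, $hits, $info, $link, $altid) = @_;

    # hits: undefined or fewer than 2 becomes the empty string
    $hits = '' if !defined $hits || $hits < 2;

    # the other optional fields: undefined becomes the empty string
    for ($info, $link, $altid) {
        $_ = '' unless defined $_;
    }
    return [$seqno, $hits, $info, $link, $altid];
}

my $row = normalize_entry(3, 1, undef, 'http://example.org/x', undef);
print join('|', @$row), "\n";
```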
Manipulation of global metadata: Open Search Description
setOSD ( $field, $value )
Sets the field $field of the OpenSearchDescription to $value.
clearOSD ( $field )
Clears the field $field of the OpenSearchDescription.
addOSD ( $field, $value )
Appends $value to the (repeatable) field $field of the OpenSearchDescription.
Manipulation of global metadata: Beacon Metadata
These headers are used when publishing a beacon file for the collection.
setBeaconMeta ( $field, $value )
Sets the field $field of the Beacon meta table (used to generate a BEACON file for this service) to $value.
clearBeaconMeta ( $field )
Deletes the field $field of the Beacon meta table.
addBeaconMeta ( $field, $value )
Appends $value to the field $field of the BEACON meta table.
admin ( [$field, [$value]] )
Manipulates the admin table.
Yields a hashref to the admin table if called without arguments.
If called with $field, returns the current value, and sets the table entry to $value if defined.
AUTHOR
Thomas Berger
CPAN ID: THB
gymel.com
THB@cpan.org
COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the LICENSE file included with this module.