NAME
Metadata::DB::Indexer
SYNOPSIS
use Metadata::DB::Indexer;
use File::Find::Rule;
use DBI;
no strict 'refs';
# install our own metadata generator in place of the default method
*Metadata::DB::Indexer::record_identifier_to_metadata = \&get_mp3_meta;
my $finder = File::Find::Rule->file()->name( qr/\.mp3$/i );
my @music_files = $finder->in('/home/myself');
my $absdb = '/home/myself/music.db';
my $dbh = DBI->connect("dbi:SQLite:dbname=$absdb","","");
my $indexer = Metadata::DB::Indexer->new({ DBH => $dbh });
$indexer->records_to_index(\@music_files);
$indexer->run;
sub get_mp3_meta {
my $record_identifier = shift;
my $abs_path = $record_identifier;
my $meta = my_sub_that_turns_mp3s_to_hashref_meta($abs_path);
$meta or return; # this registers a fail, and continues
return $meta;
}
my $total = $indexer->records_to_index_count;
my $indexed = $indexer->records_indexed_count;
print STDERR "Done, indexed $indexed of $total records.\n";
DESCRIPTION
This module facilitates indexing records for use with Metadata::DB and its subpackages. Each run completely recreates the metadata table for the records. It is useful for indexing files, or any sort of record whose metadata may change over time.
NOTE
The purpose of Metadata::DB::Analizer and Metadata::DB::WUI is to provide a means to regenerate search interfaces automatically, without a developer's intervention. The interface regenerates by itself, depending on the data available. This package provides an iterator for indexing groups of data.
If you are dealing with metadata that a computer can deduce from a record set, then this package is useful.
If you are working with metadata that is inserted ad hoc rather than derived from the records, this module is not useful.
METHODS
new()
The constructor takes a hash ref as its argument. The key DBH must be provided, holding an open database handle.
my $x = Metadata::DB::Indexer->new({ DBH => $dbh });
records_to_index()
Argument is an array ref of record identifiers. A record identifier is the id, of sorts, that is passed to your method that generates the metadata.
records_to_index_count()
Returns a number: the count of record identifiers to index. You can call this after you set records_to_index().
record_identifier_to_metadata()
This method is meant to be overridden. This method receives as argument a record identifier, and should return a hash ref where each key is a metadata attribute label. The value can be an array ref or a scalar.
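As a rough illustration of the return value described above, the sketch below builds such a hash ref with both scalar and array ref values. The sub name and the attribute labels are hypothetical, and the loop at the end only prints the structure; it does not show how the module itself stores the values.

```perl
use strict;
use warnings;

# Hypothetical callback: the name and the attribute labels are illustrative only.
sub example_record_to_metadata {
    my ($record_identifier) = @_;

    return {
        title => "Record for $record_identifier",    # scalar value
        year  => 1999,                                # scalar value
        genre => [ 'rock', 'live' ],                  # array ref value
    };
}

my $meta = example_record_to_metadata('/home/myself/song.mp3');

# Treat scalars and array refs uniformly when inspecting the result.
for my $label ( sort keys %$meta ) {
    my $value  = $meta->{$label};
    my @values = ref($value) eq 'ARRAY' ? @$value : ($value);
    print "$label: @values\n";
}
```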
run()
Takes no arguments. Sets the gears in motion: iterates through the record identifiers provided to records_to_index().
records_indexed_count()
Returns a number. This is the count of records successfully indexed.
AN EXAMPLE INDEXING RUN
A run will drop your entire database table for records. Note, this package does not aid in incremental indexing.
Let's have an example where we are indexing files on disk.
INSTANCE
my $dbh; # you have to provide an open database handle
my $o = Metadata::DB::Indexer->new({ DBH => $dbh });
GENERATE A LIST OF WHAT TO INDEX
You must set the records to index as absolute paths, because we are going to index files on disk. If you wanted to index something else, such as data collected from the web, you might set a list of URLs instead.
It is up to you how you build that list. In this example we are using files, so I would suggest File::Find::Rule.
my @abs_paths; # list of abs paths to records on disk
Let's set the list
$o->records_to_index(\@abs_paths);
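If File::Find::Rule is not available, the list can also be built with the core File::Find module. This is only a sketch: collect_abs_paths() is a hypothetical helper, not part of Metadata::DB::Indexer, and the starting directory and pattern are placeholders.

```perl
use strict;
use warnings;
use File::Find ();
use File::Spec ();

# Hypothetical helper: return the absolute paths of all plain files
# under $dir whose path matches $pattern.
sub collect_abs_paths {
    my ( $dir, $pattern ) = @_;
    my @found;

    File::Find::find(
        {
            no_chdir => 1,    # with no_chdir, $_ holds the full path
            wanted   => sub {
                return unless -f $_ && $_ =~ $pattern;
                push @found, File::Spec->rel2abs($_);
            },
        },
        $dir,
    );

    my @sorted = sort @found;
    return @sorted;
}

my @abs_paths = collect_abs_paths( '.', qr/\.mp3$/i );
print "$_\n" for @abs_paths;
```

The resulting list can then be handed to records_to_index() as an array ref, exactly as above.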
DEFINE THE METHOD TO GENERATE METADATA FOR EACH RECORD
You should override or redefine record_identifier_to_metadata() to generate your own metadata.
Example:
*Metadata::DB::Indexer::record_identifier_to_metadata = \&abs_path_to_metadata;
sub abs_path_to_metadata {
my($abs_path) = @_;
# boring example that just records stat info
my @stat = stat($abs_path) or warn("$abs_path not on disk?") and return;
my $meta = {};
( $meta->{dev}, $meta->{ino}, $meta->{mode}, $meta->{nlink},
$meta->{uid}, $meta->{gid}, $meta->{rdev}, $meta->{atime},
$meta->{mtime}, $meta->{ctime}, $meta->{blksize}, $meta->{blocks} ) =
@stat;
return $meta;
}
The method record_identifier_to_metadata() receives one argument only: an element from the list you originally provided. It is called once per element, so it basically acts as the body of an iteration.
Your method must return a hash ref with the keys and values to set as metadata for this record. If it returns undef, the record is skipped and the run continues.
You could do something more interesting, like collecting id3 tags from mp3s. Then you could search by author, album, genre, or any combination of these. You will not have to design the search form if you want to use a web interface. Metadata::DB::WUI will take care of generating it for you, so as you reindex, the possible choices to search upon will automatically adapt to what you have indexed. It's really like magic.
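The calling contract described in this section (one record identifier in, a hash ref or undef out) can be pictured with a small stand-in loop. This is not the module's actual run() code: simulate_run() and the callback are hypothetical, and nothing is written to a database here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the indexer's loop; NOT the module's real code.
sub simulate_run {
    my ( $records, $to_metadata ) = @_;
    my $indexed = 0;
    my @failed;

    for my $record_identifier (@$records) {
        my $meta = $to_metadata->($record_identifier);
        if ( !$meta ) {
            push @failed, $record_identifier;    # a fail: skip and continue
            next;
        }
        $indexed++;    # a real run would store $meta at this point
    }

    return ( $indexed, \@failed );
}

# Illustrative callback: only identifiers ending in .mp3 yield metadata.
my $callback = sub {
    my ($id) = @_;
    return unless $id =~ /\.mp3$/i;    # undef in scalar context: record is skipped
    return { extension => 'mp3' };
};

my ( $indexed, $failed ) =
  simulate_run( [ 'a.mp3', 'notes.txt', 'b.mp3' ], $callback );
print "Indexed $indexed, skipped: @$failed\n";
```

Returning early with a bare return keeps the callback short: every reason to give up on a record can bail out the same way, and the loop carries on with the next identifier.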
RUN
The run sets AutoCommit to 0 on the db handle, commits when it is done, and then sets AutoCommit back to what it was before.
$o->run;
How many records were indexed successfully?
$o->records_indexed_count;
ANALYZE YOUR DATA
Let's see what we have. In the above example we used an SQLite db, so open a terminal and run:
mdri -A -a /home/myself/md_records.db
CREATE A WEB APP TO SEARCH
The whole point of using Metadata::DB is to automatically generate search interfaces to data. The search interface recreates itself depending on 'what' is in there. If you store info on people, you search by people meta, or music, or whatever. This is a very flexible system!
See CGI::Application::Plugin::MetadataDB for an example, and see Metadata::DB::WUI.
CHANGING THE TABLE
You may be keeping different metadata collections in different tables in one db.
If so, you can choose the table with:
$o->table_metadata_name('mp3s'); # you should run check to make sure it is there
HOW TO SET UP A NEW COLLECTION OF META
my $name = 'metadata_mp3s';
$o->table_metadata_name($name);
If you just want to reset the table (drop it if it exists, then create it):
$o->table_metadata_reset;
Note that calling run() will automatically reset the metadata table whose name you provided via table_metadata_name(). The default name is 'metadata'.
DEBUG
You can turn on the debug flag via:
$Metadata::DB::Indexer::DEBUG = 1;
BUGS
Please contact the AUTHOR for any bugs.
SEE ALSO
Metadata::DB Metadata::DB::Search Metadata::DB::Analizer
AUTHOR
Leo Charre leocharre at cpan dot org