NAME
Audio::DB - Tools for generating relational databases of MP3s
SYNOPSIS
use Audio::DB;
my $mp3 = Audio::DB->new(-user =>'user',
-pass =>'password',
-host =>'db_host',
-dsn =>'music_db',
-adaptor => 'mysql');
$mp3->initialize(1);
$mp3->load_database(-dirs =>['/path/to/MP3s/'],
-tmp =>'/tmp');
DESCRIPTION
Audio::DB is a module for creating relational databases of MP3 files directly from data stored in ID3 tags or from flat files of track information. Once the database has been created, Audio::DB provides various methods for creating reports and web pages of your collection. Although it's nutritious and delicious on its own, Audio::DB was created for use with Apache::Audio::DB, a subclass of Apache::MP3. This module makes it easy to make your collection web-accessible, complete with browsing, searching, streaming, multiple users, playlists, ratings, and more!
REQUIRES
MP3::Info for reading ID3 tags; LWP::MediaTypes for distinguishing types of readable files.
EXPORTS
No methods are exported.
CAVEATS
Metrics for assigning songs to albums: Since Audio::DB processes file-by-file, it uses a number of parameters to assign tracks to albums. The quality of the results of Audio::DB will depend directly on the quality and integrity of the ID3 tags of your files.
Single tracks (those not belonging to a specific album) are distinguished by either undef or the label "single" in the album tag. In this way, all the single tracks for a given artist can be easily grouped together and fetched as a sort of pseudo-album. Of course, since you've ripped all of your MP3z from albums that you own, this shouldn't be a problem ;).
If two or more albums have the same name ("Greatest Hits"), Audio::DB checks whether the year they were released and the total number of tracks are the same. If so, it treats them as the same album, and all tracks are grouped together. This works most of the time, but obviously will fail sometimes. If you haven't assigned either of these tags, you'll have one less metric for distinguishing tracks. If you have a better metric for distinguishing tracks, please let me know!
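The same-name rule above can be pictured as a composite key. The following is only a minimal sketch, not the module's actual code: album_key is a hypothetical helper, and treating a missing tag as an empty string is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: build a composite key from the album name, year, and
# total track count. Missing tags become empty strings (an assumption), so
# they add nothing to the distinction between albums.
sub album_key {
    my (%tag) = @_;
    return join "\t",
        map { defined $tag{$_} ? $tag{$_} : '' } qw(album year total_tracks);
}

my %album_ids;     # composite key => album ID
my $next_id = 0;

for my $tag (
    { album => 'Greatest Hits', year => 1981, total_tracks => 17 },
    { album => 'Greatest Hits', year => 1981, total_tracks => 17 },
    { album => 'Greatest Hits', year => 1994, total_tracks => 12 },
) {
    my $key = album_key(%$tag);
    $album_ids{$key} = ++$next_id unless exists $album_ids{$key};
    print "$tag->{album} ($tag->{year}) => album ID $album_ids{$key}\n";
}
```

The first two tracks share one album ID; the third gets a new one because its year and track count differ. With neither tag present, every "Greatest Hits" would collapse into a single album, which is the failure mode described above.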
METHODS
initialize
Title : initialize
Usage : $mp3->initialize(-erase=>$erase);
Function: initialize a new database
Returns : true if initialization successful
Args : a set of named parameters
Status : Public
This method can be used to initialize an empty database. It takes the following named arguments:
-erase A boolean value. If true the database will be wiped clean if it
already contains data.
A single true argument ($mp3->initialize(1)) is the same as initialize(-erase=>1). Future versions may support additional options for initialization and database construction (such as custom schemas).
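To illustrate how the two calling styles can be treated as equivalent, here is a small sketch. It is an assumption about how such arguments might be normalized, not the module's implementation; parse_init_args is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: accept either a single positional flag or the named
# -erase parameter, and return a normalized hash reference.
sub parse_init_args {
    my @args = @_;
    return { erase => $args[0] ? 1 : 0 } if @args == 1;
    my %named = @args;
    return { erase => $named{-erase} ? 1 : 0 };
}

for my $call ([1], [-erase => 1], [-erase => 0], []) {
    my $opts = parse_init_args(@$call);
    printf "(%s) => erase=%d\n", join(', ', @$call), $opts->{erase};
}
```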
load_database
Title : load_database
Usage :
Creating a database by reading the tags from MP3 files:
$stats = $mp3->load_database(-dirs => ['/path/to/MP3s/'],
-tmp => '/tmp',
-verbose => 100);
Creating a database from a flat file of file information:
$stats = $mp3->load_database(-files => ['/path/to/files/'],
-columns => '[columns in file]',
-tmp => '/tmp',
-verbose => 100);
Creating a database from the iTunes Music Library.xml file
$stats = $mp3->load_database(-library => '/path/to/iTunes\ Music\ Library.xml',
-verbose => 100);
Function: Parses mp3s and loads database
Returns : Hash reference containing number of artists, albums, songs,
and genres processed.
Args : array of top-level paths to mp3s; path to tmp directory,
verbose flag
Status : Public
load_database is a broad wrapper method that provides simplified access to many of Audio::DB's less-public methods. load_database expects an anonymous array of top-level paths to directories containing MP3s to load. The second required parameter is the path to a suitable temporary directory such as /tmp. Audio::DB::Build will write temporary files to this directory prior to doing bulk loads into the database.
The optional -verbose flag will cause a variety of messages to be displayed on STDERR during processing. The value of -verbose controls how frequently to display a message during song processing.
Instead of reading the tags directly, a flat file or files containing the ID3 tag information can be read. This is particularly useful for offline files that have been cataloged with utilities like MP3Rage. Furthermore, I've found that the MP3::Info module that Audio::DB::Build relies on isn't as robust at reading tags as other applications. The paths to individual files, or to directories containing batches of these files, should be passed in as an anonymous array. A second parameter, -columns, should also be passed showing the order of the fields in the file. Minimally, the file should contain album, artist, and title. The following column names should be adhered to:
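One plausible reading of -verbose is "print a message every N songs". The sketch below assumes that interpretation; should_report is a hypothetical helper and the message text is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a progress message is due, i.e. on every
# $verbose-th song. A false $verbose disables reporting altogether.
sub should_report {
    my ($count, $verbose) = @_;
    return 0 unless $verbose;
    return $count % $verbose == 0 ? 1 : 0;
}

my $verbose = 100;
for my $count (1 .. 250) {
    print STDERR "Processed $count songs...\n"
        if should_report($count, $verbose);
}
```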
title => song title
artist => performing artist
album => containing album
track => song track number
total_tracks => total tracks on album
duration => [optional] formatted string of song duration
seconds => [optional] song duration in seconds
bitrate => [optional] integer. The bitrate of the song
samplerate => [optional] sample rate of encoding
comment => [optional] song comment
filename => [optional] duh.
filesize => [optional] file size in kb
filepath => [optional] absolute file path
tagtypes => [optional] ID3 tag types present
fileformat => [optional] file format
channels => [optional] number of channels
year => [optional] year of the album
rating => [optional] user rating
playcount => [optional] song play count
playdate => [optional] date song last played
dateadded => [optional] date song added to collection
datemodified => [optional] date song information last modified
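As a rough sketch of how one line of such a flat file could be mapped onto these column names: this assumes tab-delimited fields and a plain list of column names, and parse_line is a hypothetical helper rather than part of Audio::DB.

```perl
use strict;
use warnings;

# Hypothetical helper: map one tab-delimited line onto the given column
# names. Returns a hash reference, or nothing if a required field
# (album, artist, title) is missing or empty.
sub parse_line {
    my ($line, @columns) = @_;
    chomp $line;
    my @values = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    my %song;
    @song{@columns} = @values;
    for my $required (qw(album artist title)) {
        return unless defined $song{$required} && length $song{$required};
    }
    return \%song;
}

my @columns = qw(title artist album track total_tracks year);
my $song = parse_line("Song Title\tSome Artist\tSome Album\t3\t12\t1999\n", @columns);
print "$song->{artist} - $song->{title} (track $song->{track} of $song->{total_tracks})\n"
    if $song;
```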
update_database
Title : update_database
Usage : $mp3->update_database(-dirs =>['/path/to/MP3s/'],
-tmp =>'/tmp',
-verbose => 100);
$mp3->update_database(-files =>['/path/to/files'],
-columns =>'[columns in file]',
-tmp =>'/tmp',
-verbose => 100);
Function: Parses new mp3s and adds them to a pre-existing database
Returns : true if successful
Args : array of top-level paths to new mp3s; path to tmp directory
Status : Public
update_database accepts the same parameters as load_database and is similar in function, except that it takes a path to new mp3s and adds them to a preexisting database. The artist and album of these new files will be checked against those already existing in the database to prevent the addition of duplicates. Duplicate songs, however, will be added. This is a feature, since you may want multiple copies of some tracks. It's up to you to remove duplicates in advance if you don't want them listed in your database. See the section below "Appending To A Preexisting Database" for more information on using this method.
The optional -verbose flag will cause a variety of messages to be displayed on STDERR during processing. The value of -verbose controls how frequently to display a message during song processing.
Like load_database, update_database can read information directly from flat files instead of the MP3s themselves. See load_database for more information.
Additional Public Methods
Audio::DB::Build contains several additional public methods that you are welcome to use if you'd like greater control over file parsing and database loading. In the normal course of things, you probably will not need to use these methods directly, but they are described here for completeness.
cache_song
Title : cache_song
Usage : $mp3->cache_song(-full_path=>$full_path,-file=>$file);
$mp3->cache_song(-song=>$song);
Function: Extracts tag info for a song and adds it to the internal data structure
Returns : true if successful
Args : a pre-processed data hash arising from one of the Parse modules
Status : Public
cache_song accepts the filename and full path to a file to be processed. It makes separate calls to MP3::Info to extract ID3 tag info. Once extracted, song information is checked against the database to determine if the artist or album has been seen before, adding the song to that artist or album or inserting new artists / albums into the internal temporary data structure as required. Finally, the song is added to this structure.
Alternatively, cache_song can be passed a single tab-delimited line of data that holds the relevant information. See load_database for more information on using this interface.
get_couldnt_read
Title : get_couldnt_read
Usage : $mp3->get_couldnt_read()
Function: Fetch a list of files that could not be read
Returns : Array reference of files whose tags could not be read
Args : none
Status : Public
get_stats
Title : get_stats
Usage : $mp3->get_stats;
Function: Get some info on files loaded
Returns : Hash reference containing the number of artists,
albums, genres, and songs loaded into the database.
Args : none
Status : Public
Private Methods
There are a number of private methods, described here for my own sanity. These methods are not part of the public interface.
_establish_counters
Title : _establish_counters
Usage : $mp3->_establish_counters
Function: Used to determine the highest values for keys before adding
new data to the database.
Returns : Hash reference containing the number of artists,
albums, genres, and songs loaded into the database.
Args : none
Status : Private
get_tags
Title : get_tags
Usage : $mp3->get_tags(@args);
Function: Fetch and processes raw ID3 tags from files
Returns : Hash reference of parsed tag data
Status : Private
_check_*_mem, _check_*_db
_check_artist_mem _check_album_mem
_check_genre_mem _check_artist_db
_check_album_db _check_genre_db
Title : _check_*_mem or _check_*_db
Usage : $mp3->_check_album_mem($artist);
Function: Checks for the existence of the current tag
Returns : ID of the appropriate album, artist, genre, if it already exists
Args : artist, album, or genre, as appropriate
Status : Private
The _check_* methods check for the pre-existence of the current artist, album, or genre for the file currently being examined. The two variations, *_mem and *_db, control whether this look up is done against the internal data structure in memory or against a pre-existing database.
_check_album_* is necessarily more complex. It attempts to assign songs to albums based on both the year and total number of tracks. See "Caveats" above for more information.
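The in-memory variant of this lookup can be sketched as follows. This is an illustration under assumptions, not the module's code: check_mem is a hypothetical stand-in that also assigns a new ID when the name is unseen, which the real _check_*_mem methods may leave to their caller.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a _check_*_mem style lookup: return the ID
# already assigned to $name, or assign the next counter value if it is new.
sub check_mem {
    my ($self, $type, $name) = @_;
    my $lookup = $self->{lookups}{$type} ||= {};
    return $lookup->{$name} if exists $lookup->{$name};
    return $lookup->{$name} = ++$self->{counters}{$type};
}

my $self = {};
print check_mem($self, 'artists', 'Artist A'), "\n";    # prints 1 (new artist)
print check_mem($self, 'artists', 'Artist B'), "\n";    # prints 2 (new artist)
print check_mem($self, 'artists', 'Artist A'), "\n";    # prints 1 (seen before)
```

A *_db variant would presumably run the same check as a query against the existing tables instead of consulting the in-memory hash.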
_dump_data_structures
Title : _dump_data_structures
Usage : _dump_data_structures
Function: Wrapper around all the _dump_* subroutines
Returns : true if successful
Args : none
Status : Private
_dump_*
_dump_artists _dump_albums
_dump_songs _dump_genres
Title : _dump_*
Usage : _dump_artists()
Function: Create temp files for loading into the database
Returns : true if successful
Args : none
Status : Private
These methods dump out the appropriate data from the internal data structure into the temporary directory path. Some dump multiple tables:
_dump_artists : artists and artist_genres tables
_dump_albums : album and album_artists tables
_dump_songs : songs table
_dump_genres : genres table
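A dump step of this kind might look roughly like the sketch below, which writes the genres table as tab-delimited rows. The file name genres.txt, the column order, and the dump_genres helper itself are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: write one tab-delimited row per genre to
# "$tmp/genres.txt" and return the path of the file written.
sub dump_genres {
    my ($self, $tmp) = @_;
    my $path = File::Spec->catfile($tmp, 'genres.txt');
    open my $fh, '>', $path or die "Cannot write $path: $!";
    for my $id (sort { $a <=> $b } keys %{ $self->{genres} }) {
        print {$fh} join("\t", $id, $self->{genres}{$id}{genre}), "\n";
    }
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $self = { genres => { 1 => { genre => 'Rock' }, 2 => { genre => 'Jazz' } } };
my $tmp  = tempdir(CLEANUP => 1);
print "Wrote ", dump_genres($self, $tmp), "\n";
```

Files in this shape are convenient for bulk loading, for example with MySQL's LOAD DATA statement, whose default field separator is a tab.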
_load_db
Title : _load_db
Usage : _load_db()
Function: Loads data from temporary tables into the database
Returns : true if successful
Args : none
Status : Private
_stuff_album
Title : _stuff_album
Usage : _stuff_album()
Function: Stuffs the current album into the internal data structure
Returns : true if successful
Args : none
Status : Private
Internal Data Structure
Audio::DB::Build builds a large internal data structure as it reads each file. The data structure is:
Lookups - For quick lookups to see if an artist, album or genre has been encountered
$self->{lookups}->{artists}->{$artist} = $artist_id;
$self->{lookups}->{albums}->{$album} = $album_id;
$self->{lookups}->{songs}->{$song} = $song_id;
$self->{lookups}->{genres}->{$genre} = $genre_id;
Counters - for tracking the number of artists, albums, songs, and genres
$self->{counters}->{artists}= $total;
$self->{counters}->{albums} = $total;
$self->{counters}->{songs} = $total;
$self->{counters}->{genres} = $total;
$self->{couldnt_read} = [ files that could not be read ];
The main data structure of artists, albums, songs, and genres. I know, I know, it's partially denormalized.
$self->{artists}->{$artist_id} = { artist => artist name,
genres => { $genre_ids => total },
albums => { $album => $album_id }
};
$self->{albums}->{$album_id} = { album => $album,
# For tracking multiple genres per album
genres => { $genre_ids => ++ },
# For tracking multiple artists per album (compilation CDs)
contributing_artists => { $artist_id => ++ },
# Internal measure for distinguishing same-named albums
total_tracks => total number of tracks,
year => year released
};
$self->{songs}->{$song_id} = { title => song title,
artist_id => artist_id,
album_id => album_id,
genre_id => genre_id,
track => track number,
total_tracks => total tracks on album,
duration => formatted duration,
seconds => raw seconds,
bitrate => song bitrate,
samplerate => sample rate,
comment => id3 comment,
filename => filename,
filesize => filesize,
filepath => filepath,
tagtypes => types of ID3 tags found,
format => MPEG layer,
channels => stereo / mono / joint,
song_year => year (also with album),
rating => user rating,
playcount => play count }
$self->{genres}->{$genre_id} = { genre => $genre }
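To make the layout concrete, here is a small sketch that builds a cut-down instance of this structure, resolves each song's IDs back to names, and counts the entries per table. The sample data and the count_entries helper are hypothetical; the real get_stats may work differently.

```perl
use strict;
use warnings;

# Hypothetical helper: count the entries in each top-level table.
sub count_entries {
    my ($self) = @_;
    my %stats;
    for my $table (qw(artists albums songs genres)) {
        $stats{$table} = scalar keys %{ $self->{$table} || {} };
    }
    return \%stats;
}

# A tiny instance with one artist, one album, two songs, and one genre.
my $self = {
    artists => { 1 => { artist => 'Artist A', genres => { 1 => 2 },
                        albums => { 'Album X' => 1 } } },
    albums  => { 1 => { album => 'Album X', genres => { 1 => 2 },
                        contributing_artists => { 1 => 2 },
                        total_tracks => 2, year => 2001 } },
    songs   => {
        1 => { title => 'Song One', artist_id => 1, album_id => 1, genre_id => 1, track => 1 },
        2 => { title => 'Song Two', artist_id => 1, album_id => 1, genre_id => 1, track => 2 },
    },
    genres  => { 1 => { genre => 'Rock' } },
};

# Resolve the IDs stored with each song back to artist and album names.
for my $id (sort { $a <=> $b } keys %{ $self->{songs} }) {
    my $song = $self->{songs}{$id};
    printf "%s / %s / %s\n",
        $self->{artists}{ $song->{artist_id} }{artist},
        $self->{albums}{ $song->{album_id} }{album},
        $song->{title};
}

my $stats = count_entries($self);
print join(', ', map { sprintf '%s: %d', $_, $stats->{$_} } sort keys %$stats), "\n";
```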
BUGS
This module implements a fairly complex internal data structure, which in itself rests upon lots of things going right, like reading ID3 tags, tag naming conventions, etc. On top of that, I wrote this in a Starbucks full of screaming children.
TODO
Need a reasonable way of dealing with tags that can't be read
Lots of error checking needs to be added
Support for custom data schemas, including new data types like more extensive artist info, paths to images, etc.
Keep track of stats for updates
Fix update - needs to use mysql (these are the _check_artist_db routines that all need to be implemented)
Robusticize new for different adaptor types
Add in full MP4 support
Make the data dumps rely on the schema in the module
Put the schema into its own module
AUTHOR
Copyright 2002-2004, Todd W. Harris <harris@cshl.org>.
This module is distributed under the same terms as Perl itself. Feel free to use, modify and redistribute it as long as you retain the correct attribution.
ACKNOWLEDGEMENTS
Chris Nandor <pudge@pudge.net> wrote MP3::Info, the module responsible for reading MP3 tags. Without it, this module would be a best-selling pulp romance novel behind the gum at the grocery store checkout. Chris has been really helpful with issues that arose with various MP3 tags from different taggers. Kudos, dude!
Lincoln (Dr. Leichtenstein) Stein <lstein@cshl.org> wrote much of the original adaptor code as part of the Bio::DB::GFF module. Much of that code is incorporated here, albeit in a pared-down form. The code for reading ID3 tags only from files with appropriate MIME types is borrowed from his Apache::MP3 module. This was much more elegant than my lame solution of checking for .mp3! Lincoln tolerates having me in his lab, too, even though I use a Mac.
SEE ALSO
Audio::DB::Adaptor::dbi::mysql, Audio::DB::Util::Reports, Apache::MP3, Apache::Audio::DB, MP3::Info