NAME

Audio::DB - Tools for generating relational databases of MP3s

SYNOPSIS

      use Audio::DB;
      my $mp3 = Audio::DB->new(-user    => 'user',
                               -pass    => 'password',
                               -host    => 'db_host',
                               -dsn     => 'music_db',
                               -adaptor => 'mysql');

      $mp3->initialize(1);

      $mp3->load_database(-dirs => ['/path/to/MP3s/'],
                          -tmp  => '/tmp');

DESCRIPTION

Audio::DB is a module for creating relational databases of MP3 files directly from data stored in ID3 tags or from flat files of track information. Once the database is created, Audio::DB provides various methods for creating reports and web pages of your collection. Although it's nutritious and delicious on its own, Audio::DB was created for use with Apache::Audio::DB, a subclass of Apache::MP3. That module makes it easy to make your collection web-accessible, complete with browsing, searching, streaming, multiple users, playlists, ratings, and more!

REQUIRES

MP3::Info for reading ID3 tags; LWP::MediaTypes for distinguishing types of readable files.

EXPORTS

No methods are exported.

CAVEATS

Metrics for assigning songs to albums: Since Audio::DB processes file-by-file, it uses a number of parameters to assign tracks to albums. The quality of the results of Audio::DB will depend directly on the quality and integrity of the ID3 tags of your files.

Single tracks (those not belonging to a specific album) are distinguished by either undef or the label "single" in the album tag. In this way, all the single tracks for a given artist can be easily grouped together and fetched as a sort of pseudo-album. Of course, since you've ripped all of your MP3z from albums that you own, this shouldn't be a problem ;).

If two or more albums have the same name ("Greatest Hits"), Audio::DB checks whether the year they were released and the total number of tracks are the same. If so, it treats them as the same album, and all tracks are grouped together. This works most of the time, but obviously will fail sometimes. If you haven't assigned either of these tags, you'll have one fewer metric for distinguishing tracks. If you have a better metric for distinguishing tracks, please let me know!
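As a rough illustration of the same-name check, here is a minimal sketch that keys albums on name, year, and total tracks. The helper names (album_key, album_id_for) and the sample values are made up for this example and are not part of Audio::DB.

```perl
use strict;
use warnings;

my %album_ids;            # composite key => album id
my $next_album_id = 0;

# Hypothetical helper: combine the three fields named in the text.
# A missing tag collapses to an empty string, so it cannot help
# tell two same-named albums apart.
sub album_key {
    my ($album, $year, $total_tracks) = @_;
    return join "\t", map { defined $_ ? $_ : '' } ($album, $year, $total_tracks);
}

# Hypothetical helper: reuse the id for a key seen before, else assign a new one.
sub album_id_for {
    my $key = album_key(@_);
    $album_ids{$key} = ++$next_album_id unless exists $album_ids{$key};
    return $album_ids{$key};
}

my $first  = album_id_for('Greatest Hits', 1981, 17);
my $second = album_id_for('Greatest Hits', 1994, 12);
my $third  = album_id_for('Greatest Hits', 1981, 17);   # same year and track count as $first
print "first=$first second=$second third=$third\n";
```

Two untagged "Greatest Hits" albums would share one key here, which is exactly the failure mode described above.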

METHODS

initialize

Title   : initialize
Usage   : $mp3->initialize(-erase=>$erase);
Function: initialize a new database
Returns : true if initialization successful
Args    : a set of named parameters
Status  : Public

This method can be used to initialize an empty database. It takes the following named arguments:

-erase     A boolean value.  If true the database will be wiped clean if it
           already contains data.

A single true argument ($mp3->initialize(1)) is the same as initialize(-erase=>1). Future versions may support additional options for initialization and database construction (i.e. custom schemas).
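The equivalence between the positional and named forms could be handled along these lines. This is only a sketch; parse_initialize_args is a hypothetical helper and not the module's actual argument parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the argument handling described above.
sub parse_initialize_args {
    my @args = @_;

    # A lone argument is treated as the value of -erase.
    return { erase => $args[0] ? 1 : 0 } if @args == 1;

    # Otherwise expect named parameters such as -erase => 1.
    my %named = @args;
    return { erase => $named{-erase} ? 1 : 0 };
}

my $positional = parse_initialize_args(1);
my $named      = parse_initialize_args(-erase => 1);
print "positional erase=$positional->{erase}, named erase=$named->{erase}\n";
```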

load_database

 Title   : load_database
 Usage   :

       Creating a database by reading the tags from MP3 files:
       $stats = $mp3->load_database(-dirs    => ['/path/to/MP3s/'],
                                    -tmp     => '/tmp',
                                    -verbose => 100);

       Creating a database from a flat file of track information:
       $stats = $mp3->load_database(-files   => ['/path/to/files/'],
                                    -columns => '[columns in file]',
                                    -tmp     => '/tmp',
                                    -verbose => 100);

       Creating a database from the iTunes Music Library.xml file:
       $stats = $mp3->load_database(-library => '/path/to/iTunes Music Library.xml',
                                    -verbose => 100);

 Function: Parses mp3s and loads database
 Returns : Hash reference containing number of artists, albums, songs,
           and genres processed.
 Args    : array of top-level paths to mp3s; path to tmp directory, 
           verbose flag
 Status  : Public

load_database is a broad wrapper method that provides simplified access to many of Audio::DB's less-public methods. It expects an array of top-level paths to directories containing MP3s to load. The second required parameter is the path to a suitable temporary directory such as /tmp. Audio::DB::Build will write temporary files to this directory prior to doing bulk loads into the database.

The optional -verbose flag will cause a variety of messages to be displayed to STDERR during processing. The value of -verbose controls how frequently a message is displayed during song processing.
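One plausible reading of the -verbose value is "report once every N songs". The sketch below assumes that interpretation; should_report is a hypothetical helper, and the real message format is not shown in this document.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a progress message is due.
# Assumes -verbose => 100 means "every 100th song".
sub should_report {
    my ($count, $verbose) = @_;
    return 0 unless $verbose;               # no flag, no messages
    return $count % $verbose == 0 ? 1 : 0;
}

# Pretend we processed 350 songs with -verbose => 100.
my @reported = grep { should_report($_, 100) } 1 .. 350;
print STDERR "processed $_ songs\n" for @reported;
```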

Instead of reading the tags directly, Audio::DB can read a flat file or files containing the ID3 tag information. This is particularly useful for offline files that have been cataloged with utilities like MP3Rage. Furthermore, I've found that the MP3::Info module that Audio::DB::Build relies on isn't as robust at reading tags as other applications. The paths to individual files, or to directories containing batches of these files, should be passed in as an anonymous array. A second parameter, -columns, should also be passed showing the order of the fields in the file. Minimally, the file should contain album, artist, and title. The following column names should be adhered to:

title        => song title
artist       => performing artist
album        => containing album
track        => song track number
total_tracks => total tracks on album
duration     => [optional] formatted string of song duration
seconds      => [optional] song duration in seconds
bitrate      => [optional] integer. The bitrate of the song
samplerate   => [optional] sample rate of encoding
comment      => [optional] song comment
filename     => [optional] duh.
filesize     => [optional] file size in kb
filepath     => [optional] absolute file path
tagtypes     => [optional] ID3 tag types present
fileformat   => [optional] file format
channels     => [optional] number of channels
year         => [optional] year of the album
rating       => [optional] user rating
playcount    => [optional] song play count
playdate     => [optional] date song last played
dateadded    => [optional] date song added to collection
datemodified => [optional] date song information last modified
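To make the column mapping concrete, here is a minimal sketch that turns one line of a flat file into a hash keyed by the column names above. It assumes tab-delimited lines and a column order supplied as a plain list; parse_line is a hypothetical helper, and the exact format expected by -columns is not specified here.

```perl
use strict;
use warnings;

# Hypothetical helper: map one tab-delimited line onto the given column names.
# Returns a hash reference, or undef if album, artist, or title is missing.
sub parse_line {
    my ($line, @columns) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;     # -1 keeps trailing empty fields

    my %song;
    @song{@columns} = @fields;              # hash slice: column name => field

    for my $required (qw(album artist title)) {
        return undef unless defined $song{$required} && length $song{$required};
    }
    return \%song;
}

my @columns = qw(album artist title track);
my $song    = parse_line("Kind of Blue\tMiles Davis\tSo What\t1\n", @columns);
print "$song->{artist}: $song->{title} (track $song->{track})\n";
```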

update_database

 Title   : update_database
 Usage   : $mp3->update_database(-dirs    => ['/path/to/MP3s/'],
                                 -tmp     => '/tmp',
                                 -verbose => 100);

           $mp3->update_database(-files   => ['/path/to/files'],
                                 -columns => '[columns in file]',
                                 -tmp     => '/tmp',
                                 -verbose => 100);

 Function: Parses new mp3s and adds them to a pre-existing database
 Returns : true if successful
 Args    : array of top-level paths to new mp3s; path to tmp directory
 Status  : Public
  

update_database accepts the same parameters as load_database and is similar in function, except that it takes a path to new mp3s and adds them to a preexisting database. The artist and album of these new files will be checked against those already existing in the database to prevent addition of duplicates. Duplicate songs, however, will be added. This is a feature, since you may want multiple copies of some tracks. It's up to you to remove duplicates in advance if you don't want them listed in your database. See the section below "Appending To A Preexisting Database" for more information on using this method.
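The duplicate handling can be pictured with a small sketch: artists are looked up before being added, while songs are always appended. The hash standing in for the database and the add_song helper are assumptions made for this example only.

```perl
use strict;
use warnings;

# Stand-in for artists already present in the database (an assumption).
my %artist_ids     = ('Miles Davis' => 1);
my $next_artist_id = 1;
my @songs;

# Hypothetical helper: reuse a known artist id, but never de-duplicate songs.
sub add_song {
    my ($artist, $title) = @_;
    $artist_ids{$artist} = ++$next_artist_id unless exists $artist_ids{$artist};
    push @songs, { artist_id => $artist_ids{$artist}, title => $title };
    return scalar @songs;
}

add_song('Miles Davis',   'So What');
add_song('Miles Davis',   'So What');    # duplicate song is still added
add_song('John Coltrane', 'Naima');      # new artist gets a new id

printf "%d artists, %d songs\n", scalar(keys %artist_ids), scalar(@songs);
```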

The optional -verbose flag will cause a variety of messages to be displayed to STDERR during processing. The value of -verbose controls how frequently a message is displayed during song processing.

Like load_database, update_database can read information directly from flat files instead of the MP3s themselves. See load_database for more information.

Additional Public Methods

Audio::DB::Build contains several additional public methods that you are welcome to use if you'd like greater control over file parsing and database loading. In the normal course of things, you probably will not need to use these methods directly, but they are described here for completeness.

cache_song

Title   : cache_song
Usage   : $mp3->cache_song(-full_path=>$full_path,-file=>$file);
          $mp3->cache_song(-song=>$song);
Function: Parses a song and adds it to the internal temporary data structure
Returns : true if successful
Args    : a pre-processed data hash arising from one of the Parse modules
Status  : Public

cache_song accepts the filename and full path to a file to be processed. It makes separate calls to MP3::Info to extract ID3 tag info. Once extracted, song information is checked against the database to determine if the artist or album has been seen before, adding the song to that artist or album or inserting new artists / albums into the internal temporary data structure as required. Finally, the song is added to this structure.

Alternatively, cache_song can be passed a single tab-delimited line of data that holds the relevant information. See load_database for more information on using this interface.

get_couldnt_read

Title   : get_couldnt_read
Usage   : $mp3->get_couldnt_read()
Function: Fetch a list of files that could not be read
Returns : Array reference of files whose tags could not be read
Args    : none
Status  : Public

get_stats

Title   : get_stats
Usage   : $mp3->get_stats;
Function: Get some info on files loaded
Returns : Hash reference containing the number of artists,
          albums, genres, and songs loaded into the database.
Args    : none
Status  : Public

Private Methods

There are a number of private methods, described here for my own sanity. These methods are not part of the public interface.

_establish_counters

Title   : _establish_counters
Usage   : $mp3->_establish_counters
Function: Used to determine the highest values for keys before adding
          new data to the database.
Returns : Hash reference containing the number of artists,
          albums, genres, and songs loaded into the database.
Args    : none
Status  : Private

get_tags

Title   : get_tags
Usage   : $mp3->get_tags(@args);
Function: Fetches and processes raw ID3 tags from files
Returns : Hash reference of parsed tag data
Status  : Private

_check_*_mem, _check_*_db

 _check_artist_mem _check_album_mem 
 _check_genre_mem _check_artist_db
 _check_album_db _check_genre_db

Title   : _check_*_mem or _check_*_db
Usage   : $mp3->_check_album_mem($artist);
Function: Checks for the existence of the current tag
Returns : ID of the appropriate album, artist, genre, if it already exists
Args    : artist, album, or genre, as appropriate
Status  : Private

The _check_* methods check for the pre-existence of the current artist, album, or genre for the file currently being examined. The two variations, *_mem and *_db, control whether this look up is done against the internal data structure in memory or against a pre-existing database.

_check_album_* is necessarily more complex. It attempts to assign songs to albums based on both the year and total number of tracks. See "Caveats" above for more information.
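The difference between the two lookup flavours can be sketched as follows. Both subs are hypothetical stand-ins: the "database" is just a hash here, whereas the real *_db methods would query the database.

```perl
use strict;
use warnings;

# In-memory lookups, shaped like the lookups hash described under
# "Internal Data Structure".
my $build = { lookups => { artists => { 'Miles Davis' => 1 } } };

# Stand-in for rows already stored in the database (an assumption).
my %db_artists = ('John Coltrane' => 7);

# Hypothetical *_mem variant: consult the in-memory structure only.
sub check_artist_mem {
    my ($self, $artist) = @_;
    return $self->{lookups}{artists}{$artist};
}

# Hypothetical *_db variant: consult the (simulated) database only.
sub check_artist_db {
    my ($db, $artist) = @_;
    return $db->{$artist};
}

my $mem_id = check_artist_mem($build, 'Miles Davis');
my $db_id  = check_artist_db(\%db_artists, 'John Coltrane');
print "memory id=$mem_id, database id=$db_id\n";
```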

_dump_data_structures

Title   : _dump_data_structures
Usage   : _dump_data_structures
Function: Wrapper around all the _dump_* subroutines
Returns : true if successful
Args    : none
Status  : Private

_dump_*

 _dump_artists  _dump_albums
 _dump_songs    _dump_genres

Title   : _dump_*
Usage   : _dump_artists()
Function: Create temp files for loading into the database
Returns : true if successful
Args    : none
Status  : Private

These methods dump out the appropriate data from the internal data structure into the temporary directory path. Some dump multiple tables:

_dump_artists : artists and artist_genres tables 
_dump_albums  : album and album_artists tables
_dump_songs   : songs table
_dump_genres  : genres table
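A dump step of this kind might look like the sketch below, which writes a genres table as tab-delimited text into a temporary directory. The file name genres.txt, the dump_genres helper, and the tab-delimited layout are assumptions for illustration; the module's real file names and formats are not documented here.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Stands in for the -tmp directory; removed automatically at exit.
my $tmp = tempdir(CLEANUP => 1);

my %genres = (
    1 => { genre => 'Jazz'  },
    2 => { genre => 'Blues' },
);

# Hypothetical helper: write one "id<TAB>genre" line per genre.
sub dump_genres {
    my ($dir, $genres) = @_;
    my $path = File::Spec->catfile($dir, 'genres.txt');   # hypothetical file name
    open my $fh, '>', $path or die "Cannot write $path: $!";
    for my $id (sort { $a <=> $b } keys %$genres) {
        print {$fh} join("\t", $id, $genres->{$id}{genre}), "\n";
    }
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $file = dump_genres($tmp, \%genres);
print "wrote $file\n";
```

A file in this shape is the sort of thing a database bulk loader can read in one pass, which is the point of dumping before loading.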

_load_db

Title   : _load_db
Usage   : _load_db()
Function: Loads data from temporary tables into the database
Returns : true if successful
Args    : none
Status  : Private

_stuff_album

Title   : _stuff_album
Usage   : _stuff_album()
Function: Stuffs the current album into the internal data structure
Returns : true if successful
Args    : none
Status  : Private

Internal Data Structure

Audio::DB::Build builds a large internal data structure as it reads each file. The data structure is:

Lookups - For quick lookups to see if an artist, album or genre has been encountered
$self->{lookups}->{artists}->{$artist} = $artist_id;
$self->{lookups}->{albums}->{$album}   = $album_id;
$self->{lookups}->{songs}->{$song}     = $song_id;
$self->{lookups}->{genres}->{$genre}   = $genre_id;

Counters - for tracking the number of artists, albums, songs, and genres
$self->{counters}->{artists}= $total;
$self->{counters}->{albums} = $total;
$self->{counters}->{songs}  = $total;
$self->{counters}->{genres} = $total;

$self->{couldnt_read} = [ files that could not be read ];

The main data structure of artists, albums, songs, and genres. I know, I know, it's partially denormalized.

 $self->{artists}->{$artist_id} = { artist => artist name,
                                    genres => { $genre_ids => total },
                                    albums => { $album => $album_id }
                                  };

 $self->{albums}->{$album_id} = { album        => $album,
                                  # For tracking multiple genres per album
                                  genres       => { $genre_ids => ++ },
                                  # For tracking multiple artists per album (compilation CDs)
                                  contributing_artists => { $artist_id => ++ },
                                  # Internal measure for distinguishing same-named albums
                                  total_tracks => total number of tracks,
                                  year         => year released
                                };

 $self->{songs}->{$song_id} = { title        => song title,
                                artist_id    => artist_id,
                                album_id     => album_id,
                                genre_id     => genre_id,
                                track        => track number,
                                total_tracks => total tracks on album,
                                duration     => formatted duration,
                                seconds      => raw seconds,
                                bitrate      => song bitrate,
                                samplerate   => sample rate,
                                comment      => id3 comment,
                                filename     => filename,
                                filesize     => filesize,
                                filepath     => filepath,
                                tagtypes     => types of ID3 tags found,
                                format       => MPEG layer,
                                channels     => stereo / mono / joint,
                                song_year    => year (also with album),
                                rating       => user rating,
                                playcount    => play count }

 $self->{genres}->{$genre_id} = { genre => $genre }
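To show how the "++" counters in the structure accumulate, here is a minimal sketch that files two songs from a compilation album. The stuff_song helper, the pre-assigned ids, and the sample names are all hypothetical; the real module assigns ids itself and stores many more song fields.

```perl
use strict;
use warnings;

my $build = { artists => {}, albums => {}, songs => {}, genres => {} };

# Hypothetical helper: file one song into the nested structure.
sub stuff_song {
    my ($self, %tag) = @_;
    my ($artist_id, $album_id, $genre_id, $song_id) =
        @tag{qw(artist_id album_id genre_id song_id)};

    $self->{genres}{$genre_id} = { genre => $tag{genre} };

    $self->{artists}{$artist_id}{artist} = $tag{artist};
    $self->{artists}{$artist_id}{genres}{$genre_id}++;            # songs per genre
    $self->{artists}{$artist_id}{albums}{ $tag{album} } = $album_id;

    $self->{albums}{$album_id}{album} = $tag{album};
    $self->{albums}{$album_id}{genres}{$genre_id}++;
    $self->{albums}{$album_id}{contributing_artists}{$artist_id}++;

    $self->{songs}{$song_id} = {
        title     => $tag{title},
        artist_id => $artist_id,
        album_id  => $album_id,
        genre_id  => $genre_id,
    };
    return $self;
}

# Two artists contributing to the same (compilation) album.
stuff_song($build, song_id => 1, artist_id => 1, album_id => 1, genre_id => 1,
           artist => 'Artist A', album => 'Compilation', genre => 'Jazz', title => 'One');
stuff_song($build, song_id => 2, artist_id => 2, album_id => 1, genre_id => 1,
           artist => 'Artist B', album => 'Compilation', genre => 'Jazz', title => 'Two');

my $contributors = keys %{ $build->{albums}{1}{contributing_artists} };
print "album 1 has $contributors contributing artists\n";
```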

BUGS

This module implements a fairly complex internal data structure, which in itself rests upon lots of things going right, like reading ID3 tags, tag naming conventions, etc. On top of that, I wrote this in a Starbucks full of screaming children.

TODO

Need a reasonable way of dealing with tags that can't be read

Lots of error checking needs to be added. Support for custom data schemas, including new data types like more extensive artist info, paths to images, etc.

Keep track of stats for updates. Fix update - needs to use mysql (these are the _check_artist_db routines that all need to be implemented)

Robusticize new for different adaptor types

Add in full MP4 support

Make the data dumps rely on the schema in the module

Put the schema into its own module

AUTHOR

Copyright 2002-2004, Todd W. Harris <harris@cshl.org>.

This module is distributed under the same terms as Perl itself. Feel free to use, modify and redistribute it as long as you retain the correct attribution.

ACKNOWLEDGEMENTS

Chris Nandor <pudge@pudge.net> wrote MP3::Info, the module responsible for reading MP3 tags. Without it, this module would be a best-selling pulp romance novel behind the gum at the grocery store checkout. Chris has been really helpful with issues that arose with various MP3 tags from different taggers. Kudos, dude!

Lincoln (Dr. Leichtenstein) Stein <lstein@cshl.org> wrote much of the original adaptor code as part of the Bio::DB::GFF module. Much of that code is incorporated here, albeit in a pared-down form. The code for reading ID3 tags only from files with appropriate MIME types is borrowed from his Apache::MP3 module. This was much more elegant than my lame solution of checking for .mp3! Lincoln tolerates having me in his lab, too, even though I use a Mac.

SEE ALSO

Audio::DB::Adaptor::dbi::mysql, Audio::DB::Util::Reports, Apache::MP3, Apache::Audio::DB, MP3::Info