NAME
AC::MrGamoo::FileList - get list of files
SYNOPSIS
emacs /myperldir/Local/MrGamoo/FileList.pm
copy. paste. edit.
use lib '/myperldir';
my $m = AC::MrGamoo::D->new(
class_filelist => 'Local::MrGamoo::FileList',
);
IMPORTANT
You can fire up the system, and get the servers talking to each other, and perform some limited tests without this file.
But you must provide this file in order to actually run map/reduce jobs.
DESCRIPTION
MrGamoo only runs map/reduce jobs. It is up to you to get the files onto the servers, keep track of where they are, and tell MrGamoo.
Some people keep the file meta-information in a sql database, some in a yenta map, and some in the filesystem.
When a new job starts, your get_file_list function will be called with the job config, and should return an arrayref of matching files along with meta-info.
Each element of the returned arrayref should be a hashref containing at least the following fields:
filename
the name of the file, relative to the basedir in your config file.
filename => 'www/2010/01/17/23/5943_prod_5x2N5qyerdeddsNi'
location
an arrayref of servers where this file is located. the locations should be the persistent-ids of the servers (see MySelf).
if the same file is replicated on multiple servers, mrgamoo can both choose intelligently which servers will process which files and recover from failures.
location => [ 'mrm@athena.example.com', 'mrm@zeus.example.com' ]
size
this should be the size of the file, in bytes. mrgamoo will consider the sizes of files in determining which servers will process which files.
size => 10843
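The three fields above can be put together in a minimal sketch of a filelist class. This is only an illustration under stated assumptions: the in-memory @CATALOG stands in for whatever store you really use (sql database, yenta map, filesystem), the second catalog entry is made up, and the 'prefix' config key is hypothetical rather than a documented MrGamoo option. The only thing taken from this page is that get_file_list receives the job config and returns an arrayref of hashrefs.

```perl
package Local::MrGamoo::FileList;
use strict;
use warnings;

# Hypothetical in-memory catalog; replace with a lookup in your own
# metadata store. The second entry is invented for illustration.
my @CATALOG = (
    {
        filename => 'www/2010/01/17/23/5943_prod_5x2N5qyerdeddsNi',
        location => [ 'mrm@athena.example.com', 'mrm@zeus.example.com' ],
        size     => 10843,
    },
    {
        filename => 'www/2010/01/18/00/5944_prod_example',
        location => [ 'mrm@athena.example.com' ],
        size     => 2048,
    },
);

# Called with the job config; returns an arrayref of hashrefs, each with
# filename, location and size.
# ASSUMPTION: the 'prefix' key is made up for this sketch.
sub get_file_list {
    my $config = shift;
    my $prefix = defined $config->{prefix} ? $config->{prefix} : '';

    # Keep only the files whose name starts with the requested prefix.
    my @files = grep { index($_->{filename}, $prefix) == 0 } @CATALOG;

    # Return copies so the caller cannot modify the catalog.
    return [ map { +{ %$_, location => [ @{ $_->{location} } ] } } @files ];
}

1;
```

Calling Local::MrGamoo::FileList::get_file_list({ prefix => 'www/2010/01/17/' }) on this sketch returns a one-element arrayref holding the first catalog entry. How your real job config selects files is up to you.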
BUGS
none. you write this yourself.
SEE ALSO
AC::MrGamoo
AUTHOR
You!