NAME

Data::Downloader::File

DESCRIPTION

Represents a file managed by Data::Downloader. Files are represented in the database as rows in the file table. Each row corresponds to a single file on disk. There may be multiple symbolic links to this file, but the uniqueness of this row reflects the different ways in which files and their contents may be considered unique. In addition to the unique integer id for this file, there are three types of uniqueness: content, filename, and resource.

content

If a file in a feed has the same MD5 sum as an existing file, it will not be downloaded again. However, multiple symlinks may be created for it (based on the metadata in the feed).

filename

Filenames are considered unique; if an existing filename appears again, it will be treated as an update, rather than an insert, to the metadata database. (However, if the MD5 differs, the file will be re-downloaded.)

resource

If a urn_xpath is given in the configuration, its value will be treated as a unique identifier for the content. If the same value appears again, an update, rather than an insert, will occur. If the filename is different, it will be changed. If the content is different, new content will be downloaded and the old content will be removed.
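
The three uniqueness rules above can be read as a decision procedure. The following is a minimal sketch in plain Perl, not the module's implementation: the function name classify_entry, the hash-based record layout, the returned action labels, and the order in which the rules are checked are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical illustration only; nothing here is part of Data::Downloader.
# $existing : array ref of hash refs with keys md5, filename, urn
# $entry    : hash ref describing a file seen in a feed (same keys)
sub classify_entry {
    my ($existing, $entry) = @_;

    # resource: a matching urn identifies the same logical content
    if (defined $entry->{urn}) {
        for my $row (@$existing) {
            next unless defined $row->{urn} && $row->{urn} eq $entry->{urn};
            return $row->{md5} eq $entry->{md5}
                ? 'update'                       # metadata (and filename) only
                : 'update_and_replace_content';  # new download, old content removed
        }
    }

    # filename: a known filename is an update, never an insert
    for my $row (@$existing) {
        next unless $row->{filename} eq $entry->{filename};
        return $row->{md5} eq $entry->{md5}
            ? 'update'
            : 'update_and_redownload';
    }

    # content: the same MD5 is already stored, so only symlinks are needed
    for my $row (@$existing) {
        return 'link_only' if $row->{md5} eq $entry->{md5};
    }

    # nothing matched: a genuinely new file
    return 'insert_and_download';
}
```

Under these assumptions, an entry that reuses an existing filename with a different MD5 yields 'update_and_redownload', while an entry with a new filename and a known MD5 yields 'link_only'.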

METHODS

storage_path

Returns the storage path for this file. This is calculated using the md5, the disk, and the storage root of the repository associated with this file.

download

Download a file. This may be called as either a class method or an instance method. In the former case, it acts as a constructor, saving the object to the database.

Compute the URL if necessary. The URL may come from either an RSS feed (i.e. this file is already in the database) or may be computed using the url template.

Examples :

# make a new file, download it, store it, update symlinks
my $file = Data::Downloader::File->download(
    md5        => "a46cee6a6d8df570b0ca977b9e8c3097",
    filename   => "OMI-Aura_L2-OMTO3_2007m0220t0052-o13831_v002-2007m0220t221310.he5",
    repository => "local_repo",
);

# equivalent
my $file = Data::Downloader::File->new(
    md5        => "a46cee6a6d8df570b0ca977b9e8c3097",
    filename   => "OMI-Aura_L2-OMTO3_2007m0220t0052-o13831_v002-2007m0220t221310.he5",
    repository => Data::Downloader::Repository->new( name => "local_repo" )->load->id,
);
$file->download or die $file->error;

# download all files for a certain feed
$_->download for $feed->files;

Parameters :

repository -- a repository name
fake       -- fake the download?
skip_links -- skip making symlinks?
<name>     -- value for the variable "<name>" in the url_template

Returns :

true (1)   - the file was downloaded or cached
false (0)  - there was an error (look in $obj->error for a message)

decorate_tree

Put the links for a file within a single linktree. A tree may contain multiple symlinks for a file if there are metadata_transformations defined for this repository which transform a set of metadata into multiple sets of template parameters.

Parameters :

tree -- a Data::Downloader::Linktree object

Make all the symlinks for a file by iterating through the linktrees and checking which satisfy the condition for the tree.

load_file

Loads the representation of a file in the database.

Arguments :

filename -- name of the file to be loaded

Returns : reference to self on success

List all the symlinks for a file.

remove

Remove this file from disk, set "on_disk" to false, and remove any symlinks to it.

purge

Remove this file and any information stored about it.

check

Check a file and its symlinks and ensure that the database information represents what is stored on disk.

Arguments :

checksum -- if true, also compute the checksum
fix      -- if true, also attempt to fix anything broken

Returns :

nothing, just produces warnings and errors

Remove all the symlinks for a file matching a particular regular expression.

Arguments :

regex -- a regex to match against.

Returns :

false if a link could not be removed
true if all links matching regex could be removed.
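
The return convention described above (false as soon as any matching link cannot be removed, true otherwise) can be sketched as follows. This is a hedged illustration using only Perl built-ins; the helper name remove_matching_links and its argument list are hypothetical and are not the module's method.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Downloader: remove every symlink
# in @$links whose path matches $regex. Returns 1 if all matching symlinks
# were removed, 0 if at least one unlink failed.
sub remove_matching_links {
    my ($links, $regex) = @_;
    my $ok = 1;
    for my $link (grep { $_ =~ $regex } @$links) {
        next unless -l $link;    # never touch anything that is not a symlink
        unless (unlink $link) {
            warn "could not remove $link: $!";
            $ok = 0;             # remember the failure but keep going
        }
    }
    return $ok;
}
```

The sketch keeps going after a failure so that as many links as possible are removed; whether the module stops at the first failure instead is not stated in this document.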

load_from_urn

Load this object using the urn stored for it.

list

List the names of files matching the given criteria.

The list is printed to STDOUT.

Arguments :

filename -- show the file name in the list? (default: True)
md5      -- show the file MD5 in the list? (default: False)
id       -- show the file ID in the list? (default: False)
url      -- show the file URL in the list? (default: False)
urn      -- show the file URN in the list? (default: False)
size     -- show the file size in the list? (default: False)
on_disk  -- show the file status in the list? (default: False)
disk     -- show the file location in the list? (default: False)
atime    -- show the file ingest time in the list? (default: False)

Returns :

nothing

SEE ALSO

Rose::DB::Object

"SCHEMA" in Data::Downloader