NAME
Data::Downloader::File
DESCRIPTION
Represents a file managed by Data::Downloader. Files are represented in the database as rows in the file table. Each row corresponds to a single file on disk. There may be multiple symbolic links to this file, but the uniqueness of this row reflects the different ways in which files and their contents may be considered unique. In addition to the unique integer id for this file, there are three types of uniqueness: content, filename, and resource.
- content
-
If a file appearing in a feed has the same MD5 sum as an existing file, it will not be downloaded again. However, multiple symlinks may be created for it (based on the metadata in the feed).
- filename
-
Filenames are considered unique; if an existing filename appears again, it will be treated as an update, rather than an insert, to the metadata database. (However, if the MD5 differs, the file will be re-downloaded.)
- resource
-
If a urn_xpath is given in the configuration, its value will be treated as a unique identifier for the content. If the same value appears again, an update, rather than an insert, will occur. If the filename is different, it will be changed. If the content is different, new content will be downloaded and the old content will be removed.
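The three rules above can be sketched as a small decision function. This is only an illustration under stated assumptions, not Data::Downloader's own code: `classify_entry`, `store`, the in-memory `%db` layout, and the action names `update`, `reuse`, and `insert` are all hypothetical. In particular, the documentation does not say what is written to the database when only the MD5 matches, so the sketch merely reports that nothing needs to be fetched.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for the file table, indexed three ways.
my %db = (by_urn => {}, by_filename => {}, by_md5 => {});

# Register a record { md5, filename, urn } under every index that applies.
sub store {
    my ($db, $rec) = @_;
    $db->{by_urn}{ $rec->{urn} } = $rec if defined $rec->{urn};
    $db->{by_filename}{ $rec->{filename} } = $rec;
    $db->{by_md5}{ $rec->{md5} } = $rec;
    return $rec;
}

# Classify a feed entry. Returns (action, fetch), where fetch is 1
# when new content would have to be downloaded.
sub classify_entry {
    my ($db, $entry) = @_;

    # Resource uniqueness: a known urn identifies an existing row.
    my $old;
    $old = $db->{by_urn}{ $entry->{urn} } if defined $entry->{urn};

    # Filename uniqueness: a known filename also identifies an existing row.
    $old ||= $db->{by_filename}{ $entry->{filename} };

    if ($old) {
        # Update the row; fetch again only if the content changed.
        return ('update', $old->{md5} ne $entry->{md5} ? 1 : 0);
    }

    # Content uniqueness: bytes already stored are never fetched twice.
    return ('reuse', 0) if exists $db->{by_md5}{ $entry->{md5} };

    return ('insert', 1);
}

store(\%db, { md5 => 'aaa', filename => 'a.he5', urn => 'urn:x:1' });

for my $entry (
    { md5 => 'aaa', filename => 'a.he5', urn => 'urn:x:1' },  # unchanged
    { md5 => 'bbb', filename => 'a.he5', urn => 'urn:x:1' },  # new content
    { md5 => 'aaa', filename => 'b.he5' },                    # known content
    { md5 => 'ccc', filename => 'c.he5' },                    # entirely new
) {
    my ($action, $fetch) = classify_entry(\%db, $entry);
    printf "%-6s fetch=%d %s\n", $action, $fetch, $entry->{filename};
}
```

The order of the checks (urn, then filename, then MD5) is a design choice of this sketch; the documentation does not state which rule takes precedence when several apply.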
METHODS
- storage_path
-
Returns the storage path for this file. This is calculated using the md5, the disk, and the storage root of the repository associated with this file.
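As a rough illustration of how a path can be derived from these three inputs, the sketch below builds one deterministically, so identical content always maps to the same location. The directory layout (two levels named after the leading MD5 characters) and the function name `sketch_storage_path` are assumptions made for this example; the actual layout used by Data::Downloader is not described here.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical layout:
#   <storage_root>/<disk>/<md5 chars 1-2>/<md5 chars 3-4>/<md5>
sub sketch_storage_path {
    my (%args) = @_;
    for my $key (qw(storage_root disk md5)) {
        die "missing $key\n" unless defined $args{$key};
    }
    die "md5 must be 32 hex characters\n"
        unless $args{md5} =~ /\A[0-9a-f]{32}\z/i;

    # Normalise case so the same checksum always yields the same path.
    my $md5 = lc $args{md5};
    return File::Spec->catfile(
        $args{storage_root}, $args{disk},
        substr($md5, 0, 2), substr($md5, 2, 2), $md5,
    );
}

print sketch_storage_path(
    storage_root => '/data/store',   # example value only
    disk         => 'disk1',         # example value only
    md5          => 'a46cee6a6d8df570b0ca977b9e8c3097',
), "\n";
```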
- download
-
Download a file. This may be called as either a class method or an instance method. In the former case, it acts as a constructor, saving the object to the database.
Compute the URL if necessary. The URL may come from either an RSS feed (i.e. this file is already in the database) or may be computed using the url template.
Examples :
    # make a new file, download it, store it, update symlinks
    my $file = Data::Downloader::File->download(
        md5        => "a46cee6a6d8df570b0ca977b9e8c3097",
        filename   => "OMI-Aura_L2-OMTO3_2007m0220t0052-o13831_v002-2007m0220t221310.he5",
        repository => "local_repo",
    );

    # equivalent
    my $file = Data::Downloader::File->new(
        md5        => "a46cee6a6d8df570b0ca977b9e8c3097",
        filename   => "OMI-Aura_L2-OMTO3_2007m0220t0052-o13831_v002-2007m0220t221310.he5",
        repository => Data::Downloader::Repository->new( name => "local_repo" )->load->id,
    );
    $file->download or die $file->error;

    # download all files for a certain feed
    $_->download for $feed->files;

Parameters :
repository - a repository name
fake - fake the download?
skip_links - skip making symlinks?
<name> - value : value for the variable "<name>" in the url_template.
Returns :
true (1) - the file was downloaded or cached
false (0) - there was an error (look in $obj->error for a message)
- decorate_tree
-
Put the links for a file within a single linktree. A tree may contain multiple symlinks for a file if there are metadata_transformations defined for this repository which transform a set of metadata into multiple sets of template parameters.
Parameters :
tree -- A DD::Linktree object
- makelinks
-
Make all the symlinks for a file by iterating through the linktrees and checking which satisfy the condition for the tree.
- load_file
-
Loads the representation of a file from the database.
Arguments :
filename -- name of the file to be loaded
Returns : reference to self on success
- listlinks
-
List all the symlinks for a file.
- remove
-
Remove this file from the disk, set "on_disk" to false, and remove any symlinks to it.
- purge
-
Remove this file and any information stored about it.
- check
-
Check a file and its symlinks and ensure that the database information represents what is stored on disk.
Arguments :
checksum -- if true, also compute the checksum
fix -- if true, also attempt to fix anything broken
Returns :
nothing, just produces warnings and errors
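A consistency check of this kind can be sketched with core Perl alone. The example below is a simplified stand-in, not the module's implementation: the record layout (`path`, `md5`, `links`) and the name `sketch_check` are hypothetical, and unlike the documented method it returns a problem count so that the demo can show the effect of `fix`.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Spec;
use File::Temp qw(tempdir);

# Compare a hypothetical record { path, md5, links => [...] } with the disk.
# Warns about each problem and returns how many were found.
sub sketch_check {
    my ($rec, %opt) = @_;
    my $problems = 0;

    unless (-f $rec->{path}) {
        warn "missing file: $rec->{path}\n";
        return 1;
    }

    if ($opt{checksum}) {
        open my $fh, '<', $rec->{path} or die "cannot read $rec->{path}: $!";
        binmode $fh;
        my $actual = Digest::MD5->new->addfile($fh)->hexdigest;
        close $fh;
        if ($actual ne $rec->{md5}) {
            warn "checksum mismatch for $rec->{path}\n";
            $problems++;
        }
    }

    for my $link (@{ $rec->{links} }) {
        my $target = (-l $link) ? readlink($link) : undef;
        next if defined $target and $target eq $rec->{path};

        warn "bad or missing symlink: $link\n";
        $problems++;
        if ($opt{fix}) {
            unlink($link) if -l $link;   # drop a link pointing elsewhere
            symlink($rec->{path}, $link) or warn "cannot fix $link: $!\n";
        }
    }
    return $problems;
}

# Demo: one stored file whose only symlink has gone missing.
my $dir  = tempdir(CLEANUP => 1);
my $path = File::Spec->catfile($dir, 'data.bin');
open my $out, '>', $path or die "cannot write $path: $!";
binmode $out;
print {$out} "hello\n";
close $out;

my %rec = (
    path  => $path,
    md5   => md5_hex("hello\n"),
    links => [ File::Spec->catfile($dir, 'link1') ],
);

my $before = sketch_check(\%rec, checksum => 1, fix => 1);  # warns, recreates link1
my $after  = sketch_check(\%rec, checksum => 1);
print "problems before fix: $before, after fix: $after\n";
```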
- prune_links
-
Remove all the symlinks for a file matching a particular regular expression.
Arguments :
regex -- a regex to match against.
Returns :
false if a link could not be removed
true if all links matching regex could be removed.
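The return convention described above can be illustrated with a short, self-contained sketch. It assumes the regex is matched against the path of each symlink (the documentation does not say whether the link path or its target is matched), and `sketch_prune_links` is a hypothetical helper operating on a plain list of paths rather than on a database record.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Remove every symlink in @$links whose path matches $regex.
# Returns 1 if all matching links were removed, 0 if any removal failed.
sub sketch_prune_links {
    my ($links, $regex) = @_;
    my $ok = 1;
    for my $link (@$links) {
        next unless $link =~ $regex;
        next unless -l $link;            # already gone, nothing to do
        unless (unlink $link) {
            warn "could not remove $link: $!\n";
            $ok = 0;
        }
    }
    return $ok;
}

# Demo: one file linked from two hypothetical trees, by_date and by_orbit.
my $dir    = tempdir(CLEANUP => 1);
my $target = File::Spec->catfile($dir, 'data.he5');
open my $out, '>', $target or die "cannot write $target: $!";
close $out;

my @links;
for my $tree (qw(by_date by_orbit)) {
    my $sub = File::Spec->catdir($dir, $tree);
    mkdir($sub) or die "cannot mkdir $sub: $!";
    my $link = File::Spec->catfile($sub, 'data.he5');
    symlink($target, $link) or die "cannot symlink $link: $!";
    push @links, $link;
}

my $ok = sketch_prune_links(\@links, qr/by_date/);
print "pruned ok: $ok\n";
print "$_ ", (-l $_ ? "still exists" : "removed"), "\n" for @links;
```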
- load_from_urn
-
Load this object using the urn stored for it.
- list
-
List the names of files matching the given criteria.
The list is printed to STDOUT.
Arguments:
filename -- show the file name in the list? (default: True)
md5 -- show the file MD5 in the list? (default: False)
id -- show the file ID in the list? (default: False)
url -- show the file URL in the list? (default: False)
urn -- show the file URN in the list? (default: False)
size -- show the file size in the list? (default: False)
on_disk -- show the file status in the list? (default: False)
disk -- show the file location in the list? (default: False)
atime -- show the file ingest time in the list? (default: False)
Returns :
nothing
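The way these flags select columns can be sketched as follows. This is an illustration only: `format_rows`, the tab separator, and the sample records are assumptions, and the helper returns the lines instead of printing them so that the caller decides where the output goes.

```perl
use strict;
use warnings;

# Columns in the order documented above; only filename defaults to true.
my @COLUMNS = qw(filename md5 id url urn size on_disk disk atime);
my %DEFAULT = (filename => 1);

# Build one tab-separated line per record from the selected columns.
sub format_rows {
    my ($files, %show) = @_;
    %show = (%DEFAULT, %show);           # caller's flags override defaults
    my @cols = grep { $show{$_} } @COLUMNS;
    return map {
        my $f = $_;
        join "\t", map { defined $f->{$_} ? $f->{$_} : '' } @cols;
    } @$files;
}

# Sample records with made-up values.
my @files = (
    { filename => 'a.he5', md5 => 'aaa', size => 10, on_disk => 1 },
    { filename => 'b.he5', md5 => 'bbb', size => 20, on_disk => 0 },
);

print "$_\n" for format_rows(\@files);                        # names only
print "$_\n" for format_rows(\@files, md5 => 1, size => 1);   # name, md5, size
```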