NAME
dado - Data::Downloader command-line interface.
SYNOPSIS
dado [DEBUG OPTIONS] <config|disk|file|repository|linktree|feed> ACTION [ACTION OPTIONS]
dado [DEBUG OPTIONS] <disks|files|repositories|linktrees|feeds> [CRITERIA] ACTION [ACTION OPTIONS]
# Examples :
dado config init --file flickr.yml
dado feeds refresh --tags apples
dado files download
dado disk usage --summary --human
dado feeds refresh
dado files dump
dado files download
dado linktrees rebuild
dado repositories dump_stats
dado feeds refresh --esdt OMTO3 --startdate 2008-01-01 --archiveset 10003 --count 3
dado linktrees --root '/linktree/root' rebuild
dado files --md5 a46cee6a6d8df570b0ca977b9e8c3097 download --fake 1
dado disks --root '/tmp/foo' usage --human
dado feeds --name georss refresh --esdt OMTO3 --download 1
dado --debug root feeds refresh
dado --trace root feeds refresh
dado --progress_bars linktrees rebuild
QUICK START
To use dado to download some scientific data files, here are a few sample repositories and sequences of commands to try out :
Example 1 :
cat > gsics.conf
name: gsics
storage_root: /tmp/datastore
cache_max_size: 2048000
cache_strategy: LRU
feeds:
  name: all
  feed_template: http://gsics.nesdis.noaa.gov/datacast/all.xml
  file_source:
    filename_xpath: title
    url_xpath: enclosure/@url
  metadata_sources:
    - name: starttime
      xpath: datacasting:customElement[@name="validity_start_time"]/@value
    - name: endtime
      xpath: datacasting:customElement[@name="validity_end_time"]/@value
linktrees:
  - condition: ~
    path_template: '<starttime:%Y/%m/%d>'
    root: /tmp/datalinks
^D
$ dado config init --filename ./gsics.conf
$ dado feeds refresh
$ dado files download
$ find /tmp/data* -not -type d
Example 2 :
cat > podaac.conf
---
name: podaac
storage_root: /tmp/datastore
cache_max_size: 2048000
cache_strategy: LRU
feeds:
  name: asop2
  feed_template: http://podaac.jpl.nasa.gov/datacast/PODAAC-ASOP2-25X01-rss.xml
  file_source:
    filename_xpath: title
    url_xpath: enclosure/@url
  metadata_sources:
    - name: starttime
      xpath: datacasting:acquisitionStartDate
    - name: endtime
      xpath: datacasting:acquisitionEndDate
linktrees:
  - condition: ~
    path_template: '<starttime:%Y/%m/%d>'
    root: /tmp/datalinks
^D
$ dado config init --filename ./podaac.conf
$ dado feeds refresh
$ dado files download
$ find /tmp/data* -not -type d
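Both examples use the path_template '<starttime:%Y/%m/%d>', which files links by date under the linktree root. The sketch below only illustrates what that date portion would look like for one file. It does not call dado, the starttime value is hypothetical, and treating it as an ISO 8601 timestamp is an assumption about what a feed provides.

```shell
#!/bin/sh
# Illustration only: what '<starttime:%Y/%m/%d>' would expand to for
# one file. The starttime value is hypothetical, and treating it as an
# ISO 8601 timestamp is an assumption.
starttime='2008-01-15T12:30:00Z'

# Keep the date part (first 10 characters) and turn YYYY-MM-DD
# into YYYY/MM/DD, the shape that strftime's %Y/%m/%d yields.
subdir=$(printf '%s\n' "$starttime" | cut -c1-10 | tr '-' '/')

# Date-based directory under the linktree root from the examples.
echo "/tmp/datalinks/$subdir"
```

After a real download, the find command above is the way to see which links were actually created.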
DESCRIPTION
This is the command line interface for Data::Downloader. It is used to manage a repository of files, file metadata, and symbolic links.
An invocation of dado is of one of the forms described in the SYNOPSIS above. The valid values for DEBUG OPTIONS, CRITERIA, ACTION, and ACTION OPTIONS are described below.
The options available when using dado commands correspond to the methods available within the corresponding Data::Downloader classes. Those classes are linked to in the headings below.
DEBUG OPTIONS
These are options to control the verbosity of the output. See Log::Log4perl::CommandLine for a complete description. Examples :
dado --debug root ... # show debug messages
dado --trace root ... # show trace messages
Note that a logging category (e.g. "root") must be given, to disambiguate it from the ITEM or ITEMS.
Additionally, to turn on progress bars, use the --progress_bars option (which takes no arguments), e.g.
dado --progress_bars linktrees rebuild
dado --progress_bars feeds refresh
ACTION
The valid ACTIONs depend on what item is selected.
- config
Actions : init, update.
Action options : --file
The configuration is stored in an sqlite database located in $HOME/.data_downloader.db by default (see Data::Downloader::DB).
Examples:
dado config init --file my_config_file.txt
dado config update --file my_new_config_file.txt
The former creates a new configuration, the latter updates an existing one. Updating a configuration will _not_ make any changes to files that have already been downloaded (or their symlinks).
The configuration file format is described in detail in Data::Downloader::Config.
- disk
Actions : usage
Action options : --summary, --human
- file
Actions : download
Action options : --filename, --repository
If the file was in an RSS feed, then the url for this file is stored in the database and no other action options are necessary. However, if this is a new file, then additional options may be necessary in order to construct the url. In this case, for instance, if the config file contains :
name: example_repository
file_url_template: 'http://example.com/<url_parameter>/<filename>'
Then this command:
dado file download \
    --repository example_repository \
    --url_parameter 12345 \
    --filename something.txt
will retrieve http://example.com/12345/something.txt and store it under the "storage_root" (also given in the config file).
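The placeholder substitution described above can be pictured with a small shell sketch. It only mimics the expansion with sed and is not how dado itself builds the URL. The helper name expand_template is hypothetical.

```shell
#!/bin/sh
# Illustration only: mimic how <placeholders> in file_url_template
# are filled from the action options. expand_template is a
# hypothetical helper, not part of dado.
expand_template() {
    # $1 = template, remaining args = name=value pairs
    result=$1
    shift
    for pair in "$@"; do
        name=${pair%%=*}
        value=${pair#*=}
        # Replace every <name> with its value ('|' is the sed delimiter,
        # so values must not contain '|', '&', or backslashes).
        result=$(printf '%s\n' "$result" | sed "s|<$name>|$value|g")
    done
    printf '%s\n' "$result"
}

url=$(expand_template 'http://example.com/<url_parameter>/<filename>' \
    url_parameter=12345 filename=something.txt)
echo "$url"
```

Each option name on the dado command line plays the role of one name=value pair here.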
Symbolic links will be created to this file, as specified, e.g.
linktrees:
  - root: /tmp/put_all_files_here
    condition: ~
    path_template: '<filename>'
would result in a symlink from /tmp/put_all_files_here/something.txt to the location of the actual file.
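To make the resulting layout concrete, the following sketch builds the same kind of link by hand in a scratch directory. It does not call dado, and the location used for the stored file is an assumption, since the layout under "storage_root" is not described here.

```shell
#!/bin/sh
# Illustration only: reproduce by hand the kind of symlink the
# linktree above describes. All paths live in a scratch directory;
# the storage layout shown is an assumption.
scratch=$(mktemp -d)
mkdir -p "$scratch/datastore" "$scratch/put_all_files_here"

# Stand-in for the downloaded file under storage_root.
echo "payload" > "$scratch/datastore/something.txt"

# path_template '<filename>' places the link directly under root.
ln -s "$scratch/datastore/something.txt" \
      "$scratch/put_all_files_here/something.txt"

# The link resolves to the stored file, and reading through it works.
target=$(readlink "$scratch/put_all_files_here/something.txt")
content=$(cat "$scratch/put_all_files_here/something.txt")
echo "$target"

# Clean up the scratch directory.
rm -rf "$scratch"
```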
- files
Valid actions: dump, download, makelinks, remove, check
See Data::Downloader::File for the behavior of each of these actions.
CRITERIA may refer to file columns (see the schema) or metadata e.g.
dado files --filename abcdefg.hij dump
dado files --on_disk 0 download
dado files --esdt XYZPDQ remove
dado files check
When giving CRITERIA, the option value may be more complex, e.g.
dado files --on_disk 0 --md5 'like: a%' dump
would select all files whose md5sum begins with "a". The value is converted from YAML into a perl data structure which is then passed to Rose::DB::Object::QueryBuilder to create a query, so things like this are possible :
# esdt OML1BCAL, orbits 1000-2000 inclusive
dado files --esdt OML1BCAL --orbit '[ge: 1000, le: 2000]' dump
# esdt not equal to OML1BCAL
dado files --'!esdt' OML1BCAL dump
Also, the skip_re option may be passed to dump in order to skip keys that match a regex, e.g.
dado files --esdt OML1BCAL dump --skip_re 'log_entries|atime'
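Because complex CRITERIA values are YAML, they need to reach dado as a single argument, which is why the examples above quote them. The sketch below shows the difference in a POSIX shell. The helper count_args is hypothetical and merely reports how many arguments it received.

```shell
#!/bin/sh
# Illustration only: show that quoting keeps a YAML criteria value
# together as one argument. count_args is a hypothetical helper.
count_args() {
    echo "$#"
}

# Quoted: the option name plus one YAML value.
quoted=$(count_args --orbit '[ge: 1000, le: 2000]')

# Unquoted: the shell splits the value on whitespace.
unquoted=$(count_args --orbit [ge: 1000, le: 2000])

echo "quoted: $quoted argument(s), unquoted: $unquoted argument(s)"
```

Single quotes are also what keep an interactive shell from treating the "!" in '!esdt' as history expansion.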
- feeds
Actions : refresh
Action options : the options correspond to the variables in the template for any RSS feeds. For example, this configuration :
feed_template: 'http://example.com/<param1>/<param2>/feed.xml'
would correspond to this command line :
dado feeds refresh --param1 12345 --param2 6789
Two more options are:
--download 1   # download the files
--fake 1       # make dummy files (for testing)
Values may be omitted from the command line if they have default values set like so :
feed_parameters:
  - name: param1
    default_value: 12345
If one of the parameters is named "count", this will limit the number of files to download.
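The interplay between command-line values and default values can be sketched as follows. This is only a shell imitation of the substitution, not dado's own logic, and fill_feed_template is a hypothetical helper.

```shell
#!/bin/sh
# Illustration only: fill a feed_template from command-line values,
# falling back to a default when a value is omitted.
template='http://example.com/<param1>/<param2>/feed.xml'

fill_feed_template() {
    # $1 = value for param1 (may be empty), $2 = value for param2
    p1=${1:-12345}     # stands in for default_value in feed_parameters
    p2=$2
    printf '%s\n' "$template" |
        sed -e "s|<param1>|$p1|" -e "s|<param2>|$p2|"
}

with_both=$(fill_feed_template 777 6789)     # both values supplied
with_default=$(fill_feed_template '' 6789)   # param1 omitted
echo "$with_both"
echo "$with_default"
```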
- linktrees
Actions : rebuild
This will rebuild all the trees of symlinks (i.e. remove them all and re-create them).
When passing CRITERIA, the options may refer to any of the fields for linktrees, e.g.
dado linktrees --root '/linktree/root' rebuild
dado linktrees --root 'like: %abc%' rebuild
- repositories
Repositories are directory trees which contain all of the files that have been downloaded. The only valid action is "dump_stats".