NAME
spipe - simple pipeline running interface
VERSION
version 0.9.1
SYNOPSIS
spipe [-v|--version] [-?|-h|--help] [-g|--debug] [--graphviz] [-c|--config file] [-d|--directory value] [-i|--input string] [-it|--itype string] [--start value] [--stop value] [--verbose int]
spipe -config t/data/string_manipulation.yml -d /tmp/test
DESCRIPTION
Spipe is a control script for running simple pipelines read from configuration files written in YAML.
For internal details of the pipeline, check the documentation for the Perl module App::Pipeline::Simple.
OPTIONS
- -v | --version
  Print out a line with the program name and version number.
- -? | -h | --help
  Show this help.
- -g | --debug
  Print out the UNIX command line equivalent of the pipeline and exit.
  Reports parsing and logical errors.
- --graphviz
  Print out a Graphviz dot file.
  Example one-liner to display a graph of the pipeline:
  spipe -config t/data/string_manipulation.yml -graph > /tmp/p.dot; dot -Tpng /tmp/p.dot | display
- -c | --config string
  Path to the config file. Required unless there is a file called config.yml in the current directory.
- -d | --directory string
  Directory to keep all files.
  If the directory does not exist, it will be created and a copy of the config file will be placed in it under the name config.yml. For subsequent runs of that pipeline, you adjust the parameters in that configuration file and rerun spipe without the -config and -directory options.
- -i | --input string
  Optional input to the pipeline.
- -it | --itype string
  Type of the optional input. It is matched against the type string of the argument that receives the input; see Advanced features.
- --start string
  ID of the step at which to start or restart the pipeline.
  Fails if the prerequisites of the step are not met, i.e. if the input file(s) do not exist.
- --stop string
  ID of the step at which to stop the pipeline. Defaults to the last step.
- --verbose int
  Verbosity level. Defaults to zero. This will get translated to Log::Log4perl levels:
  verbose   = -1     0     1     2
  log level = DEBUG  INFO  WARN  ERROR
RUNNING
Example run:
spipe -config t/data/string_manipulation.yml -dir /tmp/test
reads instructions from the config file and writes all information to the project directory.
The debug option will parse the config file, print out the command line equivalents of all commands and print out warnings of problems encountered in the file:
spipe -config t/data/string_manipulation.yml -dir /tmp/test --debug
Another tool integrated in the system is visualization of the execution graph. It is done with the help of the GraphViz Perl interface module, which needs to be installed from CPAN.
The following command line creates a Graphviz dot file, converts it into an image file and opens it with the ImageMagick display program:
spipe -config t/data/string_manipulation.yml -graph > \
/tmp/p.dot; dot -Tpng /tmp/p.dot | display
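To make the dot output more concrete, the following Python sketch shows how a graph description could be derived from step IDs and their next lists (both described under CONFIGURATION). It is only an illustration and not the code spipe uses, which relies on the GraphViz Perl module; the function name steps_to_dot and the sample steps dictionary are hypothetical.

```python
def steps_to_dot(name, steps):
    """Return Graphviz dot text with one edge per step -> next-step link."""
    lines = ['digraph "%s" {' % name]
    for step_id in sorted(steps):
        # Declare the node so that isolated steps show up, too.
        lines.append('    "%s";' % step_id)
        for next_id in steps[step_id].get("next") or []:
            lines.append('    "%s" -> "%s";' % (step_id, next_id))
    lines.append("}")
    return "\n".join(lines)


# Hypothetical branching pipeline: s1 feeds s2 and s3, s3 feeds s4.
steps = {
    "s1": {"name": "echo", "next": ["s2", "s3"]},
    "s2": {"name": "wc"},
    "s3": {"name": "cat", "next": ["s4"]},
    "s4": {"name": "wc"},
}

print(steps_to_dot("String Manipulation", steps))
```

The printed text can be saved to a file and passed to dot -Tpng in the same way as the output of the --graphviz option.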
CONFIGURATION
The default configuration is written in YAML, a simple and human-readable language that can be parsed cleanly into data structures in many programming languages.
The YAML file contains four top-level keys for the hash that the file will be read into: 1) name, to give the pipeline a short name; 2) version, to indicate the version number; 3) description, to give a more verbose explanation of what the pipeline does; and 4) steps, listing the pipeline steps.
---
description: "Example of a pipeline"
name: String Manipulation
version: '0.4'
steps:
Each step is identified by a unique short ID and has a name that identifies an executable somewhere in the system path. Alternatively, you can give the full path leading to the executable file with the key path. The name will be appended to the path, with a suitable separator character added when needed.
Arguments to the executable are given individually as key/value pairs within the args tag. A single hyphen is added in front of each argument key when the command is executed. If two hyphens are needed, just add one to the key. Arguments can exist without values, too.
  s3:
    name: cat
    args:
      in:
        type: redir
        value: s1.txt
      n:
      out:
        type: redir
        value: s3_mod.txt
    next:
      - s4
There are two special keys, in and out, that need to have a key type defined. The IO type can take several kinds of values:
- unnamed
  indicates that the argument is an unnamed argument to the executable
- redir
  is interpreted as the UNIX redirection character '<' or '>', depending on the context
- file
  means that IO happens from/to a file and is rendered as a named argument
- dir
  is rendered like file, but is a mnemonic that all files under this directory name are processed
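The rules above for path, args and the IO types can be sketched in Python as follows. This is not the rendering code of App::Pipeline::Simple. The function name render_step is hypothetical, and two details are assumptions: plain options are placed before the in and out arguments, and the file and dir types are rendered as "-in value" or "-out value".

```python
import posixpath


def render_step(step):
    """Render one step dictionary as a shell command string."""
    # Join the optional path and the executable name with a separator.
    command = posixpath.join(step.get("path", ""), step["name"])
    options, io_parts = [], []
    for key, value in (step.get("args") or {}).items():
        if key in ("in", "out"):
            io_type, target = value["type"], value["value"]
            if io_type == "unnamed":
                io_parts.append(str(target))
            elif io_type == "redir":
                symbol = "<" if key == "in" else ">"
                io_parts.append("%s %s" % (symbol, target))
            elif io_type in ("file", "dir"):
                # Assumption: a named argument looks like "-in value".
                io_parts.append("-%s %s" % (key, target))
            else:
                raise ValueError("unknown IO type: %r" % io_type)
        elif value is None:
            # Argument without a value, e.g. "n:" becomes "-n".
            options.append("-" + key)
        else:
            options.append("-%s %s" % (key, value))
    return " ".join([command] + options + io_parts)


# The s3 step from the example above, written as a Python dictionary.
s3 = {
    "name": "cat",
    "args": {
        "in": {"type": "redir", "value": "s1.txt"},
        "n": None,
        "out": {"type": "redir", "value": "s3_mod.txt"},
    },
    "next": ["s4"],
}

print(render_step(s3))  # prints: cat -n < s1.txt > s3_mod.txt
```

The --debug option prints the command lines that spipe itself generates, so it is the authoritative way to check how a given step is rendered.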
Finally, the step tag can contain the next key that gives an array of IDs for the next steps in the execution. Typically, these steps depend on the previous step for input.
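The way next links chain steps together can be illustrated with a short Python sketch that lists the steps reachable from a starting ID, in the spirit of the --start and --stop options. How spipe actually orders branches is not specified here, so the breadth-first order, the handling of the stop step and the function name execution_order are all assumptions.

```python
from collections import deque


def execution_order(steps, start, stop=None):
    """List step IDs reachable from start, breadth first, not going past stop."""
    if start not in steps:
        raise KeyError("unknown step ID: %s" % start)
    order, seen, queue = [], {start}, deque([start])
    while queue:
        step_id = queue.popleft()
        order.append(step_id)
        if step_id == stop:
            continue  # do not follow links beyond the stop step
        for next_id in steps[step_id].get("next") or []:
            if next_id not in seen:
                seen.add(next_id)
                queue.append(next_id)
    return order


# Hypothetical pipeline: s1 feeds s2 and s3, s3 feeds s4.
pipeline = {
    "s1": {"name": "echo", "next": ["s2", "s3"]},
    "s2": {"name": "wc"},
    "s3": {"name": "cat", "next": ["s4"]},
    "s4": {"name": "wc"},
}

print(execution_order(pipeline, "s1"))             # prints: ['s1', 's2', 's3', 's4']
print(execution_order(pipeline, "s1", stop="s3"))  # prints: ['s1', 's2', 's3']
```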
Practices that are completely bonkers, like spaces in file names, are not supported.
Finally, it is worth noting that YAML may need escaping and quoting to get special characters inside strings. Double quotes around a string work well most of the time. A single quote inside a single-quoted string needs to be doubled.
The following example of a Perl one-liner (thanks to Nic Walker for alerting me) could equally well be written using double quotes, like this: "'print $F[1]'"
  s6:
    name: perl
    args:
      lane: '''print $F[1]'''
      in:
        type: redir
        value: myfile
      out:
        type: redir
        value: sec_column
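The doubling rule for single quotes is mechanical, as this small Python sketch shows. The helper name yaml_single_quote is hypothetical; the rule itself is standard YAML, where a doubled single quote is the only escape available inside a single-quoted string.

```python
def yaml_single_quote(text):
    """Wrap text in single quotes, doubling any embedded single quote."""
    return "'" + text.replace("'", "''") + "'"


# The value of the lane argument in the s6 step above.
print(yaml_single_quote("'print $F[1]'"))  # prints: '''print $F[1]'''
```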
Advanced features
The pipeline does not have to be linear; it can contain branches. For example, the pipeline can have several start points with different kinds of input: file and string.
Sometimes it is useful to run the same pipeline with different parameters. The starting point of execution can take a value from the command line. Leave the value for the given argument blank in the configuration file and give it from the command line. Matching of values is done by matching the type string.
spipe -conf input_demo.yml --input=ABC --itype=str
---
description: "Demonstrate input from command line"
name: input.yml
version: '0.1'
steps:
  s1:
    name: echo
    args:
      in:
        type: unnamed
        value:
      out:
        type: redir
        value: s1_string.txt
The empty value will be filled in from the command line in the config.yml stored in the project directory. Also, that config file looks slightly different, since the steps are written out as App::Pipeline::Simple objects. Functionally there is no difference.
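As a rough illustration of the filling-in step, the Python sketch below copies a steps structure and sets every blank argument value whose type equals the given input type. It assumes that "matching the type string" means comparing the --itype value with the type key of the argument, which is one possible reading rather than the confirmed behavior, so the sample call passes the type written in the configuration above. The function name fill_input is hypothetical.

```python
import copy


def fill_input(steps, value, itype):
    """Return a copy of steps with blank arguments of type itype set to value."""
    filled = copy.deepcopy(steps)
    for step in filled.values():
        for arg in (step.get("args") or {}).values():
            # Only IO-style arguments are dictionaries with type and value.
            if not isinstance(arg, dict):
                continue
            if arg.get("type") == itype and arg.get("value") is None:
                arg["value"] = value
    return filled


# The s1 step from the configuration above, with the blank value as None.
config_steps = {
    "s1": {
        "name": "echo",
        "args": {
            "in": {"type": "unnamed", "value": None},
            "out": {"type": "redir", "value": "s1_string.txt"},
        },
    }
}

filled = fill_input(config_steps, "ABC", "unnamed")
print(filled["s1"]["args"]["in"]["value"])  # prints: ABC
```

Working on a copy leaves the original structure untouched, which mirrors the way the filled-in configuration is stored separately in the project directory.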
TO DO
This pipeline engine has been tested using mostly linear pipelines. Extensive branching and complex dependencies might not work as expected.
There are no explicit tests for the existence of step input files. The executables run by the steps are expected to perform these checks themselves and die gracefully when appropriate.
There has been no attempt to execute steps in parallel fashion.
If all this is included, this pipeline engine might not be "simple" any more.
SEE ALSO
AUTHOR
Heikki Lehvaslaiho, KAUST (King Abdullah University of Science and Technology).
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Heikki Lehvaslaiho.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.