NAME
BioX::Workflow - A very opinionated template-based workflow writer.
SYNOPSIS
Most of the functionality can be accessed through the biox-workflow.pl script.
biox-workflow.pl --workflow /path/to/workflow.yml
This module was written with Bioinformatics workflows in mind, but should be extensible to any sort of workflow or pipeline.
Usage
Please check out the full Usage Docs at BioX::Workflow::Usage
In Code Documentation
You shouldn't really need to look here unless you have some reason to do some serious hacking.
Attributes
Moose attributes. Technically any of these can be changed, but doing so may break everything.
comment_char
coerce_paths
select_rules
Select a subsection of rules
resample
Boolean value. If true, get new samples based on indir/file_rule.
Samples are found at the beginning of the workflow, based on the global indir variable and the file_find.
Chances are you don't want to set resample to true, because these files probably won't exist outside of the indir until the pipeline is run.
One example of doing so, shown in the gemini.yml in the examples directory, is looking for uncompressed files, .vcf extension, compressing them, and then resampling based on the .vcf.gz extension.
find_by_dir
Use this option when your sample names are given by directory. The default is to find samples by filename.
    /SAMPLE1
        SAMPLE1_r1.fastq.gz
        SAMPLE1_r2.fastq.gz
    /SAMPLE2
        SAMPLE2_r1.fastq.gz
        SAMPLE2_r2.fastq.gz
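The difference between the two modes can be sketched roughly as follows. This is a minimal illustration, not the module's own code: sample_from_path is a hypothetical helper, and the suffix pattern used for the by-filename case is an assumption chosen to fit the listing above.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Hypothetical helper (not part of BioX::Workflow): derive a sample name
# from one input path. With $by_dir true, the name is the parent directory;
# otherwise it is the file name with $suffix_re removed.
sub sample_from_path {
    my ($path, $by_dir, $suffix_re) = @_;
    return basename(dirname($path)) if $by_dir;
    (my $name = basename($path)) =~ s/$suffix_re//;
    return $name;
}

my $path = '/SAMPLE1/SAMPLE1_r1.fastq.gz';

# By directory: the parent directory names the sample.
print sample_from_path($path, 1), "\n";

# By filename: strip an assumed read/extension suffix from the file name.
print sample_from_path($path, 0, qr/_r[12]\.fastq\.gz$/), "\n";
```

Both calls yield SAMPLE1 for this path; they diverge as soon as file names and directory names stop agreeing.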
by_sample_outdir
    outdir/
    /outdir/SAMPLE1
        /rule1
        /rule2
        /rule3
    /outdir/SAMPLE2
        /rule1
        /rule2
        /rule3
Instead of
    /outdir
        /rule1
        /rule2
This feature is not particularly well supported, and may break when mixed with other methods, particularly --resample
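As a rough sketch of the two layouts, assuming the only difference is whether the sample name is placed between the outdir and the rule name (rule_outdir is a hypothetical helper, not a method of the module):

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioX::Workflow): build the output
# directory for one rule, optionally nested under the sample name.
sub rule_outdir {
    my ($outdir, $sample, $rule, $by_sample_outdir) = @_;
    return $by_sample_outdir
        ? File::Spec->catdir($outdir, $sample, $rule)   # outdir/SAMPLE/rule
        : File::Spec->catdir($outdir, $rule);           # outdir/rule
}

print rule_outdir('/outdir', 'SAMPLE1', 'rule1', 1), "\n";
print rule_outdir('/outdir', 'SAMPLE1', 'rule1', 0), "\n";
```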
min
Print the workflow as 2 files.
#run-workflow.sh
export SAMPLE=sampleN && ./run_things
number_rules
Instead of
    outdir/
        rule1
        rule2
the output directories are numbered in rule order
    outdir/
        001-rule1
        002-rule2
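The numbering can be pictured with a short sketch. The three-digit zero padding is taken from the listing above; number_rule_names is a hypothetical helper and the module may build the names differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Workflow): prefix each rule name
# with a zero-padded, 1-based counter, in the order the rules are given.
sub number_rule_names {
    my @rules = @_;
    my $n = 0;
    return map { sprintf('%03d-%s', ++$n, $_) } @rules;
}

print "$_\n" for number_rule_names(qw(rule1 rule2));
```

A fixed-width prefix keeps the directories in workflow order when listed alphabetically.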
auto_name
Create the output directory based on the rule name.
    global:
        - outdir: /home/user/workflow/processed
    rule:
        normalize:
            process: dostuff {$self->indir}/{$sample}.in >> {$self->outdir}/$sample.out
Would create your directory structure /home/user/workflow/processed/normalize (if it doesn't exist)
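A minimal sketch of that behaviour, assuming the rule directory is simply the global outdir joined with the rule name. auto_named_outdir is a hypothetical helper, and a temporary directory stands in for /home/user/workflow/processed.

```perl
use strict;
use warnings;
use File::Spec;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper (not part of BioX::Workflow): join the global outdir
# and the rule name, creating the directory if it does not exist yet.
sub auto_named_outdir {
    my ($global_outdir, $rule_name) = @_;
    my $dir = File::Spec->catdir($global_outdir, $rule_name);
    make_path($dir) unless -d $dir;
    return $dir;
}

# A throwaway directory plays the role of the global outdir.
my $global_outdir = tempdir(CLEANUP => 1);
my $rule_outdir   = auto_named_outdir($global_outdir, 'normalize');
print "$rule_outdir\n";
```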
auto_input
This is similar to auto_name. With auto_input, each rule's input is the previous rule's output.
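The chaining idea can be sketched as below. The rule hashes, the file names, and chain_input_output are all hypothetical; only the INPUT and OUTPUT names come from this documentation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Workflow): walk the rules in order
# and set each rule's INPUT to the OUTPUT of the rule before it. The first
# rule receives $first_input.
sub chain_input_output {
    my ($first_input, @rules) = @_;
    my $previous_output = $first_input;
    for my $rule (@rules) {
        $rule->{INPUT}   = $previous_output;
        $previous_output = $rule->{OUTPUT};
    }
    return @rules;
}

my @rules = chain_input_output(
    'raw/sample.vcf',
    { name => 'compress', OUTPUT => 'compress/sample.vcf.gz' },
    { name => 'annotate', OUTPUT => 'annotate/sample.annotated.vcf.gz' },
);

printf "%s: %s -> %s\n", @{$_}{qw(name INPUT OUTPUT)} for @rules;
```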
verbose
Output some more things
wait
Print "wait" at the end of each rule
override_process
    local:
        - override_process: 1
indir outdir
create_outdir
INPUT OUTPUT
Special variables that can have input/output
These variables are also used in BioX::Workflow::Plugin::Drake
file_rule
Rule to find files
No GetOpt Here
attr
Attributes read in at runtime
global_attr
Attributes defined in the global section of the yaml file
local_attr
Attributes defined in the rules->rulename->local section of the yaml file
local_rule
infiles
Infiles to be processed
samples
process
Do stuff
key
Do stuff
workflow
Path to the workflow. This must be a YAML file.
rule_based
This is the default. The outer loop is over the rules, not the samples.
sample_based
Off by default. The outer loop is over samples, not rules. Must be set in your global values or on the command line with --sample_based 1.
If you ever have resample: 1 in your config you should NOT set this value to true!
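The two loop orders differ only in which loop is on the outside. The sketch below makes that concrete; loop_order is a hypothetical helper that records the order in which (rule, sample) pairs would be visited, not the module's own code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Workflow): return the order in
# which "rule sample" pairs are visited for the chosen loop style.
sub loop_order {
    my ($sample_based, $rules, $samples) = @_;
    my @order;
    if ($sample_based) {
        # Outer loop over samples: every rule runs for one sample first.
        for my $sample (@$samples) {
            push @order, "$_ $sample" for @$rules;
        }
    }
    else {
        # Outer loop over rules: one rule runs for every sample first.
        for my $rule (@$rules) {
            push @order, "$rule $_" for @$samples;
        }
    }
    return @order;
}

my @rules   = qw(rule1 rule2);
my @samples = qw(SAMPLE1 SAMPLE2);

print "rule_based:\n";
print "  $_\n" for loop_order(0, \@rules, \@samples);
print "sample_based:\n";
print "  $_\n" for loop_order(1, \@rules, \@samples);
```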
stash
This isn't ever used in the code. It's just there in case you want to do some things with override_process.
It uses Moose::Meta::Attribute::Native::Trait::Hash and supports all the methods.
set_stash => 'set',
get_stash => 'get',
has_no_stash => 'is_empty',
num_stashs => 'count',
delete_stash => 'delete',
stash_pairs => 'kv',
_classes
Saves a snapshot of the entire namespace for the initial environment, and each rule.
Subroutines
Subroutines can also be overridden and/or extended in the usual Moose fashion.
run
Starting point.
save_env
At each rule save the env for debugging purposes.
make_outdir
Set initial indir and outdir
get_samples
Get basename of the files. Can add optional rules.
Both sample.vcf.gz and sample.vcf would give the sample name "sample" if the file_rule is (.vcf)$|(.vcf.gz)$
Also gets the full path to infiles
Instead of doing
    foreach my $sample (@{ $self->samples }) {
        # dostuff with the sample name
    }
Could have
    foreach my $infile (@{ $self->infiles }) {
        # dostuff with the full path to the file
    }
match_samples
Match samples based on regex written in file_rule
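One way to read the file_rule example given under get_samples is that the matched part of the file name is removed and the remainder is the sample name. The sketch below works under that assumption; match_sample_names is a hypothetical helper and the module's actual matching logic may differ.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper (not part of BioX::Workflow): keep files whose name
# matches $file_rule, strip the matched part, and return each resulting
# sample name once, in first-seen order.
sub match_sample_names {
    my ($file_rule, @paths) = @_;
    my (%seen, @samples);
    for my $path (@paths) {
        my $name = basename($path);
        next unless $name =~ /$file_rule/;   # skip files the rule rejects
        $name =~ s/$file_rule//;             # what remains is the sample
        push @samples, $name unless $seen{$name}++;
    }
    return @samples;
}

my @samples = match_sample_names(
    '(.vcf)$|(.vcf.gz)$',
    '/data/sample.vcf', '/data/sample.vcf.gz', '/data/notes.txt',
);
print "@samples\n";   # prints "sample"
```

Note that the dots in the rule are unescaped, so each one matches any character, not only a literal dot.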
plugin_load
Load plugins defined in yaml with MooseX::Object::Pluggable
class_load
Load classes defined in yaml with Class::Load
make_template
Make the template for interpolating strings
create_attr
make attributes
check_keys
There should be one key and one key only!
clear_process_vars
Clear the process vars
init_process_vars
Initialize the process vars
add_attr
Add the local attr onto the global attr
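Taken together, check_keys and add_attr suggest a merge along the following lines. This is only a sketch under two assumptions: that global and local attributes arrive as lists of single-key hashes, as in the YAML fragments in this documentation, and that a local value replaces a global value of the same name. merge_attrs is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Workflow): merge two lists of
# single-key hashes into one hash, local entries applied after global ones.
sub merge_attrs {
    my ($global, $local) = @_;
    my %merged;
    for my $entry (@$global, @$local) {
        my @keys = keys %$entry;
        # Mirrors the check_keys rule: exactly one key per entry.
        die "There should be one key and one key only!\n" unless @keys == 1;
        $merged{ $keys[0] } = $entry->{ $keys[0] };
    }
    return \%merged;
}

my $attrs = merge_attrs(
    [ { indir => 'data/raw' }, { outdir => 'data/processed' } ],
    [ { outdir => 'data/processed/normalize' } ],
);
print "$_ = $attrs->{$_}\n" for sort keys %$attrs;
```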
write_rule_meta
write_process
Fill in the template with the process
process_by_sample_outdir
Make sure indir/outdirs are named appropriately for samples when using by_sample_outdir
OUTPUT_to_INPUT
If we are using auto_input, chain INPUT/OUTPUT
DESCRIPTION
BioX::Workflow - A very opinionated template-based workflow writer.
AUTHOR
Jillian Rowe <jillian.e.rowe@gmail.com>
Acknowledgements
Before version 0.03
This module was originally developed at and for Weill Cornell Medical College in Qatar within ITS Advanced Computing Team. With approval from WCMC-Q, this information was generalized and put on github, for which the authors would like to express their gratitude.
As of version 0.03:
This module's continuing development is supported by NYU Abu Dhabi in the Center for Genomics and Systems Biology. With approval from NYUAD, this information was generalized and put on bitbucket, for which the authors would like to express their gratitude.
COPYRIGHT
Copyright 2015- Weill Cornell Medical College in Qatar
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.