NAME

BioX::Workflow - A very opinionated template based workflow writer.

SYNOPSIS

Most of the functionality can be accessed through the biox-workflow.pl script.

biox-workflow.pl --workflow /path/to/workflow.yml

This module was written with Bioinformatics workflows in mind, but should be extensible to any sort of workflow or pipeline.

Usage

Please check out the full Usage Docs at BioX::Workflow::Usage

In Code Documenation

You shouldn't really need to look here unless you have some reason to do some serious hacking.

Attributes

Moose attributes. Technically any of these can be changed, but may break everything.

comment_char

coerce_paths

select_rules

Select a subsection of rules

resample

Boolean value get new samples based on indir/file_rule or no

Samples are found at the beginning of the workflow, based on the global indir variable and the file_find.

Chances are you don't want to set resample to try, because these files probably won't exist outside of the indirectory until the pipeline is run.

One example of doing so, shown in the gemini.yml in the examples directory, is looking for uncompressed files, .vcf extension, compressing them, and then resampling based on the .vcf.gz extension.

find_by_dir

Use this option when you sample names are by directory The default is to find samples by filename

/SAMPLE1
    SAMPLE1_r1.fastq.gz
    SAMPLE1_r2.fastq.gz
/SAMPLE2
    SAMPLE2_r1.fastq.gz
    SAMPLE2_r2.fastq.gz

by_sample_outdir

outdir/
/outdir/SAMPLE1
    /rule1
    /rule2
    /rule3
/outdir/SAMPLE2
    /rule1
    /rule2
    /rule3

Instead of

/outdir
    /rule1
    /rule2

This feature is not particularly well supported, and may break when mixed with other methods, particularly --resample

min

Print the workflow as 2 files.

#run-workflow.sh
export SAMPLE=sampleN && ./run_things

number_rules

Instead of
outdir/
    rule1
    rule2

outdir/
    001-rule1
    002-rule2

auto_name

Auto_name - Create outdirectory based on rulename

global: - outdir: /home/user/workflow/processed rule: normalize: process: dostuff {$self->indir}/{$sample}.in >> {$self->outdir}/$sample.out

Would create your directory structure /home/user/workflow/processed/normalize (if it doesn't exist)

auto_input

This is similar to the auto_name function in the BioX::Workflow. Instead this says each input should be the previous output.

verbose

Output some more things

wait

Print "wait" at the end of each rule

override_process

local: - override_process: 1

indir outdir

create_outdir

INPUT OUTPUT

Special variables that can have input/output

These variables are also used in BioX::Workflow::Plugin::Drake

file_rule

Rule to find files

No GetOpt Here

attr

attributes read in from runtime

global_attr

Attributes defined in the global section of the yaml file

local_attr

Attributes defined in the rules->rulename->local section of the yaml file

local_rule

infiles

Infiles to be processed

samples

process

Do stuff

key

Do stuff

workflow

Path to workflow workflow. This must be a YAML file.

rule_based

This is the default. The outer loop are the rules, not the samples

sample_based

Default Value. The outer loop is samples, not rules. Must be set in your global values or on the command line --sample_based 1

If you ever have resample: 1 in your config you should NOT set this value to true!

save_object_env

Save object env. This will save all the variables. Useful for debugging, but gets unweildly for larger workflows.

stash

This isn't ever used in the code. Its just there incase you want to do some things with override_process

It uses Moose::Meta::Attribute::Native::Trait::Hash and supports all the methods.

set_stash     => 'set',
get_stash     => 'get',
has_no_stash => 'is_empty',
num_stashs    => 'count',
delete_stash  => 'delete',
stash_pairs   => 'kv',

_classes

Saves a snapshot of the entire namespace for the initial environment, and each rule.

Subroutines

Subroutines can also be overriden and/or extended in the usual Moose fashion.

run

Starting point.

save_env

At each rule save the env for debugging purposes.

make_outdir

Set initial indir and outdir

get_samples

Get basename of the files. Can add optional rules.

sample.vcf.gz and sample.vcf would be sample if the file_rule is (.vcf)$|(.vcf.gz)$

Also gets the full path to infiles

Instead of doing

foreach my $sample (@$self->samples){
    dostuff
}

Could have

foreach my $infile (@$self->infiles){
    dostuff
}

match_samples

Match samples based on regex written in file_rule

plugin_load

Load plugins defined in yaml with MooseX::Object::Pluggable

class_load

Load classes defined in yaml with Class::Load

make_template

Make the template for interpolating strings

create_attr

make attributes

check_keys

There should be one key and one key only!

clear_process_vars

Clear the process vars

init_process_vars

Initialize the process vars

add_attr

Add the local attr onto the global attr

write_rule_meta

write_process

Fill in the template with the process

process_by_sample_outdir

Make sure indir/outdirs are named appropriated for samples when using by

OUTPUT_to_INPUT

If we are using auto_input chain INPUT/OUTPUT

DESCRIPTION

BioX::Workflow - A very opinionated template based workflow writer.

AUTHOR

Jillian Rowe <jillian.e.rowe@gmail.com>

Acknowledgements

Before version 0.03

This module was originally developed at and for Weill Cornell Medical College in Qatar within ITS Advanced Computing Team. With approval from WCMC-Q, this information was generalized and put on github, for which the authors would like to express their gratitude.

As of version 0.03:

This modules continuing development is supported by NYU Abu Dhabi in the Center for Genomics and Systems Biology. With approval from NYUAD, this information was generalized and put on bitbucket, for which the authors would like to express their gratitude.

COPYRIGHT

Copyright 2015- Weill Cornell Medical College in Qatar

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO