Example002
Here is a very simple example that searches a directory for *.csv files and creates the output directory /home/user/workflow/output if it doesn't exist.
Create the /home/user/workflow/workflow.yml
```yaml
---
global:
    - indir: /home/user/workflow
    - outdir: /home/user/workflow/output
    - file_rule: (.*).csv$
rules:
    - backup:
        process: cp {$self->indir}/{$sample}.csv {$self->outdir}/{$sample}.csv
    - grep_VARA:
        process: |
            echo "Working on {$self->indir}/{$sample}.csv"
            grep -i "VARA" {$self->indir}/{$sample}.csv >> {$self->outdir}/{$sample}.grep_VARA.csv
    - grep_VARB:
        process: |
            grep -i "VARB" {$self->indir}/{$sample}.grep_VARA.csv >> {$self->outdir}/{$sample}.grep_VARA.grep_VARB.csv
```
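To make the role of `file_rule` concrete, here is a minimal shell sketch of how sample names could be derived from the *.csv files in a directory. It only mimics the effect of the `(.*).csv$` capture; the `list_samples` helper and the temporary demo directory are hypothetical and are not part of biox-workflow.pl.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of biox-workflow.pl): print one sample name
# per *.csv file in a directory, i.e. the file name without its .csv suffix.
list_samples() {
    local dir="$1" f
    for f in "$dir"/*.csv; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        basename "$f" .csv
    done
}

# Demo in a throwaway directory so no real data is touched
demo_dir="$(mktemp -d)"
touch "$demo_dir/test1.csv" "$demo_dir/test2.csv" "$demo_dir/notes.txt"

list_samples "$demo_dir"   # prints test1 and test2, one per line; notes.txt is ignored
```

Each name printed this way corresponds to one value of `{$sample}` in the rules.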
Make some test data
```bash
cd /home/user/workflow
#Create test1.csv with some lines
echo "This is VARA" >> test1.csv
echo "This is VARB" >> test1.csv
echo "This is VARC" >> test1.csv
#Create test2.csv with some lines
echo "This is VARA" >> test2.csv
echo "This is VARB" >> test2.csv
echo "This is VARC" >> test2.csv
echo "This is some data I don't want" >> test2.csv
```
Run the script to create the output directory structure and the workflow bash script
```bash
biox-workflow.pl --workflow workflow.yml > workflow.sh
```
Look at the directory structure
```
/home/user/workflow/
    test1.csv
    test2.csv
    /output
        /backup
        /grep_vara
        /grep_varb
```
Run the workflow
Assuming you saved the output to workflow.sh, running ./workflow.sh produces the following.
```
/home/user/workflow/
    test1.csv
    test2.csv
    /output
        /backup
            test1.csv
            test2.csv
        /grep_vara
            test1.grep_VARA.csv
            test2.grep_VARA.csv
        /grep_varb
            test1.grep_VARA.grep_VARB.csv
            test2.grep_VARA.grep_VARB.csv
```
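The two grep rules form a chain: grep_VARB reads the file that grep_VARA wrote. The sketch below replays that chain by hand for test1.csv in a temporary directory (the `work` variable and its location are assumptions for illustration only). With this particular test data the VARA-filtered file contains a single line, so the second filter finds nothing and its output file exists but is empty.

```shell
#!/usr/bin/env bash
# Replay the grep_VARA -> grep_VARB chain by hand for one sample.
work="$(mktemp -d)"   # throwaway directory, not the real workflow layout
printf 'This is VARA\nThis is VARB\nThis is VARC\n' > "$work/test1.csv"

# First filter: keep lines mentioning VARA (case-insensitive)
grep -i "VARA" "$work/test1.csv" > "$work/test1.grep_VARA.csv"

# Second filter runs on the already filtered file.
# grep exits non-zero when nothing matches, so '|| true' keeps the script going.
grep -i "VARB" "$work/test1.grep_VARA.csv" > "$work/test1.grep_VARA.grep_VARB.csv" || true

cat "$work/test1.grep_VARA.csv"   # prints: This is VARA
if [ -s "$work/test1.grep_VARA.grep_VARB.csv" ]; then
    echo "VARB matches found"
else
    echo "no VARB matches"        # this branch runs for the test data
fi
```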
A closer look at workflow.sh
This top part here is the metadata. It tells you the options used to run the script.
```bash
#
# This file was generated with the following options
#   --workflow workflow.yml
#
```
If --verbose is enabled, which it is by default, you'll also see some variables printed out for your benefit.
```bash
#
# Variables
# Indir: /home/user/workflow
# Outdir: /home/user/workflow/output/backup
# Samples: test1 test2
#
```
Here is our first rule, named backup. As you can see, our $self->outdir is automatically named 'backup', relative to the globally defined outdir.
```bash
#
# Starting backup
#
cp /home/user/workflow/test1.csv /home/user/workflow/output/backup/test1.csv
cp /home/user/workflow/test2.csv /home/user/workflow/output/backup/test2.csv
wait
#
# Ending backup
#
```
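As a rough illustration of how the backup rule expands into the block above, the following shell sketch substitutes the input directory, the per-rule output directory, and each sample name into the `cp` template. The `render_backup` function is hypothetical; it imitates the result of the expansion and is not how biox-workflow.pl is implemented.

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of the expanded backup rule.
# Arguments: indir, global outdir, then one or more sample names.
render_backup() {
    local indir="$1" global_outdir="$2" sample outdir
    shift 2
    outdir="$global_outdir/backup"   # rule outdir = global outdir + rule name
    for sample in "$@"; do
        echo "cp ${indir}/${sample}.csv ${outdir}/${sample}.csv"
    done
    echo "wait"                      # marks the end of the rule
}

render_backup /home/user/workflow /home/user/workflow/output test1 test2
```

The sketch prints the commands instead of running them, which makes it easy to compare with the generated script.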
Notice the 'wait' command. When the generated workflow is run through any of the HPC::Runner scripts, wait tells the runner to let all previously started processes finish before beginning the next ones.
In effect, wait builds a linear dependency tree.
For instance, if you run this as
```bash
slurmrunner.pl --infile workflow.sh
#OR
mcerunner.pl --infile workflow.sh
```
the cp commands of the backup rule would run in parallel, and the next rule would not begin until those processes have finished.
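The barrier behaviour of wait can be tried out in plain bash. This is only a sketch of the idea, not of how slurmrunner.pl or mcerunner.pl schedule jobs. In plain bash, commands run concurrently only when they are put in the background with `&`, so the `&` is added here for illustration; the `step` function and the log file are hypothetical stand-ins for real commands such as cp.

```shell
#!/usr/bin/env bash
# Demonstrate 'wait' as a barrier between two groups of background jobs.
log="$(mktemp)"

step() {              # hypothetical stand-in for a real command
    sleep "$2"        # pretend to work for $2 seconds
    echo "$1" >> "$log"
}

# "Rule 1": both commands start immediately and run concurrently
step "backup test1" 1 &
step "backup test2" 0 &
wait                  # block until every background job above has finished

# "Rule 2": starts only after rule 1 is completely done
step "grep_VARA test1" 0 &
wait

cat "$log"            # the grep_VARA line always comes last
```

Because of the first wait, the grep_VARA entry can never appear before either backup entry, even though the two backup steps finish in a different order than they were started.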