BioX::Workflow::Samples
All the options for samples are here.
Variables
resample
Boolean value. Whether or not to get new samples based on indir/file_rule.
Samples are found at the beginning of the workflow, based on the global indir variable and the file_find.
Chances are you don't want to set resample to true. These files probably won't exist outside of the indir until the pipeline is run.
One example of doing so, shown in gemini.yml in the examples directory, looks for uncompressed files with the .vcf extension, compresses them, and then resamples based on the .vcf.gz extension.
infiles
Infiles to be processed
find_by_dir
Use this option when your sample names are given by directory. The default is to find samples by filename.
    /SAMPLE1
        SAMPLE1_r1.fastq.gz
        SAMPLE1_r2.fastq.gz
    /SAMPLE2
        SAMPLE2_r1.fastq.gz
        SAMPLE2_r2.fastq.gz
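As a rough illustration of what finding samples by directory means, the sketch below builds the layout above in a temporary directory and treats each immediate subdirectory as a sample. The helper samples_by_dir is hypothetical and is not part of BioX::Workflow::Samples; the module's own lookup may work differently.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Build a throwaway indir that mirrors the layout shown above.
my $indir = tempdir( CLEANUP => 1 );
for my $sample (qw(SAMPLE1 SAMPLE2)) {
    make_path( File::Spec->catdir( $indir, $sample ) );
    for my $read (qw(r1 r2)) {
        my $file = File::Spec->catfile( $indir, $sample, "${sample}_$read.fastq.gz" );
        open( my $fh, '>', $file ) or die "Cannot create $file: $!";
        close($fh);
    }
}

# Hypothetical helper (an assumption, not the module's API):
# every immediate subdirectory of $dir is taken to be a sample name.
sub samples_by_dir {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!";
    my @dirs = grep {
        my $path = File::Spec->catdir( $dir, $_ );
        $_ !~ /^\.\.?$/ && -d $path;
    } readdir($dh);
    closedir($dh);
    return sort @dirs;
}

my @samples = samples_by_dir($indir);
print "$_\n" for @samples;
```

Here the sample names come from the directory names, not from the fastq filenames inside them.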
by_sample_outdir
    outdir/
    /outdir/SAMPLE1
        /rule1
        /rule2
        /rule3
    /outdir/SAMPLE2
        /rule1
        /rule2
        /rule3
Instead of
    /outdir
        /rule1
        /rule2
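The difference between the two layouts can be sketched as a small path-building function. The helper rule_outdir and its named arguments are assumptions made for illustration; they are not taken from the module, which may compose its paths in another way.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not the module's API):
# compose the output directory for one rule, with or without
# a per-sample subdirectory.
sub rule_outdir {
    my (%args) = @_;
    my @parts = $args{by_sample_outdir}
        ? ( $args{outdir}, $args{sample}, $args{rule} )
        : ( $args{outdir}, $args{rule} );
    return join( '/', @parts );
}

for my $flag ( 1, 0 ) {
    for my $sample (qw(SAMPLE1 SAMPLE2)) {
        my $dir = rule_outdir(
            outdir           => '/outdir',
            sample           => $sample,
            rule             => 'rule1',
            by_sample_outdir => $flag,
        );
        print "by_sample_outdir=$flag $sample: $dir\n";
    }
}
```

With the flag set, each sample gets its own copy of the rule directories; without it, all samples share one directory per rule.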
samples
Our samples to process. They are either found through file_rule or passed as command line options.
sample
Each time we get the sample we set it.
file_rule
Rule to find files/samples
Subroutines
get_samples
Get the basename of the files. Optional rules can be added.
With a file_rule of (.vcf)$|(.vcf.gz)$, both sample.vcf.gz and sample.vcf give the sample name sample.
Also gets the full path to infiles
Instead of doing
    foreach my $sample (@{$self->samples}){
        # do stuff with $sample
    }
Could have
    foreach my $infile (@{$self->infiles}){
        # do stuff with $infile
    }
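To make the relationship between infiles, file_rule, and sample names concrete, here is a minimal sketch that assumes the sample name is whatever remains of the basename once the file_rule match is removed. The helper sample_name and the /data paths are made up for illustration; the module's own get_samples and match_samples may behave differently.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper (an assumption, not the module's API):
# strip whatever file_rule matches from the basename of the path.
sub sample_name {
    my ( $path, $file_rule ) = @_;
    my $name = basename($path);
    $name =~ s/$file_rule//;
    return $name;
}

# The file_rule from the text above, kept in single quotes so that
# Perl does not interpolate the $ characters.
my $file_rule = '(.vcf)$|(.vcf.gz)$';

# Made-up full paths standing in for infiles.
my @infiles = ( '/data/sample.vcf', '/data/sample.vcf.gz' );

for my $infile (@infiles) {
    printf "%s -> %s\n", $infile, sample_name( $infile, $file_rule );
}
```

Note that the dots in this file_rule are unescaped, so they match any character; a stricter rule would write them as \. instead.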
write_sample_meta
Write the metadata for samples
match_samples
Match samples based on regex written in file_rule