NAME
App::Sandy::Command::Genome - simulate command class. Simulate genome sequencing
VERSION
version 0.19
SYNOPSIS
sandy genome [options] <fasta-file>
Arguments:
a fasta-file
Options:
-h, --help brief help message
-u, --man full documentation
-v, --verbose print log messages
-p, --prefix prefix output [default:"out"]
-o, --output-dir output directory [default:"."]
-O, --output-format bam, sam, fastq.gz, fastq [default:"fastq.gz"]
-1, --join-paired-ends merge R1 and R2 outputs in one file
-i, --append-id append to the defined template id [Format]
-I, --id overlap the default template id [Format]
-j, --jobs number of jobs [default:"1"; Integer]
-s, --seed set the seed of the base generator
[default:"time()"; Integer]
-c, --coverage fastq-file coverage [default:"8"; Number]
-t, --sequencing-type single-end or paired-end reads
[default:"paired-end"]
-q, --quality-profile sequencing system profiles from quality
database [default:"poisson"]
-e, --sequencing-error sequencing error rate for poisson
[default:"0.001"; Number]
-m, --read-mean read mean size for poisson
[default:"100"; Integer]
-d, --read-stdd read standard deviation size for poisson
[default:"0"; Integer]
-M, --fragment-mean the fragment mean size for paired-end reads
[default:"300"; Integer]
-D, --fragment-stdd the fragment standard deviation size for
paired-end reads [default:"50"; Integer]
-a, --structural-variation a list of structural variation entries from
variation database. This option may be passed
multiple times [default:"none"]
-A, --structural-variation-regex a list of perl-like regex to match structural
variation entries in variation database.
This option may be passed multiple times
[default:"none"]
DESCRIPTION
Simulate genome sequencing.
OPTIONS
- --help
-
Prints a brief help message and exits.
- --man
-
Prints the manual page and exits.
- --verbose
-
Prints log information to standard error.
- --prefix
-
Concatenates the prefix to the output-file name.
- --output-dir
-
Creates the output file inside output-dir. If output-dir does not exist, it is created recursively.
- --output-format
-
Chooses the output format. Available options are: bam, sam, fastq.gz, fastq. For the bam option, --append-id is ignored: the sequence identifier is split on blank characters, and only the first field is written to the query name column (the first column).
- --join-paired-ends
-
By default, paired-end reads are written to two separate files, prefix_R[12]_001.fastq(\.gz)?. Pass this option to write both outputs to a single file. If the --id template does not contain the escape character %R, it is inserted automatically right after the first field (blank-separated values), as in id/%R, which resolves to id/1 or id/2. This is needed to tell R1 reads from R2 reads.
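The automatic %R insertion described above can be pictured with a minimal Python sketch. The function name `ensure_read_escape` is hypothetical, and treating "blank" as a single space is an assumption; this is not sandy's own code.

```python
def ensure_read_escape(template):
    """Insert /%R after the first space-separated field when %R is absent.

    Illustrative only: the helper name and the single-space separator
    are assumptions, not taken from sandy.
    """
    if "%R" in template:
        return template
    # partition() keeps the separator, so the rest of the id is untouched
    first, sep, rest = template.partition(" ")
    return first + "/%R" + sep + rest


print(ensure_read_escape("%i.%U read=%c"))  # %i.%U/%R read=%c
print(ensure_read_escape("%i.%U"))          # %i.%U/%R
print(ensure_read_escape("%i.%U/%R"))       # %i.%U/%R
```

A template that already carries %R is returned unchanged, so the user's own placement of the read number always wins.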
- --append-id
-
Appends a string template to the defined template id. See Format.
- --id
-
Overrides the default template id. The defaults are %i.%U_%c_%s_%t_%n for single-end and %i.%U_%c_%s_%S_%E for paired-end, e.g. SR123.1_chr1_P_1001_1101. See Format.
- Format
-
A Format string is a combination of literal and escape characters, similar to the way printf works. This lets the user customize the fastq sequence identifier to fit her needs. Valid escape characters are:
Common escape characters
----------------------------------------------------------------------------
 Escape       Meaning
----------------------------------------------------------------------------
 %i           instrument id composed by SR + PID
 %I           job slot number
 %q           quality profile
 %e           sequencing error
 %x           sequencing error position
 %R           read 1, or 2 if it is the paired-end mate
 %U           read number
 %r           read size
 %m           read mean
 %d           read standard deviation
 %c           sequence id as chromosome, gene/transcript id
 %C           sequence id type (reference or alternate non reference allele) ***
 %s           read strand
 %t           read start position
 %n           read end position
 %a           read start position regarding reference genome ***
 %b           read end position regarding reference genome ***
 %v           structural variation position ***
----------------------------------------------------------------------------
 *** specific for structural variation (genome simulation only)
Paired-end specific escape characters
----------------------------------------------------------------------------
 Escape       Meaning
----------------------------------------------------------------------------
 %T           mate read start position
 %N           mate read end position
 %A           mate read start position regarding reference genome ***
 %B           mate read end position regarding reference genome ***
 %D           distance between the paired-reads
 %M           fragment mean
 %D           fragment standard deviation
 %f           fragment size
 %F           fragment strand
 %S           fragment start position
 %E           fragment end position
 %X           fragment start position regarding reference genome ***
 %Z           fragment end position regarding reference genome ***
----------------------------------------------------------------------------
 *** specific for structural variation (genome simulation only)
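To make the printf-like behavior of a Format string concrete, here is a minimal Python sketch of escape expansion. The function `expand_template` and the sample values are hypothetical; the sketch only shows how a template such as the paired-end default could resolve to an identifier, not how sandy implements it.

```python
import re


def expand_template(template, values):
    """Replace each %<letter> escape with its value.

    Escapes without a value are left untouched. Illustrative only:
    the helper name and the lookup table are assumptions.
    """
    def substitute(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return re.sub(r"%([A-Za-z])", substitute, template)


# Hypothetical values for one paired-end read
values = {"i": "SR123", "U": 1, "c": "chr1", "s": "P", "S": 1001, "E": 1101}

print(expand_template("%i.%U_%c_%s_%S_%E", values))
# SR123.1_chr1_P_1001_1101
```

Literal characters such as `.` and `_` pass through unchanged, which is what allows the identifier layout to be rearranged freely.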
- --jobs
-
Sets the number of child jobs to be created.
- --seed
-
Sets the seed of the base generator. Setting the seed is useful for those who want reproducible simulations. Pay attention to the number of jobs (--jobs), because each job receives a different seed calculated from the main seed. Reproducing a run therefore requires both the same seed and the same number of jobs.
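The dependence on the job count can be illustrated with a small Python sketch. The derivation scheme here (drawing one seed per job from a generator seeded with the main seed) is an assumption chosen for illustration; the actual calculation sandy uses is not described in this page.

```python
import random


def job_seeds(main_seed, jobs):
    """Derive one seed per job from the main seed (assumed scheme)."""
    master = random.Random(main_seed)
    return [master.randrange(2**32) for _ in range(jobs)]


def simulate(main_seed, jobs, draws_per_run=4):
    """Split the draws evenly across jobs, each with its own generator."""
    per_job = draws_per_run // jobs
    output = []
    for seed in job_seeds(main_seed, jobs):
        rng = random.Random(seed)
        output.extend(rng.random() for _ in range(per_job))
    return output


# Same seed and same job count: identical output
same = simulate(42, 2) == simulate(42, 2)
# Same seed but a different job count: the streams diverge
differs = simulate(42, 1) != simulate(42, 2)
print(same, differs)
```

Under this scheme the work is split differently when the job count changes, so the same main seed no longer yields the same sequence of draws.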
- --read-mean
-
Sets the read mean size if quality-profile is equal to 'poisson'. A quality-profile from the database overrides the read size.
- --read-stdd
-
Sets the read standard deviation if quality-profile is equal to 'poisson'. A quality-profile from the database overrides the read-stdd.
- --coverage
-
Calculates the number of reads based on the sequence coverage: number_of_reads = (sequence_size * coverage) / read_size. This is the default option for genome sequencing simulation.
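As a worked instance of the coverage formula, the Python sketch below computes the read count for a 1 Mb sequence with the default coverage and read mean. The function name and the truncation to an integer are assumptions; how sandy rounds the result is not stated here.

```python
def number_of_reads(sequence_size, coverage, read_size):
    """number_of_reads = (sequence_size * coverage) / read_size.

    Truncating to an integer is an assumption made for this sketch.
    """
    return int(sequence_size * coverage / read_size)


# 1,000,000 bases at the default coverage (8) and read mean (100)
print(number_of_reads(1_000_000, 8, 100))  # 80000
```

Doubling the coverage or halving the read size doubles the number of reads, which is a quick way to estimate output size before a run.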
- --sequencing-type
-
Sets the sequencing type to single-end or paired-end.
- --fragment-mean
-
If the sequencing-type is set to paired-end, it sets the fragment mean.
- --fragment-stdd
-
If the sequencing-type is set to paired-end, it sets the fragment standard deviation.
- --sequencing-error
-
Sets the sequencing error rate if quality-profile is equal to 'poisson'. Valid values are between zero and one.
- --quality-profile
-
Sets the sequencing system profile for quality. The default value is a poisson distribution, but the user can choose among several profiles stored in the database or import their own data. See the quality command for more details.
- --structural-variation
-
Sets the structural variation to be applied to the input genome. By default, no variation is included in the simulation, but the user can select entries from the variation database or index their own data. This option accepts a comma-separated list of values and can be passed multiple times, which is useful for combining various types of structural variation in the same simulation. It can be combined with --structural-variation-regex. See the variation command for the available list of structural variation entries.
- --structural-variation-regex
-
Applies a Perl regex to the variation database and selects all entries that match the pattern. This option accepts a comma-separated list of values and can be passed multiple times. It can be combined with --structural-variation. See the variation command for the available list of structural variation entries.
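The way exact entries and regex patterns combine can be sketched in Python as follows. The entry names, the function `select_entries`, and the choice of substring search over full match are all hypothetical; Python's `re` syntax is also only similar to Perl's, so this is an approximation of the selection logic rather than sandy's implementation.

```python
import re


def select_entries(database, names=(), patterns=()):
    """Select entries by exact name or by regex.

    Each item in names/patterns may itself be a comma-separated list,
    mirroring an option passed multiple times. Splitting on commas means
    a pattern containing a comma (e.g. a{1,3}) would be broken up; that
    limitation belongs to this sketch.
    """
    wanted = [n for arg in names for n in arg.split(",")]
    regexes = [re.compile(p) for arg in patterns for p in arg.split(",")]
    return [
        entry
        for entry in database
        if entry in wanted or any(r.search(entry) for r in regexes)
    ]


# Hypothetical variation database entries
database = ["sampleA_chr1", "sampleA_chr2", "sampleB_chr1", "fusion_X"]

print(select_entries(database, names=["fusion_X"], patterns=["^sampleA_"]))
# ['sampleA_chr1', 'sampleA_chr2', 'fusion_X']
```

The result is the union of both selections, which matches the idea of joining several variation sets into one simulation.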
AUTHORS
Thiago L. A. Miller <tmiller@mochsl.org.br>
J. Leonel Buzzo <lbuzzo@mochsl.org.br>
Gabriela Guardia <gguardia@mochsl.org.br>
Fernanda Orpinelli <forpinelli@mochsl.org.br>
Pedro A. F. Galante <pgalante@mochsl.org.br>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2018 by Teaching and Research Institute from Sírio-Libanês Hospital.
This is free software, licensed under:
The GNU General Public License, Version 3, June 2007