NAME
fasta-shuffle-notryptic.pl - shuffle each sequence while avoiding any original tryptic peptide
DESCRIPTION
Reads an input fasta file and produces a shuffled databank. Each sequence is shuffled, but the shuffling avoids producing tryptic (cleaved) peptides that are already known from the original sequences.
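The idea can be illustrated with a minimal Python sketch. It is not the script's actual implementation: the cleavage rule (cut after K or R, except before P), the whole-sequence retry strategy, and the names `tryptic_peptides` and `shuffle_avoiding` are all assumptions made for illustration.

```python
import random
import re


def tryptic_peptides(sequence, min_length=6):
    """Cut after K or R (assumed rule: not before P); keep peptides >= min_length."""
    # Zero-width split: the position is preceded by K/R and not followed by P.
    pieces = re.split(r"(?<=[KR])(?!P)", sequence)
    return [p for p in pieces if len(p) >= min_length]


def shuffle_avoiding(sequence, known, rng, min_length=6, max_tries=100):
    """Shuffle the residues, retrying while a resulting peptide is in `known`."""
    residues = list(sequence)
    for _ in range(max_tries):
        rng.shuffle(residues)
        candidate = "".join(residues)
        peptides = tryptic_peptides(candidate, min_length)
        if not any(p in known for p in peptides):
            return candidate
    raise RuntimeError("no acceptable shuffle found in %d tries" % max_tries)


if __name__ == "__main__":
    original = "ACDEFGHIKLMNPQRSTVWY"
    known = set(tryptic_peptides(original))
    print(shuffle_avoiding(original, known, random.Random(0)))
```

Retrying the whole sequence keeps the sketch short; a real tool may prefer to reshuffle only the offending peptide.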
SYNOPSIS
#shuffle each sequence
fasta-shuffle-notryptic.pl --in=/tmp/uniprot_sprot.fasta
#to limit memory usage, one can use CRC code (--crcsize will
./fasta-shuffle-notryptic.pl --ac-prefix=DECOY_ --in=/home/alex/tmp/a.fasta --out=/tmp/a.fasta --crcsize=33 -v --norandom
ARGUMENTS
--in=infile.fasta
An input fasta file (will be uncompressed if ending with gz)
--out=outfile.fasta
An output .fasta file [default is stdout]
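The handling of a gzip-compressed --in file can be sketched as follows. This is a Python illustration under stated assumptions, not the script's own code: the helper names `open_fasta` and `read_fasta` are hypothetical, and the check on the file name ending simply mirrors the "ending with gz" rule above.

```python
import gzip


def open_fasta(path):
    """Open a fasta file as text, decompressing it if the name ends with gz."""
    if path.endswith("gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_fasta(path):
    """Yield (header, sequence) pairs; the header is returned without '>'."""
    header, chunks = None, []
    with open_fasta(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    # Emit the last entry, which has no following header line.
    if header is not None:
        yield header, "".join(chunks)
```

Writing to stdout when no output file is given would follow the same pattern, with `sys.stdout` as the fallback handle.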
OPTIONS
--ac-prefix=string
Set a key to be prepended to the AC in the randomized bank. By default, the key depends on the chosen method.
--peptminlength [default 6]
Set the minimum length of the peptides that are reshuffled if they already exist among the known tryptic peptides
--crcsize=int
Building a hash of known cleaved peptides can be quite demanding for memory (uniprot_trembl => ~4GB). One solution is to build a bit array instead, in which each bit records whether or not a peptide with the corresponding CRC code was found.
The argument passed here is the number of bits used for the CRC coding: 33 means 2^33 bits of memory => 2^30 bytes => 1GB
--norandomseed
The random generator seed is set to 0, so two runs on the same data will produce the same result
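The bit array described under --crcsize can be sketched in Python as below. This is an illustration under stated assumptions, not the script's implementation: the class name `CrcBitSet` and the helper `table_bytes` are hypothetical, and `zlib.crc32` yields only 32 bits, so the sketch accepts at most 32 bits, whereas the example value of 33 would need a wider code. Distinct peptides can share a code, so a positive answer only means "possibly seen".

```python
import zlib


def table_bytes(bits):
    """Memory needed for one flag per possible code: 2^bits bits, in bytes."""
    return ((1 << bits) + 7) // 8


class CrcBitSet:
    """Records which CRC codes were seen, using one bit per possible code."""

    def __init__(self, bits):
        if not 1 <= bits <= 32:
            raise ValueError("this sketch supports 1 to 32 bits")
        self.mask = (1 << bits) - 1
        self.table = bytearray(table_bytes(bits))

    def _code(self, peptide):
        # Keep only the lowest `bits` bits of the 32-bit CRC.
        return zlib.crc32(peptide.encode("ascii")) & self.mask

    def add(self, peptide):
        code = self._code(peptide)
        self.table[code >> 3] |= 1 << (code & 7)

    def __contains__(self, peptide):
        code = self._code(peptide)
        return bool(self.table[code >> 3] & (1 << (code & 7)))


if __name__ == "__main__":
    seen = CrcBitSet(20)
    seen.add("ACDEFGHIK")
    print("ACDEFGHIK" in seen, len(seen.table))
```

The trade-off is a fixed memory footprint, independent of the number of peptides, against occasional false positives that only cause an unnecessary reshuffle.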
misc
--noprogressbar
do not display terminal progress bar (if possible)
--help
--man
--verbose
Setting the environment variable DO_NOT_DELETE_TEMP=1 will keep the temporary files after the script exits
EXAMPLE
COPYRIGHT
Copyright (C) 2004-2006 Geneva Bioinformatics www.genebio.com
This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
AUTHORS
Alexandre Masselot, www.genebio.com