NAME
BS_PCRTagger.pl
VERSION
Version 3.00
DESCRIPTION
This utility creates unique tags for open reading frames to aid the analysis
of synthetic content in a nascent synthetic genome. Each tag in a gene has
a wild type and a synthetic version that correspond to the same offset in the
gene; each tag can be paired with another to form gene-specific amplicons
that are also specific to either wild type or synthetic sequence, depending
on which tags are used.
To pick tags for a chromosome, each open reading frame over I<MINORFLEN> base
pairs long will be slightly recoded to contain a set of PCR tags. The
locations and sequences of these tags are carefully chosen to maximize the
selectivity of the tags for either wild type or synthetic sequence. Each wild
type or synthetic tag and its reverse complement are unique in the entire
wild type genome; this is accomplished by creating a BLAST database for the
entire wild type genome and BLASTing each potential tag against it (this
requires that a complete wild type genome is available in the BioStudio
repository). Pairs of tags are selected in such a way that they will not
amplify any other genomic sequence under 1000 bases long. Each synthetic
counterpart to a wild type tag is recoded with GeneDesign's "most different"
algorithm to guarantee maximum nucleotide sequence difference while
maintaining identical protein sequence and, hopefully, minimizing any effect
on gene expression. The synthetic tags are all at least I<MINPERDIFF> percent
recoded from the wild type tags. Each tag is positioned in such a way that
the first and last nucleotides fall on the wobble position of a codon whose
wobble base can be changed without changing the encoded amino acid. This
excludes amino acids with a single codon, such as methionine and tryptophan,
and it can exclude others when a I<MINRSCUVAL> filter is in place. The wobble
restriction ensures that the synthetic and wild type counterparts have
different 5' and 3' nucleotides, minimizing the chances that they (and their
complements) will cross-prime. Because a tag starts and ends on a wobble
position, its length is always a multiple of 3 plus 1, and tags are between
I<MINTAGLEN> and I<MAXTAGLEN> base pairs long. All tags have a melting
temperature between I<MINTAGMELT> and I<MAXTAGMELT> so they
can be used in a single set of PCR conditions.
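
The length and recoding constraints above can be illustrated with a short
Python sketch. This is not the script's implementation (which is written in
Perl and relies on GeneDesign and BLAST); the function names are hypothetical
and only the default values 19, 28 and the "multiple of 3, plus 1" rule come
from this documentation.

```python
def valid_tag_lengths(min_len=19, max_len=28):
    """Lengths that start and end on a wobble base: a multiple of 3, plus 1."""
    return [n for n in range(min_len, max_len + 1) if n % 3 == 1]


def percent_difference(wt_tag, syn_tag):
    """Percentage of positions at which two equal-length tags differ."""
    if len(wt_tag) != len(syn_tag):
        raise ValueError("tags must be the same length")
    mismatches = sum(1 for a, b in zip(wt_tag, syn_tag) if a != b)
    return 100.0 * mismatches / len(wt_tag)


def ends_differ(wt_tag, syn_tag):
    """True when both the 5' and 3' nucleotides differ between the tags."""
    return wt_tag[0] != syn_tag[0] and wt_tag[-1] != syn_tag[-1]


# With the documented defaults, four tag lengths are possible.
print(valid_tag_lengths())  # [19, 22, 25, 28]
```

A candidate synthetic tag would pass this sketch's checks when
percent_difference() is at least I<MINPERDIFF> and ends_differ() is true.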
Tag pairs are chosen to form amplicons specific for each ORF, with at least
one amplicon chosen per kilobase of ORF. Each amplicon is between
I<MINAMPLEN> and I<MAXAMPLEN> base pairs long, ensuring that they will all
fall within an easily identifiable range on an agarose gel. No amplicon will
be chosen within the first I<FIVEPRIMESTART> base pairs of an ORF to avoid
disrupting unknown regulatory features. Amplicons are forbidden from
overlapping each other by more than I<MAXAMPOLAP> percent.
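
The amplicon rules above can be sketched as a simple filter. This Python
example is an illustration under stated assumptions, not the script's own
logic: coordinates are taken to be 1-based and inclusive relative to the
start of the ORF, I<FIVEPRIMESTART> is read as the first eligible base (as in
the ARGUMENTS section), and overlap is measured as a percentage of the
shorter amplicon, a definition this documentation does not confirm.

```python
def amplicon_length(start, end):
    """Length of an amplicon given 1-based inclusive coordinates."""
    return end - start + 1


def overlap_percent(a, b):
    """Overlap of two (start, end) amplicons as a percentage of the shorter.

    Assumption: the documentation does not say which length the percentage
    is taken against; the shorter amplicon is used here.
    """
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if shared <= 0:
        return 0.0
    shorter = min(amplicon_length(*a), amplicon_length(*b))
    return 100.0 * shared / shorter


def amplicon_allowed(candidate, chosen, min_len=200, max_len=500,
                     five_prime_start=101, max_overlap=25):
    """Apply the length, 5' exclusion, and overlap rules to one candidate."""
    start, end = candidate
    if start < five_prime_start:
        return False  # too close to the 5' end of the ORF
    if not min_len <= amplicon_length(start, end) <= max_len:
        return False  # outside the range that resolves well on a gel
    return all(overlap_percent(candidate, other) <= max_overlap
               for other in chosen)
```

For example, amplicon_allowed((351, 650), [(101, 400)]) accepts the candidate
because the two amplicons share 50 of 300 bases, which is below the default
I<MAXAMPOLAP> of 25 percent.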
ARGUMENTS
Required arguments:
-C, --CHROMOSOME : The chromosome to be modified
-E, --EDITOR : The person responsible for the edits
-ME, --MEMO : Justification for the edits
Optional arguments:
--ITERATE : [genome, chromosome (def)] Which version number to increment?
-STA, --STARTPOS : The first base for analysis
-STO, --STOPPOS : The last base for analysis
--MINTAGMELT : (default 58) Minimum melting temperature for tags
--MAXTAGMELT : (default 60) Maximum melting temperature for tags
--MINPERDIFF : (default 33) Minimum percent difference between synthetic and
wild type versions of a tag
--MINTAGLEN : (default 19) Minimum length for tags. Must be a multiple of 3,
plus 1
--MAXTAGLEN : (default 28) Maximum length for tags. Must be a multiple of 3,
plus 1
--MINAMPLEN : (default 200) Minimum span for a pair of tags
--MAXAMPLEN : (default 500) Maximum span for a pair of tags
--MAXAMPOLAP : (default 25) Maximum percentage of overlap allowed between
different tag pairs
--MINORFLEN : (default 501) Minimum size of gene for tagging eligibility
--FIVEPRIMESTART : (default 101) The first base in a gene eligible for a tag
--MINRSCUVAL : (default 0.06) The minimum RSCU value for any replacement codon
in a tag
--OUTPUT : [html, txt (def)] Format of reporting and output.
-h, --help : Display this message
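
The constraints stated in the option list can be checked before a run. The
Python sketch below is a hypothetical pre-flight check, not part of
BS_PCRTagger.pl; the defaults are copied from the list above, and the
consistency rules (minimum not above maximum, percentages between 0 and 100)
are inferred from the descriptions rather than taken from the script.

```python
# Defaults as documented in the ARGUMENTS section.
DEFAULTS = {
    "MINTAGMELT": 58, "MAXTAGMELT": 60, "MINPERDIFF": 33,
    "MINTAGLEN": 19, "MAXTAGLEN": 28,
    "MINAMPLEN": 200, "MAXAMPLEN": 500, "MAXAMPOLAP": 25,
    "MINORFLEN": 501, "FIVEPRIMESTART": 101, "MINRSCUVAL": 0.06,
}


def check_parameters(overrides=None):
    """Return a list of problems found in the merged parameter set."""
    params = dict(DEFAULTS)
    params.update(overrides or {})
    problems = []

    # Tag lengths must be a multiple of 3, plus 1 (documented rule).
    for name in ("MINTAGLEN", "MAXTAGLEN"):
        if params[name] % 3 != 1:
            problems.append(name + " must be a multiple of 3, plus 1")

    # Inferred rule: each minimum must not exceed its maximum.
    for low, high in (("MINTAGMELT", "MAXTAGMELT"),
                      ("MINTAGLEN", "MAXTAGLEN"),
                      ("MINAMPLEN", "MAXAMPLEN")):
        if params[low] > params[high]:
            problems.append(low + " must not exceed " + high)

    # Inferred rule: percentage options lie between 0 and 100.
    for name in ("MINPERDIFF", "MAXAMPOLAP"):
        if not 0 <= params[name] <= 100:
            problems.append(name + " must be between 0 and 100")

    return problems
```

Calling check_parameters() with no overrides returns an empty list, since the
documented defaults satisfy every rule checked here.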