NAME
Bio::GeneDesign
VERSION
Version 5.56
DESCRIPTION
AUTHOR
Sarah Richardson <smrichardson@lbl.gov>
CONSTRUCTORS
new
Returns an initialized Bio::GeneDesign object.
This function reads the ConfigData written at installation, imports the relevant sublibraries, and sets the relevant paths.
my $GD = Bio::GeneDesign->new();
ACCESSORS
codon_path
Returns the directory containing codon tables.
EMBOSS
Returns a value if EMBOSS_support was vetted and approved during installation.
BLAST
Returns a value if BLAST_support was vetted and approved during installation.
graph
Returns a value if graphing_support was vetted and approved during installation.
vmatch
Returns a value if vmatch_support was vetted and approved during installation.
enzyme_set
Returns a hash reference where the keys are enzyme names and the values are RestrictionEnzyme objects, if the enzyme set has been defined.
To set this value, use set_restriction_enzymes.
enzyme_set_name
Returns the name of the enzyme set in use, if there is one.
To set this value, use set_restriction_enzymes.
all_enzymes
Returns a hash reference where the keys are enzyme names and the values are RestrictionEnzyme objects.
To set this value, use set_restriction_enzymes.
organism
Returns the name of the organism in use, if there is one.
To set this value, use set_organism.
codontable
Returns the codon table in use, if there is one.
The codon table is a hash reference where the keys are upper case nucleotide codons and the values are upper case single letter amino acids.
my $codon_t = $GD->codontable();
$codon_t->{"ATG"} eq "M" || die;
To set this value, use set_codontable.
reversecodontable
Returns the reverse codon table in use, if there is one.
The reverse codon table is a hash reference where the keys are upper case single letter amino acids and the values are upper case nucleotides.
my $revcodon_t = $GD->reversecodontable();
$revcodon_t->{"M"} eq "ATG" || die;
This value is set automatically when set_codontable is run.
rscutable
Returns the RSCU table in use, if there is one.
The RSCU codon table is a hash reference where the keys are upper case nucleotide codons and the values are floats.
my $rscu_t = $GD->rscutable();
$rscu_t->{"ATG"} == 1.00 || die;
To set this value, use set_rscutable.
FUNCTIONS
melt
my $Tm = $GD->melt(-sequence => $myseq);
The -sequence argument is required.
Returns the melting temperature of a DNA sequence.
You can set the salt and DNA concentrations with the -salt and -concentration arguments; they default to 50 mM (.05) and 100 nM (.0000001) respectively.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be analyzed with the -sequence flag.
There are four different formulae to choose from. If you wish to use the nearest neighbor method, use the -nearest_neighbor flag. Otherwise the appropriate formula will be determined by the length of your -sequence argument.
For sequences under 14 base pairs: Tm = (4 * #GC) + (2 * #AT).
For sequences between 14 and 50 base pairs: Tm = 100.5 + (41 * #GC / length) - (820 / length) + 16.6 * log10(salt).
For sequences over 50 base pairs: Tm = 81.5 + (41 * #GC / length) - (500 / length) + 16.6 * log10(salt) - .62.
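The three length-based formulae above can be sketched in plain Perl. This is an illustration only and not GeneDesign's own code: the name simple_tm is hypothetical, the handling of sequences of exactly 14 or 50 base pairs is an assumption, and the nearest neighbor method is not covered.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::GeneDesign: applies the three
# length-based formulae. Salt is in mol/L and defaults to .05.
sub simple_tm {
    my ($seq, $salt) = @_;
    $salt = 0.05 unless defined $salt;
    my $len = length $seq;
    my $gc  = ($seq =~ tr/GCgc//);           # number of G and C bases
    my $at  = ($seq =~ tr/ATat//);           # number of A and T bases
    my $log10_salt = log($salt) / log(10);   # Perl's log() is the natural log

    return (4 * $gc) + (2 * $at) if $len < 14;
    return 100.5 + (41 * $gc / $len) - (820 / $len) + 16.6 * $log10_salt
        if $len <= 50;
    return 81.5 + (41 * $gc / $len) - (500 / $len) + 16.6 * $log10_salt - .62;
}

print simple_tm("AATTCG"), "\n";   # 6 bp: (4 * 2) + (2 * 4) = 16
```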
complement
$my_seq = "AATTCG";
my $complemented_seq = $GD->complement($my_seq);
$complemented_seq eq "TTAAGC" || die;
my $reverse_complemented_seq = $GD->complement($my_seq, 1);
$reverse_complemented_seq eq "CGAATT" || die;
#clean
my $complemented_seq = $GD->complement(-sequence => $my_seq);
$complemented_seq eq "TTAAGC" || die;
my $reverse_complemented_seq = $GD->complement(-sequence => $my_seq,
-reverse => 1);
$reverse_complemented_seq eq "CGAATT" || die;
The -sequence argument is required.
Complements or reverse complements a DNA sequence.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
If you also pass a true value as the second argument (or with the -reverse flag), the sequence will be reversed and complemented.
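For string input, complementing comes down to a transliteration. The sketch below assumes nothing about GeneDesign's internals; complement_sketch is a hypothetical name, and the handling of IUPAC ambiguity codes is included as an assumption about what a complement function should do with them.

```perl
use strict;
use warnings;

# Hypothetical helper, not the GeneDesign implementation. Complements a
# DNA string, including IUPAC ambiguity codes (W, S, and N are their own
# complements); a true second argument also reverses the result.
sub complement_sketch {
    my ($seq, $reverse) = @_;
    (my $comp = $seq) =~ tr/ACGTRYMKBDHVacgtrymkbdhv/TGCAYRKMVHDBtgcayrkmvhdb/;
    return $reverse ? scalar reverse($comp) : $comp;
}

print complement_sketch("AATTCG"), "\n";      # TTAAGC
print complement_sketch("AATTCG", 1), "\n";   # CGAATT
```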
rcomplement
Sugar time!
$my_seq = "AATTCG";
my $reverse_complemented_seq = $GD->rcomplement($my_seq);
$reverse_complemented_seq eq "CGAATT" || die;
#clean
my $reverse_complemented_seq = $GD->complement(-sequence => $my_seq,
-reverse => 1);
$reverse_complemented_seq eq "CGAATT" || die;
The -sequence argument is required.
Reverse complements a DNA sequence.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
transcribe
$my_seq = "AATTCG";
my $RNA_seq = $GD->transcribe($my_seq);
$RNA_seq eq "AAUUCG" || die;
The -sequence argument is required.
Transcribes an RNA sequence from a DNA sequence.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
count
$my_seq = "AATTCG";
my $count = $GD->count($my_seq);
$count->{C} == 1 || die;
$count->{G} == 1 || die;
$count->{A} == 2 || die;
$count->{GCp} == 33.3 || die;
$count->{ATp} == 66.7 || die;
#clean
my $count = $GD->count(-sequence => $my_seq);
You must pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object.
The count function counts the bases in a DNA sequence and returns a hash reference where the keys are the bases (including the ambiguous bases) and the values are the number of times each appears in the sequence. There are also the special keys GCp and ATp, which hold the GC and AT percentages.
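The shape of the returned hash can be illustrated without GeneDesign. In this sketch count_sketch is a hypothetical name, and two details are assumptions: percentages are rounded to one decimal place, and only unambiguous G, C, A, and T contribute to them.

```perl
use strict;
use warnings;

# Hypothetical helper, not the GeneDesign implementation. Tallies every
# base and adds GCp and ATp as percentages of the sequence length.
sub count_sketch {
    my ($seq) = @_;
    my %count = map { $_ => 0 } split //, "ACGTRYMKWSBDHVN";
    $count{$_}++ for split //, uc $seq;
    my $len = length $seq;
    $count{GCp} = $len ? sprintf("%.1f", 100 * ($count{G} + $count{C}) / $len) : 0;
    $count{ATp} = $len ? sprintf("%.1f", 100 * ($count{A} + $count{T}) / $len) : 0;
    return \%count;
}

my $count = count_sketch("AATTCG");
print "$count->{A} $count->{GCp} $count->{ATp}\n";   # 2 33.3 66.7
```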
GC_windows
Takes a nucleotide sequence, a window size, and minimum and maximum values. Returns lists of real coordinates of subsequences that violate the minimum or maximum GC percentages.
Values are returned inside an array reference such that the first value is an array ref of minimum violators (as array refs of left/right coordinates), and the second value is an array ref of maximum violators.
$return_value = [
[[left, right], [left, right]], #minimum violators
[[left, right], [left, right]] #maximum violators
];
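A sliding window scan that produces the same nested structure might look like the following. It is a sketch under stated assumptions, not the GeneDesign algorithm: gc_windows_sketch is a hypothetical name, coordinates are taken to be 0-based and inclusive, and overlapping windows are reported individually rather than merged.

```perl
use strict;
use warnings;

# Hypothetical helper, not the GeneDesign implementation. Records
# [left, right] for every window whose GC percentage is below $min or
# above $max.
sub gc_windows_sketch {
    my ($seq, $window, $min, $max) = @_;
    my (@low, @high);
    for my $left (0 .. length($seq) - $window) {
        my $sub = substr $seq, $left, $window;
        my $gcp = 100 * ($sub =~ tr/GCgc//) / $window;
        push @low,  [$left, $left + $window - 1] if $gcp < $min;
        push @high, [$left, $left + $window - 1] if $gcp > $max;
    }
    return [\@low, \@high];
}

my $violators = gc_windows_sketch("AAAAGGGG", 4, 25, 75);
print scalar @{ $violators->[0] }, " low, ", scalar @{ $violators->[1] }, " high\n";
```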
regex_nt
my $my_seq = "ABC";
my $regex = $GD->regex_nt(-sequence => $my_seq);
# $regex is qr/A[CGT]C/;
my $regarr = $GD->regex_nt(-sequence => $my_seq, -reverse_complement => 1);
# $regarr is [qr/A[CGT]C/, qr/G[ACG]T/]
You must pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed with the -sequence flag.
regex_nt creates a compiled regular expression or a set of them that can be used to query large nucleotide sequences for possibly ambiguous subsequences.
If you want to get regular expressions for both the forward and reverse senses of the DNA, use the -reverse_complement flag and expect a reference to an array of compiled regexes.
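The translation from ambiguity codes to a character class is a table lookup per base. The sketch below shows the idea for the forward sense only; nt_regex_sketch is a hypothetical name, and the table is the standard IUPAC nucleotide code rather than anything read from GeneDesign.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the regex fragment each one stands for.
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => '[AG]', Y => '[CT]', M => '[AC]', K => '[GT]',
    W => '[AT]', S => '[CG]', B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Hypothetical helper, not the GeneDesign implementation: turns a
# possibly ambiguous sequence into a compiled regex.
sub nt_regex_sketch {
    my ($seq) = @_;
    my $pattern = '';
    for my $base (split //, uc $seq) {
        die "unknown base $base" unless exists $IUPAC{$base};
        $pattern .= $IUPAC{$base};
    }
    return qr/$pattern/;
}

my $regex = nt_regex_sketch("ABC");
print "match\n"    if "GGATCGG" =~ $regex;   # ATC fits A[CGT]C
print "no match\n" if "GGAACGG" !~ $regex;   # AAC does not
```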
regex_aa
my $my_pep = "AEQ*";
my $regex = $GD->regex_aa(-sequence => $my_pep);
$regex eq qr/AEQ[\*]/ || die;
Creates a compiled regular expression or a set of them that can be used to query large amino acid sequences for smaller subsequences.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed with the -sequence flag.
sequence_is_ambiguous
my $my_seq = "ABC";
my $flag = $GD->sequence_is_ambiguous($my_seq);
$flag == 1 || die;
$my_seq = "ATC";
$flag = $GD->sequence_is_ambiguous($my_seq);
$flag == 0 || die;
Checks to see if a DNA sequence contains ambiguous bases (RYMKWSBDHVN) and returns true if it does.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
ambiguous_translation
my $my_seq = "ABC";
my @peps = $GD->ambiguous_translation(-sequence => $my_seq, -frame => 1);
# @peps contains I, S, and T (the translations of ATC, AGC, and ACC)
You must pass a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
Translates a nucleotide sequence that may have ambiguous bases and returns an array of possible peptides.
The frame argument may be 1, 2, 3, -1, -2, or -3. It may also be t (three, 1, 2, 3), or s (six, 1, 2, 3, -1, -2, -3). It defaults to 1.
ambiguous_transcription
my $my_seq = "ABC";
my $seqs = $GD->ambiguous_transcription($my_seq);
# $seqs is [qw(ACC AGC ATC)]
Deambiguates a nucleotide sequence that may have ambiguous bases and returns a reference to a sorted array of possible unambiguous sequences.
You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
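Deambiguation is an expansion over every position of the sequence. A minimal sketch follows, with deambiguate_sketch as a hypothetical name and the standard IUPAC table as the only input; it is not taken from the GeneDesign source.

```perl
use strict;
use warnings;

# Bases represented by each IUPAC nucleotide code.
my %EXPAND = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => 'AG', Y => 'CT', M => 'AC', K => 'GT', W => 'AT', S => 'CG',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: grows the list of candidate sequences one
# position at a time, then sorts the result.
sub deambiguate_sketch {
    my ($seq) = @_;
    my @results = ('');
    for my $base (split //, uc $seq) {
        my @choices = split //, ($EXPAND{$base} // $base);
        my @next;
        for my $prefix (@results) {
            push @next, $prefix . $_ for @choices;
        }
        @results = @next;
    }
    return [sort @results];
}

print join(' ', @{ deambiguate_sketch("ABC") }), "\n";   # ACC AGC ATC
```

Note that the number of results is the product of the choices at each position, so a run of N bases grows the list by a factor of four per base.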
positions
my $seq = "TGCTGACTGCAGTCAGTACACTACGTACGTGCATGAC";
my $seek = "CWC";
my $positions = $GD->positions(-sequence => $seq,
-query => $seek);
# $positions is {18 => "CAC"}
$positions = $GD->positions(-sequence => $seq,
-query => $seek,
-reverse_complement => 1);
# $positions is {18 => "CAC", 28 => "GTG"}
Finds and returns all the positions and sequences of a potentially ambiguous subsequence in a larger sequence. The reverse_complement flag is off by default.
You can pass either string variables, Bio::Seq objects, or Bio::SeqFeatureI objects as the sequence and query arguments; additionally you may pass a RestrictionEnzyme object as the query argument.
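The example results above can be reproduced with a regex scan over a plain string. In this sketch positions_sketch is a hypothetical name and the 0-based offsets are inferred from the example output; it does not handle Bio::Seq or RestrictionEnzyme input.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the regex fragment each one stands for.
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => '[AG]', Y => '[CT]', M => '[AC]', K => '[GT]',
    W => '[AT]', S => '[CG]', B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Hypothetical helper, not the GeneDesign implementation. Returns a hash
# reference of 0-based offset => matched subsequence.
sub positions_sketch {
    my ($seq, $query, $revcomp) = @_;
    my @queries = ($query);
    if ($revcomp) {
        (my $rc = reverse $query) =~ tr/ACGTRYMKBDHV/TGCAYRKMVHDB/;
        push @queries, $rc if $rc ne $query;
    }
    my %found;
    for my $q (@queries) {
        my $pattern = join '', map { $IUPAC{$_} } split //, $q;
        # The lookahead lets overlapping matches be reported too.
        while ($seq =~ /(?=($pattern))/g) {
            $found{ pos($seq) } = $1;
        }
    }
    return \%found;
}

my $hits = positions_sketch("TGCTGACTGCAGTCAGTACACTACGTACGTGCATGAC", "CWC", 1);
print "$_ => $hits->{$_}\n" for sort { $a <=> $b } keys %$hits;
```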
parse_organisms
Returns two hash references. The first contains the names of all RSCU tables. The second contains the names of all codon tables.
set_codontable
# load a codon table from the GeneDesign configuration directory
$GD->set_codontable(-organism_name => "yeast");
# load a codon table from an arbitrary path and catch it in a variable
my $codon_t = $GD->set_codontable(-organism_name => "custom",
-table_path => "/path/to/table.ct");
The -organism_name argument is required.
This function loads, sets, and returns a codon definition table. After it is run the accessor codontable will return the hash reference that represents the codon table.
If no path is provided, the configuration directory /codon_tables is checked for tables that match the provided organism name. Any codon table that is using a non standard definition for a codon will cause a warning to be issued.
The table format for codon tables is
# Standard genetic code
{TTT} = F
{TTC} = F
{TTA} = L
...
See NCBI's table
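The table format is simple enough to read with a few lines of Perl. The parser below is a sketch, not the loader GeneDesign uses: parse_table_sketch is a hypothetical name, and building the reverse table as amino acid to list of codons is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical parser for the "{CODON} = value" layout shown above.
sub parse_table_sketch {
    my ($text) = @_;
    my %table;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(#|$)/;   # skip comments and blank lines
        if ($line =~ /^\s*\{([A-Z]{3})\}\s*=\s*(\S+)/) {
            $table{$1} = $2;
        }
    }
    return \%table;
}

my $codon_t = parse_table_sketch(
    "# Standard genetic code\n{TTT} = F\n{TTC} = F\n{TTA} = L\n");

# A reverse table groups the codons under the amino acid they encode.
my %reverse;
push @{ $reverse{ $codon_t->{$_} } }, $_ for sort keys %$codon_t;
print "F: @{ $reverse{F} }\n";   # F: TTC TTT
```

The same parser reads the RSCU table format, since only the value to the right of the equals sign differs.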
set_rscutable
# load a RSCU table from the GeneDesign configuration directory
$GD->set_rscutable(-organism_name => "yeast");
# load an RSCU table from an arbitrary path and catch it in a variable
my $rscu_t = $GD->set_rscutable(-organism_name => "custom",
-table_path => "/path/to/table.rscu");
The -organism_name argument is required.
This function loads, sets, and returns an RSCU table. After it is run the accessor rscutable will return the hash reference that represents the RSCU table.
If no path is provided, the configuration directory /codon_tables is checked for tables that match the provided organism name. If there is no table in that directory, a warning will appear and the flat RSCU table will be used.
Any RSCU table that is missing a definition for a codon will cause a warning to be issued. The table format for RSCU tables is
# Saccharomyces cerevisiae (Highly expressed genes)
# Nucleic Acids Res 16, 8207-8211 (1988)
{TTT} = 0.19
{TTC} = 1.81
{TTA} = 0.49
...
See Sharp et al. 1986.
set_organism
# load both codon tables and RSCU tables simultaneously
$GD->set_organism(-organism_name => "yeast");
# with arguments
$GD->set_organism(-organism_name => "custom",
-table_path => "/path/to/table.ct",
-rscu_path => "/path/to/table.rscu");
The -organism_name argument is required.
This function is just a shortcut; it runs set_codontable and set_rscutable. See those functions for details.
codon_count
# count the codons in a list of sequences
my $tally = $GD->codon_count(-input => \@sequences);
# add a gene to an existing codon count
$tally = $GD->codon_count(-input => $sequence,
-count => $tally);
# add a list of Bio::Seq objects to an existing codon count
$tally = $GD->codon_count(-input => \@seqobjects,
-count => $tally);
The -input argument is required and will take a string variable, a Bio::Seq object, a Bio::SeqFeatureI object, or a reference to an array full of any combination of those things.
The codon_count function takes a set of sequences and counts how often each codon appears in them. It returns a hash reference where the keys are upper case nucleotide codons and the values are integers. If you pass a hash reference containing codon counts with the -count argument, new counts will be added to the old values.
This function will warn you if non nucleotide codons are found.
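The accumulation behaviour of the -count argument can be sketched for plain strings as follows. codon_count_sketch is a hypothetical name, and skipping any triplet that is not made of A, C, G, and T (with a warning) is an assumption; how GeneDesign treats ambiguous codons is not documented here.

```perl
use strict;
use warnings;

# Hypothetical helper, not the GeneDesign implementation. Accepts a
# string or an array reference of strings, plus an optional existing
# tally to add to.
sub codon_count_sketch {
    my ($input, $tally) = @_;
    $tally ||= {};
    my @seqs = ref($input) eq 'ARRAY' ? @$input : ($input);
    for my $seq (@seqs) {
        for my $codon (uc($seq) =~ /(.{3})/g) {
            if ($codon =~ /^[ACGT]{3}$/) {
                $tally->{$codon}++;
            }
            else {
                warn "skipping non nucleotide codon $codon\n";
            }
        }
    }
    return $tally;
}

my $tally = codon_count_sketch("ATGTTTTTTTAA");
$tally = codon_count_sketch(["TTTATG"], $tally);
print "$tally->{ATG} $tally->{TTT} $tally->{TAA}\n";   # 2 3 1
```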
TODO: what about ambiguous codons?
generate_RSCU_table
my $rscu_t = $GD->generate_RSCU_table(-sequences => \@list_of_sequences);
The -sequences argument is required and will take a string variable, a Bio::Seq object, a Bio::SeqFeatureI object, or a reference to an array full of any combination of those things.
The generate_RSCU_table function takes a set of sequences, counts how often each codon appears, and returns an RSCU table as a hash reference where the keys are upper case nucleotide codons and the values are floats.
See Sharp et al. 1986.
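An RSCU value is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below applies that definition to an existing codon tally; rscu_sketch and %SYNONYMS are hypothetical names, only two amino acids are listed to keep it short, and the ATG count is made up. The TTT and TTC counts are borrowed from the sample report in the generate_codon_report entry.

```perl
use strict;
use warnings;

# Codons grouped by amino acid. A real table covers all 64 codons.
my %SYNONYMS = (F => [qw(TTT TTC)], M => [qw(ATG)]);

# Hypothetical helper, not the GeneDesign implementation.
sub rscu_sketch {
    my ($tally) = @_;
    my %rscu;
    for my $aa (keys %SYNONYMS) {
        my @codons = @{ $SYNONYMS{$aa} };
        my $total = 0;
        $total += ($tally->{$_} // 0) for @codons;
        for my $codon (@codons) {
            # observed * (number of synonymous codons) / (total for the amino acid)
            $rscu{$codon} = $total
                ? sprintf("%.2f", ($tally->{$codon} // 0) * @codons / $total)
                : sprintf("%.2f", 0);
        }
    }
    return \%rscu;
}

my $rscu_t = rscu_sketch({ TTT => 12800, TTC => 21837, ATG => 100 });
print "$rscu_t->{TTT} $rscu_t->{TTC} $rscu_t->{ATG}\n";   # 0.74 1.26 1.00
```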
generate_codon_report
my $report = $GD->generate_codon_report(-sequences => \@list_of_sequences);
The report will have the format
TTT (F) 12800 0.74
TTC (F) 21837 1.26
TTA (L) 4859 0.31
TTG (L) 18806 1.22
where the first column in each group is the codon, the second column is the one letter amino acid abbreviation in parentheses, the third column is the number of times that codon has been seen, and the fourth column is the RSCU value for that codon.
This report comes in a 4x4 layout, as would a standard genetic code table in a textbook.
NO TEST
generate_RSCU_file
my $contents = $GD->generate_RSCU_file(
-sequences => \@seqs,
-comments => ["Got these codons from mice"]
);
open (my $OUT, '>', '/path/to/cods') || die "can't write to /path/to/cods";
print $OUT $contents;
close $OUT;
This function generates a string that can be written to file to serve as a GeneDesign RSCU table. Provide a set of sequences and an optional array reference of comments to prepend to the file.
The file will have the format
# Comment 1
# ...
# Comment n
{TTT} = 0.19
{TTC} = 1.81
...
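Producing that layout from an RSCU hash is a matter of string formatting. In this sketch rscu_file_sketch is a hypothetical name, and both the alphabetical codon order and the two decimal places are assumptions rather than documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper, not the GeneDesign implementation. Builds the
# file contents from an RSCU hash and an optional list of comments.
sub rscu_file_sketch {
    my ($rscu, $comments) = @_;
    my $out = '';
    $out .= "# $_\n" for @{ $comments || [] };
    $out .= sprintf("{%s} = %.2f\n", $_, $rscu->{$_}) for sort keys %$rscu;
    return $out;
}

print rscu_file_sketch({ TTT => 0.19, TTC => 1.81 }, ["Got these codons from mice"]);
```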
NO TEST
list_enzyme_sets
my @available_enzlists = $GD->list_enzyme_sets();
# @available_enzlists == ('standard_and_IIB', 'blunts', 'IIB', 'nonpal', ...)
Returns an array containing the names of every restriction enzyme recognition list GeneDesign knows about.
set_restriction_enzymes
$GD->set_restriction_enzymes(-enzyme_set => 'blunts');
or
$GD->set_restriction_enzymes(-list_path => '/path/to/enzyme_file');
or even
$GD->set_restriction_enzymes(
-list_path => '/path/to/enzyme_file',
-enzyme_set => 'custom_enzymes'
);
All will return a hash structure full of restriction enzymes.
Tell GeneDesign which set of restriction enzymes to use. If you provide only a set name with the -enzyme_set flag, GeneDesign will check its config path for a matching file. Otherwise you must provide a path to a file (and optionally a name for the set).
remove_from_enzyme_set
Removes a subset of enzymes from an enzyme list. This only happens in memory; no files will be altered. The argument is an array reference of enzyme names.
$GD->set_restriction_enzymes(-enzyme_set => 'blunts');
$GD->remove_from_enzyme_set(-enzymes => ['SmaI', 'MlyI']);
NO TEST
add_to_enzyme_set
Adds a subset of enzymes to an enzyme list. This only happens in memory; no files will be altered. The argument is an array reference of RestrictionEnzyme objects.
#Grab all known enzymes
my $allenz = $GD->set_restriction_enzymes(-enzyme_set => 'standard_and_IIB');
#Pull out a few
my @keepers = ($allenz->{'BmrI'}, $allenz->{'HphI'});
#Give GeneDesign the enzyme set you want
$GD->set_restriction_enzymes(-enzyme_set => 'blunts');
#Add the few enzymes it didn't have before
$GD->add_to_enzyme_set(-enzymes => \@keepers);
NO TEST
restriction_status
build_prefix_tree
Takes an array reference of nucleotide sequences (they can be strings, Bio::Seq objects, or Bio::GeneDesign::RestrictionEnzyme objects) and creates a prefix tree. If you add the peptide flag, the sequences will be ambiguously translated before they are added to the tree. Otherwise they will be ambiguously transcribed. It will add the reverse complement of any non peptide sequence as long as the reverse complement is different.
my $tree = $GD->build_prefix_tree(-input => ['GGATCC']);
my $ptree = $GD->build_prefix_tree(
-input => ['GGCCNNNNNGGCC'],
-peptide => 1
);
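The underlying data structure can be pictured as nested hashes. The sketch below is a generic trie and is not Bio::GeneDesign::PrefixTree: trie_add and trie_search are hypothetical names, input is limited to unambiguous strings, and the hit format of [name, 0-based offset, matched sequence] is an assumption.

```perl
use strict;
use warnings;

# Each node is a hash of child nodes keyed by a single base; the key
# "ids" lists the names of the sequences that end at that node.
sub trie_add {
    my ($root, $name, $seq) = @_;
    my $node = $root;
    $node = $node->{$_} ||= {} for split //, $seq;
    push @{ $node->{ids} }, $name;
}

# Walk the trie from every start position and collect complete matches.
sub trie_search {
    my ($root, $text) = @_;
    my @hits;
    for my $start (0 .. length($text) - 1) {
        my $node = $root;
        for my $pos ($start .. length($text) - 1) {
            $node = $node->{ substr($text, $pos, 1) } or last;
            for my $name (@{ $node->{ids} || [] }) {
                push @hits, [$name, $start, substr($text, $start, $pos - $start + 1)];
            }
        }
    }
    return \@hits;
}

my $tree = {};
trie_add($tree, 'BamHI', 'GGATCC');
my $hits = trie_search($tree, 'AAAGGATCCAA');
print "$_->[0] at $_->[1]: $_->[2]\n" for @$hits;   # BamHI at 3: GGATCC
```

One pass over the text checks every stored sequence at once, which is the reason to prefer a tree over running one regex per enzyme.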
search_prefix_tree
Takes a prefix tree and a sequence and searches for results, which are returned as in the Bio::GeneDesign::PrefixTree documentation.
my $hits = $GD->search_prefix_tree(-tree => $ptree, -sequence => $mygeneseq);
# @$hits = (['BamHI', 4, 'GGATCC', "I hope this didn't pop up"],
# ['OhnoI', 21, 'GGCCC', 'I hope these pop up'],
# ['WoopsII', 21, 'GGCCC', 'I hope these pop up']
#);
pattern_aligner
pattern_adder
codon_change_type
translate
reverse_translate_algorithms
reverse_translate
codon_juggle_algorithms
codon_juggle
subtract_sequence
repeat_smash
make_amplification_primers
NO TEST
contains_homopolymer
Returns 1 if the sequence contains a homopolymer of the provided length (default is 5 bp) and 0 otherwise.
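A homopolymer check can be written as a single backreference regex. This is a sketch with a hypothetical name, has_homopolymer_sketch, and it assumes string input and a length of at least 2.

```perl
use strict;
use warnings;

# Hypothetical helper, not the GeneDesign implementation. Looks for any
# character followed by ($length - 1) copies of itself.
sub has_homopolymer_sketch {
    my ($seq, $length) = @_;
    $length = 5 unless defined $length;
    my $repeats = $length - 1;
    return $seq =~ /(.)\1{$repeats}/ ? 1 : 0;
}

print has_homopolymer_sketch("ACGTTTTTA"), "\n";   # 1 (five T bases)
print has_homopolymer_sketch("ACGTTTTA"), "\n";    # 0 (only four)
```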
filter_homopolymers
find_runs
make_graph
make_dotplot
import_seqs
NO TEST
import_seq_from_string
NO TEST
export_formats
Export formats that have been tried and tested to work well.
export_seqs
NO TEST
random_dna
replace_ambiguous_bases
PLEASANTRIES
pad
my $name = 5;
my $nice = $GD->pad($name, 3);
$nice eq "005" || die;
$name = "oligo";
$nice = $GD->pad($name, 7, "_");
$nice eq "__oligo" || die;
Pads an integer with leading zeroes (by default) or any provided set of characters. This is useful both to make reports pretty and to standardize the length of designations.
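The padding rule itself is small enough to show in full. pad_sketch is a hypothetical stand-in for illustration and assumes the padding argument is a single character.

```perl
use strict;
use warnings;

# Hypothetical helper, not the GeneDesign implementation. Left-pads
# $value to $width with $char, which defaults to "0".
sub pad_sketch {
    my ($value, $width, $char) = @_;
    $char = "0" unless defined $char;
    my $missing = $width - length $value;
    return $missing > 0 ? ($char x $missing) . $value : $value;
}

print pad_sketch(5, 3), "\n";              # 005
print pad_sketch("oligo", 7, "_"), "\n";   # __oligo
```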
attitude
my $adverb = $GD->attitude();
Ask GeneDesign how it handled your request.
endslash
_stripdown
_checkref
COPYRIGHT AND LICENSE
Copyright (c) 2015, Sarah Richardson All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* The names of Johns Hopkins, the Joint Genome Institute, the Lawrence Berkeley National Laboratory, the Department of Energy, and the GeneDesign developers may not be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.