NAME

Bio::GeneDesign

VERSION

Version 5.56

DESCRIPTION

AUTHOR

Sarah Richardson <smrichardson@lbl.gov>

CONSTRUCTORS

new

Returns an initialized Bio::GeneDesign object.

This function reads the ConfigData written at installation, imports the relevant sublibraries, and sets the relevant paths.

my $GD = Bio::GeneDesign->new();

ACCESSORS

codon_path

Returns the directory containing codon tables.

EMBOSS

Returns a value if EMBOSS_support was vetted and approved during installation.

BLAST

Returns a value if BLAST_support was vetted and approved during installation.

graph

Returns a value if graphing_support was vetted and approved during installation.

vmatch

Returns a value if vmatch_support was vetted and approved during installation.

enzyme_set

Returns a hash reference where the keys are enzyme names and the values are RestrictionEnzyme objects, if the enzyme set has been defined.

To set this value, use set_restriction_enzymes.

enzyme_set_name

Returns the name of the enzyme set in use, if there is one.

To set this value, use set_restriction_enzymes.

all_enzymes

Returns a hash reference where the keys are enzyme names and the values are RestrictionEnzyme objects.

To set this value, use set_restriction_enzymes.

organism

Returns the name of the organism in use, if there is one.

To set this value, use set_organism.

codontable

Returns the codon table in use, if there is one.

The codon table is a hash reference where the keys are upper case nucleotides and the values are upper case single letter amino acids.

my $codon_t = $GD->codontable();
$codon_t->{"ATG"} eq "M" || die;

To set this value, use set_codontable.

reversecodontable

Returns the reverse codon table in use, if there is one.

The reverse codon table is a hash reference where the keys are upper case single letter amino acids and the values are upper case nucleotides.

my $revcodon_t = $GD->reversecodontable();
$revcodon_t->{"M"} eq "ATG" || die;

This value is set automatically when set_codontable is run.

rscutable

Returns the RSCU table in use, if there is one.

The RSCU codon table is a hash reference where the keys are upper case nucleotides and the values are floats.

my $rscu_t = $GD->rscutable();
$rscu_t->{"ATG"} == 1.00 || die;

To set this value, use set_rscutable.

FUNCTIONS

melt

my $Tm = $GD->melt(-sequence => $myseq);

The -sequence argument is required.

Returns the melting temperature of a DNA sequence.

You can set the salt and DNA concentrations with the -salt and -concentration arguments; they default to 50 mM (.05) and 100 nM (.0000001) respectively.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be analyzed with the -sequence flag.

There are four different formulae to choose from. If you wish to use the nearest neighbor method, use the -nearest_neighbor flag. Otherwise the appropriate formula will be determined by the length of your -sequence argument.

For sequences under 14 base pairs: Tm = (4 * #GC) + (2 * #AT).

For sequences between 14 and 50 base pairs: Tm = 100.5 + (41 * #GC / length) - (820 / length) + 16.6 * log10(salt)

For sequences over 50 base pairs: Tm = 81.5 + (41 * #GC / length) - (500 / length) + 16.6 * log10(salt) - .62
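The three length-based formulae above can be sketched in plain Perl. This only illustrates the arithmetic and is not GeneDesign's own code: the name simple_tm, the 0.05 default for salt, and the choice to treat a sequence of exactly 50 bp with the middle formula are assumptions, and the nearest neighbor method is not covered.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::GeneDesign): applies the three
# length-based formulae quoted above. $salt is a molar concentration.
sub simple_tm {
  my ($seq, $salt) = @_;
  $salt = 0.05 unless defined $salt;       # assumed default, 50 mM
  my $len = length($seq);
  my $gc  = ($seq =~ tr/GCgc//);           # number of G and C bases
  my $at  = ($seq =~ tr/ATat//);           # number of A and T bases
  my $log_salt = log($salt) / log(10);     # log10(salt)

  return (4 * $gc) + (2 * $at) if $len < 14;
  return 100.5 + (41 * $gc / $len) - (820 / $len) + 16.6 * $log_salt
    if $len <= 50;                         # assumption: 50 bp uses this formula
  return 81.5 + (41 * $gc / $len) - (500 / $len) + 16.6 * $log_salt - 0.62;
}

print simple_tm("AATTCG"), "\n";           # 6 bp: (4 * 2) + (2 * 4) = 16
```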

complement

my $my_seq = "AATTCG";

my $complemented_seq = $GD->complement($my_seq);
$complemented_seq eq "TTAAGC" || die;

my $reverse_complemented_seq = $GD->complement($my_seq, 1);
$reverse_complemented_seq eq "CGAATT" || die;

# clean (named argument) style
$complemented_seq = $GD->complement(-sequence => $my_seq);
$complemented_seq eq "TTAAGC" || die;

$reverse_complemented_seq = $GD->complement(-sequence => $my_seq,
                                            -reverse => 1);
$reverse_complemented_seq eq "CGAATT" || die;

The -sequence argument is required.

Complements or reverse complements a DNA sequence.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.

If you also pass a true value, either as the second positional argument or with the -reverse flag, the sequence will be reverse complemented.
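For plain strings, the operation can be sketched with Perl's tr operator. The name complement_sketch is hypothetical, and this stand-in handles neither Bio::Seq objects nor named arguments; it only shows the base pairing, including the IUPAC ambiguity codes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the string case only.
sub complement_sketch {
  my ($seq, $reverse) = @_;
  my $comp = $seq;
  # Swap each base for its partner. W, S, and N are their own
  # complements, so they are left alone.
  $comp =~ tr/ACGTRYMKBDHVacgtrymkbdhv/TGCAYRKMVHDBtgcayrkmvhdb/;
  $comp = scalar reverse $comp if $reverse;
  return $comp;
}

print complement_sketch("AATTCG"), "\n";      # TTAAGC
print complement_sketch("AATTCG", 1), "\n";   # CGAATT
```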

rcomplement

Sugar time!

my $my_seq = "AATTCG";

my $reverse_complemented_seq = $GD->rcomplement($my_seq);
$reverse_complemented_seq eq "CGAATT" || die;

# clean (named argument) style, through complement

$reverse_complemented_seq = $GD->complement(-sequence => $my_seq,
                                            -reverse => 1);
$reverse_complemented_seq eq "CGAATT" || die;

The -sequence argument is required.

Reverse complements a DNA sequence.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.

transcribe

my $my_seq = "AATTCG";

my $RNA_seq = $GD->transcribe($my_seq);
$RNA_seq eq "AAUUCG" || die;

The -sequence argument is required.

Transcribes an RNA sequence from a DNA sequence.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.

count

my $my_seq = "AATTCG";
my $count = $GD->count($my_seq);
$count->{C} == 1 || die;
$count->{G} == 1 || die;
$count->{A} == 2 || die;
$count->{GCp} == 33.3 || die;
$count->{ATp} == 66.7 || die;

# clean (named argument) style
$count = $GD->count(-sequence => $my_seq);

You must pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object.

The count function counts the bases in a DNA sequence and returns a hash reference in which the keys are bases (including the ambiguous bases) and the values are the number of times each appears in the sequence. The hash also carries two special keys, GCp and ATp, which hold the GC and AT percentages.
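A rough sketch of such a tally for a plain string follows. The name count_sketch is hypothetical, rounding the percentages to one decimal place is an assumption based on the example above, and unlike the description this sketch only creates keys for bases that actually occur.

```perl
use strict;
use warnings;

# Hypothetical sketch: tally every character, then add GC and AT percentages.
sub count_sketch {
  my ($seq) = @_;
  my %count;
  $count{$_}++ for split //, uc $seq;
  my $len = length($seq);
  my $gc  = ($count{G} || 0) + ($count{C} || 0);
  my $at  = ($count{A} || 0) + ($count{T} || 0);
  # Assumption: percentages are rounded to one decimal place.
  $count{GCp} = $len ? sprintf("%.1f", 100 * $gc / $len) : 0;
  $count{ATp} = $len ? sprintf("%.1f", 100 * $at / $len) : 0;
  return \%count;
}

my $tally = count_sketch("AATTCG");
print "A=$tally->{A} GCp=$tally->{GCp} ATp=$tally->{ATp}\n";
```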

GC_windows

Takes a nucleotide sequence, a window size, and minimum and maximum values. Returns lists of real coordinates of subsequences that violate the minimum or maximum GC percentage.

Values are returned inside an array reference such that the first value is an array ref of minimum violators (as array refs of left/right coordinates), and the second value is an array ref of maximum violators.

$return_value = [
  [[left, right], [left, right]], #minimum violators
  [[left, right], [left, right]]  #maximum violators
];
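The return structure can be illustrated with a simple sliding window scan. Everything here is an assumption made for illustration: the name gc_windows_sketch, the positional argument order, the 1-based inclusive coordinates, and the choice to report every violating window separately rather than merging overlapping ones.

```perl
use strict;
use warnings;

# Hypothetical sketch of a sliding window GC scan.
sub gc_windows_sketch {
  my ($seq, $window, $min, $max) = @_;
  my (@low, @high);
  for my $start (0 .. length($seq) - $window) {
    my $sub = substr($seq, $start, $window);
    my $gcp = 100 * ($sub =~ tr/GCgc//) / $window;
    my $coords = [$start + 1, $start + $window];   # assumed 1-based, inclusive
    push @low,  $coords if $gcp < $min;
    push @high, $coords if $gcp > $max;
  }
  return [\@low, \@high];    # [minimum violators, maximum violators]
}

my $windows = gc_windows_sketch("AAAAGGGG", 4, 25, 75);
print "low:  @{$_}\n" for @{ $windows->[0] };
print "high: @{$_}\n" for @{ $windows->[1] };
```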

regex_nt

my $my_seq = "ABC";
my $regex = $GD->regex_nt(-sequence => $my_seq);
# $regex is qr/A[CGT]C/;

my $regarr = $GD->regex_nt(-sequence => $my_seq, -reverse_complement => 1);
# $regarr is [qr/A[CGT]C/, qr/G[ACG]T/]

You must pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed with the -sequence flag.

regex_nt creates a compiled regular expression or a set of them that can be used to query large nucleotide sequences for possibly ambiguous subsequences.

If you want to get regular expressions for both the forward and reverse senses of the DNA, use the -reverse_complement flag and expect a reference to an array of compiled regexes.
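The translation from ambiguity codes to a regular expression can be sketched as a lookup table of character classes. The names %IUPAC_CLASS and regex_nt_sketch are hypothetical; only the forward sense is built here, and this is a sketch rather than the module's implementation.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the character class each one stands for.
my %IUPAC_CLASS = (
  A => 'A',     C => 'C',     G => 'G',     T => 'T',
  R => '[AG]',  Y => '[CT]',  M => '[AC]',  K => '[GT]',
  W => '[AT]',  S => '[CG]',  B => '[CGT]', D => '[AGT]',
  H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Hypothetical sketch: build one compiled regex from a plain string.
sub regex_nt_sketch {
  my ($seq) = @_;
  my $pattern = '';
  for my $base (split //, uc $seq) {
    die "unknown base $base\n" unless exists $IUPAC_CLASS{$base};
    $pattern .= $IUPAC_CLASS{$base};
  }
  return qr/$pattern/;
}

my $regex = regex_nt_sketch("ABC");          # equivalent to qr/A[CGT]C/
print "match\n" if "TTAGCTT" =~ $regex;      # AGC matches A[CGT]C
```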

regex_aa

my $my_pep = "AEQ*";
my $regex = $GD->regex_aa(-sequence => $my_pep);
# $regex is qr/AEQ[\*]/

Creates a compiled regular expression or a set of them that can be used to query large amino acid sequences for smaller subsequences.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed with the -sequence flag.

sequence_is_ambiguous

my $my_seq = "ABC";
my $flag = $GD->sequence_is_ambiguous($my_seq);
$flag == 1 || die;

$my_seq = "ATC";
$flag = $GD->sequence_is_ambiguous($my_seq);
$flag == 0 || die;

Checks to see if a DNA sequence contains ambiguous bases (RYMKWSBDHVN) and returns true if it does.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.

ambiguous_translation

my $my_seq = "ABC";
my @peps = $GD->ambiguous_translation(-sequence => $my_seq, -frame => 1);
# @peps is qw(I T C)

You must pass a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.

Translates a nucleotide sequence that may have ambiguous bases and returns an array of possible peptides.

The frame argument may be 1, 2, 3, -1, -2, or -3. It may also be t (three, 1, 2, 3), or s (six, 1, 2, 3, -1, -2, -3). It defaults to 1.

ambiguous_transcription

my $my_seq = "ABC";
my $seqs = $GD->ambiguous_transcription($my_seq);
# $seqs is [qw(ACC AGC ATC)]

Disambiguates a nucleotide sequence that may have ambiguous bases and returns a reference to a sorted array of possible unambiguous sequences.

You can pass either a string variable, a Bio::Seq object, or a Bio::SeqFeatureI object to be processed.
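One way to picture the expansion is to grow a list of prefixes one base at a time. The names %IUPAC_BASES and expand_ambiguous are hypothetical, and the sketch only accepts plain strings.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the concrete bases each one can stand for.
my %IUPAC_BASES = (
  A => ['A'],       C => ['C'],       G => ['G'],       T => ['T'],
  R => [qw(A G)],   Y => [qw(C T)],   M => [qw(A C)],   K => [qw(G T)],
  W => [qw(A T)],   S => [qw(C G)],   B => [qw(C G T)], D => [qw(A G T)],
  H => [qw(A C T)], V => [qw(A C G)], N => [qw(A C G T)],
);

# Hypothetical sketch: list every unambiguous reading of a sequence.
sub expand_ambiguous {
  my ($seq) = @_;
  my @results = ('');
  for my $base (split //, uc $seq) {
    my $options = $IUPAC_BASES{$base} or die "unknown base $base\n";
    my @next;
    for my $prefix (@results) {
      push @next, $prefix . $_ for @$options;
    }
    @results = @next;
  }
  return [sort @results];
}

print join(' ', @{ expand_ambiguous("ABC") }), "\n";   # ACC AGC ATC
```

The number of results is the product of the option counts at each position, so long runs of N grow quickly.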

positions

my $seq = "TGCTGACTGCAGTCAGTACACTACGTACGTGCATGAC";
my $seek = "CWC";

my $positions = $GD->positions(-sequence => $seq,
                               -query => $seek);
# $positions is {18 => "CAC"}

$positions = $GD->positions(-sequence => $seq,
                            -query => $seek,
                            -reverse_complement => 1);
# $positions is {18 => "CAC", 28 => "GTG"}

Finds and returns all the positions and sequences of a potentially ambiguous subsequence in a larger sequence. The reverse_complement flag is off by default.

You can pass either string variables, Bio::Seq objects, or Bio::SeqFeatureI objects as the sequence and query arguments; additionally you may pass a RestrictionEnzyme object as the query argument.
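A search of this kind can be sketched with a lookahead match, which reports overlapping hits and gives offsets that agree with the example above (0-based). The names %NT_CLASS, query_regex, and positions_sketch are hypothetical, and only plain strings are handled.

```perl
use strict;
use warnings;

my %NT_CLASS = (
  A => 'A',     C => 'C',     G => 'G',     T => 'T',
  R => '[AG]',  Y => '[CT]',  M => '[AC]',  K => '[GT]',
  W => '[AT]',  S => '[CG]',  B => '[CGT]', D => '[AGT]',
  H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Turn a possibly ambiguous query into a compiled regex.
sub query_regex {
  my ($query) = @_;
  my $pattern = join '', map { $NT_CLASS{$_} } split //, uc $query;
  return qr/$pattern/;
}

# Hypothetical sketch: offset => matched subsequence.
sub positions_sketch {
  my ($seq, $query, $both_strands) = @_;
  my @queries = ($query);
  if ($both_strands) {
    my $rc = scalar reverse $query;
    $rc =~ tr/ACGTRYMKBDHV/TGCAYRKMVHDB/;
    push @queries, $rc if $rc ne $query;
  }
  my %found;
  for my $q (@queries) {
    my $regex = query_regex($q);
    # The zero-width lookahead lets overlapping matches be reported.
    while ($seq =~ /(?=($regex))/g) {
      $found{ pos($seq) } = $1;
    }
  }
  return \%found;
}

my $hits = positions_sketch("TGCTGACTGCAGTCAGTACACTACGTACGTGCATGAC", "CWC", 1);
print "$_ => $hits->{$_}\n" for sort { $a <=> $b } keys %$hits;
# 18 => CAC
# 28 => GTG
```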

parse_organisms

Returns two hash references. The first contains the names of all RSCU tables. The second contains the names of all codon tables.

set_codontable

# load a codon table from the GeneDesign configuration directory
$GD->set_codontable(-organism_name => "yeast");

# load a codon table from an arbitrary path and catch it in a variable
my $codon_t = $GD->set_codontable(-organism_name => "custom",
                                  -table_path => "/path/to/table.ct");

The -organism_name argument is required.

This function loads, sets, and returns a codon definition table. After it is run the accessor codontable will return the hash reference that represents the codon table.

If no path is provided, the configuration directory /codon_tables is checked for tables that match the provided organism name. Any codon table that is using a non standard definition for a codon will cause a warning to be issued.

The table format for codon tables is

# Standard genetic code
{TTT} = F
{TTC} = F
{TTA} = L
...

See NCBI's table
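The table format shown above is simple enough to read with a few lines of Perl. The sketch below parses table text held in a string; the name parse_table_text is hypothetical, and it performs none of the validation or warnings that set_codontable is described as doing.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse "{CODON} = value" lines into a hash reference.
sub parse_table_text {
  my ($text) = @_;
  my %table;
  for my $line (split /\n/, $text) {
    next if $line =~ /^\s*#/;     # skip comment lines
    next if $line !~ /\S/;        # skip blank lines
    if ($line =~ /^\{([A-Za-z]{3})\}\s*=\s*(\S+)/) {
      $table{ uc($1) } = $2;
    }
  }
  return \%table;
}

my $text = "# Standard genetic code\n{TTT} = F\n{TTC} = F\n{TTA} = L\n";
my $codon_t = parse_table_text($text);
print $codon_t->{TTA}, "\n";      # L
```

The same routine would read text in the RSCU table format, since only the values differ.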

set_rscutable

# load a RSCU table from the GeneDesign configuration directory
$GD->set_rscutable(-organism_name => "yeast");

# load an RSCU table from an arbitrary path and catch it in a variable
my $rscu_t = $GD->set_rscutable(-organism_name => "custom",
                                -table_path => "/path/to/table.rscu");

The -organism_name argument is required.

This function loads, sets, and returns an RSCU table. After it is run the accessor rscutable will return the hash reference that represents the RSCU table.

If no path is provided, the configuration directory /codon_tables is checked for tables that match the provided organism name. If there is no table in that directory, a warning will appear and the flat RSCU table will be used.

Any RSCU table that is missing a definition for a codon will cause a warning to be issued. The table format for RSCU tables is

# Saccharomyces cerevisiae (Highly expressed genes)
# Nucleic Acids Res 16, 8207-8211 (1988)
{TTT} = 0.19
{TTC} = 1.81
{TTA} = 0.49
...

See Sharp et al. 1986.

set_organism

# load both codon tables and RSCU tables simultaneously
$GD->set_organism(-organism_name => "yeast");

# with arguments
$GD->set_organism(-organism_name => "custom",
                  -table_path => "/path/to/table.ct",
                  -rscu_path => "/path/to/table.rscu");

The -organism_name argument is required.

This function is just a shortcut; it runs set_codontable and set_rscutable. See those functions for details.

codon_count

# count the codons in a list of sequences
my $tally = $GD->codon_count(-input => \@sequences);

# add a gene to an existing codon count
$tally = $GD->codon_count(-input => $sequence,
                          -count => $tally);

# add a list of Bio::Seq objects to an existing codon count
$tally = $GD->codon_count(-input => \@seqobjects,
                          -count => $tally);

The -input argument is required and will take a string variable, a Bio::Seq object, a Bio::SeqFeatureI object, or a reference to an array full of any combination of those things.

The codon_count function takes a set of sequences and counts how often each codon appears in them. It returns a hash reference where the keys are upper case nucleotide codons and the values are integers. If you pass a hash reference containing codon counts with the -count argument, new counts will be added to the old values.

This function will warn you if non nucleotide codons are found.

TODO: what about ambiguous codons?

generate_RSCU_table

my $rscu_t = $GD->generate_RSCU_table(-sequences => \@list_of_sequences);

The -sequences argument is required and will take a string variable, a Bio::Seq object, a Bio::SeqFeatureI object, or a reference to an array full of any combination of those things.

The generate_RSCU_table function takes a set of sequences, counts how often each codon appears, and returns an RSCU table as a hash reference where the keys are upper case nucleotide codons and the values are floats.

See Sharp et al. 1986.
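As a reminder of what an RSCU value is: for each codon, it is the observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below works on a single in-frame string with the standard genetic code. The names %STD_CODE and rscu_sketch, the rounding to two decimals, and the 0.00 value for amino acids that never appear are assumptions, not GeneDesign's documented behavior.

```perl
use strict;
use warnings;

# Standard genetic code, built from the usual TCAG ordering.
my @NT  = qw(T C A G);
my $AAS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %STD_CODE;
my $index = 0;
for my $x (@NT) {
  for my $y (@NT) {
    for my $z (@NT) {
      $STD_CODE{"$x$y$z"} = substr($AAS, $index++, 1);
    }
  }
}

# Hypothetical sketch: RSCU values for one in-frame sequence.
sub rscu_sketch {
  my ($seq) = @_;
  my %count = map { $_ => 0 } keys %STD_CODE;
  for (my $pos = 0; $pos + 3 <= length($seq); $pos += 3) {
    my $codon = uc substr($seq, $pos, 3);
    $count{$codon}++ if exists $STD_CODE{$codon};
  }
  # Total observations and number of synonymous codons per amino acid.
  my (%aa_total, %aa_codons);
  for my $codon (keys %STD_CODE) {
    my $aa = $STD_CODE{$codon};
    $aa_total{$aa}  += $count{$codon};
    $aa_codons{$aa} += 1;
  }
  my %rscu;
  for my $codon (keys %STD_CODE) {
    my $aa = $STD_CODE{$codon};
    my $expected = $aa_total{$aa} / $aa_codons{$aa};
    # Assumption: unseen amino acids get 0.00 rather than a flat 1.00.
    $rscu{$codon} = $expected ? sprintf("%.2f", $count{$codon} / $expected) : "0.00";
  }
  return \%rscu;
}

my $rscu = rscu_sketch("TTTTTTTTC");          # codons TTT, TTT, TTC
print "TTT $rscu->{TTT}  TTC $rscu->{TTC}\n"; # TTT 1.33  TTC 0.67
```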

generate_codon_report

my $report = $GD->generate_codon_report(-sequences => \@list_of_sequences);

The report will have the format

TTT (F) 12800 0.74
TTC (F) 21837 1.26
TTA (L)  4859 0.31
TTG (L) 18806 1.22

where the first column in each group is the codon, the second column is the one letter amino acid abbreviation in parentheses, the third column is the number of times that codon has been seen, and the fourth column is the RSCU value for that codon.

This report comes in a 4x4 layout, as would a standard genetic code table in a textbook.

NO TEST

generate_RSCU_file

my $contents = $GD->generate_RSCU_file(
  -sequences => \@seqs,
  -comments => ["Got these codons from mice"]
);
open (my $OUT, '>', '/path/to/cods') || die "can't write to /path/to/cods";
print $OUT $contents;
close $OUT;

This function generates a string that can be written to file to serve as a GeneDesign RSCU table. Provide a set of sequences and an optional array reference of comments to prepend to the file.

The file will have the format

# Comment 1
# ...
# Comment n
{TTT} = 0.19
{TTC} = 1.81
...

NO TEST

list_enzyme_sets

my @available_enzlists = $GD->list_enzyme_sets();
# @available_enzlists == ('standard_and_IIB', 'blunts', 'IIB', 'nonpal', ...)

Returns an array containing the names of every restriction enzyme recognition list GeneDesign knows about.

set_restriction_enzymes

$GD->set_restriction_enzymes(-enzyme_set => 'blunts');

or

$GD->set_restriction_enzymes(-list_path => '/path/to/enzyme_file');

or even

$GD->set_restriction_enzymes(
  -list_path => '/path/to/enzyme_file',
  -enzyme_set => 'custom_enzymes'
);

All will return a hash structure full of restriction enzymes.

Tell GeneDesign which set of restriction enzymes to use. If you provide only a set name with the -enzyme_set flag, GeneDesign will check its config path for a matching file. Otherwise you must provide a path to a file (and optionally a name for the set).

remove_from_enzyme_set

Removes a subset of enzymes from an enzyme list. This only happens in memory; no files will be altered. The argument is an array reference of enzyme names.

$GD->set_restriction_enzymes(-enzyme_set => 'blunts');
$GD->remove_from_enzyme_set(-enzymes => ['SmaI', 'MlyI']);

NO TEST

add_to_enzyme_set

Adds a subset of enzymes to an enzyme list. This only happens in memory; no files will be altered. The argument is an array reference of RestrictionEnzyme objects.

#Grab all known enzymes
my $allenz = $GD->set_restriction_enzymes(-enzyme_set => 'standard_and_IIB');

#Pull out a few
my @keepers = ($allenz->{'BmrI'}, $allenz->{'HphI'});

#Give GeneDesign the enzyme set you want
$GD->set_restriction_enzymes(-enzyme_set => 'blunts');

#Add the few enzymes it didn't have before
$GD->add_to_enzyme_set(-enzymes => \@keepers);

NO TEST

restriction_status

build_prefix_tree

Takes an array reference of nucleotide sequences (they can be strings, Bio::Seq objects, or Bio::GeneDesign::RestrictionEnzyme objects) and creates a prefix tree. If you add the -peptide flag, the sequences will be ambiguously translated before they are added to the prefix tree. Otherwise they will be ambiguously transcribed. The reverse complement of any non-peptide sequence is also added, as long as it differs from the original sequence.

my $tree = $GD->build_prefix_tree(-input => ['GGATCC']);

my $ptree = $GD->build_prefix_tree(
  -input => ['GGCCNNNNNGGCC'],
  -peptide => 1
);

search_prefix_tree

Takes a prefix tree and a sequence and searches for results, which are returned as described in the Bio::GeneDesign::PrefixTree documentation.

my $hits = $GD->search_prefix_tree(-tree => $ptree, -sequence => $mygeneseq);

# @$hits = (['BamHI', 4, 'GGATCC', "I hope this didn't pop up"],
#          ['OhnoI', 21, 'GGCCC', 'I hope these pop up'],
#          ['WoopsII', 21, 'GGCCC', 'I hope these pop up']
#);

pattern_aligner

pattern_adder

codon_change_type

translate

reverse_translate_algorithms

reverse_translate

codon_juggle_algorithms

codon_juggle

subtract_sequence

repeat_smash

make_amplification_primers

NO TEST

contains_homopolymer

Returns 1 if the sequence contains a homopolymer of the provided length (default is 5bp) and 0 otherwise.
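A check like this can be sketched with a backreference. The name has_homopolymer and the positional arguments are hypothetical, and restricting the test to A, C, G, and T is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch: 1 if any base repeats $length times in a row, else 0.
sub has_homopolymer {
  my ($seq, $length) = @_;
  $length = 5 unless defined $length;       # default from the text above
  my $repeat = $length - 1;                 # one base plus $repeat copies of it
  return $seq =~ /([ACGT])\1{$repeat}/i ? 1 : 0;
}

print has_homopolymer("ACGTAAAAAC"), "\n";  # 1 (five As)
print has_homopolymer("ACGTAAAAC"), "\n";   # 0 (only four As)
```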

filter_homopolymers

find_runs

make_graph

make_dotplot

import_seqs

NO TEST

import_seq_from_string

NO TEST

export_formats

Export formats that have been tried and tested to work well.

export_seqs

NO TEST

random_dna

replace_ambiguous_bases

PLEASANTRIES

pad

my $name = 5;
my $nice = $GD->pad($name, 3);
$nice eq "005" || die;

$name = "oligo";
$nice = $GD->pad($name, 7, "_");
$nice eq "__oligo" || die;

Pads an integer with leading zeroes (by default) or any provided set of characters. This is useful both to make reports pretty and to standardize the length of designations.
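The padding itself can be sketched with Perl's repetition operator. The name pad_sketch is hypothetical, and the sketch assumes the pad is a single character.

```perl
use strict;
use warnings;

# Hypothetical sketch: left-pad $value to $width using $char (default "0").
sub pad_sketch {
  my ($value, $width, $char) = @_;
  $char = "0" unless defined $char;
  my $missing = $width - length($value);
  return $missing > 0 ? ($char x $missing) . $value : $value;
}

print pad_sketch(5, 3), "\n";              # 005
print pad_sketch("oligo", 7, "_"), "\n";   # __oligo
```

String comparison (eq) is the safe way to check the result, since "005" and 5 are numerically equal.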

attitude

my $adverb = $GD->attitude();

Ask GeneDesign how it handled your request.

endslash

_stripdown

_checkref

COPYRIGHT AND LICENSE

Copyright (c) 2015, Sarah Richardson All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* The names of Johns Hopkins, the Joint Genome Institute, the Lawrence Berkeley National Laboratory, the Department of Energy, and the GeneDesign developers may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.