NAME
BioStudio::Basic - basic functions for the BioStudio synthetic biology framework
VERSION
Version 1.04
DESCRIPTION
Basic BioStudio functions
AUTHOR
Sarah Richardson <notadoctor@jhu.edu>.
Customization functions
configure_BioStudio()
This function loads the configuration file into a hashref. You must pass it
the path to the directory containing the configuration file; it uses
Config::Auto to "magically" parse the file.
fetch_custom_features()
Pass the config hashref, receive a hashref of the custom features defined in
the BioStudio configuration directory. Each feature has four attributes: NAME,
KIND, SOURCE, and SEQ.
fetch_custom_markers()
Pass the config hashref, receive a hashref of the custom markers defined in
the BioStudio configuration directory. Each key in the hashref is a marker
name, and each value is a Bio::BioStudio::Marker object.
fetch_enzyme_lists()
Pass the config hashref, receive an array that contains the names of the
enzyme lists in the BioStudio configuration directory. Each list is a
GeneDesign-compatible list of restriction enzyme recognition sites.
Masking functions
make_mask()
Given a length, a reference to a list of Bio::SeqFeatures, and optionally
an offset, returns a string of integers in which each position corresponds to
a base of sequence and the integer is the number of features that overlap
that base. Because each base is represented by a single digit, a mask can
count at most nine overlapping features; ten or more trigger a serious bug.
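A minimal Perl sketch of the masking idea. It assumes 1-based inclusive feature coordinates and that the optional offset is subtracted from each feature coordinate; the text above does not say how the offset is applied. Any object with start and end methods, such as a Bio::SeqFeature, can be passed. StubFeature is a hypothetical stand-in so the example runs without BioPerl. Unlike the behavior described above, this sketch dies instead of producing a broken mask when ten or more features overlap one base.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Bio::SeqFeature: only start() and end() are needed.
package StubFeature {
    sub new   { my ($class, %args) = @_; return bless {%args}, $class; }
    sub start { return $_[0]{start}; }
    sub end   { return $_[0]{end}; }
}

# Sketch of make_mask(): one digit per base, counting overlapping features.
# Assumes 1-based inclusive coordinates; $offset (default 0) is subtracted
# from each feature coordinate before it is placed on the mask.
sub make_mask {
    my ($length, $features, $offset) = @_;
    $offset //= 0;
    my @counts = (0) x $length;
    for my $feature (@$features) {
        my $start = $feature->start - $offset;
        my $end   = $feature->end - $offset;
        # Clip features that hang off either end of the masked region.
        $start = 1       if $start < 1;
        $end   = $length if $end > $length;
        $counts[$_ - 1]++ for $start .. $end;
    }
    # One character per base cannot hold 10 or more, so refuse instead of corrupting.
    die "make_mask: more than 9 overlapping features\n" if grep { $_ > 9 } @counts;
    return join '', @counts;
}

my @features = (
    StubFeature->new(start => 4, end => 5),
    StubFeature->new(start => 9, end => 14),
    StubFeature->new(start => 9, end => 12),
);
print make_mask(16, \@features), "\n";    # prints 0001100022221100
```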
mask_combine()
Takes two string masks (see make_mask()) and adds them position by position.
Returns the merged mask.
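A sketch of the position-by-position sum. It assumes both masks have the same length, which the text does not state, and it applies the same single-digit limit as make_mask(): a sum of ten or more would occupy two characters and shift every later base.

```perl
use strict;
use warnings;

# Sketch of mask_combine(): add two digit masks base by base.
# Assumes equal-length masks (an assumption; dies otherwise).
sub mask_combine {
    my ($mask1, $mask2) = @_;
    die "mask_combine: masks differ in length\n" if length($mask1) != length($mask2);
    my @left  = split //, $mask1;
    my @right = split //, $mask2;
    my @sum   = map { $left[$_] + $right[$_] } 0 .. $#left;
    # A per-base sum above 9 would no longer fit in one character.
    die "mask_combine: more than 9 overlapping features\n" if grep { $_ > 9 } @sum;
    return join '', @sum;
}

print mask_combine("0011000", "0001110"), "\n";    # prints 0012110
```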
mask_filter()
Takes a string mask (see make_mask()) and returns a listref of break
coordinates; that is, where does feature sequence end and interfeature
sequence begin, and where does interfeature sequence end and feature sequence
begin? For example, if the mask is "0001100033221100", the resulting list
would be [0 3 5 8 14 16], meaning that features exist from bases 4 to 5 and 9
to 14. Interfeature sequence coordinates can thus be pulled out by turning the
list into a hash, %inter = @{mask_filter($mask)}, where each key + 1 is the
left coordinate and its value is the right coordinate of an interfeature
region (here 1-3, 6-8, and 15-16).
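The following sketch reproduces the documented example by treating each run of zeros as one interfeature region and recording its 0-based start offset and its end offset. How BioStudio handles masks that begin or end inside a feature is not described, so this is an illustration, not the module's implementation.

```perl
use strict;
use warnings;

# Sketch of mask_filter(): for every run of zeros (interfeature sequence),
# record the 0-based offset where the run starts ($-[1]) and the offset just
# past its end ($+[1]). In 1-based terms each pair is (left - 1, right).
sub mask_filter {
    my ($mask) = @_;
    my @breaks;
    while ($mask =~ /(0+)/g) {
        push @breaks, $-[1], $+[1];
    }
    return \@breaks;
}

my $list = mask_filter("0001100033221100");
print "@$list\n";    # prints 0 3 5 8 14 16

# Pull out the interfeature regions by hashing the list.
my %inter = @$list;
for my $key (sort { $a <=> $b } keys %inter) {
    printf "%d-%d\n", $key + 1, $inter{$key};    # 1-3, 6-8, 15-16
}
```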
Genome Repository functions
get_src_path()
Given a chromosome name and the config hashref, returns the absolute path to
that chromosome in the BioStudio genome repository.
get_genome_list()
Given the config hashref, returns a list of all chromosomes in the BioStudio
genome repository.
gather_versions()
Given a species, a target, and the config hashref, returns a hashref of all
chromosomes of that species in the BioStudio genome repository that match the
target. The target is an integer that represents a version.
If target is set to 0, we will return every wildtype version.
If target is set to -1, we will return the latest version of every chromosome.
For any other target (for example 1, 3, or 5), we will return that particular
version.
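The target rules can be sketched as a filter over chromosome records. The record layout (name, chromosome, version), the helper name filter_by_target, and the convention that version 0 is wildtype are assumptions made for illustration; the sketch neither reads the repository nor parses BioStudio's chromosome names.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the target rules of gather_versions().
# @$records holds hashrefs { name, chromosome, version }; version 0 = wildtype.
sub filter_by_target {
    my ($records, $target) = @_;
    my %result;
    if ($target == -1) {
        # Keep only the highest version seen for each chromosome.
        my %latest;
        for my $rec (@$records) {
            my $best = $latest{ $rec->{chromosome} };
            $latest{ $rec->{chromosome} } = $rec
                if !defined $best || $rec->{version} > $best->{version};
        }
        $result{ $_->{name} } = $_ for values %latest;
    }
    else {
        # 0 selects wildtype; any other value selects that exact version.
        for my $rec (@$records) {
            $result{ $rec->{name} } = $rec if $rec->{version} == $target;
        }
    }
    return \%result;
}

my @records;
for my $pair ([chr01 => 0], [chr01 => 1], [chr01 => 3], [chr02 => 0], [chr02 => 1]) {
    my ($chr, $version) = @$pair;
    push @records, { name => "${chr}_v$version", chromosome => $chr, version => $version };
}
print join(' ', sort keys %{ filter_by_target(\@records, -1) }), "\n";    # chr01_v3 chr02_v1
```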
rollback()
Given a chromosome name and the BioStudio config hashref, removes that
chromosome from the BioStudio genome repository.
Editing and Markup functions
ORF_compile()
Given a reference to an array of Bio::SeqFeature gene objects, returns a
reference to a hash with gene ids as keys and concatenated 5' to 3' coding
sequences as values.
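A minimal sketch of the splicing step, assuming plain [start, end] pairs with 1-based inclusive coordinates, a strand of 1 or -1, and a chromosome sequence string in place of Bio::SeqFeature objects. Segments are joined in ascending coordinate order; for a minus-strand gene the joined sequence is reverse-complemented so the result still reads 5' to 3'. The helper name compile_orf is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: splice the CDS segments of one gene into a 5'->3' coding sequence.
# $seq is the chromosome sequence; each segment is [start, end], 1-based inclusive.
sub compile_orf {
    my ($seq, $strand, @segments) = @_;
    my @ordered = sort { $a->[0] <=> $b->[0] } @segments;
    my $orf = join '', map { substr($seq, $_->[0] - 1, $_->[1] - $_->[0] + 1) } @ordered;
    if ($strand == -1) {
        # Reverse-complement so minus-strand genes also read 5' to 3'.
        $orf = reverse $orf;
        $orf =~ tr/ACGTacgt/TGCAtgca/;
    }
    return $orf;
}

print compile_orf("ATGAAACCCTTTGGGTAA", 1, [10, 18], [1, 6]), "\n";    # ATGAAATTTGGGTAA
print compile_orf("TTAGGGCAT", -1, [1, 9]), "\n";                       # ATGCCCTAA
```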
get_feature_sequence()
For when you can't use the Bio::SeqFeature seq function. Given a
Bio::SeqFeature compliant feature and a sequence, returns the sequence that
the coordinates of the feature indicate.
flatten_subfeats()
Given a seqfeature, iterates through its subfeatures and adds all of their
subfeatures to one big array. This is mainly needed when CDSes are hidden
behind mRNAs in genes.
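A recursive sketch of the flattening step. It relies only on get_SeqFeatures(), the Bio::SeqFeatureI method that returns a feature's immediate subfeatures. StubSeqFeature is a hypothetical stand-in so the example runs without BioPerl. Whether BioStudio descends every level or stops at two (gene to mRNA to CDS) is not stated, so full recursion here is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in exposing the Bio::SeqFeatureI method get_SeqFeatures().
package StubSeqFeature {
    sub new { my ($class, $type, @subs) = @_; return bless { type => $type, subs => \@subs }, $class; }
    sub type { return $_[0]{type}; }
    sub get_SeqFeatures { return @{ $_[0]{subs} }; }
}

# Depth-first collection of every descendant of $feature (not the feature itself).
sub flatten_subfeats {
    my ($feature) = @_;
    my @flat;
    for my $child ($feature->get_SeqFeatures) {
        push @flat, $child, flatten_subfeats($child);
    }
    return @flat;
}

my $gene = StubSeqFeature->new(
    gene => StubSeqFeature->new(mRNA => StubSeqFeature->new('CDS'), StubSeqFeature->new('CDS'))
);
print join(' ', map { $_->type } flatten_subfeats($gene)), "\n";    # mRNA CDS CDS
```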
gene_names()
Given a list of Bio::SeqFeature gene objects and the BioStudio config hashref,
returns a hash where each gene id is the key to a display friendly string.
allowable_codon_changes()
Given two codons (a from, and a to) and a GeneDesign codon table hashref, this
function generates every possible peptide pair that could contain the from
codon and checks to see if the peptide sequence can be maintained when the
from codon is replaced by the to codon. This function is of particular use
when codons are being changed in genes that overlap one another.
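One way to read this check, sketched under explicit assumptions: the overlapping gene is on the same strand, shifted by 1 or 2 nucleotides, so the edited codon straddles two of its codons. The sketch enumerates every two-codon window of the overlapping frame that contains the from codon at that shift, applies the swap, and keeps the windows whose two amino acids are unchanged. It uses the standard genetic code in place of a GeneDesign codon table, ignores antisense overlaps, and allowable_contexts is a hypothetical name, not BioStudio's interface.

```perl
use strict;
use warnings;

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
sub standard_codon_table {
    my @bases = qw(T C A G);
    my @aas   = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my %table;
    my $i = 0;
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $table{"$b1$b2$b3"} = $aas[ $i++ ];
            }
        }
    }
    return \%table;
}

# For each shift (1 or 2 nt into the overlapping frame), find every 6-nt window
# of two overlapping-frame codons containing $from at that shift, swap in $to,
# and keep [shift, window] if both overlapping-frame amino acids are unchanged.
sub allowable_contexts {
    my ($from, $to, $table) = @_;
    my @codons = sort keys %$table;
    my @allowed;
    for my $shift (1, 2) {
        for my $c1 (@codons) {
            for my $c2 (@codons) {
                my $window = $c1 . $c2;
                next unless substr($window, $shift, 3) eq $from;
                my $edited = $window;
                substr($edited, $shift, 3, $to);
                my ($e1, $e2) = (substr($edited, 0, 3), substr($edited, 3, 3));
                push @allowed, [ $shift, $window ]
                    if $table->{$c1} eq $table->{$e1} && $table->{$c2} eq $table->{$e2};
            }
        }
    }
    return \@allowed;
}

my $table   = standard_codon_table();
my $allowed = allowable_contexts('TTA', 'CTA', $table);
printf "%d silent contexts\n", scalar @$allowed;    # 64 silent contexts
```

In this example every silent context has shift 2: there the swap only changes the third base of the first overlapping codon from T to C, which is always synonymous in the standard code.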
check_new_sequence()
Best used to confirm that your edits went as expected. Given a
Bio::SeqFeature compliant feature that has a "newseq" attribute, checks whether
the newseq and the actual sequence occupied by the feature are the same.
print_as_fasta()
Takes a sequence as a string and a sequence id, and returns an 80-column,
FASTA-formatted sequence block as an array reference.
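A minimal sketch of the FASTA wrapping. It assumes the header line comes first and that each element of the returned array is one newline-terminated line; the text above does not specify line termination.

```perl
use strict;
use warnings;

# Sketch of print_as_fasta(): header line, then the sequence wrapped at 80 columns.
sub print_as_fasta {
    my ($seq, $id) = @_;
    my @lines = (">$id\n");
    for (my $pos = 0; $pos < length($seq); $pos += 80) {
        push @lines, substr($seq, $pos, 80) . "\n";
    }
    return \@lines;
}

my $fasta = print_as_fasta('A' x 200, 'chr01');
print @$fasta;    # header, two 80-base lines, one 40-base line
```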
COPYRIGHT AND LICENSE
Copyright (c) 2011, BioStudio developers
All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
* Neither the name of the Johns Hopkins nor the names of the developers may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.