NAME
BioStudio::Basic - basic functions for the BioStudio synthetic biology framework
VERSION
Version 1.03
DESCRIPTION
Basic BioStudio functions
FUNCTIONS
configure_BioStudio() This function loads the configuration file into a hash ref. You must pass it the path to the directory containing the configuration file; it will use Config::Auto to "magically" parse the file.
fetch_custom_features() Pass the config hashref, receive a hashref of the custom features defined in the BioStudio configuration directory. Each feature has four attributes: NAME, KIND, SOURCE, and SEQ.
fetch_custom_markers() Pass the config hashref, receive a hashref of the custom markers defined in the BioStudio configuration directory. Each marker is a GFF file that gets read into the attributes NAME, SEQ, DB (a Bio::DB::SeqFeature::Store), and COLOR (if a color is defined in the GFF file).
fetch_enzyme_lists() Pass the config hashref, receive an array that contains the names of the enzyme lists in the BioStudio configuration directory. Each list is a GeneDesign compatible list of restriction enzyme recognition sites.
make_mask() Given a length, a reference to a list of Bio::SeqFeatures, and optionally an offset, returns a string of integers in which each position corresponds to a base of sequence and the integer is the number of features that overlap that base. Because each position holds a single digit, the mask cannot correctly represent ten or more features overlapping the same base; at that point a serious bug sets in.
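The following is a minimal Perl sketch of the masking idea, not BioStudio's own code. It assumes that features are plain hashrefs with 1-based, inclusive start and end keys standing in for Bio::SeqFeature objects, and that the optional offset is subtracted from feature coordinates; both are assumptions made for illustration.

```perl
use strict;
use warnings;

# Sketch only: features are hashrefs { start => ..., end => ... } (1-based,
# inclusive), an assumed stand-in for Bio::SeqFeature objects.
sub make_mask {
    my ($len, $feats, $offset) = @_;
    $offset //= 0;    # assumed: offset is subtracted from feature coordinates
    my @counts = (0) x $len;
    for my $f (@$feats) {
        my $start = $f->{start} - $offset;
        my $end   = $f->{end} - $offset;
        # Clip the feature to the masked region
        $start = 1    if $start < 1;
        $end   = $len if $end > $len;
        # Increment the overlap count of every base the feature covers
        $counts[ $_ - 1 ]++ for $start .. $end;
    }
    return join q{}, @counts;
}
```

Because counts are joined as decimal strings, a base covered by ten or more features contributes two characters in this sketch and the mask becomes longer than the sequence, which is one concrete way the single-digit limit can show up.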
mask_combine() Takes two string masks (see make_mask()) and adds them position by position. Returns the merged mask.
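A minimal Perl sketch of a position-by-position mask sum, assuming both masks have the same length (an assumption; the behavior for unequal lengths is not documented here, so the sketch simply refuses them):

```perl
use strict;
use warnings;

# Sketch only: add two digit-string masks position by position.
sub mask_combine {
    my ($m1, $m2) = @_;
    die "masks differ in length\n" unless length($m1) == length($m2);
    return join q{},
        map { substr($m1, $_, 1) + substr($m2, $_, 1) } 0 .. length($m1) - 1;
}
```

The single-digit limit of make_mask() applies here too: a combined count of 10 or more widens the mask.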
mask_filter() Takes a string mask (see make_mask()) and returns a listref of break coordinates; that is, the positions where feature sequence ends and interfeature sequence begins, and where interfeature sequence ends and feature sequence begins. For example, if the mask is "0001100033221100", the resulting list is [0 3 5 8 14 16], meaning that features span positions 4 to 5 and 9 to 14. Intergenic sequence coordinates can thus be pulled out by converting the list into a hash, %inter = @{mask_filter($mask)}, where each key + 1 is the left coordinate of an intergenic region and the corresponding value is its right coordinate.
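One way to produce that break list is to scan the mask for runs of zeros and record each run as a (position before the run, last position of the run) pair, which yields exactly the key + 1 / value convention described above. This Perl sketch is an illustration, not BioStudio's implementation; how the real function handles a mask that begins or ends inside a feature is not stated here, so the edge behavior below is an assumption.

```perl
use strict;
use warnings;

# Sketch only: each maximal run of zeros (interfeature sequence) becomes a
# pair (1-based position before the run, 1-based last position of the run).
sub mask_filter {
    my ($mask) = @_;
    my @breaks;
    while ($mask =~ /(0+)/g) {
        my $end   = pos($mask);               # 1-based end of the zero run
        my $start = $end - length($1) + 1;    # 1-based start of the zero run
        push @breaks, $start - 1, $end;
    }
    return \@breaks;
}
```

With the example mask, %inter = @{mask_filter("0001100033221100")} gives the pairs 0 => 3, 5 => 8 and 14 => 16, i.e. intergenic regions 1-3, 6-8 and 15-16.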
get_src_path() Given a chromosome name and the config hashref, returns the absolute path to that chromosome in the BioStudio genome repository.
get_genome_list() Given the config hashref, returns a list of all chromosomes in the BioStudio genome repository.
gather_versions() Given a species, a target, and the config hashref, returns a hashref of all chromosomes of that species in the BioStudio genome repository that match the target. The target is an integer that represents a version: 0 returns every wildtype version, -1 returns every latest version, and any other value (for example 1, 3, or 5) returns that particular version.
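The target semantics can be illustrated with a small Perl sketch. The helper name select_versions, the input shape (chromosome name mapped to a list of integer versions), and the treatment of version 0 as wildtype are all hypothetical; the sketch also skips the species filter and the repository lookup that gather_versions() performs.

```perl
use strict;
use warnings;

# Hypothetical helper: $versions maps chromosome name => arrayref of integer
# version numbers, with 0 assumed to be the wildtype version.
sub select_versions {
    my ($versions, $target) = @_;
    my %hits;
    for my $chr (sort keys %$versions) {
        my @v = sort { $a <=> $b } @{ $versions->{$chr} };
        next unless @v;
        # -1 means "latest"; any other target (including 0) is taken literally
        my $want = $target == -1 ? $v[-1] : $target;
        $hits{$chr} = $want if grep { $_ == $want } @v;
    }
    return \%hits;
}
```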
rollback() Given a chromosome name and the BioStudio config hashref, removes that chromosome from the BioStudio genome repository.
ORF_compile() Given a reference to an array of Bio::SeqFeature gene objects, returns a reference to a hash with gene ids as keys and each gene's concatenated 5' to 3' coding sequence as values.
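A minimal Perl sketch of compiling ORFs from CDS pieces. It is not BioStudio's code: genes are assumed to be hashrefs with id, strand, and a cds list of 1-based start/end pairs, and the chromosome sequence is passed in separately. CDS pieces are joined in coordinate order and, for minus-strand genes, the result is reverse complemented so that it reads 5' to 3'.

```perl
use strict;
use warnings;

# Sketch only: $genes is an arrayref of hashrefs
#   { id => ..., strand => 1 or -1, cds => [ { start => ..., end => ... }, ... ] }
# and $chrseq is the chromosome sequence (both shapes are assumptions).
sub ORF_compile {
    my ($genes, $chrseq) = @_;
    my %orfs;
    for my $gene (@$genes) {
        # Join CDS pieces in ascending coordinate order
        my @cds = sort { $a->{start} <=> $b->{start} } @{ $gene->{cds} };
        my $orf = join q{},
            map { substr($chrseq, $_->{start} - 1, $_->{end} - $_->{start} + 1) } @cds;
        # Minus-strand genes: reverse complement so the ORF reads 5' to 3'
        if ($gene->{strand} < 0) {
            $orf = reverse $orf;
            $orf =~ tr/ACGTacgt/TGCAtgca/;
        }
        $orfs{ $gene->{id} } = $orf;
    }
    return \%orfs;
}
```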
get_feature_sequence() Use this when you can't use the Bio::SeqFeature seq function. Given a Bio::SeqFeature compliant feature and a sequence, returns the sequence that the coordinates of the feature indicate.
check_new_sequence() Best used to confirm that your edits went as expected. Given a Bio::SeqFeature compliant feature that has a "newseq" attribute, checks whether the newseq value and the sequence actually occupied by the feature are the same.
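Both functions boil down to a coordinate lookup. The Perl sketch below is an illustration under assumptions, not the module's code: features are plain hashrefs with 1-based, inclusive start and end keys (and a newseq key for the check), and strand is ignored, so a minus-strand feature would return its top-strand sequence.

```perl
use strict;
use warnings;

# Sketch only: a feature is a hashref with 1-based, inclusive 'start' and
# 'end' keys (plus 'newseq' for check_new_sequence); strand is ignored.
sub get_feature_sequence {
    my ($feat, $seq) = @_;
    return substr($seq, $feat->{start} - 1, $feat->{end} - $feat->{start} + 1);
}

# Returns 1 if the sequence under the feature matches its newseq value.
sub check_new_sequence {
    my ($feat, $seq) = @_;
    return get_feature_sequence($feat, $seq) eq $feat->{newseq} ? 1 : 0;
}
```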
flatten_subfeats() Given a seqfeature, iterates through its subfeatures and adds all of their subfeatures to one big array. This is mainly needed when CDSes are hidden behind mRNAs in genes.
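A minimal recursive sketch in Perl. To stay self-contained it uses hashrefs with a hypothetical subs key instead of BioPerl objects; with real Bio::SeqFeatureI objects the children would come from get_SeqFeatures instead. Returning a flat list (rather than an array reference) is also an assumption.

```perl
use strict;
use warnings;

# Sketch only: a feature is a hashref whose optional 'subs' key holds an
# arrayref of child features (a stand-in for get_SeqFeatures).
sub flatten_subfeats {
    my ($feat) = @_;
    my @flat;
    for my $sub (@{ $feat->{subs} || [] }) {
        # Keep the child itself, then everything nested beneath it
        push @flat, $sub, flatten_subfeats($sub);
    }
    return @flat;
}
```

For a gene whose CDSes sit under an mRNA, the flattened list contains the mRNA followed by its CDSes, so the CDSes can be picked out with a single grep on their type.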
gene_names() Given a list of Bio::SeqFeature gene objects and the BioStudio config hashref, returns a hash where each gene id is the key to a display friendly string.
allowable_codon_changes() Given two codons (a "from" codon and a "to" codon) and a GeneDesign codon table hashref, this function generates every possible peptide pair that could contain the from codon and checks whether the peptide sequence is maintained when the from codon is replaced by the to codon. This function is particularly useful when codons are being changed in genes that overlap one another.
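One reading of "every possible peptide pair" is sketched below in Perl: in an overlapping gene read in a shifted frame, the three bases of the from codon straddle two codons of that frame, so for every choice of flanking bases we translate that codon pair before and after the swap and keep the dipeptides that do not change. This interpretation, the return shape (frame offset mapped to a sorted list of preserved dipeptides), and the use of the standard genetic code in place of a GeneDesign codon table are assumptions; reverse-strand overlaps are not handled.

```perl
use strict;
use warnings;

# Standard genetic code (codon => one-letter amino acid), standing in for
# a GeneDesign codon table hashref.
my %code;
{
    my @b  = qw(T C A G);
    my $aa = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    my $i  = 0;
    for my $x (@b) {
        for my $y (@b) {
            for my $z (@b) {
                $code{"$x$y$z"} = substr($aa, $i++, 1);
            }
        }
    }
}

# For a change $from -> $to, return a hashref keyed by frame offset (1 or 2)
# listing the overlapping-frame dipeptides left unchanged by the swap.
sub allowable_codon_changes {
    my ($from, $to, $code) = @_;
    my %ok = (1 => {}, 2 => {});
    my @bases = qw(A C G T);
    for my $x (@bases) {
        for my $y (@bases) {
            for my $z (@bases) {
                for my $off (1, 2) {
                    my @old = frame_pair($from, $off, $x, $y, $z);
                    my @new = frame_pair($to,   $off, $x, $y, $z);
                    my $pep_old = join q{}, map { $code->{$_} } @old;
                    my $pep_new = join q{}, map { $code->{$_} } @new;
                    $ok{$off}{$pep_old} = 1 if $pep_old eq $pep_new;
                }
            }
        }
    }
    my %result;
    $result{$_} = [ sort keys %{ $ok{$_} } ] for keys %ok;
    return \%result;
}

# The two shifted-frame codons containing codon $c (bases c1 c2 c3),
# given three flanking bases $x, $y, $z.
sub frame_pair {
    my ($c, $off, $x, $y, $z) = @_;
    return $off == 1
        ? ("$x$y" . substr($c, 0, 1), substr($c, 1, 2) . $z)    # x y c1 | c2 c3 z
        : ($x . substr($c, 0, 2), substr($c, 2, 1) . "$y$z");   # x c1 c2 | c3 y z
}
```

For example, swapping GCT for GCC (both alanine) never preserves the offset-1 frame, because CTN (leucine) becomes CCN (proline), but it does preserve some offset-2 contexts where TTA or TTG (leucine) becomes CTA or CTG (also leucine).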
print_as_fasta() Takes a sequence as a string and a sequence id and returns an 80-column FASTA formatted sequence block as an array reference.
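A minimal Perl sketch of 80-column FASTA formatting. That each array element is one newline-terminated line, beginning with a ">id" header line, is an assumption about the return value rather than documented behavior.

```perl
use strict;
use warnings;

# Sketch only: returns an arrayref of newline-terminated lines, a ">id"
# header followed by the sequence wrapped at 80 columns.
sub print_as_fasta {
    my ($seq, $id) = @_;
    my @lines = (">$id\n");
    for (my $i = 0; $i < length $seq; $i += 80) {
        push @lines, substr($seq, $i, 80) . "\n";
    }
    return \@lines;
}
```

A 170-base sequence therefore yields a header plus lines of 80, 80, and 10 bases.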
AUTHOR
Sarah Richardson <notadoctor@jhu.edu>.
COPYRIGHT AND LICENSE
Copyright (c) 2011, BioStudio developers All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the Johns Hopkins nor the names of the developers may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.