NAME

BS_ChromosomeSegmenter.pl

VERSION

Version 3.00

DESCRIPTION

1) Scanning:
 This utility creates an exhaustive database of restriction enzyme recognition
  sites along a chromosome, both existing and potential.
 Every intergenic sequence is parsed for existing recognition sites. Those that
  are found are marked (i)mmutable. A prefix tree is created from all possible
  6-frame translations of restriction enzyme recognition sites, such that each
  node in the tree is an amino acid string that may be reverse translated to be
  a recognition site. Every exonic sequence in the chromosome is then searched
  with the prefix tree both for (e)xisting recognition sites and for sites
  where a (p)otential recognition site could be introduced without changing the
  protein sequence of the gene. As long as they occur within protein coding
  genes, existing and potential recognition sites may be manipulated to yield
  any of several different overhangs, all of which are computed by the
  algorithm.
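
 The prefix tree search described above can be illustrated with a minimal
  Python sketch. It is not the utility's own code. It assumes a recognition
  site with no degenerate bases (BsaI, GGTCTC, is used as the example), the
  standard genetic code, and hypothetical helper names (site_peptides,
  build_trie, find_potential_sites). Overhang computation and the distinction
  between existing and potential sites are left out.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def site_peptides(site):
    """Peptides whose codons can spell `site` in any of the six frames."""
    peptides = set()
    for strand in (site, revcomp(site)):
        for offset in range(3):
            # Pad with N so the site sits at `offset` inside whole codons.
            pad_right = (-(offset + len(strand))) % 3
            template = "N" * offset + strand + "N" * pad_right
            for fill in product("ACGT", repeat=offset + pad_right):
                bases = iter(fill)
                dna = "".join(next(bases) if b == "N" else b for b in template)
                pep = "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))
                if "*" not in pep:  # skip peptides that need a stop codon
                    peptides.add(pep)
    return peptides


def build_trie(peptides):
    """Nested-dict prefix tree; the key "$" marks a complete peptide."""
    root = {}
    for pep in peptides:
        node = root
        for aa in pep:
            node = node.setdefault(aa, {})
        node["$"] = pep
    return root


def find_potential_sites(protein, trie):
    """Return (offset, peptide) for every trie peptide found in `protein`."""
    hits = []
    for start in range(len(protein)):
        node = trie
        for aa in protein[start:]:
            if aa not in node:
                break
            node = node[aa]
            if "$" in node:
                hits.append((start, node["$"]))
    return hits


SITE = "GGTCTC"  # BsaI recognition sequence, used only as an example
trie = build_trie(site_peptides(SITE))
print(find_potential_sites("MGLK", trie))
```

 Each hit marks a stretch of protein whose codons could be rewritten
  synonymously to contain the site. Whether the site is already present would
  still have to be checked against the nucleotide sequence.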
 A score is assigned to every restriction enzyme site. The score is a function
  of the log of the enzyme's price per unit, plus two tenths for each ORF that
  must be modified to make the enzyme unique within the range of the chunk
  size.
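
 As a rough illustration of the scoring rule, the sketch below takes the
  simplest reading of the description: the natural log of the price plus 0.2
  per ORF. The log base, the exact functional form, and the name site_score
  are assumptions, not taken from the utility.

```python
import math


def site_score(price_per_unit, orfs_to_modify):
    """Assumed form of the score: ln(price) + 0.2 per ORF needing edits."""
    return math.log(price_per_unit) + 0.2 * orfs_to_modify


# A cheaper enzyme that needs more edits can score about the same as a
# pricier enzyme that needs none (the prices here are made up).
print(site_score(1.0, 3))
print(site_score(math.exp(0.6), 0))
```

 Under this reading a lower score is better, since both price and the number
  of required ORF edits add to it.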

2) Filtering:
 Every extant site is indexed so that later in the design process, when
   potential sites are considered, the number of sites that must be modified is
   a known contribution to the cost. However, all potential sites that could
   not be made unique under any useful circumstances are culled from the
   database.
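
 One way to picture the filter is sketched below in Python. It assumes that a
  potential site cannot be made unique when an immutable site for the same
  enzyme lies within one maximum chunk size of it. That is one plausible
  reading of "any useful circumstances", not a rule confirmed by the utility.
  The function names and the positions are hypothetical.

```python
from bisect import bisect_left, bisect_right


def sites_in_range(sorted_positions, pos, radius):
    """Positions from a sorted list that fall within pos +/- radius."""
    lo = bisect_left(sorted_positions, pos - radius)
    hi = bisect_right(sorted_positions, pos + radius)
    return sorted_positions[lo:hi]


def cull_potential_sites(potential, immutable_index, radius):
    """Keep potential sites with no immutable same-enzyme site nearby."""
    kept = []
    for enzyme, pos in potential:
        blockers = sites_in_range(immutable_index.get(enzyme, []), pos, radius)
        if not blockers:
            kept.append((enzyme, pos))
    return kept


# Made-up coordinates: one immutable BsaI site at 15000.
immutable_index = {"BsaI": [15000]}
potential = [("BsaI", 12000), ("BsaI", 40000)]
print(cull_potential_sites(potential, immutable_index, radius=10000))
```

 Keeping the positions sorted per enzyme makes each lookup a binary search,
  which matters when every potential site on a chromosome is tested.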

3) Segmenting:
 A set of restriction enzyme recognition site changes to the chromosome is
   planned. The goal is to make the chromosome assemblable from multikilobase
   pieces called chunks, which in groups of roughly CHUNKNUM form larger pieces
   called megachunks. Megachunks end in special regions called
   InterSiteSequences (ISS), which consist of synthetic sequence, followed by a
   marker, followed by wild-type sequence, followed by a type IIB restriction
   enzyme recognition site. The wild-type sequence directs the megachunk to its
   target chromosome for homologous recombination (thus wild type here can be
   any homologous chromosome, as long as gene order is the same). Megachunks
   alternate markers to allow a simple selection for successful integration.
   Markers should be defined in config/markers.
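
 The grouping of chunks into megachunks and the alternation of markers can be
  sketched as follows. This is an illustration under assumptions: chunks are
  spread as evenly as possible around the CHUNKNUM target, the marker names
  are placeholders rather than entries known to exist in config/markers, and
  LASTMARKER is read as fixing the marker of the final megachunk. The real
  planner also has to choose restriction sites, which is not modelled here.

```python
def group_chunks(n_chunks, target=4, lo=3, hi=5):
    """Split n_chunks into megachunk sizes near `target`, within [lo, hi]."""
    k = max(1, (n_chunks + target // 2) // target)  # number of megachunks
    base, extra = divmod(n_chunks, k)
    sizes = [base + 1] * extra + [base] * (k - extra)
    if any(s < lo or s > hi for s in sizes):
        raise ValueError(f"cannot group {n_chunks} chunks within [{lo}, {hi}]")
    return sizes


def assign_markers(n_megachunks, markers, last_marker=None):
    """Cycle through markers; optionally force the final megachunk's marker."""
    shift = 0
    if last_marker is not None:
        # Rotate the cycle so position n_megachunks - 1 lands on last_marker.
        shift = (markers.index(last_marker) - (n_megachunks - 1)) % len(markers)
    return [markers[(i + shift) % len(markers)] for i in range(n_megachunks)]


sizes = group_chunks(14)  # defaults mirror CHUNKNUM, CHUNKNUMMIN, CHUNKNUMMAX
print(sizes)
print(assign_markers(len(sizes), ["MARKER_A", "MARKER_B"], last_marker="MARKER_A"))
```

 With two markers, each integration swaps the selectable marker, so selecting
  for the new marker and against the old one identifies successful
  integrants.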

4) Committing:
 If a successful plan was found in step three, the utility attempts to make all
   of the proposed changes.

5) Proof reading:
 The new chromosome is checked for the existence of new restriction sites,
   their appropriate uniqueness, and the sanctity of coding sequences.
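
 Two of these checks can be sketched in a few lines of Python: that an edited
  coding sequence still translates to the same protein, and that a site occurs
  exactly once (on either strand) in a given stretch of sequence. The function
  names and the short example sequences are hypothetical, and the standard
  genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate whole codons of a coding sequence; a partial tail is ignored."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODONS[cds[i:i + 3]] for i in range(0, usable, 3))


def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def count_sites(seq, site):
    """Count occurrences of `site` on both strands, overlaps included."""
    targets = {site, revcomp(site)}  # a set, so palindromes count once
    return sum(seq.startswith(t, i) for i in range(len(seq)) for t in targets)


def proofread(old_cds, new_cds, region, site):
    """True if the protein is unchanged and `site` is unique in `region`."""
    return translate(old_cds) == translate(new_cds) and count_sites(region, site) == 1


old = "ATGGGACTGAAA"  # M G L K, no BsaI site
new = "ATGGGTCTCAAA"  # M G L K, now contains GGTCTC
print(proofread(old, new, new, "GGTCTC"))
```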

6) Reporting:
 The megachunks and chunks of the new chromosome are exported to the genome
   repository in an assembly directory. Each file is a GenBank annotation with
   restriction enzymes, ISS, and marker sequences specially highlighted.

ARGUMENTS

Required arguments:

-CHR, --CHROMOSOME : The chromosome to be modified
--WTCHR  : The chromosome that will receive chunks (usually wildtype)
--MARKERS : Comma-separated list of markers that will be alternately inserted
    into megachunk ISS sequences (must be defined in config/markers)

Optional arguments:

--ENZYME_SET : Which list of restriction enzymes to use (default nonpal)
--MIN_CHUNK_SIZE : The minimum size of chunks to be designed (default 5000)
--MAX_CHUNK_SIZE : The maximum size of chunks to be designed (default 10000)
--STARTPOS : The first base for analysis
--STOPPOS  : The last base for analysis
--CHUNKNUM : The target number of chunks per megachunk (default 4)
--CHUNKNUMMIN : The minimum number of chunks per megachunk (default 3)
--CHUNKNUMMAX : The maximum number of chunks per megachunk (default 5)
--CHUNKOLAP : The number of bases each chunk must overlap (default 40)
--ISSMIN : Minimum size of the homologous intersite sequence (default 900)
--ISSMAX : Maximum size of the homologous intersite sequence (default 1500)
--FPUTRPADDING : No-edit zone upstream of the five prime end of
    essential/fast growth genes when no UTR is annotated (default 500)
--TPUTRPADDING : No-edit zone downstream of the three prime end of
    essential/fast growth genes when no UTR is annotated (default 100)
--LASTMARKER : Which marker should be the last marker inserted (must be
    defined in config/markers)
-h, --help : Display this message
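
The documented defaults imply some consistency constraints between options.
The Python sketch below records the defaults listed above and checks a few
relationships that seem necessary for a sensible run. The class, its field
names, and the checks themselves are assumptions for illustration; the
utility may validate its arguments differently or not at all.

```python
from dataclasses import dataclass


@dataclass
class SegmenterOptions:
    # Defaults copied from the option list above.
    min_chunk_size: int = 5000
    max_chunk_size: int = 10000
    chunknum: int = 4
    chunknummin: int = 3
    chunknummax: int = 5
    chunkolap: int = 40
    issmin: int = 900
    issmax: int = 1500
    fputrpadding: int = 500
    tputrpadding: int = 100

    def validate(self):
        """Return a list of human-readable problems (empty if none)."""
        problems = []
        if self.min_chunk_size > self.max_chunk_size:
            problems.append("MIN_CHUNK_SIZE exceeds MAX_CHUNK_SIZE")
        if not self.chunknummin <= self.chunknum <= self.chunknummax:
            problems.append("CHUNKNUM outside [CHUNKNUMMIN, CHUNKNUMMAX]")
        if self.issmin > self.issmax:
            problems.append("ISSMIN exceeds ISSMAX")
        if self.chunkolap >= self.min_chunk_size:
            problems.append("CHUNKOLAP is not smaller than MIN_CHUNK_SIZE")
        return problems


print(SegmenterOptions().validate())
print(SegmenterOptions(min_chunk_size=12000).validate())
```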

COPYRIGHT AND LICENSE

Copyright (c) 2014, BioStudio developers. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

* The names of Johns Hopkins, the Joint Genome Institute, the Lawrence Berkeley National Laboratory, the Department of Energy, and the BioStudio developers may not be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.