NAME

Bio::Gonzales::Seq - Gonzales Sequence Object

SYNOPSIS

# desc and delim are optional
my $seq = Bio::Gonzales::Seq->new(id => $id, seq => $seq_string, desc => '', delim => ' ');

print $seq->def;
print $seq->desc;

DESCRIPTION

METHODS

$seq->id
$seq->desc

The description of a sequence object. For FASTA files, this corresponds to the header text after the first space.

$seq->seq
$seq->delim
$seq->info

A hash reference for storing additional information about the sequence.

$seq->gaps
$seq->length
$seq->def

The definition line, also known as the FASTA header line, without the leading ">".

$seq->clone

Clone the sequence

$seq->clone_empty

Clone the sequence properties, but not the sequence string.

$seq->display_id

Same as $seq->id.

$seq->ungapped_length
$seq->all
"$seq"

The complete sequence in FASTA format, ready to be written. Stringifying the object ("$seq") gives the same result.
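The relationship between the definition line and the complete FASTA record can be illustrated with a small plain-Perl sketch. It does not use Bio::Gonzales::Seq. The helper names fasta_def and fasta_record are hypothetical, and the rule that the definition is the id joined to the description by the delimiter is an assumption inferred from the constructor arguments, not taken from the module source.

```perl
use strict;
use warnings;

# Hypothetical helper: build the definition line (header without ">").
# Assumption: id and description are joined by the delimiter, and the
# delimiter is dropped when there is no description.
sub fasta_def {
    my ($id, $desc, $delim) = @_;
    $delim = ' ' unless defined $delim;
    return $id unless defined $desc && length $desc;
    return $id . $delim . $desc;
}

# Hypothetical helper: a complete FASTA record with the sequence on one line.
sub fasta_record {
    my ($id, $desc, $seq, $delim) = @_;
    return '>' . fasta_def($id, $desc, $delim) . "\n" . $seq . "\n";
}

print fasta_record('seq1', 'example sequence', 'GGCAAAGGAATG');
```

This prints a header line ">seq1 example sequence" followed by the sequence on its own line.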

$seq->all_formatted
$seq->all_pretty

The complete sequence in pretty FASTA format, ready to be written.

$seq->as_primaryseq

Return a Bio::PrimarySeqI compatible object, so you can use it in BioPerl.

$seq_string = $seq->gapless_seq
$seq->rm_gaps!
$seq->revcom

Create the reverse complement of the sequence. THIS FUNCTION ALTERS THE SEQUENCE OBJECT.

$seq->subseq( [ $begin, $end, $strand , @rest ], \%c )

Gets a subsequence from $seq. The config options in \%c can be:

%c = (
    keep_id => 1, # keeps the original id of the sequence
    attach_details => 1, # keeps the original range and strand in $seq->info->{subseq}
    wrap => 1, # see the wrap section below
    relaxed_range => 1, # replace a $begin of 0 or undef with '^' and an $end of 0 or undef with '$'
    relaxed_revcom => 1, # replace every character other than A, G, C, T or N with N before doing a reverse complement
);

There are several possibilities for $begin and $end:

GGCAAAGGA ATGATGGTGT GCAGGCTTGG CATGGGAGAC
^..........^                                (1,11) OR ('^', 11)
   ^.....................................^  (4,'$')
                      ^..............^      (21,35) { with wrap on: OR (-19,35) OR (-19, -5) }
                      ^..................^  (21,'$') { with wrap on: OR (-19,'$') }
wrap

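By default, all negative values are limited to the sequence boundaries: a negative begin is treated as 1 (or '^') and a negative end is treated as '$'. With wrap on, negative values count backwards from the end of the sequence, as in the examples above, where -19 corresponds to position 21 and -5 to position 35.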
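The coordinate rules described here can be sketched in plain Perl. This is only an illustration of the documented behaviour, not the module's code: resolve_pos and resolve_range are hypothetical helpers, and the wrap formula (length + value + 1) is inferred from the example ranges, where -19 and -5 on a 39-character sequence map to 21 and 35.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Gonzales::Seq: turn one coordinate
# into a 1-based position on a sequence of length $len.
sub resolve_pos {
    my ($pos, $len, $is_end, $opt) = @_;

    # relaxed_range: 0 or undef stands for '^' (begin) or '$' (end)
    if ($opt->{relaxed_range} && !$pos) {
        $pos = $is_end ? '$' : '^';
    }
    die "coordinate is undefined\n" unless defined $pos;

    return 1    if $pos eq '^';
    return $len if $pos eq '$';

    if ($pos < 0) {
        # wrap: count backwards from the end, so -1 is the last position
        # (values below -$len are not handled in this sketch)
        return $len + $pos + 1 if $opt->{wrap};

        # default: clamp negative values to the sequence boundaries
        return $is_end ? $len : 1;
    }
    return $pos;
}

# Resolve a (begin, end) pair with the same options for both coordinates.
sub resolve_range {
    my ($begin, $end, $len, $opt) = @_;
    $opt ||= {};
    return (resolve_pos($begin, $len, 0, $opt), resolve_pos($end, $len, 1, $opt));
}

my $len = 39;    # length of the example sequence shown above
print join(',', resolve_range('^', 11, $len)), "\n";                   # 1,11
print join(',', resolve_range(-19, -5, $len, { wrap => 1 })), "\n";    # 21,35
print join(',', resolve_range(-19, -5, $len)), "\n";                   # 1,39
```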

See also "fasubseq" in Bio::Gonzales::Seq::IO.

my $reverse_complement_string = Bio::Gonzales::Seq::_revcom_from_string($seq_string, $alphabet)

Stolen from Bio::Perl. $alphabet can be 'rna' or 'dna'.
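For readers unfamiliar with the operation, a string-level reverse complement can be sketched as follows. This is a minimal stand-in written for illustration, not the code of _revcom_from_string: the name revcom_string, the default alphabet and the treatment of lower-case letters are assumptions, and IUPAC ambiguity codes other than N are left untouched.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a string-level reverse complement.
sub revcom_string {
    my ($seq, $alphabet) = @_;
    $alphabet = 'dna' unless defined $alphabet;    # assumed default

    # Reverse the string first, then complement each base in place.
    my $rc = scalar reverse $seq;
    if ($alphabet eq 'rna') {
        $rc =~ tr/ACGUacgu/UGCAugca/;
    }
    elsif ($alphabet eq 'dna') {
        $rc =~ tr/ACGTacgt/TGCAtgca/;
    }
    else {
        die "unknown alphabet '$alphabet'\n";
    }
    return $rc;
}

print revcom_string('GGCAAAGGA', 'dna'), "\n";    # TCCTTTGCC
print revcom_string('GGCAAAGGA', 'rna'), "\n";    # UCCUUUGCC
```

Unlike $seq->revcom, a helper of this kind returns a new string and leaves its input unchanged.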

AUTHOR

jw bargsten, <joachim.bargsten at wur.nl>