NAME

SuffixTree - Efficient string manipulation data structure interface for Perl.

SYNOPSIS

use SuffixTree;

my $str = "mississippi";
my $tree=create_tree($str);
print_tree($tree);
my $position = find_substring($tree, "ssis");

printf("\nPosition of ssis in mississippi is %ld.\n\n", $position);

delete_tree($tree); # NOTICE: this method will soon become deprecated 

DEPRECATED SYNOPSIS

use SuffixTree;

my $str = "mississippi";
my $tree=ST_CreateTree($str, length($str));
ST_PrintTree($tree);
my $position = ST_FindSubstring($tree, "ssis", 4);

printf("\nPosition of ssis in mississippi is %ld.\n\n", $position);

ST_DeleteTree($tree);

DESCRIPTION

The intention of this project is to provide an open-source implementation for an efficient data structure for strings manipulation - the Suffix Tree.

The code was written with as much consistency with the theoretical algorithm as possible (see references). It provides a set of interface functions for creating and searching the tree.

The suffix tree is implemented in ANSI-C. The code is written based on the original algorithm by E.Ukkonen. This is the Perl interface for the underlying ANSI-C implementation.

FUNCTIONS

All functions are exported by default. Please note that all these interface functions were automatically extracted from the ANSI-C header file, so they might not behave as Perlish as you'd expect them to. This is something we will definitly address in the future.

$tree = create_tree($string)

Allocates memory for the tree and starts Ukkonen's construction algorithm. Parameters: A string. Returns a reference to the tree.

$position = find_substring($tree, $substring)

Searches for a string in the tree. It traverses the tree down starting its root like in a regular trie. Parameters: the tree to search in, a substring to look for. Returns the position it was found in the source string or ST_ERROR if string is not in the tree. NOTE: We did not make sure how ST_ERROR 'looks like' in Perl. This should be further explained in future releases.

Prints the tree. Parameters: the tree to print.

delete_tree($tree)

Deletes a suffix tree. Parameters: the tree to delete. Returns : void.

DEPRECATED FUNCTIONS

All functions are exported by default. Please note that all these interface functions were automatically extracted from the ANSI-C header file, so they might not behave as Perlish as you'd expect them to. This is something we will definitly address in the future.

$tree = ST_CreateTree($string, length($string))

Allocates memory for the tree and starts Ukkonen's construction algorithm. Parameters: A string, length of the string. Returns a reference to the tree.

$position = ST_FindSubstring($tree, $substring, length($substring))

Searches for a string in the tree. It traverses the tree down starting its root like in a regular trie. Parameters: the tree to search in, a substring to look for, length of substring. Returns the position it was found in the source string or ST_ERROR if string is not in the tree. NOTE: We did not make sure how ST_ERROR 'looks like' in Perl. This should be further explained in future releases.

ST_PrintTree($tree)

Prints the tree. Parameters: the tree to print.

ST_DeleteTree($tree)

Deletes a suffix tree. Parameters: the tree to delete. Returns : void.

BUGS

This Perl interface was mostly built automatically (using SWIG). Little to no attention was given to testing. In future relases of this Perl Module (along with its underlying ANSI-C implementation) we hope to fix all problems that might currenly interfere with successful usage of this module. Please send bug reports to the author(s) of this module.

FUTURE WORK

[1] A better Perl-ish interface

[2] Building tests for this module (for the `make test` part of the installation)

[3] Object Oriented like usage

PORTABILITY

Please read the README file for information.

SEE ALSO

http://cs.haifa.ac.il/~shlomo/suffix_tree/

AUTHOR

Shlomo Yona <shlomo@cs.haifa.ac.il>

THANKS TO

Offer Kaye (http://okaye.esmartweb.com/) for useful ideas and support.

COPYRIGHT

Copyright (c) 2002, 2003 Shlomo Yona. All rights reserved.

This library is free software. You can redistribute it and/or modify it under the same terms as Perl itself.

2 POD Errors

The following errors were encountered while parsing the POD:

Around line 113:

You forgot a '=back' before '=head1'

Around line 147:

You forgot a '=back' before '=head1'