NAME
Lingua::EN::Segmenter::TextTiling - Segment text using the TextTiling method
SYNOPSIS
use lib '.';
use Lingua::EN::Segmenter::TextTiling qw(segments);
my $text = <<EOT;
Lingua::EN::Segmenter is a useful module that allows text to be split up
into words, paragraphs, segments, and tiles.

Paragraphs are by default indicated by blank lines. Known segment breaks are
indicated by a line with only the word "segment_break" in it.

The module detects paragraphs that are unrelated to each other by comparing
the number of words per-paragraph that are related. The algorithm is designed
to work only on long segments.

SOUTH OF BAGHDAD, Iraq (CNN) -- Seven U.S. troops freed Sunday after being
held by Iraqi forces arrived by helicopter at a base south of Baghdad and were
transferred to a C-130 transport plane headed for Kuwait, CNN's Bob Franken
reported from the scene.
EOT
my $num_segment_breaks = 1;
my @segments = segments($num_segment_breaks,$text);
print $segments[0]; # Prints the first three paragraphs of the above text
print "\n----------SEGMENT_BREAK----------\n";
print $segments[1]; # Prints the last paragraph of the above text
# The related Lingua::EN::Splitter class can be used in an object-oriented fashion
my $splitter = Lingua::EN::Splitter->new;
my @words = $splitter->words($text);
DESCRIPTION
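Lingua::EN::Segmenter::TextTiling segments English text using the TextTiling method. The segments() function takes the number of segment breaks to find and the text to split, and returns the resulting list of segments. See the synopsis for an example.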
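The idea of detecting unrelated neighbouring paragraphs can be illustrated with a minimal sketch. This is not the module's implementation: the function names word_counts, cosine, gap_scores and weakest_gap are hypothetical, and plain cosine similarity over word counts is assumed here as a stand-in for the scoring that the TextTiling method actually uses. The sketch scores each gap between adjacent paragraphs and reports the gap where the two sides share the fewest words.

```perl
use strict;
use warnings;

# Count lowercased word frequencies in one paragraph (hypothetical helper).
sub word_counts {
    my ($paragraph) = @_;
    my %count;
    $count{lc $1}++ while $paragraph =~ /(\w+)/g;
    return \%count;
}

# Cosine similarity between two word-frequency hashes; 0 means no shared words.
sub cosine {
    my ($x, $y) = @_;
    my ($dot, $nx, $ny) = (0, 0, 0);
    $dot += $x->{$_} * ($y->{$_} // 0) for keys %$x;
    $nx  += $_ ** 2 for values %$x;
    $ny  += $_ ** 2 for values %$y;
    return 0 unless $nx && $ny;
    return $dot / sqrt($nx * $ny);
}

# Score every gap; gap $i separates paragraph $i from paragraph $i + 1.
# Paragraphs are assumed to be separated by blank lines.
sub gap_scores {
    my ($text) = @_;
    my @paragraphs = grep { /\S/ } split /\n\s*\n/, $text;
    my @counts     = map { word_counts($_) } @paragraphs;
    return map { cosine($counts[$_], $counts[$_ + 1]) } 0 .. $#counts - 1;
}

# Index of the gap with the lowest similarity, i.e. the likeliest break.
sub weakest_gap {
    my @scores = @_;
    my $best = 0;
    for my $i (1 .. $#scores) {
        $best = $i if $scores[$i] < $scores[$best];
    }
    return $best;
}

my $sample = "cats chase mice\n\ncats eat mice\n\nstocks fell sharply";
print weakest_gap(gap_scores($sample)), "\n";   # prints 1
```

Short paragraphs share few words by chance, which makes scores like these noisy; this is consistent with the note in the synopsis that the algorithm is designed to work only on long segments.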
EXTENDING
This module is designed to be easily extensible. Feel free to subclass it when designing alternative methods for text segmentation.
AUTHORS
David James <david@jamesgang.com>
SEE ALSO
Lingua::EN::Segmenter::Baseline, Lingua::EN::Segmenter::Evaluator, http://www.cs.toronto.edu/~james
LICENSE
Copyright (c) 2002 David James
All rights reserved.
This program is free software; you can redistribute it and/or
modify it under the same terms as Perl itself.