NAME
Lingua::EN::Sentence::Offsets - Finds sentence boundaries, and returns their offsets.
VERSION
version 0.01_05
SYNOPSIS
use Lingua::EN::Sentence::Offsets qw/get_offsets get_sentences/;
my $offsets = get_offsets($text); ## Get the offsets.
foreach my $o (@$offsets) {
    my $start = $o->[0];
    my $length = $o->[1]-$o->[0];
    my $sentence = substr($text,$start,$length); ## Get a sentence.
    # ...
}
### or
my $sentences = get_sentences($text);
foreach my $sentence (@$sentences) {
    ## do something with $sentence
}
METHODS
get_offsets
Takes text input and returns a reference to an array containing pairs of character offsets, corresponding to each sentence's start and end positions.
get_sentences
Takes text input and splits it into sentences, returning a reference to an array of sentences.
add_acronyms
Lets the user add a list of acronyms/abbreviations.
get_acronyms
Returns the currently defined list of acronyms.
set_acronyms
Overrides the predefined list of acronyms with your own list.
remove_false_eos
split_unsplit_stuff
Finds additional split points in the middle of previously defined sentences.
adjust_offsets
Makes minor adjustments to offsets (leading/trailing whitespace, etc.).
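As an illustration of the kind of whitespace adjustment described here, the following is a minimal sketch and not the module's own code. The helper name trim_offsets is hypothetical, and the sketch assumes each pair is [start, end] with an exclusive end, as the arithmetic in the SYNOPSIS suggests.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): shrink each [start, end)
# pair so it excludes leading and trailing whitespace, and drop any pair
# that becomes empty.
sub trim_offsets {
    my ($text, $offsets) = @_;
    my @trimmed;
    for my $o (@$offsets) {
        my ($start, $end) = @$o;
        # Move the start forward past whitespace.
        $start++ while $start < $end && substr($text, $start, 1) =~ /\s/;
        # Move the end backward past whitespace.
        $end-- while $end > $start && substr($text, $end - 1, 1) =~ /\s/;
        push @trimmed, [$start, $end] if $start < $end;
    }
    return \@trimmed;
}

my $trim_text = "One.  Two. ";
my $trimmed   = trim_offsets($trim_text, [[0, 4], [4, 10], [10, 11]]);
printf "[%d,%d]\n", @$_ for @$trimmed;
```

In this sketch the whitespace-only pair is discarded and the second pair is tightened around "Two.".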
initial_offsets
Computes a first, naive delimitation of sentences.
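To make the idea of a naive first pass concrete, here is a rough sketch under stated assumptions; it is not the module's implementation. The helper naive_offsets is hypothetical and simply treats any run of '.', '!' or '?' followed by whitespace or end of text as a sentence end.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): a deliberately naive
# splitter that returns [start, end) pairs, end exclusive.
sub naive_offsets {
    my ($text) = @_;
    my @offsets;
    my $start = 0;
    while ($text =~ /[.!?]+(?=\s|\z)/g) {
        my $end = pos($text);       # offset just past the punctuation
        push @offsets, [$start, $end];
        $start = $end;              # next pair begins here, whitespace included
    }
    # Keep trailing text that has no closing punctuation.
    push @offsets, [$start, length $text] if $start < length $text;
    return \@offsets;
}

my $naive_text    = "Hello world. How are you? Fine";
my $naive_offsets = naive_offsets($naive_text);
printf "[%d,%d]\n", @$_ for @$naive_offsets;
```

A pass like this splits wrongly after abbreviations such as "Dr." and leaves leading whitespace inside each pair, which is why later steps for acronyms and offset adjustment are needed.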
offsets2sentences
Given a list of sentence boundary offsets and a text, returns an array with the text split into sentences.
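The conversion from offsets to sentences can be sketched as follows. This is an illustrative stand-in rather than the module's code: the name slices_from_offsets and its argument order are assumptions, and the end offset is taken to be exclusive, as in the SYNOPSIS.

```perl
use strict;
use warnings;

# Hypothetical stand-in (not part of the module): turn [start, end)
# pairs into the substrings they delimit.
sub slices_from_offsets {
    my ($text, $offsets) = @_;
    my @sentences = map {
        my ($start, $end) = @$_;
        substr($text, $start, $end - $start);
    } @$offsets;
    return \@sentences;
}

my $slice_text = "One.  Two.";
my $slices     = slices_from_offsets($slice_text, [[0, 4], [6, 10]]);
print "$_\n" for @$slices;
```

Keeping offsets rather than copies of the sentences lets a caller map each sentence back to its exact position in the original text.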
ACKNOWLEDGEMENTS
Based on the original module Lingua::EN::Sentence, by Shlomo Yona (SHLOMOY).
SEE ALSO
Lingua::EN::Sentence, Text::Sentence
AUTHOR
Andre Santos <andrefs@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Andre Santos.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.