NAME
Lingua::EN::Sentence::Offsets - Finds sentence boundaries and returns their offsets.
VERSION
version 0.01_02
get_offsets
Takes text input and returns a reference to an array containing pairs of character offsets, corresponding to each sentence's start and end positions.
get_sentences
Takes text input and splits it into sentences.
add_acronyms
Adds a user-supplied list of acronyms/abbreviations.
get_acronyms
Returns the currently defined list of acronyms.
set_acronyms
Overrides the predefined acronyms list with your own list.
remove_false_eos
split_unsplit_stuff
Finds additional split points in the middle of previously defined sentences.
adjust_offsets
Makes minor adjustments to offsets (leading/trailing whitespace, etc.).
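As an illustration of the whitespace case only, here is a minimal sketch. It is not the module's own code: `trim_offsets` is a hypothetical helper, and it assumes each pair is `[start, end]` with an exclusive end offset.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): shrink each [start, end)
# pair so that it no longer covers leading or trailing whitespace.
# Assumption: the end offset is exclusive.
sub trim_offsets {
    my ($text, $offsets) = @_;
    my @trimmed;
    for my $pair (@$offsets) {
        my ($start, $end) = @$pair;
        # Move the start forward past leading whitespace.
        $start++ while $start < $end && substr($text, $start, 1) =~ /\s/;
        # Move the end backward past trailing whitespace.
        $end-- while $end > $start && substr($text, $end - 1, 1) =~ /\s/;
        # Drop spans that turn out to be whitespace only.
        push @trimmed, [$start, $end] if $end > $start;
    }
    return \@trimmed;
}

my $text    = "First one.  Second one. ";
my $trimmed = trim_offsets($text, [[0, 12], [12, 24]]);
print join(', ', map { sprintf('[%d,%d]', @$_) } @$trimmed), "\n";
```

The sketch returns a new array reference instead of modifying its input, so the original offsets remain available for comparison.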
initial_offsets
Performs a first, naive delimitation of sentences.
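One way to picture a naive first pass is to end a sentence at every run of terminal punctuation. The sketch below is written under that assumption and is not taken from the module: `naive_offsets` is a hypothetical name, and the rule the module actually applies may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code): treat every run of
# '.', '!' or '?' as the end of a sentence.
# Assumption: pairs are [start, end] with an exclusive end offset.
sub naive_offsets {
    my ($text) = @_;
    my @offsets;
    my $start = 0;
    while ($text =~ /[.!?]+/g) {
        my $end = pos($text);    # position just after the punctuation
        push @offsets, [$start, $end];
        $start = $end;
    }
    # Keep any trailing text that has no closing punctuation.
    push @offsets, [$start, length($text)] if $start < length($text);
    return \@offsets;
}

my $offsets = naive_offsets("Dr. Smith left. He was late");
print join(', ', map { sprintf('[%d,%d]', @$_) } @$offsets), "\n";
```

With this input the sketch also splits after "Dr.", which shows why a naive pass needs later correction for abbreviations.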
offsets2sentences
Given a list of sentence boundary offsets and a text, returns an array with the text split into sentences.
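The mapping from offsets back to sentences can be sketched with `substr`. This is an illustration only: `slice_sentences` is a hypothetical stand-in, and the sketch assumes each pair is `[start, end]` with an exclusive end offset, which this document does not specify.

```perl
use strict;
use warnings;

# Hypothetical stand-in (not the module's code): cut one substring
# out of the text for each [start, end) pair.
sub slice_sentences {
    my ($text, $offsets) = @_;
    return map { substr($text, $_->[0], $_->[1] - $_->[0]) } @$offsets;
}

my $text      = "First one. Second one.";
my @sentences = slice_sentences($text, [[0, 10], [11, 22]]);
print "$_\n" for @sentences;
```

Because each sentence is cut directly from the original text, the offsets can also be used to mark up the source without re-tokenising it.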
AUTHOR
Andre Santos <andrefs@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2012 by Andre Santos.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.