NAME
Lingua::HE::Sentence - Module for splitting Hebrew text into sentences.
SYNOPSIS
use Lingua::HE::Sentence qw( get_sentences );
my $sentences = get_sentences($text);    ## Get the sentences.
foreach my $sentence (@$sentences) {
    ## do something with $sentence
}
DESCRIPTION
The Lingua::HE::Sentence module contains the function get_sentences, which splits Hebrew text into its constituent sentences, based on regular expressions.
The module assumes text in logical Hebrew order, encoded as CP1255. Supporting other input encodings is possible, but I need people to ask for it.
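Since the module expects CP1255 input, text held in another encoding has to be converted first. The following is a minimal sketch, assuming the input arrives as UTF-8 bytes and using the core Encode module. The helper name utf8_to_cp1255 is hypothetical and not part of Lingua::HE::Sentence, and the call to get_sentences is guarded so the script still runs where the module is not installed.

```perl
use strict;
use warnings;
use Encode qw( decode encode );

# Hypothetical helper (not part of Lingua::HE::Sentence): re-encode UTF-8
# bytes as CP1255 bytes. Dies on malformed input or on characters that
# CP1255 cannot represent.
sub utf8_to_cp1255 {
    my ($utf8_bytes) = @_;
    my $chars = decode( 'UTF-8', $utf8_bytes, Encode::FB_CROAK );
    return encode( 'cp1255', $chars, Encode::FB_CROAK );
}

# The word "shalom" (shin, lamed, vav, final mem) followed by a period,
# written with escapes so the source file stays plain ASCII.
my $utf8_bytes = encode( 'UTF-8', "\x{05E9}\x{05DC}\x{05D5}\x{05DD}." );
my $cp1255     = utf8_to_cp1255($utf8_bytes);

# Only call the module if it is installed.
if ( eval { require Lingua::HE::Sentence; 1 } ) {
    my $sentences = Lingua::HE::Sentence::get_sentences($cp1255);
    printf "%d sentence(s)\n", scalar @$sentences;
}
```

The sentences come back as CP1255 bytes as well, so decode them with decode('cp1255', ...) before mixing them with Unicode strings.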
HEBREW DETAILS
- Language: Hebrew
- Language ID: he
- MS Locale ID: 1037
- ISO 639-1: he
- ISO 639-2 (MARC): heb
- ISO 8859 (charset): 8859-8
- ANSI codepage: 1255
- Unicode: 0590-05FF
FUNCTIONS
Every function you use must be requested in the 'use' clause. None is exported by default.
- get_sentences( $text )
The get_sentences function takes a scalar containing the text (CP1255-encoded, as described above) as an argument and returns a reference to an array of the sentences that the text has been split into. Returned sentences are trimmed of leading and trailing whitespace. Strings that contain no alphanumeric characters are not returned as sentences.
- get_EOS( )
This function returns the string used to mark the end of a sentence. You might want to see what it is and make sure your text doesn't contain it. You can use set_EOS() to change the end-of-sentence string to whatever you like.
- set_EOS( $new_EOS_string )
This function sets the string used to mark the end of a sentence.
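One way to make sure the text does not contain the end-of-sentence marker is to check before splitting and switch to a marker that is absent. This is a minimal sketch: the helper pick_unused_marker and the candidate strings are hypothetical and chosen only for illustration, and the calls to get_EOS and set_EOS are guarded so the script still runs where the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::HE::Sentence): return the first
# candidate marker that does not occur in $text, or undef if all of them do.
sub pick_unused_marker {
    my ( $text, @candidates ) = @_;
    for my $marker (@candidates) {
        return $marker if index( $text, $marker ) < 0;
    }
    return;
}

# Sample text that already contains the first candidate marker.
my $text   = "First part.\001Second part.";
my $marker = pick_unused_marker( $text, "\001", "\002", "<EOS>" );

# Only touch the module if it is installed.
if ( defined $marker && eval { require Lingua::HE::Sentence; 1 } ) {
    my $current = Lingua::HE::Sentence::get_EOS();

    # Replace the marker only when the current one collides with the text.
    Lingua::HE::Sentence::set_EOS($marker) if index( $text, $current ) >= 0;
}
```

The marker is changed only on a collision, so text that never contains the current marker is split with the module's own setting.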
FUTURE WORK
- [1] An object-oriented interface.
- [2] Support for more encodings, or at least Unicode (e.g. UTF-8).
- [3] Code cleanup and optimization.
SEE ALSO
Lingua::EN::Sentence
AUTHOR
Shlomo Yona shlomo@cs.haifa.ac.il
COPYRIGHT
Copyright (c) 2001, 2002 Shlomo Yona. All rights reserved.
This library is free software. You can redistribute it and/or modify it under the same terms as Perl itself.