NAME
Lingua::FR::Hyphen - Hyphenate French words
SYNOPSIS
#!/usr/bin/perl
use strict;
use warnings;
use Lingua::FR::Hyphen;
use utf8;
binmode( STDOUT, ':utf8' );
my $hyphenator = Lingua::FR::Hyphen->new;
foreach (qw/
représentation Montpellier avocat porte-monnaie
0102030405 rouge-gorge transaction consultant
rubicon développement UNESCO
/) {
print "$_ -> " . $hyphenator->hyphenate($_) . "\n";
}
# représentation -> repré-sen-ta-tion
# Montpellier -> Montpellier
# avocat -> avo-cat
# porte-monnaie -> porte-monnaie
# 0102030405 -> 0102030405
# rouge-gorge -> rouge-gorge
# transaction -> tran-sac-tion
# consultant -> consul-tant
# rubicon -> rubicon
# développement -> déve-lop-pe-ment
# UNESCO -> UNESCO
DESCRIPTION
Lingua::FR::Hyphen hyphenates French words using the Knuth-Liang algorithm.
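To give an idea of how pattern-based hyphenation in the style of Liang works, here is a minimal, self-contained sketch. It is not this module's implementation and does not use its French pattern list: the four patterns, the minimum prefix and suffix lengths, and the names parse_pattern and hyphenate_sketch are all assumptions made up for illustration. In a pattern, a digit sits in the gap between two letters; where several patterns overlap, the highest digit wins, and a break is allowed only where the winning digit is odd.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy patterns invented for this sketch (NOT the module's French patterns).
# Odd digits allow a break in that gap, even digits forbid one;
# '.' marks a word boundary.
my @patterns = qw( n1s c1t 1on 2on. );

# Assumed minimum number of letters kept before and after any break.
my ( $min_prefix, $min_suffix ) = ( 2, 2 );

# Split a pattern such as "n1s" into its letters ("ns") and the list of
# levels for each gap around those letters (0, 1, 0).
sub parse_pattern {
    my ($pattern) = @_;
    my $letters = '';
    my @levels  = (0);
    for my $char ( split //, $pattern ) {
        if ( $char =~ /[0-9]/ ) {
            $levels[-1] = $char;    # level for the gap before the next letter
        }
        else {
            $letters .= $char;
            push @levels, 0;
        }
    }
    return ( $letters, \@levels );
}

sub hyphenate_sketch {
    my ( $word, $delimiter ) = @_;
    $delimiter = '-' unless defined $delimiter;

    my $dotted = '.' . lc($word) . '.';
    # $points[$i] is the level of the gap just before character $i of $dotted.
    my @points = (0) x ( length($dotted) + 1 );

    for my $pattern (@patterns) {
        my ( $letters, $levels ) = parse_pattern($pattern);
        my $start = 0;
        while ( ( my $pos = index( $dotted, $letters, $start ) ) >= 0 ) {
            for my $j ( 0 .. $#{$levels} ) {
                # Keep the highest level seen for each gap.
                $points[ $pos + $j ] = $levels->[$j]
                    if $levels->[$j] > $points[ $pos + $j ];
            }
            $start = $pos + 1;
        }
    }

    my @chars = split //, $word;
    my $out   = '';
    for my $k ( 0 .. $#chars ) {
        # Character $k of the word is character $k + 1 of $dotted.
        if (   $k >= $min_prefix
            && @chars - $k >= $min_suffix
            && $points[ $k + 1 ] % 2 )
        {
            $out .= $delimiter;
        }
        $out .= $chars[$k];
    }
    return $out;
}

print hyphenate_sketch('transaction'), "\n";           # tran-sac-tion
print hyphenate_sketch( 'transaction', '/' ), "\n";    # tran/sac/tion
```

In this toy run, the pattern "1on" proposes a break before the final "on", but "2on." outranks it at the end of the word, so no break is made there.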
CONSTRUCTOR/METHODS
new
This constructor allows you to create a new Lingua::FR::Hyphen object.
$hyphenator = Lingua::FR::Hyphen->new([OPTIONS])
- cut_proper_nouns => integer 0 or 1
Whether to hyphenate proper nouns (default: 0, recommended).
cut_proper_nouns => 0,
- cut_compounds => integer 0 or 1
Whether to hyphenate compound words (default: 0, recommended).
cut_compounds => 0,
hyphenate
Hyphenates a French word using the Knuth-Liang algorithm, following the rules of the French language.
$hyphenator->hyphenate($word, $delimiter ? )
Takes two arguments: the word and an optional delimiter (default: "-").
$hyphenator->hyphenate($word1);
$hyphenator->hyphenate($word2, '/');
AUTHORS
Djibril Ousmanou, <djibel at cpan.org>
Laurent Rosenfeld, <laurent.rosenfeld at googlemail.com>
ACKNOWLEDGEMENTS
This module is based on the Knuth-Liang algorithm. Frank Liang wrote his Stanford Ph.D. thesis (under the supervision of Donald Knuth) on a hyphenation algorithm aimed at TeX (the typesetting system written by Knuth). The algorithm is now standard in TeX and has been adapted by many open source projects such as OpenOffice, LibreOffice, Firefox and Thunderbird. His 1983 Ph.D. thesis can be found at http://www.tug.org/docs/liang/. He invented both the "packed or compressed trie" structure for storing the patterns efficiently and the way to represent possible hyphens in those patterns.
This module is also partly derived from Alex Kapranoff's Text::Hyphen module, which hyphenates English words.
The list of hyphenation (« césure » or « coupure de mots » in French) patterns for the French language was derived from the Dicollecte site (http://www.dicollecte.org/home.php?prj=fr), which produces several French open source spell-check dictionaries, used notably for LaTeX, OpenOffice and LibreOffice. The list of patterns itself can be found at http://www.dicollecte.org/download/fr/hyph-fr-v3.0.zip.
The list of proper nouns used to prevent their hyphenation (hyphenating proper nouns is usually considered bad practice in French) was compiled from several sources. The main source was the Hunspell dictionary for French words, which can also be found on the Dicollecte site (see http://www.dicollecte.org/download.php?prj=fr). From it we extracted proper nouns as well as acronyms, which should not be hyphenated either, although this module does not hyphenate all-capital words anyway.
BUGS
Please report any bugs or feature requests to bug-lingua-fr-hyphen at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-FR-Hyphen. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SEE ALSO
See Text::Hyphen.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Lingua::FR::Hyphen
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
LICENSE AND COPYRIGHT
Copyright 2015 Djibril Ousmanou.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.