NAME

Lingua::FR::Hyphen - Hyphenate French words

SYNOPSIS

#!/usr/bin/perl
use strict;
use warnings;
use Lingua::FR::Hyphen;
use utf8;
  
binmode( STDOUT, ':utf8' );
my $hyphenator = Lingua::FR::Hyphen->new;
  
foreach (qw/  
    représentation  Montpellier avocat porte-monnaie 
    0102030405 rouge-gorge transaction consultant 
    rubicon développement UNESCO 
    /) {
    print "$_ -> " . $hyphenator->hyphenate($_) . "\n";
}

# représentation -> repré-sen-ta-tion
# Montpellier -> Montpellier
# avocat -> avo-cat
# porte-monnaie -> porte-monnaie
# 0102030405 -> 0102030405
# rouge-gorge -> rouge-gorge
# transaction -> tran-sac-tion
# consultant -> consul-tant
# rubicon -> rubicon
# développement -> déve-lop-pe-ment
# UNESCO -> UNESCO

DESCRIPTION

Lingua::FR::Hyphen hyphenates French words using the Knuth-Liang algorithm.
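As a rough illustration of the pattern-matching idea behind the Knuth-Liang algorithm, here is a minimal, self-contained sketch. It is not the module's implementation: the names toy_hyphenate and parse_pattern and the three toy patterns are hypothetical, chosen only to show how digit scores in overlapping patterns are merged (highest score wins, odd allows a break, even forbids it).

```perl
use strict;
use warnings;

# Hypothetical toy patterns, NOT the French pattern list used by the module.
# A digit scores the gap at its position: odd allows a break, even forbids it.
my @toy_patterns = qw(a1b b1c 2bc);

# Split a pattern such as "a1b" into its letters ("ab") and one score per
# gap ([0, 1, 0]); score $i belongs to the gap before letter $i.
sub parse_pattern {
    my ($pattern) = @_;
    my ( $letters, @scores ) = ( '', 0 );
    for my $char ( split //, $pattern ) {
        if ( $char =~ /\d/ ) { $scores[-1] = $char; }
        else                 { $letters .= $char; push @scores, 0; }
    }
    return ( $letters, \@scores );
}

sub toy_hyphenate {
    my ( $word, $delimiter ) = @_;
    $delimiter = '-' unless defined $delimiter;

    # Pad with dots so patterns could anchor at word boundaries.
    my $padded = '.' . lc($word) . '.';
    my @best   = (0) x ( length($padded) + 1 );

    for my $pattern (@toy_patterns) {
        my ( $letters, $scores ) = parse_pattern($pattern);
        my $start = 0;
        while ( ( my $pos = index( $padded, $letters, $start ) ) >= 0 ) {
            # Keep the highest score seen for each gap.
            for my $i ( 0 .. $#$scores ) {
                $best[ $pos + $i ] = $scores->[$i]
                    if $scores->[$i] > $best[ $pos + $i ];
            }
            $start = $pos + 1;
        }
    }

    # $best[$i + 1] scores the gap before character $i of the original word.
    my @chars = split //, $word;
    my $out   = '';
    for my $i ( 0 .. $#chars ) {
        $out .= $delimiter if $i > 0 && $best[ $i + 1 ] % 2;
        $out .= $chars[$i];
    }
    return $out;
}

print toy_hyphenate('abc'), "\n";    # ab-c
```

In "abc", the pattern a1b proposes a break between "a" and "b", but 2bc overrides it with a higher even score, so only the break proposed by b1c survives.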

CONSTRUCTOR/METHODS

new

This constructor allows you to create a new Lingua::FR::Hyphen object.

$hyphenator = Lingua::FR::Hyphen->new([OPTIONS])

min_word => integer

Minimum length of a word to be hyphenated (default: 6).

min_word => 4,

min_prefix => integer

Minimum number of leading characters to leave without a hyphen (default: 3).

min_prefix => 3,

min_suffix => integer

Minimum number of trailing characters to leave without a hyphen (default: 3).

min_suffix => 3,

cut_proper_nouns => integer 0 or 1

Whether to hyphenate proper nouns (default: 0, recommended).

cut_proper_nouns => 0,

cut_compounds => integer 0 or 1

Whether to hyphenate compound words (default: 0, recommended).

cut_compounds => 0,
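To make the effect of the length options concrete, the following sketch filters candidate break positions by min_word, min_prefix and min_suffix. The helper allowed_breaks is hypothetical and not part of Lingua::FR::Hyphen; it only assumes that the limits are applied in the usual way (words shorter than min_word are left intact, and no break falls inside the protected prefix or suffix).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Lingua::FR::Hyphen.
# A candidate position $p means "break before character index $p" (0-based).
sub allowed_breaks {
    my ( $word, $candidates, %opt ) = @_;

    # Defaults mirror the documented ones: 6, 3 and 3.
    my $min_word   = defined $opt{min_word}   ? $opt{min_word}   : 6;
    my $min_prefix = defined $opt{min_prefix} ? $opt{min_prefix} : 3;
    my $min_suffix = defined $opt{min_suffix} ? $opt{min_suffix} : 3;

    my $length = length $word;
    return () if $length < $min_word;    # too short: leave the word intact

    # Keep breaks that leave enough characters on both sides.
    return grep { $_ >= $min_prefix && $length - $_ >= $min_suffix }
        @$candidates;
}

# Candidate breaks before indexes 1, 3 and 5 of "avocat": only 3 survives,
# which matches the "avo-cat" result shown in the SYNOPSIS.
my @kept = allowed_breaks( 'avocat', [ 1, 3, 5 ] );
print join( ',', @kept ), "\n";    # 3
```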

hyphenate

Hyphenates a French word using the Knuth-Liang algorithm, following the rules of the French language.

$hyphenator->hyphenate($word [, $delimiter])

Takes two arguments: the word and an optional delimiter (default: "-").

$hyphenator->hyphenate($word1); 
$hyphenator->hyphenate($word2, '/');
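One reason to pass a delimiter other than "-" is that it makes the result easy to split back into fragments, even for words that already contain a literal hyphen. The sketch below does not call the module: the input string is an assumed return value of hyphenate('développement', '/'), inferred from the SYNOPSIS output, and the helper name fragments is hypothetical.

```perl
use strict;
use warnings;
use utf8;

binmode( STDOUT, ':encoding(UTF-8)' );

# Hypothetical helper: split a hyphenated string on the delimiter that was
# passed to hyphenate(). \Q...\E quotes the delimiter for use in a regex.
sub fragments {
    my ( $hyphenated, $delimiter ) = @_;
    return split /\Q$delimiter\E/, $hyphenated;
}

# Assumed output of $hyphenator->hyphenate( 'développement', '/' ).
my @parts = fragments( 'déve/lop/pe/ment', '/' );
print scalar(@parts), " fragments: @parts\n";
```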

AUTHORS

Djibril Ousmanou, <djibel at cpan.org>

Laurent Rosenfeld, <laurent.rosenfeld at googlemail.com>

ACKNOWLEDGEMENTS

This module is based on the Knuth-Liang algorithm. Frank Liang wrote his Stanford Ph.D. thesis (under the supervision of Donald Knuth) on a hyphenation algorithm aimed at TeX (the typesetting system written by Knuth). The algorithm is now standard in TeX and has been adapted to many open source projects such as OpenOffice, LibreOffice, Firefox and Thunderbird. His 1983 thesis can be found at http://www.tug.org/docs/liang/. He invented both the "packed" (or compressed) trie structure for storing the patterns efficiently and the way to represent possible hyphens in those patterns.

This module is also partly derived from Alex Kapranoff's Text::Hyphen module, which hyphenates English words.

The list of hyphenation (« césure » or « coupure de mots » in French) patterns for the French language was derived from the Dicollecte site (http://www.dicollecte.org/home.php?prj=fr), which produces several open source French spell-checking dictionaries, used notably for LaTeX, OpenOffice and LibreOffice. The list of patterns itself can be found at http://www.dicollecte.org/download/fr/hyph-fr-v3.0.zip.

The list of proper nouns used to prevent their hyphenation (hyphenating proper nouns is usually considered bad practice in French) was compiled from several sources. The main source was the Hunspell dictionary for French, which can also be found on the Dicollecte site (see http://www.dicollecte.org/download.php?prj=fr). From it we extracted proper nouns as well as acronyms, which should not be hyphenated either, although this module does not hyphenate all-capital words anyway.

BUGS

Please report any bugs or feature requests to bug-lingua-fr-hyphen at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-FR-Hyphen. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SEE ALSO

See Text::Hyphen.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Lingua::FR::Hyphen

LICENSE AND COPYRIGHT

Copyright 2015 Djibril Ousmanou.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.