NAME
Text::GaleChurch - Perl extension for aligning translated sentences
SYNOPSIS
use Text::GaleChurch;
my @eParagraph = ();
push @eParagraph, "According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products.";
push @eParagraph, "Cola drink manufacturers in particular achieved above-average growth rates.";
push @eParagraph, "The higher turnover was largely due to an increase in the sales volume.";
push @eParagraph, "Employment and investment levels also climbed.";
push @eParagraph, "Following a two-year transitional period, the new Foodstuffs Ordinance for Mineral Water came into effect on April 1, 1988.";
push @eParagraph, "Specifically, it contains more stringent requirements regarding quality consistency and purity guarantees.";
my @fParagraph = ();
push @fParagraph, "Quant aux eaux minérales et aux limonades, elles rencontrent toujours plus d'adeptes.";
push @fParagraph, "En effet, notre sondage fait ressortir des ventes nettement supérieures à celles de 1987, pour les boissons à base de cola notamment.";
push @fParagraph, "La progression des chiffres d'affaires résulte en grande partie de l'accroissement du volume des ventes.";
push @fParagraph, "L'emploi et les investissements ont également augmenté.";
push @fParagraph, "La nouvelle ordonnance fédérale sur les denrées alimentaires concernant entre autres les eaux minérales, entrée en vigueur le 1er avril 1988 après une période transitoire de deux ans, exige surtout une plus grande constance dans la qualité et une garantie de la pureté.";
my ($eAlignedRef,$fAlignedRef);
($eAlignedRef,$fAlignedRef) = Text::GaleChurch::align(\@eParagraph,\@fParagraph);
for(my $i=0;$i<scalar(@{$eAlignedRef});$i++) {
print "E:",$eAlignedRef->[$i],"\t is aligned to\tF:",$fAlignedRef->[$i],"\n";
}
DESCRIPTION
This module aligns the sentences of two paragraphs that are translations of each other, pairing up the sentences that are likely to be mutual translations. This is useful for machine translation and other applications that need sentence-aligned parallel corpora. The algorithm is described in the paper "A Program for Aligning Sentences in Bilingual Corpora" by William A. Gale and Kenneth W. Church (Computational Linguistics, 1993).
The align function takes two array references, one to the source-language sentences and one to the target-language sentences. Each array must contain exactly one sentence per element. The module Lingua::Sentence can be used to split paragraphs into sentences.
EXPORT
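- align($sourceRef,$targetRef)
Aligns the bilingual sentences in the arrays referenced by the two arguments. The function returns two array references, one to the aligned source sentences and one to the aligned target sentences; entries at the same index belong together.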
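The two returned references are typically walked in parallel, as in the SYNOPSIS loop. The following is a minimal sketch of writing such pairs out as tab-separated lines. The helper name pairs_to_tsv is hypothetical and not part of Text::GaleChurch, and the sketch assumes that both arrays have the same number of entries. It uses stand-in data instead of calling align, so it runs without the module installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Text::GaleChurch): turn two parallel
# array references into tab-separated "source<TAB>target" lines.
sub pairs_to_tsv {
    my ($srcRef, $tgtRef) = @_;
    # Assumption: both arrays have the same number of entries.
    die "aligned arrays differ in length\n"
        unless @{$srcRef} == @{$tgtRef};
    my @lines;
    for my $i (0 .. $#{$srcRef}) {
        # Work on copies so the caller's arrays stay untouched.
        my ($src, $tgt) = ($srcRef->[$i], $tgtRef->[$i]);
        # Tabs or line breaks inside a sentence would break the format.
        s/[\t\r\n]+/ /g for $src, $tgt;
        push @lines, "$src\t$tgt";
    }
    return @lines;
}

# Stand-in data shaped like the two references returned by align.
my $eAlignedRef = ["Employment and investment levels also climbed."];
my $fAlignedRef = ["L'emploi et les investissements ont également augmenté."];
print "$_\n" for pairs_to_tsv($eAlignedRef, $fAlignedRef);
```

With real data, the two stand-in references would be replaced by the values returned from Text::GaleChurch::align.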
SUPPORT
Bugs should always be submitted via the CPAN bug tracker:
http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-GaleChurch
For other issues, contact the maintainer.
SEE ALSO
Google code project: http://code.google.com/p/corpus-tools/
AUTHOR
Achim Ruopp, <achimru@gmail.com>
COPYRIGHT AND LICENSE
Copyright (C) 2010 by Digital Silk Road
Portions Copyright (C) 2005 by Philipp Koehn and Josh Schroeder (used with permission)
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.