NAME
WWW::Translate::Apertium - Open source machine translation
VERSION
Version 0.02 May 3, 2007
SYNOPSIS
use WWW::Translate::Apertium;
my $engine = WWW::Translate::Apertium->new();
my $translated_string = $engine->translate($string);
# default language pair is Catalan -> Spanish
# change to Spanish -> Galician:
$engine->from_into('es-gl');
# check current language pair:
my $current_langpair = $engine->from_into();
# get available language pairs:
my %pairs = $engine->get_pairs();
# default output format is 'plain_text'
# change to 'marked_text':
$engine->output_format('marked_text');
# check current output format:
my $current_format = $engine->output_format();
# configure a new Apertium object to store unknown words:
$engine = WWW::Translate::Apertium->new(
output => 'marked_text',
store_unknown => 1,
);
# get unknown words for source language = Aranese:
my $oc_unknown_href = $engine->get_unknown('oc');
DESCRIPTION
Apertium is an open source shallow-transfer machine translation engine designed to translate between related languages. It provides approximate translations between Romance languages. It is being developed by the Department of Software and Computing Systems at the University of Alicante. The linguistic data is being developed by research teams from the University of Alicante, the University of Vigo and the Pompeu Fabra University. For more details, see http://apertium.sourceforge.net/.
WWW::Translate::Apertium provides an object oriented interface to the Apertium online machine translation engine.
The language pairs currently supported by Apertium are:
Catalan < > Spanish
Galician < > Spanish
Spanish < > Portuguese
Spanish > Brazilian Portuguese
Aranese < > Catalan
Catalan < > French
The Apertium 2.0 architecture includes improvements that support translation between less related languages:
Catalan < > English (experimental)
CONSTRUCTOR
new()
Creates and returns a new WWW::Translate::Apertium object.
my $engine = WWW::Translate::Apertium->new();
WWW::Translate::Apertium recognizes the following parameters:
lang_pair
The valid values of this parameter are:
ca-es
Catalan into Spanish (default value).
es-ca
Spanish into Catalan.
es-gl
Spanish into Galician.
gl-es
Galician into Spanish.
es-pt
Spanish into Portuguese.
pt-es
Portuguese into Spanish.
es-br
Spanish into Brazilian Portuguese.
oc-ca
Aranese into Catalan.
ca-oc
Catalan into Aranese.
fr-ca
French into Catalan.
ca-fr
Catalan into French.
en-ca
English into Catalan.
ca-en
Catalan into English.
output
The valid values of this parameter are:
plain_text
Returns the translation as plain text (default value).
marked_text
Returns the translation with the unknown words marked with an asterisk.
store_unknown
Off by default. If set to a true value, it configures the engine object to store in a hash the unknown words and their frequencies during the session. You will be able to access this hash later through the get_unknown method. If you change the engine language pair in the same session, it will also create a separate word list for the new source language.
IMPORTANT: If you activate this setting, then you must also set the output parameter to marked_text. Otherwise, the get_unknown method will return an empty hash.
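The dependency on marked_text is easier to see with a small sketch. The code below is not the module's implementation. It only illustrates, with a hypothetical helper named count_marked_words and an invented sample string, how words marked with an asterisk could be tallied into a hash of frequencies. With plain_text output there are no asterisks, so nothing can be collected.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WWW::Translate::Apertium):
# tallies every word that is prefixed with an asterisk.
sub count_marked_words {
    my ($marked_text) = @_;
    my %freq;

    # Simplification: a word ends at whitespace or common punctuation.
    while ($marked_text =~ /\*([^\s.,;:!?*]+)/g) {
        $freq{$1}++;
    }
    return \%freq;
}

# Invented sample in the style of marked_text output.
my $sample  = 'uno *foo dos *bar, tres *foo.';
my $unknown = count_marked_words($sample);

for my $word (sort keys %$unknown) {
    print "$word: $unknown->{$word}\n";
}
```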
The default parameter values can be overridden when creating a new Apertium engine object:
my %options = (
lang_pair => 'es-ca',
output => 'marked_text',
store_unknown => 1,
);
my $engine = WWW::Translate::Apertium->new(%options);
METHODS
$engine->translate($string)
Returns the translation of $string generated by Apertium. $string must be encoded as Latin-1 (ISO-8859-1). If the source text uses a different encoding, you must convert it to Latin-1 before sending it to the machine translation engine. For this task you can use the Encode module or, if you are reading the text from a file, a PerlIO layer.
If the server is down, it issues a warning and returns undef.
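As an illustration of the encoding step, here is a minimal sketch that converts UTF-8 bytes to Latin-1 with the core Encode module. The helper name utf8_to_latin1 and the sample text are assumptions made for this example. Characters that have no Latin-1 equivalent are replaced by a substitution character under Encode's default settings.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes UTF-8 encoded bytes, returns Latin-1 bytes.
sub utf8_to_latin1 {
    my ($utf8_bytes) = @_;
    my $chars = decode('UTF-8', $utf8_bytes);   # bytes -> Perl characters
    return encode('ISO-8859-1', $chars);        # characters -> Latin-1 bytes
}

# "El niño" as UTF-8 bytes (the n-tilde takes two bytes: C3 B1).
my $utf8_bytes = "El ni\xC3\xB1o";
my $latin1     = utf8_to_latin1($utf8_bytes);

# In Latin-1 the same text is one byte shorter.
printf "UTF-8: %d bytes, Latin-1: %d bytes\n",
    length($utf8_bytes), length($latin1);
```

The converted string can then be passed to $engine->translate(). Since translate returns undef when the server is unavailable, check the result with defined before using it.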
$engine->from_into($lang_pair)
Changes the engine language pair to $lang_pair. When called with no argument, it returns the value of the current engine language pair.
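A language pair code has the form source-target. The sketch below checks a code against the values listed under lang_pair and splits it into its two parts. The helper split_pair is hypothetical and not provided by the module. The source half is the code you would later pass to get_unknown.

```perl
use strict;
use warnings;

# Language pair codes copied from the lang_pair list in this document.
my %VALID_PAIR = map { $_ => 1 } qw(
    ca-es es-ca es-gl gl-es es-pt pt-es es-br
    oc-ca ca-oc fr-ca ca-fr en-ca ca-en
);

# Hypothetical helper: returns (source, target) for a listed pair,
# or an empty list if the code is not in the list.
sub split_pair {
    my ($pair) = @_;
    return unless defined $pair && $VALID_PAIR{$pair};
    return split /-/, $pair, 2;
}

my ($source, $target) = split_pair('es-gl');
print "source=$source target=$target\n";

# 'br-es' is not listed: Brazilian Portuguese is a target only.
my @rejected = split_pair('br-es');
print "br-es is not a listed pair\n" unless @rejected;
```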
$engine->get_pairs()
Returns a hash containing the available language pairs. The hash keys are the language codes, and the values are the corresponding language names.
$engine->output_format($format)
Changes the engine output format to $format. When called with no argument, it returns the value of the current engine output format.
$engine->get_unknown($lang_code)
If the engine was configured to store unknown words, it returns a reference to a hash containing the unknown words (keys) detected during the current machine translation session for the specified source language, along with their frequencies (values).
The valid values of $lang_code are (in alphabetical order):
ca
Source language is Catalan.
en
Source language is English.
es
Source language is Spanish.
fr
Source language is French.
gl
Source language is Galician.
oc
Source language is Aranese.
pt
Source language is Portuguese.
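Because get_unknown returns a reference to a hash of words and frequencies, a frequency-ranked report is a common next step. The sketch below uses a stand-in hash with invented placeholder words instead of a live call, so it runs without network access. The helper by_frequency is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the words ordered by descending frequency,
# with ties broken alphabetically.
sub by_frequency {
    my ($href) = @_;
    return sort { $href->{$b} <=> $href->{$a} || $a cmp $b } keys %$href;
}

# Stand-in for the value returned by $engine->get_unknown('oc').
# The words and counts are invented placeholders.
my $unknown_href = {
    word_a => 1,
    word_b => 4,
    word_c => 2,
};

for my $word (by_frequency($unknown_href)) {
    printf "%-10s %d\n", $word, $unknown_href->{$word};
}
```

In a real session, replace the stand-in hash with the reference returned by get_unknown for the source language of interest.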
DEPENDENCIES
WWW::Mechanize 1.20 or higher.
SEE ALSO
WWW::Translate::interNOSTRUM
REFERENCES
Apertium project website:
http://apertium.sourceforge.net/
ACKNOWLEDGEMENTS
Many thanks to Mikel Forcada Zubizarreta, coordinator of the Transducens research team of the Department of Software and Computing Systems at the University of Alicante, who kindly answered my questions during the development of this module.
AUTHOR
Enrique Nell, <perl_nell@telefonica.net>
COPYRIGHT AND LICENSE
Copyright (C) 2007 by Enrique Nell.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.