NAME
MetaTrans::Base - Abstract base class for creating meta-translator plug-ins
SYNOPSIS
# This is not a working example. It serves for illustration only.
# For a working one see MetaTrans::UltralinguaNet source code.
package MetaTrans::MyPlugin;
use MetaTrans::Base;
use vars qw(@ISA);
@ISA = qw(MetaTrans::Base);
use HTTP::Request;
use URI::Escape;
sub new
{
    my $class = shift;
    my %options = @_;
    $options{host_server} = "www.some-online-translator.com"
        unless (defined $options{host_server});
    my $self = new MetaTrans::Base(%options);
    $self = bless $self, $class;
    # supported translation directions:
    # English <-> German
    # English <-> French
    # English <-> Spanish
    $self->set_languages('eng', 'ger', 'fre', 'spa');
    $self->set_dir_1_to_all('eng');
    $self->set_dir_all_to_1('eng');
    return $self;
}
sub create_request
{
    my $self = shift;
    my $expression = shift;
    my $src_lang_code = shift;
    my $dest_lang_code = shift;
    # our-language-codes-to-server-language-codes conversion table
    my %table = (eng => 'eng', ger => 'deu', fre => 'fra', spa => 'esp');
    return new HTTP::Request('GET',
        'http://www.some-online-translator.com/translate.cgi?' .
        'expr=' . uri_escape($expression) . '&' .
        'src=' . $table{$src_lang_code} . '&' .
        'dst=' . $table{$dest_lang_code}
    );
}
sub process_response
{
    my $self = shift;
    my $contents = shift;
    # we don't care about these here, but
    # in some cases we might need to care
    my $src_lang_code = shift;
    my $dest_lang_code = shift;
    my @result;
    # under /x literal whitespace in the pattern is ignored,
    # so whitespace that must match is written as \s
    while ($contents =~ m|
        <td\s+class="expr">([^<]*)</td> \s*
        <td\s+class="trns">([^<]*)</td>
    |gsix)
    {
        my $expression = $1;
        my $translation = $2;
        # add some $expression and $translation normalization code here
        push @result, ($expression, $translation);
    }
    return @result;
}
1;
DESCRIPTION
This class serves as a base for creating MetaTrans plug-ins, especially those that extract data from online translators. Please see MetaTrans first. MetaTrans::Base already contains many of the features a MetaTrans plug-in must have and makes creating new plug-ins really easy.
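To perform a translation using an online translator (e.g. http://www.ultralingua.net/) one needs to do two things: send a request that emulates submitting the translator's web form, and extract the translations from the HTML that comes back.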
To create a MetaTrans plug-in using MetaTrans::Base one only needs to do a bit more. The first step is to derive from MetaTrans::Base and override the following two abstract methods:
- $plugin->create_request($expression, $src_lang_code, $dest_lang_code)
-
Should return an HTTP::Request object to be used by LWP::UserAgent for retrieving the HTML output that contains the translation of $expression from the language with $src_lang_code to the language with $dest_lang_code. This basically emulates sending a form.
- $plugin->process_response($contents, $src_lang_code, $dest_lang_code)
-
This method should extract translations from the HTML code ($contents) returned by the web server in response to the request. The translations must be returned as an array of the following form:
(expression_1, translation_1, expression_2, translation_2, ...)
Character encoding must be UTF-8! In addition, all expressions and their translations should be normalized so that all grammar and meaning information is in parentheses or after a semicolon. For example, if you request an English to French translation of "dog" from the http://www.ultralingua.net/ translator, the first line of the result is
dog n. : 1. chien n.m.,f. chienne 2. pitou n.m. (Familier) (Québécisme)
The MetaTrans::UltralinguaNet module returns it as
('dog (n.)', 'chien (n.m.,f.)', 'dog (n.)', 'pitou (n.m.)')
The next step is specifying the list of languages supported by the plug-in. We have to say which languages we are able to translate from and which to. This can be done easily by calling the appropriate methods inherited from MetaTrans::Base. Please see "SPECIFYING SUPPORTED LANGUAGES".
The last step is setting the host_server attribute to the name of the online translator used by the plug-in. See ATTRIBUTES.
The MetaTrans::UltralinguaNet source code should serve as a good example of how to create a MetaTrans plug-in derived from MetaTrans::Base.
CONSTRUCTOR METHODS
- MetaTrans::Base->new(%options)
-
This method constructs a new MetaTrans::Base object and returns it. Key/value pair arguments may be provided to set up the initial state. The following options correspond to attribute methods described below:
KEY              DEFAULT
---------------  ----------------
host_server      'unknown.server'
script_name      undef
timeout          5
matching         M_START
match_at_bounds  1
Please note that since MetaTrans::Base is an abstract class, calling the constructor method only makes sense in the derived classes.
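The way such key/value options can be combined with defaults may be sketched as follows. This is only an illustration of the idea, not the MetaTrans::Base constructor: the build_options function is hypothetical, and the string 'M_START' stands in for the real MetaTrans::Base::M_START constant.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MetaTrans::Base.
# The defaults mirror the table above; 'M_START' is a placeholder string,
# the real module uses the constant MetaTrans::Base::M_START.
sub build_options {
    my %options  = @_;
    my %defaults = (
        host_server     => 'unknown.server',
        script_name     => undef,
        timeout         => 5,
        matching        => 'M_START',
        match_at_bounds => 1,
    );
    # In a hash built from a list later keys win,
    # so caller-supplied values override the defaults.
    return (%defaults, %options);
}

my %opts = build_options(timeout => 10, host_server => 'www.example.com');
for my $key (sort keys %opts) {
    my $value = defined $opts{$key} ? $opts{$key} : 'undef';
    print "$key = $value\n";
}
```

A derived class would normally just pass its options on to the base constructor, as the SYNOPSIS does.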
ATTRIBUTES
- $plugin->host_server
- $plugin->host_server($name)
-
Get/set the name of the online translator used by the plug-in. It is only used to inform the user where the translation comes from and hence can be set to any meaningful value. It is a convention to set this to the online translator base URL with the 'http://' stripped. For example, MetaTrans::UltralinguaNet sets host_server to 'www.ultralingua.net'.
- $plugin->script_name
- $plugin->script_name($name)
-
Get/set the name of the script which runs this plug-in as a command line application. The script uses this to identify itself when printing usage. If unset, the script name is extracted from the $0 variable. See the run method.
- $plugin->timeout
- $plugin->timeout($secs)
-
Get/set the time in seconds we want to wait for a reply from the online translator before timing out.
- $plugin->matching
- $plugin->matching($type)
-
Get/set the way of matching the found translations to the searched expression. In addition to the translation of the searched expression, some online translators also return translations of related expressions. For example, we want to translate "dog" from English to French and we also get translations of "dog days" or "every dog has his day". If this is not what we want, we can help ourselves by setting matching to an appropriate value:
- MetaTrans::Base::M_EXACT
-
Match only those expressions which are the same as the searched one. Matching is case-insensitive and ignores grammar information, i.e. everything in parentheses or after a semicolon. The same applies below.
Examples:
'Dog' matches 'dog' (case-insensitive)
'Hund' matches 'Hund; r' (grammar information ignored)
'dog' does not match 'dog bite' (not an exact match)
- MetaTrans::Base::M_START
-
Match those expressions which are prefixed with the searched expression.
Examples:
'Dog' matches 'dog bite' (case-insensitive)
'Hund' matches 'Hund is los'
'Hund' does not match 'bissiger Hund' ('Hund' is not a prefix)
- MetaTrans::Base::M_EXPR
-
Match those expressions which contain the searched expression, no matter where.
Examples:
'Big Dog' matches 'very big dog'
'big dog' does not match 'big angry dog' ('big dog' is not a substring)
- MetaTrans::Base::M_WORDS
-
Match those expressions which contain all the words of the searched expression.
Examples:
'big dog' matches 'big angry dog'
'big dog' does not match 'angry dog' (not all words are contained)
- MetaTrans::Base::M_ALL
-
Return all without any filtering.
You can
use MetaTrans::Base qw(:match_consts);
to import matching constant names (M_EXACT, M_START, ...) into your program's namespace.
- $plugin->match_at_bounds
- $plugin->match_at_bounds($bool)
-
Get/set the match-at-boundaries flag. Setting it to a true value makes matching behave slightly differently: subexpressions and words are matched at word boundaries only. In practice this means that with matching set to M_WORDS the expression "big dog" won't be matched to "big angry doggie", while it would be with match-at-boundaries set to a false value. The same applies to M_START and M_EXPR. The option has no effect when matching is set to M_EXACT or M_ALL.
- $plugin->default_dir
- $plugin->default_dir($src_lang_code, $dest_lang_code)
-
Get/set the default translation direction. It may only be set to a supported one; see "SPECIFYING SUPPORTED LANGUAGES". Returns the old value as an array of two language codes.
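The matching modes and the match-at-boundaries flag described above can be approximated with a few regular expressions. The following is a rough sketch under stated assumptions, not the module's implementation: the function names strip_info and matches and the mode strings ('exact', 'start', 'expr', 'words', 'all') are hypothetical stand-ins for the M_* constants.

```perl
use strict;
use warnings;

# Hypothetical helper: lower-case the expression and drop grammar notes,
# i.e. anything in parentheses or after a semicolon.
sub strip_info {
    my $s = lc shift;
    $s =~ s/\([^)]*\)//g;   # remove parenthesized notes
    $s =~ s/;.*$//s;        # remove everything after a semicolon
    $s =~ s/\s+/ /g;        # collapse runs of whitespace
    $s =~ s/^ | $//g;       # trim both ends
    return $s;
}

# Hypothetical matcher; the mode strings stand in for the M_* constants.
sub matches {
    my ($mode, $in_expr, $found_expr, $at_bounds) = @_;
    my ($in, $found) = (strip_info($in_expr), strip_info($found_expr));
    my $edge = $at_bounds ? '\b' : '';   # word-boundary anchor or nothing

    return ($found eq $in) ? 1 : 0                        if $mode eq 'exact';
    return ($found =~ /^(?:\Q$in\E)${edge}/) ? 1 : 0      if $mode eq 'start';
    return ($found =~ /${edge}(?:\Q$in\E)${edge}/) ? 1 : 0 if $mode eq 'expr';
    if ($mode eq 'words') {
        # every word of the input must occur somewhere in the found expression
        for my $word (split ' ', $in) {
            return 0 unless $found =~ /${edge}(?:\Q$word\E)${edge}/;
        }
        return 1;
    }
    return 1;   # 'all': no filtering
}

print matches('words', 'big dog', 'big angry doggie', 1), "\n";
print matches('words', 'big dog', 'big angry doggie', 0), "\n";
```

With the flag set, "dog" has to appear as a whole word, so "doggie" no longer satisfies the M_WORDS-style check.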
SPECIFYING SUPPORTED LANGUAGES
Every MetaTrans plug-in has to specify supported languages and translation directions. MetaTrans::Base provides several methods for doing so. The first step is specifying the list of all languages which appear on the left or right side of any of the supported translation directions. Suppose your plug-in supports the following ones:
English -> French
English -> German
French -> Spanish
Then the list of supported languages is simply English, French, German and Spanish.
The arguments passed to particular methods need to be language codes, not language names. Please see MetaTrans::Languages for a complete list.
- $plugin->set_languages(@language_codes)
-
Set supported languages to the ones specified by @language_codes. In the above example one would call:
$plugin->set_languages('eng', 'fre', 'ger', 'spa');
- $plugin->set_dir_1_to_1($src_lang_code, $dest_lang_code)
-
Add support for translating from language with $src_lang_code to language with $dest_lang_code. Both languages need to be previously declared as supported. The method returns a true value on success, a false value on error. To specify that we support the directions from the above example we would simply call:
$plugin->set_dir_1_to_1('eng', 'fre');
$plugin->set_dir_1_to_1('eng', 'ger');
$plugin->set_dir_1_to_1('fre', 'spa');
- $plugin->unset_dir_1_to_1($src_lang_code, $dest_lang_code)
-
Remove support for translating from language with $src_lang_code to language with $dest_lang_code. Both languages need to be previously declared as supported. The method returns a true value on success, a false value on error.
- $plugin->set_dir_1_to_spec($src_lang_code, @dest_lang_codes)
-
Add support for translating from language with $src_lang_code to all languages whose codes are in @dest_lang_codes. The direction from the $src_lang_code language to itself won't be set as supported even if $src_lang_code is specified in @dest_lang_codes. However, calling
$plugin->set_dir_1_to_1($src_lang_code, $src_lang_code);
will do the job if this is what you want.
It only results in warning messages if some of the @dest_lang_codes are unsupported. Only the supported ones will be used, others are ignored. The method returns the number of directions set as supported on (partial) success, 0 on error.
Example:
my @all_languages = ('eng', 'fre', 'ger', 'spa');
$plugin->set_languages(@all_languages);
$plugin->set_dir_1_to_spec('eng', @all_languages);
... will result in the following supported translation directions:
English -> French
English -> German
English -> Spanish
- $plugin->set_dir_1_to_all($src_lang_code)
-
This is just a shorter way of writing:
$plugin->set_dir_1_to_spec($src_lang_code, @all_codes);
where @all_codes is an array of codes of all supported languages.
- $plugin->set_dir_spec_to_1($dest_lang_code, @src_lang_codes)
-
This works exactly as set_dir_1_to_spec with reversed sides.
- $plugin->set_dir_all_to_1($dest_lang_code)
-
This is just a shorter way of writing:
$plugin->set_dir_spec_to_1($dest_lang_code, @all_codes);
where @all_codes is an array of codes of all supported languages.
Example:
my @src_lang_codes = ('ger', 'fre', 'spa');
$plugin->set_languages('eng', 'por', @src_lang_codes);
$plugin->set_dir_spec_to_1('eng', @src_lang_codes);
... will result in the following supported translation directions:
German -> English
French -> English
Spanish -> English
But if we replaced the last line with
$plugin->set_dir_all_to_1('eng');
the result would be:
Portuguese -> English
German -> English
French -> English
Spanish -> English
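The rules stated for set_dir_1_to_spec (skip the source language itself, warn about and ignore undeclared languages, return nothing when the source is undeclared) can be modelled in a few lines. This is a stand-alone sketch for illustration only; dirs_1_to_spec is a hypothetical function, and it returns the directions as strings rather than storing them in a plug-in object.

```perl
use strict;
use warnings;

# Hypothetical model of the set_dir_1_to_spec rules described above.
# $supported is a hash reference: language code => 1.
sub dirs_1_to_spec {
    my ($supported, $src, @dests) = @_;
    return () unless $supported->{$src};      # undeclared source: error
    my @dirs;
    for my $dest (@dests) {
        next if $dest eq $src;                # never source -> itself
        unless ($supported->{$dest}) {
            warn "ignoring unsupported language '$dest'\n";
            next;
        }
        push @dirs, "$src -> $dest";
    }
    return @dirs;
}

my @all_languages = qw(eng fre ger spa);
my %declared = map { $_ => 1 } @all_languages;
print "$_\n" for dirs_1_to_spec(\%declared, 'eng', @all_languages);
```

Passing the full language list, as set_dir_1_to_all does conceptually, yields three directions here because eng -> eng is skipped.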
PLUG-IN REQUIRED METHODS
These are the methods MetaTrans expects every plug-in to provide. You only need to worry about this if you are writing a plug-in from scratch. If you are deriving from MetaTrans::Base, all these methods are inherited. They make use of the abstract methods create_request and process_response, attribute values and supported translation directions specified using the set_dir_* methods. If you only want to use MetaTrans::Base as a base class for your plug-in you can stop reading here. Everything you need to know was written above.
If you are writing a plug-in from scratch you have to make sure it provides all the methods with the appropriate functionality specified in this section. In addition, every MetaTrans plug-in has to provide attribute methods as specified in the ATTRIBUTES section.
- $plugin->is_supported_dir($src_lang_code, $dest_lang_code)
-
Returns a true value if the translation direction from the language with $src_lang_code to the language with $dest_lang_code is supported, a false value otherwise.
- $plugin->get_all_src_lang_codes
-
Returns a list of all language codes which the plug-in is able to translate from. For example, ('eng', 'fre') will be returned if the supported translation directions are:
English -> French
English -> Spanish
French -> Spanish
- $plugin->get_dest_lang_codes_for_src_lang_code($src_lang_code)
-
Returns a list of all language codes which the plug-in is able to translate to from the language with $src_lang_code. If called with 'eng' as a parameter in the above example, the returned value would be ('fre', 'spa').
- $plugin->translate($expression [, $src_lang_code, $dest_lang_code])
-
Returns the translation of $expression as an array of expression-translation pairs, each pair in one string separated by " = ", in UTF-8 character encoding. An example output is:
("dog = chien", "dog = pitou", "dog days = canicule")
The undef value is returned and an error printed if $src_lang_code -> $dest_lang_code is an unsupported translation direction. The string 'timeout' is returned if a timeout occurs when querying the online translator; the string 'error' is returned on any other error.
The default translation direction (see the default_dir attribute) is used if the method is called with the first argument only.
- $plugin->get_trans_command($expression, $src_lang_code, $dest_lang_code, $append)
-
This method is a very ugly hack, which is why writing MetaTrans plug-ins from scratch is discouraged. See MetaTrans for more information on why it is required.
The get_trans_command method is expected to return an array containing a command which, if run using the Proc::SyncExec::sync_popen_noshell function, will print translations of $expression from the $src_lang_code language to the $dest_lang_code language (the first element of the array is the program name, the list of arguments follows). The command also needs to contain options corresponding to the current plug-in attribute values and ensure appropriate behaviour. Each line of the output must correspond to one translation and have the following form:
expression = translation
In addition, the $append string, if specified, should be appended to each line of the output.
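The "expression = translation" line format, with the optional $append string, can be produced from the flat list that process_response returns. The sketch below is an assumption about how one might do this in a plug-in written from scratch; format_pairs is a hypothetical helper and is not provided by MetaTrans::Base.

```perl
use strict;
use warnings;

# Hypothetical helper: turn (expr_1, trans_1, expr_2, trans_2, ...)
# into "expr = trans" strings, appending $append to each one if given.
sub format_pairs {
    my ($append, @flat) = @_;
    $append = '' unless defined $append;
    my @lines;
    while (my ($expr, $trans) = splice(@flat, 0, 2)) {
        last unless defined $trans;   # ignore a dangling last element
        push @lines, "$expr = $trans$append";
    }
    return @lines;
}

print "$_\n" for format_pairs(undef,
    'dog', 'chien', 'dog', 'pitou', 'dog days', 'canicule');
```

The same list without $append corresponds to the example return value shown for translate.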
STATIC FUNCTIONS
- is_exact_match($in_expr, $found_expr)
-
Returns a true value if the $found_expr expression matches the input expression $in_expr when using the M_EXACT matching option (see the matching attribute).
- is_match_at_start($in_expr, $found_expr, $at_bounds)
-
Returns a true value if the $found_expr expression matches the input expression $in_expr when using the M_START matching option (see the matching attribute). The $at_bounds argument corresponds to the match_at_bounds attribute.
- is_match_expr($in_expr, $found_expr, $at_bounds)
-
Returns a true value if the $found_expr expression matches the input expression $in_expr when using the M_EXPR matching option (see the matching attribute). The $at_bounds argument corresponds to the match_at_bounds attribute.
- is_match_words($in_expr, $found_expr, $at_bounds)
-
Returns a true value if the $found_expr expression matches the input expression $in_expr when using the M_WORDS matching option (see the matching attribute). The $at_bounds argument corresponds to the match_at_bounds attribute.
- strip_grammar_info($expression)
-
Returns the $expression with all the grammar and meaning information deleted (everything in parentheses or after a semicolon) in perl's internal UTF-8 format (see Encode).
- convert_to_utf8($input_encoding, $string)
-
Converts $string from $input_encoding to UTF-8 encoding. In addition, all HTML entities contained in the $string are converted to the corresponding UTF-8 characters. This may sometimes be very useful when writing the process_response method.
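For readers who want to see what such a conversion involves, here is a minimal sketch built on the core Encode module. It is not the convert_to_utf8 implementation: to_chars_sketch is a hypothetical name, it returns a Perl character string, and it only handles numeric entities (named entities such as &eacute; would need something like HTML::Entities).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for a convert_to_utf8-like helper.
sub to_chars_sketch {
    my ($input_encoding, $bytes) = @_;
    # bytes in the server's encoding -> Perl character string
    my $chars = decode($input_encoding, $bytes);
    # decimal and hexadecimal numeric entities -> characters
    $chars =~ s/&#(\d+);/chr($1)/ge;
    $chars =~ s/&#x([0-9a-fA-F]+);/chr(hex $1)/ge;
    return $chars;
}

binmode STDOUT, ':encoding(UTF-8)';
# "Québécisme" as ISO-8859-1 bytes, followed by an entity for "é"
print to_chars_sketch('iso-8859-1', "Qu\xE9b\xE9cisme &#233;"), "\n";
```

The binmode call makes the output layer encode the characters as UTF-8 when printing.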
OTHER METHODS
- $plugin->run
-
Run the plug-in as a command line application. Very useful for testing and debugging. Try executing the following script to see what this does:
#!perl
# load a plug-in class derived from MetaTrans::Base
use MetaTrans::UltralinguaNet;
# instantiate an object
my $plugin = new MetaTrans::UltralinguaNet;
# run it
$plugin->run;
BUGS
Please report any bugs or feature requests to bug-metatrans@rt.cpan.org, or through the web interface at http://rt.cpan.org. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
AUTHOR
Jan Pomikalek, <xpomikal@fi.muni.cz>
COPYRIGHT & LICENSE
Copyright 2004 Jan Pomikalek, All Rights Reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
MetaTrans, MetaTrans::Languages, MetaTrans::UltralinguaNet, HTTP::Request, URI::Escape