NAME

WWW::Wikipedia::LangTitles - get interwiki links from Wikipedia.

SYNOPSIS

use utf8;
use WWW::Wikipedia::LangTitles 'get_wiki_titles';
my $title = 'Three-phase electric power';
my $links = get_wiki_titles ($title);
print "$title is '$links->{de}' in German.\n";
my $film = '東京物語';
my $flinks = get_wiki_titles ($film, lang => 'ja');
print "映画「$film」はイタリア語で「$flinks->{it}」と名付けた。\n";

produces output

Three-phase electric power is 'Dreiphasenwechselstrom' in German.
映画「東京物語」はイタリア語で「Viaggio a Tokyo」と名付けた。

(This example is included as synopsis.pl in the distribution.)

VERSION

This documents version 0.04 of WWW::Wikipedia::LangTitles corresponding to git commit cd5d0156c401472bc424421159fca7d3c0f769fe released on Thu Jul 20 13:15:53 2017 +0900.

DESCRIPTION

This module retrieves the Wikipedia interwiki link titles from the web site wikidata.org. It can be used, for example, to translate a term in English into other languages, or to get near equivalents.

FUNCTIONS

get_wiki_titles

my $ref = get_wiki_titles ('Helium');

Given a word or phrase as an argument, which is the title of a Wikipedia article, the return value is a hash reference containing keys which are language codes, and values which are the names of the equivalent Wikipedia article in other languages. For example, in the above case of Helium, $ref->{th} will be equal to ฮีเลียม, the Thai title of the Wikipedia article on helium.

The language of the original page can be specified like this:

use utf8;
my $from_th = get_wiki_titles ('ฮีเลียม', lang => 'th');

The URL is encoded using "uri_escape_utf8" in URI::Escape, so use character, not byte, strings (use "use utf8;" etc.)

As of version 0.04, get_wiki_titles deletes the non-encyclopedia sites like Wikiquote and Wikiversity from the list of returned values.

make_wiki_url

my $url = make_wiki_url ('helium');

Make a URL for the Wikidata page. You will then need to retrieve the page and parse the JSON yourself. Use a second argument to specify the language of the page:

use utf8;
use WWW::Wikipedia::LangTitles 'make_wiki_url';
print make_wiki_url ('ฮีเลียม', 'th'), "\n";

produces output

https://www.wikidata.org/w/api.php?action=wbgetentities&sites=thwiki&titles=%E0%B8%AE%E0%B8%B5%E0%B9%80%E0%B8%A5%E0%B8%B5%E0%B8%A2%E0%B8%A1&props=sitelinks/urls|datatype&format=json

(This example is included as thai-url.pl in the distribution.)

If no language is specified, the default is en for English.

This method was added in version 0.02 of the module.

SEE ALSO

Locale::Codes

This module enables one to convert the language key names given by this module into the English-language names of the languages.

use utf8;
use FindBin '$Bin';
use WWW::Wikipedia::LangTitles 'get_wiki_titles';
use Locale::Codes::Language;
my $article = 'King Kong';
my $titles = get_wiki_titles ($article);
for my $lang (keys %$titles) {
    my $l2c = code2language ($lang);
    if (! $l2c) {
        $l2c = $lang;
    }
    my $name = $titles->{$lang};
    if ($name ne $article) {
        print "$name in $l2c.\n";
    }
}

produces output

king.kong in jbo.
קינג קונג in Hebrew.
Кинг Конг in Bulgarian.
キングコング in Japanese.
كينغ كونغ in Arabic.
Кінг-Конг in Ukrainian.
King Kong (hahmo) in Finnish.
金剛 (怪獸) in Chinese.
Քինգ Քոնգ in Armenian.
คิงคอง in Thai.
کینگ کونگ in Persian.
Кинг-Конг in Russian.
킹콩 in Korean.
კინგ კონგი in Georgian.

(This example is included as locale-codes.pl in the distribution.)

DEPENDENCIES

Carp

Carp is used to report errors

LWP::UserAgent

LWP::UserAgent is used to retrieve the data from Wikidata.

JSON::Parse

JSON::Parse is used to parse the JSON data from Wikidata.

URI::Escape

URI::Escape is used to make the URLs for Wikidata from the input titles.

EXPORTS

Nothing is exported by default. The export tag ':all' exports all the functions of the module.

use WWW::Wikipedia::LangTitles ':all';

TESTING

The default tests of the module do not attempt to connect to the internet. To test using an internet connection, run xt/scrape.t like this:

prove -I lib xt/scrape.t

from the top directory of the distribution.

HISTORY

This module was a collection of small scripts I had been using to scrape multilingual article names related to physics from Wikipedia. I made the scripts into a CPAN module because I thought it could be useful to other people. Specifically, I used my scripts to add some Japanese element names to Chemistry::Elements, and I thought this method might be useful for someone else.

Version 0.02 added the "make_wiki_url" for people who want to retrieve and parse the output themselves.

AUTHOR

Ben Bullock, <bkb@cpan.org>

COPYRIGHT & LICENCE

This package and associated files are copyright (C) 2016-2017 Ben Bullock.

You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.