NAME

Lingua::Thesaurus - Thesaurus management

SYNOPSIS

Creating a thesaurus

my $thesaurus = Lingua::Thesaurus->new(SQLite => $dbname);
$thesaurus->load($io_class => @files);
$thesaurus->load($io_class => {$origin1 => $file1, ...});
$thesaurus->load($io_class => {files => \@files,
                               params  => {termClass => ..,
                                           relTypeClass => ..}});

Using a thesaurus

my $thesaurus = Lingua::Thesaurus->new(SQLite => $dbname);

my @terms = $thesaurus->search_terms('*foo*');
my $term  = $thesaurus->fetch_term('foobar');

my $scope_note = $term->SN; # returns a string
my @synonyms   = $term->UF; # returns a list of other terms

foreach my $pair ($term->related(qw/NT RT/)) {
  my ($rel_type, $item) = @$pair;
  printf "  %s(%s) = %s\n", $rel_type->description, $rel_type->rel_id, $item;
}

# transitive search
foreach my $quadruple ($term->transitively_related(qw/NT/)) {
  my ($rel_type, $related_term, $through_term, $level) = @$quadruple;
  printf "  %s(%s): %s (through %s)\n",
     $rel_type->rel_id,
     $level,
     $related_term->string,
     $through_term->string;
}

DESCRIPTION

This distribution manages thesauri. A thesaurus is a list of terms linked by relations (for example "broader term" / "narrower term"). Relations are either "internal" (between two terms) or "external" (between a term and some external data, such as a "Scope Note"). Relations may have a reciprocal; see Lingua::Thesaurus::RelType.
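The notions of internal relations, external relations and reciprocals can be pictured with plain Perl hashes. This is only an illustrative sketch: the names %reciprocal, %relations, %scope_note and add_relation are hypothetical and say nothing about how the storage classes actually represent a thesaurus.

```perl
use strict;
use warnings;

# Hypothetical table of reciprocal relation types (illustration only).
my %reciprocal = (NT => 'BT', BT => 'NT', RT => 'RT');

# Internal relations: $relations{$term}{$rel_type} = [ related terms ]
my %relations;

# External relation: a term pointing to plain data (here a scope note).
my %scope_note = (law => 'Sample scope note text');

# Record a relation and, when the type has a reciprocal, the reverse link.
sub add_relation {
  my ($from, $rel_type, $to) = @_;
  push @{ $relations{$from}{$rel_type} }, $to;
  if (my $back = $reciprocal{$rel_type}) {
    push @{ $relations{$to}{$back} }, $from;
  }
}

add_relation('law', NT => 'criminal law');

# The reciprocal "broader term" link was created automatically.
print join(', ', @{ $relations{'criminal law'}{BT} }), "\n";
```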

Thesauri are loaded from one or several IO formats; usually this will be the ISO 2788 format, or a derivative of it. See classes under the Lingua::Thesaurus::IO namespace for various implementations.

Once loaded, thesauri are stored via a storage class; this is meant to be an efficient internal structure for supporting searches. Currently, only Lingua::Thesaurus::Storage::SQLite is implemented; but the architecture allows for other storage classes to be defined, as long as they comply with the Lingua::Thesaurus::Storage role.

Terms are retrieved through the "search_terms" and "fetch_term" methods. The results are instances of Lingua::Thesaurus::Term; these objects have navigation methods for retrieving related terms.
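To give an idea of what transitive navigation produces, here is a minimal sketch of a breadth-first traversal over a hard-coded "narrower term" table. It returns arrayrefs shaped like the quadruples shown in the SYNOPSIS (relation type, related term, term through which it was reached, level). The %narrower data and the transitively_narrower function are hypothetical, and numbering levels from 1 is an assumption; the real Lingua::Thesaurus::Term method may behave differently.

```perl
use strict;
use warnings;

# Hypothetical sample data: term => list of narrower terms.
my %narrower = (
  'law'          => ['criminal law', 'civil law'],
  'criminal law' => ['theft'],
);

# Breadth-first traversal; each result is
# [$rel_type, $related_term, $through_term, $level].
sub transitively_narrower {
  my ($start) = @_;
  my @results;
  my %seen  = ($start => 1);   # guards against cycles
  my @queue = ([$start, 1]);   # assumption: direct relations are level 1
  while (my $item = shift @queue) {
    my ($through, $level) = @$item;
    for my $related (@{ $narrower{$through} || [] }) {
      next if $seen{$related}++;
      push @results, ['NT', $related, $through, $level];
      push @queue, [$related, $level + 1];
    }
  }
  return @results;
}

my @quadruples = transitively_narrower('law');
for my $quadruple (@quadruples) {
  my ($rel_type, $related, $through, $level) = @$quadruple;
  printf "  %s(%s): %s (through %s)\n", $rel_type, $level, $related, $through;
}
```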

This distribution was originally written to handle the Swiss thesaurus for justice "Jurivoc" (see Lingua::Thesaurus::IO::Jurivoc). However, the framework should be easily extensible to other needs. Other Perl modules for thesauri are briefly discussed below in the "SEE ALSO" section.

Side note: another motivation for writing this distribution was to experiment with Moose meta-programming possibilities. Subclasses of Lingua::Thesaurus::Term are created dynamically to implement relation methods such as NT and BT; see the Lingua::Thesaurus::Storage source code.

Caveat: at the moment, IO classes only implement loading and searching; methods for editing and dumping a thesaurus will be added in a future version.

METHODS

new

my $thesaurus = Lingua::Thesaurus->new($storage_class => @storage_args);

Instantiates a thesaurus on a given storage. The $storage_class will be automatically prefixed by Lingua::Thesaurus::Storage::, unless the class name starts with '+'. The remaining arguments are passed to the storage class. Since Lingua::Thesaurus::Storage::SQLite is the default storage class supplied with this distribution, thesauri are usually opened as:

my $dbname = '/path/to/some/file.sqlite';
my $thesaurus = Lingua::Thesaurus->new(SQLite => $dbname);
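The class name rule described above (automatic prefix, unless the name starts with '+') can be sketched as follows. The resolve_class function is a hypothetical helper written for illustration, not part of the Lingua::Thesaurus API, and My::Custom::Storage is an invented class name; stripping the leading '+' is the usual convention and is assumed here.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the documented naming rule.
sub resolve_class {
  my ($name, $default_prefix) = @_;

  # A leading '+' means "fully qualified class name, do not prefix".
  return $1 if $name =~ /^\+(.+)$/;

  return $default_prefix . '::' . $name;
}

print resolve_class('SQLite', 'Lingua::Thesaurus::Storage'), "\n";
print resolve_class('+My::Custom::Storage', 'Lingua::Thesaurus::Storage'), "\n";
```

Under that assumption, the short name expands to Lingua::Thesaurus::Storage::SQLite, while the '+' form is used as given.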

load

$thesaurus->load($io_class => @files);
$thesaurus->load($io_class => {$origin1 => $file1, ...});
$thesaurus->load($io_class => {files => \@files,
                               params  => {termClass    => ..,
                                           relTypeClass => ..}});

Populates a thesaurus database with data from thesaurus dumpfiles. The job of parsing these files is delegated to some IO subclass, given as first argument. The $io_class will be automatically prefixed by Lingua::Thesaurus::IO::, unless the class name starts with '+'. The remaining arguments are passed to the IO class; the simplest form is just a list of dumpfiles, or a hashref of pairs {$origin1 => $dumpfile1, ...}. Each $origin is a string for tagging terms coming from that dumpfile; when querying the thesaurus, origins can be retrieved from $term->origin. See IO subclasses in the Lingua::Thesaurus::IO namespace for more details.

search_terms

my @terms = $thesaurus->search_terms($pattern, $origin);

Searches the term database according to $pattern, where the pattern may contain '*' to mean word completion.

The interpretation of patterns depends on the storage engine; by default, this is implemented using SQLite's "LIKE" function (see http://www.sqlite.org/lang_expr.html#like). Characters '*' in the pattern are translated into '%' for the LIKE function to work as expected.
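The wildcard translation mentioned above amounts to a one-character substitution. The sketch below uses a hypothetical helper named like_pattern; it only shows the '*' to '%' mapping and makes no claim about how the storage class treats other characters that are special to LIKE (such as '%' or '_' already present in the pattern).

```perl
use strict;
use warnings;

# Hypothetical helper: turn a user pattern into a LIKE pattern.
sub like_pattern {
  my ($pattern) = @_;
  (my $like = $pattern) =~ tr/*/%/;   # each '*' becomes '%'
  return $like;
}

print like_pattern('*foo*'), "\n";   # %foo%
print like_pattern('sci*'),  "\n";   # sci%
```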

It is also possible to configure the storage to use fulltext searches, so that a pattern such as 'sci*' would also match 'computer science'; see "use_fulltext" in Lingua::Thesaurus::Storage::SQLite.

If $pattern is empty, the method returns the list of all terms in the thesaurus.

The second argument $origin is optional; it may be used to restrict the search to terms loaded from one specific origin.

Results are instances of Lingua::Thesaurus::Term.

fetch_term

my $term = $thesaurus->fetch_term($term_string, $origin);

Retrieves a specific term and returns an instance of Lingua::Thesaurus::Term (or undef if the term is unknown). The second argument $origin is optional.

rel_types

Returns the list of ids of relation types stored in this thesaurus (i.e. 'NT', 'RT', etc.).

fetch_rel_type

my $rel_type = $thesaurus->fetch_rel_type($rel_type_id);

Returns the Lingua::Thesaurus::RelType object corresponding to $rel_type_id.

storage

Returns the internal object that plays the Lingua::Thesaurus::Storage role.

FURTHER DOCUMENTATION

More details can be found in the various implementation classes:

Lingua::Thesaurus::IO: Role for input/output operations on a thesaurus
Lingua::Thesaurus::IO::ISO2788: IO class for ISO thesauri (not implemented yet)
Lingua::Thesaurus::IO::Jurivoc: IO class for "Jurivoc", the Swiss thesaurus for justice
Lingua::Thesaurus::IO::LivelinkCollectionServer: IO class for Livelink Collection Server thesaurus files
Lingua::Thesaurus::RelType: Relation type in a thesaurus
Lingua::Thesaurus::Storage: Role for thesaurus storage
Lingua::Thesaurus::Storage::SQLite: Thesaurus storage in an SQLite database
Lingua::Thesaurus::Term: Parent class for thesaurus terms; in particular, this class implements methods for navigating through relations

AUTHOR

Laurent Dami, <dami at cpan.org>

BUGS

Please report any bugs or feature requests to bug-lingua-thesaurus at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-Thesaurus. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Lingua::Thesaurus

You can also look for information at:

RT: CPAN's request tracker (report bugs here)

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Lingua-Thesaurus

AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Lingua-Thesaurus

CPAN Ratings

http://cpanratings.perl.org/d/Lingua-Thesaurus

Search MetaCPAN

https://metacpan.org/module/Lingua::Thesaurus

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

The test suite contains a short excerpt from the Swiss Jurivoc thesaurus, copyright 1999-2012 Tribunal fédéral Suisse (see http://www.bger.ch/fr/index/juridiction/jurisdiction-inherit-template/jurisdiction-jurivoc-home.htm).

TODO

Thesaurus

- support for multiple thesauri files (a term belongs to one-to-many
  thesaurus files; a relation belongs to exactly one thesaurus file)

SQLite

- use_unaccent without fulltext ==> use collation sequence or redefine LIKE
- store thesaurus name for each term
   => adapt search_terms($pattern, $thes_name);

