NAME

NNexus::Classification - Disambiguation logic for NNexus concept harvests

SYNOPSIS

use NNexus::Classification qw(disambiguate msc_similarity);
my $concepts_refined = disambiguate($concept_harvest,%options);
my $similarity_score = msc_similarity($category1,$category2);

DESCRIPTION

NNexus::Classification contains disambiguation and clustering algorithms for determining a subset of "relevant" concept candidates from a given concept harvest. Relevance is determined heuristically. The current algorithm considers two facets of "relevance":

1. Relevant candidates come from empirically similar domains of knowledge.

To this end, a similarity metric has been extracted from more than 3 million mathematical reviews
in Zentralblatt MATH, each annotated with categories from the Mathematics Subject Classification (MSC).

2. Technical terms are more likely to be relevant. Consequently:
 - The more words in a candidate, the more likely that it is a term
 - The more characters in a candidate, the more likely that it is a term
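
The second facet can be illustrated with a minimal sketch. It is not the module's actual implementation: rank_candidates and term_likelihood_key are hypothetical helpers, and ranking by word count first and character count second is an assumption made here for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of NNexus::Classification.
# Returns [word count, character count] for a candidate string.
sub term_likelihood_key {
    my ($candidate) = @_;
    my @words = split ' ', $candidate;    # split on runs of whitespace
    return [ scalar(@words), length($candidate) ];
}

# Hypothetical helper: orders candidates from most to least term-like,
# assuming word count outranks character count.
sub rank_candidates {
    my (@candidates) = @_;
    my %key = map { $_ => term_likelihood_key($_) } @candidates;
    my @ranked = sort {
             $key{$b}[0] <=> $key{$a}[0]    # more words first
          or $key{$b}[1] <=> $key{$a}[1]    # then more characters
          or $a cmp $b                      # deterministic tie-break
    } @candidates;
    return @ranked;
}

my @ranked = rank_candidates(
    'group', 'abelian group', 'finitely generated abelian group', 'ring'
);
print join("\n", @ranked), "\n";
```

Under these assumptions, "finitely generated abelian group" ranks above "abelian group", which ranks above "group" and "ring".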

METHODS

my $concepts_refined = disambiguate($concept_harvest,%options);

Disambiguates a concept harvest, as returned by NNexus::Discover, following the
algorithm outlined in the DESCRIPTION.

Currently the only accepted option is a boolean value for "verbosity".

my $similarity_score = msc_similarity($category1,$category2);

Returns a similarity score for two MSC categories, based on the similarity metric
outlined in the DESCRIPTION.

AUTHOR

Deyan Ginev <d.ginev@jacobs-university.de>

COPYRIGHT

Research software, produced as part of work done by
the KWARC group at Jacobs University Bremen.
Released under the MIT License (MIT).