NAME

Word2vec::Lesk - Word2vec-Interface Utility Module.

SYNOPSIS

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();

my $string_a = "This is a test string";
my $string_b = "This is another test string";

my $lesk_score   = $lesk->CalculateLeskScore( $string_a, $string_b );
my $cosine_score = $lesk->CalculateCosineScore( $string_a, $string_b );
my $f_score      = $lesk->CalculateFScore( $string_a, $string_b );

print( "Lesk Score: $lesk_score\n"     );
print( "Cosine Score: $cosine_score\n" );
print( "F Score: $f_score\n"           );

undef( $lesk );

or

my $lesk = Word2vec::Lesk->new();

my $string_a = "This is a test string";
my $string_b = "This is another test string";

my %results  = %{ $lesk->CalculateAllScores( $string_a, $string_b ) };

for my $key ( sort keys %results )
{
   print "$key: $results{ $key }\n";
}

undef( %results );
undef( $lesk    );

DESCRIPTION

Word2vec::Lesk is a module of Lesk functions for the Word2vec::Interface package. Lesk, Raw Lesk, Cosine, F, Recall and Precision scores are calculated from the phrase/feature overlap between two strings and returned to the user.

Main Functions

new

Description:

Returns a new "Word2vec::Lesk" module object.

Note: Specifying no parameters implies default options.

Default Parameters:
   debugLog = 0
   writeLog = 0

Input:

$debugLog -> Instructs module to print debug statements to the console. (1 = True / 0 = False)
$writeLog -> Instructs module to print debug statements to a log file.  (1 = True / 0 = False)

Output:

Word2vec::Lesk object.

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();

undef( $lesk );

DESTROY

Description:

Removes Word2vec::Lesk object from memory.

Input:

None

Output:

None

Example:

See above example for "new" function.

Note: The DESTROY function is also called automatically during global destruction when the program exits.

GetMatchingFeatures

Description:

Given two strings, this returns a hash of all overlapping (matching) features between both strings and their frequency counts.

Input:

$string_a -> First comparison string
$string_b -> Second comparison string

Output:

$hash_ref -> Hash table reference whose keys are the unique features shared by the two input strings and whose values are the frequency counts of those features.

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();

my %matching_features = %{ $lesk->GetMatchingFeatures( "I like to eat cookies", "Sometimes I like to eat cookies" ) };

for my $feature ( sort keys %matching_features )
{
   print "$feature : $matching_features{ $feature }\n";
}

undef( %matching_features );
undef( $lesk );
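
To give a rough idea of what such an overlap table looks like, the standalone sketch below counts the words shared by two strings. It does not call Word2vec::Lesk. The helper name count_shared_words, the whitespace tokenization, and the choice to count each shared word by the smaller of its two frequencies are assumptions made for illustration only; the module's own definition of a feature may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Lesk): returns a hash reference
# mapping each word found in both strings to the smaller of its two counts.
sub count_shared_words
{
   my ( $string_a, $string_b ) = @_;

   my ( %count_a, %count_b, %shared );

   # Assumption: features are whitespace-separated words.
   $count_a{ $_ }++ for split( ' ', $string_a );
   $count_b{ $_ }++ for split( ' ', $string_b );

   for my $word ( keys %count_a )
   {
      next unless exists $count_b{ $word };

      $shared{ $word } = $count_a{ $word } < $count_b{ $word }
                       ? $count_a{ $word }
                       : $count_b{ $word };
   }

   return \%shared;
}

my %shared = %{ count_shared_words( "I like to eat cookies", "Sometimes I like to eat cookies" ) };

for my $word ( sort keys %shared )
{
   print "$word : $shared{ $word }\n";
}
```

The returned structure has the same shape as the hash described above: one key per shared item, with its frequency count as the value.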

GetPhraseOverlap

Description:

Given two strings, this returns a hash of all overlapping (matching) phrases between both strings and their frequency counts. Longer phrases take priority over shorter ones when matching.

Input:

$string_a -> First comparison string
$string_b -> Second comparison string

Output:

$hash_ref -> Hash table reference whose keys are the unique phrases shared by the two input strings and whose values are the frequency counts of those phrases.

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();

my %phrase_overlaps = %{ $lesk->GetPhraseOverlap( "I like to eat cookies", "Sometimes I like to eat cookies" ) };

for my $phrase ( sort keys %phrase_overlaps )
{
   print "$phrase : $phrase_overlaps{ $phrase }\n";
}

undef( %phrase_overlaps );
undef( $lesk );
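
The idea of giving longer phrases priority can be sketched as a greedy search: find the longest run of consecutive words common to both strings, record it, blank it out, and repeat until nothing matches. The code below is a minimal standalone illustration of that general technique, not the module's implementation. The helper names longest_common_run and greedy_phrase_overlap and the whitespace tokenization are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper: finds the longest run of consecutive words shared by
# two word lists. Returns ( start_in_a, start_in_b, length ); the length is 0
# when nothing matches. Blanked-out (undef) entries never match.
sub longest_common_run
{
   my ( $words_a, $words_b ) = @_;
   my ( $best_a, $best_b, $best_len ) = ( 0, 0, 0 );

   for my $i ( 0 .. $#$words_a )
   {
      for my $j ( 0 .. $#$words_b )
      {
         my $len = 0;

         $len++ while (    $i + $len <= $#$words_a
                        && $j + $len <= $#$words_b
                        && defined $words_a->[ $i + $len ]
                        && defined $words_b->[ $j + $len ]
                        && $words_a->[ $i + $len ] eq $words_b->[ $j + $len ] );

         ( $best_a, $best_b, $best_len ) = ( $i, $j, $len ) if $len > $best_len;
      }
   }

   return ( $best_a, $best_b, $best_len );
}

# Hypothetical helper: repeatedly extracts the longest shared phrase and
# returns a hash reference of phrase => frequency count.
sub greedy_phrase_overlap
{
   my ( $string_a, $string_b ) = @_;

   my @words_a = split( ' ', $string_a );
   my @words_b = split( ' ', $string_b );
   my %phrases;

   while ( 1 )
   {
      my ( $start_a, $start_b, $len ) = longest_common_run( \@words_a, \@words_b );
      last if $len == 0;

      my $phrase = join( ' ', @words_a[ $start_a .. $start_a + $len - 1 ] );
      $phrases{ $phrase }++;

      # Blank out the matched words so they cannot be matched again.
      @words_a[ $start_a .. $start_a + $len - 1 ] = ( undef ) x $len;
      @words_b[ $start_b .. $start_b + $len - 1 ] = ( undef ) x $len;
   }

   return \%phrases;
}

my %phrases = %{ greedy_phrase_overlap( "a b c x a b", "a b y a b c" ) };

for my $phrase ( sort keys %phrases )
{
   print "$phrase : $phrases{ $phrase }\n";
}
```

In this sketch the three-word phrase "a b c" is claimed first, and only then is the remaining two-word phrase "a b" matched, which is what prioritizing longer phrases means here.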

CalculateLeskScore

Description:

Given two strings, this returns a Lesk score based on overlapping (matching) features between both strings.

Input:

$string_a -> First comparison string
$string_b -> Second comparison string

Output:

$score    -> Lesk Score (Float)

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();

my $lesk_score = $lesk->CalculateLeskScore( "I like to eat cookies", "Sometimes I like to eat cookies" );

print "Lesk Score: $lesk_score\n";

undef( $lesk );

CalculateCosineScore

Description:

Given two strings, this returns a cosine score based on overlapping (matching) features between both strings.

Input:

$string_a -> First comparison string
$string_b -> Second comparison string

Output:

$score    -> Cosine Score (Float)

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();

my $cosine_score = $lesk->CalculateCosineScore( "I like to eat cookies", "Sometimes I like to eat cookies" );

print "Cosine Score: $cosine_score\n";

undef( $lesk );

CalculateFScore

Description:

Given two strings, this returns an F score based on overlapping (matching) features between both strings.

Input:

$string_a -> First comparison string
$string_b -> Second comparison string

Output:

$score    -> F Score (Float)

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();

my $f_score = $lesk->CalculateFScore( "I like to eat cookies", "Sometimes I like to eat cookies" );

print "F Score: $f_score\n";

undef( $lesk );

CalculateAllScores

Description:

Given two strings, this returns all scores (F, Cosine, Lesk, Raw Lesk, Precision, Recall) together with the frequency counts (matching features, matching phrases) and the lengths of both strings.

Input:

$string_a    -> First comparison string
$string_b    -> Second comparison string

Output:

$result_hash -> Hash reference containing: Lesk, Raw Lesk, F, Precision, Recall, Cosine, Matching Feature Frequency, Matching Phrase Frequency, String A Length and String B Length.

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();

my %scores = %{ $lesk->CalculateAllScores( "I like to eat cookies", "Sometimes I like to eat cookies" ) };

for my $score_name ( sort keys %scores )
{
   print "$score_name : $scores{ $score_name }\n";
}

undef( $lesk );
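
For readers unfamiliar with these measures, the standalone sketch below shows one conventional way overlap-based scores are derived from a phrase-overlap table and the word counts of both strings. It is an assumption that Word2vec::Lesk uses these exact formulas. The helper name scores_from_overlap and the lowercase hash keys are hypothetical and are not the keys returned by CalculateAllScores.

```perl
use strict;
use warnings;

# Hypothetical helper: derives overlap-based scores from a phrase-overlap
# table ( phrase => frequency ) and the word counts of both strings.
# Assumed definitions (not confirmed against the module):
#   raw_lesk  = sum over phrases of ( words in phrase ) ** 2 * frequency
#   lesk      = raw_lesk / ( length_a * length_b )
#   precision = matched words / length_b
#   recall    = matched words / length_a
#   f         = 2 * precision * recall / ( precision + recall )
#   cosine    = matched words / sqrt( length_a * length_b )
sub scores_from_overlap
{
   my ( $phrases, $length_a, $length_b ) = @_;
   my ( $matched_words, $raw_lesk ) = ( 0, 0 );

   for my $phrase ( keys %$phrases )
   {
      my @words      = split( ' ', $phrase );
      my $word_count = scalar @words;
      my $frequency  = $phrases->{ $phrase };

      $matched_words += $word_count * $frequency;
      $raw_lesk      += $word_count ** 2 * $frequency;
   }

   # Guard every division so empty strings yield 0 instead of an error.
   my $both      = $length_a && $length_b;
   my $precision = $length_b ? $matched_words / $length_b : 0;
   my $recall    = $length_a ? $matched_words / $length_a : 0;
   my $f         = ( $precision + $recall )
                 ? 2 * $precision * $recall / ( $precision + $recall ) : 0;
   my $cosine    = $both ? $matched_words / sqrt( $length_a * $length_b ) : 0;
   my $lesk      = $both ? $raw_lesk / ( $length_a * $length_b ) : 0;

   return { raw_lesk => $raw_lesk, lesk => $lesk, precision => $precision,
            recall   => $recall,   f    => $f,    cosine    => $cosine };
}

# One five-word phrase shared by a 5-word string and a 6-word string.
my %scores = %{ scores_from_overlap( { "I like to eat cookies" => 1 }, 5, 6 ) };

for my $name ( sort keys %scores )
{
   print "$name : $scores{ $name }\n";
}
```

Squaring the phrase length in the raw score rewards one long shared phrase more than several short ones with the same total word count.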

Accessor Functions

GetDebugLog

Description:

Returns the _debugLog member variable, which is set by the new function when the Word2vec::Lesk object is initialized.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();
my $debugLog = $lesk->GetDebugLog();

print( "Debug Logging Enabled\n" ) if $debugLog == 1;
print( "Debug Logging Disabled\n" ) if $debugLog == 0;

undef( $lesk );

GetWriteLog

Description:

Returns the _writeLog member variable, which is set by the new function when the Word2vec::Lesk object is initialized.

Input:

None

Output:

$value -> '0' = False, '1' = True

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();
my $writeLog = $lesk->GetWriteLog();

print( "Write Logging Enabled\n" ) if $writeLog == 1;
print( "Write Logging Disabled\n" ) if $writeLog == 0;

undef( $lesk );

Debug Functions

WriteLog

Description:

Prints passed string parameter to the console, log file or both depending on user options.

Note: The second parameter (printNewLine) controls the trailing new line. A new line character is printed after the string unless the parameter is 0; leaving the parameter undefined also prints the new line.

Input:

$string -> String to print to the console/log file.
$value  -> 0 = Do not print newline character after string, all else prints new line character including 'undef'.

Output:

None

Example:

use Word2vec::Lesk;

my $lesk = Word2vec::Lesk->new();
$lesk->WriteLog( "Hello World" );

undef( $lesk );
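
The new line rule described above can be illustrated with a small standalone sketch. It does not call the module. The helper name format_log_message is hypothetical, and testing for the literal value 0 with a string comparison is an assumption about how the check might be written.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): returns the text that would be
# emitted for a message, appending "\n" unless the second argument is 0.
sub format_log_message
{
   my ( $string, $print_new_line ) = @_;

   # Only an explicit 0 suppresses the new line; undef and all other values keep it.
   my $suppress = defined( $print_new_line ) && $print_new_line eq "0";

   return $suppress ? $string : $string . "\n";
}

print format_log_message( "Hello ", 0 );   # no new line after "Hello "
print format_log_message( "World" );       # new line after "World"
```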

AUTHOR

Clint Cuffy, Virginia Commonwealth University

COPYRIGHT

Copyright (c) 2016

Bridget T McInnes, Virginia Commonwealth University
btmcinnes at vcu dot edu

Clint Cuffy, Virginia Commonwealth University
cuffyca at vcu dot edu

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to:

The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA  02111-1307, USA.