NAME
Word2vec::Word2phrase - word2vec's word2phrase wrapper module.
SYNOPSIS
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetMinCount( 12 );
$w2p->SetThreshold( 20 );
$w2p->SetTrainFilePath( "textCorpus.txt" );
$w2p->SetOutputFilePath( "phraseTextCorpus.txt" );
$w2p->ExecuteTraining();
undef( $w2p );
# or
my $w2p = Word2vec::Word2phrase->new();
$w2p->ExecuteTraining( $trainFilePath, $outputFilePath, $minCount, $threshold, $debug, $overwrite );
undef( $w2p );
DESCRIPTION
Word2vec::Word2phrase is a word2vec package tool that "compoundifies" bi-grams in a text corpus based on a minimum and maximum frequency.
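The idea of "compoundifying" bi-grams can be illustrated with a small, self-contained sketch. It follows the description above literally: an adjacent word pair is joined with an underscore when its frequency lies between a minimum and a maximum. The helper name compoundify_bigrams and its arguments are hypothetical and not part of this module, and the actual word2phrase executable may decide which pairs to join differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Word2vec::Word2phrase): joins adjacent word
# pairs whose frequency in the text lies within [$minCount, $maxCount].
sub compoundify_bigrams
{
    my ( $text, $minCount, $maxCount ) = @_;
    my @words = split( ' ', $text );

    # Count every adjacent word pair in the text.
    my %bigramCount;
    $bigramCount{ "$words[$_] $words[$_ + 1]" }++ for 0 .. $#words - 1;

    # Walk the text left to right, joining qualifying pairs with an underscore.
    my @output;
    my $i = 0;
    while( $i < @words )
    {
        my $pair  = $i < $#words ? "$words[$i] $words[$i + 1]" : undef;
        my $count = defined( $pair ) ? $bigramCount{ $pair } : 0;

        if( defined( $pair ) && $count >= $minCount && $count <= $maxCount )
        {
            push( @output, "$words[$i]_$words[$i + 1]" );
            $i += 2;
        }
        else
        {
            push( @output, $words[$i] );
            $i += 1;
        }
    }

    return join( ' ', @output );
}

print( compoundify_bigrams( "new york is big and new york is old", 2, 5 ), "\n" );
```

In this toy input only "new york" is joined: the pair occurs twice, while every other pair that the left-to-right walk examines occurs once.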
Main Functions
new
Description:
Returns a new 'Word2vec::Word2phrase' module object.
Note: Specifying no parameters implies default options.
Default Parameters:
debugLog = 0
writeLog = 0
trainFilePath = ""
outputFilePath = ""
minCount = 5
threshold = 100
setW2PDebug = 2
workingDir = Current Directory
word2PhraseExeDir = Word2Phrase Executable Directory
overwriteOldFile = 0
Input:
$debugLog -> Instructs module to print debug statements to the console. (1 = True / 0 = False)
$writeLog -> Instructs module to print debug statements to a log file. (1 = True / 0 = False)
$trainFilePath -> Specifies the training text corpus for word2phrase training. (String)
$outputFilePath -> Specifies the output path for post word2phrase training. (String)
$minCount -> Specifies the minimum range value for bi-gram 'compoundification'. (Positive Integer)
$threshold -> Specifies the maximum range value for bi-gram 'compoundification'. (Positive Integer)
$setW2PDebug -> Specifies the word2phrase debug information parameter value to show during training. (Integer)
$workingDir -> Specifies the current working directory. (String)
$word2PhraseExeDir -> Specifies word2phrase executable directory. (String)
$overwriteOldFile -> Instructs the module whether to overwrite any existing data with the same output file name and path. ( '1' or '0' )
Note: It is not recommended to specify all new() parameters, as it has not been thoroughly tested.
Output:
Word2vec::Word2phrase object.
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
undef( $w2p );
DESTROY
Description:
Removes member variables and file handle from memory.
Input:
None
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->DESTROY();
undef( $w2p );
ExecuteTraining
Description:
Executes word2phrase training based on parameters. Parameter variables have higher precedence than member variables.
Any parameter specified will override its respective member variable.
Note: If no parameters are specified, this module executes word2phrase training based on preset member
variables. Returns a value indicating training status.
Input:
$trainFilePath -> Training text corpus file path
$outputFilePath -> word2phrase output file path
$minCount -> Minimum bi-gram frequency (Positive Integer)
$threshold -> Maximum bi-gram frequency (Positive Integer)
$debug -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
$overwrite -> Overwrites old training file when executing training. (0 = False / 1 = True)
Output:
$value -> '0' = Successful / '-1' = Unsuccessful
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetMinCount( 12 );
$w2p->SetThreshold( 20 );
$w2p->SetTrainFilePath( "textCorpus.txt" );
$w2p->SetOutputFilePath( "phraseTextCorpus.txt" );
$w2p->ExecuteTraining();
undef( $w2p );
# Or
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->ExecuteTraining( "textCorpus.txt", "phraseTextCorpus.txt", 12, 20, 2, 1 );
undef( $w2p );
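The precedence rule described for ExecuteTraining (a specified parameter wins over the stored member variable) can be sketched in plain Perl with the defined-or operator. This is only an illustration under assumptions: the %member hash and resolve_settings function are hypothetical stand-ins, not the module's internals, and the sketch does not show whether the module also stores the overriding value.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the module's stored settings (member variables).
my %member = ( minCount => 5, threshold => 100 );

# Resolve the effective settings: a defined argument wins, otherwise the
# stored member value is used. Requires Perl 5.10 or later for '//'.
sub resolve_settings
{
    my ( $minCount, $threshold ) = @_;
    return ( $minCount // $member{ minCount }, $threshold // $member{ threshold } );
}

my ( $min, $thr ) = resolve_settings( 12, undef );
print( "minCount = $min, threshold = $thr\n" );
```

Here the explicit 12 replaces the stored minimum count, while the undefined threshold falls back to the stored 100.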
ExecuteStringTraining
Description:
Executes word2phrase training on a string, rather than a file, based on parameters. Parameter variables have higher precedence than member variables.
Any parameter specified will override its respective member variable.
Note: If no parameters are specified, this module executes word2phrase training based on preset member
variables. Returns a value indicating training status.
Input:
$trainingString -> String to train
$outputFilePath -> word2phrase output file path
$minCount -> Minimum bi-gram frequency (Positive Integer)
$threshold -> Maximum bi-gram frequency (Positive Integer)
$debug -> Displays word2phrase debug information during training. (0 = None, 1 = Show Debug Information, 2 = Show Even More Debug Information)
$overwrite -> Overwrites old training file when executing training. (0 = False / 1 = True)
Output:
$value -> '0' = Successful / '-1' = Unsuccessful
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetMinCount( 12 );
$w2p->SetThreshold( 20 );
$w2p->SetOutputFilePath( "phraseTextCorpus.txt" );
$w2p->ExecuteStringTraining( "large string to train here" );
undef( $w2p );
# Or
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->ExecuteStringTraining( "large string to train here", "phraseTextCorpus.txt", 12, 20, 2, 1 );
undef( $w2p );
GetOSType
Description:
Returns the operating system type string.
Input:
None
Output:
$string -> Operating system string.
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $operatingSystem = $w2p->GetOSType();
print( "Operating System: $operatingSystem\n" ) if defined( $operatingSystem );
undef( $w2p );
Accessor Functions
GetDebugLog
Description:
Returns the _debugLog member variable, which is set when the Word2vec::Word2phrase object is initialized by the new function.
Input:
None
Output:
$value -> 0 = False, 1 = True
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $debugLog = $w2p->GetDebugLog();
print( "Debug Logging Enabled\n" ) if $debugLog == 1;
print( "Debug Logging Disabled\n" ) if $debugLog == 0;
undef( $w2p );
GetWriteLog
Description:
Returns the _writeLog member variable, which is set when the Word2vec::Word2phrase object is initialized by the new function.
Input:
None
Output:
$value -> 0 = False, 1 = True
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $writeLog = $w2p->GetWriteLog();
print( "Write Logging Enabled\n" ) if $writeLog == 1;
print( "Write Logging Disabled\n" ) if $writeLog == 0;
undef( $w2p );
GetFileHandle
Description:
Returns file handle used by WriteLog() method.
Input:
None
Output:
$fileHandle -> Returns file handle blob used by 'WriteLog()' function or undefined.
Example:
<This should not be called.>
GetTrainFilePath
Description:
Returns (string) training file path.
Input:
None
Output:
$string -> word2phrase training file path
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $filePath = $w2p->GetTrainFilePath();
print( "Train File Path: $filePath\n" ) if defined( $filePath );
undef( $w2p );
GetOutputFilePath
Description:
Returns (string) output file path.
Input:
None
Output:
$string -> word2phrase output file path
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $filePath = $w2p->GetOutputFilePath();
print( "Output File Path: $filePath\n" ) if defined( $filePath );
undef( $w2p );
GetMinCount
Description:
Returns (integer) minimum bi-gram range.
Input:
None
Output:
$value -> Minimum bi-gram frequency (Positive Integer)
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $mincount = $w2p->GetMinCount();
print( "MinCount: $mincount\n" ) if defined( $mincount );
undef( $w2p );
GetThreshold
Description:
Returns (integer) maximum bi-gram range.
Input:
None
Output:
$value -> Maximum bi-gram frequency (Positive Integer)
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $threshold = $w2p->GetThreshold();
print( "Threshold: $threshold\n" ) if defined( $threshold );
undef( $w2p );
GetW2PDebug
Description:
Returns word2phrase debug parameter value.
Input:
None
Output:
$value -> 0 = No debugging, 1 = Show debugging, 2 = Show even more debugging
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $w2pdebug = $w2p->GetW2PDebug();
print( "Word2Phrase Debug Level: $w2pdebug\n" ) if defined( $w2pdebug );
undef( $w2p );
GetWorkingDir
Description:
Returns (string) working directory path.
Input:
None
Output:
$string -> Current working directory path
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $workingDir = $w2p->GetWorkingDir();
print( "Working Directory: $workingDir\n" ) if defined( $workingDir );
undef( $w2p );
GetWord2PhraseExeDir
Description:
Returns (string) word2phrase executable directory path.
Input:
None
Output:
$string -> Word2Phrase executable directory path
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $exeDir = $w2p->GetWord2PhraseExeDir();
print( "Word2Phrase Executable Directory: $exeDir\n" ) if defined( $exeDir );
undef( $w2p );
GetOverwriteOldFile
Description:
Returns the current value of the overwrite training file variable.
Input:
None
Output:
$value -> 1 = True/Overwrite or 0 = False/Append to current file
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $overwrite = $w2p->GetOverwriteOldFile();
if( defined( $overwrite ) )
{
print( "Overwrite Old File: " );
print( "Yes\n" ) if $overwrite == 1;
print( "No\n" ) if $overwrite == 0;
}
undef( $w2p );
Mutator Functions
SetTrainFilePath
Description:
Sets training file path.
Input:
$string -> Training file path
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetTrainFilePath( "filePath" );
undef( $w2p );
SetOutputFilePath
Description:
Sets word2phrase output file path.
Input:
$string -> word2phrase output file path
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetOutputFilePath( "filePath" );
undef( $w2p );
SetMinCount
Description:
Sets minimum range value.
Input:
$value -> Minimum frequency value (Positive integer)
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetMinCount( 1 );
undef( $w2p );
SetThreshold
Description:
Sets maximum range value.
Input:
$value -> Maximum frequency value (Positive integer)
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetThreshold( 100 );
undef( $w2p );
SetW2PDebug
Description:
Sets word2phrase debug parameter.
Input:
$value -> word2phrase debug parameter (0 = No debug info, 1 = Show debug info, 2 = Show more debug info.)
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetW2PDebug( 2 );
undef( $w2p );
SetWorkingDir
Description:
Sets working directory path.
Input:
$string -> Current working directory path.
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetWorkingDir( "filePath" );
undef( $w2p );
SetWord2PhraseExeDir
Description:
Sets word2phrase executable file directory path.
Input:
$string -> Word2Phrase executable directory path.
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetWord2PhraseExeDir( "filePath" );
undef( $w2p );
SetOverwriteOldFile
Description:
Enables overwriting word2phrase output file if one already exists with the same output file name.
Input:
$value -> Integer: 1 = Overwrite old file, 0 = Do not overwrite old file.
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->SetOverwriteOldFile( 1 );
undef( $w2p );
Debug Functions
GetTime
Description:
Returns current time string in "Hour:Minute:Second" format.
Input:
None
Output:
$string -> XX:XX:XX ("Hour:Minute:Second")
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $time = $w2p->GetTime();
print( "Current Time: $time\n" ) if defined( $time );
undef( $w2p );
GetDate
Description:
Returns current month, day and year string in "Month/Day/Year" format.
Input:
None
Output:
$string -> XX/XX/XXXX ("Month/Day/Year")
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
my $date = $w2p->GetDate();
print( "Current Date: $date\n" ) if defined( $date );
undef( $w2p );
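For reference, strings in the two formats described for GetTime and GetDate can be produced in plain Perl with POSIX strftime. This is a minimal sketch, assuming a zero-padded 24-hour clock and a four-digit year; format_time and format_date are hypothetical helper names, and the module's own implementation may differ.

```perl
use strict;
use warnings;
use POSIX qw( strftime );

# Hypothetical helpers; each takes the list returned by localtime().
sub format_time { return strftime( "%H:%M:%S", @_ ); }   # "Hour:Minute:Second"
sub format_date { return strftime( "%m/%d/%Y", @_ ); }   # "Month/Day/Year"

my @now = localtime();
print( "Current Time: ", format_time( @now ), "\n" );
print( "Current Date: ", format_date( @now ), "\n" );
```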
WriteLog
Description:
Prints passed string parameter to the console, log file or both depending on user options.
Note: A new line character is printed after the string unless the printNewLine parameter is 0.
Leaving the parameter undefined also prints the new line character.
Input:
$string -> String to print to the console/log file.
$value -> 0 = Do not print newline character after string, all else prints new line character including 'undef'.
Output:
None
Example:
use Word2vec::Word2phrase;
my $w2p = Word2vec::Word2phrase->new();
$w2p->WriteLog( "Hello World" );
undef( $w2p );
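The newline rule described for WriteLog can be sketched as a small stand-alone function. The name format_log_line is hypothetical and the sketch only builds the text that would be emitted; it makes no claim about how the module writes to the console or log file.

```perl
use strict;
use warnings;

# Hypothetical helper that builds the text a WriteLog-style call would emit.
sub format_log_line
{
    my ( $string, $printNewLine ) = @_;

    # Only an explicit 0 suppresses the newline; undef or any other value keeps it.
    my $suppress = defined( $printNewLine ) && $printNewLine eq '0';
    return $suppress ? $string : "$string\n";
}

print( format_log_line( "Hello", 0 ) );   # no newline after "Hello"
print( format_log_line( " World" ) );     # newline after " World"
```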
AUTHOR
Clint Cuffy, Virginia Commonwealth University
COPYRIGHT
Copyright (c) 2016
Bridget T McInnes, Virginia Commonwealth University
btmcinnes at vcu dot edu
Clint Cuffy, Virginia Commonwealth University
cuffyca at vcu dot edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.