NAME
Word2vec::Util - Word2vec-Interface Utility Module.
SYNOPSIS
use Word2vec::Util;
my $util = Word2vec::Util->new();
my $result = $util->IsFileOrDirectory( "../samples/stoplist" );
print( "Path Type Is A File\n" ) if $result eq "file";
print( "Path Type Is A Directory\n" ) if $result eq "dir";
print( "Path Type Is Unknown\n" ) if $result eq "unknown";
undef( $util );
DESCRIPTION
Word2vec::Util is a module of utility functions for the Word2vec::Interface package.
Main Functions
new
Description:
Returns a new "Word2vec::Util" module object.
Note: Specifying no parameters implies default options.
Default Parameters:
debugLog = 0
writeLog = 0
Input:
$debugLog -> Instructs module to print debug statements to the console. (1 = True / 0 = False)
$writeLog -> Instructs module to print debug statements to a log file. (1 = True / 0 = False)
Output:
Word2vec::Util object.
Example:
use Word2vec::Util;
my $util = Word2vec::Util->new();
undef( $util );
DESTROY
Description:
Removes Word2vec::Util object from memory.
Input:
None
Output:
None
Example:
See above example for "new" function.
Note: Destroy function is also automatically called during global destruction when exiting the program.
CleanText
Description:
Normalizes text based on the following.
- Text converted to lowercase
- More than one white space is replaced with a single white space
- Apostrophe "s" ('s) characters are removed
- Hyphen character is replaced with a single white space
- All special characters removed outside of lowercase 'a-z' and compoundified terms retained, joined by '_' (underscore).
- Line-feed/carriage return (LF-CR) endings are cleaned and converted to OS specific LF-CR endings
Input:
$string -> String of text to normalize
Output:
$string -> Cleaned/Normalized text.
Example:
use Word2vec::Util;
my $util = Word2vec::Util->new();
my $text = "123485clean-text!!@&^#*@";
print( "Original Text: \"$text\"\n" );
$text = $util->CleanText( $text );
print( "Cleaned Text: \"$text\"\n" );
undef( $util );
RemoveNewLineEndingsFromString
Description:
Removes new line endings from string. Supports MSWin32, linux and MacOS line endings.
Input:
$string -> String with line-feed/carriage return ending to remove.
Output:
$string -> String without line-feed/carriage return ending.
Example:
use Word2vec::Util;
my $util = Word2vec::Util->new();
my $text = "this is sample text\n";
print( "Original Text: \"$text\"\n" );
$text = $util->RemoveNewLineEndingsFromString( $text );
print( "Cleaned Text: \"$text\"\n" );
undef( $util );
IsFileOrDirectory
Description:
Given a path, returns a string specifying whether this path represents a file or directory.
Input:
$path -> String representing path to check
Output:
$string -> Returns "file", "dir" or "unknown".
Example:
use Word2vec::Util;
my $util = Word2vec::Util->new();
my $result = $util->IsFileOrDirectory( "../samples/stoplist" );
print( "Path Type Is A File\n" ) if $result eq "file";
print( "Path Type Is A Directory\n" ) if $result eq "dir";
print( "Path Type Is Unknown\n" ) if $result eq "unknown";
undef( $util );
IsWordOrCUITerm
Description:
Checks whether the passed string argument is word or CUI term.
Input:
$string -> Word or CUI string term
Output:
$string -> Returns "cui", "word" or undef
Example:
use Word2vec::Util;
my $util = Word2vec::Util->new();
my $result = $util->IsWordOrCUITerm( "Cookie" );
print( "Passed String Argument Term Type: \"$result\"\n" ) if defined( $result );
print( "Cannot Determine String Argument Term Type\n" ) if !defined( $result );
my $result = $util->IsWordOrCUITerm( "C08132016" );
print( "Passed String Argument Term Type: \"$result\"\n" ) if defined( $result );
print( "Cannot Determine String Argument Term Type\n" ) if !defined( $result );
undef( $util );
GetFilesInDirectory
Description:
Given a path and file tag string, returns a string of files consisting of the file tag string in the specified path.
Input:
$path -> String representing path
$fileTag -> String consisting of file tag to fetch.
Output:
$string -> Returns string of file names consisting of $fileTag.
Example:
use Word2vec::Util;
my $util = Word2vec::Util->new();
# Looks in specified path for files including ".sval" in their file name.
my $result = $util->GetFilesInDirectory( "../samples/", ".sval" );
print( "Found File Name(s): $result\n" ) if defined( $result );
undef( $util );
GetOSType
Description:
Returns (string) operating system type.
Input:
None
Output:
$string -> Operating System String
Example:
use Word2vec::Util;
my $util = Word2vec::Util->new();
my $result = $util->GetOSType();
print( "Current OS Type: $result\n" ) if defined( $result );
undef( $util );
Accessor Functions
GetDebugLog
Description:
Returns the _debugLog member variable set during Word2vec::Util object initialization of new function.
Input:
None
Output:
$value -> '0' = False, '1' = True
Example:
use Word2vec::Util;
my $util = Word2vec::Util->new()
my $debugLog = $util->GetDebugLog();
print( "Debug Logging Enabled\n" ) if $debugLog == 1;
print( "Debug Logging Disabled\n" ) if $debugLog == 0;
undef( $util );
GetWriteLog
Description:
Returns the _writeLog member variable set during Word2vec::Util object initialization of new function.
Input:
None
Output:
$value -> '0' = False, '1' = True
Example:
use Word2vec::Util;
my $util = Word2vec::Util->new();
my $writeLog = $util->GetWriteLog();
print( "Write Logging Enabled\n" ) if $writeLog == 1;
print( "Write Logging Disabled\n" ) if $writeLog == 0;
undef( $util );
Debug Functions
WriteLog
Description:
Prints passed string parameter to the console, log file or both depending on user options.
Note: printNewLine parameter prints a new line character following the string if the parameter
is undefined and does not if parameter is 0.
Input:
$string -> String to print to the console/log file.
$value -> 0 = Do not print newline character after string, all else prints new line character including 'undef'.
Output:
None
Example:
use Word2vec::Util:
my $util = Word2vec::Util->new();
$util->WriteLog( "Hello World" );
undef( $util );
Author
Clint Cuffy, Virginia Commonwealth University
COPYRIGHT
Copyright (c) 2016
Bridget T McInnes, Virginia Commonwealth University
btmcinnes at vcu dot edu
Clint Cuffy, Virginia Commonwealth University
cuffyca at vcu dot edu
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc.,
59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.