NAME
Text::SenseClusters::LabelEvaluation::ReadingFilesData - Module for reading data from a file as a single string object.
SYNOPSIS
The following code snippet shows how to use this module.
package Text::SenseClusters::LabelEvaluation::Test_ReadingFilesData;
use strict;
use warnings;
# Including the LabelEvaluation module.
use Text::SenseClusters::LabelEvaluation::ReadingFilesData;
# Including the FileHandle module.
use FileHandle;
# The following block of code creates a file and writes data into it.
# At the end of this test program, we delete that file.
# File that will contain the topic information.
my $topicFileName = "temp_TopicData.txt";
# Defining the file handle for the topic file.
my $topicFileHandle = FileHandle->new(">$topicFileName")
    or die "Could not open $topicFileName for writing: $!";
# Writing into the topic file (sample text about Bill Clinton).
# Each fragment ends with a space so that words are not glued together
# when the fragments are concatenated.
print $topicFileHandle "Bill Clinton is an American politician who served as the 42nd President of ".
"the United States from 1993 to 2001. Inaugurated at age 46, he was the third-youngest president. ".
"He took office at the end of the Cold War, and was the first president of the baby boomer generation. ".
"Clinton has been described as a New Democrat. Many of his policies have been attributed to a centrist ".
"Third Way philosophy of governance. He is married to Hillary Rodham Clinton, who has served as the ".
"United States Secretary of State since 2009 and was a Senator from New York from 2001 to 2009. ".
"As Governor of Arkansas, Clinton overhauled the state's education system, and served as Chair ".
"of the National Governors Association. Clinton was elected president in 1992, defeating incumbent ".
"president George H. W. Bush. The Congressional Budget Office reported a budget surplus between ".
"the years 1998 and 2000, the last three years of Clinton's presidency. Since leaving office, ".
"Clinton has been rated highly in public opinion polls of U.S. presidents.\n";
# Closing the file handle.
close($topicFileHandle);
# END OF file creation block.
# The following code calls the readLinesFromTopicFile() function from the
# ReadingFilesData module. It returns the content of the file as a string.
my $fileData = Text::SenseClusters::LabelEvaluation::ReadingFilesData::readLinesFromTopicFile(
$topicFileName);
# Printing the content of the file.
print "\n Data of the input file is $fileData \n";
# Deleting the temporary topic file.
unlink $topicFileName or warn "Could not unlink $topicFileName: $!";
DESCRIPTION
This module provides two functions. The first, readLinesFromClusterFile,
reads the label file generated by SenseClusters and builds a hash from it.
The input file must match the format of the label file that SenseClusters
produces. The second, readLinesFromTopicFile, reads a file into a single
string variable, removing all newline characters from it.
Function: readLinesFromClusterFile
------------------------------------------------
This function reads lines from the file containing the cluster labels and builds a hash from them.
@argument1 : Name of the cluster file.
@argument2 : Reference to a hash ($labelSenseClustersHash) that will hold the information in the following format:
For e.g.: Cluster0 {
    Descriptive    => George Bush, Al Gore, White House, New York
    Discriminating => George Bush, York Times
}
Cluster1 {
    Descriptive    => George Bush, BRITAIN London, Prime Minister
    Discriminating => BRITAIN London, Prime Minister
}
@return : A reference to the hash described above ($labelSenseClustersHashRef).
@description :
1. Read the file line by line.
2. Ignore lines that do not follow one of these formats:
Cluster 0 (Descriptive): George Bush, Al Gore, White House, New York
Cluster 0 (Discriminating): George Bush, BRITAIN London
3. Derive the keys from "Cluster # (Descriptive)" or "Cluster # (Discriminating)": the outer key is "Cluster#" (e.g. Cluster0) and the inner key is "Descriptive" or "Discriminating".
4. Store the keywords as the hash value, as in the example above; for e.g.:
$labelSenseClustersGlobalRef{Cluster1}{Discriminating}
= "BRITAIN London, Prime Minister";
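The steps above can be sketched in Perl as follows. This is a minimal, hypothetical re-implementation for illustration only, not the module's own code: the helper name parse_cluster_labels, the exact regular expression, and reading from an already-opened filehandle (instead of a file name) are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of ReadingFilesData): builds the nested
# hash described above from the lines of a SenseClusters label file.
sub parse_cluster_labels {
    my ($fh) = @_;
    my %labels;
    while (my $line = <$fh>) {
        chomp $line;
        # Step 2: skip lines not of the form
        # "Cluster N (Descriptive|Discriminating): keyword, keyword, ..."
        next unless $line =~
            /^\s*Cluster\s+(\d+)\s+\((Descriptive|Discriminating)\)\s*:\s*(.*?)\s*$/;
        # Step 3: outer key "Cluster#", inner key "Descriptive"/"Discriminating".
        # Step 4: the value is the comma-separated keyword list.
        $labels{"Cluster$1"}{$2} = $3;
    }
    return \%labels;
}

# Usage with an in-memory file standing in for a label file.
my $data = <<'END';
Cluster 0 (Descriptive): George Bush, Al Gore, White House, New York
Cluster 0 (Discriminating): George Bush, York Times
Some unrelated line
Cluster 1 (Discriminating): BRITAIN London, Prime Minister
END
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $ref = parse_cluster_labels($fh);
close $fh;
print "$ref->{Cluster1}{Discriminating}\n";   # prints "BRITAIN London, Prime Minister"
```

Returning a hash reference rather than a hash mirrors the @return field above and avoids copying the whole structure.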
Function: readLinesFromTopicFile
------------------------------------------------
This function reads lines from the topic file and returns a single string containing all of the topics.
@argument1 : Name of the topicFile.
@return : String containing the list of all the topics (labels) for the clusters.
@description :
1. Read the file line by line.
2. Remove the newline characters and build a string variable that
contains the list of all the topics.
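A minimal sketch of these two steps, assuming the lines are simply concatenated after the newline characters are stripped (the description does not say whether a separator such as a space is inserted). The helper name read_topic_text is hypothetical and is not part of ReadingFilesData; File::Temp is used only to create a throwaway input file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical re-implementation of the steps above (NOT the module's code):
# read a file line by line and return its contents with newlines removed.
sub read_topic_text {
    my ($fileName) = @_;
    open my $fh, '<', $fileName or die "Could not open $fileName: $!";
    my $text = '';
    while (my $line = <$fh>) {
        # Step 2: strip newline characters (also handles "\r\n" endings).
        $line =~ s/[\r\n]+//g;
        $text .= $line;
    }
    close $fh;
    return $text;
}

# Usage: write a small temporary topic file, then read it back.
# UNLINK => 1 removes the file when the program exits.
my ($tmpFh, $tmpName) = tempfile(UNLINK => 1);
print $tmpFh "Bill Clinton is an American politician \nwho served as President.\n";
close $tmpFh;
my $topics = read_topic_text($tmpName);
print "$topics\n";   # prints "Bill Clinton is an American politician who served as President."
```

Because no separator is added, a line that does not end in a space is joined directly to the next one; a real topic file may need trailing spaces, as in the SYNOPSIS example.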
SEE ALSO
http://senseclusters.cvs.sourceforge.net/viewvc/senseclusters/LabelEvaluation/
@Last modified by : Anand Jha
@Last_Modified_Date : 24th Dec. 2012
@Modified Version : 1.6
AUTHORS
Ted Pedersen, University of Minnesota, Duluth
tpederse at d.umn.edu
Anand Jha, University of Minnesota, Duluth
jhaxx030 at d.umn.edu
COPYRIGHT AND LICENSE
Copyright (C) 2012 Ted Pedersen, Anand Jha
See http://dev.perl.org/licenses/ for more information.
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to:
The Free Software Foundation, Inc., 59 Temple Place, Suite 330,
Boston, MA 02111-1307 USA