NAME
WebService::GoogleHack::Text - Basic text processing for GoogleHack: extracting words and n-word sentences from files, removing markup, and reading configuration files.
SYNOPSIS
use WebService::GoogleHack::Text;
# create an object of type Text
my $text = WebService::GoogleHack::Text->new();
# returns a hash of words
my %words = $text->getWords("file location");
# returns the 3-word sentences found in the file
my @sentences = $text->getSentences("file location", 3);
# reads the configuration file
my $config = $text->readConfig("location of configuration file");
# removes HTML tags
my $clean = $text->removeHTML("string");
DESCRIPTION
This is a simple text processing package that supports the GoogleHack and Rate modules. Given a file of words, it retrieves the words in the file and stores them in a simple hash. Given a file of text, it can also form n-word sentences.
PACKAGE METHODS
__METHOD__->new()
Purpose: This function creates an object of type Text and returns a blessed reference.
__METHOD__->init(Params Given Below)
Purpose: This function can be used to initialize the member variables.
Valid arguments are :
key
string. Key for the Google API.
wsdl_location
string. The WSDL file name.
basedir
string. The base directory of Google Hack.
taggerdir
string. The location of the Brill Tagger.
__METHOD__->getSentences(file_name,sentence_length,trace_file)
Purpose: Given a file of text or a variable containing text, this function tries to retrieve sentences from it.
Valid arguments are :
file_name
string. Name of file to retrieve sentences from.
sentence_length
Number. Number of words in a sentence.
trace_file
string. The location of the trace file. If a trace file is given, the results are stored in it.
Returns: An array of strings.
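The idea of forming n-word sentences can be sketched as follows. This is only an illustration, not the module's own code: the helper name n_word_sentences is hypothetical, it takes a string rather than a file name, and it assumes that a "sentence" is any run of n consecutive whitespace-separated words (a sliding window), which the documentation above does not confirm.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WebService::GoogleHack::Text.
# Splits the text on whitespace and returns every run of $n consecutive words.
sub n_word_sentences {
    my ($text, $n) = @_;
    my @words = split ' ', $text;
    my @sentences;

    # Nothing to return when the window is invalid or longer than the text.
    return @sentences if $n < 1 || @words < $n;

    for my $start (0 .. @words - $n) {
        push @sentences, join ' ', @words[$start .. $start + $n - 1];
    }
    return @sentences;
}

my @sentences = n_word_sentences("the quick brown fox jumps", 3);
print "$_\n" for @sentences;
```

If the module instead cuts the text into non-overlapping groups of n words, the loop would advance by $n rather than by one word.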
__METHOD__->getWords(file_name,trace_file)
Purpose: Given a file of text, this function tries to retrieve words from it.
Valid arguments are :
file_name
string. Name of file to retrieve words from.
trace_file
string. The location of the trace file. If a trace file is given, the results are stored in it.
Returns: A hash of words.
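A word hash of the kind described above might be built like this. The sketch is an assumption-laden illustration, not the module's implementation: count_words is a hypothetical helper, it works on a string instead of a file, and it stores occurrence counts as the hash values, whereas the documentation does not say what the module stores there.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WebService::GoogleHack::Text.
# Returns a hash mapping each lowercased word to its number of occurrences.
sub count_words {
    my ($text) = @_;
    my %count;
    $count{lc $1}++ while $text =~ /(\w+)/g;
    return %count;
}

my %words = count_words("The cat saw the other cat");
printf "%s => %d\n", $_, $words{$_} for sort keys %words;
```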
__METHOD__->removeXML(text)
Purpose: Removes XML tags. The XML::TokeParser package must be installed.
Valid arguments are :
text
string. The text to be de-tagged.
Returns: The text with XML tags removed.
__METHOD__->removeHTML(text)
Purpose: Removes HTML tags. The HTML::TokeParser package must be installed.
Valid arguments are :
text
string. The text to be de-tagged.
Returns: The text with HTML tags removed.
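For orientation, tag removal with HTML::TokeParser can look roughly like the sketch below. It is not the module's code: strip_html is a hypothetical helper that keeps only text tokens and discards everything else. Entities are left undecoded, and the contents of script or style elements would be kept as text.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper, not part of WebService::GoogleHack::Text.
# Concatenates the text tokens of an HTML string and drops all tags.
sub strip_html {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot parse HTML string";
    my $text = '';
    while (my $token = $parser->get_token) {
        # Text tokens have the form ["T", $text, $is_data].
        $text .= $token->[1] if $token->[0] eq 'T';
    }
    return $text;
}

print strip_html("<p>Hello, <b>world</b>!</p>"), "\n";
```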
__METHOD__->getSurroundingWords(filename,stemmer)
Purpose: This function is used to read a configuration file containing information such as the Google API key, the words list, etc.
Valid arguments are :
filename
string. Location of the configuration file.
stemmer
bool. Turns the Porter stemmer on or off.
Returns: An object which contains the parsed information.
__METHOD__->readConfig(filename)
Purpose: This function is used to read a configuration file containing information such as the Google API key, the words list, etc.
Valid arguments are :
filename
string. Location of the configuration file.
Returns: An object which contains the parsed information.
AUTHOR
Pratheepan Raveendranathan, <rave0029@d.umn.edu>
Ted Pedersen, <tpederse@d.umn.edu>
BUGS
SEE ALSO
GoogleHack home page - http://google-hack.sourceforge.net
Pratheepan Raveendranathan - http://www.d.umn.edu/~rave0029/research
Ted Pedersen - http://www.d.umn.edu/~tpederse
Google-Hack Mailing List <google-hack-users@lists.sourceforge.net>
COPYRIGHT AND LICENSE
Copyright (c) 2005 by Pratheepan Raveendranathan, Ted Pedersen
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to
The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.