NAME

GoogleHack::Text

SYNOPSIS

    use GoogleHack::Text;

    my $search = GoogleHack::Text->new();    # create an object of type Text

    %results = $search->getWords("file location");           # returns a hash of words

    %results = $search->getSentences("file location", 3);    # returns a hash of 3-word sentences

    $config = $search->readConfig("file name");              # reads a configuration file

    $text = $search->removeHTML("string");                   # removes HTML tags

    $text = $search->removeXML("string");                    # removes XML tags

DESCRIPTION

This is a simple text-processing package that supports the GoogleHack and Rate modules. Given a file of words, it retrieves the words and stores them in a simple hash. Given a file of text, it can also form n-word sentences.

AUTHOR

Pratheepan Raveendranathan, <rave0029@d.umn.edu>

Ted Pedersen, <tpederse@d.umn.edu>

BUGS

SEE ALSO

GoogleHack home page

Pratheepan Raveendranathan

Ted Pedersen

Google-Hack Mailing List <google-hack-users@lists.sourceforge.net>

COPYRIGHT AND LICENSE

Copyright (c) 2003 by Pratheepan Raveendranathan, Ted Pedersen

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

The Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

PACKAGE METHODS

__PACKAGE__->new(\%args)

Purpose: This function creates an object of type Text and returns a blessed reference.

__PACKAGE__->init(\%args)

Purpose: This function can be used to initialize the member variables.

Valid arguments are :

  • key

    string. The key for the Google API

  • File_location

    string. The WSDL file name

  • adverbs_list

    string. The location of the adverbs list file

  • verbs_list

    string. The location of the verbs list file

  • adjectives_list

    string. The location of the adjectives list file

  • nouns_list

    string. The location of the nouns list file

  • stop_list

    string. The location of the stop_words list file

__PACKAGE__->getSentences(\%args)

Purpose: Given a file of text or a variable containing text, this function tries to retrieve sentences from it.

Valid arguments are :

  • file_name

    string. Name of file to retrieve sentences from.

  • sentence_length

    Number. Number of words in a sentence.

  • trace_file

    string. The location of the trace file. If a file_name is given, the results are stored in this file.

Returns: An array of strings.
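The idea of forming n-word sentences can be illustrated with a minimal standalone sketch. It assumes that an "n-word sentence" means every run of n consecutive whitespace-separated words; the module may define it differently. The helper name n_word_sequences is hypothetical and is not part of GoogleHack::Text.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of GoogleHack::Text.
# Assumption: an "n-word sentence" is any run of $n consecutive words.
sub n_word_sequences {
    my ($text, $n) = @_;
    return () if $n < 1;

    my @words = split ' ', $text;    # split on runs of whitespace
    my @sequences;

    # Slide a window of $n words across the word list.
    for my $start (0 .. @words - $n) {
        push @sequences, join(' ', @words[$start .. $start + $n - 1]);
    }
    return @sequences;
}

my @seqs = n_word_sequences("the quick brown fox jumps", 3);
print "$_\n" for @seqs;
# prints:
#   the quick brown
#   quick brown fox
#   brown fox jumps
```

Text shorter than n words yields an empty list, because the window range is empty.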

__PACKAGE__->getWords(\%args)

Purpose: Given a file of text, this function tries to retrieve words from it.

Valid arguments are :

  • file_name

    string. Name of file to retrieve words from.

  • trace_file

    string. The location of the trace file. If a file_name is given, the results are stored in this file.

Returns: A hash of words.
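Reading words into a hash can be sketched as follows. This is a standalone illustration, not the module's code: the helper name words_from_handle is hypothetical, and the choices to lowercase each word and to store an occurrence count as the hash value are assumptions, since the documentation does not say what the values hold. An in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of GoogleHack::Text.
# Assumption: keys are lowercased words, values are occurrence counts.
sub words_from_handle {
    my ($fh) = @_;
    my %words;
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            $words{ lc($word) }++;
        }
    }
    return %words;
}

# An in-memory "file" keeps the example self-contained.
my $sample = "apple banana\nApple cherry\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my %words = words_from_handle($fh);
close $fh;

print "$_ => $words{$_}\n" for sort keys %words;
# prints:
#   apple => 2
#   banana => 1
#   cherry => 1
```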

__PACKAGE__->removeHTML(\%args)

Purpose: Remove HTML tags. The HTML::TokeParser package must be installed.

Valid arguments are :

  • text

    string. The text to be de-tagged.

Returns: The text with HTML tags removed.
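Tag removal with HTML::TokeParser can be sketched like this. The example shows one common way to use that parser (keep only text tokens) and is not necessarily how the module does it. The helper name strip_html is hypothetical. HTML::TokeParser is not a core module and must be installed separately.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper, not part of GoogleHack::Text.
# Keeps only text tokens; tags, comments and declarations are dropped.
sub strip_html {
    my ($html) = @_;

    # Passing a scalar reference makes the parser read from the string.
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create HTML::TokeParser object";

    my $text = '';
    while (my $token = $parser->get_token) {
        # Text tokens have the form ["T", $text, $is_data].
        $text .= $token->[1] if $token->[0] eq 'T';
    }
    return $text;
}

print strip_html("<p>Hello <b>world</b></p>"), "\n";
```

Entities such as &amp; are left undecoded in this sketch; HTML::Entities could be used to decode them if needed.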

__PACKAGE__->removeXML(\%args)

Purpose: Remove XML tags. The XML::TokeParser package must be installed.

Valid arguments are :

  • text

    string. The text to be de-tagged.

Returns: The text with XML tags removed.

__PACKAGE__->readConfig(\%args)

Purpose: This function reads a configuration file containing information such as the Google API key, the word lists, etc.

Valid arguments are :

  • filename

    string. Location of the configuration file.

Returns: An object which contains the parsed information.

__PACKAGE__->getSurroundingWords(\%args)

Purpose: This function reads a configuration file containing information such as the Google API key, the word lists, etc.

Valid arguments are :

  • filename

    string. Location of the configuration file.

Returns: An object which contains the parsed information.