NAME
Text::TFIDF - Perl extension for computing the TF-IDF measure
SYNOPSIS
use Text::TFIDF;
my $Obj = Text::TFIDF->new(file => [$file1, $file2, ...]);
print $Obj->TFIDF($file,$word);
DESCRIPTION
The TF-IDF (Term Frequency-Inverse Document Frequency) weight is used in information retrieval and text mining. It is a statistical measure of how important a word is to a document within a collection of documents. At this time, the module works only on text documents.
Currently, the module reads everything into memory. This should be altered in the future.
EXPORT
None by default.
new(file=>\@files)
Creates a new Text::TFIDF object. If the file argument is passed in, the object is populated from those files.
TFIDF(file,word)
Computes the TF-IDF weight for the given document and word. Returns undef if the file is not in the corpus used to populate the object.
TF(file,word)
Returns the frequency of the given word in the document.
IDF(word)
Returns the inverse document frequency of a word: the logarithm of the number of documents in the corpus divided by the number of documents containing the term. Because the number of documents containing the term can be zero, one is added so that the result is always defined.
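The calculation described above can be sketched in plain Perl. This is an illustration only, not the module's own code. It assumes that the "add one" is applied to the count of documents containing the word, that the term frequency is a raw count, and that the TF-IDF weight is the usual product of the two. The names %corpus, idf_sketch and tfidf_sketch are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical in-memory corpus: document name => { word => raw count }.
my %corpus = (
    'a.txt' => { apple => 2, pear => 1 },
    'b.txt' => { apple => 1, plum => 3 },
    'c.txt' => { plum  => 1 },
);

# Inverse document frequency. ASSUMPTION: one is added to the number of
# documents containing the word, so the divisor can never be zero.
sub idf_sketch {
    my ($corpus, $word) = @_;
    my $total      = keys %$corpus;    # number of documents
    my $containing = grep { exists $corpus->{$_}{$word} } keys %$corpus;
    return log( $total / (1 + $containing) );
}

# TF-IDF as raw count times IDF (ASSUMPTION: the standard product form).
# Returns undef when the document is not part of the corpus.
sub tfidf_sketch {
    my ($corpus, $doc, $word) = @_;
    return undef unless exists $corpus->{$doc};
    my $tf = $corpus->{$doc}{$word} // 0;
    return $tf * idf_sketch($corpus, $word);
}

printf "%-6s idf=%.4f\n", $_, idf_sketch(\%corpus, $_) for qw(apple pear kiwi);
```

Under these assumptions a word that appears in no document still yields a finite value, because the divisor is then one rather than zero.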
process_files(@files)
Adds the given list of files to the corpus held by the object. Existing data is not replaced; the new files are added to it.
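A short usage sketch showing files being added in two steps. It uses only the methods documented here. The file names and contents are made up, and the script assumes that a document is identified by the same path that was passed in. How the module splits text into words is not covered by this documentation, so no output values are shown.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Text::TFIDF;

# Write two small throwaway documents (hypothetical contents).
my $dir  = tempdir(CLEANUP => 1);
my %text = (
    "$dir/one.txt" => "the cat sat on the mat\n",
    "$dir/two.txt" => "the dog chased the cat\n",
);
for my $path (sort keys %text) {
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} $text{$path};
    close $fh or die "Cannot close $path: $!";
}

# Start with one file, then add the second without discarding the first.
my $corpus = Text::TFIDF->new(file => ["$dir/one.txt"]);
$corpus->process_files("$dir/two.txt");

# ASSUMPTION: documents are looked up by the path given when they were added.
for my $path (sort keys %text) {
    my $weight = $corpus->TFIDF($path, 'cat');
    print "$path: ", (defined $weight ? $weight : 'undef'), "\n";
}
```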
SEE ALSO
See http://en.wikipedia.org/wiki/Tf-idf for more information.
AUTHOR
Leigh Metcalf, <leigh@fprime.net>
COPYRIGHT AND LICENSE
Copyright (C) 2011 by Leigh Metcalf
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.3 or, at your option, any later version of Perl 5 you may have available.