NAME

Lingua::JA::TFIDF - TFIDF Calculator based on MeCab.

SYNOPSIS

use Lingua::JA::TFIDF;
use Data::Dumper;

my $calc   = Lingua::JA::TFIDF->new(%config);

# calculate TFIDF and return a result object.
my $result = $calc->tfidf($text);
print Dumper $result->list;

# or calculate just TF
print Dumper $calc->tf($text)->list;

# dump the result object.
print Dumper $result->dump;

DESCRIPTION

* This software is still in alpha release *

Lingua::JA::TFIDF is a TFIDF calculator based on MeCab. It ships with a DF (Document Frequency) data set that was fetched beforehand from the Yahoo Search API.

METHODS

new(%config)

Instantiates a new Lingua::JA::TFIDF object. Takes the following parameters, all of which are optional.

my $calc = Lingua::JA::TFIDF->new(
  df_file         => 'my_df_file',           # default is undef
  ng_word         => \@original_ngword,      # default is undef
  fetch_df        => 1,                      # default is undef
  fetch_df_save   => 'my_df_file',           # default is undef
  LWP_UserAgent   => \%lwp_useragent_config, # default is undef
  XML_TreePP      => \%xml_treepp_config,    # default is undef
  yahoo_api_appid => $myid,                  # default is undef
);

tfidf($text);

Calculates the TFIDF score. If the text includes unknown words, their Document Frequency score is replaced with the average score of the known words. If you set the fetch_df parameter to a true value in the constructor, the calculator fetches the Document Frequency of unknown words from the Yahoo Search API.
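The fallback for unknown words can be illustrated with a small standalone sketch. It does not use this module or MeCab. It assumes the common tf * log(N / df) weighting and takes the average over the known words of the same text, neither of which is confirmed as the module's actual behavior. The function name tfidf_with_fallback, the document count $N, and the sample data are all hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical inputs: term counts for one text and a small DF table.
# $N (total number of documents) is an assumed value, not the module's.
my $N  = 1_000_000;
my %tf = ( tokyo => 3, tenki => 1, hoge => 2 );   # 'hoge' has no DF entry
my %df = ( tokyo => 50_000, tenki => 10_000 );

# Assumed weighting: tf * log(N / df). Words without a DF entry
# borrow the average DF of the words that do have one.
sub tfidf_with_fallback {
    my ($tf, $df, $n) = @_;

    # Average DF over the words of this text that are in the DF table.
    my @known  = grep { exists $df->{$_} } keys %$tf;
    my $avg_df = @known ? sum(map { $df->{$_} } @known) / @known : 1;

    my %score;
    for my $word (keys %$tf) {
        my $d = exists $df->{$word} ? $df->{$word} : $avg_df;
        $score{$word} = $tf->{$word} * log($n / $d);
    }
    return \%score;
}

my $score = tfidf_with_fallback(\%tf, \%df, $N);
printf "%s\t%.3f\n", $_, $score->{$_} for sort keys %$score;
```

In this sketch 'hoge' is scored with a DF of 30,000, the mean of the two known entries, so a single unknown word neither dominates nor vanishes from the ranking.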

tf($text);

Calculates the TF score.

ng_word

Accessor method. You can use it to replace the NG word list.

df_data

Internal accessor method.

fetcher

Internal accessor method.

AUTHOR

Takeshi Miki <miki@cpan.org>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO