NAME

Freq - A purpose-built inverted text index for term frequency calculations.

SYNOPSIS

Index documents:

# cat textcorpus.txt | tokenize | indexstream corpus_dir

Create an ngram list:

# cat textcorpus.txt | tokenize | ngrams [N-size] [threshold]

Get statistics on word frequencies:

# cat termlist.txt | stats --everything corpus_dir

Get help:

# tokenize --help
# stats --help
# indexstream --help
# ngrams --help

PROGRAMMING API

  use strict;
  use warnings;
  use Freq;

  # Build an index. $string holds the text of one document (placeholder).
  my $string = "Text of the document to index.";
  my $index  = Freq->open_write( "indexname" );
  $index->index_document( "docname", $string );
  $index->close();

  # Reopen the index for reading.
  $index = Freq->open_read( "indexname" );
  my ( $words_in_corpus, $docs_in_corpus ) = $index->index_info();

  # Find all docs containing a phrase.
  my $hashref = $index->doc_hash( "this phrase and no other phrase" );

  # Total number of matches for this phrase/word.
  my $matches = $hashref->{MATCHES};

  # The consecutive ID of each document.
  my @docids = @{ $hashref->{DOCIDS} };

  # The number of matches found in each document.
  my @docmatches = @{ $hashref->{DOCMATCHES} };

  # The number of words between each consecutive match.
  my @intervals = @{ $hashref->{INTERVALS} };

  # Get the match count, document count, standard deviation of terms
  # per document, and standard deviation of intervals per match.
  my ( $phrase_matches, $doc_count, $docsigma, $intsigma ) =
      $index->stats( "some phrase or other" );

  $index->close();
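
This documentation does not say exactly how stats computes $docsigma and $intsigma. As a sketch, assuming they are population standard deviations of the per-document match counts (DOCMATCHES) and of the inter-match intervals (INTERVALS), the arithmetic looks like this. population_sigma is a hypothetical helper, not part of Freq, and Freq may well use the sample (n - 1) form instead.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Freq: population standard deviation
# of a list of numbers. Returns 0 for an empty list.
sub population_sigma {
    my @values = @_;
    return 0 unless @values;
    my $sum = 0;
    $sum += $_ for @values;
    my $mean = $sum / @values;          # @values in numeric context = count
    my $sq = 0;
    $sq += ( $_ - $mean ) ** 2 for @values;
    return sqrt( $sq / @values );       # divide by n (population form)
}

# Illustrative inputs shaped like DOCMATCHES and INTERVALS from doc_hash().
my @docmatches = ( 2, 4, 4, 4, 5, 5, 7, 9 );
my @intervals  = ( 1, 3 );

my $docsigma = population_sigma(@docmatches);
my $intsigma = population_sigma(@intervals);
printf "docsigma=%.2f intsigma=%.2f\n", $docsigma, $intsigma;   # docsigma=2.00 intsigma=1.00
```

A large $docsigma suggests the term is concentrated in a few documents rather than spread evenly; a large $intsigma suggests its occurrences are clustered rather than regularly spaced.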

DESCRIPTION

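Freq is a purpose-built inverted text index for term frequency calculations. Documents are indexed either from a token stream (indexstream) or through the programming API, and the index can then be queried for how often a word or phrase occurs, which documents contain it, and how its occurrences are distributed across and within those documents.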
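
An inverted index maps each word to the documents, and word positions, where it occurs; that is the kind of structure that lets a query like doc_hash report per-document match counts and gaps without rescanning the text. The toy below is a minimal in-memory sketch of the idea, not Freq's actual storage format. toy_index_document and toy_doc_hash are hypothetical names; the sketch handles single words only (not phrases), assumes lowercase whitespace tokenization, and assumes INTERVALS counts the words strictly between consecutive matches in the same document.

```perl
use strict;
use warnings;

# Toy in-memory positional inverted index; NOT Freq's on-disk format.
# %postings maps word => { docid => [ word positions ] }.
my %postings;
my $next_docid = 0;

# Assumption: documents are lowercased and split on whitespace.
sub toy_index_document {
    my ($text) = @_;
    my $docid = $next_docid++;          # consecutive document IDs
    my @words = split ' ', lc $text;
    for my $pos ( 0 .. $#words ) {
        push @{ $postings{ $words[$pos] }{$docid} }, $pos;
    }
    return $docid;
}

# Returns a hashref shaped like doc_hash() output, for a single word.
# Assumption: intervals are measured only within a document.
sub toy_doc_hash {
    my ($word) = @_;
    my $docs = $postings{ lc $word } || {};
    my $matches = 0;
    my ( @docids, @docmatches, @intervals );
    for my $docid ( sort { $a <=> $b } keys %$docs ) {
        my @pos = @{ $docs->{$docid} };
        $matches += @pos;               # count of positions
        push @docids,     $docid;
        push @docmatches, scalar @pos;
        # words strictly between consecutive matches
        push @intervals, $pos[$_] - $pos[ $_ - 1 ] - 1 for 1 .. $#pos;
    }
    return {
        MATCHES    => $matches,
        DOCIDS     => \@docids,
        DOCMATCHES => \@docmatches,
        INTERVALS  => \@intervals,
    };
}

toy_index_document("the cat sat on the mat");
toy_index_document("a dog");
toy_index_document("the end");

my $h = toy_doc_hash("the");
print "MATCHES=$h->{MATCHES} DOCIDS=@{ $h->{DOCIDS} } ",
      "DOCMATCHES=@{ $h->{DOCMATCHES} } INTERVALS=@{ $h->{INTERVALS} }\n";
# MATCHES=3 DOCIDS=0 2 DOCMATCHES=2 1 INTERVALS=3
```

Storing positions, not just counts, is what makes interval statistics possible; a real phrase query would additionally check that the phrase's words appear at consecutive positions.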

EXPORT

None. Use the object-oriented programming API shown above.

AUTHOR

Ira Joseph Woodhead, ira@ejemoni.com
