NAME

Algorithm::LDA

SYNOPSIS

 use Algorithm::LDA;
 
 my $lda = Algorithm::LDA->new("Data", 5, 100, 100, 0, 10, 0.1, 10, "stoplist.txt");
 

DESCRIPTION

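Algorithm::LDA is a Perl implementation of Latent Dirichlet Allocation (LDA), a topic model that describes each document as a mixture of topics and each topic as a distribution over words.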

add

description:

 Adds a document to the array of documents ($self->documents)

input:

 %args -> hash containing data

output:

 1

example:

 while (my $line = <$fh2>) {
    my $obj = decode_json($line);
    add(%$obj);
 }
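
The loop above reads one JSON record per line and passes it to add. The following is a minimal, self-contained sketch of that data flow, not the module's own code. It assumes each record is a JSON object whose "data" key holds an array of words; the "name" key and the @documents array are hypothetical.

 use strict;
 use warnings;
 use JSON::PP qw(decode_json);
 # Hypothetical input: one JSON object per line. Only the "data" key is
 # suggested by this documentation; "name" is invented for illustration.
 my @lines = (
     '{"name":"doc1","data":["latent","topic"]}',
     '{"name":"doc2","data":"not an array"}',
 );
 my @documents;
 for my $line (@lines) {
     my $obj = decode_json($line);
     # Keep only records whose data is an array reference.
     push @documents, $obj if ref($obj->{data}) eq 'ARRAY';
 }
 print scalar(@documents), " document(s) accepted\n";

Only the first record is kept here, because the second one carries a string rather than an array.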

init

description:

 Initializes alpha, initializes beta, loads documents, starts main loop

input:

 None

output:

 1

example:

 init();

printResults

description:

 Prints words in each topic, topics in each document, phi values, 
 and theta values to text files in the 'Results/$data' directory

input:

 None

output:

 None

example:

 printResults();

load

description:

 Loads documents from text files (in "data/$data") or JSON file (in "Documents")

input:

 None

output:

 None

example:

 load();

wordsPerTopic

description:

 Creates an array of words in each topic

input:

 %args -> hash containing topic

output:

 @words -> Array containing words and probabilities (phi value) for $args{topic}

example:

 my $words_on_topic = wordsPerTopic(topic => $topic);
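
As an illustration of the kind of structure described above, the sketch below pairs each word of one topic with its phi value. The %phi hash and its values are made up, and ordering the words by descending probability is an assumption; the module may return them in a different shape or order.

 use strict;
 use warnings;
 # Made-up phi values for a single topic.
 my %phi = (gene => 0.42, protein => 0.31, cell => 0.27);
 # Pair each word with its probability, highest probability first.
 my @words = map { [ $_, $phi{$_} ] }
             sort { $phi{$b} <=> $phi{$a} } keys %phi;
 printf "%s\t%.2f\n", @$_ for @words;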

topicsPerDocument

description:

 Creates an array of topics in each document

input:

 %args -> hash containing document

output:

 @topics -> Array containing topics and probabilities (theta value) for $args{document}

example:

 my $topics_on_document = topicsPerDocument(document => $doc);

sample_topic

description:

 Uses Gibbs Sampling to determine a topic given a document and word

input:

 $document -> ID of document word is in
 $word -> word that is to be evaluated

output:

 $topic -> topic ID
 $k -> last topic if topic can't be found

example:

 my $topic = $self->sample_topic($document, $word);
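
The sampling step can be pictured as a draw from an unnormalised discrete distribution over topics. The sketch below is an illustration under stated assumptions, not the module's implementation: draw_topic is a hypothetical helper, and each weight stands in for a product such as computePhi($topic, $word) * computeTheta($document, $topic). The final return mirrors the documented fallback to the last topic.

 use strict;
 use warnings;
 # Hypothetical helper: pick a topic index in proportion to its weight.
 sub draw_topic {
     my ($weights, $u) = @_;          # $u: uniform random number in [0, 1)
     my $total = 0;
     $total += $_ for @$weights;
     my $threshold = $u * $total;
     my $running   = 0;
     for my $topic (0 .. $#$weights) {
         $running += $weights->[$topic];
         return $topic if $threshold < $running;
     }
     return $#$weights;               # fallback: last topic
 }
 my @weights = (0.1, 0.6, 0.3);       # made-up values
 my $topic   = draw_topic(\@weights, rand());

Passing the random number in as an argument keeps the helper deterministic, which makes it easy to check by hand.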

computePhi

description:

 Computes the expected phi value for a word given a topic ID

input:

 $topic -> ID of topic (iteration 0..$k)
 $word -> word that is to be evaluated

output:

 Phi value

example:

 $dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
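
One widely used estimator for phi in collapsed Gibbs sampling is the smoothed share of a word among all words assigned to a topic. Whether computePhi uses exactly this formula is an assumption; phi_estimate and its parameters are hypothetical names chosen for the sketch.

 use strict;
 use warnings;
 # Hypothetical estimator: (count of word in topic + beta)
 # divided by (words in topic + vocabulary size * beta).
 sub phi_estimate {
     my ($word_count, $topic_total, $vocab_size, $beta) = @_;
     return ($word_count + $beta) / ($topic_total + $vocab_size * $beta);
 }
 # 3 of the 10 words assigned to the topic are the word of interest;
 # the vocabulary holds 5 words and beta is 0.1.
 my $phi = phi_estimate(3, 10, 5, 0.1);   # (3 + 0.1) / (10 + 0.5)
 printf "%.4f\n", $phi;

The beta term keeps the estimate above zero for words that have not yet been assigned to the topic.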

computeTheta

description:

 Computes the expected theta value for a topic given a document ID

input:

 $document -> ID of document
 $topic -> ID of topic (iteration 0..$k)

output:

 Theta value

example:

 $dist += ($self->computePhi($topic, $word) * $self->computeTheta($document, $topic));
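
A common way to estimate theta is the smoothed share of a document's words assigned to a topic. The sketch below assumes that form; theta_estimate and its parameters are hypothetical, and the module's own calculation may differ.

 use strict;
 use warnings;
 # Hypothetical estimator: (words of the document in topic + alpha)
 # divided by (document length + number of topics * alpha).
 sub theta_estimate {
     my ($doc_topic_count, $doc_length, $num_topics, $alpha) = @_;
     return ($doc_topic_count + $alpha) / ($doc_length + $num_topics * $alpha);
 }
 # A 10-word document whose words are split 7/3 over two topics; alpha = 0.5.
 my @theta = map { theta_estimate($_, 10, 2, 0.5) } (7, 3);
 printf "%.4f %.4f\n", @theta;

With this form, the estimates for one document sum to 1 across all topics.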

increaseMap

description:

 Increases the counts stored in each of the hashmaps for the given document, topic, and word

input:

 $document -> ID of document
 $topic -> ID of topic
 $word -> word in document $document

output:

 None

example:

 $self->increaseMap($data->{document}, $data->{topic}, $data->{word});

decreaseMap

description:

 Decreases the counts stored in each of the hashmaps for the given document, topic, and word

input:

 $document -> ID of document
 $topic -> ID of topic
 $word -> word in document $document

output:

 None

example:

 $self->decreaseMap($data->{document}, $data->{topic}, $data->{word});
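
increaseMap and decreaseMap take the same arguments and act as mirror images of each other. A minimal sketch of that bookkeeping, assuming three count tables, follows; the hash names and the bump helper are hypothetical and the module's own hashmaps may be organised differently.

 use strict;
 use warnings;
 # Hypothetical count tables.
 my (%doc_topic, %topic_word, %topic_total);
 sub bump {
     my ($document, $topic, $word, $delta) = @_;   # $delta is +1 or -1
     $doc_topic{$document}{$topic} += $delta;
     $topic_word{$topic}{$word}    += $delta;
     $topic_total{$topic}          += $delta;
 }
 bump(0, 2, 'perl', +1);   # assign 'perl' in document 0 to topic 2
 bump(0, 2, 'perl', -1);   # withdraw the assignment before resampling
 bump(0, 1, 'perl', +1);   # record the newly sampled topic

In Gibbs sampling a word's current assignment is typically removed from the counts before a new topic is drawn, then the new assignment is added back.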

valid

description:

 Returns whether or not $data is a valid array (able to be added to the dataset)

input:

 $data -> data to be evaluated

output:

 1 (true) if $data is an array, 0 (false) otherwise

example:

 return unless (valid($args{data}));
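
A check of this kind can be written with Perl's built-in ref function. The sketch below is a hypothetical stand-in named is_array_ref, assuming that "array" here means an array reference; it is not the module's own code.

 use strict;
 use warnings;
 # Hypothetical stand-in for valid(): true only for array references.
 sub is_array_ref {
     my ($data) = @_;
     return ref($data) eq 'ARRAY' ? 1 : 0;
 }
 print is_array_ref([ 'latent', 'topic' ]), "\n";   # prints 1
 print is_array_ref('latent topic'), "\n";          # prints 0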

removeSpecialChars

description:

 Removes special characters (non-ASCII and non-letter characters) from a word

input:

 $word -> word to be cleaned

output:

 $newWord -> $word without non-ASCII/non-letter characters

example:

 @ws = map { removeSpecialChars($_) } @ws;
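
The cleaning step described above can be approximated with a single substitution that keeps ASCII letters only. The helper strip_non_letters below is a hypothetical stand-in, and the exact character class used by the module is an assumption.

 use strict;
 use warnings;
 # Hypothetical stand-in: delete everything except ASCII letters.
 sub strip_non_letters {
     my ($word) = @_;
     (my $clean = $word) =~ s/[^A-Za-z]//g;
     return $clean;
 }
 my @ws = map { strip_non_letters($_) } ("don't", "caf\x{e9}", "LDA-2");
 print join(' ', @ws), "\n";   # prints "dont caf LDA"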

beta

description:

 Randomly initializes beta values

input:

 None

output:

 None

example:

 beta();

stop

description:

 Generates a regex that matches the words in the stopword list so that they can be removed

input:

 None

output:

 $stop_regex -> regex containing stopwords

example:

 my $stop = stop();
 my $regex = qr/($stop)/;
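
One way to build such a regex is to escape each stopword and join the results into an alternation. The sketch below assumes the stoplist has already been read into a list of plain words; stop_regex_from is a hypothetical helper, and the word boundaries are a choice made here so that stopwords are not matched inside longer words.

 use strict;
 use warnings;
 # Hypothetical helper: turn a list of stopwords into one alternation.
 sub stop_regex_from {
     my @words = @_;
     return join '|', map { quotemeta($_) } @words;
 }
 my $stop  = stop_regex_from(qw(the of and));
 my $regex = qr/\b(?:$stop)\b/i;
 # "theory" contains "the" but survives because of the word boundaries.
 my @kept  = grep { !/$regex/ } qw(the theory of topics);
 print "@kept\n";   # prints "theory topics"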

CONTACT US

  If you have any trouble installing or using Algorithm::LDA,
  please contact us:

      Bridget T. McInnes: btmcinnes at vcu.edu

AUTHORS

  Bridget McInnes <btmcinnes at vcu.edu>

COPYRIGHT AND LICENSE

Copyright 2016 by Bridget McInnes, Nicholas Jordan

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to

 The Free Software Foundation, Inc.,
 59 Temple Place - Suite 330,
 Boston, MA  02111-1307, USA.