NAME

Algorithm::PageRank::XS - A Fast PageRank implementation

DESCRIPTION

This module implements a simple PageRank algorithm in C. The goal is to quickly get a vector that is closed to the eigenvector of the stochastic matrix of a graph.

Algorithm::PageRank does some pagerank calculations, but it's slow and memory intensive. This module was developed to compute pagerank on graphs with millions of arcs. This module will not, however, scale up to quadrillions of arcs (see TODO).

SYNOPSYS

use Algorithm::PageRank::XS;

my $pr = Algorithm::PageRank::XS->new();

$pr->graph([
          'John'  => 'Joey',
          'John'  => 'James',
          'Joey'  => 'John',
          'James' => 'Joey',
          ]
          );

$pr->results();
# {
#      'James' => '0.569840431213379',
#      'Joey'  => '1',
#      'John'  => '0.754877686500549'
# }



#
#
# The following simple program takes up arcs and prints the ranks.
use Algorithm::PageRank::XS;

my $pr = Algorithm::PageRank::XS->new();

while (<>) {
    chomp;
    my ($from, to) = split(/\t/, $_);
    $pr->add_arc($from, $to);
}

while (my ($name, $rank) = each(%{$pr->results()})) {
    print("$name,$rank\n");
}

METHODS

new %PARAMS

Create a new PageRank object. Possible parameters:

alpha: This is (1 - how much people can move from one node to another unconnected one randomly). Decreasing this number makes convergence more likely, but brings us further from the true eigenvector.
max_tries: The maximum number of tries until we give up trying to achieve convergence.
convergence: The maximum number the difference between two subsequent vectors must be before we say we are "convergent enough". The convergence rate is the rate at which alpha^t goes to 0. Thus, if you set alpha to 0.85, and convergence to 0.000001, then you will need 85 tries.

add_arc

Add an arc to the pagerank object before running the computation. The actual values don't matter. So you can run:

$pr->add_arc("Apple", "Orange");

and you mean that "Apple" links to "Orange".

graph

Add a graph, which is just an array of from, to combinations. This is equivalent to calling add_arc a bunch of times, but may be more convenient.

results

Compute the pagerank vector, and return it as a hash.

Whatever you called the nodes when specifying the arcs will be the keys of this hash, where the values will be the vector.

The result vector is normalized such that the maximum value is 1. This is to prevent extremely small values for large data sets. You can normalize it any other way you like if you don't like this.

BUGS

None known.

TODO

We may want to support double values rather than single floats
We may or may not want to adjust the weighting of individual arcs, as you cannot do now.
At present the indexes are unsigned int, rather than size_t. Thus this will not scale with 64-bit architectures.
It'd be nice to be able to use mmap(2) to efficiently use the hard drive to scale to places where memory can't take us.

PERFORMANCE

This module is pretty fast. I ran this on a 1 million node set with 4.5 million arcs in 57 seconds on my 32-bit 1.8GHz laptop. Let me know if you have any performance tips. It's orders of magnitude faster than Algorithm::PageRank, but performance tests will be here shortly.

AUTHOR

Michael Axiak <mike@axiak.net>

COPYRIGHT

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself

To install Algorithm::PageRank::XS, copy and paste the appropriate command in to your terminal.

cpanm

cpanm Algorithm::PageRank::XS

CPAN shell

perl -MCPAN -e shell
install Algorithm::PageRank::XS

For more information on module installation, please visit the detailed CPAN module installation guide.

	Global
`s`	Focus search bar
`?`	Bring up this help dialog

	GitHub
`g` `p`	Go to pull requests
`g` `i`	go to github issues (only if github is preferred repository)

	POD
`g` `a`	Go to author
`g` `c`	Go to changes
`g` `i`	Go to issues
`g` `d`	Go to dist
`g` `r`	Go to repository/SCM
`g` `s`	Go to source
`g` `b`	Go to file browse

	Search terms
module: (e.g. module:Plugin)
distribution: (e.g. distribution:Dancer auth)
author: (e.g. author:SONGMU Redis)
version: (e.g. version:1.00)