NAME
NNexus::Index::Dispatcher - High-level dispatcher to the correct domain indexer classes.
SYNOPSIS
use NNexus::Index::Dispatcher;
$dispatcher = NNexus::Index::Dispatcher->new(db=>$db,domain=>$domain,verbosity=>0|1);
$invalidated_URLs = $dispatcher->index_step(%options);
while (my $payload = $dispatcher->index_step ) {
push @$invalidated_URLs, @{$payload};
}
DESCRIPTION
The NNexus::Index::Dispatcher class provides a high-level API for indexing web domains. It requires that each $domain has its own NNexus::Index::$domain indexer plug-in, whose class name follows a ucfirst(lc($domain)) naming convention.
Additionally, NNexus::Index::Dispatcher computes the concept diffs when re-indexing an already visited page and updates the database as needed. Lastly, the return value of an indexing step is a list of suggested URLs to be relinked, a process called "invalidation" in previous NNexus releases.
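As a minimal sketch of the naming convention, the mapping from a domain name to its plug-in class name could look like the following. The helper name indexer_class_for and the example domain names are assumptions made for illustration; they are not part of the NNexus API.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NNexus): build the indexer plug-in
# class name for a domain using the ucfirst(lc($domain)) convention.
sub indexer_class_for {
  my ($domain) = @_;
  return 'NNexus::Index::' . ucfirst(lc($domain));
}

# The domain names below are illustrative inputs only.
print indexer_class_for('PLANETMATH'), "\n";  # NNexus::Index::Planetmath
print indexer_class_for('wikipedia'), "\n";   # NNexus::Index::Wikipedia
```

Under this convention the case of the supplied domain name does not matter: any spelling is lowercased first and then given a single leading capital.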
METHODS
$dispatcher = NNexus::Index::Dispatcher->new(domain=>$domain,db=>$db,verbosity=>0|1, start=>$url, dom=>$dom);
-
The object constructor prepares a domain crawler object (NNexus::Index::ucfirst(lc($domain))) and requires a NNexus::DB object, $db, for database interactions. The returned dispatcher object can be used to iteratively index the domain via the index_step method.
The method accepts the following options:
start: the initial URL, required for the first invocation
dom: optional, provides a Mojo::DOM object for the current URL instead of performing an HTTP GET to retrieve it.
verbosity: 0 for quiet, 1 for detailed progress messages
$invalidated_URLs = $dispatcher->index_step(%options);
-
Performs an indexing step as follows:
Dispatches a crawl request to the domain indexer
Computes a diff over the previously and currently indexed concepts for the given object/URL
Updates the database tables
Computes and returns an impact graph of previously linked objects (aka "invalidation")
Accepts no options; all customization is to be achieved through the new constructor.
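The diff step above can be pictured as a set comparison between the concepts stored for a URL and the concepts found on the current crawl. The sketch below is only an illustration under stated assumptions: concepts are represented as plain strings, and the helper name concept_diff is hypothetical. The actual dispatcher may represent and compare concepts differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of NNexus): given array references of
# previously and currently indexed concept names, return the concepts
# that disappeared and the concepts that are new.
sub concept_diff {
  my ($old, $new) = @_;
  my %old = map { $_ => 1 } @$old;
  my %new = map { $_ => 1 } @$new;
  # Sort the keys first so the results come back in a stable order.
  my @deleted = grep { !$new{$_} } sort keys %old;
  my @added   = grep { !$old{$_} } sort keys %new;
  return (\@deleted, \@added);
}

# Illustrative data: 'group' was dropped from the page, 'field' was added.
my ($deleted, $added) = concept_diff(['group', 'ring'], ['ring', 'field']);
print "deleted: @$deleted\n";  # deleted: group
print "added: @$added\n";      # added: field
```

In a design like this, only the deleted and added concepts would need database updates, and they would be the natural starting point for deciding which previously linked URLs to suggest for relinking.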
AUTHOR
Deyan Ginev <d.ginev@jacobs-university.de>
COPYRIGHT
Research software, produced as part of work done by
the KWARC group at Jacobs University Bremen.
Released under the MIT License (MIT)