NAME
NNexus::Index::Dispatcher
- High-level dispatcher to the correct domain indexer classes.
SYNOPSIS
  use NNexus::Index::Dispatcher;
  my $dispatcher = NNexus::Index::Dispatcher->new(db=>$db,domain=>$domain,verbosity=>0|1);
  my $invalidated_URLs = $dispatcher->index_step(%options);
  while (my $payload = $dispatcher->index_step ) {
    push @$invalidated_URLs, @{$payload};
  }
DESCRIPTION
The NNexus::Index::Dispatcher class provides a high-level API for indexing web domains.
It requires that each $domain has its own NNexus::Index::$domain
indexer plug-in, named following the ucfirst(lc($domain)) convention.
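The naming convention can be illustrated with a minimal sketch. The helper name indexer_class_for is hypothetical and not part of NNexus; it only shows how a domain string would map to a plug-in class name under the ucfirst(lc($domain)) rule.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of NNexus: derive the indexer class name
# from a domain name using the ucfirst(lc(...)) convention.
sub indexer_class_for {
  my ($domain) = @_;
  return 'NNexus::Index::' . ucfirst(lc($domain));
}

# Mixed-case input is normalized to one leading capital letter.
print indexer_class_for($_), "\n" for qw(planetmath PLANETMATH PlanetMath);
```

All three inputs map to the same class name, so the case in which a caller spells the domain does not matter.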
Additionally, NNexus::Index::Dispatcher
computes the concept diffs when re-indexing an already visited page and updates the database as needed. Lastly, the return value of an indexing step is a list of suggested URLs to be relinked, a process called "invalidation" in previous NNexus releases.
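To make the idea of a concept diff concrete, here is a minimal sketch. It assumes concepts can be represented as plain name strings, and the helper concept_diff is hypothetical; the actual NNexus data structures and diff logic may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not the NNexus implementation: compare the concept
# names recorded for a page earlier with those found when re-indexing it.
sub concept_diff {
  my ($old, $new) = @_;
  my %in_old = map { $_ => 1 } @$old;
  my %in_new = map { $_ => 1 } @$new;
  # Concepts only in the new list were added; only in the old list, removed.
  my @added   = grep { !$in_old{$_} } sort keys %in_new;
  my @removed = grep { !$in_new{$_} } sort keys %in_old;
  return { added => \@added, removed => \@removed };
}

my $diff = concept_diff(
  ['abelian group', 'ring'],     # previously indexed (example data)
  ['abelian group', 'field'],    # currently indexed (example data)
);
print "added: @{ $diff->{added} }\n";
print "removed: @{ $diff->{removed} }\n";
```

Only the added and removed entries would need to be written to the database; unchanged concepts can be left alone.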
METHODS
my $dispatcher = NNexus::Index::Dispatcher->new(domain=>$domain,db=>$db,verbosity=>0|1, start=>$url, dom=>$dom);
-
The object constructor prepares a domain crawler object ( NNexus::Index::ucfirst(lc($domain)) ) and requires a NNexus::DB object, $db, for database interactions.
The returned dispatcher object can be used to iteratively index the domain, via the index_step method.
The constructor accepts the following options:
- start - the initial URL, required for the first invocation
- dom - optional; provides a Mojo::DOM object for the current URL, instead of performing an HTTP GET to retrieve it
- verbosity - 0 for quiet, 1 for detailed progress messages
my $invalidated_URLs = $dispatcher->index_step(%options);
-
Performs an indexing step, which:
- dispatches a crawl request to the domain indexer
- computes a diff over the previously and currently indexed concepts for the given object/URL
- updates the database tables
- computes and returns an impact graph of previously linked objects (aka "invalidation")
Accepts no options; all customization is to be achieved through the "new" constructor.
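The invalidation step can be pictured with a small sketch. It assumes a lookup table from concept name to the URLs already linked to that concept; both the table and the helper urls_to_invalidate are hypothetical stand-ins, since the real dispatcher consults the NNexus database.

```perl
use strict;
use warnings;

# Hypothetical helper, not the NNexus implementation: collect the URLs
# affected by a set of changed concepts, without duplicates.
sub urls_to_invalidate {
  my ($changed_concepts, $linked_by) = @_;
  my (%seen, @urls);
  for my $concept (@$changed_concepts) {
    # A concept with no known links contributes nothing.
    for my $url (@{ $linked_by->{$concept} || [] }) {
      push @urls, $url unless $seen{$url}++;
    }
  }
  return \@urls;
}

# Example data only; example.org URLs are placeholders.
my %linked_by = (
  'ring'  => ['http://example.org/a', 'http://example.org/b'],
  'field' => ['http://example.org/b'],
);
my $invalidated = urls_to_invalidate(['ring', 'field', 'module'], \%linked_by);
print "$_\n" for @$invalidated;
```

Returning an array reference mirrors the synopsis, where each payload is dereferenced and appended to the running list of invalidated URLs.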
AUTHOR
Deyan Ginev <d.ginev@jacobs-university.de>
COPYRIGHT
Research software, produced as part of work done by the KWARC group at Jacobs University Bremen. Released under the MIT License (MIT).