NAME
OurNet::Query - Scriptable queries with template extraction
SYNOPSIS
use OurNet::Query;
# Set query parameters
my ($query, $hits) = ('autrijus', 10);
my @sites = ('google', 'google'); # XXX: write more templates!
my %found;
# Generate a new Query object
my $bot = OurNet::Query->new($query, $hits, @sites);
# Perform a query
my $found = $bot->begin(\&callback, 30); # Timeout after 30 seconds
print '*** ' . ($found ? $found : 'No') . ' match(es) found.';
sub callback {
    my %entry = @_;
    my $entry = \%entry;
    unless ($found{$entry{url}}) {
        print "*** [$entry->{title}]" .
              " ($entry->{score})" .
              " - [$entry->{id}]\n" .
              " URL: [$entry->{url}]\n";
    }
    $found{$entry{url}}++;
}
DESCRIPTION
This module provides an easy interface for performing multiple queries against Internet services and wrapping the results in your own format. The results are processed on the fly and returned via callback functions.
Its interface resembles that of WWW::Search, but is implemented in a different fashion. While WWW::Search relies on additional subclasses to parse returned results, OurNet::Query uses a site descriptor for each search engine, which makes it much easier to add new backends.
Site descriptors may be written in XML, Template Toolkit format, or the .fmt format from the commercial Inforia Quest product.
CAVEATS
The only confirmed working site descriptor is currently google.tt2. The majority of the *.xml descriptors are outdated and need volunteers to either correct them or convert them to .tt2 format.
This package is supposed to magically turn your web pages built with Template Toolkit into web services overnight, using diff-based induction heuristics; but this is not happening yet. Stay tuned.
There should be instructions on how to write templates in the various formats.
COMPONENTS
Most Query Toolkit components are independently useful; they rely on several front-end interfaces to glue themselves together.
Full-Text Search Engine (FuzzyIndex)
The indexing module MUST implement an indexing mechanism suitable for handling variable-byte character encodings, e.g. big-5 or utf8. Its index file SHOULD NOT require the original data to be present, nor exceed the original data size on average.
Interactive Queries (ChatBot)
The interactive query module MUST accept context-free queries against any indexed database generated by the Search Engine, and provide feedback based on the entries contained within. It MUST develop a heuristic to accumulate user input, and build connections between entries based on relevancy.
Template Extraction (Template::Extract)
This component MUST support the Template(3) Toolkit format, and MAY support additional template formats. It MUST be capable of taking a document and the original template used to generate it, and producing the original parameter list.
All simple assignment and loop directives MUST be supported; it SHOULD also accept nested loops and structure elements.
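The extraction requirement above can be illustrated with a minimal sketch in core Perl. It is not the component's actual implementation: the helper name extract_simple is hypothetical, and it assumes a template containing only simple [% name %] placeholders, with no loops or nested structures.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): recover the parameters
# that were substituted into a template of simple [% name %] placeholders.
sub extract_simple {
    my ($template, $document) = @_;

    # Splitting with a capture group yields: literal, name, literal, name, ...
    my @parts = split /\[%\s*(\w+)\s*%\]/, $template;

    my (@names, $pattern);
    $pattern = '';
    while (@parts) {
        # Literal template text must appear verbatim in the document.
        $pattern .= quotemeta shift @parts;
        last unless @parts;
        # Each placeholder becomes a non-greedy capture group.
        push @names, shift @parts;
        $pattern .= '(.*?)';
    }

    # Return nothing if the document was not produced by this template.
    my @values = $document =~ /\A$pattern\z/s or return;

    my %params;
    @params{@names} = @values;
    return \%params;
}

my $params = extract_simple(
    '<a href="[% url %]">[% title %]</a>',
    '<a href="http://example.org/">Example</a>',
);
print "$_ = $params->{$_}\n" for sort keys %$params;
```

The sketch turns the template into one anchored regular expression, so a document that does not match the literal parts of the template yields no parameters at all rather than a partial guess.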
Site Descriptors (Site)
This includes a collection of oft-used web sites, akin to the WWW::Search or Inforia Quest collection. It SHOULD also support basic validation and variable interpolation within the descriptors.
Template Generation (Template::Generate)
This module MUST be able to generate the original template, based on two or more distinct outputs. It SHOULD operate without any hints about the original structure, but MAY draw on such information to increase its accuracy.
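As a rough illustration of inducing a template from distinct outputs, the sketch below handles only the simplest case: two outputs that differ in a single region. The helper name induce_template and the common-prefix/common-suffix approach are assumptions made for the example, not the module's actual diff-based heuristic.

```perl
use strict;
use warnings;

# Hypothetical helper: induce a one-placeholder template from two outputs
# by keeping their common prefix and common suffix.
sub induce_template {
    my ($first, $second, $name) = @_;
    $name = 'var' unless defined $name;

    my $max = length($first) < length($second)
            ? length($first) : length($second);

    # Length of the longest common prefix.
    my $prefix = 0;
    $prefix++ while $prefix < $max
        && substr($first, $prefix, 1) eq substr($second, $prefix, 1);

    # Length of the longest common suffix that does not overlap the prefix.
    my $suffix = 0;
    $suffix++ while $suffix < $max - $prefix
        && substr($first, -$suffix - 1, 1) eq substr($second, -$suffix - 1, 1);

    # Everything between prefix and suffix is treated as the variable part.
    return substr($first, 0, $prefix)
         . "[% $name %]"
         . substr($first, length($first) - $suffix);
}

print induce_template(
    '<h1>Hello, Alice!</h1>',
    '<h1>Hello, Bob!</h1>',
    'name',
), "\n";
```

A real generator would need to cope with several variable regions and with repeated blocks; this sketch only shows why at least two distinct outputs are required to tell fixed text from parameters.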
Front-End Interface (bin/*)
All of the above components MUST come with at least one command-line utility, capable of exporting most of their functions to the normal user. The utilities SHOULD share a common look-and-feel.
Documentation (pod/*)
The Query Toolkit Manual MUST contain a tutorial, an overview of functions, and guides on how to embed Query components into existing programs.
MILESTONES
Milestone 0 - v1.56 - 2001/09/01
This milestone represents the raw, unconnected state of all tools. It provides all basic functionality except for template generation, yet offers only fzindex / fzquery as useful user-accessible interfaces.
    FuzzyIndex    big-5 & latin-1 support
    ChatBot       automatic building of default database
    T::Extract    template toolkit support; nested fetch
    Site          google (as proof-of-concept)
    bin/*         all above interfaces
    pod/*         overview of functions
Milestone 1 - v1.6 - 2001/10/15
This milestone aims to export a consistent interface to other developers, by populating the missing descriptors and documentation.
    FuzzyIndex    gb-1312 support
    Site          all major search engines and news sources
    T::Generate   simple diff-based heuristic framework
    bin/*         a parallel, configurable sitequery coupled with fzindex
    pod/*         embed-howto, including win32 COM+ port
Milestone 2 - v1.7 - 2002/01/01
This milestone will be the first feature-complete release of Query Toolkit, capable of being used in more diverse environments.
SEE ALSO
AUTHORS
Autrijus Tang <autrijus@autrijus.org>
COPYRIGHT
Copyright 2001 by Autrijus Tang <autrijus@autrijus.org>.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.