NAME

bin/wig.pl - gather indexes (centroids)

SYNOPSIS

bin/wig.pl [-d] spec_file

DESCRIPTION

The wig.pl program is used to gather WHOIS++ index and Common Indexing Protocol (CIP) centroids from remote servers. It is intended to be run either from the command line or, more likely, periodically from cron. It implements the protocol described in RFC 1913, and the client side of the Common Indexing Protocol. Please note that at the time of writing, CIP was still under development by the IETF's FIND working group. Please let us know if you find any interoperability problems.

The upshot is that wig.pl lets you configure your ROADS WHOIS++ server to grab the database indexes from other people's WHOIS++ and CIP-aware servers, e.g. CNIDR's Iknow and Bunyip's Digger. When a search performed on your server matches information in one or more of these indexes, the client is returned a "referral" to the relevant server or servers. The ROADS WWW-based WHOIS++ client, search.pl, automatically follows these referrals and searches the indexed WHOIS++ servers in addition to your own.

OPTIONS

-d

Enter debug mode (only of interest to developers and during debugging).

FILES

config/wig/* - index gatherer specification files

guts/wig/* - per-server centroids

Note that the name of each specification file in config/wig should be the same as the indexed server's WHOIS++ server handle, which is also the name of that server's subdirectory in guts/wig. For ROADS servers this is the "Serverhandle" parameter in lib/ROADS.pm. Each server you index must have a unique server handle.

FILE FORMATS

SPECIFICATION FILES

wig.pl is configured at run time by specifying the name of an indexing specification file. This filename is mandatory and is assumed to be a file within the config/wig directory. Each line in the specification file contains either a comment (indicated by a hash character at the start of the line) or a configuration directive, followed by a colon, whitespace and then the value for that directive. Valid directives are:

Host-Name

The hostname of the machine that is to be polled for a centroid. This directive is mandatory: every specification file must give the hostname of the remote server.

Host-Port

The port number of the remote server that is to be polled. By default this is assumed to be the same as the port number of the local ROADS WHOIS++ server.

Type-of-poll

The type of poll to perform. This can be either CENTROID or QUERY. By default it is CENTROID.

Poll-scope

For a QUERY type-of-poll, this directive specifies the WHOIS++ style search string to send to the remote server. For a CENTROID type-of-poll, it can take one of two values: FULL or RELATIVE. FULL means that the full centroid should be returned (still taking Start-Time and End-Time into account), whereas RELATIVE means that the returned centroid should contain only the changes since this index server's last poll. The default for a CENTROID type-of-poll is FULL.

Start-Time

The time before which we're not interested in changed centroid details. The default is empty (i.e. no constraint on the start time).

End-Time

The time after which we're not interested in changed centroid details. This directive and Start-Time allow a selective subset of the remote server's centroid to be returned, based on when the underlying data changed. The default is empty (i.e. no constraint on the end time).

Template

The name of the template from which the centroids should be generated, or the special value ALL. ALL means consider all templates on the remote server. The default is ALL.

Field

The list of names of fields that are of interest in the centroid, or the special value ALL. ALL means consider all fields within the specified template(s) when generating the centroid. The default value is ALL.

Hierarchy

Specifies this machine's relation to the remote server. This directive can take one of three values: Topology, Geographical or Administrative (note that these are case sensitive). Topology means that this index server is indexing the remote server because of its place in the network topology, Geographical means that it is indexing the remote server because of their respective geographical locations and Administrative means that the indexing is taking place because of an administrative decision. The default value is Administrative.

Description

A free text description of this index gatherer (or its related WHOIS++ server that makes use of the centroids it gathers) which the remote server can use when asked to describe the servers that index it. There is no default value for this directive.

Authentication-Type

This directive specifies the type of authentication to supply to the remote server. Common values are NONE (for no authentication) and Password (for a simple plaintext password exchange). RFC 1913 does not specify any others but any value that is understood by the remote server can be entered in this directive. There is no default value for this directive.

Authentication-Data

This directive's value is used in conjunction with the Authentication-Type directive to pass the actual password, key or other data required for this index server to be authenticated to the remote server. There is no default value for this directive.

CIP-v3

The presence of this directive (its value doesn't actually matter) indicates that the remote server should be polled using the Common Indexing Protocol, rather than the standard WHOIS++ centroids mechanism.

Index-Type

Sets the CIP index type - by default we use the tagged index object, "application/index.obj.tagged".

DSI

For CIP polls, this corresponds to the Data Set Identifier of the server being polled. For ROADS we construct these by appending the (remote!) server's IP address and port number to the Loughborough University Department of Computer Studies enterprise identifier. For the SOSIG example below, this gives a DSI such as:

1.3.6.1.4.1.1828.10.198.168.254.252.8237
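The DSI construction described above can be sketched in Python as follows. This is a hypothetical helper, not code taken from wig.pl: the prefix 1.3.6.1.4.1.1828.10 is simply read off the example DSI (this page does not say what the trailing ".10" denotes), and only IPv4 addresses are handled.

```python
import ipaddress

# Assumption: the fixed prefix is everything before the IP address in the
# example DSI shown in this manual page.
DSI_PREFIX = "1.3.6.1.4.1.1828.10"


def make_dsi(ip_address: str, port: int, prefix: str = DSI_PREFIX) -> str:
    """Build a ROADS-style CIP Data Set Identifier for a remote server."""
    # IPv4Address raises a ValueError subclass for malformed addresses.
    addr = ipaddress.IPv4Address(ip_address)
    if not 0 < port < 65536:
        raise ValueError(f"invalid port number: {port}")
    return f"{prefix}.{addr}.{port}"


if __name__ == "__main__":
    # Reproduces the example DSI above.
    print(make_dsi("198.168.254.252", 8237))
```

Note that the DSI describes the remote server being polled, so the address and port passed in are those of the remote server, not of the machine running wig.pl.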

CENTROIDS

The output of the wig.pl program is held in the guts/wig directory. In this directory a subdirectory named after the remote server's handle will be generated. In the subdirectory, an index file generated from the returned centroid(s) will be created, along with a DBM database file used to rapidly locate items within the file. The format of each line of the index file is:

template:oldfield:term

The DBM file is keyed on the terms and the associated values are a list of offsets into the main index file that match that term. The DBM file must be regenerated every time the main index file is changed.
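The following Python sketch shows one way such a term-to-offsets DBM file could be rebuilt and queried. It is an illustration under assumptions, not the code or on-disk format used by wig.pl: the function names are hypothetical, offsets are stored as space-separated decimal byte positions, and Python's standard dbm module stands in for whichever DBM library your Perl installation uses.

```python
import dbm


def build_term_index(index_path: str, dbm_path: str) -> None:
    """Rebuild the term -> offsets DBM file from a template:field:term index."""
    offsets = {}  # term -> list of byte offsets of the lines containing it
    with open(index_path, "rb") as index:
        while True:
            pos = index.tell()
            line = index.readline()
            if not line:
                break
            # maxsplit=2 keeps any ':' inside the term itself
            parts = line.rstrip(b"\r\n").decode("utf-8").split(":", 2)
            if len(parts) == 3:
                offsets.setdefault(parts[2], []).append(pos)
    # Flag "n" always starts from an empty database, so terms that have
    # disappeared from the index do not linger after a rebuild.
    with dbm.open(dbm_path, "n") as db:
        for term, where in offsets.items():
            db[term] = " ".join(str(p) for p in where)


def lookup_term(index_path: str, dbm_path: str, term: str) -> list:
    """Return the index lines whose term matches exactly."""
    with dbm.open(dbm_path, "r") as db:
        key = term.encode("utf-8")
        if key not in db:
            return []
        where = [int(p) for p in db[key].split()]
    lines = []
    with open(index_path, "rb") as index:
        for pos in where:
            index.seek(pos)  # jump straight to the matching line
            lines.append(index.readline().rstrip(b"\r\n").decode("utf-8"))
    return lines
```

Because the DBM file is opened with the "n" flag, each call to the hypothetical build_term_index() starts from scratch, which matches the requirement that the DBM file be regenerated whenever the main index file changes.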

EXAMPLE

To cross-search the WHOIS++ server running on sosig.ac.uk, the Social Science Information Gateway at the University of Bristol, you would create the file config/wig/sosigacuk01. As a bare minimum, this file would need to contain the host name of the server to contact, but in practice you will probably want to include the following:

Host-Name: sosig.ac.uk
Host-Port: 8237
Description: Muppet Gateway; let's put on makeup and light up lights.
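As a rough illustration of the file format and defaults described under SPECIFICATION FILES, here is a minimal Python parser that would accept a file like the one above. It is a sketch, not wig.pl's own parser: it assumes directive names are matched exactly as spelled in this page, tolerates blank lines, and takes the local server's port as a parameter (local_port, a hypothetical name) because that value comes from your ROADS configuration.

```python
# Defaults documented for each directive under SPECIFICATION FILES.
DEFAULTS = {
    "Type-of-poll": "CENTROID",
    "Start-Time": "",
    "End-Time": "",
    "Template": "ALL",
    "Field": "ALL",
    "Hierarchy": "Administrative",
    "Index-Type": "application/index.obj.tagged",
}
HIERARCHIES = {"Topology", "Geographical", "Administrative"}  # case sensitive


def parse_spec(text: str, local_port: int) -> dict:
    """Parse a wig specification file and fill in the documented defaults."""
    spec = {}
    for lineno, line in enumerate(text.splitlines(), 1):
        if not line.strip() or line.startswith("#"):
            continue  # blank line or comment
        name, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            raise ValueError(f"line {lineno}: expected 'Directive: value'")
        spec[name.strip()] = value.strip()
    if not spec.get("Host-Name"):
        raise ValueError("Host-Name is mandatory")
    for key, default in DEFAULTS.items():
        spec.setdefault(key, default)
    # Host-Port defaults to the port of the local ROADS WHOIS++ server.
    spec.setdefault("Host-Port", str(local_port))
    if spec["Type-of-poll"] == "CENTROID":
        spec.setdefault("Poll-scope", "FULL")
    if spec["Hierarchy"] not in HIERARCHIES:
        raise ValueError(f"invalid Hierarchy: {spec['Hierarchy']!r}")
    return spec
```

For the SOSIG file above, parse_spec() would keep Host-Port 8237 from the file and fill in CENTROID, FULL, ALL and Administrative for the directives the file omits.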

It's typically necessary for you to contact the remote server's administrator at this stage, because most WHOIS++ implementations will only let you index a server if its administrator has given you permission. The ROADS WHOIS++ server uses an access control list based on the file config/hostsallow, and comes with some default settings which let the ROADS developers index your server. To add a new machine, we recommend that you put both its domain name and IP address into config/hostsallow, e.g.

bork.swedish-chef.org: poll
198.168.254.252: poll
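To illustrate why listing both forms can help, here is a hypothetical Python sketch of an access check over entries in the "host: permission" form shown above. It is not the ROADS server's actual code: it assumes one entry per line, IPv4 addresses, case-insensitive host names, and a check that succeeds if either the connecting peer's IP address or its (reverse-resolved) host name carries the poll permission. Under those assumptions, the IP address entry still matches when reverse DNS lookup fails, and the name entry still matches if the machine's address changes.

```python
def load_hostsallow(text: str) -> dict:
    """Map each host name or IP address to its set of permissions."""
    acl = {}
    for line in text.splitlines():
        host, sep, perm = line.partition(":")
        if sep and host.strip():
            acl.setdefault(host.strip().lower(), set()).add(perm.strip().lower())
    return acl


def may_poll(acl: dict, peer_ip: str, peer_name=None) -> bool:
    """True if either the peer's address or its host name may poll."""
    for key in (peer_ip, peer_name):
        if key and "poll" in acl.get(key.lower(), set()):
            return True
    return False
```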

Once this has been done, the ROADS WHOIS++ server will automatically allow the machine doing the indexing to "poll" it for centroids. Now all you need to do at the local end is run wig.pl, e.g.

bin/wig.pl sosigacuk01

If indexing is successful, subsequent searches of your server will also search the centroid from SOSIG, and return referrals for any matches in it.

SEE ALSO

wppd.pl

BUGS

If you want to set up an index server which has no local data of its own, you'll still need to build the main ROADS index, e.g. with bin/mkinv.pl. It's debatable whether this is a bug or a feature!

COPYRIGHT

Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.

AUTHORS

Jon Knight <jon@net.lut.ac.uk>, Martin Hamilton <martinh@gnu.org>