NAME
bin/z3950_centroid.pl - extract centroid from NWI/EWI objects
SYNOPSIS
bin/z3950_centroid.pl [-d] [-h hashtemp1] [-H hashtemp2]
[-s serverhandle] < filename
DESCRIPTION
This Perl program creates a WHOIS++ compatible centroid from the attributes and values in a collection of NWI/EWI index objects, as created by the Combine harvester. Supply a server handle with -s when invoking this program; otherwise the default value of 'undefined' is used.
The Combine harvester creates its database in a two-level directory hierarchy, with a separate file for each indexed object. You can concatenate these files and feed them into this program with a simple find invocation:
find HDB/hdb -type f -exec cat {} \; | z3950_centroid.pl -s test01
Or perhaps something more complicated!
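As one example of a slightly more involved variant, the sketch below wraps the find call in a shell function. The function name collect_objects is hypothetical and not part of the distribution. It uses the POSIX "-exec ... {} +" form, which hands many files to each cat invocation instead of starting one cat per object. It assumes the order in which objects reach the program does not matter.

```shell
#!/bin/sh
# Sketch only: collect_objects is a hypothetical helper, not part of ROADS.
# It writes the contents of every regular file below the given directory
# to standard output, in whatever order find happens to visit them.
collect_objects() {
    # $1: root of the Combine database hierarchy, e.g. HDB/hdb
    find "$1" -type f -exec cat {} +
}

# Possible usage, with the paths taken from the example above:
#   collect_objects HDB/hdb | z3950_centroid.pl -s test01
```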
OPTIONS
- -d
Turn on debugging output. This is very verbose.
- -h hashtemp1
Filename of the temporary DB hash database used while constructing the centroid. Defaults to hashtemp1. It holds a list of the document titles being indexed.
- -H hashtemp2
Filename of the temporary DB hash database used while constructing the centroid. Defaults to hashtemp2. It holds a list of the terms in the document text being indexed.
- -s serverhandle
Server handle to use for the centroid. Defaults to 'undefined'.
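Because both temporary databases default to fixed names, two runs started in the same directory would share them. The sketch below shows one way to give each run its own scratch directory through -h and -H. The function run_centroid and the CENTROID_CMD variable are hypothetical names introduced for illustration. The sketch assumes that mktemp -d is available (it is widespread but not required by POSIX) and that the temporary databases can be discarded once the program has exited.

```shell
#!/bin/sh
# Sketch only: run_centroid and CENTROID_CMD are hypothetical names.
# Reads index objects on standard input and writes the centroid to
# standard output, keeping both temporary databases in a private directory.
run_centroid() {
    # $1: server handle to pass with -s
    workdir=$(mktemp -d) || return 1
    "${CENTROID_CMD:-bin/z3950_centroid.pl}" \
        -h "$workdir/hashtemp1" \
        -H "$workdir/hashtemp2" \
        -s "$1"
    status=$?
    # Remove the whole directory, so it does not matter how many files
    # the DB library created for each database.
    rm -rf "$workdir"
    return "$status"
}

# Possible usage:
#   find HDB/hdb -type f -exec cat {} \; | run_centroid test01
```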
BUGS
We could traverse the filesystem and look at the timestamps on the index objects - this would let us do a relative centroid.
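Until something like that exists, a rough approximation can be made outside the program by letting find compare timestamps. The sketch below selects only the objects modified more recently than a stamp file. The function name objects_newer_than and the stamp file are hypothetical. Note the limits of this approach: the program still emits an ordinary centroid covering just the selected objects, and objects deleted since the last run are not detected.

```shell
#!/bin/sh
# Sketch only: objects_newer_than is a hypothetical helper.
# It writes the contents of every regular file below the given directory
# whose modification time is later than that of the stamp file.
objects_newer_than() {
    # $1: root of the Combine database hierarchy, e.g. HDB/hdb
    # $2: stamp file whose modification time marks the previous run
    find "$1" -type f -newer "$2" -exec cat {} +
}

# Possible usage, with last_run.stamp as an assumed file name:
#   objects_newer_than HDB/hdb last_run.stamp | z3950_centroid.pl -s test01
#   touch last_run.stamp
```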
We don't do anything special about character sets/encodings.
Not up to date with current CIP specifications - this is really intended for use with a WHOIS++ server which speaks the old RFC 1913 indexing protocol.
SEE ALSO
"harvest_centroid.pl" in bin, RFC 1913
COPYRIGHT
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
AUTHOR
Martin Hamilton <martinh@gnu.org>