NAME
ZOOM::IRSpy::WebService - Accessing the IRSpy database as a Web Service
INTRODUCTION
Because IRSpy keeps its information about targets as ZeeRex records in a Zebra database, that information is available via the SRU and SRW web services. These two services are very closely related: the former REST-like, based on HTTP GET URLs, and the latter SOAP-based. Both use the same query language (CQL) and the same XML-based result formats.
(In addition, Zebra provides ANSI/NISO Z39.50 services, but these are not further discussed here.)
EXAMPLE
Here is an example SRU URL that accesses the IRSpy database of the live system (although it will not be accessible to most clients due to firewall issues). It is broken across lines for clarity:
http://irspy.indexdata.com:8018/IR-Explain---1?
version=1.1&
operation=searchRetrieve&
query=net.port=3950&
maximumRecords=10&
recordSchema=zeerex
It is beyond the scope of this document to provide a full SRU tutorial, but briefly, the URL above consists of the following parts:
- http://irspy.indexdata.com:8018
-
The base-URL of the SRU server.
- IR-Explain---1
-
The name of the SRU database.
- version=1.1, operation=searchRetrieve, etc.
-
SRU parameters specifying the operation requested.
The parameters are as follows:
- version=1.1
-
Mandatory - SRU requests must contain an explicit version identifier, and Zebra supports only version 1.1.
- operation=searchRetrieve
-
Mandatory - SRU requests must contain an operation. Zebra supports several, as discussed below.
- query=net.port=3950
-
When the operation is searchRetrieve, a query must be specified. The query is always expressed in CQL (Common Query Language), which Zebra's IRSpy database supports as described below.
- maximumRecords=10
-
Optional. Specifies how many records to include in a search response. When omitted, defaults to zero: the response includes a hit-count but no records.
- recordSchema=zeerex
-
Optional. Specifies what format the included XML records, if any, should be in. If omitted, defaults to "dc" (Dublin Core). Zebra's IRSpy database supports several schemas as described below.
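As a minimal sketch of how the parts described above fit together, the following Python assembles a searchRetrieve URL from a base URL, a database name and the SRU parameters. The function name build_search_url and its argument names are hypothetical, chosen only for this illustration; they are not part of IRSpy or Zebra. Note that the query value is percent-encoded, so the "=" inside the CQL query appears as %3D.

```python
from urllib.parse import urlencode


def build_search_url(base_url, database, query,
                     maximum_records=None, record_schema=None):
    """Assemble an SRU 1.1 searchRetrieve URL (illustrative helper only)."""
    # version and operation are mandatory; query is required for searchRetrieve.
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": query,
    }
    # The optional parameters are sent only when the caller supplies them,
    # so the server-side defaults described above apply otherwise.
    if maximum_records is not None:
        params["maximumRecords"] = str(maximum_records)
    if record_schema is not None:
        params["recordSchema"] = record_schema
    return f"{base_url}/{database}?{urlencode(params)}"


url = build_search_url("http://irspy.indexdata.com:8018", "IR-Explain---1",
                       "net.port=3950", maximum_records=10,
                       record_schema="zeerex")
print(url)
```

Omitting maximum_records and record_schema yields a URL that relies on the defaults: a hit-count with no records.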
SUPPORT
SUPPORTED OPERATIONS
Zebra supports the following SRU operations:
- explain
-
This operation requires no further parameters, and returns a ZeeRex record describing the IRSpy database itself.
- searchRetrieve
-
This is the principal operation of SRU, combining searching of the database and retrieval of the records that are found. Its behaviour is specified primarily by the query parameter, support for which is described below, but also by startRecord, maximumRecords and recordSchema.
- scan
-
This operation scans an index of the database and returns a list of candidate search terms for that index, including hit-counts. Its behaviour is specified primarily by the scanClause parameter, but also by maximumTerms and responsePosition.
Here is an example SRU Scan URL:
http://irspy.indexdata.com:8018/IR-Explain---1?
version=1.1&
operation=scan&
scanClause=dc.title=fish
This lists all words occurring in titles, in alphabetical order, beginning with "fish" or, if that word does not occur in any title, the word that immediately follows it alphabetically.
The scanClause parameter is a tiny query, consisting only of an index name, a relation (usually "=") and a term. The supported index names are the same as those listed below.
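The scan operation can be sketched in Python as follows: one function builds the scan URL and another pulls (term, hit-count) pairs out of a scan response. The function names are hypothetical. The parsing assumes the response follows the standard SRU 1.1 layout, with term elements containing value and numberOfRecords in the http://www.loc.gov/zing/srw/ namespace; this should be checked against what the server actually returns.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: SRU 1.1 responses use this namespace.
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}


def build_scan_url(base_url, database, scan_clause,
                   maximum_terms=None, response_position=None):
    """Assemble an SRU 1.1 scan URL (illustrative helper only)."""
    params = {"version": "1.1", "operation": "scan",
              "scanClause": scan_clause}
    if maximum_terms is not None:
        params["maximumTerms"] = str(maximum_terms)
    if response_position is not None:
        params["responsePosition"] = str(response_position)
    return f"{base_url}/{database}?{urlencode(params)}"


def parse_scan_terms(xml_text):
    """Return a list of (term, hit_count) pairs from a scan response."""
    root = ET.fromstring(xml_text)
    terms = []
    for term in root.iterfind(".//srw:term", SRW_NS):
        value = term.findtext("srw:value", default="", namespaces=SRW_NS)
        count = term.findtext("srw:numberOfRecords", default="0",
                              namespaces=SRW_NS)
        terms.append((value, int(count)))
    return terms


print(build_scan_url("http://irspy.indexdata.com:8018", "IR-Explain---1",
                     "dc.title=fish", maximum_terms=20))
```

The text of a response fetched from the scan URL would then be passed to parse_scan_terms.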
CQL SUPPORT
The following CQL context sets are supported, and are recognised in queries by the specified prefixes:
- cql
-
The CQL context set. http://www.loc.gov/standards/sru/cql/cql-context-set.html
- rec
-
The Record Metadata context set. http://srw.cheshire3.org/contextSets/rec/1.1/
- net
-
The Network context set. http://srw.cheshire3.org/contextSets/net/
- dc
-
The Dublin Core context set. http://www.loc.gov/standards/sru/cql/dc-context-set.html
- zeerex
-
The ZeeRex context set. http://srw.cheshire3.org/contextSets/ZeeRex/
Within those sets, the following indexes are supported:
- cql.anywhere
- cql.allRecords
- rec.id
- net.protocol
- net.version
- net.method
- net.host
- net.port
- net.path
- dc.title
- dc.creator
- zeerex.numberOfRecords
- zeerex.set
- zeerex.index
- zeerex.attributeType
- zeerex.attributeValue
- zeerex.schema
- zeerex.recordSyntax
- zeerex.supports_relation
- zeerex.supports_relationModifier
- zeerex.supports_maskingCharacter
- zeerex.default_contextSet
- zeerex.default_index
These indexes may in general be used with all the relations <, <=, =, >=, >, <> and exact, although of course not all combinations of index and relation make sense. The masking characters * and ? may be used in all appropriate circumstances, as may the word-anchoring character ^.
Finally, sorting criteria may be specified within the query itself. Since YAZ's CQL parser does not yet implement the recently approved CQL 1.2 sorting extension described at http://zing.z3950.org/cql/sorting.html, a different scheme is used, involving the special relation modifiers sort, sort-desc and numeric.
When a search-term that carries either the sort or sort-desc relation-modifier is or'd with a query, the results of that query are sorted according to the value associated with the specified index - for example, sorted by title if the query is or'd with dc.title=/sort 0. In such sort-specification query terms, the term itself (0 in this example) is the precedence of the sort-key, with zero being highest. Further less significant sort keys may also be specified, using higher-valued terms. By default, sorting is lexicographical (alphabetical); however, if the additional relation modifier numeric is also specified, then numeric sorting is used.
For example, the query:
net.host = *.edu and dc.title=^a* or net.port=/sort/numeric 0
Finds records describing services hosted in the .edu domain and whose titles' first words begin with the letter a, and sorts the results in numeric order of the port number that they run on. And the query:
net.host = *.edu or net.port=/sort/numeric 0 or net.path=/sort-desc 1
Sorts all the .edu-hosted services numerically by port; and further sorts each equivalence class of services running on the same port alphabetically, descending, by database name.
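The sort-key convention described above is mechanical enough to generate. The following Python sketch appends sort-specification terms to an existing CQL query, numbering them from zero in decreasing order of significance. The helper name with_sort_keys and the (index, descending, numeric) tuple layout are assumptions made for this illustration only.

```python
def with_sort_keys(query, sort_keys):
    """Append IRSpy-style sort terms to a CQL query (illustrative helper).

    sort_keys is a list of (index, descending, numeric) tuples, most
    significant key first; list position becomes the precedence term.
    """
    clauses = [query]
    for precedence, (index, descending, numeric) in enumerate(sort_keys):
        # sort or sort-desc selects the direction ...
        modifiers = "/sort-desc" if descending else "/sort"
        # ... and numeric switches from lexicographical to numeric order.
        if numeric:
            modifiers += "/numeric"
        clauses.append(f"{index}={modifiers} {precedence}")
    # Each sort term is or'd onto the query.
    return " or ".join(clauses)


print(with_sort_keys("net.host = *.edu",
                     [("net.port", False, True),
                      ("net.path", True, False)]))
```

With these arguments the helper reproduces the second example query above: a numeric sort on net.port with precedence 0, followed by a descending sort on net.path with precedence 1.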
RECORD SCHEMAS
The IRSpy Zebra database supports record retrieval using the following schemas:
- dc
-
Dublin Core records (title, creator, description, etc.)
- zeerex
-
ZeeRex records, the definitive version of the information that drives the database. These records use an extended version of the ZeeRex 2.0 schema that also includes an <irspy:status> element at the end of the record.
- index
-
An XML format that prescribes how the record is indexed for searching. This is useful for debugging, but not likely to be very exciting for casual passers-by.
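Whichever schema is requested, the records arrive wrapped in an SRU response envelope. As a rough sketch, the Python below extracts the hit-count and, for each returned record, its schema name and payload element. It assumes the standard SRU 1.1 envelope (numberOfRecords, records, record, recordSchema and recordData in the http://www.loc.gov/zing/srw/ namespace). The SAMPLE document is hand-written for illustration and is not real IRSpy output; the function name is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: SRU 1.1 responses use this namespace.
SRW_NS = {"srw": "http://www.loc.gov/zing/srw/"}

# Hand-written sample envelope for illustration only; not real server output.
SAMPLE = """<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>2</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>dc</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData><placeholder/></srw:recordData>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""


def summarize_response(xml_text):
    """Return (hit_count, [(schema, payload_element), ...])."""
    root = ET.fromstring(xml_text)
    hits = int(root.findtext("srw:numberOfRecords", default="0",
                             namespaces=SRW_NS))
    records = []
    for record in root.iterfind("srw:records/srw:record", SRW_NS):
        schema = record.findtext("srw:recordSchema", default="",
                                 namespaces=SRW_NS)
        data = record.find("srw:recordData", SRW_NS)
        # The payload is the first child of recordData, if there is one.
        payload = data[0] if data is not None and len(data) else None
        records.append((schema, payload))
    return hits, records


hits, records = summarize_response(SAMPLE)
print(hits, [schema for schema, _ in records])
```

The hit-count can exceed the number of records returned, since maximumRecords limits only how many records are included in the response.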
SEE ALSO
ZOOM::IRSpy
The specifications for SRU (REST-like Web Service) at http://www.loc.gov/sru
The specifications for SRW (SOAP-based Web Service) at http://www.loc.gov/srw
The Z39.50 specifications at http://lcweb.loc.gov/z3950/agency/
The ZeeRex specifications at http://explain.z3950.org/
The Zebra database at http://indexdata.com/zebra
AUTHOR
Mike Taylor, <mike@indexdata.com>
COPYRIGHT AND LICENSE
Copyright (C) 2006 by Index Data ApS.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.7 or, at your option, any later version of Perl 5 you may have available.