NAME
Net::Z3950::DBIServer::Tutorial - The zSQLgate Tutorial
SYNOPSIS
server$ mysql
mysql> create database books;
mysql> use books;
mysql> GRANT ALL PRIVILEGES ON books.* TO ""@"localhost";
mysql> CREATE TABLE author(id INT, name TEXT);
mysql> INSERT INTO author(id, name) VALUES('1','Douglas Adams');
mysql> INSERT INTO author(id, name) VALUES('2','Stephen Notley');
mysql> INSERT INTO author(id, name) VALUES('3','Dave Barry');
server$ cat trivial.nzd
datasource = "DBI:mysql:dbname=books"
database "artists" {
table = "author"
search attrset bib1 {
access 12 = "id"
access 1 = "name"
}
data format xml {
record = "author"
field "authorId" = "id"
field "authorName" = "name"
}
}
server$ zSQLgate trivial.nzd @:3950
13:04:54-12/04 [server] Adding dynamic Z3950 listener on tcp:@:9999
13:04:54-12/04 [server] Starting server zSQLgate pid=28488
13:04:57-12/04 zSQLgate(1) [session] Starting session from tcp:127.0.0.1 (pid=28490)
...
client$ yaz-client @:3950
Z> base artists
Z> find @attr 1=12 @attr 2=1 3
Number of hits: 2
Z> format xml
Z> show 1+2
<author>
<authorId>1</authorId>
<authorName>Douglas Adams</authorName>
</author>
<author>
<authorId>2</authorId>
<authorName>Stephen Notley</authorName>
</author>
DESCRIPTION
This document is a tutorial for zSQLgate
, the Z39.50 server for relational databases.
If you don't know what zSQLgate
is, then you should start by reading Net::Z3950::DBIServer::Intro (the introduction to zSQLgate
) and Net::Z3950::DBIServer::LICENCE, (the licence under which it is available).
This tutorial informally walks through the main features of zSQLgate
, and for many people will be the only part of the zSQLgate
documentation that they ever read. That's fine - much of what you need to know will be here. But please be aware that some of the more esoteric features will not be discussed here: if you don't find what you want, you may need to dig into two more rigorous documents that are supplied with this tutorial:
Net::Z3950::DBIServer::Run describes the command-line options available to tailor the behaviour of
zSQLgate
at run-time.Net::Z3950::DBIServer::Spec describes the
zSQLgate
configuration file format in more detail than this document, explaining the precise meaning of its various directives.
In general, those two documents describe the sorts of things you can say to zSQLgate
and what they mean - but this one explains why you might want to say those things. The reference documents are there to answer what questions; this tutorial aims to tell you how.
A WALK THROUGH THE SYNOPSIS
The synopsis shows the process of creating a database, a zSQLgate
configuration file for that database, how to start a Z39.50 server for it, and a session with a Z39.50 client accessing it.
Let's look at the four sections separately. They will introduce us to pretty much all of the main zSQLgate
features: everything else is just refinement of these core facilities.
Database Creation
server$ mysql
mysql> create database books;
mysql> use books;
mysql> GRANT ALL PRIVILEGES ON books.* TO ""@"localhost";
mysql> CREATE TABLE author(id INT, name TEXT);
mysql> INSERT INTO author(id, name) VALUES('1','Douglas Adams');
mysql> INSERT INTO author(id, name) VALUES('2','Stephen Notley');
mysql> INSERT INTO author(id, name) VALUES('3','Dave Barry');
Here, we are creating a relational database. This is the one part of the whole process that zSQLgate
can't really help you with, because the details of how you create and populate a database are very much down to the specific RDBMS that you happen to be using. The commands above create a tiny but self-contained database called books
in the MySQL RDBMS: the actual SQL commands to do this will be very similar in most or all relational databases, but the front-end program will be different.
How do you choose which RDBMS to use? Well, for most projects, this will be a non-issue: if you're be putting a zSQLgate
front-end on an existing database, that choice will already have been made for you. (You also won't need to worry about building and populating the database!)
If you don't already have a database, and you're building one from scratch, then one good choice is PostgreSQL, which is included in recent Red Hat Linux distributions, along with support for using it in PHP3, Perl, Java (via JDBC), C (via native or ODBC interfaces), Perl, Python and Tcl. If you're not using Red Hat, you can get it from http://www.postgresql.org/ - but there are plenty of other good RDBMSs around, and I'm not really in a position to recommend one above another.
Anyway. You need to get a database from somewhere if you're going to build a Z39.50 interface to it.
zSQLgate Configuration
server$ cat trivial.nzd
datasource = "DBI:mysql:dbname=books"
database "artists" {
table = "author"
search attrset bib1 {
access 12 = "id"
access 1 = "name"
}
data format xml {
record = "author"
field "authorId" = "id"
field "authorName" = "name"
}
}
90% of the work of setting up a zSQLgate
server is writing the configuration file. Its job is to describe the relational database that is to be made available, couching its tables in terms of Z39.50 databases, and to express how the columns in those tables are mapped to access points (for searching) and data elements (for retrieval).
What we have here is just about the simplest possible complete zSQLgate
configuration file. Apart from specifying the datasource (roughly, the relational database), it provides mapping details for a single Z39.50 database (that is, in relational terms, a single table), to be known as artists
in the Z39.50 world. The details for that database consist of a statement of the name of the table that is to provide its records (author
), plus sections about how that database is to be searched, and how its data is to be formatted.
The search specification says that Z39.50 searching is done using the BIB-1 attribute set, and that two access-points are recognised: searches against access point 12 (``Local number'' in the BIB-1 attribute set) are mapped into SELECT
statements on the id
column, and those against access point 1 (``Personal name'') to the name
column.
The data specification says that zSQLgate
is to support record retrieval using the XML format, building records wrapped in an <artist
>...</artist
> pair, and that each record should contain the contents of the id
and name
columns, wrapped in <artistId
>...</artistId
> and <artistName
>...</artistName
> tag-pairs respectively.
See? It's easy!
Yes, it's possible to say much, much more in a configuration file; but you can start by writing a simple one, and add in the more sophisticated stuff as you need it, rather than needing to learn everything in one go.
Running the zSQLgate server
server$ zSQLgate trivial.nzd @:3950
13:04:54-12/04 [server] Adding dynamic Z3950 listener on tcp:@:9999
13:04:54-12/04 [server] Starting server zSQLgate pid=28488
13:04:57-12/04 zSQLgate(1) [session] Starting session from tcp:127.0.0.1 (pid=28490)
...
Once you've got your configuration file written, it's time to start the server. Generally, you'll need to give it two arguments: the name of the configuration file you want to use, and the ``listener address'' which specifies which port to receive incoming connections on. Generally, this should be @:
number, where number is the IP port number. This is conventionally 210 for Z39.50 servers, but you'll need to pick a number above 1024 if your server is not running as root, as the lower-number ports are considered to be ``privileged''.
If there's something wrong with your configuration file, the server will refuse to start up, giving a hopefully useful message to help you track down what you've done wrong. If all is well, it will go into a loop waiting for connections, fork()
ing for each incoming client and serving its requests in a subprocess. Unless you've turned logging off, you'll see startup messages similar to those above.
Connecting to the zSQLgate server
client$ yaz-client @:3950
Z> base artists
Z> find @attr 1=12 @attr 2=1 3
Number of hits: 2
Z> format xml
Z> show 1+2
<author>
<authorId>1</authorId>
<authorName>Douglas Adams</authorName>
</author>
<author>
<authorId>2</authorId>
<authorName>Stephen Notley</authorName>
</author>
Once your zSQLgate
server is up, you'll want to connect a Z39.50 client to it to check that it's doing what you intended. You can use whatever client you're used to - that's the point of an interoperable standard like Z39.50.
In this example, I'm using the simple command-line client provided by with the YAZ toolkit, because it's the one client you're guaranteed to have. (You wouldn't have been able to build zSQLgate
and the libraries it depends on without YAZ.) Start the client with a single command-line argument which specifies what host and port to connect to. (The hostname @
is shorthand for the same host the client's running on.)
Once the client is connected, you can do what you like. In the short session above, we've set the name of the database we want to use to artists
(which is the Z39.50 database name that we specified in the zSQLgate
configuration file), then fired off a search in the rather opaque but general Prefix Query Format. The search @attr 1=12 @attr 2=1 3
says to search for the term 3
with two attributes attached: the first, 1=12
, means to use access point 12 (which we defined to mean a SELECT
condition on the id
column); and the second, 2=1
is a relation attribute, meaning to search for values less than the specified one. Inside zSQLgate
, this search is translated into
SELECT * FROM author WHERE id < 3
Finally, having received a response to our search (which found two records, presumably with id
s 1 and 2), we set the desired retrieval format to XML, and fetch the records, which are rendered to the screen. (The YAZ client includes more protocol information than this in its output, but the substance of it is the two tiny XML records listed above.)
Conclusion
An ounce of example is worth a ton of explanation, isn't it?
BUILDING ON THE FIRST EXAMPLE
Comments
The zSQLgate
configuration file can contain comments, which are introduced by a hash character (#
), and run to the end of the line. They are ignored. Comments are the only part of the configuration file syntax that treats lines specially: otherwise the syntax is totally free-form.
For example, we could begin our configuration file like this:
# books.nzd - a sample configuration file for Net::Z39.50::DBIServer
# $Id: Tutorial.pm,v 1.20 2005-04-22 12:19:17 mike Exp $
#
# See Net::Z3950::DBIServer::Spec for a formal description of this
# file's format, if you need it. But it's actually pretty obvious.
(The Id line allows a version-control system such as CVS to insert information into the file, such as the date and time it was last checked in.)
Comments need not start at the beginning of a line: for example, we might want to add a note that the datasource is MySQL-specific, and perhaps to add the equivalent line for PostgreSQL, commented out:
datasource = "DBI:mysql:dbname=books" # If using MySQL
#datasource = "DBI:Pg:dbname=books" # If using PostgreSQL
Similarly, we might annotate the access-point lines with comments indicating the interpretation of the BIB-1 access points, since they are specified as rather opaque numbers:
access 12 = "id" # local number
access 1 = "name" # author
Authentication
If the relational database requires a username and password, these can be specified at the top level:
username = "simon" # If needed
password = "pieman" # If needed
Multiple Databases
A Z39.50 database corresponds roughly to a single table from a relational database. Within a relational database, a single zSQLgate
process can serve data from multiple tables, representing multiple Z39.50 databases. So let's add another table to the database, and add some records:
mysql> CREATE TABLE book(id INT, author_id INT, name TEXT,
year INT, notes TEXT, fulltext index (name));
mysql> INSERT INTO book(id,author_id,name,year)
VALUES('1','1','The Hitch Hiker''s Guide to the Galaxy','1979'); [etc.]
Now we can add a new database
section to the zSQLgate
configuration, indicating how to search in, and retrieve from, the new table:
database "works" {
table = "book"
search {
attrset bib1 {
access 12 = "id" # local number
access 4 = "name" # title
access 30 = "year" # year
}
}
data format xml {
record = "book"
field "bookId" = "id"
field "authorId" = "author_id"
field "authorName" = "author.name"
field "bookName" = "name"
field "bookYear" = "year"
field "bookNotes" = "?notes"
}
}
This works exactly the same way as the similar section for the "artists" database: Z39.50 searches against the BIB-1 access-points 12 (local number), 4 (title) and 30 (year) are implemented using the id
, name
and year
columns respectively. XML records are returned as a <book
> element containing <bookId
> derived from the id
column, etc.
Searching in Multiple Columns
The specification on the right-hand side of an access
declaration may be a comma-separated list of multiple columns to search when the specified access point is used:
access 1016 = "name,year,notes" # any
Searching with Multiple Attribute Sets
It is possible to support access points drawn from more than one attribute set. Often this is not necessary, as the BIB-1 attribute set is so nearly ubiquitous. However, supporting Z39.50 profiles developed in accordance with the Attribute Architecture, such as the Zthes and ZeeRex profiles, does require that attributes from multiple sets can be mixed - for example, the use of both a Dublin Core "title" and a Network Metadata "host" access points. Another use for multiple attribute sets is to support different access points that refer to the same columns - for example, BIB-1 "title" (4) and the attribute architecture cross-domain set's "title" (1).
In this case, simply provide multiple attrset
clauses within the database's search
section, enclosed in curly braces:
search {
attrset bib1 {
access 4 = "name" # BIB-1 title
access 30 = "year" # BIB-1 year
# ... etc. ...
}
attrset xd1 { # test for multiple attrsets
access 1 = "name" # AA cross-domain title
access 5 = "year" # AA cross-domain date
}
}
Then the following searches are equivalent: they are both title searches, using the attributes from the two different sets:
@attr 1=4 flower
@attr xd1 1=1 flower
Specifying Attribute Sets by OID
zSQLgate
knows the names of the most commonly used attribute sets; however, if it is necessary to support access points in an attribute set that it doesn't know - for example, a private one - that attribute set can be specfied by its OID rather than its name:
attrset 1.2.840.10003.3.1000.169.42 { # Private attribute-set
access 609 = "notes"
}
This can be useful for providing a way to search against a column that can't be mapped to any of the access points in the standard sets: for example, the notes
column of our books database.
Depending on what Z39.50 client you use, it should be possible to search using any attribute set by spelling out its OID in the query. For example, the YAZ command-line client lets you use:
@attr 1.2.840.10003.3.1000.169.42 1=609 guess
Specifying Default Attributes for Searching
It's possible to specify a set of default attributes that are used in every query unless overridden by attributes explicitly provided by the query itself. The attributes are listed after the defaultattrs
keyword within the search
section. This facility can be used to choose a default access point:
defaultattrs 1=4
Or default truncation:
defaultattrs 5=3
Or indeed both:
defaultattrs 1=4, 5=3
Note that default attributes specified in this manner are always taken from the BIB-1 attribute set. To make an access point in a different attribute set the default, make a "private" BIB-1 access point that searches in the same column, and make that the default.
Restricting Searches to a Subset of Available Records
it is sometime useful to hide some records from all searching - for example, records that have a "deleted" flag set, or that are marked as requiring revision before they can be published. This kind of restriction can be specified within a database
section, specifying an SQL clause to be ANDed with the query generated from what the Z39.50 client submits. For example:
restriction = "author_id != 1"
ensures that only records whose author_id
is different from 1 are included in the result sets returned to the client.
Linking Between Tables
We now approach the key to zSQLgate
's power, which is the ability to exploit relational links between tables (or, in Z39.50 terms, between databases). This is done using an auxiliary
clause within a database
sections, specifying what other table is related to the one in question, and by what relations. For example, in our toy database, every book
record has an author_id
which is the id
of a record in the author
table. This is expressed as follows:
database "works" {
table = "book"
auxiliary "author" with "author.id = book.author_id"
# ... etc. ...
This opens up the use of the auxiliary table's columns in both searching and retrieval. For example, it's possible to search the works
database by author name as follows:
access 1 = "author.name" # personal name
And to include the author's name in the retrieved records as follows:
field "authorName" = "author.name"
This only works correctly for many-to-one and one-to-many links. If an auxiliary table is specified that has many matches for the condition, then too many records will be generated - one for each linked record rather than one for each record in the primary table associated with the Z39.50 database.
Setting Attributes on XML Records
By default, the XML elements in the records generated by zSQLgate
are wrapped in a simple document element, whose name is specified by the record=
directive. Sometimes, though, it is useful to include attributes in this top-level element. This can be done using the attrs=
directive.
For example, the Guidelines for Implementing Dublin Core in XML - http://dublincore.org/documents/dc-xml-guidelines/index.shtml - specify that Dublin Core elements should be in the XML namespace http://purl.org/dc/elements/1.1/ . This namespace can be declared with a dc
prefix in the document element by specifying an xmlns:dc
attribute as follows:
attrs = "xmlns:dc='http://purl.org/dc/elements/1.1/'"
Transforming XML Records with XSLT
The facilities for constructing XML records in zSQLgate
allow only simple records to be created: a flat list of elements, each containing a field from the database, wrapped in a top-level element that may have some attributes specified. For many applications, this is sufficient. But more flexibility is required if it's necessary to construct records according to a pre-defined schema: for example, the RDF representation of Dublin Core records as described in Expressing Simple Dublin Core in RDF/XML - http://dublincore.org/documents/dcmes-xml/index.shtml - describe a structure in which the data elements are contained in a subelement of the document element, like this:
<rdf:RDF ...>
<rdf:Description ...>
<dc:title>Dave Beckett's Home Page</dc:title>
<dc:creator>Dave Beckett</dc:creator>
</rdf:Description>
</rdf:RDF>
In order to generate complex XML document such as this, zSQLgate allows an arbitrary XSLT stylesheet to be applied to the generated XML, and delivers the result of this transformation to the client. This functionality is requested using the transform
directive
transform = "dc-rdf.xsl"
within the format xml
section of the relevant database. The argument is the name of a file containing the stylesheet to be applied, interpreted relative to the working directory of the zSQLgate
server process.
Here is an example stylesheet, dc-rdf.xml
, which generates Dublin Core RDF/XML records from the raw XML records generated by the zsQLgate
core:
<?xml version="1.0"?>
<!-- $Id: Tutorial.pm,v 1.20 2005-04-22 12:19:17 mike Exp $ -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"
doctype-public="-//DUBLIN CORE//DCMES DTD 2002/07/31//EN"
doctype-system="http://dublincore.org/documents/2002/07/31/dcmes-xml/dcmes-xml-dtd.dtd"/>
<xsl:template match="/book">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description>
<dc:title><xsl:value-of select="bookName"/></dc:title>
<dc:creator><xsl:value-of select="authorName"/></dc:creator>
</rdf:Description>
</rdf:RDF>
</xsl:template>
</xsl:stylesheet>
Conditional Fields
"?notes"
Compound Fields
"%{field1} - %{field2}"
Complex String Constants
"foo" + "bar"
Generating MARC Records
format marc
Generating GRS-1 Records
format grs1
Generating Cutomised SUTRS Records
format sutrs
Tuning Cache Size
cachesize
PUTTING IT TOGETHER
Here, then, is the complete configuration file using all of the facilities we've discussed:
# books.nzd - a sample configuration file for Net::Z39.50::DBIServer
# $Id: Tutorial.pm,v 1.20 2005-04-22 12:19:17 mike Exp $
#
# See Net::Z3950::DBIServer::Spec for a formal description of this
# file's format, if you need it. But it's actually pretty obvious.
datasource = "DBI:mysql:dbname=books" # If using MySQL
#datasource = "DBI:Pg:dbname=books" # If using PostgreSQL
#username = "simon" # If needed
#password = "pieman" # If needed
database "artists" {
table = "author"
search attrset bib1 {
access 12 = "id" # local number
access 1 = "name" # author
}
data format xml {
record = "author"
field "authorId" = "id"
field "authorName" = "name"
}
}
database "works" {
table = "book"
auxiliary "author" with "author.id = book.author_id"
restriction = "author_id != 1" # Omit Douglas Adams books
search {
defaultattrs 1=4
attrset bib1 {
access 12 = "id" # local number
access 4 = fulltext "name" # title (fulltext for MySQL)
access 30 = "year" # year
access 1016 = "name,year,notes" # any
}
attrset xd1 { # test for multiple attrsets
access 1 = "name" # AA cross-domain title
access 5 = "year" # AA cross-domain date
}
attrset 1.2.840.10003.3.1000.169.42 { # Private attribute-set
access 609 = "notes"
}
}
data {
cachesize = 2
format grs1 {
field (1,14) = "id" # tagSet-M local control number
field (3,"artistId") = "author_id"
field (2,1) = "name" # tagSet-G title
field (2,4) = "year" # tagSet-G date
field (3,"notes") = "?notes"
}
format xml {
record = "book"
#attrs = "xmlns='http://sql.z3950.org/schema/books/1.0'"
transform = "default.xsl"
field "bookId" = "id"
field "authorId" = "author_id"
field "authorName" = "author.name"
field "bookName" = "name"
field "bookYear" = "year"
field "bookNotes" = "?notes"
}
format marc {
field "001" = "id"
field "100/1$a" = "author.name"
field "245/1/0$a" = "name"
field "260$c" = "year"
field "500$a" = "*MARC record generated by zSQLgate from MySQL"
field "500$a" = "?notes"
}
format sutrs {
field "LocalIdentifier" = "id"
field "Title" = "name"
field "Author" = "author.name"
field "Published" = "year"
field "Notes" = "?notes"
field "titleStatement" = "<b>%{name}</b> -- <i>%{author.name}</i>"
}
}
}
AUTHOR
Mike Taylor <mike@miketaylor.org.uk>
First version Sunday 24th February 2002.