NAME
ROADS::HTMLOut - A class to dump out HTML in various forms
SYNOPSIS
use ROADS::HTMLOut;
print EditorViewSelection;
$postprocessed_strings = GenericSubs($string);
$dir = GetMessageDir($program, $view, $language, $charset);
InitLookup;
InitLang;
if (LangFileExists($program, $file, $language, $charset)) {...}
print ListMissingMandatory;
OutputHTML($program, $file, $language, $charset);
print SelectDatabases;
print SubjectListingSelection;
print TemplateTypeSelection;
print WhatsNewSelection;
DESCRIPTION
This class contains a number of methods for turning text containing ROADS specific pseudo-HTML tags into normal HTML using variable interpolation.
METHODS
print EditorViewSelection;
This method looks at the keys of the views hash array, and generates an HTML SELECT menu with an element for each of them.
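As a rough illustration of the idea, the following sketch builds a SELECT menu from the keys of a hash. The hash contents, the helper name select_menu and the form variable name "view" are all assumptions made for the example, not the module's actual code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the views hash: view name => definition file.
my %views = (
    Default => 'mktemp-views/default',
    Compact => 'mktemp-views/compact',
);

# Build a SELECT menu with one OPTION per item.
# The items are assumed to be plain text needing no HTML escaping.
sub select_menu {
    my ($name, @items) = @_;
    my $html = qq{<SELECT NAME="$name">\n};
    $html .= qq{<OPTION VALUE="$_">$_\n} for @items;
    $html .= "</SELECT>\n";
    return $html;
}

# Sort the keys so the menu order is stable between runs.
print select_menu('view', sort keys %views);
```

The same pattern would serve for the other selection methods described below, with a different list of items in each case.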
GenericSubs( string );
This method knows about a large number of generic substitutions which may be carried out on a string, typically involving replacing a "fake" HTML tag with the results of a variable interpolation. These are listed separately in the ROADS technical documentation.
GetMessageDir( program, view, language, charset );
This method tries to find the most appropriate HTML messages directory to use for a given combination of program name, rendering view, language and character set.
InitLookup;
This method seeds a hash array LanguageLookup with the available language details from the ROADS installation, typically config/languages. The array is keyed on the language code and character set, e.g. "en-gb-ISO-8859-1", and the value for a given element is a path relative to the ROADS config directory, or an absolute path. This path points to the outline HTML message files for a particular language and character set combination.
InitLang;
This method initializes the Language and CharSet variables (used to select outline HTML for rendering to the end user) based on the following algorithm :-
If the command line switch -L or -C is set, its value will
be used
... otherwise if the HTTP Accept-Language or Accept-Charset
header is set, its value will be used
... otherwise if the CGI variable Language or Charset is set,
its value will be used
... otherwise the default values of "en" and "iso-8859-1"
will be used
The tests for language and character set are actually independent, though we've grouped them together here for simplicity.
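The fallback order listed above can be sketched as follows. This is a minimal illustration, assuming the command line switches and CGI variables are already available in hashes; the helper names first_set and resolve_lang are hypothetical, and header values are used verbatim rather than parsed as preference lists.

```perl
use strict;
use warnings;

# Return the first candidate that is defined and non-empty.
sub first_set {
    for my $value (@_) {
        return $value if defined $value && length $value;
    }
    return;
}

# $opt: command line switches, $env: environment, $cgi: CGI form variables.
# Language and character set are resolved independently of each other.
sub resolve_lang {
    my ($opt, $env, $cgi) = @_;
    my $language = first_set($opt->{L}, $env->{HTTP_ACCEPT_LANGUAGE},
                             $cgi->{language}, 'en');
    my $charset  = first_set($opt->{C}, $env->{HTTP_ACCEPT_CHARSET},
                             $cgi->{charset}, 'iso-8859-1');
    return ($language, $charset);
}

# Example: -C was given on the command line, nothing else is known
# beyond whatever the environment happens to contain.
my ($language, $charset) = resolve_lang({ C => 'utf-8' }, \%ENV, {});
print "$language $charset\n";
```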
LangFileExists( program, file, language, charset );
This method tests for the existence of a message file for a particular program and language/character set combination.
print ListMissingMandatory;
This method returns a string containing an HTML list structure, each entry of which is one of the elements in the scalar array MissingMandatory. This is normally used by mktemp.pl, the ROADS template editor, to indicate fields which should have been filled in but weren't.
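A minimal sketch of turning such an array into an HTML list is shown here. The array contents and the helper name list_items are assumptions for the example; the real method may format its output differently.

```perl
use strict;
use warnings;

# Hypothetical contents; in ROADS the template editor fills this array.
my @MissingMandatory = ('Title', 'Handle');

# Return an unordered HTML list, or an empty string if there are no items.
sub list_items {
    my @items = @_;
    return '' unless @items;
    return "<UL>\n" . join('', map { "<LI>$_\n" } @items) . "</UL>\n";
}

print list_items(@MissingMandatory);
```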
OutputHTML( program, file, language, charset );
This method tries to send the message file named by file, with any variable substitutions which may be necessary, for the program named by program, in the requested language and charset if possible. We try to use HTTP content negotiation to control the directory which is searched for message files for a given language and charset combination.
Note that this method does not send the HTTP Content-type header. This is something that any code which calls it will have to do itself.
print SelectDatabases;
This method returns an HTML SELECT structure each element of which corresponds to a database configured in the ROADS installation.
print SubjectListingSelection;
This method returns an HTML SELECT structure each element of which corresponds to a subject listing view.
print TemplateTypeSelection;
This method returns an HTML SELECT structure each element of which corresponds to one of the available template types.
print WhatsNewSelection;
This method returns an HTML SELECT structure with an element for each of the available "What's New" views.
FILES
config/languages - specifies the directories where the HTML message files may be found for a particular language and character set/encoding combination.
config/multilingual/* - default location of outline HTML messages distributed with the ROADS software. Each program has its own sub-directory under this. Programs which support multiple "views" of a data set typically have a directory program-views.
HTML message files are formatted as normal HTML, with additional "pseudo-HTML" tags as described separately in the ROADS technical documentation.
PSEUDO-HTML TAGS
What's New listing programs
This tag is understood by bin/addwn.pl and bin/cullwn.pl:
- ADDEDTIME
-
Replaced by the time at which the template was last modified, found by doing a stat(2) of the file it lives in.
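In Perl the modification time is element 9 of the list returned by stat. A small sketch, with a hypothetical helper name and the running script standing in for a template file:

```perl
use strict;
use warnings;

# Return the last-modified time of a file as epoch seconds,
# or undef if the file cannot be examined.
sub modified_time {
    my ($path) = @_;
    my @st = stat($path) or return;
    return $st[9];    # element 9 of stat() is the mtime
}

# The running script stands in for a template file here.
my $mtime = modified_time($0);
print scalar(localtime($mtime)), "\n" if defined $mtime;
```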
Survey program
This tag is understood by cgi-bin/survey.pl, the user survey program.
- X-HANDLE
-
Replaced with a unique identifier generated from the current time and the process ID of the running CGI program.
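One way to combine the two values is sketched below. The "time-pid" layout and the helper name make_handle are assumptions; the format actually produced by survey.pl is not documented here.

```perl
use strict;
use warnings;

# Build an identifier from the current time and this process's ID.
# The "time-pid" layout is an assumption made for the example.
sub make_handle {
    return sprintf '%d-%d', time(), $$;
}

print make_handle(), "\n";
```

Since each CGI request normally runs in its own process, two requests arriving in the same second still get different identifiers.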
Generalized mechanism
The following tags are handled by the OutputHTML routine. This is quite flexible in terms of the directories it will look in for its HTML outlines - mainly because of the support we are adding for internationalisation.
OutputHTML is invoked with the name of a program and an associated message file, e.g. tempbyhand and nohandle.html. It then checks to see whether this file is
- available in the preferred language(s) and charset(s).
-
NB this part is still under development!
- in a sub-directory of the ROADS config directory,
-
This should be named the same as the name of the program, e.g. config/tempbyhand/nohandle.html.
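The search described above can be approximated by trying each candidate location in turn. This sketch assumes the caller already knows the directory for the preferred language and charset; the helper name find_message_file and its argument order are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first existing candidate path for a message file, or undef.
# $lang_dir is the directory for the preferred language and charset
# (may be undef); $config is the ROADS config directory.
sub find_message_file {
    my ($config, $lang_dir, $program, $file) = @_;
    my @candidates;
    push @candidates, File::Spec->catfile($lang_dir, $program, $file)
        if defined $lang_dir;
    push @candidates, File::Spec->catfile($config, $program, $file);
    for my $path (@candidates) {
        return $path if -f $path;
    }
    return;
}

my $path = find_message_file('config', 'config/multilingual/UK-English',
                             'tempbyhand', 'nohandle.html');
print defined $path ? "$path\n" : "no message file found\n";
```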
The tags we understand are:
- ADMINCGI
-
Replaced by \$ROADS::WWWAdminCgi.
- ADMIN-CGI
-
Replaced by \$ROADS::WWWAdminCgi.
- ALLTEMPLATETYPES
-
Replaced by a SELECT menu of all of the template types available, found by looking at the filenames in the \$ROADS::Config/outlines directory. An additional item, ALL, will be added and marked selected by default. See also TEMPLATETYPELIST.
- CHARSET
-
Replaced by a hidden field setting the value of the HTML form variable charset to the value of \$CharSet if present, i.e.
<INPUT TYPE="hidden" NAME="charset" VALUE="$CharSet">
- CGI-BIN
-
Replaced by \$ROADS::WWWCgiBin.
- DATABASES
-
Replaced by a SELECT menu of all of the databases which are known to the ROADS server - i.e. present in \$ROADS::Config/databases. In this context a database is essentially the combination of WHOIS++ server hostname, port number, and Destination attribute to search on. An extra entry, selected by default, will be added for ALL of the databases. See also REALDATABASES.
- HANDLE
-
Replaced by \$Handle if present.
- HTDOCS
-
Replaced by \$ROADS::WWWHtDocs.
- LANGUAGE
-
Replaced by a hidden field setting the value of the HTML form variable language to the value of \$Language if present, e.g.
<INPUT TYPE="hidden" NAME="language" VALUE="$Language">
- MATCHES
-
Replaced by \$matches if present.
- MISSINGMANDATORY
-
Replaced by a bullet-point list of the contents of the @MissingMandatory array - used by the template editor to signal mandatory attributes which have not been filled in.
- MKTEMP-ADDITIONAL
-
Replaced by \$additional if present.
- MKTEMP-DEFAULT
-
Replaced by \$default if present.
- MKTEMP-MODE
-
Replaced by \$CGIvar{mode} if present.
- MKTEMP-OP
-
Replaced by \$CGIvar{op} if present.
- MKTEMP-VIEW
-
Replaced by \$CGIvar{view} if present.
- MYURL
-
Replaced by \$myurl if present.
- NAME
-
Replaced by \$longname, the full name of this subject category.
- ORIGINALHANDLE
-
Replaced by \$CGIvar{originalhandle} if present.
- QUERY
-
Replaced by \$query if present.
- SELECTLISTING
-
Replaced by a SELECT menu of all the subject listing views which are known to the ROADS server - i.e. present in \$ROADS::Config/subject-listing/views.
- SELECTVIEW
-
Replaced by a SELECT menu of all of the template editor views for this particular template type which are known to the ROADS server - i.e. present in the appropriate file in \$ROADS::Config/mktemp-views/.
- SELECTWHATSNEW
-
Replaced by a SELECT menu of all the What's New views which are known to the ROADS server - i.e. present in \$ROADS::Config/whats-new/views.
- REALDATABASES
-
Replaced by a SELECT menu of all of the databases which are known to the ROADS server - i.e. present in \$ROADS::Config/databases. In this context a database is essentially the combination of WHOIS++ server hostname, port number, and Destination attribute to search on. See also DATABASES.
- ROADSDBADMINEMAIL
-
Replaced by \$ROADS::DBAdminEmail.
- ROADSSERVICENAME
-
Replaced by \$ROADS::ServiceName.
- ROADSSYSADMINEMAIL
-
Replaced by \$ROADS::SysAdminEmail.
- SCHEME
-
Replaced by \$scheme_name, the Subject-Descriptor scheme specified on the command line, or UDC if not present.
- TEMPLATETYPE
-
Replaced by \$CGIvar{templatetype} if present.
- TEMPLATETYPELIST
-
Replaced by a SELECT menu of all of the template types available, found by looking at the filenames in the \$ROADS::Config/outlines directory. An additional item, ALL, will be added and marked selected by default. See also ALLTEMPLATETYPES.
- THISPOSTFORM
-
Creates an HTML form using the POST method, with \$myurl as the action, i.e.
<FORM ACTION="$myurl" METHOD="POST">
Note that you must supply the closing
</FORM>
- THISGETFORM
-
Creates an HTML form using the GET method, with \$myurl as the action, i.e.
<FORM ACTION="$myurl" METHOD="GET">
Note that you must supply the closing
</FORM>
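The substitution mechanism can be pictured as a table mapping tag names to replacement text. The sketch below is an illustration only: it assumes the tags are written in angle brackets like ordinary HTML tags (e.g. <THISPOSTFORM>), and the variable values and the helper name substitute_tags are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical values standing in for the module's variables.
my $myurl   = '/cgi-bin/search.pl';
my $CharSet = 'iso-8859-1';

# Map tag name => code returning the replacement text.
my %tags = (
    'THISPOSTFORM' => sub { qq{<FORM ACTION="$myurl" METHOD="POST">} },
    'THISGETFORM'  => sub { qq{<FORM ACTION="$myurl" METHOD="GET">} },
    'CHARSET'      => sub { qq{<INPUT TYPE="hidden" NAME="charset" VALUE="$CharSet">} },
);

# Replace each known tag; anything not in the table is left untouched.
sub substitute_tags {
    my ($text) = @_;
    $text =~ s{<([A-Z][A-Z-]*)>}{ exists $tags{$1} ? $tags{$1}->() : "<$1>" }ge;
    return $text;
}

print substitute_tags(
    "<THISPOSTFORM>\n<CHARSET>\n<P>Query: <INPUT NAME=\"query\"></FORM>\n");
```

Note that the message file supplies the closing </FORM> itself, as required for THISPOSTFORM and THISGETFORM.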
INTERNATIONALIZATION (I18N)
The HTML output by the ROADS tools can be internationalized: a different set of HTML documents is sent back to the end user depending upon the language and character set in use. The language and character set can be specified by (in order of decreasing priority) browser HTTP headers, CGI parameters, command line options to the scripts and built in defaults. The CGI parameters are called language and charset, the HTTP headers are HTTP_ACCEPT_LANGUAGE and HTTP_ACCEPT_CHARSET and the options are usually -L and -C. Whilst older browsers rarely allowed the user to specify the HTTP headers, many of the newest browsers let the end user configure them easily through GUI control panels (see your browser's documentation for details; there are far too many browsers in use for us to describe each one).
The out-of-the-box defaults for ROADS are a language of "en" (International English) and a character set of "iso-8859-1" (ISO Latin 1 - Western European characters). The mapping between these parameters and the actual set of language pages is made using the \$ROADS::Config/languages file. This file looks something like this:
en-uk ISO-8859-1 multilingual/UK-English
en-gb ISO-8859-1 multilingual/UK-English
en-us ISO-8859-1 multilingual/UK-English
en ISO-8859-1 multilingual/UK-English
en iso-8859-1,*,utf-8 multilingual/UK-English
de ISO-8859-1 multilingual/Deutsch
Each line has a language, character set and path to a directory. The path can either be an absolute path to anywhere in the filesystems on the machine or a path relative to \$ROADS::Config (as shown in the default file above). Inside the directory, each ROADS program has its own subdirectory and it is within these subdirectories that the actual HTML is located. Currently ROADS is distributed with a full set of International English HTML files and a small demonstration subset for the mktemp.pl introduction FORM in German. Hopefully over time, contributed translations of the ROADS HTML will be made available.
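A file in this format could be read into a lookup hash along the following lines. This is a sketch under stated assumptions: fields are whitespace-separated, a comma-separated charset field yields one key per charset, keys take the "language-charset" form, and the config directory /usr/local/roads/config and the helper name parse_languages are hypothetical.

```perl
use strict;
use warnings;
use File::Spec;

# Parse languages-file text into a hash keyed on "language-charset".
# Relative paths are resolved against $config; absolute paths are kept.
sub parse_languages {
    my ($config, $text) = @_;
    my %lookup;
    for my $line (split /\n/, $text) {
        my ($language, $charsets, $path) = split ' ', $line;
        next unless defined $path;    # skip blank or short lines
        $path = File::Spec->catdir($config, $path)
            unless File::Spec->file_name_is_absolute($path);
        $lookup{"$language-$_"} = $path for split /,/, $charsets;
    }
    return %lookup;
}

my %LanguageLookup = parse_languages('/usr/local/roads/config', <<'END');
en-gb ISO-8859-1 multilingual/UK-English
en iso-8859-1,*,utf-8 multilingual/UK-English
de ISO-8859-1 multilingual/Deutsch
END

print "$_ => $LanguageLookup{$_}\n" for sort keys %LanguageLookup;
```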
The use of HTML FORMs within ROADS currently leads to some problems for internationalisation (I18N). Both the HTML 2.0 standard (RFC1866) and the W3C's HTML 3.2 Recommendation used coded character sets based on the ISO-8859-1 Latin-1 character set. This provides support for most Western European characters. The newer W3C HTML 4.0 recommendation is based upon Unicode and therefore allows a greater range of characters to be represented in HTML documents. It also provides support for specifying the language in use and the direction in which sections of the text should be rendered or read.
Until the development of HTML 4.0, all form data being submitted from web browsers to CGI programs had to consist of ASCII text. Even with HTML 4.0, CGI scripts using the GET method or scripts using the POST method with the widely used application/x-www-form-urlencoded MIME type can only receive ASCII text. Only FORMs using the POST method between HTML 4.0 compliant browsers where the enclosure type is something like multipart/form-data can be used to pass non-ASCII characters. Unfortunately, HTML 4.0 browsers that support these features are currently still quite rare and the HTML 4.0 specification was only released towards the end of the ROADS v2 development phase. It is hoped that ROADS v3 will be able to make use of these new features and by that time the bulk of the web browsers in use will also support them.
In the meantime, although the ROADS indexing software is capable of indexing characters from outside of the ASCII character set, it is very difficult for cataloguers and end users to enter multilingual strings. For this reason we encourage sites that do wish to provide a multilingual service to provide at least an English version of their data, and if possible a Romanized version of the native language form(s) of their data, so that existing browsers can search their databases.
BUGS
There is some confusion over variable scoping. It's also unclear whether programs should need to use the language and character set parameters (and initialize these themselves), or whether these should automatically be initialized to sensible values.
OutputHTML sends its output to the currently selected file descriptor. It might make more sense to have it return its output as a string or scalar array for further processing, e.g. by a user defined module.
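Until then, a caller can work around this by temporarily selecting an in-memory filehandle. The sketch below uses a stand-in OutputHTML so that it runs without ROADS installed; the helper name capture_output is hypothetical, and the sketch does not restore the previous handle if the called code dies.

```perl
use strict;
use warnings;

# Stand-in for ROADS::HTMLOut::OutputHTML so the sketch runs without ROADS;
# like the real routine, it prints to the currently selected filehandle.
sub OutputHTML {
    my ($program, $file, $language, $charset) = @_;
    print "<P>$program/$file ($language, $charset)</P>\n";
}

# Run a code reference with its output redirected to an in-memory buffer,
# then restore the previously selected filehandle.
sub capture_output {
    my ($code) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    my $previous = select($fh);
    $code->();
    select($previous);
    close($fh);
    return $buffer;
}

my $html = capture_output(
    sub { OutputHTML('tempbyhand', 'nohandle.html', 'en', 'iso-8859-1') });

# The caller sends the Content-type header itself, then the captured HTML.
print "Content-type: text/html\n\n", $html;
```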
SEE ALSO
admin-cgi and cgi-bin programs.
COPYRIGHT
Copyright (c) 1988, Martin Hamilton <martinh@gnu.org> and Jon Knight <jon@net.lut.ac.uk>. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
It was developed by the Department of Computer Studies at Loughborough University of Technology, as part of the ROADS project. ROADS is funded under the UK Electronic Libraries Programme (eLib), the European Commission Telematics for Research Programme, and the TERENA development programme.
AUTHOR
Jon Knight <jon@net.lut.ac.uk>, Martin Hamilton <martinh@gnu.org>