NAME
DATR2XML.pm - manipulate DATR .dtr, XML, HTML
SYNOPSIS
#! perl -w
use DATR2XML;
undef $DATR2XML::includeNodePath;
$datr -> set_stylesheet('D:/DATR/XSLT/datr.xsl');
$datr_eg1 = new DATR2XML('D:\DATR\perl\eg.dtr');
$datr_eg2 = new DATR2XML('D:/DATR/perl/eg.dtr', "on");
$datr_eg3 = new DATR2XML('http://somewhere/doc.dtr', "verbose");
viewAll $datr_eg1;
$datr_eg2 -> viewHeader;
$datr_eg3 -> printHeader;
printOpening $datr_eg3;
printNodes $datr_eg3;
printClosing $datr_eg3;
printAll $datr_eg3;
save $datr_eg3;
DATR2XML::convert('D:\DATR\XSLT\eg_opening.dtr');
DESCRIPTION
This module parses a DATR .dtr-formatted file into a Perl struct. The format is as defined in Gerald Gazdar's 'DATR By Example', published on the DATR web-pages at the University of Sussex <http://www.sussex.ac.uk/>.
Particular respect was paid to datanode31.html, though I confess the formal definitions found elsewhere on the site made no sense to me.
LOGGING
Process logging may be set to "off", to "on" (or "true"), or to "verbose".
REQUIRED MODULES
If internet access is required, the following modules must be installed and on the @INC path:
LWP::UserAgent
HTTP::Request
If no internet access is required, these modules will not be called.
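The conditional loading described above can be sketched as follows. This is an illustration only, not the module's own code: the subroutine names needs_network and fetch_lines are hypothetical, and the test for "internet access required" (an http or https prefix) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the location looks like an http(s) URI.
sub needs_network {
    my ($location) = @_;
    return $location =~ m{^https?://}i ? 1 : 0;
}

# Hypothetical loader: pulls in the LWP modules only when a URI is given,
# so a purely local run never needs them on the @INC path.
sub fetch_lines {
    my ($location) = @_;
    if (needs_network($location)) {
        require LWP::UserAgent;    # loaded at run time, on this branch only
        require HTTP::Request;
        my $ua  = LWP::UserAgent->new;
        my $res = $ua->request(HTTP::Request->new(GET => $location));
        die "Cannot fetch $location: ", $res->status_line, "\n"
            unless $res->is_success;
        return split /^/, $res->content;
    }
    open my $fh, '<', $location or die "Cannot read $location: $!\n";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}
```

Using require rather than use defers loading until the branch is actually reached, which is what lets a local-only run succeed without the network modules installed.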
DIAGNOSTICS
The usual warnings if it can't read or write.
EXPORTS
The module exports nothing to the calling namespace.
CAVEATS
The module does not fully support The DATR Standard Library RFC, Version 2.20. Specifically, it does not support the use of the proposed path cut operator as a full-stop within a path: all full stops are taken to signify the end of a clause.
TO DO
* Support The DATR Standard Library RFC, Version 2.20
* Change mechanism of _parseOpeningClosing to allow line-spanning of contents.
* Support interpolation of directives within body as specified by the style sheet.
* Fully support comment printing as specified by DATR XML DTD. Currently lumps all comments together.
GLOBAL VARIABLES
These variables adjust the output of the DTR parser. When they are undefined (using undef $DATR2XML::var or $DATR2XML::var = undef), they prevent the DTR parser from outputting any element which has a default value, as defined in the DATR DTD; when they are defined with any value, they force XML output in full.
- $printComments
Set with any value to print comments; undef it not to.
- $includeNodePath
The DTD provides the default path as a null path, but this can be adjusted by setting $includeSentenceType to 1. This can be reset by calling undef upon the variable. See also include_sentence_type.
- $includeSentenceType
The DATR DTD provides the default type as ==, and the type is still written out while this variable is set, which is its default state. See also include_sentence_type.
- $location_xsl
The path to the required XSLT stylesheet. The default is http://www.leegoddard.com/DATR/XSLT/datr.xsl. See also the method and procedure set_stylesheet.
- $location_dtd
The SYSTEM location of (that is, the path to) the DATR DTD. The default is http://www.leegoddard.com/DATR/DTD/DATR1.0.dtd. See also the method and procedure set_dtd.
- $datr_root
This is literally the root element as printed, and may contain references, such as to an XML schema.
Eg: $datr_root = '<DATR xmlns="x-schema:http://www.leegoddard.com/DATR/DTD/DATR1.0.xml">';
The default is simply the opening of the DATR element. See also set_schema.
PUBLIC METHODS
Constructor (new)
Creates a new DATR2XML object from a file, a URI or DATR .dtr source.
Accepts: DATR source as scalar, array, scalar/array pointer, or path to a DATR file. If the source is a scalar or a pointer to a scalar, it is assumed to be just a list of node definitions for the BODY slot.
Optionally accepts a second argument to set logging: see the manual entry for the logging method for details.
Returns: reference to object.
Object Structure: a hash with the following fields:
LOCATION - the name of the file, if any
HEADER - the file header (as defined in datrnode44.html#fileheader)
OPENING - opening declarations/directives as defined in datrnode45.html#openingdeclarations
BODY - node definitions, itself an array of hashes of the format defined in _parseNodes
CLOSING - closing declarations/directives as defined in datrnode47.html#closingdeclarations
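As a rough picture of the structure listed above, the following sketch builds such a hash by hand. All values are made up for illustration; the shape of HEADER (a hash) and OPENING (an array) follows the descriptions of _parseHeader and _parseOpening, while treating CLOSING as an array is an assumption.

```perl
use strict;
use warnings;

# Illustrative layout only; every value below is invented.
my $object = {
    LOCATION => 'D:/DATR/perl/eg.dtr',
    HEADER   => { File => 'eg.dtr' },            # name/value pairs
    OPENING  => [ 'opening directive text' ],
    BODY     => [
        {
            NODE    => 'Noun',
            PATH    => 'cat',
            TYPE    => '==',
            VALUE   => 'noun',
            COMMENT => [],
        },
    ],
    CLOSING  => [ 'closing directive text' ],     # assumed to be an array
};

# Walk the BODY records and print each one back as a DATR-like sentence.
for my $sentence (@{ $object->{BODY} }) {
    print "$sentence->{NODE}:<$sentence->{PATH}> ",
          "$sentence->{TYPE} $sentence->{VALUE}.\n";
}
```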
include_sentence_type
Sets or resets the type attribute of EQUATION elements.
Calling with an argument value of 1 includes the type attribute (default); calling with 0 forces the type attribute to be omitted.
print_comments
Call without a value to stop comment printing; call with a value to restart comment printing. Default is to print comments.
set_stylesheet
Sets the path to the required XSLT stylesheet. See also location_xsl in the section Global Variables.
set_dtd
Sets the location of the DTD as used in the DOCTYPE SYSTEM declaration. See also location_dtd in the section Global Variables.
set_schema
Sets the location of the XML Schema as used in the root element. If called with no argument value, removes all references to an XML Schema, setting $datr_root to the opening of the DATR root tag without attributes.
Calling with a value of 1 sets the Schema to the author's, located at http://www.leegoddard.com/DATR/DTD/DATR1.0.xml. See also datr_root in the section Global Variables.
logging
Turns logging off or on, verbose or minimal.
Accepts: "true|on|minimal" or "verbose" or "off|none|silent"
Returns: None
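The three groups of accepted values could be folded into three levels along these lines. This is a sketch under assumptions, not the module's implementation: the subroutine name normalise_logging is hypothetical, and so are the case-insensitive matching, the treatment of an undefined argument as "off", and dying on an unknown value.

```perl
use strict;
use warnings;

# Hypothetical normaliser for the documented logging settings.
sub normalise_logging {
    my ($setting) = @_;
    return 'off' unless defined $setting;                     # assumption
    return 'verbose' if $setting =~ /^verbose$/i;
    return 'on'      if $setting =~ /^(?:true|on|minimal)$/i;
    return 'off'     if $setting =~ /^(?:off|none|silent)$/i;
    die "Unknown logging setting '$setting'\n";               # assumption
}

print normalise_logging($_), "\n" for qw(true minimal verbose silent);
```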
viewAll
Provides a rough printout of all records
Accepts: object ref;
Returns: none
viewHeader
Provides a rough printout of the file header
Accepts: object ref;
Returns: none
viewOpening
Provides a rough view of the opening directives/definitions
Accepts: object ref;
Returns: none
viewClosing
Provides a rough view of the closing directives/definitions
Accepts: object ref;
Returns: none
viewNodes
Provides a rough printout of all nodes
Accepts: object ref;
Returns: none
save
Saves to local filesystem an XML printout of all records
Accepts: object ref;
optional file path to save at
or, for internal use, typeglob for Perl filehandle.
Returns: none
Notes: simply calls printAll, passing filehandle if necessary.
convert
Convert one or more DATR files to XML.
Accepts: Either:
a filepath with an extension,
optionally with an additional destination filepath or directory,
or,
for batch operation, a directory location.
Returns: nothing, will die on errors
Notes: Does not accept URLs and does not process sub-directories.
Minimizes logging during operation.
printAll
Provides an XML printout of all records
Accepts: object ref;
optional file path to save at
or, for internal use, typeglob for Perl filehandle
Returns: none
printHeader
Provides a printout of the file header
Accepts: object ref;
optional file path
or, for internal use, typeglob for Perl filehandle
Returns: none
printOpening; printClosing
Provides an XML printout of the opening/closing directives/definitions block element. Without passing a filepath or typeglob for filehandle, outputs to STDOUT. Just a wrapper for _printOpeningClosing.
Accepts: object ref;
optionally a file path
or, for internal use, typeglob for Perl filehandle
Returns: none
printNodes
Provides an XML printout of all nodes. Basically writes the EQUATION element and calls _parsePath on each value of the object's {BODY} key.
Accepts: object ref
Returns: none
PRIVATE METHODS
All private method subroutine names are prefixed with an underscore.
_loadFile (private method)
Load a dtr file from the local file system.
Accepts: object reference
Returns: an array of file contents
_loadURI (private method)
Load a dtr document from a URI
Accepts: object reference
Returns: an array of file contents
_parseHeader (private method)
Parses a .dtr-format file header into the class record
Accepts: object ref;
Returns: none
Struct: This method fills the hash held in $self->{HEADER} with whatever fields the .dtr file header contains that match a name/value pair delimited with a colon.
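The colon-delimited matching described above might look roughly like this. It is a minimal sketch, not the module's code: the subroutine name parse_header_lines is hypothetical, and the assumption that header lines carry a leading '%' comment marker is mine.

```perl
use strict;
use warnings;

# Hypothetical sketch: collect "name: value" pairs from header lines.
sub parse_header_lines {
    my (@lines) = @_;
    my %header;
    for my $line (@lines) {
        # Assumed shape: optional '%' marker, a name, a colon, a value.
        # Lines without a colon are skipped.
        next unless $line =~ /^\s*%*\s*([^:%]+?)\s*:\s*(.*?)\s*$/;
        $header{$1} = $2;
    }
    return \%header;
}

my $header = parse_header_lines(
    '% File:    eg.dtr',
    '% Purpose: illustrate header parsing',
    '% just a remark without a colon',
);
print "$_ => $header->{$_}\n" for sort keys %$header;
```

Only the first colon on a line acts as the delimiter, so a value that itself contains a colon is kept whole.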
_parseOpening (private method)
Extracts opening directives, those occurring before node definitions, and places them into the self-object's OPENING array.
Accepts: object ref, ref to DATR data
Returns: none
_parseClosing (private method)
Extracts closing directives, those occurring after node definitions
Accepts: object ref; reference to array of DATR data
Returns: none
Notes: reverses @_ then applies same proc as _parseOpening, then reverses output
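The reverse, scan, reverse idea in the note above can be sketched as below. Both subroutine names are hypothetical stand-ins, and recognising a directive by a leading '#' is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the shared routine: collects leading lines
# that start with '#' and stops at the first line that does not.
sub take_leading_directives {
    my (@lines) = @_;
    my @found;
    for my $line (@lines) {
        last unless $line =~ /^\s*#/;
        push @found, $line;
    }
    return @found;
}

# Trailing directives: reverse the input, reuse the leading scan, then
# reverse the result to restore source order. Call in list context.
sub take_trailing_directives {
    my (@lines) = @_;
    return reverse take_leading_directives(reverse @lines);
}

my @data = ('#opening.', 'Node:<> == value.', '#closing one.', '#closing two.');
my @closing = take_trailing_directives(@data);
print "$_\n" for @closing;
```

Reusing one scanning routine for both ends keeps the stopping rule identical for opening and closing blocks.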
_parseNodes (private method)
Parse a list of nodes to the class BODY record.
Accepts: an object ref and a reference to an array of DATR data
Returns: none
Struct: This method creates the array of hashes held in $self->{BODY} with the following fields:
NODE - the name of the current node
PATH - the (left-hand) path
TYPE - the sentence-type signifier: = or ==
VALUE - the (right-hand) value
COMMENT - an array of comments, index reflecting source line number
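To make the record layout concrete, here is a small sketch that turns one simple sentence into such a hash. It is not the module's parser: the subroutine name parse_sentence is hypothetical, and it only handles the assumed single-line shape Node:<path> = value. or Node:<path> == value.

```perl
use strict;
use warnings;

# Hypothetical sketch: parse one simple DATR sentence into a BODY record.
sub parse_sentence {
    my ($text) = @_;
    # Assumed shape: Node:<path> (= or ==) value, closed by a full stop.
    return unless $text =~ /^\s*([^:\s]+)\s*:\s*<([^>]*)>\s*(==?)\s*(.*?)\s*\.\s*$/;
    return {
        NODE    => $1,
        PATH    => $2,
        TYPE    => $3,
        VALUE   => $4,
        COMMENT => [],    # left empty in this sketch
    };
}

my $record = parse_sentence('Noun:<cat> == noun.');
print join(', ', map { "$_=$record->{$_}" } qw(NODE PATH TYPE VALUE)), "\n";
```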
_parsePath (private pseudo-method)
Decodes path attributes into an XML structure.
Accepts: a string of DATR path (as in $$hash{VALUE});
optionally a second argument, being the name of a node to
build-out the sentence (cf. geraldg@cogs.susx.ac.uk, 06/07/00)
Returns: a string of XML structure
Notes: a bit of a hack, really.
_preFormatNodes (private method)
Formats nodes for processing by removing comments/directives/linefeeds
Accepts: strings or array of DATR node/path/value sentences
Returns: one string of DATR node/path/value sentences, without linebreaks
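A pre-formatting pass of this kind could be sketched as follows. The subroutine name pre_format is hypothetical, and the sketch assumes that '%' starts a comment running to the end of the line and that a directive is a line starting with '#'; it does not handle a '%' inside quoted material.

```perl
use strict;
use warnings;

# Hypothetical sketch: strip comments, directive lines and line breaks,
# returning the remaining sentences as one string.
sub pre_format {
    my (@lines) = @_;
    my @kept;
    for my $line (@lines) {
        next if $line =~ /^\s*#/;            # assumed: directive line
        (my $clean = $line) =~ s/%.*$//;     # assumed: '%' starts a comment
        $clean =~ s/[\r\n]+//g;              # drop line breaks
        $clean =~ s/^\s+|\s+$//g;            # trim both ends
        push @kept, $clean if length $clean;
    }
    return join ' ', @kept;
}

print pre_format(
    "Noun:  % a comment\n",
    "  <cat> == noun.\n",
    "#directive.\n",
), "\n";
```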
_setupOutput (private method)
Sets up a filehandle for output, whether STDOUT or not
Accepts: string of a filepath, or a filehandle, or a (ref to a) typeglob, or undef
Returns: a reference to a typeglob that is the filehandle
See also: "Passing Filehandles" in perlfaq7 Perl documentation
Note: Would it be better not to default to STDOUT but
to default to a filename specified at object construction time?
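The defaulting behaviour can be sketched as below. This is an illustration under assumptions, not the module's code: the subroutine name setup_output is hypothetical, and the sketch covers only three of the accepted inputs (undef, a reference to a typeglob, and a file path), leaving out bare typeglobs.

```perl
use strict;
use warnings;

# Hypothetical sketch: return something usable as an output filehandle.
sub setup_output {
    my ($target) = @_;
    return \*STDOUT unless defined $target;         # default to STDOUT
    return $target if ref($target) eq 'GLOB';       # already a glob ref
    open my $fh, '>', $target
        or die "Cannot write $target: $!\n";        # treat as a file path
    return $fh;
}

my $out = setup_output();                           # falls back to STDOUT
print {$out} "written via the returned handle\n";
```

Returning a glob reference in every case means the caller can always write print {$fh} ... without checking where the output is going.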
_printOpeningClosing (private pseudo-method)
Prints as XML the contents of opening/closing, as requested.
AUTHOR and COPYRIGHT
Author: Lee Goddard code@leegoddard.com, leego@cogs.susx.ac.uk
Copyright: © Lee Goddard, 09/06/00 and as above. All Rights Reserved. License: The GNU General Public License applies: copies available from www.gnu.org/. You are free to distribute and modify this module under the same terms as those of Perl itself.