NAME

Treex::PML::Schema - Perl class implementing a PML schema.

DESCRIPTION

This class implements PML schemas. A PML schema consists of a set of type declarations of several kinds, represented by objects inheriting from the common base class Treex::PML::Schema::Decl.

INHERITANCE

This class inherits from Treex::PML::Schema::Template.

Attribute Paths

Some methods use so-called 'attribute paths' to navigate through nested and referenced type declarations. An attribute path is a '/'-separated sequence of steps, where each step can be one of the following:

!type-name

'!' followed by the name of a named type (this step can only occur as the very first step)

name

name (of a member of a structure, element of a sequence or attribute of a container), specifying the type declaration of the specified named component

#content

the string '#content', specifying the content type declaration of a container

LM

specifying the type declaration of a list

AM

specifying the type declaration of an alt

[NNN]

where NNN is a decimal number (which is ignored); equivalent to LM or AM

Steps of the form LM, AM, and [NNN] (except when occurring at the end of an attribute path) may be omitted.
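
The step rules above can be illustrated with a small stand-alone sketch. It does not use Treex::PML at all: classify_step and strip_optional_steps are hypothetical helpers written only for this illustration, and the sample paths (t-node.type, lex.rf, gram) are made-up names, not part of any particular schema.

```perl
use strict;
use warnings;

# Classify a single step of an attribute path (hypothetical helper).
sub classify_step {
    my ($step, $is_first) = @_;
    return 'named-type' if $is_first && $step =~ /^!/;
    return 'content'    if $step eq '#content';
    return 'list'       if $step eq 'LM';
    return 'alt'        if $step eq 'AM';
    return 'index'      if $step =~ /^\[\d+\]$/;
    return 'name';
}

# Drop the optional LM, AM and [NNN] steps everywhere except at the end.
sub strip_optional_steps {
    my ($path) = @_;
    my @steps = split m{/}, $path;
    my $last  = $#steps;
    my @kept  = grep {
        my $kind = classify_step($steps[$_], $_ == 0);
        $_ == $last || $kind !~ /^(?:list|alt|index)$/;
    } 0 .. $last;
    return join '/', @steps[@kept];
}

print strip_optional_steps('!t-node.type/a/LM/lex.rf'), "\n";  # prints !t-node.type/a/lex.rf
print strip_optional_steps('children/[1]/gram/AM'), "\n";      # prints children/gram/AM
```

The second call keeps the trailing AM because, as stated above, such steps may only be omitted when they do not end the path.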

EXPORT

This module exports constants for declaration types.

EXPORT TAGS

:constants

Export constant symbols (exported by default, too).

CONSTANTS

See Treex::PML::Schema::Constants.

METHODS

Treex::PML::Schema->new ({ option => value, ... })

NOTE: Don't call this constructor directly, use Treex::PML::Factory->createPMLSchema() instead!

Parses an XML representation of a PML Schema from a string, filehandle, local file, or URL, processing the modular instructions as described in

http://ufal.mff.cuni.cz/jazz/PML/doc/pml_doc.html#processing

and returns the corresponding Treex::PML::Schema object.

One of the following options must be given:

string

an XML string to parse

filename

a file name or URL

fh

a file-handle (IO::File, IO::Pipe, etc.) open for reading

The following options are optional:

base_url

base URL for referred schemas (useful when parsing from a file-handle or a string)

use_resources

if this option is used with a true value, the parser will attempt to locate referred schemas also in Treex::PML resource paths.

revision, minimal_revision, maximal_revision

constraints on the revision number of the schema.

validate

if this option is used with a true value, the parser will validate the schema on the fly using a RelaxNG grammar given using the relaxng_schema parameter; if relaxng_schema is not given, the file 'pml_schema_inline.rng' searched for in Treex::PML resource paths is assumed.

relaxng_schema

a particular RelaxNG grammar to validate against. The value may be a URL or filename for the grammar in the RelaxNG XML format, or an XML::LibXML::RelaxNG object. The compact format is not supported.
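
A minimal usage sketch of the options above follows. It assumes that Treex::PML::Factory->createPMLSchema() accepts the same option hash as the constructor; the file name my_schema.xml and the revision string 1.0.0 are hypothetical, and reading "one of" as "exactly one" input source is an assumption made only for the caller-side check.

```perl
use strict;
use warnings;

# Option names are those listed above; the file name is hypothetical.
my %opts = (
    filename         => 'my_schema.xml',
    use_resources    => 1,
    minimal_revision => '1.0.0',
);

# Caller-side sanity check: exactly one input source should be present.
my @sources = grep { exists $opts{$_} } qw(string filename fh);
die "Give exactly one of: string, filename, fh\n" unless @sources == 1;

# Parse only when the file exists and Treex::PML is installed.
if (-f $opts{filename} && eval { require Treex::PML::Factory; 1 }) {
    my $schema = Treex::PML::Factory->createPMLSchema(\%opts);
    print "Root element: ", $schema->get_root_name, "\n";
}
```

The guard around the parsing call only keeps the sketch runnable on systems without the module or the sample file; real code would normally load Treex::PML::Factory unconditionally.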

Treex::PML::Schema->readFrom (filename,opts)

An obsolete alias for Treex::PML::Schema->new({%$opts, filename=>$filename}).

$schema->write ({option => value})

This method serializes the Treex::PML::Schema object to XML. See Treex::PML::Schema::XMLNode->write for implementation.

IMPORTANT: The resulting schema is simplified; that is, all modular instructions are processed and removed from it. See http://ufal.mff.cuni.cz/jazz/PML/doc/pml_doc.html#processing

One of the following options must be given:

string

a scalar reference to which the XML is to be stored as a string

filename

a file name

fh

a file-handle (IO::File, IO::Pipe, etc.) open for writing

The following options are optional:

no_backups

if this option is used with a true value, the writer will not attempt to create backup (tilde) files when overwriting an existing file.

no_indent

if this option is used with a true value, the writer will not add additional newlines and indentation white-space to the resulting XML.

$schema->get_url ()

Return location of the PML schema file.

$schema->set_url ($URI)

Set location of the PML schema file.

$schema->get_pml_version ()

Return PML version the schema conforms to.

$schema->get_revision ()

Return PML schema revision.

$schema->get_description ()

Return PML schema description.

$schema->get_root_decl ()

Return the root type declaration (see Treex::PML::Schema::Root).

$schema->get_root_type ()

Like $schema->get_root_decl->get_content_decl.

$decl->get_decl_type ()

Return the constant PML_SCHEMA_DECL (for compatibility with the Treex::PML::Schema::Decl interface).

$decl->get_decl_type_str ()

Return the string 'schema' (for compatibility with the Treex::PML::Schema::Decl interface).

$schema->get_root_name ()

Return the name of the root element of a PML instance.

$schema->get_type_names ()

Return names of all named type declarations.

$schema->get_named_references ()

This method returns a list of HASHrefs containing information about named references to PML instances (each hash will currently have the keys 'name' and 'readas').

$schema->get_named_reference_info (name)

This method retrieves information about a specific named instance reference as a hash (currently with keys 'name' and 'readas').

Treex::PML::Schema::cmp_revisions($A, $B)

This function compares two schema revision strings according to the rules described in the PML specification. Returns -1 if revision $A precedes revision $B, 0 if the revisions are equal (equivalent), and 1 if revision $A follows revision $B.
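
The return convention can be illustrated with a stand-alone sketch. It is not the module's implementation: compare_revisions_sketch is a hypothetical helper that assumes revisions are dot-separated sequences of non-negative integers compared numerically, with a missing component treated as 0. The PML specification remains the authoritative source for the actual rules.

```perl
use strict;
use warnings;

# Compare two revision strings component by component (illustrative only).
sub compare_revisions_sketch {
    my ($x, $y) = @_;
    my @x = split /\./, $x;
    my @y = split /\./, $y;
    while (@x || @y) {
        # A missing component is treated as 0, so '1.2' equals '1.2.0'.
        my $cmp = (shift(@x) // 0) <=> (shift(@y) // 0);
        return $cmp if $cmp;
    }
    return 0;
}

print compare_revisions_sketch('1.2.0', '1.10'), "\n";   # prints -1
print compare_revisions_sketch('1.2',   '1.2.0'), "\n";  # prints 0
print compare_revisions_sketch('2.0',   '1.9.9'), "\n";  # prints 1
```

Note that the comparison is numeric per component, so 1.2.0 precedes 1.10 even though a plain string comparison would order them the other way.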

$schema->for_each_decl (sub{...})

This method traverses all nested declarations and sub-declarations and calls a given subroutine passing the sub-declaration object as a parameter.

$schema->check_revision({ option=>value })

Check that the schema revision satisfies the given constraints. The following options are supported:

revision: exact revision number to match

minimal_revision: minimal revision number to match

maximal_revision: maximal revision number to match

revision_error: an optional error message format string with a %f mark for the schema filename or URL and %e for the error string. Defaults to 'Error: wrong schema revision of %f: %e'.

$schema->convert_from_hash

Compatibility method building the schema object from a nested hash structure created by XML::Simple which was used in older implementations. This is useful for upgrading objects stored in old binary dumps.

$schema->find_type_by_path (attribute-path,noresolve,decl)

Locate a declaration specified by attribute-path starting from declaration decl. If decl is undefined, the root type declaration is used. (Note that attribute paths starting with '/' are always evaluated starting from the root declaration and paths starting with '!' followed by a name of a named type are evaluated starting from that type.) All references to named types are transparently resolved in each step.

The caller should pass a true value in noresolve to enforce Member, Attribute, Element, Type, or Root declaration objects to be returned rather than declarations of their content.

An attribute path is a '/'-separated sequence of steps (member, attribute, element names or strings matching [\d*]) which identifies a certain nested type declaration. A step of the aforementioned form [\d*] matches the content declaration of a List or Alt. Note, however, that named steps dive into List or Alt declarations automatically, too.

$schema->find_types_by_role (role,start_decls)

Return a list of declarations (objects derived from Treex::PML::Schema::Decl) that have role equal to role.

If start_decls is specified, it must be an ARRAY reference of declarations; in that case, only declarations nested below the listed ones are considered.

$schema->find_role (role,start_decl,opts)

WARNING: this function can be very slow, esp. if the type declarations are recursive.

Return a list of attribute paths leading to nested type declarations of start_decl with role equal to role.

This is equivalent to

$schema->find_decl(sub{ $_[0]->{role} eq $role },$start_decl,$opts);

Please see the documentation for find_decl for more information.

$schema->find_decl (callback,start_decl,opts)

WARNING: this function can be very slow, esp. if the type declarations are recursive.

Return a list of attribute paths leading to nested type declarations of start_decl for which the given callback returns a true value. The tested type declaration is passed to the callback as the first (and only) argument.

If start_decls is specified, it must be an ARRAY reference of declarations; in that case, only declarations nested or referred to from the listed ones are considered.

In array context, all matching nested declarations are returned. In scalar context, only the first one is returned (with early stopping).

The last argument opts can be used to pass some flags to the algorithm. Currently only the flag no_childnodes is available. If true, then the function never recurses into content declaration of declarations with the role #CHILDNODES.

$schema->node_types ()

Return a list of all type declarations with the role #NODE.

$schema->get_type_by_name (name)

Return the declaration of the named type with a given name (see Treex::PML::Schema::Type).

$schema->validate_object (object, type_decl, log, flags)

Validates the data content of the given object against a specified type declaration. The type_decl argument must either be an object derived from the Treex::PML::Schema::Decl class or the name of a named type.

An array reference may be passed as the optional 3rd argument log to obtain a detailed report of all validation errors.

The flags argument can specify flags that influence the validation. The following constants can be binary-OR'ed to obtain the flags:

PML_VALIDATE_NO_TREES - do not validate nested data with roles #CHILDNODES or #TREES and do not require that objects with the role #NODE implement the Treex::PML::Node role.

PML_VALIDATE_NO_CHILDNODES - do not validate nested data with the role #CHILDNODES.

Returns: 1 if the content conforms, 0 otherwise.
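
How flags are combined with binary OR can be sketched without loading the module. The DEMO_* constants below are stand-ins with made-up values, used only to show the bit arithmetic; the real PML_VALIDATE_* constants are exported by the module and their numeric values may differ. The commented call at the end shows how the combined value would be passed, with $object and $type_decl as hypothetical variables.

```perl
use strict;
use warnings;

# Stand-in values for illustration only; the real constants are exported
# by the module and their numeric values may differ.
use constant {
    DEMO_NO_TREES      => 1,
    DEMO_NO_CHILDNODES => 2,
};

# Combine flags with binary OR.
my $flags = DEMO_NO_TREES | DEMO_NO_CHILDNODES;

# Test individual bits with binary AND.
printf "skip trees: %s\n",      ($flags & DEMO_NO_TREES)      ? 'yes' : 'no';
printf "skip childnodes: %s\n", ($flags & DEMO_NO_CHILDNODES) ? 'yes' : 'no';

# With Treex::PML loaded, the call would then look like this:
#   my @log;
#   my $ok = $schema->validate_object($object, $type_decl, \@log,
#                PML_VALIDATE_NO_TREES | PML_VALIDATE_NO_CHILDNODES);
#   # inspect @log when $ok is 0
```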

$schema->validate_field (object, attr-path, type, log)

This method is similar to validate_object, but in this case the validation is restricted to the data substructure of object specified by the attr-path argument.

type is the type of object specified either by the name of a named type, or as a Treex::PML::Type, or a type declaration.

An array reference may be passed as the optional 4th argument log to obtain a detailed report of all validation errors.

Returns: 1 if the content conforms, 0 otherwise.

$schema->get_paths_to_atoms (\@decls, \%opts)

This method returns a list of all non-periodic canonical paths leading from given types to atomic values. Currently only the following options are supported:

no_childnodes => $bool

If true, the method does not descend to member types with the role #CHILDNODES.

no_nodes => $bool

If true, the method does not descend to member types with the role #NODE (except for the starting types).

with_LM => $bool

If true, the paths will include an LM step for each List type on the path.

with_AM => $bool

If true, the paths will include an AM step for each Alt type on the path.

with_Seq_brackets => $bool

If true, a [0] is appended after each step representing a sequence element.

$schema->attributes (decl...)

This function tries to emulate the behavior of Treex::PML::FSFormat->attributes to some extent.

Return attribute paths to all atomic subtypes of given type declarations. If no type declaration objects are given, then types with role #NODE are assumed. This function never descends to subtypes with role #CHILDNODES.

$schema->post_process($options)

Auxiliary method used internally by the PML Schema parser. It simplifies the schema and for each declaration object creates back references to its parent declaration and schema and pre-computes the type attribute path returned by $decl->get_decl_path().

CLASSES FOR TYPE DECLARATIONS

Treex::PML::Schema::Decl
Treex::PML::Schema::Root
Treex::PML::Schema::Type
Treex::PML::Schema::Struct
Treex::PML::Schema::Container
Treex::PML::Schema::Seq
Treex::PML::Schema::List
Treex::PML::Schema::Alt
Treex::PML::Schema::Choice
Treex::PML::Schema::CDATA
Treex::PML::Schema::Constant
Treex::PML::Schema::Member
Treex::PML::Schema::Element
Treex::PML::Schema::Attribute

SEE ALSO

Prague Markup Language (PML) format: http://ufal.mff.cuni.cz/jazz/PML/

Tree editor TrEd: http://ufal.mff.cuni.cz/~pajas/tred

Related packages: Treex::PML, Treex::PML::Schema::Template, Treex::PML::Schema::Decl, Treex::PML::Instance

COPYRIGHT AND LICENSE

Copyright (C) 2006-2010 by Petr Pajas

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.2 or, at your option, any later version of Perl 5 you may have available.