NAME
XML::LibXML::Fixup - apply regexes to XML to fix validation and parsing errors
SYNOPSIS
use XML::LibXML::Fixup;
DESCRIPTION
This module provides an interface to an XML parser to parse and validate XML files. It also allows fixups to be applied to XML that fails to parse or validate. For full documentation of the parser, see the POD documentation for XML::LibXML. The documentation in this module does not cover the methods inherited from XML::LibXML.
CONSTRUCTOR
Create an instance of this class by calling the new() method.
use XML::LibXML::Fixup;
my $v = XML::LibXML::Fixup->new();
METHODS
Validity checks
The documentation for XML::LibXML recommends eval'ing a parse statement and checking $@ to detect parse errors. With this module, it is recommended that parsability and validity are checked using the $v->valid() method.
- $v->valid()
-
Should be called after some parse method; for example $v->parse_string(). Returns a true value for valid/parsable XML. Returns a false value for invalid or unparsable XML.
$v->parse_string($xml);
print "valid" if $v->valid();
XML fixups
These methods control the fixing-up of XML. The fixups are applied one by one during parsing, in the order in which they were added to the object, until the XML validates or there are no more fixups to apply.
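The loop described above can be sketched in plain Perl. This is an illustration only, not the module's source: looks_balanced() and parse_with_fixups() are hypothetical names, and the simple tag-balance check merely stands in for the real parser.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser: accepts the text when every
# opening tag is closed by a tag of exactly the same name.
sub looks_balanced {
    my ($xml) = @_;
    my @open;
    while ($xml =~ m{<(/?)([A-Za-z][\w.-]*)[^<>]*?(/?)>}g) {
        my ($closing, $name, $empty) = ($1, $2, $3);
        next if $empty;    # an empty element such as <br/> opens nothing
        if ($closing) {
            return 0 unless @open && pop(@open) eq $name;
        }
        else {
            push @open, $name;
        }
    }
    return @open ? 0 : 1;
}

# Apply fixups in insertion order until the check passes or none remain.
sub parse_with_fixups {
    my ($xml, @fixups) = @_;
    my @applied;
    for my $fixup (@fixups) {
        last if looks_balanced($xml);
        my ($search, $replace, $description) = @$fixup;
        # s///g returns the number of substitutions, so a true value
        # means the search pattern matched.
        push @applied, $description if $xml =~ s/$search/$replace/g;
    }
    return ($xml, looks_balanced($xml), @applied);
}

my @fixups = (
    [ '</P>',             '</p>', 'upper-case close-para tags' ],
    [ qr!</?foobar/?>!is, '',     '<FoObAr> tags' ],
);
my ($fixed, $ok, @applied) =
    parse_with_fixups('<p>one</P><FooBar><p>two</p>', @fixups);
print "ok=$ok applied=@applied\n$fixed\n";
```

Here the first fixup repairs the closing tag, the check still fails because of the stray <FooBar> tag, and the second fixup removes it.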
- $v->add_fixup($search,$replace,$description)
-
Adds a new fixup. $search is a regular expression, given either as a string or as a quoted regular expression (see "Regexp Quote-Like Operators" in perlop). $replace is the text that replaces anything matching the $search pattern. $description is a description of the substitution, which will be returned by $v->fixed_up() when called in list context. The following two fixups are similar, in that they both substitute upper-case closing paragraph tags with the lower-case equivalents. The second form uses the /s modifier, which treats the XML as if it were a single line (so that . also matches newlines).
$v->add_fixup('</P>', '</p>', 'upper-case close-para tags');
$v->add_fixup(qr!</P>!s, '</p>', 'upper-case close-para tags');
Internally to the module, the following substitution is performed during fixup:
$xml =~ s/$search/$replace/g;
At present, capturing of patterns is not supported.
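The substitution quoted above can be tried in isolation. The sketch below wraps it in a hypothetical helper, apply_fixup(), which is not part of the module. It shows that string and qr// patterns behave alike, and that a '$1' inside the replacement text is inserted literally, which is consistent with capturing not being supported.

```perl
use strict;
use warnings;

# Hypothetical helper around the substitution quoted in the documentation.
sub apply_fixup {
    my ($xml, $search, $replace) = @_;
    (my $copy = $xml) =~ s/$search/$replace/g;
    return $copy;
}

my $xml = "<p>one</P>\n<p>two</P>\n";

# A plain string and a qr// object are both interpolated as a pattern.
my $from_string = apply_fixup($xml, '</P>', '</p>');
my $from_qr     = apply_fixup($xml, qr!</P>!s, '</p>');
print "same result\n" if $from_string eq $from_qr;

# The replacement value is inserted as literal text: the '$1' it
# contains is not expanded to the captured group.
my $literal = apply_fixup('<B>x</B>', qr!<(/?)B>!, '<$1b>');
print "$literal\n";    # prints <$1b>x<$1b>
```

Because a string passed as $search is still compiled as a regular expression, any metacharacters in it (such as . or ?) need escaping if they are meant literally.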
- $v->clear_fixups()
-
Clears the list of fixups held within the object.
- $v->fixed_up()
-
In scalar context, returns the number of fixups that were applied during parsing. In list context, returns a list of descriptions of the fixups that were applied during parsing. A fixup is deemed to have been applied if the search regex (the first parameter to $v->add_fixup()) matches the XML.
Note that this doesn't indicate whether the XML was valid after fixing up. Use in conjunction with $v->valid() to check whether fixups were necessary to parse the XML.
$v->add_fixup('</P>', '</p>', 'upper-case close-para tags');
$v->add_fixup(qr!</?foobar/?>!is, '', '<FoObAr> tags');
$v->parse_string($xml);
if ($v->valid()){
    if ($v->fixed_up()){
        print "validated with some fixups: ";
        print $v->fixed_up(); # descriptions
        print "parsing errors without fixups were: ";
        print $v->get_errors();
    } else {
        print "validated without need for fixups";
    }
}
Errors
Because the parser might need to make several attempts at parsing the XML before success, multiple parsing errors could occur. These are stored in an array and accessed using the utility methods listed below. Note that get_last_error() inherited from XML::LibXML can also be used to retrieve the most recent error.
- $v->throw_exceptions(0)
-
Turns off (or on) the throwing of exceptions. Defaults to on. When exceptions are thrown, the parser will die (through croak()) when XML cannot be parsed or validated, but only after all fixups have been applied and parsing still fails. Such an exception can be trapped in an eval() block. This is similar to the default behaviour of XML::LibXML. When exceptions are being suppressed, the parser will not call die() after failing to parse invalid XML. The validity of the XML must, therefore, be checked using $v->valid().
When called with no arguments, returns a true value if exceptions will be thrown, or a false value if they will be suppressed.
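The two modes can be contrasted with a small, self-contained sketch. parse_or_report() is a hypothetical stand-in with a toy check, not the module's parser; only the eval/croak pattern is the point here.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical stand-in for a parser with a throw-exceptions switch.
sub parse_or_report {
    my ($xml, $throw) = @_;
    my $ok = $xml =~ /\A\s*<.*>\s*\z/s ? 1 : 0;    # toy check, not real XML parsing
    croak "could not parse input" if !$ok && $throw;
    return $ok;
}

# Exceptions on: trap the failure with eval and inspect $@.
my $trapped = '';
eval { parse_or_report('not xml', 1); 1 } or $trapped = $@;
print "trapped: $trapped";

# Exceptions off: nothing dies, so the result must be checked explicitly.
my $valid = parse_or_report('not xml', 0);
print "valid: ", ($valid ? "yes" : "no"), "\n";
```

Ending the eval block with a true value and testing the result of eval, rather than testing $@ alone, is a common way to avoid missing an exception when $@ is clobbered.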
- $v->get_errors()
-
Returns a list of errors produced during parsing and validation.
- $v->next_error()
-
Returns the next error, or undef if there are no more errors. Useful as an iterator:
while (my $error = $v->next_error()) {
    # do something with $error
}
- $v->first_error()
-
Resets position of next error to be retrieved by next_error().
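The iteration and rewind behaviour described for next_error() and first_error() can be mimicked with a small cursor over an array. The ErrorList class below is a hypothetical stand-in written for illustration; it makes no claim about how the module stores its errors or what first_error() returns.

```perl
use strict;
use warnings;

# Hypothetical stand-in: an error list with a cursor.
package ErrorList;

sub new {
    my ($class, @errors) = @_;
    return bless { errors => [@errors], pos => 0 }, $class;
}

sub get_errors { my ($self) = @_; return @{ $self->{errors} } }

# Rewind the cursor so that next_error() starts from the beginning.
sub first_error { my ($self) = @_; $self->{pos} = 0; return }

sub next_error {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{errors} };    # undef when exhausted
    return $self->{errors}[ $self->{pos}++ ];
}

package main;

my $errors = ErrorList->new('first failure', 'second failure');

# Testing with defined() keeps the loop going even for a false-valued
# error string such as "0".
my @seen;
while (defined(my $error = $errors->next_error())) {
    push @seen, $error;
}
print scalar(@seen), " errors seen\n";

# After a rewind, iteration starts again from the first error.
$errors->first_error();
my $again = $errors->next_error();
print "$again\n";
```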
NOTES
The only XML::LibXML parsing function currently supported is $v->parse_string($xml).
SEE ALSO
XML::LibXML - used by this module to validate XML.
perlop - how to quote a regular expression using qr//.
AUTHOR
Nigel Wetters, <nwetters@cpan.org>
COPYRIGHT AND LICENSE
Copyright 2002 by Rivals Digital Media Ltd. Use and distribution allowed under the same terms as Perl.