NAME

XML::LibXML::Fixup - apply regexes to XML to fix validation and parsing errors

SYNOPSIS

use XML::LibXML::Fixup;

DESCRIPTION

This module provides an interface to an XML parser for parsing and validating XML files, and allows fixups to be applied to XML that does not parse or validate. Only the methods added by this module are documented here; for the methods inherited from XML::LibXML, see the POD documentation for XML::LibXML.

CONSTRUCTOR

Create an instance of this class by calling the new() method.

use XML::LibXML::Fixup;
my $v = XML::LibXML::Fixup->new();

METHODS

Validity checks

The documentation for XML::LibXML recommends eval'ing a parse statement and checking $@ to detect parse errors. With this module, it is recommended that parsability and validity are checked using the $v->valid() method.

$v->valid()

Call this after a parse method, for example $v->parse_string(). Returns a true value for valid/parsable XML and a false value for invalid or unparsable XML.

$v->parse_string($xml);
print "valid" if $v->valid();

XML fixups

These methods control the fixing-up of XML. The fixups are applied one by one during parsing, in the order in which they were added to the object, until the XML validates or there are no more fixups to apply.
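The retry behaviour described above can be pictured with a small, self-contained sketch. This is not the module's implementation: looks_valid() is a hypothetical stand-in for a real parser, apply_fixups() is an illustrative name, and the sketch assumes that each fixup operates on the output of the previous one.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a real parser: the text is treated as
# acceptable when it contains no upper-case closing tags such as </P>.
sub looks_valid {
    my ($xml) = @_;
    return $xml !~ m{</[A-Z]+>};
}

# Apply each filter in the order given, stopping as soon as the text
# passes the check. Returns the final text followed by the descriptions
# of the filters that were tried.
sub apply_fixups {
    my ($xml, @fixups) = @_;
    my @tried;
    for my $fixup (@fixups) {
        last if looks_valid($xml);
        my ($filter, $description) = @$fixup;
        $xml = $filter->($xml);
        push @tried, $description;
    }
    return ($xml, @tried);
}

my @fixups = (
    [ sub { my $x = shift; $x =~ s{</P>}{</p>}g;     return $x }, 'close-para tags' ],
    [ sub { my $x = shift; $x =~ s{</DIV>}{</div>}g; return $x }, 'close-div tags' ],
    [ sub { my $x = shift; $x =~ s{<br>}{<br/>}g;    return $x }, 'unclosed br tags' ],
);

my ($fixed, @tried) = apply_fixups('<div><p>text</P></DIV>', @fixups);
print "$fixed\n";
print "tried: $_\n" for @tried;
```

In this sketch the third filter is never run, because the text already passes the check after the second one.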

$v->add_fixup($fixup,$description)

Adds a new fixup. $fixup must be a substitution regular expression or a subroutine reference. If $fixup is a subroutine reference, it must act as a filter on its first parameter, as shown in the second example (below). $description is a description of the substitution, which will be returned by $v->fixed_up() when called in list context. The following two fixups are similar, in that they both replace upper-case closing paragraph tags with their lower-case equivalents.

$v->add_fixup('s!</P>!</p>!gs', 'upper-case close-para tags');
$v->add_fixup(sub{
                  my $xml = shift;
                  $xml =~ s#</P>#</p>#gs;
                  return $xml;
                 }, 'upper-case close-para tags');

$v->clear_fixups()

Clears the list of fixups held within the object.

$v->fixed_up()

In scalar context, returns the number of fixups that were applied during parsing. In list context, returns a list of descriptions of the fixups that were applied during parsing. A fixup is deemed to have been applied if the search regex (first parameter to $v->add_fixup()) matches the XML.

Note that this doesn't indicate whether the XML was valid after fixing up. Use in conjunction with $v->valid() to check whether fixups were necessary to parse the XML.

$v->add_fixup('s#</P>#</p>#gs', 'upper-case close-para tags');
$v->add_fixup('s#</?foobar/?>##gis', 'remove <FoObAr> tags');
$v->parse_string($xml);
if ($v->valid()){
  if ($v->fixed_up()){
    print "validated with some fixups: ";
    print join(", ", $v->fixed_up()), "\n"; # descriptions
    print "parsing errors without fixups were: ";
    print join("\n", $v->get_errors()), "\n";
  } else {
    print "validated without need for fixups\n";
  }
}
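The example above calls fixed_up() twice: the if test evaluates it in scalar (boolean) context, so it sees a count, while print evaluates it in list context, so it sees the descriptions. The following sketch shows how a context-sensitive return of this kind can be written with Perl's wantarray. The @applied array and this stand-alone fixed_up() are illustrative only and are not taken from the module's source.

```perl
use strict;
use warnings;

# Hypothetical record of the fixups applied during one parse.
my @applied = ('upper-case close-para tags', 'remove <FoObAr> tags');

# Returns the descriptions in list context and their count in scalar context.
sub fixed_up {
    return wantarray ? @applied : scalar @applied;
}

my $count        = fixed_up();   # scalar context: a number
my @descriptions = fixed_up();   # list context: the descriptions
print "$count fixups applied: ", join('; ', @descriptions), "\n";
```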

Errors

Because the parser might need to make several attempts at parsing the XML before success, multiple parsing errors could occur. These are stored in an array and accessed using the utility methods listed below. Note that get_last_error() inherited from XML::LibXML can also be used to retrieve the most recent error.

$v->throw_exceptions(0)

Turns off (or on) the throwing of exceptions. Defaults to on. When exceptions are thrown, the parser dies (through croak()) if the XML still cannot be parsed or validated after all fixups have been applied. Such an exception can be trapped in an eval() block. This is similar to the default behaviour of XML::LibXML. When exceptions are suppressed, the parser does not die after failing to parse invalid XML. The validity of the XML must, therefore, be checked using $v->valid().

When called with no arguments, returns a true value if exceptions will be thrown, or a false value if they will be suppressed.
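The two modes lead to two styles of calling code, sketched below as a pair of helper subroutines. The names parse_trapping() and parse_quietly() are illustrative, $v is assumed to be an XML::LibXML::Fixup object, and the sketch assumes that a true argument to throw_exceptions() turns exceptions on and that errors stringify sensibly.

```perl
use strict;
use warnings;

# Exceptions on (the default): trap the failure with eval.
# Returns an empty string on success, or a message on failure.
sub parse_trapping {
    my ($v, $xml) = @_;
    $v->throw_exceptions(1);
    my $ok = eval { $v->parse_string($xml); 1 };
    return $ok ? '' : "parse failed: $@";
}

# Exceptions off: nothing is thrown, so ask the object afterwards.
# Returns an empty string on success, or the collected errors on failure.
sub parse_quietly {
    my ($v, $xml) = @_;
    $v->throw_exceptions(0);
    $v->parse_string($xml);
    return $v->valid() ? '' : join("\n", $v->get_errors());
}
```

Either helper would be called as, for example, parse_quietly(XML::LibXML::Fixup->new(), $xml).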

$v->get_errors()

Returns a list of errors produced during parsing and validation.

$v->next_error()

Returns the next error, or undef if there are no more errors. Useful as an iterator:

while(my $error = $v->next_error()){
  # do something with $error
}

$v->first_error()

Resets position of next error to be retrieved by next_error().
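Because next_error() keeps a position, a second loop over the errors yields nothing unless the position is reset first. The helper below is a minimal sketch of that pattern; error_report() is an illustrative name, $v is assumed to be an XML::LibXML::Fixup object, and first_error() is called only for its resetting effect.

```perl
use strict;
use warnings;

# Build a numbered list of all stored errors, rewinding first so that
# the report is complete however far a previous loop had advanced.
sub error_report {
    my ($v) = @_;
    $v->first_error();
    my $n = 0;
    my @lines;
    while (defined(my $error = $v->next_error())) {
        $n++;
        push @lines, "$n: $error";
    }
    return @lines;
}
```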

NOTES

The only XML::LibXML parsing function currently supported is $v->parse_string($xml).

SEE ALSO

XML::LibXML - used by this module to validate XML.

AUTHOR

Nigel Wetters, <nwetters@cpan.org>

COPYRIGHT AND LICENSE

Copyright 2002 by Rivals Digital Media Ltd. Use and distribution allowed under the same terms as Perl.