NAME

XML::PP - A simple XML parser

VERSION

Version 0.04

SYNOPSIS

use XML::PP;

my $parser = XML::PP->new();
my $xml = '<note id="1"><to priority="high">Tove</to><from>Jani</from><heading>Reminder</heading><body importance="high">Don\'t forget me this weekend!</body></note>';
my $tree = $parser->parse($xml);

print $tree->{name};	# 'note'
print $tree->{children}[0]->{name};	# 'to'

DESCRIPTION

You almost certainly do not need this module, for most tasks use XML::Simple or XML::LibXML. XML::PP exists only for the most lightweight of scenarios where you can't get one of the above modules to install, for example, CI/CD machines running Windows that get stuck with https://stackoverflow.com/questions/11468141/cant-load-c-strawberry-perl-site-lib-auto-xml-libxml-libxml-dll-for-module-x.

XML::PP is a simple, lightweight XML parser written in pure Perl. It does not rely on external libraries like XML::LibXML and is suitable for small XML parsing tasks. This module supports basic XML document parsing, including namespace handling, attributes, and text nodes.

METHODS

new

my $parser = XML::PP->new();
my $parser = XML::PP->new(strict => 1);
my $parser = XML::PP->new(warn_on_error => 1);

Creates a new XML::PP object. It can take several optional arguments:

  • strict - If set to true, the parser dies when it encounters unknown entities or unescaped ampersands.

  • warn_on_error - If true, the parser emits warnings for unknown or malformed XML entities. This is enabled automatically if strict is enabled.

  • logger

    Used for warnings and traces. It can be an object that understands warn() and trace() messages, such as a Log::Log4perl or Log::Any object, a reference to code, a reference to an array, or a filename.

parse

my $tree = $parser->parse($xml_string);

Parses the XML string and returns a tree structure representing the XML content. The returned structure is a hash reference with the following fields:

  • name - The tag name of the node.

  • ns - The namespace prefix (if any).

  • ns_uri - The namespace URI (if any).

  • attributes - A hash reference of attributes.

  • children - An array reference of child nodes (either text nodes or further elements).

collapse_structure

Collapse an XML-like structure into a simplified hash (like XML::Simple).

use XML::PP;

my $input = {
    name => 'note',
    children => [
        { name => 'to', children => [ { text => 'Tove' } ] },
        { name => 'from', children => [ { text => 'Jani' } ] },
        { name => 'heading', children => [ { text => 'Reminder' } ] },
        { name => 'body', children => [ { text => 'Don\'t forget me this weekend!' } ] },
    ],
    attributes => { id => 'n1' },
};

my $result = collapse_structure($input);

# Output:
# {
#     note => {
#         to      => 'Tove',
#         from    => 'Jani',
#         heading => 'Reminder',
#         body    => 'Don\'t forget me this weekend!',
#     }
# }

The collapse_structure subroutine takes a nested hash structure (representing an XML-like data structure) and collapses it into a simplified hash where each child element is mapped to its name as the key, and the text content is mapped as the corresponding value. The final result is wrapped in a note key, which contains a hash of all child elements.

This subroutine is particularly useful for flattening XML-like data into a more manageable hash format, suitable for further processing or display.

collapse_structure accepts a single argument:

  • $node (Required)

    A hash reference representing a node with the following structure:

    {
        name      => 'element_name',  # Name of the element (e.g., 'note', 'to', etc.)
        children  => [                # List of child elements
            { name => 'child_name', children => [{ text => 'value' }] },
            ...
        ],
        attributes => { ... },        # Optional attributes for the element
        ns_uri => ... ,               # Optional namespace URI
        ns => ... ,                   # Optional namespace
    }

    The children key holds an array of child elements. Each child element may have its own name and text, and the function will collapse all text values into key-value pairs.

The subroutine returns a hash reference that represents the collapsed structure, where the top-level key is note and its value is another hash containing the child elements' names as keys and their corresponding text values as values.

For example:

{
    note => {
        to      => 'Tove',
        from    => 'Jani',
        heading => 'Reminder',
        body    => 'Don\'t forget me this weekend!',
    }
}
Basic Example:

Given the following input structure:

my $input = {
    name => 'note',
    children => [
        { name => 'to', children => [ { text => 'Tove' } ] },
        { name => 'from', children => [ { text => 'Jani' } ] },
        { name => 'heading', children => [ { text => 'Reminder' } ] },
        { name => 'body', children => [ { text => 'Don\'t forget me this weekend!' } ] },
    ],
};

Calling collapse_structure will return:

{
    note => {
        to      => 'Tove',
        from    => 'Jani',
        heading => 'Reminder',
        body    => 'Don\'t forget me this weekend!',
    }
}

_parse_node

my $node = $self->_parse_node($xml_ref, $nsmap);

Recursively parses an individual XML node. This method is used internally by the parse method. It handles the parsing of tags, attributes, text nodes, and child elements. It also manages namespaces and handles self-closing tags.

AUTHOR

Nigel Horne, <njh at nigelhorne.com>

SEE ALSO

SUPPORT

This module is provided as-is without any warranty.

LICENSE AND COPYRIGHT

Copyright 2025 Nigel Horne.

Usage is subject to licence terms.

The licence terms of this software are as follows:

  • Personal single user, single computer use: GPL2

  • All other users (including Commercial, Charity, Educational, Government) must apply in writing for a licence for use from Nigel Horne at the above e-mail.