NAME

XML::Dataset - Extracts XML into Perl Datasets based upon a simple text profile markup language

VERSION

version 0.002

SYNOPSIS

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
   </shop>
</catalog>
);

my $profile = qq(
   catalog
      shop
         book
           author = dataset:title_and_author
           title  = dataset:title_and_author
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

DESCRIPTION

Provides a simple means of parsing XML and returning selected information. A markup profile describes the XML structure and how that structure maps to logical groupings of information (datasets).

METHODS

parse_using_profile

Parses XML based upon a profile.

Input: XML<string>, Profile<string>
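
Judging by the Data::Printer output shown in the examples, the return value appears to be a hash reference keyed by dataset name, with each dataset held as an array reference of hash references. The following is a minimal sketch of walking such a structure. The data is hard-coded so that the snippet runs without XML::Dataset installed, and the helper name dataset_column is hypothetical, not part of the module.

```perl
use strict;
use warnings;

# Hard-coded stand-in for the structure that parse_using_profile appears to
# return in Example 1. This shape is an assumption based on the printed
# output, not on the module source.
my $output = {
    title_and_author => [
        { author => 'Gambardella, Matthew', title => "XML Developer's Guide" },
        { author => 'Ralls, Kim',           title => 'Midnight Rain' },
    ],
};

# Hypothetical helper: collect one field from every row of a named dataset.
# An unknown dataset name yields an empty list rather than an error.
sub dataset_column {
    my ( $data, $dataset, $field ) = @_;
    return map { $_->{$field} } @{ $data->{$dataset} || [] };
}

my @titles = dataset_column( $output, 'title_and_author', 'title' );
print "$_\n" for @titles;
```

In real use, $output would come from parse_using_profile( $example_data, $profile ) rather than from a literal.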

RATIONALE

I often found myself developing, adjusting and manipulating Perl code, using a variety of packages, to extract XML sources into logical groupings that were relevant to the underlying data, as opposed to a Perl structure of an entire XML source.

Beyond the initial time spent developing an appropriate construct to parse the source data, any later change to the XML output required further changes to the code base.

I wanted a simplified solution: a simple markup language that I could use to describe the context of interest and to apply any necessary manipulation of the data.

I investigated a number of options available in the Perl community to simplify the overall process. Whilst many excellent options are available, I did not find one that provided the level of simplicity that I desired. This module is the result of the effort to fulfill that requirement.

EXAMPLES

Example 1 - Simple Dataset Extraction

Overview

The following example shows the extraction of the title and author information from the example XML document into a dataset called title_and_author.

The XML::Dataset profile follows a similar structure to the XML, with elements indented to depict the relationship between entities.

Information that needs to be captured from within an element (or an attribute) is referenced using the <value> = dataset:<dataset_name> syntax.

Code

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
   </shop>
</catalog>
);

my $profile = qq(
   catalog
      shop
         book
           author = dataset:title_and_author
           title  = dataset:title_and_author
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

Output

\ {
    title_and_author   [
        [0] {
            author   "Gambardella, Matthew",
            title    "XML Developer's Guide"
        },
        [1] {
            author   "Ralls, Kim",
            title    "Midnight Rain"
        },
        [2] {
            author   "Corets, Eva",
            title    "Maeve Ascendant"
        },
        [3] {
            author   "Corets, Eva",
            title    "Oberon's Legacy"
        }
    ]
}

Example 2 - Working With Multiple Datasets

Overview

This example builds upon the previous one by adding a second dataset, title_and_genre. As the example profile shows, multiple datasets can be specified as a space-separated list: 'title' is used for both title_and_author and title_and_genre.

Code

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
   </shop>
</catalog>
);

my $profile = qq(
   catalog
      shop
         book
           author = dataset:title_and_author
           title  = dataset:title_and_author dataset:title_and_genre
           genre  = dataset:title_and_genre
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

Output

\ {
    title_and_author   [
        [0] {
            author   "Gambardella, Matthew",
            title    "XML Developer's Guide"
        },
        [1] {
            author   "Ralls, Kim",
            title    "Midnight Rain"
        },
        [2] {
            author   "Corets, Eva",
            title    "Maeve Ascendant"
        },
        [3] {
            author   "Corets, Eva",
            title    "Oberon's Legacy"
        }
    ],
    title_and_genre    [
        [0] {
            genre   "Computer",
            title   "XML Developer's Guide"
        },
        [1] {
            genre   "Fantasy",
            title   "Midnight Rain"
        },
        [2] {
            genre   "Fantasy",
            title   "Maeve Ascendant"
        },
        [3] {
            genre   "Fantasy",
            title   "Oberon's Legacy"
        }
    ]
}

Example 3 - Handling XML Attributes

Overview

XML attributes are treated in the profile as sub-level key/value entries of their element. The following example includes the attribute 'id' in the returned datasets. Note how id is indented under book, on the same level as author, title, genre, etc.

Code

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
   </shop>
</catalog>
);

my $profile = qq(
   catalog
      shop
         book
           id     = dataset:title_and_author dataset:title_and_genre
           author = dataset:title_and_author
           title  = dataset:title_and_author dataset:title_and_genre
           genre  = dataset:title_and_genre
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

Output

\ {
    title_and_author   [
        [0] {
            author   "Gambardella, Matthew",
            id       "bk101",
            title    "XML Developer's Guide"
        },
        [1] {
            author   "Ralls, Kim",
            id       "bk102",
            title    "Midnight Rain"
        },
        [2] {
            author   "Corets, Eva",
            id       "bk103",
            title    "Maeve Ascendant"
        },
        [3] {
            author   "Corets, Eva",
            id       "bk104",
            title    "Oberon's Legacy"
        }
    ],
    title_and_genre    [
        [0] {
            genre   "Computer",
            id      "bk101",
            title   "XML Developer's Guide"
        },
        [1] {
            genre   "Fantasy",
            id      "bk102",
            title   "Midnight Rain"
        },
        [2] {
            genre   "Fantasy",
            id      "bk103",
            title   "Maeve Ascendant"
        },
        [3] {
            genre   "Fantasy",
            id      "bk104",
            title   "Oberon's Legacy"
        }
    ]
}

Example 4 - Using higher level data across datasets

Overview

Information that is available at a higher level than the specified dataset information can be referenced and included in datasets using a combination of the external_dataset and __EXTERNAL_VALUE__ markers.

The external_dataset marker informs the parser to store the information for later use. It follows the format of external_dataset:<target> where <target> is a reference name that identifies the external store.

The __EXTERNAL_VALUE__ marker informs the parser to reference a value that is or will be stored externally. It follows the format of __EXTERNAL_VALUE__ = <external_store>:<external_value>:<target_dataset>.

Optionally, the __EXTERNAL_VALUE__ marker can receive an additional parameter of :<override_name>, making the full syntax <external_store>:<external_value>:<target_dataset>:<override_name>.

Code

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
   </shop>
</catalog>
);

my $profile = qq(
   catalog
      shop
         number   = external_dataset:shop_information
         book
           id     = dataset:title_and_author dataset:title_and_genre
           author = dataset:title_and_author
           title  = dataset:title_and_author dataset:title_and_genre
           genre  = dataset:title_and_genre
           __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

Output

\ {
    title_and_author   [
        [0] {
            author   "Gambardella, Matthew",
            id       "bk101",
            number   1,
            title    "XML Developer's Guide"
        },
        [1] {
            author   "Ralls, Kim",
            id       "bk102",
            number   1,
            title    "Midnight Rain"
        },
        [2] {
            author   "Corets, Eva",
            id       "bk103",
            number   2,
            title    "Maeve Ascendant"
        },
        [3] {
            author   "Corets, Eva",
            id       "bk104",
            number   2,
            title    "Oberon's Legacy"
        }
    ],
    title_and_genre    [
        [0] {
            genre    "Computer",
            id       "bk101",
            number   1,
            title    "XML Developer's Guide"
        },
        [1] {
            genre    "Fantasy",
            id       "bk102",
            number   1,
            title    "Midnight Rain"
        },
        [2] {
            genre    "Fantasy",
            id       "bk103",
            number   2,
            title    "Maeve Ascendant"
        },
        [3] {
            genre    "Fantasy",
            id       "bk104",
            number   2,
            title    "Oberon's Legacy"
        }
    ]
}

Example 5 - Optional Dataset Parameters: name

Overview

Dataset declarations can receive additional parameters as comma-separated inclusions. In this example the XML element 'genre' is renamed to 'style' during processing using the name declaration.

Code

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
   </shop>
</catalog>
);

my $profile = qq(
   catalog
      shop
         number   = external_dataset:shop_information
         book
           id     = dataset:title_and_author dataset:title_and_genre
           author = dataset:title_and_author
           title  = dataset:title_and_author dataset:title_and_genre
           genre  = dataset:title_and_genre,name:style
           __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

Output

\ {
    title_and_author   [
        [0] {
            author   "Gambardella, Matthew",
            id       "bk101",
            number   1,
            title    "XML Developer's Guide"
        },
        [1] {
            author   "Ralls, Kim",
            id       "bk102",
            number   1,
            title    "Midnight Rain"
        },
        [2] {
            author   "Corets, Eva",
            id       "bk103",
            number   2,
            title    "Maeve Ascendant"
        },
        [3] {
            author   "Corets, Eva",
            id       "bk104",
            number   2,
            title    "Oberon's Legacy"
        }
    ],
    title_and_genre    [
        [0] {
            id       "bk101",
            number   1,
            style    "Computer",
            title    "XML Developer's Guide"
        },
        [1] {
            id       "bk102",
            number   1,
            style    "Fantasy",
            title    "Midnight Rain"
        },
        [2] {
            id       "bk103",
            number   2,
            style    "Fantasy",
            title    "Maeve Ascendant"
        },
        [3] {
            id       "bk104",
            number   2,
            style    "Fantasy",
            title    "Oberon's Legacy"
        }
    ]
}

Example 6 - Optional Dataset Parameters: prefix

Overview

The prefix declaration assigns a prefix to the assignment name. For example, genre with a prefix of shop_information_ becomes shop_information_genre.

For consistency, the external value in this example uses the optional :<override_name> parameter described in Example 4, so that number is stored as shop_information_number.

Code

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
   </shop>
</catalog>
);

my $profile = qq(
   catalog
      shop
         number   = external_dataset:shop_information
         book
           id     = dataset:title_and_author,prefix:shop_information_ dataset:title_and_genre,prefix:shop_information_
           author = dataset:title_and_author,prefix:shop_information_
           title  = dataset:title_and_author,prefix:shop_information_ dataset:title_and_genre,prefix:shop_information_
           genre  = dataset:title_and_genre,prefix:shop_information_
           __EXTERNAL_VALUE__ = shop_information:number:title_and_author:shop_information_number shop_information:number:title_and_genre:shop_information_number
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

Output

\ {
    title_and_author   [
        [0] {
            shop_information_author   "Gambardella, Matthew",
            shop_information_id       "bk101",
            shop_information_number   1,
            shop_information_title    "XML Developer's Guide"
        },
        [1] {
            shop_information_author   "Ralls, Kim",
            shop_information_id       "bk102",
            shop_information_number   1,
            shop_information_title    "Midnight Rain"
        },
        [2] {
            shop_information_author   "Corets, Eva",
            shop_information_id       "bk103",
            shop_information_number   2,
            shop_information_title    "Maeve Ascendant"
        },
        [3] {
            shop_information_author   "Corets, Eva",
            shop_information_id       "bk104",
            shop_information_number   2,
            shop_information_title    "Oberon's Legacy"
        }
    ],
    title_and_genre    [
        [0] {
            shop_information_genre    "Computer",
            shop_information_id       "bk101",
            shop_information_number   1,
            shop_information_title    "XML Developer's Guide"
        },
        [1] {
            shop_information_genre    "Fantasy",
            shop_information_id       "bk102",
            shop_information_number   1,
            shop_information_title    "Midnight Rain"
        },
        [2] {
            shop_information_genre    "Fantasy",
            shop_information_id       "bk103",
            shop_information_number   2,
            shop_information_title    "Maeve Ascendant"
        },
        [3] {
            shop_information_genre    "Fantasy",
            shop_information_id       "bk104",
            shop_information_number   2,
            shop_information_title    "Oberon's Legacy"
        }
    ]
}

Example 7 - Optional Dataset Parameters: process

Overview

The process parameter can be used for inline manipulation of data. In this example the author is passed through a simple subroutine that returns an uppercase value.

The parser expects the subroutines named in a process declaration to be available in the main namespace.
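
As a further illustration, a process routine could tidy values rather than change their case. The sketch below defines a hypothetical routine, squash_whitespace, which collapses the line breaks and indentation found inside the description elements of the example XML. It assumes, as return_uc in this example's code does, that the raw value arrives as the first argument and that the returned value is what gets stored.

```perl
use strict;
use warnings;

# Hypothetical process routine (not part of XML::Dataset): collapse runs of
# whitespace, including embedded line breaks, into single spaces and trim
# both ends. Assumes the raw value is passed as the first argument.
sub squash_whitespace {
    my ($value) = @_;
    $value =~ s/\s+/ /g;     # any run of whitespace becomes one space
    $value =~ s/^ | $//g;    # drop a leading or trailing space
    return $value;
}

# Standalone check using a description value from the example XML.
my $raw = "An in-depth look at creating applications \n         with XML.";
print squash_whitespace($raw), "\n";
```

Following the declaration syntax used in this example, such a routine would presumably be referenced in a profile as description = dataset:title_and_author,process:squash_whitespace, provided it is defined in the main namespace.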

Code

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
   </shop>
</catalog>
);

sub return_uc {
   return uc($_[0]);
}

my $profile = qq(
   catalog
      shop
         number   = external_dataset:shop_information
         book
           id     = dataset:title_and_author dataset:title_and_genre
           author = dataset:title_and_author,process:return_uc
           title  = dataset:title_and_author dataset:title_and_genre
           genre  = dataset:title_and_genre
           __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

Output

\ {
    title_and_author   [
        [0] {
            author   "GAMBARDELLA, MATTHEW",
            id       "bk101",
            number   1,
            title    "XML Developer's Guide"
        },
        [1] {
            author   "RALLS, KIM",
            id       "bk102",
            number   1,
            title    "Midnight Rain"
        },
        [2] {
            author   "CORETS, EVA",
            id       "bk103",
            number   2,
            title    "Maeve Ascendant"
        },
        [3] {
            author   "CORETS, EVA",
            id       "bk104",
            number   2,
            title    "Oberon's Legacy"
        }
    ],
    title_and_genre    [
        [0] {
            genre    "Computer",
            id       "bk101",
            number   1,
            title    "XML Developer's Guide"
        },
        [1] {
            genre    "Fantasy",
            id       "bk102",
            number   1,
            title    "Midnight Rain"
        },
        [2] {
            genre    "Fantasy",
            id       "bk103",
            number   2,
            title    "Maeve Ascendant"
        },
        [3] {
            genre    "Fantasy",
            id       "bk104",
            number   2,
            title    "Oberon's Legacy"
        }
    ]
}

Example 8 - Hinting for new datasets

Overview

During processing, the parser looks for indicators that it should create a new dataset. For example, when new data is encountered, a new dataset is created rather than overwriting the existing data. Unfortunately, this may lead to unexpected results with poorly structured input, where subsets of information may be missing from the XML structure.

To mitigate this, the hint __NEW_DATASET__ = <dataset> is available to force the creation of a new dataset upon entering a block.

If there are any concerns about the consistency of the XML document then it is recommended that the __NEW_DATASET__ declaration is made within all respective blocks as part of the profile definition.

Code

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
   </shop>
</catalog>
);

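# Processing callback referenced in the profile as process:return_uc;
# returns the upper-cased form of the value it is given.
sub return_uc {
   return uc($_[0]);
}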

my $profile = qq(
   catalog
      shop
         number   = external_dataset:shop_information
         book
           __NEW_DATASET__ = title_and_author title_and_genre
           id     = dataset:title_and_author dataset:title_and_genre
           author = dataset:title_and_author,process:return_uc
           title  = dataset:title_and_author dataset:title_and_genre
           genre  = dataset:title_and_genre
           __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

Output

\ {
    title_and_author   [
        [0] {
            author   "GAMBARDELLA, MATTHEW",
            id       "bk101",
            number   1,
            title    "XML Developer's Guide"
        },
        [1] {
            author   "RALLS, KIM",
            id       "bk102",
            number   1,
            title    "Midnight Rain"
        },
        [2] {
            author   "CORETS, EVA",
            id       "bk103",
            number   2,
            title    "Maeve Ascendant"
        },
        [3] {
            author   "CORETS, EVA",
            id       "bk104",
            number   2,
            title    "Oberon's Legacy"
        }
    ],
    title_and_genre    [
        [0] {
            genre    "Computer",
            id       "bk101",
            number   1,
            title    "XML Developer's Guide"
        },
        [1] {
            genre    "Fantasy",
            id       "bk102",
            number   1,
            title    "Midnight Rain"
        },
        [2] {
            genre    "Fantasy",
            id       "bk103",
            number   2,
            title    "Maeve Ascendant"
        },
        [3] {
            genre    "Fantasy",
            id       "bk104",
            number   2,
            title    "Oberon's Legacy"
        }
    ]
}

Example 9 - Hinting for higher level data

Overview

There may be occasions where information from a parallel level is required but appears in the document after the desired dataset information. To accommodate this, the __NEW_EXTERNAL_VALUE_HOLDER__ marker is available.

This can be used to create a stub store for the holder before the parser actually processes it. As the module uses aliases internally, the dataset is given a pointer, which is later updated to reflect the appropriate value when the parser reaches it.

The XML example has been updated to include an information section that details the shop location.

__NEW_EXTERNAL_VALUE_HOLDER__ is declared at the corresponding indentation with a value of shop_information:address. This tells the parser to store an externally referenceable marker with a default value of '' -

shop
   __NEW_EXTERNAL_VALUE_HOLDER__ = shop_information:address

The shop_information:address:title_and_author entry under __EXTERNAL_VALUE__ tells the parser to look up the externally stored value and store it in the dataset. At this point, the value stored is the existing default -

__EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre shop_information:address:title_and_author shop_information:address:title_and_genre

The information and address entries, at their respective indentation, tell the parser to update the external_dataset entry for shop_information. This replaces the default value, and the new value is reflected across the desired datasets where applicable.

information
  address = external_dataset:shop_information
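The holder behaviour described above can be pictured with plain scalar references. The sketch below is only an illustration in core Perl, not the module's code: XML::Dataset uses Data::Alias internally rather than explicit references, and $address_holder and @rows are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the external value holder: a shared scalar
# that starts out with the default value ''.
my $address_holder = '';

# Each dataset row keeps a reference to the holder rather than a copy,
# so the rows can be built before the address is known.
my @rows = (
    { id => 'bk101', address => \$address_holder },
    { id => 'bk102', address => \$address_holder },
);

# Later the parser reaches <address> and fills the holder in.
$address_holder = 'Regents Street';

# Every row that points at the holder now sees the final value.
my @seen = map { "$_->{id}: ${ $_->{address} }" } @rows;
print "$_\n" for @seen;
```

Both rows report the address even though it was set after they were created, which is the effect the holder is intended to achieve.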

Code

use XML::Dataset;
use Data::Printer;

my $example_data = qq(<?xml version="1.0"?>
<catalog>
   <shop number="1">
      <book id="bk101">
         <author>Gambardella, Matthew</author>
         <title>XML Developer's Guide</title>
         <genre>Computer</genre>
         <price>44.95</price>
         <publish_date>2000-10-01</publish_date>
         <description>An in-depth look at creating applications 
         with XML.</description>
      </book>
      <book id="bk102">
         <author>Ralls, Kim</author>
         <title>Midnight Rain</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-12-16</publish_date>
         <description>A former architect battles corporate zombies, 
         an evil sorceress, and her own childhood to become queen 
         of the world.</description>
      </book>
      <information>
        <address>Regents Street</address>
      </information>
   </shop>
   <shop number="2">
      <book id="bk103">
         <author>Corets, Eva</author>
         <title>Maeve Ascendant</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2000-11-17</publish_date>
         <description>After the collapse of a nanotechnology 
         society in England, the young survivors lay the 
         foundation for a new society.</description>
      </book>
      <book id="bk104">
         <author>Corets, Eva</author>
         <title>Oberon's Legacy</title>
         <genre>Fantasy</genre>
         <price>5.95</price>
         <publish_date>2001-03-10</publish_date>
         <description>In post-apocalypse England, the mysterious 
         agent known only as Oberon helps to create a new life 
         for the inhabitants of London. Sequel to Maeve 
         Ascendant.</description>
      </book>
      <information>
        <address>Oxford Street</address>
      </information>
   </shop>
</catalog>
);

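# Processing callback referenced in the profile as process:return_uc;
# returns the upper-cased form of the value it is given.
sub return_uc {
   return uc($_[0]);
}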

my $profile = qq(
   catalog
      shop
         __NEW_EXTERNAL_VALUE_HOLDER__ = shop_information:address
         number   = external_dataset:shop_information
         book
           __NEW_DATASET__ = title_and_author title_and_genre
           id     = dataset:title_and_author dataset:title_and_genre
           author = dataset:title_and_author,process:return_uc
           title  = dataset:title_and_author dataset:title_and_genre
           genre  = dataset:title_and_genre
           __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre shop_information:address:title_and_author shop_information:address:title_and_genre
         information
           address = external_dataset:shop_information
);

# Capture the output
my $output = parse_using_profile( $example_data, $profile ); 

# Print using Data::Printer
p $output;

Output

\ {
    title_and_author   [
        [0] {
            address   "Regents Street",
            author    "GAMBARDELLA, MATTHEW",
            id        "bk101",
            number    1,
            title     "XML Developer's Guide"
        },
        [1] {
            address   "Regents Street",
            author    "RALLS, KIM",
            id        "bk102",
            number    1,
            title     "Midnight Rain"
        },
        [2] {
            address   "Oxford Street",
            author    "CORETS, EVA",
            id        "bk103",
            number    2,
            title     "Maeve Ascendant"
        },
        [3] {
            address   "Oxford Street",
            author    "CORETS, EVA",
            id        "bk104",
            number    2,
            title     "Oberon's Legacy"
        }
    ],
    title_and_genre    [
        [0] {
            address   "Regents Street",
            genre     "Computer",
            id        "bk101",
            number    1,
            title     "XML Developer's Guide"
        },
        [1] {
            address   "Regents Street",
            genre     "Fantasy",
            id        "bk102",
            number    1,
            title     "Midnight Rain"
        },
        [2] {
            address   "Oxford Street",
            genre     "Fantasy",
            id        "bk103",
            number    2,
            title     "Maeve Ascendant"
        },
        [3] {
            address   "Oxford Street",
            genre     "Fantasy",
            id        "bk104",
            number    2,
            title     "Oberon's Legacy"
        }
    ]
}

The use of Data::Printer vs Data::Dumper within the examples

I'm a long-time advocate of Data::Dumper, and Data::Printer is also an excellent module. For clarity, the examples use Data::Printer rather than Data::Dumper, owing to the display differences that result from the internal use of Data::Alias.

As an example, here is the output from Example 4 depicted through Data::Dumper and Data::Printer.

It's important to understand the internal structure of the datasets if you plan on making changes to the returned information.

Using Data::Printer

\ {
    title_and_author   [
        [0] {
            author   "Gambardella, Matthew",
            id       "bk101",
            number   1,
            title    "XML Developer's Guide"
        },
        [1] {
            author   "Ralls, Kim",
            id       "bk102",
            number   1,
            title    "Midnight Rain"
        },
        [2] {
            author   "Corets, Eva",
            id       "bk103",
            number   2,
            title    "Maeve Ascendant"
        },
        [3] {
            author   "Corets, Eva",
            id       "bk104",
            number   2,
            title    "Oberon's Legacy"
        }
    ],
    title_and_genre    [
        [0] {
            genre    "Computer",
            id       "bk101",
            number   1,
            title    "XML Developer's Guide"
        },
        [1] {
            genre    "Fantasy",
            id       "bk102",
            number   1,
            title    "Midnight Rain"
        },
        [2] {
            genre    "Fantasy",
            id       "bk103",
            number   2,
            title    "Maeve Ascendant"
        },
        [3] {
            genre    "Fantasy",
            id       "bk104",
            number   2,
            title    "Oberon's Legacy"
        }
    ]
}

Using Data::Dumper

$VAR1 = \{
            'title_and_genre' => [
                                   {
                                     'number' => '1',
                                     'title' => 'XML Developer\'s Guide',
                                     'id' => 'bk101',
                                     'genre' => 'Computer'
                                   },
                                   {
                                     'number' => ${\${$VAR1}->{'title_and_genre'}->[0]->{'number'}},
                                     'title' => 'Midnight Rain',
                                     'id' => 'bk102',
                                     'genre' => 'Fantasy'
                                   },
                                   {
                                     'number' => '2',
                                     'title' => 'Maeve Ascendant',
                                     'id' => 'bk103',
                                     'genre' => 'Fantasy'
                                   },
                                   {
                                     'number' => ${\${$VAR1}->{'title_and_genre'}->[2]->{'number'}},
                                     'title' => 'Oberon\'s Legacy',
                                     'id' => 'bk104',
                                     'genre' => 'Fantasy'
                                   },
                                   {
                                     'number' => '1',
                                     'title' => 'XML Developer\'s Guide',
                                     'id' => 'bk101',
                                     'genre' => 'Computer'
                                   },
                                   {
                                     'number' => ${\${$VAR1}->{'title_and_genre'}->[4]->{'number'}},
                                     'title' => 'Midnight Rain',
                                     'id' => 'bk102',
                                     'genre' => 'Fantasy'
                                   },
                                   {
                                     'number' => '2',
                                     'title' => 'Maeve Ascendant',
                                     'id' => 'bk103',
                                     'genre' => 'Fantasy'
                                   },
                                   {
                                     'number' => ${\${$VAR1}->{'title_and_genre'}->[6]->{'number'}},
                                     'title' => 'Oberon\'s Legacy',
                                     'id' => 'bk104',
                                     'genre' => 'Fantasy'
                                   }
                                 ],
            'title_and_author' => [
                                    {
                                      'number' => ${\${$VAR1}->{'title_and_genre'}->[0]->{'number'}},
                                      'title' => 'XML Developer\'s Guide',
                                      'author' => 'Gambardella, Matthew',
                                      'id' => 'bk101'
                                    },
                                    {
                                      'number' => ${\${$VAR1}->{'title_and_genre'}->[0]->{'number'}},
                                      'title' => 'Midnight Rain',
                                      'author' => 'Ralls, Kim',
                                      'id' => 'bk102'
                                    },
                                    {
                                      'number' => ${\${$VAR1}->{'title_and_genre'}->[2]->{'number'}},
                                      'title' => 'Maeve Ascendant',
                                      'author' => 'Corets, Eva',
                                      'id' => 'bk103'
                                    },
                                    {
                                      'number' => ${\${$VAR1}->{'title_and_genre'}->[2]->{'number'}},
                                      'title' => 'Oberon\'s Legacy',
                                      'author' => 'Corets, Eva',
                                      'id' => 'bk104'
                                    },
                                    {
                                      'number' => ${\${$VAR1}->{'title_and_genre'}->[4]->{'number'}},
                                      'title' => 'XML Developer\'s Guide',
                                      'author' => 'Gambardella, Matthew',
                                      'id' => 'bk101'
                                    },
                                    {
                                      'number' => ${\${$VAR1}->{'title_and_genre'}->[4]->{'number'}},
                                      'title' => 'Midnight Rain',
                                      'author' => 'Ralls, Kim',
                                      'id' => 'bk102'
                                    },
                                    {
                                      'number' => ${\${$VAR1}->{'title_and_genre'}->[6]->{'number'}},
                                      'title' => 'Maeve Ascendant',
                                      'author' => 'Corets, Eva',
                                      'id' => 'bk103'
                                    },
                                    {
                                      'number' => ${\${$VAR1}->{'title_and_genre'}->[6]->{'number'}},
                                      'title' => 'Oberon\'s Legacy',
                                      'author' => 'Corets, Eva',
                                      'id' => 'bk104'
                                    }
                                  ]
          };
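The ${\${$VAR1}->...} entries in the Data::Dumper output indicate values that share a single underlying scalar. As a rough illustration of what that means when modifying returned data, the sketch below creates two aliased hash slots with Perl's experimental refaliasing feature (core Perl 5.22 or later), used here as a stand-in for Data::Alias. The hashes and their names are hypothetical, and this is not how the module builds its datasets.

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

# Two rows whose 'number' slots are the same underlying scalar.
my %first  = ( id => 'bk101', number => 1 );
my %second = ( id => 'bk102' );
\$second{number} = \$first{number};    # alias, not a copy

# Copying a row into a new hash copies the values and drops the alias.
my %detached = %second;

# Changing the shared scalar is visible through both aliased rows.
$first{number} = 99;

print "second:   $second{number}\n";      # aliased, follows the change
print "detached: $detached{number}\n";    # copied earlier, unaffected
```

Copying a row into a fresh hash, as with %detached, is one way to obtain values that can be changed independently of the other rows.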

SEE ALSO

Standing on the shoulders of giants, this module leverages the excellent XML::LibXML::Reader, which is itself built upon the powerful libxml2 library. XML::LibXML::Reader uses an iterator approach to parsing XML documents, which is easier to program than an event-based parser (SAX) and much more lightweight than a tree-based parser (DOM), which loads the complete tree into memory.

This was a particular consideration in the choice of scaffolding for this module.

Data::Alias is utilised internally for lookback operations. The module allows you to apply "aliasing semantics" to a section of code, causing aliases to be made wherever Perl would normally make copies instead. You can use this to improve efficiency and readability, when compared to using references.

AUTHOR

James Spurin <james@spurin.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by James Spurin.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.