NAME
XML::Dataset - Extracts XML into Perl Datasets based upon a simple text profile markup language
VERSION
version 0.003
SYNOPSIS
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
</shop>
</catalog>
);
my $profile = qq(
  catalog
    shop
      book
        author = dataset:title_and_author
        title = dataset:title_and_author
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
DESCRIPTION
Provides a simple means of parsing XML to return a selection of information, based on a markup profile that describes the XML structure and how that structure relates to a logical grouping of information (a dataset).
METHODS
parse_using_profile
Parses XML based upon a profile.
Input: XML<string>, Profile<string>
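Judging by the Output sections of the examples, the result can be treated as a hash reference keyed by dataset name, where each dataset is a list of records. The sketch below assumes that shape. It uses a hand-built copy of part of Example 1's output instead of calling parse_using_profile, so it runs without the module installed.

```perl
use strict;
use warnings;

# Hand-built stand-in for what parse_using_profile returned in Example 1
# (shape copied from the Output section; only the first two records).
my $output = {
    title_and_author => [
        { author => 'Gambardella, Matthew', title => "XML Developer's Guide" },
        { author => 'Ralls, Kim',           title => 'Midnight Rain' },
    ],
};

# Each top-level key is a dataset name; each dataset is a list of records.
for my $dataset (sort keys %$output) {
    for my $record (@{ $output->{$dataset} }) {
        print "$dataset: $record->{title} by $record->{author}\n";
    }
}

# Pulling a single column out of one dataset.
my @titles = map { $_->{title} } @{ $output->{title_and_author} };
print join('; ', @titles), "\n";
```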
RATIONALE
I often found myself developing, adjusting and manipulating Perl code, using a variety of packages, to extract XML sources into logical groupings relevant to the underlying data rather than into a Perl structure of the entire XML source.
As well as the initial time spent developing an appropriate construct to parse the source data, any future change to the XML output required further changes to the code base.
I wanted a simpler solution: one where I could use a simple markup language to describe the information of interest, with the option of manipulating the data where desired.
I investigated a number of options available in the Perl community to simplify the overall process. Whilst many excellent options are available, I did not find one that provided the level of simplicity I desired. This module is the result of the effort to fulfill this requirement.
EXAMPLES
Example 1 - Simple Dataset Extraction
Overview
The following example shows the extraction of the title and author information from the example XML document into a dataset called title_and_author.
The XML::Dataset profile follows a similar structure to the XML with elements indented to depict the relationship between entities.
Information that needs to be captured from within an element (or an attribute) is referenced using the <name> = dataset:<dataset_name> syntax, where <name> is the element or attribute whose value is captured.
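To make the indentation rule concrete, here is a minimal sketch that resolves each <name> = dataset:<dataset_name> line of an indented profile to its element path. The profile_paths helper is hypothetical and is not how XML::Dataset parses profiles internally. It only illustrates how indentation expresses nesting.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of XML::Dataset): map each "name = ..." line of
# an indented profile to its element path and the dataset names it feeds.
sub profile_paths {
    my ($profile) = @_;
    my (@stack, %paths);    # @stack holds [ indent, element ] for open blocks
    for my $line (split /\n/, $profile) {
        next unless $line =~ /\S/;    # skip blank lines
        my ($indent, $body) = $line =~ /^(\s*)(.*?)\s*$/;
        my $depth = length $indent;

        # Leaving a block: drop entries at the same or a deeper indentation.
        pop @stack while @stack && $stack[-1][0] >= $depth;

        if ($body =~ /^(\S+)\s*=\s*(.+)$/) {
            my ($name, $decl) = ($1, $2);
            my @datasets;
            for my $token (split ' ', $decl) {
                push @datasets, $1 if $token =~ /^dataset:([^,]+)/;
            }
            my $path = join '/', (map { $_->[1] } @stack), $name;
            $paths{$path} = \@datasets;
        }
        else {
            push @stack, [ $depth, $body ];    # an element that opens a block
        }
    }
    return \%paths;
}

my $profile = qq(
  catalog
    shop
      book
        author = dataset:title_and_author
        title = dataset:title_and_author
);

my $paths = profile_paths($profile);
print "$_ => @{ $paths->{$_} }\n" for sort keys %$paths;
```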
Code
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
</shop>
</catalog>
);
my $profile = qq(
  catalog
    shop
      book
        author = dataset:title_and_author
        title = dataset:title_and_author
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
Output
\ {
title_and_author [
[0] {
author "Gambardella, Matthew",
title "XML Developer's Guide"
},
[1] {
author "Ralls, Kim",
title "Midnight Rain"
},
[2] {
author "Corets, Eva",
title "Maeve Ascendant"
},
[3] {
author "Corets, Eva",
title "Oberon's Legacy"
}
]
}
Example 2 - Working With Multiple Datasets
Overview
This example builds on the previous one by adding a second dataset, title_and_genre. As the profile shows, an element can feed multiple datasets through a space-separated list of declarations; here 'title' is used for both title_and_author and title_and_genre.
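Here is a minimal sketch of how one element's values could be routed into several datasets from such a space-separated list. The dataset_names helper and the hand-built %book hash are hypothetical. The routing mirrors the Output below but is not the module's implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: "dataset:a dataset:b" -> ("a", "b").
sub dataset_names {
    my ($decl) = @_;
    my @names;
    for my $token (split ' ', $decl) {
        push @names, $1 if $token =~ /^dataset:([^,]+)/;
    }
    return @names;
}

# Declarations under "book" in the Example 2 profile.
my %profile = (
    author => 'dataset:title_and_author',
    title  => 'dataset:title_and_author dataset:title_and_genre',
    genre  => 'dataset:title_and_genre',
);

# One <book> element's values (price is not in the profile, so it is ignored).
my %book = (
    author => 'Ralls, Kim',
    title  => 'Midnight Rain',
    genre  => 'Fantasy',
    price  => '5.95',
);

# Copy each profiled value into every dataset it is declared for.
my %record;
for my $field (sort keys %profile) {
    $record{$_}{$field} = $book{$field} for dataset_names($profile{$field});
}

for my $dataset (sort keys %record) {
    my $r = $record{$dataset};
    print "$dataset: ", join(', ', map { $_ . '=' . $r->{$_} } sort keys %$r), "\n";
}
```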
Code
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
</shop>
</catalog>
);
my $profile = qq(
  catalog
    shop
      book
        author = dataset:title_and_author
        title = dataset:title_and_author dataset:title_and_genre
        genre = dataset:title_and_genre
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
Output
\ {
title_and_author [
[0] {
author "Gambardella, Matthew",
title "XML Developer's Guide"
},
[1] {
author "Ralls, Kim",
title "Midnight Rain"
},
[2] {
author "Corets, Eva",
title "Maeve Ascendant"
},
[3] {
author "Corets, Eva",
title "Oberon's Legacy"
}
],
title_and_genre [
[0] {
genre "Computer",
title "XML Developer's Guide"
},
[1] {
genre "Fantasy",
title "Midnight Rain"
},
[2] {
genre "Fantasy",
title "Maeve Ascendant"
},
[3] {
genre "Fantasy",
title "Oberon's Legacy"
}
]
}
Example 3 - Handling XML Attributes
Overview
XML attributes are treated in the profile as sub-level key/value entries of their element. The following example includes the attribute 'id' in the returned datasets. Note how id is indented under book, at the same level as author, title, genre etc.
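One way to picture this is that, beneath an element, attributes and child elements share a single set of names, so a profile can address either in the same way. The hand-built %book structure below is a hypothetical model, not XML::Dataset's internal representation.

```perl
use strict;
use warnings;

# Hypothetical model of <book id="bk101">: its attributes and the text of its
# child elements.
my %book = (
    attributes => { id => 'bk101' },
    children   => {
        author => 'Gambardella, Matthew',
        title  => "XML Developer's Guide",
        genre  => 'Computer',
    },
);

# Names declared directly under "book" for title_and_author in Example 3.
my @wanted = qw(id author title);

# Merge both sources into one set of names beneath the element, which is why
# "id" sits at the same indentation as "author" in the profile.
my %fields = ( %{ $book{children} }, %{ $book{attributes} } );
my %record = map { $_ => $fields{$_} } grep { exists $fields{$_} } @wanted;

print join(', ', map { $_ . '=' . $record{$_} } sort keys %record), "\n";
```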
Code
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
</shop>
</catalog>
);
my $profile = qq(
  catalog
    shop
      book
        id = dataset:title_and_author dataset:title_and_genre
        author = dataset:title_and_author
        title = dataset:title_and_author dataset:title_and_genre
        genre = dataset:title_and_genre
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
Output
\ {
title_and_author [
[0] {
author "Gambardella, Matthew",
id "bk101",
title "XML Developer's Guide"
},
[1] {
author "Ralls, Kim",
id "bk102",
title "Midnight Rain"
},
[2] {
author "Corets, Eva",
id "bk103",
title "Maeve Ascendant"
},
[3] {
author "Corets, Eva",
id "bk104",
title "Oberon's Legacy"
}
],
title_and_genre [
[0] {
genre "Computer",
id "bk101",
title "XML Developer's Guide"
},
[1] {
genre "Fantasy",
id "bk102",
title "Midnight Rain"
},
[2] {
genre "Fantasy",
id "bk103",
title "Maeve Ascendant"
},
[3] {
genre "Fantasy",
id "bk104",
title "Oberon's Legacy"
}
]
}
Example 4 - Using higher level data across datasets
Overview
Information available at a higher level than the dataset information can be referenced and included in datasets using a combination of the external_dataset and __EXTERNAL_VALUE__ markers.
The external_dataset marker tells the parser to store the information for later use. It follows the format external_dataset:<target>, where <target> is a reference name that identifies the external store.
The __EXTERNAL_VALUE__ marker tells the parser to reference a value that is, or will be, stored externally. It follows the format __EXTERNAL_VALUE__ = <external_store>:<external_value>:<target_dataset>, with multiple entries separated by spaces.
Optionally, each entry can take an additional :<override_name> parameter, making the full syntax <external_store>:<external_value>:<target_dataset>:<override_name>
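Here is a minimal sketch of interpreting __EXTERNAL_VALUE__ entries, including the optional override name. The parse_external_value helper, the contents of %external and the override name shop_number are assumptions made for illustration. This is not the module's code.

```perl
use strict;
use warnings;

# Hypothetical parser for one __EXTERNAL_VALUE__ entry in the documented format
# <external_store>:<external_value>:<target_dataset>[:<override_name>].
sub parse_external_value {
    my ($entry) = @_;
    my ($store, $value, $target, $override) = split /:/, $entry;
    return {
        store  => $store,
        value  => $value,
        target => $target,
        key    => defined $override ? $override : $value,    # name used in the dataset
    };
}

# Assumed state of the external store while reading the books of <shop number="2">.
my %external = ( shop_information => { number => 2 } );

# Records being built for book bk103 in each target dataset.
my %records = (
    title_and_author => { id => 'bk103' },
    title_and_genre  => { id => 'bk103' },
);

# The second entry uses a hypothetical override name, shop_number.
my $entries = 'shop_information:number:title_and_author '
            . 'shop_information:number:title_and_genre:shop_number';

for my $entry (split ' ', $entries) {
    my $ev = parse_external_value($entry);
    $records{ $ev->{target} }{ $ev->{key} } = $external{ $ev->{store} }{ $ev->{value} };
}

for my $dataset (sort keys %records) {
    my $r = $records{$dataset};
    print "$dataset: ", join(', ', map { $_ . '=' . $r->{$_} } sort keys %$r), "\n";
}
```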
Code
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
</shop>
</catalog>
);
my $profile = qq(
  catalog
    shop
      number = external_dataset:shop_information
      book
        id = dataset:title_and_author dataset:title_and_genre
        author = dataset:title_and_author
        title = dataset:title_and_author dataset:title_and_genre
        genre = dataset:title_and_genre
        __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
Output
\ {
title_and_author [
[0] {
author "Gambardella, Matthew",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
author "Ralls, Kim",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
author "Corets, Eva",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
author "Corets, Eva",
id "bk104",
number 2,
title "Oberon's Legacy"
}
],
title_and_genre [
[0] {
genre "Computer",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
genre "Fantasy",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
genre "Fantasy",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
genre "Fantasy",
id "bk104",
number 2,
title "Oberon's Legacy"
}
]
}
Example 5 - Optional Dataset Parameters: name
Overview
Dataset declarations can receive additional parameters as comma-separated inclusions. In this example, the XML element 'genre' is renamed to 'style' during processing using the name parameter.
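Here is a sketch of splitting such a declaration into its dataset name and optional parameters. It assumes that parameters are key:value pairs separated by commas, as in the profile below. parse_declaration is a hypothetical helper, not part of the module's API.

```perl
use strict;
use warnings;

# Hypothetical parser for one dataset declaration, e.g.
# "dataset:title_and_genre,name:style". The first comma-separated part names
# the dataset; the rest are optional key:value parameters.
sub parse_declaration {
    my ($decl) = @_;
    my ($first, @params) = split /,/, $decl;
    my (undef, $dataset) = split /:/, $first, 2;
    my %options;
    for my $param (@params) {
        my ($key, $value) = split /:/, $param, 2;
        $options{$key} = $value;
    }
    return ($dataset, \%options);
}

my ($dataset, $options) = parse_declaration('dataset:title_and_genre,name:style');

# With a name parameter, the value is stored under that name instead of "genre".
my $key = defined $options->{name} ? $options->{name} : 'genre';
print "$dataset stores genre as '$key'\n";    # prints: title_and_genre stores genre as 'style'
```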
Code
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
</shop>
</catalog>
);
my $profile = qq(
  catalog
    shop
      number = external_dataset:shop_information
      book
        id = dataset:title_and_author dataset:title_and_genre
        author = dataset:title_and_author
        title = dataset:title_and_author dataset:title_and_genre
        genre = dataset:title_and_genre,name:style
        __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
Output
\ {
title_and_author [
[0] {
author "Gambardella, Matthew",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
author "Ralls, Kim",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
author "Corets, Eva",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
author "Corets, Eva",
id "bk104",
number 2,
title "Oberon's Legacy"
}
],
title_and_genre [
[0] {
id "bk101",
number 1,
style "Computer",
title "XML Developer's Guide"
},
[1] {
id "bk102",
number 1,
style "Fantasy",
title "Midnight Rain"
},
[2] {
id "bk103",
number 2,
style "Fantasy",
title "Maeve Ascendant"
},
[3] {
id "bk104",
number 2,
style "Fantasy",
title "Oberon's Legacy"
}
]
}
Example 6 - Optional Dataset Parameters: prefix
Overview
The prefix parameter adds a prefix to the assigned name; for example, genre with a prefix of shop_information_ becomes shop_information_genre.
For consistency, this example also gives the external value number the optional :<override_name> parameter described in Example 4, so that it appears as shop_information_number.
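The effect on key names can be sketched as follows. prefixed_key is a hypothetical helper. In this sketch the prefix is taken from the dataset declaration, while the external value's key comes from its override name, which matches the Output below.

```perl
use strict;
use warnings;

# Hypothetical helper: the key a field is stored under when its dataset
# declaration carries an optional prefix parameter.
sub prefixed_key {
    my ($element, $decl) = @_;
    my ($prefix) = $decl =~ /,prefix:([^,\s]+)/;
    my $p = defined $prefix ? $prefix : '';
    return $p . $element;
}

my $decl = 'dataset:title_and_author,prefix:shop_information_';
my %book = (
    id     => 'bk101',
    author => 'Gambardella, Matthew',
    title  => "XML Developer's Guide",
);

# Declared fields pick up the prefix from their dataset declaration.
my %record = map { prefixed_key($_, $decl) => $book{$_} } keys %book;

# In this sketch the external value does not see that declaration, so its key
# comes from the :<override_name> part of the __EXTERNAL_VALUE__ entry instead.
my (undef, $value, undef, $override) =
    split /:/, 'shop_information:number:title_and_author:shop_information_number';
$record{ defined $override ? $override : $value } = 1;

print "$_\n" for sort keys %record;
```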
Code
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
</shop>
</catalog>
);
my $profile = qq(
  catalog
    shop
      number = external_dataset:shop_information
      book
        id = dataset:title_and_author,prefix:shop_information_ dataset:title_and_genre,prefix:shop_information_
        author = dataset:title_and_author,prefix:shop_information_
        title = dataset:title_and_author,prefix:shop_information_ dataset:title_and_genre,prefix:shop_information_
        genre = dataset:title_and_genre,prefix:shop_information_
        __EXTERNAL_VALUE__ = shop_information:number:title_and_author:shop_information_number shop_information:number:title_and_genre:shop_information_number
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
Output
\ {
title_and_author [
[0] {
shop_information_author "Gambardella, Matthew",
shop_information_id "bk101",
shop_information_number 1,
shop_information_title "XML Developer's Guide"
},
[1] {
shop_information_author "Ralls, Kim",
shop_information_id "bk102",
shop_information_number 1,
shop_information_title "Midnight Rain"
},
[2] {
shop_information_author "Corets, Eva",
shop_information_id "bk103",
shop_information_number 2,
shop_information_title "Maeve Ascendant"
},
[3] {
shop_information_author "Corets, Eva",
shop_information_id "bk104",
shop_information_number 2,
shop_information_title "Oberon's Legacy"
}
],
title_and_genre [
[0] {
shop_information_genre "Computer",
shop_information_id "bk101",
shop_information_number 1,
shop_information_title "XML Developer's Guide"
},
[1] {
shop_information_genre "Fantasy",
shop_information_id "bk102",
shop_information_number 1,
shop_information_title "Midnight Rain"
},
[2] {
shop_information_genre "Fantasy",
shop_information_id "bk103",
shop_information_number 2,
shop_information_title "Maeve Ascendant"
},
[3] {
shop_information_genre "Fantasy",
shop_information_id "bk104",
shop_information_number 2,
shop_information_title "Oberon's Legacy"
}
]
}
Example 7 - Optional Dataset Parameters: process
Overview
The process parameter can be used for inline manipulation of data. In this example the author is passed through a simple subroutine that returns an uppercase value.
The parser expects subroutines named by the process parameter to be available in the main namespace (package main).
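Here is a sketch of how a process:<sub> parameter could be resolved in package main and applied to an extracted value. apply_process is hypothetical. It only illustrates why return_uc must be visible in main, and it is not the module's lookup code.

```perl
use strict;
use warnings;

# The transformation from Example 7, defined in package main.
sub return_uc {
    return uc($_[0]);
}

# Hypothetical: resolve a process:<sub> name in package main and apply it.
sub apply_process {
    my ($sub_name, $value) = @_;
    my $code = main->can($sub_name)
        or die "process sub '$sub_name' is not defined in package main\n";
    return $code->($value);
}

my $decl = 'dataset:title_and_author,process:return_uc';
my ($process) = $decl =~ /,process:(\w+)/;

print apply_process($process, 'Ralls, Kim'), "\n";    # prints RALLS, KIM
```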
Code
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
</shop>
</catalog>
);
sub return_uc {
    return uc($_[0]);
}
my $profile = qq(
  catalog
    shop
      number = external_dataset:shop_information
      book
        id = dataset:title_and_author dataset:title_and_genre
        author = dataset:title_and_author,process:return_uc
        title = dataset:title_and_author dataset:title_and_genre
        genre = dataset:title_and_genre
        __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
Output
\ {
title_and_author [
[0] {
author "GAMBARDELLA, MATTHEW",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
author "RALLS, KIM",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
author "CORETS, EVA",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
author "CORETS, EVA",
id "bk104",
number 2,
title "Oberon's Legacy"
}
],
title_and_genre [
[0] {
genre "Computer",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
genre "Fantasy",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
genre "Fantasy",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
genre "Fantasy",
id "bk104",
number 2,
title "Oberon's Legacy"
}
]
}
Example 8 - Hinting for new datasets
Overview
During processing, the parser looks for indicators that it should create a new dataset. For example, when a value is encountered for a field that is already populated, a new dataset is created rather than the existing value being overwritten. Unfortunately, this may lead to unexpected results with poorly structured input, where some information is missing from the XML structure.
To mitigate this, the hint __NEW_DATASET__ = <dataset> [<dataset> ...] is available to force the creation of a new dataset for each listed dataset upon entering a block.
If there are any concerns about the consistency of the XML document, it is recommended that __NEW_DATASET__ be declared within every relevant block of the profile.
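The failure mode described above can be reproduced with a simplified model. build_records is hypothetical and far simpler than the real parser. By default, it starts a new record only when a field would be overwritten. With the hint, it also starts one on entering each block.

```perl
use strict;
use warnings;

# Simplified model, not XML::Dataset's code. Events are
# [ start => $element ] or [ field => $name, $value ].
sub build_records {
    my ($events, $new_on_start) = @_;
    my @records = ({});
    for my $ev (@$events) {
        my ($type, @args) = @$ev;
        if ($type eq 'start') {
            # Hint behaviour: entering a block always begins a fresh record.
            push @records, {} if $new_on_start && %{ $records[-1] };
            next;
        }
        my ($name, $value) = @args;
        # Default behaviour: begin a new record only when a value would be overwritten.
        push @records, {} if exists $records[-1]{$name};
        $records[-1]{$name} = $value;
    }
    return \@records;
}

# The first <book> has no <author>, so nothing separates it from the next one.
my @events = (
    [ start => 'book' ], [ field => title  => 'Midnight Rain' ],
    [ start => 'book' ], [ field => author => 'Corets, Eva' ],
    [ field => title => 'Maeve Ascendant' ],
);

my $without = build_records(\@events, 0);
my $with    = build_records(\@events, 1);

for my $case ([ 'default', $without ], [ '__NEW_DATASET__', $with ]) {
    my ($label, $records) = @$case;
    my $first = $records->[0];
    printf "%s: first record is '%s' by %s\n", $label, $first->{title},
        defined $first->{author} ? $first->{author} : '(no author)';
}
```

Without the hint, the second book's author ends up attached to the first book's title. With it, each book keeps its own fields.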
Code
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
</shop>
</catalog>
);
sub return_uc {
    return uc($_[0]);
}
my $profile = qq(
  catalog
    shop
      number = external_dataset:shop_information
      book
        __NEW_DATASET__ = title_and_author title_and_genre
        id = dataset:title_and_author dataset:title_and_genre
        author = dataset:title_and_author,process:return_uc
        title = dataset:title_and_author dataset:title_and_genre
        genre = dataset:title_and_genre
        __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
Output
\ {
title_and_author [
[0] {
author "GAMBARDELLA, MATTHEW",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
author "RALLS, KIM",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
author "CORETS, EVA",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
author "CORETS, EVA",
id "bk104",
number 2,
title "Oberon's Legacy"
}
],
title_and_genre [
[0] {
genre "Computer",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
genre "Fantasy",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
genre "Fantasy",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
genre "Fantasy",
id "bk104",
number 2,
title "Oberon's Legacy"
}
]
}
Example 9 - Hinting for higher level data
Overview
There may be occasions where information from a parallel level is required but appears in the XML after the dataset information that needs it. To accommodate this, the __NEW_EXTERNAL_VALUE_HOLDER__ marker is available.
It creates a placeholder in the external store before the value is actually reached by the parser. As the module uses aliases internally, each dataset holds an alias to this placeholder, which is updated to the appropriate value when the parser reaches it.
The XML example has been updated so that each shop includes an information section, placed after its books, that details the shop's address.
__NEW_EXTERNAL_VALUE_HOLDER__ is declared at the corresponding indentation, under shop, with a value of shop_information:address. This tells the parser to store an externally referenceable marker with a default value of '' -
shop
  __NEW_EXTERNAL_VALUE_HOLDER__ = shop_information:address
The shop_information:address:title_and_author and shop_information:address:title_and_genre entries under __EXTERNAL_VALUE__ tell the parser to look up the externally stored value and store it in each dataset; at this point the stored value is still the default '' -
__EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre shop_information:address:title_and_author shop_information:address:title_and_genre
The information block, with address indented beneath it, tells the parser to update the external_dataset entry for shop_information when it is reached. This replaces the default value, and the new value is reflected in every dataset that referenced it.
information
  address = external_dataset:shop_information
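The placeholder mechanism can be pictured with a small core-Perl sketch. It is an illustration, not the module's code. Scalar references stand in for the Data::Alias aliases the module uses, and the hash names are hypothetical.

```perl
use strict;
use warnings;

# Entering <shop>: create the placeholder with a default of ''
# (the role __NEW_EXTERNAL_VALUE_HOLDER__ plays in the profile).
my %external = ( shop_information => { address => '' } );

# Reading the books: each record points at the same scalar. XML::Dataset uses
# Data::Alias aliases here; core-Perl scalar references stand in for them.
my $slot = \$external{shop_information}{address};
my %title_and_author = ( title => 'Midnight Rain', address => $slot );
my %title_and_genre  = ( title => 'Midnight Rain', address => $slot );

printf "before: '%s'\n", ${ $title_and_author{address} };    # prints before: ''

# Reaching <information><address> later in the shop: update the stored value.
$external{shop_information}{address} = 'Regents Street';

printf "after: '%s' / '%s'\n",
    ${ $title_and_author{address} }, ${ $title_and_genre{address} };
```

With real aliases, as in the module, the records hold the shared value directly and no dereference is needed.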
Code
use XML::Dataset;
use Data::Printer;
my $example_data = qq(<?xml version="1.0"?>
<catalog>
<shop number="1">
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer's Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications
with XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<information>
<address>Regents Street</address>
</information>
</shop>
<shop number="2">
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon's Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<information>
<address>Oxford Street</address>
</information>
</shop>
</catalog>
);
sub return_uc {
    return uc($_[0]);
}
my $profile = qq(
  catalog
    shop
      __NEW_EXTERNAL_VALUE_HOLDER__ = shop_information:address
      number = external_dataset:shop_information
      book
        __NEW_DATASET__ = title_and_author title_and_genre
        id = dataset:title_and_author dataset:title_and_genre
        author = dataset:title_and_author,process:return_uc
        title = dataset:title_and_author dataset:title_and_genre
        genre = dataset:title_and_genre
        __EXTERNAL_VALUE__ = shop_information:number:title_and_author shop_information:number:title_and_genre shop_information:address:title_and_author shop_information:address:title_and_genre
      information
        address = external_dataset:shop_information
);
# Capture the output
my $output = parse_using_profile( $example_data, $profile );
# Print using Data::Printer
p $output;
Output
\ {
title_and_author [
[0] {
address "Regents Street",
author "GAMBARDELLA, MATTHEW",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
address "Regents Street",
author "RALLS, KIM",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
address "Oxford Street",
author "CORETS, EVA",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
address "Oxford Street",
author "CORETS, EVA",
id "bk104",
number 2,
title "Oberon's Legacy"
}
],
title_and_genre [
[0] {
address "Regents Street",
genre "Computer",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
address "Regents Street",
genre "Fantasy",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
address "Oxford Street",
genre "Fantasy",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
address "Oxford Street",
genre "Fantasy",
id "bk104",
number 2,
title "Oberon's Legacy"
}
]
}
The use of Data::Printer vs Data::Dumper within the examples
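I'm a long-time advocate of Data::Dumper, and Data::Printer is also an excellent module. In the examples, Data::Printer was chosen over Data::Dumper for clarity, owing to the display differences that result from the internal use of Data::Alias: values shared between datasets, such as number, are aliases, and Data::Dumper renders the repeats as references back into the structure rather than as plain values.
As an example, here is the output from Example 4 as displayed by Data::Printer and by Data::Dumper.
It's important to understand this internal structure if you plan on changing the returned information: because aliased values are shared, modifying one in place may also change it in every other entry that shares it.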
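Here is a core-Perl sketch of why in-place changes need care. It assumes the shared values behave like ordinary Perl aliases, as the repeated ${\${$VAR1}...} references in the Data::Dumper output suggest. It uses the sub { \@_ } idiom rather than Data::Alias, so it runs without extra modules.

```perl
use strict;
use warnings;

my $number = 1;

# sub { \@_ } returns the argument list itself, whose elements are aliases of
# the variables passed in: a core-Perl stand-in for the sharing Data::Alias
# provides. Both entries below are the very same scalar as $number.
my $shared = sub { \@_ }->($number, $number);

$shared->[0] = 2;                        # write through the first alias...
print "$shared->[1] $number\n";          # ...and the others see it: prints "2 2"

# Copying the values out first gives independent scalars that are safe to change.
my @copy = @$shared;
$copy[0] = 3;
print "$copy[0] $shared->[0]\n";         # prints "3 2"
```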
Using Data::Printer
\ {
title_and_author [
[0] {
author "Gambardella, Matthew",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
author "Ralls, Kim",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
author "Corets, Eva",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
author "Corets, Eva",
id "bk104",
number 2,
title "Oberon's Legacy"
}
],
title_and_genre [
[0] {
genre "Computer",
id "bk101",
number 1,
title "XML Developer's Guide"
},
[1] {
genre "Fantasy",
id "bk102",
number 1,
title "Midnight Rain"
},
[2] {
genre "Fantasy",
id "bk103",
number 2,
title "Maeve Ascendant"
},
[3] {
genre "Fantasy",
id "bk104",
number 2,
title "Oberon's Legacy"
}
]
}
Using Data::Dumper
$VAR1 = \{
'title_and_genre' => [
{
'number' => '1',
'title' => 'XML Developer\'s Guide',
'id' => 'bk101',
'genre' => 'Computer'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[0]->{'number'}},
'title' => 'Midnight Rain',
'id' => 'bk102',
'genre' => 'Fantasy'
},
{
'number' => '2',
'title' => 'Maeve Ascendant',
'id' => 'bk103',
'genre' => 'Fantasy'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[2]->{'number'}},
'title' => 'Oberon\'s Legacy',
'id' => 'bk104',
'genre' => 'Fantasy'
},
{
'number' => '1',
'title' => 'XML Developer\'s Guide',
'id' => 'bk101',
'genre' => 'Computer'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[4]->{'number'}},
'title' => 'Midnight Rain',
'id' => 'bk102',
'genre' => 'Fantasy'
},
{
'number' => '2',
'title' => 'Maeve Ascendant',
'id' => 'bk103',
'genre' => 'Fantasy'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[6]->{'number'}},
'title' => 'Oberon\'s Legacy',
'id' => 'bk104',
'genre' => 'Fantasy'
}
],
'title_and_author' => [
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[0]->{'number'}},
'title' => 'XML Developer\'s Guide',
'author' => 'Gambardella, Matthew',
'id' => 'bk101'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[0]->{'number'}},
'title' => 'Midnight Rain',
'author' => 'Ralls, Kim',
'id' => 'bk102'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[2]->{'number'}},
'title' => 'Maeve Ascendant',
'author' => 'Corets, Eva',
'id' => 'bk103'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[2]->{'number'}},
'title' => 'Oberon\'s Legacy',
'author' => 'Corets, Eva',
'id' => 'bk104'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[4]->{'number'}},
'title' => 'XML Developer\'s Guide',
'author' => 'Gambardella, Matthew',
'id' => 'bk101'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[4]->{'number'}},
'title' => 'Midnight Rain',
'author' => 'Ralls, Kim',
'id' => 'bk102'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[6]->{'number'}},
'title' => 'Maeve Ascendant',
'author' => 'Corets, Eva',
'id' => 'bk103'
},
{
'number' => ${\${$VAR1}->{'title_and_genre'}->[6]->{'number'}},
'title' => 'Oberon\'s Legacy',
'author' => 'Corets, Eva',
'id' => 'bk104'
}
]
};
SEE ALSO
Standing on the shoulders of giants, this module leverages the excellent XML::LibXML::Reader, which is itself built upon the powerful libxml2 library. XML::LibXML::Reader uses an iterator approach to parsing XML documents, which is easier to program than an event-based parser (SAX) and much more lightweight than a tree-based parser (DOM), which loads the complete tree into memory.
This was a key consideration in choosing the foundation for this module.
Data::Alias is utilised internally for lookback operations. The module allows you to apply "aliasing semantics" to a section of code, causing aliases to be made wherever Perl would normally make copies. You can use this to improve efficiency and readability compared to using references.
THANKS
Thanks to the following for support, advice and feedback -
AUTHOR
James Spurin <james@spurin.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by James Spurin.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.