The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

XML::Table2XML - Generic conversion of tabular data to XML by reverting Excel's flattener methodology.

SYNOPSIS

        use XML::Table2XML;
        my $outXML = "";
        # first parse column path headers for attribute names, id columns and special common sibling mark ("//")
        parseHeaderForXML("rootNodeName", ['/@id','/@name2','/a']);
        # then walk through the whole data to build the actual XML string into $outXML
        my @datarows = ([1,"testName","testA"],
                                        [1,"testName","testB"],
                                        [1,"testName","testC"]);
        for my $lineData (@datarows) {
                $outXML.=addXMLLine($lineData);
        }
        #finally finish the XML and reset the static vars
        $outXML.=addXMLLine(undef);
        print $outXML;
        # yields:
        # <?xml version="1.0"?>
        # <rootNodeName id="1" name2="testName"><a>testA</a><a>testB</a><a>testC</a></rootNodeName>

DESCRIPTION

table2xml is an algorithm having two functions that allow the conversion of tabular data to XML without using XSLT. This is achieved by reverting the "Flattener" methodology used by Microsoft Excel to convert the XML tree format to a two-dimensional table (see Opening XML Files in Excel and INFO: Microsoft Excel 2002 and XML).

This reversion is achieved by:

1. (possibly) modifying the flattened table a bit to enable a simpler processing of the data,

2. sequentially processing the data column- and row wise.

The whole algorithm is done without the aid of any XML library, so it lends itself to easy translation into other environments and languages.

For producing the XML, first, we need to invoke parseHeaderForXML, using a (the first?) line with the rootnode and path information. After parsing the header info, the table data can be processed row by row by calling addXMLLine. The current data row is provided in the single argument lineData, and the built XML string returned and can be concatenated.. A final call to addXMLLine restores the static variables and finalizes the XML string (closes any open tags).

parseHeaderForXML ($rootNodeName,\@header,$LINEBREAKS,$XMLDIRECTIVE,$ENCODING)

rootNodeName is the name of the common root node. Any /@rootAttributes and /#text will be placed under respectively after this root node.

header is a list of paths denoting the "place" of the data in the targeted XML. Following special cases are allowed:

Plain elements

are denoted by /node/subnode/subsubnode/etc.../elementName

Attributes

are denoted by /node/subnode/subsubnode/etc.../@attributeName

"ID" nodes

are denoted by /node/subnode/subsubnode/etc.../#id (they are not being ouptut)

special common sibling nodes

are denoted by a leading double slash (//) special common sibling nodes are used for nested common sibling nodes (e.g., <root><a><b>test</b></a><otherData>...<root> or <root><a><b>test1</b><z>test2</z></a><otherData>...<root> ) must be located at the beginning of the last node within the nested sibling.

a root text element

is denoted by /#text

root attributes

are given as /@rootNodeAttribute

$LINEBREAKS specifies whether '\n' should be added for easier readablity, default is no linebreaks

$XMLDIRECTIVE specifies any header being inserted before the root element, default is '<?xml version="1.0"?'>.

$ENCODING denotes the Unicode Codification used to encode the string(s) returned by addXMLLine, default is 'iso-8859-1'

$returnedXML = addXMLLine(\@lineData)

lineData is a list of data elements that are converted to XML following the parsed header information.

The produced XML is returned as a function value which can be concatenated or written to a file...

Prerequisites for column order and data layout

  • move the common root (or the common subnode) siblings leftmost of the root (or resp. Subnode)

 <?xml version="1.0"?>
 <root>
 <x z="testAttX">testX</x>
 <a><b><c>TestA1</c>
 <c>TestA2</c>
 <c>TestA3</c>
 <c>TestA4</c></b></a>
 </root>
 
 /root                                          /root
 /a/b/c /x      /x/@z                           /x      /x/@z           /a/b/c
 TestA1 testX   testAttX                        testX   testAttX        TestA1
 TestA2 testX   testAttX        modify to->     testX   testAttX        TestA2
 TestA3 testX   testAttX                        testX   testAttX        TestA3
 TestA4 testX   testAttX                        testX   testAttX        TestA4
 
 <?xml version="1.0"?>
 <root>
 <a><z x="TestB"></z><b><c>TestA1</c>
 <c>TestA2</c>
 <c>TestA3</c>
 <c>TestA4</c></b></a>
 </root>
 
 /root                          /root
 /a/b/c /a/z/@x                 /a/z/@x /a/b/c
 TestA1 TestB                   TestB   TestA1
 TestA2 TestB   modify to->     TestB   TestA2
 TestA3 TestB                   TestB   TestA3
 TestA4 TestB                   TestB   TestA4
  • For nested common sibling nodes (e.g., <root><a><b>test</b></a><otherData>...<root> or <root><a><b>test1</b><c>test2</c></a><otherData>...<root>), write a double slash at the beginning of the last node within the nested sibling. Example (also includes column moving as in the examples above):

 <?xml version="1.0"?>
 <root>
 <a n=""CW""><l c=""oalp""><p v=""A1""></p></l>
 <f c=""oalvl""><p v=""W""></p></f>
 <p n=""target""></p></a>
 <a n=""CD""><l c=""oalp""><p v=""A1""></p></l>
 <f c=""oalvl""><p v=""D""></p></f></a>
 <r><pr v=""TEST""></pr>
 <ar r=""test2""></ar>
 <ar r=""test4""></ar></r>
 </root>
 
 /root
 /a/@n  /a/f/@c /a/f/p/@v       /a/l/@c /a/l/p/@v       /a/p/@n /r/ar/@r        /r/pr/@v
 CW     oalvl   W               oalp    A1              target          
 CD     oalvl   D               oalp    A1                      
                                                                test2           TEST
                                                                test4           TEST
 modify to -->
 
 /root
 /a/@n  /a/l/@c //a/l/p/@v      /a/f/@c //a/f/p/@v      /a/p/@n /r/pr/@v        /r/ar/@r
 CW     oalp    A1              oalvl   W               target          
 CD     oalp    A1              oalvl   D                       
                                                                TEST            test2
                                                                TEST            test4
  • For a first column of a subnode list that is not being a "primary key" column (i.e., having empty cells or continuous equal values), introduce an artificial #id column. Examples:

 <?xml version="1.0"?>
 <root>
 <a x="test1">testA</a>
 <a x="test2"></a>
 </root>
 
 /root                          /root                           
 /a     /a/@x   modify to->     /a/#id  /a      /a/@x           
 testA  test1                   1       testA   test1           
        test2                   2               test2           
 
 <?xml version="1.0"?>
 <root>
 <co><f><a>Numeric</a></f></co>
 <co><f><a>VarChar</a></f></co>
 <co><f><a>VarChar</a></f></co>
 <co><f><a>VarChar</a></f></co>
 <co><f><a>VarChar</a></f></co>
 <co><f><a>DBTimeStamp</a></f></co>
 <co><f><a>VarChar</a><fk>JOB_ID</fk><fl>JOB_TITLE</fl></f></co>
 <co><f><a>Numeric</a><fk>TESTID</fk><fl>TESTn</fl></f></co>
 <co><f><a>Numeric</a></f></co>
 <co><f><a>Numeric</a><fk>EMPLOYEE_ID</fk><fl>FIRST_n</fl></f></co>
 <co><f><a>Numeric</a><fk>DEPARTMENT_ID</fk><fl>DEPARTMENT_n</fl></f></co>
 </root>
 

/root modify to-> /root /co/f/a /co/f/fk /co/f/fl /co/#id /co/f/a /co/f/fk /co/f/fl Numeric 1 Numeric VarChar 2 VarChar VarChar 3 VarChar VarChar 4 VarChar VarChar 5 VarChar DBTimeStamp 6 DBTimeStamp VarChar JOB_ID JOB_TITLE 7 VarChar JOB_ID JOB_TITLE Numeric TESTID TESTn 8 Numeric TESTID TESTn Numeric 9 Numeric Numeric EMPLOYEE_ID FIRST_n 10 Numeric EMPLOYEE_ID FIRST_n Numeric DEPARTMENT_ID DEPARTMENT_n 11 Numeric DEPARTMENT_ID DEPARTMENT_n

AUTHOR

Roland Kapl, roland@kapl.org

COPYRIGHT AND LICENSE

Copyright (C) 2006 by Roland Kapl

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.