NAME
XML::Diff -- XML DOM-Tree based Diff & Patch Module
SYNOPSIS
my $diff = XML::Diff->new();
# To generate a diffgram of two XML files, use compare.
# $old and $new can be filepaths, XML as a string,
# XML::LibXML::Document or XML::LibXML::Element objects.
# The diffgram is an XML::LibXML::Document by default.
my $diffgram = $diff->compare(
-old => $old,
-new => $new,
);
# To patch an XML document, use patch. $old and $diffgram
# follow the same formatting rules as compare.
# The resulting XML is an XML::LibXML::Document by default.
my $patched = $diff->patch(
-old => $old,
-diffgram => $diffgram,
);
DESCRIPTION
This module provides methods for generating and applying an XML diffgram of two related XML files. The basis of the algorithm is tree-wise comparison using the DOM as provided by XML::LibXML.
The diffgram is well-formed XML in the XVCS namespace and supports update, insert, delete and move operations. It is meant to be both human and machine readable. It uses XPath expressions to locate the nodes to operate on. See the DIFFGRAM section below for the exact syntax.
The motivation for this module and the algorithm it uses are discussed in MOTIVATIONS below.
PUBLIC METHODS
new (Constructor)
The constructor takes no arguments. It creates the object on which the compare and patch methods are called.
compare
Compares two XML DOM trees and returns a diffgram for converting one into the other. The default output is an XML::LibXML::Document object. However, there are a number of switches to alter this behavior.
- -old
-
The old document to compare. Can be XML in a string, a path to an XML document, or an XML::LibXML::Document or XML::LibXML::Element object.
- -new
-
The new document to compare. Can be XML in a string, a path to an XML document, or an XML::LibXML::Document or XML::LibXML::Element object.
- -asString
-
If provided, the diffgram is returned via the toString(1) method of XML::LibXML
- -asFile
-
The filepath to write the diffgram to. A filepath must be given as the value of this switch.
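As a rough illustration of the output switches above, the following sketch requests the diffgram first as a string and then as a file. The sample documents, the temporary filepath, and the true value passed to -asString are assumptions made for the example; the description above only says the switch has to be provided.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use XML::Diff;

# Two small sample documents, passed as XML strings (illustrative only).
my $old = '<root><item>one</item></root>';
my $new = '<root><item>two</item></root>';

# Ask for the diffgram as a string instead of an XML::LibXML::Document.
# Passing a true value for -asString is an assumption.
my $diffgram_string = XML::Diff->new()->compare(
    -old      => $old,
    -new      => $new,
    -asString => 1,
);
print $diffgram_string, "\n";

# Write the diffgram to a file. -asFile takes the filepath to write to.
# A fresh object is used so the two calls stay independent of each other.
my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/diffgram.xml";
XML::Diff->new()->compare(
    -old    => $old,
    -new    => $new,
    -asFile => $path,
);
```

The return value of compare when -asFile is used is not described above, so the sketch ignores it.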
patch
Applies a diffgram to an XML document to generate a new XML document. The default output is an XML::LibXML::Document object. However, there are a number of switches to alter this behavior.
- -old
-
The old document to patch. Can be XML in a string, a path to an XML document, or an XML::LibXML::Document or XML::LibXML::Element object.
- -diffgram
-
The diffgram to apply. Can be XML in a string, a path to an XML document, or an XML::LibXML::Document or XML::LibXML::Element object.
- -asString
-
If provided, the new document is returned via the toString(1) method of XML::LibXML
- -asFile
-
The filepath to write the new document to. A filepath must be given as the value of this switch.
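A minimal round-trip sketch, assuming two small sample documents: compare produces a diffgram, and patch applies it to the old document. The sample XML and the true value passed to -asString are assumptions made for the example.

```perl
use strict;
use warnings;
use XML::Diff;

# Sample documents (illustrative only).
my $old = '<root><item>one</item></root>';
my $new = '<root><item>two</item></root>';

my $diff = XML::Diff->new();

# Default output: the diffgram as an XML::LibXML::Document.
my $diffgram = $diff->compare(
    -old => $old,
    -new => $new,
);

# Apply the diffgram to the old document and ask for a string back.
# Passing a true value for -asString is an assumption.
my $patched = $diff->patch(
    -old      => $old,
    -diffgram => $diffgram,
    -asString => 1,
);
print $patched, "\n";
```

If the diffgram applies cleanly, the patched output should carry the content of the new document.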
DIFFGRAM
The diffgram is an XML document in the xvcs namespace. Its root is always <xvcs:diffgram xmlns:xvcs="http://www.xvcs.org/">. Below the root, diff operations are attached in order of application. Order is significant because the default version of the diffgram identifies nodes by XPath expressions: applying an operation may change the XML document in such a way that an XPath expression is not yet valid, or is no longer valid, at a later point in the diffgram (see KNOWN PROBLEMS for a discussion of this limitation).
The supported diffgram operations are:
xvcs:update
The update operation covers a number of sub-operations: it is used for text node changes and for adding, deleting and modifying attributes. An example of a text node change is:
<xvcs:update id="18" first-child-of="/root/block[2]/list/item[2]">
<xvcs:old-value>Old Value</xvcs:old-value>
<xvcs:new-value>New Value</xvcs:new-value>
</xvcs:update>
Attribute updates are:
<xvcs:update id="31" first-child-of="/root/block[5]">
<xvcs:attr-insert name="some_attribute" value="new value"/>
</xvcs:update>
<xvcs:update id="32" first-child-of="/root/block[6]">
<xvcs:attr-insert name="some_attribute2" value="old value"/>
</xvcs:update>
<xvcs:update id="33" first-child-of="/root/block[6]">
<xvcs:attr-update name="some_attribute3"
old-value="old value" new-value="new value"/>
</xvcs:update>
xvcs:delete
<xvcs:delete id="29" follows="/root/block[3]">
<block>
<node>value</node>
</block>
</xvcs:delete>
xvcs:move
<xvcs:move id="11" follows="/root/block[1]">
<xvcs:source first-child-of="/root"/>
</xvcs:move>
xvcs:insert
<xvcs:insert id="34" follows="/root/block[1]">
<block>
<node>value</node>
</block>
</xvcs:insert>
All operations share the same attributes to identify the node they operate on:
- id
-
The xvcs:id of the node affected (currently serves only internal uses)
- follows
-
The XPath to the prior sibling of the node affected. We use relative identification because the destination of an insert or move does not correspond to an existing node location. The other operations follow the same approach for consistency and to allow an operation to be reversed simply.
- first-child-of
-
If the node affected does not have a prior sibling, we use the XPath to its parent and note that the operation affects the first child of that parent.
- text
-
Since XPath does not have an expression for locating a text node, nodes that follow text nodes are identified by the XPath to the prior sibling that is an element, together with the text attribute, which says to skip the next text node before starting the operation.
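The locator attributes described above can be sketched with plain XML::LibXML calls. The helper locate_affected_node below is hypothetical and is not part of XML::Diff; it only illustrates one reading of follows, first-child-of and text, and it assumes that any true value for text means "skip one text node".

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper, not part of XML::Diff: return the node currently
# sitting at the position that an operation's locator attributes describe.
sub locate_affected_node {
    my ( $doc, %attr ) = @_;
    my $node;

    if ( defined $attr{follows} ) {
        # "follows" is the XPath to the prior sibling of the affected node.
        my ($prior) = $doc->findnodes( $attr{follows} );
        return unless $prior;
        $node = $prior->nextSibling;
    }
    elsif ( defined $attr{'first-child-of'} ) {
        # No prior sibling: the affected node is the parent's first child.
        my ($parent) = $doc->findnodes( $attr{'first-child-of'} );
        return unless $parent;
        $node = $parent->firstChild;
    }

    # "text" (assumed to be any true value) skips one text node first.
    if ( $attr{text} && $node && $node->nodeType == XML_TEXT_NODE ) {
        $node = $node->nextSibling;
    }
    return $node;
}

# Sample document with a text node between two elements (illustrative).
my $doc = XML::LibXML->load_xml(
    string => '<root><block>a</block>some text<block>b</block></root>',
);

my $first  = locate_affected_node( $doc, 'first-child-of' => '/root' );
my $text   = locate_affected_node( $doc, follows => '/root/block[1]' );
my $second = locate_affected_node( $doc, follows => '/root/block[1]', text => 1 );

print $first->nodeName,  "\n";
print $text->nodeName,   "\n";
print $second->nodeName, "\n";
```

For an insert or a move destination there is no existing node to return, so a real implementation would use the same lookup to find the insertion point instead.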
KNOWN PROBLEMS
Does not handle any Node Types Other than Element, Attribute and Text
Diffgram operations are not guaranteed to be atomic
Delete Operations on Nodes between two Text nodes are not reversible
MOTIVATIONS
The algorithm used in this module is loosely based on the one described by Gregory Cobena in his doctoral dissertation on XyDiff. The decision to create a new implementation of this algorithm, rather than an XS interface to the existing XyDiff implementation, was based on wanting a Perl implementation with fewer external dependencies and greater flexibility to add divergent features (such as using XPath for node identification rather than XIDs).
PRIVATE METHODS
This section is mostly for reference if you are going through the code. It serves no purpose if you only want to use the exposed interface.
_getDoc
_buildTree
_weightmatch
_propagateMatch
_matchParents
_markChanges
_registerChange
_processChange
_local_move
_setDiff
_attachInstructions
_applyAction
_applyInsert
_insertRegister
_applyUpdate
_applyDelete
_applyMove
_applyMoveUnbind
_applyMoveBind
_debug
AUTHOR
Arne Claassen <sdether@cpan.org>
MAINTAINER
Tim Meadowcroft <timm@cpan.org>
VERSION
0.05
COPYRIGHT
2004, 2007 Arne F. Claassen, All rights reserved.