NAME

XML::DifferenceMarkup

SYNOPSIS

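use XML::LibXML;
use XML::DifferenceMarkup qw(make_diff);

my $parser = XML::LibXML->new();
# discard whitespace-only text nodes so that indentation changes do not show up in the diff
$parser->keep_blanks(0);
my $d1 = $parser->parse_file($fname1);
my $d2 = $parser->parse_file($fname2);

# the result is itself an XML::LibXML document
my $dom = make_diff($d1, $d2);
print $dom->toString(1);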

REQUIRES

XML::LibXML, Algorithm::Diff

DESCRIPTION

This module implements an XML diff that produces XML output. Both the input and the output of make_diff (the only function exported by the module) are DOM documents, as implemented by XML::LibXML. The output format is meant to be human-readable (i.e. simple, as opposed to short): basically, the diff is a subset of the input trees, annotated with instruction element nodes that specify how to convert the source tree to the target by inserting and deleting nodes. To prevent name collisions with the input trees, all added elements are in the namespace http://www.locus.cz/XML/DifferenceMarkup (the diff will fail on input trees which already use that namespace).
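Because the diff fails on input that already uses the reserved namespace, it can be useful to check documents before passing them to make_diff. The following is only a sketch: uses_dm_namespace is a hypothetical helper, not part of the module, and it assumes that "using" the namespace means having at least one element or attribute in it (a namespace that is declared but never used is not detected).

```perl
use strict;
use warnings;
use XML::LibXML;

# Namespace reserved for the diff's instruction elements (from the description above).
my $DM_NS = 'http://www.locus.cz/XML/DifferenceMarkup';

# Hypothetical helper (not part of XML::DifferenceMarkup): returns 1 if any
# element or attribute of $doc is in the reserved namespace, 0 otherwise.
sub uses_dm_namespace {
    my ($doc) = @_;
    my $xpath = sprintf
        '//*[namespace-uri() = "%s"] | //@*[namespace-uri() = "%s"]',
        $DM_NS, $DM_NS;
    my @hits = $doc->findnodes($xpath);
    return @hits ? 1 : 0;
}

my $parser = XML::LibXML->new();
my $plain  = $parser->parse_string('<a><b/></a>');
my $clash  = $parser->parse_string(
    qq{<a xmlns:dm="$DM_NS"><dm:copy count="1"/></a>});

for my $doc ($plain, $clash) {
    print uses_dm_namespace($doc) ? "clash\n" : "ok\n";
}
```

A caller could run this check on both inputs and refuse to diff them if either one reports a clash.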

The top-level node of the diff is always <diff/> (or rather <dm:diff xmlns:dm="http://www.locus.cz/XML/DifferenceMarkup"> ... </dm:diff> - this description elides the namespace specification from now on); under it are fragments of the input trees and instruction nodes: <insert/>, <delete/> and <copy/>. <copy/> is used in places where the input subtrees are the same - in the limit, the diff of 2 identical documents is

<?xml version="1.0"?>
<dm:diff xmlns:dm="http://www.locus.cz/XML/DifferenceMarkup">
  <dm:copy count="1"/>
</dm:diff>

(copy always has the count attribute and nothing else). <insert/> and <delete/> have the obvious meaning - in the limit a diff of 2 documents which have nothing in common is something like

<?xml version="1.0"?>
<dm:diff xmlns:dm="http://www.locus.cz/XML/DifferenceMarkup">
  <dm:delete>
    <old/>
  </dm:delete>
  <dm:insert>
    <new>
      <tree>with the whole subtree, of course</tree>
    </new>
  </dm:insert>
</dm:diff>

Note that <delete/> contains just one level of nested nodes - their subtrees are not included in the diff (but the element nodes which are included always come with all their attributes).
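Since the diff is an ordinary XML::LibXML document, it can be inspected with XPath once the dm prefix is registered. The sketch below parses the example diff shown above and lists the elements directly under each instruction node; instruction_payload is a hypothetical helper introduced only for illustration.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

my $DM_NS = 'http://www.locus.cz/XML/DifferenceMarkup';

# The diff of two unrelated documents, copied from the example above.
my $diff_xml = <<"XML";
<?xml version="1.0"?>
<dm:diff xmlns:dm="$DM_NS">
  <dm:delete>
    <old/>
  </dm:delete>
  <dm:insert>
    <new>
      <tree>with the whole subtree, of course</tree>
    </new>
  </dm:insert>
</dm:diff>
XML

my $parser = XML::LibXML->new();
$parser->keep_blanks(0);
my $diff = $parser->parse_string($diff_xml);

# XPath needs an explicit prefix binding to address namespaced elements.
my $xpc = XML::LibXML::XPathContext->new($diff);
$xpc->registerNs(dm => $DM_NS);

# Hypothetical helper: names of the elements that are direct children of
# every instruction node of the given kind ('insert' or 'delete').
sub instruction_payload {
    my ($xpc, $kind) = @_;
    return map { $_->nodeName } $xpc->findnodes("//dm:$kind/*");
}

my @deleted  = instruction_payload($xpc, 'delete');
my @inserted = instruction_payload($xpc, 'insert');
print "deleted: @deleted\n";
print "inserted: @inserted\n";
```

Only the direct children of an instruction node are listed, which matches the one-level content of <delete/>; for <insert/> the nested subtree remains available through the returned nodes' descendants.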

Instruction nodes are never nested; all nodes above an instruction node (except the top-level <diff/>) come from the input trees. A node from the input tree is included in the output diff to provide context for instruction nodes when all of the following are true:

it's an element node
it has the same name in both input trees
it has the same attributes (both names and values)
its subtree is not the same

The last condition guarantees that the "contextual" nodes always contain at least one instruction node.
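The four conditions above can be sketched as a predicate over a pair of corresponding nodes. This is an illustration, not the module's implementation: is_context_pair, attrs_of and subtree_string are hypothetical names, subtrees are assumed to be "the same" when their children serialize to identical strings, and attributes are compared without regard to order, which may differ from what the module actually does.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: attribute name => value, skipping namespace declarations.
sub attrs_of {
    my ($node) = @_;
    return { map  { $_->nodeName => $_->value }
             grep { $_->isa('XML::LibXML::Attr') } $node->attributes };
}

# Hypothetical helper: serialized children, used as a stand-in for
# subtree equality (an assumption of this sketch).
sub subtree_string {
    my ($node) = @_;
    return join '', map { $_->toString } $node->childNodes;
}

# Returns 1 if the pair satisfies all four conditions, 0 otherwise.
sub is_context_pair {
    my ($m, $n) = @_;
    # 1. both are element nodes
    return 0 unless $m->isa('XML::LibXML::Element')
                 && $n->isa('XML::LibXML::Element');
    # 2. same name
    return 0 unless $m->nodeName eq $n->nodeName;
    # 3. same attributes (names and values)
    my ($am, $an) = (attrs_of($m), attrs_of($n));
    return 0 unless keys %$am == keys %$an;
    for my $name (keys %$am) {
        return 0 unless exists $an->{$name} && $an->{$name} eq $am->{$name};
    }
    # 4. the subtrees differ
    return subtree_string($m) ne subtree_string($n) ? 1 : 0;
}

my $parser = XML::LibXML->new();
my $old   = $parser->parse_string('<list id="1"><item>a</item></list>')->documentElement;
my $new   = $parser->parse_string('<list id="1"><item>b</item></list>')->documentElement;
my $other = $parser->parse_string('<list id="2"><item>b</item></list>')->documentElement;

print is_context_pair($old, $new),   "\n";  # same name and attributes, different subtree
print is_context_pair($old, $other), "\n";  # attribute value differs
print is_context_pair($new, $new),   "\n";  # identical subtree
```

The last check is the one that guarantees a contextual node has something to wrap: a pair with identical subtrees fails the predicate rather than becoming context.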

BUGS

the diff does not handle changes in attribute ordering
the diff format has no merge: the module provides no function for applying a diff to a document
information outside the document element is not processed

AUTHOR

Vaclav Barta <vbar@comp.cz>
