NAME
PApp::XML - pxml sections and more
SYNOPSIS
use PApp::XML;
DESCRIPTION
Apart from providing XML convenience functions, the PApp::XML module manages XML templates containing pappxml directives and perl code similar to phtml sections. Together with stylesheets (PApp::XSLT) this can be used to almost totally separate content from layout. Imagine a database containing XML documents with customized tags. A stylesheet can then be used to transform such an XML document into HTML plus special pappxml directives that can be used to create links etc.
Functions for XML-Generation
- xml_quote $string
Quotes (and returns) the given string so that its contents won't be interpreted by an XML parser (quotes ', ", <, & and > to avoid ]]>). Example:
print xml_quote q( <xx> & <[[]]> );
=> &lt;xx> &amp; &lt;[[]]&gt;
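To make the quoting concrete, here is a minimal stand-alone sketch. The helper quote_xml_text is hypothetical and is not PApp::XML's code. It is deliberately strict: it replaces every >, not just the one in ]]>, so its output may differ from what xml_quote itself produces.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PApp::XML: entity table for the five
# characters mentioned above.
my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
              '"' => '&quot;', "'" => '&apos;');

sub quote_xml_text {
    my ($s) = @_;
    # A single pass over the string, so the "&" of freshly inserted
    # entities is never quoted a second time.
    $s =~ s/([&<>"'])/$entity{$1}/g;
    return $s;
}

print quote_xml_text(q( <xx> & <[[]]> )), "\n";
```

A single substitution driven by a lookup table avoids the ordering pitfall of chained replacements, where & would have to be handled first.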
- xml_cdata $string
Does the same thing as xml_quote, but uses CDATA constructs rather than quoting individual characters. Example:
print xml_cdata q(hi ]]> there);
=> <![CDATA[hi ]]]]><![CDATA[> there]]>
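The trick behind the example above is to split any ]]> across two CDATA sections. A minimal sketch, assuming a hypothetical helper cdata_wrap that is not PApp::XML's implementation:

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the CDATA splitting trick.
sub cdata_wrap {
    my ($s) = @_;
    # A literal "]]>" would terminate the section early, so close the section
    # after "]]" and reopen a new one that starts with ">".
    $s =~ s/\]\]>/]]]]><![CDATA[>/g;
    return '<![CDATA[' . $s . ']]>';
}

print cdata_wrap(q(hi ]]> there)), "\n";
```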
- xml_unquote $string
Unquotes (and returns) an XML string by resolving its entities and CDATA sections. Currently, only the named predefined XML entities and numerical character entities are resolved. Everything else is silently ignored. Example:
print xml_unquote q( <![CDATA[text1]]> &amp; text2! );
=> text1 & text2!
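The unquoting rules can be sketched as a left-to-right scan. This sketch rests on a few assumptions. The helper unquote_xml is hypothetical. It resolves CDATA sections, the five predefined entities and numeric character references. It leaves unknown entities untouched, which is a choice of this sketch and not necessarily what xml_unquote does.

```perl
use strict;
use warnings;

my %named = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");

# Hypothetical helper, not PApp::XML's implementation.
sub unquote_xml {
    my ($s) = @_;
    my $out = '';
    # Each alternative consumes at least one character, so the loop ends
    # once the whole string has been scanned.
    while ($s =~ /\G(?:<!\[CDATA\[(.*?)\]\]>|&\#x([0-9A-Fa-f]+);|&\#([0-9]+);|&(\w+);|([^&<]+|[&<]))/gcs) {
        if    (defined $1) { $out .= $1 }             # CDATA content, verbatim
        elsif (defined $2) { $out .= chr hex $2 }     # &#xHH;
        elsif (defined $3) { $out .= chr $3 }         # &#DD;
        elsif (defined $4) {                          # named entity
            $out .= exists $named{$4} ? $named{$4} : "&$4;";
        }
        else               { $out .= $5 }             # ordinary text
    }
    return $out;
}

print unquote_xml(q( <![CDATA[text1]]> &amp; text2! )), "\n";
```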
- xml_attr $attr => $value [, $attr2 => $value2, ...]
Returns fully quoted $attr => $value pairs. Example:
print xml_attr authors => q(Alan Cox & Linus "kubys" Torvalds);
=> authors="Alan Cox &amp; Linus &quot;kubys&quot; Torvalds"
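A minimal sketch of attribute formatting, assuming a hypothetical helper attr_pairs. Only the values are quoted; the names are assumed to be valid XML names already. The exact spacing may differ from xml_attr's.

```perl
use strict;
use warnings;

my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
              '"' => '&quot;', "'" => '&apos;');

# Hypothetical helper: formats name => value pairs as quoted XML attributes.
sub attr_pairs {
    my @pairs = @_;
    my @out;
    while (my ($name, $value) = splice @pairs, 0, 2) {
        $value =~ s/([&<>"'])/$entity{$1}/g;   # quote the value, not the name
        push @out, qq{$name="$value"};
    }
    return join ' ', @out;
}

print attr_pairs(authors => q(Alan Cox & Linus "kubys" Torvalds)), "\n";
```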
- xml_tag $element_name, [$attr => $value, ...] [, $content_or_undef]
Generates a tag from the given element name, attribute name => value pairs and content. If the content is undef, an empty tag will be generated. Example:
print xml_tag "p", align => "center", undef;
=> <p align="center"/>
As a very special courtesy hack for you, if you omit the content argument entirely, only an opening tag will be generated.
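The three cases described above are content given, content undef and content omitted. One plausible way to tell them apart is to count the arguments after the element name: an odd count means the last one is content. The sketch below assumes that rule; it is not confirmed to be how xml_tag works. The helper make_tag is hypothetical and inserts content unquoted, on the assumption that the caller passes well-formed XML.

```perl
use strict;
use warnings;

my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
              '"' => '&quot;', "'" => '&apos;');

# Hypothetical helper: make_tag($name, attr => value, ... [, $content]).
sub make_tag {
    my ($name, @rest) = @_;
    # Assumption: an odd number of trailing arguments means the last one is content.
    my $has_content = @rest % 2;
    my $content = $has_content ? pop @rest : undef;

    my $tag = "<$name";
    while (my ($k, $v) = splice @rest, 0, 2) {
        $v =~ s/([&<>"'])/$entity{$1}/g;
        $tag .= qq{ $k="$v"};
    }
    return "$tag>"  unless $has_content;       # content omitted: opening tag only
    return "$tag/>" unless defined $content;   # content undef: empty-element tag
    return "$tag>$content</$name>";            # content is inserted as-is
}

print make_tag("p", align => "center", undef), "\n";
```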
Functions for Analyzing XML
- ($msg, $line, $col, $byte) = xml_check $string [, $prolog, $epilog]
Checks whether the given document is well-formed (as opposed to valid). This merely tries to parse the string as an XML document. Nothing is returned if the document is well-formed.
Otherwise it returns the error message, line (one-based), column (zero-based) and character position (zero-based) of the point where the error occurred.
The optional argument $prolog is prepended to the string, while $epilog is appended (i.e. the document is "$prolog$string$epilog"). The cool thing is that the prolog and epilog strings are not counted in the error position (and yes, they should be free of any errors!).
(Hint: Remember to utf8_upgrade before calling this function or make sure that an encoding is given in the XML declaration.)
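The sketch below shows how an error position can be reported relative to $string even though the parser saw "$prolog$string$epilog". relative_position is a hypothetical helper that only does the arithmetic. It assumes the parser reported a zero-based character offset into the combined text.

```perl
use strict;
use warnings;

# Hypothetical helper: maps a zero-based offset into "$prolog$string$epilog"
# to (line, column, offset) relative to $string alone.
sub relative_position {
    my ($prolog, $string, $abs_offset) = @_;
    my $off = $abs_offset - length $prolog;
    return if $off < 0 || $off > length $string;   # error lies in the prolog or epilog
    my $before = substr $string, 0, $off;
    my $line = 1 + ($before =~ tr/\n//);           # one-based line
    my $col  = $off - (rindex($before, "\n") + 1); # zero-based column
    return ($line, $col, $off);
}

# The "&" in "a\nbc&d" sits at offset 7 of "<r>a\nbc&d".
my ($line, $col, $off) = relative_position('<r>', "a\nbc&d", 7);
print "line $line, column $col, offset $off\n";
```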
- xml_errorparser $xml, [$offset, $message]
This function takes a slightly damaged XML document or fragment and tries to repair it. During this process it annotates many errors with error messages in <error> elements. It also offers the option of adding a custom error message around the specified offset in the file.
This function currently works best with HTML or HTML-like input, and tries very hard not to place error messages at places where they won't be visible.
The result should be parseable by XML parsers, but be warned that not every case will be fixed.
- xml_encoding xml-string [DEPRECATED]
Convenience function to detect the encoding used by the given XML string. It uses a variety of heuristics (mainly as given in appendix F of the XML specification). UCS-4 and UTF-16 are ignored, mainly because I don't want to get into the byte-swapping business (maybe write an interface module for gconv?). The XML declaration itself is ignored.
Functions for Modifying XML
- ($version, $encoding, $standalone) = xml_remove_decl $xml[, $encoding]
Remove the XML declaration, if any, from the given string and return the info. If the declaration is missing,
("1.0", $encoding || xml_encoding(), "yes")
is returned.
- ($version, $encoding, $standalone) = xml2utf8 xml-string[, encoding]
Tries to convert the given string into UTF-8 (in place). Currently only supports UTF-8 and ISO-8859-1, but could easily be extended to handle everything Expat can. Uses xml_encoding to autodetect the encoding unless an explicit encoding argument is given. It returns the XML declaration parameters (where encoding is always utf-8). The XML declaration itself will be removed from the string.
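The sketch below covers the two steps described for xml_remove_decl and xml2utf8: strip the declaration, then decode the text. It uses the core Encode module. to_utf8_sketch is hypothetical and differs from xml2utf8 in three ways:

- It returns the converted text instead of modifying its argument in place.
- It falls back to UTF-8 instead of calling xml_encoding.
- Its result is a Perl character string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: returns ($text, $version, 'utf-8', $standalone).
sub to_utf8_sketch {
    my ($xml, $encoding) = @_;
    my ($version, $standalone) = ('1.0', 'yes');   # defaults when no declaration exists

    # Strip a leading XML declaration and pick up its pseudo-attributes.
    if ($xml =~ s/^<\?xml\s+(.*?)\?>//s) {
        my $decl = $1;
        $version    = $1 if $decl =~ /version\s*=\s*["']([^"']*)["']/;
        $encoding ||= $1 if $decl =~ /encoding\s*=\s*["']([^"']*)["']/;  # explicit argument wins
        $standalone = $1 if $decl =~ /standalone\s*=\s*["']([^"']*)["']/;
    }
    $encoding ||= 'UTF-8';   # assumption: stands in for xml_encoding's autodetection

    return (decode($encoding, $xml), $version, 'utf-8', $standalone);
}
```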
- expand_pi $xml, { pi => coderef, pi2 => coderef... }
Takes an XML string and expands all processing instructions given in the second argument by calling the respective coderef. The resulting string is returned.
The (single) argument to the coderef is the (unquoted) argument.
This function uses a regex (without backtracking in the common case) and should be fast.
For example, to execute SQL commands using sql processing instructions, use something like this:
Test xml string: <?sql select id from table where mtime = 7?>
$expanded = expand_pi $xml, {
   sql => sub {
      xml_quote join "", sql_ufetch $_[0];
   },
};
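A regex-based sketch of the same idea, using the hypothetical helper expand_pis. Unlike expand_pi, it passes the processing instruction's argument through raw, without any unquoting.

```perl
use strict;
use warnings;

# Hypothetical helper in the spirit of expand_pi: replaces every processing
# instruction whose target has a handler with that handler's return value.
sub expand_pis {
    my ($xml, $handlers) = @_;
    return $xml unless %$handlers;
    # Alternation of the handled targets; quotemeta guards against regex metacharacters.
    my $targets = join '|', map { quotemeta } keys %$handlers;
    $xml =~ s{<\?($targets)(?:\s+(.*?))?\?>}
             { $handlers->{$1}->(defined $2 ? $2 : '') }gse;
    return $xml;
}

print expand_pis('a<?up hello?>b', { up => sub { uc $_[0] } }), "\n";
```

Processing instructions without a handler are left as they are.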
- xml_include $document, $base [, $uri_handler($uri, $base) ]
Expand any xinclude:include elements in the given $document by passing the href attribute (as a URI object) and the current base URI to the $uri_handler. The $uri_handler should fetch the document and return it (or undef on error). Example (see http://www.w3.org/TR/xinclude/ for the definition of xinclude):
<document xmlns:xinclude="http://www.w3.org/2001/XInclude">
   <xinclude:include href="http://some.host/otherdoc.xml"/>
   <xinclude:include href="/etc/passwd" parse="text"/>
</document>
The result of running xml_include on this document will have the first include element replaced by the document element (and its contents) of http://some.host/otherdoc.xml and the second include element replaced by a (correctly quoted) copy of your /etc/passwd file.
Another common example is embedding stylesheet fragments into larger stylesheets. Using xinclude for these cases is faster than XSL's include/import mechanism, since xinclude expansion can be done after loading the file, while XSL's include mechanism is evaluated on every parse.
<include xmlns="http://www.w3.org/2001/XInclude" href="style/xtable.xsl" parse="verbatim"/>
At the moment this function always returns UTF-8 documents, regardless of the input encoding used (included text is inserted as is; any conversion must be done in the URI handler).
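A heavily simplified, regex-based sketch of the expansion may help. It is not how xml_include is implemented and rests on several assumptions:

- The literal prefix "xinclude:" is used.
- Include elements are always empty.
- The hypothetical helper expand_includes hands its callback only the href string, with no URI object and no base URI.
- parse="verbatim" and parse="pxml" are treated like parse="xml".

```perl
use strict;
use warnings;

my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;');

# Hypothetical, regex-based sketch of xinclude expansion.
sub expand_includes {
    my ($doc, $handler) = @_;
    $doc =~ s{(<xinclude:include\s+([^>]*?)/>)}{
        my ($element, $attrs) = ($1, $2);
        my ($href)  = $attrs =~ /\bhref="([^"]*)"/;
        my ($parse) = $attrs =~ /\bparse="([^"]*)"/;
        $parse = 'xml' unless defined $parse;
        my $data = defined $href ? $handler->($href) : undef;
        my $out;
        if    (!defined $data)   { $out = $element }                                # keep element on error
        elsif ($parse eq 'text') { ($out = $data) =~ s/([&<>])/$entity{$1}/g }      # quote as text
        else                     { ($out = $data) =~ s/^<\?xml.*?\?>\s*//s }        # drop the declaration
        $out;
    }ge;
    return $doc;
}
```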
This function does not conform to http://www.w3.org/TR/xmlbase/.
In addition to parse="xml" and parse="text", this function also supports parse="verbatim" (insert text verbatim, i.e. like XSLT's disable-output-escaping="yes") and parse="pxml" (parse the XML file as pxml). The types xml-fragment and pxml-fragment are also under consideration.
- pod2xml $pod
Converts a POD string (which can be either a fragment or a whole document) into XML.
The PApp::XML Factory Class
- new PApp::XML parameter => value...
Creates a new PApp::XML template object with the specified behaviour. It can be used as an object factory to create new PApp::XML::Template objects. The following parameters are recognized:
special
A hashref containing special => coderef pairs. If a special is encountered, the given coderef will be compiled in instead (i.e. it will be called each time the fragment is print'ed). The coderef will be called with a reference to the attribute hash, the element's contents (as a string) and the PApp::XML::Template object used to print the string.
If a reference to a coderef is given (e.g. \sub {}), the coderef will be called during parsing and the resulting string will be added to the compiled subroutine. The arguments are the same, except that the contents are not given as a string but as a magic token that must be inserted into the return value. The return value is expected to be in "phtml" format (see PApp::Parser); the magic "contents" token must not occur in code sections.
html
HTML output mode enable flag.
At the moment there is one predefined special named slink, which maps almost directly into a call to slink (a leading underscore in an attribute name gets changed into a minus (-) to allow for one-shot arguments), e.g.:
<papp:special _special="slink" module="kill" name="Bill" _doit="1">
   Do it to Bill!
</papp:special>
might get changed to (note that module is treated specially):
slink "Do it to Bill!", "kill", -doit => 1, name => "Bill";
In an XSLT stylesheet one could define:
<xsl:template match="link">
   <papp:special _special="slink">
      <xsl:for-each select="@*">
         <xsl:copy/>
      </xsl:for-each>
      <xsl:apply-templates/>
   </papp:special>
</xsl:template>
which defines a link element that can be used like this:
<link module="kill" name="bill" _doit="1">Kill Bill!</link>
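The attribute-to-argument mapping of the slink special can be sketched as follows. slink_args is a hypothetical helper. It sorts the remaining attributes only to make its output deterministic; the real special's argument order is not specified here.

```perl
use strict;
use warnings;

# Hypothetical helper: turns the attributes of a slink special into the
# argument list of the corresponding slink call.
sub slink_args {
    my ($content, %attr) = @_;
    delete $attr{_special};                 # selects the special, not an argument
    my $module = delete $attr{module};      # "module" is passed positionally
    my @args;
    for my $name (sort keys %attr) {        # sorted only for deterministic output
        (my $key = $name) =~ s/^_/-/;       # leading underscore: one-shot "-" argument
        push @args, $key => $attr{$name};
    }
    return ($content, $module, @args);
}

print join(', ', slink_args("Do it to Bill!", _special => "slink",
                            module => "kill", name => "Bill", _doit => 1)), "\n";
```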
- $pappxml->dom2template($dom, {special}, key => value...)
Compiles the given DOM into a PApp::XML::Template object and returns it. An additional set of specials only used to parse this DOM can be passed as a hashref (this argument is optional). Additional key => value pairs will be added to the template's attribute hash. The template will be evaluated in the caller's package (e.g. to get access to __ and similar functions).
On error, nothing is returned. Use the error method to get more information about the problem.
In addition to the syntax accepted by PApp::PCode::pxml2pcode, this function evaluates certain XML elements (please note that I consider the "papp" namespace to be reserved):
papp:special _special="special-name" attributes...
Evaluate the special with the name given by the attribute _special after evaluating its content. The special will receive two arguments: a hashref with all additional attributes and a string representing an already evaluated code fragment.
papp:unquote
Expands ("unquotes") some (but not all) entities, namely lt, gt, amp, quot and apos. This can easily be used within a stylesheet to create verbatim HTML or perl sections, e.g.
<papp:unquote><![CDATA[ <: echo "hallo" :> ]]></papp:unquote>
An XSLT stylesheet that converts <phtml> sections just like in papp files might look like this:
<xsl:template match="phtml">
   <papp:unquote>
      <xsl:apply-templates/>
   </papp:unquote>
</xsl:template>
- $err = $pappxml->error
Returns information about an error as a PApp::Exception object.
- $template->localvar([content]) [WIZARDRY]
Create a local variable that can be used inside specials and return a string representation of it (i.e. a magic token that represents the lvalue of the variable when compiled). Can only be called during compilation.
- $template->gen_surl(<surl-arguments>) [WIZARDRY]
Returns a string representing a perl statement returning the surl.
- $template->gen_slink(<surl-arguments>) [WIZARDRY]
Returns a string representing a perl statement returning the slink.
- $template->attr(key, [newvalue])
Returns the attribute value for the given key. If newvalue is given, replaces the attribute and returns the previous value.
- $template->print
Prints the template (and executes any required specials). You can capture the output using the PApp::capture function.
Wizard Example
In this section I'll try to sketch out a "wizard example" that shows how PApp::XML could be used in the real world.
Consider an application that fetches most or all content (even layout) from a database and uses a stylesheet to map XML content to HTML, which allows for almost total separation of layout and content. It would have an init section loading an XSLT stylesheet and defining a content factory:
use XML::XSLT; # ugly module, but it works great!
use PApp::XML;
# create the parser
my $xsl = "$PApp::Config{LIBDIR}/stylesheet.xsl";
$xslt_parser = XML::XSLT->new($xsl, "FILE");
# create a content factory
$tt_content_factory = new PApp::XML
html => 1, # we want html output
special => {
include => sub {
my ($attr, $content) = @_;
get_content($attr->{name})->print;
},
};
# create a cache (XSLT is quite slow)
use Tie::Cache;
tie %content_cache, Tie::Cache::, { MaxCount => 30, WriteSync => 0};
Here we define an include special that inserts another document in place. What does get_content (used in the definition of include) look like?
<macro name="get_content" args="$name $special"><phtml><![CDATA[<:
my $cache = $content_cache{"$lang\0$name"};
unless ($cache) {
$cache = $content_cache{"$lang\0$name"} = [
undef,
0,
];
}
if ($cache->[1] < time) {
$cache->[0] = fetch_content $name, $special;
$cache->[1] = time + 10;
}
$cache->[0];
:>]]></phtml></macro>
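Stripped of phtml and the tied Tie::Cache hash, the expiry logic of get_content amounts to the following stand-alone sketch. cached_fetch and its injectable $now argument are hypothetical additions for illustration. The ten-second lifetime hard-coded in the macro becomes the $ttl parameter.

```perl
use strict;
use warnings;

my %cache;   # key => [value, expiry time]; the example above uses a tied Tie::Cache hash

# Hypothetical helper: return the cached value for $key, refetching it via
# $fetch once its expiry time has passed.
sub cached_fetch {
    my ($key, $fetch, $ttl, $now) = @_;
    $now = time unless defined $now;            # the clock can be injected for testing
    my $entry = $cache{$key} ||= [undef, 0];    # expiry 0 forces the first fetch
    if ($entry->[1] < $now) {
        $entry->[0] = $fetch->($key);
        $entry->[1] = $now + $ttl;
    }
    return $entry->[0];
}
```

As in the macro, the cache key should include everything the result depends on, such as the current language.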
get_content is nothing more than a wrapper around fetch_content. Its sole purpose is to cache documents, since parsing and transforming an XML file is quite slow (please note that I include the current language when caching documents since, of course, the documents get translated). In non-speed-critical applications you could just substitute fetch_content for get_content:
<macro name="fetch_content" args="$name $special"><phtml><![CDATA[<:
sql_fetch \my($id, $_name, $ctime, $body),
"select id, name, unix_timestamp(ctime), body from content where name = ?",
$name;
unless ($id) {
($id, $_name, $ctime, $body) =
(undef, undef, undef, "");
}
parse_content (gettext $body, {
special => $special,
id => $id,
name => $name,
ctime => $ctime,
lang => $lang,
});
:>]]></phtml></macro>
fetch_content actually fetches the content string from the database. In this example, a content object has a name (which is used to reference it), a timestamp and a body, which is the actual document. After fetching the content object it uses parse_content to transform the XML snippet into a perl sub that can be efficiently executed:
<macro name="parse_content" args="$body $attr"><phtml><![CDATA[<:
my $content = eval {
$xslt_parser->transform_document(
'<?xml version="1.0" encoding="iso-8859-1" standalone="no"?'.'>'.
"<ttt_fragment>".
$body.
"</ttt_fragment>",
"STRING"
);
my $dom = $xslt_parser->result_tree;
$tt_content_factory->dom2template($dom, %$attr);
};
if ($@) {
my $line = $@ =~ /mismatched tag at line (\d+), column \d+, byte \d+/ ? $1 : -1;
# create a fancy error message
}
# fall back to an empty document; pass $attr along, since %$attr dies on undef under strict
$content || parse_content("", $attr);
:>]]></phtml></macro>
As you can see, it uses XML::XSLT's transform_document, which does the string -> DOM translation for us and also transforms the XML code through the stylesheet. After that it uses dom2template to compile the document into perl code and returns it.
An example stylesheet would look like this:
<xsl:template match="ttt_fragment">
<xsl:apply-templates/>
</xsl:template>
<xsl:template match="p|em|h1|h2|br|tt|hr|small">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="include">
<papp:special _special="include" name="{@name}"/>
</xsl:template>
<!-- add the earlier XSLT examples here -->
This stylesheet would transform the following XML snippet:
<p>Look at
<link module="product" productid="7">our rubber-wobber-cake</link>
before it is <em>sold out</em>!
<include name="product_description_7"/>
</p>
Which would be turned into something like this:
<p>Look at
<papp:special _special="slink" module="product" productid="7">
our rubber-wobber-cake
</papp:special>
before it is <em>sold out</em>!
<papp:special _special="include" name="product_description_7"/>
</p>
Now go back and try to understand the above code! But wait! Consider that you had a content editor installed as the module content_editor, as I happen to have. Now let's introduce the editable_content macro:
<macro name="editable_content" args="$name %special"><phtml><![CDATA[<:
my $content;
:>
#if access_p "admin"
<table border=1><tr><td>
<:
sql_fetch \my($id), "select id from content where name = ?", $name;
if ($id) {
:><?sublink [current_locals], __"[Edit the content object \"$name\"]", "content_editor_edit", contentid => $id:><:
} else {
:><?sublink [current_locals], __"[Create the content object \"$name\"]", "content_editor_edit", contentname => $name:><:
}
$content = get_content($name,\%special);
$content->print;
:>
</table>
#else
<:
$content = get_content($name,\%special);
$content->print;
:>
#endif
<:
return $content;
:>]]></phtml></macro>
What does this do? Easy: if you are logged in as admin (i.e. have the "admin" access right), it displays a link that lets you edit the object directly. For a normal user it just displays the content as-is. It could be used like this:
<perl><![CDATA[
header;
my $content = editable_content("homepage");
footer last_changed => $content->ctime;
]]></perl>
Disregarding header and footer, this would create a page fully dynamically out of a database, together with last-modified information, which could be edited on the web. Obviously this approach could be extended to any complexity.
SEE ALSO
PApp.
AUTHOR
Marc Lehmann <schmorp@schmorp.de>
http://home.schmorp.de/