NAME

SOAP::WSDL::Parser - How SOAP::WSDL parses XML messages

Which XML messages does SOAP::WSDL parse?

Naturally, there are two kinds of XML documents (or messages) SOAP::WSDL has to parse:

  • WSDL definitions

  • SOAP messages

Parser implementations

Several parser implementations are available for SOAP messages; currently there is only one for WSDL definitions.

WSDL definitions parser

  • SOAP::WSDL::SAX::WSDLHandler

    This is a SAX handler for parsing WSDL files into object trees SOAP::WSDL works with.

    It's built as a native handler for XML::LibXML, but will also work with XML::SAX::ParserFactory.

    To parse a WSDL file, use one of the following variants:

    # Variant 1: XML::LibXML with a native SAX handler
    my $parser = XML::LibXML->new();
    my $handler = SOAP::WSDL::SAX::WSDLHandler->new();
    $parser->set_handler( $handler );
    $parser->parse( $xml );
    my $data = $handler->get_data();

    # Variant 2: a parser obtained from XML::SAX::ParserFactory
    my $handler = SOAP::WSDL::SAX::WSDLHandler->new({
           base => 'XML::SAX::Base'
    });
    my $parser = XML::SAX::ParserFactory->parser(
       Handler => $handler
    );
    $parser->parse( $xml );
    my $data = $handler->get_data();

SOAP messages parser

All SOAP message handlers use class resolvers to find out which class a particular XML element should belong to, and type libraries containing these classes.

Writing a class resolver

The class resolver must provide a method "get_class", which is passed a list ref of the current element's XPath (relative to Body), split by /.

This method must return a class name appropriate for an XML element.

A class resolver package might look like this:

package FakeResolver;
use strict;
use warnings;

# maps element paths (relative to Body) to type library classes
my %class_list = (
   'EnqueueMessage' => 'Typelib::TEnqueueMessage',
   'EnqueueMessage/MMessage' => 'Typelib::TMessage',
   'EnqueueMessage/MMessage/MRecipientURI' => 'SOAP::WSDL::XSD::Builtin::anyURI',
   'EnqueueMessage/MMessage/MMessageContent' => 'SOAP::WSDL::XSD::Builtin::string',
);

sub new { return bless {}, 'FakeResolver' }

sub get_class {
   my $name = join('/', @{ $_[1] });
   return $class_list{ $name } if $class_list{ $name };

   # warn about unknown paths and return nothing
   warn "no class found for $name";
   return;
}
1;
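
To illustrate the get_class contract, the following minimal sketch simulates how a message parser might track the current element path and consult a resolver. The event list, the DemoResolver package and the resolve_events helper are hypothetical stand-ins invented for this example; a real parser derives the path from the XML events it receives.

```perl
use strict;
use warnings;

# Hypothetical resolver following the get_class contract described above.
package DemoResolver;

my %class_list = (
    'EnqueueMessage'          => 'Typelib::TEnqueueMessage',
    'EnqueueMessage/MMessage' => 'Typelib::TMessage',
);

sub new { return bless {}, shift }

sub get_class {
    my ($self, $path_ref) = @_;
    return $class_list{ join '/', @{ $path_ref } };
}

package main;

# Simulated parser events: [ 'start' | 'end', element name ].
# This helper is an assumption made for illustration only.
sub resolve_events {
    my ($resolver, @events) = @_;
    my (@path, @resolved);
    for my $event (@events) {
        my ($type, $name) = @{ $event };
        if ($type eq 'start') {
            # entering an element: extend the path and ask the resolver
            push @path, $name;
            my $class = $resolver->get_class([ @path ]);
            push @resolved, [ join('/', @path), $class ];
        }
        else {
            # leaving an element: shorten the path again
            pop @path;
        }
    }
    return @resolved;
}

my @resolved = resolve_events(
    DemoResolver->new(),
    [ start => 'EnqueueMessage' ],
    [ start => 'MMessage' ],
    [ end   => 'MMessage' ],
    [ end   => 'EnqueueMessage' ],
);

printf "%s => %s\n", @{ $_ } for @resolved;
```

Each start event extends the path by one element name, so the resolver always sees the full path from the first element below Body down to the current element.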

Writing type library classes

Every element must have a corresponding class in the type library.

Builtin types should be resolved as SOAP::WSDL::XSD::Builtin::* classes.

Type library classes must provide the following methods:

  • new

    Constructor

  • add_FOO

    The add_FOO method is called for every child element of the XML node.

    Character data is regarded as a child element of the last XML node.

A typelib class implemented as an inside-out object using Class::Std::Storable as base class would look like this:

package Typelib::TEnqueueMessage;
use strict;
use Class::Std::Storable;

my %MMessage_of :ATTR(:name<MMessage> :default<()>);

sub add_MMessage {
    my ($self, $value) = @_;
    my $ident = ident $self;

    # we're the first value
    return $MMessage_of{ $ident } = $value
        if not defined $MMessage_of{ $ident };

    # we're the second value
    return $MMessage_of{ $ident } = [
        $MMessage_of{ $ident }, $value ]
            if ref( $MMessage_of{ $ident } ) ne 'ARRAY';

    # we're third or later
    push @{ $MMessage_of{ $ident } }, $value;
    return $MMessage_of{ $ident };
}
1;

Of course one could use a method factory for these add_FOO methods - see t/lib/Typelib/Base.pm for an example.
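
A method factory of that kind could be sketched as below. This is not the code from t/lib/Typelib/Base.pm: the package names Typelib::DemoBase and Typelib::DemoMessage, the _factory helper and the generated get_FOO accessors are assumptions, and a plain hash-based object is used instead of Class::Std::Storable to keep the example self-contained.

```perl
use strict;
use warnings;

package Typelib::DemoBase;    # hypothetical name, for illustration only

sub new { return bless {}, shift }

# Install add_FOO and get_FOO methods for every given element name.
sub _factory {
    my ($class, @names) = @_;
    no strict 'refs';    # needed to install methods by name
    for my $name (@names) {
        *{ "${class}::add_$name" } = sub {
            my ($self, $value) = @_;

            # first value: store it as a plain scalar
            return $self->{ $name } = $value
                if not exists $self->{ $name };

            # second value: promote the stored scalar to a list ref
            $self->{ $name } = [ $self->{ $name } ]
                if ref( $self->{ $name } ) ne 'ARRAY';

            # second or later value: append to the list
            push @{ $self->{ $name } }, $value;
            return $self->{ $name };
        };
        *{ "${class}::get_$name" } = sub { return $_[0]->{ $name } };
    }
    return;
}

package Typelib::DemoMessage;    # hypothetical type library class
our @ISA = ('Typelib::DemoBase');
__PACKAGE__->_factory(qw(MRecipientURI MMessageContent));

package main;

my $message = Typelib::DemoMessage->new();
$message->add_MRecipientURI('mailto:a@example.org');
$message->add_MRecipientURI('mailto:b@example.org');
$message->add_MMessageContent('Hello');

print scalar @{ $message->get_MRecipientURI() }, " recipients\n";
print $message->get_MMessageContent(), "\n";
```

The generated closures follow the same scalar-then-list logic as the hand-written add_MMessage method, so a single value stays a scalar and repeated elements end up in a list ref.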

Parser implementations

  • SOAP::WSDL::SAX::MessageHandler

    This is a SAX handler for parsing SOAP messages into object trees SOAP::WSDL works with.

    It's built as a native handler for XML::LibXML, but will also work with XML::SAX::ParserFactory.

    Can be used for parsing both streams (chunks) and documents.

    See SOAP::WSDL::SAX::MessageHandler for details.

  • SOAP::WSDL::Expat::MessageParser

    An XML::Parser::Expat based parser. This is the fastest parser for most SOAP messages and the default for SOAP::WSDL::Client.

  • SOAP::WSDL::Expat::MessageStreamParser

    An XML::Parser::ExpatNB based parser. Useful for parsing huge HTTP responses, as you don't need to keep everything in memory.

    See SOAP::WSDL::Expat::MessageStreamParser for details.

Performance

SOAP::WSDL::Expat::MessageParser is the fastest way of parsing SOAP messages into object trees and only slightly slower than converting them into hash data structures:

Parsing a SOAP message with a length of 5962 bytes:

SOAP::WSDL::Expat::MessageParser:
  3 wallclock secs ( 3.28 usr +  0.05 sys =  3.33 CPU) @ 60.08/s (n=200)

SOAP::WSDL::SAX::MessageHandler (with raw XML::LibXML):
  5 wallclock secs ( 4.95 usr +  0.00 sys =  4.95 CPU) @ 40.38/s (n=200)

XML::Simple (XML::Parser):
  3 wallclock secs ( 2.36 usr +  0.03 sys =  2.39 CPU) @ 83.65/s (n=200)

XML::Simple (XML::SAX::Expat):
  7 wallclock secs ( 6.50 usr +  0.03 sys =  6.53 CPU) @ 30.62/s (n=200)

As the benchmark shows, all SOAP::WSDL parser variants are faster than XML::Simple with XML::SAX::Expat (30.62/s), and SOAP::WSDL::Expat::MessageParser (60.08/s) approaches the performance of XML::Simple with XML::Parser as backend (83.65/s).

Parsing SOAP responses in chunks does not increase speed - at least not up to a response size of around 500k:

Benchmark: timing 5 iterations of SOAP::WSDL::SAX::MessageHandler, 
  SOAP::WSDL::Expat::MessageParser, SOAP::WSDL::Expat::MessageStreamParser...

SOAP::WSDL::Expat::MessageStreamParser: 
13 wallclock secs ( 7.39 usr +  0.09 sys =  7.48 CPU) @  0.67/s (n=5)

SOAP::WSDL::Expat::MessageParser: 
10 wallclock secs ( 5.81 usr +  0.06 sys =  5.88 CPU) @  0.85/s (n=5)

SOAP::WSDL::SAX::MessageHandler: 
14 wallclock secs ( 8.78 usr +  0.03 sys =  8.81 CPU) @  0.57/s (n=5)

Response size: 344330 bytes
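
Timings in this format are what Perl's core Benchmark module prints. A minimal sketch of how such a comparison could be set up follows. The two parse_variant_* subs and the generated payload are hypothetical placeholders that only do trivial string work; they would need to be replaced by calls to the parsers being compared and by a real SOAP response.

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);

# Placeholder payload; a real run would load a SOAP response here.
my $xml = '<a>' . ( '<b>text</b>' x 100 ) . '</a>';

# Hypothetical stand-ins for the parser calls under test.
sub parse_variant_a {
    my @elements = $xml =~ m{<b>}g;
    return scalar @elements;
}

sub parse_variant_b {
    my @parts = split m{</b>}, $xml;
    return scalar @parts;
}

# Run each candidate 200 times and print one timing line per candidate.
my $results = timethese( 200, {
    'variant A' => \&parse_variant_a,
    'variant B' => \&parse_variant_b,
} );

# Print a table comparing the measured rates.
cmpthese($results);
```

With work this small Benchmark may warn about too few iterations; realistic parser calls on a realistic payload avoid that.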