NAME
Net::OAI::Record::NamespaceFilter - general filter class based on namespace URIs
SYNOPSIS
    $plug = Net::OAI::Record::NamespaceFilter->new(); # Noop

    $multihandler = Net::OAI::Record::NamespaceFilter->new(
        'http://www.openarchives.org/OAI/2.0/oai_dc/' => 'Net::OAI::Record::OAI_DC',
        'http://www.openarchives.org/OAI/2.0/provenance' => 'MySAX::ProvenanceHandler'
    );

    $saxfilter = SOME_SAX_Filter->new();
    ...
    $filter = Net::OAI::Record::NamespaceFilter->new(
        '*' => $saxfilter, # '*' for any namespace
    );

    $filter = Net::OAI::Record::NamespaceFilter->new(
        '*' => sub { my $x = "";
                     return XML::SAX::Writer->new(Output => \$x);
                   },
    );
DESCRIPTION
This filter forwards any element belonging to a namespace from this list, together with all of the element's children (regardless of their respective namespaces), to the associated SAX filter. It can be used as either a metadataHandler or a recordHandler.
This SAX filter takes a hashref namespaces as argument, with namespace URIs as keys ('*' for "any namespace"). Each value is one of the following:
- undef
Matching elements and their subelements are suppressed.
If the list of namespaces is empty, or undef is connected to the filter, it effectively acts as a plug to Net::OAI::Harvester. This might come in handy if you are planning to get to the raw result by other means, e.g. by tapping the user agent or accessing the result's xml() method:

    $plug = Net::OAI::Record::NamespaceFilter->new();
    $harvester = Net::OAI::Harvester->new( [ baseURL => ..., ] );

    $tapped_by_ua = "";
    open ($TAP, ">", \$tapped_by_ua);
    $harvester->userAgent()->add_handler(response_data => sub {
        my($response, $ua, $h, $data) = @_;
        print $TAP $data;
    });

    $list = $harvester->listRecords(
        metadataPrefix => 'a_strange_one',
        recordHandler => $plug,
    );

    print $tapped_by_ua; # complete OAI response
    print $list->xml(); # should be exactly the same

Comment: This is quite an efficient way of not processing the XML content of OAI records received.
- a class name of a SAX filter
As usual, a new instance is created for each record element of the OAI response.

    # end_document() of instances of MyWriter returns something meaningful...
    $consumer = Net::OAI::Record::NamespaceFilter->new('*'=> 'MyWriter');

    $filter = Net::OAI::Record::NamespaceFilter->new(
        '*' => $consumer
    );

    $list = $harvester->listAllRecords(
        metadataPrefix => 'oai_dc',
        recordHandler => $filter,
    );

    while( $r = $list->next() ) {
        next if $r->status() eq "deleted";
        $xmlstringref = $r->recorddata()->result('*');
        ...
    };

Note: The handlers are instantiated for each single OAI record in the response and will see one start_document() and one end_document() event in any case. This behavior differs from that of handler class names specified directly as metadataHandler or recordHandler for a request: instances created that way never see such events.
- a code reference for a constructor
Must return a SAX filter ready to accept a new document.
The following example returns a string serialization for each single record:

    # end_document() events will return \$x
    $constructor = sub { my $x = ""; return XML::SAX::Writer->new(Output => \$x); };

    $filter = Net::OAI::Record::NamespaceFilter->new(
        '*' => $constructor
    );

    $list = $harvester->listRecords(
        metadataPrefix => 'oai_dc',
        recordHandler => $filter,
    );

    while( $r = $list->next() ) {
        $xmlstringref = $r->recorddata()->result('*');
        ...
    };

Comment: This example shows a way to isolate the "true contents" of individual response records without having to provide a SAX handler class of one's own (the only additional prerequisite is XML::SAX::Writer). What you get, however, is a serialized XML document, which then has to be parsed for further processing ...
- an already instantiated SAX filter
As usual in this case, no start_document() and end_document() events are forwarded to the filter.

    open $fh, ">", $some_file;
    $builder = XML::SAX::Writer->new(Output => $fh);
    $builder->start_document();
    $rootEL = { Name => 'collection',
                LocalName => 'collection',
                NamespaceURI => "http://www.loc.gov/MARC21/slim",
                Prefix => "",
                Attributes => {}
              };
    $builder->start_element( $rootEL );

    # filter for OAI-Namespace in records: forward all
    $filter = Net::OAI::Record::NamespaceFilter->new(
        'http://www.loc.gov/MARC21/slim' => $builder);

    $list = $harvester->listRecords(
        metadataPrefix => 'a_strange_one',
        metadataHandler => $filter,
    );
    # handle resumption tokens if more than the first
    # chunk shall be stored into $fh ....

    $builder->end_element( $rootEL );
    $builder->end_document();
    close($fh);
    # ... process contents of $some_file

In this example, calling the result() method for individual records in the response will probably not be of much use.
Caution: Depending on the namespaces specified, even handlers that are freshly instantiated for each OAI record might be fed more than one top-level XML element.
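To illustrate the caution above, here is a minimal sketch of a handler class that copes with several top-level elements per record and returns something meaningful from end_document(). The class name My::TopLevelCollector is hypothetical and not part of Net::OAI; the events are fed by hand here instead of by the filter. Whether the filter accepts a handler that does not inherit from XML::SAX::Base is an assumption that should be checked before relying on this.

```perl
use strict;
use warnings;

package My::TopLevelCollector;    # hypothetical name, for illustration only

sub new {
    my ($class) = @_;
    return bless { depth => 0, tops => [], text => '' }, $class;
}

# Reset the state so that one instance could also be reused.
sub start_document {
    my ($self) = @_;
    @$self{qw(depth tops text)} = ( 0, [], '' );
    return;
}

sub start_element {
    my ($self, $el) = @_;
    # Depth 0 means this element is a top-level element of the "document";
    # there may be more than one of them per OAI record.
    push @{ $self->{tops} },
        sprintf( '{%s}%s', $el->{NamespaceURI} // '', $el->{LocalName} )
        if $self->{depth} == 0;
    $self->{depth}++;
    return;
}

sub end_element { my ($self) = @_; $self->{depth}--; return }

sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};
    return;
}

# The return value of end_document() is what result() is described to return.
sub end_document {
    my ($self) = @_;
    return { top_level => [ @{ $self->{tops} } ], text => $self->{text} };
}

package main;

# Feed two sibling top-level elements by hand.
my $handler = My::TopLevelCollector->new;
$handler->start_document( {} );
for my $name (qw(dc about)) {
    my $el = {
        Name         => $name,
        LocalName    => $name,
        NamespaceURI => 'urn:example',
        Prefix       => '',
        Attributes   => {},
    };
    $handler->start_element($el);
    $handler->characters( { Data => "text of $name;" } );
    $handler->end_element($el);
}
my $result = $handler->end_document( {} );
print join( ', ', @{ $result->{top_level} } ), "\n";
# prints: {urn:example}dc, {urn:example}about
```

Keeping track of the nesting depth is one simple way to notice a second top-level element; a handler that assumes exactly one root element per record would silently merge or drop data here.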
METHODS
new( [%namespaces] )
Creates a handler suitable as recordHandler or metadataHandler. %namespaces has namespace URIs for keys and values according to the four types described above.
result ( [namespace] )
If called with a namespace, it returns the result of the handler, i.e. what end_document() returned for the record in question. Otherwise it returns a hashref for all the results with the corresponding namespaces as keys.
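Since the values in that hashref are whatever the individual handlers returned from end_document(), a caller may want to normalize them. The following is a rough sketch only: results_as_strings() is a hypothetical helper, not part of Net::OAI, and the input is stand-in data shaped like the description above (a scalar reference, as XML::SAX::Writer with Output => \$x is said to return in the example further up).

```perl
use strict;
use warnings;

# Hypothetical helper: turn the hashref returned by result() (called without
# arguments) into a plain hashref of namespace => string.
sub results_as_strings {
    my ($results) = @_;
    my %out;
    for my $ns ( sort keys %$results ) {
        my $value = $results->{$ns};
        next unless defined $value;    # handler returned nothing for this record
        # Dereference scalar references, stringify everything else.
        $out{$ns} = ref($value) eq 'SCALAR' ? $$value : "$value";
    }
    return \%out;
}

# Stand-in data; real data would come from $r->recorddata()->result()
my $xml     = '<dc/>';
my $strings = results_as_strings( { '*' => \$xml } );
print "$_ => $strings->{$_}\n" for sort keys %$strings;
# prints: * => <dc/>
```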
AUTHOR
Thomas Berger <ThB@gymel.com>