NAME
URI::Fetch - Smart URI fetching (for syndication feeds, in particular)
SYNOPSIS
    use URI::Fetch;
    use Cache::File;   # needed only for the on-disk cache example

    ## Simple fetch.
    my $res = URI::Fetch->fetch('http://example.com/atom.xml')
        or die URI::Fetch->errstr;

    ## Fetch using specified ETag and Last-Modified headers.
    my $res = URI::Fetch->fetch('http://example.com/atom.xml',
        ETag         => '123-ABC',
        LastModified => time - 3600,
    )
        or die URI::Fetch->errstr;

    ## Fetch using an on-disk cache that URI::Fetch manages for you.
    my $cache = Cache::File->new( cache_root => '/tmp/cache' );
    my $res = URI::Fetch->fetch('http://example.com/atom.xml',
        Cache => $cache
    )
        or die URI::Fetch->errstr;
DESCRIPTION
URI::Fetch is a smart client for fetching syndication feeds (RSS, Atom, and others) in a way that saves both bandwidth and time. That means:
GZIP support
If you have Compress::Zlib installed, URI::Fetch will automatically try to download a compressed version of the content, saving bandwidth (and time).
Last-Modified and ETag support
If you use a local cache (see the Cache parameter to fetch), URI::Fetch will keep track of the Last-Modified and ETag headers from the server, so that you download a feed only if it has been modified since the last time you checked.
Proper understanding of HTTP error codes
Certain HTTP error codes are special, particularly when fetching syndication feeds, and well-written clients should pay special attention to them. URI::Fetch can only do so much for you in this regard, but it gives you the tools to be a well-written client.
The response from fetch gives you the raw HTTP response code, along with special handling of two codes:
301 (Moved Permanently)
Signals that a feed has moved permanently, and that your database of feeds should be updated to reflect the new URI.
410 (Gone)
Signals that a feed is gone and will never be coming back, and should be removed from your database of feeds (or whatever you're using).
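Acting on these two codes is left to your own code. The following is a minimal sketch of that bookkeeping, assuming your feed database is a plain Perl hash keyed by feed URI and that you have already obtained the numeric HTTP status and, for a 301, the new URI. How you read those values from the response object is not shown, and apply_status is a hypothetical helper, not part of URI::Fetch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of URI::Fetch). $feeds is a hashref mapping
# feed URI => feed record; $status is the HTTP status of the last fetch;
# $new_uri is the redirect target and is required only for a 301.
sub apply_status {
    my ($feeds, $uri, $status, $new_uri) = @_;
    if ($status == 301) {
        die "301 requires the new URI\n" unless defined $new_uri;
        # Moved Permanently: re-key the existing record under the new URI.
        $feeds->{$new_uri} = delete $feeds->{$uri} if exists $feeds->{$uri};
    }
    elsif ($status == 410) {
        # Gone: the feed will never come back, so stop tracking it.
        delete $feeds->{$uri};
    }
    # Any other status (200, 304, ...) leaves the database unchanged.
    return $feeds;
}
```

Other statuses, including 304, leave the database untouched, so the helper can be called after every fetch.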
USAGE
URI::Fetch->fetch($uri, %param)
Fetches a syndication feed identified by the URI $uri.
On success, returns a URI::Fetch::Response object; on failure, returns undef, and URI::Fetch->errstr holds the error message.
%param can contain:
LastModified
ETag
LastModified and ETag can be supplied to ask the server to return the full feed only if it has changed since the last request (LastModified is given as an epoch time, as in the SYNOPSIS). If you're writing your own feed client, this is recommended practice, because it limits both your bandwidth use and the server's.
If you'd rather not have to store the LastModified time and ETag yourself, see the Cache parameter below (and the SYNOPSIS above).
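In HTTP terms, these two values correspond to the conditional request headers If-None-Match (the ETag) and If-Modified-Since (the Last-Modified time, formatted as an HTTP date); a server whose copy is unchanged can then answer 304 Not Modified without sending the body. The sketch below illustrates that mapping with two hypothetical helpers, http_date and conditional_headers, which are not part of URI::Fetch. It assumes LastModified is an epoch time, as in the SYNOPSIS. In real code, HTTP::Date's time2str does the same date formatting.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Hypothetical helper: format an epoch time as an RFC 1123 HTTP date (GMT).
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec;
}

# Hypothetical helper: build conditional-GET request headers from a stored
# ETag and LastModified epoch time; either argument may be omitted.
sub conditional_headers {
    my (%p) = @_;
    my %h;
    $h{'If-None-Match'}     = $p{ETag}                    if defined $p{ETag};
    $h{'If-Modified-Since'} = http_date($p{LastModified}) if defined $p{LastModified};
    return \%h;
}
```

For example, conditional_headers(ETag => '123-ABC', LastModified => 0) yields If-None-Match 123-ABC and If-Modified-Since "Thu, 01 Jan 1970 00:00:00 GMT".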
Cache
If you'd like URI::Fetch to cache responses between requests, provide the Cache parameter with an object supporting the Cache API (e.g. Cache::File, Cache::Memory); specifically, an object that supports $cache->get($key) and $cache->set($key, $value, $expires).
If supplied, URI::Fetch will store the feed content, ETag, and last-modified time of the response in the cache, and will pull the content from the cache on subsequent requests if the server returns a 304 Not Modified response.
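To make the get/set contract concrete, here is a minimal in-memory object that implements only those two methods. My::HashCache is a hypothetical class, not a CPAN module, and it assumes $expires means "seconds from now"; real cache modules may interpret the expiry argument differently, so check their documentation.

```perl
use strict;
use warnings;

# Hypothetical in-memory cache implementing only get($key) and
# set($key, $value, $expires). $expires is assumed to be seconds from now;
# an undefined $expires means the entry never expires.
package My::HashCache;

sub new {
    my ($class) = @_;
    return bless { data => {} }, $class;
}

sub set {
    my ($self, $key, $value, $expires) = @_;
    my $expires_at = defined $expires ? time + $expires : undef;
    $self->{data}{$key} = [ $value, $expires_at ];
    return;
}

sub get {
    my ($self, $key) = @_;
    my $entry = $self->{data}{$key} or return;    # unknown key: undef
    my ($value, $expires_at) = @$entry;
    if (defined $expires_at && time >= $expires_at) {
        delete $self->{data}{$key};               # drop the stale entry
        return;
    }
    return $value;
}

package main;
```

Passing Cache => My::HashCache->new to fetch should then behave like the Cache::File instance in the SYNOPSIS, except that nothing survives the process; for real use, an existing module such as Cache::Memory or Cache::File is the better choice.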
UserAgent
Optional. You may provide your own LWP::UserAgent instance. Look into LWPx::ParanoidAgent if you're fetching URLs given to you by possibly malicious parties.
ContentAlterHook
Optional. A subref that is called with a scalar reference to your content, so you can modify the content before it is returned and before it is put in the cache.
For instance, you may want to cache only the <head> section of an HTML document, or you may want to take a feed URL and cache only a pre-parsed version of it. If your hook changes the value behind the scalarref into a hashref, scalarref, or some blessed object, that same value is what you get back later on not-modified responses.
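As an illustration of the <head> case, the hook below rewrites the content in place through the reference it receives. The variable name $head_only is hypothetical, and the regex is a deliberately simple sketch that assumes well-formed markup; an HTML parser is more robust for real pages.

```perl
use strict;
use warnings;

# Hypothetical ContentAlterHook: keep only the <head>...</head> section of an
# HTML document. The hook receives a reference to the content scalar and
# edits the content through it; content without a <head> is left unchanged.
my $head_only = sub {
    my ($content_ref) = @_;
    if ($$content_ref =~ m{(<head\b.*?</head>)}is) {
        $$content_ref = $1;
    }
    return;
};
```

You would then pass ContentAlterHook => $head_only alongside the other parameters to fetch.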
LICENSE
URI::Fetch is free software; you may redistribute it and/or modify it under the same terms as Perl itself.
AUTHOR & COPYRIGHT
Except where otherwise noted, URI::Fetch is Copyright 2004 Benjamin Trott, ben+cpan@stupidfool.org. All rights reserved.