NAME
HTML::RSSAutodiscovery - methods for retrieving RSS-ish information from an HTML document.
SYNOPSIS
use HTML::RSSAutodiscovery;
use Data::Dumper;
my $url = "http://www.diveintomark.org/";
my $html = HTML::RSSAutodiscovery->new();
print Dumper($html->parse($url));
# Mark's gone a bit nuts with this and
# the list is too long to include here...
# see the POD for the 'parse' method for
# details of what it returns.
DESCRIPTION
Methods for retrieving RSS-ish information from an HTML document.
PACKAGE METHODS
__PACKAGE__->new()
Object constructor. Returns an object. Woot!
OBJECT METHODS
$obj->parse($arg)
Parse an HTML document and return RSS-ish <link> information.
$arg may be either:
An HTML string, passed as a scalar reference.
A URI.
Returns an array reference of hash references whose keys are:
title
type
rel
href
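The structure returned by parse can be walked like any other array of hashes. The following is a minimal sketch, not part of the module: the sample data and the rss_hrefs helper are hypothetical, and the sample only mimics the documented shape (title, type, rel, href). With the module installed, the same structure would come from something like HTML::RSSAutodiscovery->new()->parse(\$html_string).

```perl
use strict;
use warnings;

# Hypothetical sample, shaped like the documented return value of parse():
# an array reference of hash references keyed by title, type, rel and href.
my $links = [
    { title => 'Main feed', type => 'application/rss+xml',
      rel   => 'alternate', href => 'http://example.com/index.rss' },
    { title => 'Atom feed', type => 'application/atom+xml',
      rel   => 'alternate', href => 'http://example.com/atom.xml' },
    { title => 'Comments',  type => 'application/rss+xml',
      rel   => 'alternate', href => 'http://example.com/comments.rss' },
];

# rss_hrefs() is a hypothetical helper: it returns an array reference
# holding the href of every entry whose type mentions "rss".
sub rss_hrefs {
    my ($found) = @_;
    return [
        map  { $_->{href} }
        grep { defined $_->{type} && $_->{type} =~ /rss/i }
        @{ $found || [] }
    ];
}

# Print every entry, then only the RSS ones.
for my $link (@$links) {
    printf "%s (%s): %s\n", $link->{title}, $link->{type}, $link->{href};
}
print "RSS only: $_\n" for @{ rss_hrefs($links) };
```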
$obj->locate($uri,\%args)
Like the parse method, but will perform additional lookups, if necessary or specified.
Valid arguments are:
uri
String. A live, breathing URI to slurp and parse.
Required.
args
Hash ref whose keys may be:
noparse
Boolean. Don't bother parsing the document. This also prevents you from checking for embedded links.
I don't know why you want to do this, but you can.
False, by default.
embedded
Boolean. Check all embedded links ending in '.xml', '.rss' or '.rdf' (and then 'xml', 'rss' or 'rdf') for RSS-ness.
False, by default, unless the initial parsing of the URI returns no RSS links.
embedded_and_remote
Boolean. Check all embedded links whose root is not the same as $uri for RSS-ness.
False, by default.
syndic8
Boolean. Check the syndic8 servers for sites matching $uri.
False, by default, unless the initial parsing of the URI and any embedded links return no RSS links.
Returns an array reference of hash references whose keys are:
title
type
rel
href
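A call to locate might be wrapped as sketched below. The find_feeds wrapper and its option check are hypothetical, not part of the module. The wrapper only guards against misspelled option names before passing the URI and the hash reference through to locate as documented above. The real lookup is left as a comment because it needs the module installed and network access.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Option names documented for locate().
my %KNOWN = map { $_ => 1 } qw(noparse embedded embedded_and_remote syndic8);

# find_feeds() is a hypothetical wrapper: it rejects unknown option
# names, then hands the URI and options to the finder's locate() method.
sub find_feeds {
    my ($finder, $uri, $args) = @_;
    $args ||= {};
    for my $key (sort keys %$args) {
        croak "Unknown locate() option: $key" unless $KNOWN{$key};
    }
    return $finder->locate($uri, $args);
}

# A misspelled option is caught before locate() is ever called.
my $ok = eval { find_feeds(undef, 'http://example.com/', { embeded => 1 }); 1 };
print "rejected: $@" unless $ok;

# With the module installed and a reachable site, a real call could be:
#   use HTML::RSSAutodiscovery;
#   my $feeds = find_feeds(HTML::RSSAutodiscovery->new(),
#                          'http://www.diveintomark.org/',
#                          { embedded => 1, syndic8 => 1 });
#   print "$_->{href}\n" for @$feeds;
```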
VERSION
1.21
DATE
$Date: 2004/10/17 04:13:06 $
AUTHOR
Aaron Straup Cope
SEE ALSO
Because you shouldn't need all that white space to do cool stuff ;-)
http://diveintomark.org/archives/2002/05/30.html#rss_autodiscovery
http://diveintomark.org/archives/2002/08/15.html
http://diveintomark.org/projects/misc/rssfinder.py.txt
REQUIREMENTS
BASIC
These packages are required to actually parse an HTML document or URI.
HTML::Parser
LWP::UserAgent
HTTP::Request
EMBEDDED
These packages are required to check the embedded links in a URI for RSS files. They are not loaded until run-time, so they are not required for basic parsing.
XML::RSS
SYNDIC8
These packages are required to query the syndic8 servers for RSS files associated with a URI. They are not loaded until run-time, so they are not required for basic parsing.
XMLRPC::Lite
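Because the optional packages are only needed at run-time, a caller may want to check for them before asking for the matching lookups. This is a rough sketch under that assumption. The have_module helper is hypothetical, and it says nothing about how the module itself loads these packages.

```perl
use strict;
use warnings;

# have_module() is a hypothetical helper: it returns 1 if the named
# module can be loaded at run time, 0 otherwise.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # XML::RSS -> XML/RSS.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Map each optional locate() argument to the package it depends on.
my %needs = (
    embedded => 'XML::RSS',
    syndic8  => 'XMLRPC::Lite',
);

# Request an optional lookup only when its package is installed.
my %args;
for my $option (sort keys %needs) {
    $args{$option} = 1 if have_module($needs{$option});
}

for my $option (sort keys %needs) {
    printf "%-8s %s\n", $option, $args{$option} ? 'available' : 'not installed';
}
```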
LICENSE
Copyright (c) 2002-2004, Aaron Straup Cope. All Rights Reserved.
This is free software, you may use it and distribute it under the same terms as Perl itself.