NAME

XML::RSSLite - lightweight, "relaxed" RSS (and XML-ish) parser

SYNOPSIS

use XML::RSSLite;

parseRSS(\%result, \$content);

print "=== Channel ===\n",
      "Title: $result{'title'}\n",
      "Desc:  $result{'description'}\n",
      "Link:  $result{'link'}\n\n";

foreach my $item (@{ $result{'items'} }) {
    print "  --- Item ---\n",
          "  Title: $item->{'title'}\n",
          "  Desc:  $item->{'description'}\n",
          "  Link:  $item->{'link'}\n\n";
}

DESCRIPTION

This module attempts to extract the maximum amount of content from available documents, and is less concerned with XML compliance than alternatives. Rather than rely on XML::Parser, it uses heuristics and good old-fashioned Perl regular expressions. It stores the data in a simple hash structure, and "aliases" certain tags so that when done, you can count on having the minimal data necessary for re-constructing a valid RSS file. This means you get the basic title, description, and link for a channel and its items.

This module extracts more usable links by parsing "scriptingNews" and "weblog" formats in addition to RDF & RSS. It also "sanitizes" the output for best results. The munging includes:

Remove HTML tags to leave plain text
Remove leading whitespace from URIs
By default, strip characters except 0-9~!@#$%^&*()-+=a-zA-Z[];',.:"<>?\s
Use <url> tags when <link> is empty
Use misplaced URLs in <title> when <link> is empty
Extract links from <a href=...> if required
Limit links to ftp and http(s)
Join relative item URLs (beginning with / or #) to the site base
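
A few of these steps can be sketched in plain Perl. This is not the module's actual code: the helper names (strip_tags, clean_uri, join_relative) and the exact regular expressions are assumptions made purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers; XML::RSSLite's real implementation may differ.

# Remove anything that looks like an HTML tag, leaving plain text.
sub strip_tags {
    my ($text) = @_;
    $text =~ s/<[^>]*>//g;
    return $text;
}

# Drop leading whitespace and accept only ftp and http(s) links.
sub clean_uri {
    my ($uri) = @_;
    $uri =~ s/^\s+//;
    return $uri =~ m{^(?:ftp|https?)://}i ? $uri : '';
}

# Join a relative item URL (beginning with / or #) to the site base.
sub join_relative {
    my ($base, $uri) = @_;
    return $uri unless $uri =~ m{^[/#]};
    (my $site = $base) =~ s{/+$}{};    # avoid a doubled slash
    return $site . $uri;
}

print strip_tags('<b>Hello</b> <i>world</i>'), "\n";               # Hello world
print clean_uri('  https://example.com/a'), "\n";                  # https://example.com/a
print join_relative('http://example.com/', '/news/1.html'), "\n";  # http://example.com/news/1.html
```

Links with any other scheme come back from clean_uri as an empty string, which mirrors the "ftp and http(s) only" rule above.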

EXPORT

parseRSS($outHashRef, $inScalarRef, [$strip])

inScalarRef - required

Reference to a scalar containing the document to be parsed. NOTE: The contents will effectively be destroyed during parsing. Copy the scalar first if you need the original afterwards.

outHashRef - required

Reference to the hash within which to store the parsed content.

strip - optional

An expression indicating the level of winnowing to be performed on the characters permitted in the results.

1 strip non-printable characters
0 no characters are removed
undefined (default) strip everything but: 0-9~!@#$%^&*()-+= a-zA-Z[];',.:"<>?\t\n
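
The three settings can be pictured as three filters. The sketch below is an assumption about the behaviour described here, not the module's code: winnow is a hypothetical helper, and "non-printable" is approximated with the POSIX [:print:] class plus whitespace.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the documented strip levels.
sub winnow {
    my ($text, $strip) = @_;
    if (!defined $strip) {
        # Default: keep only the documented character set.
        $text =~ s/[^0-9~!\@#\$%^&*()\-+= a-zA-Z\[\];',.:"<>?\t\n]//g;
    }
    elsif ($strip) {
        # 1: drop non-printable characters (assumed definition).
        $text =~ s/[^[:print:]\s]//g;
    }
    # 0: leave the text untouched.
    return $text;
}

print winnow("Top {10} |news|"), "\n";    # Top 10 news
print winnow("Hi\x07 there", 1), "\n";    # Hi there
```

Note that the default set is quite narrow: braces, the pipe, the underscore and the forward slash are all outside it, so pass 0 or 1 when such characters matter in titles or descriptions.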

EXPORTABLE

parseXML(\%parsedTree, \$parseThis, 'topTag', $comments);

parsedTree - required

Reference to hash to store the parsed document within.

parseThis - required

Reference to scalar containing the document to parse.

topTag - optional

Tag to treat as the root node. Leaving this undefined is not recommended.

comments - optional

false: comments are removed from parseThis
true: comments are not removed from parseThis
array reference: counts as true; comments are stored in the referenced array

CAVEATS

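This is not a conforming parser. It does not handle the following:

Attribute values containing ">"

```
<foo bar=">">
```

Nested elements of the same name

```
<foo><bar> <bar></bar> <bar></bar> </bar></foo>
```

CDATA sections

```
<![CDATA[ ]]>
```

Processing instructions (PI)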
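
The attribute case (<foo bar=">">) shows why pattern-based parsing is fragile. A minimal sketch, assuming a naive tag pattern of the kind regex parsers often use; the pattern is illustrative and is not taken from XML::RSSLite's source:

```perl
use strict;
use warnings;

# A naive pattern: a tag is "<name", then anything up to the first ">".
my $naive_tag = qr/<(\w+)([^>]*)>/;

my $doc = '<foo bar=">">text</foo>';

my ($name, $attrs) = $doc =~ $naive_tag;
my $rest = substr($doc, $+[0]);    # whatever follows the match

print "tag:   $name\n";     # foo
print "attrs: $attrs\n";    #  bar="   (cut short at the quoted >)
print "rest:  $rest\n";     # ">text</foo>
```

The match ends at the ">" inside the quoted attribute value, so the attribute is truncated and the stray `">` leaks into the element content.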

It is also non-validating. Without a DTD, the following cannot be properly addressed:

entities
namespaces: Support may or may not arrive in some future release.

AUTHOR

Jerrad Pierce <jpierce@cpan.org>.

Scott Thomason <scott@thomasons.org>

LICENSE

Portions Copyright (c) 2002,2003,2009 Jerrad Pierce, (c) 2000 Scott Thomason. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
