NAME
BusyBird::Input::Feed - input BusyBird statuses from RSS/Atom feed
SYNOPSIS
    use BusyBird;
    use BusyBird::Input::Feed;

    my $input = BusyBird::Input::Feed->new;

    my $statuses = $input->parse($feed_xml);
    timeline("feed")->add($statuses);

    $statuses = $input->parse_file("feed.atom");
    timeline("feed")->add($statuses);

    $statuses = $input->parse_url('https://metacpan.org/feed/recent?f=');
    timeline("feed")->add($statuses);
DESCRIPTION
BusyBird::Input::Feed converts RSS and Atom feeds into BusyBird status objects.
For convenience, an executable script busybird_input_feed is bundled in this distribution.
CLASS METHODS
$input = BusyBird::Input::Feed->new(%args)
The constructor.
Fields in %args are:
use_favicon => BOOL (optional, default: true)
If true (or omitted or undef), it tries to use the favicon of the Web site providing the feed as the statuses' icons. If it's defined and false, it won't use the favicon.
user_agent => LWP::UserAgent object (optional)
The LWP::UserAgent object used for fetching documents.
image_max_num => INT (optional, default: 3)
The maximum number of image URLs extracted from each feed item.
If set to 0, it extracts no images. If set to a negative value, it extracts all image URLs from the feed item.
The extracted image URLs are stored as Twitter Entities in the status's extended_entities field, so that BusyBird will render them. See "extended_entities.media" in BusyBird::Manual::Status for details.
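As a minimal sketch of how the three fields above can be combined, the snippet below builds an input object with an explicit user agent. The timeout and agent string are illustrative assumptions, not defaults of this module.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use BusyBird::Input::Feed;

# A user agent with a short timeout. Both values are illustrative
# assumptions, not defaults of BusyBird::Input::Feed.
my $ua = LWP::UserAgent->new(
    timeout => 10,
    agent   => 'my-feed-reader/0.1',
);

my $input = BusyBird::Input::Feed->new(
    use_favicon   => 0,      # defined and false: do not use favicons
    user_agent    => $ua,    # used for fetching documents
    image_max_num => -1,     # negative: extract all image URLs
);
```

Omitting all three fields gives the documented defaults: favicons enabled and at most 3 images per feed item.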
OBJECT METHODS
$statuses = $input->parse($feed_xml_string)
$statuses = $input->parse_string($feed_xml_string)
Converts the given $feed_xml_string into BusyBird $statuses. The parse() method is an alias for parse_string().
$feed_xml_string is the XML data to be parsed. It must be a string encoded in UTF-8.
The return value $statuses is an array-ref of BusyBird status objects.
If $feed_xml_string is invalid, it croaks.
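The following sketch parses a small in-memory feed and catches the exception raised for invalid input. The RSS document is hand-written for illustration only, and the status fields printed (user.screen_name and text) are the ones used in the EXAMPLE section of this document.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);
use BusyBird::Input::Feed;

# A tiny hand-written RSS 2.0 document, used only for illustration.
# encode_utf8() makes it a UTF-8 encoded string, as parse_string() expects.
my $feed_xml = encode_utf8(<<'XML');
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>http://example.com/</link>
    <description>An example feed</description>
    <item>
      <title>First item</title>
      <link>http://example.com/items/1</link>
      <guid>http://example.com/items/1</guid>
      <pubDate>Mon, 06 Jan 2014 10:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>
XML

# Disable favicons so that no favicon lookup is attempted for this
# offline example.
my $input = BusyBird::Input::Feed->new(use_favicon => 0);

# parse_string() croaks on invalid input, so wrap the call in eval.
my $statuses = eval { $input->parse_string($feed_xml) };
if (!defined $statuses) {
    die "Failed to parse the feed: $@";
}

printf "%d status(es) parsed\n", scalar @$statuses;
foreach my $status (@$statuses) {
    print "$status->{user}{screen_name}: $status->{text}\n";
}
```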
$statuses = $input->parse_file($feed_xml_filename)
Same as parse_string() except that parse_file() reads the file named $feed_xml_filename and converts its content.
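A minimal sketch of parse_file(), assuming a feed that has already been saved to disk. Here the file is a temporary one containing a hand-written Atom document, both of which are illustrative assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use BusyBird::Input::Feed;

# Write a small hand-written Atom feed to a temporary file.
my ($fh, $filename) = tempfile(SUFFIX => '.atom', UNLINK => 1);
binmode $fh, ':encoding(UTF-8)';
print {$fh} <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Atom Feed</title>
  <id>urn:example:feed</id>
  <updated>2014-01-06T10:00:00Z</updated>
  <link href="http://example.com/"/>
  <entry>
    <title>First entry</title>
    <id>urn:example:entry:1</id>
    <updated>2014-01-06T10:00:00Z</updated>
    <link href="http://example.com/entries/1"/>
  </entry>
</feed>
XML
close $fh or die "Cannot close $filename: $!";

# Disable favicons so that no favicon lookup is attempted.
my $input = BusyBird::Input::Feed->new(use_favicon => 0);

# parse_file() takes the file name, not a file handle.
my $statuses = $input->parse_file($filename);
printf "%d status(es) read from %s\n", scalar @$statuses, $filename;
```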
$statuses = $input->parse_url($feed_xml_url)
$statuses = $input->parse_uri($feed_xml_url)
Same as parse_string() except that parse_url() downloads the feed XML from $feed_xml_url and converts its content.
The parse_uri() method is an alias for parse_url().
EXAMPLE
The example below uses Parallel::ForkManager to run the parse_url() method of BusyBird::Input::Feed in parallel. Because each feed is downloaded in its own child process, this can greatly reduce the total time needed to download many RSS/Atom feeds.
    use strict;
    use warnings;
    use Parallel::ForkManager;
    use BusyBird::Input::Feed;
    use open qw(:std :encoding(utf8));

    my @feeds = (
        'https://metacpan.org/feed/recent?f=',
        'http://www.perl.com/pub/atom.xml',
        'https://github.com/perl-users-jp/perl-users.jp-htdocs/commits/master.atom',
    );
    my $MAX_PROCESSES = 10;
    my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
    my $input = BusyBird::Input::Feed->new;
    my @statuses = ();

    # Runs in the parent each time a child exits. $statuses is the data
    # the child passed to finish(); it is undef if the child died first.
    $pm->run_on_finish(sub {
        my ($pid, $exitcode, $id, $signal, $coredump, $statuses) = @_;
        push @statuses, @$statuses if defined $statuses;
    });

    foreach my $feed (@feeds) {
        # The parent skips to the next feed; the child runs the rest.
        $pm->start and next;
        warn "Start loading $feed\n";
        my $statuses = $input->parse_url($feed);
        warn "End loading $feed\n";
        # Exit the child, sending its statuses back to the parent.
        $pm->finish(0, $statuses);
    }
    $pm->wait_all_children;

    foreach my $status (@statuses) {
        print "$status->{user}{screen_name}: $status->{text}\n";
    }
SEE ALSO
REPOSITORY
https://github.com/debug-ito/BusyBird-Input-Feed
BUGS AND FEATURE REQUESTS
Please report bugs and feature requests to my GitHub issues https://github.com/debug-ito/BusyBird-Input-Feed/issues.
Although I prefer GitHub, non-GitHub users can use CPAN RT https://rt.cpan.org/Public/Dist/Display.html?Name=BusyBird-Input-Feed. Please send email to bug-BusyBird-Input-Feed at rt.cpan.org to report bugs if you do not have a CPAN RT account.
AUTHOR
Toshio Ito, <toshioito at cpan.org>
LICENSE AND COPYRIGHT
Copyright 2014 Toshio Ito.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.