NAME
WWW::Scraper::Yahoo360 - Yahoo 360 blogs old-fashioned crappy scraper
SYNOPSIS
use WWW::Scraper::Yahoo360;
my $y360 = WWW::Scraper::Yahoo360->new({
username => 'myusername',
password => 'mypassword',
});
# Debug what's happening?
$WWW::Scraper::Yahoo360::DEBUG = 1;
# First you have to login
$y360->login() or die "Login failed?";
# High level blog information
my $blog_info = $y360->blog_info();
# Gets all the blog posts
my $posts = $y360->get_blog_posts();
# Gets all the blog post comments
my $comments = $y360->get_blog_comments();
DESCRIPTION
Ignorant web scraper, based on WWW::Mechanize, that connects to your Yahoo 360 account and tries to fetch the blog posts and comments you still have on their service.
If it breaks, well... it's a scraper.
This module is used on the My Opera Community, http://my.opera.com, to import existing Yahoo 360 blogs into the My Opera blog service.
SUBROUTINES
new(\%args)
Where \%args is a hashref with the username and password of your Yahoo 360 account.
This creates a new WWW::Scraper::Yahoo360 object, ready to scrape.
blog_info([$blog_page])
Fetches high-level blog information for your Yahoo 360 blog. If a $blog_page argument is supplied, the blog information is looked up inside the contents of that scalar. Otherwise it's fetched from the network. $blog_page must contain a full HTML page string.
Returns a hashref with some or all of the following information:
link
-
Something like: http://blog.360.yahoo.com/blog-<yourusername>
-
Most probably public. Could also be friends or friends of friends, but never tried it.
count
-
Number of blog posts in total.
start
-
First blog post on the frontpage. Should be 1.
end
-
Last blog post on the frontpage, usually 5.
title
-
Title of the blog.
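As a small illustration of the hashref described above, the following sketch works on a hand-written sample instead of a live call to blog_info(). The sample values and the pages_needed() helper are hypothetical, and treating start and end as the bounds of one page of posts is an inference from the descriptions above, not something the module promises.

```perl
use strict;
use warnings;
use POSIX qw(ceil);

# Hypothetical sample shaped like the blog_info() result described above.
# Only keys named in this documentation are used.
my $blog_info = {
    link  => 'http://blog.360.yahoo.com/blog-<yourusername>',
    count => 12,
    start => 1,
    end   => 5,
    title => 'My blog',
};

# Hypothetical helper: estimate how many pages hold all the posts,
# assuming start..end covers exactly one page of posts.
sub pages_needed {
    my ($info) = @_;
    my $per_page = $info->{end} - $info->{start} + 1;
    return 0 if $per_page <= 0 || !$info->{count};
    return ceil($info->{count} / $per_page);
}

printf "%s: %d posts, about %d page(s)\n",
    $blog_info->{title}, $blog_info->{count}, pages_needed($blog_info);
```

Since the hashref may contain only some of the keys, real code would probably check that count, start and end are defined before doing arithmetic on them.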
blog_main_page()
Fetches the user's main blog page. Returns a string with the HTML page contents. This can be used in blog_info() or get_blog_posts().
blog_page_url($link, $start, $per_page, $count)
Builds the URL used to fetch a specific blog page.
dump()
Dumps the content of the last accessed page to STDOUT.
login()
Logs in to the Yahoo service. Returns a scalar that is true if the login was successful and false otherwise.
get_blog_comments(\@posts)
Retrieves all comments in the user's blog. Wants the structure returned by get_blog_posts().
get_blogpost_comments($post)
Retrieves all comments to a single blog post. Wants a single $post entry (hashref): one of the elements returned by get_blog_posts().
get_blog_posts([$blog_page, [%overrides]])
Gets all blog posts by a user. If $blog_page is supplied, it looks for blog posts in that page only.
%overrides is an optional set of key/value pairs that overrides some of the properties of the blog to be scraped and parsed. To see the list of properties, look at blog_info().
Example:
my $y360 = WWW::Scraper::Yahoo360->new({
username => '...',
password => '...',
});
$y360->login() or die "Failed login";
# Fetch only the first blog post, no matter what
my $first_page = $y360->blog_main_page();
my $blog_posts = $y360->get_blog_posts($first_page, count=>1);
Returns an array of hashrefs, each one representing a blog post. Each post (hashref) should have the following keys:
comments
-
Number of comments to this blog post
description
-
Blog post content
link
-
Permanent URL of the blog post
pubDate
-
Date when the blog post was published, in HTTP::Date format, e.g.: Sun, Nov 14 06:20:28 CET.
-
Comma delimited string of tags (e.g.: travel, holiday)
title
-
Title of the blog post
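To show how the post structure described above might be consumed, here is a minimal sketch that runs on hand-written sample data instead of a live get_blog_posts() call. The sample posts, the example.com links and the split_tags() helper are hypothetical; only the keys named in this documentation are used, and the tag string is passed in directly.

```perl
use strict;
use warnings;

# Hypothetical sample shaped like the list of posts described above.
my @posts = (
    {
        title       => 'First post',
        link        => 'http://example.com/1',
        comments    => 2,
        pubDate     => 'Sun, Nov 14 06:20:28 CET',
        description => 'Hello',
    },
    {
        title       => 'Second post',
        link        => 'http://example.com/2',
        comments    => 7,
        pubDate     => 'Sun, Nov 14 06:20:28 CET',
        description => 'World',
    },
);

# List the most-commented posts first.
my @by_comments = sort { $b->{comments} <=> $a->{comments} } @posts;
print "$_->{comments}\t$_->{title}\t$_->{link}\n" for @by_comments;

# Hypothetical helper: turn a comma delimited tag string into a list,
# trimming surrounding whitespace and dropping empty entries.
sub split_tags {
    my ($tag_string) = @_;
    return () unless defined $tag_string;
    my @tags;
    for my $tag (split /,/, $tag_string) {
        $tag =~ s/^\s+|\s+$//g;
        push @tags, $tag if length $tag;
    }
    return @tags;
}

my @tags = split_tags('travel, holiday');
print join('|', @tags), "\n";
```

Sorting on comments is only one possible use; the same loop could feed each post to an importer or pass it to get_blogpost_comments().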
mech()
WWW::Mechanize object accessor.
parse_date($date_string)
Tries to parse a date in the Yahoo 360 format into a Unix timestamp.
EXPORTS
None by default.
AUTHOR
Cosimo Streppone, <cosimo@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2009 by Cosimo Streppone, cosimo@cpan.org
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.