NAME
WWW::FetchStory::Fetcher - fetching module for WWW::FetchStory
VERSION
version 0.1827
DESCRIPTION
This is the base class for story-fetching plugins for WWW::FetchStory.
METHODS
new
$obj = WWW::FetchStory::Fetcher->new();
init
Initialize the object.
$obj->init(%args);
name
The name of the fetcher; this is basically the last component of the module name. This works as either a class function or a method.
$name = $self->name();
$name = WWW::FetchStory::Fetcher::name($class);
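As a rough illustration of "the last component of the module name", the sketch below derives that component from either a class name or an object. The helper name last_component and the example class are assumptions made for this sketch; the module's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): return the last
# component of a module name, given a class name or a blessed object.
sub last_component {
    my $class_or_obj = shift;
    my $class = ref($class_or_obj) || $class_or_obj;
    my ($name) = $class =~ /([^:]+)$/;
    return $name;
}

# The class name here is only an example; bless does not need it to exist.
my $obj = bless({}, 'WWW::FetchStory::Fetcher::LiveJournal');

print last_component('WWW::FetchStory::Fetcher::LiveJournal'), "\n";  # LiveJournal
print last_component($obj), "\n";                                     # LiveJournal
```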
info
Information about the fetcher. By default this just returns the formatted name.
$info = $self->info();
priority
The priority of this fetcher. Fetchers with higher priority get tried first. This is useful where there may be a generic fetcher for a particular site, and then a more specialized fetcher for particular sections of a site. For example, there may be a generic LiveJournal fetcher, and then refinements for particular LiveJournal communities, such as the sshg_exchange community. This works as either a class function or a method.
This must be overridden by the specific fetcher class.
$priority = $self->priority();
$priority = WWW::FetchStory::Fetcher::priority($class);
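The "higher priority gets tried first" rule can be sketched as a descending sort. The three classes and their priority values below are hypothetical; a real plugin would inherit from WWW::FetchStory::Fetcher, which the sketch omits so that it runs standalone.

```perl
use strict;
use warnings;

# Hypothetical fetcher classes with made-up priority values.
{
    package My::Fetcher::Generic;
    sub priority { return 0 }
}
{
    package My::Fetcher::LiveJournal;
    sub priority { return 1 }
}
{
    package My::Fetcher::SSHGExchange;
    sub priority { return 2 }
}

my @classes = qw(
    My::Fetcher::LiveJournal
    My::Fetcher::Generic
    My::Fetcher::SSHGExchange
);

# Sort so that the most specialized (highest priority) fetcher comes first.
# priority is called here as a class method.
my @ordered = sort { $b->priority() <=> $a->priority() } @classes;

print join("\n", @ordered), "\n";
```

With these values, the community-specific fetcher is tried before the generic LiveJournal fetcher, which in turn is tried before the catch-all.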
allow
If this fetcher can be used for the given URL, then this returns true. This must be overridden by the specific fetcher class.
if ($obj->allow($url))
{
    ...
}
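A minimal sketch of an allow override follows, assuming a plugin that handles one site. The package name and the example.com URL pattern are hypothetical, and the sketch does not inherit from the base class so that it runs without the module installed.

```perl
use strict;
use warnings;

{
    package My::Fetcher::Example;    # hypothetical plugin name

    sub new {
        my $class = shift;
        return bless({}, $class);
    }

    # Return true only for URLs on the (made-up) site this plugin handles.
    sub allow {
        my ($self, $url) = @_;
        return ($url =~ m{^https?://(?:www\.)?example\.com/}) ? 1 : 0;
    }
}

my $obj = My::Fetcher::Example->new();
for my $url ('http://www.example.com/story/1', 'http://other.example.org/story/1') {
    printf "%s: %s\n", $url, ($obj->allow($url) ? 'allowed' : 'not allowed');
}
```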
fetch
Fetch the story, with the given options.
%story_info = $obj->fetch(
    urls=>\@urls,
    basename=>$basename,
    toc=>0,
    yaml=>0);
- basename
  Optional basename used to construct the filenames. If this is not given, the basename is derived from the title of the story.
- epub
  Create an EPUB file, deleting the HTML files which have been downloaded.
- toc
  Build a table-of-contents file if this is true.
- yaml
  Build a YAML file with meta-data about this story if this is true.
- urls
  The URLs of the story. The first page is scraped for meta-information about the story, including the title and author. Site-specific Fetcher plugins can find additional information, including the URLs of all the chapters in a multi-chapter story.
Private Methods
get_story_basename
Figure out the file basename for a story by using its title.
$basename = $self->get_story_basename($title);
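One plausible way to turn a title into a filename-safe basename is sketched below. The function name basename_from_title and the exact rules (lowercasing, underscores for punctuation) are assumptions for illustration; the rules used by get_story_basename may differ.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_story_basename; the real rules may differ.
sub basename_from_title {
    my $title = shift;
    my $base = lc $title;
    $base =~ s/[^a-z0-9]+/_/g;    # collapse each non-alphanumeric run to "_"
    $base =~ s/^_+|_+$//g;        # trim leading and trailing underscores
    return $base;
}

print basename_from_title("The Story: Part One!"), "\n";    # the_story_part_one
```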
extract_story
Extract the story-content from the fetched content.
my ($story, $title) = $self->extract_story(content=>$content,
    title=>$title);
make_css
Create site-specific CSS styling.
$css = $self->make_css();
tidy
Make a tidy, compliant XHTML page from the given story-content.
$content = $self->tidy(story=>$story,
    title=>$title);
get_toc
Get a table-of-contents page.
get_page
Get the contents of a URL.
parse_toc
Parse the table-of-contents file.
This must be overridden by the specific fetcher class.
%info = $self->parse_toc(content=>$content,
    url=>$url,
    urls=>\@urls);
This should return a hash containing:
- chapters
  An array of URLs for the chapters of the story. In the case where the story only takes one page, that will be the chapter. In the case where multiple URLs have been passed in, it will be those URLs.
- title
  The title of the story.
It may also return additional information, such as Summary.
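The return shape described above can be sketched with a simplified override. The package name is hypothetical, the title is taken from the HTML title element purely for illustration, and a real plugin would also discover chapter URLs from the page itself.

```perl
use strict;
use warnings;

{
    package My::Fetcher::Sketch;    # hypothetical plugin name

    # Illustrates only the shape of the returned hash.
    sub parse_toc {
        my ($self, %args) = @_;

        # Assumption: use the contents of <title> as the story title.
        my ($title) = $args{content} =~ m{<title>\s*(.*?)\s*</title>}is;

        # Several URLs passed in: those are the chapters.
        # Otherwise the single page is the only chapter.
        my @urls = @{ $args{urls} || [] };
        my @chapters = @urls > 1 ? @urls : ($args{url});

        return (title => $title, chapters => \@chapters);
    }
}

my %info = My::Fetcher::Sketch->parse_toc(
    content => '<html><head><title>A Story</title></head></html>',
    url     => 'http://example.com/story/1',
    urls    => ['http://example.com/story/1'],
);
print "$info{title}: ", scalar(@{ $info{chapters} }), " chapter(s)\n";
```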
parse_chapter_urls
Figure out the URLs for the chapters of this story.
parse_epub_url
Figure out the URL for the EPUB version of this story, if there is one.
parse_title
Get the title from the content.
parse_ch_title
Get the chapter title from the content.
parse_author
Get the author from the content.
parse_summary
Get the summary from the content.
parse_characters
Get the characters from the content.
parse_universe
Get the universe/fandom from the content.
parse_recipient
Get the recipient from the content.
parse_category
Get the categories from the content.
parse_rating
Get the rating from the content.
derive_values
Calculate additional Meta values, such as current date.
get_chapter
Get an individual chapter of the story, tidy it, and save it to a file.
$filename = $obj->get_chapter(base=>$basename,
    count=>$count,
    url=>$url,
    title=>$title);
get_epub
Get the EPUB version of the story, tidy it, and save it to a file.
$filename = $obj->get_epub(base=>$basename,
    url=>$url);
epub_replace_description
Replace or add the description to an EPUB file.
epub_add_meta
Add the given meta-data to an EPUB file.
epub_parse_one_node
Parse a node of meta-information from an EPUB file.
wordcount
Figure out the word-count.
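A word count over HTML content might look roughly like the sketch below: strip the tags, then count word-like tokens. The function name count_words and the definition of a word are assumptions for this sketch, not a description of what wordcount actually does.

```perl
use strict;
use warnings;

# Hypothetical helper: count word-like tokens in HTML content.
sub count_words {
    my $html = shift;
    (my $text = $html) =~ s/<[^>]*>/ /g;    # replace tags with spaces
    # A "word" here starts with a word character and may contain
    # apostrophes or hyphens; bare punctuation is not counted.
    my $count = () = $text =~ /\w[\w'-]*/g;
    return $count;
}

print count_words('<p>Once upon a <em>time</em>.</p>'), "\n";    # 4
```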
build_toc
Build a local table-of-contents file from the meta-info about the story.
$self->build_toc(info=>\%info);
build_epub
Create an EPUB file from the story files and meta information.
$self->build_epub();
tidy_chars
Remove nasty encodings.
$content = $self->tidy_chars($content);