NAME
WWW::ImageSpool - Cache images of interest from the web.
SYNOPSIS
  use WWW::ImageSpool;
  mkdir("/var/tmp/imagespool", 0700);
  my $spool = WWW::ImageSpool->new
  (
    limit => 3,
    searchlimit => 10,
    max => 5 * 1048576,
    dictionary => "sushi.txt",
    verbose => 1,
    dir => "/var/tmp/imagespool"
  );
  $spool->run()
    while($spool->uptime < 86400);
DESCRIPTION
When a WWW::ImageSpool object's run() method is called, it randomly picks keywords out of a chosen dictionary file and attempts to download images off of the internet by doing searches on those keywords. (Currently only a Google Image Search is done, via Guillaume Rousse's WWW::Google::Images module, but the internals have been set up to make it easy to hook into other engines in the future.) Images are stored in the specified directory. If the directory grows beyond the maximum size, the oldest files in the directory are deleted.
The intended purpose behind this module is to supply images on demand for any piece of software that wants abstract images, such as screensavers, webpage generators, or voice synthesizers (wouldn't it be cool if a voice synthesizer extracted all the popular nouns out of a book and scrolled by pertinent images as it read to you?)
Constructor
new(%args)
Creates and returns a new WWW::ImageSpool object.
Required parameters:
- dir => $dir
Directory to hold the image files in. WWW::ImageSpool will delete files out of this directory when it reaches the maximum size, so there shouldn't be anything in there that you want to keep.
Optional parameters:
- limit => $limit
Maximum number of images to fetch from any one keyword search. Defaults to 3.
- searchlimit => $searchlimit
Maximum number of search results to ask the search engine for. limit results will be randomly picked out of the list that the search engine returns. The default is search-engine specific (50 for Google). Most search engines will return results in the same order each time they are called with the same keywords, so if you are using a small dictionary file it is generally a good idea to make this a lot higher than limit.
- consume => 0 | 1
WWW::ImageSpool re-loads the dictionary file whenever it is modified, or whenever it runs out of words. With consume set to 0, WWW::ImageSpool will never run out of words, because it is free to re-use them as often as it likes. With consume set to 1, WWW::ImageSpool deletes each word from its internal list as it uses it, ensuring that every word in the dictionary is used once before any word is used twice.
consume is set to 1 by default.
- retry => $retry
How many times to retry image-searching or fetching operations if they fail.
The actual maximum number of retries is ($retry * $retry): WWW::ImageSpool will try up to $retry times to find a word with good search results, then, with that word, will try up to $retry times to download images, stopping as soon as at least one image is successfully downloaded (or the retries are exhausted).
retry is set to 5 by default.
- minx => $minx, miny => $miny
Minimum X / Y resolution of images to keep. Smaller images are discarded.
By default, minx is set to 160 and miny is set to 120.
- max => $bytes
Maximum size of the spool directory, in bytes. If the total size of all files in that directory ever goes over this size, the oldest file in the directory is deleted to make more room.
- dictionary => $file
Path to the dictionary file to use. Defaults to "/usr/share/dict/words".
- verbose => 0 - 4
Level of verbosity. Defaults to 0, which prints nothing. 1 prints a logfile-like status line for each iteration of run(). 2 prints each word that is picked, and advises if WWW::ImageSpool picked a file that already exists in the spool. 3-4 print more verbose debugging information.
Parameters for making WWW::ImageSpool re-entrant:
These parameters are only useful if you are creating and destroying WWW::ImageSpool objects throughout the lifespan of an application, but want your statistics to remain consistent throughout:
- n => $n
How many iterations of run() the application has done so far.
- s => $s
UNIX timestamp of when the application made its first call to run() on a WWW::ImageSpool object.
- l => $l
UNIX timestamp of when the application last called run() on a WWW::ImageSpool object.
- got => $got
How many images have been downloaded and stored over the life of the application (including ones that have since been deleted).
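As a sketch of how these parameters might be used to carry statistics across object lifetimes (the stats file name and the read_stats/write_stats helpers are illustrative assumptions, not part of the module):

```perl
use WWW::ImageSpool;

# Hypothetical helper: read saved counters back as a hash,
# e.g. (n => 42, s => 1092968871, l => 1092969471, got => 17),
# or an empty hash on the first run.
my %stats = read_stats("spool-stats.txt");

my $spool = WWW::ImageSpool->new
(
  dir => "/var/tmp/imagespool",
  %stats,    # n, s, l and got from the previous incarnation, if any
);

$spool->run();

# Hypothetical helper: persist the counters so the next
# WWW::ImageSpool object can pick up where this one left off.
write_stats("spool-stats.txt",
  n   => $spool->n(),
  s   => $spool->s(),
  l   => $spool->l(),
  got => $spool->got(),
);
```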
Methods
run()
Pick a new keyword and attempt to download up to limit images from an image search.
Returns the actual number of images downloaded and stored.
s()
Returns the UNIX timestamp of the object's first operation.
l()
Returns the UNIX timestamp of the object's last operation.
n()
Returns how many times run() has been called on this object.
uptime()
Returns the number of seconds between the object's first operation and its last operation.
lag()
Returns the number of seconds between the object's last operation and the current time.
got()
Returns the total number of images that have been downloaded and stored by this object, including images that have been deleted.
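Taken together, these accessors make it easy to drive and monitor a spool from a simple loop; a minimal sketch (the 60-second sleep and the one-day cutoff are arbitrary choices for illustration):

```perl
use WWW::ImageSpool;

my $spool = WWW::ImageSpool->new(dir => "/var/tmp/imagespool");

# Keep fetching until the spool has been active for a day.
while ($spool->uptime() < 86400)
{
  my $stored = $spool->run();
  printf("run %d: stored %d image(s), %d total, %ds since last run\n",
    $spool->n(), $stored, $spool->got(), $spool->lag());
  sleep(60);
}
```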
BUGS
If the dictionary file suddenly disappears, WWW::ImageSpool
does not act very graceful.
TODO
There should be size limitations on individual files with a HEAD check before they are actually downloaded.
Underlying modules (WWW::ImageSpool::Source::Google, WWW::ImageSpool::Dictionary, etc.) need to be documented.
Support for multiple "Source" and "Dictionary" objects in one "ImageSpool" object.
Per-run() control over the search configuration.
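The HEAD-based size check mentioned above might look something like this sketch, using LWP::UserAgent (the 2 MB cap and the small_enough name are arbitrary; none of this is part of WWW::ImageSpool today):

```perl
use LWP::UserAgent;

# Return true if the resource at $url advertises a Content-Length
# at or below $max_bytes. Servers that omit Content-Length pass the
# check here; a stricter policy could reject them instead.
sub small_enough
{
  my ($url, $max_bytes) = @_;
  my $ua  = LWP::UserAgent->new(timeout => 10);
  my $res = $ua->head($url);
  return 0 unless $res->is_success;
  my $len = $res->header('Content-Length');
  return 1 unless defined $len;    # no length advertised; allow it
  return $len <= $max_bytes;
}

# e.g.: next unless small_enough($image_url, 2 * 1048576);
```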
NOTE
This module may violate the terms of service of some search engines or content providers. Use at your own risk.
VERSION
0.01
LICENSE
Copyright 2004, Tyler "Crackerjack" MacDonald <tyler@yi.org>. This is free software; you may redistribute it under the same terms as Perl itself.