NAME

Plucene::SearchEngine::Index::URL - File reader for web URLs

DESCRIPTION

This frontend module takes a URL, downloads its content, extracts its metadata, and passes the file on to a backend. The frontend registers the following Plucene fields:

mimetype

The MIME type of the data.

filename

The basename of the URL's filename.

id

The URL given.

modified

A Plucene date field representing the last-modified date of the file.

language

The ISO language identifier of the content.

encoding

The original character set (before conversion to UTF-8).
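
As a rough illustration of how some of these fields relate to a URL and its HTTP response, here is a minimal sketch. It is not the module's own implementation: the helper name sketch_fields, the example URL, and the assumption that mimetype and encoding come from the Content-Type header are all hypothetical. The modified and language fields are left out because they depend on details this document does not describe.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper, NOT part of Plucene::SearchEngine::Index::URL.
# Takes a URL and a raw Content-Type header value and returns a hash
# holding the id, filename, mimetype and encoding fields.
sub sketch_fields {
    my ($url, $content_type) = @_;

    # Pull the path component out of the URL: the scheme and host are
    # skipped, and the query string and fragment are dropped.
    my ($path) = $url =~ m{^(?:[^:/?#]+:)?(?://[^/?#]*)?([^?#]*)};

    # Split a value such as "text/html; charset=ISO-8859-1" into the
    # media type and the (optional) charset parameter.
    my ($mimetype) = $content_type =~ m{^\s*([^;\s]+)};
    my ($encoding) = $content_type =~ m{charset\s*=\s*"?([^";\s]+)}i;

    return (
        id       => $url,              # the URL exactly as given
        filename => basename($path),   # last segment of the URL path
        mimetype => $mimetype,
        encoding => $encoding,         # undef if no charset was sent
    );
}

# Example values are made up for illustration.
my %fields = sketch_fields(
    "http://www.example.com/docs/intro.html?lang=en",
    "text/html; charset=ISO-8859-1",
);
print "$_: $fields{$_}\n" for sort keys %fields;
```

The sketch uses only core Perl modules so that it runs on its own. Real code would more likely rely on a URI parser and an HTTP client rather than regular expressions.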

METHODS

Plucene::SearchEngine::Index::URL->examine($url);

This downloads the content at the given URL and examines it for the above metadata, before handing it to a backend.