NAME
Plucene::SearchEngine::Index::URL - File reader for web URLs
DESCRIPTION
This frontend module takes a URL, downloads its content, extracts its metadata and passes the result on to a backend. The frontend registers the following Plucene fields:
- mimetype: The MIME type of the data.
- filename: The basename of the URL's filename.
- id: The URL given.
- modified: A Plucene date field representing the last-modified date of the file.
- language: The ISO language identifier of the content.
- encoding: The original character set of the content (before conversion to UTF-8).
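The filename and modified fields could be derived roughly as follows. This is a minimal sketch, not the module's own code: the helper names url_filename and last_modified_epoch are hypothetical, and it assumes the modification date comes from an RFC 1123 Last-Modified HTTP header. It returns plain Unix epoch seconds; turning that into Plucene's date-field encoding is not shown.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use Time::Local qw(timegm);

# Hypothetical helper: the basename of a URL's path, ignoring any query
# string or fragment. Percent-escapes are left as-is.
# Returns '' when the path has no filename component.
sub url_filename {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*\z//s;              # drop query and fragment
    $path =~ s{\A[a-z][a-z0-9+.-]*://[^/]*}{}i;      # drop scheme and host
    return '' if $path eq '' || $path =~ m{/\z};     # e.g. "http://host/dir/"
    return basename($path);
}

my %MONTH = (Jan => 0, Feb => 1, Mar => 2, Apr => 3, May => 4,  Jun => 5,
             Jul => 6, Aug => 7, Sep => 8, Oct => 9, Nov => 10, Dec => 11);

# Hypothetical helper: parse an RFC 1123 date such as
# "Sun, 06 Nov 1994 08:49:37 GMT" into Unix epoch seconds (UTC).
# Assumption: the date comes from the HTTP Last-Modified header.
# Returns undef for anything it does not recognise.
sub last_modified_epoch {
    my ($header) = @_;
    return undef unless defined $header;
    my ($day, $mon, $year, $h, $m, $s) = $header =~
        /\A\s*\w{3},\s+(\d{1,2})\s+(\w{3})\s+(\d{4})\s+(\d\d):(\d\d):(\d\d)\s+GMT\s*\z/
        or return undef;
    return undef unless exists $MONTH{$mon};
    # timegm takes (sec, min, hour, mday, 0-based month, year)
    return timegm($s, $m, $h, $day, $MONTH{$mon}, $year);
}
```

For example, http://www.example.com/docs/report.pdf?x=1 yields the filename report.pdf, while a URL ending in a slash yields an empty filename.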
METHODS
Plucene::SearchEngine::Index::URL->examine($url);
This downloads the resource at the given URL and examines it for the above metadata before handing it to a backend.
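The download-and-decode part of examine could look something like the following. This is a hedged sketch, not the module's actual implementation: it assumes HTTP::Tiny (bundled with Perl since 5.14) as the HTTP client, and fetch_for_indexing is a hypothetical name. It fills in id, mimetype and encoding from the response and converts the body to UTF-8; language detection and the hand-off to a backend are left out.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use Encode qw(encode find_encoding);

# Hypothetical helper sketching the download step of examine().
# $http may be any object with HTTP::Tiny's get() interface; it defaults
# to a real HTTP::Tiny client.
sub fetch_for_indexing {
    my ($url, $http) = @_;
    $http ||= HTTP::Tiny->new;
    my $res = $http->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};

    # HTTP::Tiny lower-cases header names; assume a single Content-Type.
    my $ctype = $res->{headers}{'content-type'} // '';
    my ($mimetype) = $ctype =~ m{\A\s*([^;\s]+)};
    $mimetype //= 'application/octet-stream';
    my ($charset) = $ctype =~ m{;\s*charset\s*=\s*"?([^";\s]+)}i;
    $charset //= 'ISO-8859-1';    # assumption: fallback when none is given

    my $enc = find_encoding($charset)
        or die "Unknown character set '$charset' for $url\n";

    return {
        id       => $url,                 # the URL given
        mimetype => lc $mimetype,
        encoding => lc $charset,          # original character set
        # decode from the original charset, then re-encode as UTF-8 bytes
        content  => encode('UTF-8', $enc->decode($res->{content})),
    };
}
```

Because the client is passed in, the function can be exercised without network access by supplying any object whose get method returns a hash in HTTP::Tiny's response format.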