NAME

W3C::LogValidator - The W3C Log Validator - Quality-focused Web Server log processing engine

Checks the quality/validity of the most popular content on a Web server

SYNOPSIS

Generic, basic use of the W3C::LogValidator module: parse a configuration file and process the relevant logs.

use W3C::LogValidator;
my $logprocessor = W3C::LogValidator->new("sample.conf");
$logprocessor->process;

Alternatively, use the default configuration and process the logs:

my $logprocessor = W3C::LogValidator->new;
$logprocessor->process;

DESCRIPTION

W3C::LogValidator is the main module for the W3C Log Validator, a combination of a Web server log analysis and statistics tool and a Web content quality checker.

As an easy alternative to using this module, the Perl script logprocess.pl is bundled with the W3C::LogValidator distribution.

API

Constructor

$processor = W3C::LogValidator->new

Constructs a new W3C::LogValidator processor. You may pass a configuration file name, as well as a reference to a hash of attribute-value pairs, as parameters to the constructor.

For example, for mail output:

my %conf = (
  "UseOutputModule" => "W3C::LogValidator::Output::Mail",
  "ServerAdmin" => 'webmaster@example.com',
  "verbose" => "3"
  );
my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);

Or, for HTML output:

my %conf = (
  "UseOutputModule" => "W3C::LogValidator::Output::HTML",
  "OutputTo" => 'path/to/file.html',
  "verbose" => "0"
  );
my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);

If given the path to a configuration file, new() will call the W3C::LogValidator::Config module to get its configuration variables. Otherwise, a default set of values is used.

Main processing method

$processor->process

Do-it-all method: reads the configuration file (if any), parses the log files, runs the results through the processing modules, and sends the results to the output module.

Module methods

$processor->config_module

Creates a configuration hash for a specific module, adding module-specific configuration variables and overriding general ones where necessary.
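The override rule can be illustrated with plain Perl hashes. This is a minimal sketch, not the module's own code: the subroutine name merge_module_config is hypothetical, and it assumes that module-specific values take precedence over general ones.

```perl
use strict;
use warnings;

# Hypothetical sketch: merge general settings with module-specific ones.
# Assumption: keys in %$module_conf override identically named keys
# in %$general_conf.
sub merge_module_config {
    my ($general_conf, $module_conf) = @_;
    # In a list assignment to a hash, later duplicate keys win
    my %merged = (%$general_conf, %$module_conf);
    return \%merged;
}

my %general = (verbose => 1, ServerAdmin => 'webmaster@example.com');
my %html    = (verbose => 0, OutputTo => 'path/to/file.html');
my $conf    = merge_module_config(\%general, \%html);
print "$conf->{verbose} $conf->{OutputTo}\n";   # prints "0 path/to/file.html"
```

Listing the module-specific hash last is what gives its keys precedence, while general keys it does not mention (here ServerAdmin) are kept.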

$processor->use_modules

Runs the data parsed from the log files through the various processing (validation) modules specified by UseValidationModule in the configuration.

Log parsing and URI methods

$processor->read_logfiles

Loops through and parses all log files specified in the configuration.

$processor->read_logfile('path/to.file')

Extracts URIs and their number of hits from a given log file, and feeds them to the processor's URI/Hits table.

$processor->find_uri

Given a log record and the type of the log (Common Log Format, flat list of URIs, etc.), extracts the URI.
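As an illustration of URI extraction, here is a minimal sketch handling a Common Log Format record and a flat list of URIs. The subroutine extract_uri and the log type names 'common' and 'plain' are hypothetical; the module's own type identifiers and parsing rules may differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: extract the requested URI from one log record.
# The type names 'common' and 'plain' are illustrative only.
sub extract_uri {
    my ($record, $type) = @_;
    if ($type eq 'common') {
        # Common Log Format:
        # host ident authuser [date] "METHOD URI PROTOCOL" status bytes
        return $1
            if $record =~ /^\S+ \S+ \S+ \[[^\]]*\] "\S+ ([^\s"]+)[^"]*" \d{3} \S+/;
        return;    # malformed record
    }
    if ($type eq 'plain') {
        # Flat list of URIs: one per line, surrounding whitespace ignored
        (my $uri = $record) =~ s/^\s+|\s+$//g;
        return $uri if length $uri;
        return;    # blank line
    }
    return;        # unknown log type
}

my $line = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
print extract_uri($line, 'common'), "\n";   # prints "/apache_pb.gif"
```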

$processor->remove_duplicates

Given a URI, removes "directory index" suffixes such as index.html, so that http://foobar/ and http://foobar/index.html are counted as one resource.
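A minimal sketch of directory-index normalization, assuming a hypothetical, hard-coded list of index file names (the module's actual list, and whether it can be configured, may differ). The subroutine name strip_index_suffix is also hypothetical.

```perl
use strict;
use warnings;

# Hypothetical list of directory-index file names.
my @INDEX_NAMES = qw(index.html index.htm index.php default.htm);

# Strip a trailing directory-index file name so that
# http://foobar/ and http://foobar/index.html map to the same key.
sub strip_index_suffix {
    my ($uri) = @_;
    for my $name (@INDEX_NAMES) {
        # Only strip when the name is the whole final path segment;
        # \Q...\E escapes the '.' in the file name
        last if $uri =~ s{/\Q$name\E\z}{/};
    }
    return $uri;
}

print strip_index_suffix('http://foobar/index.html'), "\n";   # prints "http://foobar/"
```

Anchoring on the preceding slash keeps names such as myindex.html from being truncated.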

$processor->add_uri

Adds a URI to the processor's URI/Hits table.

$processor->sorted_uris

Returns the list of URIs in the processor's table, sorted by popularity (hits).
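The URI/Hits table and the popularity sort can be sketched with a plain hash. The names record_hit and uris_by_popularity are hypothetical stand-ins for add_uri and sorted_uris, and breaking ties alphabetically is an assumption added here to make the order deterministic.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the processor's URI/Hits table.
my %hits;

# Record one or more hits for a URI (cf. add_uri).
sub record_hit {
    my ($uri, $count) = @_;
    $hits{$uri} += defined $count ? $count : 1;
    return;
}

# Return URIs ordered by descending hit count, ties broken
# alphabetically (cf. sorted_uris).
sub uris_by_popularity {
    return sort { $hits{$b} <=> $hits{$a} or $a cmp $b } keys %hits;
}

record_hit($_) for qw(/a /b /a /c /a /b);
print join(' ', uris_by_popularity()), "\n";   # prints "/a /b /c"
```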

$processor->no_cgi

Tests whether a given URI contains a CGI query string.
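A query-string test can be as simple as looking for a '?' in the URI. This sketch assumes, as the name no_cgi suggests, that a true value means the URI has no query string; the helper has_no_query is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the URI carries no CGI query string,
# i.e. contains no '?' (cf. no_cgi).
sub has_no_query {
    my ($uri) = @_;
    return index($uri, '?') == -1;
}

for my $uri ('/search?q=perl', '/docs/') {
    printf "%s => %s\n", $uri, has_no_query($uri) ? 'no query' : 'query';
}
```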

$processor->hit

Returns the number of hits for a given URI. Essentially a "public" accessor for $hits{$uri}.

BUGS

Public bug-tracking interface at http://www.w3.org/Bugs/Public/

AUTHOR

Olivier Thereaux <ot@w3.org> for The World Wide Web Consortium

SEE ALSO

perl(1). Up-to-date information on this tool is available at http://www.w3.org/QA/Tools/LogValidator/