NAME
W3C::LogValidator - The W3C Log Validator - Quality-focused Web Server log processing engine
Checks quality/validity of most popular content on a Web server
SYNOPSIS
Generic, basic use of the W3C::LogValidator module: parse a configuration file and process the relevant logs.
    use W3C::LogValidator;
    my $logprocessor = W3C::LogValidator->new("sample.conf");
    $logprocessor->process;
Alternatively, use the default configuration and process the logs:
    my $logprocessor = W3C::LogValidator->new;
    $logprocessor->process;
DESCRIPTION
W3C::LogValidator is the main module for the W3C Log Validator, which combines a Web server log analysis and statistics tool with a Web content quality checker.
As an easy alternative to using this module directly, the Perl script logprocess.pl is bundled in the W3C::LogValidator distribution.
API
Constructor
- $processor = W3C::LogValidator->new
Constructs a new W3C::LogValidator processor. You may pass a configuration file name, as well as a reference to a hash of attribute-value pairs, as parameters to the constructor. For example, for mail output:
    my %conf = (
        "UseOutputModule" => "W3C::LogValidator::Output::Mail",
        "ServerAdmin"     => 'webmaster@example.com',
        "verbose"         => "3",
    );
    my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);
Or, for HTML output:
    my %conf = (
        "UseOutputModule" => "W3C::LogValidator::Output::HTML",
        "OutputTo"        => 'path/to/file.html',
        "verbose"         => "0",
    );
    my $processor = W3C::LogValidator->new("path/to/config.conf", \%conf);
If given the path to a configuration file, new() will call the W3C::LogValidator::Config module to get its configuration variables. Otherwise, a default set of values is used.
Main processing method
- $processor->process
Do-it-all method: reads the configuration file (if any), parses the log files, runs them through the processing modules, and sends the result to the output module.
Module methods
- $processor->config_module
Creates a configuration hash for a specific module, adding module-specific configuration variables and overriding existing values if necessary.
- $processor->use_modules
Runs the data parsed from the log files through the processing (validation) modules specified by UseValidationModule in the configuration.
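The override behaviour described for config_module can be pictured as a plain hash merge. The following is a minimal sketch under stated assumptions, not the module's actual code: the helper name merge_module_config and the key "ExampleOption" are invented for illustration, and the real method may build its hash differently.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a new configuration hash in which
# module-specific values take precedence over the global ones.
sub merge_module_config {
    my ($global, $module_specific) = @_;
    # In a list assignment to a hash, later pairs win, so the
    # module-specific values replace the global ones.
    my %merged = (%$global, %$module_specific);
    return \%merged;
}

my %global = (
    "ServerAdmin" => 'webmaster@example.com',
    "verbose"     => "0",
);

# "ExampleOption" is an invented key used only for illustration.
my %for_module = (
    "verbose"       => "3",
    "ExampleOption" => "on",
);

my $config = merge_module_config(\%global, \%for_module);
printf "%s => %s\n", $_, $config->{$_} for sort keys %$config;
```

Because the merge builds a fresh hash, the global configuration is left untouched and each module can receive its own copy.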
Log parsing and URI methods
- $processor->read_logfiles
Loops through and parses all log files specified in the configuration.
- $processor->read_logfile('path/to.file')
Extracts URIs and their number of hits from a given log file, and feeds them into the processor's URI/Hits table.
- $processor->find_uri
Given a log record and the type of the log (common log format, flat list of URIs, etc.), extracts the URI.
- $processor->remove_duplicates
Given a URI, removes "directory index" suffixes such as index.html, so that http://foobar/ and http://foobar/index.html are counted as one resource.
- $processor->add_uri
Adds a URI to the processor's URI/Hits table.
- $processor->sorted_uris
Returns the list of URIs in the processor's table, sorted by popularity (hits).
- $processor->no_cgi
Tests whether a given URI contains a CGI query string.
- $processor->hit
Returns the number of hits for a given URI. Essentially a public accessor for $hits{$uri}.
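The log parsing and URI methods above fit together roughly as follows. This is a standalone sketch, not the module's implementation: the sub names (extract_uri, strip_index, has_query_string), the regular expressions, the list of index file names, the sample log records, and the decision to skip URIs with a query string are all assumptions made for illustration.

```perl
use strict;
use warnings;

my %hits;    # URI => number of hits

# Pull the request URI out of a common log format record.
# Returns undef when the record does not match.
sub extract_uri {
    my ($record) = @_;
    return $record =~ m{"[A-Z]+ (\S+)(?: HTTP/[\d.]+)?"} ? $1 : undef;
}

# Strip a trailing "directory index" file name so that /docs/ and
# /docs/index.html count as one resource. The set of index names
# matched here is an assumption.
sub strip_index {
    my ($uri) = @_;
    $uri =~ s{/index\.(?:html?|php)\z}{/};
    return $uri;
}

# True when the URI carries a query string.
sub has_query_string {
    my ($uri) = @_;
    return $uri =~ /\?/ ? 1 : 0;
}

# Invented sample records in common log format.
my @records = (
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /docs/index.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /docs/ HTTP/1.0" 200 2326',
    '192.0.2.3 - - [10/Oct/2000:13:57:12 -0700] "GET /about.html HTTP/1.0" 200 512',
    '192.0.2.4 - - [10/Oct/2000:13:58:40 -0700] "GET /search?q=perl HTTP/1.0" 200 90',
);

for my $record (@records) {
    my $uri = extract_uri($record);
    next unless defined $uri;
    next if has_query_string($uri);    # assumption: such URIs are skipped
    $hits{ strip_index($uri) }++;
}

# Most popular first; ties are broken alphabetically for a stable order.
my @sorted = sort { $hits{$b} <=> $hits{$a} || $a cmp $b } keys %hits;
print "$_ $hits{$_}\n" for @sorted;
```

With these sample records, /docs/ is counted twice because both of its spellings collapse to one key, /about.html is counted once, and the search URI never enters the table.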
BUGS
Public bug-tracking interface at http://www.w3.org/Bugs/Public/
AUTHOR
Olivier Thereaux <ot@w3.org> for The World Wide Web Consortium
SEE ALSO
perl(1). Up-to-date information on this tool at http://www.w3.org/QA/Tools/LogValidator/