NAME
Apache::Log::Parser - Parser for Apache Log (common, combined, and any other custom styles by LogFormat).
SYNOPSIS
my $parser = Apache::Log::Parser->new( fast => 1 );
my $log = $parser->parse($logline);
$log->{rhost}; #=> remote host
$log->{agent}; #=> user agent
DESCRIPTION
Apache::Log::Parser is a parser module for Apache logs, accepts 'common', 'combined', and any other custom style. It works relatively fast, and process quoted double-quotation properly.
Once instanciate a parser, it can parse all of types specified with one method 'parse'.
USAGE
This module requires a option 'fast' or 'strict' with instanciate.
'fast' parser works relatively fast. It can process only 'common', 'combined' and custom styles with compatibility with 'common', and cannot work with backslash-quoted double-quotes in fields.
# Default, for both of 'combined' and 'common'
my $parser = Apache::Log::Parser->new( fast => 1 );
my $log1 = $parser->parse(<<COMBINED);
192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /path/to/file.html HTTP/1.1" 200 9891 "-" "DoCoMo/2.0 P03B(c500;TB;W24H16)"
COMBINED
# $log1->{rhost}, $log1->{date}, $log1->{path}, $log1->{referer}, $log1->{agent}, ...
my $log2 = $parser->parse(<<COMMON); # parsed as 'common'
192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /path/to/file.html HTTP/1.1" 200 9891
COMMON
# For custom style(additional fields after 'common'), 'combined' and common
# custom style: LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%v\" \"%{cookie}n\" %D"
my $c_parser = Apache::Log::Parser->new( fast => [[qw(referer agent vhost usertrack request_duration)], 'combined', 'common'] );
my $log3 = $c_parser->parse(<<CUSTOM);
192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /index.html HTTP/1.1" 200 257 "http://example.com/referrer" "Any User-Agent" "example.com" "192.168.0.1201102091208001" 901
CUSTOM
# $log3->{agent}, $log3->{vhost}, $log3->{usertrack}, ...
'strict' parser works relatively slow. It can process any style format logs, with specification about separator, and checker for perfection. It can also process backslash-quoted double-quotes properly.
# 'strict' parser is available for log formats without compatibility for 'common', like 'vhost_common' ("%v %h %l %u %t \"%r\" %>s %b")
my @customized_fields = qw( rhost logname user datetime request status bytes referer agent vhost usertrack request_duration );
my $strict_parser = Apache::Log::Parser->new( strict => [
["\t", \@customized_fields, sub{my $x=shift;defined($x->{vhost}) and defined($x->{usertrack}) }], # TABs as separator
[" ", \@customized_fields, sub{my $x=shift;defined($x->{vhost}) and defined($x->{usertrack}) }],
'combined',
'common',
'vhost_common',
]);
my $log4 = $strict_parser->parse(<<CUSTOM);
192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /index.html HTTP/1.1" 200 257 "http://example.com/referrer" "Any \"Quoted\" User-Agent" "example.com" "192.168.0.1201102091208001" 901
CUSTOM
$log4->{agent} #=> 'Any "Quoted" User-Agent'
my $log5 = $strict_parser->parse(<<VHOST);
example.com 192.168.0.1 - - [07/Feb/2011:10:59:59 +0900] "GET /index.html HTTP/1.1" 200 257
VHOST
LICENSE
This software is licensed under the same terms as Perl itself.
AUTHOR
TAGOMORI Satoshi <tagomoris at gmail.com>
SEE ALSO
http://httpd.apache.org/docs/2.2/mod/mod_log_config.html#formats