
NAME

Mail::Graph - draw graphical stats for mails/spams

SYNOPSIS

	use Mail::Graph;

	my $graph = Mail::Graph->new(
	  items  => 'spam',
	  output => 'spams/',
	  input  => '~/Mail/spam/',
	);
	$graph->generate();

DESCRIPTION

This module parses mailbox files in either compressed or uncompressed form and then generates pretty statistics and graphs about them. Although it was originally developed for spam statistics, it works just as well for normal mail.

File Format

The module reads files in mbox format. These can be compressed with gzip or be plain text. Since the module reads every file in a directory, it can also handle maildir-style folders, i.e. a directory where each mail resides in a separate file.

The file format is quite simple and looks like this:

From sample_foo@example.com  Tue Oct 27 18:38:52 1998
Received: from barfel by foo.example.com (8.9.1/8.6.12) 
From: forged_bar@example.com
X-Envelope-To: <sample_foo@example.com>
Date: Tue, 27 Oct 1998 09:52:14 +0100 (CET)
Message-Id: <199810270852.12345567@example.com>
To: <none@example.com>
Subject: Sorry...
X-Loop-Detect: 1
X-Spamblock: caught by rule dummy@

This is a sample spam

Basically, each mail is an email header plus an email body, and the mails are separated by the "From " lines.
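
As an illustration of the format described above, here is a minimal sketch that counts the mails in plain or gzipped mbox files. It is not Mail::Graph's own reader: count_messages() and count_in_dir() are hypothetical helpers, and the sketch uses IO::Uncompress::Gunzip instead of Compress::Zlib. It also assumes that a line starting with "From " always begins a new mail.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper, not part of Mail::Graph: count the mails in one
# mbox file. Gunzip's Transparent mode passes plain text through
# unchanged, so the same code reads gzipped and uncompressed files.
sub count_messages {
    my ($path) = @_;
    my $in = IO::Uncompress::Gunzip->new($path, Transparent => 1, MultiStream => 1)
        or die "Cannot read $path: $GunzipError\n";
    my $count = 0;
    while (defined(my $line = $in->getline())) {
        $count++ if $line =~ /^From /;    # envelope line starts a new mail
    }
    $in->close();
    return $count;
}

# Hypothetical helper: treat every regular file in a directory as a mailbox.
sub count_in_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!\n";
    my @files = grep { -f } map { "$dir/$_" } readdir $dh;
    closedir $dh;
    my $total = 0;
    $total += count_messages($_) for @files;
    return $total;
}

# Usage: perl count.pl /path/to/mail/directory
print count_in_dir($ARGV[0]), "\n" if @ARGV;
```

Reading each file through one decompressing handle keeps the caller free of any "is this file gzipped?" checks.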

The following fields are examined, each to determine the value listed next to it:

X-Envelope-To		the target address/domain
From address@domain	the sender
From date		the receiving date
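
To make the three fields concrete, the following sketch pulls them out of a header block like the sample shown earlier. The helper extract_fields() and its regular expressions are assumptions for illustration; they are not taken from Mail::Graph's source and may be stricter or looser than what the module actually accepts.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Mail::Graph: extract sender, receiving
# date, target address and target domain from one header block.
sub extract_fields {
    my ($header) = @_;
    my %f;
    # Envelope line: "From <address>  <date>" (no colon after "From")
    if ($header =~ /^From (\S+)\s+(.+)$/m) {
        @f{qw(sender received)} = ($1, $2);
    }
    # Target address, with optional angle brackets removed
    if ($header =~ /^X-Envelope-To:\s*<?([^<>\s]+)>?/mi) {
        $f{target} = $1;
        ($f{domain}) = $f{target} =~ /\@(.+)$/;
    }
    return \%f;
}

my $sample = <<'END';
From sample_foo@example.com  Tue Oct 27 18:38:52 1998
From: forged_bar@example.com
X-Envelope-To: <sample_foo@example.com>
END

my $fields = extract_fields($sample);
print "$_: $fields->{$_}\n" for sort keys %$fields;
```

Note that the sender comes from the envelope "From " line rather than the "From:" header, which in spam is often forged.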

METHODS

new()

Create a new Mail::Graph object.

The following options exist:

input		Path to a directory containing (gzipped) mbox files
		Alternatively, the name of a single (gzipped) mbox file
index		Directory where to write (and read) the index files
output		Directory where to write the output stats
items		Name of the counted items, e.g. 'spams' or 'mails'
		(can be any string)
generate	hash with names of stats to generate (1=on, 0=off):
		 month		 per each month of the year
		 day		 per each day of the month
		 hour		 per each hour of the day
		 dow		 per each day of the week
		 yearly		 per year
		 daily		 per each day (with average)
		 monthly	 per each month
		 toplevel	 per top-level domain
		 rule		 per filter rule that matched
		 target		 per target address
		 domain	         per target domain
		 last_x_days     items for each of the last x days
			         set it to the number of days you want
		 score_histogram show histogram of SpamAssassin scores
				 set it to the step-width (like 5)
		 score_daily     SA score for each of the last x days
			         set it to the number of days you want
		 score_scatter   SA scatter score diagram, set it to
				 the limit of the score (a line will be
				 drawn there)
average		set to 0 to disable, otherwise it gives the number
		of days/weeks/months to average over
average_daily	number of days to average over in the daily graph;
		if not set, the value of average is used, 0 disables it
height		base height of the generated images
template	name of the template file (ending in .tpl) that is
		used to generate the html output, e.g. 'index.tpl'
no_title	set to 1 to disable graph titles, default 0
filter_domains	array ref with list of domains to show as "unknown"
filter_target	array ref with list of targets (regular expressions)
graph_ext	extension of the generated graphs, default 'png'
last_date	in yyyy-mm-dd format: specify the last used date, any
		mail newer than that will be skipped. Defaults to today
first_date	in yyyy-mm-dd format: specify the first used date, any
		mail older than that will be skipped. Defaults to undef
		meaning any old mail will be considered.
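
A fuller set of options might look like the sketch below. The option names are taken from the list above; all values and directory names are made-up examples, and how the module treats statistics that are not mentioned in the generate hash is not covered here.

```perl
use strict;
use warnings;

# Option names as documented above; values and paths are illustrative only.
my %options = (
    items      => 'spams',
    input      => 'archive/',      # hypothetical directory of (gzipped) mbox files
    index      => 'index/',        # hypothetical index directory
    output     => 'report/',       # hypothetical output directory
    template   => 'index.tpl',
    height     => 200,
    average    => 7,
    first_date => '2003-01-01',    # skip mail older than this
    generate   => {
        month           => 1,
        dow             => 1,
        toplevel        => 1,
        day             => 0,      # switched off
        last_x_days     => 30,     # number of days to show
        score_histogram => 5,      # step width
    },
);

# With Mail::Graph installed, the options would be passed as in the SYNOPSIS:
#   my $graph = Mail::Graph->new(%options);
my $enabled = grep { $_ } values %{ $options{generate} };
print "$enabled statistics enabled\n";
```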

generate()

Generate the stats, fill in the template and write it out. Takes no options.

error()

Return an error message or undef for no error.
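
One way to combine generate() and error() is sketched below. run_report() is a hypothetical wrapper, not part of Mail::Graph, and it assumes that error() is meaningful right after generate() has returned. The wrapper only relies on the two documented methods, so it works with any object that provides them.

```perl
use strict;
use warnings;

# Hypothetical wrapper, not part of Mail::Graph: run the report and turn
# the documented error()/undef convention into a status string.
sub run_report {
    my ($graph) = @_;
    $graph->generate();
    my $error = $graph->error();
    return defined $error ? "failed: $error" : 'ok';
}

# Typical use, assuming Mail::Graph is installed and the paths exist:
#   my $graph = Mail::Graph->new( input => 'archive/', output => 'report/' );
#   print run_report($graph), "\n";
```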

BUGS

There are a couple of known bugs; some of them are unfinished features or problems in GD::Graph:

Divide by Zero

This is a bug in some versions of GD::Graph: generating a graph with only one bar makes it crash with this error. If you encounter this, please bug the author of GD::Graph and send me a copy.

Argument "4, 0.7%" isn't numeric

You might get a lot of warnings like

Argument "4, 0.7%" isn't numeric in numeric lt (<) at 
/usr/lib/perl5/site_perl/5.8.2/GD/Graph/Data.pm line 231.

This is a problem with GD::Graph: Mail::Graph wants to use labels like "4, 0.7%", but GD::Graph uses the same string for both the label and the value of the point/bar, and thus Perl warns. This needs a small patch to GD::Graph that strips anything non-numeric out of the label before using it in numeric context. Please bug the author of GD::Graph and send me a copy.
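
The idea behind such a patch can be sketched as follows. This is not the actual GD::Graph change: label_value() is a hypothetical helper, and instead of stripping every non-numeric character (which would turn "4, 0.7%" into 40.7) it keeps only the leading number of the label.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of GD::Graph: return the leading number of
# a label such as "4, 0.7%", or 0 when the label does not start with one.
sub label_value {
    my ($label) = @_;
    return $label =~ /^\s*([-+]?\d+(?:\.\d+)?)/ ? $1 + 0 : 0;
}

my $value = label_value('4, 0.7%');
print "numeric value: $value\n";
```

Comparing the extracted number instead of the raw label string avoids the "isn't numeric" warning.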

gzipped archives are not included in the stats

Some gzipped archives seem to trigger a bug in Compress::Zlib, at least up to version 1.32. For instance, on my system one of the sample archives in /sample/archives/ is not read properly by Compress::Zlib. I have already notified the author of Compress::Zlib.

LICENSE

This program is free software; you may redistribute it and/or modify it under the same terms as Perl itself.

AUTHOR

(c) Copyright by Tels http://bloodgate.com/ 2002.