NAME
Email::Fingerprint - Calculate a digest for recognizing duplicate emails
VERSION
Version 0.15
SYNOPSIS
Email::Fingerprint calculates a checksum that uniquely identifies an email, for use in spotting duplicate messages. The checksum is based on: the Message-ID: header; or if it doesn't exist, on the Date:, From:, To: and Cc: headers together; or if those don't exist, on the body of the message.
use Email::Fingerprint;
my $foo = Email::Fingerprint->new();
...
ATTRIBUTES
FUNCTIONS
new
$fp = Email::Fingerprint->new({
input => \*INPUT, # Or $string, \@lines, etc.
checksum => "Digest::SHA", # Or "Digest::MD5", etc.
strict_checking => 1, # If true, use message bodies
%mail_header_opts,
});
Create a new fingerprinting object. If the input option is used, Email::Fingerprint attempts to intelligently read the email message given by that option, whether it's a string, an array of lines or a filehandle.
If $opts{checksum} is not supplied, then Email::Fingerprint will use the first checksum module that it finds. If it finds no modules, it will use unpack in a ghastly manner you don't want to think about.
Any %opts are also passed along to Mail::Header->new; see the perldoc for Mail::Header options.
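The checksum-module fallback described above could be sketched roughly as follows. This is an illustration only, not the module's actual code: the helper name first_available_digest and the candidate list are assumptions made for the example, and the unpack fallback is not reproduced.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first module in the candidate list
# that can be loaded, or an empty list/undef if none can.
sub first_available_digest {
    my @candidates = @_;
    for my $module (@candidates) {
        # Accept only plain package names before interpolating into eval.
        next unless $module =~ /\A\w+(?:::\w+)*\z/;
        return $module if eval "require $module; 1";
    }
    return;
}

# The candidate order here is an assumption, not the module's real list.
my $module = first_available_digest(qw( Digest::SHA Digest::MD5 ));
if ( defined $module ) {
    my $digest = $module->new;
    $digest->add("Message-ID: <1\@example.com>\n");
    print "$module: ", $digest->hexdigest, "\n";
}
```

Passing checksum explicitly to the constructor avoids depending on whichever module happens to be installed.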
checksum
# Uses original/default settings to take checksum
$checksum = $fp->checksum;
# Can use any options accepted by constructor
$options = {
input => \*INPUT, # Or $string, \@lines, etc.
checksum => "Digest::SHA", # Or "Digest::MD5", etc.
strict_checking => 1, # If true, use message bodies
%mail_header_opts,
};
# Overrides one or more original/default settings
$checksum = $fp->checksum($options);
Calculates the actual email fingerprint. The optional hashref argument will permanently override the object's previous settings.
read
$fingerprint->read( $email );
$fingerprint->read( $email, \%mh_args );
Accepts the email message $email and attempts to read it intelligently, distinguishing strings, arrayrefs and filehandles. If supplied, the optional hashref is passed on to Mail::Header.
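One way to tell the three input kinds apart is sketched below. How Email::Fingerprint itself makes the distinction is not stated here, so classify_input is a hypothetical helper and the checks it uses (ref and Scalar::Util's openhandle) are assumptions.

```perl
use strict;
use warnings;
use Scalar::Util qw( openhandle );

# Hypothetical helper: name the kind of input that was passed in.
sub classify_input {
    my ($input) = @_;
    return 'undef'      unless defined $input;
    return 'arrayref'   if ref $input eq 'ARRAY';
    return 'filehandle' if defined openhandle($input);
    return 'string'     unless ref $input;
    return 'unknown';
}

my $text = "From: a\@example.com\n\nHello\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

print classify_input($text),     "\n";
print classify_input( [$text] ), "\n";
print classify_input($fh),       "\n";
```

A caller who already knows the input type can skip the guessing and call read_string, read_arrayref or read_filehandle directly.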
read_string
$fingerprint->read_string( $email_string );
$fingerprint->read_string( $email_string, \%mh_args );
Accepts the email message $email_string and prepares it for checksum computation. If supplied, the optional hashref is passed on to Mail::Header.
read_filehandle
$fingerprint->read_filehandle( $email_fh );
$fingerprint->read_filehandle( $email_fh, \%mh_args );
Accepts the email message $email_fh and prepares it for checksum computation. If supplied, the optional hashref is passed on to Mail::Header.
read_arrayref
$fingerprint->read_arrayref( \@email_lines );
$fingerprint->read_arrayref( \@email_lines, \%mh_args );
Accepts the email message \@email_lines and prepares it for checksum computation. If supplied, the optional hashref is passed on to Mail::Header.
message_loaded
Returns true if an email message has been loaded and is ready for checksum, or false if no message has been loaded or an error has occurred.
set_checksum
Specifies the checksum method to be used.
INTERNAL METHODS
BUILD
A constructor helper method called from the Class::Std framework. To execute BUILD, use new().
_extract_headers
Extract the Message-ID: header. If that does not exist, extract the Date:, From:, To: and Cc: headers. If those do not exist, then force strict checking so that the message body will be fingerprinted.
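The selection order above might look like the following sketch. It is not the module's code: choose_fingerprint_headers is a hypothetical helper, the headers are passed as a plain hash of arrayrefs instead of a Mail::Header object, and it assumes the body fallback applies only when none of Date:, From:, To: and Cc: is present.

```perl
use strict;
use warnings;

# Hypothetical helper. $headers maps a header name to an arrayref of values.
# Returns an arrayref of header names to fingerprint, plus a flag that is
# true when the body has to be fingerprinted instead.
sub choose_fingerprint_headers {
    my ($headers) = @_;

    my $has = sub {
        my ($name) = @_;
        return scalar @{ $headers->{$name} || [] };
    };

    # First choice: Message-ID alone.
    return ( ['Message-ID'], 0 ) if $has->('Message-ID');

    # Second choice: whichever of these headers are present.
    my @present = grep { $has->($_) } qw( Date From To Cc );
    return ( \@present, 0 ) if @present;

    # Nothing usable in the headers: fall back to the body.
    return ( [], 1 );
}

my ( $fields, $use_body ) = choose_fingerprint_headers(
    { From => ['a@example.com'], To => ['b@example.com'] }
);
print "fields: @$fields; body: ", ( $use_body ? 'yes' : 'no' ), "\n";
```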
_extract_body
$body = $fp->_extract_body;
Gets the body of the message, as a string. Line-endings are preserved, so the body can, e.g., be printed.
This method must only be called after a message has been read. No validation is done in the method itself, so this is the user's responsibility.
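For illustration, separating a body from its headers while keeping line endings intact can be done as below. This is a stand-alone sketch working on a raw message string; split_body is a hypothetical name, and the real method works on the object's already-parsed message.

```perl
use strict;
use warnings;

# Hypothetical helper: return everything after the first blank line,
# with the original line endings untouched.
sub split_body {
    my ($message) = @_;
    my ( undef, $body ) = split /\r?\n\r?\n/, $message, 2;
    return defined $body ? $body : '';
}

my $message = "From: a\@example.com\nTo: b\@example.com\n\nline one\nline two\n";
print split_body($message);
```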
_concat
@headers = qw( foo@example.com bar@example.com );
$delim = 'To:';
$string = $fp->_concat( \@headers, $delim );
# $string is now 'To:foo@example.comTo:bar@example.com'
Returns the concatenation of \@headers, with $delim prepended to each element of \@headers. If $delim is omitted, the empty string is used. \@headers elements are all chomped before concatenation.
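The behaviour described for _concat can be reproduced in a few lines. The version below is a stand-alone sketch with a hypothetical name, concat_with_delim, and is not taken from the module's source; it works on a copy so that the caller's array is left unchanged, which is an assumption about the desired behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-alone equivalent of the behaviour described above.
sub concat_with_delim {
    my ( $values, $delim ) = @_;
    $delim = '' unless defined $delim;

    my @copy = @{ $values || [] };    # copy, so the caller's array is untouched
    chomp @copy;                      # strip one trailing newline from each
    return join '', map { $delim . $_ } @copy;
}

my @headers = ( "foo\@example.com\n", "bar\@example.com\n" );
my $string  = concat_with_delim( \@headers, 'To:' );
print "$string\n";    # To:foo@example.comTo:bar@example.com
```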
AUTHOR
Len Budney, <lbudney at pobox.com>
BUGS
Please report any bugs or feature requests to bug-email-fingerprint at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Email-Fingerprint. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Email::Fingerprint
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
RT: CPAN's request tracker
Search CPAN
SEE ALSO
See Mail::Header for options governing the parsing of email headers.
ACKNOWLEDGEMENTS
Email::Fingerprint is based on the eliminate_dups script by Peter Samuel and available at http://www.qmail.org/.
COPYRIGHT & LICENSE
Copyright 2006 Len Budney, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.