NAME
Mail::SpamAssassin::Plugin::PDFInfo - PDFInfo Plugin for SpamAssassin
SYNOPSIS
loadplugin Mail::SpamAssassin::Plugin::PDFInfo
DESCRIPTION
This plugin helps detect spam using attached PDF files.
See "Usage:" below. For more documentation, see 20_pdfinfo.cf.
Original info kept for history. For later changes, see the SVN repo.
-------------------------------------------------------

PDFInfo Plugin for SpamAssassin
Version: 0.8
Info: $Id: PDFInfo.pm 904 2007-08-12 01:36:23Z root $
Created: 2007-08-10
Modified: 2007-08-10
By: Dallas Engelken

Changes:
  0.8 - added .fdf detection (thanks John Lundin) [axb]
  0.7 - fixed empty body/pdf count buglet (thanks Jeremy) [axb]
  0.6 - added support for tags - PDFCOUNT, PDFVERSION, PDFPRODUCER, etc.
      - fixed issue on perl 5.6.1 where pdf_match_details() failed to call
        _find_pdf_mime_parts(), resulting in no detection of pdf mime parts.
      - quoted-printable support - requires MIME::QuotedPrint (which should be
        in everyone's install as a part of the MIME-Base64 package, which is
        a SA req)
      - added simple pdf_is_empty_body() function, which counts the body bytes
        minus the subject line. can add optional <bytes> param if you need to
        allow for a few bytes.
  0.5 - fix warns for undef $pdf_tags
      - remove { } and \ before running eval in pdf_match_details to avoid
        eval error
  0.4 - added pdf_is_encrypted() function
      - added option to look for image HxW on same line
  0.3 - added 2nd fuzzy md5 which uses pdf tag layout as data
      - renamed pdf_image_named() to pdf_named()
         - PDF images are encapsulated and have no names. We are matching
           the PDF file name.
      - renamed pdf_image_name_regex() to pdf_name_regex()
         - PDF images are encapsulated and have no names. We are matching
           the PDF file name.
      - changed pdf_image_count() a bit and added pdf_count().
         - pdf_count() checks how many pdf attachments there are on the mail
         - pdf_image_count() checks how many images are found within all
           pdfs in the mail.
      - removed the restriction of the pdf containing an image in order to
        md5 it.
      - added pdf_match_details() function to check the following 'details'
         - author: Author of PDF if specified
         - producer: Software used to produce PDF
         - creator: Software used to produce PDF, usually similar to producer
         - title: Title of PDF
         - created: Creation Date
         - modified: Last Modified
  0.2 - support PDF octet-stream
  0.1 - just ported over the imageinfo code, and renamed to pdfinfo.
      - removed all support for png, gif, and jpg from the code.
      - prepended pdf_ to all function names to avoid conflicts with
        ImageInfo in SA 3.2.

Usage:

  pdf_count()

     body RULENAME  eval:pdf_count(<min>,[max])
        min: required, message contains at least x pdf mime parts
        max: optional, if specified, must not contain more than x pdf mime parts

  pdf_image_count()

     body RULENAME  eval:pdf_image_count(<min>,[max])
        min: required, message contains at least x images in pdf attachments.
        max: optional, if specified, must not contain more than x pdf images

  pdf_pixel_coverage()

     body RULENAME  eval:pdf_pixel_coverage(<min>,[max])
        min: required, message contains at least this much pixel area
        max: optional, if specified, message must not contain more than this
             much pixel area

  pdf_named()

     body RULENAME  eval:pdf_named(<string>)
        string: exact file name match; if you need a partial match, see
                pdf_name_regex()

  pdf_name_regex()

     body RULENAME  eval:pdf_name_regex(<regex>)
        regex: regular expression, see examples in ruleset

  pdf_match_md5()

     body RULENAME  eval:pdf_match_md5(<string>)
        string: 32-character md5 hex digest

  pdf_match_fuzzy_md5()

     body RULENAME  eval:pdf_match_fuzzy_md5(<string>)
        string: 32-character md5 hex digest - see ruleset for obtaining the
                fuzzy md5

  pdf_match_details()

     body RULENAME  eval:pdf_match_details(<detail>,<regex>)
        detail: author, creator, created, modified, producer, title
        regex: regular expression, see examples in ruleset

  pdf_is_encrypted()

     body RULENAME  eval:pdf_is_encrypted()

  pdf_is_empty_body()

     body RULENAME  eval:pdf_is_empty_body(<bytes>)
        bytes: maximum byte count to allow and still consider it empty
  pdf_image_to_text_ratio()

     body RULENAME  eval:pdf_image_to_text_ratio(<min>,<max>)
        Ratio calculated as body_length / total_image_area
        min: minimum ratio
        max: maximum ratio

  pdf_image_size_exact()

     body RULENAME  eval:pdf_image_size_exact(<h>,<w>)
        h: image height is exactly h
        w: image width is exactly w

  pdf_image_size_range()

     body RULENAME  eval:pdf_image_size_range(<minh>,<minw>,[<maxh>],[<maxw>])
        minh: image height is at least minh
        minw: image width is at least minw
        maxh: (optional) image height is no more than maxh
        maxw: (optional) image width is no more than maxw

NOTE: See the ruleset for more examples that are not documented here.
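
Example:

The usage entries above can be combined into a small ruleset. The following is only a sketch: the rule names (L_PDF_*), descriptions, scores, and regular expressions are hypothetical and chosen for illustration, and the quoting of string and regex arguments is an assumption that should be checked against 20_pdfinfo.cf before use.

```
ifplugin Mail::SpamAssassin::Plugin::PDFInfo

# Hypothetical rule names and scores, for illustration only.

# Exactly one PDF mime part (min 1, max 1).
body     L_PDF_SINGLE       eval:pdf_count(1,1)
describe L_PDF_SINGLE       Message has exactly one PDF attachment
score    L_PDF_SINGLE       0.01

# Body counts as empty if it holds no more than 10 bytes.
body     L_PDF_EMPTY_BODY   eval:pdf_is_empty_body(10)
describe L_PDF_EMPTY_BODY   Message body is empty (up to 10 bytes allowed)
score    L_PDF_EMPTY_BODY   0.01

body     L_PDF_ENCRYPTED    eval:pdf_is_encrypted()
describe L_PDF_ENCRYPTED    Attached PDF is encrypted
score    L_PDF_ENCRYPTED    0.01

# Argument quoting below is an assumption; compare with 20_pdfinfo.cf.
body     L_PDF_NAME_DOC     eval:pdf_name_regex('/^document\d*\.pdf$/i')
describe L_PDF_NAME_DOC     PDF file name looks like document<N>.pdf
score    L_PDF_NAME_DOC     0.01

body     L_PDF_PRODUCER_GS  eval:pdf_match_details('producer','/Ghostscript/')
describe L_PDF_PRODUCER_GS  PDF producer field mentions Ghostscript
score    L_PDF_PRODUCER_GS  0.01

# Combine two of the tests above into a single meta rule.
meta     L_PDF_ONLY         (L_PDF_SINGLE && L_PDF_EMPTY_BODY)
describe L_PDF_ONLY         Single PDF attachment with an empty body
score    L_PDF_ONLY         0.5

endif
```

Wrapping the rules in ifplugin/endif keeps them from producing errors on installations where the plugin is not loaded. The low scores are placeholders; real values should come from testing against your own mail.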