NAME

News::GnusFilter - package for scoring usenet posts

Version: 0.55 ($Revision: 1.6 $)

SYNOPSIS

# ~/.gnusfilter - scoring script

require 5.006;
use strict;
use News::GnusFilter qw/:tests groan references NSLOOKUP VERBOSE/;

NSLOOKUP = ""; # disables nslookups for bogus_address test
VERBOSE  = 1;  # noisier output for debugging

my $goof = News::GnusFilter->set_score( {
                                rethreaded     => 80,
                                no_context     => 60,
                            } );

# standard tests - see MESSAGE TESTS for details

missing_headers;
bogus_address;
annoying_subject;
cross_post;
mimes;
lines_too_long;
control_characters;
miswrapped;
misattribution;
jeopardy_quoted;
check_quotes; # runs multiple tests on quoted paragraphs
bad_signature;

# custom tests - see WRITING HEADERS and SCORING

if (check_quotes and not references) {
    $goof->{rethreaded} = groan "Callously rethreaded";
}

if (references and not check_quotes) {
    $goof->{no_context} = groan "Missing context";
}

__END__

Your GnusFilter script should be installed as a mime-decoder hook for gnus.

DESCRIPTION

News::GnusFilter is a pure-Perl package for scripting an inline message filter. It adds "Gnus-Warning:" headers when presented with evidence of atypical content or otherwise nonstandard formatting for usenet messages.

News::GnusFilter should be drop-in compatible with other newsreaders that are capable of filtering a usenet posting through an external application prior to display. See the CONFIGURATION section below for descriptions of tunable parameters, and the MESSAGE TESTS section for descriptions of the exported subroutines.

The strange yet powerful correlation between usenet cluelessness and bunk-peddling is best summarised in the following quote:

"Opinions may of course differ on this topic, but wouldn't it be better to persuade the hon. Usenaut, as a first priority, to post accurate information, before persuading them to abandon this remarkably accurate indicator of usenet bogosity?"

-- Alan Flavell in comp.lang.perl.misc

CONFIGURATION

Lisp for .gnus File

 (add-hook 'gnus-article-decode-hook '(lambda ()
    (gnus-article-decode-charset)
      (let ((coding-system-for-read last-coding-system-used)
	   (coding-system-for-write last-coding-system-used))
       (call-process-region (point-min) (point-max)
	   "/path/to/gnusfilter" t (current-buffer))
 )))

The recommended installation path for your script is ~/.gnusfilter.

General Parameters and Exported Symbols

These are the export lists for News::GnusFilter. See the Exporter manpage for more details.

    my %parameters =
       (
	 HEADER         => "Gnus-Warning", # header added
	 NSLOOKUP       => "nslookup",     # '' avoids DNS lookups
	 PASSTHRU_BYTES => 8192,           # filter disabled
	 LINE_LEN       => 80,             # columns
	 EGO            => 10,             # self-ref's in new text
	 TOLERANCE      => 50,             # % quoted text
 	 MAX_CONTROL    => 5,              # control chars
	 MIN_LINES      => 20,             # short posts are OK
	 SIG_LINES      => 4,              # acceptable sig lines
	 NEWSGROUPS     => 2,              # spam cutoff
	 FBI            => 100,            # tolerable bogosity level

	 VERBOSE        => 0,              # toggles debugging output
       );

    @EXPORT_OK = keys %parameters;
    %EXPORT_TAGS = (
		    params => \@EXPORT_OK,
		     tests => [
			        qw/
		                   missing_headers   bogus_address
                                   annoying_subject  cross_post
                                   lines_too_long    control_characters
		                   miswrapped        check_quotes
                                   jeopardy_quoted   misattribution
                                   bad_signature     mimes
				  /
                              ],
		   );
    @EXPORT = (
	        @{$EXPORT_TAGS{tests}},
		qw/
		   groan groanf
                   lines references newsgroups head body paragraphs sig
	          /
	      );

Import Options

By default, GnusFilter exports all the standard :tests. It also provides access to the message itself via the head(), body(), lines(), paragraphs(), and sig() functions. See WRITING HEADERS and SCORING for details on groan() and groanf().

The tunable parameters are not exported by default. To adjust them, import them either by name or all at once with the :params tag:

use News::GnusFilter qw/ :tests :params /;
FBI = 200;    # raise tolerable bogosity level to 200
VERBOSE = 1;  # enable debugging output
HEADER = "X-Filter";
...

The parameters are exported as lvalue subs. This is the only place where the module uses features specific to perl 5.6+.
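As a rough illustration of the mechanism (not the module's actual source), an lvalue sub returns an assignable location, which is what makes a statement like "FBI = 200" legal. The %settings hash and the sub bodies below are assumptions made for this sketch.

```perl
use strict;
use warnings;

# Hypothetical private settings table; the real module keeps its own.
my %settings = ( FBI => 100, VERBOSE => 0 );

# The ":lvalue" attribute (perl 5.6+) makes the sub's result assignable.
# The empty prototype lets "FBI = 200" parse without parentheses.
sub FBI     () : lvalue { $settings{FBI} }
sub VERBOSE () : lvalue { $settings{VERBOSE} }

FBI     = 200;    # stores 200 in $settings{FBI}
VERBOSE = 1;      # stores 1 in $settings{VERBOSE}

printf "FBI=%d VERBOSE=%d\n", FBI, VERBOSE;
```

Because the assignment goes through a sub, the storage behind each parameter stays private to the package that defines it.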

WRITING HEADERS and SCORING

groan, groanf

groan() and groanf() are the analogs of print and printf, and are exported by default. The name of the warning header they write may be changed globally via HEADER:

HEADER="X-Format-Warning"; # overrides default "Gnus-Warning"
groan "mycheck failed" unless mycheck(body);

Default Score Settings

These settings are modifiable through the set_score sub. See the description in Scoring API below for details.

# scoring parameters

 my %goof;                   # counts occurrence of each error type
 my %weight =                # error type => default score

(	                      # typical range of %goof value:
 totalquote       => 100,    #
 jeopardy_quoted  =>  80,    # boolean (0-1)
 misattribution   =>  60,    #
 lines_too_long   =>  50,    #

 missing_headers  =>  50,    # 0-2
 mime_crap        =>  40,    # 0-3?     :
 annoying_subject =>  40,    # ~0-4
 cross_post       =>  30,    # 0,~2-4
 bogus_address    =>  30,    # 0-3      : 822, dns
 miswrapped       =>  30,    # ~0-5     : lines (up to 5)
 control_chars    =>  20,    # 0-5      : up to 5 chars
 ego              =>   5,    # 0,~10-20 : I me my count
 overquoted       =>   2,    # 0-50     : percentage over TOLERANCE
 bad_signature    =>   2,    # 0,5-20   : lines

 code             =>  -5,    # 0,~10-30

);

# set_score - scripter's interface to %goof and %weight

    sub set_score {
	my $href = pop @_;

	# override weight table
	@weight{ keys %$href } = values %$href if ref $href;

	return bless \%goof;
    }

# score - returns Flavell Bogosity Index

    sub score {
	my $score = 0;
	$score += $goof{$_} * $weight{$_}
	    for grep {exists $weight{$_}} keys %goof;
	return $score;
    }

Scoring API - set_score, score

set_score() provides access to the %goof and %weight hashes, which form the basis of the Flavell Bogosity Index calculator score(). The SYNOPSIS contains a sample usage.

score() calculates the current bogosity index based on the rules applied so far. Neither set_score nor score is importable, so script writers should use OO-like syntax or their package-qualified names.
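The arithmetic behind the index can be tried out in isolation. The sketch below repeats the weighted sum from the score() listing as a standalone sub; the name bogosity() and the sample counts are made up for illustration.

```perl
use strict;
use warnings;

# Weighted sum of goof counts; goofs without a weight are ignored.
sub bogosity {
    my ($goof, $weight) = @_;
    my $score = 0;
    $score += $goof->{$_} * $weight->{$_}
        for grep { exists $weight->{$_} } keys %$goof;
    return $score;
}

# Sample data: one jeopardy-quoted reply with a six-line signature,
# plus a goof type that has no weight assigned.
my %weight = ( jeopardy_quoted => 80, bad_signature => 2, rethreaded => 80 );
my %goof   = ( jeopardy_quoted => 1,  bad_signature => 6, unlisted   => 3 );

my $total = bogosity( \%goof, \%weight );
print "$total\n";    # 1*80 + 6*2 = 92
```

With the default FBI of 100, this sample message would stay under the tolerable bogosity level.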

Note: GnusFilter is not an OO package. Although set_score() returns a blessed reference to %goof, the final automatic score() calculation is not OO. If necessary, that calculation can be disabled by setting FBI = 0 in your script.

use News::GnusFilter qw/:tests FBI/;
FBI = 0;

MESSAGE TESTS

These are the exported functions that form the basis of a GnusFilter script. These functions are memoized to avoid repeat warnings and overscoring.

misattribution

Checks for proper attribution in quoted text.

cross_post

Warns of newsgroup spamming (level determined by NEWSGROUPS). On an original post, it returns the total number of groups posted to; on followups, it just returns 1.

bogus_address

Validates the Reply-To: header (or From:, if Reply-To: is not present) using RFC 822 syntax rules and a DNS lookup on the domain. Setting NSLOOKUP to a false value disables the DNS lookup; otherwise NSLOOKUP should point to the location of your nslookup(8) binary.

control_characters

Looks for control characters in the message body. Returns their number (up to MAX_CONTROL).
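A capped count of this kind might look like the sketch below. The character class (everything below 0x20 except tab, newline and carriage return, plus DEL) and the name count_control() are assumptions; the module's own definition of a control character may differ.

```perl
use strict;
use warnings;

my $MAX_CONTROL = 5;    # mirrors the documented default

# Counts control characters other than tab, newline and carriage return,
# capping the result at $MAX_CONTROL.
sub count_control {
    my ($body) = @_;
    my $count = () = $body =~ /[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]/g;
    return $count > $MAX_CONTROL ? $MAX_CONTROL : $count;
}

my $n = count_control("bell\x07 and escape\x1b[0m\n");
print "$n\n";    # one BEL plus one ESC = 2
```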

lines_too_long

Check for oversized lines as set by LINE_LEN. The return value is boolean.
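A boolean check of this sort can be sketched as follows. The name has_long_lines() is hypothetical, and treating "longer than LINE_LEN characters" as oversized is an assumption; the module may count columns differently (for example, by expanding tabs).

```perl
use strict;
use warnings;

my $LINE_LEN = 80;    # mirrors the documented default

# True (1) if any line has more than $LINE_LEN characters, else false (0).
sub has_long_lines {
    my ($text) = @_;
    for my $line ( split /\n/, $text ) {
        return 1 if length($line) > $LINE_LEN;
    }
    return 0;
}

my $sample  = "short line\n" . ( "x" x 81 ) . "\n";
my $verdict = has_long_lines($sample) ? "too long" : "ok";
print "$verdict\n";
```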

missing_headers

Verifies the existence of the Subject: and References: headers as necessary.

miswrapped

Tests for miswrapped lines in quoted and regular text. Returns number of occurrences, which may be excessive for things like posted logfiles.

jeopardy_quoted

Tests for upside-down posting style (newsgroup replies should follow quoted text, not vice versa). The return value is boolean.

check_quotes

Overtaxed sub that checks for overquoted messages. Also looks for over-opinionated text (too many I's) and lots of code (oft considered a good thing :). In scalar context, it returns the total number of quoted lines. Resulting warnings are subject to VERBOSE, MIN_LINES, EGO, and TOLERANCE settings.
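To make the interplay of MIN_LINES and TOLERANCE concrete, here is a minimal sketch of an overquoting measure. It assumes that quoted lines start with ">", that blank lines are ignored, and that the reported value is the number of percentage points above TOLERANCE (as the weight table's comment for overquoted suggests). The name overquoted() is hypothetical and the real sub does considerably more.

```perl
use strict;
use warnings;

my ( $TOLERANCE, $MIN_LINES ) = ( 50, 20 );    # documented defaults

# Returns the percentage points by which quoted text exceeds $TOLERANCE,
# or 0 for short posts and for acceptably quoted ones.
sub overquoted {
    my ($body) = @_;
    my @lines = grep { /\S/ } split /\n/, $body;    # non-blank lines only
    return 0 if @lines < $MIN_LINES;                # short posts are OK

    my $quoted  = grep { /^\s*>/ } @lines;          # count of quoted lines
    my $percent = int( 100 * $quoted / @lines );
    return $percent > $TOLERANCE ? $percent - $TOLERANCE : 0;
}

# 18 quoted lines followed by 6 new lines: 75% quoted, 25 points over.
my $post = ( "> quoted text\n" x 18 ) . ( "new text\n" x 6 );
my $over = overquoted($post);
print "$over\n";
```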

bad_signature

Checks for a standard signature block. If the signature has more than SIG_LINES lines, it returns the number of lines in the signature (up to 20). Otherwise it returns 0.

+10 is added to the return value for nonstandard signature separators.
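The return-value rules above can be sketched as follows. The sketch assumes that the standard separator is the usual usenet "-- " line (dash, dash, space), that a bare "--" is what counts as nonstandard, and that a message with no separator at all scores 0. None of these details are confirmed by this document, and sig_penalty() is a hypothetical name.

```perl
use strict;
use warnings;

my $SIG_LINES = 4;    # mirrors the documented default

# Returns 0 for an acceptable signature; otherwise the number of signature
# lines (capped at 20), plus 10 if the separator is not exactly "-- ".
sub sig_penalty {
    my ($body) = @_;
    my @lines = split /\n/, $body;

    # Index of the last line that looks like a separator, if any.
    my ($sep) = grep { $lines[$_] =~ /^--\s*$/ } reverse 0 .. $#lines;
    return 0 unless defined $sep;

    my $count   = $#lines - $sep;    # lines following the separator
    my $penalty = $count > $SIG_LINES ? ( $count > 20 ? 20 : $count ) : 0;
    $penalty += 10 unless $lines[$sep] eq '-- ';
    return $penalty;
}

# Six signature lines after a bare "--": 6 + 10 = 16.
my $message = "Some text\n--\n" . ( "sig line\n" x 6 );
my $penalty = sig_penalty($message);
print "$penalty\n";
```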

attribution

Looks for the attribution text preceding the quoted text and returns it.

annoying_subject

Complains if the subject contains useless words. On an original post, it returns the number of faux pas; on followups, it returns a false value.

    my @patterns =  (
		     qr/ ( [?!]{3,} ) /x,
		     qr/ ( HELP     ) /x,
		     qr/ ( PLEASE   ) /x,
		     qr/ (NEWB[IE]{2})/xi,
		     qr/ ( GURU     ) /xi,
		    );

mimes

Warns if the message is MIME-encoded.

BUGS

  • Terribly slow on large messages.

  • Etiquette rules may need adjusting for normal e-mail.

  • Does not (currently) look for quoted sigs.

  • Manually wrapped logfiles are heavily penalized.

  • Some context-sensitive stuff (original, request, newsgroup, mail) is wrong.

NOTES

Return values, default settings, and especially regexps are subject to change. Please send bug reports and patches to the author.

AUTHOR

Joe Schaefer <joe+cpan@sunstarsys.com>. This package borrows heavily from Tom Christiansen's msgchk script.

COPYRIGHT

Copyright 2001 Joe Schaefer. This code is free software; it is freely modifiable and redistributable under the same terms as Perl itself.