NAME
Mail::SpamAssassin::Pyzor::Digest::Pieces
DESCRIPTION
This module houses backend logic for Mail::SpamAssassin::Pyzor::Digest.
It reimplements logic found in pyzor’s digest.py module (https://github.com/SpamExperts/pyzor/blob/master/pyzor/digest.py).
FUNCTIONS
$strings_ar = digest_payloads( $EMAIL_MIME )
This imitates the corresponding object method in digest.py. It returns a reference to an array of strings. Each string can be either a byte string or a character string (e.g., UTF-8 decoded).
NB: RFC 2822 stipulates that message bodies should use CRLF line breaks, not plain LF (nor plain CR). We will thus convert any plain CRs in a quoted-printable message body into CRLF. Python, though, doesn’t do this, so the output of our implementation of digest_payloads() diverges from that of the Python original. It doesn’t ultimately make a difference since the line-ending whitespace gets trimmed regardless, but it’s necessary to factor in when comparing the output of our implementation with the Python output.
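When comparing payloads from this module against payloads dumped from the Python implementation, one way to factor out the line-ending divergence is to canonicalize both sides first. The following is a minimal sketch using only core Perl; canonical_eol() is a hypothetical helper name, not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): rewrite CRLF and bare CR
# as LF so that two payloads differing only in line-break style compare equal.
sub canonical_eol {
    my ($string) = @_;
    $string =~ s/\r\n?/\n/g;
    return $string;
}

# A payload as this module might emit it (CRLF) versus the same payload
# with the bare CRs that the Python side would leave in place.
my $perl_style   = "line one\r\nline two\r\n";
my $python_style = "line one\rline two\r";

my $same = canonical_eol($perl_style) eq canonical_eol($python_style);
print $same ? "payloads match\n" : "payloads differ\n";
```

The helper works on a copy, so the original payload strings are left untouched.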
normalize( $STRING )
This imitates the corresponding object method in digest.py. It modifies $STRING in-place.
As with the original implementation, if $STRING contains (decoded) Unicode characters, those characters will be parsed accordingly. So:
$str = "123\xc2\xa0"; # [ c2 a0 ] == \u00a0, non-breaking space
normalize($str);
The above will leave $str alone, but this:
utf8::decode($str);
normalize($str);
… will remove the trailing non-breaking space from $str, i.e., the character that occupied the last two bytes of the original, undecoded string.
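The byte-versus-character distinction can be reproduced without this module. The sketch below uses a hypothetical stand-in, strip_ws(), that only removes whitespace in place; it is not this module's normalize(), which does more than that. It assumes Perl's default regex semantics (no "unicode_strings" feature and no locale in effect).

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only; NOT this module's normalize().
# Modifies its argument in place through the @_ alias.
sub strip_ws {
    $_[0] =~ s/\s+//g;
    return;
}

# Byte string: 0xC2 0xA0 are two separate octets, and neither one is
# treated as whitespace under Perl's default rules for byte strings.
my $bytes = "123\xc2\xa0";
strip_ws($bytes);
printf "bytes: length %d\n", length $bytes;    # 5

# Character string: after decoding, the two octets become the single
# character U+00A0 (non-breaking space), which \s does match.
my $chars = "123\xc2\xa0";
utf8::decode($chars);
strip_ws($chars);
printf "chars: length %d\n", length $chars;    # 3
```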
$yn = should_handle_line( $STRING )
This imitates the corresponding object method in digest.py. It returns a boolean.
$sr = assemble_lines( \@LINES )
This assembles a string buffer out of @LINES. The string is the buffer of octets that will be hashed to produce the message digest.
Each member of @LINES is expected to be an octet string, not a character string.
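If the lines on hand are character strings, they need to be encoded to octets before being passed in. The sketch below shows that step with core Perl only. The plain join() is a simplification and is not what assemble_lines() does internally, and the choice of SHA-1 is an assumption based on pyzor's digest; this documentation only says that the buffer is hashed.

```perl
use strict;
use warnings;
use Digest::SHA ();

# Character strings (decoded text), as they might come out of normalize().
my @char_lines = ( "caf\x{e9}", "\x{263a} hello" );

# Encode a copy of each line to UTF-8 octets; utf8::encode() works in place.
my @octet_lines = map { my $octets = $_; utf8::encode($octets); $octets } @char_lines;

# Simplified stand-in for the assembled buffer (assumption: plain concatenation).
my $buffer = join '', @octet_lines;

# Assumption: SHA-1, as used by pyzor's digest.
my $hex = Digest::SHA::sha1_hex($buffer);

printf "%d octets, digest %s\n", length($buffer), $hex;
```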
($main, $sub, $encoding, $checkval) = parse_content_type( $CONTENT_TYPE )
@lines = splitlines( $TEXT )
Imitates str.splitlines(). (cf. pydoc str)
Returns a plain list in list context. In scalar context, returns the number of items that would be returned in list context.
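The context-sensitive return can be sketched as follows. my_splitlines() is a hypothetical re-creation for decoded character strings, written against the line boundaries documented for Python's str.splitlines(); it is not this module's code and may differ from it in edge cases.

```perl
use strict;
use warnings;

# Hypothetical sketch (not this module's implementation).
sub my_splitlines {
    my ($text) = @_;

    # Line boundaries recognized by Python's str.splitlines().
    # The -1 limit keeps trailing empty fields so interior blank lines survive.
    my @lines = split /\r\n|[\n\r\x0b\x0c\x1c\x1d\x1e\x85\x{2028}\x{2029}]/, $text, -1;

    # A final line terminator does not start a new (empty) line.
    pop @lines if @lines && $lines[-1] eq '';

    return wantarray ? @lines : scalar @lines;
}

my @list  = my_splitlines("a\r\nb\n\nc\n");    # ('a', 'b', '', 'c')
my $count = my_splitlines("a\r\nb\n\nc\n");    # 4

print "count: $count\n";
```

Using wantarray keeps a single entry point for both uses, so callers that only need the line count do not have to build a temporary array themselves.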