NAME
MIME::Tools::traps - pitfalls and gotchas for users of MIME-tools
SYNOPSIS
This is part of the MIME-tools documentation. See MIME::Tools for the full table of contents.
DESCRIPTION
Things in MIME-tools to beware of...
Fuzzing of CRLF and newline on input
RFC-1521 dictates that MIME streams have lines terminated by CRLF ("\r\n"). However, it is extremely likely that folks will want to parse MIME streams where each line ends in the local newline character "\n" instead.
An attempt has been made to allow the parser to handle both CRLF and newline-terminated input.
Fuzzing of CRLF and newline when decoding
The "7bit" and "8bit" decoders will decode both a "\n" and a "\r\n" end-of-line sequence into a "\n".
The "binary" decoder (default if no encoding specified) still outputs stuff verbatim... so a MIME message with CRLFs and no explicit encoding will be output as a text file that, on many systems, will have an annoying ^M at the end of each line... but this is as it should be.
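The difference between the two behaviors can be sketched in plain Perl. This is only an illustration of the end-of-line handling described above, not the decoders' actual code; the helper names normalize_eol and passthrough are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: mimic what the text says the "7bit" and "8bit"
# decoders do, turning a trailing "\r\n" or "\n" into a single "\n".
sub normalize_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z/\n/;
    return $line;
}

# Hypothetical helper: mimic the "binary" decoder, which hands the
# data back untouched, CR included.
sub passthrough {
    my ($line) = @_;
    return $line;
}

my $crlf_line  = "Hello, world\r\n";
my $normalized = normalize_eol($crlf_line);   # now ends in "\n" only
my $verbatim   = passthrough($crlf_line);     # still ends in "\r\n"
```

The leftover "\r" in $verbatim is what shows up as ^M when the output is viewed as a text file.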
Fuzzing of CRLF and newline when encoding/composing
All encoders currently output the end-of-line sequence as a "\n", with the assumption that the local mail agent will perform the conversion from newline to CRLF when sending the mail. However, there probably should be an option to output CRLF as per RFC-1521.
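If your mail agent does not do that conversion for you, one workaround is to convert the composed text yourself before sending it. A minimal sketch, assuming the whole message fits in a scalar; to_crlf is a hypothetical helper, not part of MIME-tools:

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every line ending as CRLF.
# Lines that already end in "\r\n" are left as they are, so
# running this twice gives the same result as running it once.
sub to_crlf {
    my ($text) = @_;
    $text =~ s/\r?\n/\r\n/g;
    return $text;
}

my $composed = "Subject: Hello\n\nBody line\n";
my $wire     = to_crlf($composed);
```

Apply this only to text that is ready for the wire. Running it over binary data that was never encoded would corrupt it.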
Inability to handle multipart boundaries with embedded newlines
Let's get something straight: this is an evil, EVIL practice. If your mailer creates multipart boundary strings that contain newlines, give it two weeks' notice and find another one. If your mail robot receives MIME mail like this, regard it as syntactically incorrect, which it is.
Ignoring non-header headers
People like to hand the parser raw messages straight from POP3 or from a mailbox. There is often predictable non-header information in front of the real headers; e.g., the initial "From" line in the following message:
From - Wed Mar 22 02:13:18 2000
Return-Path: <eryq@zeegee.com>
Subject: Hello
The parser simply ignores such stuff quietly. Perhaps it shouldn't, but most people seem to want that behavior.
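If you would rather detect or remove such a line yourself before parsing, something like the following might do. It is a sketch only: it assumes the message is held in a scalar and that the only leading junk is a single mbox-style "From " line, and strip_envelope_from is a hypothetical helper, not part of MIME-tools.

```perl
use strict;
use warnings;

# Hypothetical helper: drop one leading mbox-style "From " line.
# The space after "From" is what separates this envelope line from
# a real "From:" header, which has a colon there instead.
sub strip_envelope_from {
    my ($raw) = @_;
    $raw =~ s/\AFrom [^\r\n]*\r?\n//;
    return $raw;
}

my $raw = "From - Wed Mar 22 02:13:18 2000\n"
        . "Return-Path: <eryq\@zeegee.com>\n"
        . "Subject: Hello\n"
        . "\n";
my $clean = strip_envelope_from($raw);   # now begins with "Return-Path:"
```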
Fuzzing of empty multipart preambles
Please note that there is currently an ambiguity in the way preambles are parsed. The following message fragments are both regarded as having an empty preamble (where \n indicates a newline character):
Content-type: multipart/mixed; boundary="xyz"\n
Subject: This message (#1) has an empty preamble\n
\n
--xyz\n
...
Content-type: multipart/mixed; boundary="xyz"\n
Subject: This message (#2) also has an empty preamble\n
\n
\n
--xyz\n
...
In both cases, the first completely-empty line (after the "Subject") marks the end of the header.
But we should clearly ignore the second empty line in message #2, since it fills the role of "the newline which is only there to make sure that the boundary is at the beginning of a line". Such newlines are never part of the content preceding the boundary; thus, there is no preamble "content" in message #2.
However, it seems clear that message #1 also has no preamble "content", and is in fact merely a compact representation of an empty preamble.
Use of a temp file during parsing
Why not do everything in core? Although the amount of core available on even a modest home system continues to grow, the size of attachments continues to grow with it. I wanted to make sure that even users with small systems could deal with decoding multi-megabyte sound and movie files. That means not being core-bound.
As of the 5.3xx releases, MIME::Parser gets by with only one temp file open per parser. This temp file provides a sort of infinite scratch space for dealing with the current message part. It's fast and lightweight, but you should know about it anyway.
Why do I assume that MIME objects are email objects?
Achim Bohnet once pointed out that MIME headers do nothing more than store a collection of attributes, and thus could be represented as objects which don't inherit from Mail::Header.
I agree in principle, but RFC-1521 says otherwise. RFC-1521 [MIME] headers are a syntactic subset of RFC-822 [email] headers. Perhaps a better name for these modules would have been RFC1521:: instead of MIME::, but we're a little beyond that stage now.
When I originally wrote these modules for the CPAN, I agonized for a long time about whether or not they really should subclass from Mail::Internet (then at version 1.17). Thanks to Graham Barr, who graciously evolved MailTools 1.06 to be more MIME-friendly, unification was achieved at MIME-tools release 2.0. The benefits in reuse alone have been substantial.
You can't print exactly what you parsed!
Parsing is a (slightly) lossy operation. Because of things like ambiguities in base64-encoding, the following is not going to spit out its input unchanged in all cases:
$entity = $parser->parse(\*STDIN);
$entity->print(\*STDOUT);
If you're using MIME::Tools to process email, remember to save the data you parse if you want to send it on unchanged. This is vital for things like PGP-signed email.
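One way to do that is to read the raw message into a scalar first, parse a copy of it, and forward the scalar rather than the parsed entity. A minimal sketch, assuming the message is small enough to hold in memory; slurp_message and parse_copy are hypothetical helpers, and the choice of output_to_core(1) is just for brevity here.

```perl
use strict;
use warnings;

# Read an entire message from a filehandle into one scalar, byte for byte.
sub slurp_message {
    my ($fh) = @_;
    binmode $fh;
    local $/;                       # undef record separator: read to EOF
    my $raw = <$fh>;
    return defined $raw ? $raw : '';
}

# Parse the saved text for inspection; $raw itself is not modified.
sub parse_copy {
    my ($raw) = @_;
    require MIME::Parser;           # loaded lazily; needs MIME-tools installed
    my $parser = MIME::Parser->new;
    $parser->output_to_core(1);     # keep bodies in memory for this small sketch
    return $parser->parse_data($raw);
}

# Usage sketch:
#   my $raw    = slurp_message(\*STDIN);
#   my $entity = parse_copy($raw);
#   ... inspect $entity as needed ...
#   print STDOUT $raw;              # forward the original bytes unchanged
```

Printing $raw instead of calling $entity->print is what keeps things like PGP signatures intact.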
(Sing it with me, kids: you can't / always print / what you paaaarsed...)
SEE ALSO
See "SYNOPSIS" in MIME::Tools for the full table of contents.