[Dovecot] Broken mail clients? [MIME] Long attachment encoded filenames (for non-ASCII characters etc)
Hi,
I've noticed a possible minor issue with long encoded filenames for attachments where these filenames are split across multiple lines. My understanding of character encoding and MIME is not as good as it should be, so I may easily have got this all mixed up, in which case sorry for the noise...
Although I understand the preferred method for handling filenames split across multiple lines (because they're too long to fit on one line in the message) is that suggested in RFC2184/2231, so for example, filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6 filename*1=etc%2Epdf
I find that some mail clients do this instead, filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= =?ISO-8859-1?Q?etc=2Epdf?="
In Dovecot this results in:
0 fetch 25 body
* 25 FETCH (BODY (("text" "plain" ("charset" "ISO-8859-1") NIL NIL "7bit" 239 8)("application" "pdf" ("name" "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= =?ISO-8859-1?Q?etc=2Epdf?=") NIL NIL "base64" 219130) "mixed"))
esp. note the unwanted space - or in fact the sequence ?= =? between the two sections of the filename. I think a possible tweak for Dovecot would be to combine the filename parts in this situation to remove the ?= =?. I'm not sure if an IMAP client should know to combine the parts in their current format. FWIW I see that Courier does the same as Dovecot in this situation.
I think the 'alternative' method of splitting filenames I'm raising breaks RFC2047 (details below), but unfortunately this method is used by some large email generators like gmail - also details below.
Key bits from RFC2047 section 5 part (3) re. only a single encoded-word ('phrase') being allowed for a MIME Content-Type / Content-Disposition:
phrase = 1*( encoded-word / word )
An 'encoded-word' MUST NOT be used in parameter of a MIME
Content-Type or Content-Disposition field, or in any structured
field body except within a 'comment' or 'phrase'.
Here are the mail clients I noted this issue with (original filenames destroyed because I've been examining my client's emails for this issue - with their permission),
(AOL) X-Mailer: Webmail 33953-STANDARD
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="=?utf-8?Q?abcde?= =?utf-8?Q?abcde=C3=A9abcde.jpg?="
Content-Type: image/jpeg; name="=?utf-8?Q?abcde?= =?utf-8?Q?abcde=C3=A9abcde.jpg?="

Gmail:
Content-Type: application/pdf; name="=?ISO-8859-1?Q?with_a_=EA=CA=E6_super=2Dlong_name_that=27s_bound?= =?ISO-8859-1?Q?_to_overflow_a_line_boundary_to_test_gmail=2Epdf?="
Content-Disposition: attachment; filename="=?ISO-8859-1?Q?with_a_=EA=CA=E6_super=2Dlong_name_that=27s_bound?= =?ISO-8859-1?Q?_to_overflow_a_line_boundary_to_test_gmail=2Epdf?="

X-Mailer: YahooMailWebService/0.8.113.313619
Content-Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document; name="=?utf-8?B?base64encodedstring?= =?utf-8?B?base64encodedstring?="
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="=?utf-8?B?base64encodedstring?= =?utf-8?B?base64encodedstring?="

X-Mailer: Lotus Notes Release 6.5.5 November 30, 2005:
Content-type: application/pdf; name="=?ISO-8859-1?Q?abcde=E9abcde=E9abcde=E9?= =?ISO-8859-1?Q?abcde=2Cl=2Epdf?="
Content-Disposition: attachment; filename="=?ISO-8859-1?Q?abcde=E9abcde=E9_abcde=E9?= =?ISO-8859-1?Q?abcde=2Cl=2Epdf?="
Content-ID: <20__=snip>
Content-transfer-encoding: base64

X-Mailer: Lotus Domino Web Server Release 6.5.5FP1 HF551 November 27, 2007:
Content-type: application/pdf; name="=?windows-1252?Q?abcde_=28=E9?= =?windows-1252?Q?=29=2Epdf?="
Content-Disposition: attachment; filename="=?windows-1252?Q?abcde_=28=E9?= =?windows-1252?Q?=29=2Epdf?="
Content-transfer-encoding: base64
Timo also noted the same style of filename encoding in Apple Mail in the previous thread I started. It would be interesting to try Apple Mail with a very long filename, to force a split across multiple lines and see how it encodes the filename then:
Looks like Apple Mail also sends:
Content-Type: application/octet-stream; name="=?iso-8859-1?Q?p=E4=E4?="
Best regards,
Andrew.
Quoting Andrew Richards ar-dovecotlist@acrconsulting.co.uk:
Hi,
I've noticed a possible minor issue with long encoded filenames for attachments where these filenames are split across multiple lines. My understanding of character encoding and MIME is not as good as it should be, so I may easily have got this all mixed up, in which case sorry for the noise...
Although I understand the preferred method for handling filenames split across multiple lines (because they're too long to fit on one line in the message) is that suggested in RFC2184/2231, so for example, filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6 filename*1=etc%2Epdf
I find that some mail clients do this instead, filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= =?ISO-8859-1?Q?etc=2Epdf?="
In Dovecot this results in:
0 fetch 25 body
* 25 FETCH (BODY (("text" "plain" ("charset" "ISO-8859-1") NIL NIL "7bit" 239 8)("application" "pdf" ("name" "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= =?ISO-8859-1?Q?etc=2Epdf?=") NIL NIL "base64" 219130) "mixed"))
esp. note the unwanted space - or in fact the sequence ?= =? between the two sections of the filename. I think a possible tweak for Dovecot would be to combine the filename parts in this situation to remove the ?= =?.
I'm not sure if an IMAP client should know to combine the parts in their current format. FWIW I see that Courier does the same as Dovecot in this situation.
Dovecot's behavior is correct. There's nothing "special" about that
name parameter - it's not RFC 2231 encoded - so the IMAP server should
output the exact header text as-is. Those two parts were separated by
space in the original header - they should be separated by space when
grabbing the fetch data.
If the *client* wants to work around these broken messages, it can do
whatever munging it wants to translate the contents of the "name"
parameter. But that should be up to the client. An IMAP server
should not be making wild assumptions about what the original sender
wanted to do with the message vs. what it actually sent.
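As an illustration of the client-side handling described above, here is a minimal Python sketch that decodes such a "name" value with the standard library's `email.header.decode_header`. The helper name `decode_param` is hypothetical and not part of any mail client. RFC 2047 section 6.2 says that whitespace separating adjacent encoded-words is ignored on display, which is why the space between the two parts disappears after decoding.

```python
from email.header import decode_header


def decode_param(value):
    """Leniently decode RFC 2047 encoded-words found in a parameter value.

    Hypothetical helper: strictly speaking encoded-words are not allowed in
    parameters, so this is a tolerance measure for a client, not a standard.
    """
    parts = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Encoded chunks come back as bytes plus their declared charset.
            parts.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            # A value without any encoded-word is returned unchanged as str.
            parts.append(chunk)
    return "".join(parts)


raw = ("=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= "
       "=?ISO-8859-1?Q?etc=2Epdf?=")
name = decode_param(raw)
print(name)
```

Note that Q-encoding maps "_" to a space, so the decoded name contains spaces where the encoded form has underscores.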
FYI: A workaround is to do something like this when sending a message:
Content-Disposition: attachment; filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= =?ISO-8859-1?Q?etc=2Epdf?="; filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6; filename*1=etc%2Epdf
When parsing, MIME parsers *generally* handle parameters in the order
they appear, with a later value replacing an earlier one (although see
note below). So what will happen here is that an IMAP server
will overwrite the initial broken filename parameter with the correct,
decoded RFC 2231 parameter.
(Note: RFC 2045 [5] states that parameter order is not significant, so
you can't depend on this 100%. But any decent RFC 2231 MIME parser
will do sanity checking no matter the order of the parameters and
should never replace a parameter value generated via RFC 2231
encoding with a parameter value that is non-encoded).
Conversely, a broken (or at least non-RFC2231) MIME parser that sees
the above header will instead report three different parameters -
filename, filename*0*, and filename*1. Non-2231 agents will most
likely try to do RFC 2047 decoding on the 'filename' parameter,
which will succeed. 2231 agents will recognize that 2231 data exists,
will do the necessary concatenation/decoding themselves on the
'filename*0*' and 'filename*1' parameters, and will completely ignore
the 'filename' parameter.
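The behaviour of a 2231-aware agent described above can be sketched roughly as follows in Python. This is a simplified illustration, not how Dovecot or any particular parser does it: the function names `join_rfc2231` and `pick_filename` are hypothetical, the parameters are assumed to be already split into a dict, and the unnumbered `filename*=` form is not handled.

```python
import re
from urllib.parse import unquote_to_bytes

# Matches numbered segments such as filename*0* or filename*1.
_SEGMENT = re.compile(r"^(?P<name>[^*]+)\*(?P<num>\d+)(?P<ext>\*?)$")


def join_rfc2231(params, name):
    """Reassemble name*0, name*1, ... segments; return None if there are none."""
    segments = []
    for key, value in params.items():
        m = _SEGMENT.match(key)
        if m and m.group("name").lower() == name:
            segments.append((int(m.group("num")), bool(m.group("ext")), value))
    if not segments:
        return None
    segments.sort()  # concatenate by segment number, not arrival order
    charset = "us-ascii"
    raw = b""
    for num, extended, value in segments:
        if num == 0 and extended and value.count("'") >= 2:
            # The first extended segment carries charset'language'data.
            declared, _language, value = value.split("'", 2)
            charset = declared or charset
        if extended:
            raw += unquote_to_bytes(value)  # trailing * means %XX-encoded
        else:
            raw += value.encode("ascii")    # no trailing *: taken literally
    return raw.decode(charset, errors="replace")


def pick_filename(params):
    """Prefer the RFC 2231 value regardless of parameter order."""
    value = join_rfc2231(params, "filename")
    return value if value is not None else params.get("filename")


params = {
    "filename": "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= "
                "=?ISO-8859-1?Q?etc=2Epdf?=",
    "filename*0*": "iso-8859-1''accented_characters_here_%EA%CA%E6",
    "filename*1": "etc%2Epdf",
}
print(pick_filename(params))
```

One detail this makes visible: under a strict reading of RFC 2231, only segments whose name ends in "*" are percent-decoded, so with the header exactly as written the "%2E" in filename*1 stays literal. Writing the second segment as filename*1* would yield "etc.pdf".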
michael
(Subject line altered - original was confused)
On Friday 30 September 2011 00:07:08 Michael M Slusarz wrote:
Quoting Andrew Richards ar-dovecotlist@acrconsulting.co.uk:
Hi,
I've noticed a possible minor issue with long encoded filenames for attachments where these filenames are split across multiple lines. My understanding of character encoding and MIME is not as good as it should be, so I may easily have got this all mixed up, in which case sorry for the noise...
Although I understand the preferred method for handling filenames split across multiple lines (because they're too long to fit on one line in the message) is that suggested in RFC2184/2231, so for example, filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6 filename*1=etc%2Epdf
I find that some mail clients do this instead, filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= =?ISO-8859-1?Q?etc=2Epdf?="
In Dovecot this results in:
0 fetch 25 body
* 25 FETCH (BODY (("text" "plain" ("charset" "ISO-8859-1") NIL NIL "7bit" 239 8)("application" "pdf" ("name" "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= =?ISO-8859-1?Q?etc=2Epdf?=") NIL NIL "base64" 219130) "mixed"))
esp. note the unwanted space - or in fact the sequence ?= =? between the two sections of the filename. I think a possible tweak for Dovecot would be to combine the filename parts in this situation to remove the ?= =?.
Correcting myself: ...remove the ?= =?ISO-8859-1?Q? (not just ?= =?) to generate the string in this example, "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6etc=2Epdf?="
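To make the corrected rewrite concrete, here is a rough Python sketch of that merge. The function name `merge_q_words` is hypothetical, and the sketch only covers the simple case where the whole value consists of Q-encoded words in one charset; B-encoded (base64) words are left alone because their payloads cannot be safely concatenated as text. The merged result may also exceed the 75-character limit that RFC 2047 section 2 places on a single encoded-word.

```python
import re

# One RFC 2047 encoded-word: =?charset?encoding?text?=
_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def merge_q_words(value):
    """Merge whitespace-separated Q-encoded words sharing one charset."""
    words = _WORD.findall(value)
    # Bail out unless the value is nothing but encoded-words and whitespace.
    if not words or _WORD.sub("", value).strip():
        return value
    charsets = {charset.lower() for charset, _enc, _text in words}
    encodings = {enc.lower() for _charset, enc, _text in words}
    if len(charsets) != 1 or encodings != {"q"}:
        return value
    merged = "".join(text for _charset, _enc, text in words)
    return "=?%s?Q?%s?=" % (words[0][0], merged)


raw = ("=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= "
       "=?ISO-8859-1?Q?etc=2Epdf?=")
print(merge_q_words(raw))
```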
I'm not sure if an IMAP client should know to combine the parts in their current format. FWIW I see that Courier does the same as Dovecot in this situation.
Dovecot's behavior is correct. There's nothing "special" about that name parameter - it's not RFC 2231 encoded - so the IMAP server should output the exact header text as-is. Those two parts were separated by space in the original header - they should be separated by space when grabbing the fetch data.
I can accept that Dovecot's behaviour is technically correct, but my point is that (if I've understood correctly) some large mailers like Gmail act in a non-RFC2231 manner, so is it worth adapting Dovecot to play nicely with them? Possibly I'm conflating 2 separate issues: (1) munging together non-RFC2231 attachment filename parts, and (2) large mailers not using RFC2231 to handle long non-ASCII filenames.
If the *client* wants to work around these broken messages, it can do whatever munging it wants to translate the contents of the "name" parameter. But that should be up to the client. An IMAP server should not be making wild assumptions about what the original sender wanted to do with the message vs. what it actually sent.
FYI: A workaround is to do something like this when sending a message:
Content-Disposition: attachment; filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= =?ISO-8859-1?Q?etc=2Epdf?="; filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6; filename*1=etc%2Epdf
Sure: I accept that that's the preferred way to handle long filenames that need to be encoded - but I'm noting that there are badly-behaved large mailers that don't do so, so I wonder if it's worth Dovecot mitigating the effects.
Best regards,
Andrew.
Participants (2): Andrew Richards, Michael M Slusarz