[Dovecot] Broken mail clients? [MIME] Long attachment encoded filenames (for non-ASCII characters etc)

Fri Sep 30 02:07:08 EEST 2011

Quoting Andrew Richards <ar-dovecotlist at acrconsulting.co.uk>:

> Hi,
>
> I've noticed a possible minor issue with long encoded filenames for
> attachments where these filenames are split across multiple lines. My
> understanding of character encoding and MIME is not as good as it should be,
> so I may easily have got this all mixed up, in which case sorry for the
> noise...
>
> Although I understand the preferred method for handling filenames split
> across multiple lines (because they're too long to fit on one line in the
> message) is that suggested in RFC2184/2231, so for example,
>             filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6
>             filename*1=etc%2Epdf
>
> I find that some mail clients do this instead,
>         filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?=
>         =?ISO-8859-1?Q?etc=2Epdf?="
>
> In Dovecot this results in,
> 0 fetch 25 body
> * 25 FETCH (BODY (("text" "plain" ("charset" "ISO-8859-1") NIL NIL "7bit" 239
> 8)("application" "pdf" ("name"
> "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?=
> =?ISO-8859-1?Q?etc=2Epdf?=") NIL NIL "base64" 219130) "mixed"))
>
> esp. note the unwanted space - or in fact the sequence ?= =? between the two
> sections of the filename. I think a possible tweak for Dovecot would be to
> combine the filename parts in this situation to remove the ?= =?. I'm not
> sure if an IMAP client should know to combine the parts in their current
> format. FWIW I see that Courier does the same as Dovecot in this situation.

Dovecot's behavior is correct.  There's nothing "special" about that  
name parameter - it's not RFC 2231 encoded - so the IMAP server should  
output the exact header text as-is.  Those two parts were separated by  
space in the original header - they should be separated by space when  
grabbing the fetch data.

If the *client* wants to work around these broken messages, it can do
whatever munging it wants to translate the contents of the "name"
parameter.  But that should be up to the client.  An IMAP server
should not be making wild assumptions about what the original sender
wanted to do with the message vs. what it actually sent.
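
As a rough illustration of that client-side munging, here is a minimal
Python sketch that treats the raw "name" value as RFC 2047 encoded-words.
It relies on the standard library's email.header module; the helper name
decode_name_param is made up for this example, and the sample value is the
one from the FETCH response above with the line fold reduced to a single
space (an assumption about the unfolded header).

```python
from email.header import decode_header, make_header


def decode_name_param(raw: str) -> str:
    """Decode RFC 2047 encoded-words in a raw "name" parameter value.

    Hypothetical client-side helper: the IMAP server hands over the
    header text untouched, and the client chooses to decode it.
    """
    # decode_header() splits the value into (bytes, charset) chunks and
    # drops whitespace that only separates two adjacent encoded-words.
    chunks = decode_header(raw)
    # make_header() reassembles the chunks into one Unicode string.
    return str(make_header(chunks))


# Value as returned by the server, with the fold shown as one space.
raw_name = (
    "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= "
    "=?ISO-8859-1?Q?etc=2Epdf?="
)
decoded_name = decode_name_param(raw_name)
```

The "?= =?" sequence disappears in the decoded result because the client
joins the two encoded-words, not because the server changed anything.  Also
note that in Q-encoding an underscore stands for a space, so the decoded
name will contain spaces where the raw value has underscores.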

FYI: A workaround is to do something like this when sending a message:

Content-Disposition: attachment;
  filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?=
  =?ISO-8859-1?Q?etc=2Epdf?=";
  filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6;
  filename*1=etc%2Epdf

When parsing, MIME parsers *generally* process parameters in the order
they appear, with a later value replacing an earlier one of the same
name (although see note below).  So what will happen here is an IMAP
server will overwrite the initial broken filename parameter with the
correct, decoded RFC 2231 parameter.

(Note: RFC 2045, section 5, states that parameter order is not
significant, so you can't depend on this 100%.  But any decent RFC 2231
MIME parser will do sanity checking no matter the order of the
parameters and should never replace a parameter value generated via
RFC 2231 encoding with a parameter value that is non-encoded.)

Conversely, a broken (or at least non-RFC 2231) MIME parser that sees
the above header will instead report three different parameters -
filename, filename*0*, and filename*1.  Non-2231 agents will most
likely try to do RFC 2047 decoding on the 'filename' parameter,
which will succeed.  2231 agents will recognize that 2231 data exists,
will do the necessary concatenation/decoding themselves on the
'filename*0*' and 'filename*1' parameters, and will completely ignore
the 'filename' parameter.
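
To make the 2231-aware behavior concrete, here is a small Python sketch of
the selection logic.  It is not how Dovecot or any particular parser does
it; pick_filename is a hypothetical helper that takes already-split
(name, value) pairs and prefers the RFC 2231 segments regardless of their
order.  One assumption to flag: the sample uses 'filename*1*' (with the
trailing asterisk), because RFC 2231 only percent-decodes segments whose
name ends in '*'; a segment named 'filename*1' would be taken literally.

```python
import re
from urllib.parse import unquote_to_bytes


def pick_filename(params):
    """Choose a filename from (name, value) pairs, preferring RFC 2231.

    Hypothetical helper for illustration; values are assumed to be
    already unquoted.
    """
    plain = None
    segments = {}  # segment number -> (value, is_percent_encoded)
    for name, value in params:
        m = re.fullmatch(r"filename\*(\d+)(\*?)", name)
        if m:
            segments[int(m.group(1))] = (value, m.group(2) == "*")
        elif name == "filename":
            plain = value
    if not segments:
        # No RFC 2231 data: fall back to the plain parameter as-is.
        return plain
    charset = "us-ascii"
    pieces = []
    for number in sorted(segments):  # order in the header does not matter
        value, encoded = segments[number]
        if number == 0 and encoded:
            # First segment carries charset'language'value.
            declared, _language, value = value.split("'", 2)
            charset = declared or charset
        pieces.append(unquote_to_bytes(value) if encoded
                      else value.encode("ascii"))
    return b"".join(pieces).decode(charset)


params = [
    ("filename", "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= "
                 "=?ISO-8859-1?Q?etc=2Epdf?="),
    ("filename*0*", "iso-8859-1''accented_characters_here_%EA%CA%E6"),
    ("filename*1*", "etc%2Epdf"),
]
chosen = pick_filename(params)
```

A non-2231 agent corresponds to the fallback branch: it only ever sees the
plain 'filename' value and is left to decode the encoded-words itself.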

michael