[Dovecot] Long attachment encoded filenames (for non-ASCII characters etc) in MIME headers & corresponding Dovecot behaviour

Fri Sep 30 02:30:55 EEST 2011

(Subject line altered - original was confused)

On Friday 30 September 2011 00:07:08 Michael M Slusarz wrote:
> Quoting Andrew Richards <ar-dovecotlist at acrconsulting.co.uk>:
> > Hi,
> >
> > I've noticed a possible minor issue with long encoded filenames for
> > attachments where these filenames are split across multiple lines. My
> > understanding of character encoding and MIME is not as good as it should
> > be, so I may easily have got this all mixed up, in which case sorry for
> > the noise...
> >
> > I understand that the preferred method for handling filenames that are
> > split across multiple lines (because they're too long to fit on one line
> > in the message) is the one described in RFC 2184/2231, for example:
> >             filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6
> >             filename*1*=etc%2Epdf
> >
> > However, I find that some mail clients do this instead:
> >         filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?=
> >         =?ISO-8859-1?Q?etc=2Epdf?="
> >
> > In Dovecot this results in,
> > 0 fetch 25 body
> > * 25 FETCH (BODY (("text" "plain" ("charset" "ISO-8859-1") NIL NIL "7bit"
> > 239 8)("application" "pdf" ("name"
> > "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?=
> > =?ISO-8859-1?Q?etc=2Epdf?=") NIL NIL "base64" 219130) "mixed"))
> >
> > Note especially the unwanted space, or rather the sequence ?= =?, between
> > the two sections of the filename. I think a possible tweak for Dovecot
> > would be to combine the filename parts in this situation to remove the ?= =?.

Correcting myself: the tweak would remove ?= =?ISO-8859-1?Q? (not just ?= =?),
producing this string for the example above:
"=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6etc=2Epdf?="

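A minimal sketch of the merge described above, written in Python purely for illustration. The helper name merge_adjacent_q_words is hypothetical (it is not Dovecot code, which is written in C). It assumes both encoded-words use Q encoding and the same charset: Q-encoded text can simply be concatenated, whereas two B (base64) words would have to be decoded and re-encoded.

```python
import re

# Two adjacent Q-encoded words with the same charset, separated only by
# whitespace (a space, or CRLF plus indentation left over from folding).
ADJACENT_Q_WORDS = re.compile(
    r"=\?(?P<cs>[^?]+)\?Q\?(?P<first>[^?]*)\?="
    r"\s+"
    r"=\?(?P=cs)\?Q\?(?P<second>[^?]*)\?=",
    re.IGNORECASE,
)


def merge_adjacent_q_words(value: str) -> str:
    """Hypothetical helper: fold runs of same-charset Q words into one word."""
    while True:
        # Merge the leftmost pair, then look again so runs of 3+ words collapse.
        merged = ADJACENT_Q_WORDS.sub(
            lambda m: f"=?{m['cs']}?Q?{m['first']}{m['second']}?=",
            value,
            count=1,
        )
        if merged == value:  # nothing left to merge
            return merged
        value = merged


raw = ("=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= "
       "=?ISO-8859-1?Q?etc=2Epdf?=")
print(merge_adjacent_q_words(raw))
# =?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6etc=2Epdf?=
```

One caveat: RFC 2047 limits a single encoded-word to 75 characters, so merging long names this way can produce an over-long encoded-word. That is one more reason a server might prefer to leave the text untouched.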
> > I'm not sure
> > if an IMAP client should know to combine the parts in their current
> > format. FWIW I see that Courier does the same as Dovecot in this
> > situation.
> 
> Dovecot's behavior is correct.  There's nothing "special" about that
> name parameter - it's not RFC 2231 encoded - so the IMAP server should
> output the exact header text as-is.  Those two parts were separated by a
> space in the original header, so they should be separated by a space in
> the FETCH data.
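Michael's position can be checked from the client side. RFC 2047 (section 6.2) says that whitespace separating two adjacent encoded-words is ignored when the text is displayed, so a client that runs the raw "name" value through an ordinary RFC 2047 decoder already gets the joined filename. Strictly, RFC 2047 (section 5) forbids encoded-words in Content-Type and Content-Disposition parameters, so decoding them there is a client-side leniency. A minimal sketch using Python's standard email.header module; the wrapper name decode_name_param is hypothetical.

```python
from email.header import decode_header, make_header


def decode_name_param(raw: str) -> str:
    """Hypothetical helper: decode RFC 2047 encoded-words in a raw parameter.

    Client-side leniency: RFC 2047 does not allow encoded-words inside MIME
    parameter values, but many clients decode them anyway.
    """
    # decode_header() drops whitespace that sits between two encoded-words
    # and joins consecutive chunks that share a charset; make_header() turns
    # the (bytes, charset) chunks back into a Unicode string.
    return str(make_header(decode_header(raw)))


raw = ("=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= "
       "=?ISO-8859-1?Q?etc=2Epdf?=")
print(decode_name_param(raw))
```

Because Q encoding uses "_" for a space, the decoded name comes out as "accented characters here êÊæetc.pdf", with no stray ?= =? and no space before "etc".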

I can accept that Dovecot's behaviour is technically correct, but my point is 
that (if I've understood correctly) some large mailers like Gmail don't follow 
RFC 2231 here, so is it worth adapting Dovecot to play nicely with them? 
Possibly I'm conflating two separate issues: munging together non-RFC 2231 
attachment filename parts, and large mailers not using RFC 2231 to handle long 
non-ASCII filenames.

> If the *client* wants to work around these broken messages, it can do
> whatever munging it wants to translate the contents of the "name"
> parameter.  But that should be up to the client.  An IMAP server
> should not be making wild assumptions about what the original sender
> wanted to do with the message vs. what it actually sent.
> 
> FYI: A workaround is to do something like this when sending a message:
>
> Content-Disposition: attachment;
>   filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?=
>   =?ISO-8859-1?Q?etc=2Epdf?=";
>   filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6;
>   filename*1*=etc%2Epdf
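The filename*N* half of that workaround can be produced and read back mechanically. Below is a minimal Python sketch under stated assumptions: the helper names rfc2231_segments and join_rfc2231 are hypothetical, every generated segment uses the percent-encoded name*N*= form (RFC 2231 only percent-decodes segments whose parameter name ends in an asterisk; a segment written as name*N= is taken literally), and a segment is never cut inside a %XX escape. Building the quoted RFC 2047 filename= fallback is left out.

```python
from __future__ import annotations

import re
from urllib.parse import quote, unquote_to_bytes


def rfc2231_segments(name: str, value: str, charset: str = "iso-8859-1",
                     max_len: int = 30) -> list[str]:
    """Hypothetical helper: split value into name*0*=, name*1*=, ... segments."""
    if max_len < 3:
        raise ValueError("max_len must leave room for a %XX escape")
    encoded = quote(value, safe="", encoding=charset)
    chunks, start = [], 0
    while start < len(encoded):
        end = min(start + max_len, len(encoded))
        # Back up if the cut would land inside a %XX escape.
        pct = encoded.rfind("%", start, end)
        if pct != -1 and pct + 3 > end:
            end = pct
        chunks.append(encoded[start:end])
        start = end
    # Only the first segment carries the charset'language' prefix.
    return [f"{name}*{n}*={charset}''{c}" if n == 0 else f"{name}*{n}*={c}"
            for n, c in enumerate(chunks)]


def join_rfc2231(params: dict[str, str], name: str) -> str:
    """Hypothetical helper: reassemble name*0*, name*1, ... into one string."""
    pattern = re.compile(rf"{re.escape(name)}\*(\d+)(\*?)", re.IGNORECASE)
    pieces = []
    for key, val in params.items():
        m = pattern.fullmatch(key)
        if m:
            pieces.append((int(m.group(1)), m.group(2) == "*", val))
    pieces.sort()  # segment order comes from N, not from header order
    charset, raw = "us-ascii", b""
    for n, is_encoded, val in pieces:
        if n == 0 and is_encoded:
            # First encoded segment looks like charset'language'text.
            charset, _language, val = val.split("'", 2)
        # Segments without a trailing * are literal, not percent-decoded.
        raw += unquote_to_bytes(val) if is_encoded else val.encode("ascii")
    return raw.decode(charset or "us-ascii")


segments = rfc2231_segments("filename", "accented_characters_here_\u00ea\u00ca\u00e6etc.pdf")
for s in segments:
    print(s)
print(join_rfc2231(dict(s.split("=", 1) for s in segments), "filename"))
```

With the default max_len of 30 the split falls just before %CA, giving filename*0*=iso-8859-1''accented_characters_here_%EA and filename*1*=%CA%E6etc.pdf, which join_rfc2231 turns back into the original name.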

Sure: I accept that that's the preferred way to handle long filenames that need 
to be encoded - but I'm noting that there are badly-behaved large mailers that 
don't do so, so I wonder if it's worth Dovecot mitigating the effects.

Best regards,

Andrew.