Re: [Dovecot] Long attachment encoded filenames (for non-ASCII characters etc) in MIME headers & corresponding Dovecot behaviour

30 Sep 2011

      (Subject line altered - original was confused)
On Friday 30 September 2011 00:07:08 Michael M Slusarz wrote:
...
Quoting Andrew Richards <ar-dovecotlist@acrconsulting.co.uk>:
...
Hi,
I've noticed a possible minor issue with long encoded filenames for
attachments where these filenames are split across multiple lines. My
understanding of character encoding and MIME is not as good as it should
be, so I may easily have got this all mixed up, in which case sorry for
the noise...
Although I understand the preferred method for handling filenames
split across multiple lines (because they're too long to fit on one line
in the message) is that suggested in RFC2184/2231, so for example,
filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6
filename*1*=etc%2Epdf
I find that some mail clients do this instead,
filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?=
=?ISO-8859-1?Q?etc=2Epdf?="
In Dovecot this results in,
0 fetch 25 body

25 FETCH (BODY (("text" "plain" ("charset" "ISO-8859-1") NIL NIL "7bit"
239 8)("application" "pdf" ("name"
"=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?=
=?ISO-8859-1?Q?etc=2Epdf?=") NIL NIL "base64" 219130) "mixed"))

esp. note the unwanted space - or in fact the sequence ?= =? between the
two sections of the filename. I think a possible tweak for Dovecot would
be to combine the filename parts in this situation to remove the ?= =?.
Correcting myself: ...remove the ?= =?ISO-8859-1?Q? (not just ?= =?) to
generate the string in this example,
"=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6etc=2Epdf?="
...
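For illustration, the merge suggested above could be sketched in Python roughly as follows. This is only a sketch of the idea, not Dovecot's code: the function name `merge_adjacent_q_words` and the regular expression are assumptions made for this example. It only joins Q-encoded words that declare the same charset, and it does not check the 75-character limit that RFC 2047 places on a single encoded-word.

```python
import re

# One RFC 2047 encoded-word: =?charset?encoding?payload?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def merge_adjacent_q_words(value):
    """Join neighbouring Q-encoded words that share a charset (hypothetical helper)."""
    pieces = []  # plain strings, or [charset, encoding, payload] for Q words
    pos = 0
    for match in ENCODED_WORD.finditer(value):
        gap = value[pos:match.start()]
        charset, encoding, payload = match.groups()
        prev = pieces[-1] if pieces else None
        mergeable = (
            encoding in "Qq"
            and isinstance(prev, list)
            and prev[0].lower() == charset.lower()
            and gap != ""
            and gap.isspace()  # only whitespace between the two words
        )
        if mergeable:
            # Drop "?= =?charset?Q?" by appending to the previous payload.
            prev[2] += payload
        else:
            if gap:
                pieces.append(gap)
            if encoding in "Qq":
                pieces.append([charset, encoding, payload])
            else:
                # Leave B-encoded words untouched.
                pieces.append(match.group(0))
        pos = match.end()
    pieces.append(value[pos:])
    return "".join(
        p if isinstance(p, str) else "=?%s?%s?%s?=" % tuple(p) for p in pieces
    )


split_name = (
    "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= "
    "=?ISO-8859-1?Q?etc=2Epdf?="
)
merged_name = merge_adjacent_q_words(split_name)
print(merged_name)
```

The sketch leaves base64 ("B") words alone on purpose, since joining two base64 payloads is only safe when the first one ends without padding.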
...
I'm not sure
if an IMAP client should know to combine the parts in their current
format. FWIW I see that Courier does the same as Dovecot in this
situation.
Dovecot's behavior is correct.  There's nothing "special" about that
name parameter - it's not RFC 2231 encoded - so the IMAP server should
output the exact header text as-is.  Those two parts were separated by
space in the original header - they should be separated by space when
grabbing the fetch data.
I can accept that Dovecot's behaviour is technically correct, but my point is
that (if I've understood correctly) with some large mailers like Gmail acting
in a non-RFC2231 manner, is it worth adapting Dovecot to play nicely with
them? Possibly I'm conflating two separate issues: munging together non-RFC2231
attachment filename parts, and large mailers not using RFC2231 to handle long
non-ASCII filenames.
...
If the *client* wants to work around these broken messages, it can do
whatever munging it wants to translate the contents of the "name"
parameter.  But that should be up to the client.  An IMAP server
should not be making wild assumptions about what the original sender
wanted to do with the message vs. what it actually sent.
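As a rough illustration of the client-side handling described above, a client written in Python could pass the raw "name" value it gets from FETCH through the standard library's RFC 2047 decoder. This is a sketch under assumptions: `decode_name_parameter` is a hypothetical helper name, and the input is the example value from this thread. RFC 2047 says whitespace between adjacent encoded-words is ignored on display, and `email.header.decode_header` follows that rule, so the two parts come out joined.

```python
from email.header import decode_header, make_header


def decode_name_parameter(raw):
    """Decode RFC 2047 encoded-words in a raw parameter value (hypothetical helper)."""
    # decode_header() drops whitespace that separates two encoded-words;
    # make_header() turns the decoded chunks back into one string.
    return str(make_header(decode_header(raw)))


# The "name" value exactly as the server returned it, space included.
raw_name = (
    "=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?= "
    "=?ISO-8859-1?Q?etc=2Epdf?="
)
decoded_name = decode_name_parameter(raw_name)
print(decoded_name)
```

Note that Q encoding uses "_" to stand for a space, so the decoded name contains spaces where the encoded form shows underscores.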
FYI: A workaround is to do something like this when sending a message:
Content-Disposition: attachment;
filename="=?ISO-8859-1?Q?accented_characters_here_=EA=CA=E6?=
=?ISO-8859-1?Q?etc=2Epdf?=";
filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6;
filename*1*=etc%2Epdf
Sure: I accept that that's the preferred way to handle long filenames that need
to be encoded - but I'm noting that there are badly-behaved large mailers that
don't do so, so I wonder if it's worth Dovecot mitigating the effects.
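For comparison, here is a hedged sketch of how a receiver might reassemble the RFC 2231 form using Python's standard `email` package. `attachment_filename` is a hypothetical helper, and the header values are the examples from this thread with only the extended parameters kept. One detail worth noting: RFC 2231 percent-decodes only those segments whose parameter name ends in `*`, so the first call writes the second segment as `filename*1*`, while the second call shows what happens when that trailing asterisk is missing.

```python
from email import message_from_string


def attachment_filename(disposition_value):
    """Return the decoded filename from a Content-Disposition value (hypothetical helper)."""
    # Wrap the value in a minimal message so the stdlib parser can be reused.
    msg = message_from_string("Content-Disposition: %s\n\n" % disposition_value)
    # get_filename() joins the numbered segments and applies the charset.
    return msg.get_filename()


# Both segments flagged as percent-encoded (trailing "*").
both_encoded = attachment_filename(
    "attachment;\n"
    " filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6;\n"
    " filename*1*=etc%2Epdf"
)

# Second segment not flagged, so "%2E" is kept as literal text.
second_literal = attachment_filename(
    "attachment;\n"
    " filename*0*=iso-8859-1''accented_characters_here_%EA%CA%E6;\n"
    " filename*1=etc%2Epdf"
)

print(both_encoded)
print(second_literal)
```

Unlike the Q-encoded form, underscores here are literal characters, so they remain underscores in the decoded filename.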
Best regards,
Andrew.