[Dovecot] Dovecot failing to parse some UTF-8 encoded attachment filenames, returning empty string instead
Hi,
I'm seeing a strange problem with some attachment filenames that are UTF-8 encoded. The problem seems to be related to spaces and/or unusual characters in filenames, like accented characters (or perhaps just to filenames if UTF-8 encoded; I've not explored that fully). These filenames are shown as empty strings in IMAP using Dovecot. I've attached a sample message that exhibits this problem, trimmed down to fairly bare essentials. By comparison I find that (for example) Courier happily returns the filename (still encoded). Although I suspect the problem lies within Dovecot, it may be an underlying Unicode or other component that's at the root of the problem.
I can replicate this by putting the attached message in a mailbox (I'm using Maildir format mailboxes, so I just drop the raw file in Maildir/new and change the ownership of the file to match the mailbox owner). A simulated IMAP session then shows the problem:
$ telnet localhost 143
Trying ::1...
Connected to localhost.
Escape character is '^]'.
* OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE AUTH=PLAIN] Dovecot ready.
0 login some.one@test.domain password
0 OK [CAPABILITY IMAP4rev1 LITERAL+ SASL-IR LOGIN-REFERRALS ID ENABLE IDLE SORT SORT=DISPLAY THREAD=REFERENCES THREAD=REFS MULTIAPPEND UNSELECT CHILDREN NAMESPACE UIDPLUS LIST-EXTENDED I18NLEVEL=1 CONDSTORE QRESYNC ESEARCH ESORT SEARCHRES WITHIN CONTEXT=SEARCH LIST-STATUS] Logged in
0 select inbox
* FLAGS (\Answered \Flagged \Deleted \Seen \Draft)
* OK [PERMANENTFLAGS (\Answered \Flagged \Deleted \Seen \Draft \*)] Flags permitted.
* 4 EXISTS
* 0 RECENT
* OK [UNSEEN 1] First unseen.
* OK [UIDVALIDITY 1316621730] UIDs valid
* OK [UIDNEXT 8] Predicted next UID
* OK [HIGHESTMODSEQ 1] Highest
0 OK [READ-WRITE] Select completed.
0 fetch 4 body
* 4 FETCH (BODY (("text" "html" ("charset" "iso-8859-15") NIL NIL "base64" 278 5)("application" "octet-stream" ("name" "") NIL NIL "base64" 18) "mixed"))
0 OK Fetch completed.
0 logout
* BYE Logging out
0 OK Logout completed.
Connection closed by foreign host.
$
Note especially the ("name" "") part, which shows a supposedly empty filename.
I've observed this behaviour on the following versions of Dovecot,
- 1.2.9 on Ubuntu 10.04LTS (pre-compiled version)
- 1.2.17 on Fedora 13 (pre-compiled version)
- 2.0.15 on Fedora 13 (from source)
I don't think the Dovecot configuration is relevant, but I've put it below for good measure for the 2.0.15 setup.
Any ideas on what might be causing this?
Best regards,
Andrew.
# dovecot -n
# 2.0.15: /usr/local/etc/dovecot/dovecot.conf
# OS: Linux 2.6.34.9-69.fc13.i686.PAE i686 Fedora release 13 (Goddard)
auth_debug = yes
default_login_user = nobody
log_path = /var/log/dovecot.log
passdb {
  args = /usr/local/bin/checkcdb
  driver = checkpassword
}
protocols = imap pop3
service auth {
  user = root
}
service imap-login {
  inet_listener imap {
    ssl = no
  }
}
service pop3-login {
  inet_listener pop3 {
    ssl = no
  }
}
ssl = no
userdb {
  driver = prefetch
}
On 22.9.2011, at 1.59, Andrew Richards wrote:
I'm seeing a strange problem with some attachment filenames that are UTF-8 encoded. The problem seems to be related to spaces and/or unusual characters in filenames, like accented characters (or perhaps just to filenames if UTF-8 encoded; I've not explored that fully).
The problem is that the client sends it wrong:
Content-Type: application/octet-stream; name==?UTF-8?B?dGhpc19mYWlscy50eHQ=?=
Content-Disposition: attachment; filename==?UTF-8?B?dGhpc19mYWlscy50eHQ=?=
These are both wrong. First of all they are illegal because they have = and ? characters, from RFC 2045:
parameter := attribute "=" value
value := token / quoted-string
token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
tspecials := "(" / ")" / "<" / ">" / "@" /
             "," / ";" / ":" / "\" / <">
             "/" / "[" / "]" / "?" / "="
             ; Must be in quoted-string,
             ; to use within parameter values
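To make the grammar concrete, here is a minimal Python sketch of the token rule quoted above. It is not Dovecot's parser; the helper names `is_token` and `strict_token_prefix` are invented for illustration. It only shows how a parser that reads an unquoted value strictly as a token could plausibly end up with an empty string for the header in question, because the value starts with a tspecial.

```python
# tspecials from RFC 2045: none of these may appear in an unquoted token.
TSPECIALS = '()<>@,;:\\"/[]?='


def is_token_char(ch: str) -> bool:
    """True for printable US-ASCII that is not SPACE, a CTL, or a tspecial."""
    return 0x20 < ord(ch) < 0x7F and ch not in TSPECIALS


def is_token(value: str) -> bool:
    """True if the whole value is a valid RFC 2045 token (one or more chars)."""
    return bool(value) and all(is_token_char(ch) for ch in value)


def strict_token_prefix(raw: str) -> str:
    """Hypothetical strict reader: keep characters up to the first non-token char."""
    out = []
    for ch in raw:
        if not is_token_char(ch):
            break
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    bad = "=?UTF-8?B?dGhpc19mYWlscy50eHQ=?="
    print(is_token("this_fails.txt"))       # a plain filename is a legal token
    print(is_token(bad))                    # contains '=' and '?', so not a token
    print(repr(strict_token_prefix(bad)))   # nothing survives a strict read
```

Under this strict reading the value stops at the very first character, which is consistent with the ("name" "") seen in the FETCH response, although the real cause inside Dovecot may differ in detail.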
Also from RFC 2047 (encoded-word is the =?UTF-8?...?= thing):
- An 'encoded-word' MUST NOT be used in parameter of a MIME Content-Type or Content-Disposition field, or in any structured field body except within a 'comment' or 'phrase'.
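For reference, the anatomy of an encoded-word can be sketched in a few lines of Python. The function name `decode_encoded_word` is made up for this example, and the code handles only a single well-formed encoded-word; it is meant to show what the sending client was trying to say, not how any particular server treats it.

```python
import base64
import quopri


def decode_encoded_word(word: str) -> str:
    """Decode one RFC 2047 encoded-word of the form =?charset?encoding?text?=."""
    if not (word.startswith("=?") and word.endswith("?=")):
        raise ValueError("not an encoded-word")
    # Strip the =? and ?= delimiters, then split into the three fields.
    charset, encoding, payload = word[2:-2].split("?", 2)
    if encoding.upper() == "B":
        raw = base64.b64decode(payload)
    elif encoding.upper() == "Q":
        # header=True makes '_' decode to a space, as RFC 2047 requires.
        raw = quopri.decodestring(payload.encode("ascii"), header=True)
    else:
        raise ValueError("unknown encoding: " + encoding)
    return raw.decode(charset)


if __name__ == "__main__":
    print(decode_encoded_word("=?UTF-8?B?dGhpc19mYWlscy50eHQ=?="))
```

Applied to the value from the sample message, this yields the filename the client intended, this_fails.txt.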
The proper way to do this would be to use RFC 2184, which looks something like this:
Content-Disposition: attachment; filename*=iso-8859-1''p%E4%E4
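As a rough illustration of that extended parameter syntax (RFC 2184 has since been obsoleted by RFC 2231, which keeps the same charset'language'percent-encoded-text form), the value can be built and taken apart with the Python standard library. The helper names `encode_ext_value` and `decode_ext_value` are assumptions for this sketch, and parameter continuations (`filename*0*=` and so on) are not handled.

```python
from urllib.parse import quote, unquote


def encode_ext_value(text: str, charset: str = "utf-8", language: str = "") -> str:
    """Build the right-hand side of a filename*= parameter."""
    # safe='' percent-encodes everything outside the unreserved set.
    return f"{charset}'{language}'{quote(text, safe='', encoding=charset)}"


def decode_ext_value(value: str) -> str:
    """Split charset'language'text and undo the percent-encoding."""
    charset, _language, encoded = value.split("'", 2)
    return unquote(encoded, encoding=charset)


if __name__ == "__main__":
    value = encode_ext_value("p\u00e4\u00e4", charset="iso-8859-1")
    print("Content-Disposition: attachment; filename*=" + value)
    print(decode_ext_value(value))
```

With the iso-8859-1 charset this reproduces the example value above, iso-8859-1''p%E4%E4.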
Looks like Apple Mail also sends:
Content-Type: application/octet-stream; name="=?iso-8859-1?Q?p=E4=E4?="
That is inside a quoted-string, so it's not broken, but clients aren't really supposed to decode that string in there either.
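A client that chooses to be lenient about the quoted form can decode it explicitly. The sketch below uses Python's standard email package with its default (compat32) policy, where `get_param` returns the parameter text undecoded and the RFC 2047 decoding is a separate, deliberate step. The function name `raw_and_decoded_name` is hypothetical.

```python
import email
from email.header import decode_header, make_header

# Header taken from the Apple Mail example above; the blank line ends the headers.
RAW = 'Content-Type: application/octet-stream; name="=?iso-8859-1?Q?p=E4=E4?="\n\n'


def raw_and_decoded_name(raw_message: str):
    """Return the name parameter as sent, and after lenient RFC 2047 decoding."""
    msg = email.message_from_string(raw_message)
    raw_name = msg.get_param("name")  # quoted-string contents, still encoded
    decoded = str(make_header(decode_header(raw_name)))
    return raw_name, decoded


if __name__ == "__main__":
    raw_name, decoded = raw_and_decoded_name(RAW)
    print(raw_name)
    print(decoded)
```

Keeping the two steps separate matches the point made above: the quoted string itself is legal, and whether to interpret its contents as an encoded-word is a choice the client makes.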
Anyway .. I'll check tomorrow if I can easily add code to workaround your problem. If it's just a minor change I'll do it.
On Thu, 2011-09-22 at 02:45 +0300, Timo Sirainen wrote:
Anyway .. I'll check tomorrow if I can easily add code to workaround your problem. If it's just a minor change I'll do it.
On Thursday 22 September 2011 00:45:32 Timo Sirainen wrote:
On 22.9.2011, at 1.59, Andrew Richards wrote:
I'm seeing a strange problem with some attachment filenames that are UTF-8 encoded. The problem seems to be related to spaces and/or unusual characters in filenames, like accented characters (or perhaps just to filenames if UTF-8 encoded; I've not explored that fully).
The problem is that the client sends it wrong:
Content-Type: application/octet-stream; name==?UTF-8?B?dGhpc19mYWlscy50eHQ=?=
Content-Disposition: attachment; filename==?UTF-8?B?dGhpc19mYWlscy50eHQ=?=
These are both wrong. First of all they are illegal because they have = and ? characters, from RFC 2045:
...snip...
Also from RFC 2047 (encoded-word is the =?UTF-8?...?= thing):
- An 'encoded-word' MUST NOT be used in parameter of a MIME Content-Type or Content-Disposition field, or in any structured field body except within a 'comment' or 'phrase'.
...snip...
Anyway .. I'll check tomorrow if I can easily add code to workaround your problem. If it's just a minor change I'll do it.
Wow - a very thorough response only 45 minutes after I'd posted the question, and in your follow up message you've already provided a suggested fix - a huge thank you!
So in summary it's a "Garbage in -> Garbage out" issue... This also explains why I'm only getting this issue with one client after a Courier->Dovecot migration. I'll research which mail client program(s) are generating the faulty messages for completeness for this thread.
I expect to test the fix later today or tomorrow, I'll update the thread accordingly when I've done so.
Best regards,
Andrew.
On Thursday 22 September 2011 12:31:40 Andrew Richards wrote:
Wow - a very thorough response only 45 minutes after I'd posted the question, and in your follow up message you've already provided a suggested fix - a huge thank you!
So in summary it's a "Garbage in -> Garbage out" issue... This also explains why I'm only getting this issue with one client after a Courier->Dovecot migration. I'll research which mail client program(s) are generating the faulty messages for completeness for this thread.
I expect to test the fix later today or tomorrow, I'll update the thread accordingly when I've done so.
Firstly, my apologies - it's been a week before I've got back to this - I was making sense of another MIME issue which I thought might be related (it isn't), but I'll start a new thread for that.
The fix works just fine. However...
...regarding the broken MIME fields: These look to occur in the form I noted for a single client program only, which on further investigation turns out to be an in-house[-written] mail program of my client, and therefore this problem is unlikely to bite other people: I've had permission to search the client's mailboxes for similar non-conforming emails and they only occur for this one in-house mail program.
Summary: False alarm. However, once again a huge thank you to Timo for the patch to work around this broken data.
Best regards,
Andrew.