[Dovecot] Possible header parsing problem
Hi,
I ran into a problem wherein my mail client (RoundCube) would not display a message from a Dovecot IMAP server (claiming that the message had no content). The raw source of the message looked fine, but the body structure returned by Dovecot only had the first text/plain part and not the alternative text/html part. The message looks like:
... headers removed ...
X-Mailer: Lotus Notes Release 6.5.1 January 21, 2004
Message-ID: <...>
From: user@host.domain
Date: Mon, 20 Oct 2008 14:15:55 -0600
Content-Type: multipart/alternative; boundary="=_alternative
 006F3A73872574E8_="

This is a multipart message in MIME format.
--=_alternative 006F3A73872574E8_=
Content-Transfer-Encoding: 7bit
Content-Type: text/plain;
 charset=us-ascii

blah blah blah
--=_alternative 006F3A73872574E8_=
Content-Transfer-Encoding: 7bit
Content-Type: text/html;
 charset=us-ascii

<br><font size=2 face="sans-serif">blah blah blah in HTML</font>
--=_alternative 006F3A73872574E8_=--
I did a little bit of tracing through the parsing code (message-header-parser.c:message_parse_header_next()) and it appeared that the boundary in the Content-Type header was not parsed correctly, evidently because the header line was folded in the middle of the boundary string. RFC 822 appears to allow folding in a quoted string like this (§3.3 "quoted-string"), so I'm curious whether the parsing is working correctly.
Thanks for your help!
Here is my Dovecot information:
version: 1.1.4
"dovecot -n" output:
# 1.1.4: /usr/local/etc/dovecot.conf
Warning: fd limit 256 is lower than what Dovecot can use under full load (more than 384). Either grow the limit or change login_max_processes_count and max_mail_processes settings
base_dir: /var/dovecot/
info_log_path: /var/log/dovecot.log
listen: *, [::]
ssl_cert_file: /System/Library/OpenSSL/certs/imapd.pem
ssl_key_file: /System/Library/OpenSSL/certs/privkey.out
login_dir: /var/dovecot/login
login_executable: /usr/local/libexec/dovecot/imap-login
max_mail_processes: 256
mail_location: maildir:%h/Maildir
namespace:
  type: private
  separator: /
  inbox: yes
  list: yes
  subscriptions: yes
namespace:
  type: shared
  separator: /
  prefix: Shared/
  location: maildir:/Users/Shared/Maildir
  list: yes
  subscriptions: yes
auth default:
  passdb:
    driver: pam
    args: imap
  userdb:
    driver: passwd
-- Eric Stadtherr estadtherr@gmail.com
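The symptom described in this report can be illustrated with a minimal Python sketch. It is not Dovecot's code; the function names (split_header_fields, boundary_of, count_parts) are hypothetical, and the sample message is a shortened copy of the one above, assuming CRLF line endings and a single space at the start of each continuation line. It shows that the boundary can only be recovered once continuation lines are joined to the field they belong to, and that the body parts can only be found with the complete boundary.

```python
import re

# Shortened copy of the reported message (CRLF line endings assumed).
RAW_MESSAGE = (
    'Content-Type: multipart/alternative; boundary="=_alternative\r\n'
    ' 006F3A73872574E8_="\r\n'
    "\r\n"
    "This is a multipart message in MIME format.\r\n"
    "--=_alternative 006F3A73872574E8_=\r\n"
    "Content-Type: text/plain;\r\n"
    " charset=us-ascii\r\n"
    "\r\n"
    "blah blah blah\r\n"
    "--=_alternative 006F3A73872574E8_=\r\n"
    "Content-Type: text/html;\r\n"
    " charset=us-ascii\r\n"
    "\r\n"
    "<br>blah blah blah in HTML\r\n"
    "--=_alternative 006F3A73872574E8_=--\r\n"
)


def split_header_fields(header_block):
    """Group physical lines into logical header fields.

    A line that starts with SP or TAB continues the previous field.
    The line break is dropped and the leading whitespace is kept.
    """
    fields = []
    for line in header_block.split("\r\n"):
        if fields and line[:1] in (" ", "\t"):
            fields[-1] += line
        else:
            fields.append(line)
    return fields


def boundary_of(field):
    """Return the quoted boundary parameter of a Content-Type field, or None."""
    match = re.search(r'boundary="([^"]*)"', field, re.IGNORECASE)
    return match.group(1) if match else None


def count_parts(body, boundary):
    """Count the delimiter lines that open a body part (not the closing one)."""
    delimiter = "--" + boundary
    return sum(1 for line in body.split("\r\n") if line == delimiter)


header_block, body = RAW_MESSAGE.split("\r\n\r\n", 1)

# Looking only at the first physical line: the closing quote is missing.
first_line_only = boundary_of(header_block.split("\r\n")[0])

# Joining the continuation line first yields the complete boundary.
boundary = boundary_of(split_header_fields(header_block)[0])

print(first_line_only)
print(boundary)
print(count_parts(body, boundary))
```

With the complete boundary both the text/plain and the text/html part are found; a parser that works from the truncated first line has no usable delimiter for the alternative part.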
On Wed, 2008-10-22 at 20:59 -0600, Eric Stadtherr wrote:
> Content-Type: multipart/alternative; boundary="=_alternative
>  006F3A73872574E8_="
Is there one space, two spaces or a TAB at the beginning of the second line?
> I did a little bit of tracing through the parsing code (message-header-parser.c:message_parse_header_next()) and it appeared that the boundary in the Content-Type header was not parsed correctly, evidently because the header line was folded in the middle of the boundary string. RFC 822 appears to allow folding in a quoted string like this (§3.3 "quoted-string"), so I'm curious whether the parsing is working correctly.
Fixed: http://hg.dovecot.org/dovecot-1.1/rev/25b0cf7c62d3
But I'm not sure if I should convert the following TAB to a space. UW-IMAP seems to do that, but RFC just says that the CRLF should be dropped.
On Thu, 23 Oct 2008 19:06:19 +0300, Timo Sirainen <tss@iki.fi> wrote:
> On Wed, 2008-10-22 at 20:59 -0600, Eric Stadtherr wrote:
>> Content-Type: multipart/alternative; boundary="=_alternative
>>  006F3A73872574E8_="
> Is there one space, two spaces or a TAB at the beginning of the second line?
There is one space at the beginning of the continuation line. The parsed full_value basically looks like: [multipart/alternative; boundary="=_alternative\n 006F3A73872574E8_="]
>> I did a little bit of tracing through the parsing code (message-header-parser.c:message_parse_header_next()) and it appeared that the boundary in the Content-Type header was not parsed correctly, evidently because the header line was folded in the middle of the boundary string. RFC 822 appears to allow folding in a quoted string like this (§3.3 "quoted-string"), so I'm curious whether the parsing is working correctly.
> Fixed: http://hg.dovecot.org/dovecot-1.1/rev/25b0cf7c62d3
> But I'm not sure if I should convert the following TAB to a space. UW-IMAP seems to do that, but RFC just says that the CRLF should be dropped.
I always prefer strict adherence to the RFC, which says:
    The process of moving from this folded multiple-line
    representation of a header field to its single line
    representation is called "unfolding". Unfolding is accomplished by
    regarding CRLF immediately followed by a LWSP-char as
    equivalent to the LWSP-char.
So, what you did looks good!
-- Eric Stadtherr estadtherr@gmail.com
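The two unfolding behaviours under discussion can be compared in a short Python sketch. This is an illustration only, not the code from the Dovecot changeset or from UW-IMAP; the function names are hypothetical and the input is assumed to use CRLF line endings. The strict variant follows the RFC 822 wording quoted above (the CRLF is dropped, the LWSP-char stays as it is), while the second variant additionally turns the whitespace character after the CRLF into a space.

```python
import re


def unfold_strict(value):
    """Drop each CRLF that is immediately followed by SP or TAB.

    The whitespace character itself is kept unchanged.
    """
    return re.sub(r"\r\n(?=[ \t])", "", value)


def unfold_tab_to_space(value):
    """Variant: replace CRLF plus the following SP or TAB with one space."""
    return re.sub(r"\r\n[ \t]", " ", value)


# Folded with a single space, as in the reported Content-Type header.
folded_space = 'multipart/alternative; boundary="=_alternative\r\n 006F3A73872574E8_="'
# Folded with a TAB, the case where the two variants differ.
folded_tab = "text/plain;\r\n\tcharset=us-ascii"

print(repr(unfold_strict(folded_space)))
print(repr(unfold_strict(folded_tab)))
print(repr(unfold_tab_to_space(folded_tab)))
```

For a header folded with a space the two variants give the same result, so the reported boundary is recovered either way. They differ only when the continuation line starts with a TAB, which is the case the question about UW-IMAP is about.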
Timo Sirainen wrote:
> But I'm not sure if I should convert the following TAB to a space. UW-IMAP seems to do that, but RFC just says that the CRLF should be dropped.
As pointed out in https://bugzilla.mozilla.org/show_bug.cgi?id=240924#c7, this could lead to strange behaviour. So I'd vote for replacing the folding tab to a space.
On Oct 24, 2008, at 12:35 PM, Jakob Hirsch wrote:
> Timo Sirainen wrote:
>> But I'm not sure if I should convert the following TAB to a space. UW-IMAP seems to do that, but RFC just says that the CRLF should be dropped.
> As pointed out in https://bugzilla.mozilla.org/show_bug.cgi?id=240924#c7, this could lead to strange behaviour. So I'd vote for replacing the folding tab to a space.
Actually Dovecot already replaces all tabs to spaces when sending ENVELOPE, BODY and BODYSTRUCTURE replies. The only issue here is about the internal parsing where I think it's better to be strict.
Timo Sirainen wrote:
>> lead to strange behaviour. So I'd vote for replacing the folding tab to a space.
> Actually Dovecot already replaces all tabs to spaces when sending ENVELOPE, BODY and BODYSTRUCTURE replies. The only issue here is about the internal parsing where I think it's better to be strict.
Oh, ok, then I got that wrong.
I only wonder why I still see tabs in the Subject field in TB's message list (and in other lines in message source). Using v1.2.alpha3.
On Fri, 2008-10-24 at 14:37 +0200, Jakob Hirsch wrote:
> Timo Sirainen wrote:
>>> lead to strange behaviour. So I'd vote for replacing the folding tab to a space.
>> Actually Dovecot already replaces all tabs to spaces when sending ENVELOPE, BODY and BODYSTRUCTURE replies. The only issue here is about the internal parsing where I think it's better to be strict.
> Oh, ok, then I got that wrong.
> I only wonder why I still see tabs in the Subject field in TB's message list (and in other lines in message source). Using v1.2.alpha3.
Because TB most likely doesn't use ENVELOPE but parses the headers itself.
On Thu, 23 Oct 2008 19:06:19 +0300, Timo Sirainen <tss@iki.fi> wrote:
> On Wed, 2008-10-22 at 20:59 -0600, Eric Stadtherr wrote:
>> Content-Type: multipart/alternative; boundary="=_alternative
>>  006F3A73872574E8_="
> Is there one space, two spaces or a TAB at the beginning of the second line?
>> I did a little bit of tracing through the parsing code (message-header-parser.c:message_parse_header_next()) and it appeared that the boundary in the Content-Type header was not parsed correctly, evidently because the header line was folded in the middle of the boundary string. RFC 822 appears to allow folding in a quoted string like this (§3.3 "quoted-string"), so I'm curious whether the parsing is working correctly.
> Fixed: http://hg.dovecot.org/dovecot-1.1/rev/25b0cf7c62d3
> But I'm not sure if I should convert the following TAB to a space. UW-IMAP seems to do that, but RFC just says that the CRLF should be dropped.
I grabbed a snapshot of the CM baseline with that fix, but that message still doesn't display correctly. I ran it through the message_parser test case and your fix looks like it resulted in correct header values and correct body parsing, but the BODYSTRUCTURE response from the server still only contains the first part (plus the boundary name).
Any suggestions where to look? I looked through the code that handles the BODYSTRUCTURE fetch command and it looked like it eventually filtered down to the same parser functions used by the test case, so I'm not sure where else the problem could be introduced...
-- Eric Stadtherr estadtherr@gmail.com
On Oct 28, 2008, at 3:23 AM, Eric Stadtherr wrote:
>> Fixed: http://hg.dovecot.org/dovecot-1.1/rev/25b0cf7c62d3
>> But I'm not sure if I should convert the following TAB to a space. UW-IMAP seems to do that, but RFC just says that the CRLF should be dropped.
> I grabbed a snapshot of the CM baseline with that fix, but that message still doesn't display correctly. I ran it through the message_parser test case and your fix looks like it resulted in correct header values and correct body parsing, but the BODYSTRUCTURE response from the server still only contains the first part (plus the boundary name).
> Any suggestions where to look? I looked through the code that handles the BODYSTRUCTURE fetch command and it looked like it eventually filtered down to the same parser functions used by the test case, so I'm not sure where else the problem could be introduced...
Did you delete dovecot.index.cache file? Otherwise it replies with the cached value.
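One way to locate the cache files mentioned here is sketched below in Python. It assumes that the index files live inside the Maildir folders themselves, which matches the mail_location shown earlier (maildir:%h/Maildir with no separate index location); the helper names and the dry_run parameter are hypothetical. Per the advice above, Dovecot falls back to re-parsing the message when the cached value is gone, but treat this as a sketch and try it on a copy or with dry_run=True first.

```python
from pathlib import Path

# File name taken from the thread; everything else here is illustrative.
CACHE_NAME = "dovecot.index.cache"


def find_cache_files(maildir_root):
    """Return all dovecot.index.cache files below maildir_root, sorted."""
    return sorted(Path(maildir_root).rglob(CACHE_NAME))


def remove_cache_files(maildir_root, dry_run=True):
    """Delete the cache files, or only list them when dry_run is True.

    Returns the paths that were (or would be) removed. Other index
    files and the messages themselves are left untouched.
    """
    affected = []
    for path in find_cache_files(maildir_root):
        if not dry_run:
            path.unlink()
        affected.append(path)
    return affected
```

A call such as remove_cache_files(Path.home() / "Maildir", dry_run=True) would list the files for one user without deleting anything; the path is an example, not one taken from the thread.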
On Tue, 28 Oct 2008 03:31:13 +0200, Timo Sirainen <tss@iki.fi> wrote:
> On Oct 28, 2008, at 3:23 AM, Eric Stadtherr wrote:
>>> Fixed: http://hg.dovecot.org/dovecot-1.1/rev/25b0cf7c62d3
>>> But I'm not sure if I should convert the following TAB to a space. UW-IMAP seems to do that, but RFC just says that the CRLF should be dropped.
>> I grabbed a snapshot of the CM baseline with that fix, but that message still doesn't display correctly. I ran it through the message_parser test case and your fix looks like it resulted in correct header values and correct body parsing, but the BODYSTRUCTURE response from the server still only contains the first part (plus the boundary name).
>> Any suggestions where to look? I looked through the code that handles the BODYSTRUCTURE fetch command and it looked like it eventually filtered down to the same parser functions used by the test case, so I'm not sure where else the problem could be introduced...
> Did you delete dovecot.index.cache file? Otherwise it replies with the cached value.
That was it, thanks!
-- Eric Stadtherr estadtherr@gmail.com
participants (3)
- Eric Stadtherr
- Jakob Hirsch
- Timo Sirainen