[Dovecot] Sieve rule issue with certain character sets
I have a global sieve rule in place to filter mailing-lists. This has worked well so far. Recently however one subscriber on a list seems to create strange character set encodings in the 'From' and 'To' headers. This leads to unprocessed/unfiltered mails (no errors thrown). Is this a configuration or Pigeonhole issue (latest HG used)?
Headers from failing mail:
From: "=?UTF-8?B?VG9yaW50aGllbA==?=" <user@domain.tld>
To: "=?UTF-8?B?ImJpbmQtdXNlcnNAbGlzdHMuaXNjLm9yZyI=?=" <bind-users@lists.isc.org>
Relevant sieve rule:
if allof (
    address :is ["To","CC"] ["bind-users@lists.isc.org","bind-users@isc.org","comp-protocols-dns-bind@isc.org"],
    header :contains "List-Id" "bind-users.lists.isc.org"
) {
    fileinto "Public/Mailing-Lists/Bind-Users";
}
Regards Thomas
On 12/31/2010 12:25 PM, Thomas Leuxner wrote:
> I have a global sieve rule in place to filter mailing-lists. This has worked well so far. Recently however one subscriber on a list seems to create strange character set encodings in the 'From' and 'To' headers. This leads to unprocessed/unfiltered mails (no errors thrown). Is this a configuration or Pigeonhole issue (latest HG used)?
I executed the following to investigate this issue:
###
sieve-test -t - -Tlevel=matching ~/frop.sieve ~/frop.eml
## Started executing script 'frop'
3: address test
3:   starting `:is' match with `i;ascii-casemap' comparator:
3:   extracting `To' headers from message
3:   parsing address header value `""bind-users@lists.isc.org"" <bind-users@lists.isc.org>'
3:   extracting `all' part from non-address value `""bind-users@lists.isc.org"" <bind-users@lists.isc.org>'
3:   matching value `""bind-users@lists.isc.org"" <bind-users@lists.isc.org>'
3:     with key `bind-users@lists.isc.org' => 0
3:     with key `bind-users@isc.org' => 0
3:     with key `comp-protocols-dns-bind@isc.org' => 0
3:   extracting `CC' headers from message
3:   finishing match with result: not matched
3: jump if result is false
3:   jumping to line 5
## Finished executing script 'frop'
Performed actions:
  (none)
Implicit keep:
  - store message in folder: INBOX
sieve-test(stephan): Info: final result: success
###
Apparently, the MIME-encoded part of those address headers includes double quotes, duplicating the ones surrounding the encoded part already. As can be seen from the above trace, this decodes into an invalid address representation, causing Pigeonhole to handle it as opaque text.
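To make the trace above concrete, here is a minimal Python sketch (not Pigeonhole code) that decodes the encoded-word in the failing 'To' header before any address parsing takes place. The helper name decode_b_words and its regular expression are assumptions made for illustration, and only the base64 ("B") encoding is handled.

```python
import base64
import re

# The raw 'To' header value from the failing mail, exactly as sent.
RAW_TO = '"=?UTF-8?B?ImJpbmQtdXNlcnNAbGlzdHMuaXNjLm9yZyI=?=" <bind-users@lists.isc.org>'

# Simplified pattern for a base64 ("B") encoded-word: =?charset?B?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")


def decode_b_words(value):
    """Hypothetical helper: replace each B-encoded word with its decoded text."""
    def replace(match):
        charset, payload = match.group(1), match.group(2)
        return base64.b64decode(payload).decode(charset)
    return ENCODED_WORD.sub(replace, value)


decoded_to = decode_b_words(RAW_TO)
print(decoded_to)
```

The decoded payload carries its own pair of double quotes, so the result starts with two consecutive quote characters, which is the value the address test in the trace ends up treating as a non-address.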
If those quotes are really supposed to be part of the 'phrase' part of the e-mail address, I think those should have been escaped somehow. I don't think that encoding can be an alternative for that. Timo, any thoughts?
Regards,
Stephan.
On Sat, 2011-01-01 at 11:35 +0100, Stephan Bosch wrote:
> To: "=?UTF-8?B?ImJpbmQtdXNlcnNAbGlzdHMuaXNjLm9yZyI=?=" <bind-users@lists.isc.org>
I think this is a valid address..
> If those quotes are really supposed to be part of the 'phrase' part of the e-mail address, I think those should have been escaped somehow. I don't think that encoding can be an alternative for that. Timo, any thoughts?
I think we need to change the parsing code here. Don't use mail_get_first_header_utf8() or mail_get_headers_utf8() if you intend to parse the value. First parse the addresses, then convert the display-names to UTF8 if necessary. I'll change the sorting code to do this too.
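The order suggested here (parse the raw value first, decode the display-name afterwards) can be sketched with Python's standard email package. This is only an illustration of the ordering, not the Dovecot or Pigeonhole implementation; the helper name decode_display_name is an assumption.

```python
from email.header import decode_header
from email.utils import parseaddr

# Raw, still-encoded 'To' header value from the failing mail.
RAW_TO = '"=?UTF-8?B?ImJpbmQtdXNlcnNAbGlzdHMuaXNjLm9yZyI=?=" <bind-users@lists.isc.org>'

# Step 1: tokenize the structured header while the encoded-word is still opaque.
raw_display_name, addr_spec = parseaddr(RAW_TO)


def decode_display_name(name):
    """Hypothetical helper: decode encoded-words in an already parsed display-name."""
    parts = []
    for chunk, charset in decode_header(name):
        if isinstance(chunk, bytes):
            chunk = chunk.decode(charset or "ascii", errors="replace")
        parts.append(chunk)
    return "".join(parts)


# Step 2: only now convert the display-name to readable text.
display_name = decode_display_name(raw_display_name)
print(addr_spec)
print(display_name)
```

Because the quotes hidden in the encoded-word only appear after parsing has finished, they can no longer disturb the extraction of the address itself.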
On 1/1/2011 12:21 PM, Timo Sirainen wrote:
> On Sat, 2011-01-01 at 11:35 +0100, Stephan Bosch wrote:
>> To: "=?UTF-8?B?ImJpbmQtdXNlcnNAbGlzdHMuaXNjLm9yZyI=?=" <bind-users@lists.isc.org>
> I think this is a valid address..
I gave RFC 2047 and RFC 822 a quick read. The phrase part is a sequence of one or more `word' syntax items. In Section 5 of RFC 2047,
http://tools.ietf.org/html/rfc2047#section-5
point (3) allows an `encoded-word' as a replacement for a `word' within a phrase part. In RFC 822, a word is either an atom or a quoted-string. In the situation above, a quoted-string is obviously used. The item list that follows in Section 5 of RFC 2047, however, explicitly forbids this:
- An 'encoded-word' MUST NOT appear within a 'quoted-string'.
So, there does seem to be a bug in the mailer used by the person sending the message.
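The rule quoted above can be checked mechanically. The following Python sketch flags encoded-words that appear inside a quoted-string; both regular expressions are simplified assumptions (they do not implement the full RFC 822 and RFC 2047 grammars), and the function name is made up for this example.

```python
import re

# Simplified quoted-string: double quotes around text, allowing backslash escapes.
QUOTED_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')
# Simplified encoded-word: =?charset?B-or-Q?payload?=
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def encoded_words_in_quoted_strings(header_value):
    """Hypothetical check: list encoded-words that sit inside a quoted-string."""
    hits = []
    for quoted in QUOTED_STRING.finditer(header_value):
        hits.extend(m.group(0) for m in ENCODED_WORD.finditer(quoted.group(1)))
    return hits


bad_to = '"=?UTF-8?B?ImJpbmQtdXNlcnNAbGlzdHMuaXNjLm9yZyI=?=" <bind-users@lists.isc.org>'
unquoted_from = "=?UTF-8?B?VG9yaW50aGllbA==?= <user@domain.tld>"

print(encoded_words_in_quoted_strings(bad_to))
print(encoded_words_in_quoted_strings(unquoted_from))
```

The 'To' header from the failing mail yields one hit, while the same kind of encoded-word used as a bare word (the second value, an unquoted variant constructed for comparison) yields none.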
Regards,
Stephan.
On Sat, 2011-01-01 at 13:19 +0100, Stephan Bosch wrote:
>>> To: "=?UTF-8?B?ImJpbmQtdXNlcnNAbGlzdHMuaXNjLm9yZyI=?=" <bind-users@lists.isc.org>
>> I think this is a valid address..
> - An 'encoded-word' MUST NOT appear within a 'quoted-string'.
> So, there does seem to be a bug in the mailer used by the person sending the message.
Yes, but we'd still have this problem even if this wasn't inside a quoted-string, because encoded-words themselves can contain any characters. Also in RFC 2047:
NOTE: Decoding and display of encoded-words occurs *after* a structured field body is parsed into tokens. It is therefore possible to hide 'special' characters in encoded-words which, when displayed, will be indistinguishable from 'special' characters in the surrounding text. For this and other reasons, it is NOT generally possible to translate a message header containing 'encoded-word's to an unencoded form which can be parsed by an RFC 822 mail reader.
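A small Python sketch of the effect described in this note: an unquoted encoded-word whose payload hides an '@' and a comma. The addresses, the display-name and both helper names are made up for illustration, and Python's email.utils merely stands in for an RFC 822 address parser.

```python
import base64
import re
from email.utils import getaddresses

ENCODED_WORD = re.compile(r"=\?([^?]+)\?[bB]\?([^?]*)\?=")


def make_b_word(text):
    """Hypothetical helper: wrap text in a UTF-8 base64 encoded-word."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + payload + "?="


def decode_b_words(value):
    """Hypothetical helper: naive decode of all B-encoded words in a header value."""
    return ENCODED_WORD.sub(
        lambda m: base64.b64decode(m.group(2)).decode(m.group(1)), value
    )


# The display-name hides the 'special' characters '@' and ',' in an encoded-word.
raw_header = make_b_word("bob@example.org,") + " Alice <alice@example.org>"

parsed_first = getaddresses([raw_header])                   # parse the raw value
decoded_first = getaddresses([decode_b_words(raw_header)])  # decode, then parse

print(parsed_first)
print(decoded_first)
```

Parsing the raw value finds a single mailbox, whereas decoding first makes a second address appear, even though no quoted-string is involved.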
On 1/1/2011 1:24 PM, Timo Sirainen wrote:
> On Sat, 2011-01-01 at 13:19 +0100, Stephan Bosch wrote:
>>>> To: "=?UTF-8?B?ImJpbmQtdXNlcnNAbGlzdHMuaXNjLm9yZyI=?=" <bind-users@lists.isc.org>
>>> I think this is a valid address..
>> - An 'encoded-word' MUST NOT appear within a 'quoted-string'.
>> So, there does seem to be a bug in the mailer used by the person sending the message.
> Yes, but we'd still have this problem even if this wasn't inside a quoted-string, because encoded-words themselves can contain any characters. Also in RFC 2047:
> NOTE: Decoding and display of encoded-words occurs *after* a structured field body is parsed into tokens. It is therefore possible to hide 'special' characters in encoded-words which, when displayed, will be indistinguishable from 'special' characters in the surrounding text. For this and other reasons, it is NOT generally possible to translate a message header containing 'encoded-word's to an unencoded form which can be parsed by an RFC 822 mail reader.
Oh, ok. Well, then not using _utf8() functions for the address test still solves the problem right?
Regards,
Stephan.
On Sat, 2011-01-01 at 13:37 +0100, Stephan Bosch wrote:
>> NOTE: Decoding and display of encoded-words occurs *after* a structured field body is parsed into tokens. It is therefore possible to hide 'special' characters in encoded-words which, when displayed, will be indistinguishable from 'special' characters in the surrounding text. For this and other reasons, it is NOT generally possible to translate a message header containing 'encoded-word's to an unencoded form which can be parsed by an RFC 822 mail reader.
> Oh, ok. Well, then not using _utf8() functions for the address test still solves the problem right?
Yep.
On 01/01/2011 01:53 PM, Timo Sirainen wrote:
> On Sat, 2011-01-01 at 13:37 +0100, Stephan Bosch wrote:
>>> NOTE: Decoding and display of encoded-words occurs *after* a structured field body is parsed into tokens. It is therefore possible to hide 'special' characters in encoded-words which, when displayed, will be indistinguishable from 'special' characters in the surrounding text. For this and other reasons, it is NOT generally possible to translate a message header containing 'encoded-word's to an unencoded form which can be parsed by an RFC 822 mail reader.
>> Oh, ok. Well, then not using _utf8() functions for the address test still solves the problem right?
> Yep.
Fixed:
http://hg.rename-it.nl/dovecot-2.0-pigeonhole/rev/99f8dc1e246a
Regards,
Stephan.
Am 01.01.2011 um 14:12 schrieb Stephan Bosch:
> Fixed:
> http://hg.rename-it.nl/dovecot-2.0-pigeonhole/rev/99f8dc1e246a
> Regards,
> Stephan.
Thanks to both of you. Will patch and report back when I see mitigation on the ML. (Depending on the "strange poster" though :) )
Regards Thomas
On Sat, 2011-01-01 at 14:12 +0100, Stephan Bosch wrote:
>>> Oh, ok. Well, then not using _utf8() functions for the address test still solves the problem right?
>> Yep.
> Fixed:
> http://hg.rename-it.nl/dovecot-2.0-pigeonhole/rev/99f8dc1e246a
I think there are other places that need fixing too:
src/lib-sieve/plugins/enotify/mailto/ntfy-mailto.c: if ( mail_get_headers_utf8
src/lib-sieve/plugins/notify/ext-notify-common.c: if ( mail_get_headers_utf8(msgdata->mail, "from", &header) >= 0 )
src/lib-sieve/plugins/vacation/cmd-vacation.c: if ( mail_get_headers_utf8
On 1/1/2011 3:03 PM, Timo Sirainen wrote:
> On Sat, 2011-01-01 at 14:12 +0100, Stephan Bosch wrote:
>> Fixed:
>> http://hg.rename-it.nl/dovecot-2.0-pigeonhole/rev/99f8dc1e246a
> I think there are other places that need fixing too:
> src/lib-sieve/plugins/enotify/mailto/ntfy-mailto.c: if ( mail_get_headers_utf8
Nope. Is subject header.
> src/lib-sieve/plugins/notify/ext-notify-common.c: if ( mail_get_headers_utf8(msgdata->mail, "from", &header) >= 0 )
Check.
> src/lib-sieve/plugins/vacation/cmd-vacation.c: if ( mail_get_headers_utf8
Check.
Fixed:
http://hg.rename-it.nl/dovecot-2.0-pigeonhole/rev/146a2a9d5cb0
Regards,
Stephan
On Sat, 2011-01-01 at 15:24 +0100, Stephan Bosch wrote:
>> src/lib-sieve/plugins/notify/ext-notify-common.c: if ( mail_get_headers_utf8(msgdata->mail, "from", &header) >= 0 )
> Check.
Actually I'm now less sure about this :) It's inserted into message body and intended to be human readable? Then the _utf8() would have been right I guess.
On 01/01/2011 03:33 PM, Timo Sirainen wrote:
> On Sat, 2011-01-01 at 15:24 +0100, Stephan Bosch wrote:
>>> src/lib-sieve/plugins/notify/ext-notify-common.c: if ( mail_get_headers_utf8(msgdata->mail, "from", &header) >= 0 )
>> Check.
> Actually I'm now less sure about this :) It's inserted into message body and intended to be human readable? Then the _utf8() would have been right I guess.
Uh, you are right. Fixed:
http://hg.rename-it.nl/dovecot-2.0-pigeonhole/rev/442a5fb51d76
Regards,
Stephan.
D'oh. Now why didn't I reply to the second part?
On 1/1/2011 12:21 PM, Timo Sirainen wrote:
> On Sat, 2011-01-01 at 11:35 +0100, Stephan Bosch wrote:
> I think we need to change the parsing code here. Don't use mail_get_first_header_utf8() or mail_get_headers_utf8() if you intend to parse the value. First parse the addresses, then convert the display-names to UTF8 if necessary. I'll change the sorting code to do this too.
In light of my previous e-mail, I think it suffices to avoid the _utf8() functions whenever the address needs to be parsed. The phrase part is not used, and encoded-words are not allowed in the actual address itself.
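As a rough illustration of this conclusion, the following Python sketch emulates an `address :is` style test on the raw, undecoded headers. It is not Pigeonhole's code: the List-Id header in the sample message is an assumption (the original post shows only From and To), address_is is a made-up name, and lower-casing merely approximates the i;ascii-casemap comparator.

```python
from email import message_from_string
from email.utils import getaddresses

# Sample message built from the headers in the original report.
# The List-Id header is an assumption added so the message resembles list mail.
RAW_MESSAGE = (
    'From: "=?UTF-8?B?VG9yaW50aGllbA==?=" <user@domain.tld>\n'
    'To: "=?UTF-8?B?ImJpbmQtdXNlcnNAbGlzdHMuaXNjLm9yZyI=?=" <bind-users@lists.isc.org>\n'
    "List-Id: <bind-users.lists.isc.org>\n"
    "\n"
    "body\n"
)

KEYS = ["bind-users@lists.isc.org", "bind-users@isc.org", "comp-protocols-dns-bind@isc.org"]


def address_is(msg, header_names, keys):
    """Hypothetical test: match whole addresses taken from raw header values."""
    wanted = {key.lower() for key in keys}
    raw_values = []
    for name in header_names:
        raw_values.extend(msg.get_all(name, []))  # values stay undecoded
    # The display-name (phrase) is parsed but ignored; only the address is compared.
    return any(addr.lower() in wanted for _phrase, addr in getaddresses(raw_values))


msg = message_from_string(RAW_MESSAGE)
matched = address_is(msg, ["To", "Cc"], KEYS)
print(matched)
```

Since the phrase is never decoded, the extra quotes inside the encoded-word never reach the address parser, and the list address is found.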
Regards,
Stephan
participants (3)
- Stephan Bosch
- Thomas Leuxner
- Timo Sirainen