[Dovecot] Sieve and locale (Japanese)

Jorgen Lundman

11 Sep 2009

4:18 a.m.

I have set up Dovecot 1.2 'deliver' to use Dovecot's Sieve as well (both are the latest versions from dovecot.org).

I am curious whether Sieve can handle Japanese, or locales in general. I am particularly interested in the Subject header, whose encoding is more complicated still.

if header :contains "subject" ["test", "テスト"] {

So I want to file based on the above: "test" and "tesuto" (UTF-8). Most Japanese mail is sent in iso-2022-jp; will it use iconv to convert to UTF-8 before the test, and then handle the Subject encoding? (It could also be in EUC-JP, but that is unusual.)

Or is it something on TODO?

-- Jorgen Lundman | <lundman@lundman.net> Unix Administrator | +81 (0)3 -5456-2687 ext 1017 (work) Shibuya-ku, Tokyo | +81 (0)90-5578-8500 (cell) Japan | +81 (0)3 -3375-1767 (home)

Timo Sirainen

11 Sep

4:36 a.m.

On Sep 10, 2009, at 10:18 PM, Jorgen Lundman wrote:

...

if header :contains "subject" ["test", "テスト"] {

So if I want to file based on above, test, and tesuto (utf-8). Most
Japanese mail is sent in iso-2022-jp, will it use iconv to change to
utf-8 before the test? Then handle the Subject encoding? (It's also
possible to be in EUC-JP, but that is unusual).

All text, including subject, is converted to UTF-8 with iconv before
comparing, as long as Dovecot was compiled with iconv support
(automatically as long as iconv devel files were found).
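
The decode-then-compare order described above can be illustrated outside Dovecot. This is a minimal Python sketch, not Dovecot's implementation: it uses the standard library's email.header.decode_header in place of Dovecot's iconv-based conversion, and the helper names decode_subject and subject_contains are hypothetical. The test subject is built in the script so that the ISO-2022-JP bytes are known to be valid.

```python
import base64
from email.header import decode_header


def decode_subject(raw):
    """Decode RFC 2047 encoded-words in a header value to a Unicode string."""
    parts = []
    for data, charset in decode_header(raw):
        if isinstance(data, bytes):
            # Encoded-words come back as bytes plus their declared charset.
            parts.append(data.decode(charset or "ascii"))
        else:
            # A header without any encoded-word comes back as plain str.
            parts.append(data)
    return "".join(parts)


def subject_contains(raw, needle):
    """Rough stand-in for Sieve's ':contains' on a decoded header.

    Simplification: this comparison is case-sensitive, whereas Sieve's
    default comparator ignores ASCII case.
    """
    return needle in decode_subject(raw)


# Build a subject the way an ISO-2022-JP mail client might send it.
payload = base64.b64encode("テスト".encode("iso2022_jp")).decode("ascii")
raw_subject = "=?ISO-2022-JP?B?" + payload + "?="

print(raw_subject)
print(decode_subject(raw_subject))
print(subject_contains(raw_subject, "テスト"))
```

The point is only that the comparison happens on the decoded Unicode text, so a UTF-8 string in the Sieve script can match a subject that arrived in a different charset.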

Jorgen Lundman

4:42 a.m.

...

All text, including subject, is converted to UTF-8 with iconv before comparing, as long as Dovecot was compiled with iconv support (automatically as long as iconv devel files were found).

I should have known that it would just work, and be done right. Thanks Timo.

Lund

Jorgen Lundman

4:51 a.m.

For completeness' sake, here is a report of it working beautifully:

cat .dovecot.sieve

require "fileinto";

if header :contains "X-Spam-Flag" "YES" {
    fileinto "spam";
}

if header :contains "Subject" "テスト" {
    fileinto "nihongo";
}

Subject: =?ISO-2022-JP?B?GyRCIVolSyVlITwlOSVqJWobKEI=?=
 =?ISO-2022-JP?B?GyRCITwlOSFbGyhCcGFwZXJib3kmY28uIBskQiQsJS8laiUoJSQbKEI=?=
 =?ISO-2022-JP?B?GyRCJT8hPCRyQmdKZz04ISohViVaJVElXCUvJWolKCUkJT8hPCU6GyhC?=
 =?ISO-2022-JP?B?GyRCJTMlcyVGJTklSCFdMkYkQCEqRy4kJCQ8ISolLyVqJSglJCU/GyhC?=
 =?ISO-2022-JP?B?GyRCITxBNDB3PTg5ZyEqIV0hVzMrOkUbKEI=?=

(That's the string:【ニュースリリース】paperboy&co. がクリエイターを大募集！「ペパボクリエイターズコンテスト−夏だ！熱いぜ！クリエイター全員集合！−」開催)

Sep 11 11:46:22 test-vmx01.unix dovecot: [ID 583609 local0.info] deliver(user1@domain): sieve: msgid=<20090911024611.C7348232223@domain>: stored mail into mailbox 'nihongo'

Perfect!

Lund

Jorgen Lundman

5:23 a.m.

New subject: [Dovecot] Sieve and locale (Japanese), string length

Damn I apologise for the noise now, but I did manage to run into one problem:

Subject: 日本語ららららららららららららららららららららららららららららららららららららららららららららららららららららら

Subject: =?UTF-8?B?5pel5pys6Kqe44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
 =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
 =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
 =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

Testing against:

if header :contains "Subject" "らららららららららららららららら" { fileinto "nihongo"; }

=> OK

if header :contains "Subject" "ららららららららららららららららら" { fileinto "nihongo"; }

=> NG

So it seems the longest string the test will match is 16 UTF-8 characters (or 48 bytes?). So as long as we use short words, it should be OK.

Out of curiosity, can we increase this limit?

Stephan Bosch

13 Sep

10:01 a.m.

New subject: [Dovecot] Sieve and locale (Japanese), string length

Jorgen Lundman wrote:

...

Damn I apologise for the noise now, but I did manage to run into one problem:

Subject: 日本語ららららららららららららららららららららららららららららららららららららららららららららららららららららら

Subject: =?UTF-8?B?5pel5pys6Kqe44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?= =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=

...

=?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
=?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?= [...]

So it seems the longest string the test will match is 16 UTF-8 characters (or 48 bytes?). So as long as we use short words, it should be OK.

Out of curiosity, can we increase this limit?

Well, this looks very much like a bug to me. And, conveniently, it does not look like a bug in Sieve :o). By adding a small debug line, I found that the Dovecot function mail_get_headers_utf8() returns the following for the above string:

日本語ららららららららららららららららららららららららららららららららららららららららららららららららららららら

Note the additional spaces between the characters. This is due to the 75-character limit on RFC 2047 encoded-words, meaning that (as seen above) the encoded subject is broken into multiple parts.

Timo: As far as I understand RFC2047, the (<CRLF>)<SPC> sequence between the RFC2047 encoded words is not supposed to be added as a space. I admit that it is not a very well-specified RFC. The examples mention something about encoding space inside the encoded words if a joining space is required.

Regards,

Stephan
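
The effect Stephan describes can be reproduced with a small Python sketch. It is not Dovecot's code: the helpers b_word and decode_encoded_words and the drop_separators flag are hypothetical names, only base64 ("B") encoded-words are handled, and the split of 11 and 16 characters between the two encoded-words is an assumption chosen to be consistent with the 16-character limit reported above.

```python
import base64
import re

# Matches one base64 ("B") encoded-word: =?charset?B?payload?=
ENCODED_WORD = re.compile(r"=\?([^?]+)\?[Bb]\?([^?]*)\?=")


def b_word(text):
    """Encode text as a single UTF-8 base64 encoded-word (test helper)."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return "=?UTF-8?B?" + payload + "?="


def decode_encoded_words(header, drop_separators=True):
    """Decode base64 encoded-words in a header value.

    With drop_separators=True, whitespace that only separates two adjacent
    encoded-words is discarded, as RFC 2047 section 6.2 requires.
    With drop_separators=False, that whitespace is kept, which mimics the
    behaviour reported in this thread.
    """
    out = []
    pos = 0
    prev_was_encoded = False
    for m in ENCODED_WORD.finditer(header):
        gap = header[pos:m.start()]
        if gap:
            separator_only = prev_was_encoded and gap.isspace()
            if not (separator_only and drop_separators):
                out.append(gap)
        out.append(base64.b64decode(m.group(2)).decode(m.group(1)))
        prev_was_encoded = True
        pos = m.end()
    out.append(header[pos:])
    return "".join(out)


# Two adjacent encoded-words separated by a single space.
header = b_word("日本語" + "ら" * 11) + " " + b_word("ら" * 16)

broken = decode_encoded_words(header, drop_separators=False)
fixed = decode_encoded_words(header)

needle = "ら" * 17
print(needle in broken)
print(needle in fixed)
```

When the separator is kept, the longest unbroken run of ら in the decoded subject is the 16 characters of the second encoded-word, so a 17-character needle cannot match. Once the separator is dropped, the runs join up and the longer needle matches.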

Timo Sirainen

14 Sep

1:03 a.m.

New subject: [Dovecot] Sieve and locale (Japanese), string length

On Sun, 2009-09-13 at 10:01 +0200, Stephan Bosch wrote:

...

Timo: As far as I understand RFC2047, the (<CRLF>)<SPC> sequence between the RFC2047 encoded words is not supposed to be added as a space. I admit that it is not a very well-specified RFC. The examples mention something about encoding space inside the encoded words if a joining space is required.

I actually noticed this and added it to my TODO just a few days ago while I was writing the RFC 2047 encoder. Fixed now: http://hg.dovecot.org/dovecot-1.2/rev/28241a6e1178

Jorgen Lundman

1:55 a.m.

New subject: [Dovecot] Sieve and locale (Japanese), string length

Thank you Timo and Stephan!

Timo Sirainen wrote:

...

On Sun, 2009-09-13 at 10:01 +0200, Stephan Bosch wrote:

...
Timo: As far as I understand RFC2047, the (<CRLF>)<SPC> sequence between the RFC2047 encoded words is not supposed to be added as a space. I admit that it is not a very well-specified RFC. The examples mention something about encoding space inside the encoded words if a joining space is required.

I actually noticed this and added it to my TODO just a few days ago while I was writing the RFC 2047 encoder. Fixed now: http://hg.dovecot.org/dovecot-1.2/rev/28241a6e1178
