[Dovecot] Sieve and locale (Japanese), string length

Stephan Bosch stephan at rename-it.nl
Sun Sep 13 11:01:55 EEST 2009


Jorgen Lundman wrote:
> 
> Damn I apologise for the noise now, but I did manage to run into one 
> problem:
> 
> Subject: 日本語らららららららららららららららららららららららららららら 
> らららら ららららららららららららららららららららら
> 
> Subject: 
> =?UTF-8?B?5pel5pys6Kqe44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
>  =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?= 

>  =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=  
>  =?UTF-8?B?44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ44KJ?=
[...]
> 
> So it seems the longest word test is 16 UTF-8 chars (or 48 bytes?). So as 
> long as we use small words, it should be ok.
> 
> Out of curiosity, can we increase this limit?

Well, this looks very much like a bug to me. And, conveniently, it does 
not look like a bug in Sieve :o). By adding a small debug line I found 
that the Dovecot function mail_get_headers_utf8() returns the following 
for the above string:

日本語ららららららららららら らららららららららららららららら ららららら 
ららららららららららら らららららららららら

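Note the additional spaces between the characters. They appear exactly 
where the Subject was split: RFC2047 limits each encoded word to 75 
characters, so (as seen above) the header is sent as several encoded 
words on folded lines, and the whitespace that separates them ends up in 
the decoded result.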

Timo: As far as I understand RFC2047, the (<CRLF>)<SPC> sequence between 
two adjacent RFC2047 encoded words should be ignored when decoding, not 
turned into a space. I admit that it is not a very well-specified RFC. 
The examples mention that a space has to be encoded inside one of the 
encoded words if a joining space is really required.
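
A minimal Python sketch of that reading of RFC2047, for illustration 
only: whitespace that lies purely between two encoded words is dropped 
before decoding, while whitespace next to ordinary text is kept. This is 
not Dovecot's code. The function name decode_b_words is made up for this 
example, and it assumes only the base64 ("B") encoding is in use.

```python
import base64
import re

# One base64 ("B") encoded word: =?charset?B?payload?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?[Bb]\?([^?\s]*)\?=")

# Whitespace (including a folded line break) that sits between the end of
# one encoded word ("?=") and the start of the next one ("=?").
GAP = re.compile(r"(\?=)[ \t\r\n]+(?==\?)")


def decode_b_words(value: str) -> str:
    """Decode base64 encoded words, ignoring whitespace between them.

    Hypothetical helper for this example; "Q" encoding is not handled.
    """
    # Step 1: remove separators between adjacent encoded words only.
    joined = GAP.sub(r"\1", value)

    # Step 2: decode each encoded word using its declared charset.
    def _decode(match: re.Match) -> str:
        charset, payload = match.group(1), match.group(2)
        return base64.b64decode(payload).decode(charset)

    return ENCODED_WORD.sub(_decode, joined)


if __name__ == "__main__":
    folded = "=?UTF-8?B?5pel5pys6Kqe?=\r\n =?UTF-8?B?44KJ44KJ?="
    print(decode_b_words(folded))  # 日本語らら (no space at the fold)
```

With this approach a space only survives between two encoded words if 
the sender encoded it inside one of them.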

Regards,

Stephan



