Sieve filtering UTF-8 strings

Stephan Bosch stephan at rename-it.nl
Thu Sep 3 00:20:25 UTC 2015


On 9/2/2015 at 5:03 PM, Sergey Schwartz wrote:
> Guys,
>
> I'm completely stuck, so I'm asking for advice.
> My user has a Sieve script that checks whether a message header contains
> Russian words such as 'Лист бронирования отправлен' ("booking sheet sent").
>
> Pretty simple script:
>
> # rule:[Отправлено]
> if allof (header :contains "subject" "LDS (robot): Лист бронирования отправлен",
>           header :contains "from" "noreply at bgoperator.com")
> {
>     fileinto "Отправлено";
> }
>
> I get no errors when compiling the script or executing it via LMTP, but
> it doesn't work.
> Normally the user receives messages from the robot with the subject
> encoded as quoted-printable:
>
> Subject: =?UTF-8?Q?LDS_(robot):_=D0=9B=D0=B8=D1=81=D1=82?=
>  =?UTF-8?Q?_=D0=B1=D1=80=D0=BE=D0=BD=D0=B8=D1=80=D0=BE=D0=B2=D0=B0=D0=BD?=
>  =?UTF-8?Q?=D0=B8=D1=8F__=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5?=
>
> When I send a test message containing the required words via Thunderbird,
> Sieve works fine, and the subject is encoded in base64:
>
> Subject: =?UTF-8?B?0JvQuNGB0YIg0LHRgNC+0L3QuNGPINC+0YLQv9GA?=
>  =?UTF-8?B?0LDQstC70LXQvQ==?=
>
> It is the same text, but the encoding is different: base64 works fine
> and quoted-printable does not.
> Is it possible to have both supported in Sieve?

Both should be supported. I checked your encoded text using a test suite
script (see the long answer below), and it seems that the encoded text is
not what you expect.
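Both encodings are defined side by side in RFC 2047 ("B" is base64, "Q" is a quoted-printable variant), so a conforming decoder yields the same text from either. As a rough illustration outside of Sieve, here is a Python sketch: `b_encode_word` and `q_encode_word` are hypothetical helpers written only for this example, and Python's standard `email.header` module stands in for the decoder (Pigeonhole has its own implementation). The helpers emit a single encoded word and ignore RFC 2047's 75-character limit per word, which is why real mailers split long subjects across several words, as both headers above do.

```python
import base64
from email.header import decode_header, make_header


def b_encode_word(text: str, charset: str = "UTF-8") -> str:
    """Hypothetical helper: one RFC 2047 'B' (base64) encoded word."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    return f"=?{charset}?B?{payload}?="


def q_encode_word(text: str, charset: str = "UTF-8") -> str:
    """Hypothetical helper: one RFC 2047 'Q' encoded word.

    Deliberately conservative: only ASCII letters and digits stay literal,
    a space becomes '_', and every other byte is written as '=XX'.
    """
    out = []
    for byte in text.encode(charset):
        ch = chr(byte)
        if ch == " ":
            out.append("_")
        elif ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"={byte:02X}")
    return f"=?{charset}?Q?{''.join(out)}?="


def decode(value: str) -> str:
    """Decode an RFC 2047 header value with Python's standard library."""
    return str(make_header(decode_header(value)))


subject = "LDS (robot): Лист бронирования отправлен"
b_word = b_encode_word(subject)
q_word = q_encode_word(subject)
print(b_word)
print(q_word)
print(decode(b_word) == decode(q_word) == subject)  # True
```

The point is only that the choice between B and Q is a transport detail: once decoded, both give the identical string that a `:contains` test sees.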

This:

Subject: =?UTF-8?Q?LDS_(robot):_=D0=9B=D0=B8=D1=81=D1=82?=
 =?UTF-8?Q?_=D0=B1=D1=80=D0=BE=D0=BD=D0=B8=D1=80=D0=BE=D0=B2=D0=B0=D0=BD?=
 =?UTF-8?Q?=D0=B8=D1=8F__=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5?=

Yields:

"LDS (robot): Лист бронирования  отправле"

Notice the two spaces before "отправле" and the missing Cyrillic "н" (N) at
the end. The two spaces are caused by the double '__' in the third line of
the encoded subject: in the Q encoding each '_' stands for one space. The
final "н" is simply not encoded anywhere in the subject.
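The Q-encoding rule behind the double space can be seen by decoding just the payload of that third encoded word (the text between `?Q?` and `?=`). This is a minimal Python sketch assuming a well-formed UTF-8 payload; `q_decode_payload` is a hypothetical helper, not part of Pigeonhole or of Python's `email` package, and it is deliberately lenient about malformed `=` escapes.

```python
import re

_HEX_PAIR = re.compile(r"[0-9A-Fa-f]{2}")


def q_decode_payload(payload: str, charset: str = "utf-8") -> str:
    """Decode the payload of one RFC 2047 'Q' encoded word (hypothetical helper).

    Q rules (RFC 2047, section 4.2): '_' is a space (0x20), '=XX' is one
    byte given in hex, and any other character stands for itself.
    """
    raw = bytearray()
    i = 0
    while i < len(payload):
        ch = payload[i]
        if ch == "_":
            raw.append(0x20)  # each underscore is exactly one space
            i += 1
        elif ch == "=" and _HEX_PAIR.fullmatch(payload[i + 1:i + 3]):
            raw.append(int(payload[i + 1:i + 3], 16))  # '=XX' -> one raw byte
            i += 3
        else:
            raw.extend(ch.encode("ascii"))  # literal character
            i += 1
    return raw.decode(charset)


# Payload of the third encoded word in the robot's Subject header.
third = "=D0=B8=D1=8F__=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5"
print(repr(q_decode_payload(third)))  # 'ия  отправле'
```

The two underscores become two spaces, and the last byte pair is `=D0=B5` ("е"), so the word ends one letter short of "отправлен".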

This:

Subject: =?UTF-8?Q?LDS_(robot):_=D0=9B=D0=B8=D1=81=D1=82?=
 =?UTF-8?Q?_=D0=B1=D1=80=D0=BE=D0=BD=D0=B8=D1=80=D0=BE=D0=B2=D0=B0=D0=BD?=
 =?UTF-8?Q?=D0=B8=D1=8F_=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5?=
 =?UTF-8?Q?=D0=BD?=

Yields:

"LDS (robot): Лист бронирования отправлен"

Which is obviously OK.
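The same two outcomes can be reproduced outside Sieve as a cross-check. The sketch below uses Python's standard `email.header.decode_header` and `make_header` (an independent decoder, not the one Pigeonhole uses); `decode_subject` is a hypothetical helper, and Python's `in` operator stands in for `:contains`. The `i;ascii-casemap` comparator only folds ASCII case, which makes no difference here because the key and the decoded values already agree in case.

```python
from email.header import decode_header, make_header


def decode_subject(raw: str) -> str:
    """Hypothetical helper: decode a (possibly folded) RFC 2047 value.

    decode_header() splits the value into encoded words and drops the
    folding whitespace between adjacent ones; make_header() rejoins the
    decoded chunks so that str() returns one Unicode string.
    """
    return str(make_header(decode_header(raw)))


# Subject value as sent by the robot: three folded encoded words.
ORIGINAL = (
    "=?UTF-8?Q?LDS_(robot):_=D0=9B=D0=B8=D1=81=D1=82?=\n"
    " =?UTF-8?Q?_=D0=B1=D1=80=D0=BE=D0=BD=D0=B8=D1=80=D0=BE=D0=B2=D0=B0=D0=BD?=\n"
    " =?UTF-8?Q?=D0=B8=D1=8F__=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5?="
)

# Mended value: a single '_' in the third word, plus a fourth word for "н".
MENDED = (
    "=?UTF-8?Q?LDS_(robot):_=D0=9B=D0=B8=D1=81=D1=82?=\n"
    " =?UTF-8?Q?_=D0=B1=D1=80=D0=BE=D0=BD=D0=B8=D1=80=D0=BE=D0=B2=D0=B0=D0=BD?=\n"
    " =?UTF-8?Q?=D0=B8=D1=8F_=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5?=\n"
    " =?UTF-8?Q?=D0=BD?="
)

# The string from the Sieve rule.
KEY = "LDS (robot): Лист бронирования отправлен"

for name, raw in (("original", ORIGINAL), ("mended", MENDED)):
    value = decode_subject(raw)
    print(f"{name}: {value!r} contains key: {KEY in value}")
# original: 'LDS (robot): Лист бронирования  отправле' contains key: False
# mended: 'LDS (robot): Лист бронирования отправлен' contains key: True
```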

So it seems to me that the program that creates these messages is either
encoding the wrong text or botching the encoding itself.

Regards,

Stephan.


LONG ANSWER:

I wrote a little test suite script like this:

<SCRIPT>
require "vnd.dovecot.testsuite";

test_set "message" text:
Subject: =?UTF-8?Q?LDS_(robot):_=D0=9B=D0=B8=D1=81=D1=82?=
 =?UTF-8?Q?_=D0=B1=D1=80=D0=BE=D0=BD=D0=B8=D1=80=D0=BE=D0=B2=D0=B0=D0=BD?=
 =?UTF-8?Q?=D0=B8=D1=8F__=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5?=
From: noreply at bgoperator.com
To: friep at example.net

Frop!
.
;

test "Test original" {
    # rule:[Отправлено]
    if not allof (
        header :contains "subject" "LDS (robot): Лист бронирования отправлен",
        header :contains "from" "noreply at bgoperator.com")
    {
        test_fail "Failed";
    }
}

test_set "message" text:
Subject: =?UTF-8?Q?LDS_(robot):_=D0=9B=D0=B8=D1=81=D1=82?=
 =?UTF-8?Q?_=D0=B1=D1=80=D0=BE=D0=BD=D0=B8=D1=80=D0=BE=D0=B2=D0=B0=D0=BD?=
 =?UTF-8?Q?=D0=B8=D1=8F_=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5?=
 =?UTF-8?Q?=D0=BD?=
From: noreply at bgoperator.com
To: friep at example.net

Frop!
.
;

test "Test mended" {
    # rule:[Отправлено]
    if not allof (
        header :contains "subject" "LDS (robot): Лист бронирования отправлен",
        header :contains "from" "noreply at bgoperator.com")
    {
        test_fail "Failed";
    }
}
</SCRIPT>

I executed it from the Pigeonhole source directory:

$ src/testsuite/testsuite -Tlevel=matching -t - ~/frop.svtest

<OUTPUT>
Test case: /home/stephan/frop.svtest:


      ## Started executing script 'frop.svtest'
   3: testsuite: test_set command
   3:   set test parameter 'message' = "Subject: =?UTF-8?Q?LDS_(robot):_=D0=9B=D0=B8=D1=81=D1=82?=
 =?UTF-8?Q?_=D0=B1=D1=80=D0=BE=D0=BD=D0=B8=D1=80=D0=BE=D0=B2=D0=B0=D0=BD?=
 =?UTF-8?Q?=D0=B8=D1=8F__=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5?=
From: noreply at bgoperator.com
To: friep at example.net

Frop!
"

  14: ** Testsuite test start: "Test original"
  16: header test
  16:   starting `:contains' match with `i;ascii-casemap' comparator:
  16:   extracting `subject' headers from message
  16:   matching value `LDS (robot): Лист бронирования  отправле'
  16:     with key `LDS (robot): Лист бронирования отправлен' => 0
  16:   finishing match with result: not matched
  17: jump if result is false
  17:   jumping to line 20
  20: testsuite: test_fail command; FAIL current test
 1: Test 'Test original' FAILED: Failed
  20: jumping to line 24
  24: testsuite: test_set command
  24:   set test parameter 'message' = "Subject: =?UTF-8?Q?LDS_(robot):_=D0=9B=D0=B8=D1=81=D1=82?=
 =?UTF-8?Q?_=D0=B1=D1=80=D0=BE=D0=BD=D0=B8=D1=80=D0=BE=D0=B2=D0=B0=D0=BD?=
 =?UTF-8?Q?=D0=B8=D1=8F_=D0=BE=D1=82=D0=BF=D1=80=D0=B0=D0=B2=D0=BB=D0=B5?=
 =?UTF-8?Q?=D0=BD?=
From: noreply at bgoperator.com
To: friep at example.net

Frop!
"

  36: ** Testsuite test start: "Test mended"
  38: header test
  38:   starting `:contains' match with `i;ascii-casemap' comparator:
  38:   extracting `subject' headers from message
  38:   matching value `LDS (robot): Лист бронирования отправлен'
  38:     with key `LDS (robot): Лист бронирования отправлен' => 1
  38:   finishing match with result: matched
  39: jump if result is false
  39:   not jumping
  40: header test
  40:   starting `:contains' match with `i;ascii-casemap' comparator:
  40:   extracting `from' headers from message
  40:   matching value `noreply at bgoperator.com'
  40:     with key `noreply at bgoperator.com' => 1
  40:   finishing match with result: matched
  40: jump if result is false
  40:   not jumping
  40: jumping to line 42
  42: ** Testsuite test end

 2: Test 'Test mended' SUCCEEDED
      ## Finished executing script 'frop.svtest'

FAIL: 1 of 2 tests failed.
</OUTPUT>

Regards,

Stephan.



