Hello!

Recently I ran into a strange issue. I want to filter out mail that does not contain Cyrillic characters, because I do not want to receive email in any foreign language other than Russian. I am using the rule below, but it does not work when the mail text contains the Unicode character U+2019 (’, right single quotation mark).

require ["body","regex"];
# rule:[Regexp test]
if not body :text :regex ".*[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ].*"
{
    discard;
    stop;
}

I checked this behavior on different versions of Dovecot and Pigeonhole, and it was the same in all cases. If I replace U+2019 with, for instance, an apostrophe, the rule starts working as I expect: the mail is discarded.
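To show how the two inputs differ at the byte level, here is a minimal Python sketch. It is not part of my setup and says nothing certain about Pigeonhole internals. It only assumes that the mail body is UTF-8 and that a regex engine might treat the bracket expression as a set of single bytes instead of code points; that second assumption is unconfirmed. The function names are my own and purely illustrative.

```python
import re

# The same character class as in the Sieve rule.
CYRILLIC = "аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ"


def matches_as_text(body: str) -> bool:
    """Code-point-aware match: what the rule is intended to do."""
    return re.search("[" + CYRILLIC + "]", body) is not None


def matches_as_bytes(body: str) -> bool:
    """Hypothetical byte-wise match (assumption, not confirmed for Pigeonhole):
    the bracket expression degrades to the set of individual UTF-8 bytes
    that make up the Cyrillic letters."""
    byte_set = set(CYRILLIC.encode("utf-8"))
    return any(b in byte_set for b in body.encode("utf-8"))


if __name__ == "__main__":
    for body in ("Test'test", "Test\u2019test"):
        print(
            repr(body),
            body.encode("utf-8").hex(" "),
            "text match:", matches_as_text(body),
            "byte match:", matches_as_bytes(body),
        )
```

Under that assumption, U+2019 (UTF-8 bytes e2 80 99) shares its trailing bytes with Cyrillic letters such as "р" (d1 80) and "Й" (d0 99), so a byte-wise class would report a match even though the text contains no Cyrillic letter. That would be consistent with the mail being kept, but I have not verified that this is what actually happens.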

Below is some information from the sieve-test utility. The mail text consists of a single phrase, Test’test.


First test: the string Test'test, with an apostrophe instead of U+2019

root@a4e4b17d33a1:/srv/mail# tail -2 1586937347.M574837P24389.vps.kveri.ru\,S\=1904\,W\=1944\:2\,S 

Test'test 
sieve-test output
* Script metadata (block: 0):

class = file
class.version = 0
location = /srv/mail/roundcube.sieve

* Required extensions (block: 1):

  0: body (id: 18)
  1: regex (id: 13)

* Main program (block: 2):

Address   Line  Code
00000000:       DEBUG BLOCK: 3
00000001:       EXTENSIONS [2]:
00000002:         body
00000004:         regex
00000006:    3: BODY
00000007:         BODY-TRANSFORM: TEXT
0000000b:         match type: regex
0000000d:         key list: STR[138] ".*[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРс...
0000009b:    3: JMPTRUE 6 [000000a2]
000000a0:    5: DISCARD
000000a1:    6: STOP
000000a2:    6: [End of code]


Performed actions:

 * discard

Implicit keep:

  (none)
In this case the rule works as I expect.


Second test: the string Test’test, with ’ (U+2019) instead of an apostrophe

root@a4e4b17d33a1:/srv/mail# tail -2 1586937347.M574837P24389.vps.kveri.ru\,S\=1904\,W\=1944\:2\,S 

Test’test 


sieve-test output

* Script metadata (block: 0):

class = file
class.version = 0
location = /srv/mail/roundcube.sieve

* Required extensions (block: 1):

  0: body (id: 18)
  1: regex (id: 13)

* Main program (block: 2):

Address   Line  Code
00000000:       DEBUG BLOCK: 3
00000001:       EXTENSIONS [2]:
00000002:         body
00000004:         regex
00000006:    3: BODY
00000007:         BODY-TRANSFORM: TEXT
0000000b:         match type: regex
0000000d:         key list: STR[138] ".*[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРс...
0000009b:    3: JMPTRUE 6 [000000a2]
000000a0:    5: DISCARD
000000a1:    6: STOP
000000a2:    6: [End of code]


Performed actions:

  (none)

Implicit keep:

 * store message in folder: INBOX

In this case the email was stored in INBOX, but I expected it to be discarded. As I said, this behavior does not depend on the Dovecot or Pigeonhole version: I have tried Dovecot 2.3.7, 2.2.30.x, 2.3.9.3 and 2.3.10 with Pigeonhole 0.5.7.2, 0.5.9 and 0.5.10, both on a fresh install and running in a Docker container. The dovecot-sysreport was taken from the last one. What am I doing wrong? Is this a Pigeonhole bug or something similar?