<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-text-html" lang="x-unicode">
<p>Hello!</p>
<p>I recently ran into a strange issue. I want to discard
mail that contains no Cyrillic characters, because I do not want
to receive email in any language other than Russian. I use the
rule below, but it does not work when the message text contains
the Unicode character U+2019 (’), the right single quotation mark.</p>
<pre><code>require ["body","regex"];

# rule:[Regexp test]
if not body :text :regex ".*[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ].*"
{
    discard;
    stop;
}</code></pre>
<p>I checked this on several versions of Dovecot and
Pigeonhole, and the behavior was the same in all cases. If I replace
U+2019 with, for instance, an ASCII apostrophe, the rule works as I
expect: the mail is discarded.<br>
</p>
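<p>One explanation I can test offline is byte-wise matching. This is only a guess at the mechanism, not confirmed Pigeonhole behavior: if the regex engine compared single bytes instead of Unicode code points, the character class would turn into a set of UTF-8 bytes, and some of those bytes also occur inside the encoding of U+2019. The Python sketch below compares the two interpretations; the helper names <code>class_bytes</code>, <code>bytewise_match</code> and <code>codepoint_match</code> are made up for this illustration.</p>
<pre>
```python
import re

# The character class from the rule: every Russian letter, lower and upper case.
CYRILLIC = "аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ"


def class_bytes(chars):
    """Return the set of individual UTF-8 bytes used by the given characters."""
    return {b for ch in chars for b in ch.encode("utf-8")}


def bytewise_match(text):
    """Hypothetical byte-oriented matcher (an assumption, not Pigeonhole code):
    true if any byte of the text is one of the bytes of the character class."""
    allowed = class_bytes(CYRILLIC)
    return any(b in allowed for b in text.encode("utf-8"))


def codepoint_match(text):
    """Code-point matcher: true only if the text has a real Cyrillic letter."""
    return re.search("[" + CYRILLIC + "]", text) is not None


if __name__ == "__main__":
    # ASCII apostrophe first, then U+2019.
    for sample in ("Test'test", "Test\u2019test"):
        print(repr(sample), bytewise_match(sample), codepoint_match(sample))
```
</pre>
<p>U+2019 is encoded in UTF-8 as the bytes E2 80 99. Byte 80 is also the second byte of "р" (D1 80) and byte 99 is the second byte of "Й" (D0 99), so a byte-oriented matcher would report a match for <b>Test’test</b> even though it contains no Cyrillic letter. That would make <code>not body</code> false and skip the discard, which is what I observe.</p>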
<p>Below is some output from the sieve-test utility.
The message body consists of a single phrase, <b>Test’test</b>.</p>
<br>
<p>First test: the string <b>Test'test</b>, with an ASCII
apostrophe instead of U+2019</p>
<pre>root@a4e4b17d33a1:/srv/mail# tail -2 1586937347.M574837P24389.vps.kveri.ru\,S\=1904\,W\=1944\:2\,S
Test'test
</pre>
sieve-test output<br>
<pre>* Script metadata (block: 0):
class = file
class.version = 0
location = /srv/mail/roundcube.sieve
* Required extensions (block: 1):
 0: body (id: 18)
 1: regex (id: 13)
* Main program (block: 2):
Address  Line Code
00000000:      DEBUG BLOCK: 3
00000001:      EXTENSIONS [2]:
00000002:        body
00000004:        regex
00000006:   3: BODY
00000007:        BODY-TRANSFORM: TEXT
0000000b:        match type: regex
0000000d:        key list: STR[138] ".*[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрР...
0000009b:   3: JMPTRUE 6 [000000a2]
000000a0:   5: DISCARD
000000a1:   6: STOP
000000a2:   6: [End of code]
Performed actions:
 * discard
Implicit keep:
 (none)
</pre>
In this case the rule works as I expect: the message is discarded.<br>
<p><br>
Second test: the string <b>Test’test</b>, with U+2019 (’) instead of the apostrophe</p>
<pre>root@a4e4b17d33a1:/srv/mail# tail -2 1586937347.M574837P24389.vps.kveri.ru\,S\=1904\,W\=1944\:2\,S
Test’test
</pre>
<p><br>
</p>
<p>sieve-test output<br>
</p>
<pre>* Script metadata (block: 0):
class = file
class.version = 0
location = /srv/mail/roundcube.sieve
* Required extensions (block: 1):
 0: body (id: 18)
 1: regex (id: 13)
* Main program (block: 2):
Address  Line Code
00000000:      DEBUG BLOCK: 3
00000001:      EXTENSIONS [2]:
00000002:        body
00000004:        regex
00000006:   3: BODY
00000007:        BODY-TRANSFORM: TEXT
0000000b:        match type: regex
0000000d:        key list: STR[138] ".*[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрР...
0000009b:   3: JMPTRUE 6 [000000a2]
000000a0:   5: DISCARD
000000a1:   6: STOP
000000a2:   6: [End of code]
Performed actions:
 (none)
Implicit keep:
 * store message in folder: INBOX
</pre>
<p>In this case the email was delivered to INBOX, although I expected it
to be discarded. As I said, this behavior does not depend on
the Dovecot or Pigeonhole version: I have tried Dovecot 2.3.7,
2.2.30.x, 2.3.9.3 and 2.3.10 with Pigeonhole 0.5.7.2, 0.5.9 and
0.5.10, both on a fresh install and in a Docker container. The
dovecot-sysreport was taken from the last one. What am I doing
wrong? Is this a Pigeonhole bug or something like that?</p>
<p><br>
</p>
</div>
</body>
</html>