<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
<div class="moz-text-html" lang="x-unicode">
<p>Hello!</p>
<p>I've recently run into a strange issue. I want to filter out
mail that does not contain any Cyrillic characters, because I would
rather not receive email in foreign languages other than Russian.
I'm using the rule below, but it does not work when the message
text contains the Unicode character U+2019 (’, right single quotation mark).</p>
<pre><code>require ["body","regex"];
# rule:[Regexp test]
if not body :text :regex ".*[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ].*"
{
    discard;
    stop;
}</code></pre>
<p>I checked this behavior on different versions of Dovecot and
Pigeonhole, and it was the same in all cases. If I replace U+2019
with, for instance, an ASCII apostrophe, the rule starts working as
I expect: the mail is discarded.<br>
</p>
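<p>To make the expected behavior explicit, here is a minimal Python sketch of what I think the rule should do when the text is matched character by character. The helper name <code>should_discard</code> is hypothetical and exists only for this illustration; this is not how Pigeonhole evaluates the rule.</p>

```python
import re

# Same letters as in the bracket expression of the Sieve rule.
CYRILLIC = "аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ"
CYRILLIC_RE = re.compile("[" + CYRILLIC + "]")


def should_discard(body_text: str) -> bool:
    """Hypothetical helper: True when the text contains no Cyrillic letter."""
    return CYRILLIC_RE.search(body_text) is None


# Both Latin-only samples should be discarded; the Cyrillic one should be kept.
for sample in ("Test'test", "Test\u2019test", "Тест"):
    print(repr(sample), "discard" if should_discard(sample) else "keep")
```

<p>With character-aware matching, both <b>Test'test</b> and <b>Test’test</b> end up discarded, which is the result I expect from the Sieve rule as well.</p>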
<p>Below is some information obtained from the sieve-test utility.
The message text consists of a single phrase, <b>Test’test</b>.</p>
<br>
<p>First test: the string <b>Test'test</b>, with an ASCII
apostrophe instead of U+2019</p>
<pre>root@a4e4b17d33a1:/srv/mail# tail -2 1586937347.M574837P24389.vps.kveri.ru\,S\=1904\,W\=1944\:2\,S
Test'test
</pre>
sieve-test output<br>
<pre>* Script metadata (block: 0):
class = file
class.version = 0
location = /srv/mail/roundcube.sieve
* Required extensions (block: 1):
0: body (id: 18)
1: regex (id: 13)
* Main program (block: 2):
Address Line Code
00000000: DEBUG BLOCK: 3
00000001: EXTENSIONS [2]:
00000002: body
00000004: regex
00000006: 3: BODY
00000007: BODY-TRANSFORM: TEXT
0000000b: match type: regex
0000000d: key list: STR[138] ".*[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРс...
0000009b: 3: JMPTRUE 6 [000000a2]
000000a0: 5: DISCARD
000000a1: 6: STOP
000000a2: 6: [End of code]
Performed actions:
* discard
Implicit keep:
(none)
</pre>
In this case the rule works as I expect.<br>
<p><br>
Second test: the string <b>Test’test</b>, with ’ (U+2019) instead of the apostrophe</p>
<pre>root@a4e4b17d33a1:/srv/mail# tail -2 1586937347.M574837P24389.vps.kveri.ru\,S\=1904\,W\=1944\:2\,S
Test’test
</pre>
<p><br>
</p>
<p>sieve-test output<br>
</p>
<pre>* Script metadata (block: 0):
class = file
class.version = 0
location = /srv/mail/roundcube.sieve
* Required extensions (block: 1):
0: body (id: 18)
1: regex (id: 13)
* Main program (block: 2):
Address Line Code
00000000: DEBUG BLOCK: 3
00000001: EXTENSIONS [2]:
00000002: body
00000004: regex
00000006: 3: BODY
00000007: BODY-TRANSFORM: TEXT
0000000b: match type: regex
0000000d: key list: STR[138] ".*[аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРс...
0000009b: 3: JMPTRUE 6 [000000a2]
000000a0: 5: DISCARD
000000a1: 6: STOP
000000a2: 6: [End of code]
Performed actions:
(none)
Implicit keep:
* store message in folder: INBOX
</pre>
<p>In this case the email was stored in INBOX, but I expected it
to be discarded. As I said, this behavior does not depend on the
Dovecot or Pigeonhole version: I've tried Dovecot 2.3.7,
2.2.30.x, 2.3.9.3 and 2.3.10 with Pigeonhole 0.5.7.2, 0.5.9 and
0.5.10, both on a fresh install and running in a Docker container.
The dovecot-sysreport was taken from the last one. What am I doing
wrong? Is this a Pigeonhole bug or something like that?</p>
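<p>One guess I can at least illustrate, without knowing whether Pigeonhole actually works this way: if the regex engine compares UTF-8 bytes rather than characters, the bracket expression becomes a set of single bytes, and U+2019 shares some of them. The Python sketch below only demonstrates that overlap; it is an assumption about the cause, not a confirmed diagnosis.</p>

```python
import re

CYRILLIC = "аАбБвВгГдДеЕёЁжЖзЗиИйЙкКлЛмМнНоОпПрРсСтТуУфФхХцЦчЧшШщЩъЪыЫьЬэЭюЮяЯ"
RIGHT_QUOTE = "\u2019"

# Every byte value that appears in the UTF-8 encoding of any listed letter.
class_bytes = {b for ch in CYRILLIC for b in ch.encode("utf-8")}

# U+2019 encodes to three bytes; check which of them fall into that set.
quote_bytes = RIGHT_QUOTE.encode("utf-8")
overlap = [hex(b) for b in quote_bytes if b in class_bytes]
print(quote_bytes, overlap)

# Simulate a byte-oriented engine: the class is built from raw bytes,
# so each byte of a multibyte letter is treated as a separate member.
byte_class = re.compile(b"[" + CYRILLIC.encode("utf-8") + b"]")
apostrophe_hit = byte_class.search("Test'test".encode("utf-8")) is not None
quote_hit = byte_class.search("Test\u2019test".encode("utf-8")) is not None
print(apostrophe_hit, quote_hit)
```

<p>In this simulation the byte-wise class finds a match in <b>Test’test</b> but not in <b>Test'test</b>, which would explain why only the apostrophe version is discarded. If that is what happens, the same problem should show up with other non-Cyrillic multibyte characters too.</p>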
<p><br>
</p>
</div>
</body>
</html>