------------------ Original ------------------
From: "dovecot" <dovecot-request@dovecot.org>;
Date: Tue, Mar 30, 2021 11:55 PM
To: "dovecot"<dovecot@dovecot.org>;
Subject: dovecot Digest, Vol 215, Issue 77

Send dovecot mailing list submissions to
dovecot@dovecot.org

To subscribe or unsubscribe via the World Wide Web, visit
https://dovecot.org/mailman/listinfo/dovecot
or, via email, send a message with subject or body 'help' to
dovecot-request@dovecot.org

You can reach the person managing the list at
dovecot-owner@dovecot.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of dovecot digest..."


Today's Topics:

   1. special characters in FTS Solr fail (piuma@piumalab.org)
   2. json_parse_number broken by compiler optimization
      (Christian Ehrhardt)
   3. Re: json_parse_number broken by compiler optimization (Aki Tuomi)


----------------------------------------------------------------------

Message: 1
Date: Tue, 30 Mar 2021 16:20:56 +0200
From: "piuma@piumalab.org" <piuma@piumalab.org>
To: dovecot@dovecot.org
Subject: special characters in FTS Solr fail
Message-ID: <3c779646-e2fc-9f51-063c-a8816450dbfd@piumalab.org>
Content-Type: text/plain; charset="utf-8"

Hi, I want to report a bug about searching.
My dovecot version is 2.3.10.1 (a3d0e1171) and use Solr 7.7.2 for full text search.
The issue appear when try to search using solr special characters. For example
if I have emails with the subject that contain the only character "(" open
parenthesis, then try to search it I receive zero results.

In the Solr log seems the "(" is correctly excaped as "\(":

2021-03-30 15:14:20.304 INFO  (qtp333683827-20) [   x:dovecot] o.a.s.c.S.Request
[dovecot]  webapp=/solr path=/select
params={q={!lucene+q.op%3DAND}subject:\(&fl=uid,score&sort=uid+asc&fq=%2Bbox:6fff5821c5a2365df4060000d0d97f83+%2Buser:titolare01@kendama.it&rows=631&wt=xml}
hits=0 status=0 QTime=2

I provide dovecot-sysreport as attachment.

Best regards
   Danilo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dovecot-sysreport-pecbox02.par-tec.it-1617116347.tar.gz
Type: application/gzip
Size: 7395 bytes
Desc: not available
URL: <https://dovecot.org/pipermail/dovecot/attachments/20210330/08492366/attachment-0001.gz>

------------------------------

Message: 2
Date: Tue, 30 Mar 2021 17:53:27 +0200
From: Christian Ehrhardt <christian.ehrhardt@canonical.com>
To: dovecot@dovecot.org
Subject: json_parse_number broken by compiler optimization
Message-ID:
<CAATJJ0JB9e4cZrS=hiM0r=yF4-Fz693ZK3qLa41L-rYVfy8QbQ@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hi,
the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1
build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz
A coworker tried 2.3.14 but got the same result.

What fails is the json_parser build time test like:
  test-json-parser.c:161: Assert(#25) failed:
null_strcmp(json_output[pos].value, value) == 0

I was looking into that a bit more and what I found is that it is
dependent on the new toolchain
of gcc 10.2.0-1.

Not all calls to json_parse_* fail, e.g. the first one looks all good and passes
I was iterating the tests using a report function defined like

(gdb) define repcon
>c
>p pos
>p json_output[pos].type
>p type
>p json_output[pos].value
>p value
>call null_strcmp(json_output[pos].value, value)
>end

The first one to be bad was:
Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at
test-json-parser.c:161
161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos);
$84 = 25
$85 = JSON_TYPE_NUMBER
$86 = JSON_TYPE_NUMBER
$87 = 0x555555633b25 "-12.456"
$88 = 0x555555693110 ""
$89 = 45

Earlier and later parsing was happy, for example

Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at
test-json-parser.c:161
161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos);
$90 = 27
$91 = JSON_TYPE_NUMBER
$92 = JSON_TYPE_NUMBER
$93 = 0x555555633b32 "12.456e9"
$94 = 0x555555693110 "12.456e9"
$95 = 0
(gdb)


We have two things we compare here.
1. json_output[] which is a static define and for this value is
   67 ????{ JSON_TYPE_NUMBER, "-12.456" },
2. the return value that json_parse_next returns.
   25 ????"  \"sub2\":-12.456,\n"
  148 ????????????????ret = json_parse_next(parser, &type, &value);

I tried a non negative number and got success which is suspicious

Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at
test-json-parser.c:164
164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos);
$122 = 25
$123 = JSON_TYPE_NUMBER
$124 = JSON_TYPE_NUMBER
$125 = 0x555555633b2c "12.456"
$126 = 0x555555693110 "12.456"
$127 = 0
(gdb)


Also the return value otherwise LGTM, it is recognized as a number:
540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) {
(gdb) n
541 *type_r = JSON_TYPE_NUMBER;

So parsing worked one might think, but the value was not set.

(gdb) p *value_r
$833 = 0x555555693110 ""

It turns out json_parse_number is too optimized.
Too optimized for debugging further into json_parse_number.
But also optimization is causing the issue.
With -O0 or -O1 it works fine, even putting the lower optimization
level just around json_parse_number it is enough to avoid the issue.


Not a fix but "avoidance" for now:

--- a/src/lib/json-parser.c
+++ b/src/lib/json-parser.c
@@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa
  return 1;
 }

+#pragma GCC push_options
+#pragma GCC optimize ("-O0")
 static int json_parse_number(struct json_parser *parser, const char **value_r)
 {
  int ret;
@@ -415,6 +417,7 @@ static int json_parse_number(struct json
  *value_r = str_c(parser->value);
  return 1;
 }
+#pragma GCC pop_options

 static int json_parse_atom(struct json_parser *parser, const char *atom)
 {


I was trying to see what happened and tracked each str_append_c
that happened inside json_parse_number.
A breakpoint on str_append_c and backtraces showed that we clearly do the
"right" parsing. From the backtraces I've seen it does (just from positions):

For example
(gdb) bt
#0  str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44
#1  json_parse_digits (parser=<optimized out>) at json-parser.c:366
=> that is a digit

It does "- digit digit dot digit digit digit"

This perfectly matches the input value, but still the result stays empty:
(gdb) p (char*)(str->data)
$738 = 0x555555693110 ""

In comparison - when parsing the latter 12.456e9 which works I see two
odd things:
1. the value isn't cleared despite
    str_truncate(parser->value, 0);
   at the beginning of json_parse_number
2. the value (this time) is read exactly as I assumed

Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0,
parser=0x555555692ee0) at json-parser.c:394
394 str_truncate(parser->value, 0);
Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0)
at str.h:44
44 buffer_append_c(str, chr);
(gdb) p (char*)(str->data)
$741 = 0x555555693110 "sub3"

^^ not cleared, still has "sub3" even later on

Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0)
at str.h:44
44 buffer_append_c(str, chr);
(gdb) p (char*)(str->data)
$742 = 0x555555693110 "1ub3"

^^ reading left to right one char at a time "over the old value",
which didn't update the data.

So it seems in json_parse_number the ordering of
a) str_truncate(parser->value, 0);
b) the parsing itself
is messed up by optimization.

We already know from the above that reducing just this function to -O1 or less
avoids the issue. But what is it really - I don't know?

Maybe you have a better idea what is going on, the issue should be
reproducible when building
dovecot with the toolchain present on the latest Ubuntu being 21.04
(Hirsute) right now.

--
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd


------------------------------

Message: 3
Date: Tue, 30 Mar 2021 18:55:44 +0300 (EEST)
From: Aki Tuomi <aki.tuomi@open-xchange.com>
To: Christian Ehrhardt <christian.ehrhardt@canonical.com>,
dovecot@dovecot.org
Subject: Re: json_parse_number broken by compiler optimization
Message-ID:
<1219230463.3728.1617119744636@appsuite-dev-gw1.open-xchange.com>
Content-Type: text/plain; charset=UTF-8


> On 30/03/2021 18:53 Christian Ehrhardt <christian.ehrhardt@canonical.com> wrote:
>

> Hi,
> the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1
> build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz
> A coworker tried 2.3.14 but got the same result.
>
> What fails is the json_parser build time test like:
>   test-json-parser.c:161: Assert(#25) failed:
> null_strcmp(json_output[pos].value, value) == 0
>
> I was looking into that a bit more and what I found is that it is
> dependent on the new toolchain
> of gcc 10.2.0-1.
>
> Not all calls to json_parse_* fail, e.g. the first one looks all good and passes
> I was iterating the tests using a report function defined like
>
> (gdb) define repcon
> >c
> >p pos
> >p json_output[pos].type
> >p type
> >p json_output[pos].value
> >p value
> >call null_strcmp(json_output[pos].value, value)
> >end
>
> The first one to be bad was:
> Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at
> test-json-parser.c:161
> 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos);
> $84 = 25
> $85 = JSON_TYPE_NUMBER
> $86 = JSON_TYPE_NUMBER
> $87 = 0x555555633b25 "-12.456"
> $88 = 0x555555693110 ""
> $89 = 45
>
> Earlier and later parsing was happy, for example
>
> Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at
> test-json-parser.c:161
> 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos);
> $90 = 27
> $91 = JSON_TYPE_NUMBER
> $92 = JSON_TYPE_NUMBER
> $93 = 0x555555633b32 "12.456e9"
> $94 = 0x555555693110 "12.456e9"
> $95 = 0
> (gdb)
>
>
> We have two things we compare here.
> 1. json_output[] which is a static define and for this value is
>    67 ????{ JSON_TYPE_NUMBER, "-12.456" },
> 2. the return value that json_parse_next returns.
>    25 ????"  \"sub2\":-12.456,\n"
>   148 ????????????????ret = json_parse_next(parser, &type, &value);
>
> I tried a non negative number and got success which is suspicious
>
> Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at
> test-json-parser.c:164
> 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos);
> $122 = 25
> $123 = JSON_TYPE_NUMBER
> $124 = JSON_TYPE_NUMBER
> $125 = 0x555555633b2c "12.456"
> $126 = 0x555555693110 "12.456"
> $127 = 0
> (gdb)
>
>
> Also the return value otherwise LGTM, it is recognized as a number:
> 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) {
> (gdb) n
> 541 *type_r = JSON_TYPE_NUMBER;
>
> So parsing worked one might think, but the value was not set.
>
> (gdb) p *value_r
> $833 = 0x555555693110 ""
>
> It turns out json_parse_number is too optimized.
> Too optimized for debugging further into json_parse_number.
> But also optimization is causing the issue.
> With -O0 or -O1 it works fine, even putting the lower optimization
> level just around json_parse_number it is enough to avoid the issue.
>
>
> Not a fix but "avoidance" for now:
>
> --- a/src/lib/json-parser.c
> +++ b/src/lib/json-parser.c
> @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa
>   return 1;
>  }
>
> +#pragma GCC push_options
> +#pragma GCC optimize ("-O0")
>  static int json_parse_number(struct json_parser *parser, const char **value_r)
>  {
>   int ret;
> @@ -415,6 +417,7 @@ static int json_parse_number(struct json
>   *value_r = str_c(parser->value);
>   return 1;
>  }
> +#pragma GCC pop_options
>
>  static int json_parse_atom(struct json_parser *parser, const char *atom)
>  {
>
>
> I was trying to see what happened and tracked each str_append_c
> that happened inside json_parse_number.
> A breakpoint on str_append_c and backtraces showed that we clearly do the
> "right" parsing. From the backtraces I've seen it does (just from positions):
>
> For example
> (gdb) bt
> #0  str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44
> #1  json_parse_digits (parser=<optimized out>) at json-parser.c:366
> => that is a digit
>
> It does "- digit digit dot digit digit digit"
>
> This perfectly matches the input value, but still the result stays empty:
> (gdb) p (char*)(str->data)
> $738 = 0x555555693110 ""
>
> In comparison - when parsing the latter 12.456e9 which works I see two
> odd things:
> 1. the value isn't cleared despite
>     str_truncate(parser->value, 0);
>    at the beginning of json_parse_number
> 2. the value (this time) is read exactly as I assumed
>
> Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0,
> parser=0x555555692ee0) at json-parser.c:394
> 394 str_truncate(parser->value, 0);
> Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0)
> at str.h:44
> 44 buffer_append_c(str, chr);
> (gdb) p (char*)(str->data)
> $741 = 0x555555693110 "sub3"
>
> ^^ not cleared, still has "sub3" even later on
>
> Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0)
> at str.h:44
> 44 buffer_append_c(str, chr);
> (gdb) p (char*)(str->data)
> $742 = 0x555555693110 "1ub3"
>
> ^^ reading left to right one char at a time "over the old value",
> which didn't update the data.
>
> So it seems in json_parse_number the ordering of
> a) str_truncate(parser->value, 0);
> b) the parsing itself
> is messed up by optimization.
>
> We already know from the above that reducing just this function to -O1 or less
> avoids the issue. But what is it really - I don't know?
>
> Maybe you have a better idea what is going on, the issue should be
> reproducible when building
> dovecot with the toolchain present on the latest Ubuntu being 21.04
> (Hirsute) right now.
>
> --
> Christian Ehrhardt
> Staff Engineer, Ubuntu Server
> Canonical Ltd

Thanks!

We'll look into this.

Aki


------------------------------

Subject: Digest Footer

_______________________________________________
dovecot mailing list
dovecot@dovecot.org
https://dovecot.org/mailman/listinfo/dovecot

------------------------------

End of dovecot Digest, Vol 215, Issue 77
****************************************