json_parse_number broken by compiler optimization

Christian Ehrhardt christian.ehrhardt at canonical.com
Wed Mar 31 09:46:58 EEST 2021


On Tue, Mar 30, 2021 at 9:21 PM Josef 'Jeff' Sipek
<jeff.sipek at open-xchange.com> wrote:
>
> On Tue, Mar 30, 2021 at 13:34:54 -0400, Josef 'Jeff' Sipek wrote:
> > On Tue, Mar 30, 2021 at 17:53:27 +0200, Christian Ehrhardt wrote:
> > > Hi,
> > > the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1
> > > build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovecot_1%3A2.3.13+dfsg1-1build1_BUILDING.txt.gz
> > > A coworker tried 2.3.14 but got the same result.
> > >
> > > What fails is the json_parser build time test like:
> > >   test-json-parser.c:161: Assert(#25) failed:
> > > null_strcmp(json_output[pos].value, value) == 0
> > >
> > > I was looking into that a bit more and what I found is that it is
> > > dependent on the new toolchain
> > > of gcc 10.2.0-1.
> >
> > FWIW, I managed to reproduce it on FreeBSD with gcc 11, so the good news for
> > you is that it isn't Ubuntu specific :)
> >
> > I'll debug further.
>
> The culprit seems to be LTO.  If you disable LTO, everything should work
> just fine.

I've had LTO disabled and it has still shown the same effect (with my
gcc 10.2.0-1).
I'll give it a non-LTO retry and double check if it really changed the
compile options accordingly.
I'll let you know about that later on.

>  So, I think that'll be the "official" workaround - and a much
> better one than disabling optimization completely.

Well, "completely" is a bit hard, as I only disabled it on a single
function and not the full build :-)
But yeah if it really turns out to be LTO then disabling that will be
fine as an avoidance until we've found the underlying root cause.

> Now, the big question is, is something in the test breaking or is the parser
> itself somehow triggering this.
>
> Jeff.
>
> >
> > Thanks again for the report,
> >
> > Jeff.
> >
> > >
> > > Not all calls to json_parse_* fail, e.g. the first one looks all good and passes
> > > I was iterating the tests using a report function defined like
> > >
> > > (gdb) define repcon
> > > >c
> > > >p pos
> > > >p json_output[pos].type
> > > >p type
> > > >p json_output[pos].value
> > > >p value
> > > >call null_strcmp(json_output[pos].value, value)
> > > >end
> > >
> > > The first one to be bad was:
> > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at
> > > test-json-parser.c:161
> > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos);
> > > $84 = 25
> > > $85 = JSON_TYPE_NUMBER
> > > $86 = JSON_TYPE_NUMBER
> > > $87 = 0x555555633b25 "-12.456"
> > > $88 = 0x555555693110 ""
> > > $89 = 45
> > >
> > > Earlier and later parsing was happy, for example
> > >
> > > Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at
> > > test-json-parser.c:161
> > > 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos);
> > > $90 = 27
> > > $91 = JSON_TYPE_NUMBER
> > > $92 = JSON_TYPE_NUMBER
> > > $93 = 0x555555633b32 "12.456e9"
> > > $94 = 0x555555693110 "12.456e9"
> > > $95 = 0
> > > (gdb)
> > >
> > >
> > > We have two things we compare here.
> > > 1. json_output[] which is a static define and for this value is
> > >    67 »···{ JSON_TYPE_NUMBER, "-12.456" },
> > > 2. the return value that json_parse_next returns.
> > >    25 »···"  \"sub2\":-12.456,\n"
> > >   148 »···»···»···»···ret = json_parse_next(parser, &type, &value);
> > >
> > > I tried a non negative number and got success which is suspicious
> > >
> > > Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at
> > > test-json-parser.c:164
> > > 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos);
> > > $122 = 25
> > > $123 = JSON_TYPE_NUMBER
> > > $124 = JSON_TYPE_NUMBER
> > > $125 = 0x555555633b2c "12.456"
> > > $126 = 0x555555693110 "12.456"
> > > $127 = 0
> > > (gdb)
> > >
> > >
> > > Also the return value otherwise LGTM, it is recognized as a number:
> > > 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) {
> > > (gdb) n
> > > 541 *type_r = JSON_TYPE_NUMBER;
> > >
> > > So parsing worked one might think, but the value was not set.
> > >
> > > (gdb) p *value_r
> > > $833 = 0x555555693110 ""
> > >
> > > It turns out json_parse_number is too optimized.
> > > Too optimized for debugging further into json_parse_number.
> > > But also optimization is causing the issue.
> > > With -O0 or -O1 it works fine, even putting the lower optimization
> > > level just around json_parse_number it is enough to avoid the issue.
> > >
> > >
> > > Not a fix but "avoidance" for now:
> > >
> > > --- a/src/lib/json-parser.c
> > > +++ b/src/lib/json-parser.c
> > > @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa
> > >   return 1;
> > >  }
> > >
> > > +#pragma GCC push_options
> > > +#pragma GCC optimize ("-O0")
> > >  static int json_parse_number(struct json_parser *parser, const char **value_r)
> > >  {
> > >   int ret;
> > > @@ -415,6 +417,7 @@ static int json_parse_number(struct json
> > >   *value_r = str_c(parser->value);
> > >   return 1;
> > >  }
> > > +#pragma GCC pop_options
> > >
> > >  static int json_parse_atom(struct json_parser *parser, const char *atom)
> > >  {
> > >
> > >
> > > I was trying to see what happened and tracked each str_append_c
> > > that happened inside json_parse_number.
> > > A breakpoint on str_append_c and backtraces showed that we clearly do the
> > > "right" parsing. From the backtraces I've seen it does (just from positions):
> > >
> > > For example
> > > (gdb) bt
> > > #0  str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44
> > > #1  json_parse_digits (parser=<optimized out>) at json-parser.c:366
> > > => that is a digit
> > >
> > > It does "- digit digit dot digit digit digit"
> > >
> > > This perfectly matches the input value, but still the result stays empty:
> > > (gdb) p (char*)(str->data)
> > > $738 = 0x555555693110 ""
> > >
> > > In comparison - when parsing the latter 12.456e9 which works I see two
> > > odd things:
> > > 1. the value isn't cleared despite
> > >     str_truncate(parser->value, 0);
> > >    at the beginning of json_parse_number
> > > 2. the value (this time) is read exactly as I assumed
> > >
> > > Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0,
> > > parser=0x555555692ee0) at json-parser.c:394
> > > 394 str_truncate(parser->value, 0);
> > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0)
> > > at str.h:44
> > > 44 buffer_append_c(str, chr);
> > > (gdb) p (char*)(str->data)
> > > $741 = 0x555555693110 "sub3"
> > >
> > > ^^ not cleared, still has "sub3" even later on
> > >
> > > Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0)
> > > at str.h:44
> > > 44 buffer_append_c(str, chr);
> > > (gdb) p (char*)(str->data)
> > > $742 = 0x555555693110 "1ub3"
> > >
> > > ^^ reading left to right one char at a time "over the old value",
> > > which didn't update the data.
> > >
> > > So it seems in json_parse_number the ordering of
> > > a) str_truncate(parser->value, 0);
> > > b) the parsing itself
> > > is messed up by optimization.
> > >
> > > We already know from the above that reducing just this function to -O1 or less
> > > avoids the issue. But what is it really - I don't know?
> > >
> > > Maybe you have a better idea what is going on, the issue should be
> > > reproducible when building
> > > dovecot with the toolchain present on the latest Ubuntu being 21.04
> > > (Hirsute) right now.
> > >
> > > --
> > > Christian Ehrhardt
> > > Staff Engineer, Ubuntu Server
> > > Canonical Ltd



-- 
Christian Ehrhardt
Staff Engineer, Ubuntu Server
Canonical Ltd


More information about the dovecot mailing list