On Wed, Mar 31, 2021 at 8:46 AM Christian Ehrhardt <christian.ehrhardt@canonical.com> wrote:
On Tue, Mar 30, 2021 at 9:21 PM Josef 'Jeff' Sipek <jeff.sipek@open-xchange.com> wrote:
On Tue, Mar 30, 2021 at 13:34:54 -0400, Josef 'Jeff' Sipek wrote:
On Tue, Mar 30, 2021 at 17:53:27 +0200, Christian Ehrhardt wrote:
Hi, the recent Ubuntu (re)builds uncovered an issue with dovecot 1:2.3.13+dfsg1-1 build log: https://launchpadlibrarian.net/529849650/buildlog_ubuntu-hirsute-amd64.dovec... A coworker tried 2.3.14 but got the same result.
What fails is the json_parser build time test like: test-json-parser.c:161: Assert(#25) failed: null_strcmp(json_output[pos].value, value) == 0
I was looking into that a bit more and what I found is that it is dependent on the new toolchain of gcc 10.2.0-1.
FWIW, I managed to reproduce it on FreeBSD with gcc 11, so the good news for you is that it isn't Ubuntu specific :)
I'll debug further.
The culprit seems to be LTO. If you disable LTO, everything should work just fine.
I've had LTO disabled and it has still shown the same effect (with my gcc 10.2.0-1). I'll give it a non-LTO retry and double check if it really changed the compile options accordingly. I'll let you know about that later on.
Indeed, I wonder what I tried yesterday in regard to LTO then .. :-/ I can confirm that disabling LTO fixes the issue for me as well and for now that should be a good mitigation until the root cause is found and fixed. Since it might help debugging the underlying problem with LTO here is another data point. With LTO enabled (and skipping the json-parser issues with my optimization trick) there is another testcase later that fails (but works with LTO disabled): test-istream-attachment.c:354: Assert failed: memcmp(data + sizeof(BINARY_TEXT_LONG)-1, BINARY_TEXT_SHORT, strlen(BINARY_TEXT_SHORT)) == 0 istream attachment ................................................... : FAILED Panic: file test-istream-attachment.c: line 395 (test_istream_attachment_extractor_one): assertion failed: (size >= prefix_len && memcmp(data, mail_broken_input_body_prefix, prefix_len) == 0) Error: Raw backtrace: ./test-istream-attachment(+0x4cd95) [0x55c0db91bd95] -> ./test-istream-attachment(backtrace_get+0x75) [0x55c0db91bf65] -> ./test-istream-attachment(+0x2a7fb) [0x55c0db8f97fb] -> ./test-istream-attachment(+0x2a837) [0x55c0db8f9837] -> ./test-istream-attachment(+0x13c5c) [0x55c0db8e2c5c] -> ./test-istream-attachment(+0x12d39) [0x55c0db8e1d39] -> ./test-istream-attachment(+0x1cca3) [0x55c0db8ebca3] -> ./test-istream-attachment(+0x2424d) [0x55c0db8f324d] -> ./test-istream-attachment(test_run+0x63) [0x55c0db8f32f3] -> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xd5) [0x7f60d232d565] -> ./test-istream-attachment(_start+0x2e) [0x55c0db8e7c2e] /bin/bash: line 1: 1650909 Aborted (core dumped) ./$bin
So, I think that'll be the "official" workaround - and a much better one than disabling optimization completely.
Well, "completely" is a bit hard, as I only disabled it on a single function and not the full build :-) But yeah if it really turns out to be LTO then disabling that will be fine as an avoidance until we've found the underlying root cause.
Now, the big question is, is something in the test breaking or is the parser itself somehow triggering this.
Jeff.
Thanks again for the report,
Jeff.
Not all calls to json_parse_* fail, e.g. the first one looks all good and passes I was iterating the tests using a report function defined like
(gdb) define repcon
c p pos p json_output[pos].type p type p json_output[pos].value p value call null_strcmp(json_output[pos].value, value) end
The first one to be bad was: Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at test-json-parser.c:161 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); $84 = 25 $85 = JSON_TYPE_NUMBER $86 = JSON_TYPE_NUMBER $87 = 0x555555633b25 "-12.456" $88 = 0x555555693110 "" $89 = 45
Earlier and later parsing was happy, for example
Breakpoint 2, test_json_parser_success (full_size=<optimized out>) at test-json-parser.c:161 161 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); $90 = 27 $91 = JSON_TYPE_NUMBER $92 = JSON_TYPE_NUMBER $93 = 0x555555633b32 "12.456e9" $94 = 0x555555693110 "12.456e9" $95 = 0 (gdb)
We have two things we compare here. 1. json_output[] which is a static define and for this value is 67 »···{ JSON_TYPE_NUMBER, "-12.456" }, 2. the return value that json_parse_next returns. 25 »···" \"sub2\":-12.456,\n" 148 »···»···»···»···ret = json_parse_next(parser, &type, &value);
I tried a non negative number and got success which is suspicious
Breakpoint 5, test_json_parser_success (full_size=<optimized out>) at test-json-parser.c:164 164 test_assert_idx(null_strcmp(json_output[pos].value, value) == 0, pos); $122 = 25 $123 = JSON_TYPE_NUMBER $124 = JSON_TYPE_NUMBER $125 = 0x555555633b2c "12.456" $126 = 0x555555693110 "12.456" $127 = 0 (gdb)
Also the return value otherwise LGTM, it is recognized as a number: 540 } else if ((ret = json_parse_number(parser, value_r)) >= 0) { (gdb) n 541 *type_r = JSON_TYPE_NUMBER;
So parsing worked one might think, but the value was not set.
(gdb) p *value_r $833 = 0x555555693110 ""
It turns out json_parse_number is too optimized. Too optimized for debugging further into json_parse_number. But also optimization is causing the issue. With -O0 or -O1 it works fine, even putting the lower optimization level just around json_parse_number it is enough to avoid the issue.
Not a fix but "avoidance" for now:
--- a/src/lib/json-parser.c +++ b/src/lib/json-parser.c @@ -386,6 +386,8 @@ static int json_parse_int(struct json_pa return 1; }
+#pragma GCC push_options +#pragma GCC optimize ("-O0") static int json_parse_number(struct json_parser *parser, const char **value_r) { int ret; @@ -415,6 +417,7 @@ static int json_parse_number(struct json *value_r = str_c(parser->value); return 1; } +#pragma GCC pop_options
static int json_parse_atom(struct json_parser *parser, const char *atom) {
I was trying to see what happened and tracked each str_append_c that happened inside json_parse_number. A breakpoint on str_append_c and backtraces showed that we clearly do the "right" parsing. From the backtraces I've seen it does (just from positions):
For example (gdb) bt #0 str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 #1 json_parse_digits (parser=<optimized out>) at json-parser.c:366 => that is a digit
It does "- digit digit dot digit digit digit"
This perfectly matches the input value, but still the result stays empty: (gdb) p (char*)(str->data) $738 = 0x555555693110 ""
In comparison - when parsing the latter 12.456e9 which works I see two odd things: 1. the value isn't cleared despite str_truncate(parser->value, 0); at the beginning of json_parse_number 2. the value (this time) is read exactly as I assumed
Breakpoint 17, json_parse_number (value_r=0x7fffffffe1a0, parser=0x555555692ee0) at json-parser.c:394 394 str_truncate(parser->value, 0); Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 44 buffer_append_c(str, chr); (gdb) p (char*)(str->data) $741 = 0x555555693110 "sub3"
^^ not cleared, still has "sub3" even later on
Breakpoint 18, str_append_c (chr=<optimized out>, str=0x5555556930d0) at str.h:44 44 buffer_append_c(str, chr); (gdb) p (char*)(str->data) $742 = 0x555555693110 "1ub3"
^^ reading left to right one char at a time "over the old value", which didn't update the data.
So it seems in json_parse_number the ordering of a) str_truncate(parser->value, 0); b) the parsing itself is messed up by optimization.
We already know from the above that reducing just this function to -O1 or less avoids the issue. But what is it really - I don't know?
Maybe you have a better idea what is going on, the issue should be reproducible when building dovecot with the toolchain present on the latest Ubuntu being 21.04 (Hirsute) right now.
-- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd
-- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd
-- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd