Test failure on ARM: backtrace_append
Hi,
I am a Guix packager and a dovecot user on an ARM server (armhf). The Guix package fails to build because of a test error (see the last lines of the full build log: https://ci.guix.gnu.org/log/f5if2qb5rsqag3a6jy5vga1l9hm9pkj0-dovecot-2.3.9.3). As the name suggests, this is dovecot 2.3.9.3, the latest version. Previous versions built fine, as does the same version on x86 and x86_64. Note that the build on Guix's build farm was done in a QEMU VM, but I got the same test failure on real hardware.
I'm not sure what the best course of action is. Did I find a legitimate bug, or should I simply ignore that test for that architecture? Dovecot itself seems to be running fine when I ignore the tests.
Thanks for any guidance you could provide!
(sorry for the previous mail, client acted up...)
Can you provide a backtrace from the core file? This seems to be related to libunwind somehow.
Aki
I can try, but it will take me some time to rebuild. I forgot to keep the previous attempts. I think this is unrelated to libunwind since the asserts that fail are in the #elif case for libunwind. We don't build with libunwind (should we? What does it bring us?)
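For readers unfamiliar with the structure being discussed, the "#elif case" is a compile-time choice between backtrace backends. A minimal sketch of that kind of selection follows. It is not Dovecot's source: the names backtrace_backend and backend_is are hypothetical, HAVE_LIBUNWIND is an assumed macro name, and the conditions are simplified. HAVE_BACKTRACE_SYMBOLS and HAVE_WALKCONTEXT are the macros named later in this thread.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: report which backtrace backend the preprocessor
   would select. Simplified illustration, not Dovecot's actual code. */
static const char *backtrace_backend(void)
{
#if defined(HAVE_LIBUNWIND)            /* assumed macro name */
    return "libunwind";
#elif defined(HAVE_BACKTRACE_SYMBOLS) || defined(HAVE_WALKCONTEXT)
    return "libc";                     /* the "#elif case" */
#else
    return "none";                     /* no backtrace support compiled in */
#endif
}

/* Hypothetical helper: returns 1 if the selected backend has this name. */
static int backend_is(const char *name)
{
    assert(name != NULL);
    return strcmp(backtrace_backend(), name) == 0;
}
```

Compiled with none of the three macros defined, backend_is("none") is true; compiling the same file with -DHAVE_BACKTRACE_SYMBOLS selects the "libc" branch instead.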
Then it's somehow related to how backtrace handling works on ARM. Anyway, I cannot say anything without the core file.
Aki
Thanks, it took me two hours to produce the core dump, but here it is. I suppose putting it as an attachment is not ok for a mailing list, so I put it on my server instead (tell me if you want it as an attachment instead, I couldn't find that info). See https://lepiller.eu/files/core (400KB) and https://lepiller.eu/files/test-lib (2.7MB) (the program that dumped core). Thanks for your help!
On Fri, 21 Feb 2020 07:25:15 +0200 (EET), Aki Tuomi aki.tuomi@open-xchange.com wrote:
Thank you,
as I don't have ARM hardware at hand, can you provide the output of "bt full" from "gdb .test-lib core"?
Aki Tuomi
There you are (sorry for the long file names, that's how guix works, it's expected):
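[New LWP 21501]
Core was generated by `./test-lib'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xb6e24d30 in strchr () from /gnu/store/n7c20pjm6q1xq1gqjqzzys1yk9fy7n1k-glibc-2.29/lib/libc.so.6
(gdb) bt full
#0 0xb6e24d30 in strchr () from /gnu/store/n7c20pjm6q1xq1gqjqzzys1yk9fy7n1k-glibc-2.29/lib/libc.so.6
No symbol table info available.
#1 0xb6e26094 in strstr () from /gnu/store/n7c20pjm6q1xq1gqjqzzys1yk9fy7n1k-glibc-2.29/lib/libc.so.6
No symbol table info available.
#2 0x000144fe in test_backtrace_get () at test-backtrace.c:45
bt = 0x0
bt = <optimized out>
#3 test_backtrace () at test-backtrace.c:56
No locals.
#4 0x000353a4 in test_run_named_funcs (tests=tests@entry=0x6608c , match=match@entry=0x6ddcc "") at test-common.c:288
_data_stack_cur_id = 2
i = <optimized out>
#5 0x00035bf0 in test_run_named_with_fatals (match=0x6ddcc "", tests=0x6608c , fatals=0x6603c ) at test-common.c:370
No locals.
#6 0xb6de6426 in __libc_start_main () from /gnu/store/n7c20pjm6q1xq1gqjqzzys1yk9fy7n1k-glibc-2.29/lib/libc.so.6
No symbol table info available.
#7 0x00013268 in _start ()
No symbol table info available.
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)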
So the problem is that we got no backtrace at all, which means that your system is not working as expected. If libunwind is not installed (with headers), dovecot uses the libc backtrace_get path, which is apparently failing on your system. You can try adding -UHAVE_BACKTRACE_SYMBOLS and -UHAVE_WALKCONTEXT to EXTRA_CFLAGS; this will disable backtrace_get in dovecot with no other side effects.
So basically, ./configure ... EXTRA_CFLAGS="-UHAVE_BACKTRACE_SYMBOLS -UHAVE_WALKCONTEXT" or remove them from config.h before compiling.
Aki
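The gdb output above shows bt = 0x0 at test-backtrace.c:45 with the crash inside strstr, which is consistent with a backtrace call failing and the resulting NULL pointer then being searched. The sketch below shows, under stated assumptions, how a glibc-based backtrace string might be built and then checked safely. sketch_backtrace_get and backtrace_mentions are hypothetical names, not Dovecot functions; backtrace() and backtrace_symbols() are the glibc calls from <execinfo.h>, so the code assumes a glibc system.

```c
#include <assert.h>
#include <execinfo.h>   /* backtrace(), backtrace_symbols(): glibc */
#include <stdlib.h>
#include <string.h>

#define MAX_FRAMES 30

/* Hypothetical helper, not Dovecot's backtrace_get(): join the symbol
   strings of the current call stack into one malloc()ed string.
   Returns 0 on success and -1 if no frames or symbols were obtained.
   On failure *bt_r is left as NULL. */
static int sketch_backtrace_get(char **bt_r)
{
    void *frames[MAX_FRAMES];
    char **syms, *out;
    size_t len = 1;     /* room for the terminating NUL */
    int i, n;

    assert(bt_r != NULL);
    *bt_r = NULL;

    n = backtrace(frames, MAX_FRAMES);
    if (n <= 0)
        return -1;      /* the unwinder found nothing */

    syms = backtrace_symbols(frames, n);
    if (syms == NULL)
        return -1;

    for (i = 0; i < n; i++)
        len += strlen(syms[i]) + 4;     /* 4 = strlen(" -> ") */

    out = malloc(len);
    if (out == NULL) {
        free(syms);
        return -1;
    }
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        if (i > 0)
            strcat(out, " -> ");
        strcat(out, syms[i]);
    }
    free(syms);         /* one free() releases the whole symbol array */
    *bt_r = out;
    return 0;
}

/* NULL-safe search: calling strstr() with a NULL haystack is undefined
   behaviour, so a missing backtrace is reported as "not found". */
static int backtrace_mentions(const char *bt, const char *needle)
{
    return bt != NULL && strstr(bt, needle) != NULL;
}
```

On a platform where backtrace() yields no frames, sketch_backtrace_get returns -1 and backtrace_mentions(bt, "main") evaluates to 0 instead of crashing.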
participants (2)
- Aki Tuomi
- Julien Lepiller