Auth SEGV on sparc64, alignment problem?
Apologies first for using two addresses, but I can’t currently read my email at distal.com. :-)
I was previously running dovecot2-2.2.29.1_2 on FreeBSD 11 on sparc64. Trying to debug a problem I was having with one of my clients, I upgraded to dovecot-2.2.33.2_4 on that same server. However, I cannot connect now, log shows:
Feb 20 16:55:00 westeros dovecot: master: Dovecot v2.2.33.2 (d6601f4ec) starting up for imap, pop3, lmtp
Feb 20 16:55:31 westeros dovecot: auth: Fatal: master: service(auth): child 25395 killed with signal 11 (core dumped)
Feb 20 16:55:31 westeros dovecot: master: Error: service(auth): command startup failed, throttling for 2 secs
Feb 20 16:55:31 westeros dovecot: imap-login: Disconnected: Auth process broken (disconnected before auth was ready, waited 0 secs): user=<>, rip=2001::xxx, lip=2001:470:e24c:200::ae25, TLS handshaking, session=<ASDFSAFSADFSAD>
Feb 20 16:55:33 westeros dovecot: auth: Fatal: master: service(auth): child 25398 killed with signal 11 (core dumped)
Feb 20 16:55:33 westeros dovecot: master: Error: service(auth): command startup failed, throttling for 4 secs
Feb 20 16:55:33 westeros dovecot: imap-login: Disconnected: Auth process broken (disconnected before auth was ready, waited 2 secs): user=<>, rip=2001::xxx, lip=2001:470:e24c:200::ae25, session=<d46tyesdy5dsyd>
Feb 20 16:55:37 westeros dovecot: master: Error: service(auth): command startup failed, throttling for 8 secs
Feb 20 16:55:37 westeros dovecot: auth: Fatal: master: service(auth): child 25400 killed with signal 11 (core dumped)
Loading the core file, as described https://www.dovecot.org/bugreport.html , shows the error in libc somewhere:
(gdb) bt full
#0 __unaligned_load (
    p=0x617070656e640e6d <Address 0x617070656e640e6d out of bounds>, size=4)
    at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap_align.c:45
        val = 0
        i = 0
#1 0x00000000109f9f6c in __unaligned_fixup (uf=0x7fdffffee40)
    at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap_align.c:78
        addr = <value optimized out>
        val = <value optimized out>
        insn = 3254807616
        sig = <value optimized out>
#2 0x00000000109f9d50 in __sparc_utrap (uf=0x7fdffffee40)
    at /usr/src/release-11.1.0/lib/libc/sparc64/sys/__sparc_utrap.c:100
        sig = 272013984
#3 0x000000001094a10c in __sparc_utrap_gen () from /lib/libc.so.7
No symbol table info available.
#4 0x000000001094a10c in __sparc_utrap_gen () from /lib/libc.so.7
No symbol table info available.
Previous frame identical to this frame (corrupt stack?)
(gdb)
As this is a sparc64, with 8-byte alignment requirements, I’m guessing that’s the issue. Many a piece of software has failed to respect that and crashed. But, I’m not sure.
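To make the alignment question concrete, here is a minimal sketch that checks whether the faulting access from frame #0 of the backtrace is naturally aligned. The helper name is hypothetical, and the check assumes the usual rule that an access of N bytes must sit on an N-byte boundary.

```python
def is_naturally_aligned(address: int, size: int) -> bool:
    """True if an access of `size` bytes at `address` is naturally aligned."""
    return address % size == 0

# Faulting pointer and access size taken from frame #0 of the backtrace.
fault_address = 0x617070656E640E6D
access_size = 4

# The low two bits of the address are 01, so a 4-byte load is misaligned.
print(is_naturally_aligned(fault_address, access_size))

# For comparison, the `uf` argument in frame #1 sits on an 8-byte boundary.
print(is_naturally_aligned(0x7FDFFFFEE40, 8))
```

This only shows that the address is misaligned for a 4-byte load; it says nothing about whether the address is valid in the first place.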
Does anyone have any suggestions? I’ve built it locally (via ports), so if there are compiler options I can/should try, I certainly can try.
Thanks…
- Chris
Your core dump looks a bit broken. Since it seems to die instantly, can you try gdb /path/to/auth and just run it?
Aki
On 21.02.2018 02:08, Chris Ross wrote:
...
Sadly, that doesn’t help either. Over the past day, I’ve built and installed a different branch of the OS (stable/11, instead of release/11.1), to see if a new compiler/libc might change things. Sadly, it does not.
In the same situation now, auth fails immediately with signal 11. Running gdb on auth (from build dir, compiled -g -O2) shows something similar.
- Chris
# gdb work/dovecot-2.2.33.2/src/auth/.libs/auth
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc64-marcel-freebsd"...
(gdb) list
372 /* ask auth master to disconnect us */
373 auth_worker_client_send_shutdown();
374 }
375 }
376
377 int main(int argc, char *argv[])
378 {
379 int c;
380
381 master_service = master_service_init("auth", 0, &argc, &argv, "w");
(gdb) run
Starting program: /usr/ports/mail/dovecot/work/dovecot-2.2.33.2/src/auth/.libs/auth

Program received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x000000004022a380 in ?? ()
(gdb) bt
#0 0x000000004022a380 in ?? ()
#1 0x0000000000000008 in ?? ()
Previous frame identical to this frame (corrupt stack?)
(gdb)
On Feb 21, 2018, at 02:01, Aki Tuomi <aki.tuomi@dovecot.fi> wrote:
...
Hi!
Unfortunately we do not have a Sparc64 with any OS at hand. Maybe you could
break main
r
s
until it breaks?
Aki
On 22 February 2018 at 05:14 Chris Ross <cross+dovecot@distal.com> wrote:
...
Fancy, while not fun. :-) But thanks, that does work. Doing that, n’ing over calls to strcmp, it failed:
passdbs_init () at passdb.c:313
313 passdb_register_module(&passdb_ldap);
(gdb)
passdb_register_module (iface=0x280120) at passdb.c:33
33 old_iface = passdb_interface_find(iface->name);
(gdb)
passdb_interface_find (name=0x16fe60 "ldap") at passdb.c:20
20 array_foreach(&passdb_interfaces, ifaces) {
(gdb)
21 struct passdb_module_interface *iface = *ifaces;
(gdb)
23 if (strcmp(iface->name, name) == 0)
(gdb) n
20 array_foreach(&passdb_interfaces, ifaces) {
(gdb)
21 struct passdb_module_interface *iface = *ifaces;
(gdb)
23 if (strcmp(iface->name, name) == 0)
(gdb) n
20 array_foreach(&passdb_interfaces, ifaces) {
(gdb)
21 struct passdb_module_interface *iface = *ifaces;
(gdb)
23 if (strcmp(iface->name, name) == 0)
(gdb) n
20 array_foreach(&passdb_interfaces, ifaces) {
(gdb)
21 struct passdb_module_interface *iface = *ifaces;
(gdb)
23 if (strcmp(iface->name, name) == 0)
(gdb) n
20 array_foreach(&passdb_interfaces, ifaces) {
(gdb)
21 struct passdb_module_interface *iface = *ifaces;
(gdb)
23 if (strcmp(iface->name, name) == 0)
(gdb) n
20 array_foreach(&passdb_interfaces, ifaces) {
(gdb)
21 struct passdb_module_interface *iface = *ifaces;
(gdb)
23 if (strcmp(iface->name, name) == 0)
(gdb) n
20 array_foreach(&passdb_interfaces, ifaces) {
(gdb)
21 struct passdb_module_interface *iface = *ifaces;
(gdb)
23 if (strcmp(iface->name, name) == 0)
(gdb) n
20 array_foreach(&passdb_interfaces, ifaces) {
(gdb)
21 struct passdb_module_interface *iface = *ifaces;
(gdb)
23 if (strcmp(iface->name, name) == 0)
(gdb) n
20 array_foreach(&passdb_interfaces, ifaces) {
(gdb)
26 return NULL;
(gdb)
27 }
(gdb)
passdb_register_module (iface=0x280120) at passdb.c:34
34 if (old_iface != NULL && old_iface->verify_plain == NULL) {
(gdb)
37 } else if (old_iface != NULL) {
(gdb)
41 array_append(&passdb_interfaces, &iface, 1);
(gdb)
42 }
(gdb)
passdbs_init () at passdb.c:314
314 passdb_register_module(&passdb_sql);
(gdb)
315 passdb_register_module(&passdb_sia);
(gdb)
316 passdb_register_module(&passdb_static);
(gdb)
317 passdb_register_module(&passdb_oauth2);
(gdb)
318 }
(gdb)
main_preinit () at main.c:186
186 userdbs_init();
(gdb)
188 password_schemes_init();
(gdb)
190 services = read_global_settings();
(gdb)

Program received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x000000004022a380 in ?? ()
(gdb)
Cannot find bounds of current function
(gdb)
Next, I’ll stop before that point and be more careful about n’ing over things, but I’m passing on the context while I have it.
Thanks. More later.
- Chris
On Feb 22, 2018, at 02:25, Aki Tuomi <aki.tuomi@dovecot.fi> wrote:
...
Okay. Got to the next bit pretty quickly:
Breakpoint 4, auth_settings_read (service=0x0, pool=0x4104b020, output_r=0x7fdfffff6d0) at auth-settings.c:522
522 input.module = "auth";
(gdb) n
523 input.service = service;
(gdb) n
524 if (master_service_settings_read(master_service, &input,
(gdb) s

Program received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x000000004022a380 in ?? ()
(gdb)
So, why did it not step into master_service_settings_read? Trying again:
523 input.service = service;
(gdb) s
524 if (master_service_settings_read(master_service, &input,
(gdb) list
519
520 i_zero(&input);
521 input.roots = set_roots;
522 input.module = "auth";
523 input.service = service;
524 if (master_service_settings_read(master_service, &input,
525 output_r, &error) < 0)
526 i_fatal("Error reading configuration: %s", error);
527
528 pool_ref(pool);
(gdb) p input
$1 = {roots = 0x27fbd8, config_path = 0x0, preserve_environment = false, preserve_user = false, preserve_home = false, never_exec = false, use_sysexits = false, parse_full_config = false, module = 0x16ad70 "auth", service = 0x0, username = 0x0, local_ip = {family = 0, u = {ip6 = {__u6_addr = {__u6_addr8 = '\0' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, ip4 = {s_addr = 0}}}, remote_ip = {family = 0, u = {ip6 = {__u6_addr = {__u6_addr8 = '\0' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, ip4 = {s_addr = 0}}}, local_name = 0x0}
(gdb) p &input
$2 = (struct master_service_settings_input *) 0x7fdfffff5a8
(gdb) p output_r
$3 = (struct master_service_settings_output *) 0x7fdfffff6d0
(gdb) p &error
$4 = (const char **) 0x7fdfffff598
(gdb) p error
$6 = 0x10dbd0 "@\005?\204\001"
(gdb) p master_service
$5 = (struct master_service *) 0x41030000
(gdb) s

Program received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x000000004022a380 in ?? ()
(gdb)
Any ideas here? I’m not sure where to look next…
- Chris
On Feb 22, 2018, at 10:10, Chris Ross <cross+dovecot@distal.com> wrote:
...
(long gdb output, you’ve been warned)
Okay. So, the libdovecot shared library in /usr/local was stripped. Replaced that, and got farther. gdb walk below.
It looks to me like it gets deep into the OS’s vfork/execv where it catches a trap/crashes. Is this a problem I can catch, or something wrong with running in gdb? I notice this is a SIGTRAP, where the binary when run out of gdb gets a SIGSEGV, and that’s what a loaded core shows.
Thanks for any assistance.
- Chris
Breakpoint 3, master_service_exec_config (service=0x41030000, input=0x7fdfffff5a8) at master-service-settings.c:125
125 const char **conf_argv, *binary_path = service->argv[0];
(gdb) n
128 (void)t_binary_abspath(&binary_path);
(gdb) n
130 if (!service->keep_environment && !input->preserve_environment) {
(gdb)
131 if (input->preserve_home)
(gdb)
133 if (input->preserve_user)
(gdb)
135 if ((service->flags & MASTER_SERVICE_FLAG_STANDALONE) != 0)
(gdb)
136 master_service_import_environment("LOG_STDERR_TIMESTAMP");
(gdb)
140 if (getenv(DOVECOT_PRESERVE_ENVS_ENV) == NULL)
(gdb)
146 if (input->use_sysexits)
(gdb)
150 i = 0;
(gdb)
151 argv_max_count = 11 + (service->argc + 1) + 1;
(gdb)
152 conf_argv = t_new(const char *, argv_max_count);
(gdb)
153 conf_argv[i++] = DOVECOT_CONFIG_BIN_PATH;
(gdb)
154 if (input->service != NULL) {
(gdb)
158 conf_argv[i++] = "-c";
(gdb)
159 conf_argv[i++] = service->config_path;
(gdb)
160 if (input->module != NULL) {
(gdb)
161 conf_argv[i++] = "-m";
(gdb)
162 conf_argv[i++] = input->module;
(gdb)
163 if (service->want_ssl_settings) {
(gdb)
168 if (input->parse_full_config)
(gdb)
171 conf_argv[i++] = "-e";
(gdb)
172 conf_argv[i++] = binary_path;
(gdb)
173 memcpy(conf_argv+i, service->argv + 1,
(gdb)
175 i += service->argc;
(gdb)
177 i_assert(i < argv_max_count);
(gdb)
178 execv_const(conf_argv[0], conf_argv);
(gdb) p conf_argv
$3 = (const char **) 0x41016e48
(gdb) p conf_argv[0]
$4 = 0x4064f6d8 "/usr/local/bin/doveconf"
(gdb) p *conf_argv
$5 = 0x4064f6d8 "/usr/local/bin/doveconf"
(gdb) s
execv_const (path=0x4064f6d8 "/usr/local/bin/doveconf", argv=0x41016e48) at execv-const.c:23
23 (void)execv(path, argv_drop_const(argv));
(gdb) p parth
No symbol "parth" in current context.
(gdb) p path
$6 = 0x4064f6d8 "/usr/local/bin/doveconf"
(gdb) s
argv_drop_const (argv=0x41016e48) at execv-const.c:13
13 for (count = 0; argv[count] != NULL; count++) ;
(gdb) p argv
$7 = (const char * const *) 0x41016e48
(gdb) p argv[0]
$8 = 0x4064f6d8 "/usr/local/bin/doveconf"
(gdb) p argv[1]
$9 = 0x4064f708 "-c"
(gdb) p argv[2]
$10 = 0x41040000 "/usr/local/etc/dovecot/dovecot.conf"
(gdb) p argv[3]
$11 = 0x4064f710 "-m"
(gdb) p argv[4]
$12 = 0x16ad70 "auth"
(gdb) p argv[5]
$13 = 0x4064f728 "-e"
(gdb) p argv[6]
$14 = 0x7fdfffffd18 "/usr/ports/mail/dovecot/work/stage/usr/local/libexec/dovecot/auth"
(gdb) p argv[7]
$15 = 0x0
(gdb) n
15 ret = t_new(char *, count + 1);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
18 return ret;
(gdb)
19 }
(gdb)

Program received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x000000004022a380 in ?? ()
(gdb) b argv_drop_const
Breakpoint 4 at 0x405d50b8: file execv-const.c, line 13.
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: /usr/ports/mail/dovecot/work/stage/usr/local/libexec/dovecot/auth
Error in re-setting breakpoint 3: No source file named master-service-settings.c.
Error in re-setting breakpoint 4: No source file named execv-const.c.

Breakpoint 3, master_service_exec_config (service=0x41030000, input=0x7fdfffff5a8) at master-service-settings.c:125
125 const char **conf_argv, *binary_path = service->argv[0];
(gdb) n
128 (void)t_binary_abspath(&binary_path);
(gdb) c
Continuing.

Breakpoint 4, argv_drop_const (argv=0x41016e48) at execv-const.c:13
13 for (count = 0; argv[count] != NULL; count++) ;
(gdb) p argv
$16 = (const char * const *) 0x41016e48
(gdb) p argv[1]
$17 = 0x4064f708 "-c"
(gdb) p argv[6]
$18 = 0x7fdfffffd18 "/usr/ports/mail/dovecot/work/stage/usr/local/libexec/dovecot/auth"
(gdb) p argv[7]
$19 = 0x0
(gdb) n
15 ret = t_new(char *, count + 1);
(gdb) n
16 for (i = 0; i < count; i++)
(gdb) p ret
$20 = (char **) 0x41016eb8
(gdb) p ret[0]
$21 = 0x0
(gdb) p ret[1]
$22 = 0x0
(gdb) n
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb) p i,ret[i]
$23 = 0x0
(gdb) p i
$24 = 0
(gdb) p ret[i]
$25 = 0x0
(gdb) p argv[i]
$26 = 0x4064f6d8 "/usr/local/bin/doveconf"
(gdb) n
16 for (i = 0; i < count; i++)
(gdb) p i
$27 = 0
(gdb) p ret[i]
$28 = 0x41016ef8 "/usr/local/bin/doveconf"
(gdb) p ret[i+1]
$29 = 0x0
(gdb) n
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb) n
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
17 ret[i] = t_strdup_noconst(argv[i]);
(gdb)
16 for (i = 0; i < count; i++)
(gdb)
18 return ret;
(gdb) p ret
$30 = (char **) 0x41016eb8
(gdb) p ret[]
A syntax error in expression, near `]'.
(gdb) p *ret
$31 = 0x41016ef8 "/usr/local/bin/doveconf"
(gdb) p *ret[]
A syntax error in expression, near `]'.
(gdb) p ret[0]
$32 = 0x41016ef8 "/usr/local/bin/doveconf"
(gdb) p ret[1]
$33 = 0x41016f10 "-c"
(gdb) p ret[2]
$34 = 0x41016f18 "/usr/local/etc/dovecot/dovecot.conf"
(gdb) p ret[3]
$35 = 0x41016f40 "-m"
(gdb) p ret[4]
$36 = 0x41016f48 "auth"
(gdb) p ret[5]
$37 = 0x41016f50 "-e"
(gdb) p ret[6]
$38 = 0x41016f58 "/usr/ports/mail/dovecot/work/stage/usr/local/libexec/dovecot/auth"
(gdb) p ret[7]
$39 = 0x0
(gdb) s
19 }
(gdb)
execv (name=0x4064f6d8 "/usr/local/bin/doveconf", argv=0x41016eb8)
at /usr/src/lib/libc/gen/exec.c:135
135 (void)_execve(name, argv, environ);
(gdb) p name
$40 = 0x4064f6d8 "/usr/local/bin/doveconf"
(gdb) p argv
$41 = (char * const *) 0x41016eb8
(gdb) p *argv
$42 = 0x41016ef8 "/usr/local/bin/doveconf"
(gdb) p environ
$43 = (char **) 0x4106e000
(gdb) p *environ
$44 = 0x4105f048 "DOVECOT_PRESERVE_ENVS=LOG_STDERR_TIMESTAMP"
(gdb) p environ[0]
$45 = 0x4105f048 "DOVECOT_PRESERVE_ENVS=LOG_STDERR_TIMESTAMP"
(gdb) p environ[1]
$46 = 0x4102f000 "COLUMNS=80"
(gdb) p environ[2]
$47 = 0x4102f020 "LINES=34"
(gdb) p environ[3]
$48 = 0x41028040 "LANG=en_US.UTF-8"
(gdb) p environ[4]
$49 = 0x4103b0a0 "PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin:/root/bin"
(gdb) p environ[5]
$50 = 0x41028060 "TERM=xterm-256color"
(gdb) p environ[6]
$51 = 0x4102f030 "LOGNAME=cross"
(gdb) p environ[7]
$52 = 0x4102f040 "USER=cross"
(gdb) p environ[8]
$53 = 0x4102f050 "USERNAME=cross"
(gdb) p environ[9]
$54 = 0x41028080 "MAIL=/var/mail/root"
(gdb) p environ[10]
$55 = 0x4102f060 "HOME=/root"
(gdb) p environ[11]
$56 = 0x4102f070 "SHELL=/bin/csh"
(gdb) p environ[12]
$57 = 0x4103b0f0 "SUDO_COMMAND=/usr/bin/gdb work/stage/usr/local/libexec/dovecot//auth"
(gdb) p environ[13]
$58 = 0x4102f080 "SUDO_USER=cross"
(gdb) p environ[14]
$59 = 0x4102f090 "SUDO_UID=1001"
(gdb) p environ[15]
$60 = 0x4102f0a0 "SUDO_GID=20"
(gdb) p environ[16]
$61 = 0x410280a0 "HOSTTYPE=FreeBSD"
(gdb) p environ[17]
$62 = 0x4102f0b0 "VENDOR=sun"
(gdb) p environ[18]
$63 = 0x4102f0c0 "OSTYPE=FreeBSD"
(gdb) p environ[19]
$64 = 0x4102f0d0 "MACHTYPE=sparc"
(gdb) p environ[20]
$65 = 0x410280c0 "PWD=/usr/ports/mail/dovecot"
(gdb) p environ[21]
$66 = 0x4102f0e0 "GROUP=wheel"
(gdb) p environ[22]
$67 = 0x410280e0 "HOST=westeros.distal.com"
(gdb) p environ[23]
$68 = 0x4102f0f0 "REMOTEHOST=2001"
(gdb) p environ[24]
$69 = 0x4102f100 "EDITOR=vi"
(gdb) p environ[25]
$70 = 0x4102f110 "PAGER=more"
(gdb) p environ[26]
$71 = 0x4102f120 "BLOCKSIZE=K"
(gdb) p environ[27]
$72 = 0x0
(gdb) s
134 {
(gdb) s
0x0000000040ab2800 in vfork () at /usr/src/lib/libc/gen/exec.c:133
133 execv(const char *name, char * const *argv)
(gdb)
0x0000000040ab2804 133 execv(const char *name, char * const *argv)
(gdb) print argv
No symbol "argv" in current context.
(gdb) s
execv (name=0x4064f6d8 "/usr/local/bin/doveconf", argv=0x41016eb8)
at /usr/src/lib/libc/gen/exec.c:135
135 (void)_execve(name, argv, environ);
(gdb) s
137 }
(gdb) s
135 (void)_execve(name, argv, environ);
(gdb) s

Program received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x000000004022a380 in ?? ()
(gdb)
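The argument vector inspected in the session above can be restated compactly. The following is only a sketch under stated assumptions: the function name and parameters are hypothetical, and it models just the branches the trace actually takes (lines 153 to 175 of master-service-settings.c as shown by gdb), not the branches that were skipped.

```python
def build_config_argv(config_bin, config_path, module, binary_path, extra_args=()):
    """Rebuild the argument list the gdb session shows being handed to execv.

    Hypothetical helper: the skipped branches (service, SSL settings,
    full-config parsing) are left out because their bodies never appear
    in the session.
    """
    argv = [config_bin, "-c", config_path]
    if module is not None:
        argv += ["-m", module]
    argv += ["-e", binary_path]
    argv += list(extra_args)  # corresponds to service->argv[1:], empty in this run
    return argv

# Values as printed by gdb for argv[0] through argv[6].
observed = [
    "/usr/local/bin/doveconf",
    "-c",
    "/usr/local/etc/dovecot/dovecot.conf",
    "-m",
    "auth",
    "-e",
    "/usr/ports/mail/dovecot/work/stage/usr/local/libexec/dovecot/auth",
]

rebuilt = build_config_argv(observed[0], observed[2], observed[4], observed[6])
print(rebuilt == observed)
```

Running the equivalent command by hand outside gdb may be a quicker way to see whether the crash happens before or after the exec.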
On Feb 22, 2018, at 10:21, Chris Ross <cross+dovecot@distal.com> wrote:
Okay. Got to the next bit pretty quickly:
524 if (master_service_settings_read(master_service, &input,
(gdb) list
519
520
(gdb) s

Program received signal SIGTRAP, Trace/breakpoint trap.
Cannot remove breakpoints because program is no longer writable.
It might be running in another process.
Further execution is probably impossible.
0x000000004022a380 in ?? ()
(gdb)
Any ideas here? I’m not sure where to look next…
On what specific hardware are you running FreeBSD/sparc64? I have some old Sun desktops lying around with UltraSPARC-III and UltraSPARC-IIIi processors. Maybe I need to power them up again so that we can run some tests on a big-endian machine ourselves.
Sami
On Tue, Feb 20, 2018 at 19:08:27 -0500, Chris Ross wrote:
Apologies first for using two addresses, but I can’t currently read my email at distal.com. :-)
I was previously running dovecot2-2.2.29.1_2 on FreeBSD 11 on sparc64. Trying to debug a problem I was having with one of my clients, I upgraded to dovecot-2.2.33.2_4 on that same server. However, I cannot connect now, log shows:
...
Loading the core file, as described https://www.dovecot.org/bugreport.html , shows the error in libc somewhere:
I read your other mails in this thread; can you run things as before and do a 'bt full' on the core file with the debug-symbol-enabled libdovecot? gdb seems to be catching the SIGTRAPs, which is making things a bit confusing.
(gdb) bt full #0 __unaligned_load ( p=0x617070656e640e6d <Address 0x617070656e640e6d out of bounds>, size=4)
This address looks like ASCII - "append\x0em", so my theory at the moment is:
(1) something clobbers a pointer
(2) the CPU attempts to execute a load from the address
(3) a utrap is generated to handle unaligned load
(4) the utrap code attempts to emulate the unaligned load
(5) the CPU fails to access the address since it is bogus, and a SIGSEGV is generated
Now, I have no idea why it'd first try to work around the alignment requirement before doing a quick sanity check and generating SIGSEGV to begin with, but that's my theory based on the info available so far. Hopefully, a stack trace from a core file will help.
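The ASCII observation is easy to reproduce. Here is a minimal sketch, with a hypothetical helper name, that renders the 64-bit pointer value as big-endian bytes (sparc64 is big-endian) and escapes anything non-printable.

```python
def pointer_as_text(value: int, width: int = 8) -> str:
    """Render a pointer value as big-endian bytes, escaping non-printables."""
    raw = value.to_bytes(width, byteorder="big")
    return "".join(chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02x}" for b in raw)

# Pointer value from frame #0 of the backtrace.
print(pointer_as_text(0x617070656E640E6D))
```

A value that decodes to mostly printable characters is a strong hint that string data has overwritten the pointer, which fits step (1) of the theory.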
Thanks,
Jeff.
-- If I have trouble installing Linux, something is wrong. Very wrong. - Linus Torvalds
On Feb 22, 2018, at 15:21, Josef 'Jeff' Sipek <jeff.sipek@dovecot.fi> wrote:
Loading the core file, as described https://www.dovecot.org/bugreport.html , shows the error in libc somewhere:
I read your other mails in this thread; can you run things as before and do a 'bt full' on the core file with the debug-symbol-enabled libdovecot? gdb seems to be catching the SIGTRAPs, which is making things a bit confusing.
(gdb) bt full #0 __unaligned_load ( p=0x617070656e640e6d <Address 0x617070656e640e6d out of bounds>, size=4)
No difference there. I changed the install process to not strip things, and manually copied all of the libs into /usr/local/lib/dovecot again, unstripped this time (I think libtool stripped them; I just rejiggered the makefiles and install-sh).
Loading a core from a SEGV shows:
Loaded symbols for /libexec/ld-elf.so.1
#0 __unaligned_load (
    p=0x706172736572690a <Address 0x706172736572690a out of bounds>, size=4)
    at /usr/src/lib/libc/sparc64/sys/__sparc_utrap_align.c:45
45 val = (val << 8) | p[i];
(gdb) bt full
#0 __unaligned_load (
    p=0x706172736572690a <Address 0x706172736572690a out of bounds>, size=4)
    at /usr/src/lib/libc/sparc64/sys/__sparc_utrap_align.c:45
        val = 0
        i = 0
#1 0x0000000040adb7cc in __unaligned_fixup (uf=0x7fdfffff110)
    at /usr/src/lib/libc/sparc64/sys/__sparc_utrap_align.c:78
        addr = <value optimized out>
        val = <value optimized out>
        insn = 3254806592
        sig = <value optimized out>
#2 0x0000000040adb5b0 in __sparc_utrap (uf=0x7fdfffff110)
    at /usr/src/lib/libc/sparc64/sys/__sparc_utrap.c:100
        sig = 16
#3 0x0000000040a2c1cc in __sparc_utrap_gen () from /lib/libc.so.7
No symbol table info available.
#4 0x0000000040a2c1cc in __sparc_utrap_gen () from /lib/libc.so.7
No symbol table info available.
Previous frame identical to this frame (corrupt stack?)
(gdb)
(As with the address you note below, this one is actually ASCII: “parseri\n”.)
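For readers unfamiliar with the fixup code, the statement gdb shows at line 45 assembles the value one byte at a time. The sketch below imitates that in Python against a small in-memory buffer. Only the loop body is taken from the backtrace; the function signature, the buffer, and the use of IndexError as a stand-in for the SIGSEGV are assumptions for illustration.

```python
def unaligned_load(memory: bytes, offset: int, size: int) -> int:
    """Assemble a big-endian value one byte at a time.

    The loop body mirrors the statement gdb shows at line 45,
    `val = (val << 8) | p[i];`. Everything else here is an assumption.
    """
    val = 0
    for i in range(size):
        val = (val << 8) | memory[offset + i]  # raises IndexError if out of bounds
    return val

memory = bytes([0x00, 0x12, 0x34, 0x56, 0x78, 0x00])

# A 4-byte load from odd offset 1, which a strict-alignment CPU would trap on.
print(hex(unaligned_load(memory, 1, 4)))

# A clobbered "pointer" far outside the buffer fails on the very first byte,
# analogous to the fault reported with i = 0 in frame #0.
try:
    unaligned_load(memory, 0x706172736572690A, 4)
except IndexError:
    print("out of bounds")
```

The point of the sketch is that the byte-wise emulation works for any valid address, so the crash comes from the address itself being garbage rather than from its alignment.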
This address looks like ASCII - "append\x0em", so my theory at the moment is:
(1) something clobbers a pointer
(2) the CPU attempts to execute a load from the address
(3) a utrap is generated to handle unaligned load
(4) the utrap code attempts to emulate the unaligned load
(5) the CPU fails to access the address since it is bogus, and a SIGSEGV is generated
Now, I have no idea why it'd first try to work around the alignment requirement before doing a quick sanity check and generating SIGSEGV to begin with, but that's my theory based on the info available so far. Hopefully, a stack trace from a core file will help.
Unfortunately it seems not to have. But, good catch on the pointer value there being ASCII data. Let me know if you have any other ideas.
- Chris
participants (4)
- Aki Tuomi
- Chris Ross
- Josef 'Jeff' Sipek
- Sami Ketola