Re: Auth process sometimes stop responding after upgrade
On Tuesday 18 September 2018 14:07:26 CEST, Aki Tuomi wrote:
If you are using systemd, create /etc/systemd/system/dovecot.service.d/limits.conf and put
[Service]
LimitCORE=infinity
and run
systemctl daemon-reload
systemctl restart dovecot
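For anyone following along on a systemd host, the steps above could be scripted roughly as below. This is only a sketch: the `write_core_dropin` helper and its optional directory argument are made up for illustration, while the drop-in path and `LimitCORE=infinity` come from the message above.

```shell
#!/bin/sh
# Hypothetical helper: write the drop-in that lifts the core size limit.
# $1 is the systemd unit directory (defaults to /etc/systemd/system).
write_core_dropin() {
    unit_dir="${1:-/etc/systemd/system}"
    dropin_dir="$unit_dir/dovecot.service.d"
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nLimitCORE=infinity\n' > "$dropin_dir/limits.conf"
}

# Usage (as root), followed by the reload and restart from the message:
#   write_core_dropin
#   systemctl daemon-reload
#   systemctl restart dovecot
```

Taking the directory as an argument only makes it possible to try the helper against a scratch directory before touching /etc.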
No, I'm on Debian 7, without systemd. Anyway, I've resolved the issue: I had to set fs.suid_dumpable BEFORE starting dovecot.
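On a box without systemd the sequence would look something like the sketch below. `fs.suid_dumpable` is the sysctl named above; the value 2, the `ulimit -c unlimited` line and the `suid_dump_enabled` helper are assumptions for illustration, not something stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given sysctl file holds a non-zero
# value, i.e. processes that changed privileges may still dump core.
# $1 defaults to the procfs entry for fs.suid_dumpable.
suid_dump_enabled() {
    value=$(cat "${1:-/proc/sys/fs/suid_dumpable}") || return 1
    [ "$value" != "0" ]
}

# Assumed order of operations (as root); the value 2 is an assumption:
#   sysctl -w fs.suid_dumpable=2
#   ulimit -c unlimited
#   ...start dovecot the usual way...
#   suid_dump_enabled || echo "fs.suid_dumpable is still 0" >&2
```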
I've got a core dump, and here is the backtrace. Let me know if you want the core file.
(gdb) bt
#0 0xb76e4428 in __kernel_vsyscall ()
#1 0xb74636c1 in raise () from /lib/i386-linux-gnu/i686/cmov/libc.so.6
#2 0xb7466af2 in abort () from /lib/i386-linux-gnu/i686/cmov/libc.so.6
#3 0xb76485ae in default_fatal_finish (type=<optimized out>, status=status@entry=0) at failures.c:201
#4 0xb7648641 in i_internal_fatal_handler (ctx=0xbf839cc0, format=0x805c274 "file %s: line %d (%s): assertion failed: (%s)", args=0xbf839ce4 "4\370\005\bI\004") at failures.c:670
#5 0xb75ce35e in i_panic (format=format@entry=0x805c274 "file %s: line %d (%s): assertion failed: (%s)") at failures.c:275
#6 0x080574f7 in doveadm_connection_deinit (_conn=_conn@entry=0xbf839d60) at doveadm-connection.c:1097
#7 0x08057f03 in doveadm_connection_input (conn=0x0) at doveadm-connection.c:1051
#8 0xb76613db in io_loop_call_io (io=0x82fb780) at ioloop.c:600
#9 0xb7662e1e in io_loop_handler_run_internal (ioloop=ioloop@entry=0x82bd648) at ioloop-epoll.c:223
#10 0xb7661496 in io_loop_handler_run (ioloop=ioloop@entry=0x82bd648) at ioloop.c:649
#11 0xb7661658 in io_loop_run (ioloop=0x82bd648) at ioloop.c:624
#12 0xb75da45e in master_service_run (service=0x82bd578, callback=callback@entry=0x804d360) at master-service.c:719
#13 0x0804cf5e in main (argc=1, argv=0x82bd300) at main.c:366
Simone Lazzaris, Qcom S.p.A.
simone.lazzaris@qcom.it | https://www.qcom.it
LinkedIn: https://www.linkedin.com/company/qcom-spa | Facebook: http://www.facebook.com/qcomspa
Can you provide 'bt full'
Aki
On Tuesday 18 September 2018 14:25:25 CEST, Aki Tuomi wrote:
Can you provide 'bt full'
Sure:
(gdb) bt full
#0 0xb76e4428 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb74636c1 in raise () from /lib/i386-linux-gnu/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7466af2 in abort () from /lib/i386-linux-gnu/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb76485ae in default_fatal_finish (type=<optimized out>, status=status@entry=0) at failures.c:201
backtrace = 0x82b5168 "/usr/local/lib/dovecot/libdovecot.so.0(+0xa15be) [0xb76485be] -> /usr/local/lib/dovecot/libdovecot.so.0(+0xa1641) [0xb7648641] -> /usr/local/lib/dovecot/libdovecot.so.0(i_fatal+0) [0xb75ce35e] -> dove"...
#4 0xb7648641 in i_internal_fatal_handler (ctx=0xbf839cc0, format=0x805c274 "file %s: line %d (%s): assertion failed: (%s)", args=0xbf839ce4 "4\370\005\bI\004") at failures.c:670
status = 0
#5 0xb75ce35e in i_panic (format=format@entry=0x805c274 "file %s: line %d (%s): assertion failed: (%s)") at failures.c:275
ctx = {type = LOG_TYPE_PANIC, exit_status = 0, timestamp = 0x0, timestamp_usecs = 0}
args = 0xbf839ce4 "4\370\005\bI\004"
#6 0x080574f7 in doveadm_connection_deinit (_conn=_conn@entry=0xbf839d60) at doveadm-connection.c:1097
conn = 0x82fb580
__FUNCTION__ = "doveadm_connection_deinit"
#7 0x08057f03 in doveadm_connection_input (conn=0x0) at doveadm-connection.c:1051
line = <optimized out>
ret = <optimized out>
#8 0xb76613db in io_loop_call_io (io=0x82fb780) at ioloop.c:600
ioloop = 0x82bd648
t_id = 2
__FUNCTION__ = "io_loop_call_io"
#9 0xb7662e1e in io_loop_handler_run_internal (ioloop=ioloop@entry=0x82bd648) at ioloop-epoll.c:223
ctx = 0x82c9a40
events = 0x0
event = 0x82c9a80
list = 0x82e1830
io = <optimized out>
tv = {tv_sec = 0, tv_usec = 236182}
events_count = 0
msecs = <optimized out>
ret = 1
i = <optimized out>
j = <optimized out>
call = <optimized out>
__FUNCTION__ = "io_loop_handler_run_internal"
#10 0xb7661496 in io_loop_handler_run (ioloop=ioloop@entry=0x82bd648) at ioloop.c:649
No locals.
#11 0xb7661658 in io_loop_run (ioloop=0x82bd648) at ioloop.c:624
__FUNCTION__ = "io_loop_run"
#12 0xb75da45e in master_service_run (service=0x82bd578, callback=callback@entry=0x804d360
I have realized that these machines used to run dovecot 2.1.x and were upgraded to 2.2.36 with "make install". I've found a library that was not upgraded (maybe it is missing from, or not compiled in, the new version).
-rw-r--r-- 1 root staff 1963428 Jun 17 2016 /usr/local/lib/dovecot/libdovecot-ssl.a
-rwxr-xr-x 1 root staff 1014 Jun 17 2016 /usr/local/lib/dovecot/libdovecot-ssl.la
lrwxrwxrwx 1 root staff 23 Jun 17 2016 /usr/local/lib/dovecot/libdovecot-ssl.so -> libdovecot-ssl.so.0.0.0
lrwxrwxrwx 1 root staff 23 Jun 17 2016 /usr/local/lib/dovecot/libdovecot-ssl.so.0 -> libdovecot-ssl.so.0.0.0
-rwxr-xr-x 1 root staff 1284527 Jun 17 2016 /usr/local/lib/dovecot/libdovecot-ssl.so.0.0.0
Can that be the ultimate cause?
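A quick way to hunt for other leftovers of this kind is to list everything in the library directory that is older than a file the new install is known to have rewritten. The sketch below is only an illustration: `list_older_than` is a made-up helper, and the choice of reference file is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: print the files in directory $1 whose modification
# time is older than that of reference file $2 (symlinks are followed).
list_older_than() {
    dir=$1
    ref=$2
    for f in "$dir"/*; do
        # -ot is true when $f has an older mtime than $ref
        if [ -f "$f" ] && [ "$f" -ot "$ref" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Usage, assuming libdovecot.so.0 was rewritten by the 2.2.36 install:
#   list_older_than /usr/local/lib/dovecot /usr/local/lib/dovecot/libdovecot.so.0
```

Anything it prints predates the reference file and is a candidate leftover from the old version; whether such a file is actually loaded by the new binaries still has to be checked separately.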
On 18 Sep 2018, at 15.15, Simone Lazzaris s.lazzaris@interactive.eu wrote:
I've got a core dump, and here is the backtrace. Let me know if you want the core file.
It would be useful if we're able to use it. Could you use https://dovecot.org/tools/core-tar.sh to send the core and the related binaries (e.g. just email to me)? The usage is explained at the beginning of the script. At least in theory we could then debug with the core file, although I've had some trouble even then.
But just in case the core doesn't work, could you also do:
bt full
fr 8
p *((struct doveadm_connection *)io->context)
p *((struct doveadm_connection *)io->context)->input
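If typing these interactively is inconvenient, the same commands can be fed to gdb from a file in batch mode. A rough sketch follows; the `write_gdb_commands` helper and all file paths are placeholders, while the four gdb commands are the ones requested above.

```shell
#!/bin/sh
# Hypothetical helper: write the requested gdb commands to file $1.
write_gdb_commands() {
    cat > "$1" <<'EOF'
bt full
fr 8
p *((struct doveadm_connection *)io->context)
p *((struct doveadm_connection *)io->context)->input
EOF
}

# Usage; the binary and core paths are placeholders:
#   write_gdb_commands /tmp/director.gdb
#   gdb -batch -x /tmp/director.gdb /path/to/director /path/to/core > gdb-output.txt 2>&1
```

Redirecting the output to a file keeps long lines intact, which avoids the wrapping that mail clients tend to apply to pasted backtraces.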
On Wednesday 19 September 2018 09:30:47 CEST, Timo Sirainen wrote:
But just in case the core doesn't work, could you also do:
bt full
fr 8
p *((struct doveadm_connection *)io->context)
p *((struct doveadm_connection *)io->context)->input
I'm sending you the tarball created with core-tar; and just in case:
root@imap-front4:/usr/local/src/dovecot-2.2.36# gdb ./src/director/.libs/director /var/tmp/core.10733
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see: http://www.gnu.org/software/gdb/bugs/...
Reading symbols from /usr/local/src/dovecot-2.2.36/src/director/.libs/director...done.
[New LWP 10733]
warning: Can't read pathname for load map: Input/output error.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/i686/cmov/libthread_db.so.1".
Core was generated by `dovecot/director'.
Program terminated with signal 6, Aborted.
#0 0xb76e4428 in __kernel_vsyscall ()
(gdb) bt full
#0 0xb76e4428 in __kernel_vsyscall ()
No symbol table info available.
#1 0xb74636c1 in raise () from /lib/i386-linux-gnu/i686/cmov/libc.so.6
No symbol table info available.
#2 0xb7466af2 in abort () from /lib/i386-linux-gnu/i686/cmov/libc.so.6
No symbol table info available.
#3 0xb76485ae in default_fatal_finish (type=<optimized out>, status=status@entry=0) at failures.c:201
backtrace = 0x82b5168 "/usr/local/lib/dovecot/libdovecot.so.0(+0xa15be) [0xb76485be] -> /usr/local/lib/dovecot/libdovecot.so.0(+0xa1641) [0xb7648641] -> /usr/local/lib/dovecot/libdovecot.so.0(i_fatal+0) [0xb75ce35e] -> dove"...
#4 0xb7648641 in i_internal_fatal_handler (ctx=0xbf839cc0, format=0x805c274 "file %s: line %d (%s): assertion failed: (%s)", args=0xbf839ce4 "4\370\005\bI\004") at failures.c:670
status = 0
#5 0xb75ce35e in i_panic (format=format@entry=0x805c274 "file %s: line %d (%s): assertion failed: (%s)") at failures.c:275
ctx = {type = LOG_TYPE_PANIC, exit_status = 0, timestamp = 0x0, timestamp_usecs = 0}
args = 0xbf839ce4 "4\370\005\bI\004"
#6 0x080574f7 in doveadm_connection_deinit (_conn=_conn@entry=0xbf839d60) at doveadm-connection.c:1097
conn = 0x82fb580
__FUNCTION__ = "doveadm_connection_deinit"
#7 0x08057f03 in doveadm_connection_input (conn=0x0) at doveadm-connection.c:1051
line = <optimized out>
ret = <optimized out>
#8 0xb76613db in io_loop_call_io (io=0x82fb780) at ioloop.c:600
ioloop = 0x82bd648
t_id = 2
__FUNCTION__ = "io_loop_call_io"
#9 0xb7662e1e in io_loop_handler_run_internal (ioloop=ioloop@entry=0x82bd648) at ioloop-epoll.c:223
ctx = 0x82c9a40
events = 0x0
event = 0x82c9a80
list = 0x82e1830
io = <optimized out>
tv = {tv_sec = 0, tv_usec = 236182}
events_count = 0
msecs = <optimized out>
ret = 1
i = <optimized out>
j = <optimized out>
call = <optimized out>
__FUNCTION__ = "io_loop_handler_run_internal"
#10 0xb7661496 in io_loop_handler_run (ioloop=ioloop@entry=0x82bd648) at ioloop.c:649
No locals.
#11 0xb7661658 in io_loop_run (ioloop=0x82bd648) at ioloop.c:624
__FUNCTION__ = "io_loop_run"
#12 0xb75da45e in master_service_run (service=0x82bd578, callback=callback@entry=0x804d360
On 19 Sep 2018, at 11.11, Simone Lazzaris s.lazzaris@interactive.eu wrote:
I'm sending you the tarball created with core-tar; and just in case:
Thanks, the core worked fine. Does the attached patch (on top of the previous one) help?
On 19 Sep 2018, at 11.30, Timo Sirainen tss@iki.fi wrote:
Thanks, the core worked fine. Does the attached patch (on top of the previous one) help?
Or here's a slightly different patch, although it should be basically the same fix. This includes the previous patch as well.
On 19 Sep 2018, at 11.42, Timo Sirainen tss@iki.fi wrote:
Or here's a slightly different patch, although it should be basically the same fix. This includes the previous patch as well.
No, forget about that patch. Looks like I forgot I had already fixed this crash, and I guess I was testing with master mainly, which is why I wasn't able to reproduce the crash now: https://github.com/dovecot/core/commit/c0583917fe760b2d901acf83387cc8edb6f99550
You nailed it!
I've applied the fix from commit c058 and that solved the issue. I'm no longer able to crash the director by stopping/starting a backend.
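For the record, one way to pull a single upstream commit into a source tree is to download it in patch form and apply it with `patch`. The sketch below is only an illustration: `commit_patch_url` is a made-up helper, the tree path is the one shown in the gdb session earlier, and whether the commit applies cleanly to a given release is not guaranteed.

```shell
#!/bin/sh
# Hypothetical helper: print the URL of a dovecot/core commit in patch form.
# GitHub serves a commit as a patch when ".patch" is appended to its URL.
commit_patch_url() {
    printf 'https://github.com/dovecot/core/commit/%s.patch\n' "$1"
}

# Usage, run from the top of the source tree (path is an assumption):
#   cd /usr/local/src/dovecot-2.2.36
#   curl -fsSL "$(commit_patch_url c0583917fe760b2d901acf83387cc8edb6f99550)" -o fix.patch
#   patch -p1 --dry-run < fix.patch && patch -p1 < fix.patch
#   make && make install
```

The dry run first means a patch that does not apply leaves the tree untouched.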
participants (3)
- Aki Tuomi
- Simone Lazzaris
- Timo Sirainen