Panic: file dsync-brain-mailbox.c: line 358 (dsync_brain_sync_mailbox_deinit): assertion failed: (brain->failed || brain->sync_type == DSYNC_BRAIN_SYNC_TYPE_CHANGED)
Hello,
Here is a Panic that happened while doing some testing with two servers both running Dovecot v2.2.26 on CentOS 7.
These are test servers owning 32 accounts whose data were copied from our production server.
What I've done is:
server01# doveadm force-resync -A '*'
server01# doveadm replicator replicate -f '*'
For 5 accounts I obtained the following crash:
2016-10-28T14:09:43.236946+02:00 server01 dovecot: dsync-server(someuser): Panic: file dsync-brain-mailbox.c: line 358 (dsync_brain_sync_mailbox_deinit): assertion failed: (brain->failed || brain->sync_type == DSYNC_BRAIN_SYNC_TYPE_CHANGED)
2016-10-28T14:09:43.237441+02:00 server01 dovecot: dsync-server(someuser): Error: Raw backtrace: /usr/local/lib/dovecot/libdovecot.so.0(+0x8f7e0) [0x7f3d9318d7e0] -> /usr/local/lib/dovecot/libdovecot.so.0(+0x8f8be) [0x7f3d9318d8be] -> /usr/local/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f3d9312b9be] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x438243] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x438da7] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x4368be] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x436c71] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x44becf] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x4c) [0x7f3d931a0c3c] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0xe7) [0x7f3d931a1fd7] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0x25) [0x7f3d931a0cc5] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_run+0x38) [0x7f3d931a0e78] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x41fc7e] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x421256] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x433654] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x4c) [0x7f3d931a0c3c] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0xe7) [0x7f3d931a1fd7] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0x25) [0x7f3d931a0cc5] -> /usr/local/lib/dovecot/libdovecot.so.0(io_loop_run+0x38) [0x7f3d931a0e78] -> /usr/local/lib/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f3d93131a23] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x413c87] -> /lib64/libc.so.6(__libc_start_main+0xf5) [0x7f3d92d5db15] -> dovecot/doveadm-server 10.0.0.2 someuser slave_recv_mailbox [0x413d25]
2016-10-28T14:09:43.238013+02:00 server01 dovecot: dsync-server(someuser): Fatal: master: service(doveadm): child 96390 killed with signal 6 (core dumps disabled)
2016-10-28T14:09:43.505098+02:00 server01 dovecot: dsync-server(someuser): Error: read(server02.localdomain) failed: read(size=5807) failed: Connection reset by peer (last sent=mailbox_state, last recv=mailbox_state)
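To see which accounts hit this assertion, the log can be filtered on the function name and the user extracted from the "dsync-server(USER)" prefix. The following is only a rough sketch: it assumes the log lines have the same shape as the ones above, and both the helper name panicked_users and the log path in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: list the accounts whose dsync-server process hit the assertion.
# Assumption: each Panic entry is on its own line and looks like
#   "... dovecot: dsync-server(USER): Panic: file dsync-brain-mailbox.c: ..."
# panicked_users is a hypothetical helper name, not part of Dovecot.

panicked_users() {
    # $1: path to the log file to scan
    grep 'dsync_brain_sync_mailbox_deinit' "$1" \
        | sed -n 's/.*dsync-server(\([^)]*\)): Panic:.*/\1/p' \
        | sort -u
}
```

It could be called as `panicked_users /var/log/maillog` (the path is an assumption; use whatever file your syslog writes Dovecot messages to). The output is one user name per line, with duplicates removed.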
Regards, Gilles.
On 28 Oct 2016, at 15:36, Gilles Chauvin <gilles.chauvin@univ-rouen.fr> wrote:
2016-10-28T14:09:43.236946+02:00 server01 dovecot: dsync-server(someuser): Panic: file dsync-brain-mailbox.c: line 358 (dsync_brain_sync_mailbox_deinit): assertion failed: (brain->failed || brain->sync_type == DSYNC_BRAIN_SYNC_TYPE_CHANGED)
This code hasn't changed for quite a long time, so I don't think this is a new bug in 2.2.26. Can you reproduce it easily? If so, could you check whether the attached patch fixes it?
Hi Timo,
On 28/10/2016 16:28, Timo Sirainen wrote:
This code hasn't changed for quite a long time, so I don't think this is a new bug in 2.2.26. Can you reproduce it easily? If so, could you check whether the attached patch fixes it?
The last time we played with Dovecot's replication was during the v2.1 era, and we ended up not using it because of numerous bugs and serious issues.
Now, we are planning on migrating our Dovecot 2.2.18 VM to two physical servers running the latest release and we thought it would be a good idea to run some new tests, 4 years later, to see how it goes now! We started our new tests some days ago with v2.2.25. This explains why, if this problem isn't new, I wasn't able to report it sooner.
Back on topic: with your patch applied, running the same commands as before no longer seems to reproduce the problem.
I'll let you know if it shows up again.
Thanks, Regards, Gilles.