17.11.2016 11:01 Sander Lepik wrote:
Hi!
We have two servers replicating to each other, and after upgrading to 2.2.26.0 we are seeing this in the logs:
Primary server:
Nov 17 09:37:39 mailhost01 dovecot: dsync-server(user@domain.ee): Panic: file dsync-brain-mailbox.c: line 814 (dsync_brain_slave_recv_mailbox): assertion failed: (memcmp(dsync_box->mailbox_guid, local_dsync_box.mailbox_guid, sizeof(dsync_box->mailbox_guid)) == 0)
Nov 17 09:37:39 mailhost01 dovecot: dsync-server(user@domain.ee): Error: Raw backtrace: /usr/lib/dovecot/libdovecot.so.0(+0x9438e) [0x7f3ccceb238e] -> /usr/lib/dovecot/libdovecot.so.0(+0x9447c) [0x7f3ccceb247c] -> /usr/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f3ccce4ba4e] -> dovecot/doveadm-server(dsync_brain_slave_recv_mailbox+0x3d8) [0x7f3ccd8f66f8] -> dovecot/doveadm-server(dsync_brain_run+0x650) [0x7f3ccd8f4110] -> dovecot/doveadm-server(+0x4143b) [0x7f3ccd8f443b] -> dovecot/doveadm-server(+0x5735f) [0x7f3ccd90a35f] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x4c) [0x7f3cccec6bdc] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0x10a) [0x7f3cccec809a] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0x25) [0x7f3cccec6c65] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_run+0x38) [0x7f3cccec6e08] -> dovecot/doveadm-server(+0x26b99) [0x7f3ccd8d9b99] -> dovecot/doveadm-server(+0x28efc) [0x7f3ccd8dbefc] -> dovecot/doveadm-server(+0x3daba) [0x7f3ccd8f0aba] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x4c) [0x7f3cccec6bdc] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0x10a) [0x7f3cccec809a] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0x25) [0x7f3cccec6c65] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_run+0x38) [0x7f3cccec6e08] -> /usr/lib/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f3ccce51f53] -> dovecot/doveadm-server(main+0x19f) [0x7f3ccd8ccdef] -> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f3ccca94b45] -> dovecot/doveadm-server(+0x19ea6) [0x7f3ccd8ccea6]
Nov 17 09:37:39 mailhost01 dovecot: dsync-server(user@domain.ee): Fatal: master: service(doveadm): child 42621 killed with signal 6 (core dumps disabled)
Looking at the logs more closely, we are fairly sure that the automatic full resync caused this error. But if that's the case, then the replicator's status is giving wrong info.
It shows that the full sync was successful and nothing failed. But the process crashed and there were no further retries in the logs, so how can it be successful?
-- Sander