Hi,
I'm a bit clueless and having issues with replication. Running doveadm dsync -u hans
manually works.
But using the following replication setup, I see coredumps.
Where to go next?
Interestingly, this does not happen for all users. For testing purposes I have only two users. For the first one, with about 20 messages, replication works. For the second one, with about 14k messages, it fails, and the failure seems to happen immediately after the replication attempt starts.
I first saw the replication crash with the Dovecot packages from the current Debian distro. In order to debug this, I'm now using 2.3.20, built from Git.
The replicator config boils down to this (the complete config is attached):
replication_full_sync_interval = 90d
replication_max_conns = 16
mail_plugins = $mail_plugins notify replication
plugin {
  mail_replica = tcps:smtp-mz.example.com:9090
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0666
  }
}
service aggregator {
  fifo_listener replication-notify-fifo {
    user = dovecot
    mode = 0666
  }
  unix_listener replication-notify {
    user = dovecot
    mode = 0666
  }
}
The stacktrace I get looks like this:
Stack trace of thread 729589:
#0  0x00007f9e4f2c8ce1 __GI_raise (libc.so.6 + 0x38ce1)
#1  0x00007f9e4f2b2537 __GI_abort (libc.so.6 + 0x22537)
#2  0x00007f9e4f5fe8c6 default_fatal_finish (libdovecot.so.0 + 0x558c6)
#3  0x00007f9e4f6ab601 i_internal_fatal_handler (libdovecot.so.0 + 0x102601)
#4  0x00007f9e4f5fe589 i_panic (libdovecot.so.0 + 0x55589)
#5  0x00007f9e4f5fe99e fd_set_nonblock (libdovecot.so.0 + 0x5599e)
#6  0x00005564cbe6a08d cmd_dsync_ibc_stream_init (doveadm-server + 0x3008d)
#7  0x00005564cbe6b772 cmd_dsync_run (doveadm-server + 0x31772)
#8  0x00005564cbe6d284 doveadm_mail_next_user (doveadm-server + 0x33284)
#9  0x00005564cbe6e4ba doveadm_mail_cmd_exec (doveadm-server + 0x344ba)
#10 0x00005564cbe7eb71 doveadm_cmd_run_ver2 (doveadm-server + 0x44b71)
#11 0x00005564cbe8300a doveadm_cmd_server_run_ver2 (doveadm-server + 0x4900a)
#12 0x00007f9e4f6c1799 io_loop_call_io (libdovecot.so.0 + 0x118799)
#13 0x00007f9e4f6c2e82 io_loop_handler_run_internal (libdovecot.so.0 + 0x119e82)
#14 0x00007f9e4f6c1840 io_loop_handler_run (libdovecot.so.0 + 0x118840)
#15 0x00007f9e4f6c1a00 io_loop_run (libdovecot.so.0 + 0x118a00)
#16 0x00007f9e4f6343a3 master_service_run (libdovecot.so.0 + 0x8b3a3)
#17 0x00005564cbe5d9a2 main (doveadm-server + 0x239a2)
#18 0x00007f9e4f2b3d0a __libc_start_main (libc.so.6 + 0x23d0a)
#19 0x00005564cbe5da2a _start (doveadm-server + 0x23a2a)
2023 Mar 28 00:09:28 clone doveadm(hans)<729589><TdSzNhcUImT1IQsAKQCViw>: Fatal: master: service(doveadm): child 729589 killed with signal 6 (core dumped)
Even with the coredump and gdb, I'm not getting any further.
-- Heiko