Different Replicator Crash??
I added the suggested "count--" and recompiled, and it now appears to work: the replication goes both ways, and eyeballing the e-mail on the two servers, they look the same. Good.
But part of the time I am still getting what looks very much like the same crash. Is there another "count--" I need to add to that file?
Jan 28 10:53:08 la dovecot[96175]: replicator: Panic: data stack: Out of memory when allocating 268435496 bytes
Jan 28 10:53:08 la dovecot[96175]: replicator: Error: Raw backtrace: #0 t_askpass[0x7fb404b0c0] -> #1 backtrace_append[0x7fb404b374] -> #2 backtrace_get[0x7fb404b510] -> #3 execvp_const[0x7fb4057ba4] -> #4 i_syslog_fatal_handler[0x7fb4058510] -> #5 i_panic[0x7fb3fa6808] -> #6 t_pop_pass_str[0x7fb4050eb4] -> #7 connection_deinit[0x7fb4056430] -> #8 pool_datastack_create[0x7fb407e7b0] -> #9 array_bsearch_insert_pos_i[0x7fb40446d0] -> #10 t_base64_scheme_decode[0x7fb404cb24] -> #11 buffer_append[0x7fb404d3d0] -> #12 replicator_queue_push[0x558c4b4f40] -> #13 replicator_brain_init[0x558c4b53e4] -> #14 _start[0x558c4b3d70] -> #15 _start[0x558c4b3f04] -> #16 io_loop_call_io[0x7fb4072f20] -> #17 io_loop_handler_run_internal[0x7fb4075450] -> #18 io_loop_handler_run[0x7fb4075680] -> #19 io_loop_run[0x7fb4075944] -> #20 master_service_run[0x7fb3fd84b0] -> #21 main[0x558c4b3930] -> #22 __libc_init_first[0x7fb3dd76d0] -> #23 __libc_start_main[0x7fb3dd7780] -> #24 _start[0x558c4b3b80]
Jan 28 10:53:08 la dovecot[96175]: replicator: Fatal: master: service(replicator): child 96714 killed with signal 6 (core dumps disabled - https://dovecot.org/bugreport.html#coredumps)
Thanks,
-kb
After adding the patch you mentioned I no longer have replicator out of memory. Are you sure you patched both sides?
31.01.2025 22:27, Kent Borg via dovecot writes:
I added the suggested "count--" and recompiled, and it now appears to work: the replication goes both ways, and eyeballing the e-mail on the two servers, they look the same. Good.
But part of the time I am still getting what looks very much like the same crash. Is there another "count--" I need to add to that file?
On 2/2/25 8:00 PM, Dmitry Melekhov via dovecot wrote:
After adding the patch you mentioned I no longer have replicator out of memory.
The syncing now appears to work correctly and I *mostly* don't get the error. I used to get the error, I think, with every sync. This new error? I didn't know it was even happening until I saw my logwatch e-mails reporting a few every day.
Are you sure you patched both sides?
No, I did not patch the far side, just the side where I configured the replicator to run. (Can the replicator run on both sides? Do they play nicely if both are replicating to the other?)
Thanks,
-kb
03.02.2025 22:21, Kent Borg writes:
On 2/2/25 8:00 PM, Dmitry Melekhov via dovecot wrote:
After adding the patch you mentioned I no longer have replicator out of memory.
The syncing now appears to work correctly and I *mostly* don't get the error. I used to get the error, I think, with every sync. This new error? I didn't know it was even happening until I saw my logwatch e-mails reporting a few every day.
I'd check whether this problem is specific to certain mailboxes.
Or maybe limit the number of replication connections, as was usually suggested before this patch.
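For reference, a sketch of what "limit the number of replication connections" looks like in dovecot.conf, assuming the 2.3-era setting name replication_max_conns (verify the name on your build with `doveconf -d | grep replication`):

```
# dovecot.conf (sketch): cap concurrent dsync replication connections.
# The 2.3 default is 10; lowering it reduces replicator memory pressure.
replication_max_conns = 2
```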
Are you sure you patched both sides?
No, I did not patch the far side, just the side where I configured the replicator to run. (Can the replicator run on both sides? Do they play nicely if both are replicating to the other?)
Yes, it can run on both sides.
Don't know how it works though...
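For what it's worth, the documented 2.3 replication setup is symmetric: each server points mail_replica at the other. A sketch, with placeholder hostnames:

```
# On server A (sketch; hostnames are placeholders):
mail_plugins = $mail_plugins notify replication
plugin {
  mail_replica = tcp:server-b.example.com
}

# On server B, the mirror image:
mail_plugins = $mail_plugins notify replication
plugin {
  mail_replica = tcp:server-a.example.com
}
```

Note that with this setup the out-of-memory bug would presumably need patching on both servers, since both run a replicator.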
On 2/3/25 8:19 PM, Dmitry Melekhov via dovecot wrote:
Are you sure you patched both sides?
No, I did not patch the far side, just the side where I configured the replicator to run. (Can the replicator run on both sides? Do they play nicely if both are replicating to the other?)
Yes, it can run on both sides.
Don't know how it works though...
At the moment I have replication running on one end and it seems to be /mostly/ working. I do get fairly rare:
Warning: Failed to do incremental sync for mailbox INBOX
…messages in the log, but very few relative to the number of sync attempts, as whatever the problem is /seems/ to work itself out. I hope.
Running two-sided replication seems a harder problem than running one-sided, so I don't think I trust running replication on both sides.
I thought Dovecot was a mature program and replication a real feature, but to get replication working this well I needed to take a kindly suggested patch, compile and try it, see it was still broken, stare at the patch and realize the suggestion was itself broken, and make my own stab at something that wasn't an obvious leak… but isn't necessarily correct, either.
I posted my code change to the list… but got no response. I fear I am in obscure territory here and no one really knows, nor has the time to investigate. I'm worried that replication was removed from version 2.4 because it didn't entirely work in my version 2.3, but was too broken to properly fix.
I don't mind an occasional error message like the one above, I don't mind occasionally being out of sync for a short time, but I do *not* want to lose data. Will I?
-kb
P.S. Soapbox rant: After looking at the code in the function I was messing with, I could see it was clearly broken as it was, and still broken with the suggested patch, and I don't know whether my change is good and makes sense. I see various stuff there that might or might not be correct; it would be a lot of work to understand it all and find out. And then I lament that were it written in Rust, a lot of these bugs simply could not be there, because the compiler would do that analysis and complain. Yes, I realize Dovecot is too old to have been written in Rust, but something like https://stalw.art/ is not too old. If my Dovecot setup works I'll let it run, and I hope it will run happily for a few years, because I don't want to start over. But when it /is/ time to replace things…
participants (2)
-
Dmitry Melekhov
-
Kent Borg