Fwd: Some mails do not get replicated anymore after memory-exhaust
The number of non-replicated mails on the mirror keeps growing, without any
exceptions in the log.
Is there a way to force a full replication, including directory scans,
through the doveadm utility?
Besides that, are there any arguments against a non-destructive rsync?
Could it break anything, e.g. flags or duplicates?
Best
On Mon, Feb 27, 2017 at 11:36 PM, Christoph Kluge wrote:
Hey guys,
overall I have a working Dovecot replication between two servers running on Amazon's cloud. Unfortunately, I got several messages saying that my server ran out of memory. After investigating a bit further, I realized that some mails didn't get replicated, but I'm not sure whether this is related to the memory exhaustion. I was expecting the full sync to catch them up, but it doesn't.
Attached I'm adding:
* /etc/dovecot/dovecot.conf from both servers
* one sample of my memory-exhaust exception
* Maildir directory listing of one mailbox on both servers
* commands + output of my manual attempt at a full replication
* grep information on the missing mail inside Maildir on both servers
Here is my configuration from both servers. The configuration is identical except for the mail_replica server. Please note that one server runs Debian 8.7 and the other one 7.11.
---- SERVER A
# dovecot -n
# 2.2.13: /etc/dovecot/dovecot.conf
# OS: Linux 3.2.0-4-amd64 x86_64 Debian 8.7

---- SERVER B
# dovecot -n
# 2.2.13: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.32-34-pve i686 Debian 7.11
auth_mechanisms = plain login
disable_plaintext_auth = no
doveadm_password = ****
doveadm_port = 12345
listen = *,[::]
log_timestamp = "%Y-%m-%d %H:%M:%S "
mail_max_userip_connections = 100
mail_plugins = notify replication quota
mail_privileged_group = vmail
passdb {
  args = /etc/dovecot/dovecot-sql.conf
  driver = sql
}
plugin {
  mail_replica = tcp:*.****.de
  quota = dict:user::file:/var/vmail/%d/%n/.quotausage
  replication_full_sync_interval = 1 hours
  sieve = /var/vmail/%d/%n/.sieve
  sieve_max_redirects = 25
}
protocols = imap
replication_max_conns = 2
service aggregator {
  fifo_listener replication-notify-fifo {
    mode = 0666
    user = vmail
  }
  unix_listener replication-notify {
    mode = 0666
    user = vmail
  }
}
service auth {
  unix_listener /var/spool/postfix/private/auth {
    group = postfix
    mode = 0660
    user = postfix
  }
  unix_listener auth-userdb {
    group = vmail
    mode = 0600
    user = vmail
  }
  user = root
}
service config {
  unix_listener config {
    user = vmail
  }
}
service doveadm {
  inet_listener {
    port = 12345
  }
  user = vmail
}
service imap-login {
  client_limit = 1000
  process_limit = 512
}
service lmtp {
  unix_listener /var/spool/postfix/private/dovecot-lmtp {
    group = postfix
    mode = 0600
    user = postfix
  }
}
service replicator {
  process_min_avail = 1
  unix_listener replicator-doveadm {
    mode = 0666
  }
}
ssl_cert =
This is the exception which I got several times:
Feb 26 16:16:39 mx dovecot: replicator: Panic: data stack: Out of memory when allocating 268435496 bytes
Feb 26 16:16:39 mx dovecot: replicator: Error: Raw backtrace: /usr/lib/dovecot/libdovecot.so.0(+0x6b6fe) [0x7f7ca2b0a6fe] -> /usr/lib/dovecot/libdovecot.so.0(+0x6b7ec) [0x7f7ca2b0a7ec] -> /usr/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f7ca2ac18fb] -> /usr/lib/dovecot/libdovecot.so.0(+0x6977e) [0x7f7ca2b0877e] -> /usr/lib/dovecot/libdovecot.so.0(+0x699db) [0x7f7ca2b089db] -> /usr/lib/dovecot/libdovecot.so.0(+0x82198) [0x7f7ca2b21198] -> /usr/lib/dovecot/libdovecot.so.0(+0x6776d) [0x7f7ca2b0676d] -> /usr/lib/dovecot/libdovecot.so.0(buffer_write+0x6c) [0x7f7ca2b069dc] -> dovecot/replicator(replicator_queue_push+0x14e) [0x7f7ca2fa17ae] -> dovecot/replicator(+0x4f9e) [0x7f7ca2fa0f9e] -> dovecot/replicator(+0x4618) [0x7f7ca2fa0618] -> dovecot/replicator(+0x4805) [0x7f7ca2fa0805] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x3f) [0x7f7ca2b1bd0f] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0xf9) [0x7f7ca2b1cd09] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0x9) [0x7f7ca2b1bd79] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_run+0x38) [0x7f7ca2b1bdf8] -> /usr/lib/dovecot/libdovecot.so.0(master_service_run+0x13) [0x7f7ca2ac6dc3] -> dovecot/replicator(main+0x195) [0x7f7ca2f9f8b5] -> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f7ca2715b45] -> dovecot/replicator(+0x395d) [0x7f7ca2f9f95d]
Feb 26 16:16:39 mx dovecot: imap(***.com): Warning: replication(***.com): Sync failure:
Feb 26 16:16:39 mx dovecot: replicator: Fatal: master: service(replicator): child 24012 killed with signal 6 (core dumps disabled)
This is the current maildir listing on Server A
# ls -la /var/vmail/*.eu/*h/Maildir/new/
total 24
drwx------  2 vmail vmail 4096 Feb 27 18:12 .
drwx------ 15 vmail vmail 4096 Feb 27 21:47 ..
-rw-------  1 vmail vmail 3600 Feb 27 14:49 1488206976.M277562P25620.mail,S=3600,W=3671
-rw-------  1 vmail vmail 4390 Feb 27 15:17 1488208642.M513542P27111.mail,S=4390,W=4478:2,S
-rw-------  1 vmail vmail 3577 Feb 27 16:32 1488213157.M307300P30773.mail,S=3577,W=3648:2,S
This is the current maildir listing on Server B
# ls -la /var/vmail/*.eu/*h/Maildir/new/
total 16
drwx------  2 vmail vmail 12288 Feb 27 16:45 .
drwx------ 15 vmail vmail  4096 Feb 27 21:47 ..
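To quantify the gap between two such listings, something like the following rough sketch might help. It assumes the file names from both servers have already been collected into plain lists (for example from `ls` output); the function names `maildir_base` and `missing_on_replica` are made up for illustration and are not part of Dovecot. Names are compared without the `:2,<flags>` suffix, since flags may legitimately differ between the two sides.

```python
def maildir_base(filename):
    """Drop the ':2,<flags>' info suffix so flagged and unflagged copies match."""
    return filename.split(":2,", 1)[0]


def missing_on_replica(primary_files, replica_files):
    """Return the sorted base names that exist on the primary but not on the replica."""
    primary_bases = {maildir_base(name) for name in primary_files}
    replica_bases = {maildir_base(name) for name in replica_files}
    return sorted(primary_bases - replica_bases)


if __name__ == "__main__":
    # File names taken from the Server A listing; Server B's new/ is empty.
    server_a = [
        "1488206976.M277562P25620.mail,S=3600,W=3671",
        "1488208642.M513542P27111.mail,S=4390,W=4478:2,S",
        "1488213157.M307300P30773.mail,S=3577,W=3648:2,S",
    ]
    server_b = []
    for name in missing_on_replica(server_a, server_b):
        print(name)
```

This only compares file names in one directory; it says nothing about cur/ versus new/ placement or about what the index files record.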
This is how I tried to manually sync it
doveadm -v sync -u *h@*.eu -f tcp:mx.***.de:12345
This is the users sync status
# doveadm replicator status 'cheecoh@ragequit.eu'
username  priority  fast sync  full sync  failed
*h@*.eu   none      00:24:47   10:57:04   -
Then I tried to look up the mail ID, which is also the same on both servers
# grep -ri "M277562P25620" /var/vmail/*.eu/*h/
/var/vmail/*.eu/*h/Maildir/dovecot-uidlist:493 :1488206976.M277562P25620.mail,S=3600,W=3671
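Since the uidlist knows the mail while the file itself is missing on one side, a small check for "listed but not on disk" entries could be sketched as below. This is only an assumption-laden sketch: it relies on the `<uid> :<filename>` shape visible in the grep output above, assumes that any extra fields sit between the UID and the ` :` separator, and skips every line that does not match (such as the header line). The helper names `parse_uidlist_line` and `entries_without_file` are hypothetical.

```python
def parse_uidlist_line(line):
    """Parse '<uid> [extra fields] :<filename>' into (uid, filename), else None."""
    head, sep, filename = line.rstrip("\n").partition(" :")
    if not sep:
        return None  # no ' :' separator, e.g. the header line
    uid_text = head.split(" ", 1)[0]
    if not uid_text.isdigit():
        return None
    return int(uid_text), filename


def entries_without_file(uidlist_lines, filenames_on_disk):
    """Return (uid, filename) entries whose file is not present on disk.

    File names are compared without the ':2,<flags>' suffix.
    """
    on_disk = {name.split(":2,", 1)[0] for name in filenames_on_disk}
    missing = []
    for line in uidlist_lines:
        entry = parse_uidlist_line(line)
        if entry is not None and entry[1].split(":2,", 1)[0] not in on_disk:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    lines = ["493 :1488206976.M277562P25620.mail,S=3600,W=3671"]
    # Empty list stands in for a directory where the file is absent.
    for uid, filename in entries_without_file(lines, []):
        print(uid, filename)
```

The file list passed in would need to cover both new/ and cur/ of the mailbox, otherwise mails that were merely moved to cur/ would be reported as missing.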
I have no idea what else I could do. I could also provide the output of "doveadm -Dv sync", but it is really huge.
Best Regards
Christoph Kluge