Some mails do not get replicated anymore after memory-exhaust

Christoph Kluge me at christoph-kluge.eu
Thu Mar 2 09:10:50 UTC 2017


The number of non-replicated mails on the mirror keeps growing, without any
exceptions in the log.
Is there a way to force a full replication, including directory scans,
through the doveadm utility?

Besides that, are there any arguments against a non-destructive rsync?
Could it break anything, e.g. flags or duplicates?
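
To see how many mails are affected before trying anything like rsync, the manual ls/grep comparison from the quoted mail below could be automated roughly as follows. This is only a sketch: the function name missing_from_maildir is made up, and it assumes the dovecot-uidlist layout visible in the grep output (one header line, then lines of the form "<uid> ... :<filename>"). It only looks at new/ and cur/ of a single Maildir, not at subfolders.

```shell
#!/bin/sh
# Hypothetical helper (not part of dovecot): print every file name that is
# listed in dovecot-uidlist but has no matching file in new/ or cur/.
missing_from_maildir() {
    maildir=$1
    # Skip the uidlist header line; the file name is everything after the
    # first ':' on each remaining line (assumed layout, see lead-in).
    tail -n +2 "$maildir/dovecot-uidlist" | sed 's/^[^:]*://' |
    while IFS= read -r name; do
        found=no
        # On disk the name may carry a ":2,<flags>" suffix, so check both
        # the bare name and the name followed by ':' and anything.
        for f in "$maildir/new/$name" "$maildir/new/$name":* \
                 "$maildir/cur/$name" "$maildir/cur/$name":*; do
            if [ -e "$f" ]; then
                found=yes
                break
            fi
        done
        if [ "$found" = no ]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Running it as `missing_from_maildir /path/to/Maildir` on both servers and comparing the two outputs should show which mails are known to the index but missing on disk on only one side.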

Best

On Mon, Feb 27, 2017 at 11:36 PM, Christoph Kluge <me at christoph-kluge.eu>
wrote:

> Hey guys,
>
> overall I have a working dovecot replication between 2 servers running on
> amazon cloud. Sadly I got some messages that my server ran out of memory.
> After investigating a little further I realized that some mails didn't
> get replicated, but I'm not sure if this is related to the memory exhaustion.
> I was expecting that the full sync would catch them up, but sadly it doesn't.
>
> Attached I'm adding:
> * /etc/dovecot/dovecot.conf from both servers
> * one sample of my memory-exhaust exception
> * maildir directory listing of one mailbox on both servers
> * commands + output of a manual attempt at full replication
> * grep information of missing mail inside Maildir on both servers
>
> Here is my configuration from both servers. The configuration is identical
> except for the mail_replica server. Please note one server runs on Debian
> 8.7 and the other one on 7.11.
>
> ---- SERVER A
>> # dovecot -n
>> # 2.2.13: /etc/dovecot/dovecot.conf
>> # OS: Linux 3.2.0-4-amd64 x86_64 Debian 8.7
>> ---- SERVER B
>> # dovecot -n
>> # 2.2.13: /etc/dovecot/dovecot.conf
>> # OS: Linux 2.6.32-34-pve i686 Debian 7.11
>> auth_mechanisms = plain login
>> disable_plaintext_auth = no
>> doveadm_password = ****
>> doveadm_port = 12345
>> listen = *,[::]
>> log_timestamp = "%Y-%m-%d %H:%M:%S "
>> mail_max_userip_connections = 100
>> mail_plugins = notify replication quota
>> mail_privileged_group = vmail
>> passdb {
>>   args = /etc/dovecot/dovecot-sql.conf
>>   driver = sql
>> }
>> plugin {
>>   mail_replica = tcp:*.****.de
>>   quota = dict:user::file:/var/vmail/%d/%n/.quotausage
>>   replication_full_sync_interval = 1 hours
>>   sieve = /var/vmail/%d/%n/.sieve
>>   sieve_max_redirects = 25
>> }
>> protocols = imap
>> replication_max_conns = 2
>> service aggregator {
>>   fifo_listener replication-notify-fifo {
>>     mode = 0666
>>     user = vmail
>>   }
>>   unix_listener replication-notify {
>>     mode = 0666
>>     user = vmail
>>   }
>> }
>> service auth {
>>   unix_listener /var/spool/postfix/private/auth {
>>     group = postfix
>>     mode = 0660
>>     user = postfix
>>   }
>>   unix_listener auth-userdb {
>>     group = vmail
>>     mode = 0600
>>     user = vmail
>>   }
>>   user = root
>> }
>> service config {
>>   unix_listener config {
>>     user = vmail
>>   }
>> }
>> service doveadm {
>>   inet_listener {
>>     port = 12345
>>   }
>>   user = vmail
>> }
>> service imap-login {
>>   client_limit = 1000
>>   process_limit = 512
>> }
>> service lmtp {
>>   unix_listener /var/spool/postfix/private/dovecot-lmtp {
>>     group = postfix
>>     mode = 0600
>>     user = postfix
>>   }
>> }
>> service replicator {
>>   process_min_avail = 1
>>   unix_listener replicator-doveadm {
>>     mode = 0666
>>   }
>> }
>> ssl_cert = </etc/postfix/smtpd.cert
>> ssl_key = </etc/postfix/smtpd.key
>> ssl_protocols = !SSLv2 !SSLv3
>> userdb {
>>   driver = prefetch
>> }
>> userdb {
>>   args = /etc/dovecot/dovecot-sql.conf
>>   driver = sql
>> }
>> protocol imap {
>>   mail_plugins = notify replication quota imap_quota
>> }
>> protocol pop3 {
>>   mail_plugins = quota
>>   pop3_uidl_format = %08Xu%08Xv
>> }
>> protocol lda {
>>   mail_plugins = notify replication quota sieve
>>   postmaster_address = webmaster at localhost
>> }
>> protocol lmtp {
>>   mail_plugins = notify replication quota sieve
>>   postmaster_address = webmaster at localhost
>> }
>
>
> This is the exception which I got several times:
>
> Feb 26 16:16:39 mx dovecot: replicator: Panic: data stack: Out of memory
>> when allocating 268435496 bytes
>> Feb 26 16:16:39 mx dovecot: replicator: Error: Raw backtrace:
>> /usr/lib/dovecot/libdovecot.so.0(+0x6b6fe) [0x7f7ca2b0a6fe] ->
>> /usr/lib/dovecot/libdovecot.so.0(+0x6b7ec) [0x7f7ca2b0a7ec] ->
>> /usr/lib/dovecot/libdovecot.so.0(i_fatal+0) [0x7f7ca2ac18fb] ->
>> /usr/lib/dovecot/libdovecot.so.0(+0x6977e) [0x7f7ca2b0877e] ->
>> /usr/lib/dovecot/libdovecot.so.0(+0x699db) [0x7f7ca2b089db] ->
>> /usr/lib/dovecot/libdovecot.so.0(+0x82198) [0x7f7ca2b21198] ->
>> /usr/lib/dovecot/libdovecot.so.0(+0x6776d) [0x7f7ca2b0676d] ->
>> /usr/lib/dovecot/libdovecot.so.0(buffer_write+0x6c) [0x7f7ca2b069dc] ->
>> dovecot/replicator(replicator_queue_push+0x14e) [0x7f7ca2fa17ae] ->
>> dovecot/replicator(+0x4f9e) [0x7f7ca2fa0f9e] -> dovecot/replicator(+0x4618)
>> [0x7f7ca2fa0618] -> dovecot/replicator(+0x4805) [0x7f7ca2fa0805] ->
>> /usr/lib/dovecot/libdovecot.so.0(io_loop_call_io+0x3f) [0x7f7ca2b1bd0f]
>> -> /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run_internal+0xf9)
>> [0x7f7ca2b1cd09] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_handler_run+0x9)
>> [0x7f7ca2b1bd79] -> /usr/lib/dovecot/libdovecot.so.0(io_loop_run+0x38)
>> [0x7f7ca2b1bdf8] -> /usr/lib/dovecot/libdovecot.so.0(master_service_run+0x13)
>> [0x7f7ca2ac6dc3] -> dovecot/replicator(main+0x195) [0x7f7ca2f9f8b5] ->
>> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7f7ca2715b45]
>> -> dovecot/replicator(+0x395d) [0x7f7ca2f9f95d]
>> Feb 26 16:16:39 mx dovecot: imap(***.com): Warning: replication(***.com):
>> Sync failure:
>> Feb 26 16:16:39 mx dovecot: replicator: Fatal: master:
>> service(replicator): child 24012 killed with signal 6 (core dumps disabled)
>
>
> This is the current maildir listing on Server A
>
> # ls -la /var/vmail/*.eu/*h/Maildir/new/
>> total 24
>> drwx------  2 vmail vmail 4096 Feb 27 18:12 .
>> drwx------ 15 vmail vmail 4096 Feb 27 21:47 ..
>> -rw-------  1 vmail vmail 3600 Feb 27 14:49 1488206976.M277562P25620.mail,S=3600,W=3671
>> -rw-------  1 vmail vmail 4390 Feb 27 15:17 1488208642.M513542P27111.mail,S=4390,W=4478:2,S
>> -rw-------  1 vmail vmail 3577 Feb 27 16:32 1488213157.M307300P30773.mail,S=3577,W=3648:2,S
>
>
> This is the current maildir listing on Server B
>
> # ls -la /var/vmail/*.eu/*h/Maildir/new/
>> total 16
>> drwx------  2 vmail vmail 12288 Feb 27 16:45 .
>> drwx------ 15 vmail vmail  4096 Feb 27 21:47 ..
>
>
> This is how I tried to manually sync it
>
> doveadm -v sync -u *h@*.eu -f tcp:mx.***.de:12345
>
>
> This is the user's sync status
>
> # doveadm replicator status 'cheecoh at ragequit.eu'
>> username  priority  fast sync  full sync  failed
>> *h@*.eu   none      00:24:47   10:57:04   -
>
>
> Then I tried to look up the mail ID, which is the same on both
> servers
>
> # grep -ri "M277562P25620" /var/vmail/*.eu/*h/
>> /var/vmail/*.eu/*h/Maildir/dovecot-uidlist:493 :1488206976.M277562P25620.mail,S=3600,W=3671
>
>
> I have no idea what else I could do. I could also provide the "doveadm -Dv
> sync" output, but it is really huge.
>
> Best Regards
> Christoph Kluge
>
>