dsync replication fails with No space left on device / Out of memory

Steven Varco dovecot.org at bbs.varco.ch
Mon Jul 5 11:00:38 EEST 2021


> Aki Tuomi aki.tuomi at open-xchange.com 
> Fri Jul 2 09:14:47 EEST 2021
> 
> The disk issue is likely that disk space on mail_temp_dir runs out, which is usually /tmp.


Hi Aki

Many thanks for that hint, it actually led me to the root cause of the problem! :)

During the process the /tmp filesystem fills up and empties again so quickly that I could not see it filling up, even when actively monitoring it with the watch command. For a split second I could only see that usage on /tmp increased somewhat and then immediately decreased again. That's why I did not notice this in the first place.
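
For anyone trying to catch such a short-lived spike: watch refreshes every 2 seconds by default, so a burst that comes and goes between refreshes is easy to miss. A minimal sketch of an alternative is to poll df in a tight loop and remember only the highest "used" value seen. The function name peak_used_kb and its arguments are made up for this illustration, and a spike shorter than the polling interval can still slip through.

```shell
#!/bin/sh
# peak_used_kb DIR SAMPLES INTERVAL
# Poll the filesystem holding DIR SAMPLES times, sleeping INTERVAL seconds
# between polls, and print the highest "used" value seen (in KiB).
# Hypothetical helper, not part of Dovecot.
peak_used_kb() {
    dir=$1
    samples=$2
    interval=$3
    peak=0
    i=0
    while [ "$i" -lt "$samples" ]; do
        # df -Pk: POSIX output format, 1024-byte blocks; column 3 is "Used"
        used=$(df -Pk "$dir" | awk 'NR==2 {print $3}')
        if [ "$used" -gt "$peak" ]; then
            peak=$used
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    echo "$peak"
}
```

Run for example as `peak_used_kb /tmp 600 0.1` while a replication is in progress. The fractional sleep interval assumes GNU coreutils; a strictly POSIX sleep only accepts whole seconds.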

I then increased the filesystem size and all the problems suddenly vanished. Not just the "No space left on device" error: surprisingly, the "Out of memory" error log message is gone now as well, so they were somehow connected to each other.
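
A quick way to check this in advance is to look at the free space of the directory Dovecot actually uses for temporary files, rather than the mail store. The sketch below is only an illustration: avail_kb and has_room are hypothetical helper names, and the 512 MiB threshold in the usage comment is an arbitrary example value, not a Dovecot recommendation.

```shell
#!/bin/sh
# avail_kb DIR: print the space available on DIR's filesystem, in KiB.
# Hypothetical helper, not part of Dovecot.
avail_kb() {
    # df -Pk: POSIX output format, 1024-byte blocks; column 4 is "Available"
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# has_room DIR NEEDED_KB: succeed if DIR has at least NEEDED_KB KiB available.
has_room() {
    [ "$(avail_kb "$1")" -ge "$2" ]
}

# Usage on a Dovecot host (doveconf -h prints only the setting's value):
#   tmpdir=$(doveconf -h mail_temp_dir)
#   has_room "$tmpdir" 524288 || echo "less than 512 MiB free in $tmpdir"
```

Checking inodes on the same directory with `df -i`, as suggested earlier in the thread, covers the other way a filesystem can report "No space left on device".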

cheers,
Steven

-- 
https://steven.varco.ch/
https://www.tech-island.com/



> On 02.07.2021 at 07:43, Jörg Faudin Schulz <js at faudin.de> wrote:
> 
> Hi,
> 
> the memory issue has already been reported, not resolved yet:
> 
> https://www.mail-archive.com/dovecot@dovecot.org/msg83763.html
> 
> 
> The disk-space issue is something different. Increasing the memory parameters doesn't help; the sync only crashes later.
> 
> Here, everything seems to be synced fine nevertheless.
> 
> 
> 
> On 02.07.21 at 02:56, Harlan Stenn wrote:
>> Inodes?  df -i
>> 
>> On 7/1/2021 5:07 PM, Steven Varco wrote:
>>> Hi All
>>> 
>>> Since I configured dsync replication I get strange errors in the maillog on my two Dovecot mail nodes:
>>> 
>>> PRIMARY:
>>> Jul  2 01:21:42 mx01.example.com dovecot: doveadm: Error: read(mx02.example.com) failed: read(size=3148) failed: Connection reset by peer (last sent=mail, last recv=mail (EOL))
>>> 
>>> 
>>> The secondary is more interesting:
>>> 
>>> SECONDARY
>>> Jul  2 01:21:42 mx02 dovecot: doveadm: Error: close(-1[istream-seekable.c:237]) failed: No space left on device
>>> Jul  2 01:21:43 mx02 dovecot: doveadm: Fatal: pool_system_realloc(268435456): Out of memory
>>> Jul  2 01:21:43 mx02 dovecot: doveadm: Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0xa192e) [0x7f2e9be4c92e] -> /usr/lib64/dovecot/libdovecot.so.0(+0xa1a0e) [0x7f2e9be4ca0e] -> /usr/lib64/dovecot/libdovecot.so.0(i_error+0) [0x7f2e9bddc3d3] -> /usr/lib64/dovecot/libdo
>>> Jul  2 01:21:43 mx02 dovecot: doveadm: Fatal: master: service(doveadm): child 2876 returned error 83 (Out of memory (service doveadm { vsz_limit=256 MB }, you may need to increase it) - set CORE_OUTOFMEM=1 environment to get core dump)
>>> Jul  2 01:21:51 mx02 dovecot: dsync-local(user at example.com): Error: Raw backtrace: /usr/lib64/dovecot/libdovecot.so.0(+0xa192e) [0x7fd56e17e92e] -> /usr/lib64/dovecot/libdovecot.so.0(+0xa1a0e) [0x7fd56e17ea0e] -> /usr/lib64/dovecot/libdovecot.so.0(i_error+0) [0x7fd56e10e3d3] -> /us
>>> Jul  2 01:21:51 mx02 dovecot: dsync-local(user at example.com): Fatal: master: service(doveadm): child 2882 returned error 83 (Out of memory (service doveadm { vsz_limit=256 MB }, you may need to increase it) - set CORE_OUTOFMEM=1 environment to get core dump)
>>> 
>>> 
>>> The error messages state that disk space and/or memory is a problem, but there is enough disk space and memory available:
>>> 
>>> mx02 [~] # df -h /srv/mail/
>>> Filesystem               Size  Used Avail Use% Mounted on
>>> /dev/mapper/system-mail   10G  5.7G  4.3G  58% /srv/mail
>>> 
>>> mx02 [~] # free -m
>>>               total        used        free      shared  buff/cache   available
>>> Mem:           3789        1602        1088         199        1097        1759
>>> Swap:           471          93         378
>>> 
>>> 
>>> I also tried to increase vsz_limit from 256 MB to 512 MB, which did not help.
>>> 
>>> 
>>> And for the sake of completeness: the connection to the doveadm port also works well from both nodes:
>>> 
>>> mx01-prod [~] # telnet mx02 14310
>>> Trying 172.20.19.225...
>>> Connected to mx02.
>>> Escape character is '^]'.
>>> ^]
>>> 
>>> 
>>> mx02 [~] # telnet mx01 14310
>>> Trying 172.20.19.251...
>>> Connected to mx01.
>>> Escape character is '^]'.
>>> ^]
>>> 
>>> 
>>> Although mail replication seems to be working properly and mails are in sync on both nodes (as far as I can see), I would like to find the cause of these messages, as this definitely doesn't look normal…
>>> 
>>> I'd be grateful for any help, since I'm really struggling with this now…
>>> 
>>> Steven
>>> 
>>> 
>>> Here’s my config
>>> --------------------------------------------------------------------------------
>>> # doveconf -n
>>> # 2.2.36 (1f10bfa63): /etc/dovecot/dovecot.conf
>>> # Pigeonhole version 0.4.24 (124e06aa)
>>> # OS: Linux 3.10.0-1160.31.1.el7.x86_64 x86_64 CentOS Linux release 7.9.2009 (Core)
>>> # Hostname: mx01.example.com
>>> auth_mechanisms = plain login
>>> auth_verbose = yes
>>> dict {
>>>   sqlquota = mysql:/etc/dovecot/dict-sqlquota.conf.ext
>>> }
>>> doveadm_password =  # hidden, use -P to show it
>>> doveadm_port = 14310
>>> first_valid_uid = 1000
>>> mail_plugins = quota notify replication
>>> managesieve_notify_capability = mailto
>>> managesieve_sieve_capability = fileinto reject envelope encoded-character vacation subaddress comparator-i;ascii-numeric relational regex imap4flags copy include variables body enotify environment mailbox date index ihave duplicate mime foreverypart extracttext
>>> mbox_write_locks = fcntl
>>> namespace inbox {
>>>   inbox = yes
>>>   location =
>>>   mailbox Drafts {
>>>     special_use = \Drafts
>>>   }
>>>   mailbox Junk {
>>>     special_use = \Junk
>>>   }
>>>   mailbox Sent {
>>>     special_use = \Sent
>>>   }
>>>   mailbox "Sent Messages" {
>>>     special_use = \Sent
>>>   }
>>>   mailbox Trash {
>>>     special_use = \Trash
>>>   }
>>>   prefix =
>>>   separator = /
>>>   type = private
>>> }
>>> passdb {
>>>   args = /etc/dovecot/dovecot-sql.conf.ext
>>>   driver = sql
>>> }
>>> plugin {
>>>   mail_replica = tcp:mx02.example.com
>>>   quota = maildir:User quota
>>>   quota_exceeded_message = Quota exceeded, please go to http://www.example.com/over_quota_help for instructions on how to fix this.
>>>   quota_rule2 = INBOX.Trash:storage=+100M
>>>   quota_status_nouser = DUNNO
>>>   quota_status_overquota = 552 5.2.2 Mailbox is full / Mailbox ist voll
>>>   quota_status_success = DUNNO
>>>   quota_warning = storage=90%% quota-warning 90 %u
>>>   quota_warning2 = -storage=90%% quota-warning below %u
>>>   sieve = file:~/sieve;active=~/.dovecot.sieve
>>> }
>>> postmaster_address = postmaster at example.com
>>> protocols = imap pop3 lmtp sieve
>>> replication_dsync_parameters = -d -l 30 -U
>>> service aggregator {
>>>   fifo_listener replication-notify-fifo {
>>>     user = vmail
>>>   }
>>>   unix_listener replication-notify {
>>>     user = vmail
>>>   }
>>> }
>>> service auth {
>>>   unix_listener /var/spool/postfix/private/auth {
>>>     group = postfix
>>>     mode = 0660
>>>     user = postfix
>>>   }
>>>   unix_listener auth-userdb {
>>>     user = vmail
>>>   }
>>> }
>>> service dict {
>>>   unix_listener dict {
>>>     user = vmail
>>>   }
>>> }
>>> service doveadm {
>>>   inet_listener {
>>>     port = 14310
>>>     ssl = no
>>>   }
>>> }
>>> service managesieve-login {
>>>   inet_listener sieve {
>>>     port = 4190
>>>   }
>>> }
>>> service quota-status {
>>>   client_limit = 1
>>>   executable = quota-status -p postfix
>>>   inet_listener {
>>>     port = 14340
>>>   }
>>> }
>>> service quota-warning {
>>>   executable = script /usr/local/libexec/dovecot/quota-warning.sh
>>>   unix_listener quota-warning {
>>>     user = vmail
>>>   }
>>>   user = vmail
>>> }
>>> service replicator {
>>>   process_min_avail = 1
>>>   unix_listener replicator-doveadm {
>>>     mode = 0600
>>>     user = vmail
>>>   }
>>> }
>>> ssl = required
>>> ssl_cert = </etc/ssl/acme/certs/mail.example.com.chain.crt
>>> ssl_key =  # hidden, use -P to show it
>>> userdb {
>>>   args = /etc/dovecot/dovecot-sql.conf.ext
>>>   driver = sql
>>> }
>>> verbose_proctitle = yes
>>> protocol lmtp {
>>>   mail_plugins = quota notify replication sieve
>>> }
>>> protocol lda {
>>>   mail_plugins = quota notify replication sieve
>>> }
>>> protocol imap {
>>>   mail_max_userip_connections = 20
>>>   mail_plugins = quota notify replication imap_quota
>>> }
>>> --------------------------------------------------------------------------------
>>> 
>>> 
>>> mx02.example.com has exactly the same config, except for:
>>> --------------------------------------------------------------------------------
>>> plugin {
>>>   mail_replica = tcp:mx01.example.com
>>> --------------------------------------------------------------------------------
>>> 
>>> 


