severe fork() problems with new dovecot server

J. Niklas dove-io-drxf at foonet.de
Tue Mar 1 16:40:34 UTC 2016


Hi all,

I am getting to you because I am pretty desperate by now.
Please read this and share your ideas.

I set up a new Mail server with roughly 750 active accounts on it, based
on the usual dovecot+postfix+amavis+sql based vmail setup.
This is not the first time I did something like this, but admittedly my
largest setup so far.

I took my time to setup everything pretty straight and stable and
migrated all accounts in time with almost no glitches using imapc and
pop3c. Things started out smoothly - until the first business morning
when the log suddenly began to pop messages like these:

dovecot: master: Error: service(imap): fork() failed: Resource
temporarily unavailable
dovecot: master: Error: service(pop3): fork() failed: Resource
temporarily unavailable

The error is not confined to a certain service.
There are no other messages in dmesg or syslog which could help in
closing in on the root cause.

The server always runs at load around 0.4 to 0.5 and has plenty of RAM
and Disk, too.

Of course I checked resource limits and cranked up everything I could
possibly think of to numbers that should easily suffice:

/etc/security/limits.conf:
daemon           soft    nproc           2048
daemon           hard    nproc           3128
dovecot          soft    nproc           2048
dovecot          hard    nproc           3128
vmail            soft    nproc           2048
vmail            hard    nproc           3128
vmail            soft    nofile         32768
vmail            hard    nofile         49152

/etc/systemd/system/dovecot.service.d/ulimits.conf:
[Service]
LimitNOFILE=65534

However, all my attempts on resolving the problem were futile so far.

To work around the problem I implemented a log watcher which constantly
looks for the fork failure and dumps some debug info before restarting
dovecot to make the problem "go away" - until dovecot hits whatever
barrier there is again.

Here are today's numbers I gathered from this:

> forkdump_03-01_09-42: pop=3, imap=446, all=503, postfix:22, sys: 716
> forkdump_03-01_10-09: pop=9, imap=444, all=510, postfix:28, sys: 741
> forkdump_03-01_10-42: pop=5, imap=450, all=507, postfix:25, sys: 748
> forkdump_03-01_11-18: pop=5, imap=451, all=512, postfix:29, sys: 757
> forkdump_03-01_11-48: pop=3, imap=474, all=511, postfix:37, sys: 765
> forkdump_03-01_12-23: pop=7, imap=444, all=511, postfix:22, sys: 752
> forkdump_03-01_12-53: pop=3, imap=444, all=503, postfix:24, sys: 747
> forkdump_03-01_13-21: pop=3, imap=458, all=504, postfix:31, sys: 766
> forkdump_03-01_13-41: pop=4, imap=451, all=505, postfix:23, sys: 755
> forkdump_03-01_14-35: pop=5, imap=436, all=502, postfix:31, sys: 768
> forkdump_03-01_14-59: pop=3, imap=445, all=501, postfix:27, sys: 764
> forkdump_03-01_15-28: pop=6, imap=452, all=507, postfix:25, sys: 769

There you see the number of pop3 and imap childs in the process list as
well as the total number of all dovecot processes and all postfix
processes, summarized directly after the fork problem appears in the
log. The last column is the sum of all processes in the system.

Here is a run where the numbers are measured as the server approaches
the last entry in the list above.

> 15:26:15: pop=3, imap=432, all=485, VSZ: 14353k, postfix:30, sys: 749
> 15:26:30: pop=4, imap=435, all=490, VSZ: 14497k, postfix:30, sys: 754
> 15:26:45: pop=3, imap=439, all=492, VSZ: 14560k, postfix:28, sys: 754
> 15:27:00: pop=3, imap=444, all=497, VSZ: 14711k, postfix:28, sys: 760
> 15:27:15: pop=4, imap=442, all=496, VSZ: 14673k, postfix:25, sys: 756
> 15:27:30: pop=4, imap=444, all=497, VSZ: 14721k, postfix:25, sys: 757
> 15:27:46: pop=3, imap=446, all=498, VSZ: 14756k, postfix:25, sys: 758
> 15:28:01: pop=3, imap=446, all=497, VSZ: 14718k, postfix:24, sys: 756
> 15:28:16: pop=3, imap=455, all=507, VSZ: 15018k, postfix:25, sys: 767
> 15:28:18: pop=3, imap=455, all=507, VSZ: 15018k, postfix:25, sys: 767
> 15:28:20: pop=3, imap=452, all=504, VSZ: 14931k, postfix:25, sys: 764
---- service(pop3): fork() failed: Resource temporarily unavailable ----
> 15:28:22: pop=4, imap=452, all=505, VSZ: 14954k, postfix:25, sys: 770
> 15:28:24: pop=4, imap=452, all=505, VSZ: 14954k, postfix:25, sys: 770
> 15:28:26: pop=4, imap=451, all=504, VSZ: 14925k, postfix:25, sys: 769
> 15:28:28: pop=5, imap=451, all=506, VSZ: 14980k, postfix:25, sys: 771
---- dovecot restart complete ----
> 15:28:30: pop=3, imap=11, all=22, VSZ: 485k, postfix:25, sys: 282
> 15:29:00: pop=4, imap=65, all=83, VSZ: 2293k, postfix:26, sys: 344
> 15:29:31: pop=3, imap=91, all=111, VSZ: 3129k, postfix:27, sys: 373

It always seems to happen shortly after the number of dovecot processes
goes beyond 500. The VSZ column is the VSZ sum of all dovecot procs.

The number of files opened by vmail (lsof -u <vmailuid> | wc -l) at that
moment usually is between 13k and 15k.

sysctl fs.file-nr mostly is around 16k total but the system limit is set
way higher (at 1532853)

However, as you can see in the following dump, there is no process_limit
or whatever being set to 500 or 512 or similar.

# --------- su vmail -c "ulimit -Ha:" ---------------

core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 60026
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 49152
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 3128
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

# ---------------- dovecot -n (mildly anonymized) ---------------

# 2.2.18: /etc/dovecot/dovecot.conf
# Pigeonhole version 0.4.8 (0c4ae064f307+)
# OS: Linux 4.4.0-8-generic x86_64 Ubuntu Xenial Xerus (development
branch) xfs
auth_cache_negative_ttl = 0
auth_cache_size = 50 M
auth_mechanisms = plain login
default_client_limit = 10000
dict {
  quota = mysql:/etc/dovecot/dovecot-dict-sql.conf.ext
}
hostname = my.mail.host
imap_client_workarounds = delay-newmail
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes
lda_original_recipient_header = X-Original-To
listen = *
mail_location = sdbox:/mailsrv/users/%d/%u
mail_plugins = quota notify mail_log fts
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope
encoded-character vacation subaddress comparator-i;ascii-numeric
relational regex imap4flags copy include variables body enotify
environment mailbox date index ihave duplicate
namespace inbox {
  inbox = yes
  location =
  mailbox Drafts {
    auto = subscribe
    special_use = \Drafts
  }
  mailbox Junk {
    auto = subscribe
    special_use = \Junk
  }
  mailbox Sent {
    auto = subscribe
    special_use = \Sent
  }
  mailbox Trash {
    auto = subscribe
    special_use = \Trash
  }
  prefix =
}
passdb {
  args = /etc/dovecot/dovecot-sql.conf.ext
  driver = sql
}
plugin {
  fts_autoindex = yes
  quota = dict:User quota::proxy::quota
  sieve = file:~/sieve;active=~/.dovecot.sieve
  sieve_dir = ~/sieve
  stats_refresh = 30s
  stats_track_cmds = yes
}
postmaster_address = postmaster at my.mail.host
protocols = " imap lmtp sieve pop3"
service anvil {
  client_limit = 11227
}
service auth {
  client_limit = 11324
  unix_listener /mailsrv/spool/private/auth {
    group = postfix
    mode = 0666
    user = postfix
  }
}
service dict {
  unix_listener dict {
    mode = 0600
    user = vmail
  }
}
service imap-login {
  process_limit = 10000
  process_min_avail = 5
  service_count = 0
  vsz_limit = 2 G
}
service imap {
  process_limit = 2048
  vsz_limit = 2 G
}
service lmtp {
  inet_listener lmtp {
    address = 127.0.0.1
    port = 24
  }
}
service managesieve-login {
  inet_listener sieve {
    port = 4190
  }
}
service pop3-login {
  process_limit = 1024
  process_min_avail = 3
  service_count = 0
  vsz_limit = 2 G
}
service pop3 {
  process_limit = 1024
}
service stats {
  fifo_listener stats-mail {
    mode = 0600
    user = vmail
  }
}
ssl_cert = </etc/ssl/local/my.mail.host.crt.pem
ssl_key = </etc/ssl/local/my.mail.host.key.pem
userdb {
  args = /etc/dovecot/dovecot-sql.conf.ext
  driver = sql
}
verbose_proctitle = yes
protocol lmtp {
  mail_plugins = quota notify mail_log fts quota sieve
}
protocol lda {
  mail_plugins = quota notify mail_log fts sieve
}
protocol imap {
  mail_max_userip_connections = 15
  mail_plugins = quota notify mail_log fts imap_quota stats imap_stats
}
protocol pop3 {
  mail_max_userip_connections = 15
}
# ------------  end dovecot -n -------------

So, here I am, out of ideas for the moment. Do you have any?

-- 
kind regards,
             Jonathan


More information about the dovecot mailing list