severe fork() problems with new dovecot server
J. Niklas
dove-io-drxf at foonet.de
Tue Mar 1 16:40:34 UTC 2016
Hi all,
I am getting to you because I am pretty desperate by now.
Please read this and share your ideas.
I set up a new Mail server with roughly 750 active accounts on it, based
on the usual dovecot+postfix+amavis+sql based vmail setup.
This is not the first time I did something like this, but admittedly my
largest setup so far.
I took my time to setup everything pretty straight and stable and
migrated all accounts in time with almost no glitches using imapc and
pop3c. Things started out smoothly - until the first business morning
when the log suddenly began to pop messages like these:
dovecot: master: Error: service(imap): fork() failed: Resource
temporarily unavailable
dovecot: master: Error: service(pop3): fork() failed: Resource
temporarily unavailable
The error is not confined to a certain service.
There are no other messages in dmesg or syslog which could help in
closing in on the root cause.
The server always runs at load around 0.4 to 0.5 and has plenty of RAM
and Disk, too.
Of course I checked resource limits and cranked up everything I could
possibly think of to numbers that should easily suffice:
/etc/security/limits.conf:
daemon soft nproc 2048
daemon hard nproc 3128
dovecot soft nproc 2048
dovecot hard nproc 3128
vmail soft nproc 2048
vmail hard nproc 3128
vmail soft nofile 32768
vmail hard nofile 49152
/etc/systemd/system/dovecot.service.d/ulimits.conf:
[Service]
LimitNOFILE=65534
However, all my attempts on resolving the problem were futile so far.
To work around the problem I implemented a log watcher which constantly
looks for the fork failure and dumps some debug info before restarting
dovecot to make the problem "go away" - until dovecot hits whatever
barrier there is again.
Here are today's numbers I gathered from this:
> forkdump_03-01_09-42: pop=3, imap=446, all=503, postfix:22, sys: 716
> forkdump_03-01_10-09: pop=9, imap=444, all=510, postfix:28, sys: 741
> forkdump_03-01_10-42: pop=5, imap=450, all=507, postfix:25, sys: 748
> forkdump_03-01_11-18: pop=5, imap=451, all=512, postfix:29, sys: 757
> forkdump_03-01_11-48: pop=3, imap=474, all=511, postfix:37, sys: 765
> forkdump_03-01_12-23: pop=7, imap=444, all=511, postfix:22, sys: 752
> forkdump_03-01_12-53: pop=3, imap=444, all=503, postfix:24, sys: 747
> forkdump_03-01_13-21: pop=3, imap=458, all=504, postfix:31, sys: 766
> forkdump_03-01_13-41: pop=4, imap=451, all=505, postfix:23, sys: 755
> forkdump_03-01_14-35: pop=5, imap=436, all=502, postfix:31, sys: 768
> forkdump_03-01_14-59: pop=3, imap=445, all=501, postfix:27, sys: 764
> forkdump_03-01_15-28: pop=6, imap=452, all=507, postfix:25, sys: 769
There you see the number of pop3 and imap childs in the process list as
well as the total number of all dovecot processes and all postfix
processes, summarized directly after the fork problem appears in the
log. The last column is the sum of all processes in the system.
Here is a run where the numbers are measured as the server approaches
the last entry in the list above.
> 15:26:15: pop=3, imap=432, all=485, VSZ: 14353k, postfix:30, sys: 749
> 15:26:30: pop=4, imap=435, all=490, VSZ: 14497k, postfix:30, sys: 754
> 15:26:45: pop=3, imap=439, all=492, VSZ: 14560k, postfix:28, sys: 754
> 15:27:00: pop=3, imap=444, all=497, VSZ: 14711k, postfix:28, sys: 760
> 15:27:15: pop=4, imap=442, all=496, VSZ: 14673k, postfix:25, sys: 756
> 15:27:30: pop=4, imap=444, all=497, VSZ: 14721k, postfix:25, sys: 757
> 15:27:46: pop=3, imap=446, all=498, VSZ: 14756k, postfix:25, sys: 758
> 15:28:01: pop=3, imap=446, all=497, VSZ: 14718k, postfix:24, sys: 756
> 15:28:16: pop=3, imap=455, all=507, VSZ: 15018k, postfix:25, sys: 767
> 15:28:18: pop=3, imap=455, all=507, VSZ: 15018k, postfix:25, sys: 767
> 15:28:20: pop=3, imap=452, all=504, VSZ: 14931k, postfix:25, sys: 764
---- service(pop3): fork() failed: Resource temporarily unavailable ----
> 15:28:22: pop=4, imap=452, all=505, VSZ: 14954k, postfix:25, sys: 770
> 15:28:24: pop=4, imap=452, all=505, VSZ: 14954k, postfix:25, sys: 770
> 15:28:26: pop=4, imap=451, all=504, VSZ: 14925k, postfix:25, sys: 769
> 15:28:28: pop=5, imap=451, all=506, VSZ: 14980k, postfix:25, sys: 771
---- dovecot restart complete ----
> 15:28:30: pop=3, imap=11, all=22, VSZ: 485k, postfix:25, sys: 282
> 15:29:00: pop=4, imap=65, all=83, VSZ: 2293k, postfix:26, sys: 344
> 15:29:31: pop=3, imap=91, all=111, VSZ: 3129k, postfix:27, sys: 373
It always seems to happen shortly after the number of dovecot processes
goes beyond 500. The VSZ column is the VSZ sum of all dovecot procs.
The number of files opened by vmail (lsof -u <vmailuid> | wc -l) at that
moment usually is between 13k and 15k.
sysctl fs.file-nr mostly is around 16k total but the system limit is set
way higher (at 1532853)
However, as you can see in the following dump, there is no process_limit
or whatever being set to 500 or 512 or similar.
# --------- su vmail -c "ulimit -Ha:" ---------------
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 60026
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 49152
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 3128
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
# ---------------- dovecot -n (mildly anonymized) ---------------
# 2.2.18: /etc/dovecot/dovecot.conf
# Pigeonhole version 0.4.8 (0c4ae064f307+)
# OS: Linux 4.4.0-8-generic x86_64 Ubuntu Xenial Xerus (development
branch) xfs
auth_cache_negative_ttl = 0
auth_cache_size = 50 M
auth_mechanisms = plain login
default_client_limit = 10000
dict {
quota = mysql:/etc/dovecot/dovecot-dict-sql.conf.ext
}
hostname = my.mail.host
imap_client_workarounds = delay-newmail
lda_mailbox_autocreate = yes
lda_mailbox_autosubscribe = yes
lda_original_recipient_header = X-Original-To
listen = *
mail_location = sdbox:/mailsrv/users/%d/%u
mail_plugins = quota notify mail_log fts
managesieve_notify_capability = mailto
managesieve_sieve_capability = fileinto reject envelope
encoded-character vacation subaddress comparator-i;ascii-numeric
relational regex imap4flags copy include variables body enotify
environment mailbox date index ihave duplicate
namespace inbox {
inbox = yes
location =
mailbox Drafts {
auto = subscribe
special_use = \Drafts
}
mailbox Junk {
auto = subscribe
special_use = \Junk
}
mailbox Sent {
auto = subscribe
special_use = \Sent
}
mailbox Trash {
auto = subscribe
special_use = \Trash
}
prefix =
}
passdb {
args = /etc/dovecot/dovecot-sql.conf.ext
driver = sql
}
plugin {
fts_autoindex = yes
quota = dict:User quota::proxy::quota
sieve = file:~/sieve;active=~/.dovecot.sieve
sieve_dir = ~/sieve
stats_refresh = 30s
stats_track_cmds = yes
}
postmaster_address = postmaster at my.mail.host
protocols = " imap lmtp sieve pop3"
service anvil {
client_limit = 11227
}
service auth {
client_limit = 11324
unix_listener /mailsrv/spool/private/auth {
group = postfix
mode = 0666
user = postfix
}
}
service dict {
unix_listener dict {
mode = 0600
user = vmail
}
}
service imap-login {
process_limit = 10000
process_min_avail = 5
service_count = 0
vsz_limit = 2 G
}
service imap {
process_limit = 2048
vsz_limit = 2 G
}
service lmtp {
inet_listener lmtp {
address = 127.0.0.1
port = 24
}
}
service managesieve-login {
inet_listener sieve {
port = 4190
}
}
service pop3-login {
process_limit = 1024
process_min_avail = 3
service_count = 0
vsz_limit = 2 G
}
service pop3 {
process_limit = 1024
}
service stats {
fifo_listener stats-mail {
mode = 0600
user = vmail
}
}
ssl_cert = </etc/ssl/local/my.mail.host.crt.pem
ssl_key = </etc/ssl/local/my.mail.host.key.pem
userdb {
args = /etc/dovecot/dovecot-sql.conf.ext
driver = sql
}
verbose_proctitle = yes
protocol lmtp {
mail_plugins = quota notify mail_log fts quota sieve
}
protocol lda {
mail_plugins = quota notify mail_log fts sieve
}
protocol imap {
mail_max_userip_connections = 15
mail_plugins = quota notify mail_log fts imap_quota stats imap_stats
}
protocol pop3 {
mail_max_userip_connections = 15
}
# ------------ end dovecot -n -------------
So, here I am, out of ideas for the moment. Do you have any?
--
kind regards,
Jonathan
More information about the dovecot
mailing list