Upgrade 2.2.27 to 2.3.9.2: master(imap): net_connect_unix(imap) failed: Resource temporarily unavailable
Hi,
After upgrading Dovecot from version 2.2.27 to 2.3.9.2, we are sporadically seeing lots of these errors in the error log on many of our servers:
imap-login: Error: master(imap): net_connect_unix(imap) failed: Resource temporarily unavailable - http://wiki2.dovecot.org/SocketUnavailable
The issue is causing significant delays and/or timeouts on login.
From what I can tell, this happens because the imap-login service is unable to connect to the imap service socket, however I am unable to identify the root cause of the issue.
We have increased the client_limit for the stats service, as described in the upgrade document at (https://wiki2.dovecot.org/Upgrading/2.3). It is set to the same value as the process_limit for imap service.
System CPU usage does not appear to be saturated. The CPU usage sits at around 50% when these errors appear.
We have doubled the number of file descriptors after upgrading from 2.2.27. It was previously set to 16392 (which worked fine):
~# cat /proc/pidof dovecot
/limits | grep "open files"
Max open files 30000 30000 files
We have tried increasing default_process_limit and default_client_limit from the default 1000 to 3000, but this has no effect.
Current configuration (with irrelevant parts removed):
~# doveconf -n
default_client_limit = 3000 # these were raised after upgrading in attempt to remedy default_process_limit = 3000 default_vsz_limit = 512 M [...] service imap-login { client_limit = 1250 inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } process_limit = 16 process_min_avail = 16 service_count = 0 vsz_limit = 512 M } service imap { client_limit = 1 process_limit = 20000 } service stats { client_limit = 20000 } [...]
We appear to be nowhere near the 20000 process_limit set for the imap service. We are also not seeing any warnings being logged with regards to the process_limit.
~# doveadm who -1 | wc -l 8765
~# doveadm process status | grep "^imap " | wc -l 7583
~# ps aux | grep "dovecot/imap" | wc -l 7449
~# doveadm process status | grep -v "^imap " name pid available_count total_count idle_start last_status_update last_kill_sent stats 19875 12384 582387 0 1578305489 0 pop3-login 19868 1000 0 0 1578061504 0 pop3-login 19867 1000 0 0 1578061503 0 pop3-login 19866 1000 0 0 1578061502 0 pop3-login 19865 1000 0 0 1578061503 0 pop3-login 19864 1000 0 0 1578061503 0 pop3-login 19853 1000 0 0 1578061504 0 pop3-login 19852 1000 0 0 1578061504 0 pop3-login 19851 1000 0 0 1578061504 0 pop3-login 19849 1000 0 0 1578061503 0 pop3-login 19847 1000 0 0 1578061503 0 pop3-login 19846 1000 0 0 1578061503 0 pop3-login 19845 1000 0 0 1578061503 0 pop3-login 19844 1000 0 0 1578061503 0 pop3-login 19843 1000 0 0 1578061503 0 pop3-login 19842 1000 0 0 1578061503 0 pop3-login 19839 1000 0 0 1578061502 0 log 19841 2973 27 0 1578061502 0 imap-login 19873 666 14942 0 1578305488 0 imap-login 19872 523 32792 0 1578305489 0 imap-login 19871 316 62876 0 1578305489 0 imap-login 19870 167 78332 0 1578305489 0 imap-login 19869 118 97989 0 1578305489 0 imap-login 19863 0 184726 0 1578305489 0 imap-login 19862 0 193094 0 1578305489 0 imap-login 19861 0 186800 0 1578305487 0 imap-login 19860 675 17806 0 1578305487 0 imap-login 19859 0 169446 0 1578305489 0 imap-login 19858 0 143517 0 1578305489 0 imap-login 19857 0 119215 0 1578305488 0 imap-login 19856 0 151958 0 1578305489 0 imap-login 19855 483 47036 0 1578305488 0 imap-login 19854 1 185240 0 1578305489 0 imap-login 19840 567 23859 0 1578305485 0 config 19874 2967 124482 0 1578305489 0 auth 19885 2967 124185 0 1578305489 0 anvil 10017 967 7567 0 1578305463 0
-- Eirik
While I not certain that this is related to the problem we are experiencing, we are also (more rarely) seeing these:
dovecot: master: Warning: service(imap-login): process_limit (16) reached, client connections are being dropped
imap-login was set to process_limit=16, client_limit=1250 (for a total of 20000 concurrent clients) on 2.2.27. We never had any issues with this then.
I tried raising it to client_limit=2500, for a total of 40000 concurrent connections, but we are still seeing the errors logged every now and then.
Almost every single available imap-login socket is reported as being used! Is something eating up the sockets processes and not freeing them?
~# doveadm process status | grep "^imap-login " imap-login 8025 2197 419 0 1578406002 0 imap-login 25790 120 5439 0 1578406002 0 imap-login 24231 8 8573 0 1578406001 0 imap-login 15707 0 91617 0 1578405978 0 imap-login 15706 0 81715 0 1578406002 0 imap-login 15705 0 57033 0 1578405992 0 imap-login 15704 0 88287 0 1578405996 0 imap-login 15703 0 68346 0 1578405999 0 imap-login 15701 0 24825 0 1578406002 0 imap-login 15695 0 15796 0 1578406001 0 imap-login 15693 0 38740 0 1578405997 0 imap-login 15692 0 11609 0 1578405998 0 imap-login 15691 0 46832 0 1578406002 0 imap-login 15690 0 10816 0 1578406001 0 imap-login 15689 0 84235 0 1578406002 0 imap-login 15683 0 30898 0 1578406001 0
I am counting 37 675 used imap-login sockets (according to dovecot), on a machine with only ~7500 active TCP connections and a similar number of logged in users:
~# ss -t | wc -l 7430 ~# doveadm process status | grep "^imap " | wc -l 7224
Here is doveconf -n
for service imap-login again (setup in
"high-performance mode" as described in wiki):
service imap-login { client_limit = 2500 inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } process_limit = 16 process_min_avail = 16 service_count = 0 vsz_limit = 512 M }
I would greatly appreciate some pointers, because this is mindboggling confusing.
On 06/01/2020 11:28, Eirik Rye wrote:
Hi,
After upgrading Dovecot from version 2.2.27 to 2.3.9.2, we are sporadically seeing lots of these errors in the error log on many of our servers:
imap-login: Error: master(imap): net_connect_unix(imap) failed: Resource temporarily unavailable - http://wiki2.dovecot.org/SocketUnavailable
The issue is causing significant delays and/or timeouts on login.
From what I can tell, this happens because the imap-login service is unable to connect to the imap service socket, however I am unable to identify the root cause of the issue.
We have increased the client_limit for the stats service, as described in the upgrade document at (https://wiki2.dovecot.org/Upgrading/2.3). It is set to the same value as the process_limit for imap service.
System CPU usage does not appear to be saturated. The CPU usage sits at around 50% when these errors appear.
We have doubled the number of file descriptors after upgrading from 2.2.27. It was previously set to 16392 (which worked fine):
~# cat /proc/
pidof dovecot
/limits | grep "open files" Max open files 30000 30000 filesWe have tried increasing default_process_limit and default_client_limit from the default 1000 to 3000, but this has no effect.
Current configuration (with irrelevant parts removed):
~# doveconf -n
default_client_limit = 3000 # these were raised after upgrading in attempt to remedy default_process_limit = 3000 default_vsz_limit = 512 M [...] service imap-login { client_limit = 1250 inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } process_limit = 16 process_min_avail = 16 service_count = 0 vsz_limit = 512 M } service imap { client_limit = 1 process_limit = 20000 } service stats { client_limit = 20000 } [...]
We appear to be nowhere near the 20000 process_limit set for the imap service. We are also not seeing any warnings being logged with regards to the process_limit.
~# doveadm who -1 | wc -l 8765
~# doveadm process status | grep "^imap " | wc -l 7583
~# ps aux | grep "dovecot/imap" | wc -l 7449
~# doveadm process status | grep -v "^imap " name pid available_count total_count idle_start last_status_update last_kill_sent stats 19875 12384 582387 0 1578305489 0 pop3-login 19868 1000 0 0 1578061504 0 pop3-login 19867 1000 0 0 1578061503 0 pop3-login 19866 1000 0 0 1578061502 0 pop3-login 19865 1000 0 0 1578061503 0 pop3-login 19864 1000 0 0 1578061503 0 pop3-login 19853 1000 0 0 1578061504 0 pop3-login 19852 1000 0 0 1578061504 0 pop3-login 19851 1000 0 0 1578061504 0 pop3-login 19849 1000 0 0 1578061503 0 pop3-login 19847 1000 0 0 1578061503 0 pop3-login 19846 1000 0 0 1578061503 0 pop3-login 19845 1000 0 0 1578061503 0 pop3-login 19844 1000 0 0 1578061503 0 pop3-login 19843 1000 0 0 1578061503 0 pop3-login 19842 1000 0 0 1578061503 0 pop3-login 19839 1000 0 0 1578061502 0 log 19841 2973 27 0 1578061502 0 imap-login 19873 666 14942 0 1578305488 0 imap-login 19872 523 32792 0 1578305489 0 imap-login 19871 316 62876 0 1578305489 0 imap-login 19870 167 78332 0 1578305489 0 imap-login 19869 118 97989 0 1578305489 0 imap-login 19863 0 184726 0 1578305489 0 imap-login 19862 0 193094 0 1578305489 0 imap-login 19861 0 186800 0 1578305487 0 imap-login 19860 675 17806 0 1578305487 0 imap-login 19859 0 169446 0 1578305489 0 imap-login 19858 0 143517 0 1578305489 0 imap-login 19857 0 119215 0 1578305488 0 imap-login 19856 0 151958 0 1578305489 0 imap-login 19855 483 47036 0 1578305488 0 imap-login 19854 1 185240 0 1578305489 0 imap-login 19840 567 23859 0 1578305485 0 config 19874 2967 124482 0 1578305489 0 auth 19885 2967 124185 0 1578305489 0 anvil 10017 967 7567 0 1578305463 0
-- Eirik
On 07 Jan 2020, at 07:15, Dovecot Mailing List dovecot@dovecot.org wrote:
dovecot: master: Warning: service(imap-login): process_limit (16) reached, client connections are being dropped
Is this an actual problem? Most client connections are idle, so I doubt if the occasional drop is noticed at all.
Also, why a limit of 16? Have you tried, oh, 160 instead?
-- "Master, what is the difference between a humanistic, monastic system of belief in which wisdom is sought by means of an apparently nonsensical system of questions and answers, and a lot of mystic gibberish made up on the spur of the moment?" Wen considered this for some time, and a last said: "A fish!”
On 08/01/2020 01:47, @lbutlr wrote:
On 07 Jan 2020, at 07:15, Dovecot Mailing List dovecot@dovecot.org wrote:
dovecot: master: Warning: service(imap-login): process_limit (16) reached, client connections are being dropped
Is this an actual problem? Most client connections are idle, so I doubt if the occasional drop is noticed at all.
I assume it would mean Dovecot is dropping new incoming connections (e.g. new users trying to log in), not existing idle connections? Dropping new incoming connections is a major problem, because some email clients (e.g. Outlook) are notorious for giving the users password prompts whenever something unexpected happens.
It is a problem because it is unexpected behaviour. Where are the imap-login sockets going, if not to serve active TCP connections? On all servers, the number total-available imap-login processes is significantly larger than the number of TCP connections on the server. In some cases, it is about
~# ss -nt | wc -l 7817 ~# ss -nt "( sport = :143 or sport = :993 )" | wc -l 7812 ~# doveadm process status | grep "^imap-login " imap-login 17384 1462 44553 0 1578475790 0 imap-login 17383 550 66830 0 1578475790 0 imap-login 17382 488 59074 0 1578475789 0 imap-login 17381 929 58251 0 1578475790 0 imap-login 17380 712 62790 0 1578475790 0 imap-login 17377 1608 35143 0 1578475790 0 imap-login 17376 1718 27955 0 1578475790 0 imap-login 17375 1880 14903 0 1578475789 0 imap-login 17374 1965 10754 0 1578475790 0 imap-login 17373 1796 21137 0 1578475790 0 imap-login 17372 1194 52360 0 1578475789 0 imap-login 17371 1994 7482 0 1578475790 0 imap-login 17370 2061 3793 0 1578475789 0 imap-login 17369 2127 2992 0 1578475789 0 imap-login 17368 2014 5152 0 1578475790 0 imap-login 17362 2144 2723 0 1578475787 0 ~# doveadm process status | grep "^imap-login " | awk "{sum += \$3} END {print 40000-sum}" 15385
Why is there a (growing) discrepancy between the number of active TCP connections, and the number of reported free imap-login sockets? Is something else using imap-login thr
Also, why a limit of 16? Have you tried, oh, 160 instead?
The limit was set to match the process_limit of the imap service (16*1250=20000).
We have made the assumption that imap-login should not require more sockets than the maximum number of concurrent imap sessions. We have not had any issues with this before upgrading, but it appears to not be the case in 2.3.x?
-- Eirik
On 08/01/2020 10:49, Eirik Rye wrote:
[...]
In any case, the biggest issue is the one described in the original message in this thread, where logins are periodically failing for many for users:
2020-01-08T10:54:57.436400+01:00 imap dovecot: imap-login: Error: master(imap): net_connect_unix(imap) failed: Resource temporarily unavailable - http://wiki2.dovecot.org/SocketUnavailable 2020-01-08T10:54:57.437550+01:00 imap dovecot: imap-login: Internal login failure
At the same time:
~# ps aux | grep "dovecot/imap$" | wc -l
6637
~# doveadm process status | grep "^imap " | wc -l
6660
~# cat /proc/pidof dovecot
/limits | grep "open files"
Max open files 32768 32768 files
~# doveconf -n
[...]
service imap-login {
client_limit = 2500
inet_listener imap {
port = 143
}
inet_listener imaps {
port = 993
ssl = yes
}
process_limit = 16
process_min_avail = 16
service_count = 0
vsz_limit = 512 M
}
service imap {
client_limit = 1
process_limit = 20000
}
-- Eirik
Can you enable 'verbose_proctitle=yes' and maybe compare with ss -tp output?
Aki
On 08/01/2020 11:58 Eirik Rye rye@domeneshop.no wrote:
On 08/01/2020 10:49, Eirik Rye wrote:
[...]
In any case, the biggest issue is the one described in the original message in this thread, where logins are periodically failing for many for users:
2020-01-08T10:54:57.436400+01:00 imap dovecot: imap-login: Error: master(imap): net_connect_unix(imap) failed: Resource temporarily unavailable - http://wiki2.dovecot.org/SocketUnavailable 2020-01-08T10:54:57.437550+01:00 imap dovecot: imap-login: Internal login failure
At the same time:
~# ps aux | grep "dovecot/imap$" | wc -l 6637 ~# doveadm process status | grep "^imap " | wc -l 6660 ~# cat /proc/
pidof dovecot
/limits | grep "open files" Max open files 32768 32768 files ~# doveconf -n [...] service imap-login { client_limit = 2500 inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } process_limit = 16 process_min_avail = 16 service_count = 0 vsz_limit = 512 M } service imap { client_limit = 1 process_limit = 20000 }-- Eirik
On 08/01/2020 11:09, Aki Tuomi wrote:
Can you enable 'verbose_proctitle=yes' and maybe compare with ss -tp output?
Thanks for the response!
What exactly did you want to compare? ss -tp
does not appear to show
cmdline/process title. I enabled it for one server, but will have to
wait for users to reconnect and for the issue to reappear there.
In the meantime, on another server experiencing the same issue at this moment (same configuration), but no proctitles:
root@imap03:~# doveadm process status | grep "^imap " | wc -l 7564 root@imap03:~# ps aux | grep "dovecot/imap$" | wc -l 7570 root@imap03:~# ss -tp | grep '"imap-login"' | wc -l 8009 root@imap03:~# ss -tp | grep '"imap"' | wc -l 536 root@imap03:~# ss -nt "( sport = :143 or sport = :993 )" | wc -l 8739 root@imap03:~# doveadm who -1 | wc -l 7581
The ss -tp
output appears normal (e.g. 8009+536 ~= 8739).
-- Eirik
With proctitles below.
Strangely, Dovecot is reporting 12679 imap-login sockets in "pre-login"
state, which is about 50% more than there are active TCP connections to
the server (8271, of which 7546 are in "imap-login" process according to
ss -tp
).
On another server, not currently experiencing this issue, the number of TCP sockets appears to more closely match the number of imap-login sockets used (either in pre-login or TLS proxing).
imap:~# ps -f --ppid pidof dovecot
| egrep -v "dovecot/(imap|pop3) "
UID PID PPID C STIME TTY TIME CMD
dovenull 5262 5260 0 11:48 ? 00:00:00 dovecot/pop3-login
dovenull 5263 5260 2 11:48 ? 00:03:53 dovecot/imap-login [724
pre-login + 113 TLS proxies]
dovecot 5264 5260 0 11:48 ? 00:01:15 dovecot/anvil [21
connections]
root 5265 5260 1 11:48 ? 00:02:18 dovecot/log
dovenull 5266 5260 0 11:48 ? 00:00:00 dovecot/pop3-login
dovenull 5267 5260 0 11:48 ? 00:00:00 dovecot/pop3-login
dovenull 5268 5260 0 11:48 ? 00:00:00 dovecot/pop3-login
dovenull 5269 5260 20 11:48 ? 00:35:34 dovecot/imap-login [1073
pre-login + 1199 TLS proxies]
dovenull 5270 5260 1 11:48 ? 00:02:25 dovecot/imap-login [615
pre-login + 67 TLS proxies]
dovenull 5271 5260 0 11:48 ? 00:01:39 dovecot/imap-login [489
pre-login + 44 TLS proxies]
dovenull 5272 5260 14 11:48 ? 00:24:44 dovecot/imap-login [938
pre-login + 720 TLS proxies]
dovenull 5273 5260 6 11:48 ? 00:10:30 dovecot/imap-login [845
pre-login + 242 TLS proxies]
dovenull 5274 5260 4 11:48 ? 00:06:54 dovecot/imap-login [817
pre-login + 209 TLS proxies]
dovenull 5275 5260 0 11:48 ? 00:01:23 dovecot/imap-login [445
pre-login + 36 TLS proxies]
dovenull 5276 5260 25 11:48 ? 00:43:14 dovecot/imap-login [1064
pre-login + 1434 TLS proxies]
dovenull 5277 5260 1 11:48 ? 00:02:19 dovecot/imap-login [523
pre-login + 58 TLS proxies]
dovenull 5278 5260 9 11:48 ? 00:16:19 dovecot/imap-login [937
pre-login + 462 TLS proxies]
dovenull 5279 5260 19 11:48 ? 00:33:24 dovecot/imap-login [937
pre-login + 823 TLS proxies]
dovenull 5280 5260 1 11:48 ? 00:03:04 dovecot/imap-login [655
pre-login + 92 TLS proxies]
dovenull 5281 5260 26 11:48 ? 00:45:20 dovecot/imap-login [969
pre-login + 1450 TLS proxies]
dovenull 5282 5260 6 11:48 ? 00:10:55 dovecot/imap-login [917
pre-login + 303 TLS proxies]
dovenull 5283 5260 4 11:48 ? 00:08:36 dovecot/imap-login [731
pre-login + 128 TLS proxies]
root 5284 5260 2 11:48 ? 00:03:55 dovecot/config
dovecot 5285 5260 1 11:48 ? 00:02:19 dovecot/stats [7968
connections]
dovecot 5286 5260 3 11:48 ? 00:05:53 dovecot/auth [138 wait,
0 passdb, 0 userdb]
528246 10322 5260 0 14:36 ? 00:00:00 [imap]
root 16688 5260 0 14:40 ? 00:00:00 dovecot/imap
imap02:~# doveadm process status | grep "^imap-login " | awk "{sum +=
2500-\$3} END {print sum}"
20081
imap:~# ps -f --ppid pidof dovecot
| grep "dovecot/imap " | wc -l
7937
imap:~# doveadm process status | grep "^imap " | wc -l
7937
imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap\"" | wc -l
559
imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep
"\"imap-login\"" | wc -l
7546
imap:~# ss -ntp "( sport = :143 or sport = :993 )" | wc -l
8271
On 08/01/2020 11:33, Eirik Rye wrote:
On 08/01/2020 11:09, Aki Tuomi wrote:
Can you enable 'verbose_proctitle=yes' and maybe compare with ss -tp output?
Thanks for the response!
What exactly did you want to compare?
ss -tp
does not appear to show cmdline/process title. I enabled it for one server, but will have to wait for users to reconnect and for the issue to reappear there.In the meantime, on another server experiencing the same issue at this moment (same configuration), but no proctitles:
root@imap03:~# doveadm process status | grep "^imap " | wc -l 7564 root@imap03:~# ps aux | grep "dovecot/imap$" | wc -l 7570 root@imap03:~# ss -tp | grep '"imap-login"' | wc -l 8009 root@imap03:~# ss -tp | grep '"imap"' | wc -l 536 root@imap03:~# ss -nt "( sport = :143 or sport = :993 )" | wc -l 8739 root@imap03:~# doveadm who -1 | wc -l 7581
The
ss -tp
output appears normal (e.g. 8009+536 ~= 8739).
-- Eirik
As a workaround for the titular issue, I have tried enabling the "imap-hibernate" service on a couple of servers to reduce the number of running imap processes.
Since ~50-60% of clients are in IDLE at any one time, this allows us to reduce the number of running imap processes to less than half of what it was.
After this I have yet to see the "net_connect_unix(imap) failed" errors on these servers, however instead we are now instead seeing a few of these:
imap([redacted])<30266><[redacted]>: Error: imap-master: Failed to import client state: Mailbox INBOX: Couldn't get recently expunged UIDs (uidnext=13479 highest_modseq=1716)
and
imap([redacted])<20055><[redacted]>: Error: imap-master: Failed to import client state: Mailbox INBOX: Couldn't send flag changes imap([redacted])<20055><[redacted]>: NOTIFY failed in the middle of FETCH reply in=0 out=0 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0
The former I suspect may be related to our NFS-backed store and simultaneous access from different, since many of the errors are accompanied by other corruption related errors:
imap([redacted])<30266><[redacted]>: Error: Mailbox INBOX: vsize-hdr has invalid size: 36
This whole upgrade is part of a migration to a director-backed system, because we are seeing a too large amount of corruption errors (locking/indexes, etc) assumed to be caused by NFS.
Once we introduce the director cluster in front, these errors should hopefully be strongly reduced. :-)
-- Eirik
On 08/01/2020 15:14, Eirik Rye wrote:
With proctitles below.
Strangely, Dovecot is reporting 12679 imap-login sockets in "pre-login" state, which is about 50% more than there are active TCP connections to the server (8271, of which 7546 are in "imap-login" process according to
ss -tp
).On another server, not currently experiencing this issue, the number of TCP sockets appears to more closely match the number of imap-login sockets used (either in pre-login or TLS proxing).
imap:~# ps -f --ppid
pidof dovecot
| egrep -v "dovecot/(imap|pop3) " UID PID PPID C STIME TTY TIME CMD dovenull 5262 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5263 5260 2 11:48 ? 00:03:53 dovecot/imap-login [724 pre-login + 113 TLS proxies] dovecot 5264 5260 0 11:48 ? 00:01:15 dovecot/anvil [21 connections] root 5265 5260 1 11:48 ? 00:02:18 dovecot/log dovenull 5266 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5267 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5268 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5269 5260 20 11:48 ? 00:35:34 dovecot/imap-login [1073 pre-login + 1199 TLS proxies] dovenull 5270 5260 1 11:48 ? 00:02:25 dovecot/imap-login [615 pre-login + 67 TLS proxies] dovenull 5271 5260 0 11:48 ? 00:01:39 dovecot/imap-login [489 pre-login + 44 TLS proxies] dovenull 5272 5260 14 11:48 ? 00:24:44 dovecot/imap-login [938 pre-login + 720 TLS proxies] dovenull 5273 5260 6 11:48 ? 00:10:30 dovecot/imap-login [845 pre-login + 242 TLS proxies] dovenull 5274 5260 4 11:48 ? 00:06:54 dovecot/imap-login [817 pre-login + 209 TLS proxies] dovenull 5275 5260 0 11:48 ? 00:01:23 dovecot/imap-login [445 pre-login + 36 TLS proxies] dovenull 5276 5260 25 11:48 ? 00:43:14 dovecot/imap-login [1064 pre-login + 1434 TLS proxies] dovenull 5277 5260 1 11:48 ? 00:02:19 dovecot/imap-login [523 pre-login + 58 TLS proxies] dovenull 5278 5260 9 11:48 ? 00:16:19 dovecot/imap-login [937 pre-login + 462 TLS proxies] dovenull 5279 5260 19 11:48 ? 00:33:24 dovecot/imap-login [937 pre-login + 823 TLS proxies] dovenull 5280 5260 1 11:48 ? 00:03:04 dovecot/imap-login [655 pre-login + 92 TLS proxies] dovenull 5281 5260 26 11:48 ? 00:45:20 dovecot/imap-login [969 pre-login + 1450 TLS proxies] dovenull 5282 5260 6 11:48 ? 00:10:55 dovecot/imap-login [917 pre-login + 303 TLS proxies] dovenull 5283 5260 4 11:48 ? 00:08:36 dovecot/imap-login [731 pre-login + 128 TLS proxies] root 5284 5260 2 11:48 ? 00:03:55 dovecot/config dovecot 5285 5260 1 11:48 ? 00:02:19 dovecot/stats [7968 connections] dovecot 5286 5260 3 11:48 ? 00:05:53 dovecot/auth [138 wait, 0 passdb, 0 userdb] 528246 10322 5260 0 14:36 ? 00:00:00 [imap] root 16688 5260 0 14:40 ? 00:00:00 dovecot/imap imap02:~# doveadm process status | grep "^imap-login " | awk "{sum += 2500-\$3} END {print sum}" 20081 imap:~# ps -f --ppidpidof dovecot
| grep "dovecot/imap " | wc -l 7937 imap:~# doveadm process status | grep "^imap " | wc -l 7937 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap\"" | wc -l 559 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap-login\"" | wc -l 7546 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | wc -l 8271On 08/01/2020 11:33, Eirik Rye wrote:
On 08/01/2020 11:09, Aki Tuomi wrote:
Can you enable 'verbose_proctitle=yes' and maybe compare with ss -tp output?
Thanks for the response!
What exactly did you want to compare?
ss -tp
does not appear to show cmdline/process title. I enabled it for one server, but will have to wait for users to reconnect and for the issue to reappear there.In the meantime, on another server experiencing the same issue at this moment (same configuration), but no proctitles:
root@imap03:~# doveadm process status | grep "^imap " | wc -l 7564 root@imap03:~# ps aux | grep "dovecot/imap$" | wc -l 7570 root@imap03:~# ss -tp | grep '"imap-login"' | wc -l 8009 root@imap03:~# ss -tp | grep '"imap"' | wc -l 536 root@imap03:~# ss -nt "( sport = :143 or sport = :993 )" | wc -l 8739 root@imap03:~# doveadm who -1 | wc -l 7581
The
ss -tp
output appears normal (e.g. 8009+536 ~= 8739).
Uh. It's not actually supported to access same user concurrently from two servers even if you are using NFS. It will cause problems. You should use dovecot director or some other arrangement to make sure this does not happen.
Aki
On 09/01/2020 16:56 Eirik Rye rye@domeneshop.no wrote:
As a workaround for the titular issue, I have tried enabling the "imap-hibernate" service on a couple of servers to reduce the number of running imap processes.
Since ~50-60% of clients are in IDLE at any one time, this allows us to reduce the number of running imap processes to less than half of what it was.
After this I have yet to see the "net_connect_unix(imap) failed" errors on these servers, however instead we are now instead seeing a few of these:
imap([redacted])<30266><[redacted]>: Error: imap-master: Failed to import client state: Mailbox INBOX: Couldn't get recently expunged UIDs (uidnext=13479 highest_modseq=1716)
and
imap([redacted])<20055><[redacted]>: Error: imap-master: Failed to import client state: Mailbox INBOX: Couldn't send flag changes imap([redacted])<20055><[redacted]>: NOTIFY failed in the middle of FETCH reply in=0 out=0 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0
The former I suspect may be related to our NFS-backed store and simultaneous access from different, since many of the errors are accompanied by other corruption related errors:
imap([redacted])<30266><[redacted]>: Error: Mailbox INBOX: vsize-hdr has invalid size: 36
This whole upgrade is part of a migration to a director-backed system, because we are seeing a too large amount of corruption errors (locking/indexes, etc) assumed to be caused by NFS.
Once we introduce the director cluster in front, these errors should hopefully be strongly reduced. :-)
-- Eirik
On 08/01/2020 15:14, Eirik Rye wrote:
With proctitles below.
Strangely, Dovecot is reporting 12679 imap-login sockets in "pre-login" state, which is about 50% more than there are active TCP connections to the server (8271, of which 7546 are in "imap-login" process according to
ss -tp
).On another server, not currently experiencing this issue, the number of TCP sockets appears to more closely match the number of imap-login sockets used (either in pre-login or TLS proxing).
imap:~# ps -f --ppid
pidof dovecot
| egrep -v "dovecot/(imap|pop3) " UID PID PPID C STIME TTY TIME CMD dovenull 5262 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5263 5260 2 11:48 ? 00:03:53 dovecot/imap-login [724 pre-login + 113 TLS proxies] dovecot 5264 5260 0 11:48 ? 00:01:15 dovecot/anvil [21 connections] root 5265 5260 1 11:48 ? 00:02:18 dovecot/log dovenull 5266 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5267 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5268 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5269 5260 20 11:48 ? 00:35:34 dovecot/imap-login [1073 pre-login + 1199 TLS proxies] dovenull 5270 5260 1 11:48 ? 00:02:25 dovecot/imap-login [615 pre-login + 67 TLS proxies] dovenull 5271 5260 0 11:48 ? 00:01:39 dovecot/imap-login [489 pre-login + 44 TLS proxies] dovenull 5272 5260 14 11:48 ? 00:24:44 dovecot/imap-login [938 pre-login + 720 TLS proxies] dovenull 5273 5260 6 11:48 ? 00:10:30 dovecot/imap-login [845 pre-login + 242 TLS proxies] dovenull 5274 5260 4 11:48 ? 00:06:54 dovecot/imap-login [817 pre-login + 209 TLS proxies] dovenull 5275 5260 0 11:48 ? 00:01:23 dovecot/imap-login [445 pre-login + 36 TLS proxies] dovenull 5276 5260 25 11:48 ? 00:43:14 dovecot/imap-login [1064 pre-login + 1434 TLS proxies] dovenull 5277 5260 1 11:48 ? 00:02:19 dovecot/imap-login [523 pre-login + 58 TLS proxies] dovenull 5278 5260 9 11:48 ? 00:16:19 dovecot/imap-login [937 pre-login + 462 TLS proxies] dovenull 5279 5260 19 11:48 ? 00:33:24 dovecot/imap-login [937 pre-login + 823 TLS proxies] dovenull 5280 5260 1 11:48 ? 00:03:04 dovecot/imap-login [655 pre-login + 92 TLS proxies] dovenull 5281 5260 26 11:48 ? 00:45:20 dovecot/imap-login [969 pre-login + 1450 TLS proxies] dovenull 5282 5260 6 11:48 ? 00:10:55 dovecot/imap-login [917 pre-login + 303 TLS proxies] dovenull 5283 5260 4 11:48 ? 00:08:36 dovecot/imap-login [731 pre-login + 128 TLS proxies] root 5284 5260 2 11:48 ? 00:03:55 dovecot/config dovecot 5285 5260 1 11:48 ? 00:02:19 dovecot/stats [7968 connections] dovecot 5286 5260 3 11:48 ? 00:05:53 dovecot/auth [138 wait, 0 passdb, 0 userdb] 528246 10322 5260 0 14:36 ? 00:00:00 [imap] root 16688 5260 0 14:40 ? 00:00:00 dovecot/imap imap02:~# doveadm process status | grep "^imap-login " | awk "{sum += 2500-\$3} END {print sum}" 20081 imap:~# ps -f --ppidpidof dovecot
| grep "dovecot/imap " | wc -l 7937 imap:~# doveadm process status | grep "^imap " | wc -l 7937 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap\"" | wc -l 559 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap-login\"" | wc -l 7546 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | wc -l 8271On 08/01/2020 11:33, Eirik Rye wrote:
On 08/01/2020 11:09, Aki Tuomi wrote:
Can you enable 'verbose_proctitle=yes' and maybe compare with ss -tp output?
Thanks for the response!
What exactly did you want to compare?
ss -tp
does not appear to show cmdline/process title. I enabled it for one server, but will have to wait for users to reconnect and for the issue to reappear there.In the meantime, on another server experiencing the same issue at this moment (same configuration), but no proctitles:
root@imap03:~# doveadm process status | grep "^imap " | wc -l 7564 root@imap03:~# ps aux | grep "dovecot/imap$" | wc -l 7570 root@imap03:~# ss -tp | grep '"imap-login"' | wc -l 8009 root@imap03:~# ss -tp | grep '"imap"' | wc -l 536 root@imap03:~# ss -nt "( sport = :143 or sport = :993 )" | wc -l 8739 root@imap03:~# doveadm who -1 | wc -l 7581
The
ss -tp
output appears normal (e.g. 8009+536 ~= 8739).
Yes, I realize this. That is why we are introducing the directors to the cluster. We are also working on doing Dovecot LDA/LMTP delivery, to improve Dovecot's ability to react to new messages. Currently we are dropping email directly into the Maildir through other facilities, which works -fine-.
I realize however that IDLE is most likely not working properly with "imap-hibernate" in our current setup. I assume the "imap" service is normally responsible for monitoring the user's mailbox for changes done outside Dovecot?
Am I correct in my assumption that email must be delivered using LMTP/LDA from the same server in order for IDLE in imap-hibernate to work properly? Setting up a separate Dovecot instance purely for LDA/LMTP will not work, as the imap-hibernate process will not be notified of user mailbox activity in that case?
--
- Eirik
Den 09.01.2020 15:59, skrev Aki Tuomi:
Uh. It's not actually supported to access same user concurrently from two servers even if you are using NFS. It will cause problems. You should use dovecot director or some other arrangement to make sure this does not happen.
Aki
On 09/01/2020 16:56 Eirik Rye rye@domeneshop.no wrote:
As a workaround for the titular issue, I have tried enabling the "imap-hibernate" service on a couple of servers to reduce the number of running imap processes.
Since ~50-60% of clients are in IDLE at any one time, this allows us to reduce the number of running imap processes to less than half of what it was.
After this I have yet to see the "net_connect_unix(imap) failed" errors on these servers, however instead we are now instead seeing a few of these:
imap([redacted])<30266><[redacted]>: Error: imap-master: Failed to
import client state: Mailbox INBOX: Couldn't get recently expunged UIDs (uidnext=13479 highest_modseq=1716)
and
imap([redacted])<20055><[redacted]>: Error: imap-master: Failed to
import client state: Mailbox INBOX: Couldn't send flag changes imap([redacted])<20055><[redacted]>: NOTIFY failed in the middle of FETCH reply in=0 out=0 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0
The former I suspect may be related to our NFS-backed store and simultaneous access from different, since many of the errors are accompanied by other corruption related errors:
imap([redacted])<30266><[redacted]>: Error: Mailbox INBOX: vsize-hdr
has invalid size: 36
This whole upgrade is part of a migration to a director-backed system, because we are seeing a too large amount of corruption errors (locking/indexes, etc) assumed to be caused by NFS.
Once we introduce the director cluster in front, these errors should hopefully be strongly reduced. :-)
-- Eirik
On 08/01/2020 15:14, Eirik Rye wrote:
With proctitles below.
Strangely, Dovecot is reporting 12679 imap-login sockets in "pre-login" state, which is about 50% more than there are active TCP connections to the server (8271, of which 7546 are in "imap-login" process according to
ss -tp
).On another server, not currently experiencing this issue, the number of TCP sockets appears to more closely match the number of imap-login sockets used (either in pre-login or TLS proxing).
imap:~# ps -f --ppid
pidof dovecot
| egrep -v "dovecot/(imap|pop3) " UID PID PPID C STIME TTY TIME CMD dovenull 5262 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5263 5260 2 11:48 ? 00:03:53 dovecot/imap-login [724 pre-login + 113 TLS proxies] dovecot 5264 5260 0 11:48 ? 00:01:15 dovecot/anvil [21 connections] root 5265 5260 1 11:48 ? 00:02:18 dovecot/log dovenull 5266 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5267 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5268 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5269 5260 20 11:48 ? 00:35:34 dovecot/imap-login [1073 pre-login + 1199 TLS proxies] dovenull 5270 5260 1 11:48 ? 00:02:25 dovecot/imap-login [615 pre-login + 67 TLS proxies] dovenull 5271 5260 0 11:48 ? 00:01:39 dovecot/imap-login [489 pre-login + 44 TLS proxies] dovenull 5272 5260 14 11:48 ? 00:24:44 dovecot/imap-login [938 pre-login + 720 TLS proxies] dovenull 5273 5260 6 11:48 ? 00:10:30 dovecot/imap-login [845 pre-login + 242 TLS proxies] dovenull 5274 5260 4 11:48 ? 00:06:54 dovecot/imap-login [817 pre-login + 209 TLS proxies] dovenull 5275 5260 0 11:48 ? 00:01:23 dovecot/imap-login [445 pre-login + 36 TLS proxies] dovenull 5276 5260 25 11:48 ? 00:43:14 dovecot/imap-login [1064 pre-login + 1434 TLS proxies] dovenull 5277 5260 1 11:48 ? 00:02:19 dovecot/imap-login [523 pre-login + 58 TLS proxies] dovenull 5278 5260 9 11:48 ? 00:16:19 dovecot/imap-login [937 pre-login + 462 TLS proxies] dovenull 5279 5260 19 11:48 ? 00:33:24 dovecot/imap-login [937 pre-login + 823 TLS proxies] dovenull 5280 5260 1 11:48 ? 00:03:04 dovecot/imap-login [655 pre-login + 92 TLS proxies] dovenull 5281 5260 26 11:48 ? 00:45:20 dovecot/imap-login [969 pre-login + 1450 TLS proxies] dovenull 5282 5260 6 11:48 ? 00:10:55 dovecot/imap-login [917 pre-login + 303 TLS proxies] dovenull 5283 5260 4 11:48 ? 00:08:36 dovecot/imap-login [731 pre-login + 128 TLS proxies] root 5284 5260 2 11:48 ? 00:03:55 dovecot/config dovecot 5285 5260 1 11:48 ? 00:02:19 dovecot/stats [7968 connections] dovecot 5286 5260 3 11:48 ? 00:05:53 dovecot/auth [138 wait, 0 passdb, 0 userdb] 528246 10322 5260 0 14:36 ? 00:00:00 [imap] root 16688 5260 0 14:40 ? 00:00:00 dovecot/imap imap02:~# doveadm process status | grep "^imap-login " | awk "{sum += 2500-\$3} END {print sum}" 20081 imap:~# ps -f --ppidpidof dovecot
| grep "dovecot/imap " | wc -l 7937 imap:~# doveadm process status | grep "^imap " | wc -l 7937 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap\"" | wc -l 559 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap-login\"" | wc -l 7546 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | wc -l 8271On 08/01/2020 11:33, Eirik Rye wrote:
On 08/01/2020 11:09, Aki Tuomi wrote:
Can you enable 'verbose_proctitle=yes' and maybe compare with ss -tp output?
Thanks for the response!
What exactly did you want to compare?
ss -tp
does not appear to show cmdline/process title. I enabled it for one server, but will have to wait for users to reconnect and for the issue to reappear there.In the meantime, on another server experiencing the same issue at this moment (same configuration), but no proctitles:
root@imap03:~# doveadm process status | grep "^imap " | wc -l 7564 root@imap03:~# ps aux | grep "dovecot/imap$" | wc -l 7570 root@imap03:~# ss -tp | grep '"imap-login"' | wc -l 8009 root@imap03:~# ss -tp | grep '"imap"' | wc -l 536 root@imap03:~# ss -nt "( sport = :143 or sport = :993 )" | wc -l 8739 root@imap03:~# doveadm who -1 | wc -l 7581
The
ss -tp
output appears normal (e.g. 8009+536 ~= 8739).
Yes, the same server should be used for delivery too. For maildir, some leniency is allowed but it will still work suboptimally if you do not use dovecot for delivery. And hibernate will not work.
Aki
On 09/01/2020 17:57 Eirik Rye rye@domeneshop.no wrote:
Yes, I realize this. That is why we are introducing the directors to the cluster. We are also working on doing Dovecot LDA/LMTP delivery, to improve Dovecot's ability to react to new messages. Currently we are dropping email directly into the Maildir through other facilities, which works -fine-.
I realize however that IDLE is most likely not working properly with "imap-hibernate" in our current setup. I assume the "imap" service is normally responsible for monitoring the user's mailbox for changes done outside Dovecot?
Am I correct in my assumption that email must be delivered using LMTP/LDA from the same server in order for IDLE in imap-hibernate to work properly? Setting up a separate Dovecot instance purely for LDA/LMTP will not work, as the imap-hibernate process will not be notified of user mailbox activity in that case?
--
- Eirik
Den 09.01.2020 15:59, skrev Aki Tuomi:
Uh. It's not actually supported to access same user concurrently from two servers even if you are using NFS. It will cause problems. You should use dovecot director or some other arrangement to make sure this does not happen.
Aki
On 09/01/2020 16:56 Eirik Rye rye@domeneshop.no wrote:
As a workaround for the titular issue, I have tried enabling the "imap-hibernate" service on a couple of servers to reduce the number of running imap processes.
Since ~50-60% of clients are in IDLE at any one time, this allows us to reduce the number of running imap processes to less than half of what it was.
After this I have yet to see the "net_connect_unix(imap) failed" errors on these servers, however instead we are now instead seeing a few of these:
imap([redacted])<30266><[redacted]>: Error: imap-master: Failed to
import client state: Mailbox INBOX: Couldn't get recently expunged UIDs (uidnext=13479 highest_modseq=1716)
and
imap([redacted])<20055><[redacted]>: Error: imap-master: Failed to
import client state: Mailbox INBOX: Couldn't send flag changes imap([redacted])<20055><[redacted]>: NOTIFY failed in the middle of FETCH reply in=0 out=0 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=0 body_bytes=0
The former I suspect may be related to our NFS-backed store and simultaneous access from different, since many of the errors are accompanied by other corruption related errors:
imap([redacted])<30266><[redacted]>: Error: Mailbox INBOX: vsize-hdr
has invalid size: 36
This whole upgrade is part of a migration to a director-backed system, because we are seeing a too large amount of corruption errors (locking/indexes, etc) assumed to be caused by NFS.
Once we introduce the director cluster in front, these errors should hopefully be strongly reduced. :-)
-- Eirik
On 08/01/2020 15:14, Eirik Rye wrote:
With proctitles below.
Strangely, Dovecot is reporting 12679 imap-login sockets in "pre-login" state, which is about 50% more than there are active TCP connections to the server (8271, of which 7546 are in "imap-login" process according to
ss -tp
).On another server, not currently experiencing this issue, the number of TCP sockets appears to more closely match the number of imap-login sockets used (either in pre-login or TLS proxing).
imap:~# ps -f --ppid
pidof dovecot
| egrep -v "dovecot/(imap|pop3) " UID PID PPID C STIME TTY TIME CMD dovenull 5262 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5263 5260 2 11:48 ? 00:03:53 dovecot/imap-login [724 pre-login + 113 TLS proxies] dovecot 5264 5260 0 11:48 ? 00:01:15 dovecot/anvil [21 connections] root 5265 5260 1 11:48 ? 00:02:18 dovecot/log dovenull 5266 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5267 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5268 5260 0 11:48 ? 00:00:00 dovecot/pop3-login dovenull 5269 5260 20 11:48 ? 00:35:34 dovecot/imap-login [1073 pre-login + 1199 TLS proxies] dovenull 5270 5260 1 11:48 ? 00:02:25 dovecot/imap-login [615 pre-login + 67 TLS proxies] dovenull 5271 5260 0 11:48 ? 00:01:39 dovecot/imap-login [489 pre-login + 44 TLS proxies] dovenull 5272 5260 14 11:48 ? 00:24:44 dovecot/imap-login [938 pre-login + 720 TLS proxies] dovenull 5273 5260 6 11:48 ? 00:10:30 dovecot/imap-login [845 pre-login + 242 TLS proxies] dovenull 5274 5260 4 11:48 ? 00:06:54 dovecot/imap-login [817 pre-login + 209 TLS proxies] dovenull 5275 5260 0 11:48 ? 00:01:23 dovecot/imap-login [445 pre-login + 36 TLS proxies] dovenull 5276 5260 25 11:48 ? 00:43:14 dovecot/imap-login [1064 pre-login + 1434 TLS proxies] dovenull 5277 5260 1 11:48 ? 00:02:19 dovecot/imap-login [523 pre-login + 58 TLS proxies] dovenull 5278 5260 9 11:48 ? 00:16:19 dovecot/imap-login [937 pre-login + 462 TLS proxies] dovenull 5279 5260 19 11:48 ? 00:33:24 dovecot/imap-login [937 pre-login + 823 TLS proxies] dovenull 5280 5260 1 11:48 ? 00:03:04 dovecot/imap-login [655 pre-login + 92 TLS proxies] dovenull 5281 5260 26 11:48 ? 00:45:20 dovecot/imap-login [969 pre-login + 1450 TLS proxies] dovenull 5282 5260 6 11:48 ? 00:10:55 dovecot/imap-login [917 pre-login + 303 TLS proxies] dovenull 5283 5260 4 11:48 ? 00:08:36 dovecot/imap-login [731 pre-login + 128 TLS proxies] root 5284 5260 2 11:48 ? 00:03:55 dovecot/config dovecot 5285 5260 1 11:48 ? 00:02:19 dovecot/stats [7968 connections] dovecot 5286 5260 3 11:48 ? 00:05:53 dovecot/auth [138 wait, 0 passdb, 0 userdb] 528246 10322 5260 0 14:36 ? 00:00:00 [imap] root 16688 5260 0 14:40 ? 00:00:00 dovecot/imap imap02:~# doveadm process status | grep "^imap-login " | awk "{sum += 2500-\$3} END {print sum}" 20081 imap:~# ps -f --ppidpidof dovecot
| grep "dovecot/imap " | wc -l 7937 imap:~# doveadm process status | grep "^imap " | wc -l 7937 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap\"" | wc -l 559 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap-login\"" | wc -l 7546 imap:~# ss -ntp "( sport = :143 or sport = :993 )" | wc -l 8271On 08/01/2020 11:33, Eirik Rye wrote:
On 08/01/2020 11:09, Aki Tuomi wrote:
Can you enable 'verbose_proctitle=yes' and maybe compare with ss -tp output?
Thanks for the response!
What exactly did you want to compare?
ss -tp
does not appear to show cmdline/process title. I enabled it for one server, but will have to wait for users to reconnect and for the issue to reappear there.In the meantime, on another server experiencing the same issue at this moment (same configuration), but no proctitles:
root@imap03:~# doveadm process status | grep "^imap " | wc -l 7564 root@imap03:~# ps aux | grep "dovecot/imap$" | wc -l 7570 root@imap03:~# ss -tp | grep '"imap-login"' | wc -l 8009 root@imap03:~# ss -tp | grep '"imap"' | wc -l 536 root@imap03:~# ss -nt "( sport = :143 or sport = :993 )" | wc -l 8739 root@imap03:~# doveadm who -1 | wc -l 7581
The
ss -tp
output appears normal (e.g. 8009+536 ~= 8739).
We are unfortunately still seeing a lot of these errors once the machine reaches a high number of concurrent users/logins (just below 20k simultaneous IMAP connections on a powerful 24 core machine with 128GB RAM):
2020-01-14T09:18:58.661349+01:00 dovecot: imap-login: Warning: net_connect_unix(imap) succeeded only after retrying - took 140330 us 2020-01-14T09:18:58.854692+01:00 dovecot: imap-login: Error: master(imap): net_connect_unix(imap) failed: Resource temporarily unavailable - http://wiki2.dovecot.org/SocketUnavailable (client-pid=107932, client-id=780262, rip=x, created 500 msecs ago, received 0/4 bytes) 2020-01-14T09:18:58.888228+01:00 dovecot: imap-login: Warning: net_connect_unix(imap) succeeded only after retrying - took 440762 us
The machine is at insignificant load numbers, and the dovecot process is somewhere near 25% CPU usage when the problem occurs. It is not close to saturating its core from what I can tell.
To make sure the issues are not task/fd-limit related, I have set this in /etc/systemd/system/dovecot.service.d/service.conf:
[Service] LimitNOFILE=infinity TasksMax=infinity
~# egrep "processes|files" /proc/pidof dovecot
/limits
Max processes 514051 514051
Max open files 1048576 1048576
~# cat /proc/sys/net/core/somaxconn
65536
~# cat /proc/sys/kernel/pid_max
278528
Dovecot is configured with NFS backed storage, and locally stored passwdfile userdb/passdb are used for authentication.
Backends are (now) behind directors. The directors/proxies are having no issues dealing with the traffic whatsoever.
At the time of failure, the process list looks like this:
~# ps -f --ppid pidof dovecot
| egrep -v "dovecot/(imap|pop3) "
UID PID PPID C STIME TTY TIME CMD
274264 44753 138506 0 12:43 ? 00:00:00 [imap]
308665 104852 138506 0 13:01 ? 00:00:00 [imap]
308665 104853 138506 0 13:01 ? 00:00:00 [imap]
dovenull 138508 138506 1 10:31 ? 00:03:00 dovecot/pop3-login [6
pre-login + 36 TLS proxies]
dovenull 138509 138506 0 10:31 ? 00:00:07 dovecot/imap-login
dovecot 138510 138506 0 10:31 ? 00:01:10 dovecot/anvil [20
connections]
root 138511 138506 1 10:31 ? 00:02:14 dovecot/log
dovenull 138512 138506 1 10:31 ? 00:01:48 dovecot/pop3-login [1
pre-login + 15 TLS proxies]
dovenull 138513 138506 0 10:31 ? 00:00:08 dovecot/imap-login
[redacted TLS proxy]
dovenull 138514 138506 0 10:31 ? 00:00:07 dovecot/imap-login [0
pre-login + 3 TLS proxies]
dovenull 138515 138506 0 10:31 ? 00:00:10 dovecot/imap-login
[redacted TLS proxy]
dovenull 138516 138506 0 10:31 ? 00:01:14 dovecot/imap-login [27
pre-login + 12 TLS proxies]
dovenull 138517 138506 0 10:31 ? 00:00:31 dovecot/imap-login [2
pre-login + 2 TLS proxies]
dovenull 138518 138506 0 10:31 ? 00:01:28 dovecot/imap-login [56
pre-login + 20 TLS proxies]
dovenull 138519 138506 0 10:31 ? 00:00:09 dovecot/imap-login [0
pre-login + 4 TLS proxies]
dovenull 138520 138506 0 10:31 ? 00:00:06 dovecot/imap-login
[redacted TLS proxy]
dovenull 138521 138506 0 10:31 ? 00:00:11 dovecot/imap-login [0
pre-login + 3 TLS proxies]
dovenull 138522 138506 0 10:31 ? 00:00:16 dovecot/imap-login [2
pre-login + 2 TLS proxies]
dovenull 138523 138506 0 10:31 ? 00:00:13 dovecot/imap-login [1
pre-login + 2 TLS proxies]
dovenull 138524 138506 0 10:31 ? 00:00:24 dovecot/imap-login [1
pre-login + 3 TLS proxies]
dovenull 138525 138506 0 10:31 ? 00:01:13 dovecot/imap-login [36
pre-login + 23 TLS proxies]
dovenull 138526 138506 0 10:31 ? 00:00:41 dovecot/imap-login [10
pre-login + 12 TLS proxies]
dovenull 138527 138506 0 10:31 ? 00:00:20 dovecot/imap-login [1
pre-login + 7 TLS proxies]
root 138528 138506 2 10:31 ? 00:04:45 dovecot/config
dovecot 138529 138506 1 10:31 ? 00:02:22 dovecot/stats [19389
connections]
dovecot 138530 138506 3 10:31 ? 00:05:53 dovecot/auth [137
wait, 0 passdb, 0 userdb]
root 148675 138506 0 13:13 ? 00:00:00 dovecot/doveadm-server
[redacted]
Other stats:
~# ps -f --ppid pidof dovecot
| grep "dovecot/imap " | wc -l
19328
~# doveadm process status | grep "^imap " | wc -l
19307
~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap\"" | wc -l
19141
~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap-login\"" |
wc -l
333
~# ss -ntp "( sport = :143 or sport = :993 )" | wc -l
19530
~# doveconf -n
# 2.3.9.2 (cf2918cac): /etc/dovecot/dovecot.conf # OS: Linux 4.9.0-9-amd64 x86_64 Debian 9.11 # Hostname: [redacted] default_vsz_limit = 768 M disable_plaintext_auth = no imap_id_log = * log_timestamp = "%F %T %z " login_trusted_networks = [redacted] mail_fsync = always mail_location = maildir:~/Maildir mail_nfs_index = yes mail_nfs_storage = yes mmap_disable = yes namespace inbox { inbox = yes location = mailbox Drafts { auto = subscribe special_use = \Drafts } mailbox Sent { auto = subscribe special_use = \Sent } mailbox "Sent Messages" { auto = no special_use = \Sent } mailbox Spam { auto = create special_use = \Junk } mailbox Trash { auto = subscribe special_use = \Trash } prefix = separator = / } passdb { args = username_format=%Lu /etc/dovecot/aliases default_fields = noauthenticate driver = passwd-file } passdb { args = /etc/dovecot/passwd driver = passwd-file } protocols = imap pop3 service doveadm { inet_listener { port = 24245 } inet_listener http { port = 8080 } } service imap-login { client_limit = 4096 inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } process_limit = 16 process_min_avail = 16 service_count = 0 vsz_limit = 768 M } service imap { client_limit = 1 process_limit = 65536 } service pop3-login { client_limit = 4096 inet_listener pop3 { port = 110 } inet_listener pop3s { port = 995 ssl = yes } process_limit = 8 process_min_avail = 2 service_count = 0 } service pop3 { process_limit = 16384 } service stats { client_limit = 65536 } ssl_cert = <[redacted] ssl_cipher_list = ALL:!LOW:!SSLv2:!EXP:!aNULL ssl_dh = # hidden, use -P to show it ssl_key = # hidden, use -P to show it userdb { args = /etc/dovecot/passwd driver = passwd-file } verbose_proctitle = yes protocol imap { mail_max_userip_connections = 20 rawlog_dir = [redacted] }
Are there any other tunables either in Dovecot or in the kernel that may relate to this issue that we may have missed?
-- Eirik
On 06/01/2020 11:28, Eirik Rye wrote:
Hi,
After upgrading Dovecot from version 2.2.27 to 2.3.9.2, we are sporadically seeing lots of these errors in the error log on many of our servers:
imap-login: Error: master(imap): net_connect_unix(imap) failed: Resource temporarily unavailable - http://wiki2.dovecot.org/SocketUnavailable
The issue is causing significant delays and/or timeouts on login.
From what I can tell, this happens because the imap-login service is unable to connect to the imap service socket, however I am unable to identify the root cause of the issue.
We have increased the client_limit for the stats service, as described in the upgrade document at (https://wiki2.dovecot.org/Upgrading/2.3). It is set to the same value as the process_limit for imap service.
System CPU usage does not appear to be saturated. The CPU usage sits at around 50% when these errors appear.
We have doubled the number of file descriptors after upgrading from 2.2.27. It was previously set to 16392 (which worked fine):
~# cat /proc/
pidof dovecot
/limits | grep "open files" Max open files 30000 30000 filesWe have tried increasing default_process_limit and default_client_limit from the default 1000 to 3000, but this has no effect.
Current configuration (with irrelevant parts removed):
~# doveconf -n
default_client_limit = 3000 # these were raised after upgrading in attempt to remedy default_process_limit = 3000 default_vsz_limit = 512 M [...] service imap-login { client_limit = 1250 inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } process_limit = 16 process_min_avail = 16 service_count = 0 vsz_limit = 512 M } service imap { client_limit = 1 process_limit = 20000 } service stats { client_limit = 20000 } [...]
We appear to be nowhere near the 20000 process_limit set for the imap service. We are also not seeing any warnings being logged with regards to the process_limit.
~# doveadm who -1 | wc -l 8765
~# doveadm process status | grep "^imap " | wc -l 7583
~# ps aux | grep "dovecot/imap" | wc -l 7449
~# doveadm process status | grep -v "^imap " name pid available_count total_count idle_start last_status_update last_kill_sent stats 19875 12384 582387 0 1578305489 0 pop3-login 19868 1000 0 0 1578061504 0 pop3-login 19867 1000 0 0 1578061503 0 pop3-login 19866 1000 0 0 1578061502 0 pop3-login 19865 1000 0 0 1578061503 0 pop3-login 19864 1000 0 0 1578061503 0 pop3-login 19853 1000 0 0 1578061504 0 pop3-login 19852 1000 0 0 1578061504 0 pop3-login 19851 1000 0 0 1578061504 0 pop3-login 19849 1000 0 0 1578061503 0 pop3-login 19847 1000 0 0 1578061503 0 pop3-login 19846 1000 0 0 1578061503 0 pop3-login 19845 1000 0 0 1578061503 0 pop3-login 19844 1000 0 0 1578061503 0 pop3-login 19843 1000 0 0 1578061503 0 pop3-login 19842 1000 0 0 1578061503 0 pop3-login 19839 1000 0 0 1578061502 0 log 19841 2973 27 0 1578061502 0 imap-login 19873 666 14942 0 1578305488 0 imap-login 19872 523 32792 0 1578305489 0 imap-login 19871 316 62876 0 1578305489 0 imap-login 19870 167 78332 0 1578305489 0 imap-login 19869 118 97989 0 1578305489 0 imap-login 19863 0 184726 0 1578305489 0 imap-login 19862 0 193094 0 1578305489 0 imap-login 19861 0 186800 0 1578305487 0 imap-login 19860 675 17806 0 1578305487 0 imap-login 19859 0 169446 0 1578305489 0 imap-login 19858 0 143517 0 1578305489 0 imap-login 19857 0 119215 0 1578305488 0 imap-login 19856 0 151958 0 1578305489 0 imap-login 19855 483 47036 0 1578305488 0 imap-login 19854 1 185240 0 1578305489 0 imap-login 19840 567 23859 0 1578305485 0 config 19874 2967 124482 0 1578305489 0 auth 19885 2967 124185 0 1578305489 0 anvil 10017 967 7567 0 1578305463 0
Embarrassingly, the issue does appear to have been /proc/sys/net/core/somaxconn setting all along. It was increased from 128, to 512, to 1024 with full restarts of Dovecot inbetween. However, this was apparently not enough to handle our login bursts.
The latest increase, from 1024 to 65536, only had a doveadm reload
issued, which seemingly does not cause the socket to rebind and
therefore still used the old settings.
After fully restarting dovecot, the issue -appears- to be gone. At least we survived through the morning spikes today without errors.
Apologies for the list spam.
On 14/01/2020 13:38, Eirik Rye wrote:
We are unfortunately still seeing a lot of these errors once the machine reaches a high number of concurrent users/logins (just below 20k simultaneous IMAP connections on a powerful 24 core machine with 128GB RAM):
2020-01-14T09:18:58.661349+01:00 dovecot: imap-login: Warning: net_connect_unix(imap) succeeded only after retrying - took 140330 us 2020-01-14T09:18:58.854692+01:00 dovecot: imap-login: Error: master(imap): net_connect_unix(imap) failed: Resource temporarily unavailable - http://wiki2.dovecot.org/SocketUnavailable (client-pid=107932, client-id=780262, rip=x, created 500 msecs ago, received 0/4 bytes) 2020-01-14T09:18:58.888228+01:00 dovecot: imap-login: Warning: net_connect_unix(imap) succeeded only after retrying - took 440762 us
The machine is at insignificant load numbers, and the dovecot process is somewhere near 25% CPU usage when the problem occurs. It is not close to saturating its core from what I can tell.
To make sure the issues are not task/fd-limit related, I have set this in /etc/systemd/system/dovecot.service.d/service.conf:
[Service] LimitNOFILE=infinity TasksMax=infinity
~# egrep "processes|files" /proc/
pidof dovecot
/limits Max processes 514051 514051 Max open files 1048576 1048576 ~# cat /proc/sys/net/core/somaxconn 65536 ~# cat /proc/sys/kernel/pid_max 278528Dovecot is configured with NFS backed storage, and locally stored passwdfile userdb/passdb are used for authentication.
Backends are (now) behind directors. The directors/proxies are having no issues dealing with the traffic whatsoever.
At the time of failure, the process list looks like this:
~# ps -f --ppid
pidof dovecot
| egrep -v "dovecot/(imap|pop3) " UID PID PPID C STIME TTY TIME CMD 274264 44753 138506 0 12:43 ? 00:00:00 [imap] 308665 104852 138506 0 13:01 ? 00:00:00 [imap] 308665 104853 138506 0 13:01 ? 00:00:00 [imap] dovenull 138508 138506 1 10:31 ? 00:03:00 dovecot/pop3-login [6 pre-login + 36 TLS proxies] dovenull 138509 138506 0 10:31 ? 00:00:07 dovecot/imap-login dovecot 138510 138506 0 10:31 ? 00:01:10 dovecot/anvil [20 connections] root 138511 138506 1 10:31 ? 00:02:14 dovecot/log dovenull 138512 138506 1 10:31 ? 00:01:48 dovecot/pop3-login [1 pre-login + 15 TLS proxies] dovenull 138513 138506 0 10:31 ? 00:00:08 dovecot/imap-login [redacted TLS proxy] dovenull 138514 138506 0 10:31 ? 00:00:07 dovecot/imap-login [0 pre-login + 3 TLS proxies] dovenull 138515 138506 0 10:31 ? 00:00:10 dovecot/imap-login [redacted TLS proxy] dovenull 138516 138506 0 10:31 ? 00:01:14 dovecot/imap-login [27 pre-login + 12 TLS proxies] dovenull 138517 138506 0 10:31 ? 00:00:31 dovecot/imap-login [2 pre-login + 2 TLS proxies] dovenull 138518 138506 0 10:31 ? 00:01:28 dovecot/imap-login [56 pre-login + 20 TLS proxies] dovenull 138519 138506 0 10:31 ? 00:00:09 dovecot/imap-login [0 pre-login + 4 TLS proxies] dovenull 138520 138506 0 10:31 ? 00:00:06 dovecot/imap-login [redacted TLS proxy] dovenull 138521 138506 0 10:31 ? 00:00:11 dovecot/imap-login [0 pre-login + 3 TLS proxies] dovenull 138522 138506 0 10:31 ? 00:00:16 dovecot/imap-login [2 pre-login + 2 TLS proxies] dovenull 138523 138506 0 10:31 ? 00:00:13 dovecot/imap-login [1 pre-login + 2 TLS proxies] dovenull 138524 138506 0 10:31 ? 00:00:24 dovecot/imap-login [1 pre-login + 3 TLS proxies] dovenull 138525 138506 0 10:31 ? 00:01:13 dovecot/imap-login [36 pre-login + 23 TLS proxies] dovenull 138526 138506 0 10:31 ? 00:00:41 dovecot/imap-login [10 pre-login + 12 TLS proxies] dovenull 138527 138506 0 10:31 ? 00:00:20 dovecot/imap-login [1 pre-login + 7 TLS proxies] root 138528 138506 2 10:31 ? 00:04:45 dovecot/config dovecot 138529 138506 1 10:31 ? 00:02:22 dovecot/stats [19389 connections] dovecot 138530 138506 3 10:31 ? 00:05:53 dovecot/auth [137 wait, 0 passdb, 0 userdb] root 148675 138506 0 13:13 ? 00:00:00 dovecot/doveadm-server [redacted]Other stats:
~# ps -f --ppid
pidof dovecot
| grep "dovecot/imap " | wc -l 19328 ~# doveadm process status | grep "^imap " | wc -l 19307 ~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap\"" | wc -l 19141 ~# ss -ntp "( sport = :143 or sport = :993 )" | grep "\"imap-login\"" | wc -l 333 ~# ss -ntp "( sport = :143 or sport = :993 )" | wc -l 19530~# doveconf -n
# 2.3.9.2 (cf2918cac): /etc/dovecot/dovecot.conf # OS: Linux 4.9.0-9-amd64 x86_64 Debian 9.11 # Hostname: [redacted] default_vsz_limit = 768 M disable_plaintext_auth = no imap_id_log = * log_timestamp = "%F %T %z " login_trusted_networks = [redacted] mail_fsync = always mail_location = maildir:~/Maildir mail_nfs_index = yes mail_nfs_storage = yes mmap_disable = yes namespace inbox { inbox = yes location = mailbox Drafts { auto = subscribe special_use = \Drafts } mailbox Sent { auto = subscribe special_use = \Sent } mailbox "Sent Messages" { auto = no special_use = \Sent } mailbox Spam { auto = create special_use = \Junk } mailbox Trash { auto = subscribe special_use = \Trash } prefix = separator = / } passdb { args = username_format=%Lu /etc/dovecot/aliases default_fields = noauthenticate driver = passwd-file } passdb { args = /etc/dovecot/passwd driver = passwd-file } protocols = imap pop3 service doveadm { inet_listener { port = 24245 } inet_listener http { port = 8080 } } service imap-login { client_limit = 4096 inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } process_limit = 16 process_min_avail = 16 service_count = 0 vsz_limit = 768 M } service imap { client_limit = 1 process_limit = 65536 } service pop3-login { client_limit = 4096 inet_listener pop3 { port = 110 } inet_listener pop3s { port = 995 ssl = yes } process_limit = 8 process_min_avail = 2 service_count = 0 } service pop3 { process_limit = 16384 } service stats { client_limit = 65536 } ssl_cert = <[redacted] ssl_cipher_list = ALL:!LOW:!SSLv2:!EXP:!aNULL ssl_dh = # hidden, use -P to show it ssl_key = # hidden, use -P to show it userdb { args = /etc/dovecot/passwd driver = passwd-file } verbose_proctitle = yes protocol imap { mail_max_userip_connections = 20 rawlog_dir = [redacted] }
Are there any other tunables either in Dovecot or in the kernel that may relate to this issue that we may have missed?
-- Eirik
participants (3)
-
@lbutlr
-
Aki Tuomi
-
Eirik Rye