[Dovecot] 2.0.beta3 imapd running hot
Hi,
since a couple of weeks I occasionally have an imapd going crazy on me, using up 100% CPU. Current revision is 10962
29865 vmail 20 0 47820 3296 1708 R 99.7 0.9 131:07.85 imap
vmail 29865 86.1 0.8 47820 3296 ? R Mar20 131:20 dovecot/imap [berni 2001:a60:f001:1:219:66ff:fe8b:a6e IDLE
strace -p <pid> shows a repeating pattern of
epoll_wait(8, {{EPOLLOUT, {u32=36560672, u64=36560672}}}, 5, 4872) = 1 gettimeofday({1269133967, 495298}, NULL) = 0 gettimeofday({1269133967, 495333}, NULL) = 0 gettimeofday({1269133967, 495370}, NULL) = 0
with the last argument of epoll_wait (4872) slowly decrementing and then wrapping around to ~30k.
Attaching gdb shows the following backtrace:
(gdb) bt full #0 array_idx_i (ioloop=0x22bd5f0) at array.h:189 No locals. #1 io_loop_handler_run (ioloop=0x22bd5f0) at ioloop-epoll.c:188 ctx = 0x22bd800 event = <value optimized out> list = <value optimized out> io = <value optimized out> tv = {tv_sec = 26, tv_usec = 307176} t_id = 2 msecs = 26308 ret = 1 i = <value optimized out> call = <value optimized out> #2 0x00007f8f682bc8e8 in io_loop_run (ioloop=0x22bd5f0) at ioloop.c:350 No locals. #3 0x00007f8f682acd63 in master_service_run (service=0x22bd4c0, callback=0x6590) at master-service.c:486 No locals. #4 0x0000000000418551 in main (argc=1, argv=0x22bd3a0) at main.c:313 set_roots = {0x41b840, 0x0} service_flags = <value optimized out> storage_service_flags = MAIL_STORAGE_SERVICE_FLAG_DISALLOW_ROOT postlogin_socket_path = 0x0 (gdb) c Continuing. ^C Program received signal SIGINT, Interrupt. 0x00007fffed1ff9ad in gettimeofday () (gdb) bt full #0 0x00007fffed1ff9ad in gettimeofday () No symbol table info available. #1 0x00007f8f6795f02a in gettimeofday () from /lib/libc.so.6 No symbol table info available. #2 0x00007f8f682bcb39 in io_loop_handle_timeouts_real (ioloop=0x22bd5f0) at ioloop.c:291 item = <value optimized out> tv = {tv_sec = 0, tv_usec = 36427248} tv_call = {tv_sec = 0, tv_usec = 140737170938560} t_id = <value optimized out> #3 io_loop_handle_timeouts (ioloop=0x22bd5f0) at ioloop.c:339 _data_stack_cur_id = 0 #4 0x00007f8f682bd70e in io_loop_handler_run (ioloop=0x22bd5f0) at ioloop-epoll.c:180 ctx = 0x22bd800 event = <value optimized out> list = <value optimized out> io = <value optimized out> tv = {tv_sec = 28, tv_usec = 267997} t_id = 2 msecs = 28268 ret = 1 i = <value optimized out> call = <value optimized out> #5 0x00007f8f682bc8e8 in io_loop_run (ioloop=0x22bd5f0) at ioloop.c:350 No locals. #6 0x00007f8f682acd63 in master_service_run (service=0x22bd4c0, callback=0) at master-service.c:486 No locals. #7 0x0000000000418551 in main (argc=1, argv=0x22bd3a0) at main.c:313 set_roots = {0x41b840, 0x0} service_flags = <value optimized out> storage_service_flags = MAIL_STORAGE_SERVICE_FLAG_DISALLOW_ROOT postlogin_socket_path = 0x0
doveconf -n:
mail.svr02.mucip.net:~# doveconf -n # 2.0.beta3: /etc/dovecot/dovecot.conf # OS: Linux 2.6.33 x86_64 Debian squeeze/sid auth_mechanisms = plain login disable_plaintext_auth = no listen = 83.170.6.69, 2001:1b10:1000::110:1:1 mail_gid = 5000 mail_location = maildir:%h/:INDEX=/var/cache/dovecot/%1n/%n mail_uid = 5000 maildir_stat_dirs = yes maildir_very_dirty_syncs = yes passdb { args = /etc/dovecot/dovecot-ldap.conf driver = ldap } plugin { quota = dict:user::file:%h/dovecot-quota quota_rule = *:storage=500M sieve = ~/.dovecot.sieve sieve_dir = ~/sieve } postmaster_address = postmaster@birkenwald.de protocols = imap pop3 lmtp managesieve quota_full_tempfail = yes service auth { unix_listener /var/spool/postfix-mailout/private/auth { group = postfix mode = 0660 user = postfix } unix_listener /var/spool/postfix/private/auth { group = postfix mode = 0660 user = postfix } unix_listener auth-userdb { group = vmail mode = 0660 } } service imap-login { inet_listener imap { port = 143 } inet_listener imaps { port = 993 ssl = yes } } service lmtp { inet_listener { port = 24 } } service managesieve-login { inet_listener sieve_deprecated { port = 2000 } process_min_avail = 1 } service pop3-login { inet_listener pop3 { port = 110 } inet_listener pop3s { port = 995 ssl = yes } } ssl_cert =
At the moment I have two suspects, the imap_zlib plugin and CONDSTORE (which I recently reenabled in Thunderbird after the fixed version got deployed). The client is always Thunderbird 3.0.3+, but from different systems. It does terminate without any error message in the logs when I exit Thunderbird.
Now testing without zlib...
Bernhard
On Sun, 2010-03-21 at 01:20 +0000, Bernhard Schmidt wrote:
strace -p <pid> shows a repeating pattern of
epoll_wait(8, {{EPOLLOUT, {u32=36560672, u64=36560672}}}, 5, 4872) = 1 gettimeofday({1269133967, 495298}, NULL) = 0 gettimeofday({1269133967, 495333}, NULL) = 0 gettimeofday({1269133967, 495370}, NULL) = 0
So nothing else? It looks like some function is being called over and over again that "you can write more data to client", but the function then does nothing.
Attaching gdb shows the following backtrace:
It would be helpful to find out what the I/O function is that is being called. For example:
b ioloop-epoll.c:208 p *io cont p *io cont ..etc..
What is the *io output? If it's client_output() or something else as generic.. Use "step" to get into the function then then:
p *client p *client.output_lock p client.command_queue[0] p client.command_queue[1] p client.command_queue[2]
On 21.03.2010 15:17, Timo Sirainen wrote:
Hi,
strace -p<pid> shows a repeating pattern of
epoll_wait(8, {{EPOLLOUT, {u32=36560672, u64=36560672}}}, 5, 4872) = 1 gettimeofday({1269133967, 495298}, NULL) = 0 gettimeofday({1269133967, 495333}, NULL) = 0 gettimeofday({1269133967, 495370}, NULL) = 0
So nothing else? It looks like some function is being called over and over again that "you can write more data to client", but the function then does nothing.
No, nothing else (I could see).
Attaching gdb shows the following backtrace: It would be helpful to find out what the I/O function is that is being called. For example:
I'll try to get your debug output.
I'm pretty sure it's related to the imap_zlib plugin, I haven't had one single problem since I disabled it. I'll reenable and go tracing.
Bernhard
On 21.03.2010 15:17, Timo Sirainen wrote:
Hi Timo,
It would be helpful to find out what the I/O function is that is being called. For example:
b ioloop-epoll.c:208 p *io cont p *io cont ..etc..
What is the *io output? If it's client_output() or something else as generic.. Use "step" to get into the function then then:
p *client p *client.output_lock p client.command_queue[0] p client.command_queue[1] p client.command_queue[2]
(gdb) bt #0 stream_send_io (fstream=0x179a570) at ostream-file.c:464 #1 0x00007fbaa9a69775 in io_loop_handler_run (ioloop=<value optimized out>) at ioloop-epoll.c:208 #2 0x00007fbaa9a688f8 in io_loop_run (ioloop=0x177e5f0) at ioloop.c:350 #3 0x00007fbaa9a58d63 in master_service_run (service=0x177e4c0, callback=0) at master-service.c:486 #4 0x00000000004186c1 in main (argc=1, argv=0x177e3a0) at main.c:323 (gdb) b ioloop-epoll.c:208 Breakpoint 1 at 0x7fbaa9a69767: file ioloop-epoll.c, line 208. (gdb) p *io No symbol "io" in current context. (gdb) cont Continuing.
Breakpoint 1, io_loop_handler_run (ioloop=<value optimized out>) at
ioloop-epoll.c:208
208 ioloop-epoll.c: Datei oder Verzeichnis nicht gefunden.
in ioloop-epoll.c
(gdb) p *io
$1 = {io = {condition = IO_WRITE, callback = 0x7fbaa9a715c0
(gdb) cont Continuing.
Breakpoint 1, io_loop_handler_run (ioloop=<value optimized out>) at
ioloop-epoll.c:208
208 in ioloop-epoll.c
(gdb) p *io
$2 = {io = {condition = IO_WRITE, callback = 0x7fbaa9a715c0
I am pretty sure it is related to imap_zlib ... at least this did not happen when I disabled the plugin.
I'm on revision 10993 at the moment.
Bernhard
[snip]
I am pretty sure it is related to imap_zlib ... at least this did not happen when I disabled the plugin.
[snip]
Your backtrace looks exactly like what I see on 2.0Beta4 on NetBSD, and I observed the same thing with respect to imap_zlib. I just can't find the smoking gun that links the process behavior with the use of imap_zlib. :-/
At least I'm not the only one seeing the issue though!
On 29/03/2010 23:44, Scott Ellis wrote:
Your backtrace looks exactly like what I see on 2.0Beta4 on NetBSD, and I observed the same thing with respect to imap_zlib. I just can't find the smoking gun that links the process behavior with the use of imap_zlib. :-/
At least I'm not the only one seeing the issue though!
I see exactly the same issue with imap_zlib on OpenBSD 4.6 with dovecot 2.0 aefa279e2c70. It stops as soon as imap_zlib is deactivated.
participants (4)
-
Bernhard Schmidt
-
Renaud Allard
-
Scott Ellis
-
Timo Sirainen