[Dovecot] stale imap processes with 1.0.rc6 on FreeBSD
I was running dovecot-1.0.rc2 but had many stale imap processes (only one user of IMAP).
I switched to dovecot-1.0.rc6, but the problem continued.
This is on FreeBSD 4-something. Maybe it has a bug with kqueue?
I read messages on the list about the stale imap processes; it seems to hit *BSD systems. But the email about the new release candidate indicated it should be fixed.
I read on the pkgsrc commit list about --with-ioloop=best
So I configured with that and saw that the I/O loop method is now kqueue. (I saved the configure output, and it was poll before.)
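Roughly, the rebuild was something like this (the prefix matches where I install Dovecot; any other flags I normally pass are omitted):

  ./configure --prefix=/local/service/dovecot --with-ioloop=best
  make
  make install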
But the stale imap processes problem continues.
For one user, over the past couple of hours, I have eight imap processes in "I" state.
I have:
File offsets ........................ : 64bit
I/O loop method ..................... : kqueue
File change notification method ..... : kqueue
Building with SSL support ........... : yes (OpenSSL)
Building with IPv6 support .......... : yes
Building with pop3 server ........... : yes
Building with mail delivery agent ... : yes
Building with GSSAPI support ........ : no
Building with user database modules . : static prefetch passwd passwd-file checkpassword (modules)
Building with password lookup modules : passwd passwd-file pam checkpassword (modules)
Building with SQL drivers ........... :
And I have:
protocols = imap pop3

protocol imap {
  imap_client_workarounds = outlook-idle delay-newmail
}
(Just started using delay-newmail today.)
My mail logs don't show any problems.
The IMAP user tells me that Outlook often reports that its connections have been reset.
I see multiple imap-login "Login" entries within a few minutes for this same user, who said Outlook received a connection reset at the same time.
Just received a screenshot that said:
Microsoft Office Outlook
Failed to update headers.
The TCP/IP connection unexpectedly terminated by the server.
They then also received an error (in Outlook, I think), like:
imap session idle too long, connection reset
(I don't have or use Outlook myself, so I can't test this.)
The server is used for a lot of POP3 and seems fine with that. (It uses stunnel for pop3s, but we have not used stunnel for imaps for over a week, to eliminate stunnel from our troubleshooting.)
Any ideas on troubleshooting this would be appreciated.
Jeremy C. Reed
On Thu, 2006-08-10 at 17:40 -0500, Jeremy C. Reed wrote:
I was running dovecot-1.0.rc2 but had many stale imap processes (only one user of IMAP).
I switched to dovecot-1.0.rc6, but the problem continued.
I don't really understand this problem.. Is this user using SSL? What if you try the attached patch, does it change anything?
For one user, over the past couple of hours, I have eight imap processes in "I" state.
Could you get process traces and gdb backtraces from them? See http://dovecot.org/bugreport.html#hangs
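Roughly, the gdb part of that page comes down to something like this (the binary path and PID are placeholders; use one of the stuck imap processes on your system):

  gdb /path/to/dovecot/libexec/dovecot/imap <pid-of-stuck-imap>
  (gdb) bt full
  (gdb) quit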
On Thu, 2006-08-10 at 17:40 -0500, Jeremy C. Reed wrote:
I was running dovecot-1.0.rc2 but had many stale imap processes (only one user of IMAP).
I switched to dovecot-1.0.rc6, but the problem continued.
I don't really understand this problem.. Is this user using SSL? What if you try the attached patch, does it change anything?
No SSL as far as I know. (I tried to explain in my previous email: there is no imaps in protocols, and we have not been using stunnel for imaps for over a week. Also, the end user said they stopped using imaps over a week ago to make sure that was not the problem.)
I didn't try the patch yet as it looks to be for SSL only.
For one user, over the past couple of hours, I have eight imap processes in "I" state.
Could you get process traces and gdb backtraces from them? See http://dovecot.org/bugreport.html#hangs
Attaching to program: /local/service/dovecot/libexec/dovecot/imap, process 49042
Reading symbols from /usr/lib/libc.so.4...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
0x2811d590 in kevent () from /usr/lib/libc.so.4
(gdb) bt full
#0  0x2811d590 in kevent () from /usr/lib/libc.so.4
No symbol table info available.
#1  0x80a8b20 in event_callback (context=0x80d8030) at ioloop-notify-kqueue.c:41
        ctx = (struct ioloop_notify_handler_context *) 0x80d8030
        io = (struct io *) 0x16b
        ev = {ident = 135045792, filter = 1, flags = 0, fflags = 134910204,
          data = 135098368, udata = 0x80dd050}
#2  0x80a9230 in io_loop_handler_run (ioloop=0x80d6000) at ioloop-poll.c:203
        ctx = (struct ioloop_handler_context *) 0x80ca140
        pollfd = (struct pollfd *) 0x2
        tv = {tv_sec = 0, tv_usec = 430177}
        io = (struct io *) 0x80ca540
        t_id = 2
        msecs = 363
        ret = 0
        call = 363
#3  0x80a8915 in io_loop_run (ioloop=0x80d6000) at ioloop.c:274
        ioloop = (struct ioloop *) 0x80d6000
#4  0x806104d in main (argc=1, argv=0xbfbffa74, envp=0xbfbffa7c) at main.c:273
No locals.
#5  0x8055ac9 in _start ()
No symbol table info available.
I don't really understand this problem.. Is this user using SSL? What if you try the attached patch, does it change anything?
No SSL as far as I know. (I tried to explain in my previous email: there is no imaps in protocols, and we have not been using stunnel for imaps for over a week. Also, the end user said they stopped using imaps over a week ago to make sure that was not the problem.)
Also I have: ssl_disable = yes
I didn't try the patch yet as it looks to be for SSL only.
I did a backtrace on another stale PID, and the full gdb backtrace was identical.
On Thu, 2006-08-10 at 18:44 -0500, Jeremy C. Reed wrote:
On Thu, 2006-08-10 at 17:40 -0500, Jeremy C. Reed wrote:
I was running dovecot-1.0.rc2 but had many stale imap processes (only one user of IMAP).
I switched to dovecot-1.0.rc6, but the problem continued.
I don't really understand this problem.. Is this user using SSL? What if you try the attached patch, does it change anything?
No SSL as far as I know. (I tried to explain in my previous email: there is no imaps in protocols, and we have not been using stunnel for imaps for over a week. Also, the end user said they stopped using imaps over a week ago to make sure that was not the problem.)
Oh. I thought you meant you used Dovecot SSL instead of stunnel to make sure the problem wasn't with it, or something. :)
Could you get process traces and gdb backtraces from them? See http://dovecot.org/bugreport.html#hangs
Attaching to program: /local/service/dovecot/libexec/dovecot/imap, process 49042
Reading symbols from /usr/lib/libc.so.4...done.
Reading symbols from /usr/libexec/ld-elf.so.1...done.
0x2811d590 in kevent () from /usr/lib/libc.so.4
(gdb) bt full
#0  0x2811d590 in kevent () from /usr/lib/libc.so.4
No symbol table info available.
#1  0x80a8b20 in event_callback (context=0x80d8030) at ioloop-notify-kqueue.c:41
        ctx = (struct ioloop_notify_handler_context *) 0x80d8030
        io = (struct io *) 0x16b
        ev = {ident = 135045792, filter = 1, flags = 0, fflags = 134910204,
          data = 135098368, udata = 0x80dd050}
So it's at least not in some infinite loop. What about ktrace, does it show anything other than kevents?
Also, what do these commands say after attaching to gdb:
p *my_client
p *(struct file_istream *)my_client.input.real_stream
p *(struct file_ostream *)my_client.output.real_stream
And if I understood correctly, the same hangs happened even if you didn't use the --with-ioloop parameter at all, i.e. it used poll?
So it's at least not in some infinite loop. What about ktrace, does it show anything other than kevents?
When I did ktrace -p on the PIDs earlier, the kdump output was empty.
I am checking again ... still nothing.
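For the record, this is roughly what I am running (49042 is just one of the stale imap PIDs from the gdb session above):

  ktrace -p 49042     # attach kernel tracing to the stale imap process
  # ...wait a while...
  ktrace -c -p 49042  # detach tracing again
  kdump               # reads ktrace.out from the current directory -- empty here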
Also, what do these commands say after attaching to gdb:
p *my_client
p *(struct file_istream *)my_client.input.real_stream
p *(struct file_ostream *)my_client.output.real_stream
(gdb) p *my_client
$1 = {fd_in = 0, fd_out = 1, io = 0x80ca0a0, input = 0x80d9424,
  output = 0x80da0b0, namespaces = 0x80d7440, mailbox = 0x80d7840,
  keywords = {pool = 0x80d4810, keywords = {buffer = 0x0, element_size = 0}},
  select_counter = 1, messages_count = 1238, recent_count = 1,
  last_input = 1155240486, last_output = 1155240432, bad_counter = 0,
  parser = 0x80d6040, cmd = {client = 0x80da000, pool = 0x80dd010, tag = 0x0,
    name = 0x0, func = 0, context = 0x0, uid = 0, param_error = 0},
  disconnected = 0, destroyed = 0, command_pending = 0, input_pending = 0,
  output_pending = 0, rawlog = 0, input_skip_line = 0}
(gdb) p *(struct file_istream *)my_client.input.real_stream
$2 = {istream = {iostream = {pool = 0x80c83a0, refcount = 1,
      close = 0x80a7430 <_close>, destroy = 0x80a7474 <_destroy>,
      set_max_buffer_size = 0x80a7498 <_set_max_buffer_size>},
    read = 0x80a755c <_read>, seek = 0x80a77a0 <_seek>, sync = 0x80a7808 <_sync>,
    stat = 0x80a78a8 <_stat>, istream = {v_offset = 617, stream_errno = 0,
      mmaped = 0, closed = 0, seekable = 0, eof = 0, real_stream = 0x80d9400},
    fd = 0, buffer = 0x80df000 "wb7m IDLE", w_buffer = 0x80df000 "wb7m IDLE",
    line_str = 0x0, buffer_size = 4096, abs_start_offset = 0,
    statbuf = {st_dev = 0, st_ino = 0, st_mode = 0, st_nlink = 0, st_uid = 0,
      st_gid = 0, st_rdev = 0, st_atimespec = {tv_sec = 1155240085, tv_nsec = 0},
      st_mtimespec = {tv_sec = 1155240085, tv_nsec = 0},
      st_ctimespec = {tv_sec = 1155240085, tv_nsec = 0}, st_size = -1,
      st_blocks = 0, st_blksize = 0, st_flags = 0, st_gen = 0, st_lspare = 0,
      st_qspare = {0, 0}}, skip = 617, pos = 617},
  fstat_cache_stamp = {tv_sec = 0, tv_usec = 0}, max_buffer_size = 65536,
  skip_left = 0, file = 0, autoclose_fd = 0}
(gdb) *I pressed enter here I think*
$3 = {... same output as $2, repeated ...}
(gdb) p *(struct file_ostream *)my_client.output.real_stream
$4 = {ostream = {iostream = {pool = 0x80c83a0, refcount = 1,
      close = 0x80ac744 <_close>, destroy = 0x80ac768 <_destroy>,
      set_max_buffer_size = 0x80ac78c <_set_max_buffer_size>},
    cork = 0x80aca00 <_cork>, flush = 0x80acaa4 <_flush>,
    flush_pending = 0x80acab8 <_flush_pending>,
    get_used_size = 0x80acb38 <_get_used_size>, seek = 0x80acb58 <_seek>,
    sendv = 0x80ace00 <_sendv>, send_istream = 0x80ad4c0 <_send_istream>,
    ostream = {offset = 55033, stream_errno = 0, overflow = 0, closed = 0,
      real_stream = 0x80da080}, callback = 0x805a1fc <_client_output>,
    context = 0x80da000}, fd = 1, io = 0x0,
  buffer = 0x80e0000 "2yi2 OK Idle completed.\r\n\r\nS (\\Seen) INTERNALDATE \"10-Aug-2006 12:40:01 -0700\" BODY[] {2437}\r\nReturn-Path: <reed@reedmedia.net>\r\nX-Original-To: AAAAAAA@AAAAAAA.AAA.AAAAAAAAAAA.com\r\nDelivered-To: AAAA"...,
  buffer_size = 4096, max_buffer_size = 4294967295, optimal_block_size = 4096,
  head = 0, tail = 0, full = 0, file = 0, corked = 0, flush_pending = 0,
  no_socket_cork = 1, no_sendfile = 0, autoclose_fd = 0}
And if I understood correctly, the same hangs happened even if you didn't use the --with-ioloop parameter at all, i.e. it used poll?
Yes, it hung -- there were many stale imap processes. My configure output said it was "poll".
Thanks for your help.
I may have time later on IRC, but I don't use it often.
On 11.8.2006, at 3.53, Jeremy C. Reed wrote:
So it's at least not in some infinite loop. What about ktrace, does it show anything other than kevents?
When I did ktrace -p on the PIDs earlier, the kdump output was empty.
Hmm. I thought it would have shown at least kevent calls in there,
but I guess not then..
Also, what do these commands say after attaching to gdb:
p *my_client
p *(struct file_istream *)my_client.input.real_stream
p *(struct file_ostream *)my_client.output.real_stream
(gdb) p *my_client ..
Are you sure these backtraces were from a process that was really
hanging? Everything in there looks as if it was just an idling
connection which was still connected to the IMAP client. Although the
last output was "OK Idle completed." which means that the client most
likely would have gone back to the IDLE command instead of just leaving
it there..
Can you check the process's connections in FreeBSD with netstat -p or something else?
Are you sure these backtraces were from a process that was really hanging? Everything in there looks as if it was just an idling connection which was still connected to the IMAP client. Although the last output was "OK Idle completed." which means that the client most likely would have gone back to the IDLE command instead of just leaving it there..
Can you check the process's connections in FreeBSD with netstat -p or something else?
Sorry, I didn't mean to imply they were hanging.
But it appears that they were unused.
As my first email showed, the Outlook client had error messages and the connections were closed. (Happening frequently.)
The logs show more and more imap-logins for the same user (and same IP). But no old imap-login processes.
My logs show multiple "dovecot: imap-login: Login" for same user (from same IP) -- over 130 times in less than one day. Is that normal?
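For reference, I am counting them with something like this (the maillog path and the user string are placeholders for my setup):

  grep 'imap-login: Login' /var/log/maillog | grep 'theuser' | wc -l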
As I don't know the correct behaviour: should the imap-login processes normally have a start time of when I started dovecot? (In my case, their start times are recent, not when dovecot started.)
And more and more "imap" processes are left running.
Since I started this thread I have many more "imap" processes running (again only one user is using this).
How many "imap" processes is normal for one user from one client (and one IP)? (I have probably had over a hundred today, but I killed many off a few hours ago.)
sockstat does show several imap connections. But netstat shows the same connections are in CLOSE_WAIT, and have been for a while.
There are many more "imap" processes without any network connection though.
- Jeremy C. Reed
p.s. Instead of "netstat -p" (from Linux net-tools), on BSD systems use fstat and/or sockstat. (Or lsof from a third-party package.)
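For example, roughly (the PID is a placeholder for one of the imap processes):

  sockstat -4 | grep imap                # which processes hold which sockets
  fstat -p <imap-pid>                    # open files/sockets for one imap process
  netstat -an -f inet | grep CLOSE_WAIT  # half-closed TCP connections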
On Thu, 2006-08-10 at 21:28 -0500, Jeremy C. Reed wrote:
My logs show multiple "dovecot: imap-login: Login" for same user (from same IP) -- over 130 times in less than one day. Is that normal?
Depending on the client and network connections, could be.
As I don't know the correct behaviour: should the imap-login processes normally have a start time of when I started dovecot? (In my case, their start times are recent, not when dovecot started.)
No, by default one imap-login serves one IMAP login connection and then dies. A new imap-login is created whenever a new user logs in.
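If you want to double-check which login process settings are in effect, something like this should show them (dovecot -a dumps the effective configuration; the setting names are the login_process_* ones from the 1.0 example config, as I recall):

  dovecot -a | grep login_process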
How many "imap" processes is normal for one user from one client (and one IP)? (I have probably had over a hundred today, but I killed many off a few hours ago.)
I think Thunderbird's default is 5, which is the normal maximum.
sockstat does show several imap connections. But netstat shows the same connections are in CLOSE_WAIT, and have been for a while.
There are many more "imap" processes without any network connection though.
These are the important processes then. What do ktrace and gdb backtraces show for them?
Can you see if file descriptors still exist for the sockets? In Linux I'd do that with ls -l /proc/pid/fd/
There are many more "imap" processes without any network connection though.
These are the important processes then. What do ktrace and gdb backtraces show for them?
Okay, here is one that no longer has a network connection:
(gdb) bt full
#0  0x2811d590 in ?? ()
No symbol table info available.
#1  0x80a9230 in ?? ()
No symbol table info available.
#2  0x80a8915 in ?? ()
No symbol table info available.
#3  0x806104d in _fini ()
No symbol table info available.
#4  0x8055ac9 in i_stream_create_file (fd=-1077937284, pool=0x0,
    max_buffer_size=3217030017, autoclose_fd=-1077937263) at istream-file.c:236
        pool = 0x0
        max_buffer_size = 135102512
        fstream = (struct file_istream *) 0x80ca560
        st = {st_dev = 3217029756, st_ino = 3217029664, st_mode = 4159,
          st_nlink = 2054, st_uid = 3217029740, st_gid = 671912909,
          st_rdev = 3217029740, st_atimespec = {tv_sec = 134568649, tv_nsec = 1},
          st_mtimespec = {tv_sec = -1077937548, tv_nsec = -1077937540},
          st_ctimespec = {tv_sec = 647, tv_nsec = -1077937556},
          st_size = 577967886526927274, st_blocks = 134943496, st_blksize = 0,
          st_flags = 0, st_gen = 3217029736, st_lspare = 0,
          st_qspare = {-4629706567330037760, 2885872453060393580}}
Nothing from ktrace, though. With ktracing enabled, the next time I attached gdb it did add some details (but maybe not related):
99399 imap PSIG  SIGINT caught handler=0x80a92ec mask=0x0 code=0x0
99399 imap RET   kevent -1 errno 4 Interrupted system call
99399 imap CALL  write(0x6,0xbfbff7a3,0x1)
99399 imap GIO   fd 6 wrote 1 byte "\^B"
99399 imap RET   write 1
99399 imap CALL  sigreturn(0xbfbff7c0)
99399 imap RET   sigreturn JUSTRETURN
99399 imap CALL  write(0x2,0x80cb1e0,0x2b)
99399 imap GIO   fd 2 wrote 43 bytes "\^AFkevent() failed: Interrupted system call"
99399 imap RET   write 43/0x2b
99399 imap CALL  sigprocmask(0x1,0x280d9be0,0xbfbff8a0)
99399 imap RET   sigprocmask 0
99399 imap CALL  sigprocmask(0x3,0x280d9bf0,0)
99399 imap RET   sigprocmask 0
99399 imap CALL  exit(0x59)
Can you see if file descriptors still exist for the sockets? In Linux I'd do that with ls -l /proc/pid/fd/
Yes, they do. fstat shows the "internet stream tcp" entries for them on file descriptors 0 and 1. The ones that still have a network connection also display the address of the socket, like:
USER  CMD   PID    FD MOUNT  INUM  MODE  SZ|DV  R/W
user  imap  4711    0* internet stream tcp d5fcd640
user  imap  4711    1* internet stream tcp d5fcd640
user  imap  99399   0* internet stream tcp
user  imap  99399   1* internet stream tcp
It is now 8:55 AM; here is a hung/stale process from 5:01 AM:
[root@jidmail log]# ps aux | grep lcrdrla
lcrdrla  23436  0.0  0.0  3092  888 ?      S    05:01   0:00 imap
lcrdrla   6467  0.0  0.0  4068  868 ?      S    08:00   0:00 imap
root     20659  0.0  0.0  4232  660 pts/1  S+   08:51   0:00 grep lcrdrla
I am running CentOS 4.3
[root@jidmail log]# gdb attach 23436
GNU gdb Red Hat Linux (6.3.0.0-1.96rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...attach: No such file or directory.

Attaching to process 23436
Reading symbols from /usr/libexec/dovecot/imap...(no debugging symbols found)...done.
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2

0x003627a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
(gdb) bt full
#0  0x003627a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
No symbol table info available.
#1  0x00438c6d in poll () from /lib/tls/libc.so.6
No symbol table info available.
#2  0x08086fd1 in io_loop_handler_run ()
No symbol table info available.
#3  0x08086abb in io_loop_run ()
No symbol table info available.
#4  0x0805a04c in main ()
No symbol table info available.
(gdb)
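(A side note: the backtrace above has no debugging symbols, so the imap frames show only function names. Getting full symbols would mean roughly a source build with debugging enabled, something like the sketch below, or a distribution debuginfo package if one exists.)

  CFLAGS='-g' ./configure
  make
  make install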
On Thu, 2006-08-10 at 23:44 -0500, Jeremy C. Reed wrote:
There are many more "imap" processes without any network connection though.
These are the important processes then. What do ktrace and gdb backtraces show for them?
Okay, here is one that no longer has a network connection:
(gdb) bt full
#0  0x2811d590 in ?? ()
No symbol table info available.
#1  0x80a9230 in ?? ()
No symbol table info available.
#2  0x80a8915 in ?? ()
No symbol table info available.
#3  0x806104d in _fini ()
No symbol table info available.
#4  0x8055ac9 in i_stream_create_file (fd=-1077937284, pool=0x0,
    max_buffer_size=3217030017, autoclose_fd=-1077937263) at istream-file.c:236
..
Unfortunately this backtrace is completely corrupted..
Nothing from ktrace, though. With ktracing enabled, the next time I attached gdb it did add some details (but maybe not related):
99399 imap PSIG  SIGINT caught handler=0x80a92ec mask=0x0 code=0x0
99399 imap RET   kevent -1 errno 4 Interrupted system call
99399 imap CALL  write(0x6,0xbfbff7a3,0x1)
99399 imap GIO   fd 6 wrote 1 byte "\^B"
99399 imap RET   write 1
99399 imap CALL  sigreturn(0xbfbff7c0)
99399 imap RET   sigreturn JUSTRETURN
99399 imap CALL  write(0x2,0x80cb1e0,0x2b)
99399 imap GIO   fd 2 wrote 43 bytes "\^AFkevent() failed: Interrupted system call"
Hmm. This actually helps. It shows it's blocking in ioloop-notify-kqueue and it really shouldn't be blocking there. See if the attached patch helps.
I think I'll also do some other cleanups to kqueue code now that I finally really looked at it..
Jeremy C. Reed wrote:
So it's at least not in some infinite loop. What about ktrace, does it show anything other than kevents?
I saw the same problem, but recompiled with WITH_KQUEUE=true. Before, I had ioloop=poll and eventloop=kqueue. This caused strange behavior on my FreeBSD 6.1 system. Now I've got rc6 running with kqueue for both, and things seem pretty good. It looks like I have a couple of spare imap processes, but they go away when I exit the clients (all Thunderbird 1.5.0.5).
Thanks, Marty
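For reference, rebuilding the FreeBSD port that way comes down to roughly the following (the port directory and the exact knob spelling are assumptions based on the note above; adjust for your ports tree):

  cd /usr/ports/mail/dovecot
  make WITH_KQUEUE=true clean install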
On Thu, 2006-08-10 at 17:40 -0500, Jeremy C. Reed wrote:
I read on the pkgsrc commit list about --with-ioloop=best
So I configured with that and saw that the I/O loop method is now kqueue. (I saved the configure output, and it was poll before.)
What about --with-ioloop=poll --with-notify=none?
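In other words, rebuilding with something like:

  ./configure --with-ioloop=poll --with-notify=none
  make
  make install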
Is everyone who is having these stale imap process problems running BSDs?
participants (4):
- james edwards
- Jeremy C. Reed
- Martin Jackson
- Timo Sirainen