[Dovecot] unkillable imap process(es) with high CPU-usage
Hello,
I am having a problem with my dovecot-daemon. It is forking one or more (I saw up to perhaps 8 of them) imap processes under my user name. These processes are consuming a lot of CPU time and are not killable:
PID   USER  PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
8616  arno  20  0   2900  1600  1204  R  98    0.2   1196:38  imap
Stopping dovecot does not quit these processes. Killing them (even "kill -9" as root) is not possible. The only solution to get rid of them is to reboot.
I found a mailing list post from someone who seemed to have the same problem. He solved it by upgrading to version 1.1, but in my case this did not help.
Is this a known problem? What could I check or do?
My system:
very up to date debian/sid (sidux)
kernel 2.6.27-8.slh.1-sidux-686
CPU: Intel(R) Core(TM)2 CPU 4300 @ 1.80GHz
Filesystem: ext3fs
tried dovecot from sid: 1:1.0.15-2.3
tried dovecot from experimental: 1:1.1.2-3
My dovecot.conf is the original debian configuration with only one line changed into: protocols = imaps
# dovecot -n
# 1.1.2: /etc/dovecot/dovecot.conf
log_timestamp: %Y-%m-%d %H:%M:%S
protocols: imaps
login_dir: /var/run/dovecot/login
login_executable: /usr/lib/dovecot/imap-login
mail_privileged_group: mail
auth default:
  passdb:
    driver: pam
  userdb:
    driver: passwd
I am using dovecot locally on my system with me as the only user. As client I am using thunderbird alias icedove 2.0.0.17-1. Icedove retrieves the mail from another IMAP server and sorts it into Maildir folders in dovecot. I do not know what makes the imap processes go mad. It happens after the system (and the mail client) has been up for a while.
Do you need more information?
Thanks, Arno
On Thu, 2008-12-11 at 11:36 +0100, Arno Wald wrote:
I am having a problem with my dovecot-daemon. It is forking one or more (I saw up to perhaps 8 of them) imap processes under my user name. These processes are consuming a lot of CPU time and are not killable:
PID   USER  PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
8616  arno  20  0   2900  1600  1204  R  98    0.2   1196:38  imap
Stopping dovecot does not quit these processes. Killing them (even "kill -9" as root) is not possible. The only solution to get rid of them is to reboot.
If you can't kill a process with -9, the bug is in the kernel and there's nothing Dovecot can do about it. User space processes can't create unkillable processes unless something's broken.
Although you could see if "strace -p <pid>" prints something. It's doubtful though if you can't kill the process.
Timo Sirainen wrote:
If you can't kill a process with -9, the bug is in the kernel and
Do you have an idea where and how I could report this (against the kernel package?). Perhaps I should first try an original debian kernel instead of the sidux kernel.
there's nothing Dovecot can do about it. User space processes can't create unkillable processes unless something's broken.
Although you could see if "strace -p <pid>" prints something. It's doubtful though if you can't kill the process.
I have tried this already without success. strace did not print anything and hung; it could not be stopped with CTRL-C.
Thank you for your answer, Arno
Hi,
On Thu, 11 Dec 2008, Arno Wald wrote:
Timo Sirainen wrote:
If you can't kill a process with -9, the bug is in the kernel and
If there is a blocked I/O operation, such as a lost NFS mount point, processes appear unkillable. As soon as the share is back, the processes die. I've seen this several times.
matthias
If you can't kill a process with -9, the bug is in the kernel and there's nothing Dovecot can do about it. User space processes can't create unkillable processes unless something's broken.
It just means the process is doing an uninterruptible sleep (in BSD notation, a tsleep() without PCATCH set). This may be an I/O operation, resource shortage etc. and needn't be a kernel bug.
On Thu, 2008-12-11 at 23:08 +0100, Edgar Fuß wrote:
If you can't kill a process with -9, the bug is in the kernel and there's nothing Dovecot can do about it. User space processes can't create unkillable processes unless something's broken.
It just means the process is doing an uninterruptible sleep (in BSD notation, a tsleep() without PCATCH set). This may be an I/O operation, resource shortage etc. and needn't be a kernel bug.
Yes, but I'd also argue that any long enough uninterruptible sleep is a bug. :) I hate it when NFS operations hang..
Anyway, Arno's ps output showed the process to be in R state, not in D state. Unless that was some kind of a copy&paste mistake, that makes it sound more like a bug.
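For anyone who wants to check the state outside of top or ps: on Linux the one-letter state (R, D, S, Z, ...) is the third field of /proc/<pid>/stat. Below is a minimal Python sketch, not something posted in this thread. The names parse_stat_state, process_state and STATE_NAMES are made up for illustration, and the sample usage assumes a Linux /proc filesystem.

```python
import os

# Meanings of the one-letter codes discussed in this thread (see proc(5)).
STATE_NAMES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually waiting for I/O)",
    "Z": "zombie",
    "T": "stopped",
}


def parse_stat_state(stat_line):
    """Return the state letter from the contents of /proc/<pid>/stat.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')' characters, so split after the LAST ')'.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def process_state(pid):
    """Read /proc/<pid>/stat (Linux only) and return the state letter."""
    with open(os.path.join("/proc", str(pid), "stat")) as f:
        return parse_stat_state(f.read())
```

Calling process_state(8616) on the affected machine should give the same letter that top shows in its S column. A process stuck on a lost NFS mount would normally report D, whereas the imap processes described here report R.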
Timo Sirainen wrote:
Anyway, Arno's ps output showed the process to be in R state, not in D
It is definitely the R state.
PID   USER  PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
6717  arno  20  0   2964  1608  1192  R  100   0.2   1158:05  imap
btw: I have switched from imaps to imap protocol, because I thought this might change something. But it does not.
Bye, Arno
On Fri, 2008-12-12 at 10:10 +0100, Arno Wald wrote:
Timo Sirainen wrote:
Anyway, Arno's ps output showed the process to be in R state, not in D
It is definitely the R state.
PID   USER  PR  NI  VIRT  RES   SHR   S  %CPU  %MEM  TIME+    COMMAND
6717  arno  20  0   2964  1608  1192  R  100   0.2   1158:05  imap
btw: I have switched from imaps to imap protocol, because I thought this might change something. But it does not.
You could see if compiling Dovecot without inotify/dnotify support would help. I can't really think of anything else.
To me it really seems like a kernel bug if a process in R state can't be killed.
Timo Sirainen wrote:
You could see if compiling Dovecot without inotify/dnotify support would help. I can't really think of anything else.
I would like to try this and report the result. But there are so many configure-options that I do not know which options (and how) I should dis/enable. Could anybody give me the command line for the ./configure? That would be very kind.
Or does it make more sense to try another kernel first?
Thanks, Arno
On Dec 12, 2008, at 6:02 PM, Arno Wald wrote:
Timo Sirainen wrote:
You could see if compiling Dovecot without inotify/dnotify support would help. I can't really think of anything else.
I would like to try this and report the result. But there are so many configure-options that I do not know which options (and how) I should dis/enable. Could anybody give me the command line for the ./configure? That would be very kind.
configure --with-notify=none
Or does it make more sense to try another kernel first?
I guess that could also help.
Timo Sirainen wrote:
Or does it make more sense to try another kernel first?
I guess that could also help.
I started testing this with another kernel (2.6.26-6.slh.1-sidux-686). So far (the last 15 hours) no such failing imap process has shown up.
So I guess it happens with 2.6.27 kernels. I will keep watching through Monday to be sure.
btw: On another PC at home, also with a sidux kernel 2.6.27, I had such an imap process yesterday, too. (AMD Athlon(tm) XP 2200+)
Bye, Arno
On Sat, 2008-12-13 at 11:33 +0100, Arno Wald wrote:
Timo Sirainen wrote:
Or does it make more sense to try another kernel first?
I guess that could also help.
I started testing this with another kernel (2.6.26-6.slh.1-sidux-686). Until now (the last 15 hours) no such failing imap process did show up.
So I guess it happens with 2.6.27 kernels. I will watch this during monday to be sure.
btw: At another PC at home, also with a sidux-kernel 2.6.27 I had such an imap process, too, yesterday. (AMD Athlon(tm) XP 2200+)
The other guy who also had a problem was using 2.6.27. If the 3rd guy also replies that he's using 2.6.27 then that's pretty clearly the problem. Might be worth asking about in Linux kernel mailing list.
The other guy who also had a problem was using 2.6.27. If the 3rd guy also replies that he's using 2.6.27 then that's pretty clearly the problem. Might be worth asking about in Linux kernel mailing list.
We've had something similar happen on 2.6.24. Over time processes would take more and more CPU until they were at 100%. I moved to 2.6.26 and so far I haven't seen this happen on any of our 35 servers.
It seemed to be NFS related in our case.
Cor
On Sat, December 13, 2008 5:43 am, Timo Sirainen wrote:
The other guy who also had a problem was using 2.6.27. If the 3rd guy also replies that he's using 2.6.27 then that's pretty clearly the problem. Might be worth asking about in Linux kernel mailing list.
Has anyone reported this over on LKML yet? Or filed a bug?
DR
On Sat, 2008-12-13 at 09:23 -0500, David Rosenstrauch wrote:
On Sat, December 13, 2008 5:43 am, Timo Sirainen wrote:
The other guy who also had a problem was using 2.6.27. If the 3rd guy also replies that he's using 2.6.27 then that's pretty clearly the problem. Might be worth asking about in Linux kernel mailing list.
Has anyone reported this over on LKML yet? Or filed a bug?
No idea, but it would be useful to test if this can be reproduced after disabling inotify code. That's the only thing that I can think of that's somewhat special in imap process code vs. everything else.
David Rosenstrauch wrote:
Has anyone reported this over on LKML yet? Or filed a bug?
Not yet. First I would like to test my self-compiled dovecot without inotify against 2.6.27. Second, I do not know where and how to report kernel issues. (Also I am a little bit afraid of the whole kernel stuff, because I do not know much about it.)
Arno
A new status report regarding this issue:
Dovecot on my PC in the office is still running fine with kernel 2.6.26.
Dovecot with the latest kernel 2.6.27-9.slh.1-sidux-686 on my PC at home showed the unkillable imap processes after a few minutes.
Now I have been running dovecot compiled without inotify support on this kernel for about 70 minutes without any problems.
So I really think that the inotify code in kernel 2.6.27 causes the problem. (I will tell you if the imap processes unexpectedly cause problems again in the current configuration.)
So where are kernel issues reported? I will try to find out.
Greetings, Arno
So where are kernel issues reported? I will try to find out.
Linux kernel mailing list is probably the best place. I could also write a summary mail about this and Cc it to you all who have had the problem.
Please cc me on it, I'd rather not have to subscribe to the lkml again.
Timo Sirainen wrote:
On Mon, 2008-12-15 at 12:00 +0100, Arno Wald wrote:
So where are kernel issues reported? I will try to find out.
Linux kernel mailing list is probably the best place. I could also write a summary mail about this and Cc it to you all who have had the problem.
I would prefer this. I have just started to write a bugzilla entry on kernel.org, but as I do not know the correct workflow for reporting kernel issues, I do not dare to press the send button yet.
Greetings, Arno
On Dec 15, 2008, at 1:00 PM, Arno Wald wrote:
A new status report regarding this issue:
Dovecot on my PC in the office is still running fine with kernel 2.6.26.
Dovecot with the latest kernel 2.6.27-9.slh.1-sidux-686 on my PC at home did show the unkillable imap processes after a few minutes.
Now I am running dovecot compiled without inotify support on this kernel without any problems for about 70 minutes.
One more thing you could try: Does the hang happen if you use
configure --with-notify=dnotify ?
Timo Sirainen wrote:
One more thing you could try: Does the hang happen if you use configure --with-notify=dnotify ?
I do not know if this is still interesting. But after notify=none ran for more than 3 hours without any problems, I have now been testing dnotify for approximately 30 minutes, also without any problems.
Ciao, Arno
I am running the older debian/sid dovecot 1:1.0.15-2.3 again, now with kernel 2.6.27-10.slh.1-sidux-686, and the issue seems to be fixed. So I recommend using at least 2.6.27.10.
Bye, Arno.
I have this EXACT same problem after upgrading to SuSE 11.1, which uses this exact kernel version!!
After reading this, I was excited to think that if I killed the nfsserver daemon (which I had running for no good reason), that it would sort my problem....
Sure enough, my computer - which up to now had been going unresponsive every 24 hours - was running fine for 72 hours and then BOOM... it happened again.
Just wanted to let people know that it seems that at the minute, the dovecot that ships with SuSE and the kernel they are using in 11.1 exhibit this problem.
Gino
Arno Wald wrote:
I have running the older debian/sid dovecot 1:1.0.15-2.3 again, now with kernel 2.6.27-10.slh.1-sidux-686 and the issue seems to be fixed. So I recommend to use at least 2.6.27.10.
Bye, Arno.
On Feb 14, 2009, at 9:57 AM, agent59624285 wrote:
I have this EXACT same problem after upgrading to SuSE 11.1, which uses this exact kernel version!!
After reading this, I was excited to think that if I killed the nfsserver daemon (which I had running for no good reason), that it would sort my problem....
Sure enough, my computer - which up to now had been going unresponsive every 24 hours - was running fine for 72 hours and then BOOM... it happened again.
Just wanted to let people know that it seems that at the minute, the dovecot that ships with SuSE and the kernel they are using in 11.1 exhibit this problem.
Gino
This is sounding similar to the problem I have with my setup:
High CPU usage.
Can't kill IMAP.
Server becomes unresponsive.
I'm using CentOS 5.2 64-bit version with the latest cPanel.
So what am I missing, other than the problem nobody else is having is
clearly something they ARE having?
Peace, Gene
On Sun, 2009-02-15 at 12:10 -0700, Gene Steinberg wrote:
This is sounding similar to the problem I have with my setup:
High CPU usage.
Can't kill IMAP.
kill -9 doesn't work for imap processes? You didn't mention this before.
I'm using CentOS 5.2 64-bit version with the latest cPanel.
So what am I missing, other than the problem nobody else is having is
clearly something they ARE having?
If you can't kill -9 a process, it means the kernel is buggy. At least 2.6.27 was buggy and it was fixed in 2.6.27.10.
On Feb 15, 2009, at 12:43 PM, Timo Sirainen wrote:
On Sun, 2009-02-15 at 12:10 -0700, Gene Steinberg wrote:
This is sounding similar to the problem I have with my setup:
High CPU usage.
Can't kill IMAP.
kill -9 doesn't work for imap processes? You didn't mention this before.
I'm using CentOS 5.2 64-bit version with the latest cPanel.
So what am I missing, other than the problem nobody else is having is clearly something they ARE having?
If you can't kill -9 a process, it means the kernel is buggy. At least 2.6.27 was buggy and it was fixed in 2.6.27.10.
Some of this is above my pay grade (so forgive the imprecision), but I
did try to restart IMAP in cPanel with no success, assuming I catch it
before the load makes it impossible to do anything.
Yes, I have been able to kill processes by the standard ID number. I
did that with rsync the other day when changing the backup parameters.
Here's the kernel info on my box:
Linux server.paracastworld.net 2.6.27.9rootserver-20081216a #1 SMP Tue
Dec 16 02:29:13 EST 2008 x86_64
So that's a buggy kernel?
Or is the 2.6.27.9 version better than 2.6.27 in this regard?
Peace, Gene
On Sun, 2009-02-15 at 12:48 -0700, Gene Steinberg wrote:
Here's the kernel info on my box:
Linux server.paracastworld.net 2.6.27.9rootserver-20081216a #1 SMP Tue
Dec 16 02:29:13 EST 2008 x86_64
So that's a buggy kernel?
Or is the 2.6.27.9 version better than 2.6.27 in this regard?
2.6.27.9 is buggy. I'm pretty sure upgrading the kernel will fix your problem.
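The version check discussed here can be written down as a small Python sketch. It is only an illustration and rests on an assumption taken from this thread alone: that 2.6.27 up to 2.6.27.9 showed the problem and 2.6.27.10 was reported as fixed. The names parse_release and in_reported_range are hypothetical. Only upstream-style strings such as the `uname -r` output above are handled. A distribution package name like 2.6.27-10.slh.1-sidux-686 puts its patch level after a dash, which this sketch deliberately does not interpret, so it would be read as plain 2.6.27.

```python
import re

# Assumption taken from this thread only: 2.6.27 up to and including
# 2.6.27.9 showed the problem, and 2.6.27.10 was reported as fixed.
FIRST_BAD = (2, 6, 27, 0)
FIRST_FIXED = (2, 6, 27, 10)


def parse_release(release):
    """Turn an upstream-style `uname -r` string into a 4-tuple of ints.

    Only the leading dotted number is used, so a vendor suffix such as
    "rootserver-20081216a" is ignored. Missing parts count as 0.
    """
    match = re.match(r"\d+(?:\.\d+){0,3}", release)
    if match is None:
        raise ValueError("no version number in %r" % release)
    parts = [int(p) for p in match.group(0).split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def in_reported_range(release):
    """True if the release falls in the range this thread reports as bad."""
    return FIRST_BAD <= parse_release(release) < FIRST_FIXED
```

With the kernel string quoted above, in_reported_range("2.6.27.9rootserver-20081216a") is true, which matches the answer given. A true result only means the version lies in the range people reported here, not that the machine is certain to show the hang.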
On Sun, 2009-02-15 at 13:42 -0700, Gene Steinberg wrote:
On Feb 15, 2009, at 1:16 PM, Timo Sirainen wrote:
2.6.27.9 is buggy. I'm pretty sure upgrading the kernel will fix your problem.
The yum upgrade function doesn't produce any updates.
Your kernel version looks like it's self-compiled instead of from CentOS, so yum won't help you.
It was likely compiled by the host/DC then, so it would not be a good
idea to change it.
Peace, Gene
On Feb 15, 2009, at 2:19 PM, Timo Sirainen tss@iki.fi wrote:
On Sun, 2009-02-15 at 13:42 -0700, Gene Steinberg wrote:
On Feb 15, 2009, at 1:16 PM, Timo Sirainen wrote:
2.6.27.9 is buggy. I'm pretty sure upgrading the kernel will fix your problem.
The yum upgrade function doesn't produce any updates.
Your kernel version looks like it's self-compiled instead of from CentOS, so yum won't help you.
Timo Sirainen wrote:
On Sun, 2009-02-15 at 12:10 -0700, Gene Steinberg wrote:
This is sounding similar to the problem I have with my setup:
High CPU usage.
Can't kill IMAP.
kill -9 doesn't work for imap processes? You didn't mention this before.
I'm using CentOS 5.2 64-bit version with the latest cPanel.
So what am I missing, other than the problem nobody else is having is
clearly something they ARE having?
If you can't kill -9 a process, it means the kernel is buggy. At least 2.6.27 was buggy and it was fixed in 2.6.27.10.
Or it's I/O locked on the filesystem, i.e. the NFS server went away or something else. What state are the un-killable processes in? (Z, D, S, etc.)
~Seth
On Feb 15, 2009, at 12:53 PM, Seth Mattinen wrote:
Or it's I/O locked on the filesystem i.e. NFS server went away or
something else. What state are the un-killable processes in? (Z, D,
S, etc.)
~Seth
I'd have to switch back to Dovecot in order to test for this, and
arrange to have someone with far more expertise than I possess to
continue monitoring the server to catch this when it happens. cPanel
support can't be expected to devote that much attention to preventive
medicine. My admin could do it, I suppose, though I'm only one of his
smaller clients, so I wouldn't expect it either.
As I said, I'm inclined to want to try this again for testing, if
someone would work with me on the initial setup, switching to a later
version of Dovecot than the one that cPanel operates with (if it'll
still integrate with cPanel -- is that possible?).
Peace, Gene
participants (10)
- agent59624285
- Arno Wald
- Cor Bosman
- David Rosenstrauch
- Edgar Fuß
- Gene Steinberg
- Matthias Rieber
- nuitari-dovecot@nuitari.net
- Seth Mattinen
- Timo Sirainen