Nate Sanders wrote:
Greetings all,
I'm trying to get an understanding of a problem we are facing here. We're currently running dovecot 1.0-beta3 and have a long-standing issue of system crashes on our mail server (Debian Linux 2.4.27-2-k7-smp).
Here's what is happening:
The machine hangs and the system load climbs as high as 80.0+. Yet the system's responsiveness is not affected: the command line still responds instantly. There are multiple running dovecot PIDs, even after I stop the service. If I try to kill them, even with kill -9, they will not die. The machine is DOA and must be forcibly restarted; issuing a reboot causes it to hang when it attempts to unmount the network shares.
It sounds like NFS is dying one way or another -- likely due to a bug on either the client side (you could try compiling a newer 2.4 or 2.6 kernel) or the server side (I know jack about NFS on IRIX). If you look at the tasks in ps or top, the 'state' column is probably 'D', indicating an uninterruptible sleep (which usually means the process is hung waiting for an I/O request to complete). That would also explain a load of 80+ on an otherwise responsive machine: Linux counts tasks in uninterruptible sleep toward the load average even though they use no CPU.
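If you'd rather not eyeball ps output during a hang, a small Python sketch like the following can list the stuck tasks. It reads /proc/<pid>/stat for every task and reports those in state 'D'. It assumes the Linux /proc layout, where the state letter follows the parenthesized command name. The helper names parse_stat and find_d_state are hypothetical. The sketch does not filter by name, because Dovecot's per-user processes may show up as imap or pop3 rather than dovecot.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' rather than on whitespace.
    """
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]  # first field after the name
    return pid, comm, state


def find_d_state(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for tasks in uninterruptible sleep ('D')."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The task exited between listdir() and open(), or the line was malformed.
            continue
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)
```

On the mail server, running `print(find_d_state())` while the load is climbing should show whether the dovecot-related processes are the ones sitting in 'D'.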
Are there any messages in the kernel log indicating NFS timeouts? Specifying 'intr' in the nfs mount options might enable you to actually kill the running dovecot processes, unmount, and remount, but that won't solve your real problem.
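To check whether the NFS mounts already carry 'intr', a sketch like this one scans the text of /proc/mounts for NFS entries whose option list lacks it. Each line of that file holds the device, mount point, filesystem type, options, and two numeric fields. The function name is hypothetical. The sketch also assumes the kernel reports intr among the listed options, so it is worth cross-checking against the output of mount. The option itself goes in the options field of /etc/fstab or after mount -o.

```python
def nfs_mounts_without_intr(mounts_text):
    """Return mount points of NFS filesystems whose options lack 'intr'.

    mounts_text is the contents of /proc/mounts: one mount per line, with
    whitespace-separated fields device, mountpoint, fstype, options, ...
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        _device, mountpoint, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue
        # Compare whole options so that 'nointr' is not mistaken for 'intr'.
        if "intr" not in options.split(","):
            missing.append(mountpoint)
    return missing
```

For example, `nfs_mounts_without_intr(open("/proc/mounts").read())` returns the mount points that are still mounted without intr.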
-- Ben Winslow rain@bluecherry.net