Ben Winslow wrote:
It sounds like NFS is dying one way or another -- likely due to a bug on either the client side (you could try compiling a newer 2.4 or 2.6 kernel) or the server side (I know jack about NFS on IRIX). If you look at the tasks in ps or top, the 'state' column is probably 'D', indicating an uninterruptible sleep (which usually means the process is hung waiting for an I/O request to complete).
Are there any messages in the kernel log indicating NFS timeouts? Specifying 'intr' in the nfs mount options might enable you to actually kill the running dovecot processes, unmount, and remount, but that won't solve your real problem.
Yeah, the PIDs end up stuck in that state. Right now we're working to migrate all users from these two IRIX machines to a 2.4 Linux NAS. From there we will do additional testing before we try anything else. I'm sure a lot of the issue comes from the interaction between Linux NFS and IRIX NFS.
We end up with quite a few NFS and lock messages in the logs. I'm sure the setup is not ideal given the current maturity of NFS support in dovecot, which is why I was trying to get a little additional info.
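For anyone wanting to confirm the 'D' state check described above without eyeballing top, here is a minimal sketch in Python. It assumes a Linux client with procps, where `ps -eo pid,stat,comm` prints a header line followed by one process per line; the function names `find_uninterruptible` and `list_hung` are made up for this illustration and are not part of dovecot or any other tool.

```python
import subprocess


def find_uninterruptible(ps_output):
    """Return (pid, command) pairs whose STAT field starts with 'D'.

    Expects text in the layout of `ps -eo pid,stat,comm`:
    one header line, then PID, STAT, and COMMAND columns.
    """
    hung = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # ignore blank or malformed lines
        pid, stat, comm = fields
        # 'D' is uninterruptible sleep; extra flags such as 'D+' may follow.
        if stat.startswith("D"):
            hung.append((int(pid), comm.strip()))
    return hung


def list_hung():
    """Run ps (assumed to be procps on Linux) and return the hung processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,stat,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_uninterruptible(result.stdout)
```

Calling `list_hung()` on the mail server while the problem is occurring should list the stuck imap or pop3 processes, if uninterruptible sleep really is what is happening. It only reports the symptom; the kernel log is still the place to look for the NFS timeouts themselves.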
Timo Sirainen wrote:
So if you're using mboxes, what about mbox_read_locks and mbox_write_locks? It might help to change them to dotlocks as well.
I will look into these as well, thanks.
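For reference, the change suggested above would look roughly like the following in dovecot.conf. This is only a sketch: it assumes mbox storage and that every other program writing to the mailboxes (the delivery agent in particular) also uses dotlocks, since mixed locking methods do not protect against each other.

```
# Assumed fragment: use dotlocks for both reading and writing mboxes,
# instead of relying on fcntl locking over NFS.
mbox_read_locks = dotlock
mbox_write_locks = dotlock
```

Dotlocks on reads serialize access to each mailbox, so this trades some concurrency for not depending on the NFS lock daemon.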