[Dovecot] Dovecot and NFS with file locking
Greetings all,
I'm trying to get an understanding of a problem we are facing here. We're currently running dovecot 1.0-beta3 and have a long-standing issue of system crashes on our mail server (Debian Linux 2.4.27-2-k7-smp).
Here's what is happening:
The machine hangs and the system load climbs as high as 80.0+. Yet the system response is not affected: the command line still responds instantly. There are multiple dovecot processes running, even if I stop the service. If I try to kill the PIDs, even with kill -9, they will not die. The machine is DOA and must be forcefully restarted. Issuing a reboot will cause the machine to hang when it attempts to unmount network shares.
Here's the setup:
- Dovecot 1.0-beta3
- lock_method = dotlock
- mmap_disable = yes
/var/mail is stored locally on the mail server and accessed via NFS by ALL remote machines. All remote machines have /var/mail symlinked to the NFS share on Mail.
/home on Mail is NFS-mounted from another set of servers, where the IMAP mail folders reside in mbox format. All client machines have /home symlinked to the second NFS server.
In other words, there are a lot of NFS shares, and one mail transaction can involve 3 machines.
What I'm trying to find out is the current state of NFS locking with Dovecot. This system hang happens 1-3 times a week. The current /home NFS mounts are served from SGI machines running IRIX 6.5. Clients are all Linux (Debian) 2.4 or Linux (Ubuntu) 2.6.
Is our setup too much for Dovecot to handle? Are there other variables we're not looking at here?
Thanks everyone.
--
Nate Sanders nate@ima.umn.edu Associate Systems Manager (612) 624 - 4353 http://www.ima.umn.edu/
Institute for Mathematics and its Applications University of Minnesota 400 Lind Hall, 207 Church St. SE Minneapolis, MN 55455-0463
Nate Sanders wrote:
> Greetings all,
> I'm trying to get an understanding of a problem we are facing here. We're currently running dovecot 1.0-beta3 and have a long-standing issue of system crashes on our mail server (Debian Linux 2.4.27-2-k7-smp).
> Here's what is happening:
> The machine hangs and the system load climbs as high as 80.0+. Yet the system response is not affected: the command line still responds instantly. There are multiple dovecot processes running, even if I stop the service. If I try to kill the PIDs, even with kill -9, they will not die. The machine is DOA and must be forcefully restarted. Issuing a reboot will cause the machine to hang when it attempts to unmount network shares.
It sounds like NFS is dying one way or another -- likely due to a bug on either the client side (you could try compiling a newer 2.4 or 2.6 kernel) or the server side (I know jack about NFS on IRIX.) If you look at the tasks in ps or top, the 'state' column is probably 'D' indicating an uninterruptible sleep (which usually means the process is hung waiting for an IO request to complete.)
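One way to check for processes stuck in the 'D' state is to read the state field from /proc directly. The following is only a minimal sketch, assuming a Linux /proc filesystem; the function names are hypothetical and are not part of Dovecot or any standard tool.

```python
import os


def parse_stat_line(line):
    """Return (pid, command, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' rather than on spaces.
    """
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, command) for every process currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except OSError:
            # The process exited between listdir() and open().
            continue
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling `uninterruptible_processes()` on the mail server during a hang should list the stuck dovecot processes if they are indeed blocked on I/O.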
Are there any messages in the kernel log indicating NFS timeouts? Specifying 'intr' in the nfs mount options might enable you to actually kill the running dovecot processes, unmount, and remount, but that won't solve your real problem.
-- Ben Winslow rain@bluecherry.net
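To look for the NFS timeout messages Ben asks about, the kernel log can be filtered for "not responding" lines. This is a rough sketch, assuming the usual Linux message shape "nfs: server HOST not responding, still trying" (lockd logs a similar line); the function name and the sample host name are made up for illustration.

```python
import re

# Assumed message shape: "nfs: server HOST not responding, still trying".
# Adjust the pattern if your kernel words it differently.
NOT_RESPONDING = re.compile(r"\b(nfs|lockd): server (\S+) not responding")


def unresponsive_servers(log_lines):
    """Count 'not responding' messages per (subsystem, server) pair."""
    counts = {}
    for line in log_lines:
        match = NOT_RESPONDING.search(line)
        if match:
            key = (match.group(1), match.group(2))
            counts[key] = counts.get(key, 0) + 1
    return counts
```

It accepts any iterable of lines, for example an open log file. The location of the kernel log varies by distribution, so the path is left to the caller.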
Ben Winslow wrote:
> It sounds like NFS is dying one way or another -- likely due to a bug on either the client side (you could try compiling a newer 2.4 or 2.6 kernel) or the server side (I know jack about NFS on IRIX.) If you look at the tasks in ps or top, the 'state' column is probably 'D' indicating an uninterruptible sleep (which usually means the process is hung waiting for an IO request to complete.)
> Are there any messages in the kernel log indicating NFS timeouts? Specifying 'intr' in the nfs mount options might enable you to actually kill the running dovecot processes, unmount, and remount, but that won't solve your real problem.
Yeah the state ends up hung on the PIDs. Right now we're working to migrate all users from these two IRIX machines to a 2.4 Linux NAS. From there we will do additional testing before we try anything else. I'm sure a lot of the issue is between Linux NFS and IRIX NFS.
We end up with quite a few NFS and lock messages in the logs. I'm sure the setup is not ideal for the current maturity of NFS usage in dovecot, but that's why I was trying to get a little additional info.
Timo Sirainen wrote:
> So if you're using mboxes, what about mbox_read_locks and mbox_write_locks? Maybe it helps if you change them to be dotlocks also.
I will look into these as well, thanks.
--
Nate Sanders
Timo Sirainen wrote:
> On Mon, 2006-05-01 at 15:56 -0500, Nate Sanders wrote:
> > Here's the setup:
> > - Dovecot 1.0-beta3
> > - lock_method = dotlock
> > - mmap_disable = yes
> So if you're using mboxes, what about mbox_read_locks and mbox_write_locks? Maybe it helps if you change them to be dotlocks also.
Here is some more info on lock methods used on the system.
mail:# postconf -d | grep mailbox_delivery_lock
mailbox_delivery_lock = fcntl, dotlock
mail:# grep mbox_read_locks /etc/dovecot/dovecot.conf
mbox_read_locks = fcntl
mail:# grep mbox_write_locks /etc/dovecot/dovecot.conf
mbox_write_locks = fcntl dotlock
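The three settings above can be compared mechanically to see which lock methods every program touching the mboxes has in common. A minimal sketch follows; the helper names are hypothetical, and the setting lines are copied from the output above. With these values only fcntl is shared by all three, and fcntl locking over NFSv2/v3 depends on the NFS lock daemon working between client and server.

```python
def parse_lock_methods(setting_line):
    """Turn 'name = fcntl, dotlock' or 'name = fcntl dotlock' into a list."""
    _, _, value = setting_line.partition("=")
    return value.replace(",", " ").split()


def common_lock_methods(*setting_lines):
    """Return the lock methods that every given setting has in common."""
    method_sets = [set(parse_lock_methods(line)) for line in setting_lines]
    return set.intersection(*method_sets)


settings = [
    "mailbox_delivery_lock = fcntl, dotlock",  # Postfix (postconf -d output)
    "mbox_read_locks = fcntl",                 # Dovecot
    "mbox_write_locks = fcntl dotlock",        # Dovecot
]
print(sorted(common_lock_methods(*settings)))  # prints ['fcntl']
```

If the two Dovecot settings were changed to dotlock, as Timo suggests, the same check would report dotlock as the only shared method.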
--
Nate Sanders