[Dovecot] Dovecot and stale nfs-locks hanging processes
Søren Schrøder
sch at cybercity.dk
Fri Apr 4 13:37:04 EEST 2008
Greetings dovecot mailing list.
I have implemented a relatively big dovecot setup (250k users) and
overall I am very pleased with dovecot functionality and performance.
Setup description:
* dovecot 1.0.x
* FreeBSD 6.3
* Postfix (using dovecot deliver as LDA).
* OpenLDAP backend
* Storage is NFS (Clariion EMC NFSd for Maildir, and FreeBSD NFSd for
Indexes).
* Locking is fcntl via rpc.lockd.
* Users are accessing mail using POP3 and IMAP (IMAP mainly via
SquirrelMail, but also direct)
* 3 frontends for POP/SMTP and 2 frontends for IMAP (webmail), balanced
with round-robin DNS
My problem:
I am having issues where POP3, IMAP and DELIVER processes get stuck,
apparently waiting on a device.
fstat shows:
bash# fstat -p 93522
USER    CMD   PID    FD    MOUNT         INUM MODE        SZ|DV R/W
302870  pop3  93522  root  /                2 drwxr-xr-x    512 r
302870  pop3  93522  wd    /home/mnt5   51592 drwxr-xr-x     80 r
302870  pop3  93522  text  /usr        121619 -r-xr-xr-x 436616 r
302870  pop3  93522  0*    internet stream tcp
302870  pop3  93522  1*    internet stream tcp
302870  pop3  93522  2*    pipe c778aa48 <-> c778a990         0 rw
302870  pop3  93522  3     /dev            24 crw-rw-rw- random r
302870  pop3  93522  5*    pipe ce440b28 <-> ce440be0         0 rw
302870  pop3  93522  6*    pipe ce440be0 <-> ce440b28         0 rw
302870  pop3  93522  7     /home/mnt5 9010290 -rw-------   1493 rw
302870  pop3  93522  8     -                - bad             -
302870  pop3  93522  9     -                - bad             -
302870  pop3  93522  10    -                - bad
The inode in question on /home/mnt5 belongs to a dot-nfs file, which
seems to indicate a stale lock:
bash# ls -li | grep 9010290
9010290 -rw------- 1 302870 42 1493 Apr 3 18:05 .nfs.0668c236.6d524.4
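The same lookup can be scripted instead of grepping ls output. The
following is only a minimal sketch: it assumes the NFS client names such
files with a ".nfs" prefix, as in the listing above, and the function name
find_dot_nfs is made up for illustration.

```python
import os


def find_dot_nfs(directory, inode=None):
    """Return sorted (inode, name) pairs for .nfs* files directly in directory.

    If inode is given, only entries with that inode number are returned.
    """
    matches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only look at files using the ".nfs" naming seen in the listing.
            if not entry.name.startswith(".nfs"):
                continue
            ino = entry.stat(follow_symlinks=False).st_ino
            if inode is None or ino == inode:
                matches.append((ino, entry.name))
    return sorted(matches)
```

Run from the directory where the ls above was issued,
find_dot_nfs(".", inode=9010290) would be the equivalent call.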
ktrace on the pid shows absolutely no activity.
The pop3 process is unkillable, and I end up with pop3 processes from
the user piling up, as well as deliver processes for that user. Not
healthy. I was under the impression that POP3 would exit when a lock is
set, preventing more than one pop3 process per user, but that doesn't
seem to be the case.
Stopping dovecot entirely leaves these stale pop3/imap/deliver
processes hanging, even with shutdown_clients = yes.
The Windows-style solution (a reboot) seems to be the only way to get
rid of the locked processes.
So: has anyone else observed this behavior, and possibly found the
magic cure?
I wonder if there is a way to implement a "max wall-clock time" per
dovecot process type (e.g. terminate the process after 120 sec. for
deliver, 600 sec. for pop3, etc.), as a crude "garbage collector".
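A crude external watchdog along those lines could look like the sketch
below. It is not a Dovecot feature. It assumes that "ps -axo
pid,etime,comm" is available and reports elapsed time as
[[dd-]hh:]mm:ss, and that the processes show up under the command names
"deliver" and "pop3". The names LIMITS, parse_etime, find_overdue and
reap are made up for illustration. Note also that SIGTERM will most
likely not remove a process that is already blocked in an
uninterruptible NFS wait, so this can only catch processes that still
respond to signals.

```python
import os
import signal
import subprocess

# Hypothetical limits in seconds, using the example figures from the text.
LIMITS = {"deliver": 120, "pop3": 600}


def parse_etime(etime):
    """Convert a ps elapsed time of the form [[dd-]hh:]mm:ss to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_overdue(ps_output, limits=LIMITS):
    """Return (pid, command, age_seconds) for processes older than their limit."""
    overdue = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip the header and anything unexpected
        pid, etime, command = int(fields[0]), fields[1], fields[2]
        limit = limits.get(command)
        if limit is None:
            continue  # no limit configured for this command
        age = parse_etime(etime)
        if age > limit:
            overdue.append((pid, command, age))
    return overdue


def reap(limits=LIMITS):
    """Send SIGTERM to every process that exceeded its wall-clock limit."""
    ps = subprocess.run(["ps", "-axo", "pid,etime,comm"],
                        capture_output=True, text=True, check=True)
    for pid, command, age in find_overdue(ps.stdout, limits):
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the process exited in the meantime
```

Calling reap() from cron every minute or so would give the "garbage
collector" behavior described above, with the limits kept per process
type.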
Any hints/suggestions are welcome.
--
Søren Schrøder