[Dovecot] Dovecot and stale nfs-locks hanging processes

Søren Schrøder sch at cybercity.dk
Fri Apr 4 13:37:04 EEST 2008


Greetings dovecot mailing list.

I have implemented a relatively big dovecot setup (250k users) and
overall I am very pleased with dovecot functionality and performance. 

Setup description:

* dovecot 1.0.x 
* FreeBSD 6.3 
* Postfix (using dovecot deliver as LDA). 
* OpenLDAP backend
* Storage is NFS (Clariion EMC NFSd for Maildir, and FreeBSD NFSd for
Indexes). 
* Locking is fcntl using rpc.lockd.
* Users are accessing mail using POP3 and IMAP (IMAP mainly via
Squirrelmail, but also direct)
* 3 frontends for POP/SMTP and 2 frontends for IMAP (webmail). Round
Robin DNS

My problem:

I am having issues where POP3, IMAP and deliver processes get stuck,
apparently waiting on a device.

fstat shows:

bash# fstat -p  93522
USER     CMD          PID   FD MOUNT      INUM MODE         SZ|DV R/W
302870   pop3       93522 root /             2 drwxr-xr-x     512  r
302870   pop3       93522   wd /home/mnt5  51592 drwxr-xr-x      80  r
302870   pop3       93522 text /usr     121619 -r-xr-xr-x  436616  r
302870   pop3       93522    0* internet stream tcp
302870   pop3       93522    1* internet stream tcp
302870   pop3       93522    2* pipe c778aa48 <-> c778a990      0 rw
302870   pop3       93522    3 /dev         24 crw-rw-rw-  random  r
302870   pop3       93522    5* pipe ce440b28 <-> ce440be0      0 rw
302870   pop3       93522    6* pipe ce440be0 <-> ce440b28      0 rw
302870   pop3       93522    7 /home/mnt5 9010290 -rw-------    1493 rw
302870   pop3       93522    8 -         -         bad    -
302870   pop3       93522    9 -         -         bad    -
302870   pop3       93522   10 -         -         bad  

The inode in question on /home/mnt5 is a dot-nfs file, which I take to
indicate a stale lock:

bash# ls -li | grep 9010290
 9010290 -rw-------  1 302870  42   1493 Apr  3 18:05 .nfs.0668c236.6d524.4

ktrace on the pid shows absolutely no activity.

The pop3 process is unkillable, and I end up with a growing pile of pop3
processes from the user, as well as deliver processes to the user. Not
healthy. I was under the impression that POP3 would exit when a lock is
already set, preventing more than one pop3 process per user, but that
doesn't seem to be the case.
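
To make the expectation above concrete, here is a minimal Python sketch
of "exit if the lock is already held" using a non-blocking fcntl lock.
It is not Dovecot's implementation; try_lock and the lock path are
hypothetical names used only for illustration. Note also that on NFS the
request goes through the lock daemon, so even a non-blocking request may
not return promptly if that daemon is unresponsive.

```python
import errno
import fcntl
import os


def try_lock(path):
    """Try to take an exclusive fcntl lock on path without waiting.

    Returns the open file descriptor on success (the lock is held until
    the descriptor is closed), or None if another process holds the lock.
    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # LOCK_NB asks for an immediate failure instead of blocking.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        os.close(fd)
        if exc.errno in (errno.EAGAIN, errno.EACCES):
            return None  # someone else holds the lock
        raise  # any other error (e.g. no locks available) is not "busy"
    return fd
```

A session would call try_lock() once at startup and exit immediately if
it returns None, rather than queueing behind the existing session.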

Stopping dovecot entirely leaves these stale pop3/imap/deliver
processes hanging, even with shutdown_clients = yes.

The Windows-style solution (a reboot) seems to be the only way to get
rid of the locked processes.

So: has anyone else observed this behavior, and perhaps found the
magic cure?

I wonder whether there is a way to implement a "max wall-clock time" per
dovecot process type (e.g. terminate the process after 120 sec. for
delivery, 600 sec. for pop3, etc.), as a crude "garbage collector".
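
As a rough illustration of that idea, the Python sketch below reports
processes whose elapsed time exceeds a per-command limit. It is an
external check, not a Dovecot feature. The limits are the 120 and 600
second figures mentioned above; the function names and the exact ps
invocation are my own assumptions. It only reports candidates: whether a
signal would actually end a process that is blocked on NFS is a separate
question.

```python
import subprocess

# Hypothetical per-command wall-clock limits, in seconds.
LIMITS = {"deliver": 120, "pop3": 600}


def etime_to_seconds(etime):
    """Convert a ps elapsed-time string ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:  # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def overdue(ps_output, limits=LIMITS):
    """Return (pid, command, age_seconds) for processes over their limit.

    ps_output is text with three columns: pid, elapsed time, command.
    """
    result = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip the header line and anything malformed
        pid, etime, comm = fields[0], fields[1], fields[2].strip()
        limit = limits.get(comm)
        if limit is None:
            continue  # not a command we are watching
        age = etime_to_seconds(etime)
        if age > limit:
            result.append((int(pid), comm, age))
    return result


def scan():
    """Run ps and return the overdue processes (assumes a BSD-style ps)."""
    out = subprocess.run(
        ["ps", "-ax", "-o", "pid,etime,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return overdue(out)
```

Calling scan() from cron and logging the result would at least show how
quickly the stuck processes accumulate on each frontend.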

Any hints/suggestions are welcome.

-- 
Søren Schrøder

