[Dovecot] Dovecot and stale nfs-locks hanging processes
Greetings dovecot mailing list.
I have implemented a relatively big dovecot setup (250k users) and overall I am very pleased with dovecot functionality and performance.
Setup description:
- dovecot 1.0.x
- FreeBSD 6.3
- Postfix (using dovecot deliver as LDA).
- OpenLdap backend
- Storage is NFS (Clariion EMC NFSd for Maildir, and FreeBSD NFSd for Indexes).
- Locking is fcntl using rpc.lockd.
- Users are accessing mail using POP3 and IMAP (IMAP mainly via Squirrelmail, but also direct)
- 3 frontends for POP/SMTP and 2 frontends for IMAP (webmail). Round Robin DNS
My problem:
I am having issues where POP3, IMAP and deliver processes get stuck, apparently waiting for a device.
fstat shows:
bash# fstat -p 93522
USER   CMD  PID   FD   MOUNT      INUM    MODE       SZ|DV  R/W
302870 pop3 93522 root /          2       drwxr-xr-x 512    r
302870 pop3 93522 wd   /home/mnt5 51592   drwxr-xr-x 80     r
302870 pop3 93522 text /usr       121619  -r-xr-xr-x 436616 r
302870 pop3 93522 0*   internet stream tcp
302870 pop3 93522 1*   internet stream tcp
302870 pop3 93522 2*   pipe c778aa48 <-> c778a990 0 rw
302870 pop3 93522 3    /dev       24      crw-rw-rw- random r
302870 pop3 93522 5*   pipe ce440b28 <-> ce440be0 0 rw
302870 pop3 93522 6*   pipe ce440be0 <-> ce440b28 0 rw
302870 pop3 93522 7    /home/mnt5 9010290 -rw------- 1493   rw
302870 pop3 93522 8    - - bad -
302870 pop3 93522 9    - - bad -
302870 pop3 93522 10   - - bad
And the inode in question on /home/mnt5 is a dot-nfs file, indicating a stale lock:
bash# ls -li | grep 9010290
9010290 -rw------- 1 302870 42 1493 Apr 3 18:05 .nfs.0668c236.6d524.4
ktrace on the pid shows absolutely no activity.
The pop3 process is unkillable, and I end up with pop3 processes from the user stacking up, as well as deliver processes for the user. Not healthy. I was under the impression that POP3 would exit when a lock is set, preventing more than one pop3 process per user, but that doesn't seem to be the case.
Stopping dovecot entirely leaves these stale pop3/imap/deliver processes hanging, even with shutdown_clients = yes.
The Windows-style solution (a reboot) seems to be the only way to get rid of the locked processes.
So: has anyone else observed this behavior, and perhaps found the magic cure?
I wonder if there is a way to implement a "max wall-clock time" per dovecot process type (i.e. terminate the process after, for example, 120 sec. for delivery, 600 sec. for pop3, etc.), as a crude "garbage collector".
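The "max wall-clock time" idea could be sketched outside Dovecot as a small external watchdog. The following is only an illustration, not a Dovecot feature: the function names, the `LIMITS` table, and the assumption that the script is fed the output of `ps -axo pid,etime,comm` (without the header line) are all hypothetical.

```python
import os
import signal

# Hypothetical per-command wall-clock limits in seconds, using the example
# values from the message (120 s for delivery, 600 s for pop3).
LIMITS = {"deliver": 120, "pop3": 600}


def parse_etime(etime):
    """Convert a ps(1) elapsed-time string ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours with zero
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def overdue(ps_lines, limits=LIMITS):
    """Return [(pid, command, age_seconds)] for processes past their limit.

    ps_lines is assumed to hold lines of `ps -axo pid,etime,comm` output
    with the header line removed.
    """
    result = []
    for line in ps_lines:
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip malformed lines
        pid, etime, comm = fields
        comm = comm.strip()
        age = parse_etime(etime)
        limit = limits.get(comm)
        if limit is not None and age > limit:
            result.append((int(pid), comm, age))
    return result


def reap(victims):
    """Send SIGTERM to each overdue process; ignore ones already gone."""
    for pid, _comm, _age in victims:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
```

A cron job could call `reap(overdue(lines))` every minute. Note that signals are only delivered once a process leaves the kernel, so a watchdog like this would not remove a process that is blocked in an uninterruptible wait.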
Any hints/suggestions are welcome.
-- Søren Schrøder
dovecot-bounces+sch=cybercity.dk@dovecot.org wrote:
On 4/4/2008, Søren Schrøder (sch@cybercity.dk) wrote:
- dovecot 1.0.x
Is exact version a secret? ;)
Absolutely not: 1.0.10
I was just stating that I didn't go down the 1.1 track.
Also, output of dovecot -n is usually helpful...
Here goes:
# 1.0.10: /usr/local/etc/dovecot.conf
base_dir: /var/run/dovecot/
log_path: /var/log/dovecot/dovecot.log
info_log_path: /var/log/dovecot/dovecot.info.log
protocols: imap pop3
ssl_disable: yes
disable_plaintext_auth: no
login_dir: /var/run/dovecot/login
login_executable(default): /usr/local/libexec/dovecot/imap-login
login_executable(imap): /usr/local/libexec/dovecot/imap-login
login_executable(pop3): /usr/local/libexec/dovecot/pop3-login
max_mail_processes: 200
verbose_proctitle: yes
first_valid_gid: 0
mail_extra_groups: mail
mail_location: maildir:~/Maildir:INDEX=/home/indexmnt/idx/%h
mmap_disable: yes
fsync_disable: yes
mbox_write_locks: fcntl
mbox_lock_timeout: 100
mbox_dotlock_change_timeout: 60
mail_executable(default): /usr/local/libexec/dovecot/imap
mail_executable(imap): /usr/local/libexec/dovecot/imap
mail_executable(pop3): /usr/local/libexec/dovecot/pop3
mail_plugin_dir(default): /usr/local/lib/dovecot/imap
mail_plugin_dir(imap): /usr/local/lib/dovecot/imap
mail_plugin_dir(pop3): /usr/local/lib/dovecot/pop3
imap_client_workarounds(default): delay-newmail outlook-idle netscape-eoh tb-extra-mailbox-sep
imap_client_workarounds(imap): delay-newmail outlook-idle netscape-eoh tb-extra-mailbox-sep
imap_client_workarounds(pop3): outlook-idle
pop3_lock_session(default): no
pop3_lock_session(imap): no
pop3_lock_session(pop3): yes
pop3_uidl_format(default):
pop3_uidl_format(imap):
pop3_uidl_format(pop3): %08Xu%08Xv
pop3_client_workarounds(default):
pop3_client_workarounds(imap):
pop3_client_workarounds(pop3): outlook-no-nuls oe-ns-eoh
auth default:
  mechanisms: plain digest-md5
  verbose: yes
  debug: yes
  debug_passwords: yes
  passdb:
    driver: ldap
    args: /usr/local/etc/dovecot-ldap.conf
  userdb:
    driver: passwd
  userdb:
    driver: ldap
    args: /usr/local/etc/dovecot-ldap.conf
  socket:
-- Søren Schrøder, Technical Innovation, Cybercity. "Obey gravity - It's the LAW!"
On Fri, 2008-04-04 at 12:37 +0200, Søren Schrøder wrote:
The pop3 process is unkillable, and I end up with pop3 processes from the user stacking up, as well as deliver processes for the user. Not healthy. I was under the impression that POP3 would exit when a lock is set, preventing more than one pop3 process per user, but that doesn't seem to be the case.
If you can't kill a process even with kill -9, the problem is with the kernel and Dovecot can't do much about it.
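One way to confirm that the stuck processes are blocked inside the kernel is to look at their state in ps(1): a state beginning with "D" marks a process in disk (or other short-term, uninterruptible) wait. The sketch below is only an illustration; the function name is hypothetical, and it assumes it is given the output of `ps -axo pid,state,comm` with the header line removed.

```python
def uninterruptible(ps_lines):
    """Return [(pid, state, command)] for processes in uninterruptible wait.

    ps_lines is assumed to hold lines of `ps -axo pid,state,comm` output
    without the header. A state string starting with "D" is reported;
    any further state flags after the first character are kept as-is.
    """
    stuck = []
    for line in ps_lines:
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # skip malformed lines
        pid, state, comm = fields
        if state.startswith("D"):
            stuck.append((int(pid), state, comm.strip()))
    return stuck
```

Processes reported by such a check stay in that state until the kernel operation they are waiting on completes, which matches the observation that kill -9 has no visible effect on them.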
How about trying without fcntl locks:
lock_method = dotlock
Also have you read http://wiki.dovecot.org/NFS?
Timo Sirainen wrote:
If you can't kill a process even with kill -9, the problem is with the
kernel and Dovecot can't do much about it.
Exactly - the process seems to be waiting for a device. I suspect rpc.lockd to be the sinner. With the NFS server being an EMC system, my means of debugging on the server side are limited. That's why I called for input from the list.
How about trying without fcntl locks:
lock_method = dotlock
I tried dotlocking prior to fcntl, but I really could use the performance gain fcntl gives in comparison to dotlocking.
Also have you read http://wiki.dovecot.org/NFS?
I have indeed, and I know that my setup is "Dovecot is run in multiple computers, users are redirected more or less randomly to different computers.", the one to be avoided, so I asked for trouble :) But I like the horizontal scaling of such a setup.
I will revert to dotlocking, then.
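For readers unfamiliar with the term, dotlocking means taking a lock by creating a separate lock file next to the file being protected, so no lock daemon is involved. The following is a minimal sketch of the general technique, not Dovecot's implementation: the function names, the ".lock" suffix, and all timeout values are assumptions chosen for illustration. It relies on O_EXCL, which is not reliable on older NFS versions, where implementations typically fall back to creating a temporary file and using link().

```python
import os
import time


class DotlockTimeout(Exception):
    """Raised when the lock file could not be created in time."""


def acquire_dotlock(path, timeout=5.0, stale_after=60.0, poll=0.1):
    """Create path + ".lock" exclusively, waiting up to `timeout` seconds.

    A lock file older than `stale_after` seconds is treated as abandoned
    and removed. All numeric defaults are illustrative only.
    """
    lock_path = path + ".lock"
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            os.write(fd, str(os.getpid()).encode())
            os.close(fd)
            return lock_path
        except FileExistsError:
            try:
                age = time.time() - os.stat(lock_path).st_mtime
                if age > stale_after:
                    os.unlink(lock_path)  # override an abandoned lock
                    continue
            except FileNotFoundError:
                continue  # holder released it in the meantime; retry now
        if time.monotonic() >= deadline:
            raise DotlockTimeout(lock_path)
        time.sleep(poll)


def release_dotlock(lock_path):
    """Remove the lock file; a missing file is not an error."""
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        pass
```

The polling loop and the extra file operations per lock are one reason dotlocking tends to be slower than fcntl locking, which is the trade-off discussed in this thread.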
-- Søren Schrøder
participants (3)
- Charles Marcus
- Søren Schrøder
- Timo Sirainen