I switched all of our servers to dotlock_use_excl=no last night, but we're still seeing the errors:
Dec 23 08:22:04 gehenna17.rutgers.edu dovecot: imap-login: Login: user=<user1>, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 08:36:21 gehenna11.rutgers.edu dovecot: imap-login: Login: user=<user1>, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 09:17:09 gehenna18.rutgers.edu dovecot: imap-login: Login: user=<user1>, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 10:59:37 gehenna11.rutgers.edu dovecot: IMAP(user1): fdatasync(/rci/nqu/rci/u1/user1/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle Dec 23 10:59:37 gehenna11.rutgers.edu dovecot: IMAP(user1): /rci/nqu/rci/u1/user1/dovecot/.INBOX/dovecot-uidlist: next_uid was lowered (512 -> 511, hdr=511) Dec 23 11:05:43 gehenna11.rutgers.edu dovecot: IMAP(user1): Maildir /rci/u1/user1/Maildir: Expunged message reappeared, giving a new UID (old uid=1, file=1231949492.M325236P5922V03E80009I01B6DCA5_0.gehenna9.rutgers.edu,S=4398:2,S) Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): /rci/nqu/rci/u1/user1/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 4: 1231949492.M325236P5922V03E80009I01B6DCA5_0.gehenna9.rutgers.edu,S=4398:2,S (uid 1 -> 514) Dec 23 11:05:44 gehenna18.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
It must be related to locking though because we can see that the user is logged in on more than one server and the issues occur when the two dovecot instances fight over the uidlist.
The second error I'm seeing much more of though. A few users are spawning these errors on almost every connection. This doesn't appear to be locking related, since the user is logged in on only one server as far as I can tell and there are also no NFS or duplicate file errors:
Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: imap-login: Login: user=<user2>, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: IMAP(user2): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 23 11:18:32 gehenna17.rutgers.edu dovecot: IMAP(user2): Raw backtrace: /usr/libexec/dovecot/imap [0x49ed50] -> /usr/libexec/dovecot/imap [0x49edb3] -> /usr/libexec/dovecot/imap [0x49e416] -> /usr/libexec/dovecot/imap [0x41c7c0] -> /us r/libexec/dovecot/imap [0x41c9c2] -> /usr/libexec/dovecot/imap(cmd_list_full+0x49a) [0x41d60a] -> /usr/libexec/dovecot/imap(cmd_list+0xb) [0x41d8ab] -> /usr/libexec/dovecot/imap [0x4208cc] -> /usr/libexec/dovecot/imap [0x420982] -> /usr/ libexec/dovecot/imap(client_handle_input+0x3f) [0x420abf] -> /usr/libexec/dovecot/imap(client_input+0x5f) [0x42160f] -> /usr/libexec/dovecot/imap(io_loop_handler_run+0xf8) [0x4a6bd8] -> /usr/libexec/dovecot/imap(io_loop_run+0x1d) [0x4a5c ed] -> /usr/libexec/dovecot/imap(main+0x620) [0x428ef0] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x38af41d994] -> /usr/libexec/dovecot/imap [0x419ac9] Dec 23 11:18:33 gehenna17.rutgers.edu dovecot: imap-login: Login: user=<user2>, method=PLAIN, rip=xxx.xxx.xxx.xxx, lip=xxx.xxx.xxx.xxx, TLS Dec 23 11:18:33 gehenna17.rutgers.edu dovecot: IMAP(user2): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 23 11:18:33 gehenna17.rutgers.edu dovecot: IMAP(user2): Raw backtrace: /usr/libexec/dovecot/imap [0x49ed50] -> /usr/libexec/dovecot/imap [0x49edb3] -> /usr/libexec/dovecot/imap [0x49e416] -> /usr/libexec/dovecot/imap [0x41c7c0] -> /us r/libexec/dovecot/imap [0x41c9c2] -> /usr/libexec/dovecot/imap(cmd_list_full+0x49a) [0x41d60a] -> /usr/libexec/dovecot/imap(cmd_list+0xb) [0x41d8ab] -> /usr/libexec/dovecot/imap [0x4208cc] -> /usr/libexec/dovecot/imap [0x420982] -> /usr/ libexec/dovecot/imap(client_handle_input+0x3f) [0x420abf] -> /usr/libexec/dovecot/imap(client_input+0x5f) [0x42160f] -> /usr/libexec/dovecot/imap(io_loop_handler_run+0xf8) [0x4a6bd8] -> /usr/libexec/dovecot/imap(io_loop_run+0x1d) [0x4a5c ed] -> /usr/libexec/dovecot/imap(main+0x620) [0x428ef0] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x38af41d994] -> /usr/libexec/dovecot/imap [0x419ac9]
On 12/22/2009 08:17 PM, Timo Sirainen wrote:
On 22.12.2009, at 16.42, David Halik wrote:
I just double checked and we're seeing the same thing here, although we were never running with "noac" to begin with. Our poor NFS server would melt probably. ;)
This is immediately before the crash:
Dec 22 13:09:20 gehenna14.rutgers.edu dovecot: IMAP(user): fdatasync(/rci/nqu/rci/u5/bruben/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
The interesting this is that this function is called only when dovecot-uidlist.lock has been created, i.e. when uidlist is locked, meaning nothing should have deleted the dovecot-uidlist while it was being written to, so this error just shouldn't happen.. So apparently the dotlocking just isn't working properly for your NFS servers. See if dotlock_use_excl=no helps?