We've started seeing the maildir_uidlist_records_array_delete assert crash as well. It always seems to be preceded by a 'stale NFS file handle' error from a the same user on a different connection.
Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=<apbao>, rip=a.a.a.a, pid=2439: fdatasync(/home11/apbao/Maildir/dovecot-uidlist) failed: Stale NFS file handle Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=<apbao>, rip=a.a.a.a, pid=2439: /home11/apbao/Maildir/dovecot-uidlist: next_uid was lowered (2642 -> 2641, hdr=2641) Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=<apbao>, rip=b.b.b.b, pid=28088: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=<apbao>, rip=b.b.b.b, pid=28088: Raw backtrace: imap [0x4d8986] -> imap [0x4d97b0] -> imap(i_fatal+0) [0x4d8c7a] -> imap [0x44f2cc] -> imap [0x44f814] -> imap [0x4500a2] -> imap(maildir_uidlist_refresh+0x9d) [0x450686] -> imap [0x44bff1] -> imap [0x44c0a8] -> imap [0x44c178] -> imap(maildir_storage_sync_init+0x7c) [0x44c6e6] -> imap(mailbox_sync_init+0x44) [0x489922] -> imap(imap_sync_init+0xab) [0x42e02b] -> imap [0x42f107] -> imap(cmd_sync_delayed+0x1c6) [0x42f663] -> imap(client_handle_input+0x119) [0x4244d4] -> imap(client_input+0xb4) [0x424594] -> imap(io_loop_handler_run+0x17d) [0x4e5020] -> imap(io_loop_run+0x3b) [0x4e4214] -> imap(main+0xa6) [0x4300af] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c4ea1d994] -> imap [0x419aa9] Dec 22 11:17:26 cc-popmap2p dovecot: dovecot: child 28088 (imap) killed with signal 6 (core dumped)
Dec 22 13:16:49 cc-popmap3p dovecot: imap: user=<ndunn>, rip=x.x.x.x, pid=3908: fdatasync(/home2/ndunn/Maildir/dovecot-uidlist) failed: Stale NFS file handle Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=<ndunn>, rip=y.y.y.y, pid=3228: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=<ndunn>, rip=y.y.y.y, pid=3228: Raw backtrace: imap [0x4d8986] -> imap [0x4d97b0] -> imap(i_fatal+0) [0x4d8c7a] -> imap [0x44f2cc] -> imap [0x44f814] -> imap [0x4500a2] -> imap(maildir_uidlist_refresh+0x9d) [0x450686] -> imap [0x44bff1] -> imap [0x44c0a8] -> imap [0x44c178] -> imap(maildir_storage_sync_init+0x7c) [0x44c6e6] -> imap(mailbox_sync_init+0x44) [0x489922] -> imap(imap_sync_init+0xab) [0x42e02b] -> imap [0x42f107] -> imap(cmd_sync_delayed+0x1c6) [0x42f663] -> imap(client_handle_input+0x119) [0x4244d4] -> imap(client_input+0xb4) [0x424594] -> imap(io_loop_handler_run+0x17d) [0x4e5020] -> imap(io_loop_run+0x3b) [0x4e4214] -> imap(main+0xa6) [0x4300af] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e5021d994] -> imap [0x419aa9] Dec 22 13:25:16 cc-popmap3p dovecot: dovecot: child 3228 (imap) killed with signal 6 (core dumped)
I will note that we did not start seeing this crash until we took 'noac' out of our NFS mount options, as discussed on this list late last week. On the other hand, load on our NFS server (as measured in IOPS/sec) has dropped by a factor of 10.
-Brad
-----Original Message----- From: dovecot-bounces+brandond=uoregon.edu@dovecot.org [mailto:dovecot- bounces+brandond=uoregon.edu@dovecot.org] On Behalf Of David Halik Sent: Tuesday, December 22, 2009 7:48 AM To: dovecot@dovecot.org Subject: Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
I'm seeing both of these dumps on multiple users now with 1.2.9, so I went ahead and did backtraces for them both.
maildir_uidlist_records_array_delete panic: http://pastebin.com/f20614d8
ns_get_listed_prefix panic: http://pastebin.com/f1420194c
On 12/21/2009 12:43 PM, David Halik wrote:
Just wanted to update you that I just upgraded all of our servers to 1.2.9 and I'm still seeing the array_delete panic:
Dec 21 12:10:16 gehenna11.rutgers.edu dovecot: IMAP(user1): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL) Dec 21 12:15:12 gehenna19.rutgers.edu dovecot: IMAP(user2): Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
I also started receiving a good deal of these, but only from one user so far:
Dec 21 12:16:42 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES) Dec 21 12:19:57 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic: file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed: (match == IMAP_MATCH_YES)
Let me know if you need full backtraces from the core dump.
On 12/17/2009 02:06 PM, David Halik wrote:
On 12/17/2009 01:07 PM, Timo Sirainen wrote:
On Thu, 2009-12-17 at 12:49 -0500, David Halik wrote:
I applied those patches to my 1.2.8 installation before 1.2.9 was released and that immediately fixed the expunge crash, but the array_delete bug is still there. Do you also see the "duplicate file entry" before the crash?
Yes, the duplicate file entry is always reported immediately before the crash, just as Ralf reported too. You can see it in this example pastebin I took from one of our users:
maildir:~/Maildir:INDEX=/rci/nqu%h/dovecot:CONTROL=/rci/nqu%h/dovecot Are the index/control files on NFS? Are there multiple different servers accessing mail data?
Correct. All index, control files, amd mail storage are located on NFS and there are multiple load balanced servers accessing the NFS data. We're currently running with:
mmap_disable = yes dotlock_use_excl = yes fsync_disable = no mail_nfs_storage = yes mail_nfs_index = yes
--
David Halik System Administrator OIT-CSS Rutgers University dhalik@jla.rutgers.edu