[Dovecot] dovecot-1.2.8 imap crash (with backtrace)

David Halik dhalik at jla.rutgers.edu
Tue Dec 22 23:42:50 EET 2009


I just double checked and we're seeing the same thing here, although we 
were never running with "noac" to begin with. Our poor NFS server would 
melt probably. ;)

This is immediately before the crash:

Dec 22 13:09:20 gehenna14.rutgers.edu dovecot: IMAP(user): 
fdatasync(/rci/nqu/rci/u5/bruben/dovecot/.INBOX/dovecot-uidlist) failed: 
Stale NFS file handle
Dec 22 13:09:20 gehenna14.rutgers.edu dovecot: IMAP(user): 
/rci/nqu/rci/u5/bruben/dovecot/.INBOX/dovecot-uidlist: next_uid was 
lowered (1507 -> 1506, hdr=1506)
Dec 22 13:12:06 gehenna14.rutgers.edu dovecot: IMAP(user): 
/rci/nqu/rci/u5/bruben/dovecot/.INBOX/dovecot-uidlist: Duplicate file 
entry at line 3: 
1249348137.M369064P26267V04240006I01ED7E09_0.gehenna5.rutgers.edu,S=15857:2,S 
(uid 1208 -> 1508)


On 12/22/2009 04:33 PM, Brandon Davidson wrote:
> We've started seeing the maildir_uidlist_records_array_delete assert crash as well. It always seems to be preceded by a 'stale NFS file handle' error from a the same user on a different connection.
>
> Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=<apbao>, rip=a.a.a.a, pid=2439: fdatasync(/home11/apbao/Maildir/dovecot-uidlist) failed: Stale NFS file handle
> Dec 22 10:12:20 oh-popmap5p dovecot: imap: user=<apbao>, rip=a.a.a.a, pid=2439: /home11/apbao/Maildir/dovecot-uidlist: next_uid was lowered (2642 ->  2641, hdr=2641)
> Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=<apbao>, rip=b.b.b.b, pid=28088: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
> Dec 22 11:17:26 cc-popmap2p dovecot: imap: user=<apbao>, rip=b.b.b.b, pid=28088: Raw backtrace: imap [0x4d8986] ->  imap [0x4d97b0] ->  imap(i_fatal+0) [0x4d8c7a] ->  imap [0x44f2cc] ->  imap [0x44f814] ->  imap [0x4500a2] ->  imap(maildir_uidlist_refresh+0x9d) [0x450686] ->  imap [0x44bff1] ->  imap [0x44c0a8] ->  imap [0x44c178] ->  imap(maildir_storage_sync_init+0x7c) [0x44c6e6] ->  imap(mailbox_sync_init+0x44) [0x489922] ->  imap(imap_sync_init+0xab) [0x42e02b] ->  imap [0x42f107] ->  imap(cmd_sync_delayed+0x1c6) [0x42f663] ->  imap(client_handle_input+0x119) [0x4244d4] ->  imap(client_input+0xb4) [0x424594] ->  imap(io_loop_handler_run+0x17d) [0x4e5020] ->  imap(io_loop_run+0x3b) [0x4e4214] ->  imap(main+0xa6) [0x4300af] ->  /lib64/libc.so.6(__libc_start_main+0xf4) [0x3c4ea1d994] ->  imap [0x419aa9]
> Dec 22 11:17:26 cc-popmap2p dovecot: dovecot: child 28088 (imap) killed with signal 6 (core dumped)
>
> Dec 22 13:16:49 cc-popmap3p dovecot: imap: user=<ndunn>, rip=x.x.x.x, pid=3908: fdatasync(/home2/ndunn/Maildir/dovecot-uidlist) failed: Stale NFS file handle
> Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=<ndunn>, rip=y.y.y.y, pid=3228: Panic: file maildir-uidlist.c: line 403 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
> Dec 22 13:25:16 cc-popmap3p dovecot: imap: user=<ndunn>, rip=y.y.y.y, pid=3228: Raw backtrace: imap [0x4d8986] ->  imap [0x4d97b0] ->  imap(i_fatal+0) [0x4d8c7a] ->  imap [0x44f2cc] ->  imap [0x44f814] ->  imap [0x4500a2] ->  imap(maildir_uidlist_refresh+0x9d) [0x450686] ->  imap [0x44bff1] ->  imap [0x44c0a8] ->  imap [0x44c178] ->  imap(maildir_storage_sync_init+0x7c) [0x44c6e6] ->  imap(mailbox_sync_init+0x44) [0x489922] ->  imap(imap_sync_init+0xab) [0x42e02b] ->  imap [0x42f107] ->  imap(cmd_sync_delayed+0x1c6) [0x42f663] ->  imap(client_handle_input+0x119) [0x4244d4] ->  imap(client_input+0xb4) [0x424594] ->  imap(io_loop_handler_run+0x17d) [0x4e5020] ->  imap(io_loop_run+0x3b) [0x4e4214] ->  imap(main+0xa6) [0x4300af] ->  /lib64/libc.so.6(__libc_start_main+0xf4) [0x3e5021d994] ->  imap [0x419aa9]
> Dec 22 13:25:16 cc-popmap3p dovecot: dovecot: child 3228 (imap) killed with signal 6 (core dumped)
>
> I will note that we did not start seeing this crash until we took 'noac' out of our NFS mount options, as discussed on this list late last week. On the other hand, load on our NFS server (as measured in IOPS/sec) has dropped by a factor of 10.
>
> -Brad
>
>    
>> -----Original Message-----
>> From: dovecot-bounces+brandond=uoregon.edu at dovecot.org [mailto:dovecot-
>> bounces+brandond=uoregon.edu at dovecot.org] On Behalf Of David Halik
>> Sent: Tuesday, December 22, 2009 7:48 AM
>> To: dovecot at dovecot.org
>> Subject: Re: [Dovecot] dovecot-1.2.8 imap crash (with backtrace)
>>
>>
>> I'm seeing both of these dumps on multiple users now with 1.2.9, so I
>> went ahead and did backtraces for them both.
>>
>> maildir_uidlist_records_array_delete panic: http://pastebin.com/f20614d8
>>
>> ns_get_listed_prefix panic: http://pastebin.com/f1420194c
>>
>>
>> On 12/21/2009 12:43 PM, David Halik wrote:
>>      
>>> Just wanted to update you that I just upgraded all of our servers to
>>> 1.2.9 and I'm still seeing the array_delete panic:
>>>
>>> Dec 21 12:10:16 gehenna11.rutgers.edu dovecot: IMAP(user1): Panic:
>>> file maildir-uidlist.c: line 403
>>> (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
>>> Dec 21 12:15:12 gehenna19.rutgers.edu dovecot: IMAP(user2): Panic:
>>> file maildir-uidlist.c: line 403
>>> (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
>>>
>>> I also started receiving a good deal of these, but only from one user
>>> so far:
>>>
>>> Dec 21 12:16:42 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic:
>>> file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed:
>>> (match == IMAP_MATCH_YES)
>>> Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic:
>>> file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed:
>>> (match == IMAP_MATCH_YES)
>>> Dec 21 12:18:20 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic:
>>> file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed:
>>> (match == IMAP_MATCH_YES)
>>> Dec 21 12:19:57 gehenna14.rutgers.edu dovecot: IMAP(user3): Panic:
>>> file cmd-list.c: line 242 (ns_get_listed_prefix): assertion failed:
>>> (match == IMAP_MATCH_YES)
>>>
>>> Let me know if you need full backtraces from the core dump.
>>>
>>> On 12/17/2009 02:06 PM, David Halik wrote:
>>>        
>>>> On 12/17/2009 01:07 PM, Timo Sirainen wrote:
>>>>          
>>>>> On Thu, 2009-12-17 at 12:49 -0500, David Halik wrote:
>>>>>
>>>>>            
>>>>>> I applied those patches to my 1.2.8 installation before 1.2.9 was
>>>>>> released and that immediately fixed the expunge crash, but the
>>>>>> array_delete bug is still there.
>>>>>>              
>>>>> Do you also see the "duplicate file entry" before the crash?
>>>>>
>>>>>            
>>>> Yes, the duplicate file entry is always reported immediately before
>>>> the crash, just as Ralf reported too. You can see it in this example
>>>> pastebin I took from one of our users:
>>>>
>>>> http://pastebin.com/f29c55de5
>>>>
>>>>
>>>>          
>>>>>> maildir:~/Maildir:INDEX=/rci/nqu%h/dovecot:CONTROL=/rci/nqu%h/dovecot
>>>>>>              
>>>>> Are the index/control files on NFS? Are there multiple different
>>>>> servers
>>>>> accessing mail data?
>>>>>
>>>>>            
>>>> Correct. All index, control files, amd mail storage are located on
>>>> NFS and there are multiple load balanced servers accessing the NFS
>>>> data. We're currently running with:
>>>>
>>>> mmap_disable = yes
>>>> dotlock_use_excl = yes
>>>> fsync_disable = no
>>>> mail_nfs_storage = yes
>>>> mail_nfs_index = yes
>>>>
>>>>          
>>>
>>>        
>>
>>
>> --
>> ================================
>> David Halik
>> System Administrator
>> OIT-CSS Rutgers University
>> dhalik at jla.rutgers.edu
>> ================================
>>      
>    


-- 
================================
David Halik
System Administrator
OIT-CSS Rutgers University
dhalik at jla.rutgers.edu
================================



More information about the dovecot mailing list