[Dovecot] dovecot-1.2.8 imap crash (with backtrace)

David Halik dhalik at jla.rutgers.edu
Wed Dec 30 19:10:47 EET 2009


Ok, I think I've got some more info and a more accurate timeline for 
you. I tried this on two different dumps from two different users. The 
count was 4 in the first example and 0 in the second. I'm guessing 
that's considered "small"? The links to my gdb sessions for both are 
below and include some of the info you were looking for. The corresponding 
logs are also there so you can see how each one failed. I put everything on 
pastebin so it's a little easier to read.

By the way, I also found that the stale NFS file handle message does 
appear first in each instance; it was just farther back in the logs. 
The "Lowering uid" message also appears immediately after every stale NFS 
message, and that in turn sets off everything below some time later 
(sometimes 5 minutes, sometimes 20), when the user performs a new action. 
The "file reappeared" message only occurs some of the time. Here's the 
chain of events I can see in every case so far:

1) fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
2) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: next_uid was lowered (n -> n-1, hdr=n-1)
...a few minutes later...
(may or may not be a "message reappeared" warning at this point)
3) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 3: 1261057547.M378185P17303V03E80002I0197FB4A_0.gehenna9.rutgers.edu,S=7174:2,RS (uid i -> n+1,2,3 )
4) Panic: file maildir-uidlist.c: line 405 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)


One thing to note: after the "Expunged message reappeared, giving a new 
UID" warning, his imap process died quickly, and on more than one server 
simultaneously. The gdb output is from server gehenna11 in that log file. 
The uid in *recs[0] is also the number that you can see being lowered in 
the logs, from 719 to 718.

First user log: http://pastebin.com/m1718f07b
First user gdb: http://pastebin.com/m40088dc8

The second user's process also died on more than one server. The gdb 
output is again from gehenna11.

Second user log: http://pastebin.com/f3a1756f2
Second user gdb: http://pastebin.com/m59aacde4



On 12/29/2009 7:50 PM, Timo Sirainen wrote:
> On 29.12.2009, at 19.09, David Halik wrote:
>
>    
>> I'll definitely get back to you on this. Right now we're closed until after New Year's and I don't want to go updating the dovecot package on all of our servers until we're all back at work. I did do some quick poking around and the count is optimized out, so I'll have the package rebuilt without optimization and let you know what the values are at the beginning of next week. Thanks again.
>>      
> Well, you can probably also get the values in a slightly more difficult way:
>
> instead of "p count":   p uidlist.records.arr.buffer.used / uidlist.records.arr.element_size
>
> instead of "p recs[n]": p *(*uidlist.records.v)[n]
>
>    


