[Dovecot] dovecot-1.2.8 imap crash (with backtrace)

David Halik dhalik at jla.rutgers.edu
Wed Jan 6 17:07:34 EET 2010


Timo, I reinstalled all our servers with -O0 and full debugging on 
"-gdwarf-2 -g3". I should be able to look up anything you need done now.

Here's some more examples with the data you asked for last time, now 
without optimization. I also noticed that there are a few users that 
seem to be regulars with this problem.

#4  0x00000000004d8fd7 in i_panic (format=0x501688 "file %s: line %d 
(%s): assertion failed: (%s)") at failures.c:207
     args = {{gp_offset = 40, fp_offset = 48, overflow_arg_area = 
0x7fffcf52b5a0, reg_save_area = 0x7fffcf52b4e0}}
#5  0x000000000044f34e in maildir_uidlist_records_array_delete 
(uidlist=0x1b612330, rec=0x1b61f1f8) at maildir-uidlist.c:405
     recs = (struct maildir_uidlist_rec * const *) 0x1b607f60
     pos = (struct maildir_uidlist_rec * const *) 0x0
     idx = 0
     count = 2
     __PRETTY_FUNCTION__ = "maildir_uidlist_records_array_delete"
#6  0x000000000044f8fc in maildir_uidlist_next (uidlist=0x1b612330, 
line=0x1b60426a 
"1249562340.M302085P7316V04240006I00DB2CF1_0.gehenna7.rutgers.edu,S=19277:2,S") 
at maildir-uidlist.c:552
     rec = (struct maildir_uidlist_rec *) 0x1b6209f8
     old_rec = (struct maildir_uidlist_rec *) 0x1b61f1f8
     recs = (struct maildir_uidlist_rec * const *) 0x1b604264
     count = 0
     uid = 2807
--------------------
(gdb) fr 5
#5  0x000000000044f34e in maildir_uidlist_records_array_delete 
(uidlist=0x1b612330, rec=0x1b61f1f8) at maildir-uidlist.c:405
405            i_assert(pos != NULL);
(gdb) p *recs
$1 = (struct maildir_uidlist_rec * const) 0x1b61fa60
(gdb) p count
$2 = 2
(gdb) p *recs[0]
$3 = {uid = 2735, flags = 6, filename = 0x1b61fa78 
"1262739815.M913182P4280V04240006I014D0F92_0.gehenna7.rutgers.edu,S=11791:2,", 
extensions = 0x1b620980 "W12036"}
(gdb) p *recs[1]
$4 = {uid = 2736, flags = 0, filename = 0x1b6209a0 
"1262739645.M307617P13458V04240006I014D0F91_0.gehenna10.rutgers.edu,S=5005:2,S", 
extensions = 0x1b6209f0 "W5124"}
(gdb)

Another user for comparison:

(gdb) fr 5
#5  0x000000000044f34e in maildir_uidlist_records_array_delete 
(uidlist=0x6bd7680, rec=0x6bc93d0) at maildir-uidlist.c:405
405            i_assert(pos != NULL);
(gdb) p *recs
$1 = (struct maildir_uidlist_rec * const) 0x6d5c418
(gdb) p count
$2 = 7
(gdb) p *recs[0]
$3 = {uid = 52566, flags = 4, filename = 0x6d5c438 
"1262781360.M836147P9323V04240007I0148DC53_0.gehenna5.rutgers.edu,S=2157:2,a", 
extensions = 0x6d5c430 "W2201"}
(gdb) p *recs[1]
$4 = {uid = 52568, flags = 0, filename = 0x6cd4590 
"1262628888.M94168P6001V04240006I003C41C2_0.gehenna10.rutgers.edu,S=22946:2,Sa", 
extensions = 0x6cd4588 "W23481"}
(gdb) p *recs[2]
$5 = {uid = 52569, flags = 0, filename = 0x6bc9328 
"1041233872.000006.mbox:2,S", extensions = 0x6bc9318 "W6334"}
(gdb) p *recs[3]
$6 = {uid = 52570, flags = 0, filename = 0x6bc9370 
"1041233872.000013.mbox:2,S", extensions = 0x6bc9360 "W1193"}
(gdb) p *recs[4]
$7 = {uid = 52571, flags = 0, filename = 0x6bc92e0 
"1041233872.000014.mbox:2,RS", extensions = 0x6bc92d0 "W3095"}
(gdb) p *recs[5]
$8 = {uid = 52572, flags = 4, filename = 0x6d58ec8 
"1262782141.M730129P15749V03E80006I0148DC54_0.gehenna9.rutgers.edu,S=7454:2,", 
extensions = 0x6d62160 "W7626"}
(gdb) p *recs[6]
$9 = {uid = 52573, flags = 0, filename = 0x6d62140 
"1041233872.000016.mbox:2,S", extensions = 0x0}
(gdb)

And one more user:

(gdb) fr 5
#5  0x000000000044f34e in maildir_uidlist_records_array_delete 
(uidlist=0x10f44680, rec=0x10f67120) at maildir-uidlist.c:405
405            i_assert(pos != NULL);
(gdb) p *recs
$1 = (struct maildir_uidlist_rec * const) 0x11093760
(gdb) p count
$2 = 2
(gdb) p *recs[0]
$3 = {uid = 55108, flags = 0, filename = 0x11093778 
"1262788462.M201250P7938V04240006I01EF76E0_0.gehenna10.rutgers.edu,S=1731278:2,", 
extensions = 0x1109d240 "W1753846"}
(gdb) p *recs[1]
$4 = {uid = 55109, flags = 4, filename = 0x1109d268 
"1262788793.M851477P3866V045C0007I01EF76E3_0.gehenna8.rutgers.edu,S=19990", 
extensions = 0x0}




On 12/30/2009 12:10 PM, David Halik wrote:
>
> Ok, I think I've got some more info and a more accurate time line for 
> you. I tried this on two different dumps from two different users. The 
> count was 4 in the first example and 0 in the second. I'm guessing 
> that's considered "small"? The links to my gdb sessions for both are 
> below and have some of the info you were looking for. The 
> corresponding logs are also there so you can see how each failed. I 
> put everything on pastebin so it's a little easier to see.
>
> By the way, I also found that the stale NFS file handle message does 
> appear first in each instance, it was just farther back in the logs. 
> "Lowering uid" message also appears immediately after every stale NFS 
> message, which in turn causes all of this n amount of time later 
> (sometimes 5 minutes, sometimes 20) when a user does a new action. The 
> "file reappeared message only occurs some of the time. Here's the 
> chain of events in every case so far that I can see:
>
> 1) fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) 
> failed: Stale NFS file handle
> 2) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: next_uid was 
> lowered (n -> n-1, hdr=n-1)
> ...a few minutes later...
> (may or may not be a "message reappeared" warning at this point)
> 3) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file 
> entry at line 3: 
> 1261057547.M378185P17303V03E80002I0197FB4A_0.gehenna9.rutgers.edu,S=7174:2,RS 
> (uid i -> n+1,2,3 )
> 4) Panic: file maildir-uidlist.c: line 405 
> (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
>
>
> One thing to note, after the "Expunged message reappeared, giving a 
> new UID" he died quickly and one more than one server simultaneously. 
> The gdb output is from server gehenna11 of that log file. The uid in 
> *recs[0] is also the number that you can see in the logs being lowered 
> from 719 -> 718.
>
> First user log: http://pastebin.com/m1718f07b
> First user gdb: http://pastebin.com/m40088dc8
>
> The second user also died on more than one server. The output is also 
> from gehenna11
>
> Second user log: http://pastebin.com/f3a1756f2
> Second user gdb: http://pastebin.com/m59aacde4
>
>
>
> On 12/29/2009 7:50 PM, Timo Sirainen wrote:
>> On 29.12.2009, at 19.09, David Halik wrote:
>>
>>> I'll definitely get back to you on this. Right now we're closed 
>>> until after New Years and I don't want to go updating the dovecot 
>>> package on all of our servers until we're all back at work. I did do 
>>> some quick poking around and the count is optimized out, so I'll 
>>> have the package rebuilt without optimization and let you what the 
>>> values are at the beginning of next week. Thanks again.
>> well, you can probably also get the values in a bit more difficult way:
>>
>> p count = p uidlist.records.arr.buffer.used / 
>> uidlist.records.arr.element_size
>>
>> p recs[n] = p *(*uidlist.records.v)[n]
>>
>


-- 
================================
David Halik
System Administrator
OIT-CSS Rutgers University
dhalik at jla.rutgers.edu
================================



More information about the dovecot mailing list