[Dovecot] dovecot-1.2.8 imap crash (with backtrace)
David Halik
dhalik at jla.rutgers.edu
Wed Jan 6 17:07:34 EET 2010
Timo, I reinstalled all our servers with -O0 and full debugging enabled
("-gdwarf-2 -g3"), so I should now be able to look up anything you need.
Here are some more examples with the data you asked for last time, now
without optimization. I also noticed that a few users seem to be
regulars with this problem.
#4 0x00000000004d8fd7 in i_panic (format=0x501688 "file %s: line %d
(%s): assertion failed: (%s)") at failures.c:207
args = {{gp_offset = 40, fp_offset = 48, overflow_arg_area =
0x7fffcf52b5a0, reg_save_area = 0x7fffcf52b4e0}}
#5 0x000000000044f34e in maildir_uidlist_records_array_delete
(uidlist=0x1b612330, rec=0x1b61f1f8) at maildir-uidlist.c:405
recs = (struct maildir_uidlist_rec * const *) 0x1b607f60
pos = (struct maildir_uidlist_rec * const *) 0x0
idx = 0
count = 2
__PRETTY_FUNCTION__ = "maildir_uidlist_records_array_delete"
#6 0x000000000044f8fc in maildir_uidlist_next (uidlist=0x1b612330,
line=0x1b60426a
"1249562340.M302085P7316V04240006I00DB2CF1_0.gehenna7.rutgers.edu,S=19277:2,S")
at maildir-uidlist.c:552
rec = (struct maildir_uidlist_rec *) 0x1b6209f8
old_rec = (struct maildir_uidlist_rec *) 0x1b61f1f8
recs = (struct maildir_uidlist_rec * const *) 0x1b604264
count = 0
uid = 2807
--------------------
(gdb) fr 5
#5 0x000000000044f34e in maildir_uidlist_records_array_delete
(uidlist=0x1b612330, rec=0x1b61f1f8) at maildir-uidlist.c:405
405 i_assert(pos != NULL);
(gdb) p *recs
$1 = (struct maildir_uidlist_rec * const) 0x1b61fa60
(gdb) p count
$2 = 2
(gdb) p *recs[0]
$3 = {uid = 2735, flags = 6, filename = 0x1b61fa78
"1262739815.M913182P4280V04240006I014D0F92_0.gehenna7.rutgers.edu,S=11791:2,",
extensions = 0x1b620980 "W12036"}
(gdb) p *recs[1]
$4 = {uid = 2736, flags = 0, filename = 0x1b6209a0
"1262739645.M307617P13458V04240006I014D0F91_0.gehenna10.rutgers.edu,S=5005:2,S",
extensions = 0x1b6209f0 "W5124"}
(gdb)
Another user for comparison:
(gdb) fr 5
#5 0x000000000044f34e in maildir_uidlist_records_array_delete
(uidlist=0x6bd7680, rec=0x6bc93d0) at maildir-uidlist.c:405
405 i_assert(pos != NULL);
(gdb) p *recs
$1 = (struct maildir_uidlist_rec * const) 0x6d5c418
(gdb) p count
$2 = 7
(gdb) p *recs[0]
$3 = {uid = 52566, flags = 4, filename = 0x6d5c438
"1262781360.M836147P9323V04240007I0148DC53_0.gehenna5.rutgers.edu,S=2157:2,a",
extensions = 0x6d5c430 "W2201"}
(gdb) p *recs[1]
$4 = {uid = 52568, flags = 0, filename = 0x6cd4590
"1262628888.M94168P6001V04240006I003C41C2_0.gehenna10.rutgers.edu,S=22946:2,Sa",
extensions = 0x6cd4588 "W23481"}
(gdb) p *recs[2]
$5 = {uid = 52569, flags = 0, filename = 0x6bc9328
"1041233872.000006.mbox:2,S", extensions = 0x6bc9318 "W6334"}
(gdb) p *recs[3]
$6 = {uid = 52570, flags = 0, filename = 0x6bc9370
"1041233872.000013.mbox:2,S", extensions = 0x6bc9360 "W1193"}
(gdb) p *recs[4]
$7 = {uid = 52571, flags = 0, filename = 0x6bc92e0
"1041233872.000014.mbox:2,RS", extensions = 0x6bc92d0 "W3095"}
(gdb) p *recs[5]
$8 = {uid = 52572, flags = 4, filename = 0x6d58ec8
"1262782141.M730129P15749V03E80006I0148DC54_0.gehenna9.rutgers.edu,S=7454:2,",
extensions = 0x6d62160 "W7626"}
(gdb) p *recs[6]
$9 = {uid = 52573, flags = 0, filename = 0x6d62140
"1041233872.000016.mbox:2,S", extensions = 0x0}
(gdb)
And one more user:
(gdb) fr 5
#5 0x000000000044f34e in maildir_uidlist_records_array_delete
(uidlist=0x10f44680, rec=0x10f67120) at maildir-uidlist.c:405
405 i_assert(pos != NULL);
(gdb) p *recs
$1 = (struct maildir_uidlist_rec * const) 0x11093760
(gdb) p count
$2 = 2
(gdb) p *recs[0]
$3 = {uid = 55108, flags = 0, filename = 0x11093778
"1262788462.M201250P7938V04240006I01EF76E0_0.gehenna10.rutgers.edu,S=1731278:2,",
extensions = 0x1109d240 "W1753846"}
(gdb) p *recs[1]
$4 = {uid = 55109, flags = 4, filename = 0x1109d268
"1262788793.M851477P3866V045C0007I01EF76E3_0.gehenna8.rutgers.edu,S=19990",
extensions = 0x0}
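
In all three dumps the failing state is the same: frame 5 is asked to
delete a record that is no longer present in the uid-sorted records
array (in the first trace the array only holds uids 2735 and 2736), so
the lookup returns NULL and i_assert(pos != NULL) fires. Here is a
minimal, self-contained sketch of that failure mode. It assumes the
function does a bsearch() by uid over the sorted array of record
pointers (the pos/idx/count locals suggest as much); it is an
illustration, not the actual Dovecot code, and the stale uid 2807 is
just borrowed from frame 6:

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative stand-in for struct maildir_uidlist_rec. */
    struct rec {
            unsigned int uid;
    };

    /* bsearch() comparator: the array elements are pointers to
     * records, kept sorted by uid. */
    static int uid_cmp(const void *p1, const void *p2)
    {
            const struct rec *r1 = *(const struct rec *const *)p1;
            const struct rec *r2 = *(const struct rec *const *)p2;

            return r1->uid < r2->uid ? -1 : (r1->uid > r2->uid ? 1 : 0);
    }

    int main(void)
    {
            /* The array as seen in the first dump: two records left. */
            struct rec a = { 2735 }, b = { 2736 };
            const struct rec *recs[] = { &a, &b };

            /* A stale in-memory record whose uid is no longer in the
             * array, e.g. after a "next_uid was lowered" renumbering. */
            struct rec stale = { 2807 };
            const struct rec *key = &stale;

            const struct rec *const *pos =
                    bsearch(&key, recs, 2, sizeof(recs[0]), uid_cmp);
            if (pos == NULL) {
                    /* Exactly the state that trips the assert at
                     * maildir-uidlist.c:405. */
                    printf("uid %u not found in records array\n",
                           stale.uid);
            }
            return 0;
    }

From the timeline in the quoted message below, it looks like the
"next_uid was lowered" renumbering is what leaves such stale records
behind.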
On 12/30/2009 12:10 PM, David Halik wrote:
>
> Ok, I think I've got some more info and a more accurate timeline for
> you. I tried this on two different dumps from two different users. The
> count was 4 in the first example and 0 in the second. I'm guessing
> that's considered "small"? The links to my gdb sessions for both are
> below and have some of the info you were looking for. The
> corresponding logs are also there so you can see how each failed. I
> put everything on pastebin so it's a little easier to see.
>
> By the way, I also found that the stale NFS file handle message does
> appear first in each instance; it was just farther back in the logs.
> The "Lowering uid" message also appears immediately after every stale
> NFS message, which in turn causes all of this some amount of time
> later (sometimes 5 minutes, sometimes 20) when a user does a new
> action. The "file reappeared" message only occurs some of the time.
> Here's the chain of events in every case so far that I can see:
>
> 1) fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist)
> failed: Stale NFS file handle
> 2) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: next_uid was
> lowered (n -> n-1, hdr=n-1)
> ...a few minutes later...
> (may or may not be a "message reappeared" warning at this point)
> 3) /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file
> entry at line 3:
> 1261057547.M378185P17303V03E80002I0197FB4A_0.gehenna9.rutgers.edu,S=7174:2,RS
> (uid i -> n+1,2,3 )
> 4) Panic: file maildir-uidlist.c: line 405
> (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
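
A note on step 1 for anyone hitting this in the archives: on NFS a file
handle goes stale when another client replaces the file under you
(dovecot-uidlist is presumably rewritten and renamed into place), and
the only recovery is to reopen by path and reread. A small sketch of
detecting that condition; the function name is made up and nothing here
is Dovecot code:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Returns 0 on success, -1 on failure. ESTALE means the NFS server
     * no longer recognizes the handle, typically because another host
     * replaced the file; reopening by path is the only way forward. */
    static int sync_or_detect_stale(int fd)
    {
            if (fdatasync(fd) < 0) {
                    if (errno == ESTALE)
                            fprintf(stderr, "stale NFS file handle: %s\n",
                                    strerror(errno));
                    return -1;
            }
            return 0;
    }
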
>
>
> One thing to note: after the "Expunged message reappeared, giving a
> new UID" warning, that user died quickly, and on more than one server
> simultaneously. The gdb output is from server gehenna11 for that log
> file. The uid in *recs[0] is also the number you can see in the logs
> being lowered from 719 -> 718.
>
> First user log: http://pastebin.com/m1718f07b
> First user gdb: http://pastebin.com/m40088dc8
>
> The second user also died on more than one server. The output is also
> from gehenna11.
>
> Second user log: http://pastebin.com/f3a1756f2
> Second user gdb: http://pastebin.com/m59aacde4
>
>
>
> On 12/29/2009 7:50 PM, Timo Sirainen wrote:
>> On 29.12.2009, at 19.09, David Halik wrote:
>>
>>> I'll definitely get back to you on this. Right now we're closed
>>> until after New Years and I don't want to go updating the dovecot
>>> package on all of our servers until we're all back at work. I did do
>>> some quick poking around and the count is optimized out, so I'll
>>> have the package rebuilt without optimization and let you know what
>>> the values are at the beginning of next week. Thanks again.
>> Well, you can probably also get the values in a slightly more
>> difficult way:
>>
>> p count = p uidlist.records.arr.buffer.used /
>> uidlist.records.arr.element_size
>>
>> p recs[n] = p *(*uidlist.records.v)[n]
>>
>
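
For anyone puzzling over Timo's expressions above: Dovecot's dynamic
arrays are buffer-backed, so the element count is the buffer's used
byte count divided by the element size, and the records array stores
pointers to the individual recs. A rough, simplified sketch of the
layout (names mirror the gdb expressions; these are not the exact
lib/buffer.h and lib/array.h definitions):

    #include <stddef.h>

    struct buffer {
            void *data;
            size_t used;            /* bytes currently in use */
    };

    struct array {
            struct buffer *buffer;
            size_t element_size;    /* size of one stored element */
    };

    /* Why "p uidlist.records.arr.buffer.used /
     * uidlist.records.arr.element_size" prints the element count. */
    static size_t array_count_sketch(const struct array *arr)
    {
            return arr->buffer->used / arr->element_size;
    }

The second expression works the same way: the ".v" member carries the
element type, so (*uidlist.records.v)[n] indexes the n'th stored record
pointer and the leading * dereferences it into a printable
struct maildir_uidlist_rec.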
--
================================
David Halik
System Administrator
OIT-CSS Rutgers University
dhalik at jla.rutgers.edu
================================