Ok, I think I've got some more info and a more accurate timeline for you. I tried this on two different dumps from two different users. The count was 4 in the first example and 0 in the second. I'm guessing that's considered "small"? The links to my gdb sessions for both are below and have some of the info you were looking for. The corresponding logs are also there so you can see how each one failed. I put everything on pastebin so it's a little easier to read.
By the way, I also found that the stale NFS file handle message does appear first in each instance; it was just farther back in the logs. The "Lowering uid" message also appears immediately after every stale NFS message, which in turn causes all of this some time later (sometimes 5 minutes, sometimes 20) when the user performs a new action. The "file reappeared" message only occurs some of the time. Here's the chain of events in every case I've seen so far:
- fdatasync(/rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
- /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: next_uid was lowered (n -> n-1, hdr=n-1)
  ...a few minutes later... (there may or may not be a "message reappeared" warning at this point)
- /rci/nqu/rci/u8/user/dovecot/.INBOX/dovecot-uidlist: Duplicate file entry at line 3: 1261057547.M378185P17303V03E80002I0197FB4A_0.gehenna9.rutgers.edu,S=7174:2,RS (uid i -> n+1,2,3 )
- Panic: file maildir-uidlist.c: line 405 (maildir_uidlist_records_array_delete): assertion failed: (pos != NULL)
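To check other users' logs for the same sequence, something like the following rough sketch could help. It is not part of Dovecot; the function names and regular expressions are my own, built only from the message fragments quoted above, and it assumes one log message per line in chronological order.

```python
import re

# Stages of the failure chain, in the order they were observed.
# The patterns are based only on the message fragments quoted above.
STAGES = [
    ("stale_nfs", re.compile(r"fdatasync\(.*\) failed: Stale NFS file handle")),
    ("uid_lowered", re.compile(r"next_uid was lowered")),
    ("duplicate_entry", re.compile(r"Duplicate file entry at line \d+")),
    ("panic", re.compile(r"assertion failed: \(pos != NULL\)")),
]


def find_chain(lines):
    """Return (stage_name, line_index) pairs for stages found in order.

    Scanning stops advancing at the first stage that never appears, so a
    panic without a preceding stale NFS error yields an empty result.
    """
    found = []
    stage_idx = 0
    for i, line in enumerate(lines):
        if stage_idx >= len(STAGES):
            break
        name, pattern = STAGES[stage_idx]
        if pattern.search(line):
            found.append((name, i))
            stage_idx += 1
    return found


def chain_complete(lines):
    """True if all four stages appear in the expected order."""
    return len(find_chain(lines)) == len(STAGES)
```

Feeding it the lines of a single user's log (for example `chain_complete(open(path).read().splitlines())`, with `path` being whatever log file you're checking) should show whether the full stale NFS -> lowered uid -> duplicate entry -> panic sequence is present. The optional "message reappeared" warning is deliberately not required.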
One thing to note: after the "Expunged message reappeared, giving a new UID" message, the first user died quickly, and on more than one server simultaneously. The gdb output is from server gehenna11 of that log file. The uid in *recs[0] is also the number that you can see in the logs being lowered from 719 -> 718.
First user log: http://pastebin.com/m1718f07b First user gdb: http://pastebin.com/m40088dc8
The second user also died on more than one server. The gdb output is also from gehenna11.
Second user log: http://pastebin.com/f3a1756f2 Second user gdb: http://pastebin.com/m59aacde4
On 12/29/2009 7:50 PM, Timo Sirainen wrote:
On 29.12.2009, at 19.09, David Halik wrote:
I'll definitely get back to you on this. Right now we're closed until after New Year's and I don't want to go updating the dovecot package on all of our servers until we're all back at work. I did do some quick poking around and the count is optimized out, so I'll have the package rebuilt without optimization and let you know what the values are at the beginning of next week. Thanks again.
well, you can probably also get the values in a bit more difficult way:
p count = p uidlist.records.arr.buffer.used / uidlist.records.arr.element_size
p recs[n] = p *(*uidlist.records.v)[n]
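The first expression is just the number of bytes in use divided by the size of one element. A minimal sketch of that arithmetic, with a hypothetical helper name and made-up byte values (only the division itself comes from the gdb expression above):

```python
def record_count(used_bytes, element_size):
    """Number of elements in a flat buffer: bytes in use / bytes per element."""
    if element_size <= 0:
        raise ValueError("element_size must be positive")
    if used_bytes % element_size != 0:
        # A remainder would mean the buffer does not hold whole elements.
        raise ValueError("buffer size is not a whole number of elements")
    return used_bytes // element_size


# Hypothetical values: 32 bytes in use with 8-byte elements gives 4 records.
print(record_count(32, 8))
```

An empty array has `used == 0`, so the count comes out as 0 regardless of the element size.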