I've been working around the problem by deleting the contents of ~user/.imap for affected users, but I'd really prefer to fix it properly. :-) Any suggestions would be gratefully received.
I think this should get fixed by itself? Although the first login fails, the next one should succeed.
Yes, that's right. I didn't realize it at first, because we were also seeing other problems (eg assertion failure at line 39 of index-sync.c) which did require the index to be deleted.
I've since made the following changes:
commented out the assertion at line 39 of index-sync.c (as per http://www.nabble.com/dovecot-panic-after-upgrade-1.0.15-to-1.1.2-td19098506...)
recompiled dovecot without optimization (ie, gcc -O0 instead of -O2) (as per http://www.mail-archive.com/dovecot@dovecot.org/msg12936.html)
reverted to using flock for file locking instead of fcntl (we'd been using flock with dovecot 1.0.10; 1.1.3 used fcntl by default in a couple of places, so I changed it back to see what would happen)
After this, we've seen fewer problems, but there are still the occasional ones:
the original corrupted transaction log error which prompted this thread
we still see the occasional assertion failure at line 1091 of index-mail.c; this is the one that I'd hoped to solve by recompiling without optimization, but at least it seems to be happening less frequently than before
this doesn't happen often, but once in a while we see assertion failures at line 163 of src/pop3-login/client-authenticate.c or line 200 of src/imap-login/client-authenticate.c
Can you reproduce this by running imaptest (on a test account)? http://imapwiki.org/ImapTest
Actually this and your other suggestions may not be necessary; I think I just realized what's been going on.
I missed this originally, but on reflection I notice in the logs that every time the "record size too small" error appears, inevitably the user involved is over disk quota and out of grace period. For example,
Oct 3 17:52:21 alcor dovecot: imap[14973] XXXX, NNN.NNN.NN.NNN: Corrupted transaction log file /home/XXXX/mail/.imap/sent-mail/dovecot.index.log: record size too small (type=0xcd91, offset=8180, size=0) Oct 3 17:52:21 alcor dovecot: imap[14973] XXXX, NNN.NNN.NN.NNN: fscking index file (in-memory index) Oct 3 17:52:21 alcor dovecot: imap[14973] XXXX, NNN.NNN.NN.NNN: Transaction log got desynced for index (in-memory index) Oct 3 17:52:21 alcor dovecot: imap[14973] XXXX, NNN.NNN.NN.NNN: Disconnected: IMAP session state is inconsistent, please relogin. bytes=169/1848 Oct 3 17:52:22 alcor dovecot: imap-login: Login: [14998], XXXX, NNN.NNN.NN.NNN Oct 3 17:52:22 alcor dovecot: wrapper[15023]: imap, XXXX, NNN.NNN.NN.NNN Oct 3 17:52:22 alcor dovecot: imap[15023] XXXX, NNN.NNN.NN.NNN: open(/home/XXXX/mail/.imap/CFI-Howe/.temp.alcor.15023.e786c194b1dcc08c) failed: Disk quota exceeded
Having seen this, I went back to the saved log for September 30th (the one containing the excerpt I pasted into my original message in this thread) -- and interestingly, I found that the quota failure wasn't recorded there. I don't know what was different about that occurrence, but everything I've seen recently suggests that "record size too small" is due to interaction with the quota system.
...so this is where I admit that I have indeed read http://wiki.dovecot.org/Quota/FS; unfortunately, the only user-writable file system on this machine that doesn't have (or need to have!) disk quotas enabled is /var/tmp, which doesn't seem to be a suitable place for dovecot indexes.
Under the circumstances, are you still interested in seeing ImapTest results and a backtrace? If so, should I do that before or after upgrading to 1.1.4?
Thanks,
- Steven
Steven Winikoff | Concordia University | "The end of the world will occur at Montreal, QC, Canada | 3:00 p.m., this Friday, with symposium smw@alcor.concordia.ca | to follow." http://alcor.concordia.ca/~smw | - fortune(6)