Re: [Dovecot] Corrupted transaction log file / record size too small
- recompiled dovecot without optimization (ie, gcc -O0 instead of -O2) (as per http://www.mail-archive.com/dovecot@dovecot.org/msg12936.html)
The problem here turned out to be a bug in Sun CC (and gcc -O2
apparently worked after all, so maybe a problem with testing).
Thanks. I'll revert to the default -O2 the next time we recompile (which will probably be version 1.1.4 sometime early next week).
- reverted to using flock for file locking instead of fcntl (we'd been using flock with dovecot 1.0.10; 1.1.3 used fcntl by default in a couple of places, so I changed it back to see what would happen)
This shouldn't really make a difference.
That's good, because in hindsight I haven't noticed any difference. :-)
- we still see the occasional assertion failure at line 1091 of index-mail.c; this is the one that I'd hoped to solve by recompiling without optimization, but at least it seems to be happening less frequently than before
I had hoped v1.1.3 would have fixed this. But perhaps this is also
related to the disk quota issues. Are there any other errors that are
logged before the assert?
Nothing obvious, I'm afraid. For example, this occurence is typical of the 24 which we've logged since Sept. 26 (which is as far back as we keep dovecot log data online; for the record, we upgraded from 1.0.10 to 1.1.3 on Sept. 23):
Oct 8 13:56:39 alcor dovecot: pop3[15127] USER1, NNN.NNN.NNN.NNN: Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0, xfer=12/27 Oct 8 13:56:39 alcor dovecot: pop3-login: Login: [15118], USER2, NNN.NNN.NNN.NNN Oct 8 13:56:39 alcor dovecot: wrapper[15139]: pop3, USER2, NNN.NNN.NNN.NNN Oct 8 13:56:39 alcor dovecot: pop3-login: Login: [15058], USER3, NNN.NNN.NNN.NNN Oct 8 13:56:39 alcor dovecot: wrapper[15141]: pop3, USER3, NNN.NNN.NNN.NNN Oct 8 13:56:39 alcor dovecot: pop3[15141] USER3, NNN.NNN.NNN.NNN: mkdir(/home/USER3/.imap/INBOX) failed: Disk quota exceeded Oct 8 13:56:39 alcor dovecot: imap-login: Login: [15121], USER4, NNN.NNN.NNN.NNN Oct 8 13:56:39 alcor dovecot: wrapper[15143]: imap, USER4, NNN.NNN.NNN.NNN Oct 8 13:56:39 alcor dovecot: pop3[15139] USER2, NNN.NNN.NNN.NNN: Disconnected: Logged out top=0/0, retr=0/0, del=0/0, size=0, xfer=12/27 Oct 8 13:56:39 alcor dovecot: imap-login: Login: [15001], USER5, NNN.NNN.NNN.NNN Oct 8 13:56:40 alcor dovecot: wrapper[15159]: imap, USER5, NNN.NNN.NNN.NNN Oct 8 13:56:40 alcor dovecot: imap-login: Login: [15158], USER6, NNN.NNN.NNN.NNN Oct 8 13:56:40 alcor dovecot: wrapper[15162]: imap, USER6, NNN.NNN.NNN.NNN Oct 8 13:56:40 alcor dovecot: imap[15159] USER5, NNN.NNN.NNN.NNN: Disconnected: Logged out bytes=305/13377 Oct 8 13:56:40 alcor dovecot: imap[15162] USER6, NNN.NNN.NNN.NNN: Disconnected: Logged out bytes=110/356 Oct 8 13:56:40 alcor dovecot: Panic: imap[15143] USER4, NNN.NNN.NNN.NNN: file index-mail.c: line 1091 (index_mail_close): assertion failed: (!mail->data.destroying_stream) Oct 8 13:56:40 alcor dovecot: imap[15143] USER4, NNN.NNN.NNN.NNN: Raw backtrace: dovecot-imap [0x4a3ddd] -> dovecot-imap [0x4a4b9d] -> dovecot-imap(i_fatal+0) [0x4a40d1] -> dovecot-imap(index_mail_close+0xfa) [0x4538d4] -> dovecot-imap [0x4538fe] -> dovecot-imap(index_mail_set_seq+0x6a) [0x453ac6] -> dovecot-imap [0x445e6e] -> dovecot-imap(mail_set_seq+0x28) [0x45f6a5] -> dovecot-imap(index_storage_search_next_nonblock+0xe9) [0x458ff0] -> dovecot-imap(mailbox_search_next_nonblock+0x33) [0x462de3] -> dovecot-imap(mailbox_search_next+0x2e) [0x462da2] -> dovecot-imap [0x415e99] -> dovecot-imap(cmd_copy+0x19c) [0x4160ac] -> dovecot-imap [0x41d0ac] -> dovecot-imap [0x41d2e9] -> dovecot-imap [0x41d41c] -> dovecot-imap [0x41d460] -> dovecot-imap(client_input+0xb4) [0x41d5ed] -> dovecot-imap(io_loop_handler_run+0x17d) [0x4aea11] -> dovecot-imap(io_loop_run+0x3b) [0x4adc6a] -> dovecot-imap(main+0x99) [0x42908b] -> /lib64/libc.so.6(__libc_start_main+0xf4) [0x2b306fee78c4] -> dovec! ot-imap [0x414939] Oct 8 13:56:40 alcor dovecot: child 15143 (imap) killed with signal 6 Oct 8 13:56:40 alcor dovecot: pop3[15141] USER3, NNN.NNN.NNN.NNN: Disconnected: Logged out top=0/0, retr=1/12816, del=1/1, size=12798, xfer=40/12944 Oct 8 13:56:40 alcor dovecot: imap-login: Login: [15003], USER7, NNN.NNN.NNN.NNN
Although the usernames and IP addresses have been elided, each USERn token always represents the same specific account wherever it appears.
- this doesn't happen often, but once in a while we see assertion failures at line 163 of src/pop3-login/client-authenticate.c or line 200 of src/imap-login/client-authenticate.c
I think I've managed to get this also sometimes when killing dovecot- auth or doing something else "wrong" (I don't remember exactly), but
not during normal operation. I should have noted down how to reproduce
this when I still knew how to reproduce it..
For us this appears to happen during normal operations. That's not to say I've never killed a dovecot-auth process :-), but it's not something I do often (in fact the last time I did was when I recompiled 1.1.3 as described previously, which by now was ten days ago).
For what it may be worth, this particular symptom appears to happen in clusters. The most recent spate that we saw were as follows (in context, in case any of the other log messages may be relevant):
Oct 10 04:48:45 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9703], NNN.NNN.NNN.NNN Oct 10 04:48:45 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9705], NNN.NNN.NNN.NNN Oct 10 04:48:45 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9702], NNN.NNN.NNN.NNN Oct 10 04:48:46 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9704], NNN.NNN.NNN.NNN Oct 10 04:48:46 alcor dovecot: pop3-login: Disconnected: Inactivity (auth failed, 1 attempts): [9500], USER10, NNN.NNN.NNN.NNN Oct 10 04:48:46 alcor dovecot: Panic: pop3-login: file client-authenticate.c: line 163 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED) Oct 10 04:48:46 alcor dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): [9502], USER11, NNN.NNN.NNN.NNN, TLS Oct 10 04:48:46 alcor dovecot: Panic: imap-login: file client-authenticate.c: line 200 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED) Oct 10 04:48:46 alcor dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): [9503], USER12, NNN.NNN.NNN.NNN, TLS Oct 10 04:48:46 alcor dovecot: Panic: imap-login: file client-authenticate.c: line 200 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED) Oct 10 04:48:46 alcor dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): [8777], USER9, NNN.NNN.NNN.NNN Oct 10 04:48:46 alcor dovecot: Panic: imap-login: file client-authenticate.c: line 200 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED) Oct 10 04:48:47 alcor dovecot: imap[4258] USER8, 67.223.74.40: Disconnected: Logged out bytes=7407/20930 Oct 10 04:48:47 alcor dovecot: pop3-login: Disconnected: Inactivity (auth failed, 1 attempts): [9575], USER13, NNN.NNN.NNN.NNN Oct 10 04:48:47 alcor dovecot: Panic: pop3-login: file client-authenticate.c: line 163 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED) Oct 10 04:48:47 alcor dovecot: pop3-login: Disconnected: Inactivity (auth failed, 1 attempts): [9576], USER14, NNN.NNN.NNN.NNN Oct 10 04:48:47 alcor dovecot: Panic: pop3-login: file client-authenticate.c: line 163 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED) Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9498 Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9500 Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9499 Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9502 Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9503 Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9575 Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9576 Oct 10 04:48:47 alcor dovecot: auth(default): Master requested auth for nonexisting client 9174 Oct 10 04:48:47 alcor dovecot: imap[9809] USER9, 132.205.7.20: Connection closed bytes=0/0 Oct 10 04:48:47 alcor dovecot: imap-login: Disconnected: Inactivity (auth failed, 1 attempts): [9174], USER8, NNN.NNN.NNN.NNN, TLS Oct 10 04:48:47 alcor dovecot: Panic: imap-login: file client-authenticate.c: line 200 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED) Oct 10 04:48:48 alcor dovecot: imap-login: Disconnected (no auth attempts): [9896], NNN.NNN.NNN.NNN Oct 10 04:48:48 alcor dovecot: imap-login: Disconnected (no auth attempts): [9895], NNN.NNN.NNN.NNN Oct 10 04:48:48 alcor dovecot: pop3-login: Disconnected (no auth attempts): [9893], NNN.NNN.NNN.NNN
I missed this originally, but on reflection I notice in the logs that every time the "record size too small" error appears, inevitably the user involved is over disk quota and out of grace period. For
example,Yes, this is most likely what's the problem. I remember testing v1.0
quite a lot in "out of disk space" conditions, but I haven't really
done any such testing with v1.1. I'll try that and see if I can get
these problems reproduced and fixed.
Thank you! :-) Please let me know if there's anything I can do to help.
- Steven
Steven Winikoff | Concordia University | "Hanging is too good for a man who Montreal, QC, Canada | makes puns; he should be drawn and smw@alcor.concordia.ca | quoted." http://alcor.concordia.ca/~smw | - Fred Allen
On Oct 10, 2008, at 9:22 PM, Steven Winikoff wrote:
I had hoped v1.1.3 would have fixed this. But perhaps this is also related to the disk quota issues. Are there any other errors that are logged before the assert?
Nothing obvious, I'm afraid. For example, this occurence is typical
of the 24 which we've logged since Sept. 26 (which is as far back as we keep dovecot log data online; for the record, we upgraded from 1.0.10 to 1.1.3 on Sept. 23): .. Oct 8 13:56:40 alcor dovecot: Panic: imap[15143] USER4,
NNN.NNN.NNN.NNN: file index-mail.c: line 1091 (index_mail_close):
assertion failed: (!mail->data.destroying_stream)
Hmm. I think I thought of a different assert before when commenting.
This assert should be fixed by 1.1.4 / this patch: http://hg.dovecot.org/dovecot-1.1/rev/3b0d23902a32
- this doesn't happen often, but once in a while we see assertion failures at line 163 of src/pop3-login/client-authenticate.c or line 200 of src/imap-login/client-authenticate.c
I think I've managed to get this also sometimes when killing dovecot- auth or doing something else "wrong" (I don't remember exactly), but not during normal operation. I should have noted down how to
reproduce this when I still knew how to reproduce it..For us this appears to happen during normal operations. That's not to say I've never killed a dovecot-auth process :-), but it's not
something I do often (in fact the last time I did was when I recompiled 1.1.3 as described previously, which by now was ten days ago).
I'll try to look into this soon.
On Fri, 2008-10-10 at 14:22 -0400, Steven Winikoff wrote:
Oct 10 04:48:46 alcor dovecot: Panic: pop3-login: file client-authenticate.c: line 163 (sasl_callback): assertion failed: (!client->destroyed || reply == SASL_SERVER_REPLY_CLIENT_ERROR || reply == SASL_SERVER_REPLY_MASTER_FAILED)
Fixed: http://hg.dovecot.org/dovecot-1.1/rev/a0693aec0245
(This bug shows why I wish gcc gave more warnings about type mismatches.)
participants (2)
-
Steven Winikoff
-
Timo Sirainen