[Dovecot] Assertion failed
We're in a bit of a panic here as we're getting a lot of errors along the lines of:
May 2 12:39:00 lenny dovecot: IMAP(XXXXX): file mbox-sync-rewrite.c: line 106 (mbox_sync_headers_add_space): assertion failed: (start_pos < data_size)
We didn't see this until our mail servers went under a full load and now we've trashed the ability of nearly all our users to read email.
We saw this first under 1.0Beta3 and we've tried building 1.0Beta7 and it's made no difference. I've tried fiddling with the "dirty" settings for the indices and banning mmap to see if there's any way of avoiding the broken code, but with no luck.
We're running White Box EL4 (2.6 Linux kernel)
Does anyone know a workaround or whether there is an older version that doesn't have the bug? We desperately need to get an email service running and I don't want to go back to (shudder) UW-imap if I can help it but my options are limited at the moment.
--
Jonathan Knight, Keele Information Services, University of Keele, Keele, Staffordshire. ST5 5BG. U.K.
J.Knight@kis.keele.ac.uk  Telephone: +44 1782 583478  Fax: +44 1782 713082
Is this happening with all users, or just a significant minority that didn't include your testers?
If it happens to a minority, could you send us/me an anonymised copy of a (hopefully small) failing mbox (there's a tool provided in http://www.dovecot.org/tools/mboxcrypt.pl for doing this) and I'll try opening it with our Dovecot.
Did you compile Dovecot yourself or use a package? If the former, what configure options did you use?
Best Wishes, Chris
--
Christopher Wakelin, c.d.wakelin@reading.ac.uk
IT Services Centre, The University of Reading, Whiteknights, Reading, RG6 2AF, UK
Tel: +44 (0)118 378 8439  Fax: +44 (0)118 975 3094
FWIW, we see this assert on rare occasions. It has never caused any issues. Our imap server runs under very heavy loads (> 30) at times. Our setup: Solaris 9, mbox, imap and imaps only, beta7.
Jeff Earickson
Colby College
On Tue, 2 May 2006, Chris Wakelin wrote:
Date: Tue, 02 May 2006 13:19:31 +0100
From: Chris Wakelin <c.d.wakelin@reading.ac.uk>
To: Jonathan Knight <jonathan@cs.keele.ac.uk>
Cc: dovecot@dovecot.org
Subject: Re: [Dovecot] Assertion failed
Is this happening with all users, or just a significant minority that didn't include your testers?
If it happens to a minority, could you send us/me an anonymised copy of a (hopefully small) failing mbox (there's a tool provided in http://www.dovecot.org/tools/mboxcrypt.pl for doing this) and I'll try opening it with our Dovecot.
Did you compile Dovecot yourself or use a package? If the former, what configure options did you use?
Best Wishes, Chris
Is there any special reason for the retr byte count to be empty in the logs (POP3)?
dovecot: May 02 06:22:52 Info: pop3(xxx): Logout. top=1/1431, retr=0/ del=0/1, size=52448
dovecot: May 02 05:42:41 Info: pop3(xxx): Logout. top=42/61155, retr=0/ del=0/43, size=651803
Config: pop3_logout_format = top=%t/%T, retr=%r/%R, del=%d/%m, size=%s
On Tue, 2006-05-02 at 10:42 -0300, Daniel Lafraia wrote:
Is there any special reason for the retr byte count to be empty in the logs (POP3)?
dovecot: May 02 06:22:52 Info: pop3(xxx): Logout. top=1/1431, retr=0/ del=0/1, size=52448
dovecot: May 02 05:42:41 Info: pop3(xxx): Logout. top=42/61155, retr=0/ del=0/43, size=651803
Config: pop3_logout_format = top=%t/%T, retr=%r/%R, del=%d/%m, size=%s
I've changed the variables since 1.0beta4 because %R was already used for another thing and it was broken. The new default is:
pop3_logout_format = top=%t/%p, retr=%r/%b, del=%d/%m, size=%s
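For anyone post-processing these Logout lines, here is a minimal sketch in Python of pulling the counters out of a line written with the format above. The regular expression, the field names, and the function name `parse_logout` are illustrative assumptions, not part of Dovecot. The byte fields are treated as optional so that lines logged with the old, broken variables (where the value after `retr=N/` is empty) still parse.

```python
import re

# Matches the counters in a Logout line produced by
#   pop3_logout_format = top=%t/%p, retr=%r/%b, del=%d/%m, size=%s
# The byte fields accept an empty value, and the comma after the retr
# field is optional, so lines logged with the old variables also match.
LOGOUT_RE = re.compile(
    r"top=(?P<top_count>\d+)/(?P<top_bytes>\d*),\s*"
    r"retr=(?P<retr_count>\d+)/(?P<retr_bytes>\d*),?\s*"
    r"del=(?P<deleted>\d+)/(?P<messages>\d+),\s*"
    r"size=(?P<size>\d+)"
)


def parse_logout(line):
    """Return the counters as a dict of ints, or None if the line does not match.

    A field that was logged empty is reported as None rather than 0.
    """
    match = LOGOUT_RE.search(line)
    if match is None:
        return None
    return {
        name: (int(value) if value != "" else None)
        for name, value in match.groupdict().items()
    }
```

Run against the first log line quoted above, this reports the retr byte count as `None` instead of silently treating it as zero, which makes affected lines easy to filter out.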
Timo Sirainen wrote:
On Tue, 2006-05-02 at 12:48 +0100, Jonathan Knight wrote:
May 2 12:39:00 lenny dovecot: IMAP(XXXXX): file mbox-sync-rewrite.c: line 106 (mbox_sync_headers_add_space): assertion failed: (start_pos < data_size)
Hmm. Does attached patch help?
Sorry folks - I was testing out the bug with my own mailbox so I was offline for a moment or two.
I had a flash of inspiration which has identified the cause of the problem. Due to an earlier mistake (not mine!) we ended up with the students having two copies of most of their mail messages. The bug is triggered by trying to delete one of the duplicates. Once the index files are mangled there is no way back other than deleting the index files and starting again.
I can only assume that the index code assumes that there is a unique identifier in a mail message and doesn't allow for a complete duplicate of a message being in a mailbox.
We didn't spot this during our test phase because we actually messed up the students' mailboxes as we deployed new servers with dovecot installed.
I've written some Perl to fix the mailboxes, but this should probably be addressed: it is possible to copy mail messages between folders and end up with a duplicate, so I imagine that others will hit the problem, although not on the scale that we did.
--
Jonathan Knight, Keele Information Services, University of Keele, Keele, Staffordshire. ST5 5BG. U.K.
J.Knight@kis.keele.ac.uk  Telephone: +44 1782 583478  Fax: +44 1782 713082
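The Perl script mentioned above was not posted. As a rough illustration of the same idea, here is a sketch in Python using the standard `mailbox` module. It is not the script used at Keele. It assumes that two messages are duplicates when they are byte-identical once a handful of headers that IMAP servers rewrite are ignored; the header list `VOLATILE_HEADERS` and the function names are assumptions chosen for the example.

```python
import copy
import hashlib
import mailbox

# Headers that mail servers commonly rewrite in place; they are ignored
# when comparing messages. This list is an assumption for the example.
VOLATILE_HEADERS = (
    "Status", "X-Status", "X-UID", "X-IMAP", "X-IMAPbase",
    "X-Keywords", "Content-Length",
)


def fingerprint(msg):
    """Hash a message with the volatile headers stripped."""
    clone = copy.deepcopy(msg)
    for header in VOLATILE_HEADERS:
        del clone[header]  # no error if the header is absent
    return hashlib.sha1(clone.as_bytes()).hexdigest()


def remove_duplicates(path):
    """Keep the first copy of each message in the mbox at `path`.

    Returns the number of messages removed.
    """
    box = mailbox.mbox(path)
    box.lock()
    try:
        seen = set()
        removed = 0
        for key in box.keys():
            digest = fingerprint(box[key])
            if digest in seen:
                box.remove(key)
                removed += 1
            else:
                seen.add(digest)
        box.flush()  # rewrite the file without the removed messages
    finally:
        box.unlock()
        box.close()
    return removed
```

Something like this would need to run while no IMAP or POP3 session has the mailbox open, and on a backup copy first. As noted above, the index files for the mailbox still have to be deleted afterwards.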
On Tue, 2006-05-02 at 14:43 +0100, Jonathan Knight wrote:
I had a flash of inspiration which has identified the cause of the problem. Due to an earlier mistake (not mine!) we ended up with the students having two copies of most of their mail messages. The bug is triggered by trying to delete one of the duplicates. Once the index files are mangled there is no way back other than deleting the index files and starting again.
Well, if you can still reproduce this bug easily, I'd like to know if the patch fixes it :)
Timo Sirainen wrote:
Well, if you can still reproduce this bug easily, I'd like to know if the patch fixes it :)
No, it doesn't.
The steps I took to create the bug are:
    cd /var/mail
    cp csa01 csa01.save
    cat csa01.save >> csa01
Then I use a "webmail" product which seems to trip the bug instantly (possibly because it connects and then disconnects). I delete one of the duplicated messages and immediately the "assertion failed" message appears in the log files and the webmail client starts showing empty messages.
--
Jonathan Knight, Keele Information Services, University of Keele, Keele, Staffordshire. ST5 5BG. U.K.
J.Knight@kis.keele.ac.uk  Telephone: +44 1782 583478  Fax: +44 1782 713082
On Tue, 2006-05-02 at 15:32 +0100, Jonathan Knight wrote:
The steps I took to create the bug are:
    cd /var/mail
    cp csa01 csa01.save
    cat csa01.save >> csa01
These mailboxes were being used by UW-IMAP before? So they contain X-UID etc. headers?
I can get "UIDs broken with partial sync" errors with that method (which is also a bug, will fix it), but not the assert error.
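One way to answer the X-UID question for a given mailbox is to scan it and count how often each X-UID value occurs. The sketch below does that in Python with the standard `mailbox` module. The function name `duplicate_x_uids` is made up for the example, and it only inspects the per-message `X-UID` header named above, nothing else.

```python
import collections
import mailbox


def duplicate_x_uids(path):
    """Return {uid: count} for every X-UID value that occurs more than once."""
    counts = collections.Counter()
    box = mailbox.mbox(path)
    try:
        for msg in box:
            uid = msg.get("X-UID")
            if uid is not None:
                counts[uid.strip()] += 1
    finally:
        box.close()
    return {uid: n for uid, n in counts.items() if n > 1}
```

An empty result means either that the messages carry no X-UID headers at all or that every value is unique. A mailbox built with the `cat csa01.save >> csa01` steps from a file that already had X-UID headers would be expected to report every UID twice.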
participants (5)
- Chris Wakelin
- Daniel Lafraia
- Jeff A. Earickson
- Jonathan Knight
- Timo Sirainen