Re: [Dovecot] Unsolved : mbox-sync-rewrite.c : assertion failed
Looks like I broke keyword handling for mboxes in rc10, fix here:
http://dovecot.org/list/dovecot-cvs/2006-October/006632.html
Hi Timo,
On Thu, Oct 26, 2006 at 03:18:24 +0300, Timo Sirainen wrote:
Looks like I broke keyword handling for mboxes in rc10, fix here:
http://dovecot.org/list/dovecot-cvs/2006-October/006632.html
Great, thanks for the patch!
I am now getting segmentation faults though:
Oct 26 08:58:27 tardis dovecot: child 21984 (imap) killed with signal 11
Nothing else in the logs...
Cheers
David
David Schweikert | phone: +41 44 632 7019
System manager ISG.EE | walk: ETH Zentrum, ETL F24.1
ETH Zurich, Switzerland | web: http://people.ee.ethz.ch/dws
Same here -- segfaults with the patch. I tried an strace on the child
imap processes, but keep getting permission denied (even as root). I
turned on debugging but no luck. I went ahead and reverted to the
code from before the patch.
Oddly enough after I reverted, one of the mailboxes that caused the
assert failures before the patch no longer appears to be causing the
assert failure; this is an mbox file that got used (and possibly
rewritten or fixed?) with the after-patch code. Another mbox file
that didn't get used with the patched code is still throwing the
assert failures.
Is there anything else I can do or send to help debug?
Thanks,
-dalvenjah
On Thu, 2006-10-26 at 00:38 -0700, Dalvenjah FoxFire wrote:
Same here -- segfaults with the patch. I tried an strace on the child
imap processes, but keep getting permission denied (even as root). I
turned on debugging but no luck. I went ahead and reverted to the
code from before the patch.
Oddly enough after I reverted, one of the mailboxes that caused the
assert failures before the patch no longer appears to be causing the
assert failure; this is an mbox file that got used (and possibly
rewritten or fixed?) with the after-patch code. Another mbox file
that didn't get used with the patched code is still throwing the
assert failures.
Is there anything else I can do or send to help debug?
Hmh. I tried for a while but I couldn't make it crash. Could you get a gdb backtrace from the crash? See http://dovecot.org/bugreport.html
On Thu, 2006-10-26 at 00:38 -0700, Dalvenjah FoxFire wrote:
Same here -- segfaults with the patch. I tried an strace on the child
imap processes, but keep getting permission denied (even as root). I
turned on debugging but no luck. I went ahead and reverted to the
code from before the patch.
Wonder why I didn't get it to crash. Anyway, this should fix it:
http://dovecot.org/list/dovecot-cvs/2006-October/006634.html
It got a bit weirder with this patch. It no longer crashed, but I got
a different assert error, and a slight (fixable) corruption bug:
Oct 26 12:22:16 dragonlair dovecot: imap-login: Login: user=<dalvenja>, method=PLAIN, rip=X.X.X.X, lip=X.X.X.X, TLS
Oct 26 12:22:16 dragonlair dovecot: IMAP(dalvenja): L466 count: 0 last_seq: 22025 first_seq: 22025
Oct 26 12:22:16 dragonlair dovecot: IMAP(dalvenja): file mbox-sync-rewrite.c: line 468 (mbox_sync_rewrite): assertion failed: (count == last_seq - first_seq + 1)
Oct 26 12:22:16 dragonlair dovecot: child 20995 (imap) killed with signal 6
[...]
Oct 26 12:24:32 dragonlair dovecot: imap-login: Login: user=<dalvenja>, method=PLAIN, rip=X.X.X.X, lip=X.X.X.X, TLS
Oct 26 12:24:37 dragonlair dovecot: IMAP(dalvenja): L466 count: 0 last_seq: 6370 first_seq: 6365
Oct 26 12:24:37 dragonlair dovecot: IMAP(dalvenja): file mbox-sync-rewrite.c: line 468 (mbox_sync_rewrite): assertion failed: (count == last_seq - first_seq + 1)
Oct 26 12:24:37 dragonlair dovecot: child 21132 (imap) killed with signal 6
(the L466 line was added by me to print out the values for the assert
variables)
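For readers following the numbers in the log: the assertion compares count against the size of the inclusive range first_seq..last_seq. The sketch below is only an illustration of that arithmetic, not Dovecot code; expected_count() and check_range() are hypothetical names, and the debug print merely imitates the hand-added "L466" line.

```c
#include <stdio.h>

/* Hypothetical helper (not from Dovecot): how many messages the
   inclusive sequence range [first_seq, last_seq] should contain. */
static unsigned int expected_count(unsigned int first_seq,
				   unsigned int last_seq)
{
	return last_seq - first_seq + 1;
}

/* Hypothetical check: print the three values, similar to the debug
   line in the log, and return 1 if the invariant holds, 0 if not. */
static int check_range(unsigned int count, unsigned int first_seq,
		       unsigned int last_seq)
{
	fprintf(stderr, "count: %u last_seq: %u first_seq: %u\n",
		count, last_seq, first_seq);
	return count == expected_count(first_seq, last_seq);
}
```

With the logged values, the first failure expects 22025 - 22025 + 1 = 1 record and the second expects 6370 - 6365 + 1 = 6, while count is 0 in both cases.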
This happened with two different mbox files.
The weirder part (the corruption bug) was that every time dovecot
wrote the mbox file, it added between 280 and 320 NUL (ascii 0)
characters at the end of the last message; so the next message to get
appended by procmail started its 'From ' header on a "line" with NUL
characters, and that message would not get recognized on the next
check. I found this had happened a total of 3 times to the mbox file.
I was able to clean the NULs out of the mbox file, but I've again
reverted back to the raw rc10 code for now.
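The message does not say how the NULs were cleaned out. One possible approach, shown purely as an assumption, is to load the file into memory and drop every NUL byte before writing it back. strip_nuls() is a hypothetical helper, and the sketch assumes the mbox contains no legitimate NUL bytes.

```c
#include <stddef.h>

/* Hypothetical helper: remove every NUL byte from buf[0..len) in place
   and return the new length. Assumes NULs never belong to the data. */
static size_t strip_nuls(char *buf, size_t len)
{
	size_t in, out = 0;

	for (in = 0; in < len; in++) {
		if (buf[in] != '\0')
			buf[out++] = buf[in];
	}
	return out;
}
```

After stripping, a 'From ' separator that followed a run of NULs starts at the beginning of its line again. Working on a copy of the mbox file while no other process is writing to it is the safer way to try this.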
Do you want me to try debugging with just the first patch and try to
find where the core dump occurs?
Let me know -- thanks!
-dalvenjah
Another data point: after reverting, I kept getting the same (new)
assert failures described in my previous message, and again the dovecot
server wrote a bunch of NULs (this time a couple thousand) to the end of
the mbox file.
I'm back on UW-IMAP for the moment, but would like to go ahead and
continue to help test and troubleshoot this bug.
-dalvenjah
Hi all,
Just to follow up on this issue --
About 2 weeks ago I tried swapping to rc14, and the mbox-sync-rewrite
assert failures and the NULs in the mbox file all went away. I guess I
didn't do the patching quite right; in any case, the issue appears to be
fixed now.
I've been running that version ever since with no problems.
Thanks very much Timo -- the server's fast, responsive, and reliable.
-dalvenjah
On Thu, Oct 26, 2006 at 01:31:04PM -0700, Dalvenjah FoxFire wrote:
It got a bit weirder with this patch. It no longer crashed, but I got
a different assert error, and a slight (fixable) corruption bug:
The patch seems to work here when array_append(&sync_ctx->mails, &mail_ctx->mail, 1); is always called after the new test if (array_is_created(&mail_ctx->mail.keywords)) { }, instead of calling array_append() only within the new if.
hmk
	if (array_is_created(&mail_ctx->mail.keywords)) {
		/* mail's keywords are allocated from a pool that's cleared
		   for each mail. we'll need to copy it to something more
		   permanent. */
		ARRAY_CREATE(&keywords_copy, sync_ctx->saved_keywords_pool,
			     unsigned int,
			     array_count(&mail_ctx->mail.keywords));
		array_append_array(&keywords_copy, &mail_ctx->mail.keywords);
		mail_ctx->mail.keywords = keywords_copy;
	}
	array_append(&sync_ctx->mails, &mail_ctx->mail, 1);
- array_append(&sync_ctx->mails, &mail_ctx->mail, 1);
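The ordering described here can be illustrated outside Dovecot. What follows is a minimal, self-contained sketch and not the actual Dovecot code: struct mail_rec, struct mail_list and mail_list_add() are hypothetical names, and plain malloc()/realloc() stand in for Dovecot's pools and arrays. It shows the two points of the change: keywords are copied to longer-lived storage only when present, and the record is appended in every case.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the per-mail record and the list of mails. */
struct mail_rec {
	unsigned int *keywords;	/* NULL when the mail has no keywords */
	size_t keyword_count;
};

struct mail_list {
	struct mail_rec *mails;
	size_t count, alloc;
};

/* Append a copy of *src to the list. Returns 0 on success, -1 on
   allocation failure. */
static int mail_list_add(struct mail_list *list, const struct mail_rec *src)
{
	struct mail_rec rec;

	rec.keywords = NULL;
	rec.keyword_count = 0;

	if (src->keywords != NULL && src->keyword_count > 0) {
		/* The source keywords live in short-lived storage, so
		   copy them to something more permanent. */
		size_t size = src->keyword_count * sizeof(unsigned int);

		rec.keywords = malloc(size);
		if (rec.keywords == NULL)
			return -1;
		memcpy(rec.keywords, src->keywords, size);
		rec.keyword_count = src->keyword_count;
	}

	/* Append unconditionally, whether or not keywords were copied. */
	if (list->count == list->alloc) {
		size_t new_alloc = list->alloc == 0 ? 8 : list->alloc * 2;
		struct mail_rec *p =
			realloc(list->mails, new_alloc * sizeof(struct mail_rec));

		if (p == NULL) {
			free(rec.keywords);
			return -1;
		}
		list->mails = p;
		list->alloc = new_alloc;
	}
	list->mails[list->count++] = rec;
	return 0;
}
```

If the append happened only inside the keyword branch, mails without keywords would never reach the list. That would be consistent with the count: 0 values logged earlier in the thread, although the thread itself does not spell out the cause.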
On Fri, 2006-10-27 at 21:13 +0200, Hans Morten Kind wrote:
On Thu, Oct 26, 2006 at 01:31:04PM -0700, Dalvenjah FoxFire wrote:
It got a bit weirder with this patch. It no longer crashed, but I got
a different assert error, and a slight (fixable) corruption bug:
The patch seems to work here when array_append(&sync_ctx->mails, &mail_ctx->mail, 1); is always called after the new test if (array_is_created(&mail_ctx->mail.keywords)) { } instead of calling array_append() within the new if
hmk
	if (array_is_created(&mail_ctx->mail.keywords)) {
		/* mail's keywords are allocated from a pool that's cleared
		   for each mail. we'll need to copy it to something more
		   permanent. */
		ARRAY_CREATE(&keywords_copy, sync_ctx->saved_keywords_pool,
			     unsigned int,
			     array_count(&mail_ctx->mail.keywords));
		array_append_array(&keywords_copy, &mail_ctx->mail.keywords);
		mail_ctx->mail.keywords = keywords_copy;
	}
	array_append(&sync_ctx->mails, &mail_ctx->mail, 1);
- array_append(&sync_ctx->mails, &mail_ctx->mail, 1);
Thanks, I probably would have wasted a long time trying to figure out what the problem was :) Committed.
Timo Sirainen wrote:
Looks like I broke keyword handling for mboxes in rc10, fix here:
http://dovecot.org/list/dovecot-cvs/2006-October/006632.html
Thank you - sadly I cannot test it today on the machine where the error occurred. I will try as soon as I can and get back with the results.
Jakob Curdes
participants (6)
- Dalvenjah FoxFire
- Dalvenjah FoxFire
- David Schweikert
- Hans Morten Kind
- Jakob Curdes
- Timo Sirainen