On Wed, 2007-12-05 at 20:43 +0100, Kristian Koehntopp wrote:
On Wednesday, 5. December 2007 20:21:32 Timo Sirainen wrote:
I am in the process of moving my mailboxes from an overloaded Cyrus box to a new Dovecot installation. The SUSE-supplied 1.0.rc14 kept dying with signal 6 on asserts "seq 1 < seq2 ...", so I downloaded and compiled 1.0.8 instead. This did not change a thing.
How easily can you crash it? What do you use as filesystem?
h743107:/var/log # grep signal dovecot.log | perl -n -e '($m = $_ ) =~ /(dovecot: .*signal.*)/ and ( $m = $1 ) =~ s/child \d+/child PID/ and print "$m\n";'| sort | uniq -c
     17 dovecot: child PID (imap) killed with signal 11
    607 dovecot: child PID (imap) killed with signal 6
h743107:/var/log # head -1 dovecot.log
Dec 1 23:31:10 h743107 dovecot: Dovecot v1.0.rc14 starting up
h743107:/var/log # tail -1 dovecot.log
Dec 5 20:41:18 h743107 dovecot: IMAP(azundris): Disconnected: Logged out
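For anyone who finds the grep/perl one-liner above hard to read, here is a rough Python equivalent of the same tally. It is only a sketch: the function name `tally_crashes` is made up, the regular expression assumes the "child NNN (imap) killed with signal N" wording seen in the output above, and the sample lines are invented in that format rather than taken from the real log.

```python
import re
from collections import Counter

# Assumed line format, based on the tally output quoted above:
#   "... dovecot: child 1234 (imap) killed with signal 6"
CRASH_RE = re.compile(r"(dovecot: child )\d+( \(\w+\) killed with signal \d+)")

def tally_crashes(lines):
    """Count crash lines, replacing the child PID with the literal 'PID'."""
    counts = Counter()
    for line in lines:
        m = CRASH_RE.search(line)
        if m:
            counts[m.group(1) + "PID" + m.group(2)] += 1
    return counts

# Invented sample lines (NOT from the real dovecot.log), for illustration only.
sample = [
    "Dec 2 10:00:01 host dovecot: child 1234 (imap) killed with signal 6",
    "Dec 2 10:05:42 host dovecot: child 1301 (imap) killed with signal 6",
    "Dec 2 11:17:09 host dovecot: child 1422 (imap) killed with signal 11",
    "Dec 2 11:18:00 host dovecot: IMAP(someuser): Disconnected: Logged out",
]

for message, count in sorted(tally_crashes(sample).items()):
    print(f"{count:7d} {message}")
```

To run it against a real log, pass an open file object instead of `sample`, for example `tally_crashes(open("dovecot.log"))`.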
Either your users are doing something really strange, or there's something wrong with the server. v1.0.x has worked pretty well for a lot of people, so I have trouble believing that the real problem here is with Dovecot.
The assert you first mentioned happens if there's broken data in the dovecot.index.log file. v1.1 handles this by logging an error instead of crashing. But broken data should never be written to dovecot.index.log in the first place.
The glibc free() error, then, is a pretty serious problem. It just should never happen no matter what you do. The backtrace shows that it's happening in the commit path when changing message flags. There's no way there's a bug in there, so either the heap was corrupted earlier by another code path, or there's something wrong with the server's memory.
If it's heap corruption, it's probably in some rarely run error handling path, in which case it would help to see what errors were logged by the same process before that.
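One rough way to collect that context is to print the lines logged shortly before each crash, much like `grep -B` would. The sketch below is an approximation under stated assumptions: the imap log lines quoted in this thread show no PID, so it cannot pick out "the same process" and simply keeps the preceding lines, which may include output from other processes. The function name `context_before_crashes` and the default window of 20 lines are arbitrary choices, not anything Dovecot provides.

```python
from collections import deque

def context_before_crashes(lines, before=20):
    """Yield (crash_line, preceding_lines) for each 'killed with signal' line.

    `before` is an arbitrary window size; the preceding lines are whatever
    was logged just before the crash, by any process.
    """
    recent = deque(maxlen=before)  # rolling window of the latest lines
    for line in lines:
        line = line.rstrip("\n")
        if "killed with signal" in line:
            yield line, list(recent)
        recent.append(line)

def print_crash_context(path, before=20):
    """Print each crash line from the log at `path`, preceded by its context."""
    with open(path) as log:
        for crash, context in context_before_crashes(log, before):
            print("\n".join(context))
            print(">>> " + crash)
            print("-" * 40)
```

Calling `print_crash_context("dovecot.log")` prints one block per crash. With over 600 crashes in the log, it is probably worth filtering the blocks for lines containing "Error" or "Panic" before reading them.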
Could you send more (or all) of those asserts, backtraces and other errors you see in logs to me privately?