I found two more view syncing bugs: http://dovecot.org/list/dovecot-cvs/2007-January/007366.html http://dovecot.org/list/dovecot-cvs/2007-January/007368.html
Will try asap.
I don't get coredumps anymore, which makes this very hard for me to debug.
You mean Dovecot just doesn't do that, or you did it on purpose for some reason? You should get core dumps as long as:
I believe you added some code that catches these gracefully? I really don't see any coredumps. The user can write, exec=yes, and ulimit -c is set. I used to get coredumps. All I see now are these log entries, not even a signal notice.
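For what it's worth, here is a rough sketch of how the two generic preconditions mentioned above (a non-zero core size limit and a writable directory) could be checked from a script. It is not Dovecot code, and the function name `core_dump_blockers` is made up for illustration. It assumes the core file would be written to the directory you pass in, and it only tells you something useful if run as the same user and with the same limits as the crashing process.

```python
import os
import resource


def core_dump_blockers(directory="."):
    """Return a list of generic reasons a core file might not be written.

    Hypothetical helper: checks only the soft core size limit and whether
    the given directory is writable by the current user.
    """
    problems = []

    # Equivalent of "ulimit -c": a soft limit of 0 disables core files.
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft == 0:
        problems.append("core size limit is 0 (ulimit -c 0)")

    # The process must be able to create a file in the target directory.
    if not os.access(directory, os.W_OK):
        problems.append("directory %s is not writable" % directory)

    return problems


if __name__ == "__main__":
    for problem in core_dump_blockers():
        print(problem)
```

An empty list only means these two checks passed; other system settings can still suppress the dump.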
Is it possible to make a patch that doesn't copy the files, but dumps the in-memory index someplace?
Yeah, I could do that if the above patches don't help. Although the easiest way to debug this would be looking into the core file with gdb.
It must be something very unlikely, as I see this on a tiny fraction of our connections. Something like 1 in every 100000. Could it be some race condition on the lock file, and a second process reads a partial index?
Hmm. I'm beginning to think that 1. might be the most likely problem. Have you tried changing lock_method? Try switching to flock (or to fcntl).
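To illustrate the kind of race being discussed, here is a minimal sketch of flock-style advisory locking in Python. It is not how Dovecot implements its index locking; the helper names `write_locked` and `read_locked` are hypothetical. The point is only that a reader taking a shared lock cannot observe a half-written file, provided every writer holds the exclusive lock for the whole update and the locks actually work on the filesystem in use.

```python
import fcntl
import os


def write_locked(path, data):
    """Replace the file contents while holding an exclusive flock."""
    # Open without truncating, so nothing changes before the lock is held.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+b") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.truncate(0)
            f.write(data)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def read_locked(path):
    """Read the whole file while holding a shared flock."""
    with open(path, "rb") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return f.read()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

If one process skips the lock, or the lock method silently does nothing on a given filesystem, a reader could see a partial file, which would fit a failure that shows up only very rarely.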
Ok, will do.
Cor