This was starting from a clean index, first opening pine on the NFS Solaris 9 sparc machine, and then at the same time opening pine on my Fedora 9 i386 workstation.
Why does it matter where you run Pine? Does it directly execute Dovecot on the local machine instead of connecting via TCP?
Correct. We have dovecot executing locally in each instance, with the index being shared. I'll try the TCP method and get back to you. By the way, the only reason I'm specifically doing it this way is to test out what might possibly happen to our user group.
We have approximately 50,000 student accounts and 20,000 staff accounts that all access mail in multiple ways. We want to be able to roll out dovecot everywhere, but to do this it has to be resilient enough to handle multiple instances of dovecot on multiple architectures. For example, a student logs into a webmail machine (sparc) and then ssh's into a linux frontend server and opens pine at the same time. This scenario isn't likely to happen, but it could. We're just trying to cover all possibilities. That's why we're running the local dovecot/pine and the server-side dovecot/pine: to see how it holds up.
So far it's been great, minus the endianness issue. By the way, we're trying out separating the index by arch and it's working pretty well right now. The only concern is how it's going to scale with regard to disk usage if we have double the number of indexes per account. We figure a max of 10MB per index, multiplied by 2, multiplied by 70,000... not a small number at all, but that's for us to worry about. ;) Of course, that is a worst-case scenario.
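For a rough sense of scale, the worst-case estimate above can be worked out in a few lines. This is only a sketch using the figures quoted in this thread (10MB per index, two architectures, 70,000 accounts); the variable names are mine and the 10MB ceiling is the poster's own guess, not a Dovecot limit.

```python
# Worst-case index storage estimate, using the figures quoted above.
MB_PER_INDEX = 10            # assumed maximum index size per account
ARCHITECTURES = 2            # one index set per architecture (sparc, i386)
ACCOUNTS = 50_000 + 20_000   # student accounts plus staff accounts

total_mb = MB_PER_INDEX * ARCHITECTURES * ACCOUNTS
total_tb = total_mb / 1_000_000  # decimal units: 1 TB = 1,000,000 MB

print(f"{total_mb:,} MB (~{total_tb:.1f} TB)")  # 1,400,000 MB (~1.4 TB)
```

So the ceiling is about 1.4 TB, and real usage should be well below that since most indexes will not come near 10MB.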
I'd suggest not running Dovecot on different architectures. For example, if you're on a non-x86 machine, make it connect via TCP to an x86 Dovecot server.
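To illustrate why mixing architectures on one shared index is a problem, here is a generic sketch of what happens when a 32-bit integer written in one byte order is read back in the other. It is not Dovecot's actual index format, and the value is arbitrary; it only shows the general endianness mismatch between SPARC (big-endian) and x86 (little-endian).

```python
import struct

value = 1214409152  # arbitrary 32-bit integer, chosen only for illustration

big = struct.pack(">I", value)     # bytes as a big-endian host (SPARC) stores them
little = struct.pack("<I", value)  # bytes as a little-endian host (x86) stores them

# The two layouts are byte-for-byte reversed.
print(big.hex(), little.hex())

# A little-endian reader interpreting the big-endian bytes gets another number.
misread = struct.unpack("<I", big)[0]
print(value, "read back as", misread)
```

Any binary file that stores integers in the host's native byte order hits this when it is shared between the two kinds of machine.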
I'm going to try that out and get back to you.
By the way, I don't think this is related to the corruption, but we also have tons of these in the logs:
Jun 25 11:52:32 host IMAP(user): : Created dotlock file's timestamp is different than current time (1214409234 vs 1214409152): /dovecot-index/control/user/.INBOX/dovecot-uidlist
Jun 25 11:52:32 host IMAP(user): : Created dotlock file's timestamp is different than current time (1214409235 vs 1214409152): /dovecot-index/control/user/.INBOX/dovecot-uidlist
Dovecot really wants the clocks to be synchronized between the NFS clients and the server. If the clock difference is more than 1 second, you'll get problems.
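The size of the skew can be read straight from the warning. Here is a small sketch that pulls the two timestamps out of one of the log lines above and subtracts them; the helper name `skew_seconds` and the regular expression are my own, and I'm assuming the first number is the dotlock file's timestamp and the second is the local clock, as the message wording suggests.

```python
import re

# One of the warnings quoted above (syslog prefix omitted).
LOG_LINE = (
    "Created dotlock file's timestamp is different than current time "
    "(1214409234 vs 1214409152): "
    "/dovecot-index/control/user/.INBOX/dovecot-uidlist"
)

def skew_seconds(line):
    """Return (file timestamp - current time) in seconds, or None if absent."""
    match = re.search(r"\((\d+) vs (\d+)\)", line)
    if match is None:
        return None
    return int(match.group(1)) - int(match.group(2))

skew = skew_seconds(LOG_LINE)
print(skew)           # 82
print(abs(skew) > 1)  # True: well past the 1-second tolerance
```

That is a difference of over a minute, so these warnings are expected until the clocks are brought back in line.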
I figured. Looks like we need to be a little more strict with ntp. ;)