On Fri, Feb 20, 2009 at 05:50:28PM -0500, Timo Sirainen wrote:
On Tue, 2009-01-27 at 08:30 +0100, Ulrich Zehl wrote:
On Mon, Jan 26, 2009 at 10:02:19AM -0500, Timo Sirainen wrote:
Perhaps the NFS cache flushing isn't working for some reason. What OS (kernel) are you using on the Dovecot servers? How large are the attribute cache values you have set?
On the client side, it's Linux 2.6.23.16. All attribute cache related values are at their default, as far as I can tell. The entry in fstab reads:
nfs-server:/srv/storage /srv/storage nfs rw,nfsvers=3,hard,intr,nosuid,noexec,nodev,noatime 0 0
Setting actimeo=0 probably fixes this, but also probably increases the load a lot. actimeo=1 might work ok and reduce how often these problems happen, but not eliminate them completely.
Dovecot's nfs settings should avoid this problem though. You could see if upgrading your kernel helps. Some kernels have somewhat broken NFS code.
I did
# mount -o remount,actimeo=0 /srv/storage
around 9 this morning, but I'm still seeing these errors pop up. (I verified that there are mailboxes where this happened more than once since 10 today.)
Is it time to upgrade my kernel, or are there other options left?
$ grep 123456@example.net mail.log
Jan 27 08:00:17 allina dovecot: pop3-login: Login: user=<123456@example.net>, method=PLAIN, rip=80.x.x.x, lip=10.x.x.x
Jan 27 08:00:19 allina dovecot: POP3(123456@example.net): Disconnected: Logged out top=0/0, retr=1/1283428, del=0/1, size=1283408
Jan 27 08:00:57 laura deliver(123456@example.net): msgid=<497EB077.000009.01164@ERNESTO>: saved mail to INBOX
So allina modified dovecot-uidlist, and soon afterwards laura was probably using a cached copy of dovecot-uidlist and corrupted it.
Since the corrupted files are available for a little while (in the example, it was ~ 15 minutes), will it help if I repeatedly check all dovecot-uidlists and save those found to be corrupted to a special directory, so that we can see what the corruption actually is?
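A rough sketch of what such a periodic check could look like, in Python. The corruption test is only an assumption on my part: it treats the first line as a header and expects every following line to start with a numeric UID that is strictly larger than the previous one. The function names and the naming of the saved copies are made up for illustration, and reading a file over NFS while another host rewrites it may of course produce false positives.

```python
import os
import shutil
import time


def uidlist_looks_corrupted(path):
    """Heuristic (assumed format): header line, then lines starting with
    strictly increasing numeric UIDs. Returns True if that does not hold."""
    with open(path, "r", errors="replace") as f:
        lines = f.read().splitlines()
    last_uid = 0
    for line in lines[1:]:  # skip the header line
        if not line.strip():
            continue  # ignore blank lines
        first = line.split(None, 1)[0]
        if not (first.isascii() and first.isdigit()):
            return True  # record does not start with a UID
        uid = int(first)
        if uid <= last_uid:
            return True  # duplicate or out-of-order UID
        last_uid = uid
    return False


def save_corrupted(mail_root, quarantine_dir):
    """Walk mail_root and copy every suspicious dovecot-uidlist into
    quarantine_dir. Returns the list of copies that were written."""
    saved = []
    os.makedirs(quarantine_dir, exist_ok=True)
    for dirpath, _dirs, files in os.walk(mail_root):
        if "dovecot-uidlist" not in files:
            continue
        src = os.path.join(dirpath, "dovecot-uidlist")
        try:
            if not uidlist_looks_corrupted(src):
                continue
            stamp = time.strftime("%Y%m%dT%H%M%S")
            # Flatten the mailbox path into a file name (hypothetical scheme).
            name = dirpath.strip(os.sep).replace(os.sep, "_")
            dst = os.path.join(quarantine_dir, f"{name}.{stamp}.uidlist")
            shutil.copy2(src, dst)  # copy2 keeps the mtime of the original
            saved.append(dst)
        except OSError:
            continue  # file vanished or was unreadable; try again next run
    return saved
```

Something like save_corrupted("/srv/storage", "/var/tmp/uidlist-corrupt") could then be run from cron every minute or so; the second path is just a placeholder.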