[Dovecot] 1.1.4 and trouble over NFS
Hello, I have some trouble with the current setup (it's a testing environment):
2 server with Dovecot 1.1.4 from source (OS Debian testing 2.6.26) (name: "exim" and "exim2") 1 NFS server (OS Debian testing 2.6.26)
I use NFS v.4, indexes shared over NFS. The relavant part of Dovecot configuration:
dotlock_use_excl = yes mail_nfs_storage = yes mail_nfs_index = yes mmap_disable = yes lock_method = dotlock
Maildir format.
/etc/exports on NFS server:
/mnt 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check,fsid=0) /mnt/vexim 192.168.0.0/24(rw,sync,no_root_squash,no_subtree_check)
/etc/fstab on the clients:
192.168.0.71:/vexim /mnt/nfs1 nfs4 rw 0 0
When I access a mailbox with 2 different clients (for example Outlook Express that uses "exim" and Thunderbird that uses "exim2") and in the same time some message is delivered to the mailbox (the smtp is served by Exim 4.69) Dovecot has some trouble as you can see:
*** Log from machine "exim"
Oct 7 20:00:20 exim dovecot: IMAP(postmaster@minnie.tld): /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot-uidlist: file size changed unexpectedly after write Oct 7 20:00:20 exim dovecot: IMAP(postmaster@minnie.tld): Broken file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot-uidlist line 3: Invalid data: Oct 7 20:00:21 exim dovecot: IMAP(postmaster@minnie.tld): /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot-uidlist: Duplicate file entry at line 6: 1223402420.H172077P24365.exim.bnscr.it:2, (uid 16919 -> 16922) Oct 7 20:00:23 exim dovecot: IMAP(postmaster@minnie.tld): Log synchronization error at seq=3,offset=10612 for /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Append with UID 16925, but next_uid = 16926 Oct 7 20:00:23 exim dovecot: IMAP(postmaster@minnie.tld): Log synchronization error at seq=3,offset=10724 for /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record update for invalid uid=16926 Oct 7 20:00:23 exim dovecot: IMAP(postmaster@minnie.tld): Synchronization corrupted index header: /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index Oct 7 20:00:23 exim dovecot: IMAP(postmaster@minnie.tld): fscking index file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index Oct 7 20:00:23 exim dovecot: IMAP(postmaster@minnie.tld): Fixed index file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: first_recent_uid 16927 -> 16926 Oct 7 20:00:24 exim dovecot: IMAP(postmaster@minnie.tld): Log synchronization error at seq=3,offset=11140 for /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record update for invalid uid=16928 Oct 7 20:00:24 exim dovecot: IMAP(postmaster@minnie.tld): fscking index file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index Oct 7 20:01:00 exim dovecot: imap-login: Login: user=postmaster@minnie.tld, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Oct 7 20:02:00 exim dovecot: imap-login: Login: user=postmaster@minnie.tld, method=PLAIN, rip=127.0.0.1, lip=127.0.0.1, secured Oct 7 20:02:26 exim dovecot: IMAP(postmaster@minnie.tld): Log synchronization error at seq=3,offset=11628 for /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Append with UID 16929, but next_uid = 16930 Oct 7 20:02:26 exim dovecot: IMAP(postmaster@minnie.tld): Log synchronization error at seq=3,offset=11756 for /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record update for invalid uid=16930 Oct 7 20:02:26 exim dovecot: IMAP(postmaster@minnie.tld): fscking index file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index Oct 7 20:15:23 exim dovecot: Panic: IMAP(postmaster@minnie.tld): Message count decreased Oct 7 20:15:23 exim dovecot: IMAP(postmaster@minnie.tld): Raw backtrace: imap [0x80d6781] -> imap [0x80d6802] -> imap [0x80d6199] -> imap [0x8069711] -> imap [0x805e98b] -> imap [0x805ea40] -> imap [0x809c9af] -> imap(io_loop_handle_timeouts+0xe9) [0x80ddb19] -> imap(io_loop_handler_run+0x82) [0x80de3d2] -> imap(io_loop_run+0x20) [0x80dd8c0] -> imap(main+0x46a) [0x806b5aa] -> /lib/i686/cmov/libc.so.6(__libc_start_main+0xe5) [0xb7df2455] -> imap [0x805c9f1]
*** Log from machine "exim"
Oct 7 20:00:20 exim2 dovecot: IMAP(postmaster@minnie.tld): /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot-uidlist: Duplicate file entry at line 5: 1223402419.H453623P30627.exim2.bnscr.it:2, (uid 16919 -> 16920) Oct 7 20:00:22 exim2 dovecot: IMAP(postmaster@minnie.tld): Log synchronization error at seq=3,offset=10496 for /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record update for invalid uid=16925 Oct 7 20:00:22 exim2 dovecot: IMAP(postmaster@minnie.tld): fscking index file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index Oct 7 20:00:25 exim2 dovecot: IMAP(postmaster@minnie.tld): Log synchronization error at seq=3,offset=11324 for /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Append with UID 16928, but next_uid = 16929 Oct 7 20:00:25 exim2 dovecot: IMAP(postmaster@minnie.tld): Log synchronization error at seq=3,offset=11528 for /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: Extension record update for invalid uid=16929 Oct 7 20:00:26 exim2 dovecot: IMAP(postmaster@minnie.tld): Synchronization corrupted index header: /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index Oct 7 20:00:26 exim2 dovecot: IMAP(postmaster@minnie.tld): fscking index file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index Oct 7 20:00:26 exim2 dovecot: IMAP(postmaster@minnie.tld): Fixed index file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index: first_recent_uid 16930 -> 16929 Oct 7 20:00:26 exim2 dovecot: IMAP(postmaster@minnie.tld): fscking index file /mnt/nfs1/minnie.tld/postmaster/Maildir/dovecot.index Oct 7 20:01:49 exim2 dovecot: IMAP(postmaster@minnie.tld): Disconnected: Logged out bytes=80/583 Oct 7 20:01:49 exim2 dovecot: IMAP(postmaster@minnie.tld): Disconnected: Logged out bytes=20304/276528 Oct 7 20:03:06 exim2 dovecot: imap-login: Login: user=postmaster@minnie.tld, method=PLAIN, rip=192.168.0.199, lip=192.168.0.72 Oct 7 20:03:09 exim2 dovecot: IMAP(postmaster@minnie.tld): Maildir /mnt/nfs1/minnie.tld/postmaster/Maildir: Expunged message reappeared, giving a new UID (old uid=16930, file=1223402424.H796471P30658.exim2.bnscr.it:2,) Oct 7 20:03:09 exim2 dovecot: IMAP(postmaster@minnie.tld): Maildir /mnt/nfs1/minnie.tld/postmaster/Maildir: Expunged message reappeared, giving a new UID (old uid=16931, file=1223402425.H483603P24395.exim.bnscr.it:2,) Oct 7 20:03:09 exim2 dovecot: IMAP(postmaster@minnie.tld): Maildir /mnt/nfs1/minnie.tld/postmaster/Maildir: Expunged message reappeared, giving a new UID (old uid=16932, file=1223402425.H824041P30664.exim2.bnscr.it:2,)
I know that with Dovecot < 1.1 the use on multiple computer when users are connected randomly was discouraged, but with version 1.1.x this setup is safe.
I can try to move indexes on the local disk, the trouble with dovecot.index disappear but I have again problem with dovecot-uidlist.
I can reproduce this kind of problem easly: 2 clients on the same mailbox connected to two different imap server and a simple bash script that send 2 email every 1 second to the mailbox.
There is some option/configuration that I can try to change?
On Oct 7, 2008, at 9:57 PM, Igor Colombi wrote:
Hello, I have some trouble with the current setup (it's a testing
environment):2 server with Dovecot 1.1.4 from source (OS Debian testing 2.6.26)
(name: "exim" and "exim2") 1 NFS server (OS Debian testing 2.6.26)I use NFS v.4, indexes shared over NFS. The relavant part of Dovecot
configuration: .. Oct 7 20:00:20 exim dovecot: IMAP(postmaster@minnie.tld): /mnt/nfs1/ minnie.tld/postmaster/Maildir/dovecot-uidlist: file size changed
unexpectedly after write .. I know that with Dovecot < 1.1 the use on multiple computer when
users are connected randomly was discouraged, but with version 1.1.x
this setup is safe.
It's safe as long as Dovecot can reliably flush all NFS caches. But
this clearly isn't happening.
I can reproduce this kind of problem easly: 2 clients on the same
mailbox connected to two different imap server and a simple bash
script that send 2 email every 1 second to the mailbox.
Or probably also running imaptest (http://imapwiki.org/ImapTest) on
both machines? Could you run http://www.dovecot.org/tools/nfstest.c
and tell me what it outputs?
Timo Sirainen wrote:
On Oct 7, 2008, at 9:57 PM, Igor Colombi wrote:
Hello, I have some trouble with the current setup (it's a testing environment):
2 server with Dovecot 1.1.4 from source (OS Debian testing 2.6.26) (name: "exim" and "exim2") 1 NFS server (OS Debian testing 2.6.26)
I use NFS v.4, indexes shared over NFS. The relavant part of Dovecot configuration:
Or probably also running imaptest (http://imapwiki.org/ImapTest) on both machines? Could you run http://www.dovecot.org/tools/nfstest.c and tell me what it outputs?
Hello, this is the output of nftest on exim2 (client) vs exim (the other client)
./nfstest 192.168.0.70 50 /mnt/nfs1/pippo/pluto Connected: Acting as test client ESTALE errors don't happen O_EXCL appears to be working, but this could be just faked by NFS client timestamps resolution: seconds
Testing file attribute cache.. Attr cache flush open+close: failed Attr cache flush close+open: OK Attr cache flush fchown(-1, -1): failed Attr cache flush fchown(uid, -1): OK Attr cache flush fchmod(mode): OK Attr cache flush chown(-1, -1): failed Attr cache flush chown(uid, -1): OK Attr cache flush chmod(mode): OK Attr cache flush rmdir(): failed Attr cache flush rmdir(parent dir): failed Attr cache flush dup+close: failed Attr cache flush fcntl(shared): OK Attr cache flush fcntl(exclusive): OK Attr cache flush flock(shared): OK Attr cache flush flock(exclusive): OK Attr cache flush fsync(): failed Attr cache flush fcntl(O_SYNC): failed Attr cache flush O_DIRECT: failed
Testing data cache.. Data cache flush no caching: failed Data cache flush open+close: failed Data cache flush close+open: failed Data cache flush fchown(-1, -1): failed Data cache flush fchown(uid, -1): failed Data cache flush fchmod(mode): failed Data cache flush chown(-1, -1): failed Data cache flush chown(uid, -1): failed Data cache flush chmod(mode): failed Data cache flush rmdir(): failed Data cache flush rmdir(parent dir): failed Data cache flush dup+close: failed Data cache flush fcntl(shared): OK Data cache flush fcntl(exclusive): OK Data cache flush flock(shared): OK Data cache flush flock(exclusive): OK Data cache flush fsync(): failed Data cache flush fcntl(O_SYNC): failed Data cache flush O_DIRECT: OK
Testing write flushing.. Write flush no caching: failed Write flush open+close: OK Write flush close+open: OK Write flush fchown(-1, -1): failed Write flush fchown(uid, -1): OK Write flush fchmod(mode): OK Write flush chown(-1, -1): failed Write flush chown(uid, -1): OK Write flush chmod(mode): OK Write flush rmdir(): failed Write flush rmdir(parent dir): failed Write flush dup+close: OK Write flush fcntl(shared): OK Write flush fcntl(exclusive): OK Write flush flock(shared): OK Write flush flock(exclusive): OK Write flush fsync(): OK Write flush fcntl(O_SYNC): failed Write flush O_DIRECT: OK
Testing partial writing.. OK
Testing file handle cache.. File handle cache flush no caching: failed File handle cache flush open+close: failed File handle cache flush close+open: failed File handle cache flush fchown(-1, -1): failed File handle cache flush fchown(uid, -1): failed File handle cache flush fchmod(mode): failed File handle cache flush chown(-1, -1): failed File handle cache flush chown(uid, -1): failed File handle cache flush chmod(mode): failed File handle cache flush rmdir(): failed File handle cache flush rmdir(parent dir): OK File handle cache flush dup+close: failed File handle cache flush fcntl(shared): failed fcntl(setlk, write) failed: Bad file descriptor File handle cache flush fcntl(exclusive): failed File handle cache flush flock(shared): failed File handle cache flush flock(exclusive): failed File handle cache flush fsync(): failed File handle cache flush fcntl(O_SYNC): failed fcntl(/mnt/nfs1/pippo, O_DIRECT) failed: Invalid argument File handle cache flush O_DIRECT: failed
Testing negative file handle cache.. Negative file handle cache flush no caching: failed Negative file handle cache flush open+close: failed Negative file handle cache flush close+open: failed Negative file handle cache flush fchown(-1, -1): failed Negative file handle cache flush fchown(uid, -1): failed Negative file handle cache flush fchmod(mode): failed Negative file handle cache flush chown(-1, -1): failed Negative file handle cache flush chown(uid, -1): failed Negative file handle cache flush chmod(mode): failed rmdir(/mnt/nfs1/pippo/pluto) failed: No such file or directory rmdir(/mnt/nfs1/pippo/pluto) failed: No such file or directory Negative file handle cache flush rmdir(): failed Negative file handle cache flush rmdir(parent dir): OK Negative file handle cache flush dup+close: failed Negative file handle cache flush fcntl(shared): failed fcntl(setlk, write) failed: Bad file descriptor Negative file handle cache flush fcntl(exclusive): failed Negative file handle cache flush flock(shared): failed Negative file handle cache flush flock(exclusive): failed Negative file handle cache flush fsync(): failed Negative file handle cache flush fcntl(O_SYNC): failed fcntl(/mnt/nfs1/pippo, O_DIRECT) failed: Invalid argument Negative file handle cache flush O_DIRECT: failed
On Tue, 2008-10-07 at 22:19 +0200, Igor Colombi wrote:
Timo Sirainen wrote:
On Oct 7, 2008, at 9:57 PM, Igor Colombi wrote:
Hello, I have some trouble with the current setup (it's a testing environment):
2 server with Dovecot 1.1.4 from source (OS Debian testing 2.6.26) (name: "exim" and "exim2") 1 NFS server (OS Debian testing 2.6.26)
I use NFS v.4, indexes shared over NFS. The relavant part of Dovecot configuration:
Or probably also running imaptest (http://imapwiki.org/ImapTest) on both machines? Could you run http://www.dovecot.org/tools/nfstest.c and tell me what it outputs?
Hello, this is the output of nftest on exim2 (client) vs exim (the other client)
./nfstest 192.168.0.70 50 /mnt/nfs1/pippo/pluto
That looks correct. So either something's changed in Dovecot since I last tested it with NFS, or something's changed in Linux kernel. Or there's something else weird going on.
I don't really have any good suggestions for solutions and I don't really have time to debug NFS problems right now.. Or you could of course always try if disabling NFS attribute cache fixes things, but that'll most likely also make your performance a lot worse.
That looks correct. So either something's changed in Dovecot since I last tested it with NFS, or something's changed in Linux kernel. Or there's something else weird going on.
I don't really have any good suggestions for solutions and I don't really have time to debug NFS problems right now.. Or you could of course always try if disabling NFS attribute cache fixes things, but that'll most likely also make your performance a lot worse.
Fwiw I just upgraded one of our servers from 1.1.2 to 1.1.4 (32-bit, centos5.2) and cant see any changes in our nfs access patterns. We use nfs3 though.
Mark
participants (3)
-
Igor Colombi
-
Mark Zealey
-
Timo Sirainen