On Tue, Mar 23, 2010 at 03:19:49PM +0200, Timo Sirainen wrote:
I have done some small-scale testing and it looks fine.
Stress testing by running imaptest for same user's same mailbox in 2+ different servers (i.e. two NFS clients reading/writing same mailbox files) should show up quickly what kind of errors you could get. http://imapwiki.org/ImapTest
OK, I've now set this up:
ImapTest ---> dovecot (same host) -----> NFS server
        `---> dovecot (diff host) ----'
- 172.16.23.104: dovecot 1.2.11 and ImapTest-latest. FreeBSD 7.2.
- 172.16.23.101: dovecot 1.2.11 only. FreeBSD 7.2.
- 172.16.23.103: NFS server. Ubuntu Karmic.
All three hosts are ntpd synced.
The following was needed on the FreeBSD boxes to get fcntl locking working:
nfs_client_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
(imapd worked without these, but maillog showed errors about failing to obtain locks, "operation not supported")
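A quick way to confirm that fcntl locks can be obtained on the NFS mount at all, before involving dovecot, is something like the following. This is only a rough sketch: the function name is made up, and it checks that a single client can take and release a lock, not that two hosts actually exclude each other.

```python
import fcntl
import os
import tempfile


def can_fcntl_lock(directory):
    """Return True if an exclusive fcntl lock can be taken on a scratch file.

    `directory` is assumed to be somewhere on the NFS mount (hypothetical
    helper, not part of dovecot).
    """
    fd, path = tempfile.mkstemp(dir=directory, prefix="locktest.")
    try:
        try:
            # Non-blocking exclusive lock; without lockd/statd running this
            # is where "operation not supported" would be expected to appear.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling can_fcntl_lock() with a directory under the mail root on each FreeBSD box should return True once the three rc.conf settings above are in place.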
Test results
- Pointing a single instance of imaptest at a single host, or two instances of imaptest at the same host (with clients=5 to avoid hitting the 15 client limit) was fine. ImapTest reported no errors, and nothing out of the ordinary in maillog.
$ egrep -v "Login:|Disconnected:|Aborted login" /var/log/maillog
- Things went badly wrong with two instances of imaptest pointing at different dovecot hosts. I had seen this sort of thing when I was previously using dot locking, and was hoping it would be fixed by switching to fcntl, but unfortunately not.
ImapTest reported errors including:
Error: brian@dev.example.com[8]: SELECT failed: 8.3 NO [SERVERBUG] Internal error occurred. Refer to server log for more information. [2010-03-25 10:22:23]
- 6 stalled for 16 secs in command: 11 EXPUNGE
All sorts of errors reported in maillog, including:
Mar 25 10:22:23 freebsd-dev dovecot: IMAP(brian@dev.example.com): fscking index file /mail/0/6/37/30/brian%dev.example.com/dovecot.index
Mar 25 10:22:23 freebsd-dev dovecot: IMAP(brian@dev.example.com): Transaction log /mail/0/6/37/30/brian%dev.example.com/dovecot.index.log: duplicate transaction log sequence (10)
Mar 25 10:22:23 freebsd-dev dovecot: IMAP(brian@dev.example.com): Our dotlock file /mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist.lock was overridden (locked 0 secs ago, touched 0 secs ago)
Mar 25 10:22:23 freebsd-dev dovecot: IMAP(brian@dev.example.com): fscking index file /mail/0/6/37/30/brian%dev.example.com/dovecot.index
Mar 25 10:22:23 freebsd-dev dovecot: IMAP(brian@dev.example.com): Transaction log /mail/0/6/37/30/brian%dev.example.com/dovecot.index.log: duplicate transaction log sequence (11)
Mar 25 10:22:27 freebsd-dev dovecot: IMAP(brian@dev.example.com): /mail/0/6/37/30/brian%dev.example.com/dovecot.index reset, view is now inconsistent
Mar 25 10:22:46 freebsd-dev dovecot: IMAP(brian@dev.example.com): Panic: file mail-transaction-log-view.c: line 108 (mail_transaction_log_view_set): assertion failed: (min_file_seq <= max_file_seq)
Mar 25 10:22:48 freebsd-dev dovecot: IMAP(brian@dev.example.com): rename(/mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist.tmp, /mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist) failed: No such file or directory
Mar 25 10:22:48 freebsd-dev dovecot: IMAP(brian@dev.example.com): unlink(/mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist.tmp) failed: No such file or directory
Mar 25 10:22:36 wipe-dev dovecot: IMAP(brian@dev.example.com): ftruncate(/mail/0/6/37/30/brian%dev.example.com/dovecot-uidlist.lock) failed: Stale NFS file handle
(Logs from a single test run are attached)
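To compare runs, the maillog lines can be bucketed by error type. The following is a minimal sketch: the patterns are taken from the messages quoted above, the category labels are my own, and the noise filter mirrors the egrep -v used earlier.

```python
import re
from collections import Counter

# Substrings copied from the log messages quoted above; the labels are
# arbitrary names chosen for this sketch.
PATTERNS = {
    "fsck index": re.compile(r"fscking index file"),
    "duplicate log seq": re.compile(r"duplicate transaction log sequence"),
    "dotlock overridden": re.compile(r"dotlock file .* was overridden"),
    "index reset": re.compile(r"reset, view is now inconsistent"),
    "panic": re.compile(r"Panic:"),
    "stale NFS handle": re.compile(r"Stale NFS file handle"),
    "missing file": re.compile(r"No such file or directory"),
}

# Same lines that the egrep -v filter throws away.
NOISE = re.compile(r"Login:|Disconnected:|Aborted login")


def summarise(lines):
    """Count log lines per error category, ignoring login/logout noise."""
    counts = Counter()
    for line in lines:
        if NOISE.search(line):
            continue
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                break
        else:
            # Anything not recognised is still counted, so new error
            # types do not go unnoticed.
            counts["other"] += 1
    return counts
```

Running summarise(open("/var/log/maillog")) on each dovecot host after a test run gives a per-category count that is easier to compare than the raw logs.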
Interestingly, these messages imply that dovecot is still using dotlocking in some circumstances, even though I've definitely set fcntl locking.
$ grep ^lock /usr/local/etc/dovecot.conf
lock_method = fcntl
$ egrep '^mail_nfs|^mmap' /usr/local/etc/dovecot.conf
mmap_disable = yes
mail_nfs_storage = yes
mail_nfs_index = yes
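The same check can be scripted so that it runs identically on both dovecot hosts. This is only a sketch: it looks at unindented "key = value" lines, as the greps above do, and the function name and expected values are assumptions based on the settings shown here.

```python
import re

# The settings shown above, with the values I expect on both frontends.
EXPECTED = {
    "lock_method": "fcntl",
    "mmap_disable": "yes",
    "mail_nfs_storage": "yes",
    "mail_nfs_index": "yes",
}


def check_settings(text, expected=EXPECTED):
    """Return {setting: found_value} for every setting that is missing or wrong.

    Only top-level, uncommented "key = value" lines are considered.
    """
    found = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+)\s*=\s*(\S+)", line)
        if match:
            found[match.group(1)] = match.group(2)
    return {
        key: found.get(key)
        for key, value in expected.items()
        if found.get(key) != value
    }
```

An empty dict from check_settings(open("/usr/local/etc/dovecot.conf").read()) means all four settings are present with the expected values.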
All this suggests I should use some sort of 'sticky' load balancing in front so that all client conns from one IP hit the same frontend box. However, that contradicts the experience Adam McDougall has had with a similar setup:
http://dovecot.org/list/dovecot/2010-March/047815.html
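By 'sticky' I mean something along these lines: hash the client IP and always map it to the same dovecot host. This is an illustrative sketch only, not any particular load balancer's algorithm; the backend list is just the two test hosts above.

```python
import hashlib

# The two dovecot frontends from the test setup.
BACKENDS = ["172.16.23.104", "172.16.23.101"]


def pick_backend(client_ip, backends=BACKENDS):
    """Map a client IP to one backend, always the same one for the same IP."""
    digest = hashlib.sha1(client_ip.encode("utf-8")).digest()
    # Use the first four bytes of the hash as an index into the backend list.
    return backends[int.from_bytes(digest[:4], "big") % len(backends)]
```

Note that this only keeps one client machine on one frontend. A user connecting from two different IPs could still land on both dovecot hosts at once.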
It's possible that switching the Linux NFS server to a NetApp will help (which is what it will be deployed onto eventually anyway).
Adam: did you do any tuning of FreeBSD client NFS settings? And have you tried using ImapTest, or just real IMAP users?
I see there are a few tunables:
$ grep nfs /etc/defaults/rc.conf
netfs_types="nfs:NFS nfs4:NFS4 smbfs:SMB portalfs:PORTAL nwfs:NWFS" # Net filesystems.
nfs_client_enable="NO" # This host is an NFS client (or NO).
nfs_access_cache="60" # Client cache timeout in seconds
nfs_server_enable="NO" # This host is an NFS server (or NO).
nfs_server_flags="-u -t -n 4" # Flags to nfsd (if enabled).
nfs_reserved_port_only="NO" # Provide NFS only on secure port (or NO).
nfs_bufpackets="" # bufspace (in packets) for client
I have tried rerunning with sysctl vfs.nfs.access_cache_timeout=0 but saw the same problems.
Maybe the load pattern from 'real' IMAP clients is such that these problems generally don't show in practice? (i.e. it would be unusual for a single IMAP client to make simultaneous changes to the same folder via different TCP connections)
Regards,
Brian.