On 10/21/09 8:59 AM, "Guy" <wyldfury@gmail.com> wrote:
Our current setup uses two NFS mounts accessed simultaneously by two servers. Our load balancing tries to keep a user on the same server whenever possible. Initially we just had round-robin load balancing, which led to index corruption. The only problems we've had from that corruption are that some messages are displayed twice, or not displayed at all, in mail clients. Deleting the corrupted index lets Dovecot recreate it correctly, but the client can't do anything about it. You'd probably have to delete it manually or have some sort of web interface for users to do it themselves.
I certainly wouldn't use NFS with multiple servers accessing it again for Dovecot. Looking at a clustered FS on SAN solution at the moment.
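To make the "keep a user on the same server" idea in the quoted message concrete, here is a minimal sketch of one way a balancer could do it: hash the username and use the result to pick a backend. This is not how either setup in this thread is implemented. The backend hostnames and the `pick_backend` helper are made up for illustration.

```python
import hashlib

# Hypothetical backend names; not taken from either poster's setup.
BACKENDS = ["imap1.example.com", "imap2.example.com"]


def pick_backend(username, backends=BACKENDS):
    """Map a username to one backend deterministically."""
    # Normalize so "Guy" and "guy " land on the same server.
    key = username.strip().lower().encode("utf-8")
    # Use a stable hash rather than Python's built-in hash(), which is
    # salted per process and would differ between balancer instances.
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    for user in ("guy", "brad", "timo"):
        print(user, "->", pick_backend(user))
```

A plain modulo like this remaps many users whenever the backend list changes, so it only gives affinity while the pool is stable. A consistent-hashing scheme would reduce that reshuffling.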
As a contrasting data point, we run NFS + random redirects with almost no problems. We host ~7TB of mail for ~45k users with a peak connection count of 10k IMAP connections, and maybe a handful of POP3. We make absolutely no effort to make sure that connections from the same user or IP are routed to the same server.
We do occasionally see index corruption, but it is almost always related to the user going over quota, and Dovecot being unable to write to the logs. If we wanted to solve this problem, we could move the indexes off to a second tier of storage. It is a very minor issue though. Locking has not been a problem at all.
I will say that this may be a situation where you get what you pay for. We've invested a fair amount of money in our storage system (NetApp), server pool (RHEL5), and networking technology (F5 BIG-IP LTM). Our mail is spread across 16 volumes on two filers, and we are careful to stress-test the servers and storage backend before rolling out major upgrades.
That is not, of course, to neglect the value of things that are free - like Dovecot! Many thanks to Timo for maintaining such a wonderful piece of software!
-Brad