On 01/22/2010 12:16 PM, Timo Sirainen wrote:
Looking at the problems with people using NFS it's pretty clear that this solution just isn't going to work properly.
Actually, considering the number of people and servers we're throwing at it, I think it's dealing with it pretty well. I'm sure there are always more tweaks and enhancements that can be done, but look at how much better 1.2 is than the 1.0 releases. It's definitely not "broken," just maybe not quite as production-ready as it could be. Honestly, at this point my users are very happy with the speed increase, and as long as their imap process isn't dying they don't seem to notice the behind-the-scenes corruption because of the self-healing code.
But then again, Dovecot is the only (free) IMAP server that even attempts to support this kind of behavior. Or sure, Courier does too, but disabling index files on Dovecot should get the same stability.
By the way, I didn't want to give the impression that we were unhappy with the product. I think what you've accomplished with Dovecot is great even by non-free enterprise standards, and the level of support you've given us has been excellent; I appreciate it greatly. It was a clear choice for us over Courier once NFS support became a reality. Load averages on the exact same hardware dropped from 5 to 0.5, which is quite amazing, not to mention the speed benefit of the indexes. Our users with extremely large Maildirs were very satisfied.
I see only two proper solutions:
- Change your architecture so that all mail accesses to a specific user go through a single server. Install Dovecot proxy so all IMAP/POP3 connections go through it to the correct server.
We've discussed this internally and are still considering layer 7 username balancing as a possibility, but I haven't worked much on the specifics yet. We've only been running Dovecot for two months, so we wanted to give it some burn-in time and see how things progressed. Now that the core dumps are fixed, I think we might be able to live with the corruption for a while. The only user-visible issue that I was aware of was users' mailboxes disappearing when the processes died, but since that's not happening any more I'll have to see if anyone notices the corruption.
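To make the username balancing idea concrete, here is a minimal sketch in Python of what I have in mind, not something we are running. It maps each username to one backend deterministically, so every session for a given user lands on the same server. The backend hostnames, the `backend_for` function, and the choice of an MD5-based hash are all assumptions for illustration; a real deployment would more likely rely on the proxy's own user lookup.

```python
import hashlib

# Hypothetical backend hostnames, for illustration only.
BACKENDS = ["imap1.example.com", "imap2.example.com", "imap3.example.com"]


def backend_for(username, backends=BACKENDS):
    """Return the backend that should handle every session for this user."""
    # Normalize so "Alice" and "alice " map to the same server.
    normalized = username.strip().lower()
    # Use a stable hash; Python's built-in hash() varies between processes.
    digest = hashlib.md5(normalized.encode("utf-8")).digest()
    # Reduce the first four bytes of the digest to an index into the list.
    index = int.from_bytes(digest[:4], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    for user in ("alice@example.com", "bob@example.com"):
        print(user, "->", backend_for(user))
```

One caveat with plain modulo hashing like this: adding or removing a backend remaps most users, so a lookup table or consistent hashing would probably be needed if the server pool changes often.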
Thanks for all the feedback. I'm going over some of the ideas you suggested and we'll be thinking about long term solutions.