On 01/22/2010 01:15 PM, Brandon Davidson wrote:
We have a very similar setup - 8 POP/IMAP servers running RHEL 5.4, Dovecot 1.2.9 (+ patches), F5 BigIP load balancer cluster (active/standby) in a L4 profile distributing connections round-robin, maildirs on two Netapp Filers (clustered 3070s with 54k RPM SATA disks), 10k peak concurrent connections for 45k total accounts. We used to run with the noac mount option, but performance was abysmal, and we were approaching 80% CPU utilization on the filers at peak load. After removing noac, our CPU is down around 30%, and our NFS ops/sec rate is maybe 1/10th of what it used to be.
Wow, that's almost the exact same setup we use, except we have 10 IMAP/POP and a clustered pair of FAS920's with 10K drives which are getting replaced in a few weeks. We also have a pair of clustered 3050's, but they're not running dovecot (yet).
You're right about noac though, it absolutely destroyed our netapps. Of course the corruption was all but eliminated, but the filer performance was so bad our users immediately noticed. Definitely not an option.
The downside to this is that we've started seeing significantly more crashing and mailbox corruption. Timo's latest patch seems to have fixed the crashing, but the corruption just seems to be the cost of distributing users at random across our backend servers.
Yep, I agree. Like I said in the last email, we're going to deal with it for now and see if anyone really notices. I can live with it if the users don't care.
Timo, speaking of which, I'm guessing everyone is happy with the latest patches. Any ETA on 1.2.10? ;)
We've thought about enabling IP-based session affinity on the load balancer, but this would concentrate the load of our webmail clients, as well as not really solving the problem for users that leave clients open on multiple systems.
We currently have IP session 'sticky' on our L4s and it didn't help all that much. Yes, it reduces thrashing on the backend, but ultimately it won't help the corruption. Like you said, multiple logins will still go to different servers when the IPs are different.
How is your webmail architecture set up? We're using imapproxy to spread them out across the same load balancer, so essentially all traffic from outside and inside gets balanced. The trick is we have an internal load-balanced virtual IP that spreads the load out for webmail on private IP space. If they were to go outside they would get NAT'd as one outbound IP, so we just go inside and get the benefit of balancing.
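To make the affinity point concrete, here is a minimal Python sketch of source-IP stickiness. It is only an illustration under stated assumptions: the backend names, the IP addresses, and the hash-based `pick_backend` function are all hypothetical, and this is not how an F5 or any other L4 balancer actually implements persistence. It shows the two effects discussed above: every session from one source IP (such as a webmail host) lands on a single backend, while the same user connecting from two different IPs has no guarantee of reaching the same backend.

```python
import hashlib

# Hypothetical names for ten IMAP/POP backends; not taken from any real config.
BACKENDS = [f"imap{n:02d}" for n in range(1, 11)]


def pick_backend(client_ip, backends=BACKENDS):
    """Pick a backend from the client IP alone (illustrative stand-in for
    IP-based session affinity; real balancers use their own mechanisms)."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def backends_used(sessions):
    """Return the set of backends a list of (client_ip, user) sessions lands on.
    The user name is ignored on purpose: an IP-keyed balancer never sees it."""
    return {pick_backend(ip) for ip, _user in sessions}


# All webmail users arrive from one (hypothetical) webmail host address.
webmail = [("10.0.0.5", user) for user in ("alice", "bob", "carol")]

# One user with clients left open on two different (hypothetical) networks.
roaming = [("192.0.2.10", "alice"), ("198.51.100.7", "alice")]

print(len(backends_used(webmail)))  # 1: the webmail load concentrates on one backend
print(backends_used(roaming))       # may contain two backends for the same mailbox
```

With a key that the balancer can see (the source IP), nothing ties the two "alice" sessions together, which is why stickiness reduces thrashing without removing the concurrent-access problem.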
Anyway, that's where we're at with the issue. As a data point for your discussion with your boss:
- With 'noac', we would see maybe one or two 'Corrupt' errors a day. Most of these were related to users going over quota.
- After removing 'noac', we saw 5-10 'Corrupt' errors and 20-30 crashes a day. The crashes were highly visible to the users, as their mailbox would appear to be empty until the rebuild completed.
- Since applying the latest patch, we've seen no crashes, and 60-70 'Corrupt' errors a day. We have not had any new user complaints.
That's where we are, and as long as the corruption stays invisible to users, I'm fine with it. Crashes seem to be the only user-visible issue so far, with "noac" being out of the question unless they buy a ridiculously expensive filer.