Re: [Dovecot] quick question

22 Jan 2010

David,
...
-----Original Message-----
From: dovecot-bounces+brandond=uoregon.edu@dovecot.org
[mailto:dovecot-
Our physical setup is 10 Centos 5.4 x86_64 IMAP/POP servers, all with
the same NFS backend where the index, control, and Maildirs for the
users reside. Accessing this are direct connections from clients, plus
multiple squirrelmail webservers, and pine users, all at the same time
with layer4 switch connection load balancing.
Each server has an average of about 400 connections, for a total of
around 4000 concurrent during a normal business day. This is out of a
possible user population of about 15,000.
All our dovecot servers syslog to one machine, and on average I see
about 50-75 instances of file corruption per day. I'm not counting each
line, since some instances of corruption generate a log message for each
uid that's wrong. This is just me counting "user A was corrupted once at
10:00, user B was corrupted at 10:25" for example.
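
The counting rule described above (one incident per user per burst of
log messages, rather than one per line) could be sketched roughly as
follows. The log line shape, the 5-minute window, the default year, and
the name count_incidents are all assumptions for illustration; the real
syslog format on that machine may differ.

```python
import re
from datetime import datetime, timedelta

# Assumed line shape (hypothetical, adjust to the real syslog format):
#   "Jan 22 10:00:01 imap3 dovecot: IMAP(usera): Corrupted index cache file ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+dovecot: "
    r"(?:IMAP|POP3)\((?P<user>[^)]+)\): .*corrupt",
    re.IGNORECASE,
)


def count_incidents(lines, window=timedelta(minutes=5), year=2010):
    """Count corruption incidents, collapsing bursts of lines per user.

    A line starts a new incident only if the same user's previous
    corruption line is more than `window` older. Syslog timestamps
    carry no year, so `year` is supplied by the caller.
    """
    last_seen = {}  # user -> timestamp of that user's latest corruption line
    incidents = 0
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # not a corruption message in the assumed format
        ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
        user = m.group("user")
        prev = last_seen.get(user)
        if prev is None or ts - prev > window:
            incidents += 1
        last_seen[user] = ts
    return incidents
```

Because the window is measured from the most recent line for each user,
a long run of per-uid messages still counts as a single incident.
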
We have a very similar setup - 8 POP/IMAP servers running RHEL 5.4,
Dovecot 1.2.9 (+ patches), F5 BigIP load balancer cluster
(active/standby) in a L4 profile distributing connections round-robin,
maildirs on two NetApp filers (clustered 3070s with 5.4k RPM SATA disks),
10k peak concurrent connections for 45k total accounts. We used to run
with the noac mount option, but performance was abysmal, and we were
approaching 80% CPU utilization on the filers at peak load. After
removing noac, our CPU is down around 30%, and our NFS ops/sec rate is
maybe 1/10th of what it used to be.
The downside to this is that we've started seeing significantly more
crashing and mailbox corruption. Timo's latest patch seems to have fixed
the crashing, but the corruption just seems to be the cost of
distributing users at random across our backend servers.
We've thought about enabling IP-based session affinity on the load
balancer, but this would concentrate the load of our webmail clients, as
well as not really solving the problem for users that leave clients open
on multiple systems. I've done a small bit of looking at nginx's imap
proxy support, but it's not really set up to do what we want, and would
require moving the IMAP virtual server off our load balancers and on to
something significantly less supportable. Having the dovecot processes
'talk amongst themselves' to synchronize things, or go into proxy mode
automatically, would be fantastic.
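
To illustrate the difference between IP-based affinity and what we are
really after, here is a minimal sketch of user-keyed backend selection:
every session for a given login lands on the same server, regardless of
which client or webmail host it comes from. This is only an illustration
of the idea, not something Dovecot 1.2.9 or the load balancer does as
described here; the function name and the backend host names are
hypothetical.

```python
import hashlib


def backend_for_user(username, backends):
    """Pick a backend deterministically from the login name.

    Hashing the (lowercased) username means the same user always maps
    to the same server, so two sessions for one mailbox never touch
    the NFS-backed index files from different hosts at the same time.
    """
    digest = hashlib.sha1(username.lower().encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical pool of eight backends.
backends = [f"imap{i}" for i in range(1, 9)]
print(backend_for_user("usera", backends))
```

One caveat with plain modulo hashing: adding or removing a backend
remaps most users at once. A consistent-hashing scheme or an explicit
user-to-server table would limit that, at the cost of more state.
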
Anyway, that's where we're at with the issue. As a data point for your
discussion with your boss:

With 'noac', we would see maybe one or two 'Corrupt' errors a day. Most
of these were related to users going over quota.
After removing 'noac', we saw 5-10 'Corrupt' errors and 20-30 crashes
a day. The crashes were highly visible to the users, as their mailbox
would appear to be empty until the rebuild completed.
Since applying the latest patch, we've seen no crashes, and 60-70
'Corrupt' errors a day. We have not had any new user complaints.

Hope that helps,
-Brad

Brandon Davidson