[Dovecot] quick question

Brandon Davidson brandond at uoregon.edu
Fri Jan 22 20:15:45 EET 2010


David,

> -----Original Message-----
> From: dovecot-bounces+brandond=uoregon.edu at dovecot.org [mailto:dovecot-
> Our physical setup is 10 CentOS 5.4 x86_64 IMAP/POP servers, all with
> the same NFS backend where the index, control, and Maildirs for the
> users reside. Accessing this are direct connections from clients, plus
> multiple squirrelmail webservers, and pine users, all at the same time
> with layer 4 switch connection load balancing.
> 
> Each server has an average of about 400 connections, for a total of
> around 4000 concurrent during a normal business day. This is out of a
> possible user population of about 15,000.
> 
> All our dovecot servers syslog to one machine, and on average I see
> about 50-75 instances of file corruption per day. I'm not counting each
> line, since some instances of corruption generate a log message for each
> uid that's wrong. This is just me counting "user A was corrupted once at
> 10:00, user B was corrupted at 10:25" for example.
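
Counting per incident rather than per log line can be sketched like this, assuming the corruption log lines have already been parsed into (timestamp, username) pairs, and assuming that lines for the same user less than a minute apart belong to the same incident. The one-minute window and the `count_incidents` helper are hypothetical, not anything Dovecot provides.

```python
from datetime import datetime, timedelta

# Hypothetical input shape: one (timestamp, username) pair per corruption
# log line. Real Dovecot log lines would have to be parsed into this first.
Event = tuple[datetime, str]


def count_incidents(events: list[Event],
                    window: timedelta = timedelta(minutes=1)) -> int:
    """Count corruption incidents, treating consecutive log lines for the
    same user that are no more than `window` apart as one incident."""
    last_seen: dict[str, datetime] = {}
    incidents = 0
    for ts, user in sorted(events):
        prev = last_seen.get(user)
        if prev is None or ts - prev > window:
            incidents += 1  # first line of a new incident for this user
        last_seen[user] = ts  # extend the current incident
    return incidents
```

With three lines for user A at 10:00:00 through 10:00:02 and one line for user B at 10:25, this counts 2 incidents, matching the "user A once at 10:00, user B at 10:25" way of counting.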

We have a very similar setup: 8 POP/IMAP servers running RHEL 5.4,
Dovecot 1.2.9 (+ patches), F5 BigIP load balancer cluster
(active/standby) in a L4 profile distributing connections round-robin,
maildirs on two NetApp filers (clustered 3070s with 54k RPM SATA disks),
10k peak concurrent connections for 45k total accounts. We used to run
with the noac mount option, but performance was abysmal, and we were
approaching 80% CPU utilization on the filers at peak load. After
removing noac, filer CPU utilization is down to around 30%, and our NFS
ops/sec rate is maybe a tenth of what it used to be.

The downside to this is that we've started seeing significantly more
crashing and mailbox corruption. Timo's latest patch seems to have fixed
the crashing, but the corruption just seems to be the cost of
distributing users at random across our backend servers.

We've thought about enabling IP-based session affinity on the load
balancer, but this would concentrate the load from our webmail servers
onto a few backends, as well as not really solving the problem for users
that leave clients open
on multiple systems. I've looked briefly at nginx's IMAP proxy support,
but it's not really set up to do what we want, and would
require moving the IMAP virtual server off our load balancers and on to
something significantly less supportable. Having the dovecot processes
'talk amongst themselves' to synchronize things, or go into proxy mode
automatically, would be fantastic.
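
The kind of per-user affinity a proxy layer could provide might look roughly like this minimal sketch, assuming a hypothetical pool of eight backends named imap01 through imap08. Hashing the login name sends every session for a given user to the same backend, no matter which client or IP it comes from. The backend names and the `backend_for_user` helper are illustrative only; this is not how Dovecot, nginx, or the BigIP actually route connections.

```python
import hashlib

# Hypothetical backend pool (names are illustrative only).
BACKENDS = [f"imap{n:02d}" for n in range(1, 9)]


def backend_for_user(username: str, backends: list[str] = BACKENDS) -> str:
    """Deterministically map a login name to one backend, so webmail,
    desktop clients, and pine sessions for the same user share a server."""
    # Normalize case so "Alice" and "alice" land on the same backend.
    digest = hashlib.sha256(username.lower().encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer and reduce modulo
    # the pool size to pick a backend index.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]
```

An L4 balancer can't do this on its own, since the username only appears in the IMAP/POP login after the TCP connection has already been assigned to a backend; something has to terminate the session first. Plain modulo hashing also moves most users to a different backend whenever a server is added or removed, so consistent hashing or a lookup table may be preferable in practice.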

Anyway, that's where we're at with the issue. As a data point for your
discussion with your boss:
* With 'noac', we would see maybe one or two 'Corrupt' errors a day. Most
of these were related to users going over quota.
* After removing 'noac', we saw 5-10 'Corrupt' errors and 20-30 crashes
a day. The crashes were highly visible to the users, as their mailbox
would appear to be empty until the rebuild completed.
* Since applying the latest patch, we've seen no crashes, and 60-70
'Corrupt' errors a day. We have not had any new user complaints.

Hope that helps,

-Brad

