[Dovecot] High Load Average on POP/IMAP.

Stan Hoeppner stan at hardwarefreak.com
Wed Aug 21 15:30:06 EEST 2013


On 8/21/2013 5:37 AM, Kavish Karkera wrote:

> We have a serious issue on our POP/IMAP servers these days. The load average of the servers
> spikes to 400-500, as reported by uptime, for a particular time period, mostly around noon
> and in the evening, but it lasts only a few minutes.
> 
> We have 2 servers running dovecot 1.1.20 behind a load balancer. We use KEEPLIVE (1.1.13) for
> load balancing.
> 
> Server specification.
> Operating System : CentOS 5.5 64bit
> CPU cores : 16
> RAM : 8GB
> 
> Mail and indexes are mounted on NFS (NetApp).
...

> Cpu(s):  8.3%us,  7.6%sy,  0.0%ni,  8.3%id, 75.0%wa,  0.0%hi,  0.8%si,  0.0%st
                                              ^^^^^^^
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                                 
>   408 mysql     18   0  384m  38m 4412 S 52.8  0.2  42221:44 mysqld                                                                   

This doesn't seem to be a dovecot issue.  mysql apparently has 8 (or
more) threads on 8 cores, all blocking on IO.  I see a few possible causes.
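
One way to confirm the blocked threads: on Linux the load average counts
tasks in uninterruptible sleep (state D, typically waiting on disk or NFS
I/O) as well as runnable ones, so a pile of threads stuck on the filer can
push load to 400+ while the CPUs do little real work.  Below is a rough
Python 3 sketch, not a Dovecot or MySQL tool, that assumes the standard
Linux /proc layout and counts D-state threads per command name.  Note the
stock CentOS 5 Python is older than 3, so it would need a newer interpreter.
If mysqld dominates the output during a spike, that supports the IO theory.

```python
import os
from collections import Counter


def parse_stat(text):
    """Return (comm, state) from the text of a /proc/<pid>/task/<tid>/stat file.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so locate it via the first '(' and the last ')'.
    """
    start = text.index("(")
    end = text.rindex(")")
    return text[start + 1:end], text[end + 2]  # state letter follows ") "


def count_d_state(proc="/proc"):
    """Count threads in uninterruptible sleep ('D'), grouped by command name."""
    counts = Counter()
    try:
        pids = [p for p in os.listdir(proc) if p.isdigit()]
    except OSError:
        return counts  # no /proc here (not Linux)
    for pid in pids:
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    comm, state = parse_stat(f.read())
            except (OSError, ValueError, IndexError):
                continue  # thread exited or file unreadable
            if state == "D":
                counts[comm] += 1
    return counts


if __name__ == "__main__":
    # Run this several times during a load spike; one sample is only a snapshot.
    for comm, n in count_d_state().most_common(10):
        print("%5d  %s" % (n, comm))
```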

1.  The NetApp is unable to keep up with the request rate because:

   a.  There are too few spindles in the RAID set backing this NFS
volume and/or the file(s) aren't properly striped across all spindles.

   b.  An inappropriate RAID level.  The mysql job is apparently doing
large table updates and you're experiencing massive read-modify-write
(RMW) latency from RAID5/6.  This is why one should never put a
transactional database, or one that sees large frequent table updates,
on a parity RAID volume--unless the disks are SSD.  SSDs have no
mechanical parts, so there is no seek or rotational delay and the RMW
penalty is very small.

2.  Apparently 8 (or more) threads are concurrently accessing the same
file or files, so the massive iowait could simply be the result of
filesystem and/or NFS locking, NFS client caching issues, etc.

These are the typical causes.  The massive iowait could stem from one
or all of the above, or from something else entirely.
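
To put rough numbers on item 1b: the usual rule of thumb is that each small
random write costs 2 back-end I/Os on RAID 10, 4 on RAID 5 (read data, read
parity, write data, write parity) and 6 on RAID 6.  The sketch below just
applies that rule; the disk count and per-disk IOPS in it are hypothetical
placeholders, not figures from this NetApp.  Treat the result as a worst
case: arrays that stage writes in NVRAM and lay down full stripes, as
NetApp's WAFL is designed to do, can hide much of the penalty.

```python
# Rule-of-thumb back-end I/Os per small random write, by RAID level.
# RAID 10: write both mirror copies.  RAID 5: read data + read parity,
# then write data + write parity.  RAID 6: same as RAID 5, two parity blocks.
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def random_write_iops(spindles, iops_per_disk, level):
    """Approximate small random-write IOPS a RAID set can sustain."""
    return spindles * iops_per_disk / WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical example: 14 drives at roughly 180 IOPS each.
    for level in ("raid10", "raid5", "raid6"):
        print(level, round(random_write_iops(14, 180, level)))
```

With those hypothetical numbers the same 14 disks deliver about 1260, 630
and 420 random write IOPS respectively.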

You seem to have a database job, scheduled to run twice daily, that
triggers the problem.  Identify this job; figure out what it does, why
it does it, and how necessary it is; and determine whether it can be
scheduled to run at off-peak hours.  If it can, you may want to simply
do so, as fixing the IO latency problem may be expensive in hardware
and/or labor dollars.
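
If the job is driven by cron, one quick way to shortlist candidates is to
list entries whose hour field fires during the spike windows.  This is a
rough sketch assuming standard 5-field cron syntax (numbers, '*', ranges,
lists, '/' steps); it skips @-shortcuts such as @hourly, and the peak hours
and job path in the example are hypothetical.  The job could just as well
live in MySQL's event scheduler or an application timer, so an empty result
proves nothing.

```python
def expand_field(field, lo, hi):
    """Expand a cron field ('*', '5', '1-5', '*/15', '0,30') into a set of ints.

    Steps are honored on '*' and on ranges; a bare number matches only itself.
    Raises ValueError on anything unparseable or out of range.
    """
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        if step < 1 or start < lo or end > hi:
            raise ValueError("bad cron field: %r" % field)
        values.update(range(start, end + 1, step))
    return values


def jobs_in_window(lines, peak_hours):
    """Yield crontab lines whose hour field fires in any of peak_hours."""
    for line in lines:
        line = line.strip()
        # Skip blanks, comments and @-shortcuts (not handled by this sketch).
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        fields = line.split()
        if len(fields) < 6:
            continue  # e.g. MAILTO=root or other non-schedule lines
        try:
            expand_field(fields[0], 0, 59)  # validate the minute field
            hours = expand_field(fields[1], 0, 23)
        except ValueError:
            continue
        if hours & peak_hours:
            yield line


if __name__ == "__main__":
    sample = [
        "MAILTO=root",
        "0 12,18 * * * root /usr/local/bin/db_summary.sh",  # hypothetical job
        "30 3 * * * root /usr/local/bin/backup.sh",         # hypothetical job
    ]
    # Hypothetical noon and evening windows; adjust to the observed spikes.
    for job in jobs_in_window(sample, {11, 12, 13, 17, 18, 19}):
        print(job)  # prints only the db_summary.sh line
```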

-- 
Stan
