[Dovecot] Ongoing performance issues with 2.0.x

Stan Hoeppner stan at hardwarefreak.com
Tue Nov 9 10:31:59 EET 2010


Ralf Hildebrandt put forth on 11/8/2010 12:44 PM:
> * Stan Hoeppner <stan at hardwarefreak.com>:
> 
>> Does this machine have more than 4GB of RAM?  You do realize that merely
>> utilizing PAE will cause an increase in context switching, whether on
>> bare metal or in a VM guest.  It will probably actually be much higher
>> with a VM guest running a PAE kernel.  Also, please tell me the ESX
>> kernel you're running is native 64 bit, not 32 bit.  If the VMware
>> kernel itself is doing PAE, as well as the guest Linux kernel, this may
>> fully explain the performance disaster you have on your hands, if it is
>> indeed due to context switching.
> 
> It sure works with 1.2.x now, so that's not really the problem

I'm not so sure we can make that assumption.  I'm leaning toward
something other than context switches, since those are always very high
under VMware.
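
If you want numbers rather than a hunch, Linux exposes per-process
context switch counters in /proc/<pid>/status.  A rough sketch of
reading them follows.  The helper names are mine, and it assumes a guest
kernel new enough to report the voluntary_ctxt_switches and
nonvoluntary_ctxt_switches fields:

```python
import re

def parse_ctxt_switches(status_text):
    """Return (voluntary, nonvoluntary) counts from /proc/<pid>/status text."""
    counts = {}
    pattern = r"(voluntary_ctxt_switches|nonvoluntary_ctxt_switches):\s+(\d+)"
    for line in status_text.splitlines():
        m = re.match(pattern, line)
        if m:
            counts[m.group(1)] = int(m.group(2))
    # Fields missing on older kernels are reported as 0.
    return (counts.get("voluntary_ctxt_switches", 0),
            counts.get("nonvoluntary_ctxt_switches", 0))

def read_ctxt_switches(pid):
    """Read the counters for a live process (hypothetical helper)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_ctxt_switches(f.read())
```

Sampling the same Dovecot process twice, a few seconds apart, and
comparing the deltas under 1.2.x and 2.0.x would show whether the switch
rate really differs between the two.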

>> The bigger question is, why does this problem surface so readily while
>> running Dovecot 2.0.x and not while running Dovecot 1.2.x?
> 
> EXACTLY
> 
>> Is 1.2.x merely tickling the dragon's chin, whereas 2.0.x is sticking
>> its head into the dragon's mouth?
> 
> I'd say the difference between 1.2 and 2.0 is so dramatic that it's
> probably something else.

Given what we know, that the increase in CPU time is in guest kernel
space, or at least appears to be, I'm guessing that Dovecot 2.x is making
one or more system or library calls that your kernel struggles with for
an extended time before eventually returning.  Your best bet, I think,
is to put a trace on each Dovecot process and find which one(s) wait the
longest for system calls to return.  Once you know which process is
triggering the problem you can start to narrow down the code segment,
obviously with Timo's help.  I'm starting to get out of my element at
this point.
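
One way to do that is to attach strace with something like
"strace -f -T -o dovecot.trace -p <pid>" and add up the per-call times
that -T appends to each line.  Below is a rough sketch of such a
summary.  The file name, the function name, and the exact line layout (a
PID prefix, then the call, then "<seconds>" at the end) are my
assumptions about the output, so adjust the regex if yours looks
different:

```python
import re
from collections import defaultdict

# Matches completed calls such as
#   1234 read(7, "x", 4096) = 1 <0.000100>
# and the second half of split calls such as
#   1234 <... epoll_wait resumed> {}, 64, 1000) = 0 <1.000500>
LINE_RE = re.compile(
    r"^(?P<pid>\d+)\s+"
    r"(?:<\.\.\.\s+(?P<resumed>\w+)\s+resumed>|(?P<name>\w+)\()"
    r".*<(?P<secs>\d+\.\d+)>\s*$"
)

def summarize(lines):
    """Total time per (pid, syscall), largest total first."""
    totals = defaultdict(float)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # unfinished calls, signals, exit notices
        name = m.group("name") or m.group("resumed")
        totals[(int(m.group("pid")), name)] += float(m.group("secs"))
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    with open("dovecot.trace") as f:
        for (pid, name), secs in summarize(f)[:20]:
            print(f"{pid:>7} {name:<20} {secs:10.6f}s")
```

Keep in mind that long totals for blocking calls such as epoll_wait
usually just mean the process was idle.  It's the calls that should
return quickly but don't that are interesting.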

>> This very well may be the case.  You need to also look at the CONFIG_HZ=
>> value of the Linux kernel of the guest.  If it's a tickless kernel you
>> should be fine.  If tickless, IIRC, you should see CONFIG_NO_HZ=y.
> 
> # fgrep HZ config-2.6.32-23-generic-pae
> CONFIG_NO_HZ=y
> # CONFIG_HZ_100 is not set
> CONFIG_HZ_250=y
> # CONFIG_HZ_300 is not set
> # CONFIG_HZ_1000 is not set
> CONFIG_HZ=250
> CONFIG_MACHZ_WDT=m

Both tickless and 250 are configured there.  As I understand it, with
CONFIG_NO_HZ=y the periodic tick is only suppressed while a CPU is idle;
a busy CPU still ticks at CONFIG_HZ, so 250 is the rate that matters
under load.  That should still be fine.  It will generate at most in the
neighborhood of 2000 timer interrupts/sec with 8 vCPUs (8 x 250), which
is the same as a "workstation" kernel on two vCPUs configured with
CONFIG_HZ=1000 (2 x 1000).
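
The arithmetic behind that comparison, as a small sketch.  The function
names are mine, and it assumes every vCPU takes a periodic tick at the
full CONFIG_HZ rate, which is an upper bound; a tickless kernel takes
fewer on idle vCPUs:

```python
def parse_kernel_config(text):
    """Map option names to values, skipping comments and unset options."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_HZ_100 is not set" lands here
        key, sep, value = line.partition("=")
        if sep:
            opts[key] = value
    return opts

def max_timer_interrupts_per_sec(hz, vcpus):
    """Upper bound on periodic timer interrupts across all vCPUs."""
    return hz * vcpus

config = """\
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MACHZ_WDT=m
"""

opts = parse_kernel_config(config)
hz = int(opts["CONFIG_HZ"])
print("tickless:", opts.get("CONFIG_NO_HZ") == "y")
print("8 vCPUs at", hz, "Hz:", max_timer_interrupts_per_sec(hz, 8))
print("2 vCPUs at 1000 Hz:", max_timer_interrupts_per_sec(1000, 2))
```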

-- 
Stan

