Ralf Hildebrandt put forth on 11/8/2010 12:44 PM:
- Stan Hoeppner <stan@hardwarefreak.com>:
Does this machine have more than 4GB of RAM? You do realize that merely utilizing PAE will cause an increase in context switching, whether on bare metal or in a VM guest. It will probably be much higher with a VM guest running a PAE kernel. Also, please tell me the ESX kernel you're running is native 64 bit, not 32 bit. If the VMware kernel itself is doing PAE, as well as the guest Linux kernel, this may fully explain the performance disaster you have on your hands, if it is indeed due to context switching.
It sure works with 1.2.x now, so that's not really the problem.
I'm not so sure we can make that assumption. I'm leaning toward something other than context switches, as they are obviously very high with VMWare, always.
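To put a number on the context switching inside the guest, something like the following could be used. It is only a sketch: it assumes a Linux guest that exposes the cumulative "ctxt" counter in /proc/stat, and the function names ctxt_count and ctxt_rate are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Dovecot or VMware tooling.

# Print the cumulative context switch count since boot.
# Reads /proc/stat by default; another file can be passed as $1.
ctxt_count() {
    awk '$1 == "ctxt" { print $2 }' "${1:-/proc/stat}"
}

# Print the average number of context switches per second
# over an interval in seconds (default 5).
ctxt_rate() {
    interval=${1:-5}
    before=$(ctxt_count)
    sleep "$interval"
    after=$(ctxt_count)
    echo $(( (after - before) / interval ))
}

# Usage sketch: run once under 1.2.x and once under 2.0.x at similar load.
#   ctxt_rate 10
```

Comparing the two figures at similar load would show whether the context switch rate really differs between the two Dovecot versions. The "cs" column of vmstat reports the same kind of number.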
The bigger question is, why does this problem surface so readily while running Dovecot 2.0.x and not while running Dovecot 1.2.x?
EXACTLY
Is 1.2.x merely tickling the dragon's chin, whereas 2.0.x is sticking its head into the dragon's mouth?
I'd say the difference between 1.2 and 2.0 is so dramatic that it's probably something else.
Given what we know, that the increase in CPU time is in guest kernel space, or at least appears so, I'm guessing that Dovecot 2.x is making one or more system or library calls that your kernel spends an extended time on before eventually returning. Your best bet, I'm thinking, is to put a trace on each Dovecot process and find which one(s) are waiting the longest for system call returns. Once you know which process is triggering the problem you can start to narrow down the code segment, obviously with Timo's help. I'm starting to get out of my element at this point.
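As a rough sketch of the tracing idea: strace -T appends the time spent in each system call as <seconds> at the end of every line, so the traces can be summed per call. The helper name syscall_time_summary and the /tmp file names are assumptions for illustration, and the pgrep pattern assumes the process titles contain "dovecot".

```shell
#!/bin/sh
# Hypothetical helper: read `strace -T` output on stdin and print
# "total_seconds syscall" lines, slowest system call first.
syscall_time_summary() {
    awk '
        # Only lines ending in <seconds> carry a timing from strace -T
        match($0, /<[0-9.]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            name = $0
            sub(/\(.*/, "", name)   # keep the syscall name before the first paren
            total[name] += secs
        }
        END {
            for (n in total) printf "%.6f %s\n", total[n], n
        }
    ' | sort -rn
}

# Usage sketch (as root; stop each strace with Ctrl-C after a while):
#   for pid in $(pgrep -f dovecot); do
#       strace -T -p "$pid" -o "/tmp/dovecot.$pid.trace" &
#   done
#   syscall_time_summary < /tmp/dovecot.1234.trace
```

For a quick first look, strace -c -p PID prints a similar per-call summary on its own when it detaches.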
This very well may be the case. You need to also look at the CONFIG_HZ= value of the Linux kernel of the guest. If it's a tickless kernel you should be fine. If tickless, IIRC, you should see CONFIG_NO_HZ=y.
# fgrep HZ config-2.6.32-23-generic-pae
CONFIG_NO_HZ=y
# CONFIG_HZ_100 is not set
CONFIG_HZ_250=y
# CONFIG_HZ_300 is not set
# CONFIG_HZ_1000 is not set
CONFIG_HZ=250
CONFIG_MACHZ_WDT=m
I can't tell from that output which one is in effect, since both tickless and 250 are configured. If it's 250 that should still be fine. That will generate in the neighborhood of 2000 timer interrupts/sec with 8 vCPUs, which is the same as a "workstation" kernel on two vCPUs, which would be configured with CONFIG_HZ=1000.
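The arithmetic behind those figures is simply the tick rate times the number of vCPUs. A tiny sketch, with timer_irqs_per_sec being a made-up name and the result an upper bound that a tickless kernel would undercut on idle vCPUs:

```shell
#!/bin/sh
# Hypothetical helper: rough timer interrupts per second,
# given CONFIG_HZ as $1 and the number of vCPUs as $2.
timer_irqs_per_sec() {
    echo $(( $1 * $2 ))
}

timer_irqs_per_sec 250 8    # server kernel, 8 vCPUs: prints 2000
timer_irqs_per_sec 1000 2   # "workstation" kernel, 2 vCPUs: prints 2000
```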
-- Stan