- Ralf Hildebrandt Ralf.Hildebrandt@charite.de:
I'm getting constantly high numbers of page reclaims & involuntary context switches for dovecot/auth.
Page reclaims = minor faults = the CPU switching back to system mode. But why is the auth process doing that so excessively? The same goes for the large number of involuntary context switches...
Some additions:
The last time we had 2.0 running, we got into serious trouble, which was also visible on the VMware ESX side. The CPU load was constantly at about 95%, and inside the VM top showed the processes spending most of their time in kernel space (system time).
This time the load was not that high in the morning, although the processes were still in kernel space too often. But as long as the load does not get too high, ESX shows no problems; the stats even went up and down (which they did not do the last time, when we had the real problems and they just stayed flat at a high level).
Of course we could test it during the midday peak, but then the mail system starts to stumble under the high load and users might complain. We also have no real test scenario, because it is not easy to put "real" pressure on the machine, so we have to test in production. But I cannot switch on 2.0 permanently, as this would cause too many problems.
Anyway, even when it runs without causing problems on the ESX side, we can see the processes in kernel space. They stay there far too long, and Ralf seems to have found the reason: too many page faults. That's all we can say for now.
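For anyone who wants to look at the same counters, here is a minimal sketch (not something we ran on our system) of reading minor faults and involuntary context switches through getrusage() in Python. The function names fault_and_switch_counts and delta are made up for illustration, and the script only measures the process that calls it. For a running dovecot/auth process one would have to read the per-process counters from the kernel instead, for example under /proc on Linux.

```python
import resource


def fault_and_switch_counts(who=resource.RUSAGE_SELF):
    """Return the page-fault and context-switch counters for this process."""
    usage = resource.getrusage(who)
    return {
        "minor_faults": usage.ru_minflt,        # page reclaims, no disk I/O needed
        "major_faults": usage.ru_majflt,        # faults that required I/O
        "voluntary_ctx_switches": usage.ru_nvcsw,
        "involuntary_ctx_switches": usage.ru_nivcsw,
    }


def delta(before, after):
    """Difference between two counter snapshots taken with the function above."""
    return {key: after[key] - before[key] for key in before}


if __name__ == "__main__":
    before = fault_and_switch_counts()
    # Illustrative workload: touching freshly allocated memory page by page
    # usually causes minor faults (the exact count depends on the system).
    buf = bytearray(64 * 1024 * 1024)
    for offset in range(0, len(buf), 4096):
        buf[offset] = 1
    after = fault_and_switch_counts()
    print(delta(before, after))
```

Taking a snapshot before and after a piece of work shows how many faults and switches that work caused, which is easier to interpret than the absolute totals.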
Regards,
Udo
Udo Wolter
Geschäftsbereich IT | Abt. System
Charité - Universitätsmedizin Berlin
Campus Benjamin Franklin
Hindenburgdamm 30 | D-12203 Berlin
Tel. +49 30 450 570847 | Fax +49 30 450 7570600
Udo.Wolter@Charite.de | http://www.charite.de