On 7/11/2011 2:57 PM, Matthew Macdonald-Wallace wrote:
On Mon, 2011-07-11 at 13:47 -0500, Stan Hoeppner wrote:
On 7/11/2011 11:22 AM, lists@truthisfreedom.org.uk wrote:
They're showing as between 20 and 24 for the POP3 servers and 1.4 for the IMAP servers.
FULL STOP. Oh my lordy. Something is ridiculously wrong here. You have 12 physical cores with only ~600 simultaneous POP3 connections. That's only 50 per core. Even if those are the 'lowly' 2.4GHz 5645 chips, each core should be able to handle a couple hundred POP3 connections. If you were truly hitting an actual load of 20-24, a single one of those boxes would be bringing your NetApp to its knees (assuming GbE) due to the amount of I/O that would be taking place with the CPUs this busy.
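To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted in this thread (12 physical cores, ~600 simultaneous POP3 connections, a reported load of 20-24). The function names are made up for illustration and are not part of any tool.

```python
def connections_per_core(connections: int, cores: int) -> float:
    """Average number of simultaneous connections each core has to serve."""
    return connections / cores


def load_per_core(load_average: float, cores: int) -> float:
    """Load average divided by core count.

    A value above 1.0 means more tasks are runnable (or, on Linux,
    blocked in uninterruptible sleep) than there are cores.
    """
    return load_average / cores


if __name__ == "__main__":
    cores = 12  # physical cores, as stated in the thread
    print("connections per core:", connections_per_core(600, cores))
    for load in (20.0, 24.0):  # the reported load range
        print("load", load, "-> per core:", round(load_per_core(load, cores), 2))
```

So the reported load works out to somewhere between roughly 1.7 and 2 waiting tasks per physical core, for a workload of about 50 connections per core.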
Good, so my assumption that something was wrong was correct and as the NetApp isn't on its knees...
So a kernel update is more than sensible...
Disable HT regardless of whether you upgrade the kernel. See if it helps the load issue with the current kernel. Then go ahead and upgrade the kernel. If the CentOS repos don't have anything in the 2.6.3x series, grab: http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.39.3.tar.bz2
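If you want to confirm whether HT is actually on before and after changing the BIOS setting, something like the following Python sketch may help. It relies on the usual x86 heuristic that /proc/cpuinfo reports more "siblings" than "cpu cores" per package when Hyper-Threading is active. That heuristic is my assumption here; some kernels and VM guests omit these fields, in which case the function simply reports False. The function name is hypothetical.

```python
import os


def hyperthreading_enabled(cpuinfo_text: str) -> bool:
    """Return True if any processor entry reports more siblings than cores."""
    # /proc/cpuinfo separates per-processor entries with a blank line.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        try:
            siblings = int(fields["siblings"])
            cores = int(fields["cpu cores"])
        except (KeyError, ValueError):
            continue  # fields missing or unparsable; skip this entry
        if siblings > cores:
            return True
    return False


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as f:
            print("HT appears enabled:", hyperthreading_enabled(f.read()))
    else:
        print(path, "not found; this check only applies to Linux")
```

Run it once on the current kernel, disable HT, reboot, and run it again before comparing load figures.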
Are these virtual machines? You didn't state so previously. Running 2.6.18 as a VM guest on these machines may also be part of the incorrect load reporting problem. If so, run the data collector daemon inside the hypervisor itself so you get actual load figures. You'll never get accurate performance metrics for a whole box from a kernel/daemon inside a VM guest.
-- Stan