On 03/08/2011 09:26 AM, Chris Wilson wrote:
Hi Thierry,
On Tue, 8 Mar 2011, Thierry de Montaudry wrote:
On 08 Mar 2011, at 13:24, Chris Wilson wrote:
top - 11:10:14 up 14 days, 12:04,  2 users,  load average: 55.04, 29.13, 14.55
Tasks: 474 total,  60 running, 414 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.6%us,  0.3%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:  16439812k total, 16353268k used,    86544k free,    33268k buffers
Swap:  4192956k total,      140k used,  4192816k free,  8228744k cached
As you can see from the numbers (55.04, 29.13, 14.55), the load was still climbing when I took this snapshot, and this is not a normal situation. Usually this machine's load is only between 1 and 4, which is quite OK for a quad core. It only happens when dovecot starts generating errors and pop3, imap and http get stuck. The load went up to 200, yet I was still able to stop the web and mail daemons, restart them, and everything was back to normal.
I don't have a definite answer, but I remember that there has been a long-running bug in the Linux kernel with schedulers behaving badly under heavy writes:
"One of the problems commonly talked about in our forums and elsewhere is the poor responsiveness of the Linux desktop when dealing with significant disk activity on systems where there is insufficient RAM or the disks are slow. The GUI basically drops to its knees when there is too much disk activity..." [http://www.phoronix.com/scan.php?page=news_item&px=ODQ3Mw] (note, it's not just the GUI, all other tasks can starve when a disk I/O queue builds up).
"There are a few options to tune the linux IO scheduler that can help a bunch... Typically CFQ stalls too long under heavy writes, especially if your disk subsystem sucks, so particularly if you have several spindles deadline is worth a try." [http://communities.vmware.com/thread/82544]
"I run Ubuntu on a moderately powerful quad-core x86-64 system and the desktop response is basically crippled whenever something is reading or writing large files as fast as it can (at normal priority)... For example, cat /path/to/LARGE_FILE> /dev/null ... Everything else gets completely unusable because of the I/O latency." [https://bugs.launchpad.net/ubuntu/+source/linux/+bug/343371]
"I was just running mkfs.ext4 -b 4096 -E stride=128 -E stripe-width=128 -O ^has_journal /dev/sdb2 on my SSD18M connected via USB1.1, and the result was, well, absolutely, positively _DEVASTATING_. The entire system became _FULLY_ unresponsive, not even switching back down to tty1 via Ctrl-Alt-F1 worked (took 20 seconds for even this key to be respected)." [http://lkml.org/lkml/2010/4/4/86]
"This regression has been around since about the 2.6.18 timeframe and has eluded a lot of testing to isolate the root cause. The most promising fix is in the VM subsystem (mm) where the LRU scan has been changed to favor keeping executable pages active longer. Most of these symptoms come down to VM thrashing to make room for I/O pages. The key change/commit is ab4754d24a0f2e05920170c845bd84472814c6, "vmscan: make mapped executable pages the first class citizen"... This change was merged into the 2.6.31r1 kernel." [https://bugs.launchpad.net/ubuntu/+source/linux/+bug/131094/comments/235]
One possible cause is that writing to a slow device can block the write queue for other devices, causing the machine to come to a standstill when there's plenty of useful work that it could be doing.
This could cause a cascading failure in your server as soon as disk I/O write load goes over a certain point, a bit like a swap death. I'm not sure if the fact that you're using NFS makes a difference; perhaps only if you memory-map files?
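If you want to confirm that this is what happens when the load spikes, a rough sketch like the following should show it (standard procfs paths and procps flags, nothing specific to your setup); processes stuck in uninterruptible sleep plus dirty data backing up are the signature of an I/O stall rather than real CPU load:

    # Processes in uninterruptible ('D') sleep are blocked on disk or NFS I/O;
    # the load average counts them as runnable, so a long list here during a
    # spike points at an I/O stall rather than CPU work.
    ps -eo state,pid,wchan:30,cmd | awk '$1 == "D"'

    # Dirty data queued for writeback; large numbers that refuse to shrink
    # suggest writes are backing up behind a slow device.
    grep -E '^(Dirty|Writeback):' /proc/meminfo

    # The 'wa' column is the share of CPU time spent waiting for I/O.
    vmstat 5 3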
You could test this by booting with the NOOP or anticipatory scheduler instead of the default CFQ to see if it makes any difference.
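For what it's worth, the scheduler can also be selected at boot with the elevator= kernel parameter; the GRUB file below is only an example of where it would go, so adjust it for whichever bootloader and kernel version you actually run:

    # Add elevator=noop (or elevator=deadline) to the kernel command line to
    # make it the default I/O scheduler for all block devices at boot. In a
    # classic GRUB menu.lst the kernel line would look roughly like this
    # (paths are illustrative, not taken from your system):
    #   kernel /vmlinuz-2.6.32 root=/dev/sda1 ro elevator=noop
    grep -n 'kernel' /boot/grub/menu.lst

    # After rebooting, confirm the parameter took effect.
    cat /proc/cmdline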
Cheers, Chris.
You can change it on the fly with:
echo noop > /sys/block/${DEVICE}/queue/scheduler
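For completeness, a rough sketch of checking what the kernel offers and applying the change to every disk (the device names are placeholders, and the change does not survive a reboot):

    # The scheduler shown in brackets is the one currently in use.
    cat /sys/block/sda/queue/scheduler
    # e.g. noop anticipatory deadline [cfq]

    # Switch one device on the fly (run as root).
    echo deadline > /sys/block/sda/queue/scheduler

    # Or loop over all sd* disks; pair this with elevator=... on the kernel
    # command line if you want it to persist across reboots.
    for q in /sys/block/sd*/queue/scheduler; do
        echo deadline > "$q"
    done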
--
-Eric 'shubes'