On May 20 16:29, Urban Loesch wrote:
I checked my kernel and the patch mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=681578
(comment 31) is not applied. It comes in version 3.0.30 and 3.2.17.
I will see what tomorrow happens under more load. If I have the problem again, I give 3.2.17 a chance.
We've seen similar behavior on a similar system with a similar workload.
We've tried 3.0.31, which already includes the upstream epoll patch, without seeing a difference. Right now we're running 3.3.7 with vs2.3.3.4, and this has reduced the problem quite a bit, but not eliminated it completely.
Stracing the processes that end up in D state, starting before they hang, has just revealed something interesting, however: it points to an issue with inotify rather than epoll.
[snip]
[...]
15414 23:27:36 inotify_init() = 12 <0.000024>
[...]
15414 23:27:36 close(12 <unfinished ...>
15414 23:28:51 <... close resumed> ) = 0 <74.593917>
15414 23:28:51 close(9 <unfinished ...>
15414 23:28:51 <... close resumed> ) = 0 <0.000080>
15414 23:28:51 exit_group(0) = ?
[/snip]
In short, as far as we can tell, all the processes in D state appear to be waiting to close the file handle they got from their inotify_init(), and eventually all these close()s go through almost simultaneously.
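For anyone who wants to check their own traces for the same pattern, here is a rough sketch of how the slow close() calls could be picked out of a log. It assumes strace output in the format shown above (PID prefix, wall-clock timestamp, and per-call duration in angle brackets, as produced by something like strace -f -t -T). The function names, the regular expressions and the 1-second threshold are my own illustrative choices, not part of any existing tool.

```python
import re

# Completed call, e.g. "15414 23:27:36 inotify_init() = 12 <0.000024>"
COMPLETE = re.compile(r"^(\d+) (\S+) (\w+)\((.*)\)\s+= (\S+)(?: <([\d.]+)>)?$")
# Interrupted call, e.g. "15414 23:27:36 close(12 <unfinished ...>"
UNFINISHED = re.compile(r"^(\d+) (\S+) (\w+)\((.*?)\s*<unfinished \.\.\.>$")
# Continuation, e.g. "15414 23:28:51 <... close resumed> ) = 0 <74.593917>"
RESUMED = re.compile(
    r"^(\d+) (\S+) <\.\.\. (\w+) resumed>\s*(.*)\)\s+= (\S+)(?: <([\d.]+)>)?$"
)


def parse_strace(lines):
    """Yield (pid, syscall, args, result, seconds) for each finished call."""
    pending = {}  # (pid, syscall) -> argument text seen before the interruption
    for raw in lines:
        line = raw.strip()
        m = UNFINISHED.match(line)
        if m:
            pid, _, name, args = m.groups()
            pending[(pid, name)] = args
            continue
        m = RESUMED.match(line)
        if m:
            pid, _, name, rest, result, secs = m.groups()
            args = pending.pop((pid, name), "") + rest
        else:
            m = COMPLETE.match(line)
            if not m:
                continue  # "[...]" markers and anything else we don't recognise
            pid, _, name, args, result, secs = m.groups()
        yield pid, name, args.strip(), result, float(secs) if secs else None


def slow_inotify_closes(lines, threshold=1.0):
    """Return (pid, fd, seconds) for close() calls on inotify fds that took
    at least `threshold` seconds (the threshold is an arbitrary assumption)."""
    inotify_fds = set()
    slow = []
    for pid, name, args, result, secs in parse_strace(lines):
        if name in ("inotify_init", "inotify_init1") and result.isdigit():
            inotify_fds.add((pid, result))
        elif name == "close" and (pid, args) in inotify_fds:
            inotify_fds.discard((pid, args))
            if secs is not None and secs >= threshold:
                slow.append((pid, int(args), secs))
    return slow


# The excerpt from this mail, used as sample input.
SAMPLE = [
    "15414 23:27:36 inotify_init() = 12 <0.000024>",
    "15414 23:27:36 close(12 <unfinished ...>",
    "15414 23:28:51 <... close resumed> ) = 0 <74.593917>",
    "15414 23:28:51 close(9 <unfinished ...>",
    "15414 23:28:51 <... close resumed> ) = 0 <0.000080>",
    "15414 23:28:51 exit_group(0) = ?",
]

if __name__ == "__main__":
    for pid, fd, secs in slow_inotify_closes(SAMPLE):
        print(f"pid {pid}: close({fd}) on inotify fd took {secs:.3f}s")
```

Run against the excerpt above, this flags only the close(12) call, since fd 12 is the one returned by inotify_init(); the close(9) call is ignored because fd 9 was not seen coming from inotify.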
Right now we're trawling for locking issues related to inotify, focusing mainly on the VServer patch set. I would very much appreciate updates on your findings and progress, or anyone else's.
Yours,
Jesper Nyerup.