[Dovecot] load increase after upgrade to 2.0.8
I upgraded most of our servers from 1.2.x to 2.0.8 and am noticing a really big increase in server load. This is across the board, not any specific server. Check this screenshot of a load graph: http://grab.by/7N8V
Is there anything I should be looking at that could cause such a drastic load increase? The conf file is available here: http://wa.ter.net/download/doveconf.txt
Regards,
Cor
- Cor Bosman cor@xs4all.nl:
I upgraded most of our servers from 1.2.x to 2.0.8 and am noticing a really big increase in server load. This is across the board, not any specific server. Check this screenshot of a load graph: http://grab.by/7N8V
Yes, this looks like my graphs. Same increase. Factor 5.
Is there anything I should be looking at that could cause such a drastic load increase? The conf file is available here: http://wa.ter.net/download/doveconf.txt
We need to find out WHAT is common in our two config files!
Could you post "doveconf -n" output (I think it shows only non-default values)? The current output is a bit long.
-- Ralf Hildebrandt Geschäftsbereich IT | Abteilung Netzwerk Charité - Universitätsmedizin Berlin Campus Benjamin Franklin Hindenburgdamm 30 | D-12203 Berlin Tel. +49 30 450 570 155 | Fax: +49 30 450 570 962 ralf.hildebrandt@charite.de | http://www.charite.de
- Ralf Hildebrandt Ralf.Hildebrandt@charite.de:
We need to find out WHAT is common in our two config files!
My config is attached
It would be useful to know if the problem is with the imap process startup or after that. If you apply the attached patch, you can set:
service imap { service_count = 0 }
This causes the imap processes to be reused for future connections. With the patch enabled the processes keep the ability to switch back to running as root and then switch to another user, so this works with multiple UIDs.
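For reference, the setting Timo describes would look like this in dovecot.conf (a minimal sketch of just the directive he quotes; it only works safely across multiple UIDs with his patch applied):

```
# Reuse each imap process for future connections instead of
# exiting after one (0 = unlimited connections per process).
# Without the patch a process cannot switch back to root to
# change to another user's UID.
service imap {
  service_count = 0
}
```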
Hope I'm doing what you want :) It's getting kind of confusing.
Right now I have running:
3 servers with just a new kernel, no other changes
2 servers with service_count = 0, no other changes
1 server with service_count = 0 and the src/imap/main.c patch
Is that ok?
I won't be seeing much impact until tomorrow morning; not many users are on at night.
Cor
On 8.12.2010, at 22.52, Cor Bosman wrote:
Right now I have running:
3 servers with just a new kernel, no other changes
2 servers with service_count = 0, no other changes
1 server with service_count = 0 and the src/imap/main.c patch
Is that ok?
Looks good!
I won't be seeing much impact until tomorrow morning; not many users are on at night.
I'm hoping that at least the last server will see the load shrink back.
1 server with service_count = 0, and src/imap/main.c patch
Is that ok?
Looks good!
I had to revert this patch because it causes permission errors on our filesystem. Directories are being created for user X with the euid of user Y (which fails, so at least I didn't get corruption; the service just failed on that server).
Cor
On Wed, Dec 8, 2010 at 3:03 PM, Timo Sirainen tss@iki.fi wrote:
On 8.12.2010, at 22.52, Cor Bosman wrote:
1 server with service_count = 0, and src/imap/main.c patch
By this you mean service_count=0 for both service imap-login and service imap blocks, right?
Speaking from my own experience, the system loads on our dovecot boxes went up *substantially* when we upgraded kernels from 2.6.32.x and 2.6.33.x to newer ones (late 2.6.35.x and 2.6.36 -- haven't tried 2.6.36.1 yet). But I also saw loads on all sorts of other types of boxes grow when moving to 2.6.35.x and 2.6.36, so it's not necessarily dovecot-related. Though you've got plenty to choose from between 2.6.27.x and up.
Getting 'imap-login' and 'pop3-login' set to service_count=0 and 'pop3' and 'imap' set to service_count=1000 (as per Timo's suggestion) helped keep the boxes from spinning into oblivion. To reduce the enormous number of context switches, I've got 'pop3's client_limit set to 4. I played around with 'imap's client_limit between 1 and 5 but haven't quite found the sweet spot yet. pop3 with client_limit 4 seems to work pretty well. That brought context switches down from 10,000-30,000 to sub-10,000. These are somewhat crummy Dell 850s (2 cores, 2 gigs) but lots of them, dedicated to just POP/IMAP, storage on NFS (indexes on local disk though), running 2.0.7 across the board (60 boxes or so).
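Spelled out as configuration, the tuning Mark describes would look roughly like this (a sketch with the values as stated in the thread; the comments are our summary, not a definitive recommendation):

```
service imap-login {
  service_count = 0      # login processes stay around and serve many connections
}
service pop3-login {
  service_count = 0
}
service imap {
  service_count = 1000   # recycle each process after 1000 connections
  client_limit = 5       # still experimenting between 1 and 5
}
service pop3 {
  service_count = 1000
  client_limit = 4       # cut context switches from 10-30k/s to sub-10k
}
```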
- Mark Moseley moseleymark@gmail.com:
On Wed, Dec 8, 2010 at 3:03 PM, Timo Sirainen tss@iki.fi wrote:
On 8.12.2010, at 22.52, Cor Bosman wrote:
1 server with service_count = 0, and src/imap/main.c patch
By this you mean service_count=0 for both service imap-login and service imap blocks, right?
Speaking from my own experience, the system loads on our dovecot boxes went up *substantially* when we upgraded kernels from 2.6.32.x and 2.6.33.x to newer ones (late 2.6.35.x and 2.6.36 -- haven't tried 2.6.36.1 yet). But I also saw loads on all sorts of other types of boxes grow when moving to 2.6.35.x and 2.6.36, so it's not necessarily dovecot-related. Though you've got plenty to choose from between 2.6.27.x and up.
We're on 2.6.32 and the load only goes up when I change dovecot (not when I change the kernel, which I didn't do so far)
Getting 'imap-login' and 'pop3-login' set to service_count=0 and 'pop3' and 'imap' set to service_count=1000 (as per Timo's suggestion) helped keep the boxes from spinning into oblivion. To reduce the enormous number of context switches, I've got 'pop3's client_limit set to 4. I played around with 'imap's client_limit between 1 and 5 but haven't quite found the sweet spot yet. pop3 with client_limit 4 seems to work pretty well. That brought context switches down from 10,000-30,000 to sub-10,000.
Interesting. Would that spawn a lot of pop3 processes? On the other hand, almost nobody is using pop3 here
On Wed, Dec 8, 2010 at 11:54 PM, Ralf Hildebrandt Ralf.Hildebrandt@charite.de wrote:
- Mark Moseley moseleymark@gmail.com:
On Wed, Dec 8, 2010 at 3:03 PM, Timo Sirainen tss@iki.fi wrote:
On 8.12.2010, at 22.52, Cor Bosman wrote:
1 server with service_count = 0, and src/imap/main.c patch
By this you mean service_count=0 for both service imap-login and service imap blocks, right?
Speaking from my own experience, the system loads on our dovecot boxes went up *substantially* when we upgraded kernels from 2.6.32.x and 2.6.33.x to newer ones (late 2.6.35.x and 2.6.36 -- haven't tried 2.6.36.1 yet). But I also saw loads on all sorts of other types of boxes grow when moving to 2.6.35.x and 2.6.36, so it's not necessarily dovecot-related. Though you've got plenty to choose from between 2.6.27.x and up.
We're on 2.6.32 and the load only goes up when I change dovecot (not when I change the kernel, which I didn't do so far)
If you at some point upgrade to >2.6.35, I'd be interested to hear if the load skyrockets on you. I also get the impression that the load average calculation in these recent kernels is 'touchier' than in pre-2.6.35. Even with similar CPU and I/O utilization, the load average on a >2.6.35 box is much higher than pre-2.6.35, and it also seems to react more quickly; more jitter, I guess. That's based on nothing scientific though.
Getting 'imap-login' and 'pop3-login' set to service_count=0 and 'pop3' and 'imap' set to service_count=1000 (as per Timo's suggestion) helped keep the boxes from spinning into oblivion. To reduce the enormous number of context switches, I've got 'pop3's client_limit set to 4. I played around with 'imap's client_limit between 1 and 5 but haven't quite found the sweet spot yet. pop3 with client_limit 4 seems to work pretty well. That brought context switches down from 10,000-30,000 to sub-10,000.
Interesting. Would that spawn a lot of pop3 processes? On the other hand, almost nobody is using pop3 here
Upping the client_limit actually results in fewer processes, since a single process can service up to #client_limit connections. When I bumped up the client_limit for imap, my context switches plummeted. Though as Timo pointed out on another thread the other day when I was asking about this, when that proc blocks on I/O, it's blocking all the connections that the process is servicing. Timo, correct me if I'm wildly off here -- I didn't even know this existed before a week or two ago. So you can then end up creating a bottleneck, which is why I've been playing with finding a sweet spot for imap. I figure that enough of a process's imap connections must be sitting in IDLE at any given moment, so setting client_limit to something like 4 or 5 isn't too bad. Though it's not impossible that by putting multiple connections on a single process, I'm actually throttling the system, resulting in fewer context switches (though I'd imagine bottlenecked procs would be blocked on I/O and rack up a lot of voluntary context switches).
- Mark Moseley moseleymark@gmail.com:
We're on 2.6.32 and the load only goes up when I change dovecot (not when I change the kernel, which I didn't do so far)
If you at some point upgrade to >2.6.35, I'd be interested to hear if the load skyrockets on you.
You mean even more? I'm still hoping it would decrease at some point :) I updated to 2.6.32-27-generic-pae today. I wonder what happens.
I also get the impression that the load average calculation in these recent kernels is 'touchier' than in pre-2.6.35. Even with similar CPU and I/O utilization, the load average on a >2.6.35 box is much higher than pre-2.6.35, and it also seems to react more quickly; more jitter, I guess. That's based on nothing scientific though.
Interesting.
Upping the client_limit actually results in fewer processes, since a single process can service up to #client_limit connections. When I bumped up the client_limit for imap, my context switches plummeted.
Which setting are you using now?
Though as Timo pointed out on another thread the other day when I was asking about this, when that proc blocks on I/O, it's blocking all the connections that the process is servicing. Timo, correct me if I'm wildly off here -- I didn't even know this existed before a week or two ago. So you can then end up creating a bottleneck, which is why I've been playing with finding a sweet spot for imap.
Blocking on /proc? Never heard that before.
I figure that enough of a process's imap connections must be sitting in IDLE at any given moment, so setting client_limit to something like 4 or 5 isn't too bad. Though it's not impossible that by putting multiple connections on a single process, I'm actually throttling the system, resulting in fewer context switches (though I'd imagine bottlenecked procs would be blocked on I/O and rack up a lot of voluntary context switches).
On Thu, Dec 9, 2010 at 11:13 AM, Ralf Hildebrandt Ralf.Hildebrandt@charite.de wrote:
- Mark Moseley moseleymark@gmail.com:
We're on 2.6.32 and the load only goes up when I change dovecot (not when I change the kernel, which I didn't do so far)
If you at some point upgrade to >2.6.35, I'd be interested to hear if the load skyrockets on you.
You mean even more? I'm still hoping it would decrease at some point :) I updated to 2.6.32-27-generic-pae today. I wonder what happens.
I also get the impression that the load average calculation in these recent kernels is 'touchier' than in pre-2.6.35. Even with similar CPU and I/O utilization, the load average on a >2.6.35 box is much higher than pre-2.6.35, and it also seems to react more quickly; more jitter, I guess. That's based on nothing scientific though.
Interesting.
Interesting, and a major PITA. If I had more time, I'd go back and iterate through each kernel leading up to 2.6.35 to see where things start to go downhill. You'd like to think each kernel version would make things just a little bit faster (or at least no worse than before). Though if it really is just more jittery and showing higher loads without actually working any harder than an older kernel, then it's just my perception.
Upping the client_limit actually results in fewer processes, since a single process can service up to #client_limit connections. When I bumped up the client_limit for imap, my context switches plummeted.
Which setting are you using now?
At the moment, I'm using client_limit=5 for imap, but I keep playing with it. I have a feeling that's too high though. If I had faster cpus and more memory on these boxes, it wouldn't be so painful to put it back to 1.
Though as Timo pointed out on another thread the other day when I was asking about this, when that proc blocks on I/O, it's blocking all the connections that the process is servicing. Timo, correct me if I'm wildly off here -- I didn't even know this existed before a week or two ago. So you can then end up creating a bottleneck, which is why I've been playing with finding a sweet spot for imap.
Blocking on /proc? Never heard that before.
I was just being lazy. I meant 'process' :) So, if I'm understanding it correctly, assume you've got client_limit=2 and you've got connection A and connection B serviced by a single process. If A does a file operation that blocks, then B is effectively blocked too. So I imagine if you get enough I/O backlog, you can create a downward spiral where you can't service the connections faster than they're coming in and you top out at the #process_limit. Which, btw, I set to 300 for imap with client_limit=5.
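Spelled out as configuration, the limits Mark mentions work out like this (a sketch; the arithmetic in the comments is ours):

```
service imap {
  client_limit  = 5     # connections handled per process
  process_limit = 300   # max concurrent imap processes
  # ceiling: 300 * 5 = 1500 concurrent IMAP connections; if one
  # connection in a process blocks on I/O, up to 4 others stall
}
```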
I figure that enough of a process's imap connections must be sitting in IDLE at any given moment, so setting client_limit to something like 4 or 5 isn't too bad. Though it's not impossible that by putting multiple connections on a single process, I'm actually throttling the system, resulting in fewer context switches (though I'd imagine bottlenecked procs would be blocked on I/O and rack up a lot of voluntary context switches).
On Thu, 2010-12-09 at 10:18 -0800, Mark Moseley wrote:
Upping the client_limit actually results in fewer processes, since a single process can service up to #client_limit connections. When I bumped up the client_limit for imap, my context switches plummeted. Though as Timo pointed out on another thread the other day when I was asking about this, when that proc blocks on I/O, it's blocking all the connections that the process is servicing.
Yeah. And it's even worse if it's blocking on waiting for a lock.
BTW. Do you have these kind of error messages in your log:
net_connect_unix(pop3) failed: Resource temporarily unavailable
net_connect_unix(imap) failed: Resource temporarily unavailable
I think those are sometimes happening when not using client_limit=1, because all the processes are busy at that time and can't accept a new connection (while with client_limit=1 a new process would be created to handle it).
On Thu, Dec 9, 2010 at 12:58 PM, Timo Sirainen tss@iki.fi wrote:
On Thu, 2010-12-09 at 10:18 -0800, Mark Moseley wrote:
Upping the client_limit actually results in fewer processes, since a single process can service up to #client_limit connections. When I bumped up the client_limit for imap, my context switches plummeted. Though as Timo pointed out on another thread the other day when I was asking about this, when that proc blocks on I/O, it's blocking all the connections that the process is servicing.
Yeah. And it's even worse if it's blocking on waiting for a lock.
BTW. Do you have these kind of error messages in your log:
net_connect_unix(pop3) failed: Resource temporarily unavailable
net_connect_unix(imap) failed: Resource temporarily unavailable
I think those are sometimes happening when not using client_limit=1, because all the processes are busy at that time and can't accept a new connection (while with client_limit=1 a new process would be created to handle it).
Yeah, I do get small bursts of those, but not enough to get too worried about. I was assuming it was hitting process_limit for imap. On one box, for all of today, I see two clusters of those errors, each lasting about 15-20 seconds.
The problem is that the smaller I set client_limit (including =1) on imap, the more likely the mass of imap processes will push these boxes into swap (and these are old 2-gig boxes; hoping to replace them soon with newer Dells, like PE 450's).
Mark Moseley put forth on 12/9/2010 12:18 PM:
If you at some point upgrade to >2.6.35, I'd be interested to hear if the load skyrockets on you. I also get the impression that the load average calculation in these recent kernels is 'touchier' than in pre-2.6.35.
This thread may be of value in relation to this issue:
http://groups.google.com/group/linux.kernel/browse_thread/thread/eb5cb488b74...
It seems there are some load issues regarding recent Linux kernels, from 2.6.34 (maybe earlier?) on up. The commit of the patch to fix it was Dec 8th--yesterday. So it'll be a while before distros get this patch out.
However, this still doesn't seem to explain Ralf's issue, where the kernel stays the same, but the Dovecot version changes, with 2.0.x causing the high load and 1.2.x being normal. Maybe 2.0.x simply causes this bug to manifest itself more loudly?
This Linux kernel bug doesn't explain the high load reported with 2.0.x on FreeBSD either. But it is obviously somewhat at play for people running these kernel versions and 2.0.x. To what degree it adds to the load I cannot say.
-- Stan
On 10.12.2010, at 0.54, Stan Hoeppner wrote:
However, this still doesn't seem to explain Ralf's issue, where the kernel stays the same, but the Dovecot version changes, with 2.0.x causing the high load and 1.2.x being normal. Maybe 2.0.x simply causes this bug to manifest itself more loudly?
Cor's debugging has so far shown that a single epoll_wait() call can sometimes generate a few thousand voluntary context switches. I can't really understand how that's possible. Those epoll_wait() calls account for about half of the total voluntary context switches generated by imap processes. We'll see tomorrow if poll() works better or if a small patch I made makes it better.
This Linux kernel bug doesn't explain the high load reported with 2.0.x on FreeBSD either.
Who has reported high load on FreeBSD? So far I know of only Ralf and Cor and both are using Linux.
- Timo Sirainen tss@iki.fi:
Cor's debugging has so far shown that a single epoll_wait() call can sometimes generate a few thousand voluntary context switches. I can't really understand how that's possible. Those epoll_wait() calls account for about half of the total voluntary context switches generated by imap processes. We'll see tomorrow if poll() works better or if a small patch I made makes it better.
That sounds promising :)
Ralf Hildebrandt put forth on 12/10/2010 1:39 AM:
- Timo Sirainen tss@iki.fi:
Cor's debugging has so far shown that a single epoll_wait() call can sometimes generate a few thousand voluntary context switches. I can't really understand how that's possible. Those epoll_wait() calls account for about half of the total voluntary context switches generated by imap processes. We'll see tomorrow if poll() works better or if a small patch I made makes it better.
That sounds promising :)
Has anyone considered a compiler issue? A gcc optimization flags issue? A gcc version issue? Something along these lines?
To generate the number of context switches/sec that has been mentioned would seem to indicate a fast loop running more times than it should without actually doing much: not using its full time slice, getting sidelined by the scheduler, yet ready to run again on every kernel tick. Could something like "-funroll-all-loops" be having problems with the 2.0.x source?
I'm talking a bit out of my element here. I know just enough about compiler technology and machine code to make me really dangerous, but not very useful. :) I'm just trying to look outside the box, I guess, or leave no stone unturned. Pick your favorite analogy.
Cor and Ralf, are you two using the same gcc version? Same flags? Are they different from the switches/flags most other 2.0.x OPs are using when they build the source?
-- Stan
- Stan Hoeppner stan@hardwarefreak.com:
Has anyone considered a compiler issue? A gcc optimization flags issue? A gcc version issue? Something along these lines?
Using gcc: gcc version 4.4.5 (Debian 4.4.5-8)
Cor and Ralf, are you two using the same gcc version? Same flags? Are they different than the switches/flags most other 2.0.x OPs are using when they make the source?
I'm using: ./configure --prefix=/usr/dovecot-2 --enable-maintainer-mode && make
- Ralf Hildebrandt Ralf.Hildebrandt@charite.de:
I'm using: ./configure --prefix=/usr/dovecot-2 --enable-maintainer-mode && make
and:

export LDFLAGS="-Wl,--as-needed"
export CPPFLAGS='-Wl,--as-needed'

but omitting those makes no difference
Attached patch should work around this. The main problem with it is that Dovecot doesn't die very easily after this; you have to kill all the processes manually. I'll probably have to add yet another pipe just for this.
But I wonder if this could be considered a kernel bug too? I think I'll write a small test program and see what lkml thinks about it. Oh, and seeing if other operating systems handle it better would be interesting too.
BTW. I actually considered this already last week but then I thought it can't be it since v1.2 did the same thing. But no, v1.2 doesn't really do the same thing.
Attached patch should work around this. The main problem with it is that Dovecot doesn't die very easily after this; you have to kill all the processes manually. I'll probably have to add yet another pipe just for this.
Yes. Plenty of killall's are now needed!
So what does this patch actually do?
All client processes used to check on the master's status FD to check if it is still alive. And you disabled that?
But I wonder if this could be considered a kernel bug too? I think I'll write a small test program and see what lkml thinks about it. Oh, and seeing if other operating systems handle it better would be interesting too.
BTW. I actually considered this already last week but then I thought it can't be it since v1.2 did the same thing. But no, v1.2 doesn't really do the same thing.
:)
On Wed, 2010-12-15 at 17:43 +0100, Ralf Hildebrandt wrote:
Attached patch should work around this. The main problem with it is that Dovecot doesn't die very easily after this; you have to kill all the processes manually. I'll probably have to add yet another pipe just for this.
Yes. Plenty of killall's are now needed!
Since it's now pretty obvious that this is the problem and this has to be fixed for Linux users, I added a proper patch to hg: http://hg.dovecot.org/dovecot-2.0/rev/b0ec48006d57
So what does this patch actually do?
All client processes used to check on the master's status FD to check if it is still alive. And you disabled that?
Yes. And the new patch uses a new pipe fd.
BTW. Postfix's behavior is similar (unsurprisingly, since I basically copied its design for v2.0). I wonder how much of a problem this is with Postfix. Is a similar patch or a kernel fix going to drop context switches 10x there too?
Am 15.12.2010 20:16, schrieb Timo Sirainen:
BTW. Postfix's behavior is similar (unsurprisingly, since I basically copied its design for v2.0). I wonder how much of a problem this is with Postfix. Is a similar patch or a kernel fix going to drop context switches 10x there too?
However, your patch should be tested thoroughly by users who have the problem, to make sure it doesn't break something elsewhere. You might contact Wietse Venema for design questions.
-- Best Regards
MfG Robert Schetterer
Germany/Munich/Bavaria
On Wed, 15 Dec 2010 19:16:45 +0000 Timo Sirainen tss@iki.fi articulated:
BTW. Postfix's behavior is similar (unsurprisingly, since I basically copied its design for v2.0). I wonder how much of a problem this is with Postfix. Is a similar patch or a kernel fix going to drop context switches 10x there too?
Interesting. On a FreeBSD-8.2/amd64 system with Postfix (2.8-20101108) installed I have not noticed any difference. I am assuming that this problem does not affect FreeBSD systems.
-- Jerry ✌ Dovecot.user@seibercom.net
Disclaimer: off-list followups get on-list replies or get ignored. Please do not ignore the Reply-To header.
On 15.12.2010, at 19.42, Jerry wrote:
BTW. Postfix's behavior is similar (unsurprisingly, since I basically copied its design for v2.0). I wonder how much of a problem this is with Postfix. Is a similar patch or a kernel fix going to drop context switches 10x there too?
Interesting. On a FreeBSD-8.2/amd64 system with Postfix (2.8-20101108) installed I have not noticed any difference. I am assuming that this problem does not affect FreeBSD systems.
Correct. Only Linux is affected.
Anyway, Postfix's design has been like this forever, so no one would have ever noticed context switch counts increasing. But changing this might make them notice that context switches are dropping.
Correct. Only Linux is affected.
Anyway, Postfix's design has been like this forever, so no one would have ever noticed context switch counts increasing. But changing this might make them notice that context switches are dropping.
I'm a little surprised we haven't seen more reports on Dovecot as well. The impact was noticeable for us.
Thanks for looking into the issue!
Cor
On Qua, 2010-12-15 at 21:06 +0100, Cor Bosman wrote:
Correct. Only Linux is affected.
Anyway, Postfix's design has been like this forever, so no one would have ever noticed context switch counts increasing. But changing this might make them notice that context switches are dropping.
I'm a little surprised we haven't seen more reports on Dovecot as well. The impact was noticeable for us.
We had very little change in load, no impact on client performance, and nothing to alarm us; after all, it was a major upgrade.
There definitely wasn't a 20x increase in load like the one in context switches.
I guess that would depend on the CPUs, caches, etc.
-- Jose Celestino | http://japc.uncovering.org/files/japc-pgpkey.asc
"Assumption is the Mother of Screw-Up" -- Mr. John Elwood Hale
- Jose Celestino japc@co.sapo.pt:
I guess that would depend on the CPUs, caches, etc.
No it doesn't :)
It looks like Timo's patch fixes the problem. Context switches are now back to normal, and the load graph is smooth and lower. I saw someone also posted a patch to the LKML.
Cor
- Cor Bosman cor@xs4all.nl:
It looks like Timo's patch fixes the problem. Context switches are now back to normal, and the load graph is smooth and lower.
Same here.
I saw someone also posted a patch to the LKML.
I guess I missed that one
On Qui, 2010-12-16 at 12:56 +0100, Ralf Hildebrandt wrote:
- Cor Bosman cor@xs4all.nl:
I saw someone also posted a patch to the LKML.
I guess I missed that one
http://lkml.org/lkml/2010/12/15/470
2010/12/16 Jose Celestino japc@co.sapo.pt:
On Qui, 2010-12-16 at 12:56 +0100, Ralf Hildebrandt wrote:
- Cor Bosman cor@xs4all.nl:
I saw someone also posted a patch to the LKML.
I guess I missed that one
Timo, if we apply the above kernel patch, do we still need to patch dovecot or is it just an either-or thing?
I'm guessing the reason I saw so many fewer context switches when using a client_limit of more than one is that more connections per process == fewer children to wake up.
For those interested, graph showing the difference before and after patch:
context switches: http://grab.by/7W9u load: http://grab.by/7W9x
Cor
You mean these past two months of grief - concern over Dovecot 2.0's underlying design, exploring various configurations and settings, and general hysteria - came down to... a two-line patch?!
I know it was more than that - and that in fact this exposed a Linux kernel flaw - but I'm still laughing over this (since my tiny installations never had a significant load that would have been affected, I can afford to laugh).
As always - sincere thanks for everything Timo!
Daniel
On 16.12.2010, at 22.09, Daniel L. Miller wrote:
You mean these past two months of grief - concern over Dovecot 2.0's underlying design, exploring various configurations and settings, and general hysteria - came down to... a two-line patch?!
At least now I know much more about context switches than I ever did before :)
- Timo Sirainen tss@iki.fi:
On 16.12.2010, at 22.09, Daniel L. Miller wrote:
You mean these past two months of grief - concern over Dovecot 2.0's underlying design, exploring various configurations and settings, and general hysteria - came down to... a two-line patch?!
At least now I know much more about context switches than I ever did before :)
That's the spirit!
- Daniel L. Miller dmiller@amfes.com:
You mean these past two months of grief - concern over Dovecot 2.0's underlying design, exploring various configurations and settings, and general hysteria - came down to... a two-line patch?!
Yes. It's like that. With Windows, nobody would have ever fixed that.
hi cor,
For those interested, graph showing the difference before and after patch:
context switches: http://grab.by/7W9u load: http://grab.by/7W9x
Which patch: Timo's patch to Dovecot, or the LKML patch that was also posted?
christoph
On 16.12.2010, at 18.33, Mark Moseley wrote:
Timo, if we apply the above kernel patch, do we still need to patch dovecot or is it just an either-or thing?
Either-or.
I'm guessing the reason I saw so many fewer context switches when using a client_limit of more than one is that more connections per process == fewer children to wake up.
Yes.
Oops, now I finally understand why Mail.app kept asking for my password for each mail I sent: it helpfully decided to start signing mails with the only client cert I had, without asking me.. Forget about those signatures in the last two mails :)
Oops, now I finally understand why Mail.app kept asking for my password for each mail I sent: it helpfully decided to start signing mails with the only client cert I had, without asking me.. Forget about those signatures in the last two mails :)
Heh, is that the key you used to get into our test server for this problem? :)
Cor
On 2010-12-16 6:53 AM, Cor Bosman wrote:
It looks like Timo's patch fixes the problem. Context switches are now back to normal, and the load graph is smooth and lower. I saw someone also posted a patch to the LKML.
Out of curiosity, when you say lower - how does the load actually compare now to the load you had with 1.2 (same? little higher but now steady? lower?)?
--
Best regards,
Charles
- Charles Marcus CMarcus@Media-Brokers.com:
On 2010-12-16 6:53 AM, Cor Bosman wrote:
It looks like Timo's patch fixes the problem. Context switches are now back to normal, and the load graph is smooth and lower. I saw someone also posted a patch to the LKML.
Out of curiosity, when you say lower - how does the load actually compare now to the load you had with 1.2 (same? little higher but now steady? lower?)?
Identical
Timo Sirainen put forth on 12/9/2010 7:10 PM:
On 10.12.2010, at 0.54, Stan Hoeppner wrote:
However, this still doesn't seem to explain Ralf's issue, where the kernel stays the same, but the Dovecot version changes, with 2.0.x causing the high load and 1.2.x being normal. Maybe 2.0.x simply causes this bug to manifest itself more loudly?
Cor's debugging has so far shown that a single epoll_wait() call can sometimes generate a few thousand voluntary context switches. I can't really understand how that's possible. Those epoll_wait() calls account for about half of the total voluntary context switches generated by imap processes. We'll see tomorrow if poll() works better or if a small patch I made makes it better.
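As a side note on the counter being discussed: a quick way to see what a single blocking wait costs in voluntary context switches is to read getrusage()'s ru_nvcsw before and after. This is just an illustrative sketch (not from the thread); it uses select() as a stand-in for the poll()/epoll_wait() loop in the imap process:

```python
import resource
import select

def voluntary_ctx_switches():
    # ru_nvcsw counts voluntary context switches (process gave up the
    # CPU while waiting), the same counter discussed in this thread
    return resource.getrusage(resource.RUSAGE_SELF).ru_nvcsw

before = voluntary_ctx_switches()
# Block in select() with a short timeout, the way a poll()/epoll_wait()
# loop blocks waiting for client input
select.select([], [], [], 0.05)
after = voluntary_ctx_switches()

# A single blocking wait should normally cost on the order of one
# voluntary context switch; thousands per wait would be anomalous
print(after - before)
```

On a healthy system the printed delta is tiny, which is what makes the few-thousand-per-epoll_wait() figure above so surprising.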
This Linux kernel bug doesn't explain the high load reported with 2.0.x on FreeBSD either.
Who has reported high load on FreeBSD? So far I know of only Ralf and Cor and both are using Linux.
Oh, my bad Timo. I thought someone said they were running Dovecot 2.0.x atop FreeBSD on bare metal and having the high load issue. I must have got two different posts crossed up in the gray matter.
-- Stan
- Stan Hoeppner stan@hardwarefreak.com:
Mark Moseley put forth on 12/9/2010 12:18 PM:
If you at some point upgrade to >2.6.35, I'd be interested to hear if the load skyrockets on you. I also get the impression that the load average calculation in these recent kernels is 'touchier' than in pre-2.6.35.
This thread may be of value in relation to this issue:
http://groups.google.com/group/linux.kernel/browse_thread/thread/eb5cb488b74...
It seems there are some load issues regarding recent Linux kernels, from 2.6.34 (maybe earlier?) on up. The commit of the patch to fix it was Dec 8th--yesterday. So it'll be a while before distros get this patch out.
I'm using 2.6.32
However, this still doesn't seem to explain Ralf's issue, where the kernel stays the same, but the Dovecot version changes, with 2.0.x causing the high load and 1.2.x being normal. Maybe 2.0.x simply causes this bug to manifest itself more loudly?
That could be!
On Thu, Dec 9, 2010 at 4:54 PM, Stan Hoeppner stan@hardwarefreak.com wrote:
Mark Moseley put forth on 12/9/2010 12:18 PM:
If you at some point upgrade to >2.6.35, I'd be interested to hear if the load skyrockets on you. I also get the impression that the load average calculation in these recent kernels is 'touchier' than in pre-2.6.35.
This thread may be of value in relation to this issue:
http://groups.google.com/group/linux.kernel/browse_thread/thread/eb5cb488b74...
It seems there are some load issues regarding recent Linux kernels, from 2.6.34 (maybe earlier?) on up. The commit of the patch to fix it was Dec 8th--yesterday. So it'll be a while before distros get this patch out.
Glad you brought up that thread. I've seen it before but not any time lately (and certainly not since 2 days ago!). Hopefully that'll be in 2.6.37, instead of not till 2.6.38 (or that I can patch it into 2.6.36.2). I roll my own kernels, so no need to wait on distros.
However, this still doesn't seem to explain Ralf's issue, where the kernel stays the same, but the Dovecot version changes, with 2.0.x causing the high load and 1.2.x being normal. Maybe 2.0.x simply causes this bug to manifest itself more loudly?
This Linux kernel bug doesn't explain the high load reported with 2.0.x on FreeBSD either. But it is obviously somewhat at play for people running these kernels versions and 2.0.x. To what degree it adds to the load I cannot say.
Yeah, my comment on the kernel thing was just in reply to one of Cor's 3 debugging tracks, 1 of which was to try upgrading the kernel. I figured I should mention the load issue if he might be upgrading to the latest/greatest, since it could make things worse (or make things *look* worse).
This is probably a really dumb question, but here goes...
Is it possible to run the imap-login process from 1.2 against a 2.0 system?
Daniel
Mark Moseley put forth on 12/10/2010 2:25 PM:
Yeah, my comment on the kernel thing was just in reply to one of Cor's 3 debugging tracks, 1 of which was to try upgrading the kernel. I figured I should mention the load issue if he might be upgrading to the latest/greatest, since it could make things worse (or make things *look* worse).
Maybe I need to re-read that thread again (twice). At this point I'm still not sure if the load is actually higher, or if the bug is that it's simply being *reported* as higher when it really isn't.
-- Stan
- Stan Hoeppner stan@hardwarefreak.com:
Mark Moseley put forth on 12/10/2010 2:25 PM:
Yeah, my comment on the kernel thing was just in reply to one of Cor's 3 debugging tracks, 1 of which was to try upgrading the kernel. I figured I should mention the load issue if he might be upgrading to the latest/greatest, since it could make things worse (or make things *look* worse).
Maybe I need to re-read that thread again (twice). At this point I'm still not sure if the load is actually higher, or if the bug is that it's simply being *reported* as higher when it really isn't.
The load is definitely higher. Yesterday I kept 2.0 running until noon. Half an hour before noon users were reporting their logins were timing out. At the same time the load skyrocketed (system was at almost 100 percent)
Here's the doveconf -n output: http://wa.ter.net/download/doveconf-n.txt
Cor
On 12/8/2010 9:55 AM, Cor Bosman wrote:
Here's the doveconf -n output: http://wa.ter.net/download/doveconf-n.txt
Cor
It could be that you both are running a different kernel from the standard Lenny kernel 2.6.26... (this could be a clue..)
M.A.
It could be that you both are running a different kernel from the standard Lenny kernel 2.6.26... (this could be a clue..)
It would be interesting to hear from people that aren't seeing a big load increase. My initial guess was some kind of NFS problem, but since Ralf isn't doing NFS, that's probably not it.
Our CPUs are actually pretty much idle (these are 8 and 16 core machines). And memory is plentiful.
Cor
- Cor Bosman cor@xs4all.nl:
It would be interesting to hear from people that aren't seeing a big load increase.
Indeed. Timo was mentioning some big company; what kind of setup were they using, Timo?
My initial guess was some kind of NFS problem, but since Ralf isn't doing NFS, that's probably not it.
Yep.
Our CPUs are actually pretty much idle (these are 8 and 16 core machines). And memory is plentiful.
Cor Bosman put forth on 12/8/2010 9:45 AM:
It could be that you both are running a different Kernel from the Standard Lenny Kernel 2.6.26..... (this could be a clue ..)
It would be interesting to hear from people that aren't seeing a big load increase. My initial guess was some kind of NFS problem, but since Ralf isn't doing NFS, that's probably not it.
Our CPUs are actually pretty much idle (these are 8 and 16 core machines). And memory is plentiful.
Cor, Ralf is running his Dovecot servers as VMware ESX guests. Are you running your servers in a VM environment, or on bare metal?
I'm guessing the common thread WRT this load problem is going to be relatively busy Dovecot servers running in virtual machine environments.
-- Stan
On Wed, 2010-12-08 at 23:37 +0100, Cor Bosman wrote:
We're running on bare metal, no VM involved.
Cor
Missed most of this list due to some noise levels and my lack of time to sit through hundreds of messages on every list I'm on, so apologies if this was already suggested: due to errors in recent kernel time computations, append nohz=off to your lilo config (or whatever boot loader you use).
- Cor Bosman cor@xs4all.nl:
Here's the doveconf -n output: http://wa.ter.net/download/doveconf-n.txt
Except for your storage being on NFS, it looks fairly identical! masteruser, passdb pam, userdb passwd.
- David Ford david@blue-labs.org:
Ralf, did you do the profiling yet?
With gprof or what exactly is on your mind?
On 12/08/10 09:50, Ralf Hildebrandt wrote:
[...] Yes, this looks like my graphs. Same increase. Factor 5.
gprof for detail, or even simple strace timing, i.e. strace -c. If load is going up significantly, there should be one or more functions significantly fatter than the rest. You can pick either to run it on the whole group, or just attach to certain processes (master, imap, lda, etc).
when you run 'ps aux|grep dovecot', what seems to collect the most cpu time?
-d
On 12/08/10 10:54, Ralf Hildebrandt wrote:
- David Ford david@blue-labs.org:
Ralf, did you do the profiling yet? With gprof or what exactly is on your mind?
- David Ford david@blue-labs.org:
gprof for detail, or even simple strace timing, i.e. strace -c. If load is going up significantly, there should be one or more functions significantly fatter than the rest. You can pick either to run it on the whole group, or just attach to certain processes (master, imap, lda, etc).
Timo and I found excessive numbers of context switches, factor 20-30. But it's unclear why the IMAP process would do/cause this.
when you run 'ps aux|grep dovecot', what seems to collect the most cpu time?
-d
what's your task switch HZ compiled at? CONFIG_HZ_1000? you would probably be better at 300 or 250. have you tried tickless? is your kernel compiled for precisely your cpu type and smp/hyper options set correctly? what about CONFIG_PREEMPT? definitely don't use realtime, server is appropriate.
-d
On 12/08/10 11:05, Ralf Hildebrandt wrote:
Timo and I found excessive numbers of context switches, factor 20-30. But it's unclear why the IMAP process would do/cause this.
On 08-12-10 17:11, David Ford wrote:
what's your task switch HZ compiled at? CONFIG_HZ_1000? you would probably be better at 300 or 250. have you tried tickless? is your kernel compiled for precisely your cpu type and smp/hyper options set correctly? what about CONFIG_PREEMPT? definitely don't use realtime, server is appropriate.
Timo and I found excessive numbers of context switches, factor 20-30. But it's unclear why the IMAP process would do/cause this.
Hi,
If I'm not mistaken, XS4ALL uses FreeBSD on their imap servers. Cor?
Thnx,
Diederick
On Dec 8, 2010, at 5:11 PM, David Ford wrote:
what's your task switch HZ compiled at? CONFIG_HZ_1000? you would probably be better at 300 or 250. have you tried tickless? is your kernel compiled for precisely your cpu type and smp/hyper options set correctly? what about CONFIG_PREEMPT? definitely don't use realtime, server is appropriate.
You're probably asking Ralf, but I might as well answer as well..
CONFIG_HZ_250=y
CONFIG_PREEMPT_NONE=y
Not using realtime, all cpu/smp options look fine.
CPU is usually pretty idle. Hovering around 90-99% idle. But the load graphs clearly show a radical increase, where before things were pretty smooth, now they're erratic and much higher.
Cor
ps: we use linux, not freebsd.
- David Ford david@blue-labs.org:
what's your task switch HZ compiled at? CONFIG_HZ_1000? you would probably be better at 300 or 250. have you tried tickless?
# fgrep HZ /boot/config-2.6.32-25-generic-pae
CONFIG_NO_HZ=y
CONFIG_HZ_250=y
CONFIG_HZ=250
is your kernel compiled for precisely your cpu type and smp/hyper options set correctly?
It's an Ubuntu kernel (config-2.6.32-25-generic-pae)
what about CONFIG_PREEMPT? definitely don't use realtime, server is appropriate.
# fgrep PREEMPT /boot/config-2.6.32-25-generic-pae
# CONFIG_TREE_PREEMPT_RCU is not set
CONFIG_PREEMPT_NOTIFIERS=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
It's the same kernel 1.2.x is running on, without problems.
Timo and I found excessive numbers of context switches, factor 20-30. But it's unclear why the IMAP process would do/cause this.
I'm seeing this as well... http://grab.by/7Nni
Cor
On 8.12.2010, at 17.56, Cor Bosman wrote:
Timo and I found excessive numbers of context switches, factor 20-30. But it's unclear why the IMAP process would do/cause this.
I'm seeing this as well... http://grab.by/7Nni
So same problem as with Ralf. What kernel version are you using? Maybe try a newer one to see if that happens to fix it?
I see you're using userdb passwd. Do your users have unique UIDs? If they have, maybe it has to do with that..
Yes, we have about 1 million unique UIDs in the passwd file (actually NIS). I did upgrade 4 machines to the latest kernel, but it's hard to tell if that changed much, as our usage peak is during the day, so I'll know more tomorrow. But so far it looks as though the context switches are still high.
Cor
service imap-login { service_count = 1 }
So you have tons of imap-login processes. Did you have that in v1.2 too? Try setting service_count=0 (and same for pop3-login)
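What Timo is suggesting, written out as a config fragment in the same style as the snippets elsewhere in this thread (a sketch, not a drop-in; adjust to your own dovecot.conf):

```
service imap-login {
  # 0 = reuse each login process for many connections instead of
  # forking a fresh process per connection (Dovecot's
  # "high-performance" login mode)
  service_count = 0
}
service pop3-login {
  service_count = 0
}
```

With service_count = 1 (the default discussed above), every new connection pays the cost of forking a login process, which is one candidate source of the extra context switches.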
So you have tons of imap-login processes. Did you have that in v1.2 too? Try setting service_count=0 (and same for pop3-login)
Yes, we have tons of imap-login processes. I'll set service_count=0 for imap (we don't do pop with dovecot) on a few servers.
Cor
Oh, and I don't know if we did in 1.2. I think so, but can't be positive. I tried making the config the same. I have the config still around if you want to see it.
Cor
- Cor Bosman cor@xs4all.nl:
login_process_per_connection = yes
So it seems like we did have that set in 1.2.
We didn't use that (in both 1.2 and 2.0)
- Cor Bosman cor@xs4all.nl:
I see you're using userdb passwd. Do your users have unique UIDs? If they have, maybe it has to do with that..
Yes, we have about 1 million unique UIDs in the passwd file (actually NIS).
And we have about 15,000 unique UIDs in the passwd file (no NIS!)
I did upgrade 4 machines to the latest kernel, but it's hard to tell if that changed much as our usage peak is during the day, So I'll know more tomorrow. But so far it looks as though the context switches are still high.
Let me know!
On Qua, 2010-12-08 at 15:39 +0100, Cor Bosman wrote:
I upgraded most of our servers from 1.2.x to 2.0.8 and am noticing a really big increase in server load. This is across the board, not any specific server. Check this screenshot of a load graph: http://grab.by/7N8V
Is there anything i should be looking at that could cause such a drastic load increase? Conf file is available here: http://wa.ter.net/download/doveconf.txt
We also had a load increase (about 10x) when going from 1.1.10 to 2.0.7 a couple of weeks ago.
It ended up being due to a slight increase in memory usage by the imap processes, which made the servers start swapping and the load spike.
A memory upgrade brought it all to normal loads.
Wonder if this is your case (vmstat, check for si and so).
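The swap check Jose describes can also be done by watching the cumulative swap counters in /proc/vmstat (Linux-specific). A small sketch, with the parsing split out so it works on any captured text:

```python
def swap_counters(vmstat_text):
    # Extract the cumulative swap-in/swap-out page counts from
    # /proc/vmstat style "key value" lines. If these numbers grow
    # between two samples, the box is actively swapping, which is
    # the situation that caused the load spike described above.
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters

# On a Linux box one would sample the live file, e.g.:
# with open("/proc/vmstat") as f:
#     print(swap_counters(f.read()))
```

This is equivalent in spirit to watching the si/so columns of vmstat; nonzero deltas over time are the warning sign.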
-- Jose Celestino | http://japc.uncovering.org/files/japc-pgpkey.asc
"Assumption is the Mother of Screw-Up" -- Mr. John Elwood Hale
- Jose Celestino japc@co.sapo.pt:
It ended up being due to a slight increase in memory usage by the imap processes that made the servers start using swap and the load to spike.
My machine is not swapping; it doesn't even HAVE swap, it still has lots of free memory
trace looks like an interesting new tool: http://lwn.net/Articles/415728/
Wonder if that would tell something about this problem.
On 9.12.2010, at 1.52, Timo Sirainen wrote:
trace looks like an interesting new tool: http://lwn.net/Articles/415728/
Wonder if that would tell something about this problem.
Like the answer to question: Is the number of voluntary context switches by imap processes close to the number of called syscalls? And if so, which system calls does v2.0 call more often than v1.2?
On 9.12.2010, at 2.24, Timo Sirainen wrote:
Like the answer to question: Is the number of voluntary context switches by imap processes close to the number of called syscalls?
Oh, right, no. Syscalls don't increase context switch counts. So the voluntary context switch basically seems to mean that the process is doing a read(), poll() or something else (what?) that waits for input from elsewhere..
So with Ralf's previous statistics for one hour:
v2.0: 6887 imap logouts (processes)
      26672928 voluntary context switches (3872 / process)
      1313631 involuntary context switches (190 / process)

v1.2: 6832 imap logouts (processes)
      1003200 voluntary context switches (146 / process)
      155566 involuntary context switches (22 / process)
The v1.2 values look pretty good. v2.0's involuntary context switch count isn't too bad either. So where do all the 3700 new voluntary context switches come from? The new method of initializing user logins can't add more than a few more of those. Running imaptest locally I get similar volcs values for v1.2 and v2.0. Wonder if you get different values?
If you don't mind some huge logs, you could also try the attached patch that logs the voluntary context switch count for every executed IMAP command. Maybe some command shows up that generates them much more than others.
[...] similar volcs values for v1.2 and v2.0. Wonder if you get different values?
If you don't mind some huge logs, you could also try the attached patch that logs the voluntary context switch count for every executed IMAP command. Maybe some command shows up that generates them much more than others. <diff>
I have applied this patch to 1 server, and it's logging away. Let me know what kind of output you'd like to see. Or I can send you the raw logfile. As I said previously, I'm no longer running the imap service_count=0 patch because it caused these errors:
Dec 9 06:54:32 userimap30 dovecot: imap(cor): Error: user cor: Initialization failed: Namespace '': mkdir(/var/spool/mail/.4a/L/0/14/corbosch) failed: Permission denied (euid=1000(cor) egid=50(staff) missing +w perm: /var/spool/mail/.4a/L/0/14, euid is not dir owner).
After I removed the patch these errors disappeared.
Cor
Just for thoroughness ive started 2 servers with the logging patch. One with service_count=0 and one with service_count=1
On 9.12.2010, at 6.50, Cor Bosman wrote:
[...] similar volcs values for v1.2 and v2.0. Wonder if you get different values?
If you don't mind some huge logs, you could also try the attached patch that logs the voluntary context switch count for every executed IMAP command. Maybe some command shows up that generates them much more than others. <diff>
I have applied this patch to 1 server, and it's logging away. Let me know what kind of output you'd like to see. Or I can send you the raw logfile.
Are the process pids also logged in the messages, so it's clear which messages belong to which imap process? If not, add %p to mail_log_prefix.
As I said previously, I'm no longer running the imap service_count=0 patch because it caused these errors:
Dec 9 06:54:32 userimap30 dovecot: imap(cor): Error: user cor: Initialization failed: Namespace '': mkdir(/var/spool/mail/.4a/L/0/14/corbosch) failed: Permission denied (euid=1000(cor) egid=50(staff) missing +w perm: /var/spool/mail/.4a/L/0/14, euid is not dir owner).
After I removed the patch these errors disappeared.
The patch was supposed to fix this.. Is uid 1000 wrong for this user? Didn't you already exist? Why is it complaining about mkdir()?
Just for thoroughness ive started 2 servers with the logging patch. One with service_count=0 and one with service_count=1
What, are you saying that without the patch service imap { service_count=0 } works? Or are you talking about imap-login?
Are the process pids also logged in the messages, so it's clear which messages belong to which imap process? If not, add %p to mail_log_prefix.
Done. It wasn't logging this, now it is.
As I said previously, im not longer running the imap server=0 patch because it caused these errors:
Dec 9 06:54:32 userimap30 dovecot: imap(cor): Error: user cor: Initialization failed: Namespace '': mkdir(/var/spool/mail/.4a/L/0/14/corbosch) failed: Permission denied (euid=1000(cor) egid=50(staff) missing +w perm: /var/spool/mail/.4a/L/0/14, euid is not dir owner).
After I removed the patch these errors disappeared.
The patch was supposed to fix this.. Is uid 1000 wrong for this user? Didn't you already exist? Why is it complaining about mkdir()?
It's very strange. I'm user 'cor'. Maybe it's logging my name because I started the process (su-d to root obviously)? My uid is 1000. But I'm not the one doing this login, these are customer logins. The mkdir fails because the target dir is only writable by root. Normally dovecot creates a directory as root and chowns it to the owner it belongs to. At least, that's what I assume happens, as normally this all works fine.
It's like dovecot was unable to change the process to euid root but instead is doing everything as euid cor.
Just for thoroughness ive started 2 servers with the logging patch. One with service_count=0 and one with service_count=1
What, are you saying that without the patch service imap { service_count=0 } works? Or are you talking about imap-login?
I'm talking about imap-login. Currently I'm not running any servers with imap { service_count=0 } due to the permission errors. I'm more than happy to give it another go if you feel I did something wrong.
Cor
On 9.12.2010, at 8.24, Cor Bosman wrote:
Are the process pids also logged in the messages, so it's clear which messages belong to which imap process? If not, add %p to mail_log_prefix.
Done. It wasn't logging this, now it is.
Great. If the logs aren't too huge I could look at the raw ones, or you could try to write a script yourself to parse them. I'm basically interested in things like:
How large are the first volcs entries for processes? (= Is the initial process setup cost high?)
Find the last volcs entry for each process. Sum them up for some time period. Is the total about the same as the system's entire volcs during that time?
What kind of a volcs distribution do the processes have? A graph might be nice :)
Most importantly: Are there some commands that trigger a high increase in volcs numbers?
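The questions above boil down to per-process and per-command volcs accounting, which a short script can do. A sketch only: the log line format here is hypothetical (Timo's patch output isn't shown in the thread), so the regex would need adjusting to the real lines:

```python
import re
from collections import defaultdict

# Assumed log line shape (hypothetical; adapt to the actual output of
# the volcs-logging patch, with %p added to mail_log_prefix):
#   dovecot: imap(joe)<1234>: Debug: volcs=3872 command=SELECT
LINE = re.compile(
    r'imap\([^)]*\)<(?P<pid>\d+)>.*volcs=(?P<volcs>\d+)\s+command=(?P<cmd>\S+)')

def summarize(lines):
    last = {}                       # pid -> last cumulative volcs seen
    per_command = defaultdict(int)  # command -> summed volcs deltas
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        pid = m.group('pid')
        volcs = int(m.group('volcs'))
        # Counter is cumulative per process, so charge each command
        # with the delta since that process's previous command
        per_command[m.group('cmd')] += volcs - last.get(pid, 0)
        last[pid] = volcs
    total = sum(last.values())      # sum of final counters per process
    return total, dict(per_command)
```

The per-command totals answer the "which commands trigger high volcs" question directly, and the overall total can be compared against the system-wide voluntary context switch count for the same period.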
As I said previously, im not longer running the imap server=0 patch because it caused these errors:
Dec 9 06:54:32 userimap30 dovecot: imap(cor): Error: user cor: Initialization failed: Namespace '': mkdir(/var/spool/mail/.4a/L/0/14/corbosch) failed: Permission denied (euid=1000(cor) egid=50(staff) missing +w perm: /var/spool/mail/.4a/L/0/14, euid is not dir owner).
After I removed the patch these errors disappeared.
The patch was supposed to fix this.. Is uid 1000 wrong for this user? Didn't you already exist? Why is it complaining about mkdir()?
It's very strange. I'm user 'cor'. Maybe it's logging my name because I started the process (su-d to root obviously)? My uid is 1000. But I'm not the one doing this login, these are customer logins.
Well, that clearly says it's trying to initialize user "cor".. Is the "corbosch" directory you or someone else?
The mkdir fails because the target dir is only writable by root.
I'd have guessed that the above corbosch already existed.
Normally dovecot creates a directory as root, and chowns it to the owner it belongs to. At least, thats what I assume happens as normally this all works fine.
Uh. No.. Privileges are always dropped before Dovecot creates any directories. I don't know what creates the directories for you. I'd guess the authentication code?
Just for thoroughness ive started 2 servers with the logging patch. One with service_count=0 and one with service_count=1
What, are you saying that without the patch service imap { service_count=0 } works? Or are you talking about imap-login?
I'm talking about imap-login. Currently I'm not running any servers with imap { service_count=0 } due to the permission errors. I'm more than happy to give it another go if you feel I did something wrong.
OK. I hope the volcs logging gets us somewhere.
Great. If the logs aren't too huge I could look at the raw ones, or you could try to write a script yourself to parse them. I'm basically interested in things like:
How large are the first volcs entries for processes? (= Is the initial process setup cost high?)
Find the last volcs entry for each process. Sum them up for some time period. Is the total about the same as the system's entire volcs during that time?
What kind of a volcs distribution do the processes have? A graph might be nice :)
Most importantly: Are there some commands that trigger a high increase in volcs numbers?
If you want to have a quick look already, I'm mailing you the locations of 2 files, one with service_count=0 and one with service_count=1.
Well, that clearly says it's trying to initialize user "cor".. Is the "corbosch" directory you or someone else?
corbosch is actually a non-existing user, and I'm suspecting a local bug. I mailed you about that in private.
I'd have guessed that the above corbosch already existed.
Yeah, it probably should be targeting 'cor', but for some reason chars get appended to the dir location. Not sure if it's our local modification or something in dovecot.
Normally dovecot creates a directory as root, and chowns it to the owner it belongs to. At least, thats what I assume happens as normally this all works fine.
Uh. No.. Privileges are always dropped before Dovecot creates any directories. I don't know what creates the directories for you. I'd guess the authentication code?
Something in dovecot does :) If a user without a maildir logs in for the first time, the dir is created.
Cor
On 9.12.2010, at 9.13, Cor Bosman wrote:
If you want to have a quick look already, I'm mailing you the locations of 2 files, one with service_count=0 and one with service_count=1.
I see that about half the commands that do hundreds or thousands of volcses are IDLE. Wonder if that is the problem. But there are others .. I see for example that one "LOGOUT" command takes 2000 volcses, which I doubt is true. More likely it's counting something that happened before the LOGOUT, like probably doing a mailbox synchronization. Attached a new version of the patch that also logs volcs for syncs.
Also STATUS and SELECT seem to have high volcses, which do a mailbox sync.. Hmm. So if it is about mailbox sync, where in the sync then is it slow? Added more debugging for this in the attached patch.
Normally dovecot creates a directory as root, and chowns it to the owner it belongs to. At least, thats what I assume happens as normally this all works fine.
Uh. No.. Privileges are always dropped before Dovecot creates any directories. I don't know what creates the directories for you. I'd guess the authentication code?
Something in dovecot does :) If a user without a maildir logs in for the first time, the dir is created.
Probably would be good to find out what exactly, in case that's caused by some security hole :)
On Dec 9, 2010, at 10:41 AM, Timo Sirainen wrote:
On 9.12.2010, at 9.13, Cor Bosman wrote:
If you want to have a quick look already, I'm mailing you the locations of 2 files, one with service_count=0 and one with service_count=1.
I see that about half the commands that do hundreds or thousands of volcses are IDLE. Wonder if that is the problem. But there are others .. I see for example that one "LOGOUT" command takes 2000 volcses, which I doubt is true. More likely it's counting something that happened before the LOGOUT, like probably doing a mailbox synchronization. Attached a new version of the patch that also logs volcs for syncs.
Also STATUS and SELECT seem to have high volcses, which do a mailbox sync.. Hmm. So if it is about mailbox sync, where in the sync then is it slow? Added more debugging for this in the attached patch.
Ok, I'll apply this patch to both running servers doing logging.
Something in dovecot does :) If a user without a maildir logs in for the first time, the dir is created.
Probably would be good to find out what exactly, in case that's caused by some security hole :)
Sorry, my mistake. It's not dovecot, it's our LDA that's making the dir (which means we now get an error if a user logs in and they have never had email.. but that never happens since we mail them on signup.. interesting little problem :)
Cor
Some preliminary findings..
Changing the kernel seems to have a positive effect on the load. I changed from 2.6.27.46 to 2.6.27.54 (sorry, I'm bound to locally available kernels because of a kernel patch we created to fix some NFS problems in the Linux kernel; the patch should be in the stock kernel as of 2.6.36, but we don't have that kernel locally available yet). At similar user levels the load is now back to what it was. It doesn't, however, influence the context switches, which are still very high. Here's a graph of a server where I changed the kernel and nothing else:
load: http://grab.by/7Odt (I rebooted the server with the new kernel at the point where the high load drops to 0)
cs: http://grab.by/7Odw
Here's a server with the old kernel, but with imap-login { service_count=0 }. It seems this doesn't have much impact. Context switches are lower (but still higher than on 1.2), probably because user levels haven't come back up to normal yet due to load balancer distribution issues after a reboot.
load: http://grab.by/7OdL cs: http://grab.by/7OdN
I've now also set up a server with the new kernel and service_count = 0, just to cover all the bases, but there's not enough data on that server yet. It takes a while for user levels to creep up after a server is rebooted, since users are sticky to a server once they're linked to it.
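The per-server cs graphs above are system-wide, but on Linux the same counters can be read per process from /proc/&lt;pid&gt;/status, which is handy for checking whether the extra switches really come from the imap processes. A small sketch (Linux-only, shown here against the current process):

```python
import os

def ctxt_switches(pid: int) -> dict:
    """Read per-process context-switch counters from /proc (Linux only)."""
    counts = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            # The two relevant fields in /proc/<pid>/status:
            #   voluntary_ctxt_switches, nonvoluntary_ctxt_switches
            if line.startswith(("voluntary_ctxt_switches",
                                "nonvoluntary_ctxt_switches")):
                key, value = line.split(":")
                counts[key] = int(value)
    return counts

print(ctxt_switches(os.getpid()))
```

Summing these over all imap PIDs before and after an interval gives per-process numbers directly comparable to the per-hour totals discussed later in the thread.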
Cor
- Timo Sirainen tss@iki.fi:
So with Ralf's previous statistics for one hour:
v2.0: 6887 imap logouts (processes) 26672928 voluntary context switches (3872 / process) 1313631 involuntary context switches (190 / process)
v1.2: 6832 imap logouts (processes) 1003200 voluntary context switches (146 / process) 155566 involuntary context switches (22 / process)
The v1.2 values look pretty good. v2.0's involuntary context switches isn't too bad either. So where do all the 3700 new voluntary context switches come from? The new method of initializing user logins can't add more than a few more of those. Running imaptest locally I get similar volcs values for v1.2 and v2.0. Wonder if you get different values?
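The per-process figures quoted above follow directly from the hourly totals; for example, a quick check:

```python
# Ralf's one-hour totals: (processes, voluntary cs, involuntary cs)
stats = {
    "v2.0": (6887, 26672928, 1313631),
    "v1.2": (6832, 1003200, 155566),
}

for version, (procs, volcs, ivolcs) in stats.items():
    # Integer division matches the truncated per-process values in the thread.
    print(f"{version}: {volcs // procs} volcs/process, "
          f"{ivolcs // procs} involuntary cs/process")
```

This reproduces the 3872 vs. 146 voluntary switches per process, roughly a 26x jump from v1.2 to v2.0.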
How exactly would I run imaptest...?
If you don't mind some huge logs, you could also try the attached patch that logs the voluntary context switch count for every executed IMAP command. Maybe some command shows up that generates them much more than others.
I can try that, but today and friday I'm busy, and then it's weekend :(
On 9.12.2010, at 7.57, Ralf Hildebrandt wrote:
The v1.2 values look pretty good. v2.0's involuntary context switches isn't too bad either. So where do all the 3700 new voluntary context switches come from? The new method of initializing user logins can't add more than a few more of those. Running imaptest locally I get similar volcs values for v1.2 and v2.0. Wonder if you get different values?
How exactly would I run imaptest...?
Create a test account for it (or you'll more or less accidentally destroy your INBOX) and then run e.g.:
imaptest user=testuser pass=testpass logout=0 secs=10 clients=1
Then compare the voluntary context switch counts logged by the imap process.
http://imapwiki.org/ImapTest has more info about installing and running. Especially note http://imapwiki.org/ImapTest/Running#Append_mbox
participants (15)
- Charles Marcus
- ckubu
- Cor Bosman
- Daniel L. Miller
- David Ford
- Diederick van Dijk
- Jerry
- Jose Celestino
- Mario Antonio
- Mark Moseley
- Noel Butler
- Ralf Hildebrandt
- Robert Schetterer
- Stan Hoeppner
- Timo Sirainen