IMAP hibernate and scalability in general
Hello,
as some may remember, we're running a very dense IMAP cluster here, in excess of 50k IMAP sessions per node (current record holder is 68k, the design is for 200k+).
The first issue we ran into was that the dovecot master process (which is single-threaded and thus a bottleneck) was approaching 100% CPU usage (i.e. using a full core) when trying to spawn off new IMAP processes.
This was rectified by giving the imap service a service_count of 200 to eventually create a pool of "idling" processes, reducing the strain on the master process dramatically. That of course required generously cranking up ulimits, FDs in particular.
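For reference, the relevant part looks roughly like this (a sketch, not our literal config; the process_limit number is purely illustrative and has to cover your peak concurrent sessions):

service imap {
  # Each imap process serves up to 200 client connections over its
  # lifetime instead of exiting after the first one, so a pool of
  # reusable processes builds up and the single-threaded master
  # forks far less often.
  service_count = 200
  # Illustrative ceiling; must cover peak concurrent IMAP sessions.
  process_limit = 10240
}

# On the OS side, the file descriptor limits (ulimit -n / LimitNOFILE
# for the dovecot master) need to be raised generously to match.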
The next issue of course is (as mentioned before) the memory usage of all those IMAP processes, and the fact that quite a few things outside of dovecot (ps, etc.) tend to get quite sedate when dealing with tens of thousands of processes.
We just started to deploy a new mailbox cluster pair with 2.2.27 and IMAP hibernate configured. Getting this to work is a PITA though with regards to ownership and access rights to the various sockets; this part could definitely do with some better (I know, difficult) defaults or at least more documentation (there is none besides the source and this ML).
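For anyone else heading down this road, the moving parts look roughly like this (a sketch only, not our literal config; which user/group/mode you need depends entirely on what your mail processes run as, so treat those values as placeholders):

# Move IDLEing clients into imap-hibernate after this long.
imap_hibernate_timeout = 30s

service imap {
  # imap-hibernate hands a waking client back to a full imap process
  # via this socket, so the (unprivileged) imap-hibernate process
  # must be able to open it.
  unix_listener imap-master {
    user = $default_internal_user
  }
}

service imap-hibernate {
  # The imap processes (running as the mail user) push idling clients
  # into this socket, so they need permission to open it.
  unix_listener imap-hibernate {
    mode = 0660
    group = $default_internal_group
  }
}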
Initial results are very promising: depending on what your clients are doing (are they well behaved, are your users constantly looking at other folders, etc.), the vast majority of IDLE sessions will be hibernated at any given time, thus not only using a fraction of the RAM otherwise needed but also freeing up process slots.
Real life example: 240 users, 86 imap processes (80% of those not in IDLE) and:

dovecot 119157 0.0 0.0 10452 3236 ? S Apr01 0:21 dovecot/imap-hibernate [237 connections]

That's 237 hibernated connections and thus far fewer processes than otherwise.
Given the silence on the ML, I assume that we are going to be the first hibernate users to whom the term "large scale" really applies. That said, I have some questions and points I'd like clarified/confirmed:
Our current default_client_limit is 16k, as can be seen from the 5 config processes on our 65k+ session node. ^_-
This would also apply to imap-hibernate; one wonders if that's fine (config certainly has no issues with it) or if something smaller would be appropriate here?
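If something smaller does turn out to be saner, a per-service override should be all that's needed, along these lines (number made up):

service imap-hibernate {
  # Override the global default_client_limit for this service only;
  # 4096 is an arbitrary example, not a recommendation.
  client_limit = 4096
}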
Since we have idling IMAP processes around most of the time, the strain of re-spawning proper imap processes from imap-hibernate should be just as reduced as it is for the dovecot master, correct?
I'll keep reporting our experiences here, that is if something blows up spectacularly. ^o^
Christian
Christian Balzer Network/Systems Engineer
chibi@gol.com Global OnLine Japan/Rakuten Communications
http://www.gol.com/
Hi!
We have customers using it in larger deployments. A good idea is to have as many of your clients hibernating as possible, as the hibernation process is much smaller than an actual IMAP process.
You should probably also look at reusing the processes, as this will probably help your performance. https://wiki.dovecot.org/PerformanceTuning and https://wiki.dovecot.org/LoginProcess are probably a good starting point, although I suspect you've read these already.
If you are running a dense server, cranking up various limits is rather expected.
Aki
We've been using hibernate for about half a year with no ill effects. There were various logged errors in earlier versions of dovecot (almost always when transitioning from hibernate back to regular imap), but even with those we never heard of a customer-reported error; presumably the mail client just reconnected silently.
For no particular reason besides wanting to start conservatively, we've got client_limit set to 50 on the hibernate procs (with 1100 total hibernated connections on the box I'm looking at). At only a little over a meg each, I'm fine with those extra processes.
Hello,
On Wed, 5 Apr 2017 23:45:33 -0700 Mark Moseley wrote:
We've been using hibernate for about half a year with no ill effects. There were various logged errors in earlier versions of dovecot (almost always when transitioning from hibernate back to regular imap), but even with those we never heard of a customer-reported error; presumably the mail client just reconnected silently.
This is my impression as well (silent reconnect w/o anything really bad happening) with regard to the bug I just reported/saw.
For no particular reason besides wanting to start conservatively, we've got client_limit set to 50 on the hibernate procs (with 1100 total hibernated connections on the box I'm looking at). At only a little over a meg each, I'm fine with those extra processes.
Yeah, but 50 would be a tad too conservative for our purposes here. I'll keep an eye on it and see how it goes, first checkpoint would be at 1k hibernated sessions. ^_^
Christian
On 6 Apr 2017, at 9.56, Christian Balzer chibi@gol.com wrote:
Yeah, but 50 would be a tad too conservative for our purposes here. I'll keep an eye on it and see how it goes, first checkpoint would be at 1k hibernated sessions. ^_^
imap-hibernate processes are similar to imap-login processes in that they should be able to handle thousands or even tens of thousands of connections per process.
On Thu, 6 Apr 2017 13:10:03 +0300 Timo Sirainen wrote:
imap-hibernate processes are similar to imap-login processes in that they should be able to handle thousands or even tens of thousands of connections per process.
I assume the config processes are in the same category; they are happy with 16k clients and 169MB each, without any issues. ^.^
Christian
On Thu, Apr 6, 2017 at 3:10 AM, Timo Sirainen tss@iki.fi wrote:
imap-hibernate processes are similar to imap-login processes in that they should be able to handle thousands or even tens of thousands of connections per process.
TL;DR: In a director/proxy setup, what's a good client_limit for imap-login/pop3-login?
Would the same apply for imap-login when it's being used in proxy mode? I'm moving us to a director setup (cf. my other email about director rings getting wedged from a couple days ago) and, again, for the sake of starting conservatively, I've got imap-login set to a client limit of 20, since I figure that proxying is a lot more work than just doing IMAP logins. I'm doing auth to mysql at both stages (at the proxy level and at the backend level).
On a sample director box, I've got 10000 imap connections, with traffic to the backends varying from 50mbit/sec up to 200mbit/sec. About a third of the connections are TLS, if that makes a diff. That's pretty normal from what I've seen. The director servers are usually 90-95% idle.
Should I be able to handle a much higher client_limit for imap-login and pop3-login than 20?
On 6 Apr 2017, at 21.14, Mark Moseley moseleymark@gmail.com wrote:
TL;DR: In a director/proxy setup, what's a good client_limit for imap-login/pop3-login?
You should have the same number of imap-login processes as the number of CPU cores, so they can use all the available CPU without doing unnecessary context switches. The client_limit is then large enough to handle all the concurrent connections you need, but not so large that it would bring down the whole system if it actually happens.
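In config terms that could look something like this (a sketch; the numbers are placeholders for e.g. a 16-core proxy that should cope with ~100k concurrent connections):

service imap-login {
  # "High-performance mode": processes never exit and each one
  # handles many connections.
  service_count = 0
  # Roughly one long-lived process per CPU core (16 cores assumed).
  process_min_avail = 16
  process_limit = 16
  # process_limit * client_limit must comfortably exceed the expected
  # concurrent connections: 16 * 8192 = 131072 here.
  client_limit = 8192
  # More clients per process also means more memory per process.
  vsz_limit = 1G
}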
Would the same apply for imap-login when it's being used in proxy mode? I'm moving us to a director setup (cf. my other email about director rings getting wedged from a couple days ago) and, again, for the sake of starting conservatively, I've got imap-login set to a client limit of 20, since I figure that proxying is a lot more work than just doing IMAP logins. I'm doing auth to mysql at both stages (at the proxy level and at the backend level).
Proxying isn't doing any disk IO or any other blocking operations. There's no benefit to having more processes. The only theoretical advantage would be if some client could trigger a lot of CPU work and cause delays to handling other clients, but I don't think that's possible (unless somehow via OpenSSL but I'd guess that would be a bug in it then).
Should I be able to handle a much higher client_limit for imap-login and pop3-login than 20?
Yeah.
Hello,
On Thu, 6 Apr 2017 22:13:07 +0300 Timo Sirainen wrote:
You should have the same number of imap-login processes as the number of CPU cores, so they can use all the available CPU without doing unnecessary context switches. The client_limit is then large enough to handle all the concurrent connections you need, but not so large that it would bring down the whole system if it actually happens.
Also keep in mind that pop3 login processes deal with rather ephemeral events, unlike IMAP with IDLE sessions lasting months. So they're unlikely to grow beyond their initial numbers even with a small (a few hundred) client_limit.
On the actual mailbox servers, both kinds of login processes tend to use about 1% of one core; very lightweight.
Proxying isn't doing any disk IO or any other blocking operations. There's no benefit to having more processes. The only theoretical advantage would be if some client could trigger a lot of CPU work and cause delays to handling other clients, but I don't think that's possible (unless somehow via OpenSSL but I'd guess that would be a bug in it then).
Indeed, in proxy mode you can go nuts; here I see pop3-logins being busier, but still just 2-5% of a core as opposed to typically 1-2% for imap-logins. That's with 500 pop3 sessions at any given time and 70k IMAP sessions per node. In other words, less than 1 core total, typically.
The above is with a 4k client_limit; I'm definitely going to crank that up to 16k when the opportunity arises (quite disruptive on a proxy...).
Christian
Timo, any sense on where (if any) the point is where there are so many connections on a given login process that it would get too busy to keep up? I.e. where the sheer amount of stuff the login process has to do outweighs the CPU savings of not having to context switch so much?
I realize that's a terribly subjective question, so perhaps you might have a guess in very, very round numbers? Given a typical IMAP userbase (moderately busy, most people sitting in IDLE, etc), I would've thought 10k connections on a single process would've been past that tipping point.
With the understood caveat of being totally subjective and dependent on local conditions, should 20k be ok? 50k? 100k?
Maybe a better question is, is there anywhere in the login process that is possible to block? If not, I'd figure that a login process that isn't using up 100% of a core can be assumed to *not* be falling behind. Does that seem accurate?
On 10 Apr 2017, at 21.49, Mark Moseley moseleymark@gmail.com wrote:
Timo, any sense on where (if any) the point is where there are so many connections on a given login process that it would get too busy to keep up? I.e. where the sheer amount of stuff the login process has to do outweighs the CPU savings of not having to context switch so much?
There might be some unexpected bottleneck somewhere, but I haven't heard of anyone hitting one.
I realize that's a terribly subjective question, so perhaps you might have a guess in very, very round numbers? Given a typical IMAP userbase (moderately busy, most people sitting in IDLE, etc), I would've thought 10k connections on a single process would've been past that tipping point.
With the understood caveat of being totally subjective and dependent on local conditions, should 20k be ok? 50k? 100k?
I only remember seeing a few thousand connections per process, but the CPU usage there was almost nothing. So I'd expect it to scale well past 10k connections. I think it's mainly limited by Linux, and a quick google shows 500k, but I guess that's per server and not per process. Still, that's likely not all that many CPUs/processes. http://stackoverflow.com/questions/9899532/maximum-socket-connection-with-ep...
Maybe a better question is, is there anywhere in the login process that is possible to block?
Shouldn't be. Well, logging, but all the login processes are sharing the same log pipe so if one blocks the others would block too.
If not, I'd figure that a login process that isn't using up 100% of a core can be assumed to *not* be falling behind. Does that seem accurate?
Should be. In general I haven't heard of installations hitting CPU limits in proxies. The problem so far has always been related to getting enough outgoing sockets without errors, which is a server-wide problem. 2.2.29 has one tweak that hopefully helps with that.
On Mon, 10 Apr 2017 23:11:24 +0300 Timo Sirainen wrote:
There might be some unexpected bottleneck somewhere, but I haven't heard of anyone hitting one.
I haven't either. OTOH, context switching isn't _that_ bad, and unless you have a very dedicated box doing just this one thing it might happen anyway. Never mind that NUMA and device IRQ adjacency also factor into this.
So you probably want to start with some reasonable value (and that means larger than what you expect to be needed ^o^), and if it grows beyond that, no worries.
I only remember seeing a few thousand connections per process, but the CPU usage there was almost nothing. So I'd expect it to scale well past 10k connections. I think it's mainly limited by Linux, and a quick google shows 500k, but I guess that's per server and not per process. Still, that's likely not all that many CPUs/processes. http://stackoverflow.com/questions/9899532/maximum-socket-connection-with-ep...
As I wrote, and from my substantial experience, 8k connections per process are no issue at all; I'd expect it to scale easily up to 50k. But w/o any pressing reason I'd personally keep it below 20k; too many eggs in one basket and all that.
And in the original context of this thread, an imap-hibernate process with 2.5k connections uses about 10MB RAM and 0.5% of a CPU core, so 16k per process as configured here should be a breeze.
Maybe a better question is, is there anywhere in the login process that is possible to block?
Shouldn't be. Well, logging, but all the login processes are sharing the same log pipe so if one blocks the others would block too.
If not, I'd figure that a login process that isn't using up 100% of a core can be assumed to *not* be falling behind. Does that seem accurate?
Should be. In general I haven't heard of installations hitting CPU limits in proxies. The problem so far has always been related to getting enough outgoing sockets without errors, which is a server-wide problem. 2.2.29 has one tweak that hopefully helps with that.
Which would be? The delayed connection bit?
Anyway, with a properly sized login_source_ips pool this shouldn't be an issue; I have 80k sessions (that's 160k connections total) per proxy server now and they are bored.
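For the record, that pool is just a list of additional local addresses for the proxy's outgoing connections, along these lines (addresses made up, obviously):

# Each extra source IP adds roughly another 64k usable source ports
# towards any given backend, which is what keeps a proxy with 160k
# connections clear of EADDRNOTAVAIL.
login_source_ips = 192.0.2.10 192.0.2.11 192.0.2.12 192.0.2.13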
Christian
On 12 Apr 2017, at 3.33, Christian Balzer chibi@gol.com wrote:
Which would be? The delayed connection bit?
This one - we're getting these errors a few times per minute once the proxy had ~30k sessions:
commit 2ba664518940f4cef7f7339719944f80d0a238ca
Author: Timo Sirainen <timo.sirainen@dovecot.fi>
Date:   Tue Apr 4 13:28:44 2017 +0300

    lib: Increase net_connect*() EADDRNOTAVAIL retries to 20

    4 is too little, since on busy systems it's sometimes failing. These calls
    should be pretty cheap, so lets try if 20 is enough.

    It would be nice if this was configurable, but the only practical way right
    now would be to use environment variable, which is a bit ugly. We could
    try it next if 20 is still not enough.
Hello,
Just to follow up on this, we've hit over 16k (default client limit here) hibernated sessions:
dovecot 119157 0.1 0.0 63404 56140 ? S Apr01 62:05 dovecot/imap-hibernate [11291 connections]
dovecot 877825 0.2 0.0 28512 21224 ? S Apr23  1:34 dovecot/imap-hibernate [5420 connections]
No issues other than the minor bug I reported; CPU usage is slight (at most 2% of a CPU core), memory savings are immense, so I'm a happy camper.
Just out of curiosity, how does dovecot decide to split and spread out sessions between hibernate processes? It's clearly something more involved than "fill up one and then fill up the next" or we would see 16k on the old one and a few on the new one.
Christian
On 24 Apr 2017, at 4.04, Christian Balzer chibi@gol.com wrote:
Just out of curiosity, how does dovecot decide to split and spread out sessions between hibernate processes? It's clearly something more involved than "fill up one and then fill up the next" or we would see 16k on the old one and a few on the new one.
New processes aren't created until client_limit is reached in all the existing processes. When there are multiple processes they're all listening for new connections, and whichever happens to be fastest gets it. Related to this, I'm thinking about implementing SO_REUSEPORT (https://lwn.net/Articles/542629/) soon, which would change the behavior a bit. Although its main purpose would be as a workaround to allow Dovecot restarts to work even though some of the old processes are still keeping the listener port open.
Huh, SO_REUSEPORT was already implemented in 2013. I completely forgot about that. Would be useful to try if it works better (or at least not worse) and maybe change it to be enabled by default in some version.
service ... {
  inet_listener ... {
    reuse_port = yes
  }
}
After some testing, it looks like it's not working correctly. Needs some further thinking to figure out if it can even be made to work well with Dovecot.
participants (4): Aki Tuomi, Christian Balzer, Mark Moseley, Timo Sirainen