I've been playing with weakforced, since it fills the 'fail2ban across a cluster' niche (not to mention RBLs). It seems to work well, once you've actually read the docs :)
I was curious if anyone had played with it and was *very* curious if anyone was using it in high traffic production. Getting things to 'work' versus getting them to work *and* handle a couple hundred dovecot servers is a very wide margin. I realize this is not a weakforced mailing list (there doesn't appear to be one anyway), but the users here are some of the likeliest candidates for having tried it out.
Mainly I'm curious if weakforced can handle serious concurrency and whether the cluster really works under load.
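For anyone unfamiliar with weakforced, the kind of policy being described here (count failed logins per client IP in a shared time-window database and reject past a threshold) looks roughly like the following in its Lua configuration. This is only a sketch modeled on the sample policy shipped with weakforced; the database name, field name, threshold, and the exact shape of the allow return value are illustrative and should be checked against your version's docs.

    -- Sliding-window stats DB: 6 windows of 600s each (one hour of history),
    -- with a single integer counter per key. Name and layout are illustrative.
    newStringStatsDB("OneHourDB", 600, 6, { countFailedLogins = "int" })

    -- Called for every login result reported by Dovecot.
    function report(lt)
      local sdb = getStringStatsDB("OneHourDB")
      if not lt.success then
        -- Stats DBs accept IP addresses (ComboAddress) as keys as well as strings.
        sdb:twAdd(lt.remote, "countFailedLogins", 1)
      end
    end

    -- Called before a login is allowed to proceed.
    function allow(lt)
      local sdb = getStringStatsDB("OneHourDB")
      if sdb:twGet(lt.remote, "countFailedLogins") > 50 then
        -- Non-zero status rejects; the return tuple follows the sample policy
        -- (status, message, log message, extra log key/values).
        return -1, "too many failed logins from your IP", "failedLogins", {}
      end
      return 0, "", "", {}
    end

    setReportFunc(report)
    setAllowFunc(allow)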
On 19.07.2017 02:38, Mark Moseley wrote:
> I've been playing with weakforced, since it fills the 'fail2ban across a cluster' niche (not to mention RBLs). It seems to work well, once you've actually read the docs :)
> I was curious if anyone had played with it and was *very* curious if anyone was using it in high traffic production. Getting things to 'work' versus getting them to work *and* handle a couple hundred dovecot servers is a very wide margin. I realize this is not a weakforced mailing list (there doesn't appear to be one anyway), but the users here are some of the likeliest candidates for having tried it out.
> Mainly I'm curious if weakforced can handle serious concurrency and whether the cluster really works under load.
Hi!
Weakforced is used by some of our customers in quite large installations, and performs quite nicely.
Aki
On Tue, Jul 18, 2017 at 10:40 PM, Aki Tuomi <aki.tuomi@dovecot.fi> wrote:
On 19.07.2017 02:38, Mark Moseley wrote:
>> I've been playing with weakforced, since it fills the 'fail2ban across a cluster' niche (not to mention RBLs). It seems to work well, once you've actually read the docs :)
>> I was curious if anyone had played with it and was *very* curious if anyone was using it in high traffic production. Getting things to 'work' versus getting them to work *and* handle a couple hundred dovecot servers is a very wide margin. I realize this is not a weakforced mailing list (there doesn't appear to be one anyway), but the users here are some of the likeliest candidates for having tried it out.
>> Mainly I'm curious if weakforced can handle serious concurrency and whether the cluster really works under load.
> Hi!
> Weakforced is used by some of our customers in quite large installations, and performs quite nicely.
Cool, good to know.
Do you have any hints/tips/guidelines for things like sizing, both in a per-server sense (memory, mostly) and in a cluster-sense (logins per sec :: node ratio)? I'm curious too how large is quite large. Not looking for details but just a ballpark figure. My largest install would have about 4 million mailboxes to handle, which I'm guessing falls well below 'quite large'. Looking at stats, our peak would be around 2000 logins/sec.
I'm also curious if -- assuming they're well north of 2000 logins/sec -- the replication protocol begins to overwhelm the daemon at very high concurrency.
Any rules of thumb on things like "For each additional 1000 logins/sec, add another # to setNumSiblingThreads and another # to setNumWorkerThreads" would be super appreciated too.
Thanks! And again, feel free to point me elsewhere if there's a better place to ask. For a young project, the docs are actually quite good.
On 16 Aug 2017, at 21.34, Mark Moseley <moseleymark@gmail.com> wrote:
> Cool, good to know.
> Do you have any hints/tips/guidelines for things like sizing, both in a per-server sense (memory, mostly) and in a cluster-sense (logins per sec :: node ratio)? I'm curious too how large is quite large. Not looking for details but just a ballpark figure. My largest install would have about 4 million mailboxes to handle, which I'm guessing falls well below 'quite large'. Looking at stats, our peak would be around 2000 logins/sec.
A single node can manage about 18K requests/sec while keeping latency under 10 ms per request. This was benchmarked on a system with a 4-core, 3.1 GHz CPU. Each login is usually 3 requests, so this translates to about 6000 logins/sec. Results are the same for a two-node cluster.
As for memory, there is only one rule: the whole dataset must fit in memory or performance dies. In the benchmarks, 4 windows x 5 integer fields with 1M keys used 2 GB of RAM.
Sami
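To make those sizing numbers concrete: a stats DB with roughly the benchmarked shape (4 windows, 5 integer fields per key) would be declared along the lines below. The field names are invented for illustration, and capping the key count to bound memory assumes the twSetMaxSize setter is available in your weakforced version.

    -- Roughly the benchmarked shape: 4 time windows, 5 integer fields per key.
    newStringStatsDB("SizingDB", 900, 4, {
      countLogins        = "int",
      countFailedLogins  = "int",
      countImapLogins    = "int",
      countPop3Logins    = "int",
      countPolicyRejects = "int",
    })

    -- About 2 GB of RAM at 1M keys for this layout, per the numbers above;
    -- capping the number of keys keeps the whole dataset in memory.
    getStringStatsDB("SizingDB"):twSetMaxSize(1000000)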
Below is an answer from the current weakforced main developer. It overlaps partly with Sami's answer.
---snip---
> Do you have any hints/tips/guidelines for things like sizing, both in a per-server sense (memory, mostly) and in a cluster-sense (logins per sec :: node ratio)? I'm curious too how large is quite large. Not looking for details but just a ballpark figure. My largest install would have about 4 million mailboxes to handle, which I'm guessing falls well below 'quite large'. Looking at stats, our peak would be around 2000 logins/sec.
So in terms of overall requests per second, on a 4-CPU server, latencies start to rise pretty quickly once you get to around 18K requests per second. Bearing in mind that each login from Dovecot can generate two allow requests and one report request, this works out to roughly 6K logins per second on a 4-CPU server.
In terms of memory usage, the more the better obviously, but it depends on your policy and how many time windows you have. Most of our customers have 24GB+.
> I'm also curious if -- assuming they're well north of 2000 logins/sec -- the replication protocol begins to overwhelm the daemon at very high concurrency.
Eventually it will, but in tests it consumes a pretty tiny fraction of the overall CPU load compared to requests so it must be a pretty high limit. Also, if you don’t update the time windows DB in the allow function, then that doesn’t cause any replication. We’ve tested with three servers, each handling around 5-6000 logins/sec (i.e. 15-18K requests each) and the overall query rate was maintained.
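For reference, the replication being discussed is the sibling traffic configured with settings like the following. This is a sketch: the addresses and the shared key are placeholders, and the key would normally come from running makeKey() once and pasting the result into every node's config.

    -- Hypothetical three-node cluster: every node runs this with the same key
    -- and lists the other nodes as siblings.
    setKey("Ay9KXgU3g4ygK+qWT0Ut4gH8PPz02gbtPeXWPdjD0HE=")  -- placeholder key
    siblingListener("0.0.0.0:4001")
    addSibling("192.0.2.11:4001")
    addSibling("192.0.2.12:4001")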
> Any rules of thumb on things like "For each additional 1000 logins/sec, add another # to setNumSiblingThreads and another # to setNumWorkerThreads" would be super appreciated too.
Actually the rule of thumb is more like:
- WorkerThreads - Set to number of CPUs. Set number of LuaContexts to WorkerThreads + 2
- SiblingThreads - Leave at 2 unless you see issues.
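Put as configuration, and assuming a 4-CPU node, that rule of thumb comes out roughly as below. The Lua-contexts setting is setNumLuaStates in the versions I have looked at; check your own version's docs.

    -- Hypothetical tuning for a 4-CPU node, following the rule of thumb above.
    setNumWorkerThreads(4)    -- one worker thread per CPU
    setNumLuaStates(6)        -- Lua contexts = worker threads + 2
    setNumSiblingThreads(2)   -- leave at 2 unless replication falls behind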
> Thanks! And again, feel free to point me elsewhere if there's a better place to ask.
Feel free to ask questions using the weakforced issues on GitHub.
> For a young project, the docs are actually quite good.
Thanks, that’s appreciated - we try to keep them up to date and comprehensive.
Neil
On Thu, Aug 17, 2017 at 1:16 AM, Teemu Huovila <teemu.huovila@dovecot.fi> wrote:
Wow, wow, wow. Thanks so much to all three of you guys for such detailed answers. That's absolutely perfect info and just what I was looking for.
participants (4)
- Aki Tuomi
- Mark Moseley
- Sami Ketola
- Teemu Huovila