On 5.3.2012, at 14.15, Attila Nagy wrote:
On 03/04/12 11:44, Timo Sirainen wrote:
In dovecot-2.1 hg you can now test dsync-based replication. Not everything is finished yet, but it appears to work and I've enabled it for my @dovecot.fi mail. Some issues:
Do you plan to make it more performant in the future? I mean, calling doveadm (and ssh) for every change (even when changes are aggregated) seems very resource-intensive; it won't keep up on a machine with a lot of modifications happening every second.
Sure, the idea is to improve the performance. :) There are two ways:
Use longer running SSH sessions which dsync more than one user at a time.
Use TCP connections instead of SSH.
Don't forget about connection pooling to get concurrency. :)
There's already concurrency: replication_max_conns (default 10) specifies how many dsyncs can run concurrently.
Good to hear.
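To illustrate the two ideas above (aggregating changes per user and capping the number of concurrent syncs), here is a minimal Python sketch. It is not Dovecot's replicator: `replicate` and `sync_user` are hypothetical names, `sync_user` stands in for whatever actually runs one dsync, and `MAX_CONNS` only mirrors the replication_max_conns default of 10.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical constant; only mirrors the replication_max_conns default (10).
MAX_CONNS = 10


def replicate(changed_users, sync_user, max_conns=MAX_CONNS):
    """Run one sync per distinct user, with at most max_conns running at once.

    changed_users: iterable of user ids, possibly with duplicates, so that
    several changes to one mailbox are aggregated into a single sync.
    sync_user: hypothetical callable standing in for one dsync run.
    Returns the synced users in first-seen order.
    """
    # dict.fromkeys drops duplicates while keeping first-seen order
    unique = list(dict.fromkeys(changed_users))
    with ThreadPoolExecutor(max_workers=max_conns) as pool:
        # list() waits for every sync and re-raises any exception it hit
        list(pool.map(sync_user, unique))
    return unique
```

A fixed-size pool keeps at most `max_conns` syncs in flight, and deduplicating first means a user with many changes in one batch is synced only once.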
BTW, although it's somewhat harder to implement, I personally prefer native connections.
Native = TCP? It's not difficult, probably a few more lines of code, since doveadm server can already listen for TCP connections. It doesn't support SSL, though.
Yes. For large installations there may already be some backend channel (SSL tunnels, IPsec, etc.), so that seems OK.
It would be good to have constantly running daemons on both sides to eliminate the high startup/teardown costs.
The process startup/teardown costs are pretty low. I'll need to improve dsync's performance at some point, though. Actually, I've already pretty much redesigned the whole dsync, but I'll probably leave that for v2.2. The current design can still be improved.
It depends. For a moderately loaded server I get this:
# time ssh root@be02 "echo 1"
I meant doveadm/dsync costs; ssh startup is rather slow.
I see. Running it over the network makes this slightly worse. Long-running processes with long-running connections rule. :)
On 03/05/12 13:48, Timo Sirainen wrote:
Yes, dsync seems to need some optimizations too. :)
I previously tried this on one pair of our servers with a higher level of concurrency (8-16 or so, I can't remember exactly), and it couldn't keep up with the changes. The method was similar to yours:
- an external library wrote modified user ids to a file
- in an endless loop, a script picked them up (by moving the file away) and started parallel dsyncs (over ssh)
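The loop described above boils down to "claim the file, dedupe, dispatch". Below is a rough Python sketch of the claiming step. It assumes (hypothetically) that the external library appends one user id per line to `queue_path` and reopens the file for every write; `take_batch` and the `.processing` suffix are illustrative names. A writer that still has the old file open when it is renamed can append to the claimed copy after it has been read, so a production version would need locking or a similar handshake.

```python
import os


def take_batch(queue_path):
    """Atomically claim the current queue file and return its unique user ids.

    Renaming first means the writer starts a fresh file on its next write,
    so no line is handed out twice. Returns [] if nothing is queued yet.
    """
    claimed = queue_path + ".processing"  # hypothetical naming scheme
    try:
        # os.replace is atomic on POSIX when both paths share a filesystem
        os.replace(queue_path, claimed)
    except FileNotFoundError:
        return []
    with open(claimed, encoding="utf-8") as f:
        users = [line.strip() for line in f if line.strip()]
    os.remove(claimed)
    # Collapse repeated ids so each user gets one dsync per batch
    return list(dict.fromkeys(users))
```

The returned list can then be handed to a bounded pool of dsync runs.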
The runs got longer and longer...
dsync doesn't currently take enough advantage of modseqs to send only the changed data.
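For reference, the modseq idea works roughly like IMAP CONDSTORE (RFC 7162): every change bumps a per-mailbox modification sequence, so a replica that remembers the highest modseq it has applied only needs the messages above it. Below is a minimal Python sketch with a hypothetical `Message` type and `changes_since` helper, not dsync's actual data structures. Expunged messages have no modseq left to compare, which is why QRESYNC reports them separately; this sketch ignores expunges.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    uid: int
    modseq: int  # bumped whenever the message's flags or content change
    flags: frozenset


def changes_since(messages, last_modseq):
    """Return (changed messages, new high-water modseq).

    Only messages whose modseq is greater than what the replica last
    applied are selected, instead of sending the whole mailbox.
    """
    changed = [m for m in messages if m.modseq > last_modseq]
    new_high = max((m.modseq for m in changed), default=last_modseq)
    return changed, new_high
```

After applying `changed`, the replica stores `new_high` and passes it as `last_modseq` on the next run.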
Hm. What is your estimate of the performance of the current "best" replication scheme available in Dovecot? I know it's hard to tell because there are a lot of parameters, but do you think it's good enough for a real-world environment with (10-1000*x :) thousands of users and a lot of changes? BTW, it would be even better to have something as scalable as Cassandra, so Dovecot wouldn't have to worry about replication and (read/write) scalability.
BTW, we modify the maildirs externally, so this adds a lot of inefficiency here...
Definitely doesn't help.
I know, we are working on this. :)