On 17/07/2023 13:24 EEST David Zambonini dzambonini@names.co.uk wrote:
On 16/07/2023 17:54, Aki Tuomi via dovecot wrote:
Hi!
Yes, director and replicator are removed, and won't be available for pro users either.
For NFS setups (or similar shared setups), we have documented a way to use Lua to run a director-like setup, see
https://doc.dovecot.org/3.0/configuration_manual/howto/director_with_lua/
With respect, I don't understand why these scripts are considered a suitable replacement, because they're not, and it's clear no real attempt was made to make them one.
Putting aside things that are relatively trivial to add, like weighted instead of random mappings, the meat of the issue is this:
When a backend is removed (fails, or gracefully taken out), the script remaps connecting users that were mapped to it to a different backend. This sounds obvious enough.
However, when that backend comes back up, users aren't mapped back onto it: they now have mappings elsewhere. Maybe you got lucky and some of its users simply didn't connect the whole time it was down, and maybe you gained a few new users, but for the majority of your active user base you now have a large imbalance in your user mappings. The backend has effectively stopped existing for them.
So you have to rebalance your user mappings manually.
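The failure/recovery behaviour described above can be sketched as follows. This is a minimal illustration in Python, not the actual Lua from the howto; the dict stands in for the persistent user database, and all names are hypothetical:

```python
import hashlib

def initial_backend(user, backends):
    # Deterministic initial placement over the full backend list.
    h = int(hashlib.sha256(user.encode()).hexdigest(), 16)
    return backends[h % len(backends)]

class MappingTable:
    def __init__(self, backends):
        self.backends = list(backends)
        self.alive = set(backends)
        self.table = {}   # persisted user -> backend mapping

    def lookup(self, user):
        backend = self.table.get(user) or initial_backend(user, self.backends)
        if backend not in self.alive:
            # Backend is down: remap to a surviving backend and persist.
            survivors = sorted(self.alive)
            h = int(hashlib.sha256(user.encode()).hexdigest(), 16)
            backend = survivors[h % len(survivors)]
        self.table[user] = backend
        return backend
```

Note that when the failed backend is added back to `alive`, `lookup()` keeps returning the persisted survivor rows: nothing ever moves users back, which is exactly the imbalance described above.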
Rebalancing requires going through all of your user mappings, which quickly becomes prohibitively expensive as the number of users increases. It also leads us on to the next issue.
Adding a new backend, or returning a failed one, requires remapping your users, but you can't remap a user that's currently connected to a backend without risking a new connection landing on the wrong backend.
So either you rebalance/add backends in a way that never touches any currently connected user (which isn't really useful), or you have to start kicking your users (on each of your balancers individually; you no longer have the control director gave you!) and remapping them to rebalance the overall configuration.
We know that a user will immediately attempt to reconnect when kicked; that's the nature of IMAP clients. You now have a race condition, and the only solution is to lock the user out while you update the database. I'm not even sure how you'd cover this.
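Within a single process, the lock being asked for would look something like the sketch below: the login path and the rebalancer serialise on a per-user lock, so a client that reconnects immediately after being kicked can't read the mapping mid-update. Everything here (the class, the `kick` callback) is hypothetical, and it only solves the easy case; across several balancers you'd need a distributed lock, which is the part with no obvious answer:

```python
import threading
from collections import defaultdict

class LockedMappings:
    def __init__(self, table):
        self.table = table                        # user -> backend
        self.locks = defaultdict(threading.Lock)  # per-user locks

    def lookup(self, user):
        # Login path: taken on every new connection.
        with self.locks[user]:
            return self.table[user]

    def move_user(self, user, new_backend, kick):
        # Rebalancer path: kick and remap atomically w.r.t. lookups.
        with self.locks[user]:
            kick(user)                  # disconnect existing sessions
            self.table[user] = new_backend
        # Any reconnect after this point sees the new backend.
```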
So it's all gone from automatic to painfully manual with large overheads and race conditions.
At the very least, retaining even a basic level of suitability would require:
- mappings in the database being given a TTL after which they're no longer considered valid. This is trivial, however:
- importantly, mappings exceeding TTL must *still* be considered valid if the user has existing connections on any balancer. Good luck with that!
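The two requirements above could be sketched like this, again in illustrative Python rather than the actual Lua. The `active_connections` callback is hypothetical, and answering it correctly across all balancers is exactly the hard part:

```python
import time

TTL = 15 * 60  # seconds a mapping stays valid with no connections (assumed value)

class TTLMappings:
    def __init__(self, active_connections, now=time.monotonic):
        self.rows = {}                  # user -> (backend, last_refresh)
        self.active = active_connections
        self.now = now

    def store(self, user, backend):
        self.rows[user] = (backend, self.now())

    def lookup(self, user):
        row = self.rows.get(user)
        if row is None:
            return None
        backend, stamp = row
        if self.now() - stamp <= TTL:
            return backend
        if self.active(user) > 0:
            # Expired, but the user is still connected somewhere:
            # the mapping must stay valid, so refresh it.
            self.store(user, backend)
            return backend
        del self.rows[user]             # expired and idle: drop it
        return None
```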
And all this doesn't even cover the other deficiencies. Some easy enough to add, some (involving addressing all balancers at once) far less so.
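For what it's worth, the "weighted instead of random mappings" point mentioned earlier really is trivial: weighted rendezvous (HRW) hashing gives a deterministic, weight-proportional user-to-backend assignment. A sketch, with hypothetical backend names and weights:

```python
import hashlib
import math

def pick_backend(user, backends):
    """Map a user to a backend deterministically, honouring weights.

    backends: dict of backend name -> positive weight. Higher-weight
    backends receive proportionally more users, and the same user always
    lands on the same backend while the set is stable.
    """
    def score(name, weight):
        digest = hashlib.sha256(f"{name}:{user}".encode()).hexdigest()
        # Map the hash to a float strictly inside (0, 1), then apply
        # the weighted-rendezvous score -w / ln(frac).
        frac = (int(digest, 16) % (2**53) + 1) / (2**53 + 2)
        return -weight / math.log(frac)

    return max(backends, key=lambda name: score(name, backends[name]))
```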
Unfortunately I can't speak for our future plans, but I'd personally remain on the 2.x director for the proxies for as long as possible, then potentially look at the feasibility of porting the director code to 3.x.
Again, I really don't understand the thinking behind calling the Lua scripts anything like a suitable replacement. Our team were considering using Pro when the decision to drop director was announced; we backed off quickly as a result.
-- David Zambonini
Do note that most of the issues you just described were actually issues with director itself. Director did not do any automatic healing, load balancing, etc., which is why we have a Pro offering that provides an actual clustering component (the Palomar architecture).
Those Lua scripts are an OK replacement for a do-it-yourself environment.
Aki