On 30 Oct 2017, at 11.05, Ralf Becker <rb@egroupware.org> wrote:
It has now happened twice that replication created folders and mails in the wrong mailbox :(
Here's the architecture we use:
- 2 Dovecot (2.2.32) backends in two different datacenters replicating via a VPN connection
- Dovecot directors in both datacenters talk to both backends, with a vhost_count of 100 for the local backend and 1 for the remote one
- backends use a proxy dict via a Unix domain socket, with socat relaying over TCP to a dict on a different server (Kubernetes cluster)
- backends have a local sqlite userdb for iteration (which also contains the home directories, as an iteration-only userdb is not possible)
- serving around 7000 mailboxes in roughly 200 different domains
Everything works as expected until the dict becomes unreachable, e.g. due to a server failure or a planned reboot of a Kubernetes cluster node. In that situation some requests may go unanswered, even with Kubernetes running multiple instances of the dict. I can only speculate about what happens then: it seems the connection failure to the remote dict is not handled correctly and leads to a situation in which the last mailbox/home directory is used for the replication :(
It sounds to me like a userdb lookup changes the username during a dict failure, although I can't really think of how that could happen. The only thing that comes to mind is auth_cache, but in that case I'd expect the same problem to happen even when there aren't any dict errors.
For testing, you could check whether it's reproducible with:
- get a random username
- run doveadm user <user>
- verify that the result contains the same user as the input
Then run that rapidly in a loop and restart your test Kubernetes cluster once in a while.
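The test loop described above could be scripted roughly like this. It is a minimal sketch, not a tested tool: it assumes `doveadm` is on the PATH of the backend it runs on, and it reads the usernames from a plain text file with one name per line. The file name and the helper names (`doveadm_lookup`, `check_loop`) are hypothetical. Following the suggestion literally, it only checks that the lookup output contains the requested username as a substring, so a wrong answer for a user whose name contains the requested one (e.g. `malice@…` when asking for `alice@…`) would go unnoticed.

```python
import random
import subprocess
import sys
import time
from typing import Callable, List, Tuple


def doveadm_lookup(user: str) -> str:
    """Run `doveadm user <user>` and return its stdout; raises on a non-zero exit."""
    result = subprocess.run(
        ["doveadm", "user", user],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def check_loop(
    users: List[str],
    iterations: int,
    lookup: Callable[[str], str] = doveadm_lookup,
    delay: float = 0.0,
) -> List[Tuple[str, str]]:
    """Look up random users repeatedly and return (user, output) for each mismatch."""
    mismatches: List[Tuple[str, str]] = []
    for _ in range(iterations):
        user = random.choice(users)
        try:
            output = lookup(user)
        except subprocess.CalledProcessError as exc:
            # A failed lookup (e.g. while the dict is unreachable) is logged,
            # but it is not the bug being hunted, so it is not a mismatch.
            print(f"lookup failed for {user}: exit {exc.returncode}", file=sys.stderr)
            continue
        # Substring check, as suggested: the output must mention the input user.
        if user not in output:
            print(f"MISMATCH for {user}:\n{output}", file=sys.stderr)
            mismatches.append((user, output))
        if delay:
            time.sleep(delay)
    return mismatches


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python3 check_userdb.py users.txt 100000
    with open(sys.argv[1]) as f:
        user_list = [line.strip() for line in f if line.strip()]
    found = check_loop(user_list, int(sys.argv[2]))
    print(f"{len(found)} mismatches", file=sys.stderr)
    sys.exit(1 if found else 0)
```

Failed lookups are reported but not counted, since the interesting case is a lookup that succeeds yet returns a different user. Mismatches are printed as soon as they occur, so the script can be left running while Kubernetes nodes are restarted.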