Dovecot Replication - Architecture Endianness?
Hi all,
I've had an interesting use case come up where - to cut the story short - one way to solve the problem I am looking at may be to replicate a small number of mailboxes to a third remote server.
I've had replication running between my main Dovecot machine and another remote VM for some time, and it's working well (so I'm not new to replication and I've got a good working config), but I need to add a third machine to the mix for a select number of mailboxes. Both of those machines are Gentoo x86_64 running the latest 2.2.16 -hg.
I have attempted this so far by rsync'ing the initial Maildirs and then, once the bulk of the data has been transferred, relying on Dovecot's replication to keep things in sync. In theory this should mean that subsequent updates in both directions are incremental, and the bulk of the data gets moved by rsync while the device is here on my desk.
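For reference, the initial copy was just a plain recursive rsync, something along these lines (the path and user here are illustrative, not my exact ones):

# preseed one user's Maildir onto the Pi before enabling replication
rsync -a /home/vmail/user1/Maildir/ pi.x.y:/home/vmail/user1/Maildir/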
I've attempted to do this using a Raspberry Pi as the remote device, but when I set it up, the Dovecot replication process seems to need to start the replication over from scratch even after the rsync is done. I know this is happening because the disk utilisation on the Pi skyrockets once the replication starts, and I end up with thousands of double-ups of all the mails... which defeats the entire point of the process.
If I do an identical configuration but on a third Gentoo x86_64 VM locally, it all works as expected: no double-ups of mails, and the "catchup" between the two machines is practically instant. Same filesystem, even. The only difference appears to be the system architecture.
So my main question is this: is there a known architecture/endianness limitation on replication? I guess cross-arch replication is not something many people try, but is it supposed to work anyway?
Has anyone else got replication working across different architectures?
Also, is there a way to restrict replication users, aside from a crude hack around system first and last UIDs?
Thanks, Reuben
On 05/03/2015 01:48 PM, Reuben Farrelly wrote:
> So my main question is this: is there a known architecture/endianness limitation on replication? I guess cross-arch replication is not something many people try, but is it supposed to work anyway?

I think you are bumping against Dovecot index endianness restrictions. I don't think cross-arch dsync can currently work very efficiently. http://wiki2.dovecot.org/Design/Indexes/MainIndex?highlight=%28endian%29
> Also, is there a way to restrict replication users, aside from a crude hack around system first and last UIDs?

You can set the userdb to return an empty mail_replica variable for users you want to exclude from replication. http://hg.dovecot.org/dovecot-2.2/rev/c1c67bdc8752
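For example, in a passwd-file userdb an entry along these lines should do it (the username is illustrative):

exampleuser:::::::userdb_mail_replica=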
br, Teemu Huovila
On 4/05/2015 11:06 PM, Teemu Huovila wrote:
> I think you are bumping against Dovecot index endianness restrictions. I don't think cross-arch dsync can currently work very efficiently. http://wiki2.dovecot.org/Design/Indexes/MainIndex?highlight=%28endian%29
Ok, that explains why the rsync won't work. But if I kick off a Dovecot-to-Dovecot replication (without doing the rsync first), will this work any better once the system catches up? This assumes (possibly incorrectly - please correct me if I am wrong) that the index files themselves aren't dsync'd byte-by-byte; instead the metadata/content from them is sent, and the indexes are then written to disk by the remote Dovecot in the arch and format that the remote machine can read and understand. If that's the case then I can probably make it work, just taking a hit on the initial sync, which could take longer.
Even if this doesn't end up working I figure I'll get to learn a little more about the indexes themselves in the process.
Thanks for any advice, Reuben
On 04 May 2015, at 17:11, Reuben Farrelly reuben-dovecot@reub.net wrote:
> But if I kick off a Dovecot-to-Dovecot replication (without doing the rsync first), will this work any better once the system catches up? [...]
dsyncing between servers (or in general using dsync-server) transfers all data using a portable protocol, so the dsync source and destination can have different endianness and it doesn't matter.
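For example, a one-off initial sync of a single user against the replica could be run manually with something like this (host and port as in the mail_replica setting; the username is illustrative):

doveadm sync -u user1 tcp:pi.x.y:4814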
On 7/05/2015 7:47 AM, Timo Sirainen wrote:
> dsyncing between servers (or in general using dsync-server) transfers all data using a portable protocol, so the dsync source and destination can have different endianness and it doesn't matter.
I've tested this out today and can confirm it all works well, starting from nothing and doing the entire sync using Dovecot. The takeaway is that for cross-arch replication, an initial rsync is -not- the right thing to do.
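For anyone else trying this, the progress of the initial sync can be watched with something like:

doveadm replicator status '*'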
Thanks!
Reuben
On 4/05/2015 11:06 PM, Teemu Huovila wrote:
>> Also, is there a way to restrict replication users, aside from a crude hack around system first and last UIDs?
>
> You can set the userdb to return an empty mail_replica variable for users you want to exclude from replication. http://hg.dovecot.org/dovecot-2.2/rev/c1c67bdc8752
One last question: is it possible to achieve this with system users and PAM, or do I need to create a new static userdb for system users?
Could this be done via a per-user LDA setting or sieve?
Thanks, Reuben
On 06 May 2015, at 13:52, Reuben Farrelly reuben-dovecot@reub.net wrote:
> One last question: is it possible to achieve this with system users and PAM, or do I need to create a new static userdb for system users?
You can create a new userdb passwd-file that adds extra fields. So something like:
userdb {
  driver = passwd
  result_success = continue-ok
}

userdb {
  driver = passwd-file
  args = /etc/dovecot/passwd.extra
  skip = notfound
}
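where /etc/dovecot/passwd.extra contains one line per user with the extra fields, along these lines (username and replica location are illustrative):

exampleuser:::::::userdb_mail_replica=tcp:replica.example.com:4813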
> Could this be done via a per-user LDA setting or sieve?
Replication would also happen with IMAP access.
On 7/05/2015 7:49 AM, Timo Sirainen wrote:
> You can create a new userdb passwd-file that adds extra fields. So something like:
>
> userdb { driver = passwd result_success = continue-ok }
> userdb { driver = passwd-file args = /etc/dovecot/passwd.extra skip = notfound }
This doesn't seem to work for me, and my config contains exactly that. My passwd.extra file has just one line, for the one account I am testing with at the moment:
user1:::::::userdb_mail_replica=tcps:lightning.reub.net:4813,userdb_mail_replica=tcp:pi.x.y:4814
This breaks access for other system users which do not have entries, such as my own account:
May 7 21:19:06 tornado.reub.net dovecot: imap-login: Internal login failure (pid=22573 id=1) (internal failure, 1 successful auths): user=<reuben>, auth-method=PLAIN, remote=2001:44b8:31d4:1311::50, local=2001:44b8:31d4:1310::20, TLS
which then soon starts spitting this out tens of times per second in the mail log:
May 7 21:19:32 tornado.reub.net dovecot: auth-worker(23738): Error: Auth worker sees different passdbs/userdbs than auth server. Maybe config just changed and this goes away automatically?
This is with -hg latest as of now.
This system uses PAM for local users. Do I need to list all of the system users in the passwd.extra file too, including those who do not need any extra settings?
Is my syntax above for two mail_replica servers correct?
Thanks, Reuben
On 05/07/2015 02:32 PM, Reuben Farrelly wrote:
> Is my syntax above for two mail_replica servers correct?

I'm a bit unsure about the config syntax, so I cannot advise on that, but there were some bugs in auth yesterday. Maybe you could retest with f2a8e1793718 or newer. Make sure the configs on both sides are in sync.
Thank you for your continued testing, Teemu Huovila
On 8/05/2015 6:10 PM, Teemu Huovila wrote:
> There were some bugs in auth yesterday. Maybe you could retest with f2a8e1793718 or newer. Make sure the configs on both sides are in sync.
With -hg as of now, it's still no better:
tornado log # dovecot --version
2.2.16 (f2a8e1793718+)
tornado log #
===================
# System users (NSS, /etc/passwd, or similar). In many systems nowadays this
# uses Name Service Switch, which is configured in /etc/nsswitch.conf.
userdb {
  driver = passwd
  # [blocking=no]
  #args =

  # Override fields from passwd
  #override_fields = home=/home/virtual/%u

  result_success = continue-ok
}

# Add some extra fields such as replication..
userdb {
  driver = passwd-file
  args = /etc/dovecot/passwd.extra
  skip = notfound
}
==============
May 8 22:59:11 tornado.reub.net dovecot: imap: Error: Authenticated user not found from userdb, auth lookup id=586547201 (client-pid=29035 client-id=1)
May 8 22:59:11 tornado.reub.net dovecot: imap-login: Internal login failure (pid=29035 id=1) (internal failure, 1 successful auths): user=<reuben>, auth-method=PLAIN, remote=2001:44b8:31d4:1311::50, local=2001:44b8:31d4:1310::20, TLS
It also logs an awful lot of those lines in short succession, at least 15 per second...
Reuben
On 8/05/2015 11:04 PM, Reuben Farrelly wrote:
> With -hg as of now, it's still no better: [...]
>
> It also logs an awful lot of those lines in short succession, at least 15 per second...
Following on from this, I've managed to get it to work, but there is one outstanding problem which I suspect may be a bug. This is with an -hg build as of today.
In case anyone else tries this: I had to separate each userdb_mail_replica entry with a space, not a comma. This is, however, documented in the wiki.
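So the working form of the line I posted earlier is (same entry, with a space instead of the comma):

user1:::::::userdb_mail_replica=tcps:lightning.reub.net:4813 userdb_mail_replica=tcp:pi.x.y:4814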
The outstanding issue is that even though I have 'skip = notfound' on the second userdb as above, if I don't add all of the users to that file (even with no extra variables set), those users who are not listed cannot log in. They fail with the 'internal failure' error above.
It seems that the second userdb is not actually being skipped at all if the user is not listed in it... Timo?
Thanks, Reuben
participants (3): Reuben Farrelly, Teemu Huovila, Timo Sirainen