Replication problem
Hi,
I'm using mdbox and replication. Due to a configuration error, synchronization could not be done last week. The problem has since been corrected, but synchronization of some mailboxes always fails with an I/O stalled timeout after 600 seconds.
The link between the two servers is quite slow, and multiple syncs run in parallel, which congests the link.
I can't replicate with rsync, because changes were made to the mdbox storage on both servers to get them back to a working state.
What do you think could be done to resynchronize the two Dovecot servers? Another question: what exactly is this timeout? Is it a communication timeout, i.e. no data received for 600 seconds (which looks unlikely to me), 600 seconds for the full sync, or 600 seconds for syncing one mail?
Thanks for any help.
Regards,
Vincent
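A back-of-the-envelope estimate shows why a fixed 600-second limit can be exceeded on a slow link shared by parallel syncs. The sketch below assumes the syncs split the link bandwidth evenly and ignores protocol overhead; the function names and the figures in the usage note are hypothetical, not measurements from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Estimated seconds to transfer one message when `parallel_syncs`
 * replication sessions share the link evenly (assumption: fair sharing,
 * no protocol overhead). */
static double transfer_seconds(double mail_bytes, double link_bytes_per_sec,
                               unsigned parallel_syncs)
{
    assert(link_bytes_per_sec > 0.0 && parallel_syncs > 0);
    double per_sync_rate = link_bytes_per_sec / parallel_syncs;
    return mail_bytes / per_sync_rate;
}

/* True if that transfer would complete before a fixed timeout. */
static bool fits_in_timeout(double mail_bytes, double link_bytes_per_sec,
                            unsigned parallel_syncs, double timeout_sec)
{
    return transfer_seconds(mail_bytes, link_bytes_per_sec, parallel_syncs)
           < timeout_sec;
}
```

For example, on a hypothetical 1 Mbit/s link (125,000 bytes/s) shared by 4 syncs, a 20 MB mail (20,000,000 bytes) needs about 640 s, so `fits_in_timeout(20000000.0, 125000.0, 4, 600.0)` is false, while the same mail alone on the link (160 s) fits.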
On 19/08/2014 10:34, Ve (HOME) wrote:
Hi
After some digging, I found that the problem is this 600-second timeout, which in my case is not enough to transfer one big mail. So it retries, gets the same result, and again and again.
I have verified with strace that data is exchanged continuously between the two hosts during the sync, but the file still cannot be uploaded within that time.
Is there a way to configure this timeout?
Or perhaps a manual sync with a larger timeout, to restore replication before limiting the maximum message size in Postfix?
A possible feature would be a shorter timeout applied to the transmission itself (i.e. nothing received for 30 seconds = timeout), or a timeout computed from the message size (e.g. 300 seconds for 10 MB).
Any help is appreciated.
Vincent
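The two policies proposed above can be sketched as small decision functions. This is only an illustration of the idea, not Dovecot code: the function names are hypothetical, times are whole seconds, and 10 MB is taken as 10,000,000 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Proposal 1: idle timeout. The caller updates last_data_sec every time
 * data arrives, so a slow but steady transfer never times out. */
static bool idle_timed_out(uint64_t now_sec, uint64_t last_data_sec,
                           uint64_t idle_limit_sec)
{
    assert(now_sec >= last_data_sec);
    return now_sec - last_data_sec >= idle_limit_sec;
}

/* Proposal 2: size-based timeout, e.g. 300 s per 10 MB chunk. Partial
 * chunks are rounded up, and the result never drops below min_sec. */
static uint64_t size_based_timeout_sec(uint64_t mail_bytes,
                                       uint64_t secs_per_chunk,
                                       uint64_t chunk_bytes,
                                       uint64_t min_sec)
{
    assert(chunk_bytes > 0);
    uint64_t chunks = (mail_bytes + chunk_bytes - 1) / chunk_bytes;
    uint64_t t = chunks * secs_per_chunk;
    return t < min_sec ? min_sec : t;
}
```

With a 30 s idle limit, a transfer that last received data 20 s ago keeps going; with 300 s per 10 MB and a 600 s floor, a 25 MB mail gets 900 s. The idle variant fits the strace observation (data flowing continuously) without any assumption about link speed, whereas the size-based variant still bakes in a minimum expected throughput.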
On 09/10/2014 02:04 AM, Vincent ETIENNE wrote:
Currently there is no way to change it at run time. As a quick fix, if you compile your own Dovecot, you could try modifying DSYNC_IBC_STREAM_TIMEOUT_MSECS in src/doveadm/dsync/dsync-ibc-stream.c. I think that is the timeout you are bumping up against.
br, Teemu Huovila
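A minimal sketch of that quick fix, assuming the constant is an ordinary #define in milliseconds (as its _MSECS suffix suggests) whose stock value corresponds to the 600-second limit seen here. Check the actual line in your own copy of src/doveadm/dsync/dsync-ibc-stream.c before editing; the one-hour value and the STOCK_TIMEOUT_MSECS name are illustrative only.

```c
#include <assert.h>

/* Illustrative only: the 600 s limit reported in this thread, in ms.
 * STOCK_TIMEOUT_MSECS is a hypothetical name used for comparison. */
#define STOCK_TIMEOUT_MSECS (10 * 60 * 1000)

/* Local edit in dsync-ibc-stream.c: raise the stream timeout to one hour
 * (arbitrary example value; pick one that covers your largest mail). */
#define DSYNC_IBC_STREAM_TIMEOUT_MSECS (60 * 60 * 1000)

/* Compile-time check (C11 static_assert) that the new value is larger. */
static_assert(DSYNC_IBC_STREAM_TIMEOUT_MSECS > STOCK_TIMEOUT_MSECS,
              "new dsync timeout must exceed the stock 600 s");
```

A larger fixed value only postpones the problem until an even bigger mail arrives, so it is a stopgap rather than a fix.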
On 10/09/2014 11:56, Teemu Huovila wrote:
Thanks, I will try it and keep you informed of the result. It may take some time (I am not compiling Dovecot at the moment). Many thanks, because right now my replication is broken, so some users don't receive their mail, depending on which Dovecot instance they connect to...
Vincent ETIENNE
On 09/10/2014 01:49 PM, Vincent ETIENNE wrote:
Cancel that advice. Timo made a change that should make changing the timeout by hand unnecessary. If you can, try running Dovecot with this patch: http://hg.dovecot.org/dovecot-2.2/rev/647162da8423. There should be no timeouts, even for large mails.
Do you get any error messages when there is a timeout?
br, Teemu Huovila
On 2014-09-10 13:02, Teemu Huovila wrote:
I have tested Timo's patch (applied to version 2.2.13) and successfully synchronized a mail twice the size of the one that caused trouble before, so the change looks correct. However, I have not tested whether the timeout still occurs when the link is down or broken.
Thanks a lot for the quick response. Very helpful.
Vincent ETIENNE
participants (3)
- Teemu Huovila
- Ve (HOME)
- Vincent ETIENNE