<div dir="ltr">Hi Aki,<div><br></div><div>We just installed 2.3.19, and are seeing a couple of users throwing the "INBOX/dovecot.index reset, view is now inconsistent" and their replicator status erroring out. Tried force-resync on the full mailbox, but to no avail just yet. Not sure if this bug was supposedly fixed in 2.3.19?</div><div><br></div><div>Thanks,</div><div><br></div><div>Cassidy</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Apr 28, 2022 at 5:02 AM Aki Tuomi <<a href="mailto:aki.tuomi@open-xchange.com">aki.tuomi@open-xchange.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex">2.3.19 is round the corner, so not long. I cannot yet promise an exact date but hopefully within week or two.<br>
<br>
Aki<br>
<br>
> On 28/04/2022 13:57 Paul Kudla (<a href="http://SCOM.CA" rel="noreferrer" target="_blank">SCOM.CA</a> Internet Services Inc.) <<a href="mailto:paul@scom.ca" target="_blank">paul@scom.ca</a>> wrote:<br>
> <br>
> <br>
> Thanks for the update.<br>
> <br>
> Is this for both replication issues (folders 300+ etc.)?<br>
> <br>
> Just asking - any ETA?<br>
> <br>
> <br>
> <br>
> <br>
> <br>
> Happy Thursday !!!<br>
> Thanks - paul<br>
> <br>
> Paul Kudla<br>
> <br>
> <br>
> Scom.ca Internet Services <<a href="http://www.scom.ca" rel="noreferrer" target="_blank">http://www.scom.ca</a>><br>
> 004-1009 Byron Street South<br>
> Whitby, Ontario - Canada<br>
> L1N 4S3<br>
> <br>
> Toronto 416.642.7266<br>
> Main 1.866.411.7266<br>
> Fax 1.888.892.7266<br>
> <br>
> On 4/27/2022 9:01 AM, Aki Tuomi wrote:<br>
> > <br>
> > Hi!<br>
> > <br>
> > This is probably going to get fixed in 2.3.19; it looks like an issue we are already fixing.<br>
> > <br>
> > Aki<br>
> > <br>
> >> On 26/04/2022 16:38 Paul Kudla (<a href="http://SCOM.CA" rel="noreferrer" target="_blank">SCOM.CA</a> Internet Services Inc.) <<a href="mailto:paul@scom.ca" target="_blank">paul@scom.ca</a>> wrote:<br>
> >><br>
> >> <br>
> >> Agreed, there seems to be no way of posting these kinds of issues to see<br>
> >> whether they are even being addressed, or even known about, going forward in<br>
> >> new updates.<br>
> >><br>
> >> I read somewhere there is a new branch coming out, but nothing as of yet?<br>
> >><br>
> >> 2.4 maybe ....<br>
> >> 5.0 ........<br>
> >><br>
> >> My previous replication issues (back in February) went unanswered.<br>
> >><br>
> >> Not faulting anyone, but the developers do seem to be disconnected from<br>
> >> issues as of late, or concentrating on other issues.<br>
> >><br>
> >> I have no problem with support contracts for day-to-day maintenance;<br>
> >> however, as a programmer myself, they usually don't work, as the other end<br>
> >> relies on the latest source code anyway and thus cannot help.<br>
> >><br>
> >> I am trying to take apart the replicator C code based on 2.3.18,<br>
> >> as most of it does work to some extent.<br>
> >><br>
> >> tcps just does not work (i.e. it hits the 600-second default in the C code).<br>
> >><br>
> >> My thought is that tcp works OK but fails when the replicator, through<br>
> >> dsync-client.c, is asked to return the folder list?<br>
> >><br>
> >><br>
> >> replicator-brain.c seems to control the overall process and timing.<br>
> >><br>
> >> replicator-queue.c seems to handle the queue file, which does seem to carry<br>
> >> accurate info.<br>
> >><br>
> >><br>
> >> Things in the source code are documented well enough to figure this out, but I<br>
> >> am still going through all the related .h files, documentation-wise, which<br>
> >> are all over the place.<br>
> >><br>
> >> There is no clear documentation on the .h library files, so I have to walk<br>
> >> through the tree one file at a time finding the relevant code.<br>
> >><br>
> >> Since dsync from doveadm does seem to work OK, I have to assume the<br>
> >> dsync-client code used to build the replicator is at fault somehow, or a<br>
> >> call upstream from it?<br>
> >><br>
> >> Thanks for your input on the other issues noted below; I will keep that<br>
> >> in mind when disassembling the source code.<br>
> >><br>
> >> No sense in fixing one thing and leaving something else behind; it is probably<br>
> >> all related anyway.<br>
> >><br>
> >> I have two test servers available, so I can play with all this offline to<br>
> >> reproduce the issues.<br>
> >><br>
> >> Unfortunately I have to make a living first; this will be addressed when<br>
> >> possible, as I don't like live systems running this way, and I<br>
> >> currently only have 5 accounts with this issue (mine included).<br>
> >><br>
> >><br>
> >><br>
> >><br>
> >> Happy Tuesday !!!<br>
> >> Thanks - paul<br>
> >><br>
> >> Paul Kudla<br>
> >><br>
> >><br>
> >> Scom.ca Internet Services <<a href="http://www.scom.ca" rel="noreferrer" target="_blank">http://www.scom.ca</a>><br>
> >> 004-1009 Byron Street South<br>
> >> Whitby, Ontario - Canada<br>
> >> L1N 4S3<br>
> >><br>
> >> Toronto 416.642.7266<br>
> >> Main 1.866.411.7266<br>
> >> Fax 1.888.892.7266<br>
> >><br>
> >> On 4/26/2022 9:03 AM, Reuben Farrelly wrote:<br>
> >>><br>
> >>> I ran into this back in February and documented a reproducible test case<br>
> >>> (and sent it to this list). In short - I was able to reproduce this by<br>
> >>> having a valid and consistent mailbox on the source/local, creating a<br>
> >>> very standard empty Maildir/(new|cur|tmp) folder on the remote replica,<br>
> >>> and then initiating the replication from the source. This consistently<br>
> >>> caused dsync to fail replication with the error "dovecot.index reset,<br>
> >>> view is now inconsistent" and sync aborted, leaving the replica mailbox<br>
> >>> in a screwed up inconsistent state. Client connections on the source<br>
> >>> replica were also dropped when this error occurred. You can see the<br>
> >>> error by enabling debug level logging if you initiate dsync manually on<br>
> >>> a test mailbox.<br>
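> >>><br>
> >>> A minimal sketch of that reproduction, assuming a Maildir layout on the replica and<br>
> >>> the same doveadm sync flags quoted later in this thread (the user, path and doveadm<br>
> >>> location are placeholders, not my real config):<br>
> >>><br>
> >>> #!/usr/local/bin/python3<br>
> >>> # Sketch: reproduce "dovecot.index reset, view is now inconsistent" by<br>
> >>> # pre-creating an empty Maildir skeleton on the replica before the first sync.<br>
> >>> import os<br>
> >>> import subprocess<br>
> >>><br>
> >>> user = 'testuser@example.com'                      # placeholder account<br>
> >>> replica_maildir = '/home/vmail/testuser/Maildir'   # placeholder path on the replica<br>
> >>><br>
> >>> # Step 1 (run on the replica): create the bare Maildir/(cur|new|tmp) skeleton.<br>
> >>> for sub in ('cur', 'new', 'tmp'):<br>
> >>>     os.makedirs(os.path.join(replica_maildir, sub), exist_ok=True)<br>
> >>><br>
> >>> # Step 2 (run on the source): trigger a manual dsync with debug logging and<br>
> >>> # watch the output for "dovecot.index reset, view is now inconsistent".<br>
> >>> subprocess.run(['/usr/local/bin/doveadm', '-D', 'sync', '-u', user,<br>
> >>>                 '-d', '-N', '-l', '30', '-U'], check=False)<br>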
> >>><br>
> >>> The only workaround I found was to remove the remote Maildir and let<br>
> >>> Dovecot create the whole thing from scratch. Dovecot did not like any<br>
> >>> existing folders on the destination replica even if they were the same<br>
> >>> names as the source and completely empty. I was able to reproduce this with<br>
> >>> the bare minimum of folders - just an INBOX!<br>
> >>><br>
> >>> I have no idea if any of the developers saw my post or if the bug has<br>
> >>> been fixed for the next release. But it seemed to be quite a common<br>
> >>> problem over time (saw a few posts from people going back a long way<br>
> >>> with the same problem) and it is seriously disruptive to clients. The<br>
> >>> error message is not helpful in tracking down the problem either.<br>
> >>><br>
> >>> Secondly, I also have had an ongoing and longstanding problem using<br>
> >>> tcps: for replication. For some reason using tcps: (with no other<br>
> >>> changes at all to the config) results in a lot of timeout messages<br>
> >>> "Error: dsync I/O has stalled, no activity for 600 seconds". This goes<br>
> >>> away if I revert back to tcp: instead of tcps - with tcp: I very rarely<br>
> >>> get timeouts. No idea why; I guess this is a bug of some sort also.<br>
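> >>><br>
> >>> Since nothing else in the config changes, the difference is just the scheme in the<br>
> >>> mail_replica setting; roughly this (the hostname is a placeholder, and tcps may need<br>
> >>> additional SSL settings depending on the rest of the config):<br>
> >>><br>
> >>> plugin {<br>
> >>>   # reliable here:<br>
> >>>   mail_replica = tcp:replica.example.com<br>
> >>>   # regularly ends in "dsync I/O has stalled, no activity for 600 seconds":<br>
> >>>   #mail_replica = tcps:replica.example.com<br>
> >>> }<br>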
> >>><br>
> >>> It's disappointing that there appears to be no way to have these sorts<br>
> >>> of problems addressed like there once was. I am not using Dovecot for<br>
> >>> commercial purposes so paying a fortune for a support contract for a<br>
> >>> high end installation just isn't going to happen, and this list seems to<br>
> >>> be quite ordinary for getting support and reporting bugs nowadays....<br>
> >>><br>
> >>> Reuben<br>
> >>><br>
> >>> On 26/04/2022 7:21 pm, Paul Kudla (<a href="http://SCOM.CA" rel="noreferrer" target="_blank">SCOM.CA</a> Internet Services Inc.) wrote:<br>
> >>><br>
> >>>><br>
> >>>> side issue<br>
> >>>><br>
> >>>> If you are getting inconsistent dsyncs there is no real way to fix<br>
> >>>> this in the long run.<br>
> >>>><br>
> >>>> I know it's a pain (I already had to do this myself).<br>
> >>>><br>
> >>>> I needed to do a full sync: take one server offline, delete the user<br>
> >>>> dir (with Dovecot offline) and then rsync (or somehow duplicate the<br>
> >>>> main server's user data) over to the remote again.<br>
> >>>><br>
> >>>> Then bring the remote back up, and it kind of worked.<br>
> >>>><br>
> >>>> best suggestion is to bring the main server down at night so the copy<br>
> >>>> is clean?<br>
> >>>><br>
> >>>> If using Postfix you can enable the soft_bounce option and the mail<br>
> >>>> will spool back until everything comes back online (see the snippet below)<br>
> >>>><br>
> >>>> (needs to be enabled on both servers)<br>
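> >>>><br>
> >>>> For reference, if I remember the Postfix side correctly it is just this in<br>
> >>>> main.cf on both servers, followed by a reload:<br>
> >>>><br>
> >>>> # /etc/postfix/main.cf - defer (soft-bounce) mail instead of bouncing it<br>
> >>>> # while the mail store is unreachable<br>
> >>>> soft_bounce = yes<br>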
> >>>><br>
> >>>> replication was still an issue on accounts with 300+ folders in them,<br>
> >>>> still working on a fix for that.<br>
> >>>><br>
> >>>><br>
> >>>> Happy Tuesday !!!<br>
> >>>> Thanks - paul<br>
> >>>><br>
> >>>> Paul Kudla<br>
> >>>><br>
> >>>><br>
> >>>> Scom.ca Internet Services <<a href="http://www.scom.ca" rel="noreferrer" target="_blank">http://www.scom.ca</a>><br>
> >>>> 004-1009 Byron Street South<br>
> >>>> Whitby, Ontario - Canada<br>
> >>>> L1N 4S3<br>
> >>>><br>
> >>>> Toronto 416.642.7266<br>
> >>>> Main 1.866.411.7266<br>
> >>>> Fax 1.888.892.7266<br>
> >>>><br>
> >>>> On 4/25/2022 10:01 AM, Arnaud Abélard wrote:<br>
> >>>>> Ah, I'm now getting errors in the logs; that would explain the<br>
> >>>>> increasing number of failed sync requests:<br>
> >>>>><br>
> >>>>> dovecot: imap(xxxxx)<2961235><Bs6w43rdQPAqAcsFiXEmAInUhhA3Rfqh>:<br>
> >>>>> Error: Mailbox INBOX: /vmail/l/i/xxxxx/dovecot.index reset, view is<br>
> >>>>> now inconsistent<br>
> >>>>><br>
> >>>>><br>
> >>>>> And sure enough:<br>
> >>>>><br>
> >>>>> # dovecot replicator status xxxxx<br>
> >>>>><br>
> >>>>> xxxxx none 00:02:54 07:11:28 - y<br>
> >>>>><br>
> >>>>><br>
> >>>>> What could explain that error?<br>
> >>>>><br>
> >>>>> Arnaud<br>
> >>>>><br>
> >>>>><br>
> >>>>><br>
> >>>>> On 25/04/2022 15:13, Arnaud Abélard wrote:<br>
> >>>>>> Hello,<br>
> >>>>>><br>
> >>>>>> On my side we are running Linux (Debian Buster).<br>
> >>>>>><br>
> >>>>>> I'm not sure my problem is actually the same as Paul's or yours,<br>
> >>>>>> Sebastian, since I have a lot of mailboxes, but they are actually small<br>
> >>>>>> (110MB quota) so I doubt any of them have more than a dozen IMAP<br>
> >>>>>> folders.<br>
> >>>>>><br>
> >>>>>> The main symptom is that I have tons of full sync requests waiting,<br>
> >>>>>> but even though no other sync is pending, the replicator just waits<br>
> >>>>>> for something to trigger those syncs.<br>
> >>>>>><br>
> >>>>>> Today, with users back I can see that normal and incremental syncs<br>
> >>>>>> are being done on the 15 connections, with an occasional full sync<br>
> >>>>>> here or there and lots of "Waiting 'failed' requests":<br>
> >>>>>><br>
> >>>>>> Queued 'sync' requests 0<br>
> >>>>>><br>
> >>>>>> Queued 'high' requests 0<br>
> >>>>>><br>
> >>>>>> Queued 'low' requests 0<br>
> >>>>>><br>
> >>>>>> Queued 'failed' requests 122<br>
> >>>>>><br>
> >>>>>> Queued 'full resync' requests 28785<br>
> >>>>>><br>
> >>>>>> Waiting 'failed' requests 4294<br>
> >>>>>><br>
> >>>>>> Total number of known users 42512<br>
> >>>>>><br>
> >>>>>><br>
> >>>>>><br>
> >>>>>> So, why didn't the replicator take advantage of the weekend to<br>
> >>>>>> replicate the mailboxes while no users were using them?<br>
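> >>>>>><br>
> >>>>>> One way I can at least nudge a single stuck account, assuming I am reading the<br>
> >>>>>> doveadm-replicator man page correctly (the address is a placeholder):<br>
> >>>>>><br>
> >>>>>> doveadm replicator replicate -f someuser@example.com<br>
> >>>>>> doveadm replicator status someuser@example.com<br>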
> >>>>>><br>
> >>>>>> Arnaud<br>
> >>>>>><br>
> >>>>>><br>
> >>>>>><br>
> >>>>>><br>
> >>>>>> On 25/04/2022 13:54, Sebastian Marske wrote:<br>
> >>>>>>> Hi there,<br>
> >>>>>>><br>
> >>>>>>> thanks for your insights and for diving deeper into this Paul!<br>
> >>>>>>><br>
> >>>>>>> For me, the users ending up in 'Waiting for dsync to finish' all have<br>
> >>>>>>> more than 256 IMAP folders as well (ranging from 288 up to >5500;<br>
> >>>>>>> as per<br>
> >>>>>>> 'doveadm mailbox list -u <username> | wc -l'). For more details on my<br>
> >>>>>>> setup please see my post from February [1].<br>
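> >>>>>>><br>
> >>>>>>> For anyone who wants to check their own accounts, a rough sketch of collecting<br>
> >>>>>>> those counts for every user (the doveadm path, the user iteration via<br>
> >>>>>>> 'doveadm user', and the 256 cutoff are assumptions for illustration):<br>
> >>>>>>><br>
> >>>>>>> #!/usr/local/bin/python3<br>
> >>>>>>> # Sketch: flag accounts whose folder count is above the suspected threshold.<br>
> >>>>>>> import subprocess<br>
> >>>>>>><br>
> >>>>>>> THRESHOLD = 256<br>
> >>>>>>><br>
> >>>>>>> users = subprocess.run(['/usr/local/bin/doveadm', 'user', '*'],<br>
> >>>>>>>                        capture_output=True, text=True).stdout.split()<br>
> >>>>>>><br>
> >>>>>>> for user in users:<br>
> >>>>>>>     boxes = subprocess.run(['/usr/local/bin/doveadm', 'mailbox', 'list', '-u', user],<br>
> >>>>>>>                            capture_output=True, text=True).stdout.splitlines()<br>
> >>>>>>>     if len(boxes) > THRESHOLD:<br>
> >>>>>>>         print('%s: %d folders' % (user, len(boxes)))<br>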
> >>>>>>><br>
> >>>>>>> @Arnaud: What OS are you running on?<br>
> >>>>>>><br>
> >>>>>>><br>
> >>>>>>> Best<br>
> >>>>>>> Sebastian<br>
> >>>>>>><br>
> >>>>>>><br>
> >>>>>>> [1] <a href="https://dovecot.org/pipermail/dovecot/2022-February/124168.html" rel="noreferrer" target="_blank">https://dovecot.org/pipermail/dovecot/2022-February/124168.html</a><br>
> >>>>>>><br>
> >>>>>>><br>
> >>>>>>> On 4/24/22 19:36, Paul Kudla (<a href="http://SCOM.CA" rel="noreferrer" target="_blank">SCOM.CA</a> Internet Services Inc.) wrote:<br>
> >>>>>>>><br>
> >>>>>>>> Question - I am having similar replication issues.<br>
> >>>>>>>><br>
> >>>>>>>> Please read everything below and advise the folder counts on the<br>
> >>>>>>>> non-replicated users?<br>
> >>>>>>>><br>
> >>>>>>>> I find the total number of folders per account seems to be a factor,<br>
> >>>>>>>> and<br>
> >>>>>>>> NOT the size of the mailbox.<br>
> >>>>>>>><br>
> >>>>>>>> i.e. I have customers with 40G of email spread over 40 or so<br>
> >>>>>>>> folders<br>
> >>>>>>>> with no problem, and it works OK.<br>
> >>>>>>>><br>
> >>>>>>>> 300+ folders seems to be the issue<br>
> >>>>>>>><br>
> >>>>>>>> i have been going through the replication code<br>
> >>>>>>>><br>
> >>>>>>>> no errors being logged<br>
> >>>>>>>><br>
> >>>>>>>> I am assuming that the replication --> dsync client --> other server path is<br>
> >>>>>>>> timing out or not reading the folder lists correctly (i.e. it dies after X<br>
> >>>>>>>> folders are read).<br>
> >>>>>>>><br>
> >>>>>>>> Thus I am going through the code, patching in log entries etc. to find<br>
> >>>>>>>> the issues.<br>
> >>>>>>>><br>
> >>>>>>>> see<br>
> >>>>>>>><br>
> >>>>>>>> [13:33:57] <a href="http://mail18.scom.ca" rel="noreferrer" target="_blank">mail18.scom.ca</a> [root:0] /usr/local/var/lib/dovecot<br>
> >>>>>>>> # ll<br>
> >>>>>>>> total 86<br>
> >>>>>>>> drwxr-xr-x 2 root wheel uarch 4B Apr 24 11:11 .<br>
> >>>>>>>> drwxr-xr-x 4 root wheel uarch 4B Mar 8 2021 ..<br>
> >>>>>>>> -rw-r--r-- 1 root wheel uarch 73B Apr 24 11:11 instances<br>
> >>>>>>>> -rw-r--r-- 1 root wheel uarch 160K Apr 24 13:33 replicator.db<br>
> >>>>>>>><br>
> >>>>>>>> [13:33:58] <a href="http://mail18.scom.ca" rel="noreferrer" target="_blank">mail18.scom.ca</a> [root:0] /usr/local/var/lib/dovecot<br>
> >>>>>>>> #<br>
> >>>>>>>><br>
> >>>>>>>> replicator.db seems to get updated OK but never gets processed properly.<br>
> >>>>>>>><br>
> >>>>>>>> # sync.users<br>
> >>>>>>>> <a href="mailto:nick@elirpa.com" target="_blank">nick@elirpa.com</a> high 00:09:41 463:47:01 - y<br>
> >>>>>>>> <a href="mailto:keith@elirpa.com" target="_blank">keith@elirpa.com</a> high 00:09:23 463:45:43 - y<br>
> >>>>>>>> <a href="mailto:paul@scom.ca" target="_blank">paul@scom.ca</a> high 00:09:41 463:46:51 - y<br>
> >>>>>>>> <a href="mailto:ed@scom.ca" target="_blank">ed@scom.ca</a> high 00:09:43 463:47:01 - y<br>
> >>>>>>>> <a href="mailto:ed.hanna@dssmgmt.com" target="_blank">ed.hanna@dssmgmt.com</a> high 00:09:42 463:46:58 - y<br>
> >>>>>>>> <a href="mailto:paul@paulkudla.net" target="_blank">paul@paulkudla.net</a> high 00:09:44 463:47:03<br>
> >>>>>>>> 580:35:07<br>
> >>>>>>>> y<br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> so ....<br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> two things :<br>
> >>>>>>>><br>
> >>>>>>>> First, to get the production stuff to work I had to write a script<br>
> >>>>>>>> that<br>
> >>>>>>>> would find the bad syncs and then force a dsync between the servers.<br>
> >>>>>>>><br>
> >>>>>>>> I run this every ten minutes on each server.<br>
> >>>>>>>><br>
> >>>>>>>> in crontab<br>
> >>>>>>>><br>
> >>>>>>>> */10 * * * * root /usr/bin/nohup /programs/common/sync.recover > /dev/null<br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> python script to sort things out<br>
> >>>>>>>><br>
> >>>>>>>> # cat /programs/common/sync.recover<br>
> >>>>>>>> #!/usr/local/bin/python3<br>
> >>>>>>>><br>
> >>>>>>>> # Force a dsync between servers for any user the replicator reports as bad.<br>
> >>>>>>>><br>
> >>>>>>>> import os, sys, django, socket<br>
> >>>>>>>> import subprocess              # used for the background dsync below<br>
> >>>>>>>> from optparse import OptionParser<br>
> >>>>>>>><br>
> >>>>>>>> from lib import *              # local helpers used below: commands() and sendmail()<br>
> >>>>>>>><br>
> >>>>>>>> # Sample re-index of a mailbox:<br>
> >>>>>>>> # doveadm -D force-resync -u paul@scom.ca -f INBOX*<br>
> >>>>>>>><br>
> >>>>>>>> USAGE_TEXT = '''\<br>
> >>>>>>>> usage: %%prog %s[options]<br>
> >>>>>>>> '''<br>
> >>>>>>>><br>
> >>>>>>>> parser = OptionParser(usage=USAGE_TEXT % '', version='0.4')<br>
> >>>>>>>><br>
> >>>>>>>> parser.add_option("-m", "--send_to", dest="send_to", help="Send Email To")<br>
> >>>>>>>> parser.add_option("-e", "--email", dest="email_box", help="Box to Index")<br>
> >>>>>>>> parser.add_option("-d", "--detail", action='store_true', dest="detail", default=False, help="Detailed report")<br>
> >>>>>>>> parser.add_option("-i", "--index", action='store_true', dest="index", default=False, help="Index")<br>
> >>>>>>>><br>
> >>>>>>>> options, args = parser.parse_args()<br>
> >>>>>>>><br>
> >>>>>>>> print(options.email_box)<br>
> >>>>>>>> print(options.send_to)<br>
> >>>>>>>> print(options.detail)<br>
> >>>>>>>><br>
> >>>>>>>> #sys.exit()<br>
> >>>>>>>><br>
> >>>>>>>> print('Getting Current User Sync Status')<br>
> >>>>>>>> command = commands("/usr/local/bin/doveadm replicator status '*'")<br>
> >>>>>>>><br>
> >>>>>>>> #print(command)<br>
> >>>>>>>><br>
> >>>>>>>> sync_user_status = command.output.split('\n')<br>
> >>>>>>>><br>
> >>>>>>>> #print(sync_user_status)<br>
> >>>>>>>><br>
> >>>>>>>> synced = []<br>
> >>>>>>>><br>
> >>>>>>>> for n in range(1, len(sync_user_status)):<br>
> >>>>>>>>     user = sync_user_status[n]<br>
> >>>>>>>>     print('Processing User : %s' % user.split(' ')[0])<br>
> >>>>>>>><br>
> >>>>>>>>     # if a single box was requested with -e, skip everyone else<br>
> >>>>>>>>     if user.split(' ')[0] != options.email_box:<br>
> >>>>>>>>         if options.email_box != None:<br>
> >>>>>>>>             continue<br>
> >>>>>>>><br>
> >>>>>>>>     # optionally force-resync the mailbox first (-i)<br>
> >>>>>>>>     if options.index == True:<br>
> >>>>>>>>         command = '/usr/local/bin/doveadm -D force-resync -u %s -f INBOX*' % user.split(' ')[0]<br>
> >>>>>>>>         command = commands(command)<br>
> >>>>>>>>         command = command.output<br>
> >>>>>>>><br>
> >>>>>>>>     # walk the status line backwards; a trailing 'y' marks a failed sync,<br>
> >>>>>>>>     # a '-' means the user is fine<br>
> >>>>>>>>     #print(user)<br>
> >>>>>>>>     for nn in range(len(user) - 1, 0, -1):<br>
> >>>>>>>>         #print(nn)<br>
> >>>>>>>>         #print(user[nn])<br>
> >>>>>>>><br>
> >>>>>>>>         if user[nn] == '-':<br>
> >>>>>>>>             #print('skipping ... %s' % user.split(' ')[0])<br>
> >>>>>>>>             break<br>
> >>>>>>>><br>
> >>>>>>>>         if user[nn] == 'y':  # found a bad mailbox<br>
> >>>>>>>>             print('syncing ... %s' % user.split(' ')[0])<br>
> >>>>>>>><br>
> >>>>>>>>             if options.detail == True:<br>
> >>>>>>>>                 # run the dsync in the foreground and keep its output for the report<br>
> >>>>>>>>                 command = '/usr/local/bin/doveadm -D sync -u %s -d -N -l 30 -U' % user.split(' ')[0]<br>
> >>>>>>>>                 print(command)<br>
> >>>>>>>>                 command = commands(command)<br>
> >>>>>>>>                 command = command.output.split('\n')<br>
> >>>>>>>>                 print(command)<br>
> >>>>>>>>                 print('Processed Mailbox for ... %s' % user.split(' ')[0])<br>
> >>>>>>>>                 synced.append('Processed Mailbox for ... %s' % user.split(' ')[0])<br>
> >>>>>>>>                 for nnn in range(len(command)):<br>
> >>>>>>>>                     synced.append(command[nnn] + '\n')<br>
> >>>>>>>>                 break<br>
> >>>>>>>><br>
> >>>>>>>>             if options.detail == False:<br>
> >>>>>>>>                 # fire the dsync off in the background and move on<br>
> >>>>>>>>                 #command = '/usr/local/bin/doveadm -D sync -u %s -d -N -l 30 -U' % user.split(' ')[0]<br>
> >>>>>>>>                 #print(command)<br>
> >>>>>>>>                 #command = os.system(command)<br>
> >>>>>>>>                 command = subprocess.Popen(<br>
> >>>>>>>>                     ["/usr/local/bin/doveadm sync -u %s -d -N -l 30 -U" % user.split(' ')[0]],<br>
> >>>>>>>>                     shell=True, stdin=None, stdout=None,<br>
> >>>>>>>>                     stderr=None, close_fds=True)<br>
> >>>>>>>><br>
> >>>>>>>>                 print('Processed Mailbox for ... %s' % user.split(' ')[0])<br>
> >>>>>>>>                 synced.append('Processed Mailbox for ... %s' % user.split(' ')[0])<br>
> >>>>>>>>                 #sys.exit()<br>
> >>>>>>>>                 break<br>
> >>>>>>>><br>
> >>>>>>>> if len(synced) != 0:<br>
> >>>>>>>>     # send an email listing the boxes that needed a forced sync<br>
> >>>>>>>>     if options.send_to != None:<br>
> >>>>>>>>         send_from = 'monitor@scom.ca'<br>
> >>>>>>>>         send_to = ['%s' % options.send_to]<br>
> >>>>>>>>         send_subject = 'Dovecot Bad Sync Report for : %s' % (socket.gethostname())<br>
> >>>>>>>>         send_text = '\n\n'<br>
> >>>>>>>>         for n in range(len(synced)):<br>
> >>>>>>>>             send_text = send_text + synced[n] + '\n'<br>
> >>>>>>>><br>
> >>>>>>>>         send_files = []<br>
> >>>>>>>>         sendmail(send_from, send_to, send_subject, send_text, send_files)<br>
> >>>>>>>><br>
> >>>>>>>> sys.exit()<br>
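> >>>>>>>><br>
> >>>>>>>> For reference, with the options defined above it gets invoked along these lines<br>
> >>>>>>>> (the report address is a placeholder):<br>
> >>>>>>>><br>
> >>>>>>>> # force-resync first, detailed report, mail the result<br>
> >>>>>>>> /programs/common/sync.recover -i -d -m postmaster@example.com<br>
> >>>>>>>><br>
> >>>>>>>> # check / resync a single box only<br>
> >>>>>>>> /programs/common/sync.recover -e someuser@example.com<br>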
> >>>>>>>><br>
> >>>>>>>> second :<br>
> >>>>>>>><br>
> >>>>>>>> i posted this a month ago - no response<br>
> >>>>>>>><br>
> >>>>>>>> please appreciate that i am trying to help ....<br>
> >>>>>>>><br>
> >>>>>>>> After much testing I can now reproduce the replication issues at hand.<br>
> >>>>>>>><br>
> >>>>>>>> I am running on FreeBSD 12 & 13 stable (both test and production<br>
> >>>>>>>> servers)<br>
> >>>>>>>><br>
> >>>>>>>> sdram drives etc ...<br>
> >>>>>>>><br>
> >>>>>>>> Basically replication works fine until reaching a folder quantity<br>
> >>>>>>>> of ~<br>
> >>>>>>>> 256 or more<br>
> >>>>>>>><br>
> >>>>>>>> To reproduce, using doveadm I created folders like:<br>
> >>>>>>>><br>
> >>>>>>>> INBOX/folder-0<br>
> >>>>>>>> INBOX/folder-1<br>
> >>>>>>>> INBOX/folder-2<br>
> >>>>>>>> INBOX/folder-3<br>
> >>>>>>>> and so forth ......<br>
> >>>>>>>><br>
> >>>>>>>> I created 200 folders and they replicated ok on both servers<br>
> >>>>>>>><br>
> >>>>>>>> I created another 200 (400 total) and the replicator got stuck and<br>
> >>>>>>>> would<br>
> >>>>>>>> not update the mailbox on the alternate server anymore, and it is still<br>
> >>>>>>>> "updating" 4 days later?<br>
> >>>>>>>><br>
> >>>>>>>> Basically the replicator goes so far and either hangs or, more likely,<br>
> >>>>>>>> bails<br>
> >>>>>>>> on an error that is not reported by the debug logging?<br>
> >>>>>>>><br>
> >>>>>>>> However, dsync will sync the two servers, but only when run manually<br>
> >>>>>>>> (i.e.<br>
> >>>>>>>> all the folders will sync).<br>
> >>>>>>>><br>
> >>>>>>>> I have two test servers available if you need any kind of access -<br>
> >>>>>>>> again<br>
> >>>>>>>> here to help.<br>
> >>>>>>>><br>
> >>>>>>>> [07:28:42] <a href="http://mail18.scom.ca" rel="noreferrer" target="_blank">mail18.scom.ca</a> [root:0] ~<br>
> >>>>>>>> # sync.status<br>
> >>>>>>>> Queued 'sync' requests 0<br>
> >>>>>>>> Queued 'high' requests 6<br>
> >>>>>>>> Queued 'low' requests 0<br>
> >>>>>>>> Queued 'failed' requests 0<br>
> >>>>>>>> Queued 'full resync' requests 0<br>
> >>>>>>>> Waiting 'failed' requests 0<br>
> >>>>>>>> Total number of known users 255<br>
> >>>>>>>><br>
> >>>>>>>> username type status<br>
> >>>>>>>> <a href="mailto:paul@scom.ca" target="_blank">paul@scom.ca</a> normal Waiting for dsync to<br>
> >>>>>>>> finish<br>
> >>>>>>>> <a href="mailto:keith@elirpa.com" target="_blank">keith@elirpa.com</a> incremental Waiting for dsync to<br>
> >>>>>>>> finish<br>
> >>>>>>>> <a href="mailto:ed.hanna@dssmgmt.com" target="_blank">ed.hanna@dssmgmt.com</a> incremental Waiting for dsync to<br>
> >>>>>>>> finish<br>
> >>>>>>>> <a href="mailto:ed@scom.ca" target="_blank">ed@scom.ca</a> incremental Waiting for dsync to<br>
> >>>>>>>> finish<br>
> >>>>>>>> <a href="mailto:nick@elirpa.com" target="_blank">nick@elirpa.com</a> incremental Waiting for dsync to<br>
> >>>>>>>> finish<br>
> >>>>>>>> <a href="mailto:paul@paulkudla.net" target="_blank">paul@paulkudla.net</a> incremental Waiting for dsync to<br>
> >>>>>>>> finish<br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> I have been going through the C code and it seems the replication<br>
> >>>>>>>> gets<br>
> >>>>>>>> requested OK.<br>
> >>>>>>>><br>
> >>>>>>>> replicator.db does get updated OK with the replication request for the<br>
> >>>>>>>> mailbox in question.<br>
> >>>>>>>><br>
> >>>>>>>> However, I am still looking for the actual replicator function in the<br>
> >>>>>>>> libs that does the actual replication requests.<br>
> >>>>>>>><br>
> >>>>>>>> The number of folders & subfolders is definitely the issue - not the<br>
> >>>>>>>> mailbox's physical size as thought originally.<br>
> >>>>>>>><br>
> >>>>>>>> If someone can point me in the right direction: it seems either the<br>
> >>>>>>>> replicator is not picking up on the number of folders to replicate<br>
> >>>>>>>> properly, or it has a hard-set limit like 256 / 512 / 65535 etc. and<br>
> >>>>>>>> stops<br>
> >>>>>>>> the replication request thereafter.<br>
> >>>>>>>><br>
> >>>>>>>> I am mainly a machine-code programmer from the 80s and have<br>
> >>>>>>>> concentrated on Python as of late; C I am only starting to go through -<br>
> >>>>>>>> just<br>
> >>>>>>>> to give you a background on my talents.<br>
> >>>>>>>><br>
> >>>>>>>> It took 2 months to figure this out.<br>
> >>>>>>>><br>
> >>>>>>>> This issue also seems to be indirectly causing the duplicate message<br>
> >>>>>>>> suppression not to work as well.<br>
> >>>>>>>><br>
> >>>>>>>> Python program to reproduce the issue (the loops are from the last run,<br>
> >>>>>>>> started @<br>
> >>>>>>>> 200 - FYI):<br>
> >>>>>>>><br>
> >>>>>>>> # cat mbox.gen<br>
> >>>>>>>> #!/usr/local/bin/python2<br>
> >>>>>>>><br>
> >>>>>>>> import os, sys<br>
> >>>>>>>> import commands            # Python 2 stdlib module; provides getoutput()<br>
> >>>>>>>><br>
> >>>>>>>> from lib import *<br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> user = 'paul@paulkudla.net'<br>
> >>>>>>>><br>
> >>>>>>>> """<br>
> >>>>>>>> # first run: create 600 top-level folders under INBOX<br>
> >>>>>>>> for count in range(0, 600):<br>
> >>>>>>>>     box = 'INBOX/folder-%s' % count<br>
> >>>>>>>>     print count<br>
> >>>>>>>>     command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' % (user, box)<br>
> >>>>>>>>     print command<br>
> >>>>>>>>     a = commands.getoutput(command)<br>
> >>>>>>>>     print a<br>
> >>>>>>>> """<br>
> >>>>>>>><br>
> >>>>>>>> # second run: create 600 subfolders under INBOX/folder-0<br>
> >>>>>>>> for count in range(0, 600):<br>
> >>>>>>>>     box = 'INBOX/folder-0/sub-%s' % count<br>
> >>>>>>>>     print count<br>
> >>>>>>>>     command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' % (user, box)<br>
> >>>>>>>>     print command<br>
> >>>>>>>>     a = commands.getoutput(command)<br>
> >>>>>>>>     print a<br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> #sys.exit()<br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> Happy Sunday !!!<br>
> >>>>>>>> Thanks - paul<br>
> >>>>>>>><br>
> >>>>>>>> Paul Kudla<br>
> >>>>>>>><br>
> >>>>>>>><br>
> >>>>>>>> Scom.ca Internet Services <<a href="http://www.scom.ca" rel="noreferrer" target="_blank">http://www.scom.ca</a>><br>
> >>>>>>>> 004-1009 Byron Street South<br>
> >>>>>>>> Whitby, Ontario - Canada<br>
> >>>>>>>> L1N 4S3<br>
> >>>>>>>><br>
> >>>>>>>> Toronto 416.642.7266<br>
> >>>>>>>> Main 1.866.411.7266<br>
> >>>>>>>> Fax 1.888.892.7266<br>
> >>>>>>>><br>
> >>>>>>>> On 4/24/2022 10:22 AM, Arnaud Abélard wrote:<br>
> >>>>>>>>> Hello,<br>
> >>>>>>>>><br>
> >>>>>>>>> I am working on replicating a server (and adding compression on the<br>
> >>>>>>>>> other side), and since I had "Error: dsync I/O has stalled, no activity<br>
> >>>>>>>>> for 600 seconds (version not received)" errors I upgraded both the source<br>
> >>>>>>>>> and destination servers to the latest 2.3 version (2.3.18). While<br>
> >>>>>>>>> before the upgrade all 15 replication connections were busy, after<br>
> >>>>>>>>> upgrading, 'dovecot replicator dsync-status' shows that most of the time<br>
> >>>>>>>>> nothing is being replicated at all. I can see the occasional brief<br>
> >>>>>>>>> replication here and there, but 99.9% of the time nothing is happening at all.<br>
> >>>>>>>>><br>
> >>>>>>>>> I have a replication_full_sync_interval of 12 hours but I have<br>
> >>>>>>>>> thousands of users with their last full sync over 90 hours ago.<br>
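> >>>>>>>>><br>
> >>>>>>>>> For context, the replication-related settings here boil down to roughly the<br>
> >>>>>>>>> following (values taken from the description above; the hostname is a placeholder):<br>
> >>>>>>>>><br>
> >>>>>>>>> replication_full_sync_interval = 12h<br>
> >>>>>>>>> replication_max_conns = 15<br>
> >>>>>>>>> plugin {<br>
> >>>>>>>>>   mail_replica = tcp:replica.example.com<br>
> >>>>>>>>> }<br>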
> >>>>>>>>><br>
> >>>>>>>>> "doveadm replicator status" also shows that i have over 35,000<br>
> >>>>>>>>> queued<br>
> >>>>>>>>> full resync requests, but no sync, high or low queued requests so<br>
> >>>>>>>>> why<br>
> >>>>>>>>> aren't the full requests occuring?<br>
> >>>>>>>>><br>
> >>>>>>>>> There are no errors in the logs.<br>
> >>>>>>>>><br>
> >>>>>>>>> Thanks,<br>
> >>>>>>>>><br>
> >>>>>>>>> Arnaud<br>
> >>>>>>>>><br>
> >>>>>>>>><br>
> >>>>>>>>><br>
> >>>>>>>>><br>
> >>>>>>>>><br>
> >>>>>><br>
> >>>>><br>
> >>><br>
> ><br>
</blockquote></div>