no full syncs after upgrading to dovecot 2.3.18

Aki Tuomi aki.tuomi at open-xchange.com
Wed May 11 05:31:32 UTC 2022


I was hoping it would be fixed, apparently not then.

Can you enable mail_debug=yes and send the logs?

Aki

> On 11/05/2022 07:25 Cassidy B. Larson <alandaluz at gmail.com> wrote:
> 
> 
> Hi Aki,
> 
> We just installed 2.3.19, and are seeing a couple of users throwing the "INBOX/dovecot.index reset, view is now inconsistent" and their replicator status erroring out. Tried force-resync on the full mailbox, but to no avail just yet. Not sure if this bug was supposedly fixed in 2.3.19?
> 
> Thanks,
> 
> Cassidy
> 
> 
> On Thu, Apr 28, 2022 at 5:02 AM Aki Tuomi <aki.tuomi at open-xchange.com> wrote:
> > 2.3.19 is round the corner, so not long. I cannot yet promise an exact date but hopefully within week or two.
> >  
> >  Aki
> >  
> >  > On 28/04/2022 13:57 Paul Kudla (SCOM.CA (http://SCOM.CA) Internet Services Inc.) <paul at scom.ca> wrote:
> >  > 
> >  > 
> >  > Thanks for the update.
> >  > 
> >  > is this for both replication issues (folders +300 etc)
> >  > 
> >  > Just Asking - Any ETA
> >  > 
> >  > 
> >  > 
> >  > 
> >  > 
> >  > Happy Thursday !!!
> >  > Thanks - paul
> >  > 
> >  > Paul Kudla
> >  > 
> >  > 
> >  > Scom.ca Internet Services <http://www.scom.ca>
> >  > 004-1009 Byron Street South
> >  > Whitby, Ontario - Canada
> >  > L1N 4S3
> >  > 
> >  > Toronto 416.642.7266
> >  > Main 1.866.411.7266
> >  > Fax 1.888.892.7266
> >  > 
> >  > On 4/27/2022 9:01 AM, Aki Tuomi wrote:
> >  > > 
> >  > > Hi!
> >  > > 
> >  > > This is probably going to get fixed in 2.3.19, this looks like an issue we are already fixing.
> >  > > 
> >  > > Aki
> >  > > 
> >  > >> On 26/04/2022 16:38 Paul Kudla (SCOM.CA (http://SCOM.CA) Internet Services Inc.) <paul at scom.ca> wrote:
> >  > >>
> >  > >> 
> >  > >> Agreed there seems to be no way of posting these kinds of issues to see
> >  > >> if they are even being addressed or even known about moving forward on
> >  > >> new updates
> >  > >>
> >  > >> i read somewhere there is a new branch soming out but nothing as of yet?
> >  > >>
> >  > >> 2.4 maybe ....
> >  > >> 5.0 ........
> >  > >>
> >  > >> my previous replication issues (back in feb) went unanswered.
> >  > >>
> >  > >> not faulting anyone, but the developers do seem to be disconnected from
> >  > >> issues as of late? or concentrating on other issues.
> >  > >>
> >  > >> I have no problem with support contracts for day to day maintence
> >  > >> however as a programmer myself they usually dont work as the other end
> >  > >> relies on the latest source code anyways. Thus can not help.
> >  > >>
> >  > >> I am trying to take a part the replicator c programming based on 2.3.18
> >  > >> as most of it does work to some extent.
> >  > >>
> >  > >> tcps just does not work (ie 600 seconds default in the c programming)
> >  > >>
> >  > >> My thoughts are tcp works ok but fails when the replicator through
> >  > >> dsync-client.c when asked to return the folder list?
> >  > >>
> >  > >>
> >  > >> replicator-brain.c seems to control the overall process and timing.
> >  > >>
> >  > >> replicator-queue.c seems to handle the que file that does seem to carry
> >  > >> acurate info.
> >  > >>
> >  > >>
> >  > >> things in the source code are documented enough to figure this out but i
> >  > >> am still going through all the related .h files documentation wise which
> >  > >> are all over the place.
> >  > >>
> >  > >> there is no clear documentation on the .h lib files so i have to walk
> >  > >> through the tree one at a time finding relative code.
> >  > >>
> >  > >> since the dsync from doveadm does see to work ok i have to assume the
> >  > >> dsync-client used to compile the replicator is at fault somehow or a
> >  > >> call from it upstream?
> >  > >>
> >  > >> Thanks for your input on the other issues noted below, i will keep that
> >  > >> in mind when disassembling the source code.
> >  > >>
> >  > >> No sense in fixing one thing and leaving something else behind, probably
> >  > >> all related anyways.
> >  > >>
> >  > >> i have two test servers avaliable so i can play with all this offline to
> >  > >> reproduce the issues
> >  > >>
> >  > >> Unfortunately I have to make a living first, this will be addressed when
> >  > >> possible as i dont like systems that are live running this way and
> >  > >> currently only have 5 accounts with this issue (mine included)
> >  > >>
> >  > >>
> >  > >>
> >  > >>
> >  > >> Happy Tuesday !!!
> >  > >> Thanks - paul
> >  > >>
> >  > >> Paul Kudla
> >  > >>
> >  > >>
> >  > >> Scom.ca Internet Services <http://www.scom.ca>
> >  > >> 004-1009 Byron Street South
> >  > >> Whitby, Ontario - Canada
> >  > >> L1N 4S3
> >  > >>
> >  > >> Toronto 416.642.7266
> >  > >> Main 1.866.411.7266
> >  > >> Fax 1.888.892.7266
> >  > >>
> >  > >> On 4/26/2022 9:03 AM, Reuben Farrelly wrote:
> >  > >>>
> >  > >>> I ran into this back in February and documented a reproducible test case
> >  > >>> (and sent it to this list). In short - I was able to reproduce this by
> >  > >>> having a valid and consistent mailbox on the source/local, creating a
> >  > >>> very standard empty Maildir/(new|cur|tmp) folder on the remote replica,
> >  > >>> and then initiating the replicate from the source. This consistently
> >  > >>> caused dsync to fail replication with the error "dovecot.index reset,
> >  > >>> view is now inconsistent" and sync aborted, leaving the replica mailbox
> >  > >>> in a screwed up inconsistent state. Client connections on the source
> >  > >>> replica were also dropped when this error occurred. You can see the
> >  > >>> error by enabling debug level logging if you initiate dsync manually on
> >  > >>> a test mailbox.
> >  > >>>
> >  > >>> The only workaround I found was to remove the remote Maildir and let
> >  > >>> Dovecot create the whole thing from scratch. Dovecot did not like any
> >  > >>> existing folders on the destination replica even if they were the same
> >  > >>> names as the source and completely empty. I was able to reproduce this
> >  > >>> the bare minimum of folders - just an INBOX!
> >  > >>>
> >  > >>> I have no idea if any of the developers saw my post or if the bug has
> >  > >>> been fixed for the next release. But it seemed to be quite a common
> >  > >>> problem over time (saw a few posts from people going back a long way
> >  > >>> with the same problem) and it is seriously disruptive to clients. The
> >  > >>> error message is not helpful in tracking down the problem either.
> >  > >>>
> >  > >>> Secondly, I also have had an ongoing and longstanding problem using
> >  > >>> tcps: for replication. For some reason using tcps: (with no other
> >  > >>> changes at all to the config) results in a lot of timeout messages
> >  > >>> "Error: dsync I/O has stalled, no activity for 600 seconds". This goes
> >  > >>> away if I revert back to tcp: instead of tcps - with tcp: I very rarely
> >  > >>> get timeouts. No idea why, guess this is a bug of some sort also.
> >  > >>>
> >  > >>> It's disappointing that there appears to be no way to have these sorts
> >  > >>> or problems addressed like there once was. I am not using Dovecot for
> >  > >>> commercial purposes so paying a fortune for a support contract for a
> >  > >>> high end installation just isn't going to happen, and this list seems to
> >  > >>> be quite ordinary for getting support and reporting bugs nowadays....
> >  > >>>
> >  > >>> Reuben
> >  > >>>
> >  > >>> On 26/04/2022 7:21 pm, Paul Kudla (SCOM.CA (http://SCOM.CA) Internet Services Inc.) wrote:
> >  > >>>
> >  > >>>>
> >  > >>>> side issue
> >  > >>>>
> >  > >>>> if you are getting inconsistant dsyncs there is no real way to fix
> >  > >>>> this in the long run.
> >  > >>>>
> >  > >>>> i know its a pain (already had to my self)
> >  > >>>>
> >  > >>>> i needed to do a full sync, take one server offline, delete the user
> >  > >>>> dir (with dovecot offline) and then rsync (or somehow duplicate the
> >  > >>>> main server's user data) over the the remote again.
> >  > >>>>
> >  > >>>> then bring remote back up and it kind or worked worked
> >  > >>>>
> >  > >>>> best suggestion is to bring the main server down at night so the copy
> >  > >>>> is clean?
> >  > >>>>
> >  > >>>> if using postfix you can enable the soft bounce option and the mail
> >  > >>>> will back spool until everything comes back online
> >  > >>>>
> >  > >>>> (needs to be enable on bother servers)
> >  > >>>>
> >  > >>>> replication was still an issue on accounts with 300+ folders in them,
> >  > >>>> still working on a fix for that.
> >  > >>>>
> >  > >>>>
> >  > >>>> Happy Tuesday !!!
> >  > >>>> Thanks - paul
> >  > >>>>
> >  > >>>> Paul Kudla
> >  > >>>>
> >  > >>>>
> >  > >>>> Scom.ca Internet Services <http://www.scom.ca>
> >  > >>>> 004-1009 Byron Street South
> >  > >>>> Whitby, Ontario - Canada
> >  > >>>> L1N 4S3
> >  > >>>>
> >  > >>>> Toronto 416.642.7266
> >  > >>>> Main 1.866.411.7266
> >  > >>>> Fax 1.888.892.7266
> >  > >>>>
> >  > >>>> On 4/25/2022 10:01 AM, Arnaud Abélard wrote:
> >  > >>>>> Ah, I'm now getting errors in the logs, that would explains the
> >  > >>>>> increasing number of failed sync requests:
> >  > >>>>>
> >  > >>>>> dovecot: imap(xxxxx)<2961235><Bs6w43rdQPAqAcsFiXEmAInUhhA3Rfqh>:
> >  > >>>>> Error: Mailbox INBOX: /vmail/l/i/xxxxx/dovecot.index reset, view is
> >  > >>>>> now inconsistent
> >  > >>>>>
> >  > >>>>>
> >  > >>>>> And sure enough:
> >  > >>>>>
> >  > >>>>> # dovecot replicator status xxxxx
> >  > >>>>>
> >  > >>>>> xxxxx none 00:02:54 07:11:28 - y
> >  > >>>>>
> >  > >>>>>
> >  > >>>>> What could explain that error?
> >  > >>>>>
> >  > >>>>> Arnaud
> >  > >>>>>
> >  > >>>>>
> >  > >>>>>
> >  > >>>>> On 25/04/2022 15:13, Arnaud Abélard wrote:
> >  > >>>>>> Hello,
> >  > >>>>>>
> >  > >>>>>> On my side we are running Linux (Debian Buster).
> >  > >>>>>>
> >  > >>>>>> I'm not sure my problem is actually the same as Paul or you
> >  > >>>>>> Sebastian since I have a lot of boxes but those are actually small
> >  > >>>>>> (quota of 110MB) so I doubt any of them have more than a dozen imap
> >  > >>>>>> folders.
> >  > >>>>>>
> >  > >>>>>> The main symptom is that I have tons of full sync requests awaiting
> >  > >>>>>> but even though no other sync is pending the replicator just waits
> >  > >>>>>> for something to trigger those syncs.
> >  > >>>>>>
> >  > >>>>>> Today, with users back I can see that normal and incremental syncs
> >  > >>>>>> are being done on the 15 connections, with an occasional full sync
> >  > >>>>>> here or there and lots of "Waiting 'failed' requests":
> >  > >>>>>>
> >  > >>>>>> Queued 'sync' requests 0
> >  > >>>>>>
> >  > >>>>>> Queued 'high' requests 0
> >  > >>>>>>
> >  > >>>>>> Queued 'low' requests 0
> >  > >>>>>>
> >  > >>>>>> Queued 'failed' requests 122
> >  > >>>>>>
> >  > >>>>>> Queued 'full resync' requests 28785
> >  > >>>>>>
> >  > >>>>>> Waiting 'failed' requests 4294
> >  > >>>>>>
> >  > >>>>>> Total number of known users 42512
> >  > >>>>>>
> >  > >>>>>>
> >  > >>>>>>
> >  > >>>>>> So, why didn't the replicator take advantage of the weekend to
> >  > >>>>>> replicate the mailboxes while no user were using them?
> >  > >>>>>>
> >  > >>>>>> Arnaud
> >  > >>>>>>
> >  > >>>>>>
> >  > >>>>>>
> >  > >>>>>>
> >  > >>>>>> On 25/04/2022 13:54, Sebastian Marske wrote:
> >  > >>>>>>> Hi there,
> >  > >>>>>>>
> >  > >>>>>>> thanks for your insights and for diving deeper into this Paul!
> >  > >>>>>>>
> >  > >>>>>>> For me, the users ending up in 'Waiting for dsync to finish' all have
> >  > >>>>>>> more than 256 Imap folders as well (ranging from 288 up to >5500;
> >  > >>>>>>> as per
> >  > >>>>>>> 'doveadm mailbox list -u <username> | wc -l'). For more details on my
> >  > >>>>>>> setup please see my post from February [1].
> >  > >>>>>>>
> >  > >>>>>>> @Arnaud: What OS are you running on?
> >  > >>>>>>>
> >  > >>>>>>>
> >  > >>>>>>> Best
> >  > >>>>>>> Sebastian
> >  > >>>>>>>
> >  > >>>>>>>
> >  > >>>>>>> [1] https://dovecot.org/pipermail/dovecot/2022-February/124168.html
> >  > >>>>>>>
> >  > >>>>>>>
> >  > >>>>>>> On 4/24/22 19:36, Paul Kudla (SCOM.CA (http://SCOM.CA) Internet Services Inc.) wrote:
> >  > >>>>>>>>
> >  > >>>>>>>> Question having similiar replication issues
> >  > >>>>>>>>
> >  > >>>>>>>> pls read everything below and advise the folder counts on the
> >  > >>>>>>>> non-replicated users?
> >  > >>>>>>>>
> >  > >>>>>>>> i find the total number of folders / account seems to be a factor
> >  > >>>>>>>> and
> >  > >>>>>>>> NOT the size of the mail box
> >  > >>>>>>>>
> >  > >>>>>>>> ie i have customers with 40G of emails no problem over 40 or so
> >  > >>>>>>>> folders
> >  > >>>>>>>> and it works ok
> >  > >>>>>>>>
> >  > >>>>>>>> 300+ folders seems to be the issue
> >  > >>>>>>>>
> >  > >>>>>>>> i have been going through the replication code
> >  > >>>>>>>>
> >  > >>>>>>>> no errors being logged
> >  > >>>>>>>>
> >  > >>>>>>>> i am assuming that the replication --> dhclient --> other server is
> >  > >>>>>>>> timing out or not reading the folder lists correctly (ie dies after X
> >  > >>>>>>>> folders read)
> >  > >>>>>>>>
> >  > >>>>>>>> thus i am going through the code patching for log entries etc to find
> >  > >>>>>>>> the issues.
> >  > >>>>>>>>
> >  > >>>>>>>> see
> >  > >>>>>>>>
> >  > >>>>>>>> [13:33:57] mail18.scom.ca (http://mail18.scom.ca) [root:0] /usr/local/var/lib/dovecot
> >  > >>>>>>>> # ll
> >  > >>>>>>>> total 86
> >  > >>>>>>>> drwxr-xr-x 2 root wheel uarch 4B Apr 24 11:11 .
> >  > >>>>>>>> drwxr-xr-x 4 root wheel uarch 4B Mar 8 2021 ..
> >  > >>>>>>>> -rw-r--r-- 1 root wheel uarch 73B Apr 24 11:11 instances
> >  > >>>>>>>> -rw-r--r-- 1 root wheel uarch 160K Apr 24 13:33 replicator.db
> >  > >>>>>>>>
> >  > >>>>>>>> [13:33:58] mail18.scom.ca (http://mail18.scom.ca) [root:0] /usr/local/var/lib/dovecot
> >  > >>>>>>>> #
> >  > >>>>>>>>
> >  > >>>>>>>> replicator.db seems to get updated ok but never processed properly.
> >  > >>>>>>>>
> >  > >>>>>>>> # sync.users
> >  > >>>>>>>> nick at elirpa.com high 00:09:41 463:47:01 - y
> >  > >>>>>>>> keith at elirpa.com high 00:09:23 463:45:43 - y
> >  > >>>>>>>> paul at scom.ca high 00:09:41 463:46:51 - y
> >  > >>>>>>>> ed at scom.ca high 00:09:43 463:47:01 - y
> >  > >>>>>>>> ed.hanna at dssmgmt.com high 00:09:42 463:46:58 - y
> >  > >>>>>>>> paul at paulkudla.net high 00:09:44 463:47:03
> >  > >>>>>>>> 580:35:07
> >  > >>>>>>>> y
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> so ....
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> two things :
> >  > >>>>>>>>
> >  > >>>>>>>> first to get the production stuff to work i had to write a script
> >  > >>>>>>>> that
> >  > >>>>>>>> whould find the bad sync's and the force a dsync between the servers
> >  > >>>>>>>>
> >  > >>>>>>>> i run this every five minutes or each server.
> >  > >>>>>>>>
> >  > >>>>>>>> in crontab
> >  > >>>>>>>>
> >  > >>>>>>>> */10 * * * * root /usr/bin/nohup
> >  > >>>>>>>> /programs/common/sync.recover > /dev/null
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> python script to sort things out
> >  > >>>>>>>>
> >  > >>>>>>>> # cat /programs/common/sync.recover
> >  > >>>>>>>> #!/usr/local/bin/python3
> >  > >>>>>>>>
> >  > >>>>>>>> #Force sync between servers that are reporting bad?
> >  > >>>>>>>>
> >  > >>>>>>>> import os,sys,django,socket
> >  > >>>>>>>> from optparse import OptionParser
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> from lib import *
> >  > >>>>>>>>
> >  > >>>>>>>> #Sample Re-Index MB
> >  > >>>>>>>> #doveadm -D force-resync -u paul at scom.ca -f INBOX*
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> USAGE_TEXT = '''\
> >  > >>>>>>>> usage: %%prog %s[options]
> >  > >>>>>>>> '''
> >  > >>>>>>>>
> >  > >>>>>>>> parser = OptionParser(usage=USAGE_TEXT % '', version='0.4')
> >  > >>>>>>>>
> >  > >>>>>>>> parser.add_option("-m", "--send_to", dest="send_to", help="Send
> >  > >>>>>>>> Email To")
> >  > >>>>>>>> parser.add_option("-e", "--email", dest="email_box", help="Box to
> >  > >>>>>>>> Index")
> >  > >>>>>>>> parser.add_option("-d", "--detail",action='store_true',
> >  > >>>>>>>> dest="detail",default =False, help="Detailed report")
> >  > >>>>>>>> parser.add_option("-i", "--index",action='store_true',
> >  > >>>>>>>> dest="index",default =False, help="Index")
> >  > >>>>>>>>
> >  > >>>>>>>> options, args = parser.parse_args()
> >  > >>>>>>>>
> >  > >>>>>>>> print (options.email_box)
> >  > >>>>>>>> print (options.send_to)
> >  > >>>>>>>> print (options.detail)
> >  > >>>>>>>>
> >  > >>>>>>>> #sys.exit()
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> print ('Getting Current User Sync Status')
> >  > >>>>>>>> command = commands("/usr/local/bin/doveadm replicator status '*'")
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> #print command
> >  > >>>>>>>>
> >  > >>>>>>>> sync_user_status = command.output.split('\n')
> >  > >>>>>>>>
> >  > >>>>>>>> #print sync_user_status
> >  > >>>>>>>>
> >  > >>>>>>>> synced = []
> >  > >>>>>>>>
> >  > >>>>>>>> for n in range(1,len(sync_user_status)) :
> >  > >>>>>>>> user = sync_user_status[n]
> >  > >>>>>>>> print ('Processing User : %s' %user.split(' ')[0])
> >  > >>>>>>>> if user.split(' ')[0] != options.email_box :
> >  > >>>>>>>> if options.email_box != None :
> >  > >>>>>>>> continue
> >  > >>>>>>>>
> >  > >>>>>>>> if options.index == True :
> >  > >>>>>>>> command = '/usr/local/bin/doveadm -D force-resync
> >  > >>>>>>>> -u %s
> >  > >>>>>>>> -f INBOX*' %user.split(' ')[0]
> >  > >>>>>>>> command = commands(command)
> >  > >>>>>>>> command = command.output
> >  > >>>>>>>>
> >  > >>>>>>>> #print user
> >  > >>>>>>>> for nn in range (len(user)-1,0,-1) :
> >  > >>>>>>>> #print nn
> >  > >>>>>>>> #print user[nn]
> >  > >>>>>>>>
> >  > >>>>>>>> if user[nn] == '-' :
> >  > >>>>>>>> #print 'skipping ... %s' %user.split(' ')[0]
> >  > >>>>>>>>
> >  > >>>>>>>> break
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> if user[nn] == 'y': #Found a Bad Mailbox
> >  > >>>>>>>> print ('syncing ... %s' %user.split(' ')[0])
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> if options.detail == True :
> >  > >>>>>>>> command = '/usr/local/bin/doveadm -D
> >  > >>>>>>>> sync -u %s -d -N -l 30 -U' %user.split(' ')[0]
> >  > >>>>>>>> print (command)
> >  > >>>>>>>> command = commands(command)
> >  > >>>>>>>> command = command.output.split('\n')
> >  > >>>>>>>> print (command)
> >  > >>>>>>>> print ('Processed Mailbox for ...
> >  > >>>>>>>> %s'
> >  > >>>>>>>> %user.split(' ')[0] )
> >  > >>>>>>>> synced.append('Processed Mailbox
> >  > >>>>>>>> for ...
> >  > >>>>>>>> %s' %user.split(' ')[0])
> >  > >>>>>>>> for nnn in range(len(command)):
> >  > >>>>>>>> synced.append(command[nnn] + '\n')
> >  > >>>>>>>> break
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> if options.detail == False :
> >  > >>>>>>>> #command =
> >  > >>>>>>>> '/usr/local/bin/doveadm -D
> >  > >>>>>>>> sync -u %s -d -N -l 30 -U' %user.split(' ')[0]
> >  > >>>>>>>> #print (command)
> >  > >>>>>>>> #command = os.system(command)
> >  > >>>>>>>> command = subprocess.Popen(
> >  > >>>>>>>> ["/usr/local/bin/doveadm sync -u %s -d -N -l 30 -U" %user.split('
> >  > >>>>>>>> ')[0]
> >  > >>>>>>>> ], \
> >  > >>>>>>>> shell = True, stdin=None,
> >  > >>>>>>>> stdout=None,
> >  > >>>>>>>> stderr=None, close_fds=True)
> >  > >>>>>>>>
> >  > >>>>>>>> print ( 'Processed Mailbox for
> >  > >>>>>>>> ... %s'
> >  > >>>>>>>> %user.split(' ')[0] )
> >  > >>>>>>>> synced.append('Processed Mailbox
> >  > >>>>>>>> for ...
> >  > >>>>>>>> %s' %user.split(' ')[0])
> >  > >>>>>>>> #sys.exit()
> >  > >>>>>>>> break
> >  > >>>>>>>>
> >  > >>>>>>>> if len(synced) != 0 :
> >  > >>>>>>>> #send email showing bad synced boxes ?
> >  > >>>>>>>>
> >  > >>>>>>>> if options.send_to != None :
> >  > >>>>>>>> send_from = 'monitor at scom.ca'
> >  > >>>>>>>> send_to = ['%s' %options.send_to]
> >  > >>>>>>>> send_subject = 'Dovecot Bad Sync Report for : %s'
> >  > >>>>>>>> %(socket.gethostname())
> >  > >>>>>>>> send_text = '\n\n'
> >  > >>>>>>>> for n in range (len(synced)) :
> >  > >>>>>>>> send_text = send_text + synced[n] + '\n'
> >  > >>>>>>>>
> >  > >>>>>>>> send_files = []
> >  > >>>>>>>> sendmail (send_from, send_to, send_subject,
> >  > >>>>>>>> send_text,
> >  > >>>>>>>> send_files)
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> sys.exit()
> >  > >>>>>>>>
> >  > >>>>>>>> second :
> >  > >>>>>>>>
> >  > >>>>>>>> i posted this a month ago - no response
> >  > >>>>>>>>
> >  > >>>>>>>> please appreciate that i am trying to help ....
> >  > >>>>>>>>
> >  > >>>>>>>> after much testing i can now reporduce the replication issues at hand
> >  > >>>>>>>>
> >  > >>>>>>>> I am running on freebsd 12 & 13 stable (both test and production
> >  > >>>>>>>> servers)
> >  > >>>>>>>>
> >  > >>>>>>>> sdram drives etc ...
> >  > >>>>>>>>
> >  > >>>>>>>> Basically replication works fine until reaching a folder quantity
> >  > >>>>>>>> of ~
> >  > >>>>>>>> 256 or more
> >  > >>>>>>>>
> >  > >>>>>>>> to reproduce using doveadm i created folders like
> >  > >>>>>>>>
> >  > >>>>>>>> INBOX/folder-0
> >  > >>>>>>>> INBOX/folder-1
> >  > >>>>>>>> INBOX/folder-2
> >  > >>>>>>>> INBOX/folder-3
> >  > >>>>>>>> and so forth ......
> >  > >>>>>>>>
> >  > >>>>>>>> I created 200 folders and they replicated ok on both servers
> >  > >>>>>>>>
> >  > >>>>>>>> I created another 200 (400 total) and the replicator got stuck and
> >  > >>>>>>>> would
> >  > >>>>>>>> not update the mbox on the alternate server anymore and is still
> >  > >>>>>>>> updating 4 days later ?
> >  > >>>>>>>>
> >  > >>>>>>>> basically replicator goes so far and either hangs or more likely
> >  > >>>>>>>> bails
> >  > >>>>>>>> on an error that is not reported to the debug reporting ?
> >  > >>>>>>>>
> >  > >>>>>>>> however dsync will sync the two servers but only when run manually
> >  > >>>>>>>> (ie
> >  > >>>>>>>> all the folders will sync)
> >  > >>>>>>>>
> >  > >>>>>>>> I have two test servers avaliable if you need any kind of access -
> >  > >>>>>>>> again
> >  > >>>>>>>> here to help.
> >  > >>>>>>>>
> >  > >>>>>>>> [07:28:42] mail18.scom.ca (http://mail18.scom.ca) [root:0] ~
> >  > >>>>>>>> # sync.status
> >  > >>>>>>>> Queued 'sync' requests 0
> >  > >>>>>>>> Queued 'high' requests 6
> >  > >>>>>>>> Queued 'low' requests 0
> >  > >>>>>>>> Queued 'failed' requests 0
> >  > >>>>>>>> Queued 'full resync' requests 0
> >  > >>>>>>>> Waiting 'failed' requests 0
> >  > >>>>>>>> Total number of known users 255
> >  > >>>>>>>>
> >  > >>>>>>>> username type status
> >  > >>>>>>>> paul at scom.ca normal Waiting for dsync to
> >  > >>>>>>>> finish
> >  > >>>>>>>> keith at elirpa.com incremental Waiting for dsync to
> >  > >>>>>>>> finish
> >  > >>>>>>>> ed.hanna at dssmgmt.com incremental Waiting for dsync to
> >  > >>>>>>>> finish
> >  > >>>>>>>> ed at scom.ca incremental Waiting for dsync to
> >  > >>>>>>>> finish
> >  > >>>>>>>> nick at elirpa.com incremental Waiting for dsync to
> >  > >>>>>>>> finish
> >  > >>>>>>>> paul at paulkudla.net incremental Waiting for dsync to
> >  > >>>>>>>> finish
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> i have been going through the c code and it seems the replication
> >  > >>>>>>>> gets
> >  > >>>>>>>> requested ok
> >  > >>>>>>>>
> >  > >>>>>>>> replicator.db does get updated ok with the replicated request for the
> >  > >>>>>>>> mbox in question.
> >  > >>>>>>>>
> >  > >>>>>>>> however i am still looking for the actual replicator function in the
> >  > >>>>>>>> lib's that do the actual replication requests
> >  > >>>>>>>>
> >  > >>>>>>>> the number of folders & subfolders is defanately the issue - not the
> >  > >>>>>>>> mbox pyhsical size as thought origionally.
> >  > >>>>>>>>
> >  > >>>>>>>> if someone can point me in the right direction, it seems either the
> >  > >>>>>>>> replicator is not picking up on the number of folders to replicat
> >  > >>>>>>>> properly or it has a hard set limit like 256 / 512 / 65535 etc and
> >  > >>>>>>>> stops
> >  > >>>>>>>> the replication request thereafter.
> >  > >>>>>>>>
> >  > >>>>>>>> I am mainly a machine code programmer from the 80's and have
> >  > >>>>>>>> concentrated on python as of late, 'c' i am starting to go through
> >  > >>>>>>>> just
> >  > >>>>>>>> to give you a background on my talents.
> >  > >>>>>>>>
> >  > >>>>>>>> It took 2 months to finger this out.
> >  > >>>>>>>>
> >  > >>>>>>>> this issue also seems to be indirectly causing the duplicate messages
> >  > >>>>>>>> supression not to work as well.
> >  > >>>>>>>>
> >  > >>>>>>>> python programming to reproduce issue (loops are for last run
> >  > >>>>>>>> started @
> >  > >>>>>>>> 200 - fyi) :
> >  > >>>>>>>>
> >  > >>>>>>>> # cat mbox.gen
> >  > >>>>>>>> #!/usr/local/bin/python2
> >  > >>>>>>>>
> >  > >>>>>>>> import os,sys
> >  > >>>>>>>>
> >  > >>>>>>>> from lib import *
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> user = 'paul at paulkudla.net'
> >  > >>>>>>>>
> >  > >>>>>>>> """
> >  > >>>>>>>> for count in range (0,600) :
> >  > >>>>>>>> box = 'INBOX/folder-%s' %count
> >  > >>>>>>>> print count
> >  > >>>>>>>> command = '/usr/local/bin/doveadm mailbox create -s -u %s
> >  > >>>>>>>> %s'
> >  > >>>>>>>> %(user,box)
> >  > >>>>>>>> print command
> >  > >>>>>>>> a = commands.getoutput(command)
> >  > >>>>>>>> print a
> >  > >>>>>>>> """
> >  > >>>>>>>>
> >  > >>>>>>>> for count in range (0,600) :
> >  > >>>>>>>> box = 'INBOX/folder-0/sub-%' %count
> >  > >>>>>>>> print count
> >  > >>>>>>>> command = '/usr/local/bin/doveadm mailbox create -s -u %s
> >  > >>>>>>>> %s'
> >  > >>>>>>>> %(user,box)
> >  > >>>>>>>> print command
> >  > >>>>>>>> a = commands.getoutput(command)
> >  > >>>>>>>> print a
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> #sys.exit()
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> Happy Sunday !!!
> >  > >>>>>>>> Thanks - paul
> >  > >>>>>>>>
> >  > >>>>>>>> Paul Kudla
> >  > >>>>>>>>
> >  > >>>>>>>>
> >  > >>>>>>>> Scom.ca Internet Services <http://www.scom.ca>
> >  > >>>>>>>> 004-1009 Byron Street South
> >  > >>>>>>>> Whitby, Ontario - Canada
> >  > >>>>>>>> L1N 4S3
> >  > >>>>>>>>
> >  > >>>>>>>> Toronto 416.642.7266
> >  > >>>>>>>> Main 1.866.411.7266
> >  > >>>>>>>> Fax 1.888.892.7266
> >  > >>>>>>>>
> >  > >>>>>>>> On 4/24/2022 10:22 AM, Arnaud Abélard wrote:
> >  > >>>>>>>>> Hello,
> >  > >>>>>>>>>
> >  > >>>>>>>>> I am working on replicating a server (and adding compression on the
> >  > >>>>>>>>> other side) and since I had "Error: dsync I/O has stalled, no
> >  > >>>>>>>>> activity
> >  > >>>>>>>>> for 600 seconds (version not received)" errors I upgraded both
> >  > >>>>>>>>> source
> >  > >>>>>>>>> and destination server with the latest 2.3 version (2.3.18). While
> >  > >>>>>>>>> before the upgrade all the 15 replication connections were busy
> >  > >>>>>>>>> after
> >  > >>>>>>>>> upgrading dovecot replicator dsync-status shows that most of the
> >  > >>>>>>>>> time
> >  > >>>>>>>>> nothing is being replicated at all. I can see some brief
> >  > >>>>>>>>> replications
> >  > >>>>>>>>> that last, but 99,9% of the time nothing is happening at all.
> >  > >>>>>>>>>
> >  > >>>>>>>>> I have a replication_full_sync_interval of 12 hours but I have
> >  > >>>>>>>>> thousands of users with their last full sync over 90 hours ago.
> >  > >>>>>>>>>
> >  > >>>>>>>>> "doveadm replicator status" also shows that i have over 35,000
> >  > >>>>>>>>> queued
> >  > >>>>>>>>> full resync requests, but no sync, high or low queued requests so
> >  > >>>>>>>>> why
> >  > >>>>>>>>> aren't the full requests occuring?
> >  > >>>>>>>>>
> >  > >>>>>>>>> There are no errors in the logs.
> >  > >>>>>>>>>
> >  > >>>>>>>>> Thanks,
> >  > >>>>>>>>>
> >  > >>>>>>>>> Arnaud
> >  > >>>>>>>>>
> >  > >>>>>>>>>
> >  > >>>>>>>>>
> >  > >>>>>>>>>
> >  > >>>>>>>>>
> >  > >>>>>>
> >  > >>>>>
> >  > >>>
> >  > >
> >


More information about the dovecot mailing list