no full syncs after upgrading to dovecot 2.3.18

Paul Kudla (SCOM.CA Internet Services Inc.) paul at scom.ca
Thu Apr 28 11:29:03 UTC 2022


Thanks for the update

I don't push anyone when asking for updates.

I am a programmer by trade as well, and nothing ever goes as planned.

I prefer we all take our time and roll it out correctly rather than jumping 
the gun.

That's why I am trying to help elsewhere, as I have gotten pretty fluent with 
dovecot etc. and can help users out with the day-to-day stuff.

I just can't help with ldap; I never got around to that, as I use pgsql 
databases that are replicated across all my configs.

Again thanks for the update.



Happy Thursday !!!
Thanks - paul

Paul Kudla


Scom.ca Internet Services <http://www.scom.ca>
004-1009 Byron Street South
Whitby, Ontario - Canada
L1N 4S3

Toronto 416.642.7266
Main 1.866.411.7266
Fax 1.888.892.7266

On 4/28/2022 7:02 AM, Aki Tuomi wrote:
> 
> 2.3.19 is around the corner, so not long. I cannot yet promise an exact date, but hopefully within a week or two.
> 
> Aki
> 
>> On 28/04/2022 13:57 Paul Kudla (SCOM.CA Internet Services Inc.) <paul at scom.ca> wrote:
>>
>>   
>> Thanks for the update.
>>
>> Is this for both replication issues (accounts with 300+ folders, etc.)?
>>
>> Just asking - any ETA?
>>
>>
>>
>>
>>
>> Happy Thursday !!!
>> Thanks - paul
>>
>> Paul Kudla
>>
>>
>>
>> On 4/27/2022 9:01 AM, Aki Tuomi wrote:
>>>
>>> Hi!
>>>
>>> This is probably going to get fixed in 2.3.19; it looks like an issue we are already fixing.
>>>
>>> Aki
>>>
>>>> On 26/04/2022 16:38 Paul Kudla (SCOM.CA Internet Services Inc.) <paul at scom.ca> wrote:
>>>>
>>>>    
>>>> Agreed - there seems to be no way of posting these kinds of issues to see
>>>> if they are even being addressed, or even known about, moving forward in
>>>> new updates.
>>>>
>>>> I read somewhere there is a new branch coming out, but nothing as of yet?
>>>>
>>>> 2.4 maybe ....
>>>> 5.0 ........
>>>>
>>>> My previous replication issues (back in February) went unanswered.
>>>>
>>>> Not faulting anyone, but the developers do seem to be disconnected from
>>>> issues as of late? Or concentrating on other issues.
>>>>
>>>> I have no problem with support contracts for day-to-day maintenance;
>>>> however, as a programmer myself, they usually don't work, as the other
>>>> end relies on the latest source code anyway. Thus they cannot help.
>>>>
>>>> I am trying to take apart the replicator C code based on 2.3.18,
>>>> as most of it does work to some extent.
>>>>
>>>> tcps just does not work (i.e. it hits the 600-second default timeout in
>>>> the C code).
>>>>
>>>> My theory is that tcp works OK, but the replicator fails in
>>>> dsync-client.c when asked to return the folder list?
>>>>
>>>>
>>>> replicator-brain.c seems to control the overall process and timing.
>>>>
>>>> replicator-queue.c seems to handle the queue file, which does seem to
>>>> carry accurate info.
>>>>
>>>> Things in the source code are documented well enough to figure this out,
>>>> but I am still going through all the related .h files, documentation-wise,
>>>> which are all over the place.
>>>>
>>>> There is no clear documentation on the .h lib files, so I have to walk
>>>> through the tree one file at a time, finding the relevant code.
>>>>
>>>> Since the dsync from doveadm does seem to work OK, I have to assume the
>>>> dsync-client used to compile the replicator is at fault somehow, or a
>>>> call from it upstream?
>>>>
>>>> Thanks for your input on the other issues noted below; I will keep that
>>>> in mind when disassembling the source code.
>>>>
>>>> No sense in fixing one thing and leaving something else behind; it is
>>>> probably all related anyway.
>>>>
>>>> I have two test servers available, so I can play with all this offline
>>>> to reproduce the issues.
>>>>
>>>> Unfortunately I have to make a living first. This will be addressed when
>>>> possible, as I don't like live systems running this way, and I currently
>>>> only have 5 accounts with this issue (mine included).
>>>>
>>>>
>>>>
>>>>
>>>> Happy Tuesday !!!
>>>> Thanks - paul
>>>>
>>>> Paul Kudla
>>>>
>>>>
>>>>
>>>> On 4/26/2022 9:03 AM, Reuben Farrelly wrote:
>>>>>
>>>>> I ran into this back in February and documented a reproducible test case
>>>>> (and sent it to this list).  In short - I was able to reproduce this by
>>>>> having a valid and consistent mailbox on the source/local, creating a
>>>>> very standard empty Maildir/(new|cur|tmp) folder on the remote replica,
>>>>> and then initiating the replicate from the source. This consistently
>>>>> caused dsync to fail replication with the error "dovecot.index reset,
>>>>> view is now inconsistent" and sync aborted, leaving the replica mailbox
>>>>> in a screwed up inconsistent state. Client connections on the source
>>>>> replica were also dropped when this error occurred.  You can see the
>>>>> error by enabling debug level logging if you initiate dsync manually on
>>>>> a test mailbox.
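>>>>> The reproduction above can be sketched as a small script; a minimal
>>>>> sketch, assuming a Maildir mailbox layout, and where the replica path
>>>>> and the hostname in the final `doveadm sync` hint are placeholders:

```python
# Minimal sketch of the reproduction described above.
# Assumptions: Maildir mailbox format; /tmp/replica/... is a hypothetical
# replica path; replica.example.com is a placeholder hostname.
import os

def make_empty_maildir(root):
    # Create the bare Maildir/(new|cur|tmp) skeleton on the replica side -
    # the pre-existing empty folders that triggered the "dovecot.index
    # reset, view is now inconsistent" error.
    for sub in ("new", "cur", "tmp"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)

replica_root = "/tmp/replica/testuser/Maildir"
make_empty_maildir(replica_root)

# Then initiate replication manually from the source with debug logging,
# e.g.: doveadm -D sync -u testuser tcp:replica.example.com
print("created", replica_root)
```

>>>>> With debug logging enabled on the source, the inconsistency error
>>>>> should then show up when the sync hits the pre-existing empty folders.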
>>>>>
>>>>> The only workaround I found was to remove the remote Maildir and let
>>>>> Dovecot create the whole thing from scratch.  Dovecot did not like any
>>>>> existing folders on the destination replica even if they were the same
>>>>> names as the source and completely empty.  I was able to reproduce this
>>>>> with the bare minimum of folders - just an INBOX!
>>>>>
>>>>> I have no idea if any of the developers saw my post or if the bug has
>>>>> been fixed for the next release.  But it seemed to be quite a common
>>>>> problem over time (saw a few posts from people going back a long way
>>>>> with the same problem) and it is seriously disruptive to clients.  The
>>>>> error message is not helpful in tracking down the problem either.
>>>>>
>>>>> Secondly, I also have had an ongoing and longstanding problem using
>>>>> tcps: for replication.  For some reason using tcps: (with no other
>>>>> changes at all to the config) results in a lot of timeout messages
>>>>> "Error: dsync I/O has stalled, no activity for 600 seconds".  This goes
>>>>> away if I revert back to tcp: instead of tcps - with tcp: I very rarely
>>>>> get timeouts.  No idea why, guess this is a bug of some sort also.
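>>>>> For reference, the only difference between the two setups is the
>>>>> `mail_replica` scheme; a sketch of the relevant config, with a
>>>>> placeholder hostname:

```
plugin {
  # TLS variant that produces the "dsync I/O has stalled" timeouts here:
  mail_replica = tcps:replica.example.com
  # Plain-TCP variant that rarely times out:
  #mail_replica = tcp:replica.example.com
}
```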
>>>>>
>>>>> It's disappointing that there appears to be no way to have these sorts
>>>>> of problems addressed like there once was.  I am not using Dovecot for
>>>>> commercial purposes, so paying a fortune for a support contract for a
>>>>> high-end installation just isn't going to happen, and this list seems to
>>>>> be quite ordinary for getting support and reporting bugs nowadays....
>>>>>
>>>>> Reuben
>>>>>
>>>>> On 26/04/2022 7:21 pm, Paul Kudla (SCOM.CA Internet Services Inc.) wrote:
>>>>>
>>>>>>
>>>>>> Side issue:
>>>>>>
>>>>>> If you are getting inconsistent dsyncs, there is no real way to fix
>>>>>> this in the long run.
>>>>>>
>>>>>> I know it's a pain (I have already had to do this myself).
>>>>>>
>>>>>> I needed to do a full sync: take one server offline, delete the user
>>>>>> dir (with dovecot offline), and then rsync (or somehow duplicate the
>>>>>> main server's user data) over to the remote again.
>>>>>>
>>>>>> Then bring the remote back up, and it kind of worked.
>>>>>>
>>>>>> Best suggestion is to bring the main server down at night so the copy
>>>>>> is clean?
>>>>>>
>>>>>> If using postfix, you can enable the soft bounce option and the mail
>>>>>> will back-spool until everything comes back online.
>>>>>>
>>>>>> (Needs to be enabled on both servers.)
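>>>>>> A sketch of the relevant postfix setting (same on both servers):

```
# /etc/postfix/main.cf - queue mail instead of bouncing while the other
# server is down; remember to turn it back off afterwards
soft_bounce = yes
```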
>>>>>>
>>>>>> Replication was still an issue on accounts with 300+ folders in them;
>>>>>> still working on a fix for that.
>>>>>>
>>>>>>
>>>>>> Happy Tuesday !!!
>>>>>> Thanks - paul
>>>>>>
>>>>>> Paul Kudla
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 4/25/2022 10:01 AM, Arnaud Abélard wrote:
>>>>>>> Ah, I'm now getting errors in the logs; that would explain the
>>>>>>> increasing number of failed sync requests:
>>>>>>>
>>>>>>> dovecot: imap(xxxxx)<2961235><Bs6w43rdQPAqAcsFiXEmAInUhhA3Rfqh>:
>>>>>>> Error: Mailbox INBOX: /vmail/l/i/xxxxx/dovecot.index reset, view is
>>>>>>> now inconsistent
>>>>>>>
>>>>>>>
>>>>>>> And sure enough:
>>>>>>>
>>>>>>> # dovecot replicator status xxxxx
>>>>>>>
>>>>>>> xxxxx         none     00:02:54  07:11:28  -            y
>>>>>>>
>>>>>>>
>>>>>>> What could explain that error?
>>>>>>>
>>>>>>> Arnaud
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 25/04/2022 15:13, Arnaud Abélard wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> On my side we are running Linux (Debian Buster).
>>>>>>>>
>>>>>>>> I'm not sure my problem is actually the same as Paul or you
>>>>>>>> Sebastian since I have a lot of boxes but those are actually small
>>>>>>>> (quota of 110MB) so I doubt any of them have more than a dozen imap
>>>>>>>> folders.
>>>>>>>>
>>>>>>>> The main symptom is that I have tons of full sync requests waiting,
>>>>>>>> but even though no other sync is pending, the replicator just waits
>>>>>>>> for something to trigger those syncs.
>>>>>>>>
>>>>>>>> Today, with users back I can see that normal and incremental syncs
>>>>>>>> are being done on the 15 connections, with an occasional full sync
>>>>>>>> here or there and lots of "Waiting 'failed' requests":
>>>>>>>>
>>>>>>>> Queued 'sync' requests        0
>>>>>>>> Queued 'high' requests        0
>>>>>>>> Queued 'low' requests         0
>>>>>>>> Queued 'failed' requests      122
>>>>>>>> Queued 'full resync' requests 28785
>>>>>>>> Waiting 'failed' requests     4294
>>>>>>>> Total number of known users   42512
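>>>>>>>> For scripting around this, the summary block above can be parsed
>>>>>>>> mechanically; a minimal sketch (the helper name is my own, and it
>>>>>>>> assumes the 2.3 summary format shown above):

```python
def parse_replicator_status(text):
    # Parse the summary counters of `doveadm replicator status` into a dict,
    # splitting each line on its last space (label vs. numeric value).
    counts = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        label, _, value = line.rpartition(' ')
        if value.isdigit():
            counts[label.strip()] = int(value)
    return counts

sample = """\
Queued 'sync' requests        0
Queued 'failed' requests      122
Queued 'full resync' requests 28785
Waiting 'failed' requests     4294
Total number of known users   42512
"""
status = parse_replicator_status(sample)
print(status["Queued 'full resync' requests"])  # 28785
```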
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> So, why didn't the replicator take advantage of the weekend to
>>>>>>>> replicate the mailboxes while no users were using them?
>>>>>>>>
>>>>>>>> Arnaud
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 25/04/2022 13:54, Sebastian Marske wrote:
>>>>>>>>> Hi there,
>>>>>>>>>
>>>>>>>>> thanks for your insights and for diving deeper into this Paul!
>>>>>>>>>
>>>>>>>>> For me, the users ending up in 'Waiting for dsync to finish' all have
>>>>>>>>> more than 256 IMAP folders as well (ranging from 288 up to >5500;
>>>>>>>>> as per
>>>>>>>>> 'doveadm mailbox list -u <username> | wc -l'). For more details on my
>>>>>>>>> setup please see my post from February [1].
>>>>>>>>>
>>>>>>>>> @Arnaud: What OS are you running on?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best
>>>>>>>>> Sebastian
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [1] https://dovecot.org/pipermail/dovecot/2022-February/124168.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 4/24/22 19:36, Paul Kudla (SCOM.CA Internet Services Inc.) wrote:
>>>>>>>>>>
>>>>>>>>>> Question - I am having similar replication issues.
>>>>>>>>>>
>>>>>>>>>> Please read everything below, and advise the folder counts on the
>>>>>>>>>> non-replicated users?
>>>>>>>>>>
>>>>>>>>>> I find the total number of folders per account seems to be a
>>>>>>>>>> factor, and NOT the size of the mailbox.
>>>>>>>>>>
>>>>>>>>>> i.e. I have customers with 40G of email spread over 40 or so
>>>>>>>>>> folders, and it works OK.
>>>>>>>>>>
>>>>>>>>>> 300+ folders seems to be the issue.
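>>>>>>>>>> To check which accounts are over the suspected threshold, folder
>>>>>>>>>> counts can be pulled from `doveadm mailbox list`; a sketch, where
>>>>>>>>>> the threshold constant and helper names are my own assumptions:

```python
import subprocess

SUSPECT_THRESHOLD = 256  # suspected limit from this thread, not a confirmed constant

def count_folders(listing):
    # Count non-empty lines of `doveadm mailbox list -u <user>` output.
    return len([line for line in listing.splitlines() if line.strip()])

def user_over_threshold(user, doveadm="/usr/local/bin/doveadm"):
    # Run doveadm and compare the folder count against the threshold.
    out = subprocess.run([doveadm, "mailbox", "list", "-u", user],
                         capture_output=True, text=True, check=True).stdout
    return count_folders(out) > SUSPECT_THRESHOLD

print(count_folders("INBOX\nINBOX/folder-0\nINBOX/folder-1\n"))  # 3
```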
>>>>>>>>>>
>>>>>>>>>> I have been going through the replication code.
>>>>>>>>>>
>>>>>>>>>> No errors are being logged.
>>>>>>>>>>
>>>>>>>>>> I am assuming that the replicator --> dsync client --> other server
>>>>>>>>>> chain is timing out or not reading the folder lists correctly (i.e.
>>>>>>>>>> it dies after X folders read).
>>>>>>>>>>
>>>>>>>>>> Thus I am going through the code, patching in log entries etc. to
>>>>>>>>>> find the issues.
>>>>>>>>>>
>>>>>>>>>> see
>>>>>>>>>>
>>>>>>>>>> [13:33:57] mail18.scom.ca [root:0] /usr/local/var/lib/dovecot
>>>>>>>>>> # ll
>>>>>>>>>> total 86
>>>>>>>>>> drwxr-xr-x  2 root  wheel  uarch    4B Apr 24 11:11 .
>>>>>>>>>> drwxr-xr-x  4 root  wheel  uarch    4B Mar  8  2021 ..
>>>>>>>>>> -rw-r--r--  1 root  wheel  uarch   73B Apr 24 11:11 instances
>>>>>>>>>> -rw-r--r--  1 root  wheel  uarch  160K Apr 24 13:33 replicator.db
>>>>>>>>>>
>>>>>>>>>> [13:33:58] mail18.scom.ca [root:0] /usr/local/var/lib/dovecot
>>>>>>>>>> #
>>>>>>>>>>
>>>>>>>>>> replicator.db seems to get updated ok but never processed properly.
>>>>>>>>>>
>>>>>>>>>> # sync.users
>>>>>>>>>> nick at elirpa.com        high  00:09:41  463:47:01  -          y
>>>>>>>>>> keith at elirpa.com       high  00:09:23  463:45:43  -          y
>>>>>>>>>> paul at scom.ca           high  00:09:41  463:46:51  -          y
>>>>>>>>>> ed at scom.ca             high  00:09:43  463:47:01  -          y
>>>>>>>>>> ed.hanna at dssmgmt.com  high  00:09:42  463:46:58  -          y
>>>>>>>>>> paul at paulkudla.net     high  00:09:44  463:47:03  580:35:07  y
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So, two things:
>>>>>>>>>>
>>>>>>>>>> First, to get the production stuff to work, I had to write a script
>>>>>>>>>> that would find the bad syncs and then force a dsync between the
>>>>>>>>>> servers.
>>>>>>>>>>
>>>>>>>>>> I run this from cron on each server:
>>>>>>>>>>
>>>>>>>>>> */10    *    *    *    *    root /usr/bin/nohup
>>>>>>>>>> /programs/common/sync.recover > /dev/null
>>>>>>>>>>
>>>>>>>>>> Python script to sort things out:
>>>>>>>>>>
>>>>>>>>>> # cat /programs/common/sync.recover
>>>>>>>>>> #!/usr/local/bin/python3
>>>>>>>>>>
>>>>>>>>>> # Force a sync between servers that are reporting bad.
>>>>>>>>>>
>>>>>>>>>> import os, sys, django, socket, subprocess
>>>>>>>>>> from optparse import OptionParser
>>>>>>>>>>
>>>>>>>>>> from lib import *   # provides commands() and sendmail()
>>>>>>>>>>
>>>>>>>>>> # Sample re-index of a mailbox:
>>>>>>>>>> # doveadm -D force-resync -u paul at scom.ca -f INBOX*
>>>>>>>>>>
>>>>>>>>>> USAGE_TEXT = '''\
>>>>>>>>>> usage: %%prog %s[options]
>>>>>>>>>> '''
>>>>>>>>>>
>>>>>>>>>> parser = OptionParser(usage=USAGE_TEXT % '', version='0.4')
>>>>>>>>>>
>>>>>>>>>> parser.add_option("-m", "--send_to", dest="send_to",
>>>>>>>>>>                   help="Send Email To")
>>>>>>>>>> parser.add_option("-e", "--email", dest="email_box",
>>>>>>>>>>                   help="Box to Index")
>>>>>>>>>> parser.add_option("-d", "--detail", action='store_true',
>>>>>>>>>>                   dest="detail", default=False, help="Detailed report")
>>>>>>>>>> parser.add_option("-i", "--index", action='store_true',
>>>>>>>>>>                   dest="index", default=False, help="Index")
>>>>>>>>>>
>>>>>>>>>> options, args = parser.parse_args()
>>>>>>>>>>
>>>>>>>>>> print(options.email_box)
>>>>>>>>>> print(options.send_to)
>>>>>>>>>> print(options.detail)
>>>>>>>>>>
>>>>>>>>>> print('Getting Current User Sync Status')
>>>>>>>>>> command = commands("/usr/local/bin/doveadm replicator status '*'")
>>>>>>>>>>
>>>>>>>>>> sync_user_status = command.output.split('\n')
>>>>>>>>>>
>>>>>>>>>> synced = []
>>>>>>>>>>
>>>>>>>>>> for n in range(1, len(sync_user_status)):
>>>>>>>>>>     user = sync_user_status[n]
>>>>>>>>>>     print('Processing User : %s' % user.split(' ')[0])
>>>>>>>>>>     if user.split(' ')[0] != options.email_box:
>>>>>>>>>>         if options.email_box != None:
>>>>>>>>>>             continue
>>>>>>>>>>
>>>>>>>>>>     if options.index == True:
>>>>>>>>>>         command = '/usr/local/bin/doveadm -D force-resync -u %s -f INBOX*' \
>>>>>>>>>>                   % user.split(' ')[0]
>>>>>>>>>>         command = commands(command)
>>>>>>>>>>         command = command.output
>>>>>>>>>>
>>>>>>>>>>     # Walk the status line backwards; a trailing 'y' marks a failed sync.
>>>>>>>>>>     for nn in range(len(user) - 1, 0, -1):
>>>>>>>>>>         if user[nn] == '-':
>>>>>>>>>>             # Last full sync succeeded - skip this user.
>>>>>>>>>>             break
>>>>>>>>>>
>>>>>>>>>>         if user[nn] == 'y':  # Found a bad mailbox
>>>>>>>>>>             print('syncing ... %s' % user.split(' ')[0])
>>>>>>>>>>
>>>>>>>>>>             if options.detail == True:
>>>>>>>>>>                 command = '/usr/local/bin/doveadm -D sync -u %s -d -N -l 30 -U' \
>>>>>>>>>>                           % user.split(' ')[0]
>>>>>>>>>>                 print(command)
>>>>>>>>>>                 command = commands(command)
>>>>>>>>>>                 command = command.output.split('\n')
>>>>>>>>>>                 print(command)
>>>>>>>>>>                 print('Processed Mailbox for ... %s' % user.split(' ')[0])
>>>>>>>>>>                 synced.append('Processed Mailbox for ... %s'
>>>>>>>>>>                               % user.split(' ')[0])
>>>>>>>>>>                 for nnn in range(len(command)):
>>>>>>>>>>                     synced.append(command[nnn] + '\n')
>>>>>>>>>>                 break
>>>>>>>>>>
>>>>>>>>>>             if options.detail == False:
>>>>>>>>>>                 command = subprocess.Popen(
>>>>>>>>>>                     ["/usr/local/bin/doveadm sync -u %s -d -N -l 30 -U"
>>>>>>>>>>                      % user.split(' ')[0]],
>>>>>>>>>>                     shell=True, stdin=None, stdout=None, stderr=None,
>>>>>>>>>>                     close_fds=True)
>>>>>>>>>>
>>>>>>>>>>                 print('Processed Mailbox for ... %s' % user.split(' ')[0])
>>>>>>>>>>                 synced.append('Processed Mailbox for ... %s'
>>>>>>>>>>                               % user.split(' ')[0])
>>>>>>>>>>                 break
>>>>>>>>>>
>>>>>>>>>> if len(synced) != 0:
>>>>>>>>>>     # Send an email listing the bad synced boxes.
>>>>>>>>>>     if options.send_to != None:
>>>>>>>>>>         send_from = 'monitor at scom.ca'
>>>>>>>>>>         send_to = ['%s' % options.send_to]
>>>>>>>>>>         send_subject = 'Dovecot Bad Sync Report for : %s' \
>>>>>>>>>>                        % (socket.gethostname())
>>>>>>>>>>         send_text = '\n\n'
>>>>>>>>>>         for n in range(len(synced)):
>>>>>>>>>>             send_text = send_text + synced[n] + '\n'
>>>>>>>>>>
>>>>>>>>>>         send_files = []
>>>>>>>>>>         sendmail(send_from, send_to, send_subject, send_text, send_files)
>>>>>>>>>>
>>>>>>>>>> sys.exit()
>>>>>>>>>>
>>>>>>>>>> Second:
>>>>>>>>>>
>>>>>>>>>> I posted this a month ago - no response.
>>>>>>>>>>
>>>>>>>>>> Please appreciate that I am trying to help ....
>>>>>>>>>>
>>>>>>>>>> After much testing, I can now reproduce the replication issues at hand.
>>>>>>>>>>
>>>>>>>>>> I am running on FreeBSD 12 & 13 stable (both test and production
>>>>>>>>>> servers), SDRAM drives, etc.
>>>>>>>>>>
>>>>>>>>>> Basically, replication works fine until reaching a folder quantity
>>>>>>>>>> of ~256 or more.
>>>>>>>>>>
>>>>>>>>>> To reproduce, using doveadm I created folders like:
>>>>>>>>>>
>>>>>>>>>> INBOX/folder-0
>>>>>>>>>> INBOX/folder-1
>>>>>>>>>> INBOX/folder-2
>>>>>>>>>> INBOX/folder-3
>>>>>>>>>> and so forth ......
>>>>>>>>>>
>>>>>>>>>> I created 200 folders and they replicated OK on both servers.
>>>>>>>>>>
>>>>>>>>>> I created another 200 (400 total), and the replicator got stuck and
>>>>>>>>>> would not update the mbox on the alternate server anymore - and is
>>>>>>>>>> still updating 4 days later?
>>>>>>>>>>
>>>>>>>>>> Basically, the replicator goes so far and either hangs, or more
>>>>>>>>>> likely bails on an error that is not reported to the debug logging?
>>>>>>>>>>
>>>>>>>>>> However, dsync will sync the two servers, but only when run manually
>>>>>>>>>> (i.e. all the folders will sync).
>>>>>>>>>>
>>>>>>>>>> I have two test servers available if you need any kind of access -
>>>>>>>>>> again, here to help.
>>>>>>>>>>
>>>>>>>>>> [07:28:42] mail18.scom.ca [root:0] ~
>>>>>>>>>> # sync.status
>>>>>>>>>> Queued 'sync' requests        0
>>>>>>>>>> Queued 'high' requests        6
>>>>>>>>>> Queued 'low' requests         0
>>>>>>>>>> Queued 'failed' requests      0
>>>>>>>>>> Queued 'full resync' requests 0
>>>>>>>>>> Waiting 'failed' requests     0
>>>>>>>>>> Total number of known users   255
>>>>>>>>>>
>>>>>>>>>> username                       type        status
>>>>>>>>>> paul at scom.ca                normal      Waiting for dsync to finish
>>>>>>>>>> keith at elirpa.com            incremental Waiting for dsync to finish
>>>>>>>>>> ed.hanna at dssmgmt.com        incremental Waiting for dsync to finish
>>>>>>>>>> ed at scom.ca                  incremental Waiting for dsync to finish
>>>>>>>>>> nick at elirpa.com             incremental Waiting for dsync to finish
>>>>>>>>>> paul at paulkudla.net          incremental Waiting for dsync to finish
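>>>>>>>>>> The per-user listing above can be filtered mechanically, the same
>>>>>>>>>> way my sync.recover script hunts for bad boxes; a small sketch
>>>>>>>>>> (helper name and sample addresses are my own):

```python
def stuck_users(listing):
    # Return the users whose status column reads "Waiting for dsync to finish"
    # in `doveadm replicator status` per-user output.
    users = []
    for line in listing.splitlines():
        if line.rstrip().endswith("Waiting for dsync to finish"):
            users.append(line.split()[0])
    return users

sample = """\
username                       type        status
user1@example.com              normal      Waiting for dsync to finish
user2@example.com              incremental Waiting for dsync to finish
"""
print(stuck_users(sample))  # ['user1@example.com', 'user2@example.com']
```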
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I have been going through the C code, and it seems the replication
>>>>>>>>>> gets requested OK.
>>>>>>>>>>
>>>>>>>>>> replicator.db does get updated OK with the replication request for
>>>>>>>>>> the mbox in question.
>>>>>>>>>>
>>>>>>>>>> However, I am still looking for the actual replicator function in
>>>>>>>>>> the libs that does the actual replication requests.
>>>>>>>>>>
>>>>>>>>>> The number of folders & subfolders is definitely the issue - not
>>>>>>>>>> the mbox physical size, as thought originally.
>>>>>>>>>>
>>>>>>>>>> If someone can point me in the right direction: it seems either the
>>>>>>>>>> replicator is not picking up on the number of folders to replicate
>>>>>>>>>> properly, or it has a hard-set limit like 256 / 512 / 65535 etc.
>>>>>>>>>> and stops the replication request thereafter.
>>>>>>>>>>
>>>>>>>>>> I am mainly a machine-code programmer from the 80's and have
>>>>>>>>>> concentrated on Python as of late; 'c' I am just starting to go
>>>>>>>>>> through - just to give you a background on my talents.
>>>>>>>>>>
>>>>>>>>>> It took 2 months to figure this out.
>>>>>>>>>>
>>>>>>>>>> This issue also seems to be indirectly causing the duplicate
>>>>>>>>>> message suppression not to work as well.
>>>>>>>>>>
>>>>>>>>>> Python program to reproduce the issue (the loops are set for the
>>>>>>>>>> last run, started @ 200 - fyi):
>>>>>>>>>>
>>>>>>>>>> # cat mbox.gen
>>>>>>>>>> #!/usr/local/bin/python2
>>>>>>>>>>
>>>>>>>>>> import os, sys, commands
>>>>>>>>>>
>>>>>>>>>> from lib import *
>>>>>>>>>>
>>>>>>>>>> user = 'paul at paulkudla.net'
>>>>>>>>>>
>>>>>>>>>> """
>>>>>>>>>> for count in range(0, 600):
>>>>>>>>>>     box = 'INBOX/folder-%s' % count
>>>>>>>>>>     print count
>>>>>>>>>>     command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' \
>>>>>>>>>>               % (user, box)
>>>>>>>>>>     print command
>>>>>>>>>>     a = commands.getoutput(command)
>>>>>>>>>>     print a
>>>>>>>>>> """
>>>>>>>>>>
>>>>>>>>>> for count in range(0, 600):
>>>>>>>>>>     box = 'INBOX/folder-0/sub-%s' % count
>>>>>>>>>>     print count
>>>>>>>>>>     command = '/usr/local/bin/doveadm mailbox create -s -u %s %s' \
>>>>>>>>>>               % (user, box)
>>>>>>>>>>     print command
>>>>>>>>>>     a = commands.getoutput(command)
>>>>>>>>>>     print a
>>>>>>>>>>
>>>>>>>>>>     #sys.exit()
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Happy Sunday !!!
>>>>>>>>>> Thanks - paul
>>>>>>>>>>
>>>>>>>>>> Paul Kudla
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 4/24/2022 10:22 AM, Arnaud Abélard wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I am working on replicating a server (and adding compression on
>>>>>>>>>>> the other side), and since I had "Error: dsync I/O has stalled, no
>>>>>>>>>>> activity for 600 seconds (version not received)" errors, I upgraded
>>>>>>>>>>> both source and destination servers to the latest 2.3 version
>>>>>>>>>>> (2.3.18). While before the upgrade all 15 replication connections
>>>>>>>>>>> were busy, after upgrading "dovecot replicator dsync-status" shows
>>>>>>>>>>> that most of the time nothing is being replicated at all. I can see
>>>>>>>>>>> the occasional brief replication, but 99.9% of the time nothing is
>>>>>>>>>>> happening at all.
>>>>>>>>>>>
>>>>>>>>>>> I have a replication_full_sync_interval of 12 hours, but I have
>>>>>>>>>>> thousands of users whose last full sync was over 90 hours ago.
>>>>>>>>>>>
>>>>>>>>>>> "doveadm replicator status" also shows that I have over 35,000
>>>>>>>>>>> queued full resync requests, but no sync, high or low queued
>>>>>>>>>>> requests - so why aren't the full requests occurring?
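>>>>>>>>>>> For reference, a sketch of the replication settings involved here;
>>>>>>>>>>> the hostname is a placeholder, and the values mirror the ones
>>>>>>>>>>> mentioned above (12-hour full sync interval, 15 connections):

```
replication_full_sync_interval = 12 hours
replication_max_conns = 15

service replicator {
  process_min_avail = 1
}

plugin {
  mail_replica = tcp:replica.example.com
}
```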
>>>>>>>>>>>
>>>>>>>>>>> There are no errors in the logs.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Arnaud
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>
> 

