[Dovecot] repeating dsync - questions

Bob Gustafson bobgus at rcn.com
Mon Apr 8 23:48:02 EEST 2013


OK, success. The timings (the 'real nn' lines) appear below each command.

Initial copy of Maildir from live system to test sys (14G of data)

rsync -ar --times hoho4:/home/bobgus/Maildir/ /home/bobgus/Maildir
real 37m

Then 1st 'dsync -R backup maildir:~/Maildir'
real 828m

Then 2nd rsync to pick up new mail - **don't touch existing files**

rsync -ar --times --ignore-existing hoho4:/home/bobgus/Maildir/ /home/bobgus/Maildir
real 3m

Then 2nd 'dsync -R backup maildir:~/Maildir'
real 12m

The --ignore-existing option on the 2nd rsync allows dsync to process
the additional emails in a reasonable amount of time.

The dovecot-uidlist, which dsync modifies in the Maildir, is not
overwritten by the 2nd rsync, and therefore the 2nd dsync only processes
the added messages. (There were no deletes between rsync runs.)
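
The sequence above could be wrapped in a small script along these lines. This is only a sketch: the RUN variable and the function names first_pass and second_pass are made up for illustration, and the rsync and dsync command lines are exactly the ones shown above. By default it prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of the two-pass Maildir copy described above.
# RUN, first_pass and second_pass are illustrative names only;
# they are not part of rsync or dsync.
SRC="hoho4:/home/bobgus/Maildir/"
DST="/home/bobgus/Maildir"
RUN="${RUN-echo}"   # dry run by default: commands are printed, not executed

first_pass() {
    # Full copy, then the initial (slow) conversion.
    $RUN rsync -ar --times "$SRC" "$DST"
    $RUN dsync -R backup maildir:~/Maildir
}

second_pass() {
    # Copy only files that do not exist locally yet, so the
    # dovecot-uidlist that dsync modified is left alone.
    $RUN rsync -ar --times --ignore-existing "$SRC" "$DST"
    $RUN dsync -R backup maildir:~/Maildir
}

first_pass
second_pass
```

Setting RUN to an empty value in the environment (for example `RUN= sh migrate.sh`, where migrate.sh is a hypothetical file name) would execute the commands instead of printing them.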

Thanks much for your hints and comments.

Bob G


On Mon, 2013-04-08 at 00:53 +0300, Timo Sirainen wrote:
> On 8.4.2013, at 0.10, Bob Gustafson <bobgus at rcn.com> wrote:
> 
> >>> I am still on my quest for a quick way to move mail from a live Maildir
> >>> system to a 'soon to be live' sdbox system.
> >>> 
> >>> I copy Maildir to new system using:
> >>> rsync -ar --times hoho4:/home/bobgus/Maildir/ /home/bobgus/Maildir
> >>> 
> >>> Then I convert from Maildir to sdbox with:
> >>> dsync mirror maildir:~/Maildir
> >>> 
> >>> Then I copy more messages from live system using rsync
> >> 
> >> ^^ that is the mistake
> > 
> > I guess I have a basic misunderstanding of what 'dsync mirror' is doing.
> > 
> > My understanding is that going from Maildir to sdbox, dsync does not
> > mess with the data in Maildir. The Maildir metadata is in one form and
> > the sdbox metadata is in another form (in the sdbox directory).
> 
> dsync does mess with metadata in the maildir. Also, with dsync mirror (as opposed to dsync backup) it can modify the contents as well. The main problem here is:
> 
> 1. dsync sees that a folder A in maildir doesn't have a GUID (because dsync is just about the only tool that uses it right now), and assigns the mailbox a new GUID
> 2. dsync syncs the mailbox to sdbox with that GUID
> 3. rsync comes and wipes out the maildir's dovecot-uidlist that contained the GUID
> 4. second dsync sees that folder A in maildir doesn't have a GUID, and assigns a new GUID to it
>  - now maildir has folder A with GUID 1, and sdbox has folder A with GUID 2
>  - dsync thinks they are two different folders, and duplicates them as A and A_2. A_2 also gets copied back to maildir, because you're using dsync mirror. This is why the second dsync is slow: it's doing all the work again, and in fact twice the work, since it's copying the mails from sdbox to maildir as well.
> 
> v2.2 dsync is somewhat smarter and can figure out that they are actually the same folder A and it simply changes the other's GUID instead of duplicating all data.
> 
> > No new email messages enter the sdbox system to be 'mirrored' to the
> > Maildir system.
> > 
> > I thought of using the 'dsync backup' command, but the sentence "Any
> > changes done in destination are discarded." seems to indicate that each
> > time 'dsync backup' is done, it starts from the beginning. No
> > incremental backup (but this is done in 2.2 ?)
> 
> dsync backup is incremental. It just wipes out any changes done at the other side (if there happen to be any).
> 
> >>> Then I do the 'dsync mirror maildir:~/Maildir' again
> >>> 
> >>> There were only a few messages that were copied over in the 2nd rsync
> >>> pass and it went quickly, but the 2nd dsync pass is taking a long time.
> >> 
> >> The second rsync is overwriting all the metadata changes (mailbox GUIDs
> >> most importantly) that the first dsync run did.
> > 
> > Why does dsync mess with the Maildir metadata? Won't that just confuse
> > the dovecot running on the Maildir system?
> 
> Incremental dsync doesn't work (well) without additional metadata.
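
The GUID mismatch described in the quoted steps 1 to 4 can be illustrated with a toy model. This is not dsync's real logic: plain text files and shell variables stand in for the metadata, and toy_rsync, toy_dsync and the "guid=N" format are invented for the illustration.

```shell
#!/bin/sh
# Toy model only: a text file stands in for the Maildir uidlist.
# toy_rsync, toy_dsync and the "guid=N" line format are hypothetical.
work=$(mktemp -d)
live="$work/live"; copy="$work/copy"
mkdir -p "$live" "$copy"
echo "uidlist-without-guid" > "$live/dovecot-uidlist"

next=1          # counter used to mint fake GUIDs
sdbox_guid=""   # GUID the toy sdbox side has recorded for folder A
result=""

# Stand-in for rsync; "keep" behaves like --ignore-existing.
toy_rsync() {
    if [ "$1" = keep ] && [ -e "$copy/dovecot-uidlist" ]; then
        return 0
    fi
    cp "$live/dovecot-uidlist" "$copy/dovecot-uidlist"
}

# Stand-in for one dsync run against the copied Maildir.
toy_dsync() {
    # Assign a new GUID if the uidlist does not carry one.
    if ! grep -q '^guid=' "$copy/dovecot-uidlist"; then
        echo "guid=$next" >> "$copy/dovecot-uidlist"
        next=$((next + 1))
    fi
    maildir_guid=$(grep '^guid=' "$copy/dovecot-uidlist")
    if [ -z "$sdbox_guid" ]; then
        sdbox_guid=$maildir_guid
        result="initial full sync"
    elif [ "$sdbox_guid" = "$maildir_guid" ]; then
        result="incremental"
    else
        result="duplicate folder A_2"
    fi
}

toy_rsync overwrite; toy_dsync; r1=$result
toy_rsync keep;      toy_dsync; r2=$result
toy_rsync overwrite; toy_dsync; r3=$result
echo "first run: $r1"
echo "after ignore-existing style copy: $r2"
echo "after overwriting copy: $r3"
rm -rf "$work"
```

In this model the run after the "keep" copy stays incremental because the GUID survives, while the run after the overwriting copy sees a fresh GUID and treats folder A as a different folder.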



