[Dovecot] dsync replication errors

Michael Grimm trashcan at odo.in-berlin.de
Sat Feb 16 20:26:50 EET 2013


Timo Sirainen <tss at iki.fi> wrote:

> I did a bunch of dsync fixes today in hg. With the new locking
> behavior (and other fixes) you shouldn't be able to break it anymore.

Sorry to say, but I am still able to break replicator with v2.2.beta1
(35194cf0693e) under the conditions outlined below.

> On 2013-02-01 Michael Grimm wrote:
> 
>> This is v2.2.beta1 (78bdcb6642c7) with freshly created mailboxes "test"
>> at both servers "mx1" and "mx2", and replicator uses ssh for remote
>> access. Both servers run a recent postfix, use lmtp for local delivery,
>> and "test" is a virtual user.

I might add that both servers run inside FreeBSD jails (if that might make
the difference to your test setup.

>> All tests are run with vanilla mailboxes, after restarting dovecot, and
>> without imap connections by MUA:

This time I did even restart both service jails before every test. And, I
did use both Mail.app and roundcube as MUA to check the results (if Mail.app
might have screwed INBOX ...)

>> 1) Simultaneous mailbomb approach: run both scripts simultaneously, and
>>   you'll end up with numerous duplicates in mailboxes "test". Very often
>>   you'll find multiples.

Still a lot of duplicates and multiples. Those numbers are not reproducable,
240 (best case) up to 340 (worst case) instead of 200 messages (after 10 
tests).
 
Here is one logfile example of a triplicated mail injected at mx1:

logfile at mx1:
| Feb 16 19:03:12 <mail.info> mx1 postfix/pickup[33958]: 3Z7fMh1PYMz5Ng: uid=0 from=<root>
| Feb 16 19:03:12 <mail.info> mx1 postfix/cleanup[34320]: 3Z7fMh1PYMz5Ng: message-id=<3Z7fMh1PYMz5Ng at test.mx1.invalid>
| Feb 16 19:03:12 <mail.info> mx1 postfix/qmgr[33959]: 3Z7fMh1PYMz5Ng: from=<root at mx1.invalid>, size=310, nrcpt=1 (queue active)
| Feb 16 19:03:12 <mail.info> mx1 dovecot: lmtp(34456, test): copy from : box=INBOX, uid=12, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=()
| Feb 16 19:03:12 <mail.info> mx1 dovecot: lmtp(34456, test): nVlIDeDJH1GYhgAAag1aAg: sieve: msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>: stored mail into mailbox 'INBOX'
| Feb 16 19:03:12 <mail.info> mx1 postfix/lmtp[34453]: 3Z7fMh1PYMz5Ng: to=<test at mx1.invalid>, orig_to=<tttttt at mx1.invalid>, relay=test.mx1.invalid[private/dovecot-lmtp], delay=0.29, delays=0.08/0/0/0.21, dsn=2.0.0, status=sent (250 2.0.0 <test at mx1.invalid> nVlIDeDJH1GYhgAAag1aAg Saved)
| Feb 16 19:03:12 <mail.info> mx1 postfix/qmgr[33959]: 3Z7fMh1PYMz5Ng: removed
| Feb 16 19:03:13 <mail.info> mx1 dovecot: dsync-local(test): copy from INBOX: box=INBOX, uid=42, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=()
| Feb 16 19:03:13 <mail.info> mx1 dovecot: dsync-local(test): expunge: box=INBOX, uid=12, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=(\Recent)
| Feb 16 19:03:16 <mail.info> mx1 dovecot: dsync-local(test): copy from INBOX: box=INBOX, uid=164, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=()
| Feb 16 19:03:16 <mail.info> mx1 dovecot: dsync-local(test): copy from INBOX: box=INBOX, uid=263, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=()
| Feb 16 19:03:16 <mail.info> mx1 dovecot: dsync-local(test): expunge: box=INBOX, uid=118, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=(\Recent)
| Feb 16 19:03:16 <mail.info> mx1 dovecot: dsync-local(test): expunge: box=INBOX, uid=42, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=(\Recent)

after reading those three messages at mx1:
| Feb 16 19:04:22 <mail.info> mx1 dovecot: imap(test) hQjfUNvVPwBd3Cqw: flag_change: box=INBOX, uid=372, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=(\Seen \Recent)
| Feb 16 19:05:40 <mail.info> mx1 dovecot: imap(test) hQjfUNvVPwBd3Cqw: flag_change: box=INBOX, uid=263, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=(\Seen \Recent)
| Feb 16 19:05:41 <mail.info> mx1 dovecot: imap(test) hQjfUNvVPwBd3Cqw: flag_change: box=INBOX, uid=164, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=(\Seen \Recent)

logfile at mx2:
| Feb 16 19:03:13 <mail.info> mx2 dovecot: dsync-local(test): save: box=INBOX, uid=50, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=()
| Feb 16 19:03:17 <mail.info> mx2 dovecot: dsync-local(test): copy from INBOX: box=INBOX, uid=372, msgid=<3Z7fMh1PYMz5Ng at test.mx1.invalid>, size=544, from=root at mx1.invalid (admin), flags=()

>> 2) Mailbomb approach: run one script at one server only, and all mails
>>   will become perfectly well synchronised.

Same results here.

>> 3) Modify both scripts to "( $INDEX % 1 == 0 )" to add a second waiting
>>   between every mail injection, and run them simultaneously at both
>>   servers, and you'll end up with significantly less duplicates and no
>>   more multiples.

Same results here.

Good: I cannot find any "Error:" entries in both logfiles any longer.

Regards,
Michael




More information about the dovecot mailing list