[Dovecot] force-resync fails to recover all messages in mdbox

Timo Sirainen tss at iki.fi
Thu Feb 9 03:04:04 EET 2012

On 31.1.2012, at 18.34, Lauri Alanko wrote:

> Well, well, well. This is interesting. Back with the indices
> created by dsync:
> $ doveadm fetch guid all | grep guid: | sort | uniq -c | sort -n | tail
>     17 guid: 1b28b22d4b2ee2885b5b81221c41201d
>     17 guid: 730c692395661dd62f82088804b85652
>     17 guid: 865e1537fddba6698e010d0b9dbddd02

http://hg.dovecot.org/dovecot-2.0/rev/4a0b7dec3a22 stops force-resync from deleting these duplicates. It also logs a warning about them.
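
To check a mailbox for such duplicates by hand, the pipeline quoted above can be wrapped in a small shell function. This is only a sketch: the function name dup_guids is made up for illustration, and it assumes the fetch output contains one "guid: <value>" line per message, as in the quoted listing.

```shell
# Read "doveadm fetch guid all" style output on stdin and print
# "COUNT GUID" for every GUID seen more than once, highest count last.
# dup_guids is a hypothetical helper name, not part of Dovecot.
dup_guids() {
    awk '/guid:/ { print $NF }' |      # keep only the GUID value
        sort |
        uniq -c |                      # prefix each GUID with its count
        awk '$1 > 1 { print $1, $2 }' | # drop GUIDs that occur once
        sort -n
}
```

It would be used as "doveadm fetch guid all | dup_guids". Empty output means no GUID occurred more than once.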

http://hg.dovecot.org/dovecot-2.1/rev/2500de8f1f51 implements the mbox_md5=all setting, which avoids creating these duplicates in the first place. I also thought about adding duplicate detection to dsync (or somewhere in its path), but I couldn't do it without hurting performance in normal operation.

> The complexity and opaqueness of the mdbox format is worrisome.
> It would ease my mind quite a bit if there were a simple tool
> that would just dump out the plain message contents that are
> stored inside the storage files, without involving any of
> dovecot's index machinery. Then I would at least know that
> whatever happens, as long as the storage files stay intact, I can
> always migrate my mails into some other format.

Going through the Dovecot indexes, you could use e.g. "doveadm fetch" to dump the messages. "doveadm dump" can also dump the dbox files' metadata, but not the message contents themselves. It probably wouldn't be difficult to implement that, though. Alternatively, you could build something based on http://dovecot.org/tools/mdbox-obfuscate.pl

