[Dovecot] force-resync fails to recover all messages in mdbox

Lauri Alanko la at iki.fi
Tue Jan 31 18:34:45 EET 2012


Quoting "Timo Sirainen" <tss at iki.fi>:
> Try instead with:
>
> doveadm fetch guid all | sort | uniq | wc
>
> When you removed indexes Dovecot no longer knew about copies of messages.

Well, well, well. This is interesting. Back with the indices
created by dsync:

$ doveadm fetch guid all | grep guid: | sort | uniq -c | sort -n | tail
      17 guid: 1b28b22d4b2ee2885b5b81221c41201d
      17 guid: 730c692395661dd62f82088804b85652
      17 guid: 865e1537fddba6698e010d0b9dbddd02
      17 guid: d271b6ba8af0e7fa39c16ea8ed13abcf
      17 guid: d2cd391e837cf51cc85991bde814dc54
      17 guid: ebce8373da6ffb134b58aca7906d61f1
      18 guid: 1222b6c222ecb53fdbbec407400cba36
      18 guid: 65695586efc69adc2d7294216ea88e55
      19 guid: 4288f61ebbdcd44870c670439a97693b
      20 guid: 080ec72aa49e2a01c8e249fe127605f6

This would explain why rebuilding the indices reduced the number
of messages. However, those guid assignments seem really weird,
because:

$ doveadm fetch hdr guid 080ec72aa49e2a01c8e249fe127605f6 | grep -i '^Message-ID: '
Message-ID: <4B1ACA53.7040503 at rkit.pp.ru>
Message-ID: <29bf512f0912051251u74d246afxafdfb9e5ea24342c at mail.gmail.com>
Message-ID: <5e0214850912051300r3ebd0e44n61a4d6e020c94f4c at mail.gmail.com>
Message-ID: <4B1ACD40.3040507 at btinternet.com>
Message-Id: <200912052220.00317.daniel.is.fischer at web.de>
Message-Id: <200912052225.28597.daniel.is.fischer at web.de>
Message-ID: <20091205212848.GA23711 at seas.upenn.edu>
Message-Id: <200912051336.13792.hgolden at socal.rr.com>
Message-Id: <200912052243.03144.daniel.is.fischer at web.de>
Message-Id: <0B59A706-8C41-47B9-A858-5ACE297581E1 at cs.uu.nl>
Message-ID: <20091205215707.GA6161 at protagoras.phil.berkeley.edu>
Message-ID: <471726.55822.qm at web113106.mail.gq1.yahoo.com>
Message-ID: <4B1AD7FB.8050704 at btinternet.com>
Message-ID: <5fdc56d70912051400h663a25a9w4f9b2e065a5b395e at mail.gmail.com>
Message-Id: <1B613EE3-B4F8-4F6E-8A36-74BACF0C86FC at yandex.ru>
Message-ID: <4B1ADA0E.5070207 at btinternet.com>
Message-Id: <36C40624-B050-4A8C-8CAF-F15D84467180 at phys.washington.edu>
Message-ID: <SNT119-W313697775F905AE968566CC6920 at phx.gbl>
Message-id: <alpine.DEB.2.00.0912052309170.31599 at anubis.informatik.uni-halle.de>
Message-ID: <29bf512f0912051423safd7842ka39c8b8b6dee1ac0 at mail.gmail.com>

So all these completely unrelated messages have somehow received
the same guid? And that guid is stored even in the storage files
themselves so they cannot be cleaned up even with force-resync?
Something is _seriously_ wrong.
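
To see how widespread this is, one could group Message-IDs by guid
over the whole mailbox instead of checking one guid at a time. Below
is a rough sketch in Python, not a tested tool. It assumes the text
comes from something like "doveadm fetch 'guid hdr' all", that each
record prints its "guid:" line before its headers, and that a folded
Message-ID value continues on an indented line. The function names
are made up for illustration.

```python
import re
from collections import defaultdict

# Line shapes as seen in the doveadm output quoted above.
GUID_RE = re.compile(r"^guid:\s*([0-9a-f]+)\s*$")
MSGID_RE = re.compile(r"^message-id:\s*(\S.*)?$", re.IGNORECASE)


def message_ids_by_guid(lines):
    """Map each guid to the set of Message-ID values seen under it.

    Assumption: a "guid:" line starts a record and its headers follow.
    """
    groups = defaultdict(set)
    current = None      # guid of the record being read
    pending = False     # saw "Message-ID:" with the value on the next line
    for raw in lines:
        # Drop line endings and any form feed used as a record separator.
        line = raw.rstrip("\r\n").lstrip("\x0c")
        m = GUID_RE.match(line)
        if m:
            current = m.group(1)
            pending = False
            continue
        if current is None:
            continue
        if pending:
            pending = False
            # A folded header value continues on an indented line.
            if line[:1] in (" ", "\t") and line.strip():
                groups[current].add(line.strip())
                continue
        m = MSGID_RE.match(line)
        if m:
            value = (m.group(1) or "").strip()
            if value:
                groups[current].add(value)
            else:
                pending = True
    return groups


def shared_guids(lines):
    """Return (count, guid) pairs for guids with several distinct Message-IDs."""
    groups = message_ids_by_guid(lines)
    return sorted((len(ids), g) for g, ids in groups.items() if len(ids) > 1)
```

Passing sys.stdin to shared_guids() while piping the doveadm output
into the script would list every guid that covers more than one
distinct Message-ID, which separates genuine copies of one message
from unrelated messages sharing a guid.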

The complexity and opaqueness of the mdbox format is worrisome.
It would ease my mind quite a bit if there were a simple tool
that would just dump out the plain message contents that are
stored inside the storage files, without involving any of
dovecot's index machinery. Then I would at least know that
whatever happens, as long as the storage files stay intact, I can
always migrate my mails into some other format.


Lauri



