[Dovecot] Problems with POP3 UIDL when migrating from MBOX to Maildir
Hello,
I am in the process of migrating a hosting setup from UW-IMAP to
Dovecot. The protocols available to mail clients are IMAP and POP3
both before and after migration. I also wanted to change the mail
storage format from MBOX to Maildir. However, at this point I hit a
major snag with Dovecot and POP3 UIDL (unique identification listing
for POP3 mailboxes).
Several of the customers connecting to the mailserver in question are
using POP3 clients and leave their e-mails on the server. To avoid
downloading all messages every time the clients connect they use the
UIDL command in the POP3 specification to determine if any new
messages have arrived since last time they checked.
In order for this to work the unique identifier of each message must
be static and can never change. If the identifiers change, every
client will download all messages again, mixing old messages with new,
in the belief that all messages are new since their identifiers
changed.
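The client-side bookkeeping described above can be sketched roughly as follows. This is only an illustration in Python, not taken from any particular mail client; the function name new_messages is hypothetical and the identifiers are sample values in the style of a UW-IMAP listing.

```python
def new_messages(seen_uidls, listing):
    """Return the message numbers whose unique-id the client has not seen yet."""
    return [num for num, uidl in listing if uidl not in seen_uidls]


# Unique-ids the client remembered from its previous session.
seen = {"43ef424e00000001", "43ef424e00000002"}

# Same identifiers on the next visit: nothing needs to be downloaded.
unchanged = [(1, "43ef424e00000001"), (2, "43ef424e00000002")]

# Identifiers changed on the server: every message looks new to the client.
changed = [(1, "43ef523100000001"), (2, "43ef523100000002")]

print(new_messages(seen, unchanged))
print(new_messages(seen, changed))
```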
The first step, migrating from UW-IMAP to Dovecot, did not pose any
problems in this regard. Connecting to the POP3 service manually to
show the UIDL output revealed:
With UW-IMAP:
uidl
+OK Unique-ID listing follows
1 43ef424e00000001
2 43ef424e00000002
With Dovecot:
uidl
+OK
1 43ef424e00000001
2 43ef424e00000002
However, after performing a migration to Maildir, storing all messages
in ~/Maildir for the user in question, I received a different UIDL
output:
uidl
+OK
1 43ef523100000001
2 43ef523100000002
I used the mb2md utility available from
http://batleth.sapienti-sat.org/projects/mb2md/ to do the conversion.
A diff between the new Maildir messages and the old mailbox shows no
difference other than the leading "From " line of each message, which
is missing from the Maildir messages.
----------- snip ------------------------------
--- /var/mail/testuser 2006-02-12 15:13:01.000000000 +0100
+++ maildir-concat.txt 2006-02-12 15:21:54.000000000 +0100
@@ -1,4 +1,3 @@
-From test@example.com Sun Feb 12 15:09:49 2006
Return-Path:
On Sun, 2006-02-12 at 16:20 +0100, Jonas Olsson wrote:
Questions:
How does Dovecot generate the POP3 UIDL?
Using pop3_uidl_format. %v means UIDVALIDITY and %u means message UID. With maildir those are taken from dovecot-uidlist file.
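To illustrate how such a format expands: the identifiers in the listings above look like the UIDVALIDITY followed by the UID, each printed as eight hexadecimal digits. The Python sketch below reproduces that layout. The function name is hypothetical, and the assumption that this corresponds to a pop3_uidl_format of %08Xv%08Xu should be checked against the comments in your own dovecot.conf.

```python
def uw_style_uidl(uidvalidity, uid):
    """Format UIDVALIDITY and UID as two zero-padded 8-digit hex fields."""
    return f"{uidvalidity:08x}{uid:08x}"


# UIDVALIDITY of the original mbox, as seen in the first listings.
print(uw_style_uidl(0x43EF424E, 1))  # 43ef424e00000001

# After the conversion only the UIDVALIDITY part differs.
print(uw_style_uidl(0x43EF5231, 1))  # 43ef523100000001
```

Read this way, the UID half of each identifier survived the conversion and only the UIDVALIDITY half changed.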
Is there any workaround that I could use to make Dovecot generate a consistent POP3 UIDL output, enabling a migration from MBOX to Maildir, or am I stuck with using MBOX?
You can convert your mails in a way that preserves the UIDVALIDITY and UIDs. In mboxes the UIDVALIDITY is stored in X-IMAPbase or X-IMAP header in the first mail in the mailbox. UIDs are stored in X-UID header for each mail.
So what you'd need to do is create some script to build the dovecot-uidlist file properly. As far as I know no-one's done this yet, but it shouldn't be too difficult.
dovecot-uidlist file format is:
<uidvalidity> <next-unused-uid>
<uid> <filename>
<next uid> <next filename>
..
You should also set next-unused-uid value properly. It's also in X-IMAPbase/X-IMAP header as the next number after UIDVALIDITY.
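A script along those lines might start out like the sketch below. It is untested against real UW-IMAP mailboxes and makes several assumptions: the function names are hypothetical, the header is assumed to hold two whitespace-separated numbers, UW-IMAP's internal pseudo-message is not filtered out, and the output follows the layout described above, which should be compared with a dovecot-uidlist that your Dovecot version has written itself (it may expect extra fields).

```python
import mailbox


def read_mbox_uids(path):
    """Return (uidvalidity, second_number, uids) for the mbox at `path`.

    `second_number` is the number following UIDVALIDITY in the
    X-IMAPbase / X-IMAP header. `uids` holds one int per message in
    mailbox order, or None where the X-UID header is missing.
    """
    uidvalidity = second_number = None
    uids = []
    box = mailbox.mbox(path, create=False)
    try:
        for msg in box:
            base = msg.get("X-IMAPbase") or msg.get("X-IMAP")
            if base is not None and uidvalidity is None:
                fields = base.split()
                uidvalidity, second_number = int(fields[0]), int(fields[1])
            x_uid = msg.get("X-UID")
            uids.append(int(x_uid) if x_uid is not None else None)
    finally:
        box.close()
    return uidvalidity, second_number, uids


def uidlist_lines(uidvalidity, next_uid, entries):
    """Build the lines of a uidlist: one header line, then 'uid filename' pairs."""
    lines = [f"{uidvalidity} {next_uid}"]
    lines += [f"{uid} {filename}" for uid, filename in entries]
    return lines
```

The entries passed to uidlist_lines would pair each preserved UID with the Maildir filename that the conversion tool gave the same message, so the conversion has to keep track of that mapping.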
On Sun, Feb 12, 2006 at 10:44:54PM +0200, Timo Sirainen wrote:
On Sun, 2006-02-12 at 16:20 +0100, Jonas Olsson wrote:
Questions:
How does Dovecot generate the POP3 UIDL?
Using pop3_uidl_format. %v means UIDVALIDITY and %u means message UID. With maildir those are taken from dovecot-uidlist file.
More generically there is also the pop3_reuse_xuidl option which I've used extensively in migration scripts from legacy POP3 servers; the scripts actually fetch from the legacy store using POP3 UIDL and just insert the header when spewing it to the dovecot store.
see http://wiki.dovecot.org/Migration#head-99ef10c69108e731db56d7b83b51a9fdb1152...
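For readers who want a starting point, the approach can be sketched in Python roughly as below. This is not the script referred to above; the function names are hypothetical, error handling is omitted, and it assumes the legacy server speaks plain POP3 with USER/PASS login. Each message is fetched together with its legacy unique-id, and an X-UIDL header is put in front of it before it is written to the new store.

```python
import poplib


def add_xuidl(lines, uidl):
    """Join message lines and prepend an X-UIDL header with the legacy id."""
    header = b"X-UIDL: " + uidl.encode("ascii")
    return b"\n".join([header] + list(lines)) + b"\n"


def fetch_with_xuidl(host, user, password):
    """Yield (uidl, message_bytes) for each message on the legacy server."""
    conn = poplib.POP3(host)
    try:
        conn.user(user)
        conn.pass_(password)
        _, listing, _ = conn.uidl()  # entries look like b"1 43ef424e00000001"
        for entry in listing:
            num, uidl = entry.decode("ascii").split(" ", 1)
            _, lines, _ = conn.retr(int(num))
            yield uidl, add_xuidl(lines, uidl)
    finally:
        conn.quit()
```

Each yielded message would then be delivered into the user's Dovecot mailbox, with pop3_reuse_xuidl enabled so that the stored header is used.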
joshua
-- Josh "Koshua" Goodall "as modern as tomorrow afternoon" joshua@roughtrade.net - FW109
On Mon, Feb 13, 2006 at 10:55:22AM +1100, Joshua Goodall wrote:
More generically there is also the pop3_reuse_xuidl option which I've used extensively in migration scripts from legacy POP3 servers; the scripts actually fetch from the legacy store using POP3 UIDL and just insert the header when spewing it to the dovecot store.
see http://wiki.dovecot.org/Migration#head-99ef10c69108e731db56d7b83b51a9fdb1152...
the only side effect of using this feature is that you need to leave it turned on, even after users have POP'd into their mailbox because if you turn it off and are using an X-UIDL format such as the default, Dovecot knows that it can generate it on the fly and does so, even if the X-UIDL for a given message is present in the index.
the upshot of this is that every (new) message in a mailbox is read on each pop3 connection, looking for an X-UIDL header (that won't be there), resulting in higher disk IO than necessary.
I need to hack on the code to always use the X-UIDL from the index if present, then turn off pop3_reuse_xuidl and eliminate a bunch of disk IO.
grant.
On Mon, Feb 13, 2006 at 11:22:26AM +1100, grant beattie wrote:
the only side effect of using this feature is that you need to leave it turned on, even after users have POP'd into their mailbox because if you turn it off and are using an X-UIDL format such as the default, Dovecot knows that it can generate it on the fly and does so, even if the X-UIDL for a given message is present in the index.
the upshot of this is that every (new) message in a mailbox is read on each pop3 connection, looking for an X-UIDL header (that won't be there), resulting in higher disk IO than necessary.
I need to hack on the code to always use the X-UIDL from the index if present, then turn off pop3_reuse_xuidl and eliminate a bunch of disk IO.
That optimisation may be unnecessary: the common case is going to be that new messages will be read, indexed, downloaded in one go. It'll all be in buffer cache for that time so I wouldn't expect any additional I/O on a sane OS; just more syscalls by Dovecot to get the data.
In particular, no extra disk seek (which is what kills us). Although I confess I'm thinking mostly of maildirs here.
/k
2006/2/12, Timo Sirainen tss@iki.fi:
You should also set next-unused-uid value properly. It's also in X-IMAPbase/X-IMAP header as the next number after UIDVALIDITY.
A quick look through the current mailboxes reveals that 28 of them contain messages but lack an X-IMAPbase/X-IMAP header. A large majority of the mailboxes also lack any X-UID headers.
I can get the UIDVALIDITY base of the mailboxes missing an X-IMAP header by connecting to the Dovecot POP3 service, I guess, but how about all the mailboxes lacking X-UID headers? Is there a way I can determine, in a scriptable way, which message has which UID or should I assume that the UIDs are assigned in chronological order based on the next-UID value in the X-IMAP header? That is, the last message in the MBOX file has UID = next-UID - 1, the next to last = next-UID - 2, and so on?
Thank you for your replies by the way. They are very helpful in making this migration happen.
-- /Jonas Olsson
On Mon, 2006-02-13 at 09:02 +0100, Jonas Olsson wrote:
You should also set next-unused-uid value properly. It's also in X-IMAPbase/X-IMAP header as the next number after UIDVALIDITY.
A quick look through the current mailboxes reveals that 28 of them contain messages but lack an X-IMAPbase/X-IMAP header. A large majority of the mailboxes also lack any X-UID headers.
If they don't have the headers, then it most likely means that POP3 client has downloaded and deleted all the mails, and the mails without X-UID headers are new ones that the client has never seen yet.
I can get the UIDVALIDITY base of the mailboxes missing an X-IMAP header by connecting to the Dovecot POP3 service, I guess, but how about all the mailboxes lacking X-UID headers? Is there a way I can determine, in a scriptable way, which message has which UID or should I assume that the UIDs are assigned in chronological order based on the next-UID value in the X-IMAP header? That is, the last message in the MBOX file has UID = next-UID - 1, the next to last = next-UID - 2, and so on?
No. If a message doesn't have an X-UID header, it should be given the next-uid value as its UID. The next message should then be given next-uid+1, and so on. This should be done even if that message already contains an X-UID header, since the header might have been sent remotely, in which case it would be broken.
But anyway, those messages without X-UID shouldn't have been seen by POP3 clients yet, so you wouldn't necessarily have to put them into dovecot-uidlist at all, just let Dovecot give them new UIDs.
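The assignment rule can be sketched in Python as below. This reflects one reading of the explanation above, namely that X-UID headers are trusted only up to the first message that lacks one, and that every message from that point on gets a fresh UID. The function name is hypothetical, and no further sanity checks (for example that the trusted UIDs are increasing) are made.

```python
def assign_uids(x_uids, next_uid):
    """Return (uids, new_next_uid) for the messages of one mailbox.

    `x_uids` lists the X-UID value of each message in mailbox order,
    with None where the header is missing.
    """
    assigned = []
    trusted = True
    for x_uid in x_uids:
        if trusted and x_uid is not None:
            # Still in the leading run of messages with X-UID headers.
            assigned.append(x_uid)
        else:
            # From the first gap onwards, ignore any X-UID header.
            trusted = False
            assigned.append(next_uid)
            next_uid += 1
    return assigned, next_uid


# The fourth message carries an X-UID of 7, but it follows a message
# without one, so it is renumbered along with it.
print(assign_uids([1, 2, None, 7], 3))  # ([1, 2, 3, 4], 5)
```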
participants (4)
- grant beattie
- Jonas Olsson
- Joshua Goodall
- Timo Sirainen