[Dovecot] Problems with POP3 UIDL when migrating from MBOX to Maildir
Hello,

I am in the process of migrating a hosting setup from UW-IMAP to Dovecot. The protocols available to mail clients are IMAP and POP3 both before and after migration. I also wanted to change the mail storage format from MBOX to Maildir. However, at this point I hit a major snag with Dovecot and POP3 UIDL (unique identification listing for POP3 mailboxes).

Several of the customers connecting to the mailserver in question are using POP3 clients and leave their e-mails on the server. To avoid downloading all messages every time the clients connect they use the UIDL command in the POP3 specification to determine if any new messages have arrived since last time they checked. In order for this to work the unique identifier of each message must be static and can never change. If the identifiers change, every client will download all messages again, mixing old messages with new, in the belief that all messages are new since their identifiers changed.

The first step, migrating from UW-IMAP to Dovecot, did not pose any problems in this regard. Connecting to the POP3 service manually to show the UIDL output revealed:

With UW-IMAP:

uidl
+OK Unique-ID listing follows
1 43ef424e00000001
2 43ef424e00000002

With Dovecot:

uidl
+OK
1 43ef424e00000001
2 43ef424e00000002

However, after performing a migration to Maildir, storing all messages in ~/Maildir for the user in question, I received a different UIDL output:

uidl
+OK
1 43ef523100000001
2 43ef523100000002

I used the mb2md utility available from http://batleth.sapienti-sat.org/projects/mb2md/ to do the conversion. A diff between the new Maildir messages and the old mailbox reveals no other difference than the leading "From " line of each message missing from the Maildir messages.
----------- snip ------------------------------
--- /var/mail/testuser 2006-02-12 15:13:01.000000000 +0100
+++ maildir-concat.txt 2006-02-12 15:21:54.000000000 +0100
@@ -1,4 +1,3 @@
-From test@example.com Sun Feb 12 15:09:49 2006
 Return-Path: <test@example.com>
 X-Original-To: testuser
 Delivered-To: testuser@localhost.localdomain
@@ -16,7 +15,6 @@
 Testing 1
-From test@example.com Sun Feb 12 15:10:23 2006
 Return-Path: <test@example.com>
 X-Original-To: testuser
 Delivered-To: testuser@localhost.localdomain
----------- snip ------------------------------

When removing the ~/Maildir directory of the user to redo the conversion I discovered that Dovecot returned another set of UIDL output this time around:

uidl
+OK
1 43ef528300000001
2 43ef528300000002

This output is for the same messages as in the previous example, with preserved message filenames. Doing a diff between the old ~/Maildir and the new reveals the following differences:

----------- snip ------------------------------
$ diff -ur Maildir.example1 Maildir.example2
Binary files Maildir.example1/dovecot.index and Maildir.example2/dovecot.index differ
Binary files Maildir.example1/dovecot.index.cache and Maildir.example2/dovecot.index.cache differ
Binary files Maildir.example1/dovecot.index.log and Maildir.example2/dovecot.index.log differ
diff -ur Maildir.example1/dovecot-uidlist Maildir.example2/dovecot-uidlist
--- Maildir.example1/dovecot-uidlist 2006-02-12 16:20:17.000000000 +0100
+++ Maildir.example2/dovecot-uidlist 2006-02-12 16:21:39.000000000 +0100
@@ -1,3 +1,3 @@
-1 1139757617 3
+1 1139757699 3
 1 1139757559.000001.mbox:2,
 2 1139757559.000000.mbox:2,
----------- snip ------------------------------

Questions:
----------------
How does Dovecot generate the POP3 UIDL?

Is there any workaround that I could use to make Dovecot generate a consistent POP3 UIDL output, enabling a migration from MBOX to Maildir, or am I stuck with using MBOX?
My environment
-----------------------
OS: CentOS 4.2
Dovecot version: 1.0 beta3 (built locally to an RPM package, available on request)

I have tested with earlier versions of Dovecot, getting the same behaviour.

Relevant entries in /etc/dovecot.conf:

----------- snip ------------------------------
protocols = imap imaps pop3 pop3s
listen = *
disable_plaintext_auth = no
protocol pop3 {
  pop3_uidl_format = %08Xv%08Xu
  pop3_client_workarounds = outlook-no-nuls oe-ns-eoh
}
----------- snip ------------------------------

--
/Jonas Olsson
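To illustrate the re-download problem described in the post above, here is a minimal Python sketch of how a leave-on-server POP3 client might decide which messages are new. The function name `messages_to_download` and the in-memory `seen` set are assumptions made for illustration; real clients differ in how they store the identifiers they have already seen.

```python
def messages_to_download(server_uidls, seen_uidls):
    """Return the UIDLs the client has not seen before, in server order."""
    return [uidl for uidl in server_uidls if uidl not in seen_uidls]


# Identifiers the client recorded while the mailbox was still an mbox.
seen = {"43ef424e00000001", "43ef424e00000002"}

# Same identifiers after switching servers: nothing needs fetching.
print(messages_to_download(["43ef424e00000001", "43ef424e00000002"], seen))

# Identifiers after the Maildir conversion: both messages look new.
print(messages_to_download(["43ef523100000001", "43ef523100000002"], seen))
```

Because the comparison is a plain string match, any change in the first eight hex digits makes every message appear unseen, even though the message bodies are identical.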
On Sun, 2006-02-12 at 16:20 +0100, Jonas Olsson wrote:
Questions:
How does Dovecot generate the POP3 UIDL?
Using pop3_uidl_format. %v means UIDVALIDITY and %u means message UID. With maildir those are taken from dovecot-uidlist file.
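As a rough check of this explanation, the following Python sketch expands the `%08Xv%08Xu` pattern from the posted configuration and splits a UIDL back into its two parts. The helper names `format_uidl` and `split_uidl` are hypothetical, and the lowercase hex output is an assumption taken from the UIDL listings shown in the original post.

```python
from typing import Tuple


def format_uidl(uidvalidity: int, uid: int) -> str:
    """Expand %08Xv%08Xu: UIDVALIDITY and UID as zero-padded 8-digit hex."""
    # Lowercase is assumed here because the listings in the thread are lowercase.
    return f"{uidvalidity:08x}{uid:08x}"


def split_uidl(uidl: str) -> Tuple[int, int]:
    """Recover (uidvalidity, uid) from a 16-digit UIDL built by format_uidl."""
    if len(uidl) != 16:
        raise ValueError("expected 16 hex digits, got %r" % uidl)
    return int(uidl[:8], 16), int(uidl[8:], 16)


# Header values taken from the two dovecot-uidlist files in the original post.
print(format_uidl(1139757617, 1))
print(format_uidl(1139757699, 1))
print(split_uidl("43ef424e00000001"))
```

Feeding in the two UIDVALIDITY values from the dovecot-uidlist diff reproduces the `43ef5231...` and `43ef5283...` prefixes seen after each conversion, which suggests the UIDL changed only because a fresh UIDVALIDITY was generated each time.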
Is there any workaround that I could use to make Dovecot generate a consistent POP3 UIDL output, enabling a migration from MBOX to Maildir, or am I stuck with using MBOX?
You can convert your mails in a way that preserves the UIDVALIDITY and UIDs. In mboxes the UIDVALIDITY is stored in X-IMAPbase or X-IMAP header in the first mail in the mailbox. UIDs are stored in X-UID header for each mail.
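A minimal sketch of reading those headers with Python's standard `mailbox` module might look like the following. The function name `read_mbox_uids` is hypothetical. The code assumes the X-IMAPbase/X-IMAP value starts with two whitespace-separated decimal numbers; it returns the second one untouched, since whether a given server stores the last used UID or the next free one there is worth checking against a real mailbox.

```python
import mailbox


def read_mbox_uids(path):
    """Return (uidvalidity, second_field, x_uids) for the mbox at path.

    x_uids holds one entry per message: the X-UID value as an int,
    or None when the header is missing or not numeric.
    """
    uidvalidity = None
    second_field = None
    x_uids = []
    box = mailbox.mbox(path, create=False)
    try:
        for index, message in enumerate(box):
            if index == 0:
                # Only the first message carries the mailbox-wide header.
                base = message.get("X-IMAPbase") or message.get("X-IMAP")
                fields = base.split() if base else []
                if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
                    uidvalidity = int(fields[0])
                    second_field = int(fields[1])
            raw = message.get("X-UID")
            is_number = raw is not None and raw.strip().isdigit()
            x_uids.append(int(raw) if is_number else None)
    finally:
        box.close()
    return uidvalidity, second_field, x_uids
```

A mailbox without either header comes back with `uidvalidity` set to `None`, which makes such mailboxes easy to list before deciding how to handle them.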
So what you'd need to do is create some script to build the dovecot-uidlist file properly. As far as I know no-one's done this yet, but it shouldn't be too difficult.
dovecot-uidlist file format is:
<uidvalidity> <next-unused-uid>
<uid> <filename>
<next uid> <next filename>
..
You should also set next-unused-uid value properly. It's also in X-IMAPbase/X-IMAP header as the next number after UIDVALIDITY.
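The file could be generated along these lines. This is only a sketch: `render_uidlist` is a hypothetical helper, and the leading version field `1` on the header line is an assumption inferred from the dovecot-uidlist diff in the original post, which shows three header fields (`1 1139757617 3`) rather than the two listed above. The right header for a given Dovecot version should be checked against a file that Dovecot itself wrote.

```python
def render_uidlist(uidvalidity, next_uid, entries, version=1):
    """Build dovecot-uidlist text from (uid, filename) pairs.

    The version field is an assumption based on the diff in this thread.
    """
    ordered = sorted(entries, key=lambda entry: entry[0])
    lines = ["%d %d %d" % (version, uidvalidity, next_uid)]
    previous = 0
    for uid, filename in ordered:
        # UIDs must be unique, positive, and below the next unused UID.
        if uid <= previous or uid >= next_uid:
            raise ValueError("UID %d is out of order or out of range" % uid)
        lines.append("%d %s" % (uid, filename))
        previous = uid
    return "\n".join(lines) + "\n"


# Values copied from the first dovecot-uidlist shown in the original post.
print(render_uidlist(
    1139757617,
    3,
    [(1, "1139757559.000001.mbox:2,"), (2, "1139757559.000000.mbox:2,")],
), end="")
```

Writing the result to `dovecot-uidlist` before Dovecot first opens the Maildir would presumably be needed, since the diff suggests Dovecot creates its own file with a new UIDVALIDITY when none exists.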
On Sun, Feb 12, 2006 at 10:44:54PM +0200, Timo Sirainen wrote:
On Sun, 2006-02-12 at 16:20 +0100, Jonas Olsson wrote:
Questions:
How does Dovecot generate the POP3 UIDL?
Using pop3_uidl_format. %v means UIDVALIDITY and %u means message UID. With maildir those are taken from dovecot-uidlist file.
More generically there is also the pop3_reuse_xuidl option which I've used extensively in migration scripts from legacy POP3 servers; the scripts actually fetch from the legacy store using POP3 UIDL and just insert the header when spewing it to the dovecot store.
see http://wiki.dovecot.org/Migration#head-99ef10c69108e731db56d7b83b51a9fdb1152...
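The header-insertion step of such a script could be sketched as below. The helper name `add_x_uidl` is hypothetical, and the code assumes `raw_message` is a bare message as returned by a POP3 RETR, without an mbox "From " separator line. Fetching the identifier itself (for example with `poplib.POP3.uidl()` against the legacy server) is not shown.

```python
from email import message_from_bytes


def add_x_uidl(raw_message: bytes, uidl: str) -> bytes:
    """Prepend an X-UIDL header unless the message already has one."""
    if message_from_bytes(raw_message)["X-UIDL"] is not None:
        return raw_message
    # Match the line ending used by the first header line (CRLF or LF).
    first_end = raw_message.find(b"\n")
    uses_crlf = first_end > 0 and raw_message[first_end - 1:first_end] == b"\r"
    newline = b"\r\n" if uses_crlf else b"\n"
    return b"X-UIDL: " + uidl.encode("ascii") + newline + raw_message


original = b"Subject: hi\r\n\r\nTesting 1\r\n"
print(add_x_uidl(original, "43ef424e00000001").decode("ascii"))
```

Leaving an existing X-UIDL header untouched keeps the function safe to run twice over the same message.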
joshua
-- Josh "Koshua" Goodall "as modern as tomorrow afternoon" joshua@roughtrade.net - FW109
On Mon, Feb 13, 2006 at 10:55:22AM +1100, Joshua Goodall wrote:
How does Dovecot generate the POP3 UIDL?
Using pop3_uidl_format. %v means UIDVALIDITY and %u means message UID. With maildir those are taken from dovecot-uidlist file.
More generically there is also the pop3_reuse_xuidl option which I've used extensively in migration scripts from legacy POP3 servers; the scripts actually fetch from the legacy store using POP3 UIDL and just insert the header when spewing it to the dovecot store.
see http://wiki.dovecot.org/Migration#head-99ef10c69108e731db56d7b83b51a9fdb1152...
the only side effect of using this feature is that you need to leave it turned on, even after users have POP'd into their mailbox because if you turn it off and are using an X-UIDL format such as the default, Dovecot knows that it can generate it on the fly and does so, even if the X-UIDL for a given message is present in the index.
the upshot of this is that every (new) message in a mailbox is read on each pop3 connection, looking for an X-UIDL header (that won't be there), resulting in higher disk IO than necessary.
I need to hack on the code to always use the X-UIDL from the index if present, then turn off pop3_reuse_xuidl and eliminate a bunch of disk IO.
grant.
On Mon, Feb 13, 2006 at 11:22:26AM +1100, grant beattie wrote:
the only side effect of using this feature is that you need to leave it turned on, even after users have POP'd into their mailbox because if you turn it off and are using an X-UIDL format such as the default, Dovecot knows that it can generate it on the fly and does so, even if the X-UIDL for a given message is present in the index.
the upshot of this is that every (new) message in a mailbox is read on each pop3 connection, looking for an X-UIDL header (that won't be there), resulting in higher disk IO than necessary.
I need to hack on the code to always use the X-UIDL from the index if present, then turn off pop3_reuse_xuidl and eliminate a bunch of disk IO.
That optimisation may be unnecessary: the common case is going to be that new messages will be read, indexed, downloaded in one go. It'll all be in buffer cache for that time so I wouldn't expect any additional I/O on a sane OS; just more syscalls by Dovecot to get the data.
In particular, no extra disk seek (which is what kills us). Although I confess I'm thinking mostly of maildirs here.
/k
-- Josh "Koshua" Goodall "as modern as tomorrow afternoon" joshua@roughtrade.net - FW109
2006/2/12, Timo Sirainen <tss@iki.fi>:
On Sun, 2006-02-12 at 16:20 +0100, Jonas Olsson wrote:
Questions:
How does Dovecot generate the POP3 UIDL?
Using pop3_uidl_format. %v means UIDVALIDITY and %u means message UID. With maildir those are taken from dovecot-uidlist file.
Is there any workaround that I could use to make Dovecot generate a consistent POP3 UIDL output, enabling a migration from MBOX to Maildir, or am I stuck with using MBOX?
You can convert your mails in a way that preserves the UIDVALIDITY and UIDs. In mboxes the UIDVALIDITY is stored in X-IMAPbase or X-IMAP header in the first mail in the mailbox. UIDs are stored in X-UID header for each mail.
So what you'd need to do is create some script to build the dovecot-uidlist file properly. As far as I know no-one's done this yet, but it shouldn't be too difficult.
dovecot-uidlist file format is:
<uidvalidity> <next-unused-uid>
<uid> <filename>
<next uid> <next filename>
..
You should also set next-unused-uid value properly. It's also in X-IMAPbase/X-IMAP header as the next number after UIDVALIDITY.
A quick look through the current mailboxes reveals that 28 of them contain messages but lack an X-IMAPbase/X-IMAP header. A large majority of the mailboxes also lack any X-UID headers.
I can get the UIDVALIDITY base of the mailboxes missing an X-IMAP header by connecting to the Dovecot POP3 service, I guess, but how about all the mailboxes lacking X-UID headers? Is there a way I can determine, in a scriptable way, which message has which UID or should I assume that the UIDs are assigned in chronological order based on the next-UID value in the X-IMAP header? That is, the last message in the MBOX file has UID = next-UID - 1, the next to last = next-UID - 2, and so on?
Thank you for your replies by the way. They are very helpful in making this migration happen.
-- /Jonas Olsson
On Mon, 2006-02-13 at 09:02 +0100, Jonas Olsson wrote:
You should also set next-unused-uid value properly. It's also in X-IMAPbase/X-IMAP header as the next number after UIDVALIDITY.
A quick look through the current mailboxes reveals that 28 of them contain messages but lack an X-IMAPbase/X-IMAP header. A large majority of the mailboxes also lack any X-UID headers.
If they don't have the headers, then it most likely means that the POP3 client has downloaded and deleted all the mails, and the mails without X-UID headers are new ones that the client has never seen yet.
I can get the UIDVALIDITY base of the mailboxes missing an X-IMAP header by connecting to the Dovecot POP3 service, I guess, but how about all the mailboxes lacking X-UID headers? Is there a way I can determine, in a scriptable way, which message has which UID or should I assume that the UIDs are assigned in chronological order based on the next-UID value in the X-IMAP header? That is, the last message in the MBOX file has UID = next-UID - 1, the next to last = next-UID - 2, and so on?
No. If a message doesn't have an X-UID header, then it should be given next-uid as its UID. The next message should then be given next-uid+1, etc. And this should be done even if the message already contains an X-UID header, since it might have been sent remotely and it would be broken in that case.
But anyway, those messages without X-UID shouldn't have been seen by POP3 clients yet, so you wouldn't necessarily have to put them into dovecot-uidlist at all, just let Dovecot give them new UIDs.
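One way to read the rule above, sketched in Python: existing X-UID values are kept until the first message without one, and from that point on every message gets a fresh UID counting up from next-uid. The function name `assign_uids` is hypothetical, and the extra sanity checks (a kept X-UID must be below next-uid and larger than the previous UID) are assumptions added for safety, not something stated in the thread.

```python
def assign_uids(x_uids, next_uid):
    """Return (uids, new_next_uid) for messages in mbox order.

    x_uids holds each message's X-UID as an int, or None if missing.
    """
    assigned = []
    trusting = True  # stays True until the first message lacking a usable X-UID
    for x_uid in x_uids:
        usable = (
            trusting
            and x_uid is not None
            and x_uid < next_uid
            and (not assigned or x_uid > assigned[-1])
        )
        if usable:
            assigned.append(x_uid)
        else:
            # From here on, any X-UID header present is ignored.
            trusting = False
            assigned.append(next_uid)
            next_uid += 1
    return assigned, next_uid


print(assign_uids([1, 2, None, 7], 3))
```

As the reply notes, the freshly numbered messages have not been seen by POP3 clients, so a conversion script could equally leave them out of dovecot-uidlist and let Dovecot number them.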
participants (4)
- grant beattie
- Jonas Olsson
- Joshua Goodall
- Timo Sirainen