[Dovecot] Small change to make dovecot pop3 uw-imap migration friendly
Hi,
today I've finished the migration from the uw-imap daemons to shiny and fast dovecot. Thanks to dovecot, my mail server's load average dropped by a factor of ten, even with ancient unix mailboxes. ;-)
So, the only thing I've discovered is that the POP3 UIDLs are different from the ones used by uw-imap. Luckily, the difference is only in the format string used in the UIDL response. So, I've made the following change in the code
--- dovecot-0.99.10.4.orig/src/pop3/commands.c 2003-05-28 15:17:15.000000000 +0400
+++ dovecot-0.99.10.4/src/pop3/commands.c 2004-05-27 14:06:48.000000000 +0400
@@ -374,7 +374,7 @@
 	while ((mail = client->mailbox->fetch_next(ctx)) != NULL) {
 		client_send_line(client, message == 0 ?
-				 "%u %u.%u" : "+OK %u %u.%u",
+				 "%u %08x%08x" : "+OK %u %08x%08x",
 				 mail->seq, client->uidvalidity, mail->uid);
 		found = TRUE;
 	}
and got everything just like in the old uw-imap pop3.
I'm currently running RH9, but thinking of FC2, which uses dovecot. So, I've rebuilt the FC2 rpm for RH9 including this patch. POP3 users who do not remove their mail from the server are really happy: they do not receive old mail twice.
Probably this small change could be applied to 1.0, or used as a config option. Any suggestions are greatly appreciated.
Thanks, Oleg.
On 27.5.2004, at 16:29, Oleg I. Vdovikin wrote:
Hi, today I've finished the migration from the uw-imap daemons to shiny and fast dovecot. Thanks to dovecot, my mail server's load average dropped by a factor of ten, even with ancient unix mailboxes. ;-)
Get 0.99.10.5, it fixes possible mbox corruption.
So, the only thing I've discovered is that the POP3 UIDLs are different from the ones used by uw-imap. Luckily, the difference is only in the format string used in the UIDL response. So, I've made the following change in the code
Actually I just wrote a similar patch a few days ago .. :) http://dovecot.org/patches/pop3-uidl-uwimap.patch It's against .10.5, which changed a bit.
Probably this small change could be applied to 1.0, or used as a config option.
I was thinking about making this fully configurable, i.e. a config option which would accept a printf-like string: "%v.%u", "%08xv%08xu" or something. Or maybe it's not worth it; I'd have to look at other POP3 servers to see what different kinds of UIDs they use.
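Editor's sketch of the configurable idea, offered only as an illustration: the template syntax below is a guess built from the two examples quoted above ("%v.%u" and "%08xv%08xu"), and `uidl_expand` is a hypothetical name, not a function from the Dovecot source. It assumes `v` stands for UIDVALIDITY and `u` for the UID, with an optional zero flag, width and `x` for hex in front of the variable letter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Expand a UIDL template into `out`.
 * Hypothetical syntax, modelled on the examples in this thread:
 *   '%' ['0'] [width] ['x'] ('v' | 'u')
 * where v is UIDVALIDITY, u is the UID and 'x' selects lowercase hex.
 * Returns the length written, or -1 on a bad template or a short buffer.
 */
int uidl_expand(char *out, size_t size, const char *fmt,
		unsigned int uidvalidity, unsigned int uid)
{
	size_t used = 0;

	assert(out != NULL && fmt != NULL);
	if (size == 0)
		return -1;

	while (*fmt != '\0') {
		int zero = 0, width = 0, hex = 0, n;
		unsigned int value;

		if (*fmt != '%') {
			/* Copy the literal run up to the next '%'. */
			size_t run = strcspn(fmt, "%");

			if (run >= size - used)
				return -1;
			memcpy(out + used, fmt, run);
			used += run;
			fmt += run;
			continue;
		}
		fmt++;
		if (*fmt == '0') {
			zero = 1;
			fmt++;
		}
		while (isdigit((unsigned char)*fmt)) {
			width = width * 10 + (*fmt++ - '0');
			if (width > 32)
				return -1;
		}
		if (*fmt == 'x') {
			hex = 1;
			fmt++;
		}
		if (*fmt == 'v')
			value = uidvalidity;
		else if (*fmt == 'u')
			value = uid;
		else
			return -1; /* unknown variable, including a trailing '%' */
		fmt++;

		if (hex)
			n = snprintf(out + used, size - used,
				     zero ? "%0*x" : "%*x", width, value);
		else
			n = snprintf(out + used, size - used,
				     zero ? "%0*u" : "%*u", width, value);
		if (n < 0 || (size_t)n >= size - used)
			return -1;
		used += (size_t)n;
	}
	out[used] = '\0';
	return (int)used;
}
```

With UIDVALIDITY 1000 and UID 42, the template "%v.%u" expands to "1000.42" and "%08xv%08xu" expands to "000003e80000002a", which matches the two formats in Oleg's patch.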
On 27.5.2004, at 16:29, Oleg I. Vdovikin wrote:
Hi, today I've finished the migration from the uw-imap daemons to shiny and fast dovecot. Thanks to dovecot, my mail server's load average dropped by a factor of ten, even with ancient unix mailboxes. ;-)
Get 0.99.10.5, it fixes possible mbox corruption.
Thanks, will do.
So, the only thing I've discovered is that the POP3 UIDLs are different from the ones used by uw-imap. Luckily, the difference is only in the format string used in the UIDL response. So, I've made the following change in the code
Actually I just wrote a similar patch a few days ago .. :) http://dovecot.org/patches/pop3-uidl-uwimap.patch It's against .10.5, which changed a bit.
Perfect. ;-)
Probably this small change could be applied to 1.0, or used as a config option.
I was thinking about making this fully configurable, i.e. a config option which would accept a printf-like string: "%v.%u", "%08xv%08xu" or something. Or maybe it's not worth it; I'd have to look at other POP3 servers to see what different kinds of UIDs they use.
Yes, this sounds reasonable. Probably you will need to add more format specifiers to catch everything. ;-)
Thanks, Oleg.
On Thu, May 27, 2004 at 05:57:40PM +0300, Timo Sirainen wrote:
Probably this small change could be applied to the 1.0, or used as a config option.
I was thinking about making this fully configurable, ie. a config option which would accept printf-like string. "%v.%u" "%08xv%08xu" or something. Or maybe it's not worth it, I'd have to look at other POP3 servers to see what different kinds of UIDs they use..
Suggestion: if you have any way to maintain per-message state information, keep the UID in there. You can prime it using an algorithm of your choice. Ideally you'd allow its value to be taken from a header (e.g. X-POP3-UIDL) if present.
The reason for this: when migrating a mailbox from any other POP3 server, you can take whatever UIDL the old POP3 server gives and attach it to the message in Dovecot.
courier-imap *almost* allows me to do this. It keeps a state file (courierpop3dsizelist) containing one line per message, with the exact message size as per RFC1939 and a sequence number. The UIDL entry is then constructed as UID${uidvalidity}.${sequence}
What I'd prefer is that instead of a sequence number, the exact UIDL string is stored there. Then I could migrate from *any* POP3 server and preserve UIDLs.
(I'm keeping an eye on Dovecot... courier-imap has served me extremely well in production environments and under heavy load, but from what I've seen on this list, Dovecot is very good at producing diagnostic error messages when things go wrong, which courier is dreadful at... I can't deploy Dovecot until it supports Maildir++ quotas though)
Regards,
Brian.
Timo Sirainen <tss@iki.fi> writes:
On 27.5.2004, at 16:29, Oleg I. Vdovikin wrote:
So, the only thing I've discovered is that the POP3 UIDLs are different from the ones used by uw-imap. Luckily, the difference is only in the format string used in the UIDL response. So, I've made the following change in the code
Actually I just wrote a similar patch a few days ago .. :) http://dovecot.org/patches/pop3-uidl-uwimap.patch It's against .10.5, which changed a bit.
[...]
I was thinking about making this fully configurable, ie. a config option which would accept printf-like string. "%v.%u" "%08xv%08xu" or something. Or maybe it's not worth it, I'd have to look at other POP3 servers to see what different kinds of UIDs they use..
Question: will this apply to all UIDs or just to those for new mail? I can't tell from the code fragment.
There are two requirements:
- major: a UID that any client may have seen _MUST NOT_ change.
This means that a formatting string MUST ONLY have an impact on newly arriving mail.
- minor: those who migrate from UWImap would like to use UWImap UIDs for existing mail.
This could be achieved by a little program that runs once per mailbox after the initial switch.
And note that the client must not care about the UIDL format as long as it's legal, so anything beyond "importing" the UWIMAP UIDL is bogus.
-- Matthias Andree
Encrypted mail welcome: my GnuPG key ID is 0x052E7D95
----- Original Message -----
From: "Matthias Andree" <matthias.andree@gmx.de>
To: <dovecot@dovecot.org>
Sent: Friday, May 28, 2004 11:43 AM
Subject: [Dovecot] Re: Small change to make dovecot pop3 uw-imap migration friendly
Timo Sirainen <tss@iki.fi> writes:
On 27.5.2004, at 16:29, Oleg I. Vdovikin wrote:
So, the only thing I've discovered is that the POP3 UIDLs are different from the ones used by uw-imap. Luckily, the difference is only in the format string used in the UIDL response. So, I've made the following change in the code
Actually I just wrote a similar patch a few days ago .. :) http://dovecot.org/patches/pop3-uidl-uwimap.patch It's against .10.5, which changed a bit.
[...]
I was thinking about making this fully configurable, ie. a config option which would accept printf-like string. "%v.%u" "%08xv%08xu" or something. Or maybe it's not worth it, I'd have to look at other POP3 servers to see what different kinds of UIDs they use..
Question: will this apply to all UIDs or just to those for new mail? I can't tell from the code fragment.
In my particular environment this applied to all messages. Taking into account that the old messages were served by uw-pop3, they will not be refetched by POP3 clients after the upgrade to dovecot.
There are two requirements:
- major: a UID that any client may have seen _MUST NOT_ change.
Right.
This means that a formatting string MUST ONLY have an impact on newly arriving mail.
No, there is no need to store the exact formatting string with each message.
- minor: those who migrate from UWImap would like to use UWImap UIDs for existing mail.
This could be achieved by a little program that runs once per mailbox after the initial switch.
There is no need for this. The only thing needed is the ability to specify the format string in the config file and use it forever. So, for an existing dovecot configuration (the default) it would be
pop3uidl = "%u.%u"
while for uw-imap migrated environment it should be changed to
pop3uidl = "%08x%08x"
And everyone will be happy. No UIDL value changes.
And note that the client must not care about the UIDL format as long as it's legal, so anything beyond "importing" the UWIMAP UIDL is bogus.
There is no dependency on the UIDL format, but there is a dependency on its content for the messages received through the old pop3 daemon.
Regards, Oleg.
On Fri, May 28, 2004 at 03:55:03PM +0400, Oleg I. Vdovikin wrote:
- major: a UID that any client may have seen _MUST NOT_ change.
Right.
This means that a formatting string MUST ONLY have an impact on newly arriving mail.
No, there is no need to store the exact formatting string with each message.
He did not say that the formatting string is stored with each message; but there is a requirement that the *UID string* is stored with each message.
i.e. changing the format string does not change the UIDs of existing messages; or put another way, the format string is only applied the first time that a new message is seen.
Given that, you could apply *any* UID to messages when you import them, and the format string will only affect subsequently-delivered messages.
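Editor's sketch of the "stored UID string" model Brian describes, under stated assumptions: `struct msg_state`, `msg_import_uidl` and `msg_uidl` are hypothetical names, and the record layout is invented for illustration rather than taken from Dovecot's index files. The point it shows is that the format is consulted only while a message has no stored UIDL, so imported or previously generated strings never change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UIDL_MAX 70 /* RFC 1939 limits a unique-id to 70 characters */

/* Hypothetical per-message record; not Dovecot's actual index format. */
struct msg_state {
	unsigned int uid;
	char uidl[UIDL_MAX + 1]; /* empty string = no UIDL assigned yet */
};

/*
 * Attach a UIDL taken from the old server.
 * Returns 0 on success, -1 if the string is empty or too long.
 */
int msg_import_uidl(struct msg_state *msg, const char *old_uidl)
{
	size_t len;

	assert(msg != NULL && old_uidl != NULL);
	len = strlen(old_uidl);
	if (len == 0 || len > UIDL_MAX)
		return -1;
	memcpy(msg->uidl, old_uidl, len + 1);
	return 0;
}

/*
 * Return the message's UIDL. The format (hex or decimal) is applied
 * only the first time the message is seen; afterwards the stored
 * string wins, whatever format is requested.
 */
const char *msg_uidl(struct msg_state *msg, unsigned int uidvalidity, int hex)
{
	assert(msg != NULL);
	if (msg->uidl[0] == '\0')
		snprintf(msg->uidl, sizeof(msg->uidl),
			 hex ? "%08x%08x" : "%u.%u", uidvalidity, msg->uid);
	return msg->uidl;
}
```

In this model a later switch from decimal to hex affects only messages that have not yet been listed, which is the behaviour the "major" requirement asks for.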
----- Original Message -----
From: "Brian Candler" <B.Candler@pobox.com>
To: "Oleg I. Vdovikin" <oleg@cs.msu.su>
Cc: <dovecot@dovecot.org>
Sent: Friday, May 28, 2004 4:06 PM
Subject: Re: [Dovecot] Re: Small change to make dovecot pop3 uw-imap migration friendly
On Fri, May 28, 2004 at 03:55:03PM +0400, Oleg I. Vdovikin wrote:
- major: a UID that any client may have seen _MUST NOT_ change.
Right.
This means that a formatting string MUST ONLY have an impact on newly arriving mail.
No, there is no need to store the exact formatting string with each message.
He did not say that the formatting string is stored with each message; but there is a requirement that the *UID string* is stored with each message.
I meant the UIDL string, not the formatting string, sorry. But this does not matter.
i.e. changing the format string does not change the UIDs of existing messages; or put another way, the format string is only applied the first time that a new message is seen.
An IMAP UID is just a number, and its appearance is fixed. It's not a UIDL response. Right?
Given that, you could apply *any* UID to messages when you import them, and the format string will only affect subsequently-delivered messages.
Just to clarify: there is no importing at all. Dovecot stores UID and UIDVALIDITY internally, and basically it's compatible with uw-imap. But the answer to the UIDL command via POP3 is different. So, I need to be able to specify a formatting string for the UIDL answer. This does not require any changes for old messages or for new ones. I just want it fixed at the %08x%08x format forever and do not want to switch to the %u.%u used by dovecot...
Regards, Oleg.
On Fri, May 28, 2004 at 04:32:22PM +0400, Oleg I. Vdovikin wrote:
i.e. changing the format string does not change the UIDs of existing messages; or put another way, the format string is only applied the first time that a new message is seen.
An IMAP UID is just a number, and its appearance is fixed. It's not a UIDL response. Right?
I'm sorry - given the discussion about its *format* being important, I was assuming we were talking about POP3 UIDL.
IMAP UIDs are *defined* to be monotonically increasing integers, and number 1234 is 1234. (RFC3501 is bad, but it doesn't say you can send them in hex :-)
Just to clarify: there is no importing at all. Dovecot stores UID and UIDVALIDITY internally, and basically it's compatible with uw-imap. But the answer to the UIDL command via POP3 is different. So, I need to be able to specify a formatting string for the UIDL answer.
Ah OK. But POP3 servers are not *required* to use any particular format for UIDL. So when importing messages from server X, the UID could be any arbitrary string. If you want to allow transparent migration of messages from server X to dovecot, then you need to import the UID string as-is.
Maybe some POP3 servers derive their UIDL from the IMAP UID, but not all do (courier-imap for one; also any standalone POP3 server which does not have an IMAP component)
This does not require any changes for old messages or for new ones. I just want it fixed at the %08x%08x format forever and do not want to switch to the %u.%u used by dovecot...
The other point is that if you have a global default like this, and one day you decide to change from one format to the other, then all POP3 clients will suddenly believe that all the stored mail is new and will download it again.
Regards,
Brian.
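Editor's sketch of why a change of global format triggers a full re-download, assuming the common client behaviour of remembering every UIDL it has fetched and treating any unknown UIDL as new mail. `count_new` is a hypothetical helper written for this illustration; real clients differ in how they store and compare the list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the server UIDLs that do not appear in the client's remembered
 * list. A leave-mail-on-server client would download each of these as
 * a "new" message.
 */
size_t count_new(const char *const *server, size_t n_server,
		 const char *const *seen, size_t n_seen)
{
	size_t i, j, fresh = 0;

	assert((server != NULL || n_server == 0) &&
	       (seen != NULL || n_seen == 0));
	for (i = 0; i < n_server; i++) {
		for (j = 0; j < n_seen; j++) {
			if (strcmp(server[i], seen[j]) == 0)
				break;
		}
		if (j == n_seen)
			fresh++; /* never seen under this exact string */
	}
	return fresh;
}
```

If the client remembers "000003e800000001" and "000003e800000002" and the server now reports "1000.1" and "1000.2" for the same two messages, both count as new even though UIDVALIDITY and UID are unchanged.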
----- Original Message -----
From: "Brian Candler" <B.Candler@pobox.com>
To: "Oleg I. Vdovikin" <oleg@cs.msu.su>
Cc: <dovecot@dovecot.org>
Sent: Friday, May 28, 2004 5:14 PM
Subject: Re: [Dovecot] Re: Small change to make dovecot pop3 uw-imap migration friendly
Maybe some POP3 servers derive their UIDL from the IMAP UID, but not all do (courier-imap for one; also any standalone POP3 server which does not have an IMAP component)
Yes, that's the case. And luckily dovecot and uw-imap use the same approach. ;-) And the RedHat guys are pushing dovecot instead of uw-imap in Fedora Core 2. Many users will "migrate" this way.
This does not require any changes for old messages or for new ones. I just want it fixed at the %08x%08x format forever and do not want to switch to the %u.%u used by dovecot...
The other point is that if you have a global default like this, and one day you decide to change from one format to the other, then all POP3 clients will suddenly believe that all the stored mail is new and will download it again.
Yes, but I need at least transparent migration from uw-imap to dovecot. ;-)
Regards,
Brian.
Regards, Oleg.
On Fri, May 28, 2004 at 05:30:58PM +0400, Oleg I. Vdovikin wrote:
The other point is that if you have a global default like this, and one day you decide to change from one format to the other, then all POP3 clients will suddenly believe that all the stored mail is new and will download it again.
Yes, but I need at least transparent migration from uw-imap to dovecot. ;-)
OK. I think what we're saying is that a format string for creating POP3 UIDLs from IMAP UIDs will solve that particular problem, but it doesn't solve the general problem of migrating from an arbitrary POP3 server to dovecot. I'm more interested in the latter; if you're an ISP and you buy out another ISP, you want to migrate their mailboxes transparently, but the other ISP could have been running any POP3 server at all.
Regards,
Brian.
"Oleg I. Vdovikin" <oleg@cs.msu.su> writes:
Yes, but I need at least transparent migration from uw-imap to dovecot.
We need transparent migration from any server, not just from UW Imap. It can be a Perl/Python/Whatever script - it only has to run once if Dovecot stores the UIDLs in a file.
-- Matthias Andree
On Fri, 2004-05-28 at 20:56, Matthias Andree wrote:
"Oleg I. Vdovikin" <oleg@cs.msu.su> writes:
Yes, but I need at least transparent migration from uw-imap to dovecot.
We need transparent migration from any server, not just from UW Imap. It can be a Perl/Python/Whatever script - it only has to run once if Dovecot stores the UIDLs in a file.
Would it be enough if Dovecot simply used an "X-POP3-UID" (or something) header which contains the UID from the old server? That would have to be generated using some script though.
If the header doesn't exist, Dovecot would use its internal method.
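Editor's sketch of the header-with-fallback idea, with several assumptions labelled: `pick_uidl` is a hypothetical function, the header name is the tentative "X-POP3-UID" from the message above, folded header lines are not handled, and the header name is matched case-sensitively to keep the example portable (a real implementation should compare it case-insensitively). The fallback uses the decimal "%u.%u" form from the patch. The value is accepted only if it is a legal RFC 1939 unique-id, i.e. 1 to 70 characters in the range 0x21 to 0x7E.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define UIDL_MAX 70 /* RFC 1939: a unique-id is 1 to 70 characters, 0x21 to 0x7E */

/* Return 1 if s[0..len) is a legal RFC 1939 unique-id. */
static int uidl_is_legal(const char *s, size_t len)
{
	size_t i;

	if (len == 0 || len > UIDL_MAX)
		return 0;
	for (i = 0; i < len; i++) {
		if (s[i] < 0x21 || s[i] > 0x7e)
			return 0;
	}
	return 1;
}

/*
 * Choose the UIDL for one message. `headers` is the raw message text.
 * Returns 1 if the value came from the header, 0 if it was generated.
 */
int pick_uidl(char *out, size_t size, const char *headers,
	      unsigned int uidvalidity, unsigned int uid)
{
	static const char name[] = "X-POP3-UID:"; /* tentative name from the thread */
	const size_t name_len = sizeof(name) - 1;
	const char *line = headers;

	assert(out != NULL && headers != NULL && size > UIDL_MAX);

	/* Walk the header lines; an empty line ends the header block. */
	while (*line != '\0' && *line != '\n' && *line != '\r') {
		const char *eol = strchr(line, '\n');
		size_t len = eol != NULL ? (size_t)(eol - line) : strlen(line);

		if (len > name_len && strncmp(line, name, name_len) == 0) {
			const char *val = line + name_len;
			size_t val_len = len - name_len;

			/* Trim leading blanks and trailing blanks or CR. */
			while (val_len > 0 && (*val == ' ' || *val == '\t')) {
				val++;
				val_len--;
			}
			while (val_len > 0 && (val[val_len - 1] == ' ' ||
					       val[val_len - 1] == '\t' ||
					       val[val_len - 1] == '\r'))
				val_len--;
			if (uidl_is_legal(val, val_len)) {
				memcpy(out, val, val_len);
				out[val_len] = '\0';
				return 1;
			}
		}
		if (eol == NULL)
			break;
		line = eol + 1;
	}

	/* No usable header: fall back to the server's own scheme. */
	snprintf(out, size, "%u.%u", uidvalidity, uid);
	return 0;
}
```

Note that this reads the message itself, so the result would still have to be cached somewhere to avoid opening every file on each UIDL command.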
On Sun, May 30, 2004 at 01:20:31AM +0300, Timo Sirainen wrote:
On Fri, 2004-05-28 at 20:56, Matthias Andree wrote:
"Oleg I. Vdovikin" <oleg@cs.msu.su> writes:
Yes, but I need at least transparent migration from uw-imap to dovecot.
We need transparent migration from any server, not just from UW Imap. It can be a Perl/Python/Whatever script - it only has to run once if Dovecot stores the UIDLs in a file.
Would it be enough if Dovecot simply used an "X-POP3-UID" (or something) header which contains the UID from the old server? That would have to be generated using some script though.
If the header doesn't exist, Dovecot would use its internal method.
As long as the X-POP3-UID header is cached somewhere, it would be fine.
The main purpose of UIDL is for when clients leave mail on the server; each time they connect they issue UIDL to check for new mail. So if they have 1000 messages held on the server, each time they log in you don't want to have to open and read each of those 1000 files.
Same applies for message sizes for 'LIST' (where the message size counts each newline as two bytes, \r\n). Some clients issue 'LIST' each time they connect, so you don't want to have to open and read each message from start to end just to count the number of newlines it contains.
Regards,
Brian.
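Editor's sketch of the size rule Brian mentions, assuming the message is stored with bare LF line endings (possibly mixed with CRLF). `pop3_size` is a hypothetical helper name. It has to scan the whole message, which is exactly why the result is worth caching rather than recomputing on every LIST.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Size of a message as reported by POP3 LIST, where every line ending
 * counts as two octets (CRLF) regardless of how it is stored on disk.
 */
size_t pop3_size(const char *data, size_t len)
{
	size_t size = len, i;

	assert(data != NULL || len == 0);
	for (i = 0; i < len; i++) {
		/* A bare LF goes out on the wire as CRLF: one extra octet. */
		if (data[i] == '\n' && (i == 0 || data[i - 1] != '\r'))
			size++;
	}
	return size;
}
```

For example, the 4-byte text "a\nb\n" is reported as 6 octets, while "a\r\nb\r\n" is already in network format and stays at 6.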
Brian Candler <B.Candler@pobox.com> writes:
The main purpose of UIDL is for when clients leave mail on the server; each time they connect they issue UIDL to check for new mail. So if they have 1000 messages held on the server, each time they log in you don't want to have to open and read each of those 1000 files.
And clients will also assume that you don't insert messages into the middle of the mailbox, but append them at the end of the list, and that message-number <-> UIDL assignments remain in the known order (obviously, DELE from the middle will change the assignment overall, but if you had
no uid
1 abc
2 def
3 ghi
dele 2 should keep 1 abc and 2 ghi).
-- Matthias Andree
On Sun, May 30, 2004 at 09:55:39PM +0200, Matthias Andree wrote:
The main purpose of UIDL is for when clients leave mail on the server; each time they connect they issue UIDL to check for new mail. So if they have 1000 messages held on the server, each time they log in you don't want to have to open and read each of those 1000 files.
And clients will also assume that you don't insert messages into the middle of the mailbox, but append them at the end of the list, and that message-number <-> UIDL assignments remain in the known order
They will?? They shouldn't. RFC1939 doesn't allow them to do this as far as I know.
(obviously, DELE from the middle will change the assignment overall, but if you had
no uid
1 abc
2 def
3 ghi
dele 2 should keep 1 abc and 2 ghi).
I don't believe they have any permission to assume this. If the server happens to maintain this ordering, well, the client could attempt some sort of dubious binary-chop to work out where new mail has arrived (since some or all messages may have been deleted); I think 'fetchmail' has some code to do this. I consider this an abuse of the protocol.
This requirement would mandate that the server do a sort() on the directory listing, because otherwise the order of files returned by opendir/readdir/closedir is not necessarily the order in which they are created.
Brian.
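Editor's sketch of the sort() step, assuming Maildir-style file names that begin with the delivery time in seconds (the usual "time.unique.host" convention). The function names and the sample file names in the usage note are hypothetical; names are compared by leading timestamp first, then by the whole string so that the order is deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Order two file names by their leading decimal timestamp. */
static int cmp_delivery(const void *a, const void *b)
{
	const char *na = *(const char *const *)a;
	const char *nb = *(const char *const *)b;
	unsigned long ta = strtoul(na, NULL, 10);
	unsigned long tb = strtoul(nb, NULL, 10);

	if (ta != tb)
		return ta < tb ? -1 : 1;
	return strcmp(na, nb); /* deterministic tie-break on the full name */
}

/* Sort readdir() results so that older mail gets lower message numbers. */
void sort_by_delivery(const char **names, size_t count)
{
	assert(names != NULL || count == 0);
	if (count > 1)
		qsort(names, count, sizeof(names[0]), cmp_delivery);
}
```

Given the names "1085700000.M2.host", "1085600000.M1.host" and "1085650000.M3.host" in readdir() order, the sorted list starts with "1085600000.M1.host" and ends with "1085700000.M2.host", so newly delivered mail lands at the end of the listing.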
Brian Candler <B.Candler@pobox.com> writes:
I don't believe they have any permission to assume this. If the server happens to maintain this ordering, well, the client could attempt some sort of dubious binary-chop to work out where new mail has arrived (since some or all messages may have been deleted)
Yup, given the lack of a range for UIDL, this is necessary.
; I think 'fetchmail' has some code to do this. I consider this an abuse of the protocol.
Historical convention, earlier POP3 versions required this, to support the LAST command.
This requirement would mandate that the server do a sort() on the directory listing, because otherwise the order of files returned by opendir/readdir/closedir is not necessarily the order in which they are created.
True. I know this feature was implemented by Courier-IMAP around 2.0 because too many clients choked when new mail was added out of order. qmail-pop3d also shows new mail at the end of the list, and it is perhaps the reference implementation for all Maildir POP3 servers.
-- Matthias Andree
On Tue, Jun 01, 2004 at 12:06:49AM +0200, Matthias Andree wrote:
I don't believe they have any permission to assume this. If the server happens to maintain this ordering, well, the client could attempt some sort of dubious binary-chop to work out where new mail has arrived (since some or all messages may have been deleted)
Yup, given the lack of a range for UIDL, this is necessary.
I wouldn't say "necessary" - I'd say "a possible optimisation not sanctioned by the RFC"
But anyway, I'm sure you'd agree that clients are not *required* to perform this optimisation. So, some (many?) clients do ask for a complete UIDL listing every time they connect. If there are 1000 old messages in the mailbox, then you definitely don't want to open and read every one to find its X-POP3-UID header. Equally, clients may ask for a complete LIST of the maildrop. Again, you don't want to open and read every message file just to calculate its size.
courier-imap used to do this, and the load was enormous from clients which left mail on the server. Once it introduced a cache file (courierpop3dsizelist), this problem went away. (This file also stores the UID; previously it had used the Maildir filename as UID, but this turned out to be too long for some clients)
I'm not trying to be pro-courier-imap here, by the way. I'd be happy to have a path to bail out; but there are some key features that I need to have first. One is Maildir++ quotas, and another is efficient POP3 operation in the presence of clients who leave mail on server, and issue a full maildrop UIDL and/or LIST each time they connect. A mechanism for transparent migration of mailbox contents from a remote POP3 server is another. If I can preserve the original UID of each message when migrating mail, then this will be a bonus which courier-imap doesn't give me.
Oh, and perhaps most important of all, I need robustness. The pop3/imap server should not die and drop the connection, say because a message within the maildrop is malformed in some way.
Regards,
Brian.
"Oleg I. Vdovikin" <oleg@cs.msu.su> writes:
Question: will this apply to all UIDs or just to those for new mail? I can't tell from the code fragment.
In my particular environment this applied to all messages. Taking into account that the old messages were served by uw-pop3, they will not be refetched by POP3 clients after the upgrade to dovecot.
I understand your motivation.
There are two requirements:
- major: a UID that any client may have seen _MUST NOT_ change.
Right.
This means that a formatting string MUST ONLY have an impact on newly arriving mail.
No, there is no need to store the exact formatting string with each message.
No, but the UID must be stored UNLESS the format is perpetual. How do I tell Dovecot which format $ANY_OTHER_SERVER used? What if the server used MD5? Will we see dozens of plugins? Or wouldn't it be more sensible to just give it a list of UIDs?
This could be achieved by a little program that runs once per mailbox after the initial switch.
There is no need for this. The only thing needed is the ability to specify the format string in the config file and use it forever. So, for an existing dovecot configuration (the default) it would be
pop3uidl = "%u.%u"
while for uw-imap migrated environment it should be changed to
pop3uidl = "%08x%08x"
This won't catch all cases, as above.
Cheers,
-- Matthias Andree
participants (5): Brian Candler, Matthias Andree, Oleg I. Vdovikin, oleg@cs.msu.su, Timo Sirainen