[Dovecot] using dsync to convert mailboxes loses caching options
Hi there,
We're currently converting users from Maildir to sdbox, using dsync (2.0.16) to do it. However, once users have been converted, we only get minimal information in the cache files. Is there some way to preserve all of the caching decisions that were previously made, so that when a user logs in to the new mailbox we don't cause an I/O storm rebuilding a cache that we know was good? Dovecot seems to be doing this partially: if I remove the logs/cache from the source mailbox, no cache files are built during the conversion; if I put them back, a cache file is built, but it only contains a few pieces of information (guid, date.save). Looking into this a bit further, I find that when the caches are present at the source, the fields are preserved but the 'last used' dates and caching decisions are not. I suspect this means dsync doesn't bother caching on import: only fields with a yes decision in the source are copied, and even their decision is only copied as tmp, with the date of the import. For example:
Source idxview cache:
-- Cache fields --
 #  Name                Type Size Dec  Last used
 0: flags               bit  4    tmp  2011-11-25 16:09
 1: date.received       fix  4    yes  2011-11-26 16:17
 2: size.virtual        fix  8    tmp  2011-11-25 16:09
 3: imap.bodystructure  str  -    tmp  2011-11-25 16:09
 4: mime.parts          var  -    tmp  2011-11-25 16:09
 5: hdr.IMPORTANCE      hdr  -    tmp  2011-11-25 16:09
 6: hdr.X-PRIORITY      hdr  -    tmp  2011-11-25 16:09
 7: hdr.CONTENT-TYPE    hdr  -    tmp  2011-11-25 16:09
...
18: date.sent           fix  8    no   1970-01-01 01:00
19: date.save           fix  4    yes  2011-11-26 16:17
20: size.physical       fix  8    no   1970-01-01 01:00
21: imap.body           str  -    no   1970-01-01 01:00
... (24 entries in total; the omitted ones are just hdr.* fields with a tmp decision)
dst cache:
 #  Name                Type Size Dec  Last used
 0: flags               bit  4    tmp  1970-01-01 01:00
 1: date.sent           fix  8    no   1970-01-01 01:00
 2: date.received       fix  4    tmp  1970-01-01 01:00
 3: date.save           fix  4    tmp  2011-11-26 16:19
 4: size.virtual        fix  8    tmp  1970-01-01 01:00
 5: size.physical       fix  8    no   1970-01-01 01:00
 6: imap.body           str  -    no   1970-01-01 01:00
 7: imap.bodystructure  str  -    tmp  1970-01-01 01:00
 8: imap.envelope       str  -    no!  1970-01-01 01:00
 9: pop3.uidl           str  -    no   1970-01-01 01:00
10: guid                str  -    tmp  2011-11-26 16:19
11: mime.parts          var  -    tmp  1970-01-01 01:00
12: hdr.IMPORTANCE      hdr  -    tmp  1970-01-01 01:00
13: hdr.X-PRIORITY      hdr  -    tmp  1970-01-01 01:00
...
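To compare the two listings field by field, a small parser like the following can help. It is a sketch that assumes each entry's "Last used" value sits on the same line as the rest of the entry (idxview output wrapped onto two lines would need joining first); the function names are hypothetical, not part of Dovecot, and a decision with a trailing ! (which appears to mark a forced decision) is kept exactly as printed.

```python
import re
from datetime import datetime

# Matches one cache-field entry, e.g. "1: date.received fix 4 yes 2011-11-26 16:17".
# Assumption: the "Last used" column is on the same line as the rest of the entry.
ENTRY_RE = re.compile(
    r"^\s*\d+:\s+(?P<name>\S+)\s+\S+\s+\S+\s+"
    r"(?P<dec>(?:no|tmp|yes)!?)\s+"
    r"(?P<last>\d{4}-\d{2}-\d{2} \d{2}:\d{2})\s*$"
)


def parse_cache_fields(text):
    """Return {field name: (decision, last used)} for every entry line in text."""
    fields = {}
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            last_used = datetime.strptime(m["last"], "%Y-%m-%d %H:%M")
            fields[m["name"]] = (m["dec"], last_used)
    return fields


def diff_fields(src, dst):
    """List (name, description) for source fields whose decision or last-used time changed."""
    changes = []
    for name, (dec, last) in src.items():
        if name not in dst:
            changes.append((name, "missing in destination"))
            continue
        dst_dec, dst_last = dst[name]
        if (dst_dec, dst_last) != (dec, last):
            changes.append(
                (name, f"{dec} {last:%Y-%m-%d %H:%M} -> {dst_dec} {dst_last:%Y-%m-%d %H:%M}")
            )
    return changes


# A few entries copied from the two listings.
SRC_SAMPLE = """\
 1: date.received       fix  4    yes  2011-11-26 16:17
 3: imap.bodystructure  str  -    tmp  2011-11-25 16:09
19: date.save           fix  4    yes  2011-11-26 16:17
"""
DST_SAMPLE = """\
 2: date.received       fix  4    tmp  1970-01-01 01:00
 3: date.save           fix  4    tmp  2011-11-26 16:19
 7: imap.bodystructure  str  -    tmp  1970-01-01 01:00
 8: imap.envelope       str  -    no!  1970-01-01 01:00
"""

if __name__ == "__main__":
    src = parse_cache_fields(SRC_SAMPLE)
    dst = parse_cache_fields(DST_SAMPLE)
    for name, change in diff_fields(src, dst):
        print(f"{name}: {change}")
```

On these samples it reports date.received going from yes 2011-11-26 16:17 to tmp 1970-01-01 01:00, imap.bodystructure losing its last-used time, and date.save becoming tmp with the import time, which is the pattern described above.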
And an example cached message record from the destination:
RECORD: seq=57, uid=12207, flags=0x09 (Seen Answered)
- ext 1 modseq : 1 (0100000000000000)
- ext 3 cache : 5516 (8c150000)
- cache offset=5516 size=92, prev_offset = 0
- guid: 1321895282.XXXt,S=7399760,W=7496225
- date.save: 1322324286 (3e11d14e)
The source index record for this message contains almost all of the fields.
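The hex values in parentheses in the record dump are the raw little-endian bytes of the number before them (0x0000158c = 5516, 0x4ed1113e = 1322324286), and the guid here appears to be the Maildir base filename, whose S= and W= parts are the physical and virtual message sizes. A minimal decoding sketch follows; the helper names are hypothetical, and reading date.save as a UTC Unix timestamp is an assumption.

```python
import struct
from datetime import datetime, timezone


def le_uint(hex_dump):
    """Decode a 4- or 8-byte hex dump such as '3e11d14e' as a little-endian unsigned int."""
    raw = bytes.fromhex(hex_dump)
    fmt = {4: "<I", 8: "<Q"}[len(raw)]
    return struct.unpack(fmt, raw)[0]


def parse_maildir_base(name):
    """Split a Maildir base filename into its unique part and its ,KEY=value extensions."""
    unique, *exts = name.split(",")
    values = dict(ext.split("=", 1) for ext in exts if "=" in ext)
    return {
        "unique": unique,
        "physical_size": int(values["S"]) if "S" in values else None,
        "virtual_size": int(values["W"]) if "W" in values else None,
    }


if __name__ == "__main__":
    saved = le_uint("3e11d14e")
    print("date.save:", saved, datetime.fromtimestamp(saved, tz=timezone.utc))
    print("cache offset:", le_uint("8c150000"))
    print("modseq:", le_uint("0100000000000000"))
    print(parse_maildir_base("1321895282.XXXt,S=7399760,W=7496225"))
```

For this record it gives date.save = 1322324286 (2011-11-26 16:18:06 UTC), cache offset 5516 and modseq 1, i.e. the save date is the time of the dsync run rather than the original delivery.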
Thanks,
Mark
On Sat, 2011-11-26 at 18:33 +0200, Mark Zealey wrote:
> We're trying to convert users from Maildir to sdbox at present; [...] For example:
How are you calling dsync? Does the destination already exist? I tried with:
rm -rf /tmp/foo; dsync -u tss -m INBOX mirror sdbox:/tmp/foo
It sets all of the cache fields with "yes" or "tmp" decision, as it should. But yes, the "last used" field should probably be copied as well.
Perhaps the problem in your case is that dsync actually does write all of the cache fields, but then runs a "cache compression" at the end, which sees that the "last used" timestamps are so old that it deletes the fields.
But yes, it is a problem that dsync doesn't update caching decisions. Hmm. I guess I'll have to fix that for v2.1.
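Timo's hypothesis can be illustrated with a simplified model: if compression drops fields that have not been used within some window, destination fields whose last-used time is still the epoch are dropped straight away, while guid and date.save (stamped with the import time) survive. This is not Dovecot's compression code; the 30-day window and the exact rule are assumptions for illustration, and decisions with a trailing ! (assumed to be forced) are left alone.

```python
from datetime import datetime, timedelta

# Assumed value for illustration only; not taken from Dovecot's source.
DROP_AFTER = timedelta(days=30)


def compress_decisions(fields, now, drop_after=DROP_AFTER):
    """Simplified model of cache compression.

    fields maps a cache field name to (decision, last_used). A plain 'tmp' or
    'yes' field whose last use is older than drop_after is dropped to 'no';
    anything else, including forced decisions such as 'no!', is kept as is.
    """
    result = {}
    for name, (decision, last_used) in fields.items():
        if decision in ("tmp", "yes") and now - last_used > drop_after:
            result[name] = ("no", last_used)
        else:
            result[name] = (decision, last_used)
    return result


if __name__ == "__main__":
    epoch_local = datetime(1970, 1, 1, 1, 0)  # printed as "1970-01-01 01:00" by idxview
    dst = {
        "imap.bodystructure": ("tmp", epoch_local),
        "hdr.X-PRIORITY": ("tmp", epoch_local),
        "date.save": ("tmp", datetime(2011, 11, 26, 16, 19)),
        "guid": ("tmp", datetime(2011, 11, 26, 16, 19)),
        "imap.envelope": ("no!", epoch_local),
    }
    after = compress_decisions(dst, now=datetime(2011, 11, 26, 16, 20))
    for name, (decision, _) in after.items():
        print(name, decision)
```

The model reproduces the guid/date.save-only records from the first message; Mark's reply indicates that in his setup the tmp fields were in fact still present, so the real cause there was different.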
Apologies for top-posting, but I can't figure out how to make this client reply inline. On the first run (we are using 'backup') none of the cache is copied; only the index files are created. On the second run (i.e. when the destination exists), a cache file is created and populated with the bits that are presumably required for the sync, i.e. the guid. As you say, the yes/tmp caching decisions are copied over (and are visible in the cache file), but because the last used date is not copied, these fields are not activated for any of the messages, so none of their data actually gets cached. I'm not seeing a compression at the end, as the tmp fields etc. are still there (our source caches mostly don't have any yes fields), but as I say, because they have no last used date, none of them is ever actually used until a client requests them via POP3/IMAP.
Mark
From: Timo Sirainen [tss@iki.fi]
Sent: 08 December 2011 05:53
To: Mark Zealey
Cc: Dovecot Mailing List
Subject: Re: [Dovecot] using dsync to convert mailboxes loses caching options
On Thu, 2011-12-08 at 07:53 +0200, Timo Sirainen wrote:
> But yes, it is a problem that dsync doesn't update caching decisions.. Hmm. I guess I'll have to fix that for v2.1.
Could you check whether the attached patch fixes your problems when applied against the latest v2.1 hg? It's annoyingly large, and it makes v2.1 dsync incompatible with v2.0, but maybe it's better to do that sooner rather than later.
OK, now it's copying the timestamp fields for the tmp ones. However:
- hdr.* fields are not being copied at all (unlike in previous releases).
- Although the decisions are now being recorded, the items are not actually being put into the cache for previously synced mails. New mails do get all of the cache information produced, however.
Note: this is only when using the -f option to dsync; without -f it doesn't even get round to generating a cache, so no fields are put there.
Perhaps this should be activated by a new option to dsync? If people are using this for backup (rather than conversion), caches could get relatively large.
Mark
From: Timo Sirainen [tss@iki.fi]
Sent: 08 December 2011 07:33
To: Dovecot Mailing List
Cc: Mark Zealey
Subject: Re: [Dovecot] using dsync to convert mailboxes loses caching options
On Thu, 2011-12-08 at 09:19 +0000, Mark Zealey wrote:
> OK now it's copying the timestamp fields for tmp ones. However:
> - hdr.* fields are not being copied at all (unlike in previous releases)
They are in my tests. Does this also happen if the destination doesn't exist?
> - although the decisions are now being recorded; the items are not actually being put into the cache for previously sync'd mails. New mails are having all the cache information produced however.
This is intentional. Doing anything else would be horribly inefficient. Note that dsync isn't *copying* cached data. It's simply setting the caching decisions, and the mail saving code parses the mails and updates the cache.
> Perhaps this should be activated by a new option to dsync; if people are using this for backup (rather than conversion) caches could get relatively large?
Hm. Maybe.
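Timo's point that dsync only sets decisions, and that the mail-saving code then parses each mail and fills the cache, can be pictured with a toy model. The class and method names below are hypothetical and this is not how Dovecot is structured internally; it only shows why mails saved before the decisions arrived keep their small cache records while newly saved mails get every tmp/yes field.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMailbox:
    """Toy model: caching decisions drive which fields the save path caches."""

    decisions: dict = field(default_factory=dict)  # field name -> "no" / "tmp" / "yes"
    cache: dict = field(default_factory=dict)      # uid -> set of cached field names

    def set_decisions(self, decisions):
        # What the patched dsync does, per Timo: record decisions only.
        # Records of mails that were already saved are left untouched.
        self.decisions.update(decisions)

    def save(self, uid):
        # The save path parses the new mail and caches every field whose
        # decision is "tmp" or "yes" at that moment.
        self.cache[uid] = {
            name for name, dec in self.decisions.items() if dec in ("tmp", "yes")
        }


if __name__ == "__main__":
    box = ToyMailbox(decisions={"guid": "tmp", "date.save": "tmp"})
    box.save(12207)  # synced before the source decisions were known
    box.set_decisions({"imap.bodystructure": "tmp", "hdr.X-PRIORITY": "tmp", "imap.body": "no"})
    box.save(12208)  # hypothetical mail saved afterwards
    print(sorted(box.cache[12207]))
    print(sorted(box.cache[12208]))
```

Uid 12207 ends up with only date.save and guid, while 12208 also gets hdr.X-PRIORITY and imap.bodystructure, mirroring the "new mails get everything, old mails don't" behaviour reported above.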
OK, I'll test the header copying more thoroughly. The reason we want to preserve caching decisions is to avoid an I/O storm when users log in to their mailboxes after the sdbox upgrade, so it would be great to have some way to warm the caches.
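One possible way to warm the caches follows from the earlier observation that fields only get cached once a client requests them over POP3/IMAP: after the conversion, fetch the relevant items for every message at a quiet time. The sketch below uses Python's standard imaplib; the mapping from cache field names to FETCH items is an assumption made for illustration (fields with no IMAP equivalent, such as guid or mime.parts, are skipped), logging in as each user (for example through a master-user setup) is also assumed, and it does not avoid the parsing I/O, it only moves it to a time you choose.

```python
import imaplib

# Assumed mapping from cache field names (as printed by idxview) to IMAP
# FETCH items that make the server produce that data. Not from Dovecot docs.
SIMPLE_ITEMS = {
    "imap.bodystructure": "BODYSTRUCTURE",
    "imap.body": "BODY",
    "imap.envelope": "ENVELOPE",
    "date.received": "INTERNALDATE",
    "size.virtual": "RFC822.SIZE",
}


def build_fetch_items(field_names):
    """Build one parenthesised FETCH item list; return "" if nothing maps."""
    items, headers = [], []
    for name in field_names:
        if name.startswith("hdr."):
            header = name[len("hdr."):]
            if header not in headers:
                headers.append(header)
        elif name in SIMPLE_ITEMS and SIMPLE_ITEMS[name] not in items:
            items.append(SIMPLE_ITEMS[name])
        # Anything else (guid, date.save, mime.parts, ...) has no FETCH item here.
    if headers:
        # BODY.PEEK does not set \Seen, unlike BODY[...].
        items.append("BODY.PEEK[HEADER.FIELDS (%s)]" % " ".join(headers))
    return "(%s)" % " ".join(items) if items else ""


def warm_mailbox(host, user, password, mailbox, field_names):
    """FETCH the mapped items for every message in one mailbox."""
    items = build_fetch_items(field_names)
    if not items:
        return None
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        status, data = conn.select(mailbox, readonly=True)  # EXAMINE: no flag changes
        if status != "OK":
            raise RuntimeError("cannot open %s: %r" % (mailbox, data))
        if int(data[0]) == 0:
            return "OK"  # empty mailbox, nothing to fetch
        status, _ = conn.fetch("1:*", items)
        return status
    finally:
        conn.logout()
```

For example, `warm_mailbox("imap.example.org", "user", "secret", "INBOX", ["imap.bodystructure", "hdr.X-PRIORITY"])` would be run once per user and mailbox; the host name and credentials are placeholders.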
Mark
From: Timo Sirainen [tss@iki.fi]
Sent: 08 December 2011 09:27
To: Mark Zealey
Cc: Dovecot Mailing List
Subject: RE: [Dovecot] using dsync to convert mailboxes loses caching options
By the way, another bug I noticed with dsync: when converting from Maildir to sdbox, the date.saved field is not preserved; it's just the time when the first dsync command ran. Presumably it should be the mtime of the Maildir message file.
Mark
On Thu, 2011-12-08 at 16:10 +0000, Mark Zealey wrote:
> By the way, another bug I noticed with dsync is that when converting from Maildir to sdbox is that the date.saved field is not preserved - it's just the time when the first dsync command happened. Presumably it should be the mtime of the Maildir message file
With Maildir the date.saved is taken from the mail file's ctime (yes, it's not perfect, but it's good enough for what it's used for). It's preserved in my tests.
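The filesystem side of this is easy to check. Per Timo, Dovecot takes a Maildir message's date.saved from the file's ctime; on POSIX systems a program can restore a file's mtime (and atime) with utime(), but not its ctime, which the kernel sets to the current time on any metadata change, including that utime() call. The sketch below (helper names hypothetical) shows this on scratch files; whether sdbox reads date.saved from the same timestamps is not established in this thread.

```python
import os
import tempfile


def maildir_save_time(path):
    """date.saved as Timo describes it for Maildir: the message file's ctime."""
    return os.stat(path).st_ctime


def copy_file_times(src_path, dst_path):
    """Copy atime/mtime from src to dst. ctime cannot be copied this way."""
    st = os.stat(src_path)
    os.utime(dst_path, ns=(st.st_atime_ns, st.st_mtime_ns))


if __name__ == "__main__":
    old = 1321895282  # an arbitrary past timestamp (the number at the start of the guid above)
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        dst = os.path.join(tmp, "dst")
        for path in (src, dst):
            with open(path, "w") as f:
                f.write("Subject: test\n\nbody\n")
        os.utime(src, (old, old))
        copy_file_times(src, dst)
        print("dst mtime:", int(os.stat(dst).st_mtime))  # equals `old`
        print("dst ctime:", int(maildir_save_time(dst)))  # on POSIX: now, not `old`
```

So a conversion tool can carry a message's mtime across, but anything keyed on ctime will see the conversion time.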
On 10-12-2011 08:27, Timo Sirainen wrote:
It could well be down to the conversion to sdbox, then: stock 2.0.16 dsync does not preserve the ctime/mtime of the files. The date.saved timestamp is only put into the cache on the second dsync run, so presumably it is picked up from the filesystem.
Mark
participants (3)
- Mark Zealey
- Mark Zealey
- Timo Sirainen