[Dovecot] Need fast Maildir to mdbox conversion
I looked around the 'Net to see if there might be a custom program for offline Maildir to mdbox conversion. So far I haven't turned up anything. The problem for us is that the dsync program simply takes a lot of time to convert mailboxes. I wonder if time could be saved with a program that is optimized to convert mailboxes without the fancy locking that dsync needs to do. Does anyone have (or has anyone seen) a tool that could do this? We're hoping that converting away from Maildir will help us speed up the backup processes by reducing the number of files to process.
...Jeff
On 3/27/2012 3:40 PM, Jeff Gustafson wrote:
I looked around the 'Net to see if there might be a custom program for offline Maildir to mdbox conversion. So far I haven't turned up anything. The problem for us is that the dsync program simply takes a lot of time to convert mailboxes.
Is it slower than doing an IMAP APPEND over an authenticated dovecot connection?
I've used a simple Perl script based on Mail::IMAPClient and Mail::Box to import 180,000+ emails into dovecot's mdbox at fairly high speed, and all it does is IMAP APPENDs. (I had to shard the mailboxes because these Perl-based tools exhaust RAM when run with mailboxes larger than about 600MB.)
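For readers who want to try the APPEND approach without Perl, here is a minimal sketch of the same idea using Python's standard imaplib and mailbox modules. It is not Robin's script: the function names (imap_flags, append_maildir) and the default target folder are made up for illustration, and it assumes an already-authenticated connection object is passed in. It reads and sends one message at a time, so memory use should stay near the size of the largest message.

```python
import imaplib
import mailbox

# Maildir info flags mapped to the IMAP system flags they correspond to.
MAILDIR_TO_IMAP = {
    "S": r"\Seen",
    "R": r"\Answered",
    "F": r"\Flagged",
    "T": r"\Deleted",
    "D": r"\Draft",
}


def imap_flags(maildir_flags):
    """Translate Maildir flag letters (e.g. "RS") to an IMAP flag list string."""
    names = [MAILDIR_TO_IMAP[c] for c in sorted(maildir_flags) if c in MAILDIR_TO_IMAP]
    return "(" + " ".join(names) + ")"


def append_maildir(conn, maildir_path, target="INBOX"):
    """APPEND every message in one Maildir folder to `target`, one at a time.

    `conn` is an already-authenticated imaplib.IMAP4 (or IMAP4_SSL) object.
    Returns the number of messages appended.
    """
    box = mailbox.Maildir(maildir_path, create=False)
    count = 0
    for key in box.iterkeys():
        msg = box.get_message(key)   # parsed only to read flags and delivery date
        raw = box.get_bytes(key)     # message bytes as stored on disk
        date = imaplib.Time2Internaldate(msg.get_date())
        status, _ = conn.append(target, imap_flags(msg.get_flags()), date, raw)
        if status != "OK":
            raise RuntimeError(f"APPEND failed for {key}: {status}")
        count += 1
    return count
```

Usage would look something like `conn = imaplib.IMAP4_SSL(host); conn.login(user, password); append_maildir(conn, "/path/to/Maildir")`, with host, credentials, and path supplied by the site. Subfolders, keywords, and error recovery are left out on purpose.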
On my development VM test box (32-bit Slack 13.37, 2G/2G split kernel, no RAID, Q6600 with only two cores allocated to the VM, 8GB of DDR2 RAM), the import does:
Emails=180,044
real 237m28.485s (12.5 emails/second)
user 94m50.425s
sys 10m09.389s
21,984,824 /mail/home
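As a quick sanity check on those figures, the throughput can be recomputed from the timing output. The sketch below only does arithmetic on the numbers quoted above; the helper name parse_time_field is invented for the example. Worked through by hand, it gives roughly 12.6 emails/second (close to the 12.5 quoted) and shows the CPU busy for a bit under half of the wall-clock time.

```python
def parse_time_field(text):
    """Convert a `time`-style duration such as "237m28.485s" to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


emails = 180_044
real = parse_time_field("237m28.485s")   # wall-clock time
user = parse_time_field("94m50.425s")    # CPU time in user space
sys_ = parse_time_field("10m09.389s")    # CPU time in the kernel

rate = emails / real
cpu_share = (user + sys_) / real
print(f"{rate:.1f} emails/second, CPU busy {cpu_share:.0%} of wall time")
```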
I'm writing a swiss-army (C-based, no bytecode crap languages) mailbox "transcoding" tool, since none appear to exist. To keep it simple, I/O to/from "remote" mailbox (connections) are not pipelined. It won't require more than MAXEMAILSIZE's worth of RAM (if one of the directions involves a remote connection), and so far when processing MIX, Maildir, and Mbox files, it's extremely fast.
Adding support for [sm]dbox wouldn't appear to be problematic. At the moment, it supports everything Panda's c-client supports plus Maildir/Maildir++ (including Panda's "MIX").
Write support for Maildir's extremely UNDER-tested so far, as I've mainly used it to import Maildir hives.
I've experimented with Maildir as a format, and while the one email to a file model seems like a sensible idea, it seems to simply transfer stress from one part of the system to another, mainly filesystems, and not many of those are really up for handling that many files in one directory very efficiently.
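To see how much of that per-directory stress a given installation actually carries, one can count the files sitting in each cur/ and new/ directory. This is a rough sketch using only the standard library; the function names are hypothetical and it assumes the usual Maildir layout where messages live in directories literally named cur and new.

```python
import os


def count_maildir_files(root):
    """Return {directory: file_count} for every cur/ and new/ directory under root."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        if os.path.basename(dirpath) in ("cur", "new"):
            counts[dirpath] = len(filenames)
    return counts


def summarize(counts):
    """Return (total_files, (largest_directory, its_file_count))."""
    total = sum(counts.values())
    largest = max(counts.items(), key=lambda kv: kv[1], default=(None, 0))
    return total, largest
```

Running `summarize(count_maildir_files("/mail/home"))` on a mail spool gives the total number of message files a backup has to walk and the single most crowded directory, which is the case filesystems tend to handle worst.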
None of my users have mailboxes with fewer than 100K emails in them, some have more than a million.
=R=
On Tue, 2012-03-27 at 20:00 -0700, Robin wrote:
I'm writing a swiss-army (C-based, no bytecode crap languages) mailbox "transcoding" tool, since none appear to exist. To keep it simple, I/O to/from "remote" mailbox (connections) are not pipelined. It won't require more than MAXEMAILSIZE's worth of RAM (if one of the directions involves a remote connection), and so far when processing MIX, Maildir, and Mbox files, it's extremely fast.
This sounds interesting. If it could do [sm]dbox, it would be very, very useful to large installations.
...Jeff
On Wed, Mar 28, 2012 at 12:40 AM, Jeff Gustafson ncjeffgus@zimage.com wrote:
I looked around the 'Net to see if there might be a custom program for offline Maildir to mdbox conversion. So far I haven't turned up anything. The problem for us is that the dsync program simply takes a lot of time to convert mailboxes. I wonder if time could be saved with a program that is optimized to convert mailboxes without the fancy locking that dsync needs to do. Does anyone have (or has anyone seen) a tool that could do this?
Why is it a problem that dsync takes a long time, when it can be done without downtime for the users?
I just started our maildir->mdbox conversion yesterday, using the attached script. I only converted a little over 10000 easy accounts (accounts with simple folder names, as I expect to run into problems once we start hitting accounts with trailing dot or broken latin1/utf8 characters in the folder names). I might agree it wasn't quick, but that really doesn't matter as the only downtime for the user is that he's potentially kicked out during the userdb update.
-jf
We're hoping that converting away from Maildir will help us speed up the backup processes by reducing the number of files to process.
On Wed, 2012-03-28 at 09:24 +0200, Jan-Frode Myklebust wrote:
Why is it a problem that dsync takes a long time, when it can be done without downtime for the users?
I just started our maildir->mdbox conversion yesterday, using the attached script. I only converted a little over 10000 easy accounts (accounts with simple folder names, as I expect to run into problems once we start hitting accounts with trailing dot or broken latin1/utf8 characters in the folder names). I might agree it wasn't quick, but that really doesn't matter as the only downtime for the user is that he's potentially kicked out during the userdb update.
I looked over your script. I plan on doing some trial runs with it. I think the trick where you re-run the sync and then boot the user off the connection should work pretty well. I hadn't totally fleshed out the scripting on the conversion since there is a lot more I need to do with the database and configuration files first. It appears I can use your script as a starting point for our configuration.
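Since the attached script is not reproduced in this thread, the following is only a guess at the shape of that workflow: a long first sync while the user stays online, a short second sync to pick up changes, the userdb switch, and a kick. The command lines are assumptions based on Dovecot 2.0/2.1-era syntax (`dsync -u <user> mirror <location>` and `doveadm kick <user>`) and should be checked against the man pages for the installed version; the function names, the target location, and the step order are hypothetical. The sketch only builds and prints the plan and does not run anything.

```python
import shlex


def conversion_plan(user, target="mdbox:~/mdbox"):
    """Return the ordered steps for converting one user, as (label, argv) pairs.

    argv is None for the step that has to be done in the site's own userdb.
    """
    # Assumed Dovecot 2.0/2.1 command syntax; verify before use.
    sync = ["dsync", "-u", user, "mirror", target]
    return [
        ("initial sync (user stays online)", sync),
        ("catch-up sync (copies only what changed)", sync),
        ("point userdb at the new mail location", None),  # site-specific, hypothetical
        ("kick existing sessions", ["doveadm", "kick", user]),
    ]


def render(plan):
    """Format the plan as a commented, shell-style listing."""
    lines = []
    for label, argv in plan:
        cmd = shlex.join(argv) if argv else "(site-specific step)"
        lines.append(f"# {label}\n{cmd}")
    return "\n".join(lines)


print(render(conversion_plan("jeff@example.com")))
```

Keeping the plan as data makes it easy to do a dry run over a list of accounts before executing anything, which fits the trial runs mentioned above.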
...Jeff
-jf
We're hoping that converting away from Maildir will help us speed up
the backup processes by reducing the number of files to process.
participants (3): Jan-Frode Myklebust, Jeff Gustafson, Robin