On 3/27/2012 3:40 PM, Jeff Gustafson wrote:
I looked around the 'Net to see if there might be a custom program for offline Maildir to mdbox conversion. So far I haven't turned up anything. The problem for us is that the dsync program simply takes a lot of time to convert mailboxes.
Is it slower than doing an IMAP APPEND over an authenticated dovecot connection?
I've used a simple Perl script based on Mail::IMAPClient and Mail::Box to import 180,000+ mailboxes into Dovecot's mdbox at fairly high speed, and all it does is IMAP APPENDs. (I had to shard the mailboxes because these Perl-based tools exhaust RAM when run against mailboxes larger than about 600MB.)
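For anyone who wants to try the same approach, here is a minimal sketch of the two pieces described above: splitting a mailbox into shards under a size cap, and pushing each message with IMAP APPEND. It is written in Python with the standard imaplib module, not the Perl script I actually used. The names shard_by_size and append_all, the byte cap, and the host name in the comment are assumptions for illustration only.

```python
import imaplib

# Assumed cap, borrowed from the "about 600MB" figure above.
SHARD_LIMIT = 600 * 1024 * 1024


def shard_by_size(sizes, limit=SHARD_LIMIT):
    """Group message indexes into shards whose total size stays within limit."""
    shards, current, total = [], [], 0
    for index, size in enumerate(sizes):
        # Close the current shard if this message would push it over the cap.
        if current and total + size > limit:
            shards.append(current)
            current, total = [], 0
        current.append(index)
        total += size
    if current:
        shards.append(current)
    return shards


def append_all(conn, mailbox, messages):
    """APPEND each raw message (bytes) to mailbox; return how many succeeded."""
    ok = 0
    for raw in messages:
        # flags=None and date_time=None: the server assigns the internal date.
        status, _ = conn.append(mailbox, None, None, raw)
        if status == "OK":
            ok += 1
    return ok


# Hypothetical usage (host and credentials are placeholders):
#   conn = imaplib.IMAP4_SSL("mail.example.org")
#   conn.login("user", "password")
#   append_all(conn, "INBOX", list_of_raw_messages)
#   conn.logout()
```

A message larger than the cap still gets a shard of its own, so nothing is dropped. Passing None for the date loses the original arrival time; a real import would probably want to carry that across.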
On my development VM test box (32-bit Slack 13.37, 2G/2G split kernel, no RAID, Q6600 with only two cores allocated to the VM, and 8GB of DDR2 RAM), the Perl-based import runs like this:
Emails=180,044
real 237m28.485s (12.5 emails/second)
user 94m50.425s
sys 10m09.389s
21,984,824 /mail/home
I'm writing a Swiss Army knife (C-based, no bytecode crap languages) mailbox "transcoding" tool, since none appear to exist. To keep it simple, I/O to and from "remote" mailboxes (connections) is not pipelined. It won't require more than MAXEMAILSIZE's worth of RAM (if one of the directions involves a remote connection), and so far, when processing MIX, Maildir, and Mbox files, it's extremely fast.
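To illustrate the one-message-at-a-time idea (this is not the C tool itself, just a rough Python sketch of the same principle), the reader below walks a Maildir and hands over a single message per step, so peak memory is bounded by the largest email rather than by the mailbox. The function names iter_maildir_messages and transcode, and the sink callback, are made up for the example.

```python
import os


def iter_maildir_messages(maildir_path):
    """Yield (filename, raw_bytes) for each message, one at a time."""
    for sub in ("new", "cur"):
        folder = os.path.join(maildir_path, sub)
        if not os.path.isdir(folder):
            continue
        with os.scandir(folder) as entries:
            for entry in entries:
                if entry.is_file():
                    with open(entry.path, "rb") as handle:
                        yield entry.name, handle.read()


def transcode(maildir_path, sink):
    """Pass each message to sink(raw) in turn; return the count processed."""
    count = 0
    for _name, raw in iter_maildir_messages(maildir_path):
        sink(raw)  # only this one message is held in memory here
        count += 1
    return count
```

The sink could be anything that writes a single message: an mbox writer, an IMAP APPEND call, and so on.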
Adding support for [sm]dbox wouldn't appear to be problematic. At the moment, it supports everything Panda's c-client supports (including Panda's "MIX"), plus Maildir/Maildir++.
Write support for Maildir is extremely UNDER-tested so far, as I've mainly used the tool to import Maildir hives.
I've experimented with Maildir as a format, and while the one-email-per-file model seems like a sensible idea, it seems to simply shift the stress from one part of the system to another, mainly the filesystem, and not many filesystems are really up to handling that many files in one directory very efficiently.
None of my users have mailboxes with fewer than 100K emails in them, some have more than a million.
=R=