[Dovecot] mbox vs. maildir storage block waste
Robin
dovecot at r.paypc.com
Tue Nov 13 00:44:22 EET 2012
On 11/11/2012 5:26 PM, Christoph Anton Mitterer wrote:
> Have you made systematic tests? I.e. compared times for all of these
> with those from the different dovecot backends.
The choice of Dovecot backends made no substantial difference. I used maildir, sdbox, and mdbox. I also added SiS (with mdbox). Initial tests were on local multi-spindle RAID5 storage, but to handicap Dovecot, I pushed it over NFS (also Linux 3.2 on a local GigE segment). Over NFS it still wasn't slow enough to make dbmail competitive, even though you have to start turning off performance optimisation features in Dovecot to avoid NFS bugs.
>> There wasn't a task that the dbmail setup performed faster than
>> Dovecot, in either low or high load situations.
> Which backend did you use?
Backend for dbmail? Two MySQL versions (5.0 and 5.5); InnoDB is required for dbmail, by the way. I also tried Postgres 8.4 and 9.1, using its default storage engine. I ran the tests both with a separate DB machine and with a cohosted one, where the dbmail connector used local sockets instead of TCP/IP, but that didn't significantly alter the performance.
I've found my first notes from the tests. It was the second round of tests with the latest MySQL 5.0 server given some tuning to more aggressively use system memory. You will note the puny size of the mail folder hive in this round.
> The mysqld process has consumed nearly an hour of CPU time during this process.
> dbmail is configured to use local sockets rather than network I/O.
>
> I'm using the Perl MailTools http://search.cpan.org/dist/MailTools/
> to import about 10 folders' worth of email, totaling about 560MB in raw size,
> constituting about 23,000 emails. The script basically creates the folders,
> and does an APPEND for each email. It's bog simple.
>
> I DROP the database, recreate it, add the one user, verify DBMail
> accepts authentication for the newly created mailbox, and then do the import.
> The MySQL files live on a freshly formatted ext4 filesystem.
>
> The import takes Dovecot (MailDir or mdbox format), or Panda IMAP (mix)
> about six minutes to complete.
>
> DBMail 3 took 4h 23m. Casual inspection of the system showed modestly
> high CPU usage in mysqld and dbmail-imapd (as well as the import perl
> command on occasion), but the Load Average didn't get too close to 1.0,
> let alone 2.0, which makes me suspect that I might have hit some kind of
> "busy wait" pathology.
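The import described in the notes above (create the folders, then one APPEND per message) can be sketched as follows. This is not the Perl script used in the tests; it is a minimal Python sketch of the same shape. The function name import_folders and the folders mapping are hypothetical, and conn is assumed to be an object with imaplib.IMAP4-style create() and append() methods.

```python
import time


def import_folders(conn, folders):
    """Create each folder, then APPEND every message to it, one at a time.

    conn    -- assumed to offer imaplib.IMAP4-style create() and append()
    folders -- hypothetical mapping: folder name -> list of raw messages (bytes)
    Returns (messages_appended, elapsed_seconds).
    """
    started = time.monotonic()
    appended = 0
    for name, messages in folders.items():
        conn.create(name)
        for raw in messages:
            # No flags and no explicit internal date: the server picks defaults.
            typ, _ = conn.append(name, None, None, raw)
            if typ != "OK":
                raise RuntimeError(f"APPEND to {name!r} failed: {typ}")
            appended += 1
    return appended, time.monotonic() - started
```

Against a real server, conn would be an imaplib.IMAP4 or imaplib.IMAP4_SSL instance after login(). The returned elapsed time is wall-clock for the whole run, which is the figure the notes compare.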
To clarify the above: To streamline iterative testing, I made a script to deactivate the currently running SQL server, unmount, re-format, re-mount, and re-populate the skeletal DB directories and restart the DB engine. So between each test, no matter the imapd or DB back-end, the mailstore was presented with a freshly formatted volume on dedicated spindles. The filesystem was ext4, formatted with:
lazy_itable_init=0,lazy_journal_init=0,dir_index=1,extents=1,uninit_bg=0,flex_bg=0,has_journal=0,inode_size=256,dir_index=1,
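The reset sequence described above (stop the DB, unmount, re-format, re-mount, re-populate, restart) can be sketched as an ordered command plan. This is not the original script. The function name reset_plan, the device and mountpoint arguments, and the three DB commands are hypothetical. The way the option string is split across mke2fs -E, -O and -I is an assumption too, since the message does not say how the options were passed.

```python
def reset_plan(device, mountpoint, stop_db, populate_db, start_db):
    """Return the ordered commands for one reset, as argv lists.

    Nothing is executed here; the caller decides how to run the plan.
    stop_db, populate_db and start_db are caller-supplied argv lists,
    since they differ between MySQL and Postgres.
    """
    mkfs = [
        "mke2fs", "-t", "ext4",
        # Assumed split of the option string: extended options go to -E ...
        "-E", "lazy_itable_init=0,lazy_journal_init=0",
        # ... feature flags to -O ("^" switches a feature off) ...
        "-O", "dir_index,extent,^uninit_bg,^flex_bg,^has_journal",
        # ... and the inode size to -I.
        "-I", "256",
        device,
    ]
    return [
        list(stop_db),
        ["umount", mountpoint],
        mkfs,
        ["mount", device, mountpoint],
        list(populate_db),
        list(start_db),
    ]
```

Each entry could be handed to subprocess.run(cmd, check=True) in order, as root and only against a scratch device, since the mke2fs step destroys whatever is on it.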
> Do you have detailed numbers?
Not really, but after it was clear that I wasn't going to get comparable performance even within the same order of magnitude, I stopped testing it. I included the IMAP SEARCH performance comparison against fts_squat in my original mail to this list. In addition to huge performance deficiencies, dbmail also has/had fatal operational bugs.
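For a sense of scale, the figures quoted in the notes above (about 23,000 messages, about 560MB, six minutes versus 4h 23m) can be turned into rough rates. This is plain arithmetic on those rounded numbers, not an additional measurement; the variable names are mine.

```python
MESSAGES = 23_000                    # "about 23,000 emails"
RAW_MB = 560                         # "about 560MB in raw size"
FAST_SECONDS = 6 * 60                # Dovecot / Panda IMAP: "about six minutes"
SLOW_SECONDS = 4 * 3600 + 23 * 60    # DBMail 3: 4h 23m


def rates(seconds):
    """Return (messages per second, MB per second) for one import run."""
    return MESSAGES / seconds, RAW_MB / seconds


fast_msgs, fast_mb = rates(FAST_SECONDS)
slow_msgs, slow_mb = rates(SLOW_SECONDS)
slowdown = SLOW_SECONDS / FAST_SECONDS

print(f"fast: {fast_msgs:.1f} msg/s, {fast_mb:.2f} MB/s")
print(f"slow: {slow_msgs:.1f} msg/s, {slow_mb:.3f} MB/s")
print(f"slowdown: {slowdown:.1f}x")
```

That works out to roughly 64 messages per second against about 1.5, a factor of about 44, which is more than one order of magnitude apart.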
> I guess you’ve "only" tried dbmail?
I did try Manitou, but the lack of a proper IMAP service for it made extensive "like for like" testing very difficult. Manitou is still in the very early days, alas. It also relies on the SQL DB's underlying authentication system, which is rather ... alarming. It performs quite a bit better than dbmail, but it's still not close to Dovecot. At the time I tested it, only custom-rolled clients could talk to it, i.e., there were no imap4/pop3 "gateways" to it.
I think I was most alarmed to see that the widely assumed benefits of putting mail on a SQL DB, i.e., fast searching/sorting, didn't actually happen in reality.
As others have mentioned, I also shudder to think of backup/restore issues, especially on a single user level. The mechanisms of backing up and restoring maildirs and even mdboxes, i.e., simple files, are not only well understood; their failure modes are also generally fully recoverable. SQL-DB file blobs, especially with MySQL, remind me too much of the "PST Hell" that Exchange administrators face. But maybe that's just my ignorance talking.
> All something I wouldn’t want to do on my production systems ;)
Neither would I. But as I said, I was "desperate" to get this close to Dovecot's performance. I had about 2-3 weeks to pre-qualify mail storage back-ends with an eye towards 4 or 5 digits of usercount, and maybe tens to hundreds of TBs' scale of mail storage. Running across such poor performance with such relatively small loads disqualified the DB-based mail products very very quickly, for ME, anyway.
If you want to run your own tests, my suggestion is to start with Postgres, put as much RAM into your DB machine as you can afford, and maybe populate your DB machine exclusively with SSDs.
=R=