[Dovecot] mbox vs. maildir storage block waste
Timo Sirainen
tss at iki.fi
Tue Nov 13 01:02:26 EET 2012
On 13.11.2012, at 0.44, Robin wrote:
> On 11/11/2012 5:26 PM, Christoph Anton Mitterer wrote:
>> Have you made systematic tests? I.e. compared times for all of these
>> with those from the different dovecot backends.
>
> The choice of Dovecot backends made no substantial difference. I used maildir, sdbox, and mdbox. I also added SiS (with mdbox). Initial tests were on local multi-spindle RAID5 storage,
With local disks the tests often measure only the local RAM/CPU speed, unless you're testing thousands of users.
> but to handicap Dovecot, I pushed it over NFS (also Linux 3.2 on a local GigE segment). It wasn't slow enough to make dbmail competitive, even though you have to start turning off performance optimisation features in Dovecot to avoid NFS bugs.
NFS makes a better test case if you're measuring single-user performance. Much of the slowdown is probably due to index file access latency, although not all. In some cases Dovecot's mail prefetching can help (currently only with the maildir and sdbox backends on local disks, although nothing prevents it from working in other cases, even with a Dovecot-SQL backend).
>> I guess you’ve "only" tried dbmail?
>
> I did try Manitou, but the lack of a proper IMAP service for it made extensive "like for like" testing very difficult. Manitou is still in the very early days, alas. It also relies on the SQL DB's underlying authentication systems which is rather ... alarming. It performs quite a bit better than dbmail, but still it's not close to Dovecot. At the time I tested it, only custom-rolled clients could talk to it, i.e., no imap4/pop3 "gateways" to it.
Manitou seems to advertise itself as an email client, although it then also seems to say SQL is faster than IMAP (which doesn't make much sense in itself).
> I think I was most alarmed to see that the widely assumed benefits of putting mail on a SQL DB, i.e., fast searching/sorting, didn't actually happen in reality.
SQL has nothing that makes any type of email access even potentially efficient. SQL indexes are mostly B-trees, and there are about zero things in IMAP where I have thought of a tree index being even potentially useful. (Okay, potentially for expunging old mails when you have >1M mails in one folder. Not something you normally optimize for.)
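The one exception mentioned above, finding old mails to expunge in a very large folder, amounts to a range lookup on a sorted key. A minimal sketch, using binary search over a sorted list as a stand-in for a tree index; the function name and the data are hypothetical and not taken from Dovecot:

```python
import bisect

def count_older_than(sorted_timestamps, cutoff):
    """Return how many messages have a timestamp strictly below cutoff.

    sorted_timestamps must be in ascending order; the lookup is O(log n),
    which is the kind of access a sorted tree index would provide.
    """
    return bisect.bisect_left(sorted_timestamps, cutoff)

# Hypothetical folder with 2M messages, one per second of arrival time.
timestamps = list(range(2_000_000))
n_old = count_older_than(timestamps, 500_000)
print(n_old)
```

Everything before index `n_old` could then be expunged in one pass without scanning the rest of the folder.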
With most of Dovecot's optimized lookups, latency is the most important thing. SQL is bad for latency. With remote systems it's usually much faster to just download a 1 MB blob and parse it than to fetch a couple of 100-byte blocks.
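The latency trade-off above can be put into rough numbers. The sketch below is a simple cost model, not a measurement: it assumes each dependent request pays one fixed latency (network round trip plus query handling) and that transfer time is bytes divided by bandwidth. The 5 ms latency and 100 MB/s bandwidth are made-up illustrative values, and `fetch_time` and `break_even_lookups` are hypothetical helper names.

```python
def fetch_time(requests, total_bytes, latency_s, bandwidth_bytes_per_s):
    """Estimated wall time for sequential requests moving total_bytes."""
    return requests * latency_s + total_bytes / bandwidth_bytes_per_s

def break_even_lookups(blob_bytes, block_bytes, latency_s,
                       bandwidth_bytes_per_s, limit=10_000):
    """Smallest number of sequential small lookups that takes longer
    than fetching the whole blob in one request (None if above limit)."""
    blob = fetch_time(1, blob_bytes, latency_s, bandwidth_bytes_per_s)
    for n in range(1, limit + 1):
        small = fetch_time(n, n * block_bytes, latency_s, bandwidth_bytes_per_s)
        if small > blob:
            return n
    return None

# Assumed values, for illustration only.
LATENCY_S = 0.005   # 5 ms per request
BANDWIDTH = 100e6   # 100 MB/s, roughly a GigE link

n = break_even_lookups(1_000_000, 100, LATENCY_S, BANDWIDTH)
print(n)
```

Under these assumed numbers the single 1 MB download wins once three or more sequential 100-byte lookups are needed. With lower per-request latency the break-even point moves up, so the result depends heavily on the link.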
> As others have mentioned, I also shudder to think of backup/restore issues, especially on a single user level. The mechanisms of backing up and restoring maildirs and even mdboxes, i.e., simple files, are not only well understood, the failure modes are generally fully recoverable. SQL-DB file blobs, especially with MySQL, remind me too much of the "PST Hell" that Exchange administrators face. But maybe that's just my ignorance talking.
I'd think everyone would use the human-readable SQL dumps for database backups. At least with MySQL/PostgreSQL I wouldn't really trust anything else.