On 21.6.2004, at 21:45, Moe Wibble wrote:
I recently saw some benchmarks (measuring system load) comparing Dovecot mbox, maildir and Cyrus. Dovecot was much slower than I thought, Cyrus was many times faster in most tests. Dovecot with mbox was also much faster than with maildir, even though my 0.99.10 mbox code is pretty bad.
Strange... I'd think that rewriting the mbox files would cause a lot more performance issues than shuffling around files in a Maildir.
They were probably using mboxes that already had all the necessary X-UID etc. fields, so rewriting wouldn't need to do more than was really required.
0.99.10 indexes aren't too good, but I still find it a bit strange that Cyrus takes something like 10x less load. I'd think most of it has to do with maildir format itself, that it needs to rename files when flags change, and Dovecot needs to resync the whole maildir after each change in mailbox (and sometimes twice).
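To illustrate why a flag change costs a rename: maildir stores flags in the filename after a ":2," suffix, so marking a mail as seen means renaming its file. A minimal Python sketch follows. The helper name set_flag is made up for this example and this is not Dovecot's actual code.

```python
import os

def set_flag(path, flag):
    """Add one maildir flag (e.g. 'S' for Seen) by renaming the file.

    Hypothetical helper for illustration; returns the new path.
    """
    directory, name = os.path.split(path)
    # Everything after ":2," is the flag list; a file without the
    # suffix is treated as having no flags yet.
    base, _sep, flags = name.partition(":2,")
    # Maildir convention: flags are stored in ASCII order.
    new_flags = "".join(sorted(set(flags) | {flag}))
    new_path = os.path.join(directory, base + ":2," + new_flags)
    if new_path != path:
        # The rename changes the directory contents, so any other
        # reader's cached listing of the maildir becomes stale.
        os.rename(path, new_path)
    return new_path
```

Every such rename changes the directory listing, which is why a reader that cannot tell who made the change has to rescan.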
I haven't tested dovecot in a high load (multiuser) environment yet so I can't say much about the actual load in such a situation. But once the indexes are made (and don't break) what's really left to cause (unjustified) load?
With maildir the problem is that once it's modified by Dovecot, I can't know whether someone else modified it at the same time. So after I change anything in it, I'll have to resync the whole maildir again, just in case.
I guess we'll need an IMAP-optimized format sometime soon.
But please not before maildir support is stable again
I'd probably base it on maildir. Differences would be mostly just:
- filename = message UID
- new/ directory could exist so existing LDAs can easily add mail
- Dovecot-optimized LDAs would store mails directly into cur/
- if anyone modifies cur/ directory, Dovecot issues a warning when it's noticed but still syncs and fixes it. This fallback shouldn't affect performance much, since it would be done only if cur/ directory's timestamp didn't match the one in index.
- possibly rename the cur/ dir into something else
- possibly add some way to insert new mails into mailbox atomically (mkdir tra/1, put mails there, rename tra/1 tra-done/1 -> after that it's considered as committed and if found by any MUA the mails in it must be moved into cur/)
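The timestamp check and the tra/ -> tra-done/ commit idea from the list above could look roughly like this in Python. This is only a sketch under assumptions: the function names, the use of nanosecond mtimes and the cleanup of the emptied transaction directory are all invented for the example, and none of it is a finished design.

```python
import os

def needs_full_resync(cur_dir, indexed_mtime_ns):
    """True if cur/ was modified since the index recorded its mtime."""
    return os.stat(cur_dir).st_mtime_ns != indexed_mtime_ns

def deliver_atomically(mailbox, txn_id, messages):
    """Stage mails in tra/<id>, then commit by renaming to tra-done/<id>.

    messages maps a filename (here: the message UID) to its raw bytes.
    """
    staging = os.path.join(mailbox, "tra", txn_id)
    done_root = os.path.join(mailbox, "tra-done")
    os.makedirs(staging)
    os.makedirs(done_root, exist_ok=True)
    for name, data in messages.items():
        with open(os.path.join(staging, name), "wb") as f:
            f.write(data)
    committed = os.path.join(done_root, txn_id)
    # A single rename is the commit point: readers see either nothing
    # or the complete set of mails, never a partial delivery.
    os.rename(staging, committed)
    return committed

def apply_committed(mailbox):
    """Move mails of every committed transaction into cur/."""
    done_root = os.path.join(mailbox, "tra-done")
    cur = os.path.join(mailbox, "cur")
    os.makedirs(cur, exist_ok=True)
    moved = []
    if not os.path.isdir(done_root):
        return moved
    for txn_id in sorted(os.listdir(done_root)):
        txn_dir = os.path.join(done_root, txn_id)
        for name in sorted(os.listdir(txn_dir)):
            os.rename(os.path.join(txn_dir, name), os.path.join(cur, name))
            moved.append(name)
        os.rmdir(txn_dir)  # transaction fully applied
    return moved
```

Anything still sitting in tra/ was never committed and can be ignored or cleaned up. Only directories that made it to tra-done/ count.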
and shared folders (which I'd claim is a must-have for most corporate deployments) have been implemented. :)
1.0-test sort of supports shared folders, if you symlink them manually and create a "dovecot-shared" file with the permissions that should be used for new files. But that's only an ugly temporary hack and I'll have to figure out some better way.
I think proper support is a post-1.0 feature.
I do realize that getting the highest performance out of server-side searches may require moving to a prop. format.
Actually I don't think searching can be optimized at all by moving to another format.
But for me the drawbacks (uneasy access to the actual mails, probably backup issues/version incompatibilities) outweigh the advantages in most cases. Not to mention all the implementation effort that could be spent on smarter Maildir indexing
Unless the backends share 95% of their code and are compatible in most of the ways.. :)
and maybe a separate "search-optimized index" instead ;)
For the SEARCH command, the only way I know of to index data in a usable way is the Cyrus squat indexer, but it generates large indexes and it's slow to update, so most people aren't using it. I'll probably implement it one day though.
Anyway, 1.0 is nearing a state where I'd like to begin hearing benchmarks about it. Its mbox performance should be excellent.
I'm curious too (more about Maildir performance, tho).
Maildir performance should still get somewhat better before 1.0, but mbox has very few optimizations left that I can think of.