On Mon, 2003-11-24 at 14:19, Farkas Levente wrote:
but I always feel that dovecot reinvents the wheel, since there are a dozen database systems which have nothing else to do but indexing (ok, it's not true, but..). they probably do it the right way (or at least we can find some that do) and they have a few years of experience. they get the indexing, the locking, the transactions, etc. right. so why don't we use one really fast and good database engine to index our mail storage? the only reason I can accept in this case is that this is some very special type of database and dovecot can use an algorithm which is suited to this problem better than general indexing algorithms. is this true?
Pretty much, yes. Databases normally use B-tree (or similar) indexes, and sometimes hash and bitmap indexes. Dovecot doesn't really use any of these. Or, well, the cache file would work pretty well in a database, since it's only used for UID -> cached message data lookups.
The rest of the indexes wouldn't work all that well in a database though. For example, the modify/transaction log can be used to quickly figure out what another session changed in a mailbox (flags, expunges). How many databases allow you to easily and quickly look at transaction history? Well, it would be possible to create our own transaction log table for each mailbox, but of course that costs more.
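To make the transaction log point a bit more concrete, here is a minimal sketch in Python. It is not Dovecot's actual file format or code; the TransactionLog class, its method names and the record layout are all made up for illustration. The only idea it shows is that changes are appended to a per-mailbox log and each session remembers how far it has read.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionLog:
    """Hypothetical append-only list of changes made to one mailbox."""
    records: list = field(default_factory=list)

    def append(self, kind, uid, data=None):
        # Every change is appended; nothing is rewritten in place.
        self.records.append((kind, uid, data))
        return len(self.records)  # new end-of-log position

    def changes_since(self, position):
        # A session only has to read the tail it has not seen yet.
        return self.records[position:], len(self.records)


log = TransactionLog()
seen_by_a = 0  # session A has not read anything yet

# Session B changes the mailbox.
log.append("flags", 101, ("\\Seen",))
log.append("expunge", 57)

# Session A catches up by reading only the new records.
changes, seen_by_a = log.changes_since(seen_by_a)
for kind, uid, data in changes:
    print(kind, uid, data)
```

Catching up costs only as much as the number of changes made since the session last looked, which is the part that is awkward to get from a general-purpose database without maintaining a separate log table.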
Another problem is how to do message sequence -> UID lookups. With the old indexes we're doing it in a quite difficult way, but with the new indexes it's a simple array lookup in the index file. I don't think any database allows this kind of query, so we'd have to constantly keep an array of all message UIDs in memory. Not necessarily that bad, but it's extra memory overhead.
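As a rough illustration of the array lookup, assuming the UIDs are simply kept in ascending order in a list (the function names here are hypothetical, not Dovecot's API). The reverse UID -> sequence lookup with a binary search is my addition for completeness, it isn't discussed above.

```python
from bisect import bisect_left

# UIDs of the messages currently in the mailbox, in ascending order.
uids = [3, 7, 8, 15, 42]


def seq_to_uid(uids, seq):
    # IMAP sequence numbers start at 1, so this is a plain array index.
    if not 1 <= seq <= len(uids):
        raise IndexError("no such message sequence")
    return uids[seq - 1]


def uid_to_seq(uids, uid):
    # Binary search works because UIDs only ever grow within a mailbox.
    i = bisect_left(uids, uid)
    if i == len(uids) or uids[i] != uid:
        return None  # UID not present (never existed or expunged)
    return i + 1


def expunge(uids, seq):
    # Removing a message shifts the sequence numbers of all later ones.
    del uids[seq - 1]


first = seq_to_uid(uids, 4)
expunge(uids, 2)
second = seq_to_uid(uids, 2)
print(first, second, uid_to_seq(uids, 7))
```

The expunge step is what makes this hard to express as a database query: the sequence number of a message is just its current position among the remaining messages, not a stored value.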
Anyway, some day I will write SQL database support, but even if it was just Berkeley DB I bet there would be rather large memory, disk usage and CPU usage overhead compared to what we have now.
I'm not sure if there are lock contention problems with databases.
another thing which always comes to my mind when I think about speed: why do we do the indexing when we look into the folders? IMHO it'd be more efficient to do it at mail delivery time. mail arrival is spread more evenly over time, so the system load would be more balanced. so there could be delivery helper apps:
- one optional delivery helper application which can do the indexing at delivery time,
- indexing during remove, move, copy etc. imap operations,
- and the current (eg, the new indexing engine) if someone do not use
Yep, this has been in the TODO for a while. Once this is possible it's also simple to add support for the UIDPLUS extension, and to update mailbox quotas quickly and accurately. It's a bit difficult to implement in the 0.99.10 code base, but should be pretty easy to add to current CVS. Once the new indexing works, I'll add this immediately.
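A small sketch of why updating the index at delivery time helps with both, again in Python and with made-up names (MailboxIndex, deliver, appenduid_response); this is not how the code in CVS looks. It assumes the delivery step is the one that assigns the UID, so the UID and the new mailbox size are known as soon as the message is saved. The APPENDUID response code itself is what the UIDPLUS extension defines.

```python
class MailboxIndex:
    """Hypothetical in-memory stand-in for a mailbox index."""

    def __init__(self, uid_validity):
        self.uid_validity = uid_validity
        self.next_uid = 1
        self.uids = []
        self.bytes_used = 0

    def deliver(self, message: bytes):
        # The UID is assigned when the message is saved, not when
        # some client later opens the mailbox.
        uid = self.next_uid
        self.next_uid += 1
        self.uids.append(uid)
        # Quota usage can be updated in the same step.
        self.bytes_used += len(message)
        return uid


def appenduid_response(index, uid):
    # UIDPLUS response code: APPENDUID <uidvalidity> <uid>
    return f"[APPENDUID {index.uid_validity} {uid}]"


index = MailboxIndex(uid_validity=1069676340)
uid = index.deliver(b"Subject: hi\r\n\r\nhello\r\n")
print(appenduid_response(index, uid))
print(index.bytes_used)
```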