Re: speed Re: [Dovecot] New index / mailbox API

24 Nov 2003

      On Mon, 2003-11-24 at 14:19, Farkas Levente wrote:
...
but I always feel that Dovecot reinvents the wheel, since there are a
dozen database systems which have nothing else to do but indexing (ok,
it's not true, but..). They probably do it the right way (or at least
we can find some that do) and they have a few years of experience. They
get the indexing, the locking, the transactions, etc. right.
So why don't we use one really fast and good database engine to index
our mail storage?
The only reason I can accept in this case is that this is some very
special type of database and Dovecot can use an algorithm which is
suited to this problem better than a general indexing algorithm. Is
this true?

Pretty much, yes. Databases normally do B-tree (or similar) indexes.
Sometimes hash and bitmap indexes. Dovecot doesn't really use any of
these. Or, well, the cache file would work pretty well in a database,
since it's for UID -> some cached message data lookups.
The rest of the indexes wouldn't work all that well in a database,
though. For example, the modify/transaction log can be used to quickly
figure out what another session changed in a mailbox (flags, expunges).
How many databases allow you to easily and quickly look at transaction
history? Well, it would be possible to create our own transaction log
table for each mailbox, but that of course costs more.
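
To illustrate the transaction log idea, here is a minimal sketch in Python. It is not Dovecot's file format or code: the names TransactionLog, Session and sync are hypothetical, and the log lives in memory instead of a file. Each session only remembers how far into the append-only log it has read, so finding out what another session changed means reading the records past that offset.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionLog:
    """Append-only list of changes to one mailbox (hypothetical model)."""
    records: list = field(default_factory=list)

    def append(self, kind, uid, detail=None):
        # Sessions only ever append; existing records are never rewritten.
        self.records.append((kind, uid, detail))


@dataclass
class Session:
    """One session with its own view of the mailbox (hypothetical model)."""
    log: TransactionLog
    flags: dict = field(default_factory=dict)  # UID -> set of flags
    offset: int = 0                            # number of records already applied

    def sync(self):
        """Apply and return only the records appended since the last sync."""
        new = self.log.records[self.offset:]
        for kind, uid, detail in new:
            if kind == "append":
                self.flags[uid] = set()
            elif kind == "flags":
                self.flags[uid] = set(detail)
            elif kind == "expunge":
                self.flags.pop(uid, None)
        self.offset = len(self.log.records)
        return new


log = TransactionLog()
a, b = Session(log), Session(log)
log.append("append", 1)
log.append("append", 2)
a.sync()
b.sync()

# Some other session sets a flag and expunges a message.
log.append("flags", 1, ["\\Seen"])
log.append("expunge", 2)

# Session b sees exactly those two changes, without rescanning the mailbox.
changes = b.sync()
```

The cost of a sync is proportional to the number of new records, not to the size of the mailbox, which is the property a generic database table would have to emulate.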
Another problem is how to do message sequence -> UID lookups. With the
old indexes we're doing it in quite a difficult way, but with the new
indexes it's a simple array lookup in the index file. I don't think any
database allows this kind of query, so we would have to constantly keep
an array of all message UIDs in memory. Not necessarily that bad, but
it's extra memory overhead.
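
As a rough sketch of the array lookup (in Python, with a plain list standing in for the index file; the function names are made up for illustration): IMAP sequence numbers are 1-based positions, and UIDs are strictly ascending, so sequence -> UID is direct indexing. The reverse direction is not discussed above; binary search is one obvious way to do it on the same array.

```python
from bisect import bisect_left

# UIDs in mailbox order: strictly ascending, with gaps left by expunged
# messages. A message's sequence number is its 1-based position here.
uids = [3, 4, 9, 15, 16]


def seq_to_uid(uids, seq):
    """Sequence -> UID: a plain array lookup."""
    if not 1 <= seq <= len(uids):
        raise IndexError("no such message sequence")
    return uids[seq - 1]


def uid_to_seq(uids, uid):
    """UID -> sequence: binary search, possible because the array is sorted."""
    i = bisect_left(uids, uid)
    if i == len(uids) or uids[i] != uid:
        return None  # UID was expunged or never existed
    return i + 1


def expunge(uids, seq):
    """Remove a message; every later message moves down one sequence."""
    del uids[seq - 1]
```

After an expunge the sequence numbers of all later messages shift while their UIDs stay the same, which is why the mapping has to be a position in an ordered array and not a stored column.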
Anyway, some day I will write SQL database support, but even if it was
just Berkeley DB I bet there would be a rather large memory, disk and
CPU usage overhead compared to what we have now.
I'm not sure if there are lock contention problems with databases.
...
Another thing which always comes to my mind when I think about speed:
why do we do the indexing when we look into the folders? IMHO it would
be more efficient to do it at mail delivery time. Mail arrival is spread
more evenly over time, so the system load would be more balanced.
So there could be:

- one optional delivery helper application which does the indexing
  at delivery time,
- indexing during remove, move, copy, etc. IMAP operations,
- and the current one (e.g. the new indexing engine) for those who do
  not use the delivery helper application.

Yep, this has been in the TODO for a while. Once this is possible it's
also simple to add support for the UIDPLUS extension, and to update
mailbox quotas quickly and accurately. It's a bit difficult to implement
in the 0.99.10 code base, but should be pretty easy to add to current
CVS. Once the new indexing works, I'll add this immediately.
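
A toy sketch of why indexing at delivery time helps, written in Python with hypothetical names throughout (Mailbox, deliver, expunge and append_response are not Dovecot APIs, and actually storing the message is left out). The assumption is that once the index is updated in the same step as delivery, the new UID is known immediately, which is what the APPENDUID response code of UIDPLUS needs, and quota usage can be kept as a running total.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """Toy in-memory mailbox index; every field name is hypothetical."""
    uid_validity: int
    next_uid: int = 1
    uids: list = field(default_factory=list)   # position (seq - 1) -> UID
    sizes: dict = field(default_factory=dict)  # UID -> message size in bytes
    quota_used: int = 0                        # running total in bytes


def deliver(mailbox, message: bytes):
    """Index a newly delivered message and return the UID it was given."""
    uid = mailbox.next_uid
    mailbox.next_uid += 1
    mailbox.uids.append(uid)
    mailbox.sizes[uid] = len(message)
    mailbox.quota_used += len(message)
    return uid


def expunge(mailbox, uid):
    """Removing a message updates the index and the quota as well."""
    mailbox.uids.remove(uid)
    mailbox.quota_used -= mailbox.sizes.pop(uid)


def append_response(mailbox, uid):
    """Format the APPENDUID response code defined by UIDPLUS."""
    return f"[APPENDUID {mailbox.uid_validity} {uid}]"


box = Mailbox(uid_validity=38505)
first = deliver(box, b"Subject: a\r\n\r\nhello")
second = deliver(box, b"Subject: b\r\n\r\nhi")
```

With this shape, opening a folder never has to scan for unindexed mail, and the same bookkeeping runs for copy and move operations as for delivery.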

Timo Sirainen