On 8.4.2007, at 10.29, Daniel L. Miller wrote:
This is probably a scary thought, but . . . what would it take for
the indexing part of Dovecot to be implemented via an API/plug-in
model? I'm curious about the effect of using an external SQL
engine (my vote would be Firebird) for processing these, and using
a open plug-in method would allow for that without binding Dovecot
to a particular implementation.
Well.. It would be possible to make the lib-index API completely
virtualized, but I don't think there's much point. The lib-index API
actually doesn't have all that much to do with reading/writing index
files. It's much more about easily manipulating mailbox metadata in
memory.
For example the way I was planning on implementing SQL mail storage
was to create an in-memory index and keep it updated by reading the
data from SQL. The same metadata is in SQL, but it still needs to be
stored into Dovecot's internal structures (== the indexes).
Then there is however dovecot.index.cache file. It's a pretty simple
database, so replacing it with SQL would make more sense. The cache
file API isn't virtualizable yet either, but I was planning on doing
that if I ever got around to making the SQL mail storage plugin
really usable.
SQL cache replacement would need to be a bit tricky however to work.
Currently lib-storage API works like:
- mail_alloc() is done first. It tells what fields it most likely
wants to fetch. - mail_set_seq() can be used to switch to whatever message in mailbox
- mail_get_*() functions can be used to fetch the message data.
The simplest SQL implementation would just do a SQL query for each
mail_get_*() call, but this would also be the slowest implementation.
A bit better would be to use one SQL query to fetch all the data
specified by mail_alloc() in the first mail_get_*() function call.
Then if something extra is fetched that would generate extra SQL
queries.
However most of the time mail_set_seq() isn't used randomly. It's
mostly done only when building a reply for THREAD command. Usually
searching is used:
- mailbox_search_init() specifies search arguments. For FETCH
commands this is simple "sequences 1-10". - mailbox_search_next() finds the next match and calls mail_set_seq () for that mail
So hooking into these functions you could figure out in the _init()
that you want to do the SQL query for messages 1-10, and the first
call to _next() tells you the mail structure where you can get the
list of wanted fields. So IMAP command:
UID FETCH 1:5,10:20 (ENVELOPE BODY INTERNALDATE)
Could be done with a single SQL query, something like:
select envelope, body, internaldate from message_cache where uid
between 1 and 5 or uid between 10 and 20;