On Mon, 2009-04-13 at 11:18 +0100, Rui Carneiro wrote:
I haven't yet understood the plugin's design or how the plugins are called from the core system, and I was wondering if anyone could help me with that.
fts-storage.c hooks into all the functions in the mail-storage API that it needs to. Currently indexing isn't done while messages are being saved, but instead just before searching. The searching functions are:
fts_mailbox_search_init() tries to figure out if FTS can optimize the search. If it can, it checks whether the FTS index is up-to-date and, if not, starts the indexing.
fts_mailbox_search_next_nonblock() continues the indexing (or searching after indexing) for a while. The idea is that the IMAP connection is able to process other commands while a long-running search is in progress. So the fts plugin indexes FTS_SEARCH_NONBLOCK_COUNT (50) messages at a time. It would be nice if that value were dynamically calculated and also based on bytes instead of messages, but that's maybe too much trouble.
fts_mailbox_search_next_update_seq() uses the fts search results and updates mail-storage's search stuff so that it doesn't go through messages that don't match.
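The batching that fts_mailbox_search_next_nonblock() does can be sketched roughly as below. This is only an illustration, not the actual code in fts-storage.c: apart from FTS_SEARCH_NONBLOCK_COUNT and its value of 50, every name here (struct sketch_search_context, sketch_index_one(), sketch_index_more()) is made up for the example.

```c
#include <stdbool.h>

/* Value quoted in the text; everything else here is hypothetical. */
#define FTS_SEARCH_NONBLOCK_COUNT 50

/* Hypothetical stand-in for the per-search state. */
struct sketch_search_context {
	unsigned int next_seq;	/* next message sequence to index */
	unsigned int last_seq;	/* last message sequence in the mailbox */
	unsigned int indexed;	/* messages indexed so far */
};

/* Hypothetical stand-in for indexing one message. */
static void sketch_index_one(struct sketch_search_context *ctx,
			     unsigned int seq)
{
	(void)seq;
	ctx->indexed++;
}

/* Index at most FTS_SEARCH_NONBLOCK_COUNT messages, then return.
   Returns true when indexing is finished, false if the caller should
   come back later (after handling other commands). */
static bool sketch_index_more(struct sketch_search_context *ctx)
{
	unsigned int count = 0;

	while (ctx->next_seq <= ctx->last_seq &&
	       count < FTS_SEARCH_NONBLOCK_COUNT) {
		sketch_index_one(ctx, ctx->next_seq);
		ctx->next_seq++;
		count++;
	}
	return ctx->next_seq > ctx->last_seq;
}
```

With a mailbox of 120 messages the caller would need three calls: two that index 50 messages each and return false, and a third that indexes the remaining 20 and returns true. A byte-based limit would replace the count check with a running total of message sizes.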
fts_build_mail() indexes a single mail. It parses the message and returns the data in small blocks. For text/* and message/rfc822 parts those blocks are currently sent to the FTS backend. This is where I think you should look into hooking your attachment parsing. Change fts_build_want_index_part() to look for more content-types that you're interested in and then, before feeding the blocks to the FTS backend, put them through your own converter function, something like:
int attachment_extract_text(struct attachment_extract_context *ctx, const struct message_block *input, struct message_block *output);
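To make the idea a bit more concrete, here is a minimal sketch of how the content-type check and the converter could fit together. It is not Dovecot code: struct sketch_block is a simplified stand-in for struct message_block, application/pdf is only an example of an extra type, and the sketch_* names are hypothetical. The converter just passes data through where a real one would extract text from the attachment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/* Hypothetical, simplified stand-in for struct message_block. */
struct sketch_block {
	const unsigned char *data;
	size_t size;
};

/* Hypothetical converter state: only remembers the part's content type. */
struct sketch_extract_context {
	const char *content_type;
};

/* Decide whether a part should be indexed. The text/ types and
   message/rfc822 are the ones named in the text; application/pdf is an
   assumed example of an additional type. */
static bool sketch_want_index_part(const char *content_type)
{
	return strncasecmp(content_type, "text/", 5) == 0 ||
		strcasecmp(content_type, "message/rfc822") == 0 ||
		strcasecmp(content_type, "application/pdf") == 0;
}

/* Same shape as the proposed attachment_extract_text(): one block in,
   one block of text out. Returns 0 on success, -1 if the content type
   isn't handled. */
static int sketch_extract_text(struct sketch_extract_context *ctx,
			       const struct sketch_block *input,
			       struct sketch_block *output)
{
	if (!sketch_want_index_part(ctx->content_type))
		return -1;
	/* Placeholder: a real converter would turn attachment data into
	   plain text here. This sketch passes the block through as-is. */
	*output = *input;
	return 0;
}
```

Because the data arrives in small blocks, a real converter would probably need to keep state in its context between calls, since a format such as PDF can't be decoded one block at a time.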