[Dovecot] FTS Plugin design

Rui Carneiro rui.arc at gmail.com
Fri Apr 17 12:03:38 EEST 2009


Thank you for all the tips. The design looks clearer to me now.

I have one more question. I looked into fts_build_want_index_part() and
saw that I need to add some flags to message_part_flags. What values should
I choose? My first approach was to follow your scheme and set
MESSAGE_PART_FLAG_ATTACHMENT = 0x16. Is there any problem with this?
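
One way to sanity-check a candidate flag value is to test whether it has exactly one bit set and whether it shares bits with flags already in use. The sketch below assumes that message_part_flags is a bitmask whose members each occupy a single bit, which this thread does not confirm. Both helper names are hypothetical and are not part of Dovecot.

```c
#include <stdbool.h>

/* Hypothetical helper: true if value has exactly one bit set, i.e. it can
   act as an independent flag in a bitmask. */
static bool flag_is_single_bit(unsigned int value)
{
	return value != 0 && (value & (value - 1)) == 0;
}

/* Hypothetical helper: true if candidate shares any bit with the mask of
   flags that are already in use. */
static bool flag_overlaps(unsigned int candidate, unsigned int used_mask)
{
	return (candidate & used_mask) != 0;
}
```

For example, flag_is_single_bit(0x16) is false because 0x16 is 0x10 | 0x04 | 0x02, while flag_is_single_bit(0x10) is true. If the assumption about the enum holds, a value such as 0x16 would be read as a combination of three flags rather than as one new flag.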

I have already changed parse_content_type() to set ctx->part->flags
correctly, but if I choose my custom flag, Dovecot assumes that all
attachment lines are headers. I also tried setting ctx->part->flags to TEXT,
and in that case the fts_backend was fed all the attachment lines correctly.

I don't know if this is related to the value of
MESSAGE_PART_FLAG_ATTACHMENT or if I am missing something (like setting
block.hdr = NULL or some more code to handle the new flags).

Thank you,
Rui Carneiro

On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen <tss at iki.fi> wrote:

> On Mon, 2009-04-13 at 11:18 +0100, Rui Carneiro wrote:
> > I haven't yet understood the plugin's design and how the plugins are
> > called from the core system, and I was wondering if anyone could help
> > me with that.
>
> fts-storage.c hooks into all the functions in the mail-storage API that
> it needs to. Currently indexing isn't done while messages are being saved,
> but instead just before searching. The searching functions are:
>
>  - fts_mailbox_search_init() tries to figure out if FTS can optimize the
> search. If it can, it tries to figure out if the FTS index is up-to-date
> and if not, starts the indexing.
>
>  - fts_mailbox_search_next_nonblock() continues the indexing (or
> searching after indexing) for a while. The idea is that the IMAP
> connection is able to process other commands while doing a long-running
> search. So the fts plugin indexes FTS_SEARCH_NONBLOCK_COUNT (50) messages
> at a time. It would be nice if that value was dynamically calculated and
> also based on bytes instead of messages, but that's maybe too much trouble.
>
>  - fts_mailbox_search_next_update_seq() uses the fts search results and
> updates mail-storage's search stuff so that it doesn't go through
> messages that don't match.
>
>  - fts_build_mail() indexes a single mail. It parses the messages and
> returns the data in small blocks. For text/* and message/rfc822 parts
> those blocks are currently sent to FTS backend. This is where I think
> you should look into hooking your attachment parsing. Change
> fts_build_want_index_part() to look for more content-types that you're
> interested in and then before feeding the blocks to FTS backend put them
> through your own converter function, something like:
>
> int attachment_extract_text(struct attachment_extract_context *ctx,
> const struct message_block *input, struct message_block *output);
>
>
>
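
The converter hook suggested in the quoted reply could look roughly like the sketch below. Only the attachment_extract_text() prototype comes from the thread. The struct layouts are simplified stand-ins (the real struct message_block in Dovecot has more fields), and the contents of struct attachment_extract_context, the return-value convention, and the "keep printable characters" extraction are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stddef.h>

/* Simplified stand-in for Dovecot's struct message_block (assumption:
   only the data/size pair matters for this sketch). */
struct message_block {
	const unsigned char *data;
	size_t size;
};

/* Hypothetical context: a scratch buffer that holds the converted text. */
struct attachment_extract_context {
	unsigned char buf[4096];
};

/* Assumed convention: returns 1 if output holds text to index, 0 if the
   block produced none. The "conversion" only keeps printable ASCII and
   whitespace, standing in for a real per-content-type extractor. Input
   beyond the scratch buffer size is dropped in this sketch. */
int attachment_extract_text(struct attachment_extract_context *ctx,
			    const struct message_block *input,
			    struct message_block *output)
{
	size_t i, n = 0;

	for (i = 0; i < input->size && n < sizeof(ctx->buf); i++) {
		unsigned char c = input->data[i];

		if (isprint(c) || isspace(c))
			ctx->buf[n++] = c;
	}
	output->data = ctx->buf;
	output->size = n;
	return n > 0 ? 1 : 0;
}
```

In this arrangement the caller would pass each body block through the converter and feed the output block to the FTS backend only when the function returns 1. The output points into the context's buffer, so it stays valid only until the next call with the same context.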


-- 
mobile: +351 963446125
mail: rui.arc at gmail.com
mail: ei04073 at fe.up.pt
website: http://paginas.fe.up.pt/~ei04073

