[Dovecot] FTS Plugin design

Rui Carneiro rui.arc at gmail.com
Tue May 5 14:08:41 EEST 2009


Hi again,

On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen <tss at iki.fi> wrote:

>  - fts_build_mail() indexes a single mail. It parses the messages and
> returns the data in small blocks. For text/* and message/rfc822 parts
> those blocks are currently sent to FTS backend. This is where I think
> you should look into hooking your attachment parsing. Change
> fts_build_want_index_part() to look for more content-types that you're
> interested in and then before feeding the blocks to FTS backend put them
> through your own converter function, something like:
>
> int attachment_extract_text(struct attachment_extract_context *ctx,
> const struct message_block *input, struct message_block *output);


Let's take the application/pdf content-type as an example. Before I can
convert the PDF data to text, I need to have gathered all of it. The current
process feeds the FTS backend small pieces of data, which the "build_more"
functions (e.g. fts_backend_solr_build_more()) append one after another.

So where should I call attachment_extract_text()? In
fts_backend_solr_build_more(), without appending anything to cmd until the
data has been extracted? Or should I gather all the data earlier (e.g. in
fts_build_mail()) and send it to the FTS backend all at once?
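
Whichever place it is called from, the converter would have to buffer its
input. Here is a rough sketch of what I mean, not real Dovecot code. The
assumptions are mine: struct block is a simplified stand-in for struct
message_block, a block with size 0 is taken to mark the end of the part,
convert_to_text() is a placeholder for a real PDF converter, and
attachment_extract_deinit() and the return values 0/1/-1 are made up for
the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Dovecot's struct message_block (hypothetical). */
struct block {
	const unsigned char *data;
	size_t size;	/* assumption: 0 marks the end of the part */
};

/* Hypothetical context: collects the raw attachment bytes. */
struct attachment_extract_context {
	unsigned char *buf;
	size_t used, alloc;
};

/* Placeholder converter. A real one would run a PDF-to-text tool;
   this one only keeps printable ASCII. It never writes more bytes
   than it has read, so out may be the same buffer as in. */
static size_t convert_to_text(const unsigned char *in, size_t in_size,
			      unsigned char *out)
{
	size_t i, n = 0;

	for (i = 0; i < in_size; i++) {
		if (in[i] >= 0x20 && in[i] < 0x7f)
			out[n++] = in[i];
	}
	return n;
}

/* Returns 0 if more input is needed, 1 if output holds the extracted
   text, -1 on allocation failure. */
int attachment_extract_text(struct attachment_extract_context *ctx,
			    const struct block *input, struct block *output)
{
	assert(ctx != NULL && input != NULL && output != NULL);

	if (input->size > 0) {
		/* Middle of the part: only buffer, produce nothing yet. */
		if (ctx->used + input->size > ctx->alloc) {
			size_t new_alloc = (ctx->used + input->size) * 2;
			unsigned char *new_buf = realloc(ctx->buf, new_alloc);

			if (new_buf == NULL)
				return -1;
			ctx->buf = new_buf;
			ctx->alloc = new_alloc;
		}
		memcpy(ctx->buf + ctx->used, input->data, input->size);
		ctx->used += input->size;
		return 0;
	}

	/* End of the part: convert everything gathered so far, in place.
	   The output stays valid until the next call on this context. */
	output->data = ctx->buf;
	output->size = convert_to_text(ctx->buf, ctx->used, ctx->buf);
	ctx->used = 0;
	return 1;
}

/* Frees the buffer and resets the context. */
void attachment_extract_deinit(struct attachment_extract_context *ctx)
{
	free(ctx->buf);
	memset(ctx, 0, sizeof(*ctx));
}
```

With something like this the caller would pass every block through the
function and only hand the output to the FTS backend when it returns 1, so
the backend's build_more function would not need to know about attachments
at all.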

I hope I've made myself clear.

Regards,
Rui Carneiro
-- 
Portugalmail, Comunicações S.A.
www.portugalmail.net

