Re: [Dovecot] FTS Plugin design

15 May 2009

Quoting Timo Sirainen <tss@iki.fi>:
...

You notice a non-text/* content-type and initialize text extraction
for the MIME part. Like:

struct attachment_extract_context *
attachment_extract_init(const char *content_type);

After this you feed all the input belonging to that MIME part to:

int attachment_extract_add(struct attachment_extract_context *ctx,
                           const struct message_block *input);

Don't output anything to the FTS backend at this point.
attachment_extract_add() would probably just write to a
temporary file.
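
A minimal sketch of the init/add pair described above, assuming plain libc rather than Dovecot's own memory pools and streams. The struct message_block here is a simplified stand-in with only a data pointer and a size (an assumption, the real struct has more fields), and attachment_extract_deinit() is a hypothetical cleanup helper that is not part of the proposed API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Dovecot's struct message_block (assumption):
   only the raw data pointer and its length are modeled. */
struct message_block {
	const unsigned char *data;
	size_t size;
};

struct attachment_extract_context {
	char *content_type;   /* copy of the MIME type given to init */
	FILE *tmp;            /* temporary file buffering the raw MIME part */
	size_t bytes_written; /* total bytes buffered so far */
};

struct attachment_extract_context *
attachment_extract_init(const char *content_type)
{
	struct attachment_extract_context *ctx = calloc(1, sizeof(*ctx));

	if (ctx == NULL)
		return NULL;
	ctx->content_type = malloc(strlen(content_type) + 1);
	ctx->tmp = tmpfile();
	if (ctx->content_type == NULL || ctx->tmp == NULL) {
		if (ctx->tmp != NULL)
			fclose(ctx->tmp);
		free(ctx->content_type);
		free(ctx);
		return NULL;
	}
	strcpy(ctx->content_type, content_type);
	return ctx;
}

/* Buffers one block of the MIME part. Nothing is sent to the FTS
   backend here. Returns 0 on success, -1 on a write failure. */
int attachment_extract_add(struct attachment_extract_context *ctx,
			   const struct message_block *input)
{
	assert(ctx != NULL && input != NULL);
	if (input->size == 0)
		return 0;
	if (fwrite(input->data, 1, input->size, ctx->tmp) != input->size)
		return -1;
	ctx->bytes_written += input->size;
	return 0;
}

/* Hypothetical cleanup helper, not part of the API proposed above. */
void attachment_extract_deinit(struct attachment_extract_context *ctx)
{
	fclose(ctx->tmp);
	free(ctx->content_type);
	free(ctx);
}
```

The caller would invoke attachment_extract_add() once per block it receives for the MIME part, and only look at the buffered data once the part has ended.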

Finally you'll notice that the MIME part ends (either you get headers
for the next MIME part or the entire message ends). Then finish the
extraction, which actually runs whatever conversion binaries are needed:

int attachment_extract_finish(struct attachment_extract_context *ctx);
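
One thing attachment_extract_finish() would need is a way to pick the conversion binary for a content type. A rough sketch of such a lookup follows. The function name, the table, and the two mappings in it are illustrative assumptions only (pdftotext and catdoc are existing tools, but nothing in this thread says they are the ones to use), and the content type is assumed to have had any parameters such as "; name=..." stripped already:

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h>  /* strcasecmp() (POSIX) */

/* Hypothetical mapping from a MIME type to a converter command. */
struct attachment_converter {
	const char *content_type;
	const char *command;
};

/* Illustrative entries only; a real table would come from configuration. */
static const struct attachment_converter converters[] = {
	{ "application/pdf", "pdftotext" },
	{ "application/msword", "catdoc" },
};

/* Returns the converter command for the content type, or NULL if
   there is none. MIME types are compared case-insensitively. */
const char *attachment_converter_lookup(const char *content_type)
{
	size_t i;

	assert(content_type != NULL);
	for (i = 0; i < sizeof(converters) / sizeof(converters[0]); i++) {
		if (strcasecmp(converters[i].content_type, content_type) == 0)
			return converters[i].command;
	}
	return NULL;
}
```

A NULL result would mean the part is skipped rather than indexed.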

Get the resulting text to fts_backend_build_more() somehow: either
some attachment_extract_add_to_fts() that adds it internally, or some
kind of iterator that returns the text in smaller blocks. Either
would work.
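
The iterator variant could look roughly like the sketch below. It assumes the converter output is readable through a stdio FILE, and all names in it (extract_text_iter and its functions) are hypothetical:

```c
#include <assert.h>
#include <stdio.h>

#define EXTRACT_BLOCK_SIZE 4096

/* Hypothetical iterator over the extracted text. */
struct extract_text_iter {
	FILE *in;                     /* converter output */
	size_t block_size;            /* max bytes returned per step */
	char buf[EXTRACT_BLOCK_SIZE]; /* storage for the current block */
};

/* block_size of 0, or one larger than the buffer, falls back to
   EXTRACT_BLOCK_SIZE. */
void extract_text_iter_init(struct extract_text_iter *iter, FILE *in,
			    size_t block_size)
{
	assert(iter != NULL && in != NULL);
	iter->in = in;
	if (block_size == 0 || block_size > sizeof(iter->buf))
		block_size = sizeof(iter->buf);
	iter->block_size = block_size;
}

/* Reads the next block. Returns its length and points *data_r at it;
   returns 0 at end of input or on a read error. The data is only
   valid until the next call. */
size_t extract_text_iter_next(struct extract_text_iter *iter,
			      const char **data_r)
{
	size_t n;

	assert(iter != NULL && data_r != NULL);
	n = fread(iter->buf, 1, iter->block_size, iter->in);
	*data_r = iter->buf;
	return n;
}
```

Each returned block would then be handed to fts_backend_build_more(). Note that this sketch splits on byte counts, so a multi-byte UTF-8 character can end up divided between two blocks; a real version would need to handle that.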

That kind of API would also make it fairly easy to change later so that
temporary files aren't written for specific content types when they
aren't required.

I tried your approach and I think it is working pretty well. Now I only
need to look carefully at the output of the external programs and build
the XML correctly to send to Solr.
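
For the XML part, the main risk is that converter output contains characters that are not allowed as-is inside a Solr update document (the <add><doc><field name="...">...</field></doc></add> form). A minimal escaping sketch, with xml_escape() as a hypothetical helper name; the field names to use depend on the Solr schema and are not shown:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Escapes src for use as XML character data and writes the
   NUL-terminated result to dst. Control characters that XML 1.0
   forbids are dropped. Returns 0 on success, -1 if dst is too small. */
int xml_escape(const char *src, char *dst, size_t dst_size)
{
	size_t used = 0;

	assert(src != NULL && dst != NULL);
	for (; *src != '\0'; src++) {
		unsigned char c = (unsigned char)*src;
		char one[2] = { (char)c, '\0' };
		const char *rep;
		size_t len;

		switch (c) {
		case '&': rep = "&amp;"; break;
		case '<': rep = "&lt;"; break;
		case '>': rep = "&gt;"; break;
		default:
			/* Only tab, LF and CR are legal below 0x20. */
			if (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
				continue;
			rep = one;
			break;
		}
		len = strlen(rep);
		if (used + len + 1 > dst_size)
			return -1;
		memcpy(dst + used, rep, len);
		used += len;
	}
	if (used + 1 > dst_size)
		return -1;
	dst[used] = '\0';
	return 0;
}
```

This does not validate that the input is well-formed UTF-8, which the converter output would also need to be before it goes to Solr.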
Thanks, Timo.

Regards,
Rui Carneiro
--
Portugalmail, Comunicações S.A.
www.portugalmail.net