[Dovecot] FTS Plugin design
Rui Carneiro
rui.carneiro at portugalmail.net
Fri May 15 19:47:57 EEST 2009
Quoting Timo Sirainen <tss at iki.fi>:
> 1. You notice a non-text/* content-type and initialize text extraction
> for the MIME part. Like:
>
> struct attachment_extract_context *
> attachment_extract_init(const char *content_type);
>
> 2. After this you feed all the input belonging to that MIME part to:
>
> int attachment_extract_add(struct attachment_extract_context *ctx,
> const struct message_block *input);
>
> Don't output anything to the FTS backend at this point.
> attachment_extract_add() would probably just write to a
> temporary file.
>
> 3. Finally you'll notice that the MIME part ends (either you get headers
> for the next MIME part or the entire message ends). Then finish the
> extraction, which actually executes whatever conversion binaries are needed:
>
> int attachment_extract_finish(struct attachment_extract_context *ctx);
>
> 4. Get the resulting text to fts_backend_build_more() somehow: either
> an attachment_extract_add_to_fts() that adds it internally, or some
> kind of iterator that returns the text in smaller blocks. Either
> would work.
>
> That kind of API would also make it fairly easy to change later so
> that it doesn't write temporary files for content types that don't
> require them.
>
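A minimal sketch of the four quoted steps in C, assuming plain stdio instead of Dovecot's own lib (istreams, ostreams, memory pools). struct message_block is reduced to a hypothetical data/size pair; the content-type table and convert_copy() are hypothetical stand-ins for running an external conversion binary; attachment_extract_next() (the iterator variant of step 4) and attachment_extract_deinit() are hypothetical names, not existing Dovecot functions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for Dovecot's struct message_block:
   only the body bytes of the current MIME part are modelled. */
struct message_block {
	const unsigned char *data;
	size_t size;
};

/* Reads the raw part from `in` and writes plain text to `out`; in the
   plugin this would run an external conversion binary instead. */
typedef int attachment_converter_t(FILE *in, FILE *out);

/* Hypothetical converter: copies the bytes through unchanged. */
static int convert_copy(FILE *in, FILE *out)
{
	int c;
	while ((c = fgetc(in)) != EOF)
		if (fputc(c, out) == EOF)
			return -1;
	return ferror(in) ? -1 : 0;
}

/* Hypothetical content-type -> converter table. */
static const struct {
	const char *content_type;
	attachment_converter_t *convert;
} converters[] = {
	{ "application/x-demo", convert_copy },
};

struct attachment_extract_context {
	attachment_converter_t *convert;
	FILE *raw;               /* step 2: raw MIME part spooled to a temp file */
	FILE *text;              /* step 3: converter output */
	unsigned char buf[4096]; /* step 4: block currently handed to the caller */
};

/* Step 1: returns NULL if no converter handles content_type. */
struct attachment_extract_context *
attachment_extract_init(const char *content_type)
{
	for (size_t i = 0; i < sizeof(converters) / sizeof(converters[0]); i++) {
		if (strcmp(converters[i].content_type, content_type) != 0)
			continue;
		struct attachment_extract_context *ctx = calloc(1, sizeof(*ctx));
		if (ctx == NULL)
			return NULL;
		ctx->convert = converters[i].convert;
		if ((ctx->raw = tmpfile()) == NULL) {
			free(ctx);
			return NULL;
		}
		return ctx;
	}
	return NULL;
}

/* Step 2: nothing goes to the FTS backend yet; just append to the temp file. */
int attachment_extract_add(struct attachment_extract_context *ctx,
			   const struct message_block *input)
{
	return fwrite(input->data, 1, input->size, ctx->raw) == input->size ? 0 : -1;
}

/* Step 3: the MIME part has ended, so run the conversion. */
int attachment_extract_finish(struct attachment_extract_context *ctx)
{
	rewind(ctx->raw);
	if ((ctx->text = tmpfile()) == NULL || ctx->convert(ctx->raw, ctx->text) < 0)
		return -1;
	rewind(ctx->text);
	return 0;
}

/* Step 4, iterator variant (call only after a successful finish): hands out
   the text in blocks of at most sizeof(buf) bytes; *size_r == 0 means done. */
int attachment_extract_next(struct attachment_extract_context *ctx,
			    const unsigned char **data_r, size_t *size_r)
{
	*size_r = fread(ctx->buf, 1, sizeof(ctx->buf), ctx->text);
	*data_r = ctx->buf;
	return ferror(ctx->text) ? -1 : 0;
}

void attachment_extract_deinit(struct attachment_extract_context *ctx)
{
	if (ctx->raw != NULL)
		fclose(ctx->raw);
	if (ctx->text != NULL)
		fclose(ctx->text);
	free(ctx);
}
```

Each block returned by attachment_extract_next() is what would be passed on to fts_backend_build_more(). Because callers only see the four calls, a converter that can stream its input could later bypass the temporary file without changing them.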
I tried your approach and I think it is working pretty well. Now I only need to look carefully at the output of the external programs and build the XML correctly to send to Solr.
Thanks, Timo
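A minimal sketch of building a Solr XML update message (`<add><doc><field name="...">`) from converter output, assuming the text is already UTF-8. The field names "id" and "body" are assumptions that must match the Solr schema, and strbuf, xml_escape_append() and solr_add_doc_xml() are hypothetical helpers (Dovecot itself would use its own string_t). Besides escaping `&`, `<`, `>` and `"`, it replaces C0 control characters other than tab, LF and CR, which XML 1.0 does not allow and which converter output can contain (for example form-feed page breaks).

```c
#include <stdlib.h>
#include <string.h>

/* Minimal growable, NUL-terminated string buffer. */
struct strbuf {
	char *data;
	size_t len, cap;
};

static int strbuf_append(struct strbuf *sb, const char *s, size_t n)
{
	if (sb->len + n + 1 > sb->cap) {
		size_t cap = sb->cap ? sb->cap : 64;
		while (cap < sb->len + n + 1)
			cap *= 2;
		char *p = realloc(sb->data, cap);
		if (p == NULL)
			return -1;
		sb->data = p;
		sb->cap = cap;
	}
	memcpy(sb->data + sb->len, s, n);
	sb->len += n;
	sb->data[sb->len] = '\0';
	return 0;
}

/* Escape text for XML character data or a double-quoted attribute value. */
static int xml_escape_append(struct strbuf *sb, const char *s, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		unsigned char c = (unsigned char)s[i];
		const char *rep = NULL;
		switch (c) {
		case '&': rep = "&amp;"; break;
		case '<': rep = "&lt;"; break;
		case '>': rep = "&gt;"; break;
		case '"': rep = "&quot;"; break;
		default:
			/* XML 1.0 forbids C0 controls except TAB, LF and CR. */
			if (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
				rep = " ";
		}
		int ret = rep != NULL ? strbuf_append(sb, rep, strlen(rep))
				      : strbuf_append(sb, &s[i], 1);
		if (ret < 0)
			return -1;
	}
	return 0;
}

#define APPEND_LIT(sb, lit) strbuf_append((sb), (lit), sizeof(lit) - 1)

/* Returns a malloc()ed <add> message with one document, or NULL on error. */
char *solr_add_doc_xml(const char *id, const char *body, size_t body_len)
{
	struct strbuf sb = { NULL, 0, 0 };
	if (APPEND_LIT(&sb, "<add><doc><field name=\"id\">") < 0 ||
	    xml_escape_append(&sb, id, strlen(id)) < 0 ||
	    APPEND_LIT(&sb, "</field><field name=\"body\">") < 0 ||
	    xml_escape_append(&sb, body, body_len) < 0 ||
	    APPEND_LIT(&sb, "</field></doc></add>") < 0) {
		free(sb.data);
		return NULL;
	}
	return sb.data;
}
```

The sketch does not validate UTF-8; if a converter can emit other encodings, that would need to be checked or converted before escaping.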
Regards,
Rui Carneiro
--
Portugalmail, Comunicações S.A.
www.portugalmail.net