[Dovecot] FTS Plugin design
rui.carneiro at portugalmail.net
Fri May 15 19:47:57 EEST 2009
Quoting Timo Sirainen <tss at iki.fi>:
> 1. You notice a non-text/* content-type and initialize text extraction
> for the MIME part. Like:
> struct attachment_extract_context *
> attachment_extract_init(const char *content_type);
> 2. After this you feed all the input belonging to that MIME part to:
> int attachment_extract_add(struct attachment_extract_context *ctx,
> const struct message_block *input);
> Don't output anything to FTS backend at this point. The
> attachment_extract_add() would probably just basically write to a
> temporary file.
> 3. Finally you'll notice that the MIME part ends (either you get headers
> for the next MIME part or the entire message ends). Then finish the
> extraction, which actually executes whatever conversion binaries are needed:
> int attachment_extract_finish(struct attachment_extract_context *ctx);
> 4. Get the resulting text to fts_backend_build_more() somehow. Either
> some attachment_extract_add_to_fts() which internally adds it or some
> kind of an iterator that returns the text in smaller blocks. Either
> would work.
> That kind of an API would also make it possible to pretty easily modify
> in future to not write temporary files for specific content types if
> it's not required.
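To make the four steps above concrete, here is a minimal sketch of how they could fit together. It is not Dovecot code: struct message_block is a simplified stand-in that models only the data and size members, attachment_extract_read() and attachment_extract_deinit() are hypothetical names for the "iterator" idea in step 4 and for cleanup, and the finish step only rewinds the temporary file where a real version would run the conversion binary.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for Dovecot's struct message_block (assumption:
   only the data/size members are modelled here). */
struct message_block {
	const unsigned char *data;
	size_t size;
};

struct attachment_extract_context {
	char *content_type; /* e.g. "application/pdf" */
	FILE *tmp;          /* temporary file holding the raw MIME part */
	int finished;       /* set once attachment_extract_finish() ran */
};

/* Step 1: start extraction for a non-text MIME part. */
struct attachment_extract_context *
attachment_extract_init(const char *content_type)
{
	struct attachment_extract_context *ctx = calloc(1, sizeof(*ctx));

	if (ctx == NULL)
		return NULL;
	ctx->content_type = malloc(strlen(content_type) + 1);
	ctx->tmp = tmpfile();
	if (ctx->content_type == NULL || ctx->tmp == NULL) {
		if (ctx->tmp != NULL)
			fclose(ctx->tmp);
		free(ctx->content_type);
		free(ctx);
		return NULL;
	}
	strcpy(ctx->content_type, content_type);
	return ctx;
}

/* Step 2: buffer the input; nothing goes to the FTS backend yet. */
int attachment_extract_add(struct attachment_extract_context *ctx,
			   const struct message_block *input)
{
	assert(!ctx->finished);
	if (fwrite(input->data, 1, input->size, ctx->tmp) != input->size)
		return -1;
	return 0;
}

/* Step 3: the MIME part ended. A real implementation would run the
   conversion binary for ctx->content_type here; this sketch only
   rewinds the file so the buffered bytes can be read back. */
int attachment_extract_finish(struct attachment_extract_context *ctx)
{
	if (fflush(ctx->tmp) != 0)
		return -1;
	rewind(ctx->tmp);
	ctx->finished = 1;
	return 0;
}

/* Step 4 (hypothetical iterator): copy the next block of text into
   buf. Returns the number of bytes copied, 0 when nothing is left. */
size_t attachment_extract_read(struct attachment_extract_context *ctx,
			       char *buf, size_t size)
{
	assert(ctx->finished);
	return fread(buf, 1, size, ctx->tmp);
}

/* Hypothetical cleanup; closing the tmpfile() stream also removes it. */
void attachment_extract_deinit(struct attachment_extract_context *ctx)
{
	if (ctx == NULL)
		return;
	fclose(ctx->tmp);
	free(ctx->content_type);
	free(ctx);
}
```

The caller would loop over attachment_extract_read() and hand each block to fts_backend_build_more(). Keeping the temporary file behind the context struct is what would later allow specific content types to skip it.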
I tried your approach and I think it is working pretty well. Now I only need to look carefully at the output of the external programs and build the XML correctly before sending it to Solr.
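For the XML part, the main risk is that converter output contains characters that break the document sent to Solr. A rough sketch of the escaping, assuming the text ends up inside a <field> element of Solr's XML update format; xml_escape() and solr_write_field() are hypothetical helper names, not existing Dovecot or Solr functions, and the field name is assumed to be a plain identifier that needs no escaping:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()ed copy of text with &, <, > and " replaced by
   entities. Control characters that XML 1.0 does not allow (everything
   below 0x20 except tab, LF and CR) are dropped. Caller frees. */
char *xml_escape(const char *text)
{
	/* Worst case: every byte becomes "&quot;" (6 bytes). */
	char *out = malloc(strlen(text) * 6 + 1);
	char *p = out;

	if (out == NULL)
		return NULL;
	for (; *text != '\0'; text++) {
		unsigned char c = (unsigned char)*text;

		switch (c) {
		case '&': memcpy(p, "&amp;", 5); p += 5; break;
		case '<': memcpy(p, "&lt;", 4); p += 4; break;
		case '>': memcpy(p, "&gt;", 4); p += 4; break;
		case '"': memcpy(p, "&quot;", 6); p += 6; break;
		default:
			if (c >= 0x20 || c == '\t' || c == '\n' || c == '\r')
				*p++ = (char)c;
			break;
		}
	}
	*p = '\0';
	return out;
}

/* Writes one <field name="...">value</field> element to out.
   Returns 0 on success, -1 on failure. */
int solr_write_field(FILE *out, const char *name, const char *value)
{
	char *escaped = xml_escape(value);
	int ret;

	if (escaped == NULL)
		return -1;
	ret = fprintf(out, "<field name=\"%s\">%s</field>", name, escaped) < 0 ? -1 : 0;
	free(escaped);
	return ret;
}
```

This does not validate UTF-8, so output from converters that emit other encodings would still need converting first.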
Portugalmail, Comunicações S.A.