[Dovecot] FTS Plugin design
    Rui Carneiro 
    rui.arc at gmail.com
       
    Tue May  5 14:08:41 EEST 2009
    
    
  
Hi again,
On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen <tss at iki.fi> wrote:
>  - fts_build_mail() indexes a single mail. It parses the messages and
> returns the data in small blocks. For text/* and message/rfc822 parts
> those blocks are currently sent to FTS backend. This is where I think
> you should look into hooking your attachment parsing. Change
> fts_build_want_index_part() to look for more content-types that you're
> interested in and then before feeding the blocks to FTS backend put them
> through your own converter function, something like:
>
> int attachment_extract_text(struct attachment_extract_context *ctx,
> const struct message_block *input, struct message_block *output);
Let's take the example of an application/pdf content-type. Before I can
convert the PDF data to text, I need to gather all of it first. The current
process feeds the FTS backend small blocks of data and appends them in the
"build_more" functions (e.g. fts_backend_solr_build_more()).
So where should I call attachment_extract_text()? In
fts_backend_solr_build_more(), without appending anything to cmd until the
data has been extracted? Or should I gather all the data earlier (e.g. in
fts_build_mail()) and send it all at once to the FTS backend?
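To make the second option more concrete, here is a rough sketch of the
buffering I have in mind: collect the blocks of one attachment part, and only
run the converter once the part is complete. Every name in it (struct block,
attachment_extract_more(), attachment_extract_finish(), placeholder_convert())
is made up for illustration and is not existing Dovecot API. The converter is
only a stand-in that copies the bytes; a real PDF-to-text step would go there.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Dovecot's struct message_block; only the
   fields this sketch needs. */
struct block {
	const unsigned char *data;
	size_t size;
};

/* Hypothetical per-part context: accumulates the raw attachment bytes. */
struct attachment_extract_context {
	unsigned char *buf;
	size_t used, alloc;
};

/* Hypothetical converter type: returns malloc()ed, NUL-terminated text and
   stores its length in *out_len, or returns NULL on failure. */
typedef char *convert_fn(const unsigned char *data, size_t size,
			 size_t *out_len);

/* Append one block to the buffer. Returns 0 on success, -1 if out of
   memory. Nothing is sent to the FTS backend at this point. */
int attachment_extract_more(struct attachment_extract_context *ctx,
			    const struct block *input)
{
	assert(ctx != NULL && input != NULL);
	if (input->size == 0)
		return 0;
	if (input->size > ctx->alloc - ctx->used) {
		size_t new_alloc = ctx->alloc != 0 ? ctx->alloc : 4096;
		unsigned char *p;

		while (new_alloc - ctx->used < input->size)
			new_alloc *= 2;
		p = realloc(ctx->buf, new_alloc);
		if (p == NULL)
			return -1;
		ctx->buf = p;
		ctx->alloc = new_alloc;
	}
	memcpy(ctx->buf + ctx->used, input->data, input->size);
	ctx->used += input->size;
	return 0;
}

/* Called when the MIME part ends: convert everything gathered so far,
   then reset the context so it can be reused for the next part. */
char *attachment_extract_finish(struct attachment_extract_context *ctx,
				convert_fn *convert, size_t *out_len)
{
	char *text = convert(ctx->buf, ctx->used, out_len);

	free(ctx->buf);
	ctx->buf = NULL;
	ctx->used = ctx->alloc = 0;
	return text;
}

/* Placeholder converter (an assumption, NOT a PDF parser): copies the
   bytes unchanged and NUL-terminates them. */
char *placeholder_convert(const unsigned char *data, size_t size,
			  size_t *out_len)
{
	char *text = malloc(size + 1);

	if (text == NULL)
		return NULL;
	if (size > 0)
		memcpy(text, data, size);
	text[size] = '\0';
	*out_len = size;
	return text;
}
```

With something like this, the text returned by attachment_extract_finish()
could be fed to the backend's build_more function as ordinary text, so the
backend itself would not need to know about attachments. The cost is that a
whole attachment sits in memory until its part ends.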
I hope I've made myself clear.
Regards,
Rui Carneiro
-- 
Portugalmail, Comunicações S.A.
www.portugalmail.net
    
    