[Dovecot] FTS Plugin design
Rui Carneiro
rui.arc at gmail.com
Mon Apr 20 17:29:25 EEST 2009
Hi,
The problem was with the flag value. My hex-to-binary conversion was wrong: 0x16 sets several bits at once, not a single bit.
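
For reference, a minimal sketch of why 0x16 fails as a flag value. The enum below is illustrative only: its names and values are assumptions made for this example, not copied from Dovecot's message_part_flags. 0x16 is binary 10110, so it sets three bits and overlaps any existing flags that use those bits. A new flag needs one unused bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag layout: these names and values are assumptions made
   for this sketch, not copied from Dovecot's message_part_flags. */
enum example_part_flags {
	EXAMPLE_FLAG_MULTIPART        = 0x01,
	EXAMPLE_FLAG_MULTIPART_DIGEST = 0x02,
	EXAMPLE_FLAG_MESSAGE_RFC822   = 0x04,
	EXAMPLE_FLAG_TEXT             = 0x08,
	/* Hypothetical new flag: the next unused single bit. */
	EXAMPLE_FLAG_ATTACHMENT       = 0x10
};

/* All bits used by the example flags that existed before the new one. */
#define EXAMPLE_EXISTING_MASK \
	(EXAMPLE_FLAG_MULTIPART | EXAMPLE_FLAG_MULTIPART_DIGEST | \
	 EXAMPLE_FLAG_MESSAGE_RFC822 | EXAMPLE_FLAG_TEXT)

/* Compile-time guard: the new flag must not reuse an existing bit. */
static_assert((EXAMPLE_FLAG_ATTACHMENT & EXAMPLE_EXISTING_MASK) == 0,
	      "new flag must not reuse existing bits");

/* True if exactly one bit is set in value. */
static bool is_single_bit(unsigned int value)
{
	return value != 0 && (value & (value - 1)) == 0;
}

/* Returns the already-used bits that a candidate flag value collides with. */
static unsigned int colliding_bits(unsigned int candidate)
{
	return candidate & EXAMPLE_EXISTING_MASK;
}
```

With this example layout, is_single_bit(0x16) is false and colliding_bits(0x16) returns 0x06 (the 0x02 and 0x04 bits), while 0x10 is a single bit and collides with nothing.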
Regards,
Rui Carneiro
On Fri, Apr 17, 2009 at 10:03 AM, Rui Carneiro <rui.arc at gmail.com> wrote:
> Thank you for all the tips. The design looks clearer to me now.
>
> I have one more question. I looked into fts_build_want_index_part() and
> saw that I need to add some flags to message_part_flags. What values should
> I choose? My first approach was to follow your scheme and set
> MESSAGE_PART_FLAG_ATTACHMENT = 0x16. Is there any problem with this?
>
> I have already changed parse_content_type() to set ctx->part->flags
> correctly, but if I choose my custom flag Dovecot assumes that all attachment
> lines are headers. I also tried setting ctx->part->flags to TEXT, and
> the fts_backend was fed all the attachment lines correctly.
>
> I don't know if this is related to the value of
> MESSAGE_PART_FLAG_ATTACHMENT or if I am missing something (like setting
> block.hdr = NULL or some more code to handle new flags).
>
> Thank you,
> Rui Carneiro
>
>
> On Wed, Apr 15, 2009 at 11:23 PM, Timo Sirainen <tss at iki.fi> wrote:
>
>> On Mon, 2009-04-13 at 11:18 +0100, Rui Carneiro wrote:
>> > I haven't yet understood the plugin's design or how the plugins are
>> > called from the core system, and I was wondering if anyone could help me
>> > with that.
>>
>> fts-storage.c hooks into all the functions in mail-storage API that it
>> needs to. Currently indexing isn't done while messages are being saved,
>> but instead just before searching. The searching functions are:
>>
>> - fts_mailbox_search_init() tries to figure out if FTS can optimize the
>> search. If it can, it checks whether the FTS index is up-to-date
>> and, if not, starts the indexing.
>>
>> - fts_mailbox_search_next_nonblock() continues the indexing (or
>> searching after indexing) for a while. The idea is that the IMAP connection
>> is able to process other commands while doing a long-running search. So
>> the fts plugin indexes FTS_SEARCH_NONBLOCK_COUNT (50) messages at a time. It
>> would be nice if that value was dynamically calculated and also based on
>> bytes instead of messages, but that's maybe too much trouble.
>>
>> - fts_mailbox_search_next_update_seq() uses the fts search results and
>> updates mail-storage's search stuff so that it doesn't go through
>> messages that don't match.
>>
>> - fts_build_mail() indexes a single mail. It parses the message and
>> returns the data in small blocks. For text/* and message/rfc822 parts
>> those blocks are currently sent to FTS backend. This is where I think
>> you should look into hooking your attachment parsing. Change
>> fts_build_want_index_part() to look for more content-types that you're
>> interested in and then before feeding the blocks to FTS backend put them
>> through your own converter function, something like:
>>
>> int attachment_extract_text(struct attachment_extract_context *ctx,
>> const struct message_block *input, struct message_block *output);
>>
>>
>>
>
>
> --
> mobile: +351 963446125
> mail: rui.arc at gmail.com
> mail: ei04073 at fe.up.pt
> website: http://paginas.fe.up.pt/~ei04073
>