Thank you for the answer, but the actual issue is that the data produced by the decoder (specified in the conf file) is properly collected by dovecot core, but /not/ sent to the plugin: the plugin receives the original data.

This is not linked to a particular plugin (xapian, solr, squat, etc.) but seems to be a general issue in dovecot core.



On 2021-02-08 01:03, John Fawcett wrote:

On 07/02/2021 18:51, Joan Moreau wrote:

more info: the function fts_parser_script_more in plugins/fts/fts-parser.c properly reads the output of the script

still, the data is not sent to the FTS plugins (xapian or any other)



On 2021-02-07 17:37, Joan Moreau wrote:

more info: I am running the dovecot git version


On 2021-02-07 17:15, Joan Moreau wrote:

a bit more on this: after adding logging to decode2text.sh, I can see that pdftotext outputs the right data, but that data is /not/ transmitted to the fts plugin for indexing (only the original pdf code is)



On 2021-02-07 17:00, Joan Moreau wrote:

Hello,

I am trying to deal properly with email attachments in the fts-xapian plugin.

I tried the default script with a PDF file.

The data I receive in the fts plugin part ("xxx_build_more") is the original document, not the output of pdftotext.

Is there anything I am missing?

Here is my config:


plugin {
        plugin = fts_xapian managesieve sieve

        fts = xapian
        fts_xapian = partial=2 full=20 verbose=1 attachments=1

        fts_autoindex = yes
        fts_enforced = yes
        fts_autoindex_exclude = \Trash
        fts_autoindex_exclude2 = \Drafts

        fts_decoder = decode2text

        sieve = /data/mail/%d/%n/local.sieve
        sieve_after = /data/mail/after.sieve
        sieve_before = /data/mail/before.sieve
        sieve_dir = /data/mail/%d/%n/sieve
        sieve_global_dir = /data/mail
        sieve_global_path = /data/mail/global.sieve
}

...

service decode2text {
   executable = script /usr/libexec/dovecot/decode2text.sh
   user = dovecot
   unix_listener decode2text {
     mode = 0666
   }
}


Thank you


Joan

I'm not sure I can be much use for xapian, but looking at your configuration I did notice some differences with the documentation. I don't know if they are relevant to the issue you're seeing.

First of all I don't see

mail_plugins = fts

plugin = fts

settings which are both mentioned in the xapian documentation.

Also the documentation states that attachments=1 can only index text attachments. Maybe you should be using attachments=0 and let fts_decoder handle the attachments.
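
For illustration, the two suggestions above might look roughly like the following. This is only a sketch built from the settings quoted in this thread: the $mail_plugins prefix and the combined "fts fts_xapian" plugin lists are assumptions about how the documented settings would be merged into an existing config, so please check them against the xapian documentation before relying on them.

```
# Assumption: load the generic fts plugin alongside fts_xapian
mail_plugins = $mail_plugins fts fts_xapian

plugin {
        # Assumption: keep your other plugins (managesieve, sieve) as they are
        plugin = fts fts_xapian managesieve sieve

        fts = xapian
        # attachments=0 so that attachment text comes from the decoder instead
        fts_xapian = partial=2 full=20 verbose=1 attachments=0

        fts_decoder = decode2text
}
```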

Failing that, I can only advise to turn on some debugging and see what that brings.
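
As a starting point for debugging, something like this could be added to the config. mail_debug is a standard Dovecot setting; whether its output covers the decoder path in question is something I have not verified.

```
# Enable verbose debug logging for mail processes
mail_debug = yes
```

After changing it, reindexing a test message that has a PDF attachment should show in the log which plugins are loaded and what happens around the decoder call.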

best regards

John