<html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /></head><body style='font-size: 9pt; font-family: Verdana,Geneva,sans-serif'>
<p>Thank you for the answer, but the actual issue is that the data returned by the decoder (configured in the conf file) is properly collected by dovecot core but /not/ passed on to the plugin: the plugin receives the original data.</p>
<p>This is not linked to a particular plugin (xapian, solr, squat, etc.) but seems to be a general issue in dovecot core.</p>
<p><br /></p>
<p><br /></p>
<p id="reply-intro">On 2021-02-08 01:03, John Fawcett wrote:</p>
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">
<div id="replybody1">
<div>
<div class="v1moz-cite-prefix">On 07/02/2021 18:51, Joan Moreau wrote:</div>
<blockquote type="cite" style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0">
<p>More info: the function fts_parser_script_more in plugins/fts/fts-parser.c properly reads the output of the script.</p>
<p>Still, the data is not sent to the FTS plugins (xapian or any other).</p>
<p><br /></p>
<p><br /></p>
<p id="v1reply-intro">On 2021-02-07 17:37, Joan Moreau wrote:</p>
<blockquote style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0;">
<div id="v1replybody1">
<div style="font-size: 9pt; font-family: Verdana,Geneva,sans-serif;">
<p>More info: I am running the git version of dovecot.</p>
<p><br /></p>
<p id="v1v1reply-intro">On 2021-02-07 17:15, Joan Moreau wrote:</p>
<blockquote style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0;">
<div id="v1v1replybody1">
<div style="font-size: 9pt; font-family: Verdana,Geneva,sans-serif;">
<p>A bit more on this: after adding logging to decode2text.sh, I can see that pdftotext outputs the right data, but that data is /not/ transmitted to the fts plugin for indexing (only the original PDF code is).</p>
<p><br /></p>
<p><br /></p>
<p id="v1v1v1reply-intro">On 2021-02-07 17:00, Joan Moreau wrote:</p>
<blockquote style="padding: 0 0.4em; border-left: #1010ff 2px solid; margin: 0;">
<div id="v1v1v1replybody1">
<div style="font-size: 9pt; font-family: Verdana,Geneva,sans-serif;">
<p>Hello,</p>
<p>I am trying to deal properly with email attachments in the fts-xapian plugin.</p>
<p>I tried the default script with a PDF file.</p>
<p>The data I receive in the fts plugin part ("xxx_build_more") is the original document, not the output of pdftotext.</p>
<p>Is there anything I am missing?</p>
<p>Here is my config:</p>
<p><br /></p>
<p><span style="font-family: 'courier new', courier, monospace;">plugin {</span><br /><span style="font-family: 'courier new', courier, monospace;"> plugin = fts_xapian managesieve sieve</span></p>
<p><span style="font-family: 'courier new', courier, monospace;"> fts = xapian</span><br /><span style="font-family: 'courier new', courier, monospace;"> fts_xapian = partial=2 full=20 verbose=1 attachments=1</span></p>
<p><span style="font-family: 'courier new', courier, monospace;"> fts_autoindex = yes</span><br /><span style="font-family: 'courier new', courier, monospace;"> fts_enforced = yes</span><br /><span style="font-family: 'courier new', courier, monospace;"> fts_autoindex_exclude = \Trash</span><br /><span style="font-family: 'courier new', courier, monospace;"> fts_autoindex_exclude2 = \Drafts</span></p>
<p><span style="font-family: 'courier new', courier, monospace;"> fts_decoder = decode2text</span></p>
<p><span style="font-family: 'courier new', courier, monospace;"> sieve = /data/mail/%d/%n/local.sieve</span><br /><span style="font-family: 'courier new', courier, monospace;"> sieve_after = /data/mail/after.sieve</span><br /><span style="font-family: 'courier new', courier, monospace;"> sieve_before = /data/mail/before.sieve</span><br /><span style="font-family: 'courier new', courier, monospace;"> sieve_dir = /data/mail/%d/%n/sieve</span><br /><span style="font-family: 'courier new', courier, monospace;"> sieve_global_dir = /data/mail</span><br /><span style="font-family: 'courier new', courier, monospace;"> sieve_global_path = /data/mail/global.sieve</span><br /><span style="font-family: 'courier new', courier, monospace;">}</span></p>
<p><span style="font-family: 'courier new', courier, monospace;">...</span></p>
<p><span style="font-family: 'courier new', courier, monospace;">service decode2text {</span><br /><span style="font-family: 'courier new', courier, monospace;"> executable = script /usr/libexec/dovecot/decode2text.sh</span><br /><span style="font-family: 'courier new', courier, monospace;"> user = dovecot</span><br /><span style="font-family: 'courier new', courier, monospace;"> unix_listener decode2text {</span><br /><span style="font-family: 'courier new', courier, monospace;"> mode = 0666</span><br /><span style="font-family: 'courier new', courier, monospace;"> }</span><br /><span style="font-family: 'courier new', courier, monospace;">}</span></p>
<p><br /></p>
<p>Thank you</p>
<p><br /></p>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</div>
</div>
</blockquote>
</blockquote>
<p>Joan</p>
<p>I'm not sure I can be much use for xapian, but looking at your configuration I did notice some differences with the documentation. I don't know if they are relevant to the issue you're seeing.</p>
<p>First of all, I don't see the</p>
<pre><code>mail_plugins = fts
plugin = fts</code></pre>
<p>settings, which are both mentioned in the xapian documentation.</p>
<p>Also, the documentation states that attachments=1 can only index text attachments. Maybe you should be using attachments=0 and let fts_decoder handle the attachments.</p>
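<p>A minimal sketch of that suggestion, reusing the values from the configuration quoted above. It is untested against the git version. The <code>mail_plugins</code> line is an assumption based on the usual fts-xapian setup, and whether attachments=0 actually changes what the plugin receives is not confirmed.</p>
<pre><code># Assumption: fts and fts_xapian are loaded through mail_plugins
mail_plugins = $mail_plugins fts fts_xapian

plugin {
  fts = xapian
  # attachments=0 so that only the decoder output would be indexed (unverified)
  fts_xapian = partial=2 full=20 verbose=1 attachments=0
  fts_decoder = decode2text
}</code></pre>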
<p>Failing that, I can only advise turning on some debugging to see what that brings.</p>
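<p>One possible way to do that, as a sketch only: enabling <code>mail_debug</code> makes the mail processes log in more detail. Whether that logging shows the hand-off from the decoder to the fts plugin is an assumption.</p>
<pre><code># Global setting, outside the plugin { } block
mail_debug = yes</code></pre>
<p>After reloading dovecot, reindexing a mailbox that contains the PDF message should produce the extra log lines.</p>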
<p>best regards</p>
<p>John</p>
<p><br /></p>
</div>
</div>
</blockquote>
</body></html>