Joan Moreau jom at grosjo.net
Mon Feb 8 23:33:54 EET 2021

Yes, once again: the output of the decoder is fine. I also put logging inside 
the dovecot core to check whether the data is properly transmitted, and 
the result is that it is (i.e. dovecot core receives the proper output of 
pdftotext via the decoder).

Now, that data is /not/ what is sent from dovecot core to the fts 
plugin (and this is the same issue for solr and all other plugins).

Of course, the stemming will show good results (as the PDF content will be 
stemmed), but the problem remains.

How can I make sure the data sent to the FTS plugins (xapian, solr, 
whatever...) is the output of the decoder and /not/ the original 
data?
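
A minimal sketch of how the two cases could be told apart when inspecting 
logged data. It assumes that an undecoded PDF can be recognised by its 
"%PDF-" header, and that the decoder is a command reading the attachment on 
stdin and writing text to stdout (the way decode2text.sh is invoked in the 
quoted message below). The names looks_like_raw_pdf and run_decoder are 
hypothetical helpers for illustration, not part of dovecot or any plugin.

```python
import subprocess
import sys

# Assumption: an undecoded PDF starts with this header.
PDF_MAGIC = b"%PDF-"


def looks_like_raw_pdf(data: bytes) -> bool:
    """Return True if the buffer appears to be an undecoded PDF."""
    # Tolerate leading whitespace before the header.
    return data.lstrip()[:len(PDF_MAGIC)] == PDF_MAGIC


def run_decoder(command: list[str], attachment: bytes) -> bytes:
    """Pipe the attachment into a decoder command and return its stdout.

    Hypothetical helper: 'command' is whatever decoder invocation is used
    on the system, e.g. the decode2text.sh call from the quoted message.
    """
    result = subprocess.run(
        command,
        input=attachment,
        stdout=subprocess.PIPE,
        check=True,  # raise if the decoder exits with an error
    )
    return result.stdout
```

Applying looks_like_raw_pdf to the buffer logged in the plugin and to the 
decoder output would show whether the plugin is handed the extracted text 
or the original attachment.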

On 2021-02-08 21:11, Stuart Henderson wrote:

> On 2021-02-08, Joan Moreau <jom at grosjo.net> wrote:
>> Well, in the function xxx_build_more of the FTS plugin, the data received
>> is the original PDF, not the output of pdftotext.
>> Can you clarify where you put your log in the solr plugin, so I can
>> check the situation in the xapian plugin?
> The log is particular to fts_solr, you set it with e.g.
> "fts_solr = url= rawlog_dir=/tmp/solr"
> Confirmed it works for me, i.e. passes text from inside the pdf, and not
> the whole pdf itself.
> Did you check that decode2text.sh works ok on your system (when running
> as the relevant uid)?
> cat foo.pdf | sudo -u dovecot /usr/libexec/dovecot/decode2text.sh application/pdf