8 Feb 2021, 11:11 p.m.
On 2021-02-08, Joan Moreau jom@grosjo.net wrote:
Well, in the function xxx_build_more of the FTS plugin, the data received is the original PDF, not the output of pdftotext.
Can you clarify where you put your log in the solr plugin, so I can check the situation in the xapian plugin?
The log is specific to fts_solr; you set it with e.g.
"fts_solr = url=http://127.0.0.1:8983/solr/dovecot/ rawlog_dir=/tmp/solr"
I confirmed it works for me, i.e. it passes the text from inside the PDF, and not the whole PDF itself.
Did you check that decode2text.sh works ok on your system (when running as the relevant uid)?
cat foo.pdf | sudo -u dovecot /usr/libexec/dovecot/decode2text.sh application/pdf
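
If you want to automate that check, here is a minimal sketch. It assumes the same script path and user as the command above, and that head supports -c. The names looks_like_pdf and check_decoder are made up for illustration; they are not part of dovecot. The only test is whether the decoder output still starts with the PDF magic bytes "%PDF-", which would mean the PDF came through undecoded.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if stdin starts with the PDF magic bytes "%PDF-".
looks_like_pdf() {
    [ "$(head -c 5)" = "%PDF-" ]
}

# Assumed location, same as in the command above; adjust for your install.
DECODER=/usr/libexec/dovecot/decode2text.sh

# Hypothetical wrapper: $1 is a PDF file to feed through the decoder
# as the dovecot user.
check_decoder() {
    if sudo -u dovecot "$DECODER" application/pdf < "$1" | looks_like_pdf; then
        echo "decoder returned raw PDF"
    else
        echo "decoder output does not start with %PDF-"
    fi
}
```

Run it as "check_decoder foo.pdf". Note that empty output also lands in the second branch, so still look at what the script actually prints.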