fts_encoder
Joan Moreau
jom at grosjo.net
Mon Feb 8 23:33:54 EET 2021
Yes, once again: the output of the decoder is fine. I also put logging inside
the dovecot core to check whether the data is properly transmitted, and
the result is that it is (i.e. dovecot core receives the proper output of
pdftotext via the decoder).
Now, that data is /not/ the data sent from dovecot core to the fts
plugin (and this is the same issue for solr and all other plugins).
Of course, the stemming will show good results (as the PDF content will be
stemmed), but the problem remains.
How can I make sure the data sent to the FTS plugins (xapian, solr,
whatever...) is the output of the decoder and /not/ the original
data?
On 2021-02-08 21:11, Stuart Henderson wrote:
> On 2021-02-08, Joan Moreau <jom at grosjo.net> wrote:
>
>> Well, in the function xxx_build_more of the FTS plugin, the data received
>> is the original PDF, not the output of pdftotext
>>
>> Can you clarify where you put your log in the solr plugin, so I can
>> check the situation in the xapian plugin?
>
> The log is particular to fts_solr, you set it with e.g.
>
> "fts_solr = url=http://127.0.0.1:8983/solr/dovecot/ rawlog_dir=/tmp/solr"
>
> Confirmed it works for me, i.e. passes text from inside the pdf, and not
> the whole pdf itself.
>
> Did you check that decode2text.sh works ok on your system (when running
> as the relevant uid)?
>
> cat foo.pdf | sudo -u dovecot /usr/libexec/dovecot/decode2text.sh application/pdf
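For anyone repeating the check quoted above, here is a minimal sketch of how
it could be scripted. The helper name check_decoder is hypothetical (it is not
part of dovecot), and the test is an assumption on my side: it only looks at
whether the decoder output still begins with the "%PDF-" magic bytes.

```shell
# Hypothetical helper, not part of dovecot: reports whether a decoder
# command produced text or passed the raw PDF through unchanged.
# Usage: check_decoder FILE COMMAND [ARGS...]
check_decoder() {
    file=$1
    shift
    # Feed the file on stdin, as the quoted command does, and keep only
    # the first 5 bytes of whatever the decoder writes to stdout.
    first=$("$@" < "$file" | head -c 5)
    case $first in
        %PDF-*) echo "raw PDF passed through"; return 1 ;;
        "")     echo "empty output"; return 1 ;;
        *)      echo "decoded text"; return 0 ;;
    esac
}
```

It could be called as
check_decoder foo.pdf sudo -u dovecot /usr/libexec/dovecot/decode2text.sh application/pdf
Note that this only exercises the decoder script itself. It says nothing about
what dovecot core then hands to the FTS plugin, which is the open question.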