fts_encoder

Joan Moreau jom at grosjo.net
Thu Feb 11 18:51:21 EET 2021


I have created a PR:

https://github.com/dovecot/core/pull/155

On 2021-02-11 13:25, Joan Moreau wrote:

> Hello
> 
> Checking further, and adding log statements throughout the Dovecot 
> code, I see that the core sends the initial (undecoded) document FIRST 
> and the decoded version SECOND.
> 
> This is really weird, and as a result the indexer indexes a lot of 
> binary garbage.
> 
> I am struggling to find where in the code this double call is made.
> 
> Does anyone know?
> 
> On 2021-02-10 00:05, John Fawcett wrote:
> 
> On 09/02/2021 15:33, Joan Moreau wrote:
> 
> If I place the following code in the plugin's 
> fts_backend_xxx_update_build_more function (lucene, squat and xapian, 
> as solr refuses to work properly on my setup):
> 
> {
>     /* Log the first 20 bytes of the data handed to the backend. */
>     char *s = i_strdup("EMPTY");
>     if (data != NULL) {
>         i_free(s);
>         s = i_strndup(data, 20);
>     }
>     i_info("fts_backend_update_build_more: data like '%s'", s);
>     i_free(s);
> }
> 
> and then send a PDF by email, the data shown in the log is "%PDF-1.7 ".
> 
> This means the decoded data is not being passed to the plugin 
> properly.
> 
> Something is wrong in the data transmission.
> 
> Joan
> 
> I too see something similar with fts_solr. I do see the raw %PDF string 
> and PDF binary data being passed through to 
> fts_backend_xxx_update_build_more function but I disagree with the 
> conclusion you draw from it.
> 
> After the raw data I also see the decoded data, so at least in my case 
> it is possible to see both the raw and decoded data in 
> fts_backend_xxx_update_build_more function. In the rawlog I no longer 
> see the binary data (but some blank lines), so something is filtering 
> it. I do see the decoded data in the rawlog. I do get hits on the solr 
> search for the decoded text.
> 
> John
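
To tell the raw and decoded calls apart in the log, as discussed above, a
small standalone helper could be used. This is only a sketch, not Dovecot
code: the names fts_debug_looks_binary and fts_debug_preview are
hypothetical, the "binary" test is a crude heuristic (a NUL byte, or more
than 10% control characters), and it assumes the build_more callback
receives the buffer together with its length.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Dovecot): returns 1 if the buffer
 * looks like raw binary, 0 otherwise. Crude heuristic: any NUL byte, or
 * more than 10% control characters other than tab, LF and CR. */
int fts_debug_looks_binary(const unsigned char *data, size_t len)
{
    size_t i, suspicious = 0;

    if (data == NULL || len == 0)
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = data[i];
        if (c == 0)
            return 1;
        if (c < 0x20 && c != '\t' && c != '\n' && c != '\r')
            suspicious++;
    }
    return suspicious * 10 > len;
}

/* Hypothetical helper (not part of Dovecot): writes a printable preview
 * of the buffer into out, escaping non-printable bytes as \xHH. Output
 * is always NUL-terminated when out_size > 0 and is cut short if it
 * does not fit. Returns the number of characters written, excluding
 * the terminating NUL. */
size_t fts_debug_preview(const unsigned char *data, size_t len,
                         char *out, size_t out_size)
{
    size_t i, pos = 0;

    if (out_size == 0)
        return 0;
    for (i = 0; data != NULL && i < len; i++) {
        unsigned char c = data[i];
        if (c >= 0x20 && c < 0x7f) {
            if (pos + 1 >= out_size)
                break;              /* no room for char + NUL */
            out[pos++] = (char)c;
        } else {
            if (pos + 4 >= out_size)
                break;              /* no room for \xHH + NUL */
            snprintf(out + pos, out_size - pos, "\\x%02x", c);
            pos += 4;
        }
    }
    out[pos] = '\0';
    return pos;
}
```

Unlike copying a fixed 20 bytes, the preview never reads past the given
length, so it should stay safe on short chunks. Logging the result of both
helpers on each call would show whether a given chunk is the raw
attachment or the decoded text.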