Missing hits when performing full-text searches

Frerich Raabe raabe at froglogic.com
Mon Dec 8 09:17:34 UTC 2014


Hi,

We're running Dovecot 2.1.7 together with Solr for efficient full-text search.

A couple of days ago we reinstalled our Solr server on a new machine. After 
adjusting our Dovecot setup to use the new server, it took a few days to 
notice that something seems fishy about our full-text search: expected hits 
wouldn't be shown among the search results.

For instance, one of the folders (a shared, read-only folder which is 
basically a mailing list archive) with about 210k messages has a plaintext 
mail with the text 'Amman'. However, logging into the IMAP server and 
issuing a

   . SEARCH TEXT Amman

in the folder doesn't yield any hits. It seems that this happens for older 
mails only -- trying other keywords, we did notice hits in recent mails but 
not in older ones. Could some caching related to the old Solr server be 
causing this?

Debugging this further, I noticed that the above IMAP command shows this in 
the Solr log files:

   INFO: [] webapp=/solr path=/select params={fl=uid,score&sort=uid+asc&q=(hdr:"Amman"+OR+body:"Amman")&fq=%2Bbox:b68ece09e22fb9502d34010017227a26+%2Buser:""&rows=209392} hits=0 status=0 QTime=229

And indeed, something like

   $ curl 'http://indexer:8080/solr/select?fl=uid,score&sort=uid+asc&q=(hdr:"Amman"+OR+body:"Amman")&fq=%2Bbox:b68ece09e22fb9502d34010017227a26+%2Buser:""&rows=209392'

yields no results. However, I noticed that if I remove the 'fq=' part from 
the query then I get a bunch of hits. Alas, I don't know whether those hits 
are the expected ones or not.
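
One way to narrow this down might be to count the hits under each filter 
clause separately, to see whether the 'box' or the 'user' clause is the one 
excluding the messages. The following is only a sketch: the helper names 
filter_for and count_hits and the RUN_QUERIES switch are made up for 
illustration, the URL, search term and box value are copied from the log 
line above, and it assumes that curl is available and that the Solr 
response reports the hit count as numFound.

```shell
#!/bin/sh
# Sketch only: compare hit counts with different parts of the fq filter.
# SOLR_URL and BOX are copied from the logged request; adjust as needed.
SOLR_URL='http://indexer:8080/solr/select'
BOX='b68ece09e22fb9502d34010017227a26'

# Hypothetical helper: print the filter expression for a named variant.
filter_for() {
    case "$1" in
        both) printf '%s' "+box:$BOX +user:\"\"" ;;  # filter as logged
        box)  printf '%s' "+box:$BOX" ;;             # mailbox clause only
        user) printf '%s' '+user:""' ;;              # user clause only
        none) printf '' ;;                           # no filter at all
        *)    return 1 ;;
    esac
}

# Hypothetical helper: run the query with rows=0, so that only the hit
# count (numFound in the response) comes back, not the documents.
count_hits() {
    fq=$(filter_for "$1") || return 1
    if [ -n "$fq" ]; then
        curl --silent --get "$SOLR_URL" \
            --data-urlencode 'q=(hdr:"Amman" OR body:"Amman")' \
            --data-urlencode "fq=$fq" \
            --data-urlencode 'rows=0'
    else
        curl --silent --get "$SOLR_URL" \
            --data-urlencode 'q=(hdr:"Amman" OR body:"Amman")' \
            --data-urlencode 'rows=0'
    fi
}

# Only contact the server when explicitly asked to.
if [ "${RUN_QUERIES:-0}" = 1 ]; then
    for variant in both box user none; do
        echo "== filter: $variant"
        count_hits "$variant"
        echo
    done
fi
```

Running this with RUN_QUERIES=1 set and comparing numFound across the four 
variants should show which clause drops the hits. If the 'box' variant alone 
returns nothing, the documents in the index presumably carry a different 
'box' value than the one Dovecot is asking for.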

Does anybody have an idea what might cause this, or what the meaning of that 
'box' checksum is?

-- 
Frerich Raabe - raabe at froglogic.com
www.froglogic.com - Multi-Platform GUI Testing

