Missing hits when performing full-text searches
Frerich Raabe
raabe at froglogic.com
Mon Dec 8 09:17:34 UTC 2014
Hi,
we're running Dovecot 2.1.7 together with Solr for efficient fulltext search.
A couple of days ago we reinstalled our Solr server on a new machine. After
adjusting our Dovecot setup to use the new server, it took a few days to
notice that something seems fishy about our full-text search: expected hits
are missing from the search results.
For instance, one of the folders (a shared, read-only folder which is
basically a mailinglist archive) with about 210k messages has a plaintext
mail with the text 'Amman'. However, logging into the IMAP server and
issuing a
. SEARCH TEXT Amman
in the folder doesn't yield any hits. It seems that this happens for older
mails only -- trying other keywords, we did notice hits in recent mails but
not in older ones. Could some caching related to the old Solr server be
causing this?
Debugging this further, I noticed that the above IMAP command shows this in
the Solr log files:
INFO: [] webapp=/solr path=/select
params={fl=uid,score&sort=uid+asc&q=(hdr:"Amman"+OR+body:"Amman")&fq=%2Bbox:b68ece09e22fb9502d34010017227a26+%2Buser:""&rows=209392}
hits=0 status=0 QTime=229
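The logged parameter string is easier to read once it is URL-decoded. The following Python sketch is only an illustration (the variable names are made up for this example); it decodes the params from the log line above and splits the filter query into its clauses:

```python
from urllib.parse import parse_qs

# Parameter string copied from the Solr log line above.
logged = ('fl=uid,score&sort=uid+asc&q=(hdr:"Amman"+OR+body:"Amman")'
          '&fq=%2Bbox:b68ece09e22fb9502d34010017227a26+%2Buser:""&rows=209392')

# parse_qs turns '+' into a space and '%2B' into a literal '+'.
params = {key: values[0] for key, values in parse_qs(logged).items()}

for key in ("q", "fq", "rows"):
    print(f"{key:5}= {params[key]}")

# The filter query is a space-separated list of required ('+') clauses.
fq_clauses = params["fq"].split()
print(fq_clauses)
```

Decoded, the filter requires both a matching 'box' value and a 'user' value equal to the empty string.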
And indeed, something like
$ curl 'http://indexer:8080/solr/select?fl=uid,score&sort=uid+asc&q=(hdr:"Amman"+OR+body:"Amman")&fq=%2Bbox:b68ece09e22fb9502d34010017227a26+%2Buser:""&rows=209392'
yields no results. However, I noticed that if I remove the 'fq=' part from
the query then I get a bunch of hits. Alas, I don't know whether those are to
be expected or not.
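To see which part of the filter removes the hits, one could rerun the same query with each 'fq' clause on its own and compare the hit counts. This is a rough Python sketch, not a verified procedure. It assumes the Solr instance accepts wt=json and reports the count as response.numFound; the helper names build_url, count_hits and report are made up for this example:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Host, path, query and filter clauses are taken from the curl command above.
BASE = "http://indexer:8080/solr/select"
QUERY = '(hdr:"Amman" OR body:"Amman")'
CLAUSES = ['+box:b68ece09e22fb9502d34010017227a26', '+user:""']


def build_url(fq=None):
    """Build a select URL; rows=0 because only the hit count is wanted."""
    params = {"q": QUERY, "rows": 0, "wt": "json"}
    if fq:
        params["fq"] = fq
    return BASE + "?" + urlencode(params)


def count_hits(url):
    """Fetch the URL and return the number of matching documents."""
    with urlopen(url) as response:
        return json.load(response)["response"]["numFound"]


def report():
    """Print hit counts without a filter, per clause, and with both clauses."""
    for fq in [None, *CLAUSES, " ".join(CLAUSES)]:
        print(f"fq={fq!r}: {count_hits(build_url(fq))} hits")
```

Calling report() on a machine that can reach the indexer would print four counts. If one clause alone already gives zero hits, that clause is the one to look at.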
Does anybody have an idea what might cause this, or what the meaning of that
'box' checksum is?
--
Frerich Raabe - raabe at froglogic.com
www.froglogic.com - Multi-Platform GUI Testing