Did you test roughly what the rate was with some reasonably sensible search strings?
I'll try. Sensible search strings are not that easy to come up with, though. The data indexed are the body parts of a UML (Linux) archive.
First line is the number of candidates. Second line is the result of a grep -i in the raw data. (Out of a total of almost 13000 maps).
grep -c -i may be more useful. (then the two numbers are directly comparable)
I certainly think that orders of magnitude are more important than O() when searching mailboxes where messages come and go regularly ... this type of index is actually appendable.
Some worse cases:
jensl:~/project/jelindex> ./search.sh "management"
5098
341
jensl:~/project/jelindex> ./search.sh "Timo Sirainen"
494
0
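To put the two numbers side by side, here is a rough sketch of the arithmetic, assuming both lines count the same unit (which is what the grep -c suggestion is getting at). The precision() helper is hypothetical and not part of search.sh; the figures are the ones quoted above.

```python
def precision(candidates, confirmed):
    """Fraction of index candidates that turn out to be real matches.

    Returns 0.0 when there are no candidates, to avoid dividing by zero.
    """
    return confirmed / candidates if candidates else 0.0


# (candidates from the index, matches reported by grep), as quoted in the thread.
results = {
    "management": (5098, 341),
    "Timo Sirainen": (494, 0),
}

for query, (cand, hits) in results.items():
    print(f"{query!r}: {precision(cand, hits):.1%} of candidates are real matches")
```

If the two counts are not in the same unit (lines versus messages, say), the ratio only gives an order of magnitude.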
??? What? "Timo Sirainen" requires all of the pairs: "ti", "im", "mo", "o ", " S", "Si" ... and so on. Longer strings should (all else being equal) result in greater accuracy.
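A minimal sketch of how such a pair lookup could work, to make the "requires all of the pairs" point concrete. This is not jelindex's actual code: the function names, the case folding and the sample messages are all assumptions made for illustration.

```python
from collections import defaultdict


def bigrams(text):
    """Return the set of adjacent character pairs in text (case-folded).

    Spaces are kept, so "o " and " s" count as pairs like any other.
    """
    folded = text.lower()
    return {folded[i:i + 2] for i in range(len(folded) - 1)}


def build_index(messages):
    """Map each pair to the set of message ids whose body contains it."""
    index = defaultdict(set)
    for msg_id, body in messages.items():
        for pair in bigrams(body):
            index[pair].add(msg_id)
    return index


def candidates(index, query):
    """Ids of messages containing every pair of the query.

    This is only a filter: a message can contain all the pairs without
    containing the query string itself, so false positives are possible.
    """
    pairs = bigrams(query)
    if not pairs:
        # Queries shorter than two characters cannot be filtered this way.
        return set()
    return set.intersection(*(index.get(pair, set()) for pair in pairs))


def confirmed(messages, query):
    """Ids of messages that really contain the query (like grep -i)."""
    needle = query.lower()
    return {m for m, body in messages.items() if needle in body.lower()}


# Hypothetical sample data, not taken from the archive.
messages = {
    1: "Timo Sirainen wrote the patch",
    2: "Sirainen and Timo spoke",
    3: "memory management in UML",
}
index = build_index(messages)
print(sorted(candidates(index, "Timo Sirainen")))
print(sorted(confirmed(messages, "Timo Sirainen")))
```

Message 2 has every pair of the query but not the string itself, so it shows up as a candidate and is dropped by the real string check. Each extra pair in a longer query can only shrink the candidate set, which is why longer strings should be more accurate.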
I understand spaces are treated exactly the same way as characters in IMAP search.
Rob.