[Dovecot] New full text search indexer

Daniel Watts d at nielwatts.com
Wed Nov 14 17:04:15 EET 2007


Timo Sirainen wrote:
> On Fri, 2007-04-06 at 00:34 +0300, Timo Sirainen wrote:
>> Squat-like 4 byte substrings (but can answer 1-3 char queries also):
> 
> Indexing a 1,4GB Linux kernel mailing list mbox with 367919 messages:
> 
> UID count: 367919
> Index time: 129.86 CPU seconds (10.43MB/CPUs), 132.47 seconds
> (10.23MB/s)
> Memory: UID list 143162 kB, heap total 263996 kB
> Node space: 9182910 bytes (8967 kB) in 47989 nodes
> UID space: 357344706 bytes (348969 kB) in 7404211 lists
> Total space: 366527616 / 1420611590 bytes = 25.80%
> 
> So the process takes 260MB of memory in the end. I think that's OK. If I
> removed memory limits, it would use 1,4GB of memory and index file size
> would drop to 17,96%. That could be achieved also by post-processing the
> indexes.
> 
> gzip compression makes the uidlist still 25% smaller (total space
> 19,50%). It'd have to be used to compress the file in smaller blocks
> because zlib doesn't support quickly seeking inside the file. That would
> probably waste some space. I don't think it's worth the extra CPU time
> and complexity.
> 
> Currently the uidlists are stored either as "packed UID ranges" or
> bitmasks, whichever takes less space. I haven't yet tried how much space
> could be saved by using a combination of these within uidlists.
> 
> I'd still have to figure out how to do incremental updates without
> wasting a lot of space. And support the actual searching. :)
> 
> http://dovecot.org/tmp/new-indexer.c updated.
> 

Timo - we were just  having a conversation about how we might be able to 
provide full body indexed search for our clients and I realised it might 
be worth checking the Dovecot list to see if this has been done already...

And then I find this thread!

What is the latest? Searches in the headers work fantastically but any 
search in a sizeable folder (1000's messages) often just time out.

Would love to offer a full text search so let me know how far you have 
gotten.

Will the search be able to offer phrase searching? Eg "Search for this 
string" as an exact match?

Fantastic - very happy to see you're working on adding this.

Daniel



More information about the dovecot mailing list