[Dovecot] slow squat fts index creation
Hi all, I've been playing with Squat indexes. Up to about 300,000 emails in a single mailbox this worked flawlessly; the search index file was about 500MB at that point. I've now added some more emails, and at 450,000 or so I'm seeing a serious problem with Squat index creation. It takes... f o r e v e r. The .tmp file is being written so slowly that it will probably take 2-3 hours to create. Up to this point it took maybe a minute.
I'm doing this in an OpenVZ container, so theoretically I may be hitting some OpenVZ resource limit. But I've raised all the limits and don't see any improvement. I don't see any resource starvation either.
Could there be some Dovecot issue when the search index reaches, say, 1GB? (I'm estimating that it's now trying to save about a 1GB search index.)
Regards,
Cor
On Tue, 2011-05-24 at 17:01 +0200, Cor Bosman wrote:
> [...]
> Could there be some Dovecot issue when the search index reaches, say, 1GB?
Initially Squat just builds a large, unorganized index; the last step is organizing it. This is the main problem with Squat's indexing speed. The file is mmap()ed and then accessed in a fairly random order. As long as you have enough memory to keep all of this mmapped data in physical memory, this works pretty fast, but otherwise the kernel starts page faulting like crazy and it takes forever. That's why Squat has this code:
```c
/* Tell the kernel we're going to use the uidlist data, so it loads
   it into memory and keeps it there. */
(void)madvise(uidlist->mmap_base, uidlist->mmap_size, MADV_WILLNEED);

/* It also speeds up a bit for us to sequentially load everything
   into memory, although at least Linux catches up quite fast even
   without this code. Compiler can quite easily optimize away this
   entire for loop, but volatile seems to help with gcc 4.2. */
for (i = 0; i < uidlist->mmap_size; i += page_size)
	((const volatile char *)uidlist->data)[i];
```
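To try the same "advise, then prefault" technique outside Dovecot, here is a minimal standalone sketch. It assumes Linux with glibc (hence `_DEFAULT_SOURCE`), and the helper names `map_file()` and `prefault_mapping()` are hypothetical, not Dovecot functions. It maps a file read-only, calls `madvise(MADV_WILLNEED)` on the whole mapping, and then reads one byte per page in sequential order with a volatile access, mirroring the loop quoted above. As the explanation above implies, this only helps when the whole mapping fits in physical memory; if it doesn't, pages get evicted again and the random-access phase still page faults.

```c
#define _DEFAULT_SOURCE /* assumption: glibc; exposes madvise(), MADV_WILLNEED, ftruncate() */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole, non-empty file read-only.
   Returns MAP_FAILED on error. */
static void *map_file(const char *path, size_t *size_r)
{
	struct stat st;
	void *base;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return MAP_FAILED;
	if (fstat(fd, &st) == -1 || st.st_size == 0) {
		close(fd);
		return MAP_FAILED;
	}
	base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
	close(fd); /* the mapping stays valid after the fd is closed */
	*size_r = (size_t)st.st_size;
	return base;
}

/* Hypothetical helper: ask the kernel to read the mapping ahead, then
   touch one byte per page sequentially so the pages are faulted in
   before any random access starts. Returns the number of pages touched. */
static size_t prefault_mapping(void *base, size_t size)
{
	long page_size = sysconf(_SC_PAGESIZE);
	size_t i, pages = 0;

	assert(page_size > 0);
	/* Purely advisory: a failure is harmless, so the result is ignored. */
	(void)madvise(base, size, MADV_WILLNEED);

	/* The volatile read keeps the compiler from discarding the loop. */
	for (i = 0; i < size; i += (size_t)page_size) {
		(void)((const volatile char *)base)[i];
		pages++;
	}
	return pages;
}
```

A caller would do `size_t size; void *base = map_file(path, &size);` and, if `base != MAP_FAILED`, call `prefault_mapping(base, size)` before starting the random-access work, then `munmap(base, size)` when done.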
Participants (2): Cor Bosman, Timo Sirainen