Huge difference between the lucene index size created by v2.1 and v2.2
Hi everyone,
While comparing dovecot v2.1 and v2.2 for their lucene search performance, I noticed a huge difference in the size of the indexes they create. Both versions were compiled on the same system, against the same libclucene, with the same configure options, and were run with the same dovecot.conf. Before indexing with each version, I deleted the lucene-indexes folder and the dovecot* files in the Maildir. The tests were performed on an untouched mail archive folder containing 300000 mails, without any dovecot* files in it:
root@server:/home/admin/mails/.Archive# ls -l
total 78640
drwx------ 2 2500 2500 14098432 Aug 17 22:16 cur
drwx------ 2 2500 2500 12435456 Jul 30 09:46 new
drwx------ 2 2500 2500     4096 Aug  2 13:02 tmp
The command used was:
doveadm -v index -u admin Archive
Indexing with v2.1 resulted in:
root@server:/home/admin/mails# ls -lh lucene-indexes
total 390M
-rw------- 1 2500 2500 390M Aug 18 07:03 _25.cfs
-rw------- 1 2500 2500   20 Aug 18 07:03 segments.gen
-rw------- 1 2500 2500   46 Aug 18 07:03 segments_4d
Whereas dovecot v2.2 resulted in:
root@server:/home/admin/mails# ls -lh lucene-indexes
total 1.5G
-rw------- 1 2500 2500 1.5G Aug 18 06:41 _5g.cfs
-rw------- 1 2500 2500   20 Aug 18 06:41 segments.gen
-rw------- 1 2500 2500   46 Aug 18 06:41 segments_az
390M vs 1.5G. That is a huge difference in size. Why is that?
Thanks in advance.
-Regards, Akash
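For anyone wanting to repeat the comparison, the procedure described above (wipe the generated state, reindex, measure) could be scripted roughly as follows. This is only a sketch: the helper names `reset_index_state` and `index_size_kib` are made up for illustration, the paths in the usage comments are the ones from the message, and `find -maxdepth` assumes GNU or BSD find. Note that the first helper deletes files, so try it on a copy first.

```shell
#!/bin/sh
# Hypothetical helper (not part of dovecot): remove the generated state so
# that the next indexing run starts from scratch. Mail files under
# cur/new/tmp are left untouched.
reset_index_state() {
    mail_root=$1    # directory that holds lucene-indexes
    folder=$2       # Maildir folder that holds the dovecot* files
    rm -rf -- "$mail_root/lucene-indexes"
    # Only regular dovecot* files directly inside the folder are removed.
    find "$folder" -maxdepth 1 -type f -name 'dovecot*' -exec rm -f -- {} +
}

# Hypothetical helper: print the total size of the lucene index in KiB.
index_size_kib() {
    du -sk -- "$1/lucene-indexes" | cut -f1
}

# Usage, with the paths and command from the message above:
#   reset_index_state /home/admin/mails /home/admin/mails/.Archive
#   doveadm -v index -u admin Archive
#   index_size_kib /home/admin/mails
```

Running the three usage lines once per dovecot version gives two numbers that can be compared directly, instead of reading the rounded sizes from `ls -lh`.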
On 18 Aug 2014, at 09:42, Akash <akbwiz+dovecot@gmail.com> wrote:
390M vs 1.5G. That is a huge difference in size. Why is that?
Can you test if the attached patch shrinks it back? I had been planning on making that also configurable. There might be something else also causing it.
Thanks for checking. The patch didn't make any significant difference. Now it's 1.3G instead of 1.5G.
root@server:~# ls -lh /home/admin/mails/lucene-indexes
total 1.3G
-rw------- 1 nobody nogroup 1.3G Aug 18 11:30 _4a.cfs
-rw------- 1 nobody nogroup   20 Aug 18 11:30 segments.gen
-rw------- 1 nobody nogroup   46 Aug 18 11:30 segments_8n
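To put the three results side by side, the growth factor relative to the v2.1 index can be worked out with a small helper. This is a rough sketch: `size_ratio` is a made-up name, and the inputs are the rounded `ls -lh` figures converted to MiB (1.5G as 1536, 1.3G as 1331.2), so the ratios are approximate.

```shell
#!/bin/sh
# Hypothetical helper: print how many times larger size $1 is than size $2,
# rounded to one decimal place. Both sizes must be in the same unit.
size_ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

# v2.2 unpatched (1.5G) against v2.1 (390M)
size_ratio 1536 390      # prints 3.9
# v2.2 with the first patch (1.3G) against v2.1 (390M)
size_ratio 1331.2 390    # prints 3.4
```

So by these figures the first patch only takes the index from roughly 3.9 times the v2.1 size down to roughly 3.4 times.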
On 18-08-2014 14:17, Timo Sirainen wrote:
Actually that same header indexing behavior was already in v2.1. But I found the real problem now: http://hg.dovecot.org/dovecot-2.2/rev/febedba15c7e
That bug was actually already in v2.1, but because memory was always allocated from stack it wasn't causing as many problems.
On 18 Aug 2014, at 13:34, Akash <akbwiz+dovecot@gmail.com> wrote: