[Dovecot] Squat indexing a Maildir of over 600 GB?

Timo Sirainen tss at iki.fi
Mon Jul 21 19:50:50 EEST 2008


On Mon, 2008-07-21 at 12:37 -0400, John Wells wrote:
> Guys,
> 
> We have a very large maildir for email auditing purposes. It's
> currently at 600 GB and continues to grow.
> 
> Can dovecot handle this with squat indexing, or am I out of my mind?

You can try of course, but that might be a bit too much. :) I've only
tested with a 1,4 GB mailbox, and memory usage went to somewhere around
700 MB, I think.

It would be nice if Squat was able to scale to infinitely large
mailboxes, but currently I don't really see how that would be
possible.

There are two issues here:

1) It needs to keep a trie in memory containing all the 4-character
blocks of the messages. If the input data doesn't contain all that many
unique blocks, perhaps this doesn't grow too large even with 600 GB of
data. Maybe this could somehow be changed so that rarely used trie
branches are written to disk when memory usage gets too high.

2) Once the entire index is created, Dovecot goes through it again and
defragments all the pieces. This reduces the index size and speeds up
lookups, but if the index doesn't fit entirely into memory, this stage
can take a really, really long time.
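To make item 1 more concrete, here is a toy Python model of a trie of
4-character blocks. It is a sketch under stated assumptions, not
Dovecot's Squat code: the class and function names are hypothetical,
matching is assumed to be case-insensitive, and each leaf simply keeps
a set of message UIDs. It shows why memory tracks the number of unique
blocks rather than the raw size of the mailbox.

```python
BLOCK_LEN = 4  # the 4-character blocks mentioned above


class TrieNode:
    """One node of a toy character trie (hypothetical model, not Dovecot's layout)."""
    __slots__ = ("children", "uids")

    def __init__(self):
        self.children = {}  # next character -> child TrieNode
        self.uids = set()   # UIDs of messages containing the block that ends here


def add_message(root, uid, text):
    """Insert every 4-character block of `text` into the trie, tagged with `uid`."""
    text = text.lower()  # assumption: case-insensitive search
    for i in range(len(text) - BLOCK_LEN + 1):
        node = root
        for ch in text[i:i + BLOCK_LEN]:
            # Existing blocks reuse existing nodes; only new blocks allocate memory.
            node = node.children.setdefault(ch, TrieNode())
        node.uids.add(uid)


def candidates(root, word):
    """Return UIDs whose text contains every 4-character block of `word`.

    These are only candidates: matching blocks need not be adjacent, so a
    real search would still verify each message. Words shorter than
    BLOCK_LEN are not handled by this toy model and return an empty set.
    """
    word = word.lower()
    result = None
    for i in range(len(word) - BLOCK_LEN + 1):
        node = root
        for ch in word[i:i + BLOCK_LEN]:
            node = node.children.get(ch)
            if node is None:
                return set()  # some block never occurs in any message
        result = set(node.uids) if result is None else result & node.uids
    return result or set()


def count_nodes(node):
    """Count trie nodes, a rough proxy for the memory the trie needs."""
    return 1 + sum(count_nodes(child) for child in node.children.values())
```

For example, indexing "Hello world" and "world peace" as UIDs 1 and 2
makes candidates(root, "world") return {1, 2}. Indexing a third message
with the same text as the first adds no nodes at all, which is why a
huge but repetitive mailbox may need far less trie memory than its size
suggests.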

Originally I was thinking about dropping this stage since it seemed to
take forever, but then I figured out that if I first sequentially read
the entire index into memory before starting the defragmentation, it
takes a lot less time (with the 1,5 GB mailbox it dropped from
somewhere around 10 mins to 0,5 mins). But if your index is larger than
what fits into memory, this sequential read is pointless.
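The trade-off described above can be sketched as a toy Python model in
which the index is a file of scattered (offset, length) pieces that get
copied into one contiguous blob. The piece list, the memory_budget
parameter and the function names are hypothetical and do not reflect
Squat's on-disk format. When the file fits the budget it is read once
sequentially, and every piece lookup becomes an in-memory slice;
otherwise each piece costs a seek plus a read, which is the slow
random-I/O case.

```python
import os


def read_pieces(path, pieces, memory_budget):
    """Return the bytes of each (offset, length) piece of `path`, in order."""
    size = os.path.getsize(path)
    if size <= memory_budget:
        # Fits in memory: one sequential read, then random access is just slicing.
        with open(path, "rb") as f:
            buf = f.read()
        return [buf[off:off + n] for off, n in pieces]
    # Too large to preload: fall back to one seek + read per piece (random disk I/O).
    out = []
    with open(path, "rb") as f:
        for off, n in pieces:
            f.seek(off)
            out.append(f.read(n))
    return out


def defragment(path, pieces, memory_budget):
    """Copy scattered pieces into one contiguous blob; return it and the new offsets."""
    blob = bytearray()
    new_offsets = []
    for data in read_pieces(path, pieces, memory_budget):
        new_offsets.append(len(blob))  # where this piece now starts in the blob
        blob += data
    return bytes(blob), new_offsets
```

Both paths produce the same result; only the I/O pattern differs, which
is why the preload helps only while the whole index fits in memory.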
