On Mon, 2008-07-21 at 12:37 -0400, John Wells wrote:
Guys,
We have a very large maildir for email auditing purposes. It's currently at 600 GB and continues to grow.
Can dovecot handle this with squat indexing, or am I out of my mind?
You can try of course, but that might be a bit too much. :) I've only tested with a 1.4 GB mailbox, and memory usage went up to somewhere around 700 MB, I think.
It would be nice if Squat were able to scale to infinitely large mailboxes, but currently I don't really see how that would be possible.
There are two issues here:
It needs to keep an in-memory trie containing all the 4-character blocks of the messages. If the input data doesn't contain that many unique blocks, perhaps this doesn't grow too large even with 600 GB of data. Maybe this could somehow be changed so that rarely used trie branches are written to disk when memory usage gets too high.
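To make the first issue concrete, here is a minimal sketch of indexing 4-character blocks into a nested-dict trie. The structure and names are my own illustration, not Dovecot's actual Squat on-disk format; it only shows why memory grows with the number of unique blocks.

```python
def add_message(trie, uid, text):
    """Index every 4-character block of a message into a nested-dict trie.

    Each leaf (keyed by None) holds the set of message UIDs that
    contain that block. Memory use grows with the number of *unique*
    blocks, not with total mailbox size."""
    for i in range(len(text) - 3):
        node = trie
        for ch in text[i:i + 4]:
            node = node.setdefault(ch, {})
        node.setdefault(None, set()).add(uid)


def lookup(trie, block):
    """Return the UIDs of messages containing the given 4-character block."""
    node = trie
    for ch in block:
        node = node.get(ch)
        if node is None:
            return set()
    return node.get(None, set())


trie = {}
add_message(trie, 1, "hello world")
add_message(trie, 2, "hello there")
lookup(trie, "hell")   # both messages share this block
lookup(trie, "worl")   # only message 1 contains it
```

With highly repetitive mail (headers, boilerplate) many blocks are shared, so the trie can stay much smaller than the raw data; with diverse content it balloons, which is the scaling problem described above.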
Once the entire index has been created, Dovecot goes through it again and defragments all the pieces. This reduces the index size and speeds up lookups, but if the index doesn't fit entirely into memory, this stage can take a really, really long time.
Originally I was thinking about dropping this stage since it seemed to take forever, but then I figured out that if I first sequentially read the entire index into memory before starting the defragmentation, it took a lot less time (with the 1.5 GB mailbox it dropped from around 10 minutes to 0.5 minutes). But if your index is larger than what fits into memory, this sequential read is pointless.
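The sequential-read trick above can be sketched like this. The record layout is invented for illustration (Dovecot's real index format is different); the point is only the two-pass structure: one big sequential read, then compaction against the in-memory buffer instead of disk seeks.

```python
import struct


def defragment(path, out_path):
    """Compact a toy index file by dropping dead records.

    Invented record layout for this sketch: 1 flag byte (1 = live,
    0 = dead), a 4-byte big-endian payload length, then the payload."""
    # Pass 1: sequential read of the whole file into memory. This is
    # the optimization described above -- it only helps while the
    # index still fits in RAM.
    with open(path, "rb") as f:
        data = f.read()

    # Pass 2: walk the in-memory buffer and copy only live records,
    # packed tightly together (the "defragmentation").
    out = bytearray()
    pos = 0
    while pos < len(data):
        flag = data[pos]
        (size,) = struct.unpack_from(">I", data, pos + 1)
        record_end = pos + 5 + size
        if flag:
            out += data[pos:record_end]
        pos = record_end

    with open(out_path, "wb") as f:
        f.write(bytes(out))
```

Once the index exceeds RAM, pass 1 just evicts its own earlier pages, so the compaction in pass 2 degrades back to random disk seeks, which is why the trick is pointless for oversized indexes.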