On 10/7/2012 7:11 PM, Timo Sirainen wrote:
I don't think dovecot.index file is much of a problem. With 1M mails it usually only takes something like 8-32 MB of memory depending on what mailbox format is used. dovecot.index.log file doesn't depend on the mailbox size at all. The main problem is dovecot.index.cache file.
I've also thought about the cache file problems before, but it's a bit difficult to figure out the best solution for them. And since nobody had actually complained about it, I hadn't really done anything about it. Also, I hadn't previously thought of LMTP/LDA processes crashing because of it, and that's a bigger problem than an IMAP process crashing. Although I think you're getting a lot more "mmap(dovecot.index.cache) failed: Out of memory" errors than crashes for large mailboxes?
So, subproblems related to this:
1. Filling up dovecot.index.cache too easily. A rather simple possibility that would catch all the possible ways of filling it would be to limit the max. size of a single message's cache entry to X kilobytes (64?). If it becomes larger, it's simply not written to the cache file.
2. Filling up memory too easily. If a long header needs to be cached or used for other purposes (e.g. Message-ID), it's still fully read into memory. Add some reasonable limit to the max. length of a single header. It can't be too small, because some headers are legitimately pretty long (DKIM and such). Maybe something like 10kB would be safe enough for everyone?
3. If the existing dovecot.index.cache is larger than X MB, shrink it below X first. Shrinking could begin by trying the nice way of removing only unneeded data, but if that fails it could forcibly remove some old messages. The X would have to be related to the process's VSZ limit.
4. Dovecot currently doesn't close index files immediately when a mailbox is closed, because it assumes IMAP clients might reopen the index soon anyway. Max 3 indexes can be kept open, so three different, already very large indexes can be too much. I'm not sure if this is actually useful at all. Maybe I should disable it for LMTP, or maybe just remove it completely.
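To make the first three subproblems concrete, here is a minimal sketch against a toy in-memory cache (a Python OrderedDict of message UID to {field name: bytes}, oldest first) rather than Dovecot's real dovecot.index.cache format. The function names and the 64 KB / 10 kB constants are hypothetical, taken from the numbers floated above; this is not Dovecot code.

```python
from collections import OrderedDict

# Hypothetical limits from the figures suggested above, not Dovecot settings.
MAX_ENTRY_SIZE = 64 * 1024   # per-message cache entry cap (the "64?" KB)
MAX_HEADER_LEN = 10 * 1024   # per-header cap (the "10kB")


def new_cache():
    """An empty toy cache; insertion order stands in for message age."""
    return OrderedDict()


def entry_size(fields):
    """Approximate size of one message's entry: field names plus values."""
    return sum(len(name) + len(value) for name, value in fields.items())


def maybe_cache(cache, uid, fields):
    """Store the entry only if it fits; an oversized entry is simply skipped."""
    if entry_size(fields) > MAX_ENTRY_SIZE:
        return False
    cache[uid] = fields
    return True


def read_capped(chunks, limit=MAX_HEADER_LEN):
    """Keep at most `limit` bytes of a header value.

    `chunks` is any iterable of bytes (e.g. successive read() results), so
    the excess is consumed and discarded instead of being buffered whole.
    """
    kept = bytearray()
    for chunk in chunks:
        room = limit - len(kept)
        if room > 0:
            kept += chunk[:room]
    return bytes(kept)


def shrink(cache, max_bytes, unneeded=frozenset()):
    """Get the cache under max_bytes: the nice way first, then by force."""
    # Nice pass: drop fields that nobody wants cached any more.
    for fields in cache.values():
        for name in unneeded:
            fields.pop(name, None)
    # Forced pass: evict the oldest messages until the total fits.
    size = sum(entry_size(f) for f in cache.values())
    while cache and size > max_bytes:
        _, fields = cache.popitem(last=False)
        size -= entry_size(fields)
```

In a real implementation the `max_bytes` passed to `shrink` would be derived from the process's VSZ limit, as suggested above; the sketch leaves that choice to the caller.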
Part 3 is the one I least like changing. An alternative solution would be to just not map the entire cache file into memory all at once. The code was actually originally designed to do just that, but munmap()ing + mmap()ing again wasn't very efficient. But for LMTP there's really no need to map the whole file. All it really wants is to read a couple of header records and then append to the file. Maybe it could use an alternative code path that would simply do that instead of mmap()ing anything. It wouldn't solve it for IMAP though.
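A minimal sketch of what such an append-only path could look like: read a fixed-size header, append one record at the end, and update a counter in the header, without mapping or reading the existing records. The file layout (an 8-byte version/record-count header followed by length-prefixed records) and the function names are hypothetical and do not describe the real dovecot.index.cache format; locking is assumed to be the caller's job.

```python
import io
import struct

# Hypothetical layout, NOT Dovecot's real cache format:
# header = (version, record_count) as two little-endian uint32s,
# followed by records, each prefixed with a uint32 length.
HEADER = struct.Struct("<II")
LENGTH = struct.Struct("<I")


def new_cache_file(f, version=1):
    """Write an empty header to a fresh, seekable binary file."""
    f.write(HEADER.pack(version, 0))


def append_record(f, record: bytes) -> int:
    """Append one record without touching existing records.

    Only the header is read, so memory use does not grow with file size.
    Returns the offset at which the record was written.
    """
    f.seek(0)
    version, count = HEADER.unpack(f.read(HEADER.size))
    offset = f.seek(0, io.SEEK_END)
    f.write(LENGTH.pack(len(record)) + record)
    f.seek(0)
    f.write(HEADER.pack(version, count + 1))
    return offset


if __name__ == "__main__":
    buf = io.BytesIO()
    new_cache_file(buf)
    print(append_record(buf, b"hdr.subject=hi"))  # offset of the new record
```

The same functions work on a regular file opened with `open(path, "r+b")`; an in-memory BytesIO is used in the usage block only to keep it self-contained.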
A limit of 50-70 bytes is way too little. The cached subject gets sent to the IMAP client. I think 200 bytes would be the minimum, and 1000 would be something I could probably even hardcode. But anyway, the subject isn't the only way to trigger this, and 1000 bytes is too low for some headers.
Nearly all mail servers have two resources of interest here in great excess: CPU cycles and cache/RAM bandwidth, due to multicore CPUs and 2-4 memory channels per socket. The two bottlenecks are IO bandwidth/latency and, for many, RAM capacity. So let's take advantage of both the strengths and weaknesses of our hardware to possibly address the above issue.
What happens if we insert a subroutine to compress/decompress each field in the cache array files individually, in real time? You should still be able to mmap the files. The individual array fields and total cache file sizes would be much smaller on disk and in memory. Any cache file contents mapped to memory that aren't currently being used are stored compressed in memory, directly addressing the problem in this thread. When a field is needed, we decompress it on the fly after reading it from memory. This should be very fast, as the fields are relatively small. When it's written out, we compress it on the fly. With each field stored compressed on disk, not only is file size decreased but, more importantly, each read/write moves more data per physical IO. So not only are we increasing storage capacity, we're also decreasing IOPS.
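A minimal sketch of the per-field idea, using Python's zlib module as a stand-in for libz. The function names and the one-byte "compressed or raw" flag are hypothetical. One practical detail matters for small fields: the zlib format adds a 2-byte header and a 4-byte Adler-32 trailer, so a short value can grow when compressed, and the sketch stores such fields raw. It only shows the encode/decode step, not how variable-length compressed fields would be laid out in an mmap()ed file.

```python
import zlib

RAW = b"\x00"         # hypothetical flag: field stored uncompressed
COMPRESSED = b"\x01"  # hypothetical flag: field stored zlib-compressed


def pack_field(value: bytes, level: int = 6) -> bytes:
    """Compress one cache field on its own, falling back to raw storage."""
    packed = zlib.compress(value, level)
    if len(packed) < len(value):
        return COMPRESSED + packed
    # Small fields often get bigger after compression; keep them as-is.
    return RAW + value


def unpack_field(stored: bytes) -> bytes:
    """Inverse of pack_field: decompress only if the flag says so."""
    if stored[:1] == COMPRESSED:
        return zlib.decompress(stored[1:])
    return stored[1:]
```

Because each field is compressed independently, reading one field never requires decompressing its neighbours; the trade-off is a worse compression ratio than compressing the whole file as one stream.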
It would be preferable to do this de/compression in the kernel rather than in user space, but I don't think that's a real option. However, libz and libbz2 are pretty fast and small, and the code easily fits in CPU cache. Combined with the massive L1/L2/L3 and RAM bandwidth of modern systems, execution in user space should still be very fast and not noticeably degrade performance.
I'm not a programmer, so I have no idea if this is even plausible, or possible. But if it is, it seems worth exploring, as it would seem to benefit Dovecot performance in multiple areas, and possibly solve this and other current/future memory capacity and/or performance related problems.
-- Stan