[Dovecot] Design: Optimizations for high-latency storage backend
Mainly intended for future support of key-value (etc.) databases, but this is probably useful even for regular disk I/O. By using async I/O for disk access, the OS could more intelligently do seeks on disk to read the wanted data with less latency.
Problem #1: Accessing message body while searching (e.g. SEARCH BODY).
The idea is that searching wouldn't block. The caller would typically use the mailbox_search_next_nonblock() function, which can return a "try again later" result. Whenever the search sees a query that could potentially block (e.g. SEARCH BODY, or searching a header that might not be cached), it goes into a slightly different search mode:
- the whole search would typically be run from beginning to end immediately, and the search result state would be stored for each message.
- whenever search runs into a situation where the result can't be determined without opening message body, it issues a prefetch command for that message, marks the result for the message as "incomplete", and continues to the next message
- once the search is done with all messages, it starts replying to the caller. if the next search result is incomplete at the time, the search returns "try again later".
optimizations:
- probably there should be prefetch limit (a queue) so that only maybe 10 requests are actually pending at a time.
- probably the whole search shouldn't run immediately. maybe it should stop after having 10 requests pending.
- prefetch could return only the needed parts of the message, such as "only header" or "only body" (or in future something like "message without attachments")
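To make the search mode above a bit more concrete, here is a rough sketch of the scan/reply loop with a bounded prefetch queue. Everything named sketch_* (types, functions, the needs_body/cached_match arrays) is made up for illustration and isn't lib-storage API; the limit of 10 is just the number mentioned above, and the actual prefetch request is only a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-message search state; not Dovecot's real types. */
enum sketch_result {
	SKETCH_UNKNOWN,		/* not examined yet */
	SKETCH_MATCH,
	SKETCH_NO_MATCH,
	SKETCH_INCOMPLETE	/* body prefetch issued, result pending */
};

#define SKETCH_MAX_PENDING 10	/* assumed prefetch queue limit */

struct sketch_search {
	enum sketch_result *results;	/* one entry per message */
	const bool *needs_body;		/* can't be decided without the body */
	const bool *cached_match;	/* answer when decidable from cache */
	size_t count;
	size_t scan_pos;		/* next message to examine */
	size_t reply_pos;		/* next message to report to caller */
	size_t pending;			/* prefetches currently in flight */
};

/* Examine messages until done or until the prefetch queue is full. */
static void sketch_search_scan(struct sketch_search *s)
{
	while (s->scan_pos < s->count && s->pending < SKETCH_MAX_PENDING) {
		size_t i = s->scan_pos++;

		if (s->needs_body[i]) {
			/* a real backend would issue the prefetch here */
			s->results[i] = SKETCH_INCOMPLETE;
			s->pending++;
		} else {
			s->results[i] = s->cached_match[i] ?
				SKETCH_MATCH : SKETCH_NO_MATCH;
		}
	}
}

/* Called when the backend has delivered the body for message i. */
static void sketch_search_body_arrived(struct sketch_search *s, size_t i,
				       bool match)
{
	assert(s->results[i] == SKETCH_INCOMPLETE);
	s->results[i] = match ? SKETCH_MATCH : SKETCH_NO_MATCH;
	s->pending--;
}

/* Returns 1 and sets *idx_r for the next match, 0 for "try again later",
   -1 when the search is finished. */
static int sketch_search_next_nonblock(struct sketch_search *s, size_t *idx_r)
{
	for (;;) {
		sketch_search_scan(s);
		if (s->reply_pos == s->count)
			return -1;

		enum sketch_result r = s->results[s->reply_pos];
		if (r == SKETCH_INCOMPLETE || r == SKETCH_UNKNOWN)
			return 0;

		size_t i = s->reply_pos++;
		if (r == SKETCH_MATCH) {
			*idx_r = i;
			return 1;
		}
	}
}
```

Results are handed out strictly in message order here, so one slow body holds back later matches that are already known. Whether that's acceptable is a separate question.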
There would be a new call, something like mailbox_search_set_nonblock_callback() that specifies a function to be called when next search result is available. The callback then would continue the search. This is similar to how IMAP SEARCH is already implemented, except currently when mailbox_search_next_nonblock() returns "try again later", Dovecot simply adds a 0 second timeout after which it's called again (that's enough for Dovecot to do some other pending work).
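A minimal sketch of how such a callback could replace the 0 second timeout: the backend marks a result as ready, and the callback fires only when the caller is actually waiting for that result. All sketch_* names are hypothetical, and the ready[] array stands in for whatever state the real search would keep.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_ctx;
typedef void sketch_callback_t(struct sketch_ctx *ctx, void *context);

/* Hypothetical search context; not Dovecot's real struct. */
struct sketch_ctx {
	bool *ready;		/* ready[i]: result i is available */
	size_t count;
	size_t pos;		/* next result to hand out */
	sketch_callback_t *callback;
	void *context;
};

static void sketch_set_nonblock_callback(struct sketch_ctx *ctx,
					 sketch_callback_t *cb, void *context)
{
	ctx->callback = cb;
	ctx->context = context;
}

/* 1 = result returned, 0 = "try again later", -1 = finished */
static int sketch_next_nonblock(struct sketch_ctx *ctx, size_t *idx_r)
{
	if (ctx->pos == ctx->count)
		return -1;
	if (!ctx->ready[ctx->pos])
		return 0;
	*idx_r = ctx->pos++;
	return 1;
}

/* Backend side: result i became available. */
static void sketch_mark_ready(struct sketch_ctx *ctx, size_t i)
{
	assert(i < ctx->count);
	ctx->ready[i] = true;
	/* wake the caller only if it is waiting for exactly this result */
	if (i == ctx->pos && ctx->callback != NULL)
		ctx->callback(ctx, ctx->context);
}

/* Example caller callback: take everything that is available right now
   and count it in the size_t that context points to. */
static void sketch_drain(struct sketch_ctx *ctx, void *context)
{
	size_t *returned = context;
	size_t idx;

	while (sketch_next_nonblock(ctx, &idx) > 0)
		(*returned)++;
}
```

The caller runs the same drain function once up front and then again from the callback, so there's no polling in between.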
Problem #2: Accessing index cache while searching.
The cache file is typically already in memory or on a low-latency disk. Anyway, something similar to #1 could be implemented for it, e.g. a "make sure bytes x..y of the cache file have been read to memory" call or something.
Problem #3: Fetching message body (and uncached fields).
The nice thing about lib-storage's fetching API is that the caller already specifies what fields it intends to access in mailbox_search_init(). So this code can use a technique similar to searching:
- callers only use mailbox_search_next_nonblock()
- which keeps returning "try again later" until the message body has been read to memory
- trying to access a field that's not specified in wanted_fields in mailbox_search_init() could be (optionally) treated as a programming error and cause a crash instead of a blocking wait.
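The wanted_fields check in the last point could look roughly like this. The sketch_* names, the field flags and the strict switch are all assumptions made up for the example; only the idea (undeclared access is a bug, not a silent blocking read) comes from the list above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical field flags; not lib-storage's real enum. */
#define SKETCH_FIELD_HEADER 0x01u
#define SKETCH_FIELD_BODY   0x02u

enum sketch_access {
	SKETCH_ACCESS_OK,		/* field is in memory */
	SKETCH_ACCESS_MUST_BLOCK	/* caller falls back to a blocking read */
};

struct sketch_mail {
	unsigned int wanted_fields;	/* declared when the search was set up */
	unsigned int loaded_fields;	/* already read to memory */
	bool strict;			/* treat undeclared access as a bug */
};

static enum sketch_access
sketch_mail_access(const struct sketch_mail *mail, unsigned int field)
{
	assert(field != 0);
	if ((mail->loaded_fields & field) == field)
		return SKETCH_ACCESS_OK;

	if (mail->strict && (mail->wanted_fields & field) != field) {
		/* the caller never declared this field: crash instead of
		   hiding a blocking wait */
		fprintf(stderr, "field 0x%x not in wanted_fields\n", field);
		abort();
	}
	return SKETCH_ACCESS_MUST_BLOCK;
}
```

With strict disabled, an undeclared access just degrades to the old blocking behavior, which is probably what you'd want in production builds.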
Problem #4: How to drop latencies between IMAP commands?
For example a (stupid) client issues commands:
a FETCH 1 BODY.PEEK[]
a FETCH 2 BODY.PEEK[]
..etc..
You really don't want a delay between each command. So what needs to happen in Dovecot code is to start processing the next command while waiting for the previous one to finish reading the body to memory. A lot of this actually already works as intended, but there are some cases that don't, such as FETCH + SELECT (see #5).
Problem #5: Opening mailbox index.
So if your indexes are also stored on high-latency disk, how do you optimize getting STATUS for each mailbox? Again, a similar idea as before: create a new mailbox_open_nonblock() function that calls a given callback function when the indexes are read to memory. During the wait it can continue processing more STATUS commands (hopefully the client sends them all at once, instead of waiting for each reply before sending the next STATUS).
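A rough sketch of the callback-based open, with several STATUS commands in flight at once. mailbox_open_nonblock() doesn't exist yet, so the signature here is a guess, and all the sketch_* names and the reply log are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_mailbox;
typedef void sketch_open_callback_t(struct sketch_mailbox *box, void *context);

/* Hypothetical mailbox handle; not Dovecot's real struct mailbox. */
struct sketch_mailbox {
	const char *name;
	bool index_loaded;
	sketch_open_callback_t *callback;
	void *context;
};

/* Run the pending callback at most once. */
static void sketch_mailbox_fire(struct sketch_mailbox *box)
{
	sketch_open_callback_t *cb = box->callback;

	box->callback = NULL;
	if (cb != NULL)
		cb(box, box->context);
}

/* Start opening; the callback runs once the indexes are in memory. */
static void sketch_mailbox_open_nonblock(struct sketch_mailbox *box,
					 sketch_open_callback_t *cb,
					 void *context)
{
	assert(cb != NULL);
	box->callback = cb;
	box->context = context;
	if (box->index_loaded)
		sketch_mailbox_fire(box);
	/* otherwise a real backend would submit the index read here */
}

/* Backend side: the index read for this mailbox finished. */
static void sketch_mailbox_index_arrived(struct sketch_mailbox *box)
{
	box->index_loaded = true;
	sketch_mailbox_fire(box);
}

/* Example STATUS handler: records which mailbox got its reply, in order. */
struct sketch_status_log {
	const char *replied[8];
	size_t count;
};

static void sketch_status_reply(struct sketch_mailbox *box, void *context)
{
	struct sketch_status_log *log = context;

	if (log->count < 8)
		log->replied[log->count++] = box->name;
}
```

Note that replies go out in whatever order the index reads complete, not in the order the STATUS commands arrived.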
..
I think that's all of the problematic cases that could cause blocking. And the solution is always the same: use prefetching and a "try again later" status code, with a callback function to be called when it's ready. Luckily there are only two functions that need the above behavior; implementing async callbacks for all functions would have made programming it horrible.
The main problem will then be to optimize what and how much to prefetch. For example in the cache file case, should the cache be read to memory when the mailbox is opened, or should it be delayed until something actually needs to be read from it? Should all of it be read, or should Dovecot attempt to be smarter to reduce memory usage and I/O bandwidth, so e.g. if the cache needs to be accessed for the 3rd last message, maybe just read the cache file beginning from that message to EOF? Transaction log files have a similar problem. Actually all of this is similar to how Dovecot attempts to cache them to memory when using mmap_disable=yes, so it's not really a new problem.
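The "read from that message to EOF" idea could be sketched as below. This assumes cache records are laid out in message order with ascending offsets, which is a simplification for the example and not necessarily how the cache file works; the sketch_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte range [start, end) of the cache file to prefetch. */
struct sketch_range {
	uint64_t start;
	uint64_t end;
};

/* Assumes record_offsets[] is ascending and in message order.
   Returns the range from the first wanted message's record to EOF,
   or an empty range if first_wanted is past the last message. */
static struct sketch_range
sketch_cache_tail_range(const uint64_t *record_offsets, size_t count,
			size_t first_wanted, uint64_t file_size)
{
	struct sketch_range r = { file_size, file_size };

	if (first_wanted < count) {
		assert(record_offsets[first_wanted] <= file_size);
		r.start = record_offsets[first_wanted];
	}
	return r;
}
```

For example with four records at offsets 100, 250, 400 and 900 in a 1200 byte file, asking for the 3rd last message (index 1) gives the range 250..1200, i.e. 950 bytes instead of the whole file.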
Oh, and doing all the prefetching stuff is made a lot easier by implementing the filesystem API change I had talked about previously (so at least indexes and dbox code would do all FS access via the API). And the FS API change will be a lot easier if it doesn't need to support things like locking or overwriting data, so doing that also requires a redesign of the index files. I don't remember if I wrote specifics of that to this mailing list, but I've thought out most of that too. And the index redesign is the reason I'm considering v3.0 as the next version number after v2.0. (Although v2.1 might also come with some smaller new features while I'm also working on v3.0 features.)
On 2009-12-29 2:06 PM, Timo Sirainen wrote:
> Oh, and doing all the prefetching stuff is made a lot easier by implementing the filesystem API change I had talked about previously (so at least indexes and dbox code would do all FS access via the API). And the FS API change will be a lot easier if doesn't need to support things like locking or overwriting data, so to do that also requires redesign of index files. I don't remember if I wrote specifics of that to this mailing list, but I've thought out most of that too. And the index redesign is the reason I'm considering v3.0 as the next version number after v2.0. (Although v2.1 might also come with some smaller new features while I'm also working on v3.0 features.)
Hi Timo,
Thanks for sharing your thoughts/plans about future changes/enhancements to dovecot like this - I really enjoy reading them, and I'm sure others do too...
--
Best regards,
Charles