[Dovecot] Design: Optimizations for high-latency storage backend

Tue Dec 29 20:28:46 EET 2009

Mainly intended for future support of key-value (etc.) databases, but
this is probably useful even for regular disk I/O. By using async I/O
for disk access, the OS could more intelligently do seeks on disk to
read the wanted data with less latency.

Problem #1: Accessing message body while searching (e.g. SEARCH BODY).

The idea is that searching wouldn't block. Caller would typically use
mailbox_search_next_nonblock() function, which can return "try again
later" result. Whenever search sees a query that could potentially block
(e.g. SEARCH BODY, or searching a header that might not be cached), it
goes into a slightly different search mode:

 - the whole search would typically be run from beginning to end
immediately, and the search result state would be stored for each
message.
 - whenever search runs into a situation where the result can't be
determined without opening message body, it issues a prefetch command
for that message, marks the result for the message as "incomplete", and
continues to the next message
 - once the search is done with all messages, it starts replying to the
caller. If the next search result is still incomplete at that time, the
search returns "try again later".
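
The steps above could be sketched roughly as follows. This is only an
illustration, not Dovecot code: every sketch_* name, the enums and the
body_in_memory flag (standing in for "the prefetch has completed") are
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types for illustration; not Dovecot's real structures. */
enum sketch_result { RESULT_NO_MATCH, RESULT_MATCH, RESULT_INCOMPLETE };
enum sketch_next { NEXT_OK, NEXT_TRY_AGAIN, NEXT_DONE };

struct sketch_msg {
	bool body_in_memory;  /* true once the (simulated) prefetch is done */
	bool body_matches;    /* what the body search will eventually find */
	bool prefetch_issued;
	enum sketch_result state;
};

struct sketch_search {
	struct sketch_msg *msgs;
	size_t count;
	size_t next;          /* next message to report to the caller */
};

/* Decide the result now if possible, otherwise prefetch and defer. */
static void sketch_eval(struct sketch_msg *msg)
{
	if (!msg->body_in_memory) {
		msg->prefetch_issued = true; /* stand-in for a real request */
		msg->state = RESULT_INCOMPLETE;
	} else {
		msg->state = msg->body_matches ?
			RESULT_MATCH : RESULT_NO_MATCH;
	}
}

/* First pass: run over all messages immediately, never blocking. */
static void sketch_search_init(struct sketch_search *s,
			       struct sketch_msg *msgs, size_t count)
{
	s->msgs = msgs;
	s->count = count;
	s->next = 0;
	for (size_t i = 0; i < count; i++)
		sketch_eval(&msgs[i]);
}

/* Return the next matching message index, or "try again later". */
static enum sketch_next
sketch_search_next_nonblock(struct sketch_search *s, size_t *idx_r)
{
	while (s->next < s->count) {
		struct sketch_msg *msg = &s->msgs[s->next];

		if (msg->state == RESULT_INCOMPLETE)
			sketch_eval(msg); /* the body may have arrived */
		if (msg->state == RESULT_INCOMPLETE) {
			assert(msg->prefetch_issued);
			return NEXT_TRY_AGAIN;
		}
		s->next++;
		if (msg->state == RESULT_MATCH) {
			*idx_r = s->next - 1;
			return NEXT_OK;
		}
	}
	return NEXT_DONE;
}
```

Results are reported in message order here, so one slow message holds
back later ones that are already decided. That matches the "if the next
search result is incomplete" wording above.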

optimizations:
 - probably there should be prefetch limit (a queue) so that only maybe
10 requests are actually pending at a time.
 - probably the whole search shouldn't run immediately. maybe it should
stop after having 10 requests pending.
 - prefetch could return only the needed parts of the message, such as
"only header" or "only body" (or in future something like "message
without attachments")
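
A minimal sketch of the first two optimizations, assuming a simple
counter is enough to model the queue. The sketch_* names and the limit
constant are hypothetical; the value 10 is just the "maybe 10" from
above. The part argument shows where "only header" or "only body" would
be passed to the backend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_PENDING 10 /* the "maybe 10" mentioned in the text */

/* Which part of the message the caller needs (hypothetical). */
enum sketch_part { PART_HEADER, PART_BODY };

struct sketch_prefetcher {
	unsigned int pending; /* requests issued but not yet finished */
};

/* Try to issue a prefetch; returns false when the window is full. */
static bool sketch_prefetch(struct sketch_prefetcher *p,
			    unsigned int uid, enum sketch_part part)
{
	(void)uid;  /* a real backend would send the request here */
	(void)part;
	if (p->pending >= SKETCH_MAX_PENDING)
		return false;
	p->pending++;
	return true;
}

/* Called when the backend has finished one request. */
static void sketch_prefetch_done(struct sketch_prefetcher *p)
{
	assert(p->pending > 0);
	p->pending--;
}

/* Scan forward from *next_idx and stop as soon as the window fills up.
   Returns the number of prefetches issued by this call. */
static size_t sketch_scan(struct sketch_prefetcher *p,
			  const unsigned int *uids, size_t count,
			  size_t *next_idx)
{
	size_t issued = 0;

	while (*next_idx < count) {
		if (!sketch_prefetch(p, uids[*next_idx], PART_BODY))
			break;
		(*next_idx)++;
		issued++;
	}
	return issued;
}
```

The scan position is kept by the caller, so the search can simply call
sketch_scan() again after some requests have completed.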

There would be a new call, something like
mailbox_search_set_nonblock_callback() that specifies a function to be
called when next search result is available. The callback then would
continue the search. This is similar to how IMAP SEARCH is already
implemented, except currently when mailbox_search_next_nonblock()
returns "try again later", Dovecot simply adds a 0 second timeout after
which it's called again (that's enough for Dovecot to do some other
pending work).
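
A rough sketch of what the callback registration might look like. Only
the name mailbox_search_set_nonblock_callback() comes from the text
above; the signature, the context struct and sketch_prefetch_finished()
are assumptions made for the example, and the names are prefixed with
sketch_ to make that clear.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback type; the real signature is not decided. */
typedef void sketch_callback_t(void *context);

struct sketch_search_ctx {
	sketch_callback_t *callback;
	void *context;
	unsigned int pending; /* prefetches still in flight */
};

/* Register the function to call when the next result may be ready. */
static void
sketch_search_set_nonblock_callback(struct sketch_search_ctx *ctx,
				    sketch_callback_t *callback,
				    void *context)
{
	ctx->callback = callback;
	ctx->context = context;
}

/* Called by the storage backend when one prefetch has finished.
   Instead of the caller polling with a 0 second timeout, it gets
   woken up exactly when there is something new to look at. */
static void sketch_prefetch_finished(struct sketch_search_ctx *ctx)
{
	assert(ctx->pending > 0);
	ctx->pending--;
	if (ctx->callback != NULL)
		ctx->callback(ctx->context);
}

/* Example callback: a real one would continue the search by calling
   the nonblocking "next" function again. This one only counts. */
static void sketch_count_wakeup(void *context)
{
	unsigned int *wakeups = context;

	(*wakeups)++;
}
```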

Problem #2: Accessing index cache while searching.

The cache file is typically already in memory or on a low-latency disk.
Still, something similar to #1 could be implemented for it, e.g. a call
meaning "make sure bytes x..y of the cache file have been read to
memory".

Problem #3: Fetching message body (and uncached fields).

The nice thing about lib-storage's fetching API is that the caller
already specifies what fields it intends to access in
mailbox_search_init(). So this code can use a technique similar to
searching:

 - callers only use mailbox_search_next_nonblock()
 - which keeps returning "try again later" until the message body has
been read to memory
 - trying to access a field that's not specified in wanted_fields in
mailbox_search_init() could be (optionally) treated as a programming
error and cause a crash instead of a blocking wait.
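
The field check could look something like the sketch below. The field
bits, struct and function names are all hypothetical. The plain check
returns a status so the caller can tell the three cases apart, and the
strict variant shows the optional "treat it as a programming error"
behavior using assert().

```c
#include <assert.h>

/* Hypothetical field bits; Dovecot's real wanted_fields differ. */
enum sketch_field {
	SKETCH_FIELD_HEADER = 0x01,
	SKETCH_FIELD_BODY   = 0x02,
	SKETCH_FIELD_SIZE   = 0x04
};

enum sketch_access {
	SKETCH_ACCESS_OK,         /* field is in memory, safe to read */
	SKETCH_ACCESS_TRY_AGAIN,  /* wanted, but still being fetched */
	SKETCH_ACCESS_NOT_WANTED  /* caller never asked for this field */
};

struct sketch_mail {
	unsigned int wanted_fields; /* given when the search was set up */
	unsigned int loaded_fields; /* what has been read to memory */
};

/* Classify an access without ever blocking. */
static enum sketch_access
sketch_mail_check(const struct sketch_mail *mail, unsigned int field)
{
	if ((mail->wanted_fields & field) != field)
		return SKETCH_ACCESS_NOT_WANTED;
	if ((mail->loaded_fields & field) != field)
		return SKETCH_ACCESS_TRY_AGAIN;
	return SKETCH_ACCESS_OK;
}

/* Strict variant: an unannounced field is a bug in the caller, so
   crash instead of silently falling back to a blocking read. */
static enum sketch_access
sketch_mail_check_strict(const struct sketch_mail *mail,
			 unsigned int field)
{
	enum sketch_access ret = sketch_mail_check(mail, field);

	assert(ret != SKETCH_ACCESS_NOT_WANTED);
	return ret;
}
```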

Problem #4: How to drop latencies between IMAP commands?

For example a (stupid) client issues commands:

a FETCH 1 BODY.PEEK[]
a FETCH 2 BODY.PEEK[]
..etc..

You really don't want a delay between each command. So what Dovecot
needs to do is start processing the next command while waiting for the
previous one to finish reading the body to memory. A lot of this
actually already works as intended, but there are some cases that
don't, such as FETCH + SELECT (see #5).

Problem #5: Opening mailbox index.

So if your indexes are also stored on a high-latency disk, how do you
optimize getting STATUS for each mailbox? Again, the idea is similar to
before: Create a new mailbox_open_nonblock() function that calls a given
callback function when the indexes are read to memory. During the wait
it can continue processing more STATUS commands (hopefully client sends
them all at once, instead of waiting for each reply before sending next
STATUS).
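
As an illustration, several opens could be kept in flight at once like
this. Only the mailbox_open_nonblock() name comes from the text; the
slot table, the signatures and the sketch_* names are assumptions for
the example. The example callback just records the first letter of each
mailbox name so the completion order is visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_OPENS 8 /* arbitrary limit for the sketch */

typedef void sketch_open_callback_t(const char *name, void *context);

struct sketch_open {
	const char *name;
	sketch_open_callback_t *callback;
	void *context;
	bool in_use;
};

struct sketch_opener {
	struct sketch_open slots[SKETCH_MAX_OPENS];
};

/* Start opening a mailbox. Returns a request id, or -1 if too many
   opens are already pending. Never waits for the index to load. */
static int sketch_mailbox_open_nonblock(struct sketch_opener *o,
					const char *name,
					sketch_open_callback_t *callback,
					void *context)
{
	for (int i = 0; i < SKETCH_MAX_OPENS; i++) {
		struct sketch_open *slot = &o->slots[i];

		if (!slot->in_use) {
			slot->name = name;
			slot->callback = callback;
			slot->context = context;
			slot->in_use = true;
			return i;
		}
	}
	return -1;
}

/* Called by the backend once the indexes for request id are in memory. */
static void sketch_index_loaded(struct sketch_opener *o, int id)
{
	assert(id >= 0 && id < SKETCH_MAX_OPENS && o->slots[id].in_use);
	struct sketch_open *slot = &o->slots[id];

	slot->in_use = false;
	slot->callback(slot->name, slot->context);
}

/* Example callback: records the order in which replies would be sent. */
struct sketch_reply_log {
	char buf[SKETCH_MAX_OPENS + 1];
	size_t len;
};

static void sketch_log_status_reply(const char *name, void *context)
{
	struct sketch_reply_log *log = context;

	if (log->len < SKETCH_MAX_OPENS) {
		log->buf[log->len++] = name[0];
		log->buf[log->len] = '\0';
	}
}
```

With three STATUS commands pending, the callbacks run in whatever order
the backend finishes loading the indexes, not in the order the commands
were sent.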

..

I think that's all of the problematic cases that could cause blocking.
And the solution is always the same: Use prefetching and "try again
later" status code with a callback function to be called when it's
ready. Luckily there are only two functions that need the above
behavior. Implementing async callbacks for all functions would have made
programming it horrible.

The main problem will then be to optimize what and how much to prefetch.
For example in the cache file case, should the cache be read to memory
when mailbox is opened, or should it be delayed until something is
actually wanted to be read from it? Should all of it be read, or should
Dovecot attempt to be smarter to reduce memory usage and I/O bandwidth,
so e.g. if the cache needs to be accessed for the 3rd last message, maybe
just read the cache file beginning from that message to EOF? Transaction
log files have a similar problem. Actually all of this is similar to how
Dovecot attempts to cache them to memory when using mmap_disable=yes, so
it's not really a new problem.