[Dovecot] Design: Asynchronous I/O for single/multi-dbox

Timo Sirainen tss at iki.fi
Mon Mar 15 21:55:31 EET 2010


On 15.3.2010, at 21.37, Edgar Fuß wrote:

>> handle = open(path, mode)
>> - this function can't fail. it's executed asynchronously.
> Does that mean you can successfully open("/nonexistent", mode); write() to it over and over again and only the commit() fails?

Probably. But in the mdbox case you lock the file first. Hmm, I suppose locking can't work until the file is fully opened, so there needs to be a way to set some kind of "open finished" callback too.

>> handle = create(path, mode, permissions)
> [...]
>> - mode=fail-if-exists: commit() fails if file already exists
>> - mode=replace-if-exists: commit() replaces file if it already exists
> s/commit/create/g

No, commit actually. create() is asynchronous; only at the commit() stage does it actually try to create or replace the file. Writes in general are atomic: they aren't visible until commit() is called.
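
To illustrate the deferred-failure semantics, here is a minimal sketch for a local POSIX filesystem. It is not the actual design: every name in it (struct create_handle, sketch_create(), sketch_write(), sketch_commit(), the ".tmp.<pid>" suffix) is hypothetical, and it runs synchronously. It assumes data is staged in a temporary file, that replace-if-exists maps to rename(), and that fail-if-exists maps to link().

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

enum create_mode { FAIL_IF_EXISTS, REPLACE_IF_EXISTS };

/* Hypothetical handle: data is staged in a temp file until commit. */
struct create_handle {
    char final_path[256];
    char temp_path[300];
    enum create_mode mode;
    int fd;          /* temp file descriptor, -1 if it could not be created */
    int saved_errno; /* first error seen; reported only by commit */
};

/* Never reports failure: any error is remembered for commit. */
static void sketch_create(struct create_handle *h, const char *path,
                          enum create_mode mode, mode_t permissions)
{
    memset(h, 0, sizeof(*h));
    snprintf(h->final_path, sizeof(h->final_path), "%s", path);
    snprintf(h->temp_path, sizeof(h->temp_path), "%s.tmp.%ld",
             h->final_path, (long)getpid());
    h->mode = mode;
    h->fd = open(h->temp_path, O_WRONLY | O_CREAT | O_EXCL, permissions);
    if (h->fd == -1)
        h->saved_errno = errno;
}

/* Writes go to the temp file, so nothing is visible at final_path yet. */
static void sketch_write(struct create_handle *h, const void *data, size_t size)
{
    ssize_t ret;

    if (h->saved_errno != 0)
        return; /* an earlier step already failed; commit will report it */
    ret = write(h->fd, data, size);
    if (ret < 0)
        h->saved_errno = errno;
    else if ((size_t)ret != size)
        h->saved_errno = EIO; /* the sketch treats a short write as an error */
}

/* Returns 0 on success, -1 with errno set on failure. */
static int sketch_commit(struct create_handle *h)
{
    int err = h->saved_errno;
    int had_temp = h->fd != -1;

    if (had_temp && close(h->fd) == -1 && err == 0)
        err = errno;
    if (err == 0) {
        if (h->mode == REPLACE_IF_EXISTS) {
            /* rename() atomically replaces an existing target. */
            if (rename(h->temp_path, h->final_path) == -1)
                err = errno;
        } else {
            /* link() fails with EEXIST if the target already exists. */
            if (link(h->temp_path, h->final_path) == -1)
                err = errno;
        }
    }
    /* Drop the temp name; after a successful rename() it is already gone. */
    if (had_temp)
        unlink(h->temp_path);
    if (err != 0) {
        errno = err;
        return -1;
    }
    return 0;
}
```

With this sketch, sketch_create() on a path inside a nonexistent directory followed by any number of sketch_write() calls reports nothing; only sketch_commit() returns -1 with errno set to ENOENT.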

>> ret = pread(handle, buf, size, offset)
>> - just like pread(), [...]
> Hm, pread() works like pread()? What do I misunderstand?

It was describing an API, so it's kind of like saying io_api->pread() works like the pread() syscall.

>> ret = try_lock(handle)
>> - this isn't an asynchronous operation! it assumes that locking state
>> is kept in memory, so that the operation will be fast.
> So does this only lock against the same process or how is locking state supposed to be in memory?

I mean it's going to use fcntl() or whatever OS-level locking is available (as opposed to some slow remote locking with remote storage).
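
As an illustration of a non-blocking try_lock() on top of fcntl(), here is a minimal sketch. The function name sketch_try_lock() and its return convention are assumptions made for this example, not part of the proposed API; the fcntl() call with F_SETLK is the standard POSIX interface.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the lock was acquired, 0 if another
   process already holds a conflicting lock, -1 on any other error. */
static int sketch_try_lock(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = F_WRLCK;    /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "to the end of the file" */

    /* F_SETLK never waits: it fails immediately if the lock is taken. */
    if (fcntl(fd, F_SETLK, &fl) == 0)
        return 1;
    if (errno == EACCES || errno == EAGAIN)
        return 0;
    return -1;
}
```

Note that fcntl() locks belong to the process, so a second sketch_try_lock() on the same file from the same process also returns 1. Such a lock only excludes other processes.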

>> if backend doesn't support locking or it's slow, single-dbox should be
>> used (instead of multi-dbox), because it doesn't need locking.
>> - returns success or "already locked"
> And if the backend doesn't support locking?

It says that above :) If you can't use locking, you can't use multi-dbox.

>> Async input streams' read() would work exactly as file_istream works for
>> non-blocking sockets: It would return data that is already buffered in
>> memory. If there's nothing, it returns EAGAIN. The FS API's
>> set_input_callback() can be used to set a callback function that is
>> called whenever there's more data available in the buffer.
> I don't understand what triggers that reading into memory. Is that supposed to happen automatically in the background, or does it have to be initiated by the program?

The pread() call triggers reading the requested amount of data into memory. The first pread() fails with EAGAIN and starts the read in the background; once that finishes, pread() returns the number of bytes read. It could possibly do readahead in the background too, but that's an optimization, and more relevant to high-latency remote storage than to local disk storage.
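
The EAGAIN-then-data sequence can be sketched as a small state machine. Everything here is hypothetical and simplified: struct async_file, sketch_pread(), sketch_backend_complete() and sketch_count_callback() are invented names, the "storage" is just an in-memory string, and the background read is simulated by calling sketch_backend_complete() by hand rather than from an I/O loop or worker thread.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical async file: "backing" stands in for the real storage. */
struct async_file {
    const char *backing;
    size_t backing_len;
    char cache[128];       /* result of the last completed read */
    size_t cache_len;
    off_t req_offset;      /* the request being (or last) served */
    size_t req_size;
    int pending;           /* a background read is in flight */
    int done;              /* cache holds the result for req_offset/req_size */
    void (*input_callback)(void *context);
    void *context;
};

/* Returns the number of bytes read, or -1 with errno=EAGAIN if the data
   is not in memory yet (in which case the read is started). */
static ssize_t sketch_pread(struct async_file *f, void *buf,
                            size_t size, off_t offset)
{
    if (size > sizeof(f->cache))
        size = sizeof(f->cache); /* the sketch caps requests at the cache size */
    if (f->done && f->req_offset == offset && f->req_size == size) {
        memcpy(buf, f->cache, f->cache_len);
        return (ssize_t)f->cache_len;
    }
    /* Not buffered: remember the request and "start" the background read. */
    f->req_offset = offset;
    f->req_size = size;
    f->pending = 1;
    f->done = 0;
    errno = EAGAIN;
    return -1;
}

/* Simulates the backend finishing the read and notifying the caller. */
static void sketch_backend_complete(struct async_file *f)
{
    size_t off, avail, n;

    if (!f->pending)
        return;
    off = (size_t)f->req_offset;
    avail = off < f->backing_len ? f->backing_len - off : 0;
    n = f->req_size < avail ? f->req_size : avail;
    if (n > 0)
        memcpy(f->cache, f->backing + off, n);
    f->cache_len = n;
    f->pending = 0;
    f->done = 1;
    if (f->input_callback != NULL)
        f->input_callback(f->context); /* "more data is available" */
}

/* Example callback: counts wakeups in the int that context points to. */
static void sketch_count_callback(void *context)
{
    ++*(int *)context;
}
```

A caller would issue sketch_pread(), get EAGAIN, return to its event loop, and retry the same sketch_pread() from the input callback once the backend has completed the read.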

>> * POSIX AIO isn't supported by Linux kernel. And even if it was, it
>> only supports async reads/writes, not async open().
> Doesn't help you, but NetBSD 5 does support Posix AIO, I think.

Yeah, looks like the BSDs have it after all. I tried a few Google searches first, but didn't find anything specific about it at the time.

