[Dovecot] Patch: event port-based ioloop and notify
Greetings,
thanks to all of you who work on Dovecot!
I have prepared a small patch to support Solaris 10 and Opensolaris' event port mechanism for both the ioloop and the notify subsystems. It seems to work fine for me, but I haven't conducted any extensive testing.
It would be great if someone could review and/or test it (and if it could eventually enter the code base).
I have uploaded a copy at http://cr.opensolaris.org/~e.p/dovecot/dovecot-01-fen.diff .
Thanks!
-- Emanuele
On Sat, 2009-11-14 at 21:09 +0100, Emanuele Pucciarelli wrote:
I have prepared a small patch to support Solaris 10 and Opensolaris' event port mechanism for both the ioloop and the notify subsystems. It seems to work fine for me, but I haven't conducted any extensive testing.
Does the configure check actually have to test that all the port_* functions work? If they exist in libc, aren't they guaranteed to work? If just their existence is enough, I'd just change the check to test to AC_CHECK_FUNC(port_associate, ..).
Why do you call it fen / file event notification? It seems to be called just "port" in the man pages and everywhere else, so I'd call it that.
Why do you silently handle EBADFs and ENOENTs? Are those actually happening in your system? Dovecot should never close fds before removing them from ioloop, other ioloops also log an error if that happens.
For port_getn() error handling I'd probably do the same as all other ioloops: Ignore EINTR/ETIME and treat everything else as fatal. What's the idea behind BAD_PORTEV_USER?
In general when an error that really shouldn't happen does happen I prefer to fatal/panic than trying to limp along.
- io->path = i_new(char, strlen(path)+1);
- strcpy(io->path, path);
io->path = i_strdup(path);
Timo,
thanks for the detailed review! I'm going to work on the patch and make it follow your comments.
Does the configure check actually have to test that all the port_* functions work? If they exist in libc, aren't they guaranteed to work? If just their existence is enough, I'd just change the check to test to AC_CHECK_FUNC(port_associate, ..).
I don't know of any configurations where they exist and don't work. I'll cut the tests, then.
Why do you call it fen / file event notification? It seems to be called just "port" in the man pages and everywhere else, so I'd call it that.
Yes, I mixed that up. The general port_* framework is the "event ports framework", while the part that specifically watches for changes in files and directories is the newer "file event notification API" [1].
The presence of port_* is enough for the ioloop to work, but the FEN API, which introduces PORT_SOURCE_FILE in the headers, is needed for the ioloop-notify. The file event notification is only available since version 72 of Nevada, so all recent versions (since before 2008) of OpenSolaris have it, but I don't know which, if any, versions of Solaris 10 have it.
I would be going to separate the two checks and take a less tortuous path to check for PORT_SOURCE_FILE (AC_CHECK_DECL, I'd say).
Why do you silently handle EBADFs and ENOENTs? Are those actually happening in your system? Dovecot should never close fds before removing them from ioloop, other ioloops also log an error if that happens.
I'll check that again and report back. I'm not sure if I saw an ENOENT. Does this apply to ioloop-notify as well?
For port_getn() error handling I'd probably do the same as all other ioloops: Ignore EINTR/ETIME and treat everything else as fatal. What's the idea behind BAD_PORTEV_USER?
Unfortunately, it's possible (at least in some versions, see [2]) that port_getn() returns EINTR before updating nget to anything useful. In this case, the code would see it set to 1 and try to process the event: it would probably crash immediately, as soon as it tries to dereference events[0].portev_user to update the refcount. It seems to be a rare race condition and I haven't witnessed it personally, but (claimed to be) entirely possible.
In general when an error that really shouldn't happen does happen I prefer to fatal/panic than trying to limp along.
Then I'll let EBADFD be fatal too, instead of trying to recreate the port and proceed, and I'll clean up the two switch statements.
- io->path = i_new(char, strlen(path)+1);
- strcpy(io->path, path);
io->path = i_strdup(path);
Thanks. It kind of shows that I haven't taken the time to read the library – sorry about that.
I'll re-post as soon as I manage to do the clean-up.
[1] http://arc.opensolaris.org/caselog/PSARC/2007/027/mail [2] http://www.mail-archive.com/networking-discuss@opensolaris.org/msg12330.html
-- Emanuele
On Thu, 2009-12-10 at 00:28 +0100, Emanuele Pucciarelli wrote:
Why do you silently handle EBADFs and ENOENTs? Are those actually happening in your system? Dovecot should never close fds before removing them from ioloop, other ioloops also log an error if that happens.
I'll check that again and report back. I'm not sure if I saw an ENOENT. Does this apply to ioloop-notify as well?
Yes.
For port_getn() error handling I'd probably do the same as all other ioloops: Ignore EINTR/ETIME and treat everything else as fatal. What's the idea behind BAD_PORTEV_USER?
Unfortunately, it's possible (at least in some versions, see [2]) that port_getn() returns EINTR before updating nget to anything useful. In this case, the code would see it set to 1 and try to process the event: it would probably crash immediately, as soon as it tries to dereference events[0].portev_user to update the refcount. It seems to be a rare race condition and I haven't witnessed it personally, but (claimed to be) entirely possible.
Oh, so kind of an API design bug. But could BAD_PORTEV_USER be simply NULL? Seems safer than some random memory address that might even be valid.
Oh and in notify code your loops use ret, not read_events.
participants (2)
-
Emanuele Pucciarelli
-
Timo Sirainen