[Dovecot] Patch: ioloop using kqueue/kevent for FreeBSD
Hi, I would like to submit the attached patch. It implements IO loop using FreeBSD's kqueue/kevent syscalls. It is based on snapshot of CVS HEAD as of 2005-12-12.
I could only give it limited testing on FreeBSD 5.4 but it works fine so far.
Vaclav Haisman
Only after posting the first patch I realised it is not good enough wrt/ IO_ERROR handling. The attached patch should be better.
Vaclav Haisman
On Wed, 2005-12-14 at 17:42 +0100, Vaclav Haisman wrote:
Only after posting the first patch I realised it is not good enough wrt/ IO_ERROR handling. The attached patch should be better.
Thanks, committed. I did a few minor changes so that it's consistent with Dovecot's coding style. Also a few other changes:
unsigned char mode:3; -> enum io_condition in the struct. More readable and it doesn't use more space anyway unless you're going to add more fields.
memset() isn't needed after p_new(), it's already guaranteed to be zeroed.
I'm not sure the IO_ERROR handling is exactly correct. Can a callback be called twice there when an error happens? Anyway, IO_ERROR isn't currently used anywhere. I added it for code that's going to be in Dovecot 2.0, where it needs to know when a pipe gets closed.
What are the AC_CANONICAL_* macros in configure.in? I didn't add them.
On Wed, 14 Dec 2005, Timo Sirainen wrote:
On Wed, 2005-12-14 at 17:42 +0100, Vaclav Haisman wrote:
Only after posting the first patch I realised it is not good enough wrt/ IO_ERROR handling. The attached patch should be better.
Thanks, committed. I did a few minor changes so that it's consistent with Dovecot's coding style. Also a few other changes:
I will try to follow the style.
unsigned char mode:3; -> enum io_condition in the struct. More readable and it doesn't use more space anyway unless you're going to add more fields.
memset() isn't needed after p_new(), it's already guaranteed to be zeroed.
Ok, I will remember this.
I'm not sure the IO_ERROR handling is exactly correct. Can a callback be called twice there when an error happens? Anyway, IO_ERROR isn't currently used anywhere. I added it for code that's going to be in Dovecot 2.0, where it needs to know when a pipe gets closed.
I thought about this too. Maybe it can be fixed if its sole purpose is to watch out for pipe/socket getting disconnected. The EV_EOF flag should do just that.
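On the double-callback question: the patch registers an io that has IO_ERROR set under both EVFILT_READ and EVFILT_WRITE with the same udata pointer, so a single kevent() batch could in principle contain two entries for it. Below is a minimal, portable sketch of one way to run each handler at most once per batch. The mock_event and mock_io types, the MOCK_FILT_* constants and the function names are hypothetical stand-ins for illustration, not kqueue or Dovecot APIs, and this is not necessarily how Dovecot handles it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct kevent and struct io. */
enum { MOCK_FILT_READ = 1, MOCK_FILT_WRITE = 2 };

struct mock_io {
	int calls;              /* how many times the "callback" ran */
};

struct mock_event {
	int filter;             /* which filter fired */
	void *udata;            /* points at the owning mock_io */
};

/* Returns 1 if an earlier event in this batch carries the same udata. */
static int seen_earlier(const struct mock_event *evs, size_t idx)
{
	size_t j;

	for (j = 0; j < idx; j++) {
		if (evs[j].udata == evs[idx].udata)
			return 1;
	}
	return 0;
}

/* Runs each distinct handler once, however many filters fired for it.
   Returns the number of handlers that were run. The scan is quadratic,
   which is acceptable only for small batches. */
static size_t dispatch_once(const struct mock_event *evs, size_t n)
{
	size_t i, dispatched = 0;

	assert(evs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		struct mock_io *io = evs[i].udata;

		if (seen_earlier(evs, i))
			continue;
		io->calls++;    /* stands in for io->callback(io->context) */
		dispatched++;
	}
	return dispatched;
}
```

With a batch holding a read and a write event for one io plus a read event for another, dispatch_once() runs two handlers, one per io.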
What are the AC_CANONICAL_* macros in configure.in? I didn't add them.
I needed these to make the libtool I use work.
It would be nice if somebody with FreeBSD or NetBSD could test this patch a little bit under heavier load. My options are in this respect limited.
Vaclav Haisman
On Wed, Dec 14, 2005 at 09:04:29PM +0100, Vaclav Haisman wrote:
It would be nice if somebody with FreeBSD or NetBSD could test this patch a little bit under heavier load. My options are in this respect limited.
Vaclav Haisman
More like: can somebody with OpenBSD, FreeBSD, NetBSD (2.x or newer), DragonFly or Mac OS X try this out?
We're looking at moving to dovecot from courier, and I'd like to know
how dovecot is being used on production mail systems.
We're a small ISP and have about 10,000 mail users. If anyone on
this list is using dovecot in a similar environment please drop me a
line privately.
Also, has anyone successfully compiled 1.0alpha5 on Mac OS X? I
haven't heard from anyone on my previous email, which makes me a
little nervous.
-- Roger J. Weeks Systems & Network Administrator Mendocino Community Network
We are using dovecot alpha5 in production for 3K users, on Solaris 9 (sparc), IMAP only. We have been using it since alpha3. We saw quite a few asserts/cores in alpha3 (which Timo responded to and fixed very quickly), almost no problems or cores in alpha4, no problems in alpha5 except for my NFS lockd issue two days ago. Dovecot is very solid software for "alpha", IMHO.
Jeff Earickson Colby College
On Thu, 15 Dec 2005, Roger Weeks wrote:
Date: Thu, 15 Dec 2005 12:34:26 -0800
From: Roger Weeks rjw@mcn.org
To: dovecot@dovecot.org
Subject: [Dovecot] Are you using dovecot in production?
On Wed, 14 Dec 2005, Timo Sirainen wrote:
On Wed, 2005-12-14 at 17:42 +0100, Vaclav Haisman wrote:
[...] the following:
- unsigned char mode:3; -> enum io_condition in the struct. More readable and it doesn't use more space anyway unless you're going to add more fields.
Actually I think that the bitfield might be necessary. I used it because I didn't want to mask out the uninteresting bits of condition flags on lines like
ctx->fds[fd].mode |= condition;
Without either the bitfield or some masking, this could set some higher bits. If that happens, the parts of the code that compare the mode against zero will break.
Is it not possible for some higher bit to be set? Can the code stay as it is?
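The difference described above can be shown in isolation. The sketch below is illustrative only: the flag values assigned to IO_READ, IO_WRITE and IO_ERROR, the IO_MODE_MASK macro, the extra 0x10 bit and all function names are assumptions made for the example, not values taken from Dovecot. A store into a 3-bit unsigned bitfield drops the higher bits, while a plain enum field keeps them unless the code masks explicitly.

```c
#include <assert.h>

/* Assumed flag values for illustration; Dovecot's real enum may differ. */
enum io_condition {
	IO_READ  = 0x01,
	IO_WRITE = 0x02,
	IO_ERROR = 0x04
};
#define IO_MODE_MASK (IO_READ | IO_WRITE | IO_ERROR)

struct fdrecord_bitfield {
	unsigned char mode : 3;     /* as in the submitted patch */
};

struct fdrecord_enum {
	enum io_condition mode;     /* as in the committed version */
};

/* Bitfield: the store keeps only the low three bits. */
static unsigned int bitfield_add(unsigned int condition)
{
	struct fdrecord_bitfield rec = { 0 };

	rec.mode |= condition;
	return rec.mode;
}

/* Enum field without masking: higher bits survive. */
static unsigned int enum_add_unmasked(unsigned int condition)
{
	struct fdrecord_enum rec = { 0 };

	rec.mode |= condition;
	return (unsigned int)rec.mode;
}

/* Enum field with an explicit mask: same result as the bitfield. */
static unsigned int enum_add_masked(unsigned int condition)
{
	struct fdrecord_enum rec = { 0 };

	rec.mode |= condition & IO_MODE_MASK;
	return (unsigned int)rec.mode;
}

/* Small self-check using a made-up higher bit (0x10). */
static void mode_demo(void)
{
	assert(bitfield_add(0x10u) == 0u);          /* compares equal to zero */
	assert(enum_add_unmasked(0x10u) != 0u);     /* looks "in use" */
	assert(enum_add_masked(0x10u) == 0u);
}
```

Whether the masking matters in practice depends on whether any caller ever passes a condition with bits outside the three flags, which is exactly the question raised above.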
[...]
Vaclav Haisman
Can you please change the instances of FreeBSD in the autoconf script and the code to just BSD?
On Wed, Dec 14, 2005 at 11:54:10AM +0100, Vaclav Haisman wrote:
Content-Description: ioloop using FreeBSD kqueue/kevent
diff -rN -u old-dovecot-cvs/autogen.sh new-dovecot-cvs/autogen.sh
--- old-dovecot-cvs/autogen.sh	2005-12-14 11:35:03.537711451 +0100
+++ new-dovecot-cvs/autogen.sh	2005-12-14 11:35:06.149980951 +0100
@@ -1,5 +1,5 @@
-aclocal
-libtoolize --force
-automake --add-missing
-autoheader
-autoconf
+aclocal15
+libtoolize13 --force
+automake15 --add-missing
+autoheader259
+autoconf259
diff -rN -u old-dovecot-cvs/configure.in new-dovecot-cvs/configure.in
--- old-dovecot-cvs/configure.in	2005-12-14 11:35:03.545823016 +0100
+++ new-dovecot-cvs/configure.in	2005-12-14 11:35:06.150746230 +0100
@@ -1,6 +1,10 @@
 AC_INIT(dovecot, 1.0.alpha5, [dovecot@dovecot.org])
 AC_CONFIG_SRCDIR([src])
 
+AC_CANONICAL_BUILD
+AC_CANONICAL_HOST
+AC_CANONICAL_TARGET
+
 AC_CONFIG_HEADERS([config.h])
 AM_INIT_AUTOMAKE
 
@@ -327,6 +331,15 @@
   ])
 fi
 
+if test "$ioloop" = "kqueue"; then
+  AC_CHECK_FUNC(kqueue, [
+    AC_DEFINE(IOLOOP_KQUEUE,, [Implement I/O loop with FreeBSD kqueue()])
+    have_ioloop=yes
+  ], [
+    ioloop=""
+  ])
+fi
+
 if test "$ioloop" = "" || test "$ioloop" = "poll"; then
   AC_CHECK_FUNC(poll, [
     AC_DEFINE(IOLOOP_POLL,, Implement I/O loop with poll())
diff -rN -u old-dovecot-cvs/src/lib/Makefile.am new-dovecot-cvs/src/lib/Makefile.am
--- old-dovecot-cvs/src/lib/Makefile.am	2005-12-14 11:35:03.542457074 +0100
+++ new-dovecot-cvs/src/lib/Makefile.am	2005-12-14 11:35:03.660215582 +0100
@@ -35,6 +35,7 @@
 	ioloop-poll.c \
 	ioloop-select.c \
 	ioloop-epoll.c \
+	ioloop-kqueue.c \
 	lib.c \
 	lib-signals.c \
 	md4.c \
diff -rN -u old-dovecot-cvs/src/lib/ioloop-kqueue.c new-dovecot-cvs/src/lib/ioloop-kqueue.c
--- old-dovecot-cvs/src/lib/ioloop-kqueue.c	1970-01-01 01:00:00.000000000 +0100
+++ new-dovecot-cvs/src/lib/ioloop-kqueue.c	2005-12-14 11:35:03.751180389 +0100
@@ -0,0 +1,183 @@
+/*
+ * FreeBSD kqueue() based ioloop handler.
+ *
+ * Copyright (c) 2005 Vaclav Haisman
+ *
+ * This library is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+/* @UNSAFE: whole file */
+
+#include "lib.h"
+#include "ioloop-internal.h"
+
+#ifdef IOLOOP_KQUEUE
+
+#include <sys/types.h>
+#include <sys/event.h>
+#include <sys/time.h>
+
+#ifndef INITIAL_BUF_SIZE
+# define INITIAL_BUF_SIZE 128
+#endif
+
+
+struct ioloop_handler_context {
+	int kq;
+	size_t evbuf_size;
+	struct kevent *evbuf;
+
+	size_t fds_size;
+	struct fdrecord *fds;
+};
+
+struct fdrecord {
+	/* IO_READ | IO_WRITE | IO_ERROR */
+	unsigned char mode : 3;
+};
+
+
+void io_loop_handler_init(struct ioloop *ioloop)
+{
+	struct ioloop_handler_context *ctx;
+
+	ioloop->handler_context = ctx =
+		p_new(ioloop->pool, struct ioloop_handler_context, 1);
+	ctx->evbuf_size = INITIAL_BUF_SIZE;
+	ctx->evbuf = p_new(ioloop->pool, struct kevent, ctx->evbuf_size);
+	memset(ctx->evbuf, 0, sizeof(struct kevent) * ctx->evbuf_size);
+	ctx->kq = kqueue ();
+	if (ctx->kq < 0)
+		i_fatal("kqueue(): %m");
+
+	ctx->fds_size = INITIAL_BUF_SIZE;
+	ctx->fds = p_new(ioloop->pool, struct fdrecord, ctx->fds_size);
+	memset(ctx->fds, 0, sizeof(struct fdrecord) * ctx->fds_size);
+}
+
+
+void io_loop_handler_deinit(struct ioloop *ioloop)
+{
+	p_free(ioloop->pool, ioloop->handler_context->evbuf);
+	p_free(ioloop->pool, ioloop->handler_context->fds);
+	p_free(ioloop->pool, ioloop->handler_context);
+}
+
+
+void io_loop_handle_add(struct ioloop *ioloop, struct io *io)
+{
+	struct ioloop_handler_context *ctx = ioloop->handler_context;
+	struct kevent ev = {io->fd, 0, EV_ADD | EV_CLEAR | EV_EOF, 0, 0, io};
+	enum io_condition condition = io->condition;
+
+	/* grow ctx->fds array if necessary */
+	if ((size_t)io->fd >= ctx->fds_size) {
+		size_t old_size = ctx->fds_size;
+
+		ctx->fds_size = nearest_power((unsigned int)io->fd+1);
+		i_assert(ctx->fds_size < (size_t)-1 / sizeof(int));
+
+		ctx->fds = p_realloc(ioloop->pool, ctx->fds,
+				     sizeof(struct fdrecord) * old_size,
+				     sizeof(struct fdrecord) * ctx->fds_size);
+		memset(ctx->fds + old_size, 0,
+		       sizeof(struct fdrecord) * (ctx->fds_size - old_size));
+	}
+
+	if (condition & (IO_READ | IO_ERROR))
+	{
+		ctx->fds[io->fd].mode |= condition;
+		ev.filter = EVFILT_READ;
+		kevent(ctx->kq, &ev, 1, NULL, 0, NULL);
+	}
+	if (condition & (IO_WRITE | IO_ERROR))
+	{
+		ctx->fds[io->fd].mode |= condition;
+		ev.filter = EVFILT_WRITE;
+		kevent(ctx->kq, &ev, 1, NULL, 0, NULL);
+	}
+}
+
+
+void io_loop_handle_remove(struct ioloop *ioloop, struct io *io)
+{
+	struct ioloop_handler_context *ctx = ioloop->handler_context;
+	struct fdrecord * const fds = ctx->fds;
+	const int fd = io->fd;
+	struct kevent ev = {fd, 0, EV_DELETE, 0, 0, NULL};
+	enum io_condition condition = io->condition;
+
+
+	i_assert((size_t)fd < ctx->fds_size);
+	i_assert(fds[fd].mode != 0);
+
+	if (condition & (IO_READ | IO_ERROR))
+	{
+		ev.filter = EVFILT_READ;
+		fds[fd].mode &= ~condition;
+		if ((fds[fd].mode & (IO_READ | IO_ERROR)) == 0)
+			kevent(ctx->kq, &ev, 1, NULL, 0, NULL);
+	}
+	if (condition & (IO_WRITE | IO_ERROR))
+	{
+		ev.filter = EVFILT_WRITE;
+		fds[fd].mode &= ~condition;
+		if ((fds[fd].mode & (IO_WRITE | IO_ERROR)) == 0)
+			kevent(ctx->kq, &ev, 1, NULL, 0, NULL);
+	}
+}
+
+
+void io_loop_handler_run(struct ioloop *ioloop)
+{
+	struct ioloop_handler_context *ctx = ioloop->handler_context;
+	struct timeval tv;
+	struct timespec ts;
+	unsigned int t_id;
+	int msecs, ret, i;
+
+	/* get the time left for next timeout task */
+	msecs = io_loop_get_wait_time(ioloop->timeouts, &tv, NULL);
+	ts.tv_sec = tv.tv_sec;
+	ts.tv_nsec = tv.tv_usec * 1000;
+
+	/* wait for events */
+	ret = kevent (ctx->kq, NULL, 0, ctx->evbuf, ctx->evbuf_size, &ts);
+	if (ret < 0 && errno != EINTR)
+		i_fatal("kevent(): %m");
+
+	/* execute timeout handlers */
+	io_loop_handle_timeouts(ioloop);
+
+	if (ret <= 0 || !ioloop->running) {
+		/* no I/O events */
+		return;
+	}
+
+	i_assert((size_t)ret <= ctx->evbuf_size);
+
+	/* loop through all received events */
+	for (i = 0; i < ret; ++i)
+	{
+		struct io *io = ctx->evbuf[i].udata;
+
+		t_id = t_push();
+		io->callback(io->context);
+		if (t_pop() != t_id)
+			i_panic("Leaked a t_pop() call in I/O handler %p",
+				(void *)io->callback);
+	}
+}
+
+
+#endif // IOLOOP_KQUEUE
+
+/*
+Local Variables:
+eval: (c-set-style "linux")
+whitespace-auto-cleanup: t
+End:
+*/
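The fds-array growth step in io_loop_handle_add() from the patch can be sketched on its own with the standard allocator. This is an illustration under stated assumptions, not Dovecot code: realloc() stands in for the pool call p_realloc(), next_pow2() is a hypothetical replacement for nearest_power() that is assumed to return the smallest power of two not below its argument, and struct fd_table and fd_table_reserve() are invented names. Unlike a zeroing allocator, realloc() leaves the new tail uninitialised, so the memset() is required here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct fdrecord {
	unsigned char mode;         /* per-descriptor condition flags */
};

struct fd_table {
	size_t size;                /* number of records allocated */
	struct fdrecord *recs;      /* indexed by file descriptor */
};

/* Hypothetical stand-in for nearest_power(): smallest power of two
   that is >= n, or a smaller value if that would overflow size_t. */
static size_t next_pow2(size_t n)
{
	size_t p = 1;

	while (p < n && p <= (size_t)-1 / 2)
		p <<= 1;
	return p;
}

/* Makes sure recs[fd] exists; new records are zeroed.
   Returns 0 on success, -1 on a bad fd or allocation failure. */
static int fd_table_reserve(struct fd_table *t, int fd)
{
	size_t need, new_size;
	struct fdrecord *recs;

	assert(t != NULL);
	if (fd < 0)
		return -1;
	need = (size_t)fd + 1;
	if (need <= t->size)
		return 0;

	new_size = next_pow2(need);
	if (new_size < need || new_size > (size_t)-1 / sizeof(*recs))
		return -1;

	recs = realloc(t->recs, new_size * sizeof(*recs));
	if (recs == NULL)
		return -1;

	/* zero only the newly added tail; existing records are preserved */
	memset(recs + t->size, 0, (new_size - t->size) * sizeof(*recs));
	t->recs = recs;
	t->size = new_size;
	return 0;
}
```

Starting from an empty table, reserving descriptor 5 allocates 8 records, and a later reserve for descriptor 200 grows the table to 256 while keeping the earlier records intact.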
Participants (5): Brad, Jeff A. Earickson, Roger Weeks, Timo Sirainen, Vaclav Haisman