Greetings,
I have some architectural questions regarding dovecot, and though I've half answered them by looking at the source, I'm also interested in hearing whether my (our) wishes and suggestions are already being considered (or can be considered, once built) for inclusion in dovecot itself.
Let me first explain why I'm doing this. I work for XS4ALL, a fairly large ISP in the Netherlands. We provide a wide variety of services, including shell access, pop3, webmail, et cetera. We use Sendmail on several clusters of FreeBSD machines (load-balanced using layer-4 ethernet switches) and several NetApp Filers (dedicated NFS servers with fail-safe disk-arrays and such) as the backend. Several years ago (when we were a lot smaller) we noticed that the typical use of the mailboxes included leaving a lot of old email on the server, at least for a while, and that this is a bothersome thing when using mbox mailboxes. (The mboxes basically have to be copied over whenever the status of an email changes, leading to a lot of I/O.)
We briefly played with modifying sendmail and the pop server to avoid the full copy in the common case (only status changes) by doing in-place edits of a pre-generated Status line, and to avoid a full scan of the mbox file by creating special headers to mark the 'real' length of an email. It worked, for a while, but it wasn't going to scale very well. So we switched to maildir mailboxes for the mail spool. A modified mail.local (which we need for other reasons as well) delivers in /var/spool/mail/u/s/username, and mutt, uqwk, a modified pine and qmail's pop3 daemon read it from there. Until last week our clients could choose to have mbox inboxes, to use with 'elm' or 'mail', but we decided to discontinue that support. Our new shell servers, which are in test, don't have elm installed anymore anyway.
We still have support for mbox mailboxes in a user's home directory though, by using procmail and such. So when we needed an IMAP server for use with our webmail (based on SquirrelMail), we were forced to go with the UW-IMAP server, with the maildir patch that's been scattered around the 'net. This worked, for a while; we also use the maildir patch with pine after all. However, the maildir patch is not very good. Not at all, even, and it only seems to work by pure chance. Pine works for the average user who does not get a lot of new mail while his pine is open or does not use procmail, and fortunately a lot of the people that do get a lot of email use mutt, which does work properly. The UW-IMAP server worked fine because SquirrelMail only uses (used) a small subset of the available functionality. But that's changing, as SquirrelMail is being actively developed, and we're also considering other IMAP-based services. But we can't switch to Courier or Cyrus because we need mbox support. And while looking for mbox patches for either of those two, I ran across dovecot. Yay! :)
Dovecot is not everything we'd want, but it comes very close, and contrary to UW-IMAP both the design and the actual source code are clean, readable and logical, which means we can add the features we need and support them. What we need and want to add is fairly simple, but I've only been looking at dovecot since yesterday so I'd be happy to hear if any of it is possible, feasible, unwise or unacceptable.
First off, we need the maildir support to be 'correct' in that it does not rely on the naming of the files in the mailbox, other than the very loose specification DJB gives (doesn't contain a colon or slash and doesn't start with a dot.) The pine/UW-IMAP patch breaks here because it depends on the first part of the filename being time() or something else that, when sorted alphanumerically, puts new mail at the end. Our LDA does this, but procmail does not, and it shouldn't have to.
Second, we need the maildir support to be 'correct' in that it does not rely on the directory order being persistent. The NetApp Filers use btree-indexed directories, so the order of readdir() can change completely whenever a file is added or removed. The pine/UW-IMAP patch relies on the '.uidvalidity' file being modified whenever the maildir sort order changes, and this isn't happening.
I *think*, from reading the sources, both of those are correct already. If they aren't, I'd strongly urge you to fix them, as #1 is a problem for anyone using procmail and #2 is a problem for anyone with 'indexed' directories (including such new filesystems as reiserfs, and I assume FreeBSD's new hashed directories.)
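To make both requirements concrete, here is a minimal Python sketch of UID assignment that depends on neither the filename format nor the readdir() order. It is not taken from dovecot's sources; the function name `sync_uids` and the in-memory `uid_map` are hypothetical, and a real server would have to persist the map somewhere. The only naming assumption is DJB's one: everything before the first colon identifies the message.

```python
import os

def sync_uids(cur_dir, uid_map, next_uid):
    """Update uid_map (base name -> UID) from cur_dir; return the new next_uid.

    UIDs are keyed on the message's base name, so neither the order returned
    by the directory listing nor the way names sort has any influence on the
    UIDs of messages that were already known.
    """
    seen = set()
    for name in os.listdir(cur_dir):       # order is arbitrary, and that is fine
        if name.startswith("."):
            continue                        # not a message, per the maildir rules
        base = name.split(":", 1)[0]        # drop the ":info" suffix (flags etc.)
        seen.add(base)
        if base not in uid_map:
            uid_map[base] = next_uid        # new mail always gets a higher UID
            next_uid += 1
    for base in list(uid_map):
        if base not in seen:
            del uid_map[base]               # message was removed or expunged
    return next_uid
```

Messages first seen in the same scan get their UIDs in whatever order the directory happens to return them, which is harmless: UIDs only have to be unique, ascending and stable once handed out.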
We need to avoid using fcntl(). The NetApps support it, but file-locking over NFS is very, very poorly designed and we've had too many problems of various kinds before, with fcntl. We also don't like the idea of having thousands of fcntl locks at the same time ;P Instead, we've switched to the locking method described in the Linux open(2) manpage under O_EXCL. (We call it 'dot-locking', I'm not sure where the name came from.)
The actual implementation of that method is pretty simple, and I have a C version and a Python version hanging around here somewhere (the Python version is being used by GNU Mailman, last I looked.) If we're going to use dovecot, we will replace most, if not all, fcntl()s with dot-locking; the question is whether you want it contributed to dovecot :)
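For readers who don't have the manpage at hand, the method boils down to: create a uniquely named file on the same filesystem, link() it to the lock name, and treat a link count of 2 on the unique file as proof of success even if link() reported an error. The following is a rough Python sketch of that idea, not the C or Python version mentioned above; the function names, the ".lock" suffix and the timeout handling are assumptions made for the example, and it does no stale-lock cleanup.

```python
import os
import socket
import time

def acquire_dotlock(path, timeout=10.0, poll=0.1):
    """Try to take PATH.lock; return the lock file name, or None on timeout."""
    lock = path + ".lock"
    # Hostname plus PID keeps the temporary name unique across NFS clients
    # (assuming one lock attempt per process at a time).
    unique = "%s.%s.%d" % (lock, socket.gethostname(), os.getpid())
    os.close(os.open(unique, os.O_WRONLY | os.O_CREAT, 0o644))
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                os.link(unique, lock)       # atomic, also over NFS
                return lock
            except OSError:
                # The reply to link() can get lost over NFS, so the link
                # count of our own file has the final say.
                if os.stat(unique).st_nlink == 2:
                    return lock
            if time.monotonic() >= deadline:
                return None
            time.sleep(poll)
    finally:
        os.unlink(unique)                   # the lock itself stays behind

def release_dotlock(lock):
    """Drop a lock previously returned by acquire_dotlock()."""
    os.unlink(lock)
```

Usage would look like `lock = acquire_dotlock("/path/to/mbox")`, followed by the mailbox update and `release_dotlock(lock)`. A production version would also need a policy for locks left behind by crashed processes.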
Every user's incoming mailbox is /var/spool/u/s/username. Other mailboxes are in /home/u/username/mail or /home/u/username/Mail (the second if the first does not exist.) We are not yet certain whether we want the inbox to be able to have subdir-mailboxes, as /var/spool and /home have different quotas and we urge people not to store their mail on /var/spool. (For one thing, it doesn't get backed up.) We want these things to work without magical symlinks or empty files, because people _will_ delete them and cause unnecessary helpdesk calls :) Again, the question is mostly whether this is desirable in dovecot (or something enough like it to reduce local changes.)
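As an illustration of the lookup this implies, without any symlinks or marker files, here is a small Python sketch. The function names are made up, the roots are passed in as parameters rather than hard-coded, and the hashing (first letter, then second letter of the username for the spool; first letter only for /home) is inferred from the example paths above, so treat it as an assumption. It does not handle one-letter usernames.

```python
import os

def inbox_path(user, spool_root):
    """Spool location of the inbox: <spool_root>/u/s/username."""
    return os.path.join(spool_root, user[0], user[1], user)

def mail_root(user, home_root):
    """Directory holding the other mailboxes: ~/mail, else ~/Mail."""
    home = os.path.join(home_root, user[0], user)
    lower = os.path.join(home, "mail")
    if os.path.isdir(lower):
        return lower
    return os.path.join(home, "Mail")      # the second if the first does not exist

def resolve_mailbox(user, name, spool_root, home_root):
    """Map an IMAP mailbox name to a path on disk."""
    if name.upper() == "INBOX":             # INBOX is case-insensitive in IMAP
        return inbox_path(user, spool_root)
    return os.path.join(mail_root(user, home_root), name)
```

Because the decision is made on every lookup, nothing needs to exist in the user's home directory beforehand, and deleting a directory cannot leave a dangling pointer behind.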
We have over 300k mailboxes at the moment. We expect that number to keep growing. The indexer process (as described by design.txt) does not sound like a good idea in our case :) How necessary is it, really? Especially since we do not expect more than 10% of those mailboxes to be actually used by IMAP, not even once. If disabling the indexer completely just means longer startup times for IMAP sessions, we can live with that.
The UW-IMAP maildir patch stores UIDs in the individual filenames, using a 'U' flag. Will this interfere with dovecot? We don't really need dovecot and UW-IMAP to share UIDs, but we would like to have as painless a transition as possible, without having to rename millions of files to remove the U flag and other flags :P It would also be nice to keep pine using the existing maildir patch, even though very few IMAP users would use pine.
Would dovecot scale, architecturally speaking, to 500k+ active mailboxes? The amount of hardware is not really an issue, we can add a lot of machines (off-the-shelf Intel hardware) to each cluster, but if each dovecot process has to load in an index of all possible mailboxes... that would be a problem. Doing an inordinate number of file accesses over NFS would also be a problem, but I haven't seen any indication of that in the source, yet.
In case it wasn't clear yet, I'm very happy to have found dovecot. The lack of a decent mbox IMAP server has always dismayed me, let alone an mbox+maildir one :) I should also point out that even though XS4ALL is a commercial company, we would contribute our changes even if the licence didn't require it, and we want to contribute them back the way you want them, not necessarily the way it's easiest for us. We have a lot of experience with open-source software, as a simple google on my name should indicate ;P
Regards,
Thomas Wouters <thomas@xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!