[dovecot] Architectural questions
Greetings,
I have some architectural questions regarding dovecot, and though I've half answered them by looking at the source, I'm also interested in hearing whether my (our) wishes and suggestions are already being considered (or can be considered, once built) for inclusion in dovecot itself.
Let me first explain why I'm doing this. I work for XS4ALL, a fairly large ISP in the Netherlands. We provide a wide variety of services, including shell access, pop3, webmail, et cetera. We use Sendmail on several clusters of FreeBSD machines (loadbalanced using layer-4 ethernet switches) and several NetApp Filers (dedicated NFS servers with fail-safe disk-arrays and such) for backend. Several years ago (when we were a lot smaller) we noticed the typical use of the mailboxes included leaving much old email on the server, at least for a while, and that this is a bothersome thing when using mbox mailboxes. (The mboxes basically have to be copied over whenever the status of an email changes, leading to a lot of I/O.)
We briefly played with modifying sendmail and the pop server to avoid the full copy in the common case (only status changes) by doing in-place edits of a pre-generated Status line, as well as avoid full scanning of the mbox file by creating special headers to mark the 'real' length of an email. It worked, for a while, but it wasn't going to scale very well. So we switched to maildir mailboxes for the mail spool. A modified mail.local (which we need for other reasons as well) delivers in /var/spool/mail/u/s/username, and mutt, uqwk, a modified pine and qmail's pop3 daemon read it from there. Until last week our clients could choose to have mbox inboxes, to use with 'elm' or 'mail', but we decided to discontinue that support. Our new shell servers, which are in test, don't have elm installed anymore anyway.
We still have support for mbox mailboxes in a user's homedirectory though, by using procmail and such. So when we needed an IMAP server for use with our webmail (based on SquirrelMail), we were forced to go with the UW-IMAP server, with the maildir patch that's been scattered around the 'net. This worked, for a while; we also use the maildir patch with pine after all. However, the maildir patch is not very good. Not at all, even, and it only seems to work by pure chance. Pine works for the average user who does not get a lot of new mail while his pine is open or does not use procmail, and fortunately a lot of the people that do get a lot of email use mutt, which does work properly. The UW-IMAP server worked fine because SquirrelMail only uses (used) a small subset of the available functionality. But that's changing, as SquirrelMail gets actively developed, and we're also considering other IMAP-based services. But we can't switch to Courier or Cyrus because we need mbox support. And while looking for mbox patches for either of those two, I ran across dovecot. Yay! :)
Dovecot is not everything we'd want, but it comes very close, and contrary to UW-IMAP both the design and the actual source code are clean, readable and logical, which means we can add the features we need and support them. What we need and want to add is fairly simple, but I've only been looking at dovecot since yesterday so I'd be happy to hear if any of it is possible, feasible, unwise or unacceptable.
First off, we need the maildir support to be 'correct' in that it does not rely on the naming of the files in the mailbox, other than the very loose specification DJB gives (doesn't contain a colon or slash and doesn't start with a dot.) The pine/UW-imap patch breaks here because it depends on the first part of the filename being time() or something else that, when sorted alphanumerically, puts new mail at the end. Our LDA does this, but procmail does not, and it shouldn't have to.
Second, we need the maildir support to be 'correct' in that it does not rely on the directory order being persistent. The NetApp Filers use btree-indexed directories, so the order of readdir() can change completely whenever a file is added or removed. The pine/uw-imap patch relies on the '.uidvalidity' file being modified whenever the maildir sort order changed, and this isn't happening.
I *think*, from reading the sources, both of those are correct already. If they aren't, I'd strongly urge you to fix it, as #1 is a problem for anyone using procmail and #2 is a problem for anyone with 'indexed' directories (including such new filesystems as reiserfs, and I assume FreeBSD's new hashed directories.)
We need to avoid using fcntl(). The Netapps support it, but file-locking over NFS is very, very poorly designed and we've had too many problems of various kinds before, with fcntl. We also don't like the idea of having thousands of fcntl locks at the same time ;P Instead, we've switched to the locking method described in the Linux open(2) manpage under O_EXCL. (We call it 'dot-locking', I'm not sure where the name came from.)
The actual implementation of that method is pretty simple, and I have a C version and a Python version hanging around here somewhere (the Python version is being used by GNU Mailman, last I looked.) If we're going to use dovecot, we will replace most, if not all, fcntl()s with dot-locking, the question is whether you want it contributed to dovecot :)
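A minimal sketch of the open(2)-style locking mentioned above, assuming the lock file is named after the mailbox with a ".lock" suffix. The function names and the hostname+pid naming of the temporary file are assumptions made for illustration; this is not the C or Python version referred to in the message.

```python
import os
import socket
import time


def dotlock_acquire(path, retries=10, delay=0.1):
    """Try to create path + '.lock' atomically; return True on success."""
    lockfile = path + ".lock"
    # Unique temporary name on the same filesystem (hostname + pid).
    unique = "%s.%s.%d" % (lockfile, socket.gethostname(), os.getpid())
    fd = os.open(unique, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    os.close(fd)
    try:
        for _ in range(retries):
            try:
                os.link(unique, lockfile)
            except OSError:
                pass  # the return value of link() is not trusted over NFS
            # The lock is ours only if our unique file now has two links.
            if os.stat(unique).st_nlink == 2:
                return True
            time.sleep(delay)
        return False
    finally:
        # The lock file (if we got it) keeps existing after this unlink.
        os.unlink(unique)


def dotlock_release(path):
    """Drop the lock by removing the lock file."""
    os.unlink(path + ".lock")
```

The link count check is the important part: link() is atomic on the server, but its reply can get lost, so the caller looks at st_nlink instead of relying on the return value. Stale-lock handling (timeouts, checking the age of the lock file) is left out here.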
Every user's incoming mailbox is /var/spool/u/s/username. Other mailboxes are in /home/u/username/mail or /home/u/username/Mail (the second if the first does not exist.) We are not yet certain whether we want the inbox to be able to have subdir-mailboxes, as /var/spool and /home have different quotas and we urge people not to store their mail on /var/spool. (for one thing, it doesn't get backed-up.) We want these things to work without magical symlinks or empty files, because people _will_ delete them and cause unnecessary helpdesk calls :) Again, the question is mostly whether this is desirable in dovecot (or something enough like it to reduce local changes.)
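The layout described above could be resolved roughly as follows. This is only a sketch: the function names are hypothetical, the spool root is a parameter because the thread writes it both as /var/spool/mail and /var/spool, and usernames are assumed to have at least two characters.

```python
import os


def inbox_path(username, spool_root="/var/spool/mail"):
    """Inbox location: <spool_root>/u/s/username (first two characters)."""
    return os.path.join(spool_root, username[0], username[1], username)


def mail_dir(username, home_root="/home"):
    """Other mailboxes: ~/mail if it exists, otherwise ~/Mail."""
    home = os.path.join(home_root, username[0], username)
    lower = os.path.join(home, "mail")
    if os.path.isdir(lower):
        return lower
    return os.path.join(home, "Mail")
```

Nothing here needs symlinks or marker files in the user's home directory; the only check is whether the "mail" directory exists.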
We have over 300k mailboxes at the moment. We expect that number to keep growing. The indexer process (as described by design.txt) does not sound like a good idea in our case :) How necessary is it, really ? Especially since we do not expect more than 10% of those mailboxes to be actually used by IMAP, not even once. If disabling the indexer completely just means longer startup times for IMAP sessions, we can live with that.
The UW-IMAP maildir patch stores UID's in the individual filenames, using a 'U' flag. Will this interfere with dovecot ? We don't really need dovecot and UW-IMAP to share UIDs, but we would like to have as painless a transition as possible, without having to rename millions of files to remove the U flag and other flags :P It would also be nice to keep pine using the existing maildir patch, even though very few IMAP-users would use pine.
Would dovecot scale, architecturally speaking, to 500k+ active mailboxes ? The amount of hardware is not really an issue, we can add a lot of machines (off-the-shelf Intel hardware) to each cluster, but if each dovecot process has to load in an index of all possible mailboxes... that would be a problem. Doing an inordinate number of file-accesses over NFS would also be a problem, but I haven't seen any indication of that in the source, yet.
In case it wasn't clear yet, I'm very happy to have found dovecot. The lack of a decent mbox IMAP server has always dismayed me, let alone an mbox+maildir one :) I should also point out that even though XS4ALL is a commercial company, we would contribute our changes even if the licence didn't require it, and we want to contribute them back the way you want them, not necessarily the way it's easiest for us. We have a lot of experience with opensource software, as a simple google on my name should indicate ;P
Regards,
Thomas Wouters <thomas@xs4all.net>
Hi! I'm a .signature virus! copy me into your .signature file to help me spread!
On Sat, 2002-10-19 at 15:38, Thomas Wouters wrote:
We briefly played with modifying sendmail and the pop server to avoid the full copy in the common case (only status changes) by doing in-place edits of a pre-generated Status line,
UW-imapd does this as well, creating "X-Keywords: " line for each mail. I had thought about this first with dovecot too, but since mutt rewrote the whole mailbox always I figured I might as well. But with larger mailboxes this is really slow, so I think I'll support the X-keywords trick myself too.
as well as avoid full scanning of the mbox file by creating special headers to mark the 'real' length of an email.
For each mail? Content-Length? With my tests that didn't seem to help much, rather made it just slower.. Could be that I just did something badly, have to look into it more when I begin optimizing mbox handling more. Have to get it at least as fast as UW-imapd :)
We still have support for mbox mailboxes in a user's homedirectory though, by using procmail and such. So when we needed an IMAP server for use with our webmail (based on SquirrelMail), we were forced to go with the UW-IMAP server, with the maildir patch that's been scattered around the 'net. This
Hm. Squirrelmail requires SORT extension which Dovecot doesn't support yet. Notes about SORT from CVS's TODO:
- sort (draft-ietf-imapext-sort)
- basically sorted SEARCH, requiring CHARSET support for UTF-8 and ASCII
- we could create alternative binary tree file(s) for different sort conditions, ".tree-sort" or something. or if we decide to just keep it in memory, btree could still be best choice.
- required by squirrelmail (webmail)
- First off, we need the maildir support to be 'correct' in that it does not rely on the naming of the files in the mailbox, other than the very loose specification DJB gives (doesn't contain a colon or slash and doesn't start with a dot.) The pine/UW-imap patch breaks here because it depends on the first part of the filename being time() or something else that, when sorted alphanumerically, puts new mail at the end. Our LDA does this, but procmail does not, and it shouldn't have to.
Dovecot doesn't care as long as the file name stays same before the ':' character.
- Second, we need the maildir support to be 'correct' in that it does not rely on the directory order being persistant. The NetApp Filers use btree-indexed directories, so the order of readdir() can change completely whenever a file is added or removed. The pine/uw-imap patch relies on the '.uidvalidity' file being modified whenever the maildir sort order changed, and this isn't happening.
Dovecot reads them into hash so it doesn't depend on readdir() behaviour.
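To illustrate why a hash keyed on the part of the name before ':' makes both the file naming and the readdir() order irrelevant, here is a small sketch. It is not Dovecot's code; the function name and the plain dict standing in for the index are assumptions.

```python
import os


def sync_uids(cur_dir, known, next_uid):
    """Return (mapping, next_uid) for the files currently in cur_dir.

    `known` maps the unique part of a filename (before ':') to its UID.
    Files may be listed in any order; flag changes after ':' do not matter.
    """
    mapping = {}
    new = []
    for name in os.listdir(cur_dir):
        base = name.split(":", 1)[0]
        if base in known:
            mapping[base] = known[base]  # existing message keeps its UID
        else:
            new.append(base)
    # New messages get the next free UIDs. Their relative order is arbitrary;
    # sorting is used here only to make the result deterministic.
    for base in sorted(new):
        mapping[base] = next_uid
        next_uid += 1
    return mapping, next_uid
```

Messages that disappeared from the directory simply drop out of the mapping, and their UIDs are never reused because next_uid only grows.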
- We need to avoid using fcntl(). The Netapps support it, but file-locking over NFS is very, very poorly designed and we've had too much problems of various kinds before, with fcntl. We also don't like the idea of having thousands of fcntl locks at the same time ;P Instead, we've switched to the locking method described in the Linux open(2) manpage under O_EXCL. (We call it 'dot-locking', I'm not sure where the name came from.)
Hmm. The dot-lock means the "mbox.lock" file which gets created when someone wants it exclusively locked. Dovecot supports it, and maildir itself doesn't need locking at all. Dovecot's index files currently use fcntl()-locking, but it would be possible to replace them with lock files.
Then there's the modify log file. Dovecot uses fcntl() locking for it as a way to figure out if it's the only one using the log file: everyone read-locks the file, and if someone wants to know whether it's the only one using it, it tries to set a write lock; if that fails, it knows someone else is using it as well. I'm not sure if there's any good way to replace that by using files, I had pretty complicated (desperate) plans before figuring out fcntl() could be used to do it easily.
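The read-lock / try-write-lock trick can be sketched like this in Python. The function names are made up for the example and this is not Dovecot's implementation; the file object is assumed to be opened read-write (for instance with mode "r+b"), since an exclusive fcntl lock needs a writable descriptor.

```python
import fcntl


def register_user(f):
    """Take a shared (read) lock to announce that we are using the file."""
    fcntl.lockf(f, fcntl.LOCK_SH)


def is_only_user(f):
    """Return True if no other process currently holds a lock on the file."""
    try:
        # Non-blocking attempt to upgrade our read lock to a write lock.
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return False  # another process still holds its read lock
    # Downgrade back to a shared lock so later users are not blocked.
    fcntl.lockf(f, fcntl.LOCK_SH)
    return True
```

Because fcntl locks belong to the process, a process's own read lock never blocks its own upgrade; only other processes' locks do. A failed upgrade leaves the existing read lock in place.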
It would be possible to just assume that there's always someone else using the modify log, but each flag change or expunge would always write a few bytes to it then, and when log file is switched (there's .log and .log.2) it wouldn't be truncated after last process is finished with it which is not too bad since after the next switch it will be truncated.
Also it would be possible not to use index files at all but just keep them in memory. I've been fixing code to make this possible and somewhat fast.
If we're going to use dovecot, we will replace most, if not all, fcntl()s with dot-locking, the question is whether you want it contributed to dovecot :)
All locking goes through file_*_lock() or mbox_lock_*() functions. mbox locking supports it already, and file_*_lock() could be made to support it. It doesn't get currently file name but that could be done.
- Every user's incoming mailbox is /var/spool/u/s/username. Other mailboxes are in /home/u/username/mail or /home/u/username/Mail (the second if the first does not exist.) We are not yet certain whether we want the inbox to be able to have subdir-mailboxes, as /var/spool and /home have different quotas and we urge people not to store their mail on /var/spool. (for one thing, it doesn't get backed-up.) We want these things to work without magical symlinks or empty files, because people _will_ delete them and cause unnecessary helpdesk calls :) Again, the question is mostly whether this is desirable in dovecot (or something enough like it to reduce local changes.)
Are maildir inboxes also in /var/spool? With mbox sub-inboxes wouldn't be even possible because dir structure == mailbox structure, and since inbox file exists there can't be inbox-dir (except maybe with different case but that's kludgy).
I've also thought I might as well make it possible to read the mbox inbox from /var/mail or wherever it is. Pretty easy to do, but .lock file is problematic if new files can't be added to the /var/mail directory.
- We have over 300k mailboxes at the moment. We expect that number to keep growing. The indexer process (as described by design.txt) does not sound as a good idea in our case :) How necessary is it, really ? Especially since we do not expect more than 10% of those mailboxes to be actually used by IMAP, not even once. If disabling the indexer completely just means longer startup times for IMAP sessions, we can live with that.
Indexer doesn't exist yet, and wouldn't be really needed even. I still think it could be somewhat nice idea, the system load is probably less during night so we could use the extra time to make mailboxes perform faster next day.
It'd be difficult to know when exactly there is "extra time" which is why I haven't yet done the indexer. Probably needs some external program (script) which tells it by maybe looking at some I/O statistics from /proc or doing a few file operations and checking the latency.
Am I right in that CPU usage still isn't any problem but rather the I/O?
- The UW-IMAP maildir patch stores UID's in the indiviual filenames, using a 'U' flag. Will this interfere with dovecot ? We don't really need dovecot and UW-IMAP to share UIDs, but we would like to have an as painless transition as possible, without having to rename millions of files to remove the U flag and other flags :P It would also be nice to keep pine using the existing maildir patch, even though very few IMAP-users would use pine.
How exactly does the U flag work? I hope it's before the ':' character like Courier's S=filesize? Otherwise U=1234 would be thought of as 6 different flags which isn't very good since Dovecot reorders them as 1234=U.
- Would dovecot scale, architecturally speaking, to 500k+ active mailboxes ? The amount of hardware is not really an issue, we can add a lot of machines (off-the-shelve intel hadware) to each cluster, but if each dovecot process has to load in an index of all possible mailboxes... that would be a problem. Doing an inordinate number of file-accesses over NFS would also be a problem, but I haven't seen any indication of that in the source, yet.
Dovecot opens the index when opening mailbox. It doesn't open other mailboxes indexes. Also the indexes should make the file accesses less than otherwise, especially with mbox since it wouldn't need to read and parse the whole mbox file. In general I've tried to keep the file I/O as little as possible.
If your clusters access the files through NFS, there should be no problem. Except I've never tried Dovecot through NFS, and I'm not sure how well mmap()ing works through NFS. I know there's been problems before but hopefully they've been fixed already.
On Sat, Oct 19, 2002 at 05:01:55PM +0300, Timo Sirainen wrote:
On Sat, 2002-10-19 at 15:38, Thomas Wouters wrote:
We briefly played with modifying sendmail and the pop server to avoid the full copy in the common case (only status changes) by doing in-place edits of a pre-generated Status line,
UW-imapd does this as well, creating "X-Keywords: " line for each mail. I had thought about this first with dovecot too, but since mutt rewrote the whole mailbox always I figured I might as well. But with larger mailboxes this is really slow, so I think I'll support the X-keywords trick myself too.
Well, for POP3 servers the story is a bit different than IMAP. The typical use we were seeing was "user", "pass", "list", "retr <new mail>", "quit". Sometimes (for some users) every few minutes. In that case, having to write a 'RO' at a specific location in a large mbox is oodles more efficient than copying the whole thing to local disk and back again (which is what the popserver would do.) I'm not sure if it matters much with typical IMAP usage.
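A rough sketch of such an in-place edit, assuming each message was delivered with a pre-generated "Status: " header whose value is padded with spaces, and that the byte offset of the message's "From " line is already known. The function name and the 64 KiB header limit are assumptions for the example, not the modified qpopper code.

```python
def mark_status_in_place(path, msg_offset, status=b"RO"):
    """Overwrite the padded Status value of one message without copying."""
    with open(path, "r+b") as f:
        f.seek(msg_offset)
        chunk = f.read(65536)  # assume the headers fit in 64 KiB
        end = chunk.find(b"\n\n")
        # Only look inside this message's headers, never in the body.
        headers = chunk if end < 0 else chunk[:end + 1]
        pos = headers.find(b"\nStatus: ")
        if pos < 0:
            return False
        value_start = pos + len(b"\nStatus: ")
        value_end = headers.find(b"\n", value_start)
        if value_end < 0 or len(status) > value_end - value_start:
            return False  # no room: this case needs a full rewrite
        f.seek(msg_offset + value_start)
        # Keep the line length identical so no other byte has to move.
        f.write(status.ljust(value_end - value_start))
        return True
```

The write touches a handful of bytes regardless of the mailbox size, which is where the saving over copy-and-rewrite comes from.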
as well as avoid full scanning of the mbox file by creating special headers to mark the 'real' length of an email.
For each mail? Content-Length? With my tests that didn't seem to help much, rather made it just slower.. Could be that I just did something badly, have to look into it more when I begin optimizing mbox handling more. Have to get it at least as fast as UW-imapd :)
Well, if I recall correctly, we added an 'X-Offset' header which pointed to the exact (relative) byte offset for the next 'From ' line. It made our pop3d (a modified qpopper 2.3 by the way) a much happier puppy. I'm not sure what the difference with Content-Length was. I could find the sources, I suppose; since we disabled mbox-inbox support we aren't using that code anymore.
We still have support for mbox mailboxes in a user's homedirectory though, by using procmail and such. So when we needed an IMAP server for use with our webmail (based on SquirrelMail), we were forced to go with the UW-IMAP server, with the maildir patch that's been scattered around the 'net. This
Hm. Squirrelmail requires SORT extension which Dovecot doesn't support yet.
Ah, that's a shame. It means we can't use dovecot for our internal SquirrelMail+IMAP testing yet :) We likely wouldn't start using dovecot for production SquirrelMail anytime soon anyway, so it's not a big issue right now... We'll have to see if our other uses of IMAP require it or not.
Dovecot doesn't care [about maildir-message filenames] as long as the file name stays same before the ':' character.
They do.
It would be possible to just assume that there's always someone else using the modify log, but each flag change or expunge would always write a few bytes to it then, and when log file is switched (there's .log and .log.2) it wouldn't be truncated after last process is finished with it which is not too bad since after the next switch it will be truncated.
Also it would be possible not to use index files at all but just keep them in memory. I've been fixing code to make this possible and somewhat fast.
Hmm. I'd have to look at the code to say for sure, but I think we could live with keeping them in memory. Accessing the same mailbox from two different clients at the same time is not something we're too worried about, at the moment.
- Every user's incoming mailbox is /var/spool/u/s/username.
Are maildir inboxes also in /var/spool?
Yes. We don't use the ~/Maildir structure at all. We've always simply used maildir mailboxes as a direct replacement of mbox mailboxes; a directory instead of a file, and no sub-boxes :) I guess it's a philosophical difference. To me, and to my colleagues, everything can be a mailbox, not just something stored in an arbitrary directory somewhere. I guess we could change that position, if necessary, but so far it hasn't proven to be.
With mbox sub-inboxes wouldn't be even possible because dir structure == mailbox structure, and since inbox file exists there can't be inbox-dir (except maybe with different case but that's kludgy).
Yes... don't worry, we don't even want to consider mbox-subboxes :)
I've also thought I might as well make it possible to read the mbox inbox from /var/mail or whereever it is. Pretty easy to do, but .lock file is problematic if new files can't be added to the /var/mail directory.
Our /var/spool/mail subdirectories are mode 01733 (drwx-wx-wt) owned by root, so creating files and removing them is not an issue, but reading the directory is. You can of course still check for existence of specific filenames.
Am I right in that CPU usage still isn't any problem but rather the I/O?
Yes. As I said, we use several netapp filers (currently two for /home and two for /var/spool/mail, with several hundred gigabytes filespace each) and though they're great boxes, their performance does tend to drop off when it gets flooded with I/O requests :) And they're used by a lot of machines, so if they are slow to respond, a lot of our services do too.
- The UW-IMAP maildir patch stores UID's in the indiviual filenames, using a 'U' flag. Will this interfere with dovecot ? We don't really need dovecot and UW-IMAP to share UIDs, but we would like to have an as painless transition as possible, without having to rename millions of files to remove the U flag and other flags :P It would also be nice to keep pine using the existing maildir patch, even though very few IMAP-users would use pine.
How exactly does the U flag work? I hope it's before the ':' character like Courier's S=filesize? Otherwise U=1234 would be thought of as 6 different flags which isn't very good since Dovecot reorders them as 1234=U.
No, it can't be before the :, because the UID is generated by UW-IMAP, and the maildir spec says you can't change the unique part of the name, just the info :) Here are some examples. The ',U*' is the UID.
_k2,6NtZ9.maildrop4.xs4all.nl:2,S,U1030712092
_fmT,O63l8.maildrop8.xs4all.nl:2,RS,U1026644784
990612135.16312.000000002.maildrop2.xs4all.nl:2,S,U991994304
993058841.maildrop7.49267:2,S,U993058888
(In case you're wondering, the first two files were created by standard procmail, the third by our modified procmail which tries to allow for the pine/uw-imap maildir patch, and the last is our mail.local's format.)
As long as dovecot doesn't read a different meaning into those flags (ignoring them is just fine) we should be fine. I don't think we'll have many customers switching back and forth between dovecot and UW-IMAP, just people switching from UW-IMAP to dovecot.
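For illustration, filenames in this format can be taken apart as sketched below. The function name is hypothetical, and treating ",U<digits>" after the standard flags as the UID is based only on the examples above.

```python
def parse_maildir_name(filename):
    """Split a maildir filename into (unique, flags, uid)."""
    # Split at the first ':' before anything else: the unique part may
    # itself contain commas (see the procmail-generated names).
    unique, sep, info = filename.partition(":")
    flags, uid = "", None
    if sep and info.startswith("2,"):
        fields = info[2:].split(",")
        flags = fields[0]  # standard single-letter flags, e.g. "RS"
        for field in fields[1:]:
            # Comma-separated extension field carrying the UW-IMAP UID.
            if field.startswith("U") and field[1:].isdigit():
                uid = int(field[1:])
    return unique, flags, uid
```

A reader that only cares about the standard flags can take `flags` and ignore the remaining fields, which matches the "ignoring them is just fine" requirement.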
If your clusters access the files through NFS, there should be no problem. Except I've never tried Dovecot through NFS, and I'm not sure how well mmap()ing works through NFS. I know there's been problems before but hopefully they've been fixed already.
I'm not too worried about bugs. I've yet to see a piece of software that we don't find oodles of small and large bugs in just by installing and trying to run on our clientbase. That's what testing is for :) But I wouldn't mind being happily surprised by dovecot, we'll see :)
-- Thomas Wouters <thomas@xs4all.net>
On Sat, 2002-10-19 at 18:10, Thomas Wouters wrote:
Well, if I recall correctly, we added an 'X-Offset' header which pointed to the exact (relative) byte offset for the next 'From ' line. It made our pop3d (a modified qpopper 2.3 by the way) a much happier puppy. I'm not sure what the difference with Content-Length was. I could find the sources, I suppose; since we disabled mbox-inbox support we aren't using that code anymore.
Content-Length saves just the size of mail body, so it can be skipped over. I implemented it mostly because mutt doesn't escape the "From " lines when saving mails so it was a bit difficult sometimes to figure out if the From-line means a new mail or if it was just written into the mail body.
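A sketch of how Content-Length lets a reader skip bodies, including bodies that contain unescaped "From " lines. It works on the mbox contents as bytes; the function name is an assumption and the code is not taken from Dovecot or UW-imapd.

```python
def scan_mbox(data):
    """Return the start offsets of the messages found in mbox bytes."""
    offsets = []
    pos = 0
    while pos < len(data):
        if not data.startswith(b"From ", pos):
            # Not at a message boundary: advance to the next line.
            nl = data.find(b"\n", pos)
            if nl < 0:
                break
            pos = nl + 1
            continue
        offsets.append(pos)
        hdr_end = data.find(b"\n\n", pos)
        if hdr_end < 0:
            break
        body_start = hdr_end + 2
        length = None
        for line in data[pos:hdr_end].split(b"\n"):
            name, _, value = line.partition(b":")
            if name.lower() == b"content-length" and value.strip().isdigit():
                length = int(value.strip())
        if length is not None and body_start + length <= len(data):
            # Trust the header and jump over the body without scanning it.
            pos = body_start + length
        else:
            # No usable length: fall back to scanning line by line.
            pos = body_start
    return offsets
```

Without the header, a body line starting with "From " is indistinguishable from a new message in this scanner, which is the ambiguity described above.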
I'm not sure what I should do with Dovecot; both From-line escaping and Content-Length writing make it annoyingly slower than now, with more code..
Also it would be possible not to use index files at all but just keep them in memory. I've been fixing code to make this possible and somewhat fast.
Hmm. I'd have to look at the code to say for sure, but I think we could live with keeping them in memory. Accessing the same mailbox from two different clients at the same time is not something we're too worried about, at the moment.
Well, Outlook (and OE I think) opens two simultaneous connections sometimes to fetch mails.
Not having index files doesn't affect the possibility to have multiple connections, but it affects the overall performance because it needs to do more I/O, especially with mbox.
I've also thought I might as well make it possible to read the mbox inbox from /var/mail or whereever it is. Pretty easy to do, but .lock file is problematic if new files can't be added to the /var/mail directory.
Our /var/spool/mail subdirectories are mode 01733 (drwx-wx-wt) owned by root, so creating files and removing them is not an issue, but reading the directory is. You can of course still check for existance of specific filenames.
OK, no problem then.
How exactly does the U flag work? I hope it's before the ':' character like Courier's S=filesize? Otherwise U=1234 would be thought of as 6 different flags which isn't very good since Dovecot reorders them as 1234=U.
No, it can't be before the :, because the UID is generated by UW-IMAP, and the maildir spec says you can't change the uniqe part of the name, just the info :) Here are some examples. The ',U*' is the UID.
_k2,6NtZ9.maildrop4.xs4all.nl:2,S,U1030712092
_fmT,O63l8.maildrop8.xs4all.nl:2,RS,U1026644784
990612135.16312.000000002.maildrop2.xs4all.nl:2,S,U991994304
993058841.maildrop7.49267:2,S,U993058888
Well, maildir spec also doesn't say you can add flags with parameters using comma separators :) But I think that's good enough extension that Dovecot could support as well.
Programs supporting Courier's Maildir++ quota write the mails immediately with names like "something,S=size". Something like that could have been done by UID-capable mailers too, since the UID won't change.
As long as dovecot doesn't read a different meaning into those flags (ignoring them is just fine) we should be fine. I don't think we'll have many customers switching back and forth between dovecot and UW-IMAP, just people switching from UW-IMAP to dovecot.
Keeping the UIDs untouched when changing could be important to some people whose mail clients can save some extra information related to specific messages and use UID to identify the mails. I think Evolution does this with its labels and Follow-Up marks, but I'm not sure.
Dovecot currently doesn't try to keep the UIDs too heavily itself either, if it notices some corruption it just recreates the index files with new UIDs. Supporting in-memory indexes requires still saving the UIDs somewhere in disk so this should get fixed while doing it.
I'm not too worried about bugs. I've yet to see a piece of software that we don't find oodles of small and large bugs in just by installing and trying to run on our clientbase. That's what testing is for :) But I wouldn't mind being happily suprised by dovecot, we'll see :)
We've had 3 people using it for a few months now, one of them still gets sometimes "message not found" error from Outlook Express, I've yet to figure out when exactly that happens. The mail isn't lost and restarting OE helps, so it's probably something to do with having those two simultaneous connections (and I'm just now making bigger changes there). Other than that it's worked quite fine :)
Then of course CVS has a lot of large changes which haven't been tested much yet. Hopefully I'll get them fixed well enough this weekend to be able to start using it myself.
On Sat, Oct 19, 2002 at 06:53:43PM +0300, Timo Sirainen wrote:
As long as dovecot doesn't read a different meaning into those flags (ignoring them is just fine) we should be fine. I don't think we'll have many customers switching back and forth between dovecot and UW-IMAP, just people switching from UW-IMAP to dovecot.
Keeping the UIDs untouched when changing could be important to some people whose mail clients can save some extra information related to specific messages and use UID to identify the mails. I think Evolution does this with it's labels and Follow-Up marks, but I'm not sure.
Well, what I meant was that currently, IMAP is being used only by SquirrelMail, and I'm fairly sure SquirrelMail doesn't store UIDs anywhere. Other IMAP clients would be an issue only if we allowed other IMAP clients to connect, which we don't (except in internal tests :).
-- Thomas Wouters <thomas@xs4all.net>
On Sat, 2002-10-19 at 18:10, Thomas Wouters wrote:
Am I right in that CPU usage still isn't any problem but rather the I/O?
Yes. As I said, we use several netapp filers (currently two for /home and two for /var/spool/mail, with several hundred gigabytes filespace each) and though they're great boxes, their performance does tend to drop off when it gets flooded with I/O requests :) And they're used by a lot of machines, so if they are slow to respond, a lot of our services do too.
I was mostly wondering with this if the reason to add more computers to cluster is because the imap processes are taking too much memory, too much CPU or if neither of them is any problem and the cluster is just for redundancy or because of other running programs.
I'd be interested to know how many dovecot processes could actually run on a single computer, especially with fast NFS file access :) I'd guess it could run a lot, but when memory gets low the I/O usage rises since it needs to read again the parts of indexes which got dropped from memory.
On Sat, Oct 19, 2002 at 05:01:55PM +0300, Timo Sirainen wrote:
If your clusters access the files through NFS, there should be no problem. Except I've never tried Dovecot through NFS, and I'm not sure how well mmap()ing works through NFS. I know there's been problems before but hopefully they've been fixed already.
Hmm. I'm not sure what kind of behaviour you're looking for, but here's what I see, using a little Python script on our FreeBSD servers with a netapp-mounted filesystem. Mapping MAP_SHARED and PROT_READ|PROT_WRITE, two different machines mounting the same directory, two processes on each machine mmap()ing the same file.
When one process alters the data, the other process on the same machine sees it instantly. The processes on the other machine do not see it at all, not even when re-opening the mmap or being restarted. After doing an msync() in the process that altered the data, the processes on the other machine still don't see the change; they have to re-open the mmap or be restarted before they see the change -- but when one of the processes re-opens or restarts, the other does see the change without doing anything.
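A test of this kind can be put together from a few helpers like the ones below. This is a reconstruction under assumptions, not the script that was actually used: the function names and the 4096-byte mapping size are made up, and the helpers would be run by hand from separate processes on each machine.

```python
import mmap
import os


def open_shared_map(path, size=4096):
    """Map path with MAP_SHARED and PROT_READ|PROT_WRITE, creating it if needed."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        if os.fstat(fd).st_size < size:
            os.ftruncate(fd, size)
        return mmap.mmap(fd, size, mmap.MAP_SHARED,
                         mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        os.close(fd)  # the mapping stays valid after the descriptor is closed


def write_marker(m, data, sync=True):
    """Alter the start of the mapping, optionally followed by msync()."""
    m[:len(data)] = data
    if sync:
        m.flush()  # flush() on a mapping calls msync()


def read_marker(m, length):
    """Read back the first bytes as this process currently sees them."""
    return bytes(m[:length])
```

On a local filesystem two mappings of the same file see each other's writes immediately; the interesting part is repeating the read on a second NFS client before and after the writer's msync().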
-- Thomas Wouters <thomas@xs4all.net>
On Mon, 2002-10-21 at 14:51, Thomas Wouters wrote:
Hmm. I'm not sure what kind of behaviour you're looking for, but here's what I see, using a little Python script on our FreeBSD servers with a netapp-mounted filesystem. Mapping MAP_SHARED and PROT_READ|PROT_WRITE, two different machines mounting the same directory, two processes on each machine mmap()ing the same file.
When one process alters the data, the other process on the same machine sees it instantly. The processes on the other machine do not see it at all, not even when re-opening the mmap or being restarted. After doing an msync() in the process that altered the data, the processes on the other machine still don't see the change; they have to re-open the mmap or be restarted before they see the change -- but when one of the processes re-opens or restarts, the other does see the change without doing anything.
Requiring msync() is fine, that's done after each change, but there should be better solution than re-mmap()ing to notice the changes. I think FreeBSD checked the changes after fcntl() locking changes :)
On Mon, Oct 21, 2002 at 04:15:28PM +0300, Timo Sirainen wrote:
> Requiring msync() is fine, that's done after each change, but there should be better solution than re-mmap()ing to notice the changes. I think FreeBSD checked the changes after fcntl() locking changes :)
Hmm. More bad news; flock() doesn't work over NFS. That is, local processes see and honor the lock even on NFS filesystems, but other machines don't see the lock at all. fcntl() doesn't work at all (but that's probably because we're not running lockd.)
I've tried various ways of forcing other machines to update their filesystem cache without doing something on those machines (so you can optionally do that after the msync()) by changing atime, mtime, nlinks, but so far, nothing. I should point out that the file-metadata (mtime/ctime/nlinks) returned by fstat() sometimes do get updated, and sometimes they don't. Same for stat().
That aside, this issue isn't that big an issue for us. The same-client-connecting-twice case we can solve by configuring the layer-4 ethernet switch to connect the same ipaddress to the same real server, so that mmaps() are properly shared and all. We might want per-mailbox locks so that only one real server can open a specific mailbox (but do so multiple times) but I'll figure that one out later.
-- Thomas Wouters <thomas@xs4all.net>
On Mon, 2002-10-21 at 17:19, Thomas Wouters wrote:
> > Requiring msync() is fine, that's done after each change, but there should be better solution than re-mmap()ing to notice the changes. I think FreeBSD checked the changes after fcntl() locking changes :)
> Hmm. More bad news; flock() doesn't work over NFS. That is, local processes see and honor the lock even on NFS filesystems, but other machines don't see the lock at all. fcntl() doesn't work at all (but that's probably because we're not running lockd.)
flock() doesn't matter; it's used only for mbox locking, where a .lock file would work just as well.
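For reference, the classic way to take a .lock file so that it also works across NFS clients is to create a uniquely named file, link() it to the lock name, and then check the link count. The sketch below shows that general technique with hypothetical helper names; it is not Dovecot's own mbox locking code, and it makes no attempt to deal with stale locks.

```python
import os
import socket
import time


def acquire_dotlock(mbox_path, timeout=10.0, poll=0.5):
    """Try to create `mbox_path`.lock; return its path, or None on timeout."""
    lock_path = mbox_path + ".lock"
    # A name that should be unique per host and process (hypothetical scheme).
    unique = "%s.%s.%d" % (lock_path, socket.gethostname(), os.getpid())
    fd = os.open(unique, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)
    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                os.link(unique, lock_path)
            except OSError:
                pass  # the return value is not trusted; the link count decides
            if os.stat(unique).st_nlink == 2:
                return lock_path  # our file is now also known as the lock file
            if time.monotonic() >= deadline:
                return None
            time.sleep(poll)
    finally:
        os.unlink(unique)  # the lock name, if we got it, keeps the file alive


def release_dotlock(lock_path):
    os.unlink(lock_path)
```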
> That aside, this issue isn't that big an issue for us. The same-client-connecting-twice case we can solve by configuring the layer-4 ethernet switch to connect the same ipaddress to the same real server, so that mmaps() are properly shared and all. We might want per-mailbox locks so that only one real server can open a specific mailbox (but do so multiple times) but I'll figure that one out later.
OK. I think this could be fixed internally too. Or this is mostly a problem with index files, mbox/maildir files are currently re-mmap()ed every time they're accessed (but I'll change mbox not to do that later).
Indexes currently have "sync_id" in their header, it's changed whenever the file size is changed so other processes then know to re-mmap() it. This could be optionally changed to be updated every time the file itself has changed to force others to mmap() again. The sync_id change itself could be checked using lseek() + read().
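The lseek() + read() check could look roughly like the sketch below. The position and width of sync_id in the index header are assumptions made for the example (a 32-bit little-endian integer at offset 0), not Dovecot's actual on-disk layout, and the function names are hypothetical.

```python
import os
import struct

SYNC_ID_OFFSET = 0      # assumed position of sync_id in the header
SYNC_ID_FORMAT = "<I"   # assumed encoding: 32-bit little-endian unsigned


def read_sync_id(fd):
    """Read sync_id through the file descriptor rather than through the mmap."""
    size = struct.calcsize(SYNC_ID_FORMAT)
    os.lseek(fd, SYNC_ID_OFFSET, os.SEEK_SET)
    raw = os.read(fd, size)
    if len(raw) != size:
        raise ValueError("index header is truncated")
    return struct.unpack(SYNC_ID_FORMAT, raw)[0]


def needs_remap(fd, last_seen_sync_id):
    """True if another process changed sync_id since we last mapped the file."""
    return read_sync_id(fd) != last_seen_sync_id
```

A process would remember the sync_id it saw when it mapped the file and call needs_remap() before trusting the mapping again.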
If fstat() or stat() doesn't show mtime changes, that could be a bit worse problem. I think I'm relying on that with some things.. At least new mail is checked by seeing if Maildir/cur's mtime matches .imap.index's mtime.
On Mon, 2002-10-21 at 17:19, Thomas Wouters wrote:
> That aside, this issue isn't that big an issue for us. The same-client-connecting-twice case we can solve by configuring the layer-4 ethernet switch to connect the same ipaddress to the same real server, so that mmaps() are properly shared and all. We might want per-mailbox locks so that only one real server can open a specific mailbox (but do so multiple times) but I'll figure that one out later.
Just had a thought. Would it be feasible to _try_ to permanently assign users to one or few specific servers (via ip or maybe login proxy)? If those servers were down, it could fallback to any random one.
I was thinking Dovecot's indexes could just as well be stored in local hard disk - they're not required to exist and they're not required to be in sync when opening, so it's possible to keep multiple indexes lying around in different servers.
That would take care of most of the mmap() and locking problems and should make it perform a _lot_ better than through NFS. I don't know how NFS works internally, but I doubt it has any way for a remote OS to determine which parts of a file have changed, so re-mmap()ing would most likely always reread the whole file (or the parts that it accesses), which is quite inefficient.
I really like this idea, keeping indexes in local disk where they may be considered as fast non-permanent data and then reading the actual mail data via backed up NFS server. This gets me thinking of a lot more possible optimizations to reduce NFS I/O at the cost of more local.. :)
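One way to "try" to pin a user to a few servers and fall back to a random one is sketched here. The ranking by a hash of (user, server) is a rendezvous-hashing style scheme chosen for the example; the thread does not say how the assignment would be made, and the function names and the is_up callback are hypothetical.

```python
import hashlib
import random


def preferred_servers(user, servers, count=2):
    """Return the `count` servers this user should normally land on."""
    def weight(server):
        return hashlib.sha1(("%s\0%s" % (user, server)).encode("utf-8")).digest()
    return sorted(servers, key=weight)[:count]


def pick_server(user, servers, is_up, count=2, rng=random):
    """Prefer the user's own servers; if all are down, use any live one."""
    for server in preferred_servers(user, servers, count):
        if is_up(server):
            return server
    alive = [s for s in servers if is_up(s)]
    return rng.choice(alive) if alive else None
```

Because the preferred set depends only on the user name and the server list, every node that runs the same function reaches the same answer without shared state.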
On Tue, 2002-10-22 at 04:24, Timo Sirainen wrote:
> Just had a thought. Would it be feasible to _try_ to permanently assign users to one or few specific servers (via ip or maybe login proxy)? If those servers were down, it could fallback to any random one.
Dovecot could actually do that itself too, authenticate user and then either locally handle it or transfer it to another node based on some configuration.
> I really like this idea, keeping indexes in local disk where they may be considered as fast non-permanent data and then reading the actual mail data via backed up NFS server. This gets me thinking of a lot more possible optimizations to reduce NFS I/O at the cost of more local.. :)
This again makes the indexer process possible and useful, since it would be accessing only local disks. It could also delete some of the older indexes if disk space is getting full.
On Tue, Oct 22, 2002 at 05:31:22AM +0300, Timo Sirainen wrote:
> On Tue, 2002-10-22 at 04:24, Timo Sirainen wrote:
> > Just had a thought. Would it be feasible to _try_ to permanently assign users to one or few specific servers (via ip or maybe login proxy)? If those servers were down, it could fallback to any random one.
Yes, the Alteons we use can be configured quite flexibly. We can easily configure, e.g., two servers as 'primary' and two fallback servers, or do load-balancing based on the output of a script, or any number of things. We only use the general load-balancing (actually just active-connection-balancing) while keeping sessions on the same server (based on remote IP) but we could look into the more intricate methods. A single IMAP server with a backup is a good start though.
> Dovecot could actually do that itself too, authenticate user and then either locally handle it or transfer it to another node based on some configuration.
You mean if you have a frontend with several backends, and the frontend proxies for the backends (with several frontends possible, for redundancy,) hmm, that might work. Diablo (the news server software) works like this, somewhat, too, and we also use it behind Alteon switches :)
> > I really like this idea, keeping indexes in local disk where they may be considered as fast non-permanent data and then reading the actual mail data via backed up NFS server. This gets me thinking of a lot more possible optimizations to reduce NFS I/O at the cost of more local.. :)
> This again makes the indexer process possible and useful, since it would be accessing only local disks. It could also delete some of the older indexes if disk space is getting full.
Yes. Keeping things on local disk sounds good. As long as opening the same mailbox on another server doesn't break anything (or breaks 'cleanly', doesn't delete the wrong mails etc) we can definitely live with much worse performance for those cases. My professional estimate is that they will be very, very rare ;P
But for the time being we're more concerned with the SORT extension :) I've read the spec, and besides the natural "ugh" at having to parse the subject that way, it seems doable... except that charset support for UTF-8, as well as US-ASCII, is mandatory. I don't know any libraries that convert to/from UTF-8 (though ASCII<->UTF-8 is obviously simple :) and though it's probably easy to roll your own for iso8859-1 I'm not sure if you had a solution in mind yet. Also, supporting the other character sets (like UW-IMAP does) is probably a lot trickier.
Other than the charset issues I could probably whip up a working SORT implementation given enough time... but it probably wouldn't be super-efficient, as quicksort is so much easier to start with than, say, a mergesort ;P
-- Thomas Wouters <thomas@xs4all.net>
On Tue, Oct 22, 2002 at 01:49:22PM +0200, Thomas Wouters wrote:
> Yes, the Alteons we use can be configured quite flexibly. We can easily configure, e.g., two servers as 'primary' and two fallback servers, or do load-balancing based on the output of a script, or any number of things. We only use the general load-balancing (actually just active-connection-balancing) while keeping sessions on the same server (based on remote IP) but we could look into the more intricate methods. A single IMAP server with a backup is a good start though.
OK, they'd probably be better than any Dovecot proxies I think.
> > Dovecot could actually do that itself too, authenticate user and then either locally handle it or transfer it to another node based on some configuration.
> You mean if you have a frontend with several backends, and the frontend proxies for the backends (with several frontends possible, for redundancy,) hmm, that might work. Diablo (the news server software) works like this, somewhat, too, and we also use it behind Alteon switches :)
Not necessarily split to frontend/backend.. Well, that's possible too but I was thinking that every running dovecot could handle authentication and transferring connection elsewhere (via another TCP connection, login using some internal password, maybe use TLS too).
> Yes. Keeping things on local disk sounds good. As long as opening the same mailbox on another server doesn't break anything (or breaks 'cleanly', doesn't delete the wrong mails etc) we can definitely live with much worse performance for those cases. My professional estimate is that they will be very, very rare ;P
Nothing breaks if same mailbox is opened from different computers with different indexes (or no indexes).
> But for the time being we're more concerned with the SORT extension :) I've read the spec, and besides the natural "ugh" at having to parse the subject that way, it seems doable...
The subject sorting looked very "ugh" to me too :)
> except that charset support for UTF-8, as well as US-ASCII, is mandatory. I don't know any libraries that convert to/from UTF-8 (though ASCII<->UTF-8 is obviously simple :) and though it's probably easy to roll your own for iso8859-1 I'm not sure if you had a solution in mind yet. Also, supporting the other character sets (like UW-IMAP does) is probably a lot trickier.
iconv() does it all, comes with glibc. Only bigger thing to do is to parse the headers and convert the =?xxx?yyy?= things. I think everything should go either through UTF8 or without any conversion if both header and search charsets are same.
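To illustrate the "=?xxx?yyy?=" step: the sketch below decodes encoded words in a header value and normalizes the result to UTF-8. Python's standard email.header module and its codecs stand in for a hand-written parser plus iconv() here; this is an illustration of the idea, not how Dovecot does it, and header_to_utf8 is a made-up name.

```python
from email.header import decode_header, make_header


def header_to_utf8(raw_value):
    """Decode any =?charset?encoding?text?= words and return UTF-8 bytes.

    Unencoded parts are passed through as ASCII. An unknown charset
    raises LookupError, which a real server would have to handle.
    """
    decoded = make_header(decode_header(raw_value))
    return str(decoded).encode("utf-8")
```

Once both the header text and the search or sort key are in UTF-8, they can be compared without caring which charset the sender used.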
> Other than the charset issues I could probably whip up a working SORT implementation given enough time... but it probably wouldn't be super-efficient, as quicksort is so much easier to start with than, say, a mergesort ;P
I can think of two ways to do it:
1) save search results to array, sort the array, send it to clients, delete the array.
2) sort all mails writing results into btree file, keep the file updated whenever new mails are added or deleted. then do the search in that order so we can just write out the results without any sorting.
I like the 2) more, but that works only if the sort condition isn't changed. Or if it is changed, then we'd need to have multiple btree files.. And in general it slows down things if sorting isn't done often.
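The first approach (collect, sort, send, discard) is easy to sketch. The message representation (a dict per mail) and the criteria format are assumptions for the example. Multiple sort keys, including REVERSE ones, are handled by repeated stable sorts from the least significant key to the most significant.

```python
def sort_results(messages, criteria):
    """Sort search results by several keys.

    `criteria` is a list of (field, reverse) pairs, most significant first.
    Python's list.sort() is stable, also with reverse=True, so sorting by
    the last key first and the first key last gives the combined order.
    """
    result = list(messages)  # the temporary array; dropped after sending
    for field, reverse in reversed(criteria):
        result.sort(key=lambda msg: msg[field], reverse=reverse)
    return result
```

For example, sort_results(found, [("subject", False), ("date", True)]) orders by subject and, within equal subjects, by date descending.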
On Tue, Oct 22, 2002 at 04:01:46PM +0300, Timo Sirainen wrote:
> > except that charset support for UTF-8, as well as US-ASCII, is mandatory. I don't know any libraries that convert to/from UTF-8 (though ASCII<->UTF-8 is obviously simple :) and though it's probably easy to roll your own for iso8859-1 I'm not sure if you had a solution in mind yet. Also, supporting the other character sets (like UW-IMAP does) is probably a lot trickier.
> iconv() does it all, comes with glibc.
Um. Of course FreeBSD didn't have glibc, but iconv() is anyway pretty standard, man page says "Conforming to UNIX98". It comes as a separate library as well.
BTW. does SquirrelMail also require THREAD extension? It's not much more different from SORT luckily.
On Tue, Oct 22, 2002 at 04:06:50PM +0300, Timo Sirainen wrote:
> On Tue, Oct 22, 2002 at 04:01:46PM +0300, Timo Sirainen wrote:
> > > except that charset support for UTF-8, as well as US-ASCII, is mandatory. I don't know any libraries that convert to/from UTF-8 (though ASCII<->UTF-8 is obviously simple :) and though it's probably easy to roll your own for iso8859-1 I'm not sure if you had a solution in mind yet. Also, supporting the other character sets (like UW-IMAP does) is probably a lot trickier.
> > iconv() does it all, comes with glibc.
> Um. Of course FreeBSD didn't have glibc, but iconv() is anyway pretty standard, man page says "Conforming to UNIX98". It comes as a separate library as well.
Yeah, I noticed the same thing :) But it still has to be a conscious decision, as not all platforms come with libiconv and I wasn't sure what your target audience is, and might become.
> BTW. does SquirrelMail also require THREAD extension? It's not much more different from SORT luckily.
It looks like it's optional. As a matter of fact, so is SORT :) But both would be very nice to have, not just for SquirrelMail.
-- Thomas Wouters <thomas@xs4all.net>
On Tue, Oct 22, 2002 at 03:22:28PM +0200, Thomas Wouters wrote:
> > Um. Of course FreeBSD didn't have glibc, but iconv() is anyway pretty standard, man page says "Conforming to UNIX98". It comes as a separate library as well.
> Yeah, I noticed the same thing :) But it still has to be a conscious decision, as not all platforms come with libiconv and I wasn't sure what your target audience is, and might become.
Well, target audience should be as large as possible :) But I think it'd be good enough to make iconv() required for charset-support, without iconv() it would support only charsets which don't need any conversion (ascii and "search charset foo" with "=?foo?..?=")
> > BTW. does SquirrelMail also require THREAD extension? It's not much more different from SORT luckily.
> It looks like it's optional. As a matter of fact, so is SORT :) But both would be very nice to have, not just for SquirrelMail.
Yeah. I'll add support for both later, currently there's a bit more important things to do which you'll want fixed as well :)
On Tue, Oct 22, 2002 at 05:00:13PM +0300, Timo Sirainen wrote:
> Yeah. I'll add support for both later, currently there's a bit more important things to do which you'll want fixed as well :)
Heh. That reminds me... Have you considered using, for instance, SourceForge to host dovecot ? Or at least use something like syncmail, which mails out CVS diffs on each checkin ? I use syncmail on every CVS project, both internal and external, and I find I've grown very attached to it, and am too used to seeing what goes on in a project just by looking at the checkins :)
(Not that I'm actively pushing you to use SourceForge or anything... it definitely has its downsides too, if you're capable of running your own CVS server anyway.)
-- Thomas Wouters <thomas@xs4all.net>
On Tue, Oct 22, 2002 at 08:18:19PM +0200, Thomas Wouters wrote:
> Heh. That reminds me... Have you considered using, for instance, SourceForge to host dovecot ?
And while on *that* subject, the current CVS tree needs a patch like this to compile. (And the usual Makefile rebuilds.) But I'm sure you already noticed :-)
Index: src/login/Makefile.am
===================================================================
RCS file: /home/thomas/cvs-root/dovecot/src/login/Makefile.am,v
retrieving revision 1.1.1.1
diff -c -r1.1.1.1 Makefile.am
*** src/login/Makefile.am	9 Aug 2002 09:15:53 -0000	1.1.1.1
--- src/login/Makefile.am	22 Oct 2002 18:48:32 -0000
***************
*** 1,7 ****
  pkglib_PROGRAMS = imap-login
  
  INCLUDES = \
! 	-I$(top_srcdir)/src/lib
  
  imap_login_LDADD = \
  	../lib/liblib.a \
--- 1,8 ----
  pkglib_PROGRAMS = imap-login
  
  INCLUDES = \
! 	-I$(top_srcdir)/src/lib \
! 	-DPACKAGE=\""$(PACKAGE)"\"
  
  imap_login_LDADD = \
  	../lib/liblib.a \
Index: src/master/Makefile.am
===================================================================
RCS file: /home/thomas/cvs-root/dovecot/src/master/Makefile.am,v
retrieving revision 1.1.1.1
diff -c -r1.1.1.1 Makefile.am
*** src/master/Makefile.am	9 Aug 2002 09:15:55 -0000	1.1.1.1
--- src/master/Makefile.am	22 Oct 2002 18:48:12 -0000
***************
*** 4,10 ****
  	-I$(top_srcdir)/src/lib \
  	-DSYSCONFDIR=\""$(sysconfdir)"\" \
  	-DPKG_RUNDIR=\""$(localstatedir)/run/$(PACKAGE)"\" \
! 	-DPKG_LIBDIR=\""$(pkglibdir)"\"
  
  imap_master_LDADD = \
  	../lib/liblib.a \
--- 4,11 ----
  	-I$(top_srcdir)/src/lib \
  	-DSYSCONFDIR=\""$(sysconfdir)"\" \
  	-DPKG_RUNDIR=\""$(localstatedir)/run/$(PACKAGE)"\" \
! 	-DPKG_LIBDIR=\""$(pkglibdir)"\" \
! 	-DPACKAGE=\""$(PACKAGE)"\"
  
  imap_master_LDADD = \
  	../lib/liblib.a \
-- Thomas Wouters <thomas@xs4all.net>
On Tue, 2002-10-22 at 21:49, Thomas Wouters wrote:
> And while on *that* subject, the current CVS tree needs a patch like this to compile. (And the usual Makefile rebuilds.) But I'm sure you already noticed :-)
config.h should define the PACKAGE, and it does for me with both autoconf 2.13 and 2.5. Did you use GNU make? automake doesn't work very well without.
> Index: src/login/Makefile.am
> RCS file: /home/thomas/cvs-root/dovecot/src/login/Makefile.am,v
> retrieving revision 1.1.1.1
> diff -c -r1.1.1.1 Makefile.am
and diff -u in future please :)
On Tue, Oct 22, 2002 at 11:50:40PM +0300, Timo Sirainen wrote:
> On Tue, 2002-10-22 at 21:49, Thomas Wouters wrote:
> > And while on *that* subject, the current CVS tree needs a patch like this to compile. (And the usual Makefile rebuilds.) But I'm sure you already noticed :-)
> config.h should define the PACKAGE, and it does for me with both autoconf 2.13 and 2.5. Did you use GNU make? automake doesn't work very well without.
Ah, hm, I must have done something wrong with the aclocal/autoheader/automake/autoconf dance. It wasn't in my config.h or config.h.in, but I reran aclocal and now it's in config.h.in. I'm not used to using automake, just autoconf/autoheader :)
> > diff -c -r1.1.1.1 Makefile.am
> and diff -u in future please :)
But, but, diff -c is so much more readable! :P Sigh :)
-- Thomas Wouters <thomas@xs4all.net>
On Tue, Oct 22, 2002 at 04:01:46PM +0300, Timo Sirainen wrote:
> Only bigger thing to do is to parse the headers and convert the =?xxx?yyy?= things. I think everything should go either through UTF8 or without any conversion if both header and search charsets are same.
I assume you only want to convert to UTF-8 or the other character sets when it's really necessary, not store all data internally as UTF-8 or wchar_t ?
> I can think of two ways to do it:
> 1) save search results to array, sort the array, send it to clients, delete the array.
> 2) sort all mails writing results into btree file, keep the file updated whenever new mails are added or deleted. then do the search in that order so we can just write out the results without any sorting.
> I like the 2) more, but that works only if the sort condition isn't changed. Or if it is changed, then we'd need to have multiple btree files.. And in general it slows down things if sorting isn't done often.
I think 2) might be an option if you're dealing with very specific SORTs. SquirrelMail, for instance, allows sorting on date, from, subject, arrival and to (but the last one only in 'sent-mail' mailbox, oddly enough) and all reverses, and in various order as well, by little buttony things on the mailbox-index page... easy to play with. (Don't forget, you can
. SORT (SUBJECT REVERSE FROM REVERSE TO REVERSE DATE ARRIVAL) UTF-8 ALL
and which btrees would you use how, in that case ? :) Anyway, in SquirrelMail at least, I don't think there is a system-wide 'default' for the criterion that's most often SORTed on. Simply storing the last SORT might be the optimal solution, as I think (after the initial toying with the mailbox sort order, and the occasional switch to search faster) most people won't touch their sort order once they like or are used to what they have.
-- Thomas Wouters <thomas@xs4all.net>
On Tue, Oct 22, 2002 at 03:40:07PM +0200, Thomas Wouters wrote:
> > Only bigger thing to do is to parse the headers and convert the =?xxx?yyy?= things. I think everything should go either through UTF8 or without any conversion if both header and search charsets are same.
> I assume you only want to convert to UTF-8 or the other character sets when it's really necessary, not store all data internally as UTF-8 or wchar_t ?
Well, there's not much stored in memory, and index files store mostly just FETCH ENVELOPE. The envelope is better kept in a format that's suitable for sending directly to the IMAP client, and those few things that are stored in memory aren't used by search at all. I think it'll be easier if everything is just converted when needed; it only costs more CPU (and maybe memory) - there should be plenty of that left :)
> I think 2) might be an option if you're dealing with very specific SORTs. SquirrelMail, for instance, allows sorting on date, from, subject, arrival and to (but the last one only in 'sent-mail' mailbox, oddly enough) and all reverses, and in various order as well, by little buttony things on the mailbox-index page... easy to play with. (Don't forget, you can
> . SORT (SUBJECT REVERSE FROM REVERSE TO REVERSE DATE ARRIVAL) UTF-8 ALL
> and which btrees would you use how, in that case ? :) Anyway, in
Primary condition could be enough to store in the btree, the other conditions are used only when primary compares equal between mails, so we can just read those into memory and then apply the rest of the sorting. Still faster and takes less memory than reading everything into memory and then sorting.
Or the btree could be fully sorted with some condition, but if it's not exactly the same we want we could just use the primary condition.
(uh, a bit badly said, hope it makes some sense :)
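The idea of keeping only the primary condition in the btree can be sketched like this. A list already ordered by the primary key stands in for walking the btree file; only runs of mails whose primary key compares equal are pulled into memory and ordered by the remaining conditions. The dict-per-mail representation and the function name are assumptions for the example.

```python
from itertools import groupby


def refine_presorted(presorted, primary, secondary):
    """Apply secondary sort keys within runs of equal primary key.

    `presorted` must already be ordered by `primary` (as a btree walk
    would deliver it). `secondary` is a list of (field, reverse) pairs,
    most significant first.
    """
    output = []
    for _, run in groupby(presorted, key=lambda msg: msg[primary]):
        run = list(run)  # only one run of equal keys is in memory at a time
        for field, reverse in reversed(secondary):
            run.sort(key=lambda msg: msg[field], reverse=reverse)
        output.extend(run)
    return output
```

In a real server each finished run could be written to the client right away instead of being appended to an output list.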
On 2002-10-22 05:31:22 +0000, Timo Sirainen wrote:
> Subject: [dovecot] Re: Architectural questions
> From: Timo Sirainen <tss@iki.fi>
> To: dovecot@procontrol.fi
> X-Mailer: Ximian Evolution 1.1.1.99 (Preview Release)
> Date: 22 Oct 2002 05:31:22 +0300
>
> On Tue, 2002-10-22 at 04:24, Timo Sirainen wrote:
> > Just had a thought. Would it be feasible to _try_ to permanently assign users to one or few specific servers (via ip or maybe login proxy)? If those servers were down, it could fallback to any random one.
> Dovecot could actually do that itself too, authenticate user and then either locally handle it or transfer it to another node based on some configuration.
hmm
i just wanted to suggest http://www.vergenet.net/linux/perdition/ for proxying. but if dovecot could do something similar by it self this is not needed :)
bow before god aehm cras ^^
marcus
-- irssi - the client of the smart and beautiful people
http://www.irssi.de/
On Tue, 2002-10-22 at 15:47, Marcus Rueckert wrote:
> i just wanted to suggest http://www.vergenet.net/linux/perdition/ for proxying. but if dovecot could do something similar by it self this is not needed :)
It wouldn't be difficult to add proxying to dovecot, since it already does authentication, and SSL connections are pretty much proxied already through a separate process.
Anyway, I looked at Perdition. It doesn't support AUTHENTICATE, plus I just found a buffer overflow in it without even trying much.
participants (3)
- Marcus Rueckert
- Thomas Wouters
- Timo Sirainen