[dovecot] Re: Architectural questions

19 Oct 2002

      On Sat, 2002-10-19 at 15:38, Thomas Wouters wrote:
...
We briefly played with modifying sendmail and the pop server to avoid the
full copy in the common case (only status changes) by doing in-place edits
of a pre-generated Status line,
UW-imapd does this as well, creating an "X-Keywords:          " line for
each mail. I had thought about doing this in Dovecot too, but since
mutt always rewrites the whole mailbox I figured I might as well do the
same. With larger mailboxes that is really slow though, so I think I'll
support the X-Keywords trick myself too.
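
A minimal sketch of the padded-header idea, assuming each message is stored
with a header whose value is padded with spaces to a fixed width. The
function names and the width of 10 are made up for illustration; this is
not UW-imapd's or Dovecot's code. A real server would take the header
offset from its index instead of searching for it.

```python
PAD_WIDTH = 10  # assumed number of bytes reserved for the header value


def make_padded_header(name: bytes, value: bytes, width: int = PAD_WIDTH) -> bytes:
    """Build 'Name: value' padded with spaces so it can be edited in place."""
    if len(value) > width:
        raise ValueError("value does not fit in the reserved space")
    return name + b": " + value.ljust(width) + b"\n"


def rewrite_header_in_place(path: str, name: bytes, new_value: bytes,
                            width: int = PAD_WIDTH) -> None:
    """Overwrite the value of the first `name` header without resizing the file."""
    if len(new_value) > width:
        raise ValueError("value does not fit; a full rewrite would be needed")
    prefix = b"\n" + name + b": "
    with open(path, "r+b") as f:
        data = f.read()  # fine for a demo; an index would give the offset directly
        start = data.find(prefix)
        if start < 0:
            raise KeyError(name)
        f.seek(start + len(prefix))
        # Same number of bytes out as were there before, so nothing moves.
        f.write(new_value.ljust(width))
```

Only the reserved bytes are touched, so the rest of the mbox file does not
have to be copied. If the new value does not fit, the sketch raises an error
and the caller would have to fall back to a full rewrite.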
...
as well as avoid full scanning of the mbox
file by creating special headers to mark the 'real' length of an email.
For each mail? Content-Length? In my tests that didn't seem to help
much; it rather made things slower. Could be that I just did something
badly, I'll have to look into it more when I begin optimizing mbox
handling. Have to get it at least as fast as UW-imapd :)
...
We still have support for mbox mailboxes in a user's homedirectory though,
by using procmail and such. So when we needed an IMAP server for use with
our webmail (based on SquirrelMail), we were forced to go with the UW-IMAP
server, with the maildir patch that's been scattered around the 'net. This
Hm. SquirrelMail requires the SORT extension, which Dovecot doesn't
support yet. Notes about SORT from the TODO file in CVS:

sort (draft-ietf-imapext-sort)
basically sorted SEARCH, requiring CHARSET support for
UTF-8 and ASCII
we could create alternative binary tree file(s) for different sort
conditions, ".tree-sort" or something. or if we decide to just
keep it in memory, btree could still be best choice.
required by squirrelmail (webmail)

...

First off, we need the maildir support to be 'correct' in that it does
not rely on the naming of the files in the mailbox, other than the very
loose specification DJB gives (doesn't contain a colon or slash and
doesn't start with a dot.) The pine/UW-imap patch breaks here because it
depends on the first part of the filename being time() or something else
that, when sorted alphanumerically, puts new mail at the end. Our
LDA does this, but procmail does not, and it shouldn't have to.

Dovecot doesn't care as long as the file name stays the same before the
':' character.
...

Second, we need the maildir support to be 'correct' in that it does not
rely on the directory order being persistent. The NetApp Filers use
btree-indexed directories, so the order of readdir() can change
completely whenever a file is added or removed. The pine/uw-imap patch
relies on the '.uidvalidity' file being modified whenever the maildir sort
order changed, and this isn't happening.

Dovecot reads them into a hash table, so it doesn't depend on readdir()
behaviour.
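
A rough sketch of that approach, assuming the key is the part of the file
name before the first ':' and that changes are found by comparing two
scans. scan_maildir() and diff_scan() are hypothetical names, not Dovecot
functions.

```python
import os


def scan_maildir(dirpath: str) -> dict:
    """Map each message's base name (text before ':') to its full file name."""
    entries = {}
    for name in os.listdir(dirpath):  # order is irrelevant, we key by name
        if name.startswith("."):
            continue
        base, _, _flags = name.partition(":")
        entries[base] = name
    return entries


def diff_scan(known: dict, current: dict):
    """Return (new, removed, renamed) base names between two scans."""
    new = sorted(current.keys() - known.keys())
    removed = sorted(known.keys() - current.keys())
    renamed = sorted(b for b in known.keys() & current.keys()
                     if known[b] != current[b])
    return new, removed, renamed
```

Because lookups go by name, neither the sort order of the file names nor
the order in which the directory returns them matters. A flag change shows
up as a rename of the same base name.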
...

We need to avoid using fcntl(). The Netapps support it, but file-locking
over NFS is very, very poorly designed and we've had too many problems of
various kinds before, with fcntl. We also don't like the idea of having
thousands of fcntl locks at the same time ;P Instead, we've switched to
the locking method described in the Linux open(2) manpage under O_EXCL.
(We call it 'dot-locking', I'm not sure where the name came from.)

Hmm. The dot-lock means the "mbox.lock" file which gets created when
someone wants it exclusively locked. Dovecot supports it, and maildir
itself doesn't need locking at all. Dovecot's index files currently use
fcntl()-locking, but it would be possible to replace them with lock
files.
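
For reference, a minimal sketch of taking a "<file>.lock" dot-lock,
assuming the link()-based recipe from the Linux open(2) man page: create a
uniquely named file, link() it to the lock name, and check the link count
if link() reports an error. The function names are hypothetical and this
is not Dovecot's mbox locking code. It does not retry or handle stale locks.

```python
import os
import socket


def dotlock_acquire(path: str) -> bool:
    """Try once to create path + '.lock'; return True if we now hold it."""
    lock_path = path + ".lock"
    # Unique per host and process, so creating it cannot race with others.
    unique = "%s.%s.%d" % (lock_path, socket.gethostname(), os.getpid())
    fd = os.open(unique, os.O_CREAT | os.O_WRONLY, 0o644)
    os.close(fd)
    try:
        try:
            os.link(unique, lock_path)
            return True
        except OSError:
            # Over NFS link() can report failure although it succeeded;
            # a link count of 2 means the lock file is really ours.
            return os.stat(unique).st_nlink == 2
    finally:
        os.unlink(unique)


def dotlock_release(path: str) -> None:
    """Drop the lock by removing the lock file."""
    os.unlink(path + ".lock")
```

A caller would normally loop on dotlock_acquire() with a sleep and a
timeout, and decide separately when an old lock file counts as stale.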
Then there's the modify log file. Dovecot uses fcntl() locking on it as
a way to figure out whether it's the only process using the log file:
everyone read-locks the file, and a process that wants to know whether
it's the only user tries to set a write lock. If that fails, it knows
someone else is using the file as well. I'm not sure if there's any good
way to replace that with lock files; I had pretty complicated
(desperate) plans before figuring out that fcntl() could do it easily.
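
The read-lock / try-write-lock check could be sketched like this, using
Python's fcntl module rather than Dovecot's C code. open_modify_log() and
is_only_user() are made-up names for illustration.

```python
import fcntl


def open_modify_log(path: str):
    """Open the log and hold a shared (read) lock for as long as it is open."""
    f = open(path, "r+b")  # write access is needed to request a write lock later
    fcntl.lockf(f, fcntl.LOCK_SH)
    return f


def is_only_user(f) -> bool:
    """Return True if no other process holds a lock on the log file."""
    try:
        # Non-blocking upgrade to an exclusive (write) lock.
        fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Someone else holds a read lock; our own read lock is unchanged.
        return False
    # We were alone; go back to a read lock so others can still open the log.
    fcntl.lockf(f, fcntl.LOCK_SH)
    return True
```

These locks belong to the process, so the check only tells other processes
apart. Two handles inside the same process would not conflict with each
other.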
It would be possible to just assume that there's always someone else
using the modify log, but then each flag change or expunge would always
write a few bytes to it, and when the log file is switched (there's .log
and .log.2) the old one wouldn't be truncated after the last process is
finished with it. That's not too bad, since it will be truncated after
the next switch.
Also it would be possible not to use index files at all but just keep
them in memory. I've been fixing code to make this possible and somewhat
fast.
...
If we're going to
use dovecot, we will replace most, if not all, fcntl()s with dot-locking,
the question is whether you want it contributed to dovecot :)
All locking goes through file_*_lock() or mbox_lock_*() functions. mbox
locking supports it already, and file_*_lock() could be made to support
it. It doesn't currently get the file name, but that could be added.
...

Every user's incoming mailbox is /var/spool/u/s/username. Other mailboxes
are in /home/u/username/mail or /home/u/username/Mail (the second if the
first does not exist.) We are not yet certain whether we want the inbox
to be able to have subdir-mailboxes, as /var/spool and /home have
different quotas and we urge people not to store their mail on
/var/spool. (for one thing, it doesn't get backed-up.) We want these
things to work without magical symlinks or empty files, because people
_will_ delete them and cause unnecessary helpdesk calls :) Again, the
question is mostly whether this is desirable in dovecot (or something
enough like it to reduce local changes.)

Are maildir inboxes also in /var/spool? With mbox, mailboxes under the
inbox wouldn't even be possible, because dir structure == mailbox
structure, and since the inbox file exists there can't be an inbox
directory (except maybe with different case, but that's kludgy).
I've also thought I might as well make it possible to read the mbox
inbox from /var/mail or wherever it is. Pretty easy to do, but the .lock
file is problematic if new files can't be added to the /var/mail
directory.
...

We have over 300k mailboxes at the moment. We expect that number to keep
growing. The indexer process (as described by design.txt) does not sound
like a good idea in our case :) How necessary is it, really? Especially
since we do not expect more than 10% of those mailboxes to be actually
used by IMAP, not even once. If disabling the indexer completely just
means longer startup times for IMAP sessions, we can live with that.

The indexer doesn't exist yet, and wouldn't really even be needed. I
still think it could be a somewhat nice idea: the system load is
probably lower at night, so we could use the extra time to make
mailboxes perform faster the next day.
It'd be difficult to know exactly when there is "extra time", which is
why I haven't written the indexer yet. It probably needs some external
program (script) which tells it, maybe by looking at some I/O statistics
from /proc or by doing a few file operations and checking the latency.
Am I right that CPU usage still isn't a problem, but rather the I/O?
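
The "few file operations" variant could look roughly like the sketch
below. Everything in it is an assumption: the function names, the number of
samples, the write size and the 50 ms threshold are placeholders that would
need tuning on the real mail store.

```python
import os
import tempfile
import time


def probe_io_latency(directory: str, samples: int = 5, size: int = 4096) -> float:
    """Return the worst observed time (seconds) to create, sync and delete a file."""
    payload = b"\0" * size
    worst = 0.0
    for _ in range(samples):
        start = time.monotonic()
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.write(fd, payload)
            os.fsync(fd)  # force the write out so the storage is actually measured
        finally:
            os.close(fd)
            os.unlink(path)
        worst = max(worst, time.monotonic() - start)
    return worst


def looks_idle(directory: str, threshold_seconds: float = 0.05) -> bool:
    """Guess that there is spare I/O capacity if the probe stayed under the threshold."""
    return probe_io_latency(directory) < threshold_seconds
```

An external script could call looks_idle() on the mail spool every few
minutes and only let an indexer run while it keeps returning True.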
...

The UW-IMAP maildir patch stores UIDs in the individual filenames, using
a 'U' flag. Will this interfere with dovecot ? We don't really need
dovecot and UW-IMAP to share UIDs, but we would like to have an as
painless transition as possible, without having to rename millions of
files to remove the U flag and other flags :P It would also be nice to
keep pine using the existing maildir patch, even though very few
IMAP-users would use pine.

How exactly does the U flag work? I hope it's before the ':' character,
like Courier's S=filesize? Otherwise U=1234 would be treated as 6
different flags, which isn't very good since Dovecot reorders them as
1234=U.
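
To illustrate the problem: if everything after the ":2," separator is
treated as single-character flags and sorted, "U=1234" turns into six flags
in a different order. The helper names below are hypothetical, and plain
ASCII sorting is assumed as the reordering rule.

```python
def split_maildir_name(name: str):
    """Split a maildir file name into (base name, flag characters)."""
    base, _, flags = name.partition(":2,")
    return base, flags


def normalize_flags(flags: str) -> str:
    """Treat every character as one flag and sort them in ASCII order."""
    return "".join(sorted(flags))


# Anything before ':' is part of the base name and is left alone.
base, flags = split_maildir_name("1035034680.1.host,S=2048:2,SR")
```

Here base is "1035034680.1.host,S=2048" and only "SR" is handled as flags.
A UID placed after the ':' would instead be pulled apart, since
normalize_flags("U=1234") gives "1234=U".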
...

Would dovecot scale, architecturally speaking, to 500k+ active mailboxes ?
The amount of hardware is not really an issue, we can add a lot of
machines (off-the-shelf Intel hardware) to each cluster, but if each
dovecot process has to load in an index of all possible mailboxes... that
would be a problem. Doing an inordinate number of file-accesses over NFS
would also be a problem, but I haven't seen any indication of that in the
source, yet.

Dovecot opens the index when opening a mailbox. It doesn't open other
mailboxes' indexes. The indexes should also reduce file accesses,
especially with mbox, since Dovecot doesn't need to read and parse the
whole mbox file. In general I've tried to keep file I/O to a minimum.
If your clusters access the files through NFS, there should be no
problem. Except I've never tried Dovecot over NFS, and I'm not sure
how well mmap()ing works over NFS. I know there have been problems
before, but hopefully they've been fixed already.

Timo Sirainen