At 1AM -0500 on 26/10/12 you (Stan Hoeppner) wrote:
On 10/25/2012 10:54 PM, Ben Morrow wrote:
dovecot-lda runs in its own process, and does not involve the imap process in any way. As such it has to do locking.
You apparently know your tools better than I do. Neither ps nor top shows a 'dovecot-lda' or similarly named process on my systems. When I send a test message from gmail through Postfix, I only see CPU or memory activity in an imap process. When I close the MUA to end the imap processes and then send a test message, I don't see any CPU or memory activity in any dovecot processes, only in Postfix processes (including local) and in spamd. So is dovecot-lda running as a sub-process or thread of Postfix's local process? Or is it part of the 'dovecot' process, and the message goes through so quickly that top doesn't show any CPU usage by the 'dovecot' process?
Assuming you have
mailbox_command = /.../dovecot-lda -a "${RECIPIENT}"
or something equivalent in your Postfix configuration, dovecot-lda runs as a subprocess of local(8) under the uid of the delivered-to user.
If I have the following in my dovecot.conf: ... <snipped for readability> ...
I'm not sure what you mean by 'processes of [one's own] program' but
I.e. Dovecot has its own set of processes, Postfix has its processes, etc. With "one's own processes" I'd think it makes more sense to use IPC and other tricks to coordinate concurrent access to a file than to rely on filesystem locking features.
Filesystem locking, at least if NFS is not involved, is not that expensive. Successfully acquiring a flock or fcntl lock takes only a single syscall which doesn't have to touch the disk, and any form of IPC is going to need to do that. (Even something like a shared memory region will need a mutex for synchronisation, and acquiring the mutex has to go through the kernel.)
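The cheap path can be seen with Python's standard fcntl module. This is a minimal sketch, assuming a local (non-NFS) filesystem and a hypothetical mailbox file in a temporary directory. Taking and dropping an uncontended lock is one flock(2) call each, and the kernel keeps the lock in memory without writing anything to disk. The second descriptor opened in `try_lock` (a hypothetical helper) stands in for a competing delivery.

```python
import fcntl
import os
import tempfile


def lock_exclusive(path):
    """Open `path` and take an exclusive flock(2) lock, blocking if contended."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX)  # a single syscall; no disk I/O for the lock itself
    return fd


def unlock(fd):
    fcntl.flock(fd, fcntl.LOCK_UN)  # again a single syscall
    os.close(fd)


def try_lock(path):
    """Non-blocking attempt through a second, independent open file description.

    flock(2) locks taken through separate open() calls conflict even
    inside one process, so this behaves like a competing delivery.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:  # EWOULDBLOCK: someone else holds the lock
        return False
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "user.mbox")  # hypothetical mailbox
    fd = lock_exclusive(path)
    while_held = try_lock(path)
    unlock(fd)
    after_release = try_lock(path)
    return while_held, after_release


if __name__ == "__main__":
    print(demo())  # (False, True)
```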
Dotlocking *is* expensive, because acquiring a dotlock is a complicated process requiring lots of syscalls, some of which have to write to disk; and any scheme involving acquiring several locks on the same file is going to be more so, especially if you can end up getting the first lock but finding you can't get the second, so then you have to undo the first and try again.
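To make the cost concrete, here is a minimal sketch of one well-known dotlock recipe: the link(2)-based method the Linux open(2) man page describes for NFS-safe lock files. The names `dotlock_acquire` and `lock_mbox_both` are hypothetical, and stale-lock breaking, which real LDAs also need, is left out. Count the syscalls: open, write, fsync, close, link, possibly stat, then unlink, compared with a single flock() call. `lock_mbox_both` shows the back-off needed when the first lock succeeds but the second does not.

```python
import fcntl
import os
import socket
import tempfile
import time


def dotlock_acquire(mbox_path, retries=10, delay=0.1):
    """Create `<mbox>.lock` using the link(2) recipe; returns the lock path."""
    lock_path = mbox_path + ".lock"
    host = socket.gethostname().replace("/", "_")
    unique = f"{lock_path}.{host}.{os.getpid()}.{time.time_ns()}"
    # Syscalls 1-4: create, write, flush and close a uniquely named temp file.
    fd = os.open(unique, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, f"{os.getpid()}\n".encode())
        os.fsync(fd)
    finally:
        os.close(fd)
    try:
        for _ in range(retries):
            try:
                os.link(unique, lock_path)  # atomic claim of the lock name
                return lock_path
            except OSError:
                # Usually EEXIST. Over NFS a lost reply can make link() report
                # failure even though it worked, so confirm via the link count.
                if os.stat(unique).st_nlink == 2:
                    return lock_path
            time.sleep(delay)
        raise TimeoutError(f"could not dotlock {mbox_path}")
    finally:
        os.unlink(unique)  # the lock file itself survives as the other link


def dotlock_release(lock_path):
    os.unlink(lock_path)


def lock_mbox_both(mbox_path, attempts=50, delay=0.05):
    """Take a dotlock and then an flock; if the second is busy, undo the first."""
    for _ in range(attempts):
        lock_path = dotlock_acquire(mbox_path)
        fd = os.open(mbox_path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return lock_path, fd  # caller closes fd, then releases the dotlock
        except BlockingIOError:
            os.close(fd)
            dotlock_release(lock_path)  # back off rather than hold half the locks
            time.sleep(delay)
    raise TimeoutError(f"could not lock {mbox_path}")


def demo():
    d = tempfile.mkdtemp()
    mbox = os.path.join(d, "user.mbox")
    lock_path = dotlock_acquire(mbox)
    held = os.path.exists(lock_path)
    try:
        dotlock_acquire(mbox, retries=2, delay=0.01)
        second = True
    except TimeoutError:
        second = False
    dotlock_release(lock_path)
    lock_path, fd = lock_mbox_both(mbox)
    os.close(fd)  # closing the descriptor drops the flock
    dotlock_release(lock_path)
    return held, second, sorted(os.listdir(d))


if __name__ == "__main__":
    print(demo())  # (True, False, ['user.mbox'])
```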
More importantly, the biggest problem with mbox as a mailbox format is that any access at all has to lock the whole mailbox. If the LDA is trying to deliver a new message at the same time as an IMAP user is fetching a completely different message, or if two instances of the LDA are trying to deliver at the same time, they will be competing for the same lock even though they don't really need to be. A file-per-message format like Maildir avoids this, to the point of being mostly lockless, but that brings its own efficiency problems; the point of dbox is to find the compromise between these positions that works best.
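For contrast, the usual Maildir delivery convention needs no lock at all: write the message under a unique name in tmp/, flush it, then rename(2) it into new/. Because rename is atomic within one filesystem, an IMAP reader never sees a partial file, and two deliveries never contend because their names differ. This is a minimal sketch: `maildir_deliver` is a hypothetical helper rather than any particular LDA's code, and `_unique_name` is a simplified variant of the customary time.pid.host naming.

```python
import itertools
import os
import socket
import tempfile
import time

_deliveries = itertools.count()  # per-process counter keeps names unique


def _unique_name():
    # Maildir names must not contain '/' or ':', so escape them in the hostname.
    host = socket.gethostname().replace("/", r"\057").replace(":", r"\072")
    return f"{time.time_ns()}.P{os.getpid()}Q{next(_deliveries)}.{host}"


def maildir_deliver(maildir, message):
    """Deliver `message` (bytes) into `maildir` without taking any lock."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    name = _unique_name()
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(message)
        while view:  # loop in case of a short write
            view = view[os.write(fd, view):]
        os.fsync(fd)  # data must be on disk before it becomes visible
    finally:
        os.close(fd)
    os.rename(tmp_path, new_path)  # atomic within one filesystem
    return new_path


def demo():
    maildir = os.path.join(tempfile.mkdtemp(), "Maildir")
    first = maildir_deliver(maildir, b"Subject: one\n\nfirst\n")
    second = maildir_deliver(maildir, b"Subject: two\n\nsecond\n")
    with open(first, "rb") as f:
        body = f.read()
    return (first != second,
            len(os.listdir(os.path.join(maildir, "new"))),
            os.listdir(os.path.join(maildir, "tmp")),
            body)


if __name__ == "__main__":
    print(demo())
```

The price is one file (and at least one fsync and rename) per message, which is the efficiency cost that dbox tries to balance.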
it's extremely common for a process to have to take locks against another copy of itself. All traditional Unix LDAs and MUAs do this; for instance, procmail will take locks in part so that if another instance of procmail is delivering another mail to the same user at the same time the mbox won't end up corrupted.
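A minimal sketch of what an mbox-appending LDA does so that two copies of itself don't interleave their writes: take an exclusive lock, append the "From " separator line and the message, flush, and release. Assumptions: this uses only flock(2), whereas procmail and other real LDAs typically also take a dotlock and may use fcntl or lockf instead, depending on how they were built. `mbox_append` is a hypothetical name, and body lines are escaped with mboxrd-style ">From" quoting, which is only one of several incompatible mbox variants.

```python
import fcntl
import os
import re
import tempfile
import time

# mboxrd quoting: any body line matching ^>*From gets one more '>'.
_FROM_LINE = re.compile(rb"^>*From ", re.MULTILINE)


def mbox_append(mbox_path, sender, message):
    """Append one message (bytes) to an mbox file under an exclusive flock."""
    body = _FROM_LINE.sub(lambda m: b">" + m.group(0), message)
    if not body.endswith(b"\n"):
        body += b"\n"
    separator = f"From {sender} {time.asctime(time.gmtime())}\n".encode()
    data = memoryview(separator + body + b"\n")  # blank line ends the message
    fd = os.open(mbox_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        # Blocks while another instance (e.g. a second copy of this LDA) holds the lock.
        fcntl.flock(fd, fcntl.LOCK_EX)
        while data:
            data = data[os.write(fd, data):]
        os.fsync(fd)
    finally:
        os.close(fd)  # closing the descriptor releases the flock


def demo():
    path = os.path.join(tempfile.mkdtemp(), "user.mbox")
    mbox_append(path, "alice@example.org",
                b"Subject: hi\n\nFrom here on, mbox quoting matters.\n")
    mbox_append(path, "bob@example.org", b"Subject: two\n\nbody")
    with open(path, "rb") as f:
        data = f.read()
    separators = [l for l in data.split(b"\n") if l.startswith(b"From ")]
    return len(separators), data.count(b">From here on"), data.endswith(b"body\n\n")


if __name__ == "__main__":
    print(demo())  # (2, 1, True)
```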
I guess I've given MDAs w/mbox too much credit, without actually looking at the guts.
I wouldn't look too hard at the details of the various ways there are of locking and parsing mbox files, or the ways in which they can go wrong. It's enough to make anyone swear off email for life :).
Scalable databases such as Oracle, DB2, etc. are far more intelligent about this, and can have many thousands of processes reading and writing the same file concurrently, usually via O_DIRECT rather than buffered I/O, so they have complete control over I/O. This is accomplished with a record lock manager and IPC, preventing more than one process from accessing the same record concurrently while allowing massive read/write concurrency across different records in a file. I'd think the same concurrency optimization could be done with Dovecot.
However, as Timo has pointed out, so few people use mbox these days that he simply hasn't spent much, if any, time optimizing mbox. Implementing some kind of lock manager and client code just for mbox IO concurrency simply wouldn't be worth the time. Unless he's already done something similar with mdbox. If he has, maybe that could be 'ported' to mbox as well. But again, it's probably not worth the effort given the number of mbox users, and the fact that nobody is complaining about mbox performance. I'm certainly not. It works great here.
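The per-record locking described above has a filesystem-level analogue in POSIX byte-range locks, which let separate processes lock different regions of one file independently. This is only a sketch assuming a hypothetical file of fixed 512-byte records; it does not describe Dovecot's mbox code, nor how Oracle or DB2 implement their lock managers. Because POSIX record locks belong to a process (a second descriptor in the same process would not conflict), the hypothetical `child_can_lock` helper forks a child to play the competing process.

```python
import fcntl
import os
import tempfile

RECORD_SIZE = 512  # hypothetical fixed-size record layout


def lock_record(fd, index, blocking=True):
    """Exclusively lock bytes [index*RECORD_SIZE, (index+1)*RECORD_SIZE)."""
    cmd = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
    try:
        fcntl.lockf(fd, cmd, RECORD_SIZE, index * RECORD_SIZE, os.SEEK_SET)
    except OSError:  # EACCES or EAGAIN: another process holds this range
        return False
    return True


def unlock_record(fd, index):
    fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, index * RECORD_SIZE, os.SEEK_SET)


def child_can_lock(path, index):
    """Fork a separate process and report whether it can lock record `index`."""
    pid = os.fork()
    if pid == 0:
        code = 1
        try:
            fd = os.open(path, os.O_RDWR)
            code = 0 if lock_record(fd, index, blocking=False) else 1
        finally:
            os._exit(code)  # exiting also releases the child's locks
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0


def demo():
    path = os.path.join(tempfile.mkdtemp(), "records.dat")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        lock_record(fd, 0)  # this process now holds record 0 only
        busy, free = child_can_lock(path, 0), child_can_lock(path, 1)
        unlock_record(fd, 0)
        released = child_can_lock(path, 0)
    finally:
        os.close(fd)
    return busy, free, released


if __name__ == "__main__":
    print(demo())  # (False, True, True)
```

One caveat of this design: closing any descriptor for the file releases all of that process's POSIX locks on it, which is one reason Linux later added open-file-description locks (F_OFD_SETLK).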
The only reason for using mbox is for compatibility with other systems which use mbox, which means you have to do the locking the same way as they do (assuming you can work out what that is). If you're going to change the locking rules you might as well change the file format at the same time, both to remove the insanity and to make it actually suitable for use as an IMAP mailstore. That's what Timo did with dbox, so if you've got your systems to the point where nothing but Dovecot touches the mail files you should seriously consider switching.
Ben