On Tue, 2006-06-06 at 17:02 +0100, Simon Waters wrote:
Well, the problem with a file-based database (Dovecot's indexes etc. are in fact a database) is that you must use the same locking and/or terminate or suspend the service; otherwise the data and the indexes may end up out of sync.
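"Use the same locking" means that any outside process, such as a backup, must take the lock the service's writers take before copying the data and its indexes together. A minimal sketch, assuming a single hypothetical lock file that every writer also `flock`s. This is an illustration, not Dovecot's actual locking scheme.

```python
import fcntl
import shutil


def snapshot(src_dir: str, dst_dir: str, lock_path: str) -> None:
    """Copy a file-based store while holding the lock its writers use.

    lock_path is hypothetical: this only protects the copy if every
    writer locks the very same file before touching data or indexes.
    """
    with open(lock_path, "a") as lock:
        # Blocks until no writer holds the lock, so data and indexes
        # are copied as one consistent unit.
        fcntl.flock(lock.fileno(), fcntl.LOCK_EX)
        try:
            shutil.copytree(src_dir, dst_dir)  # dst_dir must not exist yet
        finally:
            fcntl.flock(lock.fileno(), fcntl.LOCK_UN)
```

The cost is that delivery stalls for as long as the copy takes, which is the same trade-off as suspending the service.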
Yes, but indexes are cheap to rebuild and expensive to maintain, so you might find this cuts the wrong way.
I'm quite a fan of the idea of putting email in databases; I can see the upside. But those who think it will save any resources at all haven't spent enough time with big database systems. It will be a lot slower, except where you can use indexes to speed up operations, which will be rare, if it happens at all.
Just consider the number of blocking writes needed to commit an email to maildir (remember it uses a lot of renames), then consider the kind of indexes you would want to maintain in the database, all of which must be updated when an email is delivered (and possibly when it is read, filed, etc.).
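To make the write count concrete, here is a minimal sketch of the usual maildir delivery pattern: write into `tmp/`, sync, then rename into `new/`. The unique-name format is simplified and hypothetical (real delivery agents follow the maildir naming conventions more carefully), and the directory `fsync` assumes a POSIX system.

```python
import os
import socket
import time


def deliver(maildir: str, message: bytes) -> str:
    """Deliver one message using the tmp/ -> new/ rename pattern."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)
    # Simplified unique name (hypothetical): seconds.pid_nanoseconds.hostname
    name = f"{int(time.time())}.{os.getpid()}_{time.monotonic_ns()}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)

    # Blocking write 1: create and fill the file in tmp/, then force it to disk.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(message)
        while view:  # os.write may write fewer bytes than requested
            view = view[os.write(fd, view):]
        os.fsync(fd)
    finally:
        os.close(fd)

    # Blocking write 2: the atomic rename makes the message visible in new/.
    os.rename(tmp_path, new_path)

    # Blocking write 3: sync the directory so the rename itself is durable.
    dfd = os.open(os.path.join(maildir, "new"), os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)
    return new_path
```

A database-backed store would replace these steps with a transaction commit plus an update to every index it maintains on the message table.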
I think the people who expect an improvement from databases over maildir are used to Unix filesystems that degrade badly as the number of files in a directory increases. These days many, like ReiserFS and XFS, are much better. My theory is that if your filesystem isn't a good place to store things, you should fix that before thinking about databases.
I got into pondering mail in databases from the issues surrounding the consistency of directory reads in Unix filesystems. It is easy to guarantee the consistency of a read from an ACID-style database, unlike reading the directories of a big maildir folder. When I asked Hans Reiser, he said it sounded like the kind of modular functionality that modern filesystems ought to provide, and offered to write a filesystem plugin for ReiserFS that guarantees the consistency of directory reads for maildir use. Of course, there is a performance (or resource) penalty in doing a consistent read of a directory.
The issue is the same in both places: you either speed things up by allowing dirty reads, or you take the performance hit of locking for the duration of all writes. When you create a new file you must atomically determine whether or not the name already exists. Even ReiserFS can't cheat on that without risking corruption.
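The "atomically determine whether the name exists" step is what POSIX `open()` with `O_CREAT | O_EXCL` provides: the existence check and the creation happen as one operation, so no other process can slip in between them. A minimal sketch; the function name is hypothetical, and the guarantee assumes a local filesystem (some older NFS versions are a known exception).

```python
import os


def create_exclusive(path: str, data: bytes) -> bool:
    """Create path only if it does not exist; return False if it already does.

    A racy alternative would be `if not os.path.exists(path): open(path, "wb")`,
    where another process can create the file between the check and the open.
    """
    try:
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        return False  # someone else already owns this name
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return True
```

Usage: of two deliveries racing on the same name, exactly one call returns `True`; the loser must pick a new name and retry.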
There may be more than one way to solve a problem; you just need to make sure you know precisely which problems you are trying to solve.
Simon, who'll continue moving systems to maildir, till something better arrives.
An extended maildir might make sense, where additional subdirectories are used transparently to limit the number of files in any single directory, so it would end up looking something like a Squid cache, which solves a very similar problem.
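A minimal sketch of such a layout, assuming a hash of the message file name picks two levels of subdirectories. The 16 x 256 fan-out and the SHA-256 hash are illustrative choices, not a description of how Squid itself numbers its cache files, and `bucket_path` is a hypothetical helper.

```python
import hashlib
import os

L1 = 16    # first-level directories (illustrative)
L2 = 256   # second-level directories under each first-level one (illustrative)


def bucket_path(root: str, name: str) -> str:
    """Map a message file name to root/XX/YY/name deterministically.

    Readers and writers that both call this helper find the same file
    without scanning every directory, keeping the extra level transparent.
    """
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    h = int.from_bytes(digest[:4], "big")
    d1 = h % L1
    d2 = (h // L1) % L2
    return os.path.join(root, f"{d1:02X}", f"{d2:02X}", name)
```

With 4,096 leaf directories, a folder of a million messages averages about 244 files per directory instead of a million in one; the cost is that a full folder listing now has to walk every leaf.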
-- Les Mikesell lesmikesell@gmail.com