On Wed, 2006-06-07 at 09:29 -0700, Marc Perkel wrote:
Les Mikesell wrote:
I think the people who expect an improvement from databases over maildir are used to unix filesystems that degrade badly as the number of files in a directory increase. These days many, like Reiserfs and XFS, are much better. My theory is that if your filesystem isn't a good place to store things you should fix that before thinking about databases.
Suppose you have a total of one million messages stored for 5000 users across 800 domains and you want to delete all message that were sent from a specific host. With MySQL it's a one line command and would take only a few seconds to execute. Using Maildir it would take hours because you would have to search every message. That's the power of MySQL.
And with great power comes great frustration: Instead of taking 50msec to fetch a message, it takes 80msec, and instead of taking 25msec to load a message into a mailbox, it now takes 200-500msec!
Note that what you REALLY want to do is create a smart index of everything- it exists, it's called ZOE. It's not fast, and I think in practice, it's not all that much fun to use, but it does EXACTLY what you're talking about- builds a relational model of incoming email.
It's Slow. REALLY slow. And don't fool yourself into thinking it's slow because it's written in Java. It's slow because fulltext loading messages is slow.
Maildir only is fast for indexing file names. But if you are indexing across users and domains by host, or headers, or senders, or whatever then only a database can support these multiple indexes. There are things you can do with databases that are way beyond what you can imagine. Especially if you are integrating it with a spam system.
No. The cost is GREATER for MySQL than it is for Maildir because MySQL has to spend the time to index those things whether they're used or not. Building an index on everything is slow- you spend all your time maintaining indexes. Most of the time they're not needed- and for the cases where they're interesting, a specialized solution is always faster.
For example, a new message comes in and you find that sender matches email in 100 people's spam folders and none in any other folder? It can be classified as spam. If however the from address matches ham in people folder and no spam then you can probably deliver it without spam scanning.
Sender-checks are useless- most junk mail in my mailboxes come from random addresses. Language classifiers don't use MySQL as a backend because MySQL is slow. Sparse fields are much faster, and not at all difficult to implement.
This issue isn't performance, it's power.
Really?
Tue, 06 Jun 2006 05:38:41 -0700 (08:38 EDT)
Seems to me that MySQL could somplify all the backend stuff that everyone struggles with and with replication one could create very massive and reliable systems
Tue, 06 Jun 2006 16:40:27 -0700 (19:40 EDT)
Not true for "always". If you have 100,000 messages in a folder the database will win easy.
It seems to me that this is poorly thought out with no real goals in mind. It seems like a way to "keep your options open" - except you neglect that doing so incurs a great cost. I recommend you take a look at ZOE and think a little more if you want all email to work that way.
If you compare it with what people are doing now then MySQL would probably be a little slower.
No, it will definitely be a lot slower.
But if you compare it to what is possible that you can't do now then MySQL wins.
I'm not interested in doing "what's possible"- and in fact, the vast majority of people aren't either. They're interested in solving problems, or having the problems solved for them.
Please go look at ZOE and DBMail and decide if you still think this is a good idea after using these-- maybe it will help you decide exactly what you want to do- and perhaps if you do decide to "make your own LDA" it'll help you decide what's wrong with them- so you don't repeat their mistakes.
-- Internet Connection High Quality Web Hosting http://www.internetconnection.net/