Steve wrote:
dbox-only is fine. I couldn't care less about the storage method chosen - filesystem, db, encrypted, whatever - but I believe the impact on storage - and possibly indexes & searching - would be huge.
On the personal greedy side, if you want to see a mass corporate migration to Dovecot, with potential service contracts - that would be a feature worth talking about. I can see IT managers' eyes light up at hearing about such an item - and I've never heard of any other mail server supporting such a thing.
IBM Lotus Domino has had that feature for ages (they call it shared mail). And they don't have it just for normal mail but for archives as well (called single instance store). This feature was first introduced in cc:Mail, then got integrated into Domino, and is still there and has even been extended to work with various backends (like the new DB2 backend). Microsoft copied that concept from them (from my viewpoint, the way MS did it in the past was horrible. I think newer versions work better, but I am not sure).
From my experience of doing messaging for two decades, I can tell you that it is not worth doing single instance store (or whatever you call it). Storage is ultra cheap these days and backup systems are so fast that all the benefits which were valid some years ago are gone today.
It might rock your geek heart to implement something like that, but doing the math on costs versus benefits will sooner or later show you that today it's not worth doing.
I have no experience with Domino, but I just did a Google for "lotus domino shared mail" and read the brief on lotus.com. Based on what I read, it has potential - but it only splits message headers from bodies and stores the bodies as complete images, without separating attachments. That helps reduce the load when somebody blasts out a flier to everyone in the company in a single message - but I'm asking for something more ambitious.
If every attachment in a given message is individually scanned to generate some unique identifier, and that identifier is then used to determine whether or not the attachment already exists in the database - this could have HUGE effects. This now addresses not just the simple broadcast - but some really crazy possibilities.
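A minimal sketch of that idea, assuming the "unique identifier" is a SHA-256 digest of the attachment bytes and using an in-memory dict as a stand-in for the database. The class and method names are hypothetical and are not part of Dovecot or dbox:

```python
import hashlib


class AttachmentStore:
    """Toy content-addressed store: one blob per unique attachment."""

    def __init__(self):
        self.blobs = {}     # digest -> attachment bytes
        self.refcount = {}  # digest -> number of messages referencing it

    def put(self, data: bytes) -> str:
        """Store the attachment only if its digest is new; return the digest."""
        key = hashlib.sha256(data).hexdigest()
        if key not in self.blobs:
            self.blobs[key] = data
        self.refcount[key] = self.refcount.get(key, 0) + 1
        return key

    def get(self, key: str) -> bytes:
        """Fetch the attachment bytes for a digest returned by put()."""
        return self.blobs[key]

    def stored_bytes(self) -> int:
        """Total bytes actually kept, counting each unique attachment once."""
        return sum(len(blob) for blob in self.blobs.values())


store = AttachmentStore()
brochure = b"product brochure" * 100
first = store.put(brochure)   # original delivery
second = store.put(brochure)  # the same attachment forwarded again
```

Both calls return the same key and only one copy is kept. The reference count is what a delayed delete would consult before removing a blob.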
User A receives a message with an attachment (like a product brochure), likes it, and forwards it to Users B-Z. User F recognizes that product, but has a counter-proposal, so he attaches another brochure and replies to A-Z. Being an idiot, he leaves the original attachment in the reply. User H forwards this message to a buddy at another company for discussion. [...time passes...] Three weeks later, User 101 at the other company gets back from vacation and has just received a message with the original brochure. He forwards it to User A (who started this mess). User A, being a total dimwit, doesn't recognize that he already spread this junk throughout the company last month - so he broadcasts it again.
Under the structure I've proposed, net storage consumed by the attachments should be one copy of attachment 1 and one copy of attachment 2, plus the headers and any comments in the messages times the number of recipients. Domino would store one copy of attachment 1, then a copy of attachment 1 + attachment 2, then another copy of attachment 1.
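To put rough numbers on that comparison, here is a small sketch that counts attachment bytes under both schemes. The brochure sizes, the comment text, and the function names are made up for illustration, and the body-level model (one stored copy per distinct message body) is only an assumption based on the description above, not a statement of how Domino actually works:

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def body_level_bytes(messages) -> int:
    """Keep each distinct body (comment + all attachments) once."""
    bodies = {}
    for comment, attachments in messages:
        payload = b"".join(attachments)
        bodies[digest(comment + payload)] = len(payload)
    return sum(bodies.values())


def attachment_level_bytes(messages) -> int:
    """Keep each distinct attachment once, whatever message carries it."""
    blobs = {}
    for _comment, attachments in messages:
        for data in attachments:
            blobs[digest(data)] = len(data)
    return sum(blobs.values())


brochure_1 = b"1" * 2000  # hypothetical sizes, in bytes
brochure_2 = b"2" * 3000

messages = [
    (b"FYI, nice product", [brochure_1]),                    # A forwards to B-Z
    (b"Counter-proposal attached", [brochure_1, brochure_2]),  # F replies to A-Z
    (b"Have you all seen this?", [brochure_1]),              # A broadcasts again
]

per_body = body_level_bytes(messages)
per_attachment = attachment_level_bytes(messages)
```

With these made-up sizes, the body-level scheme keeps 9000 bytes of attachment data and the attachment-level scheme keeps 5000. The gap grows with every further forward that carries an attachment already in the store.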
This is a minor example - but I just wanted to show SOMETHING to justify the effort.
As far as cheap storage goes - I agree costs are a fraction of what they once were. But by reducing the amount stored, consider the tradeoffs of reduced caching, smaller differential backups, and reduced archival costs (off-site storage costs are often calculated per GB), just to name a few. To me the only down side (other than requiring Timo to invest more blood, sweat, & tears in this project) is how much this costs in message READ time. For me, typical user interaction is reading. As I believe was previously mentioned, if the server implements some type of delayed delete function, then delete times are not a concern. And write times are also (I think) a minor concern. But the primary issue is how fast we can retrieve a message + attachments and stream it to the client. It seems to me header lists won't be impacted, so simply pointing the mail client at the server to see a list of mail shouldn't change at all. So then the question is the potential latency from when a user selects a message to when it appears on their screen. Will the time spent searching the disk, and assembling the message, be significant when compared with the network communication between server & client?
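For what the read path might look like, here is a rough sketch that assumes a message is stored as headers and text plus a list of attachment digests. All names are hypothetical, and the dict stands in for the on-disk store, so the timing it takes says nothing about real disk seeks; it only shows where the extra lookups and the assembly step would sit:

```python
import hashlib
import time

blobs = {}  # digest -> attachment bytes (stand-in for the on-disk store)


def store_message(headers: bytes, text: bytes, attachments):
    """Write path: keep headers and text inline, attachments by digest."""
    refs = []
    for data in attachments:
        key = hashlib.sha256(data).hexdigest()
        blobs.setdefault(key, data)
        refs.append(key)
    return {"headers": headers, "text": text, "refs": refs}


def read_message(record) -> bytes:
    """Read path: one lookup per attachment, then concatenate for streaming."""
    parts = [record["headers"], record["text"]]
    parts.extend(blobs[key] for key in record["refs"])
    return b"".join(parts)


record = store_message(
    b"Subject: brochure\r\n\r\n",
    b"See attached.\r\n",
    [b"x" * 100_000],
)

start = time.perf_counter()
message = read_message(record)
elapsed = time.perf_counter() - start  # compare against a network round trip
```

Listing headers never touches the attachment store in this layout, which matches the expectation that header lists are unaffected. The extra cost on read is one lookup per attachment, and that is the figure to measure on real storage.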
-- Daniel