Daniel L. Miller wrote:
Timo Sirainen wrote:
Also the MIME structure could be torn apart to store attachments individually - the motivation being single instance storage of large attachments with identical content... Anyway, these seem like very speculative directions...
Yes, this is also something in dbox's far future plans.
Speaking as a pathetic little admin of a small site of 20 users, my needs for replication & scalability are quite minor. However, single-instance storage would be a miracle of biblical proportions.
Has any progress been made on this? Do you have a roadmap for how you plan on implementing it? I don't know if you've considered this at all - this was my first thought:
If you're able to store a message with the attachments separately, then you can come up with an attachment database (not meaning to imply SQL backend). Then after breaking the message up into message + attachments, you scan the attachment database to see if it is already present prior to saving it. This could mean that not only could we save on the huge space wasted by idiots merrily forwarding large attachments to multiple people, but even received mails with embedded graphical signatures would benefit.
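A rough sketch of that "attachment database" idea, purely for illustration and not how dbox does or will do it: store each attachment under a hash of its content, so a second copy of the same bytes costs nothing extra. The class name AttachmentStore, its methods, and the choice of SHA-256 are all assumptions made up for this example. It keeps everything in memory, whereas a real store would live on disk.

```python
import hashlib


class AttachmentStore:
    """Hypothetical content-addressed store: identical bytes are kept once."""

    def __init__(self):
        self._blobs = {}  # digest -> attachment bytes
        self._refs = {}   # digest -> number of messages referencing it

    def put(self, data: bytes) -> str:
        """Store an attachment and return the key the message should keep."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self._blobs:
            # First time we see this content: actually store it.
            self._blobs[digest] = data
        # Either way, record one more reference to it.
        self._refs[digest] = self._refs.get(digest, 0) + 1
        return digest

    def get(self, digest: str) -> bytes:
        """Fetch the attachment bytes back when reassembling a message."""
        return self._blobs[digest]

    def stored_bytes(self) -> int:
        """Total bytes actually held, counting each distinct blob once."""
        return sum(len(blob) for blob in self._blobs.values())
```

The message itself would then carry only the returned key in place of the attachment body, and the reference count tells you when a blob can finally be deleted.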
It would be interesting to quickly script something in Perl (see one of the MIME parsers) to simply scan every email on your system, do an MD5 of each MIME part, then stick this in a dictionary (with the size) and count the number of hits greater than one (i.e. duplicate parts). Count the bytes saved and share the script so we can all have a play.
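For what it's worth, here is a minimal sketch of that scan, in Python rather than Perl and using only the standard library's email and hashlib modules. The function names part_digests and duplicate_stats are made up for this example, and it assumes you hand it each message as raw bytes. It hashes the decoded payload of every non-multipart part, so the same file encoded two different ways still counts as a duplicate.

```python
import email
import hashlib


def part_digests(raw_message: bytes):
    """Yield (md5 hex digest, size in bytes) for each leaf MIME part."""
    msg = email.message_from_bytes(raw_message)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        payload = part.get_payload(decode=True) or b""
        yield hashlib.md5(payload).hexdigest(), len(payload)


def duplicate_stats(raw_messages):
    """Return (duplicate part count, bytes a single-instance store would save)."""
    seen = {}  # digest -> [size, times seen]
    for raw in raw_messages:
        for digest, size in part_digests(raw):
            entry = seen.setdefault(digest, [size, 0])
            entry[1] += 1
    # Every occurrence beyond the first is a duplicate that need not be stored.
    duplicates = sum(count - 1 for _, count in seen.values())
    bytes_saved = sum(size * (count - 1) for size, count in seen.values())
    return duplicates, bytes_saved
```

To try it on a real mail store you would feed duplicate_stats the bytes of each message file, for example by reading every file under a Maildir's cur and new directories. Note the saving is measured on decoded bytes, so the on-disk saving for base64 attachments would be roughly a third larger.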
I do like the idea of single instance storage, but I'm actually willing to bet it makes only a few percent difference in storage cost for the majority of mail servers (I dare say your mileage will vary, but my point is that we should benchmark it).
I don't mean this as a negative, but more that I nearly scripted this a couple of months back for my own needs and then ran out of time. I think it won't be more than 50 lines of Perl, and it would be interesting to see how people's numbers vary.
Ed W