[Dovecot] Scalability plans: Abstract out filesystem and make it someone else's problem

Ed W lists at wildgooses.com
Wed Aug 12 22:03:57 EEST 2009


Daniel L. Miller wrote:
> Timo Sirainen wrote:
>>> Also the mime structure could be torn apart to store attachments 
>>> individually - the motivation being single instance storage of large 
>>> attachments with identical content...  Anyway, these seem like very 
>>> speculative directions...
>>>     
>>
>> Yes, this is also something in dbox's far future plans.
>>   
> Speaking as a pathetic little admin of a small site of 20 users, my 
> needs for replication & scalability are quite minor.  However, 
> single-instance storage would be a miracle of biblical proportions.  
> Has any progress been made on this?  Do you have a roadmap for how you 
> plan on implementing it?
>
> I don't know if you've considered this at all - this was my first 
> thought:
>
> If you're able to store a message with the attachments separately, 
> then you can come up with an attachment database (not meaning to imply 
> SQL backend).  Then after breaking the message up into message + 
> attachments, you scan the attachment database to see if it is already 
> present prior to saving it.  This could mean that not only could we 
> save on the huge space wasted by idiots merrily forwarding large 
> attachments to multiple people, but even received mails with embedded 
> graphical signatures would benefit.

It would be interesting to quickly script something in Perl (see one of 
the MIME parsers) to simply scan every email on your system, take an MD5 
of each MIME part, then stick it in a dictionary (with the size) and 
count the entries hit more than once (i.e. duplicate parts).  Count 
the bytes saved and share the script so we can all have a play.
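
A minimal sketch of that scan, written in Python rather than Perl. It 
assumes one message per file (Maildir-style); mbox or dbox storage would 
need a different reader. The function names and the "dupscan.py" script 
name are made up for illustration. Sizes are counted on the decoded part 
content, not the on-disk encoded bytes, so treat the result as a rough 
estimate rather than an exact saving.

```python
import email
import hashlib
import os
import sys


def scan_message(raw_bytes, seen):
    """Hash every non-empty leaf MIME part of one message into `seen`.

    `seen` maps an MD5 hex digest to a (size, count) tuple.
    """
    msg = email.message_from_bytes(raw_bytes)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        # decode=True undoes base64 / quoted-printable, so the same
        # attachment matches even if it was wrapped differently.
        payload = part.get_payload(decode=True) or b""
        if not payload:
            continue  # skip empty parts, they would all collide
        digest = hashlib.md5(payload).hexdigest()
        size, count = seen.get(digest, (len(payload), 0))
        seen[digest] = (size, count + 1)


def summarize(seen):
    """Return (distinct parts seen more than once, bytes saved)."""
    dup_parts = sum(1 for _size, count in seen.values() if count > 1)
    # Single-instance storage keeps one copy, so each extra copy is saved.
    saved = sum(size * (count - 1) for size, count in seen.values())
    return dup_parts, saved


def scan_tree(root):
    """Walk `root`, treating every regular file as one message."""
    seen = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            with open(os.path.join(dirpath, name), "rb") as fh:
                scan_message(fh.read(), seen)
    return seen


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: dupscan.py MAILDIR_ROOT")
    else:
        dup_parts, saved = summarize(scan_tree(sys.argv[1]))
        print("duplicate parts:", dup_parts)
        print("bytes saved (decoded):", saved)
```

Pointing it at a Maildir root also picks up index and control files, 
which simply get parsed as headerless messages and hashed as one part 
each; restricting the walk to the cur/ and new/ directories would give 
cleaner numbers.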

I do like the idea of single instance storage, but I'm actually willing 
to bet it makes only a few percent difference in storage cost for the 
majority of mail servers (I dare say your mileage will vary, but my 
point is that we should benchmark it).

I don't mean this as a negative; it's more that I nearly scripted this a 
couple of months back for my own needs and then ran out of time.  I 
think it won't be more than 50 lines of Perl, and it would be interesting 
to see how people's numbers vary.

Ed W

