[Dovecot] Architecture for large Dovecot cluster

Sven Hartge sven at svenhartge.de
Tue Jan 28 11:46:30 EET 2014


Joseph Tam <jtam.home at gmail.com> wrote:
> Sven Hartge <sven at svenhartge.de> wrote:

>> Interesting datapoint: NetApp deduplication recovered only about 1%
>> of storage space with mdbox-based mail storage, while on an
>> maildir-based mail storage, the rate was about 15%. (This was tested
>> with a copy of real user data, so is accurate for my workload.)

> Just a guess, but I expect the difference is because NetApp de-dupes
> by checksumming blocks and marks whole blocks as duplicates if they
> have the same checksum.

> The message body has the same block offset in maildir (i.e. the start
> of a message is at byte 0), whereas mdbox might align message body
> anywhere in a block, so you might have 512 different block
> configurations for the same message.

True, in maildir the start of the message is always at byte 0, but
because the header length differs per user for the same message
(different mail addresses with different lengths), the body will never
start at the same byte offset.
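
To illustrate the alignment effect, here is a minimal Python sketch of
fixed-block deduplication. It is not NetApp's actual algorithm: the
4 KiB block size, the SHA-256 checksum, the zero-padding of the last
block, and the example addresses are all assumptions made for the
demonstration.

```python
import hashlib

BLOCK = 4096  # assumed dedup block size, not taken from the thread


def block_hashes(data, block=BLOCK):
    """Return the set of checksums of each fixed-size block of data."""
    hashes = set()
    for start in range(0, len(data), block):
        # Zero-pad the final partial block so every block has equal size.
        chunk = data[start:start + block].ljust(block, b"\0")
        hashes.add(hashlib.sha256(chunk).hexdigest())
    return hashes


def shared_blocks(a, b):
    """Count blocks a block-level deduplicator could store only once."""
    return len(block_hashes(a) & block_hashes(b))


# Synthetic 32 KiB body with no repeating structure (hypothetical data).
body = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(1024))

# Hypothetical per-recipient headers: hdr_a and hdr_c have the same
# length, hdr_b is a few bytes longer.
hdr_a = b"To: al@example.org\r\n\r\n"
hdr_c = b"To: bo@example.org\r\n\r\n"
hdr_b = b"To: alexandra@example.org\r\n\r\n"

print("same header length:     ", shared_blocks(hdr_a + body, hdr_c + body))
print("different header length:", shared_blocks(hdr_a + body, hdr_b + body))
```

With equal-length headers only the first block differs and every later
block lines up. Once the header length differs by even a few bytes,
every block boundary cuts the body at a different place and no block
checksum matches, although the body itself is byte-for-byte identical.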

In the end, light compression (gzip level 3) via Dovecot resulted in
better space savings than compression and deduplication via NetApp.

The most space can obviously be saved via SIS (single instance storage)
of attachments in Dovecot, but to be frank, this feature scares me a bit.

Regards,
Sven.

-- 
Sigmentation fault. Core dumped.


