Re: [Dovecot] Architecture for large Dovecot cluster
Sven Hartge <sven@svenhartge.de> wrote:
Interesting data point: NetApp deduplication recovered only about 1% of storage space with mdbox-based mail storage, while with maildir-based mail storage the rate was about 15%. (This was tested with a copy of real user data, so it is accurate for my workload.)
Just a guess, but I expect the difference arises because NetApp de-dupes by checksumming blocks and marks whole blocks as duplicates if they have the same checksum.
The message body has the same block offset in maildir (i.e. the start of a message is at byte 0), whereas mdbox might place the message body anywhere in a block, so you might have 512 different block configurations for the same message.
I don't know whether message alignment would be a worthwhile optimization for mdbox.
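To illustrate the guess above, here is a minimal sketch of fixed-block checksum deduplication. It is not NetApp's actual implementation: the 4096-byte block size, the SHA-256 checksum, and the helper names `fake_body`, `block_hashes` and `shared_blocks` are all assumptions made for the example.

```python
import hashlib

BLOCK = 4096  # assumed dedupe block size in bytes; not stated in the thread


def fake_body(n_blocks):
    """Deterministic pseudo-random bytes standing in for one message body."""
    chunks = (hashlib.sha256(str(i).encode()).digest()
              for i in range(n_blocks * BLOCK // 32))
    return b"".join(chunks)


def block_hashes(data):
    """Checksum every full fixed-size block, as a block-level deduper might."""
    return {hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data) - BLOCK + 1, BLOCK)}


def shared_blocks(a, b):
    """Count blocks of identical content that both byte strings contain."""
    return len(block_hashes(a) & block_hashes(b))


body = fake_body(8)

print(shared_blocks(body, body))                 # same offset in both copies
print(shared_blocks(body, b"x" * 137 + body))    # copy shifted by 137 bytes
print(shared_blocks(body, b"x" * BLOCK + body))  # copy shifted by one block
```

In this sketch all 8 blocks match when both copies start at the same offset or are shifted by a whole block, and none match when one copy is shifted by 137 bytes, which is the effect an unaligned container format would have on block-level deduplication.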
Joseph Tam <jtam.home@gmail.com>
Joseph Tam <jtam.home@gmail.com> wrote:
Sven Hartge <sven@svenhartge.de> wrote:
Interesting data point: NetApp deduplication recovered only about 1% of storage space with mdbox-based mail storage, while with maildir-based mail storage the rate was about 15%. (This was tested with a copy of real user data, so it is accurate for my workload.)
Just a guess, but I expect the difference arises because NetApp de-dupes by checksumming blocks and marks whole blocks as duplicates if they have the same checksum.
The message body has the same block offset in maildir (i.e. the start of a message is at byte 0), whereas mdbox might place the message body anywhere in a block, so you might have 512 different block configurations for the same message.
True, the start of the message is always at byte 0, but because the header length of the same message differs per user (different mail addresses with different lengths), the body will never start at the same byte.
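The header-length effect can be sketched the same way. This is only an illustration under assumptions: a 4096-byte dedupe block, a single `Delivered-To` header per copy, hypothetical `example.org` recipients, and a made-up helper `maildir_file`.

```python
import hashlib

BLOCK = 4096  # assumed dedupe block size in bytes


def block_hashes(data):
    """Checksum every full fixed-size block of one file."""
    return {hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data) - BLOCK + 1, BLOCK)}


def maildir_file(recipient, body):
    """One maildir file: a per-recipient header, then the shared body."""
    header = f"Delivered-To: {recipient}\r\n\r\n".encode()
    return header + body


# Deterministic filler standing in for a large shared body (8 blocks).
body = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(1024))

alice = maildir_file("alice@example.org", body)
carol = maildir_file("carol@example.org", body)  # same address length
bob = maildir_file("bob@example.org", body)      # two bytes shorter

same_len = len(block_hashes(alice) & block_hashes(carol))
diff_len = len(block_hashes(alice) & block_hashes(bob))
print(same_len, diff_len)
```

This prints `7 0`: two copies whose headers happen to have the same length share every block except the first, while a two-byte difference in header length leaves no block in common.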
In the end, light compression (gzip level 3) via Dovecot resulted in better space savings than compression and deduplication via NetApp.
The most space can obviously be saved via SIS (single-instance storage) of attachments in Dovecot, but to be frank, this feature scares me a bit.
Regards, Sven.
-- Sigmentation fault. Core dumped.