[Dovecot] mbox vs. maildir storage block waste

Christoph Anton Mitterer calestyo at scientia.net
Mon Oct 29 22:54:09 EET 2012


Hi.

I recently mentioned in several posts that I tended to use mbox
rather than maildir, because you don't lose as much space (maildir
always allocates full blocks per file and thus per mail).
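
As a rough illustration of the per-file block rounding described above,
here is a minimal Python sketch. The mail sizes are made-up examples, not
measurements, and the model ignores inodes, directories and any other
filesystem meta-data:

```python
def allocated_size(size: int, block_size: int) -> int:
    """Bytes occupied when a file of `size` bytes is stored in whole blocks."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


def slack(size: int, block_size: int) -> int:
    """Bytes lost in the last, partially filled block of one file."""
    return allocated_size(size, block_size) - size


# Hypothetical mail sizes in bytes (illustration only).
sizes = [1500, 4096, 6000]

for block_size in (1024, 2048, 4096):
    wasted = sum(slack(s, block_size) for s in sizes)
    print(f"block size {block_size}: {wasted} B of slack for {len(sizes)} files")
```

With mbox, all mails of a folder share one file, so this rounding happens
once per folder instead of once per mail.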


I ran some tests on my archive, which consists of some 3.4 million
mails with a total size of 42GB. Most of these mails are probably
normal-sized, but there are also some with bigger attachments.


For those who are interested here are the results:


I used a 53687091200 B (50 GiB) image file (via loop device) and tested
ext4 only.
btrfs is IMHO not yet ready, I have often had issues with XFS
(corruptions), reiser4 is more or less dead and reiser3 is said to have
issues (see e.g. its Wikipedia article), even though it has that mode for
small files, which would fit nicely.

As you can see, the number of mails increased a bit because I tested over
several days, but the increase is very small, so it shouldn't change the
numbers much.



1) Original mbox archives (right now in Evolution)
mbox exact space: 38122676224 (does not include meta-data)
mbox guess space: 44625670144 (includes Evolution meta-data which is several GBs)
mbox num mails:   3412999 (occurrences of From_ lines)
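
Counting From_ lines can be sketched as follows. This is only an
assumption about how such a count could be done, not necessarily the
command used here. It counts every line that starts with "From ", so
unescaped "From " lines inside mail bodies would be counted too:

```python
def count_from_lines(lines) -> int:
    """Count mbox separator lines, i.e. lines starting with b'From '."""
    return sum(1 for line in lines if line.startswith(b"From "))


# Usage with a real file (the path is hypothetical):
#
#     with open("/path/to/archive.mbox", "rb") as fh:
#         print(count_from_lines(fh))
```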



In the following:
- image file, 1B-blocks, Used_begin, Used_end, Available_begin, Available_end
  are taken from the output of df -B 1
- mdir exact used space
  is the sum of du -B 1 for each regular file (i.e. each mdir file)
- mdir guess used space
  is du -B 1 on the root dir of the filesystem
- mdir num mails
  is find . -type f | wc -l on the root dir of the filesystem
- delta
  is mdir exact used space minus the mbox exact space from 1), given in
  units of 10^9 B
- delta / mail
  is delta divided by mdir num mails
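
For reference, roughly the same per-file numbers can be collected in one
pass with Python. This is a sketch, not the procedure used for the figures
below. It assumes Linux, where st_blocks is reported in 512 B units, and
unlike du it counts hard-linked files once per link:

```python
import os
import stat


def maildir_usage(root: str):
    """Return (regular files, allocated bytes, apparent bytes) below root."""
    num_files = allocated = apparent = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, etc.
            num_files += 1
            allocated += st.st_blocks * 512  # space actually allocated on disk
            apparent += st.st_size           # length of the file content
    return num_files, allocated, apparent
```

The difference between the allocated and the apparent total is the block
slack for the block size the filesystem was created with.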


2) ext4 with 4096 B blocks:
image file:		53687091200
1B-blocks:		52844687360
Used_begin:		  188555264
Used_end:		45198778368
Available_begin:	49971777536
Available_end:		 2444972032

mdir exact used space: 	44810866688
mdir guess used space: 	45010243584
mdir num mails:   	3423296

delta:			 6.688190464 G
delta / mail:		1953 B


3) ext4 with 2048 B blocks:
image file:		53687091200
1B-blocks:		50324295680
Used_begin:		   82857984
Used_end:		41598846976
Available_begin:	47557083136
Available_end:		 6041094144

mdir exact used space: 	41323991040
mdir guess used space: 	41516007424
mdir num mails:   	3425033

delta:			 3.201314816 G
delta / mail:		934 B


4) ext4 with 1024 B blocks:
image file:		53687091200
1B-blocks:		50314834944
Used_begin:		   38287360
Used_end:		39909360640
Available_begin:	47592193024
Available_end:		 7721119744

mdir exact used space: 	39683908608
mdir guess used space: 	39871086592
mdir num mails:   	3425033

delta:			 1.561232384 G
delta / mail:		455 B


As you can see, the delta per mail is rather close to the statistically
expected values of 2048B, 1024B and 512B (i.e. half a block per mail on
average).
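
The arithmetic behind the delta figures can be re-checked with a short
Python sketch. All numbers are copied from 1) to 4). The delta is taken as
mdir exact used space minus mbox exact space, so it also includes the
small number of mails that arrived between the runs:

```python
# mbox exact space from 1), in bytes.
MBOX_EXACT = 38122676224

# block size -> (mdir exact used space in bytes, mdir num mails)
RUNS = {
    4096: (44810866688, 3423296),
    2048: (41323991040, 3425033),
    1024: (39683908608, 3425033),
}


def overhead(block_size: int):
    """Return (delta in bytes, delta per mail in bytes, rounded down)."""
    mdir_exact, num_mails = RUNS[block_size]
    delta = mdir_exact - MBOX_EXACT
    return delta, delta // num_mails


for block_size in RUNS:
    delta, per_mail = overhead(block_size)
    print(f"{block_size} B blocks: delta = {delta} B, "
          f"per mail = {per_mail} B, half block = {block_size // 2} B")
```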



In the end I have probably changed my opinion.
~7GB of wasted block space for all my mails is actually quite a lot, but
in days of cheap disk space it's acceptable.
And mbox has IMHO the major disadvantage that mail servers
(including dovecot) store some meta-data _in_ it (i.e. in the mails
themselves), which I don't like much.
I am still thinking about the reports that mbox is much faster for
full-text search (which sounds reasonable)... but for that one probably
needs a database backend anyway.


HTH,
Chris.