Timo Sirainen put forth on 2/5/2010 6:36 PM:
> So, compressed mdboxes take 8.5% less space. This was with regular gzip compression with default level. With bzip2 -9 compression the difference was 10%.
> Any thoughts on if 8-10% is significant enough improvement to make seeking performance worse? Or perhaps I should just implement both ways.. :)
Given the cost of mechanical storage today (1TB for less than $100 USD) I can't see why most people would want to implement compression. The cases I can think of would be folks using strictly SSD (if there are any), those doing backups, or very large sites. Then again, I'm thinking most such backup solutions implement their own compression anyway, so it makes no difference in that case except possibly in the LAN/SAN bandwidth used to move compressed vs. uncompressed data.
I would think only really large sites would consider compression. 10% space savings for 1 million mailboxen might add up to some significant storage hardware dollar savings, not to mention the power savings. This is just a guess as I've never worked in such an environment. If a projected infrastructure build-out is calling for a $1 million back end clustered shared storage array for mailboxen (think NetApp, IBM, SGI), and this compression cuts your number of required spindles by 10%, that's potentially a $100,000 savings, assuming the array cost scales roughly with spindle count. In today's economy, folks would be seriously looking at keeping that $100,000 in their pocketbook.
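To put numbers on that guess, here is a rough back-of-the-envelope sketch in Python. The function name and the assumption that array cost scales linearly with the number of spindles are mine, purely for illustration; real array pricing (controllers, licenses, shelves) will not scale that cleanly.

```python
def storage_savings_usd(array_cost_usd, percent_space_saved):
    """Estimate hardware dollars saved by compression.

    Assumes (hypothetically) that cost is proportional to raw capacity,
    so saving N% of the space saves N% of the array cost.
    """
    return array_cost_usd * percent_space_saved / 100


# The $1 million array from the example above, at bzip2's ~10% and gzip's ~8.5%.
bzip2_savings = storage_savings_usd(1_000_000, 10)
gzip_savings = storage_savings_usd(1_000_000, 8.5)
print(f"bzip2 -9: ${bzip2_savings:,.0f}  gzip: ${gzip_savings:,.0f}")
```

On the same assumption, a site spending $10,000 on disks would save about $1,000, which is why I'd only expect the very large sites to care.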
Very large sites would probably want maximum compression while retaining maximum performance. You didn't state the CPU burn difference between the two methods, or the total CPU burn for either method. If one burns 50% CPU and the other 60%, on a loaded system, say 500 concurrent users, the relative difference is minor, but both are so horrible WRT CPU that no one would use them. If the load is 10% for the first method and 12% for the other, the relative difference is the same, but then I'd say some people would gladly adopt the 2nd, slightly more CPU-hungry method for the extra space savings.
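The point about relative vs. absolute CPU cost can be sketched like this. The load figures are the hypothetical ones from my paragraph above, not measurements, and the helper names are made up for illustration.

```python
def relative_increase(cpu_a_percent, cpu_b_percent):
    """Fractional increase in CPU load going from method A to method B."""
    return (cpu_b_percent - cpu_a_percent) / cpu_a_percent


def headroom(cpu_percent):
    """CPU percentage left over for everything else on the box."""
    return 100 - cpu_percent


# Both hypothetical scenarios show the same relative increase between
# methods, but very different amounts of CPU left for the rest of the system.
for a, b in [(50, 60), (10, 12)]:
    print(f"{a}% -> {b}%: relative increase {relative_increase(a, b):.0%}, "
          f"headroom with the heavier method {headroom(b)}%")
```

In both cases the second method costs the same fraction more CPU than the first; what decides whether either is usable is the absolute load, which is why the actual CPU numbers matter.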
-- Stan