Greetings,
I noticed a while back someone posted a patch/plugin that allowed Dovecot to use compressed mbox files. I'm now wondering how far that would put us from having compressed maildir? I have a server with more CPU than disk space, and while I can buy more HDD space, my backup solution doesn't make that practical.
It seems to me that when looking for a message file, if it ends in .gz unpack it, and otherwise everything acts as normal. Worst case, this is one strcat() and a stat() slower to find.
Newly delivered messages could remain unpacked, and a cron job could come by whenever to compress old/large/un-looked-at-for-months messages. So, new and frequently referenced messages would be as fast as ever, and older messages would be slower.
I would love to dive in and do this myself, but 1- my time is very very limited, working two jobs, and 2- I'm not running 1.0 yet, as it apparently still doesn't support my Thunderbird users tagging their messages (am I wrong? please tell me I'm wrong... I want to upgrade! :)
-- Curtis Maloney cmaloney@cardgate.net
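The lookup described in this message (try the plain name first, then fall back to a ".gz" name) can be sketched in shell. This is only an illustration of the idea, not Dovecot code; the function name read_message is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print a message whether it is stored plain
# or as "<name>.gz" next to where the plain file would be.
read_message() {
    msg=$1
    if [ -f "$msg" ]; then
        cat "$msg"                 # plain file: the normal fast path
    elif [ -f "$msg.gz" ]; then
        gzip -dc < "$msg.gz"       # compressed copy: unpack to stdout
    else
        echo "read_message: $msg not found" >&2
        return 1
    fi
}
```

The extra cost for a compressed message is the one failed existence check on the plain name, which matches the "one strcat() and a stat()" estimate above.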
Curtis Maloney wrote:
Greetings,
I noticed a while back someone posted a patch/plugin that allowed Dovecot to use compressed mbox files. I'm now wondering how far that would put us from having compressed maildir? I have a server with more CPU than disk space, and while I can buy more HDD space, my backup solution doesn't make that practical.
It seems to me that when looking for a message file, if it ends in .gz unpack it, and otherwise everything acts as normal. Worst case, this is one strcat() and a stat() slower to find.
Newly delivered messages could remain unpacked, and a cron job could come by whenever to compress old/large/un-looked-at-for-months messages. So, new and frequently referenced messages would be as fast as ever, and older messages would be slower.
in this case it'd be better to use some kind of compressed filesystem: http://lists.q-linux.com/pipermail/plug/2002-September/020687.html
-- Levente "Si vis pacem para bellum!"
Farkas Levente wrote:
in this case it'd be better to use some kind of compressed filesystem: http://lists.q-linux.com/pipermail/plug/2002-September/020687.html
Which might be of some use, if I were running Linux. I'm not.
Next?
-- Curtis
On Wednesday 15 June 2005 09:24, Curtis Maloney wrote:
Farkas Levente wrote:
in this case it'd be better to use some kind of compressed filesystem: http://lists.q-linux.com/pipermail/plug/2002-September/020687.html
Which might be of some use, if I were running Linux. I'm not.
What are you using? If it's FreeBSD, you could use the md device.
http://www.freebsd.org/cgi/man.cgi?mdconfig
You can use this to create a vnode (file) backed memory disc, put your maildirs, and whatever else you like in it and it will be compressed.
Next?
-- Curtis
HTH,
Dominic GoodforBusiness.co.uk I.T. Services for SMEs in the UK.
On 166, 06 15, 2005 at 04:54:21PM +1000, Curtis Maloney wrote:
Greetings,
I noticed a while back someone posted a patch/plugin that allowed Dovecot to use compressed mbox files. I'm now wondering how far that would put us from having compressed maildir?
Been here, done that :) You can try the attached patch, it's a little ugly, but works on the production server for more than 2 months.
I have a server with more CPU than disk space, and while I can buy more HDD space, my backup solution doesn't make that practical.
It seems to me that when looking for a message file, if it ends in .gz unpack it, and otherwise everything acts as normal. Worst case, this is one strcat() and a stat() slower to find.
Newly delivered messages could remain unpacked, and a cron job could come by whenever to compress old/large/un-looked-at-for-months messages. So, new and frequently referenced messages would be as fast as ever, and older messages would be slower.
I would love to dive in and do this myself, but 1- my time is very very limited, working two jobs, and 2- I'm not running 1.0 yet, as it apparently still doesn't support my Thunderbird users tagging their messages (am I wrong? please tell me I'm wrong... I want to upgrade! :)
-- Andrey Panin | Linux and UNIX system administrator pazke@donpac.ru | PGP key: wwwkeys.pgp.net
Andrey Panin wrote:
On 166, 06 15, 2005 at 04:54:21PM +1000, Curtis Maloney wrote:
Greetings,
I noticed a while back someone posted a patch/plugin that allowed Dovecot to use compressed mbox files. I'm now wondering how far that would put us from having compressed maildir?
Been here, done that :) You can try the attached patch, it's a little ugly, but works on the production server for more than 2 months.
Thanks, Andrey.
You're the only person who hasn't taken this as an opportunity to tell me why I'm wrong or give me a quick lesson in admining. I wanted to float the idea, not have people try to solve my space issues.
Perhaps I should have phrased it as the space constraints inspired the idea... oh well.
Now I just need either to back-port this to 0.99.14, or for Timo to add keywords to 1.0. (keywords is responsible for .customflags, isn't it? nobody wants to confirm this for me ):
-- Curtis Maloney cmaloney@cardgate.net
Tomi Hakala wrote:
Curtis Maloney wrote:
Now I just need either to back-port this to 0.99.14, or for Timo to add keywords to 1.0.
But keywords are supported in test73.
Good point.
test73 sounds fairly stable (even if it's only a few days old)... but is there a way to migrate my existing keywords to 1.0? I'm not bringing down that wrath upon myself :)
-- Curtis Maloney
hi,
i think you waste more space with not fully used clusters and stuff. e.g. i have:
$ du -sh Maildir ; du -sh --apparent-size Maildir
134M Maildir
123M Maildir
this is not even my full mail archive. with gzipped maildir files this should get even worse as the files are even smaller and still block the cluster.
darix
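The point about partly used clusters comes down to rounding every file up to a whole number of blocks. A minimal sketch of that arithmetic follows; the message sizes are made-up illustrations, the function name on_disk is hypothetical, and directory/inode overhead is ignored.

```shell
#!/bin/sh
# on_disk SIZE BLOCK: bytes a file of SIZE bytes occupies when space is
# handed out in whole blocks of BLOCK bytes (metadata and fragments ignored).
on_disk() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block * block ))
}

on_disk 5000 4096   # uncompressed message: prints 8192
on_disk 4200 4096   # compressed, but still needs two blocks: prints 8192
on_disk 3000 4096   # compressed enough to fit one block: prints 4096
```

Compression only frees disk space when it pushes a message across a block boundary, so small messages gain little or nothing, while large attachments can save many blocks.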
Marcus Rueckert wrote:
hi,
i think you waste more space with not fully used clusters and stuff. e.g. i have: $ du -sh Maildir ; du -sh --apparent-size Maildir
I tried this... Solaris du doesn't appear to have --apparent-size..
I see your point, but my users have an unfortunate tendency to keep e-mails with attachments (no matter how big a stick I wield), so there would be a definite saving there.
Since several of my users have over 1GB of mail ('top' user is now around 1.6GB), I think there is a potential to save considerable space.
Besides... when was the last time you saw an e-mail below 512 bytes? There is _some_ saving to be made, if not a lot. But a little saving over a few million e-mails? Well, it all adds up.
-- Curtis Maloney cmaloney@cardgate.net
Hi Curtis,
I tried this... Solaris du doesn't appear to have --apparent-size..
FreeBSD also :-(
I see your point, but my users have an unfortunate tendency to keep e-mails with attachments (no matter how big a stick I wield), so there would be a definite saving there.
Since several of my users have over 1GB of mail ('top' user is now around 1.6GB), I think there is a potential to save considerable space.
Besides... when was the last time you saw an e-mail below 512 bytes? There is _some_ saving to be made, if not a lot. But a little saving over a few million e-mails? Well, it all adds up.
Well, the "compressed" maildir system could add a check to see if the file is larger than the FS block size... and only then try to compress it.
Anyway this can be a very good system for archiving purposes...
And IMHO I think it should be a good idea to add/merge the patch into main stream code (with a flag to allow compressed maildir)...
/Xavier
-- Curtis Maloney cmaloney@cardgate.net
Xavier Beaudouin wrote:
Well, the "compressed" maildir system could add a check to see if the file is larger than the FS block size... and only then try to compress it.
In my original post I suggested a cron job compress the files, so it's easy to vary which files get compressed, and when. Something like (untested, and pre-coffee):
find $MAILPATH ! -name "*.gz" -type f -atime +90 -size +1 -exec /usr/bin/gzip -9 {} ";"
And IMHO I think it should be a good idea to add/merge the patch into main stream code (with a flag to allow compressed maildir)...
I certainly wouldn't object, and it sounds like Andrey and others like the idea, too.
-- Curtis Maloney cmaloney@cardgate.net
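Xavier's block-size check can be folded into the same kind of find invocation. The sketch below only lists candidates and compresses nothing; the function name list_candidates and its arguments are hypothetical. POSIX find counts -size in 512-byte units, so the block size is converted first.

```shell
#!/bin/sh
# list_candidates DIR BLOCK_BYTES [DAYS]
# Print regular files under DIR that are larger than BLOCK_BYTES and not
# already named *.gz; if DAYS is given, also require an atime older than DAYS.
list_candidates() {
    dir=$1
    blocks=$(( $2 / 512 ))      # find -size counts 512-byte units
    if [ -n "${3:-}" ]; then
        find "$dir" -type f ! -name '*.gz' -size +"$blocks" -atime +"$3" -print
    else
        find "$dir" -type f ! -name '*.gz' -size +"$blocks" -print
    fi
}
```

For example, `list_candidates "$MAILPATH" 4096 90` would list messages over 4096 bytes that have not been read for more than 90 days, which can be reviewed before any compression is run.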
On 17.6.2005, at 02:50, Curtis Maloney wrote:
Xavier Beaudouin wrote:
Well, the "compressed" maildir system could add a check to see if the file is larger than the FS block size... and only then try to compress it.
In my original post I suggested a cron job compress the files, so it's easy to vary which files get compressed, and when. Something like (untested, and pre-coffee):
find $MAILPATH ! -name "*.gz" -type f -atime +90 -size +1 -exec /usr/bin/gzip -9 {} ";"
The filename shouldn't change or Dovecot treats it as a new maildir file. The safest way would be something like:
touch dovecot-uidlist.lock
gzip -9c cur/$name > tmp/$name
mv tmp/$name cur/$name
rm dovecot-uidlist.lock
The uidlist lock is needed so that files aren't renamed while compressing. Otherwise you could end up with two files with the same basename, one compressed and one uncompressed. That won't work very nicely.
And IMHO I think it should be a good idea to add/merge the patch into main stream code (with a flag to allow compressed maildir)...
I certainly wouldn't object, and it sounds like Andrey and others like the idea, too.
It's getting a bit difficult to decide what are plugins and what are not :)
Plugins wouldn't necessarily have to be even dynamically loadable. It could be possible to just have some plugins/builtin/ directory and Dovecot would compile everything in it inside the binary.
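The lock, compress, rename sequence described earlier in this message can be sketched as a small script. This is an untested-in-production illustration, not part of any patch: the function names are hypothetical, gzip is given -c so it writes to stdout instead of replacing the file, and the lock is created with noclobber (an assumption; the steps above simply use touch) so that an existing lock is noticed. Because the filename does not change, already compressed messages are detected by the gzip magic bytes rather than by a .gz suffix.

```shell
#!/bin/sh
# is_gzipped FILE: true if FILE starts with the gzip magic bytes 1f 8b.
is_gzipped() {
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "1f8b" ]
}

# compress_in_place MAILDIR NAME: gzip cur/NAME without changing its name.
compress_in_place() {
    maildir=$1
    name=$2
    lock=$maildir/dovecot-uidlist.lock

    is_gzipped "$maildir/cur/$name" && return 0   # already compressed

    # Assumption: noclobber makes lock creation fail if the lock exists.
    ( set -C; : > "$lock" ) 2>/dev/null || return 1

    if gzip -9c "$maildir/cur/$name" > "$maildir/tmp/$name"; then
        mv "$maildir/tmp/$name" "$maildir/cur/$name"
        status=$?
    else
        rm -f "$maildir/tmp/$name"                # do not leave partial output
        status=1
    fi
    rm -f "$lock"
    return $status
}
```

Writing to tmp/ and renaming into cur/ means a reader never sees a half-written file. Reading such a maildir still requires a server that can decompress messages, as with the patch posted in this thread.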
Curtis Maloney wrote:
would put us from having compressed maildir? I have a server with more CPU than disk space, and while I can buy more HDD space, my backup solution doesn't make that practical.
wouldn't it be better if you compress only the backups? In fact most backup solutions already do that automatically, so if you use such a compressed maildir plugin, you will gain hdd space but you won't gain any backup space, since compressing twice usually doesn't improve the compression ratio.
Matthieu
matthieu imbert wrote:
wouldn't it be better if you compress only the backups? In fact most backup solutions already do that automatically, so if you use such a compressed maildir plugin, you will gain hdd space but you won't gain any backup space, since compressing twice usually doesn't improve the compression ratio.
You're assuming I don't already. I do. I want to save space in the live system.
-- Curtis Maloney cmaloney@cardgate.net
if space is an issue then mbox is a better choice than maildir... a lot less space wasted on directory/inode metadata and less space wasted on the tails of every message. if you've got a 4096 byte block size filesystem without fragments then you're wasting 2048 bytes per message on average.
depending on the avg message size in your maildirs you might not even save much at all due to the tail problem.
-dean
On Wed, 15 Jun 2005, Curtis Maloney wrote:
Greetings,
I noticed a while back someone posted a patch/plugin that allowed Dovecot to use compressed mbox files. I'm now wondering how far that would put us from having compressed maildir? I have a server with more CPU than disk space, and while I can buy more HDD space, my backup solution doesn't make that practical.
It seems to me that when looking for a message file, if it ends in .gz unpack it, and otherwise everything acts as normal. Worst case, this is one strcat() and a stat() slower to find.
Newly delivered messages could remain unpacked, and a cron job could come by whenever to compress old/large/un-looked-at-for-months messages. So, new and frequently referenced messages would be as fast as ever, and older messages would be slower.
I would love to dive in and do this myself, but 1- my time is very very limited, working two jobs, and 2- I'm not running 1.0 yet, as it apparently still doesn't support my Thunderbird users tagging their messages (am I wrong? please tell me I'm wrong... I want to upgrade! :)
-- Curtis Maloney cmaloney@cardgate.net
participants (10)
- Andrey Panin
- Curtis Maloney
- dean gaudet
- Dominic Marks
- Farkas Levente
- Marcus Rueckert
- matthieu imbert
- Timo Sirainen
- Tomi Hakala
- Xavier Beaudouin