[Dovecot] DoveCot compression to save HDD space
Hello,
I am wondering whether there is any way to save HDD space used by emails by compressing their content (not just attachments).
It would be many thousands of mailboxes, each probably using a lot of space, so it would be easier if we could just compress the emails somehow.
Thanks, Cristian
I am wondering whether there is any way to save HDD space used by emails by compressing their content (not just attachments).
I'd *much* rather see work done on 'single-instance-storage' support, ala cyrus... that would do far more, in most setups, than compressing files - without adding any overhead, which compression definitely would do.
Please, please, please, Timo?
:)
--
Best regards,
Charles
On Wed, 14 Feb 2007, Charles Marcus wrote:
I am wondering whether there is any way to save HDD space used by emails by compressing their content (not just attachments).
I'd *much* rather see work done on 'single-instance-storage' support, ala cyrus... that would do far more, in most setups, than compressing files - without adding any overhead, which compression definitely would do.
Please, please, please, Timo?
Oh, I agree... that would make Dovecot absolutely perfect :) I would gladly switch to Dovecot in one location where we often run into trouble with Cyrus, but because of lots of large emails with attachments sent together to hundreds of recipients we can't change it to anything else...
Greetz,
Jacek Osiecki joshua@ceti.pl GG:3828944 "It's not logic, it's politics" (c) Kabaret pod Wydrwigroszem 2006
On Wed, 2007-02-14 at 10:35 -0500, Charles Marcus wrote:
I am wondering whether there is any way to save HDD space used by emails by compressing their content (not just attachments).
I'd *much* rather see work done on 'single-instance-storage' support, ala cyrus... that would do far more, in most setups, than compressing files - without adding any overhead, which compression definitely would do.
It's been somewhat planned for dbox format. No idea when though.
With maildir I suppose it's possible with hardlinks, but that's a bit kludgy.
I'd *much* rather see work done on 'single-instance-storage' support, ala cyrus... that would do far more, in most setups, than compressing files - without adding any overhead, which compression definitely would do.
It's been somewhat planned for dbox format. No idea when though.
I'll take 'planned' over 'no' any day... :)
With maildir I suppose it's possible with hardlinks, but that's a bit kludgy.
Do you know if that is how cyrus does it?
--
Best regards,
Charles
On Wed, 2007-02-14 at 11:42 -0500, Charles Marcus wrote:
With maildir I suppose it's possible with hardlinks, but that's a bit kludgy.
Do you know if that is how cyrus does it?
I think Cyrus also uses hardlinks. Well, the only problem is what to do when the destination users have different UIDs. Deliver would pretty much have to deliver the mail as root, or use some kludgy approaches. In any case the file would then probably have to be owned by root (or another specified user) and be read-only to the destination users.
I suppose it wouldn't be _too_ difficult to implement this though. Something like:
1. save the file to some temporary file in a directory where only deliver has access
2. change effective UID to destination user (keep the deliver group effective) and hardlink the file to the user's maildir
3. change effective UID back to root
4. process next user, goto 2
5. Unlink the temporary file
This would however mean that there wouldn't be any Delivered-To header since all the mails will be identical.
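For illustration only, the delivery steps above can be sketched in Python. This is not how deliver is implemented; the function name `deliver_single_instance`, the spool directory and the reuse of the temporary name as the maildir filename are all assumptions made for the sketch. The UID switching is left as a comment, since it needs root, and all paths must be on the same filesystem for hardlinks to work.

```python
import os
import tempfile


def deliver_single_instance(message: bytes, maildirs: list[str], spool_dir: str) -> list[str]:
    """Write the message once, then hardlink it into each maildir's new/ directory."""
    # Step 1: save the message to a temporary file in a private spool directory.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(message)
    os.chmod(tmp_path, 0o444)  # read-only: the shared copy should never be modified

    delivered = []
    try:
        for maildir in maildirs:
            # Steps 2-4: a real deliver would switch its effective UID here
            # (os.seteuid) before linking, and switch back to root afterwards.
            new_dir = os.path.join(maildir, "new")
            os.makedirs(new_dir, exist_ok=True)
            # Simplification: a real maildir filename would be generated properly.
            target = os.path.join(new_dir, os.path.basename(tmp_path))
            os.link(tmp_path, target)
            delivered.append(target)
    finally:
        # Step 5: drop the temporary name; the data lives on through the links.
        os.unlink(tmp_path)
    return delivered
```

Every path returned by the function refers to the same inode, so the message body is stored once no matter how many recipients there are.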
Timo Sirainen wrote:
[snip]
Deliver would pretty much have to deliver the mail as root, or use some kludgy approaches.
Why not get Dovecot to do the delivery, since Dovecot already has rw access to all users' files? The LDA would merely connect to Dovecot and pass the email to it.
It might even be possible to get the MTA to connect directly to dovecot instead of having a separate LDA program.
Not that I know enough to be aware of the implications of what I've just suggested ;-)
Dick
On Wed, 2007-02-14 at 17:07 +0000, Dick Middleton wrote:
Why not get Dovecot to do the delivery, since Dovecot already has rw access to all users' files? The LDA would merely connect to Dovecot and pass the email to it.
No, Dovecot doesn't have access to all users' files. Only root has access to all files. Dovecot drops from root to a specific user before accessing the files. Of course a lot of installations use a single UID for all the users. In that case there isn't a problem, but since that's the less secure way to run Dovecot, I don't want to encourage it by offering features that work only that way.
Timo Sirainen wrote:
On Wed, 2007-02-14 at 17:07 +0000, Dick Middleton wrote:
Why not get dovecot to do the delivery; since dovecot already has rw access to all users' files?
No, Dovecot doesn't have access to all users' files. Only root has access to all files. Dovecot drops from root to a specific user before accessing the files.
Oops, yes; silly me :-)
Dick
Timo Sirainen wrote the following on 2/14/2007 8:59 AM -0800:
On Wed, 2007-02-14 at 11:42 -0500, Charles Marcus wrote:
With maildir I suppose it's possible with hardlinks, but that's a bit kludgy.
Do you know if that is how cyrus does it?
I think Cyrus also uses hardlinks. Well, the only problem is what to do when the destination users have different UIDs. Deliver would pretty much have to deliver the mail as root, or use some kludgy approaches. In any case the file would then probably have to be owned by root (or another specified user) and be read-only to the destination users.
I suppose it wouldn't be _too_ difficult to implement this though. Something like:
1. save the file to some temporary file in a directory where only deliver has access
2. change effective UID to destination user (keep the deliver group effective) and hardlink the file to the user's maildir
3. change effective UID back to root
4. process next user, goto 2
5. Unlink the temporary file
This would however mean that there wouldn't be any Delivered-To header since all the mails will be identical.
How and when would these shared storage file attachments ever get deleted? If and when the last person that received the e-mail with the shared attachment deletes the e-mail message? Just curious how you would keep the shared storage from retaining files forever.
Bill
Bill Landry wrote:
1. save the file to some temporary file in a directory where only deliver has access
2. change effective UID to destination user (keep the deliver group effective) and hardlink the file to the user's maildir
3. change effective UID back to root
4. process next user, goto 2
5. Unlink the temporary file
Permissions are stored in the inode (essentially, with the file itself), and not in the directory entry. As such, each hard link to the file would share the exact same permissions, so the above algorithm doesn't work to give each user their own permissions on the same underlying storage file.
If you use symlinks, that would work with the above algorithm.
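The point about permissions living in the inode is easy to check. The following is a small sketch, assuming a POSIX filesystem; the file names are made up. It changes the mode through one name and reads it back through the other.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "msg")
    link = os.path.join(d, "msg.link")
    with open(original, "w") as f:
        f.write("shared message\n")
    os.link(original, link)  # second directory entry for the same inode

    # Change the mode through one name...
    os.chmod(original, 0o640)
    # ...and observe it through the other name.
    mode_via_link = stat.S_IMODE(os.stat(link).st_mode)
    same_inode = os.stat(original).st_ino == os.stat(link).st_ino
    print(oct(mode_via_link), same_inode)
```

Both names report the same mode and owner because there is only one inode behind them, which is why per-user permissions on a hardlinked message are not possible.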
On Wed, 2007-02-14 at 10:26 -0700, Stephen Warren wrote:
Bill Landry wrote:
1. save the file to some temporary file in a directory where only deliver has access
2. change effective UID to destination user (keep the deliver group effective) and hardlink the file to the user's maildir
3. change effective UID back to root
4. process next user, goto 2
5. Unlink the temporary file
Permissions are stored in the inode (essentially, with the file itself), and not in the directory entry. As such, each hard link to the file would share the exact same permissions, so the above algorithm doesn't work to give each user their own permissions on the same underlying storage file.
That's why I mentioned that the file would have to be owned by root or another special user. The file would need some world-readable permissions, which would be enough for IMAP access (there's no need to modify the file). Also the world-readability shouldn't be a problem since the maildir directory itself should be restricted to only the one user.
On Wed, 2007-02-14 at 09:14 -0800, Bill Landry wrote:
How and when would these shared storage file attachments ever get deleted? If and when the last person that received the e-mail with the shared attachment deletes the e-mail message?
Yep. That's how the hardlinks work. Users' quota would be updated whenever the user himself unlinks the mail.
This could be problematic with filesystem quota though, since no user actually owns the file so the quota isn't tracked for anyone (except possibly for that dummy share user). That could even be abused by Ccing a huge mail to yourself and some other user. No-one's quota gets increased..
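As a rough illustration of that deletion behaviour (a sketch with made-up file names, assuming a POSIX filesystem): the kernel keeps a link count per file, and the data is only freed when the last name is unlinked.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    alice = os.path.join(d, "alice_msg")
    bob = os.path.join(d, "bob_msg")
    with open(alice, "w") as f:
        f.write("big shared mail\n")
    os.link(alice, bob)
    links_before = os.stat(bob).st_nlink  # two names, one file

    os.unlink(alice)  # one recipient deletes the mail
    links_after = os.stat(bob).st_nlink  # one name left, data still on disk
    with open(bob) as f:
        still_readable = f.read()
```

Nothing has to track the sharing explicitly; the space is reclaimed by the filesystem once the last recipient's link is gone.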
How and when would these shared storage file attachments ever get deleted? If and when the last person that received the e-mail with the shared attachment deletes the e-mail message?
That is what would make the most sense to me... but you are right, there is potential for problems there...
--
Best regards,
Charles
On Wed, 14 Feb 2007, Timo Sirainen wrote:
On Wed, 2007-02-14 at 11:42 -0500, Charles Marcus wrote:
With maildir I suppose it's possible with hardlinks, but that's a bit kludgy.
Do you know if that is how cyrus does it?
I think Cyrus also uses hardlinks. Well, the only problem is what to do when the destination users have different UIDs. Deliver would pretty
Yes, but with Cyrus we usually no longer need per-user UIDs, and everything is delivered to virtual users' accounts. On our system, anyone who is allowed a shell account can access local mail only through pine plus local IMAP.
I suppose it wouldn't be _too_ difficult to implement this though. Something like:
1. save the file to some temporary file in a directory where only deliver has access
2. change effective UID to destination user (keep the deliver group effective) and hardlink the file to the user's maildir
3. change effective UID back to root
4. process next user, goto 2
5. Unlink the temporary file
I'm not sure if cyrus does it that way with different UIDs, but it's highly probable :)
This would however mean that there wouldn't be any Delivered-To header since all the mails will be identical.
Compared with the vast amount of space saved on the hard drive, I think this small issue is worth it :)
It would be really great if this could be implemented...
Greetz,
Jacek Osiecki joshua@ceti.pl GG:3828944 "It's not logic, it's politics" (c) Kabaret pod Wydrwigroszem 2006
With maildir I suppose it's possible with hardlinks, but that's a bit kludgy.
Do you know if that is how cyrus does it?
I think Cyrus also uses hardlinks. Well, the only problem is what to do when the destination users have different UIDs. Deliver would pretty much have to deliver the mail as root, or use some kludgy approaches. In any case the file would then probably have to be owned by root (or another specified user) and be read-only to the destination users.
Actually, the only thing that would make any significant difference in the amount of space used is implementing this only for attachments...
Wouldn't that be much easier, and bypass all of the UID and Deliver Header issues?
--
Best regards,
Charles
On 14.2.2007, at 22.04, Charles Marcus wrote:
I think Cyrus also uses hardlinks. Well, the only problem is what to do when the destination users have different UIDs. Deliver would pretty much have to deliver the mail as root, or use some kludgy approaches. In any case the file would then probably have to be owned by root (or another specified user) and be read-only to the destination users.
Actually, the only thing that would make any significant difference in the amount of space used is implementing this only for attachments...
Wouldn't that be much easier, and bypass all of the UID and Deliver Header issues?
It wouldn't bypass UID problems, since there would still have to be
some way to share the attachments. It would fix the header issues
though.
Anyway, this is how I was planning on doing it for dbox, but it's not
possible with maildir format unless Dovecot does some weird things
(eg. replace the attachment MIME parts with some external MIME part
pointers, not worth the trouble and performance slowdowns in normal
operation I think).
Compression will save you storage space, but will cost you CPU processing power. If you have many users, you may want to throw a cheap RAID1 in your server before upgrading your CPU to an expensive quad-core.
But it is a nice idea for a small workgroup server.
Oliver
-- Oliver Schulze L. | Get my e-mail after a captcha in: Asuncion - Paraguay | http://tinymailto.com/oliver
Hello,
I am thinking about RAID1 and also about compression, to maximise the number of users.
Cristian
On Wed 14 Feb 2007 at 06:17PM, Host Expert wrote:
Hello,
I am thinking about RAID1 and also about compression, to maximise the number of users.
Well, I know I'll come off sounding like a shill for the company I work for. But: consider deploying ZFS[1]. RAID and Compression are both built in features. Regardless of filesystem, on modern systems, there's not much performance penalty for using modest levels of compression-- in fact, some workloads counter-intuitively go *faster* due to the reduction in disk I/O!
-dp
[1] http://en.wikipedia.org/wiki/ZFS
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
Hello,
Have you checked out ZFS in Solaris?
It has compression in the filesystem; may be a good candidate for your mail directories.
Cheers.
Hello,
I haven't checked it, for a few reasons:
- I don't have a "billions" budget for Solaris
- I think it is much easier to find dedicated servers for other OSes than for Solaris, since I am not from the USA and the target is the USA.
Cristian
Host Expert wrote:
Hello,
I haven't checked it, for a few reasons:
- I don't have a "billions" budget for Solaris
It's free, actually.... And it's open source.
- I think it is much easier to find dedicated servers for other OSes than for Solaris, since I am not from the USA and the target is the USA.
OpenSolaris will run on most PCs... check out opensolaris.org for details.
Oblig. disclaimer: I get paid for working on OpenSolaris by Sun. I use dovecot on ZFS for both my home email server and my server at work.
- Bart Bart Smaalders barts@smaalders.net http://smaalders.net/barts
Hi,
OK, thanks for the comments.
How effective is ZFS? I mean, yes, even Windows has file system compression... and so what? It is very poor...
The question is, which mail server would be best to run with Solaris/ZFS?
Cristian
On Thu 15 Feb 2007 at 08:55AM, Host Expert wrote:
Hi,
OK, thanks for the comments.
Note also that Apple is including ZFS in the next generation of MacOS, so Solaris will not be your only choice.
How effective is ZFS? I mean, yes, even Windows has file system compression... and so what? It is very poor...
You've not said how much performance you are willing to trade for space savings.
The compression currently in ZFS is a variant of Lempel-Ziv with some tweaks. This has the advantage of being fast and compresses moderately well; for the curious, see http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/os/... In my personal experience, it is lightweight enough that it's hard to know it is present without benchmarking. And as I said before, it has the curious property of *accelerating* some workloads.
Adam Leventhal has done most of the work to include GZIP compression in ZFS, and we expect to see him check that in sometime "soon"; see http://blogs.sun.com/ahl/. The issue with GZIP is that it's not as fast. The good news is that you will be able to choose-- and since everything is an online operation, it's not too hard to switch things back and forth.
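To get a feel for the speed versus ratio trade-off on your own mail, something like the following sketch can help. It uses Python's `zlib` module (DEFLATE, the algorithm behind gzip), not the ZFS compressor, and the sample message is invented, so the numbers only hint at what a filesystem would do.

```python
import time
import zlib

# Invented sample message; real mail will compress differently.
sample = (
    b"From: alice@example.com\r\nTo: bob@example.com\r\n"
    b"Subject: status report\r\n\r\n"
    + b"The quarterly numbers look fine. " * 2000
)


def measure(level: int) -> tuple[int, float]:
    """Return (compressed size, seconds spent) for one compression level."""
    start = time.perf_counter()
    packed = zlib.compress(sample, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(packed) == sample  # round trip must be lossless
    return len(packed), elapsed


results = {level: measure(level) for level in (1, 6, 9)}
for level, (size, seconds) in results.items():
    print(f"level {level}: {len(sample)} -> {size} bytes in {seconds * 1000:.2f} ms")
```

Running it against a sample of real messages instead of the invented one gives a better idea of whether the CPU cost is worth the space saved.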
I fear that we are running the risk of hijacking this mailing list to talk about ZFS :) You may want to check out zfs-discuss@opensolaris.org and/or google for "zfs".
-dp
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
Host Expert wrote:
Hi,
OK, thanks for the comments.
How effective is ZFS? I mean, yes, even Windows has file system compression... and so what? It is very poor...
sorry but that's just arrogance over ignorance (sigh). windows NT compression is robust, has been available since NT4.0 - i.e. about 10 years now, and depending on your app needs, an excellent trade-off for those who need it. the last time i checked, about 2% additional IO load for a busy fileserver with ext RAID5 array - i.e. 4000+ active smb sessions & around 1.3 overall compression factor.
i do not recall any compression-related issues (either in corruption, or in performance) in about 5 years of managing 500 0.5Tb+ boxes. you'll notice a little more pressure during backup however...
The question is, which mail server would be best to run with Solaris/ZFS?
Cristian
with a question like that I am sure you'll get as good an answer as your "poor performance" comment deserved.
question is, what are your needs? on a dovecot list, i'd imagine the answer is already a foregone conclusion :-)
if compression is done on the fly at the disk block level, then you lose a small % of CPU, but hopefully gain some back by reduced disk IO. you'll need to do extensive testing to see if _your_ situation really gains from this.
scorch
out of the frying pan & into the fire
On Thu 15 Feb 2007 at 09:08PM, scorch wrote:
The question is, which mail server would be best to run with Solaris/ZFS?
Cristian
with a question like that I am sure you'll get as good an answer as your "poor performance" comment deserved.
question is, what are your needs? on a dovecot list, i'd imagine the answer is already a foregone conclusion :-)
I was sort of thinking the poster was asking about SMTP server (i.e. postfix vs sendmail) even though the question was vague.
In that case: sendmail comes with Solaris, but Postfix, etc. work fine.
There is not an IMAP server which comes bundled with Solaris; in the future, there might be...
-dp
-- Daniel Price - Solaris Kernel Engineering - dp@eng.sun.com - blogs.sun.com/dp
On Wed, 2007-02-14 at 17:09 +0200, Host Expert wrote:
I am wondering whether there is any way to save HDD space used by emails by compressing their content (not just attachments).
It would be many thousands of mailboxes, each probably using a lot of space, so it would be easier if we could just compress the emails somehow.
With maildir? Dovecot already supports opening .gz mboxes, although in read-only mode with zlib plugin.
With maildir then.. Well, one problem would be that Dovecot would need to somehow figure out if the file is gzipped. Using some flag in the maildir filename would probably work. This would probably be pretty easy to implement directly into Dovecot's sources, but I'd rather want it as part of the zlib plugin. I'm not sure if it's possible to implement as a plugin currently.
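Two ways of telling whether a maildir file is gzipped could look roughly like this. It is only a sketch: the `Z` flag letter is invented for illustration and is not part of the maildir format or of Dovecot, and the magic-byte check is an alternative idea that the message above does not propose.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream
HYPOTHETICAL_FLAG = "Z"  # made-up flag letter, not a real maildir or Dovecot flag


def has_compressed_flag(filename: str) -> bool:
    """Look for the hypothetical flag in the part after ':2,' of a maildir filename."""
    _, sep, flags = filename.rpartition(":2,")
    return bool(sep) and HYPOTHETICAL_FLAG in flags


def read_message(path: str) -> bytes:
    """Return the message bytes, decompressing if the file starts with the gzip magic."""
    with open(path, "rb") as f:
        head = f.read(2)
    if head == GZIP_MAGIC:
        with gzip.open(path, "rb") as f:
            return f.read()
    with open(path, "rb") as f:
        return f.read()
```

The filename flag avoids opening the file at all, which matters when listing a large mailbox, while the magic-byte check needs no naming convention but costs an extra read per message.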
In any case, this is pretty low priority feature to me..
participants (12)
- Bart Smaalders
- Bill Landry
- Charles Marcus
- Dan Price
- Dick Middleton
- Host Expert
- Jacek Osiecki
- Oliver Schulze L.
- scorch
- Stephen Warren
- Tan Shao Yi
- Timo Sirainen