[Dovecot] mailbox format w/ separate headers/data
In the future, it would be cool if there were a mailbox format (dbox2?) where mail headers and each mime part were stored in separate files. This would enable the zfs dedup feature to be used to maximum benefit.
In the zfs filesystem, there is a dedup feature which stores only 1 copy of duplicate blocks. In a normal mail file, the headers will be different for each recipient and the chances of the message content being dedup'd are close to zero, because differences in header length change the block boundaries for the rest of the message. But if each mime part is stored in a separate file, you get massive compression "for free".
-frank
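A rough sketch of the block-boundary argument above, not a model of how zfs actually works. It assumes fixed 4 KiB blocks (real ZFS record sizes are configurable and typically larger), SHA-256 as the block fingerprint, and made-up addresses; `block_hashes` and `unique_blocks` are hypothetical helper names.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption for illustration only


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and fingerprint each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def unique_blocks(files: list[bytes]) -> int:
    """Count the distinct blocks a block-level deduplicator would need to keep."""
    seen = set()
    for data in files:
        seen.update(block_hashes(data))
    return len(seen)


# Deterministic 64 KiB pseudo-random body standing in for an attachment.
body = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2048))

# Per-recipient headers of different lengths (hypothetical addresses).
header_a = b"To: alice@example.com\r\n\r\n"
header_b = b"To: bob.with.a.longer.address@example.com\r\n\r\n"

# One file per message: each header shifts the body by a different offset,
# so no block of one copy lines up with a block of the other.
whole = unique_blocks([header_a + body, header_b + body])

# Headers and body in separate files: the two body copies are block-identical.
split = unique_blocks([header_a, body, header_b, body])

print("one file per message:", whole, "unique blocks")
print("separate header/body files:", split, "unique blocks")
```

With these inputs the whole-message layout shares no blocks between the two copies, while the split layout stores the body blocks once plus one small block per header.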
On Fri, 2010-01-22 at 15:53 -0500, Frank Cusack wrote:
In the future, it would be cool if there were a mailbox format (dbox2?) where mail headers and each mime part were stored in separate files. This would enable the zfs dedup feature to be used to maximum benefit.
This is more or less what dbox's single instance storage is going to do. Maybe in half a year or so.. And you don't even need filesystem deduplication feature. :)
It would also be possible to already write such Maildir feature. Someone on this list already wrote header/body separation code, which was pretty easy to do with a plugin.
In the zfs filesystem, there is a dedup feature which stores only 1 copy of duplicate blocks. In a normal mail file, the headers will be different for each recipient and the chances of the message content being dedup'd are close to zero, because differences in header length change the block boundaries for the rest of the message. But if each mime part is stored in a separate file, you get massive compression "for free".
Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.
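For readers wondering what the header/body separation mentioned above amounts to, here is a minimal sketch. It is not the plugin code referred to in this thread and not Dovecot's API; `split_message` is a hypothetical helper that simply cuts a raw message at the first empty line, and the sample messages are invented.

```python
def split_message(raw: bytes) -> tuple[bytes, bytes]:
    """Split a raw message at the first empty line into (headers, body).

    Handles both CRLF and bare LF line endings. If there is no empty
    line, the whole input is treated as headers with an empty body.
    """
    candidates = []
    for sep in (b"\r\n\r\n", b"\n\n"):
        pos = raw.find(sep)
        if pos != -1:
            candidates.append((pos, sep))
    if not candidates:
        return raw, b""
    # Use whichever separator occurs first in the message.
    pos, sep = min(candidates, key=lambda item: item[0])
    return raw[:pos], raw[pos + len(sep):]


# Two deliveries of the same mail to different recipients.
msg_a = b"To: alice@example.com\r\nSubject: report\r\n\r\nSame body for everyone.\r\n"
msg_b = b"To: bob@example.com\r\nSubject: report\r\n\r\nSame body for everyone.\r\n"

head_a, body_a = split_message(msg_a)
head_b, body_b = split_message(msg_b)

print(head_a != head_b)  # headers differ per recipient
print(body_a == body_b)  # bodies are byte-identical and could be stored once
```

Stored as separate files, the two bodies would be identical byte for byte, which is what either filesystem dedup or a single instance store needs.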
On January 22, 2010 11:05:22 PM +0200 Timo Sirainen <tss@iki.fi> wrote:
Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.
zfs dedup is pretty resource-intensive -- for writes. For mail I suspect reads overwhelm writes?
-frank
On Fri, 2010-01-22 at 16:09 -0500, Frank Cusack wrote:
On January 22, 2010 11:05:22 PM +0200 Timo Sirainen <tss@iki.fi> wrote:
Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.
zfs dedup is pretty resource-intensive -- for writes. For mail I suspect reads overwhelm writes?
I don't have any evidence, but my logic goes like: Mail is written to disk once. Most users use a single client, which downloads the message once. Or maybe they're using webmail, and they read the same message approximately once (or maybe max. 1.1 times). In both cases read:write is about 1:1.
Index files are of course a different thing. They're read a lot more often. But dedup doesn't help with them.
On Fri, 2010-01-22 at 23:12 +0200, Timo Sirainen wrote:
I don't have any evidence, but my logic goes like: Mail is written to disk once. Most users use a single client, which downloads the message once. Or maybe they're using webmail, and they read the same message approximately once (or maybe max. 1.1 times). In both cases read:write is about 1:1.
Also, if a message is read soon after it was written, it's already in cache and won't have to be read from disk. In those cases read:write might be close to 0:1..
On Fri, 22 Jan 2010, Frank Cusack wrote:
On January 22, 2010 11:05:22 PM +0200 Timo Sirainen <tss@iki.fi> wrote:
Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.
zfs dedup is pretty resource-intensive -- for writes. For mail I suspect reads overwhelm writes?
Sorry for the tangent, but I wonder if anyone here is running lots of Maildirs on zfs? I just recently started experimenting with it on our backups server (FBSD 8.0), and I really am liking it. I was also surprised at how my little 4 drive raidz volume performed in benchmarks - quite impressive.
I'd seen some comments here in the past that zfs+maildirs = bad. Anything to back that up? Any comparisons to UFS2 on FBSD?
For a number of reasons, running zfs on my main mail host would be very handy (backups and easy expansion being the two big ones).
Thanks,
Charles
On January 22, 2010 9:03:42 PM -0500 Charles Sprickman <spork@bway.net> wrote:
Sorry for the tangent,
You should probably start a new thread when changing the subject. Then you don't have to be sorry. :)
but I wonder if anyone here is running lots of Maildirs on zfs?
When you say "lots of Maildirs" I assume you mean filesystem-per-user? You can of course use "lots of Maildirs" yet have only a single zfs filesystem but that doesn't seem to me to be worth questioning.
I am running that way but it's less than 100 users so probably not what you would consider "lots".
I'd seen some comments here in the past that zfs+maildirs = bad.
I can't imagine why that would be the case. There are some problem loads for zfs (zfs-backed NFS writes, e.g.) but why maildir would be particularly singled out I wouldn't know.
For filesystem-per-user, if by "lots" you mean 1000 or 1000s then you have the problem that it takes forever to mount all of those filesystems on reboot. That's not a maildir-specific problem though.
-frank
On Jan 22, 2010, at 9:22 PM, Frank Cusack wrote:
but I wonder if anyone here is running lots of Maildirs on zfs?
When you say "lots of Maildirs" I assume you mean filesystem-per-user? You can of course use "lots of Maildirs" yet have only a single zfs filesystem but that doesn't seem to me to be worth questioning.
I am running that way but it's less than 100 users so probably not what you would consider "lots".
I'm in the same usage range for my ZFS-backed mail server.
I'd seen some comments here in the past that zfs+maildirs = bad.
I can't imagine why that would be the case. There are some problem loads for zfs (zfs-backed NFS writes, e.g.) but why maildir would be particularly singled out I wouldn't know.
I'm doing everything on ZFS now (database loads, web services, etc) and will never go back to UFS. (or ext3, etc) Zero problems, with anything, ever.
For filesystem-per-user, if by "lots" you mean 1000 or 1000s then you have the problem that it takes forever to mount all of those filesystems on reboot. That's not a maildir-specific problem though.
I'm running filesystem-per-domain; I've found that's a good way to do it for my situation.
-Dave
-- Dave McGuire Port Charlotte, FL
On Jan 22, 2010, at 9:22 PM, Frank Cusack wrote:
On January 22, 2010 9:03:42 PM -0500 Charles Sprickman <spork@bway.net> wrote:
Sorry for the tangent,
You should probably start a new thread when changing the subject. Then you don't have to be sorry. :)
I figured I was already drifting OT for this list, so... :)
but I wonder if anyone here is running lots of Maildirs on zfs?
When you say "lots of Maildirs" I assume you mean filesystem-per-user? You can of course use "lots of Maildirs" yet have only a single zfs filesystem but that doesn't seem to me to be worth questioning.
No, I just meant a large number of users using Maildir (rather than mbox, dbox, whatever else) on a single ZFS filesystem. Although filesystem per user is an interesting idea. When my personal box gets upgraded to FBSD 8.0, I may try that for fun.
I am running that way but it's less than 100 users so probably not what you would consider "lots".
I'd seen some comments here in the past that zfs+maildirs = bad.
I can't imagine why that would be the case. There are some problem loads for zfs (zfs-backed NFS writes, e.g.) but why maildir would be particularly singled out I wouldn't know.
I think this is the message that got stuck in my head:
http://www.mail-archive.com/dovecot@dovecot.org/msg25478.html
I *think* that when I was doing my massive week-long google binge on zfs I read a few comments about zfs being "non-optimal" for email. It's "teh internets" though, and it could have been someone just talking out their behind or it could be talking about a much earlier release of ZFS.
I have a small backups box that's got just 4 WD RE3 drives on it. The benchmarks for this thing pretty much blew me away. We're not talking top of the line hardware here and it was performing at least as well as a good Areca or 3Ware hardware RAID setup.
Anyhow, if I find more places to run ZFS in production and it seems stable enough, I'd like to try getting it running on my big mailserver at some point.
Backing up from UFS to ZFS using rsync is fine, but ZFS send/recv looks like a far more interesting backup solution.
Charles
For filesystem-per-user, if by "lots" you mean 1000 or 1000s then you have the problem that it takes forever to mount all of those filesystems on reboot. That's not a maildir-specific problem though.
-frank
Charles Sprickman NetEng/SysAdmin Bway.net - New York's Best Internet - www.bway.net spork@bway.net - 212.655.9344
On January 29, 2010 1:19:33 AM -0500 Charles Sprickman <spork@bway.net> wrote:
Anyhow, if I find more places to run ZFS in production and it seems stable enough, I'd like to try getting it running on my big mailserver at some point. Backing up from UFS to ZFS using rsync is fine, but ZFS send/recv looks like a far more interesting backup solution.
Not for archival backups though. The zfs send stream is not guaranteed to be compatible with any other version of zfs than the one on the machine that generated it. Meaning if you have an archived send stream from 3 years ago and are trying to restore it onto a newer OS version, it might not work.
Of course you can archive the filesystem itself, if you are backing up to disk (that's what I do). Future versions of zfs ARE guaranteed to be able to read older zfs filesystems. But if you are saving the send stream onto tape or DVD or other media like that, as a stream, to restore it you MIGHT need an OS with the same version of zfs on it. zfs has already been through several versions, but I don't know what the compatibility of send streams between the versions is.
OK, we're really OT now! :)
-frank
On Fri, Jan 22, 2010 at 09:03:42PM -0500, Charles Sprickman wrote:
On Fri, 22 Jan 2010, Frank Cusack wrote:
On January 22, 2010 11:05:22 PM +0200 Timo Sirainen <tss@iki.fi> wrote:
Dunno about zfs, but I've heard that at least in one NetApp installation deduplication was way too heavyweight.
zfs dedup is pretty resource-intensive -- for writes. For mail I suspect reads overwhelm writes?
Sorry for the tangent, but I wonder if anyone here is running lots of Maildirs on zfs? I just recently started experimenting with it on our backups server (FBSD 8.0), and I really am liking it. I was also surprised at how my little 4 drive raidz volume performed in benchmarks - quite impressive.
We used to have our Maildirs on ZFS but we've moved to ext3. ZFS worked reasonably well, except for the days when it slowed down to less than 10% of normal throughput. After a reboot or a couple of days of slow running it would perform normally again. This was on Solaris 10, at most a couple of months behind on patches. I had read the ZFS evil tuning guide, and the ZFS best practices guide, but they didn't help. It wasn't just mail that was slow - listing the contents of a small directory could take over a minute. We're much happier since switching to ext3; I haven't worried about mail performance since.
-- John Tobin "No no no. You're supposed to test with -march=... -fomit-frame-pointer -ffancy-math -fuse-lots-of-resources-go-very-fast -fsacrifice-more-goats -fsummon-cthulu-if-that-helps as root at nice -20, preferably in single user mode and jumps should be aligned on pentagrams, not 8 byte boundaries. Definitely not use debugging :-)" -- Nicholas Clark, in perl6-internals
On January 22, 2010 11:05:22 PM +0200 Timo Sirainen <tss@iki.fi> wrote:
On Fri, 2010-01-22 at 15:53 -0500, Frank Cusack wrote:
In the future, it would be cool if there were a mailbox format (dbox2?) where mail headers and each mime part were stored in separate files. This would enable the zfs dedup feature to be used to maximum benefit.
This is more or less what dbox's single instance storage is going to do. Maybe in half a year or so.. And you don't even need filesystem deduplication feature. :)
But if the mail system has to handle it, it only knows about mails written at the same time. For example, if postfix delivers mail with a single recipient per mail (the recommended config somewhere, not sure if recommended by postfix or by dovecot), dbox won't get the opportunity to dedup.
And for mails which are re-forwarded (pretty common occurrence), again dbox won't get the chance to dedup.
Or will there be a global index?
-frank
On 22.1.2010, at 23.14, Frank Cusack wrote:
This is more or less what dbox's single instance storage is going to do. Maybe in half a year or so.. And you don't even need filesystem deduplication feature. :)
But if the mail system has to handle it, it only knows about mails written at the same time. For example, if postfix delivers mail with a single recipient per mail (the recommended config somewhere, not sure if recommended by postfix or by dovecot), dbox won't get the opportunity to dedup.
Well, doing the multiple-recipients-at-a-time already works with v1.1+ with Maildir.
And for mails which are re-forwarded (pretty common occurrence), again dbox won't get the chance to dedup.
Or will there be a global index?
Yes. That's what dbox SIS is about. You have a global repository of (large) MIME parts, indexed by their SHA1 sum (or something).
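A toy sketch of the "global repository indexed by SHA1 sum" idea, only to make the concept concrete. It is an in-memory stand-in, not dbox's actual design or on-disk layout; the `PartStore` class, its method names, and the reference counting are all assumptions made for illustration.

```python
import hashlib


class PartStore:
    """In-memory stand-in for a global store of MIME parts keyed by SHA-1."""

    def __init__(self) -> None:
        self.parts: dict[str, bytes] = {}  # hex digest -> part content
        self.refs: dict[str, int] = {}     # hex digest -> number of mails using it

    def put(self, data: bytes) -> str:
        """Store a part if it is new and return its key either way."""
        key = hashlib.sha1(data).hexdigest()
        if key not in self.parts:
            self.parts[key] = data
        self.refs[key] = self.refs.get(key, 0) + 1
        return key

    def get(self, key: str) -> bytes:
        """Fetch a part's content by its key."""
        return self.parts[key]


store = PartStore()
attachment = b"%PDF-1.4 pretend this is a large attachment"

# The same attachment arrives twice: once on delivery, once when it is
# re-forwarded later. The store does not care when each copy was written.
key_first = store.put(attachment)
key_second = store.put(attachment)

print(key_first == key_second)  # same content gives the same key
print(len(store.parts))         # number of copies actually kept
```

Because the key depends only on the content, a part delivered one recipient at a time or re-forwarded days later still maps to the existing entry, which is the case raised in the question above.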
On January 22, 2010 11:21:09 PM +0200 Timo Sirainen <tss@iki.fi> wrote:
Or will there be a global index?
Yes. That's what dbox SIS is about. You have a global repository of (large) MIME parts, indexed by their SHA1 sum (or something).
In the case of zfs then, the filesystem may as well do the dedup'ing.
-frank
On 22.1.2010, at 23.39, Frank Cusack wrote:
On January 22, 2010 11:21:09 PM +0200 Timo Sirainen <tss@iki.fi> wrote:
Or will there be a global index?
Yes. That's what dbox SIS is about. You have a global repository of (large) MIME parts, indexed by their SHA1 sum (or something).
In the case of zfs then, the filesystem may as well do the dedup'ing.
Or "dbox may as well do the deduping"? :) I guess it comes down to whose algorithm is fastest. I suppose they're more or less the same, if it's possible to tell zfs to dedup files only in /mail/attachments/ directory (I guess you can create a separate filesystem for that).
On January 22, 2010 11:44:07 PM +0200 Timo Sirainen <tss@iki.fi> wrote:
On 22.1.2010, at 23.39, Frank Cusack wrote:
On January 22, 2010 11:21:09 PM +0200 Timo Sirainen <tss@iki.fi> wrote:
Or will there be a global index?
Yes. That's what dbox SIS is about. You have a global repository of (large) MIME parts, indexed by their SHA1 sum (or something).
In the case of zfs then, the filesystem may as well do the dedup'ing.
Or "dbox may as well do the deduping"? :) I guess it comes down to whose algorithm is fastest.
Yeah, I just meant that if dbox has a global hash list then either method should have similar overhead. zfs checksums every single block written anyway (regardless of dedup) so I think it would be faster vs dbox.
Of course dbox can be used on systems without zfs.
I would suggest that using zfs would give you more portability (mail files appear "normal" and can be copied or manipulated however you care to); however, normal mail files do not separate the headers and the message parts, so that isn't valid.
-frank
participants (5): Charles Sprickman, Dave McGuire, Frank Cusack, John Tobin, Timo Sirainen