[Dovecot] Compressed mail archives
Hello, I have recently begun using IMAP to access my mail, and have found that my maildir is growing faster than I had anticipated. Googling around, I've seen a lot of messages related to compressing individual messages in a maildir. However, what I'm really looking for is a solution which will allow multiple messages to be compressed into a single compressed file, while still allowing those messages to be accessed via IMAP. The reasoning behind this is that most of my incoming mail consists of many very similar messages (for example, it is not unheard of for me to receive 10,000 emails detailing the same error report over a weekend, with the actual "needs to be processed" parts stuck on the end of each message).
I'm expecting a simple "no, there's no way to do that", but I wouldn't mind being pleasantly surprised. My question again: Is there any way to keep a bundle of older mails compressed, so that they can enjoy the high compression of being very similar to each other, while still allowing access via IMAP?
A solution may involve (just throwing out ideas here, which I have no idea how to implement) putting messages older than N days into an MBox file, and either somehow linking my account to that file (so that my account consists of [compressed readonly Mbox]+[live uncompressed Maildir]), or perhaps the first half of that, but accessing archived messages through a separate account. (Not the preferred way, but if it's the only way, I'll do it) But really I don't even know if it's possible to have dovecot support MBox and Maildir simultaneously, nor do I know enough about the MBox format to even know if the IMAP-style "multiple mailboxes" thing is possible in MBox.
Please excuse my no-doubt confused terminology and general air of doesn't-know-what-he's-talking-about. The internals of e-mail are generally not my thing, so I'd appreciate any pointing in the right direction which anyone may have to offer.
Any ideas?
On 2010-05-12 8:44 AM, Will Palmer wrote:
> Googling around, I've seen a lot of messages related to compressing individual messages in a maildir. However, what I'm really looking for is a solution which will allow multiple messages to be compressed into a single compressed file, while still allowing those messages to be accessed via IMAP.
Heh... you might find this of interest:
http://wiki.dovecot.org/MailboxFormat/dbox
--
Best regards,
Charles
On 2010-05-12 4:56 PM, Charles Marcus wrote:
> On 2010-05-12 8:44 AM, Will Palmer wrote:
> > Googling around, I've seen a lot of messages related to compressing individual messages in a maildir. However, what I'm really looking for is a solution which will allow multiple messages to be compressed into a single compressed file, while still allowing those messages to be accessed via IMAP.
> Heh... you might find this of interest:
Sorry - specifically, the mdbox stuff, which is exactly what you are looking for... I don't think it is totally done, but I think it is very close...
--
Best regards,
Charles
On Wed, 2010-05-12 at 17:05 -0400, Charles Marcus wrote:
> On 2010-05-12 4:56 PM, Charles Marcus wrote:
> > On 2010-05-12 8:44 AM, Will Palmer wrote:
> > > Googling around, I've seen a lot of messages related to compressing individual messages in a maildir. However, what I'm really looking for is a solution which will allow multiple messages to be compressed into a single compressed file, while still allowing those messages to be accessed via IMAP.
> > Heh... you might find this of interest:
> Sorry - specifically, the mdbox stuff, which is exactly what you are looking for... I don't think it is totally done, but I think it is very close...
does sound exactly like what I'm looking for, though that question-mark by the word "Compression" is a little less hopeful. I'll bite my lip for now and see that it's something that's at least being worked-on.
Single-instance attachment storage sounds very useful too, though I can't help but wonder if it can be extended to message bodies in general. After all, no sense in storing five copies of a message just because five people are on the list for it, right? Of course, that sounds suspiciously like premature optimization, but still like something that could help.
When using something like mdbox, does the smtp server (postfix, in my case) need to be made aware of it, or does some other process take responsibility for the conversion? (academic at this point, as I won't be upgrading to 2.x any time soon)
-- Will
On 12.05.2010 at 23:49, Will Palmer wrote:
> When using something like mdbox, does the smtp server (postfix, in my case) need to be made aware of it, or does some other process take responsibility for the conversion? (academic at this point, as I won't be upgrading to 2.x any time soon)
This is completely transparent to the MTA as the mail storage will be handled by Dovecot. Mails will either be injected via the deliver command or the LMTP server in 2.x.
Regards Thomas
On 2010-05-12 5:49 PM, Will Palmer wrote:
> On Wed, 2010-05-12 at 17:05 -0400, Charles Marcus wrote:
> > On 2010-05-12 4:56 PM, Charles Marcus wrote:
> > > you might find this of interest:
> > Sorry - specifically, the mdbox stuff, which is exactly what you are looking for... I don't think it is totally done, but I think it is very close...
> does sound exactly like what I'm looking for, though that question-mark by the word "Compression" is a little less hopeful. I'll bite my lip for now and see that it's something that's at least being worked-on.
Also forgot:
http://wiki.dovecot.org/Plugins/Zlib?highlight=%28compress%29
I'm reasonably sure you could use this with dbox/mdbox now...
> Single-instance attachment storage sounds very useful too, though I can't help but wonder if it can be extended to message bodies in general.
It *could*, but personally I think that complicates the task dramatically while providing much less benefit. The vast majority of storage for email is due to binary attachments, not email body text.
Also, with compression, you get the best of both worlds (since text compresses so well)...
> After all, no sense in storing five copies of a message just because five people are on the list for it, right? Of course, that sounds suspiciously like premature optimization, but still like something that could help.
> When using something like mdbox, does the smtp server (postfix, in my case) need to be made aware of it, or does some other process take responsibility for the conversion? (academic at this point, as I won't be upgrading to 2.x any time soon)
You would obviously want to use the Dovecot LDA, so which MTA is used is irrelevant...
--
Best regards,
Charles
On Thu, 2010-05-13 at 10:11 -0400, Charles Marcus wrote:
> Also forgot:
> http://wiki.dovecot.org/Plugins/Zlib?highlight=%28compress%29
Yeah, that's one of the things I was referring to when I mentioned solutions which apply only to single mails (though read-only archives would possibly benefit from inter-mail compression, I don't know how I'd set up "live" mails to use Maildir while archives use mbox)
> > Single-instance attachment storage sounds very useful too, though I can't help but wonder if it can be extended to message bodies in general.
> It *could*, but personally I think that complicates the task dramatically while providing much less benefit. The vast majority of storage for email is due to binary attachments, not email body text.
That's just not the case here, as none of the messages I'm wishing to compress have any attachments. I'm talking about high-volume mails which sometimes need to be looked back on for reference, weeks or months later. The mails themselves are often all very similar, and are sent to several people.
> Also, with compression, you get the best of both worlds (since text compresses so well)...
Not sure what this is supposed to mean. Over the course of the past two weeks I've received 213 MB of plain-text emails, with no attachments. Compressing each of these individually results in 60% savings (du -h shows 85 MB), but compressing the entire directory (tar.gz) results in 90% savings (ls -lh shows 22 MB). And of course, any way you slice it, 20,000 mails, compressed or not, sent to five people, take up 100,000 mails worth of space if you don't share them between accounts. (perhaps they could even simply be hard-linked to the same file?)
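To illustrate the gap between per-message and whole-directory compression described above, here is a minimal Python sketch. It does not use the real mails: the messages are synthetic, near-identical error reports invented for the example, and gzip over a simple concatenation stands in for tar.gz. The names make_messages, per_message_size and bundled_size are hypothetical.

```python
import gzip
import random


def make_messages(count, seed=0):
    """Build synthetic, near-identical error-report mails (made-up data)."""
    rng = random.Random(seed)
    # One shared block of "error report" text, identical in every message.
    boilerplate = "".join(
        f"Frame {i}: module_{rng.randrange(10**6)} "
        f"raised error code {rng.randrange(10**6)}\n"
        for i in range(40)
    )
    # Each mail differs only in its subject and a short trailing line.
    return [
        (
            f"Subject: error report {n}\n\n{boilerplate}"
            f"needs processing: job {rng.randrange(10**9)}\n"
        ).encode()
        for n in range(count)
    ]


def per_message_size(messages):
    """Total size when every message is gzip-compressed on its own."""
    return sum(len(gzip.compress(m)) for m in messages)


def bundled_size(messages):
    """Size when all messages are concatenated and gzip-compressed once."""
    return len(gzip.compress(b"".join(messages)))


messages = make_messages(200)
raw = sum(len(m) for m in messages)
separate = per_message_size(messages)
bundled = bundled_size(messages)
print(f"raw={raw} per-message={separate} bundled={bundled}")
```

The exact figures depend entirely on the data, but with mails this similar the bundled stream can reuse text from the previous message, which per-message compression cannot do.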
I'm not saying this is a serious problem and we should all make it a priority to solve it: it's most certainly not. Worst case scenario, we archive them somewhere else. This is purely about laziness and convenience, and of course my desire to play with a new toy to see what I can get it to do. But to hand-wave with "text compresses well" seems mathematically absurd. 5X > X.
Feel free to ignore this message, as I've passed the point of practical necessity long ago.
-- Will
On 12.5.2010, at 14.44, Will Palmer wrote:
> Googling around, I've seen a lot of messages related to compressing individual messages in a maildir. However, what I'm really looking for is a solution which will allow multiple messages to be compressed into a single compressed file, while still allowing those messages to be accessed via IMAP.
mbox is the only possibility for that. mdbox supports compression, but only one message at a time.
> But really I don't even know if it's possible to have dovecot support MBox and Maildir simultaneously, nor do I know enough about the MBox format to even know if the IMAP-style "multiple mailboxes" thing is possible in MBox.
It is. Just create two namespaces, one for maildir (prefix="") and one for mbox (prefix=archive/). See examples in http://wiki.dovecot.org/Namespaces
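One way the mbox side of such a setup could be fed is a small script that moves mail older than N days out of the Maildir and into an mbox file. What follows is only a sketch using Python's standard mailbox module, not a Dovecot tool: the function name archive_old_messages and its parameters are hypothetical, message age is taken from each file's modification time, only the top-level Maildir folder is scanned, and it assumes nothing else is writing to either mailbox while it runs.

```python
import mailbox
import os
import time


def archive_old_messages(maildir_path, mbox_path, max_age_days, now=None):
    """Move messages older than max_age_days from a Maildir into an mbox.

    Returns the number of messages moved. Hypothetical helper, for
    illustration only.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    live = mailbox.Maildir(maildir_path, create=False)
    archive = mailbox.mbox(mbox_path)  # created if it does not exist yet
    archive.lock()
    try:
        old_keys = []
        for key in live.keys():
            message = live[key]
            # For messages read from disk, get_date() is the file's mtime.
            if message.get_date() < cutoff:
                # Converting to mboxMessage carries flags and date over.
                archive.add(mailbox.mboxMessage(message))
                old_keys.append(key)
        # Make sure the archive is written before deleting the originals.
        archive.flush()
        for key in old_keys:
            live.remove(key)
    finally:
        archive.unlock()
        archive.close()
        live.close()
    return len(old_keys)
```

A call such as archive_old_messages(os.path.expanduser("~/Maildir"), os.path.expanduser("~/mail-archive/2010"), 30) would then move anything older than 30 days; both paths are placeholders. Compressing the resulting mbox file is a separate step that this sketch does not attempt.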
> And of course, any way you slice it, 20,000 mails, compressed or not, sent to five people, take up 100,000 mails worth of space if you don't share them between accounts. (perhaps they could even simply be hard-linked to the same file?)
http://wiki.dovecot.org/LDA : -p <path>: Path to the mail to be delivered instead of reading from stdin. If using maildir the file is hard linked to the destination if possible. This allows a single mail to be delivered to multiple users using hard links, but currently it also prevents deliver from updating cache file so it shouldn't be used unless really necessary. (v1.1+)
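The hard-link idea itself can be shown in a few lines of Python. This is a sketch of the filesystem mechanism only, not of how deliver is implemented: the function name deliver_with_hard_links, the file name and the recipient directories are all made up, and hard links only work when source and destinations are on the same filesystem.

```python
import os
import tempfile


def deliver_with_hard_links(source_path, recipient_dirs):
    """Link one stored message into each recipient directory.

    Hypothetical helper: every target is a hard link to the same file,
    so the message data exists on disk only once.
    """
    name = os.path.basename(source_path)
    delivered = []
    for directory in recipient_dirs:
        os.makedirs(directory, exist_ok=True)
        target = os.path.join(directory, name)
        os.link(source_path, target)  # new name for the same inode, no copy
        delivered.append(target)
    return delivered


# Small demonstration in a throwaway directory.
root = tempfile.mkdtemp()
source = os.path.join(root, "message.eml")
with open(source, "wb") as handle:
    handle.write(b"Subject: error report\n\nsame text for everyone\n")

copies = deliver_with_hard_links(
    source, [os.path.join(root, user) for user in ("alice", "bob")]
)
print(os.stat(source).st_nlink, copies)
```

Because every name points at the same data, the space is only freed once the last link is removed, so one user deleting the message does not affect the others.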
participants (4)
- Charles Marcus
- Thomas Leuxner
- Timo Sirainen
- Will Palmer