[Dovecot] deliver saving mails with hard linking
http://dovecot.org/tmp/deliver-multiple.diff for Dovecot v1.1 implements -p <path> parameter for deliver, which reads the input mail from the specified path instead of stdin. With maildir and hardlink copying enabled, it also tries to hard link the file to destination instead of copying it.
So to get the same mail delivered to multiple recipients using hard links, you'd have to write a small wrapper script that stored the mail temporarily somewhere and then called deliver for each recipient using the -p parameter.
Any thoughts on whether this should go into v1.1 like this? Some problems:
- If you're not using a single UID/GID you probably have to make the file world-readable.
- A wrapper script is kind of ugly, so it would be nice if deliver took a list of all recipients as input. But the deliver code is currently ugly enough that this would require larger changes.
- Am I forgetting something?..
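The wrapper described above could be sketched roughly as follows. This is only a sketch under stated assumptions: the function name deliver_to_all and the DELIVER variable are hypothetical, the deliver binary is assumed to be built with the -p patch, and mktemp is assumed to be available. The temporary file has to live on the same filesystem as the mailboxes, because hard links cannot cross filesystems.

```sh
#!/bin/sh
# Sketch of a multi-recipient wrapper around a deliver built with the -p patch.
# Assumptions: DELIVER and deliver_to_all are hypothetical names; TMPDIR points
# at a directory on the same filesystem as the mailboxes.
DELIVER=${DELIVER:-deliver}

deliver_to_all() {
    # Store the message from stdin in a temporary file (75 = EX_TEMPFAIL).
    tmp=$(mktemp "${TMPDIR:-/tmp}/deliver.XXXXXX") || return 75
    cat > "$tmp" || { rm -f "$tmp"; return 75; }

    # Only needed when the recipients do not share a single UID/GID.
    chmod 644 "$tmp"

    status=0
    for user in "$@"; do
        # -p reads the mail from the file, -d selects the recipient.
        "$DELIVER" -p "$tmp" -d "$user" || status=$?
    done

    # Remove the temporary name; hard-linked copies in the mailboxes remain.
    rm -f "$tmp"
    return "$status"
}
```

A script built on this would end with something like `deliver_to_all "$@"` and be called by the MTA with the recipient list as arguments and the message on stdin.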
On Sun, 2008-06-01 at 22:32 +0300, Timo Sirainen wrote:
> http://dovecot.org/tmp/deliver-multiple.diff for Dovecot v1.1 implements -p <path> parameter for deliver, which reads the input mail from the specified path instead of stdin. With maildir and hardlink copying enabled, it also tries to hard link the file to destination instead of copying it.
Wow. ;)
> - Am I forgetting something?..
Cleaning up? Quoting the part from option (2) of your previous mail, which this seems to implement:
> All messages could be then stored in some global directory and hard linked from there to users' mailboxes.
When reading that I already wondered about cleaning up and freeing disk space. If every recipient deleted their own "copy of the mail", the inode's link count would go down to 1 because of the still-existing global copy. The file would survive despite being "deleted" (from the collective users' POV), occupying disk space -- and possibly keeping data around that is assumed to be removed.
Of course, one could periodically check the link counts in yet another external script, carefully ensuring deliver is not currently doing its job, and get rid of the stale copies...
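As a rough illustration of such a periodic check, the sketch below lists spool files whose link count has dropped to 1. The function name list_stale, the spool path passed to it, and the one-day grace period are assumptions for illustration; the grace period is a crude stand-in for "ensuring deliver is not currently doing its job".

```sh
#!/bin/sh
# list_stale SPOOL: print regular files under SPOOL that have only one link
# left (no mailbox references them) and were last modified a day or more ago.
# Assumption: list_stale and the one-day grace period are illustrative only.
list_stale() {
    find "$1" -type f -links 1 -mtime +0 -print
}
```

Piping the output into a removal step would free the space; printing first makes it possible to review what would be deleted.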
guenther
-- char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
On Jun 1, 2008, at 11:59 PM, Karsten Bräckelmann wrote:
>> - Am I forgetting something?..
> Cleaning up? Quoting the part from option (2) of your previous mail, which this seems to implement:
>> All messages could be then stored in some global directory and hard linked from there to users' mailboxes.
> When reading that I already wondered about cleaning up and freeing disk space. If every recipient deleted their own "copy of the mail", the inode's link count will go down to 1 due to the still existing global copy. But it will survive despite being "deleted" (from the collective users' POV), occupying disk space -- and possibly keeping data around that is assumed to be removed.
I think my previous mail about it described some persistent uniqueness
checks. This patch is only about delivery-time hard linking. If two
different deliveries sent the same message they would be stored using
different files. So the deliver wrapper script would be like:
  cat > tempfile
  deliver -p tempfile -d user1
  deliver -p tempfile -d user2
  rm -f tempfile
The result would be that user1 and user2 had the same file with link
count 2 and the file is gone when both of them delete it.
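The link-count behaviour described here can be tried out without Dovecot. In the sketch below, plain ln calls stand in for "deliver -p tempfile -d userN" with hard linking enabled; the directory layout and the helper name link_count are made up for the demonstration.

```sh
#!/bin/sh
# link_count FILE: print the hard link count (second field of "ls -ln").
link_count() {
    ls -ln "$1" | awk '{ print $2 }'
}

demo=$(mktemp -d)
mkdir "$demo/user1" "$demo/user2"
printf 'Subject: test\n\nhello\n' > "$demo/tempfile"

# Stand-ins for the two deliver calls: each adds one more name for the inode.
ln "$demo/tempfile" "$demo/user1/msg"
ln "$demo/tempfile" "$demo/user2/msg"
count_before=$(link_count "$demo/tempfile")    # 3: tempfile + two mailboxes

# The wrapper removes its temporary name; the mailbox copies stay.
rm -f "$demo/tempfile"
count_after=$(link_count "$demo/user1/msg")    # 2: user1 and user2

# user1 deletes the mail; user2 still has it.
rm -f "$demo/user1/msg"
count_last=$(link_count "$demo/user2/msg")     # 1: only user2

echo "$count_before $count_after $count_last"
```

Once user2 removes the last name as well, the link count reaches 0 and the filesystem frees the data.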
On Mon, 2008-06-02 at 00:06 +0300, Timo Sirainen wrote:
> I think my previous mail about it described some persistent uniqueness
> checks. This patch is only about delivery-time hard linking. If two
> different deliveries sent the same message they would be stored using
> different files. So the deliver wrapper script would be like:
>
>   cat > tempfile
>   deliver -p tempfile -d user1
>   deliver -p tempfile -d user2
>   rm -f tempfile
>
> The result would be that user1 and user2 had the same file with link
> count 2 and the file is gone when both of them delete it.
Right. Coincidentally, I just got back to check the patch for unlink... Walking around indeed does help thinking, even though I guess you got more space than me. ;-)
If either deliver (right before finishing) or the wrapper script deletes the source file, cleaning up will not be an issue.
guenther
Sorry, posting in chunks today.
On Mon, 2008-06-02 at 00:06 +0300, Timo Sirainen wrote:
> I think my previous mail about it described some persistent uniqueness
> checks.
You didn't mention persistence, but uniqueness checks with SHA-1 sums stored in some database. Assuming databases are designed to be dynamic and not constantly growing, the hashes would be removed when the mail is being deleted (for the last time).
Unless it really is about persistence, which is not what the OP requested, option (2) lacks cleaning up.
Just something to keep in mind. The same goes for this patch, where the wrapper script should perform the cleanup, and that needs to be documented.
guenther
Timo Sirainen wrote:
> http://dovecot.org/tmp/deliver-multiple.diff for Dovecot v1.1 implements -p <path> parameter for deliver, which reads the input mail from the specified path instead of stdin. With maildir and hardlink copying enabled, it also tries to hard link the file to destination instead of copying it.
Excellent, this could save lots of disk space.
Does this also work through the Sieve plugin, or is Sieve doing its own writing?
Anders
On Jun 2, 2008, at 10:37 AM, Anders wrote:
> Timo Sirainen wrote:
>> http://dovecot.org/tmp/deliver-multiple.diff for Dovecot v1.1 implements -p <path> parameter for deliver, which reads the input mail from the specified path instead of stdin. With maildir and hardlink copying enabled, it also tries to hard link the file to destination instead of copying it.
> Excellent, this could save lots of disk space.
> Does this also work through the Sieve plugin, or is Sieve doing its own writing?
Works with Sieve too.
Timo Sirainen wrote:
> Works with Sieve too.
So, just to be sure, this patch will solve my problem described here: http://www.dovecot.org/list/dovecot/2008-March/029548.html ?
Thanks, Anders.
On Mon, 2008-06-02 at 13:10 +0200, Anders wrote:
> Timo Sirainen wrote:
>> Works with Sieve too.
> So, just to be sure, this patch will solve my problem described here: http://www.dovecot.org/list/dovecot/2008-March/029548.html ?
It should.
On Sun, 1 Jun 2008, Timo Sirainen wrote:
The SHA-1 idea got me thinking:
If deliver calculates the hash for each mail, it could then hardlink each mail into, say, $HOME{target user}/../.inspool/{sha1_hash} (to support setups where users are on different physical disks). If the hash already exists and the size is equal, hardlink the new recipient's file to it.
This cache would be cleaned by a cron script, e.g. once a day or something like that. Deliver could update the atime to reflect its last hard link or so.
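A hash-keyed spool along these lines might look like the sketch below. It is not something deliver does: the helper name spool_link and its arguments are hypothetical, sha1sum from GNU coreutils is assumed to be installed, and the spool and destination must be on the same filesystem. Instead of only comparing sizes, the sketch compares the full contents with cmp, which is a stricter check than the one proposed.

```sh
#!/bin/sh
# spool_link SPOOL MSG DEST: make DEST a hard link to the spool entry named
# after the SHA-1 of MSG, creating the entry on first sight of that content.
# Assumptions: spool_link is a hypothetical helper; sha1sum is available.
spool_link() {
    spool=$1 msg=$2 dest=$3
    hash=$(sha1sum < "$msg" | cut -d' ' -f1) || return 1
    entry=$spool/$hash

    if [ ! -f "$entry" ]; then
        # First delivery of this content: copy under a temporary name, then rename.
        cp "$msg" "$entry.$$" && mv "$entry.$$" "$entry" || return 1
    elif ! cmp -s "$entry" "$msg"; then
        # Same hash but different bytes: fall back to an ordinary copy.
        cp "$msg" "$dest"
        return
    fi

    # Link the recipient's file to the spool entry and refresh its atime so a
    # cleanup job can tell when the entry was last used.
    ln "$entry" "$dest" && touch -a "$entry"
}
```

A daily cron job could then remove spool entries whose link count is 1 and whose atime is old enough.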
The calculation overhead might be heavy.
But the "-p" approach seems to cry out for LMTP support.
BTW: For some accounts I run a nightly cron script that hardlinks any equal files, because they archive everything. I first collect all mails sorted by size, then really compare them byte-by-byte if one of the mails is newer than one day.
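The nightly job is not shown in the thread, so the following is only a guess at its general shape. It sorts by size as described, but additionally uses the CRC printed by cksum as a tie-breaker so that identical files end up adjacent, and then confirms each pair byte-by-byte with cmp. The "newer than one day" filter is left out, the function name dedup_dir is hypothetical, file names are assumed to contain no newlines, and the -ef test is a widely supported extension rather than strict POSIX.

```sh
#!/bin/sh
# dedup_dir DIR: replace byte-identical regular files under DIR by hard links.
# Assumptions: dedup_dir is a hypothetical helper; no newlines in file names.
dedup_dir() {
    # cksum prints "CRC SIZE NAME"; sort by size first, then by CRC.
    find "$1" -type f -exec cksum {} + |
    sort -k2,2n -k1,1n |
    {
        prev_key= prev_file=
        while read -r crc size file; do
            key=$size:$crc
            if [ "$key" = "$prev_key" ] && cmp -s "$prev_file" "$file"; then
                # Already the same inode: nothing to do.
                [ "$prev_file" -ef "$file" ] && continue
                # Link under a temporary name, then rename over the duplicate,
                # so the duplicate never disappears without a replacement.
                tmp=$file.dedup.$$
                ln "$prev_file" "$tmp" && mv -f "$tmp" "$file"
            else
                prev_key=$key prev_file=$file
            fi
        done
    }
}
```

Files with equal size and CRC but different contents are left untouched, because cmp has the final say.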
Bye,
Steffen Kaiser
On Jun 2, 2008, at 12:44 PM, Steffen Kaiser wrote:
> The SHA1 idea took me:
> If deliver calculates the hash for each mail, then hardlinks each mail into, say, $HOME{target user}/../.inspool/{sha1_hash} (to support if you have your users on different physical disks). If the hash already exists and the size is equal, hardlink the new recipient's file to it.
I also thought about this, but is it ever useful? Isn't the Received:
header always different for different deliveries?
BTW. This also could be done by the wrapper script if really needed.
On Mon, 2 Jun 2008, Timo Sirainen wrote:
> I also thought about this, but is it ever useful? Isn't the Received: header always different for different deliveries?
In sendmail the Received header is added when the mail comes in. Then it is delivered locally, regardless of whether that happens directly or from the queue. When two mails with the same content but different headers come in, a server cannot (or should not) consider the mails the same anyway, IMHO; otherwise the admin loses the ability to backtrack the mail.
So the server should stick to deliveries with more than one recipient. That is why I mentioned LMTP in my last mail. Well, I just remembered something: when the first delivery attempt tempfails, e.g. because of out-of-quota, and the delivery is attempted again later, LMTP won't help to link to the previously delivered mail.
This would be different if headers and body were stored separately, but they are not in Maildir.
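To make the headers-versus-body point concrete, the sketch below hashes a message once as a whole and once without its headers. Two deliveries of the same mail with different Received: headers give different full hashes but the same body hash. The helper names are hypothetical, sha1sum from GNU coreutils is assumed, and the messages are assumed to use plain LF line endings so that the first empty line marks the end of the headers.

```sh
#!/bin/sh
# full_hash FILE: SHA-1 of the whole message, headers included.
full_hash() {
    sha1sum < "$1" | cut -d' ' -f1
}

# body_hash FILE: SHA-1 of everything after the first empty line.
# Assumption: LF line endings, so "^$" matches the header/body separator.
body_hash() {
    sed '1,/^$/d' "$1" | sha1sum | cut -d' ' -f1
}
```

With Maildir each message is a single file, so a shared body hash alone does not allow the two files to be hard linked.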
====
> BTW. This also could be done by the wrapper script if really needed.
Yeah, though that is a "wrapper" script ;) which slows down processing and increases server load.
Actually, I think there are maintenance uses for "deliver -p", so even if the SHA-1 algorithm is present, it would be good to have "-p", too. :)
Yet something else that (slightly) speaks against the SHA-1 variant: if the mail has exactly one local recipient, there is no need to do the SHA-1 hashing. Different MTAs have different detection methods for this case, I guess. E.g. sendmail does not add "(for XYZ)" in the Received header. Here LMTP would help - or the performance-decreasing wrapper :(
Bye,
Steffen Kaiser
participants (4): Anders, Karsten Bräckelmann, Steffen Kaiser, Timo Sirainen