On Thu, 2006-06-01 at 09:10 -0400, Tore André Klock wrote:
Timo Sirainen wrote:
I think that could anyway be a good idea, but how about hash collisions? I could just ignore that since they would practically never happen. Hash plus attachment size would be even safer. The only truly safe way would be to read the whole attachment from disk and compare it byte-by-byte, but that'd just slow it down unnecessarily. Perhaps it should be an option.
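The size check and the optional byte-by-byte comparison could look roughly like the sketch below. It assumes the checksums have already been matched by the caller; the function name `same_attachment` and the `verify_bytes` switch are made up for illustration and are not an existing Dovecot setting.

```python
import filecmp
import os


def same_attachment(new_path, stored_path, verify_bytes=False):
    """Decide whether two files can be treated as the same attachment.

    Assumption: the caller has already found that the checksums match.
    This adds the cheap size check and, if requested, the full read.
    """
    # Cheap check first: different sizes can never be the same attachment.
    if os.path.getsize(new_path) != os.path.getsize(stored_path):
        return False
    if verify_bytes:
        # shallow=False makes filecmp compare the actual file contents
        # instead of trusting the os.stat() signatures.
        return filecmp.cmp(new_path, stored_path, shallow=False)
    return True
```

With `verify_bytes` left off, the stored attachment is never read, which is the fast path discussed above; turning it on trades that speed for certainty.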
Collisions won't be a big problem if you use something like SHA, but it would be slow. You have to generate a checksum for both sides of the comparison, meaning you have to generate it at least once per message. Generating it always means reading every byte of it.
The delivered mail's every byte has to be read anyway, and for the stored attachment the filename would already contain the checksum. I don't think it takes too much extra time to calculate the attachment's checksum while it's being read.
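As a rough illustration of hashing the attachment in the same pass that reads it, here is a minimal sketch. The `store_attachment` helper, the use of SHA-1, and the `<checksum>-<size>` file name layout are all assumptions made for the example, not a description of how any real implementation names its files.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 64 * 1024


def store_attachment(stream, store_dir):
    """Copy stream into store_dir, hashing it while it is being read.

    Returns (path, was_new). The "<sha1 hex>-<size>" name is a
    hypothetical layout chosen for this sketch.
    """
    digest = hashlib.sha1()
    size = 0
    fd, tmp_path = tempfile.mkstemp(dir=store_dir)
    try:
        with os.fdopen(fd, "wb") as tmp:
            while True:
                chunk = stream.read(CHUNK_SIZE)
                if not chunk:
                    break
                # One read feeds both the checksum and the copy on disk.
                digest.update(chunk)
                size += len(chunk)
                tmp.write(chunk)
        final_path = os.path.join(store_dir, f"{digest.hexdigest()}-{size}")
        if os.path.exists(final_path):
            # The name already carries the checksum, so the stored
            # attachment does not have to be read to detect a duplicate.
            os.unlink(tmp_path)
            return final_path, False
        os.rename(tmp_path, final_path)
        return final_path, True
    except BaseException:
        # Do not leave a half-written temporary file behind.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Storing the same bytes twice returns the same path, with `was_new` set to `False` the second time.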
A better solution might be for the LDA to detect which messages are being delivered locally to more than one user.
That's what I was originally thinking instead of checksums.
I could then make the message file shared (in the case of Maildir anyway), for example by hard-linking the files. The message would then exist on disk in one copy until every client has removed it (bringing the link count to 0).
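The hard-linking idea can be sketched as follows. This assumes all recipients' Maildirs live on the same filesystem (hard links cannot cross filesystems); `deliver_shared` and the paths passed to it are hypothetical names used only for this example.

```python
import os


def deliver_shared(tmp_path, maildirs, filename):
    """Hard-link one message file into each Maildir's new/ directory.

    tmp_path is the already written message; every recipient gets a
    directory entry pointing at the same inode, so no data is copied.
    """
    dests = []
    for maildir in maildirs:
        dest = os.path.join(maildir, "new", filename)
        os.link(tmp_path, dest)
        dests.append(dest)
    # Drop the temporary name. The data stays on disk for as long as
    # at least one recipient still has a link to it.
    os.unlink(tmp_path)
    return dests
```

After delivery to two users, `os.stat(path).st_nlink` reports 2 for either copy. Each user's delete lowers the count by one, and the filesystem frees the data when the last link is removed.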
One problem that I see with hard-linking Maildir files is that you then can't have a separate Delivered-To (or similar) header for each user. I don't know if that's a real problem though.