On 24/08/2010 16:48, Timo Sirainen wrote:
> Current implementation checks how many hard links are left for the hash while deleting it. If it's deleting the last reference then the final hashes/hash file is also deleted.
I sense an interesting race possibility here?
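To make the race concrete, here is a minimal Python sketch of a delete path as I understand the description above. The layout (a hash file that every attachment with that hash is hard linked to) and the name `delete_attachment` are my assumptions for illustration, not Dovecot's actual code.

```python
import os

def delete_attachment(attachment_path, hash_path):
    """Remove one attachment link; drop the hash file with the last one.

    Assumed (hypothetical) layout: hash_path and every attachment with
    that hash are hard links to the same inode.
    """
    os.unlink(attachment_path)
    # After the unlink above, a link count of 1 means that only the
    # hash file itself still points at the data.
    if os.stat(hash_path).st_nlink == 1:
        # RACE WINDOW: nothing stops another process from calling
        # os.link(hash_path, new_attachment) between the stat() above
        # and the unlink() below.
        os.unlink(hash_path)
```

If a delivery links to the hash file inside that window, the link succeeds but the hash entry then disappears underneath it; if it arrives just after the unlink, `os.link` fails because the hash file no longer exists. Neither case loses data, but the writer has to cope with both.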
> The hash is already a full hash of the message. I don't really like the idea of trusting that a hash is unique.
If SHA-1 becomes breakable in sensible time then you have a whole host of other attack vectors right now. I believe your Mercurial repo is using SHA-1 hashes to detect tampering, for example? (Also SSL, TLS, PGP, SSH and a bunch of other rarely used applications...)
At the moment SHA-256 is considered "good enough for the US government". SHA-3 should be out in a couple of years.
> Especially because this could be attacked against. Someone could read another user's attachment if they only knew its hash and then were able to create another file with the same hash and send it to themselves in the same system.
I can't argue that unknown security issues won't be found, because you can only talk about the known ones by definition...
That said, I don't see that you can ever solve the de-duplicating problem if you don't trust your hash algorithm? At some point you are going to bite the bullet and say that attachments A and B have the same hash, so let's hard link them together? At that point you are vulnerable to someone disrupting your system if they can figure out how to generate attachments with arbitrary hashes?
At the moment I would claim that you are just automatically generating a very complicated filename. If you never trust your hash then you might as well simply use one of the existing GUID algorithms instead; if you trust your hash then you use that. I don't really see the point of a halfway house?
> I might make Dovecot trust the hash optionally anyway, but not unconditionally.
I don't really see how you can get around trusting the hashes at some point if you are de-duping?
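For the sake of argument, this is roughly what I picture "trust the hash optionally" to mean. It is only a sketch in Python: `store_attachment`, the `trust_hash` flag and the flat `hashes_dir` layout are names I made up, not anything Dovecot has. When the hash is not trusted, the only alternative I can see is a full byte comparison before linking.

```python
import filecmp
import hashlib
import os

def store_attachment(tmp_path, hashes_dir, dest_path, trust_hash=False):
    """Move a newly written attachment into place, hard linking duplicates.

    Returns True if the attachment was de-duplicated against an
    existing file. All paths must be on the same filesystem.
    """
    with open(tmp_path, "rb") as f:
        # Reads the whole file into memory; fine for a sketch only.
        digest = hashlib.sha1(f.read()).hexdigest()
    hash_path = os.path.join(hashes_dir, digest)

    if os.path.exists(hash_path):
        # Either trust the hash, or pay for a byte-by-byte comparison.
        if trust_hash or filecmp.cmp(tmp_path, hash_path, shallow=False):
            os.link(hash_path, dest_path)
            os.unlink(tmp_path)
            return True
        # Same hash, different bytes: keep it as a separate file.
        os.rename(tmp_path, dest_path)
        return False

    # First attachment with this hash: publish it under hashes_dir too.
    os.rename(tmp_path, dest_path)
    os.link(dest_path, hash_path)
    return False
```

With `trust_hash=False` a collision cannot expose someone else's attachment, at the cost of reading the existing file again on every duplicate.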
SHA-1 will become breakable at some point for certain. I don't think that makes trusting SHA-1 hashes useless though. Various programming techniques can still be used to push out the life of this technique quite a bit further. For example:
- Compute relatively cheap secondary hash, eg even CRC32. Causing a
  collision in two hashes is likely to be more difficult than a single hash
- Check attachment length. Likely this will make it harder to generate a
  collision
- You already commented that it's reasonably hard to access the hash in the
  first place (caveat idiots like me...)
- Use SHA-2 or some other hash, as of right now there are no attacks against
  SHA-2, likely it has a few years life..?
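The first, second and last suggestions could be combined into a single de-duplication key, something like the Python sketch below. The function name `attachment_key` and the key format are purely illustrative, and how much extra safety the CRC and length really buy is exactly the open question.

```python
import hashlib
import zlib

def attachment_key(data: bytes, algorithm: str = "sha1") -> str:
    """Build a de-dup key from a primary hash, a CRC32 and the length.

    An attacker would need a second attachment that matches on all
    three parts. Pass algorithm="sha256" to use SHA-2 instead.
    """
    primary = hashlib.new(algorithm, data).hexdigest()
    crc = zlib.crc32(data) & 0xFFFFFFFF  # force an unsigned 32-bit value
    return f"{primary}-{crc:08x}-{len(data)}"
```

Two attachments would then only be hard linked together when the whole key matches, eg `attachment_key(a) == attachment_key(b)`.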
Just a thought?
Cheers
Ed W