16 Aug 2024
7:27 a.m.
Pedro Ribeiro wrote:
Oops, we are using SIS; I guess the solution for a similar optimization will be a natively deduplicating filesystem.
Given the current hash-based folder structure, a filesystem without native deduplication is fine. Just:
- Switch to a hash with no known practical collision attack (i.e. not SHA-1).
- Run a cron job that hard-links together all files whose filenames contain the same hash.
This kind of "skip the byte-by-byte comparison" thinking has been around since the initial implementation of SIS1, but for some reason it was never added.
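A minimal sketch of that cron job, in Python. It assumes (hypothetically) that each attachment file's name starts with its hex SHA-256 hash followed by `-`; the real SIS naming layout may differ, and `link_by_filename_hash` is an illustrative name, not part of any existing tool. Because it trusts the hash in the filename, it never reads file contents, which is only safe with a hash that has no practical collision attack.

```python
import os
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming convention: "<64 hex chars of SHA-256>-<anything>".
HASH_PREFIX = re.compile(r"^([0-9a-f]{64})-")


def link_by_filename_hash(root: str) -> int:
    """Hard-link files under `root` whose names share the same hash prefix.

    No byte comparison is done: the filename hash is trusted.
    Returns the number of files replaced by a hard link.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            m = HASH_PREFIX.match(name)
            if m:
                groups[m.group(1)].append(Path(dirpath) / name)

    replaced = 0
    for paths in groups.values():
        paths.sort()  # deterministic choice of the file that is kept
        keeper = paths[0]
        k = keeper.stat()
        for dup in paths[1:]:
            st = dup.stat()
            if (st.st_dev, st.st_ino) == (k.st_dev, k.st_ino):
                continue  # already hard-linked to the keeper
            if st.st_dev != k.st_dev:
                continue  # hard links cannot cross filesystems
            # Link under a temporary dot-name (which HASH_PREFIX will not match),
            # then atomically rename it over the duplicate so the path never vanishes.
            tmp = dup.with_name("." + dup.name + ".linktmp")
            os.link(keeper, tmp)
            os.replace(tmp, dup)
            replaced += 1
    return replaced
```

Running it a second time is a no-op, since every file in a group already shares the keeper's inode.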
I currently do a much, much reduced version of that: I just run a daily cron job that gets jdupes to do its byte-by-byte comparison over the attachment directory. Duplicates get hard-linked.
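For comparison, here is a simplified Python sketch of the same idea: compare candidate files byte by byte, then hard-link the identical ones. This is not jdupes itself; `link_identical_files` is a hypothetical name, and the quadratic comparison within each size group is a simplification that jdupes avoids with hashing.

```python
import filecmp
import os
from collections import defaultdict
from pathlib import Path


def link_identical_files(root: str) -> int:
    """Hard-link byte-identical regular files under `root`. Returns links made."""
    # Only files on the same device with the same size can be linked duplicates.
    candidates: dict[tuple[int, int], list[Path]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            p = Path(dirpath) / name
            if p.is_symlink() or not p.is_file():
                continue
            st = p.stat()
            candidates[(st.st_dev, st.st_size)].append(p)

    replaced = 0
    for paths in candidates.values():
        paths.sort()
        keepers: list[Path] = []  # one representative per distinct content
        for p in paths:
            for keeper in keepers:
                if os.path.samefile(keeper, p):
                    break  # already the same inode
                # shallow=False forces a full byte-by-byte comparison.
                if filecmp.cmp(keeper, p, shallow=False):
                    tmp = p.with_name("." + p.name + ".linktmp")
                    os.link(keeper, tmp)
                    os.replace(tmp, p)  # atomic swap to the hard link
                    replaced += 1
                    break
            else:
                keepers.append(p)  # new distinct content
    return replaced
```

With jdupes itself, the daily job is roughly `jdupes -r -L <attachment dir>` (`-r` recurses, `-L` hard-links duplicates); check the exact flags against the installed version's `--help`.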
PS: I swear the documentation for sis-queue is wrong.
Sincerely, Mingye Wang