Registriert seit: 3. Sep 2023
346 Beiträge
|
AW: Optimaler Hash-Algorithmus und Strategie für Dateivergleiche, Verzeichnisbaum
3. Mai 2024, 19:19
Hi,
Sorry if i am missing the subject,
I would like to suggest to skip any multi parameters, and use concatenation, example: MD5 could be more than enough and as "Delphi.Narium" mentioned use the file size this will make collision probability a lot smaller, but prefix the hash with the (aligned to 4 byte by zeros) size like this :
the MD5 hash for "This is MD5" = f6eda1d8b4b2dba89938db14285cf78a
Length("This is MD5") = 11 then your custom hash will be
0000000bf6eda1d8b4b2dba89938db14285cf78a
The advantage here :
1) removing the need for multiple parameters.
2) reduce the collision chance.
3) in its binary (non hex format) it can be used in Trees or DBs .. with only 20 bytes length.
4) if you have files size that need 64bit then use the lower 32bit only and it will be fine, this doesn't affect the collision chance, if there is too many big files then use 5 bytes for the file size, but this really not needed.
5) and as Stefan did point, use an optimized implementation for MD5.
Kas
|