Hashing
happycodecom, 2015-03-04 20:47:50

How can you guarantee the uniqueness of the hash string for a particular file?

While developing a file hosting service, I ran into the question of how to find duplicate files quickly.
Each hashing algorithm has a chance of collisions due to a limitation on the length of the generated string.
Comparing file contents is expensive at large volumes, and different files can have the same SHA-1 / MD5 sums.
Could I generate a longer string by combining two, three, or more algorithms?
What is the best way to do this?


2 answers
MiiNiPaa, 2015-03-04
@MiiNiPaa

Each hashing algorithm has a chance of collisions due to a limitation on the length of the generated string.
This is an inherent property of hashing itself. If you want collisions to be impossible in principle, the hash must be at least as long as the original file.
The chance of an accidental collision is so small that you do not even need to think about it.
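
A rough way to put a number on this is the birthday bound. It assumes the hash behaves like a uniformly random function and that nobody is deliberately crafting collisions. The symbols n (number of stored files) and b (digest length in bits) are introduced here only for illustration:

```latex
P(\text{collision}) \approx \frac{n(n-1)}{2 \cdot 2^{b}} \approx \frac{n^{2}}{2^{b+1}}
```

For SHA-1, b = 160, so even with n = 10^9 files the estimate is about 10^18 / 2^161, which is roughly 3 × 10^-31.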
In practice, it is enough to store a sufficiently long hash (even SHA-1 will do) plus, perhaps, the file length as a preliminary uniqueness check (before hashing).
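
The size-then-hash check described above could be sketched roughly like this in Python. This is only an illustration under stated assumptions: the helper names `file_digest` and `find_duplicates` are made up for the example, and SHA-256 is used here as one possible "sufficiently long hash" (the answer itself says SHA-1 would also do).

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()  # assumption: SHA-256 chosen for the example
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Group paths whose size and digest both match (hypothetical helper)."""
    # Preliminary check: files of different length cannot be duplicates.
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    by_key = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue  # unique size, so no need to hash this file
        for path in group:
            by_key[(size, file_digest(path))].append(path)

    # Keep only groups that actually contain more than one file.
    return [group for group in by_key.values() if len(group) > 1]
```

In a real hosting service the (size, digest) pair would more likely live in a database index and be checked on upload, but the idea is the same: hash only when the length already matches something stored.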

Vladimir Martyanov, 2015-03-04
@vilgeforce

SHA-1 is unique enough. MD5 collisions can be found in minutes; I have not heard of any for SHA-1, let alone SHA-512.
