C#. Which checksum algorithm to choose?
Task: compare two files and determine whether they are copies of each other.
File types are not restricted (most of them are project files from various IDEs, documents, etc.). The number of files is not restricted either (tested on 10,000 files).
As I understand it, the best solution to this problem would be to compute checksums and compare them. The main criterion is speed. Which algorithm should I choose?
P.S. I tried CRC32 and MD5. MD5 turned out to be about 2 times faster, but I think my implementation of CRC32 was not the best...
Neither MD5 nor CRC32 guarantees the absence of collisions, so it is incorrect to rely on checksums alone. At a minimum, also compare the file sizes, and do that first.
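To illustrate the "size first" advice, here is a minimal sketch that rejects files of different length immediately and only then compares the contents byte by byte, which rules out collisions entirely. The class name FileComparer, the method name AreIdentical, and the 64 KB buffer size are assumptions made for this example, not something from the question.

```csharp
using System;
using System.IO;

public static class FileComparer
{
    // Returns true only if both files have the same length and identical bytes.
    public static bool AreIdentical(string pathA, string pathB)
    {
        // Cheapest check first: files of different sizes cannot be copies.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
            return false;

        const int BufferSize = 64 * 1024; // assumed value, tune as needed
        var bufA = new byte[BufferSize];
        var bufB = new byte[BufferSize];

        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
        {
            while (true)
            {
                int readA = FillBuffer(a, bufA);
                int readB = FillBuffer(b, bufB);
                if (readA != readB) return false; // a file changed size while reading
                if (readA == 0) return true;      // both streams ended, no difference found
                for (int i = 0; i < readA; i++)
                    if (bufA[i] != bufB[i]) return false;
            }
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the end of the stream is reached.
    private static int FillBuffer(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

For a single pair of files this is likely enough on its own; checksums start to pay off when many files have to be compared against each other.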
In fact, I would choose an algorithm that you do not have to implement by hand, because the task is to compare two files, not to write a checksum calculation.
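As a sketch of what "no hand-written implementation" could look like, .NET ships MD5 in System.Security.Cryptography, so the hash of a file can be computed in a few lines. The class name FileHasher and the method name ComputeMd5Hex are made up for this example.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileHasher
{
    // Computes the MD5 hash of a file with the built-in implementation
    // and returns it as a lowercase hexadecimal string.
    public static string ComputeMd5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            // ComputeHash reads the stream to the end, so the file
            // does not need to be loaded into memory as a whole.
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}
```

Two files with different hashes are certainly different; two files with equal hashes are only very probably the same.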
A table naturally suggests itself here, with the columns
"full file name"
"CRC"
"MD5"
and if the task does not rule out SQL, the duplicate files can easily be found with
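-- file_hashes is a placeholder name for the table described above ("table" itself is a reserved word)
select * from file_hashes where MD5 in (
    select MD5 from file_hashes group by MD5 having count(*) > 1
)
order by MD5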
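If a database is not available, roughly the same grouping can be done in memory with LINQ. The sketch below is an assumption-laden equivalent of the query above: it groups by file size first (so files with a unique size are never hashed), then by MD5, and returns only groups with more than one member. DuplicateFinder, FindCandidateDuplicates, and Md5Hex are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DuplicateFinder
{
    // Returns groups of paths that share both size and MD5 hash.
    // These are candidate duplicates: a final byte-by-byte comparison
    // is still needed to rule out a hash collision.
    public static List<List<string>> FindCandidateDuplicates(IEnumerable<string> paths)
    {
        return paths
            .GroupBy(p => new FileInfo(p).Length)          // cheap pre-filter by size
            .Where(sizeGroup => sizeGroup.Count() > 1)      // a unique size means no copy exists
            .SelectMany(sizeGroup => sizeGroup.GroupBy(p => Md5Hex(p)))
            .Where(hashGroup => hashGroup.Count() > 1)      // same as "having count(*) > 1"
            .Select(hashGroup => hashGroup.OrderBy(p => p, StringComparer.Ordinal).ToList())
            .ToList();
    }

    // Hashes one file with the built-in MD5 implementation.
    private static string Md5Hex(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }
}
```

The size pre-filter matters for speed, the stated main criterion: on a typical set of project files and documents, many files have a size no other file shares and can be skipped without reading them at all.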