linux
Dima Kim, 2016-07-21 13:03:43

How do I check file hash sums in Linux?

Good afternoon. A directory contains about 500 .txt files, and some of them have similar contents. I need to compare their hash sums to identify the similar files. How can I do this?



5 answers
Kirill Romanov, 2016-07-21
@Djaler

Hash sums match only for absolutely identical files; they cannot be used to detect files that are merely similar.

Dmitry Shitskov, 2016-07-21
@Zarom

You can compare text files by content using the diff utility.
https://www.opennet.ru/man.shtml?topic=diff&catego...
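A minimal sketch of how diff could be applied to the whole directory, assuming bash, GNU diff and file names without newlines; the function name `diff_pairs` is hypothetical. It compares every pair of .txt files and prints how many lines differ, so small numbers point to similar files. With about 500 files this means roughly 125,000 diff runs, so expect it to be slow.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<changed lines> <file1> <file2>" for every
# pair of .txt files in a directory, most similar pairs first.
diff_pairs() {
    local dir=${1:-.}
    local files=("$dir"/*.txt)
    local i j n count
    n=${#files[@]}
    for ((i = 0; i < n; i++)); do
        for ((j = i + 1; j < n; j++)); do
            # In diff's default output, lines starting with "<" or ">"
            # exist in only one of the two files.
            count=$(diff -- "${files[i]}" "${files[j]}" | grep -c '^[<>]')
            printf '%s %s %s\n' "$count" "${files[i]}" "${files[j]}"
        done
    done | sort -n
}
```

Usage: `diff_pairs /path/to/dir | head` shows the closest pairs.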

Vladimir Martyanov, 2016-07-21
@vilgeforce

You can use fuzzy hashes, such as ssdeep.
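A minimal sketch, assuming the ssdeep command-line tool is installed on your system; the wrapper name `fuzzy_matches` is hypothetical. ssdeep computes context-triggered piecewise hashes, and its `-d` mode compares the given files against each other, printing each similar pair with a score from 0 to 100. Very small files may not produce useful matches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report pairs of similar .txt files in a directory
# using ssdeep fuzzy hashes. Requires the ssdeep CLI.
fuzzy_matches() {
    local dir=${1:-.}
    # -d: compare the listed files against each other; matches are printed
    #     as "FILE1 matches FILE2 (SCORE)".
    ssdeep -d "$dir"/*.txt
}
```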

Vladimir Kuts, 2016-07-21
@fox_12

Be specific about what you mean by "similar".
Similar in meaning and identical except for, say, the number of spaces are two different things.
Hash sums can only detect exact matches.
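If "similar" means "identical except for whitespace", one workaround is to normalize each file before hashing. A minimal sketch, assuming bash and GNU coreutils; the function names and the choice to collapse every run of whitespace (including newlines) into a single space are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hash a file after collapsing every run of whitespace
# (spaces, tabs, newlines) into one space and trimming both ends, so files
# that differ only in spacing get the same SHA-256 sum.
normalized_hash() {
    tr -s '[:space:]' ' ' < "$1" | sed 's/^ //; s/ $//' | sha256sum | cut -d ' ' -f 1
}

# Hypothetical helper: print "<normalized hash>  <file>" for every .txt file
# in a directory, sorted so files with equal hashes end up adjacent.
normalized_hashes() {
    local f
    for f in "${1:-.}"/*.txt; do
        [ -f "$f" ] || continue
        printf '%s  %s\n' "$(normalized_hash "$f")" "$f"
    done | sort
}
```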

vaut, 2016-07-21
@vaut

You can compute the hash sum with md5sum (fast, but collisions are possible) or sha256sum (considered reliable). Other algorithms are available, but they are less commonly used.
Hash sums match only when files are completely identical: change a single bit and the hash is completely different. MD5 collisions (two different files producing the same hash) are possible, but the chance of running into one by accident is vanishingly small.
A quick-and-dirty way to do it:
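To see this in practice, hash two strings that differ in exactly one bit (ASCII "o" is 0x6F and "O" is 0x4F) and compare the results. A minimal sketch, assuming GNU coreutils; the function name `hash_of` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the SHA-256 hex digest of a string
# (no trailing newline is hashed).
hash_of() {
    printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# The inputs differ in a single bit, yet the digests look unrelated.
hash_of hello
hash_of hellO
```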

find . -type f -name "*.txt" -exec sha256sum {} + | sort | uniq -D -w 64
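The same pipeline wrapped in a reusable function, as a sketch assuming GNU find, sort and uniq (`uniq -w` and `--all-repeated` are GNU extensions). A SHA-256 digest is 64 hex characters, so `-w 64` compares only the hash part of each line; with md5sum, whose digests are 32 characters, the width would be 32. `--all-repeated=separate` prints each group of identical files separated by a blank line. The function name `find_duplicates` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list groups of byte-for-byte identical .txt files
# under a directory. Each output line is "<sha256>  <path>"; groups are
# separated by blank lines.
find_duplicates() {
    local dir=${1:-.}
    # Hash every file (-exec ... {} + batches many files per sha256sum call),
    # sort so identical digests are adjacent, then keep only lines whose
    # first 64 characters (the hex digest) repeat.
    find "$dir" -type f -name '*.txt' -exec sha256sum {} + \
        | sort \
        | uniq -w 64 --all-repeated=separate
}
```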
