PHP
Vernyj, 2017-02-10 01:31:00

How to remove duplicate lines in a large list of files?

Good evening. I have 4,700 files, each about 2 KB, with no file extension. I have searched all over the Internet but have not found a program or a simple script where I can specify a line number, check the files for duplicates by that line, and then delete one copy of each duplicate.
Thank you very much in advance!

abcd0x00, 2017-02-13
@Vernyj

Write a function that takes the paths to two files and a line number, compares the two files at that line, and returns true/false (the lines are equal / not equal). Once you have this function, feed every pair of files into it. If it returns true, delete the second file and move on to the next one.
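A minimal Python sketch of that comparison function (the answer names no language, and the question is tagged PHP; Python is used here only for illustration). The names `read_line` and `same_line` are hypothetical. It assumes line numbers start at 1, the files are text (read as UTF-8, with undecodable bytes replaced), and a file with fewer lines than requested never counts as a duplicate.

```python
from itertools import islice


def read_line(path, line_no):
    """Return line `line_no` (1-based) of a text file without its line ending,
    or None if the file has fewer lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        # Skip the first line_no - 1 lines and take the next one, if any.
        line = next(islice(f, line_no - 1, None), None)
    return None if line is None else line.rstrip("\r\n")


def same_line(path_a, path_b, line_no):
    """True if both files have a line `line_no` and those lines are identical."""
    a = read_line(path_a, line_no)
    b = read_line(path_b, line_no)
    return a is not None and a == b
```

For example, `same_line("file1.txt", "file2.txt", 3)` corresponds to `func(file1.txt, file2.txt, 3)` in the answer.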
For the files file1.txt, file2.txt, file3.txt and file4.txt, the calls look like this:
func(file1.txt, file2.txt, 3)
func(file1.txt, file3.txt, 3)
func(file1.txt, file4.txt, 3)
func(file2.txt, file3.txt, 3)
func(file2.txt, file4.txt, 3)
func(file3.txt, file4.txt, 3)
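The call list is simply every unordered pair of files, with each file paired with every file after it. In Python, `itertools.combinations` yields the pairs in exactly this order; `comparison_pairs` below is a hypothetical helper used only to reproduce the calls from the example.

```python
from itertools import combinations


def comparison_pairs(files):
    """Each file paired with every file after it, in the order of the calls above."""
    return list(combinations(files, 2))


files = ["file1.txt", "file2.txt", "file3.txt", "file4.txt"]
for left, right in comparison_pairs(files):
    print(f"func({left}, {right}, 3)")
```

The number of calls grows quadratically: for n files there are n(n - 1)/2 pairs, so the 4,700 files from the question need 4700 × 4699 / 2 = 11,042,650 comparisons.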
So the calls form a loop within a loop: the outer loop iterates over the left files, the inner loop iterates over the right files, and a right file is deleted if it duplicates the left one.
Of course, there is a catch: if a file has already been deleted as a right file, it must not then be checked as a left file against the other right files. To avoid confusion, don't delete the right files during the loop; just record their paths somewhere. Only after all the comparisons are done, go through those paths and delete the files.
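Putting it together, here is a hedged Python sketch of the whole procedure with deferred deletion, under the same assumptions as before (1-based line number, text files, and a file too short to have that line is never a duplicate). `find_duplicates` and `delete_files` are hypothetical names. One deviation from the answer: instead of reopening both files on every comparison, it reads the chosen line of each file once up front; the nested loop and the "record first, delete later" rule are as described.

```python
import os
from itertools import islice


def read_line(path, line_no):
    """Line `line_no` (1-based) of a text file without its line ending, or None if the file is shorter."""
    with open(path, encoding="utf-8", errors="replace") as f:
        line = next(islice(f, line_no - 1, None), None)
    return None if line is None else line.rstrip("\r\n")


def find_duplicates(paths, line_no):
    """Nested-loop search: return the paths of right files whose line matches an earlier left file."""
    lines = [read_line(p, line_no) for p in paths]  # read each file once
    marked = set()  # indices of files recorded as duplicates (to delete later)
    for i in range(len(paths)):
        if i in marked or lines[i] is None:
            continue  # a file already recorded for deletion is never used as a left file
        for j in range(i + 1, len(paths)):
            if j not in marked and lines[j] == lines[i]:
                marked.add(j)
    return [paths[j] for j in sorted(marked)]


def delete_files(paths, dry_run=True):
    """Delete the recorded paths, only after all comparisons are finished.
    With dry_run=True it just prints what would be removed."""
    for p in paths:
        print(("would delete " if dry_run else "deleting ") + str(p))
        if not dry_run:
            os.remove(p)
```

A possible call, assuming the files sit alone in a directory named `files`: `delete_files(find_duplicates(sorted(pathlib.Path("files").iterdir()), 3))` first prints what would be removed; pass `dry_run=False` to actually delete. Which copy survives depends on the order of the list: the earliest file of each group of duplicates is kept.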
