linux
Printip, 2016-04-27 16:46:13

Why is diff not working?

There is a file a.txt containing 10 million records.
There is a file b.txt containing 100 million records.
Each record is on its own line.
Task: a.txt minus b.txt
"Solution":
diff a.txt b.txt > c.txt
cat c.txt | grep ">" > result.txt
wc result.txt reports 100 million records,
when it should have been 90 million records.
What am I doing wrong?


4 answer(s)
Printip, 2016-04-27
@Printip

Found the problem: the b.txt file had to be sorted first. After sorting, everything worked.
cat b.txt | sort
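
To illustrate why sorting changes the result: diff compares files line by line in order, so the same records in a different order are reported as changes. Below is a minimal sketch on made-up scratch files (the contents and the b.sorted name are hypothetical, not from the original post). Lines starting with '>' are the ones diff attributes to the second file only.

```shell
# Hypothetical scratch files; names and contents are made up for illustration.
tmp=$(mktemp -d)
printf '%s\n' a b c   > "$tmp/a.txt"   # already in sorted order
printf '%s\n' c a b d > "$tmp/b.txt"   # same records plus "d", but unsorted

# Sort b.txt; LC_ALL=C pins the collation so the order is predictable.
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b.sorted"

# Count the '>' lines diff produces before and after sorting.
unsorted_count=$(diff "$tmp/a.txt" "$tmp/b.txt"    | grep -c '^>')
sorted_count=$(diff "$tmp/a.txt" "$tmp/b.sorted" | grep -c '^>')

# With sorted input only the genuinely extra record "d" is reported;
# with unsorted input the out-of-place "c" is reported as well.
echo "unsorted: $unsorted_count  sorted: $sorted_count"
rm -r "$tmp"
```

Note that '>' marks records found only in the second file, so counting those lines measures b.txt minus a.txt rather than a.txt minus b.txt.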

Vladimir Kuts, 2016-04-27
@fox_12

Does this command give the correct result?
diff -w a.txt b.txt | grep -e '^>' | wc -l

Slava Kryvel, 2016-04-27
@kryvel

I have nothing to test this on right now, but you could do it differently.
If memory serves:
grep -v -f b.txt a.txt
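
A small sketch of the grep approach on made-up scratch files (the contents are hypothetical). One caveat: by default grep treats each line of b.txt as a regular expression and matches substrings, so the sketch adds -F (fixed strings) and -x (whole-line match), which is an assumption about what "record" means here. With a pattern file of 100 million lines this may also need a lot of memory.

```shell
# Hypothetical scratch files standing in for a.txt and b.txt.
tmp=$(mktemp -d)
printf '%s\n' apple banana cherry a.c > "$tmp/a.txt"
printf '%s\n' banana abc app          > "$tmp/b.txt"

# -f FILE : read the patterns from FILE (one per line)
# -v      : print only lines that match none of the patterns
# -F      : treat patterns as fixed strings, not regular expressions
# -x      : a pattern must match the whole line
grep_result=$(grep -F -x -v -f "$tmp/b.txt" "$tmp/a.txt")

# Prints apple, cherry and a.c, one per line; only banana is removed.
printf '%s\n' "$grep_result"
rm -r "$tmp"
```

Without -F and -x, the pattern app would also match apple as a substring and drop it from the result.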

jcmvbkbc, 2016-04-27
@jcmvbkbc

If the records can be sorted, try comm:
comm -2 -3 <(sort a.txt) <(sort b.txt)
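
For reference, a sketch of the same idea on made-up scratch files (contents and the .sorted names are hypothetical). It writes the sorted copies to temporary files instead of using <( ), which also works in shells without process substitution. comm prints three columns: lines only in the first file, lines only in the second, and lines in both; -2 and -3 suppress the last two.

```shell
# Hypothetical scratch files standing in for a.txt and b.txt.
tmp=$(mktemp -d)
printf '%s\n' pear apple fig banana      > "$tmp/a.txt"
printf '%s\n' kiwi banana apple lime plum > "$tmp/b.txt"

# comm expects both inputs sorted with the same collation,
# so sort and compare under one fixed locale.
LC_ALL=C sort "$tmp/a.txt" > "$tmp/a.sorted"
LC_ALL=C sort "$tmp/b.txt" > "$tmp/b.sorted"

# Keep only column 1: records present in a.txt but not in b.txt.
comm_result=$(LC_ALL=C comm -2 -3 "$tmp/a.sorted" "$tmp/b.sorted")

# Prints fig and pear, one per line.
printf '%s\n' "$comm_result"
rm -r "$tmp"
```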

There is a file a.txt containing 10 million records.
There is a file b.txt containing 100 million records.
Each record is on its own line.
Task: a.txt minus b.txt
...
should have been 90 million records.

Either you have mixed up the line counts of the two files, or you actually want b.txt minus a.txt. As the task is stated, the result cannot contain more than 10 million records.
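
The bound is easy to check on a toy example: a set difference can never have more records than the file being subtracted from. A minimal sketch with made-up, already sorted scratch files (names and contents are hypothetical):

```shell
# Hypothetical pre-sorted scratch files.
tmp=$(mktemp -d)
printf '%s\n' apple banana fig pear       > "$tmp/a.sorted"   # 4 records
printf '%s\n' apple banana kiwi lime plum > "$tmp/b.sorted"   # 5 records

# $(( ... )) strips the padding some wc implementations add.
# comm -2 -3 keeps records only in the first file (a minus b).
a_minus_b=$(( $(LC_ALL=C comm -2 -3 "$tmp/a.sorted" "$tmp/b.sorted" | wc -l) ))
# comm -1 -3 keeps records only in the second file (b minus a).
b_minus_a=$(( $(LC_ALL=C comm -1 -3 "$tmp/a.sorted" "$tmp/b.sorted" | wc -l) ))

# Prints: a - b: 2   b - a: 3
echo "a - b: $a_minus_b   b - a: $b_minus_a"
rm -r "$tmp"
```

Here a minus b has 2 records (at most the 4 in a), while b minus a has 3 (at most the 5 in b).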
