linux
Sergey, 2015-11-03 18:40:09

How do I compare two files with grep?

There are 2 files: one has a million records, the other 500 million.
Both files have the same structure but different data: seven comma-separated columns, $1,$2,$3,$4,$5,$6,$7, for example:
xxxxxx,xxxxxx,xxxxxx,xxxxxx,xxxxx,xx xxxxxxx, xxxxxx,xxxxxx,xxxxxx,xxxxxx,xxxxx,xx
I need to find the lines in the second file that have the same $1 and $3 as a line in the first file but a different $6, and write each such line in full to a file. In the first file $6 is always q1; in the second, $6 is q1, q2 or q3. Column $1 always starts with 9.
I have a solution, but the script takes a week to run:

nano /tmp/comand
# filter file 2, dropping the lines with $6=q1
grep -E "^9.*q2|^9.*q3" /tmp/file2.txt > /tmp/file22.txt
# turn each line into a grep command that searches file 1 for that line's columns 1 and 3
cat /tmp/file22.txt | awk -F"," '{print "grep -E \"" $1 ".*" $3 ".*q2" "|" $1 ".*" $3 ".*q3" "\"" " /tmp/file1.txt >> /tmp/results.txt"}' > /tmp/comand
# empty the results file
cat /dev/null > /tmp/results.txt

I save this as /tmp/comand, make it executable with chmod +x /tmp/comand, and run it with /tmp/comand.
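The pre-filter above is a single pass and cheap; the slow part is the generated /tmp/comand, which launches one grep per line of /tmp/file22.txt, and each of those greps rescans all of file 1. The regex is also looser than intended: ^9.*q2 matches q2 in any column, not only in $6. A field-exact version of the filter in awk, as a minimal sketch (the function name filter_not_q1 is hypothetical, and it assumes no field contains an embedded comma):

```shell
#!/bin/sh
# filter_not_q1 FILE (hypothetical helper): keep lines whose $1 starts with 9
# and whose 6th comma-separated field is anything other than q1.
filter_not_q1() {
    awk -F',' '$1 ~ /^9/ && $6 != "q1"' "$1"
}
```

Usage: filter_not_q1 /tmp/file2.txt > /tmp/file22.txt. A line such as 9002,q2,k,d,e,q1,g passes the grep pattern because of the q2 in column 2, but this filter drops it.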
Please help me make it faster, or tear my newbie approach to pieces; just push me in the right direction.
P.S. If I didn't need to output the entire line from file 2, I think comm -2 file1 file2 > file3 would solve it, but I'm not sure :)
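comm compares whole lines of two sorted files; it has no notion of columns. A toy run (the file names and rows are made up) shows that a row whose $6 changed comes out exactly like a row with no counterpart at all:

```shell
#!/bin/sh
# Toy data (hypothetical): key 9001/k1 is in both files with a different $6,
# key 9002/k2 is only in the second file.
printf '9001,a,k1,d,e,q1,g\n' > f1.txt
printf '9001,a,k1,d,e,q2,g\n9002,a,k2,d,e,q1,g\n' > f2.txt

# comm needs sorted input; -1 hides lines unique to f1, -3 hides common lines,
# so what remains are the lines that appear in f2 but not in f1.
sort f1.txt > f1.sorted
sort f2.txt > f2.sorted
comm -13 f1.sorted f2.sorted
```

Both lines of f2.txt are printed, so the output alone cannot separate the changed-$6 row from the unrelated one; matching on $1 and $3 needs a keyed lookup instead.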


1 answer(s)
Saboteur, 2015-11-03
@Yestestvenno

In Perl:

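#!/usr/bin/perl
use strict;
use warnings;

# a.txt (the smaller file) goes into a hash; b.txt is streamed line by line.
open(my $small, '<', 'a.txt') or die "a.txt: $!";
open(my $big,   '<', 'b.txt') or die "b.txt: $!";

# key "$1,$3" -> $6
my %col6;
while (my $line = <$small>) {
    chomp $line;
    my @f = split /,/, $line;
    $col6{"$f[0],$f[2]"} = $f[5];
}
close $small;

# while reads one line at a time; foreach (<$big>) would pull the whole file into memory
while (my $line = <$big>) {
    chomp $line;
    my @f = split /,/, $line;
    my $key = "$f[0],$f[2]";
    # same $1 and $3, different $6; ne compares strings (!= would treat q1 and q2 both as 0)
    if (exists $col6{$key} && $col6{$key} ne $f[5]) {
        print "$line, [differs from: $col6{$key}]\n";
    }
}
close $big;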

Pass the smaller file as the first one (a.txt), since that is the file kept in memory.
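The same hash-lookup idea fits in a single awk call, which also prints each matching line of the second file in full. A minimal sketch, assuming plain comma-separated text with no quoted or embedded commas; the function name find_changed_col6 is hypothetical:

```shell
#!/bin/sh
# find_changed_col6 SMALL BIG (hypothetical helper): print every line of BIG whose
# $1 and $3 occur together in SMALL but whose $6 differs from the value in SMALL.
find_changed_col6() {
    awk -F',' '
        # first file (NR == FNR): remember $6 for each "$1,$3" key
        NR == FNR { col6[$1 FS $3] = $6; next }
        # second file: known key and a different $6 -> print the whole line
        ($1 FS $3) in col6 && col6[$1 FS $3] != $6
    ' "$1" "$2"
}
```

Usage: find_changed_col6 /tmp/file1.txt /tmp/file2.txt > /tmp/results.txt, with the smaller file first because its keys are held in memory. The NR == FNR test assumes the first file is not empty; if it is, every line of the second file is treated as first-file data and nothing is printed.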
