Text Processing Automation
Sergey Voronezhev, 2013-08-01 10:12:48

Compare two text files, excluding duplicate lines

The task will probably seem strange, but I hope that someone can help.

So, there are two *.txt files. The first is "base.txt":

01
02
02

03
04
05

The second is "exceptions.txt":
04
08
15
16
23
42


Is it possible to automatically remove from the first file every line whose content also appears in the second file?
A match should count regardless of which line numbers the matching lines are on.
Ideally, duplicates within the first file would be removed as well, but that is not critical.

The resulting "base.txt" should look like this:
01
02
03
05


Can you recommend any suitable software or plugin? The solution has to run on one of these systems: Windows 7 / Ubuntu / CentOS. A plugin for any program will do, as long as it works.

Thank you.

UPD. grep did the job for a while, but now it chokes on the exceptions file (2000 thousand lines) with the error: Regular expression too big

4 answer(s)
@sledopit, 2013-08-01

Well, you're unlikely to find a plugin for this; the task is quite specific. But it is very easy to write a one-liner for it:

grep -vf exceptions.txt base.txt | sort -u

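will give you the contents of base.txt with the exception lines removed and without duplicates.
If you need to save the result back to base.txt, do not simply append > base.txt at the end: the shell truncates the file before grep gets to read it. Go through a temporary file instead: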
grep -vf exceptions.txt base.txt | sort -u > base.tmp ; mv base.tmp base.txt

It is not quite clear from the question, though, whether the matching lines should also be removed from exceptions.txt. If so, the logic changes.
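
One caveat with the one-liner above: by default grep treats every line of exceptions.txt as a regular expression and matches it anywhere in a line, so an exception of 04 would also drop a line such as 1040. A minimal sketch, assuming the lines should be compared literally and in full, adds -x (whole-line match) and -F (fixed strings). Treating the patterns as fixed strings may also avoid the "Regular expression too big" error from the update, though that has not been tried here on a file of that size. The scratch directory and the result variable are assumptions of the sketch, and the sample data is the question's, without the blank line.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched (assumption of this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample data from the question.
printf '%s\n' 01 02 02 03 04 05 > base.txt
printf '%s\n' 04 08 15 16 23 42 > exceptions.txt

# -v  keep only lines that do NOT match
# -x  a pattern must match the whole line
# -F  patterns are fixed strings, not regular expressions
# -f  read the patterns from a file
grep -vxFf exceptions.txt base.txt | sort -u > base.tmp && mv base.tmp base.txt

result=$(cat base.txt)
printf '%s\n' "$result"   # prints 01, 02, 03, 05, one per line
```

Note that sort -u also reorders the lines, which is harmless for the sample data but may matter for a file that is not already sorted.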

Maxim, 2013-08-01
@might

I recommend WinMerge software

Ilya Evseev, 2013-08-01
@IlyaEvseev

Reinventing the wheel:

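#!/usr/bin/perl
# Print the lines of the first file that do not occur in the second file,
# skipping repeated lines along the way. Line order is preserved.

use strict;
use warnings;

die "Usage: $0 filtered.txt filter.txt\n" if @ARGV != 2;

my ($filtered_path, $filter_path) = @ARGV;

# Lines to suppress: first everything in the filter file, then every line already printed.
my %filter;

open my $filter_fh, '<', $filter_path or die "Cannot open $filter_path: $!\n";
while (my $line = <$filter_fh>) {
    chomp $line;
    $filter{$line} = 1;
}
close $filter_fh;

open my $in_fh, '<', $filtered_path or die "Cannot open $filtered_path: $!\n";
while (my $line = <$in_fh>) {
    chomp $line;
    print "$line\n" unless $filter{$line};
    $filter{$line} = 1;    # remember the line so later duplicates are skipped
}
close $in_fh;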
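
For comparison, the same hash-based idea can be sketched as an awk one-liner. This is an alternative, not part of the answer above: it loads exceptions.txt into an array, then prints a line of base.txt only the first time it is seen and only if it is not in that array. Unlike sort -u, it keeps the original line order. The sample data below is deliberately shuffled to show that, and the scratch directory and the result variable are assumptions of the sketch.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched (assumption of this sketch).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: the question's values, shuffled to show that order is kept.
printf '%s\n' 03 01 02 02 04 05 > base.txt
printf '%s\n' 04 08 15 16 23 42 > exceptions.txt

# NR == FNR is true only while the first file (exceptions.txt) is being read,
# provided that file is not empty.
# skip[] holds the exception lines; seen[] counts lines of base.txt already printed.
result=$(awk 'NR == FNR { skip[$0] = 1; next }
              !($0 in skip) && !seen[$0]++' exceptions.txt base.txt)

printf '%s\n' "$result"   # prints 03, 01, 02, 05, one per line
```

Like the Perl script, this keeps the whole exceptions file in memory, so it trades RAM for a single pass over each file.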

DancingOnWater, 2013-08-01
@DancingOnWater

diff??
en.wikipedia.org/wiki/Diff
