Algorithms
Nubzilo, 2015-03-31 23:42:51

How to process a huge text file?

Good evening. What is the fastest or easiest way to accomplish this task:
There are two huge text files: file 1 has 70 million lines and file 2 has 60 million lines.
The task is to write to file 3 all lines from file 1 that do not occur in file 2. That is, extract the lines of the first file that are unique relative to the second file.

4 answer(s)
mikhail_404, 2015-03-31
@Nubzilo

Use hashing for this task.
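A minimal Python sketch of the hashing idea, under assumptions the answer does not state: the function name `difference_by_hash` and the file paths are hypothetical, lines are compared as raw bytes without their line endings, and an 8-byte BLAKE2b digest stands in for each line of file 2 to save memory. With digests that short, a collision (a line wrongly treated as present in file 2) is improbable but possible; storing the full lines in the set avoids that if memory allows.

```python
import hashlib


def line_key(line: bytes) -> bytes:
    # Short digest of the line without its line ending.
    # Assumption: 8 bytes is an acceptable collision risk for this job.
    return hashlib.blake2b(line.rstrip(b"\r\n"), digest_size=8).digest()


def difference_by_hash(path1, path2, out_path):
    """Write lines of path1 that do not occur in path2; return how many were written."""
    seen = set()
    with open(path2, "rb") as f2:
        for line in f2:
            seen.add(line_key(line))

    written = 0
    with open(path1, "rb") as f1, open(out_path, "wb") as out:
        for line in f1:
            if line_key(line) not in seen:
                out.write(line.rstrip(b"\r\n") + b"\n")
                written += 1
    return written
```

Only the digests of file 2 are held in memory; file 1 is streamed once, so the order of its lines is preserved in the output.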

bobrovskyserg, 2015-04-01
@bobrovskyserg

Sort both files.
Then it's simple:
read line by line from both files -> compare -> write / skip -> read more lines as needed. Very similar to the merge step of merge sort.
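The read/compare/write loop described above could be sketched in Python roughly like this. The names `sorted_difference` and `difference_of_sorted_files` are hypothetical, and the sketch assumes both inputs are already sorted in plain byte order of the line content (without line endings); any other collation would break the comparison.

```python
def sorted_difference(lines1, lines2):
    """Yield items of sorted iterable lines1 that are absent from sorted iterable lines2."""
    it2 = iter(lines2)
    current2 = next(it2, None)
    for line1 in lines1:
        # Advance the second input until it catches up with line1.
        while current2 is not None and current2 < line1:
            current2 = next(it2, None)
        # Write the line unless the second input has the same one.
        if current2 is None or current2 != line1:
            yield line1


def _stripped(f):
    # Compare line content only, ignoring the line ending.
    for line in f:
        yield line.rstrip(b"\r\n")


def difference_of_sorted_files(path1, path2, out_path):
    # Assumption: both files are sorted bytewise on the stripped line content.
    with open(path1, "rb") as f1, open(path2, "rb") as f2, open(out_path, "wb") as out:
        for line in sorted_difference(_stripped(f1), _stripped(f2)):
            out.write(line + b"\n")
```

Both files are read once and only one line from each is held at a time, so memory use does not depend on file size.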

beduin01, 2015-04-01
@beduin01

An example that is easily adapted to your task is described at the very beginning of the book The D Programming Language.
It's very simple there. I recently did something similar, although with about 700 thousand lines rather than millions.

Egor Kazantsev, 2015-04-02
@saintbyte

Compute hashes, do an external sort and compare the hashes, then search by hash.
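For the external sort part, one possible Python sketch, assuming the file does not fit in memory but a chunk of `chunk_lines` lines does. The function name `external_sort` and the chunk size are hypothetical; the sketch sorts whole lines in byte order rather than hashes, which is a simplification of what the answer suggests.

```python
import heapq
import os
import tempfile
from itertools import islice


def _stripped(f):
    # Yield line content without the line ending, so ordering ignores "\n".
    for line in f:
        yield line.rstrip(b"\r\n")


def external_sort(in_path, out_path, chunk_lines=1_000_000):
    """Sort a large file by splitting it into sorted chunks and merging them."""
    chunk_paths = []
    try:
        # Phase 1: read fixed-size chunks, sort each in memory, spill to temp files.
        with open(in_path, "rb") as src:
            while True:
                chunk = [line.rstrip(b"\r\n") for line in islice(src, chunk_lines)]
                if not chunk:
                    break
                chunk.sort()
                fd, path = tempfile.mkstemp(suffix=".chunk")
                with os.fdopen(fd, "wb") as tmp:
                    tmp.writelines(line + b"\n" for line in chunk)
                chunk_paths.append(path)

        # Phase 2: k-way merge of the sorted chunks.
        files = [open(p, "rb") for p in chunk_paths]
        try:
            with open(out_path, "wb") as out:
                merged = heapq.merge(*(_stripped(f) for f in files))
                out.writelines(line + b"\n" for line in merged)
        finally:
            for f in files:
                f.close()
    finally:
        for p in chunk_paths:
            os.remove(p)
```

Once both files are sorted this way, they can be compared in a single pass, reading each file line by line.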
