How to process a huge text file?
Good evening. What is the fastest or easiest way to accomplish this task:
There are two huge text files. The first has 70 million lines, the second has 60 million.
The task is to write to a third file all lines from file 1 that do not occur in file 2, that is, the lines of the first file that are unique relative to the second.
Sort both files.
Then it's simple:
Read both files line by line, compare the current lines, write or skip the line from file 1, and read the next line from whichever file is behind. It works much like the merge step of merge sort.
An example easily adapted to your task is described at the very beginning of the book: The D Programming Language.
It's very simple there. I recently did something similar, although with about 700 thousand lines rather than millions.
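The merge-style pass described in this answer could look roughly like the sketch below. It is only an illustration, not the example from the book: the names sorted_difference and write_difference are made up here, and it assumes both files have already been sorted in plain byte order (the order Python uses when comparing bytes), with the comparison done on lines without their trailing newline.

```python
def sorted_difference(lines_a, lines_b):
    """Yield items of sorted iterable lines_a that do not occur in sorted iterable lines_b."""
    it_b = iter(lines_b)
    b = next(it_b, None)
    for a in lines_a:
        # Advance the second input until its current line is >= a.
        while b is not None and b < a:
            b = next(it_b, None)
        # Second input exhausted, or its current line is greater: a is unique.
        if b is None or a != b:
            yield a


def write_difference(path_a, path_b, path_out):
    """Stream two pre-sorted files and write the lines found only in the first."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb, open(path_out, "wb") as out:
        # Compare lines without the trailing newline, as sort tools do.
        a_lines = (line.rstrip(b"\n") for line in fa)
        b_lines = (line.rstrip(b"\n") for line in fb)
        for line in sorted_difference(a_lines, b_lines):
            out.write(line + b"\n")
```

Only one line from each file is held in memory at a time, so the file size does not matter once the sorting is done. Duplicate lines in file 1 that are absent from file 2 are written as many times as they occur.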
Use hashes: hash every line, run an external sort on the hashes and compare them, then look lines up by hash.
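A simplified variant of the hash idea, as a sketch only: instead of the external sort it keeps the hashes of file 2 in an in-memory set and then checks each line of file 1 against it. The names line_digest and difference_by_hash are hypothetical, and the 8-byte digest size is an assumption chosen to save memory. Two different lines can in principle share a digest, in which case a unique line would be dropped by mistake, and a set of 60 million digests still needs a substantial amount of RAM.

```python
import hashlib


def line_digest(line: bytes) -> bytes:
    # Short 8-byte digest (an assumed size): compact, but collisions are possible.
    return hashlib.blake2b(line, digest_size=8).digest()


def difference_by_hash(lines_a, lines_b):
    """Yield lines from lines_a whose digest was not seen among lines_b."""
    seen = {line_digest(line) for line in lines_b}
    for line in lines_a:
        if line_digest(line) not in seen:
            yield line
```

Unlike the sort-based approach, this keeps the original line order of file 1 and needs no sorting at all, at the cost of memory and a small chance of false matches.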