Answer the question
In order to leave comments, you need to log in
Why searching for unique values in two Excel tables is slow and how to make it faster?
There are two CSV files with 100k records each. The task is to find the lines that are in the first file, but not in the second. The search procedure itself is assumed by the key of the records i.e.
File 1:
1
2
4
5
File 2
:
1
2
3
4
5
this ID is not in the first file, but it is in the second.
Now everything is done using Excel. People complain that such reconciliation takes a long time. How long I can't say. I think tens of minutes.
Hence the questions.
Why is everything so slow in Excel.
Will there be a difference in speed if the decision is made in C# or, for example, Python.
Will there be a profit if everything is unloaded into which SQLite thread and done using its means?
Answer the question
In order to leave comments, you need to log in
In short, if you transfer everything to SQL, the difference in speed will be huge.
yoksel has excel power query ("query editor" in Russian), just for such things.
and if you are now checking the rows in one table by searching on the second, you can greatly speed up the process by sorting the second table by this column, and using the value search with the third parameter (interval search) and an additional match check later.
in Russian Excel it's probably something like=ВПР(<искомое>;<таблица>;1;ИСТИНА)=<искомое>
Are the keys sorted?
Tens of minutes even for Excel is somehow slow.
It's like for each key of one file to run through all the keys of another.
If you take two sorted lists and walk through them, then the missing keys can be found in one pass - these are seconds for 100,000 entries.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question