How to organize a fast search over 78 million rows?
There is a CSV file with about 78 million rows. How can I organize a fast search over these rows?
In practice I'm only interested in one of the six columns it contains. I can move that column into a Word document if that makes searching faster.
I understand that, in theory, the file needs to be loaded into RAM and kept open there for the fastest string lookups.
I need to achieve at least 10–20 million lookups per second. What can you advise for this problem? How should I store the data, how should I search it, and what hardware is best to use?
Preferred languages are Python or C#.
It depends on the kind of search and the data.
Also, if there is a lot of data, it is unlikely that all of it will fit into RAM.
If you search by exact match and the values are all unique, use a hash table.
If the values are sortable, sort them and use binary search.
If you need full-text / fuzzy search, it's easier to take a third-party DBMS.
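The first two suggestions above can be sketched in Python (the asker's preferred language). This is a minimal illustration on hypothetical sample data, not the 78M-row file itself; at that scale a plain `set` of short strings will typically still fit in a few GB of RAM, while the sorted-list variant trades some lookup speed for lower memory overhead:

```python
import bisect
import csv
import io

# Hypothetical sample standing in for the real 78M-row CSV.
sample = "id,value\n1,apple\n2,banana\n3,cherry\n"

# Extract only the column of interest.
with io.StringIO(sample) as f:
    reader = csv.reader(f)
    next(reader)  # skip header
    column = [row[1] for row in reader]

# Exact match, unique values: a hash table (Python set) gives
# O(1) average-case membership tests.
lookup = set(column)
print("banana" in lookup)  # True

# Sortable values: sort once, then binary search in O(log n).
column.sort()
idx = bisect.bisect_left(column, "banana")
print(idx < len(column) and column[idx] == "banana")  # True
```

For the real file, replace `io.StringIO(sample)` with `open("data.csv")` and pick the correct column index.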
To avoid reinventing the wheel, it's enough to import the data into SQLite and add an index on the desired column. If needed, SQLite also has full-text search.
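A minimal sketch of this SQLite approach, using Python's built-in `sqlite3` module and made-up sample values; an in-memory database is used here for brevity, whereas a 78M-row import would use a file-backed database:

```python
import sqlite3

# In-memory DB for illustration; use sqlite3.connect("data.db") for real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rows (value TEXT)")

# In practice you would bulk-insert the CSV column here (executemany scales
# well when wrapped in a single transaction).
conn.executemany("INSERT INTO rows VALUES (?)",
                 [("apple",), ("banana",), ("cherry",)])

# Index on the searched column turns lookups into B-tree searches.
conn.execute("CREATE INDEX idx_value ON rows(value)")
conn.commit()

hit = conn.execute("SELECT 1 FROM rows WHERE value = ? LIMIT 1",
                   ("banana",)).fetchone()
print(hit is not None)  # True
```

For fuzzy matching, SQLite's FTS5 extension (`CREATE VIRTUAL TABLE ... USING fts5(...)`) covers the full-text case mentioned above.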
Use System.IO.MemoryMappedFiles, which is about 30x faster than just reading from disk, together with a Dictionary from System.Collections.Generic.
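That answer is C#-specific, but the same idea translates to the asker's other preferred language: Python's `mmap` module plays the role of `MemoryMappedFiles`, and a `dict` plays the role of `Dictionary`. A minimal sketch on a hypothetical temporary file, mapping each value in the searched column to its full row:

```python
import csv
import io
import mmap
import os
import tempfile

# Hypothetical sample file standing in for the real CSV.
with tempfile.NamedTemporaryFile(mode="w", suffix=".csv",
                                 delete=False) as tmp:
    tmp.write("id,value\n1,apple\n2,banana\n3,cherry\n")
    path = tmp.name

# Memory-map the file so the OS pages it in lazily instead of
# copying everything up front.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    text = mm.read().decode("utf-8")
    mm.close()

# Build a dict keyed by the searched column (analogous to
# Dictionary<string, string[]> in C#).
index = {}
reader = csv.reader(io.StringIO(text))
next(reader)  # skip header
for row in reader:
    index[row[1]] = row

print(index["banana"])  # ['2', 'banana']
os.unlink(path)
```

Once the dict is built, each lookup is an O(1) hash probe, which is what makes the millions-of-lookups-per-second target plausible on commodity hardware.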