How to check the occurrence of a string in a large csv file?
There is a CSV file of 50 MB or more and a list of keys (a list of strings). I need to determine whether each key occurs in the CSV at the lowest possible cost. The CSV is hosted on a remote server (GitHub) and is updated regularly.
Example:
There is a key "O. Henry". You need to determine if there is at least one occurrence of this key in the csv file.
Well, there is only one adequate solution: stop using CSV for things it was not designed for, learn any SQL database, and use that instead.
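For anyone wondering what the database route might look like, here is a rough sketch using Python's built-in sqlite3 module. It assumes each key should match a whole cell in one known column; the table name `cells`, the column name `author`, and both helper functions are made up for illustration and do not come from the question.

```python
import csv
import sqlite3
from typing import Iterable


def load_column(conn: sqlite3.Connection, lines: Iterable[str], column: str) -> None:
    """Copy one CSV column into an indexed SQLite table (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS cells (value TEXT)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_cells_value ON cells (value)")
    reader = csv.DictReader(lines)  # the first line is treated as the header
    conn.executemany(
        "INSERT INTO cells (value) VALUES (?)",
        ((row[column],) for row in reader),
    )
    conn.commit()


def key_exists(conn: sqlite3.Connection, key: str) -> bool:
    """Return True if at least one cell equals the key exactly."""
    cursor = conn.execute("SELECT 1 FROM cells WHERE value = ? LIMIT 1", (key,))
    return cursor.fetchone() is not None
```

Since the file is updated regularly, the table would have to be reloaded after each update; after that, every lookup is an index search rather than a scan of the whole file.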
Perhaps it will help someone:
So far I have found one option: stream parsing with scramjet. It is not lightning fast, but at the current file size it is quite tolerable: with about a hundred keys and ~50 MB of CSV data, processing took less than a minute.
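This is not the scramjet code itself, just a minimal sketch of the same streaming idea using only the Python standard library. It assumes a key counts as found when it appears as a substring of any cell; the function names and the URL parameter are hypothetical.

```python
import csv
import io
import urllib.request
from typing import Iterable


def find_keys(lines: Iterable[str], keys: Iterable[str]) -> dict[str, bool]:
    """Stream CSV rows and report which keys occur in at least one cell."""
    remaining = set(keys)
    found = {key: False for key in remaining}
    for row in csv.reader(lines):
        # Only test keys that have not been seen yet.
        hits = {key for key in remaining if any(key in cell for cell in row)}
        for key in hits:
            found[key] = True
        remaining -= hits
        if not remaining:
            break  # every key has been found, so stop reading early
    return found


def find_keys_in_url(url: str, keys: Iterable[str]) -> dict[str, bool]:
    """Run the same check on a remote file without loading it all into memory."""
    with urllib.request.urlopen(url) as response:
        # Assumes the file is UTF-8 encoded; newline="" is what the csv module expects.
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return find_keys(text, keys)
```

Each row is checked once against the keys that are still missing, and reading stops as soon as all keys are found, so the worst case is a single pass over the file.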