How to parse a large amount of data?
Hello. I'm trying to use PHP Simple HTML DOM Parser to scrape a lot of pages, initially about a hundred. I fetch a page with file_get_html, find the elements I need, and build an associative array of about a hundred links. Then I loop over that array, fetching each link with file_get_html again, which gives me another hundred pages, and from each of those I extract roughly 50 lines of data.
As a result, the script keeps falling over, and even when it works it takes minutes to finish. What should I do in situations like this, and what tools should I use?
file_get_html? Really?
Look into downloading the documents in parallel,
then parse them locally in the background however you like.
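A minimal sketch of the parallel-download idea using PHP's curl_multi API (the helper name fetch_all and its signature are illustrative, not from the answer; the cURL calls themselves are standard):

```php
<?php
// Fetch several URLs concurrently with curl_multi.
// fetch_all() is a hypothetical helper name; keys of $urls are preserved
// in the result array so callers can match responses back to requests.
function fetch_all(array $urls): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until every handle has finished.
    do {
        $status = curl_multi_exec($mh, $active);
        if ($active) {
            curl_multi_select($mh); // block until there is socket activity
        }
    } while ($active && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

With a hundred pages this replaces a hundred sequential round trips with one concurrent batch; the parsing can then run locally over the saved bodies.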
In addition to what the other commenters said: instead of any DOM parser, try plain preg_match_all with regular expressions.
The speedup will most likely be 10-100x or more.
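As a sketch of the regex approach, assuming the goal is to pull link URLs out of fetched HTML (the pattern and variable names are illustrative; a regex is more fragile than a DOM parser but much faster on large inputs):

```php
<?php
// Extract all href values from a chunk of HTML in one regex pass.
$html = '<a href="/page/1">one</a> <a href="/page/2">two</a>';
preg_match_all('/<a\s[^>]*href="([^"]+)"/i', $html, $matches);
// $matches[1] holds the first capture group from every match:
$links = $matches[1]; // ["/page/1", "/page/2"]
```

The same one-pass idea works for the ~50 lines per page mentioned in the question: one preg_match_all with a capture group per field, instead of building a full DOM tree for each document.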
Create a task queue in the form of a simple table in the database.
Write one script that takes an address from the table, downloads it, saves the result to a folder, and exits.
Write another script that parses a downloaded file, extracts the links, and inserts them into the table for the first script.
Run each script N times per minute with cron and a small bash script.
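A sketch of the queue table and the "claim one task" step from the answer above, using SQLite via PDO for illustration (the table and column names are assumptions; any database works the same way):

```php
<?php
// Task queue as a plain table: each row is a URL waiting to be downloaded.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE queue (
    id     INTEGER PRIMARY KEY AUTOINCREMENT,
    url    TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'  -- 'pending' or 'done'
)");

// The parser script inserts newly found links:
$ins = $db->prepare('INSERT INTO queue (url) VALUES (?)');
$ins->execute(['http://example.com/page/1']);

// The downloader script claims the oldest pending task and marks it done:
$row = $db->query(
    "SELECT id, url FROM queue WHERE status = 'pending' ORDER BY id LIMIT 1"
)->fetch(PDO::FETCH_ASSOC);
if ($row) {
    // ... download $row['url'] and save the body to a folder here ...
    $db->prepare("UPDATE queue SET status = 'done' WHERE id = ?")
       ->execute([$row['id']]);
}
```

Because each run handles one task and exits, crashes only lose a single download, and cron restarting the scripts every minute keeps the queue draining without any long-lived process.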