PHP
kappka, 2015-08-09 20:58:35

How to parse a large amount of data?

Hello. I'm trying to use PHP HTML DOM Parser to scrape a lot of pages: initially about a hundred. I fetch each one with file_get_html, find what I need, and build an associative array holding a hundred links; then I loop over that array, again fetching each link with file_get_html, and get another hundred pages, from each of which I extract the ~50 lines I need.
As a result, everything falls over, and getting it all done takes minutes.
What should I do in situations like this? What should I use?


4 answer(s)
DevMan, 2015-08-09
@kappka

file_get_html? Really?
Look into downloading the documents in parallel,
then parse them locally, in the background, however you like.
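The parallel downloading this answer suggests can be done with PHP's built-in curl_multi API. A minimal sketch (the function name and options are illustrative, not from the answer):

```php
<?php
// Fetch several URLs concurrently with curl_multi instead of
// sequential file_get_html calls. Returns [key => body] in the
// same keys as the input array.
function fetch_all(array $urls, int $timeout = 10): array {
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,  // return body instead of printing it
            CURLOPT_TIMEOUT        => $timeout,
        ]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers until every handle has finished.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

With a hundred URLs this downloads them all in roughly the time of the slowest single request, rather than the sum of all of them.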

Dmitry Entelis, 2015-08-09
@DmitriyEntelis

In addition to what the other answers said: instead of any DOM parsers, try plain preg_match_all and regular expressions.
Most likely you'll see a 10-100x+ speedup.
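For the common case of pulling links out of a page, the preg_match_all approach can look like this (the pattern and function name are an illustrative sketch; a regex like this handles typical markup but is not a full HTML parser):

```php
<?php
// Extract all href values from an HTML string with a single
// preg_match_all call instead of building a DOM tree.
function extract_links(string $html): array {
    preg_match_all('/<a\s[^>]*href=["\']([^"\']+)["\']/i', $html, $m);
    return $m[1]; // the first capture group: the URLs themselves
}

// Example:
// extract_links('<a href="http://example.com/x">x</a>')
// gives ['http://example.com/x']
```

This avoids building and walking a DOM tree for every page, which is where most of the parser's time goes.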

index0h, 2015-08-09
@index0h

set_time_limit(0);               // remove the script execution time limit
ini_set('memory_limit', '512M'); // raise the memory ceiling

Anton B, 2015-08-09
@bigton

Create a task queue as a simple table in the database.
Write one script that takes an address from the table, downloads the page, saves it to a folder, and exits.
Write another script that parses a downloaded file, extracts the links, and puts them into the table for the first script.
Run each script N times per minute with cron and a small bash script.
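The queue and downloader script described above could be sketched like this (SQLite and the tasks table/column names are illustrative assumptions, not from the original answer):

```php
<?php
// One-time setup: a simple task-queue table.
function init_queue(PDO $db): void {
    $db->exec("CREATE TABLE IF NOT EXISTS tasks (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,
        url    TEXT NOT NULL,
        status TEXT NOT NULL DEFAULT 'pending'  -- 'pending' or 'done'
    )");
}

// Downloader worker: take one pending URL, save the page to $dir,
// mark the row done, and return. Returns false when the queue is empty.
function process_next_task(PDO $db, string $dir): bool {
    $row = $db->query("SELECT id, url FROM tasks
                       WHERE status = 'pending' LIMIT 1")
              ->fetch(PDO::FETCH_ASSOC);
    if (!$row) {
        return false; // nothing to do
    }
    $html = file_get_contents($row['url']);
    file_put_contents($dir . '/' . $row['id'] . '.html', $html);
    $db->prepare("UPDATE tasks SET status = 'done' WHERE id = ?")
       ->execute([$row['id']]);
    return true;
}
```

Each worker run handles one URL and exits; cron (or a small bash loop) launches it N times per minute, as the answer suggests, and the separate parser script feeds new rows into the same table.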
