PHP
Gura, 2015-10-18 14:07:20

How to properly implement a PHP parser for large amounts of information?

Greetings, fellow Toster users! Another question that I, as a beginner, am unsure about. There is a source site, and I have figured out how to extract exactly the information I need from it. Everything works, even better than expected, but there is one caveat: when I process up to 50 pages in a loop, everything works fine. But I need to fetch at least 200 pages, and in the future possibly more than 600. To reduce the load on the database, I first collect everything into an array and then load it into the database with one big query.
Here is the code:


<?php
include 'settings/functions.php';
require_once 'simple_html_dom.php'; // include the HTML parser

// Fetch a page over cURL and return its body as a string
function curl($url) {
    $ua = curl_init();
    curl_setopt($ua, CURLOPT_URL, $url);
    curl_setopt($ua, CURLOPT_CONNECTTIMEOUT, 2);
    curl_setopt($ua, CURLOPT_RETURNTRANSFER, 1);
    // the option value is just the UA string, without the "User-Agent:" prefix or "\r\n"
    curl_setopt($ua, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.75 Safari/537.1');
    curl_setopt($ua, CURLOPT_HTTPHEADER, array('Content-type: text/html'));
    $result = curl_exec($ua);
    curl_close($ua);
    return $result;
}

$mass = array();
$id = 0;
$more_info = $mysqli->query("SELECT id, tag, name FROM info WHERE source<>'' LIMIT 1");
while ($info = $more_info->fetch_assoc()) {
    $html = str_get_html(curl('http://victim_site/' . $info['tag'] . '/'));
    foreach ($html->find('main_tag') as $article) {
        $id++;
        // separate loop variable so the outer $article is not overwritten
        foreach ($article->find('inner_tag') as $item) {
            $mass[] = "('" . $info['id'] . "','" . $id . "','"
                . $item->find('time', 0)->plaintext . "','"
                . $item->find('span._title', 0)->plaintext . "')";
        }
    }
    // free the DOM: simple_html_dom accumulates memory across many pages
    $html->clear();
    unset($html);
}
echo implode(',', $mass);
if ($mass) {
    // the column list was cut off in the original post; 'time' and 'title' are assumed
    $mysqli->query("INSERT INTO new_info (info_id, day, time, title) VALUES " . implode(',', $mass));
}

One idea I have is to process a single page per request, passing the page number as a GET parameter, and accumulate the results in the session. Tell me, guys, I really need a solution urgently. A rough sketch of that idea is below.
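
A minimal sketch of that idea, assuming one parsed page per request and a redirect chain; parse_page() and save_to_db() are hypothetical stand-ins for the cURL/simple_html_dom code above:

<?php
// Sketch only: each request parses one page, appends the rows to the
// session, and the last request flushes everything to the database.
session_start();
$page  = isset($_GET['page']) ? (int)$_GET['page'] : 1;
$total = 600; // total number of pages to process

if (!isset($_SESSION['rows'])) {
    $_SESSION['rows'] = array();
}

// parse_page() stands in for the fetch-and-parse code above
$_SESSION['rows'] = array_merge($_SESSION['rows'], parse_page($page));

if ($page < $total) {
    // redirect to the next page so each request stays short
    header('Location: parser.php?page=' . ($page + 1));
} else {
    save_to_db($_SESSION['rows']); // one (or chunked) INSERT at the end
    unset($_SESSION['rows']);
}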


1 answer
Sergey, 2015-10-18
@Gudzera

A message queue, a log of processed pages, a daemon that works through all of this, and a script that puts tasks into the queue. For flexibility, you could also move the writing of results to the database into a separate queue (or store them temporarily in something like Redis). That gives you more room for scaling.
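
A minimal sketch of that scheme, using a Redis list as the queue via the phpredis extension; parse_page(), save_rows(), and all key names here are illustrative, not from the answer:

<?php
// Producer: enqueue every page that has to be parsed.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$tags = array('tag1', 'tag2'); // in practice, loaded from the info table
foreach ($tags as $tag) {
    $redis->rPush('pages_to_parse', 'http://victim_site/' . $tag . '/');
}

// Worker/daemon: pop tasks until the queue is empty, keeping a log of
// processed pages so a crashed run can be resumed where it stopped.
while ($url = $redis->lPop('pages_to_parse')) {
    $rows = parse_page($url);              // the cURL + simple_html_dom code
    save_rows($rows);                      // or push results to a second queue
    $redis->sAdd('processed_pages', $url); // the "log of processed pages"
}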
