PHP
Ivan, 2019-01-09 19:31:19

How do I organize batch insertion of data into the database?

I wrote a heavy script that reads records from an XML file, follows the link in each record, fetches additional information from the linked site, and saves the combined data to a MySQL database.
The XML file contains thousands of addresses to fetch additional information from, and the script freezes partway through. It is not running out of memory (there is plenty of free memory), and the actual cause is not clear to me.
Out of 6000 addresses, the script adds about 450 to the database over a couple of hours and then stops.
Please help me fix the script (or write a function) so that it processes all 6000 addresses without stopping.
I tried adding sleep(SLEEP); but could not work out how to use it correctly.
The code:

<?php
set_time_limit(0);

header('Content-type: text/html; charset=utf8');
ini_set("display_errors", "On");
require_once 'config.php';

/* Option 2. The two lines below insert only new values (existing rows are not replaced) */
$Query = "INSERT IGNORE INTO `xml`(`id_xml`,`date_xml`,`title_xml`,`link_xml`) VALUES";
$ODKU = ';';

$values = "";
$text = "";

function page_title($url){
    $fp = file_get_contents($url);
    if (!$fp) 
        return null;

    $res = preg_match("/<title>(.*)<\/title>/siU", $fp, $title_matches);
    if (!$res) 
        return null; 

    // Clean up title: remove EOL and excessive whitespace.
    $title = preg_replace('/\s+/', ' ', $title_matches[1]);
    $title = trim($title);
    return $title;
}

// Load the XML file

$file = '/var/www/test.xml'; // path to the saved XML file

$read = simplexml_load_file($file); // parse the file into a SimpleXMLElement object

$xml = $read->Category->Club;

$count = count($xml); // number of <Club> elements


for ($i = 0; $i < $count; $i++) {

    // Fetch the page title for the current record's URL
    $text = page_title('https://test.com='.$xml[$i]->Id);

    $date_xml  = mysql_real_escape_string($xml[$i]->Id);
    $title_xml = mysql_real_escape_string($xml[$i]->Title);
    $id_xml    = mysql_real_escape_string($xml[$i]->Description);
    $link_xml  = mysql_real_escape_string($xml[$i]->Title);

    if (strlen($text) > 0) {
        $link_xml = mysql_real_escape_string($text);
    } else {
        $link_xml = mysql_real_escape_string($xml[$i]->Title);
    }

    //    sleep(SLEEP);

    $values .= "('$id_xml', '$date_xml','$title_xml','$link_xml'),";

    /* flush the accumulated rows to the database once 1000 records have been collected */
    if ($i % 1000 == 0) {
        $values[strlen($values)-1] = ' '; // replace the trailing comma
        $res = mysql_query("$Query $values $ODKU");
        $values = "";
    }

}

if (strlen($values) > 0) {
    $values[strlen($values)-1] = ' '; // replace the trailing comma
    /* insert the remaining rows into the database */
    $res = mysql_query("$Query $values $ODKU");
}

echo 'Done';

Thank you!

2 answers
Antony Tkachenko, 2019-01-09
@9StarRu

Firstly, you have no MySQL error handling:
php.net/manual/ru/function.mysql-query.php
Secondly, use PDO or mysqli (fortunately, mysqli_* is largely backwards compatible, so you can switch to it with a global search-and-replace across the project); the mysql extension has long been deprecated.
Thirdly, try XMLReader, because SimpleXML can run into memory limits on large files.
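
For illustration, a minimal sketch of what the second and third points could look like, assuming the same `xml` table, column names, and `<Category><Club>` layout as in the question. The connection credentials are placeholders, and the page_title() fetch from the question is omitted for brevity:

<?php
// Sketch only: XMLReader streams the file instead of loading it all at once,
// and mysqli replaces the deprecated mysql_* functions. Credentials are placeholders.
$db = new mysqli('localhost', 'user', 'password', 'database');
if ($db->connect_errno) {
    die('Connect failed: ' . $db->connect_error);
}

$reader = new XMLReader();
$reader->open('/var/www/test.xml');

// Prepared statement: no manual escaping, and errors are reported explicitly
$stmt = $db->prepare(
    'INSERT IGNORE INTO `xml` (`id_xml`, `date_xml`, `title_xml`, `link_xml`) VALUES (?, ?, ?, ?)'
);
if (!$stmt) {
    die('Prepare failed: ' . $db->error);
}

while ($reader->read()) {
    // Handle each <Club> element as it is reached, without building the whole tree
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'Club') {
        $club = simplexml_load_string($reader->readOuterXml()); // only this small subtree

        $id    = (string) $club->Description;
        $date  = (string) $club->Id;
        $title = (string) $club->Title;
        $link  = (string) $club->Title;

        $stmt->bind_param('ssss', $id, $date, $title, $link);
        if (!$stmt->execute()) {
            // Log and continue instead of silently stopping
            error_log('Insert failed: ' . $stmt->error);
        }
    }
}

$reader->close();
$stmt->close();
$db->close();

The prepared statement replaces both the manual escaping and the unchecked mysql_query() calls; if the 1000-row batching from the question is still wanted, the same accumulate-then-flush logic can be kept on top of mysqli with explicit error checks.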

NerfX, 2019-03-19
@NerfX

1) Try curl instead of file_get_contents (a sketch is below).
2) Try a different approach: store the addresses in an intermediate table and process them in several passes, triggering each pass from cron (or via AJAX if there is a web interface).
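
For the first point, a minimal sketch of a curl-based drop-in for the question's page_title() function; the timeout values are arbitrary and only illustrate how to stop one slow site from hanging the whole run:

<?php
// Sketch only: same role as page_title() from the question, but using cURL,
// so a slow or unreachable site times out instead of blocking the script.
function page_title($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);    // give up on connecting after 5 s
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up on the whole request after 15 s

    $html = curl_exec($ch);
    curl_close($ch);

    if ($html === false) {
        return null;
    }

    if (!preg_match('/<title>(.*)<\/title>/siU', $html, $m)) {
        return null;
    }

    return trim(preg_replace('/\s+/', ' ', $m[1]));
}

For the second point, the idea is to load all 6000 addresses into a helper table first and let a cron job process, say, a few hundred unprocessed rows per run, so a single stuck request never costs the whole batch.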
