How to set up parsing from HTML to CSV (or SQL) without a 5xx server error?
While the PHP parser script is running, the server returns a 5xx error at around the 300th record. After that, the script keeps running in the background and adds another 500-600 records (out of 30,000).
How to configure the parser so that it writes all 30,000 records without server errors?
include "simple_html_dom.php";
header('Content-type: text/plain');
$filename = 'name.csv'; //CSV output file
$file = "urls.txt"; //file with links to all 30,000 articles
$fields = file($file, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES );
$fp = fopen($filename, 'a');
$i=1;
foreach($fields as $field) :
$url1 = $field;
//article URL used to build the DOM; the meta tags and the heading are needed
$url2 = $field . '/content.html';
//URL of the file with each article's content: content only, no heading or meta tags
$content1 = @file_get_contents($url2);
$content1 = str_replace(array("\r\n", "\n", "<br />", "<br/>"), "", $content1);
$_content1 = addslashes($content1);
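$html = file_get_html($url1);
//file_get_html() returns false on failure; skip the URL instead of crashing on ->find()
if (!$html) { continue; }
$h1 = $html->find('h1', 0);
$title = $h1 ? $h1->plaintext : '';
$_title = addslashes($title);
$metakey = $html->find('meta[name=keywords]', 0);
$metadesc = $html->find('meta[name=description]', 0);
//read the attribute values before the DOM is cleared
$metakey1 = $metakey ? $metakey->content : '';
$metadesc1 = $metadesc ? $metadesc->content : '';
$html->clear();
unset($html);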
fputcsv($fp, array($i, $_title, $metakey1, $metadesc1, $_content1 ));
$i++;
endforeach;
fclose($fp);
How can I configure the parser to write all 30000 records without server errors?
To begin with, find out what exactly is causing the 500 error: enable the PHP error log and look at it, the details will be there.
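A minimal sketch of how the logging could be switched on from the top of the parser script, assuming the hosting allows ini_set(). The log file name parser_errors.log is a hypothetical choice, not something from the question. The shutdown handler also records run time and peak memory, which may help to tell a time limit from a memory limit. If the log stays empty, the 5xx may come from a proxy or gateway timeout in front of PHP rather than from PHP itself.

```php
<?php
// Report everything, but write it to a log file instead of the response body.
error_reporting(E_ALL);
ini_set('display_errors', '0');
ini_set('log_errors', '1');
ini_set('error_log', __DIR__ . '/parser_errors.log'); // hypothetical log path

$startedAt = microtime(true);

// Runs when the script ends for any reason, including fatal errors and timeouts.
register_shutdown_function(function () use ($startedAt) {
    $last = error_get_last();
    error_log(sprintf(
        'parser stopped after %.1f s, peak memory %.1f MB, last error: %s',
        microtime(true) - $startedAt,
        memory_get_peak_usage(true) / 1048576,
        $last ? $last['message'] : 'none'
    ));
});

// ... the parser loop from the question would follow here ...
```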
Most likely max_execution_time has been exceeded. If the server is yours, you can increase the limit. If not, split the source file into chunks (about 300 records each, since that is where it fails) and run the script on each chunk in turn.
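Instead of physically splitting urls.txt, the same effect can be had by remembering how many URLs are already done and processing only the next slice on each run. This is a sketch under assumptions: the helper nextBatch(), the state file offset.txt and the batch size of 300 are hypothetical names and values, and the parsing itself is left as a placeholder for the loop from the question.

```php
<?php
// Hypothetical helper: returns the next batch of URLs and the offset to resume from.
function nextBatch(array $urls, int $offset, int $batchSize): array
{
    $batch = array_slice($urls, $offset, $batchSize);
    return [$batch, $offset + count($batch)];
}

$urlFile   = 'urls.txt';   // list of article URLs, as in the question
$stateFile = 'offset.txt'; // hypothetical: stores how many URLs are already done
$batchSize = 300;          // assumption: roughly what fits into one request

if (is_file($urlFile)) {
    $urls   = file($urlFile, FILE_SKIP_EMPTY_LINES | FILE_IGNORE_NEW_LINES);
    $offset = is_file($stateFile) ? (int) file_get_contents($stateFile) : 0;

    [$batch, $next] = nextBatch($urls, $offset, $batchSize);

    foreach ($batch as $i => $url) {
        // Parse $url and fputcsv() the row here, as in the question's loop.

        // Save progress after every URL so an interrupted run resumes where it stopped.
        file_put_contents($stateFile, (string) ($offset + $i + 1));
    }

    echo $next >= count($urls)
        ? "done\n"
        : "processed up to record $next, run the script again\n";
}
```

Each run then handles one batch, so the script can be reloaded or called from cron until it prints "done". When the script is started from the command line, max_execution_time defaults to 0 (no limit), so on your own server running it via CLI may remove the need for batching altogether.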
It is also possible that pulling the few needed fields out of the HTML with regular expressions will be faster than building the DOM.
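A rough sketch of the regexp variant for the three fields the question needs (the h1 heading and the keywords and description meta tags). The function name extractFields() is hypothetical, and the patterns assume that name comes before content inside each meta tag and that attribute values are quoted; pages that deviate from this would need adjusted patterns or the DOM parser after all.

```php
<?php
// Hypothetical helper: extracts the heading and two meta tags without building a DOM.
function extractFields(string $html): array
{
    // Each pattern captures the wanted value in the named group "v".
    // Assumption: in <meta> tags, name precedes content and values are quoted.
    $patterns = [
        'title'       => '~<h1[^>]*>(?<v>.*?)</h1>~is',
        'keywords'    => '~<meta\s+name=["\']keywords["\']\s+content=(["\'])(?<v>.*?)\1~is',
        'description' => '~<meta\s+name=["\']description["\']\s+content=(["\'])(?<v>.*?)\1~is',
    ];

    $fields = [];
    foreach ($patterns as $key => $pattern) {
        $fields[$key] = preg_match($pattern, $html, $m)
            ? trim(html_entity_decode(strip_tags($m['v']), ENT_QUOTES, 'UTF-8'))
            : ''; // field not found on this page
    }
    return $fields;
}
```

It would be called on the string returned by file_get_contents($url1), with the resulting array values passed to fputcsv() in place of $_title, $metakey1 and $metadesc1.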