How to parse more than 90 pages with Simple HTML DOM?
When the page limit is set to 30 or 60, everything parses, but with more than 90 pages I get an error:
Fatal error: Call to a member function find() on boolean in E:\srv\OpenServer\domains\parser\index.php on line 51
<form method="POST">
<input name="url" type="text" value="<?=isset($_REQUEST['url'])?$_REQUEST['url']:'http://citymarket.ua/';?>"/><input type="submit" value="Go">
</form>
<?php
include 'simple_html_dom.php';
function request($url,$post = 0){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url ); // target URL
curl_setopt($ch, CURLOPT_HEADER, 0); // do not include response headers in the output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);// connection timeout, in seconds
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__).'/cookie.txt'); // save cookies to a file
curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__).'/cookie.txt');
curl_setopt($ch, CURLOPT_POST, $post!==0 ); // send as POST when data is given
if($post)
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
class parser{
var $cacheurl = array();
var $result = array();
var $_allcount = 60;
function __construct(){
if(isset($_POST['url'])){
$this->parse($_POST['url']);
}
}
function parse($url){
$url = $this->readUrl($url);
if( !$url or $this->cacheurl[$url] or $this->cacheurl[preg_replace('#/$#','',$url)] )
return false;
$this->_allcount--;
if( $this->_allcount<=0 )
return false;
$this->cacheurl[$url] = true;
$item = array();
$data = str_get_html(request($url));
$item['url'] = $url;
$item['title'] = count($data->find('title'))?$data->find('title')->plaintext:'';
$item['text'] = count($data->find('img.item-image'))?$data->find('img.item-image')->src:'';
$this->result[] = $item;
if(count($data->find('a'))){
foreach($data->find('a') as $a){
$this->parse($a->href);
}
}
$data->clear();
unset($data);
}
function printresult(){
foreach($this->result as $item){
echo '<h2>'.$item['title'].' - <small>'.$item['url'].'</small></h2>';
echo '<p style="margin:20px 0px;background:#eee; padding:20px;">'.'<img src="'.$item['text'].'"/>'.'</p>';
};
exit();
}
var $protocol = '';
var $host = '';
var $path = '';
function readUrl($url){
$urldata = parse_url($url);
if( isset($urldata['host']) ){
if($this->host and $this->host!=$urldata['host'])
return false;
$this->protocol = $urldata['scheme'];
$this->host = $urldata['host'];
$this->path = $urldata['path'];
return $url;
}
if( preg_match('#^/#',$url) ){
$this->path = $urldata['path'];
return $this->protocol.'://'.$this->host.$url;
}else{
if(preg_match('#/$#',$this->path))
return $this->protocol.'://'.$this->host.$this->path.$url;
else{
if( strrpos($this->path,'/')!==false ){
return $this->protocol.'://'.$this->host.substr($this->path,0,strrpos($this->path,'/')+1).$url;
}else
return $this->protocol.'://'.$this->host.'/'.$url;
}
}
}
}
$pr = new Parser();
$pr->printresult();
https://github.com/chuyskywalker/rolling-curl
+
https://github.com/olamedia/nokogiri
People never cease to amaze me:
$data->clear(); // cleaning up, well done
unset($data);
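In the question's code, $data->clear() runs only after the recursive foreach, so each parent page's DOM stays in memory while all of its descendants are being parsed. A minimal sketch of one way around that, assuming an explicit queue instead of recursion: the names crawl() and $fetchLinks are hypothetical, and $fetchLinks stands for any callable that loads one page, returns its absolute links, and frees its own parser object before returning.

```php
<?php
// Hypothetical sketch, not the asker's code: breadth-first crawl with a queue.
// $fetchLinks is an assumed callable: given a URL it returns an array of
// absolute URLs found on that page and releases its parser object itself.
function crawl($startUrl, callable $fetchLinks, $limit = 90)
{
    $queue   = array($startUrl);         // URLs waiting to be fetched
    $seen    = array($startUrl => true); // URLs already queued or visited
    $visited = array();                  // URLs fetched so far, in order

    while ($queue && count($visited) < $limit) {
        $url = array_shift($queue);
        $visited[] = $url;

        // Only one page is being processed at any moment.
        foreach ($fetchLinks($url) as $link) {
            if (!isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return $visited;
}
```

With this shape, the page limit is simply the third argument, and the nesting depth no longer grows with the number of pages.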
I had a similar problem, but with a single page. It was resolved by increasing the value of the constant MAX_FILE_SIZE, set by "define('MAX_FILE_SIZE', 600000)" in the file "simple_html_dom.php", for example to 60000000. In my case the page was larger than the specified limit, and its download was cut off at 600000. Good luck, and experiment.
I found the solution here:
https://www.canbike.org/information-technology/php...
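Whatever the limit is set to, the fatal error itself comes from calling find() on a failed parse: in the Simple HTML DOM versions I am aware of, str_get_html() returns false for an empty string or one longer than MAX_FILE_SIZE. A minimal sketch of a guard, assuming a hypothetical helper parse_or_null() that takes the parser as a callable so it can be tried without the library:

```php
<?php
// Hypothetical helper, not part of Simple HTML DOM: returns null instead of
// letting a failed fetch or a failed parse reach a later ->find() call.
function parse_or_null($html, callable $parse)
{
    // curl_exec() returns false on failure; an empty body is useless too.
    if (!is_string($html) || $html === '') {
        return null;
    }
    $dom = $parse($html);
    // Anything that is not an object (e.g. false) counts as a failed parse.
    return is_object($dom) ? $dom : null;
}
```

In the question's parse() method this could be used as $data = parse_or_null(request($url), 'str_get_html'); followed by if ($data === null) return false; so that oversized or unreachable pages are skipped rather than crashing the run.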