PHP
Thegaar, 2016-03-11 15:48:07

cURL and a heavy response body: how do I fetch it?

Guys, I'm trying to parse a page with 1000+ products. The script crashes with this error:

Proc with args = ['php', 'parser.php', 'https://site.ru/products'] exited with code = 255
stderr = PHP Warning:  Module 'curl' already loaded in Unknown on line 0
PHP Fatal error:  Call to a member function find() on a non-object in /data/Projects/website/app/parsers/parser.php on line 42

Here is the parser code
function parse_site($url, $cookie){

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_NOBODY, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 900);
    curl_setopt($ch, CURLOPT_TIMEOUT, 900);
    curl_setopt($ch, CURLOPT_COOKIE, $cookie);
    $curl_body = curl_exec($ch);
    curl_close($ch);

    $html_body = str_get_html($curl_body);

    if(!$html_body->find(CONST_ITEMS_TAG))               // line 42
        return getErrorMsg(1, CONST_ITEMS_TAG);          // line 43

    foreach($html_body->find(CONST_ITEMS_TAG) as $items){
        // ... parsing ...
    }
}

I use Simple HTML DOM to parse the body. What's the catch? I suspect cURL can't fetch the body properly, because when there are fewer than 100 products on the page everything works fine.

4 answer(s)
Max, 2016-03-11
@AloneCoder

Well, check what actually comes back in $curl_body.
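A minimal sketch of such a check, assuming PHP with the curl extension. The helper names `describe_curl_result` and `fetch_and_describe` are made up for illustration. As far as I recall, Simple HTML DOM's `str_get_html()` returns false for an empty string or for input longer than its `MAX_FILE_SIZE` constant, which would explain the "non-object" error on large pages, so the sketch also reports that case when the library is loaded.

```php
<?php
// Hypothetical helper: summarize what curl_exec() returned.
function describe_curl_result($body, $errno, $error, $httpCode)
{
    if ($body === false) {
        return "cURL failed ($errno): $error";
    }
    $length  = strlen($body);
    $summary = "HTTP $httpCode, $length bytes";
    // MAX_FILE_SIZE is defined by Simple HTML DOM; compare only if the library is loaded.
    if (defined('MAX_FILE_SIZE') && $length > MAX_FILE_SIZE) {
        $summary .= ', larger than MAX_FILE_SIZE (' . MAX_FILE_SIZE . ')';
    }
    return $summary;
}

// Hypothetical wrapper: fetch the page and collect diagnostics before closing the handle.
function fetch_and_describe($url, $cookie)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_COOKIE, $cookie);
    $body    = curl_exec($ch);
    $summary = describe_curl_result(
        $body,
        curl_errno($ch),
        curl_error($ch),
        curl_getinfo($ch, CURLINFO_HTTP_CODE)
    );
    curl_close($ch);
    return array($body, $summary);
}
```

Calling `fetch_and_describe()` before `str_get_html()` and logging the summary should show whether the body is missing, truncated, or simply too large for the parser.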

Dimonchik, 2016-03-11
@dimonchik2013

Set PHP's max_execution_time to the same 900-1000 seconds.
Set the gzip headers so the page is transferred compressed (this doesn't always work).
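Both suggestions could look roughly like this. It is a sketch, not a drop-in fix: `build_curl_options` is a hypothetical helper and 900 is just the timeout from the question. Note that when the script runs from the command line, as in the question, max_execution_time already defaults to 0 (no limit), so the time limit mostly matters under a web server.

```php
<?php
// Raise the script time limit; under the CLI it is unlimited by default.
set_time_limit(900);

// Hypothetical helper: options from the question plus a compressed transfer.
function build_curl_options($cookie, $timeout = 900)
{
    return array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => $timeout,
        CURLOPT_COOKIE         => $cookie,
        // An empty string makes cURL send an Accept-Encoding header with every
        // encoding it supports and decode the response transparently.
        CURLOPT_ENCODING       => '',
    );
}
```

The array can be applied with `curl_setopt_array($ch, build_curl_options($cookie))`. Whether the server actually compresses the response is up to the server.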

Everal, 2016-03-11
@Everal

Give PHP more memory )
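One way to try this without touching php.ini is to raise memory_limit at runtime and print the peak usage. This is only a sketch: the 512M value is an arbitrary assumption and `ini_bytes` is a hypothetical helper for reading php.ini shorthand values.

```php
<?php
// Hypothetical helper: convert a php.ini shorthand value such as "128M" to bytes.
function ini_bytes($value)
{
    $value = trim($value);
    if ($value === '-1') {
        return -1; // -1 means "no limit"
    }
    $number = (int) $value;
    switch (strtoupper(substr($value, -1))) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;
    }
}

// Assumed value; pick whatever the machine can afford.
ini_set('memory_limit', '512M');

printf(
    "limit: %d bytes, peak: %d bytes\n",
    ini_bytes(ini_get('memory_limit')),
    memory_get_peak_usage(true)
);
```

If the peak stays far below the limit when the script dies, memory is probably not the cause.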

ThunderCat, 2016-03-11
@ThunderCat

Simple HTML DOM is unbelievably slow. You could raise the script's execution time, but you'd do better to throw this junk out and use regular expressions. The problem may be the download speed from the server, but for a start just run the script without Simple HTML DOM. I would begin by saving the pages to files on the local machine and parsing them there with local scripts.
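A rough sketch of that two-step approach, assuming the curl extension. The function names and the `product` class in the pattern are made up for illustration; the real markup of the page is not shown in the question.

```php
<?php
// Step 1: stream the response straight into a file instead of a PHP string.
function download_to_file($url, $path)
{
    $fp = fopen($path, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);     // write the body to $fp
    curl_setopt($ch, CURLOPT_TIMEOUT, 900);
    $ok = curl_exec($ch);                    // true on success when CURLOPT_FILE is set
    curl_close($ch);
    fclose($fp);
    return $ok !== false;
}

// Step 2: parse the saved copy offline with a regular expression.
// The "product" class is a hypothetical example, not taken from the real page.
function extract_items($html)
{
    preg_match_all('~<div class="product"[^>]*>(.*?)</div>~s', $html, $m);
    return array_map('trim', $m[1]);
}
```

Usage would be `download_to_file($url, 'page.html')` once, then `extract_items(file_get_contents('page.html'))` as often as needed while debugging. A pattern like this breaks on nested `div` elements, so it only suits simple, flat markup.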
