Parsing
Ilya1988, 2019-05-08 13:37:58

What is the problem with pagination parsing?

Hello. I'm trying to parse a forum for educational purposes. The code works for threads with fewer than about 5-7 pages, where the pagination has no forward arrow (I think).
Here is an example of a paginated thread where all pages are parsed: https://www.forumhouse.ru/threads/425179/
Here is an example where the pagination has a "next" arrow and, as a result, only the first page is returned:
https://www.forumhouse.ru/threads/102027/
Please tell me how to get around this. Thanks.
Code

function parser($url, $start, $end){
  if($start < $end){
    $file = file_get_contents($url);
    $doc = phpQuery::newDocument($file);
    foreach($doc->find('.messageList') as $article){
      $article = pq($article);

      //$img = $article->find('.img-cont img')->attr('src');
      $text = $article->find('.messageText')->html();			

      //echo "<img src='$img'>";
      //echo $text;
      echo '<hr>';
    }
    // walk through the thread's pagination
    $next = $doc->find('nav .currentPage ')->next()->attr('href');
    if( !empty($next) ){
      $start++;
      $full_url="https://www.forumhouse.ru/$next";
      parser($full_url, $start, $end);
      echo $full_url;
      echo $next;
    }
  }
}
$url = "https://www.forumhouse.ru/threads/102027/";
$start = 0;
$end = 9;
parser($url, $start, $end);
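
One possible way to make the "next page" lookup less dependent on which element sits directly after `.currentPage` is to search the same `nav` for the link whose text is the current page number plus one. The sketch below is only an illustration under assumptions: it uses PHP's built-in DOMDocument and DOMXPath instead of phpQuery, the helper name `findNextPageHref` is hypothetical, and the sample markup (an arrow link and wrapper spans between the current page and the numbered links) is a guess at what long threads might contain, not the forum's confirmed HTML.

```php
<?php
// Hypothetical helper (not part of phpQuery): returns the href of the link
// to the page after the current one, or null if no such link is found.
function findNextPageHref(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML rejects an empty string
    }

    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real-world HTML is rarely valid.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // Match class="currentPage" even when the element has several classes.
    $isCurrent = "contains(concat(' ', normalize-space(@class), ' '), ' currentPage ')";
    $current = $xpath->query("//nav//*[$isCurrent]")->item(0);
    if ($current === null) {
        return null; // no pagination block on this page
    }

    // Assumption: the current-page element's text is the page number.
    $wanted = (string) ((int) trim($current->textContent) + 1);

    // Look at every link inside the same <nav>, at any nesting depth,
    // so arrows and wrapper elements in between do not matter.
    foreach ($xpath->query('ancestor::nav[1]//a[@href]', $current) as $link) {
        if (trim($link->textContent) === $wanted) {
            return $link->getAttribute('href');
        }
    }

    return null; // current page is the last one, or markup differs
}

// Assumed sample markup, for illustration only.
$sample = '<nav><a href="threads/102027/" class="currentPage">1</a>'
        . '<a href="javascript:" class="PageNavPrev">&larr;</a>'
        . '<span class="scrollable"><span class="items">'
        . '<a href="threads/102027/page-2">2</a>'
        . '<a href="threads/102027/page-3">3</a>'
        . '</span></span></nav>';

echo findNextPageHref($sample), PHP_EOL;
```

If the assumption about the markup holds, calling `findNextPageHref($file)` inside `parser()` could stand in for the `->next()->attr('href')` lookup; the rest of the function would stay the same.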
