What is the problem with pagination parsing?
Hello. I'm trying to parse a forum for educational purposes. The code works for threads with fewer than 5-7 pages, where (I think) the pagination has no "next" arrow.
Here is an example of a paginated thread where all pages are parsed: https://www.forumhouse.ru/threads/425179/
Here is an example where the pagination has a "next" arrow, and as a result only the first page is returned:
https://www.forumhouse.ru/threads/102027/
Please tell me how to get around this. Thanks!
Code
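// phpQuery must be loaded first; adjust the path to wherever your copy of the library lives
require_once 'phpQuery/phpQuery.php';

function parser($url, $start, $end){
    if($start < $end){
        $file = file_get_contents($url);
        $doc = phpQuery::newDocument($file);
        foreach($doc->find('.messageList') as $article){
            $article = pq($article);
            //$img = $article->find('.img-cont img')->attr('src');
            $text = $article->find('.messageText')->html();
            //echo "<img src='$img'>";
            //echo $text;
            echo '<hr>';
        }
        // walk through the thread's pagination:
        // next() returns only the element immediately after the current-page link,
        // so the walk stops whenever the markup puts something else (or nothing) there
        $next = $doc->find('nav .currentPage')->next()->attr('href');
        if( !empty($next) ){
            $start++;
            $full_url = "https://www.forumhouse.ru/$next";
            // print before recursing so the URLs come out in page order
            echo $full_url;
            echo $next;
            parser($full_url, $start, $end);
        }
    }
}

$url = "https://www.forumhouse.ru/threads/102027/";
$start = 0;
$end = 9;
parser($url, $start, $end);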
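One way to avoid depending on the link right after the current page is to read the last page number once and then build every page URL directly. This is a minimal sketch under an assumption not confirmed in the thread: that forumhouse uses XenForo-style URLs, where page N (N >= 2) of a thread is `threads/<id>/page-N` and the pagination always links the last page. `lastPageNumber` and `threadPageUrls` are hypothetical helper names; a plain regex is used only to keep the example dependency-free, and the same hrefs could be read with phpQuery instead.

```php
<?php
// Sketch only. ASSUMPTION: XenForo-style URLs, i.e. page N (N >= 2) of a thread
// lives at "<thread-url>page-N" and page 1 is the bare thread URL.

// Hypothetical helper: highest page number linked for this thread.
// $threadPath (e.g. "threads/102027/") keeps links to other threads from matching.
function lastPageNumber(string $html, string $threadPath): int
{
    $pattern = '~' . preg_quote($threadPath, '~') . 'page-(\d+)~';
    $last = 1; // a thread without pagination has exactly one page
    if (preg_match_all($pattern, $html, $m)) {
        $last = max($last, max(array_map('intval', $m[1])));
    }
    return $last;
}

// Hypothetical helper: every page URL of the thread, in order.
function threadPageUrls(string $threadUrl, int $lastPage): array
{
    $base = rtrim($threadUrl, '/') . '/';
    $urls = [$base];
    for ($page = 2; $page <= $lastPage; $page++) {
        $urls[] = $base . 'page-' . $page;
    }
    return $urls;
}

// Usage with the question's phpQuery code (not run here; needs network access):
// $url   = "https://www.forumhouse.ru/threads/102027/";
// $first = file_get_contents($url);
// foreach (threadPageUrls($url, lastPageNumber($first, 'threads/102027/')) as $pageUrl) {
//     $doc = phpQuery::newDocument(file_get_contents($pageUrl));
//     // ... extract .messageText as before
// }
```

A plain loop over known URLs also replaces the recursion, so long threads cannot grow the call stack.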