phpQuery + cURL + pagination: how do I parse paginated pages?
First, I need to parse search results from Google. Everything is written, and almost everything works.
The code below follows pagination on other sites, but not on Google. It selects page elements by selector and does something with them. For now I just print to the screen to check the result.
public function pagination($url, $start, $end){
    // Fetch data from a paginated page
    if ($start < $end) {
        $file = file_get_contents($url);
        if ($file === false) {
            return; // request failed, nothing to parse
        }
        $doc = phpQuery::newDocument($file);
        foreach ($doc->find('#res') as $art) {
            $art = pq($art);
            $this->range = $art->find('#search');
            echo '<hr>';
        }
        // Read the href before prepending the host: with the host prepended,
        // the string is never empty and the recursion would not stop on its own
        $href = $doc->find('#nav a')->next()->attr('href');
        var_dump($href);
        if (!empty($href)) {
            $start++;
            $this->pagination('https://www.google.com' . $href, $start, $end);
        }
    }
}
Load the HTML as XML:
DOMDocument::loadHTML - Load HTML from string
Then use XPath to find the elements you need:
DOMXPath::query - Executes given XPath expression
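A rough sketch of that DOMDocument + DOMXPath approach, in case it helps. The helper name `findNextHref`, the sample markup and the XPath expression are made up for illustration. The real expression depends on the markup of the page being parsed.

```php
<?php
// Hypothetical helper: returns the href of the first element matched by
// $xpath, or null if nothing matches or the attribute is empty.
function findNextHref(string $html, string $xpath): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = (new DOMXPath($dom))->query($xpath);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }

    $node = $nodes->item(0);
    if (!($node instanceof DOMElement)) {
        return null;
    }
    $href = trim($node->getAttribute('href'));

    return $href === '' ? null : $href;
}

// Made-up markup and selector, for illustration only.
$html = '<html><body><ul>'
      . '<li class="pagination__item--next"><a href="/page/2">Next</a></li>'
      . '</ul></body></html>';
$xpath = '//li[contains(@class, "pagination__item--next")]/a';

var_dump(findNextHref($html, $xpath));
```

Returning null when there is no link gives the caller an unambiguous "no next page" signal before any base URL is prepended.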
Roughly like this:
/* Find the link to the next page */
$next_page = pq($doc)->find('li.pagination__item--next > a')->attr('href');
$next_url = $base_url . $next_page;
/* Check whether there is a next page */
if (!empty($next_page)){
    sleep(5);
    urls_parser($next_url);
}
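The same idea can also be written as a loop instead of recursion, with a hard limit on the number of pages. This is only a sketch under assumptions: `crawlPages`, the `$fetch` and `$findNext` callables and the `example.test` URLs are hypothetical. In real code `$fetch` would wrap `file_get_contents` or cURL, and `$findNext` would wrap the selector lookup shown above.

```php
<?php
// Hypothetical helper: follows "next" links iteratively.
// $fetch returns the HTML for a URL, or false on failure.
// $findNext returns the relative href of the next page, or null if none.
function crawlPages(
    string $startUrl,
    string $baseUrl,
    callable $fetch,
    callable $findNext,
    int $maxPages,
    int $delaySeconds = 0
): array {
    $visited = [];
    $url = $startUrl;

    while ($url !== null && count($visited) < $maxPages) {
        $html = $fetch($url);
        if ($html === false) {
            break; // stop on a failed request
        }
        $visited[] = $url;

        // Decide on the raw href, before the base URL is prepended.
        $href = $findNext($html);
        $url = ($href === null || $href === '') ? null : $baseUrl . $href;

        if ($url !== null && $delaySeconds > 0) {
            sleep($delaySeconds); // be polite to the server
        }
    }

    return $visited;
}

// In-memory "site" so the sketch runs without network access.
$pages = [
    'https://example.test/list'        => '<a class="next" href="/list?page=2">next</a>',
    'https://example.test/list?page=2' => '<a class="next" href="/list?page=3">next</a>',
    'https://example.test/list?page=3' => '<p>last page</p>',
];
$fetch = fn(string $url) => $pages[$url] ?? false;
$findNext = function (string $html): ?string {
    return preg_match('/class="next" href="([^"]+)"/', $html, $m) ? $m[1] : null;
};

$visited = crawlPages('https://example.test/list', 'https://example.test', $fetch, $findNext, 10);
print_r($visited);
```

A loop keeps the call stack flat on long result lists, and the `$maxPages` limit plays the same role as the `$start`/`$end` counters in the question.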