How do I signal that JS is enabled and identify myself as a robot when parsing a page?
I'm trying to parse content like this:
$url = "https://site.com/url";
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$html = curl_exec($ch);
curl_close($ch);
echo $html;
Try sending the same headers that a real browser would send.
Example:
<?php
// Set the document type and encoding:
header('Content-Type: text/html; charset=utf-8');
// Enable error display:
ini_set('error_reporting', E_ALL);
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);
// URL to parse:
$url = 'https://yousite.com';
// Create a new cURL session:
$curl = curl_init();
// Set the target page URL:
curl_setopt($curl, CURLOPT_URL, $url);
// Disable SSL certificate verification:
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);
// Set headers to imitate a browser:
$headers = [];
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9';
$headers[] = 'Accept-Encoding: identity';
$headers[] = 'Accept-Language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7';
$headers[] = 'Cache-Control: no-cache';
$headers[] = 'Connection: keep-alive';
$headers[] = 'Host: ' . parse_url($url)['host'];
$headers[] = 'Pragma: no-cache';
$headers[] = 'Sec-Fetch-Dest: document';
$headers[] = 'Sec-Fetch-Mode: navigate';
$headers[] = 'Sec-Fetch-Site: none';
$headers[] = 'Sec-Fetch-User: ?1';
$headers[] = 'Upgrade-Insecure-Requests: 1';
$headers[] = 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36';
curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
// Allow redirects:
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);
// Return the response instead of printing it directly:
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
// Perform the request:
$result = curl_exec($curl);
// Close the session:
curl_close($curl);
// Print the result:
echo $result;
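If the goal is to identify the script openly as a robot rather than imitate a browser, a common convention is a descriptive User-Agent that includes a contact URL. Below is a minimal sketch of that idea. The bot name `ExampleParserBot`, the contact URL, and the function names are placeholders, not something from the question. The sketch also checks the HTTP status, which helps to see whether the site is rejecting the request. Keep in mind that cURL does not execute JavaScript, so no header will make the response contain content that the page renders with JS.

```php
<?php
// Hypothetical bot identity: the name and contact URL are placeholders.
const BOT_USER_AGENT = 'ExampleParserBot/1.0 (+https://example.com/bot)';

/**
 * Builds the cURL options for a request that identifies itself as a bot.
 */
function botCurlOptions(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => BOT_USER_AGENT,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_TIMEOUT        => 15,
    ];
}

/**
 * Fetches the page. Returns the body, or null on a transport error
 * or a non-2xx HTTP status.
 */
function fetchAsBot(string $url): ?string
{
    $curl = curl_init();
    curl_setopt_array($curl, botCurlOptions($url));

    $body   = curl_exec($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    $error  = curl_error($curl);
    // The handle is released when $curl goes out of scope.

    if ($body === false) {
        error_log("cURL error: $error");
        return null;
    }
    if ($status < 200 || $status >= 300) {
        error_log("Unexpected HTTP status: $status");
        return null;
    }
    return $body;
}
```

It could be called as `$html = fetchAsBot('https://yousite.com');`, with a `null` result meaning the request failed or was refused.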
Look at the page's AJAX requests in the browser's developer tools; most likely your request is missing a cookie.
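One way to act on this, sketched under assumptions: load the page first so that cURL stores the cookies it sets, then request the AJAX URL with those cookies attached. The function names and the cookie file path are illustrative, and the AJAX URL is whatever appears in the browser's Network tab. The `X-Requested-With: XMLHttpRequest` header is a widespread convention for AJAX calls; whether this particular site checks it is not known.

```php
<?php
/**
 * Options shared by both requests: cookies are read from and saved to
 * the same file, so they survive between requests and between runs.
 */
function cookieCurlOptions(string $url, string $cookieFile, array $headers = []): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $cookieFile, // enables the cookie engine and loads stored cookies
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is closed
        CURLOPT_HTTPHEADER     => $headers,
    ];
}

/**
 * Loads the page to collect its cookies, then calls the AJAX URL
 * with the same handle so the cookies are sent along.
 * Returns the AJAX response body, or null if either request fails.
 */
function fetchPageThenAjax(string $pageUrl, string $ajaxUrl, string $cookieFile): ?string
{
    $curl = curl_init();

    // First request: the regular page, which typically sets the session cookie.
    curl_setopt_array($curl, cookieCurlOptions($pageUrl, $cookieFile));
    if (curl_exec($curl) === false) {
        return null;
    }

    // Second request: the AJAX endpoint, reusing the cookies held by the handle.
    curl_setopt_array($curl, cookieCurlOptions($ajaxUrl, $cookieFile, [
        'X-Requested-With: XMLHttpRequest',
        'Referer: ' . $pageUrl,
    ]));
    $result = curl_exec($curl);

    return $result === false ? null : $result;
}
```

If the AJAX response is still empty, comparing the request headers in the Network tab with the ones sent here usually shows what else the site expects.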