N
N
Name2020-12-02 15:04:47
PHP
Name, 2020-12-02 15:04:47

How to make it clear that js is included and how to introduce yourself as a robot when parsing a page?

I'm trying to parse content like this:

$url = "https://site.com/url";
$ch = curl_init();
curl_setopt($ch, CURLOPT_AUTOREFERER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);  
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 3);     
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
$html = curl_exec($ch);
curl_close($ch);

echo $html;


in response comes: Please enable JS

how to make it clear js is enabled
and is it possible to pretend to be a yandex robot, for example, to fix this problem?

Answer the question

In order to leave comments, you need to log in

2 answer(s)
N
Nadim Zakirov, 2020-12-02
@User782

Try to specify the same headers that a real browser would send.
Example:

<?php

// Указываем тип документа и кодировку:
header('Content-Type: text/html; charset=utf-8');

// Включаем отображение ошибок:

ini_set('error_reporting', E_ALL);
ini_set('display_errors', 1);
ini_set('display_startup_errors', 1);

// Адрес для парсинга:
$url = 'https://yousite.com';

// Создаём новый сеанс:
$curl = curl_init();

// Указываем адрес целевой страницы:
curl_setopt($curl, CURLOPT_URL, $url);

// О отключаем проверку SSL сертификата:
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($curl, CURLOPT_SSL_VERIFYHOST, 0);

// Устанавливаем заголовки для имитации браузера:

$headers = [];
$headers[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9';
$headers[] = 'Accept-Encoding: identity';
$headers[] = 'Accept-Language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7';
$headers[] = 'Cache-Control: no-cache';
$headers[] = 'Connection: keep-alive';
$headers[] = 'Host: ' . parse_url($url)['host'];
$headers[] = 'Pragma: no-cache';
$headers[] = 'Sec-Fetch-Dest: document';
$headers[] = 'Sec-Fetch-Mode: navigate';
$headers[] = 'Sec-Fetch-Site: none';
$headers[] = 'Sec-Fetch-User: ?1';
$headers[] = 'Upgrade-Insecure-Requests: 1';
$headers[] = 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.193 Safari/537.36';


curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);

// Разрешаем переадресацию:
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, 1);

// Запрещаем прямяой вывод результата запроса:
curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);

// Делаем сам запрос:
$result = curl_exec($curl);

// Завершаем сеанс:
curl_close($curl);

// Смотрим результат:
echo $result;

If the method does not help, write a link to the site, maybe I'll tell you what.

A
Anton Shamanov, 2020-12-02
@SilenceOfWinter

look at the ajax page requests in the browser, most likely the cookie is missing

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question