How can I pretend to be a browser and download a page?

Herman Martin, 2018-10-25 19:39:40

PHP

Here is my code. I want to parse a single page, but it stubbornly fails:

$cookie_jar = Yii::$app->getBasePath() . '/web/_ssuper_cccoookkie.txt';
//echo $cookie_jar;

$c = curl_init();
$url = Yii::$app->params['parse2_siteUrl']; // https://technopoint.ru/catalog/17a89a0416404e77/materinskie-platy/       
curl_setopt($c, CURLOPT_URL, $url);
curl_setopt($c, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.87 Safari/537.36");
curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($c, CURLINFO_HEADER_OUT,1);
curl_setopt($c, CURLOPT_HEADER,1);
$referer = 'https://technopoint.ru';
curl_setopt($c, CURLOPT_REFERER, $referer);
//curl_setopt($c, CURLOPT_NOBODY,1);

// #1
$cookie = 'PHPSESSID=cb29e0b7be713cef7f3fb98bf1dd1209;';
//curl_setopt($c, CURLOPT_COOKIE, $cookie);

curl_setopt($c, CURLOPT_FOLLOWLOCATION, true );
curl_setopt($c, CURLOPT_AUTOREFERER, true );
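// note: COOKIESESSION makes cURL ignore session cookies (e.g. a PHPSESSID with no expiry) loaded from $cookie_jar
curl_setopt($c, CURLOPT_COOKIESESSION, true );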
curl_setopt($c, CURLOPT_COOKIEJAR, $cookie_jar);
curl_setopt($c, CURLOPT_COOKIEFILE, $cookie_jar);
$cohh = [];
$cohh[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8';
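// the next 2 lines cause garbled characters in the output
// (a hand-set Accept-Encoding makes the server send a compressed body that cURL
// does not decode; CURLOPT_ENCODING = '' lets cURL request and decode the encodings it supports)
//$cohh[] = 'Accept-Encoding: gzip, deflate, br';
//$cohh[] = 'Accept-Language: ru,en-US;q=0.9,en;q=0.8';
curl_setopt($c, CURLOPT_ENCODING, '');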
$cohh[] = 'Cache-Control: max-age=0';
$cohh[] = 'Connection: keep-alive';
$cohh[] = 'Upgrade-Insecure-Requests: 1';
//$cohh[] = 'Cookie: PHPSESSID=eceeb02bb74949d27e02bfb7b932de4e; city_path=astrahan; path=/; domain=.technopoint.ru';

curl_setopt($c, CURLOPT_HTTPHEADER, $cohh);

curl_setopt($c, CURLOPT_VERBOSE,1);
$curl_fn = "curl_errors.txt";
$curl_log = fopen($curl_fn, 'w');
curl_setopt($c, CURLOPT_STDERR, $curl_log);

$page = curl_exec($c);
$curl_info = curl_getinfo($c);
echo $page;
echo Debug::d($curl_info['request_header'],'CURL_INFOOO');

curl_close($c);
fclose($curl_log); // close the verbose log opened above
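Since CURLOPT_HEADER is enabled, $page holds the response headers followed by the body. Splitting the two makes it easy to check whether the server ever sends PHPSESSID in a Set-Cookie header; if it never does, the cookie is probably being set some other way (for example by a script in the page). A minimal sketch; splitResponse and setCookieNames are hypothetical helpers, not part of cURL or Yii:

```php
<?php
// Split a raw cURL response (fetched with CURLOPT_HEADER enabled) into the
// header block and the body, using the size reported by CURLINFO_HEADER_SIZE.
function splitResponse(string $raw, int $headerSize): array
{
    return [substr($raw, 0, $headerSize), substr($raw, $headerSize)];
}

// Return the names of all cookies set via Set-Cookie headers. With
// CURLOPT_FOLLOWLOCATION the header block can contain several responses,
// so every Set-Cookie line is collected.
function setCookieNames(string $headers): array
{
    $names = [];
    foreach (preg_split('/\r?\n/', $headers) as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            $pair = trim(substr($line, strlen('Set-Cookie:')));
            $eq = strpos($pair, '=');
            if ($eq !== false) {
                // The cookie name is everything before the first '='.
                $names[] = substr($pair, 0, $eq);
            }
        }
    }
    return $names;
}
```

Usage with the handle from the question (call curl_getinfo before curl_close): `[$headers, $body] = splitResponse($page, curl_getinfo($c, CURLINFO_HEADER_SIZE)); var_dump(setCookieNames($headers));`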

I need a hint: what else do I have to feed cURL so that it downloads the page?
As far as I can tell, I'm doing the same thing the browser does: I looked in the browser's debugger at what it sends and put the same into the cURL request.
P.S. The manual workaround: to parse any page from this site, I open the URL from $url in the browser and copy its PHPSESSID by hand. Once copied, it works for about 9-12 hours.
But doing this by hand is not a serious solution.
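Before copying PHPSESSID by hand, it may help to check what cURL itself saved to $cookie_jar (PHP's cURL writes that file when the handle is closed). The file uses the Netscape cookie format: seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value), with HttpOnly cookies prefixed by "#HttpOnly_". A minimal sketch of a reader, assuming that format; readCookieJar is a hypothetical helper:

```php
<?php
// Parse a Netscape-format cookie file such as the one cURL writes for
// CURLOPT_COOKIEJAR. Returns name => ['value' => ..., 'expires' => ...];
// an 'expires' of 0 means a session cookie with no expiry date.
function readCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\r?\n/', $contents) as $line) {
        if (strpos($line, '#HttpOnly_') === 0) {
            // HttpOnly cookies carry this prefix; strip it so the line
            // is not mistaken for a comment.
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        $cookies[$fields[5]] = ['value' => $fields[6], 'expires' => (int) $fields[4]];
    }
    return $cookies;
}
```

For example, after curl_close: `$jar = readCookieJar((string) file_get_contents($cookie_jar)); var_dump($jar['PHPSESSID'] ?? 'no PHPSESSID stored');`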

1 answer

Alexander, 2018-11-13
@alexandrraizer

Most likely the cookies there are generated by JavaScript. You can access the resource once through Selenium to obtain the cookies, then use them to fetch the desired page with cURL. Say, after every 1,000 requests, fetch fresh cookies through Selenium again, and so on.
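The scheme above (get cookies once from a real browser, reuse them in cURL, refresh them periodically) might look roughly like this. It is a sketch under assumptions: CookieSource is a hypothetical class, the limit of 1000 is only the example figure from this answer, and the callable passed in stands for the Selenium step (for instance a WebDriver client that opens the page and reads its cookies), which is not shown here. The callable is expected to return an array of cookie name => value.

```php
<?php
// Caches browser-obtained cookies and re-fetches them after a fixed number
// of requests. $fetchCookies is a hypothetical stand-in for the Selenium
// step and must return an array of cookie name => value.
final class CookieSource
{
    private $fetchCookies;
    private $limit;
    private $used = 0;
    private $cookies = null;

    public function __construct(callable $fetchCookies, int $limit = 1000)
    {
        $this->fetchCookies = $fetchCookies;
        $this->limit = $limit;
    }

    // Return a string for CURLOPT_COOKIE ("name=value; name2=value2"),
    // fetching cookies when none are cached yet or the limit is reached.
    public function cookieHeader(): string
    {
        if ($this->cookies === null || $this->used >= $this->limit) {
            $this->cookies = ($this->fetchCookies)();
            $this->used = 0;
        }
        $this->used++;
        $pairs = [];
        foreach ($this->cookies as $name => $value) {
            $pairs[] = $name . '=' . $value;
        }
        return implode('; ', $pairs);
    }
}
```

Each cURL request would then call `curl_setopt($c, CURLOPT_COOKIE, $source->cookieHeader());`. Counting requests is only one possible refresh trigger; re-fetching when the site starts returning the "wrong" page again would work the same way.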