Israpil Akhmedov, 2019-06-18 09:15:01
PHP

How can I make the server think I'm not a robot?

Hello World.
I'm trying to scrape a site, but it resists: every now and then it decides I'm a robot. I copied all the request headers from the browser, and that didn't help. From the browser everything works, even if I send requests endlessly, but from my script it fails even when I add delays between requests.
Here is the code:

    /**
     * Sends a GET request to an external resource
     *
     * @param string $url     URL of the resource
     * @param array  $headers Additional request headers
     *
     * @return bool|string false on error, otherwise the response body
     */
    public static function get(string $url, array $headers = []){
        $ch = curl_init();
        $cookieFile = __DIR__.DIRECTORY_SEPARATOR."cook.txt";

        curl_setopt($ch, CURLOPT_URL, $url);
        // Disabling certificate verification is insecure; kept as in the original setup
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        // Read cookies from, and write them back to, the same file between calls.
        // CURLOPT_COOKIESESSION is not set: with true, cURL ignores the session
        // cookies it loads from the file, so they would not survive between calls.
        curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
        curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);

        // array_merge() appends the extra headers; the + operator would silently
        // drop any $headers entries whose numeric keys already exist on the left.
        curl_setopt($ch, CURLOPT_HTTPHEADER, array_merge([
                "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
                "accept-language: ru,en;q=0.9",
                "cache-control: max-age=0",
                "upgrade-insecure-requests: 1",
            ], $headers));
        // The User-Agent is set once here instead of also in CURLOPT_HTTPHEADER
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36");

        //curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

        $result = curl_exec($ch);
        if(($error = curl_error($ch))){
            echo "cURL returned an error: ".$error;
        }

        curl_close($ch);

        return $result;
    }

The resource I'm requesting:
https://www.copart.com/public/data/lotdetails/solr...

1 answer
AUser0, 2019-06-18

cURL has one quirk that is fully consistent with the HTTP standard, yet it is exactly what gets caught. In the request line, cURL does not send the full URL including the server name, i.e. it does not send "GET http://site.org/path/file.ext?params HTTP/1.1". The server detects exactly this absence of the full URL.
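To make the difference concrete, here is a minimal PHP sketch that builds the raw text of a GET request in both forms. The function name `buildRawGetRequest()` is hypothetical (it is not part of cURL, PHP, or the answerer's code): with `$absoluteForm = true` the request line carries the full URL, as in the example above; with `false` it carries only the path and query, which is what cURL sends when talking directly to a server.

```php
<?php
// Hypothetical helper (not a cURL or PHP built-in): builds the raw text of an
// HTTP/1.1 GET request for an absolute URL such as "http://site.org/path?x=1".
//   $absoluteForm = true  -> request line "GET http://site.org/path?x=1 HTTP/1.1"
//   $absoluteForm = false -> request line "GET /path?x=1 HTTP/1.1" (what cURL sends)
function buildRawGetRequest(string $url, array $headers = [], bool $absoluteForm = true): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // Host plus optional explicit port, e.g. "site.org" or "site.org:8080"
    $authority = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Path and query string; any #fragment is never sent to the server
    $target = ($parts['path'] ?? '/') . (isset($parts['query']) ? '?' . $parts['query'] : '');
    if ($absoluteForm) {
        $target = $parts['scheme'] . '://' . $authority . $target;
    }

    $lines = ["GET $target HTTP/1.1", "Host: $authority"];
    foreach ($headers as $header) {
        $lines[] = $header; // extra headers, e.g. "Accept-Language: ru,en;q=0.9"
    }
    $lines[] = 'Connection: close';

    // The header block ends with an empty line (CRLF CRLF); a GET has no body
    return implode("\r\n", $lines) . "\r\n\r\n";
}
```

For example, `buildRawGetRequest("http://site.org/path/file.ext?params")` starts with `GET http://site.org/path/file.ext?params HTTP/1.1`, while passing `false` as the third argument yields `GET /path/file.ext?params HTTP/1.1`.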
Yandex's captcha does this: out of 100 requests, only 2-3 got through, seemingly at random. I had to reimplement all the cURL functionality I needed myself on top of fsockopen()/fwrite()/fread(), with cookies loaded from and saved to a file in cURL's cookie format.
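Below is a minimal sketch of the approach described above, under stated assumptions: it is not the answerer's code, the function names `loadCurlCookies()` and `rawGet()` are hypothetical, it only reads cookies from the cURL cookie jar (saving `Set-Cookie` responses back to the file is left out), and it does not decode chunked response bodies. It relies on the Netscape cookie-file format that cURL writes for `CURLOPT_COOKIEJAR`: one tab-separated line per cookie, with HttpOnly entries prefixed by `#HttpOnly_`.

```php
<?php
// Hypothetical helpers sketching a cURL-free GET over fsockopen() that reuses
// cookies from a cURL cookie jar (Netscape format, tab-separated fields:
// domain, include-subdomains, path, secure, expires, name, value).

// Returns "name=value; ..." for a Cookie header, or '' if no cookie matches.
function loadCurlCookies(string $file, string $host, string $path = '/', bool $https = true): string
{
    if (!is_readable($file)) {
        return '';
    }
    $pairs = [];
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $line = rtrim($line, "\r");
        if (strpos($line, '#HttpOnly_') === 0) {
            // cURL writes HttpOnly cookies with this prefix; they are real cookies
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // comment or blank line
        }
        $fields = explode("\t", $line, 7);
        if (count($fields) < 7) {
            continue; // not a cookie line
        }
        [$domain, $subdomains, $cookiePath, $secure, $expires, $name, $value] = $fields;

        $domain = ltrim($domain, '.');
        $suffix = '.' . $domain;
        $domainOk = $host === $domain
            || ($subdomains === 'TRUE' && substr($host, -strlen($suffix)) === $suffix);
        $pathOk = strpos($path, $cookiePath) === 0;               // simplified prefix match
        $secureOk = $secure !== 'TRUE' || $https;
        $alive = (int)$expires === 0 || (int)$expires > time();   // 0 = session cookie

        if ($domainOk && $pathOk && $secureOk && $alive) {
            $pairs[] = "$name=$value";
        }
    }
    return implode('; ', $pairs);
}

// Sends a GET with the full URL in the request line and returns the raw response
// (status line, headers and body; the body may still be chunked-encoded).
function rawGet(string $url, array $headers, string $cookieFile): string
{
    $p = parse_url($url);
    if ($p === false || !isset($p['scheme'], $p['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }
    $https = $p['scheme'] === 'https';
    $host = $p['host'];
    $port = $p['port'] ?? ($https ? 443 : 80);
    $path = $p['path'] ?? '/';
    $target = $p['scheme'] . '://' . $host . (isset($p['port']) ? ":$port" : '')
        . $path . (isset($p['query']) ? '?' . $p['query'] : '');

    $fp = fsockopen(($https ? 'ssl://' : '') . $host, $port, $errno, $errstr, 30);
    if ($fp === false) {
        throw new RuntimeException("fsockopen failed: $errstr ($errno)");
    }
    stream_set_timeout($fp, 30);

    $lines = array_merge(["GET $target HTTP/1.1", "Host: $host"], $headers);
    $cookie = loadCurlCookies($cookieFile, $host, $path, $https);
    if ($cookie !== '') {
        $lines[] = "Cookie: $cookie";
    }
    $lines[] = 'Connection: close';
    fwrite($fp, implode("\r\n", $lines) . "\r\n\r\n");

    $response = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break;
        }
        $response .= $chunk;
        if (stream_get_meta_data($fp)['timed_out']) {
            break; // give up if the server stalls
        }
    }
    fclose($fp);
    return $response;
}
```

For example, `rawGet($url, ["Accept-Language: ru,en;q=0.9"], __DIR__ . DIRECTORY_SEPARATOR . "cook.txt")` would send the cookies that the question's cURL code stored in the same `cook.txt` file.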
