How can I make the server think I'm not a robot?


PHP

Israpil Akhmedov, 2019-06-18 09:15:01

Hello, world.
I'm trying to parse a resource, but it resists. Every now and then it decides I'm a robot. I copied all the headers from the browser, and it did not help. From the browser everything works, even if I spam requests endlessly, but from the script it fails even with a delay between requests.
The code itself:

    /**
     * GET request to an external resource
     *
     * @param string $url URL of the resource
     * @param array $headers Additional request headers
     *
     * @return bool|string false on error, otherwise the response body
     */
    public static function get(string $url, array $headers = []){
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

        curl_setopt($ch, CURLOPT_COOKIEJAR, __DIR__.DIRECTORY_SEPARATOR."cook.txt");
        curl_setopt($ch, CURLOPT_COOKIEFILE, __DIR__.DIRECTORY_SEPARATOR."cook.txt");

        // Note: this makes cURL ignore session cookies loaded from the cookie file
        curl_setopt($ch, CURLOPT_COOKIESESSION, true);
        // array_merge appends the caller's headers; the + operator would drop
        // them because the numeric keys of the two lists collide
        curl_setopt($ch, CURLOPT_HTTPHEADER, array_merge([
                "accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3",
                "accept-language: ru,en;q=0.9",
                "cache-control: max-age=0",
                "upgrade-insecure-requests: 1",
                "user-agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36"
            ], $headers));
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 YaBrowser/19.6.1.153 Yowser/2.5 Safari/537.36");

        //curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

        $result = curl_exec($ch);
        if(($error = curl_error($ch))){
            echo "cURL returned an error: ".$error;
        }

        curl_close($ch);

        return $result;
    }

The resource itself:
https://www.copart.com/public/data/lotdetails/solr...


1 answer(s)

AUser0, 2019-06-18
@AUser0

cURL has one behavior that conforms to the HTTP standard, yet it is exactly what gives it away. In an HTTP request, cURL does not put the full URL (with the server name) in the request line, as in "GET http://site.org/path/file.ext?params HTTP/1.1". The server catches on precisely this absence of the full URL.
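For illustration, here is a minimal sketch of what a request with the full URL in the request line looks like on the wire. The helper name buildAbsoluteFormRequest() is hypothetical (it is not part of cURL or of the code in the question), and the URL is the placeholder from the answer. Whether this form actually gets past a given bot check is the claim above, not something this sketch verifies.

```php
<?php
/**
 * Build a raw GET request whose request line carries the full URL,
 * e.g. "GET http://site.org/path/file.ext?params HTTP/1.1".
 * Hypothetical helper for illustration only.
 *
 * @param string $url     Full URL, including scheme and host
 * @param array  $headers Extra header lines such as "Accept: text/html"
 *
 * @return string The raw request text, terminated by an empty line
 */
function buildAbsoluteFormRequest(string $url, array $headers = []): string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        throw new InvalidArgumentException("URL has no host: $url");
    }

    // The Host header carries the port only when the URL names one explicitly
    $port = parse_url($url, PHP_URL_PORT);
    $hostHeader = $port ? "$host:$port" : $host;

    $lines = ["GET $url HTTP/1.1", "Host: $hostHeader"];
    foreach ($headers as $header) {
        $lines[] = $header;
    }
    // Ask the server to close the connection so the reader can stop at EOF
    $lines[] = "Connection: close";

    return implode("\r\n", $lines) . "\r\n\r\n";
}

$request = buildAbsoluteFormRequest(
    "http://site.org/path/file.ext?params",
    ["Accept: text/html"]
);
```

By default cURL would send "GET /path/file.ext?params HTTP/1.1" for the same URL and put the server name only in the Host header.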
Yandex's captcha does this: out of 100 requests, only 2-3 got through, at random. I had to reimplement all the cURL functionality myself via fsockopen()/fread()/fwrite(), loading and saving cookies in the cURL cookie file.
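A rough sketch of the two pieces such a rewrite needs, under stated assumptions: sendRawRequest() and parseCookieJarLine() are hypothetical names, not the answerer's actual code. The first sends an already-built raw request over a socket; the second reads one line of a cookie file in the tab-separated Netscape format that cURL writes (domain, subdomain flag, path, secure flag, expiry, name, value). Redirects, chunked responses and saving cookies back are left out.

```php
<?php
/**
 * Send a pre-built raw HTTP request over a socket and return the raw
 * response (status line, headers and body), or false if the connection
 * fails. Hypothetical helper; $tls selects the "tls://" transport,
 * which requires the openssl extension.
 *
 * @return bool|string
 */
function sendRawRequest(string $host, int $port, string $rawRequest, bool $tls = false, float $timeout = 10.0)
{
    $target = ($tls ? "tls://" : "") . $host;
    $fp = @fsockopen($target, $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return false;
    }

    fwrite($fp, $rawRequest);

    // Read until the server closes the connection
    $response = '';
    while (!feof($fp)) {
        $chunk = fread($fp, 8192);
        if ($chunk === false) {
            break;
        }
        $response .= $chunk;
    }
    fclose($fp);

    return $response;
}

/**
 * Parse one line of a Netscape-format cookie file as written by cURL.
 * Returns null for blank lines, comments and malformed lines.
 * Hypothetical helper for illustration only.
 */
function parseCookieJarLine(string $line): ?array
{
    $line = rtrim($line, "\r\n");
    $httpOnly = false;

    // cURL marks HttpOnly cookies with a "#HttpOnly_" prefix on the domain
    if (strpos($line, '#HttpOnly_') === 0) {
        $httpOnly = true;
        $line = substr($line, strlen('#HttpOnly_'));
    } elseif ($line === '' || $line[0] === '#') {
        return null;
    }

    $fields = explode("\t", $line);
    if (count($fields) !== 7) {
        return null;
    }

    return [
        'domain'            => $fields[0],
        'includeSubdomains' => $fields[1] === 'TRUE',
        'path'              => $fields[2],
        'secure'            => $fields[3] === 'TRUE',
        'expires'           => (int) $fields[4],
        'name'              => $fields[5],
        'value'             => $fields[6],
        'httpOnly'          => $httpOnly,
    ];
}
```

A caller would read the cookie file line by line, keep the cookies whose domain and path match the request, and add them to the raw request as a single "Cookie: name=value; name2=value2" header before passing it to sendRawRequest().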