PHP
AnnaGrimes, 2016-07-28 19:19:58

Why doesn't the server serve a file via a direct link?

Hello! There is a site I'm trying to scrape, and I have pre-collected direct links to all the files I need from it (all of them are graphic files, jpg). The site has some kind of protection: only a certain number of files can be downloaded from it, and once that limit is exceeded, those same direct links open to nothing but an empty page. How can it be that a direct link shows an image one moment and emptiness the next? How can I get around this limitation?
P.S. I don't have decent proxies, and everything I've tried is very slow. Still, everything does work through a proxy for a while, and after each limit you just switch to the next one, but we are talking about a huge number of files, so this approach simply doesn't scale.
How do I bypass this protection? Please advise!
Thank you!

1 answer
Vadim Misbakh-Soloviev, 2016-07-28
@mva

The answer to the original question is simple: there are thousands of ways to implement such a limit, from a Web Application Firewall to NginX's built-in Lua module.
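For illustration, here is a minimal sketch of how such a limit might be set up on the server side with NginX's stock limit_req module. The zone name, rate, and location path below are assumptions for the example, not the actual site's settings:

# Hypothetical nginx.conf fragment: throttle downloads per client IP.
http {
    # Shared-memory zone keyed by client IP, allowing roughly 10 requests/minute.
    limit_req_zone $binary_remote_addr zone=downloads:10m rate=10r/m;

    server {
        location /images/ {
            # Requests over the limit (beyond a burst of 5) are rejected,
            # which to the downloader looks exactly like "a void".
            limit_req zone=downloads burst=5 nodelay;
        }
    }
}

Since a counter like this is keyed by client IP, switching the proxy resets it, which matches the behavior you observed.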
Answering the question posted in the comments:

function getContent($url, $referer = null, $proxies = array(null))
{
    $proxies = (array) $proxies;
    $steps = count($proxies);
    $step = 0;
    $try = true;
    $output = false;
    while ($try) {
        // create curl resource
        $ch = curl_init();
        $proxy = isset($proxies[$step]) ? $proxies[$step] : null;

        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_REFERER, $referer);
        curl_setopt($ch, CURLOPT_USERAGENT, "Opera/9.80 (Windows NT 5.1; U; ru) Presto/2.9.168 Version/11.51");
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_PROXY, $proxy);
        curl_setopt($ch, CURLOPT_TIMEOUT, 10);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the transfer as a string
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

        $output = curl_exec($ch); // get content
        $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE); // get the HTTP status code

        // close curl resource to free up system resources
        curl_close($ch);

        $step++;
        // retry with the next proxy until the list is exhausted or we get HTTP 200
        $try = (($step < $steps) && ($http_code != 200));
    }
    return $output;
}
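A quick usage sketch; the URL, referer, and proxy addresses below are made-up placeholders:

// Hypothetical call: walk the pre-collected links, rotating through a proxy list.
$proxies = array(
    null,                 // first try a direct connection
    '203.0.113.10:3128',  // placeholder proxy addresses
    '203.0.113.11:8080',
);

$data = getContent('http://example.com/images/photo.jpg', 'http://example.com/gallery/', $proxies);
if ($data !== false) {
    file_put_contents('photo.jpg', $data); // save the downloaded image
}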

// offtopic: I actually have a ready-made Lua parser that changes both the User-Agent and the proxy on every request (with HTML-to-array parsing attached, so the needed elements can be pulled out in a loop) and then saves the result as CSV. But, obviously, it (the parser) has to be tailored to each site's layout :)
P.S. You also got the tags wrong: your final question is about PHP, not NginX, isn't it? :)
