PHP
Max, 2016-12-01 04:01:24

How can I limit the number of page requests per user and protect the site from being scraped?

Good day!

There is a photographer's website, English-language, with its main traffic from Google/Pinterest. Scrapers download the entire site, generating outgoing traffic that at times exceeds real user traffic tenfold and loading the server (they download in 70 parallel streams!), and then use the photos on other sites for their own purposes.

I know that real users never view more than about 50 pages per visit, so when 10,000 requests come from one source, it is obvious what is going on.

Please tell me how to protect against this. Perhaps there is a way to limit it at the nginx, Apache, or PHP level? A rule in .htaccess, or a WordPress plugin, that limits the number of requests from a single IP (or user agent) while still letting all the search engine bots crawl freely... Whitelisting search engine IPs in .htaccess is not a panacea: bot IPs can change, and refusing a page to a legitimate bot would be an epic fail.

Ideally there would be a temporary ban, imposed for a set period, whenever an IP/user agent exceeds N requests.


4 answer(s)
Philipp, 2016-12-01
@wtfowned

Googlebot has a proper reverse DNS zone:
https://support.google.com/webmasters/answer/80553...
Yandex's bot can be identified the same way:
https://yandex.com/support/webmaster/robot-working...
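Both engines describe the same forward-confirmed reverse DNS check: resolve the IP to a hostname, make sure the hostname belongs to the engine, then resolve the hostname back to the IP. A minimal PHP sketch (the allowed domain suffixes are my reading of the docs above, adjust as needed):

```php
<?php
// Forward-confirmed reverse DNS: IP -> hostname -> IP again,
// and the hostname must belong to the search engine.
function is_search_engine_bot(string $ip): bool
{
    $host = gethostbyaddr($ip);            // reverse (PTR) lookup
    if ($host === false || $host === $ip) {
        return false;                      // no PTR record
    }

    $allowed = ['.googlebot.com', '.google.com', '.yandex.ru', '.yandex.net', '.yandex.com'];
    $ok = false;
    foreach ($allowed as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $ok = true;
            break;
        }
    }
    if (!$ok) {
        return false;                      // PTR points somewhere else
    }

    // Forward-confirm: the name must resolve back to the same IP.
    // (gethostbyname is IPv4-only; for IPv6 use dns_get_record with DNS_AAAA.)
    return gethostbyname($host) === $ip;
}

// Cache the result per IP - two DNS lookups on every request would be slow.
// if (is_search_engine_bot($_SERVER['REMOTE_ADDR'])) { /* never rate-limit */ }
```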
I would also recommend using Crawl-delay:
https://yandex.com/support/webmaster/controlling-r...
There is also an nginx module for limiting the number of active connections:
nginx.org/en/docs/http/ngx_http_limit_conn_module.html
Plus, add a watermark to the images.
There are also bot traps like this one: www.fleiner.com/bots/#trap
You can also do an interesting thing: after 30 page requests from one IP in under a minute, just show a captcha. A person solves it easily; set them a cookie and allow another 30 pages. Configure the robots via Crawl-delay so they don't hit you more than once every 10 seconds, and everything will be fine. Googlebot's crawl rate is configured through the webmaster (Search Console) interface.
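A rough sketch of that counter-plus-captcha idea in PHP. It assumes the APCu extension for the per-IP counter; show_captcha_page() and captcha_passed() are hypothetical placeholders for whatever captcha you use:

```php
<?php
// Per-IP page counter with a captcha gate - rough sketch.
// Verified search engine bots should bypass this entirely
// (see the reverse DNS check above).
$ip = $_SERVER['REMOTE_ADDR'];

if (!isset($_COOKIE['human_ok'])) {
    $key = 'hits_' . $ip;
    apcu_add($key, 0, 60);        // create the counter with a 60-second TTL
    $hits = apcu_inc($key);       // increment on every page request

    if ($hits > 30) {
        // More than 30 pages in a minute without the cookie: demand a captcha.
        if (!captcha_passed()) {          // hypothetical helper
            show_captcha_page();          // hypothetical helper
            exit;
        }
        // Captcha solved: hand out a cookie and allow the next batch of pages.
        setcookie('human_ok', '1', time() + 3600, '/');
    }
}
```

In a real setup the cookie value should be signed (e.g. an HMAC over the IP and a secret), otherwise the scraper can simply set human_ok=1 itself.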

polifill, 2016-12-01
@polifill

And also close the site off from search engines, then?
;)
They create a lot of traffic and a pile of requests too,
and simply comb through everything in a row - all 100500 pages.
Read the search engines' own instructions.
For example, Yandex explains how to verify its bot:
1. By User-agent (it can be faked, which is why the other checks are needed).
2. By a reverse DNS lookup of the IP, get the DNS name.
3. Then resolve that name and check it points back to the same IP.
Yandex writes that this protects against fakes.
I don't know the details for Google offhand - read and search; it is surely possible there too.
Firstly, a well-written engine barely loads the server at all.
I am developing a website with about 7,000 photos.
I pay 15 rubles per month for the photo hosting (the engine is hosted separately), with traffic of about 2,000 uniques per day.
Look for the problem in your engine.
Secondly, if the photos are being lifted onto non-Russian sites, those are Western guys - and over there copyright infringement is punished very severely. Complain to their hosting provider, to Pinterest, etc.
Thirdly, here they can be punished through Roskomnadzor as well.
Fourthly, this is what you dreamed of. This is success.
Put your logo on the photos - and don't worry.
It's free advertising.
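If you want to stamp the logo on the server side, here is a minimal sketch using PHP's GD extension (the file names are just placeholders; GD with JPEG/PNG support is assumed):

```php
<?php
// Overlay a PNG logo onto a JPEG photo with GD - minimal sketch.
$photo = imagecreatefromjpeg('photo.jpg');   // placeholder path
$logo  = imagecreatefrompng('logo.png');     // placeholder path

// Place the logo in the bottom-right corner with a 10 px margin.
$x = imagesx($photo) - imagesx($logo) - 10;
$y = imagesy($photo) - imagesy($logo) - 10;
imagecopy($photo, $logo, $x, $y, 0, 0, imagesx($logo), imagesy($logo));

imagejpeg($photo, 'photo_watermarked.jpg', 90);  // save with quality 90
imagedestroy($photo);
imagedestroy($logo);
```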

xmoonlight, 2016-12-01
@xmoonlight

Here are the rules for .htaccess

Puma Thailand, 2016-12-01
@opium

Nginx can limit the number of simultaneous connections per IP (the limit_conn module).
