Parsing
miyelar249, 2020-01-05 15:08:35

How do I protect the site from parsing (scraping) without hurting search robots?

The only option I see is a hard filter by IP, because all the other methods suggested on the Internet look naive.
We add only Google and Yandex to the whitelist of search robots.
We download all the IP subnet ranges of the search engines and check every request to the site against them; if it is not a search engine, it is rate-limited: 3 requests in 10 seconds from one IP and we serve a captcha. Is that a workable option? And then the question immediately arises: where do I get the IP ranges of Google and Yandex?
Number of pages ~ 5 million.
UPD: you do not have to download all the IPs; instead, do a reverse DNS lookup and check the host against the allowed ones, then store those IPs in the database to minimize further DNS lookups.
https://yandex.ru/support/webmaster/robot-working...
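A minimal Python sketch of the reverse-DNS idea from the UPD: the allowed host suffixes, cache size and example address are illustrative assumptions, not an official list. The check is the usual forward-confirmed reverse DNS: the PTR record must point to a Google/Yandex domain, and that hostname must resolve back to the same IP.

```python
import socket
from functools import lru_cache

# Illustrative suffixes for Google and Yandex crawler hostnames (assumption, not an official list)
ALLOWED_SUFFIXES = (
    ".googlebot.com", ".google.com",
    ".yandex.ru", ".yandex.net", ".yandex.com",
)

@lru_cache(maxsize=100_000)  # in-process cache; a real site would persist verdicts in the database
def is_search_bot(ip: str) -> bool:
    """Forward-confirmed reverse DNS check for a client IP."""
    try:
        host, _, _ = socket.gethostbyaddr(ip)        # reverse lookup: IP -> hostname
    except OSError:
        return False
    if not host.lower().rstrip(".").endswith(ALLOWED_SUFFIXES):
        return False                                 # PTR does not point to an allowed domain
    try:
        forward_ips = {info[4][0] for info in socket.getaddrinfo(host, None)}
    except OSError:
        return False
    return ip in forward_ips                         # confirm the PTR record is not faked

if __name__ == "__main__":
    # 203.0.113.7 is a documentation-range address, used here only as a placeholder
    print(is_search_bot("203.0.113.7"))
```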

Ivan Yakushenko, 2020-01-05
@kshnkvn

I remember how Roskomnadzor tried to block Telegram by IP and ended up blocking everything except Telegram.
You will not find the IP address ranges used by Yandex and Google anywhere, especially since they change.
And yes, you cannot protect your site from parsing at all; it is fundamentally impossible.

xmoonlight, 2020-01-05
@xmoonlight

> How is that an option?

Do not dig a hole for someone else; let him dig his own!
If you want to hide content, fine: show it only to authorized users.
1. Post a new article (do not give the link to anyone; for now, also hide it from your blog's search and from sections/tags).
2. Add the link to the sitemap (give the sitemap file a non-trivial name!).
3. Set up a trigger that checks whether the material has appeared in the search engines' results (a rough sketch follows below).
4. As soon as the article appears everywhere in the search results (i.e. it has been indexed), open it to the public on your website.
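A rough sketch of steps 3-4, assuming you already have some way to ask the search engines whether a URL is in their index (for example via their webmaster APIs); that check is passed in as `is_indexed` and is purely hypothetical here, as is `make_public`:

```python
import time
from typing import Callable

def publish_when_indexed(article_url: str,
                         is_indexed: Callable[[str], bool],
                         make_public: Callable[[str], None],
                         poll_seconds: int = 3600) -> None:
    """Keep the article hidden, poll the index check, open it up once it is indexed."""
    while not is_indexed(article_url):
        time.sleep(poll_seconds)      # e.g. check once an hour
    make_public(article_url)          # e.g. set a `published` flag in the CMS database
```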

Ranwise, 2020-01-05
@Ranwise

> 3 requests in 10 seconds with 1 ip - we give a captcha
your page has 10 images plus a bunch of JS and styles, so a single visitor will not even manage to load the page before he flies into the ban. And after refreshing he gets the captcha page? So the site is effectively down...
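One way to avoid exactly this problem is to count only page (HTML) requests toward the limit and ignore static assets. A minimal in-memory sketch; the extension list and store are illustrative, and a real setup would keep the counters in something shared like Redis or do the limiting in the web server:

```python
import time
from collections import defaultdict, deque

STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".woff2")
WINDOW_SECONDS = 10     # the 10-second window from the question
MAX_PAGE_REQUESTS = 3   # the 3-request threshold from the question

_hits = defaultdict(deque)  # ip -> timestamps of recent page requests

def needs_captcha(ip: str, path: str) -> bool:
    """True if this request should be answered with a captcha instead of the page."""
    if path.lower().endswith(STATIC_EXTENSIONS):
        return False                        # static assets never count toward the limit
    now = time.monotonic()
    hits = _hits[ip]
    hits.append(now)
    while hits and now - hits[0] > WINDOW_SECONDS:
        hits.popleft()                      # drop hits that fell out of the window
    return len(hits) > MAX_PAGE_REQUESTS
```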

index0h, 2020-01-05
@index0h

> How do I protect the site from parsing (scraping) without hurting search robots?

Close access by IP for everyone except the bots. There is no other way.

Vladimir, 2020-01-05
@HistoryART

Forget it, everyone steals from everyone; at most you can obfuscate the code so that it is a little harder for a parser to steal from you) The same VK stays unparsed only because of its size: no single computer can withstand such a load)
P.S.: And even if one could, it would take a long time, and by "long" I mean very long)
