Does YandexAccessibilityBot not confirm itself on a reverse DNS lookup?

PHP

Alexander Apokin, 2019-02-19 13:46:34

I need to catch scrapers that parse the site.
I don't want to ban search robots by accident.
I wrote a script that analyzes the behavior of requests per IP address.
I identify search robots with a reverse DNS lookup, as described here.
The logs contain several IPs with the user agent YandexAccessibilityBot that are not confirmed by a reverse lookup.
At the moment there is not a single confirmed IP with the user agent YandexAccessibilityBot.
Bots that are identified correctly:
YandexBot/3.0
YandexMobileBot/3.0
YandexImages/3.0
and many others.
Is YandexAccessibilityBot simply not identifiable by a reverse lookup in principle?
I use PHP:
$ptr = gethostbyaddr($ip);   // reverse lookup: IP address to host name
$fwd = gethostbyname($ptr);  // forward lookup: must resolve back to $ip
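
The two calls above are the two halves of a forward-confirmed reverse DNS check. A minimal sketch of the complete check might look like this. The function name `isVerifiedYandexBot`, the injectable resolver parameters, and the suffix list (yandex.ru, yandex.net, yandex.com) are assumptions made for illustration, so the suffixes should be checked against Yandex's current documentation.

```php
<?php
/**
 * Forward-confirmed reverse DNS check (illustrative sketch).
 * The suffix list is an assumption; verify it against Yandex's documentation.
 */
function isVerifiedYandexBot(
    string $ip,
    ?callable $reverse = null,   // IP -> host name, defaults to gethostbyaddr()
    ?callable $forward = null    // host name -> list of IPv4s, defaults to gethostbynamel()
): bool {
    $reverse = $reverse ?? 'gethostbyaddr';
    $forward = $forward ?? 'gethostbynamel';

    $host = $reverse($ip);
    // gethostbyaddr() returns the unmodified IP when there is no PTR record
    // and false on malformed input.
    if ($host === false || $host === $ip) {
        return false;
    }

    // The host name must belong to one of the trusted domains.
    $host = strtolower(rtrim($host, '.'));
    $trusted = false;
    foreach (['.yandex.ru', '.yandex.net', '.yandex.com'] as $suffix) {
        if (substr($host, -strlen($suffix)) === $suffix) {
            $trusted = true;
            break;
        }
    }
    if (!$trusted) {
        return false;
    }

    // The forward lookup must lead back to the original IP.
    $ips = $forward($host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

Passing the resolvers as parameters is only a convenience for testing without network access; calling `isVerifiedYandexBot($ip)` uses the real DNS functions.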
IPs that identified themselves as this bot but were not confirmed by the reverse lookup:
178.154.155.102
178.154.155.101
5.45.211.60
5.45.216.109
5.45.216.110
5.45.211.61
Looking these IPs up through online services shows that they belong to Yandex.
What can you say about this situation?
If you have confirmed IPs for this bot, please post them.

2 answer(s)

Alexander Apokin, 2019-02-23
@apokin

For now, I assume that YandexAccessibilityBot, either intentionally or because of a bug, does not confirm itself on a reverse lookup. After checking with the lookup service, I mark these IPs as Yandex bots.
So far the problem affects only this bot. All other bots (Yandex, Google, Mail, etc.) are identified correctly.
As I understand it, this situation is normal and such bots are not taken into account in search. What do you think?
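
The interim approach described above, treating manually checked but unconfirmed addresses as Yandex, could be kept distinct from the DNS-verified case. A rough sketch follows. The function name `classifyYandexClaim`, the status strings, and passing the DNS result in as a boolean are all assumptions for illustration; the list holds the addresses from the question.

```php
<?php
/**
 * Classify a request that claims to be a Yandex bot (illustrative sketch).
 *
 * @param bool     $dnsConfirmed result of a forward-confirmed reverse DNS check
 * @param string[] $manualList   IPs confirmed by hand, e.g. via a lookup service
 */
function classifyYandexClaim(string $ip, bool $dnsConfirmed, array $manualList): string
{
    if ($dnsConfirmed) {
        return 'verified';          // reverse and forward DNS agree
    }
    if (in_array($ip, $manualList, true)) {
        return 'manually-trusted';  // owner checked by hand, not DNS-confirmed
    }
    return 'unverified';            // candidate for rate limiting or a ban
}

// Addresses listed in the question as unconfirmed by DNS.
$manualList = [
    '178.154.155.102', '178.154.155.101',
    '5.45.211.60', '5.45.216.109', '5.45.216.110', '5.45.211.61',
];
```

Keeping a separate status for hand-checked addresses makes it easy to drop the list later if the bot starts confirming itself through DNS.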
Answer from Yandex:
Hello!
The search engine has a very large number of different robots, and various other Yandex services can also visit sites and send requests to them. You can find the list of search engine robots here:
https://yandex.ru/support/webmaster/robot-workings...
At the same time, visits by such robots should indeed not affect how the site is displayed in search. For example, some robots interpret robots.txt in a special way, so they may ignore bans on visiting pages set with the Disallow directive. Even so, the prohibited links should not appear in search.
The IP address specified earlier does not belong to the indexing robot, so the page retrieved by that request should not appear in the search results.

Anatoly Denisov, 2019-02-20
@Wildcorsa

I checked specifically: all YandexAccessibilityBot requests come from IP addresses in different subnets. Every one of those addresses has a reverse DNS record of the form XX-XXX-XX-XXX.spider.yandex.com, so the bot is identified quite reliably.
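
The host name form mentioned above can be checked with a regular expression. This is only a sketch: the function name `matchesSpiderHost` is hypothetical, and it assumes that the XX-XXX-XX-XXX label is the IPv4 address with dots replaced by hyphens, which the answer does not state explicitly.

```php
<?php
/**
 * Check whether a PTR host name has the form a-b-c-d.spider.yandex.com and,
 * under the assumption stated above, encodes the given IPv4 address.
 */
function matchesSpiderHost(string $host, string $ip): bool
{
    $pattern = '/^(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.spider\.yandex\.com$/i';
    if (!preg_match($pattern, rtrim($host, '.'), $m)) {
        return false;
    }
    // Rebuild the dotted address from the four captured octets.
    return implode('.', array_slice($m, 1)) === $ip;
}
```

A pattern match alone proves nothing, since anyone can publish a PTR record with any name for their own address; it is useful only together with a forward lookup of the same host name.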