How can I detect that my site is being scraped?
Greetings,
From time to time people scrape the site: usually the articles, and some even manage to copy the design. Is it possible to detect scraping programmatically and, for example, block it? Maybe someone has good advice or a good article bookmarked.
No. A well-written scraper cannot be distinguished from a regular user.
Search bots also crawl websites.
If this is a store, there will always be scraping and copying.
If these are articles, you can submit each one to Yandex before publishing it on the site, so that it knows where the article first appeared.
A limit on the number of requests from a single IP is easily bypassed.
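For reference, a per-IP limit of the kind mentioned above might look roughly like the sketch below. It is only an illustration: the class name `PerIpRateLimiter` and its parameters are hypothetical, and the defaults (30 requests per 60 seconds) are arbitrary assumptions, not recommended values.

```python
import time
from collections import defaultdict, deque


class PerIpRateLimiter:
    """Sliding-window limiter: at most max_requests per window_seconds per IP.

    Hypothetical helper for illustration; not tied to any web framework.
    """

    def __init__(self, max_requests=30, window_seconds=60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip, now=None):
        """Return True if this request is within the limit, False otherwise."""
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The counter is keyed by address, so a scraper that rotates proxies gets a fresh counter for every IP it uses. That is why such a limit slows down only the simplest scrapers.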
I'm thinking about this problem myself right now.
1) Blocking by IP, first of all.
2) Dynamic content. I haven't figured out how yet, but the page structure and the block class names need to change somehow with each request. A scraper relies primarily on the structure of the page.
3) You can load important content with AJAX.
4) You can render important content as images. Avito, for example, shows phone numbers as images.
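One possible way to approach item 2 is to generate the class names anew for every response and substitute them into both the HTML and the CSS. The sketch below is only an assumption about how this could be done: the `{{cls:name}}` placeholder syntax, the templates, and the function names are all made up for the example.

```python
import re
import secrets


def make_class_map(logical_names):
    """Map each logical class name to a fresh random one for this response."""
    return {name: "c" + secrets.token_hex(4) for name in logical_names}


def apply_class_map(text, class_map):
    """Replace {{cls:name}} placeholders with the per-request class names."""
    return re.sub(r"\{\{cls:(\w+)\}\}", lambda m: class_map[m.group(1)], text)


# Hypothetical templates; a real site would keep these in its template engine.
HTML_TEMPLATE = '<div class="{{cls:price}}">100</div>'
CSS_TEMPLATE = ".{{cls:price}} { font-weight: bold; }"


def render():
    """Render HTML and CSS that share one random class map for this request."""
    class_map = make_class_map(["price"])
    html = apply_class_map(HTML_TEMPLATE, class_map)
    css = apply_class_map(CSS_TEMPLATE, class_map)
    return html, css
```

Because the CSS has to change together with the HTML, the stylesheet can no longer be cached as a static file. A scraper can also still select elements by position or by their text, so this raises the cost of scraping rather than preventing it.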