Parsing
weranda, 2017-03-30 21:53:05

How can I detect that my site is being parsed (scraped)?

Greetings.
From time to time people parse my site: they take articles from it, and some even manage to copy the design. Is it possible to detect parsing programmatically and, for example, block it while it is happening? Maybe someone has good advice or a good article bookmarked.


5 answers
Vasya Petrov, 2017-03-30
@VasyaPertrov

No. A well-written parser cannot be distinguished from a regular user.

Sanes, 2017-03-30
@Sanes

Limit the number of requests from a single IP within a given period of time.
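
As a rough illustration of the idea, here is a minimal in-memory sketch of a sliding-window limit per IP. The class name `IpRateLimiter`, its parameters, and the default limits are hypothetical and chosen only for the example; a real site would more likely use the web server's or framework's built-in rate limiting and shared storage.

```python
import time
from collections import defaultdict, deque


class IpRateLimiter:
    """Hypothetical sliding-window limiter: at most max_hits requests
    per IP within window_seconds."""

    def __init__(self, max_hits=60, window_seconds=60.0, clock=time.monotonic):
        self.max_hits = max_hits
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip):
        """Return True if the request may proceed, False if the IP is over the limit."""
        now = self.clock()
        recent = self.hits[ip]
        # Drop timestamps that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_hits:
            return False  # over the limit; the caller could answer with HTTP 429
        recent.append(now)
        return True
```

The request handler would call `allow(client_ip)` before doing any work and reject the request when it returns `False`.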

Eugene Khrustalev, 2017-03-30
@eugenehr

Search bots crawl websites too.
If we are talking about a store, there will always be parsing and content theft.
If it is articles, then before publishing them on the site you can first submit them to Yandex, so that it knows where the article first appeared.
A per-IP request limit is easily bypassed.

Alexander Petrov, 2017-03-30
@Mirkom63

I'm thinking about this problem myself right now.
1) Blocking by IP, first of all.
2) Dynamic content. I haven't figured out how yet, but the page structure and the block class names need to change somehow with each request, since a parser relies primarily on the structure of the page.
3) Important content can be loaded via AJAX.
4) Important content can be rendered as images, the way Avito shows phone numbers as pictures.
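
Point 2 could be sketched roughly like this: rename selected CSS classes to random aliases on every response. The function `randomize_classes` and its regex-based rewriting are assumptions made for illustration only; the stylesheet would have to be generated from the same mapping, and a parser that targets text or element position rather than class names would not be affected.

```python
import re
import secrets


def randomize_classes(html, class_names):
    """Hypothetical helper: replace the given CSS class names in class="..."
    attributes with random aliases that differ on every call."""
    mapping = {name: "c" + secrets.token_hex(4) for name in class_names}

    def rename(match):
        # Rewrite each class token that has an alias, keep the others as-is.
        tokens = match.group(2).split()
        renamed = " ".join(mapping.get(token, token) for token in tokens)
        return match.group(1) + renamed + match.group(3)

    rewritten = re.sub(r'(class=")([^"]*)(")', rename, html)
    # The caller needs the mapping to emit matching CSS selectors.
    return rewritten, mapping
```

For example, `randomize_classes('<div class="price big">100</div>', ["price"])` returns the markup with `price` replaced by a fresh alias, plus the mapping needed to render the CSS for that response.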

Dimonchik, 2017-03-30
@dimonchik2013

For articles it's impossible.
For something like Avito there are ways, but they add overhead to the process. Googlebot and other search bots, for example, are easy to identify.
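
The answer does not say how search bots are identified. One commonly used approach, assumed here, is a reverse DNS lookup followed by a forward confirmation. The function name `is_verified_googlebot`, the domain list, and the injectable lookup parameters below are illustrative, not a definitive implementation.

```python
import socket

# Assumed list of host suffixes used by Google's crawlers.
SEARCH_BOT_DOMAINS = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Hypothetical check: the reverse DNS name must belong to a known bot
    domain, and resolving that name must lead back to the same IP."""
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    # gethostbyname_ex covers IPv4 only; IPv6 would need socket.getaddrinfo.
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse_lookup(ip)
        if not host.endswith(SEARCH_BOT_DOMAINS):
            return False
        # Forward-confirm, because anyone can set a fake reverse record.
        return ip in forward_lookup(host)
    except OSError:
        return False  # lookup failed, treat the client as unverified
```

Clients that pass this check could be exempted from stricter limits, while a client that merely sends a Googlebot user agent would not pass. DNS lookups are slow, so results would normally be cached.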
