How to scrape Yandex without getting banned?
We are working on a scraper that collects certain information from Yandex.Market.
Previously, we had a subnet of IPs: addresses were taken out of rotation as they got banned and returned to the pool once the ban was lifted.
Now Yandex immediately bans the entire subnet. What can we do?
For this kind of task (scanning several sites with ads), I wrote a proxy scraper and fed it links to various services that publish proxy lists.
The general logic is as follows:
1. Once every five minutes we go around all the resources, collect the addresses and put them in the database.
2. Another script gradually checks them for anonymity, location and latency. The good ones are added to another table; the bad ones are marked either as unsuitable (non-anonymous / wrong country) or as hanging. The hanging ones can be checked again later.
3. One more checker runs over the "good" table and checks whether the proxies have died.
4. Finally, the worker takes the proxies with the lowest latency and uses them.
Don't forget to rotate proxies.
There are a few more details, but they are not difficult to guess. On average, I had about 40-60 low-latency proxies available at any given time. They got banned quite often, but that caused no problems.
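As a rough illustration of steps 2-4 and the rotation advice, here is a minimal in-memory sketch. Everything in it is an assumption made for the example: the `Proxy` record, the `Status` values, the 5-second timeout and the function names are hypothetical, the database tables are replaced by a plain list, and the actual network check (anonymity, country, latency) is not shown.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    UNCHECKED = "unchecked"
    GOOD = "good"
    UNSUITABLE = "unsuitable"  # non-anonymous or wrong country
    HANGING = "hanging"        # no answer in time; may be rechecked later


@dataclass
class Proxy:
    address: str
    status: Status = Status.UNCHECKED
    latency_ms: Optional[float] = None


def classify(proxy, anonymous, country, latency_ms, allowed_countries,
             timeout_ms=5000.0):
    """Record the result of one check (the check itself is out of scope)."""
    if latency_ms is None or latency_ms > timeout_ms:
        proxy.status = Status.HANGING
        proxy.latency_ms = None
    elif not anonymous or country not in allowed_countries:
        proxy.status = Status.UNSUITABLE
    else:
        proxy.status = Status.GOOD
        proxy.latency_ms = latency_ms
    return proxy


def fastest(proxies, limit):
    """Return up to `limit` good proxies, lowest latency first."""
    good = [p for p in proxies if p.status is Status.GOOD]
    return sorted(good, key=lambda p: p.latency_ms)[:limit]


def rotation(proxies, limit):
    """Cycle round-robin through the lowest-latency good proxies."""
    return itertools.cycle(fastest(proxies, limit))
```

The worker would call `next()` on the iterator returned by `rotation(...)` before each request, and rebuild the iterator whenever the checkers update the pool.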
Legal (though to get access you will have to lie and dodge in every possible way): Content API
Illegal: Antigate
1. Build a network of proxy servers on ordinary cheap shared or VDS hosting in different data centers.
2. Access the Internet through a provider with a non-fixed IP. As an option, use a 3G / 4G modem, where it is usually enough to restart the connection to change the IP. It is unlikely that they will ban an entire mobile operator.
3. Use anonymizing services.
4. Do not hammer Yandex with too high a request rate.
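For point 4, one simple approach is to enforce a minimum gap between requests and add a little random jitter so the traffic does not look perfectly periodic. This is only a sketch: the `Throttle` class, its parameters and the interval values are hypothetical, and a safe request rate for Yandex is not known here and would have to be found by experiment.

```python
import random
import time


class Throttle:
    """Enforce a minimum interval (plus random jitter) between calls."""

    def __init__(self, min_interval, jitter=0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self.jitter = jitter              # extra random delay, 0..jitter seconds
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request is allowed; return seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        gap = self.min_interval + random.uniform(0.0, self.jitter)
        self._next_allowed = max(now, self._next_allowed) + gap
        return delay
```

A worker would create one instance per outgoing IP, for example `Throttle(min_interval=2.0, jitter=1.0)`, and call `wait()` before every request sent through that IP.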
You can also look for services that offer ready-made databases, for example ymscanner.com.
By the way, I found yandex-parser on GitHub. Maybe it will help. As far as I understand, it uses XML.
Here is another service, ApiSystem, for accessing the Market content API.
The prices are affordable, and they have been operating for a long time.