Parsing
12bugaga, 2020-04-15 21:05:09

How do I scrape a site with Python requests?

Everyone knows that the main Python library for scraping is requests. Here is my problem: I need to query a site every 5-7 seconds, and the site will obviously treat such frequent requests as a DDoS and restrict my access. Is it possible, for example, to open the site once and keep reading data from it, so I don't have to establish a new connection every 5 seconds? I tried going through Tor, but a captcha appears immediately, so that's not an option. TorCrawler is no good either (it's the same Tor, just wrapped a little differently).

Sergey Karbivnichy, 2020-04-15
@hottabxp

1) 5-7 seconds is a long interval, so this is not a DDoS attack. Many commercial companies that scrape websites (including M.Video, Ozon and other online stores) fetch one product every 5-7 seconds.
2)

Is it possible, for example, to open the site once and constantly read information from it, so as not to establish a connection every 5 seconds?
Yes, but only if the site delivers the data over a WebSocket (and the data isn't encrypted).
3) Either reload the page every 5-7 seconds, or look in the DevTools Network tab: the site may be loading the data via an XHR request. In that case it may be easier to parse.
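On the "don't establish a connection every 5 seconds" part of the question: even when you reload the page on a timer (point 3), a `requests.Session` keeps a pool of keep-alive connections, so repeated requests to the same host can usually reuse one TCP/TLS connection instead of opening a new one each time (as long as the server keeps the connection open). A minimal sketch; `URL` and the User-Agent string are placeholders, not from the original post:

```python
import requests

# Placeholders: replace with the real page and your own User-Agent string.
URL = "https://example.com/"
HEADERS = {"User-Agent": "my-scraper/0.1"}


def make_session() -> requests.Session:
    """Create a Session whose connection pool is reused across requests."""
    session = requests.Session()
    session.headers.update(HEADERS)  # sent with every request made through this session
    return session


def fetch(session: requests.Session, url: str = URL, timeout: float = 10.0) -> str:
    """Fetch one page over the session's pooled (keep-alive) connection."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of returning an error page
    return response.text
```

Call `make_session()` once, keep the session for the whole run, and pass it to `fetch()` on every iteration; `with make_session() as s:` closes the pooled connections when you are done.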
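For the 5-7 second interval from points 1 and 3, a randomized delay in that range is easy to add. A stdlib-only sketch: `poll`, `min_delay` and `max_delay` are illustrative names, `fetch` is any zero-argument callable you supply (for example, one that downloads the product page), and `sleep`/`rng` are injectable so the loop can be tested without actually waiting:

```python
import random
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def poll(
    fetch: Callable[[], T],
    times: int,
    min_delay: float = 5.0,
    max_delay: float = 7.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Call fetch() `times` times, pausing a random min_delay..max_delay seconds between calls."""
    rng = rng or random.Random()
    results: List[T] = []
    for i in range(times):
        if i > 0:  # no pause before the very first request
            sleep(rng.uniform(min_delay, max_delay))
        results.append(fetch())
    return results
```

For an open-ended scraper you would replace the `for` loop with `while True:`; the random gap just keeps requests from landing on an exact fixed beat.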
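For point 2, a WebSocket really is "open once, keep reading": one connection stays open and the server pushes messages over it. A sketch assuming the third-party `websocket-client` package (`pip install websocket-client`) and, hypothetically, a feed that sends JSON text messages; the URL you pass to `listen()` would be the one DevTools shows under the Network tab's WS filter:

```python
import json
from typing import Any, Iterator, List


def iter_json_messages(ws: Any, limit: int) -> Iterator[Any]:
    """Read `limit` messages from an already-open connection, decoding each as JSON."""
    for _ in range(limit):
        # recv() blocks until the server pushes the next message.
        yield json.loads(ws.recv())


def listen(url: str, limit: int = 10, timeout: float = 30.0) -> List[Any]:
    """Open ONE WebSocket connection and collect `limit` pushed messages from it."""
    # Third-party: pip install websocket-client. Imported here so that
    # iter_json_messages() can be used (and tested) without the package.
    from websocket import create_connection

    ws = create_connection(url, timeout=timeout)
    try:
        return list(iter_json_messages(ws, limit))
    finally:
        ws.close()  # close the single connection when done
```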
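For point 3, once DevTools (Network tab, Fetch/XHR filter) shows the request the page makes for its data, you can often call that URL directly and get JSON instead of HTML. A sketch with requests; `API_URL` and the `items`/`price` keys are hypothetical stand-ins for whatever the real endpoint returns:

```python
from typing import Any, Dict, List

import requests

# Hypothetical endpoint: copy the real URL from DevTools (Network > Fetch/XHR).
API_URL = "https://example.com/api/products?page=1"


def fetch_json(session: requests.Session, url: str = API_URL, timeout: float = 10.0) -> Any:
    """Call the JSON endpoint the page itself uses and decode the response body."""
    response = session.get(url, headers={"Accept": "application/json"}, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of decoding an error page
    return response.json()


def extract_prices(payload: Dict[str, Any]) -> List[Any]:
    """Collect every item's price; the "items" and "price" keys are hypothetical."""
    return [item["price"] for item in payload.get("items", [])]
```

`extract_prices(fetch_json(session))` then yields the prices without any HTML parsing; reusing one session across calls also keeps the connection alive between polls.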
