How to parse a site with Python requests?

12bugaga, 2020-04-15 21:05:09

Parsing

Everyone knows perfectly well that the main Python library for parsing is requests. Here is my problem: I need to request a site every 5-7 seconds, and the site will obviously treat such frequent requests as a DDoS attack and restrict my access. Is it possible, for example, to open the site once and keep reading data from it, so that I don't have to establish a new connection every 5 seconds? I tried accessing the site through Tor, but it shows a captcha right away, so that's not an option. TorCrawler is out too (it's the same Tor, just wrapped a little differently).

Sergey Karbivnichy, 2020-04-15
@hottabxp

1) One request every 5-7 seconds is a long interval, so it is not a DDoS attack. Many commercial companies that scrape websites (including M.Video, Ozon and other stores) scrape one product every 5-7 seconds.
2)

Is it possible, for example, to open the site once and constantly read information from it, so as not to establish a connection every 5 seconds?

It is possible, but only if the site delivers the data over a WebSocket (and the data is not encrypted).
3) Either load the page every 5-7 seconds, or check DevTools (the Network tab) to see whether the site loads the data through an XHR request. If it does, the data may be easier to parse.
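A minimal sketch of point 3, assuming the site exposes a JSON endpoint of the kind you would find under the XHR filter in DevTools. The helper names (`polite_delay`, `fetch_json`, `poll`, `watch`) and the User-Agent string are hypothetical. It polls through a single `requests.Session`, which pools connections, so repeated requests to the same host can reuse one TCP connection (HTTP keep-alive) instead of opening a new one every 5 seconds, provided the server keeps the connection open. It also randomizes the pause between 5 and 7 seconds.

```python
import random
import time
from typing import Any, Callable, Optional

import requests


def polite_delay(low: float = 5.0, high: float = 7.0) -> float:
    """Return a random pause in seconds so requests don't arrive on a fixed beat."""
    return random.uniform(low, high)


def fetch_json(session: requests.Session, url: str, timeout: float = 10.0) -> Any:
    """GET a JSON endpoint (e.g. one spotted in the DevTools XHR list) and decode it."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()  # raise on 403/429 etc. instead of parsing an error page
    return response.json()


def poll(
    fetch: Callable[[], Any],
    handle: Callable[[Any], None],
    iterations: Optional[int] = None,
    low: float = 5.0,
    high: float = 7.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() repeatedly and pass each result to handle().

    Pauses low..high seconds between calls (not after the last one).
    Runs forever when iterations is None; returns the number of completed fetches.
    """
    done = 0
    while iterations is None or done < iterations:
        handle(fetch())
        done += 1
        if iterations is None or done < iterations:
            sleep(polite_delay(low, high))
    return done


def watch(url: str, iterations: Optional[int] = None) -> int:
    """Poll a JSON URL through one Session so its pooled connection can be reused."""
    with requests.Session() as session:
        session.headers.update({"User-Agent": "my-scraper/0.1"})  # hypothetical UA string
        return poll(lambda: fetch_json(session, url), print, iterations=iterations)
```

For example, `watch(url, iterations=10)` with the endpoint URL you found in DevTools would make ten requests and print each decoded payload; with `iterations=None` it runs until interrupted. `poll` takes `fetch` and `sleep` as parameters so the timing logic can be tried without touching the network.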