d0dge, 2020-11-23 14:00:07
Python

How to bypass IP blocking (via requests + Tor)?

Good afternoon!

I need to scrape some data, but the site resists and bans by IP. I found this article about scraping through Tor:
https://habr.com/ru/company/ods/blog/346632/
However, the code from sections "2.2 Thor is the son of Odin" and "2.3 The first path" of that article does not fully work for me.

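import socket

import requests
import socks  # PySocks
from bs4 import BeautifulSoup
from fake_useragent import UserAgent

def get_html(x):
    # Route every new socket through the local Tor SOCKS5 proxy on port 9150
    socks.set_default_proxy(socks.SOCKS5, "localhost", 9150)
    socket.socket = socks.socksocket
    response = requests.get(x, headers={'User-Agent': UserAgent().firefox})
    soup = BeautifulSoup(response.text, 'html.parser')
    temp = soup.find(attrs={'class': 'datatable dt-outline dt-bordered dt-striped'})
    print(temp)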

Here is the idea. Using the socks library, I replace the default socket so that requests go out through Tor. That part works: the IP really does change. But the site I am scraping still detects something and will not let me in without a captcha when I make the request this way.
At the same time, I can open the site in Tor Browser itself without any problems.
And if I comment out the socket lines, like this:
def get_html(x):
    # Tor proxy disabled: the request goes out directly from the real IP
    # socks.set_default_proxy(socks.SOCKS5, "localhost", 9150)
    # socket.socket = socks.socksocket
    response = requests.get(x, headers={'User-Agent': UserAgent().firefox})
    soup = BeautifulSoup(response.text, 'html.parser')
    temp = soup.find(attrs={'class': 'datatable dt-outline dt-bordered dt-striped'})
    print(temp)

Then the GET request also succeeds and everything is parsed, but only from my real IP.
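
For comparison, here is a minimal sketch of the same Tor routing done per session through the proxies setting of requests, instead of replacing socket.socket for the whole process. This is not the method from the article, and it is not claimed to get past the captcha; it only keeps the proxy scoped to one session and, through the socks5h scheme, lets Tor resolve hostnames as well. The helper names tor_proxies and fetch_via_tor are hypothetical, port 9150 is taken from the code above, and the sketch assumes requests is installed with SOCKS support (pip install "requests[socks]").

```python
def tor_proxies(host="localhost", port=9150):
    """Build a requests-style proxies mapping for a local Tor SOCKS5 port.

    The socks5h scheme asks the proxy to resolve hostnames, so DNS lookups
    also go through Tor instead of the local resolver.
    """
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


def fetch_via_tor(url, user_agent, timeout=30):
    """Fetch a page through Tor and return its HTML text (hypothetical helper)."""
    # Imported here so tor_proxies() stays usable without requests installed.
    import requests  # assumes SOCKS support: pip install "requests[socks]"

    with requests.Session() as session:
        # Only this session is proxied; other sockets in the process are untouched.
        session.proxies.update(tor_proxies())
        session.headers["User-Agent"] = user_agent
        response = session.get(url, timeout=timeout)
        response.raise_for_status()
        return response.text
```

The returned text can be passed to BeautifulSoup(html, 'html.parser') exactly as in get_html above, with the User-Agent string supplied by UserAgent().firefox.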
