Python
kopelev2000, 2019-12-11 21:13:31

I'm scraping OLX and collecting phone numbers from ad pages, but the site bans me. How can I fix this?

I am scraping OLX and collecting phone numbers from ad pages, but the site bans me: a message appears instead of the requested page (first screenshot). I tried using uBlock. At first it works fine and the phone numbers are collected, but then the script that opens the text starts getting blocked (second screenshot), and after that the message from the first screenshot appears.
My question: maybe uBlock stops recognizing what needs to be blocked. Is it possible to tell it what to block before the browser window opens?
And is it necessary to use a proxy together with uBlock? I tried a proxy (IPv4) without uBlock, and it did not help at all.
The code:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time

f = open('text-for-OLX.txt', 'a', encoding='utf8')
urls = open("input.txt", "r")
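for url in urls:
    url = url.strip()  # drop the trailing newline so url + "/?page=" stays a valid URL
    if not url:
        continue  # skip blank lines in input.txt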

    def get_url(driver):
        driver.get(url)
        print("GOT URL")
        time.sleep(3)


    def press_cookie_btn(driver):
        cookie_btn = driver.find_element_by_xpath("//div[@class='topinfo rel']"
                                                  "/button[@class='cookie-close abs cookiesBarClose']")
        cookie_btn.click()
        print("COOKIE")
        time.sleep(2)


    def get_content(driver):
        try:
            time.sleep(1)
            driver.find_element_by_xpath("//span[@class='link spoiler small nowrap']/span").click()
            time.sleep(2)
            try:
                phone = driver.find_element_by_xpath("//strong[@class='fnormal xx-large']").text
                print(phone)
                f.write(phone + '\n')
                time.sleep(1)
            except:
                phone_1 = driver.find_element_by_xpath("//strong[@class='fnormal xx-large']/span[@class='block'][1]").text
                phone_2 = driver.find_element_by_xpath("//strong[@class='fnormal xx-large']/span[@class='block'][2]").text
                print(phone_1, phone_2)
                f.write(phone_1 + ' ' + phone_2 + '\n')
                time.sleep(1)
        except:
            pass



    def page_pagination(driver):
        ars = driver.find_elements_by_xpath("//a[@class='marginright5 link linkWithHash detailsLink']")
        urls_1 = []
        for ar in ars:
            url_1 = ar.get_attribute("href")
            urls_1.append(url_1)
        for url_2 in urls_1:
            driver.get(url_2)
            time.sleep(3)
            get_content(driver)
            time.sleep(3)

    def pages_pagination(driver, last_elem):
        page_pagination(driver)
        for i in range(2, int(last_elem)+1):
            driver.get(url+"/?page="+str(i))
            page_pagination(driver)




    def main():
        options = Options()
        options.add_argument('user-agent=Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7')
        options.add_extension("D:\\UB\\cjpalhdlnbpafiamejdnhcphjbkeiagm.crx")
        driver = webdriver.Chrome(options=options)
        driver.implicitly_wait(10)
        get_url(driver)
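        try:
            # Take .text here: pages_pagination() calls int() on this value,
            # and int() cannot convert a WebElement.
            last_elem = driver.find_element_by_xpath("//span[@class='item fleft'][last()]").text
        except:
            pass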
        press_cookie_btn(driver)
        try:
            pages_pagination(driver, last_elem)
        except:
            page_pagination(driver)
        driver.quit()


    main()

urls.close()
f.close()
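
A note on pacing: the script above pauses for fixed intervals of 1 to 3 seconds between actions. Whether request timing plays any part in the ban is an assumption, not something confirmed in this thread, but randomized pauses and a growing wait after a suspected block page are a common precaution. A minimal sketch follows; `polite_pause` and `backoff_delays` are hypothetical helper names, not part of Selenium.

```python
import random
import time


def polite_pause(base=3.0, jitter=2.0, sleep=time.sleep):
    """Sleep for `base` seconds plus a random extra of up to `jitter` seconds.

    Returns the delay that was used. The `sleep` parameter exists only so
    the function can be exercised without actually waiting.
    """
    delay = base + random.uniform(0, jitter)
    sleep(delay)
    return delay


def backoff_delays(start=30.0, factor=2.0, attempts=4):
    """Return a list of growing wait times (in seconds).

    Intended to be used one by one after each suspected block page;
    the default values are illustrative, not tuned for any site.
    """
    return [start * factor ** n for n in range(attempts)]
```

In the script above, a call such as `polite_pause()` could stand in for the `time.sleep(3)` calls around `get_content(driver)`.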


3 answer(s)
xmoonlight, 2019-12-11
@xmoonlight

"I'm scraping OLX and collecting phone numbers from ad pages, but the site bans me. How can I fix this?"
Stop parsing without understanding the process.

Dimonchik, 2019-12-11
@dimonchik2013

A proxy in Selenium is, let's say, not a very original solution;
the only thing worse is a proxy in Selenium while logged in to your own account))
It does work with Google, though)), just not with a head-on approach, of course.
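
For reference, one way to route Selenium's Chrome through a proxy is Chrome's `--proxy-server` command-line switch. The sketch below is only an illustration: `build_chrome_arguments` and `make_driver` are hypothetical helper names, and any proxy address passed in is a placeholder. As far as I know, this switch does not accept a username and password, so it suits proxies that authorize by IP address.

```python
def build_chrome_arguments(proxy=None, user_agent=None):
    """Build the list of Chrome command-line arguments for a session.

    `proxy` is a string such as "http://host:port" (placeholder format).
    `user_agent` should match a real, current browser if it is set at all.
    """
    args = []
    if proxy:
        args.append("--proxy-server=" + proxy)
    if user_agent:
        args.append("--user-agent=" + user_agent)
    return args


def make_driver(proxy=None, user_agent=None):
    """Create a Chrome driver with the arguments built above."""
    # Imported here so build_chrome_arguments() works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in build_chrome_arguments(proxy, user_agent):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Usage would look like `driver = make_driver(proxy="http://203.0.113.10:3128")`, where the address is a documentation-range placeholder. A proxy alone changes only the IP address; it does not change how the browser itself looks to the site.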

astronotius, 2020-05-04
@astronotius

Use puppeteer, or better yet, puppeteer-stealth.
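
puppeteer and its stealth plugin are Node.js tools. For the Python Selenium script in the question, a rough and only partial analogue is to switch off some of the signals by which Chrome announces that it is automated. The sketch below uses Chrome settings that exist, but whether they are enough for OLX is an untested assumption; `automation_hiding_settings` and `apply_settings` are hypothetical helper names.

```python
def automation_hiding_settings():
    """Return Chrome settings that remove some obvious automation markers."""
    return {
        # Asks Blink not to expose the "controlled by automation" feature.
        "arguments": ["--disable-blink-features=AutomationControlled"],
        # Drops the "enable-automation" switch that ChromeDriver adds by default.
        "experimental_options": {"excludeSwitches": ["enable-automation"]},
    }


def apply_settings(options, settings):
    """Apply the settings to a Selenium ChromeOptions-like object and return it."""
    for arg in settings["arguments"]:
        options.add_argument(arg)
    for name, value in settings["experimental_options"].items():
        options.add_experimental_option(name, value)
    return options
```

With Selenium this would be called as `apply_settings(Options(), automation_hiding_settings())` before creating the driver. Note also that the question's script sends a Firefox 1.0.7 user agent from Chrome, which is an implausible combination and may itself stand out.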
