Shato Daltsaev, 2019-03-19 10:16:33
Python

How to run selenium parsing in multi-threaded mode?

I can't figure out how to run WebDriver in multiple threads.
I want to parse two sites at the same time. I have a list of EANs to look up on each site.
The single-threaded version looks like this:

import re
import time
from datetime import datetime

from selenium import webdriver


def labirint(eanlist):
    pricelist = []
    for ean in eanlist:
        try:
            driver.get("http://www.labirint.ru/search/" + ean + "/?labsearch=1")
            time.sleep(1)
            labirintBookState(driver)
            if driver.find_element_by_xpath(labirint_xpath_state).is_displayed():
                x = driver.find_element_by_xpath(labirint_xpath)
                price_int = int(x.text)
                pricelist.append(price_int)
            else:
                pricelist.append("")
        except Exception:
            # Any lookup or parsing failure leaves the price empty for this EAN.
            pricelist.append("")
    return pricelist

def chitayGorod(eanlist):
    chitay_gorod_pricelist = []
    for ean in eanlist:
        try:
            driver.get("https://www.chitai-gorod.ru/search/result/?q=" + ean + "&page=1")
            time.sleep(1)
            if driver.find_element_by_xpath(chitay_gorod_xpath).is_displayed():
                price = driver.find_element_by_xpath(chitay_gorod_xpath)
                # Keep only the first run of digits from the price text.
                price_int = int(re.search(r'\d+', price.text).group())
                chitay_gorod_pricelist.append(price_int)
            else:
                chitay_gorod_pricelist.append("")
        except Exception:
            # Any lookup or parsing failure leaves the price empty for this EAN.
            chitay_gorod_pricelist.append("")
    return chitay_gorod_pricelist

option = webdriver.ChromeOptions()
# Disable image loading to speed up page loads.
chrome_prefs = {
    "profile.default_content_settings": {"images": 2},
    "profile.managed_default_content_settings": {"images": 2},
}
option.add_experimental_option("prefs", chrome_prefs)

# Raw string so the backslashes in the Windows path are not treated as escapes.
driver = webdriver.Chrome(executable_path=r'C:\priceUpdater\ChromeDriver\chromedriver.exe', chrome_options=option)


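if __name__ == "__main__":
    startTime = datetime.now()
    list_ean = getFileEan()

    # Separate names so the results do not shadow the functions.
    labirint_prices = labirint(list_ean)
    chitay_gorod_prices = chitayGorod(list_ean)
    print(datetime.now() - startTime)
    print("ok!")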

Each function loops over the EAN list, visits one site, and collects prices.
How can I start two driver processes at the same time, so that one parses chitayGorod and the other parses labirint?
I tried what I found online on this topic, but nothing I found was suitable.


1 answer(s)
SHADRIN, 2019-03-22
@shadrin_ss

See the documentation on multiprocessing:
https://docs.python.org/2/library/multiprocessing.html
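
One way this could look for the code in the question is sketched below. It is only a sketch under several assumptions: each scraper is rewritten to take its driver as the first argument (for example `labirint(driver, eanlist)`) instead of using the global `driver`, so that every worker owns its own browser; the installed Selenium accepts `webdriver.Chrome(options=...)`; and chromedriver can be found on PATH. The names `make_driver`, `scrape_site` and `scrape_all` are hypothetical helpers, not part of Selenium. It uses `concurrent.futures` from the standard library (Python 3), which runs the workers on top of multiprocessing.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def make_driver():
    """Create a separate Chrome instance for one worker, with images disabled."""
    # Imported here so that the module itself loads without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option(
        "prefs", {"profile.managed_default_content_settings": {"images": 2}}
    )
    # Assumption: chromedriver is on PATH; otherwise pass its location explicitly.
    return webdriver.Chrome(options=options)


def scrape_site(scraper, eanlist, driver_factory=make_driver):
    """Run one scraper with its own driver and always close the browser."""
    driver = driver_factory()
    try:
        return scraper(driver, eanlist)
    finally:
        driver.quit()


def scrape_all(scrapers, eanlist, driver_factory=make_driver,
               executor_cls=ProcessPoolExecutor):
    """Run all scrapers concurrently and return {scraper name: price list}.

    Pass executor_cls=ThreadPoolExecutor to use threads instead of processes;
    that is also safe here because no driver is shared between workers.
    """
    with executor_cls(max_workers=len(scrapers)) as pool:
        futures = {
            scraper.__name__: pool.submit(scrape_site, scraper, eanlist, driver_factory)
            for scraper in scrapers
        }
        # result() blocks until the worker finishes and re-raises its exception.
        return {name: future.result() for name, future in futures.items()}
```

With the question's functions changed to accept the driver, the call would be `prices = scrape_all([labirint, chitayGorod], list_ean)`, placed inside the `if __name__ == "__main__":` block. With processes, that guard is required on Windows, and the scraper functions must be defined at module level so they can be sent to the workers.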
