Python
Vladimir Kovalev, 2021-03-28 14:43:20

How to reuse a yandexdriver instance in a new thread?

I won't post all of the code. I create 3 yandexdriver instances with my own stock_of_drivers function and store them in the sod variable. Then I create a pool of 3 threads and start 3 tasks, where get_content is the parser and chunk is a list of strings to be entered into the dialog box of the page being processed.

from multiprocessing.pool import ThreadPool

sod = stock_of_drivers(3)
pool = ThreadPool(processes=3)

async_result0 = pool.apply_async(get_content, (chunk[0], sod['driver0']))
async_result1 = pool.apply_async(get_content, (chunk[1], sod['driver1']))
async_result2 = pool.apply_async(get_content, (chunk[2], sod['driver2']))

On the first pass, parsing runs in three threads. On the next iteration, however, a MaxRetryError occurs unless the drivers are re-created (the previous ones have been closed, of course). Re-creating the drivers adds enough overhead to cancel out the benefit of multithreading, and even makes it slower than a single-threaded implementation. What should I do?


1 answer(s)
Vladimir Kovalev, 2021-03-29
@Tungsteniac

Figured it out! I removed every driver.quit() call from the get_content parsing function and now stop the drivers from outside it. Everything works now, and quite quickly. I'm still cleaning up and finishing the code, but even with three threads the speedup is noticeable.
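A minimal sketch of that arrangement, not the original code: the worker never quits its driver, and the caller quits each driver once after all iterations. FakeDriver, run_batches and the sample batches are hypothetical names made up so the example runs without a browser; with real drivers, FakeDriver would be replaced by whatever stock_of_drivers returns.

```python
from multiprocessing.pool import ThreadPool


class FakeDriver:
    """Hypothetical stand-in for a browser driver (assumption, not a real API)."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def get(self, text):
        # Mimics a driver that cannot be used after quit() has been called.
        if not self.alive:
            raise RuntimeError(f"{self.name} was already quit")
        return f"{self.name}:{text}"

    def quit(self):
        self.alive = False


def get_content(lines, driver):
    # No driver.quit() here: the caller owns the driver's lifetime.
    return [driver.get(line) for line in lines]


def run_batches(batches, drivers):
    """Process every batch with the same drivers, one driver per task."""
    results = []
    with ThreadPool(processes=len(drivers)) as pool:
        for chunk in batches:
            pending = [
                pool.apply_async(get_content, (part, driver))
                for part, driver in zip(chunk, drivers)
            ]
            # Wait for the whole batch before the drivers are reused.
            results.append([p.get() for p in pending])
    return results


drivers = [FakeDriver(f"driver{i}") for i in range(3)]
try:
    batches = [
        [["a"], ["b"], ["c"]],
        [["d"], ["e"], ["f"]],
    ]
    results = run_batches(batches, drivers)
finally:
    # Quit each driver exactly once, after all iterations are done.
    for driver in drivers:
        driver.quit()
```

Waiting for each batch to finish before submitting the next one keeps every driver in use by only one thread at a time.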
