Answer the question
In order to leave comments, you need to log in
How to implement multithreading correctly?
Hello.
There is a small site parser, you need to make it work faster.
Structure:
# блок с импортами
import ....
# блок с подключением к БД (оттуда нужно забрать урлы по которым пройтись)
conn = sqlite3.connect('db.sqlite3')
curs = conn.cursor()
select = curs.execute("SELECT url from test;")
# блок с основной функцией, которая забирает данные с сайта
def scrape(link):
....
# блок записи в БД, используя подключение выше
return "Success"
# блок, который вызывает функцию scrape для каждого элемента, что был взят из БД
for link in select.fetchall():
scrape(link)
thread_list = []
t = threading.Thread(target=scrape, args=(link,))
thread_list.append(t)
for thread in thread_list:
thread.start()
for thread in thread_list:
thread.join()
Answer the question
In order to leave comments, you need to log in
I offer you my version, through the task queue:
from Queue import Queue
from threading import Thread
def scrape(link):
conn = sqlite3.connect('db.sqlite3')
curs = conn.cursor()
# insert link, close connection
class Worker(Thread):
def __init__(self, tasks):
super(Worker, self).__init__()
self.tasks = tasks
self.daemon = True
def run(self):
while True:
link = self.tasks.get()
try:
scrape(link)
finally:
self.tasks.task_done()
if __name__ == '__main__':
# максимальное количество одновременных потоков
capacity = 0 # infinite
queue = Queue(capacity)
workers = 3
for _ in range(workers):
Worker(queue).start()
for link in select.fetchall():
queue.put(link)
queue.join()
print 'Done'
For a site parser, it is better to use an asynchronous framework, an example www.py-my.ru/post/4f278211bbddbd0322000000
As advised above - use some kind of asynchronous framework (I liked gevent at one time). You still need to make changes anyway, whether it be threads or asynchrony, and asynchrony, in my opinion, is better suited in this case, because you basically have IO operations, besides, all Python code still runs on one core ( Google the GIL) and SQLite can't thread, so I'm not sure if threads will give you any advantage.
Didn't find what you were looking for?
Ask your questionAsk a Question
731 491 924 answers to any question