Multithreaded page processing using Python3+Grab. How?
Hello.
I need to write a fairly simple site handler (not a parser!).
The most important requirements are multithreading and speed.
Here is the code so far:
from queue import Queue
from threading import Thread
import time

from grab import Grab


def submit_form(i, q):
    # Worker loop: take a link off the queue, load it, mark it done.
    while True:
        link = q.get()
        g = Grab()
        g.go(link)
        # Some actions with page
        q.task_done()


start_time = time.time()
num_threads = 5
queue = Queue()

# Start the daemon worker threads.
for i in range(num_threads):
    worker = Thread(target=submit_form, args=(i, queue), daemon=True)
    worker.start()

q = [
    "link1",
    # ....
    "link100",
]
for item in q:
    queue.put(item)

# Block until every queued link has been marked done.
queue.join()
print("--- %s seconds ---" % (time.time() - start_time))
Running it fails with:
grab.error.GrabConnectionError: [Errno 7] Failed to connect to link_to_site port 80: Connection refused
Forget about Grab.
Either use Python 2 with Scrapy, or use Python 3 with its own goodies, or simply run synchronous scripts side by side with parallel:
cat file_with_links.txt | \
    parallel -j number_of_threads myscript.py --param1={}
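For the "Python 3 with its own goodies" option, here is a minimal sketch, assuming that the standard-library concurrent.futures thread pool is one of the goodies meant. The names fetch, process_all and handler are made up for illustration, and urllib stands in for whatever actually loads and processes the page. The point it shows: an exception for one link (such as a refused connection) is recorded and the remaining links are still processed, whereas in a hand-written Queue worker an unhandled exception ends the thread before task_done() is called, so queue.join() can wait forever.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(link, timeout=10):
    """Download one page and return its body as bytes.

    Hypothetical stand-in for the real per-page work (form submission etc.).
    """
    with urlopen(link, timeout=timeout) as response:
        return response.read()


def process_all(links, handler=fetch, num_threads=5):
    """Run handler(link) for every link on a pool of num_threads threads.

    Returns (results, errors), two dicts keyed by link. A failure on one
    link is stored in errors instead of stopping the whole run.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Map each submitted future back to the link it belongs to.
        futures = {pool.submit(handler, link): link for link in links}
        for future in as_completed(futures):
            link = futures[future]
            try:
                results[link] = future.result()
            except Exception as exc:  # e.g. connection refused, timeout
                errors[link] = exc
    return results, errors
```

Calling process_all(list_of_links) returns the downloaded pages and the per-link errors separately, so failed links can be logged or retried later. Passing a different handler swaps in the real page logic without touching the threading part.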