How to run python html parser in multiple threads?
There is a data set (over 1,000 lines), and for each line a request is made with urllib.request.urlopen(). I have split the data into 30 parts, and now I need to run 30 identical scripts simultaneously, one per part. How can this be implemented?
I tried running 2 scripts in the terminal at once with "python_script_1 & python_script_2" and everything works fine, but what about 30 scripts? The most awkward part is organizing the directories (I did put os.path.abspath(os.curdir) in each script).
Practical advice would be very welcome; I don't have time to study multithreading properly.
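For reference, a minimal sketch of launching all 30 part-scripts from one controlling script instead of chaining them with "&" in the terminal; the part_01.py ... part_30.py names, the "python" command, and the use of subprocess are illustrative assumptions, not the asker's actual layout:

# Launch every part-script as a separate OS process and wait for all of them.
# Hypothetical names: adjust part_%02d.py and the interpreter to your setup.
import subprocess

scripts = ["part_%02d.py" % i for i in range(1, 31)]
processes = [subprocess.Popen(["python", name]) for name in scripts]

for p in processes:
    p.wait()  # block until each worker script has finished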
The simplest multithreading:
from multiprocessing.dummy import Pool as ThreadPool
from urllib.request import urlopen   # urllib2.urlopen in Python 2

urls = [
    'http://www.python.org',
    'http://www.python.org/about/',
    'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
    'http://www.python.org/doc/',
]

# Make the pool of workers
pool = ThreadPool(4)

# Open the urls in their own threads and collect the results
results = pool.map(urlopen, urls)

# Close the pool and wait for the work to finish
pool.close()
pool.join()
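Applied to the original task, the same pattern handles all 1,000+ lines from a single script with a pool of 30 worker threads, so there is no need to split the data or duplicate the script. A sketch, assuming the URLs sit one per line in a file (urls.txt and the fetch_and_parse body are hypothetical placeholders):

from multiprocessing.dummy import Pool as ThreadPool
from urllib.request import urlopen

def fetch_and_parse(url):
    html = urlopen(url).read()
    # ... run your HTML parser over `html` here ...
    return len(html)  # placeholder result

with open("urls.txt", encoding="utf-8") as f:
    urls = [line.strip() for line in f if line.strip()]

pool = ThreadPool(30)                       # 30 requests in flight at once
results = pool.map(fetch_and_parse, urls)   # one result per input line
pool.close()
pool.join()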
Learn MultiCurl: it is easy enough to generate distinct names for the files you save, download everything first, and then walk the directory and parse the saved files. That parsing step can even be done without any multiprocessing at all, since the main time cost is fetching the pages from the remote server.
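A rough sketch of that approach with pycurl's CurlMulti interface, assuming pycurl is installed; the example URLs and the page_NNN.html naming scheme are placeholders:

import pycurl

urls = ["http://www.python.org", "http://www.python.org/about/"]

# Give every download its own curl handle and its own output file.
multi = pycurl.CurlMulti()
handles = []
for i, url in enumerate(urls):
    out = open("page_%03d.html" % i, "wb")
    c = pycurl.Curl()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEDATA, out)
    multi.add_handle(c)
    handles.append((c, out))

# Drive all transfers until every handle has finished.
num_active = len(handles)
while num_active:
    while True:
        ret, num_active = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break
    if num_active:
        multi.select(1.0)  # wait for network activity

for c, out in handles:
    multi.remove_handle(c)
    c.close()
    out.close()

# Now walk the directory of saved page_*.html files and parse them in a
# plain loop -- no threads or multiprocessing needed for that part.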