Efficient multithreading in Python?
I need multithreading in a scripting language. I tried pthreads for PHP 7, but despite the developers' claims, everything works erratically and crashes for no apparent reason. So I'm looking at Python (are there any alternatives?), but here and there people write that multithreading in it is implemented in such a way that multithreaded applications end up slower than ordinary ones, and that there are problems with locks and synchronization. Is that so? My experience with multithreading in Python is small: simple file downloads using the thread module, but as I understand it, there are more convenient and powerful tools now. So the question is: what is the state of multithreading in Python, and which modules are best to use? The tasks are parsing web pages and writing the data into a database. Thank you!
asyncio will solve all the problems (if you master it, of course).
If you want something PHP-like, see, for example,
toly.github.io/blog/2014/02/13/parallelism-in-one-line
but aiohttp will be more fun for parsing.
By the way, Scrapy has already been ported to Python 3, although the developers themselves say it is still a bit raw.
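For illustration, a minimal sketch of concurrent page fetching with asyncio and aiohttp (the URL list and the print step are placeholders, not part of the original answer):
import asyncio
import aiohttp

URLS = ['https://example.com/a', 'https://example.com/b']  # placeholder URLs

async def fetch(session, url):
    # while one request waits on the network, the event loop runs the others
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*(fetch(session, u) for u in URLS))
        for page in pages:
            print(len(page))  # parse and write to the database here instead

asyncio.run(main())
All the downloads run concurrently in a single thread, which is exactly why this suits network-bound parsing.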
As for multithreading, here is a code example (note that it actually uses processes via the multiprocessing module; extract and GLOBDIR are assumed to be defined elsewhere in the poster's code):
import glob
from multiprocessing import Pool, freeze_support

if __name__ == '__main__':
    freeze_support()  # needed on Windows when the script is frozen into an executable
    pool = Pool(processes=8)
    # extract() runs in 8 worker processes; results are yielded as they complete
    names = pool.imap_unordered(extract, glob.iglob(GLOBDIR), chunksize=1000)
    for name in names:
        print(name)  # consume the results here (e.g. store them)
Well, everyone reads the question so carelessly. The answers above have nothing to do with multithreading. In Python it's better to forget that such a thing as "multithreading" exists; you have chosen the wrong technology for it (there is, of course, PyPy, but I don't know what stage things are at there; there is also the option of using processes, but to me that feels like a crutch). As for the parsing task itself: yes, you can use asynchrony, but only one thread will be used.
It is difficult to make the parser itself parallel.
Making the spider parallel is possible, but there is little point in doing it yourself.
The generally recognized spider framework for Python is scrapy.org.
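For reference, a minimal Scrapy spider could look like this (the spider name, start URL, and CSS selector are illustrative assumptions, not from the answer):
import scrapy

class PagesSpider(scrapy.Spider):
    name = 'pages'  # hypothetical spider name
    start_urls = ['https://example.com/']  # placeholder start page

    def parse(self, response):
        # Scrapy schedules downloads concurrently on its own; no manual threads
        for title in response.css('h1::text').getall():
            yield {'title': title}
Run it with scrapy runspider pages.py; the concurrency is handled by Scrapy's downloader, not by your code.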
When it comes to multithreading, you should first note what type of task you are dealing with.
If we are talking about CPU-bound tasks and you need to load a multi-core processor, then yes, the GIL gets in the way (it effectively forbids parallelism), and in Python you have to write a compiled extension or use several processes.
If the task is IO-bound - and that covers almost everything related to the network, including the web - then, as a rule, the blocking wait for a network response releases the GIL, and you can safely use multithreading.
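A minimal sketch of the IO-bound case with a thread pool (the URLs and the fetch helper are placeholders):
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = ['https://example.com/a', 'https://example.com/b']  # placeholder URLs

def fetch(url):
    # the GIL is released while this thread blocks on the socket
    with urlopen(url) as response:
        return response.read()

with ThreadPoolExecutor(max_workers=8) as executor:
    for page in executor.map(fetch, URLS):
        print(len(page))  # parse and insert into the database here instead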
Another thing is that many people now advise asynchronous frameworks (the same asyncio), which for network tasks give much better performance than native threads (although, it seems, greenlets are even better).
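And, purely as an assumed sketch, the same idea with greenlets via gevent (monkey-patching makes blocking stdlib IO cooperative; the URLs are placeholders):
from gevent import monkey
monkey.patch_all()  # must run before other imports so sockets become cooperative

import gevent
from urllib.request import urlopen

def fetch(url):
    with urlopen(url) as response:
        return len(response.read())

jobs = [gevent.spawn(fetch, u) for u in ['https://example.com/a', 'https://example.com/b']]
gevent.joinall(jobs)
print([job.value for job in jobs])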
Hello, sorry for answering with a question: do you really need to use a scripting language? If yes, then Python is better; there is a library in the standard distribution: multiprocessing (or something like that, I don't remember exactly, it was a long time ago). In fact, it is touchy: its functions must be called from under the main-module guard:
if __name__ == "__main__":
    # call the multiprocessing functions here