Which is better to choose: multithreading or multiprocessing?
I was given the task of rewriting one part of our project, and we decided to do it in Python.
The essence of the problem:
More than 1000 active processes (or threads, that is the choice to make) that will hammer other servers, fetch data, parse it, and so on. The operations themselves are mostly fast, but these workers have to stay up around the clock without crashing, except when they are restarted or terminated on demand.
I have already read plenty of holy wars on this topic, and since Python is not my main language, I am asking the community for help. Which is better: more than a thousand processes running simultaneously, with the convenience of managing them individually, or one process with a thousand threads?
A little Python trick, if you want to write this without a framework (Scrapy, etc.): the code for a multi-threaded and a multi-process version is almost the same, differing only in the package and class used (threading / multiprocessing). So at the development stage you can try both options and decide along the way which one works better.
If you are not familiar with Python, first read about the GIL (global interpreter lock).
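To illustrate the point about swapping the class, here is a minimal sketch. The names `worker` and `run_all` are made up for this example, and the worker body is only a placeholder for the real fetch-and-parse logic. It relies on `threading.Thread` and `multiprocessing.Process` accepting the same `target` and `args` arguments, and on a `multiprocessing.Queue` being usable from both.

```python
import multiprocessing
import threading


def worker(task_id, results):
    # Placeholder for the real work (hit a server, parse the response, ...).
    results.put((task_id, task_id * 2))


def run_all(worker_cls, n_workers):
    """Start n_workers workers of the given class and collect their results."""
    # A multiprocessing.Queue works for threads and for processes alike.
    results = multiprocessing.Queue()
    workers = [
        worker_cls(target=worker, args=(i, results)) for i in range(n_workers)
    ]
    for w in workers:
        w.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    collected = [results.get(timeout=30) for _ in workers]
    for w in workers:
        w.join()
    return sorted(collected)


if __name__ == "__main__":
    # Same code, different class: threads first, then processes.
    print(run_all(threading.Thread, 4))
    print(run_all(multiprocessing.Process, 4))
```

Only the class passed to `run_all` changes between the two runs. In standard CPython the GIL means the thread variant mainly pays off when workers spend their time waiting on I/O, which seems to be the case described in the question.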
A few years ago, choosing Python would have been justified. Today, though, I would choose Go rather than Python for a problem like this.
Programming in Go is as easy as in Python, and concurrency and parallelism are much easier in Go than in Python.
IMHO, if the language is not designed for concurrency (not Erlang, not Go, etc.), it is better to run multiple instances of your application, one instance per physical core.
That said, 1000 threads is not much of a load.
You only need to worry about it if you want to minimize hosting costs or your tasks are genuinely heavy.
For example, I have a Go project that comfortably holds as many as 15,000 concurrent persistent connections,
on a modest modern server (4 cores, 4 GB of RAM).
On the difference between processes and threads: with processes, the operating system is responsible for isolation. For two processes to interact, a call into the OS is needed, which causes a context switch, and that is expensive.
Threads have no such cost. In short, threads are lighter than processes, but their isolation is worse: a crash in one thread can bring down the entire process.
So if the parallel tasks do not need to communicate with each other, the first choice is processes, since isolation is better.
If they do need to communicate, the choice is threads. And a good development team. Not figproduction.
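The isolation argument can be sketched roughly like this: a worker that dies abruptly in its own OS process only produces an exit code, and the parent stays alive and can restart it. The names `CRASHING_WORKER`, `run_worker` and `supervise` are hypothetical, and the crash is simulated with `os._exit`; a real supervisor would also need logging and back-off between restarts.

```python
import subprocess
import sys

# Hypothetical worker body: it dies abruptly, as a crashed worker would.
CRASHING_WORKER = "import os; os._exit(3)"


def run_worker(code):
    """Run a worker in its own OS process and return its exit code."""
    return subprocess.run([sys.executable, "-c", code]).returncode


def supervise(code, max_restarts):
    """Restart the worker until it exits cleanly or the restart limit is hit."""
    exit_codes = []
    for _ in range(max_restarts + 1):
        rc = run_worker(code)
        exit_codes.append(rc)
        if rc == 0:
            break  # clean exit, nothing to restart
    return exit_codes


if __name__ == "__main__":
    # The parent survives every crash and simply records the exit codes.
    print(supervise(CRASHING_WORKER, max_restarts=2))
```

With threads there is no equivalent boundary: a hard crash (for example a segfault in a C extension) takes the whole process down, along with every other thread in it.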
Scrapy now supports Python 3:
doc.scrapy.org/en/latest/news.html#news-betapy3
No need to reinvent the wheel.
Well, I would not run 1,000 processes. I would create as many processes as there are processor cores. For example, if the server has 8 cores, that gives 8 processes with 125 threads in each.
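The layout described above could look something like the following sketch, using the standard `concurrent.futures` pools. The functions `poll`, `chunks`, `run_chunk` and `run_all` are invented for this example, and `poll` is just a stand-in for one fast network operation.

```python
import os
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def poll(task_id):
    # Hypothetical placeholder for one fast network operation.
    return task_id


def chunks(items, n_chunks):
    """Split items into n_chunks nearly equal lists."""
    size, extra = divmod(len(items), n_chunks)
    out, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        out.append(items[start:end])
        start = end
    return out


def run_chunk(task_ids):
    """Runs inside one process: one thread per task in this chunk."""
    if not task_ids:
        return []
    with ThreadPoolExecutor(max_workers=len(task_ids)) as pool:
        return list(pool.map(poll, task_ids))


def run_all(task_ids, n_procs=None):
    """One process per core (by default), each running its share in threads."""
    n_procs = n_procs or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=n_procs) as procs:
        parts = procs.map(run_chunk, chunks(list(task_ids), n_procs))
        return [result for part in parts for result in part]


if __name__ == "__main__":
    # 1000 tasks over 8 processes gives 125 threads per process.
    results = run_all(range(1000), n_procs=8)
    print(len(results))
```

For long-lived workers the pools would be replaced by loops that never return, but the split itself (processes by core count, threads inside each) stays the same.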