Python

Zerstoren, 2013-11-28 13:30:39

Python multiprocessing, Manager and Big Data

There is a web application that runs over WebSockets and is therefore multi-user.
High load is expected, including on the CPU.
Since a Python process is effectively single-threaded (by default), I'm afraid it won't cope with the influx of users.
To start with, I'd like the application to be able to use all of the server's resources.
The first problem is the architecture, which is convenient for development but not very scalable. In short, there are controllers, services, factories, mappers, and domains. The problem is with the domains: they provide the interface for working with data. Domains are cached inside the main process and live there until they have gone unrequested for more than 10 minutes. Domains also implement a locking system so that nobody can change one while someone else is already changing it.
And here is the essence of the problem: there can be many domains; in one (very unlikely) scenario up to 4k domains of the same type may be created, and all of their data is kept in RAM because users are actively working with them. Accordingly, if I spawn a new process, each process will have its own copy of the data, and any of them can start making changes, losing the changes made in another process.
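
To illustrate what I mean (the names and structure here are just made up): a plain in-process cache is copied into each child process, so a change made in one process is invisible to the others.

from multiprocessing import Process

domain_cache = {"domain-1": {"hp": 100}}   # the in-process domain cache

def mutate():
    # Runs in a child process, so it only changes that process's own copy.
    domain_cache["domain-1"]["hp"] = 0

if __name__ == "__main__":
    p = Process(target=mutate)
    p.start()
    p.join()
    print(domain_cache["domain-1"]["hp"])  # still 100 in the parent
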
How dangerous is it to use Manager from the multiprocessing package to synchronize data between processes?
In terms of speed, could the sheer amount of data make retrieving it through the Manager very, very slow?
Also, can the Manager be used for horizontal scaling?
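
Roughly, what I have in mind with Manager looks like this (the domain contents and key names are just illustrative): a Manager runs a separate server process and hands out proxy objects, so every read and write goes through IPC.

from multiprocessing import Manager, Process

def worker(shared_domains, lock, key):
    # Every operation on shared_domains goes through the manager process.
    with lock:
        domain = shared_domains.get(key, {"hits": 0})
        domain["hits"] += 1
        shared_domains[key] = domain  # must reassign: mutating the nested dict is not propagated

if __name__ == "__main__":
    manager = Manager()
    shared_domains = manager.dict()   # proxy object; the data lives in the manager process
    lock = manager.Lock()

    procs = [Process(target=worker, args=(shared_domains, lock, "domain-1")) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

    print(dict(shared_domains))  # {'domain-1': {'hits': 4}}
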


2 answer(s)
Antigluk @Antigluk, 2013-11-30

As for how safe Managers are, I can't say.
But I can advise using some kind of queue plus a number of workers that execute tasks from that queue, for example RabbitMQ with pika for Python. For the cache, use a separate in-memory service like Redis.
That is, the architecture is Web -> Queue -> Worker -> Data, and the worker talks to the cache.
Thus, by horizontally scaling the number of workers and Redis replicas, we can increase performance linearly.
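
A rough sketch of that layout (the queue name, the key scheme and the update logic are placeholders, not a ready implementation):

import json
import pika
import redis

cache = redis.Redis(host="localhost", port=6379)

def handle_task(channel, method, properties, body):
    task = json.loads(body)
    key = f"domain:{task['domain_id']}"          # hypothetical key scheme
    # All workers read and write the same Redis key, so the state is no longer
    # tied to any single Python process.
    current = json.loads(cache.get(key) or "{}")
    current.update(task.get("changes", {}))
    cache.set(key, json.dumps(current))
    channel.basic_ack(delivery_tag=method.delivery_tag)

connection = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
channel = connection.channel()
channel.queue_declare(queue="domain_tasks", durable=True)
channel.basic_qos(prefetch_count=1)              # hand each worker one task at a time
channel.basic_consume(queue="domain_tasks", on_message_callback=handle_task)
channel.start_consuming()

(With several workers touching the same key you would still need per-key locking or atomic Redis operations for the read-modify-write, but that is a separate topic.)
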

Evgeny Batkovich @quickhabr, 2014-05-14

Celery could be of some help to you.
But could you store all your structures not in the current process, but in some third-party store? Then it would be no problem to work from different processes.
MongoDB lets you store unstructured data; maybe that's what you need.
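
Something along these lines (the broker URL, database and collection names, and the task payload are just assumptions):

from celery import Celery
from pymongo import MongoClient

app = Celery("domains", broker="amqp://localhost")
db = MongoClient("mongodb://localhost:27017")["app"]

@app.task
def update_domain(domain_id, changes):
    # Domain state is persisted in a document store, so any number of worker
    # processes (or machines) can operate on it instead of one resident process.
    db.domains.update_one({"_id": domain_id}, {"$set": changes}, upsert=True)
    return domain_id

# called from the web process: update_domain.delay("domain-1", {"hp": 10})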
