What is the best way to organize worker servers: should they take tasks from the pool themselves, or should the master server hand tasks out?

Highload

Konstantin T, 2016-12-20 17:21:02

There is a pool of 15-20 servers. One of them is dedicated to the front end; the rest do the heavy work.
By heavy work I mean long tasks that use almost all of the CPU and RAM. Tasks can run in parallel on the same server, i.e. 4-8 tasks can run simultaneously; the exact number depends on the server's load average.
The task pool is stored on the front-end server, which accepts requests from users. What is the best way to distribute tasks between the servers? So far I see two options and can't decide which is better (below, the master server is the front-end server):

1. Each worker server watches its own load average and, while it has spare capacity, polls the master server for tasks. If a task appears, the worker grabs it and tells the master server that the task is now its own and that other servers should not touch it.
2. The master server looks at the statistics each worker sends to it, picks the worker that is currently the least loaded, gives it the task, and manages the whole distribution process.

The downside of the first approach is that collisions can occur: two servers can take the same task at the same time and do the work twice. So I'm leaning towards option 2, but maybe I'm wrong or haven't taken something into account? Or is there a third option?
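To make option 2 concrete, here is a minimal sketch of a master that keeps the latest statistics from each worker and hands a task to the least-loaded one. The names `Master`, `WorkerStats`, `report` and `dispatch`, the worker names, and the limit of 8 tasks per server are all assumptions made for illustration; the question does not say how the statistics are transported or stored.

```python
from dataclasses import dataclass, field


@dataclass
class WorkerStats:
    """Latest report from one worker (hypothetical shape)."""
    load_average: float      # e.g. the 1-minute value from os.getloadavg()
    running_tasks: int       # tasks currently executing on the worker
    max_tasks: int = 8       # assumed upper bound (the question mentions 4-8)


@dataclass
class Master:
    stats: dict = field(default_factory=dict)     # worker name -> WorkerStats
    assigned: dict = field(default_factory=dict)  # task id -> worker name

    def report(self, worker, load_average, running_tasks):
        """Called whenever a worker sends its statistics."""
        self.stats[worker] = WorkerStats(load_average, running_tasks)

    def dispatch(self, task_id):
        """Give the task to the least-loaded worker that still has a free slot."""
        candidates = [
            (s.load_average, name)
            for name, s in self.stats.items()
            if s.running_tasks < s.max_tasks
        ]
        if not candidates:
            return None  # every worker is full; the task stays in the pool
        _, worker = min(candidates)
        self.assigned[task_id] = worker
        # Count the new task right away so the next dispatch sees it even
        # before the worker sends a fresh report.
        self.stats[worker].running_tasks += 1
        return worker


master = Master()
master.report("worker-1", load_average=3.5, running_tasks=4)
master.report("worker-2", load_average=0.7, running_tasks=1)
master.report("worker-3", load_average=6.0, running_tasks=8)
chosen = master.dispatch("task-42")
print(chosen)
```

Because only the master assigns tasks, two workers cannot end up with the same one. The price is that the master has to act on statistics that may already be slightly stale.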

2 answer(s)

jacob1237, 2016-12-20
@RooTooZ

The downside of the first approach is that collisions can occur: two servers can take the same task at the same time and do the work twice

That depends on the technologies you are going to use. In option #1, the data structure that holds the tasks for the workers is called a shared queue. Its main job is exactly this: to hand items out to consumers while preventing duplication and side effects such as race conditions.
It is implemented differently in different software packages. I recommend looking at Beanstalkd, for example, where all of these problems have already been solved, or using the List data structure built into Redis. It basically does what is needed.
However, the advantage of Beanstalkd is that it is built specifically for task queues: it supports task priorities (a numeric ordering), reserving tasks, automatically releasing a reservation when the processing time is exceeded, and so on.
In addition, it can persist tasks to disk (with the -b option) as well as keep them in memory; in Redis, persistence is available only through snapshots or a full operation log (the append-only file), which is not the best option.
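The reservation behaviour described above can be sketched with a small in-memory toy. This is not Beanstalkd's API or implementation; the class `ReservingQueue`, its method names, the fake clock, and the task names are assumptions chosen for illustration, borrowing only the ideas of priorities, reserving a task, and a time limit after which the reservation is dropped. A lower priority number is treated as more urgent.

```python
import heapq
import itertools


class ReservingQueue:
    """In-memory toy queue with priorities and time-limited reservations."""

    def __init__(self, clock):
        self._clock = clock            # callable returning the current time in seconds
        self._ready = []               # heap of (priority, sequence, task_id)
        self._reserved = {}            # task_id -> (priority, deadline)
        self._seq = itertools.count()  # keeps FIFO order among equal priorities

    def put(self, task_id, priority=100):
        heapq.heappush(self._ready, (priority, next(self._seq), task_id))

    def _release_expired(self):
        """Return tasks whose reservation ran out to the ready heap."""
        now = self._clock()
        for task_id, (priority, deadline) in list(self._reserved.items()):
            if deadline <= now:
                del self._reserved[task_id]
                self.put(task_id, priority)

    def reserve(self, ttr):
        """Hand out the most urgent task, hidden from others for ttr seconds."""
        self._release_expired()
        if not self._ready:
            return None
        priority, _, task_id = heapq.heappop(self._ready)
        self._reserved[task_id] = (priority, self._clock() + ttr)
        return task_id

    def delete(self, task_id):
        """Called by the worker when the task is finished."""
        return self._reserved.pop(task_id, None) is not None


now = [0.0]                       # fake clock so the example is deterministic
q = ReservingQueue(clock=lambda: now[0])
q.put("encode-video", priority=10)
q.put("build-report", priority=50)

first = q.reserve(ttr=60)         # worker A takes the most urgent task
second = q.reserve(ttr=60)        # worker B gets a different task
third = q.reserve(ttr=60)         # nothing left while both are reserved
q.delete(second)                  # worker B finishes its task
now[0] = 61.0                     # worker A hangs past its time limit
retry = q.reserve(ttr=60)         # the timed-out task is offered again
```

A reserved task is invisible to other workers, which is what prevents double work in option #1, and the time limit means a task held by a crashed worker is eventually handed to someone else.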

Max, 2016-12-20
@MaxDukov

Yes, in general the two are the same. Put a lock on a task while someone is picking it up, and pull will work correctly. If you don't want to do that, use push.
Or something in between: the worker periodically says "I'm ready to take the next one", and the master hands it a task.
Then the master does not hammer the worker with "how are you?" requests, and the worker never receives a task that is already running.
I.e. the queue is managed by the master, and the status report is produced by the worker.
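The in-between variant might look roughly like this. It is a sketch under assumptions: threads stand in for worker servers, the names `Master`, `ready` and `worker_loop` are made up for the example, and a real worker would check its load average before announcing that it is ready.

```python
import threading
from collections import deque


class Master:
    """Owns the queue; workers only announce that they are ready."""

    def __init__(self, tasks):
        self._pending = deque(tasks)
        self._lock = threading.Lock()   # the task lock held while a task is picked up
        self.running = {}               # task -> worker that received it

    def ready(self, worker):
        """Worker says it is ready; the master hands over a task or None."""
        with self._lock:
            if not self._pending:
                return None
            task = self._pending.popleft()
            self.running[task] = worker
            return task


def worker_loop(name, master, done):
    while True:
        task = master.ready(name)
        if task is None:
            break                       # a real worker would sleep and ask again
        done.append((name, task))       # stand-in for the long-running job


master = Master([f"task-{i}" for i in range(100)])
done = []
threads = [
    threading.Thread(target=worker_loop, args=(f"worker-{n}", master, done))
    for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

processed = sorted(task for _, task in done)
```

Since a task is removed from the pending queue and recorded as running inside one locked section, no two workers can receive the same task, however the threads interleave.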