How to organize fault-tolerant distributed task processing?

C++ / C#

Andrew, 2016-02-03 22:24:40

Hello everyone.
I am writing a data miner that must collect and process data from different sources at regular intervals. Different sources may have different intervals, from one minute to two hours.
Here is how I see the task: there will be a central worker that knows where to get data and how often. At the right time it creates tasks and adds them to a queue. Auxiliary workers know nothing about the schedule; they only watch the queue, take tasks from it and execute them. If a worker crashes, the squad will not notice the loss of a fighter: the other workers will keep working. Is that correct, or are such problems solved differently?
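
The central worker described above can be sketched in a few lines. This is a minimal in-memory illustration, not a finished design: the names `SourceSchedule`, `Scheduler` and `CollectDue` are hypothetical, and the real version would push the returned names into whatever queue is chosen.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical description of one data source and its polling interval.
public sealed class SourceSchedule
{
    public string Name { get; }
    public TimeSpan Interval { get; }
    public DateTime NextRunUtc { get; set; }

    public SourceSchedule(string name, TimeSpan interval, DateTime firstRunUtc)
    {
        Name = name;
        Interval = interval;
        NextRunUtc = firstRunUtc;
    }
}

// The central worker: the only component that knows about periodicity.
public sealed class Scheduler
{
    private readonly List<SourceSchedule> _sources = new List<SourceSchedule>();

    public void Add(SourceSchedule source) => _sources.Add(source);

    // Returns the names of the sources that are due at 'nowUtc' and moves
    // the next run of each one forward by its interval.
    public List<string> CollectDue(DateTime nowUtc)
    {
        var due = new List<string>();
        foreach (var s in _sources)
        {
            if (s.NextRunUtc <= nowUtc)
            {
                due.Add(s.Name);
                s.NextRunUtc = nowUtc + s.Interval;
            }
        }
        return due;
    }
}
```

Calling `CollectDue(DateTime.UtcNow)` from a timer and enqueueing one task per returned name keeps all schedule knowledge in one place, so the auxiliary workers stay unaware of intervals.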
I have another question: how should tasks be stored? For now there is a table in an MS SQL database; the main worker adds tasks to it, and the others take rows with tasks. But it is not clear to me whether the arrival of tasks can be made event-based, so that each worker does not have to poll the database at intervals.
Another question: if a worker takes a lock on the table while fetching a task, so that others do not take the same task in parallel, and then crashes, the lock will be left hanging. How do I deal with that?
Is it possible to solve this so that the task of a crashed worker does not disappear with it, and another worker can pick it up? I see the following option: add a "task start time" field to the table and store the time when the task was taken. If that time becomes too old (for example, more than a minute), another worker takes over the task and updates the "task start time" field.
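
The "task start time" idea amounts to a lease with a timeout. Below is a minimal sketch of the claiming rule, using an in-memory list as a stand-in for the table; `TaskRow`, `LeaseQueue` and `TryClaim` are hypothetical names. With a real SQL table the same check-and-update would have to run as one atomic statement so that two workers cannot claim the same row.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory stand-in for one row of the task table.
public sealed class TaskRow
{
    public int Id { get; set; }
    public DateTime? StartedAtUtc { get; set; }  // null = never taken
    public bool Done { get; set; }
}

public sealed class LeaseQueue
{
    private readonly object _gate = new object();
    private readonly List<TaskRow> _rows = new List<TaskRow>();
    private readonly TimeSpan _lease;

    public LeaseQueue(TimeSpan lease) { _lease = lease; }

    public void Add(int id)
    {
        lock (_gate) { _rows.Add(new TaskRow { Id = id }); }
    }

    // Claims the first unfinished task that is either untaken or whose
    // lease has expired; returns null when nothing is available.
    public TaskRow TryClaim(DateTime nowUtc)
    {
        lock (_gate)
        {
            foreach (var row in _rows)
            {
                bool expired = row.StartedAtUtc.HasValue
                    && nowUtc - row.StartedAtUtc.Value > _lease;
                if (!row.Done && (row.StartedAtUtc == null || expired))
                {
                    row.StartedAtUtc = nowUtc;  // take or re-take the task
                    return row;
                }
            }
            return null;
        }
    }

    public void Complete(TaskRow row)
    {
        lock (_gate) { row.Done = true; }
    }
}
```

One consequence of this rule: if a task legitimately runs longer than the lease, a second worker will start it too. The lease should therefore exceed the longest expected run time, or the tasks should be safe to execute twice.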
If I wrote nonsense, don't laugh, I've never written such systems before) Better tell me how serious people do it in serious places)
P.S. I use C#, and the database is MS SQL hosted in Azure. For the solution I can use any tools that Azure provides.

3 answer(s)

Peter, 2016-02-04
@petermzg

1. Why do you need a database if everything is done in one application? Save to the database only when the application closes, and read from it once at startup.
2. Make a task manager that hands tasks out to threads. If there are no new tasks, put the threads to sleep waiting on an event. When something arrives, set the event to the signaled state and the threads wake up. Mark task completion there as well. While a task is running, you can store its object with the thread that runs it. If the thread dies, the manager can check the thread's state via thread.Join(10). If the thread is dead, reset the task's flag.
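
A rough sketch of such an in-process task manager follows. It swaps the hand-rolled event for `BlockingCollection<T>`, which blocks consumers while the queue is empty, and it uses `Thread.Join(int)` at shutdown to check whether each thread has exited. The class name `TaskManager` and its members are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class TaskManager
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread[] _threads;
    private int _completed;

    public int Completed => Volatile.Read(ref _completed);

    public TaskManager(int workerCount)
    {
        _threads = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _threads[i] = new Thread(WorkLoop) { IsBackground = true };
            _threads[i].Start();
        }
    }

    public void Enqueue(Action job) => _queue.Add(job);

    private void WorkLoop()
    {
        // GetConsumingEnumerable blocks while the queue is empty and ends
        // once CompleteAdding has been called and the queue is drained.
        foreach (var job in _queue.GetConsumingEnumerable())
        {
            try
            {
                job();
                Interlocked.Increment(ref _completed);
            }
            catch (Exception ex)
            {
                // An unhandled exception on a worker thread would take
                // down the whole process, so it is logged here instead.
                Console.Error.WriteLine("Job failed: " + ex.Message);
            }
        }
    }

    // Stops accepting jobs and waits for the threads; returns true when
    // every thread exited within the per-thread timeout.
    public bool Shutdown(int joinTimeoutMs)
    {
        _queue.CompleteAdding();
        bool all = true;
        foreach (var t in _threads) all &= t.Join(joinTimeoutMs);
        return all;
    }
}
```

This only covers a single process. If the process itself dies, queued jobs are lost unless they are also persisted somewhere.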

Vitaly Pukhov, 2016-02-04
@Neuroware

Out of boredom, I built a similar system based on sharpmq. It seems to have turned out well.

Artur Nurullin, 2016-02-04
@Splo1ter

We use Service Bus, and we always wrap the worker task in a try/catch.
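
A minimal sketch of what such a wrapper might look like. The `complete` and `abandon` callbacks are hypothetical stand-ins for whatever the queue client offers to acknowledge a message or hand it back for redelivery; no real Service Bus API is called here.

```csharp
using System;

public static class SafeWorker
{
    // Runs 'work' and reports the outcome through the two callbacks.
    // Returns true when the work finished without throwing.
    public static bool Process(Action work, Action complete, Action<Exception> abandon)
    {
        try
        {
            work();
        }
        catch (Exception ex)
        {
            abandon(ex);   // hand the task back so another worker can retry it
            return false;
        }
        complete();        // acknowledge only after the work succeeded
        return true;
    }
}
```

Acknowledging only after success means a worker that crashes mid-task never confirms the message, so the broker can offer it to another worker.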