Is there a data structure for multithreading with non-parallelism limits by ID?

A

AlexHell2020-02-20 00:47:36

Programming

AlexHell, 2020-02-20 00:47:36

While I'm collecting information - is there such a data structure - its name, and perhaps implementation in some of the open libraries.

Should be like this:
1) Message queue (possibly several), while each message has a SubjectId (say user ID, etc.)
2) Thread pool (i.e. more than 1 thread in general) that take messages from the queue and start executing them (consumer
)
you can start executing in the 1st thread Message with SubjectId == 1, you can start executing in
the 2nd thread Message with SubjectId == 2,
but if we conditionally have 3 threads, then you can’t start executing in the 3rd thread Message with SubjectId == 1 or 2, it should be any other SubjectId
* as soon as one of the threads has finished working, that SubjectId becomes free, say == 1, and then a new thread can take a Message with SubjectId == 1 from the queue (structure)

Why such a structure is needed - so that additional critical sections are not required (lock \ synchronized) in the context of identical objects, because when we work in parallel with an object with SubjectId == 1 of 2 threads, then we are required to lock access to its structures, and in the case of such a structure, there is no need to lock on the object itself (with SubjectId) (although lock- and on other common structures - not the essence of the question)

In contrast to single-threaded execution, the performance will be higher. several threads, a whole pool, i.e. as many kernels as you like can be loaded if requests in most cases will be with different SubjectIds.

The main thing is that everything here should not be covered by locks or their analogues, which will severely cut the performance due to idle CPU cores.

As I see it at first glance, a HashMap with a SubjectId key and a List value is needed here. all messages that have this SubjectId.
And a bunch of bindings in the form of the beginning and end of processing, selecting the desired SubjectId, etc.

There are similarities with Akka actors, but I'm not an expert on them. But the most important thing that doesn’t suit them at first glance is that each actor is an instance through new and with its own queue, and if in most cases we have different SubjectIds, then it’s wasteful to have a class instance with its own queue for one or two Messages and plus it is not known how it will be in terms of performance in such a number of actors. For example, if we have a SubjectId from 1 to 1_000_000, then 1_000_000 actors will be created, each with its own queue, which is probably troubles (need to be profiled).

__ update

the global order is not needed (indeed, if it was needed - only in a single thread can this be done), but the local one is desirable

and if it works out, the global one is at least somewhat needed in the sense that tasks do not hang in the queue forever, for example, if the 1st task comes, the 2nd .. 1000th, it is not necessary that the 1st hangs until the end of the centuries, because new tasks will appear (1000th - 2000th, etc.)

Reply

Answer the question

In order to leave comments, you need to log in

2 answer(s)

S

Sergey Gornostaev, 2020-02-20
@sergey-gornostaev

Looks like a regular striped lock. You take a normal queue, a normal thread pool, and fill an array with Lock instances. The thread at the beginning of the work takes the Message from the queue, receives the SubjectId from it, calculates its hash and tries to grab the lock from the array element corresponding to the hash. If the lock succeeds, the thread does its work. If not, return the Message to the end of the queue and take the next one from the beginning. It remains only to choose the effective size of the array of locks.

A

Alexander, 2020-02-20
@NeiroNx

I think you need to read about "Message Queues" and the RabbitMQ service https://habr.com/en/post/150134/
"This is exactly what you NEED, Potter"