Can Kafka merge task results based on a common id?

A

Alexander Lebedev2020-11-15 21:35:59

Java

Alexander Lebedev, 2020-11-15 21:35:59

Is it possible, using Kafka, to combine task results into a single data entry (eg JSON) based on a specific parameter (eg order_id)?

That is, the order falls into the system in the form of JSON, to process it, you need to perform N tasks - some are local, some are waiting for a response from the external API, that is, different execution times. Tasks are uploaded to Kafka, which knocks on microservices. Each task gets the result in JSON too, including the order_id.

Actually, the question is whether it is possible to merge the results of these tasks automatically, immediately after the last order task is completed? If so, how?

If I misunderstood the logic/order of events/etc. - drink. If you chose the wrong tool and it is better to implement it through the X solution - too. Otherwise, I'd be grateful for any topic/stream-based advice.

Reply

Answer the question

In order to leave comments, you need to log in

3 answer(s)

V

Vamp, 2020-11-16
@sortarage

Your task in such statement can be solved quite well. That's just gluing the results will have to be done manually.
Create a topic with results and take order_id as a key. Next, read the results from the topic and add them to the Map<Integer, Set<TaskResult>> collection (where Integer is order_id). As soon as the number of elements in the Set becomes equal to the number of previously sent tasks for a given order_id, we can assume that all answers have been received and pass them all at once for further processing.
It remains only to think of extreme cases. For example, you cannot wait indefinitely for all results to arrive - the external api may not respond, and the local task may fly out with an exception and not generate a TaskResult. In this case, the number of responses will be less than the number of submitted tasks. You will have to tighten timeouts and / or resend tasks. But what if suddenly there are more answers than requests were sent?
Plus another question is when to commit offsets. If at once, then there is a danger of getting only half of the results. For example, if the result collector crashes after it collects the first half and commits it, then after restart it will only subtract the second half and never collect the full answer.
You can create a separate topic for each order. This simplifies the processing of some corner cases, but there is a problem if there are a lot of orders (hundreds of thousands - millions).
I didn’t work with kafka streams, but after skimming through the documentation, I can assume that the groupByKey() + reduce() combination can solve the issue with less code than the previous two options.

I

Igor Cherny, 2020-11-16
@freeg0r

The question is not formulated clearly. But in general, Kafka does not unite anything and does not knock anywhere. Kafka is simply distributed data feeds, a queue of messages with guaranteed delivery.

M

mayton2019, 2020-11-16
@mayton2019

The author does not correctly formulate the problem.