Which architecture to choose for a distributed image processing system (docker, microservices)?


Software design

log95, 2019-05-07 22:40:18

There is a web application. It needs to extract some data from an image. Three services are involved in producing that data.

The image is fed to Service1. Its output is fed to Service2, the output of Service2 is fed to Service3, and the result of Service3 is returned to the client.
Since the services differ greatly in the environments they need, it was decided to put them in separate containers.
The question is how to organize data transfer properly in this case.
As I understand it, we need some kind of controller, as shown in the image, which passes data between the services and returns the result to the main application.
Move all nodes (application, controller, services) to containers.
Transfer data between all nodes via HTTP requests (in particular, transfer images as base64).
Since each service can run on a varying number of nodes, the controller also needs a balancer that picks a random node of the service.
Is the above reinventing the wheel, and are there standard techniques for this?
Or is there an article describing something similar (data transfer, a controller for services)?
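
For illustration, the controller idea from the question can be sketched as follows. This is only a minimal sketch under assumptions: the three service functions are hypothetical stand-ins that run in-process instead of behind HTTP, `REGISTRY` is an invented name for the list of nodes per service, and the "balancer" is nothing more than a random choice among those nodes.

```python
import base64
import json
import random
from typing import Callable, Dict, List

# A node takes a JSON request body and returns a JSON response body.
# In a real deployment each node would be an HTTP endpoint in its own
# container; plain functions are used here so the sketch runs as-is.
Node = Callable[[str], str]


def encode_image(raw: bytes) -> str:
    """Wrap image bytes in a JSON body, base64-encoded as the question proposes."""
    return json.dumps({"image": base64.b64encode(raw).decode("ascii")})


def service1(body: str) -> str:
    # Hypothetical stand-in for image processing: decode and measure the image.
    raw = base64.b64decode(json.loads(body)["image"])
    return json.dumps({"size": len(raw)})


def service2(body: str) -> str:
    # Hypothetical stand-in for the second stage: turn the number into text.
    size = json.loads(body)["size"]
    return json.dumps({"text": f"{size} bytes"})


def service3(body: str) -> str:
    # Hypothetical stand-in for the third stage: post-process the text.
    text = json.loads(body)["text"]
    return json.dumps({"result": text.upper()})


# Assumed registry: each service may have a different number of nodes.
REGISTRY: Dict[str, List[Node]] = {
    "service1": [service1, service1],
    "service2": [service2],
    "service3": [service3, service3, service3],
}


def run_pipeline(raw_image: bytes) -> dict:
    """Controller: call the services in order, picking a random node each time."""
    body = encode_image(raw_image)
    for name in ("service1", "service2", "service3"):
        node = random.choice(REGISTRY[name])  # the "balancer" from the question
        body = node(body)
    return json.loads(body)


if __name__ == "__main__":
    print(run_pipeline(b"not really an image"))
```

Note that in this design the controller has to stay alive and hold the whole request while all three stages run, which is one reason the answers below suggest queues instead.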


2 answer(s)


Roman Mirilaczvili, 2019-05-08
@log95

Based on this clarification:

I don't know how interested you are in diving into the business logic itself. In short, it is as follows:
- Service1 processes the image, most likely with OpenCV + python (finding contours, cropping, and so on). The output will be about 3-9 images.
- Service2 does the recognition, using either tesseract or a neural network. The output is several texts.
- Service3 processes the several texts and produces the resulting text. Most likely this will use python libraries.

I see the architecture as follows:
Since the services are chained so that they must be called one after another, it is logical to connect the processing stages with message queues:
Image (ID=123) -> Image Slicing Queue (OpenCV) -> each resulting slice is put in the OCR Queue -> the recognized text is put in the Text Analysis Queue.
To process an image in the Image Slicing Queue, we store it at a certain path (this can be a path in AWS S3). A worker for this queue, running on some VM instance, reads the image from that path, saves the resulting slices to the designated path for processed images, and puts each of them separately (path and image ID) in the OCR Queue.
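
The queue chain described above might look roughly like this. It is a sketch under loud assumptions: in-process `queue.Queue` objects and threads stand in for a real message broker and separate workers, a temporary directory stands in for S3, and the "slicing" and "OCR" functions are hypothetical placeholders (splitting bytes in half and decoding them as ASCII). What it does show is the key point of the design: the messages carry only an image ID and a path, never the image itself.

```python
import queue
import tempfile
import threading
from pathlib import Path

STOP = object()  # sentinel telling a worker that no more messages will come


def worker(inbox, outbox, handle):
    """Read messages from inbox, process each, and forward every result."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)  # pass the shutdown signal down the chain
            break
        for out in handle(msg):
            outbox.put(out)


def make_slicer(storage: Path):
    def slice_image(msg):
        image_id, path = msg
        data = Path(path).read_bytes()
        half = len(data) // 2
        # Placeholder for OpenCV slicing: cut the bytes into two parts,
        # save each part, and emit one (ID, path) message per part.
        for i, part in enumerate((data[:half], data[half:])):
            out = storage / f"{image_id}_part{i}.bin"
            out.write_bytes(part)
            yield (image_id, str(out))
    return slice_image


def ocr(msg):
    image_id, path = msg
    # Placeholder for tesseract or a neural network: decode bytes as text.
    yield (image_id, Path(path).read_bytes().decode("ascii"))


def run(image_id: int, data: bytes) -> str:
    with tempfile.TemporaryDirectory() as tmp:
        storage = Path(tmp)  # stands in for shared storage such as S3
        src = storage / f"{image_id}.bin"
        src.write_bytes(data)

        slicing_q, ocr_q, text_q = queue.Queue(), queue.Queue(), queue.Queue()
        threads = [
            threading.Thread(target=worker,
                             args=(slicing_q, ocr_q, make_slicer(storage))),
            threading.Thread(target=worker, args=(ocr_q, text_q, ocr)),
        ]
        for t in threads:
            t.start()

        slicing_q.put((image_id, str(src)))
        slicing_q.put(STOP)

        # Text analysis stage: collect the recognized texts and combine them.
        texts = []
        while True:
            msg = text_q.get()
            if msg is STOP:
                break
            texts.append(msg[1])
        for t in threads:
            t.join()
    return " ".join(texts)


if __name__ == "__main__":
    print(run(123, b"helloworld"))
```

With a real broker, each stage can be scaled by starting more workers on the same queue, so no separate balancer is needed; the ordering guarantee this sketch relies on (one worker per stage) would then no longer hold.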


Yerlan Ibraev, 2019-05-08
@mad_nazgul

Asynchronously, via Service Bus on Kafka.
<:o)