Service Oriented Architecture
Rodion, 2021-12-05 15:42:55

How to transfer large files between microservices?

Good afternoon!

I apologize in advance if the tag is wrong: I had trouble choosing one, so please feel free to correct it. Briefly:
- There is a storage for large files (roughly 1 TB each)
- There is a set of worker microservices that, when given a task, download a file from the storage and at some later point upload it back
- The storage and the microservices sit on a single local network
- The services themselves are written in Python
- File transfer is currently done by spawning subprocesses (the standard library subprocess module) that run scp /local-file [email protected]:remote

I consider the current approach suboptimal and nearly unmanageable. Are there better practices for transferring large files back and forth? Ideally it should also work well in Docker containers via docker volumes.
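For context on "almost unmanageable": even the existing subprocess-plus-scp approach becomes easier to live with once it has explicit error checking, a timeout, and retries. A minimal sketch (the function name and parameters here are made up, not part of the actual setup):

```python
import subprocess
import time

def run_transfer(cmd, retries=3, timeout=3600, delay=5):
    """Run a transfer command (e.g. scp) with error checking and retries.

    cmd is a list like ["scp", "/local-file", "user@host:remote"].
    The command itself is up to the caller, which also makes the
    wrapper easy to test with a local copy command.
    """
    for attempt in range(1, retries + 1):
        try:
            # check=True raises on a non-zero exit code instead of
            # silently swallowing a failed transfer
            subprocess.run(cmd, check=True, timeout=timeout,
                           capture_output=True)
            return
        except (subprocess.CalledProcessError,
                subprocess.TimeoutExpired) as exc:
            if attempt == retries:
                raise RuntimeError(
                    f"transfer failed after {retries} attempts") from exc
            time.sleep(delay)
```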

I'm considering either plain Nginx (ordinary static file serving) or MinIO, which I came across but haven't studied yet.
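The "plain Nginx serving statics" option boils down to the worker streaming the file over HTTP in fixed-size chunks rather than loading it into memory. A self-contained sketch using only the standard library (the local http.server below is just a stand-in for Nginx; all names and ports are illustrative):

```python
import http.server
import shutil
import threading
import urllib.request

def download_streamed(url, dest_path, chunk_size=8 * 1024 * 1024):
    """Stream a file to disk in fixed-size chunks.

    Memory use stays at roughly one chunk regardless of file size,
    which is what matters for ~1 TB files.
    """
    with urllib.request.urlopen(url) as resp, open(dest_path, "wb") as out:
        shutil.copyfileobj(resp, out, length=chunk_size)

def serve_cwd(port=0):
    """Serve the current directory over HTTP in a background thread
    (a stand-in for Nginx returning statics). Returns the server,
    whose real port is in server_address[1] when port=0."""
    httpd = http.server.HTTPServer(
        ("127.0.0.1", port), http.server.SimpleHTTPRequestHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd
```

In production the worker would only keep download_streamed and point it at the Nginx (or MinIO) URL.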


2 answers
Ivan Shumov, 2021-12-05
@inoise

It depends on the nature of the file processing and the division of responsibility. There are options with S3-compatible storages, streaming through Kafka, and the network protocols that aren't going anywhere (NFS and iSCSI, if I remember the abbreviations correctly).
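With an S3-compatible store (MinIO included), a 1 TB object goes up as a multipart upload: the client splits the file into parts and the server reassembles them. Libraries such as boto3 do this automatically, but the splitting itself can be sketched with the standard library alone (the helper name and part size here are illustrative, not any particular library's API):

```python
def iter_parts(path, part_size=64 * 1024 * 1024):
    """Yield (part_number, bytes) pairs, the way a multipart upload
    splits a large file. Only one part is held in memory at a time."""
    with open(path, "rb") as f:
        number = 1
        while True:
            chunk = f.read(part_size)
            if not chunk:
                break
            yield number, chunk
            number += 1
```

Each (number, chunk) pair would be handed to the storage's upload-part call; the numbers let the server reassemble the object in order.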

uvelichitel, 2021-12-05
@uvelichitel

The fastest way to transfer files between *nix systems is the nc command. You can use it instead of scp.
If the processing profile is such that a microservice often reuses the same heavy files, you can keep a local copy of such frequently used files on the machine running the service and synchronize it with the storage using rsync. In that case only the changes are transmitted, traffic drops significantly, and it works faster.
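The local-copy idea rests on a cheap validity check: before re-downloading, compare the cached file with the version in storage (rsync does this with rolling checksums; a whole-file hash is a simpler variant). A sketch that streams the file so a ~1 TB copy never sits in memory (the helper names are made up for illustration):

```python
import hashlib

def file_digest(path, chunk_size=8 * 1024 * 1024):
    """SHA-256 of a file, read in chunks so memory use stays flat."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def cache_is_fresh(local_path, remote_digest):
    """True if the cached copy matches the digest reported by storage."""
    try:
        return file_digest(local_path) == remote_digest
    except FileNotFoundError:
        return False
```

The storage side only has to publish each file's digest once, next to the file, for workers to skip redundant downloads.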
And the first thing that comes to mind is the obvious question: why not run the processing on the storage machine, so the heavy files don't have to be moved at all?
