How to scale an image storage service?
There is a microservice for storing image files. Let's say its URL is s1.site.ru. Its structure
is simple:
- uploading files via the API with a POST request to s1.site.ru/upload
- deleting files via the API with a POST request to s1.site.ru/delete or a GET request to s1.site.ru/delete/some_filename.jpg
- retrieving files from s1.site.ru/storage/some_filename.jpg
Now this service needs to be scaled horizontally, and an SSL certificate needs to be attached to it.
Let's say I spin up a few more such services, for example s2.site.ru, s3.site.ru, ... - and they need to be combined somehow.
The obvious solution that comes to mind is to set up a load balancer at an address like https://files.site.ru, attach the SSL certificate to it, and proxy requests (already without SSL) to the services.
An API request to https://files.site.ru/upload is distributed among the services according to their weights. It should return the absolute URL of the uploaded file to the client, for example "https://files.site.ru/storage/some_filename.jpg".
A request to https://files.site.ru/storage/some_filename.jpg should be proxied straight to the service that holds this file. How can this be done? Maybe some_filename.jpg should contain something extra, some key, so that the balancer knows where to send the request? Or is there a smarter solution?
The same applies to https://files.site.ru/delete/some_filename.jpg
I would be very grateful if someone could tell me which balancer to use and how to implement this, and sketch an example config, at least schematically - I can't quite picture it yet)
READY-MADE SOLUTIONS LIKE S3 AND THE LIKE ARE NOT AN OPTION!
Thanks to everyone who gave useful advice on the substance of the question. Briefly, once more: there is a microservice for storing images that needs to be scaled.
The solution turned out to be as follows. The microservice runs on several servers:
s1.site.ru
s2.site.ru
s3.site.ru
An nginx balancer is also set up at s.site.ru:
# List of servers for balancing write (upload) requests.
upstream storage_backend {
    server s1.site.ru:80;
    server s2.site.ru:80;
    server s3.site.ru:80;
}

# Determine which server holds the file, for read (/storage/) and delete (/delete/) requests.
# The "default" entry is only needed so that an unmatched request gets a 404
# instead of a 500.
map $uri $storage_location {
    "~/(storage|delete)/s1-" "s1.site.ru:80";
    "~/(storage|delete)/s2-" "s2.site.ru:80";
    "~/(storage|delete)/s3-" "s3.site.ru:80";
    default                  "s1.site.ru:80";
}

server {
    listen 80;
    server_name s.site.ru;

    # Reads and deletes go straight to the server encoded in the file name prefix (s1-, s2-, ...).
    location / {
        proxy_pass http://$storage_location;
    }

    # Uploads are load-balanced across all storage servers.
    location /upload {
        proxy_pass http://storage_backend;
    }
}
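The original question also asked for SSL on the balancer, and the config above listens only on plain port 80. Below is a minimal sketch of the same server block terminating TLS; the certificate and key paths are placeholders, not part of the original answer.

server {
    listen 443 ssl;
    server_name s.site.ru;

    # Placeholder paths; substitute your real certificate and key.
    ssl_certificate     /etc/nginx/ssl/s.site.ru.crt;
    ssl_certificate_key /etc/nginx/ssl/s.site.ru.key;

    location / {
        proxy_pass http://$storage_location;   # same routing map as the port-80 block
    }

    location /upload {
        proxy_pass http://storage_backend;
    }
}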
So, let's learn the basics:
GET https://f.s.com/path/to/file.ext - download
POST https://f.s.com/path/to/file.ext - upload
PUT https://f.s.com/path/to/file.ext - replace
DELETE https://f.s.com/path/to/file.ext - delete
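If the service is reworked into that REST style, the balancer can also reject anything other than these four methods. A rough sketch, reusing the upstream from the config above (this location is an assumption for illustration, not part of the original answer):

location /storage/ {
    # Allow only the four methods listed above (allowing GET also allows HEAD);
    # any other method is denied.
    limit_except GET POST PUT DELETE {
        deny all;
    }
    proxy_pass http://storage_backend;
}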
"Maybe some_filename.jpg should contain something extra, some key"
You can use some_filename itself as the key: pick a hash function that maps the some_filename string uniformly onto the service addresses.
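nginx can do this filename-to-server mapping itself: the upstream hash directive picks a server from a hash of a key such as the URI, so the same file name always lands on the same server. A sketch under that assumption, reusing the server names from the question:

upstream storage_hash {
    # Consistent hashing: the same $uri (and thus the same some_filename.jpg) always
    # maps to the same server, and adding a server remaps only part of the keys.
    hash $uri consistent;
    server s1.site.ru:80;
    server s2.site.ru:80;
    server s3.site.ru:80;
}

Note that the upload handler would then also have to store each file on whichever server the hash of its final URI points to.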
In a simple version:
The client uploads a file to files.site.ru, which, according to some algorithm (randomly, by the hash of the name modulo the number of servers, taking into account the weight / load / performance of the servers, or otherwise), forwards the file to the chosen server and returns a direct link to it (a weighted-upstream sketch follows this answer). If the bandwidth of files.site.ru is limited, it only answers which server to use, and the upload goes directly to that one.
Reading happens directly from the server where the file was uploaded, bypassing files.site.ru so that it does not become a bottleneck.
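The "weight / performance of the servers" part maps directly onto nginx upstream weights. A sketch with assumed weight values (they are placeholders, not measurements):

upstream storage_backend {
    # Assumed weights: s1 receives roughly half of the uploads, s3 the fewest.
    server s1.site.ru:80 weight=3;
    server s2.site.ru:80 weight=2;
    server s3.site.ru:80 weight=1;
}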
Typically, such subdomains (s1, s2, etc.) are created in order to balance the load on the file server, using intermediate caching servers for this.
Let's assume that we have several caching servers in different locations. Each of them constantly transmits statistics about its workload to the main server.
When returning static content to the user, the script on the main (backend) server queries the cache (for example, Redis) and gets the current load statistics of those servers and, say, their locations. Based on the user's IP, the servers closest to the user are picked, and then the least loaded one among them (let's call it s3) is chosen. As a result, the client receives a URL like https://s3.example.org/cache/images/sample.jpg in the script's response.
The client then makes a request to that server, and the file is served from the cache or fetched directly from the file server (if it is not yet on the caching server).
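At the nginx level, such a caching server can be just a proxy_cache in front of the file server. A minimal sketch, assuming s3.example.org is the cache node and files.example.org is the origin file server (hostnames, cache path, and sizes are placeholders):

proxy_cache_path /var/cache/nginx/images levels=1:2 keys_zone=images:10m max_size=10g inactive=7d;

server {
    listen 80;
    server_name s3.example.org;

    location /cache/images/ {
        proxy_cache images;
        proxy_cache_valid 200 7d;              # keep successful responses for a week
        proxy_pass http://files.example.org;   # on a cache miss, fetch from the file server
    }
}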
Physically deleting files is HIGHLY discouraged (especially if it happens frequently); instead, set a "hidden" flag at the application level.
For reliability, it is advisable to use RAID-5 on the file server. Under really heavy load, you can set up several replica file servers and combine them into a single cluster.
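At the balancer level, such a replica can be wired in as a backup server so that reads fail over to it when the primary is down. A sketch with a hypothetical replica hostname:

upstream s1_with_replica {
    server s1.site.ru:80;
    server s1-replica.site.ru:80 backup;   # hypothetical replica host, used only when s1 is unavailable
}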
For more details, you can read here:
https://winitpro.ru/index.php/2013/09/25/ustanovka...
PS Correct me if I'm wrong