PHP
skvot, 2013-11-05 10:50:35

Distributed Image Storage

Hey, Habr!
The task is to build a distributed storage for images that must be kept in several sizes. The original images may be stored on any of the servers.

We came up with the following scheme. One physical machine serves as the entry point, and there are N additional storage servers. The main server runs nginx, which listens on port 80, and Apache. When a request comes in, nginx looks for the processed image on its own file system. If the image isn't there, nginx tries to find it on the additional servers. If it isn't there either, nginx passes the request through Apache to a PHP script. That script searches the machines for the original of the requested image (an original can be uploaded directly to any of the machines), performs the necessary processing and saves the processed images to the storage. If the original is not found, the PHP script returns a 404 status, and nginx responds with a stub (placeholder) image.
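The lookup order described above can be condensed into a single function. This is a minimal Python sketch, not the actual nginx/Apache/PHP setup. The dictionaries stand in for the file systems of the entry point and the N storages, and `resize` stands in for the PHP processing step. Saving the result on the entry point is an assumption, because the post does not say which storage receives it. Comments mark which part nginx and which part PHP would handle.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in for a file system: maps a file path to its contents.
Store = Dict[str, bytes]


def serve_image(
    path: str,
    original_name: str,
    local: Store,
    extra_servers: List[Store],
    resize: Callable[[bytes], bytes],
    stub: bytes,
) -> Tuple[int, bytes]:
    """Return (status, body) for a processed image, in the order described above."""
    # nginx, step 1: processed image on the entry point itself.
    if path in local:
        return 200, local[path]
    # nginx, step 2: processed image on one of the N additional storages.
    for server in extra_servers:
        if path in server:
            return 200, server[path]
    # PHP via Apache, step 3: find the original on any machine and process it.
    for server in [local, *extra_servers]:
        if original_name in server:
            processed = resize(server[original_name])
            local[path] = processed  # assumption: result is saved on the entry point
            return 200, processed
    # Step 4: no original anywhere, so PHP answers 404 and nginx serves the stub.
    return 404, stub
```

Each miss in steps 1 and 2 costs a lookup on every server. Several answers below point to that cost.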

I'd like advice on the scheme itself (perhaps there are other ways to organize the interaction that would suit us better?). I'd also like help with configuring nginx: specifically, I can't get nginx to handle the headers (the 404 status) returned by Apache via error_page.

Thank you all in advance!

5 answers
Nikolai Vasilchuk, 2013-11-05
@Anonym

It is not entirely clear why "the original images can be stored on any of the servers."
It would be more logical to design the architecture so that an image's location can be determined unambiguously from its name.
1. When the image is uploaded, decide which server it should be stored on.
2. Name the image so that the name identifies that storage.
3. When serving it, you know exactly where to look.
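One way to implement these three steps is sketched below. The server list, the `<server index>_<random id>.<ext>` naming convention and the round-robin choice at upload time are all assumptions made for illustration.

```python
import itertools
import uuid

# Hypothetical storage servers; the names are placeholders.
SERVERS = ["img1", "img2", "img3"]
_next_server = itertools.cycle(range(len(SERVERS)))


def assign_name(extension: str) -> str:
    """Steps 1-2: pick a server at upload time and encode its index in the name."""
    server_id = next(_next_server)  # simple round-robin choice (an assumption)
    return f"{server_id}_{uuid.uuid4().hex}.{extension}"


def server_for(name: str) -> str:
    """Step 3: recover the storage server from the name alone, with no search."""
    server_id = int(name.split("_", 1)[0])
    return SERVERS[server_id]
```

Any other rule for picking the server at upload time works the same way, as long as the chosen index ends up in the name.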

AxisPod, 2013-11-05
@AxisPod

IMHO this won't be fast. Checking for the file locally takes little time, but checking the other N servers will not be fast. Also, the link to the balancer (the entry point) may become a bottleneck.
I would suggest an approach where the location of each image is already known when the HTML is generated. Each server gets its own subdomain, for example img1.domain.com, img2.domain.com, etc. DNS balancing is also easy to add on top of this.
Accordingly, if the resized image doesn't exist yet, you put the path to the PHP script in the page instead.
Storing the storage ID alongside each image version is easy.
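Below is a sketch of building the image URL at HTML-generation time. The `ImageVersion` record, the `/<size>/<name>` path layout and the `resize.php` script name are hypothetical. Only the `imgN.domain.com` subdomain pattern comes from the answer.

```python
from dataclasses import dataclass
from urllib.parse import urlencode


@dataclass
class ImageVersion:
    storage_id: int  # which imgN server holds the image (stored next to it)
    name: str        # file name, e.g. "cat.jpg"
    size: str        # requested version, e.g. "200x200"
    ready: bool      # has this version already been generated?


def image_url(img: ImageVersion, domain: str = "domain.com") -> str:
    """Build the <img src> for one image version at HTML-generation time."""
    host = f"img{img.storage_id}.{domain}"
    if img.ready:
        # Ready-made version: link straight to the static file on its server.
        return f"http://{host}/{img.size}/{img.name}"
    # Not generated yet: link to the (hypothetical) PHP script on the same server.
    query = urlencode({"size": img.size, "name": img.name})
    return f"http://{host}/resize.php?{query}"
```

With this approach the entry point never probes other servers. The browser goes directly to the server that owns the image.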

Stepan, 2013-11-05
@L3n1n

The N servers that hold the previews are mounted on the main machine over NFS.
Through NFS, we can easily check which server already has the processed image and redirect to it.
Initially, we planned to use NFS only to check whether a file exists. After extensive testing, however, it turned out that NFS handles even writing previews to the other servers with ease. It goes down no more than once a year.
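The check across the NFS mounts might look roughly like this. The mount points, the server names and the injectable `exists` parameter are hypothetical. Building the actual redirect from the returned server name is left to the caller.

```python
import os
from typing import Callable, Dict, Optional


def find_preview(
    rel_path: str,
    mounts: Dict[str, str],
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the name of the server whose NFS mount already has rel_path, else None."""
    for server, mount_point in mounts.items():
        # Each storage server's preview directory is mounted locally over NFS,
        # so a plain file-system check is enough to find the ready preview.
        if exists(os.path.join(mount_point, rel_path)):
            return server
    return None
```

For example, `find_preview("200x200/cat.jpg", {"img1": "/mnt/img1", "img2": "/mnt/img2"})` returns `"img2"` if only the second mount has that file.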

Gregory, 2013-11-05
@difiso

Why not keep a simple database at the entry point that stores the path to each image (or at least the server name)? A lookup still happens, but with a properly designed, indexed database it is a single query rather than a search across all servers, so it will be much faster.
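Below is a minimal sketch of such a lookup table. It uses Python's built-in `sqlite3` with an in-memory database as a stand-in for whatever database would actually run on the entry point. The table and column names are made up for illustration.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")  # stand-in for the entry point's database
# PRIMARY KEY gives an index on path, so each lookup is a single indexed query.
conn.execute("CREATE TABLE images (path TEXT PRIMARY KEY, server TEXT NOT NULL)")


def register(path: str, server: str) -> None:
    """Record where an image was stored when it was uploaded or processed."""
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO images (path, server) VALUES (?, ?)",
            (path, server),
        )


def locate(path: str) -> Optional[str]:
    """Return the server holding path, or None if the image is unknown."""
    row = conn.execute("SELECT server FROM images WHERE path = ?", (path,)).fetchone()
    return row[0] if row else None
```

A `None` result from `locate` maps naturally onto the existing 404-and-stub branch.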

rozhik, 2013-11-05
@rozhik

I would like to suggest the following.
Idea:
0. (Optional) Search the local file system.
1. Compute a hash of the image path that returns an integer, hashVal.
2. Select the front server number hashVal % serversCount. If it is alive, fetch the image from that server; if the image doesn't exist there, it is generated.
3. If the server is down, take the next one and go back to step 2.
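Steps 1 to 3 above can be sketched as follows. The answer does not name a hash function, so `zlib.crc32` is an assumption. It was chosen because Python's built-in `hash()` for strings is randomized per process and would send the same path to different servers after a restart. `is_alive` is a hypothetical health-check callback.

```python
import zlib
from typing import Callable, List


def pick_server(path: str, servers: List[str], is_alive: Callable[[str], bool]) -> str:
    """Hash the path, take hashVal % serversCount, and skip servers that are down."""
    hash_val = zlib.crc32(path.encode("utf-8"))  # step 1: stable integer hash
    count = len(servers)
    for offset in range(count):
        # offset 0 is step 2; each further offset is step 3's "next one".
        candidate = servers[(hash_val + offset) % count]
        if is_alive(candidate):
            return candidate
    raise RuntimeError("no live servers")
```

The sketch also shows the weakness of this scheme. Every path that hashes to a dead server falls through to the same neighbor, `servers[(i + 1) % count]`.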
In practice, this approach is harmful: when a server fails, all of its load shifts to the next server, which becomes overloaded.
In a live project we use a modification:
Memcache holds 1000 entries, initialized with the values from step 2. When a front server goes down, its entries are reassigned at random to live servers. Once it is back up, they are restored.
(In reality it is a bit more complicated, since each image always lives on 3 servers and access to them is balanced round-robin, but that is not important for this question.)
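Below is a rough sketch of the 1000-entry table. It rests on three assumptions that the answer does not confirm. A plain Python list replaces memcache. Slot i initially points to server i % serversCount, which is one reading of "initialized with the values from step 2". A path is mapped to a slot by crc32 % 1000.

```python
import random
import zlib
from typing import List

SLOTS = 1000  # number of entries kept in memcache, per the answer


def build_table(server_count: int) -> List[int]:
    """Assumed initial mapping: slot i -> server i % serversCount."""
    return [slot % server_count for slot in range(SLOTS)]


def fail_server(table: List[int], dead: int, live: List[int], rng: random.Random) -> List[int]:
    """Point each of the dead server's slots at a randomly chosen live server.

    Returns the affected slots so they can be handed back on recovery."""
    moved = [slot for slot, server in enumerate(table) if server == dead]
    for slot in moved:
        table[slot] = rng.choice(live)
    return moved


def restore_server(table: List[int], moved: List[int], server: int) -> None:
    """Give the recovered server its original slots back."""
    for slot in moved:
        table[slot] = server


def lookup(path: str, table: List[int]) -> int:
    """Map an image path to a slot (crc32 % SLOTS, an assumption), then to a server."""
    return table[zlib.crc32(path.encode("utf-8")) % SLOTS]
```

The random reassignment spreads a failed server's load across all live servers instead of piling it onto a single neighbor.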
