Data storage
ML, 2016-10-30 19:36:06

How should I implement storage of user-uploaded pictures on the server?

I'm interested in the approach from the standpoint of security and performance.
At first I decided to do it the way VK does, so that the picture address looks something like:
"example.com/c637820/v637821842/115d2/imageName.jpg"
Here I'm interested in what exactly these parts mean:
1. c637820
2. v637821842
Where do these IDs come from?
I also really liked Facebook's implementation, where access generally goes through a token in the GET parameters of the picture URL:
"example.com/v/t1.0-9/283048_103813746439563_31482813_n.jpg?oh=63f4724d166e6e322316283885f15ddc&oe=58D3C4F4"
But then it turns out that the pictures have access to the program code, which is not safe.

3 answers
Dimonchik, 2016-10-30
@dimonchik2013

Look here; if it isn't in that article, the site definitely describes the general principle: MD5 hashes and distributing files across folders.
And if you want to figure out VK's scheme, then yes, the last three digits match the digits in the ID.
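The "MD5 and folder distribution" principle can be sketched as follows. This is a minimal illustration, not the scheme from the linked site: the function name `sharded_path` and the choice of two levels of two hex characters are assumptions. The hash of the file contents picks the nested folders, so files spread evenly and no single folder grows huge.

```python
import hashlib
from pathlib import PurePosixPath


def sharded_path(data: bytes, ext: str, levels: int = 2, width: int = 2) -> str:
    """Build a storage path from the MD5 digest of the file contents.

    The first `levels` chunks of `width` hex characters become nested
    directories; the full digest becomes the file name.
    """
    digest = hashlib.md5(data).hexdigest()
    dirs = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return str(PurePosixPath(*dirs, f"{digest}.{ext}"))
```

For example, `sharded_path(b"hello", "jpg")` returns `"5d/41/5d41402abc4b2a76b9719d911017c592.jpg"`. Two levels of two hex characters give 256 x 256 = 65,536 folders; identical uploads land on the same path, which deduplicates them for free. MD5 is used here only for distribution, not for security.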

T_y_l_e_r, 2016-10-30
@T_y_l_e_r

An exhaustive answer:
bablogon.net/view.php?p=151
In social networks, files are spread across different servers and are also tied to a specific ID, so their storage scheme differs from a single-server implementation.
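One common way to tie an ID to a particular server is rendezvous (highest random weight) hashing. The sketch below is an assumption for illustration: VK's real scheme is not known here, and the server names in `SERVERS` and the function `server_for` are hypothetical.

```python
import hashlib
from typing import List

# Hypothetical storage nodes; the real naming scheme (e.g. VK's "c637820") is unknown.
SERVERS = ["cs1.example.com", "cs2.example.com", "cs3.example.com"]


def server_for(owner_id: int, servers: List[str] = SERVERS) -> str:
    """Pick the storage server for an owner ID with rendezvous (HRW) hashing.

    Every server gets a pseudo-random score for this ID and the highest wins,
    so removing a server only moves the IDs that were stored on that server.
    """
    def score(server: str) -> int:
        digest = hashlib.sha256(f"{server}:{owner_id}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(servers, key=score)
```

The same ID always maps to the same server, so the server name can be derived from the ID (or embedded in the URL) without a lookup table.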

Philipp, 2016-10-31
@zoonman

It turns out that the pictures have access to the program code, which is not safe.

The pictures do not have access to the program code. This is absolute nonsense.
What actually happens?
To understand that, you need to know how a modern serving architecture works.
When you request an image, your request goes to load balancers (yes, there may be several of them), and anycast routing (the same IP address announced from several locations) delivers it to the nearest one.
The balancer then forwards it to a nearby web server with the lowest load.
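The "nearest, least loaded" choice can be illustrated with a toy policy. This is a hypothetical sketch (the `Backend` fields and the 5 ms tolerance are assumptions), not how any particular balancer works: it keeps only servers close to the nearest one and picks the one with the fewest active requests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backend:
    name: str
    latency_ms: float      # hypothetical metric: round-trip time from the balancer
    active_requests: int   # current load


def pick_backend(backends: List[Backend], max_extra_latency_ms: float = 5.0) -> Backend:
    """Toy policy: among servers close to the nearest one, take the least loaded."""
    nearest = min(b.latency_ms for b in backends)
    close = [b for b in backends if b.latency_ms <= nearest + max_extra_latency_ms]
    return min(close, key=lambda b: b.active_requests)
```

With servers at 1 ms (10 requests), 3 ms (2 requests) and 50 ms (idle), this picks the 3 ms server: the distant idle one is skipped because latency matters more than load for serving images.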
The web server receives the request and parses it into components.
For example, the full address of the image from the question looks like this:
"scontent-lga3-1.xx.fbcdn.net/v/t1.0-9/283048_103813746439563_31482813_n.jpg?oh=63f4724d166e6e322316283885f15ddc&oe=58D3C4F4"
scontent-lga3-1 - the name of the host where the image may be stored. This host is part of the content delivery network (CDN).
v/t1.0-9 - clearly a version, most likely the version of the storage scheme. Facebook keeps evolving, so there is probably a procedure for migrating data between storage versions.
283048_103813746439563_31482813 - a unique image identifier made up of three different kinds of identifiers.
oh - most likely "object hash", some kind of digital signature of the object.
oe - most likely "object expiration", a parameter indicating when the link expires. This is closer to the HTTP Expires header than to an ETag (an ETag identifies a version of a resource, not a deadline). Read as a hexadecimal Unix timestamp, 58D3C4F4 falls on 23 March 2017, a few months after the question was asked, which fits this guess.
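A web server's first step, parsing the address into these components, might look like the sketch below. The field meanings are the guesses from the list above, not documented Facebook behavior; `parse_cdn_url` is a hypothetical helper.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def parse_cdn_url(url: str) -> dict:
    """Split a Facebook-style CDN image URL into the parts guessed above."""
    # urlsplit needs a scheme to recognise the host part.
    parts = urlsplit(url if "://" in url else "https://" + url)
    *version, filename = parts.path.strip("/").split("/")
    stem = filename.rsplit(".", 1)[0]          # drop the ".jpg" extension
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,                # CDN node
        "version": "/".join(version),          # storage scheme version
        "ids": stem.split("_")[:3],            # three identifiers; "_n" suffix dropped
        "signature": query["oh"][0],           # guessed: object hash / signature
        # guessed: expiry encoded as a hexadecimal Unix timestamp
        "expires": datetime.fromtimestamp(int(query["oe"][0], 16), tz=timezone.utc),
    }
```

Called on the URL from the answer, it returns the host `scontent-lga3-1.xx.fbcdn.net`, the version `v/t1.0-9`, the three numeric IDs, the `oh` value, and `expires` as a UTC datetime.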
If you are interested in how Facebook's photo storage works, you can read about it on their engineering blog: https://code.facebook.com/posts/685565858139515/ne...
As far as I understand, Facebook does not check access to an object on the fly: if you know the exact address of an image, you can open it on another machine for some time.
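A link that "anyone who has it can open until it expires" is typically built as a signed URL: the server signs the path plus an expiry time with a secret key, and later only re-computes the signature instead of checking per-user permissions. The sketch below borrows the `oh`/`oe` parameter names for illustration only; the secret, the HMAC-SHA256 choice and the 32-character truncation are assumptions, not Facebook's actual algorithm.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical server-side secret; it never leaves the servers.
SECRET = b"replace-with-a-long-random-key"


def _sign(path: str, expires: int) -> str:
    """HMAC-SHA256 over the path and expiry, truncated to 32 hex chars."""
    msg = f"{path}:{expires:X}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:32]


def make_url(path: str, ttl_s: int, now: Optional[float] = None) -> str:
    """Build a link that works for anyone who has it, until it expires."""
    expires = int((time.time() if now is None else now) + ttl_s)
    query = urlencode({"oh": _sign(path, expires), "oe": f"{expires:X}"})
    return f"{path}?{query}"


def is_valid(url: str, now: Optional[float] = None) -> bool:
    """Check expiry and signature; no per-user permission lookup is needed."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        oh = query["oh"][0]
        expires = int(query["oe"][0], 16)
    except (KeyError, ValueError):
        return False
    if (time.time() if now is None else now) > expires:
        return False
    # Constant-time comparison so the signature cannot be guessed byte by byte.
    return hmac.compare_digest(oh.encode(), _sign(parts.path, expires).encode())
```

Changing the path or the expiry invalidates the signature, so the token cannot be extended or reused for another image, yet verification needs only the secret and no database query.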
So, once the image identifier and metadata have been extracted, the system knows which part of the file system to read the image from. After that everything is simple: send the headers, read the data from disk, and send it to the client.
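The "identifier to location to bytes" step can be sketched with an index that maps each image ID to a byte range inside a large volume file, loosely in the spirit of the photo stores Facebook has described publicly. Everything here (`Location`, `append_image`, `read_image`, the chunk size) is a hypothetical illustration.

```python
import os
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass(frozen=True)
class Location:
    volume: str   # path of a large volume file that holds many images
    offset: int   # where this image starts inside the volume
    length: int   # image size in bytes


def append_image(volume: str, data: bytes) -> Location:
    """Append an image to a volume file and return where it landed."""
    with open(volume, "ab") as f:
        f.seek(0, os.SEEK_END)
        offset = f.tell()
        f.write(data)
    return Location(volume, offset, len(data))


def read_image(index: Dict[str, Location], image_id: str,
               chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Look the ID up in the index, then stream the bytes from disk in chunks."""
    loc = index[image_id]          # a missing ID would become a 404 in a web server
    with open(loc.volume, "rb") as f:
        f.seek(loc.offset)
        remaining = loc.length
        while remaining > 0:
            block = f.read(min(chunk_size, remaining))
            if not block:
                raise IOError(f"volume {loc.volume} is truncated")
            remaining -= len(block)
            yield block
```

Keeping the index in memory means serving an image costs one seek and one sequential read, which is why raw disk I/O becomes the main bottleneck.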
So, in terms of performance, the bottleneck is the speed of reading data from disk, i.e. plain I/O. If you need speed, focus on fast storage; an SSD is a must.
Beyond that, you need to think about reliability, load balancing and access security.
You need to understand that the world doesn't end with Apache and PHP. For example, there may be a special nginx module that queries the database, checks that the object is available, and only then reads it from disk. That is, what you see in the address bar may have nothing to do with how the data is actually stored; for example, it may live in HDFS or GridFS.
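The "check the database, then read" step, and the decoupling of the public address from the real storage location, can be sketched with a small catalog table. The schema, the `available` flag and the example storage keys are hypothetical; SQLite stands in for whatever database such a module would query.

```python
import sqlite3
from typing import Optional


def open_catalog() -> sqlite3.Connection:
    """Create an in-memory catalog (hypothetical schema) of public IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images ("
        " public_id TEXT PRIMARY KEY,"
        " storage_key TEXT NOT NULL,"   # e.g. an HDFS path or a GridFS file id
        " available INTEGER NOT NULL)"
    )
    return conn


def resolve(conn: sqlite3.Connection, public_id: str) -> Optional[str]:
    """Return the internal storage key, or None if the image is unknown or hidden."""
    row = conn.execute(
        "SELECT storage_key FROM images WHERE public_id = ? AND available = 1",
        (public_id,),
    ).fetchone()
    return row[0] if row else None
```

The web layer only ever sees `public_id`; moving a file between backends means updating `storage_key`, and hiding or deleting an image means flipping `available`, without changing any URL.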
In your case, given your level of experience, it is best to use proven solutions, such as storing images in S3 and serving them through Cloudflare with caching; see, for example:
https://habrahabr.ru/post/245165/
or https://habrahabr.ru/company/io/blog/257533/