Storing static files on the server
We have a service that stores user files (images, PDFs, etc.) in a single shared folder on the server, and their number grows significantly every day.
The system is Debian 7 x64 with ext3.
The question: is it worth splitting the storage into separate directories, given that a single directory will soon contain several million files? I am not very familiar with low-level file handling in Linux, so I would be glad to hear from knowledgeable people which way to go: leave everything as it is, or prepare for the future now. And what problems are possible if it is left "as is"?
Thanks in advance for your advice.
Storing a huge number of files in one folder is not recommended; you will run into problems. Even with 20k files in a single folder, I/O wait (wa) on the server starts to climb noticeably.
What do the file names look like? Split them across at least 2-3 levels of subfolders and everything will be fine.
For example, on all my projects I use the first 3 characters of the file's hash as the directory path:
the file 71d9e9817d9d2e661abbaf3368e01529.jpg ends up in ..../7/1/d/71d9e9817d9d2e661abbaf3368e01529.jpg
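The scheme above can be sketched in a few lines. This is a minimal illustration, assuming (as in the example) that the file name already begins with a hex hash such as an MD5 of the contents; the `sharded_path` helper and the base directory are made-up names.

```python
import os

def sharded_path(base_dir: str, filename: str) -> str:
    """Place a hash-named file under three single-character subdirectories."""
    # Assumes filename starts with a hex hash (e.g. an MD5 of the contents),
    # so the first three characters spread files evenly across 16^3 folders.
    return os.path.join(base_dir, filename[0], filename[1], filename[2], filename)

# sharded_path("/srv/files", "71d9e9817d9d2e661abbaf3368e01529.jpg")
# -> "/srv/files/7/1/d/71d9e9817d9d2e661abbaf3368e01529.jpg"
```

Because hash prefixes are close to uniformly distributed, each leaf folder receives roughly the same share of files with no coordination between writers.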
You can also do a simple breakdown by date: years, months, days. Or, if there are not very many files, only years and months. In any case, it is better to plan for such a breakdown in advance.
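A date-based layout is equally simple to sketch; here is one possible form, with `dated_path` and the base directory as hypothetical names, keying on the upload date:

```python
import os
from datetime import date

def dated_path(base_dir: str, filename: str, d: date) -> str:
    """Shard uploads into year/month/day subfolders."""
    # Zero-padded months and days keep directory listings sorted.
    return os.path.join(base_dir, f"{d.year:04d}", f"{d.month:02d}", f"{d.day:02d}", filename)

# dated_path("/srv/files", "report.pdf", date(2014, 3, 7))
# -> "/srv/files/2014/03/07/report.pdf"
```

The trade-off versus hashing: folder sizes follow upload traffic rather than being uniform, but the layout is human-readable and old data is easy to archive by directory.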
I have always wanted to try the approach of storing a bunch of small files inside one big file, where the needed data is pulled out by offset + length. Facebook uses this approach, but a task calling for it has never come my way.
Later it turned out that it could be done even better: images began to be stored in large binary files (blobs), with the application given information about which file each photo is in and at what offset (in effect, an identifier) from the beginning. Facebook called this service Haystack, and it turned out to be ten times more effective than the "simple" approach and three times more effective than the "optimized" one.
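The offset + length idea can be shown with a toy append-only blob store. This is only a sketch of the general technique, not Facebook's Haystack: the `BlobStore` class is invented for illustration, and a real service would persist the index rather than keep it in memory.

```python
class BlobStore:
    """Toy blob store: many small files appended into one big file,
    each read back by its (offset, length) pair."""

    def __init__(self, path: str):
        self.path = path
        self.index = {}            # key -> (offset, length); in-memory only
        open(path, "ab").close()   # create the blob file if missing

    def put(self, key: str, data: bytes) -> None:
        with open(self.path, "ab") as f:
            offset = f.tell()      # append mode: position is the file size
            f.write(data)
        self.index[key] = (offset, len(data))

    def get(self, key: str) -> bytes:
        offset, length = self.index[key]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

The win is that one blob file costs one inode and one directory entry regardless of how many photos it holds, and a read is a single seek instead of a directory traversal plus metadata lookup.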
> not far off in 1 directory there will be several million files ... what problems are possible if left "as is"

Even deleting them will not be that easy:
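As an illustration of the deletion problem (a sketch, not the original poster's snippet): with millions of entries, a shell glob such as `rm dir/*` fails with "Argument list too long" because the expanded argument list exceeds the kernel's limit, and anything that first builds the full file list in memory crawls. Streaming the directory avoids both; the `purge_dir` helper below is a hypothetical name.

```python
import os

def purge_dir(path: str) -> int:
    """Delete every regular file in `path` without first materializing
    a multi-million-entry list (os.scandir streams directory entries)."""
    removed = 0
    with os.scandir(path) as it:
        for entry in it:
            if entry.is_file(follow_symlinks=False):
                os.unlink(entry.path)
                removed += 1
    return removed
```

Even streamed, deleting millions of entries means millions of directory updates, so it can take hours on ext3; that alone is a reason to shard early.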
Definitely split into separate folders. Otherwise it will slow down like hell, and the disk mechanics will wear out from scanning huge numbers of files (people even say disks have died because of this). From personal experience: ideally there should be no more than about 1000 files or folders per directory (more is possible, but it starts to hurt performance).
I usually distribute files by a sequential counter.
Say there are ten folders, 0-9 (mix in Latin letters if you want more), each containing 10 more of the same, and each of those another 10. To store a million files at 1000 files per folder, three levels of nesting are enough.
This assumes, of course, that the folders are created automatically; otherwise you will have to create them by hand.
The first 1000 files (the 1st through the 1000th) go in folder 0/0/0.
The second 1000 files (the 1001st through the 2000th) go in folder 0/0/1.
And so on. It is better to build in a margin, say 6-8 levels (for example, 0/0/0/0/0/0/0/0), and to put files only at the deepest level: if files sit alongside folders named 0-9 at intermediate levels, you will never untangle which files belong where.
Alternatively, you can create folders named 000-999 without nesting them, whichever you prefer. Either way, keep an eye on the total path length to the deepest file so it stays within limits (preferably under 256 characters, though the exact limit depends on the file system).
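The counter scheme above maps cleanly to a small function. A minimal sketch, assuming sequential integer ids starting at 0 and the invented name `counter_path`; with the default 3 levels it addresses exactly one million files (buckets 0-999), so add levels for margin as the answer suggests:

```python
import os

def counter_path(base_dir: str, file_id: int, per_dir: int = 1000, levels: int = 3) -> str:
    """Map a sequential file id to nested 0-9 folders, per_dir files per leaf."""
    bucket = file_id // per_dir   # which leaf folder this id falls into
    parts = []
    for _ in range(levels):       # peel off one decimal digit per nesting level
        parts.append(str(bucket % 10))
        bucket //= 10
    return os.path.join(base_dir, *reversed(parts), str(file_id))

# counter_path("/srv/files", 0)    -> "/srv/files/0/0/0/0"
# counter_path("/srv/files", 1500) -> "/srv/files/0/0/1/1500"
```

Unlike hashing, this needs a central counter (e.g. a database auto-increment), but folder fill is perfectly even and the mapping from id to path is trivially reversible.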