Debian
Eugene, 2016-01-10 00:02:00

What is the best way to store a huge number of small files?

The site generates PNG images of formulas from LaTeX expressions. The images are small, but there are a lot of them, so they are cached to avoid regenerating them on every request. While there are not many images yet, there are no visible problems, but I am already wondering what comes next. What is the best way to store a huge number of small files? Should they all go in one folder, or in a tree of nested folders of the form /x1/x2/x3/.../xn/filename, where each of x1,...,xn runs over, for example, the hexadecimal digits 0 to F? If a folder tree is better, what is the optimal nesting depth?
The system runs Debian; the files are read and served by a PHP script.

2 answers
neatsoft, 2016-01-10
@eugene8086

The classic approach is to compute a hash (md5, sha1, or sha256) of each file when it is added, store a record of hash, file name, and (optionally) size in the database, and write the file to disk under a path built from the hash: the first nesting level is the first two characters of the hash in hexadecimal representation, the second level is the third and fourth characters, and the file name is the hash itself. The number of nesting levels can be increased if there are a lot of files.
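As a rough illustration of this layout, here is a minimal Python sketch (the site itself uses PHP, so this only shows the idea). The function names `sharded_path` and `store`, the choice of sha256, and the `levels` parameter are assumptions made for the example, not part of the answer.

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, data: bytes, levels: int = 2) -> Path:
    """Build root/ab/cd/<full hash> from the hex digest of the file contents."""
    digest = hashlib.sha256(data).hexdigest()
    # Each nesting level takes the next two hex characters of the digest.
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return root.joinpath(*parts, digest)


def store(root: Path, data: bytes, levels: int = 2) -> Path:
    """Write data under its sharded path, creating directories as needed."""
    path = sharded_path(root, data, levels)
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():  # identical content maps to the same path
        path.write_bytes(data)
    return path
```

With two hex characters per level, each level has at most 256 subdirectories, so two levels give up to 65,536 leaf directories.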
As an alternative, consider storing these objects in a database. This is usually not the best idea, but in your case it may be preferable. If each page contains many small objects, you can reduce the number of requests to the server by packing them all into one AJAX response on the server side (getting the content from the database) and unpacking them with JavaScript on the client. In some cases this may improve performance, but that needs to be checked on real data.
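One way the packing step might look, sketched in Python for brevity: several small images are base64-encoded into a single JSON document. The JSON-plus-base64 format and the names `pack_images` and `unpack_images` are assumptions for illustration; the answer does not prescribe a format, and on a real site the unpacking would happen in JavaScript on the client.

```python
import base64
import json


def pack_images(images: dict[str, bytes]) -> str:
    """Pack several small binary objects into one JSON string."""
    return json.dumps(
        {name: base64.b64encode(data).decode("ascii") for name, data in images.items()}
    )


def unpack_images(payload: str) -> dict[str, bytes]:
    """Reverse of pack_images; shown here only to demonstrate the round trip."""
    return {name: base64.b64decode(b64) for name, b64 in json.loads(payload).items()}
```

Base64 inflates the payload by roughly a third, which is one reason the gain has to be measured on real data.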

nirvimel, 2016-01-10
@nirvimel

Whether to store them in the same folder

Absolutely not!
The goal is to distribute the files so that no directory holds more than about a thousand files (several thousand at most). For up to several million files, two levels of hierarchy are enough.
Why? See the answer to "How best to organize an electronic library?"
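The arithmetic behind the two-level figure can be sketched as follows. The helper `levels_needed` is hypothetical, and the default fan-out of 256 subdirectories per level (two hex characters) is an assumption borrowed from the hash-based layout; with one hex character per level the fan-out would be 16.

```python
def levels_needed(total_files: int, max_per_dir: int = 1000, fanout: int = 256) -> int:
    """Smallest nesting depth that keeps every leaf directory within max_per_dir files,
    assuming files spread evenly (as they roughly do with hash-based names)."""
    levels = 0
    leaf_dirs = 1
    while leaf_dirs * max_per_dir < total_files:
        leaf_dirs *= fanout  # one more level multiplies the number of leaf directories
        levels += 1
    return levels
```

Under these assumptions, 5 million files need two levels (65,536 leaf directories with about 76 files each), while a fan-out of 16 would need four.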
