How to store millions of small files as compactly and efficiently as possible?
There are about 20 million small files (10-100 KB on average, with occasional "peaks" of around 500 KB, but those are rare). They are never overwritten: a file is created once and eventually deleted, so for practical purposes treat everything as read-only. Reliability and scalability are not needed. The initial plan was simply to split the structure into subdirectories and store everything on a server with an SSD, serving a file when requested (the load is also small). But the overhead turned out to be too large: the files get scattered across file-system blocks. Ideally, everything would be stored in one file with some kind of light compression, with the indices (offsets) of the individual "files" inside this single blob kept in memory and used to look them up when needed. The technical implementation is clear and not very complicated, but I would rather not reinvent the wheel and instead use some ready-made, simple solution.
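For reference, the "single blob plus in-memory index" idea described above fits in a few lines; this is only a minimal sketch (the file names, paths and compression level are made up), not a substitute for the ready-made tools suggested in the answers below.

```python
import json
import zlib

# Pack: append each (lightly compressed) file into one blob and record
# its offset and length in an index stored alongside the blob.
def pack(files, blob_path="data.blob", index_path="index.json"):
    index = {}
    with open(blob_path, "wb") as blob:
        for name, payload in files.items():
            compressed = zlib.compress(payload, 1)  # level 1 = light compression
            offset = blob.tell()
            blob.write(compressed)
            index[name] = (offset, len(compressed))
    with open(index_path, "w") as f:
        json.dump(index, f)

# Read: look up the offset in the in-memory index, seek, and inflate.
def read(name, index, blob_path="data.blob"):
    offset, length = index[name]
    with open(blob_path, "rb") as blob:
        blob.seek(offset)
        return zlib.decompress(blob.read(length))
```

In a long-running server the index would simply be loaded once and kept in memory, as the question suggests.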
As far as I know, the lowest per-file overhead is here:
code.google.com/p/weed-fs
No POSIX support (HTTP only).
The project description also lists several alternatives, with their pros and cons.
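For illustration, a rough sketch of the weed-fs HTTP flow (ask the master to assign a file id, upload to the assigned volume server, read the file back), assuming default ports on localhost and the third-party requests library; the exact endpoints and JSON fields should be checked against the project's documentation for your version.

```python
import requests  # third-party HTTP client

MASTER = "http://localhost:9333"  # default master address (assumption)

# 1. Ask the master to assign a file id and a volume server to store it on.
assign = requests.get(f"{MASTER}/dir/assign").json()
fid, volume = assign["fid"], assign["url"]

# 2. Upload the file to the assigned volume server under that fid.
with open("photo.jpg", "rb") as f:
    requests.post(f"http://{volume}/{fid}", files={"file": f})

# 3. Later, read it back directly from the volume server by fid.
data = requests.get(f"http://{volume}/{fid}").content
```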
Try squashfs. The downside is that you will have to re-create the image periodically to update the data, because this FS is read-only. The plus is compression.
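As a rough how-to (the paths are placeholders, and the available compression codecs depend on how squashfs-tools was built), building and mounting such an image could look like this:

```python
import subprocess

# Build a compressed, read-only squashfs image from an existing directory tree.
subprocess.run(
    ["mksquashfs", "files/", "files.sqsh", "-comp", "gzip"],
    check=True,
)

# Mount the image (requires root); reads then go through the kernel's
# squashfs driver, which decompresses blocks on the fly.
subprocess.run(
    ["mount", "-t", "squashfs", "-o", "loop", "files.sqsh", "/mnt/files"],
    check=True,
)
```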
Try storing the data in a database. For example, MySQL has the ARCHIVE table type: lookups go through a single key, the data is compressed, and the table is effectively read-only in the sense that rows can be added but not changed.
You can organize multiple volumes/tables and update them as needed by moving data from one table/volume to another.
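A minimal sketch of that idea, assuming the third-party pymysql driver and placeholder table/column names; the ARCHIVE engine restricts indexing to the AUTO_INCREMENT column, so lookups go through that key.

```python
import pymysql  # third-party MySQL driver (an assumption; any driver would do)

conn = pymysql.connect(host="localhost", user="app", password="...",
                       database="blobs")
with conn.cursor() as cur:
    # ARCHIVE table: rows are compressed on disk and can be inserted and
    # read, but not updated or deleted. Names here are placeholders.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS small_files (
            id BIGINT NOT NULL AUTO_INCREMENT,
            body MEDIUMBLOB,
            KEY (id)
        ) ENGINE=ARCHIVE
    """)

    # Store one file and read it back by its key.
    with open("photo.jpg", "rb") as f:
        cur.execute("INSERT INTO small_files (body) VALUES (%s)", (f.read(),))
    conn.commit()

    cur.execute("SELECT body FROM small_files WHERE id = %s", (1,))
    payload = cur.fetchone()[0]
```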