How to store millions of small files as compactly and efficiently as possible?
There are about 20 million small files (10-100 KB on average, with occasional "peaks" of around 500 KB, but those are rare). They are never overwritten: a file is created once and eventually deleted, so for practical purposes treat everything as read-only. Reliability and scalability are not needed. The initial plan was simply to split the structure into subdirectories and store everything on a server with an SSD, serving a file when requested (the load is also small). But the overhead turned out to be too large: the files get scattered across file-system blocks. Ideally, everything would be stored in one file with some kind of light compression, with the indices (offsets) of the individual "files" inside this single blob kept in memory and used to look them up when needed. The technical implementation is clear and not very complicated, but I would rather not reinvent the wheel and instead use some ready-made, simple solution.
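For reference, the "single blob plus in-memory index" idea described above fits in a few lines; this is only a minimal sketch (the file names, paths and compression level are made up), not a substitute for the ready-made tools suggested in the answers below.

```python
import json
import zlib

# Pack: append each (lightly compressed) file into one blob and record
# its offset and length in an index stored alongside the blob.
def pack(files, blob_path="data.blob", index_path="index.json"):
    index = {}
    with open(blob_path, "wb") as blob:
        for name, payload in files.items():
            compressed = zlib.compress(payload, 1)  # level 1 = light compression
            offset = blob.tell()
            blob.write(compressed)
            index[name] = (offset, len(compressed))
    with open(index_path, "w") as f:
        json.dump(index, f)

# Read: look up the offset in the in-memory index, seek, and inflate.
def read(name, index, blob_path="data.blob"):
    offset, length = index[name]
    with open(blob_path, "rb") as blob:
        blob.seek(offset)
        return zlib.decompress(blob.read(length))
```

In a long-running server the index would simply be loaded once and kept in memory, as the question suggests.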
As far as I know, the lowest per-file overhead is here:
code.google.com/p/weed-fs
No POSIX support (HTTP only).
The project description also lists several alternatives, with their pros and cons.
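For illustration, a rough sketch of the weed-fs HTTP flow (ask the master to assign a file id, upload to the assigned volume server, read the file back), assuming default ports on localhost and the third-party requests library; the exact endpoints and JSON fields should be checked against the project's documentation for your version.

```python
import requests  # third-party HTTP client

MASTER = "http://localhost:9333"  # default master address (assumption)

# 1. Ask the master to assign a file id and a volume server to store it on.
assign = requests.get(f"{MASTER}/dir/assign").json()
fid, volume = assign["fid"], assign["url"]

# 2. Upload the file to the assigned volume server under that fid.
with open("photo.jpg", "rb") as f:
    requests.post(f"http://{volume}/{fid}", files={"file": f})

# 3. Later, read it back directly from the volume server by fid.
data = requests.get(f"http://{volume}/{fid}").content
```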
Try squashfs. The downside is that you will have to re-create the image periodically to update the data, because this FS is read-only. The plus is compression.
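As a rough how-to (the paths are placeholders, and the available compression codecs depend on how squashfs-tools was built), building and mounting such an image could look like this:

```python
import subprocess

# Build a compressed, read-only squashfs image from an existing directory tree.
subprocess.run(
    ["mksquashfs", "files/", "files.sqsh", "-comp", "gzip"],
    check=True,
)

# Mount the image (requires root); reads then go through the kernel's
# squashfs driver, which decompresses blocks on the fly.
subprocess.run(
    ["mount", "-t", "squashfs", "-o", "loop", "files.sqsh", "/mnt/files"],
    check=True,
)
```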
Try storing the data in a database. For example, MySQL has the ARCHIVE table type: lookups go through a single key, the data is compressed, and the table is effectively read-only in the sense that rows can be added but not changed.
You can organize multiple volumes/tables and update them as needed by moving data from one table/volume to another.
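A minimal sketch of that idea, assuming the third-party pymysql driver and placeholder table/column names; the ARCHIVE engine restricts indexing to the AUTO_INCREMENT column, so lookups go through that key.

```python
import pymysql  # third-party MySQL driver (an assumption; any driver would do)

conn = pymysql.connect(host="localhost", user="app", password="...",
                       database="blobs")
with conn.cursor() as cur:
    # ARCHIVE table: rows are compressed on disk and can be inserted and
    # read, but not updated or deleted. Names here are placeholders.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS small_files (
            id BIGINT NOT NULL AUTO_INCREMENT,
            body MEDIUMBLOB,
            KEY (id)
        ) ENGINE=ARCHIVE
    """)

    # Store one file and read it back by its key.
    with open("photo.jpg", "rb") as f:
        cur.execute("INSERT INTO small_files (body) VALUES (%s)", (f.read(),))
    conn.commit()

    cur.execute("SELECT body FROM small_files WHERE id = %s", (1,))
    payload = cur.fetchone()[0]
```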