linux
VMConsult, 2015-05-25 13:29:07

Storing a large number of files: database or disk?

Hello everyone!
I have about 2 million JPG files that need to be stored so that access is fast and convenient. Peak load is about 2000 random requests per second.
1. Store them on disk, spread across directories named after an md5 hash (Linux, ext4)
2. Store them in a database as key => value (InnoDB, MySQL)
Which will be faster and more reliable? Or is there no fundamental difference? I'm inclined to put everything into a database.

5 answer(s)
BloodySucker, 2015-05-25
@BloodySucker

A database is much slower.
In general I'd recommend JFS rather than ext4.
JFS gives much higher performance, especially on small files.

Sergey, 2015-05-25
@begemot_sun

If your files fit in memory, it's better to solve this at the server level: for example, write a specialized server that keeps the pictures in memory and serves them on request. If they don't fit in memory, it will be slow no matter which file system you use, and in that case you can put in an SSD.
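
A minimal sketch of the "specialized server" idea, assuming the whole set of pictures fits in RAM. The function names (load_images, make_handler, make_server), the address and the port are made up for illustration, and Python's built-in http.server is used only to keep the example short; it is not meant to handle 2000 requests per second in production.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def load_images(root):
    """Read every .jpg under root into a dict keyed by URL path."""
    cache = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".jpg"):
                full = os.path.join(dirpath, name)
                # Key is the path relative to root, with forward slashes.
                key = "/" + os.path.relpath(full, root).replace(os.sep, "/")
                with open(full, "rb") as fh:
                    cache[key] = fh.read()
    return cache


def make_handler(cache):
    """Build a request handler class that answers GETs from the dict."""

    class ImageHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = cache.get(self.path)
            if body is None:
                self.send_error(404)
                return
            self.send_response(200)
            self.send_header("Content-Type", "image/jpeg")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, fmt, *args):
            pass  # keep the sketch quiet

    return ImageHandler


def make_server(root, host="127.0.0.1", port=8080):
    """Create (but do not start) a server for the images under root."""
    return ThreadingHTTPServer((host, port), make_handler(load_images(root)))
```

With a hypothetical image directory, make_server("/srv/images").serve_forever() would start it. After startup every request is answered from the dictionary, so the disk is not touched at all.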

Melkij, 2015-05-25
@melkij

If the files don't need to be searched and are only read by a known path, a database is just an extra layer of abstraction. And unless you are running Oracle, it will also pointlessly duplicate the operating system's cache.
Simply spreading the files across directories is easier and quite sufficient.
If you do need to search by metadata, the usual approach is to write the metadata to the database and still keep the files themselves on disk, not in the database.
One non-obvious option: a couple of years ago I read that one of the big projects packs small files into large (several GB) binary files and stores each file's offset from the start and its length separately. That worked much faster than a regular file system. Unfortunately, I don't remember any more details.
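
A rough sketch of that packing idea, not the project's actual format (which I don't know): small files are appended to one blob, and an index maps each name to its (offset, length). The names pack and read_packed are invented for the example, and the index is kept in a plain dict for simplicity; a real setup would have to persist it somewhere.

```python
import os


def pack(paths, blob_path):
    """Append each file to one blob; return {name: (offset, length)}."""
    index = {}
    with open(blob_path, "wb") as blob:
        for path in paths:
            with open(path, "rb") as src:
                data = src.read()
            # Record where this file starts in the blob and how long it is.
            index[os.path.basename(path)] = (blob.tell(), len(data))
            blob.write(data)
    return index


def read_packed(blob, index, name):
    """Fetch one file from an already open blob: one seek, one read."""
    offset, length = index[name]
    blob.seek(offset)
    return blob.read(length)
```

The likely reason this is fast is that a lookup needs no directory traversal and no per-file open: the blob stays open and each request is a single seek plus a single read.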

Puma Thailand, 2015-05-26
@opium

Only assholes stuff pictures into a database. Spread them across a three-level directory hierarchy and it will run briskly even on ext4.
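
One possible way to lay out such a hierarchy, as a sketch: take the md5 of the file name and use its first hex characters as directory names. The function name sharded_path and the defaults are my assumptions. With one hex character per level, three levels give 16^3 = 4096 leaf directories, so 2 million files come to roughly 490 files per directory.

```python
import hashlib
import os


def sharded_path(root, filename, levels=3, width=1):
    """Return root/<d1>/<d2>/<d3>/filename, with dirs taken from the md5."""
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    # Slice the first levels*width hex characters into directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


def store(root, filename, data):
    """Write data under its sharded path, creating directories as needed."""
    path = sharded_path(root, filename)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

Because the path is computed from the name alone, reading a file needs no lookup table: call sharded_path again with the same name and open the result.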

Konstantin Samilko, 2015-05-28
@kuroneco

That's not a lot.
1. In the database you can store the paths to the files, if you want to be able to sort by some parameters or plan to serve them through some kind of API.
2. Store the files on disk, but put something in front that caches the most frequently requested pictures, such as nginx or squid. As for the file system, make sure it has enough inodes, and sort the files into folders rather than dumping them all into one.
If you expect the number of images to grow, look at a clustered file system so that there are no bottlenecks, but at your size one disk is enough.
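
To check the inode point before copying the files over, something like the sketch below could be used. It relies on os.statvfs, so it works on Unix only; the helper names inode_usage and enough_inodes are invented for the example, and the count should include the directories you plan to create, since each one takes an inode too.

```python
import os


def inode_usage(path):
    """Return (total, free) inode counts for the file system at path."""
    st = os.statvfs(path)
    return st.f_files, st.f_ffree


def enough_inodes(path, needed):
    """True if the file system can hold `needed` more files and dirs."""
    total, free = inode_usage(path)
    if total == 0:
        # Some file systems allocate inodes dynamically and report 0 here.
        return True
    return free >= needed
```

For this question the call would be along the lines of enough_inodes("/srv/images", 2_000_000 + 4096), where the mount point and the directory count are placeholders.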