linux
Lsh, 2021-12-29 12:17:38

What is the best way to store 13 TiB of data (mdadm / lvm / zfs / btrfs)?

Good afternoon, Khabrochan!

Please advise how best to organize reliable storage for a large amount of data spread across a lot of files. In total there is about 13 TiB, maybe a little more.

The data can be divided into two groups:
* About 8 TiB in approximately 450,000 files, mostly 10 MiB to 25 MiB each. About 90% of operations will be reads, 8% will add new files, and 2% will delete or overwrite files. The data is already compressed, so further compression is unlikely to help.
* About 5 TiB in roughly 5,300,000 files of very different sizes: several hundred thousand very small ones of about 5 KiB each, and some large ones of several gibibytes each. This group will be actively updated, overwritten, and deleted. In theory this data is compressible, but I'm not sure compression is worth it.
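As a quick sanity check on the figures above, here is a minimal sketch that computes the mean file size of each group. The function name is made up for illustration, and the inputs are the approximate totals from the list, so the results are only rough averages.

```python
# Rough mean file size per group, using the approximate totals quoted above.
MIB = 1024 ** 2
TIB = 1024 ** 4


def average_file_size_mib(total_tib: float, file_count: int) -> float:
    """Return the mean file size in MiB for a group of files."""
    return total_tib * TIB / file_count / MIB


group_a = average_file_size_mib(8, 450_000)      # large, mostly read-only files
group_b = average_file_size_mib(5, 5_300_000)    # mixed, actively changing files

print(f"group 1: {group_a:.1f} MiB per file")  # 18.6
print(f"group 2: {group_b:.2f} MiB per file")  # 0.99
```

The first group averages about 18.6 MiB per file, which fits the stated 10 to 25 MiB range. The second averages just under 1 MiB, although the mean hides the spread between the 5 KiB files and the multi-gibibyte ones.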

I have five TOSHIBA HDWG160 disks, 6 TB (5.4 TiB) each, on hand for this.

I planned to assemble something like a software RAID 6 from them, so the usable capacity will equal that of only three disks: 18 TB.
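The capacity arithmetic can be sketched as follows. This is only a rough estimate that assumes two disks' worth of parity (as in RAID 6 or a comparable double-parity layout) and ignores file system and metadata overhead. The function name is hypothetical.

```python
# Usable capacity of a double-parity array, before file system overhead.
TB = 10 ** 12   # decimal terabyte, as printed on the disk label
TIB = 2 ** 40   # binary tebibyte, as used for the data sizes


def usable_bytes(disk_count: int, disk_bytes: int, parity_disks: int = 2) -> int:
    """Raw usable bytes when `parity_disks` disks' worth of space holds parity."""
    if disk_count <= parity_disks:
        raise ValueError("need more disks than parity disks")
    return (disk_count - parity_disks) * disk_bytes


usable = usable_bytes(disk_count=5, disk_bytes=6 * TB)
print(usable / TB)               # 18.0
print(round(usable / TIB, 2))    # 16.37
```

So 18 TB is about 16.4 TiB, which leaves roughly 3.4 TiB of headroom over the 13 TiB of data before any file system overhead is counted.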

What is the best tool for this: build the array with mdadm / LVM and put ext4 on top, or use the fashionable ZFS / Btrfs? Which is more reliable? Which can be expanded? At some point the capacity may no longer be enough. Will I be able to add another identical disk to the array without losing reliability? Which option will be easier and faster to recover if one disk fails? And if two fail?

At the moment I need no special features, and probably no snapshots either. I might want to "version" the state of some part of the data, but a script that creates directories with the date in the name and hard-links files between them would be enough for that. Still, if the file system can do it natively, so much the better.
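The hard-link approach described above could look roughly like this. It is a minimal sketch, not a finished tool: the function name and the directory layout are assumptions, the snapshot directory must be on the same file system as the source and outside the source tree, and a version stays intact only if files are later replaced (written anew and renamed) rather than modified in place.

```python
import os
from datetime import date
from pathlib import Path
from typing import Optional


def hardlink_snapshot(source: Path, snapshots_root: Path,
                      label: Optional[str] = None) -> Path:
    """Mirror `source` into snapshots_root/<label>, hard-linking every file.

    `label` defaults to today's date in ISO format, e.g. 2021-12-29.
    """
    label = label or date.today().isoformat()
    target = snapshots_root / label
    for dirpath, _dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        # Recreate the directory structure; directories cannot be hard-linked.
        (target / rel).mkdir(parents=True, exist_ok=True)
        for name in filenames:
            # A hard link shares the data blocks, so it costs no extra space.
            os.link(Path(dirpath) / name, target / rel / name)
    return target


# Example call with hypothetical paths:
# hardlink_snapshot(Path("/data/projects"), Path("/data/.versions"))
```

On ZFS or Btrfs the same effect comes from native snapshots, which also protect against in-place modification.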

PS: The machine has 32 GiB of memory (Error Correction Type: Multi-bit ECC), which should suit ZFS quite well.
PPS: Can you recommend a tutorial?

5 answer(s)
Saboteur, 2021-12-29
@saboteur_kiev

It seems to me that ZFS would suit this better than an mdadm + LVM combination,
since ZFS provides all of that functionality by itself, and adding or replacing a disk will be easier if it comes to that.

AlexVWill, 2021-12-29
@AlexVWill

If the data does not need to be instantly accessible and will be kept as an archive, and there is a budget for it, I would consider a tape drive (a streamer).

Drno, 2021-12-29
@Drno

If you need snapshots, expansion, and changes on the fly - ZFS
If you just need to store the data - bare ext4
If you only need to resize (expand) the file system - LVM
Recovery is easier with bare ext4 (my personal opinion)

res2001, 2021-12-29
@res2001

Use whatever you know better or have experience with: LVM or ZFS/Btrfs.
You can also use something like TrueNAS and not bother with the choice.
It may be worth splitting the files into groups instead of keeping them in one common dump, and placing each group on its own RAID volume built on disks that the other groups do not share ...
It is also worth thinking about backing all of this up.

Oleg Volkov, 2021-01-01
@voleg4u

Forget about Btrfs if you want to keep your data rather than lose it. Mdadm is decent and reliable, but there are nuances with rebuilding. You can do it with LVM, described in detail HERE . But I actually recommend storing the data in ZFS. The file system itself protects against the physical failure of a disk. I have snapshots configured as protection against accidental deletion, and an incremental replica goes to a ZFS server in a different location in case something happens at this site. Some datasets are encrypted in case of theft. Described HERE , although I had not yet enabled encryption when I wrote that, so it is not covered there.
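To illustrate the snapshot plus incremental replica setup, here is a minimal sketch that only builds the command lines and does not run them. The dataset names, snapshot names, and host are hypothetical, and this is not necessarily how the setup described above is implemented. It relies on the standard `zfs snapshot`, `zfs send -i`, and `zfs receive` subcommands.

```python
import shlex
from typing import List, Tuple


def replication_commands(dataset: str, prev_snap: str, new_snap: str,
                         remote_host: str, remote_dataset: str
                         ) -> Tuple[List[str], List[str], List[str]]:
    """Build the commands for one snapshot and one incremental replication step."""
    # Take a new snapshot of the local dataset.
    snapshot = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # Send only the changes made between the previous and the new snapshot.
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # Apply the stream on the remote ZFS server over SSH.
    receive = ["ssh", remote_host, "zfs", "receive", remote_dataset]
    return snapshot, send, receive


# Hypothetical names, for illustration only.
snap, send, recv = replication_commands(
    "tank/data", "2021-12-28", "2021-12-29", "backup.example.org", "backup/data")
print(shlex.join(snap))
print(shlex.join(send) + " | " + shlex.join(recv))
```

The incremental send requires that the previous snapshot still exists on both sides, so older snapshots should be pruned only after a newer one has been replicated.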
