linux
Dmitry Logvinenko, 2022-04-14 23:42:02

How to combine heterogeneous disks with data replication?

Hello hive mind!

Please enlighten a clueless soul who has apparently been banned from Google.

Suppose I have a number of commodity disks of varying age, wear, and reliability that I've collected over the years (a 500 GB Travelstar here, a 1 TB WD Green there). Naturally, I can't bring myself to throw them away, so I want to put them all to work in my home file storage.

Hooking everything up and arranging it into one array with LVM is not difficult, but it doesn't seem very reliable (backups, of course, but I want everything to be "fast" too!).

Is there a way to make them cooperate with something like K-safety? That is, I specify a replication factor, say three, and every block is guaranteed to be present on at least three physical disks; if one fails, the array promptly and automatically rebalances.

(No, Vertica and HDFS didn't give me that idea.)

And the final nail: what if you replicate not blocks but whole files? Then even if almost everything dies completely, the surviving disks would still hold complete files rather than scattered fragments.


2 answers
rPman, 2022-04-15
@rPman

The most convenient tool for solving this problem is btrfs: it has native RAID support, no extra initialization overhead, and it lets you perform operations on the file system on the fly. Its notorious raid5/6 problem is only partially solved, but in practice it more often means losing free space than losing data. I ran a five-terabyte btrfs raid5 storage for quite a long time, and even when the process of removing a disk from the array was interrupted (I ran out of space), I calmly copied the data off and recreated the array, this time on top of mdadm. A mirror (raid1) on btrfs, by contrast, is relatively reliable.
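For illustration, a minimal sketch of the btrfs workflow described above; the device names /dev/sdb through /dev/sdf and the mount point /mnt/storage are placeholders, not anything from the answer:

    # Create a btrfs volume with raid1 for both data and metadata
    # across three disks of different sizes:
    mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc /dev/sdd
    mkdir -p /mnt/storage
    mount /dev/sdb /mnt/storage

    # Later, add another disk and redistribute data on the fly:
    btrfs device add /dev/sde /mnt/storage
    btrfs balance start /mnt/storage

    # Replace a failing disk without taking the volume offline:
    btrfs replace start /dev/sdc /dev/sdf /mnt/storage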
On the question itself: in a similar situation I first drew up a plan by hand for placing data on the disks (I had about six disks of very different sizes, from 350 GB to 1 TB), then partitioned the larger disks so that some partitions exactly matched the size of the smaller disks and, combining a whole disk here and a partition there, assembled everything into several separate file systems. It is very important to keep a disk map (a document with pictures is convenient) showing which file system sits on which disk, and to physically label the drives so that replacing a failed one is easier.
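As a hypothetical example of that partitioning scheme, pairing a 1 TB /dev/sda with a 350 GB /dev/sdb (both names invented for the sketch):

    # Carve the large disk into a 350 GiB slice that matches the small
    # disk, plus a second partition with the remainder:
    parted --script /dev/sda mklabel gpt \
        mkpart primary 1MiB 350GiB \
        mkpart primary 350GiB 100%

    # Mirror the matching-size pieces (a partition here, a whole disk
    # there), and use the leftover partition as a separate volume:
    mkfs.btrfs -d raid1 -m raid1 /dev/sda1 /dev/sdb
    mkfs.btrfs /dev/sda2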
These days I no longer have such a zoo: I build the array from 3 TB disks, but I still don't add them whole. Instead I split each disk's capacity into three 1 TB parts and make several file systems out of them, so that I can later add, say, a 1 TB disk or, conversely, a 4 TB one, without rebuilding the entire array. I have also abandoned btrfs raid5 in favor of mdadm, though that is more for peace of mind.
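Assuming three 3 TB disks /dev/sdb, /dev/sdc, /dev/sdd, each already split into three ~1 TB partitions (names again invented), one of those independent arrays might look like this:

    # One mdadm raid5 array over the first 1 TB partition of each disk;
    # the remaining partitions form further independent arrays the same way:
    mdadm --create /dev/md0 --level=5 --raid-devices=3 \
        /dev/sdb1 /dev/sdc1 /dev/sdd1
    mkfs.ext4 /dev/md0   # or whatever file system you prefer on top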

Armenian Radio, 2022-04-15
@gbg

CEPH on one node (giggles)


what if you replicate not blocks but whole files? Then even if almost everything dies completely, the surviving disks would still hold complete files rather than scattered fragments.

I can congratulate you on reinventing good old backups.
Irony aside: what will help you here is combining your junk into linear arrays of approximately equal size, followed by a regular rsync between them.
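In sketch form, with invented device names and paths: two linear (concatenated) arrays of roughly equal total size, one mirrored onto the other by a scheduled file-level copy:

    # Concatenate the mixed disks into two linear arrays:
    mdadm --create /dev/md0 --level=linear --raid-devices=2 /dev/sdb /dev/sdc
    mdadm --create /dev/md1 --level=linear --raid-devices=2 /dev/sdd /dev/sde

    # After creating file systems and mounting them as /srv/primary and
    # /srv/replica, replicate whole files on a schedule (e.g. from cron):
    rsync -a --delete /srv/primary/ /srv/replica/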
