Grigory Bondarenko, 2021-01-14 11:16:20

Evidence of the dangers of software RAID?

The task is to install Microsoft Hyper-V Core 2012 R2 on an old server. There are no drivers for its RAID controller that are compatible with this OS. The idea came up to set up software RAID using Windows tools (I hadn't done this before). While researching, I came across an article in which the author claims:

There may be a situation in which both disks become inaccessible. For example, the first disk starts to actively degrade (bad sectors appear). The system freezes because it cannot complete a read/write operation. After the server is rebooted, disk resynchronization starts automatically, i.e., the data from the first disk is copied over the second. If the resynchronization is interrupted, the second disk becomes a useless lump of metal, because it does not contain a complete system image. And the resynchronization will definitely be interrupted because of the bad sectors on the first disk. As a result, we end up with two faulty disks on our hands...

This scared me, so I started looking for information confirming or refuting this claim, and couldn't find any. Has anyone come across articles on this topic? I'm also interested in the risks of using "fake RAID" via Intel RST.
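To put the quoted failure mode into perspective, here is a minimal back-of-the-envelope sketch (Python; the disk sizes and the unrecoverable-read-error rate are assumptions taken from typical drive spec sheets, not measurements) estimating how likely it is to hit at least one unreadable sector while re-reading an entire surviving disk during resynchronization:

# Back-of-the-envelope estimate, not a measurement: probability of hitting
# at least one unrecoverable read error (URE) while reading the whole
# surviving disk of a mirror during resynchronization.

def rebuild_failure_probability(disk_size_tb: float, ure_per_bit: float) -> float:
    """P(at least one URE) = 1 - (1 - p)^bits_read, assuming independent bit errors."""
    bits_read = disk_size_tb * 1e12 * 8   # the entire surviving disk is read once
    return 1 - (1 - ure_per_bit) ** bits_read

# Assumed figure: vendor-spec URE rate of 1e-14 per bit (a common consumer-drive
# spec; enterprise drives are usually rated 1e-15).
for size_tb in (1, 4, 8):
    p = rebuild_failure_probability(size_tb, 1e-14)
    print(f"{size_tb} TB mirror resync: ~{p:.0%} chance of at least one URE")

The independent-error model is crude and real drives often do better than their spec sheet, but it shows why an interrupted resync on an aging disk is not an exotic scenario.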


6 answers
rPman, 2021-01-14
@rPman

In terms of reliability, a software RAID is not much different from a hardware one, but a software RAID lets you build very flexible configurations, for example a RAID on top of iSCSI disks located on different physical machines (which is itself a reliability gain, since the failure of one machine, for example due to a fire, simply won't touch the other disks because they are somewhere else).
With regard to reliability, software RAID has essentially only one weak point: the lack of a battery-backed (non-volatile) write cache (which hardly anyone enables on Windows anyway), but not every hardware controller has one either. A hardware controller can also bring its own read cache and its own access-optimization algorithms, which speeds things up, so ultimately it's a question of performance, not reliability.
On the other hand, vendor lock-in with hardware RAID creates an enormous headache and unnecessary (often very large) expenses.
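For illustration only, a minimal sketch of that kind of cross-machine mirror on Linux with mdadm over the open-iscsi initiator (the portal addresses, target names and device paths are invented, and target discovery is omitted; on Windows the analogous pieces would be the built-in iSCSI initiator plus a mirrored volume):

# A minimal sketch (Linux, hypothetical portals, IQNs and device names) of the
# configuration described above: a software mirror whose two legs are iSCSI LUNs
# exported by two different physical machines.
import subprocess

PORTALS = ["192.0.2.10:3260", "192.0.2.20:3260"]                        # assumed portals
TARGETS = ["iqn.2021-01.example:disk-a", "iqn.2021-01.example:disk-b"]  # hypothetical IQNs

def run(cmd: list[str]) -> None:
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Log in to both remote block devices with the open-iscsi initiator.
for portal, target in zip(PORTALS, TARGETS):
    run(["iscsiadm", "-m", "node", "-T", target, "-p", portal, "--login"])

# Assemble a mirror on top of the two remote disks; /dev/sdb and /dev/sdc are
# whatever names the iSCSI LUNs received on this host (check with lsblk first).
run(["mdadm", "--create", "/dev/md0", "--level=1",
     "--raid-devices=2", "/dev/sdb", "/dev/sdc"])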

xmoonlight, 2021-01-14
@xmoonlight

The system freezes because it cannot perform a read/write operation. After the server is rebooted, disk resynchronization starts automatically.
1. It is strange that the system freezes if it cannot perform some operation on a disk.
2. It is strange that the software does not check the integrity of the disks before resynchronization.
Extremely dubious statements from the author of that article...
I began to look for information confirming or refuting this claim, and couldn't find any.
Not surprising...

nucleon, 2021-01-14
@nucleon

In general, this behavior was more typical of old RAID controllers.
I personally ran into the fact that in RAID 1 the first disk is treated as the primary one, so everything really did fall apart when the first disk died. The workaround was to move the second disk into the first disk's slot and then replace the failed one, but yes, the bug was there.
Nowadays it depends more on the RAID implementation, or rather on its logic. A software RAID may simply fail to boot, because the array only appears once the driver is loaded, so the system has to start somehow before the array is available. This used to be solved by writing the bootloader to the second disk as well and booting from it.
Also, a software RAID can only be bootable if it is a mirror, i.e. any RAID 0, 5, 6 and so on cannot start the system.
Software RAID has another problem, though: performance overhead. It is a poor fit for heavily loaded systems. Let me explain: the driver is essentially a program, a high-priority one, but still a program. Say you load the server with tasks that hammer the disks, memory and CPU; as a result the driver gets pushed out of the run queue or out of memory, and at that moment everything stalls for you.
Recovery can be another problem: I remember my 4-disk software RAID 5 dropping in performance by a factor of 16 while a disk was being rebuilt...
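For context on that last point: recovering each block of the failed disk means reading the matching block from every surviving disk and XOR-ing them together, so the whole array is busy serving the rebuild on top of normal I/O. A toy sketch of the parity math (pure Python, a single 4 KiB stripe; the block size and stripe layout are illustrative assumptions):

# Toy illustration of RAID 5 reconstruction for one stripe of a 4-disk array:
# 3 data blocks plus one XOR parity block; the lost block is rebuilt by
# XOR-ing everything that is left.
import os
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

BLOCK = 4096
data = [os.urandom(BLOCK) for _ in range(3)]   # 3 data blocks of one stripe
parity = reduce(xor_blocks, data)              # the 4th disk holds the XOR parity

# Simulate losing disk 1, then rebuilding its block from the survivors + parity.
lost = data[1]
rebuilt = reduce(xor_blocks, [data[0], data[2], parity])

assert rebuilt == lost                         # XOR of all remaining blocks restores it
print("rebuilt block matches the lost one:", rebuilt == lost)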

AntHTML, 2021-01-14
@anthtml

RAID is chosen for a specific purpose, and there are plenty of those:
First, RAID is only needed on heavily loaded systems (0, 0X, X0) or on systems that require non-stop operation (1, 1X, X1, etc.). In other cases backups and synchronization are enough.
Second, it depends on hardware vs. software: a hardware RAID doesn't consume system resources, since all operations run on a separate controller with its own logic, but if that controller dies you need a compatible replacement on hand, and without one the array cannot be revived. A software RAID does everything on the system's own resources, so it can be assembled on any system, and restored on any system just as easily.
So it all depends on the specific situation. Critical services usually aren't deployed on an "old server", so a regular backup, with or without a simple software array, may well be enough (if an array is really needed there at all, and not just because "a server must have RAID by definition").

Vladimir Korotenko, 2021-01-14
@firedragon

The main problem with fake RAID is that you assume your RAID 10 is extremely reliable, so you stop paying attention and just wait until everything falls apart at some point. Apart from that they are reliable; as always, the weak link is people.
