How often do disks die? Is RAID1 a lifesaver or an illusion?
A Supermicro 5018D-MTF platform (a 1U server with 4 hot-swap bays) + a Xeon E3-1240v3 CPU + 32 GB of memory is on its way to me. I already have two pairs of almost unused drives: 2 x WD30EFRX and 2 x WD1002FAEX. I plan to build two RAID1 (mirror) arrays out of these drives.
On the WD1002FAEX pair I plan to run vSphere 6 and 3 VMs with Windows Server 2012 R2 (one for DC, DNS, DHCP and the antivirus server, one for the file dump, and one for 1C with PostgreSQL) + 2 VMs with CentOS for Asterisk and Zabbix.
Right now all of this runs on 5 old desktops with an Athlon II X3 455, and the boxes are in fact idle - there is no real load, only 50 users, 7 of them accountants.
The WD30EFRX pair will take backups of the file dump (about 400 GB of files in total) plus dumps of the 1C databases (7 databases from 100 to 450 MB).
In my memory, over the past 15 years my hard drives have died twice: the first time in 2001, when an IBM 20 GB started rattling six months after purchase, and the second time in 2008, when both drives of the same RAID1 array died within a few hours of each other. In the first case I was saved by the fact that the drive was new and a copy of the documents remained on an old 800 MB Quantum; in the second case the backups had been written to the very drives that died (not my fault - I was only doing support there at the time, and took over from that admin later).
So here is the essence of the question: how often do modern, expensive drives die these days, and is the RAID1 promise - a drive died, you replaced it, everything keeps working - real or an illusion?
Drives always die - sometimes rarely, sometimes often, depending on your luck.
But the most important thing to understand is that RAID protects against downtime; it does not protect against data loss.
That is, if you are worried about the data itself, make backups - a RAID is not what you need.
If you are worried about the server being down for a few tens of minutes, use RAID.
Drives are a lottery: some work for years, some fail in batches.
By Murphy's law, you can never be sure which kind you will get.
IMHO, any RAID is just an illusion of safety that often reduces overall system performance. Personally, if I had two disks, I would put one into the server and the second into a NAS standing somewhere else, also in a safe place, and set up backups properly.
Yes, daily backups do load the system, but that happens at night, and during the day you get the full performance of a standalone disk. That, in my view, is the whole mechanics of it...
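To make that concrete, here is a minimal sketch of such a nightly push to the NAS - a small Python wrapper around rsync, meant to be run from cron. The local path and the NAS address are made-up placeholders:

```python
#!/usr/bin/env python3
"""Minimal sketch: mirror last night's backups to a remote NAS.

The source directory and the NAS address below are hypothetical -
substitute your own paths and host.
"""
import subprocess
import sys

SRC = "/var/backups/"                              # hypothetical local dump directory
DST = "backup@nas.example.lan:/volume1/backups/"   # hypothetical NAS target

# -a preserves permissions and timestamps, -z compresses over the wire,
# --delete keeps the remote copy an exact mirror of the local one.
result = subprocess.run(["rsync", "-az", "--delete", SRC, DST])
sys.exit(result.returncode)  # non-zero exit lets cron report the failure
```

Scheduled at night, this leaves the daytime disk load untouched, which is exactly the trade-off described above.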
RAID1 is a lifesaver if it's mdadm and if it is monitored: disks get replaced at the first read/write errors, replacements are done at night (during a resync everything really does slow down), and the arrays are periodically resynced to verify the disk surface.
Hardware RAID brings plenty of problems of its own.
I would not claim 100% reliability (nobody has cancelled backups), but it is close to 99.9+. To break RAID1 you have to be either an idiot, or clumsy, or hit an extremely rare kernel bug and then fail to recover the data from the individual disks afterwards.
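To illustrate the "if it is monitored" part: on Linux the array state is exposed in /proc/mdstat, where a healthy two-disk mirror shows [UU] and a failed member turns into an underscore ([U_] or [_U]). mdadm --monitor can watch this natively; the sketch below just shows the idea in a form that cron or Zabbix can call:

```python
#!/usr/bin/env python3
"""Minimal sketch: flag md arrays that have lost a member,
based on the [UU] / [U_] markers in /proc/mdstat."""
import re
import sys

def degraded_arrays(mdstat_path="/proc/mdstat"):
    bad, current = [], None
    with open(mdstat_path) as f:
        for line in f:
            m = re.match(r"^(md\d+)\s*:", line)
            if m:
                current = m.group(1)        # remember which array this block describes
            status = re.search(r"\[([U_]+)\]", line)
            if status and "_" in status.group(1) and current:
                bad.append(current)         # an underscore means a dead or missing disk
    return bad

if __name__ == "__main__":
    failed = degraded_arrays()
    if failed:
        print("DEGRADED:", ", ".join(failed))
        sys.exit(1)  # non-zero exit so cron or Zabbix raises an alert
    print("all arrays healthy")
```

The periodic surface checks mentioned above are triggered by writing check to /sys/block/mdX/md/sync_action; Debian-based systems ship a checkarray cron job that does this on a monthly schedule.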
They die quite rarely when it comes to small systems.
Here you can see some statistics:
habrahabr.ru/post/209894
habrahabr.ru/post/237887
How long a drive lives depends on its specifications and on how intensively it is used. In your case I believe they can easily live 5 years or more, but it all depends on the load.
RAID1 is one of the possible solutions, with its own advantages and disadvantages, but it is definitely not an illusion - it is a working approach. If a drive fails, the controller itself will report that it has failed and needs to be replaced. You replace it, wait for the rebuild, and everything keeps working (a sketch of that cycle follows below).
There is only one case in my practice when a second drive died during a RAID5 rebuild, but that is the exception rather than the rule.
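For the mdadm setups discussed in the other answers, that swap-and-rebuild cycle looks roughly like this sketch; /dev/md0 and /dev/sdb1 are placeholders for your own array and the partition on the new disk:

```python
#!/usr/bin/env python3
"""Sketch of the replace-and-rebuild cycle for Linux software RAID.
Device names are placeholders; run each step deliberately."""
import subprocess

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)   # stop immediately if mdadm reports an error

# 1. Mark the dying member as failed and pull it out of the array.
run("mdadm", "--manage", "/dev/md0", "--fail", "/dev/sdb1")
run("mdadm", "--manage", "/dev/md0", "--remove", "/dev/sdb1")

# 2. Physically swap the disk, partition it to match the survivor, then re-add it:
run("mdadm", "--manage", "/dev/md0", "--add", "/dev/sdb1")

# 3. Watch the rebuild until the status line shows [UU] again.
print(open("/proc/mdstat").read())
```

The rebuild runs in the background, so the "wait for the rebuild" part is literally just watching /proc/mdstat until both mirrors are back in place.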
As for your number of disks - it's 50/50: either they die or they don't.
RAID1 is a means of increasing availability, not a means of preserving data; for real data safety there are backups.