linux
Kenny00, 2021-05-27 12:44:46

Why do files on an mdadm RAID5 array get corrupted after a period of time, and not immediately?

In my previous question, I thought the problem was in the disk, not in mdadm. (Before that I ran a check of the array: no problems were found, and mdadm did not kick out the problematic disk.)
I pulled the disk that had bad sectors, and the array switched to clean, degraded mode. I suspect that inserting a working disk will not make the problem go away.

/dev/md127:
           Version : 1.2
     Creation Time : Mon Mar 16 21:27:21 2020
        Raid Level : raid5
        Array Size : 9743319040 (9291.95 GiB 9977.16 GB)
     Used Dev Size : 1948663808 (1858.39 GiB 1995.43 GB)
      Raid Devices : 6
     Total Devices : 5
       Persistence : Superblock is persistent

       Update Time : Thu May 27 12:25:01 2021
             State : clean, degraded
    Active Devices : 5
   Working Devices : 5
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 64K

Consistency Policy : unknown

              Name : 33ea55f9:RAID-5-0  (local to host 33ea55f9)
              UUID : 04d214c4:ee331e6a:74ca0a04:5e846481
            Events : 468

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
       4       8       67        4      active sync   /dev/sde3
       5       8       83        5      active sync   /dev/sdf3


Next, as a test, I create a file of exactly 1 GB, take its md5 checksum, wait 10 minutes, and check again: the file is corrupted, the checksum no longer matches.

[email protected]:/RAID-5/srv1/# dd if=/dev/urandom of=Test.flie bs=64M count=32
dd: warning: partial read (33554431 bytes); suggest iflag=fullblock
0+32 records in
0+32 records out
1073741792 bytes (1.1 GB, 1.0 GiB) copied, 105.833 s, 10.1 MB/s

[email protected]:/RAID-5/srv1/# md5sum Test.flie
594eacb844ae053ab8bccadb9f3e43b4  Test.flie

[email protected]:/RAID-5/srv1/# md5sum Test.flie
522c8afffd428e14b425d31d8b5d7f52  Test.flie


btrfs check did not reveal any problems.
cat /sys/block/md127/md/mismatch_cnt gives 132567704
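The write-wait-recheck test above can be scripted so it is easy to repeat after each fix attempt. This is a hypothetical helper, not from the thread: the function name and its parameters (directory, wait time, file size) are mine; the dd/md5sum commands mirror the ones in the transcript.

```shell
#!/bin/sh
# Hypothetical re-check helper (not from the thread): write a random file,
# checksum it, wait, checksum again. Any mismatch on an idle file means the
# data is changing underneath the filesystem.
corruption_test() {
    dir=$1      # directory on the suspect array, e.g. /RAID-5/srv1
    wait_s=$2   # seconds between checksums, e.g. 600
    mb=$3       # file size in MiB, e.g. 1024
    dd if=/dev/urandom of="$dir/Test.file" bs=1M count="$mb" iflag=fullblock 2>/dev/null
    sum1=$(md5sum "$dir/Test.file" | cut -d' ' -f1)
    sleep "$wait_s"
    sum2=$(md5sum "$dir/Test.file" | cut -d' ' -f1)
    if [ "$sum1" = "$sum2" ]; then
        echo "OK: checksums match"
    else
        echo "CORRUPTION: $sum1 != $sum2"
    fi
}

# Example: corruption_test /RAID-5/srv1 600 1024
```

Running the same test on a directory outside the array (e.g. a disk not in md127) is a quick way to tell whether the corruption follows the array or the whole machine.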

Where should I dig next?

2 answer(s)
Vladimir, 2021-05-27
@MechanID

Are there any errors in dmesg? That could clarify the situation. If there are no errors, then:
1. back up the data
2. run a check with error correction: echo repair > /sys/block/mdX/md/sync_action
3. repeat the experiment of writing the file and checking the checksums
4. one exotic oddity I know of: if you run CentOS with kernel 3.10.0-1160.15.2 or newer, try going back to kernel 3.10.0-1160.11.1 and repeat steps 2 and 3
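Steps 2 and 3 above can be sketched as a small function. This is a hedged sketch, not a definitive procedure: the function name is mine, "md127" matches the array in the question, and the sysfs files (sync_action, mismatch_cnt) are the standard Linux md interface.

```shell
#!/bin/sh
# Sketch of steps 2-3: trigger a parity repair, then a fresh read-only check,
# then re-read mismatch_cnt. Run only after backing up the data.
repair_and_recheck() {
    md=$1                     # e.g. md127
    sysdir=/sys/block/$md/md
    if [ ! -w "$sysdir/sync_action" ]; then
        echo "cannot write $sysdir/sync_action (need root and an existing array)"
        return 1
    fi
    echo repair > "$sysdir/sync_action"   # rewrite parity where it mismatches
    while [ "$(cat "$sysdir/sync_action")" != "idle" ]; do sleep 10; done
    echo check > "$sysdir/sync_action"    # fresh pass recounts mismatches
    while [ "$(cat "$sysdir/sync_action")" != "idle" ]; do sleep 10; done
    cat "$sysdir/mismatch_cnt"            # should be 0 on a healthy array
}

# Example: repair_and_recheck md127
```

Progress of the repair pass can be watched in /proc/mdstat; on an array this size it will take hours.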

rPman, 2021-05-29
@rPman

The problem can be not only in the machine itself (for example, faulty RAM; test it if possible) but also on the client device, i.e. whatever you copy to or from over the network.
mdadm should log errors to dmesg, or to the machine's first console, when there is damage; those messages will give more information.
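A quick way to act on this advice is to scan the kernel log for md/disk trouble. The function name and the grep patterns below are illustrative, not exhaustive; dmesg and grep are used with standard options only.

```shell
#!/bin/sh
# Hedged triage sketch: scan the kernel log for RAID/disk error messages.
scan_kernel_log() {
    dmesg 2>/dev/null | grep -iE 'md[0-9]+:|raid|i/o error|uncorrect|sector' \
        || echo "no matching kernel messages (or dmesg is restricted)"
}

# Example: scan_kernel_log | tail -n 50
```

For the memory side, a memtest86+ pass from the boot menu is the usual check; if the corruption reproduces locally on the server (as the dd/md5sum test in the question suggests), the network client can be ruled out.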
