linux
Kenny00, 2021-05-27 12:44:46

Why do files on an mdadm RAID5 array get corrupted after a period of time, and not immediately?

In my previous question, I thought the problem was in the disk, not in mdadm. (Before that I ran a check of the array: no problems were found, and mdadm did not kick out the problematic disk.)
I pulled the disk that had bad sectors, and the array switched to clean, degraded mode. I suspect that inserting a working disk will not make the problem go away.

/dev/md127:
           Version : 1.2
     Creation Time : Mon Mar 16 21:27:21 2020
        Raid Level : raid5
        Array Size : 9743319040 (9291.95 GiB 9977.16 GB)
     Used Dev Size : 1948663808 (1858.39 GiB 1995.43 GB)
      Raid Devices : 6
     Total Devices : 5
       Persistence : Superblock is persistent

       Update Time : Thu May 27 12:25:01 2021
             State : clean, degraded
    Active Devices : 5
   Working Devices : 5
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 64K

Consistency Policy : unknown

              Name : 33ea55f9:RAID-5-0  (local to host 33ea55f9)
              UUID : 04d214c4:ee331e6a:74ca0a04:5e846481
            Events : 468

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       1       8       19        1      active sync   /dev/sdb3
       2       8       35        2      active sync   /dev/sdc3
       3       8       51        3      active sync   /dev/sdd3
       4       8       67        4      active sync   /dev/sde3
       5       8       83        5      active sync   /dev/sdf3


Next, as a test, I create a file of exactly 1 GB, take its md5 checksum, wait 10 minutes, and check again: the file is corrupted, the checksum no longer matches.

[email protected]:/RAID-5/srv1/# dd if=/dev/urandom of=Test.flie bs=64M count=32
dd: warning: partial read (33554431 bytes); suggest iflag=fullblock
0+32 records in
0+32 records out
1073741792 bytes (1.1 GB, 1.0 GiB) copied, 105.833 s, 10.1 MB/s

[email protected]:/RAID-5/srv1/# md5sum Test.flie
594eacb844ae053ab8bccadb9f3e43b4  Test.flie

[email protected]:/RAID-5/srv1/# md5sum Test.flie
522c8afffd428e14b425d31d8b5d7f52  Test.flie


btrfs check did not reveal any problems.
cat /sys/block/md127/md/mismatch_cnt gives 132567704
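The write-wait-recheck test above can be scripted so it is easy to repeat after each fix attempt. This is a hypothetical helper, not from the thread: the function name and its parameters (directory, wait time, file size) are mine; the dd/md5sum commands mirror the ones in the transcript.

```shell
#!/bin/sh
# Hypothetical re-check helper (not from the thread): write a random file,
# checksum it, wait, checksum again. Any mismatch on an idle file means the
# data is changing underneath the filesystem.
corruption_test() {
    dir=$1      # directory on the suspect array, e.g. /RAID-5/srv1
    wait_s=$2   # seconds between checksums, e.g. 600
    mb=$3       # file size in MiB, e.g. 1024
    dd if=/dev/urandom of="$dir/Test.file" bs=1M count="$mb" iflag=fullblock 2>/dev/null
    sum1=$(md5sum "$dir/Test.file" | cut -d' ' -f1)
    sleep "$wait_s"
    sum2=$(md5sum "$dir/Test.file" | cut -d' ' -f1)
    if [ "$sum1" = "$sum2" ]; then
        echo "OK: checksums match"
    else
        echo "CORRUPTION: $sum1 != $sum2"
    fi
}

# Example: corruption_test /RAID-5/srv1 600 1024
```

Running the same test on a directory outside the array (e.g. a disk not in md127) is a quick way to tell whether the corruption follows the array or the whole machine.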

Where should I dig next?

2 answer(s)
Vladimir, 2021-05-27
@MechanID

Are there any errors in dmesg? That could clarify the situation. If there are no errors, then:
1. back up the data
2. run a check with error correction: echo repair > /sys/block/mdX/md/sync_action
3. repeat the experiment of writing the file and checking the checksums
4. one exotic oddity I know of: if you run CentOS with kernel 3.10.0-1160.15.2 or newer, try going back to kernel 3.10.0-1160.11.1 and repeat steps 2 and 3
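Steps 2 and 3 above can be sketched as a small function. This is a hedged sketch, not a definitive procedure: the function name is mine, "md127" matches the array in the question, and the sysfs files (sync_action, mismatch_cnt) are the standard Linux md interface.

```shell
#!/bin/sh
# Sketch of steps 2-3: trigger a parity repair, then a fresh read-only check,
# then re-read mismatch_cnt. Run only after backing up the data.
repair_and_recheck() {
    md=$1                     # e.g. md127
    sysdir=/sys/block/$md/md
    if [ ! -w "$sysdir/sync_action" ]; then
        echo "cannot write $sysdir/sync_action (need root and an existing array)"
        return 1
    fi
    echo repair > "$sysdir/sync_action"   # rewrite parity where it mismatches
    while [ "$(cat "$sysdir/sync_action")" != "idle" ]; do sleep 10; done
    echo check > "$sysdir/sync_action"    # fresh pass recounts mismatches
    while [ "$(cat "$sysdir/sync_action")" != "idle" ]; do sleep 10; done
    cat "$sysdir/mismatch_cnt"            # should be 0 on a healthy array
}

# Example: repair_and_recheck md127
```

Progress of the repair pass can be watched in /proc/mdstat; on an array this size it will take hours.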

rPman, 2021-05-29
@rPman

The problem can be not only in the machine itself (for example, faulty RAM; test it if possible) but also on the client device, i.e. whatever you copy to or from over the network.
mdadm should log errors to dmesg, or to the machine's first console, when there is damage; those messages will give more information.
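A quick way to act on this advice is to scan the kernel log for md/disk trouble. The function name and the grep patterns below are illustrative, not exhaustive; dmesg and grep are used with standard options only.

```shell
#!/bin/sh
# Hedged triage sketch: scan the kernel log for RAID/disk error messages.
scan_kernel_log() {
    dmesg 2>/dev/null | grep -iE 'md[0-9]+:|raid|i/o error|uncorrect|sector' \
        || echo "no matching kernel messages (or dmesg is restricted)"
}

# Example: scan_kernel_log | tail -n 50
```

For the memory side, a memtest86+ pass from the boot menu is the usual check; if the corruption reproduces locally on the server (as the dd/md5sum test in the question suggests), the network client can be ruled out.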
