LVM
Alexander Mikhailov, 2020-01-19 12:09:26

Need help with Gluster: problems with the XFS file system, how do I set up the volume correctly?

Good afternoon.
I have been struggling with strange Gluster problems for several days and would like to ask for help from people who know this area.
We have 3 servers, each with a hardware RAID of 8 disks combined in RAID 10. Each array is split into 3 partitions: boot, the OS partition and a data partition (sda → sda1 (boot), sda2 (LVM), sda3 (data)). The servers run CentOS Linux release 7.7.1908 (Core) and are used as virtualization hosts under oVirt in a HostedEngine configuration. oVirt version 4.3.7.2-1.el7.
Before that there were only 2 servers, so NFS (exported from the servers themselves, among others) was used to store the virtual machines' data disks. After I got a third server, I wanted to move VM storage to a Gluster volume, as in the hyperconverged configuration. For this the network was reworked (separate links were dedicated to GlusterFS, etc., following the recommendations in the manual).
After that, a volume of three bricks was created with Ansible from the Cockpit UI of the first host. The only thing that gave me pause at that point was the choice of RAID options: there are only 3 of them - RAID5, RAID6 and JBOD. If I understand correctly, this information is used to set the correct data alignment of the LVM physical volume, etc. I think those are more performance questions, though maybe the answer to my problem is hiding there. Since there is no RAID10 option, I left RAID6 with a stripe size of 256.
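As far as I understand, doing the alignment by hand for RAID10 would look roughly like the sketch below (8 disks in RAID10 = 4 data spindles with a 256 KiB stripe unit; the device, VG and LV names are placeholders, not what the Ansible role actually creates):

# RAID10 with 8 disks: 4 data spindles x 256 KiB stripe unit = 1 MiB alignment
# (device, VG and LV names are placeholders)
pvcreate --dataalignment 1024k /dev/sda3
vgcreate vg_gluster /dev/sda3
lvcreate -L 2T -n lv_brick vg_gluster
# su = stripe unit, sw = number of data spindles; -i size=512 is the usual Gluster recommendation
mkfs.xfs -f -i size=512 -d su=256k,sw=4 /dev/vg_gluster/lv_brick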
After creating the volume I created a new data domain and moved the virtual machine disks there. The bricks filled with data and everything was fine for the first 2-3 days. Then one of the servers had to be rebooted; self-heal showed some discrepancies and not everything got synchronized, but that seemed normal and I expected the heal to finish over time. It turned out it didn't: 2 files, for some strange reason, had different xattrs on different nodes. Attempts to sort it out led nowhere, so it was decided to move all the data off the volume, delete the volume and the bricks, and create a new volume. Which is what was done.
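For anyone hitting the same thing, the state of the replicas can be compared roughly like this (the volume name "data" and the brick path are placeholders for the real ones):

# Heal state of the volume
gluster volume heal data info
gluster volume heal data info split-brain
# Dump Gluster's xattrs for the same file on each brick and compare the output
getfattr -d -m . -e hex /gluster_bricks/data/brick/path/to/file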
I don't know whether this caused the interesting consequences that followed, or whether they would have started anyway, but for the last 2 days we have had this situation: first, on one server, the XFS partition holding the brick data started producing file system errors like these:
kernel: XFS (dm-6): Failing async write on buffer block 0x3a0673f8. Retrying async write.
kernel: XFS (dm-6): metadata I/O error: block 0x3a0673f8 ("xfs_buf_iodone_callback_error") error 28 numblks 8
After that I tried running xfs_repair -L. The errors were fixed, but as soon as the brick was started again they immediately came back. Several attempts to repair the FS on that partition gave the same result: at first everything was fine, then after a few minutes the errors returned.
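Roughly, each repair attempt looked like this (volume name, device and mount point below are placeholders, and the mount has an fstab entry):

# Stop the brick, unmount, repair, remount
gluster volume stop data          # or stop only this brick's glusterfsd process
umount /gluster_bricks/data
xfs_repair -L /dev/mapper/vg_gluster-lv_brick   # -L zeroes the XFS log, last-resort option
mount /gluster_bricks/data
gluster volume start data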
After deleting the volume and all the partition data associated with it, another attempt was made to create a volume. The volume was created without errors, started, etc. After data was transferred to it, the same problems appeared again after a while (a few hours), but this time on ANOTHER server! On the one that had the errors before, there are no problems at all!
What should I do? Where should I dig? Please advise - I have spent three days fighting this. Do I really have to give up on Gluster? It's a shame: I wanted to build something like a hyperconverged setup and also planned to move the HostedEngine there, but didn't get that far.
Hardware problems with the array can be ruled out: the root partition is on the same array, the NFS data used to be stored right there too, and no problems were ever observed. The servers do not report any disk errors. Thanks in advance for any insight.
Host software versions:
OS Version: RHEL - 7 - 7.1908.0.el7.centos
OS Description: CentOS Linux 7 (Core)
Kernel Version: 3.10.0 - 1062.9.1.el7.x86_64
KVM Version: 2.12.0 - 33.1.el7_7.4
LIBVIRT Version: libvirt-4.5.0-23.el7_7.3
VDSM Version: vdsm-4.30.24-1.el7
SPICE Version: 0.14.0 - 7.el7
GlusterFS Version: glusterfs-6.6-1.el7
CEPH Version: librbd1-10.2.5-4.el7
Open vSwitch Version: openvswitch-2.11.0-4.el7

1 answer(s)
Alexandr Mikhailov, 2020-01-21
@sasa_mi

In general, the problem with the errors on the XFS file system is, as I wrote in the comments, related to LVM thin provisioning and, it seems, also to Gluster's behavior. The point is that as data is written to the volume, the thin provisioned LV gradually grows. For some strange reason this % usage does not grow evenly across the three hosts: in my case, on one of the hosts it grew one and a half times faster than on the other two. After data is deleted from the volume this % naturally does not go back down by itself - you have to run fstrim manually.
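You can watch this happening with lvs and return freed space with fstrim, roughly like this (the VG name and mount point are placeholders):

# Thin pool / thin LV usage on each host
lvs -a -o lv_name,pool_lv,lv_size,data_percent,metadata_percent vg_gluster
# Give the freed blocks back to the thin pool after deleting data from the brick
fstrim -v /gluster_bricks/data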
So, after some time the thin volume supposedly reaches 100% (even though there may be far less actual data on it) and effectively switches the FS into a kind of read-only mode; error 28 in the kernel messages above is ENOSPC, which fits a pool that has run out of space. In short, the data on the FS gets lost, Gluster starts to feel sad, and everything becomes very bad. But apparently this is the "normal" behavior of LVM thin.
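If you do stay on thin LVM, the usual mitigation (I have not tested it in this setup) is to let the pool auto-extend before it fills up, via /etc/lvm/lvm.conf, with LVM monitoring (dmeventd / lvm2-monitor) enabled:

# /etc/lvm/lvm.conf, activation section: grow the pool by 20% once it is 70% full
thin_pool_autoextend_threshold = 70
thin_pool_autoextend_percent = 20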
I solved this problem by creating the bricks manually, without thin LVM volumes. Now the FS is fine, but strange problems with Gluster itself remain (the volume does not heal after one of the bricks has been offline); that is a separate question, I'll post it as a new one - maybe the gurus can advise.
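The manual brick setup boils down to ordinary (thick) LVs instead of thin ones; a minimal sketch, with placeholder sizes, names and hostnames:

# Per host: a thick LV for the brick, XFS on top, mounted under /gluster_bricks
lvcreate -L 500G -n lv_brick vg_gluster
mkfs.xfs -f -i size=512 /dev/vg_gluster/lv_brick
mkdir -p /gluster_bricks/data
mount /dev/vg_gluster/lv_brick /gluster_bricks/data   # plus an /etc/fstab entry
mkdir -p /gluster_bricks/data/brick
# Then, from one host, a replica 3 volume over the three bricks
gluster volume create data replica 3 \
    host1:/gluster_bricks/data/brick \
    host2:/gluster_bricks/data/brick \
    host3:/gluster_bricks/data/brick
gluster volume start data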
