linux
GR21, 2019-06-15 10:07:46

Corosync drove CPU load to 100% when one node dropped out. How can I fix this?

Good day!
This is my first experience with Corosync + Pacemaker. Before this I only used heartbeat, set up by hand.
I set up the Pacemaker/Corosync stack following https://habr.com/ru/company/postgrespro/blog/359230/, but without PostgreSQL.
The cluster is four CentOS 7 servers in different DCs, connected over OpenVPN on the 172.16.172.0/24 network.
In normal operation there are no problems and no elevated load. When a server is rebooted, the VirtualIP fails over cleanly. The only resources in use are the VirtualIP and a transparent proxy:

# pcs status
Cluster name: hacluster
Stack: corosync
Current DC: node2 (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Sat Jun 15 14:00:36 2019
Last change: Sat Jun 15 02:25:39 2019 by hacluster via crmd on platinum

4 nodes configured
1 resource configured

Online: [ node1 node2 node3 master ]

Full list of resources:

 virtualIP      (ocf::heartbeat:IPaddr2):       Started node1

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled

Yesterday, because of network problems in one of the DCs, one node dropped out. CPU load from corosync immediately jumped to 100% on every node, and the cluster could not recover without the missing node. It came back only 4 hours later, when the previously unreachable node returned to the network.
Removing the node did not help:
pcs cluster localnode remove node1
Did I miss something? Does something need to be tuned?
Sincerely,
Alexey.
