HA Cluster – Strange Communication Between Nodes
Hi,
For testing purposes I’m trying to create a two-node HA environment for running some services (openvpn and haproxy). I installed two CentOS 6.4
KVM guests.
I was able to create a cluster and some resources. I followed the document https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/index.html
But my cluster does not behave as expected:
After starting the cluster software on both nodes, they can see each other.
————————————–
8 thoughts on - HA Cluster – Strange Communication Between Nodes
Iirc CentOS 6.5 came with several updates to cluster-related packages, so you may want to investigate and update to 6.5.
Regards, Patrick
I’m sorry. My systems are fully updated CentOS 6.5. I’m using only standard CentOS repositories.
martin
2014/1/13 Martin Moravcik:
Hi Martin, I’ve not looked carefully at what your problem is and don’t know how skilled you are in HA, but I heartily suggest – if you haven’t done so before – that you read/study Digimer’s tutorial https://alteeve.ca/w/AN!Cluster_Tutorial_2
I think it’s unbeatable!
Best regards, Giorgio
Hi Martin.
If you could provide us with your config, e.g. the output of one of the commands below:
pcs config show
or
crm configure show
maybe we could get a better idea of your setup.
Thanks for your interest and your help. Here is the output of the command (pcs config show):
[root@lb1 ~]# pcs config show
Cluster Name: LB.STK
Corosync Nodes:
Pacemaker Nodes:
lb1.asol.local lb2.asol.local
Resources:
Group: LB
Resource: LAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=172.16.139.113 cidr_netmask=24 nic=eth1
Operations: monitor interval=15s (LAN.VIP-monitor-interval-15s)
Resource: WAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
Attributes: ip=172.16.139.110 cidr_netmask=24 nic=eth0
Operations: monitor interval=15s (WAN.VIP-monitor-interval-15s)
Resource: OPENVPN (class=lsb type=openvpn)
Operations: monitor interval=20s (OPENVPN-monitor-interval-20s)
start interval=0s timeout=20s (OPENVPN-start-timeout-20s)
stop interval=0s timeout=20s (OPENVPN-stop-timeout-20s)
Stonith Devices:
Fencing Levels:
Location Constraints:
Ordering Constraints:
Colocation Constraints:
Cluster Properties:
cluster-infrastructure: cman
dc-version: 1.1.10-14.el6_5.1-368c726
stonith-enabled: false
When I start the cluster after rebooting both nodes, everything looks fine. But when I issue the command “pcs resource delete OPENVPN” from node lb1, these lines start to pop up in the log:
Jan 15 13:56:37 corosync [TOTEM ] Retransmit List: 202
Jan 15 13:57:08 corosync [TOTEM ] Retransmit List: 202 203
Jan 15 13:57:38 corosync [TOTEM ] Retransmit List: 202 203 204
Jan 15 13:58:08 corosync [TOTEM ] Retransmit List: 202 203 204 206
Jan 15 13:58:38 corosync [TOTEM ] Retransmit List: 202 203 204 206 208
Jan 15 13:59:08 corosync [TOTEM ] Retransmit List: 202 203 204 206 208 209
I also noticed that these retransmit entries start to appear even some time (about 7 minutes) after a fresh cluster start, without any change or manipulation of the cluster.
Thanks
martin
Hi Martin.
For how long did you turn off the other node? I suspect that you need to configure timeouts for the cluster. Additional cluster parameters can be found here.
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-
Am 15.01.2014 um 11:56 schrieb Martin Moravcik:
There are known multicast issues on virtualized nodes, so your bridged network will certainly not operate reliably out of the box for HA setups.
try
echo 1 > /sys/class/net/YOURDEVICE/bridge/multicast_querier
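(Editor’s note: the sysfs setting above does not survive a reboot. One way to make it persistent on CentOS 6 – a sketch only, assuming the bridge device is named br0 – is to append the write to /etc/rc.d/rc.local:)

```shell
# /etc/rc.d/rc.local fragment (hypothetical; adjust "br0" to your bridge name)
# Enable the IGMP querier on the bridge so multicast group membership is
# refreshed and corosync multicast traffic keeps flowing between guests.
if [ -e /sys/class/net/br0/bridge/multicast_querier ]; then
    echo 1 > /sys/class/net/br0/bridge/multicast_querier
fi
```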
For a two-node cluster, using unicast is probably the easier and less error-prone way.
Regards,
Dennis
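(Editor’s note: since this cluster runs on the cman stack, as shown by the cluster-infrastructure property above, switching corosync to UDP unicast is done in /etc/cluster/cluster.conf. A minimal sketch, assuming the cluster name LB.STK and node names from the thread:)

```xml
<!-- /etc/cluster/cluster.conf fragment (sketch; version/config_version are illustrative) -->
<cluster name="LB.STK" config_version="2">
  <!-- transport="udpu" makes cman/corosync use UDP unicast instead of multicast -->
  <cman transport="udpu" two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="lb1.asol.local" nodeid="1"/>
    <clusternode name="lb2.asol.local" nodeid="2"/>
  </clusternodes>
</cluster>
```

After editing, the config must be propagated to both nodes and the cluster stack restarted for the transport change to take effect.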