HA Cluster – Strange Communication Between Nodes


January 13, 2014 Martin Moravcik CentOS 8 Comments

Hi,

For testing purposes, I’m trying to create a two-node HA environment for running some services (OpenVPN and HAProxy). I installed two CentOS 6.4 KVM guests.

I was able to create a cluster and some resources. I followed the document https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/index.html

But my cluster does not behave as expected:
After starting the cluster software on both nodes, they can see each other.
————————————–

8 thoughts on - HA Cluster – Strange Communication Between Nodes

Patrick Lists says:

January 13, 2014 at 8:17 am

IIRC, CentOS 6.5 came with several updates to cluster-related packages, so you may want to look into them and update to 6.5.

Regards, Patrick
Martin Moravcik says:

January 13, 2014 at 8:39 am

I’m sorry, my mistake. My systems are fully updated CentOS 6.5, and I’m using only the standard CentOS repositories.

martin
Giorgio Bersano says:

January 14, 2014 at 4:34 am

On 2014/1/13, Martin Moravcik wrote:

Hi Martin, I haven’t looked carefully at your problem and don’t know how experienced you are with HA, but if you haven’t done so already, I heartily suggest you read and study Digimer’s tutorial: https://alteeve.ca/w/AN!Cluster_Tutorial_2

I think it’s unbeatable!

Best regards, Giorgio
marlon guao says:

January 14, 2014 at 12:37 pm

Hi Martin.

If you could provide your configuration, for example the output of one of the commands below,

pcs config show

or

crm configure show

we might get a better idea of your setup.
Martin Moravcik says:

January 15, 2014 at 4:57 am

Thanks for your interest and help. Here is the output of the command pcs config show:

[root@lb1 ~]# pcs config show
Cluster Name: LB.STK
Corosync Nodes:

Pacemaker Nodes:
 lb1.asol.local lb2.asol.local

Resources:
 Group: LB
  Resource: LAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=172.16.139.113 cidr_netmask=24 nic=eth1
   Operations: monitor interval=15s (LAN.VIP-monitor-interval-15s)
  Resource: WAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: ip=172.16.139.110 cidr_netmask=24 nic=eth0
   Operations: monitor interval=15s (WAN.VIP-monitor-interval-15s)
  Resource: OPENVPN (class=lsb type=openvpn)
   Operations: monitor interval=20s (OPENVPN-monitor-interval-20s)
               start interval=0s timeout=20s (OPENVPN-start-timeout-20s)
               stop interval=0s timeout=20s (OPENVPN-stop-timeout-20s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
Colocation Constraints:

Cluster Properties:
 cluster-infrastructure: cman
 dc-version: 1.1.10-14.el6_5.1-368c726
 stonith-enabled: false

When I start the cluster after rebooting both nodes, everything looks fine. But when I run the command “pcs resource delete OPENVPN” on node lb1, these lines start to appear in the log:
Jan 15 13:56:37 corosync [TOTEM ] Retransmit List: 202
Jan 15 13:57:08 corosync [TOTEM ] Retransmit List: 202 203
Jan 15 13:57:38 corosync [TOTEM ] Retransmit List: 202 203 204
Jan 15 13:58:08 corosync [TOTEM ] Retransmit List: 202 203 204 206
Jan 15 13:58:38 corosync [TOTEM ] Retransmit List: 202 203 204 206 208
Jan 15 13:59:08 corosync [TOTEM ] Retransmit List: 202 203 204 206 208 209

I also noticed that these retransmit entries start to appear about 7 minutes after a fresh cluster start, even without making any changes to the cluster.
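Each “Retransmit List” line names TOTEM message sequence numbers that have not yet been received by every node. A list that only grows, as above, suggests that messages are being lost between the two guests rather than being resent successfully. A minimal sketch for watching this, assuming syslog-style lines like the ones quoted; the function name retransmit_summary is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: summarise corosync "Retransmit List" lines read on stdin.
# For each matching line it prints the timestamp and how many sequence numbers
# the list contains, so a steadily growing count is easy to spot.
retransmit_summary() {
    awk '/Retransmit List:/ {
        ts = $1 " " $2 " " $3                # timestamp, e.g. Jan 15 13:56:37
        ids = $0
        sub(/.*Retransmit List:/, "", ids)   # keep only the sequence numbers
        n = split(ids, seq, " ")             # a single-space separator splits on runs of blanks
        print ts, n
    }'
}
```

Usage would be something like retransmit_summary < /var/log/cluster/corosync.log, where the log path is an assumption; point it at wherever corosync writes its messages on your nodes.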

Thanks

martin
marlon guao says:

January 15, 2014 at 8:27 am

Hi Martin.

For how long did you turn off the other node? I suspect that you need to configure timeouts for the cluster. Additional cluster parameters are described here:

https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-
Leon Fauster says:

January 15, 2014 at 5:29 pm

On 15.01.2014 at 11:56, Martin Moravcik wrote:

There are known multicast issues with virtual nodes, so your bridged network will almost certainly not work reliably out of the box for HA setups.

Try:

echo 1 > /sys/class/net/YOURDEVICE/bridge/multicast_querier
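YOURDEVICE here is the Linux bridge on the KVM host that both guests are attached to (for example br0), not an interface inside the guests. A common explanation, not confirmed in this thread, is that with multicast snooping enabled and no querier on the segment, the bridge eventually stops forwarding multicast to ports whose group memberships have expired, which would fit retransmits that only begin several minutes after startup. A minimal sketch that wraps the same sysfs write with a couple of checks; the function name enable_bridge_querier and the SYSFS_NET override are hypothetical, the latter only there so the logic can be tried against a scratch directory:

```shell
#!/bin/sh
# Sketch: enable the multicast querier on a Linux bridge via sysfs.
# SYSFS_NET defaults to /sys/class/net; it is a variable only so the
# function can be exercised against a fake directory tree (hypothetical).
enable_bridge_querier() {
    bridge="$1"
    sysfs_net="${SYSFS_NET:-/sys/class/net}"
    knob="$sysfs_net/$bridge/bridge/multicast_querier"

    # Only bridge devices have a bridge/ subdirectory, and older kernels
    # may not expose multicast_querier at all.
    if [ ! -f "$knob" ]; then
        echo "no multicast_querier for '$bridge' (not a bridge, or kernel too old?)" >&2
        return 1
    fi

    echo 1 > "$knob" || return 1
    echo "$bridge multicast_querier=$(cat "$knob")"
}
```

Run as root on the host, e.g. enable_bridge_querier br0. A sysfs write does not survive a reboot, so it would have to be reapplied at boot time (for example from /etc/rc.local, one option among several).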
Dennis Jacobfeuerborn says:

January 16, 2014 at 12:25 am

For a two-node cluster, using unicast is probably an easier and less error-prone approach.
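On CentOS 6 with the cman stack (cluster-infrastructure: cman in the config posted above), the transport is selected by the transport attribute of the <cman> element in /etc/cluster/cluster.conf: transport="udpu" means UDP unicast, and without the attribute the default is multicast. A minimal sketch for checking which one a given cluster.conf selects; the function name cluster_transport is hypothetical, and it assumes the <cman> start tag sits on a single line:

```shell
#!/bin/sh
# Sketch: report which transport a cman cluster.conf selects.
# Assumes the <cman ...> start tag is on one line (an assumption, not a rule).
cluster_transport() {
    conf="${1:-/etc/cluster/cluster.conf}"
    # Pull the value of transport="..." out of the <cman> element, if present.
    t=$(sed -n 's/.*<cman[^>]*transport="\([^"]*\)".*/\1/p' "$conf" | head -n 1)
    case "$t" in
        udpu)   echo "udpu (UDP unicast)" ;;
        ""|udp) echo "udp (multicast, the default)" ;;
        *)      echo "$t (other transport)" ;;
    esac
}
```

Switching to unicast would mean adding transport="udpu" to <cman> on both nodes, bumping config_version, and restarting the cluster stack; treat that as an outline rather than exact steps.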

Regards,
Dennis
