HA Cluster – Strange Communication Between Nodes


Hi,

For testing purposes I’m trying to build a two-node HA environment to run a few services (openvpn and haproxy). I installed two CentOS 6.4 KVM guests.

I was able to create a cluster and some resources. I followed the document https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Configuring_the_Red_Hat_High_Availability_Add-On_with_Pacemaker/index.html

But my cluster does not behave as expected:
After the cluster software starts on both nodes, they can see each other.
————————————–

8 thoughts on - HA Cluster – Strange Communication Between Nodes

  • IIRC, CentOS 6.5 came with several updates to cluster-related packages, so you may want to investigate and update to 6.5.

    Regards, Patrick

  • I’m sorry. My systems are fully updated CentOS 6.5. I’m using only standard CentOS repositories.

    martin

  • 2014/1/13 Martin Moravcik :

    Hi Martin, I haven’t looked closely at your problem and don’t know how experienced you are with HA, but if you haven’t already, I heartily suggest reading Digimer’s tutorial: https://alteeve.ca/w/AN!Cluster_Tutorial_2

    I think it’s unbeatable!

    Best regards, Giorgio

  • Hi Martin.

    Could you post your configuration? The output of either of the commands below would help.

    pcs config show

    or

    crm configure show

    That would give us a better idea of your setup.

  • Thanks for your interest and for your help. Here is the output of the command pcs config show:

    [root@lb1 ~]# pcs config show
    Cluster Name: LB.STK
    Corosync Nodes:

    Pacemaker Nodes:
    lb1.asol.local lb2.asol.local

    Resources:
    Group: LB
    Resource: LAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: ip=172.16.139.113 cidr_netmask=24 nic=eth1
    Operations: monitor interval=15s (LAN.VIP-monitor-interval-15s)
    Resource: WAN.VIP (class=ocf provider=heartbeat type=IPaddr2)
    Attributes: ip=172.16.139.110 cidr_netmask=24 nic=eth0
    Operations: monitor interval=15s (WAN.VIP-monitor-interval-15s)
    Resource: OPENVPN (class=lsb type=openvpn)
    Operations: monitor interval=20s (OPENVPN-monitor-interval-20s)
    start interval=0s timeout=20s (OPENVPN-start-timeout-20s)
    stop interval=0s timeout=20s (OPENVPN-stop-timeout-20s)

    Stonith Devices:
    Fencing Levels:

    Location Constraints:
    Ordering Constraints:
    Colocation Constraints:

    Cluster Properties:
    cluster-infrastructure: cman
    dc-version: 1.1.10-14.el6_5.1-368c726
    stonith-enabled: false

    When I start the cluster after rebooting both nodes, everything looks fine. But when I run the command “pcs resource delete OPENVPN” from node lb1, these lines start to appear in the log:
    Jan 15 13:56:37 corosync [TOTEM ] Retransmit List: 202
    Jan 15 13:57:08 corosync [TOTEM ] Retransmit List: 202 203
    Jan 15 13:57:38 corosync [TOTEM ] Retransmit List: 202 203 204
    Jan 15 13:58:08 corosync [TOTEM ] Retransmit List: 202 203 204 206
    Jan 15 13:58:38 corosync [TOTEM ] Retransmit List: 202 203 204 206 208
    Jan 15 13:59:08 corosync [TOTEM ] Retransmit List: 202 203 204 206 208 209

    I also noticed that these retransmit entries start to appear some time (about 7 minutes) after a fresh cluster start, even without any change to or manipulation of the cluster.

    Thanks

    martin
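
To see how quickly the retransmit list grows, log lines like the ones quoted above can be summarised with a small filter. This is only a sketch: retransmit_counts is a hypothetical name, and it assumes the syslog layout shown in the post (time in the third field, entries listed after "List:").

```shell
# Hypothetical helper: for each corosync "Retransmit List" line on stdin,
# print the timestamp and the number of entries still in the list.
retransmit_counts() {
    awk '/\[TOTEM *\] Retransmit List:/ {
        n = 0
        seen = 0
        for (i = 1; i <= NF; i++) {
            if (seen) n++                    # count fields after "List:"
            if ($i == "List:") seen = 1
        }
        print $3, n                          # $3 is the HH:MM:SS field
    }'
}
```

It reads from standard input, so it can be fed whichever log file the lines came from. For the six lines quoted above it would print one line per entry, starting with "13:56:37 1" and ending with "13:59:08 6".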

  • On 15.01.2014 at 11:56, Martin Moravcik wrote:

    There are known multicast issues on virtual nodes, so your bridged network will almost certainly not work reliably out of the box for HA setups.

    try

    echo 1 > /sys/class/net/YOURDEVICE/bridge/multicast_querier
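
Before changing anything it may help to see what the bridges are currently set to. The following is a minimal sketch, assuming the bridge exposes multicast_querier under /sys/class/net as in the command above; the function name show_queriers and its optional root argument are made up for illustration.

```shell
# Hypothetical helper: print "<device> <value>" for every bridge that has a
# multicast_querier file under a sysfs-style root (default /sys/class/net).
show_queriers() {
    root="${1:-/sys/class/net}"
    for f in "$root"/*/bridge/multicast_querier; do
        [ -r "$f" ] || continue      # also skips the unexpanded glob
        dev="${f#"$root"/}"          # drop the root prefix
        dev="${dev%%/*}"             # keep only the device name
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Called without arguments on the KVM host, it lists each bridge with its current value (0 or 1). Keep in mind that a value written into sysfs with echo does not survive a reboot, so it would have to be reapplied at boot time.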

  • For a two-node cluster, using unicast is probably the easier and less error-prone way.

    Regards,
    Dennis
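
As a rough sketch of the unicast suggestion, assuming the cman-based stack shown in the pcs output above (cluster-infrastructure: cman): on RHEL/CentOS 6, UDP unicast is, as far as I know, selected by adding transport="udpu" to the <cman> element in /etc/cluster/cluster.conf. The helper below only checks whether a given file already carries that attribute. The name uses_udpu is hypothetical, and the check assumes the <cman> element is written on a single line.

```shell
# Hypothetical helper: succeed if the given cluster.conf selects UDP unicast.
# Assumes the attribute appears as transport="udpu" on a one-line <cman> tag.
uses_udpu() {
    conf="${1:-/etc/cluster/cluster.conf}"
    grep -Eq '<cman[^>]*[[:space:]]transport="udpu"' "$conf"
}
```

For example, `uses_udpu && echo "unicast configured"` on each node would show whether both are set the same way. It is worth verifying the exact attribute against the Red Hat cluster documentation for your release before relying on it.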