Fencing Nodes With DRAC Under 5.9


Hi all,

A recent update to CentOS 5.9 has broken my cluster’s ability to fence nodes. I have two Dell servers, each fenced via its DRAC6 card. The current fencing-device configuration in cluster.conf is roughly as follows.
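
(Hostnames, IPs, and credentials below are placeholders, and I’m typing this from memory, so the exact attributes may differ slightly from the real file.)

    <clusternode name="node1.example.com" nodeid="1" votes="1">
      <fence>
        <method name="1">
          <device name="drac-node1"/>
        </method>
      </fence>
    </clusternode>
    <!-- node2 is the mirror image -->
    <fencedevices>
      <fencedevice agent="fence_drac5" name="drac-node1"
                   ipaddr="192.168.0.101" login="root" passwd="calvin" secure="1"/>
      <fencedevice agent="fence_drac5" name="drac-node2"
                   ipaddr="192.168.0.102" login="root" passwd="calvin" secure="1"/>
    </fencedevices>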

After the updates, when one of the two systems comes online and starts the cman service, it starts fencing. At that point it contacts the other node and reboots it repeatedly, never allowing that system to come back online. If I disable the cman service and re-enable it once both systems are back up, cman reboots the other node as soon as it starts, and the whole process begins again.

Fencing with DRAC devices is not terribly well-documented, so I imagine something in the update changed the way this works. I originally used system-config-cluster to create the configuration file, but I had to tweak it by hand to add the fence_drac5 agent, because the configuration tool didn’t support it. I also tried recreating the cluster with Conga (via luci and ricci), but no success there either.
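
For what it’s worth, you can exercise the agent by hand, outside of cman, with something like the following (check fence_drac5 -h for the exact option letters on your version):

    # query power status through the fence agent itself
    # (-x uses ssh; drop it if your DRAC is reachable over telnet)
    fence_drac5 -a 192.168.0.101 -l root -p calvin -x -o status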

Is anyone out there doing clustering with Dells and DRAC6 cards under CentOS 5.9? Or under CentOS 6, for that matter… I’m willing to upgrade if that fixes this.

thanks in advance,

…adam

4 thoughts on - Fencing Nodes With DRAC Under 5.9

  • Without your cluster config, we can only guess. Fencing with two nodes requires specific startup configuration for this scenario; see the sketch below. Given that, I presume you can find your issue. Otherwise, post your cluster.conf.
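
    The usual two-node stanza looks like this (a sketch; adjust to your setup):

        <cman two_node="1" expected_votes="1"/>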

    jlc

  • Turns out I spoke too soon.

    Increasing the post_join_delay did at least allow me to restart the cman+clvmd+gfs2+rgmanager services on each node, but if I reboot a node, it will not rejoin the cluster.
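
    For reference, that knob lives in cluster.conf; the value here is illustrative (the stock default is only a few seconds):

        <fence_daemon post_join_delay="60" post_fail_delay="0"/>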

    If I start with both machines up and cman stopped, I can start cman on one node, then the other, and they’ll both join the cluster. After that, I start clvmd, gfs2, and rgmanager on one node, then the other (all in that order), and the gfs2 partition is mounted on both nodes.

    Now, if I reboot one of the two nodes, it will leave the cluster, but when it comes back online it starts cman and hangs forever on fencing. After a while, “dlm closing connection to node 0” and “dlm closing connection to node 1” appear on the console and the system finishes booting. At that point, it is not in the cluster. To recover, I have to stop the cluster services (in reverse order: rgmanager, gfs2, clvmd, cman) on both nodes, then restart cman on both nodes and proceed with the rest of the services.
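
    In other words, the only sequence that gets the cluster back (init-script names as on a stock 5.x install) is:

        # on both nodes, tear down in reverse order
        service rgmanager stop
        service gfs2 stop
        service clvmd stop
        service cman stop

        # then on one node, then the other, in this order
        service cman start
        service clvmd start
        service gfs2 start
        service rgmanager start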

    I should add that ricci is running on both nodes, but I’m not using luci; I configured the setup with system-config-cluster.

    Anyway, I’d appreciate it if anyone could shed light on this. I’m stumped as to why this changed in 5.9, though it may just be my ignorance of what went into this latest release.

    Many thanks,

    …adam
