Question About Clustering


Hi list, I’m new to clustering and I’m running a little cluster@home. The cluster runs on workstation hardware with CentOS 6.5 and uses corosync, pacemaker, DRBD and pcs. Everything works well. The cluster has the following resources:

1) drbd0
2) drbd1
3) drbd0_fs
4) drbd1_fs
5) pgsql
6) smb + nmb
7) libvirt (lsb)
8) libvirt_guests (lsb)

I have these colocation and ordering constraints (sorry for the format; a sketch of the pcs commands behind them follows the listing):

Ordering Constraints:
promote drbd_ms then start drbd0_fs (Mandatory)
(id:order-drbd_ms-drbd0_fs-mandatory)
promote drbd1_ms then start drbd1_fs (Mandatory)
(id:order-drbd1_ms-drbd1_fs-mandatory)
start drbd1_fs then start pgsql_res (Mandatory)
(id:order-drbd1_fs-pgsql_res-mandatory)
start drbd0_fs then start samba_res (Mandatory)
(id:order-drbd0_fs-samba_res-mandatory)
start samba_res then start nmbd_res (Mandatory)
(id:order-samba_res-nmbd_res-mandatory)
start drbd1_fs then start libvirt_res (Mandatory)
(id:order-drbd1_fs-libvirt_res-mandatory)
start libvirt_res then start libvirt_guest_res (Mandatory)
(id:order-libvirt_res-libvirt_guest_res-mandatory)
Colocation Constraints:
drbd0_fs with drbd_ms (INFINITY) (with-rsc-role:Master)
(id:colocation-drbd0_fs-drbd_ms-INFINITY)
drbd1_fs with drbd1_ms (INFINITY) (with-rsc-role:Master)
(id:colocation-drbd1_fs-drbd1_ms-INFINITY)
drbd1_fs with pgsql_res (INFINITY)
(id:colocation-drbd1_fs-pgsql_res-INFINITY)
drbd0_fs with samba_res (INFINITY)
(id:colocation-drbd0_fs-samba_res-INFINITY)
samba_res with nmbd_res (INFINITY)
(id:colocation-samba_res-nmbd_res-INFINITY)
drbd1_fs with libvirt_res (INFINITY)
(id:colocation-drbd1_fs-libvirt_res-INFINITY)
libvirt_res with libvirt_guest_res (INFINITY)
(id:colocation-libvirt_res-libvirt_guest_res-INFINITY)
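
For reference, these were created with pcs commands along the following lines (reconstructed from the configuration above, so the exact invocations may differ slightly):

  # DRBD master/slave and its filesystem (one pair shown):
  pcs constraint order promote drbd_ms then start drbd0_fs
  pcs constraint colocation add drbd0_fs with master drbd_ms INFINITY
  # a service on top of a filesystem, direction as currently configured:
  pcs constraint order start drbd1_fs then start pgsql_res
  pcs constraint colocation add drbd1_fs with pgsql_res INFINITY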

Today, while starting my “cluster”, I encountered an obscure, not yet identified problem during the start of libvirt_guest_res, and I noticed that drbd1_fs was stopped because libvirt_guest_res had failed; consequently, all the other resources that depend on drbd1_fs were stopped as well.

I suppose this behaviour is related to the colocation constraints (correct me if I’m wrong).

In my case I have:

drbd1_fs with libvirt_res
libvirt_res with libvirt_guest_res
drbd1_fs with pgsql_res

Is there a way to ensure that, if libvirt_guest_res fails, drbd1_fs is not stopped and, with it, all the other resources that depend on drbd1_fs?
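
Reading the pcs documentation, my understanding is that “A with B” places A relative to B, so if B cannot run anywhere then A cannot run either; with “drbd1_fs with libvirt_res” and “libvirt_res with libvirt_guest_res”, the filesystem ends up depending on the guests. If that is right, maybe the direction of these colocations should be reversed; an untested sketch of what I mean:

  # drop the current colocations (ids taken from the listing above)
  pcs constraint remove colocation-drbd1_fs-libvirt_res-INFINITY
  pcs constraint remove colocation-libvirt_res-libvirt_guest_res-INFINITY
  pcs constraint remove colocation-drbd1_fs-pgsql_res-INFINITY
  # recreate them with the services depending on the filesystem, not the reverse
  pcs constraint colocation add libvirt_res with drbd1_fs INFINITY
  pcs constraint colocation add libvirt_guest_res with libvirt_res INFINITY
  pcs constraint colocation add pgsql_res with drbd1_fs INFINITY

Is this the right way to look at it?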

Another question is about fencing. I’ve read that a cluster must have fencing to be considered a real cluster. On CentOS 6.5 there is STONITH, which handles node-level fencing, and for this type of fencing I apparently need iLO, ILOM, DRAC or similar. Is it possible to have fencing without lights-out devices, blade power-control devices and the like? Are there other devices that can be used for fencing? Supposing a two-node cluster built from self-assembled servers (no HP, Dell, Sun, IBM...), how can I implement fencing with STONITH? Can I run a cluster without fencing, and what are the implications of not using it?

Thanks in advance.

Alessandro.

16 thoughts on - Question About Clustering

  • A lot of odd problems go away once fencing is working. So this is a good time to sort this out, then go back and see if your problems remain.

    A very good way to fence machines without IPMI (etc.) is to use an external switched PDU, like the APC AP7900 (or the version for your country).

    http://www.apc.com/resource/include/techspec_index.cfm?base_sku=ap7900

    If your budget is tight, I have seen these models frequently go on sale for ~$200 (Canadian).

    These work by allowing another node to log into the PDU and turn off the power going to the lost/failed node. Please do note that the brand of switched PDU you buy matters. For a device to work for fencing, it must be possible for the cluster to talk to it. This is done using a “fence handler” (fence agent), and many makes are supported (APC, Eaton, etc.). So if you want a different make/model, please first make sure a fence agent exists for it.
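
    As an example, on CentOS 6 with pcs, an APC switched PDU is typically set up roughly like this (the IP, credentials and node-to-outlet mapping below are made up, so adjust them for your hardware):

      pcs stonith create pdu_fence fence_apc \
          ipaddr="192.168.1.10" login="apc" passwd="secret" \
          pcmk_host_map="node1:1;node2:2"
      pcs property set stonith-enabled=true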

    Once you get this setup, see if your problems remain. If so, there is a good clustering mailing list at:

    https://www.redhat.com/mailman/listinfo/linux-cluster

    And if you’re on freenode, #linux-cluster is also very good.

    Cheers!

  • Hi Digimer, is there a way to do fencing without hardware, i.e. with software only?

    On 15/06/2014 17:28, Digimer wrote:

  • No. For fencing to be worthwhile, it *must* work when the node is in any state. For this, it must be independent of node. A great way to see why is to test crashing the node (echo c > /proc/sysrq-trigger) or simply cutting the power to the node. With the node totally disabled, the surviving peer will fail to fence and, not being allowed to make assumptions, block.
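
    A quick way to run that test and watch what happens (the log path assumes CentOS 6 defaults):

      # on the node you are sacrificing for the test:
      echo c > /proc/sysrq-trigger
      # on the surviving node, watch the fence attempt:
      tail -f /var/log/messages | grep -i -e stonith -e fence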

  • The most common fence in TCP-connected systems is to disable the Ethernet ports of the fenced system, done via a ‘smart Ethernet switch’. If you’re using shared fiber storage, then you fence via the fiber switch. A more extreme fence is to switch off the power to the fenced system via a ‘smart PDU’.

  • Disconnecting a lost node from the network is a process called “Fabric Fencing”. It is not the most common, at least not these days. In the past, fencing was done primarily at the fabric switch to disconnect the node from shared storage, however this isn’t as common anymore. These days, as you say, you can do something similar by turning down managed switch ports.

    The main downside to fabric fencing is that the failed node will have no chance of recovering without human intervention. Further, it places the onus on the admin to not simply unfence the node without first doing proper cleanup/recovery. For these reasons, I always recommend power fencing (IPMI, PDUs, etc).

    digimer

  • How does power fencing change your first two statements in any fashion?
    As I see it, it would make manual recovery even harder, as you couldn’t even power up the failed system without first disconnecting it from the network.

    When I have used network fencing, I’ve left the admin ports live; that way, the operator can access the system console to find out WHY it is fubar and put it in a proper state for recovery. Of course, this implies you have several LAN connections, which is always a good idea for a clustered system anyway.

  • Most power fencing methods are set to “reboot”, which is “off -> verify -> try to boot”, with the “try to boot” part not affecting the success of the overall fence call. In my experience (dozens of clusters going back to 2009), this has always left the nodes booted, save for cases where the node itself had totally failed. I also do not start the cluster on boot in most cases, so the node is there and waiting for an admin to log in, in a clean state (no concept of cluster state in memory, thanks to the reboot).
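
    If it helps, the relevant knobs on a pcs-managed CentOS 6 cluster look something like this (service names depend on which stack you run):

      # have fence actions reboot the node rather than leave it off:
      pcs property set stonith-action=reboot
      # keep the cluster from starting automatically at boot:
      chkconfig pacemaker off
      chkconfig corosync off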

    If you’re curious, this is how I build my clusters. It also goes into length on the fencing topology and rationale:

    https://alteeve.ca/w/AN!Cluster_Tutorial_2

    Cheers

  • Digimer wrote:

    One can also set the cluster to fail over and, when the failed node comes back up, *not* try to take back the services, leaving it in a state for you to fix.
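
    In Pacemaker terms that is usually done with resource stickiness, e.g. (a sketch; the value is illustrative):

      # keep services where they are after a failover instead of failing back:
      pcs resource defaults resource-stickiness=INFINITY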

    mark, first work on h/a clusters 1997-2001

  • Failover and recovery are secondary to fencing. The surviving node(s) can’t begin recovery until the lost node is in a known state. To make an assumption about the node’s state (by, for example, assuming that no access to the node is sufficient to determine it is off) is to risk a split-brain. Even something as relatively “minor” as a floating IP can potentially cause problems with ARP, for example.

    Cheers

  • Having operated a file-serving cluster for a few years (~2001-2006) without ANY fencing device, I can tell you that it causes split-brain in the admins too, i.e., I AGREE. Earlier, Alessandro Baggi wrote:
    To which Digimer answered: No.

    However, there is an *almost* software-only fence. Unfortunately for me, I learned about (or at least understood) the stonith devices late in the above system’s life. I expect even meatware stonith[1] could have saved me considerable pain five or six times. Understand that I am not recommending meatware stonith as a good operational stonith device (see [2] for how much subtle understanding the meat has to have), but it would be much better than NO operational stonith device.

    [1] http://clusterlabs.org/doc/crm_fencing.html#_meatware
    [2] http://oss.clusterlabs.org/pipermail/pacemaker/2011-June/010693.html

    Even when this disclaimer is not here:
    I am not a contracting officer. I do not have authority to make or modify the terms of any contract.

  • To which I can use the analogy that in the 18 years I’ve driven a car, I’ve never needed my seat belt or airbags. I still put my seatbelt on every time I go anywhere though, and I won’t buy a car without airbags. ;)

    If your goal is high availability, there is a strong argument that “almost” isn’t enough.

    Manual fencing was dropped as a supported fence method in RHEL 6 because it was too prone to human mistakes. When an HA cluster is hung and an admin who might not have touched the cluster in months has users and managers yelling at them, mistakes with potentially massive consequences happen.

    Manual fencing is just not safe.

    Bingo on the meat, disagree on “no stonith” at all. A cluster must have fencing.

    Cheers

  • On 17/06/2014 16:32, Digimer wrote:

    OK, fencing is a requirement for a cluster to handle hardware failure. I have another question on this topic, but about software failure. Suppose I have a cluster of httpd installations on 6 virtualized hosts, each one on a different physical server. Suppose also that a guest (named host6) has a problem and can’t start Apache. In this scenario IPMI and UPS devices seem unnecessary. How does fencing work in this case? How do I fence the node?

    Thanks in advance.

    Alessandro.

  • That’s not a high-availability cluster, that’s just 6 different servers.

    I totally don’t understand your statement that IPMI and UPS are unnecessary, unless you mean for external reasons, such as none of this being important.

  • I’m not sure I understand properly… You mean that you have 6 VMs which are nodes in a cluster, or 6 nodes, each hosting a VM you want to make HA?

    If the VMs are themselves nodes, then you could use fence_virt as the primary fence method, which would power off/on the VM itself by talking to the hypervisor on the host. If that fails (say, because the host itself fails), then use IPMI- or PDU-based fencing as a backup, in which case the host itself would be powered off/on (thus ensuring the VM on the host is down as well).
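
    A rough sketch of the first case with pcs, using the fence_xvm agent to talk to fence_virtd on the host (this assumes fence_virtd is configured on the host and its key is distributed to the guests; “host6” is just the guest from your example):

      pcs stonith create fence_host6 fence_xvm \
          port="host6" pcmk_host_list="host6"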

    If the hardware servers themselves are the nodes, then the VMs are not a factor in the equation. Services, including VMs, will recover on another host after the fence has succeeded.

  • Digimer wrote:

    I’m not clear on what you mean, either. Is this supposed to be a load-balancing cluster? If so, and you expect that kind of load, I personally would *never* put a VM on them – I’d want the full resources of the OS brought to bear on that load, and use multiple real machines for the other members. Doing this with VMs only multi-threads the load, and adds more of it with all the context switches.

    mark

  • My interpretation mirrors Mark’s … it sounds like you want to load balance between the six servers. If so, you might create a proxy with haproxy [0] [1] that will distribute connections across your six nodes.

    [0] http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
    [1] http://www.rackspace.com/knowledge_center/article/setting-up-haproxy
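
    A minimal haproxy snippet for that idea might look like the following (names and addresses are invented; one “server” line per node):

      listen web_cluster 0.0.0.0:80
          balance roundrobin
          server host1 192.168.0.11:80 check
          server host2 192.168.0.12:80 check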

    Makes sense in terms of hardware redundancy; if it’s OS-level redundancy you’re after, then this could fly, _BUT_ it isn’t as ideal as having separate hardware. If Alessandro’s VMs are clustered between a pair of nodes with shared/replicated storage, then I’d say it is a realistic scenario. Layered redundancy for the win! (akin to layered security)

    Rackspace has haproxy as part of their cloud offerings [2], albeit likely with the resources not on the same single piece of hardware!