Fencing

Fencing is the process of preventing resources from running unless the cluster has a consistent view of its state. It also helps return the cluster to a known state by controlling the power state of cluster nodes, or by other methods. As mentioned previously, if the cluster can't be sure of the current state of a node, services cannot simply be restarted on other nodes, as there is no way of knowing whether the affected node is actually dead or is still running those resources. Unless configured to risk data consistency, the cluster will wait indefinitely until it can be sure of the state; unless the affected node returns by itself, the cluster resources will remain offline.
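The waiting behaviour described above can be sketched as a small decision function. This is an illustrative model only, not the logic of any particular cluster manager; the names NodeState and may_recover_resources, and the risk_data_consistency flag, are hypothetical and chosen for this example.

```python
from enum import Enum


class NodeState(Enum):
    """Hypothetical view the cluster holds of a single node."""
    ONLINE = "online"                  # node is reachable and healthy
    CONFIRMED_DOWN = "confirmed_down"  # node is known to be powered off
    UNKNOWN = "unknown"                # unreachable: dead, or still running?


def may_recover_resources(state: NodeState,
                          risk_data_consistency: bool = False) -> bool:
    """Return True if this node's resources may be started on another node."""
    if state is NodeState.CONFIRMED_DOWN:
        # The node cannot be running anything, so recovery is safe.
        return True
    if state is NodeState.UNKNOWN:
        # Without confirmation the cluster keeps waiting, unless it has
        # been explicitly configured to accept the risk.
        return risk_data_consistency
    # The node is online and still owns its resources.
    return False
```

With the default setting, a node in the UNKNOWN state blocks recovery for as long as it stays in that state, which is the "resources remain offline" situation described above.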

The solution is to employ fencing with a method such as Shoot The Other Node In The Head (STONITH), which is designed to return the cluster to a normal state by manipulating an external control mechanism. The most popular approach is to use a server's built-in IPMI functionality to power cycle the node. As the server's IPMI is external to the operating system and is usually connected to a different network than the server's LAN, it is highly unlikely to be affected by whatever has caused the server to appear offline. By power cycling the server and getting confirmation from IPMI that this has happened, the cluster can be certain that the cluster resources are no longer running on that node. The cluster can then safely restart the resources on other nodes, without risk of conflict or corruption.
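The STONITH sequence described above (power cycle, wait for confirmation, then recover) can be sketched as follows. This is a minimal model under stated assumptions: PowerController, FakeIpmi, and fence_and_recover are hypothetical names, and FakeIpmi is an in-memory stand-in that does not talk to real IPMI hardware.

```python
from typing import List, Protocol


class PowerController(Protocol):
    """Hypothetical interface to an out-of-band power control mechanism."""

    def power_cycle(self, node: str) -> bool:
        """Power cycle the node; return True only if this is confirmed."""
        ...


class FakeIpmi:
    """In-memory stand-in for an IPMI interface, for illustration only."""

    def __init__(self, reachable: bool = True) -> None:
        self.reachable = reachable
        self.cycled: List[str] = []

    def power_cycle(self, node: str) -> bool:
        if not self.reachable:
            # No confirmation can be given if the controller is unreachable.
            return False
        self.cycled.append(node)
        return True


def fence_and_recover(node: str, resources: List[str],
                      controller: PowerController) -> List[str]:
    """Return the resources that may now be restarted on other nodes."""
    if not controller.power_cycle(node):
        # Fencing was not confirmed, so nothing may be moved.
        return []
    # Power cycle confirmed: the node can no longer be running these.
    return list(resources)
```

The key design point is that recovery depends on the confirmation returned by the power controller, not on the node merely appearing to be offline.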