[Slony1-general] Proposed Failover changes for 2.2

Tue Dec 13 06:32:59 PST 2011

I've been working on improving the reliability of the FAILOVER command 
when multiple nodes fail at the same time.  The changes I've had to make 
have made the failover-time logic very complicated.

An alternative is to restrict the cluster configurations that can be
used with the failover command.

I am proposing something along the lines of:

If you have an origin node that you want to fail over, the failover
command would take a list of the failed nodes.  It would then look for
backup nodes that meet the following criteria:

* The backup node for a given origin is a subscribed forwarder to ALL 
sets that the failed node is an origin for.
* The backup node has bi-directional paths to all nodes that the failed
origin has paths to.
* Any other nodes that are not being fed from one of the potential
failover targets will be dropped.

Out of the nodes that meet the above criteria the failover command would 
then pick the most ahead node and make that the new origin for the sets 
from the failed node.
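
To make the criteria above concrete, here is a rough Python sketch of the
selection step.  It is only a model, not slonik code: the Node class, its
field names, and pick_failover_target are all hypothetical, and the event
numbers in the usage lines are made up.  It also assumes that a candidate
must itself be one of the nodes the failed origin has a path to; the
criteria do not say this explicitly, but Examples 2 and 4 below seem to
imply it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a node; not Slony's actual catalog layout."""
    node_id: int
    origin_of: set = field(default_factory=set)  # sets this node originates
    forwards: set = field(default_factory=set)   # sets subscribed with forwarding on
    paths_to: set = field(default_factory=set)   # node ids this node has a path to
    last_event: int = 0                          # newest event seen from the failed origin


def connect(nodes, a, b):
    """Record a path in both directions between nodes a and b."""
    nodes[a].paths_to.add(b)
    nodes[b].paths_to.add(a)


def has_bidirectional_path(nodes, a, b):
    return b in nodes[a].paths_to and a in nodes[b].paths_to


def pick_failover_target(nodes, origin_id, failed_ids):
    """Return the id of the most-ahead acceptable backup node, or None."""
    origin = nodes[origin_id]
    # Surviving nodes that the failed origin has a path to.
    neighbours = origin.paths_to - set(failed_ids)
    candidates = []
    # Assumption: only the origin's own neighbours are considered as targets.
    for cand_id in sorted(neighbours):
        cand = nodes[cand_id]
        # Criterion 1: forwarding subscriber to ALL sets the origin owns.
        if not origin.origin_of <= cand.forwards:
            continue
        # Criterion 2: bi-directional paths to every other surviving neighbour.
        others = neighbours - {cand_id}
        if all(has_bidirectional_path(nodes, cand_id, o) for o in others):
            candidates.append(cand)
    if not candidates:
        return None  # corresponds to 'node N has no failover targets'
    # Pick the most-ahead candidate (lowest id wins a tie).
    return max(candidates, key=lambda n: n.last_event).node_id


# Example 3 from this post: 1->2->4, 1->3, plus a path between 2 and 3.
cluster = {i: Node(i) for i in (1, 2, 3, 4)}
cluster[1].origin_of = {1}
for i in (2, 3, 4):
    cluster[i].forwards = {1}
for a, b in [(1, 2), (1, 3), (2, 4), (2, 3)]:
    connect(cluster, a, b)
cluster[2].last_event, cluster[3].last_event = 100, 120  # made-up positions
print(pick_failover_target(cluster, 1, [1]))  # node 3 is most ahead here
```

The third criterion (dropping nodes that are not fed from a potential
target) is left out of the sketch, since it happens after a target has
been chosen.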

After the failover command finishes you could then use MOVE SET and 
SUBSCRIBE SET to reshape the cluster as you please.

How would this work:

Example  1:
1---->2

FAILOVER ( node=(id=1));
would fail node 1 over to node 2.

Example 2:
1---->2---->4
|          .
\----3......
(3 and 4 are connected with a PATH but have no subscription using it)

FAILOVER(node=(id=1));

would result in a message such as
'node 1 has no failover targets'

because node 1 has paths to both 2 and 3, but no other node has paths to 
both nodes 2 and 3.

Example 3:
1---->2---->4
|     .
\-----3

FAILOVER(node=(id=1));

slonik would pick either node 2 or node 3 and fail over to it.  It would
pick whichever of the two is most ahead.

Example 4:

1---->2---->4
       |
       v
       3

FAILOVER (node=(id=1), node=(id=2));

Results in 'node 1 has no failover targets'
The above cluster can't survive both node 1 and 2 failing at the same time.

Example 5:

1(set1)----->2(set1)----->4(set1)
| (set2)        .
|               .
V ...............
3 (set1,set2)
|
|
5(set2)

Node 3 is the only acceptable failover target.  Node 4 would be 
unsubscribed or dropped.

Example 6

   |<--------------->4 (set2)
   1(set1)------>2--\
   |
   V       .   7
   3........
   |
5   6

In this example node 4 is the origin for set 2, which it replicates to
node 1.  Node 1 is the origin for set 1.  Nodes 2 and 3 then receive
sets 1 and 2 from node 1.  Node 4 is a subscriber for set 1.

FAILOVER( node=(id=1));
would give node 2 or 3 as a failover target.  Node 4 would be
unsubscribed/dropped from set 1.  It is possible that set 2 would need
to be dropped from all nodes.

I realize that this means some existing clusters will no longer work
with failover, but I doubt that the existing failover code works 100%
of the time for clusters with that type of configuration anyway.

I also think it is safer for slonik to make the most ahead node the new
master and then let you reshape the cluster with MOVE SET.  Today if
additional things go wrong in the middle of a FAILOVER procedure it can 
be very difficult to recover the cluster.  I feel that if we just 
promote the most ahead node to the new master things will be safer.

I am proposing this change for 2.2.  Do any users object to this type of
change?  Is anyone who uses Slony for failover building non-standard
cluster configurations?