Steve Singer ssinger at ca.afilias.info
Thu Jan 5 07:13:44 PST 2012
Attached is an updated patch for the FAILOVER work that I have been doing.

This is a major overhaul of how failover works to address concerns with

a) Multiple nodes failing at once
b) Failing over to a non-direct subscriber of the failed node

The major changes can be summarized as follows:

1) The FAILOVER command will now look at the view sl_failover_targets
to find nodes that are acceptable failover candidates.  A failover
candidate must be directly subscribed to all sets originating on the
failed node and must have direct paths to any receivers of the failed
node.  The FAILOVER command will then fail over to the most-ahead node
on that list (see the sketch after this list); later, the FAILOVER
command performs a MOVE SET to make the desired backup node the new
master.

2) Any nodes that are further ahead than the most-ahead failover
candidate will be recursively unsubscribed from the sets originating on
the failed node.

3) The failover process no longer fakes an event from the failed
origin.  Instead, a FAILOVER_NODE event is generated from the temporary
backup node and is processed by all other nodes.
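
For illustration, the "most ahead" comparison in (1) and (2) can be
thought of in terms of the standard sl_confirm catalog: the node that
has confirmed the highest event sequence number from the failed origin
is the furthest ahead.  This is a sketch only (the patch has its own
implementation, and the shape of sl_failover_targets is defined in the
patch, not shown here); node 1 is a made-up failed origin and
"_mycluster" a made-up cluster schema:

    -- Rank receivers by the highest event they have confirmed from
    -- the failed origin (node 1).  sl_confirm is the standard
    -- Slony-I confirm catalog.
    SELECT con_received AS node_id,
           max(con_seqno) AS last_confirmed_event
      FROM "_mycluster".sl_confirm
     WHERE con_origin = 1
     GROUP BY con_received
     ORDER BY last_confirmed_event DESC;

Restricting node_id to the candidates from sl_failover_targets gives
the most-ahead candidate of (1); non-candidate nodes ranked above it
are the ones that (2) unsubscribes.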

This version of the patch addresses Chris's concern that the
post-failover origin be deterministic by having slonik perform the
MOVE SET (and it actually uses the MOVE_SET code) to leave you with
the new origin that you have chosen.
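
In slonik terms the user-visible piece stays a single command.  A
sketch only, with made-up IDs (node 1 is the failed origin, node 3 the
backup you have chosen) and the cluster preamble omitted:

    # Sketch only -- cluster name and admin conninfo lines omitted.
    # With this patch, if node 3 is not the most-ahead candidate,
    # FAILOVER first fails over to the most-ahead candidate and then
    # uses the MOVE_SET machinery to make node 3 the new origin.
    FAILOVER (ID = 1, BACKUP NODE = 3);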

This version of the patch is open to the criticism that you might have
a forwarder that does not meet the criteria specified in (1) yet is
further ahead than all of the nodes that do meet them.  That forwarder,
and any nodes it provides to, will be unsubscribed.  I argue that you
can avoid this by designing your clusters so that they contain no nodes
that fail to meet the criteria of (1); see the sketch below.  I am of
the opinion that keeping those nodes subscribed makes the failover code
far too complex.  I will listen to/read arguments that this is not the
case.
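
The design rule can be stated in slonik terms: give every forwarder a
direct subscription to every set originating on the node you might
fail over from.  A sketch only, with made-up IDs (two sets on origin
node 1, forwarders 2 and 3) and the preamble and STORE PATH commands
omitted:

    # Sketch only -- every forwarding node subscribes directly to
    # every set on origin 1, so each forwarder meets the criteria
    # of (1).  Direct paths between the forwarders and all other
    # receivers (STORE PATH) are assumed.
    SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 2, FORWARD = YES);
    SUBSCRIBE SET (ID = 2, PROVIDER = 1, RECEIVER = 2, FORWARD = YES);
    SUBSCRIBE SET (ID = 1, PROVIDER = 1, RECEIVER = 3, FORWARD = YES);
    SUBSCRIBE SET (ID = 2, PROVIDER = 1, RECEIVER = 3, FORWARD = YES);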


Outstanding issues:
-------------------
* I still have to update the documentation
* More test cases to verify states?
* More test cases to force different cluster ahead states?

https://github.com/ssinger/slony1-engine/tree/multi_node_limited



-------------- next part --------------
A non-text attachment was scrubbed...
Name: failover.jan5.diff.gz
Type: application/x-gzip
Size: 26322 bytes
Desc: not available
Url: http://lists.slony.info/pipermail/slony1-hackers/attachments/20120105/be5c0337/attachment-0001.bin

