[Slony1-general] Slony 2.2 failover changes

Wed Apr 30 08:57:34 PDT 2014

On 04/28/2014 09:56 AM, Glyn Astill wrote:

This sounds like a bug. What does your path network look like?
Do you have paths between all nodes in both directions, or something else?

Does it happen every time you test, or only sometimes?
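
For reference, a path in each direction between the backup node and the
surviving subscriber would be declared roughly as in the sketch below. This
assumes the conninfo strings from your script (node 1 on port 5432, node 4 on
port 5435). Whether a missing path between nodes 1 and 4 has anything to do
with the hang is only a guess at this point, not a diagnosis.

```
CLUSTER NAME = test_replication;
NODE 1 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5432 user=slony';
NODE 4 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5435 user=slony';

# CONNINFO is the string the CLIENT node uses to reach the SERVER node.
# Path that lets node 4 connect to node 1:
STORE PATH (SERVER = 1, CLIENT = 4,
    CONNINFO = 'dbname=TEST host=localhost port=5432 user=slony');
# Path that lets node 1 connect to node 4:
STORE PATH (SERVER = 4, CLIENT = 1,
    CONNINFO = 'dbname=TEST host=localhost port=5435 user=slony');
```

If both paths already exist in your cluster, then this is not the problem and
the script above changes nothing useful.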

> Hi All,
>
> I'm testing the changes to failover in 2.2.2 and seem to be running into
> issues passing multiple nodes to failover.  In the following scenario
> with 4 nodes, node 2 is the origin of all sets and node 3 is a
> forwarding provider to node 4, i.e.
>
> 1 <---- 2 ----> 3 ----> 4
>
> I'm attempting to fail over in a scenario where both nodes 2 and 3 have
> failed, so postgres is stopped for both of those nodes.  I'm running the
> following script:
>
> CLUSTER NAME = test_replication;
> NODE 1 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5432 user=slony';
> NODE 2 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5433 user=slony';
> NODE 3 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5434 user=slony';
> NODE 4 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5435 user=slony';
> FAILOVER (
>      NODE = (ID = 2, BACKUP NODE = 1),
>      NODE = (ID = 3, BACKUP NODE = 1)
> );
>
> However it would appear that slonik will wait indefinitely for node 4 to
> catch up via failed node 3:
>
> $ slonik test.scr
> test.scr:3: could not connect to server: Connection refused
>          Is the server running on host "localhost" (127.0.0.1) and accepting
>          TCP/IP connections on port 5433?
> test.scr:4: could not connect to server: Connection refused
>          Is the server running on host "localhost" (127.0.0.1) and accepting
>          TCP/IP connections on port 5434?
> executing preFailover(2,1) on 1
> NOTICE: executing "_test_replication".failedNode2 on node 1
> test.scr:6: NOTICE:  calling restart node 2
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157). node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
> test.scr:6: waiting for event (1,5000000157).  node 4 only on event
> 5000000156
>
> It'll only complete if I bring node 3 back up, which of course I
> couldn't do if it was really dead:
>
> NOTICE: executing "_test_replication".failedNode3 on node 1
>
> Have I totally got the wrong end of the stick here?
>
> Thanks
> Glyn