[Slony1-general] Re: Slony1-1.0.5 Failover does not work

Tue Oct 4 18:12:25 PDT 2005

Fiel,

In my own tests, with nodes 10->20->30, failover from 10 to 20 failed
because node 30 was left unusable and had to be recreated from scratch.
This is a serious bug in my book.

In one case the problem seemed to be dropping the first node
"too soon". I have not tested that case, so I can't confirm that
this was the cause.

What I have verified is that the third node never received any message
about the failover and did not update its configuration
to get its table set from the new origin, 20.
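One way to spot a node that missed the failover is to compare every node's own view of the set origin before dropping the old origin. This is a minimal sketch, assuming you have already collected, per node, a mapping of set ID to origin (for example by reading each node's `sl_set` table); the function name and sample data are hypothetical. In the 10->20->30 case, node 30 would be reported as stale, and the old origin should not be dropped until the list is empty.

```python
from typing import Dict, List


def stale_nodes(views: Dict[int, Dict[int, int]], set_id: int,
                new_origin: int) -> List[int]:
    """Return the nodes whose catalog does not yet show new_origin for set_id.

    views maps node id -> {set_id: origin node}, as read on that node.
    How the views are gathered (e.g. from sl_set) is an assumption.
    """
    return sorted(node for node, sets in views.items()
                  if sets.get(set_id) != new_origin)


# Hypothetical data for the 10->20->30 test: node 30 still names 10 as origin.
print(stale_nodes({20: {1: 20}, 30: {1: 10}}, set_id=1, new_origin=20))
```

Polling this until it returns an empty list is one way to avoid dropping the failed node "too soon".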

Also, try not to use node IDs 1, 2, and 3. Node 1 has a special meaning
in some cases that you will want to avoid.

We are with you, not ignoring you.  

--elein 

On Tue, Oct 04, 2005 at 11:13:19AM -0400, Fiel Cabral wrote:
> Right after running the failover command I issue the DROP NODE command to drop
> node 1. slonik prints an error message and exits with return value 12:
> 
> sys:17: TRY: drop node
> sys:19: PGRES_FATAL_ERROR select "_whatever".dropNode(1);  - ERROR:  Slony-I: Node 1 is still origin of one or more sets
> 
> Something should have changed the origin to node 3, but it isn't happening.
> 
> 
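The error above comes from a guard in `dropNode()`: while the node being dropped is still recorded as the origin of some set, the drop is refused. A minimal model of that guard, assuming a plain set-ID-to-origin mapping (the function names are illustrative, not Slony-I's API), shows why DROP NODE keeps failing until the origin has actually moved away from node 1:

```python
from typing import Dict, List


def sets_originating_at(node: int, set_origins: Dict[int, int]) -> List[int]:
    """Set IDs whose origin is still `node` (set_origins: set_id -> origin)."""
    return sorted(s for s, origin in set_origins.items() if origin == node)


def drop_node(node: int, set_origins: Dict[int, int]) -> None:
    """Hypothetical model of the check dropNode() makes before dropping."""
    if sets_originating_at(node, set_origins):
        raise RuntimeError(
            f"Slony-I: Node {node} is still origin of one or more sets")
    # In this model nothing else happens; the real function goes on to
    # remove the node's configuration.
```

With `{1: 1}` (set 1 still originating at node 1, as in the report) `drop_node(1, ...)` raises; once the catalog reads `{1: 3}` it passes.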
> On 10/4/05, Fiel Cabral <e4696wyoa63emq6w3250kiw60i45e1 at gmail.com> wrote:
> 
>     I have 3 nodes. Nodes 2 and 3 are subscribers of node 1, and I'm trying to
>     fail over from node 1 to node 3. The failover command succeeds, but the
>     database of node 3 is still read-only and the origin is still node 1. I
>     don't have the same problem when doing failover with only two nodes, because
>     the set is moved immediately by failedNode.
> 
>     failedNode (in the code below) is able to set the provider successfully.
> 
>     Some code elsewhere is actually moving the replication set. Where is that
>     code? Is it in slon or slonik or in the sql scripts?
> 
>     How do I find out that slon caught the signal and is doing the right thing
>     in response to the signal?
> 
>         raise notice ''failedNode: set % has other direct receivers - change providers only'', v_row.set_id;
>                         -- ----
>                         -- Backup node is not the only direct subscriber. This
>                         -- means that at this moment, we redirect all direct
>                         -- subscribers to receive from the backup node, and the
>                         -- backup node itself to receive from another one.
>                         -- The admin utility will wait for the slon engine to
>                         -- restart and then call failedNode2() on the node with
>                         -- the highest SYNC and redirect this to it on
>                         -- backup node later.
>                         -- ----
>     ... etc ...
>
>         -- ----
>         -- Make sure the node daemon will restart
>         -- ----
>         notify "_@CLUSTERNAME@_Restart";
>
> 
>     -Fiel
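The failedNode comment describes two branches. The following simplified model of that comment uses one set and a receiver-to-provider map; all names are hypothetical and it is not Slony-I's code. It suggests why Fiel's setup behaves differently from the two-node case: with nodes 2 and 3 both subscribed directly to node 1, the set is not moved at once. Subscribers are only re-pointed, and the origin change would wait for failedNode2() on the node with the highest SYNC after slon restarts. That would be consistent with node 1 still showing as the origin.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FailoverPlan:
    move_set_now: bool                      # True: backup becomes origin at once
    new_providers: Dict[int, int] = field(default_factory=dict)  # receiver -> provider
    failed_node2_on: Optional[int] = None   # node with the highest SYNC, if deferred


def plan_failover(subscriptions: Dict[int, int], failed: int, backup: int,
                  last_sync: Dict[int, int]) -> FailoverPlan:
    """subscriptions maps receiver -> provider for one set (hypothetical model)."""
    direct = [r for r, p in subscriptions.items() if p == failed]
    # Assumption of this model: the backup subscribes directly to the failed node.
    if backup not in direct:
        raise ValueError("backup node must subscribe directly to the failed node")
    if direct == [backup]:
        # Only direct receiver: the set can move to the backup right away.
        return FailoverPlan(move_set_now=True)
    others = [r for r in direct if r != backup]
    # Redirect every other direct receiver to the backup node ...
    providers = {r: backup for r in others}
    # ... and let the backup node itself receive from another direct receiver.
    providers[backup] = max(others, key=lambda n: last_sync.get(n, 0))
    # The set move is deferred to failedNode2() on the receiver furthest ahead.
    ahead = max(direct, key=lambda n: last_sync.get(n, 0))
    return FailoverPlan(move_set_now=False, new_providers=providers,
                        failed_node2_on=ahead)
```

For example, `plan_failover({2: 1, 3: 1}, failed=1, backup=3, last_sync={2: 120, 3: 100})` (SYNC numbers made up) yields `move_set_now=False` and defers to node 2. `plan_failover({2: 1}, failed=1, backup=2, last_sync={2: 5})` moves the set immediately, as in the two-node case.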
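On the question of whether the restart signal was sent: `notify "_@CLUSTERNAME@_Restart"` is an ordinary PostgreSQL NOTIFY, so any session can LISTEN on the same channel. This sketch assumes the cluster name is `whatever` (taken from the `"_whatever".dropNode(1)` error above), a reachable DSN, and the third-party psycopg2 driver. The function names are hypothetical. Receiving the notification only proves that it was issued and committed. Whether slon acted on it still has to be read from slon's own log.

```python
import select


def restart_channel(cluster: str) -> str:
    # "_@CLUSTERNAME@_Restart" with the placeholder replaced by the cluster name.
    return f"_{cluster}_Restart"


def wait_for_restart(dsn: str, cluster: str, timeout: float = 60.0) -> bool:
    """Return True if a Restart notification arrives within `timeout` seconds."""
    # Third-party driver, imported here so restart_channel() works without it.
    import psycopg2
    import psycopg2.extensions

    conn = psycopg2.connect(dsn)
    try:
        # LISTEN must not sit inside an open transaction, so use autocommit.
        conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
        with conn.cursor() as cur:
            # Quote the channel: it is a case-sensitive identifier.
            cur.execute(f'LISTEN "{restart_channel(cluster)}";')
        if select.select([conn], [], [], timeout) == ([], [], []):
            return False  # timed out without a notification
        conn.poll()
        return any(n.channel == restart_channel(cluster) for n in conn.notifies)
    finally:
        conn.close()
```

Start the listener before running the failover script, e.g. `wait_for_restart("dbname=node3", "whatever", timeout=120)`, where the DSN is a placeholder.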

> _______________________________________________
> Slony1-general mailing list
> Slony1-general at gborg.postgresql.org
> http://gborg.postgresql.org/mailman/listinfo/slony1-general