[Slony1-general] Re: Slony1-1.0.5 Failover does not work - replication set isn't being moved
Tue Oct 4 22:35:57 PDT 2005
The problem persists after the node IDs were changed from [1, 2, 3] to [10, 20, 30]. Inside gdb, the failedNode2 query did not return an error (the function return value was 0). Node 2 was able to move the set_origin to node 3. Node 3 is stuck with set_origin = node 1.
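One way to confirm what each node currently believes is to query the Slony-I catalog tables directly on every node. A minimal sketch, assuming the cluster schema is named _mycluster (the real schema is "_" followed by your cluster name, which is an assumption here, not taken from this setup):

    -- Run against each node's database; _mycluster is an assumed schema name.
    -- Which node is the origin of each replication set:
    SELECT set_id, set_origin FROM _mycluster.sl_set;

    -- Who receives each set, and from which provider:
    SELECT sub_set, sub_provider, sub_receiver, sub_active
      FROM _mycluster.sl_subscribe;

If sl_set on node 3 still reports the old node as set_origin after the failover, the failover never actually took effect there, which matches the symptom above.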
On 10/4/05, Fiel Cabral <e4696wyoa63emq6w3250kiw60i45e1 at gmail.com> wrote:
>
> Thanks Elein. I'll run gdb and step through slonik_failed_node to (maybe)
> see if failedNode2 is failing.
>
> On 10/4/05, elein <elein at varlena.com> wrote:
> >
> > Fiel,
> >
> > In my own tests, with node 10->20->30, failover from 10 to 20 failed
> > because node 30 was unusable and had to be recreated from scratch.
> > This is a serious bug in my book.
> >
> > In one case the problem seemed to be dropping the first node
> > "too soon". I have not tested that case so I don't know that
> > this was the problem.
> >
> > What I have verified is that the third node never received any message
> > regarding the failover and did not change its information
> > to get its table set from the new origin, 20.
> >
> > Also, try not to use Node 1, 2, 3. Node 1 has some special meaning
> > in some cases that you will want to avoid.
> >
> > We are with you, not ignoring you.
> >
> > --elein
> >
> > On Tue, Oct 04, 2005 at 11:13:19AM -0400, Fiel Cabral wrote:
> > > Right after running the failover command I issue the DROP NODE command
> > > to drop node 1. slonik prints an error message and exits with return
> > > value 12:
> > >
> > > sys:17: TRY: drop node
> > > sys:19: PGRES_FATAL_ERROR select "_whatever".dropNode(1); - ERROR:
> > > Slony-I: Node 1 is still origin of one or more sets
> > >
> > > Something should have changed the origin to node 3 but it isn't
> > > happening.
> > >
> > > On 10/4/05, Fiel Cabral <e4696wyoa63emq6w3250kiw60i45e1 at gmail.com> wrote:
> > >
> > > I have 3 nodes. Nodes 2 and 3 are subscribers of node 1 and I'm trying
> > > to failover from node 1 to node 3. The failover command succeeds but
> > > the database of node 3 is still read-only and the origin is still
> > > node 1. I don't have the same problem when doing failover with only
> > > two nodes because the set is moved immediately by failedNode.
> > >
> > > failedNode (in the code below) is able to set the provider
> > > successfully.
> > >
> > > Some code elsewhere is actually moving the replication set. Where is
> > > that code? Is it in slon or slonik or in the sql scripts?
> > >
> > > How do I find out that slon caught the signal and is doing the right
> > > thing in response to the signal?
> > >
> > > 784     raise notice ''failedNode: set % has other direct receivers - change providers only'', v_row.set_id;
> > > 785     -- ----
> > > 786     -- Backup node is not the only direct subscriber. This
> > > 787     -- means that at this moment, we redirect all direct
> > > 788     -- subscribers to receive from the backup node, and the
> > > 789     -- backup node itself to receive from another one.
> > > 790     -- The admin utility will wait for the slon engine to
> > > 791     -- restart and then call failedNode2() on the node with
> > > 792     -- the highest SYNC and redirect this to it on
> > > 793     -- backup node later.
> > > 794     -- ----
> > > ... etc ...
> > > 811
> > > 812     -- ----
> > > 813     -- Make sure the node daemon will restart
> > > 814     -- ----
> > > 815     notify "_@CLUSTERNAME@_Restart";
> > > 816
> > >
> > > -Fiel
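For reference, a rough slonik sketch of the sequence discussed in the quoted messages: failover from node 1 to backup node 3, then dropping the failed node. The cluster name and conninfo strings below are placeholders, not taken from the original setup:

    cluster name = mycluster;
    # Placeholder connection strings for the three nodes.
    node 1 admin conninfo = 'dbname=mydb host=host1 user=slony';
    node 2 admin conninfo = 'dbname=mydb host=host2 user=slony';
    node 3 admin conninfo = 'dbname=mydb host=host3 user=slony';

    # Promote node 3 to be the origin of every set node 1 originated.
    failover (id = 1, backup node = 3);

    # Drop the failed node only once the origin has actually moved;
    # event node must be a surviving node, not the one being dropped.
    drop node (id = 1, event node = 3);

The "Node 1 is still origin of one or more sets" error quoted above is what drop node reports when that move has not happened yet, which is exactly the symptom being chased in this thread.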