Thomas Draband (madcow)
Thu Jun 24 08:35:13 PDT 2004
Thomas Draband (madcow) wrote:
> Jan Wieck wrote:
> 
>> On 6/18/2004 11:05 AM, Thomas Draband (madcow) wrote:
>>
>>> I've tested Slony-I and found that with lock set and move set I can 
>>> switch the master to a forwarding slave. But this only works if the 
>>> current master is running. If the master crashes, I'm not able to switch a 
>>> slave to master this way, because slonik wants to connect to the master. 
>>> How can I do that?
>>
>> The procedure to abandon a failed master (failover) is different from 
>> what you did (switchover). I am currently configuring a couple of test 
>> systems here and will dig deep into failover as soon as I have them up 
>> and running. Stay tuned.
>>
>>
>> Jan
>>
> What is the procedure for failover? Will the slon processes propagate 
> the new master automatically? I think nothing happens after the master 
> DB goes down. In my Slony cluster there was no way of doing updates in 
> a set until the master DB came up again.
> 
> I have two nodes. The master has the virtual IP at eth0:0. I thought I 
> could watch the postgresql processes on the nodes and, on a mond 
> alert, switch the set origin and the virtual IP over to the other 
> node.
> 

Some details of my test scenario:

#!/bin/bash

/usr/local/pgsql/bin/slonik <<EOF

cluster name = test;

node 1 admin conninfo = 'host=tigger port=5433 dbname=db1 user=slony';
node 2 admin conninfo = 'host=pooh port=5433 dbname=db1 user=slony';

init cluster (id = 1, comment = 'tigger');

store node (id = 2, comment = 'pooh');

store path (server = 1, client = 2, conninfo = 'host=tigger port=5433 dbname=db1 user=slony');
store path (server = 2, client = 1, conninfo = 'host=pooh port=5433 dbname=db1 user=slony');

store listen (origin = 1, provider = 1, receiver = 2);
store listen (origin = 2, provider = 2, receiver = 1);

create set (id = 1, origin = 1, comment = 'test');
set add table (set id = 1, origin = 1, id = 1, fully qualified name = 'public.t1');
set add sequence (set id = 1, origin = 1, id = 1, fully qualified name = 'public.seq_t1_id');

subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);

EOF

Replication works fine: an update on node 1 (tigger) propagates to node 
2 (pooh), while updates made directly on node 2 (pooh) are rejected.
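For reference, the switchover that worked for me while the master was still 
running was essentially the following slonik script. I'm reconstructing it 
from memory, so take it as a sketch, not the exact script I ran:

/usr/local/pgsql/bin/slonik <<EOF
cluster name = test;
node 1 admin conninfo = 'host=tigger port=5433 dbname=db1 user=slony';
node 2 admin conninfo = 'host=pooh port=5433 dbname=db1 user=slony';

# block changes to set 1 on the current origin
lock set (id = 1, origin = 1);
# wait until node 2 has confirmed everything up to the lock
wait for event (origin = 1, confirmed = 2);
# hand the origin of set 1 over to node 2
move set (id = 1, old origin = 1, new origin = 2);
EOF

Note that this needs working connections to both nodes, which is exactly why 
it cannot help after a crash of node 1.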
According to the Slony-I concept, on a failure of the origin (node 1, 
tigger) a subscriber will be promoted to the new origin. I stopped 
postgresql on node 1 (tigger) in immediate mode. The slon process on 
node 1 reports the following:

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back 
the current transaction and exit, because another server process exited 
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.
FATAL  syncThread: "start transaction;set transaction isolation level 
serializable;select last_value from "_test".sl_action_seq;" - WARNING: 
terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back 
the current transaction and exit, because another server process exited 
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.
FATAL  localListenThread: cannot start transaction - WARNING: 
terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back 
the current transaction and exit, because another server process exited 
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and 
repeat your command.
FATAL  cleanupThread: "select "_test".cleanupEvent();" - INFO 
remoteListenThread_2: disconnecting from 'host=pooh port=5433 dbname=db1'

So I expected node 2 (pooh) to become the origin of set 1. But node 2 
only tries to reconnect to node 1 every 10 seconds; set 1 stays locked 
on node 2, and no origin promotion occurs.

What's wrong with my configuration? Or is failover of the origin not 
yet implemented?
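If failover is implemented as a slonik command, I would guess it has to work 
without connecting to the failed node, something like the following. This is 
pure speculation on my part; the command name and parameters are assumed, 
not taken from any documentation:

/usr/local/pgsql/bin/slonik <<EOF
cluster name = test;
# only the surviving node needs to be reachable
node 2 admin conninfo = 'host=pooh port=5433 dbname=db1 user=slony';
node 1 admin conninfo = 'host=tigger port=5433 dbname=db1 user=slony';

# promote node 2 to origin of the sets that originated on node 1
failover (id = 1, backup node = 2);
# afterwards the failed node would be removed from the configuration
drop node (id = 1, event node = 2);
EOF

Is something along these lines planned, or does failover require manual 
surgery on the Slony catalogs?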