Cezary Bołdak echo
Wed Apr 6 12:50:10 PDT 2005
In reponse to 
http://gborg.postgresql.org/pipermail/slony1-general/2005-January/001493.html

I have similar problem with the failover problem described as above.

I have tried another thing (it also crashes):

*** 4 nodes in a cascade: 1<-2<-3<-4

*** 'postmaster' running locally under 'postgres' account and using the 
Unix sockets ('postmaster -D /var/local/pgsql/data')

*** created 4 databases as (from 'postgres' account also):
    createdb stats1
    createlang plpgsql stats1
    psql stats1 << _EOF_
        CREATE TABLE stats_test(ID serial PRIMARY KEY, name varchar(32) );
    _EOF_
 and son on for 'stats2', 'stats3', 'stats4'

*** then:
slonik << _EOF_
    cluster name = stats;
    node 1 admin conninfo = 'dbname=stats1';
    node 2 admin conninfo = 'dbname=stats2';
    node 3 admin conninfo = 'dbname=stats3';
    node 4 admin conninfo = 'dbname=stats4';

    init cluster (id=1, comment='Master node');

    create set (id=1, origin=1, comment='Test set');

    set add table (set id =1, origin =1, id=1, fully qualified name = 
'public.stats_test', comment = 'test table');

    store node (id=2, comment = '1st slave');
    store node (id=3, comment = '2nd slave');
    store node (id=4, comment = '3nd slave');

    store path (server = 1, client = 2, conninfo='dbname=stats1');
    store path (server = 1, client = 3, conninfo='dbname=stats1');
    store path (server = 1, client = 4, conninfo='dbname=stats1');

    store path (server = 2, client = 1, conninfo='dbname=stats2');
    store path (server = 2, client = 3, conninfo='dbname=stats2');
    store path (server = 2, client = 4, conninfo='dbname=stats2');

    store path (server = 3, client = 1, conninfo='dbname=stats3');
    store path (server = 3, client = 2, conninfo='dbname=stats3');
    store path (server = 3, client = 4, conninfo='dbname=stats3');

    store path (server = 4, client = 1, conninfo='dbname=stats4');
    store path (server = 4, client = 2, conninfo='dbname=stats4');
    store path (server = 4, client = 3, conninfo='dbname=stats4');

# cascade
    store listen (receiver=1, origin=2, provider=2);
    store listen (receiver=1, origin=3, provider=2);
    store listen (receiver=1, origin=4, provider=2);

    store listen (receiver=2, origin=1, provider=1);
    store listen (receiver=2, origin=3, provider=3);
    store listen (receiver=2, origin=4, provider=3);

    store listen (receiver=3, origin=1, provider=2);
    store listen (receiver=3, origin=2, provider=2);
    store listen (receiver=3, origin=4, provider=4);

    store listen (receiver=4, origin=1, provider=3);
    store listen (receiver=4, origin=2, provider=3);
    store listen (receiver=4, origin=3, provider=3);

_EOF_

*** then 4 times :'slon stats "dbname=stats1' and son on for 'stats2' 
'stats3' 'stats4' in a separate console each one under 'postgres' accont.

*** then I start the replication:
slonik << _EOF_
    cluster name = stats;

    node 1 admin conninfo = 'dbname=stats1';
    node 2 admin conninfo = 'dbname=stats2';
    node 3 admin conninfo = 'dbname=stats3';
    node 4 admin conninfo = 'dbname=stats4';
# cascade
    subscribe set (id=1, provider=1, receiver=2, forward=true);
    subscribe set (id=1, provider=2, receiver=3, forward=true);
    subscribe set (id=1, provider=3, receiver=4, forward=true);

_EOF_

** 'failover' as below works fine:
slonik << _EOF_
    cluster name = stats;

    node 1 admin conninfo = 'dbname=stats1';
    node 2 admin conninfo = 'dbname=stats2';
    node 3 admin conninfo = 'dbname=stats3';
    node 4 admin conninfo = 'dbname=stats4';

    failover (id=1, backup node=2);
_EOF_

* with the output as below:
<stdin>:9: NOTICE:  Slony-I: terminating DB connection of faile node 
with pid 23392
CONTEXT:  SQL statement "SELECT  "_stats".terminateNodeConnections( 
'_stats_Node_' ||  $1 )"
PL/pgSQL function "failednode" line 75 at perform
<stdin>:9: NOTICE:  failedNode: set 1 has no other direct receivers - 
move now
<stdin>:9: NOTICE:  failedNode: set 1 has no other direct receivers - 
move now
<stdin>:9: NOTICE:  failedNode: set 1 has no other direct receivers - 
move now

*in that time 'slon' for 'stats1' creates some errors:
FATAL:  terminating connection due to administrator command
ERROR  remoteListenThread_2: "select ev_origin, ev_seqno, 
ev_timestamp,        ev_minxid, ev_maxxid, ev_xip,        
ev_type,        ev_data1, ev_data2,        ev_data3, ev_data4,        
ev_data5, ev_data6,        ev_data7, ev_data8 from "_stats".sl_event e 
where (e.ev_origin = '2' and e.ev_seqno > '9') or (e.ev_origin= '3' and 
e.ev_seqno > '9') or (e.ev_origin = '4' and e.ev_seqno > '9') order by 
e.ev_origin, e.ev_seqno" - server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

*** but when I make 'dropnode' as:
slonik << _EOF_
    cluster name = stats;
    node 1 admin conninfo = 'dbname=stats1';
    node 2 admin conninfo = 'dbname=stats2';
    node 3 admin conninfo = 'dbname=stats3';
    node 4 admin conninfo = 'dbname=stats4';

    drop node (id=1, event node=2);
_EOF_

* 'slon' for 'stats1' ends up but so does 'slon' for 'stats3' !!! with 
the following error:

ERROR  remoteWorkerThread_2: "begin transaction; set transaction 
isolation level serializable; lock table "_stats".sl_config_lock; select 
"_stats".dropNode_int(1); notify "_stats_Restart"; notify 
"_stats_Event"; notify "_stats_Confirm"; insert into 
"_stats".sl_event     (ev_origin, ev_seqno, ev_timestamp,      
ev_minxid, ev_maxxid, ev_xip, ev_type , ev_data1    ) values ('2', '14', 
'2005-04-06 09:27:12.543614', '131477', '131478', '', 'DROP_NODE', '1'); 
insert into "_stats".sl_confirm       (con_origin, con_received, 
con_seqno, con_timestamp)    values (2, 3, '14', CURRENT_TIMESTAMP); 
commit transaction;" PGRES_FATAL_ERROR ERROR:  update or delete on 
"sl_node" violates foreign key constraint "ssy_origin-no_id-ref" on 
"sl_setsync"
DETAIL:  Key (no_id)=(1) is still referenced from table "sl_setsync".
CONTEXT:  SQL statement "delete from "_stats".sl_node where no_id =  $1 "
PL/pgSQL function "dropnode_int" line 29 at SQL statement
INFO   remoteListenThread_2: disconnecting from 'dbname=stats2'
INFO   remoteListenThread_4: disconnecting from 'dbname=stats4'

*** The databases are empty. I have tried before: the repliction was 
working fine before the failover and dropnode but no more after (of 
course, 'slon' for 'stats3' does not work), but 2 tables '_stats.sl_set' 
and '_stats.sl_subscribe' on all three nodes (2,3,4) look the same:

select * from _stats.sl_subscribe;
 sub_set | sub_provider | sub_receiver | sub_forward | sub_active
---------+--------------+--------------+-------------+------------
       1 |            2 |            3 | t           | t
       1 |            3 |            4 | t           | t

select * from _stats.sl_set;
 set_id | set_origin | set_locked | set_comment
--------+------------+------------+-------------
      1 |          2 |            | Test set


*** *** If I am doing something wrong with the system configuration or 
commands ?

I work under Debian sarge, kernel 2.6.8, PostgreSQL 8.0.1, Slony 1.0.5

Cezary Boldak

boldak_at_ii_pb_bialystok_pl



More information about the Slony1-general mailing list