Wed Apr 6 12:50:10 PDT 2005
- Previous message: [Slony1-general] strange failover problem
- Next message: [Slony1-general] Failing over
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
In response to http://gborg.postgresql.org/pipermail/slony1-general/2005-January/001493.html: I have a problem similar to the failover problem described there. I have tried another setup (it also crashes):

*** 4 nodes in a cascade: 1<-2<-3<-4

*** 'postmaster' running locally under the 'postgres' account and using Unix sockets ('postmaster -D /var/local/pgsql/data')

*** created 4 databases (also from the 'postgres' account):

createdb stats1
createlang plpgsql stats1
psql stats1 << _EOF_
CREATE TABLE stats_test(ID serial PRIMARY KEY, name varchar(32));
_EOF_

and so on for 'stats2', 'stats3', 'stats4'.

*** then:

slonik << _EOF_
cluster name = stats;
node 1 admin conninfo = 'dbname=stats1';
node 2 admin conninfo = 'dbname=stats2';
node 3 admin conninfo = 'dbname=stats3';
node 4 admin conninfo = 'dbname=stats4';
init cluster (id=1, comment='Master node');
create set (id=1, origin=1, comment='Test set');
set add table (set id=1, origin=1, id=1, fully qualified name='public.stats_test', comment='test table');
store node (id=2, comment='1st slave');
store node (id=3, comment='2nd slave');
store node (id=4, comment='3rd slave');
store path (server=1, client=2, conninfo='dbname=stats1');
store path (server=1, client=3, conninfo='dbname=stats1');
store path (server=1, client=4, conninfo='dbname=stats1');
store path (server=2, client=1, conninfo='dbname=stats2');
store path (server=2, client=3, conninfo='dbname=stats2');
store path (server=2, client=4, conninfo='dbname=stats2');
store path (server=3, client=1, conninfo='dbname=stats3');
store path (server=3, client=2, conninfo='dbname=stats3');
store path (server=3, client=4, conninfo='dbname=stats3');
store path (server=4, client=1, conninfo='dbname=stats4');
store path (server=4, client=2, conninfo='dbname=stats4');
store path (server=4, client=3, conninfo='dbname=stats4');
# cascade
store listen (receiver=1, origin=2, provider=2);
store listen (receiver=1, origin=3, provider=2);
store listen (receiver=1, origin=4, provider=2);
store listen (receiver=2, origin=1, provider=1);
store listen (receiver=2, origin=3, provider=3);
store listen (receiver=2, origin=4, provider=3);
store listen (receiver=3, origin=1, provider=2);
store listen (receiver=3, origin=2, provider=2);
store listen (receiver=3, origin=4, provider=4);
store listen (receiver=4, origin=1, provider=3);
store listen (receiver=4, origin=2, provider=3);
store listen (receiver=4, origin=3, provider=3);
_EOF_

*** then, 4 times, 'slon stats "dbname=stats1"' and so on for 'stats2', 'stats3', 'stats4', each one in a separate console under the 'postgres' account.

*** then I start the replication:

slonik << _EOF_
cluster name = stats;
node 1 admin conninfo = 'dbname=stats1';
node 2 admin conninfo = 'dbname=stats2';
node 3 admin conninfo = 'dbname=stats3';
node 4 admin conninfo = 'dbname=stats4';
# cascade
subscribe set (id=1, provider=1, receiver=2, forward=true);
subscribe set (id=1, provider=2, receiver=3, forward=true);
subscribe set (id=1, provider=3, receiver=4, forward=true);
_EOF_

*** 'failover' as below works fine:

slonik << _EOF_
cluster name = stats;
node 1 admin conninfo = 'dbname=stats1';
node 2 admin conninfo = 'dbname=stats2';
node 3 admin conninfo = 'dbname=stats3';
node 4 admin conninfo = 'dbname=stats4';
failover (id=1, backup node=2);
_EOF_

* with the output as below:

<stdin>:9: NOTICE: Slony-I: terminating DB connection of faile node with pid 23392
CONTEXT: SQL statement "SELECT "_stats".terminateNodeConnections( '_stats_Node_' || $1 )"
PL/pgSQL function "failednode" line 75 at perform
<stdin>:9: NOTICE: failedNode: set 1 has no other direct receivers - move now
<stdin>:9: NOTICE: failedNode: set 1 has no other direct receivers - move now
<stdin>:9: NOTICE: failedNode: set 1 has no other direct receivers - move now

* at that moment the 'slon' for 'stats1' reports some errors:

FATAL: terminating connection due to administrator command
ERROR remoteListenThread_2:
"select ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type, ev_data1, ev_data2, ev_data3, ev_data4, ev_data5, ev_data6, ev_data7, ev_data8 from "_stats".sl_event e where (e.ev_origin = '2' and e.ev_seqno > '9') or (e.ev_origin = '3' and e.ev_seqno > '9') or (e.ev_origin = '4' and e.ev_seqno > '9') order by e.ev_origin, e.ev_seqno"
 - server closed the connection unexpectedly
   This probably means the server terminated abnormally before or while processing the request.

*** but when I do the 'drop node' as:

slonik << _EOF_
cluster name = stats;
node 1 admin conninfo = 'dbname=stats1';
node 2 admin conninfo = 'dbname=stats2';
node 3 admin conninfo = 'dbname=stats3';
node 4 admin conninfo = 'dbname=stats4';
drop node (id=1, event node=2);
_EOF_

* the 'slon' for 'stats1' exits, but so does the 'slon' for 'stats3'!!! — with the following error:

ERROR remoteWorkerThread_2: "begin transaction; set transaction isolation level serializable; lock table "_stats".sl_config_lock; select "_stats".dropNode_int(1); notify "_stats_Restart"; notify "_stats_Event"; notify "_stats_Confirm"; insert into "_stats".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_minxid, ev_maxxid, ev_xip, ev_type, ev_data1) values ('2', '14', '2005-04-06 09:27:12.543614', '131477', '131478', '', 'DROP_NODE', '1'); insert into "_stats".sl_confirm (con_origin, con_received, con_seqno, con_timestamp) values (2, 3, '14', CURRENT_TIMESTAMP); commit transaction;" PGRES_FATAL_ERROR
ERROR: update or delete on "sl_node" violates foreign key constraint "ssy_origin-no_id-ref" on "sl_setsync"
DETAIL: Key (no_id)=(1) is still referenced from table "sl_setsync".
CONTEXT: SQL statement "delete from "_stats".sl_node where no_id = $1 "
PL/pgSQL function "dropnode_int" line 29 at SQL statement
INFO remoteListenThread_2: disconnecting from 'dbname=stats2'
INFO remoteListenThread_4: disconnecting from 'dbname=stats4'

*** The databases are empty.
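The FK violation above suggests that node 3's sl_setsync table still holds a row referencing the dropped origin (no_id = 1). One way to confirm that before retrying 'drop node' is to query the table directly; this is only a sketch, and the ssy_* column names are assumed from the Slony-I 1.0.x schema rather than taken from the output above:

```shell
# Sketch: list any sl_setsync rows on node 3 that still reference the
# failed node 1 (the key the FK error complains about). Column names
# assume the Slony-I 1.0.x schema; adjust if yours differ.
psql stats3 -c "select ssy_setid, ssy_origin, ssy_seqno
                from _stats.sl_setsync
                where ssy_origin = 1;"
```

If such rows exist, they explain why dropNode_int's delete from sl_node fails on node 3 while succeeding elsewhere.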
I had tried it before: replication was working fine before the failover and drop node, but no longer afterwards (of course, the 'slon' for 'stats3' does not work). Still, the two tables '_stats.sl_set' and '_stats.sl_subscribe' look the same on all three remaining nodes (2, 3, 4):

select * from _stats.sl_subscribe;
 sub_set | sub_provider | sub_receiver | sub_forward | sub_active
---------+--------------+--------------+-------------+------------
       1 |            2 |            3 | t           | t
       1 |            3 |            4 | t           | t

select * from _stats.sl_set;
 set_id | set_origin | set_locked | set_comment
--------+------------+------------+-------------
      1 |          2 |            | Test set

*** Am I doing something wrong with the system configuration or the commands?

I work under Debian sarge, kernel 2.6.8, PostgreSQL 8.0.1, Slony 1.0.5.

Cezary Boldak
boldak_at_ii_pb_bialystok_pl
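P.S. The hand-written listen network above follows a mechanical rule for a linear cascade: each receiver listens for a remote origin via its immediate chain neighbour on that origin's side. It can therefore be generated instead of typed by hand; a sketch, assuming the same 4-node chain 1<-2<-3<-4:

```shell
#!/bin/sh
# Emit the "store listen" network for a linear cascade 1<-2<-...<-N.
# Rule: a receiver reaches any origin through its immediate neighbour
# on the side of that origin (origin < receiver -> receiver-1,
# origin > receiver -> receiver+1).
N=4
recv=1
while [ "$recv" -le "$N" ]; do
  orig=1
  while [ "$orig" -le "$N" ]; do
    if [ "$orig" -ne "$recv" ]; then
      if [ "$orig" -lt "$recv" ]; then
        prov=$((recv - 1))
      else
        prov=$((recv + 1))
      fi
      echo "store listen (receiver=$recv, origin=$orig, provider=$prov);"
    fi
    orig=$((orig + 1))
  done
  recv=$((recv + 1))
done
```

For N=4 this prints exactly the twelve 'store listen' statements used in the slonik script above.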