Fri Oct 3 05:35:49 PDT 2014
- Previous message: [Slony1-general] PostgreSQL 9.4 support?
- Next message: [Slony1-general] Slony 2.1.4 - Issues re-subscribing provider when origin down
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi All,

I'm looking at a Slony setup using 2.1.4, with 4 nodes in the following configuration:

    Node 1 --> Node 2
    Node 1 --> Node 3 --> Node 4

Node 1 is the origin of all sets, and node 3 is the provider of all sets to node 4. What I'm looking to do is fail over to node 2 when both nodes 1 and 3 have gone down. Is this possible?

I'm seeing the same issues in both a live environment (which I've not had a chance to move to 2.2) and my test environment. For my test environment the slonik script is:

    CLUSTER NAME = test_replication;

    NODE 1 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5432 user=slony';
    NODE 2 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5433 user=slony';
    NODE 3 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5434 user=slony';
    NODE 4 ADMIN CONNINFO = 'dbname=TEST host=localhost port=5435 user=slony';

    SUBSCRIBE SET (ID = 1, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
    WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
    SUBSCRIBE SET (ID = 2, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
    WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);
    SUBSCRIBE SET (ID = 3, PROVIDER = 2, RECEIVER = 4, FORWARD = YES);
    WAIT FOR EVENT (ORIGIN = 2, CONFIRMED = 4, WAIT ON = 2);

    DROP NODE (ID = 3, EVENT NODE = 2);

    FAILOVER (
        ID = 1, BACKUP NODE = 2
    );

    DROP NODE (ID = 1, EVENT NODE = 2);

slonik is failing at the first SUBSCRIBE SET line as follows:

    $ slonik test.scr
    test.scr:8: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
    test.scr:8: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5434?
    test.scr:8: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5432?
    Segmentation fault

I get the same behaviour until I bring node 1 back up. Then the script almost succeeds, except for an error stating that a record in sl_event already exists:

    $ slonik ~/test.scr
    ~/test.scr:8: could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5434?
    waiting for events (1,5000000172) only at (1,5000000162) to be confirmed on node 4
    executing failedNode() on 2
    ~/test.scr:17: NOTICE: failedNode: set 1 has no other direct receivers - move now
    ~/test.scr:17: NOTICE: failedNode: set 2 has no other direct receivers - move now
    ~/test.scr:17: NOTICE: failedNode: set 3 has no other direct receivers - move now
    ~/test.scr:17: NOTICE: failedNode: set 1 has other direct receivers - change providers only
    ~/test.scr:17: NOTICE: failedNode: set 2 has other direct receivers - change providers only
    ~/test.scr:17: NOTICE: failedNode: set 3 has other direct receivers - change providers only
    NOTICE: executing "_test_replication".failedNode2 on node 2
    ~/test.scr:17: waiting for event (1,5000000175). node 4 only on event 5000000162
    NOTICE: executing "_test_replication".failedNode2 on node 2
    ~/test.scr:17: PGRES_FATAL_ERROR lock table "_test_replication".sl_event_lock, "_test_replication".sl_config_lock;select "_test_replication".failedNode2(1,2,2,'5000000174','5000000176'); - ERROR: duplicate key value violates unique constraint "sl_event-pkey"
    DETAIL: Key (ev_origin, ev_seqno)=(1, 5000000176) already exists.
    CONTEXT: SQL statement "insert into "_test_replication".sl_event (ev_origin, ev_seqno, ev_timestamp, ev_snapshot, ev_type, ev_data1, ev_data2, ev_data3) values (p_failed_node, p_ev_seqfake, CURRENT_TIMESTAMP, v_row.ev_snapshot, 'FAILOVER_SET', p_failed_node::text, p_backup_node::text, p_set_id::text)"
    PL/pgSQL function _test_replication.failednode2(integer,integer,integer,bigint,bigint) line 14 at SQL statement
    NOTICE: executing "_test_replication".failedNode2 on node 2
    ~/test.scr:17: waiting for event (1,5000000177). node 4 only on event 5000000175
    ~/test.scr:21: begin transaction; -

After this, sl_set on node 4 still has node 1 as the origin for one of the sets (is this possibly because I'm not waiting properly, or waiting on the wrong node?):

    TEST=# table _test_replication.sl_set;
     set_id | set_origin | set_locked |    set_comment
    --------+------------+------------+-------------------
          2 |          1 |            | Replication set 2
          1 |          2 |            | Replication set 1
          3 |          2 |            | Replication set 3
    (3 rows)

I had attached the slon logs, but my mail to the list bounced. If they would provide any better insight, I can provide them.

Any help would be greatly appreciated.

Thanks
Glyn
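
As a rough illustration of what the slonik script above is trying to achieve, the topology from this post can be modelled in a few lines of Python. This is only a sketch: `PROVIDERS`, `orphaned_receivers` and `plan` are hypothetical names invented here, nothing in it talks to Slony, and the order of the printed steps simply mirrors the order used in the script rather than a confirmed-correct procedure.

```python
# Hypothetical model of the cluster described in this post (not a Slony API).
# Maps each receiver node to its current provider; node 1 is the origin.
PROVIDERS = {2: 1, 3: 1, 4: 3}
ORIGIN = 1


def orphaned_receivers(providers, down):
    """Return surviving receivers whose current provider is down."""
    return sorted(
        receiver
        for receiver, provider in providers.items()
        if receiver not in down and provider in down
    )


def plan(providers, origin, down, backup):
    """List the high-level steps, in the same order the script attempts them."""
    steps = []
    for receiver in orphaned_receivers(providers, down):
        # The backup node becomes the origin itself, so it needs no new provider.
        if receiver != backup:
            steps.append(f"resubscribe node {receiver} to provider {backup}")
    if origin in down:
        steps.append(f"fail over origin {origin} to node {backup}")
    return steps


if __name__ == "__main__":
    # Nodes 1 and 3 are down; node 2 is the intended backup node.
    for step in plan(PROVIDERS, ORIGIN, {1, 3}, backup=2):
        print(step)
```

With nodes 1 and 3 down, the model reports that node 4 has lost its provider and that the origin has to move to node 2, which is the situation the SUBSCRIBE SET and FAILOVER commands are meant to handle.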