Thu Oct 10 13:34:02 PDT 2019
- Previous message: [Slony1-general] Changing node names without adding/removing.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hello all! We have Slony 2.1.4 with Postgres 9.3 and Postgres 10 Nodes. There is one master (node=1) and 53 slaves (node=2..54). Node 1 is always the master - no other node will ever be master, all slaves subscribe directly to the master. The slaves subscribe a single set from the master. Recently we have frequently performance problems with very slow queries on our master DB. We think this correlates with the size of the sl_event table on the master. With ~10.000 rows everything is fine, but with >50.000 rows we notice rather slow queries, INSERT/SELECT into the sl_event table. So the question is - why does the sl_event table grows? One reason may be an offline slave. If the slave is offline too long, we kick him out of the cluster - problem solved. But sometimes all slaves are in sync but the sl_event table grows. I am not a Slony-expert, but it seems to the related to old confirmations in the sl_conf table. For example, today I removed and resubscribed node 9. On node 1 I see: select ev_origin, ev_seqno, ev_timestamp, ev_type from _regdnscluster.sl_event order by ev_timestamp; ev_origin | ev_seqno | ev_timestamp | ev_type -----------+------------+-------------------------------+------------ 9 | 5000000001 | 2019-10-10 14:19:41.514101+00 | STORE_PATH 9 | 5000000002 | 2019-10-10 14:19:53.097102+00 | SYNC 9 | 5000000003 | 2019-10-10 14:20:03.118798+00 | SYNC thousands of 9 | .......| 14:20 - 19:43 | always SYNC 9 | 5000001943 | 2019-10-10 19:43:34.923708+00 | SYNC 9 | 5000001944 | 2019-10-10 19:43:44.933323+00 | SYNC 9 | 5000001945 | 2019-10-10 19:43:54.943484+00 | SYNC 9 | 5000001946 | 2019-10-10 19:44:04.952488+00 | SYNC 21 | 5003835850 | 2019-10-10 19:44:06.106565+00 | SYNC 47 | 5000021416 | 2019-10-10 19:44:06.148101+00 | SYNC 12 | 5003464447 | 2019-10-10 19:44:06.212744+00 | SYNC 51 | 5000021650 | 2019-10-10 19:44:07.262471+00 | SYNC select ev_origin,count(*) from _regdnscluster.sl_event group by 1 order by 2 desc; ev_origin | count -----------+------- 9 | 2029 1 | 530 21 | 89 45 | 89 18 | 88 30 | 87 16 | 87 51 | 87 42 | 87 14 | 87 50 | 87 So, events from node 9 a kept although all nodes are in sync and have a small lag. a) Why is node 9 producing events at all? It does not have any replications sets and is only a slave. b) As far as I understand the cleanup function, these old events are not deleted as there are older confirmations in sl_confirm. select * from _regdnscluster.sl_confirm order by con_timestamp ; con_origin | con_received | con_seqno | con_timestamp ------------+--------------+------------+------------------------------- 9 | 43 | 0 | 2019-10-10 14:19:41.447874+00 9 | 8 | 0 | 2019-10-10 14:19:41.456426+00 9 | 41 | 0 | 2019-10-10 14:19:41.45702+00 8 | 34 | 5000008516 | 2019-10-10 19:56:34.146168+00 8 | 31 | 5000008516 | 2019-10-10 19:56:34.262533+00 8 | 54 | 5000008516 | 2019-10-10 19:56:34.413593+00 8 | 43 | 5000008516 | 2019-10-10 19:56:34.422288+00 ..... select * from _regdnscluster.sl_confirm where con_origin=9 and con_received=43 order by con_timestamp ; con_origin | con_received | con_seqno | con_timestamp ------------+--------------+-----------+------------------------------- 9 | 43 | 0 | 2019-10-10 14:19:41.447874+00 If I understand it correct, this means that node 43 acknowledges the receipt of an event from node 9 (probably the STORE_PATH event) and this confirmation is not cleared as there is no newer confirmation from node 43 about events from node 9. select con_origin, con_received, count(*) from _regdnscluster.sl_confirm WHERE con_received = 43 group by 1,2 order by 3; con_origin | con_received | count ------------+--------------+------- 9 | 43 | 1 4 | 43 | 7 30 | 43 | 8 8 | 43 | 8 42 | 43 | 9 54 | 43 | 9 ... c) So, why is node 43 not more confirming other events originated by node 9? d) How can I solve this problem? My slaves will always be pure slaves. So can I blindly delete all events from the events table with ev_origin <>1? (or at least all SYNC events)? Thanks Klaus
- Previous message: [Slony1-general] Changing node names without adding/removing.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list