Thu Jan 28 10:09:34 PST 2016
On 01/28/2016 11:11 AM, Tignor, Tom wrote:
> Output below. They seem to be replicating normally, except for the sl_log
> growth.

Indeed. Is there anything in the slon logs for those nodes that says why
it doesn't finish the log switch?

Connect to the database as the slony user. To check whether a log switch is
indeed in progress, do

    SELECT last_value FROM _ams_cluster.sl_log_status;

It should be either 2 or 3. If it is 0 or 1, no log switch is in progress
and you can start one with

    SELECT _ams_cluster.logswitch_start();

If it is 2 or 3, then you can do

    SELECT _ams_cluster.logswitch_finish();

All these operations are harmless and will only do what is safely possible.
Look at the code of logswitch_finish() to find out how it determines whether
the current log switch can be finished. In short, the cleanup thread removes
events from sl_event once they have been confirmed by all nodes in the
cluster, and logswitch_finish() checks whether anything belonging to the old
log is still left in sl_event; if so, it will not finish. Running those
queries manually will show you which event is preventing the switch from
finishing.
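As a rough starting point for that manual check (a sketch only, not the
exact query that logswitch_finish() runs, written against the _ams_cluster
schema used in this thread), you can list the events that some node has not
yet confirmed:

    -- Sketch: events in sl_event that at least one other node has not yet
    -- confirmed; any such event left over from the old log will keep the
    -- log switch from finishing.
    SELECT ev.ev_origin, ev.ev_seqno, ev.ev_type, ev.ev_timestamp
      FROM _ams_cluster.sl_event ev
     WHERE EXISTS (
             SELECT 1
               FROM _ams_cluster.sl_node n
              WHERE n.no_id <> ev.ev_origin
                AND NOT EXISTS (
                      SELECT 1
                        FROM _ams_cluster.sl_confirm c
                       WHERE c.con_origin   = ev.ev_origin
                         AND c.con_received = n.no_id
                         AND c.con_seqno   >= ev.ev_seqno))
     ORDER BY ev.ev_origin, ev.ev_seqno;

If old events keep showing up here, the node that has not confirmed them is
the one whose slon log is worth checking first.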
> ams at ams-repl2.ams.netmgmt:~$ /a/third-party/postgresql/bin/psql -U
> ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>  st_origin | st_received | st_last_event | st_last_event_ts | st_last_received | st_last_received_ts | st_last_received_event_ts | st_lag_num_events | st_lag_time
> -----------+-------------+---------------+------------------+------------------+---------------------+---------------------------+-------------------+-------------
>  2 | 1 | 5000611610 | 2016-01-28 16:06:37.343826+00 | 5000611610 | 2016-01-28 16:06:38.843562+00 | 2016-01-28 16:06:37.343826+00 | 0 | 00:00:09.201996
>  2 | 3 | 5000611610 | 2016-01-28 16:06:37.343826+00 | 5000611609 | 2016-01-28 16:06:29.851545+00 | 2016-01-28 16:06:27.341894+00 | 1 | 00:00:19.203928
>  2 | 4 | 5000611610 | 2016-01-28 16:06:37.343826+00 | 5000611610 | 2016-01-28 16:06:38.710974+00 | 2016-01-28 16:06:37.343826+00 | 0 | 00:00:09.201996
> (3 rows)
>
> ams at ams-repl3.lga.netmgmt:~$ /a/third-party/postgresql/bin/psql -U
> ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>  st_origin | st_received | st_last_event | st_last_event_ts | st_last_received | st_last_received_ts | st_last_received_event_ts | st_lag_num_events | st_lag_time
> -----------+-------------+---------------+------------------+------------------+---------------------+---------------------------+-------------------+-------------
>  3 | 4 | 5000654642 | 2016-01-28 16:07:05.493455+00 | 5000654642 | 2016-01-28 16:07:06.486539+00 | 2016-01-28 16:07:05.493455+00 | 0 | 00:00:08.522529
>  3 | 1 | 5000654642 | 2016-01-28 16:07:05.493455+00 | 5000654642 | 2016-01-28 16:07:08.040292+00 | 2016-01-28 16:07:05.493455+00 | 0 | 00:00:08.522529
>  3 | 2 | 5000654642 | 2016-01-28 16:07:05.493455+00 | 5000654642 | 2016-01-28 16:07:08.472049+00 | 2016-01-28 16:07:05.493455+00 | 0 | 00:00:08.522529
> (3 rows)
>
> ams at ams-repl4.blr.netmgmt:~$ /a/third-party/postgresql/bin/psql -U
> ams_slony -d ams -c 'select * from _ams_cluster.sl_status'
>  st_origin | st_received | st_last_event | st_last_event_ts | st_last_received | st_last_received_ts | st_last_received_event_ts | st_lag_num_events | st_lag_time
> -----------+-------------+---------------+------------------+------------------+---------------------+---------------------------+-------------------+-------------
>  4 | 3 | 5000637483 | 2016-01-28 16:07:32.698809+00 | 5000637482 | 2016-01-28 16:07:28.731404+00 | 2016-01-28 16:07:22.695826+00 | 1 | 00:00:19.077657
>  4 | 1 | 5000637483 | 2016-01-28 16:07:32.698809+00 | 5000637482 | 2016-01-28 16:07:24.839978+00 | 2016-01-28 16:07:22.695826+00 | 1 | 00:00:19.077657
>  4 | 2 | 5000637483 | 2016-01-28 16:07:32.698809+00 | 5000637482 | 2016-01-28 16:07:22.926411+00 | 2016-01-28 16:07:22.695826+00 | 1 | 00:00:19.077657
> (3 rows)
>
> Tom :-)
>
> On 1/28/16, 10:38 AM, "Jan Wieck" <jan at wi3ck.info> wrote:
>
>> On 01/28/2016 08:30 AM, Tignor, Tom wrote:
>>>
>>> Hello slony folks,
>>> From my reading I'm guessing (hoping) this isn't a new problem. I have
>>> a simple cluster with one provider replicating to three subscribers. The
>>> provider's changelog tables (sl_log_[1|2]) are fine, but the subscribers
>>> (with forwarding enabled) are all showing runaway growth. Looked through
>>> the FAQ and I don't see the node I dropped or any idle transactions as
>>> viable culprits. Are there other thoughts on the cause? Can I safely
>>> manually delete/truncate some/all of the changelog tables? These
>>> replicas are all leaf nodes. I only have forwarding turned on to allow
>>> for failover, and my replication rate is the 2 sec default.
>>> Thanks in advance for any insights.
>>
>> What is the output of the sl_status view "on those leaf nodes"?
>>
>>> ams=# select
>>> pg_size_pretty(pg_total_relation_size('_ams_cluster.sl_log_1'));
>>>  pg_size_pretty
>>> ----------------
>>>  75 MB
>>> (1 row)
>>>
>>> ams=# select
>>> pg_size_pretty(pg_total_relation_size('_ams_cluster.sl_log_2'));
>>>  pg_size_pretty
>>> ----------------
>>>  34 GB
>>> (1 row)
>>>
>>> ams=# select * from _ams_cluster.sl_confirm where con_origin not in
>>> (select no_id from _ams_cluster.sl_node) or con_received not in (select
>>> no_id from _ams_cluster.sl_node);
>>>  con_origin | con_received | con_seqno | con_timestamp
>>> ------------+--------------+-----------+---------------
>>> (0 rows)
>>>
>>> ams=# select * from pg_stat_activity where current_query like '%IDLE%';
>>>  datid | datname | procpid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | xact_start | query_start | waiting | current_query
>>> -------+---------+---------+----------+---------+------------------+-------------+-----------------+-------------+---------------+------------+-------------+---------+---------------
>>>  16393 | ams | 2611 | 212995 | ams_viewer | | 88.221.209.10 | | 43328 | 2016-01-28 12:24:49.706389+00 | | 2016-01-28 13:18:02.427848+00 | f | <IDLE>
>>>  16393 | ams | 12894 | 212995 | ams_viewer | | 88.221.209.10 | | 60112 | 2016-01-28 12:47:26.230681+00 | | 2016-01-28 13:15:27.744242+00 | f | <IDLE>
>>>  16393 | ams | 12884 | 212995 | ams_viewer | | 88.221.209.10 | | 44302 | 2016-01-28 12:47:25.100006+00 | | 2016-01-28 13:15:27.936059+00 | f | <IDLE>
>>>  16393 | ams | 23466 | 213867 | ams_slony | psql | | | -1 | 2016-01-28 13:11:32.030343+00 | 2016-01-28 13:18:37.283992+00 | 2016-01-28 13:18:37.283992+00 | f | select * from pg_stat_activity where current_query like '%IDLE%';
>>>  16393 | ams | 6719 | 213867 | ams_slony | slon.origin_2_provider_2 | 60.254.150.133 | | 61806 | 2016-01-22 01:59:14.800129+00 | | 2016-01-28 13:18:25.935111+00 | f | <IDLE>
>>>  16393 | ams | 6718 | 213867 | ams_slony | slon.origin_3_provider_2 | 60.254.150.133 | | 61805 | 2016-01-22 01:59:14.797655+00 | | 2016-01-28 13:18:34.304475+00 | f | <IDLE>
>>>  16393 | ams | 5505 | 213867 | ams_slony | slon.origin_4_provider_2 | 80.67.75.105 | | 36477 | 2016-01-22 01:56:25.637046+00 | | 2016-01-28 13:18:36.1348+00 | f | <IDLE>
>>>  16393 | ams | 5504 | 213867 | ams_slony | slon.origin_3_provider_2 | 72.246.50.22 | | 51813 | 2016-01-22 01:56:25.240798+00 | | 2016-01-28 13:18:28.961629+00 | f | <IDLE>
>>>  16393 | ams | 5487 | 213867 | ams_slony | slon.origin_4_provider_2 | 72.246.50.22 | | 51803 | 2016-01-22 01:56:22.896388+00 | | 2016-01-28 13:18:35.858913+00 | f | <IDLE>
>>>  16393 | ams | 5047 | 213867 | ams_slony | slon.origin_2_provider_2 | 72.246.50.22 | | 51564 | 2016-01-22 01:55:23.600296+00 | | 2016-01-28 13:18:34.487192+00 | f | <IDLE>
>>>  16393 | ams | 5041 | 213867 | ams_slony | slon.origin_2_provider_2 | 80.67.75.105 | | 36402 | 2016-01-22 01:55:22.964462+00 | | 2016-01-28 13:18:34.519066+00 | f | <IDLE>
>>>  16393 | ams | 6694 | 213867 | ams_slony | slon.node_2_listen | 60.254.150.133 | | 61795 | 2016-01-22 01:59:12.095052+00 | | 2016-01-28 13:18:27.928384+00 | f | <IDLE>
>>>  16393 | ams | 4456 | 213867 | ams_slony | slon.node_2_listen | 72.246.50.22 | | 51238 | 2016-01-22 01:54:21.481355+00 | | 2016-01-28 13:18:36.766973+00 | f | <IDLE>
>>>  16393 | ams | 4457 | 213867 | ams_slony | slon.node_2_listen | 80.67.75.105 | | 36333 | 2016-01-22 01:54:21.500456+00 | | 2016-01-28 13:18:36.204482+00 | f | <IDLE>
>>>  16393 | ams | 4428 | 213867 | ams_slony | slon.local_monitor | | | -1 | 2016-01-22 01:54:18.977015+00 | | 2016-01-28 13:18:36.652567+00 | f | <IDLE>
>>>  16393 | ams | 4427 | 213867 | ams_slony | slon.local_sync | | | -1 | 2016-01-22 01:54:18.976932+00 | | 2016-01-28 13:18:36.151998+00 | f | <IDLE>
>>>  16393 | ams | 4426 | 213867 | ams_slony | slon.local_cleanup | | | -1 | 2016-01-22 01:54:18.976842+00 | | 2016-01-28 13:12:12.582921+00 | f | <IDLE>
>>>  16393 | ams | 4425 | 213867 | ams_slony | slon.remoteWorkerThread_4 | | | -1 | 2016-01-22 01:54:18.976783+00 | | 2016-01-28 13:18:33.99715+00 | f | <IDLE>
>>>  16393 | ams | 4420 | 213867 | ams_slony | slon.remoteWorkerThread_1 | | | -1 | 2016-01-22 01:54:18.976548+00 | | 2016-01-28 13:18:33.561531+00 | f | <IDLE>
>>>  16393 | ams | 4419 | 213867 | ams_slony | slon.remoteWorkerThread_3 | | | -1 | 2016-01-22 01:54:18.97647+00 | | 2016-01-28 13:18:34.808907+00 | f | <IDLE>
>>>  16393 | ams | 4413 | 213867 | ams_slony | slon.local_listen | | | -1 | 2016-01-22 01:54:18.965568+00 | | 2016-01-28 13:18:37.096159+00 | f | <IDLE>
>>> (21 rows)
>>>
>>> ams=#
>>>
>>> Tom :-)
>>>
>>> _______________________________________________
>>> Slony1-general mailing list
>>> Slony1-general at lists.slony.info
>>> http://lists.slony.info/mailman/listinfo/slony1-general
>>>
>>
>> --
>> Jan Wieck
>> Senior Software Engineer
>> http://slony.info
>

--
Jan Wieck
Senior Software Engineer
http://slony.info