Thu Jun 13 09:48:40 PDT 2019
- Previous message: [Slony1-general] slony backup during pg_dump
- Next message: [Slony1-general] slony backup during pg_dump
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Thu, Jun 13, 2019 at 7:59 AM Christopher Browne <cbbrowne at afilias.info> wrote:

> On Wed, Jun 12, 2019 at 11:58 PM Tory M Blue <tmblue at gmail.com> wrote:
>
>> First: postgres 9.x
>> CentOS 7.2
>> Replication flows:
>> primary -> secondary -> 3 query slaves
>>
>> Okay, here it is.
>>
>> My dump started:
>>
>> Start export of schema clsdb for user postgres at Wed Jun 12 19:00:48 PDT 2019
>>
>> So if I'm reading this correctly, it's 20:45 but something is showing
>> that the last _received_event_ts was at 19:38, over an hour ago?
>> sl_lag_time of 1 hour?
>>
>> Primary:
>>
>> select * from _cls.sl_status limit 3;
>>
>>  st_origin | st_received | st_last_event |       st_last_event_ts        | st_last_received |      st_last_received_ts      |   st_last_received_event_ts   | st_lag_num_events |   st_lag_time
>> -----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+-----------------
>>         11 |          14 |    5036030946 | 2019-06-12 20:45:29.465785-07 |       5036030138 | 2019-06-12 19:38:11.065976-07 | 2019-06-12 19:38:03.311267-07 |               808 | 01:07:33.039694
>>         11 |          15 |    5036030946 | 2019-06-12 20:45:29.465785-07 |       5036030138 | 2019-06-12 19:38:12.202539-07 | 2019-06-12 19:38:03.311267-07 |               808 | 01:07:33.039694
>>         11 |          12 |    5036030946 | 2019-06-12 20:45:29.465785-07 |       5036030138 | 2019-06-12 19:38:08.342637-07 | 2019-06-12 19:38:03.311267-07 |               808 | 01:07:33.039694
>> (3 rows)
>>
>> I have 2.4 million rows in the sl_log_1 table, which looks like it
>> started to hockey-stick right around 19:30.
>
> You're reading it pretty well right.
>
> It is mighty curious that it waited 37-ish minutes for replication to
> fall off.
>
> It would be interesting (and it's too late for today) to capture the lag
> between those times. Thus, something like...
>
> #!/bin/sh
> PGDATABASE=postgresql://cbbrowne@localhost:7096/shared-sample
> psql -d $PGDATABASE -c "create table if not exists lag_by_time (as_at timestamp default now(), replication_lag interval);"
> for i in $(seq 180); do
>     psql -d $PGDATABASE -c "insert into lag_by_time (replication_lag) select min(st_lag_time) from _cls.sl_status;"
>     sleep 60
> done
>
> and launch that at about 19:00. At whatever point replication_lag heads
> towards the Stanley Cup, that's the time at which Something Happened.
>
> It's probably interesting to look at pg_stat_activity to see what
> long-running queries there might be.
>
> I'm somewhat suspicious that, despite what you were intending, perhaps
> the pg_dump is nevertheless capturing the schema _cls; it would be
> "sorta normal" if, at the point at which pg_dump starts dumping _cls
> material, replication starts lagging.
>
> In particular, anything in pg_stat_activity that started at 19:38 ought
> to be looked at with particular suspicion as a candidate for being the
> cause.
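The pg_stat_activity check Chris suggests above could be run in the same
psql style as his script. A minimal sketch, assuming PostgreSQL 9.2 or
later (where pg_stat_activity exposes "pid", "state", and "query"; on
9.0/9.1 the columns are "procpid" and "current_query" instead):

    # Sketch only, not from the thread: list non-idle backends with their
    # runtimes, to spot whatever started around 19:38.
    # Assumes 9.2+ pg_stat_activity column names (pid, state, query).
    psql -d $PGDATABASE -c "
      select pid, usename, state, query_start,
             now() - query_start as runtime,
             left(query, 60) as query_head
      from pg_stat_activity
      where state <> 'idle'
      order by query_start;"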
>> On the secondary (which replicates to my query DBs, where the dump is
>> run from, not from the primary), it includes these lines:
>>
>> Secondary:
>>
>> select * from _cls.sl_status limit 3;
>>
>>  st_origin | st_received | st_last_event |       st_last_event_ts        | st_last_received |      st_last_received_ts      |   st_last_received_event_ts   | st_lag_num_events |   st_lag_time
>> -----------+-------------+---------------+-------------------------------+------------------+-------------------------------+-------------------------------+-------------------+-----------------
>>         12 |          11 |    5008650405 | 2019-06-12 20:45:35.365757-07 |       5008650000 | 2019-06-12 19:38:03.815508-07 | 2019-06-12 19:38:03.338486-07 |               405 | 01:07:35.106539
>>         12 |          13 |    5008650405 | 2019-06-12 20:45:35.365757-07 |       5008650403 | 2019-06-12 20:45:17.701455-07 | 2019-06-12 20:45:15.355233-07 |                 2 | 00:00:23.089792
>>         12 |          14 |    5008650405 | 2019-06-12 20:45:35.365757-07 |       5008650404 | 2019-06-12 20:45:26.642094-07 | 2019-06-12 20:45:25.360167-07 |                 1 | 00:00:13.084858
>> (3 rows)
>
> May I presume that the time you ran this query was around 20:45 or
> 20:46? That looks unsurprising/consistent with the first query; events
> going through node 11, which appears to be the master, are held back to
> 19:38. And the few events coming through from other nodes are lightly
> laggy, consistent with those nodes not generating SYNC events very
> often.

You assume correctly. I ran the queries back to back, and yes, node 11 is
the primary/master and node 12 is the standby, with nodes 13-15 just
query nodes holding only a subset of the data; they are not available for
failover/switchover, only nodes 11-12 are.

I saw nothing in the locks when I looked.

We did notice that in our second dump, while we are using a -n <schema>,
we are not including a -N _slonschema. Now, that should work! But we are
going to add the -N _slonschema as well as the -n <schema>, just to see.

Will poke it some more this eve!

Thanks again,
Tory
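For reference, the pg_dump combination Tory describes would look roughly
like the sketch below, where "clsdb" (application schema) and "_cls"
(Slony-I cluster schema) are the names mentioned in the thread, and the
database name and output file are made-up placeholders:

    # Hypothetical invocation; only the schema names come from the thread.
    # -n restricts the dump to the named schema; -N explicitly excludes one.
    pg_dump -n clsdb -N _cls -f /tmp/clsdb_schema.dump mydatabase

Per the pg_dump documentation, -N only excludes schemas that would
otherwise be selected, so with -n clsdb alone the _cls schema is already
left out of the dump; adding -N _cls is harmless but redundant, which
matches the "that should work" expectation above.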