Fri Sep 7 08:50:59 PDT 2012
- Previous message: [Slony1-general] Slony lagging, no errors, due to bulk data change?
- Next message: [Slony1-general] Lagtime by set ?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 09/07/2012 09:46 AM, Steve Singer wrote:
> On 09/06/2012 05:49 PM, Brian Fehrle wrote:
>> More information:
>>
>> Looking at pg_stat_activity, it looks like 'fetch 100 from LOG' has the
>> same transaction start time as the query start time, which suggests that
>> each time it fetches 100 from the log it is starting a new cursor. Is
>> this normal / expected?
>>
>
> Yes, this is normal for your version of Slony, but it is a bug. It was
> fixed in Slony 2.1. I think the bug number was bug 167.
>

Ok, thanks. I did look at the source for both 1.2 and 2.1 and noticed that
the worker thread changed significantly, so I suspected something like this.

Unfortunately it's been running for a full day in this state, and in one day
it processed about one hour's worth of events. So we're going to be
uninstalling both nodes and setting replication up again. That's a long
process, because we have dozens of remote subscribers via log shipping too.

In the future, if we need to perform mass updates to tables like this, it may
be a good idea to remove the table from replication, do all the updates, then
add it back. It would still be annoying for remote subscribers, but we have
processes in place to handle it.
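For anyone else trying to put a number on this kind of lag, something along
these lines should show how far behind the subscriber is. This is just a
sketch, and it assumes the Slony cluster schema is named "_mycluster" --
substitute your own cluster name:

    -- How far behind the subscriber is, in events and in time,
    -- as reported by Slony's sl_status view (run on the origin).
    -- "_mycluster" is a placeholder for the real cluster schema.
    SELECT st_origin,
           st_received,
           st_lag_num_events,
           st_lag_time
      FROM _mycluster.sl_status;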
- Brian F

>>
>> select now() - xact_start as transaction_age, now() - query_start as
>> query_age, current_query::varchar(80), waiting from pg_stat_activity
>> where usename = 'slony' and current_query like '%LOG%';
>>
>>  transaction_age |    query_age    |    current_query     | waiting
>> -----------------+-----------------+----------------------+---------
>>  00:27:01.652087 | 00:27:01.650933 | fetch 100 from LOG;  | f
>> (1 row)
>>
>> So this has to be my issue. The query that hits sl_log_2 is a
>> sequential scan. Does this mean every time we retrieve 100 from the
>> cursor, we do another sequential scan on the table to get the next
>> 100 rows?
>>
>> - Brian F
>>
>> On 09/06/2012 03:09 PM, Brian Fehrle wrote:
>>> Hi all,
>>>
>>> postgresql 8.4
>>> slony 1.2
>>> centos system
>>>
>>> I have a single-master, single-slave Slony cluster where Slony is very
>>> far behind. There are no errors in the logs or anything, but what looks
>>> to be happening is that the queries Slony is executing on the Slony log
>>> table on the master are taking a long time to complete.
>>>
>>> The query is "fetch 100 from LOG;", which can take a long time to
>>> complete, over 15 minutes at times. Each time this happens we process
>>> one event, which usually takes only milliseconds to complete.
>>>
>>> At this point, the query on the master 'fetch 100 from LOG' takes about
>>> 20 minutes to complete, and after it completes the slave processes one
>>> more event, and then 'fetch 100 from LOG' kicks off again and takes yet
>>> another 20 or so minutes. So the slave is processing an event about once
>>> every 20 minutes.
>>>
>>> As for a cause, I believe it's the follow-up work after adding a column
>>> to a table in replication. After adding the column, I believe the table
>>> was updated to set values in the newly added column. This could result
>>> in millions of new items for Slony to process, which may have caused the
>>> log tables to become so large that they now get sequential scans.
>>>
>>> I'm trying to dig in and see what exactly 'fetch 100 from LOG' is doing
>>> on the master, and whether I can speed it up. Is this querying the
>>> sl_log_1/sl_log_2 tables?
>>>
>>> The pg table pg_notify does not have outstanding dead rows; it's at 0.
>>> Also, out of all the Slony tables in the Slony schema, the one with the
>>> most dead rows is at about 2K dead rows.
>>>
>>> sl_log_1 has 0 rows; sl_log_2 has about 9,326,260 rows (and zero dead
>>> rows).
>>>
>>> I'm going to see if we can reduce the sync group size, and see whether
>>> that shrinks the query result set enough to get an index scan instead of
>>> a sequential scan (if that's even the issue).
>>>
>>> Any help is greatly appreciated.
>>>
>>> - Brian F
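Regarding the sl_log_1 / sl_log_2 numbers quoted above: a quick way to see how
large the log tables have grown, and whether dead rows are piling up, is
something like the sketch below (again assuming the cluster schema is named
"_mycluster"; substitute your own):

    -- Live/dead tuple counts and on-disk size of the Slony log tables.
    -- "_mycluster" is a placeholder for the real cluster schema.
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           pg_size_pretty(pg_total_relation_size(relid)) AS total_size
      FROM pg_stat_user_tables
     WHERE schemaname = '_mycluster'
       AND relname IN ('sl_log_1', 'sl_log_2');

With roughly nine million rows sitting in sl_log_2, it would not be surprising
for the planner to treat it very differently from a freshly rotated, nearly
empty log table.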