Fri Nov 20 08:00:30 PST 2015
- Previous message: [Slony1-general] slony newbie setup trouble
- Next message: [Slony1-general] Slony replication lag with moderate quantity of deletes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Using Slony 2.2.4, Pg 9.4.5 I infrequently see replication lag when deleting a moderately large quantity of records. I've seen this very infrequently (i.e 3 or so times in a year), but came to a dead end investigating the cause last time (previous thread here http://comments.gmane.org/gmane.comp.db.postgresql.slony1.general/11556). I'm hoping someone might be able to nudge me in the right direction this time. The cause seems to be chunks of deletes on 2 specific tables (within a transaction), although I can't see a reason why these specific tables cause issues other than the tables having pretty poor primary keys and indexes (Table definitions at the bottom). Changes to each table are logged by a table_log trigger (https://github.com/glynastill/pg_table_audit), which we admittedly need to move away from, but I can't see a reason for this to be an issue here either. There is activity from other tables, but I'm pretty certain it's something to do with deletes from these specific tables, although I can't reliably reproduce the behaviour. In this case deleting around 100,000 records from each table on the origin (taking around 68 seconds), causes the remote worker thread on all the subscribers to take approximately 90 minutes copying into it's sl_log table. E.g: 2015-11-20 12:28:21 GMT [9843]: [5-1] app=slon.remoteWorkerThread_7,user=slony,db=db,host=10.16.10.101 LOG: duration: 6096341.085 ms statement: COPY "_replication"."sl_log_2" ( log_origin, log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols,log_cmdargs) FROM STDIN The hardware on the subscribers is identical to the origin, and the postgres backend on the subscriber is at 100% cpu during this time. It's not waiting on I/O, and nothing jumps out to me from pg_locks during this time other than the slow deletes: SELECT pl.*, c.relname FROM pg_locks pl LEFT JOIN pg_class c on pl.relation = c.oid ORDER BY c.relname; locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath | relname ---------------+-----------+------------+------+-------+-------------+---------------+---------+-------+----------+--------------------+-------+------------------+---------+----------+------------------------------------------ relation | 501924233 | 1013283824 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | t | PartInd_main_replication_sl_log_1-node-7 relation | 501924233 | 501930895 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | at_seats relation | 501924233 | 501930899 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | t | at_seats2 relation | 501924233 | 501930902 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | f | at_seats2_trigger_id_seq relation | 501924233 | 501930904 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | f | at_seats_trigger_id_seq relation | 501924233 | 1259 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | f | pg_class relation | 501924233 | 1259 | | | | | | | | 17/11810762 | 26140 | AccessShareLock | t | t | pg_class relation | 501924233 | 2662 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | pg_class_oid_index relation | 501924233 | 2662 | | | | | | | | 17/11810762 | 26140 | AccessShareLock | t | t | pg_class_oid_index relation | 501924233 | 2663 | | | | | | | | 17/11810762 | 26140 | AccessShareLock | t | t | pg_class_relname_nsp_index relation | 501924233 | 2663 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | pg_class_relname_nsp_index relation | 501924233 | 3455 | | | | | | | | 17/11810762 | 26140 | AccessShareLock | t | t | pg_class_tblspc_relfilenode_index relation | 501924233 | 3455 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | pg_class_tblspc_relfilenode_index relation | 501924233 | 11187 | | | | | | | | 17/11810762 | 26140 | AccessShareLock | t | t | pg_locks relation | 501924233 | 2615 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | pg_namespace relation | 501924233 | 2684 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | pg_namespace_nspname_index relation | 501924233 | 2685 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | pg_namespace_oid_index relation | 501924233 | 501935104 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats relation | 501924233 | 501935120 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats2 relation | 501924233 | 1003806325 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats2_date1 relation | 501924233 | 1003806324 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats2_index00 relation | 501924233 | 995705316 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats2_index01 relation | 501924233 | 1003806326 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats2_index02 relation | 501924233 | 1003806328 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats2_index04 relation | 501924233 | 1003806327 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats2_index05 relation | 501924233 | 1003806283 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats_index00 relation | 501924233 | 992270676 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats_index01 relation | 501924233 | 1003806286 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats_index02 relation | 501924233 | 1003806285 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats_index03 relation | 501924233 | 1003806284 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | seats_index04 relation | 501924233 | 501927863 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | sl_apply_stats relation | 501924233 | 501927907 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | f | sl_local_node_id relation | 501924233 | 501927909 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | t | sl_log_1 relation | 501924233 | 763450577 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | t | sl_log_1_idx1 relation | 501924233 | 501927921 | | | | | | | | 11/764780 | 80081 | RowExclusiveLock | t | f | sl_log_script relation | 501924233 | 501927927 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | sl_log_status relation | 501924233 | 501927956 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | sl_setsync relation | 501924233 | 758946419 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | sl_setsync-pkey relation | 501924233 | 501927849 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | f | sl_subscribe relation | 501924233 | 755902543 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | f | sl_subscribe-pkey relation | 501924233 | 501927833 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | sl_table relation | 501924233 | 760027156 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | sl_table-pkey relation | 501924233 | 761881818 | | | | | | | | 11/764780 | 80081 | AccessShareLock | t | t | sl_table_tab_reloid_key transactionid | | | | | | 8795986 | | | | 11/764780 | 80081 | ExclusiveLock | t | f | virtualxid | | | | | 17/11810762 | | | | | 17/11810762 | 26140 | ExclusiveLock | t | t | virtualxid | | | | | 11/764780 | | | | | 11/764780 | 80081 | ExclusiveLock | t | t | (46 rows) I'm logging apply stats activity, but don't see anything out of the ordinary there except for the unusually large duration of this copy: as_timestamp | as_origin | as_num_insert | as_num_update | as_num_delete | as_duration | as_cache_prepare | as_cache_hit | as_cache_evict | as_cache_prepare_max ----------------------------+-----------+---------------+---------------+---------------+--------------+------------------+--------------+----------------+---------------------- 2015-11-20 10:47:01.310719 | 7 | 592 | 713 | 41 | 00:00:04.039 | 523 | 823 | 0 | 0 2015-11-20 12:29:01.694959 | 7 | 7312 | 10626 | 200509 | 01:42:16.267 | 605 | 217842 | 77 | 0 2015-11-20 12:30:01.947644 | 7 | 7157 | 21813 | 27925 | 00:01:01.226 | 479 | 56416 | 179 | 0 I was wondering if this could be anything funny happening in logApply, but don't really know enough about that trigger to investigate. Perhaps next time I'll try and get a few backtraces from the backend, but alas for now I'm non the wiser. Any hint's would be greatly appreciated. Glyn Table definitions: db=> \d seating Table "tickets.seating" Column | Type | Modifiers ---------------+-----------------------------+----------------------------------------------------------------------- recnum | bigint | not null default nextval(('"tickets"."seating_dfseq"'::text)::regclass) show | numeric(8,0) | not null default 0 type | numeric(4,0) | not null default 0 block | character varying(8) | not null default ' '::character varying row | numeric(14,0) | not null default 0 seat | numeric(8,0) | not null default 0 flag | character varying(15) | not null default ' '::character varying transno | numeric(8,0) | not null default 0 best | numeric(4,0) | not null default 0 user | character varying(15) | not null default ' '::character varying time | numeric(10,0) | not null default 0 date | date | not null default '0001-01-01'::date date_reserved | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone Indexes: "seating_index01" PRIMARY KEY, btree (show, type, best, block, "row", seat) "seating_index00" UNIQUE, btree (recnum) "seating_index02" UNIQUE, btree (show, type, best, block, flag, "row", seat, recnum) "seating_index03" UNIQUE, btree (show, type, flag, best, block, "row", seat, recnum) "seating_index04" UNIQUE, btree (date_reserved, recnum) Triggers: _replication_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON seating FOR EACH ROW EXECUTE PROCEDURE _replication.denyaccess('_replication') _replication_truncatedeny BEFORE TRUNCATE ON seating FOR EACH STATEMENT EXECUTE PROCEDURE _replication.deny_truncate() Disabled user triggers: _replication_logtrigger AFTER INSERT OR DELETE OR UPDATE ON seating FOR EACH ROW EXECUTE PROCEDURE _replication.logtrigger('_replication', '813', 'vkkkkkvvk') _replication_truncatetrigger BEFORE TRUNCATE ON seating FOR EACH STATEMENT EXECUTE PROCEDURE _replication.log_truncate('813') Triggers firing always: table_log_trigger AFTER INSERT OR DELETE OR UPDATE ON seating FOR EACH ROW EXECUTE PROCEDURE table_log('at_seating', '1', 'audit') sb=> \d seating2 Table "tickets.seating2" Column | Type | Modifiers ---------------+-----------------------+------------------------------------------------------------------------ recnum | bigint | not null default nextval(('"tickets"."seating2_dfseq"'::text)::regclass) show | numeric(8,0) | not null default 0 type | numeric(4,0) | not null default 0 block | character varying(8) | not null default ' '::character varying row | numeric(14,0) | not null default 0 seat | numeric(8,0) | not null default 0 flag | character varying(15) | not null default ' '::character varying transno | numeric(8,0) | not null default 0 best | numeric(4,0) | not null default 0 user | character varying(15) | not null default ' '::character varying time | numeric(10,0) | not null default 0 received | character varying(1) | not null default ' '::character varying asc1 | character varying(20) | not null default ' '::character varying asc2 | character varying(20) | not null default ' '::character varying num1 | numeric(8,2) | not null default 0 num2 | numeric(8,2) | not null default 0 date1 | date | not null default '0001-01-01'::date date2 | date | not null default '0001-01-01'::date ever_received | boolean | Indexes: "seating2_index01" PRIMARY KEY, btree (show, type, best, block, "row", seat) "seating2_index00" UNIQUE, btree (recnum) "seating2_index02" UNIQUE, btree (show, type, block, "row", seat) "seating2_index04" UNIQUE, btree (transno, best, block, "row", seat, recnum) "seating2_index05" UNIQUE, btree (show, best, block, "row", seat, recnum) "seating2_date1" btree (date1) Triggers: _replication_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON seating2 FOR EACH ROW EXECUTE PROCEDURE _replication.denyaccess('_replication') _replication_truncatedeny BEFORE TRUNCATE ON seating2 FOR EACH STATEMENT EXECUTE PROCEDURE _replication.deny_truncate() Disabled user triggers: _replication_logtrigger AFTER INSERT OR DELETE OR UPDATE ON seating2 FOR EACH ROW EXECUTE PROCEDURE _replication.logtrigger('_replication', '812', 'vkkkkkvvk') _replication_truncatetrigger BEFORE TRUNCATE ON seating2 FOR EACH STATEMENT EXECUTE PROCEDURE _replication.log_truncate('812') Triggers firing always: table_log_trigger AFTER INSERT OR DELETE OR UPDATE ON seating2 FOR EACH ROW EXECUTE PROCEDURE table_log('at_seating2', '1', 'audit')
- Previous message: [Slony1-general] slony newbie setup trouble
- Next message: [Slony1-general] Slony replication lag with moderate quantity of deletes
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list