Glyn Astill glynastill at yahoo.co.uk
Fri Nov 20 08:00:30 PST 2015
Using Slony 2.2.4, Pg 9.4.5 I infrequently see replication lag when deleting a moderately large quantity of records.

I've seen this very infrequently (i.e 3 or so times in a year), but came to a dead end investigating the cause last time (previous thread here http://comments.gmane.org/gmane.comp.db.postgresql.slony1.general/11556).  

I'm hoping someone might be able to nudge me in the right direction this time.

The cause seems to be chunks of deletes on 2 specific tables (within a transaction), although I can't see a reason why these specific tables cause issues other than the tables having pretty poor primary keys and 
indexes (Table definitions at the bottom).  

Changes to each table are logged by a table_log trigger (https://github.com/glynastill/pg_table_audit), which we admittedly need to move away from, but I can't see a reason for this to be an issue here either.

There is activity from other tables, but I'm pretty certain it's something to do with deletes from these specific tables, although I can't reliably reproduce the behaviour.

In this case deleting around 100,000 records from each table on the origin (taking around 68 seconds), causes the remote worker thread on all the subscribers to take approximately 90 minutes copying into it's sl_log table. E.g:

2015-11-20 12:28:21 GMT [9843]: [5-1] app=slon.remoteWorkerThread_7,user=slony,db=db,host=10.16.10.101 LOG:  duration: 6096341.085 ms  statement: COPY "_replication"."sl_log_2" ( log_origin, log_txid,log_tableid,log_actionseq,log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols,log_cmdargs) FROM STDIN

The hardware on the subscribers is identical to the origin, and the postgres backend on the subscriber is at 100% cpu during this time.  It's not waiting on I/O, and nothing jumps out to me from pg_locks 
during this time other than the slow deletes:

SELECT pl.*, c.relname FROM pg_locks pl 
LEFT JOIN pg_class c on pl.relation = c.oid 
ORDER BY c.relname;


locktype    | database  |  relation  | page | tuple | virtualxid  | transactionid | classid | objid | objsubid | virtualtransaction |  pid  |       mode       | granted | fastpath |                 relname 
---------------+-----------+------------+------+-------+-------------+---------------+---------+-------+----------+--------------------+-------+------------------+---------+----------+------------------------------------------ 
relation      | 501924233 | 1013283824 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | t        | PartInd_main_replication_sl_log_1-node-7 
relation      | 501924233 |  501930895 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | at_seats 
relation      | 501924233 |  501930899 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | t        | at_seats2 
relation      | 501924233 |  501930902 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | f        | at_seats2_trigger_id_seq 
relation      | 501924233 |  501930904 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | f        | at_seats_trigger_id_seq 
relation      | 501924233 |       1259 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | f        | pg_class 
relation      | 501924233 |       1259 |      |       |             |               |         |       |          | 17/11810762        | 26140 | AccessShareLock  | t       | t        | pg_class 
relation      | 501924233 |       2662 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | pg_class_oid_index 
relation      | 501924233 |       2662 |      |       |             |               |         |       |          | 17/11810762        | 26140 | AccessShareLock  | t       | t        | pg_class_oid_index 
relation      | 501924233 |       2663 |      |       |             |               |         |       |          | 17/11810762        | 26140 | AccessShareLock  | t       | t        | pg_class_relname_nsp_index 
relation      | 501924233 |       2663 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | pg_class_relname_nsp_index 
relation      | 501924233 |       3455 |      |       |             |               |         |       |          | 17/11810762        | 26140 | AccessShareLock  | t       | t        | pg_class_tblspc_relfilenode_index 
relation      | 501924233 |       3455 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | pg_class_tblspc_relfilenode_index 
relation      | 501924233 |      11187 |      |       |             |               |         |       |          | 17/11810762        | 26140 | AccessShareLock  | t       | t        | pg_locks 
relation      | 501924233 |       2615 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | pg_namespace 
relation      | 501924233 |       2684 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | pg_namespace_nspname_index 
relation      | 501924233 |       2685 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | pg_namespace_oid_index 
relation      | 501924233 |  501935104 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats 
relation      | 501924233 |  501935120 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats2 
relation      | 501924233 | 1003806325 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats2_date1 
relation      | 501924233 | 1003806324 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats2_index00 
relation      | 501924233 |  995705316 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats2_index01 
relation      | 501924233 | 1003806326 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats2_index02 
relation      | 501924233 | 1003806328 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats2_index04 
relation      | 501924233 | 1003806327 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats2_index05 
relation      | 501924233 | 1003806283 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats_index00 
relation      | 501924233 |  992270676 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats_index01 
relation      | 501924233 | 1003806286 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats_index02 
relation      | 501924233 | 1003806285 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats_index03 
relation      | 501924233 | 1003806284 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | seats_index04 
relation      | 501924233 |  501927863 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | sl_apply_stats 
relation      | 501924233 |  501927907 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | f        | sl_local_node_id 
relation      | 501924233 |  501927909 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | t        | sl_log_1 
relation      | 501924233 |  763450577 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | t        | sl_log_1_idx1 
relation      | 501924233 |  501927921 |      |       |             |               |         |       |          | 11/764780          | 80081 | RowExclusiveLock | t       | f        | sl_log_script 
relation      | 501924233 |  501927927 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | sl_log_status 
relation      | 501924233 |  501927956 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | sl_setsync 
relation      | 501924233 |  758946419 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | sl_setsync-pkey 
relation      | 501924233 |  501927849 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | f        | sl_subscribe 
relation      | 501924233 |  755902543 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | f        | sl_subscribe-pkey 
relation      | 501924233 |  501927833 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | sl_table 
relation      | 501924233 |  760027156 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | sl_table-pkey 
relation      | 501924233 |  761881818 |      |       |             |               |         |       |          | 11/764780          | 80081 | AccessShareLock  | t       | t        | sl_table_tab_reloid_key 
transactionid |           |            |      |       |             |       8795986 |         |       |          | 11/764780          | 80081 | ExclusiveLock    | t       | f        | 
virtualxid    |           |            |      |       | 17/11810762 |               |         |       |          | 17/11810762        | 26140 | ExclusiveLock    | t       | t        | 
virtualxid    |           |            |      |       | 11/764780   |               |         |       |          | 11/764780          | 80081 | ExclusiveLock    | t       | t        | 
(46 rows)  

I'm logging apply stats activity, but don't see anything out of the ordinary there except for the unusually large duration of this copy:


as_timestamp        | as_origin | as_num_insert | as_num_update | as_num_delete | as_duration  | as_cache_prepare | as_cache_hit | as_cache_evict | as_cache_prepare_max 
----------------------------+-----------+---------------+---------------+---------------+--------------+------------------+--------------+----------------+---------------------- 
2015-11-20 10:47:01.310719 |         7 |           592 |           713 |            41 | 00:00:04.039 |              523 |          823 |              0 |                    0 
2015-11-20 12:29:01.694959 |         7 |          7312 |         10626 |        200509 | 01:42:16.267 |              605 |       217842 |             77 |                    0 
2015-11-20 12:30:01.947644 |         7 |          7157 |         21813 |         27925 | 00:01:01.226 |              479 |        56416 |            179 |                    0

I was wondering if this could be anything funny happening in logApply, but don't really know enough about that trigger to investigate. Perhaps next time I'll try and get a few backtraces from the backend, but alas 
for now I'm non the wiser.

Any hint's would be greatly appreciated.

Glyn


Table definitions:

db=> \d seating
                                                Table "tickets.seating"
    Column     |            Type             |                               Modifiers
---------------+-----------------------------+-----------------------------------------------------------------------
 recnum        | bigint                      | not null default nextval(('"tickets"."seating_dfseq"'::text)::regclass)
 show          | numeric(8,0)                | not null default 0
 type          | numeric(4,0)                | not null default 0
 block         | character varying(8)        | not null default ' '::character varying
 row           | numeric(14,0)               | not null default 0
 seat          | numeric(8,0)                | not null default 0
 flag          | character varying(15)       | not null default ' '::character varying
 transno       | numeric(8,0)                | not null default 0
 best          | numeric(4,0)                | not null default 0
 user          | character varying(15)       | not null default ' '::character varying
 time          | numeric(10,0)               | not null default 0
 date          | date                        | not null default '0001-01-01'::date
 date_reserved | timestamp without time zone | not null default '0001-01-01 00:00:00'::timestamp without time zone
Indexes:
    "seating_index01" PRIMARY KEY, btree (show, type, best, block, "row", seat)
    "seating_index00" UNIQUE, btree (recnum)
    "seating_index02" UNIQUE, btree (show, type, best, block, flag, "row", seat, recnum)
    "seating_index03" UNIQUE, btree (show, type, flag, best, block, "row", seat, recnum)
    "seating_index04" UNIQUE, btree (date_reserved, recnum)
Triggers:
   
_replication_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON seating 
FOR EACH ROW EXECUTE PROCEDURE _replication.denyaccess('_replication')
    _replication_truncatedeny BEFORE TRUNCATE ON seating FOR EACH STATEMENT EXECUTE PROCEDURE _replication.deny_truncate()
Disabled user triggers:
   
_replication_logtrigger AFTER INSERT OR DELETE OR UPDATE ON seating FOR
EACH ROW EXECUTE PROCEDURE _replication.logtrigger('_replication', 
'813', 'vkkkkkvvk')
   
_replication_truncatetrigger BEFORE TRUNCATE ON seating FOR EACH 
STATEMENT EXECUTE PROCEDURE _replication.log_truncate('813')
Triggers firing always:
   
table_log_trigger AFTER INSERT OR DELETE OR UPDATE ON seating FOR EACH 
ROW EXECUTE PROCEDURE table_log('at_seating', '1', 'audit')    
    
sb=> \d seating2
                                             Table "tickets.seating2"
    Column     |         Type          |                               Modifiers
---------------+-----------------------+------------------------------------------------------------------------
 recnum        | bigint                | not null default nextval(('"tickets"."seating2_dfseq"'::text)::regclass)
 show          | numeric(8,0)          | not null default 0
 type          | numeric(4,0)          | not null default 0
 block         | character varying(8)  | not null default ' '::character varying
 row           | numeric(14,0)         | not null default 0
 seat          | numeric(8,0)          | not null default 0
 flag          | character varying(15) | not null default ' '::character varying
 transno       | numeric(8,0)          | not null default 0
 best          | numeric(4,0)          | not null default 0
 user          | character varying(15) | not null default ' '::character varying
 time          | numeric(10,0)         | not null default 0
 received      | character varying(1)  | not null default ' '::character varying
 asc1          | character varying(20) | not null default ' '::character varying
 asc2          | character varying(20) | not null default ' '::character varying
 num1          | numeric(8,2)          | not null default 0
 num2          | numeric(8,2)          | not null default 0
 date1         | date                  | not null default '0001-01-01'::date
 date2         | date                  | not null default '0001-01-01'::date
 ever_received | boolean               |
Indexes:
    "seating2_index01" PRIMARY KEY, btree (show, type, best, block, "row", seat)
    "seating2_index00" UNIQUE, btree (recnum)
    "seating2_index02" UNIQUE, btree (show, type, block, "row", seat)
    "seating2_index04" UNIQUE, btree (transno, best, block, "row", seat, recnum)
    "seating2_index05" UNIQUE, btree (show, best, block, "row", seat, recnum)
    "seating2_date1" btree (date1)
Triggers:
   
_replication_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON seating2 
FOR EACH ROW EXECUTE PROCEDURE _replication.denyaccess('_replication')
    _replication_truncatedeny BEFORE TRUNCATE ON seating2 FOR EACH STATEMENT EXECUTE PROCEDURE _replication.deny_truncate()
Disabled user triggers:
   
_replication_logtrigger AFTER INSERT OR DELETE OR UPDATE ON seating2 
FOR EACH ROW EXECUTE PROCEDURE _replication.logtrigger('_replication', 
'812', 'vkkkkkvvk')
   
_replication_truncatetrigger BEFORE TRUNCATE ON seating2 FOR EACH 
STATEMENT EXECUTE PROCEDURE _replication.log_truncate('812')
Triggers firing always:
   
table_log_trigger AFTER INSERT OR DELETE OR UPDATE ON seating2 FOR EACH
ROW EXECUTE PROCEDURE table_log('at_seating2', '1', 'audit')    


More information about the Slony1-general mailing list