Sun Apr 27 10:06:54 PDT 2014
- Previous message: [Slony1-general] Replication fails on Corrupted PK index
- Next message: [Slony1-general] Replication fails on Corrupted PK index
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 04/25/14 07:32, Raghav wrote: > Hi, > > I am facing an issue, where one of primary key index got corrupted, due > to which duplicate rows got inserted into the table. Same duplicate rows > are captured by slony in its sl_log_* and failing to replay those events > on slave with duplicate key violation and aborting. Later I found a > Slony Faq where it describes the similar situation(I am guessing) under > "*8.4. *Replication Fails - Unique Constraint Violation" section in this > link. http://main.slony.info/documentation/1.2/faq.html#AEN7936 Let me make sure that I understand the issue correctly. You say that your master database has a corrupted index, which allowed duplicate keys to be inserted. This means that without removing those duplicate rows, you could not REINDEX the table on the master either. Replication fails because you do not have the same index corruption on the replica(s). Correct me if I misunderstood. The following is based on this assumption. > > I could able to fix it by removing entries manually from the sl_log_* to > continue syncing. > My curious question is, why slony sl_log_* are not capturing distinct > values. I believe if it captures the distinct values in sl_log_* then > there's no point of sync abort. Correct me if am wrong. > Also, the link describes the situation of Slony 1.2 version whereas am > using latest version of slony. Is this still expected in this version of > slony ? Slony log tables are capturing INSERT, UPDATE, DELETE and TRUNCATE operations. Except for the TRUNCATE, all those operations are captured on the row level. These insert operations that led to duplicate keys did happen, even though they should not have. But Slony has A) no way of detecting that and B) it isn't Slony's duty to second guess the integrity of the master database. What you might have here is that your replica contains all your data up to a second before the index got corrupted. I say "might" because this sort of index corruption could have had other side effects, like not finding rows on UPDATE or DELETE, which therefore contain old data or should not be there at all. Unless your application checks the query execution results for the number of rows affected, those UPDATE/DELETE failures would not have been detected and since no rows were affected, no such operations would have been captured and replicated either. If you indeed find that your replica contains "better" data than the corrupted master, then you could correct the situation easier by performing a FAILOVER, or by uninstalling Slony and rebuilding the cluster based on the healthy replica as initial master. Regards, Jan > > Details of issue for your reference: > *_Version:_* > > PG 9.3, RHEL, Slony 2.2 > > *_On Primary_* > > postgres=# \d dtest > Table "public.dtest" > Column | Type | Modifiers > --------+---------------+----------- > id | integer | not null > name | character(10) | > Indexes: > "dtest_pkey" PRIMARY KEY, btree (id) > Triggers: > _rf_logtrigger AFTER INSERT OR DELETE OR UPDATE ON dtest FOR EACH > ROW EXECUTE PROCEDURE _rf.logtrigger('_rf', '2', 'k') > _rf_truncatetrigger BEFORE TRUNCATE ON dtest FOR EACH STATEMENT > EXECUTE PROCEDURE _rf.log_truncate('2') > Disabled triggers: > _rf_denyaccess BEFORE INSERT OR DELETE OR UPDATE ON dtest FOR EACH > ROW EXECUTE PROCEDURE _rf.denyaccess('_rf') > _rf_truncatedeny BEFORE TRUNCATE ON dtest FOR EACH STATEMENT > EXECUTE PROCEDURE _rf.deny_truncate() > > postgres=# select * from dtest ; /// Due to corruption you have > duplicate rows. > id | name > ----+------------ > 1 | A > 2 | B > 3 | C > 4 | D > 4 | D > > postgres=# select * from _rf.sl_log_2; /// This has captured the event > as it is > log_origin | log_txid | log_tableid | log_actionseq | log_tablenspname > | log_tablerelname | log_cmdtype | log_cmdupdncols | log_cmdargs > ------------+----------+-------------+---------------+------------------+------------------+-------------+-----------------+------------------ > 1 | 12292 | 2 | 9 | public > | dtest | T | 0 | {} > 1 | 12377 | 2 | 10 | public > | dtest | I | 0 | {id,1,name,"A > 1 | 12389 | 2 | 11 | public > | dtest | I | 0 | {id,2,name,"B > 1 | 12400 | 2 | 12 | public > | dtest | I | 0 | {id,3,name,"C > 1 | 13605 | 2 | 13 | public > | dtest | I | 0 | {id,4,name,"D > 1 | 13611 | 2 | 14 | public > | dtest | I | 0 | {id,4,name,"D > (6 rows) > > > *_Slony Log information:_* > > 2014-04-11 11:20:09 PDT ERROR remoteWorkerThread_1_1: error at end of > COPY IN: ERROR: duplicate key value violates unique constraint "dtest_pkey" > DETAIL: Key (id)=(4) already exists. > CONTEXT: SQL statement "INSERT INTO "public"."dtest" ("id", "name") > VALUES ($1, $2);" > COPY sl_log_2, line 1: "1 13611 2 14 public dtest > I 0 {id,4,name,"D "}" > 2014-04-11 11:20:09 PDT ERROR remoteWorkerThread_1_1: failed SYNC's log > selection query was 'COPY ( select log_origin, log_txid, NULL::integer, > log_actionseq, NULL::text, NULL::text, log_cmdtype, NULL::integer, > log_cmdargs from "_rf".sl_log_script where log_origin = 1 and log_txid > >= "pg_catalog".txid_snapshot_xmax('13607:13607:') and log_txid < > '13614' and "pg_catalog".txid_visible_in_snapshot(log_txid, > '13614:13614:') union all select log_origin, log_txid, NULL::integer, > log_actionseq, NULL::text, NULL::text, log_cmdtype, NULL::integer, > log_cmdargs from "_rf".sl_log_script where log_origin = 1 and log_txid > in (select * from "pg_catalog".txid_snapshot_xip('13607:13607:') except > select * from "pg_catalog".txid_snapshot_xip('13614:13614:') ) union all > select log_origin, log_txid, log_tableid, log_actionseq, > log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, > log_cmdargs from "_rf".sl_log_2 where log_origin = 1 and log_tableid in > (1) and log_txid >= '13607' and log_txid < '13614' and > "pg_catalog".txid_visible_in_snapshot(log_txid, '13614:13614:') union > all select log_origin, log_txid, log_tableid, log_actionseq, > log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, > log_cmdargs from "_rf".sl_log_2 where log_origin = 1 and log_tableid in > (1) and log_txid in (select * from > "pg_catalog".txid_snapshot_xip('13607:13607:') except select * from > "pg_catalog".txid_snapshot_xip('13614:13614:') ) union all select > log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, > log_tablerelname, log_cmdtype, log_cmdupdncols, log_cmdargs from > "_rf".sl_log_2 where log_origin = 1 and log_tableid in (2) and log_txid > >= '13607' and log_txid < '13614' and > "pg_catalog".txid_visible_in_snapshot(log_txid, '13614:13614:') union > all select log_origin, log_txid, log_tableid, log_actionseq, > log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, > log_cmdargs from "_rf".sl_log_2 where log_origin = 1 and log_tableid in > (2) and log_txid in (select * from > "pg_catalog".txid_snapshot_xip('13607:13607:') except select * from > "pg_catalog".txid_snapshot_xip('13614:13614:') ) union all select > log_origin, log_txid, log_tableid, log_actionseq, log_tablenspname, > log_tablerelname, log_cmdtype, log_cmdupdncols, log_cmdargs from > "_rf".sl_log_2 where log_origin = 1 and log_tableid in (3) and log_txid > >= '13607' and log_txid < '13614' and > "pg_catalog".txid_visible_in_snapshot(log_txid, '13614:13614:') union > all select log_origin, log_txid, log_tableid, log_actionseq, > log_tablenspname, log_tablerelname, log_cmdtype, log_cmdupdncols, > log_cmdargs from "_rf".sl_log_2 where log_origin = 1 and log_tableid in > (3) and log_txid in (select * from > "pg_catalog".txid_snapshot_xip('13607:13607:') except select * from > "pg_catalog".txid_snapshot_xip('13614:13614:') ) order by log_actionseq) > TO STDOUT' > 2014-04-11 11:20:09 PDT ERROR remoteWorkerThread_1: SYNC aborted > > Thank you in advance. > > -- > Regards > Raghav > Blog: htt://raghavt.blogspot.com/ <http://raghavt.blogspot.com/> > > > _______________________________________________ > Slony1-general mailing list > Slony1-general at lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-general > -- Jan Wieck Senior Software Engineer http://slony.info
- Previous message: [Slony1-general] Replication fails on Corrupted PK index
- Next message: [Slony1-general] Replication fails on Corrupted PK index
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list