Thu Sep 27 15:51:04 PDT 2007
- Previous message: [Slony1-bugs] Slony 1.2.10: Deadlock on slave during execute script
- Next message: [Slony1-bugs] Slony 1.2.10: Deadlock on slave during execute script
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Jeff Frost wrote: > For extra background on this bug, see the thread here: > http://lists.slony.info/pipermail/slony1-general/2007-September/006687.html > > > This is on Slony1-1.2.10, PostgreSQL-8.1.9: > > I've run into a situation on a client cluster composed of a master and > 2 slave nodes where a deadlock on one of the slaves happens quite > regularly during any EXECUTE SCRIT commands. It seems if slony loses > the deadlock, some of the tables are left in a not altered for > replication state and this breaks replication. I am now mighty confused as to how this could happen. I headed down a pretty void rabbit trail, trying to duplicate the condition by trying to induce a deadlock. I had not much luck trying to do that; the tendancy was for deadlocks to occur against the queries with the lower locking levels. So I decided to instrument a slon and try to induce it to get into the described "no longer altered for replication" state. I added a nice "exit(-1);" at exactly the point at which you found the induction of the deadlock. The slon fell over at the end of processing the DDL script, as expected. And when it fell over, the transaction was (implicitly) rolled back, so that the condition of the subscriber was recovered to consistency. 2007-09-27 20:21:16 UTC DEBUG2 remoteWorkerThread_1: Received event 1,30 DDL_SCRIPT 2007-09-27 20:21:16 UTC INFO Checking local node id 2007-09-27 20:21:16 UTC INFO Found local node id 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress2" is 80300 2007-09-27 20:21:16 UTC DEBUG4 remoteWorkerThread_3: update provider configuration 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL request with 10 statements 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 0: [create sequence ncolseq;] 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress3 host=localhost user=cbbrowne port=5834" is 80300 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress2" is 80300 2007-09-27 20:21:16 UTC DEBUG4 version for "dbname=slonyregress2" is 80300 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 1: [ alter table table4 add column newcol timestamptz;] 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 2: [ alter table table4 alter column newcol set default now();] 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 3: [ alter table table4 rename column id1 to col1;] 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 4: [ alter table table4 rename column id2 to col2;] 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 5: [ ALTER TABLE billing_discount ADD column "use_term" character(1);] 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 6: [ ALTER TABLE billing_discount ALTER column use_term set default 'n';] 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 7: [ ALTER TABLE billing_discount add constraint use_term_cons check (use_term in ('y','n'));] 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 8: [ alter table table1 rename column id to col1;] 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_COMMAND_OK 2007-09-27 20:21:16 UTC CONFIG remoteWorkerThread_1: DDL Statement 9: [] 2007-09-27 20:21:16 UTC CONFIG DDL success - PGRES_EMPTY_QUERY 2007-09-27 20:21:16 UTC DEBUG2 slon: child terminated status: 65280; pid: 24126, current worker pid: 24126 2007-09-27 20:21:16 UTC DEBUG1 slon: restart of worker in 10 seconds 2007-09-27 20:21:20 UTC DEBUG1 slon: shutdown requested 2007-09-27 20:21:20 UTC DEBUG2 slon: notify worker process to shutdown 2007-09-27 20:21:20 UTC DEBUG1 slon: done 2007-09-27 20:21:20 UTC DEBUG2 slon: exit(0) So, for instance, statement #6 was going to alter table billing_discount to add a CHECK constraint: As we can see, that constraint isn't there. slonyregress2@[local]:5834=# \d billing_discount Table "public.billing_discount" Column | Type | Modifiers ---------------------+--------------------------+-------------------------------------------------------------------- discount_code | character(2) | not null billing_object_type | character varying(10) | not null billing_action_type | character varying(10) | not null discount_amount | numeric(7,2) | not null start_date | timestamp with time zone | not null end_date | timestamp with time zone | not null billing_discount_id | integer | not null default nextval(('billing_discount_seq'::text)::regclass) registrar_id | integer | tld_id | integer | zone_id | integer | Indexes: "billing_discount_pkey" PRIMARY KEY, btree (billing_discount_id) Triggers: _slony_regress1_denyaccess_5 BEFORE INSERT OR DELETE OR UPDATE ON billing_discount FOR EACH ROW EXECUTE PROCEDURE _slony_regress1.denyaccess('_slony_regress1') slonyregress2@[local]:5834=# This heads me back to the "wait, that's impossible!" expectation that I started with. When the ERROR message was submitted to the slon log, that means that the transaction rolls back, and everything recovers to where it was. That's what I see happen here; I can't quite fathom why that's not happening there.
- Previous message: [Slony1-bugs] Slony 1.2.10: Deadlock on slave during execute script
- Next message: [Slony1-bugs] Slony 1.2.10: Deadlock on slave during execute script
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-bugs mailing list