Thu Mar 9 12:51:37 PST 2006
- Previous message: [Slony1-general] recommended version (postgres)
- Next message: [Slony1-general] Slon not catching up
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
slony1-1.1.5-rc2
postgres 8.1.2
With multiple replication sets defined (typically 2-4), in testing
failover, I am consistently getting deadlocks on the sl_setsync table
between the slonik failover connection and the local connection used by
remote_worker.c in the sync_event method.
I have node A as the origin of, say, 3 sets, and node B has a
subscription to each set. The deadlock errors are showing up in the db
log on node B after I attempt run a slonik failover on node B.
Is anybody else using multiple replication sets? We have to dynamically
add schemas to our database, which need to be replicated. Rather than do
a MERGE SET for each schema added, we created a separate replication set
for each one. This now appears to have been a big mistake, and we are
re-working the code to do a MERGE SET when we add new tables, but it
seems that multiple replication sets should work.
Here is yet another example from the log:
2006-03-09 10:01:36 EST-[19314] LOG: duration: 0.148 ms statement:
begin transaction; set transaction isolation level serializable;
2006-03-09 10:01:36 EST-[19314] LOG: duration: 0.534 ms statement:
select SSY.ssy_setid, SSY.ssy_seqno, SSY.ssy_minxid, SSY.ssy_maxxid,
SSY.ssy_xip, SSY.ssy_action_list from "_tzreplic".sl_setsync SSY
where SSY.ssy_seqno < '180' and SSY.ssy_setid in (271,388,400,412);
2006-03-09 10:01:36 EST-[19314] LOG: duration: 1.144 ms statement:
select T.tab_id, T.tab_set, "_tzreplic".slon_quote_input('"' ||
PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from
"_tzreplic".sl_table T, "pg_catalog".pg_class PGC,
"pg_catalog".pg_namespace PGN where T.tab_set = 388 and PGC.oid =
T.tab_reloid and PGC.relnamespace = PGN.oid;
2006-03-09 10:01:36 EST-[19314] LOG: duration: 0.908 ms statement:
select T.tab_id, T.tab_set, "_tzreplic".slon_quote_input('"' ||
PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from
"_tzreplic".sl_table T, "pg_catalog".pg_class PGC,
"pg_catalog".pg_namespace PGN where T.tab_set = 400 and PGC.oid =
T.tab_reloid and PGC.relnamespace = PGN.oid;
2006-03-09 10:01:36 EST-[19314] LOG: duration: 0.886 ms statement:
select T.tab_id, T.tab_set, "_tzreplic".slon_quote_input('"' ||
PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from
"_tzreplic".sl_table T, "pg_catalog".pg_class PGC,
"pg_catalog".pg_namespace PGN where T.tab_set = 412 and PGC.oid =
T.tab_reloid and PGC.relnamespace = PGN.oid;
2006-03-09 10:01:36 EST-[19314] LOG: duration: 4.225 ms statement:
select T.tab_id, T.tab_set, "_tzreplic".slon_quote_input('"' ||
PGN.nspname || '"."' || PGC.relname || '"') as tab_fqname from
"_tzreplic".sl_table T, "pg_catalog".pg_class PGC,
"pg_catalog".pg_namespace PGN where T.tab_set = 271 and PGC.oid =
T.tab_reloid and PGC.relnamespace = PGN.oid;
2006-03-09 10:01:36 EST-[19316] LOG: duration: 0.246 ms statement:
start transaction;set transaction isolation level serializable;select
last_value from "_tzreplic".sl_action_seq;
2006-03-09 10:01:36 EST-[19316] LOG: duration: 0.047 ms statement:
rollback transaction;
2006-03-09 10:01:37 EST-[19316] LOG: duration: 0.267 ms statement:
start transaction;set transaction isolation level serializable;select
last_value from "_tzreplic".sl_action_seq;
2006-03-09 10:01:37 EST-[19316] LOG: duration: 0.050 ms statement:
rollback transaction;
2006-03-09 10:01:37 EST-[19445] NOTICE: failedNode: set 388 has no
other direct receivers - move now
2006-03-09 10:01:37 EST-[19445] STATEMENT: select
"_tzreplic".failedNode(1, 2);
2006-03-09 10:01:38 EST-[19316] LOG: duration: 0.320 ms statement:
start transaction;set transaction isolation level serializable;select
last_value from "_tzreplic".sl_action_seq;
2006-03-09 10:01:38 EST-[19316] LOG: duration: 0.049 ms statement:
rollback transaction;
2006-03-09 10:01:38 EST-[19445] ERROR: deadlock detected
2006-03-09 10:01:38 EST-[19445] DETAIL: Process 19445 waits for
ShareLock on transaction 5609; blocked by process 19314.
Process 19314 waits for ShareLock on transaction 5596; blocked by
process 19445.
2006-03-09 10:01:38 EST-[19445] CONTEXT: SQL statement "delete from
"_tzreplic".sl_setsync where ssy_setid = $1 "
PL/pgSQL function "failednode" line 115 at SQL statement
2006-03-09 10:01:38 EST-[19445] STATEMENT: select
"_tzreplic".failedNode(1, 2);
2006-03-09 10:01:38 EST-[19314] LOG: duration: 2599.338 ms statement:
update "_tzreplic".sl_setsync set ssy_seqno = '180', ssy_minxid =
'6111', ssy_maxxid = '6112', ssy_xip = '', ssy_action_list = ''
where ssy_setid in (271,388,400,412) and ssy_seqno < '180';
2006-03-09 10:01:38 EST-[19445] LOG: duration: 0.064 ms statement:
rollback transaction;
2006-03-09 10:01:38 EST-[19314] LOG: duration: 0.238 ms statement:
insert into "_tzreplic".sl_seqlog (seql_seqid, seql_origin,
seql_ev_seqno, seql_last_value) values (0, 1, '180',
'1000000000000000');
2006-03-09 10:01:38 EST-[19314] LOG: duration: 3.842 ms statement:
notify "_tzreplic_Event"; notify "_tzreplic_Confirm"; insert into
"_tzreplic".sl_event (ev_origin, ev_seqno, ev_timestamp,
ev_minxid, ev_maxxid, ev_xip, ev_type ) values ('1', '180',
'2006-03-09 10:01:36.232092', '6111', '6112', '', 'SYNC'); insert into
"_tzreplic".sl_confirm (con_origin, con_received, con_seqno,
con_timestamp) values (1, 2, '180', now()); commit transaction;
My interpretation of this log is that the sync_event in the
remote_worker thread started running, then a slonik failover command
came in, deadlocked and got kicked out, and the sync_event transaction
was allowed to finish. Does that make sense?
Thanks.
- DAP
======================================================
David Parker Tazz Networks
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://gborg.postgresql.org/pipermail/slony1-general/attachments/20060309/85d3ad8e/attachment.html
- Previous message: [Slony1-general] recommended version (postgres)
- Next message: [Slony1-general] Slon not catching up
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-general mailing list