Steve Singer ssinger at ca.afilias.info
Thu Jun 24 08:38:44 PDT 2010
In going through some final pre-release test results in preparation for 
2.0.4 I MIGHT have hit the following circumstance (I haven't yet 
confirmed that this happened, but I think it is possible):

T1: transaction starts - via slonik EXECUTE SCRIPT
T1: ddlScriptPrepare creates a SYNC event in sl_event (event #10)
T2: transaction starts - some sql session
T2: INSERT into a table 'foo'
T2: COMMITS - this works, since so far T1 has had no reason to obtain any 
locks on 'foo'.
T1: executes the DDL script; this script ALTERs the table 'foo' and 
renames a column.  This obtains the requisite locks on 'foo'.  T2 has 
long since committed and released its locks.
T1: inserts the DDL_SCRIPT event into sl_event (event #11)
T3: a worker thread generates the next normal SYNC event on the 
database; this is event #12

The INSERT that T2 did will be picked up by sync #12 since it hadn't yet 
committed when sync #10 happened.

On the replica, the DDL script (event #11) will be applied before 
sync #12.  When the replica gets to sync #12, the data in sl_log for the 
insert will reference the old column names of 'foo', which no longer exist.
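
To make the ordering concrete, here is a rough toy model of the sequence above in Python. It is only a sketch and not Slony code: the event dictionaries, the column names (old_name, new_name) and the replay loop are all made up for illustration, and it assumes the replica simply applies events in sequence-number order.

```python
# Toy model of the race described above. This is NOT Slony-I code: the event
# dicts, column names and replay loop are all invented for illustration.

def build_origin_events():
    """Return the origin's events in sequence order, as in the timeline."""
    # T2's INSERT is logged with the column names that existed when it ran.
    t2_row = {"table": "foo", "values": {"id": 1, "old_name": "x"}}
    return [
        # T1: ddlScriptPrepare creates SYNC #10; T2 has not committed yet,
        # so its row is not visible to this SYNC.
        {"seq": 10, "type": "SYNC", "rows": []},
        # T1: the script renames a column, then DDL_SCRIPT is queued as #11.
        {"seq": 11, "type": "DDL_SCRIPT", "table": "foo",
         "rename": ("old_name", "new_name")},
        # T3: the next ordinary SYNC, #12, picks up T2's committed row.
        {"seq": 12, "type": "SYNC", "rows": [t2_row]},
    ]


def replay_on_replica(events, schema):
    """Apply events in sequence order; return (seq, missing_columns) failures."""
    failures = []
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["type"] == "DDL_SCRIPT":
            # The replica's copy of the table gets the rename first.
            old, new = event["rename"]
            columns = schema[event["table"]]
            columns.remove(old)
            columns.add(new)
        else:
            # A SYNC replays logged rows against the replica's current schema.
            for row in event["rows"]:
                missing = set(row["values"]) - schema[row["table"]]
                if missing:
                    failures.append((event["seq"], sorted(missing)))
    return failures


if __name__ == "__main__":
    replica_schema = {"foo": {"id", "old_name"}}
    print(replay_on_replica(build_origin_events(), replica_schema))
```

In this model the only failure is reported at sync #12, for the column that the DDL at #11 already renamed.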


I THINK this has happened to me in testing (but I no longer have sl_log 
and sl_event to confirm, just the logs).

Looking at the code I think this is at least possible to happen.  Does 
anyone disagree?


How should this affect our 2.0.4 plans?  If this is an issue, it's been 
there all along in 2.0.x; because we lock all the tables in 1.2 on an 
execute script, it shouldn't be an issue there.  I don't see any quick 
fixes to this.




-- 
Steve Singer
Afilias Canada
Data Services Developer
416-673-1142


More information about the Slony1-hackers mailing list