bugzilla-daemon at main.slony.info bugzilla-daemon at main.slony.info
Thu Jun 24 10:48:39 PDT 2010
http://www.slony.info/bugzilla/show_bug.cgi?id=137

           Summary: execute script does not get applied in the correct
                    order
           Product: Slony-I
           Version: 2.0
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: low
         Component: stored procedures
        AssignedTo: slony1-bugs at lists.slony.info
        ReportedBy: ssinger at ca.afilias.info
                CC: slony1-bugs at lists.slony.info
   Estimated Hours: 0.0


This issue was found while running the javascript version of the
testdeadlockddl test.

We have a table table4 with columns id1,id2.

We insert into the table as follows

INSERT INTO table4 VALUES ('......')

Some DDL via EXECUTE SCRIPT is executed to rename id1=>col1 and id2=>col2.

The test, as run when the problem occurred, did the following:
1. Launch a large file of INSERTs via psql. Don't wait for it to finish; the
individual inserts appear to be running in autocommit mode.
2. Execute the DDL via EXECUTE SCRIPT. Wait for slonik to finish.
3. Launch more transactions.

After the DDL_SCRIPT event is processed on the slave we start to get
replication failures: slon is trying to apply INSERTs with id1,id2, but the
columns on the slave have already been renamed.

The relevant part of sl_event is

    1 | 5000000022 | SYNC                | 144375:144525:144375,144383,144393,144407,144412,144423,144476,144485
    1 | 5000000023 | DDL_SCRIPT          | 144522:144538:144529
    1 | 5000000024 | SYNC                | 144375:144529:144375,144383,144393,144407,144412,144423,144476,144485,144522


In sl_log_1 the inserts on table4 switch from referencing id1,id2 to col1,col2
between log_txid=144526 (references id1,id2) and log_txid=144583 (references
col1,col2).

Notice that sync 24 includes log_txid numbers below 144538.
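
To make the snapshot arithmetic concrete, here is a minimal Python sketch (not Slony code) that applies the PostgreSQL txid_snapshot visibility rule to the snapshot strings shown above. The helper names are hypothetical, and the rule that a SYNC delivers rows visible in its own snapshot but not in the previous SYNC's snapshot is my understanding of how slon selects log rows, stated here as an assumption.

```python
# Snapshot text format is "xmin:xmax:xip_list", as in the sl_event rows above.

def parse_snapshot(text):
    """Split 'xmin:xmax:xip,xip,...' into (xmin, xmax, set_of_in_progress)."""
    xmin, xmax, xip = text.split(":")
    in_progress = {int(x) for x in xip.split(",")} if xip else set()
    return int(xmin), int(xmax), in_progress

def visible(txid, snapshot):
    """True if txid had already finished when the snapshot was taken."""
    xmin, xmax, in_progress = snapshot
    if txid < xmin:
        return True
    if txid >= xmax:
        return False
    return txid not in in_progress

def delivered_by(txid, previous_sync, this_sync):
    """Assumed SYNC rule: visible now, but not visible at the previous SYNC."""
    return visible(txid, this_sync) and not visible(txid, previous_sync)

sync_22 = parse_snapshot(
    "144375:144525:144375,144383,144393,144407,144412,144423,144476,144485")
ddl_23 = parse_snapshot("144522:144538:144529")
sync_24 = parse_snapshot(
    "144375:144529:144375,144383,144393,144407,144412,144423,144476,144485,144522")

old_insert = 144526  # log_txid of a row that still references id1,id2

print("seen by SYNC 22:", visible(old_insert, sync_22))
print("delivered by SYNC 24:", delivered_by(old_insert, sync_22, sync_24))
```

Under these assumptions the row logged at log_txid=144526 is not covered by sync 22 and is first delivered by sync 24, that is, after DDL_SCRIPT event 23 has already been applied on the slave.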

What I THINK is happening is


T1: transaction starts - via slonik EXECUTE SCRIPT
T1: ddlScriptPrepare  creates a SYNC event in sl_event event #10
T2: transaction starts - some sql session
T2: INSERT into a table 'foo'
T2: COMMITS - this works because so far T1 has had no reason to obtain any
locks on 'foo'.
T1: executes the DDL script; this script ALTERs the table 'foo' and renames a
column. This obtains the requisite locks on 'foo'. T2 has long since
committed and released its locks.
T1: Inserts the DDL_SCRIPT event into sl_event event #11
T3: A worker thread generates the next normal sync event on the database; this
is event #12

The INSERT that T2 did will be picked up by sync #12 since it hadn't yet
committed when sync #10 happened.

On the replica the DDL script (event #11) will be applied before event #12.
When slon gets to sync #12, the data in sl_log for the insert will reference
the old columns of 'foo', which no longer exist.
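
The suspected sequence can be modelled with a small Python sketch. This is a toy model of the replica only, not Slony code: the event tuples, the `apply_events` helper and the column values are hypothetical, and the only behaviour assumed is that the replica applies events strictly in event-number order.

```python
# Columns of 'foo' on the replica before any of these events are applied.
replica_columns = {"foo": {"id1", "id2"}}

# (event number, type, payload). The INSERT from T2 committed after sync #10
# was generated, so its log row travels with sync #12.
events = [
    (10, "SYNC", []),
    (11, "DDL_SCRIPT", [("foo", "id1", "col1"), ("foo", "id2", "col2")]),
    (12, "SYNC", [("foo", ("id1", "id2"))]),
]

def apply_events(events, columns):
    """Apply events in order; return (event, table, missing columns) failures."""
    failures = []
    for seqno, ev_type, payload in sorted(events, key=lambda e: e[0]):
        if ev_type == "DDL_SCRIPT":
            # Each payload entry renames one column of one table.
            for table, old, new in payload:
                columns[table].remove(old)
                columns[table].add(new)
        else:
            # Each payload entry is a logged INSERT naming the columns it used.
            for table, used in payload:
                missing = sorted(c for c in used if c not in columns[table])
                if missing:
                    failures.append((seqno, table, missing))
    return failures

failures = apply_events(events, replica_columns)
print(failures)
```

In this model the INSERT carried by sync #12 fails because both of the columns it names were renamed when event #11 was applied.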


