Summary: | execute script does not get applied in the correct order | ||
---|---|---|---|
Product: | Slony-I | Reporter: | Steve Singer <ssinger> |
Component: | stored procedures | Assignee: | Jan Wieck <jan> |
Status: | RESOLVED FIXED | ||
Severity: | major | CC: | slony1-bugs |
Priority: | low | ||
Version: | devel | ||
Hardware: | All | ||
OS: | All | ||
Bug Depends on: | 134 | ||
Bug Blocks: |
Description
Steve Singer
2010-06-24 10:48:39 UTC
This can only really be rectified in 2.1. This is an MVCC problem. Consider the following time line with xact2 being the EXECUTE SCRIPT done by slonik: xact1: begin; xact1: insert ... xact2: begin; xact2: select ddlScript_prepare_int(); -- Creates SYNC event xact1: commit; xact2: delete ... This is running in read committed mode, so the delete of xact2 will see the row, inserted by xact1 and delete it. However, the SYNC was created before xact1 committed, so it is considered IN PROGRESS. This causes the insert not to be part of the SYNC and it will get replicated after the ddl script delete. To fix this, slonik needs to issue LOCK TABLE statements right after "begin;". That way, it will wait until all conflicting transactions have committed before generating its own snapshot. We need new syntax to add this information to the EXECUTE SCRIPT command and it will be the responsibility of the user to provide the list of tables to lock. Jan Concurrent with resolution of the other issues, we might wish to shift the DDL data from sl_event to sl_log_1/2. In that case, we'd parse the statements just once, inside Slonik, and introduce one sl_log_* tuple for each statement. This eliminates the need to parse things later, and eliminates the need to link the C code in src/parsestatements to both slon and slonik - it's only needed by slon. We had speculated that this split might allow avoiding the ordering problems, but that is regrettably not the case, even in the ambitious scenario where we'd have commit timestamps available to us, because the time at which the DDL is executed and the time at which the DDL statement is recorded in sl_log_* isn't identical. But we'd get the other benefits mentioned earlier from the split/shift, e.g. - parse once, and eliminate component from slon. BTW, the dependancy on #134 is there because, if we shift DDL into sl_log_*, then the processing of it on subscribers shifts around into the same logic that bug #134 (implementing TRUNCATE-handling triggers) uses. (In reply to comment #4) > BTW, the dependancy on #134 is there because, if we shift DDL into sl_log_*, > then the processing of it on subscribers shifts around into the same logic that > bug #134 (implementing TRUNCATE-handling triggers) uses. Note that some further discussion is taking place on the mailing list; see the following thread: http://lists.slony.info/pipermail/slony1-general/2010-August/011051.html I am "poking away" at this; see branch at GitHub: https://github.com/cbbrowne/slony1-engine/tree/bug137 After discussion with Marko Kreen and Jan Wieck, it should not be necessary to split off the LOCK statements from the DDL. If it's needful to lock tables in particular order to evade deadlock, that is dealt with by putting LOCK requests into the DDL script. But by running each statement immediately before logging into sl_log_{1,2}, that imposes all necessary locks in an order that protects from race conditions. A version of this that functions against version 2.2 may be found here: https://github.com/cbbrowne/slony1-engine/tree/bug137-squashed Note that this branch was arrived at by merging Jan's 2.2 branch and Steve's remote worker changes atop "master": https://github.com/wieck/slony1-engine/tree/copy-protocol-22 https://github.com/ssinger/slony1-engine/tree/copy-protocol_22_remoteworker In practice, I think my changes are all expressed by one patch thus: https://github.com/cbbrowne/slony1-engine/commit/5187d5343145c3c145e328ad05912d35334cd601 I'll be poking further at the documentation, as some of the old docs are invalidated by this change. I have looked at your changes. Here is my review 1. Your unit test changes should be to the javascript version of the regression ones not the shell ones 2. It would be nice if we could be syntax backwards compatible on EXECUTE SCRIPT. ie EXECUTE SCRIPT(set id=1, FILENAME='foo', EXECUTE ONLY ON=1, EVENT NODE=1); should still work. The grammer accepts this, it is the code in assign_options that need to have an INT_OR_STR option. or something. We would also need to make event_node optional if only 1 ONLY ON node is specifid. 3. In ddlchanges.sgml item 3 I am opposed to saying in the manual that DDL can be submitted directly with the ddlCapture SQL function. I feel that only slonik should be calling these slony functions. By telling people in the manual that they can do this then we are committing to a somewhat stable API. I think it will also encourage people to call other slony SQL functions directly. The counter argument is that now that ddlChange() isn't an event it actually can be safely called by someone other than slonik and it allows applications (ie java, php,perl,python etc...) do do there own DDL without having to make a system() call to slonik. I am somewhat sympathetic to this but I remind people that in any version prior to 2.2 this was a really bad/broken idea. Even in 2.2 calling ddlChange() outside of slonik with more than one EVENT NODE can expose you to race conditions. In slonik_ref.sgml I would like to see any encouragement to calling ddlCapture() directly removed (see above for reasons) slony1_funcs.sql In 2.1 the ddlSCript_complete_int() function calls repair_log_triggers() and updateRelname() to fixup the slony sl_table and log triggers after a DDL event. I don't see these being used in 2.2. I *think* the log_apply trigger needs to perform both of these actions after executing the SQL. slony1_base.sql In sl_log_script definition we are using bigint for log_txid and int8 for log_actionseq. I see sl_log_1 sl_log_2 are defined the same way. Is there a difference between bigint and int8? Should we be consistnet? Should we be changing sl_log_1 and sl_log_2 (since the 2.2 upgrade code is dropping sl_log_1 anyway)? slonik.c -------------- slonik_ddl_script We don't free res1 in the happy path, just in the error case. If the SET ID is never used should we make it an optional element in the grammer (script_check_stmtS) remote_worker.c ---------------------- Create a two node test case. create set(set id=1, origin=1); set add table(set id=1, fully qualified name='public.foo'); subscribe set(id=1,provider=1,receiver=2); execute script(set id=1, filename='bar.sql', execute on=2); The script gets executed on node 2 but never on node 1. Even though I didn't submit a 'EXECUTE ONLY ON'. The problem is that if a node 1 is not receiving any subscription sets from node 2 the node 1 will never see any DDL from node 2 because the provider structures in the remoteWorker don't have it listening for these events. (In reply to comment #9) > I have looked at your changes. > > 1. Your unit test changes should be to the javascript version of the regression > ones not the shell ones Good point; I have patched the clustertest tests. https://github.com/cbbrowne/slony1-engine/commit/e6eb7808f6cc3124385c974c235bb48e925af4f7 https://github.com/cbbrowne/slony1-engine/commit/8065a2e4c4426f8f66a9a276fd90a90407bc9dd0 (In reply to comment #9) > 2. It would be nice if we could be syntax backwards compatible on EXECUTE > SCRIPT. > ie > EXECUTE SCRIPT(set id=1, FILENAME='foo', EXECUTE ONLY ON=1, EVENT NODE=1); > > should still work. The grammer accepts this, it is the code in assign_options > that need to have an INT_OR_STR option. or something. We would also need to > make event_node optional if only 1 ONLY ON node is specifid. Good points. I'm taking a poke at this. (In reply to comment #9) > 2. It would be nice if we could be syntax backwards compatible on EXECUTE > SCRIPT. > ie > EXECUTE SCRIPT(set id=1, FILENAME='foo', EXECUTE ONLY ON=1, EVENT NODE=1); > > should still work. The grammer accepts this, it is the code in assign_options > that need to have an INT_OR_STR option. or something. https://github.com/cbbrowne/slony1-engine/commit/30912a5af6a1077e07437e96a09d788dd412e544 (In reply to comment #9) > 3. In ddlchanges.sgml item 3 > I am opposed to saying in the manual that DDL can be submitted directly with > the ddlCapture SQL function. I feel that only slonik should be calling these > slony functions. By telling people in the manual that they can do this then we > are committing to a somewhat stable API. > I think it will also encourage people to call other slony SQL functions > directly. > > > The counter argument is that now that ddlChange() isn't an event it actually > can be safely called by someone other than slonik and it allows applications > (ie java, php,perl,python etc...) do do there own DDL without having to make a > system() call to slonik. I am somewhat sympathetic to this but I remind people > that in any version prior to 2.2 this was a really bad/broken idea. Even in 2.2 > calling ddlChange() outside of slonik with more than one EVENT NODE can expose > you to race conditions. https://github.com/cbbrowne/slony1-engine/commit/af54adb535146a84923ae3e59c1c6c27217e1ab8 > In slonik_ref.sgml > I would like to see any encouragement to calling ddlCapture() directly removed > (see above for reasons) https://github.com/cbbrowne/slony1-engine/commit/789534ca1658bcc685a59f325fe2658827d3de5c (In reply to comment #9) > 2. It would be nice if we could be syntax backwards compatible on EXECUTE > SCRIPT. > ie > EXECUTE SCRIPT(set id=1, FILENAME='foo', EXECUTE ONLY ON=1, EVENT NODE=1); > > should still work. The grammer accepts this, it is the code in assign_options > that need to have an INT_OR_STR option. or something. We would also need to > make event_node optional if only 1 ONLY ON node is specifid. Added this to docs: https://github.com/cbbrowne/slony1-engine/commit/6dc3401b52b3ee03dd26b0a6d87c22101a19fffb Added regression tests for several cases surrounding this: https://github.com/cbbrowne/slony1-engine/commit/4e6e90dd332c965dea399abde2266b2b9cdc5063 (In reply to comment #9) > slony1_funcs.sql > > In 2.1 the ddlSCript_complete_int() function calls repair_log_triggers() and > updateRelname() to fixup the slony sl_table and log triggers after a DDL event. > I don't see these being used in 2.2. I *think* the log_apply trigger needs to > perform both of these actions after executing the SQL. Yep, quite right. Call these functions: https://github.com/cbbrowne/slony1-engine/commit/b07cf0014b8e52b7c597b54a4765ee3cda6d421e Optimization: only fix values that are busted https://github.com/cbbrowne/slony1-engine/commit/bd913b2adbe356e15b3a59c85a830ea4269d403a This was fixed in the 2.2 development cycle |