Wed Feb 2 08:42:24 PST 2011
- Next message: [Slony1-hackers] automatic WAIT FOR proposal
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On 10-12-22 04:30 PM, Steve Singer wrote: Since I haven't had much response on this maybe a plain language example would be useful. Consider a cluster with paths where node 1 is a provider+origin to all other nodes 4--1----2 | \ / |--- 3 EXECUTE SCRIPT( FILE=file1.sql, EVENT NODE=1); wait for event(origin=1, confirmed=2, wait on=1); EXECUTE SCRIPT(file=file2.sql, EVENT NODE=2); Take node 3. Does node 3 perform the SQL in file1.sql first or file2.sql first? Today this is non-deterministic either could win. The two solutions I see are a) Require all nodes to be caught up before going to the next event node. As discussed this seems somewhat limiting b) Make slon wait for the event with origin=1 to be applied on node 3 before applying the event from node 2 (because the event from node 1 had already been processed on node 2 by the time the node 2 event was generated). b) is what I am proposing to implement here. I can create this type of race condition with other event types as well it isn't specific to execute script. > On 10-12-22 10:45 AM, Steve Singer wrote: >> On 10-12-10 04:54 PM, Christopher Browne wrote: > >> >> I will write a design proposal that explores what things might look like >> if we went that route. >> > > First I will define some notation. > > Let an event with sequence number s originating at node n be known as > E(n,s) (ie 1,1234) > > An event E(n,s) has been confirmed by node a if the function > C(a,E(n,s)) returns true. > > If a direct path exists between two nodes a,b then the function P(a,b) > returns true. Note P(a,b) != P(b,a) > > I will now define a function that returns the events from other nodes > confirmed on node a when E(a,s) is generated. > > Visible(a,E(a,s) = [E(1,s1),E(2,s2)....E(z,sz)] > > For example: If when event 1,100 was generated on node 1 node 1 had > sl_confirm entires indicating that it has confirmed events 2,5 3,10 and > 4,50. > Visible(1,E(1,100)) = [E(1,100), E(2,5), E(3,10), E(4,50) ] > > The order in which a set of events leading up to an Event E(n,s) is > processed on a node can be expressed as an ordered list returned by the > Order(a,n,s) function: Order(3,2,2)= E(1,1), E(2,1),E(1,2),E(3,1),E(2,2) > > This means that on node 3, the events leading up to processing of event > E(2,2) were E(2,1) then E(1,2) then E(3,1). > > A race condition is said to exists when ever the system allows two > different orderings for Order(a,n,s) that produce a different resulting > action. > > Two different orderings leading to an event being processed might not > lead to a race condition if their are no dependencies between the events > that were processed in a different order. > > Two events conflict Conflict(E(n1,s1),E(n2,s2))=TRUE > If a different result is produced by > Order(n3,E(n1,s1)=E(n2,s2),E(n1,s1) compared with > Order(n3,E(n2,s2)=E(n1,s1),E(n2,s2) > > Whether two events conflict is dependent on the event type and the > arguments to those events. > > > Proposition P1: If an event E(1,1) followed by an event E(1,1+x) is > submitted to node 1 by slonik then these two events will not cause a > race condition with each other. > > Proof: > The createEvent() function obtains an exclusive lock on sl_event and > gets the next sequence number for the event id. Because the exclusive > lock on sl_event is held until the transaction is committed it is > impossible for a row to be visible in sl_event with a higher sequence > number than another row that will be committed later. (assumes all > events get generated while holding the lock on sl_event). > > This means that on the event node E(1,1+x) will always happen after > E(1,1). It also means that if a remote node sees E(1,1+x) in the > sl_event table then it will also see (or have already processed) E(1,1) > because E(1,1) would have committed before and we don't delete sl_event > rows that have not been processed. Therefore as long as the remote > nodes process the events from a particular node in event id order (which > they do) then the remote node will always process E(1,1) before it > processes E(1,1+x). > > > Proposition 2: > > If an Event (n1,s1) is only applied on node n3 when > Visible(n1,E(n1,s1)) CONTAINED IN Order(n3,E(n1,s1)) > then no race condition is possible. > > Counter Proof: > Consider the following sequence of events: > E(1,2) : Visible(1,2)=>E(2,1) E(3,1) > E(2,2): Visible(2,2)=>E(1,1) E(3,1) > > gives > Order(1,E(2,2)=> E(1,2), E(2,2) > Order(2,E(1,2)=> E(2,2), E(1,2) > > At node three E(2,2) might get applied before or after E(1,2). > > The second part of the counter proof is to show that there exist event > types that can be generated on nodes 1,2 such that > Conflict(E(1,2),E(2,2)=TRUE. > > > create set(id=1,origin=1); #E(1,2) > create set(id=1,origin=2); #E(2,2) > > > The behavior on node 3 is clearly differ based on which command arrives > at node 3 first. Furthermore node 1 and node 2 will be left > in different states as well. > > Proposition 3: > If an event E(n1,s1) is not applied on node n3 until all events in > Visible(n1,E(n1,s1) have all been confirmed on n3 then no event E(n2,s1) > exists that produces a race condition on n3 unless the conflict existed > on the event nodes n1 and n2. > > Proof: > Assume Visible(1,E(1,1)) does not include an event E(2,1) such that > there exists a race condition at node 3. > > This means that > Order(3,E(1,1)==> E(2,1),E(1,1) produces different result than > Order(3,E(1,1)===> E(1,1),E(2,1) > this means that Conflict(E(1,1), E(2,1))=TRUE > and that conflict would exist on node 1 and node 2. > If E(2,1) had reached node 1 before E(1,1) was created then E(2,1) would > be part of Visible(1,E(1,1)) and would always be confirmed on node 3 > before E(2,1) gets applied the assumptions in the proposition. > > If E(1,1) had reached node 2 before E(2,1) was generated then > Visible(2,E(2,1)) would include E(1,1) and E(1,1) would always get > applied on node 3 before E(2,1) because of the assumptions we made in > our proposition. > > > This means that if IF a) we can somehow avoid race conditions at the > event nodes AND b) implement the rule "apply all events that were > visible on the event origin before applying the event" then we can > eliminate race conditions. > > i) I feel (a) needs to be handled by slonik and can be handled by slonik > waiting for the next event node to be caught up to previous event nodes > before submitting an event to it (discussed in a previous email on this > thread, though I have not formally tried to prove this > > ii) can be implemented by adding a new column to sl_event that stores an > array of event tuples - which consist of the highest events from each > node confirmed on the node at the time sl_event is generated. The a > remoteWorker won't apply an event to its local node until the > pre-conditions have been met. > > Questions: What is the performance impact of getting the highest > confirmed event id values? This is unknown - but will probably involve > querying sl_event and/or sl_confirm. > > Is it possible for an event to be included in Visible(1,E(1,1)) that > might never make it to a node n3 that is trying to process E(1,1)? > I think no. Every event E(x,y) that is in Visible(1,E(1,1)) is in the > sl_event table on node 1 by virtue of it having been confirmed on node 1 > but not confirmed elsewhere. If slon always pulls events (no matter > what the event origin) into node 3 from the node 1 sl_event table then > it will include E(x,y). Even if node 3 isn't directly talking to node > 1, any sl_event table in the cluster that contains E(1,1) would also > contain E(x,y) by virtue of that event being on node 1 at the time. > > > I feel this produces a solution to wait for where slonik only needs to > wait to make sure the next event node has received all events so far. > > This does NOT deal with race conditions that can be created by multiple > concurrent slonik scripts using different event nodes but I might be > able to solve this with an sl_sloniklock type of table. > > Issues relating to TRY blocks are still somewhat outstanding - as > discussed in the other email. > > > _______________________________________________ > Slony1-hackers mailing list > Slony1-hackers at lists.slony.info > http://lists.slony.info/mailman/listinfo/slony1-hackers
- Next message: [Slony1-hackers] automatic WAIT FOR proposal
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Slony1-hackers mailing list