Jan Wieck JanWieck at Yahoo.com
Thu Feb 3 08:15:39 PST 2011
On 2/3/2011 10:19 AM, Steve Singer wrote:
> On 11-02-03 09:44 AM, Jan Wieck wrote:
>>  On 2/2/2011 11:42 AM, Steve Singer wrote:
>>>  On 10-12-22 04:30 PM, Steve Singer wrote:
>>>
>>>
>>>  Since I haven't had much response on this, maybe a plain-language
>>>  example would be useful.
>>>
>>>  Consider a cluster with paths where node 1 is a provider+origin to all
>>>  other nodes
>>>
>>>  4--1----2
>>>  | \ /
>>>  |--- 3
>>>
>>>  EXECUTE SCRIPT( FILE=file1.sql, EVENT NODE=1);
>>>  wait for event(origin=1, confirmed=2, wait on=1);
>>>  EXECUTE SCRIPT(file=file2.sql, EVENT NODE=2);
>>>
>>>  Take node 3. Does node 3 perform the SQL in file1.sql first or
>>>  file2.sql first? Today this is non-deterministic; either could win.
>>>
>>>  The two solutions I see are
>>>  a) Require all nodes to be caught up before going to the next event
>>>  node. As discussed, this seems somewhat limiting.
>>>  b) Make slon wait for the event with origin=1 to be applied on node 3
>>>  before applying the event from node 2 (because the event from node 1 had
>>>  already been processed on node 2 by the time the node 2 event was
>>>  generated).
>>>
>>>  b) is what I am proposing to implement here.
>>>
>>>  I can create this type of race condition with other event types as
>>>  well; it isn't specific to EXECUTE SCRIPT.
>>
>>  What you are basically asking for is a guaranteed total order in which
>>  events from multiple nodes are processed. Very much like the total order
>>  guarantees provided by group communication systems.
>
> I'm not going as far as a total order over all events, just an ordering
> that deals with events that have already been processed by the
> event origin.
>
> For example, if these remote events are processed
>
> node 1:               node 2:
> 2,1233                1,1233
>
> (node 1 has seen 2,1233 and node 2 has seen 1,1233)
>
> then they each do a sync generating events
> 1,1234                2,1234
>
>
> In the scheme I propose node 3 can either process events in this order
>
>
> 1,1233
> 2,1233
> 1,1234
> 2,1234
>
> OR
> 1,1233
> 2,1233
> 2,1234
> 1,1234
>
> i.e. I am not requiring any ordering constraints between the two events
> 1,1234 and 2,1234 other than that they must come after 1,1233 and 2,1233.
>
>
> What I describe requires no additional communication between nodes
> beyond what we are already doing.
>
>
> The issue I describe isn't specific to two execute scripts.
>
> For example I have a 3 node cluster with two sets (set 1 origin is node
> 1, set 2 origin is node 2).
>
> subscribe set(set id=1,provider=1,receiver=2)
> subscribe set(set id=2,provider=2,receiver=1)
> wait for event(origin=1,confirmed=2,wait on=1)
> wait for event(origin=2,confirmed=1,wait on=2)
> subscribe set(set id=1,origin=1,receiver=3)
> subscribe set(set id=2,origin=2,receiver=3)
> #
> # subscribing node 3 takes a LONG time
> # because it is in a remote data centre
> #
> # while it is subscribing I discover
> # I need to make an emergency schema change
> # via EXECUTE SCRIPT such that I can't wait
> # for node 3 to finish subscribing before
> # making the change on node 1 and 2.
>
> If I use node 1 or node 2 as the event node, the script might get applied
> on node 3 before the subscription to the set from the other node finishes.

Again, if you use the event node where the affected objects originate,
there will be no conflict, and the order in which things are applied
doesn't matter.

>
> ---------
>
> Here is an example that doesn't involve execute script. (assume the same
> cluster config as in my last example)
>
>
> create set(id=1, origin=1)
> set add table(set id=1, origin=1, fully qualified table='public.foo');
> #commands execute, dba notices a mistake
> drop set(set id=1,event node=1);
> wait for event(origin=1,confirmed=3,wait on=1);
> create set(id=2, origin=2)
> set add table(set id=2,origin=3);

I don't think this should be possible, because set add table only makes
sense on the set origin.

> set add table(set id=2, origin=2, fully qualified table='public.foo');
>
> Node 3 might process the add table from node 2 BEFORE it processes the
> drop set from node 1.  The above example probably happens in the real
> world quite a bit: a DBA creates a set, then notices they are hosting it
> on the wrong node and wants to fix things.

set add table is not an event. The time when a node learns which tables 
belong to a set is when it processes the enable subscription. Node 3 in 
this case did know that there was a set 1 on node 1, but not what tables 
are in it. Again, the order of execution is irrelevant.
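
For reference, the ordering constraint proposed in the quoted text (an
event may be applied on node 3 only after every event its origin had
already processed when generating it) can be sketched as below. This is
an illustration only, not slon code: the function name, the
(origin, seqno) tuples and the seen_before mapping are all hypothetical,
and it assumes events from a single origin are applied in sequence order.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: (origin node id, event sequence number).
Event = Tuple[int, int]


def order_is_allowed(order: List[Event],
                     seen_before: Dict[Event, Set[Event]]) -> bool:
    """Return True if `order` respects the proposed constraint.

    seen_before[e] is the set of remote events that e's origin had
    already processed when it generated e (an assumed data layout).
    """
    applied: Set[Event] = set()
    last_seq: Dict[int, int] = {}
    for ev in order:
        origin, seq = ev
        # Assumption: events from one origin are applied in sequence order.
        if seq <= last_seq.get(origin, -1):
            return False
        # Proposed rule: everything the origin had seen must be applied first.
        if not seen_before.get(ev, set()) <= applied:
            return False
        applied.add(ev)
        last_seq[origin] = seq
    return True


# The example from the quoted text: node 1 had seen 2,1233 and node 2 had
# seen 1,1233 before each generated its 1234 event.
seen_before = {
    (1, 1234): {(2, 1233)},
    (2, 1234): {(1, 1233)},
}
order_a = [(1, 1233), (2, 1233), (1, 1234), (2, 1234)]
order_b = [(1, 1233), (2, 1233), (2, 1234), (1, 1234)]
# Applies 1,1234 before 2,1233, which node 1 had already processed.
order_c = [(1, 1233), (1, 1234), (2, 1233), (2, 1234)]

print(order_is_allowed(order_a, seen_before))
print(order_is_allowed(order_b, seen_before))
print(order_is_allowed(order_c, seen_before))
```

Both orders listed in the quoted text pass the check; the third is
rejected because 1,1234 would be applied before 2,1233.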


Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

