[Slony1-general] Using 2PC???

Sat Jul 22 07:17:47 PDT 2006

Andrew Sullivan wrote:
> On Fri, Jul 21, 2006 at 03:23:09AM +0200, Florian G. Pflug wrote:
>> Not only a small value IMHO. Doing schema changes is a major PITA
>> right now - and it's very easy to shoot yourself in the foot. I'd
>> welcome anything that improves that situation.
> 
> Are you sure this will be an improvement?  It might just be a
> foot-gun of a different calibre.
I'm quite sure that it would be an improvement for at least
my usecase of slony1. The worst that could happen is that you get some
transaction stuck at prepared state, and need to manually roll them back
on some nodes. Currently, it's quite easy to destroy your whole cluster
by messing up a schema change.

> Remember, Slony-I can replicate across slow links too.  Are you sure
> you want to have a lock persist over several minutes?
Well, depends on the application ;-) But I don't see how having the
_option_ to use 2pc could hurt anyone - just don't use it if it doesn't
work well for your setup.

> What about if you're in an emergency where (1) your remote site is
> down and (2) you've just discovered a bug that requires a schema
> change to fix.  I know, I know, can't happen, right?  
I fail to see how that relates 2PC. As long as there are no 
schema-change triggers in pg, you'd still have to use something like
"select <slony-schema>.execute_script('alter table ...');" (or use 
slonik) if you want your schema changes to be replicated.

>> 1) When doing execute script, first submit a "try execute script" event.
>> 2) All nodes do "begin; <submitted script>".
>>     < 8.1: If there is no error, they ack the event with "ok", and _ROLLBACK_ (!).
>>            If there is an error, they ack the event with "failed", and ROLLBACK.
>>     <= 8.1: If there is no error, they ack the event with "ok", and "PREPARE".
>>             If there is an error, the ack the event with "failed" and ROLLBACK.
>> 3) If all nodes acked with "ok", then "execute script" event is generated, referencing
>>     the previous "try" event.
> 
> And what about the case where node, say, 4 has just not replied yet. 
> Everybody is stuck in PREPARE mode here, and they might still roll
> back.
Well, thats a downside of using 2PC - but for that "price" you reliable
schema change propagation.

>> Maybe the user could even choose _not_ use 2pc on 8.1, to avoid the
>> heavy locking that 2pc brings.
> 
> I think it's a non-starter without that escape valve.  But I'm
> concerned now that the system becomes pretty complex, and therefore
> prone to bugs.  I'd like to see a simple mechanism, all things
> considered.
Hm.. the only thing that makes this complicated is that all 
communication takes place by posting events, and waiting for a node
to eventually ack them. Connecting to all affected nodes directly, and
doing the schema-change "online" would make things much easier - but I
fail to see how it could be gueranteed that the change takes place at
same "point in time" for a transaction-flow point of view. You'd need
to lock all tables in all affected sets on the origin, wait for all 
slaves to catch up, and then use 2PC to do the schema-change on all 
nodes at the same time.

greetings, Florian Pflug