Florian G. Pflug fgp
Wed Jul 26 07:40:40 PDT 2006
Andrew Sullivan wrote:
> On Sat, Jul 22, 2006 at 04:17:47PM +0200, Florian G. Pflug wrote:
>> Andrew Sullivan wrote:
>>> Are you sure this will be an improvement?  It might just be a
>>> foot-gun of a different calibre.
>> I'm quite sure that it would be an improvement for at least
>> my use case of slony1.
> 
> I don't like to be mean, but "this will help me" is not a reason to
> implement, if it makes things worse for others.  The question is not
> merely whether it will work for some cases, but whether it improves
> the system overall for users.  If the tradeoff is that it makes
> things better for some, but makes certain other failure cases way
> more troublesome, that may be a trade-off we don't want to make.

It'd be an _optional_ feature. Nobody would force _anyone_
to use 2PC schema updates. But some people, like me, _could_
use them. I don't see how having the _option_ to do reliable schema
updates could hurt anyone.

>> The worst that could happen is that you get some
>> transaction stuck at prepared state, and need to manually roll them back
>> on some nodes. Currently, it's quite easy to destroy your whole cluster
>> by messing up a schema change.
> 
> The "on some nodes" thing is part of what is making me uneasy here.
> What this says to me is that, to fix the issue that currently it is
> easy for someone who hasn't carefully tested a DDL EXECUTE SCRIPT (or
> who hasn't read the documentation) to break things, we're going to
> introduce a failure mode whereby the DBA may need to intervene
> manually on some nodes.  That seems to me like a step backwards.  If
> the problem is that people are doing things which break stuff, then I
> suspect we need to improve the interface such that it is harder to
> break stuff, rather than introducing a new set of manual-intervention
> steps.


That worst case would only happen if you lost the network connection
to a node while the schema change is still running. And it could be
solved by some process (slon, or some other process) that checks for
leftover 2PC transactions and removes them.

And even without that safeguard, it's a step *forward*. If you mess up
a schema change now, you can easily get into a state where only "drop
subscription, resubscribe" will get your cluster going again. With
the current design of slony, this is about the most painful operation
possible, because during the _whole_ resubscribe the tables on the
slaves are locked, _even_ for concurrent _readers_.

Logging into all nodes and doing "rollback <transaction>" is just
a minor nuisance compared to resubscribing all sets.
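
With 2PC that manual step is literally one statement per node (using the
made-up gid scheme from above):

    ROLLBACK PREPARED 'slony_ddl_1234';
    -- or, if the prepare turns out to have succeeded on every node:
    COMMIT PREPARED 'slony_ddl_1234';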

> Note that I'm not saying "don't do this".  I'm saying instead that a
> 2PC and a non-2PC approach in the same version of Slony at least
> seems a bad idea to me -- it's too complicated.  Better to drop
> support for non-2PC-capable versions.  Moreover, I'm saying that
> you'd better have a pretty clean design and a nice set of
> administration tools to handle the failure modes, or all you do is
> move the pain around to some new place.  I can't see the point of
> doing a lot of work to get beaten up by people complaining about some
> other failure mode.

At least for a first implementation, I'm thinking about doing it the
other way round. The algorithm I have in mind would be implemented
purely inside slonik, and would do the following:

0) Issue "begin;" on the origin.
1) Lock the replicated tables on the origin. Concurrent readers are OK,
   but it must block writers (inserts, updates, deletes).
2) Wait until all subscribers have caught up.
3) Issue "begin;" on all subscribers.
4) Apply the schema change on all nodes.
5) Issue "prepare transaction" on all nodes.
6) If all nodes have prepared successfully, issue "commit prepared" on
   all nodes. Otherwise, issue "rollback" (or "rollback prepared" on
   nodes that already prepared), and report the error.
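
In terms of the SQL slonik would send, that would look roughly like the
following - the table name, the DDL statement and the gid are of course
just placeholders:

    -- on the origin (steps 0-2)
    BEGIN;
    LOCK TABLE my_table IN EXCLUSIVE MODE;  -- readers still get through,
                                            -- writers block
    -- ... slonik waits here until all subscribers have caught up ...

    -- on every node, origin included (steps 3-5)
    BEGIN;                                  -- already open on the origin
    ALTER TABLE my_table ADD COLUMN foo integer;
    PREPARE TRANSACTION 'slony_ddl_1234';

    -- step 6, once every node reported a successful prepare
    COMMIT PREPARED 'slony_ddl_1234';
    -- ... or, if any prepare failed:
    ROLLBACK PREPARED 'slony_ddl_1234';     -- on nodes that did prepare
    ROLLBACK;                               -- on nodes that didn't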

I see two problems with that approach, but I don't see them as showstoppers:

1) If slonik crashes after starting (5), but before finishing (6),
the transaction is left prepared on some nodes. In that case, it'd
be the job of the admin to
.) find out whether it was prepared successfully on _all_ nodes,
.) if yes, "commit" it everywhere,
.) if not, "rollback" it everywhere.
Automating this recovery is possible in theory - but it requires
slonik to remember whether step (5) was successful on all nodes or
not.
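
The manual check itself isn't hard - the gid shows up in pg_prepared_xacts
on every node where the prepare went through, so (sticking with the
made-up gid from above):

    -- run on each node
    SELECT 1 FROM pg_prepared_xacts WHERE gid = 'slony_ddl_1234';

    -- if it shows up on every node:
    COMMIT PREPARED 'slony_ddl_1234';
    -- if it is missing on at least one node:
    ROLLBACK PREPARED 'slony_ddl_1234';     -- where it does show up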


2) It blocks inserts/updates on the origin while the subscribers catch
up. Fixing that would require this algorithm to be integrated more
deeply into slony itself.

greetings, Florian Pflug



