[Slony1-general] Enhancement Proposal: Automatic WAIT FOR EVENT

Fri Nov 19 01:45:11 PST 2010

On Thu, Nov 18, 2010 at 5:06 AM, Christopher Browne
<cbbrowne at ca.afilias.info> wrote:
> A thing that several of us have been ruminating over for a while is the
> problem that people get confused about how, when you submit Slonik
> scripts, you may have some actions that require waits.
>
> For instance if it takes 20 minutes for SUBSCRIBE SET to complete, it's
> pretty likely that you want to wait for that to be complete before
> proceeding with other configuration that depends on it.
>
> That is already supported today, after a fashion - you 'merely' need to
> sprinkle your Slonik script with WAIT FOR EVENT requests.
>
> But the word 'merely' seems unfair; it is rarely particularly obvious
> what semantics are appropriate.  (It is frequently not obvious to me,
> and I have touched a lot of the Slony codebase!)
>
> The "obvious" thought which has occurred is to have Slonik commands
> automatically wait for the appropriate events.  In effect, we'd go
> through each Slonik command, and have it automatically call
> slonik_wait_event() (found in src/slonik/slonik.c), or some refactoring
> thereof.
>
> A few questions and issues occur to me...
>
> 1.  Does this seem like a worthwhile exercise?  (Alternatively...  Are
> there other Much Bigger Issues that should be looked at first?)

I'd love it. I've gotten into the habit of sync/wait after nearly
every statement to avoid shooting myself in the foot.

> Some of these may need some more functionality - SUBSCRIBE SET generates
> a pair of events so that it may be necessary to wait for a subsequent
> event, and perhaps to request synthesizing a SYNC, and waiting for
> *that*.

Which is exactly why I do sync/wait. I never know which node I should
be waiting for confirmation from for a particular statement, so I just
shove a sync through the system and wait for my entire cluster to
process it.
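
As an illustration of that sync/wait habit, here is a minimal Python
sketch that appends a SYNC plus WAIT FOR EVENT pair after every
statement in a Slonik script body. The helper name with_sync_wait and
the choice of node 1 as the node to sync and wait on are assumptions
made for this example, not anything from Slony itself, and the
cluster name / admin conninfo preamble that a real script needs is
left out.

```python
def with_sync_wait(statements, origin_node=1):
    """Return a Slonik script body with a sync/wait pair after each statement.

    Hypothetical helper: origin_node is assumed to be the node on which
    the SYNC is generated and on which slonik waits for confirmation.
    """
    # Force a SYNC on the chosen node, then block until every node in
    # the cluster has confirmed the events from that node.
    barrier = (
        f"sync (id = {origin_node});\n"
        f"wait for event (origin = {origin_node}, confirmed = all, "
        f"wait on = {origin_node});"
    )
    lines = []
    for stmt in statements:
        stmt = stmt.strip()
        if not stmt.endswith(";"):
            stmt += ";"  # Slonik statements are semicolon-terminated
        lines.append(stmt)
        lines.append(barrier)
    return "\n".join(lines)


if __name__ == "__main__":
    print(with_sync_wait([
        "subscribe set (id = 1, provider = 1, receiver = 2, forward = yes)",
    ]))
```

This is the blunt version of the approach: it waits for the whole
cluster after every statement, and it only covers events that
originate on the chosen node.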

> 4.  Do we want overrides?
>
> Perhaps some might want the ability to revert to today's functionality,
> so one can run a "fire and forget" series of SUBSCRIBE SET requests.

How do you know which statements you can "fire and forget"? Is this
something guaranteed not to change between Slony releases? It's
something that has never been clear to me, and I just don't risk it
any more.

> 6.  Do we possibly need for there to be a way to force aborting a script
> if the WAIT times out?

I'd be more interested in knowing why something is blocked than having
my scripts abort midway, leaving my system in an indeterminate state.

> 7.  I think this invalidates TRY { } ON ERROR { } ON SUCCESS { }
>    handling, for the most part.
>
>    At the very least, if we're waiting for things to succeed on a
>    remote node, it invalidates the notion that we're performing the
>    contents of the TRY block as a single transaction on the initial
>    node.
>
>    It's not particularly obvious how TRY requests get grouped into a
>    single "transaction" anyways.  Perhaps this points at there being
>    something invalid/broken about TRY.

TRY seems broken to me. If it only works in limited circumstances then
attempting to use non-transactional commands inside it should fail
(whatever they are - how do you know?). We might be able to get things
working in a properly transactional manner if we used two-phase
commit.

> 8.  This is possibly not totally friendly towards tools like pgAdmin,
>    which, I think, presently take the approach that all you need do to
>    configure Slony clusters is to call the stored functions.

I suspect tools that talk directly to the stored functions do so due
to limitations in slonik, which you are attempting to address. I've
been considering dropping slonik and going direct to stored SQL
myself, but that too doesn't seem clear to me (what statements need to
be run on what node in what order waiting for what confirmations).

-- 
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/