[Slony1-general] Flakey network links

Mon Sep 10 10:34:21 PDT 2007

On 9/9/07, Tim Bowden <tim.bowden at westnet.com.au> wrote:
>
> On Mon, 2007-09-10 at 10:54 +0800, Tim Bowden wrote:
> > From the docs:
> > cases where Slony-I probably won't work out well would include:
> >
> >       * Sites where connectivity is really "flakey"
> >
> >       * Replication to nodes that are unpredictably connected.
> >
> > How flakey/unpredictably connected can nodes be before it all goes
> > haywire?  Is it time critical, or load critical?

Yes, and yes.

> > If an origin node goes
> > offline for a day, but there are only a couple of transactions, will
> > that be a problem?  If an origin node goes offline for a few minutes but
> > there are hundreds of transactions, what's the recovery scenario look
> > like?
>

Assuming you run each slon either on the same box as the database it's
supporting or in the same LAN, this is probably survivable. You will need to
restart your slons every time the network status for _any_ of your databases
changes. And yes, detecting and handling this correctly is likely to get
complicated.

Your slons will generate transactions regularly on all nodes in the form of
SYNCs. These need to be propagated between all nodes and then applied. Once
SYNCs (and other events) have been applied, confirmation messages are
propagated between all nodes. Once all nodes have applied events, then the
cleanup thread on each node can remove the information necessary for
confirmed events.
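
To make the confirm-then-cleanup dependency concrete, here is a toy model in
Python. It is only a sketch of the idea described above, not how slon is
implemented; the class and method names (ToyCluster, generate_sync,
confirm_all, cleanup) are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCluster:
    """Toy model: an event can be cleaned up only after every node confirms it."""
    nodes: list
    next_event: int = 1
    # Highest event number each node has confirmed (0 = nothing yet).
    confirmed: dict = field(default_factory=dict)
    # Events whose data must still be kept around.
    pending: list = field(default_factory=list)

    def __post_init__(self):
        self.confirmed = {n: 0 for n in self.nodes}

    def generate_sync(self):
        """Record a new SYNC-like event that every node must confirm."""
        self.pending.append(self.next_event)
        self.next_event += 1

    def confirm_all(self, node):
        """The node applies and confirms every event generated so far."""
        self.confirmed[node] = self.next_event - 1

    def cleanup(self):
        """Drop events confirmed by all nodes; return how many remain."""
        floor = min(self.confirmed.values())
        self.pending = [e for e in self.pending if e > floor]
        return len(self.pending)


cluster = ToyCluster(nodes=["a", "b", "c"])
for _ in range(5):
    cluster.generate_sync()
cluster.confirm_all("a")
cluster.confirm_all("b")
print(cluster.cleanup())  # 5: node "c" has confirmed nothing, so nothing is removed
cluster.confirm_all("c")
print(cluster.cleanup())  # 0: every node has confirmed, so everything is removed
```

The point of the sketch is that cleanup is gated by the slowest node: one
unreachable node is enough to keep every event pending.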

For the cases you mention above, the obvious failure scenarios are as
follows.
1) Network failures at a rate where a slon cannot process all the items in
some event before getting reset.
2) Any one node being down long enough to grow sl_log_n to the point that it
enters the "death spiral" (becomes so large that maintenance costs cause it
to grow faster than it can be consumed).
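
The "death spiral" in 2) can be illustrated with a toy simulation. Everything
here is an assumption made up for illustration: the function name, the
numbers, and especially the cost formula, which simply makes the number of
rows consumed per tick shrink as the backlog grows. It says nothing about
where the real threshold lies for any actual cluster.

```python
def backlog_after_outage(outage_ticks, recovery_ticks,
                         arrivals=100, capacity=300, scale=10_000):
    """Toy model only: all numbers and the cost formula are made up.

    While the node is down, `arrivals` rows per tick pile up. Once it is
    back, rows consumed per tick shrink as the backlog grows, standing in
    for the rising cost of working against a very large log table.
    """
    backlog = arrivals * outage_ticks
    for _ in range(recovery_ticks):
        # Effective throughput falls as the backlog gets larger.
        consumed = capacity / (1 + backlog / scale)
        backlog = max(0.0, backlog + arrivals - consumed)
    return backlog


# Short outage: the backlog starts at 10,000 rows and drains completely.
print(backlog_after_outage(outage_ticks=100, recovery_ticks=1000))  # 0.0

# Long outage: the backlog starts at 30,000 rows, where consumption is
# already slower than arrivals, so it keeps growing instead of draining.
print(backlog_after_outage(outage_ticks=300, recovery_ticks=1000) > 30_000)  # True
```

With these made-up parameters the tipping point is a backlog of 20,000 rows:
below it the node catches up, above it the node falls further behind on every
tick.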

To quote Jan's concept paper (which you really ought to read before going
further in this discussion:
http://developer.postgresql.org/~wieck/slony1/Slony-I-concept.pdf), "Neither
offline nodes that only become available for sporadic synchronization (the
salesman on the road) nor ... will be supported..."

> As a follow up, I noticed a post a week ago or thereabouts I think it
> was that mentioned bouncing nodes between standard replication and
> updating by log shipping, but it wasn't currently a viable solution.  Is
> this likely to ever become a viable solution, as it would solve the
> problem of unpredictable network links (at least for some use cases)?
>

That sounds kinda complicated. Has anyone written a proposal for how to do
it yet? It's taken us almost 2 years to get log-shipping to the point where
it seems seriously viable. The project has existed for something approaching
4 years...

Andrew