Andrew Sullivan ajs at crankycanuck.ca
Wed Jun 3 09:11:42 PDT 2009
On Wed, Jun 03, 2009 at 08:48:11AM -0700, Jeff Frost wrote:

> Hrmmm, I never got Andrew's reply.  I should have mentioned that I'm
> already speculating it's firewall related.  This connection is going
> across a netscreen VPN and I'm pretty sure the netscreen is timing out
> the connection for some reason.  My question was actually why slony
> leaves the node in a broken state instead of just trying the initial
> sync again.  I thought it had acted like that in the past, but I haven't
> seen an initial sync fail for quite a while before now.

The problem is that Slony's startup is a little less granular than it
ought to be in one sense.  That is, it writes out all the Slony
metadata and commits that, and then attempts the COPYing of the
different tables in a separate transaction.  When the transaction is
aborted by a disconnection from the other end, however, Slony doesn't
know it's past bootstrap-pt1 (i.e. the metadata is in place) and needs
to just start from bootstrap-pt2 (i.e. only the data needs to be
copied).  I think this was originally coded this way to introduce
another potential point for robustness (i.e. you could automatically
recover from this situation), but that work never got completed.  I think we
decided it was acceptable to fail in this way because no replication
at all had yet succeeded anyway.  I might be misremembering, however.
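
To make the two-phase idea concrete, here is a rough sketch of what a
resumable bootstrap could look like.  This is not Slony code: the class,
the function names and the "metadata_committed" marker are all made up
for illustration, and a dropped connection is simulated with an
exception.

```python
# Hypothetical sketch only: none of these names come from Slony itself.

class Disconnected(Exception):
    """Stands in for the remote end (e.g. a firewall) dropping the connection."""


class Subscriber:
    def __init__(self):
        self.metadata_committed = False  # has bootstrap-pt1 been committed?
        self.tables_copied = []          # committed result of bootstrap-pt2

    def write_metadata(self):
        # Phase 1: committed on its own, so it survives a later abort.
        self.metadata_committed = True

    def copy_tables(self, tables, fail_on=None):
        # Phase 2: a single transaction; a failure discards everything staged.
        staged = []
        for table in tables:
            if table == fail_on:
                raise Disconnected(table)
            staged.append(table)
        self.tables_copied = staged      # "commit"


def bootstrap(sub, tables, fail_on=None):
    # Resumable variant: skip phase 1 when its marker is already in place,
    # then (re)run only the data copy.
    if not sub.metadata_committed:
        sub.write_metadata()
    sub.copy_tables(tables, fail_on=fail_on)
```

With something like this, a first call that is cut off mid-copy leaves
the metadata marker set and no tables committed, and a second call to
bootstrap() goes straight to the copy step instead of leaving the node
stuck.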

A

-- 
Andrew Sullivan
ajs at crankycanuck.ca

