Andrew Sullivan ajs at crankycanuck.ca
Wed Jul 2 06:44:31 PDT 2008
On Tue, Jul 01, 2008 at 02:55:32PM -0500, Troy Wolf wrote:
> 
> And even more surprising (disturbing?) is that Slony waits on locks
> that have NOTHING to do with any of the replicated objects. For

Yes.

> I have a patch to the code that is supposed to alleviate this problem.
> It was sent to me by another list member. I've not yet had time to
> review the code or test it.

You could actually go back through the CVS logs and revert the change -- as
I recall, it was all done in one change, but I'm relying on memory here.
 
> I truly wish I understood more about this aspect of Slony.

This wasn't a completely uncontroversial change; I opposed it, for instance,
exactly because I thought it made DDL much harder to do.  But it was adopted
because there was no mechanism to enforce that all relations in a foreign
key relationship had to be in the same set.  (You couldn't make that a
stricture, because if you add a new table with a new FK, you have to do that
in a separate set.)  Because of this, we had persistent problems with people
making changes that locked only one set when more sets needed locking.  This
caused inconsistencies which Slony would later notice, and that would cause
replication to halt at the point of the DDL.  The original mechanism for
this was, "Be careful, and do manual locks if need be."  That didn't work,
because we rapidly discovered that altogether too many people have no clue
how their schema is designed, and want to use Slony while remaining
ignorant.
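
To make the cross-set problem concrete, here is a minimal sketch (not Slony
code) of working out which sets a DDL change would have had to lock under
the old "be careful" regime.  The table names, set ids, and the
sets_to_lock() helper are all hypothetical, and the sketch deliberately
over-approximates by following FK links transitively.

```python
from collections import deque


def sets_to_lock(changed_table, table_to_set, foreign_keys):
    """Return the replication sets reachable from changed_table via FKs."""
    # A foreign key ties both tables together, so treat it as undirected.
    neighbours = {}
    for child, parent in foreign_keys:
        neighbours.setdefault(child, set()).add(parent)
        neighbours.setdefault(parent, set()).add(child)

    # Breadth-first walk over every table linked to the changed one.
    seen = {changed_table}
    queue = deque([changed_table])
    while queue:
        table = queue.popleft()
        for other in neighbours.get(table, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)

    # Tables that are not replicated belong to no set and are skipped.
    return {table_to_set[t] for t in seen if t in table_to_set}


# Hypothetical schema: invoices was added later, so it lives in set 2.
table_to_set = {"customers": 1, "orders": 1, "invoices": 2, "audit_log": 3}
foreign_keys = [("orders", "customers"), ("invoices", "orders")]

print(sorted(sets_to_lock("customers", table_to_set, foreign_keys)))  # [1, 2]
```

Locking only set 1 here would miss invoices in set 2, which is the kind of
mistake people kept making.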

Since we always envisioned a DDL change as requiring an application outage
anyway, the developers decided that it would be acceptable just to lock
everything in order to perform the DDL.

The flat truth is that you have to take an application outage to perform DDL
on a Slony system.  Anyone claiming differently is thinking about the
academic operation of PostgreSQL, not the actual operation of the system
where locks can be held on a table for a surprising length of time.
(PostgreSQL 8.3 is quite a bit better than previous releases if you're
using autovacuum, note.)

An improvement that really would help would be a locking system that gave up
on a lock if it couldn't get it, rather than waiting.  We don't have that
yet, but again, 8.3 makes it possible to offer such a feature.
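
The behaviour I mean can be sketched like this.  It is only an illustration:
in-process threading locks stand in for table locks, and
lock_all_or_abandon() is a made-up name, not anything in Slony.  It is
similar in spirit to PostgreSQL's LOCK TABLE ... NOWAIT, which errors out
instead of queueing behind another session.

```python
import threading


def lock_all_or_abandon(locks, timeout=0.0):
    """Try to take every lock; on the first failure, release what we hold."""
    held = []
    for lock in locks:
        if timeout > 0:
            acquired = lock.acquire(timeout=timeout)
        else:
            acquired = lock.acquire(blocking=False)  # fail immediately
        if not acquired:
            # Give everything back so nobody queues up behind us.
            for h in reversed(held):
                h.release()
            return False
        held.append(lock)
    return True


# Stand-ins for two tables; another "session" already holds table_b.
table_a = threading.Lock()
table_b = threading.Lock()
table_b.acquire()

first_try = lock_all_or_abandon([table_a, table_b])
a_left_locked = table_a.locked()  # False: table_a was released again

table_b.release()  # the other session finishes
second_try = lock_all_or_abandon([table_a, table_b])

print(first_try, a_left_locked, second_try)  # False False True
```

The point is that a failed attempt leaves nothing locked, so the DDL can be
retried later instead of stalling the application while it waits.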

One other thing: if you really understand the bare-metal functions Slony is
using, there actually _is_ a way to do this for just one table.  If you
spend a great deal of time with the manual, you can figure it out.  I don't
feel comfortable posting the instructions, because it's terribly dangerous. 
You can see a sort-of example in a (broken) script I posted to the list
about a year ago for doing bulk loads on many nodes.

A 


