Geoffrey lists at serioustechnology.com
Thu Feb 21 06:08:35 PST 2008
Andrew Sullivan wrote:
> On Wed, Feb 20, 2008 at 07:13:37PM -0500, Geoffrey wrote:
> 
>> thing I didn't mention is the actual configuration.  Two boxes connected 
>> to a single data silo.  It's a hot/hot configuration. Separate 
>> postmaster for each database.  Half the postmasters run on one server, 
>> the other half on the other.  If/when one fails, the other picks up the 
>> postmaster processes. 
> 
> How do you guarantee that the first is actually dead before the other "picks
> up the postmaster process"?  I'm assuming what you mean is something like
> this:
> 
> server1 <-------> disk <-------> server2

Correct.  It's all handled by the Red Hat cluster software.

> Server1 and server2 are both attached to the disk at the same time.  When
> server1 "goes away", server2 fires up a postgres instance on the same data
> area server1 was using, goes into recovery mode, and takes over the hostname
> and IP of server1.  

Correct.
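
Roughly, the takeover the cluster software performs looks like the
sketch below.  This is only an illustration of the ordering, not our
actual configuration: the node name, data directory and service
address are placeholders, and fence_node stands in for whatever fence
agent the cluster really calls.

import subprocess

# Placeholders -- the real values come from the cluster configuration.
PEER       = "server1"            # node that just went away
DATA_DIR   = "/silo/pgdata/db1"   # data area on the shared silo
SERVICE_IP = "192.0.2.10/24"      # address the clients connect to

def take_over():
    # 1. Fence the dead node first so it can no longer write to the
    #    shared disk.  Nothing else may happen until this succeeds.
    subprocess.check_call(["fence_node", PEER])

    # 2. Start a postmaster on the same data area.  PostgreSQL replays
    #    its WAL (crash recovery) before accepting connections.
    subprocess.check_call(["pg_ctl", "-D", DATA_DIR, "-w", "start"])

    # 3. Take over the service address so clients reconnect here.
    subprocess.check_call(["ip", "addr", "add", SERVICE_IP, "dev", "eth0"])

if __name__ == "__main__":
    take_over()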

> In order for this to work, you have to be absolutely certain that server1 is
> dead and disconnected from the shared disk before server2 starts the
> postgres process on the same data area.  Without that, you are sure to have
> database corruption of some kind.  That is, the data from server1 MUST BE
> FLUSHED and on the platters before server2 starts using the same data area. 
> So it might not be enough to be sure server1 is dead.  You have to be sure
> the disk's cache is flushed too, or you could have a mess.

Agreed, and we have tested this extensively.  It's a moot point with 
regard to our Slony implementation, though, as we haven't had a 
failover since we started trying to get Slony working.
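
One generic sanity check on the cache question, for anyone else
building something similar: ask the shared LUN whether its write
cache is enabled at all.  The device path below is just a
placeholder, and this is a standalone illustration, not a piece of
our cluster scripts.

import subprocess

# Placeholder -- substitute the shared LUN as this host sees it.
SHARED_LUN = "/dev/sdb"

def show_write_cache(device):
    # sdparm reports the SCSI caching mode page.  If WCE is 1, the
    # device acknowledges writes from volatile cache, so it must honor
    # flush/sync commands (or have a battery-backed cache) before you
    # can trust that fsync'd data is really on the platters.
    out = subprocess.check_output(["sdparm", "--get=WCE", device])
    print(out.decode())

if __name__ == "__main__":
    show_write_cache(SHARED_LUN)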

-- 
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin

