[Slony1-hackers] Failover never completes

Wed Oct 17 19:55:57 PDT 2012

On 10/17/2012 9:38 PM, Joe Conway wrote:
> But the fact that failover seems so fragile is troubling. If it fails to
> failover so often in our "migration" tests, why should we think that it
> won't fail to failover when we really need it? Is failover fragile
> because we need to STONITH before doing the failover? Would that prevent
> these race conditions?

Failover was never meant as a "migration" path. It was meant as a "last 
resort" thing when the old master was found "dead for good".

Unfortunately, no other replication system (for PostgreSQL) has any
mechanism for controlled transfer of the master role while the old
master is still alive, so people think "failover" is the right way to do
migrations and shoot failover from the hip all the time. It is not.

The fact that failover seems so fragile is troubling us too. I have been
thinking for quite a while now that we have tried too hard to "make it
work" with all the complicated configurations we can think of, instead
of limiting the possible complexity. But that is an entirely different
discussion.

Jan

-- 
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin