Steve Singer ssinger_pg
Wed Nov 29 17:50:32 PST 2006
Today we were doing some controlled switch-overs (move sets) and encountered a 
situation where Postgres decided to abort the transaction executing a moveSet 
due to deadlock detection.

This left our cluster in a state where node 1 and node 2 each thought the 
other one was the master/provider of the replication set.
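
To make the inconsistency concrete, here is a rough sketch of how one might 
spot it. It assumes each node's set_id and set_origin values have already been 
read out of sl_set (in the cluster's schema) by hand or by script; the function 
name and the sample data are made up for illustration and are not part of Slony.

```python
# Hypothetical helper, not part of Slony: compare each node's view of
# sl_set.set_origin and report the sets the nodes disagree about.
def find_origin_conflicts(views):
    """views maps node id -> {set_id: set_origin} as read from that node's sl_set."""
    all_sets = set()
    for rows in views.values():
        all_sets.update(rows)

    conflicts = {}
    for set_id in sorted(all_sets):
        # What each node believes the origin of this set is (None if unknown).
        claimed = {node: rows.get(set_id) for node, rows in views.items()}
        if len(set(claimed.values())) > 1:
            conflicts[set_id] = claimed
    return conflicts


# Made-up data mirroring the state described above: node 1 thinks node 2
# is the origin of set 1, and node 2 thinks node 1 is.
views = {1: {1: 2}, 2: {1: 1}}
print(find_origin_conflicts(views))
```

An empty result would mean all nodes agree on the origin of every set.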

Has anyone else experienced this? There was other activity on the database 
at the time this happened, and we might even have accidentally had multiple 
copies of the moveset script running at once (but I don't see how both would 
have been able to get the set locks).


I still need to spend some time trying to figure out if I can duplicate the
sequence of events that caused this on a development cluster.



What I really want to know is: is there a way we could have manually 
reversed the half-completed moveset to have a master again? If so, how?

I thought about

1. stopping slon
2. updating sl_set on all machines to make one of them a provider again
3. restarting slon

Would this have worked? (We ended up removing Slony from the database to 
get it up again quickly.)




