"Stéphane A. Schildknecht" stephane.schildknecht at postgresqlfr.org
Fri Jul 18 06:24:04 PDT 2008
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Quinn Jones wrote:
> Hello,
>  
> I've been lurking on the list for a while, but now have a problem that I
> haven't seen:  I tried to failover a node, but a slave was also
> unresponsive and slonik errored out after timing out (so the failover
> didn't happen).
>  
> Here's our set-up: We have a database replicated to three slave nodes
> and a total of three sites, like this
> site 1: db1 (master) and db2
> site 2: db3
> site 3: db4
>  
> Our problem started when site 1 went away completely and abruptly (so
> db1 and db2 were out of commission).  Our plan called for failing the
> database over to db3.  When I tried to failover, though, slonik timed
> out with the message 'could not connect to server: Connection timed
> out.  Is the server running on host "x.x.x.x" and accepting TCP/IP
> connections on port 5432?'.  The IP address was db2's, so seeing that
> there is a logical problem to solve I tried dropping the downed slave
> node first.  This timed out as well, and the slave was not dropped.
>  
> While trying to figure out an intelligent next step, short of dropping
> replication entirely and just using db3 stand-alone (and rebuilding the
> cluster from scratch later), site 1 mostly came back up.  We lucked out
> and in the end saved some time by not being able to fail over the way we
> wanted, though we did lose an unknown number of sales because we were
> effectively down.
>  
> How do we drop a non-responsive slave, or force the failover to ignore
> it?  This is a situation that shouldn't come up frequently for us, but
> it could and this was rather troublesome.  I understand why failover
> would want to communicate with every other server, but there must be a
> way to step over other dead servers to get a functional cluster (I just
> haven't found it yet).  Also, shouldn't dropping a slave node happen
> whether the node can be seen or not?
>  
> Quinn
>  

AFAIK, failover does not try to contact a failed node. Could you tell us
exactly what you tried?
Did you get any errors in the PostgreSQL logs?

- --
Stéphane Schildknecht
PostgreSQLFr : http://www.postgresql.fr

Come and meet us on October 4 at the largest French-speaking
PostgreSQL event: http://www.pgday.fr

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIgJlzA+REPKWGI0ERAlHwAKCTnTD1sqvb4JIJYR3pefZB93N04ACfUSG7
BVDBGCt4CwDXFk9KHNYeZJI=
=c8Mw
-----END PGP SIGNATURE-----


More information about the Slony1-general mailing list