Fri Jul 18 06:24:04 PDT 2008
- Previous message: [Slony1-general] Failover with unresponsive slaves
- Next message: [Slony1-general] Failover with unresponsive slaves
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Quinn Jones wrote:
> Hello,
>
> I've been lurking on the list for a while, but now have a problem that I
> haven't seen: I tried to fail over a node, but a slave was also
> unresponsive and slonik errored out after timing out (so the failover
> didn't happen).
>
> Here's our setup: we have a database replicated to three slave nodes
> across a total of three sites, like this:
>   site 1: db1 (master) and db2
>   site 2: db3
>   site 3: db4
>
> Our problem started when site 1 went away completely and abruptly (so
> db1 and db2 were out of commission). Our plan called for failing the
> database over to db3. When I tried to fail over, though, slonik timed
> out with the message 'could not connect to server: Connection timed
> out. Is the server running on host "x.x.x.x" and accepting TCP/IP
> connections on port 5432?'. The IP address was db2's, so seeing that
> there was a logical problem to solve, I tried dropping the downed slave
> node first. This timed out as well, and the slave was not dropped.
>
> While I was trying to figure out an intelligent next step, short of
> dropping replication entirely and just using db3 stand-alone (and
> rebuilding the cluster from scratch later), site 1 mostly came back up.
> We lucked out and in the end saved some time by not being able to fail
> over the way we wanted, though we did lose an unknown number of sales
> because we were effectively down.
>
> How do we drop a non-responsive slave, or force the failover to ignore
> it? This is a situation that shouldn't come up frequently for us, but
> it could, and this was rather troublesome. I understand why failover
> would want to communicate with every other server, but there must be a
> way to step over other dead servers to get a functional cluster (I just
> haven't found it yet). Also, shouldn't dropping a slave node happen
> whether the node can be seen or not?
>
> Quinn
>

AFAIK, failover does not try to contact a failed node.

Could you tell us what you tried? Did you get any errors in the PostgreSQL logs?

-- 
Stéphane Schildknecht
PostgreSQLFr : http://www.postgresql.fr
Come meet us on October 4 at the largest French-language PostgreSQL event:
http://www.pgday.fr