Wed Jul 16 09:48:02 PDT 2008
- Previous message: [Slony1-general] sl_nodelock messages
- Next message: [Slony1-general] Failover with unresponsive slaves
Hello,

I've been lurking on the list for a while, but now have a problem that I haven't seen discussed: I tried to fail over a node, but a slave was also unresponsive, and slonik errored out after timing out (so the failover didn't happen).

Here's our setup. We have a database replicated to three slave nodes across a total of three sites, like this:

site 1: db1 (master) and db2
site 2: db3
site 3: db4

Our problem started when site 1 went away completely and abruptly (so db1 and db2 were both out of commission). Our plan called for failing the database over to db3. When I tried to fail over, though, slonik timed out with the message 'could not connect to server: Connection timed out. Is the server running on host "x.x.x.x" and accepting TCP/IP connections on port 5432?'. The IP address was db2's, so, seeing that there was a logical problem to solve, I tried dropping the downed slave node first. This timed out as well, and the slave was not dropped.

While I was trying to figure out an intelligent next step, short of dropping replication entirely and just using db3 stand-alone (and rebuilding the cluster from scratch later), site 1 mostly came back up. We lucked out and in the end saved some time by not being able to fail over the way we wanted, though we did lose an unknown number of sales because we were effectively down.

How do we drop a non-responsive slave, or force the failover to ignore it? This is a situation that shouldn't come up frequently for us, but it could, and this was rather troublesome. I understand why failover would want to communicate with every other server, but there must be a way to step over other dead servers to get a functional cluster (I just haven't found it yet). Also, shouldn't dropping a slave node happen whether the node can be seen or not?

Quinn