"Stéphane A. Schildknecht" stephane.schildknecht at postgresqlfr.org
Fri Jun 13 07:44:49 PDT 2008
Andrew Sullivan wrote:
> On Fri, Jun 13, 2008 at 12:09:01PM +0200, "Stéphane A. Schildknecht" wrote:
>> Unfortunately, one of the nodes (72, see below) did not know about that set,
>> and subscription failed, leaving the whole replication in an unstable state.
>>
>> I can't do anything as node 72 is trying to subscribe to an unknown set.
> 
> Did you ensure that every node had the subscription before you performed the
> merge?  If not, then yes, this is hopelessly broken.

The problem arose *before* I could try to merge. Node 72 did not subscribe to
that set; it reported "unknown set". It seems that every node but 72 knew there
was a new set with tables in it.

> 
>> Do I have another option than rebuilding the whole replication? That may be a
>> half day production break at least...
> 
> You have to grovel through the events that 72 is attempting to process, yank
> all the latest events (72 will be really broken now), then drop the tables
> that made up the added set from replication.  Probably you'll have to drop
> the 72 node.  Then rebuild the set, rebuild 72, and only _then_ once
> everyone has the subscription, perform the merge.

Well, the problem is that I may have done something worse. In fact, I can't
drop node 72 from node 1. The process hangs.

It seems to me that every node is trying to access a node that no longer
exists. Therefore, I'm afraid I can't execute any slonik configuration command...

I now have lines like:

2008-06-13 16:34:43 CEST ERROR  remoteListenThread_72: "select
"_slonturf".registerNodeConnection(1); listen "_slonturf_Event"; " - ERREUR:
le schéma « _slonturf » n'existe pas
2008-06-13 16:37:31 CEST ERROR  slon_connectdb: PQconnectdb("dbname=turf
host=code port=5432 user=slony password=poiklmnb") failed - could not connect
to server: Connection timed out
        Is the server running on host "code" and accepting
        TCP/IP connections on port 5432?
2008-06-13 16:38:02 CEST ERROR  remoteListenThread_72: "select
"_slonturf".registerNodeConnection(11); unlisten "_slonturf_Event"; " - ERREUR:
 le schéma « _slonturf » n'existe pas
...

Node 71 does not complain about not knowing 72, but it doesn't propagate data
to 11.

So, is there a way to drop all knowledge of 72 from every node?

Regards,
SAS

