Christopher Browne cbbrowne
Fri Aug 18 09:34:10 PDT 2006
Julian Scarfe wrote:
> So with 1.2RC3 I'm still seeing the same behaviour as I reported below (no 
> replies received to that).
>
> I have 10 sets (numbered 1001, 2001,... 10001) with initial origin on node 
> 1.
>
> On attempting to switchover all of them to node 2, I'm getting FATAL errors 
> on the 2nd and subsequent MOVE SETs:
>
> $ grep -a "MOVE" /tmp/slon-avbrief.out
> 2006-08-17 10:01:42 UTC DEBUG2 localListenThread: Received event 1,214 
> MOVE_SET
> 2006-08-17 10:01:44 UTC DEBUG2 localListenThread: Received event 1,217 
> MOVE_SET
> 2006-08-17 10:01:44 UTC FATAL  localListenThread: MOVE_SET but no provider 
> found for set 2001
> 2006-08-17 10:01:56 UTC DEBUG2 localListenThread: Received event 1,219 
> MOVE_SET
> 2006-08-17 10:01:56 UTC FATAL  localListenThread: MOVE_SET but no provider 
> found for set 3001
> ...
> 2006-08-17 10:03:20 UTC DEBUG2 localListenThread: Received event 1,240 
> MOVE_SET
> 2006-08-17 10:03:20 UTC FATAL  localListenThread: MOVE_SET but no provider 
> found for set 10001
>
> Of course with the new behaviour of slon in 1.2, it restarts after 10 
> seconds and processes another MOVE SET successfully before dying at the 
> second attempt.  So eventually the process completes.  As a side issue, if I 
> could modify the sleep time before restart to 1 second, the FATAL might be 
> acceptable, but 10 seconds per set is too long.
>
> Unless I'm misunderstanding, it looks like Christopher diagnosed the problem 
> in the message referenced below, but I can't see any corresponding 
> modification in HEAD to rewrite the query to include the set_id (sub_set).
>
>     slon_mkquery(&query2,
>                  "select sub_provider from %s.sl_subscribe "
>                  "    where sub_receiver = %d",
>                  rtcfg_namespace, rtcfg_nodeid);
>     res2 = PQexec(dbconn, dstring_data(&query2));
> ...
>     if (PQntuples(res2) != 1)
>     {
>         slon_log(SLON_FATAL, "localListenThread: MOVE_SET "
>                  "but no provider found for set %d\n",
>                  set_id);
>         dstring_free(&query2);
>         PQclear(res2);
>         slon_retry();
>     }
>
> Am I missing something?
>   
I probably am...

I'll see to adding a test for this today (e.g. - a MOVE SET test with
~10 sets).

Adding "and sub_set = %d" to the query looks likely to work out OK...
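
Roughly what I have in mind, as an untested sketch rather than the actual
patch. My reading is that with several sets the unfiltered query can return
more than one row for the receiver, which would trip the
PQntuples(res2) != 1 check. Here snprintf() stands in for slon_mkquery() so
the fragment compiles on its own, and build_provider_query() is a made-up
name, not something in the slon source:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the slon_mkquery() call quoted above.
 * Formats the provider lookup into buf, restricted to one set via
 * "and sub_set = %d".  Returns the snprintf() result: the length the
 * query needs, or a negative value on an output error.
 */
static int
build_provider_query(char *buf, size_t buflen,
                     const char *namespace_name, int node_id, int set_id)
{
    assert(buf != NULL && namespace_name != NULL);

    return snprintf(buf, buflen,
                    "select sub_provider from %s.sl_subscribe "
                    "    where sub_receiver = %d and sub_set = %d",
                    namespace_name, node_id, set_id);
}
```

In slon itself the extra argument would simply be the set_id that the
existing FATAL message already prints.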




