[Slony1-general] Slon fails to start

Mon Oct 16 12:40:21 PDT 2006

I've pasted an initial E-mail conversation below.  The actual slon
launch command and output are as follows:

postgres@dali:/$ slon rep_atr_prod "dbname=atr_test user=postgres"
2006-10-12 08:49:16 EDT CONFIG main: slon version 1.1.0 starting up
2006-10-12 08:49:16 EDT FATAL  main: Node is not initialized properly
2006-10-12 08:49:16 EDT DEBUG2 slon: exit(-1)

Is it possible to get any more specific info about where and how the
node is not initialized?  Could there be a problem with the
_rep_atr_prod schema even though slon is running fine on the master host
and database?  If so, is there a script that can actually remove all the
slony info (rowID columns, triggers, operators, etc.) from the master
database so I can start over from scratch, or is that a manual process?

Thanks very much,
Luke

-----Original Message-----
From: Christopher Browne 
Sent: Wednesday, October 11, 2006 1:48 PM
To: Luke Morehead
Subject: Re: Slony help

On 10/10/06, Luke Morehead wrote:
> Chris,
>
> I ran the slony1_extract_schema.sh script (using a test database) and
> used the output to populate a fresh slave database.  I then ran
> slonik_init_cluster.  After this slon_start ran fine for the master node
> but failed for the slave node with the error, "FATAL  main: Node is not
> initialized properly".
>
> The buzz online seems to indicate a problem with the (clustername)
> schema.  However the schema looks fine in the slave database.  What
> could this be?

That message comes from the following code...
	/*
	 * Get our local node ID
	 */
	rtcfg_nodeid = db_getLocalNodeId(startup_conn);
	if (rtcfg_nodeid < 0)
	{
		slon_log(SLON_FATAL, "main: Node is not initialized properly\n");
		slon_exit(-1);
	}

If rtcfg_nodeid comes back negative, then that means the call to
db_getLocalNodeId() failed.

	/*
	 * Select the last_value from the sl_local_node_id sequence
	 */
	snprintf(query, 1024, "select last_value::int4 from %s.sl_local_node_id",
			 rtcfg_namespace);
	res = PQexec(conn, query);
	if (PQresultStatus(res) != PGRES_TUPLES_OK)
	{
		slon_log(SLON_ERROR,
				 "cannot get sl_local_node_id - %s",
				 PQresultErrorMessage(res));
		PQclear(res);
		return -1;
	}
	if (PQntuples(res) != 1)
	{
		slon_log(SLON_ERROR,
				 "query '%s' returned %d rows (expected 1)\n",
				 query, PQntuples(res));
		PQclear(res);
		return -1;
	}

The above are the expected failure cases, each of which returns -1.
The slon log should have had one of those messages.
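
To check by hand what that lookup sees, it may help to run the same
query yourself.  The following is only a rough sketch, not slon code:
build_local_node_query() is a hypothetical helper name, and it assumes
the cluster schema is "_" followed by the cluster name (as with the
_rep_atr_prod schema mentioned in this thread).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of slon): build the same query text
 * that db_getLocalNodeId() sends, so it can be pasted into psql.
 * Assumption: the cluster schema is "_" followed by the cluster name.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int
build_local_node_query(char *buf, size_t buflen, const char *cluster)
{
	int		n;

	assert(buf != NULL && cluster != NULL);

	/* Quote the schema name so the identifier is taken literally */
	n = snprintf(buf, buflen,
				 "select last_value::int4 from \"_%s\".sl_local_node_id;",
				 cluster);
	if (n < 0 || (size_t) n >= buflen)
		return -1;
	return 0;
}
```

Running the resulting statement in a psql session opened with the same
connection parameters that are passed to slon should show whether the
query fails outright or what value it returns.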

What I'm expecting is that perhaps the conn info to the subscriber
node isn't right.

So you might compare:
a) What you type in to get a psql session to the subscriber node
to
b) The arguments passed to slon to get it to connect.

You didn't say how you're launching slon processes; I'm suspicious
that THAT is what is breaking down.

I'd prefer to address the issue on the mailing list, if possible.