Christopher Browne cbbrowne at ca.afilias.info
Wed May 28 09:15:55 PDT 2008
Simon Riggs <simon at 2ndquadrant.com> writes:
>> Is there some more direct way to get at PIDs?  A search of
>> pg_attribute and pg_proc for '%pid%' doesn't show up anything.
>
> Seems like the best way is to encapsulate this. Work out the API you
> would like, then we can take that to pgsql-hackers. Once we agree the
> API we can write a function to do that for 8.3 and below and put the
> function into Postgres backend for 8.4 and later. That way any further
> changes to LISTEN/NOTIFY will be able to change that also, so you are
> future-proofed around the changes.

Notice that this usage isn't really even relevant to LISTEN/NOTIFY;
the check has nothing to do with that particular infrastructure.
What we want to know is whether or not a particular backend process
has gotten replaced.

It was merely convenient, in the old implementation, that we could be
certain that when a slon restarted, a new PID would be recorded in
pg_catalog.pg_listener.

The patch below makes use of the already-existing statistics
collector data, and, as previously observed, since the two usages
(the read of the relevant PIDs, in the first portion, and the later
search for slons that haven't been restarted) are congruent, there is
no particular need for a new API.
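
For concreteness, with a hypothetical cluster name of "mycluster",
the first query in the patch below expands to something like this
(procpid being the pg_stat_activity PID column as of 8.3):

   lock table "_mycluster".sl_config_lock;
   select nl_backendpid from "_mycluster".sl_nodelock
       where nl_nodeid = "_mycluster".getLocalNodeId('_mycluster') and
          exists (select 1 from pg_catalog.pg_stat_activity
                    where procpid = nl_backendpid);

The exists() clause is what filters out PIDs of slons that have since
gone away.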

That being said, if we had a way to get a direct list of all of the
"live" PIDs, that would be nice; it seems like a useful thing to have.

At any rate, the patch below "cleanses" out the 'evil' exposure of
LISTEN/NOTIFY internals that will be going away in 8.4.

There are two further references to pg_listener:

chris at dba2:~/Slony-I/CMD/slony1-HEAD> grep pg_listen src/*/*.{c,h,sql}
src/slon/cleanup_thread.c:	 * cluster will run into conflicts due to trying to vacuum pg_listener
src/backend/slony1_funcs.sql:	prec.relname := ''pg_listener'';

  1.  In a comment, which doesn't particularly need to go away :-)

  2.  In the list of tables that are vacuumed.  I can add a test
      to the code that verifies that the table actually exists
      (sketched below), and that makes this benign in 8.4.

      Or perhaps our usage of LISTEN/NOTIFY is now so small that it is
      unimportant to clean it up.
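
Such an existence test is trivial; a minimal sketch, querying the
system catalogs directly (pg_class/pg_namespace, present in every
version we care about):

   select 1 from pg_catalog.pg_class c, pg_catalog.pg_namespace n
       where c.relnamespace = n.oid
         and n.nspname = 'pg_catalog'
         and c.relname = 'pg_listener';

No row back means there is no pg_listener, and the cleanup thread can
simply skip vacuuming it.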

Note: here are all of the places where LISTEN/NOTIFY is still being used:

chris at dba2:~/Slony-I/CMD/slony1-HEAD> grep Restart src/*/*.{c,h,sql}
src/slon/local_listen.c:	sprintf(restart_notify, "_%s_Restart", rtcfg_cluster_name);
src/slon/local_listen.c:		     "listen \"_%s_Restart\"; ",
src/slon/local_listen.c:						 "notify \"_%s_Restart\";",
src/slon/remote_worker.c:								 "notify \"_%s_Restart\"; ",
src/slon/remote_worker.c:									 "notify \"_%s_Restart\"; ",
src/slon/remote_worker.c:						 "notify \"_%s_Restart\"; ",
src/slon/sync_thread.c:			 * Restart the timeout on a sync.
src/slonik/slonik.c:				 "notify \"_%s_Restart\"; ",
src/backend/slony1_funcs.sql:	notify "_ at CLUSTERNAME@_Restart";
src/backend/slony1_funcs.sql:	notify "_ at CLUSTERNAME@_Restart";

I don't see a big problem with continuing to use LISTEN/NOTIFY for
this.
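
For reference, the moving parts are tiny; schematically, with the
hypothetical cluster name again:

   -- each slon, on its local listen connection:
   listen "_mycluster_Restart";

   -- slonik or a remote worker, to make that slon restart:
   notify "_mycluster_Restart";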

Index: src/slonik/slonik.c
===================================================================
RCS file: /home/cvsd/slony1/slony1-engine/src/slonik/slonik.c,v
retrieving revision 1.89
diff -c -u -r1.89 slonik.c
--- src/slonik/slonik.c	26 May 2008 18:48:51 -0000	1.89
+++ src/slonik/slonik.c	28 May 2008 15:56:20 -0000
@@ -2464,8 +2464,12 @@
 
 		slon_mkquery(&query,
 					 "lock table \"_%s\".sl_config_lock; "
-					 "select listenerpid from \"pg_catalog\".pg_listener "
-					 "    where relname = '_%s_Restart'; ",
+					 "select nl_backendpid from \"_%s\".sl_nodelock "
+					 "    where nl_nodeid = \"_%s\".getLocalNodeId('_%s') and "
+					 "       exists (select 1 from pg_catalog.pg_stat_activity "
+					 "                 where procpid = nl_backendpid); ",
+					 stmt->hdr.script->clustername,
+					 stmt->hdr.script->clustername,
 					 stmt->hdr.script->clustername,
 					 stmt->hdr.script->clustername);
 		res3 = db_exec_select((SlonikStmt *) stmt, nodeinfo[i].adminfo, &query);
@@ -2591,7 +2595,7 @@
 	}
 
 	/*
-	 * Wait until all slon replication engines that where running have
+	 * Wait until all slon replication engines that were running have
 	 * restarted.
 	 */
 	n = 0;
@@ -2608,9 +2612,8 @@
 			}
 
 			slon_mkquery(&query,
-						 "select listenerpid from \"pg_catalog\".pg_listener "
-						 "    where relname = '_%s_Restart' "
-						 "    and listenerpid <> %d; ",
+						 "select nl_backendpid from \"_%s\".sl_nodelock "
+						 "    where nl_backendpid <> %d; ",
 						 stmt->hdr.script->clustername,
 						 nodeinfo[i].slon_pid);
 			res1 = db_exec_select((SlonikStmt *) stmt, nodeinfo[i].adminfo, &query);

-- 
"cbbrowne","@","linuxdatabases.info"
http://linuxfinances.info/info/advocacy.html
:FATAL ERROR -- ERROR IN USER

